Materials Informatics


Assignment 1

Below are the questions for Assignment 1 of the course Materials Informatics.

  • Please complete these tasks independently and adhere to the NUS rules regarding academic honesty.
  • The questions are designed with varying difficulty levels, so you might not be able to solve all of them. However, please try to solve as many as you can.
  • Even if you can't get the correct results, you can still earn credits by showing your efforts.
  • Remove your API-KEY before submitting the assignment.
  • Write your name, email, and student ID in the cell below:

  • Name:
  • Email:
  • Student ID:

Problem 1 Code Performance and Profiling (20%)

Matrix multiplication is a fundamental operation in many numerical algorithms. In this problem, you are required to implement a matrix multiplication function and compare its performance with the built-in function in NumPy. Assume that all matrices in this problem are square. You can use the numpy (only for comparison), time, random, matplotlib and line_profiler libraries in this problem.

1.1 Naive Matrix Multiplication (10%)

  • Implement a matrix multiplication function matmul(A, B) that takes two 2D arrays A and B as input and returns the product of the two matrices. Do this step in the naive way, i.e. implement the function directly from the formula for matrix multiplication. You should only use built-in Python functions and shouldn't use NumPy in matmul(). Print out some results to check that the naive approach and the NumPy library give the same results. (5%)

    def matmul(A, B):
        '''
        You should write a matrix multiplication function
        Arguments:
        A: 2D array square matrix (Python list)
        B: 2D array square matrix (Python list)
        return:
        2D array
        '''
  • Plot the time taken for matrix multiplication as a function of the size of the matrix (from 20x20 to 200x200, increasing the dimension in steps of 20) and compare the results with NumPy. Use line_profiler to profile the performance of your function and identify the bottleneck in your code. (5%)

    def plot_matmul_scaling():
        '''
        You should plot the time taken for the matrix multiplication as a function of the size of the matrix
        '''
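
The naive approach and its timing can be sketched as follows. This is only a minimal illustration using built-in Python: the helpers random_matrix() and time_matmul() are hypothetical names that do not appear in the assignment, and plotting and line_profiler are left out.

```python
import random
import time


def matmul(A, B):
    """Naive O(n^3) product of two square matrices stored as lists of lists."""
    n = len(A)
    C = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for k in range(n):
            a_ik = A[i][k]
            for j in range(n):
                # C[i][j] accumulates the sum over k of A[i][k] * B[k][j]
                C[i][j] += a_ik * B[k][j]
    return C


def random_matrix(n):
    """Hypothetical helper: an n x n matrix of uniform random floats."""
    return [[random.random() for _ in range(n)] for _ in range(n)]


def time_matmul(sizes):
    """Hypothetical helper: wall-clock time of matmul() for each matrix size."""
    timings = []
    for n in sizes:
        A, B = random_matrix(n), random_matrix(n)
        start = time.perf_counter()
        matmul(A, B)
        timings.append(time.perf_counter() - start)
    return timings
```

Calling time_matmul(range(20, 201, 20)) would give one timing per size in the range the question asks for; the resulting list can then be passed to matplotlib alongside the NumPy timings.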

Solution 1.1

Write your code in the next cell. Please run the cell after you finish writing the code and save your results. You can insert more cells if needed.

1.2 Faster Matrix Multiplication (10%)

Implement Strassen's algorithm for matrix multiplication and compare its performance with the naive approach. You can find more details about this algorithm here https://en.wikipedia.org/wiki/Strassen_algorithm. The libraries you can use are time, random, matplotlib and line_profiler.

  • Assume that A and B both have the shape 2^n x 2^n (n = 1, 2, ...). Divide each matrix into 4 equal submatrices. (2%)
  • (Hard) Implement Strassen's algorithm for matrix multiplication as a function matmul_Strassen(A, B) that takes two 2D arrays A and B as input and returns the product of the two arrays. You should only use built-in Python functions and shouldn't use NumPy in matmul_Strassen(). Print out some results to check that you get the same results as the naive approach and the NumPy library. (5%)

    def matmul_Strassen(A, B):
        '''
        Strassen's algorithm for matrix multiplication
        Arguments:
        A: 2D array square matrix
        B: 2D array square matrix
        return:
        2D array
        '''
  • (Hard) Plot the time taken for the matrix multiplication as a function of the size of the matrix (from 2^1 x 2^1 to 2^10 x 2^10), then compare the execution time with the naive method above using a plot. Use line_profiler to profile the performance of your function and identify the bottleneck of your code. (3%)

    def plot_matmul_Strassen_scaling():
        '''
        You should plot the time taken for the matrix multiplication as a function of the size of the matrix and compare the results with the naive method
        '''
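
The quadrant split and the seven recursive products of Strassen's algorithm can be sketched as below. This is a minimal illustration under the stated assumption of 2^n x 2^n inputs; the helpers add(), sub() and split() are hypothetical names, and the recursion goes all the way down to 1 x 1 blocks for simplicity.

```python
def add(X, Y):
    """Element-wise sum of two equally sized matrices."""
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def sub(X, Y):
    """Element-wise difference of two equally sized matrices."""
    return [[x - y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def split(M):
    """Divide a 2^n x 2^n matrix into its four equal quadrants."""
    h = len(M) // 2
    top, bottom = M[:h], M[h:]
    return (
        [row[:h] for row in top],
        [row[h:] for row in top],
        [row[:h] for row in bottom],
        [row[h:] for row in bottom],
    )


def matmul_Strassen(A, B):
    """Strassen product of two 2^n x 2^n matrices (lists of lists)."""
    if len(A) == 1:
        return [[A[0][0] * B[0][0]]]
    A11, A12, A21, A22 = split(A)
    B11, B12, B21, B22 = split(B)
    # Seven recursive products instead of the eight of the block-wise naive method
    M1 = matmul_Strassen(add(A11, A22), add(B11, B22))
    M2 = matmul_Strassen(add(A21, A22), B11)
    M3 = matmul_Strassen(A11, sub(B12, B22))
    M4 = matmul_Strassen(A22, sub(B21, B11))
    M5 = matmul_Strassen(add(A11, A12), B22)
    M6 = matmul_Strassen(sub(A21, A11), add(B11, B12))
    M7 = matmul_Strassen(sub(A12, A22), add(B21, B22))
    C11 = add(sub(add(M1, M4), M5), M7)
    C12 = add(M3, M5)
    C21 = add(M2, M4)
    C22 = add(add(sub(M1, M2), M3), M6)
    # Stitch the quadrants back into one matrix
    top = [left + right for left, right in zip(C11, C12)]
    bottom = [left + right for left, right in zip(C21, C22)]
    return top + bottom
```

In pure Python the list slicing and the many small additions are costly, so a variant that switches to the naive method below some cutoff size is a common design choice and worth comparing when profiling.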

Solution 1.2

Write your code in the next cell. Please run the cell after you finish writing the code and save your results. You can insert more cells if needed.

Problem 2 Materials Project (50%)

This part of the assignment is related to the Materials Project. You will use the Materials Project API to pull the required data. Don't forget to remove your API-KEY before submitting the assignment.

2.1 Chemistry (10%)

Show the distribution of elements (number of structures vs. element) for all the experimental structures in the Materials Project.

  • Pull the required data from Materials Project using Python API. (3%)
  • You should write a plot_elem() function like the one below, which takes the data and plots the distribution over all elements as a histogram. The counts should be shown on a log scale. (3%)

    def plot_elem(data):
        '''
        data: a list of dictionaries, each dictionary contains the elements for a material
        '''

    # plot results
    plot_elem(data)
  • (Hard) Show the amount of data for each element as a heatmap (on a log scale) on a periodic table. You can use external libraries (pymatviz) for this purpose. (4%)
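
The counting step behind the element histogram can be sketched as below. It is only an illustration: the dictionary key "elements" and the three toy records are assumptions made for the example, not the actual layout or content of Materials Project data, and the plotting itself is omitted.

```python
import math
from collections import Counter


def count_elements(data):
    """Count how many structures contain each element."""
    counts = Counter()
    for entry in data:
        # set() guards against an element being listed twice for one material
        counts.update(set(entry["elements"]))
    return counts


def log_counts(counts):
    """Base-10 logarithm of each count, for a log-scale bar chart or heatmap."""
    return {element: math.log10(n) for element, n in counts.items()}


# Toy records for illustration only (assumed key name "elements")
data = [
    {"elements": ["Li", "Co", "O"]},
    {"elements": ["Si", "O"]},
    {"elements": ["Si"]},
]
counts = count_elements(data)
print(counts.most_common())
print(log_counts(counts))
```

The resulting dictionary of counts per element symbol is the kind of input that a bar chart or a periodic-table heatmap would then consume.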

Solution 2.1

Write your code in the next cell. Please run the cell after you finish writing the code and save your results. You can insert more cells if needed.

2.2 Band Gap (15%)

Show the distributions of band gaps (number of structures vs. band gap) of the non-metallic materials for the following groups: [carbides, nitrides, oxides, fluorides], [fluorides, chlorides, bromides, iodides], and [oxides, sulfides, selenides, tellurides].

  • Get all band gap data for the experimentally observed non-metallic materials in the Materials Project database using the Python API, or read from a saved file from the previous step. (4%)
  • Write a Python plot_bandgap() function as below and call it separately for each group of elements. (5%)

    def plot_bandgap(data, elements):
        '''
        data: a list of dictionaries, each dictionary contains the band gap data for a material
        elements: a list of elements
        '''

    # Then you can call the function like this:
    plot_bandgap(data, ['C', 'N', 'O', 'F'])
    plot_bandgap(data, ['F', 'Cl', 'Br', 'I'])
    plot_bandgap(data, ['O', 'S', 'Se', 'Te'])
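  • (Hard) Use distplot() from seaborn to plot the distribution of the band gap using a kernel density estimate (KDE). Note that distplot() has been deprecated since seaborn 0.11; histplot() with kde=True or kdeplot() serve the same purpose in newer versions. (3%)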
  • (Hard) Show the band gap distribution data as histograms on a periodic table. You can use external libraries (pymatviz) for this purpose. (3%)
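
The selection and binning that sit behind each band gap histogram can be sketched as follows. The keys "elements" and "band_gap", the toy records, and the choice to treat a band gap greater than zero as "non-metallic" are all assumptions made for this illustration; the helper names are hypothetical.

```python
def select_band_gaps(data, element):
    """Band gaps of entries that contain `element` and have a gap above zero."""
    return [
        entry["band_gap"]
        for entry in data
        if element in entry["elements"] and entry["band_gap"] > 0
    ]


def histogram(values, bin_width=0.5, max_gap=10.0):
    """Counts per bin of width bin_width on [0, max_gap); larger gaps fall in the last bin."""
    n_bins = int(max_gap / bin_width)
    counts = [0] * n_bins
    for value in values:
        index = min(int(value / bin_width), n_bins - 1)
        counts[index] += 1
    return counts


# Made-up records for illustration only
data = [
    {"elements": ["A", "O"], "band_gap": 5.7},
    {"elements": ["B", "O"], "band_gap": 3.1},
    {"elements": ["C", "O"], "band_gap": 0.0},
    {"elements": ["D", "N"], "band_gap": 1.8},
]
for element in ["N", "O"]:
    gaps = select_band_gaps(data, element)
    print(element, histogram(gaps, bin_width=1.0))
```

Inside plot_bandgap() the same selection would run once per element in the group, with each list of gaps drawn as its own histogram or KDE curve.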

Solution 2.2

Write your code in the next cell. Please run the cell after you finish writing the code and save your results. Please add more cells if needed.

2.3 Symmetry (10%)

  • Plot the distribution (number of structures) of the crystal system (2%) and space group (3%) of all experimental structures in the Materials Project. Only show the top 10 space groups. You should write a function like the one below:

```python
def plot_crystal_system_sg(data):
    '''
    data: a list of dictionaries, each dictionary contains the crystal system and space group data for a material
    '''

# Then you can call the function like this:
plot_crystal_system_sg(data)
```
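
The tallying needed before plotting, including the restriction to the top 10 space groups, can be sketched as below. The keys "crystal_system" and "space_group", the helper name count_symmetry(), and the toy records are assumptions for illustration only.

```python
from collections import Counter


def count_symmetry(data, top_n=10):
    """Return crystal-system counts and the top_n most frequent space groups."""
    crystal_systems = Counter(entry["crystal_system"] for entry in data)
    space_groups = Counter(entry["space_group"] for entry in data)
    # most_common() sorts by descending count and truncates to top_n entries
    return crystal_systems, space_groups.most_common(top_n)


# Toy records for illustration only
data = [
    {"crystal_system": "Cubic", "space_group": "Fm-3m"},
    {"crystal_system": "Cubic", "space_group": "Fd-3m"},
    {"crystal_system": "Cubic", "space_group": "Fm-3m"},
    {"crystal_system": "Hexagonal", "space_group": "P6_3/mmc"},
]
systems, top_groups = count_symmetry(data)
print(systems)
print(top_groups)
```

Both results map directly onto bar charts: one bar per crystal system, and one bar per (space group, count) pair in the truncated list.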

Solution 2.3

Write your code in the next cell. Please run the cell after you finish writing the code and save your results. Please add more cells if needed.

2.4 Polymorphs (15%)

Polymorphs are materials that have the same chemical composition but different crystal structures. In this problem, you are required to analyze the polymorphs in the Materials Project database.

  • Pull the data from the Materials Project database and group the polymorphs. (5%)

    def group_polymorphs(data):
        '''
        Group the polymorphs in the database
        Arguments:
        data: a list of dictionaries, each dictionary contains the polymorph data for a material
        return:
        a dictionary, key is the formula, value is a list of data associated with the formula
        '''
  • (Hard) Analyze the correlation between density and thermodynamic stability (measured by energy_above_hull) among the different polymorphs of the same formula and show the results as scatter plots with trend lines. You should perform this analysis for the top 5 formulas with the highest number of polymorphs. (10%)

    def analyze_polymorphs(polymorph):
        '''
        Analyze the polymorphs in the database and show the plots
        Arguments:
        polymorph: a dictionary, key is the formula, value is a list of data associated with the formula
        '''
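
The grouping step, the choice of the formulas with the most polymorphs, and a least-squares trend line can be sketched as below. The key "formula", the helpers top_formulas() and linear_trend(), and the example values are assumptions for illustration; only energy_above_hull is named in the assignment itself, and the plotting is omitted.

```python
from collections import defaultdict


def group_polymorphs(data):
    """Map each formula to the list of entries that share it."""
    groups = defaultdict(list)
    for entry in data:
        groups[entry["formula"]].append(entry)
    return dict(groups)


def top_formulas(groups, n=5):
    """Hypothetical helper: the n formulas with the most polymorphs."""
    return sorted(groups, key=lambda formula: len(groups[formula]), reverse=True)[:n]


def linear_trend(xs, ys):
    """Hypothetical helper: least-squares slope and intercept of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx  # requires at least two distinct x values
    return slope, mean_y - slope * mean_x


# Made-up records for illustration only
data = [{"formula": "AB2"}, {"formula": "XY"}, {"formula": "AB2"}]
groups = group_polymorphs(data)
print(top_formulas(groups, n=1))
print(linear_trend([1, 2, 3], [2, 4, 6]))
```

For each selected formula, the densities would be passed as xs and the energy_above_hull values as ys; the returned slope and intercept define the trend line drawn over the scatter plot.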

Solution 2.4

Write your code in the next cell. Please run the cell after you finish writing the code and save your results. Please add more cells if needed.

Problem 3 Atomistic Structure (30%)

You will complete tasks regarding atomistic structures using the pymatgen library.

3.1 Structure Symmetrization (15%)

  • Write a piece of Python code to symmetrize a pymatgen Structure. It should return the symmetrized structure (10%) and the space group number (5%) of this structure. The function should be something like:

    from pymatgen.core import Structure

    def symmetrize_structure(structure, position_tolerance=0.1, angle_tolerance=5):
        """
        Symmetrize a crystal structure.

        Args:
            structure: pymatgen structure object
            position_tolerance: float, the tolerance for determining if two sites are at the same position, with a default value of 0.1.
            angle_tolerance: float, the tolerance for determining if two angles are the same, with a default value of 5 degrees.

        Returns:
            symmetrized_structure: pymatgen structure object
            space_group: int, the space group number of the symmetrized structure
        """
        return symmetrized_structure, space_group

    # Then you should call this function using:
    structure = Structure(
        lattice=[[5.4437022209, 0, 0], [0, 5.4437022209, 0], [0, 0, 5.4437022209]],
        species=['Si'] * 8,
        coords=[
            [0.75, 0.75, 0.25],
            [0.00, 0.50, 0.50],
            [0.75, 0.25, 0.75],
            [0.00, 0.00, 0.00],
            [0.25, 0.75, 0.75],
            [0.50, 0.50, 0.00],
            [0.25, 0.25, 0.25],
            [0.50, 0.00, 0.50],
        ],
    )
    sym_structure, sg_number = symmetrize_structure(structure)
    print(f"Original Structure:{structure}")
    print(f"Symmetrized Structure:{sym_structure}")

Solution 3.1

Write your code in the next cell. Please run the cell after you finish writing the code and save your results. You can insert more cells if needed.

3.2 Vacancy Ordering (15%)

  • You are given a pymatgen Structure object. You need to write a Python function ordering() that returns all unique orderings of this structure obtained by replacing some sites (passed as a list of site indices) in this structure. The function should be something like the one below. You can use pymatgen.transformations.site_transformations.ReplaceSiteSpeciesTransformation for the site species substitution. (10%)

    def ordering(structure, vacancy_fraction, site_indices):
        """
        Ordering a crystal structure with vacancies.

        Args:
            structure: pymatgen structure object
            vacancy_fraction: float, the fraction of vacancies in the structure
            site_indices: list of int, the indices of the sites that will be replaced by vacancies

        Returns:
            ordered_structures: a transmuter with the ordered structures
        """
        return ordered_structures

    # Then you should call this function using:
    API_KEY = 'your-api-key'
    structure = Structure.from_id(id_='mp-22526', api_key=API_KEY)
    structure.make_supercell([2, 2, 2])
    ordered_structures = ordering(structure, 0.25, [0, 1, 2, 3])
    print(ordered_structures[0])
  • (Hard) The vacancy fraction might not be compatible with the input unit cell (the composition has to change during the ordering process, as in the example below). Make sure your code can handle this and raises a ValueError that proposes the smallest supercell size of the input unit cell. (5%)

    # The call below will raise a ValueError and suggest a supercell size of 4.
    ordered_structures = ordering(structure, 0.5, [0, 1])
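
One possible way to detect an incompatible vacancy fraction is sketched below, using exact fractions. It assumes that the number of vacancies equals vacancy_fraction times the number of selected sites; this convention, the helper name check_vacancy_fraction() and the max_denominator parameter are assumptions, and the assignment's own example may count sites differently.

```python
from fractions import Fraction


def check_vacancy_fraction(n_sites, vacancy_fraction, max_denominator=100):
    """Return the whole number of vacancies, or raise ValueError with a suggested multiplier."""
    # Convert the float to a small exact fraction, e.g. 0.25 -> 1/4
    fraction = Fraction(vacancy_fraction).limit_denominator(max_denominator)
    n_vacancies = fraction * n_sites
    if n_vacancies.denominator != 1:
        # Fractions are stored in lowest terms, so the denominator is the
        # smallest integer factor that makes the vacancy count whole
        multiplier = n_vacancies.denominator
        raise ValueError(
            f"{n_sites} sites cannot host a vacancy fraction of {fraction}; "
            f"use a supercell at least {multiplier} times larger"
        )
    return int(n_vacancies)


print(check_vacancy_fraction(4, 0.25))
try:
    check_vacancy_fraction(2, 0.25)
except ValueError as err:
    print(err)
```

A check like this would run at the start of ordering(), before any site substitution is attempted, so that the error is raised early with a concrete suggestion.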

Solution 3.2

Write your code in the next cell. Please run the cell after you finish writing the code and save your results. You can insert more cells if needed.
