MLE 5217 : Take-Home Assignments


Objectives

Based on the chemical composition of materials, build a classification model to distinguish metals from non-metals (Model I), and then build a regression model to predict the bandgap of non-metallic compounds (Model II).

Please use a separate Jupyter notebook for each of the models.

Data

The data contains the chemical formulas and energy band gaps (in eV) of experimentally measured compounds. These measurements have been obtained using a number of techniques, such as diffuse reflectance, resistivity measurements, surface photovoltage, photoconduction, and UV-vis measurements. Therefore, a given compound may have more than one measurement value.
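Because a compound can appear several times with different measured values, one possible preprocessing step is to collapse the repeats into a single value per formula. The sketch below is only one option (taking the median); the row layout of (formula, band gap) pairs and the numbers in the example are assumptions made for illustration, not values from the assignment datasets.

```python
from collections import defaultdict
from statistics import median


def aggregate_measurements(rows):
    """Collapse repeated measurements of a formula into one median band gap (eV).

    `rows` is an iterable of (formula, gap_ev) pairs; this layout is an
    assumption, so adapt it to the actual column names in the CSV files.
    """
    grouped = defaultdict(list)
    for formula, gap_ev in rows:
        grouped[formula.strip()].append(float(gap_ev))
    return {formula: median(gaps) for formula, gaps in grouped.items()}


# Illustrative values only, not taken from the assignment data.
rows = [("ZnO", 3.2), ("ZnO", 3.4), ("ZnO", 3.3), ("Cu", 0.0)]
aggregated = aggregate_measurements(rows)
print(aggregated)
```

The median is less sensitive to a single outlying measurement than the mean. Whichever choice you make (mean, median, or keeping all rows), justify it in your comments, and keep all rows of one compound on the same side of any train/test split.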

Tasks

Model I (30 marks)

Dataset: Classification data.csv

Fit a Support Vector Classification model to separate metals from non-metals in the data. Ensure that you:

  • Follow the usual machine learning process.
  • Use a suitable composition-based feature vector to vectorize the chemical compounds.
  • You may use your judgement on how to differentiate between metals and non-metals. As a guide, two possible options are given below:
      Option 1: for metals Eg = 0 eV, for non-metals Eg > 0 eV
      Option 2: for metals Eg ≤ 0.5 eV, for non-metals Eg > 0.5 eV
  • Use suitable metrics to quantify the performance of the classifier.
  • For added advantage, you may optimize the hyper-parameters of the Support Vector Classifier. Note: optimization algorithms can require high processing power and may therefore cause your computer to freeze (ensure you have saved all your work before you run such code). In such a case, you may either do a manual optimization or leave the code without execution.
  • Comment on the overall performance of the model.
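The vectorization, labelling, and scoring steps listed above can be sketched with the standard library alone. This is a minimal illustration under stated assumptions: the helper names (`parse_formula`, `fraction_vector`, `label_metal`, `precision_recall_f1`) are hypothetical, the parser only handles flat formulas without brackets, and elemental fractions are just one of many possible composition-based feature vectors.

```python
import re
from collections import Counter

# Element symbol followed by an optional integer or decimal amount.
TOKEN = re.compile(r"([A-Z][a-z]?)(\d+(?:\.\d+)?)?")


def parse_formula(formula):
    """Return {element: amount} for a flat formula such as 'Ga2O3' (no brackets)."""
    counts = Counter()
    for element, amount in TOKEN.findall(formula):
        counts[element] += float(amount) if amount else 1.0
    return dict(counts)


def fraction_vector(formula, elements):
    """Atomic fraction of each element in `elements`, in that order."""
    counts = parse_formula(formula)
    total = sum(counts.values())
    if total == 0:
        raise ValueError(f"could not parse formula: {formula!r}")
    return [counts.get(el, 0.0) / total for el in elements]


def label_metal(gap_ev, threshold=0.0):
    """1 = metal (Eg <= threshold), 0 = non-metal.

    threshold=0.0 corresponds to Option 1, threshold=0.5 to Option 2.
    """
    return 1 if gap_ev <= threshold else 0


def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall, and F1 for the class `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Illustrative formulas and band gaps only, not taken from the assignment data.
formulas = ["Cu", "ZnO", "Ga2O3"]
gaps_ev = [0.0, 3.3, 4.8]
elements = sorted({el for f in formulas for el in parse_formula(f)})
X = [fraction_vector(f, elements) for f in formulas]
y = [label_metal(g, threshold=0.5) for g in gaps_ev]
```

The resulting `X` and `y` could then be passed to a classifier such as scikit-learn's `SVC`. Reporting precision, recall, and F1 alongside accuracy is useful here because the metal and non-metal classes may be imbalanced, depending on the threshold you choose.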

Model II (30 marks)

Dataset: Regression data.csv

Fit a regression equation to the non-metals to predict their bandgap energies based on their chemical composition.


  • Use a suitable composition-based feature vector to vectorize the chemical compounds. You may try multiple feature vectors and analyse the outcomes.
  • You may experiment with different models for regression analysis if required.
  • Comment on the overall performance of the model and suggest any shortcomings or potential improvements.
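When commenting on the performance of the regression model, it helps to report more than one error measure. The sketch below computes MAE, RMSE, and R² by hand; the function name `regression_metrics` and the example numbers are assumptions made for illustration, and library implementations (for example in scikit-learn) could be used instead.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MAE and RMSE (in eV for band gaps) and the R^2 score."""
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e * e for e in errors)
    rmse = math.sqrt(ss_res / n)
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    # R^2 is undefined when all true values are identical.
    r2 = 1.0 - ss_res / ss_tot if ss_tot else float("nan")
    return {"mae": mae, "rmse": rmse, "r2": r2}


# Illustrative values only, not taken from the assignment data.
y_true = [1.0, 2.0, 3.0]
y_pred = [1.5, 2.0, 2.5]
metrics = regression_metrics(y_true, y_pred)
print(metrics)

# Baseline: always predicting the mean of y_true gives R^2 = 0.
baseline = regression_metrics(y_true, [2.0, 2.0, 2.0])
```

Comparing the model against a trivial baseline such as the mean predictor gives the reported errors a reference point, which makes the discussion of shortcomings more concrete.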

Important: Comments
  • Write clear comments in the code so that a user can follow the logic.
  • In instances where you have made decisions, justify them.
  • In instances where you may have decided to follow a different analysis path (than what is outlined in the tasks), explain your thinking in the comments.
  • Acknowledge any references used at the bottom of the notebook.


Submission
  • Ensure that every code cell in the final Jupyter notebooks has been run so that its output is shown (except for the hyper-parameter optimization, if any).
  • Submit the two models (I and II) in two separate notebooks.
  • Name the files after yourself as "YourName 1.ipynb" and "YourName 2.ipynb".
  • It is your responsibility to ensure that the correct files are submitted and that the file extensions are in the correct format (.ipynb).
  • Submission will be via Canvas, and late submissions will be penalized.

Evaluation

The primary emphasis will be on the depth and thoroughness of your approach to the problem. Key areas of focus will include:

* Data Exploration: Demonstrating a thorough investigation of the data, exploring different analytical possibilities, and thoughtfully selecting the best course of action.

* Implementation: Translating your chosen approach into clean and efficient code.

* Machine Learning Process: Executing the machine learning process correctly and methodically, ensuring proper data handling, model selection, and evaluation.

* Clarity of Explanation: Providing clear explanations of each step, with logical reasoning for the decisions made.

* Critical Analysis: Identifying any limitations of the approach, suggesting potential improvements, and making relevant statistical inferences based on the results.
