Objectives
Based on the chemical composition of materials, build a classification model to distinguish metals from non-metals (Model I), and then build a regression model to predict the bandgap of non-metallic compounds (Model II).
Please use a separate Jupyter notebook for each of the models.
Data
Tasks
Model I (30 marks)
Fit a Support Vector Classification model to separate metals from non-metals in the data. Ensure that you:
- Follow the usual machine learning process.
- Use a suitable composition-based feature vector to vectorize the chemical compounds.
- You may use your judgement on how to differentiate between metals & non-metals. As a guide, two possible options are given below.
Option 1: for metals Eg = 0, for non-metals Eg > 0
Option 2: for metals Eg ≤ 0.5, for non-metals Eg > 0.5
- Use suitable metrics to quantify the performance of the classifier.
- For added advantage, you may optimize the hyper-parameters of the Support Vector Classifier. Note: optimization algorithms can require high processing power and may cause your computer to freeze, so save all your work before running such code. If that happens, you may either optimize manually or leave the code unexecuted.
- Comment on the overall performance of the model.
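The vectorization and labelling steps for Model I can be sketched as below. This is only one possible reading of the task, not a prescribed method: the helper names `parse_formula`, `fractional_vector`, and `label_metal` are hypothetical, the parser assumes flat formulas without brackets, and the example formulas are placeholders rather than entries from the assignment data.

```python
import re
from collections import defaultdict

# Hypothetical helpers for illustration; the brief does not prescribe them.
# An element symbol is one capital letter plus an optional lowercase letter,
# optionally followed by an integer or decimal amount.
_TOKEN = re.compile(r"([A-Z][a-z]?)(\d+(?:\.\d+)?)?")


def parse_formula(formula):
    """Return {element: amount} for a flat formula such as 'Fe2O3'.

    Assumption: no brackets, hydrates, or charges appear in the formula.
    """
    counts = defaultdict(float)
    for element, amount in _TOKEN.findall(formula):
        counts[element] += float(amount) if amount else 1.0
    return dict(counts)


def fractional_vector(formula, elements):
    """Return the atomic fraction of each element in `elements`, in order."""
    counts = parse_formula(formula)
    total = sum(counts.values())
    if total == 0:
        raise ValueError(f"no elements found in formula {formula!r}")
    return [counts.get(el, 0.0) / total for el in elements]


def label_metal(eg, threshold=0.0):
    """Return 1 for a metal and 0 for a non-metal.

    threshold=0.0 corresponds to Option 1 and threshold=0.5 to Option 2,
    assuming bandgaps are never negative.
    """
    return 1 if eg <= threshold else 0


# Placeholder formulas, chosen only to show the shape of the output.
formulas = ["Fe2O3", "NaCl", "Cu"]
elements = sorted({el for f in formulas for el in parse_formula(f)})
X = [fractional_vector(f, elements) for f in formulas]
```

The rows of `X` are plain lists, so they could be passed to a classifier such as scikit-learn's `sklearn.svm.SVC` after a train/test split. Richer composition descriptors (for example, statistics of elemental properties) may well perform better than raw fractions; comparing them is part of the judgement the task asks for.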
Model II (30 marks)
Fit a regression equation to the non-metals to predict their bandgap energies from their chemical composition.
- Use a suitable composition-based feature vector to vectorize the chemical compounds. You may try multiple feature vectors and analyse the outcomes.
- You may experiment with different models for regression analysis if required.
- Comment on the overall performance of the model and suggest any shortcomings or potential improvements.
- Write clear comments in the code so that a user can follow the logic.
- In instances where you have made decisions, justify them.
- In instances where you may have decided to follow a different analysis path (than what is outlined in the tasks), explain your thinking in the comments.
- Acknowledge any references used at the bottom of the notebook.
- Ensure that every code cell in the final Jupyter notebooks has been run to produce output (except for the hyper-parameter optimization, if any).
- The two models (I and II) have been entered in two separate notebooks.
- Name the files by your name as "YourName 1.ipynb" and "YourName 2.ipynb".
- It is your responsibility to ensure that the correct files are submitted and that the file extensions are in the correct format (.ipynb).
- Submission will be via Canvas, and late submissions will be penalized.
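For Model II, a comment on overall performance is easier to support with explicit error metrics on held-out data. The sketch below computes three common ones (MAE, RMSE, and R²) in plain Python and compares a model against a mean-value baseline. The function name `regression_report` is hypothetical, and the numbers are made-up placeholders, not results from the assignment data.

```python
import math


def regression_report(y_true, y_pred):
    """Return MAE, RMSE and R^2 for paired sequences of bandgap values."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(r * r for r in residuals)
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return {
        "mae": sum(abs(r) for r in residuals) / n,
        "rmse": math.sqrt(ss_res / n),
        # R^2 is undefined when every true value is identical.
        "r2": 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan"),
    }


# Made-up placeholder values, used only to show how the report is read.
y_test = [1.0, 2.0, 3.0]
model_pred = [1.5, 2.0, 2.5]

# Baseline: always predict the mean. In practice this mean would come from
# the training targets; the test mean is used here only to keep the sketch short.
baseline_pred = [sum(y_test) / len(y_test)] * len(y_test)

model_scores = regression_report(y_test, model_pred)
baseline_scores = regression_report(y_test, baseline_pred)
```

A model whose RMSE is not clearly below that of the baseline has learned little from the feature vector, which is worth stating in the performance commentary. scikit-learn offers equivalent functions (`mean_absolute_error`, `mean_squared_error`, `r2_score`) if you prefer not to write these by hand.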
Evaluation
The primary emphasis will be on the depth and thoroughness of your approach to the problem. Key areas of focus will include:
* Data Exploration: Demonstrating a thorough investigation of the data, exploring different analytical possibilities, and thoughtfully selecting the best course of action.
* Implementation: Translating your chosen approach into clean and efficient code.
* Machine Learning Process: Executing the machine learning process correctly and methodically, ensuring proper data handling, model selection, and evaluation.
* Critical Analysis: Identifying any limitations of the approach, suggesting potential improvements, and making relevant statistical inferences based on the results.