CS5062 – Machine Learning


School of Natural and Computing Sciences

Department of Computing Science

MSc in Artificial Intelligence

2024 – 2025

Assessment Item 2 of 2 Briefing Document – Individually Assessed (no teamwork)

Title: CS5062 – Machine Learning

Note: This assessment accounts for 50% of your total mark for the course.

Learning Outcomes

On successful completion of this component a student will have demonstrated competence in the following aspects:

• Have knowledge and understanding of fundamentals of machine learning, including a range of popular machine learning algorithms.

• Be able to use existing machine learning tools, frameworks, and libraries to build solutions for real-world or benchmark problem solving.

• Be able to perform data pre-processing for machine learning.

• Be able to critically examine the strengths and limitations of machine learning algorithms when solving a specific problem.

• Be able to write reports for machine learning solutions.

Application Problem Definition: Predict Battery Status of an Electric Vehicle

The objective of this task is to analyse a real-world dataset in order to predict the battery status of an electric vehicle (EV). To reduce the time needed to train and run a model, only speed is used to predict the battery status. EVs are viewed as an attractive option for reducing greenhouse gas emissions and fuel consumption, but battery management remains a challenge for EVs. In this application, we will predict battery status, which is significant for battery management. The dataset can be downloaded from MyAberdeen. It is based on data from a research project that investigates how speed affects battery status. The dataset includes speed profiles and the corresponding battery status for a number of trips, and it will need to be used throughout this assessment. The task is to develop a set of classification models that automatically predict battery status from speed, which is sequential sensing data (the feature). Please note that the EV has electrical energy recovery systems, so the battery can be either charging or discharging. No prior knowledge of the domain problem is needed or assumed to fulfil the requirements of this assessment.

Feature information in the dataset includes:

• Speed (only one feature)

Labels in the dataset include:

• The corresponding battery status (charging: “1” or discharging: “0”)

Activities: During data collection, speeds were recorded, and the data were then labelled by the power, according to battery voltage and current.

The unit of speed is kilometres per hour (km/h).

Report Guidance & Requirements

Your report must conform to the below structure and include the required content as outlined in each section. Each subtask has its own marks allocated. You must supply a written report, along with the corresponding code, containing all distinct sections/subtasks that provide a full critical and reflective account of the processes undertaken.

This assessment centres on machine learning and focuses on a classification problem, an important kind of problem that machine learning experts face in real-world situations. Each trip is a time series. The task requires you to expand and elaborate upon the principles of machine learning and how these techniques can be applied to a real-world problem: “Battery Status Prediction of an EV”.

Task 1: Develop learning-based model(s) for Classification (17/17)

The problem we aim at tackling has been clearly described and defined earlier. This task includes a number of subtasks, each of which bears its own marks.

Subtasks:

1. Since the battery status can be charging or discharging, it can be formulated as a static binary classification problem. Then, static classification methods, including naïve Bayes classifier (BN), k-nearest neighbour (KNN), ensemble learning (EL), and support vector machines (SVM), can be used to tackle the problem. Please briefly explain how these methods can be used for a classification problem and build model(s) to classify the battery status. (8 marks)

a. Methods: Please use your own words to describe how BN, KNN, EL, and SVM work (approx. 200 words in total).

b. Data preparation and import: Please provide a short description of the data, including the number of samples in the training set and in the testing set; show how to import the data into your programming environment; provide snippets of code for these purposes.

c. You need to implement at least one of the above-mentioned models.

d. Implementation Details: You may choose the settings of the model(s) yourself, but you should clearly report the settings of the developed model(s); provide snippets of code for these purposes.
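As an illustration of how a static classifier can work with a single feature, the following is a minimal sketch of k-nearest-neighbour voting written in plain Python. The toy speeds and labels are made up for the example and say nothing about the real relationship between speed and battery status; the function name knn_predict and the choice k=3 are assumptions, not part of the assessment. In practice a library implementation would normally be used instead.

```python
from collections import Counter


def knn_predict(train_speeds, train_labels, query_speed, k=3):
    """Predict a 0/1 label for one speed by majority vote of the k nearest training speeds."""
    # Rank training samples by absolute distance in the single feature (speed).
    order = sorted(
        range(len(train_speeds)),
        key=lambda i: abs(train_speeds[i] - query_speed),
    )
    # Count the labels of the k closest samples and return the most frequent one.
    votes = Counter(train_labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]


# Made-up toy data (hypothetical values, NOT taken from the provided dataset).
train_speeds = [5.0, 12.0, 20.0, 33.0, 41.0, 58.0]
train_labels = [1, 1, 1, 0, 0, 0]

predictions = [knn_predict(train_speeds, train_labels, s, k=3) for s in [8.0, 50.0]]
print(predictions)  # prints [1, 0]
```

An odd value of k avoids tied votes in a binary problem, which is one of the settings worth reporting.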

2. To inspect the results, the confusion matrix of each model, based on the predictions of your developed model(s) and the corresponding labels in the provided dataset, must be plotted. Additionally, use the following five metrics to report the model’s performance: Precision, Recall, Accuracy, F1-score, and area under the ROC curve (AUROC). When reporting performance, please use only the test set. (5 marks)
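To make the definitions of these metrics explicit, here is a minimal sketch in plain Python. It assumes label 1 (charging) is the positive class and that the model outputs a score per sample, thresholded at 0.5 to obtain predictions; both are assumptions for the example. The function names and toy values are hypothetical. Library routines are the usual choice in practice, and this sketch does not plot the confusion matrix.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) with label 1 treated as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def classification_metrics(y_true, y_pred):
    """Precision, recall, accuracy and F1, returning 0.0 where a ratio is undefined."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = (tp + tn) / len(y_true)
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "accuracy": accuracy, "f1": f1}


def auroc(y_true, scores):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up test labels and model scores (hypothetical values).
y_true = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.3, 0.2, 0.7]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

metrics = classification_metrics(y_true, y_pred)
auc = auroc(y_true, scores)
print(confusion_counts(y_true, y_pred), metrics, auc)
```

Note that AUROC is computed from the scores rather than the thresholded predictions, which is why it can differ in character from the other four metrics.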

3. Justification and evaluation: you are required to analyse, comment, and elaborate on your findings of the experimental results. (4 marks)

a. You can provide an in-depth explanation of why the experimental results were obtained. For example, different parameter settings of the models (e.g. Gaussian NB, the value of k, ensemble strategies, kernel functions) may lead to different results, and you can discuss this comprehensively. This is only an example; you can find other points to explore in depth.

b. Then, select one experimental output from 3.a and briefly explain how the selected method uses the feature (speed) to classify charging and discharging, together with the classification principles behind it (approx. 200 words).

Task 2: Develop recurrent neural network(s) for sequence-to-sequence classification (33/33)

The problem we aim at tackling has been clearly described and defined earlier. This task includes a number of subtasks, each of which bears its own marks.

Subtasks:

4. Develop recurrent neural network(s) to classify the sequential data provided in the dataset. The network(s) can be built using a simple RNN, LSTM, GRU, or bi-directional recurrent network. You may choose the settings of the model(s) yourself, such as the number of layers, activation functions, optimisers, and dropout rate. However, different settings may lead to different performance. You should provide a comprehensive explanation of how to implement the model(s), including the steps of importing data, processing data, and building the model(s). Please note that it is suggested to include at least one recurrent layer and one fully-connected layer when building the RNN(s). (12 marks)

a. Data preparation and import: Please provide a short description of data (e.g., the range of speeds, the range of the length of trips, the number of trips within training and testing datasets, respectively); show how to import the data into your programming environment; provide snippets of code for these purposes.

b. Data pre-processing: show how to process the raw data into the input format of the RNN(s); provide snippets of code for these purposes.

c. You need to implement at least one of the above-mentioned RNNs.

d. Implementation Details: You may choose the settings of the model(s) yourself, but you should clearly state the settings of the developed model(s). Please also explain the network architecture of the developed model(s); provide snippets of code for these purposes.
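One common way to bring trips of different lengths into a fixed input shape is to pad each trip to the longest length and keep a mask of the real time steps. The sketch below does this in plain Python; it assumes trips are already loaded as lists of speeds with one label per time step, and the names pad_trips and summarise, the pad value 0.0, and the toy trips are all hypothetical. The actual file format of the dataset is not assumed here.

```python
def summarise(speed_trips):
    """Basic description of the data: speed range, trip-length range, number of trips."""
    all_speeds = [s for trip in speed_trips for s in trip]
    lengths = [len(trip) for trip in speed_trips]
    return {
        "speed_range": (min(all_speeds), max(all_speeds)),
        "length_range": (min(lengths), max(lengths)),
        "n_trips": len(speed_trips),
    }


def pad_trips(speed_trips, label_trips, pad_value=0.0):
    """Pad every trip to the longest length.

    Returns X with shape (n_trips, max_len, 1), Y with shape (n_trips, max_len),
    and a mask that is 1 for real time steps and 0 for padded ones.
    """
    max_len = max(len(trip) for trip in speed_trips)
    X, Y, mask = [], [], []
    for speeds, labels in zip(speed_trips, label_trips):
        n_pad = max_len - len(speeds)
        # One feature per time step, so each step is a list of length 1.
        X.append([[s] for s in speeds] + [[pad_value] for _ in range(n_pad)])
        # Padded labels are placeholders; the mask marks them as not real.
        Y.append(list(labels) + [0] * n_pad)
        mask.append([1] * len(speeds) + [0] * n_pad)
    return X, Y, mask


# Made-up trips (hypothetical values, NOT taken from the provided dataset).
speed_trips = [[10.0, 12.5, 11.0], [30.0, 28.0]]
label_trips = [[0, 0, 1], [0, 1]]

stats = summarise(speed_trips)
X, Y, mask = pad_trips(speed_trips, label_trips)
print(stats)
print(X[1], Y[1], mask[1])
```

The mask matters for sequence-to-sequence classification: padded steps should be excluded from both the training loss and the test metrics, otherwise the placeholder labels distort the results.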

5. To inspect the results, the confusion matrix of each model, based on the predictions of your developed model(s) and the corresponding labels in the provided dataset, must be plotted as well. Use the following five metrics to report the model’s performance: Precision, Recall, Accuracy, F1-score, and area under the ROC curve (AUROC). When reporting performance, please use only the test set. (5 marks)

6. Justification and evaluation: you are required to analyse, comment, and elaborate on your findings of the experimental results. Ideally, provide an in-depth explanation of why the experimental results were obtained. (8 marks)

a. If you implemented only one model, you can evaluate it with different parameter settings. However, please do not merely report improvements in accuracy, recall, etc. For example, if you find that “adam” is better than “SGD”, you need to explain why, giving the reasons behind the observation(s), and you can discuss this comprehensively. This is only an example; you can find other points to explore in depth. Only presenting the improvement in metrics is not enough.

b. For an in-depth explanation, implement two or more models so that you can compare them at the level of principles.
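When reasoning about optimisers, it can help to look at the update rules themselves. The following is a minimal sketch of one plain SGD step and one Adam step for a single scalar parameter, using commonly quoted default hyperparameters as assumptions (lr=0.01, beta1=0.9, beta2=0.999). The function names are hypothetical. It only shows how the step size depends on the gradient magnitude and makes no claim about which optimiser will perform better on this dataset.

```python
import math


def sgd_step(theta, grad, lr=0.01):
    """Plain SGD: the step is proportional to the raw gradient."""
    return theta - lr * grad


def adam_step(theta, grad, state, lr=0.01, beta1=0.9, beta2=0.999, eps=1e-8):
    """Adam: the step uses bias-corrected running averages of the gradient and its square."""
    state["t"] += 1
    state["m"] = beta1 * state["m"] + (1 - beta1) * grad
    state["v"] = beta2 * state["v"] + (1 - beta2) * grad * grad
    m_hat = state["m"] / (1 - beta1 ** state["t"])
    v_hat = state["v"] / (1 - beta2 ** state["t"])
    return theta - lr * m_hat / (math.sqrt(v_hat) + eps)


# Compare the size of the first step for a tiny and a huge gradient.
moves = {}
for grad in (0.001, 1000.0):
    sgd_move = abs(sgd_step(0.0, grad))
    adam_move = abs(adam_step(0.0, grad, {"t": 0, "m": 0.0, "v": 0.0}))
    moves[grad] = (sgd_move, adam_move)
    print(grad, sgd_move, adam_move)
```

In this sketch the SGD step scales with the gradient, while the first Adam step stays close to the learning rate for both gradients. That normalisation is one principle-level point a discussion could build on, alongside the experimental evidence.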

7. Extension (8 marks):

a. Please select one method (e.g., BN, KNN, EL, or SVM) to compare with one RNN that you used above. Then conduct a comprehensive analysis of their differences, strengths, and limitations. Tip: you can use the experimental results obtained from Tasks 1 and 2 to analyse and compare these two methods on a more solid basis (approx. 400 words).

b. Please highlight your ideas and thoughts on how to extend LSTM (up to 400 words).

Useful Information

• Please describe and justify each step that is needed to reproduce your results by using code snippets, screenshots, and plots. When using screenshots or plots generated in Python, please make sure they are clearly readable.

• As the provided dataset is a subset of a real-life problem, the achievable performance might not be as high as you expect. Therefore, as long as your implementations and justifications are correct, the performance achieved will not affect your marks.

• If you use open source code, you must point out where it was obtained from (even if the sources are online tutorials or blogs) and detail any modifications you have made to it in your tasks. You should mention this in both your code and your report. Failure to do so will result in zero marks being awarded for the related (sub)tasks.

Marking Criteria

• Quality of the report, including structure, clarity, and brevity

• Reproducibility. How easy is it for another MSc AI student to repeat your work based on your report and code?

• Quality of your experiments, including design and result presentation (use of figures and tables for better reporting)

• Configuration of the model(s) to complete the task, and the parameter tuning process (if needed)

• In-depth analysis of the results generated, including critical evaluation, insights into data, and significant conclusions

• Quality of the source code, including the documentation of the code



