COMP5328 - Advanced Machine Learning
Assignment 2
Introduction
Three input datasets are given. For each dataset, the training and validation data contain class-conditional random label noise, whereas the test data are clean. You need to build at least two different classifiers, trained and validated on the noisy data, that achieve good classification accuracy on the clean test data. You are required to compare the robustness of the two algorithms to label noise.
For the first two datasets, the transition matrices are provided. You can directly use the given transition matrices for designing classifiers that are robust to label noise.
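One common way to use a known transition matrix is forward loss correction: the model's predicted clean-label posterior is multiplied by the transition matrix, and the loss is computed against the noisy label. The NumPy sketch below only illustrates that idea and is not a prescribed method. It assumes the convention T[i, j] = P(noisy label = j | clean label = i); `T_example` is a hypothetical 4-class matrix, not one of the provided matrices, and the function names are illustrative.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax with max-subtraction for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def forward_corrected_nll(logits, noisy_labels, T):
    """Mean negative log-likelihood of the noisy labels under forward correction.

    Assumed convention: T[i, j] = P(noisy label = j | clean label = i),
    so every row of T sums to 1. Transpose T first if the matrix you are
    given uses the opposite convention.
    """
    clean_post = softmax(logits)      # model's estimate of P(clean label | x)
    noisy_post = clean_post @ T       # implied P(noisy label | x)
    picked = noisy_post[np.arange(len(noisy_labels)), noisy_labels]
    return float(-np.mean(np.log(picked + 1e-12)))

# Hypothetical transition matrix, for illustration only.
T_example = np.full((4, 4), 0.1) + 0.6 * np.eye(4)
logits_example = np.array([[2.0, 0.0, 0.0, 0.0],
                           [0.0, 2.0, 0.0, 0.0]])
noisy_labels_example = np.array([0, 3])
loss_example = forward_corrected_nll(logits_example, noisy_labels_example, T_example)
print(loss_example)
```

With the identity matrix in place of `T_example`, the function reduces to the ordinary cross-entropy loss, which is a quick sanity check for the convention used.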
For the last dataset, the transition matrix is not provided. You are required to build a transition matrix estimator to estimate the transition matrix. Then, employ your estimated transition matrix for classification. Your estimated transition matrix must be included in your final report. Note that to validate the effectiveness of your transition matrix estimator, you could use your estimator on the first two datasets and compare your estimation to the given transition matrices. The code contained in tutorial 9 could be a good starting point.
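As one possible shape for such an estimator, the sketch below uses an anchor-point heuristic: a model is first fit to the noisy labels, and for each class the example with the highest predicted probability for that class is treated as if its clean label were certain, so its predicted noisy posterior is read off as one row of the matrix. This is an assumption-laden illustration, not necessarily the approach taken in the tutorial code. The function names are hypothetical, and the convention T[i, j] = P(noisy label = j | clean label = i) is assumed.

```python
import numpy as np

def estimate_transition_matrix(noisy_posteriors):
    """Anchor-point estimate of T from predicted noisy class posteriors.

    noisy_posteriors: array of shape (n, c); row k approximates
    P(noisy label | x_k), e.g. softmax outputs of a model fit to noisy labels.
    Assumed convention: T[i, j] = P(noisy label = j | clean label = i).
    """
    n_classes = noisy_posteriors.shape[1]
    T_hat = np.empty((n_classes, n_classes))
    for i in range(n_classes):
        # Treat the example most confidently assigned to class i as an anchor
        # point, i.e. assume its clean label is i with probability close to 1.
        anchor = np.argmax(noisy_posteriors[:, i])
        T_hat[i] = noisy_posteriors[anchor]
    return T_hat

def relative_estimation_error(T_hat, T_true):
    """Entry-wise L1 error of T_hat, normalised by the L1 norm of T_true."""
    return float(np.abs(T_hat - T_true).sum() / np.abs(T_true).sum())
```

On the two datasets with known matrices, `relative_estimation_error` gives a single number for comparing the estimate against the given matrix; the quality of the estimate depends heavily on how well the model's outputs approximate the noisy posterior.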
Data preprocessing is allowed, but please remember to clarify and justify it carefully in the report.
1 A Guide to Using the Datasets
1.1 Attributes Contained in a Dataset
The following code is used to load a dataset and check the shape of its attributes.
import numpy as np
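A minimal sketch of such a shape check is given here, assuming the datasets are standard NumPy `.npz` archives. It lists whatever arrays the archive contains rather than assuming their names. The helper `describe_npz` and the synthetic `demo.npz` file are hypothetical and only stand in for the real dataset files.

```python
import os
import tempfile
import numpy as np

def describe_npz(path):
    """Return {array name: shape} for every array stored in an .npz archive."""
    with np.load(path) as data:
        return {name: data[name].shape for name in data.files}

# Demonstration on a tiny synthetic archive (not one of the assignment files);
# the array names mirror the Xtr, Xts and Yts variables named in this guide.
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "demo.npz")
    np.savez(demo_path,
             Xtr=np.zeros((10, 28, 28)),
             Xts=np.zeros((4, 28, 28)),
             Yts=np.zeros(4, dtype=int))
    shapes = describe_npz(demo_path)
    for name, shape in shapes.items():
        print(name, shape)
```

Pointing `describe_npz` at one of the provided files should print every stored array together with its shape.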
1.1.1 Training and validation data
The variable Xtr contains the features of the training and validation data. The shape is (n, image shape) where n represents the total number of the instances.
Do not use all n examples to train your models. You are required to independently and randomly sample 80% of the n examples to train a model and use the remaining 20% to validate it.
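The 80/20 random split can be sketched as follows. The function name `random_split` and its parameters are illustrative assumptions; passing a different seed on each repetition yields an independent split.

```python
import numpy as np

def random_split(features, labels, train_fraction=0.8, seed=0):
    """Randomly split (features, labels) into training and validation parts."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(features))       # shuffled example indices
    n_train = int(round(train_fraction * len(features)))
    train_idx, val_idx = order[:n_train], order[n_train:]
    return (features[train_idx], labels[train_idx]), \
           (features[val_idx], labels[val_idx])

# Toy data standing in for the noisy training and validation set.
demo_features = np.arange(100).reshape(100, 1)
demo_labels = np.arange(100)
(train_x, train_y), (val_x, val_y) = random_split(demo_features, demo_labels, seed=1)
print(train_x.shape, val_x.shape)
```

Sampling indices once and applying them to both arrays keeps each feature row aligned with its label.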
The variable Xts contains features of the test data. The shape is (m, image shape), where m represents the total number of the test instances.
The variable Yts contains the clean labels of the m instances. The class set of the clean labels is also {0, 1, 2, 3}.
1.2 Dataset Description
Number of the training and validation examples n = 24000.
Number of the test examples m = 4000.
The shape of each example image shape = (28 × 28).
1.2.2 FashionMINIST0.6.npz
Number of the training and validation examples n = 24000.
1.2.3 CIFAR.npz
Number of the test examples m = 4000.
The shape of each example image shape = (32 × 32 × 3).
2 Performance Evaluation
To ensure a rigorous performance evaluation, you need to train each classifier at least 10 times with different training and validation sets generated by random sampling. Then report both the mean and the standard deviation of the test accuracy.
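The repeated-run protocol can be organised along the following lines. This is a sketch under assumptions: `run_once` is a hypothetical callable that, given a seed, draws a fresh training and validation split, trains one classifier, and returns its accuracy on the clean test set. The sample standard deviation (`ddof=1`) is one reasonable choice; whichever definition is used should be stated in the report.

```python
import numpy as np

def evaluate_repeated(run_once, n_runs=10):
    """Call run_once(seed) for seeds 0..n_runs-1 and summarise the accuracies.

    run_once is a user-supplied (hypothetical) function: it should resample the
    training and validation sets with the given seed, train a classifier, and
    return the test accuracy as a float.
    """
    accuracies = np.array([run_once(seed) for seed in range(n_runs)], dtype=float)
    # ddof=1 gives the sample standard deviation over the repeated runs.
    return accuracies.mean(), accuracies.std(ddof=1)
```

Using the seed as the only source of randomness in `run_once` makes each of the runs reproducible.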
3 Tasks
3.1 Image Classification with Known Flip Rates
3.2 Image Classification with Unknown Flip Rates
3.3 Report
- In abstract, you should briefly introduce the topic of this assignment, your methods, and describe the organization of your report.
- In introduction, you should first introduce the problem of learning with label noise, and then its significance and applications. You should give an overview of the methods you want to use.
- In related work, you are expected to review the main idea of related label noise methods (including their advantages and disadvantages).
- In methods, you should describe the details of your classification models, including the formulation of the cost functions, the theoretical foundations or views (if any) of the cost functions, and the optimization methods. You should also describe the details of the transition matrix estimation methods, their theoretical foundations (if any), and the optimization algorithms.
- In experiments, you should introduce your experimental setup (e.g., datasets, algorithms, evaluation metric, etc.). Then, you should show the experimental results, compare, and analyze your results. If possible, give your personal reflection or thoughts on these results.
- In conclusion, you should summarize your methods, results, and your insights for future work.
- In references, you should list all references cited in your report and format all references in a consistent way.
- In appendix, you should provide instructions on how to run your code.
- Font: Times New Roman; Title: font size 14; Body: font size 12
- Length: Ideally 10 to 15 pages - maximum 20 pages
4 Submissions
(a) report (a pdf file): the report should include each member’s details (student id and name).
(b) code (a compressed folder)
i. algorithm (a sub-folder): your code could be multiple files.
ii. data (an empty sub-folder): although two datasets should be inside the data folder, please do not include them in the zip file. We will copy those datasets to the data folder when we test the code.
5 Marking scheme
| Category | Criterion | Marks | Comments |
Report [80] |
Abstract [3]
•problem, methods, and organization
Introduction [6]
•the problem you intend to solve
•the importance of the problem
Previous work [8]
•previous relevant methods used in literature
•their advantages and disadvantages
Label noise methods with known flip rates [23]
•pre-processing (if any)
•label noise methods’ formulation
•cross-validation method for model selection or avoiding overfitting (if any)
•experiments
•discussions
Noise rate estimation method [12]
•noise rate estimation method’s formulation
•experiments
•discussions
Label noise methods with unknown flip rates [10]
•pre-processing (if any)
•label noise methods’ formulation (if different from above)
•cross-validation method for model selection or avoiding overfitting (if any)
•experiments
•discussions
Conclusions and future work [3]
•meaningful conclusions based on the results
•meaningful future work suggested
Presentation [8]
•academic style, grammatical sentences, no spelling mistakes
•good structure and layout, consistent formatting
•appropriate citation and referencing
•use graphs and tables to summarize data
Other [7]
•at the discretion of the assessor: illustrate outstanding comprehensive theoretical analysis, demonstrate an insightful and comprehensive assessment of the significance of the results, and provide descriptions and explanations that have depth and clarity and are concisely worded
Code [20] |
•reasonable code running time
•well organized, commented and documented
Note: Marks for each category are indicated in square brackets. The minimum mark for the assignment will be 0 (zero).