AI3013 Machine Learning
Course Project
Description:
This is a GROUP project (each group should have 4-5 students) that aims to apply machine learning models and techniques (including, but not limited to, those covered in our lectures) to solve complex real-world tasks using Python and relevant libraries.
Possible Project Topics (students can choose from the following or propose their own):
1. Medical Imaging Analysis:
. Dataset: A collection of medical images, like X-rays or MRI scans.
. Task: Disease detection or categorization.
2. Financial Fraud Detection:
. Dataset: Transaction data from a financial institution.
. Task: Identify potentially fraudulent transactions.
3. Sentiment Analysis on Large-scale Reviews:
. Dataset: Reviews from platforms like Amazon, IMDb, or Yelp.
. Task: Extract and categorize sentiments. Optionally, predict product or movie success based on early reviews.
4. Healthcare Predictive Analytics:
. Dataset: Medical records, patient data.
. Task: Predict patient readmission or disease outbreaks, or support early diagnosis of diseases.
5. E-Commerce Product Recommendations:
. Dataset: Historical purchase data from an online retailer.
. Task: Recommend products to users based on their purchase history.
6. Natural Language Processing for Customer Support:
. Dataset: Collection of customer support inquiries.
. Task: Categorize or prioritize support requests or predict the response time.
7. Energy Consumption Prediction:
. Dataset: Historical data of energy consumption from a city or large facility.
. Task: Predict future energy consumption patterns.
8. Agricultural Yield Prediction:
. Dataset: Satellite images, weather data, soil quality data.
. Task: Predict crop yields or detect diseases in crops using image recognition.
9. Urban Planning & Traffic Analysis:
. Dataset: Traffic data, possibly including images from traffic cameras, sensor data, etc.
. Task: Predict traffic congestion or recommend optimal routes.
10. Music Genre Classification:
. Dataset: Audio files or spectrograms of various music tracks.
. Task: Classify them into their respective genres.
We suggest that you pick a topic you can get excited and passionate about, e.g., an application area that interests you or a subfield of machine learning that you want to explore further.
Notice on Deep Learning Models:
You may decide to work on deep learning models. However, since this course focuses mainly on machine learning models and techniques, a deep learning model will not be considered superior to other machine learning models if you simply reproduce a model designed by others. Also, training deep learning models can be very time-consuming, so make sure you have the necessary computing resources.
To undertake the project, the following steps are essential:
1. Form your project group.
2. Select one topic.
3. Survey existing research on relevant topics by searching for related keywords on an academic search engine such as http://scholar.google.com.
4. Collect, read, and analyze relevant materials and data.
. An important aspect of designing your project is to identify one or several datasets suitable for your topic of interest. Obtaining benchmark datasets and validating your learning algorithms on them is preferred; we don't want you to spend much time collecting raw data.
. If you choose to use prepared datasets (e.g. from Kaggle or iDataScience), we encourage you to do some data exploration and analysis to get familiar with the problem.
5. Design and implement learning algorithms, and validate the proposed algorithms on benchmark or collected datasets.
. We expect a solid methodology, comprehensive validation, and a detailed discussion of the experimental results.
. Replicating the results in a paper can be a good way to learn. However, instead of only replicating a paper, also try applying the technique to another application, or analyze how each component of the model contributes to the final performance.
6. Produce a progress report that includes an abstract, introduction, related works and techniques, and methodology.
7. Produce a final report and give a presentation.
. A very good project report will be a publishable or nearly publishable piece of written work. You may read some recent papers and follow their writing style.
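As a minimal illustration of the data exploration in step 4 and the validation in step 5, the sketch below uses only the Python standard library. It assumes a labelled dataset already loaded as a list of feature vectors and a list of labels; the helper names (class_distribution, k_fold_indices, fit_majority_baseline, cross_validate) and the toy data are hypothetical, and in practice a library such as scikit-learn would normally be used instead. It first checks the class balance, then estimates accuracy with k-fold cross-validation of a trivial majority-class baseline.

```python
from collections import Counter
import random


def class_distribution(labels):
    """Return the fraction of samples in each class (a basic exploration step)."""
    counts = Counter(labels)
    total = len(labels)
    return {cls: n / total for cls, n in counts.items()}


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle the sample indices once and split them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def fit_majority_baseline(train_features, train_labels):
    """Hypothetical baseline: always predict the most frequent training label."""
    most_common = Counter(train_labels).most_common(1)[0][0]
    return lambda features: most_common


def cross_validate(features, labels, fit, k=5, seed=0):
    """Return per-fold accuracy of the model returned by fit(train_X, train_y)."""
    scores = []
    for fold in k_fold_indices(len(labels), k, seed):
        test_set = set(fold)
        train_idx = [i for i in range(len(labels)) if i not in test_set]
        model = fit([features[i] for i in train_idx], [labels[i] for i in train_idx])
        correct = sum(model(features[i]) == labels[i] for i in fold)
        scores.append(correct / len(fold))
    return scores


# Toy imbalanced data (hypothetical): 90 "normal" (0) and 10 "fraud" (1) samples.
X = [[i] for i in range(100)]
y = [0] * 90 + [1] * 10

print(class_distribution(y))  # {0: 0.9, 1: 0.1}
scores = cross_validate(X, y, fit_majority_baseline, k=5)
print(sum(scores) / len(scores))  # mean accuracy across the 5 folds
```

On this toy data the majority-class baseline already reaches about 90% mean accuracy without learning anything, which is why checking the class distribution and comparing against simple baselines matter before reporting results on imbalanced tasks such as fraud detection.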
Submission Requirement:
Upon completion, each group must submit the following materials:
1. Progress report
a) Abstract
b) Introduction: problem statement, motivation and background of the topic
c) Related works and existing techniques of the topic
d) Methodology
2. Project report; your report should contain, but is not limited to, the following content:
a) Abstract
b) Introduction: problem statement, motivation and background of the topic
c) Related works and existing techniques of the topic
d) Methodology
e) Experimental study and result analysis
f) Future work and conclusion
g) References
h) Contribution of each team member
3. Links to, and descriptions of, the dataset and the implementation code.
4. Your report should be a minimum of 9 pages and a maximum of 12 pages. The similarity index for all submitted reports must not exceed 20%.
5. Put all files (including source code, presentation slides, and project report) into a ZIP file, then submit it on iSpace.
Deadline:
The progress report should be submitted on or before 4th May. Presentations will be arranged in Weeks 13 and 14.
The project report should be submitted on or before the last day of Week 14.
Assessment:
In general, projects will be evaluated based on:
. Significance. (Did the authors choose an interesting or “real” problem to work on, or only a small “toy” problem? Is this work likely to be useful and/or have impact?)
. The technical quality of the work. (Does the technical material make sense? Are the things tried reasonable? Are the proposed algorithms or applications clever and interesting? Do the students convey novel insight into the problem and/or algorithms?)
. The novelty of the work. (Do you have any novel contributions, e.g., a new model, technique, or method? Is this project applying a common technique to a well-studied problem, or is the problem or method relatively unexplored?)
. The workload of the project. (The workload of your project may depend on, but is not limited to, the following aspects: the complexity of the problem; the complexity of your method; the complexity of the dataset; whether you test your model on one or multiple datasets; and whether you conduct a thorough experimental analysis of your model.)
Evaluation Percentage:
. Progress Report: 10%
. Final Report: 40%
. Presentation: 30% (Each group will have 15 minutes for its presentation, and each student must present for at least two minutes.)
. Code: 20%
It is YOUR responsibility to make sure:
. Your submitted files can be correctly opened.
. Your code can be compiled and run.
Late submission = 0; Plagiarism (cheating) = F