IS 327 Course Project Requirement
Project Objective:
To design and implement a machine learning model using a dataset from the UCI Machine Learning Repository. The project lets students apply the basic machine learning concepts and techniques learned during the course to a real-world problem.
Students may work in pairs, which promotes collaboration and knowledge sharing, or individually.
Team Composition:
1. Team Size: Students are allowed to form teams of up to two members. Solo projects are also permitted if a student prefers to work individually.
2. Collaboration: Teams are encouraged to collaborate effectively, dividing tasks and responsibilities in a manner that leverages each member's strengths. All team members must contribute significantly to the project.
3. Registration: Teams must register by submitting the names and student IDs of all team members along with their project proposal. Changes to team composition after registration are not permitted without prior approval from the instructor.
Dataset:
Students are required to select a dataset from the UCI Machine Learning Repository (https://archive.ics.uci.edu/). The chosen dataset must meet the following criteria:
1. It should be classified under the "Classification" or "Regression" task.
2. The dataset must not have been used in any example or assignment during the course.
3. The dataset should have at least 5 attributes.
4. Datasets with missing values are allowed, but students must implement a strategy to handle them.
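One simple way to handle missing values, as criterion 4 requires, is to fill numeric columns with the median and categorical columns with the most frequent value. The sketch below is only an illustration: the small table and its column names (age, income, city) are made up and stand in for a real UCI dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical toy table standing in for a UCI dataset with gaps.
df = pd.DataFrame({
    "age": [25.0, np.nan, 47.0, 51.0],
    "income": [40000.0, 52000.0, np.nan, 61000.0],
    "city": ["A", "B", np.nan, "B"],
})

# Count the missing values in each column before deciding on a strategy.
missing_counts = df.isna().sum()

numeric_cols = ["age", "income"]
categorical_cols = ["city"]

df_filled = df.copy()

# Numeric columns: replace gaps with the column median.
for col in numeric_cols:
    df_filled[col] = df_filled[col].fillna(df_filled[col].median())

# Categorical columns: replace gaps with the most frequent value.
for col in categorical_cols:
    df_filled[col] = df_filled[col].fillna(df_filled[col].mode().iloc[0])

print(missing_counts)
print(df_filled)
```

Other strategies, such as dropping incomplete rows or columns, may suit some datasets better. When a model is evaluated on held-out data, the fill values are usually computed from the training portion only, so that no information from the test portion leaks into training.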
Project Requirements:
1. Proposal Submission: Submit a proposal of at least one page that includes the chosen dataset, the problem statement, and a preliminary plan of action. The proposal must be approved by the instructor before proceeding (see Submission Guidelines below).
2. Data Preprocessing: Implement data cleaning and preprocessing techniques. This includes handling missing values, normalizing or standardizing data, and feature selection or extraction, as necessary.
3. Model Selection and Implementation:
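a. Explore at least three different machine learning models suitable for the task (e.g., linear regression, logistic regression, SVM, decision trees). In scikit-learn (sklearn), many estimators come in paired "Regressor" and "Classifier" variants (for example, DecisionTreeRegressor and DecisionTreeClassifier); choose the variant that matches the task.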
b. Document the reason for choosing a particular model as the final model.
c. Implement cross-validation to select hyperparameters.
4. Evaluation: Choose appropriate metrics to evaluate the model’s performance. For classification tasks, consider accuracy, precision, recall, and F1 score. For regression tasks, consider R-squared, MSE, RMSE, and MAE.
5. Report: Prepare a final report that includes:
a. An introduction to the problem and dataset.
b. A detailed description of the data preprocessing steps.
c. An overview of the explored models and rationale for the selected model.
d. Evaluation results and interpretation.
e. Conclusion and possible improvements if the project were to be continued.
6. Code: Submit well-documented code that includes comments explaining the logic behind major steps and decisions.
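Requirements 2 to 4 can be sketched together for a classification task as shown below. This is a minimal illustration rather than a required structure. The data is synthetic (generated with make_classification) and stands in for a real UCI dataset, and the three models, the hyperparameter grids, and the choice of F1 as the selection metric are all assumptions made for the example.

```python
from sklearn.datasets import make_classification
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic placeholder data; a real project would load its UCI dataset here.
X, y = make_classification(n_samples=300, n_features=8, n_informative=5, random_state=0)

# Hold out a test set that is not used during model selection.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Three candidate models, each with a small illustrative hyperparameter grid.
candidates = {
    "logistic_regression": (
        LogisticRegression(max_iter=1000),
        {"model__C": [0.1, 1.0, 10.0]},
    ),
    "svm": (
        SVC(),
        {"model__C": [0.1, 1.0, 10.0], "model__kernel": ["linear", "rbf"]},
    ),
    "decision_tree": (
        DecisionTreeClassifier(random_state=0),
        {"model__max_depth": [3, 5, None]},
    ),
}

results = {}
for name, (estimator, grid) in candidates.items():
    # Preprocessing lives inside the pipeline, so it is re-fit on each
    # cross-validation training fold and never sees the validation fold.
    pipeline = Pipeline([
        ("impute", SimpleImputer(strategy="median")),
        ("scale", StandardScaler()),
        ("model", estimator),
    ])
    # 5-fold cross-validation over the grid, scored by F1.
    search = GridSearchCV(pipeline, grid, cv=5, scoring="f1")
    search.fit(X_train, y_train)
    results[name] = search

# Pick the candidate with the best mean cross-validated F1 score.
best_name = max(results, key=lambda n: results[n].best_score_)
best_model = results[best_name].best_estimator_

# Report the final metrics once, on the held-out test set.
y_pred = best_model.predict(X_test)
test_metrics = {
    "accuracy": accuracy_score(y_test, y_pred),
    "precision": precision_score(y_test, y_pred),
    "recall": recall_score(y_test, y_pred),
    "f1": f1_score(y_test, y_pred),
}

for name, search in results.items():
    print(name, search.best_params_, round(search.best_score_, 3))
print("selected:", best_name, test_metrics)
```

For a regression task, the same outline applies with regressors in place of classifiers, a regression scoring option such as "r2" or "neg_mean_squared_error" in GridSearchCV, and r2_score, mean_squared_error, and mean_absolute_error from sklearn.metrics for the final evaluation.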
Rules:
1. Collaboration: Students are encouraged to discuss ideas and strategies with other teams, but each team must write its own code.
2. External Libraries: The use of external libraries is allowed but should be properly cited. The core implementation should be the students' original work.
Submission Guidelines:
The project will have two main submission milestones: the project proposal and the final report and code submission.
1. Project Proposal:
a. Due Date: Friday, April 12
b. Length: at least 1 page (font size at most 12 pt)
c. Content: The proposal must include the selection of the dataset from the UCI Machine Learning Repository, a clear statement of the problem being addressed, and an initial plan outlining the approach to data preprocessing, model selection, and evaluation strategies.
d. Submission Format: The proposal should be submitted as a PDF document.
e. Where to Submit: Proposals should be uploaded to the assignment "Project Proposal" on Canvas.
2. Final Report and Code Submission:
a. Due Date: Wednesday, May 8
b. Length: at least 3 pages (font size at most 12 pt)
c. Content:
i. Report: The final report should include an introduction to the problem and dataset, detailed descriptions of data preprocessing, explored models with rationales for the final choice, evaluation results with interpretations, conclusions, and suggestions for future work.
ii. Code: Submit all code files necessary to replicate your project's results, including data preprocessing, model training, and evaluation. The code must be well-documented and organized.
d. Submission Format: The report should be submitted in PDF format, and the code should be zipped into a single file containing all necessary scripts and documentation.
e. Where to Submit: The final report and code should be uploaded to the assignment "Project Report" on Canvas.
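Because the submitted code must replicate the reported results, it helps to fix every source of randomness. The sketch below shows one possible way to do this for a regression task, and it also computes the regression metrics named in the Evaluation requirement. The synthetic data, the random forest model, the seed value, and the function name run_experiment are all assumptions made for illustration.

```python
import json
import math

from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

SEED = 0  # hypothetical choice; any fixed integer works


def run_experiment(seed):
    """Train and evaluate one model with every random step seeded."""
    # Synthetic placeholder data; a real project would load its UCI dataset here.
    X, y = make_regression(n_samples=200, n_features=6, noise=10.0, random_state=seed)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=seed
    )

    model = RandomForestRegressor(n_estimators=50, random_state=seed)
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)

    mse = mean_squared_error(y_test, y_pred)
    return {
        "seed": seed,
        "r2": float(r2_score(y_test, y_pred)),
        "mse": float(mse),
        "rmse": math.sqrt(mse),  # RMSE is the square root of MSE
        "mae": float(mean_absolute_error(y_test, y_pred)),
    }


# Running twice with the same seed should give identical metrics.
first_run = run_experiment(SEED)
second_run = run_experiment(SEED)
print("reproducible:", first_run == second_run)

# Serialize the metrics so they can be saved alongside the code.
metrics_json = json.dumps(first_run, indent=2)
print(metrics_json)
```

Recording the seed together with the metrics, for example in a small JSON file inside the zipped submission, makes it easier to check that the numbers in the report match what the code produces.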
Evaluation Criteria:
• Projects will be evaluated based on the completeness of the implementation, adherence to project requirements, originality of approach, performance of the model, quality of the report, and code readability and documentation.
• This project requirement aims to provide a comprehensive learning experience by applying theoretical knowledge to practical problems, fostering teamwork, and encouraging critical thinking and innovation.