DATA301 Project

In this series of artifacts, students will complete a short, application-focused project using one of the Recommender Systems and Personalization Datasets provided by the University of California San Diego (UCSD) research lab: https://cseweb.ucsd.edu/~jmcauley/datasets.html

Background

Please read through the dataset descriptions and choose one that matches your personal interest. Then read through the project proposal assignment on Learn and the following guide on structuring a research question: https://dissertation.laerd.com/how-to-structure-quantitative-research-questions.php

Sample project code

To help you get started, I am providing sample project code of minimal complexity (it would score 1 on complexity). It answers the question “What are the 10 most common words associated with restaurant reviews that are 1, 2, 3, 4, and 5 star?”:
https://colab.research.google.com/drive/1P7fYsSzJj_ZaUh_CH6nBhPfVtdrBWmas#scrollTo=9fe2Am_MXZ-r

Requirements (and when to meet them!)

1. Students will complete their own project deliverables. Students may join a group that is working on the same data product, may share data-loading code (not algorithms), and may get help from other students in the course with planning, algorithms, and debugging, as long as each student writes their own proposal, research question and design, notebook code, and reports.

Please let us know in your final report if you have shared code or worked with anyone during the project (similar to listing code / papers you have used from online sources).

2. Artifacts (things you need to submit – each has its own Learn assignment to submit), project total = 40% of final course grade

a. [6%: April 26 at 11:59PM] Project Proposal including Motivation, Research Question, and Design
b. [4%: May 17 at 11:59PM] Progress Reports (1)
c. [15%: May 31 at 11:59PM] Software Implementation
d. [15%: May 31 at 11:59PM] Written report including Test Results, Critique of Design and Project Reflection

3. Methods
You will use Python and Dask to analyze a recommender/personalization data set. You need to formulate a research question that can be answered by the data set, choose an algorithm from the ones we have studied in the MMDS textbook, and implement this algorithm to perform a data analysis that answers your research question. The easiest way to do this is to pick a problem from a lab assignment and adapt your code to work for your data set and research question.
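As a rough illustration of the Python and Dask workflow described above, here is a minimal sketch that counts the most common words per star rating with a Dask bag, in the spirit of the sample question. It is not the code from the linked sample notebook. The records, the field names `rating` and `text`, and all function names are made up for illustration; your chosen data set will have its own schema and loading step.

```python
import re
from collections import Counter, defaultdict

import dask.bag as db

# Hypothetical, hand-made records standing in for a real review data set.
SAMPLE_REVIEWS = [
    {"rating": 5, "text": "Great food, great service"},
    {"rating": 5, "text": "Great pasta"},
    {"rating": 1, "text": "Cold food and slow service"},
    {"rating": 1, "text": "Slow, slow, slow"},
]


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def rating_word_pairs(review):
    """Turn one review into a list of (rating, word) pairs."""
    return [(review["rating"], word) for word in tokenize(review["text"])]


def top_words_by_rating(reviews, k=10, npartitions=2):
    """Return {rating: [(word, count), ...]} with the k most common words."""
    bag = db.from_sequence(reviews, npartitions=npartitions)
    # Count each (rating, word) pair across all partitions.
    pair_counts = bag.map(rating_word_pairs).flatten().frequencies()
    # The synchronous scheduler keeps this tiny example single-process;
    # a real project would normally use a parallel scheduler.
    counted = pair_counts.compute(scheduler="synchronous")

    # Regroup the (small) aggregated result by rating in plain Python.
    by_rating = defaultdict(Counter)
    for (rating, word), count in counted:
        by_rating[rating][word] = count
    return {rating: words.most_common(k) for rating, words in by_rating.items()}


if __name__ == "__main__":
    for rating, words in sorted(top_words_by_rating(SAMPLE_REVIEWS, k=3).items()):
        print(rating, words)
```

Note that a single counting pass like this sits at the low end of the complexity scale; it is only meant to show the shape of a Dask pipeline.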
Subjective requirements: your algorithm must process a sufficiently large portion of the data set and involve inter-relationships between data records. Complexity: algorithms should involve multiple steps or combine multiple algorithms and heuristics / metrics to guide the analysis.
Note: this is NOT a machine learning project, so certain algorithms not already covered in DATA301 are discouraged, although we have done some predictive analysis and clustering.
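To make the phrase "inter-relationships between data records" from the subjective requirements more concrete, here is one possible sketch: comparing users by the Jaccard similarity of the item sets they interacted with. This is only an example of an analysis that relates records to each other, not a required or recommended method. The interaction data and all function names are hypothetical, and the all-pairs loop is quadratic in the number of users, so it only suits small samples.

```python
from itertools import combinations

# Hypothetical (user, item) interactions; real data would come from your data set.
INTERACTIONS = [
    ("u1", "a"), ("u1", "b"), ("u1", "c"),
    ("u2", "b"), ("u2", "c"), ("u2", "d"),
    ("u3", "x"),
]


def items_by_user(interactions):
    """Group interactions into {user: set of items}."""
    profiles = {}
    for user, item in interactions:
        profiles.setdefault(user, set()).add(item)
    return profiles


def jaccard(a, b):
    """Jaccard similarity |a & b| / |a | b|, defined as 0.0 for two empty sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def most_similar_pairs(interactions, k=3):
    """Return the k most similar user pairs as (similarity, user_a, user_b)."""
    profiles = items_by_user(interactions)
    scored = [
        (jaccard(profiles[u], profiles[v]), u, v)
        for u, v in combinations(sorted(profiles), 2)
    ]
    # Highest similarity first; ties broken alphabetically by user id.
    scored.sort(key=lambda t: (-t[0], t[1], t[2]))
    return scored[:k]


if __name__ == "__main__":
    print(most_similar_pairs(INTERACTIONS, k=1))
```

An analysis like this would still need further steps (for example, sampling, filtering, or a metric to evaluate the result) to address the complexity requirement.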

Marking and general rubric

For the final written report:
1/1 Abstract / Summary
2/2 Introduction and Motivation
2/2 Experimental Design and Methods
3/3 Results
3/3 Conclusion (suggest 3 paragraphs total, one for each prompt)
2/2 Critique of Design and Project
1/1 Reflection
1/1 References

For the final software submission:
5/5 Working Code [does it run? are there errors and/or limitations?]
10/10 Complexity [note: this is a qualitative measurement by the instructor and tutors of the complexity of your algorithms, methods, and code – this may not always translate to amount of time spent/effort but should if you have grasped key concepts in the course]
