

PROJECT 4

Deadline:
Submit by midnight, 3rd June 2025
Evaluation:
35% of your final course grade.
Late Submission:
No late submissions will be accepted since this is the end of the semester.
Work:
This assignment is to be done in groups of up to four students. You will need to fill out and submit a form (to be provided) indicating your contribution to the project. You will be asked to evaluate your group members’ contributions as well as your own. Identical grades are not guaranteed for each student in a group.
Purpose:
To work in a group setting and to apply the machine learning, data mining, visualisation and data sense-making skills learned so far in class to a chosen real-world problem. You will create an artefact/software that demonstrates your work and present it to the class. This project addresses learning outcomes 1 to 5 from the course outline.

You are expected to come up with topics for your group at the earliest possible stage so that you can commence work on development. Preferably, discuss your chosen topic and what it is you plan to develop with the teaching staff before commencing work.

Groups must be formed by May 2 and the class Google Doc listing your members and proposed topic must be filled out. https://docs.google.com/spreadsheets/d/1CxgPKnIwzakbmliKiz1toatGz45HFQynaLh54RRU2lo/edit#gid=0

PROJECT OUTLINE:

Create a data science-related software product (e.g. a web app) that generates insights from a chosen real-world problem domain/dataset.

You are strongly recommended to develop machine learning predictive and/or forecasting models which are deployed through some software artefact. My recommendation is that you use Streamlit as a front-end for your model outputs. Since we do not have space in this course to teach you how to develop Streamlit apps, we are permitting you to use your favourite LLM to help you write code only for this component of the assignment. You can of course use some other web app that you are familiar with too. At a bare minimum though, you should have a Jupyter notebook which encapsulates and communicates the main aspects of your work.

Some ideas for possible projects:

1. Data from current events in the news cycle – something that is ideally topical and interesting
2. Perhaps something using LLMs in combination with your machine learning models
3. Time-series analysis, time-series forecasting, perhaps using pretrained LLM time-series models
4. Survival analysis models applied to domains not involving sensitive data
5. Causal analysis modelling (quite advanced though but something good to learn)
6. Recommender engine: create an application for making recommendations based on user preferences.
7. Fitness data: analysis of your personal or some group’s FitBit data.
8. X: text classification, semantic analysis, network visualisation, geospatial visualisation etc.
9. Data journalism: data visualisation – implementation of interactive graphs (web enabled), infographics.
10. A live Kaggle competition problem dataset https://www.kaggle.com/competitions (see notes below)
11. ...or something entirely different. Talk to the teaching team if you’re stuck or have doubts about your topic

Topics NOT to cover:

1. Currency markets, BitCoin, share market stock prices
2. Closed Kaggle competition datasets
3. Previously researched topics for which there are existing notebooks
4. Definitely NO to the TITANIC dataset
5. No more COVID-related topics – we’re all sick of this

OTHER NOTES

DATA SOURCES

This is a recommendation, not a requirement: be as original as you can with your data sources. Some datasets are very popular and have come up repeatedly in assignments over the years. Unfortunately, because they are popular there are a lot of online sources that have scripts published for those datasets. In many cases, related assignment submissions involve some form of plagiarism. While the internet is a big place, we have seen a lot of these scripts before and it is easy to catch. Unless you are going to do something genuinely novel with a well-used data source (you will know it is well-used if you can easily find python kernels for it), avoid these data sources. The safest bet is a dataset that is integrated from multiple disparate sources.
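As a rough illustration of what "integrated from multiple disparate sources" can involve, the sketch below joins two small sources on a shared key after normalising column names and key values. Everything in it is an assumption made for the example: the column names, the values, and the helper names `read_keyed` and `inner_join` are invented, and in a real project a DataFrame library would likely do this work.

```python
import csv
import io

# Two tiny in-memory "sources" standing in for files from different providers.
# All column names and values are made up for illustration.
RAINFALL_CSV = """region,year,rainfall_mm
north,2023,1210
south,2023,1250
east,2023,740
"""

VISITORS_CSV = """Region,Year,visitors
North,2023,5000
East,2023,1200
"""


def read_keyed(text, key_fields):
    """Read CSV text into {normalised key: row} so sources can be matched."""
    rows = {}
    for row in csv.DictReader(io.StringIO(text)):
        # Normalise header spelling, then normalise the key values themselves.
        row = {name.strip().lower(): value.strip() for name, value in row.items()}
        for field in key_fields:
            row[field] = row[field].lower()
        rows[tuple(row[field] for field in key_fields)] = row
    return rows


def inner_join(left, right):
    """Keep only keys present in both sources and merge their columns."""
    return [{**left[key], **right[key]} for key in sorted(left.keys() & right.keys())]


rainfall = read_keyed(RAINFALL_CSV, ("region", "year"))
visitors = read_keyed(VISITORS_CSV, ("region", "year"))
combined = inner_join(rainfall, visitors)
# Keys that failed to match are worth inspecting before modelling.
unmatched = sorted(rainfall.keys() - visitors.keys())
```

Checking `unmatched` is the part that tends to matter most: mismatched spellings or missing periods between sources are easy to overlook and can silently shrink the integrated dataset.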

WARNING ABOUT CHOOSING A KAGGLE DATASET

Discuss this with the lecturer first if you really want to pursue this. A higher standard is set when marking Kaggle-related submissions. If you use a Kaggle dataset, we recommend you do not look at related Kaggle kernels as there can be a temptation to copy what you see. Copying without attribution is plagiarism which could lead to zero marks for this assignment. Be aware that markers are familiar with Kaggle kernels, in part due to marking assignments for other papers and cohorts. We will also be looking through related kernels prior to marking.

TECHNOLOGY

Again, you are encouraged to use Streamlit: it is Python-based, and any LLM will help you develop a simple and functional Streamlit front-end. This does not need to be fancy. In previous years, groups have used other web app frameworks depending on their familiarity with various technologies, and some students have built web-based applications with front-end and back-end components that both serve webpages and perform data science-related tasks. You can make this as simple or as complex as you like, but the main point is to focus on the machine learning aspect. It is sufficient that your application runs on localhost, but Streamlit lets you deploy online very easily via GitHub.

If you choose to build a GUI-based application, Python has libraries that facilitate this; alternatively, you can use Qt or technologies like .NET, which allow you to call the Python methods that implement the logic of your application.

PRESENTATIONS

We will conduct the presentations in a hybrid manner: in person, online, and by playing pre-recorded presentations. All on-campus students are expected to attend, please. The presentations will take place in week 13 (June 4) at our regular teaching slot. Each person in the group will need to present. The presentations will be short and to the point. We would like you to aim for a presentation using only a handful of slides, lasting up to 10 minutes, or an application demo lasting up to 15 minutes MAX.

Make your presentation interesting. Don't focus on technical details. Consider your audience to be tech-savvy executives. Focus instead on the story that you are trying to tell and sell to the audience/decision makers. The presentations will be marked in part by your peers.

PROJECT REQUIREMENTS:

Make sure you do these four things:
1. One submission per team
2. Submit a separate document (or include this at the top of a notebook) that details what each team member contributed to the assignment. Not all contributors will be awarded the same mark. Each team member must submit their own version of how each team member contributed.
3. Each member of the class will be marked individually
4. Watch and mark others’ presentations

MARKING CRITERIA:

Marks will be awarded for different components of the project using the following rubric:

Component: Project presentation
Requirements:
- You can either make your team presentation live (in person or online) or make a recording and upload your presentation.
Marks: 20%

Component: Project code
Requirements:
Python code (or other non-Python code), Notebooks, application of data science, substance and difficulty of the work undertaken.
- Submit ONE notebook that outlines your entire project work and findings
- Submit all your web app and other supporting code defining your software artefact
Marks: 50%

Component: Originality, difficulty and creativity
Requirements:
- There needs to be some level of novelty in your work
- Given that this project represents 35% of your mark, it needs to reflect a substantial level of difficulty and rigour
- Discuss in your submission any relevance your academic reading has had on your project’s methodological choices and design.
Marks: 25%

Component: Submission of team member contribution document (every team member must submit their own version)
Requirements:
- Submission of documents outlining contributions
- Submission of project presentation marking sheets
Marks: 5%

Component: Reading Log
Requirements:
Each team member must submit:
- The compiled reading logs for the relevant period.
- The peer discussion summaries for each week.
- Any relevant connections between your readings and your analytical work in the notebook. If a research paper influenced how you approached an implementation, mention it.
Marks: PASS


Hand-in: Zip up all your notebooks, Python and other application source files into a single file. Submit this file via Stream.
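If you would rather script the hand-in bundle than build it by hand, something along these lines may help. It is a minimal sketch, not a required procedure: the `INCLUDE_SUFFIXES` set, the function name `build_submission_zip`, and the demo file names are hypothetical and should be adapted to your own project layout.

```python
import tempfile
import zipfile
from pathlib import Path

# Hypothetical set of file types to include; adjust to your project.
INCLUDE_SUFFIXES = {".ipynb", ".py", ".md", ".txt", ".csv"}


def build_submission_zip(project_dir: Path, zip_path: Path) -> list[str]:
    """Zip matching files under project_dir and return the archived names."""
    archived = []
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(project_dir.rglob("*")):
            if path.is_file() and path.suffix in INCLUDE_SUFFIXES:
                # Store paths relative to the project root, with "/" separators.
                name = path.relative_to(project_dir).as_posix()
                zf.write(path, arcname=name)
                archived.append(name)
    return archived


# Small self-contained demo using a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "project"
    (root / "app").mkdir(parents=True)
    (root / "analysis.ipynb").write_text("{}")
    (root / "app" / "streamlit_app.py").write_text("print('hello')\n")
    (root / "notes.tmp").write_text("scratch")  # skipped: suffix not listed
    zip_file = Path(tmp) / "submission.zip"  # kept outside the project folder
    demo_names = build_submission_zip(root, zip_file)
    with zipfile.ZipFile(zip_file) as zf:
        demo_listing = sorted(zf.namelist())
```

Writing the archive outside the project folder avoids the zip file trying to include itself, and the returned list gives you a quick check that nothing expected is missing before you upload.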

If you have any questions or concerns about this assignment, please ask the lecturer sooner rather than closer to the submission deadline.

Use of Generative AI in This Assignment

The use of generative AI for this project is along the same lines as in previous assignments, being restricted to planning, explanation, and concept development. The only exception being made here is in the development of the software artefact or web app to productionise your work. In this instance, you can leverage LLMs as much as you need.

Allowed Uses of AI for assignment 4

You may use AI along the lines of the following prompts to:
- Build a web app:
• Example: "Write for me a Streamlit app that displays two figures, has a title and an input box"
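For a sense of scale, a response to the example prompt above might look roughly like the following sketch: a title, one input box, and two figures. The helper names `forecast_series` and `residual_series` and the placeholder numbers are hypothetical stand-ins for your own model outputs. The `sys.modules` guard at the bottom is a convenience assumed here, not a Streamlit requirement: it keeps the helper functions importable and testable without Streamlit installed, on the assumption that `streamlit run` loads the package before executing the script.

```python
import sys


def forecast_series(n_points: int) -> list[float]:
    """Hypothetical stand-in for model output: a simple rising series."""
    return [i * 0.5 for i in range(n_points)]


def residual_series(values: list[float]) -> list[float]:
    """First differences of the series, used as the second figure."""
    return [b - a for a, b in zip(values, values[1:])]


def render_app() -> None:
    # Imported lazily so the helpers above can be tested without Streamlit.
    import streamlit as st

    st.title("Project 4 demo")
    n_points = st.number_input(
        "Number of points", min_value=2, max_value=200, value=50
    )
    values = forecast_series(int(n_points))
    st.line_chart(values)                   # figure 1: the series itself
    st.bar_chart(residual_series(values))   # figure 2: step-to-step change


# Under `streamlit run app.py` the package is already loaded, so the UI
# renders; under a plain `python app.py` this block is skipped.
if "streamlit" in sys.modules:
    render_app()
```

Saved as `app.py` (a hypothetical file name), this would be launched with `streamlit run app.py` and served on localhost.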

Prohibited Uses of AI for assignment 4

You must NOT:
  • Copy AI-generated code directly into your submission for analytics components.
  • Input the assignment questions directly into AI and use its responses as your own.
  • Paraphrase AI-generated explanations/code and present them as original work.
  • Ask AI to write step-by-step solutions to any of the assignment tasks except for the web app or other similar tool.

It is mandatory that any assessment items that you submit during your University study are your own work. Massey University takes a firm stance on academic misconduct, such as plagiarism and any form of cheating.

Plagiarism is the copying or paraphrasing of another person’s work, whether published or unpublished, without clearly acknowledging it. It includes copying the work of other students and reusing work previously submitted by yourself for another course. It also includes the copying of code from unacknowledged sources.

Academic integrity breaches affect all students: they disadvantage honest students and undermine the credibility of your qualification. Plagiarism and cheating in tests and exams will be penalised; this is likely to lead to loss of marks for that item of assessment and may lead to an automatic failing grade for the course and/or exclusion from re-enrolment at the University.

Please see the Academic Integrity Guide for Students on the University website for more information. The Guide steps you through the University Academic Integrity Policy and Procedures.

For example, you will find definitions of academic integrity misconduct, such as plagiarism; how misconduct is determined and managed; and where to find resources and assistance to help develop the skills of academic writing, exam preparation and time management. These skills will help you approach university study with academic integrity.

