Assignment Sheet
|
Unit Name |
Introduction to Data Science |
|
Unit Code |
FIT 1043 |
|
Unit Teacher Name |
Ts. Dr. Sicily Ting |
|
Assignment Name |
Assignment 2 (20%) |
|
Aim of this assignment |
To conduct predictive analytics by building predictive models on a dataset, using Python in the Jupyter Notebook environment |
Learning Outcomes
|
Learning Outcome Number |
Learning Outcome Description |
|
5 |
Classify the kinds of data analysis and statistical methods available for a data science project; |
|
6 |
Locate suitable resources, software and tools for a data science project. |
Weighting
Requirements
|
Assignment Type |
Individual Task (20%) |
|
Response Format / Hand-in Requirement |
There are 2 submissions for this assignment:
● Moodle submission
● Kaggle submission (refer to the Moodle announcement on this)
1.) Moodle Submission:
○ Submit the following 2 files:
1. A Jupyter notebook file (.ipynb) containing your Python code, answers and explanations (if required) for all the questions:
a. A copy of your working Python code to answer the questions.
b. Make use of markdown for any observation, explanation or justification.
2. A CSV file of your predictions in task A4.
2.) Kaggle Submission:
The purpose of the Kaggle submission is to provide you with an introductory experience of how machine learning models are evaluated.
Another file, “FIT1043-MusicGenre-Submission.csv”, consists of data without labels (no ‘music_genre’ column). The purpose is to predict the labels for this data set.
You are to output the predictions to a CSV file that contains 6490 rows (6491 including the header row) and 2 columns: “instance_id” and “music_genre”.
A sample file without the ‘music_genre’ entries, “99999999-YourName-v1.csv”, is also available.
|
|
Response Specifications |
1.) Moodle Submission Link:
2 separate files (i.e., the .ipynb file and the CSV file). Zip, rar or any other file compression format is not acceptable and will incur a penalty of 10%.
2.) Kaggle Submission: the CSV file with 2 columns (ref. “99999999-YourName-v1.csv”)
|
|
Due Date |
11.55pm (MYT), Thursday (8 May 2025), Week 9 |
|
Disclaimer |
Generative AI tools cannot be used for any assessments in this unit.
In this unit, you must not use generative artificial intelligence (AI) to generate any materials or content in relation to your assessment. (see Learn HQ) |
|
Notes: |
The main submission must be done via the Moodle site’s submission link.
Kindly refer to the late penalty stated on the Assessment tab of the Moodle site or in the rubric file. |
|
Sanity Checks |
● After you are done with the tasks, do sanity checks.
● Make sure that your submission contains everything we've asked for.
|
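One way to run a sanity check on the prediction file is sketched below. It only checks the file layout stated in the hand-in requirements (a header with “instance_id” and “music_genre”, and 6490 data rows). The function name `check_submission` is hypothetical, and the check that every label is one of the integer codes 0 to 9 is an assumption about how the predictions are written, not a rule taken from this sheet.

```python
import csv

EXPECTED_COLUMNS = ["instance_id", "music_genre"]  # column names stated in the sheet
EXPECTED_ROWS = 6490  # data rows, excluding the header row
# Assumption: predictions are written as the integer genre codes 0-9.
VALID_GENRES = {str(code) for code in range(10)}


def check_submission(path, expected_rows=EXPECTED_ROWS):
    """Return a list of problems found in the prediction file (empty if none)."""
    problems = []
    # utf-8-sig also reads plain UTF-8 and drops a byte-order mark if present.
    with open(path, newline="", encoding="utf-8-sig") as handle:
        reader = csv.reader(handle)
        header = next(reader, None)
        rows = [row for row in reader if row]  # skip completely blank lines

    if header != EXPECTED_COLUMNS:
        problems.append(f"unexpected header: {header}")
    if len(rows) != expected_rows:
        problems.append(f"expected {expected_rows} data rows, found {len(rows)}")
    bad = [r for r in rows if len(r) != 2 or r[1].strip() not in VALID_GENRES]
    if bad:
        problems.append(f"{len(bad)} rows have a missing or invalid music_genre")
    return problems
```

Calling `check_submission("99999999-YourName-v1.csv")` on the unfilled sample file would be expected to report the empty ‘music_genre’ entries; an empty list means none of the checked problems were found.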
Aim
This assignment will test your ability to:
● Read and describe the data using basic statistics,
● Split the dataset into training and testing,
● Conduct multi-class classification using Support Vector Machine (SVM),
● Evaluate and compare predictive models,
● Explore different datasets and select a particular dataset that meets certain criteria,
● Conduct clustering using k-means.
Data
Format: each file is a single comma separated (CSV) file
Description: The two datasets were derived from a list of songs and contain the features of each song and its music genre.
Columns: There should be 15 columns consisting of the features of the song and the class/label of the song (Hint: the music_genre column)
|
Column Header |
Description |
|
instance_id |
a unique ID assigned to each entry |
|
artist_name |
the name of the artist |
|
track_name |
the name/title of that song |
|
popularity |
The popularity of the track. The value will be between 0 and 100, with 100 being the most popular. |
|
acousticness |
A confidence measure from 0.0 to 1.0 of whether the track is acoustic. 1.0 represents high confidence the track is acoustic |
|
danceability |
Danceability describes how suitable a track is for dancing based on a combination of musical elements including tempo, rhythm stability, beat strength, and overall regularity. A value of 0.0 is least danceable and 1.0 is most danceable. |
|
duration_ms |
The duration of the track in milliseconds. |
|
energy |
Energy is a measure from 0.0 to 1.0 and represents a perceptual measure of intensity and activity. |
|
instrumentalness |
Predicts whether a track contains no vocals. “Ooh” and “aah” sounds are treated as instrumental in this context. Rap or spoken word tracks are clearly “vocal”. The closer the instrumentalness value is to 1.0, the greater likelihood the track contains no vocal content. Values above 0.5 are intended to represent instrumental tracks, but confidence is higher as the value approaches 1.0. |
|
liveness |
Detects the presence of an audience in the recording. Higher liveness values represent an increased probability that the track was performed live. A value above 0.8 provides strong likelihood that the track is live. |
|
loudness |
The overall loudness of a track in decibels (dB). Loudness values are averaged across the entire track and are useful for comparing relative loudness of tracks. Loudness is the quality of a sound that is the primary psychological correlate of physical strength (amplitude). Values typically range between -60 and 0 dB. |
|
speechinesss |
Speechiness detects the presence of spoken words in a track. The more exclusively speech-like the recording (e.g. talk show, audio book, poetry), the closer to 1.0 the attribute value. Values above 0.66 describe tracks that are probably made entirely of spoken words. Values between 0.33 and 0.66 describe tracks that may contain both music and speech, either in sections or layered, including such cases as rap music. Values below 0.33 most likely represent music and other non-speech-like tracks. |
|
tempo |
The overall estimated tempo of a track in beats per minute (BPM). In musical terminology, tempo is the speed or pace of a given piece and derives directly from the average beat duration. |
|
valence |
A measure from 0.0 to 1.0 describing the musical positiveness conveyed by a track. Tracks with high valence sound more positive (e.g. happy, cheerful, euphoric), while tracks with low valence sound more negative (e.g. sad, depressed, angry). |
|
music_genre |
Music genres are represented by the following code:
0 - Alternative
1 - Anime
2 - Blues
3 - Classical
4 - Country
5 - Electronic
6 - Hip-hop
7 - Jazz
8 - Rap
9 - Rock
|
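The genre codes listed above can be kept in a small lookup table, which helps when reading predictions or labelling plots. This is a minimal sketch: the names `GENRE_NAMES` and `decode_genre` are hypothetical, and only the code-to-name pairs come from the table.

```python
# Code-to-name pairs copied from the music_genre row of the column table.
GENRE_NAMES = {
    0: "Alternative",
    1: "Anime",
    2: "Blues",
    3: "Classical",
    4: "Country",
    5: "Electronic",
    6: "Hip-hop",
    7: "Jazz",
    8: "Rap",
    9: "Rock",
}


def decode_genre(code):
    """Translate a genre code (int or numeric string) into its genre name."""
    return GENRE_NAMES[int(code)]


if __name__ == "__main__":
    for code in (0, "3", 9):
        print(code, "->", decode_genre(code))
```

A code outside 0 to 9 raises a `KeyError`, which makes unexpected labels visible early rather than silently mapping them.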
Assignment Tasks:
This assignment is worth 20% of this Unit’s assessment. It must be completed using the Python programming language in the Jupyter Notebook environment, and it should be formatted properly using the Markdown language. Below is an example from a past submission. Note: You need to use Python to complete all tasks.
Good practice:
Example 2