
Assignment Sheet

Unit Name	Introduction to Data Science
Unit Code	FIT 1043
Unit Teacher Name	Ts. Dr. Sicily Ting
Assignment Name	Assignment 2 (20%)
Aim of this assignment	to conduct predictive analytics, by building predictive models on a dataset using Python in the Jupyter Notebook environment

Learning Outcomes

This assignment assesses the following learning outcomes:

Learning Outcome Number	Learning Outcome Description
5	Classify the kinds of data analysis and statistical methods available for a data science project;
6	Locate suitable resources, software and tools for a data science project.

Weighting

This assignment is worth 20% of your overall grade for this unit.

Requirements

This assignment has the following requirements:

Assignment Type	Individual Task (20%)
Response Format / Hand-in Requirement	There are 2 submissions for this assignment: a Moodle submission and a Kaggle submission (refer to the Moodle announcement on this).
1.) Moodle Submission: submit the following 2 files.
○ A Jupyter notebook file (.ipynb) containing a copy of your working Python code, answers and explanations (if required) for all the questions. Make use of markdown for any observation, explanation or justification.
○ A CSV file of your predictions in task A4.
2.) Kaggle Submission: the purpose of the Kaggle submission is to provide you with an introductory experience of how machine learning models are evaluated. Another file, “FIT1043-MusicGenre-Submission.csv”, consists of data with no labels (no ‘music_genre’ column). The goal is to predict those labels for this data set. You are to output the predictions to a CSV file that contains 6490 rows (6491 including the header) and 2 columns: “instance_id” and “music_genre”. A sample file without the ‘music_genre’ entries is also available: “99999999-YourName-v1.csv”.
Response Specifications	1.) Moodle Submission Link: 2 separate files (i.e., an .ipynb file and a CSV file). Zip, rar or any other similar file compression format is not acceptable and will incur a penalty of 10%. 2.) Kaggle Submission: the CSV file with 2 columns (ref. “99999999-YourName-v1.csv”).
Due Date	11.55pm (MYT), Thursday (8 May 2025), Week 9
Disclaimer	Generative AI tools cannot be used for any assessments in this unit. In this unit, you must not use generative artificial intelligence (AI) to generate any materials or content in relation to your assessment. (see Learn HQ)
Notes:	The main submission must be done via the Moodle site’s submission link. Refer to the late penalty on the Assessment tab of the Moodle site or in the rubric file.
Sanity Checks	● After you are done with the tasks, do sanity checks.
○ Run the code and make sure it can be run without errors.
○ You should never submit code that immediately generates an error (warnings are usually fine) when run!
● Make sure that your submission contains everything we've asked for.
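
The Kaggle file format described in the requirements above (6490 rows plus a header, and the 2 columns “instance_id” and “music_genre”) can be checked before uploading. The following is a minimal sketch only, assuming pandas is available; the instance IDs and the all-zero predictions are placeholders invented for illustration, not real data or a real model output.

```python
import io

import pandas as pd

# Placeholder IDs standing in for the instance_id column of
# FIT1043-MusicGenre-Submission.csv (assumption: the real IDs differ).
instance_ids = list(range(1, 6491))

# Placeholder predictions (every row set to genre code 0);
# in practice these would come from your trained model.
predictions = [0] * 6490

submission = pd.DataFrame(
    {"instance_id": instance_ids, "music_genre": predictions}
)

# Sanity checks against the stated requirements.
assert submission.shape == (6490, 2)
assert list(submission.columns) == ["instance_id", "music_genre"]

# index=False stops pandas from writing an extra, unnamed index column.
# An in-memory buffer is used here; pass a file name such as
# "99999999-YourName-v1.csv" to to_csv() to write the actual file.
buffer = io.StringIO()
submission.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
line_count = len(csv_text.splitlines())
```

Counting the lines of the written text is a cheap way to confirm the "6491 including the header" figure before submitting.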

Aim

The main objective of Assignment 2 is to conduct predictive analytics, by building predictive models on a dataset using Python in the Jupyter Notebook environment.

This assignment will test your ability to:

● Read and describe the data using basic statistics,

● Split the dataset into training and testing,

● Conduct multi-class classification using Support Vector Machine (SVM)**,

● Evaluate and compare predictive models,

● Explore different datasets and select a particular dataset that meets certain criteria,

● Conduct clustering using k-means.

** Not taught in this unit; you are to explore this topic and elaborate on it in your report submission. This serves as a mild introduction to life-long learning, that is, learning by yourself.
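
The abilities listed above (splitting, multi-class SVM classification, evaluation and k-means clustering) can be sketched end to end with scikit-learn. This is an illustrative sketch only: it runs on synthetic data from `make_classification` rather than the assignment dataset, and the 80/20 split, the RBF kernel, `C=1.0`, the feature scaling step and the choice of 3 clusters are all assumptions, not requirements stated in this sheet.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in data: 600 samples, 8 features, 3 classes (assumption).
X, y = make_classification(
    n_samples=600, n_features=8, n_informative=5, n_redundant=0,
    n_classes=3, random_state=0,
)

# Hold out 20% for testing; stratify keeps class proportions similar.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Fit the scaler on the training split only, so no test data leaks in.
scaler = StandardScaler().fit(X_train)
X_train_s = scaler.transform(X_train)
X_test_s = scaler.transform(X_test)

# SVC supports multi-class problems directly (one-vs-one internally).
svm = SVC(kernel="rbf", C=1.0).fit(X_train_s, y_train)
y_pred = svm.predict(X_test_s)

# Evaluate on the held-out split.
accuracy = accuracy_score(y_test, y_pred)
cm = confusion_matrix(y_test, y_pred)

# k-means is unsupervised: it ignores y and groups rows by distance.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X_train_s)
cluster_labels = kmeans.labels_
```

Scaling is included because both SVMs and k-means rely on distances between rows, so features on very different scales can otherwise dominate the result.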

Data

We will explore the following datasets in Part A (plus a dataset of your choice in Part B):

1. FIT1043-MusicGenre-Dataset.csv

2. FIT1043-MusicGenre-Submission.csv

Format: each file is a single comma separated (CSV) file

Description: These two datasets were derived from a list of songs, with features describing each song and its music genre.

Columns: There should be 15 columns consisting of the features of the song and the class/label of the song (Hint: the music_genre column)

Column Header	Description
instance_id	a unique ID assigned to each entry
artist_name	the name of the artist
track_name	the name/title of that song
popularity	The popularity of the track. The value will be between 0 and 100, with 100 being the most popular.
acousticness	A confidence measure from 0.0 to 1.0 of whether the track is acoustic. 1.0 represents high confidence that the track is acoustic.
danceability	Danceability describes how suitable a track is for dancing based on a combination of musical elements including tempo, rhythm stability, beat strength, and overall regularity. A value of 0.0 is least danceable and 1.0 is most danceable.
duration_ms	The duration of the track in milliseconds.
energy	Energy is a measure from 0.0 to 1.0 and represents a perceptual measure of intensity and activity.
instrumentalness	Predicts whether a track contains no vocals. “Ooh” and “aah” sounds are treated as instrumental in this context. Rap or spoken word tracks are clearly “vocal”. The closer the instrumentalness value is to 1.0, the greater likelihood the track contains no vocal content. Values above 0.5 are intended to represent instrumental tracks, but confidence is higher as the value approaches 1.0.
liveness	Detects the presence of an audience in the recording. Higher liveness values represent an increased probability that the track was performed live. A value above 0.8 provides strong likelihood that the track is live.
loudness	The overall loudness of a track in decibels (dB). Loudness values are averaged across the entire track and are useful for comparing relative loudness of tracks. Loudness is the quality of a sound that is the primary psychological correlate of physical strength (amplitude). Values typically range between -60 and 0 dB.
speechinesss	Speechiness detects the presence of spoken words in a track. The more exclusively speech-like the recording (e.g. talk show, audio book, poetry), the closer to 1.0 the attribute value. Values above 0.66 describe tracks that are probably made entirely of spoken words. Values between 0.33 and 0.66 describe tracks that may contain both music and speech, either in sections or layered, including such cases as rap music. Values below 0.33 most likely represent music and other non-speech-like tracks.
tempo	The overall estimated tempo of a track in beats per minute (BPM). In musical terminology, tempo is the speed or pace of a given piece and derives directly from the average beat duration.
valence	A measure from 0.0 to 1.0 describing the musical positiveness conveyed by a track. Tracks with high valence sound more positive (e.g. happy, cheerful, euphoric), while tracks with low valence sound more negative (e.g. sad, depressed, angry).
music_genre	Music genres are represented by the following codes: 0 - Alternative; 1 - Anime; 2 - Blues; 3 - Classical; 4 - Country; 5 - Electronic; 6 - Hip-hop; 7 - Jazz; 8 - Rap; 9 - Rock

This data is pre-processed data that was extracted from Spotify and provided on Kaggle. You DO NOT have to download or process/wrangle the data from the original source.
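
As a small illustration of reading the data and relating the numeric genre codes in the table above to their names, the sketch below uses pandas. The three rows are made up for illustration and include only a subset of the columns; in practice `pd.read_csv` would be pointed at FIT1043-MusicGenre-Dataset.csv instead of the in-memory sample.

```python
import io

import pandas as pd

# Genre codes exactly as listed in the column description table.
GENRE_NAMES = {
    0: "Alternative", 1: "Anime", 2: "Blues", 3: "Classical", 4: "Country",
    5: "Electronic", 6: "Hip-hop", 7: "Jazz", 8: "Rap", 9: "Rock",
}

# Made-up sample rows (illustrative values only, not taken from the dataset).
sample_csv = io.StringIO(
    "instance_id,popularity,tempo,music_genre\n"
    "1,40,120.0,9\n"
    "2,55,98.5,3\n"
    "3,70,140.2,9\n"
)
df = pd.read_csv(sample_csv)

# Basic statistics (count, mean, std, min, quartiles, max) per column.
summary = df[["popularity", "tempo"]].describe()

# Translate the numeric codes to names, then count tracks per genre.
genre_counts = df["music_genre"].map(GENRE_NAMES).value_counts()
```

Keeping the code-to-name mapping in one dictionary makes plots and tables easier to read than raw codes 0 to 9.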

Assignment Tasks:

This assignment is worth 20% of this Unit’s assessment. This assignment has to be done using the Python programming language in the Jupyter Notebook environment. It should also be formatted properly using the Markdown language. Below is an example from a past submission. Note: You need to use Python to complete all tasks.

Example 1

This example has a code cell, the output, which is a rather nice pie chart (with some labels that aren’t ideal) and a short explanation.

Good practice:

As good practice, you should start your assignment by providing the title of the assignment and unit code, your name and student ID, e.g.

Example 2

This is also a sample from past submissions.
