DSCI 552: Machine Learning for Data Science (Summer 2024)
Units: 4
Instructor: Mohammad Reza Rajati, PhD, PHE 412, [email protected] – Include DSCI 552 in subject.
Office Hours: Right after the lecture, by appointment
Webpage: Personal Homepage at Intelligent Decision Analysis
TA(s): Will be introduced on Piazza.
Lecture: Tuesday, Wednesday, Thursday, 3:30 pm – 5:20 pm, OHE 100D & Online
Webpages: Piazza Class Page for discussions, announcements, and course materials; USC DEN Class Page for exams and grades; GitHub for code submission
– All HWs, handouts, and solutions will be posted in PDF format
– Students are responsible for staying current with webpage material
Prerequisite: Prior courses in multivariate calculus, linear algebra, probability, and statistics.
– This course is a prerequisite to DSCI 558.
Other Requirements:
Computer programming skills.
Using Python is mandatory.
Students must know Python or must be willing to learn it.
Tentative Grading:
Assignments 45%
Midterm 1 20%
Midterm 2 25%
Final Project 10%
Participation on Piazza* 5%
Letter Grade Distribution:
≥ 93.00 A
90.00 - 92.99 A-
87.00 - 89.99 B+
83.00 - 86.99 B
80.00 - 82.99 B-
77.00 - 79.99 C+
73.00 - 76.99 C
70.00 - 72.99 C-
67.00 - 69.99 D+
63.00 - 66.99 D
60.00 - 62.99 D-
≤ 59.99 F
Disclaimer: Although the instructor does not expect this syllabus to drastically change, he reserves every right to change this syllabus any time in the semester.
Note on e-mail vs. Piazza: If you have a question about the material or logistics of the class and wish to ask it electronically, please post it on the Piazza page (not e-mail). Often, if one student has a question or comment, others have a similar one. Use private Piazza posts with the professor, TA, and graders only for issues that are specific to you individually (e.g., a scheduling issue or grade issue). Minimize the use of e-mail to the course staff and use it only when absolutely necessary.
Catalogue Description: Practical applications of machine learning techniques to real-world problems. Uses in data mining and recommendation systems and for building adaptive user interfaces.
Course Description: This is a foundational course with the primary application to data analytics, but it is intended to be accessible both to students from technical backgrounds such as computer science, computer engineering, electrical engineering, or mathematics, and to students from less technical backgrounds such as business administration, communication, accounting, various medical specializations including preventative medicine and personalized medicine, genomics, and management information systems. A basic understanding of engineering and/or technology principles is needed, as well as basic programming skills and sufficient mathematical background in probability, statistics, and linear algebra.
• Final Project Due: Wednesday, July 24, 5:30 PM. Grace period: the project can be submitted until 11:59 PM of the same day with 30% penalty. Any change in the project after the deadline is considered late submission. One second late is late. The project is graded based on when it was submitted, not when it was finished. Homework late days cannot be used for the project.
Important Note: Please make absolutely sure that you can make the above dates. No make-up exams can be offered for any reason whatsoever. Moreover, no online exam will be offered to on-campus students for any reason. If a student misses Midterm 1 for a valid reason (e.g., a documented medical or family emergency), the grade of Midterm 2 will be used as the grade of Midterm 1. If a student misses Midterm 2 for a valid reason, they will receive a grade of IN (Incomplete) and must take the exam in the next semester with the students of that semester. An unexcused absence from an exam warrants a grade of zero.
6.5 = 89.62 instead of (10+65+80+85+90+95+100+100)/8=78.13. This policy makes up for missing assignments because of heavy workload, sickness, etc. Remember that if you miss an assignment because of heavy workload in other courses and then miss another one because of sickness, only the second assignment’s grade will be completely dropped from your score. Be aware of this when you decide not to submit an assignment, because later you may become sick.
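The beginning of the example above is cut off in this copy of the syllabus, so the exact rule is not stated here. The figures are, however, consistent with one reading: the lowest homework score is dropped entirely and the second-lowest score is counted at half weight. The short Python sketch below works through the arithmetic under that assumption only. The function name and the half-weight rule are illustrative and are not taken from the course materials.

```python
def adjusted_homework_average(scores):
    """Average homework scores under an ASSUMED policy:
    drop the lowest score and count the second-lowest at half weight.
    This rule is inferred from the numbers in the syllabus, not quoted from it.
    """
    if len(scores) < 3:
        raise ValueError("need at least three scores for this sketch")
    ordered = sorted(scores)
    half_weighted = ordered[1]      # second-lowest score, weight 0.5
    full_weighted = ordered[2:]     # remaining scores, weight 1 each
    total = 0.5 * half_weighted + sum(full_weighted)
    weight = 0.5 + len(full_weighted)
    return total / weight


# The eight scores that appear in the syllabus example.
scores = [10, 65, 80, 85, 90, 95, 100, 100]

plain_mean = sum(scores) / len(scores)
adjusted_mean = adjusted_homework_average(scores)

print(f"plain mean:    {plain_mean:.3f}")
print(f"adjusted mean: {adjusted_mean:.3f}")
```

With these eight scores the weights sum to 6.5, and the two results agree with the 78.13 and 89.62 quoted above after rounding to two decimals.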
Lecture 1 (Wednesday, May 15th): Introduction to Statistical Learning (ISLR Chs. 1, 2; ESL Chs. 1, 2)
– Motivation: Big Data
– Supervised vs. Unsupervised Learning

Lecture 2 (Thursday, May 16th): Introduction to Statistical Learning (ISLR Chs. 1, 2; ESL Chs. 1, 2)
– Regression, Classification
– The Regression Function
– Nearest Neighbors

Lecture 3 (Tuesday, May 21st): Introduction to Statistical Learning (ISLR Chs. 1, 2; ESL Chs. 1, 2)
– Model Assessment
– The Bias-Variance Trade-off
– No Free Lunch Theorem

Lecture 4 (Wednesday, May 22nd): Linear Regression (ISLR Ch. 3, ESL Ch. 3)
– Estimating Coefficients
– Estimating the Accuracy of Coefficients

Lecture 5 (Thursday, May 23rd): Linear Regression (ISLR Ch. 3, ESL Ch. 3)
– Variable Selection and Hypothesis Testing
– Multiple Regression
– Analysis of Variance and the F Test

Lecture 6 (Tuesday, May 28th): Linear Regression (ISLR Ch. 3, ESL Ch. 3)
– Stepwise Variable Selection
– Qualitative Variables

Lecture 7 (Wednesday, May 29th): Classification (ISLR Ch. 4, ESL Ch. 4)
– Multi-class and Multi-label Classification
– Logistic Regression
– Class Imbalance
– Hypothesis Testing and Variable Selection

Lecture 8 (Thursday, May 30th): Classification (ISLR Ch. 4, ESL Ch. 4)
– Subsampling and Upsampling
– SMOTE
– Multinomial Regression
– Bayesian Linear Discriminant Analysis

Lecture 9 (Tuesday, June 4th): Classification (ISLR Ch. 4, ESL Ch. 4)
– Measures for Evaluating Classifiers
– Quadratic Discriminant Analysis*
– Comparison with K-Nearest Neighbors
– The Naïve Bayes’ Classifier
– Text Classification
– Feature Creation for Text Data
– Handling Missing Data

Lecture 10 (Wednesday, June 5th): Resampling Methods (ISLR Ch. 5, ESL Ch. 7)
– Model Assessment
– Validation Set Approach
– Cross-Validation
– The Bias-Variance Trade-off for Cross-Validation
– The Bootstrap
– Bootstrap Confidence Intervals

Lecture 11 (Thursday, June 6th): Linear Model Selection and Regularization (ISLR Ch. 6, ESL Ch. 3)
– Subset Selection
– AIC, BIC, and Adjusted R²
– Shrinkage Methods
– Ridge Regression

Lecture 12 (Tuesday, June 11th): Linear Model Selection and Regularization (ISLR Ch. 6, ESL Ch. 3)
– The LASSO
– Elastic Net
– Dimension Reduction Methods*

Lecture 13 (Wednesday, June 12th): Tree-based Methods (ISLR Ch. 8, ESL Chs. 9, 10)
– Regression and Classification Trees
– Cost Complexity Pruning

Lecture 14 (Thursday, June 13th): Tree-based Methods (ISLR Ch. 8, ESL Chs. 9, 10, 16)
– Bagging, Random Forests, and Boosting

Lecture 15 (Tuesday, June 18th): Support Vector Machines (ISLR Ch. 9, ESL Ch. 12)
– Maximal Margin Classifier
– Support Vector Classifiers

Wednesday, June 19th: Juneteenth

Lecture 16 (Thursday, June 20th): Support Vector Machines (ISLR Ch. 9, ESL Ch. 12)
– The Kernel Trick
– Support Vector Machines
– L1 Regularized SVMs
– Multi-class and Multilabel Classification
– The Vapnik-Chervonenkis Dimension*
– Support Vector Regression

Lecture 17 (Tuesday, June 25th): Unsupervised Learning (ISLR Ch. 12, ESL Ch. 14)
– K-Means Clustering
– Hierarchical Clustering

Lecture 18 (Wednesday, June 26th): Unsupervised Learning (ISLR Ch. 12, ESL Ch. 14)
– Practical Issues in Clustering

Lecture 19 (Thursday, June 27th): Unsupervised Learning (ISLR Ch. 12, ESL Ch. 14)
– Principal Component Analysis
– Anomaly Detection*
– Association Rules*
– Mixture Models and Soft K-Means*

Lecture 20 (Tuesday, July 2nd): Active and Semi-Supervised Learning
– Semi-Supervised Learning
– Self-Training
– Co-Training
– Yarowsky Algorithm
– Refinements
– Active vs. Passive Learning
– Stream-Based vs. Pool-Based Active Learning
– Query Selection Strategies

Lecture 21 (Wednesday, July 3rd): Neural Networks and Deep Learning (ISLR Ch. 10, ESL Ch. 11, DL Ch. 6)
– The Perceptron
– Feedforward Neural Networks
– Backpropagation and Gradient Descent
– Overfitting

Thursday, July 4th: Independence Day

Lecture 22 (Tuesday, July 9th): Neural Networks and Deep Learning (DL Chs. 6, 7)
– Autoencoders and Deep Feedforward Neural Networks
– Regularization
– Early Stopping and Dropout
– Adversarial Training

Lecture 23 (Wednesday, July 10th): Neural Networks and Deep Learning (ISLR Ch. 12, DL Chs. 9, 10)
– Convolutional Neural Networks
– Sequence Modeling
– Recurrent Neural Networks

Lecture 24 (Thursday, July 11th): Neural Networks and Deep Learning (ISLR Ch. 12, DL Ch. 10)
– Sequence-to-Sequence Modeling*
– Long Short Term Memory (LSTM) Neural Networks

Lecture 25 (Tuesday, July 16th): Hidden Markov Models (AL Ch. 15)
– Principles
– The Viterbi Algorithm

Lecture 26 (Wednesday, July 17th): Reinforcement Learning*
– Definitions
– Task-Reward-Policy Formulation
– Total Discounted Future Reward
– Optimal Policy
– Value Function
– Q-Function
– The Bellman Equation
– Q-Learning
– Exploration-Exploitation
– Temporal Difference Learning
– Extensions to Stochastic Environments and Rewards
– Deep Reinforcement Learning

Lecture 27 (Thursday, July 18th): Fuzzy Systems*
– Fuzzy Sets
– Set Operations
– T-norms, T-conorms, and Fuzzy complements
– Cylindrical Extensions and Fuzzy Relations
– Fuzzy If-Then Rules as Association Rules

Lecture 28 (Tuesday, July 23rd): Fuzzy Systems*
– Inference from Fuzzy Rules
– Fuzzification and Defuzzification
– Learning Fuzzy Rules from Examples
– The Wang-Mendel Algorithm
– Fuzzy C-Means Clustering
Homework and Project Due Dates (Mondays)
1. May 20th: Homework 0 Due (not graded)
2. May 27th: Homework 1 Due (moved to Tuesday, May 28)
3. June 3rd: Homework 2 Due
4. June 10th: Homework 3 Due
5. June 17th: Homework 4 Due
6. June 24th: Homework 5 Due
7. July 1st: Homework 6 Due
8. July 8th: Homework 7 Due
9. July 15th: Homework 8 Due
10. July 22nd: Final Project Due (moved to Wednesday, July 24)
Using Generative AI and Large Language Models:
Use of AI, and specifically Large Language Models (LLMs), is allowed. However, it is only allowed as a tool to assist in learning. That is, you may use AI models such as ChatGPT or Claude 2 to help understand the assignments, to ask generic questions about programming, and to generate code samples that illustrate how certain programming constructs work.
Submitting assignments completely generated by AI is strictly prohibited and, when discovered, will result in 0 points for the assignment. We will be utilizing additional software to check for code generated by an AI. You must also specify which part of each assignment was done using help from AI.
Students and Disability Accommodations:
Counseling and Mental Health - (213) 740-9355 – 24/7 on call studenthealth.usc.edu/counseling
Free and confidential mental health treatment for students, including short-term psychotherapy, group counseling, stress fitness workshops, and crisis intervention.
Relationship and Sexual Violence Prevention Services (RSVP) - (213) 740-9355(WELL), press “0” after hours – 24/7 on call studenthealth.usc.edu/sexual-assault
Free and confidential therapy services, workshops, and training for situations related to gender-based harm.
Office for Equity, Equal Opportunity, and Title IX (EEO-TIX) - (213) 740-5086 eeotix.usc.edu
Information about how to get help or help someone affected by harassment or discrimination, rights of protected classes, reporting options, and additional resources for students, faculty, staff, visitors, and applicants.
Reporting Incidents of Bias or Harassment - (213) 740-5086 or (213) 821-8298 usc-advocate.symplicity.com/care_report
Avenue to report incidents of bias, hate crimes, and microaggressions to the Office for Equity, Equal Opportunity, and Title IX for appropriate investigation, supportive measures, and response.
The Office of Student Accessibility Services (OSAS) - (213) 740-0776 osas.usc.edu
OSAS ensures equal access for students with disabilities through providing academic accommodations and auxiliary aids in accordance with federal laws and university policy.
USC Campus Support and Intervention - (213) 821-4710 campussupport.usc.edu
Assists students and families in resolving complex personal, financial, and academic issues adversely affecting their success as a student.
Diversity, Equity and Inclusion - (213) 740-2101 diversity.usc.edu
Information on events, programs and training, the Provost’s Diversity and Inclusion Council, Diversity Liaisons for each academic school, chronology, participation, and various resources for students.
USC Emergency - UPC: (213) 740-4321, HSC: (323) 442-1000 – 24/7 on call dps.usc.edu, emergency.usc.edu
Emergency assistance and avenue to report a crime. Latest updates regarding safety, including ways in which instruction will be continued if an officially declared emergency makes travel to campus infeasible.
USC Department of Public Safety - UPC: (213) 740-6000, HSC: (323) 442-120 – 24/7 on call dps.usc.edu
Non-emergency assistance or information.
Office of the Ombuds - (213) 821-9556 (UPC) / (323) 442-0382 (HSC) ombuds.usc.edu
A safe and confidential place to share your USC-related issues with a University Ombuds who will work with you to explore options or paths to manage your concern.
Occupational Therapy Faculty Practice - (323) 442-3340 or [email protected] chan.usc.edu/otfp
Confidential Lifestyle Redesign services for USC students to support health promoting habits and routines that enhance quality of life and academic performance.