CX4240: Introduction to Computational Data Analysis (2024 Spring)
Logistics
- Lecture time: Mondays and Wednesdays, 3:30pm-4:45pm
- Location: J. Erskine Love Manufacturing 185
- Instructor: Chao Zhang
- Teaching Assistants: Yue Yu and Lingkai Kong
- Office Hours:
  - Instructor: Mondays 2-3pm @ https://gatech.zoom.us/j/91570480132
  - TAs: Wednesdays 5-6pm @ https://gatech.zoom.us/j/4595635754
- Piazza: https://piazza.com/gatech/spring2024/cx4240
Course Content
Q: What will be covered in this course? A: This course introduces techniques for computational data analysis, with an emphasis on machine learning algorithms and their applications to real-world data. On the technique side, we will cover key machine learning methods (linear regression, logistic regression, neural networks, tree-based models) and self-supervised learning for foundation models. On the application side, we will introduce various applications of these techniques, particularly in text data analysis and natural language processing. We will show how to formulate real-world tasks as data analysis problems, present key methods for solving these problems, and discuss their advantages and disadvantages.
Q: Who will benefit from this course? A: By the end of this course, students should be able to formulate the data analysis problems they face, choose appropriate computational models to extract insights from data automatically, and even come up with innovative solutions to open problems in this field. The course will be helpful for students who want to solve practical problems using machine learning and data science techniques. It will also provide useful techniques for students who want to do cutting-edge research in data mining, machine learning, natural language processing, and related areas.
Q: What are the prerequisites? A: Prerequisites for this course include 1) solid knowledge of probability, statistics, calculus, and linear algebra; 2) basic knowledge of machine learning; 3) solid programming skills, preferably in Python.
Schedule
Date | Topic | Due
---|---|---
Module 1: Background | |
01/08/2024 | Course Overview |
01/10/2024 | Probability and MLE | Piazza Signup
01/15/2024 | No Class (Martin Luther King Day) |
01/17/2024 | Data Analysis Toolbox |
Module 2: Linear Models | |
01/22/2024 | Linear Regression |
01/24/2024 | Linear Regression | HW1 Out
01/29/2024 | Example Projects |
01/31/2024 | Logistic Regression |
02/05/2024 | Naïve Bayes Classifier |
02/07/2024 | Feature Design and Learning for Text |
02/12/2024 | Feature Design and Learning for Text |
Module 3: Neural Networks | |
02/14/2024 | Neural Networks | HW1 Due
02/19/2024 | Project Checkpoint & Discussion | HW2 Out
02/21/2024 | Neural Networks |
02/26/2024 | CNNs and RNNs |
02/28/2024 | Transformers |
Module 4: Tree Models | |
03/04/2024 | Decision Trees | HW2 Due
03/06/2024 | Random Forest | HW3 Out
03/11/2024 | Midterm Exam |
03/13/2024 | Project Checkpoint & Discussion |
03/18/2024 | No Class (Spring Break) |
03/20/2024 | No Class (Spring Break) |
Module 5: Large Language Models | |
03/25/2024 | Large Language Model (LLM) |
03/27/2024 | LLM Instruction Fine-Tuning | HW3 Due
04/01/2024 | LLM Alignment |
04/03/2024 | LLM Agents and Decision Making | Project Presentation Signup
04/08/2024 | Project Checkpoint & Discussion |
Module 6: Projects | |
04/10/2024 | Project Presentation |
04/15/2024 | Project Presentation |
04/17/2024 | Project Presentation |
04/22/2024 | Project Presentation |
04/24/2024 | No Class (Reading Day) |
04/28/2024 | | Project Report Due
Grading
Homework (30%)
There will be three assignments, each accounting for 10% of your final score. Each assignment includes written analysis and/or programming to test your understanding of the material taught.
- Late policy: Assignments are due at 11:59PM on the due date. You are allowed 2 total late days (48 hours) without penalty for the entire semester (for homework only, not applicable to exams or projects). Once those days are used, you will be penalized according to the following policy:
  - Homework is worth full credit before the due time.
  - It is worth 75% credit for the next 24 hours.
  - It is worth 50% credit for the 24 hours after that.
  - It is worth zero credit after that.
- Follow the Georgia Tech Academic Honor Code.
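The late policy above can be read as a small lookup, sketched below in Python. This is only an illustration, not an official grading tool. The function name `homework_credit` is hypothetical, and the sketch assumes two things the syllabus does not state: lateness is rounded up to whole 24-hour days, and any remaining free late days are spent before a penalty applies.

```python
import math
from typing import Tuple

# Credit multipliers from the late policy, indexed by the number of
# penalized late days: on time, first 24 hours late, following 24 hours.
PENALTY_SCHEDULE = [1.0, 0.75, 0.5]


def homework_credit(hours_late: float, free_days_left: int = 0) -> Tuple[float, int]:
    """Return (credit multiplier, free late days remaining).

    Assumption (not stated in the syllabus): lateness is rounded up to
    whole 24-hour days, and free late days are used before any penalty.
    """
    if hours_late <= 0:
        return 1.0, free_days_left

    days_late = math.ceil(hours_late / 24)
    days_covered = min(days_late, free_days_left)
    penalized_days = days_late - days_covered

    if penalized_days < len(PENALTY_SCHEDULE):
        credit = PENALTY_SCHEDULE[penalized_days]
    else:
        credit = 0.0  # more than two penalized days: zero credit
    return credit, free_days_left - days_covered


# Example: 30 hours late with one free late day left.
credit, remaining = homework_credit(30, free_days_left=1)
```

Under these assumptions, a submission 30 hours late with one free day left uses that day and is then treated as one day late. If in doubt about how late days are counted, ask the instructor or TAs on Piazza.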
Project (30%)
You need to complete a project that uses computational data analysis techniques to tackle a real-life data analysis problem. Each project must be completed in a team of 2-4 people. Here are some guidelines and resources for doing your project smoothly.
Exam (40%)
One exam will be held on March 11 in lieu of the regular class:
- It will be a closed-book exam, so no notes or communication with peers is allowed.
- There will be no make-up exams, so be sure to attend on the scheduled date. Missing the exam will result in zero credit.
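As a rough illustration of how the three components combine (three homework assignments at 10% each, project at 30%, exam at 40%), here is a minimal Python sketch. The function name `final_score` and the 0-100 scale for each component are assumptions made for the example. Actual grading details, such as rounding or letter-grade cutoffs, are not specified here.

```python
from typing import Sequence

# Weights in percentage points, taken from the grading breakdown.
HOMEWORK_WEIGHT_EACH = 10  # three assignments, 10% each
PROJECT_WEIGHT = 30
EXAM_WEIGHT = 40


def final_score(homework_scores: Sequence[float],
                project_score: float,
                exam_score: float) -> float:
    """Combine component scores (each assumed to be on a 0-100 scale)
    into a final score on the same 0-100 scale."""
    if len(homework_scores) != 3:
        raise ValueError("expected exactly three homework scores")

    weighted_total = (
        HOMEWORK_WEIGHT_EACH * sum(homework_scores)
        + PROJECT_WEIGHT * project_score
        + EXAM_WEIGHT * exam_score
    )
    return weighted_total / 100


# Example with made-up scores: homework 90, 80, 100; project 85; exam 70.
example = final_score([90, 80, 100], project_score=85, exam_score=70)
```

Any late-homework penalty would be applied to the individual homework score before it enters this sum.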
Resources
- The Elements of Statistical Learning, by Trevor Hastie, Robert Tibshirani, and Jerome Friedman
- Machine Learning, by Tom Mitchell
- Pattern Recognition and Machine Learning, by Christopher Bishop
- Data Mining: Concepts and Techniques, by Jiawei Han, Micheline Kamber, and Jian Pei
- Deep Learning, by Ian Goodfellow, Yoshua Bengio, and Aaron Courville
- Dive into Deep Learning, by Aston Zhang, Zack C. Lipton, Mu Li, and Alex Smola
- Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow, by Aurélien Géron
Other resources, such as machine learning toolboxes and datasets, will be provided throughout the course.