CSE 357: Statistical Methods for Data Science (Fall 2024)
Course Description
This interdisciplinary course introduces the mathematical concepts required to interpret results and draw conclusions from data in an applied manner. It presents techniques for applied statistical inference and data analysis, such as parameter and distribution estimators, hypothesis testing, Bayesian inference, and likelihood, along with their implementation in Python. More informally, this 3-credit, undergraduate-level course covers the probability and statistics topics that data scientists need to analyze and interpret data. The course involves both theoretical topics and some programming assignments. It is targeted primarily at junior and senior undergraduates who are comfortable with probability concepts and basic programming; students from Computer Science, Applied Mathematics and Statistics, and Electrical and Computer Engineering are well suited to this class. Topics covered include Probability Theory, Random Variables, Stochastic Processes, Statistical Inference, Hypothesis Testing, and Regression. For more details, see the syllabus below.
The class is in person and is expected to be interactive; students are encouraged to participate in class discussions.
Grading will be on a curve and will be based primarily on assignments and exams. For more details, see the Grading section below.
Prerequisites: C or higher in CSE 316 or CSE 351; AMS 310; CSE or DAS major. See the Bulletin for definitive information. Comfort with probability theory and proficiency in Python (the programming assignments are in Python) will be helpful.
Learning Objectives:
- An understanding of core concepts of probability theory and standard statistical techniques.
- An understanding of random variables, distributions, and hypothesis testing.
- An ability to apply quantitative research methods (correlation and regression) and modern techniques of optimization and machine learning, such as clustering and prediction.
Syllabus & Schedule
Date | Topic | Readings | Notes |
---|---|---|---|
Aug 27 (Tu) [Lec 01] | Course introduction, class logistics | | |
Aug 29 (Th) [Lec 02] | Probability review - 1 | AoS 1.1 - 1.5; MHB 3.1 - 3.4 | Assignment 1 out, due Sep 9 |
Sep 03 (Tu) [Lec 03] | Probability review - 2 | AoS 1.6, 1.7; MHB 3.3 - 3.6 | |
Sep 05 (Th) [Lec 04] | Random variables - 1 | AoS 2.1 - 2.3, 3.1 - 3.4; MHB 3.7 - 3.9 | Python scripts: draw_Bernoulli, draw_Binomial, draw_Geometric |
Sep 10 (Tu) [Lec 05] | Random variables - 2 | AoS 2.4, 3.1 - 3.4; MHB 3.7 - 3.9, 3.14.1 | Python scripts: draw_Uniform, draw_Exponential, draw_Normal; Assignment 2 out, due Sep 18 |
Sep 12 (Th) [Lec 06] | Random variables - 3 | AoS 2.5 - 2.7; MHB 3.10, 3.13 | |
Sep 17 (Tu) [Lec 07] | Probability inequalities | AoS 4.1 - 4.2, 5.3 - 5.4; MHB 3.14.2, 5.2 | |
Sep 19 (Th) [Lec 08] | Non-parametric inference - 1 | AoS 6.1, 6.2, 6.3.1 | Assignment 3 out, due Oct 4 |
Sep 24 (Tu) [Lec 09] | Non-parametric inference - 2 | AoS 6.3.1, 7.1 - 7.2 | Python scripts: binomial, eCDF |
Sep 26 (Th) [Lec 10] | Confidence intervals | AoS 6.3.2, 7.1 | |
Oct 01 (Tu) [Lec 11] | Parametric inference - 1 | AoS 6.3.1 - 6.3.2, 9.1 - 9.2 | |
Oct 03 (Th) [Lec 12] | Python review (optional) | | |
Oct 08 (Tu) [Lec 13] | Mid-term 1 review | | |
Oct 10 (Th) | Mid-term 1 | | |
Oct 15 (Tu) | Fall Break | | No class |
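The schedule lists Python scripts (draw_Bernoulli, draw_Binomial, draw_Geometric, eCDF) without showing their contents. The following is a hypothetical stand-in, not the course's code: it uses only Python's standard library, builds each discrete draw from a Uniform(0, 1) sample, and computes the empirical CDF F_n(x) = (number of samples <= x) / n. The function names are illustrative, and the Geometric draw assumes the "trials up to and including the first success" convention (support 1, 2, ...); some texts count failures instead.

```python
import random
from bisect import bisect_right

def draw_bernoulli(p, rng=random):
    """Bernoulli(p): return 1 if a Uniform(0, 1) draw falls below p, else 0."""
    return 1 if rng.random() < p else 0

def draw_binomial(n, p, rng=random):
    """Binomial(n, p): count successes in n independent Bernoulli(p) trials."""
    return sum(draw_bernoulli(p, rng) for _ in range(n))

def draw_geometric(p, rng=random):
    """Geometric(p): number of Bernoulli(p) trials up to and including the first success."""
    if not 0 < p <= 1:
        raise ValueError("p must be in (0, 1]")  # p = 0 would never succeed
    trials = 1
    while draw_bernoulli(p, rng) == 0:
        trials += 1
    return trials

def ecdf(samples):
    """Empirical CDF: returns F_n with F_n(x) = (number of samples <= x) / n."""
    xs = sorted(samples)
    n = len(xs)

    def F_n(x):
        # bisect_right gives the count of sorted samples that are <= x
        return bisect_right(xs, x) / n

    return F_n

if __name__ == "__main__":
    rng = random.Random(357)  # fixed seed so runs are reproducible
    draws = [draw_binomial(10, 0.3, rng) for _ in range(10_000)]
    F_n = ecdf(draws)
    print("sample mean:", sum(draws) / len(draws), "(theory: n*p = 3.0)")
    print("F_n(3) =", F_n(3))
```

Passing an explicit random.Random instance keeps experiments reproducible; the sample mean of many Binomial(10, 0.3) draws should land close to n*p = 3.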
Resources
- Recommended text: (AoS) "All of Statistics: A Concise Course in Statistical Inference" by Larry Wasserman (Springer).
  - Students are strongly encouraged to purchase a copy of this book.
- Recommended text: (MHB) "Performance Modeling and Design of Computer Systems: Queueing Theory in Action" by Mor Harchol-Balter (Cambridge University Press).
  - Suggested for probability review and stochastic processes.
  - A copy is on reserve in the library. The instructor also has a few personal copies that you can borrow.
- Recommended text: (DSD) "The Data Science Design Manual" by (our very own) Steven Skiena (Springer).
  - Suggested for data science topics in the second half of the course.
- Others:
  - "Introduction to Probability Models" by S. M. Ross (Academic Press).
  - "Stochastic Processes" by S. M. Ross (Wiley).
Grading (tentative)
- Assignments: 40%
  - Six assignments during the semester. Expect 5-7 questions per assignment, including some programming questions (especially after mid-term 1).
  - Collaboration is allowed (maximum group size 4). You are free to form your own groups, and group membership can change between assignments.
  - Submit one soft-copy solution per group; it may be typed or handwritten but must be legible.
  - Assignments are due in class, at the beginning of the lecture. No late submissions are allowed.
- Exams: 60%
  - Two in-person exams:
    - Mid-term 1: 25%.
    - Mid-term 2: 35%.
  - Exams are easier than the assignments, with no long derivations or programming questions.
- Attendance: 0%
  - Attendance is not required but is strongly encouraged.
  - Lectures will not be recorded.
  - Exam questions are often based on class discussions, so attendance is helpful!
- Important:
  - Academic dishonesty will immediately result in an F, and the student will be referred to the Academic Judiciary. See the section on Academic Integrity below.
  - Grading will be on a curve.
  - The instructor's assignment of grades is final; no regrading requests will be entertained.
  - There is a University policy on grading, as well as a set of grading guidelines agreed upon by the CS faculty. The instructor is obligated to uphold these policies. No exceptions will be made for any student, and no special circumstances will be entertained.
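As a rough illustration of how the tentative weights combine, the sketch below computes a weighted total from the three graded components. It assumes each component is already scored on a 0-100 scale, which the grading breakdown does not specify, and the function and key names are hypothetical. Because grading is on a curve, such a raw total would only be an input to the curve, not a letter grade.

```python
import math

# Tentative weights from the grading breakdown (hypothetical key names).
WEIGHTS = {"assignments": 0.40, "midterm1": 0.25, "midterm2": 0.35}

# The graded components should account for 100% (attendance counts 0%).
assert math.isclose(sum(WEIGHTS.values()), 1.0)

def weighted_total(scores):
    """Combine component scores (assumed 0-100 each) into a weighted total out of 100."""
    missing = WEIGHTS.keys() - scores.keys()
    if missing:
        raise KeyError(f"missing component scores: {sorted(missing)}")
    return sum(weight * scores[name] for name, weight in WEIGHTS.items())

if __name__ == "__main__":
    example = {"assignments": 90, "midterm1": 80, "midterm2": 70}
    # 0.40*90 + 0.25*80 + 0.35*70 = 36 + 20 + 24.5 = 80.5
    print(weighted_total(example))
```

Note how the two mid-terms together carry 60% of the weight, matching the Exams line above.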