Econ 128: Machine Learning for Economists
Winter 2024
General Information
Description
This course develops the theory and computation of recent methods at the intersection of econometrics and machine learning as used in economics and business. It builds on intermediate knowledge of econometric methods and techniques and expands these using ideas and tools from machine learning.
Students are expected to be familiar with classical approaches such as regression analysis, maximum likelihood, and statistical inference. Basic concepts in data science will be introduced with a focus on the estimation and testing of high-dimensional models. We will also cover the estimation of nonlinear models, including models for classification, splines, and tree-based models such as random forests. Other topics may include deep learning and unsupervised learning. The course covers both theoretical and practical issues, includes extensive applications to real data, and requires the use of statistical software (Python). The goal of the course is to provide students with a toolkit of machine learning methods for drawing inferences from the variety of data encountered in economics and business. While we will follow standard statistical approaches to machine learning, we will deviate at times to emphasize applications and approaches specific to economics. In particular, we will place greater emphasis on the analysis of causal effects than on pure prediction, and we will examine the difference between the two.
Pre-requisites
Econ 123A
Course Materials
Required Textbooks
James, G., Witten, D., Hastie, T., Tibshirani, R. and Taylor, J. 2023. An Introduction to Statistical Learning with Applications in Python. New York: Springer.
The book and other materials are freely available online.
Recommended Textbook
Taddy, Hendrix and Harding. 2022. Modern Business Analytics. McGraw-Hill.
Software
In addition to covering the theoretical principles of machine learning as applied in economics and business, this class relies heavily on using statistical software to analyze a number of real-world datasets. We will use Python exclusively. Python is a modern, open-source programming language widely used for statistical computing. The textbook covers it extensively, with step-by-step instructions tailored to the statistical models encountered in class. Successful completion of the final will require Python.
Course policies
By signing up for this class you agree to be available for all classes, assignments, and exams.
Valid reasons for missing a course component are limited to events beyond your control that make it impossible to complete a task or attend class. Written documentation (doctor’s note, etc.) must be submitted. Make-up tests or assignments are not offered. All dates are fixed.
This class is structured as follows. On Tuesdays we discuss a chapter and the corresponding theory, and on Thursdays we discuss the corresponding lab. We will aim to follow the textbook closely. I will present on Tuesdays, and the class will follow a typical lecture format. On Thursdays, however, students will be asked to present solutions to problems based on parts of the lab in front of the class. This will involve writing code, explaining what the code does, and being able to change it or run additional analyses. You will be graded on this interactive presentation each time. Since presenters will be drawn at random each time, some students may present multiple times, and some may present more often than others (see below on grading). You are expected to work on your presentations before each Thursday, and you are welcome to present additional material too. For example, in preparing a lab for a section of a chapter, you may find an alternative or better way of accomplishing the same task. You can present it in class even if it does not appear in the textbook. You can also volunteer to present an idea or extension of your own.
Course credit
In this class you can earn a total of 100 points. These are distributed as follows:
1. Class attendance: 10 points.
2. Interactive class presentations: 30 points. Each time you present you will receive between 0 and 10 presentation points. At the end of the class, I will average your three best presentation scores and rescale the average to class points.
Example: you present 5 times and receive the following presentation points: 3, 10, 8, 7, 10. The best three outcomes are 10, 10, 8, with an average of 9.33. Class points: 30*9.33/10 ≈ 28.
3. Take-home midterms: Jan 23 and Feb 27 (30 points)
4. Final Exam: Tues, March 19, 10:30 am-12:30 pm (30 points)
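The presentation rule in item 2 can be sketched in Python as follows. This is only an illustration of the arithmetic, not an official grading script: the function name and its parameters are hypothetical, the weight of 30 is taken from the point allocation in item 2, and averaging over fewer than three scores when a student has presented fewer than three times is an assumption that the rule above does not address.

```python
def presentation_class_points(scores, best_n=3, max_score=10, weight=30):
    """Average the best `best_n` presentation scores and rescale to class points.

    Assumption: with fewer than `best_n` scores, the average is taken over
    the scores that exist; the syllabus does not specify this case.
    """
    if not scores:
        return 0.0
    # Keep the highest `best_n` scores.
    top = sorted(scores, reverse=True)[:best_n]
    average = sum(top) / len(top)
    # Rescale from the 0-10 presentation scale to the 30-point component.
    return weight * average / max_score


# Scores from the example in item 2.
scores = [3, 10, 8, 7, 10]
points = presentation_class_points(scores)
print(round(points, 2))
```

For the scores in the example, the three best are 10, 10, and 8, so the function returns roughly 28 of the 30 available points.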
The letter grade distribution is determined by me at the end of the class, and it will consider factors such as the overall level of the class. I do not have a target distribution in mind, but the grade distribution will be similar to other advanced classes you may have taken in economics or statistics.
Academic Dishonesty Policy
Academic honesty is a requirement for passing this class. Any student who compromises the academic integrity of this course is subject to a failing grade. The work you submit must be your own. Academic dishonesty includes, but is not limited to, copying answers from another student, allowing another student to copy your answers, communicating exam answers to other students during an exam, attempting to use notes or other aids during an exam, or tampering with an exam after it has been corrected and then returning it for more credit. If you do any of these, you will be in violation of the UCI Policies on Academic Honesty.
Course Outline
Week 1: Introduction to Big Data
Week 2: Regression, Bootstrap, Bias-Variance Trade-Off (Ch. 1, 2, 3)
Week 3: Resampling Methods (Ch. 5)
Week 4: Regularization and Lasso (Ch. 6)
Week 5: Classification (Ch. 4)
Week 6: Nonlinear models and splines (Ch. 7)
Week 7: Tree-Based Methods (Ch. 8)
Week 8: Causal Inference and Double Machine Learning
Week 9: Dimensionality Reduction and Factor Models (Ch. 12)
Week 10: Deep Learning (Ch. 10)
Exam Week (March 19)