INFO 3440 - Project Summary
Winter 2024
Description
Who wants to win at Daily Fantasy Sports on Draftkings? Everybody? Anybody? Well, let’s be honest, this project isn’t going to make you win, but it is a fun optimization problem! And that’s really the point in this class.
Imagine that we were interested in joining one of Draftkings’ National Basketball Association (NBA) contests on Feb 22, 2024. There were 12 real NBA games played among 24 teams. In the fantasy contest, you select a collection of players (a lineup) from these 12 games and “play” against other people who (potentially) selected different lineups. The winner of the contest is the person whose lineup accumulates the most fantasy points. Fantasy points are awarded according to a set of rules (see Figure 1) based on how the players perform in their real-life games. Selecting a lineup is subject to a fictitious salary cap and several additional constraints that are outlined below. The goal of this project is to select an “optimal” lineup. What is “optimal?” That is for you to decide!
Data
Here are the data that you need to solve this problem. I will provide a data dictionary on Canvas related to each data set describing the column names.
· DKSalaries.csv – Data from Draftkings website including player names, teams, salary, position, and average points per game.
· Actual game data from Feb 22, 2024
o XXX.csv – Data from each of the 12 games on Feb 22 related to the players.
· Season Statistics
o PlayerStats.csv – Data from the NBA season (October 24, 2023 through Feb 21, 2024)
Objectives
1. Use Python to apply optimization models and solution techniques learned in class to solve a real-world problem.
2. You will have to do some exploratory data analysis using the historical data to carefully consider your objective function coefficients.
3. You will solve at least two optimization problems for this project:
a. One problem will involve selecting a lineup based on historical information. Tell me what your strategy was in choosing players, and how it would outperform the naïve strategy of picking just based on average daily points. Perhaps older players will play better than they averaged because these players had a lot of rest? Maybe players do better at home than on the road? Maybe players do better if they are playing against a bad team, or worse if they are playing against a good team? Maybe you want to pick people who are on the same team, because if one player is scoring, his teammate might be getting the assist? If you don’t know anything about basketball, use Google (or your favorite AI)!
b. The second problem will involve selecting the optimal lineup based on how the players performed during their actual games (i.e., the “cheat code,” if you had it prior to selecting your lineup).
4. Write a report summarizing your results.
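For Objective 3a, one simple way to turn a hypothesis (home court, opponent strength, and so on) into objective function coefficients is to scale each player's AvgPointsPerGame. The sketch below is only an illustration: the function name and the parameters home_boost and opponent_factor are hypothetical, and any values you use would have to be estimated from PlayerStats.csv. With the default arguments it reduces to the naïve strategy of using the season average unchanged.

```python
def adjusted_projection(avg_points, is_home, home_boost=0.0, opponent_factor=1.0):
    """Scale a season average into an objective coefficient.

    home_boost and opponent_factor are hypothetical knobs; the defaults
    leave the average unchanged (the naive strategy).
    """
    home_multiplier = 1.0 + (home_boost if is_home else 0.0)
    return avg_points * home_multiplier * opponent_factor


# Naive coefficient: just the season average.
naive = adjusted_projection(40.0, is_home=True)

# Placeholder values for illustration only, not estimates from the data.
adjusted = adjusted_projection(40.0, is_home=True, home_boost=0.05)
```

Whatever adjustment you choose, the report should explain where the numbers came from and why you expect the adjusted coefficients to beat the naïve ones.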
Constraints
1. The fantasy team must have eight players.
2. The salary of your eight players may not exceed $50,000.
3. You need players from at least two different games.
4. Your 8-player fantasy team roster should include one player in each of these positions:
a. Point Guard (PG)
b. Shooting Guard (SG)
c. Small Forward (SF)
d. Power Forward (PF)
e. Center (C)
f. Guard (either PG or SG)
g. Forward (either SF or PF)
h. Utility (any of the 5 positions)
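The four constraints above can be checked for any candidate lineup with a few lines of Python. This is a minimal sketch, not a required implementation: the dictionary keys ("positions", "salary", "game") and the function names are assumptions made for the example. The slot check uses a small bipartite matching so that a player listed at more than one position is handled correctly.

```python
# Which positions may fill each roster slot (taken from constraint 4).
SLOT_ELIGIBILITY = {
    "PG": {"PG"}, "SG": {"SG"}, "SF": {"SF"}, "PF": {"PF"}, "C": {"C"},
    "G": {"PG", "SG"}, "F": {"SF", "PF"},
    "UTIL": {"PG", "SG", "SF", "PF", "C"},
}
SALARY_CAP = 50_000


def can_fill_slots(lineup):
    """Return True if every player can be assigned to a distinct slot."""
    assigned = {}  # slot name -> index of the player currently holding it

    def try_assign(i, seen):
        # Augmenting-path search: place player i, moving others if needed.
        for slot, allowed in SLOT_ELIGIBILITY.items():
            if slot in seen or not (lineup[i]["positions"] & allowed):
                continue
            seen.add(slot)
            if slot not in assigned or try_assign(assigned[slot], seen):
                assigned[slot] = i
                return True
        return False

    return all(try_assign(i, set()) for i in range(len(lineup)))


def is_valid_lineup(lineup):
    """Check roster size, salary cap, game diversity, and slot coverage."""
    return (
        len(lineup) == 8
        and sum(p["salary"] for p in lineup) <= SALARY_CAP
        and len({p["game"] for p in lineup}) >= 2
        and can_fill_slots(lineup)
    )


# Imaginary players: positions only, identical salaries, two games.
positions = ["PG", "SG", "SF", "PF", "C", "PG", "SF", "C"]
lineup = [
    {"positions": {pos}, "salary": 6000, "game": "AAA@BBB" if i < 4 else "CCC@DDD"}
    for i, pos in enumerate(positions)
]
all_centers = [dict(p, positions={"C"}) for p in lineup]
```

Here is_valid_lineup(lineup) holds, while all_centers fails because only the C and Utility slots accept a center. A checker like this is useful for sanity-testing whatever lineup your optimization model returns.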
Deliverables
1. You will submit an HTML file generated from your Jupyter/Colab notebook, with your final report embedded in it. The file should explain every step you followed to reach the final answer in a way that someone can follow without needing to understand code. If you aren’t sure how detailed this should be, look at https://colab.research.google.com/drive/15uxrAeCCL327kWH9N0X-ogKwf2zErjP5 for a sample story about bond risk and Silicon Valley Bank last year.
2. As an appendix, ask your favorite generative AI (GAI) tool what it would do, and submit that conversation as part of your HTML file. Include in the appendix any code blocks from that conversation that you used.
3. As another appendix, you will also submit any documentation of the use of GAI, and how it may have helped your strategy or your answers.
Notes
1. You absolutely do not need to create an account on Draftkings, or any other fantasy sports-related website, to solve this problem.
2. If you do not want to work on this project, you are welcome to select your own. See below for a description of a project proposal. I will have to approve your project prior to you beginning work. I don’t want you to waste your time if the project is not suitable for this class.
Figure 1. Draftkings scoring for an NBA game
Data Dictionary
File: DKSalaries.csv
Columns/Variables:
1. Position – The player’s position in the game.
2. Name + ID – Player name plus their ID in Draftkings
3. Name – Player name
4. ID – Player ID in Draftkings
5. Roster Position – All the legitimate positions that this player can fill in your Draftkings fantasy roster. Note that all players can also be considered for the utility (Util) position.
6. Salary – Player salary in the game. Remember that each fantasy team roster in Draftkings may not exceed $50,000 in total salary.
7. Game Info – General information related to the game including teams that are playing, time of game, etc.
8. TeamAbbrev – The player’s team name (abbreviated).
9. AvgPointsPerGame – The average Draftkings points for the player in each game up to this point in the season.
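The columns above can be read into plain Python structures before building a model. The sketch below uses only the column names from this data dictionary; the exact cell formats are assumptions (a slash-separated Roster Position such as PG/SG/G/UTIL, and a Game Info string that begins with the matchup, such as AAA@BBB), so check them against the real file. The function name and the sample rows are made up for the example.

```python
import csv
import io


def parse_salary_rows(csv_text):
    """Parse DKSalaries-style CSV text into a list of player dicts."""
    players = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        players.append({
            "name": row["Name"],
            "id": row["ID"],
            "salary": int(row["Salary"]),
            "team": row["TeamAbbrev"],
            # Assumed format: slots separated by "/", e.g. "PG/SG/G/UTIL".
            "slots": {s.strip().upper() for s in row["Roster Position"].split("/")},
            # Assumed format: the matchup is the first space-separated token.
            "game": row["Game Info"].split(" ")[0],
            "avg_points": float(row["AvgPointsPerGame"]),
        })
    return players


# Imaginary rows for illustration; these are not real players or salaries.
SAMPLE = """Position,Name + ID,Name,ID,Roster Position,Salary,Game Info,TeamAbbrev,AvgPointsPerGame
PG/SG,Player A (1001),Player A,1001,PG/SG/G/UTIL,8400,AAA@BBB 02/22/2024 07:00PM ET,AAA,41.3
C,Player B (1002),Player B,1002,C/UTIL,6100,CCC@DDD 02/22/2024 08:00PM ET,DDD,29.75
"""
players = parse_salary_rows(SAMPLE)
```

To read the actual file, pass its contents, for example parse_salary_rows(open("DKSalaries.csv", newline="").read()). The "game" field is what you would use to enforce the rule about players from at least two different games.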
File: PlayerStats.csv (and the final stats data)
Columns/Variables:
1. Rank – the player’s rank for the year in terms of number of points scored for the season
2. player – player name
3. pos – player position
4. age – player’s age
5. Tm – player’s team
6. g – games played in the data set (note that this should only be one in the single-game data from Feb 22)
7. gs – games started
8. mp – minutes played
9. FG – number of shots (field goals) made
10. FGA – number of shots (field goals) attempted
11. FG% – field goal percentage. Calculated as (FG/FGA)
12. 3P – number of Three-point shots made
13. 3PA – number of Three-point shots attempted
14. 3P% – Three-point shot percentage. Calculated as (3P/3PA)
15. 2P – number of Two-point shots made
16. 2PA – number of Two-point shots attempted
17. 2P% – Two-point shot percentage. Calculated as (2P/2PA)
18. eFG% – effective field goal percentage = (FG + (0.5*3P))/FGA. This statistic weights a made Three-point shot 50% more than a made Two-point shot, because it scores 50% more points. Three-point shots are also more difficult to make because they are taken from much farther from the basket.
19. FT – free throws made
20. FTA – free throws attempted
21. FT% – free throw percentage, calculated as (FT/FTA)
22. ORB – offensive rebounds
23. DRB – defensive rebounds
24. TRB – total rebounds, calculated as (ORB + DRB)
25. AST – total assists
26. STL – total steals
27. BLK – total shots blocked
28. TOV – total turnovers (times this player gave the ball to the other team, by losing it, or throwing it out of bounds, or having it stolen from them)
29. PF – personal fouls
30. PTS – total points scored. Calculated as (FT + 2*2P + 3*3P)
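For the “cheat code” problem, the box-score columns above have to be converted into fantasy points. The sketch below shows one way to do that for a single game row. The weights and bonuses are assumptions based on Draftkings’ published NBA scoring, including the assumption that a triple-double also earns the double-double bonus; use the values in Figure 1 if they differ. The function name and the sample stat line are made up for the example.

```python
# Assumed scoring weights per box-score column; verify against Figure 1.
WEIGHTS = {
    "PTS": 1.0, "3P": 0.5, "TRB": 1.25, "AST": 1.5,
    "STL": 2.0, "BLK": 2.0, "TOV": -0.5,
}
DOUBLE_DOUBLE_BONUS = 1.5  # assumed
TRIPLE_DOUBLE_BONUS = 3.0  # assumed, added on top of the double-double bonus
BONUS_CATEGORIES = ("PTS", "TRB", "AST", "STL", "BLK")


def fantasy_points(stats):
    """Compute fantasy points for one player's single-game stat line."""
    base = sum(weight * stats.get(col, 0) for col, weight in WEIGHTS.items())
    # Count categories with ten or more to detect double/triple-doubles.
    tens = sum(1 for col in BONUS_CATEGORIES if stats.get(col, 0) >= 10)
    bonus = 0.0
    if tens >= 2:
        bonus += DOUBLE_DOUBLE_BONUS
    if tens >= 3:
        bonus += TRIPLE_DOUBLE_BONUS
    return base + bonus


# Imaginary stat line: 24 points and 11 rebounds make a double-double.
stat_line = {"PTS": 24, "3P": 2, "TRB": 11, "AST": 5, "STL": 1, "BLK": 0, "TOV": 3}
points = fantasy_points(stat_line)
```

For this stat line the base score is 24 + 1 + 13.75 + 7.5 + 2 + 0 - 1.5 = 46.75, and the double-double bonus brings it to 48.25. These per-player totals become the objective function coefficients for the second optimization problem.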
Project Proposal
(From Professor Keeling’s class)
Summary: Your proposal will be, at most, a one-page document that describes the problem that you intend to work on. The purpose of the proposal is to make sure that you’ve identified (i) a reasonable target application, (ii) an optimization model that is appropriate for the application, and (iii) adequate data sources to ensure your success.
(i) Target Application. In your proposal, identify a single application, your reason or motivation for choosing it, and why it is amenable to optimization modeling. You might start by brainstorming about real-world scenarios you’ve encountered that seem amenable to the types of optimization modeling we have done in this class, especially LP and IP. If you’re having trouble getting started, go back through the examples we’ve worked in class and try to think of how you might extend these to situations you’ve experienced firsthand.
(ii) Optimization Model. Once you’ve targeted a specific application, write down the corresponding decision variables, objective function and types of constraints. Be as detailed as you can -- if you can write down the exact mathematical formulation, great! If not, describe in English as specifically as you can. What kind of model do you intend to use (e.g., product mix, covering constraints, facility location, logistics)? Will you be using binary constraints? Integer constraints? Consider writing a “toy” model in Excel with a small number of decision variables and constraints, and imaginary data, to help flesh out what your full scale model might look like.
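As an illustration of the level of detail intended, here is how the fantasy lineup problem from this handout might be written down. This is one possible formulation, not the only one, and all of the notation is introduced here for the example: P is the set of players, S is the set of eight roster slots, E_s is the set of players eligible for slot s, v_p is the projected fantasy points for player p, c_p is the salary of player p, K is the set of games, G_k is the set of players in game k, and x_{p,s} equals 1 if player p fills slot s.

```latex
\begin{align*}
\max \quad & \sum_{s \in S} \sum_{p \in E_s} v_p \, x_{p,s} \\
\text{s.t.} \quad
& \sum_{p \in E_s} x_{p,s} = 1 && \forall s \in S \\
& \sum_{s \in S :\, p \in E_s} x_{p,s} \le 1 && \forall p \in P \\
& \sum_{s \in S} \sum_{p \in E_s} c_p \, x_{p,s} \le 50{,}000 \\
& \sum_{p \in G_k} \; \sum_{s \in S :\, p \in E_s} x_{p,s} \le 7 && \forall k \in K \\
& x_{p,s} \in \{0, 1\} && \forall s \in S,\; p \in E_s
\end{align*}
```

The first constraint fills each of the eight slots exactly once (so the lineup has eight players), the second keeps a player from being used in more than one slot, the third is the salary cap, and the fourth allows at most seven players from any one game, which forces the lineup to draw on at least two games.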
(iii) Data. Your model will need data in the form of objective coefficients, constraint coefficients, and constraint RHS values. Where will these data come from? Will you compile data from the web, use existing data you have or can collect, or create synthetic (fake) data? What will you have to do to get the data into a format that you can read into a Pyomo model? Does this seem realistic for you to accomplish?
Proposal Evaluation. The purpose of this project is to get you as close to building a “real-world” optimization model as possible. As such, the closer you get in terms of the target application, modeling techniques and sophistication, and realistic data sources, the more favorable your evaluation. Don’t be too ambitious because you might make the project too difficult to execute. But try to be creative in coming up with an application area that interests you and seems feasible as a project that can be completed over the course of 3 weeks.
NOTE: It is understood that your topic may morph into something different as you work on the final parts of your project – that is OK.