首页 » 数据科学与大数据技术 » Workshop 2 -- Optimization at Citibike

Workshop 2 -- Optimization at Citibike

2025-02-20 Admin 写评论

Hello, if you have any need, please feel free to consult us, this is my wechat: wx91due

Introduction

Citibike is the leading bikesharing platform in New York, including the districts of Manhattan, Brooklyn, Queens & Jersey City. That said, the company is far from being profitable due to growing competition from other modes of transportation (e.g., scooters, electric bikes, etc.). According to a consulting firm hired a few months ago, the operations of Citibike are inefficient due to the lack of data-driven decision processes. If Citibike were only 4% more efficient, it would become profitable. During peak hours (e.g., morning commute time), the bikesharing platform is particularly unreliable. Approximately 20% of the dock stations face a "stock-out" situation, meaning that there are no bikes available. In such cases, commuters choose other modes of transportation.

Your assignment

The consulting firm recommends using optimization models to improve the operations, especially during peak time. As part of this new strategy, you are recently hired as a lead data scientist of Citibike. You suspect that the inventory of bikes and the dock stations are not well positioned given the incoming demand at peak times. In this workshop, you will develop a data-driven optimization model to improve the positioning of inventory at peak time. In particular, you will formulate and implement integer programming models.

Workshop instructions: Please carefully read all the instructions below:

The code cells are partially filled. You need to complete and execute the code in each cell. You can consult any material posted on Canvas to help you get started with the syntax. You can ask questions to the TAs and discuss with your teammates.
Make sure to answer all the questions along the way in a detailed manner.
This is an individual assignment; you should differentiate your coding style, answers, commenting, etc. from those of your teammates.
Only Q1-Q3.6 are meant to be covered in class. You are free to start working on the other questions, but you should not exchange with your teammates on all remaining questions.

Submission: Your submission should include the following:

Notebook: An html file along with a .ipynb file (85 pt)
Executive summary: A 1 pager report summarizing all your findings (at least 11 pt font, 1 inch margin, pdf/doc format). This report should take the form of an executive summary that combines elements of your analysis and business recommendations. Supporting evidence can be provided in an appendix, or by referencing the questions of the workshop. The report will be evaluated along 3 dimensions: clarity, scientific validity, practical relevance. (15 pt)

Q1. Visualize the data (15 pt)

Load the data

Your colleague has generated data sets that contains all the needed information about the demand and available inventory of bikes during one hour at peak time. Your initial analysis will be based on the following data sets:

demand.csv: describes the number of bikes rented at each dock station (during one peak hour),
starting_inventory.csv: describes the initial inventory of bikes at each dock station (before peak hour),
distances.csv: describes the distance between any two dock stations (in km unit).

Ideally, the inventory of bikes available the beginning of the peak hour should approximately match the expected amount of demand. For example, if you expect 5 user requests in station A, and 5 user requests in station B in the next hour, the number of bikes in station A and B should be roughly equal, otherwise there is an imbalance of inventory. Using spatial visualization, you will check if the initial inventory is balanced or imbalanced.

Execute the cells below. Read carefully all the comments

Next, you need to load the data sets as dataframes using the Pandas function pd.read_csv().

Q1.0. What is the distance between the dock stations "120" and "146"? What is the maximum inventory of bikes initially available over all dock stations?

Insert answer here:

Plot the heatmaps of demand and inventory

In the next cells, we will generate a heatmap of New York, representing the level of demand in different regions.

The code is provided. Feel free to experiment with different parameters to better understand their respective roles.

The next cell will plot a heatmap of the demand.

Ideally, you would like to compare the expected demand to the initial inventory at the dock stations. To this end, you need to visualize the inventory of bikes initially available.

By re-using the code above, construct a heatmap for the inventory of bikes in the city (starting_inventory). You need to edit the lines of code relative to the construction of the demand list.

Q1.1. Are there apparent mismatches between demand and inventory as you compare the two heatmaps?

Insert answer here:

Your colleagues argue that that it is difficult to eyeball the differences between the two heatmaps. It would be preferable to have a single heatmap showing the "imbalance" between demand and inventory. Specifically, you would like to identify areas where the initial bike inventory is either insufficient or excessive, compared to the amount of demand.

Q1.2. How can you quantify the degree of imbalance between demand and inventory? Create heatmaps for the newly defined imbalance metric(s) showing the areas with lack or excess of bikes.

Hint: Define two new metrics that quantify the imbalance between demand inventory and reuse the same code as above:

The first metric could capture the deficit of inventory at the stations where demand_agg > inventory_agg,
The second metric could capture the excess of inventory at the stations where inventory_agg > demand_agg.

Insert answer here:

Q1.3. Add one more visualization of your own. Describe what it shows and what your learn from it in 1-2 lines

Insert answer here:

Q2. Optimal Rebalancing of Bike Inventory (20 pt)

Matching supply with demand: Given the imbalances between demand and inventory, Citibike runs frequent "rebalancing" operations, where bikes are relocated from a dock station having an excess of inventory to a station having insufficient inventory. These rebalancing operations are run in an ad-hoc fashion. You would like to construct an optimization model to rigorously rebalance the inventory before the beginning of the peak hour.

The economics of rebalancing: Rebalancing operators have to be paid $1 per km per relocated bike. There are no geographical restrictions on the rebalancing of bikes (from any station to any other station). After the rebalancing is performed, the amount of demand satisfied at each station will the minimum between the new initial inventory level and the total demand (we assume that there is no further replenishment during the peak hour). Each satisfied user request brings $4 of revenue, on average. Hence, when bikes are unavailable and user requests are unfulfilled, Citibike loses the revenue opportunity of $4 per user.

In what follows, you will formulate and implement the rebalancing problem as an integer program. Starting with the inventory levels of starting_inventory, you want to find the bikes relocations that maximize the total net revenue. For simplicity, you can assume that the rebalancing operations happen instantaneously before the beginning of the peak hour. There is no increase in the supply of bikes during the peak hour.

Model creation

Q2.1. Create a GUROBI new model "m", named "rebalancing"

Decision variables

Clearly your main decisions have to do with the "relocation" of bikes.

Q2.2. Explain in detail what the code in the cell above is doing.

Insert your answer here:

Your colleague thinks that additional decision will be needed for the problem. She suggests the following line of code:

Q2.3. Why are the variables satisfied_demand needed? How do we refer to such variables (which do not describe actual decisions)?

Insert your answer here:

Constraints

Q2.4. Is there an upper bound on the number of bikes that can be relocated from one station to another? Incorporate the upper bound into the model.

Hint: Use starting_inventory.loc[i,"count"] to access the initial number of bikes at station i.

Insert your answer here:

The satisfied demand in each station is the minimum between the demand and the inventory after rebalancing. For example, if station A has 10 bikes before the relocation and we relocate 2 extra bikes, we can satisfy up to 12 users. If the demand at station A is 8, we satisfy all 8 users. If the demand at station A is 15, we can only satisfy 12 users, and we lose 3 user requests. In general, we have the equation:

Below, you will add two ensembles of constraints to our model m to capture the notion of satisfied demand.

Q2.5. Add constraints imposing that satisfied_demand is smaller or equal to the demand, in each station.

Q2.6. Explain what the additional constraints below on the satisfied_demand are doing. Why they are needed?

Insert your answer here:

Objective

Q2.7. How would you formulate the net revenue as a linear expression? Specify the objective of the model.

Hint: Recall that the objective function is specified using: m.setObjective(EXPRESSION,GRB.MAXIMIZE)

Insert your answer here:

Solve

Congratulations! You have formulated and implemented the integer program. You can now optimize the rebalancing and printout the optimal revenue/

Q2.8. Read the output of the optimization: What is the optimal net revenue?

Insert your answer here:

Q2.9. How much revenue does Citibike gain using the optimal rebalancing compared to no rebalancing? Is this a significant increase? Carefully explain how you obtain the answer (feel free to add code below).

Hint: The goal here is to quantity how the revenue changes after the optimal rebalancing vs. no rebalancing. To compute the revenue in the absence of rebalancing, you could reuse our optimization model with a small modification of the variables relocation.

Insert your answer here:

The relocations are visualized on the heatmap in the next cell.

Code is provided, Execute the next cells. Feel free to vary the parameters to understand their role.

Q2.10. How do you interpret the plot? Is this what you would expect?

Insert your answer:

Q3. Rebalancing with replenishment (20 pt)

You manager has a concern about your analysis in Q2:

I like the idea of rebalancing bikes based on an optimization model. However, you have omitted an important aspect of the problem. When a user completes her current trip, she returns the bike to the system. A new bike is made available at the destination station. You don't account for this in your current inventory, but this organic replenishment of bikes might be very helpful since it increases the number of bikes available at the destination station!

The goal of Q3 is to update the model accordingly. You will reuse most of the existing code, with minor modifications. Your analysis will be based on the following additional data set:

replenishment.csv: describes the number of extra bikes that will be made available at each station after current users complete their trips.

For simplicity, we will assume that the replenishment happens instantaneously before the beginning of the peak hour.

Q3.1. Based on your manager's comment, did you under-estimate or over-estimate the inventory at each station in the previous question Q2?

Insert your answer here:

Loading the data

In [ ]:

replenishment=pd.read_csv("replenishment.csv",index_col=0)

You can visualize the replenishment data using .describe().

In [ ]:

replenishment.describe()

Model creation

Q3.2 Create the model object

Hint: Reuse the same code as Q2.1.

In [ ]:

#Insert your code here:

Decision variables

Q3.2. Add the decision variables to m

Hint: Reuse the same code as Q2.2 and Q2.3

In [ ]:

#Insert your code here:

Constraints

Q3.3. Add an upper bound on the number of bikes that can be relocated from one station to another.

Hint: Reuse the same code as Q2.4

In [ ]:

#Insert your code here:

Q3.4. Add constraints requiring that satisfied_demand is smaller or equal to the demand.

Hint: Reuse the same code as Q2.5

In [ ]:

#Insert your code here:

Q3.5. The satisfied_demand should be also be related to inventory. How would you modify the constraint of Q2.6 to account for replenishment? Add these constraints to the model.

Hint: You should use the same constraint as Q2.6, but also incorporate the replenishment quantity. For simplicity, you can assume that all the replenishment happens before the demand arrives. This assumption is reasonable if the system is "stationary".

Insert your answer here:

In [ ]:

#Insert your code here:

Objective

Q3.6. What is the objective function? Specify the objective of the model m.

Hint: Reuse the code of Q2.7

In [ ]:

#Insert your code here:

Insert your answer here:

Solve

Congratulations! You have formulated and implemented the integer program. You can now optimize the rebalancing and printout the optimal revenue.

In [ ]:

# Run the optimization# Note: it is not convenient to printout the relocation solution. We will develop a suitable visualization tool.defprintSolution():ifm.status==GRB.OPTIMAL:print('\nNet revenue: %g'%m.objVal)else:print('No solution:',m.status)m.optimize()printSolution()

Q3.7. Read the output of the optimization: What is the optimal net revenue? Is it greater or smaller than that obtained without replenishment in question Q.2.8? Why?

Insert your answer here:

Q3.8. How much revenue does Citibike gain using the optimal rebalancing compared to no rebalancing? How does this compare to Q2.9? Why?

Hint: The goal here is to quantity the revenue change after vs. before the optimal rebalancing. To compute the revenue in the absence of rebalancing, you could reuse our optimization model with a very small modification of the variables relocation.

Insert your answer here:

The relocations are visualized on the heatmap in the next cell. Execute the next cell.

In [ ]:

base_map=generateBaseMap()# generates our base map of NY# Next we compute the imbalance ratio between demand and inventoryimbalance=np.divide(demand_agg,1+inventory_agg)imbalance_list=imbalance.fillna(0).reset_index().values.tolist()# The next loop plots the relocation lines between any two stations.# The more opaque is the line, the more relocations are madeforiinstations:forjinstations:ifrelocation[i,j].x>0:p1=[demand.loc[i,"latitude"],demand.loc[i,"longitude"]]p2=[demand.loc[j,"latitude"],demand.loc[j,"longitude"]]opacity=relocation[i,j].x/starting_inventory.loc[i,"count"]+relocation[i,j].x/20folium.PolyLine(locations=[p1,p2],color='black',opacity=opacity).add_to(base_map)arrows=get_arrows(locations=[p1,p2],color='black',n_arrows=1,opacity=opacity)forarrowinarrows:arrow.add_to(base_map)#Plots the heatmapHeatMap(data=imbalance_list,radius=0,max_zoom=15,minOpacity=0).add_to(base_map)#Displays the mapbase_map

Q3.9. How do you interpret the plot? Is this what you would expect compared to Q2.10?

Insert your answer:

Q4. Vehicle Capacity (10 pt)

In reality, Citibike can utilize two types of vehicles:

Large capacity vehicles (LCV): These are the default vehicles. They can carry any arbitrary number of bikes. Due to the large size of the vehicle, rebalancing operators have to be paid $1 per km per relocated bike (similar to Q2 and Q3).
Small capacity vehicles (SCV): These specialized vehicles can handle up to 10 bikes per rebalancing operation. Rebalancing operators have to be paid $0.5 per km per relocated bike.

For each relocation operation (corresponding to a pair of stations), the firm needs to choose exactly one type of vehicle. The goal of Q4 is to update the model constructed in Q3 to incorporate the following logical condition: EITHER we use a small capacity vehicle and the number of relocated bikes is less than 10 OR we use a large capcity vehicle.

Decision variables: Small vs. large vehicles

We add relocation variables corresponding to the use of small vs. large vehicles.

In [ ]:

# The decision variables describe the number of bikes relocated from station A to station Brelocation_SCV=m.addVars(stations,stations,vtype=GRB.INTEGER,lb=0,name="relocation_small")relocation_LCV=m.addVars(stations,stations,vtype=GRB.INTEGER,lb=0,name="relocation_large")

Constraints

In the next constraint, we impose that the total number of relocations is the sum of relocations using SCVs and LCVs.

In [ ]:

m.addConstrs(relocation[i,j]==relocation_SCV[i,j]+relocation_LCV[i,j]foriinstationsforjinstations);

Q4.1. Add constraints requiring that EITHER we use an SCV with at most 10 bikes OR we don't use an SCV

Hint: Use the big-M method covered in today's class. Define a binary auxiliary variable for each pair of stations to switch between the two conditions.

In [ ]:

# Uncomment and fill the code below:# M =# auxiliary = m.addVars(stations,stations,vtype=GRB.BINARY, name="OR_auxiliary")# m.addConstrs(relocation[i,j] <= for i in stations for j in stations)# m.addConstrs(relocation_SCV[i,j] <= for i in stations for j in stations)

Objective

The objective function is modified accordingly. Execute the cells below.

In [ ]:

# Set the objective: maximize the net revenuem.setObjective(4*satisfied_demand.sum()-1*quicksum(relocation_LCV[i,j]*distances.loc[i,j]foriinstationsforjinstations)-0.5*quicksum(relocation_SCV[i,j]*distances.loc[i,j]foriinstationsforjinstations),GRB.MAXIMIZE)

Solve

Congratulations! You have formulated and implemented the integer program. You can now optimize the rebalancing and printout the optimal revenue.

In [ ]:

The relocations are visualized on the heatmap in the next cell. Execute the next cell.

In [ ]:

base_map=generateBaseMap()# generates our base map of NY# Next we compute the imbalance between demand and inventoryimbalance=np.divide(demand_agg,1+inventory_agg)imbalance_list=imbalance.fillna(0).reset_index().values.tolist()# The next loop plots the relocation lines between any two stations.# The more opaque is the line, the more relocations are madeforiinstations:forjinstations:ifrelocation[i,j].x>0:p1=[demand.loc[i,"latitude"],demand.loc[i,"longitude"]]p2=[demand.loc[j,"latitude"],demand.loc[j,"longitude"]]opacity=relocation[i,j].x/starting_inventory.loc[i,"count"]+relocation[i,j].x/20folium.PolyLine(locations=[p1,p2],color='black',opacity=opacity).add_to(base_map)arrows=get_arrows(locations=[p1,p2],color='black',n_arrows=1,opacity=opacity)forarrowinarrows:arrow.add_to(base_map)#Plots the heatmapHeatMap(data=imbalance_list,radius=0,max_zoom=15,minOpacity=0).add_to(base_map)#Displays the mapbase_map

Q4.2. How do you interpret the output compared to Q3.7-Q3.9? Does the net revenue increase or decrease?

Insert your answer:

Q5. Procurement of vehicles (5pt)

In reality, Citibike does not operate the SCVs and LCVs, which are booked through an external provider. There are various costs and constraints associated with the procurement of small and large vehicles. Specifically, we have

Fixed cost for small vehicles: You should pay a fixed cost of 80$ to be able to use small vehicles. Note that this cost does not scale with the number of small vehicles.
Per vehicle cost: You should pay a variable cost of 10$ per vehicle.
Number of rebalancing operations per vehicle: Suppose that each vehicle can perform at most 3 distinct rebalancing operations witin the imparted time.

How would you modify the model to incorporate procurement costs? Build a new model that captures all the above requirements.

In [ ]:

#Insert your code here:

Is the rebalancing operation still marginally profitable?

Insert your answer:

Q6. Qualitative Discussion (15 pt)

Please provide detailed answers to the following questions.

Q6.0

What are the main limitations of our current modeling approach? What other aspects of the rebalancing would you account for?

Insert your answer:

Q6.1.

In reality, you don't know in advance the exact number of user request in the next hour. How could you remedy this issue?

Insert your answer:

Q6.2.

One of your colleagues has further concerns about how the demand is estimated: Our historical data only records the customers who are satisfied and complete their trip. Many of our potential customers walk out without even taking a bike!

Explain the above claim. Is this an important issue? Are you currently over-estimating or under-estimating the value of rebalancing? How would you deal with this approach?

Insert your answer:

Q6.3.

In practice, rebalancing is not an easy operation (see the picture below). It requires to hire specialized labor and rent trucks.

What other aspects can be incorporated into the optimization model to make it more realistic?

Insert your answer:

Q6.4

For which other operational decisions could Citibike utilize optimization models to improve its efficiency?

Insert your answer:

Executive Summary Report (15 pts)

Write a 1 pager report summarizing your findings (11 pt font, 1 inch margin, pdf/doc format). This report should take the form of an executive summary that combines elements from your analysis and business recommendations. You can also suggest additional analysis that should be conducted to augment the current rebalancing model. Supporting evidence can be provided in the appendix or by referencing the questions of the workshop. The report will be evaluated along 3 dimensions: clarity, scientific validity, and practical relevance.

Q6. Opening new dock stations (Optional Bonus, 5 pt)

Optimization can be also utilized to decide on where it would be useful to construct new dock locations. Your analysis will be based on the following data sets:

demand_extended.csv: describes the number of user requests in each potential location.
starting_inventory_extended.csv: describes the initial inventory of bikes at each location (only dock stations have bikes),
distances_extended.csv: describes the distance between any two locations (in km unit).
docks_extended.csv: describes the set of locations that currently hold a dock station, and those where a new dock station can be opened.

Suppose that you can construct one new dock station to better accommodate the demand. You may assume that users can be routed to any dock station in a radius of 1 km, from their original location. Formulate and implement an integer programming model to decide on the precise locations.

In [ ]:

starting_inventory_extended=pd.read_csv('starting_inventory_extended.csv',index_col=0)demand_extended=pd.read_csv('demand_extended.csv',index_col=0)docks_extended=pd.read_csv('docks_extended.csv',index_col=0).set_index("location_id")distances_extended=pd.read_csv('distances_extended.csv',index_col=0)distances_extended.columns=list(map(lambdax:int(x),distances_extended.columns.tolist()))# the column names are converted into integers

In [ ]:

# Insert your code in the cells below:

发表评论

电子邮件地址不会被公开。必填项已用*标注

姓名 *

电子邮件 *

验证码 *