DS2000
Fall 2024
Homework 6
Assigned: November 1st, 2024
Deadline: November 8th, 2024 at 9pm. Note: this homework is not eligible to be redone as the second-chance homework.
Submit each program as a .py file in gradescope (filenames are specified below). You may submit multiple times right up until the deadline. You may submit up to 48 hours late for no penalty. This policy exists for those times you're having a tough week, are feeling sick, or are falling behind in your work; we won't make any exceptions to this policy. You will have an opportunity at the end of the semester to submit one of homeworks 1-5 for a new grade.
Your solution will be graded according to the DS2000 general rubric and style guide.
Submit Plots
Problems 2 and 3 ask you to create a few visualizations; submit them in gradescope along with your code.
Style Guide Focus
(pay attention to these items in particular for this homework, but the entire style guide will be used during grading, so please make sure you review it!)
● Functions
● Dictionaries
● Advanced loops
● Everything! (this is the last DS 2000 homework)
The 30-minute Guideline
If you get stuck on a homework problem, come by office hours, post on Piazza, or take a break! We recommend you spend about 30 minutes trying to figure out a problem: enough time that you can try a few things to get unstuck, but not SO much time that you’re banging your head against the wall. Try for 30 minutes, then take a break, take a walk, and/or ask us. :)
Review the Autograder Output
When you submit your solution, gradescope will run your code and print out the results so you can see what we’ll see when grading. Look at this output! It serves as a sanity check to make sure your code produces what you wanted; if it doesn’t, you can make revisions and resubmit up until the deadline.
It’s fine to work with friends and share ideas with each other; it is not fine to share code. Do not show your code to classmates or ChatGPT, and do not post code on Piazza.
Files
Starter code:
● income_restricted_housing.py
Plot images to submit:
● neighborhood_units.png
● neighborhood_counts.png
Source: https://data.boston.gov/dataset/income-restricted-housing
This data includes public housing owned by the Boston Housing Authority (BHA), privately-owned housing built with funding from the Department of Neighborhood Development and/or on land that was formerly City-owned, and privately-owned housing built without any City subsidy, e.g., created using Low-Income Housing Tax Credits (LIHTC) or as part of the Inclusionary Development Policy (IDP).
● housing_boston_small.csv (develop your program using this one)
● housing_boston.csv (run your program on this one to generate your final plots)
In this data, there are five columns:
● neighborhood name
● total number of market-rate units
● number of market-rate units that are owned
● total number of income-restricted units
● number of income-restricted units that are owned
Each row represents one building in the given neighborhood.
For this homework, you'll be writing all your code in the same .py file. Each problem asks you to write some new functions, but you are 100% welcome (and might need) to call functions from previous problems to solve the next one.
Problem 1 - Percent Owned
First, investigate the percentage of market-rate units owned versus the percentage of income-restricted units owned in the buildings in your dataset.
Unfortunately, though, your data is not 100% clean! You'll need to start by making it into exactly the format that we want to be working with.
● Function #1: read_file_as_list_of_dicts (add to what we've given you in the starter code!)
● Parameters: a string filename, a list of strings
● Returns: list of dicts
● Does: reads the given file as a list of dictionaries and cleans the data while reading. All columns named in the given list of strings should have their values changed from strings into integers. Empty strings should become zeroes, and strings formatted as floats should become ints; for example, "8.0" should be changed to 8
Ex. Given the following arguments as described above – a string filename and a list of strings
"housing_boston_small.csv", ["TtlMarket", "MarketOwn"]
The function should return a list of dicts: [{'Neighborhood': 'Roxbury', 'TtlMarket': 0, 'MarketOwn': 0, 'Total Income-Restricted': '16', 'Income-Restricted Ownership': '0.0'}, {'Neighborhood': 'Roxbury', 'TtlMarket': 52, 'MarketOwn': 0, 'Total Income-Restricted': '150', 'Income-Restricted Ownership': '0.0'}, ...]
Notice that for this example the "TtlMarket" and "MarketOwn" values have been changed to ints, but the "Total Income-Restricted" and "Income-Restricted Ownership" have not been.
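The cleaning rule for a single cell can be sketched as below. This is only an illustration, not the starter code: the helper name to_int and the sample row are hypothetical, and it assumes every non-empty cell in a cleaned column holds a number.

```python
def to_int(raw):
    """Convert a CSV cell such as "", "16", or "8.0" to an int."""
    text = raw.strip()
    if text == "":
        # Empty cells count as zero
        return 0
    # int("8.0") raises ValueError, so go through float first
    return int(float(text))


# Hypothetical row, shaped like one dictionary read from the CSV file
row = {"Neighborhood": "Roxbury", "TtlMarket": "", "MarketOwn": "8.0",
       "Total Income-Restricted": "16"}

# Only the columns named in the list are converted
for column in ["TtlMarket", "MarketOwn"]:
    row[column] = to_int(row[column])

print(row)
```

Columns that are not named in the list, such as "Total Income-Restricted" here, keep their original string values.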
● Function #2: sum_column
● Parameters: a list of dicts, a string column name
● Returns: an int
● Does: sums all values in the given column
Ex. Given the following arguments as described above – a list of dicts and a string column name
[{'Neighborhood': 'Roxbury', 'TtlMarket': 0, 'MarketOwn': 0, 'Total Income-Restricted': 16, 'Income-Restricted Ownership': 0}, {'Neighborhood': 'Roxbury', 'TtlMarket': 52, 'MarketOwn': 0, 'Total Income-Restricted': 150, 'Income-Restricted Ownership': 0}], 'Total Income-Restricted'
The function should return an int: 166
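The accumulator pattern behind this kind of sum can be sketched as follows. The function name total_for_key is hypothetical, and the sketch assumes the column has already been cleaned so every value is an int.

```python
def total_for_key(rows, key):
    """Add up the value stored under key in every dictionary in rows."""
    total = 0
    for row in rows:
        total += row[key]
    return total


# Shortened versions of the two rows from the example above
rows = [
    {"Neighborhood": "Roxbury", "Total Income-Restricted": 16},
    {"Neighborhood": "Roxbury", "Total Income-Restricted": 150},
]
print(total_for_key(rows, "Total Income-Restricted"))
```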
Answer the following questions and print the results (leave these calculations and prints in main):
● What is the total number of market-rate units?
● What percentage of market-rate units are owned?
● What is the total number of income-restricted units?
● What percentage of income-restricted units are owned?
Example output (on housing_boston_small.csv):
HW 6: problem 1
---------------
The total number of market-rate units is: 608
The percent of market-rate units owned is 0.16%
The total number of income-restricted units is: 1318
The percent of income-restricted units owned is 0.01%
Problem 2 - Income-Restricted Neighborhood Buildings
For the second problem, we’re interested in seeing where in Boston income-restricted buildings are located.
You'll need two new functions to achieve this (though you may need to call one or more functions from previous problems as well).
● Function #1: count_neighborhoods
● Parameters: a list of dicts, a string neighborhood column
● Returns: dictionary of strings to ints
● Does: counts the number of buildings in each neighborhood
Ex. Given the following arguments as described above – a list of dicts and a string neighborhood column
[{'Neighborhood': 'Roxbury', 'TtlMarket': 0, 'MarketOwn': 0, 'Total Income-Restricted': 16, 'Income-Restricted Ownership': 0}, {'Neighborhood': 'Roxbury', 'TtlMarket': 52, 'MarketOwn': 0, 'Total Income-Restricted': 150, 'Income-Restricted Ownership': 0}, {'Neighborhood': 'Mattapan', 'TtlMarket': 0, 'MarketOwn': 0, 'Total Income-Restricted': 50, 'Income-Restricted Ownership': 0}], 'Neighborhood'
The function should return a dict of strings to ints: {'Roxbury': 2, 'Mattapan': 1}
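Counting with a dictionary usually means starting each new key at zero and adding one per row. A minimal sketch, using the hypothetical name count_by_key and dict.get to supply the starting zero:

```python
def count_by_key(rows, key):
    """Count how many dictionaries in rows share each value of key."""
    counts = {}
    for row in rows:
        name = row[key]
        # get() returns 0 the first time a name is seen
        counts[name] = counts.get(name, 0) + 1
    return counts


# Shortened versions of the three rows from the example above
rows = [
    {"Neighborhood": "Roxbury", "Total Income-Restricted": 16},
    {"Neighborhood": "Roxbury", "Total Income-Restricted": 150},
    {"Neighborhood": "Mattapan", "Total Income-Restricted": 50},
]
print(count_by_key(rows, "Neighborhood"))
```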
● Function #2: plot_neighborhood_bar
● Parameters: a dictionary of strings to ints, a list of the base string color names
● Returns: nothing
● Does: creates, saves, and displays a bar chart of the neighborhoods in this dataset. Your plot will have one bar per neighborhood with height corresponding to the number of buildings with income-restricted units in that neighborhood. Use the list of string color names to color each bar in a cycle. Notice that the list of colors is not the same length as the number of neighborhoods. You may find that defining a function to create the full list of colors you need provides nice structure.
Ex. Given the following arguments as described above – a dictionary of strings to ints and a list of string color names
{'Roxbury': 2, 'Mattapan': 3, 'Jamaica Plain': 4, 'Dorchester': 1, 'Mission Hill': 7}, ['red', 'blue', 'yellow']
You would produce a bar chart where the bars for Roxbury and Dorchester are red, Mattapan and Mission Hill are blue, and Jamaica Plain is yellow.
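One way to build the full list of colors is to index into the base list with the modulo operator, so the index wraps around when it passes the end. The function name cycle_colors is hypothetical; this is a sketch of the idea, not a required design.

```python
def cycle_colors(base_colors, count):
    """Repeat base_colors in order until there are count colors."""
    colors = []
    for i in range(count):
        # i % len(base_colors) wraps back to 0 after the last base color
        colors.append(base_colors[i % len(base_colors)])
    return colors


# Five neighborhoods but only three base colors, as in the example above
print(cycle_colors(["red", "blue", "yellow"], 5))
```

The resulting list has one color per bar, so it can be passed to the color parameter of plt.bar.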
You'll need to use plt.savefig('neighborhood_counts.png', bbox_inches="tight") for the x-value labels not to get cut off.
Run your code on the large data set and save this plot as neighborhood_counts.png to turn in.
Problem 3 - Income-Restricted Neighborhood Units
For the final problem, we’re interested in seeing how many income-restricted units are in some neighborhoods. Problem 2 showed us where buildings with income-restricted units are located, but how many units are in these neighborhoods?
● Function #1: sum_units
● Parameters: a list of dicts, a string name of neighborhood column, a string name of target unit column
● Returns: dict of strings to ints
● Does: finds the total number of units within each neighborhood for the given units column
Ex. Given the following arguments as described above – a list of dicts, a string neighborhood column name, and a string unit column name
[{'Neighborhood': 'Roxbury', 'TtlMarket': 0, 'MarketOwn': 0, 'Total Income-Restricted': 16, 'Income-Restricted Ownership': 0}, {'Neighborhood': 'Roxbury', 'TtlMarket': 52, 'MarketOwn': 0, 'Total Income-Restricted': 150, 'Income-Restricted Ownership': 0}, {'Neighborhood': 'Mattapan', 'TtlMarket': 0, 'MarketOwn': 0, 'Total Income-Restricted': 50, 'Income-Restricted Ownership': 0}], 'Neighborhood', 'Total Income-Restricted'
The function should return a dict of strings to ints: {'Roxbury': 166, 'Mattapan': 50}
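Summing per neighborhood follows the same dictionary pattern as counting, except that each row adds its unit value instead of 1. A minimal sketch, with the hypothetical name sum_by_group and the assumption that the unit column already holds ints:

```python
def sum_by_group(rows, group_key, value_key):
    """Total the value_key entries separately for each value of group_key."""
    totals = {}
    for row in rows:
        group = row[group_key]
        # Start each new group at 0, then add this row's value
        totals[group] = totals.get(group, 0) + row[value_key]
    return totals


# Shortened versions of the three rows from the example above
rows = [
    {"Neighborhood": "Roxbury", "Total Income-Restricted": 16},
    {"Neighborhood": "Roxbury", "Total Income-Restricted": 150},
    {"Neighborhood": "Mattapan", "Total Income-Restricted": 50},
]
print(sum_by_group(rows, "Neighborhood", "Total Income-Restricted"))
```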
● Function #2: plot_comparison_bar
● Parameters: a dictionary of strings to ints for the first set of bars, a dictionary of strings to ints for the second set of bars, a string label 1, a string label 2
● Returns: nothing
● Does: creates, saves, and displays a bar chart of the numbers of different kinds of units in different Boston neighborhoods. Your plot will have two bars per neighborhood, and you should place the bars next to each other. Displayed values on the x-axis should be neighborhood names. Colors are up to you!
Use this function to plot the total number of market units per neighborhood versus the total number of income-restricted units per neighborhood. Read the hints below!
Ex. Given the following arguments as described above – two dictionaries of strings to ints and two string labels
{'Roxbury': 166, 'Mattapan': 50}, {'Roxbury': 200, 'Mattapan': 25}, 'Market Rate', 'Income-Restricted'
Your code should produce something like:
Hint #1: to set the position of a bar in a bar chart, you'll want to use numeric x-values, even though you want string labels in the end
Hint #2: to re-set the x labels of a bar chart, use the tick_label optional parameter for plt.bar, which takes a list that is the same length as the number of points given
Hint #3: to control the width of a bar, use the width optional parameter for plt.bar
Hint #4: get the chart working for just one of the given dictionaries first
Hint #5: if the x-values of both sets of bars are the same, the bars will be plotted on top of each other
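The hints above can be combined into a rough sketch like the one below. All names here (side_by_side_positions, draw_two_series, the filename parameter) are hypothetical, and the sketch assumes both dictionaries contain the same neighborhoods. It labels the groups with plt.xticks rather than tick_label so that each name sits between its two bars; that is a substitution for illustration, not the approach the hints prescribe.

```python
def side_by_side_positions(count, width):
    """Return x-positions for two bars per group, centered on 0, 1, 2, ..."""
    left = [i - width / 2 for i in range(count)]
    right = [i + width / 2 for i in range(count)]
    return left, right


def draw_two_series(first, second, label1, label2, filename, width=0.4):
    """Draw two bars per key; assumes first and second share the same keys."""
    # Imported here so the position helper above runs without matplotlib
    import matplotlib.pyplot as plt

    names = list(first.keys())
    left, right = side_by_side_positions(len(names), width)
    plt.bar(left, [first[name] for name in names], width=width, label=label1)
    plt.bar(right, [second[name] for name in names], width=width, label=label2)
    # Put each neighborhood name at the center of its pair of bars
    plt.xticks(list(range(len(names))), names, rotation=90)
    plt.legend()
    plt.savefig(filename, bbox_inches="tight")
    plt.show()


left, right = side_by_side_positions(2, 0.5)
print(left, right)
```

Shifting each set of bars by half a bar width in opposite directions keeps the two bars touching but not overlapping, which is the situation Hint #5 warns about.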
Save your plots as .png files; do not submit screenshots. Make sure you review the style guide for the specific requirements that apply to your plots!