DS2000 Fall 2024 Homework 6

Assigned: November 1st, 2024

Deadline: November 8th, 2024 at 9pm. Note: this homework is not eligible to be redone for the second-chance homework.

Submit each program as a .py file in Gradescope (filenames are specified below). You may submit multiple times right up until the deadline.

You may submit up to 48 hours late for no penalty. This policy exists for those times you're having a tough week, are feeling sick, or are falling behind in your work; we won't make any exceptions to this policy.  You will have an opportunity at the end of the semester to submit one of homeworks 1-5 for a new grade.

Your solution will be graded according to the DS2000 general rubric and style guide.

Submit Plots

Problems 2 and 3 ask you to create a few visualizations; submit them in Gradescope along with your code.

Style Guide Focus

(pay attention to these items in particular for this Homework, but the entire style guide will be used during grading, so please make sure you review it!)

● Functions

● Dictionaries

● Advanced loops

● Everything! (this is the last DS 2000 homework)

The 30-minute Guideline

If you get stuck on a homework problem, come by office hours, post on Piazza, or take a break! We recommend you spend about 30 minutes trying to figure out a problem -- enough time that you can try a few things to get unstuck, but not SO much time that you’re banging your head against the wall. Try for 30 minutes, then take a break, take a walk, and/or ask us. :)

Review the Autograder Output

When you submit your solution, Gradescope will run your code and print out the results so you can see what we’ll see when grading. Look at this output! It serves as a sanity check to make sure your code produces what you wanted; if it doesn’t, you can make revisions and resubmit up until the deadline.

It’s fine to work with friends and share ideas with each other; it is not fine to share code. Do not show your code to classmates or ChatGPT, and do not post code on Piazza.

 

Files

Starter code:

● income_restricted_housing.py 

Plot images to submit:

● neighborhood_units.png

● neighborhood_counts.png

Source: https://data.boston.gov/dataset/income-restricted-housing 

This data includes public housing owned by the Boston Housing Authority (BHA), privately-owned housing built with funding from the Department of Neighborhood Development and/or on land that was formerly City-owned, and privately-owned housing built without any City subsidy, e.g., created using Low-Income Housing Tax Credits (LIHTC) or as part of the Inclusionary Development Policy (IDP).

● housing_boston_small.csv (develop your program using this one)

● housing_boston.csv (run your program on this one to generate your final plots)

In this data, there are five columns:

● neighborhood name

● total number of market-rate units

● number of market-rate units that are owned

● total number of income-restricted units

● number of income-restricted units that are owned

Each row represents one building in the given neighborhood.

For this homework, you'll be writing all your code in the same .py file. Each problem asks you to write some new functions, but you are 100% welcome (and might need) to call functions from earlier problems to solve later ones.

Problem 1 - Percent Owned

First, investigate the percentage of market-rate units owned versus the percentage of income-restricted units owned in the buildings in your dataset.

Unfortunately, though, your data is not 100% clean! You'll need to start by making it into exactly the format that we want to be working with.

● Function #1: read_file_as_list_of_dicts (add to what we've given you in the starter code!)

● Parameters: a string filename, a list of strings

● Returns: list of dicts

● Does: reads the given file as a list of dictionaries; while reading, cleans the data—all columns indicated in the given list of strings should have their values changed from strings into integers; your function should change any empty strings to zeroes; it should be able to handle strings that are formatted as floats (but we want them to be ints)—for example, "8.0" should be changed to 8 

Ex. Given the following arguments as described above – a string filename and a list of strings

"housing_boston_small.csv", ["TtlMarket", "MarketOwn"]

The function should return a list of dicts: [{'Neighborhood': 'Roxbury', 'TtlMarket': 0, 'MarketOwn': 0, 'Total Income-Restricted': '16', 'Income-Restricted Ownership': '0.0'}, {'Neighborhood': 'Roxbury', 'TtlMarket': 52, 'MarketOwn': 0, 'Total Income-Restricted': '150', 'Income-Restricted Ownership': '0.0'}, ...]

Notice that for this example the "TtlMarket" and "MarketOwn" values have been changed to ints, but the "Total Income-Restricted" and "Income-Restricted Ownership" have not been.
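The cleaning rule described above (an empty string becomes 0, and "8.0" becomes 8) can be illustrated with a small helper. This is only a sketch, not the starter code: the names to_int and clean_rows are hypothetical, and the rows are typed inline here rather than read from the CSV file.

```python
def to_int(text):
    """Convert a string such as "8.0", "52", or "" to an int ("" becomes 0)."""
    if text.strip() == "":
        return 0
    # int("8.0") raises ValueError, so convert to float first
    return int(float(text))


def clean_rows(rows, int_columns):
    """Convert the listed columns of every row (a dict) to ints, in place."""
    for row in rows:
        for column in int_columns:
            row[column] = to_int(row[column])
    return rows


# Hypothetical rows, shaped like what a CSV reader would produce (all strings)
rows = [
    {"Neighborhood": "Roxbury", "TtlMarket": "", "MarketOwn": "0.0"},
    {"Neighborhood": "Roxbury", "TtlMarket": "52.0", "MarketOwn": "0"},
]
cleaned = clean_rows(rows, ["TtlMarket", "MarketOwn"])
print(cleaned)
```

Columns that are not in the list (here, "Neighborhood") are left untouched, which matches the behavior in the example above.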

● Function #2: sum_column

● Parameters: a list of dicts, a string column name

● Returns: an int

● Does: sums all values in the given column 

Ex. Given the following arguments as described above – a list of dicts and a string column name

[{'Neighborhood': 'Roxbury', 'TtlMarket': 0, 'MarketOwn': 0, 'Total Income-Restricted': 16, 'Income-Restricted Ownership': 0}, {'Neighborhood': 'Roxbury', 'TtlMarket': 52, 'MarketOwn': 0, 'Total Income-Restricted': 150, 'Income-Restricted Ownership': 0}], 'Total Income-Restricted'

The function should return an int: 166
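One way to picture this is the accumulator pattern: start a running total at zero and add each row's value to it. The sketch below is one possible shape, assuming the column has already been cleaned to ints; it is not the only acceptable approach.

```python
def sum_column(rows, column):
    """Add up the values stored under `column` across all rows."""
    total = 0
    for row in rows:
        total += row[column]
    return total


# The two example rows from the handout, trimmed to the relevant column
rows = [
    {"Neighborhood": "Roxbury", "Total Income-Restricted": 16},
    {"Neighborhood": "Roxbury", "Total Income-Restricted": 150},
]
print(sum_column(rows, "Total Income-Restricted"))
```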

Answer the following questions and print the results (leave these calculations and prints in main):

● What is the total number of market-rate units?

● What percentage of market-rate units are owned?

● What is the total number of income-restricted units?

● What percentage of income-restricted units are owned?

Example output (on housing_boston_small.csv):

HW 6: problem 1

---------------

The total number of market-rate units is: 608

The percent of market-rate units owned is 0.16%

 

The total number of income-restricted units is: 1318

The percent of income-restricted units owned is 0.01%

 

Problem 2 - Income-Restricted Neighborhood Buildings

For the second problem, we’re interested in seeing where in Boston income-restricted buildings are located.

You'll need two new functions to achieve this (though you may need to call one or more functions from previous problems as well).

● Function #1: count_neighborhoods

● Parameters: a list of dicts, a string neighborhood column

● Returns: dictionary of strings to ints

● Does: counts the number of buildings in each neighborhood

Ex. Given the following arguments as described above – a list of dicts and a string neighborhood column

[{'Neighborhood': 'Roxbury', 'TtlMarket': 0, 'MarketOwn': 0, 'Total Income-Restricted': 16, 'Income-Restricted Ownership': 0}, {'Neighborhood': 'Roxbury', 'TtlMarket': 52, 'MarketOwn': 0, 'Total Income-Restricted': 150, 'Income-Restricted Ownership': 0}, {'Neighborhood': 'Mattapan', 'TtlMarket': 0, 'MarketOwn': 0, 'Total Income-Restricted': 50, 'Income-Restricted Ownership': 0}], 'Neighborhood' 

The function should return a dict of strings to ints: {'Roxbury': 2, 'Mattapan': 1}
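Counting per neighborhood amounts to keeping a dictionary of running counts, keyed by name. The following is a sketch of that idea using dict.get with a default of 0, so a name that has not been seen yet starts from zero; other approaches (such as checking `if name in counts`) work just as well.

```python
def count_neighborhoods(rows, neighborhood_column):
    """Count how many rows (buildings) appear for each neighborhood name."""
    counts = {}
    for row in rows:
        name = row[neighborhood_column]
        # get() returns 0 the first time a name is seen
        counts[name] = counts.get(name, 0) + 1
    return counts


# The example rows from the handout, trimmed to the relevant column
rows = [
    {"Neighborhood": "Roxbury"},
    {"Neighborhood": "Roxbury"},
    {"Neighborhood": "Mattapan"},
]
print(count_neighborhoods(rows, "Neighborhood"))
```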

● Function #2: plot_neighborhood_bar

● Parameters: a dictionary of strings to ints, a list of the base string color names

● Returns: nothing

● Does: creates, saves, and displays a bar chart of the neighborhoods in this dataset. Your plot will have one bar per neighborhood with height corresponding to the number of buildings with income-restricted units in that neighborhood. Use the list of string color names to color each bar in a cycle. Notice that the list of colors is not the same length as the number of neighborhoods. You may find that defining a function to create the full list of colors you need provides nice structure.

Ex. Given the following arguments as described above – a dictionary of strings to ints and a list of string color names

{'Roxbury': 2, 'Mattapan': 3, 'Jamaica Plain': 4, 'Dorchester': 1, 'Mission Hill': 7}, ['red', 'blue', 'yellow']

You would produce a bar chart where the bars for Roxbury and Dorchester are red, Mattapan and Mission Hill are blue, and Jamaica Plain is yellow.
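The color cycling described above can be sketched with the modulo operator: bar number i takes the color at index i % (number of base colors), so the list wraps around when it runs out. The helper name cycle_colors is hypothetical.

```python
def cycle_colors(base_colors, count):
    """Build a list of `count` colors by repeating base_colors in order."""
    return [base_colors[i % len(base_colors)] for i in range(count)]


colors = cycle_colors(["red", "blue", "yellow"], 5)
print(colors)  # ['red', 'blue', 'yellow', 'red', 'blue']
```

A list like this can be passed to plt.bar through its color parameter, which accepts one color per bar.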

You'll need to use plt.savefig('neighborhood_counts.png', bbox_inches="tight") for the x-value labels not to get cut off.

Run your code on the large data set and save this plot as neighborhood_counts.png to turn in.

Problem 3 - Income-Restricted Neighborhood Units

For the final problem, we’re interested in seeing how many income-restricted units are in some neighborhoods. Problem two showed us where buildings with income-restricted units are located, but how many units are in these neighborhoods?

● Function #1: sum_units

● Parameters: a list of dicts, a string name of neighborhood column, a string name of target unit column

● Returns: dict of strings to ints

● Does: finds the total number of units within each neighborhood for the given units column

Ex. Given the following arguments as described above – a list of dicts, a string neighborhood column name, and a string units column name

[{'Neighborhood': 'Roxbury', 'TtlMarket': 0, 'MarketOwn': 0, 'Total Income-Restricted': 16, 'Income-Restricted Ownership': 0}, {'Neighborhood': 'Roxbury', 'TtlMarket': 52, 'MarketOwn': 0, 'Total Income-Restricted': 150, 'Income-Restricted Ownership': 0}, {'Neighborhood': 'Mattapan', 'TtlMarket': 0, 'MarketOwn': 0, 'Total Income-Restricted': 50, 'Income-Restricted Ownership': 0}], 'Neighborhood', 'Total Income-Restricted' 

The function should return a dict of strings to ints: {'Roxbury': 166, 'Mattapan': 50}
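This is the same running-dictionary idea as counting buildings, except that each row contributes its unit count instead of 1. A minimal sketch, assuming the units column has already been cleaned to ints:

```python
def sum_units(rows, neighborhood_column, units_column):
    """Total the values in units_column separately for each neighborhood."""
    totals = {}
    for row in rows:
        name = row[neighborhood_column]
        # Start from 0 the first time a neighborhood is seen
        totals[name] = totals.get(name, 0) + row[units_column]
    return totals


# The example rows from the handout, trimmed to the relevant columns
rows = [
    {"Neighborhood": "Roxbury", "Total Income-Restricted": 16},
    {"Neighborhood": "Roxbury", "Total Income-Restricted": 150},
    {"Neighborhood": "Mattapan", "Total Income-Restricted": 50},
]
print(sum_units(rows, "Neighborhood", "Total Income-Restricted"))
```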

● Function #2: plot_comparison_bar

● Parameters: a dictionary of strings to ints for the first set of bars, a dictionary of strings to ints for the second set of bars, a string label 1, a string label 2

● Returns: nothing

● Does: creates, saves, and displays a bar chart of the numbers of different kinds of units in different Boston neighborhoods. Your plot will have two bars per neighborhood, and you should place the bars next to each other. Displayed values on the x-axis should be neighborhood names. Colors are up to you!

Use this function to plot the total number of market units per neighborhood versus the total number of income-restricted units per neighborhood. Read the hints below!

Ex. Given the following arguments as described above – two dictionaries of strings to ints and two string labels

{'Roxbury': 166, 'Mattapan': 50}, {'Roxbury': 200, 'Mattapan': 25}, 'Market Rate', 'Income-Restricted'

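Your code should produce something like the following (example image not reproduced here): a bar chart with two adjacent bars per neighborhood, one for each label.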

Hint #1: to set the position of a bar in a bar chart, you'll want to use numeric x-values, even though you want string labels in the end

Hint #2: to re-set the x labels of a bar chart, use the tick_label optional parameter for plt.bar, which takes a list that is the same length as the number of points given

Hint #3: to control the width of a bar, use the width optional parameter for plt.bar

Hint #4: get the chart working for just one of the given dictionaries first

Hint #5: if the x-values of both sets of bars are the same, the bars will be plotted on top of each other
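The hints above can be combined into a rough sketch: give each neighborhood a numeric x-value (0, 1, 2, ...), then shift one set of bars left and the other right by half a bar width so they sit side by side rather than on top of each other. The names side_by_side_positions and draw_two_series are hypothetical, the sketch assumes both dictionaries have the same keys, and it labels the axis with plt.xticks (an alternative to the tick_label parameter from Hint #2) so each name sits under the middle of its pair.

```python
def side_by_side_positions(count, width):
    """Return x positions for two bars per group, centered on 0, 1, 2, ..."""
    left = [i - width / 2 for i in range(count)]
    right = [i + width / 2 for i in range(count)]
    return left, right


def draw_two_series(first, second, label_1, label_2, filename, width=0.4):
    """Draw two adjacent bars per key; assumes both dicts share the same keys."""
    # Imported here only so the position helper above runs without matplotlib.
    import matplotlib.pyplot as plt

    names = list(first.keys())
    left, right = side_by_side_positions(len(names), width)
    plt.bar(left, [first[name] for name in names], width=width, label=label_1)
    plt.bar(right, [second[name] for name in names], width=width, label=label_2)
    # Put one name under the middle of each pair of bars.
    plt.xticks(list(range(len(names))), names, rotation=90)
    plt.legend()
    plt.savefig(filename, bbox_inches="tight")
    plt.show()


# With a width of 0.4, the two bars in each pair touch without overlapping
left, right = side_by_side_positions(2, 0.4)
print(left, right)
```

Calling draw_two_series with the two example dictionaries, the two labels, and a filename such as 'neighborhood_units.png' would save and display the chart. Remember that the style guide has its own requirements for titles and axis labels, which this sketch leaves out.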

Save your plots as .png files; do not submit screenshots. Make sure you review the style guide for the specific requirements that apply to your plots!
