Note: Please be creative in defining new variables as part of the data manipulation, and write a description at the end of each code cell as a comment. We will read your logic and description for the assessment.
Part 1: 75 points (85 points with the extra credits in the Bonus Question)
Question 2
Part (a): Create a new column called "Recommendation", which is how well the property is recommended:
For ‘starrating’ of 5: Highly Recommended
For ‘starrating’ of 4 and above (but below 5): Great Value
For ‘starrating’ of less than 4: Meh
Part (b): Which country receives the largest number of ‘Highly Recommended’ and ‘Great Value’ recommendations?
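The labelling rule and the per-country count can be sketched on toy data as follows. This is only an illustration, not the expected solution: the "country" column name and the sample rows are assumptions, and the conditions are checked in order so that a rating of 5 is labelled before the "4 and above" rule applies.

```python
import numpy as np
import pandas as pd

# Toy data; "country" is an assumed column name, "starrating" comes from the question.
df = pd.DataFrame({
    "country": ["US", "US", "FR", "FR", "FR", "JP"],
    "starrating": [5.0, 3.5, 4.0, 4.5, 5.0, 2.0],
})

# np.select takes the first condition that matches, so 5 is tested before ">= 4".
conditions = [df["starrating"] == 5, df["starrating"] >= 4]
labels = ["Highly Recommended", "Great Value"]
df["Recommendation"] = np.select(conditions, labels, default="Meh")

# Count rows carrying either of the two positive labels, per country.
top = (
    df[df["Recommendation"].isin(labels)]
    .groupby("country")
    .size()
    .sort_values(ascending=False)
)
print(top)
```

Whether the count should be over rows or over distinct properties depends on how the real dataset is structured, so treat the row count here as one possible reading.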
Question 4
Part (a): Some properties have abnormal values of 0 in “onsiteprice”. To clean the data, create a new column “replaced onsiteprice” that keeps each property's original non-zero “onsiteprice” and replaces each zero with the median of that same property's non-zero “onsiteprice” values.
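One way to read this replacement rule is sketched below on toy data. The "propertyid" column is a hypothetical identifier (the real dataset may name it differently), and the approach shown is just one option: mask the zeros, compute each property's median over what remains, and fill the gaps.

```python
import pandas as pd

# Toy data; "propertyid" is a hypothetical id column.
df = pd.DataFrame({
    "propertyid": [1, 1, 1, 2, 2, 2],
    "onsiteprice": [100.0, 0.0, 200.0, 0.0, 50.0, 0.0],
})

# Turn zeros into NaN so they are ignored when the median is computed.
nonzero = df["onsiteprice"].where(df["onsiteprice"] != 0)

# Per-property median of the non-zero prices, broadcast back to every row.
medians = nonzero.groupby(df["propertyid"]).transform("median")

# Keep original non-zero prices; fill former zeros with the property's median.
df["replaced onsiteprice"] = nonzero.fillna(medians)
print(df)
```

A property whose prices are all 0 has no non-zero median, so its rows stay NaN in this sketch; how to handle that case is left open.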
Part (b): For each property, calculate the mean and variance of “replaced onsiteprice”, and store them in two columns named “Mean” and “Variance”. Then create a column named “Standardized Mean” to store the standardized form of the “Mean” column.
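A minimal sketch of the per-property statistics, again on toy data with a hypothetical "propertyid" column. Two choices here are assumptions rather than requirements: the variance is the sample variance (pandas' default, ddof=1), and "standardized" is taken to mean a z-score computed over the rows of the "Mean" column.

```python
import pandas as pd

# Toy data; "propertyid" is a hypothetical id column.
df = pd.DataFrame({
    "propertyid": [1, 1, 2, 2, 3, 3],
    "replaced onsiteprice": [100.0, 200.0, 50.0, 50.0, 300.0, 400.0],
})

grouped = df.groupby("propertyid")["replaced onsiteprice"]
df["Mean"] = grouped.transform("mean")     # per-property mean on every row
df["Variance"] = grouped.transform("var")  # sample variance (ddof=1)

# Z-score of the "Mean" column: subtract its mean, divide by its standard deviation.
df["Standardized Mean"] = (df["Mean"] - df["Mean"].mean()) / df["Mean"].std()
print(df)
```

If each property should count once rather than once per row, the z-score could instead be computed on a one-row-per-property table and merged back.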
Question 5
Part (a): A party of four is planning a trip. How many available hotels offer a room with a “maxoccupancy” of 4 or 2? Available hotels are those whose “propertype” is “Hotels”, “close” is “N”, and “hotelblock” is not “sold out”.
Part (b): If this party does not want to pay more than 230 per night on average for a room, how many hotels are still available? Because prices fluctuate, compare the mean of “replaced onsiteprice” with 230.
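The availability filter and the price cap can be sketched as below. The toy rows, the "propertyid" column, and the "open" value of “hotelblock” are assumptions made for illustration; only the column names quoted in the question come from the assignment. The mean is taken over the rows that pass the filter, which is one possible interpretation.

```python
import pandas as pd

# Toy data; "propertyid" and the "open" value are hypothetical.
df = pd.DataFrame({
    "propertyid": [1, 1, 2, 3, 4, 5],
    "propertype": ["Hotels", "Hotels", "Hotels", "Hostels", "Hotels", "Hotels"],
    "close": ["N", "N", "N", "N", "Y", "N"],
    "hotelblock": ["open", "open", "sold out", "open", "open", "open"],
    "maxoccupancy": [2, 2, 4, 4, 2, 4],
    "replaced onsiteprice": [200.0, 240.0, 100.0, 90.0, 150.0, 300.0],
})

# Part (a): combine the availability conditions with the occupancy condition.
available = df[
    (df["propertype"] == "Hotels")
    & (df["close"] == "N")
    & (df["hotelblock"] != "sold out")
    & (df["maxoccupancy"].isin([2, 4]))
]
n_available = available["propertyid"].nunique()  # count hotels, not rows

# Part (b): average price per hotel, then keep hotels at or below 230.
mean_price = available.groupby("propertyid")["replaced onsiteprice"].mean()
n_affordable = int((mean_price <= 230).sum())
print(n_available, n_affordable)
```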
Bonus Question:
Merge data, filter, groupby, merge three times
Part (a): For each zip code, find the most expensive property by using “replaced onsiteprice”. Provide id, name, rating, city, country, zip code, address, and average “replaced onsiteprice” of these properties.
Part (b): For each zip code, find the cheapest property by using “replaced onsiteprice”. Provide id, name, rating, city, country, zip code, address, and average “replaced onsiteprice” of these properties.
Hint: Each country has a number of hotels, and each hotel has a number of prices due to price fluctuation. You need to find the average “replaced onsiteprice” for each hotel first, and then pick out the cheapest and the most expensive hotels.
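The hint describes an average-then-select pattern, which might look like the following on toy data. The two frames and the column names "propertyid", "propertyname", and "zipcode" are hypothetical stand-ins for the real datasets; the remaining requested fields (rating, city, country, address) would be carried along by the same merge.

```python
import pandas as pd

# Toy property table and price table; all names except
# "replaced onsiteprice" are hypothetical.
props = pd.DataFrame({
    "propertyid": [1, 2, 3, 4],
    "propertyname": ["A", "B", "C", "D"],
    "zipcode": ["10001", "10001", "75001", "75001"],
})
prices = pd.DataFrame({
    "propertyid": [1, 1, 2, 2, 3, 4],
    "replaced onsiteprice": [100.0, 200.0, 300.0, 500.0, 80.0, 60.0],
})

# Step 1: average price per property.
avg = prices.groupby("propertyid", as_index=False)["replaced onsiteprice"].mean()

# Step 2: attach the property details.
merged = props.merge(avg, on="propertyid", how="inner")

# Step 3: within each zip code, pick the row with the highest / lowest average.
by_zip = merged.groupby("zipcode")["replaced onsiteprice"]
most_expensive = merged.loc[by_zip.idxmax()]
cheapest = merged.loc[by_zip.idxmin()]
print(most_expensive)
print(cheapest)
```

Note that idxmax and idxmin return only the first matching row, so ties within a zip code would need separate handling.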
Part 2 (25 Points)
For this part, we look at the logic and how you solve the problems.
Part (a):
1- You need to find "5" interesting business questions based on the datasets. Please make sure that these questions are not similar to those of other groups...
2- Write Python code to answer the questions.
3- Visualize your results for each question.
Part (b):
Write a 300-word summary of your answers and the business insights you get from answering these 5 questions based on your code. Ensure that you have clearly explained why we should care about your questions and your results. Clearly explain your findings.
This part will be evaluated based on the following criteria:
1. You need to ask five business-relevant questions. (5 points)
2. You need to answer these five questions using Python and the two datasets. (5 points)
3. You need to have at least "5" graphs to visualize your insights. (6 points)
4. Your executive summary should be well-written. (6 points)
5. Your results and business insights should be interesting and meaningful. (3 points)
Note: You may use this cell to write your 5 questions
Question 1:
Question 2:
Question 3:
Question 4:
Question 5:
write here
Grading:
PART 1 - 75 points (85 points with the extra credits in the Bonus Question)
- Question 1: 9 points (6 points for part (a) and 3 points for part (b))
- Question 2: 15 points (9 points for part (a) and 6 points for part (b))
- Question 3: 12 points (9 points for part (a) and 3 points for part (b))
- Question 4: 21 points (9 points for part (a) and 12 points for part (b))
- Question 5: 18 points (9 points for part (a) and 9 points for part (b))
- Bonus Question: 10 points (extra credit): (8 points for part (a) and 2 points for part (b))
PART 2 - 25 points
- You need to ask five business-related questions (5 points).
- You need to answer these five questions using Python and the two datasets (5 points).
- You need to have at least "5" graphs to visualize your insights (6 points).
- Your executive summary should be well-written (6 points).
- Your results and business insights should be interesting and meaningful (3 points).
Good Luck!