APAN5205: Applied Analytics Framework & Methods II

Group 9

February 27th, 2024

INTRODUCTION

In the heart of New York City, Manhattan's real estate market holds a central place in the global economy and carries considerable cultural significance. Characterized by its dense urban landscape and skyline, Manhattan's property sales prices reflect the city's economic success and the socio-economic dynamics of its residents. As urbanization continues, understanding the factors that drive real estate values in such a dynamic market is highly important to investors, policymakers, and urban planners alike. Since COVID-19, real estate prices have increased strongly, with Manhattan standing out because of its popularity and scarce space. This has led to growing interest among researchers and practitioners in the factors that influence property sales prices in Manhattan.

LITERATURE REVIEW

The Manhattan real estate market is complex because many factors influence property sales prices. Li & Neal (2019) demonstrated the role of physical attributes, e.g., square footage and building age, in determining property prices, and found a clear preference for modern, spacious environments that cater to the demands of urban living. Ng (2020) complements this focus on property characteristics by emphasizing the significant influence of location on property prices, which suggests that a neighborhood's social and economic context is key in shaping real estate values in Manhattan.

Additionally, Mulheirn & Menzies (2022) found a seasonal pattern in real estate transactions, with peaks during certain times of the year, which illustrates the market's sensitivity to seasonal changes. Meanwhile, investigations into the impact of amenities and sustainability (Bernstein et al., 2022; Zhong & Li, 2021) show that luxury features and environmental certifications can enhance property appeal, although they may also pose a risk to the market if overemphasized. Taken together, these studies offer a first view of the Manhattan real estate market and underscore how many factors influence its property prices.

RESEARCH PROBLEM

Building on this literature, this study investigates the influence of various building and neighborhood characteristics on the sales prices of properties in Manhattan. Specifically, we aim to show how factors such as square footage, building age, building class, and neighborhood location (e.g., ZIP code) contribute to sales price dynamics in this area. Moreover, we explore the temporal aspect of sales, investigating whether seasonal trends significantly impact property valuations in Manhattan.

Research Question 1: How does the age of buildings correlate with their sales prices in Manhattan? We hypothesize that newer buildings in Manhattan are associated with higher sales prices, reflecting a premium on modern amenities, architectural designs, and fewer maintenance concerns.

Research Question 2: What is the relationship between the square footage of buildings and their sales prices in Manhattan? We expect that buildings with greater square footage in Manhattan are sold at higher prices since larger living spaces are highly sought after in the densely populated urban environment as they offer more comfort and utility to the occupants.

Research Question 3: Is there a significant difference in the sales prices of Manhattan properties across different neighborhoods? We hypothesize that properties located in specific neighborhoods of Manhattan (e.g., SoHo, West Village, …) exhibit higher sales prices than those in other neighborhoods, driven by factors such as safety, prestige, and accessibility.

Research Question 4: Do the sales prices of properties in Manhattan exhibit a seasonal pattern throughout the year? We suggest that there exists a seasonal trend in the sales prices of Manhattan properties, with certain times of the year showing elevated sales prices due to variations in demand (e.g., August when students move to the city), market dynamics, and buyer sentiment.

DATA DESCRIPTION

For our study, we use a dataset from the New York City Department of Finance that captures property sales in Manhattan over twelve months. The dataset includes variables such as neighborhood, building class category, tax class, block, lot, building class at present, residential and commercial units, land and gross square feet, and year built, along with the sale price and sale date. These variables are important for analyzing Manhattan's complex real estate dynamics and provide insights into how factors like property use, location, size, and age influence sales prices. Moreover, the breadth of the dataset supports the analysis of market trends and the prediction of property prices.

To clarify the dataset's contents for our analysis, a glossary of terms can be found at: https://www.nyc.gov/site/finance/property/glossary-property-sales.page. For instance, 'Borough' and 'Neighborhood' denote the property's location, 'Building Class Category' and 'Tax Class at Present' classify properties by their use and tax implications, and 'Block' and 'Lot' provide a unique identifier for real estate parcels. Additionally, 'Building Class at Present' and 'Building Class at Time of Sale' describe the property's use and structure, 'Sales Price' and 'Sale Date' record the transaction details, and 'Land Square Feet' and 'Gross Square Feet' measure the property's size.

DATA PREPARATION

We started the data preparation process by excluding the initial four rows when reading the file with the “readxl” library, so that only the relevant data remained. This step was followed by a multi-step approach involving numerical and categorical data handling, outlier management, and data transformation to correct skewness and address missing values:

First, we began with basic data exploration using packages like “skimr” for summarizing the dataset. Key steps included renaming variables for consistency, i.e., removing white space in column names, and conducting initial visualizations to understand the distribution of sale prices. These showed a right-skewed pattern, which was further confirmed through skewness analysis.
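The skewness check described above can be illustrated with a minimal sketch. It is written in Python with only the standard library, not with the R packages used in this study, and the prices are hypothetical values chosen for illustration. It computes the moment-based skewness coefficient, which is positive when the distribution has a long right tail.

```python
def sample_skewness(values):
    """Return the moment-based skewness g1 = m3 / m2**1.5 of a numeric sequence."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    return m3 / m2 ** 1.5

# Hypothetical sale prices in USD (not taken from the dataset):
# one very expensive sale creates a long right tail.
prices = [450_000, 520_000, 610_000, 700_000, 850_000, 1_200_000, 9_500_000]

skew = sample_skewness(prices)
print(f"skewness = {skew:.2f}")  # a positive value indicates right skew
```

A value near zero would indicate a roughly symmetric distribution; a clearly positive value is one common motivation for a log transformation later in the pipeline.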

A critical aspect of our preparation involved addressing the dataset's structure and content. We removed irrelevant columns such as “BOROUGH” (since all data pertained to Manhattan) and “EASEMENT” (lacking significant data), among others, to streamline the dataset. Furthermore, we derived new variables like “SEASON” to capture seasonal trends in the sales data, which enhanced the dataset's analytical depth. Converting 'SALE_DATE' into a date object and deriving 'SALE_YEAR' from it allowed for further analysis of property sales.
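Deriving 'SALE_YEAR' and “SEASON” from a sale date can be sketched as follows. This is a Python illustration rather than the R code used in this study; the month-to-season mapping (meteorological seasons) is an assumption, since the report does not state which boundaries were used, and the function name `derive_sale_features` is hypothetical.

```python
from datetime import date

# Assumed meteorological seasons; the actual boundaries used in the study are not stated.
SEASON_BY_MONTH = {
    12: "Winter", 1: "Winter", 2: "Winter",
    3: "Spring", 4: "Spring", 5: "Spring",
    6: "Summer", 7: "Summer", 8: "Summer",
    9: "Fall", 10: "Fall", 11: "Fall",
}

def derive_sale_features(sale_date):
    """Map a sale date to its (SALE_YEAR, SEASON) pair."""
    return sale_date.year, SEASON_BY_MONTH[sale_date.month]

# Hypothetical sale date for illustration.
sale_year, season = derive_sale_features(date(2023, 8, 15))
print(sale_year, season)  # 2023 Summer
```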

We handled categorical data in particular detail. We created a dummy variable for luxury streets to investigate their impact on sales prices, and we applied binning to various categorical variables to reduce complexity and facilitate analysis. Missing and empty data were handled by imputing missing values for “TOTAL_UNITS” and “YEAR_BUILT” using the “mice” package and by removing columns and rows with substantial missing data.
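The luxury-street dummy variable can be illustrated with a small Python sketch (not the R code used in this study). The street list below is hypothetical, because the report does not name the streets that were flagged, and the function name and example addresses are likewise invented for illustration.

```python
# Hypothetical list; the streets actually flagged as luxury are not named in the report.
LUXURY_STREETS = ("FIFTH AVENUE", "PARK AVENUE", "CENTRAL PARK WEST")

def luxury_street_dummy(address):
    """Return 1 if the address mentions a street in LUXURY_STREETS, else 0."""
    normalized = address.upper()
    return int(any(street in normalized for street in LUXURY_STREETS))

# Hypothetical addresses for illustration.
print(luxury_street_dummy("740 Park Avenue, 12A"))   # 1
print(luxury_street_dummy("123 East 10th Street"))   # 0
```

A simple substring match like this also flags variants such as "Park Avenue South", so a real implementation would need a stricter matching rule if that distinction matters.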

We also analyzed outliers, identifying and managing extreme sale price values to protect the dataset's analytical integrity and improve the reliability of our findings. Subsequent correlation analysis and visualization helped us identify the most significant predictors for sale prices, enabling a focused approach to modeling.
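One common way to flag extreme sale prices is Tukey's interquartile-range (IQR) rule. The report does not specify which outlier rule was applied, so the following Python sketch (not the R code used in this study) is only an assumed example of the technique, with hypothetical prices and the conventional multiplier k = 1.5.

```python
import statistics

def iqr_bounds(values, k=1.5):
    """Return the (lower, upper) fences Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# Hypothetical sale prices in USD (not taken from the dataset).
prices = [450_000, 520_000, 610_000, 700_000, 850_000, 1_200_000, 9_500_000]

lower, upper = iqr_bounds(prices)
kept = [p for p in prices if lower <= p <= upper]
print(f"kept {len(kept)} of {len(prices)} sales")
```

On a strongly right-skewed variable the lower fence is often negative, so this rule mainly trims the upper tail; very low nominal sale prices would need a separate threshold.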

In our final steps, we performed label encoding on categorical variables and a log transformation on skewed variables to normalize their distributions, specifically addressing the high skewness (>30) of the “COMMERCIAL_UNITS” variable. The result of our data preparation is a clean and structured dataset that forms the foundation for our predictive modeling and analysis, which aims to shed light on the factors influencing property sale prices in Manhattan.
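The two final transformations can be sketched in Python as follows (again an illustration, not the R code used in this study). The sketch assumes a log(1 + x) transform, which stays defined when a count such as “COMMERCIAL_UNITS” is zero; the report only says "log transformation", so the exact variant is an assumption. The unit counts and building class labels are hypothetical.

```python
import math

def log1p_transform(values):
    """Apply log(1 + x), which is defined at zero and compresses large values."""
    return [math.log1p(v) for v in values]

def label_encode(categories):
    """Map each distinct category to an integer code, assigned in sorted order."""
    codes = {cat: i for i, cat in enumerate(sorted(set(categories)))}
    return [codes[c] for c in categories], codes

# Hypothetical commercial unit counts: mostly zero, with one very large value.
units = [0, 0, 0, 1, 2, 250]
print([round(u, 2) for u in log1p_transform(units)])

# Hypothetical building class labels.
encoded, mapping = label_encode(["R4", "D4", "R4", "C6"])
print(encoded, mapping)
```

Label encoding in sorted order imposes an arbitrary ordering on the categories, which is acceptable for tree-based models but can mislead linear models that treat the codes as magnitudes.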





