MKTG 4110 – Research and Analytics, Spring 2024
Homework 2 (due by 2 pm, 29-Feb)
Case: Amazon Consumers
This case pertains to Amazon consumers, some of whom have a Prime subscription and others do not. The data tracks a small subset of consumers (each row in the data is a unique consumer) and shows their purchase behavior across a few categories. The data also includes information on each consumer’s demographics.
The Amazon Dataset (csv) Description
Variable name | Type | Description
ID | integer | Customer ID
Columns 2 - 18: apps, automotive, beauty, books, clothing, cycling, electronics, exercise, music, furniture, garden, jewelry, luggage, movies, petsupplies, powertools, wine | logical | Whether the customer has purchased from the category of this name
Age | integer | Consumer age in completed years
Sex | factor | Consumer gender
MaritalStatus | factor | Consumer marital status
New | factor | Whether the consumer is new to Amazon
Prime | factor | Whether the consumer is a Prime subscriber
AmazonSpend | numeric | Dollar value of the consumer's cumulative spend
Discounts | numeric | Dollar value of the discounts the consumer has received
Cluster | factor | Cluster to which the consumer belongs
QUESTIONS (answer these questions using the 2210 customer sample data - Amazon.csv)
1. Which category do consumers in this sample buy the most of? (0.5 point)
RStudio Hint: summary()
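As a rough illustration of the hint, the sketch below counts buyers per category on a tiny made-up data frame (the values are invented and are not from Amazon.csv). On the real data you would presumably apply the same calls to the category columns only, for example dat[, 2:18], assuming the data was loaded as dat.

```r
# Toy data, made up for illustration only (not the Amazon.csv sample)
toy <- data.frame(
  books  = c(TRUE, TRUE, FALSE, TRUE),
  movies = c(FALSE, TRUE, FALSE, FALSE),
  wine   = c(TRUE, FALSE, FALSE, FALSE)
)

summary(toy)               # for logical columns: counts of FALSE and TRUE
buyers <- colSums(toy)     # number of TRUE values (buyers) in each column
sort(buyers, decreasing = TRUE)
top <- names(which.max(buyers))  # category with the most buyers in the toy data
top
```

colSums() is used here as a compact alternative to reading the counts off summary(); either approach should give the same ranking.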
2. What percent of people who purchased jewelry and powertools were men? (0.5 point)
RStudio Hint: table(), use $ sign to grab variables, use == for comparison
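The hint can be sketched on made-up data as follows. Two assumptions are made here: the question is read as customers who bought both categories, and the labels "Male" and "Female" are hypothetical (check the actual levels of Sex in your data with table(dat$Sex)).

```r
# Toy data, made up for illustration; the Sex labels are an assumption
toy <- data.frame(
  jewelry    = c(TRUE, TRUE, FALSE, TRUE, TRUE),
  powertools = c(TRUE, FALSE, TRUE, TRUE, TRUE),
  Sex        = c("Male", "Female", "Male", "Female", "Male")
)

# TRUE for rows where both categories were purchased
both <- toy$jewelry == TRUE & toy$powertools == TRUE

counts <- table(toy$Sex[both])    # counts by gender among those buyers
pct <- 100 * prop.table(counts)   # convert counts to percentages
pct
```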
3. Calculate the mean (mean), standard deviation (sd), and sum (sum) of the total dollar Amazon spend (AmazonSpend) by marital status (MaritalStatus). (0.5 point)
RStudio Hint: aggregate()
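One possible way to get several statistics from a single aggregate() call is to pass a function that returns a named vector. The sketch below uses made-up values and hypothetical marital status labels; calling aggregate() three times, once per statistic, would work just as well.

```r
# Toy data, made up for illustration; the status labels are an assumption
toy <- data.frame(
  MaritalStatus = c("Married", "Single", "Married", "Single"),
  AmazonSpend   = c(100, 50, 300, 150)
)

# One row per marital status; the AmazonSpend column of the result
# is a matrix with columns mean, sd and sum
res <- aggregate(AmazonSpend ~ MaritalStatus, data = toy,
                 FUN = function(x) c(mean = mean(x), sd = sd(x), sum = sum(x)))
res
```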
4. Select all the people who have purchased both books (books) and movies (movies). What is the mean discount (Discounts) received by these people? (0.5 point)
RStudio Hint: use $ sign to grab variables, use == for comparison, mean()
5. Perform a t-test for one sample mean. Specifically, test whether the mean buyer discount (Discounts) is significantly different from $15 (at the 5% significance level). Comment on the test result. Can you reject the null hypothesis? Report the t-stat, p-value and confidence interval. (0.5 point)
RStudio Hint: t.test()
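For reference, a one-sample t-test in R might look like the sketch below. The discount values are invented for illustration, so the printed numbers say nothing about the Amazon sample.

```r
# Made-up discount values, for illustration only
discounts <- c(12, 18, 14, 16, 20, 11, 15, 17)

# H0: the population mean equals 15; two-sided test at the 5% level
res <- t.test(discounts, mu = 15, alternative = "two.sided", conf.level = 0.95)

res$statistic   # t-stat
res$p.value     # p-value
res$conf.int    # 95% confidence interval for the mean
```

If the p-value is below 0.05 (equivalently, if the confidence interval excludes 15), the null hypothesis is rejected at the 5% level.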
6. Perform a t-test for the difference between two sample means. Specifically, test whether mean customer spend (AmazonSpend) is significantly different between the Amazon Prime and non-Prime groups (Prime) (at the 5% significance level). State the null hypothesis. Comment on the test result. Report the t-stat, p-value and confidence interval. (0.75 point)
RStudio Hint: t.test()
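A two-sample comparison can be sketched with the formula interface of t.test(). The data below is made up, and the group labels "Yes" and "No" are an assumption about how Prime is coded. Note that R runs the Welch (unequal variance) version by default; set var.equal = TRUE if the class notes call for the pooled-variance test.

```r
# Toy data, made up for illustration; the Prime labels are an assumption
toy <- data.frame(
  Prime       = c("Yes", "Yes", "Yes", "Yes", "No", "No", "No", "No"),
  AmazonSpend = c(220, 250, 240, 270, 120, 150, 110, 140)
)

# H0: mean spend is the same in both groups (difference in means = 0)
res <- t.test(AmazonSpend ~ Prime, data = toy)  # Welch test by default

res$estimate    # group means (groups in alphabetical order: No, Yes)
res$statistic   # t-stat
res$p.value     # p-value
res$conf.int    # 95% CI for the difference in means (No minus Yes)
```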
7. Estimate the correlation coefficient between customers' total spending (AmazonSpend) and the total discount received (Discounts). Test whether the correlation coefficient is significantly different from zero (at the 5% significance level). State the null hypothesis. Comment on the test result. Report the t-stat, p-value and confidence interval. (0.75 point)
RStudio Hint: cor.test()
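As a minimal sketch of the hint, cor.test() returns the estimated correlation together with the test of H0: correlation = 0. The two vectors below are invented for illustration.

```r
# Made-up spend and discount values, for illustration only
spend <- c(100, 150, 200, 250, 300, 350)
disc  <- c(5, 9, 8, 14, 13, 18)

# Pearson correlation with a two-sided test of H0: correlation = 0
res <- cor.test(spend, disc)

res$estimate    # correlation coefficient
res$statistic   # t-stat (df = n - 2)
res$p.value     # p-value
res$conf.int    # 95% confidence interval for the correlation
```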
8. Generate a cross-tabulation of the two variables gender (Sex) and books (books). Perform a chi-square test. State the null hypothesis. Interpret the result and comment on the association between the two variables. (0.5 point)
RStudio Hint: table(), chisq.test()
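The two hinted functions can be combined as in the sketch below, which uses made-up counts and hypothetical gender labels. For a 2 x 2 table, chisq.test() applies Yates' continuity correction by default; pass correct = FALSE if the uncorrected statistic is wanted.

```r
# Toy data, made up for illustration; the Sex labels are an assumption
toy <- data.frame(
  Sex   = rep(c("Female", "Male"), times = c(50, 50)),
  books = c(rep(c(TRUE, FALSE), times = c(35, 15)),   # 35 of 50 women bought books
            rep(c(TRUE, FALSE), times = c(20, 30)))   # 20 of 50 men bought books
)

tab <- table(toy$Sex, toy$books)   # cross-tabulation: rows = Sex, columns = books
tab

# H0: Sex and books are independent (no association)
res <- chisq.test(tab)
res$statistic   # chi-square statistic
res$parameter   # degrees of freedom
res$p.value     # p-value
```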
9. Plot the relationship between customers' age (Age) and total spending (AmazonSpend). Do you see any relationship? (0.5 point)
RStudio Hint: plot()
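A basic scatter plot could be drawn as sketched below, again on made-up values. The fitted line from lm() is an optional extra that is not required by the hint; it simply makes any linear trend easier to see.

```r
# Toy data, made up for illustration only
toy <- data.frame(
  Age         = c(22, 30, 41, 55, 63),
  AmazonSpend = c(120, 180, 260, 310, 400)
)

# Scatter plot with Age on the x-axis and AmazonSpend on the y-axis
plot(toy$Age, toy$AmazonSpend,
     xlab = "Age (years)", ylab = "Amazon spend ($)",
     main = "Spend vs. age (toy data)")

# Optional: overlay a least-squares line to highlight the trend
fit <- lm(AmazonSpend ~ Age, data = toy)
abline(fit)
```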
General guide to getting started with the case:
Prepare the .R script file (template) and data
• Download the template (Homework 2_Amazon_template.r) and the csv data file (Amazon.csv) to your desktop.
Open and clear RStudio
• Session -> Clear workspace -> Yes. Or type: rm(list=ls())
Set working directory
• Session -> Set working directory -> Choose directory -> Select desktop -> Open. To check the directory, type: getwd()
Open .r script file
• File -> Open file -> locate the .r script template file.
• NOTE: Don’t forget to save your .r script file as you write your code in the template file. To save, press ctrl + s (command + s on Mac).
Open .csv data
• Locate the file in the Files tab -> Click on the file -> Select Import Dataset… -> change the data name if you like -> click "yes" for heading -> press import;
• or, if the data is placed in the working directory, type: dat = read.csv("Amazon.csv", header=TRUE);
• or File -> Import dataset -> From text (base) -> locate the .csv file -> change the data name if you like -> click "yes" for heading -> press import.
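The loading step above can be sketched in a self-contained way by first writing a tiny made-up csv to a temporary folder and then reading it back. The file name toy.csv and its contents are invented for illustration; with the real data you would call read.csv("Amazon.csv", header=TRUE) from the working directory instead.

```r
# Write a tiny made-up csv to a temporary folder so the example runs anywhere
path <- file.path(tempdir(), "toy.csv")
write.csv(data.frame(ID    = 1:3,
                     books = c(TRUE, FALSE, TRUE),
                     Age   = c(25L, 40L, 31L)),
          path, row.names = FALSE)

# Read it back; header = TRUE uses the first row as the column names
dat <- read.csv(path, header = TRUE)

str(dat)    # check the column types (TRUE/FALSE columns are read as logical)
head(dat)   # look at the first few rows
```

Running str() right after the import is a quick way to confirm that the columns arrived with the types listed in the dataset description.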
Overall hints, assistance, and submission guidance:
• Refer to class 10 contents: Bookbinder Case (pdf & script file).
• Refer to class notes (8, 9 & 10) for statistical inference.
• Rather than jumping straight into the analysis, it helps to first have a firm understanding of the class material (statistical tests, execution of the .R code, interpretations, etc.).
• Provide a Word or pdf write-up. Don’t forget to include the R code and output in your file. Feel free to copy and paste (as images) any R results. Attach the code you used at the end of your write-up as an appendix. DUE: 2pm, Feb 29 (Class 13)
• Use your resources (office hours, review sessions, emails, mediasite recording) as much as possible. Start early.
• Email me ([email protected]) with your code if you have any technical difficulties (don’t