MKTG 4110 – Research and Analytics, Spring 2024

Homework 2 (due by 2 pm, 29-Feb)

Case: Amazon Consumers

This case pertains to Amazon consumers, some of whom have a Prime subscription and others do not. The data tracks a small subset of consumers (each row in the data is a unique consumer) and shows their purchase behavior across a few categories. The data also includes information on each consumer's demographics.

The Amazon Dataset (csv) Description

Variable name | Type | Description
ID | integer | Customer ID
Columns 2 - 18: apps, automotive, beauty, books, clothing, cycling, electronics, exercise, music, furniture, garden, jewelry, luggage, movies, petsupplies, powertools, wine | logical | Has the customer purchased from Category by this name
Age | integer | Consumer age in completed years
Sex | factor | Consumer gender
MaritalStatus | factor | Consumer marital status
New | factor | Whether consumer is new to Amazon or not?
Prime | factor | Whether consumer is a prime subscriber or not?
AmazonSpend | numeric | Dollar value of the cumulative spend by the consumer
Discounts | numeric | Dollar value of the discounts consumer has received
Cluster | factor | Cluster to which consumer belongs to

QUESTIONS (answer these questions using the 2210 customer sample data - Amazon.csv)

1.    Which category do consumers in this sample buy the most of? (0.5 point)

RStudio Hint: summary()

2.    What percent of people who purchased jewelry and powertools were men? (0.5 point)

RStudio Hint: table(), use $ sign to grab variables, use == for comparison

3.    Calculate the mean (mean), standard deviation (sd), and sum (sum) of the total dollar Amazon spend (AmazonSpend) by marital status (MaritalStatus). (0.5 point)

RStudio Hint: aggregate()

4.    Select all the people who have purchased both books (books) and movies (movies). What is the mean discount (Discounts) received by these people? (0.5 point)

RStudio Hint: use $ sign to grab variables, use == for comparison, mean()

5.    Perform a t-test for one sample mean. Specifically, test whether the mean of the buyer discount (Discounts) is significantly different from $15 (at the 5% significance level). Comment on the test result. Can you reject the null hypothesis? Report the t-stat, p-value and confidence interval. (0.5 point)

RStudio Hint: t.test()

6.    Perform a t-test for the difference between two sample means. Specifically, test whether the mean customer spend (AmazonSpend) is significantly different between the Amazon Prime and non-Prime groups (Prime) (at the 5% significance level). State the null hypothesis. Comment on the test result. Report the t-stat, p-value and confidence interval. (0.75 point)

RStudio Hint: t.test()

7.    Estimate the correlation coefficient between customers' total spending (AmazonSpend) and the total discount received (Discounts). Test whether the correlation coefficient is significantly different from zero (at the 5% significance level). State the null hypothesis. Comment on the test result. Report the t-stat, p-value and confidence interval. (0.75 point)

RStudio Hint: cor.test()

8.    Generate a cross-tabulation of the 2 variables: gender (Sex) and books (books). Perform a chi-square test. State the null hypothesis. Interpret the result and comment on the association between the 2 variables. (0.5 point)

RStudio Hint: table(), chisq.test()

9.    Plot the relationship between customers' age (Age) and total spending (AmazonSpend). Do you see any relationship? (0.5 point)

RStudio Hint: plot()
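
Before running the hinted functions on Amazon.csv, it may help to try them on a tiny made-up data frame. The sketch below is illustrative only: the data frame toy and its columns (member, sex, bought, spend, rebate) are invented for this example and are not part of the assignment data, and the test values (such as mu = 10) are arbitrary.

```r
# Made-up toy data (NOT the Amazon sample); all names here are hypothetical.
toy <- data.frame(
  member = factor(c("yes", "yes", "yes", "yes", "no", "no", "no", "no")),
  sex    = factor(c("F", "M", "F", "M", "F", "M", "F", "M")),
  bought = c(TRUE, TRUE, FALSE, TRUE, FALSE, TRUE, FALSE, FALSE),
  spend  = c(120, 150, 130, 160, 80, 95, 70, 90),
  rebate = c(12, 18, 14, 20, 6, 9, 5, 8)
)

# summary(): for a logical column it counts the FALSE and TRUE values.
summary(toy$bought)

# table() with $ and ==: keep the buyers, then count them by sex.
# prop.table() converts the counts into shares.
buyers <- toy[toy$bought == TRUE, ]
prop.table(table(buyers$sex))

# aggregate(): one statistic per group (swap mean for sd or sum).
aggregate(spend ~ member, data = toy, FUN = mean)

# mean() on a filtered column.
mean(toy$rebate[toy$bought == TRUE])

# One-sample t-test. H0: the mean rebate equals 10.
one <- t.test(toy$rebate, mu = 10)

# Two-sample t-test (Welch by default). H0: both groups have the same mean spend.
two <- t.test(spend ~ member, data = toy)

# Correlation test. H0: the correlation between spend and rebate is zero.
ct <- cor.test(toy$spend, toy$rebate)

# Chi-square test on a cross-tabulation. H0: sex and bought are independent.
# The toy counts are tiny, so R would warn that the approximation may be poor.
xt  <- table(toy$sex, toy$bought)
chi <- suppressWarnings(chisq.test(xt))

# Each test returns a list; these are the pieces usually reported.
one$statistic   # t-stat
one$p.value     # p-value
one$conf.int    # confidence interval (95% by default)
```

Printing a test object (for example, typing two) shows the same statistic, p-value, and confidence interval in one formatted block, which is convenient to paste into a write-up.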


General guides on getting started with Case:

Prepare the .R script file (template) and data

• Download the template (Homework 2_Amazon_template.r) and csv data file (Amazon.csv) to desktop

Open and clear RStudio

• Session -> Clear workspace -> Yes. Or type: rm(list=ls())

Set working directory

• Session -> Set working directory -> Choose directory -> Select desktop -> Open; to check directory, type: getwd()

Open .r script file

• File -> Open file -> locate .r script template file

• NOTE: Don't forget to save your .r script file as you write your code in the template file. To save, press ctrl + s (command + s on Mac)

Open .csv data

• Locate the file in the Files tab -> click on the file -> select Import Dataset… -> change the data name if you like -> click "yes" for heading -> press Import; or

• If the data is placed in the working directory, type: dat = read.csv("Amazon.csv", header=TRUE); or

• File -> Import Dataset -> From Text (base) -> locate the .csv file -> change the data name if you like -> click "yes" for heading -> press Import
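
The typed route for opening the data can be sketched as a short script. This is a minimal sketch, not the course template: the helper load_csv and the throwaway demo file are hypothetical, and stringsAsFactors = TRUE is an addition (in R 4.0 and later, read.csv keeps text columns as character unless this is set, whereas the dataset description lists them as factors).

```r
# Check which folder R is reading from; Amazon.csv should be in it.
getwd()

# Hypothetical helper: read a CSV, or stop with a readable message if missing.
load_csv <- function(path) {
  if (!file.exists(path)) {
    stop("File not found in ", getwd(), ": ", path)
  }
  # stringsAsFactors = TRUE turns text columns into factors (an assumption here).
  read.csv(path, header = TRUE, stringsAsFactors = TRUE)
}

# Demonstration on a throwaway file with made-up rows (NOT the real data).
demo_path <- tempfile(fileext = ".csv")
writeLines(c("ID,books,Sex,AmazonSpend",
             "1,TRUE,Male,120.5",
             "2,FALSE,Female,80"), demo_path)
demo <- load_csv(demo_path)

# str() shows each column's type; dim() gives rows and columns.
str(demo)
dim(demo)

# For the homework itself, with Amazon.csv in the working directory:
# dat <- load_csv("Amazon.csv")
```

Running str() right after loading is a quick way to confirm that the column types match the dataset description before starting the questions.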


Overall hints, assistance, and submission guidance:

• Refer to class 10 contents: Bookbinder Case (pdf & script file).

• Refer to class notes (8, 9 & 10) for statistical inference.

• Before jumping in and analyzing this homework, make sure you have a firm understanding of the class material (statistical tests, running the R code, interpretation, etc.).

• Provide a Word or PDF write-up. Don't forget to include the R code and output in your file. Feel free to copy, cut, and paste (as images) any R results. Attach the code you used at the end of your write-up as an appendix. DUE: 2 pm, Feb 29 (Class 13)

• Use your resources (office hours, review sessions, emails, mediasite recording) as much as possible. Start early.

• Email me ([email protected]) with your codes if you have any technical difficulties (don’t


