Question 1

dataset: cusBrand.csv

This question is based on a study on brand perception. The dataset ‘cusBrand.csv’ contains information on customer ratings of various brands, the brand names, and whether the customer will rebuy the brand.

A cluster analysis was also conducted on the brand perception ratings in order to group brands with similar ratings. The analysis produced 3 clusters, which are recorded in a new variable named “cluster”.

The description of the variables in the dataset is as follows:

Variable name: Description (Measurement)

Perception of brand
Measurement: A scale of 1 to 10 where 1 represents low agreement and 10 represents high. (treat these variables as numeric variables)

performance: The brand’s performance is strong.
leadership: The brand is a leader in the field
productLatest: The brand has the most recent products
fun: The brand is fun
serious: The brand is serious.
bargain: The brand products are a bargain.
bestValue: The brand products are of good value
trendiness: The brand is trendy

Other variables

repeatBuy: I would buy this brand again
Measurement: A scale of 1 to 10 where 1 represents low agreement and 10 represents high. (treat this variable as a numeric variable)

name: Names of the brand
Measurement: Character variable on the names of the brands

cluster: Clusters developed from a cluster analysis on the perception of brand ratings.
Measurement: A categorical variable with 3 categories representing the clusters.

You are required to conduct the following:
i) Wrangle relevant variables to change them into factors. Use the ‘across’ function together with other functions to wrangle all these variables together.

ii) Prior analyses of this dataset have found that many of the brand perception variables are related. Hence, based on this previous analysis, it is advisable not to use these variables individually but to obtain summated proxy variables such as the following:

a. leader = (performance + leadership + serious) / 3
b. value = (bargain + bestValue + productLatest) / 3
c. trend = (fun + trendiness) / 2

Use dplyr data wrangling methods to create these three new variables and add them as a proxy to measure brand perceptions.

iii) Obtain ONE descriptive analysis table to describe the 3 clusters of respondents (from the variable cluster) by their ratings of the perception variables (use the 3 proxy perception variables created in (ii)) and their likelihood to buy the brand again. You will have to use dplyr wrangling methods, the across function and the gt package together with descriptive measures to obtain this descriptive table. Interpret the results.

iv) Present and interpret TWO different visualisations of the relationship between any of the independent variables and the dependent variable. The independent and dependent variables are specified in part (v). Your graphs should use appropriate aesthetics and be well presented. If the data need wrangling to produce a better visualisation, do so. Interpret the graphs.

v) Use the workflow approach in tidymodels to do this question. Fit an appropriate machine learning model (from the two methods we learned in class) on the training data to determine the factors influencing repeatBuy. The independent variables should consist of the three proxy brand perception variables computed in (ii), the cluster variable and the brand names. Present the results in a tidy, professional manner. Interpret the influence of all the independent variables on repeatBuy.

Note:

For this question, you only need to fit the model and interpret the influence of the independent variables on the dependent variable. You do not need to evaluate the performance of the model.
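
As an illustration of the wrangling steps in (i) to (iii), the following is a minimal sketch in R rather than a model answer. It assumes the dplyr package is installed. The data frame `brands` is a small made-up stand-in for cusBrand.csv (the values are invented), the object names `brands` and `cluster_summary` are hypothetical, and the choice of mean and standard deviation as the descriptive measures is an assumption.

```r
library(dplyr)

# Toy stand-in for cusBrand.csv; all values are invented for illustration.
brands <- tibble(
  performance   = c(8, 3, 6, 9),
  leadership    = c(7, 4, 5, 9),
  productLatest = c(6, 5, 7, 8),
  fun           = c(4, 8, 6, 3),
  serious       = c(9, 2, 5, 8),
  bargain       = c(3, 7, 6, 2),
  bestValue     = c(4, 8, 6, 3),
  trendiness    = c(5, 9, 7, 4),
  repeatBuy     = c(8, 5, 6, 9),
  name          = c("a", "b", "a", "b"),
  cluster       = c(1, 2, 3, 1)
)

brands <- brands %>%
  # (i) convert the categorical columns to factors in a single step
  mutate(across(c(name, cluster), as.factor)) %>%
  # (ii) summated proxy variables, using the formulas given in the question
  mutate(
    leader = (performance + leadership + serious) / 3,
    value  = (bargain + bestValue + productLatest) / 3,
    trend  = (fun + trendiness) / 2
  )

# (iii) per-cluster mean and standard deviation of the proxies and repeatBuy
# (sd is NA for a cluster that contains a single row)
cluster_summary <- brands %>%
  group_by(cluster) %>%
  summarise(
    across(
      c(leader, value, trend, repeatBuy),
      list(mean = mean, sd = sd),
      .names = "{.col}_{.fn}"
    ),
    n = n()
  )
```

With the real data, `brands` would be read from cusBrand.csv instead of being typed in, and `cluster_summary` could be piped into `gt::gt()` (assuming the gt package is installed) to render the table.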

Question 2

The following are the results of a logistic regression model to predict whether a customer will subscribe to a marketing campaign of a bank.

The variables for the study are as follows:

Variables and their Measurements:

Independent Variables

age: Age of the customer (years)
balance: The balance of the customer’s bank account (in Euros)
campaign: Number of contacts made with the customer during the campaign.
duration: Length of the last contact during the marketing campaign.
housing: Whether the customer has a housing loan. Categories: yes, no
marital: Marital status of the customer. Categories: divorced, married, single, unknown
education: Education level of customer. Categories: primary, secondary, tertiary, unknown

Dependent (Response) Variable

y: Whether the customer subscribed to the marketing campaign. Categories: yes, no

Predictive Performance Metrics


Note:

All information for Question 2 should be answered in the comment section of the quarto file. You are not required to write anything in code chunks for this question.

You are required to conduct the following:

i) Manually compute the sensitivity and specificity metrics. Show the formula you used, with the numeric values substituted in.
ii) Interpret the accuracy, sensitivity, specificity and ROC-AUC metrics you obtained either from the table above or from your computations.

iii) Suppose the following code is used to make predictions:

Complete the code above to obtain the ROC curve.

iv) Interpret the influence of the independent variables on the dependent variable.
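
For part (i), the standard definitions of the metrics can be written as below. This assumes that "yes" (the customer subscribed) is treated as the positive class. TP, FN, TN and FP denote the usual confusion-matrix counts (true positives, false negatives, true negatives, false positives); they are symbols only, not values taken from this question.

```latex
\[
\text{Sensitivity} = \frac{TP}{TP + FN}, \qquad
\text{Specificity} = \frac{TN}{TN + FP}, \qquad
\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}
\]
```

If "no" were treated as the positive class instead, the roles of sensitivity and specificity would swap.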
