Question 1

Dataset: cusBrand.csv

This question is based on a study on brand perception. The dataset ‘cusBrand.csv’ contains information on customer ratings of various brands, the brand names, and whether the customer will rebuy the brand.

A cluster analysis was also conducted on the brand perception ratings, to group together brands with similar ratings. The analysis produced 3 clusters, which are recorded in a new variable named “cluster”.

The description of the variables in the dataset is as follows:

Variable name	Description	Measurement
Perception of brand		A scale of 1 to 10, where 1 represents low agreement and 10 represents high agreement. (Treat these variables as numeric variables.)
performance	The brand’s performance is strong.
leadership	The brand is a leader in the field.
productLatest	The brand has the most recent products.
fun	The brand is fun.
serious	The brand is serious.
bargain	The brand products are a bargain.
bestValue	The brand products are of good value.
trendiness	The brand is trendy.
Other variables
repeatBuy	I would buy this brand again.	A scale of 1 to 10, where 1 represents low agreement and 10 represents high agreement. (Treat this variable as a numeric variable.)
name	Names of the brand	Character variable on the names of the brands
cluster	Clusters developed from a cluster analysis on the perception of brand ratings.	A categorical variable with 3 categories representing the clusters.

You are required to conduct the following:

i) Convert the relevant variables into factors. Use the ‘across’ function together with other functions to convert all of these variables in a single step.
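
As an illustration of the pattern asked for in (i), here is a minimal sketch that converts several columns to factors with one `across()` call. The toy rows stand in for cusBrand.csv and are made up; treating `name` and `cluster` as the "relevant" variables is an assumption based on the variable table above.

```r
library(dplyr)

# Hypothetical toy rows standing in for cusBrand.csv (values are made up).
brands <- tibble(
  name      = c("a", "b", "a", "c"),
  cluster   = c(1, 2, 1, 3),
  repeatBuy = c(7, 4, 8, 5)
)

# Convert both categorical columns to factors in a single mutate() call.
# Assumption: name and cluster are the variables that should be factors.
brands <- brands %>%
  mutate(across(c(name, cluster), as.factor))

str(brands)
```

Listing the columns inside `c()` keeps the conversion explicit; a selector such as `where(is.character)` would be an alternative if every character column should become a factor.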

ii) Prior analyses of this dataset have found that many of the brand perception variables are related. Hence, based on this previous analysis, it is advisable not to use these variables individually but to obtain summated proxy variables such as the following:

a. leader = (performance + leadership + serious)/3

b. value = (bargain + bestValue + productLatest)/3

c. trend = (fun + trendiness)/2

Use dplyr data wrangling methods to create these three new variables and add them as a proxy to measure brand perceptions.
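
The three formulas in (ii) translate directly into one `mutate()` call. The sketch below uses two made-up rows in place of the real data, so only the column names come from the brief.

```r
library(dplyr)

# Hypothetical toy ratings (two made-up rows) with the eight perception columns.
brands <- tibble(
  performance = c(6, 9), leadership = c(3, 9), serious = c(9, 6),
  bargain = c(2, 8), bestValue = c(4, 6), productLatest = c(6, 4),
  fun = c(5, 7), trendiness = c(7, 9)
)

# Add the three proxy variables exactly as defined in (ii) a to c.
brands <- brands %>%
  mutate(
    leader = (performance + leadership + serious) / 3,
    value  = (bargain + bestValue + productLatest) / 3,
    trend  = (fun + trendiness) / 2
  )

brands %>% select(leader, value, trend)
```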

iii) Produce ONE descriptive table that describes the 3 clusters of respondents (from the variable cluster) by their ratings on the perception variables (use the 3 proxy perception variables created in (ii)) and their likelihood of buying the brand again. You will need to use dplyr wrangling methods, the across function and the gt package, together with descriptive measures, to obtain this table. Interpret the results.
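
One possible shape for the table in (iii) is a grouped summary built with `across()` and then passed to `gt()`. This is a sketch under assumptions: the data are made up, and the choice of mean and standard deviation as the descriptive measures is illustrative, not prescribed by the brief.

```r
library(dplyr)
library(gt)

# Hypothetical toy data: proxy variables already computed, two rows per cluster.
brands <- tibble(
  cluster   = factor(c(1, 1, 2, 2, 3, 3)),
  leader    = c(6, 8, 3, 5, 7, 9),
  value     = c(4, 6, 7, 9, 2, 4),
  trend     = c(6, 8, 4, 6, 5, 7),
  repeatBuy = c(5, 7, 6, 8, 3, 5)
)

# One row per cluster: mean and sd of each measure, plus the group size.
summary_tbl <- brands %>%
  group_by(cluster) %>%
  summarise(
    across(c(leader, value, trend, repeatBuy),
           list(mean = mean, sd = sd),
           .names = "{.col}_{.fn}"),
    n = n()
  )

# Render the summary as a gt table; printing cluster_table displays it.
cluster_table <- summary_tbl %>%
  gt() %>%
  tab_header(title = "Brand perception and repeat purchase by cluster") %>%
  fmt_number(columns = ends_with(c("_mean", "_sd")), decimals = 2)
```

Keeping the summary in `summary_tbl` before styling makes it easy to check the numbers separately from the presentation.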

iv) Present and interpret TWO different visualisations of the relationship between any of the independent variables and the dependent variable. The independent and dependent variables are specified in part (v). Your graphs should use appropriate aesthetics and be well presented. If the data need to be wrangled to get a better visualisation, do so.
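
Two plot types that could suit (iv) are a scatter plot for a numeric predictor and a box plot for a categorical one. The sketch below uses simulated data, and the pairing of `leader` and `cluster` with `repeatBuy` is only one of several reasonable choices.

```r
library(ggplot2)

# Simulated stand-in data; the real analysis would use cusBrand.csv.
set.seed(1)
brands <- data.frame(
  leader  = runif(60, 1, 10),
  cluster = factor(sample(1:3, 60, replace = TRUE))
)
# Made-up relationship, clamped to the 1 to 10 rating scale.
brands$repeatBuy <- pmin(10, pmax(1, 2 + 0.6 * brands$leader + rnorm(60)))

# Numeric predictor vs. repeatBuy: points plus a fitted straight line.
scatter_plot <- ggplot(brands, aes(x = leader, y = repeatBuy)) +
  geom_point(alpha = 0.6) +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  labs(title = "Repeat purchase vs. leader rating",
       x = "Leader (proxy rating)", y = "repeatBuy") +
  theme_minimal()

# Categorical predictor vs. repeatBuy: one box per cluster.
box_plot <- ggplot(brands, aes(x = cluster, y = repeatBuy, fill = cluster)) +
  geom_boxplot(show.legend = FALSE) +
  labs(title = "Repeat purchase by cluster",
       x = "Cluster", y = "repeatBuy") +
  theme_minimal()
```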

v) Use the workflow approach in tidymodels for this part. Fit an appropriate machine learning model (from the two methods we learned in class) on the training data to determine the factors influencing repeatBuy. The independent variables should consist of the three proxy brand perception variables computed in (ii), the cluster variable and the brand names. Present the results in a tidy, professional manner. Interpret the influence of all the independent variables on repeatBuy.
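
The workflow approach in (v) can be sketched as a recipe plus a model specification bundled into one `workflow()`. The brief does not say which two methods were taught, so linear regression is an assumption here, chosen because repeatBuy is numeric. The data are simulated, and the 75/25 split is likewise illustrative.

```r
library(tidymodels)

# Simulated stand-in data; coefficients below are made up for illustration.
set.seed(1)
n_rows <- 60
brands <- tibble(
  leader  = runif(n_rows, 1, 10),
  value   = runif(n_rows, 1, 10),
  trend   = runif(n_rows, 1, 10),
  cluster = factor(sample(1:3, n_rows, replace = TRUE)),
  name    = factor(sample(c("a", "b", "c"), n_rows, replace = TRUE))
) %>%
  mutate(repeatBuy = 1 + 0.5 * leader + 0.2 * value + rnorm(n_rows))

# Hold out a test set; only the training portion is used for fitting.
brand_split <- initial_split(brands, prop = 0.75)
brand_train <- training(brand_split)

# Recipe: the five predictors, with factors expanded into dummy variables.
brand_recipe <- recipe(repeatBuy ~ leader + value + trend + cluster + name,
                       data = brand_train) %>%
  step_dummy(all_nominal_predictors())

# Assumption: linear regression is one of the two methods covered in class.
brand_workflow <- workflow() %>%
  add_recipe(brand_recipe) %>%
  add_model(linear_reg() %>% set_engine("lm"))

brand_fit <- fit(brand_workflow, data = brand_train)

# Tidy coefficient table for interpretation.
coefs <- brand_fit %>%
  extract_fit_parsnip() %>%
  tidy()
coefs
```

Each dummy coefficient is read relative to the omitted reference level of its factor, which matters when interpreting the cluster and brand-name effects.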

Note:

For this question, you only need to fit the model and interpret the influence of the independent variables on the dependent variable. You do not need to evaluate the performance of the model.

Question 2

The following are the results of a logistic regression model to predict whether a customer will subscribe to a marketing campaign of a bank.

The variables for the study are as follows:

Variables and their Measurements:

Variables	Description
Independent Variables
age	Age of the customer (years)
balance	The balance of the customer’s bank account (in Euros)
campaign	Number of contacts made with the customer during the campaign.
duration	Length of the last contact during the marketing campaign.
housing	Whether the customer has a housing loan Categories: yes, no
marital	Marital status of the customer. Categories: divorced, married, single, unknown
education	Education level of customer. Categories: primary, secondary, tertiary, unknown
Dependent (Response) Variable
y	Whether the customer subscribed to the marketing campaign Categories: yes, no

Predictive Performance Metrics

Note:

All information for Question 2 should be answered in the comment section of the quarto file. You are not required to write anything in code chunks for this question.

You are required to conduct the following:

i) Manually compute the sensitivity and specificity metrics. Show the formula you used, with the numeric values substituted into it.
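
For reference, the standard definitions of the two metrics in (i) are given below in terms of confusion-matrix counts. The counts themselves are not reproduced in this text, and taking y = yes as the positive class is an assumption.

```latex
\[
\text{Sensitivity} = \frac{TP}{TP + FN},
\qquad
\text{Specificity} = \frac{TN}{TN + FP}
\]
```

Here TP and FN are the actual "yes" customers predicted as "yes" and "no" respectively, while TN and FP are the actual "no" customers predicted as "no" and "yes".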

ii) Interpret the accuracy, sensitivity, specificity and ROC-AUC metrics you obtained, either from the table above or from your computations.

iii) Suppose the following code is used to make predictions:

Complete the code above to obtain the ROC curve.
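
The prediction code referred to in (iii) is not reproduced in this text, so the following is an independent sketch of how an ROC curve is typically obtained with yardstick, not a completion of the original code. The data are simulated, the model uses a single predictor, and the object names (`bank`, `bank_fit`, `preds`) are hypothetical.

```r
library(tidymodels)

# Simulated stand-in data with "yes" as the first (event) level of y.
set.seed(1)
bank <- tibble(duration = runif(200, 0, 1000)) %>%
  mutate(
    y = factor(if_else(duration + rnorm(200, sd = 300) > 600, "yes", "no"),
               levels = c("yes", "no"))
  )

# Hypothetical fitted model; the original model's code is not available.
bank_fit <- logistic_reg() %>%
  set_engine("glm") %>%
  fit(y ~ duration, data = bank)

# augment() adds .pred_class, .pred_yes and .pred_no to the data.
preds <- augment(bank_fit, new_data = bank)

# ROC curve points and area under the curve, using the "yes" probability.
roc_points <- roc_curve(preds, truth = y, .pred_yes)
auc <- roc_auc(preds, truth = y, .pred_yes)

roc_plot <- autoplot(roc_points)
```

yardstick treats the first factor level as the event by default, which is why `y` is given the level order "yes", "no" before fitting.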

iv) Interpret the influence of the independent variables on the dependent variable.
