DPBS 1190: Data Insights and Decisions: Brief for Assessment 2
(A summary of the key points from the Assessment Guide)
• Read the relevant section of the Assessment Guide carefully to understand all requirements for this assessment task. The assessment guide is available in the Course Moodle Page.
• This assessment is due on Friday, 14th June, 4pm (Sydney time) via Turnitin in the Course Moodle Page. Your submission must be a Word document.
• The total word limit for this assessment is 1000 words, excluding R codes and references. A 10% variation from this limit is acceptable.
• The dataset for this assessment task is available in the Course Moodle Page under the Assessment Resources in the Section: Assessment 2: Individual Project Report.
• You are required to focus on descriptive analytics - visualization and descriptive statistics.
• In undertaking your analysis, you should particularly take the following into account:
• Project goal: you should clearly state your project goal.
• Data subsetting: you must subset the data appropriately using the R codes shown in the class. The subsetting should be done in line with your project goal. You are required to perform multiple subsetting.
• Visualization: In accordance with your project goal, you should create a bar plot, line chart, histogram, bubble plot, and box plot (specifically for outlier analysis). You need to discuss clearly the implications of the outlier analysis for your chosen variables. Ensure that your visualization and descriptive statistics are derived from your subsetted data. Further subsetting of the data will allow you to incorporate various perspectives into your analysis, thereby enhancing its rigour.
• Descriptive statistics: you must use the ‘moments’ package in R.
• Insights: You should clearly present insights from your data analysis using visualization and descriptive statistics. Note: providing commentary on the visualization and descriptive statistics is not enough; your analysis should focus on the decision-making context in line with your project goal(s).
• R codes: You must clearly show all R codes in the Appendix, and the visualization graphs/charts and descriptive statistics in the main body of the report. You are expected to use R codes that are within the scope of the course and have been discussed in the class. R codes will not be included in the word count. Non-inclusion of R codes will lead to a significant reduction of marks.
• Ensure that you write/select variables exactly as they appear in the dataset. Note that R is case sensitive.
• Appropriate referencing must be done as per the requirement outlined in the Assessment Guide. This will not be included in the word count.
• Assignments submitted late will be penalised at a rate of 10% per day, including weekends and public holidays.
• Remember, academic integrity is highly important. The work must be your own original work. Any suspected deviation from this will be referred to the appropriate authority for review and subsequent action. This could result in zero marks for your assessment and further measures according to UNSW policy. We will use the similarity and AI percentages generated by Turnitin, which compares your work with other students’ submissions, institutional repositories (including UNSW), and internet sources. This helps us assess compliance with academic integrity. Your answer must not be AI generated. These requirements are clearly outlined in the Assessment Guide.
• Some sample R codes are given here for your reference. Please note that the dataset is saved as myData. If you save it under another name, use that name when writing code in RStudio. These codes are only examples and are by no means exhaustive, so you should not limit your analysis to them.
# Subset the data as per the City (Sydney is used here as an example).
sydney = subset(myData, City =="Sydney")
# Subset the data for Successful and Not Successful Companies and develop the bar plot using average profit.
# Sydney as an example. You can apply the similar process for other cities to develop bar plots.
sydneySuccessful = subset(sydney, IsSuccessful == 1)
sydneyNotSuccessful = subset(sydney, IsSuccessful == 0)
sydneySuccessfulProfitAvg = mean(sydneySuccessful$Profit)
sydneyNotSuccessfulProfitAvg = mean(sydneyNotSuccessful$Profit)
sydneycompanies = c(sydneySuccessfulProfitAvg,sydneyNotSuccessfulProfitAvg)
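# Use a name other than colnames for the labels so that the base R function colnames() is not masked.
groupNames = c("Successful","Not-Successful")
barplot(sydneycompanies,las=1,col="red",ylim=c(0,200000),names.arg=groupNames,main="Sydney Companies Average Profit",xlab="Success/Non-Success",ylab="Profit")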
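As a further illustration of multiple subsetting, the sketch below combines several conditions in one subset() call and compares two cities. The small data frame is invented so that the code runs on its own; the city "Melbourne", the 2016 cut-off, and all values are assumptions for illustration, not taken from the dataset.

```r
# Toy stand-in for myData; the values are invented for illustration only.
myData = data.frame(
  City = c("Sydney", "Sydney", "Sydney", "Melbourne", "Melbourne", "Melbourne"),
  IsSuccessful = c(1, 0, 1, 1, 0, 0),
  StartYear = c(2015, 2016, 2018, 2015, 2017, 2018),
  Profit = c(120000, 30000, 150000, 90000, 20000, 25000)
)

# Combine conditions with & to subset on more than one variable at once.
sydneySuccessfulRecent = subset(myData, City == "Sydney" & IsSuccessful == 1 & StartYear >= 2016)

# Apply the same pattern to each city for a like-for-like comparison.
sydneySuccessful = subset(myData, City == "Sydney" & IsSuccessful == 1)
melbourneSuccessful = subset(myData, City == "Melbourne" & IsSuccessful == 1)

cityProfitAvg = c(mean(sydneySuccessful$Profit), mean(melbourneSuccessful$Profit))
cityNames = c("Sydney", "Melbourne")
barplot(cityProfitAvg, las = 1, col = "red", ylim = c(0, 200000), names.arg = cityNames,
        main = "Average Profit of Successful Companies", xlab = "City", ylab = "Profit")
```

Each additional condition narrows the subset, so check the number of rows with nrow() before relying on averages from very small groups.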
# Line chart of Marketing Expenses, as an example, for Successful and Not Successful companies. You are expected to consider other variables as well when developing line charts for comparison.
# lwd sets the width of the line. The default lwd in R is 1. For instance, lwd=2 draws a line twice as wide as the default.
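# Subset by success status. The steps below use successful; repeat them with notSuccessful to compare.
successful = subset(myData, IsSuccessful == 1)
notSuccessful = subset(myData, IsSuccessful == 0)
# Average marketing expenses per start year for successful companies.
adata1 = aggregate(Marketing~StartYear, successful, mean)
# plot() and title() are separate statements and must be on separate lines.
plot(adata1,type="l",col="red",ann=FALSE,lwd=1.5)
title(xlab="Year",ylab="Marketing Expenses",main="Marketing Expenses of Successful Startup Companies")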
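To compare the two groups on one chart, one possible approach is to aggregate each subset separately and add the second series with lines(). This is only a sketch on invented data; the use of lines(), legend(), and a shared ylim is an assumption about how you might build the comparison, and your class code may differ.

```r
# Toy stand-in for myData; the values are invented for illustration only.
myData = data.frame(
  StartYear = c(2015, 2015, 2016, 2016, 2017, 2017, 2015, 2016, 2017),
  IsSuccessful = c(1, 1, 1, 1, 1, 1, 0, 0, 0),
  Marketing = c(40000, 60000, 55000, 65000, 70000, 90000, 20000, 25000, 22000)
)

successful = subset(myData, IsSuccessful == 1)
notSuccessful = subset(myData, IsSuccessful == 0)

# Average marketing expenses per start year for each group.
adata1 = aggregate(Marketing ~ StartYear, successful, mean)
adata2 = aggregate(Marketing ~ StartYear, notSuccessful, mean)

# Set ylim from both series so that neither line is cut off.
yRange = range(c(adata1$Marketing, adata2$Marketing))
plot(adata1$StartYear, adata1$Marketing, type = "l", col = "red", ann = FALSE, lwd = 1.5, ylim = yRange)
lines(adata2$StartYear, adata2$Marketing, col = "blue", lwd = 1.5)
title(xlab = "Year", ylab = "Marketing Expenses", main = "Marketing Expenses by Success Status")
legend("topleft", legend = c("Successful", "Not Successful"), col = c("red", "blue"), lwd = 1.5)
```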
# For descriptive statistics, you are required to use the ‘moments’ package. Load it with library(moments) and calculate the mean, variance, standard deviation, skewness and kurtosis.
# Select variables from the dataset in line with your project goal, and link the results with your visualization.
library(moments)
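A minimal sketch of these five statistics is shown below. The profit vector is invented and stands in for a variable from your subsetted data (for example, sydney$Profit); the moments package is assumed to be installed.

```r
library(moments)  # install once with install.packages("moments") if it is missing

# Toy values standing in for a subsetted variable; invented for illustration only.
profit = c(20000, 30000, 45000, 50000, 60000, 150000)

profitMean = mean(profit)
profitVar = var(profit)        # sample variance (n - 1 denominator)
profitSd = sd(profit)          # square root of the sample variance
profitSkew = skewness(profit)  # a positive value indicates a longer right tail
profitKurt = kurtosis(profit)  # Pearson kurtosis: a normal distribution gives about 3

print(c(mean = profitMean, variance = profitVar, sd = profitSd,
        skewness = profitSkew, kurtosis = profitKurt))
```

Note that mean(), var() and sd() come from base R, while skewness() and kurtosis() come from the moments package.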
# For outlier analysis, you should use the boxplot and boxplot.stats functions in R. You need to clearly articulate the implications of outliers for your analysis.
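The sketch below shows one way to combine the two functions, using an invented profit vector in place of a variable from your subsetted data. Comparing the mean with and without the flagged values is an illustrative way to gauge their impact, not a prescribed method.

```r
# Toy values standing in for a subsetted variable; invented for illustration only.
profit = c(20000, 30000, 45000, 50000, 60000, 150000)

# Points drawn beyond the whiskers are the potential outliers.
boxplot(profit, col = "red", main = "Box Plot of Profit", ylab = "Profit")

profitStats = boxplot.stats(profit)
profitStats$stats  # lower whisker, lower hinge, median, upper hinge, upper whisker
profitStats$out    # values more than 1.5 times the hinge spread beyond the hinges

# Compare the mean with and without the flagged values to see how much they matter.
profitNoOutliers = profit[!(profit %in% profitStats$out)]
meanWithOutliers = mean(profit)
meanWithoutOutliers = mean(profitNoOutliers)
```

A large gap between the two means suggests that averages reported in bar plots or line charts are sensitive to a few extreme companies, which is worth stating in your discussion.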
# For the histogram, you should use the hist function as shown in the class, with appropriate variables.
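As a rough illustration, hist() can be called as below. The profit vector and the bin edges are invented; choose breaks that suit the range of your own variable.

```r
# Toy values standing in for a subsetted variable; invented for illustration only.
profit = c(20000, 30000, 45000, 50000, 60000, 150000)

# breaks sets the bin edges; hist() also returns the bin counts for inspection.
profitHist = hist(profit, breaks = seq(0, 160000, by = 40000), col = "red",
                  main = "Distribution of Profit", xlab = "Profit")
profitHist$counts
```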
# For the bubble plot, you should use the R codes shown in the class, selecting variables from the dataset in line with your project goals.
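The class code is not reproduced here, so the sketch below uses the base R symbols() function as one possible way to draw a bubble plot; treat it as an assumption rather than the required method. The data frame is invented, and the Employees column is a hypothetical third variable used only to size the bubbles.

```r
# Toy stand-in for myData; the values are invented for illustration only.
myData = data.frame(
  Marketing = c(20000, 35000, 50000, 65000, 80000),
  Profit = c(30000, 45000, 90000, 110000, 150000),
  Employees = c(5, 12, 20, 35, 50)  # hypothetical column used for bubble size
)

# Take the square root so that bubble area, not radius, is proportional to the value.
bubbleRadius = sqrt(myData$Employees)

# inches scales the largest bubble to a radius of 0.3 inches.
symbols(myData$Marketing, myData$Profit, circles = bubbleRadius, inches = 0.3,
        bg = "red", fg = "black",
        xlab = "Marketing Expenses", ylab = "Profit",
        main = "Profit vs Marketing Expenses (bubble size = Employees)")
```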