INFS5730 - Social Media Analytics in Practice
SAS Hands-On Assignment - SAS Visual Text Analytics
In this hands-on assignment, you are required to conduct a textual analysis using SAS Visual Text Analytics and submit a report through Turnitin on the Moodle course site. The assignment is due in Week 5, on Friday 15 March 2024 at 5:00pm (AEST).
Please note that this assignment is worth 20% of your overall course mark.
Requirements
The purpose of this assignment is to use SAS Visual Text Analytics to analyse a dataset called coffee_amazon_reviews, available on Moodle as a CSV file. The dataset consists of a sample of 4,691 customer reviews of coffee products purchased on the Amazon website.
The file coffee_amazon_reviews.csv, available on Moodle, includes the following fields:
• ProductId: Unique identifier for the product
• UserId: Unique identifier for the user
• Score: Rating between 1 and 5
• Review: Text of the review
The dataset includes reviews about the following coffee products:
• Coffee Pods/Capsules
o SENSEO Coffee Pods - (ProductId: B000UBD88A)
o Gloria Jean's Hazelnut Keurig K-Cups - (ProductId: B000TQEWM2)
o Green Mountain Coffee Dark Magic single serve K-Cup pods - (ProductId: B001EO5Y8Y)
o San Francisco Bay OneCup, Fog Chaser, 12 Single Serve Coffees - (ProductId: B005ZBZLT4)
o Timothy's World Coffee, Breakfast Blend K-Cup - (ProductId: B002AQ0OL2)
o Gloria Jean's Coffees Butter Toffee, Single-Serve Keurig K-Cup Pods - (ProductId: B002HQLY7S)
o Van Houtte French Vanilla, Light Coffee, K-Cup Portion Pack - (ProductId: B00395DVQS)
o Van Houtte Colombian Medium Roast K-Cups - (ProductId: B00817GPWQ)
• Ground Coffee
o Starbucks Natural Fusions Vanilla Ground Coffee - (ProductId: B003GTR8IO)
o Marley Coffee, Talkin Blues, Jamaica Blue Mountain Ground Coffee Portion Packs - (ProductId: B005VOONGM)
o Lavazza Espresso Dark Roast Ground Coffee - (ProductId: B001E5E0D8)
o Lavazza Crema E Gusto Espresso - (ProductId: B005GRCWDU)
• Whole Bean Coffee
o Lavazza Super Crema Whole Bean Coffee Blend - (ProductId: B000SDKDM4)
• Ready-to-Drink Coffee
o illy Ready-to-Drink Caffè - (ProductId: B002IEZJMA)
o illy Ready-to-Drink Cappuccino - (ProductId: B002IEVJRY)
This is a sample dataset derived from a larger dataset available at https://huggingface.co/datasets/jhan21/amazon-food-reviews-dataset.
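Before loading the file into SAS, it can be useful to confirm that the CSV matches the description above. The following is a minimal sketch in Python, not part of the assignment requirements. It assumes the file has a header row containing the four field names listed above and that Score is stored as a whole number; both are assumptions about the file layout. The function name summarise_reviews and the three sample rows are made up for illustration.

```python
import csv
import io
from collections import Counter

# Field names as listed in the assignment brief.
EXPECTED_FIELDS = ["ProductId", "UserId", "Score", "Review"]


def summarise_reviews(csv_file):
    """Count rows, reviews per product, and reviews per score.

    csv_file is any open text file (or file-like object) whose first
    row is assumed to be a header with the expected field names.
    """
    reader = csv.DictReader(csv_file)
    missing = [f for f in EXPECTED_FIELDS if f not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")

    per_product = Counter()
    per_score = Counter()
    total = 0
    for row in reader:
        total += 1
        per_product[row["ProductId"]] += 1
        # Assumes Score is an integer such as "4", not "4.0".
        per_score[int(row["Score"])] += 1
    return {"rows": total, "per_product": per_product, "per_score": per_score}


# Tiny made-up sample standing in for the real file.
SAMPLE = """ProductId,UserId,Score,Review
B000UBD88A,U1,5,"Smooth, rich pods"
B000UBD88A,U2,2,Too weak for me
B002IEZJMA,U3,4,Nice cold coffee
"""

summary = summarise_reviews(io.StringIO(SAMPLE))
print(summary["rows"])
print(dict(summary["per_product"]))
print(dict(summary["per_score"]))
```

To run the same check on the real data, open the downloaded file with open("coffee_amazon_reviews.csv", encoding="utf-8", newline="") and pass the file object to summarise_reviews. If the file matches the description above, the row count should be 4,691.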
You are required to conduct a data analysis of the customer reviews provided in the dataset coffee_amazon_reviews.csv using SAS Visual Text Analytics in two parts. Part 1 consists of exploring predefined concepts and automatically generated topics to derive insights from the data. Part 2 consists of defining your own custom concepts and custom categories to answer specific research questions.
Your report should have the following components:
• A standard cover page (available on Moodle).
• Part 1
o Predefined Concepts (worth 20% of the available marks) - up to 600 words
- An exploration of the dataset using TWO (2) relevant predefined concepts.
For each selected predefined concept, your answer must include the following:
- An explanation of why you think the selected predefined concept can be relevant to your data analysis.
- A discussion of the findings and the insights that you could unveil from these findings. Include relevant screenshots from SAS Visual Text Analytics.
- A discussion of the benefits and limitations of relying only on the selected predefined concepts.
o Auto-generated Topics (worth 20% of the available marks) - up to 600 words
- An exploration of the dataset using TWO (2) relevant topics among those automatically generated by SAS.
For each selected topic, your answer must include the following:
- An explanation of why you think the selected topic can be relevant to your data analysis.
- A discussion of the findings and the actionable insights you could derive from these findings. Include relevant screenshots from SAS Visual Text Analytics.
• Part 2
o Custom Concepts (worth 30% of the available marks) - up to 800 words
- Write TWO (2) custom concepts, each using a different concept rule type.
For each custom concept, your answer must include the following:
- An explanation of the objectives of your analysis
- A justification of the reasons behind your choice of the concept rule type
- The custom concept rule to fulfil the objectives of your analysis
- A detailed explanation of the concept rule syntax
- A discussion of the findings and insights that you could derive from these findings. Include relevant screenshots from SAS Visual Text Analytics.
o Custom Categories (worth 30% of the available marks) - up to 800 words
- Write TWO (2) custom categories.
For each custom category, your answer must include the following:
- An explanation of the objectives of your analysis
- The custom category rule to fulfil the objective of your analysis
- A detailed explanation of the category rule syntax
- A discussion of the findings of your analysis and insights that you could unveil from these findings. Include relevant screenshots from SAS Visual Text Analytics.
Submission instructions
Please submit a Word document to the Turnitin assessment submission link on Moodle.
Late submission will incur a penalty of 5% per hour or part thereof from the due date and time unless special consideration has been approved. An assignment is considered late if the requested format, such as hard copy or electronic copy, has not been submitted on time or where the ‘wrong’ assignment has been submitted.
Use a font no smaller than 12-point Arial, with standard margins and 1.5 line spacing. Please note that material exceeding the word limit for each question will not be considered when grading the assignment. Screenshots do not count towards the word limit.
Instructions on how to load the data into SAS Visual Text Analytics
• Using your SAS Profile, log in and launch SAS Viya for Learners 3.5 from the VFL launch page:
• Access SAS Studio in VFL by clicking the Applications menu in the upper-left corner and selecting Develop SAS Code.
• If it is not already selected, click the Explorer tab in the left bar of SAS Studio. Then click the triangle to the left of the file directory icon (its name begins with pdcesx).
• Right-click casuser, and then click Upload Files. Do not create a subfolder under casuser; instead, upload the file directly under casuser.
• A dialogue box will open. Click the plus (+) icon on the right, then browse to and select the file on your local drive that you want to upload. The recommended dataset formats are .sas7bdat, .sashdat, or .csv in UTF-8 format. The dataset coffee_amazon_reviews.csv provided on Moodle is a CSV file in UTF-8 format.
• Add the file as an attachment in the dialogue box. Click Upload to upload the file to casuser in the ‘Home’ directory in SAS Studio.
• The file is now available in casuser in the ‘Home’ directory in SAS Studio in VFL and can be found under Data Sources when selecting the dataset for a SAS Visual Text Analytics project.
o Open Model Studio
o Create a new project
o Browse available datasets
o Select Data Sources to add the previously uploaded file as a data source
o Click on the drop-down arrow on the CAS server.
o Click on the drop-down arrow on CASUSER.
o Highlight the dataset file(s) in CASUSER in SAS VA in VFL. Then click on the icon to load the file into memory.
o The dataset is now available in memory in CASUSER in SAS VA in VFL.
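The upload steps above recommend a CSV file in UTF-8 format. If you edit or re-save the CSV before uploading it, a quick check that it is still valid UTF-8 can be sketched in Python as follows. This is an optional illustration rather than a required step, and the helper name is_utf8 is hypothetical.

```python
def is_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# The word "Caffè" (from one of the product names) encoded two ways.
utf8_bytes = "Caffè".encode("utf-8")
latin1_bytes = "Caffè".encode("latin-1")

print(is_utf8(utf8_bytes))    # True
print(is_utf8(latin1_bytes))  # False
```

To check the real file, read it in binary mode, for example with pathlib.Path("coffee_amazon_reviews.csv").read_bytes(), and pass the result to is_utf8. A False result suggests the file was re-saved in a different encoding, for example by a spreadsheet program.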