BU.330.740 Large Scale Computing on the Cloud
Instruction and Rubric for Team Project
You will work in teams of 4-5 members to develop a data mining proposal for a prospective opportunity that lends itself to a big data solution for the client, using models/applications such as:
1. Frequent itemset mining/association rule mining;
2. Text mining, including term frequency mining, co-occurrence analysis, topic modeling;
3. Sentiment analysis; and
4. Recommender engine.
The client is an organization/business area that one of your team members is associated with as a current employee, past employee, or by some other connection (at least one member should be familiar with its work processes). Develop a proposal that describes how your client organization can benefit from implementing the recommended solution to address its problem/opportunity. Other data mining models may also be used, but they require the instructor’s approval.
Deliverable #1: By the day of Module 3, please submit the names of your group members to the TA.
Deliverable #2: On the day of Module 7, one of your team members will submit your final report on behalf of the team through Canvas, together with all the supplemental materials (preliminary datasets and results). Grading will be based on the items detailed below. Use external sources where appropriate, and provide clear citations and a bibliography. All group members should contribute to the analysis and presentation.
Business Problem/Opportunity (20%)
· Identify, define, and motivate the business problem or big data opportunity that you are addressing.
· Describe the products/services being enhanced or replaced by your analytics solution.
Data Set (25%)
· Identify and describe the data set you propose to use to better understand the business processes, operations, customers, or system users.
· Estimate the size of the data involved in your business case.
· Provide an overview of how the data can be collected if it is a new dataset; or
· Explain why the dataset can help in the business context if it is an existing one.
· Preliminary datasets (optional) can be simulated or drawn from other existing businesses/problems.
Data Mining Methods (25%)
· Discuss potential big data technologies or cloud computing tools to store and process your data.
· Describe (precisely) your proposed data mining methods.
· Describe the different stages of your data analytics process.
· Preliminary results (optional) can be used to demonstrate the feasibility of your proposal.
Readability/Style/Mechanics (10%)
· Follow the format guideline provided at the end of this document.
· Structure the report in a clear and logical manner.
· Use clearly identified, credible references and follow the citation/reference guidelines.
· Use non-textual reader-friendly tools such as tables and figures as appropriate.
Presentation/Communication (20%)
· Each person on the team must present some part of the briefing. The length of time that each team has for the oral presentation will be provided in class.
· Reading from a script is prohibited. Points will be deducted if any team member reads instead of presenting.
Peer evaluation
· A peer evaluation will be administered at the end of the project. Each team member will be evaluated by all other members on a scale of 1 to 10, where 10 means excellent contribution.
· If all team members contribute to the project equally, they will share the same grade for the project. Otherwise, the peer evaluation results will affect your final score for the project.
· For example, if your team effort score is 25, and you receive an average peer evaluation score of 8 out of 10, then your final project grade will be 25*80%=20.
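The adjustment in the example above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not an official grading tool: the function name `final_project_grade` and its parameters are hypothetical, and it assumes the team score is scaled linearly by the average of the peer scores a member receives, as the 25*80%=20 example suggests.

```python
from statistics import mean


def final_project_grade(team_score, peer_scores, max_peer_score=10):
    """Scale the team score by the average peer evaluation.

    Assumption: linear scaling, as in the example 25 * 80% = 20.
    peer_scores holds the scores (1 to max_peer_score) given by teammates.
    """
    if not peer_scores:
        raise ValueError("at least one peer score is required")
    if any(not 1 <= score <= max_peer_score for score in peer_scores):
        raise ValueError("peer scores must be between 1 and max_peer_score")
    return team_score * mean(peer_scores) / max_peer_score


# The example from this document: team score 25, average peer score 8 out of 10.
print(final_project_grade(25, [8, 8, 8]))
```

Under this assumed rule, a member who receives an average of 10 from all teammates keeps the full team score.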
Format guideline:
· The project report should be in the format of a PPT deck.
· Use and submit a PPT deck that will consist of the slides your team will use for its presentation, plus annotations using the Notes feature of PowerPoint.
· Use the Notes feature of PPT to expand on the slide content, and to explain details of your project on the suggested components.
· The maximum number of slides is 12.
· Your first slide must be a title slide that includes the names of all team members.