Department of Computer Science
CS 484: Data Mining (Sections 001 and 002)
Spring 2024
Course Description
Concepts and techniques in data mining and multidisciplinary applications. Topics include data cleaning and transformation; classification and predictive modeling; clustering; association analysis; performance analysis and scalability; data mining in advanced database systems, including text, audio, and images; and emerging themes and future challenges. Students will gain hands-on experience and learn how to implement and apply various data mining algorithms.
Class Time and Location
Section 001: Tuesday/Thursday 1:30-2:45pm
Exploratory Hall L003
Section 002: Tuesday/Thursday 3:00-4:15pm
Horizon Hall 2016
Instructor
Dr. Jessica Lin
Email: jessica [AT] gmu [DOT] edu
Office Hours: Tuesday/Thursday 11am-12pm
Teaching Assistant
Madhukar Vongala
Prerequisites
Formally: Grade of C or better in CS 310 (Data Structures) and STAT 344 (Probability and Statistics) or equivalent.
More specifically: Programming experience in Python, or a willingness to learn it. Experience in Java or C++ will work as well, but the assignments will be in Python. Students should be familiar with basic probability and statistics concepts and with linear algebra. Expect a lot of programming in the assignments.
Grading
Programming Assignments: 45%
Quizzes: 20%
Final Exam: 30%
Class participation/Activities: 5%
Extra credit: 1% for competition winners on the programming assignments
Assignments
There will be 4 competition-style programming assignments in Python. Competition winners will get 1% extra credit added to the final grade. You are allowed a grace period of 3 days past the deadline, with a 10% penalty for each day. You will receive 0 credit if the homework is not submitted by the end of the grace period. Note that internet trouble is not a valid excuse for submitting late. Therefore, you should plan to submit a few hours early to avoid last-minute technical difficulties.
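The late policy can be illustrated with a short Python sketch. It is not an official grading script. The function name `late_adjusted_score` is hypothetical, and the sketch assumes that the 10% penalty is taken from the earned score for each day late and that lateness is counted in whole days; the syllabus does not specify either detail.

```python
def late_adjusted_score(raw_score: float, days_late: int) -> float:
    """Return an assignment score after the late penalty (hypothetical helper).

    Assumptions not stated in the syllabus: the penalty is 10% of the
    earned score per day late, and days_late is a whole number of days.
    """
    if days_late < 0:
        raise ValueError("days_late must be non-negative")
    if days_late > 3:
        # Past the 3-day grace period: no credit.
        return 0.0
    # Integer percentages avoid floating-point rounding surprises.
    return raw_score * (100 - 10 * days_late) / 100


if __name__ == "__main__":
    for days in range(5):
        print(days, late_adjusted_score(90, days))
```

Under these assumptions, a submission scored at 90 and turned in two days late would count as 72, and anything more than three days late would count as 0.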
Exams
There will be quizzes throughout the semester covering lectures and readings, and one final exam. The purpose of the quizzes is to help you keep up with the lecture material, so they are typically shorter and easier than the final exam. The final exam is comprehensive. All exams are closed-book, and they must be taken at the scheduled time unless prior arrangements have been made with the instructor. Missed exams cannot be made up. The lowest quiz grade will be dropped.
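For a rough idea of how the grading weights and the dropped quiz combine, here is a minimal Python sketch. It is only an illustration: the function name `course_grade` is hypothetical, and it assumes every component is scored on a 0-100 scale, that assignments are weighted equally among themselves, and that quizzes are weighted equally among themselves. Late penalties and competition extra credit are not modeled.

```python
def course_grade(assignments, quizzes, final_exam, participation):
    """Estimate the weighted course grade on a 0-100 scale (hypothetical helper).

    Weights from the Grading section: assignments 45%, quizzes 20%,
    final exam 30%, participation 5%. Equal weighting within each
    category is an assumption.
    """
    if not assignments or not quizzes:
        raise ValueError("need at least one assignment and one quiz score")
    # Drop the lowest quiz grade when there is more than one quiz.
    kept_quizzes = sorted(quizzes)[1:] if len(quizzes) > 1 else list(quizzes)
    assignment_avg = sum(assignments) / len(assignments)
    quiz_avg = sum(kept_quizzes) / len(kept_quizzes)
    return (45 * assignment_avg + 20 * quiz_avg
            + 30 * final_exam + 5 * participation) / 100


if __name__ == "__main__":
    print(course_grade([80, 90, 100, 90], [50, 80, 100], 80, 100))
```

With the sample scores above, the 50 on the first quiz is dropped, so the quiz average is computed from the remaining two quizzes only.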
Class Participation
You will be able to earn class participation credit through in-class activities.
Textbooks
Required: Introduction to Data Mining by Pang-Ning Tan, Michael Steinbach, and Vipin Kumar
Topics
· Ch.1: Introduction
· Ch.2: Data
· Ch.3: Classification
· Ch.4: Classification: Alternative Techniques
· Ch.5: Association Analysis: Basic Concepts and Algorithms
· Ch.6: Association Analysis: Advanced Concepts
· Ch.7: Cluster Analysis: Basic Concepts and Algorithms
· Ch.8: Cluster Analysis: Additional Issues and Algorithms
· Ch.9: Anomaly Detection
· Recommendation Systems
Honor Code Statement
The GMU Honor Code is in effect at all times. In addition, the CS Department has further honor code policies regarding programming projects, which are detailed here. Some examples can be found here. Any deviation from the GMU or the CS Department Honor Code is considered an Honor Code violation. All assignments for this class are individual unless otherwise specified. ChatGPT and other generative AI models may NOT be used to assist with the assignments in this course.
Learning Disability Accommodation
If you have a documented learning disability or other condition that may affect academic performance, make sure this documentation is on file with the Office of Disability Services, and then discuss accommodations with the professor.