CS 422 Data Mining


CS 422: Data Mining
Fall Semester 2010

Instructor
   Dr. David Grossman, grossman@iit.edu
   Office: Stuart Building 228D

Teaching Assistants
   Yuval Merhav, yuval.mer@gmail.com

Grading:
The final grade will be determined as follows:
Final Project / Presentation: 20%
Programming Projects (3): 10% + 15% + 20% = 45%
Test #1: 15%
Test #2: 20%
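
As a rough illustration of how these weights combine, here is a minimal Python sketch. The weights come from the breakdown above. The component names, the `final_grade` helper, and the assumption that the 10/15/20 split applies to A1, A2, and A3 in that order are illustrative only and are not part of the official policy.

```python
# Weights in percent, taken from the grading breakdown above.
# Assumption: the 10/15/20 split maps to A1, A2, A3 in that order.
WEIGHTS = {
    "final_project": 20,
    "a1": 10,
    "a2": 15,
    "a3": 20,
    "test1": 15,
    "test2": 20,
}


def final_grade(scores):
    """Return the weighted course percentage.

    `scores` maps each component name in WEIGHTS to a score from 0 to 100.
    This helper is hypothetical; it only illustrates the arithmetic.
    """
    missing = sorted(set(WEIGHTS) - set(scores))
    if missing:
        raise ValueError(f"missing scores for: {missing}")
    # Weighted sum of scores, divided by 100 because weights are in percent.
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


if __name__ == "__main__":
    example = {
        "final_project": 90,
        "a1": 100,
        "a2": 80,
        "a3": 70,
        "test1": 85,
        "test2": 75,
    }
    print(final_grade(example))
```

Because a late assignment receives no grade, a missed assignment would enter this calculation as a score of 0 for that component.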

Academic Integrity:
Each member of this course bears responsibility for maintaining the highest standards of academic integrity. All breaches of academic integrity must be reported immediately. Any violation of the academic integrity policy will result in the student receiving a failing grade for the course.

Programming Assignments
Programming assignments are designed to improve understanding of core concepts by implementing them. Assignments will be as follows:

A1: Getting Familiar with Mahout
A2: Voting algorithms for Recommenders
A3: Voting algorithms for Categorization

Project / Research Presentations:
The presentation is an important part of the course. Students will be given a set of research papers, from which they will select one. In a couple of weeks, depending on class size, students will be asked to form groups of 2, 3, or 4 (no more than four). A written summary of the paper is due one week prior to the presentation. Allocate significant time to read the paper several times and to identify the best means of presenting it. Presentations must be extremely well rehearsed -- failure to prepare properly will result in an extremely poor grade on the presentation. An ideal presentation will include a robust implementation of the algorithm being discussed in the Mahout framework, and it should use Hadoop as well.

Late Assignment Policy:
Assignments and presentations must be submitted on or before their due date and time. Late assignments will not be graded. It is important that students understand this, because an on-time submission of a partial assignment is usually worth far more than nothing.
Requests for extensions will be routinely denied -- students are strongly encouraged to avoid wasting the professor's time by asking for one. If a request is made, it will be quickly denied and the student will be asked why they chose not to read this policy.

Class Participation:
Students who actively participate in class and stay current with the reading assignments will receive consideration should their final grades be borderline.

Schedule:
8/23 Introduction. Give out A1.
8/30 Recommendation Algorithms. Give out A2.
9/6 Labor Day -- No class. A1 Due.
9/13 Evaluation of Recommenders
9/20 Scalable Recommenders
9/27 Classifiers: Naive Bayes, Decision Trees, SVM.
10/4 Scalable Classifiers. A2 Due. Give out A3.
10/11 Fall Break -- No class
10/18 Test #1
10/25 Association Rules. A3 Due.
11/1 Clustering Algorithms: K-Means, LDA
11/8 Scalable Clustering.
11/15 Paper Presentation Day 1
11/22 Paper Presentation Day 2
11/29 Paper Presentation Day 3
12/6 Test #2