Lecture class:
Yasumoto International Academic Park LT8 (YIA LT8)
Wednesdays 7:00pm-9:30pm
Laboratory Class sessions:
(Wed) 5:45pm-7:30pm - William M. W. Mong Engineering Building (ERB) Room 602 (Lab)
(Wed) 7:30pm-9:15pm - William M. W. Mong Engineering Building (ERB) Room 612 (Lab)
Students are allocated to the Lab Class sessions as follows:
Students whose surname begins with a letter from "A" to "S" should attend
ERB602 at 5:45pm.
Students whose surname begins with a letter from "T" to "Z" should attend
ERB612 at 7:30pm.
Part-time students may attend the 7:30pm session.
Objectives
This course aims to provide a foundation in data mining concepts and
techniques. It covers data mining algorithms and their technical implementation,
and also touches on e-commerce data mining applications.
There will be 2 assignments. Both assignments require the use of
Weka, a data mining package.
Students are also required to do a project that carries out a data mining process.
For the project, students can use any data mining package or programming
language.
Course Schedule and Content (ongoing updates)
Sept 4 (YIA LT8)
introduction; data pre-processing
Sept 11 (YIA LT8)
data pre-processing; decision tree learning
Sept 18 (Holiday)
No class
Sept 25 (YIA LT8)
decision tree learning; model evaluation; Weka installation
Oct 2 (Lab Class)
Lab Class sessions on Weka - installation, data preprocessing, simple
feature engineering, basic decision trees, explain Assignment 1
Oct 9 (Lab Class)
Lab Class sessions on Weka - more on feature engineering, more on
decision trees, Assignment 1 Q&A
Oct 16 (YIA LT8)
practical considerations; logistic regression
Oct 23 (YIA LT8)
loan assessment case study, neural networks
Oct 30 (YIA LT8)
clustering
Nov 6 (Lab Class)
Lab Class sessions on Weka - neural networks, clustering, explain Assignment 2
[Weka] Please follow the instructions on the website to install
the stable version (3.8) of Weka. The website provides separate download links
for different operating systems; select the one that matches yours (if possible,
download it before the lab class).
The bank marketing data set used in Weka notes: bank.csv
[Anaconda]
Please follow the instructions on the website to install Anaconda. The website
provides separate download links for different operating systems; select the one
that matches yours (if possible, download it before the lab class).
The project requires you to conduct a data mining analysis
using a data mining package or program.
You can use any package or any programming language.
Data can be collected from public domains or from your company.
Examples of available data:
The project can be done individually or in a team of at
most 2 students.
The expected project workload and report length are proportional
to the number of students in the team.
The report should fully describe the work done for the whole data mining analysis, not just the end results.
The analysis will often iterate through the data mining process several times rather than complete it in a single pass.
The process may include problem analysis, data collection, data preparation, data transformation,
mining methodologies, result analysis, lessons learned and so on. A report that presents
only the output diagrams will therefore receive a very low score.
Every student in a team must take part in the project presentation, because each
team member is marked individually on the presentation.
The language of presentation is English.
If you do the project individually, the presentation time is 10 minutes.
If there are 2 members in the group, the presentation time for each
student is 8 to 10 minutes and the total presentation time is no more than 20 minutes.
Record a video of the presentation, e.g. using Zoom.
Your face must be visible during the presentation.
Store the video on a cloud service with a link that requires no password or access
permission, so that the video is publicly accessible.
Then submit the link to the Blackboard system.
The deadline for the presentation video is Dec 11.
The project report should be submitted to the Blackboard system. The
deadline for the report is Dec 16.
Reference Books
Data Mining - Concepts and Techniques, Jiawei Han, Jian Pei, Hanghang
Tong, 4th edition,
Elsevier Inc., 2023.
(e-book can be accessed online via CUHK library)
Data Mining: Practical Machine Learning Tools and Techniques,
Ian Witten, Eibe Frank, Mark A. Hall, Christopher J. Pal,
4th edition, Elsevier, 2017.
(e-book can be accessed online via CUHK library)
Data Science for Business: What You Need to Know about Data Mining and
Data-Analytic Thinking, F. Provost and T. Fawcett, O'Reilly.
(e-book can be accessed online via CUHK library)