ECLT5810/SEEM5750 E-Commerce Data Mining Technique
Teacher
Prof. LAM Wai
Email: wlam@se.cuhk.edu.hk
Venue
Lecture class:
Yasumoto International Academic Park LT4 (YIA LT4)
Tuesdays 7:00pm-9:30pm
Laboratory Class sessions:
(Tue) 5:45pm-7:30pm - William M. W. Mong Engineering Building Room 602 (Lab)
(Tue) 7:30pm-9:15pm - William M. W. Mong Engineering Building Room 612 (Lab)
Tutors
1. ZHOU Chulun
Email: clzhou@link.cuhk.edu.hk
2. XU Ting
Email: xut0092@gmail.com
Announcements
- For Lab Class sessions, students are allocated as follows:
  Students whose surname begins with a letter from A to W: please attend ERB602 at 5:45pm.
  Students whose surname begins with a letter from X to Z: please attend ERB612 at 7:30pm.
  Part-time students may attend the 7:30pm session.
Objectives
This course aims to provide a foundation in data mining concepts and
techniques. It covers data mining algorithms and their technical implementation,
and also touches on e-commerce data mining applications.
There will be 2 assignments. Both assignments require the use of
Weka, a data mining package.
Students are also required to do a project that carries out a data mining process.
For the project, students can use any data mining package or programming
language.
Course Schedule and Content (ongoing updates)
Sep 2 (YIA LT4) | introduction; data pre-processing
Sep 9 (YIA LT4) | data pre-processing; decision tree learning
Sep 16 (YIA LT4) | decision tree learning; model evaluation; Weka installation
Sep 23 (Lab Class) | Lab Class sessions on Weka - installation, data preprocessing, simple feature engineering, basic decision trees, explain Assignment 1
Sep 30 (Lab Class) | Lab Class sessions on Weka - More on feature engineering, More on decision trees, Assignment 1 Q&A
Oct 7 (Holiday) | No class
Oct 14 (YIA LT4) | practical consideration, logistic regression
Oct 21 (YIA LT4) | loan assessment case study, neural networks
Oct 28 (YIA LT4) | clustering
Nov 4 (Lab Class) | Lab Class sessions on Weka - neural networks, clustering, explain Assignment 2
Nov 11 (YIA LT4) | logistic regression learning; Bayesian classification learning
Nov 18 (YIA LT4) | text classification; association rule mining; Python libraries for data science
Nov 25 (Lab Class) | Lab Class sessions on demo for Python libraries for data science; text classification
Grading
- assignments: 45%
- presentation: 15%
- project report: 40%
Lecture Notes
[introduction]
ref: Provost's book chapter 1 and Han's book chapter 1
[Industry Case - How Big Data Analysis helped Walmart]
[Industry Case - 5 Ways Amazon and Alibaba Use AI and Data Mining]
[data preprocessing]
ref: Han's book chapter 2
[classification and decision tree induction]
ref: Han's book chapter 8.1 and 8.2
[decision tree sample application - Boston Globe]
[decision tree sample application - Analysis of Covid-19]
Lab Class Materials
[Weka] Please follow the instructions on the website to install
the stable version (3.8) of Weka. The website provides separate download links for each operating system; select the one that matches yours (if possible,
download it before the lab class).
The bank marketing data set used in Weka notes: bank.csv
[Anaconda]
Please follow the instructions on the website to install Anaconda. The website provides separate download links for each operating system; select the one that matches yours (if possible, download it before the lab class).
Assignments
Project
The requirement of the project is to conduct a data mining
analysis using a data mining package or program.
You can use any package or any programming language.
Data can be collected from public domains or from your company.
Examples of available data:
The project can be done individually or in a team of at
most 2 students.
The expected project workload and report are proportional
to the number of students in the team.
The report should fully describe the work done for the whole data mining analysis, not just the end results.
The analysis often iterates through the data mining process several times rather than finishing in a single pass.
The process may include problem analysis, data collection, data preparation, data transformation,
mining methodologies, result analysis, lessons learned and so on. Therefore, a report that presents
only the output diagrams will receive a very low score.
Every student in a team needs to take part in the project presentation because the presentation
mark is given individually to each student in the team.
The language of presentation is English.
If you do the project individually, the presentation time is 10 minutes.
If there are 2 members in the group, the presentation time for each
student is 8 to 10 minutes and the total presentation time is no more than 20 minutes.
Record a video of the presentation, e.g. using Zoom.
Show your face during the presentation.
Store the video on a cloud service with no password or permission required, so that the video is
publicly accessible.
Then submit the link to the video in the Blackboard system.
The deadline for the presentation video is Dec 11.
The project report should be submitted to the Blackboard system. The
deadline for the report is Dec 16.
Reference Books
Data Mining: Concepts and Techniques, Jiawei Han, Jian Pei, Hanghang Tong,
4th edition, Elsevier, 2023.
(e-book can be accessed online via CUHK library)
Data Mining: Practical Machine Learning Tools and Techniques,
Ian Witten, Eibe Frank, Mark A. Hall, Christopher J. Pal,
4th edition, Elsevier, 2017.
(e-book can be accessed online via CUHK library)
Data Science for Business: What You Need to Know about Data Mining and
Data-Analytic Thinking, F. Provost and T. Fawcett, O'Reilly.
(e-book can be accessed online via CUHK library)