ECLT5810/SEEM5750 E-Commerce Data Mining Techniques


Teacher

Prof. LAM Wai
E-mail: wlam@se.cuhk.edu.hk

Venue

Lecture class:
Yasumoto International Academic Park LT4 (YIA LT4)
Tuesdays 7:00pm-9:30pm

Laboratory Class sessions:
(Tue) 5:45pm-7:30pm - William M. W. Mong Engineering Building Room 602 (Lab)
(Tue) 7:30pm-9:15pm - William M. W. Mong Engineering Building Room 612 (Lab)

Tutors

1. ZHOU Chulun
Email: clzhou@link.cuhk.edu.hk
2. XU Ting
Email: xut0092@gmail.com

Announcements

Objectives
This course aims at providing a foundation in data mining concepts and techniques. Data mining algorithms and technical implementation are covered. E-commerce data mining applications are also mentioned.
There will be 2 assignments. Both assignments require the use of Weka, a data mining package.
Students are also required to do a project that carries out a data mining process. For the project, students can use any data mining package or programming language.

Course Schedule and Content (ongoing updates)
Sep 2 (YIA LT4) introduction; data pre-processing
Sep 9 (YIA LT4) data pre-processing; decision tree learning
Sep 16 (YIA LT4) decision tree learning; model evaluation; Weka installation
Sep 23 (Lab Class) Lab Class sessions on Weka - installation, data preprocessing, simple feature engineering, basic decision trees, explain Assignment 1
Sep 30 (Lab Class) Lab Class sessions on Weka - more on feature engineering, more on decision trees, Assignment 1 Q&A
Oct 7 (Holiday) No class
Oct 14 (YIA LT4) practical considerations; logistic regression
Oct 21 (YIA LT4) loan assessment case study; neural networks
Oct 28 (YIA LT4) clustering
Nov 4 (Lab Class) Lab Class sessions on Weka - neural networks, clustering, explain Assignment 2
Nov 11 (YIA LT4) logistic regression learning; Bayesian classification learning
Nov 18 (YIA LT4) text classification; association rule mining; Python libraries for data science
Nov 25 (Lab Class) Lab Class sessions on demo for Python libraries for data science; text classification

Grading
- assignments: 45%
- presentation: 15%
- project report: 40%

Lecture Notes

  • [introduction] ref: Provost's book chapter 1 and Han's book chapter 1
  • [Industry Case - How Big Data Analysis helped Walmart]
  • [Industry Case - 5 Ways Amazon and Alibaba Use AI and Data Mining]
  • [data preprocessing] ref: Han's book chapter 2
  • [classification and decision tree induction] ref: Han's book chapter 8.1 and 8.2
  • [decision tree sample application - Boston Globe]
  • [decision tree sample application - Analysis of Covid-19]

Lab Class Materials

  • [Weka] Please follow the instructions on the website to install the stable version (3.8) of Weka. The website provides separate download links for different operating systems; select the one that matches yours and, if possible, download it before the lab class.
  • The bank marketing data set used in Weka notes: bank.csv
  • [Anaconda] Please follow the instructions on the website to install Anaconda. The website provides separate download links for different operating systems; select the one that matches yours and, if possible, download it before the lab class.

Assignments

Project

  • The project requires you to conduct a data mining analysis using a data mining package or program. You can use any package or programming language.
  • Data can be collected from public domains or from your company. Examples of available data:
  • The project can be done individually or in a team of at most 2 students. The expected project workload and report are proportional to the number of students.
  • The report should fully describe the work done for the whole data mining analysis, not just the end results. The analysis often iterates through the data mining process several times rather than in one shot. The process may include problem analysis, data collection, data preparation, data transformation, mining methodologies, result analysis, lessons learned and so on. A report that presents only the output diagrams will therefore receive a very low score.
  • Every student in a team must give part of the project presentation, because the presentation is marked separately for each student in the same team. The language of presentation is English. If you do the project individually, the presentation time is 10 minutes. If there are 2 members in the group, the presentation time for each student is 8 to 10 minutes and the total presentation time is no more than 20 minutes.
  • Record a video of the presentation, e.g. using Zoom, and show your face during the presentation.
  • Store the video on a cloud service that requires no password or access permission, so that the video is publicly accessible. Then submit the cloud link in the Blackboard system. The deadline for the presentation video is Dec 11.
  • The project report should be submitted to the Blackboard system. The deadline for the report is Dec 16.
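
As a rough illustration of what "iterating the data mining process" can look like in a program, here is a minimal Python sketch using only the standard library. Everything in it is an assumption made for illustration: the toy records are invented, the function names (prepare, train_one_rule, accuracy, run_process) are hypothetical, and the one-rule classifier simply stands in for whichever mining method a project actually uses. It is not a required template for the project.

```python
from collections import Counter, defaultdict

# Invented toy records for illustration only: (age_group, has_loan, subscribed).
# The last field is the class label; None marks a missing value.
RECORDS = [
    ("young", "no", "yes"),
    ("young", "yes", "no"),
    ("middle", "no", "yes"),
    ("middle", "yes", "no"),
    ("senior", "no", "yes"),
    ("senior", None, "no"),
    ("young", "no", "yes"),
    ("middle", "yes", "no"),
    ("senior", "no", "yes"),
    ("senior", "yes", "no"),
]
FEATURES = ["age_group", "has_loan"]


def prepare(records):
    """Data preparation: drop records that contain a missing value."""
    return [r for r in records if all(v is not None for v in r)]


def train_one_rule(rows, feature_index):
    """Mining step: map each feature value to its majority class label."""
    counts = defaultdict(Counter)
    for row in rows:
        counts[row[feature_index]][row[-1]] += 1
    return {value: c.most_common(1)[0][0] for value, c in counts.items()}


def accuracy(rule, rows, feature_index, default):
    """Result analysis: fraction of rows whose label the rule predicts."""
    hits = sum(rule.get(r[feature_index], default) == r[-1] for r in rows)
    return hits / len(rows)


def run_process(records, feature_names):
    """One pass per candidate feature: prepare, split, train, evaluate."""
    rows = prepare(records)
    split = len(rows) * 2 // 3  # simple hold-out split, first two thirds train
    train, test = rows[:split], rows[split:]
    # Fall back to the majority training label for unseen feature values.
    default = Counter(r[-1] for r in train).most_common(1)[0][0]
    results = {}
    for i, name in enumerate(feature_names):
        rule = train_one_rule(train, i)
        results[name] = accuracy(rule, test, i, default)
    return results


if __name__ == "__main__":
    for name, acc in run_process(RECORDS, FEATURES).items():
        print(f"{name}: hold-out accuracy {acc:.2f}")
```

In a report, each such pass (what was tried, why, and what the evaluation showed) would be described alongside the final result, rather than showing only the last output.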

Reference Books

    Data Mining: Concepts and Techniques, Jiawei Han, Jian Pei, Hanghang Tong, 4th edition, Elsevier, 2023.
    (e-book can be accessed online via CUHK library)
    Data Mining: Practical Machine Learning Tools and Techniques, Ian Witten, Eibe Frank, Mark A. Hall, Christopher J. Pal, 4th edition, Elsevier, 2017.
    (e-book can be accessed online via CUHK library)
    Data Science for Business: What You Need to Know about Data Mining and Data-Analytic Thinking, F. Provost and T. Fawcett, O'Reilly.
    (e-book can be accessed online via CUHK library)