ECLT5810/SEEM5750 E-Commerce Data Mining Techniques


Teacher

Prof. LAM Wai
E-mail: wlam@se.cuhk.edu.hk

Venue

Lecture class:
Yasumoto International Academic Park LT8 (YIA LT8)
Wednesdays 7:00pm-9:30pm

Laboratory Class sessions:
(Wed) 5:45pm-7:30pm - William M. W. Mong Engineering Building Room 602 (Lab)
(Wed) 7:30pm-9:15pm - William M. W. Mong Engineering Building Room 612 (Lab)

Tutors

1. LI Shuaiyi
Email: Li_Shuaiyi2021@outlook.com
2. ZHOU Chulun
Email: clzhou@link.cuhk.edu.hk

Announcements

Objectives
This course aims to provide a foundation in data mining concepts and techniques. It covers data mining algorithms and their technical implementation, and also discusses e-commerce data mining applications.
There will be two assignments, both of which require the use of Weka, a data mining package.
Students are also required to complete a project that carries out a data mining process. For the project, students may use any data mining package or programming language.

Course Schedule and Content (ongoing updates)
Sept 4 (YIA LT8) introduction; data pre-processing
Sept 11 (YIA LT8) data pre-processing; decision tree learning
Sept 18 (Holiday) No class
Sept 25 (YIA LT8) decision tree learning; model evaluation; Weka installation
Oct 2 (Lab Class) Lab Class sessions on Weka - installation, data preprocessing, simple feature engineering, basic decision trees, explanation of Assignment 1
Oct 9 (Lab Class) Lab Class sessions on Weka - more on feature engineering, more on decision trees, Assignment 1 Q&A
Oct 16 (YIA LT8) practical considerations; logistic regression
Oct 23 (YIA LT8) loan assessment case study; neural networks
Oct 30 (YIA LT8) clustering
Nov 6 (Lab Class) Lab Class sessions on Weka - neural networks, clustering, explanation of Assignment 2
Nov 13 (YIA LT8) logistic regression learning; Bayesian classification learning
Nov 20 (YIA LT8) text classification; association rule mining; Python libraries for data science
Nov 27 (Lab Class) Lab Class sessions - demo of Python libraries for data science; text classification

Grading
- assignments: 45%
- presentation: 15%
- project report: 40%

Lecture Notes

  • [introduction] ref: Provost's book chapter 1 and Han's book chapter 1
  • [data preprocessing] ref: Han's book chapter 2
  • [Industry Case - 5 Ways Amazon and Alibaba Use AI and Data Mining]
  • [classification and decision tree induction] ref: Han's book chapter 8.1 and 8.2
  • [decision tree sample application - Boston Globe]
  • [decision tree sample application - Analysis of Covid-19]
  • [classification evaluation] ref: Provost's book chapter 5
  • [Weka Installation]
  • [Industry Case from Ctrip]
  • [practical considerations for classification learning] ref: Provost's book chapter 5
  • [loan assessment case study article]
  • [loan assessment case study lecture note]
  • [Industry Case - Real-Life and Business Applications of Neural Networks]
  • [introduction to neural network] ref: Han's book chapter 9.2
  • [clustering] ref: Han's book chapter 2.4, 10.1 and 10.2
  • [logistic regression for classification]
  • [Bayesian classification] ref: Han's book chapter 8.3
  • [Industry Case from Deloitte - Credit Scoring - Case Study in Data Analytics]
  • [association rule] ref: Han's book chapter 6.1 and 6.2
  • [sentiment analysis and Bayesian approach for text classification] ref: "Introduction to Information Retrieval", Manning
  • [Bayesian spam filtering for Outlook]
  • [Python Libraries for Data Science]

Lab Class Materials

  • [Weka] Please follow the instructions on the website to install the stable version (3.8) of Weka. The site provides separate download links for different operating systems; select the one that matches yours and, if possible, download it before the lab class.
  • The bank marketing data set used in Weka notes: bank.csv
  • Lab Class 1: Tutorial Notes (uses the bank.csv data given above); recording
  • Lab Class 2: Tutorial Notes (uses the bank.csv data given above); recording
  • Lab Class 3: Tutorial Notes (uses bank-additional.csv, bank-additional-test.arff and clustering_coordinate.csv); recording
  • [Anaconda] Please follow the instructions on the website to install Anaconda. The site provides separate download links for different operating systems; select the one that matches yours and, if possible, download it before the lab class.
  • [Lab Class 4] (a [python demo] is provided)

Assignments

  • [Assignment 1] (data needed: bank-additional.csv and bank-new-clients.csv) [Solution reference]
  • [Assignment 2] (data needed: bank-additional.csv and bank-new-clients.csv)

Project

  • The project requires you to conduct a data mining analysis using a data mining package or your own program. You may use any package or programming language.
  • Data can be collected from the public domain or from your company. Examples of available data:
  • The project can be done individually or in a team of at most 2 students. The expected workload and report length are proportional to the number of students.
  • The report should fully describe the work done across the whole data mining analysis, not just the end results. The analysis often iterates through the data mining process several times rather than in one shot. The process may include problem analysis, data collection, data preparation, data transformation, mining methodologies, result analysis, lessons learned and so on. A report that presents merely the output diagrams will therefore receive a very low score.
  • Every student in a team must take part in the project presentation, because each student in a team receives a separate presentation mark. The language of the presentation is English. If you do the project individually, the presentation time is 10 minutes. If there are 2 members in the group, the presentation time for each student is 8 to 10 minutes and the total presentation time is no more than 20 minutes.
  • Record and produce a video of the presentation, e.g. using Zoom. Your face must be visible during the presentation.
  • Store the video on a cloud service that requires no password or access permission, so that it is publicly accessible. Then submit the link to the video in the Blackboard system. The deadline for the presentation video is Dec 11.
  • The project report should be submitted to the Blackboard system. The deadline for the report is Dec 16.

Reference Books

    Data Mining: Concepts and Techniques, Jiawei Han, Jian Pei, Hanghang Tong, 4th edition, Elsevier, 2023.
    (e-book can be accessed online via CUHK library)
    Data Mining: Practical Machine Learning Tools and Techniques, Ian Witten, Eibe Frank, Mark A. Hall, Christopher J. Pal, 4th edition, Elsevier, 2017.
    (e-book can be accessed online via CUHK library)
    Data Science for Business: What You Need to Know about Data Mining and Data-Analytic Thinking, F. Provost and T. Fawcett, O'Reilly.
    (e-book can be accessed online via CUHK library)
