The source code includes the inference and training algorithms of the Web page segmentation model presented in the paper “Web Page Segmentation with Structured Prediction and its Application in Web Page Classification” (SIGIR 2014).

This program is provided free of charge for non-commercial research and educational purposes. For commercial use, you must obtain a license from the author.

To use the code:

  • Put the folder “glpk”, which contains the ILP solver, in the same directory as the learning and inference executables.
  • To run the training procedure, use the command: svm_pageseg_learn -c 5 -e 0.5 -v 4 test_data/train_d9 test_data/model/d9.model
  • To run the inference procedure, use the command: svm_pageseg_classify test_data/train_d9 test_data/model/d9.model test_data/test/out
  • For more details, view the help information by running svm_pageseg_learn - and svm_pageseg_classify -.
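
The steps above can be chained in a small wrapper script. The following is only a sketch under stated assumptions: BIN_DIR, DRY_RUN and the run helper are names introduced here for illustration and are not part of the released code, and the script assumes the two programs return a non-zero exit status on failure. The command lines themselves are copied unchanged from the list above. By default the script only prints the commands it would execute.

```shell
#!/bin/sh
# Sketch only: wraps the training and inference commands listed above.
# BIN_DIR, DRY_RUN and run() are hypothetical names, not part of the release.
set -e  # stop at the first failing command (assumes non-zero exit on error)

BIN_DIR="${BIN_DIR:-.}"    # directory holding the two executables
DRY_RUN="${DRY_RUN:-1}"    # 1 = print the commands only, 0 = execute them

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# The "glpk" folder (ILP solver) must sit next to the executables.
if [ "$DRY_RUN" != "1" ] && [ ! -d "$BIN_DIR/glpk" ]; then
    echo "error: folder 'glpk' not found in $BIN_DIR" >&2
    exit 1
fi

# Training, then inference, with the arguments given in the list above.
run "$BIN_DIR/svm_pageseg_learn" -c 5 -e 0.5 -v 4 test_data/train_d9 test_data/model/d9.model
run "$BIN_DIR/svm_pageseg_classify" test_data/train_d9 test_data/model/d9.model test_data/test/out
```

Running the script as is prints the two commands. Setting DRY_RUN=0 executes them, after checking that the “glpk” folder is present in BIN_DIR.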

Download: Source code for Web page segmentation