Table of contents
Abstract
Acknowledgements
Contents
List of Figures
List of Tables
1 Web Page Segmentation and Evaluation
1.1 Preliminaries
1.1.1 Web applications
1.1.2 Rendering
1.1.3 Rendered DOM
1.1.4 Element positioning
1.2 Web Page Characteristics
1.2.1 Web page characteristics from the rendered DOM
1.2.2 Characteristics related to the website
1.2.3 Glossary
1.3 Web page segmentation
1.3.1 Concepts
1.3.2 Notation
1.3.3 Top-down versus bottom-up
1.3.4 Basic Approaches
1.3.5 Hybrid Approaches
1.3.6 Conclusion on Web page segmentation algorithms
1.3.7 Document processing and Web page segmentation
1.3.8 Summary Table
1.3.9 Discussion
1.4 Segmentation evaluation
1.4.1 Classification of evaluation methods
1.4.2 Segmentation correctness evaluation
1.4.3 Correctness measures in scanned document segmentation
1.4.4 State of the art on evaluating Web page segmentation
1.4.5 Summary table
1.4.6 Discussion
2 Block-o-Matic (BoM): a New Web Page Segmenter
2.1 Preliminaries
2.2 Overview
2.3 Fine-grained segmentation construction
2.4 Composite block and flow detection
2.5 Merging blocks
2.6 Discussion
3 Segmentation evaluation model
3.1 Model adaptation
3.2 Representation of segmentation
3.2.1 Absolute representation of a segmentation
3.2.2 Normalized Segmentation Representation
3.2.3 Block importance
3.3 Representation of the evaluation
3.3.1 Measuring text coverage
3.3.2 Measuring block correspondence
3.4 Example
3.4.1 Computing the importance
3.4.2 Computing text coverage
3.4.3 Computing block correspondence
3.5 Discussion
4 Experimentation
4.1 Overview
4.2 Block descriptors
4.3 Tested segmentation algorithms
4.3.1 BF (BlockFusion)
4.3.2 BoM (Block-o-Matic)
4.3.3 VIPS (Vision-based Web Page Segmentation)
4.3.4 jVIPS (Java VIPS)
4.3.5 Summary
4.4 Dataset construction
4.4.1 Dataset organization
4.4.2 Ground truth construction
4.5 Experiments and results
4.5.1 Setting the stop condition parameters
4.5.2 Setting the thresholds
4.5.3 Computing block correspondence
4.5.4 Computing text coverage
4.6 Discussion
5 Applications
5.1 Pagelyzer
5.1.1 How does it work?
5.1.2 Implementation
5.1.3 Practical application
5.1.4 Perspectives and outlook
5.2 Block-based migration of HTML4 standard to HTML5 standard
5.2.1 Introduction
5.2.2 Proposed solution
5.2.3 Experiments
5.2.4 Results
5.2.5 Perspectives and outlook
A HTML5 Content Categories
B Semantic HTML5 elements
C Web page segmentation evaluation metrics
C.1 Adjusted Rand Index
C.2 Normalized Mutual Information
C.3 Dunn Index
C.4 Nested Earth Mover’s Distance
C.5 Precision, Recall and F1 score
D Web Segmentation approaches details
D.1 Text-based
D.2 Vision-based
Bibliography




