Sketch Recognition Techniques

Chapter 2 Related Work

This chapter reviews sketch recognition research. It begins by summarising the current state of sketch recognition techniques, focusing on the division of writing and drawing. Next, it reviews the stroke features used in sketch recognition research. Following this, Section 2.3 gives an overview of data collection techniques, the existing datasets available for hand-drawn diagrams, and the tools available to support this task. Finally, a review of data mining tools and techniques provides a background for our data analysis methodology.

Sketch Recognition Techniques

Sketch tools generally include some form of recognition. Early sketch tools include the user interface design software Silk (Landay et al. 1996) and The Electronic Cocktail Napkin (Gross 1996) for sketching early designs (shown in Figure 10). Both these tools provide basic recognition of hand-drawn diagrams. There is also early work in sketching using digital whiteboards such as LiveBoard (Elrod et al. 1992) and Tivoli (Pedersen et al. 1993; Moran et al. 1997; Moran et al. 1998) (shown in Figure 11). These include gesture recognition for commands such as scrolling, page turning, delete, select and move, and allow the user to group text and shapes manually (Moran et al. 1997).
Rubine’s work (1991) in feature-based gesture recognition has been used by many other sketch recognition systems, including Silk (Landay et al. 1996). It uses a linear classifier with 13 features for single-stroke ink recognition. Rubine reported a 96.8% success rate. However, later experiments that re-implemented Rubine’s algorithm have reported lower rates: 86% (Plimmer 2004) and 84% (Young 2005). Despite this, his algorithm has been widely adopted (Landay et al. 1995; Damm et al. 2000; Lin et al. 2000; Chen et al. 2003; Plimmer et al. 2003b; Plimmer 2004; Chung et al. 2005; Young 2005; Freeman et al. 2007), mostly due to its ease of implementation, with various alterations to the feature set reported.
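To make the idea of a feature-based linear classifier concrete, the following is a minimal sketch and not Rubine's published implementation. It computes only three geometric stroke features (bounding-box diagonal, path length and total turning angle) rather than the full set of 13, and the function names and the weights used in the example are hypothetical; in practice the weights would be learned from training samples.

```python
import math


def stroke_features(points):
    """Return [bounding-box diagonal, path length, total turning angle].

    `points` is a list of (x, y) tuples in drawing order. These three
    features are an illustrative subset, not Rubine's full feature set.
    """
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    diagonal = math.hypot(max(xs) - min(xs), max(ys) - min(ys))
    path_length = sum(math.dist(a, b) for a, b in zip(points, points[1:]))
    turning = 0.0
    for a, b, c in zip(points, points[1:], points[2:]):
        v1 = (b[0] - a[0], b[1] - a[1])
        v2 = (c[0] - b[0], c[1] - b[1])
        cross = v1[0] * v2[1] - v1[1] * v2[0]
        dot = v1[0] * v2[0] + v1[1] * v2[1]
        turning += math.atan2(cross, dot)  # signed angle between segments
    return [diagonal, path_length, turning]


def classify(features, weights):
    """Return the class whose linear score w0 + sum(wi * fi) is highest.

    `weights` maps a class label to [w0, w1, ..., wn].
    """
    def score(w):
        return w[0] + sum(wi * fi for wi, fi in zip(w[1:], features))

    return max(weights, key=lambda label: score(weights[label]))


# Hypothetical, hand-picked weights for two classes (not trained values).
EXAMPLE_WEIGHTS = {
    "line": [0.5, 1.0, -1.0, 0.0],
    "box": [0.0, -1.0, 1.0, 0.0],
}

square = [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)]
segment = [(0, 0), (1, 0), (2, 0)]
square_label = classify(stroke_features(square), EXAMPLE_WEIGHTS)
segment_label = classify(stroke_features(segment), EXAMPLE_WEIGHTS)
```

The appeal noted in the text, ease of implementation, shows here: once features are computed, classification is one weighted sum per class.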
Recognition for many diagram domains has been explored, including CALI (Fonseca et al. 2002) for general shape recognition, mechanical engineering design tools (Stahovich et al. 1995; Sezgin et al. 2001), Tahuti (Hammond et al. 2002) and Lank’s system (2000) for UML class diagrams, and SketchNode (Plimmer et al. 2010) for graphs, as well as the multi-domain recognition tools SketchREAD (Alvarado et al. 2004) and InkKit (Plimmer et al. 2007).
In this project, we are particularly looking at the problem of distinguishing between text and shapes as a first step to recognising sketched diagrams. This is a fundamental problem required to preserve a non-modal user interface similar to pen and paper (Plimmer 2004). When text and shape strokes are accurately divided, the symbols can be passed to specific handwriting and shape recognisers to continue the recognition process.
Two particular applications of dividers are freehand note-taking and hand-drawn diagrams. The research on sketched diagram recognition includes dividers, but has also addressed recognition of basic shapes and spatial relationships between diagram components. This project has drawn on the work from both applications of dividers.

Sketch Diagram Recognition

In the area of sketch diagram recognition, many systems focus only on shapes (Rubine 1991; Fonseca et al. 2002; Leung et al. 2002; Yu et al. 2003; Szummer et al. 2004; Qi et al. 2005; Wobbrock et al. 2007; Paulson et al. 2008a). Character recognition is also a mature area of research. However, less attention has been given to the division of text and shapes, although they are both present in diagrams.
Lank et al. (2000) designed a system for recognising hand-drawn UML diagrams. Their first step is to group strokes into glyphs using intersection tests and temporal context information, and then perform recognition. The glyphs are divided into writing and drawing, based primarily on bounding box size, as character glyphs are usually smaller than shapes. They report that they have not found writing to be misclassified using this method, but they have found small shapes to be misclassified as writing. The Tahuti system (Hammond et al. 2002), another tool for UML class diagrams, also performs some division of shape and text strokes. Text is considered to be smaller than shape classes and contained by or close to a class. These domain-specific solutions for division only consider a small range of symbols and can use spatial context more reliably, such as Tahuti’s use of stroke location.
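A size-based division of this kind can be sketched as follows. This is an illustration under stated assumptions and not the code of Lank et al.: the function names, the glyph representation (a list of strokes, each a list of (x, y) points) and the threshold `max_text_size` are all hypothetical.

```python
def bounding_box(glyph):
    """Return (width, height) of the box enclosing every point in the glyph."""
    xs = [x for stroke in glyph for x, _ in stroke]
    ys = [y for stroke in glyph for _, y in stroke]
    return max(xs) - min(xs), max(ys) - min(ys)


def divide_glyph(glyph, max_text_size=30.0):
    """Label a glyph 'writing' if its bounding box is small, else 'drawing'.

    `max_text_size` is an assumed threshold in input-device units.
    """
    width, height = bounding_box(glyph)
    if width <= max_text_size and height <= max_text_size:
        return "writing"
    return "drawing"


letter = [[(0, 0), (5, 12)], [(5, 12), (10, 0)]]           # small glyph
class_box = [[(0, 0), (200, 0), (200, 120), (0, 120), (0, 0)]]
small_arrowhead = [[(0, 0), (8, 4), (0, 8)]]               # a small shape

letter_label = divide_glyph(letter)
class_box_label = divide_glyph(class_box)
arrowhead_label = divide_glyph(small_arrowhead)
```

The last example reproduces the failure mode reported above: a small shape falls under the size threshold and is labelled as writing.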
There are also domain-independent diagramming tools that have built-in dividers. InkKit (Freeman et al. 2007) is one such tool. The divider in InkKit has two phases. The first phase evaluates the stroke in isolation using Rubine’s algorithm (1991) with partially adapted features, trained on predefined writing and drawing samples. The second phase uses spatial context to further identify which class the stroke in question belongs to. This phase rests on the theory that strokes in close proximity to one another are usually from the same class. After text-shape division, the recognition process continues and results in the identification of domain components as shown in Figure 12.
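The second, context-based phase can be illustrated with a simple neighbourhood vote. This is only a sketch of the proximity assumption described above, not InkKit's actual algorithm: the function names, the use of bounding-box centres to measure proximity, and the `radius` value are assumptions made for the example.

```python
import math


def centre(points):
    """Centre of a stroke's bounding box."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return ((min(xs) + max(xs)) / 2, (min(ys) + max(ys)) / 2)


def refine_with_context(strokes, labels, radius=50.0):
    """Relabel each stroke by majority vote among itself and nearby strokes.

    `labels` holds the phase-one label of each stroke. Votes use those
    original labels; on a tie the stroke keeps its own label, because its
    own vote is inserted into the tally first.
    """
    centres = [centre(s) for s in strokes]
    refined = []
    for i, own in enumerate(labels):
        votes = {own: 1}
        for j, other in enumerate(labels):
            if i != j and math.dist(centres[i], centres[j]) <= radius:
                votes[other] = votes.get(other, 0) + 1
        refined.append(max(votes, key=votes.get))
    return refined


strokes = [
    [(0, 0), (10, 10)],
    [(12, 0), (22, 10)],      # sits between two text strokes
    [(24, 0), (34, 10)],
    [(500, 500), (600, 600)],  # far from everything else
]
phase_one = ["text", "shape", "text", "shape"]
phase_two = refine_with_context(strokes, phase_one)
```

In this example the second stroke is relabelled as text because both of its neighbours are text, while the distant stroke keeps its phase-one label.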
Lineogrammer (Zeleznik et al. 2008) is another domain-independent sketch tool with a text-shape divider. Their approach is a variant of my previous work (Patel et al. 2007) using a decision tree which they have tuned themselves. Their divider is based on a set of heuristics examining size, geometry, and spatial and temporal context. Handwriting is limited to a maximum of 2 cm in height. Spatial context is used in a similar style to InkKit (Freeman et al. 2007), where strokes intersecting or close to one another belong to the same class. For isolated strokes, the ratio of stroke length to the number of cusps is used to identify cursive writing, and a handwriting recogniser is used to classify any other text. Temporal information is used to classify strokes as text when sketched quickly and as drawing when the input is slow.
These systems are predominantly rule-based, using various stroke features chosen heuristically to distinguish between writing and drawing.
More recent research has produced a small number of domain-independent dividers for distinguishing between writing and drawing in sketches. A variety of techniques has been used, ranging from rule-based dividers to decision trees and neural networks. All dividers are based on features. However, most studies have focused their development on one or two algorithms and rely on very limited feature sets.
In my previous work (Patel 2007; Patel et al. 2007) (referred to as “Divider 2007” in this thesis) we developed a domain-independent feature-based divider for shapes and text using a decision tree. This divider is unique in that it was developed using statistical analysis of a set of 46 stroke features, a much more comprehensive selection of features than other dividers use. A decision tree was built that identified eight features as significant for distinguishing between shapes and text (Figure 13). The results on a test set showed a classification rate of 78.6% for text and 57.9% for shapes. Part of the test set was composed of musical notes, which had a significant effect on this low classification rate. However, when evaluated against the Microsoft (2005) and InkKit (Freeman et al. 2007) dividers, it was able to correctly classify more strokes overall for the test set.
Bishop et al. (2004) developed a feature-based divider that uses local stroke features and spatial and temporal context within a Multilayer Perceptron model (a type of neural network) and a Hidden Markov Model (HMM) to distinguish between text and shape strokes. The features used are described in Table 2 and Table 3.
They first consider the stroke in isolation using features from Table 2. These features involve simple point-based calculations and more complicated features using principal component analysis and stroke fragmentation. Stroke fragments are used because they believe that if the largest fragment is large (possibly representing the whole stroke) and has a high length-to-width ratio (feature 4 in Table 2), then the stroke is likely to be a shape stroke. A Multilayer Perceptron model was trained using these feature vectors and produces a probability for each stroke as to whether it represents text or shapes.
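One plausible way to obtain a length-to-width ratio from principal component analysis is sketched below. It is an assumption that the ratio is computed this way; the source does not give the formula, and the function name is hypothetical. The sketch takes the covariance matrix of a fragment's points, finds its two eigenvalues in closed form, and returns the ratio of the spreads along the major and minor principal axes.

```python
import math


def principal_axis_ratio(points):
    """Ratio of spread along the major axis to spread along the minor axis.

    `points` is a list of (x, y) tuples. Returns math.inf when the points
    are collinear (no spread along the minor axis).
    """
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points) / n
    syy = sum((y - mean_y) ** 2 for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points) / n

    # Eigenvalues of the symmetric 2x2 covariance matrix [[sxx, sxy], [sxy, syy]].
    half_trace = (sxx + syy) / 2
    det = sxx * syy - sxy * sxy
    root = math.sqrt(max(half_trace ** 2 - det, 0.0))
    major = half_trace + root
    minor = half_trace - root

    if minor <= 1e-12:
        return math.inf
    return math.sqrt(major / minor)  # ratio of standard deviations


diagonal_line = [(0, 0), (1, 1), (2, 2), (3, 3)]
unit_square = [(0, 0), (1, 0), (1, 1), (0, 1)]
long_rectangle = [(0, 0), (4, 0), (4, 1), (0, 1)]

line_ratio = principal_axis_ratio(diagonal_line)
square_ratio = principal_axis_ratio(unit_square)
rectangle_ratio = principal_axis_ratio(long_rectangle)
```

A long, thin fragment such as a connector line gives a large ratio, whereas compact fragments give a ratio close to 1, which matches the intuition described above for separating shape strokes from text.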

Chapter 1 Introduction
1.1 Motivation
1.2 Thesis Objectives
1.3 Thesis Outline
1.4 Definition of Terms
Chapter 2 Related Work
2.1 Sketch Recognition Techniques
2.2 Features
2.3 Data Collection
2.4 Data Mining Tools and Techniques
2.5 Summary
Chapter 3 Methodology
3.1 Feature Search
3.2 Data Collection
3.3 Data Analysis
3.4 Evaluation
Chapter 4 Feature Search
4.1 Curvature
4.2 Density
4.3 Direction
4.4 Intersections
4.5 Pressure
4.6 Size
4.7 Spatial Context
4.8 Temporal Context
4.9 Time/Speed
4.10 Divider Results
4.11 New Features
4.12 Summary
Chapter 5 Data Collection
5.1 DataManager Requirements
5.2 Usage Example
5.3 Implementation
5.4 Usability Study
5.5 Data Collection
Chapter 6 Data Analysis
6.1 Preliminary Analysis of Classifiers
6.2 Classifier Tuning
6.3 Feature Selection
6.4 Ensembles
6.5 Second Round Analysis
6.6 Computational Requirements
6.7 Summary
Chapter 7 Evaluation
7.1 Divider Implementation
7.2 Test Data
7.3 Results
7.4 Domain-Specific Divider
7.5 Summary
Chapter 8 Discussion
8.1 Features and Feature Search
8.2 Data Collection and DataManager
8.3 Data Analysis
8.4 Evaluation
Chapter 9 Conclusions and Future Research
9.1 Conclusions
9.2 Future Research
References