This report will draw on resources from various existing fields for methodology, theory and inspiration. This section will give an overview of these fields, with an emphasis on how they relate to this report.
Pattern Recognition is a broad, vaguely defined field that attempts to find interesting correlations between and within sets of data. It spans a number of subfields and is primarily considered in light of its applications in other fields, such as computer science, physics, neurobiology, psychology, engineering, statistics, mathematics or cognitive science (Pal, 2001 p. 2).
Pattern Recognition attempts to emulate in computers the human ability to make accurate distinctions between objects and concepts based on vague and unreliable data. As movement analysis deals with highly situational and flexible behaviour, it is a prime example of the kind of discrete yet relational data set that makes general solutions to the problem of Pattern Recognition so elusive (Jain, 1999a p. 2).
Granular Computing is an emerging field attempting to consolidate and extrapolate on standards for grouping related atomic variables into lower resolution components. It builds on the view that human perception and cognition intuitively focus on that level of detail in a given system which matches potential, known patterns, and attempts to describe these levels in terms of fundamental information granules (Pedrycz, 2007).
A granule is defined as a meaningful abstraction of data, a pattern that emerges when considering a certain level of resolution, but is either lost in noise or in information entropy when viewed at a higher or lower resolution respectively. An example is the proverbial forest that can’t be seen for the trees.
Common terms to describe granules are large and small, coarse and fine, low and high resolution, low and high granularity. These refer to examining data in large and small sets; building on the previous example the forest would be large, coarse or low while the trees would be small, fine or high.
Granules are not universal but are defined arbitrarily for each instance, which gives rise to two separate subdisciplines: Granule Construction and Granule Calculation (Yao, 2000; Pedrycz, 2007). These deal, respectively, with the identification of atomic descriptors in given patterns and with the use of these descriptors in describing more complex variations.
Data Mining & Knowledge Discovery in Databases
The field of Data Mining is concerned with methods for distilling interesting, new information from large sets of data (Hegland, 2003 p. 5). The process can be broken down into three steps (Fayyad, 1997 p. 102):
Normalize, refine or complement the data in a pre-processing stage
Identify and extract patterns
Analyze the relevance of the results
Data mining is sometimes considered a subset or a step of a broader subject called Knowledge Discovery in Databases (KDD) (Adriaans, 1996 p. 5; Fayyad, 1997 p. 102), in which case its scope is limited to identifying and extracting patterns. In many publications the two are considered synonymous (Adriaans, 1996 p. 5; Hegland, 2003), but in this report we will use the former definition.
As noted above, pattern recognition is a popular field with a wide variety of applications. Substantial research has been done in these areas, especially over the past few decades, when new business practices and standards, coupled with the rise in ubiquity and capability of hardware, have significantly increased the amount of data available to commercial and administrative entities, while intuitive comprehension of that data has understandably decreased (Adriaans, 1996 p. 2; Olafsson, 2006). Areas like cluster analysis employ multiple parameters to group features into multidimensional categories. Computers excel at this, since an arbitrary number of dimensions adds little complexity for them, whereas humans find it difficult, if not impossible, to visualize connections in hyperdimensional space (Höppner, 1999 p. 1; Olafsson, 2006; Fayyad, 1997 p. 101).
Focus has therefore been put on Machine Learning, which removes as much of the interpretation as possible from the human operator and lets autonomous programs interpret the data themselves and output the correlations they find. The benefits of this approach are the aforementioned comprehension and speed: computers are much better at finding connections between entities with hyperdimensional parameters, and are capable of processing far greater sets of data. The disadvantage is the uncertainty of the results. Machine learning approaches are good at finding correlations, but a similarity in one or more aspects is not inherently interesting, nor is its relevance obvious (Fayyad, 1997 p. 102-103; Kodratoff, 2001 p. 15). If the human operator cannot understand the connection, the data is little more than additional noise in the system.
There are several ways to deal with this uncertainty. Some approaches consider it a part of the recognition criteria to link the correlations to previously explored and established patterns (Jain, 1999a p. 3), whereas some limit the scope of the search by selecting a suitable subset of dimensions to use for clustering (Fayyad, 1997 p. 101).
Another approach is to combine the two, as described in works on Granular Computing: the description of types of patterns is broken down into so-called granules, which both limits the number of dimensions and makes them more interesting to an observer.
A similar method to detect anomalies in ship movements was proposed by Ekman and Holst (Ekman, 2003) in their SICS evaluation for SaabTech. In their report, they propose that a set of basic sensory input about a ship can be used in conjunction with adjacent ships, statistics of area and changes over time to satisfactorily describe interesting, anomalous behaviour.
This report aims to explore Ekman and Holst’s proposition by identifying the granules required for describing the defining aspects of movement patterns utilized by different ship types. We will concentrate on a level of granularity that minimizes the number of dimensions needed, but is fine enough to not miss interesting behaviour. We will also assume that the data we have to draw from will be limited to a set of coordinates with related timestamps.
The purpose of the implementation is to provide a way to define granules, as well as an interface for matching granules to ship movements from groups of ships of similar type. As the result of this report is meant to be incorporated into the IBD at Saab Microwave Technologies, the application will need to be built using its interface, as well as access its databases of stored ship movements. It will utilize the existing graphical interface of the IBD and will be written in Java.
The implementation will be divided into several components. The three main components will be:
Ship type group data
Granule Library
Matching algorithm
Ship type group data
The ship type group data refers to the data that will be collected from the IBD AIS database, and will be sorted in several groups based on type. The groups are selected from the AIS registry based on uniqueness of movement patterns and prevalence in the Gothenburg port area. The groups selected are:
Fishing ships
Cargo ships
Passenger ships
Pilot ships
Pleasure / private ships
Sailing ships
Rescue ships
Fishing ships have distinct movement patterns that are usually centered on small areas with many sharp turns and short, jerky movements. This behaviour is called trawling, where the boats attempt to cover as much of a fish-rich area as possible.
Cargo ships are characterized by large, slow movements with long acceleration stretches, few, wide turns and generally stable bearings. This behaviour is indicative of high fuel consumption in maneuver adjustments.
Passenger ships keep within a defined area, going back and forth with few deviations between two or more points. Their repetitive movement patterns are easily identified.
Pilot ships move much like fishing ships, but within a smaller area.
Pleasure / private ships are any non-commercial ships with an AIS tracker. They do not follow any standardized patterns but are useful as a comparison since they generally exhibit certain capabilities such as speed and high turn rate.
Sailing ships have some limitations that make them interesting as a comparison to the other groups.
Rescue ships move similarly to pilot ships and fishing ships, going long or short distances before occupying an area for a certain amount of time.
Ship data of the selected types will be extracted from the IBD AIS database at Saab using SQL queries and stored in custom data types.
The Granule Library stores the definitions of granules, defined in a way that allows them to be matched to the format used to store the movement data. The format must therefore be designed along with the Granule Library, or with the interface of it in mind.
The matching algorithm or functions will find occurrences of granules described in the Granule Library in the ship data, and provide feedback regarding occurrence of granules in a ship’s movements, as well as metadata collected during the comparison.
The possible movement range of any given ship is significantly limited and well described. Given a set of positions with timestamps, the state of a ship at any time can be described with two vectors: its position and its velocity, the latter derived from its previous position. These can be further extended over longer spans to calculate, for each time interval:
Rate of Turn
As such, thresholds on these variables are prime candidates for atomic granules, that is, granules at the highest level of granularity.
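As an illustration of how such variables might be derived from timestamped coordinates, the following is a minimal sketch rather than the implementation used in this report. It assumes each fix is an array of latitude (degrees), longitude (degrees) and time (seconds) with strictly increasing timestamps, and it uses an equirectangular approximation that is only reasonable over short distances. The class and method names are hypothetical.

```java
/** Hypothetical sketch; names and data layout are assumptions, not part of the IBD. */
final class MovementSketch {
    private static final double EARTH_RADIUS_M = 6_371_000.0;

    /** East/north offset in metres between two positions (equirectangular approximation). */
    static double[] offsetMeters(double lat1, double lon1, double lat2, double lon2) {
        double meanLat = Math.toRadians((lat1 + lat2) / 2.0);
        double east = Math.toRadians(lon2 - lon1) * Math.cos(meanLat) * EARTH_RADIUS_M;
        double north = Math.toRadians(lat2 - lat1) * EARTH_RADIUS_M;
        return new double[] {east, north};
    }

    /** Average speed in metres per second between fixes a and b, each {lat, lon, timeSec}. */
    static double speedMps(double[] a, double[] b) {
        double[] d = offsetMeters(a[0], a[1], b[0], b[1]);
        return Math.hypot(d[0], d[1]) / (b[2] - a[2]);
    }

    /** Course over ground from a to b in degrees, 0 = north, 90 = east, in [0, 360). */
    static double bearingDeg(double[] a, double[] b) {
        double[] d = offsetMeters(a[0], a[1], b[0], b[1]);
        double deg = Math.toDegrees(Math.atan2(d[0], d[1]));
        return deg < 0 ? deg + 360.0 : deg;
    }

    /**
     * Signed rate of turn in degrees per second over three consecutive fixes.
     * Positive values are turns to starboard (clockwise). The elapsed time is
     * taken between the midpoints of the two legs a->b and b->c.
     */
    static double rateOfTurnDegPerSec(double[] a, double[] b, double[] c) {
        // Smallest signed angle between the two leg bearings, in [-180, 180).
        double turn = ((bearingDeg(b, c) - bearingDeg(a, b) + 540.0) % 360.0) - 180.0;
        double dt = (c[2] - a[2]) / 2.0;
        return turn / dt;
    }
}
```

A granule threshold could then be expressed as a simple comparison against one of these values, for example a rate of turn above some number of degrees per second.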
Once the implementation is complete we must interpret the data and use it to find better descriptions for each group.
It is difficult to prove understandability of a system, but it is possible to show how well different ship types can be described by the proposed granules. The methodology used will be Correlational, as described by Ellis (2009 p. 327), as the focus is to find a correlation between more or less basic aspects of ship movements and their type classification. A Correlational method requires both a solid framework with which to describe the data to be correlated and knowledge of what signs to look for.
In 3.2.1 Implementation we describe the mechanism we will use to both extract the data and to describe it in relation to the ship types. In 3.2.2 Iteration we use this data to create better descriptions and refine our searches.
The implementation will serve to illustrate the correlation between granules and different ship types. By matching granules to the movement patterns of groups of ships, we can extract data on granule usage and compare the prevalence across groups. The occurrence of a granule in one group compared to the others indicates the degree to which it can be used for identification, called the accuracy of the granule.
Further data that can be drawn from the matching algorithm is the span of variable variation in granule matches. If a granule finds many matches in a given ship's movement based on one criterion, several other criteria may be overlooked that, in conjunction with the first, could increase the granule's accuracy. For example, both fishing boats and ferries make many turns according to Rate of Turn criteria, but ferries slow to a stop beforehand according to Speed, while fishing boats keep a constant Speed through all their turns.
The important aspects of the implementation are those that can provide meaningful feedback of the granule matches, as these are key to the next step in the verification: the iteration.
Each time we extract and analyze the data from the granule matches, we will use it to create new granules or modify existing ones, based on the characteristics of the ones that are most uniquely matched to the groups. That is to say, if a granule is shown to occur more often for a specific group of ships we can spawn a number of new granules based on it with slightly altered parameters and compare in the next iteration if their accuracy for a specific group has changed. This way the granule library will iteratively evolve to more uniquely describe ships of different types.
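The spawning of altered granules can be sketched as follows. This is a hedged illustration only: it assumes a granule is characterised by a single numeric threshold and that variants are generated at fixed relative steps around it. The class name, method name and parameters are hypothetical and are not taken from the produced system.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of generating candidate parameters for the next iteration. */
final class GranuleVariationSketch {

    /**
     * Returns thresholds around a base value: base * (1 + k * relativeStep)
     * for k = -variantsPerSide .. variantsPerSide, skipping k = 0 (the base itself).
     */
    static List<Double> spawnThresholds(double base, double relativeStep, int variantsPerSide) {
        List<Double> thresholds = new ArrayList<>();
        for (int k = -variantsPerSide; k <= variantsPerSide; k++) {
            if (k == 0) {
                continue; // the unmodified granule is already in the library
            }
            thresholds.add(base * (1.0 + k * relativeStep));
        }
        return thresholds;
    }
}
```

Each spawned threshold would define a new granule whose accuracy is compared with that of its parent in the next iteration.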
The granule library and each of the more accurate – or otherwise interesting – granules will be evaluated each iteration.
The granules will be evaluated on their accuracy, which is a measure of a granule's highest share in the relative distribution of its matches across groups. A granule with 20, 200 and 200 matches for groups A, B and C respectively will show accuracies of ~4.8%, ~48% and ~48%. The advantage of this metric, as opposed to measuring the difference between the highest and second highest accuracy, is that it allows for analysis of granules that have a high accuracy in two groups, which could lead to the discovery of defining aspects in the group with the lowest accuracy (group A in the example given above).
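The arithmetic behind the example can be sketched as below, under the assumption that accuracy is simply each group's share of the granule's total matches. The class and method names are hypothetical.

```java
/** Hypothetical sketch of the accuracy metric as a relative distribution of matches. */
final class GranuleAccuracySketch {

    /** Share of the granule's matches per group, as fractions that sum to 1 (or all 0 if no matches). */
    static double[] relativeDistribution(int[] matchesPerGroup) {
        long total = 0;
        for (int m : matchesPerGroup) {
            total += m;
        }
        double[] shares = new double[matchesPerGroup.length];
        if (total == 0) {
            return shares; // no matches at all: every share stays 0
        }
        for (int i = 0; i < matchesPerGroup.length; i++) {
            shares[i] = (double) matchesPerGroup[i] / total;
        }
        return shares;
    }

    /** Highest share over all groups. */
    static double highestShare(int[] matchesPerGroup) {
        double best = 0.0;
        for (double s : relativeDistribution(matchesPerGroup)) {
            best = Math.max(best, s);
        }
        return best;
    }
}
```

For the match counts 20, 200 and 200, the total is 420, so the shares are 20/420 ≈ 0.048 and 200/420 ≈ 0.476 for each of the other two groups.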
The library will be evaluated on its ability to identify classes of ships, and the parameters that are unique to these.
The produced system
The application comprises three main types of objects. Below is a summary of their names and overall functionality, with more detailed descriptions in later sections.
Management objects
Utility objects
Granules and granule utilities
The management objects are responsible for retrieving and collating the movement data from the database, and facilitating the usage of granules. They provide the framework for the granules to work in, and have been explicitly constructed separately to make the design of granules as modular as possible. These include Ship, ShipGroup, ShipHandler, DatabaseConnector, DatabaseRetriever and Filemanager.
The utility objects are the classes that store movement data and provide related functionality. These are more closely tied to the design of the granules, as they are used as intermediary objects between them and the management objects. The members are ShipType, DataPoint and ShipVector.
The granules and granule utilities perform searches and calculations on ShipVectors, and return data regarding occurrence and various locally defined parameters. Each granule extends the abstract base class Granule which provides the interface for accessing the granules. The classes included under this definition are Granule, DistanceGranule, MeanStatsGranule, RoTGranule, SpeedGranule, Demarcation, Statistics, Turn and TurnFinder.
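The modular design described above can be illustrated with a deliberately simplified sketch. It is not the actual Granule interface of the produced system: the class names, the method signatures and the use of a plain array of speed samples in place of ShipVector are all assumptions made for illustration.

```java
/** Illustrative stand-in for an abstract granule base class; not the report's actual Granule API. */
abstract class GranuleSketch {

    /** Decides whether a single sample matches this granule. */
    abstract boolean matches(double sample);

    /** Counts how many samples in the series match this granule. */
    final int countMatches(double[] series) {
        int count = 0;
        for (double value : series) {
            if (matches(value)) {
                count++;
            }
        }
        return count;
    }
}

/** Hypothetical concrete granule: matches speed samples above a threshold (metres per second). */
final class SpeedAboveSketch extends GranuleSketch {
    private final double thresholdMps;

    SpeedAboveSketch(double thresholdMps) {
        this.thresholdMps = thresholdMps;
    }

    @Override
    boolean matches(double speedMps) {
        return speedMps > thresholdMps;
    }
}
```

Keeping the matching loop in the base class means a new granule only needs to supply its own matching criterion, which is one way the separation between granules and management objects could be kept modular.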
Table of contents:
2.1 Relevant Fields
2.1.1 Pattern Recognition
2.1.2 Granular Computing
2.1.3 Data Mining & Knowledge Discovery in Databases
4.1 The produced system
4.1.1 Object overview
4.1.2 Data loading and interpretation
4.1.4 Granule utilities
4.2.1 Rate of Turn Granule
4.2.2 Distance Granule
4.3 Measurement analysis
4.3.1 Rate of Turn Granule
4.3.2 Distance Granule
5.3 Future work