Related work using machine learning

Approach

The study began with a review of previous research in the areas of machine learning and GIS. The findings were then used to determine which algorithms and methods were most suitable for the following phase of the study. The last phase consisted of an experiment designed to answer the research questions of the thesis. Based on this review, a hypothesis was also formulated for the experiment.

Experiment design

The first phase of the experiment involved reproducing an automated method for ditch detection that does not use machine learning. The method chosen for this comparison is from Delineation of Ditches in Wetlands by Remote Sensing by Gustavsson and Selberg (2018). In this study, the Whitebox software (Lindsay, 2016) was used on a DEM to determine how well different data attributes could be used to detect ditches in raster and polyline formats. Since two of the data attributes (Sky View Factor and Impoundment index) were also available in our dataset, reproducing this method on the Krycklan area provided a suitable baseline for comparison with our model. The second experiment phase involved feature engineering and developing post-processing for the random forest model. The third phase involved evaluating the output from the model and determining the importances of the features used. Lastly, the results from the different methods were compared and analysed to determine how they differed.

Experimental methodology

The random forest algorithm from the Python library scikit-learn was used in the experiment. This library allows trees to be split using either a Gini or an entropy criterion. Its probability prediction is the mean of the class probabilities predicted by the individual trees; when the trees are fully grown, so that their leaves are pure, this equals the number of trees that predict a class divided by the total number of trees in the forest (Pedregosa et al., 2011).
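As a minimal sketch of how such a classifier and its probability output might be set up in scikit-learn, the example below trains a small forest on synthetic data. The data, the number of trees, and the variable names are illustrative assumptions, not the configuration used in the study.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for per-pixel features and ditch / non-ditch labels.
X, y = make_classification(n_samples=500, n_features=8, random_state=0)

# The split criterion can be "gini" or "entropy".
forest = RandomForestClassifier(n_estimators=50, criterion="gini", random_state=0)
forest.fit(X, y)

# Class probabilities for every sample, shape (n_samples, n_classes).
proba = forest.predict_proba(X)

# Averaging the per-tree probabilities by hand reproduces predict_proba.
manual = np.mean([tree.predict_proba(X) for tree in forest.estimators_], axis=0)
```

With the default, fully grown trees each tree typically outputs a probability of zero or one, so the average behaves like the share of trees voting for a class.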
The independent variables of the experiment were the feature inputs of the learning algorithm, as well as the number of trees and other configurations of the random forest algorithm. The dependent variable was the raster output, in which each pixel was classified as either ditch or non-ditch (Zobel, 2015). To answer the research question "Does the proposed approach detect ditches more accurately than that of Gustavsson and Selberg (2018)?", the following hypothesis for the experiment was formulated:
The method proposed in this study outperforms the method by Gustavsson and Selberg (2018) in ditch detection, with respect to Cohen’s Kappa index.
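For readers unfamiliar with the metric named in the hypothesis, the following sketch computes Cohen's Kappa from observed and chance agreement and can be checked against scikit-learn's `cohen_kappa_score`. The label arrays are made-up illustrative values, and the helper name `kappa` is hypothetical.

```python
import numpy as np
from sklearn.metrics import cohen_kappa_score

def kappa(labels, predictions):
    """Cohen's Kappa: (p_o - p_e) / (1 - p_e)."""
    labels = np.asarray(labels).ravel()
    predictions = np.asarray(predictions).ravel()
    classes = np.unique(np.concatenate([labels, predictions]))
    # Observed agreement: share of pixels where both rasters agree.
    p_o = np.mean(labels == predictions)
    # Chance agreement: product of the class frequencies, summed over classes.
    p_e = sum(np.mean(labels == c) * np.mean(predictions == c) for c in classes)
    return (p_o - p_e) / (1.0 - p_e)

# Illustrative ditch (1) / non-ditch (0) values, not data from the study.
labels = np.array([1, 1, 0, 0, 0, 0, 0, 0, 1, 0])
predictions = np.array([1, 0, 0, 0, 0, 0, 1, 0, 1, 0])

manual_kappa = kappa(labels, predictions)
library_kappa = cohen_kappa_score(labels, predictions)
```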

Data preparation

Training and validation data

To develop and evaluate our model, the raster and ditch label data of Krycklan were manually divided into 21 smaller zones. Of these, 11 zones were set aside as hold-out data to evaluate the performance of the predictions, and 10 zones were used in the development of the model. This allowed the model to be evaluated on unseen data to strengthen the validity of the experiment. Each zone represents an area of roughly 196 hectares. Figure 4 shows which zones were used for development and evaluation respectively.
For the final random forest experiment, leave-one-out cross validation was applied to the 11 zones in the hold-out data. Leave-one-out cross validation is a method in which a model is trained on all but one of the occurrences, and the remaining occurrence is used to evaluate the results (Wong, 2015). Here, each occurrence was a zone: this allowed us to train 11 different random forest classifiers, each on a large amount of data, and to evaluate each model once on a single zone, producing 11 sub-experiments on which to evaluate the method.
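A leave-one-zone-out loop of this kind could be sketched with scikit-learn's `LeaveOneGroupOut`, using a zone identifier per pixel as the group. The synthetic features, the zone sizes, and the small forest are assumptions made only to keep the example fast; they do not reflect the actual data.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import cohen_kappa_score
from sklearn.model_selection import LeaveOneGroupOut

rng = np.random.default_rng(0)
n_zones, pixels_per_zone, n_features = 11, 200, 5

# Synthetic pixels: features, binary labels, and the zone each pixel belongs to.
X = rng.normal(size=(n_zones * pixels_per_zone, n_features))
y = (X[:, 0] + 0.5 * rng.normal(size=len(X)) > 1.0).astype(int)
zone_id = np.repeat(np.arange(n_zones), pixels_per_zone)

kappas = []
for train_idx, test_idx in LeaveOneGroupOut().split(X, y, groups=zone_id):
    # Train on ten zones, evaluate once on the zone that was left out.
    model = RandomForestClassifier(n_estimators=20, random_state=0)
    model.fit(X[train_idx], y[train_idx])
    kappas.append(cohen_kappa_score(y[test_idx], model.predict(X[test_idx])))
```

Each entry of `kappas` corresponds to one sub-experiment, so the list has one score per zone.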

Defining ditches in raster format

The digital elevation data from the SFA was represented in a raster format, whereas the ditches from SLU were represented as vectors. These vectors contain no information about the width of ditches. To label each individual pixel as either ditch or non-ditch, a conversion from vector to raster format was performed. Because the observed average width of ditches is larger than 0.5 metres, all pixels within a radius of three pixels (1.5 metres) of the vectors were labelled as ditch pixels. Figure 5 A shows the ditches rasterised from vectors and B shows the ditches after widening. The data in Figure 5 B is the labelled data that was used to train the random forest model. A similar approach was taken by Stanislawski et al. (2018) in their study of roads and stream valleys. Because ditches vary in width, it was not possible to produce a perfect representation of each ditch. However, this was a good compromise for the average ditch.
Since our aim was to detect ditches, and not each pixel labelled as a ditch, some adjustments were made when evaluating the prediction results. The dataset was divided into a lower-resolution grid in which each cell covered six by six pixels (9 m²). Each grid cell that contained at least 25% ditch pixels was then labelled as a ditch. A similar method was used by Stanislawski et al. (2018). See Figure 5 C for a visual representation of these grid zones.
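The widening step can be illustrated with a binary dilation using a circular footprint. This is a sketch that assumes the vectors have already been rasterised to a one-pixel-wide boolean mask; the helper names `disk` and `widen_ditches` are hypothetical.

```python
import numpy as np
from scipy import ndimage

def disk(radius):
    """Boolean circular footprint with the given radius in pixels."""
    yy, xx = np.ogrid[-radius:radius + 1, -radius:radius + 1]
    return xx**2 + yy**2 <= radius**2

def widen_ditches(centreline, radius_px=3):
    """Label every pixel within radius_px of a rasterised ditch line."""
    return ndimage.binary_dilation(centreline, structure=disk(radius_px))

# Toy raster: one horizontal ditch line, one pixel wide.
centreline = np.zeros((21, 21), dtype=bool)
centreline[10, :] = True
labels = widen_ditches(centreline)
```

In this toy raster the line grows from one row to seven rows, three pixels on either side of the original line.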
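The grid aggregation can be sketched with a NumPy reshape. The function name `grid_labels` and the choice to crop rasters whose size is not a multiple of six are assumptions of this example.

```python
import numpy as np

def grid_labels(pixel_labels, cell=6, min_fraction=0.25):
    """Mark a grid cell as ditch when enough of its pixels are ditch pixels."""
    rows, cols = pixel_labels.shape
    # Crop so that the raster divides evenly into cells (an assumption).
    rows -= rows % cell
    cols -= cols % cell
    blocks = pixel_labels[:rows, :cols].reshape(rows // cell, cell, cols // cell, cell)
    # Fraction of ditch pixels inside every cell.
    fraction = blocks.mean(axis=(1, 3))
    return fraction >= min_fraction

# Toy raster with 2 x 2 grid cells.
pixels = np.zeros((12, 12), dtype=bool)
pixels[0:3, 0:3] = True    # 9 of 36 pixels (25%) in the top-left cell
pixels[0:2, 6:10] = True   # 8 of 36 pixels (about 22%) in the top-right cell
grid = grid_labels(pixels)
```

Only the top-left cell reaches the 25% limit, so it is the only cell labelled as a ditch.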

Reproducing the Whitebox method

In Delineation of Ditches in Wetlands by Remote Sensing (Gustavsson & Selberg, 2018), the workflow for ditch detection consisted of a reclassification to remove noise and to define the limits of what to classify as a ditch. The raster data was then imported into ArcMap (Esri, 2017) to convert the raster to vectors (Gustavsson & Selberg, 2018). We only reproduced the reclassification step, as the results needed to be in raster format in order to compare them with our model. The workflow that Gustavsson and Selberg (2018) used for ditch detection is presented below.

Sky View Factor

The Sky View Factor data has a value between zero and one. The data was binarised to only include values below 0.989. To remove large waterbodies, Gustavsson and Selberg (2018) created a buffer of six metres around polygons of waterbodies. These were converted to pixels and excluded from the result (Gustavsson & Selberg, 2018). Since we had no available data on waterbodies, we could not remove them from the prediction.

Impoundment Index

The dams constructed in Whitebox (Lindsay, 2016) were four by four metres in size. After running the impoundment tool, the data was binarised to remove values with a water accumulation below 30 m³. This was done to remove flat areas, but still maintain the pixels with a large water accumulation (Gustavsson & Selberg, 2018).
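The two reclassification steps amount to simple thresholds on the Sky View Factor and Impoundment rasters. The sketch below uses the limits quoted in the text (0.989 and 30 m³); the function names and the toy arrays are illustrative assumptions.

```python
import numpy as np

SVF_LIMIT = 0.989          # keep Sky View Factor values below this limit
IMPOUNDMENT_LIMIT = 30.0   # m³, remove water accumulation below this limit

def reclassify_svf(svf):
    """True where the Sky View Factor indicates a possible ditch."""
    return np.asarray(svf) < SVF_LIMIT

def reclassify_impoundment(impoundment):
    """True where the impounded volume is large enough to keep."""
    return np.asarray(impoundment) >= IMPOUNDMENT_LIMIT

# Toy rasters, not values from the Krycklan data.
svf_ditches = reclassify_svf([[0.95, 0.989], [0.995, 0.5]])
impoundment_ditches = reclassify_impoundment([[0.0, 29.9], [30.0, 1000.0]])
```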

Feature engineering

Developing the random forest model involved examining how different kinds of features affected the prediction. Several possible data manipulation methods could theoretically produce a better prediction. An issue with the previously used automated methods is that they do not correctly detect ditches where the LiDAR has been interrupted by bushes or trees. To combat this, steps were taken where neighbouring pixels were included to give a representation of the area surrounding a specific pixel. A similar approach was taken by Roelens et al. (2018), and this approach produced positive results in their study.

General features

The features used for training the model (Sky View Factor, Impoundment index, Slope, High Pass Median Filter) are all derivatives of the digital elevation data. These raw features provided a satisfactory foundation for the model, but the resulting predictions generalised poorly. More diverse features were extracted using simple statistical aggregates such as mean, median, min, max, and standard deviation. This made it easier to capture irregularities in the area surrounding each pixel. These features were calculated by gathering all data points within circles of different radii around the studied pixel, before performing one of the statistical aggregations. See Figure 6: B, C, H, and J for graphical representations of some of these features.
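Neighbourhood aggregates of this kind can be sketched with SciPy's `ndimage` filters and a circular footprint. The radius, the border handling (`mode="nearest"`), and the function names are assumptions of this example rather than details given in the text.

```python
import numpy as np
from scipy import ndimage

def disk(radius):
    """Boolean circular footprint with the given radius in pixels."""
    yy, xx = np.ogrid[-radius:radius + 1, -radius:radius + 1]
    return xx**2 + yy**2 <= radius**2

def neighbourhood_stats(raster, radius_px):
    """Mean, median, min, max and standard deviation in a circular window."""
    raster = np.asarray(raster, dtype=float)
    fp = disk(radius_px)
    return {
        "mean": ndimage.convolve(raster, fp / fp.sum(), mode="nearest"),
        "median": ndimage.median_filter(raster, footprint=fp, mode="nearest"),
        "min": ndimage.minimum_filter(raster, footprint=fp, mode="nearest"),
        "max": ndimage.maximum_filter(raster, footprint=fp, mode="nearest"),
        "std": ndimage.generic_filter(raster, np.std, footprint=fp, mode="nearest"),
    }

# Toy raster with a single raised pixel.
raster = np.zeros((15, 15))
raster[7, 7] = 1.0
stats = neighbourhood_stats(raster, radius_px=2)
```

Calling the function once per radius and per input attribute yields one new feature raster for each combination.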

Custom features

Several custom features were also developed in addition to the general features, attempting to specifically target and enhance ditches as well as non-ditches. These are presented below.
The Sky View Factor Conic filter uses the Sky View Factor attribute to detect and fill gaps in ditches. This was done by taking the mean of all the pixels covered by a cone-shaped mask, which expands outwards from each examined pixel. The mean was calculated in eight directions from each pixel, out to a radius of 10 pixels. If the mean values from two opposing masks were both below a threshold, the pixel was updated with the lower of these values. This meant that only pixels with strong ditch-indicative values in two opposing directions were updated. This allowed the filter to avoid updating pixels that lay close to cavities or hollows, and only focus on linear geographical properties. However, this also meant that geographical properties such as streams were amplified as well.
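One way to read the conic filter is sketched below. The cone opening angle (45 degrees per direction), the threshold, and the rule that a pixel is never raised above its current value are assumptions of this example; the text does not specify them.

```python
import numpy as np
from scipy import ndimage

def cone_masks(radius=10, n_directions=8):
    """Normalised cone-shaped masks, one per direction, excluding the centre."""
    yy, xx = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    dist = np.hypot(xx, yy)
    angle = np.arctan2(yy, xx)
    half_width = np.pi / n_directions  # assumed opening: 45 degrees in total
    masks = []
    for k in range(n_directions):
        axis = 2 * np.pi * k / n_directions
        # Smallest absolute angular difference to the cone axis.
        diff = np.abs((angle - axis + np.pi) % (2 * np.pi) - np.pi)
        mask = (dist > 0) & (dist <= radius) & (diff <= half_width)
        masks.append(mask / mask.sum())
    return masks

def conic_filter(svf, threshold, radius=10):
    """Lower pixels whose two opposing cone means are both below threshold."""
    svf = np.asarray(svf, dtype=float)
    masks = cone_masks(radius)
    means = [ndimage.correlate(svf, m, mode="nearest") for m in masks]
    out = svf.copy()
    half = len(masks) // 2
    for k in range(half):
        a, b = means[k], means[k + half]  # cones pointing in opposite directions
        both_low = (a < threshold) & (b < threshold)
        out = np.where(both_low, np.minimum(out, np.minimum(a, b)), out)
    return out

# Toy raster: a three-pixel-wide ditch (0.5) with a gap in the middle.
svf = np.ones((41, 41))
svf[19:22, :] = 0.5
svf[19:22, 19:22] = 1.0
filled = conic_filter(svf, threshold=0.9)
```

In the toy raster the gap pixel is lowered because the cones to both sides of it cover ditch pixels, while a pixel beside the ditch keeps its value because only one of each opposing pair of cones is low.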
The Impoundment Ditch Amplification feature uses the Impoundment attribute to amplify ditches by using thresholds and classifying pixels that usually indicate ditches with increasing values. Means and medians were used to eliminate noise, and to produce a smoother ditch representation. See Figure 6: K for a graphical representation of this feature.
Similar to the aforementioned Impoundment Ditch Amplification, the HPMF ditch amplification feature classifies pixels based on their likelihood of lying in a ditch. Values were smoothed with medians and means of different radii before receiving another reclassification based on ditch likelihood. A mean was taken one more time to smooth out the reclassified data. See Figure 6: E for a graphical representation of this feature.
The Sky View Factor non-ditch amplification feature amplifies pixels which are not ditches. This aims to help the model exclude hills and streams, which generally have a deeper impression on the landscape than ditches do with the Sky View Factor attribute. This observation was used to help amplify pixels that exceeded a certain threshold. This feature still misses many stream pixels, and sometimes also picks up pixels from particularly deep ditches. See Figure 6: I for a graphical representation of this feature. The Slope non-ditch amplification has the same goal as the Sky View Factor-based filter, but uses different thresholds and is based on the Slope attribute instead. This more aggressive filter will pick up a much higher percentage of hills and streams, with the downside of sometimes covering ditches as well.
The DEM ditch amplification feature was extracted from the DEM, where differences in elevation of local areas were calculated. Pixels that lay at a lower altitude than the average of a circle with a 15-metre radius around the examined pixel were marked out before a morphological grey closing was performed to remove noise from the feature.
A Gabor filter is an image processing filter that can be used to detect lines of a certain orientation in an image (Hong, Wan, & Jain, 1998). A set of 30 Gabor filters, rotated to different angles and with different frequencies, was used to detect lines in all directions. The responses of the filters in this set were then combined to amplify ditches. These filters were used to create features from both the HPMF and Sky View Factor attributes. See Figure 6: D and G for graphical representations of these features.
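A possible reading of this feature is sketched below. The radius of 30 pixels follows from 15 metres at the 0.5-metre pixel size implied earlier; the closing size and the small `min_drop` margin, added to avoid marking flat terrain through rounding noise, are assumptions of this example.

```python
import numpy as np
from scipy import ndimage

def disk(radius):
    """Boolean circular footprint with the given radius in pixels."""
    yy, xx = np.ogrid[-radius:radius + 1, -radius:radius + 1]
    return xx**2 + yy**2 <= radius**2

def dem_ditch_amplification(dem, radius_px=30, closing_size=3, min_drop=0.05):
    """Mark pixels lying below their local mean elevation, then close gaps."""
    dem = np.asarray(dem, dtype=float)
    fp = disk(radius_px)
    local_mean = ndimage.convolve(dem, fp / fp.sum(), mode="nearest")
    # 1.0 where the pixel lies at least min_drop metres below the local mean.
    lower = (dem < local_mean - min_drop).astype(float)
    # Grey closing fills small holes inside the marked areas.
    return ndimage.grey_closing(lower, size=(closing_size, closing_size))

# Toy DEM: flat terrain at 100 m with a one-metre-deep trench.
dem = np.full((80, 80), 100.0)
dem[40, :] = 99.0
amplified = dem_ditch_amplification(dem)
```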
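A small Gabor filter bank can be written directly in NumPy, as sketched below. The split of the 30 filters into 10 orientations and 3 frequencies, the kernel size, and the use of a pixel-wise maximum to combine the responses are assumptions; the text only states that 30 filters were used and combined.

```python
import numpy as np
from scipy import ndimage

def gabor_kernel(frequency, theta, sigma=3.0, size=15):
    """Real part of a Gabor kernel tuned to lines at orientation theta."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    # Coordinate along the direction in which the cosine wave oscillates.
    x_r = x * np.cos(theta) + y * np.sin(theta)
    envelope = np.exp(-(x**2 + y**2) / (2 * sigma**2))
    kernel = envelope * np.cos(2 * np.pi * frequency * x_r)
    # Zero mean, so that flat areas give no response.
    return kernel - kernel.mean()

def gabor_bank_response(raster, n_orientations=10, frequencies=(0.1, 0.2, 0.3)):
    """Strongest response over all orientation and frequency combinations."""
    raster = np.asarray(raster, dtype=float)
    responses = []
    for frequency in frequencies:
        for k in range(n_orientations):
            theta = np.pi * k / n_orientations
            kernel = gabor_kernel(frequency, theta)
            responses.append(ndimage.convolve(raster, kernel, mode="nearest"))
    return np.max(responses, axis=0)

# Toy raster with a single vertical line.
raster = np.zeros((41, 41))
raster[:, 20] = 1.0
response = gabor_bank_response(raster)
```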
The raw Impoundment feature was used to create a mask, attempting to retain ditches, but mark out streams. This was done by using a threshold on the Impoundment index that only marked out areas with a relatively large impoundment, which would indicate that these areas contained streams. After widening the resulting area, this mask was used to remove streams from all the aforementioned custom features, generating one new feature from each. See Figure 6: F and L for graphical representations of features that make use of this mask.
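The stream mask can be sketched as a threshold followed by a dilation. The threshold value, the widening radius, and the choice to set masked pixels to zero are hypothetical, since the text does not give them.

```python
import numpy as np
from scipy import ndimage

def disk(radius):
    """Boolean circular footprint with the given radius in pixels."""
    yy, xx = np.ogrid[-radius:radius + 1, -radius:radius + 1]
    return xx**2 + yy**2 <= radius**2

def stream_mask(impoundment, stream_threshold, widen_px=2):
    """True where a large impoundment suggests a stream, widened by widen_px."""
    streams = np.asarray(impoundment) > stream_threshold
    return ndimage.binary_dilation(streams, structure=disk(widen_px))

def remove_streams(feature, mask, fill_value=0.0):
    """Copy of a feature raster with the masked (stream) pixels reset."""
    return np.where(mask, fill_value, feature)

# Toy rasters: one stream row and a constant feature.
impoundment = np.zeros((20, 20))
impoundment[10, :] = 500.0
feature = np.ones((20, 20))
mask = stream_mask(impoundment, stream_threshold=100.0)
cleaned = remove_streams(feature, mask)
```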

Model configuration

The random forest classifier was trained on all the features seen in Table 3. The testing phase showed that the classifier produced poor results when the ratio of non-ditch to ditch pixels in the training data was very high. A high ratio meant that the model was barely penalised for mislabelling ditches as non-ditches, causing it to prioritise a high accuracy over a high recall. According to Spelmen and Porkodi (2018), an imbalanced dataset causes the minority class to receive a reduced accuracy. As the ditch class is much less common than the non-ditch class, this needed to be addressed when training our model. To balance the model, we attempted to train it with a roughly equal number of ditch pixels and non-ditch pixels.
The first step to create a more balanced dataset was to extract all pixels labelled as ditches, as well as pixels within close proximity of ditches. Secondly, random pixel samples from the entire area were extracted. This allowed the training dataset to be fairly balanced while still containing most of the geographical features of each zone, see Figure 7.
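The two sampling steps could be sketched as follows. The proximity distance, the number of random samples, and the function names are hypothetical; the text does not state the values used.

```python
import numpy as np
from scipy import ndimage

def disk(radius):
    """Boolean circular footprint with the given radius in pixels."""
    yy, xx = np.ogrid[-radius:radius + 1, -radius:radius + 1]
    return xx**2 + yy**2 <= radius**2

def sample_training_pixels(labels, proximity_px=5, n_random=1000, seed=0):
    """Boolean mask of the pixels selected for training."""
    rng = np.random.default_rng(seed)
    # Step 1: all ditch pixels plus the pixels close to them.
    selected = ndimage.binary_dilation(labels, structure=disk(proximity_px))
    # Step 2: random pixels drawn from the entire zone.
    n_random = min(n_random, labels.size)
    random_idx = rng.choice(labels.size, size=n_random, replace=False)
    selected.flat[random_idx] = True
    return selected

# Toy zone with one ditch row.
labels = np.zeros((60, 60), dtype=bool)
labels[30, :] = True
selected = sample_training_pixels(labels, proximity_px=3, n_random=200)
```

The feature rows and labels for training would then be taken at the positions where `selected` is true.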
Hyperparameter tuning was performed to determine which parameter values for the random forest algorithm would yield the best results. Evaluating a maximum of 25 features for each node, and using 200 trees, showed the best results. Setting the class weight to balanced also improved the performance of the classifier. A probability prediction was used instead of a majority vote binary prediction to allow further post-processing of the prediction.
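In scikit-learn, the reported settings map onto the `n_estimators`, `max_features`, and `class_weight` parameters of `RandomForestClassifier`. The sketch below applies them to synthetic data with 30 features; the data and the remaining settings are assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic, imbalanced stand-in for the pixel features (30 columns).
X, y = make_classification(
    n_samples=600, n_features=30, n_informative=10,
    weights=[0.8, 0.2], random_state=0,
)

model = RandomForestClassifier(
    n_estimators=200,         # number of trees
    max_features=25,          # features evaluated at each node
    class_weight="balanced",  # reweight classes by inverse frequency
    random_state=0,
)
model.fit(X, y)

# Probability of the positive (ditch) class instead of a hard majority vote.
ditch_probability = model.predict_proba(X)[:, 1]

# Gini importances, one value per feature, summing to one.
importances = model.feature_importances_
```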

Post-processing

The model outputs a ditch class probability prediction for each pixel. These probabilities take continuous values between zero and one. Zero indicates a very low probability of a pixel lying in a ditch, while one indicates a very high ditch probability. See Figure 8: A for a graphical representation of a raw prediction for one of the 21 zones of Krycklan.

Noise reduction and gap filling

The probability predictions contained a lot of noise in places far away from ditches, which needed to be excluded. The first step for removing noise was to use a bilateral de-noising filter on the entire prediction image. This left linear properties and pixels with a very high value intact, while lowering the value of pixels that did not contribute to an accurate prediction. See Figure 8: B for a graphical representation.
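To illustrate the idea behind bilateral filtering, the following plain-NumPy sketch weights each neighbour by both its distance and its similarity in value, so that sharp edges survive the smoothing. It is not the routine used in the study, and the window radius and the two sigma values are hypothetical.

```python
import numpy as np

def bilateral_filter(image, radius=3, sigma_spatial=2.0, sigma_range=0.1):
    """Edge-preserving smoothing of a 2D raster."""
    image = np.asarray(image, dtype=float)
    rows, cols = image.shape
    padded = np.pad(image, radius, mode="edge")
    weighted_sum = np.zeros_like(image)
    weight_total = np.zeros_like(image)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            # Raster shifted so that each pixel lines up with one neighbour.
            shifted = padded[radius + dy:radius + dy + rows,
                             radius + dx:radius + dx + cols]
            # Weight falls with distance and with difference in value.
            spatial = np.exp(-(dx**2 + dy**2) / (2 * sigma_spatial**2))
            similarity = np.exp(-((shifted - image) ** 2) / (2 * sigma_range**2))
            weight = spatial * similarity
            weighted_sum += weight * shifted
            weight_total += weight
    return weighted_sum / weight_total

# Toy prediction: a sharp step from probability 0 to probability 1.
prediction = np.zeros((12, 20))
prediction[:, 10:] = 1.0
denoised = bilateral_filter(prediction)
```

Because pixels on the two sides of the step differ greatly in value, they barely influence each other and the step stays sharp.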
The second step for removing noise was to use a custom function to remove pixels with a semi-high probability, but that lay far away from any other high probability pixels. A threshold value was used to avoid removing pixels that had a high enough probability, helping to retain pixels that lay in or close to a ditch. The max probability value in a circular radius of 10 pixels was then calculated. If this max value was not high enough, the probability of the examined pixel was lowered. See Figure 8: C for a graphical representation.
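This second step can be sketched with a maximum filter over a circular window. The two thresholds and the factor by which isolated pixels are lowered are hypothetical; only the radius of 10 pixels comes from the text.

```python
import numpy as np
from scipy import ndimage

def disk(radius):
    """Boolean circular footprint with the given radius in pixels."""
    yy, xx = np.ogrid[-radius:radius + 1, -radius:radius + 1]
    return xx**2 + yy**2 <= radius**2

def suppress_isolated(prob, keep_threshold=0.8, support_threshold=0.9,
                      radius_px=10, factor=0.5):
    """Lower probabilities that have no high-probability pixel nearby."""
    prob = np.asarray(prob, dtype=float)
    # Highest probability within radius_px of every pixel.
    local_max = ndimage.maximum_filter(prob, footprint=disk(radius_px),
                                       mode="nearest")
    # Pixels at or above keep_threshold are never lowered.
    isolated = (prob < keep_threshold) & (local_max < support_threshold)
    return np.where(isolated, prob * factor, prob)

# Toy prediction: one isolated pixel and one pixel next to a strong pixel.
prob = np.zeros((50, 50))
prob[10, 10] = 0.6
prob[30, 30] = 0.6
prob[30, 33] = 0.95
cleaned = suppress_isolated(prob)
```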
The third step involved taking measures to try to fill gaps in ditches that the model failed to correctly predict. A method similar to the Sky View Factor Conic filter described in Section 6.5.2 (Custom features) was employed to calculate the mean of cone masks expanding outwards in different directions from the examined pixel. This step also amplified some of the noise that was left, but filling the gaps in the ditches was judged to be more important to help make the next step more effective. See Figure 8: D for a graphical representation.

Contents
1 Introduction
2 Context
2.1 Available data
2.2 Current situation
3 Aim and scope
4 Background
4.1 Supervised learning
4.2 Related work using machine learning
4.3 Random forests and decision trees
4.4 Gini importance
5 Evaluation
6 Approach
6.1 Experiment design
6.2 Experimental methodology
6.3 Data preparation
6.4 Reproducing the Whitebox method
6.5 Feature engineering
6.6 Model configuration
6.7 Post-processing
7 Results and analysis
7.1 Experimental results
7.2 Analysis
8 Discussion
8.1 Strengths
8.2 Weaknesses and limitations
8.3 Comparison to state-of-the-art
8.4 General discussions
9 Conclusions and future work
References