Hot Target Detection

Introduction

Hot targets such as wildfires and volcanoes have a devastating impact on the planet, our infrastructure, and our personal health. Wildfires across the globe add an extra $3.5 \cdot 10^{15}$ g per year to existing atmospheric carbon emissions.

If the aim is to decrease emission numbers, grassland, savannah, and deforestation fires are the prime targets for hot target detection. Fire emission time series analysis conducted by van der Werf et al. shows that the largest contributors to carbon emissions were grassland and savannah fires (44%), with tropical deforestation and degradation fires (20%), woodland fires (16%), and forest fires (15%) being big factors as well.

Most fires in remote areas and at higher elevation are caused by lightning, while the inverse holds near urban conglomerations. Wildfire detection is increasingly important, due to the globally increasing prevalence of wildfires. Westerling et al. show that wildfires have increased in both frequency and duration in the US since the 1980s. We also see a global increase in yearly burned area, though there is considerable spatial variability in these assessments.

However, the risk and hazards of wildfires are not restricted to just atmospheric carbon emission. Humans, wildlife, and vegetation are severely affected as well. Injury and death resulting from exposure to heat and smoke inhalation, and trauma due to the loss of structural integrity are just two of the most prominent examples of the adverse effects of wildfires for humans. Other examples include indirect exposure to chemicals released during a fire, through water or soil contamination.

The damage caused by wildfires, as detailed above, can take years to repair and can cost billions: the US Forest Service estimates that the cost of fire suppression will increase to nearly USD 1.8 billion by 2025. That leaves out other costs such as ecological and infrastructural reconstruction, and medical aid.

Clearly, effective detection, suppression, and prevention are necessary to combat wildfires. Detecting hot targets early and monitoring their progress is important to successfully manage and control these situations.

Detection and suppression of both urban and rural wildfires have been important for many civilizations throughout history. In ancient Rome, the Vigiles were a proto-firefighting brigade created by Augustus (though they could not prevent the Great Fire of Rome under Nero), and similar watch forces were maintained in Europe through the ages. More modern approaches to fire detection came in the 20th century, when fire lookout towers, infrared cameras, and smoke detectors were employed. These approaches are referred to as remote sensing, which is the collecting and interpreting of information about the environment and earth’s surface without making physical contact. Fire analysis using satellite imagery or aerial data, also part of remote sensing, was first performed in the 1980s. Besides local sensor networks and regular fire lookout towers, it is one of the main practices for fire detection used today.

What follows is a brief overview of relevant research in automatic hot target detection using satellite imagery. We will also expand on relevant research in machine learning, and its role in hot target detection.

Remote sensing

Remote sensing approaches such as wireless sensor networks are growing in terms of research and implementation. Wireless sensor networks have been shown to be effective in detecting and forecasting forest fires in real-time, as opposed to satellite-imagery-based methods that have low spatial and temporal resolution. Yu et al., for example, propose a neural network paradigm for wireless sensor networks in which sensor nodes collect data (i.e. temperature, humidity, smoke) that gets sent to a cluster node that - together with other cluster nodes - processes the data using a neural network. The neural network takes the input data and produces a weather index (likelihood for the current weather to cause a fire) and reports it to a manager node, which in turn produces a fire danger rate. Though no comparison with imagery-based methods is made, their neural network approach is more efficient than other in-network processing methods.

Most existing data-driven approaches to fire detection, however, are satellite-imagery-based. The moderate-resolution imaging spectroradiometer (MODIS) was launched into orbit aboard the Terra (1999) and Aqua (2002) satellites respectively. Combined, Terra MODIS and Aqua MODIS can map the Earth’s surface in 1 to 2 days, obtaining data from 36 spectral bands. These bands come in 3 spatial resolutions: 2 bands at 250m/px, 5 bands at 500m/px and 29 bands at 1km/px. MODIS global fire products are produced every day, originally using the detection algorithm by Kaufman et al. and currently using the improved contextual algorithm proposed by Giglio et al. This implementation of the MODIS fire detection algorithm relies - besides pre-/post-processing steps like cloud masking and sun-glint rejection - on manually selected thresholds for top of atmosphere (TOA) reflectance/radiance, though improvements are still being actively researched.

Fire detection algorithms based on data acquired by the Visible Infrared Imaging Radiometer Suite (VIIRS), an imager with a higher spatial resolution than MODIS, also rely on these manually selected thresholds. VIIRS launched in 2011 aboard the Suomi-NPP satellite, and obtains spectral imaging data from 21 bands (16 at 750m/px and 5 at 375m/px). A second VIIRS is expected to launch aboard the JPSS-1 in 2017 on the same orbit as the first VIIRS. VIIRS fire products are generated using a stripped-down version of the MODIS algorithm, where C4 (the 4th iteration of the algorithm) is used for the 750m/px product, and C6 for both 750m/px and 375m/px products. These fire products are used for further analysis of, for example, burned area mapping, modelling of freight traffic, or combustion source characterization.

Figure 1. Bandpass wavelengths for the Landsat 8 OLI and TIRS sensors, compared to the Sentinel-2 MSI sensor, and Landsat 7 ETM+ sensor. Image obtained from https://landsat.gsfc.nasa.gov/article/sentinel-2a-launches-our-compliments-our-complements/.

Even higher resolution imagery is provided by the Landsat 8 satellite. Landsat is the longest running satellite imagery program, running since 1972. The latest satellite, Landsat 8, was launched in 2013 and provides images with a spatial resolution of 15 to 100 m/px and a temporal resolution of 10-16 days. Landsat 8’s Operational Land Imager (OLI) can acquire data in 9 spectral bands with 30m/px spatial resolution, while the Thermal Infrared Sensor (TIRS) collects 2 spectral bands with 100m/px spatial resolution. Figure 1 shows the bandpass wavelengths for the Landsat 8 sensors. High spatial resolution imaging data acquired by the Landsat satellites have been used for a range of topics, including the detection of hot targets such as volcanism and fires. Due to open access policies, the full Landsat archive has been made publicly available by NASA/USGS. The European Space Agency (ESA) has adopted similar policies, providing access to data obtained through their Copernicus program (including the latest Sentinel-2 missions), as has Japan for its ASTER mission. However, even Sentinel-2 data ranging up to 10m/px is insufficient to chart the small fields in the Bangladesh delta, exposing the need for non-satellite-based methods as well.

Fusion of different imaging data has also been used to overcome the spatial and temporal limitations of each imager. Boschetti et al. have used MODIS-Landsat fusion to identify burned area with MODIS active fire detection at a 30m/px spatial resolution. Similar fusion has also been applied for gap filling and radiometric normalization. Murphy et al. present a novel global hot target detection algorithm (HOTMAP) with high detection rates (80%) and low false positive rates (<10%) that incorporates both Landsat 8 and Sentinel-2 data. A visualization of an implementation of their daytime detection algorithm can be found in Figure 2. Fusion of Landsat and MERIS imagery has been performed by Zurita-Milla et al. by applying unmixing-based data fusion to combine Landsat’s spatial resolution with MERIS’ spectral resolution.

Figure 2. Classification of hot pixels in Landsat 8 data in the vicinity of Adelaide, Australia (LC80970842015004LGN00). a) Landsat 8 spectral bands 2 (top left); 5 (top right); 6 (bottom left); and 7 (bottom right). b) False colour RGB image of bands 2, 3, and 4. c) Binary image of hot pixel classification output by the algorithm proposed by Murphy et al. d) Hot pixels marked in red superimposed on the original false colour RGB image.

Learning-based remote sensing

There have been a number of approaches to remote sensing that rely on machine learning. Petropoulos et al. have investigated the use of SVMs with Landsat data to perform burned area mapping with high accuracy. Others use SVMs to classify urban areas in an active and semi-supervised learning context, in which the SVM is fed the training data, annotated by a human expert, that is expected to be most effective for training. A full review of the use of SVMs in remote sensing is provided by Mountrakis et al.

In the domain of remote sensing, some research on DNNs has been done. Basu et al. use a DNN approach to classify satellite imagery into land cover classes, and obtain impressive results on the SAT-4 and SAT-6 datasets, outperforming other deep learning methods. Castelluccio et al. and Hu et al. utilize a CNN to classify different types of land (e.g. forest, buildings, beach) with great success.

Data

We use multispectral Landsat 8 imaging data obtained from open-access portals, such as Amazon AWS. Data from these portals can be used in combination with search portals such as the USGS Earth Explorer and news sources to easily locate and obtain imaging data from large and small wildfires. We use all multispectral data as input for our machine learning methods, except OLI band 8, since the spectral information in this panchromatic band is already captured by bands 2, 3, and 4 (see also Figure 1). Although the Near Infrared (NIR) and Short-wave Infrared (SWIR) bands are commonly used in hot target detection, the other bands, including the Thermal Infrared Sensor (TIRS) bands, can provide additional information. The machine learning methods may pick up on more subtle relations between radiative energy output at different wavelengths.

TOA reflectances (corrected for solar angle using Equation 1), and TOA radiance from all used bands will be used as input for our network, including an 8-neighbourhood window. For the 11-band Landsat 8 scenes, excluding OLI band 8, this would result in an input layer of 10x3x3 neurons.

$\begin{equation} \begin{align} \rho_{\lambda} = \frac{\rho_{\lambda}'}{\cos(\theta_{SZ})} = \frac{\rho_{\lambda}'}{\sin(\theta_{SE})} \text{, with} \nonumber \newline \rho_{\lambda} = \text{TOA planetary reflectance} \nonumber \newline \rho_{\lambda}' = \text{TOA planetary reflectance without correction for solar angle} \nonumber \newline \theta_{SE} = \text{Local sun elevation angle} \nonumber \newline \theta_{SZ} = \text{Local solar zenith angle, } \theta_{SZ} = 90^{\circ} - \theta_{SE} \nonumber \newline \end{align} \end{equation}$
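The input preparation described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the code used in this research: the function names, the `(C, H, W)` array layout, and the edge-padding strategy are all assumptions. The correction uses the zenith-angle form of Equation 1, with the zenith angle taken as 90 degrees minus the sun elevation.

```python
import numpy as np


def correct_for_sun_angle(rho_prime, sun_elevation_deg):
    """Divide uncorrected TOA reflectance by cos(solar zenith angle).

    The zenith angle is derived as 90 degrees minus the sun elevation.
    """
    theta_sz = np.deg2rad(90.0 - sun_elevation_deg)
    return np.asarray(rho_prime, dtype=np.float64) / np.cos(theta_sz)


def neighbourhood_windows(bands):
    """Return an (H, W, C, 3, 3) array with the 3x3 window around each pixel.

    `bands` has shape (C, H, W). Border pixels are padded by repeating the
    nearest value (an assumption; the text does not specify edge handling).
    """
    c, h, w = bands.shape
    padded = np.pad(bands, ((0, 0), (1, 1), (1, 1)), mode="edge")
    windows = np.empty((h, w, c, 3, 3), dtype=bands.dtype)
    for dy in range(3):
        for dx in range(3):
            # Shifted view of the scene, reordered from (C, H, W) to (H, W, C).
            shifted = padded[:, dy:dy + h, dx:dx + w]
            windows[:, :, :, dy, dx] = np.moveaxis(shifted, 0, -1)
    return windows


# Hypothetical 10-band scene of 4x5 pixels with a 30 degree sun elevation.
scene = np.random.default_rng(0).random((10, 4, 5))
corrected = correct_for_sun_angle(scene, sun_elevation_deg=30.0)
windows = neighbourhood_windows(corrected)
print(windows.shape)
```

Each `windows[y, x]` entry is then one 10x3x3 network input, with the pixel itself at the centre of the window.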

However, to be able to use our machine learning models, we need annotated data: besides Landsat or Sentinel scenes, we require the exact positions of hot target pixels in these scenes so that the network can learn which pixels are hot targets. The body of academic literature on hot target detection algorithms is relatively small, and annotated datasets are hard to come by. This makes evaluation of any new method quite labour-intensive and error-prone. We augment our data to simulate hot target pixels to generate training data for our network, by extracting obvious hot target pixels from Landsat scenes and incorporating these features in a new dataset. This method, however, might cause the machine learning methods to overfit, costing us the generalization property of machine learning. We have therefore reached out to the authors of the HOTMAP algorithm with the request to share their evaluation set consisting of 45 Landsat 8 OLI scenes, manually annotated by human analysts. They have kindly allowed us to use this dataset for our research.

In this dataset, there are five Landsat 8 scenes per geographic region, including radiance and TOA reflectance maps, and pixel coordinates of hot targets. We use this dataset as an evaluation benchmark, together with the simulation dataset. From here on out, we will refer to this dataset as FIRES. An example scene of the FIRES dataset is provided in Figure 3.

Figure 3. Example data of the FIRES dataset by Murphy et al. used in this research. Imagery was taken in the vicinity of Alaska, USA (LC80690182014140LGN00). Shown are OLI bands 1 to 9 (excluding Panchromatic band 8), TIRS bands 10 and 11, a False Colour (FC) composite image, and the binary Ground Truth (GT) labelling.

Methods

We will look at 3 different methods: HOTMAP (highlighted in previous sections), an SVM-based approach, and a DNN-based approach. Our hot target detection method is compared to the findings of Murphy et al. (HOTMAP) and Giglio et al. (CA).

HOTMAP

HOTMAP, the algorithm proposed by Murphy et al., depends on a number of parameters, mainly the $\alpha$ and $\beta$ parameters. They can be described as boolean mapping functions, taking a multispectral image (converted to TOA reflectances) as input and outputting binary maps. Their formulas are shown in Equations 2 and 3.

$\begin{equation} \begin{align} \alpha = \frac{\rho_{7}}{\rho_{6}} \ge 1.4 \text{ AND } \frac{\rho_{7}}{\rho_{5}} \ge 1.4 \text{ AND } \rho_{7} \ge 0.15 \text{, with} \nonumber \newline \rho_{n} = \text{TOA reflectance in band n} \nonumber \newline \end{align} \end{equation}$

$\begin{equation} \begin{align} \beta = (\frac{\rho_{6}}{\rho_{5}} \ge 2.0 \text{ AND } \rho_{6} \ge 0.5) \text{ OR } S_{6} \text{ OR } S_{7} \text{, with} \nonumber \newline \rho_{n} = \text{TOA reflectance in band n} \nonumber \newline S_{n} = \text{TOA reflectance is saturated in band n} \nonumber \newline \end{align} \end{equation}$

The $\alpha$ and $\beta$ maps are merged using a simple logical OR operator ( $\alpha$ OR $\beta$ ), after which a clustering algorithm generates clusters of hot target candidates. Clusters that do not contain at least a single $\alpha$ pixel are discarded. Remaining clusters are labelled as hot targets.
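The steps above can be sketched as follows. This is a minimal illustration under stated assumptions rather than the reference HOTMAP implementation: the text only mentions "a clustering algorithm", so 8-connected component grouping is assumed here, and the function names and the precomputed saturation masks `sat6` and `sat7` are hypothetical.

```python
import numpy as np


def hotmap_candidates(rho5, rho6, rho7, sat6, sat7):
    """Return the boolean alpha and beta maps of Equations 2 and 3."""
    with np.errstate(divide="ignore", invalid="ignore"):
        alpha = (rho7 / rho6 >= 1.4) & (rho7 / rho5 >= 1.4) & (rho7 >= 0.15)
        beta = ((rho6 / rho5 >= 2.0) & (rho6 >= 0.5)) | sat6 | sat7
    return alpha, beta


def hotmap_detect(alpha, beta):
    """Merge alpha OR beta, cluster, and keep clusters with an alpha pixel."""
    candidates = alpha | beta
    h, w = candidates.shape
    visited = np.zeros((h, w), dtype=bool)
    hot = np.zeros((h, w), dtype=bool)
    for y0, x0 in zip(*np.nonzero(candidates)):
        if visited[y0, x0]:
            continue
        # Flood fill over the 8-neighbourhood (assumed clustering rule).
        stack, cluster = [(y0, x0)], []
        visited[y0, x0] = True
        while stack:
            y, x = stack.pop()
            cluster.append((y, x))
            for ny in range(max(y - 1, 0), min(y + 2, h)):
                for nx in range(max(x - 1, 0), min(x + 2, w)):
                    if candidates[ny, nx] and not visited[ny, nx]:
                        visited[ny, nx] = True
                        stack.append((ny, nx))
        ys, xs = np.array(cluster).T
        if alpha[ys, xs].any():  # discard clusters without any alpha pixel
            hot[ys, xs] = True
    return hot


# Synthetic 5x5 scene: one alpha pixel next to a saturated pixel,
# plus one isolated saturated pixel that should be discarded.
rho5 = np.full((5, 5), 0.1)
rho6 = np.full((5, 5), 0.1)
rho7 = np.full((5, 5), 0.1)
sat6 = np.zeros((5, 5), dtype=bool)
sat7 = np.zeros((5, 5), dtype=bool)
rho7[1, 1] = 0.3   # satisfies all three alpha conditions
sat7[1, 2] = True  # beta only, adjacent to the alpha pixel
sat6[4, 4] = True  # beta only, isolated
alpha, beta = hotmap_candidates(rho5, rho6, rho7, sat6, sat7)
hot = hotmap_detect(alpha, beta)
print(np.argwhere(hot))
```

In this toy scene the pixels at (1, 1) and (1, 2) form one cluster that contains an $\alpha$ pixel and is kept, while the isolated $\beta$ pixel at (4, 4) is dropped.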

The $\alpha$ parameter is designed specifically to produce as few false alarms as possible, but to detect at least a single pixel in a fire cluster. Its precision means that it can sometimes miss hot target clusters (especially on saturation of band 7 with very hot targets), which is why the $\beta$ parameter is used in combination with the $\alpha$ parameter. The $\beta$ parameter is used to detect the particularly hot targets, for example in cases of saturation of bands 6 and 7. The $\beta$ map will have good accuracy, but can also produce a large number of false alarms. When used together, the parameters provide effective hot target detection.

We implemented the algorithm to verify its effectiveness on the dataset. We will compare our results to the results reported in [81], considering we use the FIRES dataset as well.

SVM

We also implemented a support vector machine (SVM) approach for hot target classification. We initialize our classifier with a radial basis function kernel, with kernel coefficient $\gamma = 1/n$, where $n$ is the number of features, in accordance with our feature vector size. The other hyperparameters were tweaked as part of our experiment. Each pixel is used as a data point, consisting of 10 features, being the combined OLI and TIRS channels. The binary labels are used for ground truth validation. To ensure a balanced dataset, we use all hot target data points, and use the same number of non-hot target data points. The rest of the non-hot target data points are discarded. We randomize the data before we apply pruning. This results in a total size of 43546 data points (with 21773 hot target pixels). Using 10-fold cross-validation we utilize the entire dataset, each fold being employed as validation set once.
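The balancing and cross-validation procedure can be sketched with NumPy alone. This is an illustrative sketch, not the experiment code: the function names, the fixed seeds, and the synthetic data are assumptions, and the classifier itself is left out.

```python
import numpy as np


def balance_by_undersampling(features, labels, seed=0):
    """Keep all hot target samples plus an equally sized random subset
    of non-hot target samples; the remaining samples are discarded.

    Assumes there are at least as many non-hot as hot target samples.
    """
    rng = np.random.default_rng(seed)
    hot_idx = np.flatnonzero(labels == 1)
    cold_idx = np.flatnonzero(labels == 0)
    cold_keep = rng.choice(cold_idx, size=hot_idx.size, replace=False)
    keep = rng.permutation(np.concatenate([hot_idx, cold_keep]))
    return features[keep], labels[keep]


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, val_idx) pairs; each sample is validated once."""
    order = np.random.default_rng(seed).permutation(n_samples)
    folds = np.array_split(order, k)
    for i in range(k):
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train, folds[i]


# Synthetic stand-in for the per-pixel feature vectors: 10 features each.
rng = np.random.default_rng(1)
X = rng.random((1000, 10))
y = np.zeros(1000, dtype=int)
y[:50] = 1  # 50 hot target pixels among 1000

X_bal, y_bal = balance_by_undersampling(X, y)
gamma = 1.0 / X_bal.shape[1]  # kernel coefficient 1/n for n features
print(X_bal.shape, int(y_bal.sum()), gamma)

for train_idx, val_idx in k_fold_indices(len(y_bal), k=10):
    pass  # fit on X_bal[train_idx], validate on X_bal[val_idx]
```

If scikit-learn were used for the classifier (an assumption; the text does not name a library), the same coefficient corresponds to `SVC(kernel="rbf", gamma="auto")`.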

DNN

We use the same architecture as in the Ground Cover Analysis project, but adapt it to multispectral images. Instead of a 3-channel 8-bit RGB input, we use 10-channel 16-bit multispectral images. The output (and consequently the validation data) remains the same: a binary segmentation map the size of the original input image. We again employ data augmentation (rotations and elastic transformations) to generate more training data, and so allow the network to encounter a broader range of data. We also train an MLP using the previously mentioned feature vectors, with Adam stochastic gradient-based weight optimization. The MLP’s hyperparameters were tweaked as part of the experiments.
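As a rough illustration of the rotation part of this augmentation, the sketch below rotates a 10-channel 16-bit image together with its segmentation mask. It is a simplification under stated assumptions: only 90-degree rotations are shown, elastic transformations are omitted, and the function name and `(C, H, W)` layout are hypothetical.

```python
import numpy as np


def rotation_augment(image, mask):
    """Yield the four 90-degree rotations of a (C, H, W) image and its
    (H, W) mask, rotating both in the same spatial plane."""
    for k in range(4):
        yield np.rot90(image, k, axes=(1, 2)), np.rot90(mask, k)


# Synthetic 10-channel 16-bit image with a single hot target pixel.
image = np.zeros((10, 6, 8), dtype=np.uint16)
mask = np.zeros((6, 8), dtype=np.uint8)
image[:, 0, 1] = 1000
mask[0, 1] = 1

augmented = list(rotation_augment(image, mask))
for rot_image, rot_mask in augmented:
    print(rot_image.shape, rot_mask.shape)
```

Rotating image and mask with the same `k` keeps every labelled pixel aligned with its spectral values, and the 16-bit data type is preserved.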

Results

To benchmark our deep neural network implementation of hot target detection we implemented the CA and HOTMAP algorithms. In addition, we implemented an SVM and a simple MLP neural network. We will start with an overview of the accuracy and precision scores, and continue with a more expansive view of our results - most notably the machine learning results.

Method	Accuracy	Precision
CA	0.9999	0.2397
HOTMAP	0.9999	0.6766
SVM	0.9847	0.9954
MLP	0.9844	0.9951
DNN	0.9797	0.0015

Table 1. Hot target detection by method.

CA & HOTMAP

The contextual algorithm (CA) of Giglio et al., and the HOTMAP algorithm both provide excellent accuracy. However, this good result is primarily due to the large number of non-hot target pixels in the satellite imagery. We get a clearer picture of how these algorithms perform when looking at the precision scores, the fraction of true hot targets over all marked hot targets (also called positive detection rate in Giglio et al.). In line with Murphy et al., we observe that CA results in many false positives, while HOTMAP performs much better. We believe this difference is caused by incorporating extra information from additional wavelengths.
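The gap between accuracy and precision on heavily imbalanced scenes is easy to reproduce with synthetic numbers. The sketch below is illustrative only; the function name and the toy scene (10 true hot pixels and 30 false alarms in a million pixels) are assumptions and are not taken from the FIRES dataset.

```python
import numpy as np


def accuracy_and_precision(predicted, truth):
    """Return (accuracy, precision) for two boolean maps of equal shape."""
    predicted = np.asarray(predicted, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    true_pos = np.count_nonzero(predicted & truth)
    false_pos = np.count_nonzero(predicted & ~truth)
    accuracy = np.count_nonzero(predicted == truth) / truth.size
    flagged = true_pos + false_pos
    precision = true_pos / flagged if flagged else float("nan")
    return accuracy, precision


# 1000x1000 scene with 10 true hot pixels; the detector finds all of
# them but also raises 30 false alarms.
truth = np.zeros((1000, 1000), dtype=bool)
truth[0, :10] = True
predicted = truth.copy()
predicted[1, :30] = True

accuracy, precision = accuracy_and_precision(predicted, truth)
print(f"accuracy={accuracy:.5f} precision={precision:.2f}")
```

Here three out of four flagged pixels are false alarms, yet accuracy stays above 0.9999 because the 30 errors are negligible next to a million correctly labelled background pixels.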

SVM & MLP

The SVM and MLP were both trained on a dataset consisting of 50% hot target pixels and 50% non-hot target pixels (selected randomly from the complete FIRES dataset). We obtain exceptional precision scores, even with 10-fold cross validation. Accuracy scores are slightly lower than CA and HOTMAP, but this is due to the difference in dataset size. We ran CA and HOTMAP on the full images, the majority of which were non-hot target pixels, while the SVM and MLP saw only 50% non-hot target pixels. When we use the trained SVM to predict on full images, we achieve similar accuracy results.

To determine the effectiveness of these methods we first determined the data separability in n-dimensional feature space, by generating crossplots of the pixel values in every wavelength band, and labelling them with our ground truth (hot target or non-hot target). We observed near-linear separability when comparing the values of OLI band 6 with those of OLI band 7.

We also looked at the effect of our features (being the 10 OLI and TIRS channels) on the performance of the SVM, by removing these features from our dataset. In line with the results of our visual inspection of separability, we observe that removing the information in OLI band 7 results in the most severe performance drop.

DNN

The DNN fails to perform with any precision, as the results in Table 1 clearly show. When we investigate individual segmentations, a batch of which is displayed in Figure 4, we see that not only does the DNN not pick up the hot targets, it also tends to label other structures such as cloud formations as hot targets.

This detrimental effect could be caused by the lack of large swaths of hot targets. The hot targets in our satellite imagery are usually only a couple of pixels large, while our DNN architecture works better for larger targets, as is evident from our ground cover analysis results. We tried a number of changes in architecture, similar to those reported in previous ground cover analysis sections, but found no major improvement.

Figure 4. Examples of DNN segmentations of augmented satellite imagery. Shown are the original OLI band 7 image (left), ground truth hot target annotation (middle), and DNN labelling (right).

Concluding

The existing algorithms of Giglio et al. and Murphy et al. both have excellent accuracy. This is due to the combination of the algorithms’ ability to discern non-hot target pixels (under-classification) and the large number of non-hot target pixels in our FIRES dataset. When we consider precision, however, the fraction of accurately labelled hot targets out of all labelled hot targets, we see that there is definitely room for improvement.

The existing algorithms could be run on the original input image, and somewhat resemble the colour indices methods of ground cover analysis. Their advantages, much like those of the colour indices methods, are the speed of execution, simplicity, and ability to handle large input. Their disadvantages are immediately apparent in the precision scores listed in Table 1.

When we look at our machine learning methods, we see that the SVM and MLP can obtain a much higher precision rate. When we investigate the feature space, we can see why. There is a relatively clean separation between hot target pixels and non-hot target pixels, even when considering just two features. The SVM and MLP are able to find this as well and generate an accurate decision boundary or, respectively, distribution of weights. A downside of these methods is that they require some time to generate full resolution segmentation maps.

The other machine learning method we implemented, the DNN architecture, fails to perform. We observe very low precision results, due to the network’s inability to generate effective feature maps. These low scores can be explained by the shape of our ground truth data. The fires present in our dataset usually only contain a few pixels. In some cases, there are a few hundred, but spread out and sparse. The architecture we used generates smooth, round segmentation maps, which we showed earlier as well. While we tried various alternative architectures, with a lower number of features and convolutional layers, to decrease the daubing effect visible in the segmentation maps, we did not observe any major increase in performance.

An additional limitation was our small dataset. As mentioned previously, convolutional neural networks require large datasets to perform accurately, and even with augmented data our FIRES set was not very sizeable. These shortcomings, in combination with the inherently long training time of DNNs, make them unfit, in this form, for accurate hot target detection. Other DNN architectures might be able to generate better segmentation maps, although it is hard to rival SVM-based approaches.