
Journal of «Almaz – Antey» Air and Space Defence Corporation


Classification of surface water objects in visible spectrum images

https://doi.org/10.38013/2542-0542-2020-1-87-95

Abstract

This paper considers the problem of classifying surface water objects, e.g. ships of different classes, in visible spectrum images using convolutional neural networks. A technique for forming a database of images of surface water objects, together with a special training dataset for building a classifier, is presented. A method for constructing and training a convolutional neural network is described. The dependence of the probability of correct recognition on the number and choice of specific classes of surface water objects is analysed. The results of recognising different sets of classes are presented.

For citation:


Artemyev A.A., Kazachkov E.A., Matyugin S.N., Sharonov V.V. Classification of surface water objects in visible spectrum images. Journal of «Almaz – Antey» Air and Space Defence Corporation. 2020;(1):87-95. https://doi.org/10.38013/2542-0542-2020-1-87-95

Introduction

The paper investigates algorithms for classifying surface water objects (SWO), such as ships and boats of different classes, in visible spectrum images using convolutional neural networks (CNN).

Traditional approaches to developing classification algorithms come down to selecting a formal description of objects, building a database (DB) of the most distinctive descriptions (reference feature vectors) for each class, and subsequently matching the feature vectors of observed objects against this reference DB. For SWO, various papers have used ship silhouettes [1] and sets of characteristic points of the objects as feature vectors, applied to visible-spectrum [2], infrared [3], and radar [4] images.

It should be noted that building the reference DB (i.e., the DB of feature vectors) is the most labour-intensive part of such approaches and demands expert knowledge from the developers of the recognition system.

One of the most actively developing approaches in the recognition domain is the application of neural networks, in particular various CNN models, e.g. for recognition of ground objects in radar images [5]. Compared with the traditional approaches, a CNN requires no expert generation of formal object descriptions, since it operates on object images directly, and no DB of reference feature vectors is needed for the recognition task: the knowledge of the classes resides in the parameters of the trained CNN itself. Moreover, CNNs are reasonably robust to noise in the images being processed. For CNN training, however, a sizeable set of object images is required for each class.

Currently, a certain number of papers are dedicated to the recognition of SWO in satellite radar images using CNNs, e.g. [4]. For visible spectrum images there are considerably fewer such papers, which is explained first of all by the absence of publicly available high-quality sets of SWO images. One can mention paper [6], which considers recognition of objects from the VAIS dataset [7]. Those data contain a limited set of classes with low-quality images in the visible and infrared spectra; moreover, the dependence of correct recognition probability on the selection of object classes is not examined there. Therefore, research into the possibilities of SWO recognition in visible spectrum images, and assessment of correct recognition probability depending on the types and number of selected classes, is a topical issue.

1. Methodology of training dataset generation

It is known that one of the principal tasks in applying classifiers based on neural network methods is to create a training dataset of sufficient volume, which can amount to tens of thousands of objects split into classes.

Currently, an alternative to creating one's own training dataset is to use one of the publicly available image sets, such as VAIS [7], MSTAR [8], and others. However, such sets do not suit every task: the data may be unfit in terms of survey conditions, or simply insufficient for handling a specific task.

For classifying SWO, a training dataset was created within the framework of this paper. The data used were images of 5 classes of civil ships and 5 classes of warships, examples of which are given in Fig. 1.

Fig. 1. Examples of surface water objects

In the classification task, the algorithm predicts the classes (their labels) to which SWO belong. Accordingly, a set of labelled data is required for training a neural network classifier. The algorithm's quality is evaluated by how accurately it can classify new SWO images, i.e. those not involved in training.

For training based on publicly available visible-spectrum SWO images, a general dataset (GDS) was generated for the 10 object classes represented in Fig. 1, with respective labels. The set contains 7,800 images of civil ships and warships (780 images per class). The available images were split into a training set (680 images per class) and a verification set (100 images per class). Each image is stored in greyscale, in JPEG format. Initially, processing was done on RGB images, but this had to be abandoned because of the higher computing costs compared with greyscale processing. Also, to reduce the neural network's training time, all images in the training set were brought to a single size of 32×32 pixels. The choice of a single size was conditioned by the available images, some of which were under 64 pixels in one of the dimensions. Experiments with image sizes showed that increasing the size to 64×64 or 128×128 pixels improves classification accuracy only insignificantly, whereas the computing resources required for CNN training grow considerably.
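The paper gives no code for this preprocessing step; a minimal numpy sketch of greyscale conversion and resizing to 32×32 might look as follows (the luma weights and the crude nearest-neighbour resize are our own assumptions, not the authors' pipeline):

```python
import numpy as np

def to_grayscale(rgb):
    # weighted sum of R, G, B channels (ITU-R BT.601 luma weights, assumed)
    return rgb @ np.array([0.299, 0.587, 0.114])

def resize_nearest(img, size=32):
    # crude nearest-neighbour resize of a 2-D image to size x size
    h, w = img.shape
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    return img[np.ix_(rows, cols)]

rgb = np.random.rand(120, 200, 3)          # a stand-in for a loaded JPEG
small = resize_nearest(to_grayscale(rgb))
print(small.shape)  # (32, 32)
```

In practice an interpolating resize (e.g. bilinear) would be preferable; the sketch only illustrates the shape of the transformation.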

As is known, in order to achieve high results, deep neural networks have to be trained on large data volumes. Since the initial training set contained a limited number of images, as in our case, an augmentation operation was applied to expand the dataset. Data augmentation [9] is a methodology for creating additional training data from an available set by subjecting the images to various transformations: parallel translation in the vertical and horizontal directions, scaling, rotation, reflection about the horizontal and vertical axes, and swapping of image channels. Different combinations of transformations were applied, such as rotation with random scaling, and variations in saturation and in the values of all pixels.
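As an illustration of the transformations listed above, here is a hedged numpy sketch of a simple augmentation routine; the flip probability, shift range, and brightness factors are arbitrary choices for illustration, not the paper's settings:

```python
import numpy as np

rng = np.random.default_rng(0)

def augment(img):
    if rng.random() < 0.5:                   # random horizontal reflection
        img = img[:, ::-1]
    dy, dx = rng.integers(-2, 3, size=2)     # small random translation
    img = np.roll(img, (dy, dx), axis=(0, 1))
    return img * rng.uniform(0.8, 1.2)       # random brightness change

base = np.random.rand(32, 32)                # a stand-in for one training image
batch = np.stack([augment(base) for _ in range(8)])
print(batch.shape)  # (8, 32, 32)
```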

To build a neural network classifier and carry out GDS-based experiments on CNN training, individual training datasets (databases) were generated, containing different numbers of object classes (Table 1). It should be noted that when building the recognition algorithm, the warship classes “Destroyer” and “Frigate” were combined into a single class, as the structures of these SWO are hard to tell apart.

2. Classification algorithm

SWO classification was performed using CNN methods and algorithms. The methodology of applying CNNs for recognition of objects in images is described in the authors' previous papers [10, 11].

This artificial neural network structure, aimed specifically at image recognition, was first proposed by the French computer scientist Yann LeCun in 1988 [12].

Table 1

Specific datasets for training

Training database | Classes of objects in the database
DB-1 | Aircraft carrier, Assault ship, Destroyer and Frigate, Motorboat, Corvette, Container carrier, Fishing ship, Passenger ship, Tanker, Yacht
DB-2 | Aircraft carrier, Destroyer and Frigate, Corvette, Container carrier, Fishing ship, Passenger ship, Tanker, Yacht
DB-3 | Aircraft carrier, Assault ship, Destroyer and Frigate, Motorboat, Corvette
DB-4 | Container carrier, Fishing ship, Passenger ship, Tanker, Yacht
DB-5 | Aircraft carrier, Destroyer and Frigate

The network architecture got its name from the presence of a convolution layer (the convolution operation). The operation of a CNN can be interpreted as a transition from specific features of an image to more abstract details, and further to still more abstract ones, up to high-level concepts. In the course of training, the CNN tunes itself and generates the necessary hierarchy of features, filtering out less important details and focusing on the essentials.

The resulting CNN is a sequence of 18 blocks, each containing a convolution layer and a downsampling layer. The convolution layer consists of a set of feature maps, each map having a scanning kernel, or filter (the convolution kernel). The number of maps is determined by fitting in the course of model training.

The kernel is essentially a filter, a fixed-size window which slides over the entire area of the previous map and detects certain features of the objects. Kernel size is normally taken within 3×3 to 7×7 pixels and is, in principle, selected by trial and error: if the size is too small, there is a risk of losing some important features, and if it is too large, the number of inter-neuron links grows, increasing computational complexity and processing time. The kernel size is also selected so that the size of the convolution layer maps is an even number, which makes it possible not to lose information when dimensionality is reduced in the downsampling layer. The values of the convolution kernel elements are selected automatically in the process of network training.

Similar to the convolution layer, the downsampling layer has maps. Its objective is to decrease the dimensionality of the feature maps of the previous layer. If certain features were already revealed by the previous convolution operation, such a detailed image is no longer required for further processing, so it is compressed to a less detailed representation. Moreover, filtering out details that are no longer required helps the network avoid overfitting. For example, a layer map with a 2×2 kernel makes the preceding convolution-layer feature maps half the size: the feature map is divided into 2×2 cells, and the maximum value is selected from each cell. This operation is called MaxPooling, i.e. selection of the maximum.
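The 2×2 MaxPooling operation described above can be sketched in a few lines of numpy; this is a generic illustration, not the authors' implementation:

```python
import numpy as np

def max_pool_2x2(fm):
    # split an (even H) x (even W) feature map into 2x2 cells, keep each cell's max
    h, w = fm.shape
    return fm.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

fm = np.array([[1, 2, 5, 6],
               [3, 4, 7, 8],
               [9, 1, 2, 3],
               [0, 5, 4, 1]], dtype=float)
print(max_pool_2x2(fm))
# [[4. 8.]
#  [9. 4.]]
```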

To improve the network's performance and training stability, the batch normalisation technique is applied in the model being used. The principle of this technique is that the data fed to the input of some layers of the network are pre-processed to have zero mean and unit variance. This technique was first introduced in [13].
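The normalisation step can be illustrated with a simplified numpy sketch; a real batch-norm layer [13] additionally has learnable scale and shift parameters and running statistics, which are omitted here:

```python
import numpy as np

def batch_norm(x, eps=1e-5):
    # normalise each feature over the batch to zero mean and unit variance
    return (x - x.mean(axis=0)) / np.sqrt(x.var(axis=0) + eps)

x = np.random.rand(20, 8) * 10 + 5   # a batch of 20 samples with 8 features
y = batch_norm(x)
```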

It is known that deep networks extract low-, medium-, and high-level features through an end-to-end multilayer technique, and increasing the number of layers, or of blocks consisting of several layers as in our case, may enrich the levels of features. However, as network depth increases, training becomes unstable and the achieved accuracy starts to decrease (degrade).

To overcome this problem, the inter-layer link architecture applied in the ResNet network [14] was used. Most neural networks are trained with the error back-propagation method [15, 16], which is based on the 'response error correction' rule developed for multilayer perceptron training. The key idea of the method is that after an error is detected at the network's output, error signals are propagated in the direction opposite to the direct propagation of signals during regular computation, i.e. from the network's outputs to its inputs. During back-propagation, the weights are tuned to minimise the error.

Once a training pattern has been passed through and an output obtained, a backward pass of the network starts so as to minimise the error function, using the gradient descent method. When the depth of a neural network is increased, one of the most frequently encountered problems is gradient attenuation on the backward pass and, as a consequence, deterioration of network performance. In particular, gradient attenuation and information loss most often occur due to the use of a ReLU layer with activation function f(x) = max(x, 0): in many cases, once unwanted features are screened out, exactly 0 is obtained, which contributes to gradient attenuation.
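The zeroing effect described here follows directly from the definition of ReLU; a small generic numpy illustration (not from the paper):

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def relu_grad(x):
    # derivative of max(x, 0): zero wherever the input was non-positive
    return (x > 0).astype(float)

x = np.array([-2.0, -0.5, 0.0, 1.5])
print(relu(x))       # [0.  0.  0.  1.5]
print(relu_grad(x))  # [0. 0. 0. 1.]
```

Wherever the gradient is zero, no error signal flows back through that unit, which is the attenuation mechanism the text refers to.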

The ResNet architecture [14] is based on a so-called residual block, whose diagram is given in Fig. 2; hence the architecture's name, Residual Network (residual neural network).

Fig. 2. Residual block of ResNet architecture

The block operates as follows: let the objective function be expressed as H(x) = F(x) + x. When the quality limit at the previous layer is reached, it would be desirable, to avoid gradient attenuation, for the function F(x) to return the identity transformation; however, this does not occur because a ReLU layer is in use, so the function often returns 0. For this network, it is suggested to use so-called shortcut (skip) connections, i.e. an identity mapping is added explicitly. As a result, on the backward pass in the back-propagation method we obtain dH(x)/dx = dF(x)/dx + 1. In this way no gradient attenuation occurs, as the backward pass can always be performed through the identity term.
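The effect of the identity shortcut on the backward pass can be checked numerically on a toy residual block; the inner function F below is an arbitrary one-unit example, not the paper's block:

```python
def F(x, w):
    # toy inner branch: one weighted ReLU unit
    return max(w * x, 0.0)

def H(x, w):
    # residual block: inner branch plus identity shortcut
    return F(x, w) + x

# numeric derivative of H at a point where the ReLU is inactive (w*x < 0),
# i.e. where F alone would contribute zero gradient
x, w, h = -1.0, 0.5, 1e-6
g = (H(x + h, w) - H(x - h, w)) / (2 * h)
print(round(g, 6))  # 1.0 -> the shortcut keeps the gradient from vanishing
```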

The use of the ResNet architecture allowed an 18-layer deep neural network to be trained (against the three-layer network used at the initial stages of the work), with acceptable recognition results obtained for 10 classes simultaneously despite the small number of images in the training set.

To make the final decision on the class of the image being analysed, a SoftMax layer was used, with the number of outputs corresponding to the number of classes considered. The class matching the CNN output with the maximum response was taken as the result.

One practicable way to take account of possible unknown classes and blank background images is to introduce a response threshold at the CNN output layer and interpret the case “maximum response < threshold” as “unrecognised”. A correct analysis of such cases requires further dataset expansion and is beyond the scope of this paper.

3. Results

Separate training was performed for each of the databases given in Table 1. In each case the same network architecture, ResNet with 18 layers, was used. Training ran for up to 200 epochs with early stopping, i.e. training terminated when recognition accuracy did not improve over several epochs. In all cases training finished by early stopping, so the number of epochs was sufficient. The number of training samples processed simultaneously within one iteration of the training algorithm (batch size) was 20. The achieved probabilities of correct recognition for all the considered cases are given in Tables 2–7.
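The early-stopping rule described here can be sketched as a small helper; the patience value and the accuracy history below are illustrative, not the paper's actual settings:

```python
class EarlyStopping:
    """Signal a stop when accuracy has not improved for `patience` epochs."""
    def __init__(self, patience=3):
        self.patience = patience
        self.best = float("-inf")
        self.bad_epochs = 0

    def step(self, accuracy):
        if accuracy > self.best:
            self.best, self.bad_epochs = accuracy, 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience  # True -> stop training

stopper = EarlyStopping(patience=3)
for epoch, acc in enumerate([0.50, 0.61, 0.66, 0.66, 0.65, 0.66, 0.67]):
    if stopper.step(acc):
        print(f"stopping at epoch {epoch}")  # stopping at epoch 5
        break
```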

3.1. Training results with DB-1 used - classification by 10 SWO classes (Table 2).

Table 2

Training results with DB-1 used - classification by 10 SWO classes

(Rows: actual class; columns: predicted class; 100 verification images per class.)

Class | Aircraft carrier | Assault ship | Destroyer, frigate | Motorboat | Corvette | Container carrier | Fishing ship | Passenger ship | Tanker | Yacht
Aircraft carriers | 62 | 7 | 11 | 7 | 0 | 2 | 0 | 3 | 2 | 6
Assault ships | 26 | 37 | 21 | 4 | 4 | 0 | 0 | 4 | 4 | 0
Destroyers and frigates | 7 | 5 | 61 | 13 | 9 | 0 | 0 | 0 | 4 | 1
Motorboats | 3 | 17 | 10 | 51 | 4 | 2 | 0 | 6 | 0 | 7
Corvettes | 0 | 0 | 0 | 0 | 100 | 0 | 0 | 0 | 0 | 0
Container carriers | 1 | 3 | 0 | 2 | 0 | 73 | 2 | 0 | 15 | 4
Fishing ships | 2 | 0 | 0 | 1 | 0 | 6 | 74 | 6 | 6 | 5
Passenger ships | 5 | 4 | 11 | 3 | 1 | 5 | 0 | 57 | 5 | 9
Tankers | 1 | 1 | 0 | 2 | 0 | 4 | 2 | 4 | 82 | 4
Yachts | 1 | 0 | 3 | 1 | 0 | 2 | 0 | 7 | 2 | 84

Correctly recognised images: 681 out of 1,000 (CRP = 0.681)

3.2. Training results with DB-2 used - classification by 8 SWO classes (Table 3).

Table 3

Training results with DB-2 used - classification by 8 SWO classes

Class | Aircraft carrier | Destroyer, frigate | Corvette | Container carrier | Fishing ship | Passenger ship | Tanker | Yacht
Aircraft carriers | 85 | 2 | 2 | 1 | 0 | 2 | 0 | 8
Destroyers and frigates | 4 | 82 | 4 | 0 | 0 | 1 | 0 | 9
Corvettes | 0 | 14 | 81 | 0 | 0 | 0 | 0 | 5
Container carriers | 1 | 0 | 0 | 77 | 1 | 4 | 13 | 4
Fishing ships | 1 | 4 | 0 | 6 | 54 | 3 | 18 | 14
Passenger ships | 7 | 10 | 0 | 3 | 1 | 56 | 1 | 22
Tankers | 1 | 2 | 0 | 12 | 1 | 3 | 73 | 8
Yachts | 3 | 2 | 0 | 2 | 3 | 5 | 1 | 84

Correctly recognised images: 592 out of 800 (CRP = 0.740)

3.3. Training results with DB-3 used - classification by 5 warship classes (Table 4).

Table 4

Training results with DB-3 used - classification by 5 warship classes

Class | Aircraft carrier | Assault ship | Destroyer, frigate | Motorboat | Corvette
Aircraft carriers | 58 | 5 | 22 | 9 | 6
Assault ships | 10 | 33 | 39 | 4 | 14
Destroyers and frigates | 3 | 3 | 84 | 8 | 2
Motorboats | 4 | 7 | 31 | 56 | 2
Corvettes | 0 | 0 | 2 | 0 | 98

Correctly recognised images: 329 out of 500 (CRP = 0.658)

3.4. Training results with DB-4 used - classification by 5 civil ship classes (Table 5).

Table 5

Training results with DB-4 used - classification by 5 civil ship classes

Class | Container carrier | Fishing ship | Passenger ship | Tanker | Yacht
Container carriers | 66 | 5 | 5 | 19 | 5
Fishing ships | 5 | 76 | 6 | 7 | 6
Passenger ships | 4 | 1 | 88 | 1 | 6
Tankers | 5 | 2 | 9 | 80 | 4
Yachts | 0 | 4 | 14 | 2 | 80

Correctly recognised images: 390 out of 500 (CRP = 0.780)

3.5. Training results with DB-5 used - classification by 2 warship classes (Table 6).

Table 6

Training results with DB-5 used - classification by 2 warship classes

Class | Aircraft carrier | Destroyer, frigate
Aircraft carriers | 97 | 3
Destroyers and frigates | 7 | 93

Correctly recognised images: 190 out of 200 (CRP = 0.950)
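The CRP value reported under each table follows from its confusion matrix as the diagonal (correctly recognised) count divided by the total number of verification images; e.g. for the 2-class case of Table 6:

```python
import numpy as np

# confusion matrix from Table 6 (rows: actual class, columns: predicted class)
conf = np.array([[97, 3],
                 [7, 93]])
crp = np.trace(conf) / conf.sum()   # 190 correct out of 200
print(crp)  # 0.95
```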

Mean correct recognition probabilities depending on the number of SWO classes are given in Table 7.

Table 7

Mean correct recognition probabilities depending on the number of SWO classes

 | DB-1 | DB-2 | DB-3 | DB-4 | DB-5
Number of classes | 10 | 8 | 5 | 5 | 2
Mean correct recognition probability | 0.68 | 0.74 | 0.66 | 0.78 | 0.95

As expected, the CRP increases as the number of classes decreases. The lower correct recognition probability for DB-3 compared with DB-4 (with the same number of classes) is presumably explained by the presence of classes with similar silhouettes.

A CRP increase may be possible with the use of additional CNN features and CNN ensembles. Thus, in [6] an average CRP increase of 13 %, compared with a single CNN, is demonstrated for 6 civil ship classes. The use of this approach for all the classes considered in this paper is a promising direction for further research.
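An ensemble of the kind referred to here is typically built by averaging the class-probability outputs of several models; a hypothetical numpy sketch (the probability vectors stand in for the outputs of three trained CNNs):

```python
import numpy as np

def ensemble_predict(prob_sets):
    # average the class-probability vectors of several models, then argmax
    return int(np.mean(prob_sets, axis=0).argmax())

# three hypothetical per-model probability vectors for one image
p1 = np.array([0.40, 0.35, 0.25])
p2 = np.array([0.30, 0.45, 0.25])
p3 = np.array([0.20, 0.50, 0.30])
print(ensemble_predict([p1, p2, p3]))  # 1
```

Note that the first model alone would have picked class 0; averaging lets the majority evidence win, which is one mechanism behind the CRP gains reported in [6].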

Conclusion

The paper has investigated the methodical aspects of recognising surface water objects (SWO) from their visible spectrum images with a large number of different SWO classes. Recognition was performed using convolutional neural networks. The applied convolutional neural network made it possible to solve the classification task without intermediate construction of SWO features, working directly from the visible spectrum images.

The dependence of correct recognition probability on the number of classes and on the set of recognised SWO classes was analysed. For that purpose, a methodology for generating an object image database was developed, and a special training dataset (a database of visible-spectrum object images) was created for SWO classification in the visible spectrum range. In forming the database, images of 10 classes of civil ships and warships of different types were used.

It was revealed that the mean correct recognition probability was P = 0.95 for 2 classes (“Aircraft carriers” and “Destroyers, frigates”) and decreased to P = 0.68 for 10 SWO classes.

The results obtained in the course of the investigation demonstrate the possibility of applying stepwise SWO recognition algorithms: recognition is first performed for a combined class comprising similar objects, in order to distinguish this class from the remaining SWO classes; then, if necessary, the SWO classes (types) included in the combined class are recognised.
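Such a stepwise scheme could be organised as follows; this is a hypothetical sketch in which the model functions are scalar stand-ins, not the paper's classifiers:

```python
def stepwise(image, coarse_model, fine_models):
    # stage 1: coarse recognition, with similar objects merged into one class
    label = coarse_model(image)
    # stage 2: refine only if the coarse label is a combined class
    if label in fine_models:
        return fine_models[label](image)
    return label

# toy stand-in models operating on a scalar "image"
coarse = lambda x: "combined warship" if x < 0 else "tanker"
fine = {"combined warship": lambda x: "destroyer" if x < -1 else "frigate"}

print(stepwise(-2, coarse, fine))    # destroyer
print(stepwise(-0.5, coarse, fine))  # frigate
print(stepwise(3, coarse, fine))     # tanker
```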

A detailed analysis of the methodical aspects of increasing the CRP, in particular by applying CNN ensembles and by introducing an “unrecognised” class, will be given in subsequent papers.

References

1. Gouaillier V., Gagnon L. Ship silhouette recognition using principal components analysis // Applications of Digital Image Processing XX. 1997. Vol. 3164. P. 59–70.

2. Feineigle P. A., Morris D. D., Snyder F. D. Ship recognition using optical imagery for harbor surveillance // Proceedings of Association for Unmanned Vehicle Systems International (AUVSI). 2007. P. 1–17.

3. Li H., Wang X. Automatic recognition of ship types from infrared images using support vector machines // International Conference on Computer Science and Software Engineering. 2008. Vol. 6. P. 483–486.

4. Rainey K., Reeder J. D., Corelli A. G. Convolution neural networks for ship type recognition // Proceedings of the SPIE Defense + Security. Vol. 9844: “Automatic Target Recognition XXVI”. 2016. 984409. DOI: 10.1117/12.2229366

5. Kazachkov E. A., Matyugin S. N., Popov I. V., Sharonov V. V. Detection and classification of small-sized objects in images obtained by synthetic aperture radar // Journal of «Almaz – Antey» Air and Space Defence Corporation. 2018. No. 1. P. 93–99. (In Russian)

6. Shi Q., Li W., Tao R., Sun X., Gao L. Ship Classification Based on Multifeature Ensemble with Convolutional Neural Network // Remote Sensing. 2019. No. 11. P. 419.

7. VAIS: A Dataset for Recognizing Maritime Imagery in the Visible and Infrared Spectrums. URL: http://vcipl-okstate.org/pbvs/bench/Data/12/VAIS.zip

8. Moving and Stationary Target Acquisition and Recognition (MSTAR) Public Release Data. URL: https://www.sdms.afrl.af.mil/datasets/mstar/ (accessed 24.03.2018).

9. Haykin S. Neural Networks: A Comprehensive Foundation / Transl. from English by N. N. Kuseul and A. Yu. Shelestov. 2nd ed. Moscow: Williams, 2006. 1104 p. (In Russian)

10. Matyugin S. N., Chernigin A. A. A study of the applicability of neural networks for the classification of objects in images // Neurocomputers: Development, Application. 2007. No. 11. P. 38–42. (In Russian)

11. Kazachkov E. A., Matyugin S. N., Popov I. V., Sharonov V. V. Processing of radar images from the CARABAS-II and MSTAR databases using deep convolutional neural networks // Radar. Results of Theoretical and Experimental Studies: monograph in 2 books. Book 2 / Ed. A. B. Blyakhman. Moscow: Radiotekhnika, 2019. P. 72–86. (In Russian)

12. LeCun Y. LeNet-5, convolutional neural networks. 2013.

13. Ioffe S., Szegedy C. Batch normalization: Accelerating deep network training by reducing internal covariate shift // Proceedings of the 32nd International Conference on Machine Learning (ICML). 2015. P. 448–456.

14. He K., Zhang X., Ren S., Sun J. Deep residual learning for image recognition // IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2016. P. 770–778.

15. Гудфеллоу Я., Бенджио И., Курвилль А. Глубокое обучение / Пер. с англ. А. А. Слинкина. 2-е изд., испр. М.: ДМК Пресс, 2018. 652 с.

16. Николенко C., Кадурин А., Архангельская Е. Глубокое обучение. Погружение в мир нейронных сетей. СПб.: Питер, 2018. 480 с.


About the Authors

A. A. Artemyev
Scientific Research Institute of Radio Engineering, JSC

Artemyev Anatoly Aleksandrovich – Engineer of the 1st category

Research interests: algorithm development, digital processing of radar signals and optical images, image recognition, neural networks.



E. A. Kazachkov
Scientific Research Institute of Radio Engineering, JSC

Kazachkov Egor Andreevich – Engineer

Research interests: algorithm development, digital processing of radar signals and optical images, statistical processing of information, image recognition, neural networks.



S. N. Matyugin
Scientific Research Institute of Radio Engineering, JSC

Matyugin Sergey Nikandrovich – Cand. Sci. (Phys.-Math.), Sector Head

Research interests: distribution of radio signals, digital processing of radar signals and optical images, image recognition.



V. V. Sharonov
Scientific Research Institute of Radio Engineering, JSC

Sharonov Vladimir Vitalievich – Deputy Departmental Head, Deputy Chief Design Manager

Research interests: radar systems, digital processing of radar signals, image recognition.





This work is licensed under a Creative Commons Attribution 4.0 License.


ISSN 2542-0542 (Print)