Journal of «Almaz – Antey» Air and Space Defence Corporation


Experimental study of a prototype for an autonomous infrared system for ground object recognition


The paper presents the results of experiments with a prototype of an autonomous infrared system for ground object recognition based on domestically produced hardware components and the open YOLOv3 convolutional neural network architecture. The object to be recognized is a van. The neural network is trained on a set of images taken in the visible range. Infrared video of imperfect quality, recorded from a moving and vibrating airborne carrier (an octocopter), is analysed.

For citation:

Maltsev A.I., Otkupman D.G., Ostashenkova V.K., Ostanin M.V. Experimental study of a prototype for an autonomous infrared system for ground object recognition. Journal of «Almaz – Antey» Air and Space Defence Corporation. 2021;(1):93-102.


The development of the theory and applications of convolutional neural networks has led to remarkable progress in automatic object recognition algorithms for various civil and military applications. Developing a recognition system with a neural network-based algorithm typically requires an image dataset with labelled reference objects shot at different angles and under different conditions. The neural network algorithm, implemented as a computer program, is tuned to detect and recognize the predetermined object types by training the network on a large dataset of images of the objects the system is to recognize. The training imagery is expected to be acquired in a spectral range similar to the operating range of the recognition system's receiver. However, for the infrared (IR) spectral region, suitable image sets, especially for objects of interest in military applications, are difficult to access or limited in volume, and are often taken at unsuitable angles. It is therefore of interest to study the recognition capabilities achievable with a training dataset acquired in another spectral range.

Several authors have studied the application of various convolutional neural network architectures in systems operating in the IR range. One such paper [1] describes the application of various architectures trained on a visible-range dataset and on a dataset provided by FLIR. However, it does not analyse the YOLO architecture.

To assess the capabilities of the YOLOv3 architecture, the authors reviewed the works [2][3]. Although these works resemble our study in that the well-known YOLOv3 architecture is trained on images acquired in the visible region of the electromagnetic spectrum for use in IR systems, most foreign publications focus on facial recognition, which is outside the scope of our work. Moreover, the known publications do not analyse the results of processing video stream data acquired from an oscillating or vibrating carrier.

Published domestic research also does not describe the approach presented here: it considers either automatic thermal imaging recognition systems with different architectures or other recognition equipment [4][5].

Main part

We conducted an experimental study of a prototype autonomous system for recognizing ground objects from the upper hemisphere, based on a receiving part (lens and detector array) operating in the long-wave region of the IR range and on a neural network-based recognition algorithm. For the receiving part, we selected domestic components made by Astrohn Design Bureau [6]: a vanadium oxide microbolometer module with 640×480 resolution, 17 µm pixel pitch and shutterless calibration, and a germanium thermal imaging lens with passive athermalization, 100 mm focal length, 1:1.4 aperture ratio and near-diffraction-limited image quality. The receiving part, together with data transmission equipment, was mounted on an unmanned aerial vehicle (UAV), an octocopter. The simplified block diagram of the experiment is shown in Figure 1; the general view of the airborne experimental equipment is shown in Figure 2.

Fig. 1. Simplified diagram of the experiment

Fig. 2. 3D model of the unmanned carrier with equipment

To develop the recognition algorithm, we selected the publicly documented YOLO (You Only Look Once) convolutional neural network architecture. It can detect multiple objects in an image and supports many object classes. Compared with other known architectures, YOLOv3 is one of the most accurate and fastest solutions [7].

Regarding the theoretical basis of YOLOv3, it is worth noting that the classifier of a convolutional neural network usually predicts which type of object is captured in a given window. Each image therefore requires thousands of such predictions, which makes the algorithm slow. The YOLO architecture eliminates this drawback, as its full name, "You Only Look Once", suggests.
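For scale, the arithmetic below counts the candidate boxes produced by YOLOv3 itself. It assumes the standard configuration described in [7] (416×416 input, three output scales with strides of 32, 16 and 8 pixels, three anchor boxes per grid cell); these parameters are not reported in this study. The network still produces on the order of 10⁴ candidates, but all of them come from a single forward pass rather than from one classifier run per window.

```python
# Number of candidate boxes YOLOv3 emits in one forward pass.
# Assumed configuration (from the YOLOv3 paper, not from this study):
# three detection scales with strides 32, 16, 8 and 3 anchors per grid cell.

STRIDES = (32, 16, 8)
ANCHORS_PER_CELL = 3


def candidate_boxes(input_size: int = 416) -> int:
    """Total number of box predictions over all grid cells and all scales."""
    total = 0
    for stride in STRIDES:
        grid = input_size // stride          # 13, 26, 52 for a 416x416 input
        total += grid * grid * ANCHORS_PER_CELL
    return total


if __name__ == "__main__":
    print(candidate_boxes())
```

Under these assumptions the function returns 10 647 boxes for a 416×416 input, all produced at once.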

The YOLO convolutional neural network uses features from the entire image to predict each bounding box, and it predicts all bounding boxes for all classes of an image simultaneously. In other words, YOLO analyses the entire image and all objects in it at once, which is the key factor in its short frame processing time. The advantages of YOLOv3 are illustrated by the graph in Figure 3, which compares its performance with that of other known architectures. According to Figure 3, when trained on the same COCO dataset and at comparable mAP (mean Average Precision), YOLOv3 detects objects faster than the other networks considered.
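How a single forward pass yields every box can be illustrated with the output parameterization given in [7]: each grid cell predicts offsets (t_x, t_y, t_w, t_h) relative to its position and to a prior (anchor) box, an objectness score, and independent logistic class scores. The sketch below is a minimal plain-Python illustration of that decoding, not the software used in this study; the function names are hypothetical, and the 116×90 pixel anchor in the usage note is the largest default COCO anchor listed in [7].

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def decode_box(t, cell_x, cell_y, anchor_w, anchor_h, stride):
    """Convert raw outputs (tx, ty, tw, th) of one grid cell into a box.

    Returns (center_x, center_y, width, height) in input-image pixels.
    Anchor sizes are in pixels; cell indices are integer grid positions.
    """
    tx, ty, tw, th = t
    cx = (sigmoid(tx) + cell_x) * stride   # sigmoid keeps the centre inside its cell
    cy = (sigmoid(ty) + cell_y) * stride
    w = anchor_w * math.exp(tw)            # anchor size scaled exponentially
    h = anchor_h * math.exp(th)
    return cx, cy, w, h


def class_scores(objectness_logit, class_logits):
    """Per-class confidence: P(object) * P(class | object), independent sigmoids."""
    p_obj = sigmoid(objectness_logit)
    return [p_obj * sigmoid(c) for c in class_logits]
```

For example, zero raw outputs in cell (6, 6) of the 13×13 grid (stride 32) with the 116×90 anchor decode to a 116×90 box centred at (208, 208). Because YOLOv3 scores each class with an independent sigmoid, the "car", "bus" and "truck" scores of one box need not sum to one.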

To implement the neural network in task-specific software, we chose the high-level programming language Python with open-source libraries (modules) for scientific computing, deep learning and computer vision (NumPy, TensorFlow, Keras, OpenCV).
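Among the listed libraries, OpenCV provides non-maximum suppression (cv2.dnn.NMSBoxes), the post-processing step that removes overlapping YOLO boxes describing the same object. What that step does can be sketched in plain Python as follows; the function names and threshold values are illustrative assumptions, not parameters reported by the authors.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, score_thr=0.5, iou_thr=0.45):
    """Greedy non-maximum suppression; returns indices of the kept boxes."""
    # Discard weak detections, then visit the rest from highest score down.
    order = sorted((i for i, s in enumerate(scores) if s >= score_thr),
                   key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with a stronger kept box.
        if all(iou(boxes[i], boxes[j]) < iou_thr for j in keep):
            keep.append(i)
    return keep
```

For two heavily overlapping boxes (0, 0, 10, 10) and (1, 1, 11, 11) plus a separate box (20, 20, 30, 30) with scores 0.9, 0.8 and 0.7, the sketch keeps boxes 0 and 2.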

As the object under observation, we selected a van with dimensions of 4150×1960×1820 mm.

To select the optimal recognition range for an object of a given size, we can use the simplified criterion considered in [8]:

$$L = \frac{h_{cr}\, f'\, \nu}{N},$$

where L is the recognition range (distance to the object), m;
ν = 58.82 mm⁻¹ is the spatial frequency;
f' = 100 mm is the focal length;
h_cr = 2.7 m is the critical size of the object;
N is the number of active elements (pixels).
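The criterion can be evaluated numerically. The sketch below assumes the geometric form L = h_cr·f'·ν/N, which is consistent with the variables listed above (ν = 58.82 mm⁻¹ is the reciprocal of the 17 µm pixel pitch), and uses h_cr = 2.7 m; the constant and function names are hypothetical.

```python
# Geometric recognition-range estimate.
# Assumption: the criterion has the form L = h_cr * f' * nu / N.

H_CR_M = 2.7        # critical object size h_cr, m
F_MM = 100.0        # focal length f', mm
NU_PER_MM = 58.82   # spatial frequency nu = 1 / (17 um pitch), 1/mm


def recognition_range_m(n_pixels: float) -> float:
    """Range (m) at which the critical size spans n_pixels detector elements."""
    return H_CR_M * F_MM * NU_PER_MM / n_pixels


def pixels_on_target(range_m: float) -> float:
    """Inverse relation: pixels across the critical size at a given range (m)."""
    return H_CR_M * F_MM * NU_PER_MM / range_m


if __name__ == "__main__":
    for distance in (250, 500, 1000):
        print(distance, round(pixels_on_target(distance), 1))
```

Under this assumption, the critical size spans about 63.5 pixels at 250 m, 31.8 at 500 m and 15.9 at 1000 m: doubling the range halves the number of pixels on the target. This is why merging 2×2 and 4×4 pixel blocks of the 250 m footage can stand in for observation at 500 and 1000 m in the geometric approximation.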

Fig. 4 shows the dependence of the recognition range for our object on the number of resolution pixels specified by the detection criterion, calculated in the geometric approximation.

Fig. 4. Theoretical dependence of the detection range on the number of resolution pixels specified by the detection criterion, for an object with a critical size of 2.75 m. The dependence is calculated for the selected configuration of the receiving system in the geometric approximation

Assuming a resolution known to be sufficient for detecting the object under consideration, and taking into account convenient operating conditions at the test site, we decided to conduct a UAV-based airborne survey of the object of interest at a distance of about 250 m. To estimate the recognition probabilities at other distances, the raw images were processed by merging pixels, with the merging factor equal to the ratio of the simulated distance to the actual one.

The results below were obtained from daytime footage against a terrain background of the “non-uniform meadow – dirt road” type. During the survey, the observed object (the van) made turns, and the observation angle changed accordingly. Figure 5 shows the initial and final frames of the raw video footage (without digital image enhancement) of the observed scene, taken in the long-wave region of the IR range.

Fig. 5. Initial and final frames of video footage with the object under observation

Since sufficiently large databases of object images taken in the long-wave region of the IR range are not available, the neural network was trained on the open-source visible-range COCO image dataset [9] with labelled objects of the “car”, “bus” and “truck” classes. Note that the YOLO developers provide open source code and a detailed description of how to train a custom model on different datasets, including COCO [10]. In total, over 10,000 visible-range images of size 416×416 were used for training. The recognition precision on the training sample exceeded 0.8 (80 %).
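The class selection can be illustrated on COCO-format annotations. In the official COCO annotation files, category ids 3, 6 and 8 correspond to “car”, “bus” and “truck”, and each bounding box is stored as [x, y, width, height] in pixels from the top-left corner; the Darknet YOLO training procedure [10] expects one text row per object, “class x_center y_center width height”, normalised to the image size. The helper below is a hypothetical sketch of that conversion, not the authors' data preparation code.

```python
# Hypothetical helper: keep only car/bus/truck annotations from a COCO-format
# dict and convert them to YOLO label rows with coordinates normalised to [0, 1].

COCO_TO_LOCAL = {3: 0, 6: 1, 8: 2}   # COCO ids of car, bus, truck -> local 0, 1, 2


def coco_to_yolo_rows(coco: dict) -> dict:
    """Return {file_name: [label rows]} for images that contain target classes."""
    images = {img["id"]: img for img in coco["images"]}
    rows = {}
    for ann in coco["annotations"]:
        local = COCO_TO_LOCAL.get(ann["category_id"])
        if local is None or ann.get("iscrowd", 0):
            continue                            # skip other classes and crowd regions
        img = images[ann["image_id"]]
        x, y, w, h = ann["bbox"]                # top-left x, y, width, height in pixels
        img_w, img_h = img["width"], img["height"]
        row = (f"{local} {(x + w / 2) / img_w:.6f} {(y + h / 2) / img_h:.6f} "
               f"{w / img_w:.6f} {h / img_h:.6f}")
        rows.setdefault(img["file_name"], []).append(row)
    return rows
```

For a 640×480 image with a “car” box [100, 100, 200, 100], the sketch produces the row "0 0.312500 0.312500 0.312500 0.208333"; annotations of other classes are ignored.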

After training as described above, the software implementing the neural network processed video frames in which the object turned from the position “from overhead at a certain angle – front view” to the position “from overhead at a certain angle – side view”. The software detected the object in the frame and recognized it, determining the object type among the “car”, “bus” and “truck” classes and estimating the probability of recognition for each type (Fig. 6).

Fig. 6. Visualization of the recognition process for one of the frames. The object type recognized in the frame with the maximum probability and the probability value are shown above the box around the detected object

For the “car” class, this value is the probability of correct recognition; for the other two classes, it is the probability of classifying the actual object as an object of another type, i.e. the probability of false recognition. To illustrate how the point estimates of probability vary from frame to frame as the observed object turns, we plotted them with the frame number on the abscissa and the probability on the ordinate. Figure 7 shows an example that explains the structure of these plots.

Note the significant frame-to-frame variation of the probability estimates, while the approximating estimates remain stable. This indicates that estimates should be accumulated over multiple frames to increase the probability of a correct decision by the recognition algorithm in the final product.
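The article does not specify how estimates are to be accumulated. One simple possibility, shown only as an illustration, is a sliding-window mean of the per-frame probability for a given class, with a decision taken when the mean reaches a threshold; the class name, window length and threshold below are hypothetical.

```python
from collections import deque


class ProbabilityAccumulator:
    """Sliding-window mean of per-frame probability estimates for one class."""

    def __init__(self, window: int = 10, threshold: float = 0.5):
        self.buffer = deque(maxlen=window)  # keeps only the last `window` estimates
        self.threshold = threshold

    def update(self, p: float) -> float:
        """Add one per-frame estimate and return the accumulated (mean) estimate."""
        self.buffer.append(p)
        return self.mean()

    def mean(self) -> float:
        return sum(self.buffer) / len(self.buffer) if self.buffer else 0.0

    def decision(self) -> bool:
        """True when the accumulated estimate reaches the threshold."""
        return bool(self.buffer) and self.mean() >= self.threshold
```

For the strongly fluctuating sequence 0.9, 0.1, 0.8, 0.2, 0.7 and a window of three frames, the accumulated estimate after the last frame is (0.8 + 0.2 + 0.7)/3 ≈ 0.57. An exponential moving average would be an equally simple alternative with a single smoothing parameter.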

Figure 8 shows the results of processing a sequence of frames from the “front view” position to the “side view” position.

To estimate the probability of object recognition at distances of 500 and 1000 m, the resulting video was digitally processed to simulate an increased observation range: blocks of 2×2 and 4×4 adjacent pixels in each frame were merged with brightness averaging. Figure 9 shows the analysis results for the processed frame sequences.
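The range simulation amounts to k×k pixel binning with brightness averaging (k = 2 for 500 m, k = 4 for 1000 m). A minimal plain-Python sketch follows; the function name is hypothetical, and dropping edge rows or columns that do not fill a whole block is an assumption (the 640×480 frames divide evenly for k = 2 and k = 4 anyway).

```python
def bin_pixels(frame, k):
    """Merge k x k blocks of adjacent pixels by averaging their brightness.

    frame: list of rows (lists of numbers). Edge rows/columns that do not
    fill a whole block are dropped.
    """
    rows, cols = len(frame) // k, len(frame[0]) // k
    out = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            block = [frame[r * k + i][c * k + j] for i in range(k) for j in range(k)]
            out_row.append(sum(block) / (k * k))   # mean brightness of the block
        out.append(out_row)
    return out
```

With NumPy, the same operation for an H×W array whose sides are divisible by k can be written as frame.reshape(H // k, k, W // k, k).mean(axis=(1, 3)).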

Comparison of the results in Figures 8 and 9 shows that an increased range (degraded resolution) significantly reduces the probability of correct object recognition at an unfavourable observation angle.
To illustrate this dependence, the table gives the values of the linear approximation of the probability estimate for different ranges at the beginning of the analysed record, in the position “from overhead at a certain angle – front view”, and at its end, in the position “from overhead at a certain angle – side view”.
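The article does not state how the linear approximation was obtained. Assuming an ordinary least-squares fit of the per-frame probability against the frame number, the approximated values at the start and end of the record could be computed as in the sketch below (function names hypothetical).

```python
def linear_fit(frames, probs):
    """Ordinary least-squares fit p(k) = a + b * k; returns (a, b)."""
    n = len(frames)
    if n < 2:
        raise ValueError("need at least two frames")
    mean_k = sum(frames) / n
    mean_p = sum(probs) / n
    sxx = sum((k - mean_k) ** 2 for k in frames)
    if sxx == 0:
        raise ValueError("frame numbers must not all be equal")
    sxy = sum((k - mean_k) * (p - mean_p) for k, p in zip(frames, probs))
    b = sxy / sxx                 # slope: change of probability per frame
    a = mean_p - b * mean_k       # intercept
    return a, b


def trend_at_ends(frames, probs):
    """Approximated probability at the first and last frame of the record."""
    a, b = linear_fit(frames, probs)
    return a + b * frames[0], a + b * frames[-1]
```

In Python 3.10 and later, statistics.linear_regression(x, y) returns the same slope and intercept.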


Table. Results of linear approximation of the correct recognition probability for different observation ranges and angles

To check the performance of the trained recognition software against a complex background, with camera shake and a blurred image, we recorded a video of a suburban area of the “one- and two-storey buildings – fences – paved roads – cars” type while the UAV carrying the airborne equipment moved randomly. Figure 10 shows a fragment of the recorded video footage in its raw form and with the objects detected by the recognition software.

Fig. 10. Fragment of footage of the scene “one- and two-storey buildings – fences – paved roads – cars”: a) raw footage; b) footage with recognition.

Despite image blur and camera shake, the neural network algorithm implemented in the recognition software consistently recognized the object that was clearly visible against the background. More importantly, first, we obtained a non-zero estimate of the probability of correct recognition for a barely visible object in adverse background conditions (closer to the upper left corner of the frame) and, second, no combination of contrasting background areas was ever misrecognized as the target object.

Conclusions

  1. We developed a prototype of an autonomous ground object recognition system based on a domestically made matrix receiver operating in the long-wave region of the IR range, a domestically made lens, and software that implements the convolutional neural network algorithm with the YOLOv3 architecture taken from open sources.
  2. An experimental study of the prototype, using video recorded from an airborne carrier and subsequent ground-based frame processing, confirms its operability: mounted on a moving airborne carrier, it generates IR images with a resolution sufficient for detecting and recognizing the target object, and its recognition software is able to recognize the object type.
  3. The neural network trained on a set of typical objects in the visible range demonstrated a fairly high probability of object recognition in images acquired in the long-wave region of the IR range. This simplifies the preparation of image databases when developing real applications.
  4. Processing of video stream data received from an airborne carrier exposed to oscillations and vibrations shows a significant variation in object recognition probability estimates between adjacent frames. Therefore, the operating algorithm of a real application should jointly process a series of consecutive frames with accumulation of probability estimates.
  5. The developed recognition system prototype, comprising a receiver unit based on domestic hardware components and the open YOLOv3 convolutional neural network algorithm, may serve as a basis for solving applied problems, including the development of coordinators able to autonomously detect target objects.

References

1. Devaguptapu Ch., Akolekar N., Sharma M.M., Balasubramanian V.N. Borrow from Anywhere: Pseudo Multimodal Object Detection in Thermal Imagery [Electronic resource] // IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshop (CVPRW). 2019. 10 p. URL: (date of request: 01.10.2020). DOI: 10.1109/CVPRW.2019.00135

2. Kristo M., Ivasic-Kos M., Pobar M. Thermal Object Detection in Difficult Weather Conditions Using YOLO // IEEE Access. 2020. Vol. 8. P. 125459–125476. DOI: 10.1109/ACCESS.2020.3007481

3. Ivašić-Kos M., Kristo M., Pobar M. Human detection in thermal imaging using YOLO // Conference Paper. April, 2019. 5 p. URL: (date of request: 01.10.2020). DOI: 10.1145/3323933.3324076

4. Mingalev A.V., Belov A.V., Gabdullin I.M., Agafonova R.R., Shusharin S.N. Test-object recognition in thermal images // Computer Optics. 2019. Vol. 43, No. 3. P. 402–411. DOI: 10.18287/2412-6179-2019-43-3-402-411

5. Fomicheva O.A., Strelyaev S.I. Methods of IR image recognition // Izvestiya of Tula State University. Technical Sciences. 2018. Issue 11. P. 207–213.

6. Astrohn Optical-Mechanical Design Bureau: website. Lytkarino, Moscow Region, 2020. URL: (date of request: 01.09.2020).

7. Redmon J., Farhadi А. YOLOv3: An Incremental Improvement. 8 April, 2018. arXiv:1804.02767v1 [cs.CV]. URL: (date of request: 01.09.2020).

8. Yakushenkov Yu.G., Tarasov V.V. Staring-type infrared systems. Moscow: Logos, 2004. 444 p.

9. COCO – Common Objects in Context: website. 2015. URL: (date of request: 01.09.2020).

10. Survival Strategies for the Robot Rebellion: website. 2015. URL: (date of request: 01.09.2020).

About the Authors

A. I. Maltsev
JSC Research and Production Enterprise “Impulse”
Russian Federation

Maltsev Andrey Ivanovich – Cand. Sci. (Engineering), Senior Researcher, Deputy Chief Designer.

Moscow, Russian Federation

D. G. Otkupman
JSC Research and Production Enterprise “Impulse”; Moscow State University of Geodesy and Cartography (MIIGAiK)
Russian Federation

Otkupman Dmitriy Grigoryevich – Senior Research Engineer; Lecturer, Department of Optical Electronic Devices. Research interests: synthesis of optical systems, thermal imaging, computer vision, photonics, laser technology.

Moscow, Russian Federation

V. K. Ostashenkova
JSC Research and Production Enterprise “Impulse”
Russian Federation

Ostashenkova Viktorya Konstantinovna – Research Engineer. Research interests: infrared technology, artificial intelligence systems.

Moscow, Russian Federation

M. V. Ostanin
JSC Research and Production Enterprise “Impulse”
Russian Federation

Ostanin Mikhail Vasilievich – Departmental Head. Research interests: infrared technology, optoelectronic systems, laser rangefinders.

Moscow, Russian Federation




This work is licensed under a Creative Commons Attribution 4.0 License.

ISSN 2542-0542 (Print)