Journal of «Almaz – Antey» Air and Space Defence Corporation


Automation of the processes of failure analysis, reliability assessment and product completion efficiency


The article proposes a mathematical apparatus for assessing the reliability of small-scale production. Methods for determining the efficiency of product completion works were tested. A software application was developed to automate statistical accounting and analysis of product failures.

For citation:

Afanasyev V.B., Vorobyov T.K., Mamaev V.A., Medvedev V.M., Tikhmenev N.V. Automation of the processes of failure analysis, reliability assessment and product completion efficiency. Journal of «Almaz – Antey» Air and Space Defence Corporation. 2021;(1):76-84.

The efficiency of quality management systems at defence industry enterprises is closely tied to the growing complexity of the information flows that accompany their activities, which makes it necessary to coordinate the procedures for processing, transferring and analysing arrays of heterogeneous information. In this connection, organizing the operation of information systems (IS) [1] is of particular importance, including the automatic processing of statistical data in various modes, real-time operation among them.

The idea behind statistical methods of product quality control is that the general characteristics of a tested batch of products are assessed from the characteristics of a small sample taken from that batch. This idea was first put forward in 1846 by Academician M. V. Ostrogradsky [2]. Statistical methods of product quality control are now widely used in many industries, but they have a number of shortcomings. Figure 1 shows the lower confidence limits obtained when processing experimental data, depending on the number of tests and the proportion of defective products, for a binomial distribution at a confidence coefficient of γ = 0.8. The figure suggests that the lower confidence limit drops sharply when the number of tests is less than 100, which does not allow high reliability values to be confirmed even at low failure rates or in the absence of failures. At the same time, a large number of tests cannot be carried out for financial reasons when a single product costs tens of millions of roubles. For the same reason, it is difficult to confirm quantitative values of product reliability experimentally at the research and development stage and during manufacturing start-up.
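The effect described for Figure 1 can be illustrated with the failure-free case. As a sketch, assuming a one-sided Clopper – Pearson lower limit (the exact convention behind Figure 1 is not stated in the text), with k = 0 failures in n tests the lower limit, written here as P_low, satisfies:

```latex
P_{\mathrm{low}}^{\,n} = 1 - \gamma
\quad\Longrightarrow\quad
P_{\mathrm{low}} = (1 - \gamma)^{1/n}

% For \gamma = 0.8:
%   n = 10:   P_low = 0.2^{1/10}  \approx 0.851
%   n = 50:   P_low = 0.2^{1/50}  \approx 0.968
%   n = 100:  P_low = 0.2^{1/100} \approx 0.984
```

Under this assumption, confirming a PFFO of 0.99 requires n ≥ ln 0.2 / ln 0.99 ≈ 160.1, i.e. at least 161 failure-free tests.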

The enterprises manufacturing navigation systems in the Russian Federation are currently undergoing a major overhaul: new equipment is being purchased and mastered, and modern technologies are being tested on it. At the same time, modern inertial sensors are being developed and put into mass production. In other words, the development of inertial sensors and the mastering of the underlying technology on the new equipment are carried out simultaneously. In general, the number of failures is reduced through long-term improvement of the technology, with fundamental science-intensive upgrades of critical technologies and “adjustment” of the design elements of manufactured sensors to fit the established process operations. Accordingly, products of different years of manufacture have different reliability indicators.

This article presents the results of developing and testing software and mathematical tools for assessing the reliability and the efficiency of modification of cost-intensive small-scale products and their components.

According to [3][4], JSC GosNIIP has developed and tested a mathematical model that makes it possible to evaluate the quality and reliability indicators of manufactured products and purchased component parts, as well as the efficiency of product completions aimed at eliminating the root causes of failures. Since the terms of reference and specifications for products do not set requirements for experimental confirmation of maintainability and repairability, the general task of reliability assessment is narrowed down here to the assessment of failure-free operation of products. Reliability is assessed by the computational and experimental method (CEM) [5], approved by the scientific and methodological department of the Customer, by converting the product operating time accumulated over the life cycle (LC) to the equivalent operating time under the conditions of intended use.

The initial data for CEM application are the following indicators: average operating time of the product (t), number of manufactured products (N), number of failures (defects) recorded and documented over the analysed period (k).

The total operating time T = t∙N accumulated during manufacturing, testing and operation is reduced, using the equivalence coefficient Kэ, to the operating time under operational conditions Tэ = T/Kэ. It is then divided into cycles, each lasting as long as the product operating time under the conditions of intended use Tp, for which the probability of failure-free operation (PFFO) is estimated. The number of cycles n follows from the relation n = Tэ/Tp.
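The reduction to equivalent cycles can be sketched in Python as follows. This is a minimal illustration: the function name and all input values are hypothetical, and rounding n down to a whole number of cycles is an assumption, since the text does not state a rounding rule.

```python
import math


def equivalent_cycles(t_avg, n_products, k_equiv, t_cycle):
    """Return (T, T_e, n) for the CEM input data.

    t_avg      -- average operating time of one product, t (hours)
    n_products -- number of manufactured products, N
    k_equiv    -- equivalence coefficient, K_e
    t_cycle    -- operating time under the conditions of intended use, T_p (hours)
    """
    total = t_avg * n_products        # T = t * N
    equivalent = total / k_equiv      # T_e = T / K_e
    # n = T_e / T_p, rounded down to whole cycles (assumption, see lead-in)
    cycles = math.floor(equivalent / t_cycle)
    return total, equivalent, cycles


# Hypothetical input values, chosen only to show the arithmetic.
total, equivalent, cycles = equivalent_cycles(
    t_avg=500.0, n_products=82, k_equiv=2.0, t_cycle=10.0
)
print(total, equivalent, cycles)
```

The resulting n, together with the number of recorded failures k, is what the binomial model uses as the number of trials.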

Since the number of cycles (total operating time) attributable to one product is considerably lower than the specified service life and the total storage time does not exceed the specified service life, the wear and ageing of products may be neglected, which means we can proceed from the condition of uniformity and independence of tests. In this case, the use of binomial distribution for a random k value [6] is allowed as well as the definition of the lower (Pн) and upper (Pв) limits of the confidence interval for the probability of failure-free operation by means of solving the Clopper – Pearson equations (1) and (2):

where γ – confidence coefficient.
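Since equations (1) and (2) are not reproduced in this text, the following Python sketch shows one common way to obtain Clopper – Pearson limits numerically. It assumes that k failures were observed in n cycles and, by default, that the confidence coefficient γ is split equally between the two tails; the authors' exact convention is not stated, so a one-sided variant is available through the hypothetical `two_sided` flag. All function names are illustrative.

```python
import math


def binom_cdf(k, n, q):
    """P(X <= k) for X ~ Binomial(n, q), summed in log space for stability."""
    if q <= 0.0:
        return 1.0
    if q >= 1.0:
        return 1.0 if k >= n else 0.0
    total = 0.0
    for i in range(k + 1):
        log_term = (math.lgamma(n + 1) - math.lgamma(i + 1)
                    - math.lgamma(n - i + 1)
                    + i * math.log(q) + (n - i) * math.log1p(-q))
        total += math.exp(log_term)
    return min(total, 1.0)


def _bisect_decreasing(f, iterations=100):
    """Root of a function that decreases from f(0) > 0 to f(1) < 0."""
    lo, hi = 0.0, 1.0
    for _ in range(iterations):
        mid = (lo + hi) / 2.0
        if f(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def pffo_limits(n, k, gamma=0.8, two_sided=True):
    """Lower and upper Clopper-Pearson limits for the PFFO.

    n -- number of cycles, k -- number of failures, gamma -- confidence coefficient.
    The tail convention (two_sided) is an assumption, not taken from the source.
    """
    alpha = (1.0 - gamma) / 2.0 if two_sided else 1.0 - gamma
    # Upper limit of the failure probability q gives the lower PFFO limit.
    q_upper = 1.0 if k >= n else _bisect_decreasing(
        lambda q: binom_cdf(k, n, q) - alpha)
    # Lower limit of q gives the upper PFFO limit; it is 0 when no failures occurred.
    q_lower = 0.0 if k == 0 else _bisect_decreasing(
        lambda q: binom_cdf(k - 1, n, q) - (1.0 - alpha))
    return 1.0 - q_upper, 1.0 - q_lower


p_low, p_up = pffo_limits(n=99, k=2, gamma=0.8)
print(f"PFFO interval: [{p_low:.4f}, {p_up:.4f}]")
```

Bisection on the binomial sum is used here only to keep the sketch dependency-free; a library routine for the beta distribution quantile would give the same limits.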

Two methods can be used to assess the reliability of manufactured (final) products (Figure 2). The first method accounts for the operating time of the final product (FP) only (burn-in, pre-delivery and acceptance tests, functional tests and checks in the headquarters). The second method treats the FP as a combination of component parts (CP) and accounts for their operating time from the moment of manufacture or from passing the functional test at JSC GosNIIP. The second method has two variants: either the minimum operating time among all CP is taken for the reliability assessment, or the FP reliability is calculated as a series connection of CP, each with its own operating time. The confidence intervals of PFFO at the confidence coefficient γ = 0.8, obtained by these three approaches for three product modifications, are given in Table 1.
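For the series-connection variant, the standard point-value relation is shown below as a sketch. It assumes that the CP fail independently; how the authors combine the confidence limits of individual CP is not described here, and the symbols P_FP, P_CP,i and M are introduced only for illustration.

```latex
P_{\mathrm{FP}} = \prod_{i=1}^{M} P_{\mathrm{CP},i}

% Example with hypothetical values: M = 3 component parts,
% each with P_{CP,i} = 0.99, gives P_FP = 0.99^3 = 0.970299.
```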

Table 1

Confidence intervals of reliability indicators

Fig. 2. Operating time definition diagram: FP – final product, CP – component part, OO – operating organization

In addition to FP and CP reliability assessment, regulatory documentation (e.g. GOST RV 20.39.302-98) sets the tasks of failure analysis, of determining whether the design documentation needs to be corrected, and of verifying the efficiency of the reliability enhancement action plan.

To estimate the efficiency (significance) of product completion, it is suggested to compare two product samples. Since the product checks are independent, the binomial distribution can be applied here as well. The task of comparing two samples is solved, e.g., in [2]. In that solution, the difference between the experimental frequencies ∆ is compared with a value ε corresponding to a sufficiently large probability α. However, the algorithm relies on simplifying assumptions adopted because of the limited computing technology of the mid-20th century, and these distort the estimate when the number of tests is small. Therefore, for automated calculations it is suggested to use a more rigorous approach: testing the identity of the binomial distribution parameter for two samples by considering the hypothesis of decreasing product defectiveness, as discussed in [6].

Let us estimate inertial sensor No. 1 completion efficiency as an example. It is established that 82 products were manufactured in 2013 with 4 failures over the first 2 years of operation. In 2017, 99 products were manufactured with 2 failures, respectively.

Let us solve the problem following the first method. Let us find the difference ∆ between experimental frequencies:

At a confidence coefficient α = 0.95, the tabulated value z0.95 = 1.96.

Since ∆ < ε, there is no reason to believe that the new design is better than the old one. The discrepancy between the results can be explained by random fluctuations.
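The first method can be sketched in Python with the data for inertial sensor No. 1. The formula for ε is an assumption: the usual normal-approximation threshold built on the pooled failure frequency is used here, since the expression from [2] is not reproduced in this text. The function name is illustrative.

```python
import math


def compare_frequencies(m1, n1, m2, n2, z=1.96):
    """Return (delta, eps) for two samples with m_i failures in n_i products.

    delta -- difference between the experimental failure frequencies
    eps   -- threshold z * sqrt(p*(1 - p*)(1/n1 + 1/n2)), where p* is the
             pooled frequency (assumed form, see lead-in)
    """
    delta = m1 / n1 - m2 / n2
    pooled = (m1 + m2) / (n1 + n2)
    eps = z * math.sqrt(pooled * (1.0 - pooled) * (1.0 / n1 + 1.0 / n2))
    return delta, eps


# 2013: 82 products, 4 failures; 2017: 99 products, 2 failures.
delta, eps = compare_frequencies(4, 82, 2, 99)
print(f"delta = {delta:.4f}, eps = {eps:.4f}, significant = {delta > eps}")
```

With these inputs ∆ ≈ 0.029 and ε ≈ 0.052, which agrees with the conclusion that ∆ < ε.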

Let us solve the problem following the second method. Let us denote the defectiveness in the first and second groups by p1 and p2. Then the random number of defects mi found among ni products is distributed according to the binomial law:
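In the notation of this paragraph, the binomial law has the standard form below (i = 1, 2); it is given here as the textbook expression rather than as a copy of the authors' typeset equation.

```latex
P(m_i \mid n_i, p_i) = \binom{n_i}{m_i}\, p_i^{\,m_i}\, (1 - p_i)^{\,n_i - m_i},
\qquad i = 1, 2
```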

Let us put forward the following hypotheses.

Hypothesis H0: Product completion has no effect on reducing the number of defects.
Hypothesis H1: Product completion has effect on reducing the number of defects.

The distribution densities for the defectiveness factor based on the data obtained are:

The probability that the reduction in the number of defects is a consequence of the performed product completion equals:

This value is relatively small. By comparison, the probability that the 2013–2019 product completion reduced the defects of inertial sensor No. 2 is 0.999981, which indicates that the performed activities were efficient. Consequently, neither sample comparison method allows us to assert unambiguously that the product completion led to a statistically significant improvement, even though the defectiveness of sensor No. 1 decreased more than two-fold. As this example demonstrates, comparing point values of the defectiveness factor is not indicative for small samples. Therefore, it is suggested to additionally use the visual method of grouping failures by production characteristics, which makes it possible to analyse the effectiveness of individual activities aimed at eliminating the root causes of failures.
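The second comparison method can be approximated by the following Python sketch. It rests on assumptions that are not confirmed by the text: uniform prior distributions for p1 and p2, so that each defectiveness factor has a Beta(m + 1, n − m + 1) density, and the probability of improvement taken as P(p2 < p1). The result therefore need not coincide with the value obtained by the authors following [6]. Function names are illustrative.

```python
import math


def beta_pdf_grid(a, b, steps):
    """Beta(a, b) density evaluated at the midpoints of a uniform grid on (0, 1)."""
    h = 1.0 / steps
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    grid = ((j + 0.5) * h for j in range(steps))
    return [math.exp(log_norm + (a - 1) * math.log(x) + (b - 1) * math.log1p(-x))
            for x in grid]


def prob_defectiveness_reduced(m1, n1, m2, n2, steps=20000):
    """P(p2 < p1 | data) under uniform priors (assumed model, see lead-in).

    Computed as the integral of f1(p) * F2(p) by the midpoint rule.
    """
    h = 1.0 / steps
    f1 = beta_pdf_grid(m1 + 1, n1 - m1 + 1, steps)
    f2 = beta_pdf_grid(m2 + 1, n2 - m2 + 1, steps)
    prob = 0.0
    cdf2 = 0.0  # running value of F2 at the left edge of the current cell
    for d1, d2 in zip(f1, f2):
        prob += d1 * (cdf2 + 0.5 * d2 * h) * h
        cdf2 += d2 * h
    return prob


# Inertial sensor No. 1: 4 failures in 82 products (2013), 2 in 99 (2017).
p_improved = prob_defectiveness_reduced(4, 82, 2, 99)
print(f"P(p2 < p1) = {p_improved:.3f}")
```

A value well below the 0.999981 quoted for sensor No. 2 would be consistent with the statement that the improvement for sensor No. 1 cannot be asserted unambiguously.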

In this method, a list of failures is compiled and the examination reports (ER), which the manufacturer provides to the consumer together with the repaired device, are analysed. These documents record the symptoms of failures, their causes and the actions taken. The failures from the list are then divided into groups by their production characteristics according to the ER. Failure grouping is a rather complicated, non-trivial task, which is initially performed by an expert committee comprising the most experienced developers and reliability department employees. It is worth noting that most suppliers identify and clearly state the root cause of failures and, when failures recur, develop a specific action plan to improve quality and increase reliability. Nevertheless, some suppliers take a flawed approach to the investigation and try to obscure the cause of failure. They declare the failure to be operational or unconfirmed, which significantly hinders the breakdown into production groups.

Next, the dates of product manufacture, the dates of failures and the dates of the actions taken to eliminate the root causes of failures are marked on the time axis. This makes it possible to evaluate the efficiency of the performed activities visually. Figure 3 shows the most typical cases revealed by the analysis.

Group 1: the root cause of failures was eliminated, the main part of the defective products was identified through inspections and was not put into operation.
Group 2: the root cause of failures was eliminated but the failures occur during storage, therefore, defective products are likely to be revealed in operation during inter-regulatory inspections.
Group 3: the activities were implemented, their efficiency must be further evaluated based on operational results.
Group 4: the implemented activities proved inefficient.

Let us consider the method using inertial sensor No. 2 as an example. The following production groups were identified for the sensor: 1 – failure of the built-in processor; 2 – failure of adhesive seams between the internal parts; 3 – displacement of the sensor element; 4 – failure of welding contacts; 5 – other or isolated failures; 6 – unconfirmed failures (Fig. 4). No activities were undertaken for groups 5 and 6. The graph shows that manufacturing of sensors with production failures due to reasons in groups 1, 4 and 6 ended in 2015–2016, i.e. the executed activities proved to be efficient. Sensors with production failures due to reasons in groups 2, 3 and 5 continued to be manufactured until 2018. If necessary, a production group can be broken down into subgroups for additional analysis, provided that sufficient statistical data are available. For example, group No. 4 can be further divided into failures related to insufficient weld contact area and those related to welding machine contamination.
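The grouping step lends itself to a simple automated check alongside the visual one. The Python sketch below counts, for each production group, the failed products that were manufactured after the corrective action for that group. The records, action years and function name are hypothetical and serve only to illustrate the idea; they are not the data behind Fig. 4.

```python
from collections import defaultdict

# Hypothetical failure records: (production group, year of manufacture
# of the failed product).
failures = [
    (1, 2014), (1, 2015),
    (2, 2015), (2, 2018),
    (3, 2016), (3, 2018),
    (4, 2014), (4, 2016),
]

# Hypothetical years in which corrective actions were introduced per group.
actions = {1: 2015, 2: 2016, 3: 2016, 4: 2016}


def failures_after_action(failures, actions):
    """Count failed products manufactured after the action year, per group."""
    counts = defaultdict(int)
    for group, year in failures:
        if group in actions and year > actions[group]:
            counts[group] += 1
    return {group: counts.get(group, 0) for group in actions}


result = failures_after_action(failures, actions)
print(result)
```

A zero count suggests that the action may have been efficient, while a non-zero count flags the group for further analysis; with small samples such counts support, rather than replace, the expert assessment.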

The ultimate goal of processing information flows is to provide the decision maker with relevant data. The standards of JSC “Almaz – Antey” Air and Space Defence Corporation (Corporation) establish the following organizational levels of management (control): 0 – integrated structure; I – organization of the integrated structure; II – product; III – stages of product life cycle; IV – production and process systems [7]. An IS is developed to support each level according to the specifics of information exchange at that level.

For instance, the automated information system of claims registration and quality analysis for defence products, established at “Almaz – Antey” Air and Space Defence Corporation and its enterprises, is intended to provide information support for the zero level of management. It provides assessment, analysis, justification and action plan implementation in the area of product quality and reliability, coordination and management of activities in the area of product reliability, production and process systems of manufacturing enterprises. The system processes and analyses the information received from subsidiaries, namely, quality certificates, quality management system reports and data on claims processing.

The system used by PJSC Tambov Elektropribor factory operates at the third and fourth levels of management. It has been developed primarily by process engineers and is intended for collecting statistics on the reliability of production and fabrication processes.

The IS used by JSC GosNIIP covers the first and the second levels of management. The prominent feature of the system is its practical orientation for it was introduced in order to automate the existing tasks in the Reliability Department.

Failure analysis automation is implemented by integrating software and mathematical tools developed in Python with the Reliability database (DB) (Registration Certificate No. 2018620285), which is managed by the non-relational database management system (DBMS) MongoDB. This DBMS allows data requirements to be added in the course of the project, which made it possible [8] to extend the IS functionality without re-engineering and to form a scorecard system for the standards implemented by the Corporation.

The developed DB is built on a classical three-tier model. This model extends the two-tier (client/server) model by introducing an additional intermediate layer between the client and the server. The architecture of the three-tier model is shown in Figure 5.

Fig. 5. Three-tier database functional diagram

The first level is the client, which is responsible for the logic of presenting data to the user and for the logic of data management by the end user. At this stage, the client is implemented as a web interface. The intermediate level of the three-tier system contains one or more application servers. In the present model, this level is represented by a Python application running on the Waitress WSGI server. The third level, the DB server, is responsible for correctly reading data from and writing data to MongoDB.
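The intermediate level can be illustrated with a minimal WSGI application in Python. This is a sketch under stated assumptions, not the authors' application: the `/failures` route, the record fields, the host and the port are hypothetical, and an in-memory list stands in for the MongoDB collection so that the example runs without a database.

```python
import json

# In-memory stand-in for the MongoDB collection (hypothetical records).
FAILURES = [
    {"product": "sensor-1", "year": 2013, "group": 2},
    {"product": "sensor-1", "year": 2017, "group": 5},
]


def app(environ, start_response):
    """Minimal WSGI application that returns failure records as JSON."""
    if environ.get("PATH_INFO") == "/failures":
        body = json.dumps(FAILURES).encode("utf-8")
        status = "200 OK"
    else:
        body = b'{"error": "not found"}'
        status = "404 Not Found"
    headers = [
        ("Content-Type", "application/json"),
        ("Content-Length", str(len(body))),
    ]
    start_response(status, headers)
    return [body]


def run_server():
    """Serve the application with Waitress (third-party package).

    Host and port are hypothetical; call this function explicitly to start.
    """
    from waitress import serve
    serve(app, host="127.0.0.1", port=8080)
```

Because the application is a plain WSGI callable, the web client and the database layer can change independently of it, which is the point of the three-tier separation.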

Within the database, the information is stored in the form of arrays, each of them representing a group of documents. The database is structured by calendar dates of events, product types, customers (consumers), CP supplier enterprises, LC stages, types of impact on products, and nature of defects (failures), which makes it possible to automate information processing and the computation of product reliability. The mechanisms of DB functioning are described in more detail in [9][10].
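A document structured along the attributes listed above might look as follows. This Python sketch is illustrative only: every field name and value is a hypothetical assumption, the date is stored as a string for simplicity, and the collection name in the comment is invented. The aggregation pipeline uses the standard MongoDB `$match` and `$group` stages, and a pure-Python equivalent is included so that the example runs without a database server.

```python
from collections import Counter

# Hypothetical failure document; all field names are illustrative assumptions.
failure_doc = {
    "event_date": "2018-03-14",
    "product_type": "inertial sensor No. 2",
    "customer": "customer A",
    "cp_supplier": "supplier B",
    "lc_stage": "operation",
    "impact_type": "vibration",
    "defect_nature": "failure of welding contacts",
}

# MongoDB aggregation pipeline: count failures per product type
# for one LC stage.
pipeline = [
    {"$match": {"lc_stage": "operation"}},
    {"$group": {"_id": "$product_type", "failures": {"$sum": 1}}},
]
# With pymongo this could be run as db["failures"].aggregate(pipeline),
# where the collection name "failures" is hypothetical.


def count_failures(docs, lc_stage):
    """Pure-Python equivalent of the pipeline, usable without a server."""
    return Counter(d["product_type"] for d in docs if d["lc_stage"] == lc_stage)


counts = count_failures([failure_doc], "operation")
print(counts)
```

The per-product failure count obtained this way corresponds to the number of failures k used as input to the reliability assessment.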

Thus, implementing a set of software and mathematical tools aimed at automating the process of failure analysis has produced the following results.

1. Mathematical models have been developed to assess reliability indicators of manufactured and purchased products, including the case of limited (small) samples, which makes it possible to enhance quality and reliability control of FP and CP at the production and operation stages of the LC.
2. Visual and computational methods for the assessment of failure root cause elimination activities and product completion efficiency have been suggested and implemented.
3. Software enabling the automation of product reliability assessment has been developed and implemented.

The results of the study can be used at industrial enterprises automating the processes of reliability assessment and product failure investigation.


1. Gromov Yu. A., Minin Yu. V., Kopylov S. A. Formulation and solution algorithm for the problem of determining the parameters of an information system structure under uncertainty // Pribory i sistemy. Upravlenie, kontrol, diagnostika. 2020. Issue 4. P. 32–39. (In Russ.)

2. Shor Ya. B. Statistical methods for the analysis and control of quality and reliability. Moscow: Sovetskoe radio, 1962. 552 p. (In Russ.)

3. GOST 27.410-87. Reliability in engineering. Methods for the control of reliability indicators and plans of reliability control tests. (In Russ.)

4. Venttsel E. S. Probability theory. Moscow: Nauka, 1969. 576 p. (In Russ.)

5. Computational and experimental method for the assessment and control of reliability indicators. General methodology. Moscow: JSC “GosNIIP”, 2019. P. 21. (In Russ.)

6. Sukhoruchenkov B. I. Small sample analysis. Applied statistical methods. Moscow: Vuzovskaya kniga, 2010. 384 p. (In Russ.)

7. ST IS KONTSERN VKO 02.1–101–2019. Integrated quality and reliability management system for defence products of the integrated structure of JSC “Almaz – Antey” Air and Space Defence Corporation. General provisions. (In Russ.)

8. Afanasyev V. B., Medvedev V. M., Ostapenko S. N. et al. Implementation of the assessment of quality and reliability indicators for products of a defence industry enterprise // Izvestiya Rossiiskoi Akademii Raketnykh i Artilleriiskikh Nauk. 2020. No. 3. P. 18–24. (In Russ.)

9. Afanasyev V. B., Medvedev V. M., Ostapenko S. N. et al. Product quality management at defence industry enterprises using innovative technologies // Izvestiya TulGU. 2019. No. 12. P. 3–10. (In Russ.)

10. Afanasyev V. B. Design features of an information support system for product quality at a defence enterprise // Izvestiya TulGU. Tekhnicheskie nauki. 2020. No. 5. P. 255–269. (In Russ.)

About the Authors

V. B. Afanasyev
V.P. Efremov Scientific and Educational Centre for Aerospace Defense “Almaz – Antey”; JSC “GosNIIP”
Russian Federation

Afanasyev Viktor Borisovich – Post-graduate Researcher; Head of the Reliability Department. Research interests: reliability and quality of HT and ST samples, quality management system and reliability of mechanical engineering enterprises.

Moscow, Russian Federation

T. K. Vorobyov
Russian Federation

Vorobyov Timur Konstantinovich – Leading Engineer-Mathematician, Reliability Department. Research interests: probability theory, mathematical statistics, electro-radio product reliability.

Moscow, Russian Federation

V. A. Mamaev
JSC “GosNIIP”; Moscow Aviation Institute (MAI)
Russian Federation

Mamaev Vladimir Alekseevich – Senior Engineer-Mathematician, Department of Reliability; Master’s student. Research interests: automation of reliability assessments.

Moscow, Russian Federation

V. M. Medvedev
Russian Federation

Medvedev Vladimir Mikhailovich – Dr. Sci. (Engineering), Professor, General Director. Research interests: technical diagnostics, organizational and technical issues of product operation management.

Moscow, Russian Federation

N. V. Tikhmenev
Russian Federation

Tikhmenev Nikolay Vadimovich – Cand. Sci. (Phys.-Math.), Senior Researcher, Reliability Department. Research interests: laser gyroscopy, vacuum and optical inertial sensor technologies, technological assurance of reliability.

Moscow, Russian Federation



This work is licensed under a Creative Commons Attribution 4.0 License.

ISSN 2542-0542 (Print)