Journal of «Almaz – Antey» Air and Space Defence Corporation


Problems of evaluating development testing quality of missile prototypes in full-scale experiments at the development testing stage and ways of solving these problems


The paper analyses whether the volume of testing of new-generation missiles is sufficient for a given development timescale. We consider the specifics of development testing for the new generation of missiles and cite labour input estimates for debugging the software used in contemporary surface-to-air missiles. We present an approach to estimating the quality of the missile design and development process from a combination of indices. The problem is pressing for a number of leading developers, and the approach is also of benefit to the customer.

For citation:

Doronin V.V. Problems of evaluating development testing quality of missile prototypes in full-scale experiments at the development testing stage and ways of solving these problems. Journal of «Almaz – Antey» Air and Space Defence Corporation. 2018;(2):35-52.


A specific feature of the present stage of development testing of sophisticated military equipment is close supervision of all test processes by the ordering structures of the Ministry of Defence (hereinafter, the “customer”), which often combine most functions of planning, financial support, control, and evaluation of results.

To properly understand the twists and turns of development testing of new-generation engineering products, draw conclusions about the actual state of affairs, and take corrective action where necessary, the customer would need technical staff qualified at least to the level of a design engineer for each of the projects under supervision. For a number of obvious reasons, this is infeasible.

An alternative to an in-depth analysis of the design and testing processes is to give the customer a toolset based on relatively simple criteria and indices for evaluating the quality of the work performed, one that could be used without special knowledge of every design domain.

Applying elementary approaches to this end that ignore the design specifics is inadmissible, because such approaches contain major inconsistencies that may undermine the very idea of an unbiased evaluation of the work quality and the results obtained. When relying on incorrect conclusions, the customer substantially increases the risk of wrong managerial decisions, which in turn may lead to schedule overruns because of ill-founded interruptions of work and violations of the agreed plans.

An example of a simplified evaluation of work results at the development testing stage of surface-to-air missile products is assessing the result of each launch by the “hit-or-miss” criterion. This evaluation procedure is the one most commonly used by the customer. Based on it, an inference is drawn about the “success” or “failure” not only of a particular full-scale work but of the development as a whole. However appealing and illustrative this approach may be for the customer, it is unacceptable for the developer. It will be shown below at what test stage and under which conditions this simplified approach can be used without compromising common sense. In most cases, evaluating the test results requires a different solution.

Discussions with engineering managers of the leading design teams working in this field allow the author to conclude that these problems concern many of them.

A number of projects implemented at JSC “MKB “Fakel” have made it possible to create a methodology for evaluating the development testing quality of missile prototypes in full-scale experiments at the development testing stage, one that takes into account a number of objective factors to obtain valid assessments of the current state of a particular development.

Development testing is understood as the aggregate of the processes of designing, manufacturing, and flight testing missile prototypes and of introducing changes into their design and software so as to improve their functional capabilities in line with the customer’s requirements.

The methodology has been tried out at the development testing stage of new-generation products and has received positive assessments from a number of organisations participating in the commissions that evaluated the state of developments carried out by JSC “MKB “Fakel”.

At the same time, this approach meets stubborn rejection on the part of the MoD ordering structures, which favour the simplified “hit-or-miss” methodology for evaluating results.

To gain insight into this problem, it is worth highlighting certain specific features of testing advanced weapons and combat equipment.

Specific features of the development flight testing stage of new-generation missile engineering products

During development tests of missile engineering products, a key task is checking the serviceability (correct operation) of all onboard equipment modules, the integrated software, and the units responsible for accomplishing the flight tasks.

Correct operation of the onboard equipment responsible for the final leg of the flight [1] is the most difficult to check, since it depends on all other onboard units and equipment having performed their respective tasks properly and on time before the final leg begins.

In terms of the amount of equipment and computational resources employed, contemporary surface-to-air missiles considerably exceed missiles of previous generations. Developers of surface-to-air missiles increasingly use the term “digital missile”. In many respects, this term reflects the configuration of modern missile onboard equipment, in which virtually every module includes its own computer running embedded program code. As a rule, the entire onboard equipment of a contemporary surface-to-air missile is linked by a common computational network and becomes fully engaged only seconds before task execution. Within this short time, the integrated computational facilities handle the maximum workload.

At each intermediate stage of onboard equipment operation, situations may arise that cause deviations from the desired course of events. Based on personal experience and data available from international publications, we can point out that, once the initial phase of development testing of the equipment complex has been passed, the most frequent preconditions for unwanted situations on board a digital missile are created by the computational algorithms implemented in special-purpose software. Developers and manufacturers of onboard equipment do not always have a full picture of the possible interactions between onboard systems. A sufficient understanding of the operation and interaction of all missile subsystems can be obtained in the course of flight experiments at the closing phase of the tests.

To debug these algorithms on the ground, special process equipment is designed: test benches for simulation modelling, hardware modelling, and half-scale modelling. Verifying this equipment complex and the mathematical models requires a number of full-scale experiments with detailed recording of a large number of measured parameters.

Full-scale experiments are performed under actual environmental conditions, which do not always match the planned ones. For this reason, and because of the sequence of operations needed to construct and debug mathematical models that reproduce the obtained results, verification of the models and process facilities, both for individual onboard equipment modules and for the missile as a whole, takes a fairly long time.

The development testing stage duration also depends on:

  • equipment complexity;
  • range of its application conditions;
  • size of digital code in the executable algorithms;
  • developer team qualifications;
  • experience in similar system designs;
  • availability level of laboratory and testing resources, and many other important factors.

At the full-scale development testing stage, all sorts of problems that prevent obtaining the final result with verification of the entire set of algorithms inevitably occur.

Therefore, at the initial and subsequent stages of development testing of a new-generation missile product, the development manager is often faced with the problem of how to communicate the actual state of the project to the customer after some unforeseen situation has occurred.

For a design engineer, unbiased evaluation of results is not a critical problem, since development testing follows a controlled work sequence in which the design and algorithms are continuously corrected based on the results of each piece of work while approaching the specified requirements. The importance of this evaluation increases in interaction with the customer, when the results are interpreted solely according to how the representatives of the customer’s structures understand the situation. The developer’s opinion and arguments may be ignored outright.

Developers of new engineering products are well familiar with a paradoxical situation: an unforeseen problem occurring during the tests that is not associated with the quality of experiment preparation or with personnel errors is nevertheless a useful result, even though formally the tasks have not been accomplished in full. Experts know that a failure will reveal a cause that had not been accounted for because nothing was known about it. In preparation for the next work, this cause will be investigated and measures will be taken to preclude or counter its effect. From this perspective, a negative result is always a step towards improving a given parameter at the next stage.

There are known cases in practice where the same negative result was obtained in several flight tests in a row. A casual observer may get the impression that the work is being done in vain, merely accumulating statistics. However, with proper job management there is always progress from work to work: new details are revealed, new hypotheses are tested, unsupported ones are discarded, new technical solutions are mastered, and additional checks and tests are performed.

As mentioned above, the customer believes that the goal of full-scale works at the development testing stage is not verification of particular technical solutions, algorithms, and processes aimed at improving the design, but rather achievement of an integral result, i.e. complete fulfilment of the specified requirements by all subsystems irrespective of the development stage and the tasks the designer faces. Any unforeseen situation at the test site will in most cases be interpreted by the supervising structures as a failure not only of the tests at hand but of the development as a whole. Several problematic results within a short period lead to a predictable reaction: “up-the-line” reports, work suspension, appointment of commissions of inquiry, investigations, “dressing-down” sessions, and other methods of administrative enforcement of the development process. After these unscheduled interruptions, work is resumed anyway, additional measures are taken, new deadlines are set, etc. The main causes of time losses are the customer’s incorrect evaluation of work results at the development testing stage and, as a consequence, frequent intrusion into the development testing process. It should be pointed out that in any case the financial liability for most of the results lies with the developer.

The above discrepancy between the developer’s and the customer’s criteria for evaluating results arises because the ratio between the total number of required tests, which depends on the degree of novelty of the missile equipment and design, and the number of full-scale works specified in the contract is not objectively accounted for. Because the number of flight tests is planned incorrectly, the number of full-scale works performed can be several times lower than needed to carry out all the checks. Despite the use of models and half-scale test benches, if the number of full-scale tests is insufficient, latent problems may manifest themselves at the final test phases, including the customer acceptance stage.

Thus there is an objective discrepancy between the number of flight tests stipulated in the contract and the amount of onboard equipment, units, and algorithms, an end-to-end check of which would require a number of launches often an order of magnitude higher than the planned scope.

This discrepancy can only be overcome by combining a large number of checks in a single launch, in which case, at the initial stages of testing, the probability of accomplishing all the planned tasks may fall short of the desired one.

In view of the above, deviations from the anticipated ideal result are inevitable at the initial stage of development flight testing. In the course of work, an averaged assessment (integral index) of missile functional quality should grow from launch to launch. It is the growth of this integral index that indicates whether the path followed in developing a new-generation product is correct. The dynamics of this growth may reflect the difficulty of the work, the developer’s qualification, the adequacy of the planned work scope, etc.

A key requirement for successful development testing of surface-to-air missiles (SAM) is the availability of reliable, high-accuracy data: ground-based measurements of the missile trajectory, video recordings of the missile’s encounter with the target made from several points to determine the efficiency of the combat equipment, and telemetry from onboard the missile. With such information at hand, a detailed picture of the operation of all onboard equipment, units, and the missile as a whole can be obtained. In this case, even given the complexity of the product and the large amount of equipment, the tests will be productive for the developer.

Estimating complexity of missile products and its relationship with the necessary testing scope

To estimate the necessary number of flight experiments for missile products under development, the following factors should be considered: the number of onboard equipment (OBE) functional modules to be tested, whether they contain software, and the level of hardware novelty and special-purpose software maturity (novelty). The probability of achieving the end result in a single full-scale SAM work can then be estimated as

$$P_1 = \prod_{i=1}^{k} p_i^{\text{апп}}\, p_i^{\text{ПО}}, \qquad (1)$$

where P1 – probability of achieving the end result in a single full-scale SAM work;

k – number of functional modules (units) in the SAM onboard equipment (OBE);

$p_i^{\text{апп}}$ – probability of task execution by the hardware part of the i-th OBE module (unit);

$p_i^{\text{ПО}}$ – probability of task execution by the software part of the i-th OBE module (the superscripts “апп” and “ПО” stand for hardware and software, respectively).

Expression (1) makes it possible to estimate how the end result depends on the maturity of the OBE, its software (SW), and units. In the ideal case, P1 ≈ 1. In practical calculations, however, the result differs substantially from the ideal one. Even in serial production of proven equipment, the reliability of components and of the equipment as a whole never allows a value of exactly one to be reached.
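Expression (1) is a plain product over modules. The minimal Python sketch below evaluates it, assuming the per-module probabilities are already known; the function name `p1_end_result` and the illustrative values are hypothetical. It shows that even highly reliable, fully debugged modules leave P1 below one.

```python
from math import prod


def p1_end_result(p_hw: list[float], p_sw: list[float]) -> float:
    """Expression (1): product over all k OBE modules of p_i^hw * p_i^sw
    (p_i^апп and p_i^ПО in the text)."""
    if len(p_hw) != len(p_sw):
        raise ValueError("need one hardware and one software probability per module")
    for p in (*p_hw, *p_sw):
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"probability out of range: {p}")
    return prod(h * s for h, s in zip(p_hw, p_sw))


# Hypothetical: 10 proven modules, hardware reliability 0.999 each, no SW risk.
print(f"{p1_end_result([0.999] * 10, [1.0] * 10):.4f}")  # 0.9900
```

With ten modules at 0.999 each, P1 is already about 0.990, which is why unity is never reached in practice.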

At the initial stage of flight tests, most OBE modules have fairly low functional readiness. The greater the share of new equipment, the higher the risk that not everything has been foreseen and that the test results will differ from the expected ones. With sophisticated SW, whose entire range of operating conditions often cannot be checked for objective reasons, the value of P1 may be very low.

Example 1. Let there be 10 functional modules and units on board the SAM. Suppose that half of them are borrowed from other, proven products (the overall novelty level of the missile development is 50 %). Assume that, with reliability taken into account, the probability of task accomplishment by the borrowed modules and units is $p_i^{\text{апп}} p_i^{\text{ПО}} = 0.99$, and that the remaining modules and units have $p_i^{\text{апп}} p_i^{\text{ПО}} = 0.8$. It is then easy to obtain $P_1 = 0.99^5 \cdot 0.8^5 \approx 0.31$.

Thus, under these assumptions, the probability that in a single full-scale work all SAM modules and units operate without deviations and the desired end result is achieved (i.e. the task is accomplished in full) does not exceed 31 %.

Such a situation may be typical of the initial stages of development testing. As the design advances, with the experimental data taken into account, the operating conditions of the facilities elaborated, the special-purpose SW debugged, etc., this probability will grow from work to work. However, even if the task accomplishment probability for each of the functional modules reaches 0.99, the integral probability P1 will be only about 0.9 ($0.99^{10} \approx 0.904$).
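The arithmetic of Example 1 and of the mature case can be checked directly. This sketch takes the per-module products $p_i^{\text{апп}} p_i^{\text{ПО}}$ exactly as given in the text and assumes, for the mature case, that all 10 modules reach 0.99.

```python
from math import prod

# Example 1: per-module products p_i^апп * p_i^ПО taken from the text.
borrowed = [0.99] * 5   # modules borrowed from proven products
new = [0.8] * 5         # newly developed modules
p1_initial = prod(borrowed + new)
print(f"initial P1 = {p1_initial:.3f}")   # initial P1 = 0.312

# Mature stage: every one of the 10 modules reaches 0.99.
p1_mature = prod([0.99] * 10)
print(f"mature P1  = {p1_mature:.3f}")    # mature P1  = 0.904
```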

If the product is assumed to be an entirely new development with no borrowed modules and units, the initial value of P1 might not even reach 0.1.

Using equation (1) to estimate P1 is not very convenient, because the values of $p_i^{\text{апп}}$ and $p_i^{\text{ПО}}$ are difficult to obtain analytically. With the method of expert assessments, however, obtaining them becomes simpler.

Let us assume that each of the values $p_i^{\text{апп}}$ and $p_i^{\text{ПО}}$ lies within the range

$$0.5 \le p_i^{\text{апп}},\ p_i^{\text{ПО}} \le 1. \qquad (2)$$

This assumption rests on the requirement that, during ground development testing of modules and units, a level of readiness must be reached at which, in any given flight experiment, the share of positive outcomes of task execution by a functional module is no less than the share of negative outcomes.

To assess the attainable outcomes of full-scale works at different testing stages, taking (2) into account, we introduce an expert scale:

$$p_i^J = 0.8, \qquad (3.1)$$

if the novelty degree of the i-th module or unit (J = “апп”) or of the software in the i-th module or unit (J = “ПО”) is high;

$$p_i^J = 0.9, \qquad (3.2)$$

if the novelty degree of the i-th module or unit (J = “апп”) or of the software in the i-th module or unit (J = “ПО”) is medium;

$$p_i^J = 0.95, \qquad (3.3)$$

if the novelty degree of the i-th module or unit (J = “апп”) or of the software in the i-th module or unit (J = “ПО”) is low;

$$p_i^J = 1, \qquad (3.4)$$

if the novelty degree of the i-th module or unit (J = “апп”) or of the software in the i-th module or unit (J = “ПО”) is absent.

For the assessments, it is also convenient to assume that if the i-th module or unit contains no SW at all, then $p_i^{\text{ПО}} = 1$.
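A sketch of how the expert scale (3.1)–(3.4) might be applied: each module is described by the novelty degree of its hardware and of its software, the scale maps these to $p_i^J$, a module without SW gets $p_i^{\text{ПО}} = 1$, and the product gives P1. The function name `p1_from_novelty`, the string labels, and the four-module configuration are illustrative assumptions, not part of the methodology.

```python
from typing import Optional

# Expert scale (3.1)-(3.4): novelty degree -> p_i^J
EXPERT_SCALE = {"high": 0.8, "medium": 0.9, "low": 0.95, "none": 1.0}


def p1_from_novelty(modules: list[tuple[str, Optional[str]]]) -> float:
    """Each module is (hardware novelty, software novelty); software novelty
    is None when the module carries no SW, in which case p_i^ПО = 1."""
    p1 = 1.0
    for hw_novelty, sw_novelty in modules:
        p_hw = EXPERT_SCALE[hw_novelty]
        p_sw = 1.0 if sw_novelty is None else EXPERT_SCALE[sw_novelty]
        p1 *= p_hw * p_sw
    return p1


# Hypothetical four-module configuration
config = [("none", "low"), ("medium", None), ("high", "high"), ("low", "medium")]
print(f"{p1_from_novelty(config):.3f}")  # 0.468
```

Keeping the scale in one table makes it easy to re-run the estimate as module ratings are revised during testing.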

To make the proposed approach more useful in practice, it is worth making coefficients (3.1)–(3.4) time-dependent:

$$p_i^J = p_i^J(t). \qquad (4)$$

This dependence means that during development testing, as t → T0 (given successful completion of the tests by time T0), the coefficients $p_i^J(t)$ tend to the upper limit of inequality (2). Hence, with successful completion of the design work and tests of a new missile prototype, all parameters of equipment functioning quality approach their maximum, and the resulting probability P1 reaches the specified values.

Let us estimate the number N of “point-wise” full-scale works that have to be carried out to reach the set goal under the assumptions of Example 1. A test “point” is commonly understood as a fixed set of application conditions (predicted parameters of the missile’s encounter with the target, operating conditions of the support facilities, ambient factors, target parameters and flight conditions, and the like).

With the probability of obtaining a desired end result being no less than 0.95, it is easy to obtain the expression

If the required variety of situations for the full-scale works is taken into account, including the number of kill zone “points” checked, target types, kinds and states of the underlying terrain (for onboard systems with homing guidance), and other similar conditions, the total required number of full-scale works may exceed several hundred.
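As a hedged illustration only (not necessarily the author's expression), one standard way to turn the 0.95 requirement into a count is to assume that the works at one test point are independent, each fully successful with probability P1, and to require at least one full success: $1 - (1 - P_1)^N \ge 0.95$, i.e. $N \ge \ln 0.05 / \ln(1 - P_1)$. Both this independence model and the count of 30 test points below are assumptions.

```python
import math


def works_needed(p1: float, confidence: float = 0.95) -> int:
    """Smallest N with 1 - (1 - p1)**N >= confidence, assuming independent
    works at one test point, each fully successful with probability p1."""
    if not 0.0 < p1 < 1.0:
        raise ValueError("p1 must lie strictly between 0 and 1")
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p1))


per_point = works_needed(0.31)      # P1 from Example 1
print(per_point)                    # 9
TEST_POINTS = 30                    # hypothetical number of distinct test points
print(per_point * TEST_POINTS)      # 270
```

Under these assumptions, a modest number of test points is enough to push the total into the hundreds.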

Example 2. Completely new engineering products contain no borrowed modules. Let there be 10 functional modules and units on board. Assuming for clarity that the equipment novelty level is high (3.1) for all modules and units, and that SW, also of high novelty, is present in only half of them, we obtain for the initial stage of flight tests:

$$P_1 = 0.8^{10} \cdot 0.8^{5} \approx 0.035.$$

This means that at the initial testing stage of sophisticated equipment with a high share of novel modules and units, obtaining the end result desired by the customer while checking each and every module and unit for proper operation is all but impracticable.
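To see why, the sketch below recomputes Example 2 ($0.8^{15}$) and then, under the hedged assumption of independent repeated works at a single test point (an illustrative model, not the author's), the number of works needed for a 0.95 chance of at least one fully successful work.

```python
import math

# Example 2: 10 new modules, hardware novelty "high" (0.8 each);
# SW in 5 of them, also high novelty (0.8); p_i^ПО = 1 for the other 5.
p_hw = [0.8] * 10
p_sw = [0.8] * 5 + [1.0] * 5
p1 = math.prod(h * s for h, s in zip(p_hw, p_sw))
print(f"P1 = {p1:.3f}")  # P1 = 0.035

# Hypothetical independence model: works needed at one test point for a
# 0.95 chance of at least one fully successful work.
n = math.ceil(math.log(0.05) / math.log(1.0 - p1))
print(n)  # 84
```

Roughly 84 works for a single test point, under this simplified model, illustrates how far such a programme is from a typical contract scope.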

It should also be mentioned that the result of Example 2 is rarely obtained in practice and can be regarded as an extreme case. In most situations, the equipment maturity is higher than in this example.

Also, the growth rate of parameter $p_i^{\text{апп}}$ from one full-scale work to another is quite high; therefore

where t – current time;

t0 – flight experiments starting time;

∆Ti – period of development testing to a required level of the i-th equipment item, which may take from one year to several years.

As concerns special-purpose SW, the situation is, as a rule, radically different from that of the hardware. The time needed for SW debugging considerably exceeds that for the hardware part, and, as a rule, it proves infeasible to reduce the duration of SW development testing. The specific features of SW debugging are considered in the next section.

Specific features of applying digital systems on board advanced missile engineering products

The demand for high-quality software in contemporary advanced missile engineering products is indisputable. Unfortunately, what “high-quality SW” actually is, and how much it costs, is not always properly understood, not only by the customer but also by some developer structures.

As noted in [2], any particular behaviour of a software system can be regarded as a certain path in a discrete space of states, and it is practically impossible to traverse all of those paths during testing. If a software product has a total of N independent 16-bit variables, the lower estimate of the total number G of its states is $G = 2^{16N}$. For quite a small program, N = 10, and the total number of states of such SW exceeds $1.46 \times 10^{48}$. A human lifetime would not suffice for complete verification of even such a program. In typical combat programs employed in onboard devices, the number of variables exceeds 100.
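The estimate $G = 2^{16N}$ can be evaluated exactly with Python's arbitrary-precision integers. The checking rate of $10^9$ states per second used below to convert G into years is a hypothetical figure chosen only to show the scale; `state_count` is an illustrative name.

```python
def state_count(n_vars: int, bits_per_var: int = 16) -> int:
    """Lower estimate G = 2**(bits_per_var * N) of the number of program states."""
    return 2 ** (bits_per_var * n_vars)


SECONDS_PER_YEAR = 365.25 * 24 * 3600
CHECK_RATE = 1e9  # hypothetical: states verified per second

g = state_count(10)
print(f"G = {g:.3e}")                                    # G = 1.462e+48
print(f"{g / CHECK_RATE / SECONDS_PER_YEAR:.1e} years")  # 4.6e+31 years
```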

Incorrect program behaviour on any one of the multitude of execution paths may be caused by an undetected error or an incorrect algorithm, which may lead to failures of the entire system.

The above contradiction between the impossibility of checking the whole multitude of states of a digital system with voluminous software and the need to complete debugging within a short time with a limited team of developers is usually resolved by indirect random testing technologies that take a large set of various factors into account.

Known techniques for assessing the quality of software products rely on a set of metrics characterising the current state of the product under development, the progress of the development process, the maturity level achieved by the developer organisation, and many other aspects [2]. Over 500 different measurable parameters (metrics) related in one way or another to software development are known.

Without claiming completeness, we can mention some of the metrics in use that may prove useful for analysing the dynamics and quality of special-purpose software debugging [2].

The basic group of so-called product metrics includes the initial requirements, variability of requirements, integrity and inconsistency of requirements, their completeness, system components, technologies applied, code size, code branching, code parallelism and complexity, code quality, post-release code defects, product innovativeness, consumer feedback, problems in understanding the essence of certain processes, etc.

The project metrics group includes the following items: effort (total labour input by project phase), productivity (measured in KLOC per person-day), project duration, level of automation in executable code development, project cost and its “not to exceed” limit, cost per line of code, error rate, number of developers, defect density, project team continuity, experience with the hardware and software platform, effectiveness and cost-efficiency of testing, and a number of other parameters.

The designations KLOC and KAELOC are used further in the paper (KLOC – thousands of lines of code; KAELOC – thousands of assembler-equivalent lines of code). KLOC is converted into KAELOC by a 2.5-fold increase for the C language and by an 11-fold increase for C++ [2].
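The conversion factors from [2] are simple multipliers; a hypothetical helper `to_kaeloc` makes the scale of the difference explicit for a code size of 100 KLOC (the size used in Example 3).

```python
# Multipliers from [2]: KAELOC = factor * KLOC
KAELOC_PER_KLOC = {"C": 2.5, "C++": 11.0}


def to_kaeloc(kloc: float, language: str) -> float:
    """Convert thousands of source lines into thousands of assembler-equivalent lines."""
    try:
        return kloc * KAELOC_PER_KLOC[language]
    except KeyError:
        raise ValueError(f"no conversion factor given for {language!r}") from None


print(to_kaeloc(100, "C"))    # 250.0
print(to_kaeloc(100, "C++"))  # 1100.0
```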

The process metrics group can include the developer’s maturity level, compliance with qualification requirements, team experience, and a number of others.

Paper [2], with reference to the original American source, summarises data on some metrics of the U.S. industry for the year 2000 (see table).

U.S. Industry Benchmarks for 2000

When program quality is determined by estimating the number of residual defects, the N Sigma (Nσ) level approach is often used: the lowest level (one sigma) allows about 700,000 defects (errors) per million lines of source code, and the highest program quality level (Six Sigma) allows on average just 3.4 errors per million lines of source code. At present, the Six Sigma level is accepted in the world software industry as the quality benchmark for reliable SW; however, very few achieve it.
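The defect figures quoted for the sigma levels follow from the normal distribution under the widely used convention of a 1.5σ long-term shift. The sketch below assumes that convention and treats each line of source code as one defect opportunity, as the text implicitly does; the function name `defects_per_million` is illustrative.

```python
import math


def defects_per_million(sigma_level: float, shift: float = 1.5) -> float:
    """One-sided normal tail beyond (sigma_level - shift), scaled to 10**6
    opportunities (here: lines of source code)."""
    z = sigma_level - shift
    tail = 0.5 * math.erfc(z / math.sqrt(2.0))  # P(X > z) for standard normal X
    return 1e6 * tail


for level in range(1, 7):
    print(f"{level} sigma: {defects_per_million(level):,.1f} defects per million lines")
```

This gives roughly 691,000 defects per million lines at one sigma and about 3.4 at Six Sigma, in line with the figures above.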

To conclude the analysis of methods for evaluating software product quality, it is worth citing the COCOMO model [2], which offers three formulas for calculating the most critical indices of software development:

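$$\text{Effort} = 4.6 \cdot \text{KLOC}^{1.2} \quad \text{(person-months)}; \qquad (7)$$

$$\text{Development\_time} = 2.5 \cdot \text{Effort}^{0.32} \quad \text{(months)}; \qquad (8)$$

$$\text{Staffing} = \text{Effort} / \text{Development\_time} \quad \text{(persons)}. \qquad (9)$$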

These relations hold for estimates of embedded SW development under the most stringent requirements.

The amount of embedded SW code lines in the onboard systems may reach 104...10and more, depending on the purpose and complexity of the tasks handled.

Example 3. For onboard equipment with $10^5$ code lines (100 KLOC), we obtain the following estimates:

  • total effort for special-purpose SW development – 1150 person-months;
  • development time – 24 months;
  • staff of qualified SW developers – 48 persons.
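Formulas (7)–(9) can be evaluated directly; the sketch below uses the coefficients quoted in the text (4.6, 1.2, 2.5, 0.32) and reproduces Example 3 for 100 KLOC. The function name `cocomo_estimate` is illustrative. Note that Boehm's basic COCOMO uses an effort coefficient of 3.6 for the embedded mode; the 4.6 here is taken from the text as given.

```python
def cocomo_estimate(kloc: float) -> tuple[float, float, float]:
    """Formulas (7)-(9) with the coefficients quoted in the text."""
    effort = 4.6 * kloc ** 1.2          # (7) person-months
    months = 2.5 * effort ** 0.32       # (8) development time, months
    staff = effort / months             # (9) average staffing, persons
    return effort, months, staff


effort, months, staff = cocomo_estimate(100.0)  # 10**5 lines = 100 KLOC
print(f"effort ~ {effort:.0f} person-months")   # effort ~ 1155 person-months
print(f"time   ~ {months:.0f} months")          # time   ~ 24 months
print(f"staff  ~ {staff:.0f} persons")          # staff  ~ 48 persons
```

Up to rounding, these values agree with the estimates listed in Example 3.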

Obviously, with fewer professional developers, the project execution time will increase proportionally.

The number of SW defects that may remain undetected upon development completion will be of the order of 80...100. Those defects may emerge much later, at the stage of operation, becoming preconditions for situations that will lead to non-fulfilment of the task as a whole.

Continuing the analysis of SW debugging timescales, it is worth turning to international experience again.

Publication [3] of the Institute for Defense Analyses (USA) presents the results of a comprehensive analysis of how the volume of program code in embedded computing systems affects development time, based on data from a number of projects, including NASA projects. These results differ somewhat from those of Example 3; however, they cover a wider range of projects of different orientation in both the military and space branches of the American economy. The differences can be explained by the fact that the first case [2] estimates the labour input for debugging finished program code, whereas the labour inputs in [3] refer to debugging new SW, when the software itself undergoes changes: new algorithm branches are added, newly discovered links (in the form of new code revealed during testing) are taken into account, functioning processes are elaborated, etc.

According to this American research, the mean time of development and debugging of SW for sophisticated military computer-aided systems is remarkably stable, amounting on average to 5–8 years. Attempts to accelerate this process lead to a dramatic rise in costs without achieving the desired result. A similar increase in project costs also occurs when the work schedule is overrun relative to the optimal timescale.

Paper [3] gives an analytical dependence of the mean time t (in months) required for development and testing of sophisticated military computer-aided systems on the complex index of work difficulty S:

where S = 0 means that “difficulty” is absent and the work implies modernisation of a proven product, and S = 1 means maximum “difficulty”, with everything, including the hardware and software components, to be created from scratch.

The approximation is constructed from a multitude of results; according to the authors’ data, the dispersion of the development time estimates amounts to 0.7165. Expression (10) raises questions about the correctness of the dependence, with the accuracy of its coefficients apparently overstated, but it is quite suitable for making estimates.

It can be estimated that for a difficulty level of 0.6 on the [0, 1] scale, the SW debugging cycle cannot be shorter than 4 years. For a level of 0.9 (high novelty of the SW and equipment), the optimal duration of the SW development testing cycle, from creation until completion of tests, will be 7–8 years.

Fig. 1 shows a plot of relationship (10) between debugging timescale t(S) in years and difficulty index S.

Fig. 1. Function of time t (years) of military-purpose computer-aided systems development vs. complex index of difficulty (S)

Publication [3] also gives the dependence of the duration of the development and deployment cycle of a sophisticated computer-aided system on the number of program code lines, for the standards of American developers of weapons and combat equipment (WCE). With about 1 million code lines, the cycle lasts 5 years, and with 10 million code lines it increases to 12–13 years. This relationship is shown in Fig. 2.

Fig. 2. Function of minimum time Y (years) for finalising development of military computer-aided systems vs. number of code lines S, for the level of U.S. design organisations

Notably, even though the U.S. Department of Defense has undertaken numerous research studies into these trends, so far there is only a statement of fact: the above development periods have remained unchanged over the last 15–20 years in spite of the rapid development of SW design and testing technologies.

A comparison of the estimated optimal development times for sophisticated computer-aided systems, their implementation costs, and the deadlines set by the customer reveals a discrepancy between them.

Fig. 3 shows the dynamics of development testing of products with different novelty levels of equipment and SW.

Fig. 3. Development level of new products during development testing vs. development time and product complexity

Three curves are given for comparison:

  • curve I shows the development testing dynamics of a product with minimal changes to a proven design and algorithms;
  • curve II shows the development testing dynamics of a product whose design has a medium novelty level;
  • curve III shows the development testing dynamics of a product whose design and SW have a high novelty level.

These curves illustrate the meaning of the above estimates of SW debugging times for sophisticated computer-aided systems.

Development level E can be measured by an integral index of prototype quality that accounts for the aggregate of modules and units and is calculated as described in the section “Methodology for evaluating development testing quality of missile products at the design testing stage”.

Depending on the contractual deadline and the novelty of the development project, development level E may fail to reach the required level Eдоп by the set deadline T0. In this case, the following inequality holds:

$$E(T_0) < E_{\text{доп}} \tag{11}$$

Inequality (11) can only be removed by shifting T0 to the right, i.e. by extending the deadline. Such situations arise because the deadline T0 for a contract to develop a new engineering product was set based on the customer’s needs, without due consideration of the objective time required to develop a sophisticated computer-aided system. In fact, shifting of the deadline T0 to the right is typical of many projects, both abroad and in the national military-industrial complex (MIC).

It should be acknowledged that the complexity of a prototype under development is by no means the only reason for extending contract deadlines.

Another way to solve the problem is to keep the result tied to the set deadline T0 while admitting a residual shortfall ∆ > 0 in the development level, i.e. E(T0) = Eдоп − ∆.

This condition reflects a certain residual uncertainty in the SW development level at the completion of the main development and testing stage T0. Given an unbiased evaluation of the development testing level achieved for the hardware, units, and SW of a complex product, the residual risks due to incomplete development can easily be estimated. Obviously, the bulk of the tests should have been completed by T0: a contract cannot be deemed fulfilled without certainty that the task will be properly accomplished within the basic range of application conditions. At the same time, the lack of full-scale checks covering the entire range of application conditions implies certain risks (∆ > 0): in some cases, problems may occur that were not encountered during the full-scale works.

Thus, the pressing problem is an integral evaluation of the development testing quality of missile engineering products, based on aggregate operation data from all subsystems, which makes it possible, in particular, to determine the residual risks.

Methodology for evaluating development testing quality of missile products at the design testing stage

The data given above show that, when testing a new engineering product with a high share of digital equipment running large volumes of program code, it is necessary either to follow the objective labour-input data for development testing of such a product and go through the entire time cycle before completing the development tests, or to find a way of obtaining an achievable result with residual risks within the set deadlines, with development testing completed after the product has been accepted by the customer and put into service.

As mentioned above, the basic indicator the customer uses to evaluate how well a missile product performs its functions is the launch success index UP, which records whether the end goal was achieved, i.e. whether the target was hit or the desired flight trajectory was maintained.

For a designer, this index carries little information. At the initial stage of full-scale works, the probability of an end result meeting the UP criterion is, on average, low; yet after each full-scale experiment the results are analysed, the algorithms are corrected, SW errors and inaccuracies are revealed, and corrective action is taken for the next full-scale work. In other words, the operation quality of a missile product improves over a sequence of full-scale works. This is the normal process of product development.

Let us estimate the mathematical expectation of the UP index over the development testing period, from the initial to the final phase. Using expressions (1)–(3), we distinguish three stages, within each of which probability P1 is relatively stable:

P1I – probability of achieving the end result at stage I (initial stage) of the tests;

P1II – probability of achieving end result at stage II (main stage) of the development tests;

P1III – probability of achieving end result at stage III (final stage) of the tests.

Then the mathematical expectation of the UP index is:

$$M[U_P] = \frac{1}{I}\sum_{i=1}^{I}\chi_i \tag{12}$$

where χi – indicator of positive outcome of the i-th test (equal to unity if the end goal is achieved and equal to zero if the end goal is not achieved, regardless of the cause);

I – total number of tests over the entire period of R&D work execution.

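Let us transform expression (12) by grouping the sum into three addends:

$$M[U_P] = \frac{1}{I}\left(\sum_{i=1}^{\Delta_{I}}\chi_i + \sum_{i=\Delta_{I}+1}^{\Delta_{I}+\Delta_{II}}\chi_i + \sum_{i=\Delta_{I}+\Delta_{II}+1}^{I}\chi_i\right) \approx \frac{\Delta_{I}P_1^{I} + \Delta_{II}P_1^{II} + \Delta_{III}P_1^{III}}{I} \tag{13}$$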

where ∆I – number of tests at the first (initial) stage;

∆II – number of tests at the main stage of the development tests;

∆III – number of tests at the closing stage.

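Assuming for simplicity that the test intensity is constant, so that the share of tests at each stage equals the share of time, expression (13) can conveniently be written as:

$$M[U_P] \approx \frac{\Delta T_{I}}{T_0}P_1^{I} + \frac{\Delta T_{II}}{T_0}P_1^{II} + \frac{\Delta T_{III}}{T_0}P_1^{III} \tag{14}$$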

where ∆TI – duration of the initial stage of the tests;

∆TII – duration of the main stage of the tests;

∆TIII – duration of the closing stage of the tests;

T0 – total development time according to the contract.

Considering that, as a rule,

$$\Delta T_{III} \ll \Delta T_{I} + \Delta T_{II}, \tag{15}$$

we can infer that the third addend of expression (14) contributes insignificantly to the resulting UP estimate. Yet it is P1III that is closest to the required missile product efficiency, since by stage III (final) the product has essentially completed development testing.

This implies that the UP index used by the customer, when averaged over the entire testing duration T0, substantially underrates the real state of prototype development testing. Hence, this index approaches the best estimate only if

$$\Delta T_{III} \ge \Delta T_{II} + \Delta T_{I} \tag{16}$$

With the time for completing design and prototype testing fixed, condition (16) will only be met at the operation stage, after the product has been put into service. Another way to apply the UP index adequately is to estimate the mathematical expectation of launch success at the third (final) stage of tests only, which, as can plainly be seen, corresponds to the probability P1 of fulfilling the launch task:

$$P_1^{III} \approx \frac{1}{I_3}\sum_{i=I-I_3+1}^{I}\chi_i \tag{17}$$

where I3 – number of full-scale works at the final testing stage.

In the accepted terminology, the final testing stage corresponds to the state tests stage.
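The underrating effect can be checked on a small made-up launch record. The Python sketch below uses hypothetical stage lengths and outcomes, not test data: it computes the whole-period average of the indicators χi as in expression (12), regroups it by stage as in expression (13), and compares it with the success frequency over the final stage alone.

```python
from fractions import Fraction

# Hypothetical launch record: 1 if the end goal was achieved (chi_i = 1),
# 0 otherwise. Stage lengths and outcomes are illustrative only.
stages = {
    "I (initial)": [0, 0, 1, 0, 0],
    "II (main)": [1, 0, 1, 1, 0, 1, 1, 0],
    "III (final)": [1, 1, 1],
}


def frequency(outcomes):
    """Share of launches in which the end goal was achieved."""
    return Fraction(sum(outcomes), len(outcomes))


record = [chi for outcomes in stages.values() for chi in outcomes]
total = len(record)  # I, the total number of tests

# Average of chi_i over the whole testing period, as in expression (12).
m_up = frequency(record)

# The same value regrouped as per-stage frequencies (estimates of P1)
# weighted by each stage's share of launches, as in expression (13).
m_up_grouped = sum(
    Fraction(len(outcomes), total) * frequency(outcomes)
    for outcomes in stages.values()
)

# Success frequency over the final stage only (I3 launches).
p1_final = frequency(stages["III (final)"])

print("M[UP], whole period:", m_up, "=", float(m_up))
print("M[UP], grouped by stage:", m_up_grouped)
print("P1 at the final stage:", p1_final)
```

In this invented record the whole-period average is 9/16 = 0.5625, even though every final-stage launch succeeded. Exact fractions are used only so that the grouped and ungrouped averages compare equal without rounding noise.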

A design engineer needs a method for evaluating the quality of product development testing throughout the entire development and testing of a new engineering product. Then, by tracking growth in the quality of product development (QPD) from launch to launch, one can judge whether the selected technical solutions and the methods for improving hardware, units, and SW are correct.

Thus, a methodology for estimating the QPD index will provide a much better tool for evaluating the success of the works performed than the UP index.

Let us consider the basic approaches to selection of indices and criteria for evaluating efficiency (outputs) of the studied processes.

Since the analysis of flight experiment results requires evaluating an aggregate of parameters and operation outputs of many modules (units) and SW, this is a multiple criteria problem. Owing to the specifics of SAM functioning in flight, each unit (module) of the onboard equipment contributes to the end result. It seems impossible to single out the most important of these units (equipment modules), since a fault in any one of them leads to non-fulfilment of the task as a whole. At the same time, the complexity and reliability of these units and modules clearly differ substantially.

To evaluate solution quality in multiple criteria problems, convolution of particular indices is known to be applied [4].

When generic criteria are used, one has to operate with values that usually lack a conceptual (physical) meaning.

The main question is how to account for the integral operation quality U of the aggregate of devices on board the missile during execution of a functional task.

A method convenient in practice is multiplicative convolution, in which the resulting process quality index is the product of the particular operation quality indices of the subsystems of a larger system.

Since a SAM contains an aggregate of units, electronic equipment modules, special computing complexes, and control algorithms for the basic units and equipment, the evaluation of their joint operation must provide an integral evaluation of the product’s functional properties. It would be wrong to conclude that the operation quality of the entire aggregate of devices and units is unsatisfactory merely because one of the units (equipment modules) failed in a given launch.
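As a generic illustration of multiplicative convolution, and not the author's success index U introduced later in this section, the Python sketch below multiplies particular quality indices in [0, 1]. The module scores are hypothetical. It also shows why purely binary (pass/fail) particular indices do not fit the requirement just stated: a single zero drives the product to zero, whereas graded indices let one imperfect module lower the result without erasing it.

```python
from math import prod


def multiplicative_convolution(indices):
    """Integral quality index as the product of particular indices in [0, 1]."""
    for u in indices:
        if not 0.0 <= u <= 1.0:
            raise ValueError(f"particular index {u} lies outside [0, 1]")
    return prod(indices)


# Hypothetical launch with five onboard modules; the fourth one misbehaved.
binary = [1, 1, 1, 0, 1]            # pass/fail scores
graded = [1.0, 1.0, 1.0, 0.8, 1.0]  # graded scores for the same launch

print(multiplicative_convolution(binary))  # 0
print(multiplicative_convolution(graded))  # 0.8
```

`math.prod` requires Python 3.8 or later; the function name `multiplicative_convolution` is illustrative only.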

In view of the complexity of new-generation products, the QPD index must always take into account the scope of tasks accomplished by each one of the units (equipment modules).

The selected index must also take into account the positive experience accumulated at previous work stages in development testing of the most crucial self-contained elements (equipment modules, units, SW) of the product.

It is also important to take into account that in some flight experiments the actual application conditions may differ from those planned. Strict adherence to the experiment conditions is the most critical requirement for development tests. If, however, conditions caused by external factors prevented the entire product or its individual devices from accomplishing their task, this must be accounted for when evaluating the quality of the current work. The principle of collective responsibility is absolutely unacceptable for integral evaluation.

It is equally wrong to assign the highest possible quality of task accomplishment to a device merely because, through no fault of its own, it could not be checked. It is right to assume that various outcomes were possible; however, if the module has a positive operation history, this fact cannot be disregarded.

Analysis of the above requirements for an index of experimental launch efficiency at the development testing stage leads to the following conclusions:

  • at the development testing stage, the discrete criterion UP (U = 1 if the target is hit and U = 0 if it is not) not only has no practical value, since it is not linked to the integral quality of the missile, but would also entail high resource expenditure (the data given above show that the required number of launches exceeds any reasonable amount);
  • probability indices are not appropriate either, because estimating the probability of particular events requires accumulating statistics, which again involves a large number of full-scale works. Each work intended for accumulating statistics would have to be performed under the same combination of conditions, which is practically unattainable;
  • at the development testing stage, the integral QPD index must be sensitive both to the final result of task execution and to the launch conditions, taking into account, among other things, cases where the set goal could not be achieved. Moreover, if the result can be reproduced from the obtained data on a verified mathematical model and carried through to the final phase (missile-target encounter), and the resulting target hit probability can be evaluated, then the launch can be regarded as a success, which must be reflected in the respective particular indices of device and unit operation quality.

Since no ready-made methodology existed for evaluating the operation output of the product’s onboard equipment and units under a rigid limit on the number of launches at the development testing stage, an approach was proposed to overcome this problem.

Based on the analysis of the requirements for the QPD index, an empirical expression was obtained for estimating the success index U of a test launch of a product of the k-th type that meets the requirements described above:

where Ik – number of units (equipment modules) of a SAM of the k-th type that are functionally self-contained and determinant in the sequence of operations for accomplishing the main task in the launch;

ni – number of scoring checks of the i-th module (unit) of the SAM in flight experiments in which it has been established with certainty that the module (unit) accomplished its task;

αi – indicator of a possible emergency or fault in the i-th module (unit): αi = 1 if the emergency (fault) is proved by objective telemetry data or ground-based trajectory measurement facilities, and αi = 0 otherwise;

βi – indicator of the quality of task accomplishment by the i-th module (unit): βi = 1 if accomplishment of the task is proved; βi = 0 if failure to accomplish it is proved; βi = 0.5 if the task could not be accomplished for reasons beyond control, so that module operation could not be checked in the given launch; in that case βi > 0.5 may be set if the module (unit) would very probably have accomplished the task, having done so more than once in previous launches.

Coefficients αi and βi are to some extent dependent; however, each reflects specific features of the information obtained from a flight experiment, so these indicators shall be assigned independently by expert evaluation based on the full-scale experiments concerned.

It is easy to see that the expression given has the following features.

Index U takes values from 0 to 1. The value 0 means total failure, i.e. zero quality of product development, when it is established with certainty that every one of the missile modules and units failed (which in principle cannot happen). The value 1 means it is established with certainty that the launch task was accomplished: with all modules and units properly performing their functions, the evident result is obtained and the target is hit under the existing experiment conditions. In this extreme case the estimate of U fully coincides with the UP index value.

Index U is sensitive to the testing prehistory: the greater the number ni of scoring checks of the operation quality of a module, the smaller the influence of a single emergency with that module (unit) on the integral index U. This is crucial for development testing, since a module may break down owing to a production defect, a reliability failure, or other factors that do not directly bear on the correctness of the adopted design solution. If, however, the i-th module has had few scoring tests over a certain testing period, its next emergency indicates precisely a design defect, requiring in-depth analysis and improvement of the module (unit) design or the respective SW. In this case the emergency makes a tangible contribution to the integral index.

Index U is sensitive to the root cause of failure: if the i-th module operated correctly in the analysed launch and this is established with certainty, its factor in the final product equals 1, regardless of how many scoring outcomes have been obtained for it so far. If the module (unit) could not accomplish its task for reasons beyond control, and it is hard to predict unambiguously how it would have performed had operation been possible, its contribution to the final product differs from unity and depends on the number of scoring results obtained previously. This is logical: if the number of scoring results is high, there is no reason to suspect that the module (unit) would have failed its task this time. It should also be noted that the role of subjective factors in making these estimates cannot be excluded. However, practice shows that estimates of U obtained by different experts differ little, and they converge as ni grows.

For some equipment modules (algorithm branches), operation simply cannot be checked in a given launch. This is typical when the conditions that trigger operation of such a module did not arise. In such situations it is reasonable to take αi = 0, βi = 1.

At the development testing stage, the relevant success criteria using index U are as follows:

U > 0.95 – successful launch;

U ≤ 0.7 – unsuccessful launch (experience suggests that at the closing stage of development testing this threshold should be raised to, say, U* = 0.85);

0.7 < U ≤ 0.95 – a launch successful on the whole, with success index U (at the closing stage of development testing, the stricter range 0.85 < U ≤ 0.95 should be used).

The unsuccess threshold of 0.7 is justified by the fact that, by the development flight testing stage, some of the equipment modules, units, and SW have already been development-tested. Lower values are therefore unacceptable; if they occur, the results of ground development testing and autonomous flight tests must be revised.
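The launch classification criteria above can be written as a small function. This is a Python sketch; the function name `classify_launch` is hypothetical, and the boundary value U = 0.95 is assumed here to count as "successful on the whole". The loop applies it to the four U values worked out in Example 4 below, under both the ordinary and the closing-stage thresholds.

```python
def classify_launch(u, closing_stage=False):
    """Classify a test launch by its integral success index U in [0, 1].

    U > 0.95 is a successful launch; U <= 0.7 (U <= 0.85 at the closing
    stage of development testing) is unsuccessful; values in between are
    successful on the whole. U = 0.95 is assumed to fall in the middle band.
    """
    if not 0.0 <= u <= 1.0:
        raise ValueError("U must lie in [0, 1]")
    fail_threshold = 0.85 if closing_stage else 0.7
    if u > 0.95:
        return "successful"
    if u <= fail_threshold:
        return "unsuccessful"
    return "successful on the whole"


# U values from Example 4 (conventional data), one per testing stage.
for u in (0.182, 0.749, 0.872, 0.977):
    print(u, classify_launch(u), "|", classify_launch(u, closing_stage=True))
```

Note that U = 0.749 counts as successful on the whole early in testing but would be unsuccessful at the closing stage, which is the point of raising the bar.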

To illustrate the methodology, consider Example 4.

Example 4. Let Ik = 15, and suppose the tests are at the initial stage, when the number of scoring launches is small. For clarity, we take the following parameter values for each of the 15 onboard equipment (OBE) modules and units:

Although index α is supposed to take the value 0 or 1, here it may be set to 0.1, meaning a non-zero probability of emergency operation of each equipment module. Index β = 0.5 means uncertainty about the in-flight operation quality of the modules (units), even though ground testing of these modules and units confirmed their operation quality by modelling (mathematical or hardware-in-the-loop).

Substituting these data into the basic formula gives U = 0.182, which indicates that the development level of the structural elements (modules, units) of the product is too low and far from the readiness required for the closing stage of development testing. Development testing must continue.

After development testing at the autonomous flight stage, the parameters may improve. Let us take the following values:

This means that the number of scoring checks of the equipment and units has risen, the failure rate of a number of modules and units has decreased considerably, and the quality of task accomplishment by a number of modules (units) has reached the required level.

Substituting these data into the basic formula gives U = 0.749, which indicates that the development level of the structural elements (modules, units, SW) of the product has improved substantially and is approaching the readiness required for the closing stage of development testing.

At the main stage II of the development testing process, the following data for individual equipment modules can be obtained:

This means that the number of scoring checks of most equipment modules and units has risen further, the failure rate of a number of modules and units has continued to decrease, and the quality of task accomplishment by most modules (units) has reached the required level. However, uncertainty remains for the modules involved in the final flight leg (missile-target encounter).

Substituting these data into the basic formula gives U = 0.872, which indicates that the development level of the structural elements (modules, units) of the product is close to the required level.

Finally, at the closing stage of the development testing process, the following combination of results by individual modules (units) can be obtained:

This means that the number of scoring checks of the equipment and units has risen still further, failures of modules and units are virtually ruled out, and the quality of task accomplishment by most modules (units) has reached the required level.

Substituting these data into the basic formula gives U = 0.977, which indicates that the development level of the structural elements (modules, units, SW) of the product meets the desired level. An index value of 0.977 means that the product is ready for handover to the customer.

The data in Example 4 are conventional and serve only to illustrate how the integral success index relates to the quality and failure rate of the tested modules and units of products.
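Because U is a product of one factor per module, the stage results of Example 4 can be read through the geometric mean of the Ik = 15 per-module factors, U^(1/Ik). At the initial stage all 15 modules share the same parameter values, so this is the common factor itself; at later stages the factors differ and the figure is only their average. The Python sketch below (the helper name `mean_module_factor` is hypothetical) computes it for each stage and for the 0.95 success threshold.

```python
N_MODULES = 15  # I_k in Example 4

# Integral success index U obtained at each stage of Example 4.
example_4 = [
    ("initial stage", 0.182),
    ("after autonomous flights", 0.749),
    ("main stage II", 0.872),
    ("closing stage", 0.977),
]


def mean_module_factor(u, n_modules):
    """Geometric mean of n per-module factors whose product equals u."""
    return u ** (1.0 / n_modules)


for stage, u in example_4:
    g = mean_module_factor(u, N_MODULES)
    print(f"{stage:>26}: U = {u:.3f}, mean per-module factor = {g:.3f}")

# Mean per-module factor needed for U to exceed the 0.95 success threshold.
needed = mean_module_factor(0.95, N_MODULES)
print(f"needed for U > 0.95: mean per-module factor > {needed:.4f}")
```

The mean factors come out at roughly 0.893, 0.981, 0.991 and 0.998, and exceeding U = 0.95 needs an average factor above about 0.9966. With 15 modules multiplied together, even a mean factor of 0.99 per module leaves U near 0.87, which is why the closing-stage criterion is demanding.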

The requirements for an integral index of launch success at the development testing stage, formulated above, are thus met. The proposed method of estimating this integral index is sensitive precisely to those properties of missile equipment and units that are actually evaluated at the design testing stage.

Hence, the analysis shows that the proposed approach is appropriate for evaluating the launch success of products at the design testing stage. No other methods of evaluating experimental results that meet the requirements formulated above have been found in the available sources.

The presented methodology has been tried out in development testing of several new-generation missile engineering products and has shown sufficient sensitivity both in evaluating the results of each launch and in tracking overall work progress. Fig. 4 shows a typical smoothed plot of the QPD index (18) vs. time over the testing stages.

Fig. 4. Typical QPD index vs. time for three testing stages

For comparison, Fig. 5 shows the discrete success estimates provided by the customer’s structures. The lines mark launches with UP = 1; all other results are zero. Notably, such estimates consider only the end result, regardless of its cause. Yet in tests of complex missile engineering products the causes can be many: problems with the ground component of the surface-to-air missile system, the layout of training targets, onboard equipment quality, assembly quality, the human factor during the tests, and so on.

A comparison of Figs. 4 and 5 shows that the QPD index gives a clear picture of testing progress, whereas the UP index gives none.

Fig. 5. Discrete estimates of the UP index calculated by the customer regardless of time and testing stages

Considering expressions and estimates (1)–(4) and inequality (11), it is evident that, given the tight limit on the time T0 allocated for the tests, the main goal of testing is not to obtain a particular end result within a limited time frame but to check all possible situations in which the algorithm branches operate. These checks cannot be performed in full-scale works owing to numerous limitations; the only tool for performing them is simulation modelling on verified mathematical models.

Thus, given the short time available for testing, the main goal of full-scale works should be, first and foremost, to obtain data for verifying the object models, followed by extensive computations to detect errors and inconsistencies in the executable program code.

The methods of verification on simulation models are given in [5].


The presented approaches to evaluating test results are intended primarily for customers of military products and R&D organisations of the Ministry of Defence, to help them interpret and account for the regularities of modern engineering product development when evaluating the work of MIC developers. Simplistic evaluation approaches that ignore the specific features and limitations of testing new-generation equipment harm not only the finances of these developers, through numerous unfair fines and deductions imposed on the pretext of schedule overruns, but also the very development of national military equipment and weaponry, which, as before, holds leading positions in the world.

Understanding the other party’s problems, establishing contact between customer and developer, and avoiding ill-founded interference in the development of new equipment: this is the way to enable MIC enterprises not only to stay afloat in any situation but also to maintain leadership in the world arms market.


1. Arkhangelskiy I.I., Afanasyev P.P., Bolotov E.G., et al. Design of surface-to-air guided missiles / ed. by I.S. Golubev, V.G. Svetlov. Moscow: MAI Publishing House, 2001. 732 p. (in Russian)

2. Baranov S.N., Telezhkin A.M. Metric support for software development // SPIIRAS Proceedings (Trudy SPIIRAN). 2014. Issue 36. Pp. 5–27. (in Russian)

3. Tate D. Acquisition Cycle Time: Defining the Problem // Acquisition Research Symposium. Monterey, CA: Naval Postgraduate School, 2016. Pp. 80–83.

4. Podinovski V.V. Introduction to the theory of criteria importance in multicriteria decision-making problems. Moscow: Fizmatlit, 2007. 64 p. (in Russian)

5. Code verification and run-time error detection through abstract interpretation. Solving modern problems of embedded software testing. URL: (accessed 30.07.2018).

About the Author

V. V. Doronin
Joint stock company “Academician P.D. Grushin Mechanical Engineering Design Bureau “Fakel”
Russian Federation




This work is licensed under a Creative Commons Attribution 4.0 License.

ISSN 2542-0542 (Print)