ITECH7409
Software Testing
Assignment 1
Individual
Research on Software Testing and Standards
Overview
According to Standards Australia:
“Standards are published documents setting out specifications and procedures
designed to ensure products, services and systems are safe, reliable and
consistently perform the way they were intended to. They establish a common
language which defines quality and safety criteria.”
There are several standards, international and national, that relate specifically to software
testing. Standards formalize industry best practice and are agreed upon by
professionals in the industry to which they apply.
This assignment is an investigation into those standards.
The purpose of the assignment is to help you to:
improve your research and comprehension skills
develop a good understanding of professional industry standards for software testing
appreciate the value of various processes and methods used in industry to test and evaluate
software systems
Timelines and Expectations
Marks and Percentage Value of Task: 100 marks (10%)
Due: Thursday, May 2, 2019 @ 16:00 (Week 7)
Minimum time expectation: 10 hours
Learning Outcomes Assessed
K1. Critically evaluate software requirements as inputs toward testing of the final solution;
K2. Analyse and critically evaluate appropriate tools and techniques to support the testing process;
K3. Develop a good understanding of selecting and applying measures and models used for testing that are compliant with the appropriate professional industry standards such as the ACS and IEEE;
S1. Analyse and critically evaluate software requirements and proposed solutions;
S2. Apply complex decision making to select appropriate testing techniques;
S3. Write professional level management, planning, quality assurance and testing documentation for a software system;
S4. Apply effective testing techniques to software systems of varying scales and test levels;
A1. Develop and maintain plans for scheduling quality assurance tasks including software testing.
Assessment Details
You will need to:
locate a research paper related to software testing that refers to at least one
standard.
research, comprehend and analyse each document (both the paper and the chosen
standard) to find relevant details to answer a set of questions, and
prepare a written summary report of findings
As a suggestion, commence your search for a research paper at the Federation University
website. There is a QuickSearch link on the library home page:
[Screenshot: the QuickSearch box on the library home page]
There is also a listing of Databases A-Z (http://libguides.federation.edu.au/az.php) which has a
link for Australian Standards Online.
Requirements:
Questions for the standard:
What is the standard name?
Who holds the copyright for the standard?
Amongst the acknowledged contributors to the document, which universities were involved
(if any)?
What is the scope or intent of the standard?
What are key terms and understandings needed for the standard to be understood and
applied?
In your own words, what does application of the standard result in? Or, in other words, what
does the standard do?
Finally, what specific relevance does the standard have to software testing?
Discuss the paper and how it relates to the standard. For example: does the paper suggest
how to improve the standard? Does the paper highlight issues in applying the standard?
Attached is a sample paper (Wichmann and Cox 1992) which refers to ANSI¹/IEEE² Standard 829³
and ANSI/IEEE Standard 1008. Although this paper is somewhat outdated, it
serves to illustrate the task for this assignment.
Prepare a report of no more than 1,500 words answering all questions. The report should have
the following structure.
an introduction to standards and a brief description of the research paper and chosen standard
responses to questions for the standard
listing and a discussion of commonalities and differences between the two documents
a conclusion summarizing the report findings
¹ American National Standards Institute
² Institute of Electrical and Electronics Engineers
³ Current versions of both these standards are available online at the FedUni library
Marking Criteria/Rubric
Student ID:
Student name:
Assessment component | Mark
1. Introduction | 20
2. Responses to questions | 20
3. Listing and discussion of commonalities and differences between the research paper and the chosen standard | 20
4. Conclusion | 20
5. Spelling, grammar and report presentation | 20
Total | 100
Final | /10
Comments
Submission
Your assignment should be completed according to the guides for your assessments:
https://federation.edu.au/library/student-resources/guides-to-your-assessments
You are required to provide documentation, contained in an appropriate file, which includes a front
page indicating:
the title of the assignment
the course ID and course name,
student name and student ID
a statement of what has been completed
acknowledgement of the names of all people (including other students and people
outside of the university) who have assisted you and details on what parts of the
assignment that they have assisted you with
a list of references used (APA style)
Using the link provided in Moodle, please upload your report as a Word file. Name your Word file in
the following manner:
Firstname_SURNAME_StudentID.docx
e.g. Aravind_ADIGA_30301234.docx
Also, upload a copy of the standard used by the research paper.
Feedback
Assessment marks will be made available in fdlMarks. Feedback to individual students will be
provided via Moodle or as direct feedback during your tutorial class.
Plagiarism:
Plagiarism is the presentation of the expressed thought or work of another person as though it is
one's own without properly acknowledging that person. You must not allow other students to copy
your work and must take care to safeguard against this happening. More information about the
plagiarism policy and procedure for the university can be found at:
http://federation.edu.au/students/learning-and-study/online-help-with/plagiarism
Your support material must be compiled from reliable sources such as the academic resources in the
Federation University library, which might include, but are not limited to, the main library collection,
library databases and the BONUS+ collection, as well as any reputable online resources (you should
confirm this with your tutor).
Federation University General Guide to Referencing:
The University has published a style guide to help students correctly reference and cite information
they use in assignments. A copy of the University’s citation guides can be found on the university’s
web site. It is imperative that students cite all sources of information. The General Guide to
Referencing can be purchased from the University bookshop or accessed online at:
https://federation.edu.au/library/guides/referencing
References
Wichmann, B. A. and M. G. Cox (1992). "Problems and strategies for software component testing standards."
Software Testing, Verification and Reliability 2(4): 167-185.
SOFTWARE TESTING, VERIFICATION AND RELIABILITY, VOL. 2, 167-185 (1992)
Problems and Strategies for Software
Component Testing Standards
B. A. WICHMANN AND M. G. COX
National Physical Laboratory, Teddington, Middlesex, TW11 0LW, U.K.
SUMMARY
What does it mean to say that an item of software has been tested? Unfortunately, currently accepted
standards are inadequate to give the confidence the user needs and the meaningful objective for the
supplier. This paper assesses the currently available standards, mainly in the component testing area
and advocates that the British Computer Society proto-standard should be taken as a basis for a
formal standard. The paper is intended for those concerned with software quality.
KEY WORDS Component testing Quality assurance Standards
1. PROBLEM STATEMENT
This paper considers the issue of software testing in the narrow sense, i.e. the execution
of the code of a system to attempt to find errors. The objective is to quantify or assess
the quality of the software under test, particularly in an objective manner which can
therefore be agreed by both a supplier and customer.
The reliance placed upon dynamic testing varies significantly from sector to sector. It
appears from current drafts that the revision of the avionics standard for safety-critical
software depends almost exclusively on testing (RCTA, 1993), while the U.K. Interim
Defence Standard (MOD, 1991) places much greater emphasis on static analysis.
Dynamic testing has been widely used within industry since the advent of the first
computers. However, the contribution that testing makes to software quality in practice
is hard to judge. Everybody producing software will claim that it is ‘tested’, and therefore
one must look beyond such superficial claims.
The classic work on testing is that of Myers (1979). In the author’s view, little progress
has been made in the practical state of the art since the publication of Myers’ book. For
more recent summaries of the art of testing (see Ince, 1991 and White, 1987). This paper
is an attempt to revisit the issues to see if some advance can be made; preferably so that
those who undertake testing can claim some measurable quality objective.
Nomenclature in this topic is not uniform and it is unfortunate that an Alvey document
covering this point has not been formally published (Alvey, 1985). However, this gap
will be filled by a BCS proto-standard (Graham, 1990), assuming this is published.
Although it is known that the absence of bugs can never be demonstrated by testing,
it is equally known that effective software testing is a good method of finding bugs.
© 1992 by John Wiley & Sons, Ltd.
Received 22 October 1992; revised 26 February 1993
Would any academic, even those who discount testing, be prepared to fly in an aircraft
in which the software had only been subjected to static analysis? On the other hand,
there are good statistical reasons for questioning figures for the reliability of ‘highly’
reliable software (see Littlewood and Strigini, 1993). Professor Littlewood claims that
dynamic testing alone can only justify a reliability figure commensurate with the amount
of testing undertaken. This implies that the reliability requirements of some applications
areas, such as the most critical avionics systems, must be justified in terms of the quality
of the development process.
The paper concentrates upon component testing since this is the most incisive testing
method, and the method which is best understood, again from Myers. Of course, other
testing methods have an important part to play at different points in the life-cycle.
The central thesis of testing is that if sufficient effort is put into the testing process,
then confidence can be gained in the software, although it cannot guarantee correctness.
The confidence, depends, of course, on new tests being undertaken without faults being
found.
Specialized aspects of testing are not considered in this paper. Examples are as follows.
(1) Testing the ‘look and feel’ of a system. This can involve the use of a mouse, a
windows environment, etc. This is being studied under an ESPRIT project (MUSiC,
1993).
(2) Testing performance. With complex modern architectures, an analysis of performance can be quite difficult.
(3) Testing real-time and concurrent systems. These require special techniques, such
as in-circuit emulators, which are too specialized to be considered here.
2. TOOLS AND TECHNIQUES
2.1. Research
Much of the research in software testing has focused on the coverage of various aspects
of software, such as control flow features or data flow features. For example, Hennell
et al. (1976) proposed three levels of component testing based on control flow:
(1) the execution of statements;
(2) the execution of branches (both ways);
(3) the execution of ‘linear code sequence and jumps’ (LCSAJs).
The degree of testedness of each level is the percentage of items at that level which
have been executed during the tests. This hierarchy of so-called ‘test effectiveness ratios’
(TERs) was later extended in the work of Woodward et al. (1980). Rapps and Weyuker
(1985) proposed various levels of component testing based on data flow features, specifically the interaction between definitions and uses of variables. A good critique of coverage
metrics and an analysis of the theory of testing is to be found in the work of White
(1987). In this context, the point of interest is that coverage metrics clearly provide
objective measures and can be applied on a routine basis in industry.
Considering control flow testing further, the purpose of the LCSAJ is to provide an
achievable test objective beyond that of branches. Since a program with ‘while’ loops will
have infinitely many paths, it is not useful to measure a percentage of paths covered.
However, there are a finite number of LCSAJs in a program and hence 100% coverage
is feasible (in simple cases, at least). This is the conventional metric for statement, branch
or path testing, which is the ratio of items executed to the total in the module. If 100%
statement coverage were to be a requirement, this would have to be taken into account
in design and coding, since it would exclude defensive programming methods. In practice,
the coverage that can be obtained depends critically upon the nature of the code and is
not easy to predict in advance.
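To make the coverage ratio concrete, here is a minimal illustrative sketch; it is not taken from the paper, and the function, figures and test values are invented. It shows a module with a defensive check that no specification-derived test ever executes, so 100% branch coverage is unattainable even though the tests themselves are sensible.

```python
# Hypothetical illustration (not from the paper): the conventional coverage
# ratio is the number of items executed divided by the total in the module.

def classify(score):
    # Defensive check: callers are supposed to guarantee 0 <= score <= 100,
    # so no test derived from the specification ever takes this branch.
    if score < 0 or score > 100:
        raise ValueError("score out of range")
    if score >= 50:
        return "pass"
    return "fail"

# Two specification-based tests.
assert classify(75) == "pass"
assert classify(40) == "fail"

# The two if-statements give four branch outcomes; the tests above execute
# three of them (the defensive branch is never taken), so branch coverage is 75%.
executed_branch_outcomes = 3
total_branch_outcomes = 4
print(f"branch coverage = {executed_branch_outcomes / total_branch_outcomes:.0%}")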
The company called Program Analyses provides the LDRA Testbed™ system which
allows the above TERs to be determined. Verilog's LOGISCOPE™ and Software
Research's TestWorks™ are able to provide similar testedness metrics. Since these tools
work in many environments, there is little impediment to the above measures, or similar,
being used in practice.
It appears that relatively few organizations require or quote testedness metrics. Those
that do are mainly in the safety-critical or security sectors where a major driving force
is certification by an independent body for which the testedness metrics have an obvious
appeal.
The main limitation to the wider use of testedness metrics seem to be that of the lack
of an obvious value to the purchaser. If company A purchases software from company
B, what value is it to require a specific level of testedness? This question is addressed
further in section 6.
2.2. Accredited Testing
At first sight, testedness metrics seem to provide an excellent basis for objective testing,
which is the main requirement for accredited testing, as administered by NAMAS (the
U.K. national service for accreditation of testing services).
Several years ago, NAMAS commissioned a study (NATLAS, 1985), performed by
Data Logic, to see if such a scheme would be viable. The study concluded that from a
theoretical viewpoint, both statement and branch coverage measures would be feasible.
The results of the study were presented by NAMAS to industry to see if it required
such accredited test services. The conclusion was negative for reasons which have never
been totally resolved. The following points indicate some aspects of concern.
(1) It is important to check that the output from each test is correct, which can be a
significant cost factor.
(2) Devising tests to execute all statements or branches can also be expensive.
(3) If less than 100% coverage is obtained, the user may think that the software has
poor quality. In fact, it merely constitutes inadequate evidence of high quality.
(4) If 100% coverage is obtained, the user may place undue reliance on the correctness
of the software.
This experience seems to indicate a substantial gap between academic research and its
application in industry. This topic is addressed further in section 6.
2.3. Guidance
The British Standard BS 5887 (BSI, 1988) is a guidance document, as are both (ANSI,
1983) and (ANSI, 1987). Other more general guidance material is available under the
TickIT scheme (DTI, 1992), but this does not cover the area in depth. Such guidance
material is useful for those undertaking testing but cannot be used for objective measurement or independent tests.
An interesting application of testing is in the assessment of an organization using the
SEI Maturity Model (Kitson and Humphrey, 1989). The overall aim here is to provide
a method of determining the maturity of the software engineering process used within an
organization on a five point scale. Whereas TickIT is restricted to the ISO 9000 concept
of quality management, the SEI model is specific to software engineering and therefore
could be expected to address testing in some detail. The initial questionnaire used to
assess the maturity level of an organization used two questions concerned with regression
testing. Although this is probably the most important single aspect from the perspective
of overall project management, it does not meet the needs of an objective measure of
testedness for software quality.
A revision of the SEI Model (Paulk et al., 1991) handles testing in greater depth, but
only at level 3 in the model. (In fact, the lower levels are confined to management issues.)
This implies that no requirements are given for the lower levels of maturity, within which
the majority of companies actually fall. The main issues of testing are addressed at level
3, but there is no clear indication as to how a company can be assessed. For instance,
the key questions are as follows.
‘The adequacy of testing is determined based upon:
(1) the level of testing performed;
(2) the test strategy selected; and
(3) the test coverage to be achieved.’
At this point, a classic dilemma arises: if the test coverage is high, so will be the costs,
but if it is low, software quality may be jeopardized.
3. CURRENT PRACTICE
3.1. General
It is not easy to assess current practice, since reports from suppliers naturally reflect
best practice. From questionnaire responses given by attendees at a testing conference,
Gelperin and Hetzel (1988) concluded that only 5% provide regular measurements of
code coverage and only 51% regularly save their tests for reuse after software changes.
Gelperin and Hetzel note that the results of this small survey are biased, but that ‘some
observers believe that general industry practice is much worse than the survey profile’.
The Institute of Software Engineering (in Northern Ireland) reports that only about
one quarter of companies use regression testing on a routine basis (Thompson, 1991).
Similar, rather less quantified experience has been reported to the author by the Centre
for Software Maintenance at Durham University (U.K.). They also state that since testing
is at the end of the life-cycle, there is a tendency for the amount of testing to be reduced
to allow the project to complete within budget.
The automation of regression testing is a substantial benefit in retaining the quality of
a software product over a long period. The problem is that setting up an appropriate
automatic test facility can be a significant investment. However, unless this automation
is undertaken, the temptation to cut corners by not performing testing on a ‘small’ change
is irresistible.
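As a hedged illustration of what such automation might look like (the module, the figures and the stored "golden" results below are all hypothetical), once the expected outputs have been recorded every change, however "small", can be re-tested at negligible cost.

```python
# Hypothetical sketch of an automated regression suite: current outputs are
# compared against previously recorded ("golden") results, so even a "small"
# change gets re-tested instead of being waved through.

def price_order(quantity, unit_price):
    """The software under maintenance: bulk orders get a 10% discount."""
    total = quantity * unit_price
    return round(total * 0.9, 2) if quantity >= 100 else round(total, 2)

# Saved test inputs and the results recorded when they were last known good.
GOLDEN = {(10, 2.5): 25.0, (100, 2.5): 225.0, (250, 1.99): 447.75}

def run_regression_suite():
    failures = []
    for (quantity, unit_price), expected in GOLDEN.items():
        actual = price_order(quantity, unit_price)
        if actual != expected:
            failures.append(((quantity, unit_price), expected, actual))
    return failures

if __name__ == "__main__":
    print(run_regression_suite() or "all regression tests passed")
```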
No information is available about other forms of testing, such as module and integration
testing. It is reasonable to assume that the situation is worse than the current poor position
for regression testing.
3.2. Compiler Regression Testing
NPL involvement with compiler validation enables us independently to assess the use
of regression testing by vendors. It appears that the majority of vendors do regression
testing on their compilers both against the appropriate validation suite and also against
internal tests. However, several suppliers of validated compilers do not undertake such
testing. This significantly reduces the quality of their compilers.
Regression testing in some contexts is quite difficult to manage effectively. For instance,
if company A provides a system for B which is subsequently to be maintained by B, then
handing over the appropriate regression testing technology may be difficult. Moreover,
the original tests are likely to have been undertaken at module level, while complete
system test may be more appropriate for the maintenance phase.
4. TESTING STANDARDS
4.1. Existing Formal Standards
In this section, a brief review of existing formal (i.e. those approved by ‘official’
standards-making bodies) standards is given.
BS 5887. This Standard (BSI, 1988) takes the form of guidance material for functional
testing of software, i.e. black-box testing of a complete system. There are no mandatory
requirements and therefore the document has little relevance to this study, which is
concerned with the quality implications of a specific level of test.
MOD 00-55. The U.K. Interim Defence Standard (MOD, 1991) is highly prescriptive on
dynamic testing. The essential requirements are as follows.
Test coverage monitor to be used.
All statements, and all conditional branches with both outcomes, to be executed
during testing.
Loops to be executed with 0, 1 and many iterations.
Results to be compared with executable prototype.
Module test results to be computed in advance from design information.
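The third requirement above (loops executed with 0, 1 and many iterations) is easy to illustrate; the sketch below is hypothetical and is not taken from MOD 00-55 itself.

```python
# Hypothetical illustration of the "0, 1 and many iterations" requirement:
# one test per iteration-count class for the loop under test.

def total(values):
    s = 0
    for v in values:          # the loop under test
        s += v
    return s

assert total([]) == 0             # 0 iterations
assert total([7]) == 7            # 1 iteration
assert total([1, 2, 3, 4]) == 10  # many iterations
```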
IEEE. There are two ANSI/IEEE standards on testing (ANSI, 1983; 1987) (and several
other standards which refer to testing in quality management and quality assurance).
These two standards are as follows.
(1) ANSI/IEEE Std 829:1983, Standard for Software Test Documentation. This document is 48 pages long, of which 8 are the main text, the rest being an example
and explanatory material. The scope is roughly a ‘completeness check list’, for the
documentation itself. It has little direct relevance to this study since it only handles
the documentation rather than the content and nature of the tests themselves.
(2) ANSI/IEEE Std 1008:1987, Standard for Software Unit Testing. The main text of
this standard is 24 pages long with the scope of a ‘standard approach to software
unit testing’. However, in spite of its date, this work does not take into account
the work of Myers (1979) in classifying the forms of white-box testing. Moreover,
the question of whether the testing of a unit has been performed in conformity to
the standard is not really addressed. Indeed, the standard could be considered as
a ‘guideline’ rather than a true standard.
DO-178B (draft). This document is the current revision of the general avionics standard
for safety-critical software used internationally by both the industry and the certification
bodies (RCTA, 1993). The document has an objective that statements of conditions (both
ways) should be executed during testing. However, there appears to be no requirement
that this objective should be attained. There is an interesting additional requirement to
ensure that the testing based upon the source code is not defective in that the same
structural testing based upon the object code would require more tests. (This implies that
if a compiler generates a loop which was not in the source text, the testing must be
undertaken in just the same way as if the user had written the loop.)
IEC/WGB. This International Electrotechnical Commission (IEC) proposed standard (IEC,
1989) recommends but does not require boundary value analysis and path coverage. The
term ‘path coverage’ is rather unfortunate, since on all but trivial examples, complete
coverage of paths is impossible. Hence the possibility of executing all statements, which
is often feasible but difficult, is not considered. German work on the same area of safety-critical software uses several hardware-oriented testing methods specified by Holscher
and Rader (1986).
IEC/Nuclear. This international standard for nuclear safety software (IEC, 1986) has an
annex concerned with software testing. This covers both systematic and random testing.
Table E4.b on page 111 of the standard requires that all statements and branches be
executed as part of the testing process.
ITSEC. The current version of the Information Technology Security Evaluation Criteria
(CESG, 1991) does not specify actual testing methods (this is under development).
However, the higher levels of conformance do require the provision of information that
is necessary for independent white-box testing.
4.2. The BCS Proto-standard
The British Computer Society Software Testing Specialist Group has been developing
over about three years a highly demanding and quantitative standard for software component testing. The intention of the Group is to submit the standard to the British Standards
Institution once the method has been shown to be effective in practice by the members
of the Group. The comments here are based upon the current draft (Graham, 1990).
Although it is a draft, it is more complete and comprehensive in its treatment of testing
than the other formal standards noted here. In particular, it takes into account all the
classic methods of testing and even shows the relationships between them.
The broad objective of the document is to provide a rigorous standard against which
conformance can be judged. For instance, given an item of software for which two parties
have done the same type of testing under this standard, the actual differences in the tests
performed should be minimal. For this reason, random testing is excluded, although an
informative annex describes that method. It must be admitted that two good people
undertaking the same testing method are likely to get different results. However, the
broad objective is required if a sounder basis of testing is to be established. It is only then
that statements such as '100% coverage with method A' become generally meaningful.
Fourteen types of testing, based upon coverage of various attributes, are listed in the
proto-standard. They are as follows.
Equivalence partitioning. A black-box testing method that uses one representative
for each class of input data that is handled differently according to the specification.
Cause-effect graphing. Based upon a limited entry decision table produced from
the specification, from which a test is derived for each entry.
Boundary testing. A black-box testing method that uses a representative of each
equivalence class which is a boundary value to another class (or error situation).
Syntax directed testing. Testing based upon the syntax of the input data.
State transitions testing. Testing based upon each valid internal state change.
Statement testing. White-box testing of each statement in the source code.
Branch testing. White-box testing of each branch in the source code.
Branch condition testing. White-box testing of branches that is slightly stronger
than branch testing, since each predicate within a compound predicate in a branch
statement must be tested. For example, 'if a < b and c = d then' implies two tests.
Branch condition combination testing. Considers combinations of conditions and
eliminates impossible combinations.
Linear Code Sequence And Jump Testing. Determines LCSAJs and produces test
cases for the feasible ones.
Path testing. White-box testing based upon the use of all paths.
Data definition-use coverage. Ensures coverage of all data definition-use pairs.
Data definition computational use coverage. Restricts use to computational use.
Data definition predicate use coverage. Restricts use to use within predicates.
Thus the standard can reasonably be described as a complete specification of existing
test methods meeting the requirement of objectivity.
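As an informal illustration only (the specification and functions below are invented, not taken from the proto-standard), three of the listed methods can be contrasted on a toy example; the compound predicate mirrors the 'if a < b and c = d then' case mentioned under branch condition testing.

```python
# Invented specification for illustration: "accept ages from 18 to 65 inclusive".

def accept(age):
    return 18 <= age <= 65

# Equivalence partitioning: one representative per class that the
# specification handles differently.
equivalence_cases = {17: False, 40: True, 70: False}

# Boundary testing: a representative on each side of every class boundary.
boundary_cases = {17: False, 18: True, 65: True, 66: False}

for age, expected in {**equivalence_cases, **boundary_cases}.items():
    assert accept(age) == expected

# Branch condition testing: for "if a < b and c = d then", each predicate in
# the compound condition must be exercised individually.
def both(a, b, c, d):
    if a < b and c == d:
        return True
    return False

assert both(1, 2, 3, 3)        # both predicates true
assert not both(5, 2, 3, 3)    # a < b false
assert not both(1, 2, 3, 4)    # c == d false
```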
The only significant problem with the standard is determining how it can be applied
in practice. Even in the most critical contexts, applying all 14 methods would be excessive.
Hence there is a need to provide:
(a) guidance on which methods to apply;
(b) a summary of the strengths and weaknesses of each method;
(c) the format for a ‘testing statement’ which should be used to report the testing
actually performed on a component;
(d) guidance for the formulation of appropriate requirements for testing to be used
by purchasers.
The existing document makes an attempt (in Appendix G) to provide ‘Proformas’
designed for stating testedness (actually achieved or to be required of a supplier). However,
this part of the standard is clearly less mature and will require further work to make it
acceptable to industry. A points system is proposed which seems quite arbitrary.
The comprehensiveness and technical detail of the standard make it hard for non-technical management to grasp. This gap must be filled: top management will not sanction
significant resources for testing unless the benefits are clearly understood.
It does not seem feasible at this stage for the standard to lay down the degree of testing
to be undertaken, since this will depend upon the criticality of the application, and it
would be hard to obtain agreement within industry. However, the BCS standard appears
to provide the measurement framework for the quantification of testing. Hence if a
supplier makes a statement about the degree of testedness, this should be capable of
independent validation. Wider use of the standard should allow industry to determine
the amount of testing required in each context.
5. SPECIALIZED TESTING
Testing makes a serious contribution to software quality, yet this is hard to quantify.
Moreover, the objective testedness metrics are not widely used or accepted. Clearly, if
stronger links of testedness metrics to software quality could be established empirically
this would help.
This section undertakes a further technical analysis of the potential contribution that
objective software testing can make to software quality. The analysis is undertaken by
means of examples in different application areas. In most cases, it is possible to claim
that the specific techniques have a measurable effect on quality.
5.1. Data Validation Modules
A significant class of commercial software consists of ‘data validation’ programs which
check initial input. By the very nature of such programs, it is straightforward to construct
test data that will exercise the statements or branches as desired. It may be equally
important to access all fields of a record that need not be directly related to the statements
(i.e. perform data-dependent testing as opposed to statement dependent testing).
However, if the validation is to be performed upon internal data, say from another
file, then data validation may well require the construction of special data files. Most
commercial systems have special utilities to construct such files. If the internal input data
cannot be constructed by such a utility or test harness, then a special program may be
needed to perform the testing required.
For this class of module, there is little reason not to provide test data to ensure
execution of all statements and perhaps all branches. The only potential problem is the
large number of individual test cases needed, but Graham (1991) has shown that this can
be handled well by test harness software.
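A hypothetical sketch of such a module and a branch-exercising test set (the record layout and messages are invented) shows why this class of software is cheap to test thoroughly.

```python
# Hypothetical data-validation module: it checks one input record and reports
# the first problem found. Test data exercising every branch is easy to build.

def validate_record(record):
    if "id" not in record:
        return "missing id"
    if not isinstance(record["id"], int) or record["id"] <= 0:
        return "bad id"
    if not record.get("name"):
        return "missing name"
    return "ok"

branch_exercising_tests = [
    ({}, "missing id"),
    ({"id": -3, "name": "x"}, "bad id"),
    ({"id": 5, "name": ""}, "missing name"),
    ({"id": 5, "name": "widget"}, "ok"),
]

for record, expected in branch_exercising_tests:
    assert validate_record(record) == expected
```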
5.2. Numerical Programs
'The algorithms and [numerical] software community does not have a tradition of high-quality performance evaluations ... in a survey of over 50 papers to evaluate algorithms and
software for a certain class of problems, none of the papers was found to have used
consistently good experimental techniques; most were found to be poor in many respects.'
(Rice, 1981)
Education is required of the numerical software industry, although there are important
pockets of expertise, e.g. at the Numerical Algorithms Group Ltd., the National Physical
Laboratory and Argonne National Laboratory, particularly in regard to the development
of mathematical software libraries and packages and other library-standard software.
Expectations of what software testing can undertake automatically are too high:
‘One particular company was distressed to discover that existing tools were unable to take
an arbitrary piece of real-time software and perform a systematic test entirely automatically.
The fact that the software producers were expected to participate was a major blow.’
(Hennell, 1991)
One would hope that such naivety is rare in industry and is, of course, unknown in the
better companies.
Numerical software is commonly tested using fixed agreed data sets. This, the most
obvious form of black-box testing, is eminently sensible in that the approach mimics the
use of the software in the field. It is analogous to the common practice of testing measuring
instruments with ‘known’ artefacts, where the artefacts have similar properties to those
of the production workpieces that are to be measured in the inspection room (Peggs,
1991).
Difficulties are presented by data-sets testing due to properties of the computer’s
floating-point unit (different wordlengths, different rounding rules, etc., that vary across
machine ranges) and different compilers (various orders of executing expressions, non-optimizing or optimizing, etc.). To avoid confounding errors from a number of sources,
as far as possible the numerical software tester prefers to eliminate or minimize these
influences by first ensuring the software is used in an environment in which the floating-point unit and the compiler have been validated (Du Croz and Pont, 1984; Wichmann
and Ciechanowicz, 1983).
Some of the best available software has been tested by means of data sets that seriously
stretch the software. All routines in the NAG Library (Ford et al., 1979), DASL (Anthony
and Cox, 1987; Cox, 1987), LINPACK (Dongarra et al., 1979) and EISPACK (Garbow
et al., 1977; Smith et al., 1976) have been so tested, in addition to the use of more typical
data.
5.2.1. Reference software
Reference software is software developed to solve exactly the same problem as the
software under test. It differs from field software in that it will have been written to
extremely stringent requirements that will, for example, sacrifice speed of execution for
reliability, or have considerably greater memory requirements. Reference software is
therefore not just a question of design diversity, but a means of achieving quality at a
significantly higher cost than could perhaps be justified in the field. Reference software
is designed (hopefully) to have a high probability of computing the correct result corresponding to input data within its domain of applicability. It can then be used as the basis
for data-sets testing, viz. to generate reference results for a range of data sets. For a range
of metrology software, reference software was constructed independently by PTB and
NPL (NPL, 1991; Cox, 1992) using totally different underlying algorithms and the results
on data sets were compared before the reference software was accepted.
5.2.2. Reference data sets
Reference data sets are synthesized data sets that mimic actual problem data. They
often also include ‘difficult’ data sets (e.g. data sets at the boundary of the domain of
applicability of the algorithm under test). In conjunction with reference software they
constitute, in addition to testing, a valuable tool for product development and permit the
simulation of almost any attribute required of real data.
Reference data sets are used as follows.
(1) Apply the reference software to the reference data sets to produce reference results.
(2) Apply the software under test to the reference data sets to produce test results.
(3) Carry out a comparison of test and reference results.
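The three steps can be sketched as follows; the "reference software" here is a deliberately careful exact-arithmetic mean, and everything in the example is hypothetical rather than taken from the NPL or PTB work.

```python
# Hypothetical sketch of the three steps: a careful exact-arithmetic mean acts
# as the reference software, an ordinary floating-point mean is the software
# under test, and results are compared within a tolerance.
from fractions import Fraction

def reference_mean(data):
    # Reference software: exact rational arithmetic, slow but trustworthy.
    return float(sum(Fraction(x) for x in data) / len(data))

def mean_under_test(data):
    # Software under test: ordinary floating-point summation.
    return sum(data) / len(data)

reference_data_sets = [
    [1.0, 2.0, 3.0],
    [1e16, 1.0, -1e16],   # a "difficult" set near the limits of double precision
]

for data in reference_data_sets:
    ref = reference_mean(data)       # step 1: reference results
    test = mean_under_test(data)     # step 2: test results
    agree = abs(ref - test) <= 1e-9 * max(1.0, abs(ref))   # step 3: comparison
    print(data, ref, test, "agree" if agree else "DISAGREE")
```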
For specific problems, the data sets may include mathematical functions. For example,
in numerical integration a set of ‘model’ definite integrals with various mathematical
behaviour, and known analytical solutions and hence precise numerical answers, has been
used to test quadrature software (Casaletto et al., 1969).
By making the reference data sets varied in character, many of the properties of the
software can be checked, as follows.
(a) Various paths in the software, e.g. to handle normal and exceptional cases, can
be explored.
(b) Weaknesses, e.g. unstable numerical methods, can be exposed.
(c) Restrictive algorithms, i.e. with a limited domain of applicability, can be identified.
Reference data sets have been used very successfully in measurement science to test
software widely used for least-squares geometric-element fitting (Drieschner et al., 1991;
Porta and Waeldele, 1986).
5.2.3. Data generators
Reference data sets can be constructed manually, but this is a time-consuming and
error-prone task. Such time is better spent in designing software (data-set generators)
to produce the data sets automatically. The generators are controlled by various parameters
that relate to required general properties of the data sets and also to any specific features
required. The range of parameters should be such that ‘severe’ data sets can be generated
in order to test the software at the extremities of its specification (Anthony and Cox,
1984; Myers, 1979).
A further advantage of using a data generator is that in some instances the generator
can be designed to derive data for which the correct solution is known a priori, thus
obviating the need to develop reference software.
As an example, consider testing regression (linear or nonlinear least-squares
optimization) software. A data generator can be constructed that, given a set of parameters
defining a regression function (a straight line is one of the simplest cases), a set of data
points can be determined with the property that the best-fitting function to the data set
is the prescribed regression function. Moreover, the data generator can be controlled in
various ways (in terms of the number of data points and the scatter of the data about
the function).
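One possible realization of such a generator (hypothetical, and assuming NumPy is available) constructs noise orthogonal to the fitting basis, so the prescribed line is, by construction, the exact least-squares answer.

```python
# Hypothetical data generator (assumes NumPy): the noise added to the
# prescribed line is made orthogonal to the fitting basis {1, x}, so the exact
# least-squares straight-line fit of the generated data is known a priori.
import numpy as np

def generate_line_fit_data(m, c, x, noise_scale, seed=0):
    rng = np.random.default_rng(seed)
    r = rng.normal(scale=noise_scale, size=x.size)
    basis = np.vstack([np.ones_like(x), x]).T     # columns span {1, x}
    q, _ = np.linalg.qr(basis)
    r -= q @ (q.T @ r)                            # remove components along the basis
    return m * x + c + r

x = np.linspace(0.0, 10.0, 25)
y = generate_line_fit_data(m=2.0, c=-1.0, x=x, noise_scale=0.5)

# Because the correct answer is known in advance, no reference software is
# needed to test a fitting routine; here NumPy's own fit must recover (2, -1).
fitted_m, fitted_c = np.polyfit(x, y, 1)
assert abs(fitted_m - 2.0) < 1e-8 and abs(fitted_c + 1.0) < 1e-8
```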
5.2.4. Floating-point error analysis and perturbation analysis
Floating-point error analysis and perturbation analysis per se (Wilkinson, 1963; 1965)
fall outside the scope of this paper. These disciplines are relevant, however, when the
results of such an analysis are used to support data-sets testing. Very many of the
underlying algorithms have provable mathematical properties when implemented in floating-point arithmetic, and so prior knowledge of the degree of agreement between the
observed result and the expected result is often available. The computed results can
readily be checked for the extent to which they satisfy the problems to which they
purportedly correspond.
5.2.5. Modularity
Software testing is greatly simplified when maximum use is made of previously tested
modules. Extensive use of modularity is made in the numerical software libraries NAG
(Ford et al., 1979) and DASL (Anthony and Cox, 1987; Cox, 1987), for instance. The
testing process can then to a great extent concentrate on newly produced software and
on its interfacing to the tested modules. Some of the advantages of individually-testable
modules and the extensive re-use of modules have been reported (Jacobs and Markham,
1990).
5.2.6. The NAG Library approach
A seminal paper (Ford et al., 1979) on the NAG Library ‘machine’ describes an
integrated approach to software development which includes testing as one facet. This
approach is in use today at NAG, which ports its library to over 60 platforms in single
and double precision versions. As far as possible, high quality is engineered into NAG
products (considerable defensive programming, highest-possible portability, use of
software transformation tools, etc), although considerable testing is undertaken for each
distinct implementation. Each implementor is required to ensure that the software, which
passes as many times as necessary through a cycle involving an author, a contributor and
178
B. A . WICHMANN AND M. G . COX
a validator, executes
correctly for a wide range of data sets, including boundary cases
(Myers, 1979).
Toolpack (Iles, 1984) and other software transformation tools are used to produce
variants automatically for different machine ranges and to assist with language-language
translation. The overall effect of the approach is to reduce significantly the amount of
subsequent testing that would otherwise be necessary. Nevertheless, the use of stringent
test programs or demonstration programs forms a very significant aspect of NAG software
development. As far as possible, all modes of operation of the software including all
programmed failure exits are tested, and extreme cases are included in an effort to make
the software fail. In particular, all available knowledge of the problem that the software
attempts to solve is considered in order to design suitable data sets that try to explore
as many paths as possible (see below). The development of suitable demonstration
programs is itself challenging and can require an effort comparable to that of producing
the routine itself (Smith et al., 1974).
5.2.7. Use of problem solution properties
The algorithm implemented by the software will introduce errors of various types. It
is possible to capitalize on the properties of some such errors in order to assess the degree
of immunity of the software under test to them.
For example, one class of error is restatement errors, which arise from a restatement
of the problem to be solved, and reflect the problem’s invariance properties. Simple
examples are: the effects of scaling the data, translating the data and permuting the data
in a case when theoretically an algorithm is invariant with respect to such data changes.
The results of such data changes can often be readily predicted and the results obtained
on such data sets can be compared.
A very simple change to the data can be extremely effective in testing adaptive quadrature routines. Different estimates of ∫_a^b f(x) dx and -∫_b^a f(x) dx will in general be produced.
If the required integration error is to be bounded by δ, say, and the results of the two
applications of the quadrature routine differ by more than 2δ, the routine is not performing
to specification. The different results may arise because the quadrature scheme subdivides
the interval in a directed manner, e.g. from left to right.
The mathematical properties of the solution can also obviate the need for major aspects
of data-sets testing. This use, termed self-validation or self-checking (Anthony and Cox,
1984) is not yet in wide use, despite the fact that it offers considerable potential. It
involves providing confirmation that the computed solution satisfies the original mathematical problem statement. As a simple example, does the sphere computed to pass through
four given non-coplanar points actually do so? A check therefore is the extent to which
the four data points satisfy (a suitable form of) the sphere equation for the calculated
radius and centre coordinates.
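A minimal sketch of that self-validation check (the points, centre and radius below are invented for illustration):

```python
# Hypothetical self-validation check: whatever routine produced the centre and
# radius, confirm that the four given points actually lie on that sphere. The
# "true" answer is never needed.
import math

def sphere_residuals(points, centre, radius):
    return [abs(math.dist(p, centre) - radius) for p in points]

points = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, 0, 1)]   # four non-coplanar points
centre, radius = (0.0, 0.0, 0.0), 1.0                    # output of the routine under test

assert max(sphere_residuals(points, centre, radius)) < 1e-12
```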
Self-validation processes are sometimes straightforward to implement by the software
development engineers as a check on their own software. Particular advantages are that
the ‘true’ result is not required and that the test is automatically carried out every time
the software is executed. Such checks may not be as appropriate as the use of data sets
for third-party testing, however. The approach has been used for checking roundness-assessment software (Anthony and Cox, 1986) according to the national standard (BSI,
1982), and will become increasingly relevant in the context of self-validating numerical
algorithms within mathematical sub-routine libraries (Linz, 1991).
5.2.8. Importance to standards and embedded systems: software
traceability
As virtually all scientific and technological fields become increasingly dependent on the
computer, so numerical software assumes ever greater importance. Many products are
now required to be produced or assessed by systems that are traceable to national,
international or industrial standards. Since such systems often have an integrated computer
and embedded numerical software, the software traceability of this part of the system also
needs to be addressed. There are various aspects that need to be covered in order to
ensure that software can be considered traceable. A partial list of considerations is:
(a) a rigid mathematical basis;
(b) good numerical analysis;
(c) careful algorithm design;
(d) sound software implementation; and
(e) an authoritative testing strategy.
It is recognized that a major step towards traceability will result from maximizing the use
of standard computational modules.
5.3. Testing Numeric Operations
Schryer reported (Schryer, 1981) many years ago that significant bugs can appear in
the basic floating point operations of machines. With large word-length machines, it is
easy to see that exhaustive testing is impossible (more than 10³⁰ test cases!). Moreover,
naive random testing is also ineffective in locating bugs. (Schryer reports that his test
program discovered a bug in a few seconds as opposed to many centuries for the naive
random tests then being undertaken by the supplier.) More recent experience confirms
these findings (Du Croz and Pont, 1984).
The main problems with numeric operations occur at the end points, and also with
special cases in the conventional algorithms. It is therefore quite easy to bias random
tests in the areas most likely to cause problems. The algorithms are well-known which
gives an insight into the most effective test cases.
Experience with integer testing reveals a similar pattern (Wichmann, 1991). On a two’s
complement machine, the additional most negative value causes problems since it is the
only value with overflow on negate, absolute value, multiplication by -1 and division by
-1.
The conclusion from this is that purely random tests are unlikely to be effective for
simple algorithms, but carefully biased random tests can be very effective. (For this
reason, the Pascal Program Generator (Wichmann and Davies, 1989) biases the random
integers it uses.)
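The following hypothetical sketch shows the idea of biasing: a wrapping 32-bit negate stands in for the hardware operation under test, and the generator is weighted towards the end points so that the INT_MIN case is actually exercised.

```python
# Hypothetical sketch of biased random testing: a wrapping 32-bit negate
# stands in for the hardware operation. Uniform random 32-bit values almost
# never hit INT_MIN, the one value whose negation overflows, so the generator
# is deliberately biased towards the end points.
import random

INT_MIN, INT_MAX = -2**31, 2**31 - 1

def negate32(x):
    """Two's complement negate, wrapping the way 32-bit hardware would."""
    return ((-x + 2**32) % 2**32) - 2**32 if -x > INT_MAX else -x

def biased_random_int(rng):
    # Roughly one test in eight is drawn from the troublesome end points.
    if rng.random() < 0.125:
        return rng.choice([INT_MIN, INT_MIN + 1, -1, 0, 1, INT_MAX])
    return rng.randint(INT_MIN, INT_MAX)

rng = random.Random(1)
for _ in range(10_000):
    x = biased_random_int(rng)
    y = negate32(x)
    if x != INT_MIN:
        assert y == -x           # ordinary values negate exactly
    else:
        assert y == INT_MIN      # the overflow case wraps back to INT_MIN
```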
5.4. Compiler Validation
Testing compilers is at the other end of the spectrum of logical complexity. No production compiler can be expected to be bug-free, yet bugs could severely impact other
software projects, so some testing is clearly important.
Modern language standards make requirements on language processors which allow
very effective black-box testing of compilers. This trend started with Pascal, the results
of which have been reported by Wichmann and Ciechanowicz (1983). The conclusions
from this work can be summarized as follows.
An essential requirement for good testing is a very accurate specification. Language
standards that have been subject to international review are typically just good
enough. Testing can only be undertaken effectively if the specification is written
with a view to testability. For instance, if a specification makes no mention of the
handling of incorrect input, testing is not feasible. Programming language standards
are generally good from the point of view of testing, but there are exceptions.
A vital part of a program specification is the handling of incorrect input. Hence,
the language standards for FORTRAN and COBOL are inadequate for good testing
since they have no requirement to reject non-conforming programs.
Error guessing is a vital aspect of devising good functional tests.
The Pascal Validation Suite was checked for ‘completeness’ by ensuring that all
statements in a Pascal compiler front-end were executed (Ciechanowicz and De
Weever, 1984). Although these tests were specific to this front-end, it did ensure
a good coverage.
A test suite that remains fixed for several years does not perform a useful purpose.
Certain language features in Pascal proved to be much more difficult to test
than was initially thought. For instance, actual bugs in handling the for-statement
continued to appear in validated compilers for many years causing additional tests
to be added.
It has not proved possible to test the back-end or code generator part of a compiler
as completely as the front-end. This issue has been considered by Wichmann and
Davies (1989) and is considered in the next sub-section.
5.5. Compiler Code Generator
It is difficult to test the code generator part of a compiler. The main reason for this
is that the only practical way to generate test cases is via the front-end of the compiler.
However, the mapping between the source code and the input to the back end is nontrivial, so that there is no straightforward way to execute a specific statement (say) of
the back end. This problem is compounded by the machine-dependent aspect of the back
end.
As an example of the problems, consider the following bug which was present in the
Algol W compiler for the first ten years of its release without its ever being reported as
a bug. The compiler was machine-specific, generating code for the IBM 360 series. The
classic Algol display was held in as many of the 16 general purpose registers as required.
(The compiler limited programs to a nested block depth of 16.) Any registers left were
allocated to the evaluation of expressions. Unfortunately, there was a bug in the register
allocation routine when an odd-even pair of registers were needed for integer division
and the available registers were nearly, but not quite, exhausted.
The point about this bug was that the risk of its occurring in a user program was very
low. Conventional test methods would also have difficulty in locating this problem. It
was only discovered during a re-write of the back-end when the actual source code of
the compiler was reviewed. However, it appears that this form of bug is not unusual in
compilers.
The question arises as to the best method to detect such compiler bugs. The obvious
answer is to generate random, self-checking programs as originally proposed by Hanford
(1970). The method of checking the code generator of a compiler by means of randomly
selected programming constructs is a highly specialized example of random test case
generation. As noted here, it can be very effective in finding bugs in compilers, but its
use elsewhere will depend upon the random selection process and the ease with which
test cases can be checked. The self-checking of the compiler test cases makes the method
highly effective. Random testing is not one of the methods in the BCS standard since,
in general, it is not repeatable. Other experience of this method with compilers is equally
encouraging (Bazzichi and Spadafora, 1982; Bird and Munoz, 1983). Experience with the
Pascal and Ada versions of this technology is encouraging: it locates bugs similar to the
Algol W one noted above. Moreover, one compiler-writer (Wichmann and Davies, 1989)
has stated that the bugs found could arise in ordinary user’s programs in spite of the
generated test cases being very unlike user-code.
The general conclusion from this is that carefully constructed random tests are effective
in locating bugs. For another example of random testing of Unix utilities, see Miller et al. (1990).
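A miniature, purely illustrative analogue of such a generator (Python's own compile/eval stands in for the compiler under test; the expression grammar and seed are invented) shows the self-checking and repeatability ingredients.

```python
# Purely illustrative miniature of a random, self-checking test generator:
# random arithmetic expressions are evaluated by a tiny reference interpreter
# and by Python's own compile/eval, which stands in for the compiler under
# test. A fixed seed keeps the "random" tests repeatable.
import ast
import random

def random_expr(rng, depth=0):
    if depth > 3 or rng.random() < 0.3:
        return str(rng.randint(-9, 9))
    op = rng.choice(["+", "-", "*"])
    return f"({random_expr(rng, depth + 1)} {op} {random_expr(rng, depth + 1)})"

def reference_eval(node):
    """Reference interpreter for the generated subset of expressions."""
    if isinstance(node, ast.Expression):
        return reference_eval(node.body)
    if isinstance(node, ast.Constant):
        return node.value
    if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
        return -reference_eval(node.operand)
    if isinstance(node, ast.BinOp):
        left, right = reference_eval(node.left), reference_eval(node.right)
        return {ast.Add: left + right, ast.Sub: left - right,
                ast.Mult: left * right}[type(node.op)]
    raise ValueError("unexpected node")

rng = random.Random(42)                      # repeatable pseudo-random tests
for _ in range(1_000):
    source = random_expr(rng)
    expected = reference_eval(ast.parse(source, mode="eval"))
    assert eval(compile(source, "<test>", "eval")) == expected   # self-checking
```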
6. CONCLUSIONS
To be useful for a measured approach to software quality, a testing standard should be
as follows.
(1) It should be objective. The subjective nature of informal testing undertaken by
most suppliers is such that the phrase ‘it has been tested’ is meaningless. Repeatable
tests with clearly defined results must underlie the testing process.
(2) It should be enough for users. Any formalized system must provide enough guarantees that even the minimal level of testing ensures significant evidence of software
quality.
(3) It should be practical for suppliers. It is easy to specify a level of test that is totally
uneconomic. A simple and measurable statement of what to do, why and when (in
the software lifecycle) is needed.
(4) It should match the perceived risks. There is a wide range of software, and any
testing approach that only addresses the most critical software will have little impact
on the majority of software developers.
(5) It should have conceptual simplicity. Effective testing requires resources. In consequence, all parties must have a clear understanding of the implications of the level
of testing envisaged, in both costs and benefits.
The white-box testedness metrics (like statement coverage) have a proven value. The
problem is that the cost of obtaining 100% coverage of even the statement metric can
be quite high. Hence any testing statement (or policy) would have to take into account
the logical complexity of the program. How can this be done? Merely counting jumps
and other control structures is not sufficient. The control structure of a validation program
can be quite complex but is typically easy to test, while exactly the opposite is true for
a compiler code generator.
With the fourteen test techniques catalogued in the BCS standard, it would be highly
advantageous if their relative strength could be determined. However, apart from the
inclusion relationship (e.g. branch testing implies statement testing), any other relationship
is likely to be very approximate. Of equal interest, and rather easier to determine, is the
relative cost of the test techniques (which depends upon the degree of tool support).
The approach advocated here is to develop Appendix G of the BCS proto-standard to
meet the five points above. Some comments on the changes needed are as follows:
It should be objective. The BCS proto-standard has been designed from the outset
to be objective, and hence this requirement is already satisfied. It is for this reason
that the BCS work is being advocated as the starting point.
It should be enough for users. It is easy to specify high testedness metric values
that would give the users the necessary guarantees. For bespoke software, users
have a responsibility to specify the testing requirements explicitly.
It should be practical for suppliers. This is a major problem due to the cost
implications of high testedness metric figures. However, this is only a real problem
if the testedness ratios are specified in advance.
It should match the perceived risks. In principle, this is easy to achieve within the
framework of the BCS standard, since the metrics used can be based upon the
risks.
It should have conceptual simplicity. The fourteen testing methods defined within
the standard are very confusing for the non-expert. However, if the standard is
mainly used to define the testing undertaken after the event, then only the relative
few testing methods actually used need be considered.
The eventual goal with a development of the BCS standard would be to specify the
testing in advance of the software development. In general, this must be the responsibility
of the software procurer (or developer for products), and hence the BCS standard can
only provide a framework for this specification. The criterion of objectivity has resulted
in two test methods being rejected which have been shown to be effective. These are as
follows.
(i) Random testing. This could be made objective by the use of a specific method to
obtain the randomness, such as the (repeatable) pseudo-random number generator
used by Wichmann and Davies (1989). If the random data used can be shown to
be representative of actual use, the reliability claims can be made, based upon the
testing as noted by Littlewood and Strigini (1993).
(ii) Error guessing. It would appear impossible to make such a method objective.
Indeed, the experience of the tester is vital here. Error reports from one (existing)
system could be used to derive tests for a similar system under development.
The experience from the mathematical area gives great weight to the re-use of proven
modules. Re-use requires an interface that permits independent testing, and if a high
degree of re-use can be expected, comprehensive testing can be justified in economic
terms.
Acknowledgements
This paper has benefited from critical comments from many people including Roger Scowen
(NPL), Nick North (NPL), David Schofield (NPL), John Kershaw (RSRE), Dorothy Graham
(Grove Consultants), Brian Marwick (Testing Foundation), Richard Hall (GEC Avionics) and
Martyn Ould (Praxis). The work has been undertaken with support from the Department of Trade
and Industry’s Software Quality Unit. The views expressed here are those of the authors and not
necessarily those of the reviewers noted above. Numerous comments from the three referees have
hopefully improved the presentation of the material.
References
Alvey (1985) 'Glossary of terms (Deliverable A16)', Alvey Test Specification and Quality Management Project SE/031.
ANSI (1983) ANSI/IEEE Std 829: 1983, Standard for Software Test Documentation.
ANSI (1987) ANSI/IEEE Std 1008:1987, Standard for Software Unit Testing.
Anthony, G. T. and Cox, M. G. (1984) ‘The design and validation of software for dimensional
metrology’, Technical Report DITC 50/84, National Physical Laboratory, Teddington, U .K.
Anthony, G. T. and Cox, M. G. (1986) ‘Reliable algorithms for roundness assessment according
to BS3730’, in M. G. Cox and G. N. Peggs (eds), Software for Co-ordinate Measuring Machines,
National Physical Laboratory, Teddington, U.K., pp.30-37.
Anthony, G. T. and Cox, M. G. (1987) ‘The National Physical Laboratory’s Data Approximation
Subroutine Library’, in J. C. Mason and M. G. Cox (eds), Algorithms for Approximation,
Clarendon Press, Oxford, U.K. pp.669-687.
Bazzichi, F. and Spadafora, I. (1982) 'An automatic generator for compiler testing', IEEE Transactions on Software Engineering, 8 (4), 343-353.
Bird, D. L. and Munoz, C. U. (1983) ‘Automatic generation of random self-checking test cases’,
IBM Systems Journal, 22 (3), 229-245.
BSI (1982) BS 3730: Assessment of Departure from Roundness, British Standards Institution,
London, U.K.
BSI (1988) BS 5887: Code of Practice for Testing of Computer-based Systems, British Standards
Institution, London, U.K.
Casaletto, J., Pickett, M., and Rice, J. (1969) ‘A comparison of some numerical integration
programs’, SIGNUM Newsletter, 4 (3), 30-40.
CESG (1991) 'Information technology security evaluation criteria: provisional harmonised criteria',
Version 1.2. (U.K. contact point: CESG Room 2/0805, Fiddlers Green Lane, Cheltenham, Glos,
GL52 5AJ.)
Ciechanowicz, Z. J. and De Weever, A. C. (1984) ‘The “completeness” of the Pascal Test Suite’,
Software-Practice and Experience, 14 (5), 463-471.
Cox, M. G. (1987) ‘The NPL Data Approximation Subroutine Library: current and planned
facilities’, NAG Newsletter, 2/87, 3-16.
Cox, M. G. (1992) ‘Improving CMM software quality’, Technical Report DITC 194/92, National
Physical Laboratory, Teddington, U.K.
Dongarra, J. J., Moler, C. B., Bunch, J. R. and Stewart, G. W. (1979) LINPACK User’s Guide,
Society for Industrial and Applied Mathematics, Philadelphia, U.S.A.
Drieschner, R., Bittner, B., Elligsen, R. and Waeldele, F. (1991) ‘Testing coordinate measuring
machine algorithms: Phase II’, Technical Report BCR EUR 13417 EN, Commission of the
European Communities.
DTI (1992) 'TickIT: Making a better job of software', Guide to Software Quality Management
System Construction and Certification using EN29001, Issue 2.0.
Du Croz, J. and Pont, M. (1984) ‘The development of a floating-point validation package’, NAG
Newsletter, 3/84, 3-9.
Ford, B., Bentley, J., du Croz, J. J. and Hague, S. J. (1979) ‘The NAG Library “machine”’,
Software-Practice and Experience, 9 (1), 65-72.
Garbow, B. S., Boyle, J. M., Dongarra, J. J. and Moler, C. B. (1977) Matrix Eigensystems Routines:
EISPACK Guide Extensions, Lecture Notes in Computer Science, Vol. 51, Springer-Verlag,
Berlin, Germany.
Gelperin, D. and Hetzel, B. (1988) ‘The growth of software testing’, Communications of the ACM,
31 (6), 687-695.
Graham, D. R. (ed.), (1990) A Standard for Software Component Testing, version 1.2. British
Computer Society Specialist Group in Software Testing.
Graham, D. R. (1991) Computer Aided Software Testing: CAST Report, Unicorn Seminars, London,
U.K.
Hanford, K. V. (1970) ‘Automatic generation of test cases’, IBM Systems Journal, 9 (4), 242-257.
Hennell, M. A., Woodward, M. R. and Hedley D. (1976) ‘On program analysis’, Information
Processing Letters, 5 (5), 136-140.
Hennell, M. A. (1991) ‘How to avoid systematic software testing’, Software Testing, Verification
and Reliability, 1 (1), 23-30.
Holscher, H. and Rader, J. (1986) Microcomputers in Safety Technique, TUV Study Group on
Computer Safety, Verlag TUV Bayern, TUV Rheinland, Germany.
IEC (1986) IEC 880:86: Software for Computers in the Safety Systems of Nuclear Power Stations.
IEC (1989) IEC/SC65A/WG9/45: Software for Computers in the Application of Industrial Safety-related Systems, 3rd draft.
Iles, R. (1984) ‘Toolpack support for Fortran programmers’, NAG Newsletter, 2/84, 16-21.
Ince, D. (1991) ‘Software testing’, in J. McDermid (ed.), Software Engineer’s Reference Book,
Chapter 19, Butterworth-Heinemann.
Jacobs, D. A. H. and Markham, G. (1990) 'Experiences with some software engineering practices
in numerical software', in M. G. Cox and S. J. Hammarling (eds), Reliable Numerical Computation,
Oxford University Press, Oxford, U.K., pp.277-296.
Kitson, D. H. and Humphrey, W. S. (1989) ‘The role of assessment in software process improvement’, Software Engineering Institute, SEI-89-TR-3.
Linz, P. (1991) ‘Algorithms for the next generation of numerical software’, NAG Newsletter, 2/91,
3-6.
Littlewood, B. and Strigini, L. (1993) ‘Validation of ultra-high dependability for software-based
systems’, Communications of the ACM, to be published.
Miller, B. P., Frederiksen, L. and So, B. (1990) ‘An empirical study of the reliability of UNIX
utilities’, Communications of the ACM, 33 (12), 3242.
MOD (1991) Interim Defence Standard 00-55: The Procurement of Safety Critical Software in Defence
Equipment (Part 1: Requirements; Part 2: Guidance), Ministry of Defence.
MUSiC (1993) Metrics for Usability Standards In Computing, ESPRIT II Project 5429 (Contact:
M. Kelly, Brameur or N. Bevan, NPL).
Myers, G. J. (1979) The Art of Software Testing, Wiley, New York.
NATLAS (1985) Software Unit Test Standard and Method, N19.
NPL (1991) ‘LSGE: Package of algorithms for least-squares geometric elements’, Document ITC
h 069, National Physical Laboratory, U.K.
Paulk, M. C., Curtis, B. and Chrissis, M. B. (1991) 'Capability maturity model for software', CMU/SEI-91-TR-91.
Peggs, G. N. (1991) 'A review of the methods for the accurate metrology of complex three-dimensional components and artefacts', Technical Report MOM 101, National Physical Laboratory, Teddington, U.K.
Porta, C. and Waeldele, F. (1986) ‘Testing of three coordinate measuring machine evaluation
algorithms’, Technical Report BCR EUR 10909 EN, Commission of the European Communities.
Rapps, S. and Weyuker, E. J. (1985) ‘Selecting software test data using data flow information’,
IEEE Transactions on Software Engineering, 11 (4), 367-375.
RTCA (1993) 'Software considerations in airborne systems and equipment certification' (DO-178B), Requirements and Technical Concepts for Aviation, 1140 Connecticut Avenue, N.W., Suite 1020, Washington, DC 20036, U.S.A.
Rice, J. R. (1981) Matrix Computations and Mathematical Software, McGraw-Hill, New York,
U.S.A.
Schryer, N. L. (1981) ‘A test of a computer’s floating-point unit’, Computer Science Technical
Report No. 89, AT&T Bell Laboratories, Murray Hill, New Jersey, U.S.A.
Smith, B. T., Boyle, J. M. and Cody, W. J. (1974) ‘The NATS approach to quality software’, in
Software for Numerical Mathematics, Academic Press, London, U.K., pp.393-405.
Smith, B. T., Boyle, J. M., Dongarra, J. J., Garbow, B. S., Ikebe, Y., Klema, V. C. and Moler,
C. B. (1976) Matrix Eigensystem Routines - EISPACK Guide, Lecture Notes in Computer Science,
Vol. 6, 2nd edn, Springer-Verlag, Berlin, Germany.
Thompson, K. (1991) ‘A method for assessing organisational software development capability’,
EuroCASE 111 Conference.
White, L. J. (1987) ‘Software testing and verification’, in M. Yovits (ed.) Advances in Computers,
Vol. 26, Academic Press, pp.335-391.
Wichmann, B. A. and Ciechanowicz, Z. J. (eds), (1983) Pascal Compiler Validation, Wiley.
Wichmann, B. A. and Davies, M. (1989) ‘Experience with a compiler testing tool’, NPL Report
DITC 138/89.
Wichmann, B. A. (1991) ‘The Language Compatible Arithmetic Standard and Ada’, NPL Report
DITC 173/91.
Wilkinson, J. H. (1963) Rounding Errors in Algebraic Processes, Notes in Applied Science No. 32,
Her Majesty’s Stationery Office, London, U.K.
Wilkinson, J. H. (1965) The Algebraic Eigenvalue Problem, Clarendon Press, Oxford, U.K.
Woodward, M. R., Hedley, D. and Hennell, M. A. (1980) ‘Experience with path analysis and
testing of programs’, IEEE Transactions on Software Engineering, 6 (3), 278-285.