ITECH7409 Federation University Australia Software Testing and Standards Research Paper

Computer Science

ITECH7409

Federation University Australia

Description

You will need to:

- locate a research paper related to software testing that refers to at least one standard
- research, comprehend and analyse each document (both the paper and the chosen standard) to find relevant details to answer a set of questions, and
- prepare a written summary report of findings

Requirements:

Questions for the standard:

- What is the standard name?
- Who holds the copyright for the standard?
- Amongst the acknowledged contributors to the document, which universities were involved (if any)?
- What is the scope or intent of the standard?
- What are key terms and understandings needed for the standard to be understood and applied?
- In your own words, what does application of the standard result in? In other words, what does the standard do?
- Finally, what specific relevance does the standard have to software testing?

Discuss the paper and how it relates to the standard. For example: does the paper suggest how to improve the standard? Does the paper highlight issues in applying the standard? Attached is a sample paper (Wichmann and Cox 1992) which refers to ANSI/IEEE Standard 829 and ANSI/IEEE Standard 1008. Although this paper is somewhat outdated, it serves to illustrate the task for this assignment.

Prepare a report of no more than 1,500 words answering all questions. The report should have the following structure:

- an introduction to standards and a brief description of the research paper and chosen standard
- responses to questions for the standard
- a listing and discussion of commonalities and differences between the two documents
- a conclusion summarizing the report findings

I am providing the sample paper along with the main attachment.

Unformatted Attachment Preview

ITECH7409 Software Testing
Assignment 1: Individual Research on Software Testing and Standards

Overview
According to Standards Australia: "Standards are published documents setting out specifications and procedures designed to ensure products, services and systems are safe, reliable and consistently perform the way they were intended to. They establish a common language which defines quality and safety criteria."

There are several standards, international and national, that relate specifically to software testing. Standards formalize industry best practice and they are agreed upon by professionals in the industry to which the standards apply. This assignment is an investigation into those standards. The purpose of the assignment is to help you to:

- improve your research and comprehension skills
- develop a good understanding of professional industry standards for software testing
- appreciate the value of various processes and methods used in industry to test and evaluate software systems

Timelines and Expectations
Marks and Percentage Value of Task: 100 marks (10%)
Due: Thursday, May 2, 2019 @ 16:00 (Week 7)
Minimum time expectation: 10 hours

Learning Outcomes Assessed
K1. Critically evaluate software requirements as inputs toward testing of the final solution;
K2. Analyse and critically evaluate appropriate tools and techniques to support the testing process;
K3. Develop a good understanding of selecting and applying measures and models used for testing that are compliant with the appropriate professional industry standards such as the ACS and IEEE;
S1. Analyse and critically evaluate software requirements and proposed solutions;
S2. Apply complex decision making to select appropriate testing techniques;
S3. Write professional level management, planning, quality assurance and testing documentation for a software system;
S4. Apply effective testing techniques to software systems of varying scales and test levels;
A1. Develop and maintain plans for scheduling quality assurance tasks including software testing.

Assessment Details
You will need to:

- locate a research paper related to software testing that refers to at least one standard
- research, comprehend and analyse each document (both the paper and the chosen standard) to find relevant details to answer a set of questions, and
- prepare a written summary report of findings

As a suggestion, commence your search for a research paper at the Federation University website. There is a QuickSearch link on the library home page. There is also a listing of Databases A-Z (http://libguides.federation.edu.au/az.php) which has a link for Australian Standards Online.

Requirements:
Questions for the standard:

- What is the standard name?
- Who holds the copyright for the standard?
- Amongst the acknowledged contributors to the document, which universities were involved (if any)?
- What is the scope or intent of the standard?
- What are key terms and understandings needed for the standard to be understood and applied?
- In your own words, what does application of the standard result in? In other words, what does the standard do?
- Finally, what specific relevance does the standard have to software testing?

Discuss the paper and how it relates to the standard. For example: does the paper suggest how to improve the standard? Does the paper highlight issues in applying the standard?
Attached is a sample paper (Wichmann and Cox 1992) which refers to the ANSI/IEEE Standard 829 and ANSI/IEEE Standard 1008. Although this paper is somewhat outdated, it serves to illustrate the task for this assignment.

Notes:
1. ANSI: American National Standards Institute.
2. IEEE: Institute of Electrical and Electronics Engineers.
3. Current versions of both these standards are available online at the FedUni library.

Prepare a report of no more than 1,500 words answering all questions. The report should have the following structure:

- an introduction to standards and a brief description of the research paper and chosen standard
- responses to questions for the standard
- a listing and discussion of commonalities and differences between the two documents
- a conclusion summarizing the report findings

Marking Criteria/Rubric (assessment component: mark)
1. Introduction: 20
2. Responses to questions: 20
3. Listing and discussion of commonalities and differences between the research paper and the chosen standard: 20
4. Conclusion: 20
5. Spelling, grammar and report presentation: 20
Total: 100 (final mark out of 10)

Submission
Your assignment should be completed according to the guides for your assessments: https://federation.edu.au/library/student-resources/guides-to-your-assessments

You are required to provide documentation, contained in an appropriate file, which includes a front page indicating:

- the title of the assignment
- the course ID and course name, student name and student ID
- a statement of what has been completed
- acknowledgement of the names of all people (including other students and people outside of the university) who have assisted you, and details of what parts of the assignment they have assisted you with
- a list of references used (APA style)

Using the link provided in Moodle, please upload your report as a Word file. Name your Word file in the following manner: _.docx, e.g. Aravind_ADIGA_30301234.docx. Also, upload a copy of the standard used by the research paper.

Feedback
Assessment marks will be made available in fdlMarks; feedback to individual students will be provided via Moodle or as direct feedback during your tutorial class.

Plagiarism
Plagiarism is the presentation of the expressed thought or work of another person as though it is one's own without properly acknowledging that person. You must not allow other students to copy your work and must take care to safeguard against this happening. More information about the plagiarism policy and procedure for the university can be found at: http://federation.edu.au/students/learning-and-study/online-help-with/plagiarism

Your support material must be compiled from reliable sources such as the academic resources in the Federation University library, which might include, but are not limited to: the main library collection, library databases and the BONUS+ collection, as well as any reputable online resources (you should confirm this with your tutor).

Federation University General Guide to Referencing
The University has published a style guide to help students correctly reference and cite information they use in assignments. A copy of the University's citation guides can be found on the university's web site. It is imperative that students cite all sources of information.
The General Guide to Referencing can be purchased from the University bookshop or accessed online at: https://federation.edu.au/library/guides/referencing

References
Wichmann, B. A. and M. G. Cox (1992). "Problems and strategies for software component testing standards." Software Testing, Verification and Reliability 2(4): 167-185.

SOFTWARE TESTING, VERIFICATION AND RELIABILITY, VOL. 2, 167-185 (1992)

Problems and Strategies for Software Component Testing Standards

B. A. WICHMANN AND M. G. COX
National Physical Laboratory, Teddington, Middlesex, TW11 0LW, U.K.

SUMMARY
What does it mean to say that an item of software has been tested? Unfortunately, currently accepted standards are inadequate to give the confidence the user needs and the meaningful objective for the supplier. This paper assesses the currently available standards, mainly in the component testing area, and advocates that the British Computer Society proto-standard should be taken as a basis for a formal standard. The paper is intended for those concerned with software quality.

KEY WORDS: Component testing; Quality assurance; Standards

1. PROBLEM STATEMENT
This paper considers the issue of software testing in the narrow sense, i.e. the execution of the code of a system to attempt to find errors. The objective is to quantify or assess the quality of the software under test, particularly in an objective manner which can therefore be agreed by both a supplier and customer. The reliance placed upon dynamic testing varies significantly from sector to sector. It appears from current drafts that the revision of the avionics standard for safety-critical software depends almost exclusively on testing (RCTA, 1993), while the U.K. Interim Defence Standard (MOD, 1991) places much greater emphasis on static analysis.

Dynamic testing has been widely used within industry since the advent of the first computers. However, the contribution that testing makes to software quality in practice is hard to judge. Everybody producing software will claim that it is 'tested', and therefore one must look beyond such superficial claims. The classic work on testing is that of Myers (1979). In the author's view, little progress has been made in the practical state of the art since the publication of Myers' book. For more recent summaries of the art of testing, see Ince (1991) and White (1987). This paper is an attempt to revisit the issues to see if some advance can be made, preferably so that those who undertake testing can claim some measurable quality objective.

Nomenclature in this topic is not uniform and it is unfortunate that an Alvey document covering this point has not been formally published (Alvey, 1985). However, this gap will be filled by a BCS proto-standard (Graham, 1990), assuming this is published. Although it is known that the absence of bugs can never be demonstrated by testing, it is equally known that effective software testing is a good method of finding bugs.

Would any academic, even those who discount testing, be prepared to fly in an aircraft in which the software had only been subjected to static analysis? On the other hand, there are good statistical reasons for questioning figures for the reliability of 'highly' reliable software (see Littlewood and Strigini, 1993).
Professor Littlewood claims that dynamic testing alone can only justify a reliability figure commensurate with the amount of testing undertaken. This implies that the reliability requirements of some applications areas, such as the most critical avionics systems, must be justified in terms of the quality of the development process. The paper concentrates upon component testing since this is the most incisive testing method, and the method which is best understood, again from Myers. Of course, other testing methods have an important part to play at different points in the life-cycle. The central thesis of testing is that if sufficient effort is put into the testing process, then confidence can be gained in the software, although it cannot guarantee correctness. The confidence, depends, of course, on new tests being undertaken without faults being found. Specialized aspects of testing are not considered in this paper. Examples are as follows. (1) Testing the ‘look and feel’ of a system. This can involve the use of a mouse, a windows environment, etc. This is being studied under an ESPRIT project (MUSiC, 1993). (2) Testing performance. With complex modern architectures, an analysis of performance can be quite difficult. (3) Testing real-time and concurrent systems. These require special techniques, such as in-circuit emulators, which are too specialized to be considered here. 2. TOOLS AND TECHNIQUES 2.1. Research Much of the research in software testing has focused on the coverage of various aspects of software, such as control flow features or data flow features. For example, Hennell et al. (1976) proposed three levels of component testing based on control flow: (1) the execution of statements; (2) the execution of branches (both ways); (3) the execution of ‘linear code sequence and jumps’ (LCSAJs). The degree of testedness of each level is the percentage of items at that level which have been executed during the tests. This hierarchy of so-called ‘test effectiveness ratios’ (TERs) was later extended in the work of Woodward et al. (1980). Rapps and Weyuker (1985) proposed various levels of component testing based on data flow features, specifically the interaction between definitions and uses of variables. A good critique of coverage metrics and an analysis of the theory of testing is to be found in the work of White (1987). In this context, the point of interest is that coverage metrics clearly provide objective measures and can be applied on a routine basis in industry. SOFTWARE COMPONENT TESTING STANDARDS 169 Considering control flow testing further, the purpose of the LCSAJ is to provide an achievable test objective beyond that of branches. Since a program with ‘while’ loops will have infinitely many paths, it is not useful to measure a percentage of paths covered. However, there are a finite number of LCSAJs in a program and hence 100% coverage is feasible (in simple cases, at least). This is the conventional metric for statement, branch or path testing, which is the ratio of items executed to the total in the module. If 100% statement coverage were to be a requirement, this would have to be taken into account in design and coding, since it would exclude defensive programming methods. In practice, the coverage that can be obtained depends critically upon the nature of the code and is not easy to predict in advance. The company called Program Analyses provides the LDRA TestbedTMsystem which allows the above TERs to be determined. 
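To make the idea of a test effectiveness ratio concrete, the following minimal Python sketch (not from the paper) computes statement and branch coverage ratios once some instrumentation tool has recorded which items a test run executed; the module, item names and counts are invented purely for illustration.

```python
# Illustrative only: a toy computation of test effectiveness ratios (TERs).
# The item identifiers and counts are hypothetical, not taken from the paper.

def ter(executed: set, total: set) -> float:
    """Ratio of items exercised during testing to items present in the module."""
    if not total:
        return 1.0  # nothing to cover
    return len(executed & total) / len(total)

# Hypothetical module with 10 statements and 3 branches (each taken both ways).
statements = {f"S{i}" for i in range(1, 11)}
branches = {"B1:true", "B1:false", "B2:true", "B2:false", "B3:true", "B3:false"}

# Items observed during a hypothetical test run, e.g. reported by a coverage tool.
executed_statements = {f"S{i}" for i in range(1, 9)}    # 8 of 10 statements
executed_branches = {"B1:true", "B1:false", "B2:true"}  # 3 of 6 branch outcomes

print(f"TER1 (statement coverage): {ter(executed_statements, statements):.0%}")
print(f"TER2 (branch coverage):    {ter(executed_branches, branches):.0%}")
```

In practice the executed-item sets would be produced automatically by an instrumentation tool of the kind discussed here, rather than being written out by hand.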
Verilog’s LOGISCOPEm and Software Research’s TestWorksTMare able to provide similar testedness metrics. Since these tools work in many environments, there is little impediment to the above measures, or similar, being used in practice. It appears that relatively few organizations require or quote testedness metrics. Those that do are mainly in the safety-critical or security sectors where a major driving force is certification by an independent body for which the testedness metrics have an obvious appeal. The main limitation to the wider use of testedness metrics seem to be that of the lack of an obvious value to the purchaser. If company A purchases software from company B, what value is it to require a specific level of testedness? This question is addressed further in section 6. 2.2. Accredited Testing At first sight, testedness metrics seem to provide an excellent basis for objective testing, which is the main requirement for accredited testing, as administered by NAMAS (the U.K. national service for accreditation of testing services). Several years ago, NAMAS commissioned a study (NATLAS, 1985), performed by Data Logic, to see if such a scheme would be viable. The study concluded that from a theoretical viewpoint, both statement and branch coverage measures would be feasible. The results of the study were presented by NAMAS to industry to see if it required such accredited test services. The conclusion was negative for reasons which have never been totally resolved. The following points indicate some aspects of concern. (1) It is important to check that the output from each test is correct, which can be a significant cost factor. (2) Devising tests to execute all statements or branches can also be expensive. (3) If less than 100% coverage is obtained, the user may think that the software has poor quality. In fact, it merely constitutes inadequate evidence of high quality. (4) If 100% coverage is obtained, the user may place undue reliance on the correctness of the software. This experience seems to indicate a substantial gap between academic research and its application in industry. This topic is addressed further in section 6. 170 B. A . WICHMANN AND M. G . COX 2.3. Guidance The British Standard BS 5887 (BSI, 1988) is a guidance document, as are both (ANSI, 1983) and (ANSI, 1987). Other more general guidance material is available under the TickIT scheme (DTI, 1992), but this does not cover the area in depth. Such guidance material is useful for those undertaking testing but cannot be used for objective measurement or independent tests. An interesting application of testing is in the assessment of an organization using the SEI Maturity Model (Kitson and Humphrey, 1989). The overall aim here is to provide a method of determining the maturity of the software engineering process used within an organization on a five point scale. Whereas TickIT is restricted to the ISO-9OOO concept of quality management, the SEI model is specific to software engineering and therefore could be expected to address testing in some detail. The initial questionnaire used to assess the maturity level of an organization used two questions concerned with regression testing. Although this is probably the most important single aspect from the perspective of overall project management, it does not meet the needs of an objective measure of testedness for software quality. A revision of the SEI Model (Paulk et af., 1991) handles testing in greater depth, but only at level 3 in the model. 
(In fact, the lower levels are confined to management issues.) This implies that no requirements are given for the lower levels of maturity, within which the majority of companies actually fall. The main issues of testing are addressed at level 3, but there is no clear indication as to how a company can be assessed. For instance, the key questions are as follows. ‘The adequacy of testing is determined based upon: (1) the level of testing performed; (2) the test strategy selected; and (3) the test coverage to be achieved.’ At this point, a classic dilemma arises-if the test coverage is high, so will be the costs, but if it is low, software quality may be jeopardized. 3. CURRENT PRACTICE 3.1. General It is not easy to assess current practice, since reports from suppliers naturally reflect best practice. From questionnaire responses given by attendees at a testing conference, Gelperin and Hetzel (1988) concluded that only 5% provide regular measurements of code coverage and only 51% regularly save their tests for reuse after software changes. Gelperin and Hetzel note that the results of this small survey are biased, but that ‘some observers believe that general industry practice is much worse than the survey profile’. The Institute of Software Engineering (in Northern Ireland) reports that only about one quarter of companies use regression testing on a routine basis (Thompson, 1991). Similar, rather less quantified experience has been reported to the author by the Centre for Software Maintenance at Durham University (U.K). They also state that since testing SOFTWARE COMPONENT TESTING STANDARDS 171 is at the end of the life-cycle, there is a tendency for the amount of testing to be reduced to allow the project to complete within budget. The automation of regression testing is a substantial benefit in retaining the quality of a software product over a long period. The problem is that setting up an appropriate automatic test facility can be a significant investment. However, unless this automation is undertaken, the temptation to cut corners by not performing testing on a ‘small’ change is irresistible. No information is available about other forms of testing, such as module and integration testing. It is reasonable to assume that the situation is worse than the current poor position for regression testing. 3.2. Compiler Regression Testing NPL involvement with compiler validation enables us independently to assess the use of regression testing by vendors. It appears that the majority of vendors do regression testing on their compilers both against the appropriate validation suite and also against internal tests. However, several suppliers of validated compilers do not undertake such testing. This significantly reduces the quality of their compilers. Regression testing in some contexts is quite difficult to manage effectively. For instance, if company A provides a system for B which is subsequently to be maintained by B, then handing over the appropriate regression testing technology may be difficult. Moreover, the original tests are likely to have been undertaken at module level, while complete system test may be more appropriate for the maintenance phase. 4. TESTING STANDARDS 4.1. Existing Formal Standards In this section, a brief review of existing formal (i.e. those approved by ‘official’ standards-making bodies) standards is given. BS 5887. This Standard (BSI, 1988) takes the form of guidance material for functional testing of software, i.e. black-box testing of a complete system. 
There are no mandatory requirements and therefore the document has little relevance to this study, which is concerned with the quality implications of a specific level of test. MOD 00-55. The U.K. Interim Defence Standard (MOD, 1991) is highly prescriptive on dynamic testing. The essential requirements are as follows. Test coverage monitor to be used. All statements, and all conditional branches with both outcomes, to be executed during testing. Loops to be executed with 0, 1 and many iterations. Results to be compared with executable prototype. Module tests results to be computed in advance from design information. 172 B. A. WICHMANN AND M. G . COX IEEE. There are two ANSI/IEEE standards on testing (ANSI, 1983; 1987) (and several other standards which refer to testing in quality management and quality assurance). These two standards are as follows. (1) ANSI/IEEE Std 829:1983, Standard for Software Test Documentation. This document is 48 pages long, of which 8 are the main text, the rest being an example and explanatory material. The scope is roughly a ‘completeness check list’, for the documentation itself. It has little direct relevance to this study since it only handles the documentation rather than the content and nature of the tests themselves. (2) ANSI/IEEE Std 1008:1987, Standard for Software Unit Testing. The main text of this standard is 24 pages long with the scope of a ‘standard approach to software unit testing’. However, in spite of its date, this work does not take into account the work of Myers (1979) in classifying the forms of white-box testing. Moreover, the question of whether the testing of a unit has been performed in conformity to the standard is not really addressed. Indeed, the standard could be considered as a ‘guideline’ rather than a true standard. JM178B (draft). This document is the current revision of the general avionics standard for safety-critical software used internationally by both the industry and the certification bodies (RCTA, 1993). The document has an objective that statements of conditions (both ways) should be executed during testing. However, there appears to be no requirement that this objective should be attained. There is an interesting additional requirement to ensure that the testing based upon the source code is not defective in that the same structural testing based upon the object code would require more tests. (This implies that if a compiler generates a loop which was not in the source text, the testing must be undertaken in just the same way as if the user had written the loop.) IEC/WGB. This International Electrotechnical Commission (IEC) proposed standard (IEC, 1989) recommends but does not require boundary value analysis and path coverage. The term ‘path coverage’ is rather unfortunate, since on all but trivial examples, complete coverage of paths is impossible. Hence the possibility of executing all statements, which is often feasible but difficult, is not considered. German work on the same area of safetycritical software uses several hardware oriented testing methods specified by Holscher and Rader (1986). IEChJuclear. This international standard for nuclear safety software (IEC, 1986) has an annex concerned with software testing. This covers both systematic and random testing. Table E4.b on page 111 of the standard requires that all statements and branches be executed as part of the testing process. ITSEC. 
The current version of the Information Technology Security Evaluation Criteria (CESG, 1991) does not specify actual testing methods (this is under development). However, the higher levels of conformance do require the provision of information that is necessary for independent white-box testing.

4.2. The BCS Proto-standard
The British Computer Society Software Testing Specialist Group has been developing over about three years a highly demanding and quantitative standard for software component testing. The intention of the Group is to submit the standard to the British Standards Institution once the method has been shown to be effective in practice by the members of the Group. The comments here are based upon the current draft (Graham, 1990). Although it is a draft, it is more complete and comprehensive in its treatment of testing than the other formal standards noted here. In particular, it takes into account all the classic methods of testing and even shows the relationships between them.

The broad objective of the document is to provide a rigorous standard against which conformance can be judged. For instance, given an item of software for which two parties have done the same type of testing under this standard, the actual differences in the tests performed should be minimal. For this reason, random testing is excluded, although an informative annex describes that method. It must be admitted that two good people undertaking the same testing method are likely to get different results. However, the broad objective is required if a sounder basis of testing is to be established. It is only then that statements such as '100% coverage with method A' become generally meaningful.

Fourteen types of testing, based upon coverage of various attributes, are listed in the proto-standard. They are as follows.

Equivalence partitioning. A black-box testing method that uses one representative for each class of input data that is handled differently according to the specification.
Cause-effect graphing. Based upon a limited-entry decision table produced from the specification, from which a test is derived for each entry.
Boundary testing. A black-box testing method that uses a representative of each equivalence class which is a boundary value to another class (or error situation).
Syntax directed testing. Testing based upon the syntax of the input data.
State transitions testing. Testing based upon each valid internal state change.
Statement testing. White-box testing of each statement in the source code.
Branch testing. White-box testing of each branch in the source code.
Branch condition testing. White-box testing of branches that is slightly stronger than branch testing, since each predicate within a compound predicate in a branch statement must be tested. For example, 'if a < b and c = d then' implies two tests.
Branch condition combination testing. Considers combinations of conditions and eliminates impossible combinations.
Linear code sequence and jump (LCSAJ) testing. Determines LCSAJs and produces test cases for the feasible ones.
Path testing. White-box testing based upon the use of all paths.
Data definition-use coverage. Ensures coverage of all data definition-use pairs.
Data definition computational use coverage. Restricts use to computational use.
Data definition predicate use coverage. Restricts use to use within predicates.

The standard can reasonably be described as a complete specification of existing test methods meeting the requirement of objectivity.
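As an illustration of the first and third of these methods, here is a minimal Python sketch (not part of the proto-standard) that applies equivalence partitioning and boundary testing to a hypothetical routine accepting an integer mark in the range 0 to 100; the routine, partitions and boundary values are all invented for illustration.

```python
# Illustrative only: equivalence partitioning and boundary testing applied to a
# hypothetical specification: accept_mark(m) is valid for integer marks 0..100.

def accept_mark(m: int) -> bool:
    """Hypothetical unit under test: a mark is valid if it lies in 0..100."""
    return 0 <= m <= 100

# Equivalence partitioning: one representative per class of input handled
# differently by the specification (below range, in range, above range).
partition_cases = {-5: False, 50: True, 150: False}

# Boundary testing: representatives on the edges between equivalence classes.
boundary_cases = {-1: False, 0: True, 100: True, 101: False}

for cases in (partition_cases, boundary_cases):
    for value, expected in cases.items():
        actual = accept_mark(value)
        assert actual == expected, f"accept_mark({value}) = {actual}, expected {expected}"

print("All equivalence-partition and boundary-value cases passed.")
```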
The only significant problem with the standard is determining how it can be applied in practice. Even in the most critical contexts, applying all 14 methods would be excessive. Hence there is a need to provide: 174 B. A. WfCHMANN AND M. G. COX (a) guidance on which methods to apply; (b) a summary of the strengths and weaknesses of each method; (c) the format for a ‘testing statement’ which should be used to report the testing actually performed on a component; (d) guidance for the formulation of appropriate requirements for testing to be used by purchasers. The existing document makes an attempt (in Appendix G) to provide ‘Proformas’ designed for stating testedness (actually achieved or to be required of a supplier). However, this part of the standard is clearly less mature and will require further work to make it acceptable to industry. A points system is proposed which seems quite arbitrary. The comprehensiveness and technical detail of the standard makes it hard for nontechnical management to grasp. This gap must be filled-top management will not sanction significant resources for testing unless the benefits are clearly understood. It does not seem feasible at this stage for the standard to lay down the degree of testing to be undertaken, since this will depend upon the criticality of the application, and it would be hard to obtain agreement within industry. However, the BCS standard appears to provide the measurement framework for the quantification of testing. Hence if a supplier makes a statement about the degree of testedness, this should be capable of independent validation. Wider use of the standard should allow industry to determine the amount of testing required in each context. 5. SPECIALIZED TESTING Testing makes a serious contribution to software quality, yet this is hard to quantify. Moreover, the objective testedness metrics are not widely used or accepted. Clearly, if stronger links of testedness metrics to software quality could be established empirically this would help. This section undertakes a further technical analysis of the potential contribution that objective software testing can make to software quality. The analysis is undertaken by means of examples in different application areas. In most cases, it is possible to claim that the specific techniques have a measurable effect on quality. 5.1. Data Validation Modules A significant class of commercial software consists of ‘data validation’ programs which check initial input. By the very nature of such programs, it is straightforward to construct test data that will exercise the statements or branches as desired. It may be equally important to access all fields of a record that need not be directly related to the statements (i.e. perform data-dependent testing as opposed to statement dependent testing). However, if the validation is to be performed upon internal data, say from another file, then data validation may well require the construction of special data files. Most commercial systems have special utilities to construct such files. If the internal input data cannot be constructed by such a utility or test harness, then a special program may be needed to perform the testing required. For this class of module, there is little reason not to provide test data to ensure SORWARE COMPONENT TESTING STANDARDS 175 execution of all statements and perhaps all branches. 
The only potential problem is the large number of individual test cases needed, but Graham (1991) has shown that this can be handled well by test harness software. 5.2. Numerical Programs ‘The algorithms and [numerical] software community does not have a tradition of highquality performance evaluations. . .in a survey of over 50 papers to evaluate algorithms and software for a certain class of problems, none of the papers was found to have used consistently good experimental techniques; most were found to be poor in many respects.’ (Rice, 1981) Education is required of the numerical software industry, although there are important pockets of expertise, e.g. at the Numerical Algorithms Group Ltd., the National Physical Laboratory and Argonne National Laboratory, particularly in regard to the development of mathematical software libraries and packages and other library-standard software. Expectations of what software testing can undertake automatically are too high: ‘One particular company was distressed to discover that existing tools were unable to take an arbitrary piece of real-time software and perform a systematic test entirely automatically. The fact that the software producers were expected to participate was a major blow.’ (Hennell, 1991) One would hope that such naivety is rare in industry and is, of couse, unknown in the better companies. Numerical software is commonly tested using fixed agreed data sets. This, the most obvious form of black-box testing, is eminently sensible in that the approach mimics the use of the software in the field. It is analogous to the common practice of testing measuring instruments with ‘known’ artefacts, where the artefacts have similar properties to those of the production workpieces that are to be measured in the inspection room (Peggs, 1991). Difficulties are presented by data-sets testing due to properties of the computer’s floating-point unit (different wordlengths, different rounding rules, etc., that vary across machine ranges) and different compilers (various orders of executing expressions, nonoptimizing or optimizing, etc.). To avoid confounding errors from a number of sources, as far as possible the numerical software tester prefers to eliminate or minimize these influences by first ensuring the software is used in an environment in which the floatingpoint unit and the compiler have been validated (Du Croz and Pont, 1984; Wichmann and Ciechanowicz, 1983). Some of the best available software has been tested by means of data sets that seriously stretch the software. All routines in the NAG Library (Ford et al., 1979), DASL (Anthony and Cox, 1987; Cox, 1987), LINPACK (Dongarra et al., 1979) and EISPACK (Garbow et al., 1977; Smith et al., 1976) have been so tested, in addition to the use of more typical data. 176 B. A . WICHMANN AND M. G . COX 5.2.1. Reference software Reference software is software developed to solve exactly the same problem as the software under test. It differs from field software in that it will have been written to extremely stringent requirements that will, for example, sacrifice speed of execution for reliability, or have considerably greater memory requirements. Reference software is therefore not just a question of design diversity, but a means of achieving quality at a significantly higher cost than could perhaps be justified in the field. Reference software is designed (hopefully) to have a high probability of computing the correct result corresponding to input data within its domain of applicability. 
It can then be used as the basis for data-sets testing, viz. to generate reference results for a range of data sets. For a range of metrology software, reference software was constructed independently by PTB and NPL (NPL, 1991; Cox, 1992) using totally different underlying algorithms and the results on data sets were compared before the reference software was accepted. 5.2.2. Reference data sets Reference data sets are synthesized data sets that mimic actual problem data. They often also include ‘difficult’ data sets (e.g. data sets at the boundary of the domain of applicability of the algorithm under test). In conjunction with reference software they constitute, in addition to testing, a valuable tool for product development and permit the simulation of almost any attribute required of real data. Reference data sets are used as follows. (1) Apply the reference software to the reference data sets to produce reference results. (2) Apply the software under test to the reference data sets to produce test resulfs. (3) Carry out a comparison of test and reference results. For specific problems, the data sets may include mathematical functions. For example, in numerical integration a set of ‘model’ definite integrals with various mathematical behaviour, and known analytical solutions and hence precise numerical answers, has been used to test quadrature software (Casaletto ef al., 1969). By making the reference data sets varied in character, many of the properties of the software can be checked, as follows. (a) Various paths in the software, e.g. to handle normal and exceptional cases, can be explored. (b) Weaknesses, e.g. unstable numerical methods, can be exposed. (c) Restrictive algorithms, i.e. with a limited domain of applicability, can be identified. Reference data sets have been used very successfully in measurement science to test software widely used for least-squares geometric-element fitting (Drieschner et al., 1991; Porta and Waeldele, 1986). 5.2.3. Data generators Reference data sets can be constructed manually, but this is a time-consuming and error-prone task. Such time is better spent in designing software-data set generators- SOFTWARE COMPONENT TESTING STANDARDS 177 to produce the data sets automatically. The generators are controlled by various parameters that relate to required general properties of the data sets and also to any specific features required. The range of parameters should be such that ‘severe’ data sets can be generated in order to test the software at the extremities of its specification (Anthony and Cox, 1984; Myers, 1979). A further advantage of using a data generator is that in some instances the generator can be designed to derive data for which the correct solution is known a priori, thus obviating the need to develop reference software. As an example, consider testing regression (linear or nonlinear least-squares optimization) software. A data generator can be constructed that, given a set of parameters defining a regression function (a straight line is one of the simplest cases), a set of data points can be determined with the property that the best-fitting function to the data set is the prescribed regression function. Moreover, the data generator can be controlled in various ways (in terms of the number of data points and the scatter of the data about the function). 5.2.4. Floating-point error analysis and perturbation analysis Floating-point error analysis and perturbation analysis per se (Wilkinson, 1963; 1965) fall outside the scope of this paper. 
These disciplines are relevant, however, when the results of such an analysis are used to support data-sets testing. Very many of the underlying algorithms have provable mathematical properties when implemented in floating-point arithmetic, and so prior knowledge of the degree of agreement between the observed result and the expected result is often available. The computed results can readily be checked for the extent to which they satisfy the problems to which they purportedly correspond. 5.2.5. Modularity Software testing is greatly simplified when maximum use is made of previously tested modules. Extensive use of modularity is made in the numerical software libraries NAG (Ford et al., 1979) and DASL (Anthony and Cox, 1987; Cox, 1987), for instance. The testing process can then to a great extent concentrate on newly produced software and on its interfacing to the tested modules. Some of the advantages of individually-testable modules and the extensive re-use of modules have been reported (Jacobs and Markham, 1990). 5.2.6. The NAG Library approach A seminal paper (Ford et al., 1979) on the NAG Library ‘machine’ describes an integrated approach to software development which includes testing as one facet. This approach is in use today at NAG, which ports its library to over 60 platforms in single and double precision versions. As far as possible, high quality is engineered into NAG products (considerable defensive programming, highest-possible portability, use of software transformation tools, etc), although considerable testing is undertaken for each distinct implementation. Each implementor is required to ensure that the software-which passes as many times as necessary through a cycle involving an author, a contributor and 178 B. A . WICHMANN AND M. G . COX a validator-xecutes correctly for a wide range of data sets, including boundary cases (Myers, 1979). Toolpack (Iles, 1984) and other software transformation tools are used to produce variants automatically for different machine ranges and to assist with language-language translation. The overall effect of the approach is to reduce significantly the amount of subsequent testing that would otherwise be necessary. Nevertheless, the use of stringent test programs or demonstration programs forms a very significant aspect of NAG software development. As far as possible, all modes of operation of the software including all programmed failure exits are tested, and extreme cases are included in an effort to make the software fail. In particular, all available knowledge of the problem that the software attempts to solve is considered in order to design suitable data sets that try to explore as many paths as possible (see below). The development of suitable demonstration programs is itself challenging and can require an effort comparable to that of producing the routine itself (Smith, et al., 1974). 5.2.7. Use of problem solution properties The algorithm implemented by the software will introduce errors of various types. It is possible to capitalize on the properties of some such errors in order to assess the degree of immunity of the software under test to them. For example, one class of error is resfufemenferrors, which arise from a restatement of the problem to be solved, and reflect the problem’s invariance properties. Simple examples are: the effects of scaling the data, translating the data and permuting the data in a case when theoretically an algorithm is invariant with respect to such data changes. 
The results of such data changes can often be readily predicted and the results obtained on such data sets can be compared. A very simple change to the data can be extremely effective in testing adaptive quadrature routines. Different estimates of ∫ f(x) dx and −∫ −f(x) dx will in general be produced. If the required integration error is to be bounded by δ, say, and the results of the two applications of the quadrature routine differ by more than 2δ, the routine is not performing to specification. The different results may arise because the quadrature scheme subdivides the interval in a directed manner, e.g. from left to right.

The mathematical properties of the solution can also obviate the need for major aspects of data-sets testing. This use, termed self-validation or self-checking (Anthony and Cox, 1984), is not yet in wide use, despite the fact that it offers considerable potential. It involves providing confirmation that the computed solution satisfies the original mathematical problem statement. As a simple example, does the sphere computed to pass through four given non-coplanar points actually do so? A check therefore is the extent to which the four data points satisfy (a suitable form of) the sphere equation for the calculated radius and centre coordinates. Self-validation processes are sometimes straightforward to implement by the software development engineers as a check on their own software. Particular advantages are that the 'true' result is not required and that the test is automatically carried out every time the software is executed. Such checks may not be as appropriate as the use of data sets for third-party testing, however. The approach has been used for checking roundness-assessment software (Anthony and Cox, 1986) according to the national standard (BSI, 1982), and will become increasingly relevant in the context of self-validating numerical algorithms within mathematical sub-routine libraries (Linz, 1991).

5.2.8. Importance to standards and embedded systems: software traceability
As virtually all scientific and technological fields become increasingly dependent on the computer, so numerical software assumes ever greater importance. Many products are now required to be produced or assessed by systems that are traceable to national, international or industrial standards. Since such systems often have an integrated computer and embedded numerical software, the software traceability of this part of the system also needs to be addressed. There are various aspects that need to be covered in order to ensure that software can be considered traceable. A partial list of considerations is: (a) a rigid mathematical basis; (b) good numerical analysis; (c) careful algorithm design; (d) sound software implementation; and (e) an authoritative testing strategy. It is recognized that a major step towards traceability will result from maximizing the use of standard computational modules.

5.3. Testing Numeric Operations
Schryer reported (Schryer, 1981) many years ago that significant bugs can appear in the basic floating-point operations of machines. With large word-length machines, it is easy to see that exhaustive testing is impossible (more than 10^30 test cases!). Moreover, naive random testing is also ineffective in locating bugs. (Schryer reports that his test program discovered a bug in a few seconds as opposed to many centuries for the naive random tests then being undertaken by the supplier.)
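A rough Python sketch of the effect Schryer describes, under invented assumptions: a routine with a deliberate fault at a single extreme operand value is essentially never caught by naive uniform random tests, but is caught quickly once the test values are biased towards the end points. The faulty routine, the operand ranges and the bias scheme are all hypothetical.

```python
# Illustrative only: why naive random testing rarely reaches the special cases.
# buggy_abs() deliberately misbehaves at the most negative 32-bit value, mimicking
# an end-point fault; the biasing scheme is invented for this sketch.
import random

INT_MIN, INT_MAX = -2**31, 2**31 - 1

def buggy_abs(x: int) -> int:
    """A hypothetical faulty absolute-value routine: wrong only at INT_MIN."""
    return x if x == INT_MIN else abs(x)

def faults_found(values) -> int:
    """Count how many test values expose the fault (result differs from abs)."""
    return sum(1 for v in values if buggy_abs(v) != abs(v))

random.seed(1)  # repeatable, as an objective test method would require

naive = [random.randint(INT_MIN, INT_MAX) for _ in range(100_000)]

# Biased tests: mostly uniform, but a fraction drawn from known trouble spots.
trouble_spots = [INT_MIN, INT_MIN + 1, -1, 0, 1, INT_MAX]
biased = [random.choice(trouble_spots) if random.random() < 0.1
          else random.randint(INT_MIN, INT_MAX) for _ in range(100_000)]

print("faults found by naive random tests: ", faults_found(naive))
print("faults found by biased random tests:", faults_found(biased))
```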
More recent experience confirms these findings (Du Croz and Pont, 1984). The main problems with numeric operations occur at the end points, and also with special cases in the conventional algorithms. It is therefore quite easy to bias random tests in the areas most likely to cause problems. The algorithms are well-known which gives an insight into the most effective test cases. Experience with integer testing reveals a similar pattern (Wichmann, 1991). On a two’s complement machine, the additional most negative value causes problems since it is the only value with overflow on negate, absolute value, multiplication by -1 and division by -1. The conclusion from this is that purely random tests are unlikely to be effective for simple algorithms, but carefully biased random tests can be very effective. (For this reason, the Pascal Program Generator (Wichmann and Davies, 1989) biases the random integers it uses.) 5.4. Compiler Validation Testing compilers is at the other end of the spectrum of logical complexity. No production compiler can be expected to be bug-free yet bugs couId severely impact other software projects, so some testing is clearly important. 180 B. A. WICHMANN AND M. G. COX Modern language standards make requirements on language processors which allow very effective black-box testing of compilers. This trend started with Pascal, the results of which have been reported by Wichmann and Ciechanowicz (1983). The conclusions from this work can be summarized as follows. An essential requirement for good testing is a very accurate specification. Language standards that have been subject to international review are typically just good enough. Testing can only be undertaken effectively if the specification is written with a view to testability. For instance, if a specification makes no mention of the handling of incorrect input, testing is not feasible. Programming language standards are generally good from the point of view of testing, but there are exceptions. A vital part of a program specification is the handling of incorrect input. Hence, the language standards for FORTRAN and COBOL are inadequate for good testing since they have no requirement to reject non-conforming programs. Error guessing is a vital aspect of devising good functional tests. The Pascal Validation Suite was checked for ‘completeness’ by ensuring that all statements in a Pascal compiler front-end were executed (Ciechanowicz and De Weever, 1984). Although these tests were specific to this front-end, it did ensure a good coverage. A test suite that remains fixed for several years does not perform a useful purpose. Certain language features in Pascal proved to be much more difficult to test than was initially thought. For instance, actual bugs in handling the for-statement continued to appear in validated compilers for many years causing additional tests to be added. It has not proved possible to test the back-end or code generator part of a compiler as completely as the front-end. This issue has been considered by Wichmann and Davies (1989) and is considered in the next sub-section. 5.5. Compiler Code Generator It is difficult to test the code generator part of a compiler. The main reason for this is that the only practical way to generate test cases is via the front-end of the compiler. However, the mapping between the source code and the input to the back end is nontrivial, so that there is no straightforward way to execute a specific statement (say) of the back end. 
This problem is compounded by the machine-dependent aspect of the back end. As an example of the problems, consider the following bug which was present in the Algol W compiler for the first ten years of its release without its ever being reported as a bug. The compiler was machine-specific, generating code for the IBM 360 series. The classic Algol display was held in as many of the 16 general purpose registers as required. (The compiler limited programs to a nested block depth of 16.) Any registers left were allocated to the evaluation of expressions. Unfortunately, there was a bug in the register allocation routine when an odd-even pair of registerk were needed for integer division and the available registers were nearly, but not quite, exhausted. The point about this bug was that the risk of its occurring in a user program was very low. Conventional test methods would also have difficulty in locating this problem. It SOFTWARE COMPONENT TESTING STANDARDS 181 was only discovered during a re-write of the back-end when the actual source code of the compiler was reviewed. However, it appears that this form of bug is not unusual in compilers. The question arises as to the best method to detect such compiler bugs. The obvious answer is to generate random, self-checking programs as originally proposed by Hanford (1970). The method of checking the code generator of a compiler by means of randomly selected programming constructs is a highly specialized example of random test case generation. As noted here, it can be very effective in finding bugs in compilers, but its use elsewhere will depend upon the random selection process and the ease with which test cases can be checked. The self-checking of the compiler test cases makes the method highly effective. Random testing is not one of the methods in the BCS standard since, in general, it is not repeatable. Other experience of this method with compilers is equally encouraging (Bazzichi and Spadafora, 1982; Bird and Munoz, 1983). Experience with the Pascal and Ada versions of this technology is encouraging: it locates bugs similar to the Algol W one noted above. Moreover, one compiler-writer (Wichmann and Davies, 1989) has stated that the bugs found could arise in ordinary user’s programs in spite of the generated test cases being very unlike user-code. The general conclusion from this is that carefully constructed random tests are effective in locating bugs. For another example of random testing of Unix utilities, see Miller ef al., (1990). 6. CONCLUSIONS To be useful for a measured approach to software quality, a testing standard should be as follows. (1) It should be objective. The subjective nature of informal testing undertaken by most suppliers is such that the phrase ‘it has been tested’ is meaningless. Repeatable tests with clearly defined results must underlie the testing process. (2) It should be enough for users. Any formalized system must provide enough guarantees that even the minimal level of testing ensures a significant evidence of software quality. (3) It should be practical for suppliers. It is easy to specify a level of test that is totally uneconomic. A simple and measurable statement of what to do, why and when (in the software lifecycle) is needed. (4) It should match the perceived risks. There is a wide range of software, and any testing approach that only addresses the most critical software will have little impact on the majority of software developers. (5) It should have conceptual simplicity. 
Effective testing requires resources. In consequence, all parties must have a clear understanding of the implications of the level of testing envisaged, in both costs and benefits. The white-box testedness metrics (like statement coverage) have a proven value. The problem is that the cost of obtaining 100% coverage of even the statement metric can be quite high. Hence any testing statement (or policy) would have to take into account the logical complexity of the program. How can this be done? Merely counting jumps and other control structures is not sufficient. The control structure of a validation program 182 B. A. WICHMANN AND M. G. COX ~ can be quite complex but is typically easy to test, while exactly the opposite is true for a compiler code generator. With the fourteen test techniques catalogued in the BCS standard, it would be highly advantageous if their relative strength could be determined. However, apart from the inclusion relationship (e.g. branch testing implies statement testing), any other relationship is likely to be very approximate. Of equal interest, and rather easier to determine, is the relative cost of the test techniques (which depends upon the degree of tool support). The approach advocated here is to develop Appendix G of the BCS proto-standard to meet the five points above. Some comments on the changes needed are as follows: It should be objective. The BCS proto-standard has been designed from the outset to be objective, and hence this requirement is already satisfied. It is for this reason that the BCS work is being advocated as the starting point. It should be enough for users. It is easy to specify high testedness metric values that would give the users the necessary guarantees. For bespoke software, users have a responsibility to specify the testing requirements explicitly. It should be practical for suppliers. This is a major problem due to the cost implications of high testedness metric figures. However, this is only a real problem if the testedness ratios are specified in advance. It should match the perceived risks. In principle, this is easy to achieve within the framework of the BCS standard, since the metrics used can be based upon the risks. It should have conceptual simplicity. The fourteen testing methods defined within the standard are very confusing for the non-expert. However, if the standard is mainly used to define the testing undertaken after the event, then only the relative few testing methods actually used need be considered. eventual goal with a development of the BCS standard would be to specify the testing in advance of the software development. In general, this must be the responsibility of the software procurer (or developer for products), and hence the BCS standard can only provide a framework for this specification. The criterion of objectivity has resulted in two test methods being rejected which have been shown to be effective. These are as follows. (i) Random testing. This could be made objective by the use of a specific method to obtain the randomness, such as the (repeatable) pseudo-random number generator used by Wichmann and Davies (1989). If the random data used can be shown to be representative of actual use, the reliability claims can be made, based upon the testing as noted by Littlewood and Strigini (1993). (ii) Error guessing. It would appear impossible to make such a method objective. Indeed, the experience of the tester is vital here. 
Acknowledgements

This paper has benefited from critical comments from many people, including Roger Scowen (NPL), Nick North (NPL), David Schofield (NPL), John Kershaw (RSRE), Dorothy Graham (Grove Consultants), Brian Marwick (Testing Foundation), Richard Hall (GEC Avionics) and Martyn Ould (Praxis). The work has been undertaken with support from the Department of Trade and Industry's Software Quality Unit. The views expressed here are those of the authors and not necessarily those of the reviewers noted above. Numerous comments from the three referees have, it is hoped, improved the presentation of the material.

References

Alvey (1985) 'Glossary of terms (Deliverable A16)', Alvey Test Specification and Quality Management Project SE/031.
ANSI (1983) ANSI/IEEE Std 829:1983, Standard for Software Test Documentation.
ANSI (1987) ANSI/IEEE Std 1008:1987, Standard for Software Unit Testing.
Anthony, G. T. and Cox, M. G. (1984) 'The design and validation of software for dimensional metrology', Technical Report DITC 50/84, National Physical Laboratory, Teddington, U.K.
Anthony, G. T. and Cox, M. G. (1986) 'Reliable algorithms for roundness assessment according to BS 3730', in M. G. Cox and G. N. Peggs (eds), Software for Co-ordinate Measuring Machines, National Physical Laboratory, Teddington, U.K., pp. 30-37.
Anthony, G. T. and Cox, M. G. (1987) 'The National Physical Laboratory's Data Approximation Subroutine Library', in J. C. Mason and M. G. Cox (eds), Algorithms for Approximation, Clarendon Press, Oxford, U.K., pp. 669-687.
Bazzichi, F. and Spadafora, I. (1982) 'An automatic generator for compiler testing', IEEE Transactions on Software Engineering, 8 (4), 343-353.
Bird, D. L. and Munoz, C. U. (1983) 'Automatic generation of random self-checking test cases', IBM Systems Journal, 22 (3), 229-245.
BSI (1982) BS 3730: Assessment of Departure from Roundness, British Standards Institution, London, U.K.
BSI (1988) BS 5887: Code of Practice for Testing of Computer-based Systems, British Standards Institution, London, U.K.
Casaletto, J., Pickett, M. and Rice, J. (1969) 'A comparison of some numerical integration programs', SIGNUM Newsletter, 4 (3), 30-40.
CESG (1991) 'Information technology security evaluation criteria: provisional harmonised criteria', Version 1.2. (U.K. contact point: CESG Room 2/0805, Fiddlers Green Lane, Cheltenham, Glos, GL52 5AJ.)
Ciechanowicz, Z. J. and De Weever, A. C. (1984) 'The "completeness" of the Pascal Test Suite', Software-Practice and Experience, 14 (5), 463-471.
Cox, M. G. (1987) 'The NPL Data Approximation Subroutine Library: current and planned facilities', NAG Newsletter, 2/87, 3-16.
Cox, M. G. (1992) 'Improving CMM software quality', Technical Report DITC 194/92, National Physical Laboratory, Teddington, U.K.
Dongarra, J. J., Moler, C. B., Bunch, J. R. and Stewart, G. W. (1979) LINPACK User's Guide, Society for Industrial and Applied Mathematics, Philadelphia, U.S.A.
Drieschner, R., Bittner, B., Elligsen, R. and Waeldele, F. (1991) 'Testing coordinate measuring machine algorithms: Phase II', Technical Report BCR EUR 13417 EN, Commission of the European Communities.
DTI (1992) 'TickIT: Making a better job of software', Guide to Software Quality Management System Construction and Certification using EN 29001, Issue 2.0.
Du Croz, J. and Pont, M. (1984) 'The development of a floating-point validation package', NAG Newsletter, 3/84, 3-9.
Ford, B., Bentley, J., du Croz, J. J. and Hague, S. J. (1979) 'The NAG Library "machine"', Software-Practice and Experience, 9 (1), 65-72.
Garbow, B. S., Boyle, J. M., Dongarra, J. J. and Moler, C. B. (1977) Matrix Eigensystem Routines: EISPACK Guide Extension, Lecture Notes in Computer Science, Vol. 51, Springer-Verlag, Berlin, Germany.
Gelperin, D. and Hetzel, B. (1988) 'The growth of software testing', Communications of the ACM, 31 (6), 687-695.
Graham, D. R. (ed.) (1990) A Standard for Software Component Testing, Version 1.2, British Computer Society Specialist Group in Software Testing.
Graham, D. R. (1991) Computer Aided Software Testing: CAST Report, Unicorn Seminars, London, U.K.
Hanford, K. V. (1970) 'Automatic generation of test cases', IBM Systems Journal, 9 (4), 242-257.
Hennell, M. A., Woodward, M. R. and Hedley, D. (1976) 'On program analysis', Information Processing Letters, 5 (5), 136-140.
Hennell, M. A. (1991) 'How to avoid systematic software testing', Software Testing, Verification and Reliability, 1 (1), 23-30.
Holscher, H. and Rader, J. (1986) Microcomputers in Safety Technique, TUV Study Group on Computer Safety, Verlag TUV Bayern, TUV Rheinland, Germany.
IEC (1986) IEC 880:86: Software for Computers in the Safety Systems of Nuclear Power Stations.
IEC (1989) IEC/SC65A/WG9/45: Software for Computers in the Application of Industrial Safety-related Systems, 3rd draft.
Iles, R. (1984) 'Toolpack support for Fortran programmers', NAG Newsletter, 2/84, 16-21.
Ince, D. (1991) 'Software testing', in J. McDermid (ed.), Software Engineer's Reference Book, Chapter 19, Butterworth-Heinemann.
Jacobs, D. A. H. and Markham, G. (1990) 'Experiences with some software engineering practices in numerical software', in M. G. Cox and S. J. Hammarling (eds), Reliable Numerical Computation, Oxford University Press, Oxford, U.K., pp. 277-296.
Kitson, D. H. and Humphrey, W. S. (1989) 'The role of assessment in software process improvement', Software Engineering Institute, SEI-89-TR-3.
Linz, P. (1991) 'Algorithms for the next generation of numerical software', NAG Newsletter, 2/91, 3-6.
Littlewood, B. and Strigini, L. (1993) 'Validation of ultra-high dependability for software-based systems', Communications of the ACM, to be published.
Miller, B. P., Frederiksen, L. and So, B. (1990) 'An empirical study of the reliability of UNIX utilities', Communications of the ACM, 33 (12), 32-44.
MOD (1991) Interim Defence Standard 00-55: The Procurement of Safety Critical Software in Defence Equipment (Part 1: Requirements; Part 2: Guidance), Ministry of Defence.
MUSiC (1993) Metrics for Usability Standards In Computing, ESPRIT II Project 5429 (Contact: M. Kelly, Brameur, or N. Bevan, NPL).
Myers, G. J. (1979) The Art of Software Testing, Wiley, New York.
NATLAS (1985) Software Unit Test Standard and Method, N19.
NPL (1991) 'LSGE: Package of algorithms for least-squares geometric elements', Document ITC h 069, National Physical Laboratory, U.K.
Paulk, M. C., Curtis, B. and Chrissis, M. B. (1991) 'Capability maturity model for software', CMU/SEI-91-TR-91.
Peggs, G. N. (1991) 'A review of the methods for the accurate metrology of complex three-dimensional components and artefacts', Technical Report MOM 101, National Physical Laboratory, Teddington, U.K.
Porta, C. and Waeldele, F. (1986) 'Testing of three coordinate measuring machine evaluation algorithms', Technical Report BCR EUR 10909 EN, Commission of the European Communities.
Rapps, S. and Weyuker, E. J. (1985) 'Selecting software test data using data flow information', IEEE Transactions on Software Engineering, 11 (4), 367-375.
RTCA (1993) 'Software considerations in airborne systems and equipment certification' (DO-178B), Requirements and Technical Concepts for Aviation, 1140 Connecticut Avenue, N.W., Suite 1020, Washington, DC 20036, U.S.A.
Rice, J. R. (1981) Matrix Computations and Mathematical Software, McGraw-Hill, New York, U.S.A.
Schryer, N. L. (1981) 'A test of a computer's floating-point unit', Computer Science Technical Report No. 89, AT&T Bell Laboratories, Murray Hill, New Jersey, U.S.A.
Smith, B. T., Boyle, J. M. and Cody, W. J. (1974) 'The NATS approach to quality software', in Software for Numerical Mathematics, Academic Press, London, U.K., pp. 393-405.
Smith, B. T., Boyle, J. M., Dongarra, J. J., Garbow, B. S., Ikebe, Y., Klema, V. C. and Moler, C. B. (1976) Matrix Eigensystem Routines - EISPACK Guide, Lecture Notes in Computer Science, Vol. 6, 2nd edn, Springer-Verlag, Berlin, Germany.
Thompson, K. (1991) 'A method for assessing organisational software development capability', EuroCASE III Conference.
White, L. J. (1987) 'Software testing and verification', in M. Yovits (ed.), Advances in Computers, Vol. 26, Academic Press, pp. 335-391.
Wichmann, B. A. and Ciechanowicz, Z. J. (eds) (1983) Pascal Compiler Validation, Wiley.
Wichmann, B. A. and Davies, M. (1989) 'Experience with a compiler testing tool', NPL Report DITC 138/89.
Wichmann, B. A. (1991) 'The Language Compatible Arithmetic Standard and Ada', NPL Report DITC 173/91.
Wilkinson, J. H. (1963) Rounding Errors in Algebraic Processes, Notes in Applied Science No. 32, Her Majesty's Stationery Office, London, U.K.
Wilkinson, J. H. (1965) The Algebraic Eigenvalue Problem, Clarendon Press, Oxford, U.K.
Woodward, M. R., Hedley, D. and Hennell, M. A. (1980) 'Experience with path analysis and testing of programs', IEEE Transactions on Software Engineering, 6 (3), 278-285.
Explanation & Answer

Software testing and standards.
The course ID and course name.
Name of the student and student ID:

Software testing and standards.
Introduction.
Software testing is one of the most critical parts of the software development process because it provides assurance that the software can perform its intended tasks. Software testing also checks that the developer has met all the requirements outlined by the software owner, and it reduces the additional cost the purchaser might otherwise incur, because accurate test documentation explains the software's functionality. This report provides an analysis of the ISO/IEC/IEEE 29119 software testing standard. ISO/IEC/IEEE 29119 is a series of internationally agreed standards for software testing within systems and software engineering.
Questions for the standard.
1. Name of the standard.
The ISO/IEC/IEEE 29119 software testing standard.
2. Who holds the copyright for the standard?
ISO (the International Organization for Standardization).
3. Universities which were involved.
King Faisal University, College of Computer Science and Information Technology,
Department of Computer...
