How to Improve Safety Critical Systems Standards
Norman Fenton
Centre for Software Reliability, City University, London
Abstract
1 Introduction and Background

Between 1990 and 1994 researchers at CSR City University were involved in a collaborative project (SMARTIE) whose primary objective was to propose an objective method for assessing the efficacy of software engineering standards [Pfleeger et al 1994]. The method was based on the simple principle that a software standard is effective if, when used properly, it improves the quality of the resulting software products cost-effectively. We considered evidence from the literature and also conducted a small number of empirical studies of specific company standards. We found no evidence that any of the existing standards are effective according to our criteria. This will come as no surprise to anybody who has sought quantitative evidence about the effectiveness of any software engineering method or tool. However, what concerned us more was that, in general, software engineering standards are written in such a way that we could never determine whether they were effective or not. There was certainly no shortage of standards to review. We came across over 250 standards (from various international and national bodies) that we considered to fall within the remit of software engineering. The common feature of all of them was that they define some aspect of perceived best practice relevant for developing or assuring high quality software systems or systems with software components. Unfortunately, there is no consensus about what constitutes best practice, and it follows that there is no consensus as to how to distinguish those best practice techniques that should always be applied. Thus, for standards of similar names and objectives we came across very different models of software quality and the software development process. This was especially true of the safety critical software standards, of which IEC SC65A [IEC 1992] and DEF-STAN 00-55 [MOD 1991] were two significant examples.
We discovered the following general problems in the standards we reviewed:
In this paper we propose a framework for improving standards. The approach (which is based largely on the SMARTIE philosophy) is applicable to any software standards, but is especially pertinent to the safety critical ones. The latter can be viewed as simply the most demanding of the software standards; if you remove the safety integrity requirements material from such standards then they can be applied to any software system with high quality requirements. Our framework for interpreting standards is to view a standard as a collection of requirements with which developers have to comply and for which assessors have to be able to determine conformance. In Section 2 we discuss the notion of clarity and objectivity in these respects. Our objective is to provide recommendations on how to rationalise and refine standards in such a way that we move toward the scenario where at least the obligations for the assessor are clear and objective. In Section 3 we explain how to classify requirements according to whether they focus primarily on one of three categories: process, product, or resource. Using this classification, we show how the safety critical standards concentrate on process and resource requirements at the expense of clear product requirements. We explain how to shift the focus toward the product requirements. In Section 4 we explain how requirements could be interpreted in such a way that there is greater objectivity, especially for the assessor. Our emphasis is on how we can interpret and use standards despite their current weaknesses. We do not question the importance of standards to safety critical systems development. However, clearly some standards are better than others and some requirements are more important than others, even though a priori we do not know which. Thus in Section 5 we discuss the need for assessing the effectiveness of standards, and describe the basic principles behind a measurement-based procedure.
Throughout the paper we concentrate on the recently issued, and highly significant, IEC 1508 [IEC 1995] (of which Parts 1 and 3 are relevant) as an example of applying our method. This standard is the updated version of IEC SC65A.

2 Clarity of Requirements in Standards

A standard is a collection of individual requirements. Our main concern is to consider the clarity of each mandatory requirement in the following two key respects:
Generally, obligation (2) will follow from (1). For example, in IEC 1508, Part 1, there are a number of requirements concerning the Safety Plan. Of these, 6.2.2e asserts that the Safety Plan shall include a description of the safety lifecycle phases to be applied and the dependence between them. The developer knows that certain specific information must appear in the document. The assessor only has to check that this information is there. Conversely, however, it is not necessarily true that (1) will follow from (2). For example, for the software safety lifecycle we have:
Obligation (2) is clear. The assessor has, strictly speaking, only to check the existence of a specific report for each specified phase. However, the developer's obligation for this requirement is unclear; a subsequent requirement (in the software verification section) sheds little light on what constitutes an acceptable verification report:
Unfortunately, in a key standard like IEC 1508 most requirements are unclear in both respects. For example, requirement 7.4.6.1a asserts that:
It is unclear what is expected of developers, while an assessor could only give a purely subjective view about conformance. In traditional engineering standards it is widely accepted that the necessary clarity for both obligations (1) and (2) has to be achieved for all requirements [Fenton et al 1993]. Partly because of the immaturity of the discipline, software engineering standards do not have this clarity. Our objective here is to provide recommendations on how to rationalise and refine standards in such a way that we move toward the scenario where at least the obligations for the assessor are clear and objective.

3 Classifying requirements in standards

3.1 Processes, Products, and Resources

Our approach to interpreting standards begins by classifying individual requirements according to whether they focus primarily on processes, products, or resources. A Process is any specific activity, set of activities, or time period within the manufacturing or development project. Examples of process requirements are:
A Product is any new artefact, deliverable or document arising out of a process. Examples of product requirements are:
A Resource is any item forming, or providing input to, a process. Examples include a person, a compiler, and a software test tool. Examples of resource requirements are:
Ideally, it should be absolutely clear for each requirement which process, product, or resource is being referred to and which property or attribute of that process, product, or resource is being specified. The example requirements above are reasonably satisfactory in this respect (even though they do not all have the desired clarity discussed in Section 2). However, in many requirements, it is necessary to tease out this information. Consider the following examples:
Although this refers explicitly to the software production process, this requirement really only has meaning for the resulting product, namely the source code. Moreover, the three specified product attributes are quite different and should be stated as separate requirements (preferably in measurable form as discussed below in Section 4).
Although this requirement refers to two processes (design and modification) its primary focus is a resource, namely the design method. Three very different attributes of the method are specified. The reference to modification is out of place here, since the specified properties are only conjectured to be beneficial when subsequent modifications take place.
This is, strictly speaking, a combination of two separate requirements (and should be treated as such). One is a product requirement: the existence of a document (Software Module Test Specification) to accompany each module. The other is a process requirement that specifies that a certain type of testing activity has to be carried out. The following requirement also says something about the testing process, but is driven by much more specific properties of the product (and hence we would classify it as a product requirement):
The above classification of standards' requirements represents only the first stage in our proposed means of interpreting standards. It is important because it forces us to identify the specific object of the requirement, and to naturally seek clarification where this is unclear. As a final example, consider the following requirement:
By thinking about our classification we can interpret this rather vague and confusing requirement. First of all we tease out the fact that this is a product requirement, but that there are two levels of product being considered: the software as a whole; and the set of individual functions which are being implemented. We need to break up the requirement into the following sub-requirements:
3.2 Internal and external attributes For product requirements, we make a distinction between attributes which are internal and those which are external. An internal attribute of product X is one that is dependent only on product X itself (and hence not on any other entity, be it another product, process or resource). For example, where X is source code, size is an internal attribute. An external attribute of a product X is one that is dependent on some entities other than just product X itself. For example, if product X is source code then the reliability of X is an external attribute. Reliability of X cannot be determined by looking only at X; it is dependent on the machine and compiler running X, the person using X, and the mode of use. If any of these are changed then the reliability of X can change. We have already seen numerous examples of external attributes in the above requirements (testability, maintainability, readability). Attributes like modularity (in 7.4.5.3) can, with specific definitions, be regarded as internal [Fenton and Pfleeger 1996]. The distinction between internal and external attributes is now a widely accepted basis for software evaluation. Clearly, external attributes are the ones of primary concern, especially as our ultimate objective here is to determine acceptance criteria for safety critical systems. This means that we have to determine whether the system's external attributes like safety, reliability, and maintainability are acceptable for the system's purpose. In practice, these attributes cannot be measured directly. We may be forced to make a decision about the acceptability of these attributes before the system is even extensively tested. This means that we are forced to look for evidence in terms of internal product attributes, or process and resource attributes. 
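The classification developed in Sections 3.1 and 3.2 can be captured in a small data model. The following Python sketch is purely illustrative (the type names, field names, and the example classifications are our own, not part of any standard):

```python
from dataclasses import dataclass
from enum import Enum

class Focus(Enum):
    PROCESS = "process"
    PRODUCT = "product"
    RESOURCE = "resource"

@dataclass(frozen=True)
class Requirement:
    clause: str      # e.g. a clause number such as "7.4.6.1a"
    focus: Focus     # primary object: process, product, or resource
    attribute: str   # the property specified, e.g. "readability"
    # Meaningful for product requirements only: True when the attribute
    # can be determined from the product alone (e.g. size of source code),
    # False when it also depends on other entities (e.g. reliability).
    internal: bool = False

# Hypothetical classifications of two attributes mentioned in the text
size = Requirement("-", Focus.PRODUCT, "size", internal=True)
reliability = Requirement("-", Focus.PRODUCT, "reliability", internal=False)

def needs_indirect_evidence(req: Requirement) -> bool:
    """External product attributes cannot be measured directly before
    deployment, so acceptance must rest on indirect evidence from
    internal product, process, or resource attributes."""
    return req.focus is Focus.PRODUCT and not req.internal

print(needs_indirect_evidence(reliability))  # True
print(needs_indirect_evidence(size))         # False
```

The predicate encodes the observation above: an external attribute such as reliability can only be judged via indirect evidence, whereas an internal attribute such as size can be measured from the product itself.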
Requirements in standards which simply state that certain desirable external attributes should be present are invariably vacuous and should be removed (since they are nothing more than objectives).

3.3 Balance between types of requirements

The Oxford Encyclopaedic English Dictionary defines a standard as
This definition conforms to the widely held intuitive view that standards should focus on specifying measurable quality requirements of products. Indeed, this is the emphasis in traditional engineering standards. This point was discussed in depth in [Fenton et al 1993], which looked at specific safety standards for products (such as pushchairs). These explicitly specify tests for assessing the safety of the products. That is, they provide requirements for an external attribute of the final product. The measurable criteria for the testing process are also specified. There is therefore a direct link between conformance to the standard and the notions of quality and safety in the final product. Standards such as BS 4792 [BSI 1984] also specify a number of requirements for internal attributes of the final product, but only where there is a clearly understood relationship between these and the external attribute of safety. We contrast this approach with software safety standards. Very few requirements in these standards are well-defined product requirements. For example, [Fenton et al 1993] provided a detailed comparison of the requirements in BS 4792 with those of DEF-STAN 00-55. The latter consists primarily of process requirements (88 out of a total of 115), with 14 internal product and 13 resource requirements. There is not a single external product requirement. In contrast, BS 4792 consists entirely of product requirements (28 in total), of which 11 are external. The distribution of requirements in 00-55 seems fairly typical of the software standards studied in SMARTIE. The standard IEC 1508 is slightly different in that there is a very large number of resource requirements, but again we find far more process than product requirements. The difference between requirements in standards such as 00-55 and IEC 1508 compared with those in BS 4792 is that, generally, there is no conclusive evidence that satisfying them will help achieve the intended aim of safer systems.
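The contrast in these counts is easier to see as proportions. A minimal Python sketch (the function name and layout are ours; the figures are those quoted above for DEF-STAN 00-55 and BS 4792):

```python
def profile(counts):
    """Express a {category: count} tally as whole percentages of the total."""
    total = sum(counts.values())
    return {k: round(100 * v / total) for k, v in counts.items()}

# Requirement counts quoted in the text
def_stan_00_55 = {"process": 88, "internal product": 14,
                  "external product": 0, "resource": 13}   # 115 in total
bs_4792 = {"process": 0, "internal product": 17,
           "external product": 11, "resource": 0}          # 28 in total

print(profile(def_stan_00_55))  # process requirements dominate (~77%)
print(profile(bs_4792))         # product requirements only, ~39% external
```

The profiles make the imbalance stark: 00-55 has no external product requirements at all, while in BS 4792 they account for well over a third of the standard.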
For example, the following are typical internal product requirements from IEC 1508:
Each of these (which would need further clarification to be usable anyway) represents a particular viewpoint about internal structural properties that may impact on system safety. Unfortunately, there is no clear evidence that any of them really does [Fenton et al 1994]. The many process and resource requirements in standards such as IEC 1508 have an even more tenuous link with final system safety.

4 Classifying standards' requirements by level of objectivity

The above classification of standards' requirements into process, product, or resource represents only the first stage in interpreting standards. The next stage is to further classify the requirements according to the ease with which we can assess conformance. Our objective is to identify the rogue requirements. These are the requirements for which the assessor's obligation (as discussed in Section 2) is unclear; that is, where an assessment of conformance has to be purely subjective. Assuming that a requirement refers to some specific, well-defined process, product or resource, we distinguish four degrees of clarity for each requirement (as shown in Table 1).
Table 1. Codes for degree of detail given in any requirement

Ideally, the vast majority of requirements should be in categories ** and *** (with a small number of necessary Rs for definition). In the BSI pushchair safety standard BS 4792 every one of the 28 requirements is in category ***. Although IEC 1508 is more objective than the vast majority of software standards reviewed during SMARTIE (and is indeed a significant improvement on its earlier draft IEC SC65A), many requirements (including most of the examples presented so far) still fall into the R and * categories. This means that conformance to such requirements can only be assessed purely subjectively. It is difficult to justify their inclusion in a safety critical standard. How are we to assess, for example, requirements such as:
It would be near impossible to determine convincingly whether it is satisfied or not, so it is effectively redundant. Alternatively, we could attempt to re-write it in a form which enables us to check conformance objectively. As long as there is mutual agreement (between developer and assessor) on the overall value of a requirement (however vague), this is the option we propose. First of all, we stress that there is a considerable difference between
Option (a) is generally very difficult and often impossible; in an immature discipline there is even some justification for allowing a level of subjectivity in the requirements. It is only option (b) that is being specifically recommended. The following example explains the key difference between (a) and (b) and shows the different ways we might interpret requirements to achieve (b). There are generally many ways in which this can be done: Example 1: We consider how we might interpret requirement 7.4.6.1a above in order that we can assess conformance objectively. First of all we note that there are actually three separate product requirements, namely:
We concentrate on just (i) here. Consider the following alternative versions:
Each of the above versions can be checked for conformance in a purely objective manner even though a large amount of subjectivity is still implicit in each of the requirements. In the case of A we have only to check the existence of a specific document. This is a trivial change to the original requirement since we have still said nothing about how to assess whether the document adequately justifies whether the module is readable. Nevertheless we have pushed this responsibility firmly onto the developers and not the assessors. Alternative B is a refinement of A in which we identify some specific criteria that must be present in the document (and which might increase our confidence in the readability argument). For alternative C we have only to check that the module has the right measures. A simple static analysis tool can do this. In alternative D we have only to check that the rating given by the independent reviewer is a 3 or 4 and check that this person does indeed have the specified qualifications and experience. In each of the alternative versions measurement plays a key, but very simple, role. In the case of version D the requirement is based on a very subjective rating measure. Nevertheless we can determine conformance to this requirement purely objectively. None of the alternative requirements except C is a requirement for which the module itself (a product) is the focus. Alternatives A and B are both requirements of a different product, while alternative D concentrates on the results of a reviewing process. Example 1 confirms that being able to assess conformance to a requirement objectively does not mean that the requirement itself is objective. Nor, unfortunately, does it always mean that assessment will be easy. The approach that we are proposing is to move toward identifying measurable criteria to replace ill-defined or subjective criteria. This is consistent with the traditional measurement-based approach of classical engineering disciplines.
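Alternative C relies on a simple static analysis tool. Since the full text of the alternatives is not reproduced here, the following Python sketch assumes two illustrative measures (non-blank line count and nesting depth) and thresholds of our own choosing; the point is only that, once measures and limits are fixed, conformance becomes a mechanical check:

```python
# Purely illustrative assumptions, not taken from any standard:
MAX_LINES = 200    # assumed ceiling on module length
MAX_NESTING = 4    # assumed ceiling on control-structure nesting

def nesting_depth(source: str) -> int:
    """Crude proxy for nesting: deepest indentation level, assuming
    4-space indents (a real tool would parse the code properly)."""
    depths = [(len(line) - len(line.lstrip())) // 4
              for line in source.splitlines() if line.strip()]
    return max(depths, default=0)

def conforms(source: str) -> bool:
    """Objective check in the spirit of alternative C: the module
    conforms iff its measures fall within the specified limits."""
    n_lines = len([l for l in source.splitlines() if l.strip()])
    return n_lines <= MAX_LINES and nesting_depth(source) <= MAX_NESTING

module = "def f(x):\n    if x:\n        return 1\n    return 0\n"
print(conforms(module))  # True: 4 lines, nesting depth 2
```

Note that the check is objective even though the chosen measures and thresholds are themselves matters of judgment, which is exactly the distinction between (a) and (b) drawn above.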
Texts such as [Fenton and Pfleeger 1996] explain how to move toward quantification of many of the subjective criteria appearing in a standard such as IEC 1508. The following example further illustrates the method: Example 2: Requirement 7.2.2.5a asserts "To the extent required by the integrity level the Software Safety Requirements Specification shall be expressed and structured in such a way that it is as clear, precise, unequivocal, verifiable, testable, maintainable and feasible as possible commensurate with the safety integrity level". Each of the required attributes here (which need to be treated as separate requirements) is ill-defined or subjective. In the case of maintainable there are a number of ways we could interpret this so that we could assess conformance objectively. The most direct way is to specify a mean or maximum time in which a change to the SSRS can be made. Since such measures are hard to obtain it may be preferable to specify certain internal attributes of the SSRS, such as: the electronic medium in which it must be represented; the language in which it has to be written; that it has to be broken up into separately identifiable functions specified using fewer than 1000 words each; etc. Specification measures such as Albrecht's Function Points [Albrecht 1979] might even be used. A radically different approach is that of alternative D in Example 1, where we simply specify what expert rating of maintainability has to be achieved.

5 Measurement Based Standards Evaluation

So far we have concentrated on how we can interpret and use standards despite their many weaknesses. We do not question the general importance and value of standards to safety critical systems development. Nevertheless, there are very wide differences of emphasis in specific safety-critical standards.
For example, 00-55 and IEC1508 are totally different in their underlying assumptions about what constitutes a good software process; 00-55 mandates the use of formal specification (and is structured around the assumption that formal methods are used), while 1508 mentions it only as a technique which is highly recommended at the highest safety integrity level (level 4). Clearly the standards cannot all be equally effective. They are certainly not equally easy to apply or assess. Therefore we have to assume that some standards are better than others and some requirements in standards are more important than others. Unfortunately, a priori we do not know which. It follows that there is a need for assessing the effectiveness of standards, especially when we consider the massive technological investments which may be necessary to implement them. What we have described so far may be viewed as a front-end procedure for standards evaluation. This is like an intuitive quality audit, necessary to establish whether a given standard satisfies some basic criteria. It also enables us to interpret the standard, identify its scope, and check the ease with which it can really be applied and checked. However, for proper evaluation we need to demonstrate that, when strictly adhered to, the use of a standard is likely to deliver reliable and safe systems at an acceptable cost. The SMARTIE project looked at how to assess standards in this respect [Pfleeger et al 1994]. The basic impediment to proper evaluation is the sheer flabbiness of the relevant standards. Many of the standards address the entire development and testing life-cycle, containing massive (and extremely diverse) sets of requirements. It makes no scientific sense, and is in any case impractical, to assess the effectiveness of such large objects. Thus we use the notion of a mini-standard. 
Any set of requirements, all of which relate to the same specific entity or have the same specific objective, can be thought of as a standard in its own right, or a mini-standard. Rather than assess an entire set of possibly disparate requirements, we instead concentrate on mini-standards. The need to decompose standards into manageable mini-standards is a key stage in the evaluation procedure described in [Pfleeger et al 1994]. Many software-related standards are written in a way which makes this decomposition extremely difficult. However, the software part of IEC 1508 is structured in a naturally decomposable way. We can identify seven key mini-standards in the relevant parts of IEC 1508:
The formal obligations for evaluating the efficacy of a mini-standard reduce to measuring the following criteria in a given application of the standard:
Essentially, a mini-standard successfully passes an evaluation for a specific environment if, in such an environment, it can be shown that the greater the degree of conformance to the standard, the greater are the benefits, providing that such improvements merit the costs of applying the standard. The problem of over-emphasis on process requirements in safety-critical standards has an important ramification when it comes to the evaluation procedure. Specifically, we found that, for many process requirements, the intended link to a specific benefit is unclear. For example, 00-55 contains the requirement:
Even if we could determine conformance to such a requirement objectively (the appendix of the standard provides some crude guidelines for this), it is unclear what the specific intended benefit is. Only from reading the rest of the standard do we discover that an intended major benefit is that it helps to make implementations provable (that is, it makes possible a mathematical proof of correctness). However, this in itself would be of little interest as a benefit to users. Rather, we have to assume the implicit benefit to be implemented code which is more reliable.

6 Summary and Conclusions

For safety-critical standards to be usable we expect the individual requirements to be clear to:
Unfortunately, many requirements in the relevant standards are not clear in either of these respects (although IEC 1508 shows a significant improvement on its previous incarnation IEC SC65A in many of the specific respects identified here). We have shown how to interpret unclear requirements in both respects, but with special emphasis on the assessors' needs. There is a significant difference between:
While (a) is generally very difficult, we have shown how to achieve (b) in a rigorous manner. The vast majority of requirements in existing safety-critical systems standards are unnecessarily unclear. Our approach to interpreting such requirements begins by teasing out the relevant process, product or resource that is the primary focus. In many cases this means breaking down the requirement into a number of parts. This technique alone can often achieve the required level of clarity. We provided numerous examples drawn from IEC 1508 of how to do this. When the requirements in safety critical standards are classified according to products, processes and resources, we find a dearth of external product requirements (in stark contrast to safety-related standards in traditional engineering disciplines). The emphasis is on process and resource requirements, with a smaller number of internal product requirements. This balance seems inappropriate for standards whose primary objectives are to deliver products with specific external attributes, namely safety and reliability. Finally, we discussed the need for assessing the effectiveness of standards. The sheer size of existing standards makes them too large to assess as coherent objects. Thus we used the notion of mini-standards, whereby we identify coherent subsets of requirements all relating to the same specific process, product, or resource. The identification of mini-standards helps us not only in assessment but also in rationalising and interpreting standards. We proposed a decomposition of IEC 1508 into mini-standards. We have presented some simple practical advice on how to improve safety-related standards. Unfortunately, the standards-making process is long and tortuous. In many cases this process itself contributes to some of the problems highlighted earlier.
Perhaps it is time that the software industry paid for the development of good, timely standards rather than continuing to rely on the contributions of individuals who volunteer their effort to standards-making bodies. While such contributions are, more often than not, heroic and unsung, they are nevertheless entirely ad hoc. As such, we deserve nothing better than the ad hoc standards we have at present.

7 Acknowledgements

The contents of this report have been influenced by material from the SMARTIE project (funded by EPSRC and DTI) in which the author was involved, and also by an earlier assessment of IEC SC65A that the author performed as part of the ESPRIT CASCADE project (funded by the CEC). The new work carried out here was partly funded by the ESPRIT projects SERENE and DEVA. The author is indebted to Colum Devine, Miloudi El Koursi, Simon Hughes, Heinrich Krebs, Bev Littlewood, Martin Neil, Swapan Mitra, Stella Page, Shari Lawrence Pfleeger, Linda Shackleton, Roger Shaw and Jenny Thornton for comments that have influenced this work.

8 References

Albrecht AJ, Measuring application development productivity, Proceedings of IBM Applications Development Joint SHARE/GUIDE Symposium, Monterey CA, pp 83-92, 1979.
British Standards Institute, Specification for safety requirements for pushchairs, BS 4792, 1984.
Fenton NE and Pfleeger SL, Software Metrics: A Rigorous and Practical Approach (2nd Edition), International Thomson Computer Press, 1996.
Fenton NE, Littlewood B, and Page S, Evaluating software engineering standards and methods, in Software Engineering: A European Perspective (Eds: Thayer R, McGettrick AD), IEEE Computer Society Press, pp 463-470, 1993.
Fenton NE, Pfleeger SL, Glass R, Science and substance: a challenge to software engineers, IEEE Software, 11(4), 86-95, July 1994.
IEC (International Electrotechnical Commission), Software for computers in the application of industrial safety related systems, IEC 65A, 1992.
IEC (International Electrotechnical Commission), Functional safety of electrical/electronic/programmable systems: generic aspects, IEC 1508, 1995.
Ministry of Defence Directorate of Standardization, Interim Defence Standard 00-55: The procurement of safety critical software in defence equipment, Parts 1-2, Kentigern House, 65 Brown Street, Glasgow G2 8EX, UK, 1991.
Pfleeger SL, Fenton NE, Page S, Evaluating software engineering standards, IEEE Computer, 27(9), 71-79, September 1994.