* [cip-dev] [SystemSafety] Critical systems Linux
From: Paul Sherwood
Date: 2018-11-20 18:45 UTC
To: cip-dev

On 2018-11-20 17:40, Chris Hills wrote:
> A subversion of the thread to answer one of the points raised by Paul
> and almost every Linux aficionado
>
>> -----Original Message-----
>> On Behalf Of Paul Sherwood
>> Sent: Sunday, November 4, 2018 8:54 PM
>
>> One anti-pattern I've grown a bit tired of is people choosing a
>> micro-kernel instead of Linux, because of the notional 'safety cert',
>> and then having to implement tons of custom software in attempting to
>> match off-the-shelf Linux functionality or performance. When
>> application of the standards leads to "develop new, from scratch"
>> instead of using existing code which is widely used and known to be
>> reliable, something is clearly weird imo.
>
> The question is:
>
> As Linux is monolithic, already written (with minimal
> requirements/design docs) and not to any coding standard,
> how would the world go about making a Certifiable Linux?
>
> Is it possible?
>
> And the question I asked: why do it at all when there are plenty of
> other POSIX-compliant RTOSs and OSs out there that have full Safety
> Certification to 61508 SIL3 and DO-178 etc.?

While systemsafety may be the leading community for public discussion
around systems (and software) safety, it is not the only ML with an
interest in this topic, so I'm cross-posting to some other (including
Linux) lists in the hope that we may see wider discussion and
contribution.
* [cip-dev] [SystemSafety] Critical systems Linux
From: Paul Sherwood
Date: 2018-11-20 18:58 UTC
To: cip-dev

Now to attempt to answer the question...

On 2018-11-20 18:45, Paul Sherwood wrote:
>> The question is:
>>
>> As Linux is monolithic, already written (with minimal
>> requirements/design docs) and not to any coding standard,
>> how would the world go about making a Certifiable Linux?
>> Is it possible?

Some initiatives have already started down this road, for example
SIL2LinuxMP (in cc). But my personal perspective is:

1) it may be that the certifications themselves are inappropriate. It's
far from clear to me that the current standards are fit for purpose.

2) there are many cases of folks retrofitting documentation to support
compliance with standards, so perhaps that would be a feasible thing to
attempt (although there is far too much code in the Linux kernel and
associated FOSS tooling and userland components to make this something
which could be achieved in a short time).

3) if we could establish justifiable concrete improvements to make in
Linux (and the tools, and the userland), we could hope to persuade the
upstreams to make them, or accept our patches.

4) we could construct new software to meet the ABI commitments of Linux
(and other components) while adhering to some specific standards and/or
processes, but I'm unconvinced this could be achieved in a
time/cost-effective way.

>> And the question I asked: why do it at all when there are plenty of
>> other POSIX-compliant RTOSs and OSs out there that have full Safety
>> Certification to 61508 SIL3 and DO-178 etc.?

My understanding is that existing certified RTOSs/OSs tend to be
microkernels with limited functionality, limited hardware support, and
performance limitations for some use-cases.
I'd be happy to be wrong, and no doubt advocates of some of those
technologies can explain the reality by return.

br
Paul
* [cip-dev] [SystemSafety] Critical systems Linux
From: Paul Sherwood
Date: 2018-11-22 9:24 UTC
To: cip-dev

Hi again...

>>> The question is:
>>>
>>> As Linux is monolithic, already written (with minimal
>>> requirements/design docs) and not to any coding standard,
>>> how would the world go about making a Certifiable Linux?
>>> Is it possible?

Sadly most of the follow-on discussion seems to have stayed only on
systemsafetylist.org [1], which rather reduces its impact IMO.

I cross-posted in the hope that knowledge from the safety community
could be usefully shared with other communities who are (for better or
worse) considering, and in some cases already using, Linux in
safety-critical systems. For example, the Linux Foundation is actively
soliciting contributors expressly for an initiative to establish how
best to support safety scenarios, as discussed at ELCE [2] with
contributors from OSADL (e.g. [3]) and others.

Perhaps I'm being stupid, but it's still unclear to me, after the
discussion about existing certificates, whether the 'pre-certification'
approach is justifiable at all, for **any** software, not just Linux.

As I understand it, for any particular project/system/service we need
to define safety requirements and a safety architecture. From that we
need to establish constraints and required properties and behaviours of
the chosen architecture components (including OS components). On that
basis it seems to me that we must always prepare a specific argument
for an actual system, and cannot safely claim that any generic
pre-certification fits our use-case?

Please could someone from systemsafetylist.org reply-all and spell it
out, preferably without referring to standards and without triggering a
lot of controversy?
br
Paul

[1] http://systemsafetylist.org/4310.htm
[2] https://www.osadl.org/Linux-in-Safety-Critical-Systems-Summit.lfsummit-elce-safety.0.html
[3] https://events.linuxfoundation.org/wp-content/uploads/2017/12/Collaborate-on-Linux-for-Use-in-Safety-Critical-Systems-Lukas-Bulwahn-BMW-Car-IT-GmbH-1.pdf
* [cip-dev] [C-safe-secure-studygroup] [SystemSafety] Critical systems Linux
From: Clive Pygott
Date: 2018-11-22 11:57 UTC
To: cip-dev

Hi Paul

I'll have a go at your question - FYI my background is system safety
management (as in 61508 & DO-178) and coding standards (MISRA & JSF++).

You are right that ultimately system safety is a *system* property. You
cannot talk about software doing harm without knowing what it's
controlling and how it fits into its physical environment. However, a
standard like 61508 takes a layered approach to safety. The topmost
levels are system specific: how could the system behave (intentionally
or under fault conditions) to cause harm? and what features of the
architecture (including software requirements) mitigate these risks?
This establishes traceability from software requirements to safety.

From the software perspective, under this is the requirement to show
that those software requirements related to safety have been
implemented correctly, and as usual this has two components:

- showing that the code implements the requirements (verification -
  we've built the right thing)
- showing the code is well behaved under all circumstances (validation
  - we've built the thing right)

If you are doing full formal semantic verification, the second step is
unnecessary, as the semantic proof will consider all possible
combinations of input and state. However, in practice formal proof is
so onerous that it's almost never done. This means that verification is
based on testing, which, no matter how thorough, is still based on
sampling.
There is an implied belief that the digital system will behave
continuously, even though it's known that this isn't true (a good few
years ago an early home computer had an implementation of BASIC with an
integer ABS function that worked perfectly except for ABS(-32768),
which gave -32768 - and it wasn't because it was limited to 16 bits:
ABS(-32769) gave 32769, etc.). The validation part aims to improve the
(albeit flawed) belief in continuous behaviour by:

- checking that any constraints imposed by the language are respected
- identifying any deviations from arithmetic logic (i.e. flagging where
  underflow, overflow, truncation, wraparound or loss of precision may
  occur)

This is the domain of MISRA and JSF++: checking that the code will
behave sensibly, without knowledge of what it should be doing.

To get back to the original discussion, it is staggeringly naive to
claim 'I have a safe system, because I've used a certified OS kernel'.
I'm sure you weren't suggesting that, but I have seen companies try it.
What the certified kernel (or any other architectural component) buys
you is that someone has done the verification and validation activities
on that component, so you can be reasonably confident that that
component will behave as advertised - it's a level of detail your
project doesn't have to look into (though you may want to audit the
quality of the certification evidence).

As I read your original message you are asking 'why can't a wide user
base be accepted as evidence of correctness?' The short answer is: do
you have any evidence of what features of the component the users are
using, and in what combination?
Is my project about to use some combination of features in an inventive
manner that no-one has previously tried, so that the wide user base
provides no evidence that it will work? (Again a good few years ago,
colleagues of mine were writing a compiler for a VAX and traced a bug
to a particular instruction in the VAX instruction set that had an
error in its implementation. No DEC product or other customer had ever
used this instruction. BTW, DEC's solution was to remove it from the
instruction set.)

Hope this helps

Clive
LDRA Inc.
* [cip-dev] [C-safe-secure-studygroup] [SystemSafety] Critical systems Linux
From: Paul Sherwood
Date: 2018-11-22 13:19 UTC
To: cip-dev

Hi Clive,
this is very helpful, thank you. I'm going to re-add the other lists,
for the same reason as before, and hope you're ok with that. Please see
my comments inline below...

On 2018-11-22 10:26, Clive Pygott wrote:
> I'll have a go at your question - FYI my background is system safety
> management (as in 61508 & DO-178) and coding standards (MISRA & JSF++)
>
> You are right that ultimately system safety is a _system_ property.
> You cannot talk about software doing harm without knowing what it's
> controlling and how it fits into its physical environment.

Understood, and I'd be surprised if anyone would challenge this
reasoning.

> However, a standard like 61508 takes a layered approach to safety.

I'm not sure I understand "layered approach", since I've heard it
mentioned in multiple situations outside safety (security for one, and
general architecture/abstraction for another).

Are you saying that the aim is redundant/overlapping safety methods, to
avoid single-point-of-failure, or something else?

> The topmost levels are system specific: how could the system behave
> (intentionally or under fault conditions) to cause harm? and what
> features of the architecture (including software requirements)
> mitigate these risks? This establishes traceability from software
> requirements to safety.

OK, understood. In previous discussions I've been attempting to
understand whether there are any fundamental reasons that such
requirements would need to exist before the software, or whether they
could be originated for a specific system, and then considered/applied
to pre-existing code.
Is there a hard and fast argument one way or the other?

> From the software perspective, under this is the requirement to show
> that those software requirements related to safety have been
> implemented correctly, and as usual this has two components:
>
> * showing that the code implements the requirements (verification -
>   we've built the right thing)

OK, makes sense.

> * showing the code is well behaved under all circumstances
>   (validation - we've built the thing right)

Here I fall off the horse. I don't believe we can be 100% certain about
"all circumstances", except for small/constrained/simple systems. So I
distrust claims of certainty about the behaviour of modern COTS
multicore microprocessors, for example.

> If you are doing full formal semantic verification, the second step
> is unnecessary, as the semantic proof will consider all possible
> combinations of input and state.

It's not fair to single out any individual project/system/community,
but as an example [1] seL4's formal proof of correctness proved to be
insufficient in the context of Spectre/Meltdown. I'd be (pleasantly)
surprised if any semantic proof can withstand misbehaviour of the
hardware/firmware/OS/tooling underneath it.

> However, in practice formal proof is so onerous that it's almost
> never done. This means that verification is based on testing, which,
> no matter how thorough, is still based on sampling. There is an
> implied belief that the digital system will behave continuously, even
> though it's known that this isn't true (a good few years ago an early
> home computer had an implementation of BASIC with an integer ABS
> function that worked perfectly except for ABS(-32768), which gave
> -32768 - and it wasn't because it was limited to 16 bits: ABS(-32769)
> gave 32769, etc.).

Understood, and agreed.
> The validation part aims to improve the (albeit flawed) belief in
> continuous behaviour by:
>
> * checking that any constraints imposed by the language are respected
> * identifying any deviations from arithmetic logic (i.e. flagging
>   where underflow, overflow, truncation, wraparound or loss of
>   precision may occur)
>
> This is the domain of MISRA and JSF++: checking that the code will
> behave sensibly, without knowledge of what it should be doing.

OK, IIUC this is mainly

- 'coding standards lead to tests we can run'. And once we are into
  tests, we have to consider applicability, correctness and
  completeness of the tests, in addition to the "flawed belief in
  continuous behaviour".

- and possibly some untestable guidance/principles which may or may not
  be relevant/correct.

> To get back to the original discussion, it is staggeringly naive to
> claim that 'I have a safe system, because I've used a certified OS
> kernel'. I'm sure you weren't suggesting that, but I have seen
> companies try it.

I've also seen that (in part that's why I'm here) but I certainly
wouldn't dream of suggesting it.

> What the certified kernel (or any other architectural component) buys
> you is that someone has done the verification and validation
> activities on that component, so you can be reasonably confident that
> that component will behave as advertised - it's a level of detail
> your project doesn't have to look into (though you may want to audit
> the quality of the certification evidence).

OK in principle. However from some of the discussion, which I won't
rehash here, it seemed to me that some of the safety folks were
expressly not confident in some of the certified/advertised/claimed
behaviours.

> As I read your original message you are asking 'why can't a wide user
> base be accepted as evidence of correctness?'

If that's the question I seemed to be asking, I apologise; certainly I
wouldn't count a wide user base as evidence of correctness.
It's evidence of something, though, and that evidence may be part of
what could be assessed when considering the usefulness of software.

> The short answer is: do you have any evidence of what features of the
> component the users are using, and in what combination?

I totally agree - which brings us back to the need for
required/architected behaviours/properties.

> Is my project about to use some combination of features in an
> inventive manner that no-one has previously tried, so that the wide
> user base provides no evidence that it will work? (Again a good few
> years ago, colleagues of mine were writing a compiler for a VAX and
> traced a bug to a particular instruction in the VAX instruction set
> that had an error in its implementation. No DEC product or other
> customer had ever used this instruction. BTW, DEC's solution was to
> remove it from the instruction set.)

Makes sense. Thanks again Clive!

br
Paul

[1] https://research.csiro.au/tsblog/crisis-security-vs-performance/
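The "flagging where underflow, overflow ... may occur" checks that Clive attributes to MISRA/JSF++ can be made concrete. A hedged C sketch of the kind of guarded signed arithmetic such a review looks for (the helper name `checked_add` is illustrative, not from any standard):

```c
#include <limits.h>
#include <stdbool.h>

/* Detect signed overflow *before* performing the addition, instead of
 * relying on wraparound (which is undefined behaviour for signed int
 * in C). Returns true and stores a+b in *sum only when the addition
 * cannot overflow; otherwise leaves *sum untouched and returns false,
 * so the caller must handle the flagged case explicitly. */
static bool checked_add(int a, int b, int *sum)
{
    if ((b > 0 && a > INT_MAX - b) ||
        (b < 0 && a < INT_MIN - b)) {
        return false;   /* would overflow: flag it, don't wrap */
    }
    *sum = a + b;
    return true;
}
```

The point of writing arithmetic this way is that the "deviation from arithmetic logic" becomes an explicit, testable branch rather than a silent wrap that only a boundary-value test would ever expose.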
* [cip-dev] [Safety-linux-formation] [C-safe-secure-studygroup] [SystemSafety] Critical systems Linux
From: Nicholas Mc Guire
Date: 2018-11-22 17:43 UTC
To: cip-dev

On Thu, Nov 22, 2018 at 01:19:44PM +0000, Paul Sherwood wrote:
> Hi Clive,
> this is very helpful, thank you. I'm going to re-add the other lists,
> for the same reason as before, and hope you're ok with that. Please
> see my comments inline below...
>
> On 2018-11-22 10:26, Clive Pygott wrote:
> > I'll have a go at your question - FYI my background is system
> > safety management (as in 61508 & DO-178) and coding standards
> > (MISRA & JSF++)
> >
> > You are right that ultimately system safety is a _system_ property.
> > You cannot talk about software doing harm without knowing what it's
> > controlling and how it fits into its physical environment.
>
> Understood, and I'd be surprised if anyone would challenge this
> reasoning.
>
> > However, a standard like 61508 takes a layered approach to safety.
>
> I'm not sure I understand "layered approach", since I've heard it
> mentioned in multiple situations outside safety (security for one,
> and general architecture/abstraction for another).
>
> Are you saying that the aim is redundant/overlapping safety methods,
> to avoid single-point-of-failure, or something else?

61508 starts out with the top layer, which is actually technology
agnostic - simply put, if we do not understand the system then it
can't be adequately safe - so Part 1 does not talk about HW/SW at all,
but about context/scope/hazard-analysis/mitigation-allocation...
independent of technological issues. Part 2 then looks at the technical
system (and not just HW) with respect to systematic and random
deviations from specifications as derived by applying Part 1. Part 3
then looks at the specifics of software.
So the layering of 61508 is a very abstract process layering, to ensure
that the potential high-level faults - non-understanding expressed in
requirements and design faults - are addressed at all levels. We do not
kill that many people with dereferenced NULL pointers - at least not
repeatedly, if the high-level processes work. See e.g. HSE HSG238.

In addition there may be technical "layering", as in layers of
protection and architectural measures - but that's already at the
implementation level.

> > The topmost levels are system specific: how could the system behave
> > (intentionally or under fault conditions) to cause harm? and what
> > features of the architecture (including software requirements)
> > mitigate these risks? This establishes traceability from software
> > requirements to safety.
>
> OK, understood. In previous discussions I've been attempting to
> understand whether there are any fundamental reasons that such
> requirements would need to exist before the software, or whether they
> could be originated for a specific system, and then
> considered/applied to pre-existing code. Is there a hard and fast
> argument one way or the other?

The simplest argument is that the goal of any safety process is that
the safety functional requirements are implemented in the software
elements - the outlined process (Route 1_S) is one way seen suitable to
achieve the objectives of the safety standards (61508 and derivatives -
DO-178 is a bit different because the context of ARP 4754A is well
defined, so they can put very specific needs into DO-178/254, while
61508 is a generic standard and can't do that). Essentially, the goal
of achieving the objectives is not dependent on the process by which
the implementation is achieved - verification of the achievement of the
objectives *may* though depend on the means by which the implementation
was achieved (but then again is quite independent of the question of
"intent for use in safety related systems" or not).
> > From the software perspective, under this is the requirement to
> > show that those software requirements related to safety have been
> > implemented correctly, and as usual this has two components:
> >
> > * showing that the code implements the requirements (verification -
> >   we've built the right thing)
>
> OK, makes sense.
>
> > * showing the code is well behaved under all circumstances
> >   (validation - we've built the thing right)

DO-178B (respectively DO-248) - but that misses the essential point of

* showing that the assurance data (often process data) on which we
  based any such claim is adequate

and this is the thing that is changing, because the two high-level
requirements you give are fully adequate for deterministic and
relatively simple systems (type-A systems in 61508-2 Ed 2 7.4.4.1.2)
but not for type-B systems, because we generally can't demonstrate
correctness nor completeness in any meaningful sense. In other words,
we increasingly simply do not know what the "right thing" is, and as
soon as non-determinism comes into play the "built the thing right"
becomes a probability as well and needs to be assessed as such (e.g. a
pLRU cache replacement in many current CPUs does not allow one to claim
that it is built right other than probabilistically).

> Here I fall off the horse. I don't believe we can be 100% certain
> about "all circumstances", except for small/constrained/simple
> systems. So I distrust claims of certainty about the behaviour of
> modern COTS multicore microprocessors, for example.

..we fell off that horse about 20 years ago, but many did not notice ;)

The point is to accept what has been stated many times already: that
safety is not a 100% property anyway. As long as systems were simple we
could entertain the illusion of completeness of testing (an absurd
assertion since the mid 1990s for many systems), and we have not yet
fully developed the necessary understanding and tools to actually
handle complex systems.
Also note that this idea of correctness is bound too strongly to the
technical realisation, which puts the focus on mitigation of faults
rather than the elimination of faults at the requirements and design
level - and that is really why we are so lost with current safety
standards when it comes to complex systems: we immediately jump to
mitigation rather than harvesting the potential for elimination first.
In other words, the problem is systems engineering, not software
engineering.

> > If you are doing full formal semantic verification, the second step
> > is unnecessary, as the semantic proof will consider all possible
> > combinations of input and state.

...and who ever had a fault-free initial specification to start with
for her formal specification, which then was shown to be implemented?
The idea that "everything in the system matches the requirements" and
"every requirement is built into the system" - kind of the corollary to
your two components above - does not address the key issue in
functional safety: that our requirements are wrong, because we do not
fully understand the system and its environment (except for the most
trivial of systems).

> It's not fair to single out any individual project/system/community,
> but as an example [1] seL4's formal proof of correctness proved to be
> insufficient in the context of Spectre/Meltdown. I'd be (pleasantly)
> surprised if any semantic proof can withstand misbehaviour of the
> hardware/firmware/OS/tooling underneath it.

...and the misunderstanding of the system's intent by those writing the
formal specification that then is proven - it is interesting to note
that 61508 Ed 2 (Table A.1) ranks formal requirements specification
lower than semi-formal requirements specification, and in Table C.1 it
is clarified why - reduced understandability!

> > However, in practice formal proof is so onerous that it's almost
> > never done. This means that verification is based on testing,
> > which, no matter how thorough, is still based on sampling.
> > [...]

It is not based on testing - no sane safety standard would suggest
achieving verification by testing alone; it is always analysis *and*
testing, and if it is reduced to testing only then it will for sure
produce a warm cosy feeling after execution of 100k test-cases... which
covered 10E-20% of the system's state-space.

The problem with testing is that in the heads of many we still have the
idea that an aggregation of highly reliable components forms a highly
reliable system - which is wrong in itself, but becomes a real hazard
as soon as the ability to inspect components is so much easier that we
focus on components, because we can believe that we understand them in
isolation, and then simply drop the main cause, which is interaction
(which is in general not covered by testing - not even integration
testing - maybe to a limited level by field trials).

> Understood, and agreed.
>
> > The validation part aims to improve the (albeit flawed) belief in
> > continuous behaviour by:
> >
> > * checking that any constraints imposed by the language are
> >   respected
> > * identifying any deviations from arithmetic logic (i.e. flagging
> >   where underflow, overflow, truncation, wraparound or loss of
> >   precision may occur)
> >
> > This is the domain of MISRA and JSF++: checking that the code will
> > behave sensibly, without knowledge of what it should be doing.

Does anyone have hard evidence that shows that there is *any*
significant correlation between MISRA C coding rules and bug rates?
This is one of the cases where we focus on formality because we can,
even though we have little (or no) evidence that these rules or metrics
have any effect (aside from them being used in a way that they were
never intended for anyway). As a corollary, think about your personal
driving experience - how many situations were you in where you got out
by violating a rule? The assumption that following context-agnostic
rules leads to safety properties of systems is truly absurd.

> OK, IIUC this is mainly
>
> - 'coding standards lead to tests we can run'. And once we are into
>   tests, we have to consider applicability, correctness and
>   completeness of the tests, in addition to the "flawed belief in
>   continuous behaviour".
>
> - and possibly some untestable guidance/principles which may or may
>   not be relevant/correct

If you have no specific context, how can you assert more than
correctness against context-free requirements, which themselves have no
assurance of correctness or completeness in the context of any specific
system? Focussing on what we can, because we know that we can't handle
the level that actually is relevant, is a form of deliberate ignorance.

Coding standards (and this is the intent of the Linux kernel coding
standard) lead to *readability*, which is maybe the only relevant
defence against correct implementation of the wrong function (or, as
you state above, not "building the right system").
That is the expressed intent of the Linux kernel coding standard and readability respectively understandability of code (and fault behavior) is the key to actually being able to detect when the correctly implemented code is the wrong solution for a particular context - the requirements don?t do as they are an abstraction and as such they focus on the intended behavior not on the side effects or unintended interactions - thus matching only requirements of perceived generic elements will necessarily lead to missing the specific intent for any system in the systems specific corner cases. > > >To get back to the original discussion, it is staggeringly naive to > >claim that 'I have a safe system, because I've used a certified OS > >kernel'. I'm sure you weren't suggesting that, but I have seen > >companies try it. > > I've also seen that (in part that's why I'm here) but I certainly wouldn't > dream of suggesting it. > > >What the certified kernel (or any other > >architectural component) buys you is that someone has done the > >verification and validation activities on that component, so you can > >be reasonably confident that that component will behave as advertised No - thats precisely what only is true for very simple components - but never holds for complex components and any OS is a type-B system a) the failure mode of at least one constituent component is not well defined; or b) the behaviour of the element under fault conditions cannot be completely determined; or c) there is insufficient dependable failure data to support claims for rates of failure for detected and undetected dangerous failures (see 7.4.9.3 to 7.4.9.5). [IEC 61508-2 Ed 2 7.4.4.1.3] Pre-certified OS (or complex libraries) buy you the illusion that you took care of safety by giving someone else enough money - thats it. > >- its a level of detail your project doesn't have to look into (though > >you may want to audit the quality of the certification evidence). > > OK in principle. 
> However from some of the discussion, which I won't rehash
> here, it seemed to me that some of the safety folks were expressly not
> confident in some of the certified/advertised/claimed behaviours.

Good to hear that - they should not be - because it depends entirely on the
specific context. The higher the complexity of a system, the more we depend
on looking at the right corners of the system to understand where it can go
wrong; focusing on generic properties (unspecific behaviors and their
correctness asserted against a more or less random model) gives you very
little. The higher the complexity of a system, the more the ability to
analyze the system's specific behavior in the context of its environment and
use-case will determine the system's safety properties. Even *if* testing
could achieve the initial goal of correctness, the inability to analyze the
system would impair any effort to understand and thus learn from incidents.

> >As I read your original message you are asking 'why can't a wide user
> >base be accepted as evidence of correctness?'
>
> If that's the question I seemed to be asking I apologise; certainly I
> wouldn't count a wide user base as evidence of correctness. It's evidence of
> something, though, and that evidence may be part of what could be assessed
> when considering the usefulness of software.

Prior usage may well be one building block in a chain of assessment of a
pre-existing element, but I would claim primarily in the sense of selecting
the lowest-risk elements - it will not save you any effort in assessing the
objectives of functional safety - but careful selection based on prior usage
will increase the likelihood that the assessment will actually conclude
positively.

On the specifics of 61508 Ed 2 route 3_S (assessment of non-compliant
development): the relevance of a large user base is also the ability to
actually harvest process-level data that allows assessing the effectiveness
of different measures, e.g.
it is trivial to state that a pre-existing element was reviewed, but without
any data on findings, people's competency, the level of deviations later
found during operation, etc., we cannot actually use "review was done" as an
argument in the assessment of a non-compliant development. In this sense the
user base is, as you say, "evidence of something"; the trick is to find
sound procedures for extracting the relevant information from that something
so as to be able to make a statement about the process that created the
element. So as soon as you shift the focus from the implementation details
to the process that created those implementation details, the user base
becomes the key "data set" that allows you to actually build an argument -
at least that is the assumption behind the SIL2LinuxMP project.

> >The short answer is, do
> >you have any evidence of what features of the component the users are
> >using and in what combination?
>
> I totally agree - which brings us back to the need for required/architected
> behaviours/properties.

With an important change - the use of pre-existing elements always implies
that you are building functionality into the system that does NOT match your
needs exactly, and the mitigation again lies only in the ability to analyze
the system to the point where the system can either be adjusted to the
specifics of the element (by updating requirements and design) or the
discrepancy handled at runtime (e.g. by wrappers). In a complex system it is
highly unlikely that the requirements anyone puts on a complex element like
an OS are in exact alignment with any particular system - not even POSIX
1003.13 PSE 51 matches any real system 100%.
> >Is my project about to use some
> >combination of features in an inventive manner that no-one has
> >previously tried, so the wide user base provides no evidence that it
> >will work (again a good few years ago, colleagues of mine were
> >writing a compiler for a VAX and traced a bug to a particular
> >instruction in the VAX instruction set that had an error in its
> >implementation. No DEC product or other customer had ever used this
> >instruction. BTW, DEC's solution was to remove it from the
> >instruction set)
>
> Makes sense. Thanks again Clive!

That is the prime fallacy I see in the whole pre-existing-SW discussion: the
focus on functionality. The argument for using the common setup is that the
process initially generated this common setup, and the measures and
techniques used to achieve the specified behavior were IN CONTEXT of the
common use-case, no matter whether explicitly stated or implied - deviating
from the common use-case potentially invalidates the results of these
measures and techniques. So the requirement, to be allowed to draw on any
process-level claims for the pre-existing element, is to operate it in as
close a context to the original intent as possible - using common
configurations is one part of this.

thx!
hofrat

^ permalink raw reply	[flat|nested] 6+ messages in thread
Thread overview: 6+ messages
[not found] <037a01d480f8$1f486570$5dd93050$@phaedsys.com>
2018-11-20 18:45 ` [cip-dev] [SystemSafety] Critical systems Linux Paul Sherwood
2018-11-20 18:58 ` Paul Sherwood
2018-11-22 9:24 ` Paul Sherwood
2018-11-22 11:57 ` [cip-dev] [C-safe-secure-studygroup] " Clive Pygott
2018-11-22 13:19 ` Paul Sherwood
2018-11-22 17:43 ` [cip-dev] [Safety-linux-formation] " Nicholas Mc Guire