* [cip-dev] [SystemSafety] Critical systems Linux
From: Paul Sherwood
Date: 2018-11-20 18:45 UTC
To: cip-dev

On 2018-11-20 17:40, Chris Hills wrote:
> A subversion of the thread to answer one of the points raised by Paul
> and almost every Linux aficionado
>
>> -----Original Message-----
>> On Behalf Of Paul Sherwood
>> Sent: Sunday, November 4, 2018 8:54 PM
>
>> One anti-pattern I've grown a bit tired of is people choosing a
>> micro-kernel instead of Linux, because of the notional 'safety cert',
>> and then having to implement tons of custom software in attempting to
>> match off-the-shelf Linux functionality or performance. When
>> application of the standards leads to "develop new, from scratch"
>> instead of using existing code which is widely used and known to be
>> reliable, something is clearly weird imo.
>
> The question is:
>
> As Linux is monolithic, already written (with minimal
> requirements/design docs) and not to any coding standard,
> how would the world go about making a Certifiable Linux?
>
> Is it possible?
>
> And the question I asked: why do it at all when there are plenty of
> other POSIX-compliant RTOSs and OSs out there that have full Safety
> Certification to 61508 SIL3 and DO-178 etc.?

While systemsafety may be the leading community for public discussion
around systems (and software) safety, it is not the only ML with an
interest in this topic, so I'm cross-posting to some other (including
Linux) lists in the hope that we may see wider discussion and
contribution.
* [cip-dev] [SystemSafety] Critical systems Linux
From: Paul Sherwood
Date: 2018-11-20 18:58 UTC
To: cip-dev

Now to attempt to answer the question...

On 2018-11-20 18:45, Paul Sherwood wrote:
>> The question is:
>>
>> As Linux is monolithic, already written (with minimal
>> requirements/design docs) and not to any coding standard,
>> how would the world go about making a Certifiable Linux?
>> Is it possible?

Some initiatives have already started down this road, for example
SIL2LinuxMP (in cc). But my personal perspective is:

1) it may be that the certifications themselves are inappropriate. It's
far from clear to me that the current standards are fit for purpose.

2) there are many cases of folks retrofitting documentation to support
compliance with standards, so perhaps that would be a feasible thing to
attempt (although there is far too much code in the Linux kernel and
associated FOSS tooling and userland components to make this something
which could be achieved in a short time).

3) if we could establish justifiable concrete improvements to make in
Linux (and the tools, and the userland), we could hope to persuade the
upstreams to make them, or accept our patches.

4) we could construct new software to meet the ABI commitments of Linux
(and other components) while adhering to some specific standards and/or
processes, but I'm unconvinced this could be achieved in a
time/cost-effective way.

>> And the question I asked: why do it at all when there are plenty of
>> other POSIX-compliant RTOSs and OSs out there that have full Safety
>> Certification to 61508 SIL3 and DO-178 etc.?

My understanding is that existing certified RTOSs/OSs tend to be
microkernels with limited functionality, limited hardware support, and
performance limitations for some use-cases.
I'd be happy to be wrong, and no doubt advocates of some of those
technologies can explain the reality by return.

br
Paul
* [cip-dev] [SystemSafety] Critical systems Linux
From: Paul Sherwood
Date: 2018-11-22 9:24 UTC
To: cip-dev

Hi again...

>>> The question is:
>>>
>>> As Linux is monolithic, already written (with minimal
>>> requirements/design docs) and not to any coding standard,
>>> how would the world go about making a Certifiable Linux?
>>> Is it possible?

Sadly most of the follow-on discussion seems to have stayed only on
systemsafetylist.org [1], which rather reduces its impact IMO.

I cross-posted in the hope that knowledge from the safety community
could be usefully shared with other communities who are (for better or
worse) considering, and in some cases already using, Linux in
safety-critical systems. For example, the Linux Foundation is actively
soliciting contributors expressly for an initiative to establish how
best to support safety scenarios, as discussed at ELCE [2] with
contributors from OSADL (e.g. [3]) and others.

Perhaps I'm being stupid, but it's still unclear to me, after the
discussion about existing certificates, whether the 'pre-certification'
approach is justifiable at all, for **any** software, not just Linux.

As I understand it, for any particular project/system/service we need
to define safety requirements and a safety architecture. From that we
need to establish constraints and required properties and behaviours of
the chosen architecture components (including OS components). On that
basis it seems to me that we must always prepare a specific argument
for an actual system, and cannot safely claim that any generic
pre-certification fits our use-case?

Please could someone from systemsafetylist.org reply-all and spell it
out, preferably without referring to standards and without triggering a
lot of controversy?
br
Paul

[1] http://systemsafetylist.org/4310.htm
[2] https://www.osadl.org/Linux-in-Safety-Critical-Systems-Summit.lfsummit-elce-safety.0.html
[3] https://events.linuxfoundation.org/wp-content/uploads/2017/12/Collaborate-on-Linux-for-Use-in-Safety-Critical-Systems-Lukas-Bulwahn-BMW-Car-IT-GmbH-1.pdf
* [cip-dev] [C-safe-secure-studygroup] [SystemSafety] Critical systems Linux
From: Clive Pygott
Date: 2018-11-22 11:57 UTC
To: cip-dev

Hi Paul

I'll have a go at your question - FYI my background is system safety
management (as in 61508 & DO-178) and coding standards (MISRA & JSF++).

You are right that ultimately system safety is a *system* property. You
cannot talk about software doing harm without knowing what it's
controlling and how it fits into its physical environment. However, a
standard like 61508 takes a layered approach to safety. The topmost
levels are system specific: how could the system behave (intentionally
or under fault conditions) to cause harm? and what features of the
architecture (including software requirements) mitigate these risks?
This establishes traceability from software requirements to safety.

From the software perspective, under this is the requirement to show
that those software requirements related to safety have been
implemented correctly, and as usual this has two components:

- showing that the code implements the requirements (verification -
  we've built the right thing)
- showing the code is well behaved under all circumstances (validation
  - we've built the thing right)

If you are doing full formal semantic verification, the second step is
unnecessary, as the semantic proof will consider all possible
combinations of input and state. However, in practice formal proof is
so onerous that it's almost never done. This means that verification is
based on testing, which, no matter how thorough, is still based on
sampling.
There is an implied belief that the digital system will behave
continuously, even though it's known that this isn't true (a good few
years ago an early home computer had an implementation of BASIC with an
integer ABS function that worked perfectly except for ABS(-32768),
which gave -32768 - and it wasn't because it was limited to 16 bits:
ABS(-32769) gave 32769, etc.). The validation part aims to improve the
(albeit flawed) belief in continuous behaviour by:

- checking that any constraints imposed by the language are respected
- identifying any deviations from arithmetic logic (i.e. flagging where
  underflow, overflow, truncation, wraparound or loss of precision may
  occur)

This is the domain of MISRA and JSF++: checking that the code will
behave sensibly, without knowledge of what it should be doing.

To get back to the original discussion, it is staggeringly naive to
claim 'I have a safe system, because I've used a certified OS kernel'.
I'm sure you weren't suggesting that, but I have seen companies try it.
What the certified kernel (or any other architectural component) buys
you is that someone has done the verification and validation activities
on that component, so you can be reasonably confident that that
component will behave as advertised - it's a level of detail your
project doesn't have to look into (though you may want to audit the
quality of the certification evidence).

As I read your original message you are asking 'why can't a wide user
base be accepted as evidence of correctness?' The short answer is: do
you have any evidence of what features of the component the users are
using, and in what combination?
Is my project about to use some combination of features in an inventive
manner that no-one has previously tried, so that the wide user base
provides no evidence that it will work? (Again a good few years ago,
colleagues of mine were writing a compiler for a VAX and traced a bug
to a particular instruction in the VAX instruction set that had an
error in its implementation. No DEC product or other customer had ever
used this instruction. BTW, DEC's solution was to remove it from the
instruction set.)

Hope this helps

Clive
LDRA Inc.
* [cip-dev] [C-safe-secure-studygroup] [SystemSafety] Critical systems Linux
From: Paul Sherwood
Date: 2018-11-22 13:19 UTC
To: cip-dev

Hi Clive,
this is very helpful, thank you. I'm going to re-add the other lists,
for the same reason as before, and hope you're ok with that. Please see
my comments inline below...

On 2018-11-22 10:26, Clive Pygott wrote:
> I'll have a go at your question - FYI my background is system safety
> management (as in 61508 & DO-178) and coding standards (MISRA & JSF++)
>
> You are right that ultimately system safety is a _system_ property.
> You cannot talk about software doing harm without knowing what it's
> controlling and how it fits into its physical environment.

Understood, and I'd be surprised if anyone would challenge this
reasoning.

> However, a standard like 61508 takes a layered approach to safety.

I'm not sure I understand "layered approach", since I've heard it
mentioned in multiple situations outside safety (security for one, and
general architecture/abstraction for another).

Are you saying that the aim is redundant/overlapping safety methods, to
avoid single-point-of-failure, or something else?

> The topmost levels are system specific: how could the system behave
> (intentionally or under fault conditions) to cause harm? and what
> features of the architecture (including software requirements)
> mitigate these risks? This establishes traceability from software
> requirements to safety.

OK, understood. In previous discussions I've been attempting to
understand whether there are any fundamental reasons that such
requirements would need to exist before the software, or whether they
could be originated for a specific system, and then considered/applied
to pre-existing code.
Is there a hard and fast argument one way or the other?

> From the software perspective, under this is the requirement to show
> that those software requirements related to safety have been
> implemented correctly, and as usual this has two components:
>
> * showing that the code implements the requirements (verification -
>   we've built the right thing)

OK, makes sense.

> * showing the code is well behaved under all circumstances
>   (validation - we've built the thing right)

Here I fall off the horse. I don't believe we can be 100% certain about
"all circumstances", except for small/constrained/simple systems. So I
distrust claims of certainty about the behaviour of modern COTS
multicore microprocessors, for example.

> If you are doing full formal semantic verification, the second step
> is unnecessary, as the semantic proof will consider all possible
> combinations of input and state.

It's not fair to single out any individual project/system/community,
but as an example [1] seL4's formal proof of correctness proved to be
insufficient in the context of Spectre/Meltdown. I'd be (pleasantly)
surprised if any semantic proof can withstand misbehaviour of the
hardware/firmware/OS/tooling underneath it.

> However, in practice formal proof is so onerous that it's almost
> never done. This means that verification is based on testing, which,
> no matter how thorough, is still based on sampling. There is an
> implied belief that the digital system will behave continuously, even
> though it's known that this isn't true (a good few years ago an early
> home computer had an implementation of BASIC with an integer ABS
> function that worked perfectly except for ABS(-32768), which gave
> -32768 - and it wasn't because it was limited to 16 bits: ABS(-32769)
> gave 32769, etc.).

Understood, and agreed.
> The validation part aims to improve the (albeit flawed) belief in
> continuous behaviour by:
>
> * checking that any constraints imposed by the language are respected
> * identifying any deviations from arithmetic logic (i.e. flagging
>   where underflow, overflow, truncation, wraparound or loss of
>   precision may occur)
>
> This is the domain of MISRA and JSF++: checking that the code will
> behave sensibly, without knowledge of what it should be doing.

OK, IIUC this is mainly

- 'coding standards lead to tests we can run'. And once we are into
  tests, we have to consider applicability, correctness and
  completeness of the tests, in addition to the "flawed belief in
  continuous behaviour".

- and possibly some untestable guidance/principles which may or may not
  be relevant/correct.

> To get back to the original discussion, it is staggeringly naive to
> claim that 'I have a safe system, because I've used a certified OS
> kernel'. I'm sure you weren't suggesting that, but I have seen
> companies try it.

I've also seen that (in part that's why I'm here) but I certainly
wouldn't dream of suggesting it.

> What the certified kernel (or any other architectural component) buys
> you is that someone has done the verification and validation
> activities on that component, so you can be reasonably confident that
> that component will behave as advertised - it's a level of detail
> your project doesn't have to look into (though you may want to audit
> the quality of the certification evidence).

OK in principle. However from some of the discussion, which I won't
rehash here, it seemed to me that some of the safety folks were
expressly not confident in some of the certified/advertised/claimed
behaviours.

> As I read your original message you are asking 'why can't a wide user
> base be accepted as evidence of correctness?'

If that's the question I seemed to be asking, I apologise; certainly I
wouldn't count a wide user base as evidence of correctness.
It's evidence of something, though, and that evidence may be part of
what could be assessed when considering the usefulness of software.

> The short answer is: do you have any evidence of what features of the
> component the users are using, and in what combination?

I totally agree - which brings us back to the need for
required/architected behaviours/properties.

> Is my project about to use some combination of features in an
> inventive manner that no-one has previously tried, so that the wide
> user base provides no evidence that it will work? (Again a good few
> years ago, colleagues of mine were writing a compiler for a VAX and
> traced a bug to a particular instruction in the VAX instruction set
> that had an error in its implementation. No DEC product or other
> customer had ever used this instruction. BTW, DEC's solution was to
> remove it from the instruction set.)

Makes sense. Thanks again Clive!

br
Paul

[1] https://research.csiro.au/tsblog/crisis-security-vs-performance/
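The "flagging where underflow, overflow ... may occur" checks that Clive attributes to MISRA/JSF++ can be made concrete. A hedged C sketch of the kind of guarded signed arithmetic such a review looks for (the helper name `checked_add` is illustrative, not from any standard):

```c
#include <limits.h>
#include <stdbool.h>

/* Detect signed overflow *before* performing the addition, instead of
 * relying on wraparound (which is undefined behaviour for signed int
 * in C). Returns true and stores a+b in *sum only when the addition
 * cannot overflow; otherwise leaves *sum untouched and returns false,
 * so the caller must handle the flagged case explicitly. */
static bool checked_add(int a, int b, int *sum)
{
    if ((b > 0 && a > INT_MAX - b) ||
        (b < 0 && a < INT_MIN - b)) {
        return false;   /* would overflow: flag it, don't wrap */
    }
    *sum = a + b;
    return true;
}
```

The point of writing arithmetic this way is that the "deviation from arithmetic logic" becomes an explicit, testable branch rather than a silent wrap that only a boundary-value test would ever expose.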
* [cip-dev] [Safety-linux-formation] [C-safe-secure-studygroup] [SystemSafety] Critical systems Linux
From: Nicholas Mc Guire
Date: 2018-11-22 17:43 UTC
To: cip-dev

On Thu, Nov 22, 2018 at 01:19:44PM +0000, Paul Sherwood wrote:
> Hi Clive,
> this is very helpful, thank you. I'm going to re-add the other lists,
> for the same reason as before, and hope you're ok with that. Please
> see my comments inline below...
>
> On 2018-11-22 10:26, Clive Pygott wrote:
> > I'll have a go at your question - FYI my background is system
> > safety management (as in 61508 & DO-178) and coding standards
> > (MISRA & JSF++)
> >
> > You are right that ultimately system safety is a _system_ property.
> > You cannot talk about software doing harm without knowing what it's
> > controlling and how it fits into its physical environment.
>
> Understood, and I'd be surprised if anyone would challenge this
> reasoning.
>
> > However, a standard like 61508 takes a layered approach to safety.
>
> I'm not sure I understand "layered approach", since I've heard it
> mentioned in multiple situations outside safety (security for one,
> and general architecture/abstraction for another).
>
> Are you saying that the aim is redundant/overlapping safety methods,
> to avoid single-point-of-failure, or something else?

61508 starts out with the top layer, which is actually technology
agnostic - simply put, if we do not understand the system then it
can't be adequately safe - so Part 1 does not talk about HW/SW at all,
but about context/scope/hazard-analysis/mitigation-allocation...
independent of technological issues. Part 2 then looks at the technical
system (and not just HW) with respect to systematic and random
deviations from specifications as derived by applying Part 1. Part 3
then looks at the specifics of software.
So the layering of 61508 is a very abstract process layering, to ensure
that the potential high-level faults - non-understanding expressed in
requirements and design faults - are addressed at all levels. We do not
kill that many people with dereferenced NULL pointers - at least not
repeatedly, if the high-level processes work. See e.g. HSE HSG238.

In addition there may be technical "layering", as in layers of
protection and architectural measures - but that's already at the
implementation level.

> > The topmost levels are system specific: how could the system behave
> > (intentionally or under fault conditions) to cause harm? and what
> > features of the architecture (including software requirements)
> > mitigate these risks? This establishes traceability from software
> > requirements to safety.
>
> OK, understood. In previous discussions I've been attempting to
> understand whether there are any fundamental reasons that such
> requirements would need to exist before the software, or whether they
> could be originated for a specific system, and then
> considered/applied to pre-existing code. Is there a hard and fast
> argument one way or the other?

The simplest argument is that the goal of any safety process is that
the safety functional requirements are implemented in the software
elements - the outlined process (Route 1_S) is one way seen suitable to
achieve the objectives of the safety standards (61508 and derivatives -
DO-178 is a bit different because the context of ARP 4754A is well
defined, so they can put very specific needs into DO-178/254, while
61508 is a generic standard and can't do that). Essentially, the goal
of achieving the objectives is not dependent on the process by which
the implementation is achieved - verification of the achievement of the
objectives *may* though depend on the means by which the implementation
was achieved (but then again is quite independent of the question of
"intent for use in safety related systems" or not).
> > From the software perspective, under this is the requirement to
> > show that those software requirements related to safety have been
> > implemented correctly, and as usual this has two components:
> >
> > * showing that the code implements the requirements (verification -
> >   we've built the right thing)
>
> OK, makes sense.
>
> > * showing the code is well behaved under all circumstances
> >   (validation - we've built the thing right)

DO-178B (respectively DO-248) - but that misses the essential point of

* showing that the assurance data (often process data) on which we
  based any such claim is adequate

and this is the thing that is changing, because the two high-level
requirements you give are fully adequate for deterministic and
relatively simple systems (type-A systems in 61508-2 Ed 2 7.4.4.1.2)
but not for type-B systems, because we generally can't demonstrate
correctness nor completeness in any meaningful sense. In other words,
we increasingly simply do not know what the "right thing" is, and as
soon as non-determinism comes into play the "built the thing right"
becomes a probability as well and needs to be assessed as such (e.g. a
pLRU cache replacement in many current CPUs does not allow one to claim
that it is built right other than probabilistically).

> Here I fall off the horse. I don't believe we can be 100% certain
> about "all circumstances", except for small/constrained/simple
> systems. So I distrust claims of certainty about the behaviour of
> modern COTS multicore microprocessors, for example.

..we fell off that horse about 20 years ago, but many did not notice ;)

The point is to accept what has been stated many times already: that
safety is not a 100% property anyway. As long as systems were simple we
could entertain the illusion of completeness of testing (an absurd
assertion since the mid 1990s for many systems), and we have not yet
fully developed the necessary understanding and tools to actually
handle complex systems.
Also note that this idea of correctness is bound too strongly to the
technical realisation, which puts the focus on mitigation of faults
rather than the elimination of faults at the requirements and design
level - and that is really why we are so lost with current safety
standards when it comes to complex systems: we immediately jump to
mitigation rather than harvesting the potential for elimination first.
In other words, the problem is systems engineering, not software
engineering.

> > If you are doing full formal semantic verification, the second step
> > is unnecessary, as the semantic proof will consider all possible
> > combinations of input and state.

...and who ever had a fault-free initial specification to start with
for her formal specification, which then was shown to be implemented?
The idea that "everything in the system matches the requirements" and
"every requirement is built into the system" - kind of the corollary to
your two components above - does not address the key issue in
functional safety: that our requirements are wrong, because we do not
fully understand the system and its environment (except for the most
trivial of systems).

> It's not fair to single out any individual project/system/community,
> but as an example [1] seL4's formal proof of correctness proved to be
> insufficient in the context of Spectre/Meltdown. I'd be (pleasantly)
> surprised if any semantic proof can withstand misbehaviour of the
> hardware/firmware/OS/tooling underneath it.

...and the misunderstanding of the system's intent by those writing the
formal specification that then is proven - it is interesting to note
that 61508 Ed 2 (Table A.1) ranks formal requirements specification
lower than semi-formal requirements specification, and in Table C.1 it
is clarified why - reduced understandability!

> > However, in practice formal proof is so onerous that it's almost
> > never done. This means that verification is based on testing,
> > which, no matter how thorough, is still based on sampling.
> > [...]

It is not based on testing - no sane safety standard would suggest
achieving verification by testing alone; it is always analysis *and*
testing, and if it is reduced to testing only then it will for sure
produce a warm cosy feeling after execution of 100k test-cases... which
covered 10E-20% of the system's state-space.

The problem with testing is that in the heads of many we still have the
idea that an aggregation of highly reliable components forms a highly
reliable system - which is wrong in itself, but becomes a real hazard
as soon as the ability to inspect components is so much easier that we
focus on components, because we can believe that we understand them in
isolation, and then simply drop the main cause, which is interaction
(which is in general not covered by testing - not even integration
testing - maybe to a limited level by field trials).

> Understood, and agreed.
>
> > The validation part aims to improve the (albeit flawed) belief in
> > continuous behaviour by:
> >
> > * checking that any constraints imposed by the language are
> >   respected
> > * identifying any deviations from arithmetic logic (i.e. flagging
> >   where underflow, overflow, truncation, wraparound or loss of
> >   precision may occur)
> >
> > This is the domain of MISRA and JSF++: checking that the code will
> > behave sensibly, without knowledge of what it should be doing.

Does anyone have hard evidence that shows that there is *any*
significant correlation between MISRA C coding rules and bug rates?
This is one of the cases where we focus on formality because we can,
even though we have little (or no) evidence that these rules or metrics
have any effect (aside from them being used in a way that they were
never intended for anyway). As a corollary, think about your personal
driving experience - how many situations were you in where you got out
by violating a rule? The assumption that following context-agnostic
rules leads to safety properties of systems is truly absurd.

> OK, IIUC this is mainly
>
> - 'coding standards lead to tests we can run'. And once we are into
>   tests, we have to consider applicability, correctness and
>   completeness of the tests, in addition to the "flawed belief in
>   continuous behaviour".
>
> - and possibly some untestable guidance/principles which may or may
>   not be relevant/correct

If you have no specific context, how can you assert more than
correctness against context-free requirements, which themselves have no
assurance of correctness or completeness in the context of any specific
system? Focussing on what we can, because we know that we can't handle
the level that actually is relevant, is a form of deliberate ignorance.

Coding standards (and this is the intent of the Linux kernel coding
standard) lead to *readability*, which is maybe the only relevant
defence against correct implementation of the wrong function (or, as
you state above, not "building the right system").
That is the expressed intent of the Linux kernel coding standard and readability respectively understandability of code (and fault behavior) is the key to actually being able to detect when the correctly implemented code is the wrong solution for a particular context - the requirements don?t do as they are an abstraction and as such they focus on the intended behavior not on the side effects or unintended interactions - thus matching only requirements of perceived generic elements will necessarily lead to missing the specific intent for any system in the systems specific corner cases. > > >To get back to the original discussion, it is staggeringly naive to > >claim that 'I have a safe system, because I've used a certified OS > >kernel'. I'm sure you weren't suggesting that, but I have seen > >companies try it. > > I've also seen that (in part that's why I'm here) but I certainly wouldn't > dream of suggesting it. > > >What the certified kernel (or any other > >architectural component) buys you is that someone has done the > >verification and validation activities on that component, so you can > >be reasonably confident that that component will behave as advertised No - thats precisely what only is true for very simple components - but never holds for complex components and any OS is a type-B system a) the failure mode of at least one constituent component is not well defined; or b) the behaviour of the element under fault conditions cannot be completely determined; or c) there is insufficient dependable failure data to support claims for rates of failure for detected and undetected dangerous failures (see 7.4.9.3 to 7.4.9.5). [IEC 61508-2 Ed 2 7.4.4.1.3] Pre-certified OS (or complex libraries) buy you the illusion that you took care of safety by giving someone else enough money - thats it. > >- its a level of detail your project doesn't have to look into (though > >you may want to audit the quality of the certification evidence). > > OK in principle. 
> However from some of the discussion, which I won't rehash
> here, it seemed to me that some of the safety folks were expressly not
> confident in some of the certified/advertised/claimed behaviours.

Good to hear that - they should not be - because it depends entirely on the
specific context. The higher the complexity of a system, the more we depend
on looking at the right corners of the system to understand where it can go
wrong; focusing on generic properties (unspecific behaviors and their
correctness asserted against a more or less random model) gives you very
little. The higher the complexity of a system, the more the ability to
analyze the system's specific behavior in the context of its environment and
use-case will determine the system's safety properties. Even *if* testing
could achieve the initial goal of correctness, the inability to analyze the
system would impair any effort to understand and thus learn from incidents.

> >As I read your original message you are asking 'why can't a wide user
> >base be accepted as evidence of correctness?'
>
> If that's the question I seemed to be asking I apologise; certainly I
> wouldn't count a wide user base as evidence of correctness. It's evidence of
> something, though, and that evidence may be part of what could be assessed
> when considering the usefulness of software.

Prior usage may well be one building block in a chain of assessment of a
pre-existing element, but I would claim primarily in the sense of selecting
the lowest-risk elements - it will not save you any effort in assessing the
objectives of functional safety - but careful selection based on prior usage
will increase the likelihood that the assessment will actually conclude
positively.

On the specifics of 61508 Ed 2 route 3_S (assessment of non-compliant
development): the relevance of a large user base is also the ability to
actually harvest process-level data that allows assessing the effectiveness
of different measures, e.g.
it is trivial to state that a pre-existing element was reviewed, but without
any data on findings, people's competency, the level of deviations later
found during operation, etc., we cannot actually use "review was done" as an
argument in the assessment of a non-compliant development. In this sense the
user base is, as you say, "evidence of something"; the trick is to find
sound procedures for extracting the relevant information from that something
so as to be able to make a statement about the process that created the
element. So as soon as you shift the focus from the implementation details
to the process that created those implementation details, the user base
becomes the key "data set" that allows you to actually build an argument -
at least that is the assumption behind the SIL2LinuxMP project.

> >The short answer is, do
> >you have any evidence of what features of the component the users are
> >using and in what combination?
>
> I totally agree - which brings us back to the need for required/architected
> behaviours/properties.

With an important change - the use of pre-existing elements always implies
that you are building functionality into the system that does NOT match your
needs exactly, and the mitigation again lies only in the ability to analyze
the system to the point where the system can either be adjusted to the
specifics of the element (by updating requirements and design) or the
discrepancy handled at runtime (e.g. by wrappers). In a complex system it is
highly unlikely that the requirements anyone puts on a complex element like
an OS are in exact alignment with any particular system - not even POSIX
1003.13 PSE 51 matches any real system 100%.
> >Is my project about to use some
> >combination of features in an inventive manner that no-one has
> >previously tried, so the wide user base provides no evidence that it
> >will work (again a good few years ago, colleagues of mine were
> >writing a compiler for a VAX and traced a bug to a particular
> >instruction in the VAX instruction set that had an error in its
> >implementation. No DEC product or other customer had ever used this
> >instruction. BTW, DEC's solution was to remove it from the
> >instruction set)
>
> Makes sense. Thanks again Clive!

That is the prime fallacy I see in the whole pre-existing-SW discussion: the
focus on functionality. The argument for using the common setup is that the
process initially generated this common setup, and the measures and
techniques used to achieve the specified behavior were IN CONTEXT of the
common use-case, no matter whether explicitly stated or implied - deviating
from the common use-case potentially invalidates the results of these
measures and techniques. So the requirement, to be allowed to draw on any
process-level claims for the pre-existing element, is to operate it in as
close a context to the original intent as possible - using common
configurations is one part of this.

thx!
hofrat

^ permalink raw reply	[flat|nested] 6+ messages in thread
Thread overview: 6+ messages
[not found] <037a01d480f8$1f486570$5dd93050$@phaedsys.com>
2018-11-20 18:45 ` [cip-dev] [SystemSafety] Critical systems Linux Paul Sherwood
2018-11-20 18:58 ` Paul Sherwood
2018-11-22 9:24 ` Paul Sherwood
2018-11-22 11:57 ` [cip-dev] [C-safe-secure-studygroup] " Clive Pygott
2018-11-22 13:19 ` Paul Sherwood
2018-11-22 17:43 ` [cip-dev] [Safety-linux-formation] " Nicholas Mc Guire