* Sym53C8xx Driver Hardening
From: Isabelle, Francois @ 2002-07-23 13:29 UTC
To: Linux SCSI list (E-mail)
Hi,
Is anyone working on driver hardening as defined by the TLT/CGL
specification?
TLT : Telecom Linux Technology
CGL: Open Source Development Lab - Carrier Grade Linux
These are two projects to define a "highly available" and "easily embeddable"
Linux as a replacement for proprietary telecom platforms.
The specification includes a section on "driver hardening".
Some of the features:
- Panic removal
- Standardized event logging
- Parameter checking ...
The white paper from Intel:
http://cedar.intel.com/cgi-bin/ids.dll/content/content.jsp?cntKey=Generic%20Editorial::linux_hardening&cntType=IDS_EDITORIAL&catCode=0
Here is a quote from A.C. on the linux-ha list:
> What I've seen so far on the hardening proposals varies between the
> comical and the well-intentioned but ugly Intel code. I've not seen
> public MontaVista work so it would be inappropriate for me to comment on it.
As you can see, it is not unanimously accepted...
It's just not bazaar-like enough, I suppose!
Anyway there is some good in such standardization and if it produces better
drivers for linux, why not ?
So, to get back to the topic:
has anyone started hardening the sym53c8xx driver?
It might already be foolproof, but it almost certainly lacks standard event
logging.
Any comments on CGL/TLT are appreciated.
Thank you
Francois Isabelle
Kontron Canada Inc.
* Re: Sym53C8xx Driver Hardening
From: Rogier Wolff @ 2002-07-23 13:57 UTC
To: Isabelle, Francois
Cc: linux-scsi
Isabelle, Francois wrote:
> Some of the features:
> - Panic removal
> - Standardized event logging
> - Parameter checking ...
Panic removal is NOT a good thing. A panic should only happen when the driver/kernel/whatever CANNOT continue because of some serious problem. A panic should occur in cases like:
    b = 2 + 3;
    ....
    if (b != 5)
        panic();
Now it won't be as easy as this. But, for instance, in my firestream driver you sometimes put a value in a register in the chip, and if you later read it back, you want the chip to have left it unmodified, or to have changed it in a predictable way. If the value is unexpected, a panic is the right "way out". If this is NOT done, the chip may be DMA-ing data into the wrong spot in main memory. This will not panic your system, but crash it (*).
A panic can be configured to "do the right thing": reboot, and be back in business as soon as possible. An operator should then be notified and somehow take action.
Removing panics sounds like a manager's solution to the problem: a system that panics is not reliable, so we should remove the panics. Well, if the code panics for the right reason, then the panic is preferable to continuing. I'm convinced that most Linux code is well written in this respect.
Rogier.
(*) By a crash I mean that the system ends up wedged, with only a hardware watchdog able to get it out of that state.
-- 
** R.E.Wolff@BitWizard.nl ** http://www.BitWizard.nl/ ** +31-15-2137555 **
*-- BitWizard writes Linux device drivers for any device you may have! --*
* There are old pilots, and there are bold pilots.
* There are also old, bald pilots.
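The register read-back check described above can be sketched roughly as follows. This is a userspace simulation under stated assumptions: `struct sim_chip`, `sim_write()`, `sim_read()` and `write_and_verify()` are hypothetical names, the "register" is plain memory with a simulated stuck-bit fault, and a real driver would use `writel()`/`readl()` on mapped I/O and would call `panic()` where this sketch returns `READBACK_FATAL`.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for a chip register. */
struct sim_chip {
    uint32_t ctrl;       /* value latched by the last write */
    uint32_t stuck_high; /* simulated fault: bits the broken chip forces to 1 */
};

static void sim_write(struct sim_chip *chip, uint32_t val) { chip->ctrl = val; }

static uint32_t sim_read(const struct sim_chip *chip)
{
    return chip->ctrl | chip->stuck_high;
}

enum readback_verdict {
    READBACK_OK,    /* chip left the value as expected */
    READBACK_FATAL  /* chip state is unpredictable: stop trusting it */
};

/* Write 'val', read it back and compare only the bits the chip should
 * preserve ('stable_mask'). Bits outside the mask may legitimately be
 * changed by the hardware in a predictable way and are ignored. */
static enum readback_verdict write_and_verify(struct sim_chip *chip,
                                              uint32_t val,
                                              uint32_t stable_mask)
{
    assert(chip != NULL);
    sim_write(chip, val);
    uint32_t got = sim_read(chip);
    if ((got & stable_mask) != (val & stable_mask)) {
        fprintf(stderr, "ctrl readback mismatch: wrote %#lx, read %#lx\n",
                (unsigned long)val, (unsigned long)got);
        return READBACK_FATAL; /* a real driver might call panic() here */
    }
    return READBACK_OK;
}
```

Returning a verdict instead of panicking inside the helper keeps the "what to do now" decision with the caller, which is exactly the point the rest of the thread argues about.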
* Re: Sym53C8xx Driver Hardening
From: Alan Cox @ 2002-07-23 15:24 UTC
To: Rogier Wolff
Cc: Isabelle, Francois, linux-scsi
On Tue, 2002-07-23 at 14:57, Rogier Wolff wrote:
> Now it won't be as easy as this. But for instance in my firestream
> driver, you sometimes put a value in a register in the chip, and if
> later on you read it back, you want the chip to have left it
> unmodified, or to have it changed in a predictable way. If the value
> is unexpected, a panic is the right "way out".
The high reliability people take a different view. I actually agree with them. It isn't about 'oops didn't happen'; it is about controlling the failure case.
Suppose your firestream driver reports cataclysmic internal error status. Their argument is not that you should pretend life is good, but that the driver should log a fault and shut off the chip as best it can. So you might have a firestream_failed() function which did:
- Disable master bit
- Put board into D3
- Wait
- Put board into running state
- Try to reset and configure it
- If this fails, shove it in D3 and give up
At this point the high reliability system is servicing the other links it manages and flashing warning lights to the engineers, rather than being completely down.
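The firestream_failed() steps listed above can be sketched as a small recovery routine. This is a userspace model only: `struct sim_board`, `board_failed()` and the `reset_will_succeed` knob are hypothetical, and a real PCI driver would use the PCI core to clear bus mastering and change power states rather than flipping fields in a struct.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical model of a PCI board and its power state. */
enum power_state { PWR_D0, PWR_D3 };

struct sim_board {
    bool bus_master;          /* may the board DMA into host memory? */
    enum power_state power;
    bool reset_will_succeed;  /* simulation knob for the reset outcome */
    bool failed;              /* permanently given up on */
};

static void sim_wait(void) { /* a real driver would sleep here */ }

static bool sim_reset_and_configure(const struct sim_board *b)
{
    return b->reset_will_succeed;
}

/* Contain the fault first, then try once to bring the board back,
 * otherwise park it quiesced in D3. Returns true if back in service. */
static bool board_failed(struct sim_board *b)
{
    assert(b != NULL);
    fprintf(stderr, "board: fatal internal error, attempting recovery\n");
    b->bus_master = false;            /* 1. stop DMA before anything else */
    b->power = PWR_D3;                /* 2. power the board down */
    sim_wait();                       /* 3. give it time to settle */
    b->power = PWR_D0;                /* 4. back to the running state */
    if (sim_reset_and_configure(b)) { /* 5. reset and reconfigure */
        b->bus_master = true;         /* only now allow DMA again */
        return true;
    }
    b->bus_master = false;            /* 6. give up: leave it in D3 */
    b->power = PWR_D3;
    b->failed = true;
    fprintf(stderr, "board: recovery failed, device disabled\n");
    return false;
}
```

Disabling bus mastering comes first because a confused chip that can still DMA is the real danger; once it is fenced off, only this one device is lost while the rest of the system keeps running.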
* Re: Sym53C8xx Driver Hardening
From: random1 @ 2002-07-23 15:11 UTC
To: linux-scsi
Alan Cox <alan@lxorguk.ukuu.org.uk> writes:
> The high reliability people take a different view. I actually agree with
> them. It isn't about 'oops didn't happen' it is about controlling the
> failure case
"Graceful degradation": it has been taught in CS curricula for *at least* twenty years.
I'm disturbed by all the places where people have chosen, when faced with the unexpected, to just panic rather than find a way to get back to some semblance of a running system. I don't want some goofy UPS hanging off a USB bus, for instance, to take down a critical storage server.
* Re: Sym53C8xx Driver Hardening
From: Rogier Wolff @ 2002-07-23 15:38 UTC
To: Alan Cox
Cc: Rogier Wolff, Isabelle, Francois, linux-scsi
Alan Cox wrote:
> On Tue, 2002-07-23 at 14:57, Rogier Wolff wrote:
> > Now it won't be as easy as this. But for instance in my firestream
> > driver, you sometimes put a value in a register in the chip, and if
> > later on you read it back, you want the chip to have left it
> > unmodified, or to have it changed in a predictable way. If the value
> > is unexpected, a panic is the right "way out".
>
> The high reliability people take a different view. I actually agree with
> them. It isn't about 'oops didn't happen' it is about controlling the
> failure case
>
> Suppose your firestream driver reports cataclysmic internal error
> status. Their argument is not that you should pretend life is good but
> that the driver should log a fault and shut off the chip as best it can.
> So you might have a firestream_failed() function which did
>
> Disable master bit
> Put board into D3
> Wait
> Put board into running state
> Try to reset and configure it
> If this fails shove it in D3 and give up
>
> At this point the high reliability system is servicing the other links
> it manages and flashing warning lights to the engineers, rather than
> completely down
That might indeed be preferable. However, the "wild DMA" may have corrupted users' data and/or the kernel's data structures, so continuing may make a bad situation worse...
Maybe we want to generalize "panic" so that you pass it a pointer to a "shut down this hardware" routine, moving the policy about what to do into a central, user-definable place...
Userspace would then be notified: "We shut down atm0 due to an irrecoverable error". And userspace can then decide to kick the device as you suggest above. Or I could configure it to do an immediate reboot, with or without attempting to sync disks...
Roger.
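The "generalized panic" proposed above (the driver supplies its own shutdown routine, and a central configurable policy decides what happens next) might look roughly like this. Everything here is hypothetical: `device_fatal()`, the policy enum, `struct sim_atm` and `last_event` (standing in for a notification to userspace) are illustrative names, not existing kernel interfaces.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical central policy, settable by the administrator. */
enum fatal_policy {
    POLICY_ISOLATE_AND_NOTIFY, /* shut the device down, tell userspace */
    POLICY_REBOOT_SYNC,        /* reboot after trying to sync disks */
    POLICY_REBOOT_NOW          /* reboot immediately, no sync */
};

enum fatal_action { ACT_NOTIFIED, ACT_REBOOT_SYNC, ACT_REBOOT_NOW };

static enum fatal_policy current_policy = POLICY_ISOLATE_AND_NOTIFY;
static char last_event[128]; /* stands in for a message to userspace */

typedef void (*shutdown_fn)(void *dev);

/* Example driver-supplied shutdown routine for a simulated device. */
struct sim_atm { int enabled; };
static void sim_atm_shutdown(void *dev) { ((struct sim_atm *)dev)->enabled = 0; }

static enum fatal_action device_fatal(const char *name, shutdown_fn shutdown,
                                      void *dev, const char *why)
{
    assert(name != NULL && shutdown != NULL && why != NULL);
    shutdown(dev); /* always quiesce the hardware first, whatever the policy */
    switch (current_policy) {
    case POLICY_REBOOT_SYNC:
        return ACT_REBOOT_SYNC;
    case POLICY_REBOOT_NOW:
        return ACT_REBOOT_NOW;
    case POLICY_ISOLATE_AND_NOTIFY:
    default:
        snprintf(last_event, sizeof last_event,
                 "We shut down %s due to an irrecoverable error (%s)",
                 name, why);
        return ACT_NOTIFIED;
    }
}
```

The driver only knows how to stop its own hardware; whether that is followed by a userspace notification or a reboot is decided in one place, which is what makes the policy configurable per site.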
* Re: Sym53C8xx Driver Hardening
From: Gérard Roudier @ 2002-07-24 23:11 UTC
To: Alan Cox
Cc: Rogier Wolff, Isabelle, Francois, linux-scsi
On 23 Jul 2002, Alan Cox wrote:
> On Tue, 2002-07-23 at 14:57, Rogier Wolff wrote:
> > Now it won't be as easy as this. But for instance in my firestream
> > driver, you sometimes put a value in a register in the chip, and if
> > later on you read it back, you want the chip to have left it
> > unmodified, or to have it changed in a predictable way. If the value
> > is unexpected, a panic is the right "way out".
>
> The high reliability people take a different view. I actually agree with
> them. It isn't about 'oops didn't happen' it is about controlling the
> failure case
>
> Suppose your firestream driver reports cataclysmic internal error
> status. Their argument is not that you should pretend life is good but
> that the driver should log a fault and shut off the chip as best it can.
> So you might have a firestream_failed() function which did
>
> Disable master bit
> Put board into D3
> Wait
> Put board into running state
> Try to reset and configure it
> If this fails shove it in D3 and give up
>
> At this point the high reliability system is servicing the other links
> it manages and flashing warning lights to the engineers, rather than
> completely down
By the way, the sym53c8xx_2 driver (and probably version 1 too) never intentionally panics the system when it detects a hardware failure. The couple of calls to panic() you can see in the driver relate to unexpected software situations.
In my opinion, serious high reliability requires special hardware support.
When a serious hardware error is detected, the driver simply tries to reset everything it can, in order to have a chance to restart operations properly. Maybe it should count such events and give up in some way after a given number of retries. If you are just suggesting such 'give up' operations that disable the device, this should be doable, but it will not make the upper layers aware of the situation, and it is certainly not what most users expect. This MEANS that serious high reliability ALSO requires special SOFTWARE support and USER utilities, and certainly should NOT be based ONLY on some questionable trivial tinkering in device drivers, in my opinion.
Gérard.
-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
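The idea of counting hardware-error resets and giving up after a given number of retries can be sketched as below. `MAX_HW_RESETS`, `struct sim_host` and `host_hw_error()` are hypothetical; a real driver might make the threshold a tunable parameter and would still need some way to tell the upper layers that the host has been disabled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical retry budget; ideally a site-configurable parameter. */
#define MAX_HW_RESETS 3

struct sim_host {
    unsigned resets; /* full resets performed after hardware errors */
    bool given_up;   /* host disabled; upper layers must be told */
};

/* Called on each serious hardware error. Returns true if the caller
 * should reset everything and retry, false once the budget is used up. */
static bool host_hw_error(struct sim_host *h)
{
    assert(h != NULL);
    if (h->given_up)
        return false;
    if (h->resets >= MAX_HW_RESETS) {
        h->given_up = true;
        fprintf(stderr, "host: %u resets did not help, giving up\n", h->resets);
        return false;
    }
    h->resets++;
    return true;
}
```

This counter never decays; a less trigger-happy variant might clear it after a period of error-free operation, so that rare, isolated errors spread over months do not eventually disable the host.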
* Re: Sym53C8xx Driver Hardening
From: Jeremy Higdon @ 2002-07-25 22:33 UTC
To: Gérard Roudier, Alan Cox
Cc: Rogier Wolff, Isabelle, Francois, linux-scsi
On Jul 25, 1:11am, Gérard Roudier wrote:
>
> By the way, the sym53c8xx_2 driver (and probably version 1 too) never
> intentionally panics the system on hardware failure detection.
> The couple of calls to panic you can see in the driver are related to
> software unexpected situations.
> In my opinion, serious high reliability requires special hardware support.
>
> On serious hardware error detected, the driver simply tries to reset
> everything it can in order to have a chance to restart operations
> properly. Maybe it should count such events and give up in some way after
> some given number of retries. If you just suggest such 'give up'
> operations to disable the device this should be doable, but this will not
> make upper layers aware of the situation and certainly not be what most
> users expect. This MEANS that serious high reliability ALSO requires
> special SOFTWARE support and USER utilities, and certainly NOT ONLY be
> based on some questionable trivial tinkering in device drivers, in my
> opinion.
>
> Gérard.
Also, with some sorts of errors, one might suspect that the device has DMA'd data into the wrong place in memory.
Generally, I think a crash is preferable to data corruption (i.e. no answer is better than a wrong answer), so in such cases the driver might want to panic.
If hardware can place a fence around DMA (such that DMA to unintended locations is impossible), then you might not want to panic in such situations, assuming that you can retry.
jeremy
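The "fence around DMA" can be illustrated by the check such hardware would apply to every transfer: an access is allowed only if it falls entirely inside a range mapped for that device. This is a hypothetical model (`struct dma_window`, `dma_allowed()`); in real hardware an IOMMU enforces this during translation, whereas software can only perform such a check on addresses it already knows about.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mapped window: a bus-address range the device may touch. */
struct dma_window {
    uint64_t base;
    uint64_t size;
};

/* True if [addr, addr + len) lies entirely inside one mapped window.
 * Written so that addr + len never has to be computed (no wraparound). */
static bool dma_allowed(const struct dma_window *w, size_t n,
                        uint64_t addr, uint64_t len)
{
    assert(w != NULL || n == 0);
    if (len == 0)
        return false; /* arbitrary choice in this sketch */
    for (size_t i = 0; i < n; i++) {
        if (addr < w[i].base)
            continue;
        uint64_t off = addr - w[i].base;
        if (off < w[i].size && len <= w[i].size - off)
            return true;
    }
    return false;
}
```

With such a fence, a confused device can at worst scribble over its own buffers, which a retry repairs, instead of over arbitrary kernel memory.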
* Re: Sym53C8xx Driver Hardening
From: Alan Cox @ 2002-07-25 23:53 UTC
To: Jeremy Higdon
Cc: Gérard Roudier, Rogier Wolff, Isabelle, Francois, linux-scsi
On Thu, 2002-07-25 at 23:33, Jeremy Higdon wrote:
> Also, with some sorts of errors, one might suspect that the device has
> DMA'd data into the wrong place in memory.
It depends on the environment. In telco work it's often better to sound the alarms and pray. Gerard is right that policy must be configurable.
> If hardware can place a fence around DMA (such that DMA to unintended
> locations is impossible), then you might not want to panic in such
> situations, assuming that you can retry.
Doable on real computers with an IOMMU.
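The two points above (the right reaction depends on the environment, and an IOMMU changes the risk) can be combined into one small decision function. This is only a sketch under assumptions: `enum site_policy`, `choose_action()` and its inputs are hypothetical, and deciding whether DMA "may be stray" is itself device-specific.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical site-wide knob, so the choice is not hard-coded per driver. */
enum site_policy {
    SITE_INTEGRITY_FIRST,   /* e.g. storage: no answer beats a wrong answer */
    SITE_AVAILABILITY_FIRST /* e.g. telco: raise alarms and keep running */
};

enum error_action { ACTION_RECOVER, ACTION_ALARM_AND_CONTINUE, ACTION_PANIC };

/* 'dma_may_be_stray': the device might have written outside its buffers.
 * 'iommu_confined': an IOMMU restricted it to its own mappings anyway. */
static enum error_action choose_action(enum site_policy p,
                                       bool dma_may_be_stray,
                                       bool iommu_confined)
{
    assert(p == SITE_INTEGRITY_FIRST || p == SITE_AVAILABILITY_FIRST);
    if (!dma_may_be_stray || iommu_confined)
        return ACTION_RECOVER; /* memory outside the device's mappings intact */
    if (p == SITE_AVAILABILITY_FIRST)
        return ACTION_ALARM_AND_CONTINUE; /* "sound the alarms and pray" */
    return ACTION_PANIC; /* memory may be corrupt: stop before it spreads */
}
```

Only the combination "DMA may have gone astray, nothing confined it, and the site values integrity" leads to a panic; every other case attempts recovery or keeps running with alarms raised.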
* Re: Sym53C8xx Driver Hardening
From: Gérard Roudier @ 2002-07-26 22:25 UTC
To: Jeremy Higdon
Cc: Alan Cox, Rogier Wolff, Isabelle, Francois, linux-scsi
On Thu, 25 Jul 2002, Jeremy Higdon wrote:
> On Jul 25, 1:11am, Gérard Roudier wrote:
> >
> > By the way, the sym53c8xx_2 driver (and probably version 1 too) never
> > intentionally panics the system on hardware failure detection.
> > The couple of calls to panic you can see in the driver are related to
> > software unexpected situations.
> > In my opinion, serious high reliability requires special hardware support.
> >
> > On serious hardware error detected, the driver simply tries to reset
> > everything it can in order to have a chance to restart operations
> > properly. Maybe it should count such events and give up in some way after
> > some given number of retries. If you just suggest such 'give up'
> > operations to disable the device this should be doable, but this will not
> > make upper layers aware of the situation and certainly not be what most
> > users expect. This MEANS that serious high reliability ALSO requires
> > special SOFTWARE support and USER utilities, and certainly NOT ONLY be
> > based on some questionable trivial tinkering in device drivers, in my
> > opinion.
> >
> > Gérard.
>
> Also, with some sorts of errors, one might suspect that the device has
> DMA'd data into the wrong place in memory.
>
> Generally, I think a crash is preferable to data corruption (i.e. no
> answer is better than a wrong answer), so in such cases, the driver
> might want to panic.
The sym53c8xx driver enables every error check it has under its control on the data paths. If the driver gets the error, this ideally means either that it is not a fatal system error, or that the user or OS does not want it handled as one. Otherwise, some NMI should occur and the system should halt.
So, my opinion is that a hardware error that could be handled as a system error, but is not, should be considered an invitation to try to recover. Hence, the driver's response is the appropriate one, in my opinion.
> If hardware can place a fence around DMA (such that DMA to unintended
> locations is impossible), then you might not want to panic in such
> situations, assuming that you can retry.
You may be dreaming there. At least, I have never heard of such magic in the PCI world.
Regards,
Gérard.