* RE: PCI Error Recovery API Proposal. (WAS:: [PATCH/RFC]PCIErrorRecovery)

From: Nguyen, Tom L
Date: 2005-03-18 17:24 UTC
To: Paul Mackerras
Cc: Benjamin Herrenschmidt, Hidetoshi Seto, Greg KH, linux-kernel, ak,
    linuxppc64-dev, linux-pci, Nguyen, Tom L

On Thursday, March 17, 2005 8:01 PM Paul Mackerras wrote:
> Does the PCI Express AER specification define an API for drivers?

No. That is why we agreed on a general API that works for all
platforms.

> Likewise, with EEH the device driver could take recovery action on its
> own. But we don't want to end up with multiple sets of recovery code
> in drivers, if possible. Also we want the recovery code to be as
> simple as possible, otherwise driver authors will get it wrong.

Drivers own their devices' register sets. Therefore, if there are any
vendor-unique actions the driver can take to recover, we expect the
driver to take them -- for example, if the driver sees an "xyz" error
for which there is a known erratum whose workaround involves resetting
some registers on the card. From our perspective, drivers take care of
their own cards, while the AER driver and your platform code take care
of the bus/link interfaces.

> I would see the AER driver as being included in the "platform" code.
> The AER driver would be closely involved in the recovery process.

Our goal is to have the AER driver be part of the general code base,
because it is based on a PCI SIG specification that can be implemented
across all architectures.

> What is the state of a link during the time between when an error is
> detected and when a link reset is done? Is the link usable? What
> happens if you try to do an MMIO read from a device downstream of the
> link?

For a FATAL error the link is "unreliable". This means MMIO operations
may or may not succeed. That is why the reset is performed by the
upstream port driver; the interface to that is reliable. A reset of an
upstream port will propagate to all downstream links, so we need an
interface to the bus/port driver to request a reset on its downstream
link. We don't want the AER driver writing the port bus driver's
bridge control registers. We are trying to keep ownership of a
device's register reads/writes within the domain of that device's
driver -- in our case, the port bus driver.

Thanks,
Long
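To make the driver/platform split described above concrete, here is a
minimal sketch of driver-owned recovery hooks. The struct and callback
names are hypothetical illustrations (loosely resembling the error
handlers Linux later adopted), not the API actually proposed in this
thread.

/*
 * Hypothetical sketch of driver-owned recovery hooks: the driver
 * handles device-specific cleanup and errata workarounds, while the
 * platform/AER code owns the link reset.  Names are illustrative.
 */
#include <linux/pci.h>

struct pci_err_handlers {
	/* Link may be unreliable; quiesce, don't trust MMIO reads. */
	int (*error_detected)(struct pci_dev *dev, int severity);
	/* Called after the upstream port driver has reset the link. */
	int (*slot_reset)(struct pci_dev *dev);
	/* Recovery is complete; restart pending I/O. */
	void (*resume)(struct pci_dev *dev);
};

static int foo_error_detected(struct pci_dev *dev, int severity)
{
	/*
	 * On a FATAL error the link is unreliable, so avoid MMIO
	 * reads here; quiesce the device and wait for the platform
	 * (AER driver / port bus driver) to reset the link.
	 */
	pci_disable_device(dev);
	return 0;
}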
* Re: PCI Error Recovery API Proposal. (WAS:: [PATCH/RFC]PCIErrorRecovery)

From: Grant Grundler
Date: 2005-03-18 18:10 UTC
To: Nguyen, Tom L
Cc: Paul Mackerras, Benjamin Herrenschmidt, Hidetoshi Seto, Greg KH,
    linux-kernel, ak, linuxppc64-dev, linux-pci

On Fri, Mar 18, 2005 at 09:24:02AM -0800, Nguyen, Tom L wrote:
> > Likewise, with EEH the device driver could take recovery action on its
> > own. But we don't want to end up with multiple sets of recovery code
> > in drivers, if possible. Also we want the recovery code to be as
> > simple as possible, otherwise driver authors will get it wrong.
>
> Drivers own their devices' register sets. Therefore, if there are any
> vendor-unique actions the driver can take to recover, we expect the
> driver to take them.
...

All drivers also need to clean up driver state if they can't simply
recover (and restart pending I/Os), i.e. they need to release DMA
resources and return suitable errors for pending requests.

> > I would see the AER driver as being included in the "platform" code.
> > The AER driver would be closely involved in the recovery process.
>
> Our goal is to have the AER driver be part of the general code base
> because it is based on a PCI SIG specification that can be implemented
> across all architectures.

To the driver writer, it's all "platform" code. Folks who maintain PCI
(and other) services differentiate between "generic" and
"arch/platform" specific. Think first like a driver writer, and then
worry about if/how that can be divided between platform-generic and
platform/arch-specific code.

Even PCI Express has *some* arch-specific component. At a minimum,
each architecture has its own chipset and firmware to deal with for
PCI Express bus discovery and initialization. But driver writers don't
have to worry about that, and they shouldn't for error recovery
either.

> For a FATAL error the link is "unreliable". This means MMIO operations
> may or may not succeed. That is why the reset is performed by the
> upstream port driver; the interface to that is reliable. A reset of an
> upstream port will propagate to all downstream links, so we need an
> interface to the bus/port driver to request a reset on its downstream
> link. We don't want the AER driver writing the port bus driver's
> bridge control registers. We are trying to keep ownership of a
> device's register reads/writes within the domain of that device's
> driver -- in our case, the port bus driver.

A port bus driver does NOT sound like a normal device driver. If PCI
Express defines a standard register set for a bridge device (like PCI
Config space for PCI-PCI bridges), then I don't see a problem with PCI
Express error-handling code mucking with those registers. Look at how
PCI-PCI bridges are supported today, and which bits of code poke
registers on PCI-PCI bridges.

hth,
grant
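Grant's point -- that generic code can safely poke architected
PCI-PCI bridge registers -- can be illustrated with a hedged sketch: a
secondary bus reset driven purely through the standard bridge control
register, with no device-specific knowledge. The helper name and the
delay values are assumptions for illustration only.

/*
 * Sketch: reset everything below a PCI-PCI bridge using only the
 * standard bridge control register.  A reset asserted at the bridge
 * propagates to all downstream devices.  Helper name and delays are
 * assumptions, not from the thread.
 */
#include <linux/pci.h>
#include <linux/delay.h>

static void sketch_secondary_bus_reset(struct pci_dev *bridge)
{
	u16 ctrl;

	pci_read_config_word(bridge, PCI_BRIDGE_CONTROL, &ctrl);

	/* Assert secondary bus reset; all downstream devices see it. */
	pci_write_config_word(bridge, PCI_BRIDGE_CONTROL,
			      ctrl | PCI_BRIDGE_CTL_BUS_RESET);
	msleep(2);

	/* Deassert and give downstream devices time to come back. */
	pci_write_config_word(bridge, PCI_BRIDGE_CONTROL, ctrl);
	msleep(200);
}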
* Re: PCI Error Recovery API Proposal. (WAS:: [PATCH/RFC]PCIErrorRecovery)

From: Benjamin Herrenschmidt
Date: 2005-03-18 23:13 UTC
To: Grant Grundler
Cc: Nguyen, Tom L, Paul Mackerras, Hidetoshi Seto, Greg KH,
    Linux Kernel list, ak, linuxppc64-dev, linux-pci

On Fri, 2005-03-18 at 11:10 -0700, Grant Grundler wrote:
> On Fri, Mar 18, 2005 at 09:24:02AM -0800, Nguyen, Tom L wrote:
> > > Likewise, with EEH the device driver could take recovery action on
> > > its own. But we don't want to end up with multiple sets of recovery
> > > code in drivers, if possible. Also we want the recovery code to be
> > > as simple as possible, otherwise driver authors will get it wrong.
> >
> > Drivers own their devices' register sets. Therefore, if there are any
> > vendor-unique actions the driver can take to recover, we expect the
> > driver to take them.
> ...
>
> All drivers also need to clean up driver state if they can't simply
> recover (and restart pending I/Os), i.e. they need to release DMA
> resources and return suitable errors for pending requests.

Additionally, in "real life", very few errors are caused by known
errata. If the drivers know about the errata, they usually already
work around them. AFAIK, most of the errors are caused by transient
conditions on the bus or the device, like a bit being flipped, or
thermal conditions...

> To the driver writer, it's all "platform" code. Folks who maintain
> PCI (and other) services differentiate between "generic" and
> "arch/platform" specific. Think first like a driver writer, and then
> worry about if/how that can be divided between platform-generic and
> platform/arch-specific code.
>
> Even PCI Express has *some* arch-specific component. At a minimum,
> each architecture has its own chipset and firmware to deal with for
> PCI Express bus discovery and initialization. But driver writers
> don't have to worry about that, and they shouldn't for error recovery
> either.

Exactly. A given platform could use Intel's code as-is, or may choose
to do things differently while still showing the same interface to
drivers. Eventually we may end up adding platform hooks to the generic
PCIE code, like we have in the PCI code, if some platforms require
them.
* Real-life pci errors (Was: Re: PCI Error Recovery API Proposal. (WAS:: [PATCH/RFC]PCIErrorRecovery)

From: Linas Vepstas
Date: 2005-03-19 0:35 UTC
To: Benjamin Herrenschmidt
Cc: Grant Grundler, Nguyen, Tom L, Paul Mackerras, Hidetoshi Seto,
    Greg KH, Linux Kernel list, ak, linuxppc64-dev, linux-pci

On Sat, Mar 19, 2005 at 10:13:02AM +1100, Benjamin Herrenschmidt was
heard to remark:
>
> Additionally, in "real life", very few errors are caused by known
> errata. If the drivers know about the errata, they usually already
> work around them. AFAIK, most of the errors are caused by transient
> conditions on the bus or the device, like a bit being flipped, or
> thermal conditions...

Heh. Let me describe "real life" a bit more accurately.

We've been running with PCI error detection enabled here for the last
two years. Based on this experience, the ballpark figures are:

  90% of all detected errors were device driver bugs coupled to
      PCI card hardware errata

   9% poorly seated PCI cards (remove/reseat will make the problem
      go away)

   1% transient/other

We've seen *EVERY*, and I mean *EVERY*, device driver that we've put
under stress tests (e.g. peak I/O rates for > 72 hours, massive
TCP/NFS traffic, massive disk I/O traffic, etc.) trip on an EEH error
detect that was traced back to a device driver bug. Not to blame the
drivers: a lot of these were related to PCI card hardware/firmware
bugs. For example, I think grepping for "split completion" and "NAPI"
in the patches/errata for e100 and e1000 for the last year will reveal
some of the stuff that was found. As far as I know, for every bug
found, a patch made it into mainline.

As a rule, finding these device driver bugs was very hard; we had some
people work on them for months, and in the case of the e1000 we
managed to get Intel engineers to fly out here and stare at PCI bus
traces for a few days. (Thanks, Intel!) Ditto for Emulex. For ipr, we
had in-house people.

So overall, PCI error detection did have the expected effect
(protecting the kernel from corruption, e.g. due to DMAs going to wild
addresses), but I don't think anybody expected that the vast majority
of errors would be software/hardware bugs instead of transient
effects.

What's ironic in all of this is that by adding error recovery, device
driver bugs will be able to hide more effectively: if there's a PCI
bus error due to a driver bug, the PCI card will get rebooted, the
kernel will burp for 3 seconds, things will keep going, and most
sysadmins won't notice or won't care.

--linas
* Re: Real-life pci errors (Was: Re: PCI Error Recovery API Proposal. (WAS:: [PATCH/RFC]PCIErrorRecovery)

From: Benjamin Herrenschmidt
Date: 2005-03-19 1:24 UTC
To: Linas Vepstas
Cc: Grant Grundler, Nguyen, Tom L, Paul Mackerras, Hidetoshi Seto,
    Greg KH, Linux Kernel list, ak, linuxppc64-dev, linux-pci

On Fri, 2005-03-18 at 18:35 -0600, Linas Vepstas wrote:
> Heh. Let me describe "real life" a bit more accurately.
>
> We've been running with PCI error detection enabled here for the last
> two years. Based on this experience, the ballpark figures are:
>
>   90% of all detected errors were device driver bugs coupled to
>       PCI card hardware errata

Well, that has been in-lab testing to fight driver bugs/errata on
early-release kernels; I'm talking about the context of a released
solution with stable drivers/hardware.

>    9% poorly seated PCI cards (remove/reseat will make the problem
>       go away)
>
>    1% transient/other

Ok.

> We've seen *EVERY*, and I mean *EVERY*, device driver that we've put
> under stress tests (e.g. peak I/O rates for > 72 hours, massive
> TCP/NFS traffic, massive disk I/O traffic, etc.) trip on an EEH error
> detect that was traced back to a device driver bug. Not to blame the
> drivers: a lot of these were related to PCI card hardware/firmware
> bugs. For example, I think grepping for "split completion" and "NAPI"
> in the patches/errata for e100 and e1000 for the last year will
> reveal some of the stuff that was found. As far as I know, for every
> bug found, a patch made it into mainline.

Yah, those are a pain. But then, it isn't the context described by
Nguyen, where the driver "knows" about the errata and how to recover.
It's the context of a bug where the driver does not know what's going
on and/or doesn't have the proper workaround. My point was more that
there are very few cases where a driver will have to do recovery of a
PCI error in a known case where it actually expects an error to
happen.

> As a rule, finding these device driver bugs was very hard; we had
> some people work on them for months, and in the case of the e1000 we
> managed to get Intel engineers to fly out here and stare at PCI bus
> traces for a few days. (Thanks, Intel!) Ditto for Emulex. For ipr, we
> had in-house people.
>
> So overall, PCI error detection did have the expected effect
> (protecting the kernel from corruption, e.g. due to DMAs going to
> wild addresses), but I don't think anybody expected that the vast
> majority of errors would be software/hardware bugs instead of
> transient effects.
>
> What's ironic in all of this is that by adding error recovery, device
> driver bugs will be able to hide more effectively: if there's a PCI
> bus error due to a driver bug, the PCI card will get rebooted, the
> kernel will burp for 3 seconds, things will keep going, and most
> sysadmins won't notice or won't care.

Yes, but it will be logged at least, so we'll spot a lot of these
during our tests.

Ben.