From: Benjamin Herrenschmidt <benh@kernel.crashing.org>
To: Jeff Garzik <jgarzik@pobox.com>
Cc: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>,
Linux Kernel list <linux-kernel@vger.kernel.org>,
linux-pci@atrey.karlin.mff.cuni.cz, linux-ia64@vger.kernel.org,
Linus Torvalds <torvalds@osdl.org>,
Linas Vepstas <linas@austin.ibm.com>,
"Luck, Tony" <tony.luck@intel.com>
Subject: Re: [PATCH/RFC] I/O-check interface for driver's error handling
Date: Wed, 02 Mar 2005 09:15:40 +1100
Message-ID: <1109715340.5680.34.camel@gaston>
In-Reply-To: <42249A44.4020507@pobox.com>
> I have been thinking about PCI system and parity errors, and how to
> handle them. I do not think this is the correct approach.
>
> A simple retry is... too simple. If you are having a massive problem on
> your PCI bus, more action should be taken than a retry.
It goes beyond that, see below.
> In my opinion each driver needs to be aware of PCI sys/parity errs, and
> handle them. For network drivers, this is rather simple -- check the
> hardware, then restart the DMA engine. Possibly turning off
> TSO/checksum to guarantee that bad packets are not accepted. For SATA
> and SCSI drivers, this is more complex, as one must retry a number of
> queued disk commands, after resetting the hardware.
>
> A new API handles none of this.
On IBM pSeries machines (and I'm trying to figure out an API to deal with
that generically for drivers), upon a PCI error (either an MMIO error or
a DMA error), the slot is put in isolation automatically.
From this point, we can instruct the firmware to 1) re-enable MMIO, 2)
re-enable DMA, 3) proceed to a slot reset and re-enable MMIO & DMA.
That allows all sorts of recovery strategies. However, obviously, not all
architectures provide those facilities.
So I'm looking into a way to expose a generic API to drivers that would
allow them to use those facilities when present, and/or fall back to
whatever they can do when not (just a retry, or even no recovery).
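As a rough illustration of the "use the facilities when present, fall back when not" idea, the sketch below models the three firmware actions as capability bits and picks a recovery action from them. Every name here (io_recovery_cap, io_pick_recovery, and so on) is hypothetical and not an existing kernel interface, and the preference order (least disruptive first) is an assumption, not something stated in this thread.

```c
/* Hypothetical capability bits a platform might advertise for a slot. */
enum io_recovery_cap {
    IO_CAP_ENABLE_MMIO = 1 << 0,  /* can re-enable MMIO on an isolated slot */
    IO_CAP_ENABLE_DMA  = 1 << 1,  /* can re-enable DMA on an isolated slot */
    IO_CAP_SLOT_RESET  = 1 << 2,  /* can reset the slot, then re-enable both */
};

/* Hypothetical recovery actions, from least to most helpful platform. */
enum io_recovery_action {
    IO_ACT_NONE,      /* nothing can be done: no recovery */
    IO_ACT_RETRY,     /* no platform help: the driver just retries */
    IO_ACT_REENABLE,  /* re-enable MMIO and DMA without a reset */
    IO_ACT_RESET,     /* full slot reset, then re-enable MMIO & DMA */
};

/*
 * Pick a recovery action from what the platform offers.
 * Assumed policy: prefer re-enabling over a reset, a reset over a plain
 * retry, and a retry only if the driver says it can do one.
 */
enum io_recovery_action io_pick_recovery(unsigned int caps, int driver_can_retry)
{
    unsigned int reenable = IO_CAP_ENABLE_MMIO | IO_CAP_ENABLE_DMA;

    if ((caps & reenable) == reenable)
        return IO_ACT_REENABLE;
    if (caps & IO_CAP_SLOT_RESET)
        return IO_ACT_RESET;
    if (driver_can_retry)
        return IO_ACT_RETRY;
    return IO_ACT_NONE;
}
```

A platform without any of these facilities would pass caps == 0, so the same driver code degrades to a retry or to no recovery at all.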
I have some ideas, but am not fully happy with them yet. But part of the
problem is the notification of the driver.
Checking I/O is one thing; what to do once a failure is detected is
another. Also, we need asynchronous notification, since a driver may
well be idle, not doing any I/O, while the bus segment on which it's
sitting is getting isolated because another card on the same segment (or
another function on the same card) triggered an error.
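The need for asynchronous notification can be sketched in plain C: when a segment is isolated, every device on it is told, including one that issued no I/O and so would never notice through an I/O check. The types and functions below (seg_dev, bus_segment, segment_isolate, seg_dev_mark_notified) are hypothetical names for illustration only.

```c
#include <stddef.h>

/* Hypothetical per-device record for devices sharing a bus segment. */
struct seg_dev {
    const char *name;
    int notified;  /* set once the device has been told about the isolation */
    void (*on_isolated)(struct seg_dev *dev, const struct seg_dev *culprit);
};

/* Hypothetical bus segment: a set of devices that are isolated together. */
struct bus_segment {
    struct seg_dev **devs;
    size_t ndevs;
    int isolated;
};

/* Example callback: just records that the notification arrived. */
void seg_dev_mark_notified(struct seg_dev *dev, const struct seg_dev *culprit)
{
    (void)culprit;  /* a real driver might check whether it caused the error */
    dev->notified = 1;
}

/*
 * Isolate the segment and notify every device on it, whether or not it
 * was doing I/O at the time and whether or not it triggered the error.
 */
void segment_isolate(struct bus_segment *seg, const struct seg_dev *culprit)
{
    size_t i;

    seg->isolated = 1;
    for (i = 0; i < seg->ndevs; i++) {
        if (seg->devs[i]->on_isolated)
            seg->devs[i]->on_isolated(seg->devs[i], culprit);
    }
}
```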
Then, we need at least several back-and-forth callbacks. I'm thinking
about an additional callback in struct pci_driver with a message and a
state indicating what happened, and returning whether to proceed or not.
I'll try to write down the details in a later email.
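One possible shape for such a callback, as a userspace sketch only: the driver receives a message and a state and answers whether recovery should proceed. The enum values, struct sketch_driver, and run_recovery() are hypothetical stand-ins, not the real struct pci_driver or any existing kernel interface, and the sequence of steps is an assumption based on the firmware actions described above.

```c
#include <stddef.h>

/* Hypothetical messages sent to the driver during recovery. */
enum err_msg {
    ERR_MSG_ERROR_DETECTED,  /* the slot has been isolated */
    ERR_MSG_MMIO_ENABLED,    /* MMIO is back, DMA is still off */
    ERR_MSG_SLOT_RESET,      /* slot was reset, MMIO & DMA re-enabled */
    ERR_MSG_RESUME,          /* normal operation may restart */
};

enum err_state { ERR_STATE_ISOLATED, ERR_STATE_MMIO_ONLY, ERR_STATE_NORMAL };

enum err_result { ERR_RESULT_PROCEED, ERR_RESULT_GIVE_UP };

/* Hypothetical stand-in for a driver structure carrying the new callback. */
struct sketch_driver {
    const char *name;
    int events_seen;
    enum err_result (*error_event)(struct sketch_driver *drv,
                                   enum err_msg msg, enum err_state state);
};

/*
 * Walk the driver through the back-and-forth steps. Returns 1 if the
 * driver agreed to every step, 0 if it gave up or has no callback.
 */
int run_recovery(struct sketch_driver *drv)
{
    static const struct { enum err_msg msg; enum err_state state; } steps[] = {
        { ERR_MSG_ERROR_DETECTED, ERR_STATE_ISOLATED },
        { ERR_MSG_MMIO_ENABLED,   ERR_STATE_MMIO_ONLY },
        { ERR_MSG_SLOT_RESET,     ERR_STATE_NORMAL },
        { ERR_MSG_RESUME,         ERR_STATE_NORMAL },
    };
    size_t i;

    if (!drv->error_event)
        return 0;
    for (i = 0; i < sizeof(steps) / sizeof(steps[0]); i++) {
        drv->events_seen++;
        if (drv->error_event(drv, steps[i].msg, steps[i].state)
            != ERR_RESULT_PROCEED)
            return 0;
    }
    return 1;
}

/* Example driver callback that agrees to every step. */
enum err_result cooperative_event(struct sketch_driver *drv,
                                  enum err_msg msg, enum err_state state)
{
    (void)drv; (void)msg; (void)state;
    return ERR_RESULT_PROCEED;
}

/* Example driver callback that cannot survive a slot reset. */
enum err_result no_reset_event(struct sketch_driver *drv,
                               enum err_msg msg, enum err_state state)
{
    (void)drv; (void)state;
    return msg == ERR_MSG_SLOT_RESET ? ERR_RESULT_GIVE_UP : ERR_RESULT_PROCEED;
}
```

A driver without the callback is treated here as not recoverable, which is one way to keep unmodified drivers working as before.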
Finally, another issue is the type of error information. Different
systems may provide different details; some systems, for example, can
provide the actual address that faulted upon a DMA error. That
information can be very useful for diagnosing the issue (since some
errors are actual bugs; for example, we spent a lot of time chasing
issues with e1000 vs. barriers). An "error cookie" is, I think, a good
idea, eventually with various accessors to extract data from it, and
maybe a function to dump the content in ASCII form into some buffer...
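A minimal sketch of what an "error cookie" with accessors and an ASCII dump could look like. The struct layout, the field names, and the io_err_* functions are all hypothetical; in a real design the cookie would be opaque to drivers and filled in by platform code.

```c
#include <stdio.h>
#include <stdint.h>

enum io_err_kind { IO_ERR_MMIO, IO_ERR_DMA };

/* Hypothetical error cookie; drivers would only use the accessors. */
struct io_err_cookie {
    enum io_err_kind kind;
    int has_addr;   /* nonzero if the platform reported a faulting address */
    uint64_t addr;  /* faulting address, valid only if has_addr is set */
};

/* Accessor: what kind of error was it? */
enum io_err_kind io_err_get_kind(const struct io_err_cookie *c)
{
    return c->kind;
}

/*
 * Accessor: faulting DMA address, if the platform provided one.
 * Returns 0 and fills *addr on success, -1 if no address is available.
 */
int io_err_get_dma_addr(const struct io_err_cookie *c, uint64_t *addr)
{
    if (c->kind != IO_ERR_DMA || !c->has_addr)
        return -1;
    *addr = c->addr;
    return 0;
}

/* Dump the cookie in ASCII form into buf; returns snprintf()'s result. */
int io_err_dump(const struct io_err_cookie *c, char *buf, size_t len)
{
    const char *kind = c->kind == IO_ERR_DMA ? "DMA" : "MMIO";

    if (c->has_addr)
        return snprintf(buf, len, "%s error at 0x%llx", kind,
                        (unsigned long long)c->addr);
    return snprintf(buf, len, "%s error, address unknown", kind);
}
```

Keeping the details behind accessors lets a system that reports nothing beyond the error kind share the same interface as one that reports the faulting address.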
Ben.