From: Benjamin Herrenschmidt <benh@kernel.crashing.org>
To: Jesse Barnes <jbarnes@engr.sgi.com>
Cc: linux-pci@atrey.karlin.mff.cuni.cz,
Matthew Wilcox <matthew@wil.cx>,
Linus Torvalds <torvalds@osdl.org>,
Jeff Garzik <jgarzik@pobox.com>,
Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>,
Linux Kernel list <linux-kernel@vger.kernel.org>,
linux-ia64@vger.kernel.org, Linas Vepstas <linas@austin.ibm.com>,
"Luck, Tony" <tony.luck@intel.com>
Subject: Re: [PATCH/RFC] I/O-check interface for driver's error handling
Date: Wed, 02 Mar 2005 09:23:47 +1100 [thread overview]
Message-ID: <1109715827.5679.44.camel@gaston> (raw)
In-Reply-To: <200503010910.29460.jbarnes@engr.sgi.com>
On Tue, 2005-03-01 at 09:10 -0800, Jesse Barnes wrote:
> On Tuesday, March 1, 2005 8:59 am, Matthew Wilcox wrote:
> > The MCA handler has to go and figure out what the hell just happened
> > (was it a DIMM error, PCI bus error, etc). OK, fine, it finds that it
> > was an error on PCI bus 73. At this point, I think the architecture
> > error handler needs to call into the PCI subsystem and say "Hey, there
> > was an error, you deal with it".
> >
> > If we're lucky, we get all the information that allows us to figure
> > out which device it was (eg a destination address that matches a BAR),
> > then we could have a ->error method in the pci_driver that handles it.
> > If there's no ->error method, at leat call ->remove so one device only
> > takes itself down.
> >
> > Does this make sense?
>
> This was my thought too last time we had this discussion. A completely
> asynchronous call is probably needed in addition to Hidetoshi's proposed API,
> since as you point out, the driver may not be running when an error occurs
> (e.g. in the case of a DMA error or more general bus problem). The async
> ->error callback could do a total reset of the card, or something along those
> lines as Jeff suggests, while the inline ioerr_clear/ioerr_check API could
> potentially deal with errors as they happen (probably in the case of PIO
> related errors), when the additional context may allow us to be smarter about
> recovery.
What I think we need is an async call that takes:
- an opaque blob with the error informations and accessors (see my
reply to Jeff)
- a slot state (slot isolated, slot has been reset, slot has IOs /DMA
enabled/disabled, must probably be a bitmask)
- a bit mask of possible actions the driver can request (nothing, reset
slot, re-enable IOs, ...)
I'm afraid tho that the combinatory explosion will make it difficult to
drivers to deal with the right thing. Maybe we can simplify the mecanism
to archs that can just 1) re-enable IOs, 2) reset slot.
Note that the reason to re-enable IOs before trying to reset the slot is
that some devices will allow you to extract diagnostic informations, and
it's always useful to gather as much informations as possible upon an
error of this type.
Ben.
next prev parent reply other threads:[~2005-03-01 22:26 UTC|newest]
Thread overview: 48+ messages / expand[flat|nested] mbox.gz Atom feed top
2005-03-01 8:33 [PATCH/RFC] I/O-check interface for driver's error handling Hidetoshi Seto
2005-03-01 14:42 ` Matthew Wilcox
2005-03-01 19:27 ` Linas Vepstas
2005-03-01 19:37 ` Linus Torvalds
2005-03-02 6:13 ` Hidetoshi Seto
2005-03-02 19:20 ` Linas Vepstas
2005-03-04 2:03 ` Hidetoshi Seto
2005-03-04 16:46 ` Linas Vepstas
2005-03-01 16:37 ` Jeff Garzik
2005-03-01 16:49 ` Linus Torvalds
2005-03-01 16:59 ` Matthew Wilcox
2005-03-01 17:10 ` Jesse Barnes
2005-03-01 18:33 ` Linas Vepstas
2005-03-01 22:27 ` Benjamin Herrenschmidt
2005-03-02 20:02 ` Linas Vepstas
2005-03-02 22:46 ` Benjamin Herrenschmidt
2005-03-02 23:37 ` Linas Vepstas
2005-03-01 22:23 ` Benjamin Herrenschmidt [this message]
2005-03-02 3:13 ` Hidetoshi Seto
2005-03-04 13:54 ` Pavel Machek
2005-03-04 17:50 ` Jesse Barnes
2005-03-04 22:37 ` Benjamin Herrenschmidt
2005-03-04 22:57 ` Pavel Machek
2005-03-04 23:03 ` Benjamin Herrenschmidt
2005-03-04 23:18 ` Pavel Machek
2005-03-04 23:27 ` Benjamin Herrenschmidt
2005-03-02 2:28 ` Hidetoshi Seto
2005-03-02 17:44 ` Linas Vepstas
2005-03-02 18:03 ` linux-os
2005-03-02 22:40 ` Benjamin Herrenschmidt
2005-03-04 2:21 ` Hidetoshi Seto
2005-03-01 22:20 ` Benjamin Herrenschmidt
2005-03-02 18:22 ` Linas Vepstas
2005-03-02 18:41 ` Jesse Barnes
2005-03-02 19:46 ` Linas Vepstas
2005-03-02 22:43 ` Benjamin Herrenschmidt
2005-03-02 22:41 ` Benjamin Herrenschmidt
2005-03-02 23:30 ` Linas Vepstas
2005-03-02 23:40 ` Jesse Barnes
2005-03-01 19:17 ` Linas Vepstas
2005-03-01 22:15 ` Benjamin Herrenschmidt
2005-03-01 17:19 ` Andi Kleen
2005-03-01 18:08 ` Linus Torvalds
2005-03-01 18:45 ` Andi Kleen
2005-03-01 18:59 ` Linas Vepstas
2005-03-01 22:26 ` Benjamin Herrenschmidt
2005-03-01 22:24 ` Benjamin Herrenschmidt
2005-03-04 12:40 ` Hidetoshi Seto
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1109715827.5679.44.camel@gaston \
--to=benh@kernel.crashing.org \
--cc=jbarnes@engr.sgi.com \
--cc=jgarzik@pobox.com \
--cc=linas@austin.ibm.com \
--cc=linux-ia64@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-pci@atrey.karlin.mff.cuni.cz \
--cc=matthew@wil.cx \
--cc=seto.hidetoshi@jp.fujitsu.com \
--cc=tony.luck@intel.com \
--cc=torvalds@osdl.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox