From: Linas Vepstas <linas@austin.ibm.com>
To: Jesse Barnes <jbarnes@engr.sgi.com>
Cc: linux-pci@atrey.karlin.mff.cuni.cz,
Matthew Wilcox <matthew@wil.cx>,
Linus Torvalds <torvalds@osdl.org>,
Jeff Garzik <jgarzik@pobox.com>,
Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>,
Linux Kernel list <linux-kernel@vger.kernel.org>,
linux-ia64@vger.kernel.org,
Benjamin Herrenschmidt <benh@kernel.crashing.org>,
"Luck, Tony" <tony.luck@intel.com>
Subject: Re: [PATCH/RFC] I/O-check interface for driver's error handling
Date: Tue, 01 Mar 2005 18:33:33 +0000
Message-ID: <20050301183333.GB1220@austin.ibm.com>
In-Reply-To: <200503010910.29460.jbarnes@engr.sgi.com>
On Tue, Mar 01, 2005 at 09:10:29AM -0800, Jesse Barnes was heard to remark:
> On Tuesday, March 1, 2005 8:59 am, Matthew Wilcox wrote:
> > The MCA handler has to go and figure out what the hell just happened
> > (was it a DIMM error, PCI bus error, etc).
I assume "MCA" stands for machine check architecture .. except that
is not how it currently works, at least not on the systems I work with.
The PCI bridge chips on many IBM ppc64 pSeries boxes can detect
PCI errors and handle them "cleanly", without causing machine checks;
so can some PCI-Express chips (Seto is the expert on PCI-Express,
I'm not).
On ppc64, after a PCI error, the pci slot is "isolated": all i/o
to and from the device is cut off (including dma). I/O reads return
all 0xff's (which is what an empty pci/pcmcia slot returns).
There are three low-level firmware APIs:
-- ask if a slot is "isolated" (returns yes/no)
-- reset the pci card (assert the #RST pci signal)
-- un-isolate the pci slot
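To make the isolation behaviour and the three firmware calls concrete,
here is a minimal user-space sketch. It is a toy model, not the ppc64
firmware interface: struct slot, the fw_slot_*() names, slot_read32()
and read_hit_isolated_slot() are all made up for illustration. The one
point it tries to show is that an all-ones read is only a hint, since
0xffffffff can be legitimate data, so the driver confirms with the
"is it isolated?" query before treating it as an error.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one PCI slot; not a real firmware structure. */
struct slot {
    bool isolated;   /* set by the bridge after it detects a PCI error */
    uint32_t reg;    /* stand-in for a single device register */
};

/* Firmware call 1 (hypothetical name): is the slot isolated? */
bool fw_slot_is_isolated(const struct slot *s)
{
    assert(s != NULL);
    return s->isolated;
}

/* Firmware call 2 (hypothetical name): reset the card, modelling #RST.
 * The toy device simply returns to a zeroed power-on state. */
void fw_slot_reset(struct slot *s)
{
    assert(s != NULL);
    s->reg = 0;
}

/* Firmware call 3 (hypothetical name): un-isolate the slot. */
void fw_slot_unisolate(struct slot *s)
{
    assert(s != NULL);
    s->isolated = false;
}

/* A register read as the driver sees it: an isolated slot returns
 * all ones, just like an empty slot would. */
uint32_t slot_read32(const struct slot *s)
{
    assert(s != NULL);
    return s->isolated ? 0xffffffffu : s->reg;
}

/* Driver-side check: treat all-ones as an error only if firmware
 * confirms that the slot really is isolated. */
bool read_hit_isolated_slot(const struct slot *s, uint32_t val)
{
    return val == 0xffffffffu && fw_slot_is_isolated(s);
}
```

In this model, recovery is fw_slot_reset() followed by
fw_slot_unisolate(); the order is an assumption of the sketch.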
The current ppc64 code doesn't use Seto's API, but it could; that
is the direction I'm moving towards.
I don't know if PCI-Express "isolates" the slot; Seto, can you provide
an overview of what the PCI-Express spec says?
> > If we're lucky, we get all the information that allows us to figure
> > out which device it was (eg a destination address that matches a BAR),
Seto's API allows drivers to find out if their PCI slot is isolated.
So it works for me.
> > then we could have a ->error method in the pci_driver that handles it.
> > If there's no ->error method, at least call ->remove so one device only
> > takes itself down.
The tricky part is what to do with multi-function cards/slots
(e.g. a pci bus error on a bridge that has multiple devices under it).
In this case, multiple device drivers are affected. Thus, no single
device driver can handle the recovery of the bridge chip.
The current proposal (and prototype) has a "master recovery thread"
to handle the coordinated reset of the pci controller. This master
recovery thread makes three calls in struct pci_driver:
void (*frozen) (struct pci_dev *); /* called when dev is first frozen */
void (*thawed) (struct pci_dev *); /* called after card is reset */
void (*perm_failure) (struct pci_dev *); /* called if card is dead */
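As a rough illustration of how such a recovery thread could drive the
three callbacks for every device under one bridge, here is a
user-space sketch. It is not the prototype code: struct sketch_dev,
struct sketch_driver, recover_bridge() and the demo_* helpers are
invented for this example, and the ordering (freeze everyone, reset
once, then thaw or declare dead) is my reading of the description
above. It assumes every driver implements all three callbacks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum dev_state { DEV_RUNNING, DEV_FROZEN, DEV_DEAD };

struct sketch_dev;  /* stand-in for struct pci_dev */

/* Stand-in for the three proposed struct pci_driver members. */
struct sketch_driver {
    void (*frozen)(struct sketch_dev *);        /* dev is first frozen */
    void (*thawed)(struct sketch_dev *);        /* card has been reset */
    void (*perm_failure)(struct sketch_dev *);  /* card is dead */
};

struct sketch_dev {
    const struct sketch_driver *drv;
    enum dev_state state;
    int frozen_calls;   /* bookkeeping for the demo only */
};

/* Coordinated recovery of all devices behind one bridge.
 * reset_bridge is a hypothetical hook; it returns true on success. */
bool recover_bridge(struct sketch_dev **devs, size_t n,
                    bool (*reset_bridge)(void))
{
    size_t i;
    bool ok;

    assert(devs != NULL && reset_bridge != NULL);

    /* 1. Tell every affected driver that its device is frozen. */
    for (i = 0; i < n; i++)
        devs[i]->drv->frozen(devs[i]);

    /* 2. Reset the shared bridge exactly once. */
    ok = reset_bridge();

    /* 3. Report the outcome to every driver. */
    for (i = 0; i < n; i++) {
        if (ok)
            devs[i]->drv->thawed(devs[i]);
        else
            devs[i]->drv->perm_failure(devs[i]);
    }
    return ok;
}

/* A trivial demo driver that just records what it was told. */
void demo_frozen(struct sketch_dev *d) { d->state = DEV_FROZEN; d->frozen_calls++; }
void demo_thawed(struct sketch_dev *d) { d->state = DEV_RUNNING; }
void demo_perm_failure(struct sketch_dev *d) { d->state = DEV_DEAD; }

const struct sketch_driver demo_driver = {
    demo_frozen, demo_thawed, demo_perm_failure
};

/* Two fake bridge resets, one that works and one that does not. */
bool demo_reset_ok(void) { return true; }
bool demo_reset_fails(void) { return false; }
```

Calling recover_bridge() with demo_reset_ok leaves every device in
DEV_RUNNING; with demo_reset_fails every device ends up in DEV_DEAD.
No single driver touches the bridge, which is the point of having a
master thread.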
The master recovery thread runs in the kernel. Earlier suggestions said
"run it in user space, use pci hotplug, use udev, etc." However, if
you get a pci error on a scsi card, you can't shell script
"umount /dev/sdX; rmmod scsi; clear_pci_error; insmod scsi; mount /dev/sdX"
because you can't umount an open filesystem, and you can't really close
it (I fiddled with prototyping some of this, but it's ugly and painful
and bizarre and outside my area of expertise :)
FWIW, the current prototype tries to do a pci hotplug if the above
routines aren't implemented in struct pci_driver. It can recover
from pci errors on ethernet cards, and I have one scsi driver that
successfully recovers with the above API, and am working on adding recovery
to the symbios driver.
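The hotplug fallback amounts to a simple decision per driver, which
can be sketched as below. The names (struct recovery_ops,
choose_recovery_path(), noop_cb) are hypothetical, and requiring all
three callbacks before recovering in place is an assumption of this
sketch; the prototype may draw the line differently.

```c
#include <assert.h>
#include <stddef.h>

struct card;  /* opaque stand-in for struct pci_dev */

/* Stand-in for the three recovery callbacks in struct pci_driver. */
struct recovery_ops {
    void (*frozen)(struct card *);
    void (*thawed)(struct card *);
    void (*perm_failure)(struct card *);
};

enum recovery_path {
    RECOVER_IN_PLACE,    /* driver is told: frozen, then thawed or dead */
    RECOVER_BY_HOTPLUG   /* driver is unplugged and re-added instead */
};

/* Choose how to recover a device. Assumption: a driver that lacks any
 * of the callbacks (or has no ops at all) gets the hotplug treatment. */
enum recovery_path choose_recovery_path(const struct recovery_ops *ops)
{
    if (ops != NULL && ops->frozen && ops->thawed && ops->perm_failure)
        return RECOVER_IN_PLACE;
    return RECOVER_BY_HOTPLUG;
}

/* Do-nothing callback so the demo can build a "complete" driver. */
void noop_cb(struct card *c)
{
    (void)c;
}

/* Self-check of the decision for a complete and an empty driver. */
void check_recovery_paths(void)
{
    const struct recovery_ops full = { noop_cb, noop_cb, noop_cb };
    const struct recovery_ops none = { NULL, NULL, NULL };

    assert(choose_recovery_path(&full) == RECOVER_IN_PLACE);
    assert(choose_recovery_path(&none) == RECOVER_BY_HOTPLUG);
}
```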
--linas