public inbox for linux-kernel@vger.kernel.org
From: Linas Vepstas <linas@austin.ibm.com>
To: Paul Mackerras <paulus@samba.org>
Cc: John Rose <johnrose@austin.ibm.com>,
	benh@kernel.crashing.org, akpm@osdl.org, Greg KH <greg@kroah.com>,
	linux-kernel@vger.kernel.org, linuxppc64-dev@ozlabs.org,
	linux-pci@atrey.karlin.mff.cuni.cz
Subject: Re: [patch 8/8] PCI Error Recovery: PPC64 core recovery routines
Date: Thu, 25 Aug 2005 11:13:25 -0500
Message-ID: <20050825161325.GG25174@austin.ibm.com>
In-Reply-To: <17165.3205.505386.187453@cargo.ozlabs.ibm.com>

On Thu, Aug 25, 2005 at 10:10:45AM +1000, Paul Mackerras was heard to remark:
> Linas Vepstas writes:
> 
> > The meta-issue that I'd like to reach consensus on first is whether
> > there should be any hot-plug recovery attempted at all.  Removing
> > hot-plug-recovery support will make many of the issues you raise
> > moot.
> 
> Yes, this is probably the thorniest issue we have.
> 
> My feeling is that the unplug half of it is probably fairly
> uncontroversial, but the replug half is a can of worms.  Would you
> agree with that?

Actually, no.  There are three issues:
1) Hotplug routines are called from within the kernel. GregKH has stated on
   multiple occasions that doing this is wrong/bad/evil. This includes
   calling hot-unplug.

2) As a result, the code to call hot-unplug is a bit messy. In
   particular, there's a bit of hoop-jumping when hotplug is built as
   a module (and said hoops were wrecked recently when I moved the
   code around, out of the rpaphp directory).

3) Hot-unplug causes scripts to run in user-space. There is no way to
   know when these scripts are done, so it's not clear if we've waited
   long enough before calling hot-add (or if waiting is even necessary).

> Is it udev that handles the hotplug notifications on the userspace
> side in all cases (do both RHEL and SLES use udev, for instance)?

Yes, and it seems to work fine despite the fact that the current
SLES9/RHEL4 use rather oooooold versions of udev, which is criminal,
according to Kay Sievers.  I have not tested new versions of udev;
I assume new versions will work "even better".

> How hard is it to add a new sort of notification, on the kernel side
> and in udev?

Why? To achieve what goal?  (Keep in mind that user-space EEH solutions
seem to fail when the affected device is storage, since block devices
and filesystems don't like the underlying storage to wink out.)

> I think what I'd like to see is that when a slot gets isolated and the
> driver doesn't have recovery code, the kernel calls the driver's
> unplug function and generates a hotplug event to udev.  Ideally this
> would be a variant of the remove event which would say "and by the
> way, please try replugging this slot when you've finished handling the
> remove event" or something along those lines.

Ahh, yes, this addresses the timing issue raised in point 3).  However,
I'm thinking that the timing issue is not really an issue, depending on
how udev is designed.  For example, consider a 100% cpu loaded, heavy
i/o loaded PC, and now we rapidly unplug and replug some USB key.
Presumably, udev will handle this gracefully.  The kernel error recovery
essentially looks the same to udev: the burp on the bus looks like a
rapid-fire unplug-replug.
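
To make the analogy concrete, here is a toy Python sketch of a user-space
handler that applies events strictly in sequence-number order.  This is an
illustration only, not udev's code: the replay() function, the tuple layout,
and the device name are all hypothetical.  It shows just that, if events are
serialized by sequence number, a rapid remove/add pair for one device ends
in the right state even when the two events are delivered out of order.

```python
import heapq

def replay(events):
    """Apply (seqnum, action, device) events in seqnum order.

    Returns the set of devices considered present afterwards.
    Hypothetical model; real event handling is far more involved.
    """
    heap = list(events)
    heapq.heapify(heap)              # lowest sequence number first
    present = set()
    while heap:
        _seq, action, dev = heapq.heappop(heap)
        if action == "add":
            present.add(dev)
        elif action == "remove":
            present.discard(dev)
    return present

# An error recovery seen as remove-then-add, delivered out of order.
events = [(2, "add", "0000:01:00.0"), (1, "remove", "0000:01:00.0")]
final = replay(events)
```

Whether the scripts triggered by the remove have finished before the add is
handled is a separate question that this sketch does not model.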

BTW, zSeries has similar concerns, where channels can come and go,
causing thousands of unplug/replug udev events in rapid succession. 
(This was discussed on the udev mailing lists in the past).  It might
be interesting to have the zSeries folks discuss the current EEH design,
as this is something they have far more experience with than any of
the PC or Unix crowd.  I personally have not discussed this with any
zSeries people.

--linas

