* unclean yanking out of device?
@ 2004-01-14 22:00 linas
2004-01-14 22:15 ` Måns Rullgård
` (18 more replies)
0 siblings, 19 replies; 20+ messages in thread
From: linas @ 2004-01-14 22:00 UTC (permalink / raw)
To: linux-hotplug
Hi,
What is supposed to happen if I just yank out a (network/scsi) device
while it is being used, without calling any of the hotplug unregister
remove etc. functions in advance?
Yes, I'm picking through the archives, and reading the source code,
so I hope to have a clue soon ... but this kind of question is not
covered in the hotplug FAQ, from what I could tell.
I'm interested in making sure that the device can be post-facto
cleaned up nicely, deallocating resources, etc. despite this
rather brusque treatment.
--linas
-------------------------------------------------------
This SF.net email is sponsored by: Perforce Software.
Perforce is the Fast Software Configuration Management System offering
advanced branching capabilities and atomic changes on 50+ platforms.
Free Eval! http://www.perforce.com/perforce/loadprog.html
_______________________________________________
Linux-hotplug-devel mailing list http://linux-hotplug.sourceforge.net
Linux-hotplug-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-hotplug-devel
* Re: unclean yanking out of device?
2004-01-14 22:00 unclean yanking out of device? linas
@ 2004-01-14 22:15 ` Måns Rullgård
2004-01-14 22:19 ` Greg KH
` (17 subsequent siblings)
18 siblings, 0 replies; 20+ messages in thread
From: Måns Rullgård @ 2004-01-14 22:15 UTC (permalink / raw)
To: linux-hotplug
linas@austin.ibm.com writes:
> What is supposed to happen if I just yank out a (network/scsi) device
> while it is being used, without calling any of the hotplug unregister
> remove etc. functions in advance?
I once pulled out a USB pccard with some things attached. The system
didn't like it at all. I can't remember exactly what happened.
--
Måns Rullgård
mru@kth.se
* Re: unclean yanking out of device?
2004-01-14 22:00 unclean yanking out of device? linas
2004-01-14 22:15 ` Måns Rullgård
@ 2004-01-14 22:19 ` Greg KH
2004-01-14 23:08 ` linas
` (16 subsequent siblings)
18 siblings, 0 replies; 20+ messages in thread
From: Greg KH @ 2004-01-14 22:19 UTC (permalink / raw)
To: linux-hotplug
On Wed, Jan 14, 2004 at 04:00:02PM -0600, linas@austin.ibm.com wrote:
>
> What is supposed to happen if I just yank out a (network/scsi) device
> while it is being used, without calling any of the hotplug unregister
> remove etc. functions in advance?
Depends on the driver, device type, subsystem type, and most
importantly, the kernel version (2.2?, 2.4?, 2.6?)
This is better asked on the linux-kernel mailing list, with a specific
and detailed question, as you have asked a very broad one.
thanks,
greg k-h
* Re: unclean yanking out of device?
2004-01-14 22:00 unclean yanking out of device? linas
2004-01-14 22:15 ` Måns Rullgård
2004-01-14 22:19 ` Greg KH
@ 2004-01-14 23:08 ` linas
2004-01-14 23:24 ` Greg KH
` (15 subsequent siblings)
18 siblings, 0 replies; 20+ messages in thread
From: linas @ 2004-01-14 23:08 UTC (permalink / raw)
To: linux-hotplug
On Wed, Jan 14, 2004 at 02:19:21PM -0800, Greg KH wrote:
> On Wed, Jan 14, 2004 at 04:00:02PM -0600, linas@austin.ibm.com wrote:
> >
> > What is supposed to happen if I just yank out a (network/scsi) device
> > while it is being used, without calling any of the hotplug unregister
> > remove etc. functions in advance?
>
> Depends on the driver, device type, subsystem type, and most
> importantly, the kernel version (2.2?, 2.4?, 2.6?)
>
> This is better asked on the linux-kernel mailing list, with a specific
> and detailed question, as you have asked a very broad one.
Sorry, this was intentionally a broad question; I'm hoping to hash
out a broad answer here before I go to lkml.
Let me rephrase: I want to make 'appropriate' changes to the 2.6/2.7
kernel, and some selected device drivers, so that a sysadmin can
stupidly yank out an *ordinary* PCI card (not a pcmcia) without doing
an orderly shutdown in advance.
More specifically, I have a whiz-bang PCI controller that will
shut down a PCI slot when it is hit by a cosmic ray (data/address
parity error, other error, etc.). I want to be able to "recover"
that slot (if it's recoverable) and get things going again.
I'm on roughly day 2 of thinking about this problem, and so far I've
only gotten as far as thinking that some of the hot-plug infrastructure
might be useful here. I've also noticed that struct pci_driver in
include/linux/pci.h doesn't have a "remove_because_your_hardware's_already_gone"
callback, and it seems to me that this kind of callback would be needed.
So I'm sort of trolling to find out if this kind of a problem
has been discussed before, and for other interested parties
to discuss it with.
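A minimal user-space sketch of the callback idea above, to make the
distinction concrete. Everything here is hypothetical: struct mock_dev,
struct mock_driver and the hw_gone() hook are stand-ins invented for
illustration, not the kernel's struct pci_dev or struct pci_driver. Only
the name remove() mirrors the existing callback.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a PCI device; not the kernel's struct pci_dev. */
struct mock_dev {
    bool hw_present;     /* false once the slot can no longer be reached */
    int  resources_held; /* buffers, IRQs, etc. still allocated */
    int  hw_writes;      /* register writes attempted during teardown */
};

/* Hypothetical driver ops; hw_gone() is the callback being wished for. */
struct mock_driver {
    void (*remove)(struct mock_dev *dev);  /* orderly: hardware still answers */
    void (*hw_gone)(struct mock_dev *dev); /* assumed: hardware already dead */
};

void quiesce_hw(struct mock_dev *dev)     { dev->hw_writes++; }
void free_resources(struct mock_dev *dev) { dev->resources_held = 0; }

/* Orderly removal: talk to the hardware first, then free kernel state. */
void demo_remove(struct mock_dev *dev)
{
    quiesce_hw(dev);
    free_resources(dev);
}

/* Surprise removal: never touch the hardware, only free kernel state. */
void demo_hw_gone(struct mock_dev *dev)
{
    free_resources(dev);
}

const struct mock_driver demo_driver = {
    .remove  = demo_remove,
    .hw_gone = demo_hw_gone,
};

/* Pick the teardown path based on whether the device is still reachable. */
void mock_unbind(const struct mock_driver *drv, struct mock_dev *dev)
{
    if (!dev->hw_present && drv->hw_gone != NULL)
        drv->hw_gone(dev);
    else if (drv->remove != NULL)
        drv->remove(dev);
}
```

The point of the split is only that the "already gone" path frees
resources without attempting any device I/O; whether a separate callback
or a flag on the existing one is the better shape is left open here.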
--linas
p.s. yes, I know the broad repercussions something like this can have.
I have noticed, for instance, that if you unplug a hot scsi cable,
reiserfs3 panics, whereas ext3 seems to deal with it nicely.
I'm hoping to slowly start dealing with the broader implications ...
eventually.
* Re: unclean yanking out of device?
2004-01-14 22:00 unclean yanking out of device? linas
` (2 preceding siblings ...)
2004-01-14 23:08 ` linas
@ 2004-01-14 23:24 ` Greg KH
2004-01-14 23:49 ` Paul Ionescu
` (14 subsequent siblings)
18 siblings, 0 replies; 20+ messages in thread
From: Greg KH @ 2004-01-14 23:24 UTC (permalink / raw)
To: linux-hotplug
On Wed, Jan 14, 2004 at 05:08:49PM -0600, linas@austin.ibm.com wrote:
> On Wed, Jan 14, 2004 at 02:19:21PM -0800, Greg KH wrote:
> > On Wed, Jan 14, 2004 at 04:00:02PM -0600, linas@austin.ibm.com wrote:
> > >
> > > What is supposed to happen if I just yank out a (network/scsi) device
> > > while it is being used, without calling any of the hotplug unregister
> > > remove etc. functions in advance?
> >
> > Depends on the driver, device type, subsystem type, and most
> > importantly, the kernel version (2.2?, 2.4?, 2.6?)
> >
> > This is better asked on the linux-kernel mailing list, with a specific
> > and detailed question, as you have asked a very broad one.
>
> Sorry, this was intentionally a broad question; I'm hoping to hash
> out a broad answer here before I go lkml.
>
> Let me rephrase: I want to make 'appropriate' changes to the 2.6/2.7
> kernel, and some selected device drivers, so that a sysadmin can
> stupidly yank out an *ordinary* PCI card (not a pcmcia) without doing
> an orderly shutdown in advance.
Hahahaha... no, PCI drivers can not handle that. See the PCI Hotplug
spec for why. PCI drivers need to be told to shut down before removing
them. And that will not change, sorry.
> More specifically, I have a whiz-bang PCI controller that will
> shut down a PCI slot when it is hit by a cosmic ray (data/address
> parity error, other error, etc.). I want to be able to "recover"
> that slot (if it's recoverable) and get things going again.
That pci controller needs to tell the OS that it is going to shut down
that pci slot. Otherwise that pci controller violates the PCI spec.
Note, yes I know that cPCI controllers do this, that's a totally
different story... the os still needs to know before turning off a slot.
greg k-h
* Re: unclean yanking out of device?
2004-01-14 22:00 unclean yanking out of device? linas
` (3 preceding siblings ...)
2004-01-14 23:24 ` Greg KH
@ 2004-01-14 23:49 ` Paul Ionescu
2004-01-14 23:57 ` linas
` (13 subsequent siblings)
18 siblings, 0 replies; 20+ messages in thread
From: Paul Ionescu @ 2004-01-14 23:49 UTC (permalink / raw)
To: linux-hotplug
Hi Linas,
I don't think you can do this to an ordinary PCI card.
But you may be able to do this with a PCI Express card (which has a
serial bus and is hot swappable), if I have understood the PCI Express
specs correctly. You can try to find out whether that is true.
On Thu, 2004-01-15 at 01:08, linas@austin.ibm.com wrote:
> Let me rephrase: I want to make 'appropriate' changes to the 2.6/2.7
> kernel, and some selected device drivers, so that a sysadmin can
> stupidly yank out an *ordinary* PCI card (not a pcmcia) without doing
> an orderly shutdown in advance.
>
* Re: unclean yanking out of device?
2004-01-14 22:00 unclean yanking out of device? linas
` (4 preceding siblings ...)
2004-01-14 23:49 ` Paul Ionescu
@ 2004-01-14 23:57 ` linas
2004-01-15 0:19 ` Oliver Neukum
` (12 subsequent siblings)
18 siblings, 0 replies; 20+ messages in thread
From: linas @ 2004-01-14 23:57 UTC (permalink / raw)
To: linux-hotplug
On Wed, Jan 14, 2004 at 03:24:12PM -0800, Greg KH wrote:
> On Wed, Jan 14, 2004 at 05:08:49PM -0600, linas@austin.ibm.com wrote:
> > > >
> > > > What is supposed to happen if I just yank out a (network/scsi) device
> > > > while it is being used, without calling any of the hotplug unregister
> > > > remove etc. functions in advance?
> >
> > Let me rephrase: I want to make 'appropriate' changes to the 2.6/2.7
> > kernel, and some selected device drivers, so that a sysadmin can
> > stupidly yank out an *ordinary* PCI card (not a pcmcia) without doing
> > an orderly shutdown in advance.
>
> Hahahaha... no, PCI drivers can not handle that.
Right. I've been tasked with modifying several specific drivers to
handle that.
> See the PCI Hotplug
> spec for why.
You mean the spec that's available from http://www.pcisig.com ?
or a linux-specific spec?
> PCI drivers need to be told to shut down before removing
> them. And that will not change, sorry.
Hmm. Let me point out that other operating systems already do this
(and have been doing so for a while). The chipset I'm working with
has been shipping for, I dunno, a couple of years or something like
that. They've already gone through four or five engineering upgrades
on the feature set. So when you say you're sorry, did you mean
"it's a bad idea, I will personally oppose it", or do you mean
"you'll have to implement all of this by your lonesome"?
> > More specifically, I have a whiz-bang PCI controller that will
> > shut down a PCI slot when it is hit by a cosmic ray (data/address
> > parity error, other error, etc.). I want to be able to "recover"
> > that slot (if it's recoverable) and get things going again.
>
> That pci controller needs to tell the OS that it is going to shut down
> that pci slot. Otherwise that pci controller violates the PCI spec.
I was only half-joking about the cosmic ray. The cosmic ray violated
the PCI spec. boo hoo. Now what? Tell it to go back to the supernova
it came from?
> Note, yes I know that cPCI controllers do this, that's a totally
> different story... the os still needs to know before turning off a slot.
This isn't a cPCI chipset. The point is, the hardware went kaboom,
the pci device is in some hopeless state, and further communication
with it is impossible. My goal is to clean up the kernel state and
move on.
Let me rephrase that: I could hack up some code that does this, but
my goal is really to do it in such a way so that the patches are
accepted by Torvalds, et al. Much of the code can be done in the
arch directories, but some of it needs to go into generic files,
and into specific device drivers, and so it's in my interest to go as
generic as possible before I have to retreat to a rats-nest of
special-case code. Which is why I bring this up now, rather than
simply announcing a patch that has already been implemented.
--linas
* Re: unclean yanking out of device?
2004-01-14 22:00 unclean yanking out of device? linas
` (5 preceding siblings ...)
2004-01-14 23:57 ` linas
@ 2004-01-15 0:19 ` Oliver Neukum
2004-01-15 0:19 ` Greg KH
` (11 subsequent siblings)
18 siblings, 0 replies; 20+ messages in thread
From: Oliver Neukum @ 2004-01-15 0:19 UTC (permalink / raw)
To: linux-hotplug
> That pci controller needs to tell the OS that it is going to shut down
> that pci slot. Otherwise that pci controller violates the PCI spec.
It is illegal. That doesn't mean that Linux has a fundamental problem
dealing with it.
> Note, yes I know that cPCI controllers do this, that's a totally
> different story... the os still needs to know before turning off a slot.
Most notably, it is normal for CardBus, which is an example of how
to do such things.
Regards
Oliver
* Re: unclean yanking out of device?
2004-01-14 22:00 unclean yanking out of device? linas
` (6 preceding siblings ...)
2004-01-15 0:19 ` Oliver Neukum
@ 2004-01-15 0:19 ` Greg KH
2004-01-15 0:34 ` Greg KH
` (10 subsequent siblings)
18 siblings, 0 replies; 20+ messages in thread
From: Greg KH @ 2004-01-15 0:19 UTC (permalink / raw)
To: linux-hotplug
On Wed, Jan 14, 2004 at 05:57:48PM -0600, linas@austin.ibm.com wrote:
> On Wed, Jan 14, 2004 at 03:24:12PM -0800, Greg KH wrote:
> > On Wed, Jan 14, 2004 at 05:08:49PM -0600, linas@austin.ibm.com wrote:
> > > > >
> > > > > What is supposed to happen if I just yank out a (network/scsi) device
> > > > > while it is being used, without calling any of the hotplug unregister
> > > > > remove etc. functions in advance?
> > >
> > > Let me rephrase: I want to make 'appropriate' changes to the 2.6/2.7
> > > kernel, and some selected device drivers, so that a sysadmin can
> > > stupidly yank out an *ordinary* PCI card (not a pcmcia) without doing
> > > an orderly shutdown in advance.
> >
> > Hahahaha... no, PCI drivers can not handle that.
>
> Right. I've been tasked with modifying several specific drivers to
> handle that.
What kind of hardware base is this? (PPC64, ia64, etc.)?
What specific drivers?
> > See the PCI Hotplug
> > spec for why.
>
> You mean the spec that's available from http://www.pcisig.com ?
> or a linux-specific spec?
The PCI SIG one. Linux follows that spec, just like other operating
systems...
> > PCI drivers need to be told to shut down before removing
> > them. And that will not change, sorry.
>
> Hmm. Let me point out that other operating systems already do this
> (and have been doing so for a while). The chipset I'm working with
> has been shipping for, I dunno, a couple of years or something like
> that.
What chipset?
> They've already gone through four or five engineering upgrades
> on the feature set. So when you say you're sorry, did you mean
> "it's a bad idea, I will personally oppose it", or do you mean
> "you'll have to implement all of this by your lonesome"?
I (as the Linux kernel PCI and PCI Hotplug maintainer) oppose it. :)
I think you just need to change the way you are thinking about this.
Linux already has a pci hotplug framework. Just implement your specific
pci controller into this framework. That way you will not have to
modify any specific PCI card device drivers. If you do this, then all
of the existing userspace tools will work for your hardware.
> > > More specifically, I have a whiz-bang PCI controller that will
> > > shut down a PCI slot when it is hit by a cosmic ray (data/address
> > > parity error, other error, etc.). I want to be able to "recover"
> > > that slot (if it's recoverable) and get things going again.
> >
> > That pci controller needs to tell the OS that it is going to shut down
> > that pci slot. Otherwise that pci controller violates the PCI spec.
>
> I was only half-joking about the cosmic ray. The cosmic ray violated
> the PCI spec. boo hoo. Now what? Tell it to go back to the supernova
> it came from?
No, tell the driver that it is going to be shut down. The driver will
do so, and then you can safely yank it out. That's what all pci hotplug
controller drivers do, why be different?
> > Note, yes I know that cPCI controllers do this, that's a totally
> > different story... the os still needs to know before turning off a slot.
>
> This isn't a cPCI chipset. The point is, the hardware went kaboom,
> the pci device is in some hopeless state, and further communication
> with it is impossible. My goal is to clean up the kernel state and
> move on.
>
> Let me rephrase that: I could hack up some code that does this, but
> my goal is really to do it in such a way so that the patches are
> accepted by Torvalds, et al. Much of the code can be done in the
> arch directories, but some of it needs to go into generic files,
> and into specific device drivers, and so it's in my interest to go as
> generic as possible before I have to retreat to a rats-nest of
> special-case code.
You should only have to implement a single, hardware specific driver for
this to work properly in Linux. See the existing pci hotplug controller
drivers in drivers/pci/hotplug and the drivers/pci/hotplug/pci_hotplug.h
for the interface you need to implement.
Hope this helps,
greg k-h
* Re: unclean yanking out of device?
2004-01-14 22:00 unclean yanking out of device? linas
` (7 preceding siblings ...)
2004-01-15 0:19 ` Greg KH
@ 2004-01-15 0:34 ` Greg KH
2004-01-15 0:36 ` David Brownell
` (9 subsequent siblings)
18 siblings, 0 replies; 20+ messages in thread
From: Greg KH @ 2004-01-15 0:34 UTC (permalink / raw)
To: linux-hotplug
On Wed, Jan 14, 2004 at 04:36:29PM -0800, David Brownell wrote:
> >>>That pci controller needs to tell the OS that it is going to shut down
> >>>that pci slot. Otherwise that pci controller violates the PCI spec.
> >>
> >>I was only half-joking about the cosmic ray. The cosmic ray violated
> >>the PCI spec. boo hoo. Now what? Tell it to go back to the supernova
> >>it came from?
> >
> >
> >No, tell the driver that it is going to be shut down. The driver will
> >do so, and then you can safely yank it out. That's what all pci hotplug
> >controller drivers do, why be different?
>
> Actually there's a good argument that every PCI driver for hardware that
> can be packaged onto CardBus _should_ handle those "unclean" shutdown
> modes ... and as Linus has observed, it's awfully convenient that most
> such cases also cause reads from those devices to return all-ones!
>
> The electrical details are of course a different issue. Cardbus and
> the various other kinds of PCI hotplug have different answers, so
> there's no universal guarantee that drivers will get notified first.
Yes, but even cardbus notifies the driver that the device is now gone,
which is the main point I was trying to make. We already have the
framework for this to work properly with all PCI devices and drivers,
let's not go try to make up a new one.
thanks,
greg k-h
* Re: unclean yanking out of device?
2004-01-14 22:00 unclean yanking out of device? linas
` (8 preceding siblings ...)
2004-01-15 0:34 ` Greg KH
@ 2004-01-15 0:36 ` David Brownell
2004-01-15 1:16 ` linas
` (8 subsequent siblings)
18 siblings, 0 replies; 20+ messages in thread
From: David Brownell @ 2004-01-15 0:36 UTC (permalink / raw)
To: linux-hotplug
>>>That pci controller needs to tell the OS that it is going to shut down
>>>that pci slot. Otherwise that pci controller violates the PCI spec.
>>
>>I was only half-joking about the cosmic ray. The cosmic ray violated
>>the PCI spec. boo hoo. Now what? Tell it to go back to the supernova
>>it came from?
>
>
> No, tell the driver that it is going to be shut down. The driver will
> do so, and then you can safely yank it out. That's what all pci hotplug
> controller drivers do, why be different?
Actually there's a good argument that every PCI driver for hardware that
can be packaged onto CardBus _should_ handle those "unclean" shutdown
modes ... and as Linus has observed, it's awfully convenient that most
such cases also cause reads from those devices to return all-ones!
The electrical details are of course a different issue. Cardbus and
the various other kinds of PCI hotplug have different answers, so
there's no universal guarantee that drivers will get notified first.
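A minimal sketch of the all-ones observation, assuming a 32-bit status
register. struct mock_regs and mock_read32() are hypothetical stand-ins
for a real MMIO window and its accessor, and the check is only a
heuristic: a healthy register could in principle legitimately read as
all ones, and the behaviour holds for most removal cases, not all.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical register window; a real driver would use MMIO accessors. */
struct mock_regs {
    bool     present; /* false once the card has been pulled */
    uint32_t status;  /* value the device would return if present */
};

/* Reads from an absent device come back as all ones. */
uint32_t mock_read32(const struct mock_regs *r)
{
    return r->present ? r->status : UINT32_MAX;
}

/* Heuristic: treat an all-ones read as "the device is probably gone". */
bool device_looks_gone(const struct mock_regs *r)
{
    return mock_read32(r) == UINT32_MAX;
}
```

A driver using this kind of check would stop issuing further I/O and
fall back to freeing its own state once the read comes back as all ones.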
- Dave
* Re: unclean yanking out of device?
2004-01-14 22:00 unclean yanking out of device? linas
` (9 preceding siblings ...)
2004-01-15 0:36 ` David Brownell
@ 2004-01-15 1:16 ` linas
2004-01-15 1:20 ` linas
` (7 subsequent siblings)
18 siblings, 0 replies; 20+ messages in thread
From: linas @ 2004-01-15 1:16 UTC (permalink / raw)
To: linux-hotplug
On Wed, Jan 14, 2004 at 04:19:39PM -0800, Greg KH wrote:
>
> What kind of hardware base is this? (PPC64, ia64, etc.)?
ppc64
> What specific drivers?
pcnet32 ethernet, some gigabit ethernet cards, some fibre-channel cards,
and the symbios-2 scsi driver.
> > Hmm. Let me point out that other operating systems already do this
> > (and have been doing so for a while). The chipset I'm working with
> > has been shipping for, I dunno, a couple of years or something like
> > that.
>
> What chipset?
The one in the ppc64 mainframes. It shows up in the ppc64 arch code
under the name of "phb", and it's eeh.c that deals with configuring the
support I'm talking about. Right now, eeh.c calls panic() when things
go kaboom, and my goal is to replace the panic with a graceful recovery.
> > They've already gone through four or five engineering upgrades
> > on the feature set. So when you say you're sorry, did you mean
> > "it's a bad idea, I will personally oppose it", or do you mean
> > "you'll have to implement all of this by your lonesome"?
>
> I (as the Linux kernel PCI and PCI Hotplug maintainer) oppose it. :)
oh boy. Well, it's reassuring to know I'm talking to the right person.
> I think you just need to change the way you are thinking about this.
> Linux already has a pci hotplug framework. Just implement your specific
> pci controller into this framework. That way you will not have to
> modify any specific PCI card device drivers. If you do this, then all
> of the existing userspace tools will work for your hardware.
I really really wasn't joking about the cosmic ray. It hits. There
is an unrecoverable parity error. What do I do?
Let's assume the adapter is in the middle of a dma when this happens.
Do I pass the known-bad garbled data to the device driver, possibly
corrupting the file system, etc.? Suppose the bad data was something
that trashed a pointer; some million or billion cpu cycles later
the kernel oopses for some mystery reason. Instead of allowing
this kind of 'silent corruption' to occur, the current ppc64
code for this simply halts the system, it calls panic (in eeh.c),
and that's that.
Suppose the error occurred while the driver was PIO'ing some status
word to the adapter. Is it safe to continue on, and assume the adapter
is in the state that the driver thinks it's in? Again, millions of
cycles later, your adapter either hangs for some mystery reason,
or maybe you've silently corrupted some data, e.g. some financial
data, which is the kind of stuff that these kinds of machines handle.
The error might have been an address error, then what? I have
a pio/dma to a known-corrupted address. What do I do with it?
What the chipset currently does is to bar all further i/o to that
slot until it is re-enabled again. I see two scenarios for recovering:
1) shut down the device driver, then re-enable i/o, then re-init the
device driver, or
2) tell the device driver to device-remove, then scurry off to
re-enable i/o for the duration of the remove, and then hope
the driver is able to successfully complete the remove (and
not garbage anything further in the process).
I'm thinking that 2) fits in your paradigm a whole lot better
than 1) does. It might be enough, I'd have to think about it
and consult a bit. But solution 1) really does make me sleep
better at night.
Here's why I say that: the failure may be one-shot (the cosmic ray)
or it may be repeatable: overheating, vibration, dust, humidity,
sysadmin accidentally dropped a screwdriver thing. If it's the latter,
then re-enabling i/o is going to do nothing except make matters
worse.
This raises a third possibility:
3) keep i/o disabled, let the device driver keep going until it
realizes that the hardware is hung, and let it deal with that.
But that's kind of stupid, since I already *know* it's hung.
I may as well tell the device driver about it.
So what I *really* want to do is to tell the device driver,
"your hardware is hung. Deal with it." This is pretty darn
indistinguishable from the "stupid sysadmin unplugged before
doing hotplug-remove" scenario. So I figured if I solve the
stupid-sysadmin scenario, then I get the vibration/dust/humidity
scenario for free.
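Scenario 1) above, plus the worry about repeatable failures, can be
sketched as a small user-space state machine. All names here
(struct mock_slot, slot_fault(), slot_recover()) are hypothetical, and
the MAX_FAULTS cut-off is an assumption added for illustration: the
text only says that re-enabling i/o after a repeatable failure makes
matters worse, not how many retries are sensible.

```c
#include <stdbool.h>

enum slot_state { SLOT_OK, SLOT_ISOLATED, SLOT_DEAD };

/* Hypothetical per-slot bookkeeping; not an existing kernel structure. */
struct mock_slot {
    enum slot_state state;
    int  faults;       /* errors seen on this slot so far */
    bool driver_bound; /* is a device driver currently attached? */
};

#define MAX_FAULTS 3 /* assumed threshold for "this failure is repeatable" */

/* The controller has barred i/o to the slot: tell the driver it is hung. */
void slot_fault(struct mock_slot *s)
{
    s->faults++;
    s->state = SLOT_ISOLATED;
    s->driver_bound = false; /* driver shuts down without any device i/o */
}

/* Try to bring the slot back; returns true if the driver is running again. */
bool slot_recover(struct mock_slot *s)
{
    if (s->state != SLOT_ISOLATED)
        return s->state == SLOT_OK;
    if (s->faults >= MAX_FAULTS) {
        s->state = SLOT_DEAD; /* keep i/o disabled, do not retry */
        return false;
    }
    s->state = SLOT_OK;       /* re-enable i/o to the slot */
    s->driver_bound = true;   /* re-init the device driver */
    return true;
}
```

With this shape, a one-shot error costs one driver restart, while a slot
that keeps faulting is left isolated instead of being re-enabled forever.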
Note that the machines currently in the field can support
thousands of pci slots, and that future machines will do even more.
So even if any one pci slot/adapter goes bad once every few years,
with a thousand slots, that works out to one or two failures a day.
If we panicked on each one, that would be a reboot a day.
(Actually, these machines do stay up longer than that, which
means that the failure rate per slot is a whole lot less than that,
but I think this illustrates the point).
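The arithmetic behind that estimate, as a small sketch. The function
name is made up for illustration, and reading "every few years" as two
years per slot is an assumption; with 1000 slots that gives roughly
1.4 failures a day, and three years per slot gives roughly 0.9.

```c
/*
 * Expected failures per day across a whole machine, assuming each slot
 * fails independently once every `years_per_failure` years on average.
 */
double failures_per_day(int slots, double years_per_failure)
{
    return (double)slots / (years_per_failure * 365.0);
}
```

For example, failures_per_day(1000, 2.0) is 1000 / 730, about 1.37.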
--linas
* Re: unclean yanking out of device?
2004-01-14 22:00 unclean yanking out of device? linas
` (10 preceding siblings ...)
2004-01-15 1:16 ` linas
@ 2004-01-15 1:20 ` linas
2004-01-15 1:33 ` Greg KH
` (6 subsequent siblings)
18 siblings, 0 replies; 20+ messages in thread
From: linas @ 2004-01-15 1:20 UTC (permalink / raw)
To: linux-hotplug
On Wed, Jan 14, 2004 at 04:34:04PM -0800, Greg KH wrote:
> On Wed, Jan 14, 2004 at 04:36:29PM -0800, David Brownell wrote:
> >
> > Actually there's a good argument that every PCI driver for hardware that
> > can be packaged onto CardBus _should_ handle those "unclean" shutdown
> > modes ... and as Linus has observed, it's awfully convenient that most
> > such cases also cause reads from those devices to return all-ones!
> >
> > The electrical details are of course a different issue. Cardbus and
> > the various other kinds of PCI hotplug have different answers, so
> > there's no universal guarantee that drivers will get notified first.
>
> Yes, but even cardbus notifies the driver that the device is now gone,
> which is the main point I was trying to make. We already have the
> framework for this to work properly with all PCI devices and drivers,
> let's not go try to make up a new one.
Oh, well, that was my question, then. struct pci_driver in
include/linux/pci.h has a (*remove)() but the way it's described
makes it sound like it's supposed to be called before, not after
the removal.
I *like* existing frameworks, I just haven't found it yet. This
is literally day-2 on this task for me.
--linas
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: unclean yanking out of device?
2004-01-14 22:00 unclean yanking out of device? linas
` (11 preceding siblings ...)
2004-01-15 1:20 ` linas
@ 2004-01-15 1:33 ` Greg KH
2004-01-15 1:35 ` Greg KH
` (5 subsequent siblings)
18 siblings, 0 replies; 20+ messages in thread
From: Greg KH @ 2004-01-15 1:33 UTC (permalink / raw)
To: linux-hotplug
On Wed, Jan 14, 2004 at 07:16:04PM -0600, linas@austin.ibm.com wrote:
> On Wed, Jan 14, 2004 at 04:19:39PM -0800, Greg KH wrote:
> >
> > What kind of hardware base is this? (PPC64, ia64, etc.)?
>
> ppc64
The ppc64 pci hotplug driver that is currently in the ppc64 cvs tree
should already work for your hardware, right? Or is this for iSeries
machines? Some other hardware?
Seriously, that driver, or another one like it, is what you need to use
to tell the pci card that it needs to disable the device and unbind from
it, even if the slot power has for some reason been removed.
> > What specific drivers?
>
> pcnet32 ethernet, some gigabit ethernet cards, some fibre-channel cards,
> and the symbios-2 scsi driver.
None of these should need to be modified.
> > > Hmm. Let me point out that other operating systems already do this
> > > (and have been doing so for a while). The chipset I'm working with
> > > has been shipping for, I dunno, a couple of years or something like
> > > that.
> >
> > What chipset?
>
> The one in the ppc64 mainframes.
Which ppc64 mainframes exactly? iSeries? pSeries? Model numbers? I
seem to have access to quite a lot of different ppc hardware through my
employer :)
> It shows up in the ppc64 arch code
> under the name of "phb", and it's eeh.c that deals with configuring the
> support I'm talking about. Right now, eeh.c calls panic() when things
> go kaboom, and my goal is to replace the panic with a graceful recovery.
Ok, it shouldn't call panic(). No driver should ever call panic().
That's just bad coding...
> > I think you just need to change the way you are thinking about this.
> > Linux already has a pci hotplug framework. Just implement your specific
> > pci controller into this framework. That way you will not have to
> > modify any specific PCI card device drivers. If you do this, then all
> > of the existing userspace tools will work for your hardware.
>
> I really really wasn't joking about the cosmic ray. It hits. There
> is an unrecoverable parity error. What do I do?
Ah, ok, I've talked to Paul M. and others at IBM you are probably
working with a lot about this. They have tried to come up with a way
for a write or read to be notified that an error just occurred, right?
> Let's assume the adapter is in the middle of a dma when this happens.
> Do I pass the known-bad garbled data to the device driver, possibly
> corrupting the file system, etc? Suppose the bad data was something
> that trashed a pointer; some million or billion cpu cycles later
> the kernel oopses for some mystery reason. Instead of allowing
> this kind of 'silent corruption' to occur, the current ppc64
> code for this simply halts the system, it calls panic (in eeh.c),
> and that's that.
Again, it should not do that. We need to have a way to notify that the
write or read failed. Work on fixing that, and then modify the driver
to gracefully recover is the proper sequence. If you really want to,
tell the pci slot controller that it now needs to disable that slot,
which will cause the driver to shut down the device.
I seem to remember the proposal was quite complex last time I saw it,
has it been cleaned up any since?
If you don't know of any such proposal, I suggest you look into it, as
you don't want to duplicate the work others are already doing.
thanks,
greg k-h
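One way to picture "notify that the read failed" instead of calling panic()
is a checked register read that hands an error code back to the driver. The
sketch below is a userspace simulation, not kernel code: struct fake_dev,
fake_readl(), checked_readl() and the ID register are all hypothetical
names. It borrows the observation quoted earlier in the thread that reads
from a missing device return all-ones, and assumes the ID register can
never legitimately read as all-ones on a live device.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define FAKE_NREGS  4u
#define FAKE_ID_REG 0u  /* assumed never to read all-ones on a live device */

/* Simulated adapter standing in for a memory-mapped device. */
struct fake_dev {
    int present;               /* 0 once the slot is isolated or the card is gone */
    uint32_t regs[FAKE_NREGS];
};

/* Raw read: a missing device reads back as all-ones. */
static uint32_t fake_readl(const struct fake_dev *dev, size_t reg)
{
    return dev->present ? dev->regs[reg] : UINT32_MAX;
}

/* Checked read: returns 0 and stores the value, -EINVAL for a bad register
 * index, or -EIO if the device appears to be gone. All-ones alone is
 * ambiguous, so it is confirmed against the ID register first. */
int checked_readl(const struct fake_dev *dev, size_t reg, uint32_t *val)
{
    uint32_t v;

    assert(dev != NULL && val != NULL);
    if (reg >= FAKE_NREGS)
        return -EINVAL;

    v = fake_readl(dev, reg);
    if (v == UINT32_MAX && fake_readl(dev, FAKE_ID_REG) == UINT32_MAX)
        return -EIO;

    *val = v;
    return 0;
}
```

A caller that gets -EIO can stop using the data and start its teardown path,
rather than passing garbage upward. This covers only the synchronous PIO
case; it says nothing about errors that hit during DMA.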
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: unclean yanking out of device?
2004-01-14 22:00 unclean yanking out of device? linas
` (12 preceding siblings ...)
2004-01-15 1:33 ` Greg KH
@ 2004-01-15 1:35 ` Greg KH
2004-01-15 1:57 ` linas
` (4 subsequent siblings)
18 siblings, 0 replies; 20+ messages in thread
From: Greg KH @ 2004-01-15 1:35 UTC (permalink / raw)
To: linux-hotplug
On Wed, Jan 14, 2004 at 07:20:57PM -0600, linas@austin.ibm.com wrote:
> I *like* existing frameworks, I just haven't found it yet.
remove() is what notifies the pci driver that the device is about to go
away. For cardbus devices, it can mean that the device has already gone
away. It is also the same function that is called by the pci hotplug
core to tell the driver to shut down.
thanks,
greg k-h
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: unclean yanking out of device?
2004-01-14 22:00 unclean yanking out of device? linas
` (13 preceding siblings ...)
2004-01-15 1:35 ` Greg KH
@ 2004-01-15 1:57 ` linas
2004-01-15 17:17 ` Richard Troth
` (3 subsequent siblings)
18 siblings, 0 replies; 20+ messages in thread
From: linas @ 2004-01-15 1:57 UTC (permalink / raw)
To: linux-hotplug
On Wed, Jan 14, 2004 at 05:33:39PM -0800, Greg KH wrote:
>
> The ppc64 pci hotplug driver that is currently in the ppc64 cvs tree
> should already work for your hardware, right?
Will look into it.
> Which ppc64 mainframes exactly?
I think the phb is in just about everything with a power4 processor in it.
> Ah, ok, I've talked to Paul M. and others at IBM you are probably
Yeah, Paul just said "keep me informed about what you do" and that's it.
> working with a lot about this. They have tried to come up with a way
> for a write or read to be notified that an error just occurred, right?
Depends on what you mean by 'notify'. It is known, synchronously, when an
error occurred during a PIO read, and so that case can be handled immediately.
DMA... not sure.
> Again, it should not do that. We need to have a way to notify that the
> write or read failed.
Well, notify who of what?
> Work on fixing that, and then modify the driver
> to gracefully recover is the proper sequence. If you really want to,
> tell the pci slot controller that it now needs to disable that slot,
> which will cause the driver to shut down the device.
Well, the actual sequence is the inverse: the pci slot controller
has already disabled the slot. It's already gone. Which is why
I was pursuing the "unclean remove" idea.
> I seem to remember the proposal was quite complex last time I saw it,
> has it been cleaned up any since?
!! I have seen no proposal at all, you're the first to claim more.
:( Any names?
> If you don't know of any such proposal, I suggest you look into it, as
> you don't want to duplicate the work others are already doing.
I thought I was handed everything by the last guy who worked on it,
who had no proposal at all. Other than a three-sentence description,
and we've sort-of moved beyond that already ...
I'm also a little confused by your reply, since you start by saying that
the ethernet/scsi device drivers don't need to be modified, and then
say that they do need to be modified to recover gracefully ... so which
is it?
--linas
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: unclean yanking out of device?
2004-01-14 22:00 unclean yanking out of device? linas
` (14 preceding siblings ...)
2004-01-15 1:57 ` linas
@ 2004-01-15 17:17 ` Richard Troth
2004-01-15 18:38 ` linas
` (2 subsequent siblings)
18 siblings, 0 replies; 20+ messages in thread
From: Richard Troth @ 2004-01-15 17:17 UTC (permalink / raw)
To: linux-hotplug
> remove() is what notifies the pci driver that the device is about to go
> away. ...
(cardbus aside)
Perhaps we need a removed() entry point?
Something that explicitly means "has already gone"
rather than implying "is about to go".
-- R;
-------------------------------------------------------
The SF.Net email is sponsored by EclipseCon 2004
Premiere Conference on Open Tools Development and Integration
See the breadth of Eclipse activity. February 3-5 in Anaheim, CA.
http://www.eclipsecon.org/osdn
_______________________________________________
Linux-hotplug-devel mailing list http://linux-hotplug.sourceforge.net
Linux-hotplug-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-hotplug-devel
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: unclean yanking out of device?
2004-01-14 22:00 unclean yanking out of device? linas
` (15 preceding siblings ...)
2004-01-15 17:17 ` Richard Troth
@ 2004-01-15 18:38 ` linas
2004-01-15 21:22 ` David Hinds
2004-01-24 0:38 ` Greg KH
18 siblings, 0 replies; 20+ messages in thread
From: linas @ 2004-01-15 18:38 UTC (permalink / raw)
To: linux-hotplug
On Thu, Jan 15, 2004 at 11:17:31AM -0600, Richard Troth wrote:
> > remove() is what notifies the pci driver that the device is about to go
> > away. ...
>
> (cardbus aside)
> Perhaps we need a removed() entry point?
> Something that explicitly means "has already gone"
> rather than implying "is about to go".
That's exactly what I was thinking.
This is based on the assumption that some device driver writers
really want a remove() that is different from been_removed().
Forcing a device driver writer to implement been_removed()
only (as pcmcia does) does not give them the chance to clean
up the way they might have wanted to if they could have assumed
the device was still 'live', e.g. clean up some lock on some
remote device (such as some iSCSI lock sitting on the disk drive
or something, i dunno).
--linas
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: unclean yanking out of device?
2004-01-14 22:00 unclean yanking out of device? linas
` (16 preceding siblings ...)
2004-01-15 18:38 ` linas
@ 2004-01-15 21:22 ` David Hinds
2004-01-24 0:38 ` Greg KH
18 siblings, 0 replies; 20+ messages in thread
From: David Hinds @ 2004-01-15 21:22 UTC (permalink / raw)
To: linux-hotplug
On Thu, 15 Jan 2004 12:38:53 -0600, linas@austin.ibm.com wrote:
> On Thu, Jan 15, 2004 at 11:17:31AM -0600, Richard Troth wrote:
> > > remove() is what notifies the pci driver that the device is about to go
> > > away. ...
> >
> > (cardbus aside)
> > Perhaps we need a removed() entry point?
> > Something that explicitly means "has already gone"
> > rather than implying "is about to go".
>
> That's exactly what I was thinking.
>
> This is based on the assumption that some device driver writers
> really want a remove() that is different from been_removed().
>
> Forcing a device driver writer to implement been_removed()
> only (as pcmcia does) does not give them the chance to clean
> up the way they might have wanted to if they could have assumed
> the device was still 'live', e.g. clean up some lock on some
> remote device (such as some iSCSI lock sitting on the disk drive
> or something, i dunno).
PCMCIA actually does not implement a "been removed" event. It reports
the removal as soon as possible, which will be while the hardware is
still available, if the user requested the removal before ejecting the
hardware. (the removal request is actually reported as another event,
and a driver can choose to field these events, and accept/deny the
requests)
Separate entry points for clean and unclean removal are not useful,
because the clean removal case has to be hardened against the device
going away while the removal is in progress... and once you've done
that, you might as well call the clean removal code even when you
know the hardware is already gone.
-- Dave
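The single hardened removal path described above might look roughly like the
following. This is a userspace sketch with made-up names (struct demo_dev,
struct demo_driver_state, demo_hw_quiesce(), demo_remove()), not code from
any real driver. The assumption is that every hardware access in the
removal path is best-effort, while software cleanup always runs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Simulated hardware: hw_present goes false when the card is yanked. */
struct demo_dev {
    bool hw_present;
    bool irq_enabled_in_hw;
};

/* Software state the driver owns regardless of whether hardware exists. */
struct demo_driver_state {
    struct demo_dev *dev;
    void *rx_buffer;
    bool registered;
};

/* Try to quiesce the hardware; fails harmlessly if it is already gone. */
static bool demo_hw_quiesce(struct demo_dev *dev)
{
    if (!dev->hw_present)
        return false;
    dev->irq_enabled_in_hw = false;
    return true;
}

/* One removal path for both clean and unclean removal. The hardware step
 * may fail at any point, so its result is deliberately ignored. */
void demo_remove(struct demo_driver_state *st)
{
    assert(st != NULL && st->dev != NULL);

    (void)demo_hw_quiesce(st->dev);

    free(st->rx_buffer);        /* free(NULL) is a no-op, so a repeat call is safe */
    st->rx_buffer = NULL;
    st->registered = false;
}
```

Because the hardware step is allowed to fail, the same function serves a
requested removal and a surprise one, which is the argument against having
a separate entry point for each.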
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: unclean yanking out of device?
2004-01-14 22:00 unclean yanking out of device? linas
` (17 preceding siblings ...)
2004-01-15 21:22 ` David Hinds
@ 2004-01-24 0:38 ` Greg KH
18 siblings, 0 replies; 20+ messages in thread
From: Greg KH @ 2004-01-24 0:38 UTC (permalink / raw)
To: linux-hotplug
On Wed, Jan 14, 2004 at 07:57:31PM -0600, linas@austin.ibm.com wrote:
>
> I'm also a little confused by your reply, since you start by saying that
> the ethernet/scsi device drivers don't need to be modified, and then
> say that they do need to be modified to recover gracefully ... so which
> is it?
Sorry, I originally thought you were talking about pci hotplug support.
Now I know differently. You will have to change your drivers when you
come up with whatever platform-specific implementation you design to
handle this issue.
Good luck,
greg k-h
^ permalink raw reply [flat|nested] 20+ messages in thread