linux-hotplug.vger.kernel.org archive mirror
* How does Linux handle PCI-E Surprise unplug?
@ 2010-02-18  6:17 Rajat Jain
  2010-02-18 14:52 ` Greg KH
                   ` (6 more replies)
  0 siblings, 7 replies; 8+ messages in thread
From: Rajat Jain @ 2010-02-18  6:17 UTC (permalink / raw)
  To: linux-hotplug

Hi,

I'm keen to understand how the Linux kernel handles surprise removal of
a device, from a PCI-e slot that supports "Hot-plug Surprise" removal
(in slot capabilities). 

Consider that the device in the slot is working normally, with its
driver attached to the device, and is doing all sorts of read / write
operations on the device registers that have been mapped into the PCI
memory space. Now when that device is suddenly plugged out (and thus its
registers suddenly disappear from the PCI memory space), the device
driver is still unaware as it is doing the register reads / writes on
the device. At this point, IMHO any attempt to access the device
registers will result in an exception (BUS error?) as the device is
gone. Correct?

From what I understood from the "pciehp.ko" code, it seems that on
detecting that a device has been plugged out, pciehp_disable_slot() is
called only in workqueue context (that will happen later). This would
lead to removal of the device from the PCI hierarchy in the kernel data
structures, and ultimately the driver will also be detached. 

But my question is that it may be too late for all of the above to
happen, and the driver code may still get a chance to run and continue
execution from where it was interrupted (accessing device registers) due
to surprise removal interrupt. And this is bound to break things. So
what am I missing here?

Thanks,

Rajat


* Re: How does Linux handle PCI-E Surprise unplug?
From: Greg KH @ 2010-02-18 14:52 UTC (permalink / raw)
  To: linux-hotplug

On Thu, Feb 18, 2010 at 11:35:33AM +0530, Rajat Jain wrote:
> Hi,
> 
> I'm keen to understand how the Linux kernel handles surprise removal of
> a device, from a PCI-e slot that supports "Hot-plug Surprise" removal
> (in slot capabilities). 
> 
> Consider that the device in the slot is working normally, with its
> driver attached to the device, and is doing all sorts of read / write
> operations on the device registers that have been mapped into the PCI
> memory space. Now when that device is suddenly plugged out (and thus its
> registers suddenly disappear from the PCI memory space), the device
> driver is still unaware as it is doing the register reads / writes on
> the device. At this point, IMHO any attempt to access the device
> registers will result in an exception (BUS error?) as the device is
> gone. Correct?

The driver will suddenly start reading all 0xff and will then need to
abort whatever it was doing.  Usually all drivers handle this just fine
today, as this is what they needed to do when they were pccard devices.
Nothing new here at all.

> From what I understood from the "pciehp.ko" code, it seems that on
> detecting that a device has been plugged out, pciehp_disable_slot() is
> called only in workqueue context (that will happen later). This would
> lead to removal of the device from the PCI hierarchy in the kernel data
> structures, and ultimately the driver will also be detached. 

Yes.

> But my question is that it may be too late for all of the above to
> happen, and the driver code may still get a chance to run and continue
> execution from where it was interrupted (accessing device registers) due
> to surprise removal interrupt. And this is bound to break things. So
> what am I missing here?

Nothing, each driver needs to handle the fact that it might, at any
time, start getting invalid data.  If it doesn't, it needs to be
fixed.  Do you have a driver that does not handle this properly?

thanks,

greg k-h


* RE: How does Linux handle PCI-E Surprise unplug?
From: Rajat Jain @ 2010-02-19  4:13 UTC (permalink / raw)
  To: linux-hotplug

Hello Greg,

> > Consider that the device in the slot is working normally, with its
> > driver attached to the device, and is doing all sorts of read / write
> > operations on the device registers that have been mapped into the PCI
> > memory space. Now when that device is suddenly plugged out (and thus its
> > registers suddenly disappear from the PCI memory space), the device
> > driver is still unaware as it is doing the register reads / writes on
> > the device. At this point, IMHO any attempt to access the device
> > registers will result in an exception (BUS error?) as the device is
> > gone. Correct?
> 
> The driver will suddenly start reading all 0xff and will then need to
> abort whatever it was doing.  Usually all drivers handle this just fine
> today, as this is what they needed to do when they were pccard devices.
> Nothing new here at all.

It is fairly common for drivers to have code like this:

val1 = ioread32(reg1);
val2 = ioread32(reg2);
val3 = ioread32(reg3);
val4 = ioread32(reg4);

Do you mean the above code is wrong and should be rewritten as:

if ((val1 = ioread32(reg1)) == 0xFFFFFFFF)
	goto abort;
if ((val2 = ioread32(reg2)) == 0xFFFFFFFF)
	goto abort;
/* etc. */

Checking for 0xFFFFFFFF at every read is a pain, don't you think? And
moreover, what if a register ACTUALLY contains the value 0xFFFFFFFF?
How do we differentiate this from the case when the device has been
plugged out?

Finally, how do we re-write the following code to handle this correctly?

iowrite32(val1, reg1);
iowrite32(val2, reg2);
iowrite32(val3, reg3);
iowrite32(val4, reg4);

Thanks,

Rajat Jain



* Re: How does Linux handle PCI-E Surprise unplug?
From: Greg KH @ 2010-02-19  4:27 UTC (permalink / raw)
  To: linux-hotplug

On Fri, Feb 19, 2010 at 09:31:46AM +0530, Rajat Jain wrote:
> > The driver will suddenly start reading all 0xff and will then need to
> > abort whatever it was doing.  Usually all drivers handle this just fine
> > today, as this is what they needed to do when they were pccard devices.
> > Nothing new here at all.
> 
> It is fairly common for drivers to have code like this:
> 
> val1 = ioread32(reg1);
> val2 = ioread32(reg2);
> val3 = ioread32(reg3);
> val4 = ioread32(reg4);
> 
> Do you mean the above code is wrong and should be rewritten as:
> 
> if ((val1 = ioread32(reg1)) == 0xFFFFFFFF)
> 	goto abort;
> if ((val2 = ioread32(reg2)) == 0xFFFFFFFF)
> 	goto abort;
> /* etc. */

No, they can check the last one, or something every once in a while.

> Checking for 0xFFFFFFFF at every read is a pain, don't you think? And
> moreover, what if a register ACTUALLY contains the value 0xFFFFFFFF?
> How do we differentiate this from the case when the device has been
> plugged out?

Test a value that you know will not be this one.

> Finally, how do we re-write the following code to handle this correctly?
> 
> iowrite32(val1, reg1);
> iowrite32(val2, reg2);
> iowrite32(val3, reg3);
> iowrite32(val4, reg4);

You wait until you do a read :)

Seriously, look at the existing drivers in the kernel, they should all
handle this just fine.
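
As a rough illustration of the two suggestions above (check once per
batch rather than after every read, and test a register that can never
legitimately read as all ones), here is a minimal userspace sketch.
struct fake_dev, fake_ioread32() and REG_SENTINEL are made up for this
example and only stand in for real MMIO; they are not kernel APIs.

```c
#include <stdbool.h>
#include <stdint.h>

#define NUM_REGS	4
/* Assumed: this register never holds 0xFFFFFFFF on a working device. */
#define REG_SENTINEL	3

/* Hypothetical stand-in for a device's MMIO window. */
struct fake_dev {
	bool present;			/* false once the card has been pulled */
	uint32_t regs[NUM_REGS];	/* register contents while present */
};

/* Mimics ioread32() on a platform that returns all ones for a missing device. */
static uint32_t fake_ioread32(const struct fake_dev *dev, unsigned int idx)
{
	return dev->present ? dev->regs[idx] : UINT32_MAX;
}

/*
 * Read the whole batch back to back, reading the sentinel register last,
 * then validate once. Returns 0 on success, -1 if the device looks gone.
 */
static int read_batch(const struct fake_dev *dev, uint32_t out[NUM_REGS])
{
	for (unsigned int i = 0; i < NUM_REGS; i++)
		out[i] = fake_ioread32(dev, i);

	if (out[REG_SENTINEL] == UINT32_MAX)
		return -1;	/* abort: nothing in out[] can be trusted */
	return 0;
}
```

Reading the sentinel last means a removal in the middle of the batch is
still caught, and the other registers remain free to hold 0xFFFFFFFF as
a valid value.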

thanks,

greg k-h


* Re: How does Linux handle PCI-E Surprise unplug?
From: Hidetoshi Seto @ 2010-02-19  6:41 UTC (permalink / raw)
  To: linux-hotplug

(2010/02/19 13:01), Rajat Jain wrote:
> 
> It is fairly common for drivers to have code like this:
> 
> val1 = ioread32(reg1);
> val2 = ioread32(reg2);
> val3 = ioread32(reg3);
> val4 = ioread32(reg4);
> 
> Do you mean the above code is wrong and should be rewritten as:
> 
> if ((val1 = ioread32(reg1)) == 0xFFFFFFFF)
> 	goto abort;
> if ((val2 = ioread32(reg2)) == 0xFFFFFFFF)
> 	goto abort;
> /* etc. */
> 
> Checking for 0xFFFFFFFF at every read is a pain, don't you think?

... Dejavu?

http://marc.info/?l=linux-kernel&m=108125011020312
  [RFC] readX_check() - Interface for PCI-X error recovery
  Date: 2004-04-06 11:04:49

> And
> moreover, what if a register ACTUALLY contains the value 0xFFFFFFFF?
> How do we differentiate this from the case when the device has been
> plugged out?

An example I know is:
[arch/powerpc/include/asm/eeh.h]
  static inline u8 eeh_readb(const volatile void __iomem *addr)
  {
	u8 val = in_8(addr);
	if (EEH_POSSIBLE_ERROR(val, u8))
		return eeh_check_failure(addr, val);
	return val;
  }
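
The pattern in eeh_readb() appears to be: treat an all-ones value only
as a hint, and ask a second, independent source before declaring a
failure. A minimal userspace sketch of that idea follows. Everything
here (struct fake_dev, fake_readb(), fake_dev_is_isolated()) is
hypothetical and is not the EEH implementation.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical device model; not a kernel structure. */
struct fake_dev {
	bool present;		/* false once the device is gone */
	uint8_t reg;		/* register value while present */
	unsigned int failures;	/* confirmed failures seen so far */
};

/* Mimics a byte-wide MMIO read that returns all ones for a missing device. */
static uint8_t fake_readb(const struct fake_dev *dev)
{
	return dev->present ? dev->reg : UINT8_MAX;
}

/* Assumed slow-path query, standing in for a platform or slot status check. */
static bool fake_dev_is_isolated(const struct fake_dev *dev)
{
	return !dev->present;
}

/*
 * Fast path: one compare per read. Only when the value is all ones do we
 * pay for the slow check that tells a real 0xFF from a missing device.
 */
static uint8_t checked_readb(struct fake_dev *dev)
{
	uint8_t val = fake_readb(dev);

	if (val == UINT8_MAX && fake_dev_is_isolated(dev))
		dev->failures++;	/* caller can inspect this and abort */
	return val;
}
```

With this shape a register that really contains 0xFF is returned as
data, while the same value from a removed device is recorded as a
failure.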

> 
> Finally, how do we re-write the following code to handle this correctly?
> 
> iowrite32(val1, reg1);
> iowrite32(val2, reg2);
> iowrite32(val3, reg3);
> iowrite32(val4, reg4);

One answer is as already posted by Greg, "read it."

If you made a request by writing some data, I think you will wait for
a response from the device, with some reasonable timeout set.
Soon, in some form, you will get a message like "success", "retry" or
"failed", or nothing if it times out.  Then you can report it to userland
and/or start the next conversation with the device.
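
A minimal userspace sketch of that write-then-wait pattern, assuming a
hypothetical status register with DONE and ERROR bits. All names below
are invented for illustration, and the bounded poll count stands in for
a real time-based timeout.

```c
#include <stdbool.h>
#include <stdint.h>

enum req_result { REQ_OK, REQ_FAILED, REQ_TIMEOUT, REQ_DEVICE_GONE };

/* Hypothetical status bits. */
#define STATUS_DONE	0x1u
#define STATUS_ERROR	0x2u

/* Hypothetical device model; not a kernel structure. */
struct fake_dev {
	bool present;			/* false once the card is pulled */
	uint32_t cmd;			/* last command the device accepted */
	unsigned int polls_until_done;	/* status reads before DONE is set */
	bool fail;			/* device reports ERROR when done */
};

/* Writes are posted: if the device is gone they vanish without any error. */
static void fake_iowrite32(struct fake_dev *dev, uint32_t val)
{
	if (dev->present)
		dev->cmd = val;
}

/* Status read; returns all ones for a missing device. */
static uint32_t fake_read_status(struct fake_dev *dev)
{
	if (!dev->present)
		return UINT32_MAX;
	if (dev->polls_until_done > 0) {
		dev->polls_until_done--;
		return 0;
	}
	return dev->fail ? (STATUS_DONE | STATUS_ERROR) : STATUS_DONE;
}

/* Issue a command, then poll the status register a bounded number of times. */
static enum req_result submit_request(struct fake_dev *dev, uint32_t cmd,
				      unsigned int max_polls)
{
	fake_iowrite32(dev, cmd);

	for (unsigned int i = 0; i < max_polls; i++) {
		uint32_t status = fake_read_status(dev);

		if (status == UINT32_MAX)	/* checked before any bit test */
			return REQ_DEVICE_GONE;
		if (status & STATUS_DONE)
			return (status & STATUS_ERROR) ? REQ_FAILED : REQ_OK;
	}
	return REQ_TIMEOUT;
}
```

The all-ones test has to come before the bit tests, because 0xFFFFFFFF
has every status bit set and would otherwise be misread as a completed
request with an error.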


Thanks,
H.Seto



* RE: How does Linux handle PCI-E Surprise unplug?
From: Rajat Jain @ 2010-03-08  7:23 UTC (permalink / raw)
  To: linux-hotplug


Hello Greg / All,

Replying on this thread as I have a small query left regarding this...

> -----Original Message-----
> From: Greg KH [mailto:greg@kroah.com]
> Sent: Thursday, February 18, 2010 8:23 PM
> To: Rajat Jain
> Cc: linux-hotplug@vger.kernel.org; linux-pci@vger.kernel.org
> Subject: Re: How does Linux handle PCI-E Surprise unplug?
> 
> On Thu, Feb 18, 2010 at 11:35:33AM +0530, Rajat Jain wrote:
> > Hi,
> >
> > I'm keen to understand how the Linux kernel handles surprise removal of
> > a device, from a PCI-e slot that supports "Hot-plug Surprise" removal
> > (in slot capabilities).
> >
> > Consider that the device in the slot is working normally, with its
> > driver attached to the device, and is doing all sorts of read / write
> > operations on the device registers that have been mapped into the PCI
> > memory space. Now when that device is suddenly plugged out (and thus its
> > registers suddenly disappear from the PCI memory space), the device
> > driver is still unaware as it is doing the register reads / writes on
> > the device. At this point, IMHO any attempt to access the device
> > registers will result in an exception (BUS error?) as the device is
> > gone. Correct?
> 
> The driver will suddenly start reading all 0xff and will then need to
> abort whatever it was doing.  Usually all drivers handle this just fine
> today, as this is what they needed to do when they were pccard devices.
> Nothing new here at all.
> 

Does that mean accessing PCI memory mapped registers for a non-existent
device has effects that are localized ONLY to the driver of that device?
In other words, kernel does not notice or does not even become AWARE
that a driver is trying to access non-existent memory mapped registers? 

So trying to access a HW that does not exist is totally legal in that
sense? Is HW NOT supposed to generate an error in this case - as far as
I had read, any access to non existent physical addresses should result
in a bus error ... no?

Thanks,

Rajat Jain


* Re: How does Linux handle PCI-E Surprise unplug?
From: Kenji Kaneshige @ 2010-03-08  8:24 UTC (permalink / raw)
  To: linux-hotplug

Rajat Jain wrote:
> Hello Greg / All,
> 
> Replying on this thread as I have a small query left regarding this...
> 
>> -----Original Message-----
>> From: Greg KH [mailto:greg@kroah.com]
>> Sent: Thursday, February 18, 2010 8:23 PM
>> To: Rajat Jain
>> Cc: linux-hotplug@vger.kernel.org; linux-pci@vger.kernel.org
>> Subject: Re: How does Linux handle PCI-E Surprise unplug?
>>
>> On Thu, Feb 18, 2010 at 11:35:33AM +0530, Rajat Jain wrote:
>>> Hi,
>>>
>>> I'm keen to understand how the Linux kernel handles surprise removal of
>>> a device, from a PCI-e slot that supports "Hot-plug Surprise" removal
>>> (in slot capabilities).
>>>
>>> Consider that the device in the slot is working normally, with its
>>> driver attached to the device, and is doing all sorts of read / write
>>> operations on the device registers that have been mapped into the PCI
>>> memory space. Now when that device is suddenly plugged out (and thus its
>>> registers suddenly disappear from the PCI memory space), the device
>>> driver is still unaware as it is doing the register reads / writes on
>>> the device. At this point, IMHO any attempt to access the device
>>> registers will result in an exception (BUS error?) as the device is
>>> gone. Correct?
>> The driver will suddenly start reading all 0xff and will then need to
>> abort whatever it was doing.  Usually all drivers handle this just fine
>> today, as this is what they needed to do when they were pccard devices.
>> Nothing new here at all.
>>
> 
> Does that mean accessing PCI memory mapped registers for a non-existent
> device has effects that are localized ONLY to the driver of that device?
> In other words, kernel does not notice or does not even become AWARE
> that a driver is trying to access non-existent memory mapped registers? 
> 
> So trying to access a HW that does not exist is totally legal in that
> sense? Is HW NOT supposed to generate an error in this case - as far as
> I had read, any access to non existent physical addresses should result
> in a bus error ... no?

This will cause an Unsupported Request error. But how it is handled
depends on the platform, I think.

Thanks,
Kenji Kaneshige





* Re: How does Linux handle PCI-E Surprise unplug?
From: Grant Grundler @ 2010-03-08 22:49 UTC (permalink / raw)
  To: linux-hotplug

On Mon, Mar 08, 2010 at 12:41:33PM +0530, Rajat Jain wrote:
...
> Does that mean accessing PCI memory mapped registers for a non-existent
> device has effects that are localized ONLY to the driver of that device?

What Kenji said.

> In other words, kernel does not notice or does not even become AWARE
> that a driver is trying to access non-existent memory mapped registers? 
> 
> So trying to access a HW that does not exist is totally legal in that
> sense?

Yes and no. Depends on who you ask. Old intel platforms tolerated
this just fine. Many other !intel platforms did not.

> Is HW NOT supposed to generate an error in this case - as far as
> I had read, any access to non existent physical addresses should result
> in a bus error ... no?

Correct. The access will result in a "Master Abort" (timeout when device
fails to respond.) This is when the platform dependent behaviour
comes into play. "Hard Fail" systems will end up in a system error
handler (HPMC, MCE, or equivalent) and "Soft Fail" systems (like
older Intel) will have the chipset return ~0 (and some ancient PCI
chipsets had BIOS return 0 instead).
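
For reference, "~0" on a "Soft Fail" system means every bit set at
whatever width the access used, so the pattern a driver sees depends on
the read size. A small sketch, where reads_as_all_ones() is a
hypothetical helper and not a kernel function:

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * True if a value obtained with an access of width_bytes (1, 2 or 4)
 * has every bit set at that width. Other widths are rejected.
 */
static bool reads_as_all_ones(uint32_t val, unsigned int width_bytes)
{
	switch (width_bytes) {
	case 1:
		return (uint8_t)val == UINT8_MAX;	/* 0xFF */
	case 2:
		return (uint16_t)val == UINT16_MAX;	/* 0xFFFF */
	case 4:
		return val == UINT32_MAX;		/* 0xFFFFFFFF */
	default:
		return false;
	}
}
```

This only covers the "Soft Fail" case. On a "Hard Fail" system the
access never returns a value to compare, and on the ancient chipsets
that return 0 this test would not fire at all.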

hth,
grant

