linux-pci.vger.kernel.org archive mirror
* pciehp: errors on resume
@ 2013-01-28 12:46 Paul Bolle
  2013-01-29  2:10 ` Gu Zheng
       [not found] ` <512336fb.42d70e0a.73ed.ffffc77eSMTPIN_ADDED_BROKEN@mx.google.com>
  0 siblings, 2 replies; 18+ messages in thread
From: Paul Bolle @ 2013-01-28 12:46 UTC (permalink / raw)
  To: Oliver Neukum, Rafael J. Wysocki, Bjorn Helgaas; +Cc: linux-pci

0) Since release v3.7 a laptop I use prints these errors at every
resume:
    pciehp 0000:00:1c.1:pcie04: Device 0000:03:00.0 already exists at 0000:03:00, cannot hot-add
    pciehp 0000:00:1c.1:pcie04: Cannot add device at 0000:03:00

That must have been caused by commit
87683e22c646e563061a91f4a0106e6913acebf8 ("PCI: pciehp: Always implement
resume, regardless of pciehp_force param").

1) Those messages appear to be printed for the wireless card that is
apparently attached to one of this laptop's two pcie ports:
    lspci | grep -e 00:1c.1 -e 03:00.0
    00:1c.1 PCI bridge: Intel Corporation 82801H (ICH8 Family) PCI Express Port 2 (rev 03)
    03:00.0 Network controller: Intel Corporation PRO/Wireless 4965 AG or AGN [Kedron] Network Connection (rev 61)

    lspci -vt | grep -e 1c.1 -e 03
           +-1c.1-[03]----00.0  Intel Corporation PRO/Wireless 4965 AG or AGN [Kedron] Network Connection

2) There are no apparent issues with this wireless card on all those
resumes. So could these errors somehow be suppressed? 

3) For what it's worth, the callchain that triggers these errors seems
to be:
    pciehp_resume()
        pciehp_enable_slot()
            board_added()
                pciehp_configure_device()
                    pci_get_slot()
                    ctrl_err([...] "Device %s already exists " [...])
                ctrl_err([...] "Cannot add device at [...]")


Paul Bolle



* Re: pciehp: errors on resume
  2013-01-28 12:46 pciehp: errors on resume Paul Bolle
@ 2013-01-29  2:10 ` Gu Zheng
  2013-01-29 10:35   ` Paul Bolle
       [not found] ` <512336fb.42d70e0a.73ed.ffffc77eSMTPIN_ADDED_BROKEN@mx.google.com>
  1 sibling, 1 reply; 18+ messages in thread
From: Gu Zheng @ 2013-01-29  2:10 UTC (permalink / raw)
  To: Paul Bolle; +Cc: Oliver Neukum, Rafael J. Wysocki, Bjorn Helgaas, linux-pci

Hi Paul,
	I think the cause of the issue you mentioned is that the device already exists when we resume.
Because we do not really suspend/remove the device on suspend, the device still exists, so
our attempt to add the device on resume fails. The suspend routine does not match the resume routine;
this looks like a bug.

On 01/28/2013 08:46 PM, Paul Bolle wrote:

> 0) Since release v3.7 a laptop I use prints these errors at every
> resume:
>     pciehp 0000:00:1c.1:pcie04: Device 0000:03:00.0 already exists at 0000:03:00, cannot hot-add
>     pciehp 0000:00:1c.1:pcie04: Cannot add device at 0000:03:00
> 
> That must have been caused by commit
> 87683e22c646e563061a91f4a0106e6913acebf8 ("PCI: pciehp: Always implement
> resume, regardless of pciehp_force param").
> 
> 1) Those messages appear to be printed for the wireless card that is
> apparently attached to one of this laptop's two pcie ports:
>     lspci | grep -e 00:1c.1 -e 03:00.0
>     00:1c.1 PCI bridge: Intel Corporation 82801H (ICH8 Family) PCI Express Port 2 (rev 03)
>     03:00.0 Network controller: Intel Corporation PRO/Wireless 4965 AG or AGN [Kedron] Network Connection (rev 61)
> 
>     lspci -vt | grep -e 1c.1 -e 03
>            +-1c.1-[03]----00.0  Intel Corporation PRO/Wireless 4965 AG or AGN [Kedron] Network Connection
> 
> 2) There are no apparent issues with this wireless card on all those
> resumes. So could these errors somehow be suppressed? 
> 
> 3) For what it's worth, the callchain that triggers these errors seems
> to be:
>     pciehp_resume()
>         pciehp_enable_slot()
>             board_added()
>                 pciehp_configure_device()
>                     pci_get_slot()
>                     ctrl_err([...] "Device %s already exists " [...])
>                 ctrl_err([...] "Cannot add device at [...]")
> 
> 
> Paul Bolle
> 




* Re: pciehp: errors on resume
  2013-01-29  2:10 ` Gu Zheng
@ 2013-01-29 10:35   ` Paul Bolle
  2013-01-29 11:35     ` Rafael J. Wysocki
  2013-01-30  3:08     ` Gu Zheng
  0 siblings, 2 replies; 18+ messages in thread
From: Paul Bolle @ 2013-01-29 10:35 UTC (permalink / raw)
  To: Gu Zheng; +Cc: Oliver Neukum, Rafael J. Wysocki, Bjorn Helgaas, linux-pci

Gu,

On Tue, 2013-01-29 at 10:10 +0800, Gu Zheng wrote:
> I think the cause of the issue you mentioned is that the device already exists when we resume.
> Because we do not really suspend/remove the device on suspend, the device still exists, so
> our attempt to add the device on resume fails. The suspend routine does not match the resume routine;
> this looks like a bug.

Thanks. So should the fix here be to actually suspend and remove this
device? (Note that pciehp_suspend() is now basically a NOP.)


Paul Bolle



* Re: pciehp: errors on resume
  2013-01-29 10:35   ` Paul Bolle
@ 2013-01-29 11:35     ` Rafael J. Wysocki
  2013-01-29 12:32       ` Paul Bolle
  2013-01-30  3:47       ` Gu Zheng
  2013-01-30  3:08     ` Gu Zheng
  1 sibling, 2 replies; 18+ messages in thread
From: Rafael J. Wysocki @ 2013-01-29 11:35 UTC (permalink / raw)
  To: Paul Bolle; +Cc: Gu Zheng, Oliver Neukum, Bjorn Helgaas, linux-pci

On Tuesday, January 29, 2013 11:35:32 AM Paul Bolle wrote:
> Gu,
> 
> On Tue, 2013-01-29 at 10:10 +0800, Gu Zheng wrote:
> > I think the cause of the issue you mentioned is that the device already exists when we resume.
> > Because we do not really suspend/remove the device on suspend, the device still exists, so
> > our attempt to add the device on resume fails. The suspend routine does not match the resume routine;
> > this looks like a bug.
> 
> Thanks. So should the fix here be to actually suspend and remove this
> device? (Note that pciehp_suspend() is now basically a NOP.)

I think it wouldn't be useful to remove devices on all suspends, but then there
are a few different situations that resume has to consider and handle correctly:

1) Device has been removed while suspended.
2) Device was present before suspend and is still there.
3) Device has been added while suspended.
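
The situations listed above can be written down as a small decision function.
This is a sketch with hypothetical names (resume_case, classify_slot), not
driver code. It looks only at slot occupancy before and after suspend, so it
cannot tell case 2) apart from a card that was swapped for another one while
suspended.

```c
#include <stdbool.h>

/* Hypothetical classification of a slot at resume time. */
enum resume_case {
	SLOT_STILL_EMPTY,      /* nothing before suspend, nothing now */
	SLOT_DEVICE_REMOVED,   /* situation 1) */
	SLOT_DEVICE_UNCHANGED, /* situation 2), judged by occupancy only */
	SLOT_DEVICE_ADDED,     /* situation 3) */
};

enum resume_case classify_slot(bool present_before, bool present_now)
{
	if (present_before && !present_now)
		return SLOT_DEVICE_REMOVED;
	if (present_before && present_now)
		return SLOT_DEVICE_UNCHANGED;
	if (!present_before && present_now)
		return SLOT_DEVICE_ADDED;
	return SLOT_STILL_EMPTY;
}
```

A resume handler built on this would remove the device in case 1), add it in
case 3), and leave it alone in case 2).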

Thanks,
Rafael


-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.


* Re: pciehp: errors on resume
  2013-01-29 11:35     ` Rafael J. Wysocki
@ 2013-01-29 12:32       ` Paul Bolle
  2013-01-29 14:45         ` Martin Mokrejs
  2013-01-30  3:47       ` Gu Zheng
  1 sibling, 1 reply; 18+ messages in thread
From: Paul Bolle @ 2013-01-29 12:32 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: Gu Zheng, Oliver Neukum, Bjorn Helgaas, linux-pci

On Tue, 2013-01-29 at 12:35 +0100, Rafael J. Wysocki wrote:
> On Tuesday, January 29, 2013 11:35:32 AM Paul Bolle wrote:
> > Thanks. So should the fix here be to actually suspend and remove this
> > device? (Note that pciehp_suspend() is now basically a NOP.)
> 
> I think it wouldn't be useful to remove devices on all suspends, but then there
> are a few different situations that resume has to consider and handle correctly:
> 
> 1) Device has been removed while suspended.
> 2) Device was present before suspend and is still there.
> 3) Device has been added while suspended.

Obviously, I care about situation 2) here. (It would be quite a feat to
remove or add a wireless card while the laptop is suspended.)

Anyhow, is there any reason to not treat the fact that a device turns
out to be already there when board_added() is called as an occurrence of
situation 2)? In other words: is the only reason that a device already
exists when board_added() is called that it was there while suspending
and is still there while resuming?


Paul Bolle



* Re: pciehp: errors on resume
  2013-01-29 12:32       ` Paul Bolle
@ 2013-01-29 14:45         ` Martin Mokrejs
  2013-01-29 21:41           ` Rafael J. Wysocki
  0 siblings, 1 reply; 18+ messages in thread
From: Martin Mokrejs @ 2013-01-29 14:45 UTC (permalink / raw)
  To: Paul Bolle
  Cc: Rafael J. Wysocki, Gu Zheng, Oliver Neukum, Bjorn Helgaas,
	linux-pci

Hi,

Paul Bolle wrote:
> On Tue, 2013-01-29 at 12:35 +0100, Rafael J. Wysocki wrote:
>> On Tuesday, January 29, 2013 11:35:32 AM Paul Bolle wrote:
>>> Thanks. So should the fix here be to actually suspend and remove this
>>> device? (Note that pciehp_suspend() is now basically a NOP.)
>>
>> I think it wouldn't be useful to remove devices on all suspends, but then there
>> are a few different situations that resume has to consider and handle correctly:
>>
>> 1) Device has been removed while suspended.
>> 2) Device was present before suspend and is still there.
>> 3) Device has been added while suspended.
> 
> Obviously, I care about situation 2) here. (It would be quite a feat to
> remove or add a wireless card while the laptop is suspended.)

  I don't use suspend on my laptop, to avoid possible problems with crashes
and data loss (I would rather quit my apps and shut down). Therefore I have no
experience with it ... but it is my impression that the obvious use is to
close the lid, unplug an external mouse/keyboard and go elsewhere, maybe
re-plug the mouse/keyboard into different slots (hey, I can't remember
whether I had the mouse in this or the other socket) and open the lid. Or even,
being unintentionally nasty to the OS, re-plug the devices after
I opened the laptop lid! I understand that in all these scenarios one is just
doing the wrong thing and asking for trouble ... but that's why I
don't use that feature. My memory is leaky and my behavior is error-prone,
so I am sure I would get the order wrong at some point. Maybe the OS could force
a disconnect of all external devices for which that is reasonably safe (mouse
and keyboard, but not, say, an external USB drive)?

  Nevertheless, is it possible to have an external drive connection preserved
through suspend so that it survives under the same device ID? I don't know if this
is possible at all ... If not, then the OS could unplug even external drives.

My apologies if I am talking nonsense here. I really don't use suspend,
so my knowledge is zero.
Martin


* Re: pciehp: errors on resume
  2013-01-29 14:45         ` Martin Mokrejs
@ 2013-01-29 21:41           ` Rafael J. Wysocki
  0 siblings, 0 replies; 18+ messages in thread
From: Rafael J. Wysocki @ 2013-01-29 21:41 UTC (permalink / raw)
  To: Martin Mokrejs
  Cc: Paul Bolle, Gu Zheng, Oliver Neukum, Bjorn Helgaas, linux-pci

On Tuesday, January 29, 2013 03:45:28 PM Martin Mokrejs wrote:
> Hi,
> 
> Paul Bolle wrote:
> > On Tue, 2013-01-29 at 12:35 +0100, Rafael J. Wysocki wrote:
> >> On Tuesday, January 29, 2013 11:35:32 AM Paul Bolle wrote:
> >>> Thanks. So should the fix here be to actually suspend and remove this
> >>> device? (Note that pciehp_suspend() is now basically a NOP.)
> >>
> >> I think it wouldn't be useful to remove devices on all suspends, but then there
> >> are a few different situations that resume has to consider and handle correctly:
> >>
> >> 1) Device has been removed while suspended.
> >> 2) Device was present before suspend and is still there.
> >> 3) Device has been added while suspended.
> > 
> > Obviously, I care about situation 2) here. (It would be quite a feat to
> > remove or add a wireless card while the laptop is suspended.)
> 
>   I don't use suspend on my laptop, to avoid possible problems with crashes
> and data loss (I would rather quit my apps and shut down). Therefore I have no
> experience with it ... but it is my impression that the obvious use is to
> close the lid, unplug an external mouse/keyboard and go elsewhere, maybe
> re-plug the mouse/keyboard into different slots (hey, I can't remember
> whether I had the mouse in this or the other socket) and open the lid. Or even,
> being unintentionally nasty to the OS, re-plug the devices after
> I opened the laptop lid! I understand that in all these scenarios one is just
> doing the wrong thing and asking for trouble ... but that's why I
> don't use that feature. My memory is leaky and my behavior is error-prone,
> so I am sure I would get the order wrong at some point. Maybe the OS could force
> a disconnect of all external devices for which that is reasonably safe (mouse
> and keyboard, but not, say, an external USB drive)?
> 
>   Nevertheless, is it possible to have an external drive connection preserved
> through suspend so that it survives under the same device ID? I don't know if this
> is possible at all ... If not, then the OS could unplug even external drives.

This is possible for PCI devices.  The IDs won't change if the device is
plugged in all the time.

Moreover, in some cases there's no way to tell the difference between "internal"
and "external".  The OS just has to handle all of the possible situations
during resume correctly.  If it doesn't, then there is a bug that needs to be
fixed.

And avoiding a given feature just because you're scared of it doesn't lead to
any progress.  If you instead use it and report problems with it, then chances
are those problems will be addressed over time.

I use suspend all the time and it works for me.  I don't have any PCIe hotplug
devices, though, so any information on whether or not it works for people is
quite valuable to me.

I'll have a look at the pciehp driver's resume, but that's going to take some
time.

Thanks,
Rafael


-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.


* Re: pciehp: errors on resume
  2013-01-29 10:35   ` Paul Bolle
  2013-01-29 11:35     ` Rafael J. Wysocki
@ 2013-01-30  3:08     ` Gu Zheng
  2013-01-30  8:31       ` Paul Bolle
  1 sibling, 1 reply; 18+ messages in thread
From: Gu Zheng @ 2013-01-30  3:08 UTC (permalink / raw)
  To: Paul Bolle; +Cc: Oliver Neukum, Rafael J. Wysocki, Bjorn Helgaas, linux-pci

On 01/29/2013 06:35 PM, Paul Bolle wrote:

> Gu,
> 
> On Tue, 2013-01-29 at 10:10 +0800, Gu Zheng wrote:
>> I think the cause of the issue you mentioned is that the device already exists when we resume.
>> Because we do not really suspend/remove the device on suspend, the device still exists, so
>> our attempt to add the device on resume fails. The suspend routine does not match the resume routine;
>> this looks like a bug.
> 
> Thanks. So should the fix here be to actually suspend and remove this
> device? (Note that pciehp_suspend() is now basically a NOP.)
> 


Hi Paul,
	The resume routine should handle all possible situations here. As Rafael said, 
in some cases there's no way to tell the difference between "internal" and "external".
If this problem seriously disturbs you, you can avoid this feature. But the better way is
to file a bug report in bugzilla, which can draw more developers to fix this issue.

Thanks,
Gu     

> 
> Paul Bolle
> 




* Re: pciehp: errors on resume
  2013-01-29 11:35     ` Rafael J. Wysocki
  2013-01-29 12:32       ` Paul Bolle
@ 2013-01-30  3:47       ` Gu Zheng
  1 sibling, 0 replies; 18+ messages in thread
From: Gu Zheng @ 2013-01-30  3:47 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: Paul Bolle, Oliver Neukum, Bjorn Helgaas, linux-pci

On 01/29/2013 07:35 PM, Rafael J. Wysocki wrote:

> On Tuesday, January 29, 2013 11:35:32 AM Paul Bolle wrote:
>> Gu,
>>
>> On Tue, 2013-01-29 at 10:10 +0800, Gu Zheng wrote:
>>> I think the cause of the issue you mentioned is that the device already exists when we resume.
>>> Because we do not really suspend/remove the device on suspend, the device still exists, so
>>> our attempt to add the device on resume fails. The suspend routine does not match the resume routine;
>>> this looks like a bug.
>>
>> Thanks. So should the fix here be to actually suspend and remove this
>> device? (Note that pciehp_suspend() is now basically a NOP.)
> 
> I think it wouldn't be useful to remove devices on all suspends, but then there
> are a few different situations that resume has to consider and handle correctly:
> 
> 1) Device has been removed while suspended.
> 2) Device was present before suspend and is still there.
> 3) Device has been added while suspended.
> 

Hi Rafael,
	Though removing devices on all suspends does not seem useful, I do not think
that leaving pciehp_suspend() a NOP and having resume handle all the conditions is a good approach.
Do the current pciehp driver's suspend and resume routines follow the PCIe specification?
And what is the difficulty in detecting and handling the different situations that you mentioned
above?

Thanks,
Gu

> Thanks,
> Rafael
> 
> 




* Re: pciehp: errors on resume
  2013-01-30  3:08     ` Gu Zheng
@ 2013-01-30  8:31       ` Paul Bolle
  2013-01-31  6:47         ` Gu Zheng
  0 siblings, 1 reply; 18+ messages in thread
From: Paul Bolle @ 2013-01-30  8:31 UTC (permalink / raw)
  To: Gu Zheng; +Cc: Oliver Neukum, Rafael J. Wysocki, Bjorn Helgaas, linux-pci

On Wed, 2013-01-30 at 11:08 +0800, Gu Zheng wrote:
> On 01/29/2013 06:35 PM, Paul Bolle wrote:
> > So should the fix here be to actually suspend and remove this
> > device? (Note that pciehp_suspend() is now basically a NOP.)
> 	The resume routine should handle all possible situations here. As Rafael said, 
> in some cases there's no way to tell the difference between "internal" and "external".
> If this problem seriously disturbs you, you can avoid this feature.

0) Actually, before v3.7 I was unaware of pciehp. And if these errors
hadn't shown up I would still be. I'm only using pciehp because Fedora
17 has CONFIG_HOTPLUG_PCI_PCIE enabled in its kernel config. 

>  But the better way is
> to file a bug report in bugzilla, which can draw more developers to fix this issue.

1) Fine with me, though my experience is that kernel bugs should first
be discussed on the appropriate mailing list.

2) Regarding the errors I see at resume: it seems that if a device
already exists when board_added() is called, this almost certainly means
we're resuming with the same device we suspended with. So there's no
reason to send errors to the log.

Is there any other way that a device could already exist when
board_added() is called?


Paul Bolle



* Re: pciehp: errors on resume
  2013-01-30  8:31       ` Paul Bolle
@ 2013-01-31  6:47         ` Gu Zheng
  2013-01-31 10:58           ` Paul Bolle
  2013-01-31 13:16           ` Rafael J. Wysocki
  0 siblings, 2 replies; 18+ messages in thread
From: Gu Zheng @ 2013-01-31  6:47 UTC (permalink / raw)
  To: Paul Bolle; +Cc: Oliver Neukum, Rafael J. Wysocki, Bjorn Helgaas, linux-pci

On 01/30/2013 04:31 PM, Paul Bolle wrote:

> On Wed, 2013-01-30 at 11:08 +0800, Gu Zheng wrote:
>> On 01/29/2013 06:35 PM, Paul Bolle wrote:
>>> So should the fix here be to actually suspend and remove this
>>> device? (Note that pciehp_suspend() is now basically a NOP.)
>> 	The resume routine should handle all possible situations here. As Rafael said, 
>> in some cases there's no way to tell the difference between "internal" and "external".
>> If this problem seriously disturbs you, you can avoid this feature.
> 
> 0) Actually, before v3.7 I was unaware of pciehp. And if these errors
> hadn't shown up I would still be. I'm only using pciehp because Fedora
> 17 has CONFIG_HOTPLUG_PCI_PCIE enabled in its kernel config. 
> 
>>  But the better way is
>> to file a bug report in bugzilla, which can draw more developers to fix this issue.
> 
> 1) Fine with me, though my experience is that kernel bugs should first
> be discussed on the appropriate mailing list.
> 
> 2) Regarding the errors I see at resume: it seems that if a device
> already exists when board_added() is called, this almost certainly means
> we're resuming with the same device we suspended with. So there's no
> reason to send errors to the log.


No, it's hard to detect whether the existing device is the one you want to resume.
Maybe the existing device was added during suspend, and the one you really want to
resume was removed.

> 
> Is there any other way that a device could already exist when
> board_added() is called?


It's hard to say, the pcie device is a typical one.

> 
> 
> Paul Bolle
> 
> 




* Re: pciehp: errors on resume
  2013-01-31  6:47         ` Gu Zheng
@ 2013-01-31 10:58           ` Paul Bolle
  2013-01-31 13:18             ` Rafael J. Wysocki
  2013-01-31 13:16           ` Rafael J. Wysocki
  1 sibling, 1 reply; 18+ messages in thread
From: Paul Bolle @ 2013-01-31 10:58 UTC (permalink / raw)
  To: Gu Zheng; +Cc: Oliver Neukum, Rafael J. Wysocki, Bjorn Helgaas, linux-pci

On Thu, 2013-01-31 at 14:47 +0800, Gu Zheng wrote:
> On 01/30/2013 04:31 PM, Paul Bolle wrote:
> > 2) Regarding the errors I see at resume: it seems that if a device
> > already exists when board_added() is called, this almost certainly means
> > we're resuming with the same device we suspended with. So there's no
> > reason to send errors to the log.
> 
> No, it's hard to detect whether the existing device is the one you want to resume.
> Maybe the existing device was added during suspend, and the one you really want to
> resume was removed.

Because the domain, bus, slot, and function being equal doesn't mean
it's the same device? (I had to look up those names in man 8 lspci, I'm
unfamiliar with all this.)

Could an additional test on vendor ID and device ID help? I.e., if
board_added() and friends notice a device already exists and the
previous and current device have identical IDs, would we then know all
that's needed to not bother to scare people with errors?


Paul Bolle



* Re: pciehp: errors on resume
  2013-01-31  6:47         ` Gu Zheng
  2013-01-31 10:58           ` Paul Bolle
@ 2013-01-31 13:16           ` Rafael J. Wysocki
  1 sibling, 0 replies; 18+ messages in thread
From: Rafael J. Wysocki @ 2013-01-31 13:16 UTC (permalink / raw)
  To: Gu Zheng; +Cc: Paul Bolle, Oliver Neukum, Bjorn Helgaas, linux-pci

On Thursday, January 31, 2013 02:47:18 PM Gu Zheng wrote:
> On 01/30/2013 04:31 PM, Paul Bolle wrote:
> 
> > On Wed, 2013-01-30 at 11:08 +0800, Gu Zheng wrote:
> >> On 01/29/2013 06:35 PM, Paul Bolle wrote:
> >>> So should the fix here be to actually suspend and remove this
> >>> device? (Note that pciehp_suspend() is now basically a NOP.)
> >> 	The resume routine should handle all possible situations here. As Rafael said, 
> >> in some cases there's no way to tell the difference between "internal" and "external".
> >> If this problem seriously disturbs you, you can avoid this feature.
> > 
> > 0) Actually, before v3.7 I was unaware of pciehp. And if these errors
> > hadn't shown up I would still be. I'm only using pciehp because Fedora
> > 17 has CONFIG_HOTPLUG_PCI_PCIE enabled in its kernel config. 
> > 
> >>  But the better way is
> >> to file a bug report in bugzilla, which can draw more developers to fix this issue.
> > 
> > 1) Fine with me, though my experience is that kernel bugs should first
> > be discussed on the appropriate mailing list.
> > 
> > 2) Regarding the errors I see at resume: it seems that if a device
> > already exists when board_added() is called, this almost certainly means
> > we're resuming with the same device we suspended with. So there's no
> > reason to send errors to the log.
> 
> 
> No, it's hard to detect whether the existing device is the one you want to resume.
> Maybe the existing device was added during suspend, and the one you really want to
> resume was removed.

We can save the PCI config space of it on suspend and then compare with what we
have on resume.  There are a few registers there whose values shouldn't change
and should differ for different devices.  That shouldn't be too hard I suppose.
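
A minimal sketch of that comparison, assuming the registers of interest are
the vendor ID, device ID, class/revision and subsystem IDs of a type 0
configuration header. pci_identity and both functions are hypothetical names,
not kernel API; the two buffers stand in for the config space saved on suspend
and the one read back on resume.

```c
#include <stdbool.h>
#include <stdint.h>

/* Identity registers of a type 0 header (hypothetical grouping). */
struct pci_identity {
	uint16_t vendor;        /* offset 0x00 */
	uint16_t device;        /* offset 0x02 */
	uint32_t class_rev;     /* offset 0x08: revision + class code */
	uint16_t subsys_vendor; /* offset 0x2c */
	uint16_t subsys_device; /* offset 0x2e */
};

/* Config space is little-endian; assemble values byte by byte. */
static uint16_t cfg_read16(const uint8_t *cfg, unsigned int off)
{
	return (uint16_t)(cfg[off] | (cfg[off + 1] << 8));
}

static uint32_t cfg_read32(const uint8_t *cfg, unsigned int off)
{
	return (uint32_t)cfg_read16(cfg, off) |
	       ((uint32_t)cfg_read16(cfg, off + 2) << 16);
}

/* cfg must hold at least the 64-byte standard header. */
struct pci_identity pci_identity_from_config(const uint8_t *cfg)
{
	struct pci_identity id = {
		.vendor = cfg_read16(cfg, 0x00),
		.device = cfg_read16(cfg, 0x02),
		.class_rev = cfg_read32(cfg, 0x08),
		.subsys_vendor = cfg_read16(cfg, 0x2c),
		.subsys_device = cfg_read16(cfg, 0x2e),
	};
	return id;
}

bool pci_identity_equal(const struct pci_identity *a,
			const struct pci_identity *b)
{
	return a->vendor == b->vendor && a->device == b->device &&
	       a->class_rev == b->class_rev &&
	       a->subsys_vendor == b->subsys_vendor &&
	       a->subsys_device == b->subsys_device;
}
```

Two cards of the same model would still compare equal here, so a match means
"same kind of device", not necessarily the same physical card.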

Thanks,
Rafael


-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.


* Re: pciehp: errors on resume
  2013-01-31 10:58           ` Paul Bolle
@ 2013-01-31 13:18             ` Rafael J. Wysocki
  0 siblings, 0 replies; 18+ messages in thread
From: Rafael J. Wysocki @ 2013-01-31 13:18 UTC (permalink / raw)
  To: Paul Bolle; +Cc: Gu Zheng, Oliver Neukum, Bjorn Helgaas, linux-pci

On Thursday, January 31, 2013 11:58:54 AM Paul Bolle wrote:
> On Thu, 2013-01-31 at 14:47 +0800, Gu Zheng wrote:
> > On 01/30/2013 04:31 PM, Paul Bolle wrote:
> > > 2) Regarding the errors I see at resume: it seems that if a device
> > > already exists when board_added() is called, this almost certainly means
> > > we're resuming with the same device we suspended with. So there's no
> > > reason to send errors to the log.
> > 
> > No, it's hard to detect whether the existing device is the one you want to resume.
> > Maybe the existing device was added during suspend, and the one you really want to
> > resume was removed.
> 
> Because the domain, bus, slot, and function being equal doesn't mean
> it's the same device? (I had to look up those names in man 8 lspci, I'm
> unfamiliar with all this.)

No, that's not sufficient in general.

> Could an additional test on vendor ID and device ID help? I.e., if
> board_added() and friends notice a device already exists and the
> previous and current device have identical IDs, would we then know all
> that's needed to not bother to scare people with errors?

Yes, that's roughly the way to go in my opinion.

Thanks,
Rafael


-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.


* Re: pciehp: errors on resume
       [not found] ` <512336fb.42d70e0a.73ed.ffffc77eSMTPIN_ADDED_BROKEN@mx.google.com>
@ 2013-02-19  9:28   ` Paul Bolle
  2013-02-19 10:07     ` Gu Zheng
       [not found]     ` <5124284e.09d80e0a.294a.ffff9ee6SMTPIN_ADDED_BROKEN@mx.google.com>
  0 siblings, 2 replies; 18+ messages in thread
From: Paul Bolle @ 2013-02-19  9:28 UTC (permalink / raw)
  To: Wei Yang; +Cc: Oliver Neukum, Rafael J. Wysocki, Bjorn Helgaas, linux-pci

Richard,

On Tue, 2013-02-19 at 16:25 +0800, Wei Yang wrote:
> Sorry for bothering, I am looking at this and try to understand the process, 
> while get some confusion.
> 
> 1. The error log will be printed every time suspend/resume, no matter whether
>    the device is plug in/plug out during the suspend as discussed below?
> 
>    If the device is always there, no one touch it, the error message still be
>    printed?
> 
> 2. In my mind, before the pcied_init is called, those pci_dev are
>    already enumerated, such as the wireless card in this case. 
> 
>    During the boot stage, if the pciehp_force is set to true, the error messge
>    still be printed? Since I don't have those devices to create pcie_device, I
>    can't test this.
> 
> 3. Do you think it would be find to remove those devices at the suspend stage?
>    Then they will be added again at the resume stage?

Bypassing your questions, I'd like to point you at
http://article.gmane.org/gmane.linux.kernel.pci/20077 , in which Rafael
suggested a possible solution to this situation. (There's some extra
info in other messages in this thread.)

I must confess that I'm not at all sure how to implement it and that so
far I have, rather cowardly, not even drafted a solution along those
lines.


Paul Bolle



* Re: pciehp: errors on resume
  2013-02-19  9:28   ` Paul Bolle
@ 2013-02-19 10:07     ` Gu Zheng
       [not found]       ` <20130222074942.GA2398@richard.(null)>
       [not found]     ` <5124284e.09d80e0a.294a.ffff9ee6SMTPIN_ADDED_BROKEN@mx.google.com>
  1 sibling, 1 reply; 18+ messages in thread
From: Gu Zheng @ 2013-02-19 10:07 UTC (permalink / raw)
  To: Paul Bolle
  Cc: Wei Yang, Oliver Neukum, Rafael J. Wysocki, Bjorn Helgaas,
	linux-pci

On 02/19/2013 05:28 PM, Paul Bolle wrote:

> Richard,
> 
> On Tue, 2013-02-19 at 16:25 +0800, Wei Yang wrote:
>> Sorry for bothering, I am looking at this and try to understand the process, 
>> while get some confusion.
>>
>> 1. The error log will be printed every time suspend/resume, no matter whether
>>    the device is plug in/plug out during the suspend as discussed below?
>>
>>    If the device is always there, no one touch it, the error message still be
>>    printed?
>>
>> 2. In my mind, before the pcied_init is called, those pci_dev are
>>    already enumerated, such as the wireless card in this case. 
>>
>>    During the boot stage, if the pciehp_force is set to true, the error messge
>>    still be printed? Since I don't have those devices to create pcie_device, I
>>    can't test this.
>>
>> 3. Do you think it would be find to remove those devices at the suspend stage?
>>    Then they will be added again at the resume stage?
> 
> Bypassing your questions, I'd like to point you at
> http://article.gmane.org/gmane.linux.kernel.pci/20077 , in which Rafael
> suggested a possible solution to this situation. (There's some extra
> info in other messages in this thread.)


Yes, Rafael's suggestion seems a possible solution to this situation.
As I see it, though, it's impossible to identify a unique PCI device from
the registers in the PCI config space; something like "vendor_id + device_id"
cannot describe a unique device.
A PCIe device has a feature like a "serial number" which could be used to figure
out a unique one, but this feature is optional. If it's not set, we still
cannot detect a unique PCIe device.
Sorry for my poor knowledge; if anything I said is mistaken, please point it out! :)
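
For reference, the optional serial number lives in the Device Serial Number
extended capability (ID 0x0003) in PCIe extended config space. The sketch
below walks the extended capability list in a raw config space buffer;
find_dsn is a hypothetical helper written for illustration, not a kernel
function, and the buffer is assumed to be a little-endian dump of up to 4096
bytes.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define EXT_CAP_START  0x100u  /* extended capabilities begin here */
#define EXT_CAP_ID_DSN 0x0003u /* Device Serial Number capability */

static uint32_t cfg_read32(const uint8_t *cfg, size_t off)
{
	return (uint32_t)cfg[off] | ((uint32_t)cfg[off + 1] << 8) |
	       ((uint32_t)cfg[off + 2] << 16) | ((uint32_t)cfg[off + 3] << 24);
}

/* Returns true and stores the 64-bit serial if the capability is present. */
bool find_dsn(const uint8_t *cfg, size_t len, uint64_t *serial)
{
	size_t off = EXT_CAP_START;
	int guard = 480; /* upper bound on list length; guards against loops */

	while (guard-- > 0 && off >= EXT_CAP_START && off + 4 <= len) {
		uint32_t hdr = cfg_read32(cfg, off);

		if (hdr == 0 || hdr == 0xffffffffu)
			break; /* no (more) extended capabilities */
		if ((hdr & 0xffffu) == EXT_CAP_ID_DSN) {
			if (off + 12 > len)
				return false;
			/* lower dword at +4, upper dword at +8 */
			*serial = ((uint64_t)cfg_read32(cfg, off + 8) << 32) |
				  cfg_read32(cfg, off + 4);
			return true;
		}
		off = (hdr >> 20) & 0xffcu; /* next pointer, dword aligned */
	}
	return false;
}
```

When find_dsn() returns false for a device, there is no serial number to
compare and only the ID registers are available.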

Thanks,
Gu 

 


> I must confess that I'm not at all sure how to implement it and that so
> far I have, rather cowardly, not even drafted a solution along those
> lines.
> 
> 
> Paul Bolle
> 




* Re: pciehp: errors on resume
       [not found]     ` <5124284e.09d80e0a.294a.ffff9ee6SMTPIN_ADDED_BROKEN@mx.google.com>
@ 2013-02-20  8:08       ` Paul Bolle
  0 siblings, 0 replies; 18+ messages in thread
From: Paul Bolle @ 2013-02-20  8:08 UTC (permalink / raw)
  To: Wei Yang; +Cc: Oliver Neukum, Rafael J. Wysocki, Bjorn Helgaas, linux-pci

Richard,

On Wed, 2013-02-20 at 09:34 +0800, Wei Yang wrote:
> I wish I had similar devices so that I could reproduce it, since I am not very
> familiar with the process.

Well, here it is triggered by the combination of this port
    00:1c.1 PCI bridge [0604]: Intel Corporation 82801H (ICH8 Family) PCI Express Port 2 [8086:2841] (rev 03)

with this card
    03:00.0 Network controller [0280]: Intel Corporation PRO/Wireless 4965 AG or AGN [Kedron] Network Connection [8086:4230] (rev 61)

(The first PCIE port is apparently unused on this machine. I assume it's
meant for a WWAN card, which I don't use.)

But feel free to post some code here. I'm fine with testing any
reasonable solution.


Paul Bolle



* Re: pciehp: errors on resume
       [not found]       ` <20130222074942.GA2398@richard.(null)>
@ 2013-02-22  8:58         ` Gu Zheng
  0 siblings, 0 replies; 18+ messages in thread
From: Gu Zheng @ 2013-02-22  8:58 UTC (permalink / raw)
  To: Wei Yang
  Cc: Paul Bolle, Oliver Neukum, Rafael J. Wysocki, Bjorn Helgaas,
	linux-pci

On 02/22/2013 03:49 PM, Wei Yang wrote:

> On Tue, Feb 19, 2013 at 06:07:30PM +0800, Gu Zheng wrote:
>> On 02/19/2013 05:28 PM, Paul Bolle wrote:
>>
>>> Richard,
>>>
>>> On Tue, 2013-02-19 at 16:25 +0800, Wei Yang wrote:
>>>> Sorry for bothering; I am looking at this and trying to understand the process,
>>>> but I have some confusion.
>>>>
>>>> 1. Is the error log printed on every suspend/resume, no matter whether
>>>>    the device is plugged in or out during the suspend as discussed below?
>>>>
>>>>    If the device is always there and no one touches it, is the error message
>>>>    still printed?
>>>>
>>>> 2. In my mind, before pcied_init is called, those pci_dev are
>>>>    already enumerated, such as the wireless card in this case.
>>>>
>>>>    During the boot stage, if pciehp_force is set to true, is the error message
>>>>    still printed? Since I don't have those devices to create a pcie_device, I
>>>>    can't test this.
>>>>
>>>> 3. Do you think it would be fine to remove those devices at the suspend stage?
>>>>    Then they would be added again at the resume stage?
>>>
>>> Bypassing your questions, I'd like to point you at
>>> http://article.gmane.org/gmane.linux.kernel.pci/20077 , in which Rafael
>>> suggested a possible solution to this situation. (There's some extra
>>> info in other messages in this thread.)
>>
>>
>> Yes, Rafael's suggestion seems a possible solution to this situation.
>> In my mind, it's impossible to identify a unique PCI device from
>> the registers in the PCI config space; something like "vendor_id + device_id"
>> cannot describe a unique device.
>> A PCIe device has a feature called "Device Serial Number" which could be used to
>> identify a unique one, but this feature is optional. If it's not implemented, we still
>> cannot detect a unique PCIe device.
>> Sorry for my poor knowledge; if anything I said is mistaken, please point it out! :)
> 
> Gu,
> 
> After reading the code, there are several sets of hotplug code, for example
> pciehp, acpiphp.
> 
> For acpiphp, I found there is notification handler _handle_hotplug_event_bridge
> and _handle_hotplug_event_func to handle the hardware events. 
> 
> While for pciehp, I didn't find such handler to handle a hardware event. The
> pciehp_resume is the start point?

Look into pcie_init_slot() and pcie_init_notification(); pciehp uses a per-slot
workqueue to handle hardware events in polling or interrupt mode.
pciehp_resume() is related to power management. In fact, it's near the end of the
resume routine.
Furthermore, you can read Documentation/power/pci.txt and follow pci_pm_resume().

Thanks,
Gu

> 
>>
>> Thanks,
>> Gu 
>>
>>
>>
>>
>>> I must confess that I'm not at all sure how to implement it and that so
>>> far I have, rather cowardly, not even drafted a solution along those
>>> lines.
>>>
>>>
>>> Paul Bolle
>>>
>>
> 



^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2013-02-22  8:59 UTC | newest]

Thread overview: 18+ messages
2013-01-28 12:46 pciehp: errors on resume Paul Bolle
2013-01-29  2:10 ` Gu Zheng
2013-01-29 10:35   ` Paul Bolle
2013-01-29 11:35     ` Rafael J. Wysocki
2013-01-29 12:32       ` Paul Bolle
2013-01-29 14:45         ` Martin Mokrejs
2013-01-29 21:41           ` Rafael J. Wysocki
2013-01-30  3:47       ` Gu Zheng
2013-01-30  3:08     ` Gu Zheng
2013-01-30  8:31       ` Paul Bolle
2013-01-31  6:47         ` Gu Zheng
2013-01-31 10:58           ` Paul Bolle
2013-01-31 13:18             ` Rafael J. Wysocki
2013-01-31 13:16           ` Rafael J. Wysocki
     [not found] ` <512336fb.42d70e0a.73ed.ffffc77eSMTPIN_ADDED_BROKEN@mx.google.com>
2013-02-19  9:28   ` Paul Bolle
2013-02-19 10:07     ` Gu Zheng
     [not found]       ` <20130222074942.GA2398@richard.(null)>
2013-02-22  8:58         ` Gu Zheng
     [not found]     ` <5124284e.09d80e0a.294a.ffff9ee6SMTPIN_ADDED_BROKEN@mx.google.com>
2013-02-20  8:08       ` Paul Bolle
