linux-pci.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] PCI: pciehp: Include the Data Link Layer State Changed bit when clearing the Slot Status register's event bits
@ 2014-06-17 19:27 Myron Stowe
  2014-06-17 21:07 ` Bjorn Helgaas
  0 siblings, 1 reply; 11+ messages in thread
From: Myron Stowe @ 2014-06-17 19:27 UTC (permalink / raw)
  To: bhelgaas, linux-pci; +Cc: kaneshige.kenji, linux-kernel

During PCIe hot-plug initialization - pciehp_probe - data structures
related to slot capabilities are set up.  As part of this set up, ISRs are
put in place to handle slot events and all event bits are cleared out.

This patch adds the Data Link Layer State Changed (PCI_EXP_SLTSTA_DLLSC)
Slot Status bit to the event bits that are cleared out during
initialization.

Reference:
  PCI-SIG.  PCI Express Base Specification Revision 4.0 Version 0.3
  (PCI-SIG, 2014): 7.8.11. Slot Status Register (Offset 1Ah).

Signed-off-by: Myron Stowe <myron.stowe@redhat.com>
---

 drivers/pci/hotplug/pciehp_hpc.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c
index 42914e0..0568416 100644
--- a/drivers/pci/hotplug/pciehp_hpc.c
+++ b/drivers/pci/hotplug/pciehp_hpc.c
@@ -794,7 +794,7 @@ struct controller *pcie_init(struct pcie_device *dev)
 	pcie_capability_write_word(pdev, PCI_EXP_SLTSTA,
 		PCI_EXP_SLTSTA_ABP | PCI_EXP_SLTSTA_PFD |
 		PCI_EXP_SLTSTA_MRLSC | PCI_EXP_SLTSTA_PDC |
-		PCI_EXP_SLTSTA_CC);
+		PCI_EXP_SLTSTA_CC | PCI_EXP_SLTSTA_DLLSC);
 
 	/* Disable software notification */
 	pcie_disable_notification(ctrl);


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH] PCI: pciehp: Include the Data Link Layer State Changed bit when clearing the Slot Status register's event bits
  2014-06-17 19:27 [PATCH] PCI: pciehp: Include the Data Link Layer State Changed bit when clearing the Slot Status register's event bits Myron Stowe
@ 2014-06-17 21:07 ` Bjorn Helgaas
  2014-06-17 22:55   ` Guenter Roeck
  2014-06-30 16:49   ` Myron Stowe
  0 siblings, 2 replies; 11+ messages in thread
From: Bjorn Helgaas @ 2014-06-17 21:07 UTC (permalink / raw)
  To: Myron Stowe
  Cc: linux-pci@vger.kernel.org, Kenji Kaneshige,
	linux-kernel@vger.kernel.org, Rajat Jain

On Tue, Jun 17, 2014 at 1:27 PM, Myron Stowe <myron.stowe@redhat.com> wrote:
> During PCIe hot-plug initialization - pciehp_probe - data structures
> related to slot capabilities are set up.  As part of this set up, ISRs are
> put in place to handle slot events and all event bits are cleared out.
>
> This patch adds the Data Link Layer State Changed (PCI_EXP_SLTSTA_DLLSC)
> Slot Status bit to the event bits that are cleared out during
> initialization.

I think this is related to e48f1b67f668 ("PCI: pciehp: Use link change
notifications for hot-plug and removal").  Prior to that, pcie_isr()
didn't look at the PCI_EXP_SLTSTA_DLLSC bit.

Apparently there's a non-public report of spurious messages like this
at boot-time, with no actual hotplug events:

  pciehp 0000:82:04.0:pcie24: slot(4): Link Up event
  pciehp 0000:82:04.0:pcie24: Device 0000:83:00.0 already exists at
0000:83:00, cannot hot-add
  pciehp 0000:82:04.0:pcie24: Cannot add device at 0000:83:00

Device 0000:83:00.0 was enumerated previously, during the normal PCI
device enumeration.  I suspect DLLSC was set by the hardware when the
link came up after power-on, and it remained set until Linux booted.
Then when we take the first pciehp interrupt, we notice DLLSC is set
and think it's a new link state change.

This might be system-specific, because some BIOSes might clear DLLSC
before handing off to the OS.  Or my theory might be all wet.

But I think we should clear DLLSC in any case.  We clear all the other
RW1C bits in Slot Status, and I think we should clear DLLSC as well.

I'd like to include a bugzilla or mailing list reference for the
spurious message, and I'd also like confirmation that this changes
actually fixes it.

Bjorn

> Reference:
>   PCI-SIG.  PCI Express Base Specification Revision 4.0 Version 0.3
>   (PCI-SIG, 2014): 7.8.11. Slot Status Register (Offset 1Ah).
>
> Signed-off-by: Myron Stowe <myron.stowe@redhat.com>
> ---
>
>  drivers/pci/hotplug/pciehp_hpc.c |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c
> index 42914e0..0568416 100644
> --- a/drivers/pci/hotplug/pciehp_hpc.c
> +++ b/drivers/pci/hotplug/pciehp_hpc.c
> @@ -794,7 +794,7 @@ struct controller *pcie_init(struct pcie_device *dev)
>         pcie_capability_write_word(pdev, PCI_EXP_SLTSTA,
>                 PCI_EXP_SLTSTA_ABP | PCI_EXP_SLTSTA_PFD |
>                 PCI_EXP_SLTSTA_MRLSC | PCI_EXP_SLTSTA_PDC |
> -               PCI_EXP_SLTSTA_CC);
> +               PCI_EXP_SLTSTA_CC | PCI_EXP_SLTSTA_DLLSC);
>
>         /* Disable software notification */
>         pcie_disable_notification(ctrl);
>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] PCI: pciehp: Include the Data Link Layer State Changed bit when clearing the Slot Status register's event bits
  2014-06-17 21:07 ` Bjorn Helgaas
@ 2014-06-17 22:55   ` Guenter Roeck
  2014-06-24 20:34     ` Rajat Jain
  2014-06-30 16:49   ` Myron Stowe
  1 sibling, 1 reply; 11+ messages in thread
From: Guenter Roeck @ 2014-06-17 22:55 UTC (permalink / raw)
  To: Bjorn Helgaas, Myron Stowe
  Cc: linux-pci@vger.kernel.org, Kenji Kaneshige,
	linux-kernel@vger.kernel.org, Rajat Jain

On 06/17/2014 02:07 PM, Bjorn Helgaas wrote:
> On Tue, Jun 17, 2014 at 1:27 PM, Myron Stowe <myron.stowe@redhat.com> wrote:
>> During PCIe hot-plug initialization - pciehp_probe - data structures
>> related to slot capabilities are set up.  As part of this set up, ISRs are
>> put in place to handle slot events and all event bits are cleared out.
>>
>> This patch adds the Data Link Layer State Changed (PCI_EXP_SLTSTA_DLLSC)
>> Slot Status bit to the event bits that are cleared out during
>> initialization.
>
> I think this is related to e48f1b67f668 ("PCI: pciehp: Use link change
> notifications for hot-plug and removal").  Prior to that, pcie_isr()
> didn't look at the PCI_EXP_SLTSTA_DLLSC bit.
>
> Apparently there's a non-public report of spurious messages like this
> at boot-time, with no actual hotplug events:
>
>    pciehp 0000:82:04.0:pcie24: slot(4): Link Up event
>    pciehp 0000:82:04.0:pcie24: Device 0000:83:00.0 already exists at
> 0000:83:00, cannot hot-add
>    pciehp 0000:82:04.0:pcie24: Cannot add device at 0000:83:00
>

I think I have seen that message once in a while in our systems.
Rajat, didn't we talk about this a while ago ?

Guenter

> Device 0000:83:00.0 was enumerated previously, during the normal PCI
> device enumeration.  I suspect DLLSC was set by the hardware when the
> link came up after power-on, and it remained set until Linux booted.
> Then when we take the first pciehp interrupt, we notice DLLSC is set
> and think it's a new link state change.
>
> This might be system-specific, because some BIOSes might clear DLLSC
> before handing off to the OS.  Or my theory might be all wet.
>
> But I think we should clear DLLSC in any case.  We clear all the other
> RW1C bits in Slot Status, and I think we should clear DLLSC as well.
>
> I'd like to include a bugzilla or mailing list reference for the
> spurious message, and I'd also like confirmation that this changes
> actually fixes it.
>
> Bjorn
>
>> Reference:
>>    PCI-SIG.  PCI Express Base Specification Revision 4.0 Version 0.3
>>    (PCI-SIG, 2014): 7.8.11. Slot Status Register (Offset 1Ah).
>>
>> Signed-off-by: Myron Stowe <myron.stowe@redhat.com>
>> ---
>>
>>   drivers/pci/hotplug/pciehp_hpc.c |    2 +-
>>   1 files changed, 1 insertions(+), 1 deletions(-)
>>
>> diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c
>> index 42914e0..0568416 100644
>> --- a/drivers/pci/hotplug/pciehp_hpc.c
>> +++ b/drivers/pci/hotplug/pciehp_hpc.c
>> @@ -794,7 +794,7 @@ struct controller *pcie_init(struct pcie_device *dev)
>>          pcie_capability_write_word(pdev, PCI_EXP_SLTSTA,
>>                  PCI_EXP_SLTSTA_ABP | PCI_EXP_SLTSTA_PFD |
>>                  PCI_EXP_SLTSTA_MRLSC | PCI_EXP_SLTSTA_PDC |
>> -               PCI_EXP_SLTSTA_CC);
>> +               PCI_EXP_SLTSTA_CC | PCI_EXP_SLTSTA_DLLSC);
>>
>>          /* Disable software notification */
>>          pcie_disable_notification(ctrl);
>>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
>
>
>
>


^ permalink raw reply	[flat|nested] 11+ messages in thread

* RE: [PATCH] PCI: pciehp: Include the Data Link Layer State Changed bit when clearing the Slot Status register's event bits
  2014-06-17 22:55   ` Guenter Roeck
@ 2014-06-24 20:34     ` Rajat Jain
  2014-06-24 21:16       ` Bjorn Helgaas
  0 siblings, 1 reply; 11+ messages in thread
From: Rajat Jain @ 2014-06-24 20:34 UTC (permalink / raw)
  To: Guenter Roeck, Bjorn Helgaas, Myron Stowe
  Cc: linux-pci@vger.kernel.org, Kenji Kaneshige,
	linux-kernel@vger.kernel.org, Rajat Jain

SGVsbG8sDQoNClNvcnJ5LCBJIG1pc3NlZCB0aGlzIGVtYWlsLg0KDQpQbGVhc2Ugc2VlIGJlbG93
LiANCg0KPiAtLS0tLU9yaWdpbmFsIE1lc3NhZ2UtLS0tLQ0KPiBGcm9tOiBsaW51eC1wY2ktb3du
ZXJAdmdlci5rZXJuZWwub3JnIFttYWlsdG86bGludXgtcGNpLQ0KPiBvd25lckB2Z2VyLmtlcm5l
bC5vcmddIE9uIEJlaGFsZiBPZiBHdWVudGVyIFJvZWNrDQo+IFNlbnQ6IFR1ZXNkYXksIEp1bmUg
MTcsIDIwMTQgMzo1NiBQTQ0KPiBUbzogQmpvcm4gSGVsZ2FhczsgTXlyb24gU3Rvd2UNCj4gQ2M6
IGxpbnV4LXBjaUB2Z2VyLmtlcm5lbC5vcmc7IEtlbmppIEthbmVzaGlnZTsgbGludXgtDQo+IGtl
cm5lbEB2Z2VyLmtlcm5lbC5vcmc7IFJhamF0IEphaW4NCj4gU3ViamVjdDogUmU6IFtQQVRDSF0g
UENJOiBwY2llaHA6IEluY2x1ZGUgdGhlIERhdGEgTGluayBMYXllciBTdGF0ZSBDaGFuZ2VkDQo+
IGJpdCB3aGVuIGNsZWFyaW5nIHRoZSBTbG90IFN0YXR1cyByZWdpc3RlcidzIGV2ZW50IGJpdHMN
Cj4gDQo+IE9uIDA2LzE3LzIwMTQgMDI6MDcgUE0sIEJqb3JuIEhlbGdhYXMgd3JvdGU6DQo+ID4g
T24gVHVlLCBKdW4gMTcsIDIwMTQgYXQgMToyNyBQTSwgTXlyb24gU3Rvd2UNCj4gPG15cm9uLnN0
b3dlQHJlZGhhdC5jb20+IHdyb3RlOg0KPiA+PiBEdXJpbmcgUENJZSBob3QtcGx1ZyBpbml0aWFs
aXphdGlvbiAtIHBjaWVocF9wcm9iZSAtIGRhdGEgc3RydWN0dXJlcw0KPiA+PiByZWxhdGVkIHRv
IHNsb3QgY2FwYWJpbGl0aWVzIGFyZSBzZXQgdXAuICBBcyBwYXJ0IG9mIHRoaXMgc2V0IHVwLA0K
PiA+PiBJU1JzIGFyZSBwdXQgaW4gcGxhY2UgdG8gaGFuZGxlIHNsb3QgZXZlbnRzIGFuZCBhbGwg
ZXZlbnQgYml0cyBhcmUgY2xlYXJlZA0KPiBvdXQuDQo+ID4+DQo+ID4+IFRoaXMgcGF0Y2ggYWRk
cyB0aGUgRGF0YSBMaW5rIExheWVyIFN0YXRlIENoYW5nZWQNCj4gPj4gKFBDSV9FWFBfU0xUU1RB
X0RMTFNDKSBTbG90IFN0YXR1cyBiaXQgdG8gdGhlIGV2ZW50IGJpdHMgdGhhdCBhcmUNCj4gPj4g
Y2xlYXJlZCBvdXQgZHVyaW5nIGluaXRpYWxpemF0aW9uLg0KPiA+DQo+ID4gSSB0aGluayB0aGlz
IGlzIHJlbGF0ZWQgdG8gZTQ4ZjFiNjdmNjY4ICgiUENJOiBwY2llaHA6IFVzZSBsaW5rIGNoYW5n
ZQ0KPiA+IG5vdGlmaWNhdGlvbnMgZm9yIGhvdC1wbHVnIGFuZCByZW1vdmFsIikuICBQcmlvciB0
byB0aGF0LCBwY2llX2lzcigpDQo+ID4gZGlkbid0IGxvb2sgYXQgdGhlIFBDSV9FWFBfU0xUU1RB
X0RMTFNDIGJpdC4NCj4gPg0KPiA+IEFwcGFyZW50bHkgdGhlcmUncyBhIG5vbi1wdWJsaWMgcmVw
b3J0IG9mIHNwdXJpb3VzIG1lc3NhZ2VzIGxpa2UgdGhpcw0KPiA+IGF0IGJvb3QtdGltZSwgd2l0
aCBubyBhY3R1YWwgaG90cGx1ZyBldmVudHM6DQo+ID4NCj4gPiAgICBwY2llaHAgMDAwMDo4Mjow
NC4wOnBjaWUyNDogc2xvdCg0KTogTGluayBVcCBldmVudA0KPiA+ICAgIHBjaWVocCAwMDAwOjgy
OjA0LjA6cGNpZTI0OiBEZXZpY2UgMDAwMDo4MzowMC4wIGFscmVhZHkgZXhpc3RzIGF0DQo+ID4g
MDAwMDo4MzowMCwgY2Fubm90IGhvdC1hZGQNCj4gPiAgICBwY2llaHAgMDAwMDo4MjowNC4wOnBj
aWUyNDogQ2Fubm90IGFkZCBkZXZpY2UgYXQgMDAwMDo4MzowMA0KPiA+DQo+IA0KPiBJIHRoaW5r
IEkgaGF2ZSBzZWVuIHRoYXQgbWVzc2FnZSBvbmNlIGluIGEgd2hpbGUgaW4gb3VyIHN5c3RlbXMu
DQo+IFJhamF0LCBkaWRuJ3Qgd2UgdGFsayBhYm91dCB0aGlzIGEgd2hpbGUgYWdvID8NCg0KRXNz
ZW50aWFsbHkgbXkgaGljY3VwIHdhcyB0aGF0IEkgd2FzIG5vdCBzdXJlIHdoZXRoZXIgdGhlIGRy
aXZlciBzaG91bGQgb3Igc2hvdWxkIG5vdCB0YWtlIGNhcmUgb2YgdGhlIGxpbmsgY2hhbmdlIGV2
ZW50cyB0aGF0IGhhdmUgaGFwcGVuZWQgQkVGT1JFIHRoZSBkcml2ZXIgZ2V0cyBsb2FkZWQuIFRo
aXMgaGFzIG1vcmUgaW1wYWN0IGlmIHRoZSBwY2llaHAgaXMgYnVpbHQgYXMgYSBrZXJuZWwgbW9k
dWxlLg0KDQpBcyBhbiBleGFtcGxlOg0KDQpJdCBpcyBjb21tb24gZm9yIHRoZSBwbGF0Zm9ybSBp
bml0IGNvZGUgYnVpbHQgaW50byB0aGUga2VybmVsLCB0byB0YWtlIHRoZSBQQ0kgZGV2aWNlcyBv
biB0aGUgYm9hcmQgb3V0IG9mIHJlc2V0LiBBbmQgdGhhdCBjYW4gaGFwcGVuIGFmdGVyIHRoZSBQ
Q0kgZW51bWVyYXRpb24gYnV0IGJlZm9yZSB0aGUgcGNpZWhwIGRyaXZlciBnZXRzIGxvYWRlZC4g
VGh1cyBpbiB0aGlzIGNvbmRpdGlvbiwgd2l0aCB0aGlzIHBhdGNoLCB0aGUgcGNpZWhwIHdpbGwg
aWdub3JlIHRoZSBsaW5rdXAgZXZlbnQsIHRodXMgZGV2aWNlIHdpbGwgbm90IGJlIHZpc2libGUu
IFRoZSBvbmx5IHdheSAob3RoZXIgdGhhbiBhIHJlc2NhbikgdG8gZG8gaG90LWFkZCB0aGUgZGV2
aWNlIHdvdWxkIGJlIHRvIGFwcGx5LWFuZC1yZW1vdmUtcmVzZXQtc2lnbmFsIHRvIHRoZSBkZXZp
Y2UgYWdhaW4uIEF0IHdoaWNoIHBvaW50IHBjaWVocCBtYXkgZ2l2ZSBhIHdhcm5pbmcgYWJvdXQg
YWJvdXQgYW4gYXR0ZW1wdCB0byByZW1vdmUgYSBub24tZXhpc3RlbnQgY2FyZCwgYW5kIHRoZW4g
d2lsbCBwcm9jZWVkIGZpbmUgd2l0aCBob3QtYWRkLg0KDQpUaGUgcGF0Y2ggbG9va3MgZ29vZCwg
b25seSB3YW50ZWQgdG8gbWFrZSBzdXJlIHRoYXQgd2UgdW5kZXJzdGFuZCBhbmQgYWdyZWUgdGhh
dCB0aGUgcGNpZWhwIHNob3VsZCBOT1QgcHJvY2VzcyBhbmQgbGluayBldmVudHMgdGhhdCBoYXZl
IGhhcHBlbmVkIGJlZm9yZSB0aGUgcGNpZWhwIHdhcyBsb2FkZWQuDQoNCkFja2VkLWJ5OiBSYWph
dCBKYWluIDxyYWphdGphaW5AanVuaXBlci5uZXQ+DQoNClRoYW5rcywNCg0KUmFqYXQNCg0KDQo=

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] PCI: pciehp: Include the Data Link Layer State Changed bit when clearing the Slot Status register's event bits
  2014-06-24 20:34     ` Rajat Jain
@ 2014-06-24 21:16       ` Bjorn Helgaas
  2014-06-25  5:26         ` Rajat Jain
  0 siblings, 1 reply; 11+ messages in thread
From: Bjorn Helgaas @ 2014-06-24 21:16 UTC (permalink / raw)
  To: Rajat Jain
  Cc: Guenter Roeck, Myron Stowe, linux-pci@vger.kernel.org,
	Kenji Kaneshige, linux-kernel@vger.kernel.org, Rajat Jain

On Tue, Jun 24, 2014 at 2:34 PM, Rajat Jain <rajatjain@juniper.net> wrote:
> Hello,
>
> Sorry, I missed this email.
>
> Please see below.
>
>> -----Original Message-----
>> From: linux-pci-owner@vger.kernel.org [mailto:linux-pci-
>> owner@vger.kernel.org] On Behalf Of Guenter Roeck
>> Sent: Tuesday, June 17, 2014 3:56 PM
>> To: Bjorn Helgaas; Myron Stowe
>> Cc: linux-pci@vger.kernel.org; Kenji Kaneshige; linux-
>> kernel@vger.kernel.org; Rajat Jain
>> Subject: Re: [PATCH] PCI: pciehp: Include the Data Link Layer State Changed
>> bit when clearing the Slot Status register's event bits
>>
>> On 06/17/2014 02:07 PM, Bjorn Helgaas wrote:
>> > On Tue, Jun 17, 2014 at 1:27 PM, Myron Stowe
>> <myron.stowe@redhat.com> wrote:
>> >> During PCIe hot-plug initialization - pciehp_probe - data structures
>> >> related to slot capabilities are set up.  As part of this set up,
>> >> ISRs are put in place to handle slot events and all event bits are cleared
>> out.
>> >>
>> >> This patch adds the Data Link Layer State Changed
>> >> (PCI_EXP_SLTSTA_DLLSC) Slot Status bit to the event bits that are
>> >> cleared out during initialization.
>> >
>> > I think this is related to e48f1b67f668 ("PCI: pciehp: Use link change
>> > notifications for hot-plug and removal").  Prior to that, pcie_isr()
>> > didn't look at the PCI_EXP_SLTSTA_DLLSC bit.
>> >
>> > Apparently there's a non-public report of spurious messages like this
>> > at boot-time, with no actual hotplug events:
>> >
>> >    pciehp 0000:82:04.0:pcie24: slot(4): Link Up event
>> >    pciehp 0000:82:04.0:pcie24: Device 0000:83:00.0 already exists at
>> > 0000:83:00, cannot hot-add
>> >    pciehp 0000:82:04.0:pcie24: Cannot add device at 0000:83:00
>> >
>>
>> I think I have seen that message once in a while in our systems.
>> Rajat, didn't we talk about this a while ago ?
>
> Essentially my hiccup was that I was not sure whether the driver should or should not take care of the link change events that have happened BEFORE the driver gets loaded. This has more impact if the pciehp is built as a kernel module.
>
> As an example:
>
> It is common for the platform init code built into the kernel, to take the PCI devices on the board out of reset. And that can happen after the PCI enumeration but before the pciehp driver gets loaded. Thus in this condition, with this patch, the pciehp will ignore the linkup event, thus device will not be visible. The only way (other than a rescan) to do hot-add the device would be to apply-and-remove-reset-signal to the device again. At which point pciehp may give a warning about about an attempt to remove a non-existent card, and then will proceed fine with hot-add.

When you saw these problems, was pciehp a module?  We changed it
(c10cc483bf3f in v3.11) so it can't be a module any more, but if there
can still be a significant window between enumeration and pciehp init,
I'd like to understand more about it.

> The patch looks good, only wanted to make sure that we understand and agree that the pciehp should NOT process and link events that have happened before the pciehp was loaded.

Currently (before this patch), we clear the Attention Button Pressed,
Power Fault Detected, MRL Sensor Changed, Presence Detect Changed, and
Command Completed bits, but we *don't* clear Data Link Layer State
Changed.

That asymmetry seems hard to justify.  For example, if a card were
inserted after enumeration but before pciehp is initialized, we'd miss
the PDC indication, so I think we would fail to notice the new device.
That seems basically the same as missing the linkup event in your
example.  In both cases, I think we *should* notice the new device, so
the fact that we don't is a problem.

But since pciehp can't be a module any more, the window between PCI
enumeration and pciehp initialization should be relatively small.  I
think the best way to fix this would be to integrate pciehp more
closely into PCI enumeration, e.g., by doing pcie_init() at the point
where we discover the bridge, before we enumerate any devices below
the bridge.  This is somewhat tangled up with acpihp, so I don't know
how complicated it would be to do this.

So I guess my argument is:

  - Ignoring events that happen before pciehp init decreases our
dependency on BIOS
  - Handling all events consistently is very important
  - We currently have a problem with missing pre-pciehp events, but
there is a way to fix this

> Acked-by: Rajat Jain <rajatjain@juniper.net>

Thanks for taking a look at this!

Bjorn

^ permalink raw reply	[flat|nested] 11+ messages in thread

* RE: [PATCH] PCI: pciehp: Include the Data Link Layer State Changed bit when clearing the Slot Status register's event bits
  2014-06-24 21:16       ` Bjorn Helgaas
@ 2014-06-25  5:26         ` Rajat Jain
  2014-06-25 17:51           ` Bjorn Helgaas
  0 siblings, 1 reply; 11+ messages in thread
From: Rajat Jain @ 2014-06-25  5:26 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Guenter Roeck, Myron Stowe, linux-pci@vger.kernel.org,
	Kenji Kaneshige, linux-kernel@vger.kernel.org, Rajat Jain

SGVsbG8sDQoNCj4gPj4NCj4gPj4gT24gMDYvMTcvMjAxNCAwMjowNyBQTSwgQmpvcm4gSGVsZ2Fh
cyB3cm90ZToNCj4gPj4gPiBPbiBUdWUsIEp1biAxNywgMjAxNCBhdCAxOjI3IFBNLCBNeXJvbiBT
dG93ZQ0KPiA+PiA8bXlyb24uc3Rvd2VAcmVkaGF0LmNvbT4gd3JvdGU6DQo+ID4+ID4+IER1cmlu
ZyBQQ0llIGhvdC1wbHVnIGluaXRpYWxpemF0aW9uIC0gcGNpZWhwX3Byb2JlIC0gZGF0YQ0KPiA+
PiA+PiBzdHJ1Y3R1cmVzIHJlbGF0ZWQgdG8gc2xvdCBjYXBhYmlsaXRpZXMgYXJlIHNldCB1cC4g
IEFzIHBhcnQgb2YNCj4gPj4gPj4gdGhpcyBzZXQgdXAsIElTUnMgYXJlIHB1dCBpbiBwbGFjZSB0
byBoYW5kbGUgc2xvdCBldmVudHMgYW5kIGFsbA0KPiA+PiA+PiBldmVudCBiaXRzIGFyZSBjbGVh
cmVkDQo+ID4+IG91dC4NCj4gPj4gPj4NCj4gPj4gPj4gVGhpcyBwYXRjaCBhZGRzIHRoZSBEYXRh
IExpbmsgTGF5ZXIgU3RhdGUgQ2hhbmdlZA0KPiA+PiA+PiAoUENJX0VYUF9TTFRTVEFfRExMU0Mp
IFNsb3QgU3RhdHVzIGJpdCB0byB0aGUgZXZlbnQgYml0cyB0aGF0IGFyZQ0KPiA+PiA+PiBjbGVh
cmVkIG91dCBkdXJpbmcgaW5pdGlhbGl6YXRpb24uDQo+ID4+ID4NCj4gPj4gPiBJIHRoaW5rIHRo
aXMgaXMgcmVsYXRlZCB0byBlNDhmMWI2N2Y2NjggKCJQQ0k6IHBjaWVocDogVXNlIGxpbmsNCj4g
Pj4gPiBjaGFuZ2Ugbm90aWZpY2F0aW9ucyBmb3IgaG90LXBsdWcgYW5kIHJlbW92YWwiKS4gIFBy
aW9yIHRvIHRoYXQsDQo+ID4+ID4gcGNpZV9pc3IoKSBkaWRuJ3QgbG9vayBhdCB0aGUgUENJX0VY
UF9TTFRTVEFfRExMU0MgYml0Lg0KPiA+PiA+DQo+ID4+ID4gQXBwYXJlbnRseSB0aGVyZSdzIGEg
bm9uLXB1YmxpYyByZXBvcnQgb2Ygc3B1cmlvdXMgbWVzc2FnZXMgbGlrZQ0KPiA+PiA+IHRoaXMg
YXQgYm9vdC10aW1lLCB3aXRoIG5vIGFjdHVhbCBob3RwbHVnIGV2ZW50czoNCj4gPj4gPg0KPiA+
PiA+ICAgIHBjaWVocCAwMDAwOjgyOjA0LjA6cGNpZTI0OiBzbG90KDQpOiBMaW5rIFVwIGV2ZW50
DQo+ID4+ID4gICAgcGNpZWhwIDAwMDA6ODI6MDQuMDpwY2llMjQ6IERldmljZSAwMDAwOjgzOjAw
LjAgYWxyZWFkeSBleGlzdHMNCj4gPj4gPiBhdCAwMDAwOjgzOjAwLCBjYW5ub3QgaG90LWFkZA0K
PiA+PiA+ICAgIHBjaWVocCAwMDAwOjgyOjA0LjA6cGNpZTI0OiBDYW5ub3QgYWRkIGRldmljZSBh
dCAwMDAwOjgzOjAwDQo+ID4+ID4NCj4gPj4NCj4gPj4gSSB0aGluayBJIGhhdmUgc2VlbiB0aGF0
IG1lc3NhZ2Ugb25jZSBpbiBhIHdoaWxlIGluIG91ciBzeXN0ZW1zLg0KPiA+PiBSYWphdCwgZGlk
bid0IHdlIHRhbGsgYWJvdXQgdGhpcyBhIHdoaWxlIGFnbyA/DQo+ID4NCj4gPiBFc3NlbnRpYWxs
eSBteSBoaWNjdXAgd2FzIHRoYXQgSSB3YXMgbm90IHN1cmUgd2hldGhlciB0aGUgZHJpdmVyIHNo
b3VsZCBvcg0KPiBzaG91bGQgbm90IHRha2UgY2FyZSBvZiB0aGUgbGluayBjaGFuZ2UgZXZlbnRz
IHRoYXQgaGF2ZSBoYXBwZW5lZCBCRUZPUkUNCj4gdGhlIGRyaXZlciBnZXRzIGxvYWRlZC4gVGhp
cyBoYXMgbW9yZSBpbXBhY3QgaWYgdGhlIHBjaWVocCBpcyBidWlsdCBhcyBhIGtlcm5lbA0KPiBt
b2R1bGUuDQo+ID4NCj4gPiBBcyBhbiBleGFtcGxlOg0KPiA+DQo+ID4gSXQgaXMgY29tbW9uIGZv
ciB0aGUgcGxhdGZvcm0gaW5pdCBjb2RlIGJ1aWx0IGludG8gdGhlIGtlcm5lbCwgdG8gdGFrZSB0
aGUgUENJDQo+IGRldmljZXMgb24gdGhlIGJvYXJkIG91dCBvZiByZXNldC4gQW5kIHRoYXQgY2Fu
IGhhcHBlbiBhZnRlciB0aGUgUENJDQo+IGVudW1lcmF0aW9uIGJ1dCBiZWZvcmUgdGhlIHBjaWVo
cCBkcml2ZXIgZ2V0cyBsb2FkZWQuIFRodXMgaW4gdGhpcyBjb25kaXRpb24sDQo+IHdpdGggdGhp
cyBwYXRjaCwgdGhlIHBjaWVocCB3aWxsIGlnbm9yZSB0aGUgbGlua3VwIGV2ZW50LCB0aHVzIGRl
dmljZSB3aWxsIG5vdA0KPiBiZSB2aXNpYmxlLiBUaGUgb25seSB3YXkgKG90aGVyIHRoYW4gYSBy
ZXNjYW4pIHRvIGRvIGhvdC1hZGQgdGhlIGRldmljZSB3b3VsZA0KPiBiZSB0byBhcHBseS1hbmQt
cmVtb3ZlLXJlc2V0LXNpZ25hbCB0byB0aGUgZGV2aWNlIGFnYWluLiBBdCB3aGljaCBwb2ludA0K
PiBwY2llaHAgbWF5IGdpdmUgYSB3YXJuaW5nIGFib3V0IGFib3V0IGFuIGF0dGVtcHQgdG8gcmVt
b3ZlIGEgbm9uLWV4aXN0ZW50DQo+IGNhcmQsIGFuZCB0aGVuIHdpbGwgcHJvY2VlZCBmaW5lIHdp
dGggaG90LWFkZC4NCj4gDQo+IFdoZW4geW91IHNhdyB0aGVzZSBwcm9ibGVtcywgd2FzIHBjaWVo
cCBhIG1vZHVsZT8NCg0KTm8sIHdlIGRpZCBub3QgYWN0dWFsbHkgaGl0ICp0aGlzKiBwcm9ibGVt
IChETExTQyBnZXR0aW5nIHNldCBiZWZvcmUgcGNpZWhwIGluaXQpLiBXZSBoYWQgb2JzZXJ2ZWQg
YW5vdGhlciBIVyBzcGVjaWZpYyBwcm9ibGVtIHdoZXJlIGJvdGggbGluayBldmVudCBhbmQgcHJl
c2VuY2UgZGV0ZWN0IHdlcmUgdG9nZ2xpbmcsIGhlbmNlIDIgaW5kZXBlbmRlbnQgaG90LXBsdWcg
aW50ZXJydXB0cyBldmVudHMgd2VyZSBjb21pbmcsIHRodXMgcmVzdWx0aW5nIGluIHNwdXJpb3Vz
IG1lc3NhZ2VzLg0KDQo+IFdlIGNoYW5nZWQgaXQNCj4gKGMxMGNjNDgzYmYzZiBpbiB2My4xMSkg
c28gaXQgY2FuJ3QgYmUgYSBtb2R1bGUgYW55IG1vcmUsIGJ1dCBpZiB0aGVyZSBjYW4gc3RpbGwN
Cj4gYmUgYSBzaWduaWZpY2FudCB3aW5kb3cgYmV0d2VlbiBlbnVtZXJhdGlvbiBhbmQgcGNpZWhw
IGluaXQsIEknZCBsaWtlIHRvDQo+IHVuZGVyc3RhbmQgbW9yZSBhYm91dCBpdC4NCg0KT2gsIG15
IGJhZC4gSSBkaWQgbm90IGV2ZXIgdHJ5IHRvIGFjdHVhbGx5IGJ1aWxkIGl0IGFzIGEgbW9kdWxl
LCBJIGp1c3QgYXNzdW1lZCB0aGF0IGl0J2QgYmUgcG9zc2libGUuIEkgd2FzIGp1c3QgdGhpbmtp
bmcgb2YgY29ybmVyIGNhc2VzIEkgZ3Vlc3MuIA0KDQo+IA0KPiA+IFRoZSBwYXRjaCBsb29rcyBn
b29kLCBvbmx5IHdhbnRlZCB0byBtYWtlIHN1cmUgdGhhdCB3ZSB1bmRlcnN0YW5kIGFuZA0KPiBh
Z3JlZSB0aGF0IHRoZSBwY2llaHAgc2hvdWxkIE5PVCBwcm9jZXNzIGFuZCBsaW5rIGV2ZW50cyB0
aGF0IGhhdmUNCj4gaGFwcGVuZWQgYmVmb3JlIHRoZSBwY2llaHAgd2FzIGxvYWRlZC4NCj4gDQo+
IEN1cnJlbnRseSAoYmVmb3JlIHRoaXMgcGF0Y2gpLCB3ZSBjbGVhciB0aGUgQXR0ZW50aW9uIEJ1
dHRvbiBQcmVzc2VkLCBQb3dlcg0KPiBGYXVsdCBEZXRlY3RlZCwgTVJMIFNlbnNvciBDaGFuZ2Vk
LCBQcmVzZW5jZSBEZXRlY3QgQ2hhbmdlZCwgYW5kDQo+IENvbW1hbmQgQ29tcGxldGVkIGJpdHMs
IGJ1dCB3ZSAqZG9uJ3QqIGNsZWFyIERhdGEgTGluayBMYXllciBTdGF0ZQ0KPiBDaGFuZ2VkLg0K
PiANCj4gVGhhdCBhc3ltbWV0cnkgc2VlbXMgaGFyZCB0byBqdXN0aWZ5Lg0KDQpBZ3JlZS4NCg0K
ID4gRm9yIGV4YW1wbGUsIGlmIGEgY2FyZCB3ZXJlIGluc2VydGVkDQo+IGFmdGVyIGVudW1lcmF0
aW9uIGJ1dCBiZWZvcmUgcGNpZWhwIGlzIGluaXRpYWxpemVkLCB3ZSdkIG1pc3MgdGhlIFBEQw0K
PiBpbmRpY2F0aW9uLCBzbyBJIHRoaW5rIHdlIHdvdWxkIGZhaWwgdG8gbm90aWNlIHRoZSBuZXcg
ZGV2aWNlLg0KPiBUaGF0IHNlZW1zIGJhc2ljYWxseSB0aGUgc2FtZSBhcyBtaXNzaW5nIHRoZSBs
aW5rdXAgZXZlbnQgaW4geW91ciBleGFtcGxlLg0KPiBJbiBib3RoIGNhc2VzLCBJIHRoaW5rIHdl
ICpzaG91bGQqIG5vdGljZSB0aGUgbmV3IGRldmljZSwgc28gdGhlIGZhY3QgdGhhdCB3ZQ0KPiBk
b24ndCBpcyBhIHByb2JsZW0uDQoNCkh1bW1tLCBvayB1bmRlcnN0b29kLiBTbyB0aGF0IG1lYW5z
IHdlIGhhdmUgZGVmaW5lZCBhIHByb2JsZW0gc3RhdGVtZW50IGF0IGxlYXN0IDotKS4gWWVzLCBJ
IHVuZGVyc3RhbmQgdGhhdCB0aGUgd2luZG93IGlzIHZlcnkgc21hbGwgaGVyZS4gQWxzbyBJIGFn
cmVlIHdpdGggeW91ciBvcGluaW9uIHdoYXRldmVyIHNvbHV0aW9uIHdlIHB1dCBpdCB0byB0YWtl
IGNhcmUgb2YgZXZlbnRzIHRoYXQgaGFwcGVuZWQgYmVmb3JlIHBjaWVocCBpbml0LCBzaG91bGQg
YmUgY29uc2lzdGVudCBmb3IgYWxsIGV2ZW50cyAobm90IGp1c3QgbGluayBjaGFuZ2VzKS4gVGhl
IHByb2JsZW0gY2FuIHdhaXQgdG8gYmUgc29sdmVkIEkgdGhpbmsuDQoNCj4gDQo+IEJ1dCBzaW5j
ZSBwY2llaHAgY2FuJ3QgYmUgYSBtb2R1bGUgYW55IG1vcmUsIHRoZSB3aW5kb3cgYmV0d2VlbiBQ
Q0kNCj4gZW51bWVyYXRpb24gYW5kIHBjaWVocCBpbml0aWFsaXphdGlvbiBzaG91bGQgYmUgcmVs
YXRpdmVseSBzbWFsbC4gIEkgdGhpbmsgdGhlDQo+IGJlc3Qgd2F5IHRvIGZpeCB0aGlzIHdvdWxk
IGJlIHRvIGludGVncmF0ZSBwY2llaHAgbW9yZSBjbG9zZWx5IGludG8gUENJDQo+IGVudW1lcmF0
aW9uLCBlLmcuLCBieSBkb2luZyBwY2llX2luaXQoKSBhdCB0aGUgcG9pbnQgd2hlcmUgd2UgZGlz
Y292ZXIgdGhlDQo+IGJyaWRnZSwgYmVmb3JlIHdlIGVudW1lcmF0ZSBhbnkgZGV2aWNlcyBiZWxv
dyB0aGUgYnJpZGdlLiAgVGhpcyBpcw0KPiBzb21ld2hhdCB0YW5nbGVkIHVwIHdpdGggYWNwaWhw
LCBzbyBJIGRvbid0IGtub3cgaG93IGNvbXBsaWNhdGVkIGl0IHdvdWxkDQo+IGJlIHRvIGRvIHRo
aXMuDQo+IA0KPiBTbyBJIGd1ZXNzIG15IGFyZ3VtZW50IGlzOg0KPiANCj4gICAtIElnbm9yaW5n
IGV2ZW50cyB0aGF0IGhhcHBlbiBiZWZvcmUgcGNpZWhwIGluaXQgZGVjcmVhc2VzIG91ciBkZXBl
bmRlbmN5DQo+IG9uIEJJT1MNCg0KQWdyZWUuDQoNCj4gICAtIEhhbmRsaW5nIGFsbCBldmVudHMg
Y29uc2lzdGVudGx5IGlzIHZlcnkgaW1wb3J0YW50DQoNCkFncmVlLg0KDQo+ICAgLSBXZSBjdXJy
ZW50bHkgaGF2ZSBhIHByb2JsZW0gd2l0aCBtaXNzaW5nIHByZS1wY2llaHAgZXZlbnRzLCBidXQg
dGhlcmUgaXMNCj4gYSB3YXkgdG8gZml4IHRoaXMNCg0KU29tZXdoYXQgYWdyZWUgKEkgZG8gbm90
IHVuZGVyc3RhbmQgd2hhdCB0aGF0IHdheSBpcyBjdXJyZW50bHkuIEJ1dCBJIHRoaW5rIGl0IGNh
biB3YWl0IGFzIHdlIGRvbid0IGhpdCB0aGF0IHByb2JsZW0gdG9kYXkgYW5kIGFsc28gdGhlIHdp
bmRvdyB3aGVyZSBpdCBjYW4gb2NjdXIgaXMgdmVyeSBzbWFsbCkuDQoNClRoYW5rcywNCg0KUmFq
YXQNCg0KPiANCj4gPiBBY2tlZC1ieTogUmFqYXQgSmFpbiA8cmFqYXRqYWluQGp1bmlwZXIubmV0
Pg0KPiANCj4gVGhhbmtzIGZvciB0YWtpbmcgYSBsb29rIGF0IHRoaXMhDQo+IA0KPiBCam9ybg0K

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] PCI: pciehp: Include the Data Link Layer State Changed bit when clearing the Slot Status register's event bits
  2014-06-25  5:26         ` Rajat Jain
@ 2014-06-25 17:51           ` Bjorn Helgaas
  0 siblings, 0 replies; 11+ messages in thread
From: Bjorn Helgaas @ 2014-06-25 17:51 UTC (permalink / raw)
  To: Rajat Jain
  Cc: Guenter Roeck, Myron Stowe, linux-pci@vger.kernel.org,
	Kenji Kaneshige, linux-kernel@vger.kernel.org, Rajat Jain

On Tue, Jun 24, 2014 at 11:26 PM, Rajat Jain <rajatjain@juniper.net> wrote:

>> > Essentially my hiccup was that I was not sure whether the driver should or
>> should not take care of the link change events that have happened BEFORE
>> the driver gets loaded. This has more impact if the pciehp is built as a kernel
>> module.
>> >
>> > As an example:
>> >
>> > It is common for the platform init code built into the kernel, to take the PCI
>> devices on the board out of reset. And that can happen after the PCI
>> enumeration but before the pciehp driver gets loaded. Thus in this condition,
>> with this patch, the pciehp will ignore the linkup event, thus device will not
>> be visible. The only way (other than a rescan) to do hot-add the device would
>> be to apply-and-remove-reset-signal to the device again. At which point
>> pciehp may give a warning about about an attempt to remove a non-existent
>> card, and then will proceed fine with hot-add.
>>
>> When you saw these problems, was pciehp a module?
>
> No, we did not actually hit *this* problem (DLLSC getting set before pciehp init). We had observed another HW specific problem where both link event and presence detect were toggling, hence 2 independent hot-plug interrupts events were coming, thus resulting in spurious messages.

OK, this sounds like a slightly different problem that we can work on
later if it turns out to be necessary.

>  > For example, if a card were inserted
>> after enumeration but before pciehp is initialized, we'd miss the PDC
>> indication, so I think we would fail to notice the new device.
>> That seems basically the same as missing the linkup event in your example.
>> In both cases, I think we *should* notice the new device, so the fact that we
>> don't is a problem.
> ...

>>   - We currently have a problem with missing pre-pciehp events, but there is
>> a way to fix this
>
> Somewhat agree (I do not understand what that way is currently. But I think it can wait as we don't hit that problem today and also the window where it can occur is very small).

My thought was that we could fix this by integrating pciehp into PCI
enumeration, so we're prepared to handle hotplug events on the bridge
before we enumerate any device below the bridge.  But I haven't
actually tried this so I don't know whether it's practical.

Bjorn

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] PCI: pciehp: Include the Data Link Layer State Changed bit when clearing the Slot Status register's event bits
  2014-06-17 21:07 ` Bjorn Helgaas
  2014-06-17 22:55   ` Guenter Roeck
@ 2014-06-30 16:49   ` Myron Stowe
  2014-07-01 19:29     ` Bjorn Helgaas
  1 sibling, 1 reply; 11+ messages in thread
From: Myron Stowe @ 2014-06-30 16:49 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Myron Stowe, linux-pci@vger.kernel.org, Kenji Kaneshige,
	linux-kernel@vger.kernel.org, Rajat Jain

On Tue, Jun 17, 2014 at 3:07 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
> On Tue, Jun 17, 2014 at 1:27 PM, Myron Stowe <myron.stowe@redhat.com> wrote:
>> During PCIe hot-plug initialization - pciehp_probe - data structures
>> related to slot capabilities are set up.  As part of this set up, ISRs are
>> put in place to handle slot events and all event bits are cleared out.
>>
>> This patch adds the Data Link Layer State Changed (PCI_EXP_SLTSTA_DLLSC)
>> Slot Status bit to the event bits that are cleared out during
>> initialization.
>
> I think this is related to e48f1b67f668 ("PCI: pciehp: Use link change
> notifications for hot-plug and removal").  Prior to that, pcie_isr()
> didn't look at the PCI_EXP_SLTSTA_DLLSC bit.
>
> Apparently there's a non-public report of spurious messages like this
> at boot-time, with no actual hotplug events:
>
>   pciehp 0000:82:04.0:pcie24: slot(4): Link Up event
>   pciehp 0000:82:04.0:pcie24: Device 0000:83:00.0 already exists at
> 0000:83:00, cannot hot-add
>   pciehp 0000:82:04.0:pcie24: Cannot add device at 0000:83:00
>
> Device 0000:83:00.0 was enumerated previously, during the normal PCI
> device enumeration.  I suspect DLLSC was set by the hardware when the
> link came up after power-on, and it remained set until Linux booted.
> Then when we take the first pciehp interrupt, we notice DLLSC is set
> and think it's a new link state change.
>
> This might be system-specific, because some BIOSes might clear DLLSC
> before handing off to the OS.  Or my theory might be all wet.
>
> But I think we should clear DLLSC in any case.  We clear all the other
> RW1C bits in Slot Status, and I think we should clear DLLSC as well.
>
> I'd like to include a bugzilla or mailing list reference for the
> spurious message, and I'd also like confirmation that this changes
> actually fixes it.

I just got confirmation from the reporter that the patch fixes the
issue they were encountering.

I've also asked them numerous time if I could make the bug public as I
expect others may be hitting it and could benefit from a BZ and
analysis/explanation.  I just repeated my request - if they reply
positively I'll follow through with some documentation.

Myron

>
> Bjorn
>
>> Reference:
>>   PCI-SIG.  PCI Express Base Specification Revision 4.0 Version 0.3
>>   (PCI-SIG, 2014): 7.8.11. Slot Status Register (Offset 1Ah).
>>
>> Signed-off-by: Myron Stowe <myron.stowe@redhat.com>
>> ---
>>
>>  drivers/pci/hotplug/pciehp_hpc.c |    2 +-
>>  1 files changed, 1 insertions(+), 1 deletions(-)
>>
>> diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c
>> index 42914e0..0568416 100644
>> --- a/drivers/pci/hotplug/pciehp_hpc.c
>> +++ b/drivers/pci/hotplug/pciehp_hpc.c
>> @@ -794,7 +794,7 @@ struct controller *pcie_init(struct pcie_device *dev)
>>         pcie_capability_write_word(pdev, PCI_EXP_SLTSTA,
>>                 PCI_EXP_SLTSTA_ABP | PCI_EXP_SLTSTA_PFD |
>>                 PCI_EXP_SLTSTA_MRLSC | PCI_EXP_SLTSTA_PDC |
>> -               PCI_EXP_SLTSTA_CC);
>> +               PCI_EXP_SLTSTA_CC | PCI_EXP_SLTSTA_DLLSC);
>>
>>         /* Disable software notification */
>>         pcie_disable_notification(ctrl);
>>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-pci" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] PCI: pciehp: Include the Data Link Layer State Changed bit when clearing the Slot Status register's event bits
  2014-06-30 16:49   ` Myron Stowe
@ 2014-07-01 19:29     ` Bjorn Helgaas
  2014-07-07 20:26       ` Myron Stowe
  0 siblings, 1 reply; 11+ messages in thread
From: Bjorn Helgaas @ 2014-07-01 19:29 UTC (permalink / raw)
  To: Myron Stowe
  Cc: Myron Stowe, linux-pci@vger.kernel.org, Kenji Kaneshige,
	linux-kernel@vger.kernel.org, Rajat Jain

On Mon, Jun 30, 2014 at 10:49 AM, Myron Stowe <myron.stowe@gmail.com> wrote:
> On Tue, Jun 17, 2014 at 3:07 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
>> On Tue, Jun 17, 2014 at 1:27 PM, Myron Stowe <myron.stowe@redhat.com> wrote:
>>> During PCIe hot-plug initialization - pciehp_probe - data structures
>>> related to slot capabilities are set up.  As part of this set up, ISRs are
>>> put in place to handle slot events and all event bits are cleared out.
>>>
>>> This patch adds the Data Link Layer State Changed (PCI_EXP_SLTSTA_DLLSC)
>>> Slot Status bit to the event bits that are cleared out during
>>> initialization.
>>
>> I think this is related to e48f1b67f668 ("PCI: pciehp: Use link change
>> notifications for hot-plug and removal").  Prior to that, pcie_isr()
>> didn't look at the PCI_EXP_SLTSTA_DLLSC bit.
>>
>> Apparently there's a non-public report of spurious messages like this
>> at boot-time, with no actual hotplug events:
>>
>>   pciehp 0000:82:04.0:pcie24: slot(4): Link Up event
>>   pciehp 0000:82:04.0:pcie24: Device 0000:83:00.0 already exists at
>> 0000:83:00, cannot hot-add
>>   pciehp 0000:82:04.0:pcie24: Cannot add device at 0000:83:00
>>
>> Device 0000:83:00.0 was enumerated previously, during the normal PCI
>> device enumeration.  I suspect DLLSC was set by the hardware when the
>> link came up after power-on, and it remained set until Linux booted.
>> Then when we take the first pciehp interrupt, we notice DLLSC is set
>> and think it's a new link state change.
>>
>> This might be system-specific, because some BIOSes might clear DLLSC
>> before handing off to the OS.  Or my theory might be all wet.
>>
>> But I think we should clear DLLSC in any case.  We clear all the other
>> RW1C bits in Slot Status, and I think we should clear DLLSC as well.
>>
>> I'd like to include a bugzilla or mailing list reference for the
>> spurious message, and I'd also like confirmation that this changes
>> actually fixes it.
>
> I just got confirmation from the reporter that the patch fixes the
> issue they were encountering.
>
> I've also asked them numerous time if I could make the bug public as I
> expect others may be hitting it and could benefit from a BZ and
> analysis/explanation.  I just repeated my request - if they reply
> positively I'll follow through with some documentation.

I'd be glad to merge this, as soon as we get more details about the
problem it fixes.  I think it's good to include some specifics, e.g.,
"Unexpected Link Up event," etc., because it helps other people with
the same problem to find the solution.  It's OK if details about
pre-production machines and so on are removed, but it's better if
changes to generic code are supported by public evidence that
everybody can evaluate.

If you want to pull out the public stuff into a kernel.org bugzilla,
that would be fine, too.  That would actually be better because then
we don't have to depend on Red Hat maintaining it.

Bjorn

>>> Reference:
>>>   PCI-SIG.  PCI Express Base Specification Revision 4.0 Version 0.3
>>>   (PCI-SIG, 2014): 7.8.11. Slot Status Register (Offset 1Ah).
>>>
>>> Signed-off-by: Myron Stowe <myron.stowe@redhat.com>
>>> ---
>>>
>>>  drivers/pci/hotplug/pciehp_hpc.c |    2 +-
>>>  1 files changed, 1 insertions(+), 1 deletions(-)
>>>
>>> diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c
>>> index 42914e0..0568416 100644
>>> --- a/drivers/pci/hotplug/pciehp_hpc.c
>>> +++ b/drivers/pci/hotplug/pciehp_hpc.c
>>> @@ -794,7 +794,7 @@ struct controller *pcie_init(struct pcie_device *dev)
>>>         pcie_capability_write_word(pdev, PCI_EXP_SLTSTA,
>>>                 PCI_EXP_SLTSTA_ABP | PCI_EXP_SLTSTA_PFD |
>>>                 PCI_EXP_SLTSTA_MRLSC | PCI_EXP_SLTSTA_PDC |
>>> -               PCI_EXP_SLTSTA_CC);
>>> +               PCI_EXP_SLTSTA_CC | PCI_EXP_SLTSTA_DLLSC);
>>>
>>>         /* Disable software notification */
>>>         pcie_disable_notification(ctrl);
>>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-pci" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] PCI: pciehp: Include the Data Link Layer State Changed bit when clearing the Slot Status register's event bits
  2014-07-01 19:29     ` Bjorn Helgaas
@ 2014-07-07 20:26       ` Myron Stowe
  2014-07-07 20:56         ` Bjorn Helgaas
  0 siblings, 1 reply; 11+ messages in thread
From: Myron Stowe @ 2014-07-07 20:26 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Myron Stowe, linux-pci@vger.kernel.org, Kenji Kaneshige,
	linux-kernel@vger.kernel.org, Rajat Jain

On Tue, Jul 1, 2014 at 1:29 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
> On Mon, Jun 30, 2014 at 10:49 AM, Myron Stowe <myron.stowe@gmail.com> wrote:
>> On Tue, Jun 17, 2014 at 3:07 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
>>> On Tue, Jun 17, 2014 at 1:27 PM, Myron Stowe <myron.stowe@redhat.com> wrote:
>>>> During PCIe hot-plug initialization - pciehp_probe - data structures
>>>> related to slot capabilities are set up.  As part of this set up, ISRs are
>>>> put in place to handle slot events and all event bits are cleared out.
>>>>
>>>> This patch adds the Data Link Layer State Changed (PCI_EXP_SLTSTA_DLLSC)
>>>> Slot Status bit to the event bits that are cleared out during
>>>> initialization.
>>>
>>> I think this is related to e48f1b67f668 ("PCI: pciehp: Use link change
>>> notifications for hot-plug and removal").  Prior to that, pcie_isr()
>>> didn't look at the PCI_EXP_SLTSTA_DLLSC bit.
>>>
>>> Apparently there's a non-public report of spurious messages like this
>>> at boot-time, with no actual hotplug events:
>>>
>>>   pciehp 0000:82:04.0:pcie24: slot(4): Link Up event
>>>   pciehp 0000:82:04.0:pcie24: Device 0000:83:00.0 already exists at
>>> 0000:83:00, cannot hot-add
>>>   pciehp 0000:82:04.0:pcie24: Cannot add device at 0000:83:00
>>>
>>> Device 0000:83:00.0 was enumerated previously, during the normal PCI
>>> device enumeration.  I suspect DLLSC was set by the hardware when the
>>> link came up after power-on, and it remained set until Linux booted.
>>> Then when we take the first pciehp interrupt, we notice DLLSC is set
>>> and think it's a new link state change.
>>>
>>> This might be system-specific, because some BIOSes might clear DLLSC
>>> before handing off to the OS.  Or my theory might be all wet.
>>>
>>> But I think we should clear DLLSC in any case.  We clear all the other
>>> RW1C bits in Slot Status, and I think we should clear DLLSC as well.
>>>
>>> I'd like to include a bugzilla or mailing list reference for the
>>> spurious message, and I'd also like confirmation that this changes
>>> actually fixes it.
>>
>> I just got confirmation from the reporter that the patch fixes the
>> issue they were encountering.
>>
>> I've also asked them numerous time if I could make the bug public as I
>> expect others may be hitting it and could benefit from a BZ and
>> analysis/explanation.  I just repeated my request - if they reply
>> positively I'll follow through with some documentation.
>
> I'd be glad to merge this, as soon as we get more details about the
> problem it fixes.  I think it's good to include some specifics, e.g.,
> "Unexpected Link Up event," etc., because it helps other people with
> the same problem to find the solution.  It's OK if details about
> pre-production machines and so on are removed, but it's better if
> changes to generic code are supported by public evidence that
> everybody can evaluate.
>
> If you want to pull out the public stuff into a kernel.org bugzilla,
> that would be fine, too.  That would actually be better because then
> we don't have to depend on Red Hat maintaining it.

Agreed (but I couldn't get the vendor to agree to open up the Red Hat
BZ).  I've opened https://bugzilla.kernel.org/show_bug.cgi?id=79611
and included a scrubbed 'dmesg' log there along with an analysis of
how we are hitting the issue.

>
> Bjorn
>
>>>> Reference:
>>>>   PCI-SIG.  PCI Express Base Specification Revision 4.0 Version 0.3
>>>>   (PCI-SIG, 2014): 7.8.11. Slot Status Register (Offset 1Ah).
>>>>
>>>> Signed-off-by: Myron Stowe <myron.stowe@redhat.com>
>>>> ---
>>>>
>>>>  drivers/pci/hotplug/pciehp_hpc.c |    2 +-
>>>>  1 files changed, 1 insertions(+), 1 deletions(-)
>>>>
>>>> diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c
>>>> index 42914e0..0568416 100644
>>>> --- a/drivers/pci/hotplug/pciehp_hpc.c
>>>> +++ b/drivers/pci/hotplug/pciehp_hpc.c
>>>> @@ -794,7 +794,7 @@ struct controller *pcie_init(struct pcie_device *dev)
>>>>         pcie_capability_write_word(pdev, PCI_EXP_SLTSTA,
>>>>                 PCI_EXP_SLTSTA_ABP | PCI_EXP_SLTSTA_PFD |
>>>>                 PCI_EXP_SLTSTA_MRLSC | PCI_EXP_SLTSTA_PDC |
>>>> -               PCI_EXP_SLTSTA_CC);
>>>> +               PCI_EXP_SLTSTA_CC | PCI_EXP_SLTSTA_DLLSC);
>>>>
>>>>         /* Disable software notification */
>>>>         pcie_disable_notification(ctrl);
>>>>
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe linux-pci" in
>>> the body of a message to majordomo@vger.kernel.org
>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] PCI: pciehp: Include the Data Link Layer State Changed bit when clearing the Slot Status register's event bits
  2014-07-07 20:26       ` Myron Stowe
@ 2014-07-07 20:56         ` Bjorn Helgaas
  0 siblings, 0 replies; 11+ messages in thread
From: Bjorn Helgaas @ 2014-07-07 20:56 UTC (permalink / raw)
  To: Myron Stowe
  Cc: Myron Stowe, linux-pci@vger.kernel.org, Kenji Kaneshige,
	linux-kernel@vger.kernel.org, Rajat Jain

On Mon, Jul 7, 2014 at 2:26 PM, Myron Stowe <myron.stowe@gmail.com> wrote:
> On Tue, Jul 1, 2014 at 1:29 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
>> On Mon, Jun 30, 2014 at 10:49 AM, Myron Stowe <myron.stowe@gmail.com> wrote:
>>> On Tue, Jun 17, 2014 at 3:07 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
>>>> On Tue, Jun 17, 2014 at 1:27 PM, Myron Stowe <myron.stowe@redhat.com> wrote:
>>>>> During PCIe hot-plug initialization - pciehp_probe - data structures
>>>>> related to slot capabilities are set up.  As part of this set up, ISRs are
>>>>> put in place to handle slot events and all event bits are cleared out.
>>>>>
>>>>> This patch adds the Data Link Layer State Changed (PCI_EXP_SLTSTA_DLLSC)
>>>>> Slot Status bit to the event bits that are cleared out during
>>>>> initialization.
>>>>
>>>> I think this is related to e48f1b67f668 ("PCI: pciehp: Use link change
>>>> notifications for hot-plug and removal").  Prior to that, pcie_isr()
>>>> didn't look at the PCI_EXP_SLTSTA_DLLSC bit.
>>>>
>>>> Apparently there's a non-public report of spurious messages like this
>>>> at boot-time, with no actual hotplug events:
>>>>
>>>>   pciehp 0000:82:04.0:pcie24: slot(4): Link Up event
>>>>   pciehp 0000:82:04.0:pcie24: Device 0000:83:00.0 already exists at
>>>> 0000:83:00, cannot hot-add
>>>>   pciehp 0000:82:04.0:pcie24: Cannot add device at 0000:83:00
>>>>
>>>> Device 0000:83:00.0 was enumerated previously, during the normal PCI
>>>> device enumeration.  I suspect DLLSC was set by the hardware when the
>>>> link came up after power-on, and it remained set until Linux booted.
>>>> Then when we take the first pciehp interrupt, we notice DLLSC is set
>>>> and think it's a new link state change.
>>>>
>>>> This might be system-specific, because some BIOSes might clear DLLSC
>>>> before handing off to the OS.  Or my theory might be all wet.
>>>>
>>>> But I think we should clear DLLSC in any case.  We clear all the other
>>>> RW1C bits in Slot Status, and I think we should clear DLLSC as well.
>>>>
>>>> I'd like to include a bugzilla or mailing list reference for the
>>>> spurious message, and I'd also like confirmation that this changes
>>>> actually fixes it.
>>>
>>> I just got confirmation from the reporter that the patch fixes the
>>> issue they were encountering.
>>>
>>> I've also asked them numerous time if I could make the bug public as I
>>> expect others may be hitting it and could benefit from a BZ and
>>> analysis/explanation.  I just repeated my request - if they reply
>>> positively I'll follow through with some documentation.
>>
>> I'd be glad to merge this, as soon as we get more details about the
>> problem it fixes.  I think it's good to include some specifics, e.g.,
>> "Unexpected Link Up event," etc., because it helps other people with
>> the same problem to find the solution.  It's OK if details about
>> pre-production machines and so on are removed, but it's better if
>> changes to generic code are supported by public evidence that
>> everybody can evaluate.
>>
>> If you want to pull out the public stuff into a kernel.org bugzilla,
>> that would be fine, too.  That would actually be better because then
>> we don't have to depend on Red Hat maintaining it.
>
> Agreed (but I couldn't get the vendor to agree to open up the Red Hat
> BZ).  I've opened https://bugzilla.kernel.org/show_bug.cgi?id=79611
> and included a scrubbed 'dmesg' log there along with an analysis of
> how we are hitting the issue.

Beautiful, thanks!  I marked this for stable and put this in
pci/hotplug for v3.17.

>>>>> Reference:
>>>>>   PCI-SIG.  PCI Express Base Specification Revision 4.0 Version 0.3
>>>>>   (PCI-SIG, 2014): 7.8.11. Slot Status Register (Offset 1Ah).
>>>>>
>>>>> Signed-off-by: Myron Stowe <myron.stowe@redhat.com>
>>>>> ---
>>>>>
>>>>>  drivers/pci/hotplug/pciehp_hpc.c |    2 +-
>>>>>  1 files changed, 1 insertions(+), 1 deletions(-)
>>>>>
>>>>> diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c
>>>>> index 42914e0..0568416 100644
>>>>> --- a/drivers/pci/hotplug/pciehp_hpc.c
>>>>> +++ b/drivers/pci/hotplug/pciehp_hpc.c
>>>>> @@ -794,7 +794,7 @@ struct controller *pcie_init(struct pcie_device *dev)
>>>>>         pcie_capability_write_word(pdev, PCI_EXP_SLTSTA,
>>>>>                 PCI_EXP_SLTSTA_ABP | PCI_EXP_SLTSTA_PFD |
>>>>>                 PCI_EXP_SLTSTA_MRLSC | PCI_EXP_SLTSTA_PDC |
>>>>> -               PCI_EXP_SLTSTA_CC);
>>>>> +               PCI_EXP_SLTSTA_CC | PCI_EXP_SLTSTA_DLLSC);
>>>>>
>>>>>         /* Disable software notification */
>>>>>         pcie_disable_notification(ctrl);
>>>>>
>>>> --
>>>> To unsubscribe from this list: send the line "unsubscribe linux-pci" in
>>>> the body of a message to majordomo@vger.kernel.org
>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2014-07-07 20:56 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-06-17 19:27 [PATCH] PCI: pciehp: Include the Data Link Layer State Changed bit when clearing the Slot Status register's event bits Myron Stowe
2014-06-17 21:07 ` Bjorn Helgaas
2014-06-17 22:55   ` Guenter Roeck
2014-06-24 20:34     ` Rajat Jain
2014-06-24 21:16       ` Bjorn Helgaas
2014-06-25  5:26         ` Rajat Jain
2014-06-25 17:51           ` Bjorn Helgaas
2014-06-30 16:49   ` Myron Stowe
2014-07-01 19:29     ` Bjorn Helgaas
2014-07-07 20:26       ` Myron Stowe
2014-07-07 20:56         ` Bjorn Helgaas

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).