linux-pci.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Re: Linux 5.15-rc1
       [not found] ` <20210913141818.GA27911@codemonkey.org.uk>
@ 2021-09-13 18:59   ` Heiner Kallweit
  2021-09-13 19:51     ` Linus Torvalds
  2021-09-13 20:15     ` Dave Jones
  2021-09-13 23:46   ` Bjorn Helgaas
  1 sibling, 2 replies; 35+ messages in thread
From: Heiner Kallweit @ 2021-09-13 18:59 UTC (permalink / raw)
  To: Dave Jones, Linus Torvalds, Linux Kernel Mailing List,
	Bjorn Helgaas
  Cc: linux-pci@vger.kernel.org

On 13.09.2021 16:18, Dave Jones wrote:
> [  186.595296] pci 0000:02:00.0: [144d:a800] type 00 class 0x010601
> [  186.595351] pci 0000:02:00.0: reg 0x24: [mem 0xdfc10000-0xdfc11fff]
> [  186.595361] pci 0000:02:00.0: reg 0x30: [mem 0xdfc00000-0xdfc0ffff pref]
> [  186.595425] pci 0000:02:00.0: PME# supported from D3hot D3cold
> [  186.735107] pci 0000:02:00.0: VPD access failed.  This is likely a firmware bug on this device.  Contact the card vendor for a firmware update

Thanks for the report! The stalls may be related to this one. Device is:
02:00.0 SATA controller: Samsung Electronics Co Ltd XP941 PCIe SSD (rev 01)

With an older kernel you may experience the stall when accessing the vpd
attribute of this device in sysfs.

Maybe the device indicates VPD capability but doesn't actually support it.
Could you please provide the "lspci -vv" output for this device?

And could you please test with the following applied to verify the
assumption? It disables VPD access for this device.

---
 drivers/pci/vpd.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/pci/vpd.c b/drivers/pci/vpd.c
index 517789205..fc92e880e 100644
--- a/drivers/pci/vpd.c
+++ b/drivers/pci/vpd.c
@@ -540,6 +540,7 @@ DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_LSI_LOGIC, 0x002f, quirk_blacklist_vpd);
 DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_LSI_LOGIC, 0x005d, quirk_blacklist_vpd);
 DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_LSI_LOGIC, 0x005f, quirk_blacklist_vpd);
 DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_ATTANSIC, PCI_ANY_ID, quirk_blacklist_vpd);
+DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_SAMSUNG, 0xa800, quirk_blacklist_vpd);
 /*
  * The Amazon Annapurna Labs 0x0031 device id is reused for other non Root Port
  * device types, so the quirk is registered for the PCI_CLASS_BRIDGE_PCI class.
-- 
2.33.0







^ permalink raw reply related	[flat|nested] 35+ messages in thread

* Re: Linux 5.15-rc1
  2021-09-13 18:59   ` Linux 5.15-rc1 Heiner Kallweit
@ 2021-09-13 19:51     ` Linus Torvalds
  2021-09-13 20:09       ` Bjorn Helgaas
  2021-09-13 20:11       ` Heiner Kallweit
  2021-09-13 20:15     ` Dave Jones
  1 sibling, 2 replies; 35+ messages in thread
From: Linus Torvalds @ 2021-09-13 19:51 UTC (permalink / raw)
  To: Heiner Kallweit
  Cc: Dave Jones, Linux Kernel Mailing List, Bjorn Helgaas,
	linux-pci@vger.kernel.org

On Mon, Sep 13, 2021 at 12:00 PM Heiner Kallweit <hkallweit1@gmail.com> wrote:
>
> With an older kernel you may experience the stall when accessing the vpd
> attribute of this device in sysfs.

Honestly, that old behavior seems to be the *much* better behavior.

A synchronous stall at boot time is truly annoying, and a pain to deal
with (and debug).

That pci_vpd_read() function is clearly NOT designed to deal with
boot-time callers in the first place, so I think that commit is simply
wrong.

And yes, I see that "128ms timeout". If it was _one_ timeout, that
would be one thing,. But it looks like it's repeated over and over.

Not acceptable at boot time. Not at all.

Bjorn. Please revert. Or I can do it.

            Linus

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Linux 5.15-rc1
  2021-09-13 19:51     ` Linus Torvalds
@ 2021-09-13 20:09       ` Bjorn Helgaas
  2021-09-13 20:11       ` Heiner Kallweit
  1 sibling, 0 replies; 35+ messages in thread
From: Bjorn Helgaas @ 2021-09-13 20:09 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Heiner Kallweit, Dave Jones, Linux Kernel Mailing List,
	Bjorn Helgaas, linux-pci@vger.kernel.org

On Mon, Sep 13, 2021 at 12:51:30PM -0700, Linus Torvalds wrote:
> On Mon, Sep 13, 2021 at 12:00 PM Heiner Kallweit <hkallweit1@gmail.com> wrote:
> >
> > With an older kernel you may experience the stall when accessing the vpd
> > attribute of this device in sysfs.
> 
> Honestly, that old behavior seems to be the *much* better behavior.
> 
> A synchronous stall at boot time is truly annoying, and a pain to deal
> with (and debug).
> 
> That pci_vpd_read() function is clearly NOT designed to deal with
> boot-time callers in the first place, so I think that commit is simply
> wrong.
> 
> And yes, I see that "128ms timeout". If it was _one_ timeout, that
> would be one thing,. But it looks like it's repeated over and over.
> 
> Not acceptable at boot time. Not at all.
> 
> Bjorn. Please revert. Or I can do it.

Sorry about this.

Dave said it wasn't a trivial revert, but I'll be happy to work with
Heiner and Dave to revert and test it.

I agree that we shouldn't read VPD at boot-time unless we actually
need the data then.

Bjorn

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Linux 5.15-rc1
  2021-09-13 19:51     ` Linus Torvalds
  2021-09-13 20:09       ` Bjorn Helgaas
@ 2021-09-13 20:11       ` Heiner Kallweit
  2021-09-13 20:15         ` Linus Torvalds
  1 sibling, 1 reply; 35+ messages in thread
From: Heiner Kallweit @ 2021-09-13 20:11 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Dave Jones, Linux Kernel Mailing List, Bjorn Helgaas,
	linux-pci@vger.kernel.org

On 13.09.2021 21:51, Linus Torvalds wrote:
> On Mon, Sep 13, 2021 at 12:00 PM Heiner Kallweit <hkallweit1@gmail.com> wrote:
>>
>> With an older kernel you may experience the stall when accessing the vpd
>> attribute of this device in sysfs.
> 
> Honestly, that old behavior seems to be the *much* better behavior.
> 
> A synchronous stall at boot time is truly annoying, and a pain to deal
> with (and debug).
> 
> That pci_vpd_read() function is clearly NOT designed to deal with
> boot-time callers in the first place, so I think that commit is simply
> wrong.
> 
> And yes, I see that "128ms timeout". If it was _one_ timeout, that
> would be one thing,. But it looks like it's repeated over and over.
> 
No. The timeout is not the issue, otherwise you would see the message
"VPD access failed.."  over and over again. The issue here seems to be
that this call in PCI config space access to adress
vpd->cap + PCI_VPD_ADDR stalls.

In a first place this seems to be due to a buggy device. We'll know
for sure once Dave checks with the test patch applied. To deal with
such buggy devices we have the VPD blacklist quirk.

Secondly you could blame the PCI subsystem for not detecting stalled
access to a buggy device. However I don't know the PCIe spec good
enough to really comment on this.

> Not acceptable at boot time. Not at all.
> 
> Bjorn. Please revert. Or I can do it.
> 
>             Linus
> 
Heiner

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Linux 5.15-rc1
  2021-09-13 18:59   ` Linux 5.15-rc1 Heiner Kallweit
  2021-09-13 19:51     ` Linus Torvalds
@ 2021-09-13 20:15     ` Dave Jones
  2021-09-13 20:22       ` Heiner Kallweit
  2021-09-13 20:32       ` Heiner Kallweit
  1 sibling, 2 replies; 35+ messages in thread
From: Dave Jones @ 2021-09-13 20:15 UTC (permalink / raw)
  To: Heiner Kallweit
  Cc: Linus Torvalds, Linux Kernel Mailing List, Bjorn Helgaas,
	linux-pci@vger.kernel.org

On Mon, Sep 13, 2021 at 08:59:49PM +0200, Heiner Kallweit wrote:
 > On 13.09.2021 16:18, Dave Jones wrote:
 > > [  186.595296] pci 0000:02:00.0: [144d:a800] type 00 class 0x010601
 > > [  186.595351] pci 0000:02:00.0: reg 0x24: [mem 0xdfc10000-0xdfc11fff]
 > > [  186.595361] pci 0000:02:00.0: reg 0x30: [mem 0xdfc00000-0xdfc0ffff pref]
 > > [  186.595425] pci 0000:02:00.0: PME# supported from D3hot D3cold
 > > [  186.735107] pci 0000:02:00.0: VPD access failed.  This is likely a firmware bug on this device.  Contact the card vendor for a firmware update
 > 
 > Thanks for the report! The stalls may be related to this one. Device is:
 > 02:00.0 SATA controller: Samsung Electronics Co Ltd XP941 PCIe SSD (rev 01)
 > 
 > With an older kernel you may experience the stall when accessing the vpd
 > attribute of this device in sysfs.
 > 
 > Maybe the device indicates VPD capability but doesn't actually support it.
 > Could you please provide the "lspci -vv" output for this device?

02:00.0 SATA controller: Samsung Electronics Co Ltd XP941 PCIe SSD (rev 01) (prog-if 01 [AHCI 1.0])
        Subsystem: Samsung Electronics Co Ltd XP941 PCIe SSD
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0, Cache Line Size: 64 bytes
        Interrupt: pin A routed to IRQ 16
        Region 5: Memory at dfc10000 (32-bit, non-prefetchable) [size=8K]
        Expansion ROM at dfc00000 [disabled] [size=64K]
        Capabilities: [40] Power Management version 3
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=375mA PME(D0-,D1-,D2-,D3hot+,D3cold+)
                Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [70] Express (v2) Endpoint, MSI 00
                DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s unlimited, L1 unlimited
                        ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ SlotPowerLimit 25.000W
                DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq-
                        RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop+ FLReset-
                        MaxPayload 128 bytes, MaxReadReq 512 bytes
                DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr+ TransPend-
                LnkCap: Port #0, Speed 5GT/s, Width x4, ASPM L0s L1, Exit Latency L0s <4us, L1 <64us
                        ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+
                LnkCtl: ASPM Disabled; RCB 64 bytes, Disabled- CommClk+
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 5GT/s (ok), Width x2 (downgraded)
                        TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
                DevCap2: Completion Timeout: Not Supported, TimeoutDis+ NROPrPrP- LTR+
                         10BitTagComp- 10BitTagReq- OBFF Not Supported, ExtFmt- EETLPPrefix-
                         EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
                         FRS- TPHComp- ExtTPHComp-
                         AtomicOpsCap: 32bit- 64bit- 128bitCAS-
                DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis- LTR+ OBFF Disabled,
                         AtomicOpsCtl: ReqEn-
                LnkCap2: Supported Link Speeds: 2.5-5GT/s, Crosslink- Retimer- 2Retimers- DRS-
                LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-
                         Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
                         Compliance De-emphasis: -6dB
                LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete- EqualizationPhase1-
                         EqualizationPhase2- EqualizationPhase3- LinkEqualizationRequest-
                         Retimer- 2Retimers- CrosslinkRes: unsupported
        Capabilities: [d0] Vital Product Data
                Not readable
        Capabilities: [100 v2] Advanced Error Reporting
                UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
                CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr-
                CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
                AERCap: First Error Pointer: 00, ECRCGenCap+ ECRCGenEn- ECRCChkCap+ ECRCChkEn-
                        MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
                HeaderLog: 00000000 00000000 00000000 00000000
        Capabilities: [140 v1] Device Serial Number 00-00-00-00-00-00-00-00
        Capabilities: [150 v1] Power Budgeting <?>
        Capabilities: [160 v1] Latency Tolerance Reporting
                Max snoop latency: 71680ns
                Max no snoop latency: 71680ns
        Kernel driver in use: ahci


 > And could you please test with the following applied to verify the
 > assumption? It disables VPD access for this device.
 > 
 > ---
 >  drivers/pci/vpd.c | 1 +
 >  1 file changed, 1 insertion(+)
 > 
 > diff --git a/drivers/pci/vpd.c b/drivers/pci/vpd.c
 > index 517789205..fc92e880e 100644
 > --- a/drivers/pci/vpd.c
 > +++ b/drivers/pci/vpd.c
 > @@ -540,6 +540,7 @@ DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_LSI_LOGIC, 0x002f, quirk_blacklist_vpd);
 >  DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_LSI_LOGIC, 0x005d, quirk_blacklist_vpd);
 >  DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_LSI_LOGIC, 0x005f, quirk_blacklist_vpd);
 >  DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_ATTANSIC, PCI_ANY_ID, quirk_blacklist_vpd);
 > +DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_SAMSUNG, 0xa800, quirk_blacklist_vpd);
 >  /*
 >   * The Amazon Annapurna Labs 0x0031 device id is reused for other non Root Port
 >   * device types, so the quirk is registered for the PCI_CLASS_BRIDGE_PCI class.


This didn't help I'm afraid :(
It changed the VPD warning, but that's about it...

[  184.235496] pci 0000:02:00.0: calling  quirk_blacklist_vpd+0x0/0x22 @ 1
[  184.235499] pci 0000:02:00.0: [Firmware Bug]: disabling VPD access (can't determine size of non-standard VPD format)                                                                                           
[  184.235501] pci 0000:02:00.0: quirk_blacklist_vpd+0x0/0x22 took 0 usecs


	Dave


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Linux 5.15-rc1
  2021-09-13 20:11       ` Heiner Kallweit
@ 2021-09-13 20:15         ` Linus Torvalds
  0 siblings, 0 replies; 35+ messages in thread
From: Linus Torvalds @ 2021-09-13 20:15 UTC (permalink / raw)
  To: Heiner Kallweit
  Cc: Dave Jones, Linux Kernel Mailing List, Bjorn Helgaas,
	linux-pci@vger.kernel.org

On Mon, Sep 13, 2021 at 1:11 PM Heiner Kallweit <hkallweit1@gmail.com> wrote:
>
> No. The timeout is not the issue, otherwise you would see the message
> "VPD access failed.."  over and over again.

Ahh, I did check that that error did exist, but you're right, it's not
there all the time.

> The issue here seems to be
> that this call in PCI config space access to adress
> vpd->cap + PCI_VPD_ADDR stalls.

Ok. Regardless, we shouldn't do this since the boot code shouldn't
need any of the VPD information.

        Linus

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Linux 5.15-rc1
  2021-09-13 20:15     ` Dave Jones
@ 2021-09-13 20:22       ` Heiner Kallweit
  2021-09-13 20:32         ` Dave Jones
  2021-09-13 20:32       ` Heiner Kallweit
  1 sibling, 1 reply; 35+ messages in thread
From: Heiner Kallweit @ 2021-09-13 20:22 UTC (permalink / raw)
  To: Dave Jones, Linus Torvalds, Linux Kernel Mailing List,
	Bjorn Helgaas, linux-pci@vger.kernel.org

On 13.09.2021 22:15, Dave Jones wrote:
> On Mon, Sep 13, 2021 at 08:59:49PM +0200, Heiner Kallweit wrote:
>  > On 13.09.2021 16:18, Dave Jones wrote:
>  > > [  186.595296] pci 0000:02:00.0: [144d:a800] type 00 class 0x010601
>  > > [  186.595351] pci 0000:02:00.0: reg 0x24: [mem 0xdfc10000-0xdfc11fff]
>  > > [  186.595361] pci 0000:02:00.0: reg 0x30: [mem 0xdfc00000-0xdfc0ffff pref]
>  > > [  186.595425] pci 0000:02:00.0: PME# supported from D3hot D3cold
>  > > [  186.735107] pci 0000:02:00.0: VPD access failed.  This is likely a firmware bug on this device.  Contact the card vendor for a firmware update
>  > 
>  > Thanks for the report! The stalls may be related to this one. Device is:
>  > 02:00.0 SATA controller: Samsung Electronics Co Ltd XP941 PCIe SSD (rev 01)
>  > 
>  > With an older kernel you may experience the stall when accessing the vpd
>  > attribute of this device in sysfs.
>  > 
>  > Maybe the device indicates VPD capability but doesn't actually support it.
>  > Could you please provide the "lspci -vv" output for this device?
> 
> 02:00.0 SATA controller: Samsung Electronics Co Ltd XP941 PCIe SSD (rev 01) (prog-if 01 [AHCI 1.0])
>         Subsystem: Samsung Electronics Co Ltd XP941 PCIe SSD
>         Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
>         Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
>         Latency: 0, Cache Line Size: 64 bytes
>         Interrupt: pin A routed to IRQ 16
>         Region 5: Memory at dfc10000 (32-bit, non-prefetchable) [size=8K]
>         Expansion ROM at dfc00000 [disabled] [size=64K]
>         Capabilities: [40] Power Management version 3
>                 Flags: PMEClk- DSI- D1- D2- AuxCurrent=375mA PME(D0-,D1-,D2-,D3hot+,D3cold+)
>                 Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
>         Capabilities: [70] Express (v2) Endpoint, MSI 00
>                 DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s unlimited, L1 unlimited
>                         ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ SlotPowerLimit 25.000W
>                 DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq-
>                         RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop+ FLReset-
>                         MaxPayload 128 bytes, MaxReadReq 512 bytes
>                 DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr+ TransPend-
>                 LnkCap: Port #0, Speed 5GT/s, Width x4, ASPM L0s L1, Exit Latency L0s <4us, L1 <64us
>                         ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+
>                 LnkCtl: ASPM Disabled; RCB 64 bytes, Disabled- CommClk+
>                         ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
>                 LnkSta: Speed 5GT/s (ok), Width x2 (downgraded)
>                         TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
>                 DevCap2: Completion Timeout: Not Supported, TimeoutDis+ NROPrPrP- LTR+
>                          10BitTagComp- 10BitTagReq- OBFF Not Supported, ExtFmt- EETLPPrefix-
>                          EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
>                          FRS- TPHComp- ExtTPHComp-
>                          AtomicOpsCap: 32bit- 64bit- 128bitCAS-
>                 DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis- LTR+ OBFF Disabled,
>                          AtomicOpsCtl: ReqEn-
>                 LnkCap2: Supported Link Speeds: 2.5-5GT/s, Crosslink- Retimer- 2Retimers- DRS-
>                 LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-
>                          Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
>                          Compliance De-emphasis: -6dB
>                 LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete- EqualizationPhase1-
>                          EqualizationPhase2- EqualizationPhase3- LinkEqualizationRequest-
>                          Retimer- 2Retimers- CrosslinkRes: unsupported
>         Capabilities: [d0] Vital Product Data
>                 Not readable
>         Capabilities: [100 v2] Advanced Error Reporting
>                 UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
>                 UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
>                 UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
>                 CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr-
>                 CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
>                 AERCap: First Error Pointer: 00, ECRCGenCap+ ECRCGenEn- ECRCChkCap+ ECRCChkEn-
>                         MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
>                 HeaderLog: 00000000 00000000 00000000 00000000
>         Capabilities: [140 v1] Device Serial Number 00-00-00-00-00-00-00-00
>         Capabilities: [150 v1] Power Budgeting <?>
>         Capabilities: [160 v1] Latency Tolerance Reporting
>                 Max snoop latency: 71680ns
>                 Max no snoop latency: 71680ns
>         Kernel driver in use: ahci
> 
> 
>  > And could you please test with the following applied to verify the
>  > assumption? It disables VPD access for this device.
>  > 
>  > ---
>  >  drivers/pci/vpd.c | 1 +
>  >  1 file changed, 1 insertion(+)
>  > 
>  > diff --git a/drivers/pci/vpd.c b/drivers/pci/vpd.c
>  > index 517789205..fc92e880e 100644
>  > --- a/drivers/pci/vpd.c
>  > +++ b/drivers/pci/vpd.c
>  > @@ -540,6 +540,7 @@ DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_LSI_LOGIC, 0x002f, quirk_blacklist_vpd);
>  >  DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_LSI_LOGIC, 0x005d, quirk_blacklist_vpd);
>  >  DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_LSI_LOGIC, 0x005f, quirk_blacklist_vpd);
>  >  DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_ATTANSIC, PCI_ANY_ID, quirk_blacklist_vpd);
>  > +DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_SAMSUNG, 0xa800, quirk_blacklist_vpd);
>  >  /*
>  >   * The Amazon Annapurna Labs 0x0031 device id is reused for other non Root Port
>  >   * device types, so the quirk is registered for the PCI_CLASS_BRIDGE_PCI class.
> 
> 
> This didn't help I'm afraid :(
> It changed the VPD warning, but that's about it...
> 
> [  184.235496] pci 0000:02:00.0: calling  quirk_blacklist_vpd+0x0/0x22 @ 1
> [  184.235499] pci 0000:02:00.0: [Firmware Bug]: disabling VPD access (can't determine size of non-standard VPD format)                                                                                           
> [  184.235501] pci 0000:02:00.0: quirk_blacklist_vpd+0x0/0x22 took 0 usecs
> 
With this patch there's no VPD access to this device any longer. So this can't be
the root cause. Do you have any other PCI device that has VPD capability?
-> Capabilities: [...] Vital Product Data

> 
> 	Dave
> 
Heiner

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Linux 5.15-rc1
  2021-09-13 20:22       ` Heiner Kallweit
@ 2021-09-13 20:32         ` Dave Jones
  2021-09-13 20:44           ` Linux 5.15-rc1 - 82599ES VPD access isue Heiner Kallweit
  2021-09-13 20:59           ` Linux 5.15-rc1 Heiner Kallweit
  0 siblings, 2 replies; 35+ messages in thread
From: Dave Jones @ 2021-09-13 20:32 UTC (permalink / raw)
  To: Heiner Kallweit
  Cc: Linus Torvalds, Linux Kernel Mailing List, Bjorn Helgaas,
	linux-pci@vger.kernel.org

On Mon, Sep 13, 2021 at 10:22:57PM +0200, Heiner Kallweit wrote:

 > > This didn't help I'm afraid :(
 > > It changed the VPD warning, but that's about it...
 > > 
 > > [  184.235496] pci 0000:02:00.0: calling  quirk_blacklist_vpd+0x0/0x22 @ 1
 > > [  184.235499] pci 0000:02:00.0: [Firmware Bug]: disabling VPD access (can't determine size of non-standard VPD format)                                                                                           
 > > [  184.235501] pci 0000:02:00.0: quirk_blacklist_vpd+0x0/0x22 took 0 usecs
 > > 
 > With this patch there's no VPD access to this device any longer. So this can't be
 > the root cause. Do you have any other PCI device that has VPD capability?
 > -> Capabilities: [...] Vital Product Data


01:00.0 Ethernet controller: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection (rev 01)
        Subsystem: Device 1dcf:030a
	...
	        Capabilities: [e0] Vital Product Data
                Unknown small resource type 06, will not decode more.


I'll add that to the quirk list and see if that helps.

	Dave

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Linux 5.15-rc1
  2021-09-13 20:15     ` Dave Jones
  2021-09-13 20:22       ` Heiner Kallweit
@ 2021-09-13 20:32       ` Heiner Kallweit
  2021-09-13 20:36         ` Linus Torvalds
  2021-09-13 20:41         ` Dave Jones
  1 sibling, 2 replies; 35+ messages in thread
From: Heiner Kallweit @ 2021-09-13 20:32 UTC (permalink / raw)
  To: Dave Jones, Linus Torvalds, Linux Kernel Mailing List,
	Bjorn Helgaas, linux-pci@vger.kernel.org

On 13.09.2021 22:15, Dave Jones wrote:
> On Mon, Sep 13, 2021 at 08:59:49PM +0200, Heiner Kallweit wrote:
>  > On 13.09.2021 16:18, Dave Jones wrote:
>  > > [  186.595296] pci 0000:02:00.0: [144d:a800] type 00 class 0x010601
>  > > [  186.595351] pci 0000:02:00.0: reg 0x24: [mem 0xdfc10000-0xdfc11fff]
>  > > [  186.595361] pci 0000:02:00.0: reg 0x30: [mem 0xdfc00000-0xdfc0ffff pref]
>  > > [  186.595425] pci 0000:02:00.0: PME# supported from D3hot D3cold
>  > > [  186.735107] pci 0000:02:00.0: VPD access failed.  This is likely a firmware bug on this device.  Contact the card vendor for a firmware update
>  > 
>  > Thanks for the report! The stalls may be related to this one. Device is:
>  > 02:00.0 SATA controller: Samsung Electronics Co Ltd XP941 PCIe SSD (rev 01)
>  > 
>  > With an older kernel you may experience the stall when accessing the vpd
>  > attribute of this device in sysfs.
>  > 
>  > Maybe the device indicates VPD capability but doesn't actually support it.
>  > Could you please provide the "lspci -vv" output for this device?
> 
> 02:00.0 SATA controller: Samsung Electronics Co Ltd XP941 PCIe SSD (rev 01) (prog-if 01 [AHCI 1.0])
>         Subsystem: Samsung Electronics Co Ltd XP941 PCIe SSD
>         Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
>         Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
>         Latency: 0, Cache Line Size: 64 bytes
>         Interrupt: pin A routed to IRQ 16
>         Region 5: Memory at dfc10000 (32-bit, non-prefetchable) [size=8K]
>         Expansion ROM at dfc00000 [disabled] [size=64K]
>         Capabilities: [40] Power Management version 3
>                 Flags: PMEClk- DSI- D1- D2- AuxCurrent=375mA PME(D0-,D1-,D2-,D3hot+,D3cold+)
>                 Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
>         Capabilities: [70] Express (v2) Endpoint, MSI 00
>                 DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s unlimited, L1 unlimited
>                         ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ SlotPowerLimit 25.000W
>                 DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq-
>                         RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop+ FLReset-
>                         MaxPayload 128 bytes, MaxReadReq 512 bytes
>                 DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr+ TransPend-
>                 LnkCap: Port #0, Speed 5GT/s, Width x4, ASPM L0s L1, Exit Latency L0s <4us, L1 <64us
>                         ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+
>                 LnkCtl: ASPM Disabled; RCB 64 bytes, Disabled- CommClk+
>                         ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
>                 LnkSta: Speed 5GT/s (ok), Width x2 (downgraded)
>                         TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
>                 DevCap2: Completion Timeout: Not Supported, TimeoutDis+ NROPrPrP- LTR+
>                          10BitTagComp- 10BitTagReq- OBFF Not Supported, ExtFmt- EETLPPrefix-
>                          EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
>                          FRS- TPHComp- ExtTPHComp-
>                          AtomicOpsCap: 32bit- 64bit- 128bitCAS-
>                 DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis- LTR+ OBFF Disabled,
>                          AtomicOpsCtl: ReqEn-
>                 LnkCap2: Supported Link Speeds: 2.5-5GT/s, Crosslink- Retimer- 2Retimers- DRS-
>                 LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-
>                          Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
>                          Compliance De-emphasis: -6dB
>                 LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete- EqualizationPhase1-
>                          EqualizationPhase2- EqualizationPhase3- LinkEqualizationRequest-
>                          Retimer- 2Retimers- CrosslinkRes: unsupported
>         Capabilities: [d0] Vital Product Data
>                 Not readable
>         Capabilities: [100 v2] Advanced Error Reporting
>                 UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
>                 UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
>                 UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
>                 CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr-
>                 CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
>                 AERCap: First Error Pointer: 00, ECRCGenCap+ ECRCGenEn- ECRCChkCap+ ECRCChkEn-
>                         MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
>                 HeaderLog: 00000000 00000000 00000000 00000000
>         Capabilities: [140 v1] Device Serial Number 00-00-00-00-00-00-00-00
>         Capabilities: [150 v1] Power Budgeting <?>
>         Capabilities: [160 v1] Latency Tolerance Reporting
>                 Max snoop latency: 71680ns
>                 Max no snoop latency: 71680ns
>         Kernel driver in use: ahci
> 
> 
>  > And could you please test with the following applied to verify the
>  > assumption? It disables VPD access for this device.
>  > 
>  > ---
>  >  drivers/pci/vpd.c | 1 +
>  >  1 file changed, 1 insertion(+)
>  > 
>  > diff --git a/drivers/pci/vpd.c b/drivers/pci/vpd.c
>  > index 517789205..fc92e880e 100644
>  > --- a/drivers/pci/vpd.c
>  > +++ b/drivers/pci/vpd.c
>  > @@ -540,6 +540,7 @@ DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_LSI_LOGIC, 0x002f, quirk_blacklist_vpd);
>  >  DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_LSI_LOGIC, 0x005d, quirk_blacklist_vpd);
>  >  DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_LSI_LOGIC, 0x005f, quirk_blacklist_vpd);
>  >  DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_ATTANSIC, PCI_ANY_ID, quirk_blacklist_vpd);
>  > +DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_SAMSUNG, 0xa800, quirk_blacklist_vpd);
>  >  /*
>  >   * The Amazon Annapurna Labs 0x0031 device id is reused for other non Root Port
>  >   * device types, so the quirk is registered for the PCI_CLASS_BRIDGE_PCI class.
> 
> 
> This didn't help I'm afraid :(
> It changed the VPD warning, but that's about it...
> 
> [  184.235496] pci 0000:02:00.0: calling  quirk_blacklist_vpd+0x0/0x22 @ 1
> [  184.235499] pci 0000:02:00.0: [Firmware Bug]: disabling VPD access (can't determine size of non-standard VPD format)                                                                                           
> [  184.235501] pci 0000:02:00.0: quirk_blacklist_vpd+0x0/0x22 took 0 usecs
> 

OK, so this device is buggy too but not the root cause. After checking again the
stalls happen for VPD access to both ports of the Intel network adapter.

01:00.0 Ethernet controller: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection (rev 01)
01:00.1 Ethernet controller: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection (rev 01)

I modified the test patch accordingly, could you please test again?

---
 drivers/pci/vpd.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/pci/vpd.c b/drivers/pci/vpd.c
index 517789205..fc92e880e 100644
--- a/drivers/pci/vpd.c
+++ b/drivers/pci/vpd.c
@@ -540,6 +540,7 @@ DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_LSI_LOGIC, 0x002f, quirk_blacklist_vpd);
 DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_LSI_LOGIC, 0x005d, quirk_blacklist_vpd);
 DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_LSI_LOGIC, 0x005f, quirk_blacklist_vpd);
 DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_ATTANSIC, PCI_ANY_ID, quirk_blacklist_vpd);
+DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x10fb, quirk_blacklist_vpd);
 /*
  * The Amazon Annapurna Labs 0x0031 device id is reused for other non Root Port
  * device types, so the quirk is registered for the PCI_CLASS_BRIDGE_PCI class.
-- 
2.33.0


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* Re: Linux 5.15-rc1
  2021-09-13 20:32       ` Heiner Kallweit
@ 2021-09-13 20:36         ` Linus Torvalds
  2021-09-13 20:41         ` Dave Jones
  1 sibling, 0 replies; 35+ messages in thread
From: Linus Torvalds @ 2021-09-13 20:36 UTC (permalink / raw)
  To: Heiner Kallweit
  Cc: Dave Jones, Linux Kernel Mailing List, Bjorn Helgaas,
	linux-pci@vger.kernel.org

On Mon, Sep 13, 2021 at 1:33 PM Heiner Kallweit <hkallweit1@gmail.com> wrote:
>
> OK, so this device is buggy too but not the root cause. After checking again the
> stalls happen for VPD access to both ports of the Intel network adapter.

I really don't want to add quirks like this.

If this happens to a major manufacturer (whoe _defined_ the PCI
standard, for chrissake), then this is not something where we want a
quirk to avoid a boot-time delay.

That commit needs to be reverted. If reverting it is hard, then we
need to revert everything it depends on.

              Linus

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Linux 5.15-rc1
  2021-09-13 20:32       ` Heiner Kallweit
  2021-09-13 20:36         ` Linus Torvalds
@ 2021-09-13 20:41         ` Dave Jones
  1 sibling, 0 replies; 35+ messages in thread
From: Dave Jones @ 2021-09-13 20:41 UTC (permalink / raw)
  To: Heiner Kallweit
  Cc: Linus Torvalds, Linux Kernel Mailing List, Bjorn Helgaas,
	linux-pci@vger.kernel.org

On Mon, Sep 13, 2021 at 10:32:56PM +0200, Heiner Kallweit wrote:

 > > [  184.235496] pci 0000:02:00.0: calling  quirk_blacklist_vpd+0x0/0x22 @ 1
 > > [  184.235499] pci 0000:02:00.0: [Firmware Bug]: disabling VPD access (can't determine size of non-standard VPD format)                                                                                           
 > > [  184.235501] pci 0000:02:00.0: quirk_blacklist_vpd+0x0/0x22 took 0 usecs
 > > 
 > 
 > OK, so this device is buggy too but not the root cause. After checking again the
 > stalls happen for VPD access to both ports of the Intel network adapter.
 > 
 > 01:00.0 Ethernet controller: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection (rev 01)
 > 01:00.1 Ethernet controller: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection (rev 01)
 > 
 > I modified the test patch accordingly, could you please test again?

That does avoid the stall.

	Dave

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Linux 5.15-rc1 - 82599ES VPD access isue
  2021-09-13 20:32         ` Dave Jones
@ 2021-09-13 20:44           ` Heiner Kallweit
  2021-09-13 23:32             ` [Intel-wired-lan] " Hisashi T Fujinaka
  2021-09-13 20:59           ` Linux 5.15-rc1 Heiner Kallweit
  1 sibling, 1 reply; 35+ messages in thread
From: Heiner Kallweit @ 2021-09-13 20:44 UTC (permalink / raw)
  To: Jesse Brandeburg, Tony Nguyen
  Cc: linux-pci@vger.kernel.org, Linus Torvalds,
	Linux Kernel Mailing List, Bjorn Helgaas, Dave Jones,
	intel-wired-lan

On 13.09.2021 22:32, Dave Jones wrote:

+ Jesse and Tony as Intel NIC maintainers

> On Mon, Sep 13, 2021 at 10:22:57PM +0200, Heiner Kallweit wrote:
> 
>  > > This didn't help I'm afraid :(
>  > > It changed the VPD warning, but that's about it...
>  > > 
>  > > [  184.235496] pci 0000:02:00.0: calling  quirk_blacklist_vpd+0x0/0x22 @ 1
>  > > [  184.235499] pci 0000:02:00.0: [Firmware Bug]: disabling VPD access (can't determine size of non-standard VPD format)                                                                                           
>  > > [  184.235501] pci 0000:02:00.0: quirk_blacklist_vpd+0x0/0x22 took 0 usecs
>  > > 
>  > With this patch there's no VPD access to this device any longer. So this can't be
>  > the root cause. Do you have any other PCI device that has VPD capability?
>  > -> Capabilities: [...] Vital Product Data
> 
> 
> 01:00.0 Ethernet controller: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection (rev 01)
>         Subsystem: Device 1dcf:030a
> 	...
> 	        Capabilities: [e0] Vital Product Data
>                 Unknown small resource type 06, will not decode more.
> 

When searching I found the same symptom of invalid VPD data for 82599EB.
Do these adapters have non-VPD data in VPD address space? Or is the actual
VPD data at another offset than 0? I know that few Chelsio devices have
such a non-standard VPD structure.

> 
> I'll add that to the quirk list and see if that helps.
> 
> 	Dave
> 
Heiner

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Linux 5.15-rc1
  2021-09-13 20:32         ` Dave Jones
  2021-09-13 20:44           ` Linux 5.15-rc1 - 82599ES VPD access isue Heiner Kallweit
@ 2021-09-13 20:59           ` Heiner Kallweit
  2021-09-13 23:35             ` Bjorn Helgaas
  1 sibling, 1 reply; 35+ messages in thread
From: Heiner Kallweit @ 2021-09-13 20:59 UTC (permalink / raw)
  To: Bjorn Helgaas; +Cc: linux-pci@vger.kernel.org

On 13.09.2021 22:32, Dave Jones wrote:
> On Mon, Sep 13, 2021 at 10:22:57PM +0200, Heiner Kallweit wrote:
> 
>  > > This didn't help I'm afraid :(
>  > > It changed the VPD warning, but that's about it...
>  > > 
>  > > [  184.235496] pci 0000:02:00.0: calling  quirk_blacklist_vpd+0x0/0x22 @ 1
>  > > [  184.235499] pci 0000:02:00.0: [Firmware Bug]: disabling VPD access (can't determine size of non-standard VPD format)                                                                                           
>  > > [  184.235501] pci 0000:02:00.0: quirk_blacklist_vpd+0x0/0x22 took 0 usecs
>  > > 
>  > With this patch there's no VPD access to this device any longer. So this can't be
>  > the root cause. Do you have any other PCI device that has VPD capability?
>  > -> Capabilities: [...] Vital Product Data
> 
> 
> 01:00.0 Ethernet controller: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection (rev 01)
>         Subsystem: Device 1dcf:030a
> 	...
> 	        Capabilities: [e0] Vital Product Data
>                 Unknown small resource type 06, will not decode more.
> 

The stall being discussed would have been prevented by the VPD tag
verification in pci_vpd_size(). It seems that now random data is
interpreted as VPD tags what results in VPD access to an address
that makes the device stall.
I do not really follow Linus' argumentation that VPD shouldn't be
accessed during boot because other slow "VPD-like" devices are
accessed too, e.g. DDR SPD via I2C.

> 
> I'll add that to the quirk list and see if that helps.
> 
> 	Dave
> 


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Intel-wired-lan] Linux 5.15-rc1 - 82599ES VPD access isue
  2021-09-13 20:44           ` Linux 5.15-rc1 - 82599ES VPD access isue Heiner Kallweit
@ 2021-09-13 23:32             ` Hisashi T Fujinaka
  2021-09-14  5:51               ` Heiner Kallweit
  0 siblings, 1 reply; 35+ messages in thread
From: Hisashi T Fujinaka @ 2021-09-13 23:32 UTC (permalink / raw)
  To: Heiner Kallweit
  Cc: Jesse Brandeburg, Tony Nguyen, Dave Jones,
	linux-pci@vger.kernel.org, Linux Kernel Mailing List,
	intel-wired-lan, Bjorn Helgaas, Linus Torvalds, todd.fujinaka

On Mon, 13 Sep 2021, Heiner Kallweit wrote:

> On 13.09.2021 22:32, Dave Jones wrote:
>
> + Jesse and Tony as Intel NIC maintainers
>
>> On Mon, Sep 13, 2021 at 10:22:57PM +0200, Heiner Kallweit wrote:
>>
>> >> This didn't help I'm afraid :(
>> >> It changed the VPD warning, but that's about it...
>> >>
>> >> [  184.235496] pci 0000:02:00.0: calling  quirk_blacklist_vpd+0x0/0x22 @ 1
>> >> [  184.235499] pci 0000:02:00.0: [Firmware Bug]: disabling VPD access (can't determine size of non-standard VPD format)
>> >> [  184.235501] pci 0000:02:00.0: quirk_blacklist_vpd+0x0/0x22 took 0 usecs
>> >>
>> > With this patch there's no VPD access to this device any longer. So this can't be
>> > the root cause. Do you have any other PCI device that has VPD capability?
>> > -> Capabilities: [...] Vital Product Data
>>
>>
>> 01:00.0 Ethernet controller: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection (rev 01)
>>         Subsystem: Device 1dcf:030a
>> 	...
>> 	        Capabilities: [e0] Vital Product Data
>>                 Unknown small resource type 06, will not decode more.
>>
>
> When searching I found the same symptom of invalid VPD data for 82599EB.
> Do these adapters have non-VPD data in VPD address space? Or is the actual
> VPD data at another offset than 0? I know that few Chelsio devices have
> such a non-standard VPD structure.
>
>>
>> I'll add that to the quirk list and see if that helps.
>>
>> 	Dave
>>
> Heiner

Sorry to reply from my personal account. If I did it from my work
account I'd be top-posting because of Outlook and that goes over like a
lead balloon.

Anyway, can you send us a dump of your eeprom using ethtool -e? You can
either send it via a bug on e1000.sourceforge.net or try sending it to
todd.fujinaka@intel.com

The other thing is I'm wondering is what the subvendor device ID you
have is referring to because it's not in the pci database. Some ODMs
like getting creative with what they put in the NVM.

Todd Fujinaka (todd.fujinaka@intel.com)

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Linux 5.15-rc1
  2021-09-13 20:59           ` Linux 5.15-rc1 Heiner Kallweit
@ 2021-09-13 23:35             ` Bjorn Helgaas
  0 siblings, 0 replies; 35+ messages in thread
From: Bjorn Helgaas @ 2021-09-13 23:35 UTC (permalink / raw)
  To: Heiner Kallweit; +Cc: Bjorn Helgaas, linux-pci@vger.kernel.org

On Mon, Sep 13, 2021 at 10:59:05PM +0200, Heiner Kallweit wrote:
> On 13.09.2021 22:32, Dave Jones wrote:
> > On Mon, Sep 13, 2021 at 10:22:57PM +0200, Heiner Kallweit wrote:
> > 
> >  > > This didn't help I'm afraid :(
> >  > > It changed the VPD warning, but that's about it...
> >  > > 
> >  > > [  184.235496] pci 0000:02:00.0: calling  quirk_blacklist_vpd+0x0/0x22 @ 1
> >  > > [  184.235499] pci 0000:02:00.0: [Firmware Bug]: disabling VPD access (can't determine size of non-standard VPD format)                                                                                           
> >  > > [  184.235501] pci 0000:02:00.0: quirk_blacklist_vpd+0x0/0x22 took 0 usecs
> >  > > 
> >  > With this patch there's no VPD access to this device any longer. So this can't be
> >  > the root cause. Do you have any other PCI device that has VPD capability?
> >  > -> Capabilities: [...] Vital Product Data
> > 
> > 
> > 01:00.0 Ethernet controller: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection (rev 01)
> >         Subsystem: Device 1dcf:030a
> > 	...
> > 	        Capabilities: [e0] Vital Product Data
> >                 Unknown small resource type 06, will not decode more.
> > 
> 
> The stall being discussed would have been prevented by the VPD tag
> verification in pci_vpd_size(). It seems that now random data is
> interpreted as VPD tags what results in VPD access to an address
> that makes the device stall.

It's possible we need to validate more.  But that's not the solution
to the current problem.

> I do not really follow Linus' argumentation that VPD shouldn't be
> accessed during boot because other slow "VPD-like" devices are
> accessed too, e.g. DDR SPD via I2C.

I do.  If we don't *need* it during boot, there's no reason to read it
then.  It only slows down boot.

Bjorn

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Linux 5.15-rc1
       [not found] ` <20210913141818.GA27911@codemonkey.org.uk>
  2021-09-13 18:59   ` Linux 5.15-rc1 Heiner Kallweit
@ 2021-09-13 23:46   ` Bjorn Helgaas
  2021-09-14  0:39     ` Dave Jones
  2021-09-14  6:21     ` Heiner Kallweit
  1 sibling, 2 replies; 35+ messages in thread
From: Bjorn Helgaas @ 2021-09-13 23:46 UTC (permalink / raw)
  To: Dave Jones, Linus Torvalds, Heiner Kallweit; +Cc: linux-kernel, linux-pci

On Mon, Sep 13, 2021 at 10:18:18AM -0400, Dave Jones wrote:
> On Sun, Sep 12, 2021 at 04:58:27PM -0700, Linus Torvalds wrote:
>  > So 5.15 isn't shaping up to be a particularly large release, at least
>  > in number of commits. At only just over 10k non-merge commits, this is
>  > in fact the smallest rc1 we have had in the 5.x series. We're usually
>  > hovering in the 12-14k commit range.
> 
> This release takes over two minutes longer to boot on one my
> machines than 5.14.  The time just seems to be unaccounted for, even
> with initcall_debug

> ...
> [    2.194093] pci 0000:01:00.0: calling  quirk_f0_vpd_link+0x0/0x60 @ 1
> [    2.194097] pci 0000:01:00.0: quirk_f0_vpd_link+0x0/0x60 took 0 usecs
> [    2.194100] pci 0000:01:00.0: [8086:10fb] type 00 class 0x020000
> [    2.194109] pci 0000:01:00.0: reg 0x10: [mem 0xd0080000-0xd00fffff 64bit pref]
> [    2.194113] pci 0000:01:00.0: reg 0x18: [io  0xe020-0xe03f]
> [    2.194121] pci 0000:01:00.0: reg 0x20: [mem 0xd0104000-0xd0107fff 64bit pref]
> [    2.194126] pci 0000:01:00.0: reg 0x30: [mem 0xdfd80000-0xdfdfffff pref]
> [    2.194136] pci 0000:01:00.0: calling  quirk_igfx_skip_te_disable+0x0/0x50 @ 1
> [    2.194139] pci 0000:01:00.0: quirk_igfx_skip_te_disable+0x0/0x50 took 0 usecs
> [    2.194164] pci 0000:01:00.0: PME# supported from D0 D3hot D3cold
> 
> * stall here for 86 seconds *
> 
> [   88.675114] pci 0000:01:00.0: reg 0x184: [mem 0x00000000-0x00003fff 64bit pref]
> ...


> 7bac54497c3e3b2ca37b7043f1fa78586540f10e is the first bad commit
> commit 7bac54497c3e3b2ca37b7043f1fa78586540f10e
> Author: Heiner Kallweit <hkallweit1@gmail.com>
> Date:   Sun Aug 8 19:22:52 2021 +0200
> 
>     PCI/VPD: Determine VPD size in pci_vpd_init()
> 
>     Determine VPD size in pci_vpd_init().
> 
>     Quirks set dev->vpd.len to a non-zero value, so they cause us to skip the
>     dynamic size calculation.  Prerequisite is that we move the quirks from
>     FINAL to HEADER so they are run before pci_vpd_init().
> 
>     Link: https://lore.kernel.org/r/cc4a6538-557a-294d-4f94-e6d1d3c91589@gmail.com
>     Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
>     Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
> 
> 
> Which unfortunately doesn't revert cleanly I can't test it reverted in
> isolation.
> 
> My guess is there's something quirky about the PCI bus on this machine
> that causes stalls until we hit timeout, but I'm not sure where to begin
> debugging this.

Sorry for the inconvenience of this, and thank you very much for doing
the bisection to track it down.

We *could* revert 7bac54497c3e, but it'd be messy because a bunch of
follow-up stuff depends on it.

I propose something like the patch below.  Would you mind trying it
out?


commit 4ede9949b93c ("PCI/VPD: Defer VPD sizing until first access")
Author: Bjorn Helgaas <bhelgaas@google.com>
Date:   Mon Sep 13 16:13:26 2021 -0500

    PCI/VPD: Defer VPD sizing until first access
    
    7bac54497c3e ("PCI/VPD: Determine VPD size in pci_vpd_init()") reads VPD at
    enumeration-time to find the size.  But this is quite slow, and we don't
    need the size until we actually need data from VPD.  Dave reported a boot
    slowdown of more than two minutes [1].
    
    Defer the VPD sizing until a driver or the user requests information from
    VPD.  If devices are quirked because VPD is known not to work, clear the
    vpd.cap pointer so we don't access it at all.
    
    [1] https://lore.kernel.org/r/20210913141818.GA27911@codemonkey.org.uk/
    Fixes: 7bac54497c3e ("PCI/VPD: Determine VPD size in pci_vpd_init()")
    Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>

diff --git a/drivers/pci/vpd.c b/drivers/pci/vpd.c
index 25557b272a4f..ca823ceee10c 100644
--- a/drivers/pci/vpd.c
+++ b/drivers/pci/vpd.c
@@ -46,13 +46,12 @@ static struct pci_dev *pci_get_func0_dev(struct pci_dev *dev)
 }
 
 #define PCI_VPD_MAX_SIZE	(PCI_VPD_ADDR_MASK + 1)
-#define PCI_VPD_SZ_INVALID	UINT_MAX
 
 /**
  * pci_vpd_size - determine actual size of Vital Product Data
  * @dev:	pci device struct
  */
-static size_t pci_vpd_size(struct pci_dev *dev)
+static void pci_vpd_size(struct pci_dev *dev)
 {
 	size_t off = 0, size;
 	unsigned char tag, header[1+2];	/* 1 byte tag, 2 bytes length */
@@ -71,7 +70,7 @@ static size_t pci_vpd_size(struct pci_dev *dev)
 			if (pci_read_vpd(dev, off + 1, 2, &header[1]) != 2) {
 				pci_warn(dev, "failed VPD read at offset %zu\n",
 					 off + 1);
-				return off ?: PCI_VPD_SZ_INVALID;
+				goto finish;
 			}
 			size = pci_vpd_lrdt_size(header);
 			if (off + size > PCI_VPD_MAX_SIZE)
@@ -87,16 +86,19 @@ static size_t pci_vpd_size(struct pci_dev *dev)
 
 			off += PCI_VPD_SRDT_TAG_SIZE + size;
 			if (tag == PCI_VPD_STIN_END)	/* End tag descriptor */
-				return off;
+				goto finish;
 		}
 	}
-	return off;
+	goto finish;
 
 error:
 	pci_info(dev, "invalid VPD tag %#04x (size %zu) at offset %zu%s\n",
 		 header[0], size, off, off == 0 ?
 		 "; assume missing optional EEPROM" : "");
-	return off ?: PCI_VPD_SZ_INVALID;
+finish:
+	dev->vpd.len = off;
+	if (off == 0)
+		dev->vpd.cap = 0;		/* No VPD at all */
 }
 
 /*
@@ -145,9 +147,6 @@ static ssize_t pci_vpd_read(struct pci_dev *dev, loff_t pos, size_t count,
 	loff_t end = pos + count;
 	u8 *buf = arg;
 
-	if (!vpd->cap)
-		return -ENODEV;
-
 	if (pos < 0)
 		return -EINVAL;
 
@@ -206,9 +205,6 @@ static ssize_t pci_vpd_write(struct pci_dev *dev, loff_t pos, size_t count,
 	loff_t end = pos + count;
 	int ret = 0;
 
-	if (!vpd->cap)
-		return -ENODEV;
-
 	if (pos < 0 || (pos & 3) || (count & 3))
 		return -EINVAL;
 
@@ -244,12 +240,6 @@ void pci_vpd_init(struct pci_dev *dev)
 {
 	dev->vpd.cap = pci_find_capability(dev, PCI_CAP_ID_VPD);
 	mutex_init(&dev->vpd.lock);
-
-	if (!dev->vpd.len)
-		dev->vpd.len = pci_vpd_size(dev);
-
-	if (dev->vpd.len == PCI_VPD_SZ_INVALID)
-		dev->vpd.cap = 0;
 }
 
 static ssize_t vpd_read(struct file *filp, struct kobject *kobj,
@@ -294,25 +284,29 @@ const struct attribute_group pci_dev_vpd_attr_group = {
 
 void *pci_vpd_alloc(struct pci_dev *dev, unsigned int *size)
 {
-	unsigned int len = dev->vpd.len;
+	struct pci_vpd *vpd = &dev->vpd;
+	unsigned int len;
 	void *buf;
 	int cnt;
 
-	if (!dev->vpd.cap)
+	if (!vpd->cap)
 		return ERR_PTR(-ENODEV);
 
 	buf = kmalloc(len, GFP_KERNEL);
 	if (!buf)
 		return ERR_PTR(-ENOMEM);
 
-	cnt = pci_read_vpd(dev, 0, len, buf);
+	if (vpd->len == 0)
+		pci_vpd_size(dev);
+
+	cnt = pci_read_vpd(dev, 0, vpd->len, buf);
 	if (cnt != len) {
 		kfree(buf);
 		return ERR_PTR(-EIO);
 	}
 
 	if (size)
-		*size = len;
+		*size = vpd->len;
 
 	return buf;
 }
@@ -374,6 +368,7 @@ static int pci_vpd_find_info_keyword(const u8 *buf, unsigned int off,
  */
 ssize_t pci_read_vpd(struct pci_dev *dev, loff_t pos, size_t count, void *buf)
 {
+	struct pci_vpd *vpd;
 	ssize_t ret;
 
 	if (dev->dev_flags & PCI_DEV_FLAGS_VPD_REF_F0) {
@@ -381,11 +376,27 @@ ssize_t pci_read_vpd(struct pci_dev *dev, loff_t pos, size_t count, void *buf)
 		if (!dev)
 			return -ENODEV;
 
+		vpd = &dev->vpd;
+		if (!vpd->cap) {
+			pci_dev_put(dev);
+			return -ENODEV;
+		}
+
+		if (vpd->len == 0)
+			pci_vpd_size(dev);
+
 		ret = pci_vpd_read(dev, pos, count, buf);
 		pci_dev_put(dev);
 		return ret;
 	}
 
+	vpd = &dev->vpd;
+	if (!vpd->cap)
+		return -ENODEV;
+
+	if (vpd->len == 0)
+		pci_vpd_size(dev);
+
 	return pci_vpd_read(dev, pos, count, buf);
 }
 EXPORT_SYMBOL(pci_read_vpd);
@@ -399,6 +410,7 @@ EXPORT_SYMBOL(pci_read_vpd);
  */
 ssize_t pci_write_vpd(struct pci_dev *dev, loff_t pos, size_t count, const void *buf)
 {
+	struct pci_vpd *vpd;
 	ssize_t ret;
 
 	if (dev->dev_flags & PCI_DEV_FLAGS_VPD_REF_F0) {
@@ -406,11 +418,26 @@ ssize_t pci_write_vpd(struct pci_dev *dev, loff_t pos, size_t count, const void
 		if (!dev)
 			return -ENODEV;
 
+		vpd = &dev->vpd;
+		if (!vpd->cap) {
+			pci_dev_put(dev);
+			return -ENODEV;
+		}
+
+		if (vpd->len == 0)
+			pci_vpd_size(dev);
+
 		ret = pci_vpd_write(dev, pos, count, buf);
 		pci_dev_put(dev);
 		return ret;
 	}
 
+	if (!vpd->cap)
+		return -ENODEV;
+
+	if (vpd->len == 0)
+		pci_vpd_size(dev);
+
 	return pci_vpd_write(dev, pos, count, buf);
 }
 EXPORT_SYMBOL(pci_write_vpd);
@@ -500,27 +527,27 @@ DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_INTEL, PCI_ANY_ID,
  */
 static void quirk_blacklist_vpd(struct pci_dev *dev)
 {
-	dev->vpd.len = PCI_VPD_SZ_INVALID;
+	dev->vpd.cap = 0;
 	pci_warn(dev, FW_BUG "disabling VPD access (can't determine size of non-standard VPD format)\n");
 }
-DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_LSI_LOGIC, 0x0060, quirk_blacklist_vpd);
-DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_LSI_LOGIC, 0x007c, quirk_blacklist_vpd);
-DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_LSI_LOGIC, 0x0413, quirk_blacklist_vpd);
-DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_LSI_LOGIC, 0x0078, quirk_blacklist_vpd);
-DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_LSI_LOGIC, 0x0079, quirk_blacklist_vpd);
-DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_LSI_LOGIC, 0x0073, quirk_blacklist_vpd);
-DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_LSI_LOGIC, 0x0071, quirk_blacklist_vpd);
-DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_LSI_LOGIC, 0x005b, quirk_blacklist_vpd);
-DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_LSI_LOGIC, 0x002f, quirk_blacklist_vpd);
-DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_LSI_LOGIC, 0x005d, quirk_blacklist_vpd);
-DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_LSI_LOGIC, 0x005f, quirk_blacklist_vpd);
-DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_ATTANSIC, PCI_ANY_ID, quirk_blacklist_vpd);
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_LSI_LOGIC, 0x0060, quirk_blacklist_vpd);
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_LSI_LOGIC, 0x007c, quirk_blacklist_vpd);
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_LSI_LOGIC, 0x0413, quirk_blacklist_vpd);
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_LSI_LOGIC, 0x0078, quirk_blacklist_vpd);
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_LSI_LOGIC, 0x0079, quirk_blacklist_vpd);
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_LSI_LOGIC, 0x0073, quirk_blacklist_vpd);
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_LSI_LOGIC, 0x0071, quirk_blacklist_vpd);
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_LSI_LOGIC, 0x005b, quirk_blacklist_vpd);
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_LSI_LOGIC, 0x002f, quirk_blacklist_vpd);
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_LSI_LOGIC, 0x005d, quirk_blacklist_vpd);
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_LSI_LOGIC, 0x005f, quirk_blacklist_vpd);
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ATTANSIC, PCI_ANY_ID, quirk_blacklist_vpd);
 /*
  * The Amazon Annapurna Labs 0x0031 device id is reused for other non Root Port
  * device types, so the quirk is registered for the PCI_CLASS_BRIDGE_PCI class.
  */
-DECLARE_PCI_FIXUP_CLASS_HEADER(PCI_VENDOR_ID_AMAZON_ANNAPURNA_LABS, 0x0031,
-			       PCI_CLASS_BRIDGE_PCI, 8, quirk_blacklist_vpd);
+DECLARE_PCI_FIXUP_CLASS_FINAL(PCI_VENDOR_ID_AMAZON_ANNAPURNA_LABS, 0x0031,
+			      PCI_CLASS_BRIDGE_PCI, 8, quirk_blacklist_vpd);
 
 static void quirk_chelsio_extend_vpd(struct pci_dev *dev)
 {
@@ -545,7 +572,7 @@ static void quirk_chelsio_extend_vpd(struct pci_dev *dev)
 		dev->vpd.len = 2048;
 }
 
-DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_CHELSIO, PCI_ANY_ID,
-			 quirk_chelsio_extend_vpd);
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_CHELSIO, PCI_ANY_ID,
+			quirk_chelsio_extend_vpd);
 
 #endif

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* Re: Linux 5.15-rc1
  2021-09-13 23:46   ` Bjorn Helgaas
@ 2021-09-14  0:39     ` Dave Jones
  2021-09-14  6:21     ` Heiner Kallweit
  1 sibling, 0 replies; 35+ messages in thread
From: Dave Jones @ 2021-09-14  0:39 UTC (permalink / raw)
  To: Bjorn Helgaas; +Cc: Linus Torvalds, Heiner Kallweit, linux-kernel, linux-pci

On Mon, Sep 13, 2021 at 06:46:08PM -0500, Bjorn Helgaas wrote:
 > On Mon, Sep 13, 2021 at 10:18:18AM -0400, Dave Jones wrote:
 > > On Sun, Sep 12, 2021 at 04:58:27PM -0700, Linus Torvalds wrote:
 > >  > So 5.15 isn't shaping up to be a particularly large release, at least
 > >  > in number of commits. At only just over 10k non-merge commits, this is
 > >  > in fact the smallest rc1 we have had in the 5.x series. We're usually
 > >  > hovering in the 12-14k commit range.
 > > 
 > > This release takes over two minutes longer to boot on one my
 > > machines than 5.14.  The time just seems to be unaccounted for, even
 > > with initcall_debug
 > 
 > Sorry for the inconvenience of this, and thank you very much for doing
 > the bisection to track it down.
 > 
 > We *could* revert 7bac54497c3e, but it'd be messy because a bunch of
 > follow-up stuff depends on it.
 > 
 > I propose something like the patch below.  Would you mind trying it
 > out?

This also fixes the problem for me.

	Dave

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Intel-wired-lan] Linux 5.15-rc1 - 82599ES VPD access isue
  2021-09-13 23:32             ` [Intel-wired-lan] " Hisashi T Fujinaka
@ 2021-09-14  5:51               ` Heiner Kallweit
  2021-09-14 14:24                 ` Dave Jones
  0 siblings, 1 reply; 35+ messages in thread
From: Heiner Kallweit @ 2021-09-14  5:51 UTC (permalink / raw)
  To: Dave Jones
  Cc: Jesse Brandeburg, Tony Nguyen, linux-pci@vger.kernel.org,
	Linux Kernel Mailing List, intel-wired-lan, Bjorn Helgaas,
	todd.fujinaka, Hisashi T Fujinaka

On 14.09.2021 01:32, Hisashi T Fujinaka wrote:
> On Mon, 13 Sep 2021, Heiner Kallweit wrote:
> 
>> On 13.09.2021 22:32, Dave Jones wrote:
>>
>> + Jesse and Tony as Intel NIC maintainers
>>
>>> On Mon, Sep 13, 2021 at 10:22:57PM +0200, Heiner Kallweit wrote:
>>>
>>> >> This didn't help I'm afraid :(
>>> >> It changed the VPD warning, but that's about it...
>>> >>
>>> >> [  184.235496] pci 0000:02:00.0: calling  quirk_blacklist_vpd+0x0/0x22 @ 1
>>> >> [  184.235499] pci 0000:02:00.0: [Firmware Bug]: disabling VPD access (can't determine size of non-standard VPD format)
>>> >> [  184.235501] pci 0000:02:00.0: quirk_blacklist_vpd+0x0/0x22 took 0 usecs
>>> >>
>>> > With this patch there's no VPD access to this device any longer. So this can't be
>>> > the root cause. Do you have any other PCI device that has VPD capability?
>>> > -> Capabilities: [...] Vital Product Data
>>>
>>>
>>> 01:00.0 Ethernet controller: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection (rev 01)
>>>         Subsystem: Device 1dcf:030a
>>>     ...
>>>             Capabilities: [e0] Vital Product Data
>>>                 Unknown small resource type 06, will not decode more.
>>>
>>
>> When searching I found the same symptom of invalid VPD data for 82599EB.
>> Do these adapters have non-VPD data in VPD address space? Or is the actual
>> VPD data at another offset than 0? I know that few Chelsio devices have
>> such a non-standard VPD structure.
>>
>>>
>>> I'll add that to the quirk list and see if that helps.
>>>
>>>     Dave
>>>
>> Heiner
> 
> Sorry to reply from my personal account. If I did it from my work
> account I'd be top-posting because of Outlook and that goes over like a
> lead balloon.
> 
> Anyway, can you send us a dump of your eeprom using ethtool -e? You can
> either send it via a bug on e1000.sourceforge.net or try sending it to
> todd.fujinaka@intel.com
> 
> The other thing is I'm wondering is what the subvendor device ID you
> have is referring to because it's not in the pci database. Some ODMs
> like getting creative with what they put in the NVM.
> 
> Todd Fujinaka (todd.fujinaka@intel.com)

Thanks for the prompt reply. Dave, could you please provide the requested
information?

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Linux 5.15-rc1
  2021-09-13 23:46   ` Bjorn Helgaas
  2021-09-14  0:39     ` Dave Jones
@ 2021-09-14  6:21     ` Heiner Kallweit
  2021-09-14 11:26       ` Bjorn Helgaas
  1 sibling, 1 reply; 35+ messages in thread
From: Heiner Kallweit @ 2021-09-14  6:21 UTC (permalink / raw)
  To: Bjorn Helgaas, Dave Jones, Linus Torvalds; +Cc: linux-kernel, linux-pci

On 14.09.2021 01:46, Bjorn Helgaas wrote:
> On Mon, Sep 13, 2021 at 10:18:18AM -0400, Dave Jones wrote:
>> On Sun, Sep 12, 2021 at 04:58:27PM -0700, Linus Torvalds wrote:
>>  > So 5.15 isn't shaping up to be a particularly large release, at least
>>  > in number of commits. At only just over 10k non-merge commits, this is
>>  > in fact the smallest rc1 we have had in the 5.x series. We're usually
>>  > hovering in the 12-14k commit range.
>>
>> This release takes over two minutes longer to boot on one my
>> machines than 5.14.  The time just seems to be unaccounted for, even
>> with initcall_debug
> 
>> ...
>> [    2.194093] pci 0000:01:00.0: calling  quirk_f0_vpd_link+0x0/0x60 @ 1
>> [    2.194097] pci 0000:01:00.0: quirk_f0_vpd_link+0x0/0x60 took 0 usecs
>> [    2.194100] pci 0000:01:00.0: [8086:10fb] type 00 class 0x020000
>> [    2.194109] pci 0000:01:00.0: reg 0x10: [mem 0xd0080000-0xd00fffff 64bit pref]
>> [    2.194113] pci 0000:01:00.0: reg 0x18: [io  0xe020-0xe03f]
>> [    2.194121] pci 0000:01:00.0: reg 0x20: [mem 0xd0104000-0xd0107fff 64bit pref]
>> [    2.194126] pci 0000:01:00.0: reg 0x30: [mem 0xdfd80000-0xdfdfffff pref]
>> [    2.194136] pci 0000:01:00.0: calling  quirk_igfx_skip_te_disable+0x0/0x50 @ 1
>> [    2.194139] pci 0000:01:00.0: quirk_igfx_skip_te_disable+0x0/0x50 took 0 usecs
>> [    2.194164] pci 0000:01:00.0: PME# supported from D0 D3hot D3cold
>>
>> * stall here for 86 seconds *
>>
>> [   88.675114] pci 0000:01:00.0: reg 0x184: [mem 0x00000000-0x00003fff 64bit pref]
>> ...
> 
> 
>> 7bac54497c3e3b2ca37b7043f1fa78586540f10e is the first bad commit
>> commit 7bac54497c3e3b2ca37b7043f1fa78586540f10e
>> Author: Heiner Kallweit <hkallweit1@gmail.com>
>> Date:   Sun Aug 8 19:22:52 2021 +0200
>>
>>     PCI/VPD: Determine VPD size in pci_vpd_init()
>>
>>     Determine VPD size in pci_vpd_init().
>>
>>     Quirks set dev->vpd.len to a non-zero value, so they cause us to skip the
>>     dynamic size calculation.  Prerequisite is that we move the quirks from
>>     FINAL to HEADER so they are run before pci_vpd_init().
>>
>>     Link: https://lore.kernel.org/r/cc4a6538-557a-294d-4f94-e6d1d3c91589@gmail.com
>>     Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
>>     Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
>>
>>
>> Which unfortunately doesn't revert cleanly I can't test it reverted in
>> isolation.
>>
>> My guess is there's something quirky about the PCI bus on this machine
>> that causes stalls until we hit timeout, but I'm not sure where to begin
>> debugging this.
> 
> Sorry for the inconvenience of this, and thank you very much for doing
> the bisection to track it down.
> 
> We *could* revert 7bac54497c3e, but it'd be messy because a bunch of
> follow-up stuff depends on it.
> 
> I propose something like the patch below.  Would you mind trying it
> out?
> 
> 
> commit 4ede9949b93c ("PCI/VPD: Defer VPD sizing until first access")
> Author: Bjorn Helgaas <bhelgaas@google.com>
> Date:   Mon Sep 13 16:13:26 2021 -0500
> 
>     PCI/VPD: Defer VPD sizing until first access
>     
>     7bac54497c3e ("PCI/VPD: Determine VPD size in pci_vpd_init()") reads VPD at
>     enumeration-time to find the size.  But this is quite slow, and we don't
>     need the size until we actually need data from VPD.  Dave reported a boot
>     slowdown of more than two minutes [1].
>     
>     Defer the VPD sizing until a driver or the user requests information from
>     VPD.  If devices are quirked because VPD is known not to work, clear the
>     vpd.cap pointer so we don't access it at all.
>     
>     [1] https://lore.kernel.org/r/20210913141818.GA27911@codemonkey.org.uk/
>     Fixes: 7bac54497c3e ("PCI/VPD: Determine VPD size in pci_vpd_init()")
>     Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
> 
> diff --git a/drivers/pci/vpd.c b/drivers/pci/vpd.c
> index 25557b272a4f..ca823ceee10c 100644
> --- a/drivers/pci/vpd.c
> +++ b/drivers/pci/vpd.c
> @@ -46,13 +46,12 @@ static struct pci_dev *pci_get_func0_dev(struct pci_dev *dev)
>  }
>  
>  #define PCI_VPD_MAX_SIZE	(PCI_VPD_ADDR_MASK + 1)
> -#define PCI_VPD_SZ_INVALID	UINT_MAX
>  
>  /**
>   * pci_vpd_size - determine actual size of Vital Product Data
>   * @dev:	pci device struct
>   */
> -static size_t pci_vpd_size(struct pci_dev *dev)
> +static void pci_vpd_size(struct pci_dev *dev)
>  {
>  	size_t off = 0, size;
>  	unsigned char tag, header[1+2];	/* 1 byte tag, 2 bytes length */
> @@ -71,7 +70,7 @@ static size_t pci_vpd_size(struct pci_dev *dev)
>  			if (pci_read_vpd(dev, off + 1, 2, &header[1]) != 2) {
>  				pci_warn(dev, "failed VPD read at offset %zu\n",
>  					 off + 1);
> -				return off ?: PCI_VPD_SZ_INVALID;
> +				goto finish;
>  			}
>  			size = pci_vpd_lrdt_size(header);
>  			if (off + size > PCI_VPD_MAX_SIZE)
> @@ -87,16 +86,19 @@ static size_t pci_vpd_size(struct pci_dev *dev)
>  
>  			off += PCI_VPD_SRDT_TAG_SIZE + size;
>  			if (tag == PCI_VPD_STIN_END)	/* End tag descriptor */
> -				return off;
> +				goto finish;
>  		}
>  	}
> -	return off;
> +	goto finish;
>  
>  error:
>  	pci_info(dev, "invalid VPD tag %#04x (size %zu) at offset %zu%s\n",
>  		 header[0], size, off, off == 0 ?
>  		 "; assume missing optional EEPROM" : "");
> -	return off ?: PCI_VPD_SZ_INVALID;
> +finish:
> +	dev->vpd.len = off;
> +	if (off == 0)
> +		dev->vpd.cap = 0;		/* No VPD at all */
>  }
>  
>  /*
> @@ -145,9 +147,6 @@ static ssize_t pci_vpd_read(struct pci_dev *dev, loff_t pos, size_t count,
>  	loff_t end = pos + count;
>  	u8 *buf = arg;
>  
> -	if (!vpd->cap)
> -		return -ENODEV;
> -
>  	if (pos < 0)
>  		return -EINVAL;
>  
> @@ -206,9 +205,6 @@ static ssize_t pci_vpd_write(struct pci_dev *dev, loff_t pos, size_t count,
>  	loff_t end = pos + count;
>  	int ret = 0;
>  
> -	if (!vpd->cap)
> -		return -ENODEV;
> -
>  	if (pos < 0 || (pos & 3) || (count & 3))
>  		return -EINVAL;
>  
> @@ -244,12 +240,6 @@ void pci_vpd_init(struct pci_dev *dev)
>  {
>  	dev->vpd.cap = pci_find_capability(dev, PCI_CAP_ID_VPD);
>  	mutex_init(&dev->vpd.lock);
> -
> -	if (!dev->vpd.len)
> -		dev->vpd.len = pci_vpd_size(dev);
> -
> -	if (dev->vpd.len == PCI_VPD_SZ_INVALID)
> -		dev->vpd.cap = 0;
>  }
>  
>  static ssize_t vpd_read(struct file *filp, struct kobject *kobj,
> @@ -294,25 +284,29 @@ const struct attribute_group pci_dev_vpd_attr_group = {
>  
>  void *pci_vpd_alloc(struct pci_dev *dev, unsigned int *size)
>  {
> -	unsigned int len = dev->vpd.len;
> +	struct pci_vpd *vpd = &dev->vpd;
> +	unsigned int len;
>  	void *buf;
>  	int cnt;
>  
> -	if (!dev->vpd.cap)
> +	if (!vpd->cap)
>  		return ERR_PTR(-ENODEV);
>  
>  	buf = kmalloc(len, GFP_KERNEL);
>  	if (!buf)
>  		return ERR_PTR(-ENOMEM);
>  
> -	cnt = pci_read_vpd(dev, 0, len, buf);
> +	if (vpd->len == 0)
> +		pci_vpd_size(dev);
> +
> +	cnt = pci_read_vpd(dev, 0, vpd->len, buf);
>  	if (cnt != len) {
>  		kfree(buf);
>  		return ERR_PTR(-EIO);
>  	}
>  
>  	if (size)
> -		*size = len;
> +		*size = vpd->len;
>  
>  	return buf;
>  }
> @@ -374,6 +368,7 @@ static int pci_vpd_find_info_keyword(const u8 *buf, unsigned int off,
>   */
>  ssize_t pci_read_vpd(struct pci_dev *dev, loff_t pos, size_t count, void *buf)
>  {
> +	struct pci_vpd *vpd;
>  	ssize_t ret;
>  
>  	if (dev->dev_flags & PCI_DEV_FLAGS_VPD_REF_F0) {
> @@ -381,11 +376,27 @@ ssize_t pci_read_vpd(struct pci_dev *dev, loff_t pos, size_t count, void *buf)
>  		if (!dev)
>  			return -ENODEV;
>  
> +		vpd = &dev->vpd;
> +		if (!vpd->cap) {
> +			pci_dev_put(dev);
> +			return -ENODEV;
> +		}
> +
> +		if (vpd->len == 0)
> +			pci_vpd_size(dev);
> +
>  		ret = pci_vpd_read(dev, pos, count, buf);
>  		pci_dev_put(dev);
>  		return ret;
>  	}
>  
> +	vpd = &dev->vpd;
> +	if (!vpd->cap)
> +		return -ENODEV;
> +
> +	if (vpd->len == 0)
> +		pci_vpd_size(dev);
> +
>  	return pci_vpd_read(dev, pos, count, buf);
>  }
>  EXPORT_SYMBOL(pci_read_vpd);
> @@ -399,6 +410,7 @@ EXPORT_SYMBOL(pci_read_vpd);
>   */
>  ssize_t pci_write_vpd(struct pci_dev *dev, loff_t pos, size_t count, const void *buf)
>  {
> +	struct pci_vpd *vpd;
>  	ssize_t ret;
>  
>  	if (dev->dev_flags & PCI_DEV_FLAGS_VPD_REF_F0) {
> @@ -406,11 +418,26 @@ ssize_t pci_write_vpd(struct pci_dev *dev, loff_t pos, size_t count, const void
>  		if (!dev)
>  			return -ENODEV;
>  
> +		vpd = &dev->vpd;
> +		if (!vpd->cap) {
> +			pci_dev_put(dev);
> +			return -ENODEV;
> +		}
> +
> +		if (vpd->len == 0)
> +			pci_vpd_size(dev);
> +
>  		ret = pci_vpd_write(dev, pos, count, buf);
>  		pci_dev_put(dev);
>  		return ret;
>  	}
>  
> +	if (!vpd->cap)
> +		return -ENODEV;
> +
> +	if (vpd->len == 0)
> +		pci_vpd_size(dev);
> +
>  	return pci_vpd_write(dev, pos, count, buf);
>  }
>  EXPORT_SYMBOL(pci_write_vpd);
> @@ -500,27 +527,27 @@ DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_INTEL, PCI_ANY_ID,
>   */
>  static void quirk_blacklist_vpd(struct pci_dev *dev)
>  {
> -	dev->vpd.len = PCI_VPD_SZ_INVALID;
> +	dev->vpd.cap = 0;
>  	pci_warn(dev, FW_BUG "disabling VPD access (can't determine size of non-standard VPD format)\n");
>  }
> -DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_LSI_LOGIC, 0x0060, quirk_blacklist_vpd);
> -DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_LSI_LOGIC, 0x007c, quirk_blacklist_vpd);
> -DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_LSI_LOGIC, 0x0413, quirk_blacklist_vpd);
> -DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_LSI_LOGIC, 0x0078, quirk_blacklist_vpd);
> -DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_LSI_LOGIC, 0x0079, quirk_blacklist_vpd);
> -DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_LSI_LOGIC, 0x0073, quirk_blacklist_vpd);
> -DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_LSI_LOGIC, 0x0071, quirk_blacklist_vpd);
> -DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_LSI_LOGIC, 0x005b, quirk_blacklist_vpd);
> -DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_LSI_LOGIC, 0x002f, quirk_blacklist_vpd);
> -DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_LSI_LOGIC, 0x005d, quirk_blacklist_vpd);
> -DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_LSI_LOGIC, 0x005f, quirk_blacklist_vpd);
> -DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_ATTANSIC, PCI_ANY_ID, quirk_blacklist_vpd);
> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_LSI_LOGIC, 0x0060, quirk_blacklist_vpd);
> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_LSI_LOGIC, 0x007c, quirk_blacklist_vpd);
> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_LSI_LOGIC, 0x0413, quirk_blacklist_vpd);
> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_LSI_LOGIC, 0x0078, quirk_blacklist_vpd);
> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_LSI_LOGIC, 0x0079, quirk_blacklist_vpd);
> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_LSI_LOGIC, 0x0073, quirk_blacklist_vpd);
> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_LSI_LOGIC, 0x0071, quirk_blacklist_vpd);
> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_LSI_LOGIC, 0x005b, quirk_blacklist_vpd);
> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_LSI_LOGIC, 0x002f, quirk_blacklist_vpd);
> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_LSI_LOGIC, 0x005d, quirk_blacklist_vpd);
> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_LSI_LOGIC, 0x005f, quirk_blacklist_vpd);
> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ATTANSIC, PCI_ANY_ID, quirk_blacklist_vpd);
>  /*

Leaving the quirks in FIXUP_HEADER stage would have the advantage that for
blacklisted devices the vpd sysfs attribute isn't visibale. The needed
changes to the patch are minimal.

>   * The Amazon Annapurna Labs 0x0031 device id is reused for other non Root Port
>   * device types, so the quirk is registered for the PCI_CLASS_BRIDGE_PCI class.
>   */
> -DECLARE_PCI_FIXUP_CLASS_HEADER(PCI_VENDOR_ID_AMAZON_ANNAPURNA_LABS, 0x0031,
> -			       PCI_CLASS_BRIDGE_PCI, 8, quirk_blacklist_vpd);
> +DECLARE_PCI_FIXUP_CLASS_FINAL(PCI_VENDOR_ID_AMAZON_ANNAPURNA_LABS, 0x0031,
> +			      PCI_CLASS_BRIDGE_PCI, 8, quirk_blacklist_vpd);
>  
>  static void quirk_chelsio_extend_vpd(struct pci_dev *dev)
>  {
> @@ -545,7 +572,7 @@ static void quirk_chelsio_extend_vpd(struct pci_dev *dev)
>  		dev->vpd.len = 2048;
>  }
>  
> -DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_CHELSIO, PCI_ANY_ID,
> -			 quirk_chelsio_extend_vpd);
> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_CHELSIO, PCI_ANY_ID,
> +			quirk_chelsio_extend_vpd);
>  
>  #endif
> 


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Linux 5.15-rc1
  2021-09-14  6:21     ` Heiner Kallweit
@ 2021-09-14 11:26       ` Bjorn Helgaas
  2021-09-14 17:07         ` Heiner Kallweit
  0 siblings, 1 reply; 35+ messages in thread
From: Bjorn Helgaas @ 2021-09-14 11:26 UTC (permalink / raw)
  To: Heiner Kallweit; +Cc: Dave Jones, Linus Torvalds, linux-kernel, linux-pci

On Tue, Sep 14, 2021 at 08:21:46AM +0200, Heiner Kallweit wrote:
> On 14.09.2021 01:46, Bjorn Helgaas wrote:

> > +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ATTANSIC, PCI_ANY_ID, quirk_blacklist_vpd);
> >  /*
> 
> Leaving the quirks in FIXUP_HEADER stage would have the advantage that for
> blacklisted devices the vpd sysfs attribute isn't visibale. The needed
> changes to the patch are minimal.

What do you have in mind?  The only thing I can think of would be to
add a "pci_dev.no_vpd" bit.  "vpd.cap == 0" means the device has no
VPD, and "vpd.len == 0" means we haven't determined the size yet.  All
devices start off with vpd.cap == 0 and vpd.len == 0, so a
FIXUP_HEADER quirk would have to set a sentinel value or some other
bit.

Bjorn

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Intel-wired-lan] Linux 5.15-rc1 - 82599ES VPD access isue
  2021-09-14  5:51               ` Heiner Kallweit
@ 2021-09-14 14:24                 ` Dave Jones
  2021-09-14 18:28                   ` Fujinaka, Todd
  2021-09-14 20:00                   ` Hisashi T Fujinaka
  0 siblings, 2 replies; 35+ messages in thread
From: Dave Jones @ 2021-09-14 14:24 UTC (permalink / raw)
  To: Heiner Kallweit
  Cc: Jesse Brandeburg, Tony Nguyen, linux-pci@vger.kernel.org,
	Linux Kernel Mailing List, intel-wired-lan, Bjorn Helgaas,
	todd.fujinaka, Hisashi T Fujinaka

On Tue, Sep 14, 2021 at 07:51:22AM +0200, Heiner Kallweit wrote:
 
 > > Sorry to reply from my personal account. If I did it from my work
 > > account I'd be top-posting because of Outlook and that goes over like a
 > > lead balloon.
 > > 
 > > Anyway, can you send us a dump of your eeprom using ethtool -e? You can
 > > either send it via a bug on e1000.sourceforge.net or try sending it to
 > > todd.fujinaka@intel.com
 > > 
 > > The other thing is I'm wondering is what the subvendor device ID you
 > > have is referring to because it's not in the pci database. Some ODMs
 > > like getting creative with what they put in the NVM.
 > > 
 > > Todd Fujinaka (todd.fujinaka@intel.com)
 > 
 > Thanks for the prompt reply. Dave, could you please provide the requested
 > information?

sent off-list.

	Dave

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Linux 5.15-rc1
  2021-09-14 11:26       ` Bjorn Helgaas
@ 2021-09-14 17:07         ` Heiner Kallweit
  2021-09-14 21:55           ` Bjorn Helgaas
  0 siblings, 1 reply; 35+ messages in thread
From: Heiner Kallweit @ 2021-09-14 17:07 UTC (permalink / raw)
  To: Bjorn Helgaas; +Cc: Dave Jones, Linus Torvalds, linux-kernel, linux-pci

On 14.09.2021 13:26, Bjorn Helgaas wrote:
> On Tue, Sep 14, 2021 at 08:21:46AM +0200, Heiner Kallweit wrote:
>> On 14.09.2021 01:46, Bjorn Helgaas wrote:
> 
>>> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ATTANSIC, PCI_ANY_ID, quirk_blacklist_vpd);
>>>  /*
>>
>> Leaving the quirks in FIXUP_HEADER stage would have the advantage that for
>> blacklisted devices the vpd sysfs attribute isn't visibale. The needed
>> changes to the patch are minimal.
> 
> What do you have in mind?  The only thing I can think of would be to
> add a "pci_dev.no_vpd" bit.  "vpd.cap == 0" means the device has no
> VPD, and "vpd.len == 0" means we haven't determined the size yet.  All
> devices start off with vpd.cap == 0 and vpd.len == 0, so a
> FIXUP_HEADER quirk would have to set a sentinel value or some other
> bit.
> 
> Bjorn
> 

Why not leave vpd.len == PCI_VPD_SZ_INVALID as sentinel?

And one more question: Why do you move the "if (!vpd->cap)" check from
pci_vpd_read() to pci_read_vpd()? At a first glance I see no benefit.

Here comes my version. Your changes to pci_vpd_size() I left as-is.
I tested the positive case and it works as expected.


diff --git a/drivers/pci/vpd.c b/drivers/pci/vpd.c
index 25557b272..04b14c488 100644
--- a/drivers/pci/vpd.c
+++ b/drivers/pci/vpd.c
@@ -52,7 +52,7 @@ static struct pci_dev *pci_get_func0_dev(struct pci_dev *dev)
  * pci_vpd_size - determine actual size of Vital Product Data
  * @dev:	pci device struct
  */
-static size_t pci_vpd_size(struct pci_dev *dev)
+static void pci_vpd_size(struct pci_dev *dev)
 {
 	size_t off = 0, size;
 	unsigned char tag, header[1+2];	/* 1 byte tag, 2 bytes length */
@@ -71,7 +71,7 @@ static size_t pci_vpd_size(struct pci_dev *dev)
 			if (pci_read_vpd(dev, off + 1, 2, &header[1]) != 2) {
 				pci_warn(dev, "failed VPD read at offset %zu\n",
 					 off + 1);
-				return off ?: PCI_VPD_SZ_INVALID;
+				goto finish;
 			}
 			size = pci_vpd_lrdt_size(header);
 			if (off + size > PCI_VPD_MAX_SIZE)
@@ -87,16 +87,19 @@ static size_t pci_vpd_size(struct pci_dev *dev)
 
 			off += PCI_VPD_SRDT_TAG_SIZE + size;
 			if (tag == PCI_VPD_STIN_END)	/* End tag descriptor */
-				return off;
+				goto finish;
 		}
 	}
-	return off;
+	goto finish;
 
 error:
 	pci_info(dev, "invalid VPD tag %#04x (size %zu) at offset %zu%s\n",
 		 header[0], size, off, off == 0 ?
 		 "; assume missing optional EEPROM" : "");
-	return off ?: PCI_VPD_SZ_INVALID;
+finish:
+	dev->vpd.len = off;
+	if (off == 0)
+		dev->vpd.cap = 0;		/* No VPD at all */
 }
 
 /*
@@ -145,6 +148,8 @@ static ssize_t pci_vpd_read(struct pci_dev *dev, loff_t pos, size_t count,
 	loff_t end = pos + count;
 	u8 *buf = arg;
 
+	if (vpd->len == 0 && vpd->cap)
+		pci_vpd_size(dev);
 	if (!vpd->cap)
 		return -ENODEV;
 
@@ -206,6 +211,8 @@ static ssize_t pci_vpd_write(struct pci_dev *dev, loff_t pos, size_t count,
 	loff_t end = pos + count;
 	int ret = 0;
 
+	if (vpd->len == 0 && vpd->cap)
+		pci_vpd_size(dev);
 	if (!vpd->cap)
 		return -ENODEV;
 
@@ -245,9 +252,6 @@ void pci_vpd_init(struct pci_dev *dev)
 	dev->vpd.cap = pci_find_capability(dev, PCI_CAP_ID_VPD);
 	mutex_init(&dev->vpd.lock);
 
-	if (!dev->vpd.len)
-		dev->vpd.len = pci_vpd_size(dev);
-
 	if (dev->vpd.len == PCI_VPD_SZ_INVALID)
 		dev->vpd.cap = 0;
 }
@@ -294,25 +298,27 @@ const struct attribute_group pci_dev_vpd_attr_group = {
 
 void *pci_vpd_alloc(struct pci_dev *dev, unsigned int *size)
 {
-	unsigned int len = dev->vpd.len;
+	struct pci_vpd *vpd = &dev->vpd;
 	void *buf;
 	int cnt;
 
-	if (!dev->vpd.cap)
+	if (vpd->len == 0 && vpd->cap)
+		pci_vpd_size(dev);
+	if (!vpd->cap)
 		return ERR_PTR(-ENODEV);
 
-	buf = kmalloc(len, GFP_KERNEL);
+	buf = kmalloc(vpd->len, GFP_KERNEL);
 	if (!buf)
 		return ERR_PTR(-ENOMEM);
 
-	cnt = pci_read_vpd(dev, 0, len, buf);
-	if (cnt != len) {
+	cnt = pci_read_vpd(dev, 0, vpd->len, buf);
+	if (cnt != vpd->len) {
 		kfree(buf);
 		return ERR_PTR(-EIO);
 	}
 
 	if (size)
-		*size = len;
+		*size = vpd->len;
 
 	return buf;
 }
-- 
2.33.0





^ permalink raw reply related	[flat|nested] 35+ messages in thread

* RE: [Intel-wired-lan] Linux 5.15-rc1 - 82599ES VPD access isue
  2021-09-14 14:24                 ` Dave Jones
@ 2021-09-14 18:28                   ` Fujinaka, Todd
  2021-09-14 20:00                   ` Hisashi T Fujinaka
  1 sibling, 0 replies; 35+ messages in thread
From: Fujinaka, Todd @ 2021-09-14 18:28 UTC (permalink / raw)
  To: Dave Jones, Heiner Kallweit
  Cc: Brandeburg, Jesse, Nguyen, Anthony L, linux-pci@vger.kernel.org,
	Linux Kernel Mailing List, intel-wired-lan, Bjorn Helgaas,
	Hisashi T Fujinaka

Wow, I'm waiting for the hardware guy to look at this but this is an off-brand 10Gtek NIC from Amazon that just has nonsense data in the VPD as far as I can tell (3's).

I'll let you know if I find out that the critical settings are bad, but I would just ignore any VPD errors.

Todd Fujinaka
Software Application Engineer
Ethernet Products Group
Intel Corporation
todd.fujinaka@intel.com

-----Original Message-----
From: Dave Jones <davej@codemonkey.org.uk> 
Sent: Tuesday, September 14, 2021 7:24 AM
To: Heiner Kallweit <hkallweit1@gmail.com>
Cc: Brandeburg, Jesse <jesse.brandeburg@intel.com>; Nguyen, Anthony L <anthony.l.nguyen@intel.com>; linux-pci@vger.kernel.org; Linux Kernel Mailing List <linux-kernel@vger.kernel.org>; intel-wired-lan <intel-wired-lan@lists.osuosl.org>; Bjorn Helgaas <bhelgaas@google.com>; Fujinaka, Todd <todd.fujinaka@intel.com>; Hisashi T Fujinaka <htodd@twofifty.com>
Subject: Re: [Intel-wired-lan] Linux 5.15-rc1 - 82599ES VPD access isue

On Tue, Sep 14, 2021 at 07:51:22AM +0200, Heiner Kallweit wrote:
 
 > > Sorry to reply from my personal account. If I did it from my work  > > account I'd be top-posting because of Outlook and that goes over like a  > > lead balloon.
 > >
 > > Anyway, can you send us a dump of your eeprom using ethtool -e? You can  > > either send it via a bug on e1000.sourceforge.net or try sending it to  > > todd.fujinaka@intel.com  > >  > > The other thing is I'm wondering is what the subvendor device ID you  > > have is referring to because it's not in the pci database. Some ODMs  > > like getting creative with what they put in the NVM.
 > >
 > > Todd Fujinaka (todd.fujinaka@intel.com)  >  > Thanks for the prompt reply. Dave, could you please provide the requested  > information?

sent off-list.

	Dave

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Intel-wired-lan] Linux 5.15-rc1 - 82599ES VPD access isue
  2021-09-14 14:24                 ` Dave Jones
  2021-09-14 18:28                   ` Fujinaka, Todd
@ 2021-09-14 20:00                   ` Hisashi T Fujinaka
  2021-09-14 21:51                     ` Heiner Kallweit
  1 sibling, 1 reply; 35+ messages in thread
From: Hisashi T Fujinaka @ 2021-09-14 20:00 UTC (permalink / raw)
  To: Dave Jones
  Cc: Heiner Kallweit, linux-pci@vger.kernel.org,
	Linux Kernel Mailing List, intel-wired-lan, Bjorn Helgaas

On Tue, 14 Sep 2021, Dave Jones wrote:

> On Tue, Sep 14, 2021 at 07:51:22AM +0200, Heiner Kallweit wrote:
>
> > > Sorry to reply from my personal account. If I did it from my work
> > > account I'd be top-posting because of Outlook and that goes over like a
> > > lead balloon.
> > >
> > > Anyway, can you send us a dump of your eeprom using ethtool -e? You can
> > > either send it via a bug on e1000.sourceforge.net or try sending it to
> > > todd.fujinaka@intel.com
> > >
> > > The other thing is I'm wondering is what the subvendor device ID you
> > > have is referring to because it's not in the pci database. Some ODMs
> > > like getting creative with what they put in the NVM.
> > >
> > > Todd Fujinaka (todd.fujinaka@intel.com)
> >
> > Thanks for the prompt reply. Dave, could you please provide the requested
> > information?
>
> sent off-list.
>
> 	Dave

Whoops. I replied from outlook again.

I have confirmation that this should be a valid image. The VPD is just a
series of 3's. There are changes to preboot header, flash and BAR size,
and as far as I can tell, a nonsense subdevice ID, but this should work.

What was the original question?

Todd Fujinaka <todd.fujinaka@intel.com>

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Intel-wired-lan] Linux 5.15-rc1 - 82599ES VPD access isue
  2021-09-14 20:00                   ` Hisashi T Fujinaka
@ 2021-09-14 21:51                     ` Heiner Kallweit
  2021-09-15 14:18                       ` Hisashi T Fujinaka
  0 siblings, 1 reply; 35+ messages in thread
From: Heiner Kallweit @ 2021-09-14 21:51 UTC (permalink / raw)
  To: Hisashi T Fujinaka, Dave Jones
  Cc: linux-pci@vger.kernel.org, Linux Kernel Mailing List,
	intel-wired-lan, Bjorn Helgaas

On 14.09.2021 22:00, Hisashi T Fujinaka wrote:
> On Tue, 14 Sep 2021, Dave Jones wrote:
> 
>> On Tue, Sep 14, 2021 at 07:51:22AM +0200, Heiner Kallweit wrote:
>>
>> > > Sorry to reply from my personal account. If I did it from my work
>> > > account I'd be top-posting because of Outlook and that goes over like a
>> > > lead balloon.
>> > >
>> > > Anyway, can you send us a dump of your eeprom using ethtool -e? You can
>> > > either send it via a bug on e1000.sourceforge.net or try sending it to
>> > > todd.fujinaka@intel.com
>> > >
>> > > The other thing is I'm wondering is what the subvendor device ID you
>> > > have is referring to because it's not in the pci database. Some ODMs
>> > > like getting creative with what they put in the NVM.
>> > >
>> > > Todd Fujinaka (todd.fujinaka@intel.com)
>> >
>> > Thanks for the prompt reply. Dave, could you please provide the requested
>> > information?
>>
>> sent off-list.
>>
>>     Dave
> 
> Whoops. I replied from outlook again.
> 
> I have confirmation that this should be a valid image. The VPD is just a
> series of 3's. There are changes to preboot header, flash and BAR size,
> and as far as I can tell, a nonsense subdevice ID, but this should work.
> 
> What was the original question?
> 
"lspci -vv" complains about an invalid short tag 0x06 and the PCI VPD
code resulted in a stall. So it seems the data doesn't have valid VPD
format as defined in PCI specification.

01:00.0 Ethernet controller: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection (rev 01)
        Subsystem: Device 1dcf:030a
	...
	        Capabilities: [e0] Vital Product Data
                *Unknown small resource type 06, will not decode more.*

Not sure which method is used by the driver to get the EEPROM content.
For the issue here is relevant what is exposed via PCI VPD.

The related kernel error message has been reported few times, e.g. here:
https://access.redhat.com/solutions/3001451
Only due to a change in kernel code this became a more prominent
issue now.

You say that VPD is just a series of 3's. This may explain why kernel and
tools complain about an invalid VPD format. VPD misses the tag structure.

> Todd Fujinaka <todd.fujinaka@intel.com>


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Linux 5.15-rc1
  2021-09-14 17:07         ` Heiner Kallweit
@ 2021-09-14 21:55           ` Bjorn Helgaas
  2021-09-14 22:06             ` Heiner Kallweit
  0 siblings, 1 reply; 35+ messages in thread
From: Bjorn Helgaas @ 2021-09-14 21:55 UTC (permalink / raw)
  To: Heiner Kallweit; +Cc: Dave Jones, Linus Torvalds, linux-kernel, linux-pci

On Tue, Sep 14, 2021 at 07:07:40PM +0200, Heiner Kallweit wrote:
> On 14.09.2021 13:26, Bjorn Helgaas wrote:
> > On Tue, Sep 14, 2021 at 08:21:46AM +0200, Heiner Kallweit wrote:
> >> On 14.09.2021 01:46, Bjorn Helgaas wrote:
> > 
> >>> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ATTANSIC, PCI_ANY_ID, quirk_blacklist_vpd);
> >>>  /*
> >>
> >> Leaving the quirks in FIXUP_HEADER stage would have the advantage that for
> >> blacklisted devices the vpd sysfs attribute isn't visibale. The needed
> >> changes to the patch are minimal.
> > 
> > What do you have in mind?  The only thing I can think of would be to
> > add a "pci_dev.no_vpd" bit.  "vpd.cap == 0" means the device has no
> > VPD, and "vpd.len == 0" means we haven't determined the size yet.  All
> > devices start off with vpd.cap == 0 and vpd.len == 0, so a
> > FIXUP_HEADER quirk would have to set a sentinel value or some other
> > bit.
> 
> Why not leave vpd.len == PCI_VPD_SZ_INVALID as sentinel?

Sentinel values aren't really my favorite thing, but it certainly does
have the advantage of hiding the sysfs attribute.

> And one more question: Why do you move the "if (!vpd->cap)" check from
> pci_vpd_read() to pci_read_vpd()? At a first glance I see no benefit.

I'm pretty sure I *had* a reason, but I can't remember right now :(
Moving it sure does uglify pci_read_vpd() and pci_write_vpd(), though.

What do you think of the following?  (This is a diff from v5.15-rc1.)

diff --git a/drivers/pci/vpd.c b/drivers/pci/vpd.c
index 25557b272a4f..4be24890132e 100644
--- a/drivers/pci/vpd.c
+++ b/drivers/pci/vpd.c
@@ -99,6 +99,24 @@ static size_t pci_vpd_size(struct pci_dev *dev)
 	return off ?: PCI_VPD_SZ_INVALID;
 }
 
+static bool pci_vpd_available(struct pci_dev *dev)
+{
+	struct pci_vpd *vpd = &dev->vpd;
+
+	if (!vpd->cap)
+		return false;
+
+	if (vpd->len == 0) {
+		vpd->len = pci_vpd_size(dev);
+		if (vpd->len == PCI_VPD_SZ_INVALID) {
+			vpd->cap = 0;
+			return false;
+		}
+	}
+
+	return true;
+}
+
 /*
  * Wait for last operation to complete.
  * This code has to spin since there is no other notification from the PCI
@@ -145,7 +163,7 @@ static ssize_t pci_vpd_read(struct pci_dev *dev, loff_t pos, size_t count,
 	loff_t end = pos + count;
 	u8 *buf = arg;
 
-	if (!vpd->cap)
+	if (!pci_vpd_available(dev))
 		return -ENODEV;
 
 	if (pos < 0)
@@ -206,7 +224,7 @@ static ssize_t pci_vpd_write(struct pci_dev *dev, loff_t pos, size_t count,
 	loff_t end = pos + count;
 	int ret = 0;
 
-	if (!vpd->cap)
+	if (!pci_vpd_available(dev))
 		return -ENODEV;
 
 	if (pos < 0 || (pos & 3) || (count & 3))
@@ -242,14 +260,11 @@ static ssize_t pci_vpd_write(struct pci_dev *dev, loff_t pos, size_t count,
 
 void pci_vpd_init(struct pci_dev *dev)
 {
+	if (dev->vpd.len == PCI_VPD_SZ_INVALID)
+		return;
+
 	dev->vpd.cap = pci_find_capability(dev, PCI_CAP_ID_VPD);
 	mutex_init(&dev->vpd.lock);
-
-	if (!dev->vpd.len)
-		dev->vpd.len = pci_vpd_size(dev);
-
-	if (dev->vpd.len == PCI_VPD_SZ_INVALID)
-		dev->vpd.cap = 0;
 }
 
 static ssize_t vpd_read(struct file *filp, struct kobject *kobj,
@@ -294,13 +309,14 @@ const struct attribute_group pci_dev_vpd_attr_group = {
 
 void *pci_vpd_alloc(struct pci_dev *dev, unsigned int *size)
 {
-	unsigned int len = dev->vpd.len;
+	unsigned int len;
 	void *buf;
 	int cnt;
 
-	if (!dev->vpd.cap)
+	if (!pci_vpd_available(dev))
 		return ERR_PTR(-ENODEV);
 
+	len = dev->vpd.len;
 	buf = kmalloc(len, GFP_KERNEL);
 	if (!buf)
 		return ERR_PTR(-ENOMEM);

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* Re: Linux 5.15-rc1
  2021-09-14 21:55           ` Bjorn Helgaas
@ 2021-09-14 22:06             ` Heiner Kallweit
  2021-09-14 22:33               ` Dave Jones
  0 siblings, 1 reply; 35+ messages in thread
From: Heiner Kallweit @ 2021-09-14 22:06 UTC (permalink / raw)
  To: Bjorn Helgaas; +Cc: Dave Jones, Linus Torvalds, linux-kernel, linux-pci

On 14.09.2021 23:55, Bjorn Helgaas wrote:
> On Tue, Sep 14, 2021 at 07:07:40PM +0200, Heiner Kallweit wrote:
>> On 14.09.2021 13:26, Bjorn Helgaas wrote:
>>> On Tue, Sep 14, 2021 at 08:21:46AM +0200, Heiner Kallweit wrote:
>>>> On 14.09.2021 01:46, Bjorn Helgaas wrote:
>>>
>>>>> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ATTANSIC, PCI_ANY_ID, quirk_blacklist_vpd);
>>>>>  /*
>>>>
>>>> Leaving the quirks in FIXUP_HEADER stage would have the advantage that for
>>>> blacklisted devices the vpd sysfs attribute isn't visibale. The needed
>>>> changes to the patch are minimal.
>>>
>>> What do you have in mind?  The only thing I can think of would be to
>>> add a "pci_dev.no_vpd" bit.  "vpd.cap == 0" means the device has no
>>> VPD, and "vpd.len == 0" means we haven't determined the size yet.  All
>>> devices start off with vpd.cap == 0 and vpd.len == 0, so a
>>> FIXUP_HEADER quirk would have to set a sentinel value or some other
>>> bit.
>>
>> Why not leave vpd.len == PCI_VPD_SZ_INVALID as sentinel?
> 
> Sentinel values aren't really my favorite thing, but it certainly does
> have the advantage of hiding the sysfs attribute.
> 
>> And one more question: Why do you move the "if (!vpd->cap)" check from
>> pci_vpd_read() to pci_read_vpd()? At a first glance I see no benefit.
> 
> I'm pretty sure I *had* a reason, but I can't remember right now :(
> Moving it sure does uglify pci_read_vpd() and pci_write_vpd(), though.
> 
> What do you think of the following?  (This is a diff from v5.15-rc1.)
> 
> diff --git a/drivers/pci/vpd.c b/drivers/pci/vpd.c
> index 25557b272a4f..4be24890132e 100644
> --- a/drivers/pci/vpd.c
> +++ b/drivers/pci/vpd.c
> @@ -99,6 +99,24 @@ static size_t pci_vpd_size(struct pci_dev *dev)
>  	return off ?: PCI_VPD_SZ_INVALID;
>  }
>  
> +static bool pci_vpd_available(struct pci_dev *dev)
> +{
> +	struct pci_vpd *vpd = &dev->vpd;
> +
> +	if (!vpd->cap)
> +		return false;
> +
> +	if (vpd->len == 0) {
> +		vpd->len = pci_vpd_size(dev);
> +		if (vpd->len == PCI_VPD_SZ_INVALID) {
> +			vpd->cap = 0;
> +			return false;
> +		}
> +	}
> +
> +	return true;
> +}
> +
>  /*
>   * Wait for last operation to complete.
>   * This code has to spin since there is no other notification from the PCI
> @@ -145,7 +163,7 @@ static ssize_t pci_vpd_read(struct pci_dev *dev, loff_t pos, size_t count,
>  	loff_t end = pos + count;
>  	u8 *buf = arg;
>  
> -	if (!vpd->cap)
> +	if (!pci_vpd_available(dev))
>  		return -ENODEV;
>  
>  	if (pos < 0)
> @@ -206,7 +224,7 @@ static ssize_t pci_vpd_write(struct pci_dev *dev, loff_t pos, size_t count,
>  	loff_t end = pos + count;
>  	int ret = 0;
>  
> -	if (!vpd->cap)
> +	if (!pci_vpd_available(dev))
>  		return -ENODEV;
>  
>  	if (pos < 0 || (pos & 3) || (count & 3))
> @@ -242,14 +260,11 @@ static ssize_t pci_vpd_write(struct pci_dev *dev, loff_t pos, size_t count,
>  
>  void pci_vpd_init(struct pci_dev *dev)
>  {
> +	if (dev->vpd.len == PCI_VPD_SZ_INVALID)
> +		return;
> +
>  	dev->vpd.cap = pci_find_capability(dev, PCI_CAP_ID_VPD);
>  	mutex_init(&dev->vpd.lock);
> -
> -	if (!dev->vpd.len)
> -		dev->vpd.len = pci_vpd_size(dev);
> -
> -	if (dev->vpd.len == PCI_VPD_SZ_INVALID)
> -		dev->vpd.cap = 0;
>  }
>  
>  static ssize_t vpd_read(struct file *filp, struct kobject *kobj,
> @@ -294,13 +309,14 @@ const struct attribute_group pci_dev_vpd_attr_group = {
>  
>  void *pci_vpd_alloc(struct pci_dev *dev, unsigned int *size)
>  {
> -	unsigned int len = dev->vpd.len;
> +	unsigned int len;
>  	void *buf;
>  	int cnt;
>  
> -	if (!dev->vpd.cap)
> +	if (!pci_vpd_available(dev))
>  		return ERR_PTR(-ENODEV);
>  
> +	len = dev->vpd.len;
>  	buf = kmalloc(len, GFP_KERNEL);
>  	if (!buf)
>  		return ERR_PTR(-ENOMEM);
> 

This looks very good to me.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Linux 5.15-rc1
  2021-09-14 22:06             ` Heiner Kallweit
@ 2021-09-14 22:33               ` Dave Jones
  2021-09-15 21:31                 ` Bjorn Helgaas
  0 siblings, 1 reply; 35+ messages in thread
From: Dave Jones @ 2021-09-14 22:33 UTC (permalink / raw)
  To: Heiner Kallweit; +Cc: Bjorn Helgaas, Linus Torvalds, linux-kernel, linux-pci

On Wed, Sep 15, 2021 at 12:06:33AM +0200, Heiner Kallweit wrote:

 > > What do you think of the following?  (This is a diff from v5.15-rc1.)
 > > 
 > 
 > This looks very good to me.

fwiw, I tested this too, and it still works.

	Dave

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Intel-wired-lan] Linux 5.15-rc1 - 82599ES VPD access isue
  2021-09-14 21:51                     ` Heiner Kallweit
@ 2021-09-15 14:18                       ` Hisashi T Fujinaka
  2021-09-15 16:05                         ` Heiner Kallweit
  0 siblings, 1 reply; 35+ messages in thread
From: Hisashi T Fujinaka @ 2021-09-15 14:18 UTC (permalink / raw)
  To: Heiner Kallweit
  Cc: Dave Jones, linux-pci@vger.kernel.org, Linux Kernel Mailing List,
	intel-wired-lan, Bjorn Helgaas

On Tue, 14 Sep 2021, Heiner Kallweit wrote:

> On 14.09.2021 22:00, Hisashi T Fujinaka wrote:
>> On Tue, 14 Sep 2021, Dave Jones wrote:
>>
>>> On Tue, Sep 14, 2021 at 07:51:22AM +0200, Heiner Kallweit wrote:
>>>
>>>>> Sorry to reply from my personal account. If I did it from my work
>>>>> account I'd be top-posting because of Outlook and that goes over like a
>>>>> lead balloon.
>>>>>
>>>>> Anyway, can you send us a dump of your eeprom using ethtool -e? You can
>>>>> either send it via a bug on e1000.sourceforge.net or try sending it to
>>>>> todd.fujinaka@intel.com
>>>>>
>>>>> The other thing is I'm wondering is what the subvendor device ID you
>>>>> have is referring to because it's not in the pci database. Some ODMs
>>>>> like getting creative with what they put in the NVM.
>>>>>
>>>>> Todd Fujinaka (todd.fujinaka@intel.com)
>>>>
>>>> Thanks for the prompt reply. Dave, could you please provide the requested
>>>> information?
>>>
>>> sent off-list.
>>>
>>>     Dave
>>
>> Whoops. I replied from outlook again.
>>
>> I have confirmation that this should be a valid image. The VPD is just a
>> series of 3's. There are changes to preboot header, flash and BAR size,
>> and as far as I can tell, a nonsense subdevice ID, but this should work.
>>
>> What was the original question?
>>
> "lspci -vv" complains about an invalid short tag 0x06 and the PCI VPD
> code resulted in a stall. So it seems the data doesn't have valid VPD
> format as defined in PCI specification.
>
> 01:00.0 Ethernet controller: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection (rev 01)
>        Subsystem: Device 1dcf:030a
> 	...
> 	        Capabilities: [e0] Vital Product Data
>                *Unknown small resource type 06, will not decode more.*
>
> Not sure which method is used by the driver to get the EEPROM content.
> For the issue here is relevant what is exposed via PCI VPD.
>
> The related kernel error message has been reported few times, e.g. here:
> https://access.redhat.com/solutions/3001451
> Only due to a change in kernel code this became a more prominent
> issue now.
>
> You say that VPD is just a series of 3's. This may explain why kernel and
> tools complain about an invalid VPD format. VPD misses the tag structure.

I think I conflated two issues and yours may not be the one with the
weird Amazon NIC. In any case, the VPD does not match the spec and two
people have confirmed it's just full of 3's. With the bogus subvendor
ID, I'm thinking this is not an Intel NIC.

Next step is to contact whoever made the NIC and ask them for guidance.

Todd Fujinaka <todd.fujinaka@intel.com>

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Intel-wired-lan] Linux 5.15-rc1 - 82599ES VPD access isue
  2021-09-15 14:18                       ` Hisashi T Fujinaka
@ 2021-09-15 16:05                         ` Heiner Kallweit
  2021-09-15 16:16                           ` Hisashi T Fujinaka
  0 siblings, 1 reply; 35+ messages in thread
From: Heiner Kallweit @ 2021-09-15 16:05 UTC (permalink / raw)
  To: Hisashi T Fujinaka
  Cc: Dave Jones, linux-pci@vger.kernel.org, Linux Kernel Mailing List,
	intel-wired-lan, Bjorn Helgaas

On 15.09.2021 16:18, Hisashi T Fujinaka wrote:
> On Tue, 14 Sep 2021, Heiner Kallweit wrote:
> 
>> On 14.09.2021 22:00, Hisashi T Fujinaka wrote:
>>> On Tue, 14 Sep 2021, Dave Jones wrote:
>>>
>>>> On Tue, Sep 14, 2021 at 07:51:22AM +0200, Heiner Kallweit wrote:
>>>>
>>>>>> Sorry to reply from my personal account. If I did it from my work
>>>>>> account I'd be top-posting because of Outlook and that goes over like a
>>>>>> lead balloon.
>>>>>>
>>>>>> Anyway, can you send us a dump of your eeprom using ethtool -e? You can
>>>>>> either send it via a bug on e1000.sourceforge.net or try sending it to
>>>>>> todd.fujinaka@intel.com
>>>>>>
>>>>>> The other thing is I'm wondering is what the subvendor device ID you
>>>>>> have is referring to because it's not in the pci database. Some ODMs
>>>>>> like getting creative with what they put in the NVM.
>>>>>>
>>>>>> Todd Fujinaka (todd.fujinaka@intel.com)
>>>>>
>>>>> Thanks for the prompt reply. Dave, could you please provide the requested
>>>>> information?
>>>>
>>>> sent off-list.
>>>>
>>>>     Dave
>>>
>>> Whoops. I replied from outlook again.
>>>
>>> I have confirmation that this should be a valid image. The VPD is just a
>>> series of 3's. There are changes to preboot header, flash and BAR size,
>>> and as far as I can tell, a nonsense subdevice ID, but this should work.
>>>
>>> What was the original question?
>>>
>> "lspci -vv" complains about an invalid short tag 0x06 and the PCI VPD
>> code resulted in a stall. So it seems the data doesn't have valid VPD
>> format as defined in PCI specification.
>>
>> 01:00.0 Ethernet controller: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection (rev 01)
>>        Subsystem: Device 1dcf:030a
>>     ...
>>             Capabilities: [e0] Vital Product Data
>>                *Unknown small resource type 06, will not decode more.*
>>
>> Not sure which method is used by the driver to get the EEPROM content.
>> For the issue here is relevant what is exposed via PCI VPD.
>>
>> The related kernel error message has been reported few times, e.g. here:
>> https://access.redhat.com/solutions/3001451
>> Only due to a change in kernel code this became a more prominent
>> issue now.
>>
>> You say that VPD is just a series of 3's. This may explain why kernel and
>> tools complain about an invalid VPD format. VPD misses the tag structure.
> 
> I think I conflated two issues and yours may not be the one with the
> weird Amazon NIC. In any case, the VPD does not match the spec and two
> people have confirmed it's just full of 3's. With the bogus subvendor
> ID, I'm thinking this is not an Intel NIC.
> 
> Next step is to contact whoever made the NIC and ask them for guidance.
> 
In an earlier mail in this thread was stated that subvendor id is unknown.
Checking here https://pcisig.com/membership/member-companies?combine=1dcf
it says: Beijing Sinead Technology Co., Ltd.

> Todd Fujinaka <todd.fujinaka@intel.com>


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Intel-wired-lan] Linux 5.15-rc1 - 82599ES VPD access isue
  2021-09-15 16:05                         ` Heiner Kallweit
@ 2021-09-15 16:16                           ` Hisashi T Fujinaka
  2021-09-15 22:32                             ` Bjorn Helgaas
  0 siblings, 1 reply; 35+ messages in thread
From: Hisashi T Fujinaka @ 2021-09-15 16:16 UTC (permalink / raw)
  To: Heiner Kallweit
  Cc: Dave Jones, linux-pci@vger.kernel.org, Linux Kernel Mailing List,
	intel-wired-lan, Bjorn Helgaas

On Wed, 15 Sep 2021, Heiner Kallweit wrote:

> On 15.09.2021 16:18, Hisashi T Fujinaka wrote:
>> On Tue, 14 Sep 2021, Heiner Kallweit wrote:
>>
>>> On 14.09.2021 22:00, Hisashi T Fujinaka wrote:
>>>> On Tue, 14 Sep 2021, Dave Jones wrote:
>>>>
>>>>> On Tue, Sep 14, 2021 at 07:51:22AM +0200, Heiner Kallweit wrote:
>>>>>
>>>>>>> Sorry to reply from my personal account. If I did it from my work
>>>>>>> account I'd be top-posting because of Outlook and that goes over like a
>>>>>>> lead balloon.
>>>>>>>
>>>>>>> Anyway, can you send us a dump of your eeprom using ethtool -e? You can
>>>>>>> either send it via a bug on e1000.sourceforge.net or try sending it to
>>>>>>> todd.fujinaka@intel.com
>>>>>>>
>>>>>>> The other thing is I'm wondering is what the subvendor device ID you
>>>>>>> have is referring to because it's not in the pci database. Some ODMs
>>>>>>> like getting creative with what they put in the NVM.
>>>>>>>
>>>>>>> Todd Fujinaka (todd.fujinaka@intel.com)
>>>>>>
>>>>>> Thanks for the prompt reply. Dave, could you please provide the requested
>>>>>> information?
>>>>>
>>>>> sent off-list.
>>>>>
>>>>>     Dave
>>>>
>>>> Whoops. I replied from outlook again.
>>>>
>>>> I have confirmation that this should be a valid image. The VPD is just a
>>>> series of 3's. There are changes to preboot header, flash and BAR size,
>>>> and as far as I can tell, a nonsense subdevice ID, but this should work.
>>>>
>>>> What was the original question?
>>>>
>>> "lspci -vv" complains about an invalid short tag 0x06 and the PCI VPD
>>> code resulted in a stall. So it seems the data doesn't have valid VPD
>>> format as defined in PCI specification.
>>>
>>> 01:00.0 Ethernet controller: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection (rev 01)
>>>        Subsystem: Device 1dcf:030a
>>>     ...
>>>             Capabilities: [e0] Vital Product Data
>>>                *Unknown small resource type 06, will not decode more.*
>>>
>>> Not sure which method is used by the driver to get the EEPROM content.
>>> For the issue here is relevant what is exposed via PCI VPD.
>>>
>>> The related kernel error message has been reported few times, e.g. here:
>>> https://access.redhat.com/solutions/3001451
>>> Only due to a change in kernel code this became a more prominent
>>> issue now.
>>>
>>> You say that VPD is just a series of 3's. This may explain why kernel and
>>> tools complain about an invalid VPD format. VPD misses the tag structure.
>>
>> I think I conflated two issues and yours may not be the one with the
>> weird Amazon NIC. In any case, the VPD does not match the spec and two
>> people have confirmed it's just full of 3's. With the bogus subvendor
>> ID, I'm thinking this is not an Intel NIC.
>>
>> Next step is to contact whoever made the NIC and ask them for guidance.
>>
> In an earlier mail in this thread was stated that subvendor id is unknown.
> Checking here https://pcisig.com/membership/member-companies?combine=1dcf
> it says: Beijing Sinead Technology Co., Ltd.

Huh. I didn't realize there was an official list beyond pciids.ucw.cz.

In any case, that's who you need to talk to about the unlisted (to
Linux) vendor ID and also the odd VPD data.

-- 
Hisashi T Fujinaka - htodd@twofifty.com
BSEE + BSChem + BAEnglish + MSCS + $2.50 = coffee

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Linux 5.15-rc1
  2021-09-14 22:33               ` Dave Jones
@ 2021-09-15 21:31                 ` Bjorn Helgaas
  0 siblings, 0 replies; 35+ messages in thread
From: Bjorn Helgaas @ 2021-09-15 21:31 UTC (permalink / raw)
  To: Dave Jones, Heiner Kallweit, Linus Torvalds, linux-kernel,
	linux-pci

On Tue, Sep 14, 2021 at 06:33:27PM -0400, Dave Jones wrote:
> On Wed, Sep 15, 2021 at 12:06:33AM +0200, Heiner Kallweit wrote:
> 
>  > > What do you think of the following?  (This is a diff from v5.15-rc1.)
>  > > 
>  > 
>  > This looks very good to me.
> 
> fwiw, I tested this too, and it still works.

Thanks very much for proactively testing this.  I hated to burden you
before anybody else had looked at it.

I added this to for-linus for v5.15.

Bjorn

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Intel-wired-lan] Linux 5.15-rc1 - 82599ES VPD access isue
  2021-09-15 16:16                           ` Hisashi T Fujinaka
@ 2021-09-15 22:32                             ` Bjorn Helgaas
  2021-09-15 23:46                               ` Hisashi T Fujinaka
  0 siblings, 1 reply; 35+ messages in thread
From: Bjorn Helgaas @ 2021-09-15 22:32 UTC (permalink / raw)
  To: Hisashi T Fujinaka
  Cc: Heiner Kallweit, Dave Jones, linux-pci@vger.kernel.org,
	Linux Kernel Mailing List, intel-wired-lan, Bjorn Helgaas

On Wed, Sep 15, 2021 at 09:16:47AM -0700, Hisashi T Fujinaka wrote:
> On Wed, 15 Sep 2021, Heiner Kallweit wrote:
> > On 15.09.2021 16:18, Hisashi T Fujinaka wrote:
> > > On Tue, 14 Sep 2021, Heiner Kallweit wrote:
> > > > On 14.09.2021 22:00, Hisashi T Fujinaka wrote:

> > > > > I have confirmation that this should be a valid image. The
> > > > > VPD is just a series of 3's. There are changes to preboot
> > > > > header, flash and BAR size, and as far as I can tell, a
> > > > > nonsense subdevice ID, but this should work.
> > > > > 
> > > > > What was the original question?
> > > > > 
> > > > "lspci -vv" complains about an invalid short tag 0x06 and the
> > > > PCI VPD code resulted in a stall. So it seems the data doesn't
> > > > have valid VPD format as defined in PCI specification.
> > > > 
> > > > 01:00.0 Ethernet controller: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection (rev 01)
> > > >        Subsystem: Device 1dcf:030a
> > > >     ...
> > > >             Capabilities: [e0] Vital Product Data
> > > >                *Unknown small resource type 06, will not decode more.*
> > > > 
> > > > Not sure which method is used by the driver to get the EEPROM
> > > > content.  For the issue here is relevant what is exposed via
> > > > PCI VPD.
> > > > 
> > > > The related kernel error message has been reported few times,
> > > > e.g. here: https://access.redhat.com/solutions/3001451 Only
> > > > due to a change in kernel code this became a more prominent
> > > > issue now.
> > > > 
> > > > You say that VPD is just a series of 3's. This may explain why
> > > > kernel and tools complain about an invalid VPD format. VPD
> > > > misses the tag structure.
> > > 
> > > I think I conflated two issues and yours may not be the one with the
> > > weird Amazon NIC. In any case, the VPD does not match the spec and two
> > > people have confirmed it's just full of 3's. With the bogus subvendor
> > > ID, I'm thinking this is not an Intel NIC.

A series of 0x03 0x03 0x03 ... bytes would decode as "small items of
type 00", so I assume the VPD contains a series of 0x33 0x33 0x33 ...
bytes.  That would decode to a series of "small items of type 06",
each of length four (one byte for the tag, three bytes of data).

Prior to v5.15, we would complain "invalid short VPD tag 06" and stop
reading.  As of v5.15, I think we'll just keep reading looking for a
valid "end" tag, but we'll never find one.

I think in v5.15 there will be no error message because the series of
four-byte small data items happens to fit exactly in the maximum 32KB
size of VPD and is technically legal syntactic structure, even though
it makes no sense.

But it will be much slower and might account for the boot slowdown
Dave reported.

> > In an earlier mail in this thread was stated that subvendor id is unknown.
> > Checking here https://pcisig.com/membership/member-companies?combine=1dcf
> > it says: Beijing Sinead Technology Co., Ltd.
> 
> Huh. I didn't realize there was an official list beyond pciids.ucw.cz.
> 
> In any case, that's who you need to talk to about the unlisted (to
> Linux) vendor ID and also the odd VPD data.

https://pci-ids.ucw.cz/ is definitely unofficial in the sense that it
is basically crowd-sourced data, not the "official" Vendor IDs
controlled by the PCI SIG.

I submitted an addition to https://pci-ids.ucw.cz/

Bjorn

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Intel-wired-lan] Linux 5.15-rc1 - 82599ES VPD access isue
  2021-09-15 22:32                             ` Bjorn Helgaas
@ 2021-09-15 23:46                               ` Hisashi T Fujinaka
  2021-09-17 15:09                                 ` Bjorn Helgaas
  0 siblings, 1 reply; 35+ messages in thread
From: Hisashi T Fujinaka @ 2021-09-15 23:46 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Dave Jones, linux-pci@vger.kernel.org, Linux Kernel Mailing List,
	intel-wired-lan, Bjorn Helgaas, Heiner Kallweit

On Wed, 15 Sep 2021, Bjorn Helgaas wrote:

> On Wed, Sep 15, 2021 at 09:16:47AM -0700, Hisashi T Fujinaka wrote:
>> On Wed, 15 Sep 2021, Heiner Kallweit wrote:
>>> On 15.09.2021 16:18, Hisashi T Fujinaka wrote:
>>>> On Tue, 14 Sep 2021, Heiner Kallweit wrote:
>>>>> On 14.09.2021 22:00, Hisashi T Fujinaka wrote:
>
>>>>>> I have confirmation that this should be a valid image. The
>>>>>> VPD is just a series of 3's. There are changes to preboot
>>>>>> header, flash and BAR size, and as far as I can tell, a
>>>>>> nonsense subdevice ID, but this should work.
>>>>>>
>>>>>> What was the original question?
>>>>>>
>>>>> "lspci -vv" complains about an invalid short tag 0x06 and the
>>>>> PCI VPD code resulted in a stall. So it seems the data doesn't
>>>>> have valid VPD format as defined in PCI specification.
>>>>>
>>>>> 01:00.0 Ethernet controller: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection (rev 01)
>>>>>        Subsystem: Device 1dcf:030a
>>>>>     ...
>>>>>             Capabilities: [e0] Vital Product Data
>>>>>                *Unknown small resource type 06, will not decode more.*
>>>>>
>>>>> Not sure which method is used by the driver to get the EEPROM
>>>>> content.  For the issue here is relevant what is exposed via
>>>>> PCI VPD.
>>>>>
>>>>> The related kernel error message has been reported few times,
>>>>> e.g. here: https://access.redhat.com/solutions/3001451 Only
>>>>> due to a change in kernel code this became a more prominent
>>>>> issue now.
>>>>>
>>>>> You say that VPD is just a series of 3's. This may explain why
>>>>> kernel and tools complain about an invalid VPD format. VPD
>>>>> misses the tag structure.
>>>>
>>>> I think I conflated two issues and yours may not be the one with the
>>>> weird Amazon NIC. In any case, the VPD does not match the spec and two
>>>> people have confirmed it's just full of 3's. With the bogus subvendor
>>>> ID, I'm thinking this is not an Intel NIC.
>
> A series of 0x03 0x03 0x03 ... bytes would decode as "small items of
> type 00", so I assume the VPD contains a series of 0x33 0x33 0x33 ...
> bytes.  That would decode to a series of "small items of type 06",
> each of length four (one byte for the tag, three bytes of data).
>
> Prior to v5.15, we would complain "invalid short VPD tag 06" and stop
> reading.  As of v5.15, I think we'll just keep reading looking for a
> valid "end" tag, but we'll never find one.
>
> I think in v5.15 there will be no error message because the series of
> four-byte small data items happens to fit exactly in the maximum 32KB
> size of VPD and is technically legal syntactic structure, even though
> it makes no sense.
>
> But it will be much slower and might account for the boot slowdown
> Dave reported.
>
>>> In an earlier mail in this thread was stated that subvendor id is unknown.
>>> Checking here https://pcisig.com/membership/member-companies?combine=1dcf
>>> it says: Beijing Sinead Technology Co., Ltd.
>>
>> Huh. I didn't realize there was an official list beyond pciids.ucw.cz.
>>
>> In any case, that's who you need to talk to about the unlisted (to
>> Linux) vendor ID and also the odd VPD data.
>
> https://pci-ids.ucw.cz/ is definitely unofficial in the sense that it
> is basically crowd-sourced data, not the "official" Vendor IDs
> controlled by the PCI SIG.
>
> I submitted an addition to https://pci-ids.ucw.cz/
>
> Bjorn

Just for my edumacation, do they keep track of device IDs, subvendor IDs
(which are probably just the same as vendor IDs), and subdevice IDs in
the PCI SIG? Or even the branding strings?

Todd Fujinaka <todd.fujinaka@intel.com>

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Intel-wired-lan] Linux 5.15-rc1 - 82599ES VPD access isue
  2021-09-15 23:46                               ` Hisashi T Fujinaka
@ 2021-09-17 15:09                                 ` Bjorn Helgaas
  0 siblings, 0 replies; 35+ messages in thread
From: Bjorn Helgaas @ 2021-09-17 15:09 UTC (permalink / raw)
  To: Hisashi T Fujinaka
  Cc: Dave Jones, linux-pci@vger.kernel.org, Linux Kernel Mailing List,
	intel-wired-lan, Bjorn Helgaas, Heiner Kallweit

On Wed, Sep 15, 2021 at 04:46:54PM -0700, Hisashi T Fujinaka wrote:
> On Wed, 15 Sep 2021, Bjorn Helgaas wrote:
> 
> > On Wed, Sep 15, 2021 at 09:16:47AM -0700, Hisashi T Fujinaka wrote:
> > > On Wed, 15 Sep 2021, Heiner Kallweit wrote:

> > > > In an earlier mail in this thread was stated that subvendor id is unknown.
> > > > Checking here https://pcisig.com/membership/member-companies?combine=1dcf
> > > > it says: Beijing Sinead Technology Co., Ltd.
> > > 
> > > Huh. I didn't realize there was an official list beyond pciids.ucw.cz.
> > > 
> > > In any case, that's who you need to talk to about the unlisted (to
> > > Linux) vendor ID and also the odd VPD data.
> > 
> > https://pci-ids.ucw.cz/ is definitely unofficial in the sense that it
> > is basically crowd-sourced data, not the "official" Vendor IDs
> > controlled by the PCI SIG.
> > 
> > I submitted an addition to https://pci-ids.ucw.cz/
> > 
> > Bjorn
> 
> Just for my edumacation, do they keep track of device IDs, subvendor IDs
> (which are probably just the same as vendor IDs), and subdevice IDs in
> the PCI SIG? Or even the branding strings?

The PCI SIG does not manage Device IDs.  That's the responsibility of
the vendor.

^ permalink raw reply	[flat|nested] 35+ messages in thread

end of thread, other threads:[~2021-09-17 15:09 UTC | newest]

Thread overview: 35+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <CAHk-=wgbygOb3hRV+7YOpVcMPTP2oQ=iw6tf09Ydspg7o7BsWQ@mail.gmail.com>
     [not found] ` <20210913141818.GA27911@codemonkey.org.uk>
2021-09-13 18:59   ` Linux 5.15-rc1 Heiner Kallweit
2021-09-13 19:51     ` Linus Torvalds
2021-09-13 20:09       ` Bjorn Helgaas
2021-09-13 20:11       ` Heiner Kallweit
2021-09-13 20:15         ` Linus Torvalds
2021-09-13 20:15     ` Dave Jones
2021-09-13 20:22       ` Heiner Kallweit
2021-09-13 20:32         ` Dave Jones
2021-09-13 20:44           ` Linux 5.15-rc1 - 82599ES VPD access isue Heiner Kallweit
2021-09-13 23:32             ` [Intel-wired-lan] " Hisashi T Fujinaka
2021-09-14  5:51               ` Heiner Kallweit
2021-09-14 14:24                 ` Dave Jones
2021-09-14 18:28                   ` Fujinaka, Todd
2021-09-14 20:00                   ` Hisashi T Fujinaka
2021-09-14 21:51                     ` Heiner Kallweit
2021-09-15 14:18                       ` Hisashi T Fujinaka
2021-09-15 16:05                         ` Heiner Kallweit
2021-09-15 16:16                           ` Hisashi T Fujinaka
2021-09-15 22:32                             ` Bjorn Helgaas
2021-09-15 23:46                               ` Hisashi T Fujinaka
2021-09-17 15:09                                 ` Bjorn Helgaas
2021-09-13 20:59           ` Linux 5.15-rc1 Heiner Kallweit
2021-09-13 23:35             ` Bjorn Helgaas
2021-09-13 20:32       ` Heiner Kallweit
2021-09-13 20:36         ` Linus Torvalds
2021-09-13 20:41         ` Dave Jones
2021-09-13 23:46   ` Bjorn Helgaas
2021-09-14  0:39     ` Dave Jones
2021-09-14  6:21     ` Heiner Kallweit
2021-09-14 11:26       ` Bjorn Helgaas
2021-09-14 17:07         ` Heiner Kallweit
2021-09-14 21:55           ` Bjorn Helgaas
2021-09-14 22:06             ` Heiner Kallweit
2021-09-14 22:33               ` Dave Jones
2021-09-15 21:31                 ` Bjorn Helgaas

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).