[REGRESSION, bisect] pci: cxgb4 probe fails after commit 104daa71b3961434 ("PCI: Determine actual VPD size on first access")

linux-pci.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [REGRESSION, bisect] pci: cxgb4 probe fails after commit 104daa71b3961434 ("PCI: Determine actual VPD size on first access")
@ 2018-01-23 12:29 Arjun Vynipadath
  2018-01-23 12:43 ` Ganesh Goudar
  2018-02-12 17:43 ` Bjorn Helgaas
  0 siblings, 2 replies; 6+ messages in thread
From: Arjun Vynipadath @ 2018-01-23 12:29 UTC (permalink / raw)
  To: bhelgaas, linux-pci, linux-kernel, davem, netdev
  Cc: arjun, leedom, santosh, ganeshgr, nirranjan, kumaras, swise

Sending on behalf of "Casey Leedom <leedom@chelsio.com>"

Way back on April 11, 2016 we reported a regression in Linux kernel 4.6-rc2 
brought on by kernel.org commit 104daa71b396. This commit calculates the 
size of a PCI Device's VPD area by parsing the VPD Structure at offset 0x000, 
and restricts accesses to the VPD to that computed size.

Our devices have a second VPD structure which is located starting at offset 
0x400 which is the "real" VPD[1].  The 104daa71b396 commit (plus a follow on 
commit 408641e93aa5) caused efforts to read past the end of that computed 
length of the VPD to return silently without error leaving stack junk in the 
VPD read buffers.

We introduced kernel.org commit cb92148b to allow a driver to tell the 
kernel how large the VPD area really is, introducing a new API 
pci_set_vpd_size() for this purpose.

Now we've discovered a new subtlety to the problem.

We have a KVM Hypervisor running a 4.9.70 kernel.  So it has all of the 
above commits.  When we attach our Physical Function 4 to a Virtual Machine 
and attempt to run cxgb4 in that VM, we see the problem again.  The issue is 
that all of the VM Guest OS's efforts to access the PCIe VPD Capability are 
trapped into the KVM 4.9.70 kernel and executed there, with the results 
routed back to the VM Guest OS.  The cxgb4 driver in the VM Guest OS uses 
the new pci_set_vpd_size() to notify the OS of the true size of the VPD, but 
that information of course is never sent to the KVM 4.9.70 Hypervisor. 
(And, truth be told, if the Guest OS were older than 4.6, it wouldn't even 
know that it needed to do this.)  The result is that again we get silent VPD 
read failures with random stack garbage in the VPD read buffers. (sigh) 

It strikes me that the only way to handle this issue is to have KVM 
circumvent the VPD-Size Restricted logic which was added in kernel.org 
commits 104daa71b396 and 408641e93aa5.  Maybe via a __pci_read_vpd() or 
similar API.  But we are open to other suggestions.

Thoughts?

Casey.

[1] Chelsio adapters actually have two VPD structures stored in the VPD.  An
    abbreviated on at Offset 0x0 and the complete VPD at Offset 0x400.  The
    abbreviated one only contains the PN, SN and EC Keywords, while the
    complete VPD contains those plus various adapter constants contained in
    V0, V1, etc.  And it also contains the Base Ethernet MAC Address in the
    "NA" Keyword which the cxgb4 driver needs when it can't contact the
    adapter firmware.  (We don't have the "NA" Keyword in the VPD Structure
    at Offset 0x000 because that's not an allowed VPD Keyword in the PCI-E
    3.0 specification.)

    Note that two other drivers look like they may also do something
    similar, the Broadcom bnx2x and tg3.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [REGRESSION, bisect] pci: cxgb4 probe fails after commit 104daa71b3961434 ("PCI: Determine actual VPD size on first access")
  2018-01-23 12:29 [REGRESSION, bisect] pci: cxgb4 probe fails after commit 104daa71b3961434 ("PCI: Determine actual VPD size on first access") Arjun Vynipadath
@ 2018-01-23 12:43 ` Ganesh Goudar
  2018-02-12 17:43 ` Bjorn Helgaas
  1 sibling, 0 replies; 6+ messages in thread
From: Ganesh Goudar @ 2018-01-23 12:43 UTC (permalink / raw)
  To: Arjun Vynipadath
  Cc: bhelgaas, linux-pci, linux-kernel, davem, netdev, leedom, santosh,
	nirranjan, kumaras, swise, hare

+Hannes Reinecke

On Tuesday, January 01/23/18, 2018 at 17:59:09 +0530, Arjun Vynipadath wrote:
> Sending on behalf of "Casey Leedom <leedom@chelsio.com>"
> 
> Way back on April 11, 2016 we reported a regression in Linux kernel 4.6-rc2 
> brought on by kernel.org commit 104daa71b396. This commit calculates the 
> size of a PCI Device's VPD area by parsing the VPD Structure at offset 0x000, 
> and restricts accesses to the VPD to that computed size.
> 
> Our devices have a second VPD structure which is located starting at offset 
> 0x400 which is the "real" VPD[1].  The 104daa71b396 commit (plus a follow on 
> commit 408641e93aa5) caused efforts to read past the end of that computed 
> length of the VPD to return silently without error leaving stack junk in the 
> VPD read buffers.
> 
> We introduced kernel.org commit cb92148b to allow a driver to tell the 
> kernel how large the VPD area really is, introducing a new API 
> pci_set_vpd_size() for this purpose.
> 
> Now we've discovered a new subtlety to the problem.
> 
> We have a KVM Hypervisor running a 4.9.70 kernel.  So it has all of the 
> above commits.  When we attach our Physical Function 4 to a Virtual Machine 
> and attempt to run cxgb4 in that VM, we see the problem again.  The issue is 
> that all of the VM Guest OS's efforts to access the PCIe VPD Capability are 
> trapped into the KVM 4.9.70 kernel and executed there, with the results 
> routed back to the VM Guest OS.  The cxgb4 driver in the VM Guest OS uses 
> the new pci_set_vpd_size() to notify the OS of the true size of the VPD, but 
> that information of course is never sent to the KVM 4.9.70 Hypervisor. 
> (And, truth be told, if the Guest OS were older than 4.6, it wouldn't even 
> know that it needed to do this.)  The result is that again we get silent VPD 
> read failures with random stack garbage in the VPD read buffers. (sigh) 
> 
> It strikes me that the only way to handle this issue is to have KVM 
> circumvent the VPD-Size Restricted logic which was added in kernel.org 
> commits 104daa71b396 and 408641e93aa5.  Maybe via a __pci_read_vpd() or 
> similar API.  But we are open to other suggestions.
> 
> Thoughts?
> 
> Casey.
> 
> [1] Chelsio adapters actually have two VPD structures stored in the VPD.  An
>     abbreviated on at Offset 0x0 and the complete VPD at Offset 0x400.  The
>     abbreviated one only contains the PN, SN and EC Keywords, while the
>     complete VPD contains those plus various adapter constants contained in
>     V0, V1, etc.  And it also contains the Base Ethernet MAC Address in the
>     "NA" Keyword which the cxgb4 driver needs when it can't contact the
>     adapter firmware.  (We don't have the "NA" Keyword in the VPD Structure
>     at Offset 0x000 because that's not an allowed VPD Keyword in the PCI-E
>     3.0 specification.)
> 
>     Note that two other drivers look like they may also do something
>     similar, the Broadcom bnx2x and tg3.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [REGRESSION, bisect] pci: cxgb4 probe fails after commit 104daa71b3961434 ("PCI: Determine actual VPD size on first access")
  2018-01-23 12:29 [REGRESSION, bisect] pci: cxgb4 probe fails after commit 104daa71b3961434 ("PCI: Determine actual VPD size on first access") Arjun Vynipadath
  2018-01-23 12:43 ` Ganesh Goudar
@ 2018-02-12 17:43 ` Bjorn Helgaas
  2018-02-12 17:56   ` Alexander Duyck
  1 sibling, 1 reply; 6+ messages in thread
From: Bjorn Helgaas @ 2018-02-12 17:43 UTC (permalink / raw)
  To: Arjun Vynipadath
  Cc: bhelgaas, linux-pci, linux-kernel, davem, netdev, leedom, santosh,
	ganeshgr, nirranjan, kumaras, swise, hare

On Tue, Jan 23, 2018 at 05:59:09PM +0530, Arjun Vynipadath wrote:
> Sending on behalf of "Casey Leedom <leedom@chelsio.com>"
> 
> Way back on April 11, 2016 we reported a regression in Linux kernel 4.6-rc2 
> brought on by kernel.org commit 104daa71b396. This commit calculates the 
> size of a PCI Device's VPD area by parsing the VPD Structure at offset 0x000, 
> and restricts accesses to the VPD to that computed size.
> 
> Our devices have a second VPD structure which is located starting at offset 
> 0x400 which is the "real" VPD[1].  The 104daa71b396 commit (plus a follow on 
> commit 408641e93aa5) caused efforts to read past the end of that computed 
> length of the VPD to return silently without error leaving stack junk in the 
> VPD read buffers.
> 
> We introduced kernel.org commit cb92148b to allow a driver to tell the 
> kernel how large the VPD area really is, introducing a new API 
> pci_set_vpd_size() for this purpose.
> 
> Now we've discovered a new subtlety to the problem.
> 
> We have a KVM Hypervisor running a 4.9.70 kernel.  So it has all of the 
> above commits.  When we attach our Physical Function 4 to a Virtual Machine 
> and attempt to run cxgb4 in that VM, we see the problem again.  The issue is 
> that all of the VM Guest OS's efforts to access the PCIe VPD Capability are 
> trapped into the KVM 4.9.70 kernel and executed there, with the results 
> routed back to the VM Guest OS.  The cxgb4 driver in the VM Guest OS uses 
> the new pci_set_vpd_size() to notify the OS of the true size of the VPD, but 
> that information of course is never sent to the KVM 4.9.70 Hypervisor. 
> (And, truth be told, if the Guest OS were older than 4.6, it wouldn't even 
> know that it needed to do this.)  The result is that again we get silent VPD 
> read failures with random stack garbage in the VPD read buffers. (sigh) 

Let me pull out one tiny piece of this problem: If the VPD read
returns failure, the caller should not look at the read buffer.  But
we should *never* copy random stack garbage into the read buffer, no
matter what the VPD read returns.

I guess it's the 4.9.70 kernel that's putting garbage into the VPD
read buffer?  Is this something that needs to be fixed in the current
upstream kernel?

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [REGRESSION, bisect] pci: cxgb4 probe fails after commit 104daa71b3961434 ("PCI: Determine actual VPD size on first access")
  2018-02-12 17:43 ` Bjorn Helgaas
@ 2018-02-12 17:56   ` Alexander Duyck
  2018-02-12 18:06     ` Casey Leedom
  0 siblings, 1 reply; 6+ messages in thread
From: Alexander Duyck @ 2018-02-12 17:56 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Arjun Vynipadath, Bjorn Helgaas, linux-pci, linux-kernel,
	David Miller, Netdev, Casey Leedom, Santosh Raspatur,
	Ganesh Goudar, nirranjan, kumaras, swise, Hannes Reinecke

On Mon, Feb 12, 2018 at 9:43 AM, Bjorn Helgaas <helgaas@kernel.org> wrote:
> On Tue, Jan 23, 2018 at 05:59:09PM +0530, Arjun Vynipadath wrote:
>> Sending on behalf of "Casey Leedom <leedom@chelsio.com>"
>>
>> Way back on April 11, 2016 we reported a regression in Linux kernel 4.6-rc2
>> brought on by kernel.org commit 104daa71b396. This commit calculates the
>> size of a PCI Device's VPD area by parsing the VPD Structure at offset 0x000,
>> and restricts accesses to the VPD to that computed size.
>>
>> Our devices have a second VPD structure which is located starting at offset
>> 0x400 which is the "real" VPD[1].  The 104daa71b396 commit (plus a follow on
>> commit 408641e93aa5) caused efforts to read past the end of that computed
>> length of the VPD to return silently without error leaving stack junk in the
>> VPD read buffers.
>>
>> We introduced kernel.org commit cb92148b to allow a driver to tell the
>> kernel how large the VPD area really is, introducing a new API
>> pci_set_vpd_size() for this purpose.
>>
>> Now we've discovered a new subtlety to the problem.
>>
>> We have a KVM Hypervisor running a 4.9.70 kernel.  So it has all of the
>> above commits.  When we attach our Physical Function 4 to a Virtual Machine
>> and attempt to run cxgb4 in that VM, we see the problem again.  The issue is
>> that all of the VM Guest OS's efforts to access the PCIe VPD Capability are
>> trapped into the KVM 4.9.70 kernel and executed there, with the results
>> routed back to the VM Guest OS.  The cxgb4 driver in the VM Guest OS uses
>> the new pci_set_vpd_size() to notify the OS of the true size of the VPD, but
>> that information of course is never sent to the KVM 4.9.70 Hypervisor.
>> (And, truth be told, if the Guest OS were older than 4.6, it wouldn't even
>> know that it needed to do this.)  The result is that again we get silent VPD
>> read failures with random stack garbage in the VPD read buffers. (sigh)
>
> Let me pull out one tiny piece of this problem: If the VPD read
> returns failure, the caller should not look at the read buffer.  But
> we should *never* copy random stack garbage into the read buffer, no
> matter what the VPD read returns.
>
> I guess it's the 4.9.70 kernel that's putting garbage into the VPD
> read buffer?  Is this something that needs to be fixed in the current
> upstream kernel?

My guess would be that it is not random stack garbage that is being
put in the read. If the read buffer was not initialized it will
contain random data. I suspect that is what we are likely seeing is
the use of uninitialized memory for the read buffer.

There should already be a fix for this that was added in commit
1c7de2b4ff88 "PCI: Enable access to non-standard VPD for Chelsio
devices (cxgb3)". If you are missing device IDs you need to add them
to the existing quirk in order to make certain all your devices are
covered. Just adding the quirk to your driver won't fix the issue.
Adding a workaround to KVM that allows for you to circumvent this
would just make for a huge security hole since as I recall one of the
reasons why we were putting the limits on this in the first place was
because access to uninitialized VPD memory on some devices was causing
some pretty serious issues.

- Alex

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [REGRESSION, bisect] pci: cxgb4 probe fails after commit 104daa71b3961434 ("PCI: Determine actual VPD size on first access")
  2018-02-12 17:56   ` Alexander Duyck
@ 2018-02-12 18:06     ` Casey Leedom
  0 siblings, 0 replies; 6+ messages in thread
From: Casey Leedom @ 2018-02-12 18:06 UTC (permalink / raw)
  To: Alexander Duyck, Bjorn Helgaas
  Cc: Arjun V., Bjorn Helgaas, linux-pci@vger.kernel.org,
	linux-kernel@vger.kernel.org, David Miller, Netdev,
	Santosh Rastapur, Ganesh GR, Nirranjan Kirubaharan, Kumar A S,
	SWise OGC, Hannes Reinecke

  Yes, the issue is not that random stack garbage is being copied.  It's
that the VPD Read silently returns without any error if the read exceeds th=
e
computed/set size (something I'm still not comfortable with).

  And yes, we did try to address this with kernel.org commit cb92148b where
I introduced the new pci_set_vpd_size() API.  Later, in commit 1c7de2b4ff88=
,
Alexey Kardashevskiy introduced the cxgb3 PCIe Quirk using the new
pci_set_vpd_size() API for our T3-based adapters rather than modifying the
driver code as I'd done for cxgb4.  In talking with Bjorn I think that
Alexey's approach is better and I'm going to work up a patch to do the same
for all of our adapters.

  Thanks to everyone for helping talk through this!

Casey

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [REGRESSION, bisect] pci: cxgb4 probe fails after commit 104daa71b3961434 ("PCI: Determine actual VPD size on first access")
@ 2016-04-12  5:29 Hariprasad Shenai
  0 siblings, 0 replies; 6+ messages in thread
From: Hariprasad Shenai @ 2016-04-12  5:29 UTC (permalink / raw)
  To: bhelgaas, linux-pci, linux-kernel, davem, netdev
  Cc: swise, leedom, santosh, kumaras, nirranjan

Hi All,

The following patch introduced a regression, causing cxgb4 driver to fail in
PCIe probe.

commit 104daa71b39614343929e1982170d5fcb0569bb5
Author: Hannes Reinecke <hare@suse.de>
Author: Hannes Reinecke <hare@suse.de>
Date:   Mon Feb 15 09:42:01 2016 +0100

    PCI: Determine actual VPD size on first access

    PCI-2.2 VPD entries have a maximum size of 32k, but might actually be
    smaller than that.  To figure out the actual size one has to read the VPD
    area until the 'end marker' is reached.

    Per spec, reading outside of the VPD space is "not allowed."  In practice,
    it may cause simple read errors or even crash the card.  To make matters
    worse not every PCI card implements this properly, leaving us with no 'end'
    marker or even completely invalid data.

    Try to determine the size of the VPD data when it's first accessed.  If no
    valid data can be read an I/O error will be returned when reading or
    writing the sysfs attribute.

    As the amount of VPD data is unknown initially the size of the sysfs
    attribute will always be set to '0'.

    [bhelgaas: changelog, use 0/1 (not false/true) for bitfield, tweak
    pci_vpd_pci22_read() error checking]
    Tested-by: Shane Seymour <shane.seymour@hpe.com>
    Tested-by: Babu Moger <babu.moger@oracle.com>
    Signed-off-by: Hannes Reinecke <hare@suse.de>
    Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
    Cc: Alexander Duyck <alexander.duyck@gmail.com>

The problem is stemming from the fact that the Chelsio adapters actually have
two VPD structures stored in the VPD.  An abbreviated on at Offset 0x0 and the
complete VPD at Offset 0x400.  The abbreviated one only contains the PN, SN and
EC Keywords, while the complete VPD contains those plus various adapter
constants contained in V0, V1, etc.  And it also contains the Base Ethernet MAC
Address in the "NA" Keyword which the cxgb4 driver needs when it can't contact
the adapter firmware.  (We don't have the "NA" Keywork in the VPD Structure at
Offset 0x0 because that's not an allowed VPD Keyword in the PCI-E 3.0
specification.)

  With the new code, the computed size of the VPD is 0x200 and so our efforts
to read the VPD at Offset 0x400 silently fails. We check the result of the
read looking for a signature 0x82 byte but we're checking against random stack
garbage.

  The end result is that the cxgb4 driver now fails the PCI-E Probe.

Thanks,
Hari

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2018-02-12 18:06 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2018-01-23 12:29 [REGRESSION, bisect] pci: cxgb4 probe fails after commit 104daa71b3961434 ("PCI: Determine actual VPD size on first access") Arjun Vynipadath
2018-01-23 12:43 ` Ganesh Goudar
2018-02-12 17:43 ` Bjorn Helgaas
2018-02-12 17:56   ` Alexander Duyck
2018-02-12 18:06     ` Casey Leedom
  -- strict thread matches above, loose matches on Subject: below --
2016-04-12  5:29 Hariprasad Shenai

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).