Linux PCI subsystem development
 help / color / mirror / Atom feed
* VPD blacklist of Marvell QLogic 1077/2261
@ 2020-12-17  0:44 Arun Easi
  2020-12-17 17:48 ` Bjorn Helgaas
  0 siblings, 1 reply; 3+ messages in thread
From: Arun Easi @ 2020-12-17  0:44 UTC (permalink / raw)
  To: linux-pci; +Cc: Bjorn Helgaas, Girish Basrur, Quinn Tran

Hi Bjorn,

This is regarding the blacklisting of one of the Marvell QLogic FC
adapter (1077/2261) on VPD area access. The commit that did was
this:

--8<-- pruned commit message --8<--
| commit 0d5370d1d85251e5893ab7c90a429464de2e140b
| Author: Ethan Zhao <ethan.zhao@oracle.com>
| Date:   Mon Feb 27 17:08:44 2017 +0900
| 
|     PCI: Prevent VPD access for QLogic ISP2722
|     Call Trace:
|      <NMI>  [<ffffffff817193de>] dump_stack+0x63/0x81
|      [<ffffffff81714072>] panic+0xd0/0x20e
|      [<ffffffff8101c8b4>] do_nmi+0xf4/0x170
|      <<EOE>>  [<ffffffff815db4b3>] raw_pci_read+0x23/0x40
|      [<ffffffff815db4fc>] pci_read+0x2c/0x30
|      [<ffffffff8136f612>] pci_user_read_config_word+0x72/0x110
|      [<ffffffff8136f746>] pci_vpd_pci22_wait+0x96/0x130
|      [<ffffffff8136ff9b>] pci_vpd_pci22_read+0xdb/0x1a0
|      [<ffffffff8136ea30>] pci_read_vpd+0x20/0x30


https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=0d5370d1d85251e5893ab7c90a429464de2e140b

While investigating the original report of the issue, I found an
interesting information that may explain why Ethan Zhao was
hitting the NMI/crash.

If you notice the stack referred in the commit, you could see that
a bunch of old vpd access functions, pci_vpd_pci22*, were referred.
When these functions were in use (these functions were renamed
around 2016 Feb by f1cd93f9aabe), there was a critical VPD
access bug missing; missing in fact most of the life of those
functions.

This one:
    104daa71b396: PCI: Determine actual VPD size on first access

Without this patch, a read of the vpd sysfs file can access area
outside of VPD space, which is not allowed by the spec.

My guess here is that, Ethan, when trying to access VPD of the QLogic
1077/2261 adapter, was using a kernel that had the bug present and
it led up to the NMI/crash he was observing on his machine.

We had an early firmware that returned CA on VPD access beyond
bounds that is known to NMI some servers. The FW has since changed to
not clear the VPD flag upon out-of-bound access.

In light of the above, plus the fact that I did try the
experiment on multiple setups and was not able to reproduce the
issue, would you be willing to revert the above patch? If so, I
could send a git revert (or equivalent) patch of the commit.

This blacklisting is preventing multiple customers from accessing
the VPD area of the said production adapter and making their life
a bit difficult.

Regards,
-Arun

Old discussion of the topic:
    https://lkml.org/lkml/2019/5/21/991

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: VPD blacklist of Marvell QLogic 1077/2261
  2020-12-17  0:44 VPD blacklist of Marvell QLogic 1077/2261 Arun Easi
@ 2020-12-17 17:48 ` Bjorn Helgaas
  2020-12-19  0:27   ` [EXT] " Arun Easi
  0 siblings, 1 reply; 3+ messages in thread
From: Bjorn Helgaas @ 2020-12-17 17:48 UTC (permalink / raw)
  To: Arun Easi; +Cc: linux-pci, Bjorn Helgaas, Girish Basrur, Quinn Tran

On Wed, Dec 16, 2020 at 04:44:47PM -0800, Arun Easi wrote:
> Hi Bjorn,
> 
> This is regarding the blacklisting of one of the Marvell QLogic FC
> adapter (1077/2261) on VPD area access. The commit that did was
> this:
> 
> --8<-- pruned commit message --8<--
> | commit 0d5370d1d85251e5893ab7c90a429464de2e140b
> | Author: Ethan Zhao <ethan.zhao@oracle.com>
> | Date:   Mon Feb 27 17:08:44 2017 +0900
> | 
> |     PCI: Prevent VPD access for QLogic ISP2722
> |     Call Trace:
> |      <NMI>  [<ffffffff817193de>] dump_stack+0x63/0x81
> |      [<ffffffff81714072>] panic+0xd0/0x20e
> |      [<ffffffff8101c8b4>] do_nmi+0xf4/0x170
> |      <<EOE>>  [<ffffffff815db4b3>] raw_pci_read+0x23/0x40
> |      [<ffffffff815db4fc>] pci_read+0x2c/0x30
> |      [<ffffffff8136f612>] pci_user_read_config_word+0x72/0x110
> |      [<ffffffff8136f746>] pci_vpd_pci22_wait+0x96/0x130
> |      [<ffffffff8136ff9b>] pci_vpd_pci22_read+0xdb/0x1a0
> |      [<ffffffff8136ea30>] pci_read_vpd+0x20/0x30
> 
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=0d5370d1d85251e5893ab7c90a429464de2e140b
> 
> While investigating the original report of the issue, I found an
> interesting information that may explain why Ethan Zhao was
> hitting the NMI/crash.
> 
> If you notice the stack referred in the commit, you could see that
> a bunch of old vpd access functions, pci_vpd_pci22*, were referred.
> When these functions were in use (these functions were renamed
> around 2016 Feb by f1cd93f9aabe), there was a critical VPD
> access bug missing; missing in fact most of the life of those
> functions.
> 
> This one:
>     104daa71b396: PCI: Determine actual VPD size on first access
> 
> Without this patch, a read of the vpd sysfs file can access area
> outside of VPD space, which is not allowed by the spec.
> 
> My guess here is that, Ethan, when trying to access VPD of the QLogic
> 1077/2261 adapter, was using a kernel that had the bug present and
> it led up to the NMI/crash he was observing on his machine.
> 
> We had an early firmware that returned CA on VPD access beyond
> bounds that is known to NMI some servers. The FW has since changed to
> not clear the VPD flag upon out-of-bound access.
> 
> In light of the above, plus the fact that I did try the
> experiment on multiple setups and was not able to reproduce the
> issue, would you be willing to revert the above patch? If so, I
> could send a git revert (or equivalent) patch of the commit.
> 
> This blacklisting is preventing multiple customers from accessing
> the VPD area of the said production adapter and making their life
> a bit difficult.
> 
> Regards,
> -Arun
> 
> Old discussion of the topic:
>     https://lkml.org/lkml/2019/5/21/991

It makes sense to revert 0d5370d1d852 ("PCI: Prevent VPD access for
QLogic ISP2722") as long as the resulting kernel works correctly on
all versions of the 2722 firmware.  We have to assume some customers
still have the old firmware on their adapters.

Bjorn

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [EXT] Re: VPD blacklist of Marvell QLogic 1077/2261
  2020-12-17 17:48 ` Bjorn Helgaas
@ 2020-12-19  0:27   ` Arun Easi
  0 siblings, 0 replies; 3+ messages in thread
From: Arun Easi @ 2020-12-19  0:27 UTC (permalink / raw)
  To: Bjorn Helgaas; +Cc: linux-pci, Bjorn Helgaas, Girish Basrur, Quinn Tran

On Thu, 17 Dec 2020, 9:48am, Bjorn Helgaas wrote:

> ----------------------------------------------------------------------
> On Wed, Dec 16, 2020 at 04:44:47PM -0800, Arun Easi wrote:
> > Hi Bjorn,
> > 
> > This is regarding the blacklisting of one of the Marvell QLogic FC
> > adapter (1077/2261) on VPD area access. The commit that did was
> > this:
> > 
> > --8<-- pruned commit message --8<--
> > | commit 0d5370d1d85251e5893ab7c90a429464de2e140b
> > | Author: Ethan Zhao <ethan.zhao@oracle.com>
> > | Date:   Mon Feb 27 17:08:44 2017 +0900
> > | 
> > |     PCI: Prevent VPD access for QLogic ISP2722
> > |     Call Trace:
> > |      <NMI>  [<ffffffff817193de>] dump_stack+0x63/0x81
> > |      [<ffffffff81714072>] panic+0xd0/0x20e
> > |      [<ffffffff8101c8b4>] do_nmi+0xf4/0x170
> > |      <<EOE>>  [<ffffffff815db4b3>] raw_pci_read+0x23/0x40
> > |      [<ffffffff815db4fc>] pci_read+0x2c/0x30
> > |      [<ffffffff8136f612>] pci_user_read_config_word+0x72/0x110
> > |      [<ffffffff8136f746>] pci_vpd_pci22_wait+0x96/0x130
> > |      [<ffffffff8136ff9b>] pci_vpd_pci22_read+0xdb/0x1a0
> > |      [<ffffffff8136ea30>] pci_read_vpd+0x20/0x30
> > 
> > 
> > https://urldefense.proofpoint.com/v2/url?u=https-3A__git.kernel.org_pub_scm_linux_kernel_git_torvalds_linux.git_commit_-3Fid-3D0d5370d1d85251e5893ab7c90a429464de2e140b&d=DwIBAg&c=nKjWec2b6R0mOyPaz7xtfQ&r=P-q_Qkt75qFy33SvdD2nAxAyN87eO1d-mFO-lqNOomw&m=ELM92cQ6_T7TDXwxkSNKe9OLqFeqz7taCSqUM65T0s0&s=FIa8kmuzs8pePhWEFklKpu49_i72T5mJMZgLp1QrEg0&e= 
> > 
> > While investigating the original report of the issue, I found an
> > interesting information that may explain why Ethan Zhao was
> > hitting the NMI/crash.
> > 
> > If you notice the stack referred in the commit, you could see that
> > a bunch of old vpd access functions, pci_vpd_pci22*, were referred.
> > When these functions were in use (these functions were renamed
> > around 2016 Feb by f1cd93f9aabe), there was a critical VPD
> > access bug missing; missing in fact most of the life of those
> > functions.
> > 
> > This one:
> >     104daa71b396: PCI: Determine actual VPD size on first access
> > 
> > Without this patch, a read of the vpd sysfs file can access area
> > outside of VPD space, which is not allowed by the spec.
> > 
> > My guess here is that, Ethan, when trying to access VPD of the QLogic
> > 1077/2261 adapter, was using a kernel that had the bug present and
> > it led up to the NMI/crash he was observing on his machine.
> > 
> > We had an early firmware that returned CA on VPD access beyond
> > bounds that is known to NMI some servers. The FW has since changed to
> > not clear the VPD flag upon out-of-bound access.
> > 
> > In light of the above, plus the fact that I did try the
> > experiment on multiple setups and was not able to reproduce the
> > issue, would you be willing to revert the above patch? If so, I
> > could send a git revert (or equivalent) patch of the commit.
> > 
> > This blacklisting is preventing multiple customers from accessing
> > the VPD area of the said production adapter and making their life
> > a bit difficult.
> > 
> > Regards,
> > -Arun
> > 
> > Old discussion of the topic:
> >     https://urldefense.proofpoint.com/v2/url?u=https-3A__lkml.org_lkml_2019_5_21_991&d=DwIBAg&c=nKjWec2b6R0mOyPaz7xtfQ&r=P-q_Qkt75qFy33SvdD2nAxAyN87eO1d-mFO-lqNOomw&m=ELM92cQ6_T7TDXwxkSNKe9OLqFeqz7taCSqUM65T0s0&s=1vnPz2Y-ziZnOVBFLuc9QupJeVDdtUUJTMJrNknSe7o&e= 
> 
> It makes sense to revert 0d5370d1d852 ("PCI: Prevent VPD access for
> QLogic ISP2722") as long as the resulting kernel works correctly on
> all versions of the 2722 firmware.  We have to assume some customers
> still have the old firmware on their adapters.
> 
> Bjorn
> 

Thanks Bjorn. I'll post the patch with a stable cc of v4.6+, which has the 
VPD size check available.

Regards,
-Arun

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2020-12-19  0:28 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2020-12-17  0:44 VPD blacklist of Marvell QLogic 1077/2261 Arun Easi
2020-12-17 17:48 ` Bjorn Helgaas
2020-12-19  0:27   ` [EXT] " Arun Easi

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox