From: Alexander Duyck <alexander.h.duyck@redhat.com>
To: intel-wired-lan@osuosl.org
Subject: [Intel-wired-lan] [PATCH] pci: Use a bus-global mutex to protect VPD operations
Date: Tue, 19 May 2015 17:07:05 -0700
Message-ID: <555BD029.7050803@redhat.com>
In-Reply-To: <20150519160158.00002cd6@unknown>



On 05/19/2015 04:01 PM, Jesse Brandeburg wrote:
> On Tue, 19 May 2015 10:55:03 -0700
> Alexander Duyck <alexander.h.duyck@redhat.com> wrote:
>
>> On 05/18/2015 05:00 PM, Mark D Rustad wrote:
>>> Some devices have a problem with concurrent VPD access to different
>>> functions of the same physical device, so move the protecting mutex
>>> from the pci_vpd structure to the pci_bus structure. There are a
>>> number of reports on support sites for a variety of devices from
>>> various vendors getting the "vpd r/w failed" message. This is likely
>>> to at least fix some of them. Thanks to Shannon Nelson for helping
>>> to come up with this approach.
>>>
>>> Signed-off-by: Mark Rustad <mark.d.rustad@intel.com>
>>> Acked-by: Shannon Nelson <shannon.nelson@intel.com>
>>> Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
>> Instead of moving the mutex lock around you would be much better served
>> by simply removing the duplicate VPD entries for a given device in a
>> PCIe quirk.  Then you can save yourself the extra pain and effort of
>> having to deal with serialized VPD accesses for a multifunction device.
>>
>> The logic for the quirk should be fairly simple.
>>     1.  Scan for any other devices with VPD that share the same bus and
>> device number.
>>     2.  If bdf is equal to us keep searching.
>>     3.  If bdf is less than our bdf we release our VPD area and set VPD
>> pointer to NULL.
> But Alex if you do this you're violating the principle of least
> surprise, not to mention changing a user-space interface which should
> not be done.

I'm willing to back off on dropping the VPD info for those functions 
entirely, but the lock should not be pushed to the bus.

> Mark's solution is pretty graceful and solves the issue at heart, which
> is that
> 1) several Intel chips have this issue
> 2) it appears that several other vendor's chips have this issue (or
> similar) as well, but even if they don't Mark's fix will not change
> their general operation, only make a small serializing effect when
> multiple simultaneous reads are made.

Point 2 is based on a false premise.  The "vpd r/w failed" error is
about as common as dev_watchdog().  Just because it presents with a
similar symptom doesn't mean it is the same issue.

> This is a reasonably small fix, with a small kernel footprint, which
> does not require changing user expectations or violating user-space
> semantics that are already established, so I support it as is.

I am not against the shared lock approach, but the bus is the wrong
place for this.  Sharing a bus does not mean that the devices are all
part of the same chip; it only means they share a bus.  I would guess
that this fix has not been tested with any LOM parts such as e1000e, or
in a virtualization environment, as those would exhibit different
behavior with this patch.  For example, does it make sense for an e1000e
LOM to be joined at the hip with a SATA or USB controller?  They could
all be from different manufacturers with different requirements.
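
To make the scoping argument concrete, here is a minimal userspace
sketch (not kernel code) comparing a bus-wide lock with one shared only
by the functions of a single device. The `fn_addr` struct and both
helper names are hypothetical, the addresses mentioned afterwards are
only illustrative, and treating "same slot" as "same chip" is itself an
assumption that would need checking per device.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a PCI function's address. */
struct fn_addr {
	uint16_t domain;
	uint8_t bus;
	uint8_t devfn;	/* slot in bits 7:3, function in bits 2:0 */
};

/* Extract the slot (device) number from an encoded devfn. */
static unsigned int slot_of(uint8_t devfn)
{
	return (devfn >> 3) & 0x1f;
}

/* Bus-wide scope: every function on the same bus contends for one lock. */
static bool share_lock_bus_scope(struct fn_addr a, struct fn_addr b)
{
	return a.domain == b.domain && a.bus == b.bus;
}

/* Device scope: only functions in the same slot share a lock. */
static bool share_lock_device_scope(struct fn_addr a, struct fn_addr b)
{
	return share_lock_bus_scope(a, b) &&
	       slot_of(a.devfn) == slot_of(b.devfn);
}
```

With these helpers, a LOM at 00:19.0 and a SATA controller at 00:1f.2
would contend under bus scope but not under device scope, while the two
ports of a NIC at 01:00.0 and 01:00.1 would share a lock either way.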

If the bug is in Intel Ethernet with VPD then I would suggest tweaking
the VPD logic and adding an Intel Ethernet PCI quirk.  It doesn't make
sense to assume based on one common error message that all of creation
has the same issue.

If anything I believe Mark's patches have revealed a bigger issue: the
sysfs file reads outside of the VPD area, and the PCI spec does not
define any behavior for that.  I suspect this is the cause of a number
of the issues being reported, as Broadcom had to specifically quirk to
prevent it, and I found one discussion that indicated something similar
might be needed for Realtek.
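
One way to keep reads inside the VPD area is to walk the resource tags
and stop at the End tag. The sketch below is a userspace illustration
under that assumption, not the existing sysfs code; `vpd_used_size` and
`vpd_clamp_read` are hypothetical names. It relies on the PCI VPD
resource format: large items have bit 7 set and a 2-byte little-endian
length, small items carry their length in bits 2:0, and the End tag is
0x78.

```c
#include <stddef.h>
#include <stdint.h>

#define VPD_LARGE_TAG	0x80	/* bit 7 set: large resource item */
#define VPD_END_TAG	0x78	/* small resource End tag, length 0 */

/*
 * Walk the resource tags in a VPD image and return the number of bytes
 * up to and including the End tag. Returns 0 if the image is malformed
 * or no End tag is found within len bytes.
 */
static size_t vpd_used_size(const uint8_t *buf, size_t len)
{
	size_t off = 0;
	size_t item_len;
	uint8_t tag;

	while (off < len) {
		tag = buf[off];

		if (tag & VPD_LARGE_TAG) {
			/* Large item: tag byte plus 2-byte LE length. */
			if (len - off < 3)
				return 0;
			item_len = buf[off + 1] | ((size_t)buf[off + 2] << 8);
			off += 3;
		} else {
			/* Small item: length lives in bits 2:0 of the tag. */
			item_len = tag & 0x07;
			off += 1;
			if (tag == VPD_END_TAG)
				return off;
		}

		/* Reject items whose payload runs past the buffer. */
		if (item_len > len - off)
			return 0;
		off += item_len;
	}

	return 0;
}

/* Clamp a requested read so it never extends past the End tag. */
static size_t vpd_clamp_read(size_t used, size_t pos, size_t count)
{
	if (pos >= used)
		return 0;
	return count < used - pos ? count : used - pos;
}
```

A reader would compute the used size once, then pass every requested
offset and count through the clamp before touching the device.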

- Alex





WARNING: multiple messages have this Message-ID (diff)
From: Alexander Duyck <alexander.h.duyck@redhat.com>
To: Jesse Brandeburg <jesse.brandeburg@intel.com>
Cc: Mark D Rustad <mark.d.rustad@intel.com>,
	bhelgaas@google.com, linux-pci@vger.kernel.org,
	intel-wired-lan@lists.osuosl.org, netdev@vger.kernel.org
Subject: Re: [Intel-wired-lan] [PATCH] pci: Use a bus-global mutex to protect VPD operations
Date: Tue, 19 May 2015 17:07:05 -0700
Message-ID: <555BD029.7050803@redhat.com>
In-Reply-To: <20150519160158.00002cd6@unknown>



