From: Keith Busch <keith.busch@intel.com>
To: Bjorn Helgaas <helgaas@kernel.org>
Cc: linux-pci@vger.kernel.org, Bjorn Helgaas <bhelgaas@google.com>,
Wei Zhang <wzhang@fb.com>, Jens Axboe <axboe@fb.com>
Subject: Re: [PATCH 2/3] pci/msix: Skip disabling removed devices
Date: Fri, 19 Aug 2016 13:11:45 -0400 [thread overview]
Message-ID: <20160819171145.GB28276@localhost.localdomain> (raw)
In-Reply-To: <20160818232941.GT27353@localhost>
On Thu, Aug 18, 2016 at 06:29:41PM -0500, Bjorn Helgaas wrote:
> On Mon, Aug 08, 2016 at 01:14:26PM -0600, Keith Busch wrote:
> > @@ -999,6 +999,11 @@ void pci_msix_shutdown(struct pci_dev *dev)
> > if (!pci_msi_enable || !dev || !dev->msix_enabled)
> > return;
> >
> > + if (!pci_device_is_present(dev)) {
>
> It doesn't really make sense to me to use pci_device_is_present()
> (which calls pci_bus_read_dev_vendor_id()) for this purpose.
Roger that.
Based of some feedback from Lukas, it sounds like we could add a new
state to pci_dev for "is_removed" that can be set by pciehp or pcie-dpc.
Does that sound reasonable? If so, I think we can use that criteria
instead.
> Adding a new config read and checking for failure seems like just
> narrowing the window -- a device that stops responding between this
> point and the next required config read could still cause a problem.
That's true, but it's not a window that concerns us in real life. It just
doesn't really occur that a driver is unbinding from a device right as you
hot remove it. The driver is unbinding from the device _because_ it was
hot removed, so presence is down long before we release the MSI vectors.
But if we want to close that gab, it should be trivial if we can add the
is_removed state to pci_dev.
> Is this fixing a performance problem? What's the specific motivation
> for this?
When you hot remove a device, a non-posted command to that device must be
handled by something. This is what I meant by "completion synthesis",
though I must have pulled that terminology from PCI DPC.
This series plus some additional enhancements we've made sense the
initial posting significantly reduces total teardown time. We've seen
transactions to non-existent devices go from ~1000 down to just 1. These
take on the order of milliseconds for hardware to produce the completion
that allows software to proceed, so the many unnecessary commands add
up very quickly and noticeably degrades user experience.
Also, considering Linux PCI enumeration is serialized with a single mutex,
this becomes a very big deal when hot plugging entire trays of devices.
If you would like, I would be happy setup a conference call to discuss
this further. We have protocol traces showing timings that I think make
a very compelling case for reducing our software overhead.
> And you also mention "instability with hardware" -- what exactly is
> that? I understand some slowdown if we do config accesses to
> non-existent devices, but I don't understand hardware instability.
I should have left that description out. I'm just a software engineer, and
the root causes you may be looking for are at a lower level than I grok.
But since I already went there, I'll just say occasionally completion
timeout doesn't quite work on time. That's not supposed to happen,
but we don't always get perfection out of devices. After millions of
non-posted commands to devices that don't exist, rare cases of too many
completion timeouts result in machine checking NMI.
I can only say empirically, Linux just works better if we don't
gratuitously rely on hardware to cleanup software's unnecessary
transactions. That observation alone doesn't justify a change, so I hope
the improved hot plug teardown can carry this patch concept forward.
next prev parent reply other threads:[~2016-08-19 17:11 UTC|newest]
Thread overview: 17+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-08-08 19:14 [PATCH 0/3] Limiting pci access requests Keith Busch
2016-08-08 19:14 ` [PATCH 1/3] pcie: Don't search capabilities on removed devices Keith Busch
2016-08-18 22:38 ` Bjorn Helgaas
2016-08-08 19:14 ` [PATCH 2/3] pci/msix: Skip disabling " Keith Busch
2016-08-18 23:29 ` Bjorn Helgaas
2016-08-19 17:11 ` Keith Busch [this message]
2016-08-08 19:14 ` [PATCH 3/3] pcie/aer: Cache capability position Keith Busch
2016-08-09 17:33 ` Bjorn Helgaas
2016-09-06 21:05 ` Jon Derrick
2016-09-06 21:18 ` Keith Busch
2016-08-09 17:36 ` [PATCH 0/3] Limiting pci access requests Bjorn Helgaas
2016-08-09 18:56 ` Keith Busch
2016-08-09 18:56 ` Lukas Wunner
2016-08-17 21:05 ` Keith Busch
2016-08-18 14:02 ` Lukas Wunner
2016-08-18 16:05 ` Keith Busch
2016-08-18 16:59 ` Lukas Wunner
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20160819171145.GB28276@localhost.localdomain \
--to=keith.busch@intel.com \
--cc=axboe@fb.com \
--cc=bhelgaas@google.com \
--cc=helgaas@kernel.org \
--cc=linux-pci@vger.kernel.org \
--cc=wzhang@fb.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).