From: Lukas Wunner <lukas@wunner.de>
To: Keith Busch <keith.busch@intel.com>
Cc: Bjorn Helgaas <helgaas@kernel.org>,
linux-pci@vger.kernel.org, Wei Zhang <wzhang@fb.com>,
Jens Axboe <axboe@fb.com>
Subject: Re: [PATCH 0/3] Limiting pci access requests
Date: Thu, 18 Aug 2016 18:59:21 +0200 [thread overview]
Message-ID: <20160818165921.GA10737@wunner.de> (raw)
In-Reply-To: <20160818160516.GB27538@localhost.localdomain>
On Thu, Aug 18, 2016 at 12:05:16PM -0400, Keith Busch wrote:
> On Thu, Aug 18, 2016 at 04:02:13PM +0200, Lukas Wunner wrote:
> > How is it possible that a device is accessed that no longer exists?
>
> Surprise hot removal.
>
> > Are these (native) pciehp ports and the attached pci_dev isn't torn
> > down quickly enough? Do we need some kind of locking or an atomic flag
> > that prevents accesses to devices until they're torn down completely?
>
> Tearing down a device and unbinding it from a driver generates lots of
> additional accesses. Patch 2/3 removes MSI-x teardown which was one of
> the larger sources of config and MMIO access to a non-existent device.
>
> There are others, too. Heck, even checking if the device is present
> (pci_device_is_present) generates config access to the removed device. :)
>
> What do you think about adding a state to the pci_dev to say that it is
> removed? The state can be set by pciehp or pcie-dpc if either detects
> removal or link down, or on the first ~0 completion. Then have the
> teardown check for the removal state before doing orderly device removal.
Exactly.
Attribute names that come to mind: "removed", "hot_removed",
"surprise_removed", perhaps with an "is_" prefix.
In principle this could be checked at the lowest level when
accessing config space in drivers/pci/access.c, and immediately
return ~0. With the check wrapped in unlikely().
aerdrv is not the only driver that has trouble with surprise removal:
Unplugging the Thunderbolt Ethernet adapter on a Mac while the interface
is up currently causes a lockup in the tg3 driver. Same with nouveau,
which often queries a timer on the GPU and ends up in an infinite loop
if the timer readout returns with -1. If the drivers could sense hot
removal by querying a flag, they could react accordingly in their
->remove hook, so this would be a real improvement.
> > Since your patches pertain to aerdrv, do we need synchronization between
> > the pciehp and aer drivers so that aer doesn't touch a device for which
> > pciehp has sensed removal? (Is the interrupt shared between pciehp and
> > aerdrv?)
>
> pciehp and aerdrv can share an interrupt on root ports, but that's it.
> The aer driver, though, does access every device in its sub-tree.
> There's also pciehp and pcie-dpc that could benifit from coordination.
>
> I can look into these, but it's much less trivial than these incremental
> improvements. I'm hoping we can clean up these biggest offenders first
> before attempting a more risky synchronization among the different
> services.
There's no synchronization necessary if there's just a flag to be
checked. Of course if aerdrv/dpc or other drivers need to react
immediately on hot removal, we'd need a separate ->hot_remove hook.
Best regards,
Lukas
prev parent reply other threads:[~2016-08-18 16:59 UTC|newest]
Thread overview: 17+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-08-08 19:14 [PATCH 0/3] Limiting pci access requests Keith Busch
2016-08-08 19:14 ` [PATCH 1/3] pcie: Don't search capabilities on removed devices Keith Busch
2016-08-18 22:38 ` Bjorn Helgaas
2016-08-08 19:14 ` [PATCH 2/3] pci/msix: Skip disabling " Keith Busch
2016-08-18 23:29 ` Bjorn Helgaas
2016-08-19 17:11 ` Keith Busch
2016-08-08 19:14 ` [PATCH 3/3] pcie/aer: Cache capability position Keith Busch
2016-08-09 17:33 ` Bjorn Helgaas
2016-09-06 21:05 ` Jon Derrick
2016-09-06 21:18 ` Keith Busch
2016-08-09 17:36 ` [PATCH 0/3] Limiting pci access requests Bjorn Helgaas
2016-08-09 18:56 ` Keith Busch
2016-08-09 18:56 ` Lukas Wunner
2016-08-17 21:05 ` Keith Busch
2016-08-18 14:02 ` Lukas Wunner
2016-08-18 16:05 ` Keith Busch
2016-08-18 16:59 ` Lukas Wunner [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20160818165921.GA10737@wunner.de \
--to=lukas@wunner.de \
--cc=axboe@fb.com \
--cc=helgaas@kernel.org \
--cc=keith.busch@intel.com \
--cc=linux-pci@vger.kernel.org \
--cc=wzhang@fb.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.