From: Lukas Wunner <lukas@wunner.de>
To: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
Keith Busch <keith.busch@intel.com>
Cc: linux-pci@vger.kernel.org, Bjorn Helgaas <bhelgaas@google.com>,
Wei Zhang <wzhang@fb.com>, Austin Bolen <Austin.Bolen@dell.com>
Subject: Re: [PATCHv4 next 0/3] Limiting pci access
Date: Sat, 21 Jan 2017 08:31:40 +0100 [thread overview]
Message-ID: <20170121073140.GA26396@wunner.de> (raw)
In-Reply-To: <20170120213550.GA16618@localhost.localdomain>
On Fri, Jan 20, 2017 at 04:35:50PM -0500, Keith Busch wrote:
> On Tue, Dec 13, 2016 at 04:19:32PM -0500, Keith Busch wrote:
> > On Tue, Dec 13, 2016 at 02:50:12PM -0600, Bjorn Helgaas wrote:
> > > And we're apparently still doing a lot of these accesses? I'm still
> > > curious about exactly what these are, because it may be that we're
> > > doing more than necessary.
> >
> > It's the MSI-x masking that's our next highest contributor. Masking
> > vectors still requires non-posted commands, and since they're not going
> > through managed API accessors like config space uses, the removed flag
> > is needed for checking before doing significant MMIO.
>
> Hi Bjorn,
>
> Just wanted to do another check with you on this. We'd still like to fence
> off all config access with appropriate error codes, and short cut the most
> significant MMIO access to improve surprise removal. There may still be
> other offenders, but these are the most important ones we've identified.
>
> I've updated the series to make the new flag an atomic accessor as
> requested, and improved the change logs with more information compelling
> the change. Otherwise it's much the same as before. I know you weren't
> keen on capturing all the access under the umbrella of improving device
> unbinding time, but the general concensus among device makers is that
> it's a good thing to have software return an error early rather than
> send a command we know will fail. Any other thoughts I should consider
> before posting v5?
I think Bjorn was pondering whether a flag to indicate surprise removal
should be put in struct device rather than struct pci_dev, so as to
cover other buses capable of surprise removal. There's already an
"offline" flag in struct device which is set when user space initiates
a safe hot removal via sysfs.
Bjorn cc'ed his e-mails of Dec 13 to Greg KH and Alan Stern but got no
replies.
@Greg KH:
Would you entertain a patch adding a bit to struct device which indicates
the device was surprise removed? The PCIe Hotplug and PCIe Downstream
Port Containment drivers are both able to recognize surprise removal and
can set the bit.
When removing the device we currently perform numerous accesses to config
space in the PCI core. Additionally the driver for the removed device
(e.g. network driver, GPU driver) performs mmio accesses to orderly shut
down the device. E.g. when unplugging the Apple Thunderbolt Ethernet
adapter the kernel currently locks up as the tg3 driver tries to shutdown
the removed device. If we had a bit to indicate surprise removal we could
handle this properly in the PCI core and device driver ->remove hooks.
For comparison, this is what macOS recommends to driver developers:
"PCI device drivers are typically developed with the expectation
that the device will not be removed from the PCI bus during its
operation. However, Thunderbolt technology allows PCI data to be
tunneled through a Thunderbolt connection, and the Thunderbolt
cables may be unplugged from the host or device at any time.
Therefore, the system must be able to cope with the removal of
PCI devices by the user at any time.
The PCI device drivers used with Thunderbolt devices may need to
be updated in order to handle surprise or unplanned removal.
In particular, MMIO cycles and PCI Configuration accesses require
special attention. [...] As a basic guideline, developers should
modify their drivers to handle a return value of 0xFFFFFFFF.
If any thread, callback, interrupt filter, or code path in a
driver receives 0xFFFFFFFF indicating the device has been
unplugged, then all threads, callbacks, interrupt filters,
interrupt handlers, and other code paths in that driver must
cease MMIO reads and writes immediately and prepare for
termination. [...]
Once it has been determined that a device is no longer connected,
do not try to clean up or reset the hardware as attempts to
communicate with the hardware may lead to further delays. [...]
A typical way for a developer to solve this problem is to provide
a single bottleneck routine for all MMIO reads and have that
routine check the status of the device before beginning the actual
transaction."
Source: https://developer.apple.com/library/content/documentation/HardwareDrivers/Conceptual/ThunderboltDevGuide/Basics02/Basics02.html
We lack a comparable functionality and the question is whether to
support it only in the PCI core or in a more general fashion in the
driver core. Other buses (such as USB) have to support surprise
removal as well.
Thanks,
Lukas
next prev parent reply other threads:[~2017-01-21 7:30 UTC|newest]
Thread overview: 42+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-10-28 22:58 [PATCHv4 next 0/3] Limiting pci access Keith Busch
2016-10-28 22:58 ` [PATCHv4 next 1/3] pci: Add is_removed state Keith Busch
2016-10-31 10:41 ` Lukas Wunner
2016-12-13 20:56 ` Bjorn Helgaas
2016-12-13 23:07 ` Keith Busch
2016-12-14 2:50 ` Bjorn Helgaas
2016-12-14 2:54 ` Bjorn Helgaas
2016-12-13 23:54 ` Lukas Wunner
2016-10-28 22:58 ` [PATCHv4 next 2/3] pci: No config access for removed devices Keith Busch
2016-10-31 12:18 ` Lukas Wunner
2016-10-28 22:58 ` [PATCHv4 next 3/3] pci/msix: Skip disabling " Keith Busch
2016-10-31 11:00 ` Lukas Wunner
2016-10-31 13:54 ` Keith Busch
2016-12-13 21:18 ` Bjorn Helgaas
2016-12-13 23:01 ` Keith Busch
2016-11-18 23:25 ` [PATCHv4 next 0/3] Limiting pci access Keith Busch
2016-11-23 16:09 ` Bjorn Helgaas
2016-11-28 9:14 ` Wei Zhang
2016-11-28 10:22 ` Lukas Wunner
2016-11-28 18:02 ` Keith Busch
2016-12-08 17:54 ` Bjorn Helgaas
2016-12-08 19:32 ` Keith Busch
2016-12-12 23:42 ` Bjorn Helgaas
2016-12-13 0:55 ` Keith Busch
2016-12-13 20:50 ` Bjorn Helgaas
2016-12-13 23:18 ` Keith Busch
[not found] ` <B58D82457FDA0744A320A2FC5AC253B93D82F37D@fmsmsx104.amr.corp.intel.com>
[not found] ` <20170120213550.GA16618@localhost.localdomain>
2017-01-21 7:31 ` Lukas Wunner [this message]
2017-01-21 8:42 ` Greg Kroah-Hartman
2017-01-21 14:22 ` Lukas Wunner
2017-01-25 11:47 ` Greg Kroah-Hartman
2017-01-23 16:04 ` Keith Busch
2017-01-25 0:44 ` Austin.Bolen
2017-01-25 21:17 ` Bjorn Helgaas
2017-01-26 1:12 ` Austin.Bolen
2017-02-01 16:04 ` Bjorn Helgaas
2017-02-03 20:30 ` Austin.Bolen
2017-02-03 20:39 ` Greg KH
2017-02-03 21:43 ` Austin.Bolen
2017-01-25 11:48 ` Greg Kroah-Hartman
2017-01-28 7:36 ` Christoph Hellwig
2018-11-13 6:05 ` Bjorn Helgaas
2018-11-13 14:59 ` Keith Busch
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20170121073140.GA26396@wunner.de \
--to=lukas@wunner.de \
--cc=Austin.Bolen@dell.com \
--cc=bhelgaas@google.com \
--cc=gregkh@linuxfoundation.org \
--cc=keith.busch@intel.com \
--cc=linux-pci@vger.kernel.org \
--cc=wzhang@fb.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).