linux-pci.vger.kernel.org archive mirror
From: Keith Busch <keith.busch@intel.com>
To: Bjorn Helgaas <helgaas@kernel.org>
Cc: linux-pci@vger.kernel.org, Bjorn Helgaas <bhelgaas@google.com>,
	Lukas Wunner <lukas@wunner.de>, Wei Zhang <wzhang@fb.com>
Subject: Re: [PATCHv4 next 0/3] Limiting pci access
Date: Mon, 12 Dec 2016 19:55:47 -0500
Message-ID: <20161213005547.GA12844@localhost.localdomain>
In-Reply-To: <20161212234226.GA7973@bhelgaas-glaptop.roam.corp.google.com>

On Mon, Dec 12, 2016 at 05:42:27PM -0600, Bjorn Helgaas wrote:
> On Thu, Dec 08, 2016 at 02:32:53PM -0500, Keith Busch wrote:
> > Depending on the device and the driver, there are hundreds to thousands
> > of non-posted transactions submitted to the device to complete driver
> > unbinding and removal. Since the device is gone, hardware has to handle
> > that as an error condition, which is slower than a successful
> > non-posted transaction. Since we're doing 1000 of them for no particular
> > reason, it takes a long time. If you hot remove a switch with multiple
> > downstream devices, the serialized removal adds up to many seconds.
> 
> Another thread mentioned 1-2us as a reasonable config access cost, and
> I'm still a little puzzled about how we get to something on the order
> of a million times that cost.
> 
> I know this is all pretty hand-wavey, but 1000 config accesses to shut
> down a device seems unreasonably high.  The entire config space is
> only 4096 bytes, and most devices use a small fraction of that.  If
> we're really doing 1000 accesses, it sounds like we're doing something
> wrong, like polling without a delay or something.

Every time pci_find_ext_capability is called on a removed device, the
kernel will do 481 failed config space accesses trying to find that
capability. The kernel used to do that multiple times to find the AER
capability under conditions common to surprise removal.

But now that we cache the AER position (commit 66b80809), we've
eliminated by far the worst offender. The counts I'm quoting still
come from the original captured traces showing long teardown times, so
they are not up to date with the most recent version of the kernel.
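
To illustrate where a figure like 481 can come from, here is a
user-space sketch modeled on the extended capability walk. It is not
the kernel source: the function names, the read stub, and the constants
are assumptions made for illustration. It assumes every read from a
removed device returns all ones, and that the walk is bounded by a
budget of one iteration per 8 bytes of extended config space.

```c
#include <stdint.h>

#define CFG_SPACE_SIZE      256   /* conventional config space, bytes */
#define CFG_SPACE_EXP_SIZE  4096  /* PCIe extended config space, bytes */

static unsigned int failed_reads;

/* Hypothetical stand-in for a config read to a removed device:
 * assumed to always come back as all ones. */
static uint32_t read_removed_device(int pos)
{
	(void)pos;
	failed_reads++;
	return 0xFFFFFFFFu;
}

/* Walk the extended capability list looking for 'cap'.
 * Returns the capability offset, or 0 if it is not found. */
static int find_ext_cap_sketch(int cap)
{
	/* minimum 8 bytes per capability, so at most 480 hops */
	int ttl = (CFG_SPACE_EXP_SIZE - CFG_SPACE_SIZE) / 8;
	int pos = CFG_SPACE_SIZE;
	uint32_t header = read_removed_device(pos);

	if (header == 0)	/* an all-zero header means no capabilities */
		return 0;

	while (ttl-- > 0) {
		if ((int)(header & 0xFFFF) == cap)
			return pos;

		/* next pointer: bits 31:20, dword aligned */
		pos = (header >> 20) & 0xFFC;
		if (pos < CFG_SPACE_SIZE)
			break;

		header = read_removed_device(pos);
	}
	return 0;
}

/* Number of failed reads one lookup costs on a removed device. */
static unsigned int count_failed_reads(int cap)
{
	failed_reads = 0;
	(void)find_ext_cap_sketch(cap);
	return failed_reads;
}
```

Under those assumptions, an all-ones header never matches the requested
ID and its next pointer decodes to 0xFFC, which still looks like a valid
offset, so the walk only stops when the budget runs out: one initial
read plus 480 in the loop.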
 
> I measured the cost of config reads during enumeration using the TSC
> on a 2.8GHz CPU and found the following:
> 
>   1580 cycles, 0.565 usec (device present)
>   1230 cycles, 0.440 usec (empty slot)
>   2130 cycles, 0.761 usec (unimplemented function of multi-function device)
> 
> So 1-2usec does seem the right order of magnitude, and my "empty slot"
> error responses are actually *faster* than the "device present" ones,
> which is plausible to me because the Downstream Port can generate the
> error response immediately without sending a packet down the link.
> The "unimplemented function" responses take longer than the "empty
> slot", which makes sense because the Downstream Port does have to send
> a packet to the device, which then complains because it doesn't
> implement that function.
> 
> Of course, these aren't the same case as yours, where the link used to
> be up but is no longer.  Is there some hardware timeout to see if the
> link will come back?

Yes, the hardware does not respond immediately under this test, which
is considered an error condition. This is a reason why PCIe Device
Capabilities 2 Completion Timeout Ranges are recommended to be in the
10ms range.
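
As a rough, hedged estimate of how the per-access cost scales: if each
failed access stalled for a full 10ms completion timeout (an
assumption; the real value depends on the hardware and the timeout
range it is programmed with), one 481-access lookup would take about
4.8 seconds, against roughly half a millisecond at 1us per access. The
helper name below is made up for illustration.

```c
#include <stdint.h>

/* Total stall, in microseconds, for 'accesses' serialized config
 * accesses that each cost 'per_access_us' microseconds.
 * Assumes every access pays the same cost, which is a simplification. */
static uint64_t total_stall_us(uint64_t accesses, uint64_t per_access_us)
{
	return accesses * per_access_us;
}
```

For example, total_stall_us(481, 10000) gives 4810000us (about 4.8s),
while total_stall_us(481, 1) gives 481us.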


