linux-pci.vger.kernel.org archive mirror
From: Keith Busch <keith.busch@intel.com>
To: Bjorn Helgaas <helgaas@kernel.org>
Cc: linux-pci@vger.kernel.org, Bjorn Helgaas <bhelgaas@google.com>,
	Lukas Wunner <lukas@wunner.de>, Wei Zhang <wzhang@fb.com>
Subject: Re: [PATCHv4 next 0/3] Limiting pci access
Date: Tue, 13 Dec 2016 18:18:40 -0500	[thread overview]
Message-ID: <20161213231840.GD12113@localhost.localdomain> (raw)
In-Reply-To: <20161213205012.GA29950@bhelgaas-glaptop.roam.corp.google.com>

On Tue, Dec 13, 2016 at 02:50:12PM -0600, Bjorn Helgaas wrote:
> On Mon, Dec 12, 2016 at 07:55:47PM -0500, Keith Busch wrote:
> > On Mon, Dec 12, 2016 at 05:42:27PM -0600, Bjorn Helgaas wrote:
> > > On Thu, Dec 08, 2016 at 02:32:53PM -0500, Keith Busch wrote:
> > > > Depending on the device and the driver, there are hundreds to thousands
> > > > of non-posted transactions submitted to the device to complete driver
> > > > unbinding and removal. Since the device is gone, hardware has to handle
> > > > that as an error condition, which is slower than a successful
> > > > non-posted transaction. Since we're doing 1000 of them for no particular
> > > > reason, it takes a long time. If you hot remove a switch with multiple
> > > > downstream devices, the serialized removal adds up to many seconds.
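
(For scale, a rough back-of-the-envelope using the ~10ms completion
timeout that comes up later in this thread: if each of ~1000 failed
non-posted accesses has to wait out a timeout on that order, and they
are all serialized, that's on the order of 10 seconds per device, which
lines up with the multi-second removal times described above.)
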
> > > 
> > > Another thread mentioned 1-2us as a reasonable config access cost, and
> > > I'm still a little puzzled about how we get to something on the order
> > > of a million times that cost.
> > > 
> > > I know this is all pretty hand-wavey, but 1000 config accesses to shut
> > > down a device seems unreasonably high.  The entire config space is
> > > only 4096 bytes, and most devices use a small fraction of that.  If
> > > we're really doing 1000 accesses, it sounds like we're doing something
> > > wrong, like polling without a delay or something.
> > 
> > Every time pci_find_ext_capability is called on a removed device, the
> > kernel will do 481 failed config space accesses trying to find that
> > capability. The kernel used to do that multiple times to find the AER
> > capability under conditions common to surprise removal.
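
To make the 481 figure concrete, here's a rough sketch of that walk,
modeled on the upstream pci_find_ext_capability() loop (not a verbatim
copy; details vary by kernel version):

  #include <linux/pci.h>

  /*
   * On a removed device every config read comes back as all ones, so
   * PCI_EXT_CAP_ID() never matches, PCI_EXT_CAP_NEXT() decodes to 0xffc
   * instead of 0, and the walk only stops when the ttl budget of
   * (4096 - 256) / 8 = 480 entries runs out: 1 + 480 = 481 failed
   * non-posted reads.
   */
  static int find_ext_cap_sketch(struct pci_dev *dev, int cap)
  {
          int ttl = (PCI_CFG_SPACE_EXP_SIZE - PCI_CFG_SPACE_SIZE) / 8;
          int pos = PCI_CFG_SPACE_SIZE;           /* 0x100 */
          u32 header;

          if (pci_read_config_dword(dev, pos, &header))
                  return 0;
          if (header == 0)                        /* all ones is *not* caught */
                  return 0;

          while (ttl-- > 0) {
                  if (PCI_EXT_CAP_ID(header) == cap)
                          return pos;
                  pos = PCI_EXT_CAP_NEXT(header); /* 0xffc when header is ~0 */
                  if (pos < PCI_CFG_SPACE_SIZE)
                          break;
                  if (pci_read_config_dword(dev, pos, &header))
                          break;
          }
          return 0;
  }
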
> 
> Right, that's a perfect example.  I'd rather fix issues like this by
> caching the position as you did with AER.  The "removed" bit makes
> these issues "go away" without addressing the underlying problem.
> 
> We might still need a "removed" bit for other reasons, but I want to
> be clear about those reasons, not just throw it in under the general
> "make it go fast" umbrella.
> 
> > But now that we cache the AER position (commit: 66b80809), we've
> > eliminated by far the worst offender. The counts I quoted above still
> > come from the original captured traces showing long tear-down times,
> > so they're not up to date with the most recent version of the kernel.
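
The caching fix amounts to doing that walk once at enumeration time,
while the device is still reachable, and reusing the saved offset from
then on. Roughly (the aer_cap field and helper name below follow my
recollection of what 66b80809 added, so treat them as illustrative):

  #include <linux/pci.h>

  /* Look the AER capability up once and stash the offset in the
   * pci_dev, so error paths after a surprise removal never have to
   * re-walk extended config space on a dead link. */
  static void cache_aer_cap_sketch(struct pci_dev *dev)
  {
          dev->aer_cap = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_ERR);
  }
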
> >  
> > > I measured the cost of config reads during enumeration using the TSC
> > > on a 2.8GHz CPU and found the following:
> > > 
> > >   1580 cycles, 0.565 usec (device present)
> > >   1230 cycles, 0.440 usec (empty slot)
> > >   2130 cycles, 0.761 usec (unimplemented function of multi-function device)
> > > 
> > > So 1-2usec does seem the right order of magnitude, and my "empty slot"
> > > error responses are actually *faster* than the "device present" ones,
> > > which is plausible to me because the Downstream Port can generate the
> > > error response immediately without sending a packet down the link.
> > > The "unimplemented function" responses take longer than the "empty
> > > slot", which makes sense because the Downstream Port does have to send
> > > a packet to the device, which then complains because it doesn't
> > > implement that function.
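
For what it's worth, a measurement like that can be reproduced with
something along these lines (illustrative only; assumes x86 and a fixed
2.8GHz TSC for the cycles-to-time conversion):

  #include <linux/pci.h>
  #include <asm/msr.h>

  /* Bracket a single config dword read with the TSC and convert the
   * cycle count to time at 2.8 cycles per nanosecond. */
  static void time_config_read_sketch(struct pci_dev *dev)
  {
          u64 t0, t1;
          u32 val;

          t0 = rdtsc();
          pci_read_config_dword(dev, PCI_VENDOR_ID, &val);
          t1 = rdtsc();

          pr_info("config read: %llu cycles (~%llu ns at 2.8GHz)\n",
                  t1 - t0, (t1 - t0) * 10 / 28);
  }
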
> > > 
> > > Of course, these aren't the same case as yours, where the link used to
> > > be up but is no longer.  Is there some hardware timeout to see if the
> > > link will come back?
> > 
> > Yes, the hardware does not respond immediately under this test, which
> > is considered an error condition. This is a reason why PCIe Device
> > Capabilities 2 Completion Timeout Ranges are recommended to be in the
> > 10ms range.
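
Those advertised ranges can be read straight out of Device Capabilities
2 to confirm what a given port supports; a minimal sketch (the 0xf mask
is the Completion Timeout Ranges Supported field from the spec; I'm not
sure pci_regs.h has a named macro for it):

  #include <linux/pci.h>

  /* Dump the Completion Timeout Ranges Supported field (DEVCAP2[3:0]).
   * Each set bit advertises a supported range, from 50us-10ms (A) up
   * to 4s-64s (D); 0 means only the default 50us-50ms range. */
  static void show_cto_ranges_sketch(struct pci_dev *dev)
  {
          u32 cap2;

          pcie_capability_read_dword(dev, PCI_EXP_DEVCAP2, &cap2);
          dev_info(&dev->dev, "completion timeout ranges supported: %#x\n",
                   cap2 & 0xf);
  }
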
> 
> And we're apparently still doing a lot of these accesses?  I'm still
> curious about exactly what these are, because it may be that we're
> doing more than necessary.

It's the MSI-X masking that's our next highest contributor. Masking
vectors still requires non-posted transactions, and since those don't go
through managed accessors the way config space accesses do, the removed
flag is needed to check the device before doing significant MMIO.
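
As a concrete illustration of what the flag buys us in that path, a
minimal sketch of the guard (the helper and function names below are
illustrative, not the actual patch):

  #include <linux/pci.h>
  #include <linux/io.h>

  /* Skip MSI-X vector masking when the device is already gone: the
   * MMIO write would be dropped and the readback would have to wait
   * out a completion timeout for every vector. */
  static void msix_mask_vector_sketch(struct pci_dev *dev,
                                      void __iomem *vector_ctrl)
  {
          if (pci_dev_is_disconnected(dev))
                  return;

          writel(1, vector_ctrl);         /* bit 0 = vector mask bit */
          readl(vector_ctrl);             /* non-posted readback flushes it */
  }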


Thread overview: 42+ messages
2016-10-28 22:58 [PATCHv4 next 0/3] Limiting pci access Keith Busch
2016-10-28 22:58 ` [PATCHv4 next 1/3] pci: Add is_removed state Keith Busch
2016-10-31 10:41   ` Lukas Wunner
2016-12-13 20:56   ` Bjorn Helgaas
2016-12-13 23:07     ` Keith Busch
2016-12-14  2:50       ` Bjorn Helgaas
2016-12-14  2:54         ` Bjorn Helgaas
2016-12-13 23:54     ` Lukas Wunner
2016-10-28 22:58 ` [PATCHv4 next 2/3] pci: No config access for removed devices Keith Busch
2016-10-31 12:18   ` Lukas Wunner
2016-10-28 22:58 ` [PATCHv4 next 3/3] pci/msix: Skip disabling " Keith Busch
2016-10-31 11:00   ` Lukas Wunner
2016-10-31 13:54     ` Keith Busch
2016-12-13 21:18   ` Bjorn Helgaas
2016-12-13 23:01     ` Keith Busch
2016-11-18 23:25 ` [PATCHv4 next 0/3] Limiting pci access Keith Busch
2016-11-23 16:09   ` Bjorn Helgaas
2016-11-28  9:14     ` Wei Zhang
2016-11-28 10:22       ` Lukas Wunner
2016-11-28 18:02     ` Keith Busch
2016-12-08 17:54       ` Bjorn Helgaas
2016-12-08 19:32         ` Keith Busch
2016-12-12 23:42           ` Bjorn Helgaas
2016-12-13  0:55             ` Keith Busch
2016-12-13 20:50               ` Bjorn Helgaas
2016-12-13 23:18                 ` Keith Busch [this message]
     [not found]                   ` <B58D82457FDA0744A320A2FC5AC253B93D82F37D@fmsmsx104.amr.corp.intel.com>
     [not found]                     ` <20170120213550.GA16618@localhost.localdomain>
2017-01-21  7:31                       ` Lukas Wunner
2017-01-21  8:42                         ` Greg Kroah-Hartman
2017-01-21 14:22                           ` Lukas Wunner
2017-01-25 11:47                             ` Greg Kroah-Hartman
2017-01-23 16:04                           ` Keith Busch
2017-01-25  0:44                             ` Austin.Bolen
2017-01-25 21:17                               ` Bjorn Helgaas
2017-01-26  1:12                                 ` Austin.Bolen
2017-02-01 16:04                                   ` Bjorn Helgaas
2017-02-03 20:30                                     ` Austin.Bolen
2017-02-03 20:39                                       ` Greg KH
2017-02-03 21:43                                     ` Austin.Bolen
2017-01-25 11:48                             ` Greg Kroah-Hartman
2017-01-28  7:36                             ` Christoph Hellwig
2018-11-13  6:05                   ` Bjorn Helgaas
2018-11-13 14:59                     ` Keith Busch
