public inbox for linux-kernel@vger.kernel.org
From: Benjamin Herrenschmidt <benh@kernel.crashing.org>
To: Benjamin LaHaise <bcrl@kvack.org>
Cc: Jon Mason <mason@myri.com>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Greg Kroah-Hartman <gregkh@suse.de>,
	Jesse Barnes <jbarnes@virtuousgeek.org>,
	Bjorn Helgaas <bhelgaas@google.com>,
	linux-kernel@vger.kernel.org, linux-pci@vger.kernel.org
Subject: Re: [PATCH 2/3] pci: Clamp pcie_set_readrq() when using "performance" settings
Date: Tue, 04 Oct 2011 17:52:02 +0200
Message-ID: <1317743522.29415.225.camel@pasglop>
In-Reply-To: <20111004144215.GE19130@kvack.org>

On Tue, 2011-10-04 at 10:42 -0400, Benjamin LaHaise wrote:
> On Mon, Oct 03, 2011 at 04:55:48PM -0500, Jon Mason wrote:
> > From: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> > 
> > When configuring the PCIe settings for "performance", we allow parents
> > to have a larger Max Payload Size than children and rely on children
> > Max Read Request Size to not be larger than their own MPS to avoid
> > having the host bridge generate responses they can't cope with.
> 
> I'm pretty sure that simply will not work, and is an incorrect understanding 
> of how PCIe bridges and devices interact with regards to transaction size 
> limits. 

Hi Ben !

I beg to disagree :) See below.

>  Here's why: I am actually implementing a PCIe nic on an FPGA at 
> present, and have just been in the process of tuning how memory read 
> requests are issued and processed.  It is perfectly valid for a PCIe 
> endpoint to issue a read request for an entire 4KB block (assuming it 
> respects the no 4KB boundary crossings rule), even when the MPS setting 
> is only 64 or 128 bytes.

But not if the Max Read Request Size of the endpoint is clamped, which
AFAIK is the whole point of the exercise.

>   However, the root complex or PCIe bridge *must 
> not* exceed the Maximum Payload Size for any completions with data or 
> posted writes.  Multiple completions are okay and expected for read 
> requests.  If the MPS on the bridge is set to a larger value than 
> what all of the endpoints connected to it support, the bridge or root complex will 
> happily send read completions exceeding the endpoint's MPS.  This can and 
> will lead to failure on the parts of endpoints.

Hence the clamping of MRRS, which is done by Jon's patch. The patch
referenced here, by me, additionally ensures that drivers that blindly
try to set it back to 4096 are also appropriately limited.
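
To make the clamping concrete, here is a minimal sketch of the idea, not
the code from either patch. The helper names clamp_readrq() and
pcie_size_valid() are hypothetical. It assumes that in the "performance"
configuration a requested MRRS is simply capped at the device's own MPS,
so that no read completion can be larger than what the device accepts:

```c
#include <assert.h>
#include <stdbool.h>

/* PCIe allows MPS and MRRS values of 128, 256, ..., 4096 bytes. */
static bool pcie_size_valid(int bytes)
{
	return bytes >= 128 && bytes <= 4096 && (bytes & (bytes - 1)) == 0;
}

/*
 * Hypothetical helper: return the MRRS to program when a driver asks
 * for 'requested' bytes on a device whose own MPS is 'mps' bytes.
 * The request is never allowed to exceed the device's MPS.
 */
static int clamp_readrq(int requested, int mps)
{
	assert(pcie_size_valid(requested));
	assert(pcie_size_valid(mps));

	return requested > mps ? mps : requested;
}
```

With this, a driver asking for 4096 on a device whose MPS is 128 ends up
with an MRRS of 128, while a request that is already below the MPS goes
through unchanged.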

Note that in practice (though I haven't put that logic in Linux bare
metal yet), pHyp has an additional refinement which is to "know" what
the real max read response of the host bridge is and only clamp the MRRS
if the MPS of the device is lower than that. In practice, that means
that we don't clamp on most high speed adapters as our bridges never
reply with more than 512 bytes in a TLP, but this will require passing
some platform-specific information down, which we don't have at hand
just yet.
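
As a rough sketch of that refinement (this is not pHyp code, and the
function name readrq_limit() and the parameter bridge_max_cpl are made up
for illustration), the decision could look like the following. It assumes
the platform can tell us the largest read completion payload the host
bridge will ever generate:

```c
/*
 * Hypothetical helper: decide the MRRS for a device.
 *
 * requested      - MRRS the driver asked for, in bytes
 * dev_mps        - MPS configured on the device, in bytes
 * bridge_max_cpl - largest read completion payload the host bridge
 *                  ever sends, in bytes (platform-specific knowledge,
 *                  assumed to be available here)
 */
static int readrq_limit(int requested, int dev_mps, int bridge_max_cpl)
{
	/* The bridge never sends more than the device can take: no clamp. */
	if (dev_mps >= bridge_max_cpl)
		return requested;

	/* Otherwise keep each read request within the device's MPS. */
	return requested > dev_mps ? dev_mps : requested;
}
```

For a bridge that never replies with more than 512 bytes, a device with
an MPS of 512 keeps its 4096-byte MRRS, while a device with an MPS of 128
is clamped to 128.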

This is really the only way to avoid bogging everybody down to 128 bytes
if you have one hotplug leg on a switch or one slow device. For example
on some of our machines, if we don't apply that technique, the PCI-X ->
USB leg of the main switch will cause everything to go down to 128
bytes, including the on-board SAS controllers. (The chipset has 6 host
bridges or so but all the on-board stuff is behind a switch on one of
them).
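
For comparison, a sketch of what the alternative amounts to, assuming the
conservative policy programs every device with the smallest MPS supported
anywhere below the host bridge. The function name fabric_wide_mps() and
the capability values in the note below are illustrative only:

```c
#include <stddef.h>

/*
 * Hypothetical helper: return the single MPS that every device below a
 * host bridge would get if all of them had to share the smallest
 * supported value. 'mps_cap' holds each device's supported MPS in bytes.
 */
static int fabric_wide_mps(const int *mps_cap, size_t n)
{
	int mps = 4096;	/* largest MPS PCIe allows */

	for (size_t i = 0; i < n; i++)
		if (mps_cap[i] < mps)
			mps = mps_cap[i];

	return mps;
}
```

With capabilities of 512, 512, 128 and 256 bytes, the single 128-byte
device pulls the whole fabric down to 128.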

Cheers,
Ben.


