From: Sinan Kaya <okaya@codeaurora.org>
To: Bjorn Helgaas <helgaas@kernel.org>
Cc: Ron Yuan <ron.yuan@memblaze.com>,
Bjorn Helgaas <bhelgaas@google.com>,
Bo Chen <bo.chen@memblaze.com>,
William Huang <william.huang@memblaze.com>,
Fengming Wu <fengming.wu@memblaze.com>,
Jason Jiang <jason.jiang@microsemi.com>,
Radjendirane Codandaramane <radjendirane.codanda@microsemi.com>,
Ramyakanth Edupuganti <Ramyakanth.Edupuganti@microsemi.com>,
William Cheng <william.cheng@microsemi.com>,
"Kim Helper (khelper)" <khelper@micron.com>,
Linux PCI <linux-pci@vger.kernel.org>
Subject: Re: One Question About PCIe BUS Config Type with pcie_bus_safe or pcie_bus_perf On NVMe Device
Date: Mon, 22 Jan 2018 21:27:13 -0500 [thread overview]
Message-ID: <09bd3f5c-4671-d9dd-fa39-4d7619ee5860@codeaurora.org> (raw)
In-Reply-To: <20180123001634.GD5317@bhelgaas-glaptop.roam.corp.google.com>
On 1/22/2018 7:16 PM, Bjorn Helgaas wrote:
> On Mon, Jan 22, 2018 at 06:24:22PM -0500, Sinan Kaya wrote:
>> On 1/22/2018 5:51 PM, Bjorn Helgaas wrote:
>>> On Mon, Jan 22, 2018 at 05:04:03PM -0500, Sinan Kaya wrote:
>>>> On 1/22/2018 4:36 PM, Bjorn Helgaas wrote:
>>
>>>>> Reducing MPS may be necessary if there are several devices in the
>>>>> hierarchy and one requires a smaller MPS than the others. That
>>>>> obviously reduces the maximum read and write performance.
>>>>>
>>>>> Reducing the MRRS may be useful to prevent one device from hogging
>>>>> a link, but of course, it reduces read performance for that device
>>>>> because we need more read requests.
>>>>
>>>> Maybe, a picture could help.
>>>>
>>>> root (MPS=256)
>>>>        |
>>>>   +---------------------+
>>>>   |                     |
>>>> bridge0 (MPS=256)    bridge1 (MPS=128)
>>>>   |                     |
>>>> EP0 (MPS=256)        EP1 (MPS=128)
>>>> If I understood this right, the code allows the configuration above
>>>> in performance mode, so that MPS doesn't have to be uniform across
>>>> the tree.
>>>
>>> Yes. In PERFORMANCE mode, we will set EP1's MRRS=128 and
>>> EP0's MRRS=256, just as you show.
>>>
>>>> It just needs to be consistent between the root port and endpoints.
>>>
>>> No, it doesn't need to be consistent. In PERFORMANCE mode, we'll set
>>> the root's MPS=256 and EP1's MPS=128.
>>>
>>> (I'm not actually 100% convinced that the PERFORMANCE mode approach of
>>> reducing MRRS is safe, necessary, and maintainable. I suspect that in
>>> many of the interesting cases, the device we care about is the only
>>> one below a Root Port, and we can get the performance we need by
>>> maximizing MPS and MRRS for that Root Port and its children,
>>> independent of the rest of the system.)
>>
>> Maybe. I'm seeing more and more NVMe devices behind a switch every
>> day. That's why I'm concerned.
>
> Are there a mix of devices that support large MPS and those that only
> support a smaller MPS behind the same switch?
I guess we need to understand the environment from Ron Yuan. Can we see
the lspci -vvv output of your system?
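Something like the following filter would show the negotiated values. The device
name and numbers below are a made-up capture just to illustrate the output shape;
on a real system, run the lspci pipeline in the first comment directly:

```shell
# On a real system: sudo lspci -vvv | grep -E 'MaxPayload|MaxReadReq'
# The same filter is run here against a hypothetical capture so the
# pipeline can be shown without PCI hardware present.
sample='01:00.0 Non-Volatile memory controller: ExampleCo NVMe SSD
	DevCap:	MaxPayload 256 bytes, PhantFunc 0
	DevCtl:	MaxPayload 128 bytes, MaxReadReq 4096 bytes'
printf '%s\n' "$sample" | grep -E 'MaxPayload|MaxReadReq'
```

DevCap shows what the device supports; DevCtl shows what was actually configured,
which is what matters for the discussion here.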
We could maybe detect the coexistence of slow and fast devices and, as a
best-effort solution, suggest something like moving the card to another
slot.
>
> I don't have any actual data about topologies of interest. I'm just
> guessing that high-performance devices will often have their own root
> port, without being mixed in with low-speed devices. It's OK to have
> switches in the path; it's the mixing high-speed with low-speed
> devices that causes the problems.
Agreed. Mine was just a general rant. I was curious if we could make it better.
>
>>>> Why are we reducing MRRS in this case?
>>>
>>> We have to set EP1's MRRS=128 so it will never receive a completion
>>> larger than 128 bytes. If we set EP1's MRRS=256, it could receive
>>> 256-byte TLPs, which it would treat as malformed. (We also assume no
>>> peer-to-peer DMA that targets EP1.)
>>
>> What if we were to keep the root port MPS at 128 and not touch the
>> BIOS-configured MRRS (4k)?
>>
>> Everybody should be happy, right?
>
> No. In your picture (which helps a lot, thank you!), if you set the
> root's MPS=128, EP1 will be fine no matter what its MRRS is because
> the entire path has MPS=128.
>
> But if EP0's MRRS is larger than 128, it's in trouble because it may
> issue a 256-byte DMA write, which the root port will treat as
> malformed. The root port, as the receiver of that Memory Write
> Request, is required to check the TLP against the MPS in its Device
> Control (not Device Capability) register (this is in PCIe r4.0, sec
> 2.2.2).
Yeah, this breaks my theory.
>
>> I know there is a rule to check the completions against MPS.
>
> Sec 2.2.2 says a receiver must check "data payloads". I think that
> includes both Completions and Memory Writes.
>
>> The root port could generate transactions that are a multiple of 128
>> bytes for reads.
>
> If the root port generates a 256-byte Memory Read request to EP1,
> that's fine because EP1 will only respond with 128-byte completions.
> If it sends that 256-byte Memory Read to EP0, we have a problem
> because EP0 may generate a 256-byte completion, which will cause an
> error if the root port has MPS=128.
>
>> Is there any rule against checking incoming writes?
>
> Sec 2.2.2 says a receiver *must* check the payload size of incoming
> Memory Write requests.
>
--
Sinan Kaya
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.