about mpss with pcie_bus


* about mpss with pcie_bus_perf
@ 2014-01-10  0:46 Yinghai Lu
  2014-01-14 22:54 ` Bjorn Helgaas
  0 siblings, 1 reply; 7+ messages in thread
From: Yinghai Lu @ 2014-01-10  0:46 UTC
  To: Myron Stowe, Bjorn Helgaas, linux-pci@vger.kernel.org

looks like we have some problem with MPSS.

+-02.2-[10-1f]----00.0-[11-13]--+-02.0-[12]--+-00.0
|                               |            \-00.1
|                               \-03.0-[13]----00.0

Kernel booted with pcie_bus_perf:
00:02.2: cap/ctl: 256/256
10:00.0: cap/ctl: 256/256
11:02.0: cap/ctl: 256/256
12:00.0: cap/ctl: 128/128
12:00.1: cap/ctl: 128/128

11:03.0: cap/ctl: 256/256
13:00.0: cap/ctl: 256/256

Should we set MPSS to 128?

Thanks

Yinghai
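
For readers decoding the cap/ctl pairs in the log above: "cap" is the
Max_Payload_Size Supported field (MPSS, Device Capabilities bits 2:0)
and "ctl" is the Max_Payload_Size field (MPS, Device Control bits 7:5).
Both, like MRRS (Device Control bits 14:12), encode a size of 128 << n
bytes. A minimal sketch follows; the mask names mirror the kernel's
pci_regs.h, while the helper functions and the sample register values
are hypothetical and only illustrate the encoding.

```python
# Field masks, named after the constants in include/uapi/linux/pci_regs.h.
PCI_EXP_DEVCAP_PAYLOAD = 0x0007  # Device Capabilities, bits 2:0 (MPSS)
PCI_EXP_DEVCTL_PAYLOAD = 0x00E0  # Device Control, bits 7:5 (MPS)
PCI_EXP_DEVCTL_READRQ = 0x7000   # Device Control, bits 14:12 (MRRS)


def decode_mpss(devcap):
    """Largest payload the device is capable of handling, in bytes."""
    return 128 << (devcap & PCI_EXP_DEVCAP_PAYLOAD)


def decode_mps(devctl):
    """Payload size the device is currently programmed to use, in bytes."""
    return 128 << ((devctl & PCI_EXP_DEVCTL_PAYLOAD) >> 5)


def decode_mrrs(devctl):
    """Largest read request the device may issue, in bytes."""
    return 128 << ((devctl & PCI_EXP_DEVCTL_READRQ) >> 12)


# Made-up register values for illustration only.
devcap, devctl = 0x0001, 0x2020
print(decode_mpss(devcap), decode_mps(devctl), decode_mrrs(devctl))
```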


* Re: about mpss with pcie_bus_perf
  2014-01-10  0:46 about mpss with pcie_bus_perf Yinghai Lu
@ 2014-01-14 22:54 ` Bjorn Helgaas
  2014-01-15  0:34   ` Jon Mason
  0 siblings, 1 reply; 7+ messages in thread
From: Bjorn Helgaas @ 2014-01-14 22:54 UTC
  To: Yinghai Lu; +Cc: Myron Stowe, linux-pci@vger.kernel.org, Yijing Wang, Jon Mason

[+cc Jon, Yijing]

On Thu, Jan 9, 2014 at 5:46 PM, Yinghai Lu <yinghai@kernel.org> wrote:
> looks like we have some problem with MPSS.
>
> +-02.2-[10-1f]----00.0-[11-13]--+-02.0-[12]--+-00.0
>                            |                      |                   \-00.1
>                            |                      \-03.0-[13]----00.0
>
> kernel boot with pce_bus_perf:
> 00:02.2: cap/ctl: 256/256
> 10:00.0: cap/ctl: 256/256
> 11:02.0: cap/ctl: 256/256
> 12:00.0: cap/ctl: 128/128
> 12:00.1: cap/ctl: 128/128
>
> 11:03.0: cap/ctl: 256/256
> 13:00.0: cap/ctl: 256/256
>
> Should we set MPSS to 128?

Please propose a patch and/or open a bug report.  I don't do enough
with MPS to make the problem and its solution immediately obvious to
me.

Bjorn


* Re: about mpss with pcie_bus_perf
  2014-01-14 22:54 ` Bjorn Helgaas
@ 2014-01-15  0:34   ` Jon Mason
  2014-01-15  2:12     ` Yijing Wang
  0 siblings, 1 reply; 7+ messages in thread
From: Jon Mason @ 2014-01-15  0:34 UTC
  To: Bjorn Helgaas
  Cc: Yinghai Lu, Myron Stowe, linux-pci@vger.kernel.org, Yijing Wang,
	Jon Mason

On Tue, Jan 14, 2014 at 3:54 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
> [+cc Jon, Yijing]
>
> On Thu, Jan 9, 2014 at 5:46 PM, Yinghai Lu <yinghai@kernel.org> wrote:
>> looks like we have some problem with MPSS.
>>
>> +-02.2-[10-1f]----00.0-[11-13]--+-02.0-[12]--+-00.0
>>                            |                      |                   \-00.1
>>                            |                      \-03.0-[13]----00.0
>>
>> kernel boot with pce_bus_perf:
>> 00:02.2: cap/ctl: 256/256
>> 10:00.0: cap/ctl: 256/256
>> 11:02.0: cap/ctl: 256/256
>> 12:00.0: cap/ctl: 128/128
>> 12:00.1: cap/ctl: 128/128
>>
>> 11:03.0: cap/ctl: 256/256
>> 13:00.0: cap/ctl: 256/256
>>
>> Should we set MPSS to 128?
>
> Please propose a patch and/or open a bug report.  I don't do enough
> with MPS to make the problem and its solution immediately obvious to
> me.

Not a lot of verbiage in here, but I believe this is the expected
behavior for the "pcie_bus_perf" kernel boot parameter.  With it, each
PCI device sets its MPS to the largest value it supports that does not
exceed its parent's MPS.

From the commit log:

    - A more optimal way is possible, if it falls within a couple of
      constraints:
    * The top-level host bridge will never generate packets larger than the
      smallest TLP (or if it can be controlled independently from its MPS at
      least)
    * The device will never generate packets larger than MPS (which can be
      configured via MRRS)
    * No support of direct PCI-E <-> PCI-E transfers between devices without
      some additional code to specifically deal with that case

    Then we can use an approach that basically ignores downstream requests
    and focuses exclusively on upstream requests. In that case, all we need
    to care about is that a device MPS is no larger than its parent MPS,
    which allows us to keep all switches/bridges to the max MPS supported by
    their parent and eventually the PHB.


If this is not behaving as described (which I can't tell from the log
above), then feel free to assign the bug to me.

Thanks,
Jon
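
The rule described in the message above can be modelled in a few lines.
This is a simplified sketch and not the kernel code: it assumes each
device takes the smaller of its own MPSS and its parent's MPS, and the
function name and the dictionaries are hypothetical. The MPSS values and
the parent links are taken from the topology and log earlier in the
thread.

```python
def assign_mps_perf(mpss, parent):
    """Return {device: MPS} where MPS = min(own MPSS, parent's MPS).

    mpss:   {device: max payload size supported, in bytes}
    parent: {device: upstream device, or None for the root port}
    """
    mps = {}

    def resolve(dev):
        if dev not in mps:
            up = parent[dev]
            # The root port uses its own MPSS; everything below it is
            # capped by whatever its parent ended up with.
            mps[dev] = mpss[dev] if up is None else min(mpss[dev], resolve(up))
        return mps[dev]

    for dev in mpss:
        resolve(dev)
    return mps


# Topology and "cap" values from the log in this thread.
mpss = {"00:02.2": 256, "10:00.0": 256, "11:02.0": 256, "12:00.0": 128,
        "12:00.1": 128, "11:03.0": 256, "13:00.0": 256}
parent = {"00:02.2": None, "10:00.0": "00:02.2", "11:02.0": "10:00.0",
          "12:00.0": "11:02.0", "12:00.1": "11:02.0",
          "11:03.0": "10:00.0", "13:00.0": "11:03.0"}

for dev, size in assign_mps_perf(mpss, parent).items():
    print(f"{dev}: cap/ctl: {mpss[dev]}/{size}")
```

Under this model the two functions on bus 12 stay at 128 while every
bridge above them keeps 256, which agrees with the "ctl" column that was
reported.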



>
> Bjorn


* Re: about mpss with pcie_bus_perf
  2014-01-15  0:34   ` Jon Mason
@ 2014-01-15  2:12     ` Yijing Wang
  2014-01-15 18:18       ` Jon Mason
  0 siblings, 1 reply; 7+ messages in thread
From: Yijing Wang @ 2014-01-15  2:12 UTC
  To: Jon Mason, Bjorn Helgaas
  Cc: Yinghai Lu, Myron Stowe, linux-pci@vger.kernel.org, Jon Mason

On 2014/1/15 8:34, Jon Mason wrote:
> On Tue, Jan 14, 2014 at 3:54 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
>> [+cc Jon, Yijing]
>>
>> On Thu, Jan 9, 2014 at 5:46 PM, Yinghai Lu <yinghai@kernel.org> wrote:
>>> looks like we have some problem with MPSS.
>>>
>>> +-02.2-[10-1f]----00.0-[11-13]--+-02.0-[12]--+-00.0
>>>                            |    |            \-00.1
>>>                            |    \-03.0-[13]----00.0
>>>
>>> kernel boot with pce_bus_perf:
>>> 00:02.2: cap/ctl: 256/256
>>> 10:00.0: cap/ctl: 256/256
>>> 11:02.0: cap/ctl: 256/256
>>> 12:00.0: cap/ctl: 128/128
>>> 12:00.1: cap/ctl: 128/128
>>>
>>> 11:03.0: cap/ctl: 256/256
>>> 13:00.0: cap/ctl: 256/256
>>>
>>> Should we set MPSS to 128?
>>
>> Please propose a patch and/or open a bug report.  I don't do enough
>> with MPS to make the problem and its solution immediately obvious to
>> me.
> 
> Not a lot of verbiage in here, but I believe this is the expected
> behavior for the "pcie_bus_perf" kernel boot parm.  With it, each pci
> device sets its MPS to the max of the parent

Yes, this is the expected behavior for "pcie_bus_perf".
In addition, pcie_write_mrrs() sets MRRS to the largest supported value, to be safe.

> 
>>From the commit log:
> 
>     - A more optimal way is possible, if it falls within a couple of
>       constraints:
>     * The top-level host bridge will never generate packets larger than the
>       smallest TLP (or if it can be controlled independently from its MPS at
>       least)
>     * The device will never generate packets larger than MPS (which can be
>       configured via MRRS)
>     * No support of direct PCI-E <-> PCI-E transfers between devices without
>       some additional code to specifically deal with that case
> 
>     Then we can use an approach that basically ignores downstream requests
>     and focuses exclusively on upstream requests. In that case, all we need

Hi Jon, I do not quite understand why we can ignore downstream traffic here. Take a model like Yinghai's PCIe topology:

mps 256           mps 256               mps 256
root port ------Switch port(UP) -------Switch port(DP) A   --------PCIe Endpoint Device ( mps = 128)
                               |                          <-------Read Requests going upstream are safe because MRRS is set to a proper value.
                               |                          <-------As a transmitter, the endpoint's TLP payload won't exceed its MPS (128), so this is also safe.
                               |                          -------->Downstream TLPs, such as read completions and writes to the PCIe EP device
                               |                                   My question is here: how can we ensure downstream is safe?
                               |
                               |
                               |-------Switch port(DP) B

Sorry to disturb you. I would appreciate any advice you can give me. Thanks!

>     to care about is that a device MPS is no larger than its parent MPS,
>     which allows us to keep all switches/bridges to the max MPS supported by
>     their parent and eventually the PHB.
> 
> 
> If this is not behaving as described (which I can't tell from the log
> above), then feel free to assign the bug to me.
> 
> Thanks,
> Jon
> 
> 
> 
>>
>> Bjorn


-- 
Thanks!
Yijing



* Re: about mpss with pcie_bus_perf
  2014-01-15  2:12     ` Yijing Wang
@ 2014-01-15 18:18       ` Jon Mason
  2014-01-16  1:56         ` Yijing Wang
  2014-01-16  4:27         ` Yinghai Lu
  0 siblings, 2 replies; 7+ messages in thread
From: Jon Mason @ 2014-01-15 18:18 UTC
  To: Yijing Wang
  Cc: Bjorn Helgaas, Yinghai Lu, Myron Stowe, linux-pci@vger.kernel.org,
	Jon Mason

On Tue, Jan 14, 2014 at 7:12 PM, Yijing Wang <wangyijing@huawei.com> wrote:
> On 2014/1/15 8:34, Jon Mason wrote:
>> On Tue, Jan 14, 2014 at 3:54 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
>>> [+cc Jon, Yijing]
>>>
>>> On Thu, Jan 9, 2014 at 5:46 PM, Yinghai Lu <yinghai@kernel.org> wrote:
>>>> looks like we have some problem with MPSS.
>>>>
>>>> +-02.2-[10-1f]----00.0-[11-13]--+-02.0-[12]--+-00.0
>>>>                            |    |            \-00.1
>>>>                            |    \-03.0-[13]----00.0
>>>>
>>>> kernel boot with pce_bus_perf:
>>>> 00:02.2: cap/ctl: 256/256
>>>> 10:00.0: cap/ctl: 256/256
>>>> 11:02.0: cap/ctl: 256/256
>>>> 12:00.0: cap/ctl: 128/128
>>>> 12:00.1: cap/ctl: 128/128
>>>>
>>>> 11:03.0: cap/ctl: 256/256
>>>> 13:00.0: cap/ctl: 256/256
>>>>
>>>> Should we set MPSS to 128?
>>>
>>> Please propose a patch and/or open a bug report.  I don't do enough
>>> with MPS to make the problem and its solution immediately obvious to
>>> me.
>>
>> Not a lot of verbiage in here, but I believe this is the expected
>> behavior for the "pcie_bus_perf" kernel boot parm.  With it, each pci
>> device sets its MPS to the max of the parent
>
> Yes, it's the expected behavior for the "pcie_bus_per".
> Pcie_write_mrrs() will additionally set mrrs to largest supported value for safe.
>
>>
>>>From the commit log:
>>
>>     - A more optimal way is possible, if it falls within a couple of
>>       constraints:
>>     * The top-level host bridge will never generate packets larger than the
>>       smallest TLP (or if it can be controlled independently from its MPS at
>>       least)
>>     * The device will never generate packets larger than MPS (which can be
>>       configured via MRRS)
>>     * No support of direct PCI-E <-> PCI-E transfers between devices without
>>       some additional code to specifically deal with that case
>>
>>     Then we can use an approach that basically ignores downstream requests
>>     and focuses exclusively on upstream requests. In that case, all we need
>
> Hi Jon, I do not quite understand why we can ignores downstream here , as a model like Yinghai's pcie topo:
>
> mps 256           mps 256               mps 256
> root port ------Switch port(UP) -------Switch port(DP) A   --------PCIe Endpoint Device ( mps = 128)
>                                |                          <-------Read Request to upstream is safe because MRRS is set to properly value.
>                                |                          <-------TLP payload won't excess (mps=128) as a transmitter, so this is also safe.
>                                |                          -------->Downstream TLP like read completion and some other TLP write to PCIe EP device
>                                |                                   My question is here, how can we ensure Downstream is safe?
>                                |
>                                |
>                                |-------Switch port(DP) B
>
> Sorry to disturb you, I would be appreciate if you can me any advice. Thanks!

If all inter-device communication is removed, then the only
communication is CPU, Endpoint, and switches in-between.  Going from
CPU to Endpoint, the MPS is actually going to be the Cache Line size.
Since the Cache line size is 64B on x86 and most other architectures,
there is no worry that the endpoint will get a PCIE packet larger than
the MPS.  Also, using the MRRS to clamp down the endpoint to the MPS
of the switches should ensure no reads larger than the MPS.  Going
from Endpoint to CPU, we must ensure that all switches have a MPSS
large enough for any device under them.  If not, then we must clamp
down the Endpoint MPS.

If all of this works, then we can ensure a much larger MPS for all of
the PCI devices under a switch and not be bound by the smallest MPSS
of an endpoint on the switch.

Thanks,
Jon
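
The argument in the message above can be written down as a small check.
This is only a sketch under the stated assumptions: no peer-to-peer
traffic, CPU writes of at most one cache line, and read completions
bounded both by the endpoint's MRRS and by the MPS of the completer
(modelled here as the root port). The function name and parameters are
hypothetical.

```python
def perf_mode_is_safe(path_mps, ep_mps, ep_mrrs, cache_line=64):
    """Check both traffic directions between the CPU and one endpoint.

    path_mps: MPS of each port between CPU and endpoint, root port first.
    ep_mps:   MPS programmed into the endpoint, in bytes.
    ep_mrrs:  MRRS programmed into the endpoint, in bytes.
    """
    # Upstream: the endpoint never sends a payload larger than its own
    # MPS, so every port above it must accept at least that much.
    upstream_ok = all(ep_mps <= mps for mps in path_mps)

    # Downstream: the endpoint receives CPU writes (assumed to be at most
    # one cache line) and completions for its own read requests.
    largest_completion = min(ep_mrrs, path_mps[0])
    downstream_ok = max(cache_line, largest_completion) <= ep_mps

    return upstream_ok and downstream_ok


# Switch ports at 256, endpoint at 128 with MRRS clamped to its MPS.
print(perf_mode_is_safe([256, 256, 256], ep_mps=128, ep_mrrs=128))
# Same topology, but the endpoint may request 512-byte reads.
print(perf_mode_is_safe([256, 256, 256], ep_mps=128, ep_mrrs=512))
```

In this model the second case fails because a completion could carry up
to 256 bytes to an endpoint that accepts only 128, which is why the MRRS
clamp matters when the MPS values along the path are not uniform.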

>
>>     to care about is that a device MPS is no larger than its parent MPS,
>>     which allows us to keep all switches/bridges to the max MPS supported by
>>     their parent and eventually the PHB.
>>
>>
>> If this is not behaving as described (which I can't tell from the log
>> above), then feel free to assign the bug to me.
>>
>> Thanks,
>> Jon
>>
>>
>>
>>>
>>> Bjorn
>
>
> --
> Thanks!
> Yijing
>


* Re: about mpss with pcie_bus_perf
  2014-01-15 18:18       ` Jon Mason
@ 2014-01-16  1:56         ` Yijing Wang
  2014-01-16  4:27         ` Yinghai Lu
  1 sibling, 0 replies; 7+ messages in thread
From: Yijing Wang @ 2014-01-16  1:56 UTC
  To: Jon Mason
  Cc: Bjorn Helgaas, Yinghai Lu, Myron Stowe, linux-pci@vger.kernel.org,
	Jon Mason

>>> Not a lot of verbiage in here, but I believe this is the expected
>>> behavior for the "pcie_bus_perf" kernel boot parm.  With it, each pci
>>> device sets its MPS to the max of the parent
>>
>> Yes, it's the expected behavior for the "pcie_bus_per".
>> Pcie_write_mrrs() will additionally set mrrs to largest supported value for safe.
>>
>>>
>>> >From the commit log:
>>>
>>>     - A more optimal way is possible, if it falls within a couple of
>>>       constraints:
>>>     * The top-level host bridge will never generate packets larger than the
>>>       smallest TLP (or if it can be controlled independently from its MPS at
>>>       least)
>>>     * The device will never generate packets larger than MPS (which can be
>>>       configured via MRRS)
>>>     * No support of direct PCI-E <-> PCI-E transfers between devices without
>>>       some additional code to specifically deal with that case
>>>
>>>     Then we can use an approach that basically ignores downstream requests
>>>     and focuses exclusively on upstream requests. In that case, all we need
>>
>> Hi Jon, I do not quite understand why we can ignores downstream here , as a model like Yinghai's pcie topo:
>>
>> mps 256           mps 256               mps 256
>> root port ------Switch port(UP) -------Switch port(DP) A   --------PCIe Endpoint Device ( mps = 128)
>>                                |                          <-------Read Request to upstream is safe because MRRS is set to properly value.
>>                                |                          <-------TLP payload won't excess (mps=128) as a transmitter, so this is also safe.
>>                                |                          -------->Downstream TLP like read completion and some other TLP write to PCIe EP device
>>                                |                                   My question is here, how can we ensure Downstream is safe?
>>                                |
>>                                |
>>                                |-------Switch port(DP) B
>>
>> Sorry to disturb you, I would be appreciate if you can me any advice. Thanks!
> 
> If all inter-device communication is removed, then the only
> communication is CPU, Endpoint, and switches in-between.  Going from
> CPU to Endpoint, the MPS is actually going to be the Cache Line size.
> Since the Cache line size is 64B on x86 and most other architectures,
> there is no worry that the endpoint will get a PCIE packet larger than
> the MPS.  Also, using the MRRS to clamp down the endpoint to the MPS
> of the switches should ensure no reads larger than the MPS.  Going
> from Endpoint to CPU, we must ensure that all switches have a MPSS
> large enough for any device under them.  If not, then we must clamp
> down the Endpoint MPS.
> 
> If all of this works, then we can ensure a much larger MPS for all of
> the PCI devices under a switch and not be bound by the smallest MPSS
> of an endpoint on the switch.
> 

Hi Jon, I got it. Thanks for your explanation.


Thanks!
Yijing.



>>
>>>     to care about is that a device MPS is no larger than its parent MPS,
>>>     which allows us to keep all switches/bridges to the max MPS supported by
>>>     their parent and eventually the PHB.
>>>
>>>
>>> If this is not behaving as described (which I can't tell from the log
>>> above), then feel free to assign the bug to me.
>>>
>>> Thanks,
>>> Jon
>>>
>>>
>>>
>>>>
>>>> Bjorn
>>
>>
>> --
>> Thanks!
>> Yijing
>>
> 
> .
> 


-- 
Thanks!
Yijing



* Re: about mpss with pcie_bus_perf
  2014-01-15 18:18       ` Jon Mason
  2014-01-16  1:56         ` Yijing Wang
@ 2014-01-16  4:27         ` Yinghai Lu
  1 sibling, 0 replies; 7+ messages in thread
From: Yinghai Lu @ 2014-01-16  4:27 UTC
  To: Jon Mason
  Cc: Yijing Wang, Bjorn Helgaas, Myron Stowe,
	linux-pci@vger.kernel.org, Jon Mason

On Wed, Jan 15, 2014 at 10:18 AM, Jon Mason <jdmason@kudzu.us> wrote:
>
> If all inter-device communication is removed, then the only
> communication is CPU, Endpoint, and switches in-between.  Going from
> CPU to Endpoint, the MPS is actually going to be the Cache Line size.
> Since the Cache line size is 64B on x86 and most other architectures,
> there is no worry that the endpoint will get a PCIE packet larger than
> the MPS.  Also, using the MRRS to clamp down the endpoint to the MPS
> of the switches should ensure no reads larger than the MPS.  Going
> from Endpoint to CPU, we must ensure that all switches have a MPSS
> large enough for any device under them.  If not, then we must clamp
> down the Endpoint MPS.
>
> If all of this works, then we can ensure a much larger MPS for all of
> the PCI devices under a switch and not be bound by the smallest MPSS
> of an endpoint on the switch.

I'm confused by the above statement.

On a system with PCIe hotplug support, the BIOS sets the root port MPS
to 256 and the endpoint MPS to 256 during POST.
After a card is hot-removed and hot-added, the new device's MPS is the
default, 128.
When the driver puts load on the endpoint, we get lots of AER errors
about TLPs, etc.

After changing the root port MPS and the endpoint MPS to 128, we no
longer get AER errors.

So the question is: should a root port with MPSS 256 and an endpoint
with MPSS 128 work well without any problem?

Also, I have noticed that the BIOS sets MPS to 256 and MRRS to 512.
So what is the reason for the current pcie_bus_perf code to limit MRRS
to the MPS?

Thanks

Yinghai

