* [PATCH v2 1/1] PCI: Add Extended Tag + MRRS quirk for Xeon 6
@ 2025-04-22 13:02 Ilpo Järvinen
2025-04-22 20:08 ` Bjorn Helgaas
` (3 more replies)
0 siblings, 4 replies; 8+ messages in thread
From: Ilpo Järvinen @ 2025-04-22 13:02 UTC (permalink / raw)
To: Dan Williams, Lukas Wunner, Bjorn Helgaas, Thomas Gleixner,
Ingo Molnar, Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
linux-pci, linux-kernel
Cc: Ilpo Järvinen
When bifurcated to x2, Xeon 6 Root Port performance is sensitive to the
configuration of Extended Tags, Max Read Request Size (MRRS), and 10-Bit
Tag Requester (note: there is currently no 10-Bit Tag support in the
kernel). While those can be configured to the recommended values by FW,
kernel may decide to overwrite the initial values.
Unfortunately, there is no mechanism for FW to indicate OS which parts
of PCIe configuration should not be altered. Thus, the only option is
to add such logic into the kernel as quirks.
There is a pre-existing quirk flag to disable Extended Tags. Depending
on CONFIG_PCIE_BUS_* setting, MRRS may be overwritten by what the
kernel thinks is the best for performance (the largest supported
value), resulting in performance degradation instead with these Root
Ports. (There would have been a pre-existing quirk to disallow
increasing MRRS but it is not identical to rejecting >128B MRRS.)
Add a quirk that disallows enabling Extended Tags and setting MRRS
larger than 128B for devices under Xeon 6 Root Ports if the Root Port is
bifurcated to x2. Reject >128B MRRS only when it is going to be written
by the kernel (this assumes FW configured a good initial value for MRRS
in case the kernel is not touching MRRS at all).
It was first attempted to always write MRRS when the quirk is needed
(always overwrite the initial value). That turned out to be quite
invasive change, however, given the complexity of the initial setup
callchain and various stages returning early when they decide no changes
are necessary, requiring override each. As such, the initial value for
MRRS is now left into the hands of FW.
Link: https://cdrdv2.intel.com/v1/dl/getContent/837176
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
---
v2:
- Explain in changelog why FW cannot solve this on its own
- Moved the quirk under arch/x86/pci/
- Don't NULL check value from pci_find_host_bridge()
- Added comment above the quirk about the performance degradation
- Removed all setup chain 128B quirk overrides expect for MRRS write
itself (assumes a sane initial value is set by FW)
arch/x86/pci/fixup.c | 30 ++++++++++++++++++++++++++++++
drivers/pci/pci.c | 15 ++++++++-------
include/linux/pci.h | 1 +
3 files changed, 39 insertions(+), 7 deletions(-)
diff --git a/arch/x86/pci/fixup.c b/arch/x86/pci/fixup.c
index efefeb82ab61..aa9617bc4b55 100644
--- a/arch/x86/pci/fixup.c
+++ b/arch/x86/pci/fixup.c
@@ -294,6 +294,36 @@ DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_MCH_PB1, pcie_r
DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_MCH_PC, pcie_rootport_aspm_quirk);
DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_MCH_PC1, pcie_rootport_aspm_quirk);
+/*
+ * PCIe devices underneath Xeon6 PCIe Root Port bifurcated to 2x have slower
+ * performance with Extended Tags and MRRS > 128B. Workaround the performance
+ * problems by disabling Extended Tags and limiting MRRS to 128B.
+ *
+ * https://cdrdv2.intel.com/v1/dl/getContent/837176
+ */
+static void quirk_pcie2x_no_tags_no_mrrs(struct pci_dev *pdev)
+{
+ struct pci_host_bridge *bridge = pci_find_host_bridge(pdev->bus);
+ u32 linkcap;
+
+ pcie_capability_read_dword(pdev, PCI_EXP_LNKCAP, &linkcap);
+ if (FIELD_GET(PCI_EXP_LNKCAP_MLW, linkcap) != 0x2)
+ return;
+
+ bridge->no_ext_tags = 1;
+ bridge->only_128b_mrrs = 1;
+ pci_info(pdev, "Disabling Extended Tags and limiting MRRS to 128B (performance reasons due to 2x PCIe link)\n");
+}
+
+DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x0db0, quirk_pcie2x_no_tags_no_mrrs);
+DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x0db1, quirk_pcie2x_no_tags_no_mrrs);
+DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x0db2, quirk_pcie2x_no_tags_no_mrrs);
+DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x0db3, quirk_pcie2x_no_tags_no_mrrs);
+DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x0db6, quirk_pcie2x_no_tags_no_mrrs);
+DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x0db7, quirk_pcie2x_no_tags_no_mrrs);
+DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x0db8, quirk_pcie2x_no_tags_no_mrrs);
+DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x0db9, quirk_pcie2x_no_tags_no_mrrs);
+
/*
* Fixup to mark boot BIOS video selected by BIOS before it changes
*
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 4d7c9f64ea24..2ca9cb30fbd3 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -5941,7 +5941,7 @@ EXPORT_SYMBOL(pcie_get_readrq);
int pcie_set_readrq(struct pci_dev *dev, int rq)
{
u16 v;
- int ret;
+ int ret, max_mrrs = 4096;
struct pci_host_bridge *bridge = pci_find_host_bridge(dev->bus);
if (rq < 128 || rq > 4096 || !is_power_of_2(rq))
@@ -5961,13 +5961,14 @@ int pcie_set_readrq(struct pci_dev *dev, int rq)
v = FIELD_PREP(PCI_EXP_DEVCTL_READRQ, ffs(rq) - 8);
- if (bridge->no_inc_mrrs) {
- int max_mrrs = pcie_get_readrq(dev);
+ if (bridge->no_inc_mrrs)
+ max_mrrs = pcie_get_readrq(dev);
+ if (bridge->only_128b_mrrs)
+ max_mrrs = 128;
- if (rq > max_mrrs) {
- pci_info(dev, "can't set Max_Read_Request_Size to %d; max is %d\n", rq, max_mrrs);
- return -EINVAL;
- }
+ if (rq > max_mrrs) {
+ pci_info(dev, "can't set Max_Read_Request_Size to %d; max is %d\n", rq, max_mrrs);
+ return -EINVAL;
}
ret = pcie_capability_clear_and_set_word(dev, PCI_EXP_DEVCTL,
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 0e8e3fd77e96..6dc7a05f4d4b 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -603,6 +603,7 @@ struct pci_host_bridge {
unsigned int ignore_reset_delay:1; /* For entire hierarchy */
unsigned int no_ext_tags:1; /* No Extended Tags */
unsigned int no_inc_mrrs:1; /* No Increase MRRS */
+ unsigned int only_128b_mrrs:1; /* Only 128B MRRS */
unsigned int native_aer:1; /* OS may use PCIe AER */
unsigned int native_pcie_hotplug:1; /* OS may use PCIe hotplug */
unsigned int native_shpc_hotplug:1; /* OS may use SHPC hotplug */
base-commit: 0af2f6be1b4281385b618cb86ad946eded089ac8
--
2.39.5
^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH v2 1/1] PCI: Add Extended Tag + MRRS quirk for Xeon 6
2025-04-22 13:02 [PATCH v2 1/1] PCI: Add Extended Tag + MRRS quirk for Xeon 6 Ilpo Järvinen
@ 2025-04-22 20:08 ` Bjorn Helgaas
2025-04-22 22:24 ` Dan Williams
2025-04-22 20:15 ` Ingo Molnar
` (2 subsequent siblings)
3 siblings, 1 reply; 8+ messages in thread
From: Bjorn Helgaas @ 2025-04-22 20:08 UTC (permalink / raw)
To: Ilpo Järvinen
Cc: Dan Williams, Lukas Wunner, Bjorn Helgaas, Thomas Gleixner,
Ingo Molnar, Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
linux-pci, linux-kernel
On Tue, Apr 22, 2025 at 04:02:07PM +0300, Ilpo Järvinen wrote:
> When bifurcated to x2, Xeon 6 Root Port performance is sensitive to the
> configuration of Extended Tags, Max Read Request Size (MRRS), and 10-Bit
> Tag Requester (note: there is currently no 10-Bit Tag support in the
> kernel). While those can be configured to the recommended values by FW,
> kernel may decide to overwrite the initial values.
>
> Unfortunately, there is no mechanism for FW to indicate OS which parts
> of PCIe configuration should not be altered. Thus, the only option is
> to add such logic into the kernel as quirks.
>
> There is a pre-existing quirk flag to disable Extended Tags. Depending
> on CONFIG_PCIE_BUS_* setting, MRRS may be overwritten by what the
> kernel thinks is the best for performance (the largest supported
> value), resulting in performance degradation instead with these Root
> Ports. (There would have been a pre-existing quirk to disallow
> increasing MRRS but it is not identical to rejecting >128B MRRS.)
>
> Add a quirk that disallows enabling Extended Tags and setting MRRS
> larger than 128B for devices under Xeon 6 Root Ports if the Root Port is
> bifurcated to x2. Reject >128B MRRS only when it is going to be written
> by the kernel (this assumes FW configured a good initial value for MRRS
> in case the kernel is not touching MRRS at all).
>
> It was first attempted to always write MRRS when the quirk is needed
> (always overwrite the initial value). That turned out to be quite
> invasive change, however, given the complexity of the initial setup
> callchain and various stages returning early when they decide no changes
> are necessary, requiring override each. As such, the initial value for
> MRRS is now left into the hands of FW.
>
> Link: https://cdrdv2.intel.com/v1/dl/getContent/837176
> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
> ---
v1/rfc: https://lore.kernel.org/all/20250304135108.2599-1-ilpo.jarvinen@linux.intel.com/
I suppose it's quixotic to hope for anything better than quirks that
have to be added or updated for every new processor that comes along.
ACPI _HPX might be a possible way for the platform to tell us what to
do here. ACPI r6.5, sec 6.2.9 says it's for hot-added devices and
"Functions not configured by the platform firmware during initial
system boot" (how are we supposed to determine that?) In any case,
Linux does evaluate _HPX for every device in pci_configure_device().
I'm not sure _HPX really works; it's very general, and I would expect
to see reports of problems if firmware really tried to use it.
Or, I guess a _DSM function would be a possible way to communicate
this.
> v2:
> - Explain in changelog why FW cannot solve this on its own
> - Moved the quirk under arch/x86/pci/
> - Don't NULL check value from pci_find_host_bridge()
> - Added comment above the quirk about the performance degradation
> - Removed all setup chain 128B quirk overrides expect for MRRS write
> itself (assumes a sane initial value is set by FW)
>
> arch/x86/pci/fixup.c | 30 ++++++++++++++++++++++++++++++
> drivers/pci/pci.c | 15 ++++++++-------
> include/linux/pci.h | 1 +
> 3 files changed, 39 insertions(+), 7 deletions(-)
>
> diff --git a/arch/x86/pci/fixup.c b/arch/x86/pci/fixup.c
> index efefeb82ab61..aa9617bc4b55 100644
> --- a/arch/x86/pci/fixup.c
> +++ b/arch/x86/pci/fixup.c
> @@ -294,6 +294,36 @@ DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_MCH_PB1, pcie_r
> DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_MCH_PC, pcie_rootport_aspm_quirk);
> DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_MCH_PC1, pcie_rootport_aspm_quirk);
>
> +/*
> + * PCIe devices underneath Xeon6 PCIe Root Port bifurcated to 2x have slower
> + * performance with Extended Tags and MRRS > 128B. Workaround the performance
> + * problems by disabling Extended Tags and limiting MRRS to 128B.
> + *
> + * https://cdrdv2.intel.com/v1/dl/getContent/837176
> + */
> +static void quirk_pcie2x_no_tags_no_mrrs(struct pci_dev *pdev)
> +{
> + struct pci_host_bridge *bridge = pci_find_host_bridge(pdev->bus);
> + u32 linkcap;
> +
> + pcie_capability_read_dword(pdev, PCI_EXP_LNKCAP, &linkcap);
> + if (FIELD_GET(PCI_EXP_LNKCAP_MLW, linkcap) != 0x2)
> + return;
> +
> + bridge->no_ext_tags = 1;
> + bridge->only_128b_mrrs = 1;
> + pci_info(pdev, "Disabling Extended Tags and limiting MRRS to 128B (performance reasons due to 2x PCIe link)\n");
> +}
> +
> +DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x0db0, quirk_pcie2x_no_tags_no_mrrs);
> +DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x0db1, quirk_pcie2x_no_tags_no_mrrs);
> +DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x0db2, quirk_pcie2x_no_tags_no_mrrs);
> +DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x0db3, quirk_pcie2x_no_tags_no_mrrs);
> +DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x0db6, quirk_pcie2x_no_tags_no_mrrs);
> +DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x0db7, quirk_pcie2x_no_tags_no_mrrs);
> +DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x0db8, quirk_pcie2x_no_tags_no_mrrs);
> +DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x0db9, quirk_pcie2x_no_tags_no_mrrs);
> +
> /*
> * Fixup to mark boot BIOS video selected by BIOS before it changes
> *
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index 4d7c9f64ea24..2ca9cb30fbd3 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -5941,7 +5941,7 @@ EXPORT_SYMBOL(pcie_get_readrq);
> int pcie_set_readrq(struct pci_dev *dev, int rq)
> {
> u16 v;
> - int ret;
> + int ret, max_mrrs = 4096;
> struct pci_host_bridge *bridge = pci_find_host_bridge(dev->bus);
>
> if (rq < 128 || rq > 4096 || !is_power_of_2(rq))
> @@ -5961,13 +5961,14 @@ int pcie_set_readrq(struct pci_dev *dev, int rq)
>
> v = FIELD_PREP(PCI_EXP_DEVCTL_READRQ, ffs(rq) - 8);
>
> - if (bridge->no_inc_mrrs) {
> - int max_mrrs = pcie_get_readrq(dev);
> + if (bridge->no_inc_mrrs)
> + max_mrrs = pcie_get_readrq(dev);
> + if (bridge->only_128b_mrrs)
> + max_mrrs = 128;
>
> - if (rq > max_mrrs) {
> - pci_info(dev, "can't set Max_Read_Request_Size to %d; max is %d\n", rq, max_mrrs);
> - return -EINVAL;
> - }
> + if (rq > max_mrrs) {
> + pci_info(dev, "can't set Max_Read_Request_Size to %d; max is %d\n", rq, max_mrrs);
> + return -EINVAL;
> }
>
> ret = pcie_capability_clear_and_set_word(dev, PCI_EXP_DEVCTL,
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index 0e8e3fd77e96..6dc7a05f4d4b 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -603,6 +603,7 @@ struct pci_host_bridge {
> unsigned int ignore_reset_delay:1; /* For entire hierarchy */
> unsigned int no_ext_tags:1; /* No Extended Tags */
> unsigned int no_inc_mrrs:1; /* No Increase MRRS */
> + unsigned int only_128b_mrrs:1; /* Only 128B MRRS */
> unsigned int native_aer:1; /* OS may use PCIe AER */
> unsigned int native_pcie_hotplug:1; /* OS may use PCIe hotplug */
> unsigned int native_shpc_hotplug:1; /* OS may use SHPC hotplug */
>
> base-commit: 0af2f6be1b4281385b618cb86ad946eded089ac8
> --
> 2.39.5
>
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH v2 1/1] PCI: Add Extended Tag + MRRS quirk for Xeon 6
2025-04-22 13:02 [PATCH v2 1/1] PCI: Add Extended Tag + MRRS quirk for Xeon 6 Ilpo Järvinen
2025-04-22 20:08 ` Bjorn Helgaas
@ 2025-04-22 20:15 ` Ingo Molnar
2025-05-01 8:57 ` Lukas Wunner
2025-06-01 17:22 ` Manivannan Sadhasivam
3 siblings, 0 replies; 8+ messages in thread
From: Ingo Molnar @ 2025-04-22 20:15 UTC (permalink / raw)
To: Ilpo Järvinen
Cc: Dan Williams, Lukas Wunner, Bjorn Helgaas, Thomas Gleixner,
Ingo Molnar, Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
linux-pci, linux-kernel
* Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> wrote:
> When bifurcated to x2, Xeon 6 Root Port performance is sensitive to the
> configuration of Extended Tags, Max Read Request Size (MRRS), and 10-Bit
> Tag Requester (note: there is currently no 10-Bit Tag support in the
> kernel). While those can be configured to the recommended values by FW,
> kernel may decide to overwrite the initial values.
>
> Unfortunately, there is no mechanism for FW to indicate OS which parts
> of PCIe configuration should not be altered. Thus, the only option is
> to add such logic into the kernel as quirks.
>
> There is a pre-existing quirk flag to disable Extended Tags. Depending
> on CONFIG_PCIE_BUS_* setting, MRRS may be overwritten by what the
> kernel thinks is the best for performance (the largest supported
> value), resulting in performance degradation instead with these Root
> Ports. (There would have been a pre-existing quirk to disallow
> increasing MRRS but it is not identical to rejecting >128B MRRS.)
>
> Add a quirk that disallows enabling Extended Tags and setting MRRS
> larger than 128B for devices under Xeon 6 Root Ports if the Root Port is
> bifurcated to x2. Reject >128B MRRS only when it is going to be written
> by the kernel (this assumes FW configured a good initial value for MRRS
> in case the kernel is not touching MRRS at all).
>
> It was first attempted to always write MRRS when the quirk is needed
> (always overwrite the initial value). That turned out to be quite
> invasive change, however, given the complexity of the initial setup
> callchain and various stages returning early when they decide no changes
> are necessary, requiring override each. As such, the initial value for
> MRRS is now left into the hands of FW.
>
> Link: https://cdrdv2.intel.com/v1/dl/getContent/837176
> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
> ---
>
> v2:
> - Explain in changelog why FW cannot solve this on its own
> - Moved the quirk under arch/x86/pci/
> - Don't NULL check value from pci_find_host_bridge()
> - Added comment above the quirk about the performance degradation
> - Removed all setup chain 128B quirk overrides expect for MRRS write
> itself (assumes a sane initial value is set by FW)
>
> arch/x86/pci/fixup.c | 30 ++++++++++++++++++++++++++++++
> drivers/pci/pci.c | 15 ++++++++-------
> include/linux/pci.h | 1 +
> 3 files changed, 39 insertions(+), 7 deletions(-)
>
> diff --git a/arch/x86/pci/fixup.c b/arch/x86/pci/fixup.c
> index efefeb82ab61..aa9617bc4b55 100644
> --- a/arch/x86/pci/fixup.c
> +++ b/arch/x86/pci/fixup.c
> @@ -294,6 +294,36 @@ DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_MCH_PB1, pcie_r
> DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_MCH_PC, pcie_rootport_aspm_quirk);
> DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_MCH_PC1, pcie_rootport_aspm_quirk);
>
> +/*
> + * PCIe devices underneath Xeon6 PCIe Root Port bifurcated to 2x have slower
> + * performance with Extended Tags and MRRS > 128B. Workaround the performance
> + * problems by disabling Extended Tags and limiting MRRS to 128B.
No objections to your fix, just an obligatory:
s/Workaround
/Work around
:)
With that, FWIIW:
Acked-by: Ingo Molnar <mingo@kernel.org>
Thanks,
Ingo
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH v2 1/1] PCI: Add Extended Tag + MRRS quirk for Xeon 6
2025-04-22 20:08 ` Bjorn Helgaas
@ 2025-04-22 22:24 ` Dan Williams
0 siblings, 0 replies; 8+ messages in thread
From: Dan Williams @ 2025-04-22 22:24 UTC (permalink / raw)
To: Bjorn Helgaas, Ilpo Järvinen
Cc: Dan Williams, Lukas Wunner, Bjorn Helgaas, Thomas Gleixner,
Ingo Molnar, Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
linux-pci, linux-kernel
Bjorn Helgaas wrote:
[..]
> v1/rfc: https://lore.kernel.org/all/20250304135108.2599-1-ilpo.jarvinen@linux.intel.com/
>
> I suppose it's quixotic to hope for anything better than quirks that
> have to be added or updated for every new processor that comes along.
>
> ACPI _HPX might be a possible way for the platform to tell us what to
> do here. ACPI r6.5, sec 6.2.9 says it's for hot-added devices and
> "Functions not configured by the platform firmware during initial
> system boot" (how are we supposed to determine that?) In any case,
> Linux does evaluate _HPX for every device in pci_configure_device().
>
> I'm not sure _HPX really works; it's very general, and I would expect
> to see reports of problems if firmware really tried to use it.
>
> Or, I guess a _DSM function would be a possible way to communicate
> this.
Ok, I am reading this as "Maintainer asserts the quirk is unsavory,
please make this be something that Linux can ask the platform firmware
if it needs to apply, or make existing _HPX just work. Either of those
is preferable to a new entry in the quirk table."
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH v2 1/1] PCI: Add Extended Tag + MRRS quirk for Xeon 6
2025-04-22 13:02 [PATCH v2 1/1] PCI: Add Extended Tag + MRRS quirk for Xeon 6 Ilpo Järvinen
2025-04-22 20:08 ` Bjorn Helgaas
2025-04-22 20:15 ` Ingo Molnar
@ 2025-05-01 8:57 ` Lukas Wunner
2025-06-01 17:22 ` Manivannan Sadhasivam
3 siblings, 0 replies; 8+ messages in thread
From: Lukas Wunner @ 2025-05-01 8:57 UTC (permalink / raw)
To: Ilpo Järvinen
Cc: Dan Williams, Bjorn Helgaas, Thomas Gleixner, Ingo Molnar,
Borislav Petkov, Dave Hansen, x86, H. Peter Anvin, linux-pci,
linux-kernel
On Tue, Apr 22, 2025 at 04:02:07PM +0300, Ilpo Järvinen wrote:
> When bifurcated to x2, Xeon 6 Root Port performance is sensitive to the
> configuration of Extended Tags, Max Read Request Size (MRRS), and 10-Bit
> Tag Requester (note: there is currently no 10-Bit Tag support in the
> kernel).
[...]
> Add a quirk that disallows enabling Extended Tags and setting MRRS
> larger than 128B for devices under Xeon 6 Root Ports if the Root Port is
> bifurcated to x2. Reject >128B MRRS only when it is going to be written
> by the kernel (this assumes FW configured a good initial value for MRRS
> in case the kernel is not touching MRRS at all).
I note that there's the existing quirk_brcm_5719_limit_mrrs(),
which limits MRRS to 2048 on certain revisions of Broadcom
Ethernet adapters. This became necessary to work around an
internal FIFO problem, see commit 2c55a3d08ade ("tg3: Scale back
code that modifies MRRS") and commit 0b471506712d ("tg3: Recode
PCI MRRS adjustment as a PCI quirk").
The quirk works by overriding the MRRS which was originally set
on enumeration by pcie_bus_configure_settings(). The overriding
happens at enable time, i.e. when a driver starts to makes use
of the device:
do_pci_enable_device()
pci_host_bridge_enable_device()
pcibios_enable_device()
pci_fixup_device()
quirk_brcm_5719_limit_mrrs()
Now if you look further above in do_pci_enable_device(), there's
a call to pci_host_bridge_enable_device(), which invokes the
->enable_device() callback in struct pci_host_bridge.
Currently there's only a single host brige driver implementing
that callback, controller/dwc/pci-imx6.c.
One option would be to set that callback on the host bridge
if a Granite Rapids Root Port is found. And then enforce the
mrrs limit in the callback. That approach may be more acceptable
upstream than adding a custom "only_128b_mrrs" bit to struct
pci_host_bridge.
Another option would be to amend x86's pcibios_enable_device()
to check whether there's a Granite Rapids Root Port above the
device and enforce the mrrs limit if so.
The only downside I see is that the Broadcom quirk will run
afterwards and increase the MRRS again. But it's highly unlikely
that one of these old Broadcom chips is used on a present-day
Granite Rapids server, so it may not be a problem in practice.
And the worst thing that can happen is suboptimal performance.
Thanks,
Lukas
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH v2 1/1] PCI: Add Extended Tag + MRRS quirk for Xeon 6
2025-04-22 13:02 [PATCH v2 1/1] PCI: Add Extended Tag + MRRS quirk for Xeon 6 Ilpo Järvinen
` (2 preceding siblings ...)
2025-05-01 8:57 ` Lukas Wunner
@ 2025-06-01 17:22 ` Manivannan Sadhasivam
2025-06-02 5:10 ` Ilpo Järvinen
3 siblings, 1 reply; 8+ messages in thread
From: Manivannan Sadhasivam @ 2025-06-01 17:22 UTC (permalink / raw)
To: Ilpo Järvinen
Cc: Dan Williams, Lukas Wunner, Bjorn Helgaas, Thomas Gleixner,
Ingo Molnar, Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
linux-pci, linux-kernel
On Tue, Apr 22, 2025 at 04:02:07PM +0300, Ilpo Järvinen wrote:
> When bifurcated to x2, Xeon 6 Root Port performance is sensitive to the
> configuration of Extended Tags, Max Read Request Size (MRRS), and 10-Bit
> Tag Requester (note: there is currently no 10-Bit Tag support in the
> kernel). While those can be configured to the recommended values by FW,
> kernel may decide to overwrite the initial values.
>
> Unfortunately, there is no mechanism for FW to indicate OS which parts
> of PCIe configuration should not be altered. Thus, the only option is
> to add such logic into the kernel as quirks.
>
> There is a pre-existing quirk flag to disable Extended Tags. Depending
> on CONFIG_PCIE_BUS_* setting, MRRS may be overwritten by what the
> kernel thinks is the best for performance (the largest supported
> value), resulting in performance degradation instead with these Root
> Ports. (There would have been a pre-existing quirk to disallow
> increasing MRRS but it is not identical to rejecting >128B MRRS.)
>
> Add a quirk that disallows enabling Extended Tags and setting MRRS
> larger than 128B for devices under Xeon 6 Root Ports if the Root Port is
> bifurcated to x2. Reject >128B MRRS only when it is going to be written
> by the kernel (this assumes FW configured a good initial value for MRRS
> in case the kernel is not touching MRRS at all).
>
> It was first attempted to always write MRRS when the quirk is needed
> (always overwrite the initial value). That turned out to be quite
> invasive change, however, given the complexity of the initial setup
> callchain and various stages returning early when they decide no changes
> are necessary, requiring override each. As such, the initial value for
> MRRS is now left into the hands of FW.
>
> Link: https://cdrdv2.intel.com/v1/dl/getContent/837176
> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
> ---
>
> v2:
> - Explain in changelog why FW cannot solve this on its own
> - Moved the quirk under arch/x86/pci/
> - Don't NULL check value from pci_find_host_bridge()
> - Added comment above the quirk about the performance degradation
> - Removed all setup chain 128B quirk overrides expect for MRRS write
> itself (assumes a sane initial value is set by FW)
>
> arch/x86/pci/fixup.c | 30 ++++++++++++++++++++++++++++++
> drivers/pci/pci.c | 15 ++++++++-------
> include/linux/pci.h | 1 +
> 3 files changed, 39 insertions(+), 7 deletions(-)
>
> diff --git a/arch/x86/pci/fixup.c b/arch/x86/pci/fixup.c
> index efefeb82ab61..aa9617bc4b55 100644
> --- a/arch/x86/pci/fixup.c
> +++ b/arch/x86/pci/fixup.c
> @@ -294,6 +294,36 @@ DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_MCH_PB1, pcie_r
> DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_MCH_PC, pcie_rootport_aspm_quirk);
> DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_MCH_PC1, pcie_rootport_aspm_quirk);
>
> +/*
> + * PCIe devices underneath Xeon6 PCIe Root Port bifurcated to 2x have slower
> + * performance with Extended Tags and MRRS > 128B. Workaround the performance
> + * problems by disabling Extended Tags and limiting MRRS to 128B.
> + *
> + * https://cdrdv2.intel.com/v1/dl/getContent/837176
> + */
> +static void quirk_pcie2x_no_tags_no_mrrs(struct pci_dev *pdev)
> +{
> + struct pci_host_bridge *bridge = pci_find_host_bridge(pdev->bus);
> + u32 linkcap;
> +
> + pcie_capability_read_dword(pdev, PCI_EXP_LNKCAP, &linkcap);
> + if (FIELD_GET(PCI_EXP_LNKCAP_MLW, linkcap) != 0x2)
> + return;
> +
> + bridge->no_ext_tags = 1;
> + bridge->only_128b_mrrs = 1;
My 2 cents here. Wouldn't it work if you hardcode MRRS to 128 in PCI_EXP_DEVCTL
here and then set pci_host_bridge::no_inc_mrrs to 1? This would avoid
introducing an extra flag and also serve the same purpose.
- Mani
--
மணிவண்ணன் சதாசிவம்
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH v2 1/1] PCI: Add Extended Tag + MRRS quirk for Xeon 6
2025-06-01 17:22 ` Manivannan Sadhasivam
@ 2025-06-02 5:10 ` Ilpo Järvinen
2025-06-02 6:29 ` Manivannan Sadhasivam
0 siblings, 1 reply; 8+ messages in thread
From: Ilpo Järvinen @ 2025-06-02 5:10 UTC (permalink / raw)
To: Manivannan Sadhasivam
Cc: Dan Williams, Lukas Wunner, Bjorn Helgaas, Thomas Gleixner,
Ingo Molnar, Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
linux-pci, LKML
[-- Attachment #1: Type: text/plain, Size: 4474 bytes --]
On Sun, 1 Jun 2025, Manivannan Sadhasivam wrote:
> On Tue, Apr 22, 2025 at 04:02:07PM +0300, Ilpo Järvinen wrote:
> > When bifurcated to x2, Xeon 6 Root Port performance is sensitive to the
> > configuration of Extended Tags, Max Read Request Size (MRRS), and 10-Bit
> > Tag Requester (note: there is currently no 10-Bit Tag support in the
> > kernel). While those can be configured to the recommended values by FW,
> > kernel may decide to overwrite the initial values.
> >
> > Unfortunately, there is no mechanism for FW to indicate OS which parts
> > of PCIe configuration should not be altered. Thus, the only option is
> > to add such logic into the kernel as quirks.
> >
> > There is a pre-existing quirk flag to disable Extended Tags. Depending
> > on CONFIG_PCIE_BUS_* setting, MRRS may be overwritten by what the
> > kernel thinks is the best for performance (the largest supported
> > value), resulting in performance degradation instead with these Root
> > Ports. (There would have been a pre-existing quirk to disallow
> > increasing MRRS but it is not identical to rejecting >128B MRRS.)
> >
> > Add a quirk that disallows enabling Extended Tags and setting MRRS
> > larger than 128B for devices under Xeon 6 Root Ports if the Root Port is
> > bifurcated to x2. Reject >128B MRRS only when it is going to be written
> > by the kernel (this assumes FW configured a good initial value for MRRS
> > in case the kernel is not touching MRRS at all).
> >
> > It was first attempted to always write MRRS when the quirk is needed
> > (always overwrite the initial value). That turned out to be quite
> > invasive change, however, given the complexity of the initial setup
> > callchain and various stages returning early when they decide no changes
> > are necessary, requiring override each. As such, the initial value for
> > MRRS is now left into the hands of FW.
> >
> > Link: https://cdrdv2.intel.com/v1/dl/getContent/837176
> > Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
> > ---
> >
> > v2:
> > - Explain in changelog why FW cannot solve this on its own
> > - Moved the quirk under arch/x86/pci/
> > - Don't NULL check value from pci_find_host_bridge()
> > - Added comment above the quirk about the performance degradation
> > - Removed all setup chain 128B quirk overrides expect for MRRS write
> > itself (assumes a sane initial value is set by FW)
> >
> > arch/x86/pci/fixup.c | 30 ++++++++++++++++++++++++++++++
> > drivers/pci/pci.c | 15 ++++++++-------
> > include/linux/pci.h | 1 +
> > 3 files changed, 39 insertions(+), 7 deletions(-)
> >
> > diff --git a/arch/x86/pci/fixup.c b/arch/x86/pci/fixup.c
> > index efefeb82ab61..aa9617bc4b55 100644
> > --- a/arch/x86/pci/fixup.c
> > +++ b/arch/x86/pci/fixup.c
> > @@ -294,6 +294,36 @@ DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_MCH_PB1, pcie_r
> > DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_MCH_PC, pcie_rootport_aspm_quirk);
> > DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_MCH_PC1, pcie_rootport_aspm_quirk);
> >
> > +/*
> > + * PCIe devices underneath Xeon6 PCIe Root Port bifurcated to 2x have slower
> > + * performance with Extended Tags and MRRS > 128B. Workaround the performance
> > + * problems by disabling Extended Tags and limiting MRRS to 128B.
> > + *
> > + * https://cdrdv2.intel.com/v1/dl/getContent/837176
> > + */
> > +static void quirk_pcie2x_no_tags_no_mrrs(struct pci_dev *pdev)
> > +{
> > + struct pci_host_bridge *bridge = pci_find_host_bridge(pdev->bus);
> > + u32 linkcap;
> > +
> > + pcie_capability_read_dword(pdev, PCI_EXP_LNKCAP, &linkcap);
> > + if (FIELD_GET(PCI_EXP_LNKCAP_MLW, linkcap) != 0x2)
> > + return;
> > +
> > + bridge->no_ext_tags = 1;
> > + bridge->only_128b_mrrs = 1;
>
> My 2 cents here. Wouldn't it work if you hardcode MRRS to 128 in PCI_EXP_DEVCTL
> here and then set pci_host_bridge::no_inc_mrrs to 1? This would avoid
> introducing an extra flag and also serve the same purpose.
Hi Mani,
Thanks for the suggestion but it won't work because this is the Root Port.
The devices underneath it need this setting so we cannot set them to 128B
reliable here (is there anything that guarantees those devices have been
enumerated at this point?).
I've v3 already prepared which uses the enable device hook as suggested by
Lukas. I'll send it soon.
--
i.
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH v2 1/1] PCI: Add Extended Tag + MRRS quirk for Xeon 6
2025-06-02 5:10 ` Ilpo Järvinen
@ 2025-06-02 6:29 ` Manivannan Sadhasivam
0 siblings, 0 replies; 8+ messages in thread
From: Manivannan Sadhasivam @ 2025-06-02 6:29 UTC (permalink / raw)
To: Ilpo Järvinen
Cc: Dan Williams, Lukas Wunner, Bjorn Helgaas, Thomas Gleixner,
Ingo Molnar, Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
linux-pci, LKML
On Mon, Jun 02, 2025 at 08:10:42AM +0300, Ilpo Järvinen wrote:
> On Sun, 1 Jun 2025, Manivannan Sadhasivam wrote:
>
> > On Tue, Apr 22, 2025 at 04:02:07PM +0300, Ilpo Järvinen wrote:
> > > When bifurcated to x2, Xeon 6 Root Port performance is sensitive to the
> > > configuration of Extended Tags, Max Read Request Size (MRRS), and 10-Bit
> > > Tag Requester (note: there is currently no 10-Bit Tag support in the
> > > kernel). While those can be configured to the recommended values by FW,
> > > kernel may decide to overwrite the initial values.
> > >
> > > Unfortunately, there is no mechanism for FW to indicate OS which parts
> > > of PCIe configuration should not be altered. Thus, the only option is
> > > to add such logic into the kernel as quirks.
> > >
> > > There is a pre-existing quirk flag to disable Extended Tags. Depending
> > > on CONFIG_PCIE_BUS_* setting, MRRS may be overwritten by what the
> > > kernel thinks is the best for performance (the largest supported
> > > value), resulting in performance degradation instead with these Root
> > > Ports. (There would have been a pre-existing quirk to disallow
> > > increasing MRRS but it is not identical to rejecting >128B MRRS.)
> > >
> > > Add a quirk that disallows enabling Extended Tags and setting MRRS
> > > larger than 128B for devices under Xeon 6 Root Ports if the Root Port is
> > > bifurcated to x2. Reject >128B MRRS only when it is going to be written
> > > by the kernel (this assumes FW configured a good initial value for MRRS
> > > in case the kernel is not touching MRRS at all).
> > >
> > > It was first attempted to always write MRRS when the quirk is needed
> > > (always overwrite the initial value). That turned out to be quite
> > > invasive change, however, given the complexity of the initial setup
> > > callchain and various stages returning early when they decide no changes
> > > are necessary, requiring override each. As such, the initial value for
> > > MRRS is now left into the hands of FW.
> > >
> > > Link: https://cdrdv2.intel.com/v1/dl/getContent/837176
> > > Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
> > > ---
> > >
> > > v2:
> > > - Explain in changelog why FW cannot solve this on its own
> > > - Moved the quirk under arch/x86/pci/
> > > - Don't NULL check value from pci_find_host_bridge()
> > > - Added comment above the quirk about the performance degradation
> > > - Removed all setup chain 128B quirk overrides expect for MRRS write
> > > itself (assumes a sane initial value is set by FW)
> > >
> > > arch/x86/pci/fixup.c | 30 ++++++++++++++++++++++++++++++
> > > drivers/pci/pci.c | 15 ++++++++-------
> > > include/linux/pci.h | 1 +
> > > 3 files changed, 39 insertions(+), 7 deletions(-)
> > >
> > > diff --git a/arch/x86/pci/fixup.c b/arch/x86/pci/fixup.c
> > > index efefeb82ab61..aa9617bc4b55 100644
> > > --- a/arch/x86/pci/fixup.c
> > > +++ b/arch/x86/pci/fixup.c
> > > @@ -294,6 +294,36 @@ DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_MCH_PB1, pcie_r
> > > DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_MCH_PC, pcie_rootport_aspm_quirk);
> > > DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_MCH_PC1, pcie_rootport_aspm_quirk);
> > >
> > > +/*
> > > + * PCIe devices underneath Xeon6 PCIe Root Port bifurcated to 2x have slower
> > > + * performance with Extended Tags and MRRS > 128B. Workaround the performance
> > > + * problems by disabling Extended Tags and limiting MRRS to 128B.
> > > + *
> > > + * https://cdrdv2.intel.com/v1/dl/getContent/837176
> > > + */
> > > +static void quirk_pcie2x_no_tags_no_mrrs(struct pci_dev *pdev)
> > > +{
> > > + struct pci_host_bridge *bridge = pci_find_host_bridge(pdev->bus);
> > > + u32 linkcap;
> > > +
> > > + pcie_capability_read_dword(pdev, PCI_EXP_LNKCAP, &linkcap);
> > > + if (FIELD_GET(PCI_EXP_LNKCAP_MLW, linkcap) != 0x2)
> > > + return;
> > > +
> > > + bridge->no_ext_tags = 1;
> > > + bridge->only_128b_mrrs = 1;
> >
> > My 2 cents here. Wouldn't it work if you hardcode MRRS to 128 in PCI_EXP_DEVCTL
> > here and then set pci_host_bridge::no_inc_mrrs to 1? This would avoid
> > introducing an extra flag and also serve the same purpose.
>
> Hi Mani,
>
> Thanks for the suggestion but it won't work because this is the Root Port.
> The devices underneath it need this setting so we cannot set them to 128B
> reliable here (is there anything that guarantees those devices have been
> enumerated at this point?).
>
I was under assumption that the kernel would honor the Root Port MRRS value
while setting the device MRRS, but I was wrong. Kernel just takes the MPS
value (but that considers the Root Port's MPS) for MRRS and scales it down if
the hardware doesn't support it.
So yes, setting the Root Port's MRRS would have no effect in limiting the device
MRRS. And I believe, Xeon6 Root Ports doesn't suffer from same 128B MRRS
limitation for MPS, otherwise, you could just set Root Port MPS to 128B and it
will make sure that both MPS and MRRS of endpoint devices will be capped to this
value.
> I've v3 already prepared which uses the enable device hook as suggested by
> Lukas. I'll send it soon.
>
The callback is supposed to be used for performing enablement steps required by
the *Host Bridge* devices, not Root Ports. So I do have a concern to use it to
address the Root Port quirks. That's why I thought of limiting it to the PCI
quirks.
But feel free to submit the patch. Let's see what Bjorn and others feel about
it.
- Mani
--
மணிவண்ணன் சதாசிவம்
^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2025-06-02 6:29 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-04-22 13:02 [PATCH v2 1/1] PCI: Add Extended Tag + MRRS quirk for Xeon 6 Ilpo Järvinen
2025-04-22 20:08 ` Bjorn Helgaas
2025-04-22 22:24 ` Dan Williams
2025-04-22 20:15 ` Ingo Molnar
2025-05-01 8:57 ` Lukas Wunner
2025-06-01 17:22 ` Manivannan Sadhasivam
2025-06-02 5:10 ` Ilpo Järvinen
2025-06-02 6:29 ` Manivannan Sadhasivam
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox