All of lore.kernel.org
 help / color / mirror / Atom feed
From: Bjorn Helgaas <helgaas@kernel.org>
To: "Ilpo Järvinen" <ilpo.jarvinen@linux.intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>,
	Lukas Wunner <lukas@wunner.de>,
	Bjorn Helgaas <bhelgaas@google.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	x86@kernel.org, "H. Peter Anvin" <hpa@zytor.com>,
	linux-pci@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v2 1/1] PCI: Add Extended Tag + MRRS quirk for Xeon 6
Date: Tue, 22 Apr 2025 15:08:57 -0500	[thread overview]
Message-ID: <20250422200857.GA381276@bhelgaas> (raw)
In-Reply-To: <20250422130207.3124-1-ilpo.jarvinen@linux.intel.com>

On Tue, Apr 22, 2025 at 04:02:07PM +0300, Ilpo Järvinen wrote:
> When bifurcated to x2, Xeon 6 Root Port performance is sensitive to the
> configuration of Extended Tags, Max Read Request Size (MRRS), and 10-Bit
> Tag Requester (note: there is currently no 10-Bit Tag support in the
> kernel). While those can be configured to the recommended values by FW,
> kernel may decide to overwrite the initial values.
> 
> Unfortunately, there is no mechanism for FW to indicate OS which parts
> of PCIe configuration should not be altered. Thus, the only option is
> to add such logic into the kernel as quirks.
> 
> There is a pre-existing quirk flag to disable Extended Tags. Depending
> on CONFIG_PCIE_BUS_* setting, MRRS may be overwritten by what the
> kernel thinks is the best for performance (the largest supported
> value), resulting in performance degradation instead with these Root
> Ports. (There would have been a pre-existing quirk to disallow
> increasing MRRS but it is not identical to rejecting >128B MRRS.)
> 
> Add a quirk that disallows enabling Extended Tags and setting MRRS
> larger than 128B for devices under Xeon 6 Root Ports if the Root Port is
> bifurcated to x2. Reject >128B MRRS only when it is going to be written
> by the kernel (this assumes FW configured a good initial value for MRRS
> in case the kernel is not touching MRRS at all).
> 
> It was first attempted to always write MRRS when the quirk is needed
> (always overwrite the initial value). That turned out to be quite
> invasive change, however, given the complexity of the initial setup
> callchain and various stages returning early when they decide no changes
> are necessary, requiring override each. As such, the initial value for
> MRRS is now left into the hands of FW.
> 
> Link: https://cdrdv2.intel.com/v1/dl/getContent/837176
> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
> ---

v1/rfc: https://lore.kernel.org/all/20250304135108.2599-1-ilpo.jarvinen@linux.intel.com/

I suppose it's quixotic to hope for anything better than quirks that
have to be added or updated for every new processor that comes along.

ACPI _HPX might be a possible way for the platform to tell us what to
do here.  ACPI r6.5, sec 6.2.9 says it's for hot-added devices and
"Functions not configured by the platform firmware during initial
system boot" (how are we supposed to determine that?)  In any case,
Linux does evaluate _HPX for every device in pci_configure_device().

I'm not sure _HPX really works; it's very general, and I would expect
to see reports of problems if firmware really tried to use it.

Or, I guess a _DSM function would be a possible way to communicate
this.

> v2:
> - Explain in changelog why FW cannot solve this on its own
> - Moved the quirk under arch/x86/pci/
> - Don't NULL check value from pci_find_host_bridge()
> - Added comment above the quirk about the performance degradation
> - Removed all setup chain 128B quirk overrides expect for MRRS write
>   itself (assumes a sane initial value is set by FW)
> 
>  arch/x86/pci/fixup.c | 30 ++++++++++++++++++++++++++++++
>  drivers/pci/pci.c    | 15 ++++++++-------
>  include/linux/pci.h  |  1 +
>  3 files changed, 39 insertions(+), 7 deletions(-)
> 
> diff --git a/arch/x86/pci/fixup.c b/arch/x86/pci/fixup.c
> index efefeb82ab61..aa9617bc4b55 100644
> --- a/arch/x86/pci/fixup.c
> +++ b/arch/x86/pci/fixup.c
> @@ -294,6 +294,36 @@ DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL,	PCI_DEVICE_ID_INTEL_MCH_PB1,	pcie_r
>  DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL,	PCI_DEVICE_ID_INTEL_MCH_PC,	pcie_rootport_aspm_quirk);
>  DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL,	PCI_DEVICE_ID_INTEL_MCH_PC1,	pcie_rootport_aspm_quirk);
>  
> +/*
> + * PCIe devices underneath Xeon6 PCIe Root Port bifurcated to 2x have slower
> + * performance with Extended Tags and MRRS > 128B. Workaround the performance
> + * problems by disabling Extended Tags and limiting MRRS to 128B.
> + *
> + * https://cdrdv2.intel.com/v1/dl/getContent/837176
> + */
> +static void quirk_pcie2x_no_tags_no_mrrs(struct pci_dev *pdev)
> +{
> +	struct pci_host_bridge *bridge = pci_find_host_bridge(pdev->bus);
> +	u32 linkcap;
> +
> +	pcie_capability_read_dword(pdev, PCI_EXP_LNKCAP, &linkcap);
> +	if (FIELD_GET(PCI_EXP_LNKCAP_MLW, linkcap) != 0x2)
> +		return;
> +
> +	bridge->no_ext_tags = 1;
> +	bridge->only_128b_mrrs = 1;
> +	pci_info(pdev, "Disabling Extended Tags and limiting MRRS to 128B (performance reasons due to 2x PCIe link)\n");
> +}
> +
> +DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x0db0, quirk_pcie2x_no_tags_no_mrrs);
> +DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x0db1, quirk_pcie2x_no_tags_no_mrrs);
> +DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x0db2, quirk_pcie2x_no_tags_no_mrrs);
> +DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x0db3, quirk_pcie2x_no_tags_no_mrrs);
> +DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x0db6, quirk_pcie2x_no_tags_no_mrrs);
> +DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x0db7, quirk_pcie2x_no_tags_no_mrrs);
> +DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x0db8, quirk_pcie2x_no_tags_no_mrrs);
> +DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x0db9, quirk_pcie2x_no_tags_no_mrrs);
> +
>  /*
>   * Fixup to mark boot BIOS video selected by BIOS before it changes
>   *
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index 4d7c9f64ea24..2ca9cb30fbd3 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -5941,7 +5941,7 @@ EXPORT_SYMBOL(pcie_get_readrq);
>  int pcie_set_readrq(struct pci_dev *dev, int rq)
>  {
>  	u16 v;
> -	int ret;
> +	int ret, max_mrrs = 4096;
>  	struct pci_host_bridge *bridge = pci_find_host_bridge(dev->bus);
>  
>  	if (rq < 128 || rq > 4096 || !is_power_of_2(rq))
> @@ -5961,13 +5961,14 @@ int pcie_set_readrq(struct pci_dev *dev, int rq)
>  
>  	v = FIELD_PREP(PCI_EXP_DEVCTL_READRQ, ffs(rq) - 8);
>  
> -	if (bridge->no_inc_mrrs) {
> -		int max_mrrs = pcie_get_readrq(dev);
> +	if (bridge->no_inc_mrrs)
> +		max_mrrs = pcie_get_readrq(dev);
> +	if (bridge->only_128b_mrrs)
> +		max_mrrs = 128;
>  
> -		if (rq > max_mrrs) {
> -			pci_info(dev, "can't set Max_Read_Request_Size to %d; max is %d\n", rq, max_mrrs);
> -			return -EINVAL;
> -		}
> +	if (rq > max_mrrs) {
> +		pci_info(dev, "can't set Max_Read_Request_Size to %d; max is %d\n", rq, max_mrrs);
> +		return -EINVAL;
>  	}
>  
>  	ret = pcie_capability_clear_and_set_word(dev, PCI_EXP_DEVCTL,
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index 0e8e3fd77e96..6dc7a05f4d4b 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -603,6 +603,7 @@ struct pci_host_bridge {
>  	unsigned int	ignore_reset_delay:1;	/* For entire hierarchy */
>  	unsigned int	no_ext_tags:1;		/* No Extended Tags */
>  	unsigned int	no_inc_mrrs:1;		/* No Increase MRRS */
> +	unsigned int	only_128b_mrrs:1;	/* Only 128B MRRS */
>  	unsigned int	native_aer:1;		/* OS may use PCIe AER */
>  	unsigned int	native_pcie_hotplug:1;	/* OS may use PCIe hotplug */
>  	unsigned int	native_shpc_hotplug:1;	/* OS may use SHPC hotplug */
> 
> base-commit: 0af2f6be1b4281385b618cb86ad946eded089ac8
> -- 
> 2.39.5
> 

  reply	other threads:[~2025-04-22 20:08 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-04-22 13:02 [PATCH v2 1/1] PCI: Add Extended Tag + MRRS quirk for Xeon 6 Ilpo Järvinen
2025-04-22 20:08 ` Bjorn Helgaas [this message]
2025-04-22 22:24   ` Dan Williams
2025-04-22 20:15 ` Ingo Molnar
2025-05-01  8:57 ` Lukas Wunner
2025-06-01 17:22 ` Manivannan Sadhasivam
2025-06-02  5:10   ` Ilpo Järvinen
2025-06-02  6:29     ` Manivannan Sadhasivam

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20250422200857.GA381276@bhelgaas \
    --to=helgaas@kernel.org \
    --cc=bhelgaas@google.com \
    --cc=bp@alien8.de \
    --cc=dan.j.williams@intel.com \
    --cc=dave.hansen@linux.intel.com \
    --cc=hpa@zytor.com \
    --cc=ilpo.jarvinen@linux.intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pci@vger.kernel.org \
    --cc=lukas@wunner.de \
    --cc=mingo@redhat.com \
    --cc=tglx@linutronix.de \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.