* [PATCH V4 00/12] PCIe TPH and cache direct injection support
@ 2024-08-22 20:41 Wei Huang
2024-08-22 20:41 ` [PATCH V4 01/12] PCI: Introduce PCIe TPH support framework Wei Huang
` (14 more replies)
0 siblings, 15 replies; 43+ messages in thread
From: Wei Huang @ 2024-08-22 20:41 UTC (permalink / raw)
To: linux-pci, linux-kernel, linux-doc, netdev
Cc: Jonathan.Cameron, helgaas, corbet, davem, edumazet, kuba, pabeni,
alex.williamson, gospo, michael.chan, ajit.khaparde,
somnath.kotur, andrew.gospodarek, manoj.panicker2,
Eric.VanTassell, wei.huang2, vadim.fedorenko, horms, bagasdotme,
bhelgaas, lukas, paul.e.luse, jing2.liu
Hi All,
TPH (TLP Processing Hints) is a PCIe feature that allows endpoint
devices to provide optimization hints for requests that target memory
space. These hints, in a format called steering tag (ST), are provided
in the requester's TLP headers and allow the system hardware, including
the Root Complex, to optimize the utilization of platform resources
for the requests.
Upcoming AMD hardware implement a new Cache Injection feature that
leverages TPH. Cache Injection allows PCIe endpoints to inject I/O
Coherent DMA writes directly into an L2 within the CCX (core complex)
closest to the CPU core that will consume it. This technology is aimed
at applications requiring high performance and low latency, such as
networking and storage applications.
This series introduces generic TPH support in Linux, allowing STs to be
retrieved and used by PCIe endpoint drivers as needed. As a
demonstration, it includes an example usage in the Broadcom BNXT driver.
When running on Broadcom NICs with the appropriate firmware, it shows
substantial memory bandwidth savings and better network bandwidth using
real-world benchmarks. This solution is vendor-neutral and implemented
based on industry standards (PCIe Spec and PCI FW Spec).
V3->V4:
* Rebase on top of the latest pci/next tree (tag: 6.11-rc1)
* Add new API functioins to query/enable/disable TPH support
* Make pcie_tph_set_st() completely independent from pcie_tph_get_cpu_st()
* Rewrite bnxt.c based on new APIs
* Remove documentation for now due to constantly changing API
* Remove pci=notph, but keep pci=nostmode with better flow (Bjorn)
* Lots of code rewrite in tph.c & pci-tph.h with cleaner interface (Bjorn)
* Add TPH save/restore support (Paul Luse and Lukas Wunner)
V2->V3:
* Rebase on top of pci/next tree (tag: pci-v6.11-changes)
* Redefine PCI TPH registers (pci_regs.h) without breaking uapi
* Fix commit subjects/messages for kernel options (Jonathan and Bjorn)
* Break API functions into three individual patches for easy review
* Rewrite lots of code in tph.c/tph.h based (Jonathan and Bjorn)
V1->V2:
* Rebase on top of pci.git/for-linus (6.10-rc1)
* Address mismatched data types reported by Sparse (Sparse check passed)
* Add pcie_tph_intr_vec_supported() for checking IRQ mode support
* Skip bnxt affinity notifier registration if
pcie_tph_intr_vec_supported()=false
* Minor fixes in bnxt driver (i.e. warning messages)
Manoj Panicker (1):
bnxt_en: Add TPH support in BNXT driver
Michael Chan (1):
bnxt_en: Pass NQ ID to the FW when allocating RX/RX AGG rings
Paul Luse (1):
PCI/TPH: Add save/restore support for TPH
Wei Huang (9):
PCI: Introduce PCIe TPH support framework
PCI: Add TPH related register definition
PCI/TPH: Add pcie_tph_modes() to query TPH modes
PCI/TPH: Add pcie_enable_tph() to enable TPH
PCI/TPH: Add pcie_disable_tph() to disable TPH
PCI/TPH: Add pcie_tph_enabled() to check TPH state
PCI/TPH: Add pcie_tph_set_st_entry() to set ST tag
PCI/TPH: Add pcie_tph_get_cpu_st() to get ST tag
PCI/TPH: Add pci=nostmode to force TPH No ST Mode
.../admin-guide/kernel-parameters.txt | 3 +
drivers/net/ethernet/broadcom/bnxt/bnxt.c | 86 ++-
drivers/net/ethernet/broadcom/bnxt/bnxt.h | 4 +
drivers/pci/pci.c | 4 +
drivers/pci/pci.h | 12 +
drivers/pci/pcie/Kconfig | 11 +
drivers/pci/pcie/Makefile | 1 +
drivers/pci/pcie/tph.c | 563 ++++++++++++++++++
drivers/pci/probe.c | 1 +
include/linux/pci-tph.h | 48 ++
include/linux/pci.h | 7 +
include/uapi/linux/pci_regs.h | 38 +-
12 files changed, 768 insertions(+), 10 deletions(-)
create mode 100644 drivers/pci/pcie/tph.c
create mode 100644 include/linux/pci-tph.h
--
2.45.1
^ permalink raw reply [flat|nested] 43+ messages in thread
* [PATCH V4 01/12] PCI: Introduce PCIe TPH support framework
2024-08-22 20:41 [PATCH V4 00/12] PCIe TPH and cache direct injection support Wei Huang
@ 2024-08-22 20:41 ` Wei Huang
2024-08-22 20:41 ` [PATCH V4 02/12] PCI: Add TPH related register definition Wei Huang
` (13 subsequent siblings)
14 siblings, 0 replies; 43+ messages in thread
From: Wei Huang @ 2024-08-22 20:41 UTC (permalink / raw)
To: linux-pci, linux-kernel, linux-doc, netdev
Cc: Jonathan.Cameron, helgaas, corbet, davem, edumazet, kuba, pabeni,
alex.williamson, gospo, michael.chan, ajit.khaparde,
somnath.kotur, andrew.gospodarek, manoj.panicker2,
Eric.VanTassell, wei.huang2, vadim.fedorenko, horms, bagasdotme,
bhelgaas, lukas, paul.e.luse, jing2.liu
Implement the framework for PCIe TPH support by introducing tph.c
source file, along with CONFIG_PCIE_TPH, to Linux PCIe subsystem. Add
tph_cap in pci_dev to cache TPH capability offset.
Co-developed-by: Eric Van Tassell <Eric.VanTassell@amd.com>
Signed-off-by: Eric Van Tassell <Eric.VanTassell@amd.com>
Signed-off-by: Wei Huang <wei.huang2@amd.com>
Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
---
drivers/pci/pci.h | 6 ++++++
drivers/pci/pcie/Kconfig | 11 +++++++++++
drivers/pci/pcie/Makefile | 1 +
drivers/pci/pcie/tph.c | 15 +++++++++++++++
drivers/pci/probe.c | 1 +
include/linux/pci.h | 4 ++++
6 files changed, 38 insertions(+)
create mode 100644 drivers/pci/pcie/tph.c
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index 79c8398f3938..289eddfe350b 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -571,6 +571,12 @@ static inline int pci_iov_bus_range(struct pci_bus *bus)
#endif /* CONFIG_PCI_IOV */
+#ifdef CONFIG_PCIE_TPH
+void pci_tph_init(struct pci_dev *dev);
+#else
+static inline void pci_tph_init(struct pci_dev *dev) { }
+#endif
+
#ifdef CONFIG_PCIE_PTM
void pci_ptm_init(struct pci_dev *dev);
void pci_save_ptm_state(struct pci_dev *dev);
diff --git a/drivers/pci/pcie/Kconfig b/drivers/pci/pcie/Kconfig
index 17919b99fa66..61e4bd16eaf1 100644
--- a/drivers/pci/pcie/Kconfig
+++ b/drivers/pci/pcie/Kconfig
@@ -155,3 +155,14 @@ config PCIE_EDR
the PCI Firmware Specification r3.2. Enable this if you want to
support hybrid DPC model which uses both firmware and OS to
implement DPC.
+
+config PCIE_TPH
+ bool "TLP Processing Hints"
+ depends on ACPI
+ default n
+ help
+ This option adds support for PCIe TLP Processing Hints (TPH).
+ TPH allows endpoint devices to provide optimization hints, such as
+ desired caching behavior, for requests that target memory space.
+ These hints, called Steering Tags, can empower the system hardware
+ to optimize the utilization of platform resources.
diff --git a/drivers/pci/pcie/Makefile b/drivers/pci/pcie/Makefile
index 6461aa93fe76..3542b42ea0b9 100644
--- a/drivers/pci/pcie/Makefile
+++ b/drivers/pci/pcie/Makefile
@@ -13,3 +13,4 @@ obj-$(CONFIG_PCIE_PME) += pme.o
obj-$(CONFIG_PCIE_DPC) += dpc.o
obj-$(CONFIG_PCIE_PTM) += ptm.o
obj-$(CONFIG_PCIE_EDR) += edr.o
+obj-$(CONFIG_PCIE_TPH) += tph.o
diff --git a/drivers/pci/pcie/tph.c b/drivers/pci/pcie/tph.c
new file mode 100644
index 000000000000..a547858c3f68
--- /dev/null
+++ b/drivers/pci/pcie/tph.c
@@ -0,0 +1,15 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * TPH (TLP Processing Hints) support
+ *
+ * Copyright (C) 2024 Advanced Micro Devices, Inc.
+ * Eric Van Tassell <Eric.VanTassell@amd.com>
+ * Wei Huang <wei.huang2@amd.com>
+ */
+
+#include "../pci.h"
+
+void pci_tph_init(struct pci_dev *pdev)
+{
+ pdev->tph_cap = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_TPH);
+}
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index b14b9876c030..c74adcdee52b 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -2498,6 +2498,7 @@ static void pci_init_capabilities(struct pci_dev *dev)
pci_dpc_init(dev); /* Downstream Port Containment */
pci_rcec_init(dev); /* Root Complex Event Collector */
pci_doe_init(dev); /* Data Object Exchange */
+ pci_tph_init(dev); /* TLP Processing Hints */
pcie_report_downtraining(dev);
pci_init_reset_methods(dev);
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 4cf89a4b4cbc..c59e7ecab491 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -530,6 +530,10 @@ struct pci_dev {
/* These methods index pci_reset_fn_methods[] */
u8 reset_methods[PCI_NUM_RESET_METHODS]; /* In priority order */
+
+#ifdef CONFIG_PCIE_TPH
+ u16 tph_cap; /* TPH capability offset */
+#endif
};
static inline struct pci_dev *pci_physfn(struct pci_dev *dev)
--
2.45.1
^ permalink raw reply related [flat|nested] 43+ messages in thread
* [PATCH V4 02/12] PCI: Add TPH related register definition
2024-08-22 20:41 [PATCH V4 00/12] PCIe TPH and cache direct injection support Wei Huang
2024-08-22 20:41 ` [PATCH V4 01/12] PCI: Introduce PCIe TPH support framework Wei Huang
@ 2024-08-22 20:41 ` Wei Huang
2024-09-04 19:52 ` Bjorn Helgaas
2024-08-22 20:41 ` [PATCH V4 03/12] PCI/TPH: Add pcie_tph_modes() to query TPH modes Wei Huang
` (12 subsequent siblings)
14 siblings, 1 reply; 43+ messages in thread
From: Wei Huang @ 2024-08-22 20:41 UTC (permalink / raw)
To: linux-pci, linux-kernel, linux-doc, netdev
Cc: Jonathan.Cameron, helgaas, corbet, davem, edumazet, kuba, pabeni,
alex.williamson, gospo, michael.chan, ajit.khaparde,
somnath.kotur, andrew.gospodarek, manoj.panicker2,
Eric.VanTassell, wei.huang2, vadim.fedorenko, horms, bagasdotme,
bhelgaas, lukas, paul.e.luse, jing2.liu
Linux has some basic, but incomplete, definition for the TPH Requester
capability registers. Also the definitions of TPH Requester control
register and TPH Completer capability, as well as the ST fields of
MSI-X entry, are missing. Add all required definitions to support TPH
without changing the existing Linux UAPI.
Co-developed-by: Eric Van Tassell <Eric.VanTassell@amd.com>
Signed-off-by: Eric Van Tassell <Eric.VanTassell@amd.com>
Signed-off-by: Wei Huang <wei.huang2@amd.com>
Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
---
include/uapi/linux/pci_regs.h | 38 +++++++++++++++++++++++++++--------
1 file changed, 30 insertions(+), 8 deletions(-)
diff --git a/include/uapi/linux/pci_regs.h b/include/uapi/linux/pci_regs.h
index 94c00996e633..643236f43f4d 100644
--- a/include/uapi/linux/pci_regs.h
+++ b/include/uapi/linux/pci_regs.h
@@ -340,7 +340,9 @@
#define PCI_MSIX_ENTRY_UPPER_ADDR 0x4 /* Message Upper Address */
#define PCI_MSIX_ENTRY_DATA 0x8 /* Message Data */
#define PCI_MSIX_ENTRY_VECTOR_CTRL 0xc /* Vector Control */
-#define PCI_MSIX_ENTRY_CTRL_MASKBIT 0x00000001
+#define PCI_MSIX_ENTRY_CTRL_MASKBIT 0x00000001 /* Mask Bit */
+#define PCI_MSIX_ENTRY_CTRL_ST_LOWER 0x00ff0000 /* ST Lower */
+#define PCI_MSIX_ENTRY_CTRL_ST_UPPER 0xff000000 /* ST Upper */
/* CompactPCI Hotswap Register */
@@ -657,6 +659,7 @@
#define PCI_EXP_DEVCAP2_ATOMIC_COMP64 0x00000100 /* 64b AtomicOp completion */
#define PCI_EXP_DEVCAP2_ATOMIC_COMP128 0x00000200 /* 128b AtomicOp completion */
#define PCI_EXP_DEVCAP2_LTR 0x00000800 /* Latency tolerance reporting */
+#define PCI_EXP_DEVCAP2_TPH_COMP_MASK 0x00003000 /* TPH completer support */
#define PCI_EXP_DEVCAP2_OBFF_MASK 0x000c0000 /* OBFF support mechanism */
#define PCI_EXP_DEVCAP2_OBFF_MSG 0x00040000 /* New message signaling */
#define PCI_EXP_DEVCAP2_OBFF_WAKE 0x00080000 /* Re-use WAKE# for OBFF */
@@ -1020,15 +1023,34 @@
#define PCI_DPA_CAP_SUBSTATE_MASK 0x1F /* # substates - 1 */
#define PCI_DPA_BASE_SIZEOF 16 /* size with 0 substates */
+/* TPH Completer Support */
+#define PCI_EXP_DEVCAP2_TPH_COMP_NONE 0x0 /* None */
+#define PCI_EXP_DEVCAP2_TPH_COMP_TPH_ONLY 0x1 /* TPH only */
+#define PCI_EXP_DEVCAP2_TPH_COMP_TPH_AND_EXT 0x3 /* TPH and Extended TPH */
+
/* TPH Requester */
#define PCI_TPH_CAP 4 /* capability register */
-#define PCI_TPH_CAP_LOC_MASK 0x600 /* location mask */
-#define PCI_TPH_LOC_NONE 0x000 /* no location */
-#define PCI_TPH_LOC_CAP 0x200 /* in capability */
-#define PCI_TPH_LOC_MSIX 0x400 /* in MSI-X */
-#define PCI_TPH_CAP_ST_MASK 0x07FF0000 /* ST table mask */
-#define PCI_TPH_CAP_ST_SHIFT 16 /* ST table shift */
-#define PCI_TPH_BASE_SIZEOF 0xc /* size with no ST table */
+#define PCI_TPH_CAP_NO_ST 0x00000001 /* No ST Mode Supported */
+#define PCI_TPH_CAP_INT_VEC 0x00000002 /* Interrupt Vector Mode Supported */
+#define PCI_TPH_CAP_DEV_SPEC 0x00000004 /* Device Specific Mode Supported */
+#define PCI_TPH_CAP_EXT_TPH 0x00000100 /* Ext TPH Requester Supported */
+#define PCI_TPH_CAP_LOC_MASK 0x00000600 /* ST Table Location */
+#define PCI_TPH_LOC_NONE 0x00000000 /* Not present */
+#define PCI_TPH_LOC_CAP 0x00000200 /* In capability */
+#define PCI_TPH_LOC_MSIX 0x00000400 /* In MSI-X */
+#define PCI_TPH_CAP_ST_MASK 0x07FF0000 /* ST Table Size */
+#define PCI_TPH_CAP_ST_SHIFT 16 /* ST Table Size shift */
+#define PCI_TPH_BASE_SIZEOF 0xc /* Size with no ST table */
+
+#define PCI_TPH_CTRL 8 /* control register */
+#define PCI_TPH_CTRL_MODE_SEL_MASK 0x00000007 /* ST Mode Select */
+#define PCI_TPH_NO_ST_MODE 0x0 /* No ST Mode */
+#define PCI_TPH_INT_VEC_MODE 0x1 /* Interrupt Vector Mode */
+#define PCI_TPH_DEV_SPEC_MODE 0x2 /* Device Specific Mode */
+#define PCI_TPH_CTRL_REQ_EN_MASK 0x00000300 /* TPH Requester Enable */
+#define PCI_TPH_REQ_DISABLE 0x0 /* No TPH requests allowed */
+#define PCI_TPH_REQ_TPH_ONLY 0x1 /* TPH only requests allowed */
+#define PCI_TPH_REQ_EXT_TPH 0x3 /* Extended TPH requests allowed */
/* Downstream Port Containment */
#define PCI_EXP_DPC_CAP 0x04 /* DPC Capability */
--
2.45.1
^ permalink raw reply related [flat|nested] 43+ messages in thread
* [PATCH V4 03/12] PCI/TPH: Add pcie_tph_modes() to query TPH modes
2024-08-22 20:41 [PATCH V4 00/12] PCIe TPH and cache direct injection support Wei Huang
2024-08-22 20:41 ` [PATCH V4 01/12] PCI: Introduce PCIe TPH support framework Wei Huang
2024-08-22 20:41 ` [PATCH V4 02/12] PCI: Add TPH related register definition Wei Huang
@ 2024-08-22 20:41 ` Wei Huang
2024-09-04 19:40 ` Bjorn Helgaas
2024-08-22 20:41 ` [PATCH V4 04/12] PCI/TPH: Add pcie_enable_tph() to enable TPH Wei Huang
` (11 subsequent siblings)
14 siblings, 1 reply; 43+ messages in thread
From: Wei Huang @ 2024-08-22 20:41 UTC (permalink / raw)
To: linux-pci, linux-kernel, linux-doc, netdev
Cc: Jonathan.Cameron, helgaas, corbet, davem, edumazet, kuba, pabeni,
alex.williamson, gospo, michael.chan, ajit.khaparde,
somnath.kotur, andrew.gospodarek, manoj.panicker2,
Eric.VanTassell, wei.huang2, vadim.fedorenko, horms, bagasdotme,
bhelgaas, lukas, paul.e.luse, jing2.liu
Add pcie_tph_modes() to allow drivers to query the TPH modes supported
by an endpoint device, as reported in the TPH Requester Capability
register. The modes are reported as a bitmask and current supported
modes include:
- PCI_TPH_CAP_NO_ST: NO ST Mode Supported
- PCI_TPH_CAP_INT_VEC: Interrupt Vector Mode Supported
- PCI_TPH_CAP_DEV_SPEC: Device Specific Mode Supported
Co-developed-by: Eric Van Tassell <Eric.VanTassell@amd.com>
Signed-off-by: Eric Van Tassell <Eric.VanTassell@amd.com>
Signed-off-by: Wei Huang <wei.huang2@amd.com>
Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
---
drivers/pci/pcie/tph.c | 33 +++++++++++++++++++++++++++++++++
include/linux/pci-tph.h | 18 ++++++++++++++++++
2 files changed, 51 insertions(+)
create mode 100644 include/linux/pci-tph.h
diff --git a/drivers/pci/pcie/tph.c b/drivers/pci/pcie/tph.c
index a547858c3f68..a28dced3097d 100644
--- a/drivers/pci/pcie/tph.c
+++ b/drivers/pci/pcie/tph.c
@@ -6,9 +6,42 @@
* Eric Van Tassell <Eric.VanTassell@amd.com>
* Wei Huang <wei.huang2@amd.com>
*/
+#include <linux/pci.h>
+#include <linux/pci-tph.h>
#include "../pci.h"
+static u8 get_st_modes(struct pci_dev *pdev)
+{
+ u32 reg;
+
+ pci_read_config_dword(pdev, pdev->tph_cap + PCI_TPH_CAP, ®);
+ reg &= PCI_TPH_CAP_NO_ST | PCI_TPH_CAP_INT_VEC | PCI_TPH_CAP_DEV_SPEC;
+
+ return reg;
+}
+
+/**
+ * pcie_tph_modes - Get the ST modes supported by device
+ * @pdev: PCI device
+ *
+ * Returns a bitmask with all TPH modes supported by a device as shown in the
+ * TPH capability register. Current supported modes include:
+ * PCI_TPH_CAP_NO_ST - NO ST Mode Supported
+ * PCI_TPH_CAP_INT_VEC - Interrupt Vector Mode Supported
+ * PCI_TPH_CAP_DEV_SPEC - Device Specific Mode Supported
+ *
+ * Return: 0 when TPH is not supported, otherwise bitmask of supported modes
+ */
+int pcie_tph_modes(struct pci_dev *pdev)
+{
+ if (!pdev->tph_cap)
+ return 0;
+
+ return get_st_modes(pdev);
+}
+EXPORT_SYMBOL(pcie_tph_modes);
+
void pci_tph_init(struct pci_dev *pdev)
{
pdev->tph_cap = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_TPH);
diff --git a/include/linux/pci-tph.h b/include/linux/pci-tph.h
new file mode 100644
index 000000000000..fa378afe9c7e
--- /dev/null
+++ b/include/linux/pci-tph.h
@@ -0,0 +1,18 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * TPH (TLP Processing Hints)
+ *
+ * Copyright (C) 2024 Advanced Micro Devices, Inc.
+ * Eric Van Tassell <Eric.VanTassell@amd.com>
+ * Wei Huang <wei.huang2@amd.com>
+ */
+#ifndef LINUX_PCI_TPH_H
+#define LINUX_PCI_TPH_H
+
+#ifdef CONFIG_PCIE_TPH
+int pcie_tph_modes(struct pci_dev *pdev);
+#else
+static inline int pcie_tph_modes(struct pci_dev *pdev) { return 0; }
+#endif
+
+#endif /* LINUX_PCI_TPH_H */
--
2.45.1
^ permalink raw reply related [flat|nested] 43+ messages in thread
* [PATCH V4 04/12] PCI/TPH: Add pcie_enable_tph() to enable TPH
2024-08-22 20:41 [PATCH V4 00/12] PCIe TPH and cache direct injection support Wei Huang
` (2 preceding siblings ...)
2024-08-22 20:41 ` [PATCH V4 03/12] PCI/TPH: Add pcie_tph_modes() to query TPH modes Wei Huang
@ 2024-08-22 20:41 ` Wei Huang
2024-09-13 11:35 ` Alejandro Lucero Palau
2024-08-22 20:41 ` [PATCH V4 05/12] PCI/TPH: Add pcie_disable_tph() to disable TPH Wei Huang
` (10 subsequent siblings)
14 siblings, 1 reply; 43+ messages in thread
From: Wei Huang @ 2024-08-22 20:41 UTC (permalink / raw)
To: linux-pci, linux-kernel, linux-doc, netdev
Cc: Jonathan.Cameron, helgaas, corbet, davem, edumazet, kuba, pabeni,
alex.williamson, gospo, michael.chan, ajit.khaparde,
somnath.kotur, andrew.gospodarek, manoj.panicker2,
Eric.VanTassell, wei.huang2, vadim.fedorenko, horms, bagasdotme,
bhelgaas, lukas, paul.e.luse, jing2.liu
Allow drivers to enable TPH support using a specific ST mode. It checks
whether the mode is actually supported by the device before enabling.
Additionally determines what types of requests, TPH (8-bit) or Extended
TPH (16-bit), can be issued by the device based on the device's TPH
Requester capability and its Root Port's Completer capability.
Co-developed-by: Eric Van Tassell <Eric.VanTassell@amd.com>
Signed-off-by: Eric Van Tassell <Eric.VanTassell@amd.com>
Signed-off-by: Wei Huang <wei.huang2@amd.com>
Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
---
drivers/pci/pcie/tph.c | 92 +++++++++++++++++++++++++++++++++++++++++
include/linux/pci-tph.h | 3 ++
include/linux/pci.h | 3 ++
3 files changed, 98 insertions(+)
diff --git a/drivers/pci/pcie/tph.c b/drivers/pci/pcie/tph.c
index a28dced3097d..14ad8c5e895c 100644
--- a/drivers/pci/pcie/tph.c
+++ b/drivers/pci/pcie/tph.c
@@ -7,6 +7,7 @@
* Wei Huang <wei.huang2@amd.com>
*/
#include <linux/pci.h>
+#include <linux/bitfield.h>
#include <linux/pci-tph.h>
#include "../pci.h"
@@ -21,6 +22,97 @@ static u8 get_st_modes(struct pci_dev *pdev)
return reg;
}
+/* Return device's Root Port completer capability */
+static u8 get_rp_completer_type(struct pci_dev *pdev)
+{
+ struct pci_dev *rp;
+ u32 reg;
+ int ret;
+
+ rp = pcie_find_root_port(pdev);
+ if (!rp)
+ return 0;
+
+ ret = pcie_capability_read_dword(rp, PCI_EXP_DEVCAP2, ®);
+ if (ret)
+ return 0;
+
+ return FIELD_GET(PCI_EXP_DEVCAP2_TPH_COMP_MASK, reg);
+}
+
+/**
+ * pcie_enable_tph - Enable TPH support for device using a specific ST mode
+ * @pdev: PCI device
+ * @mode: ST mode to enable, as returned by pcie_tph_modes()
+ *
+ * Checks whether the mode is actually supported by the device before enabling
+ * and returns an error if not. Additionally determines what types of requests,
+ * TPH or extended TPH, can be issued by the device based on its TPH requester
+ * capability and the Root Port's completer capability.
+ *
+ * Return: 0 on success, otherwise negative value (-errno)
+ */
+int pcie_enable_tph(struct pci_dev *pdev, int mode)
+{
+ u32 reg;
+ u8 dev_modes;
+ u8 rp_req_type;
+
+ if (!pdev->tph_cap)
+ return -EINVAL;
+
+ if (pdev->tph_enabled)
+ return -EBUSY;
+
+ /* Check ST mode comptability */
+ dev_modes = get_st_modes(pdev);
+ if (!(mode & dev_modes))
+ return -EINVAL;
+
+ /* Select a supported mode */
+ switch (mode) {
+ case PCI_TPH_CAP_INT_VEC:
+ pdev->tph_mode = PCI_TPH_INT_VEC_MODE;
+ break;
+ case PCI_TPH_CAP_DEV_SPEC:
+ pdev->tph_mode = PCI_TPH_DEV_SPEC_MODE;
+ break;
+ case PCI_TPH_CAP_NO_ST:
+ pdev->tph_mode = PCI_TPH_NO_ST_MODE;
+ break;
+ default:
+ return -EINVAL;
+ }
+
+ /* Get req_type supported by device and its Root Port */
+ reg = pci_read_config_dword(pdev, pdev->tph_cap + PCI_TPH_CAP, ®);
+ if (FIELD_GET(PCI_TPH_CAP_EXT_TPH, reg))
+ pdev->tph_req_type = PCI_TPH_REQ_EXT_TPH;
+ else
+ pdev->tph_req_type = PCI_TPH_REQ_TPH_ONLY;
+
+ rp_req_type = get_rp_completer_type(pdev);
+
+ /* Final req_type is the smallest value of two */
+ pdev->tph_req_type = min(pdev->tph_req_type, rp_req_type);
+
+ /* Write them into TPH control register */
+ pci_read_config_dword(pdev, pdev->tph_cap + PCI_TPH_CTRL, ®);
+
+ reg &= ~PCI_TPH_CTRL_MODE_SEL_MASK;
+ reg |= FIELD_PREP(PCI_TPH_CTRL_MODE_SEL_MASK, pdev->tph_mode);
+
+ reg &= ~PCI_TPH_CTRL_REQ_EN_MASK;
+ reg |= FIELD_PREP(PCI_TPH_CTRL_REQ_EN_MASK, pdev->tph_req_type);
+
+ pci_write_config_dword(pdev, pdev->tph_cap + PCI_TPH_CTRL, reg);
+
+ pdev->tph_enabled = 1;
+
+ return 0;
+}
+EXPORT_SYMBOL(pcie_enable_tph);
+
/**
* pcie_tph_modes - Get the ST modes supported by device
* @pdev: PCI device
diff --git a/include/linux/pci-tph.h b/include/linux/pci-tph.h
index fa378afe9c7e..cdf561076484 100644
--- a/include/linux/pci-tph.h
+++ b/include/linux/pci-tph.h
@@ -10,8 +10,11 @@
#define LINUX_PCI_TPH_H
#ifdef CONFIG_PCIE_TPH
+int pcie_enable_tph(struct pci_dev *pdev, int mode);
int pcie_tph_modes(struct pci_dev *pdev);
#else
+static inline int pcie_enable_tph(struct pci_dev *pdev, int mode)
+{ return -EINVAL; }
static inline int pcie_tph_modes(struct pci_dev *pdev) { return 0; }
#endif
diff --git a/include/linux/pci.h b/include/linux/pci.h
index c59e7ecab491..6f05deb6a0bf 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -433,6 +433,7 @@ struct pci_dev {
unsigned int ats_enabled:1; /* Address Translation Svc */
unsigned int pasid_enabled:1; /* Process Address Space ID */
unsigned int pri_enabled:1; /* Page Request Interface */
+ unsigned int tph_enabled:1; /* TLP Processing Hints */
unsigned int is_managed:1; /* Managed via devres */
unsigned int is_msi_managed:1; /* MSI release via devres installed */
unsigned int needs_freset:1; /* Requires fundamental reset */
@@ -533,6 +534,8 @@ struct pci_dev {
#ifdef CONFIG_PCIE_TPH
u16 tph_cap; /* TPH capability offset */
+ u8 tph_mode; /* TPH mode */
+ u8 tph_req_type; /* TPH requester type */
#endif
};
--
2.45.1
^ permalink raw reply related [flat|nested] 43+ messages in thread
* [PATCH V4 05/12] PCI/TPH: Add pcie_disable_tph() to disable TPH
2024-08-22 20:41 [PATCH V4 00/12] PCIe TPH and cache direct injection support Wei Huang
` (3 preceding siblings ...)
2024-08-22 20:41 ` [PATCH V4 04/12] PCI/TPH: Add pcie_enable_tph() to enable TPH Wei Huang
@ 2024-08-22 20:41 ` Wei Huang
2024-08-22 20:41 ` [PATCH V4 06/12] PCI/TPH: Add pcie_tph_enabled() to check TPH state Wei Huang
` (9 subsequent siblings)
14 siblings, 0 replies; 43+ messages in thread
From: Wei Huang @ 2024-08-22 20:41 UTC (permalink / raw)
To: linux-pci, linux-kernel, linux-doc, netdev
Cc: Jonathan.Cameron, helgaas, corbet, davem, edumazet, kuba, pabeni,
alex.williamson, gospo, michael.chan, ajit.khaparde,
somnath.kotur, andrew.gospodarek, manoj.panicker2,
Eric.VanTassell, wei.huang2, vadim.fedorenko, horms, bagasdotme,
bhelgaas, lukas, paul.e.luse, jing2.liu
Turn off TPH support for a device and clean all related states.
Co-developed-by: Eric Van Tassell <Eric.VanTassell@amd.com>
Signed-off-by: Eric Van Tassell <Eric.VanTassell@amd.com>
Signed-off-by: Wei Huang <wei.huang2@amd.com>
Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
---
drivers/pci/pcie/tph.c | 22 ++++++++++++++++++++++
include/linux/pci-tph.h | 2 ++
2 files changed, 24 insertions(+)
diff --git a/drivers/pci/pcie/tph.c b/drivers/pci/pcie/tph.c
index 14ad8c5e895c..08ce4fdeb160 100644
--- a/drivers/pci/pcie/tph.c
+++ b/drivers/pci/pcie/tph.c
@@ -40,6 +40,28 @@ static u8 get_rp_completer_type(struct pci_dev *pdev)
return FIELD_GET(PCI_EXP_DEVCAP2_TPH_COMP_MASK, reg);
}
+/**
+ * pcie_disable_tph - Turn off TPH support for device
+ * @pdev: PCI device
+ *
+ * Return: none
+ */
+void pcie_disable_tph(struct pci_dev *pdev)
+{
+ if (!pdev->tph_cap)
+ return;
+
+ if (!pdev->tph_enabled)
+ return;
+
+ pci_write_config_dword(pdev, pdev->tph_cap + PCI_TPH_CTRL, 0);
+
+ pdev->tph_mode = 0;
+ pdev->tph_req_type = 0;
+ pdev->tph_enabled = 0;
+}
+EXPORT_SYMBOL(pcie_disable_tph);
+
/**
* pcie_enable_tph - Enable TPH support for device using a specific ST mode
* @pdev: PCI device
diff --git a/include/linux/pci-tph.h b/include/linux/pci-tph.h
index cdf561076484..422d395ade68 100644
--- a/include/linux/pci-tph.h
+++ b/include/linux/pci-tph.h
@@ -10,9 +10,11 @@
#define LINUX_PCI_TPH_H
#ifdef CONFIG_PCIE_TPH
+void pcie_disable_tph(struct pci_dev *pdev);
int pcie_enable_tph(struct pci_dev *pdev, int mode);
int pcie_tph_modes(struct pci_dev *pdev);
#else
+static inline void pcie_disable_tph(struct pci_dev *pdev) { }
static inline int pcie_enable_tph(struct pci_dev *pdev, int mode)
{ return -EINVAL; }
static inline int pcie_tph_modes(struct pci_dev *pdev) { return 0; }
--
2.45.1
^ permalink raw reply related [flat|nested] 43+ messages in thread
* [PATCH V4 06/12] PCI/TPH: Add pcie_tph_enabled() to check TPH state
2024-08-22 20:41 [PATCH V4 00/12] PCIe TPH and cache direct injection support Wei Huang
` (4 preceding siblings ...)
2024-08-22 20:41 ` [PATCH V4 05/12] PCI/TPH: Add pcie_disable_tph() to disable TPH Wei Huang
@ 2024-08-22 20:41 ` Wei Huang
2024-08-22 20:41 ` [PATCH V4 07/12] PCI/TPH: Add pcie_tph_set_st_entry() to set ST tag Wei Huang
` (8 subsequent siblings)
14 siblings, 0 replies; 43+ messages in thread
From: Wei Huang @ 2024-08-22 20:41 UTC (permalink / raw)
To: linux-pci, linux-kernel, linux-doc, netdev
Cc: Jonathan.Cameron, helgaas, corbet, davem, edumazet, kuba, pabeni,
alex.williamson, gospo, michael.chan, ajit.khaparde,
somnath.kotur, andrew.gospodarek, manoj.panicker2,
Eric.VanTassell, wei.huang2, vadim.fedorenko, horms, bagasdotme,
bhelgaas, lukas, paul.e.luse, jing2.liu
Check if TPH has been enabled on a device.
Co-developed-by: Eric Van Tassell <Eric.VanTassell@amd.com>
Signed-off-by: Eric Van Tassell <Eric.VanTassell@amd.com>
Signed-off-by: Wei Huang <wei.huang2@amd.com>
Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
---
drivers/pci/pcie/tph.c | 12 ++++++++++++
include/linux/pci-tph.h | 2 ++
2 files changed, 14 insertions(+)
diff --git a/drivers/pci/pcie/tph.c b/drivers/pci/pcie/tph.c
index 08ce4fdeb160..d949930e7e78 100644
--- a/drivers/pci/pcie/tph.c
+++ b/drivers/pci/pcie/tph.c
@@ -40,6 +40,18 @@ static u8 get_rp_completer_type(struct pci_dev *pdev)
return FIELD_GET(PCI_EXP_DEVCAP2_TPH_COMP_MASK, reg);
}
+/**
+ * pcie_tph_enabled - Check whether TPH is enabled in device
+ * @pdev: PCI device
+ *
+ * Return: true if TPH is enabled, otherwise false
+ */
+bool pcie_tph_enabled(struct pci_dev *pdev)
+{
+ return pdev->tph_enabled;
+}
+EXPORT_SYMBOL(pcie_tph_enabled);
+
/**
* pcie_disable_tph - Turn off TPH support for device
* @pdev: PCI device
diff --git a/include/linux/pci-tph.h b/include/linux/pci-tph.h
index 422d395ade68..50e05cdfbc43 100644
--- a/include/linux/pci-tph.h
+++ b/include/linux/pci-tph.h
@@ -10,10 +10,12 @@
#define LINUX_PCI_TPH_H
#ifdef CONFIG_PCIE_TPH
+bool pcie_tph_enabled(struct pci_dev *pdev);
void pcie_disable_tph(struct pci_dev *pdev);
int pcie_enable_tph(struct pci_dev *pdev, int mode);
int pcie_tph_modes(struct pci_dev *pdev);
#else
+static inline bool pcie_tph_enabled(struct pci_dev *pdev) { return false; }
static inline void pcie_disable_tph(struct pci_dev *pdev) { }
static inline int pcie_enable_tph(struct pci_dev *pdev, int mode)
{ return -EINVAL; }
--
2.45.1
^ permalink raw reply related [flat|nested] 43+ messages in thread
* [PATCH V4 07/12] PCI/TPH: Add pcie_tph_set_st_entry() to set ST tag
2024-08-22 20:41 [PATCH V4 00/12] PCIe TPH and cache direct injection support Wei Huang
` (5 preceding siblings ...)
2024-08-22 20:41 ` [PATCH V4 06/12] PCI/TPH: Add pcie_tph_enabled() to check TPH state Wei Huang
@ 2024-08-22 20:41 ` Wei Huang
2024-08-26 11:46 ` kernel test robot
2024-08-22 20:41 ` [PATCH V4 08/12] PCI/TPH: Add pcie_tph_get_cpu_st() to get " Wei Huang
` (7 subsequent siblings)
14 siblings, 1 reply; 43+ messages in thread
From: Wei Huang @ 2024-08-22 20:41 UTC (permalink / raw)
To: linux-pci, linux-kernel, linux-doc, netdev
Cc: Jonathan.Cameron, helgaas, corbet, davem, edumazet, kuba, pabeni,
alex.williamson, gospo, michael.chan, ajit.khaparde,
somnath.kotur, andrew.gospodarek, manoj.panicker2,
Eric.VanTassell, wei.huang2, vadim.fedorenko, horms, bagasdotme,
bhelgaas, lukas, paul.e.luse, jing2.liu
Add a function to update the device's steering tags. Depending on the
ST table location, the tags will be automatically written into the
device's MSI-X table or into the ST table located in the TPH Extended
Capability space.
Co-developed-by: Eric Van Tassell <Eric.VanTassell@amd.com>
Signed-off-by: Eric Van Tassell <Eric.VanTassell@amd.com>
Signed-off-by: Wei Huang <wei.huang2@amd.com>
Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
---
drivers/pci/pcie/tph.c | 161 ++++++++++++++++++++++++++++++++++++++++
include/linux/pci-tph.h | 5 ++
2 files changed, 166 insertions(+)
diff --git a/drivers/pci/pcie/tph.c b/drivers/pci/pcie/tph.c
index d949930e7e78..82189361a2ee 100644
--- a/drivers/pci/pcie/tph.c
+++ b/drivers/pci/pcie/tph.c
@@ -8,10 +8,24 @@
*/
#include <linux/pci.h>
#include <linux/bitfield.h>
+#include <linux/msi.h>
#include <linux/pci-tph.h>
#include "../pci.h"
+/* Update the TPH Requester Enable field of TPH Control Register */
+static void set_ctrl_reg_req_en(struct pci_dev *pdev, u8 req_type)
+{
+ u32 reg;
+
+ pci_read_config_dword(pdev, pdev->tph_cap + PCI_TPH_CTRL, ®);
+
+ reg &= ~PCI_TPH_CTRL_REQ_EN_MASK;
+ reg |= FIELD_PREP(PCI_TPH_CTRL_REQ_EN_MASK, req_type);
+
+ pci_write_config_dword(pdev, pdev->tph_cap + PCI_TPH_CTRL, reg);
+}
+
static u8 get_st_modes(struct pci_dev *pdev)
{
u32 reg;
@@ -22,6 +36,37 @@ static u8 get_st_modes(struct pci_dev *pdev)
return reg;
}
+static u32 get_st_table_loc(struct pci_dev *pdev)
+{
+ u32 reg;
+
+ pci_read_config_dword(pdev, pdev->tph_cap + PCI_TPH_CAP, ®);
+
+ return FIELD_GET(PCI_TPH_CAP_LOC_MASK, reg);
+}
+
+/*
+ * Return the size of ST table. If ST table is not in TPH Requester Extended
+ * Capability space, return 0. Otherwise return the ST Table Size + 1.
+ */
+static u16 get_st_table_size(struct pci_dev *pdev)
+{
+ u32 reg;
+ u32 loc;
+
+ /* Check ST table location first */
+ loc = get_st_table_loc(pdev);
+
+ /* Convert loc to match with PCI_TPH_LOC_* defined in pci_regs.h */
+ loc = FIELD_PREP(PCI_TPH_CAP_LOC_MASK, loc);
+ if (loc != PCI_TPH_LOC_CAP)
+ return 0;
+
+ pci_read_config_dword(pdev, pdev->tph_cap + PCI_TPH_CAP, ®);
+
+ return FIELD_GET(PCI_TPH_CAP_ST_MASK, reg) + 1;
+}
+
/* Return device's Root Port completer capability */
static u8 get_rp_completer_type(struct pci_dev *pdev)
{
@@ -40,6 +85,122 @@ static u8 get_rp_completer_type(struct pci_dev *pdev)
return FIELD_GET(PCI_EXP_DEVCAP2_TPH_COMP_MASK, reg);
}
+/* Write ST to MSI-X vector control reg - Return 0 if OK, otherwise -errno */
+static int write_tag_to_msix(struct pci_dev *pdev, int msix_idx, u16 tag)
+{
+ struct msi_desc *msi_desc = NULL;
+ void __iomem *vec_ctrl;
+ u32 val, mask;
+ int err = 0;
+
+ msi_lock_descs(&pdev->dev);
+
+ /* Find the msi_desc entry with matching msix_idx */
+ msi_for_each_desc(msi_desc, &pdev->dev, MSI_DESC_ASSOCIATED) {
+ if (msi_desc->msi_index == msix_idx)
+ break;
+ }
+
+ if (!msi_desc) {
+ err = -ENXIO;
+ goto err_out;
+ }
+
+ /* Get the vector control register (offset 0xc) pointed by msix_idx */
+ vec_ctrl = pdev->msix_base + msix_idx * PCI_MSIX_ENTRY_SIZE;
+ vec_ctrl += PCI_MSIX_ENTRY_VECTOR_CTRL;
+
+ val = readl(vec_ctrl);
+ mask = PCI_MSIX_ENTRY_CTRL_ST_LOWER | PCI_MSIX_ENTRY_CTRL_ST_UPPER;
+ val &= ~mask;
+ val |= FIELD_PREP(mask, tag);
+ writel(val, vec_ctrl);
+
+ /* Read back to flush the update */
+ val = readl(vec_ctrl);
+
+err_out:
+ msi_unlock_descs(&pdev->dev);
+ return err;
+}
+
+/* Write tag to ST table - Return 0 if OK, otherwise errno */
+static int write_tag_to_st_table(struct pci_dev *pdev, int index, u16 tag)
+{
+ int st_table_size;
+ int offset;
+
+ /* Check if index is out of bound */
+ st_table_size = get_st_table_size(pdev);
+ if (index >= st_table_size)
+ return -ENXIO;
+
+ offset = pdev->tph_cap + PCI_TPH_BASE_SIZEOF + index * sizeof(u16);
+
+ return pci_write_config_word(pdev, offset, tag);
+}
+
+/**
+ * pcie_tph_set_st_entry() - Set Steering Tag in the ST table entry
+ * @pdev: PCI device
+ * @index: ST table entry index
+ * @tag: Steering Tag to be written
+ *
+ * This function will figure out the proper location of ST table, either in
+ * the MSI-X table or in the TPH Extended Capability space, and write the
+ * Steering Tag into the ST entry pointed by index.
+ *
+ * Returns: 0 if success, otherwise negative value (-errno)
+ */
+int pcie_tph_set_st_entry(struct pci_dev *pdev, unsigned int index, u16 tag)
+{
+ u32 loc;
+ int err = 0;
+
+ if (!pdev->tph_cap)
+ return -EINVAL;
+
+ if (!pdev->tph_enabled)
+ return -EINVAL;
+
+ /* No need to write tag if device is in "No ST Mode" */
+ if (pdev->tph_mode == PCI_TPH_NO_ST_MODE)
+ return 0;
+
+ /* Disable TPH before updating ST to avoid potential instability as
+ * cautioned in PCIe r6.2, sec 6.17.3, "ST Modes of Operation"
+ */
+ set_ctrl_reg_req_en(pdev, PCI_TPH_REQ_DISABLE);
+
+ loc = get_st_table_loc(pdev);
+ /* Convert loc to match with PCI_TPH_LOC_* defined in pci_regs.h */
+ loc = FIELD_PREP(PCI_TPH_CAP_LOC_MASK, loc);
+
+ switch (loc) {
+ case PCI_TPH_LOC_MSIX:
+ err = write_tag_to_msix(pdev, index, tag);
+ break;
+ case PCI_TPH_LOC_CAP:
+ err = write_tag_to_st_table(pdev, index, tag);
+ break;
+ default:
+ err = -EINVAL;
+ }
+
+ if (err) {
+ pcie_disable_tph(pdev);
+ return err;
+ }
+
+ set_ctrl_reg_req_en(pdev, pdev->tph_mode);
+
+ pci_dbg(pdev, "set steering tag: %s table, index=%d, tag=%#04x\n",
+ (loc == PCI_TPH_LOC_MSIX) ? "MSI-X" : "ST", index, tag);
+
+ return 0;
+}
+EXPORT_SYMBOL(pcie_tph_set_st_entry);
+
/**
* pcie_tph_enabled - Check whether TPH is enabled in device
* @pdev: PCI device
diff --git a/include/linux/pci-tph.h b/include/linux/pci-tph.h
index 50e05cdfbc43..a0c93b97090a 100644
--- a/include/linux/pci-tph.h
+++ b/include/linux/pci-tph.h
@@ -10,11 +10,16 @@
#define LINUX_PCI_TPH_H
#ifdef CONFIG_PCIE_TPH
+int pcie_tph_set_st_entry(struct pci_dev *pdev,
+ unsigned int index, u16 tag);
bool pcie_tph_enabled(struct pci_dev *pdev);
void pcie_disable_tph(struct pci_dev *pdev);
int pcie_enable_tph(struct pci_dev *pdev, int mode);
int pcie_tph_modes(struct pci_dev *pdev);
#else
+static inline int pcie_tph_set_st_entry(struct pci_dev *pdev,
+ unsigned int index, u16 tag)
+{ return -EINVAL; }
static inline bool pcie_tph_enabled(struct pci_dev *pdev) { return false; }
static inline void pcie_disable_tph(struct pci_dev *pdev) { }
static inline int pcie_enable_tph(struct pci_dev *pdev, int mode)
--
2.45.1
^ permalink raw reply related [flat|nested] 43+ messages in thread
* [PATCH V4 08/12] PCI/TPH: Add pcie_tph_get_cpu_st() to get ST tag
2024-08-22 20:41 [PATCH V4 00/12] PCIe TPH and cache direct injection support Wei Huang
` (6 preceding siblings ...)
2024-08-22 20:41 ` [PATCH V4 07/12] PCI/TPH: Add pcie_tph_set_st_entry() to set ST tag Wei Huang
@ 2024-08-22 20:41 ` Wei Huang
2024-09-14 10:10 ` Alejandro Lucero Palau
2024-08-22 20:41 ` [PATCH V4 09/12] PCI/TPH: Add save/restore support for TPH Wei Huang
` (6 subsequent siblings)
14 siblings, 1 reply; 43+ messages in thread
From: Wei Huang @ 2024-08-22 20:41 UTC (permalink / raw)
To: linux-pci, linux-kernel, linux-doc, netdev
Cc: Jonathan.Cameron, helgaas, corbet, davem, edumazet, kuba, pabeni,
alex.williamson, gospo, michael.chan, ajit.khaparde,
somnath.kotur, andrew.gospodarek, manoj.panicker2,
Eric.VanTassell, wei.huang2, vadim.fedorenko, horms, bagasdotme,
bhelgaas, lukas, paul.e.luse, jing2.liu
Allow a caller to retrieve Steering Tags for a target memory that is
associated with a specific CPU. The caller must provided two parameters,
memory type and CPU UID, when calling this function. The tag is
retrieved by invoking ACPI _DSM of the device's Root Port device.
Co-developed-by: Eric Van Tassell <Eric.VanTassell@amd.com>
Signed-off-by: Eric Van Tassell <Eric.VanTassell@amd.com>
Signed-off-by: Wei Huang <wei.huang2@amd.com>
Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
---
drivers/pci/pcie/tph.c | 154 ++++++++++++++++++++++++++++++++++++++++
include/linux/pci-tph.h | 18 +++++
2 files changed, 172 insertions(+)
diff --git a/drivers/pci/pcie/tph.c b/drivers/pci/pcie/tph.c
index 82189361a2ee..5bd194fb425e 100644
--- a/drivers/pci/pcie/tph.c
+++ b/drivers/pci/pcie/tph.c
@@ -7,12 +7,125 @@
* Wei Huang <wei.huang2@amd.com>
*/
#include <linux/pci.h>
+#include <linux/pci-acpi.h>
#include <linux/bitfield.h>
#include <linux/msi.h>
#include <linux/pci-tph.h>
#include "../pci.h"
+/*
+ * The st_info struct defines the Steering Tag (ST) info returned by the
+ * firmware _DSM method defined in the approved ECN for PCI Firmware Spec,
+ * available at https://members.pcisig.com/wg/PCI-SIG/document/15470.
+ *
+ * @vm_st_valid: 8-bit ST for volatile memory is valid
+ * @vm_xst_valid: 16-bit extended ST for volatile memory is valid
+ * @vm_ph_ignore: 1 => PH was and will be ignored, 0 => PH should be supplied
+ * @vm_st: 8-bit ST for volatile mem
+ * @vm_xst: 16-bit extended ST for volatile mem
+ * @pm_st_valid: 8-bit ST for persistent memory is valid
+ * @pm_xst_valid: 16-bit extended ST for persistent memory is valid
+ * @pm_ph_ignore: 1 => PH was and will be ignored, 0 => PH should be supplied
+ * @pm_st: 8-bit ST for persistent mem
+ * @pm_xst: 16-bit extended ST for persistent mem
+ */
+union st_info {
+ struct {
+ u64 vm_st_valid : 1;
+ u64 vm_xst_valid : 1;
+ u64 vm_ph_ignore : 1;
+ u64 rsvd1 : 5;
+ u64 vm_st : 8;
+ u64 vm_xst : 16;
+ u64 pm_st_valid : 1;
+ u64 pm_xst_valid : 1;
+ u64 pm_ph_ignore : 1;
+ u64 rsvd2 : 5;
+ u64 pm_st : 8;
+ u64 pm_xst : 16;
+ };
+ u64 value;
+};
+
+static u16 tph_extract_tag(enum tph_mem_type mem_type, u8 req_type,
+ union st_info *info)
+{
+ switch (req_type) {
+ case PCI_TPH_REQ_TPH_ONLY: /* 8-bit tag */
+ switch (mem_type) {
+ case TPH_MEM_TYPE_VM:
+ if (info->vm_st_valid)
+ return info->vm_st;
+ break;
+ case TPH_MEM_TYPE_PM:
+ if (info->pm_st_valid)
+ return info->pm_st;
+ break;
+ }
+ break;
+ case PCI_TPH_REQ_EXT_TPH: /* 16-bit tag */
+ switch (mem_type) {
+ case TPH_MEM_TYPE_VM:
+ if (info->vm_xst_valid)
+ return info->vm_xst;
+ break;
+ case TPH_MEM_TYPE_PM:
+ if (info->pm_xst_valid)
+ return info->pm_xst;
+ break;
+ }
+ break;
+ default:
+ return 0;
+ }
+
+ return 0;
+}
+
+#define TPH_ST_DSM_FUNC_INDEX 0xF
+static acpi_status tph_invoke_dsm(acpi_handle handle, u32 cpu_uid,
+ union st_info *st_out)
+{
+ union acpi_object arg3[3], in_obj, *out_obj;
+
+ if (!acpi_check_dsm(handle, &pci_acpi_dsm_guid, 7,
+ BIT(TPH_ST_DSM_FUNC_INDEX)))
+ return AE_ERROR;
+
+ /* DWORD: feature ID (0 for processor cache ST query) */
+ arg3[0].integer.type = ACPI_TYPE_INTEGER;
+ arg3[0].integer.value = 0;
+
+ /* DWORD: target UID */
+ arg3[1].integer.type = ACPI_TYPE_INTEGER;
+ arg3[1].integer.value = cpu_uid;
+
+ /* QWORD: properties, all 0's */
+ arg3[2].integer.type = ACPI_TYPE_INTEGER;
+ arg3[2].integer.value = 0;
+
+ in_obj.type = ACPI_TYPE_PACKAGE;
+ in_obj.package.count = ARRAY_SIZE(arg3);
+ in_obj.package.elements = arg3;
+
+ out_obj = acpi_evaluate_dsm(handle, &pci_acpi_dsm_guid, 7,
+ TPH_ST_DSM_FUNC_INDEX, &in_obj);
+ if (!out_obj)
+ return AE_ERROR;
+
+ if (out_obj->type != ACPI_TYPE_BUFFER) {
+ ACPI_FREE(out_obj);
+ return AE_ERROR;
+ }
+
+ st_out->value = *((u64 *)(out_obj->buffer.pointer));
+
+ ACPI_FREE(out_obj);
+
+ return AE_OK;
+}
+
/* Update the TPH Requester Enable field of TPH Control Register */
static void set_ctrl_reg_req_en(struct pci_dev *pdev, u8 req_type)
{
@@ -140,6 +253,47 @@ static int write_tag_to_st_table(struct pci_dev *pdev, int index, u16 tag)
return pci_write_config_word(pdev, offset, tag);
}
+/**
+ * pcie_tph_get_cpu_st() - Retrieve Steering Tag for a target memory associated
+ * with a specific CPU
+ * @pdev: PCI device
+ * @mem_type: target memory type (volatile or persistent RAM)
+ * @cpu_uid: associated CPU id
+ * @tag: Steering Tag to be returned
+ *
+ * This function returns the Steering Tag for a target memory that is
+ * associated with a specific CPU as indicated by cpu_uid.
+ *
+ * Returns 0 if success, otherwise negative value (-errno)
+ */
+int pcie_tph_get_cpu_st(struct pci_dev *pdev, enum tph_mem_type mem_type,
+ unsigned int cpu_uid, u16 *tag)
+{
+ struct pci_dev *rp;
+ acpi_handle rp_acpi_handle;
+ union st_info info;
+
+ rp = pcie_find_root_port(pdev);
+ if (!rp || !rp->bus || !rp->bus->bridge)
+ return -ENODEV;
+
+ rp_acpi_handle = ACPI_HANDLE(rp->bus->bridge);
+
+ if (tph_invoke_dsm(rp_acpi_handle, cpu_uid, &info) != AE_OK) {
+ *tag = 0;
+ return -EINVAL;
+ }
+
+ *tag = tph_extract_tag(mem_type, pdev->tph_req_type, &info);
+
+ pci_dbg(pdev, "get steering tag: mem_type=%s, cpu_uid=%d, tag=%#04x\n",
+ (mem_type == TPH_MEM_TYPE_VM) ? "volatile" : "persistent",
+ cpu_uid, *tag);
+
+ return 0;
+}
+EXPORT_SYMBOL(pcie_tph_get_cpu_st);
+
/**
* pcie_tph_set_st_entry() - Set Steering Tag in the ST table entry
* @pdev: PCI device
diff --git a/include/linux/pci-tph.h b/include/linux/pci-tph.h
index a0c93b97090a..c9f33688b9a9 100644
--- a/include/linux/pci-tph.h
+++ b/include/linux/pci-tph.h
@@ -9,9 +9,23 @@
#ifndef LINUX_PCI_TPH_H
#define LINUX_PCI_TPH_H
+/*
+ * According to the ECN for PCI Firmware Spec, Steering Tag can be different
+ * depending on the memory type: Volatile Memory or Persistent Memory. When a
+ * caller query about a target's Steering Tag, it must provide the target's
+ * tph_mem_type. ECN link: https://members.pcisig.com/wg/PCI-SIG/document/15470.
+ */
+enum tph_mem_type {
+ TPH_MEM_TYPE_VM, /* volatile memory */
+ TPH_MEM_TYPE_PM /* persistent memory */
+};
+
#ifdef CONFIG_PCIE_TPH
int pcie_tph_set_st_entry(struct pci_dev *pdev,
unsigned int index, u16 tag);
+int pcie_tph_get_cpu_st(struct pci_dev *dev,
+ enum tph_mem_type mem_type,
+ unsigned int cpu_uid, u16 *tag);
bool pcie_tph_enabled(struct pci_dev *pdev);
void pcie_disable_tph(struct pci_dev *pdev);
int pcie_enable_tph(struct pci_dev *pdev, int mode);
@@ -20,6 +34,10 @@ int pcie_tph_modes(struct pci_dev *pdev);
static inline int pcie_tph_set_st_entry(struct pci_dev *pdev,
unsigned int index, u16 tag)
{ return -EINVAL; }
+static inline int pcie_tph_get_cpu_st(struct pci_dev *dev,
+ enum tph_mem_type mem_type,
+ unsigned int cpu_uid, u16 *tag)
+{ return -EINVAL; }
static inline bool pcie_tph_enabled(struct pci_dev *pdev) { return false; }
static inline void pcie_disable_tph(struct pci_dev *pdev) { }
static inline int pcie_enable_tph(struct pci_dev *pdev, int mode)
--
2.45.1
^ permalink raw reply related [flat|nested] 43+ messages in thread
* [PATCH V4 09/12] PCI/TPH: Add save/restore support for TPH
2024-08-22 20:41 [PATCH V4 00/12] PCIe TPH and cache direct injection support Wei Huang
` (7 preceding siblings ...)
2024-08-22 20:41 ` [PATCH V4 08/12] PCI/TPH: Add pcie_tph_get_cpu_st() to get " Wei Huang
@ 2024-08-22 20:41 ` Wei Huang
2024-09-04 20:11 ` Bjorn Helgaas
2024-08-22 20:41 ` [PATCH V4 10/12] PCI/TPH: Add pci=nostmode to force TPH No ST Mode Wei Huang
` (5 subsequent siblings)
14 siblings, 1 reply; 43+ messages in thread
From: Wei Huang @ 2024-08-22 20:41 UTC (permalink / raw)
To: linux-pci, linux-kernel, linux-doc, netdev
Cc: Jonathan.Cameron, helgaas, corbet, davem, edumazet, kuba, pabeni,
alex.williamson, gospo, michael.chan, ajit.khaparde,
somnath.kotur, andrew.gospodarek, manoj.panicker2,
Eric.VanTassell, wei.huang2, vadim.fedorenko, horms, bagasdotme,
bhelgaas, lukas, paul.e.luse, jing2.liu
From: Paul Luse <paul.e.luse@linux.intel.com>
Save and restore the configuration space for TPH capability to preserve
the settings during PCI reset. The settings include the TPH control
register and the ST table if present.
Co-developed-by: Wei Huang <wei.huang2@amd.com>
Signed-off-by: Wei Huang <wei.huang2@amd.com>
Co-developed-by: Jing Liu <jing2.liu@intel.com>
Signed-off-by: Jing Liu <jing2.liu@intel.com>
Reviewed-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Paul Luse <paul.e.luse@linux.intel.com>
---
drivers/pci/pci.c | 2 ++
drivers/pci/pci.h | 4 +++
drivers/pci/pcie/tph.c | 62 ++++++++++++++++++++++++++++++++++++++++++
3 files changed, 68 insertions(+)
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index e3a49f66982d..1e4960994b1a 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -1813,6 +1813,7 @@ int pci_save_state(struct pci_dev *dev)
pci_save_dpc_state(dev);
pci_save_aer_state(dev);
pci_save_ptm_state(dev);
+ pci_save_tph_state(dev);
return pci_save_vc_state(dev);
}
EXPORT_SYMBOL(pci_save_state);
@@ -1917,6 +1918,7 @@ void pci_restore_state(struct pci_dev *dev)
pci_restore_vc_state(dev);
pci_restore_rebar_state(dev);
pci_restore_dpc_state(dev);
+ pci_restore_tph_state(dev);
pci_restore_ptm_state(dev);
pci_aer_clear_status(dev);
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index 289eddfe350b..d7c7f86e8705 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -572,8 +572,12 @@ static inline int pci_iov_bus_range(struct pci_bus *bus)
#endif /* CONFIG_PCI_IOV */
#ifdef CONFIG_PCIE_TPH
+void pci_restore_tph_state(struct pci_dev *dev);
+void pci_save_tph_state(struct pci_dev *dev);
void pci_tph_init(struct pci_dev *dev);
#else
+static inline void pci_restore_tph_state(struct pci_dev *dev) { }
+static inline void pci_save_tph_state(struct pci_dev *dev) { }
static inline void pci_tph_init(struct pci_dev *dev) { }
#endif
diff --git a/drivers/pci/pcie/tph.c b/drivers/pci/pcie/tph.c
index 5bd194fb425e..b228ef5b7948 100644
--- a/drivers/pci/pcie/tph.c
+++ b/drivers/pci/pcie/tph.c
@@ -483,6 +483,68 @@ int pcie_tph_modes(struct pci_dev *pdev)
}
EXPORT_SYMBOL(pcie_tph_modes);
+void pci_restore_tph_state(struct pci_dev *pdev)
+{
+ struct pci_cap_saved_state *save_state;
+ int num_entries, i, offset;
+ u16 *st_entry;
+ u32 *cap;
+
+ if (!pdev->tph_cap)
+ return;
+
+ if (!pdev->tph_enabled)
+ return;
+
+ save_state = pci_find_saved_ext_cap(pdev, PCI_EXT_CAP_ID_TPH);
+ if (!save_state)
+ return;
+
+ /* Restore control register and all ST entries */
+ cap = &save_state->cap.data[0];
+ pci_write_config_dword(pdev, pdev->tph_cap + PCI_TPH_CTRL, *cap++);
+ st_entry = (u16 *)cap;
+ offset = PCI_TPH_BASE_SIZEOF;
+ num_entries = get_st_table_size(pdev);
+ for (i = 0; i < num_entries; i++) {
+ pci_write_config_word(pdev, pdev->tph_cap + offset,
+ *st_entry++);
+ offset += sizeof(u16);
+ }
+}
+
+void pci_save_tph_state(struct pci_dev *pdev)
+{
+ struct pci_cap_saved_state *save_state;
+ int num_entries, i, offset;
+ u16 *st_entry;
+ u32 *cap;
+
+ if (!pdev->tph_cap)
+ return;
+
+ if (!pdev->tph_enabled)
+ return;
+
+ save_state = pci_find_saved_ext_cap(pdev, PCI_EXT_CAP_ID_TPH);
+ if (!save_state)
+ return;
+
+ /* Save control register */
+ cap = &save_state->cap.data[0];
+ pci_read_config_dword(pdev, pdev->tph_cap + PCI_TPH_CTRL, cap++);
+
+ /* Save all ST entries in extended capability structure */
+ st_entry = (u16 *)cap;
+ offset = PCI_TPH_BASE_SIZEOF;
+ num_entries = get_st_table_size(pdev);
+ for (i = 0; i < num_entries; i++) {
+ pci_read_config_word(pdev, pdev->tph_cap + offset,
+ st_entry++);
+ offset += sizeof(u16);
+ }
+}
+
void pci_tph_init(struct pci_dev *pdev)
{
pdev->tph_cap = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_TPH);
--
2.45.1
^ permalink raw reply related [flat|nested] 43+ messages in thread
* [PATCH V4 10/12] PCI/TPH: Add pci=nostmode to force TPH No ST Mode
2024-08-22 20:41 [PATCH V4 00/12] PCIe TPH and cache direct injection support Wei Huang
` (8 preceding siblings ...)
2024-08-22 20:41 ` [PATCH V4 09/12] PCI/TPH: Add save/restore support for TPH Wei Huang
@ 2024-08-22 20:41 ` Wei Huang
2024-08-22 20:41 ` [PATCH V4 11/12] bnxt_en: Add TPH support in BNXT driver Wei Huang
` (4 subsequent siblings)
14 siblings, 0 replies; 43+ messages in thread
From: Wei Huang @ 2024-08-22 20:41 UTC (permalink / raw)
To: linux-pci, linux-kernel, linux-doc, netdev
Cc: Jonathan.Cameron, helgaas, corbet, davem, edumazet, kuba, pabeni,
alex.williamson, gospo, michael.chan, ajit.khaparde,
somnath.kotur, andrew.gospodarek, manoj.panicker2,
Eric.VanTassell, wei.huang2, vadim.fedorenko, horms, bagasdotme,
bhelgaas, lukas, paul.e.luse, jing2.liu
When "No ST mode" is enabled, endpoint devices can generate TPH headers
but with all steering tags treated as zero. A steering tag of zero is
interpreted as "using the default policy" by the root complex. This is
essential to quantify the benefit of Steering Tags for some given
workloads.
Co-developed-by: Eric Van Tassell <Eric.VanTassell@amd.com>
Signed-off-by: Eric Van Tassell <Eric.VanTassell@amd.com>
Signed-off-by: Wei Huang <wei.huang2@amd.com>
Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
---
Documentation/admin-guide/kernel-parameters.txt | 3 +++
drivers/pci/pci.c | 2 ++
drivers/pci/pci.h | 2 ++
drivers/pci/pcie/tph.c | 12 ++++++++++++
4 files changed, 19 insertions(+)
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index f1384c7b59c9..ed2ee97cf7fb 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -4617,6 +4617,9 @@
nomio [S390] Do not use MIO instructions.
norid [S390] ignore the RID field and force use of
one PCI domain per PCI function
+ nostmode [PCIE] If PCIe TPH Processing Hints (TPH) is
+ enabled, this kernel option forces all Steering
+ Tags to be treated as zero (aka "No ST Mode").
pcie_aspm= [PCIE] Forcibly enable or ignore PCIe Active State Power
Management.
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 1e4960994b1a..88aabac354c0 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -6870,6 +6870,8 @@ static int __init pci_setup(char *str)
pci_no_domains();
} else if (!strncmp(str, "noari", 5)) {
pcie_ari_disabled = true;
+ } else if (!strncmp(str, "nostmode", 8)) {
+ pci_tph_set_nostmode();
} else if (!strncmp(str, "cbiosize=", 9)) {
pci_cardbus_io_size = memparse(str + 9, &str);
} else if (!strncmp(str, "cbmemsize=", 10)) {
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index d7c7f86e8705..54d74f5ff861 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -574,10 +574,12 @@ static inline int pci_iov_bus_range(struct pci_bus *bus)
#ifdef CONFIG_PCIE_TPH
void pci_restore_tph_state(struct pci_dev *dev);
void pci_save_tph_state(struct pci_dev *dev);
+void pci_tph_set_nostmode(void);
void pci_tph_init(struct pci_dev *dev);
#else
static inline void pci_restore_tph_state(struct pci_dev *dev) { }
static inline void pci_save_tph_state(struct pci_dev *dev) { }
+static inline void pci_tph_set_nostmode(void) { }
static inline void pci_tph_init(struct pci_dev *dev) { }
#endif
diff --git a/drivers/pci/pcie/tph.c b/drivers/pci/pcie/tph.c
index b228ef5b7948..f723352adcf5 100644
--- a/drivers/pci/pcie/tph.c
+++ b/drivers/pci/pcie/tph.c
@@ -48,6 +48,8 @@ union st_info {
u64 value;
};
+static bool pci_tph_nostmode;
+
static u16 tph_extract_tag(enum tph_mem_type mem_type, u8 req_type,
union st_info *info)
{
@@ -433,6 +435,10 @@ int pcie_enable_tph(struct pci_dev *pdev, int mode)
return -EINVAL;
}
+ /* Honor "nostmode" kernel parameter */
+ if (pci_tph_nostmode)
+ pdev->tph_mode = PCI_TPH_NO_ST_MODE;
+
/* Get req_type supported by device and its Root Port */
reg = pci_read_config_dword(pdev, pdev->tph_cap + PCI_TPH_CAP, ®);
if (FIELD_GET(PCI_TPH_CAP_EXT_TPH, reg))
@@ -545,6 +551,12 @@ void pci_save_tph_state(struct pci_dev *pdev)
}
}
+void pci_tph_set_nostmode(void)
+{
+ pci_tph_nostmode = true;
+ pr_info("PCIe TPH No ST Mode is enabled\n");
+}
+
void pci_tph_init(struct pci_dev *pdev)
{
pdev->tph_cap = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_TPH);
--
2.45.1
^ permalink raw reply related [flat|nested] 43+ messages in thread
* [PATCH V4 11/12] bnxt_en: Add TPH support in BNXT driver
2024-08-22 20:41 [PATCH V4 00/12] PCIe TPH and cache direct injection support Wei Huang
` (9 preceding siblings ...)
2024-08-22 20:41 ` [PATCH V4 10/12] PCI/TPH: Add pci=nostmode to force TPH No ST Mode Wei Huang
@ 2024-08-22 20:41 ` Wei Huang
2024-08-26 20:22 ` Jakub Kicinski
` (2 more replies)
2024-08-22 20:41 ` [PATCH V4 12/12] bnxt_en: Pass NQ ID to the FW when allocating RX/RX AGG rings Wei Huang
` (3 subsequent siblings)
14 siblings, 3 replies; 43+ messages in thread
From: Wei Huang @ 2024-08-22 20:41 UTC (permalink / raw)
To: linux-pci, linux-kernel, linux-doc, netdev
Cc: Jonathan.Cameron, helgaas, corbet, davem, edumazet, kuba, pabeni,
alex.williamson, gospo, michael.chan, ajit.khaparde,
somnath.kotur, andrew.gospodarek, manoj.panicker2,
Eric.VanTassell, wei.huang2, vadim.fedorenko, horms, bagasdotme,
bhelgaas, lukas, paul.e.luse, jing2.liu
From: Manoj Panicker <manoj.panicker2@amd.com>
Implement TPH support in Broadcom BNXT device driver. The driver uses
TPH functions to retrieve and configure the device's Steering Tags when
its interrupt affinity is being changed.
Co-developed-by: Wei Huang <wei.huang2@amd.com>
Signed-off-by: Wei Huang <wei.huang2@amd.com>
Signed-off-by: Manoj Panicker <manoj.panicker2@amd.com>
Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
---
drivers/net/ethernet/broadcom/bnxt/bnxt.c | 78 +++++++++++++++++++++++
drivers/net/ethernet/broadcom/bnxt/bnxt.h | 4 ++
2 files changed, 82 insertions(+)
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index ffa74c26ee53..5903cd36b54d 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -55,6 +55,7 @@
#include <net/page_pool/helpers.h>
#include <linux/align.h>
#include <net/netdev_queues.h>
+#include <linux/pci-tph.h>
#include "bnxt_hsi.h"
#include "bnxt.h"
@@ -10821,6 +10822,58 @@ int bnxt_reserve_rings(struct bnxt *bp, bool irq_re_init)
return 0;
}
+static void __bnxt_irq_affinity_notify(struct irq_affinity_notify *notify,
+ const cpumask_t *mask)
+{
+ struct bnxt_irq *irq;
+ u16 tag;
+
+ irq = container_of(notify, struct bnxt_irq, affinity_notify);
+ cpumask_copy(irq->cpu_mask, mask);
+
+ if (pcie_tph_get_cpu_st(irq->bp->pdev, TPH_MEM_TYPE_VM,
+ cpumask_first(irq->cpu_mask), &tag))
+ return;
+
+ if (pcie_tph_set_st_entry(irq->bp->pdev, irq->msix_nr, tag))
+ return;
+
+ if (netif_running(irq->bp->dev)) {
+ rtnl_lock();
+ bnxt_close_nic(irq->bp, false, false);
+ bnxt_open_nic(irq->bp, false, false);
+ rtnl_unlock();
+ }
+}
+
+static void __bnxt_irq_affinity_release(struct kref __always_unused *ref)
+{
+}
+
+static void bnxt_release_irq_notifier(struct bnxt_irq *irq)
+{
+ irq_set_affinity_notifier(irq->vector, NULL);
+}
+
+static void bnxt_register_irq_notifier(struct bnxt *bp, struct bnxt_irq *irq)
+{
+ struct irq_affinity_notify *notify;
+
+ /* Nothing to do if TPH is not enabled */
+ if (!pcie_tph_enabled(bp->pdev))
+ return;
+
+ irq->bp = bp;
+
+ /* Register IRQ affinility notifier */
+ notify = &irq->affinity_notify;
+ notify->irq = irq->vector;
+ notify->notify = __bnxt_irq_affinity_notify;
+ notify->release = __bnxt_irq_affinity_release;
+
+ irq_set_affinity_notifier(irq->vector, notify);
+}
+
static void bnxt_free_irq(struct bnxt *bp)
{
struct bnxt_irq *irq;
@@ -10843,11 +10896,17 @@ static void bnxt_free_irq(struct bnxt *bp)
free_cpumask_var(irq->cpu_mask);
irq->have_cpumask = 0;
}
+
+ bnxt_release_irq_notifier(irq);
+
free_irq(irq->vector, bp->bnapi[i]);
}
irq->requested = 0;
}
+
+ /* Disable TPH support */
+ pcie_disable_tph(bp->pdev);
}
static int bnxt_request_irq(struct bnxt *bp)
@@ -10870,6 +10929,13 @@ static int bnxt_request_irq(struct bnxt *bp)
if (!(bp->flags & BNXT_FLAG_USING_MSIX))
flags = IRQF_SHARED;
+ /* Enable TPH support as part of IRQ request */
+ if (pcie_tph_modes(bp->pdev) & PCI_TPH_CAP_INT_VEC) {
+ rc = pcie_enable_tph(bp->pdev, PCI_TPH_CAP_INT_VEC);
+ if (rc)
+ netdev_warn(bp->dev, "failed enabling TPH support\n");
+ }
+
for (i = 0, j = 0; i < bp->cp_nr_rings; i++) {
int map_idx = bnxt_cp_num_to_irq_num(bp, i);
struct bnxt_irq *irq = &bp->irq_tbl[map_idx];
@@ -10893,8 +10959,10 @@ static int bnxt_request_irq(struct bnxt *bp)
if (zalloc_cpumask_var(&irq->cpu_mask, GFP_KERNEL)) {
int numa_node = dev_to_node(&bp->pdev->dev);
+ u16 tag;
irq->have_cpumask = 1;
+ irq->msix_nr = map_idx;
cpumask_set_cpu(cpumask_local_spread(i, numa_node),
irq->cpu_mask);
rc = irq_set_affinity_hint(irq->vector, irq->cpu_mask);
@@ -10904,6 +10972,16 @@ static int bnxt_request_irq(struct bnxt *bp)
irq->vector);
break;
}
+
+ bnxt_register_irq_notifier(bp, irq);
+
+ /* Init ST table entry */
+ if (pcie_tph_get_cpu_st(irq->bp->pdev, TPH_MEM_TYPE_VM,
+ cpumask_first(irq->cpu_mask),
+ &tag))
+ break;
+
+ pcie_tph_set_st_entry(irq->bp->pdev, irq->msix_nr, tag);
}
}
return rc;
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.h b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
index 6bbdc718c3a7..ae1abcc1bddf 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.h
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
@@ -1224,6 +1224,10 @@ struct bnxt_irq {
u8 have_cpumask:1;
char name[IFNAMSIZ + 2];
cpumask_var_t cpu_mask;
+
+ struct bnxt *bp;
+ int msix_nr;
+ struct irq_affinity_notify affinity_notify;
};
#define HWRM_RING_ALLOC_TX 0x1
--
2.45.1
^ permalink raw reply related [flat|nested] 43+ messages in thread
* [PATCH V4 12/12] bnxt_en: Pass NQ ID to the FW when allocating RX/RX AGG rings
2024-08-22 20:41 [PATCH V4 00/12] PCIe TPH and cache direct injection support Wei Huang
` (10 preceding siblings ...)
2024-08-22 20:41 ` [PATCH V4 11/12] bnxt_en: Add TPH support in BNXT driver Wei Huang
@ 2024-08-22 20:41 ` Wei Huang
2024-09-03 22:42 ` [PATCH V4 00/12] PCIe TPH and cache direct injection support Wei Huang
` (2 subsequent siblings)
14 siblings, 0 replies; 43+ messages in thread
From: Wei Huang @ 2024-08-22 20:41 UTC (permalink / raw)
To: linux-pci, linux-kernel, linux-doc, netdev
Cc: Jonathan.Cameron, helgaas, corbet, davem, edumazet, kuba, pabeni,
alex.williamson, gospo, michael.chan, ajit.khaparde,
somnath.kotur, andrew.gospodarek, manoj.panicker2,
Eric.VanTassell, wei.huang2, vadim.fedorenko, horms, bagasdotme,
bhelgaas, lukas, paul.e.luse, jing2.liu
From: Michael Chan <michael.chan@broadcom.com>
Newer firmware can use the NQ ring ID associated with each RX/RX AGG
ring to enable PCIe steering tag. Older firmware will just ignore the
information.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
Reviewed-by: Hongguang Gao <hongguang.gao@broadcom.com>
Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
---
drivers/net/ethernet/broadcom/bnxt/bnxt.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index 5903cd36b54d..5fb46aaa16e3 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -6838,10 +6838,12 @@ static int hwrm_ring_alloc_send_msg(struct bnxt *bp,
/* Association of rx ring with stats context */
grp_info = &bp->grp_info[ring->grp_idx];
+ req->nq_ring_id = cpu_to_le16(grp_info->cp_fw_ring_id);
req->rx_buf_size = cpu_to_le16(bp->rx_buf_use_size);
req->stat_ctx_id = cpu_to_le32(grp_info->fw_stats_ctx);
req->enables |= cpu_to_le32(
- RING_ALLOC_REQ_ENABLES_RX_BUF_SIZE_VALID);
+ RING_ALLOC_REQ_ENABLES_RX_BUF_SIZE_VALID |
+ RING_ALLOC_REQ_ENABLES_NQ_RING_ID_VALID);
if (NET_IP_ALIGN == 2)
flags = RING_ALLOC_REQ_FLAGS_RX_SOP_PAD;
req->flags = cpu_to_le16(flags);
@@ -6853,11 +6855,13 @@ static int hwrm_ring_alloc_send_msg(struct bnxt *bp,
/* Association of agg ring with rx ring */
grp_info = &bp->grp_info[ring->grp_idx];
req->rx_ring_id = cpu_to_le16(grp_info->rx_fw_ring_id);
+ req->nq_ring_id = cpu_to_le16(grp_info->cp_fw_ring_id);
req->rx_buf_size = cpu_to_le16(BNXT_RX_PAGE_SIZE);
req->stat_ctx_id = cpu_to_le32(grp_info->fw_stats_ctx);
req->enables |= cpu_to_le32(
RING_ALLOC_REQ_ENABLES_RX_RING_ID_VALID |
- RING_ALLOC_REQ_ENABLES_RX_BUF_SIZE_VALID);
+ RING_ALLOC_REQ_ENABLES_RX_BUF_SIZE_VALID |
+ RING_ALLOC_REQ_ENABLES_NQ_RING_ID_VALID);
} else {
req->ring_type = RING_ALLOC_REQ_RING_TYPE_RX;
}
--
2.45.1
^ permalink raw reply related [flat|nested] 43+ messages in thread
* Re: [PATCH V4 07/12] PCI/TPH: Add pcie_tph_set_st_entry() to set ST tag
2024-08-22 20:41 ` [PATCH V4 07/12] PCI/TPH: Add pcie_tph_set_st_entry() to set ST tag Wei Huang
@ 2024-08-26 11:46 ` kernel test robot
0 siblings, 0 replies; 43+ messages in thread
From: kernel test robot @ 2024-08-26 11:46 UTC (permalink / raw)
To: Wei Huang, linux-pci, linux-kernel, linux-doc, netdev
Cc: llvm, oe-kbuild-all, Jonathan.Cameron, helgaas, corbet, davem,
edumazet, kuba, pabeni, alex.williamson, gospo, michael.chan,
ajit.khaparde, somnath.kotur, andrew.gospodarek, manoj.panicker2,
Eric.VanTassell, wei.huang2, vadim.fedorenko, horms, bagasdotme,
bhelgaas, lukas, paul.e.luse, jing2.liu
Hi Wei,
kernel test robot noticed the following build warnings:
[auto build test WARNING on pci/next]
[also build test WARNING on pci/for-linus linus/master v6.11-rc5 next-20240826]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]
url: https://github.com/intel-lab-lkp/linux/commits/Wei-Huang/PCI-Introduce-PCIe-TPH-support-framework/20240826-121149
base: https://git.kernel.org/pub/scm/linux/kernel/git/pci/pci.git next
patch link: https://lore.kernel.org/r/20240822204120.3634-8-wei.huang2%40amd.com
patch subject: [PATCH V4 07/12] PCI/TPH: Add pcie_tph_set_st_entry() to set ST tag
config: x86_64-allyesconfig (https://download.01.org/0day-ci/archive/20240826/202408261902.hGVx0hL8-lkp@intel.com/config)
compiler: clang version 18.1.5 (https://github.com/llvm/llvm-project 617a15a9eac96088ae5e9134248d8236e34b91b1)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240826/202408261902.hGVx0hL8-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202408261902.hGVx0hL8-lkp@intel.com/
All warnings (new ones prefixed by >>):
>> drivers/pci/pcie/tph.c:116:9: warning: result of comparison of constant 18446744073709551615 with expression of type 'typeof (_Generic((mask), char: (unsigned char)0, unsigned char: (unsigned char)0, signed char: (unsigned char)0, unsigned short: (unsigned short)0, short: (unsigned short)0, unsigned int: (unsigned int)0, int: (unsigned int)0, unsigned long: (unsigned long)0, long: (unsigned long)0, unsigned long long: (unsigned long long)0, long long: (unsigned long long)0, default: (mask)))' (aka 'unsigned int') is always false [-Wtautological-constant-out-of-range-compare]
116 | val |= FIELD_PREP(mask, tag);
| ^~~~~~~~~~~~~~~~~~~~~
include/linux/bitfield.h:115:3: note: expanded from macro 'FIELD_PREP'
115 | __BF_FIELD_CHECK(_mask, 0ULL, _val, "FIELD_PREP: "); \
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
include/linux/bitfield.h:72:53: note: expanded from macro '__BF_FIELD_CHECK'
72 | BUILD_BUG_ON_MSG(__bf_cast_unsigned(_mask, _mask) > \
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~
73 | __bf_cast_unsigned(_reg, ~0ull), \
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
74 | _pfx "type of reg too small for mask"); \
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
include/linux/build_bug.h:39:58: note: expanded from macro 'BUILD_BUG_ON_MSG'
39 | #define BUILD_BUG_ON_MSG(cond, msg) compiletime_assert(!(cond), msg)
| ~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~
include/linux/compiler_types.h:510:22: note: expanded from macro 'compiletime_assert'
510 | _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
| ~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
include/linux/compiler_types.h:498:23: note: expanded from macro '_compiletime_assert'
498 | __compiletime_assert(condition, msg, prefix, suffix)
| ~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
include/linux/compiler_types.h:490:9: note: expanded from macro '__compiletime_assert'
490 | if (!(condition)) \
| ^~~~~~~~~
1 warning generated.
vim +116 drivers/pci/pcie/tph.c
87
88 /* Write ST to MSI-X vector control reg - Return 0 if OK, otherwise -errno */
89 static int write_tag_to_msix(struct pci_dev *pdev, int msix_idx, u16 tag)
90 {
91 struct msi_desc *msi_desc = NULL;
92 void __iomem *vec_ctrl;
93 u32 val, mask;
94 int err = 0;
95
96 msi_lock_descs(&pdev->dev);
97
98 /* Find the msi_desc entry with matching msix_idx */
99 msi_for_each_desc(msi_desc, &pdev->dev, MSI_DESC_ASSOCIATED) {
100 if (msi_desc->msi_index == msix_idx)
101 break;
102 }
103
104 if (!msi_desc) {
105 err = -ENXIO;
106 goto err_out;
107 }
108
109 /* Get the vector control register (offset 0xc) pointed by msix_idx */
110 vec_ctrl = pdev->msix_base + msix_idx * PCI_MSIX_ENTRY_SIZE;
111 vec_ctrl += PCI_MSIX_ENTRY_VECTOR_CTRL;
112
113 val = readl(vec_ctrl);
114 mask = PCI_MSIX_ENTRY_CTRL_ST_LOWER | PCI_MSIX_ENTRY_CTRL_ST_UPPER;
115 val &= ~mask;
> 116 val |= FIELD_PREP(mask, tag);
117 writel(val, vec_ctrl);
118
119 /* Read back to flush the update */
120 val = readl(vec_ctrl);
121
122 err_out:
123 msi_unlock_descs(&pdev->dev);
124 return err;
125 }
126
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH V4 11/12] bnxt_en: Add TPH support in BNXT driver
2024-08-22 20:41 ` [PATCH V4 11/12] bnxt_en: Add TPH support in BNXT driver Wei Huang
@ 2024-08-26 20:22 ` Jakub Kicinski
2024-08-26 20:56 ` Andy Gospodarek
2024-09-05 15:06 ` Bjorn Helgaas
2024-09-11 15:37 ` Alejandro Lucero Palau
2 siblings, 1 reply; 43+ messages in thread
From: Jakub Kicinski @ 2024-08-26 20:22 UTC (permalink / raw)
To: Wei Huang
Cc: linux-pci, linux-kernel, linux-doc, netdev, Jonathan.Cameron,
helgaas, corbet, davem, edumazet, pabeni, alex.williamson, gospo,
michael.chan, ajit.khaparde, somnath.kotur, andrew.gospodarek,
manoj.panicker2, Eric.VanTassell, vadim.fedorenko, horms,
bagasdotme, bhelgaas, lukas, paul.e.luse, jing2.liu
On Thu, 22 Aug 2024 15:41:19 -0500 Wei Huang wrote:
> + rtnl_lock();
> + bnxt_close_nic(irq->bp, false, false);
> + bnxt_open_nic(irq->bp, false, false);
> + rtnl_unlock();
This code means that under memory pressure changing CPU affinity
can take the machine offline. The entire machine, even if container
orchestration is trying to just move a few IRQs in place for a new
container.
We can't let you do this, it will set a bad precedent. I can't think
of any modern driver with reconfiguration safety as bad as bnxt.
Technical debt coming due.
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH V4 11/12] bnxt_en: Add TPH support in BNXT driver
2024-08-26 20:22 ` Jakub Kicinski
@ 2024-08-26 20:56 ` Andy Gospodarek
2024-08-26 22:49 ` Jakub Kicinski
0 siblings, 1 reply; 43+ messages in thread
From: Andy Gospodarek @ 2024-08-26 20:56 UTC (permalink / raw)
To: Jakub Kicinski
Cc: Wei Huang, linux-pci, linux-kernel, linux-doc, netdev,
Jonathan.Cameron, helgaas, corbet, davem, edumazet, pabeni,
alex.williamson, michael.chan, ajit.khaparde, somnath.kotur,
andrew.gospodarek, manoj.panicker2, Eric.VanTassell,
vadim.fedorenko, horms, bagasdotme, bhelgaas, lukas, paul.e.luse,
jing2.liu
On Mon, Aug 26, 2024 at 01:22:13PM -0700, Jakub Kicinski wrote:
> On Thu, 22 Aug 2024 15:41:19 -0500 Wei Huang wrote:
> > + rtnl_lock();
> > + bnxt_close_nic(irq->bp, false, false);
> > + bnxt_open_nic(irq->bp, false, false);
> > + rtnl_unlock();
>
> This code means that under memory pressure changing CPU affinity
> can take the machine offline. The entire machine, even if container
> orchestration is trying to just move a few IRQs in place for a new
> container.
>
> We can't let you do this, it will set a bad precedent. I can't think
> of any modern driver with reconfiguration safety as bad as bnxt.
> Technical debt coming due.
Jakub,
We have not said this on the list, but we agree. We plan to replace these
calls with calls to stop and start only that ring via netdev_rx_queue_restart
as soon as these calls all land in the same tree. Since this set is
[presumably] coming through linux-pci we didn't think we could do that yet.
Thoughts?
-andy
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH V4 11/12] bnxt_en: Add TPH support in BNXT driver
2024-08-26 20:56 ` Andy Gospodarek
@ 2024-08-26 22:49 ` Jakub Kicinski
2024-08-27 14:50 ` Andy Gospodarek
0 siblings, 1 reply; 43+ messages in thread
From: Jakub Kicinski @ 2024-08-26 22:49 UTC (permalink / raw)
To: Andy Gospodarek
Cc: Wei Huang, linux-pci, linux-kernel, linux-doc, netdev,
Jonathan.Cameron, helgaas, corbet, davem, edumazet, pabeni,
alex.williamson, michael.chan, ajit.khaparde, somnath.kotur,
manoj.panicker2, Eric.VanTassell, vadim.fedorenko, horms,
bagasdotme, bhelgaas, lukas, paul.e.luse, jing2.liu
On Mon, 26 Aug 2024 16:56:36 -0400 Andy Gospodarek wrote:
> We plan to replace these calls with calls to stop and start only that
> ring via netdev_rx_queue_restart as soon as these calls all land in
> the same tree. Since this set is [presumably] coming through
> linux-pci we didn't think we could do that yet.
>
> Thoughts?
The merge window is in 3 weeks or so, so this can wait.
I'm worried we'll find out later that the current queue reset
implementation in bnxt turns out to be insufficient. And we'll
be stuck with yet another close/open in this driver.
While we talk about affinity settings in bnxt -- it'd be great
if it maintained the mapping and XPS settings across reconfiguration
(like queue count changes or attaching XDP).
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH V4 11/12] bnxt_en: Add TPH support in BNXT driver
2024-08-26 22:49 ` Jakub Kicinski
@ 2024-08-27 14:50 ` Andy Gospodarek
2024-08-27 19:05 ` Jakub Kicinski
0 siblings, 1 reply; 43+ messages in thread
From: Andy Gospodarek @ 2024-08-27 14:50 UTC (permalink / raw)
To: Jakub Kicinski
Cc: Andy Gospodarek, Wei Huang, linux-pci, linux-kernel, linux-doc,
netdev, Jonathan.Cameron, helgaas, corbet, davem, edumazet,
pabeni, alex.williamson, michael.chan, ajit.khaparde,
somnath.kotur, manoj.panicker2, Eric.VanTassell, vadim.fedorenko,
horms, bagasdotme, bhelgaas, lukas, paul.e.luse, jing2.liu
On Mon, Aug 26, 2024 at 03:49:12PM -0700, Jakub Kicinski wrote:
> On Mon, 26 Aug 2024 16:56:36 -0400 Andy Gospodarek wrote:
> > We plan to replace these calls with calls to stop and start only that
> > ring via netdev_rx_queue_restart as soon as these calls all land in
> > the same tree. Since this set is [presumably] coming through
> > linux-pci we didn't think we could do that yet.
> >
> > Thoughts?
>
> The merge window is in 3 weeks or so, so this can wait.
Are you asking for the patch for this feature to include the queue
stop/start instead of this? I just checked linux-pci and it does have
bnxt_queue_stop/bnxt_queue_start.
> I'm worried we'll find out later that the current queue reset
> implementation in bnxt turns out to be insufficient. And we'll
> be stuck with yet another close/open in this driver.
The queue reset _has_ to work. We will ensure that it does and fix any
problems found. Note that these have been under test already internally
and fixes are/will be posted to the list as they are made. Holding this
patch because an API that it uses might not work seems odd.
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH V4 11/12] bnxt_en: Add TPH support in BNXT driver
2024-08-27 14:50 ` Andy Gospodarek
@ 2024-08-27 19:05 ` Jakub Kicinski
2024-08-27 19:20 ` Michael Chan
0 siblings, 1 reply; 43+ messages in thread
From: Jakub Kicinski @ 2024-08-27 19:05 UTC (permalink / raw)
To: Andy Gospodarek
Cc: Wei Huang, linux-pci, linux-kernel, linux-doc, netdev,
Jonathan.Cameron, helgaas, corbet, davem, edumazet, pabeni,
alex.williamson, michael.chan, ajit.khaparde, somnath.kotur,
manoj.panicker2, Eric.VanTassell, vadim.fedorenko, horms,
bagasdotme, bhelgaas, lukas, paul.e.luse, jing2.liu
On Tue, 27 Aug 2024 10:50:51 -0400 Andy Gospodarek wrote:
> > The merge window is in 3 weeks or so, so this can wait.
>
> Are you asking for the patch for this feature to include the queue
> stop/start instead of this?
Yes, indeed.
> I just checked linux-pci and it does have bnxt_queue_stop/bnxt_queue_start.
>
> > I'm worried we'll find out later that the current queue reset
> > implementation in bnxt turns out to be insufficient. And we'll
> > be stuck with yet another close/open in this driver.
>
> The queue reset _has_ to work. We will ensure that it does and fix any
> problems found. Note that these have been under test already internally
> and fixes are/will be posted to the list as they are made. Holding this
> patch because an API that it uses might not work seems odd.
Not holding because API may not work, holding because (I thought)
API isn't in place at all. If bnxt_queue_stop/bnxt_queue_start are in
linux-pci please rewrite the patch to use those and then all clear
from my PoV.
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH V4 11/12] bnxt_en: Add TPH support in BNXT driver
2024-08-27 19:05 ` Jakub Kicinski
@ 2024-08-27 19:20 ` Michael Chan
0 siblings, 0 replies; 43+ messages in thread
From: Michael Chan @ 2024-08-27 19:20 UTC (permalink / raw)
To: Jakub Kicinski
Cc: Andy Gospodarek, Wei Huang, linux-pci, linux-kernel, linux-doc,
netdev, Jonathan.Cameron, helgaas, corbet, davem, edumazet,
pabeni, alex.williamson, ajit.khaparde, somnath.kotur,
manoj.panicker2, Eric.VanTassell, vadim.fedorenko, horms,
bagasdotme, bhelgaas, lukas, paul.e.luse, jing2.liu
[-- Attachment #1: Type: text/plain, Size: 604 bytes --]
On Tue, Aug 27, 2024 at 12:05 PM Jakub Kicinski <kuba@kernel.org> wrote:
> Not holding because API may not work, holding because (I thought)
> API isn't in place at all. If bnxt_queue_stop/bnxt_queue_start are in
> linux-pci please rewrite the patch to use those and then all clear
> from my PoV.
To be clear, the API is available in the linux-pci tree but the recent
patch from David to check for proper FW support is only in net-next:
97cbf3d0accc ("bnxt_en: only set dev->queue_mgmt_ops if supported by FW")
So we'll need to add this check for TPH also once the 2 trees are merged.
[-- Attachment #2: S/MIME Cryptographic Signature --]
[-- Type: application/pkcs7-signature, Size: 4209 bytes --]
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH V4 00/12] PCIe TPH and cache direct injection support
2024-08-22 20:41 [PATCH V4 00/12] PCIe TPH and cache direct injection support Wei Huang
` (11 preceding siblings ...)
2024-08-22 20:41 ` [PATCH V4 12/12] bnxt_en: Pass NQ ID to the FW when allocating RX/RX AGG rings Wei Huang
@ 2024-09-03 22:42 ` Wei Huang
2024-09-04 18:49 ` Bjorn Helgaas
2024-09-04 20:20 ` Bjorn Helgaas
14 siblings, 0 replies; 43+ messages in thread
From: Wei Huang @ 2024-09-03 22:42 UTC (permalink / raw)
To: linux-pci, linux-kernel, linux-doc, netdev, Bjorn Helgaas
Cc: Jonathan.Cameron, corbet, davem, edumazet, kuba, pabeni,
alex.williamson, gospo, michael.chan, ajit.khaparde,
somnath.kotur, andrew.gospodarek, manoj.panicker2,
Eric.VanTassell, vadim.fedorenko, horms, bagasdotme, bhelgaas,
lukas, paul.e.luse, jing2.liu
Hi Bjorn,
I have incorporated Jakub's feedback on patch 11 (i.e. bnxt.c) in my new
revision. Do you have additional comments or suggestions on V4 to
include in the next spin?
Thanks,
-Wei
On 8/22/24 3:41 PM, Wei Huang wrote:
> Hi All,
>
> TPH (TLP Processing Hints) is a PCIe feature that allows endpoint
> devices to provide optimization hints for requests that target memory
> space. These hints, in a format called steering tag (ST), are provided
> in the requester's TLP headers and allow the system hardware, including
> the Root Complex, to optimize the utilization of platform resources
> for the requests.
>
> Upcoming AMD hardware implement a new Cache Injection feature that
> leverages TPH. Cache Injection allows PCIe endpoints to inject I/O
> Coherent DMA writes directly into an L2 within the CCX (core complex)
> closest to the CPU core that will consume it. This technology is aimed
> at applications requiring high performance and low latency, such as
> networking and storage applications.
>
> This series introduces generic TPH support in Linux, allowing STs to be
> retrieved and used by PCIe endpoint drivers as needed. As a
> demonstration, it includes an example usage in the Broadcom BNXT driver.
> When running on Broadcom NICs with the appropriate firmware, it shows
> substantial memory bandwidth savings and better network bandwidth using
> real-world benchmarks. This solution is vendor-neutral and implemented
> based on industry standards (PCIe Spec and PCI FW Spec).
>
> V3->V4:
> * Rebase on top of the latest pci/next tree (tag: 6.11-rc1)
> * Add new API functioins to query/enable/disable TPH support
> * Make pcie_tph_set_st() completely independent from pcie_tph_get_cpu_st()
> * Rewrite bnxt.c based on new APIs
> * Remove documentation for now due to constantly changing API
> * Remove pci=notph, but keep pci=nostmode with better flow (Bjorn)
> * Lots of code rewrite in tph.c & pci-tph.h with cleaner interface (Bjorn)
> * Add TPH save/restore support (Paul Luse and Lukas Wunner)
>
> V2->V3:
> * Rebase on top of pci/next tree (tag: pci-v6.11-changes)
> * Redefine PCI TPH registers (pci_regs.h) without breaking uapi
> * Fix commit subjects/messages for kernel options (Jonathan and Bjorn)
> * Break API functions into three individual patches for easy review
> * Rewrite lots of code in tph.c/tph.h based (Jonathan and Bjorn)
>
> V1->V2:
> * Rebase on top of pci.git/for-linus (6.10-rc1)
> * Address mismatched data types reported by Sparse (Sparse check passed)
> * Add pcie_tph_intr_vec_supported() for checking IRQ mode support
> * Skip bnxt affinity notifier registration if
> pcie_tph_intr_vec_supported()=false
> * Minor fixes in bnxt driver (i.e. warning messages)
>
> Manoj Panicker (1):
> bnxt_en: Add TPH support in BNXT driver
>
> Michael Chan (1):
> bnxt_en: Pass NQ ID to the FW when allocating RX/RX AGG rings
>
> Paul Luse (1):
> PCI/TPH: Add save/restore support for TPH
>
> Wei Huang (9):
> PCI: Introduce PCIe TPH support framework
> PCI: Add TPH related register definition
> PCI/TPH: Add pcie_tph_modes() to query TPH modes
> PCI/TPH: Add pcie_enable_tph() to enable TPH
> PCI/TPH: Add pcie_disable_tph() to disable TPH
> PCI/TPH: Add pcie_tph_enabled() to check TPH state
> PCI/TPH: Add pcie_tph_set_st_entry() to set ST tag
> PCI/TPH: Add pcie_tph_get_cpu_st() to get ST tag
> PCI/TPH: Add pci=nostmode to force TPH No ST Mode
>
> .../admin-guide/kernel-parameters.txt | 3 +
> drivers/net/ethernet/broadcom/bnxt/bnxt.c | 86 ++-
> drivers/net/ethernet/broadcom/bnxt/bnxt.h | 4 +
> drivers/pci/pci.c | 4 +
> drivers/pci/pci.h | 12 +
> drivers/pci/pcie/Kconfig | 11 +
> drivers/pci/pcie/Makefile | 1 +
> drivers/pci/pcie/tph.c | 563 ++++++++++++++++++
> drivers/pci/probe.c | 1 +
> include/linux/pci-tph.h | 48 ++
> include/linux/pci.h | 7 +
> include/uapi/linux/pci_regs.h | 38 +-
> 12 files changed, 768 insertions(+), 10 deletions(-)
> create mode 100644 drivers/pci/pcie/tph.c
> create mode 100644 include/linux/pci-tph.h
>
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH V4 00/12] PCIe TPH and cache direct injection support
2024-08-22 20:41 [PATCH V4 00/12] PCIe TPH and cache direct injection support Wei Huang
` (12 preceding siblings ...)
2024-09-03 22:42 ` [PATCH V4 00/12] PCIe TPH and cache direct injection support Wei Huang
@ 2024-09-04 18:49 ` Bjorn Helgaas
2024-09-04 19:48 ` Wei Huang
2024-09-04 20:20 ` Bjorn Helgaas
14 siblings, 1 reply; 43+ messages in thread
From: Bjorn Helgaas @ 2024-09-04 18:49 UTC (permalink / raw)
To: Wei Huang
Cc: linux-pci, linux-kernel, linux-doc, netdev, Jonathan.Cameron,
corbet, davem, edumazet, kuba, pabeni, alex.williamson, gospo,
michael.chan, ajit.khaparde, somnath.kotur, andrew.gospodarek,
manoj.panicker2, Eric.VanTassell, vadim.fedorenko, horms,
bagasdotme, bhelgaas, lukas, paul.e.luse, jing2.liu
On Thu, Aug 22, 2024 at 03:41:08PM -0500, Wei Huang wrote:
> Hi All,
>
> TPH (TLP Processing Hints) is a PCIe feature that allows endpoint
> devices to provide optimization hints for requests that target memory
> space. These hints, in a format called steering tag (ST), are provided
> in the requester's TLP headers and allow the system hardware, including
> the Root Complex, to optimize the utilization of platform resources
> for the requests.
>
> Upcoming AMD hardware implement a new Cache Injection feature that
> leverages TPH. Cache Injection allows PCIe endpoints to inject I/O
> Coherent DMA writes directly into an L2 within the CCX (core complex)
> closest to the CPU core that will consume it. This technology is aimed
> at applications requiring high performance and low latency, such as
> networking and storage applications.
Thanks for this example, it's a great intro. Suggest adding something
similar to a patch commit log, since the cover letter is harder to
find after this appears in git.
> This series introduces generic TPH support in Linux, allowing STs to be
> retrieved and used by PCIe endpoint drivers as needed. As a
> demonstration, it includes an example usage in the Broadcom BNXT driver.
> When running on Broadcom NICs with the appropriate firmware, it shows
> substantial memory bandwidth savings and better network bandwidth using
> real-world benchmarks. This solution is vendor-neutral and implemented
> based on industry standards (PCIe Spec and PCI FW Spec).
>
> V3->V4:
> * Rebase on top of the latest pci/next tree (tag: 6.11-rc1)
No need to rebase to pci/next; pci/main is where it will be applied.
But it currently applies cleanly to either, so no problem.
> * Add new API functioins to query/enable/disable TPH support
> * Make pcie_tph_set_st() completely independent from pcie_tph_get_cpu_st()
> * Rewrite bnxt.c based on new APIs
> * Remove documentation for now due to constantly changing API
I'd like to see this documentation included. And updated if the API
changes, of course.
> * Remove pci=notph, but keep pci=nostmode with better flow (Bjorn)
This seems backward to me. I think "pci=notph" makes sense as a way
to completely disable the TPH feature in case a user trips over a
hardware or driver defect.
But "pci=nostmode" is advertised as a way to quantify the benefit of
Steering Tags, and that seems like it's of interest to developers but
not users.
So my advice would be to keep "pci=notph" and drop "pci=nostmode".
Bjorn
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH V4 03/12] PCI/TPH: Add pcie_tph_modes() to query TPH modes
2024-08-22 20:41 ` [PATCH V4 03/12] PCI/TPH: Add pcie_tph_modes() to query TPH modes Wei Huang
@ 2024-09-04 19:40 ` Bjorn Helgaas
2024-09-05 14:46 ` Wei Huang
0 siblings, 1 reply; 43+ messages in thread
From: Bjorn Helgaas @ 2024-09-04 19:40 UTC (permalink / raw)
To: Wei Huang
Cc: linux-pci, linux-kernel, linux-doc, netdev, Jonathan.Cameron,
corbet, davem, edumazet, kuba, pabeni, alex.williamson, gospo,
michael.chan, ajit.khaparde, somnath.kotur, andrew.gospodarek,
manoj.panicker2, Eric.VanTassell, vadim.fedorenko, horms,
bagasdotme, bhelgaas, lukas, paul.e.luse, jing2.liu
On Thu, Aug 22, 2024 at 03:41:11PM -0500, Wei Huang wrote:
> Add pcie_tph_modes() to allow drivers to query the TPH modes supported
> by an endpoint device, as reported in the TPH Requester Capability
> register. The modes are reported as a bitmask and current supported
> modes include:
>
> - PCI_TPH_CAP_NO_ST: NO ST Mode Supported
> - PCI_TPH_CAP_INT_VEC: Interrupt Vector Mode Supported
> - PCI_TPH_CAP_DEV_SPEC: Device Specific Mode Supported
> + * pcie_tph_modes - Get the ST modes supported by device
> + * @pdev: PCI device
> + *
> + * Returns a bitmask with all TPH modes supported by a device as shown in the
> + * TPH capability register. Current supported modes include:
> + * PCI_TPH_CAP_NO_ST - NO ST Mode Supported
> + * PCI_TPH_CAP_INT_VEC - Interrupt Vector Mode Supported
> + * PCI_TPH_CAP_DEV_SPEC - Device Specific Mode Supported
> + *
> + * Return: 0 when TPH is not supported, otherwise bitmask of supported modes
> + */
> +int pcie_tph_modes(struct pci_dev *pdev)
> +{
> + if (!pdev->tph_cap)
> + return 0;
> +
> + return get_st_modes(pdev);
> +}
> +EXPORT_SYMBOL(pcie_tph_modes);
I'm not sure I see the need for pcie_tph_modes(). The new bnxt code
looks like this:
bnxt_request_irq
if (pcie_tph_modes(bp->pdev) & PCI_TPH_CAP_INT_VEC)
rc = pcie_enable_tph(bp->pdev, PCI_TPH_CAP_INT_VEC);
What is the advantage of this over just this?
bnxt_request_irq
rc = pcie_enable_tph(bp->pdev, PCI_TPH_CAP_INT_VEC);
It seems like drivers could just ask for what they want since
pcie_enable_tph() has to verify support for it anyway. If that fails,
the driver can fall back to another mode.
Returning a bitmask of supported modes might be useful if the driver
could combine them, but IIUC the modes are all mutually exclusive, so
the driver can't request a combination of them.
Bjorn
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH V4 00/12] PCIe TPH and cache direct injection support
2024-09-04 18:49 ` Bjorn Helgaas
@ 2024-09-04 19:48 ` Wei Huang
2024-09-04 20:03 ` Bjorn Helgaas
0 siblings, 1 reply; 43+ messages in thread
From: Wei Huang @ 2024-09-04 19:48 UTC (permalink / raw)
To: Bjorn Helgaas
Cc: linux-pci, linux-kernel, linux-doc, netdev, Jonathan.Cameron,
corbet, davem, edumazet, kuba, pabeni, alex.williamson, gospo,
michael.chan, ajit.khaparde, somnath.kotur, andrew.gospodarek,
manoj.panicker2, Eric.VanTassell, vadim.fedorenko, horms,
bagasdotme, bhelgaas, lukas, paul.e.luse, jing2.liu
On 9/4/24 13:49, Bjorn Helgaas wrote:
> On Thu, Aug 22, 2024 at 03:41:08PM -0500, Wei Huang wrote:
>> Hi All,
>>
>> TPH (TLP Processing Hints) is a PCIe feature that allows endpoint
>> devices to provide optimization hints for requests that target memory
>> space. These hints, in a format called steering tag (ST), are provided
>> in the requester's TLP headers and allow the system hardware, including
>> the Root Complex, to optimize the utilization of platform resources
>> for the requests.
>>
>> Upcoming AMD hardware implement a new Cache Injection feature that
>> leverages TPH. Cache Injection allows PCIe endpoints to inject I/O
>> Coherent DMA writes directly into an L2 within the CCX (core complex)
>> closest to the CPU core that will consume it. This technology is aimed
>> at applications requiring high performance and low latency, such as
>> networking and storage applications.
>
> Thanks for this example, it's a great intro. Suggest adding something
> similar to a patch commit log, since the cover letter is harder to
> find after this appears in git.
I'll incorporate some of these descriptions into the TPH patches where
relevant. Additionally, I'll enhance the commit log for bnxt.c (patch
11) with examples of the benefits.
>
>> This series introduces generic TPH support in Linux, allowing STs to be
>> retrieved and used by PCIe endpoint drivers as needed. As a
>> demonstration, it includes an example usage in the Broadcom BNXT driver.
>> When running on Broadcom NICs with the appropriate firmware, it shows
>> substantial memory bandwidth savings and better network bandwidth using
>> real-world benchmarks. This solution is vendor-neutral and implemented
>> based on industry standards (PCIe Spec and PCI FW Spec).
>>
>> V3->V4:
>> * Rebase on top of the latest pci/next tree (tag: 6.11-rc1)
>
> No need to rebase to pci/next; pci/main is where it will be applied.
> But it currently applies cleanly to either, so no problem.
Got it, will rebase to pci/main in next spin anyway.
>
>> * Add new API functioins to query/enable/disable TPH support
>> * Make pcie_tph_set_st() completely independent from pcie_tph_get_cpu_st()
>> * Rewrite bnxt.c based on new APIs
>> * Remove documentation for now due to constantly changing API
>
> I'd like to see this documentation included. And updated if the API
> changes, of course.
Will do.
>
>> * Remove pci=notph, but keep pci=nostmode with better flow (Bjorn)
>
> This seems backward to me. I think "pci=notph" makes sense as a way
> to completely disable the TPH feature in case a user trips over a
> hardware or driver defect.
>
> But "pci=nostmode" is advertised as a way to quantify the benefit of
> Steering Tags, and that seems like it's of interest to developers but
> not users.
>
> So my advice would be to keep "pci=notph" and drop "pci=nostmode".
OK, I will replace the "nostmode" patch with "notph" in V5.
>
> Bjorn
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH V4 02/12] PCI: Add TPH related register definition
2024-08-22 20:41 ` [PATCH V4 02/12] PCI: Add TPH related register definition Wei Huang
@ 2024-09-04 19:52 ` Bjorn Helgaas
2024-09-05 15:08 ` Wei Huang
0 siblings, 1 reply; 43+ messages in thread
From: Bjorn Helgaas @ 2024-09-04 19:52 UTC (permalink / raw)
To: Wei Huang
Cc: linux-pci, linux-kernel, linux-doc, netdev, Jonathan.Cameron,
corbet, davem, edumazet, kuba, pabeni, alex.williamson, gospo,
michael.chan, ajit.khaparde, somnath.kotur, andrew.gospodarek,
manoj.panicker2, Eric.VanTassell, vadim.fedorenko, horms,
bagasdotme, bhelgaas, lukas, paul.e.luse, jing2.liu
On Thu, Aug 22, 2024 at 03:41:10PM -0500, Wei Huang wrote:
> Linux has some basic, but incomplete, definition for the TPH Requester
> capability registers. Also the definitions of TPH Requester control
> register and TPH Completer capability, as well as the ST fields of
> MSI-X entry, are missing. Add all required definitions to support TPH
> without changing the existing Linux UAPI.
> /* TPH Requester */
> #define PCI_TPH_CAP 4 /* capability register */
> -#define PCI_TPH_CAP_LOC_MASK 0x600 /* location mask */
> -#define PCI_TPH_LOC_NONE 0x000 /* no location */
> -#define PCI_TPH_LOC_CAP 0x200 /* in capability */
> -#define PCI_TPH_LOC_MSIX 0x400 /* in MSI-X */
> -#define PCI_TPH_CAP_ST_MASK 0x07FF0000 /* ST table mask */
> -#define PCI_TPH_CAP_ST_SHIFT 16 /* ST table shift */
> -#define PCI_TPH_BASE_SIZEOF 0xc /* size with no ST table */
> +#define PCI_TPH_CAP_NO_ST 0x00000001 /* No ST Mode Supported */
> +#define PCI_TPH_CAP_INT_VEC 0x00000002 /* Interrupt Vector Mode Supported */
> +#define PCI_TPH_CAP_DEV_SPEC 0x00000004 /* Device Specific Mode Supported */
I think these modes should all include "ST" to clearly delineate
Steering Tags from the Processing Hints. E.g.,
PCI_TPH_CAP_ST_NO_ST or maybe PCI_TPH_CAP_ST_NONE
PCI_TPH_CAP_ST_INT_VEC
PCI_TPH_CAP_ST_DEV_SPEC
> +#define PCI_TPH_CAP_EXT_TPH 0x00000100 /* Ext TPH Requester Supported */
> +#define PCI_TPH_CAP_LOC_MASK 0x00000600 /* ST Table Location */
> +#define PCI_TPH_LOC_NONE 0x00000000 /* Not present */
> +#define PCI_TPH_LOC_CAP 0x00000200 /* In capability */
> +#define PCI_TPH_LOC_MSIX 0x00000400 /* In MSI-X */
These are existing symbols just being tidied, so can't really add "ST"
here unless we just add aliases. Since they're just used internally,
not by drivers, I think they're fine as-is.
> +#define PCI_TPH_CAP_ST_MASK 0x07FF0000 /* ST Table Size */
> +#define PCI_TPH_CAP_ST_SHIFT 16 /* ST Table Size shift */
> +#define PCI_TPH_BASE_SIZEOF 0xc /* Size with no ST table */
> +
> +#define PCI_TPH_CTRL 8 /* control register */
> +#define PCI_TPH_CTRL_MODE_SEL_MASK 0x00000007 /* ST Mode Select */
> +#define PCI_TPH_NO_ST_MODE 0x0 /* No ST Mode */
> +#define PCI_TPH_INT_VEC_MODE 0x1 /* Interrupt Vector Mode */
> +#define PCI_TPH_DEV_SPEC_MODE 0x2 /* Device Specific Mode */
These are also internal, but they're new and I think they should also
include "ST" to match the CAP #defines.
Even better, maybe we only add these and use them for both CAP and
CTRL since they're defined with identical values.
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH V4 00/12] PCIe TPH and cache direct injection support
2024-09-04 19:48 ` Wei Huang
@ 2024-09-04 20:03 ` Bjorn Helgaas
0 siblings, 0 replies; 43+ messages in thread
From: Bjorn Helgaas @ 2024-09-04 20:03 UTC (permalink / raw)
To: Wei Huang
Cc: linux-pci, linux-kernel, linux-doc, netdev, Jonathan.Cameron,
corbet, davem, edumazet, kuba, pabeni, alex.williamson, gospo,
michael.chan, ajit.khaparde, somnath.kotur, andrew.gospodarek,
manoj.panicker2, Eric.VanTassell, vadim.fedorenko, horms,
bagasdotme, bhelgaas, lukas, paul.e.luse, jing2.liu
On Wed, Sep 04, 2024 at 02:48:30PM -0500, Wei Huang wrote:
> On 9/4/24 13:49, Bjorn Helgaas wrote:
> > On Thu, Aug 22, 2024 at 03:41:08PM -0500, Wei Huang wrote:
> >> Hi All,
> >>
> >> TPH (TLP Processing Hints) is a PCIe feature that allows endpoint
> >> devices to provide optimization hints for requests that target memory
> >> space. These hints, in a format called steering tag (ST), are provided
> >> in the requester's TLP headers and allow the system hardware, including
> >> the Root Complex, to optimize the utilization of platform resources
> >> for the requests.
> >>
> >> Upcoming AMD hardware implement a new Cache Injection feature that
> >> leverages TPH. Cache Injection allows PCIe endpoints to inject I/O
> >> Coherent DMA writes directly into an L2 within the CCX (core complex)
> >> closest to the CPU core that will consume it. This technology is aimed
> >> at applications requiring high performance and low latency, such as
> >> networking and storage applications.
> >
> > Thanks for this example, it's a great intro. Suggest adding something
> > similar to a patch commit log, since the cover letter is harder to
> > find after this appears in git.
>
> I'll incorporate some of these descriptions into the TPH patches where
> relevant. Additionally, I'll enhance the commit log for bnxt.c (patch
> 11) with examples of the benefits.
Sounds good.
Another confusing point that would be helpful to mention is that TPH
includes two pieces: Processing Hints and Steering Tags.
As far as I can see, the only architected control of Processing Hints
(bi-directional, requester, target, target w/ priority) is to
enable/disable TPH or Extended TPH.
But we *do* have significant software control over the Steering Tags.
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH V4 09/12] PCI/TPH: Add save/restore support for TPH
2024-08-22 20:41 ` [PATCH V4 09/12] PCI/TPH: Add save/restore support for TPH Wei Huang
@ 2024-09-04 20:11 ` Bjorn Helgaas
0 siblings, 0 replies; 43+ messages in thread
From: Bjorn Helgaas @ 2024-09-04 20:11 UTC (permalink / raw)
To: Wei Huang
Cc: linux-pci, linux-kernel, linux-doc, netdev, Jonathan.Cameron,
corbet, davem, edumazet, kuba, pabeni, alex.williamson, gospo,
michael.chan, ajit.khaparde, somnath.kotur, andrew.gospodarek,
manoj.panicker2, Eric.VanTassell, vadim.fedorenko, horms,
bagasdotme, bhelgaas, lukas, paul.e.luse, jing2.liu
On Thu, Aug 22, 2024 at 03:41:17PM -0500, Wei Huang wrote:
> From: Paul Luse <paul.e.luse@linux.intel.com>
>
> Save and restore the configuration space for TPH capability to preserve
> the settings during PCI reset. The settings include the TPH control
> register and the ST table if present.
> +void pci_restore_tph_state(struct pci_dev *pdev)
> +{
> + struct pci_cap_saved_state *save_state;
> + int num_entries, i, offset;
> + u16 *st_entry;
> + u32 *cap;
> +
> + if (!pdev->tph_cap)
> + return;
> +
> + if (!pdev->tph_enabled)
> + return;
> +
> + save_state = pci_find_saved_ext_cap(pdev, PCI_EXT_CAP_ID_TPH);
> + if (!save_state)
> + return;
> +
> + /* Restore control register and all ST entries */
> + cap = &save_state->cap.data[0];
> + pci_write_config_dword(pdev, pdev->tph_cap + PCI_TPH_CTRL, *cap++);
> + st_entry = (u16 *)cap;
> + offset = PCI_TPH_BASE_SIZEOF;
> + num_entries = get_st_table_size(pdev);
> + for (i = 0; i < num_entries; i++) {
> + pci_write_config_word(pdev, pdev->tph_cap + offset,
> + *st_entry++);
> + offset += sizeof(u16);
> + }
> +}
> +
> +void pci_save_tph_state(struct pci_dev *pdev)
> +{
> + struct pci_cap_saved_state *save_state;
> + int num_entries, i, offset;
> + u16 *st_entry;
> + u32 *cap;
> +
> + if (!pdev->tph_cap)
> + return;
> +
> + if (!pdev->tph_enabled)
> + return;
> +
> + save_state = pci_find_saved_ext_cap(pdev, PCI_EXT_CAP_ID_TPH);
> + if (!save_state)
> + return;
Don't we need a pci_add_ext_cap_save_buffer() somewhere for this?
E.g., in pci_tph_init()?
> + /* Save control register */
> + cap = &save_state->cap.data[0];
> + pci_read_config_dword(pdev, pdev->tph_cap + PCI_TPH_CTRL, cap++);
> +
> + /* Save all ST entries in extended capability structure */
> + st_entry = (u16 *)cap;
> + offset = PCI_TPH_BASE_SIZEOF;
> + num_entries = get_st_table_size(pdev);
> + for (i = 0; i < num_entries; i++) {
> + pci_read_config_word(pdev, pdev->tph_cap + offset,
> + st_entry++);
> + offset += sizeof(u16);
> + }
> +}
> +
> void pci_tph_init(struct pci_dev *pdev)
> {
> pdev->tph_cap = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_TPH);
> --
> 2.45.1
>
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH V4 00/12] PCIe TPH and cache direct injection support
2024-08-22 20:41 [PATCH V4 00/12] PCIe TPH and cache direct injection support Wei Huang
` (13 preceding siblings ...)
2024-09-04 18:49 ` Bjorn Helgaas
@ 2024-09-04 20:20 ` Bjorn Helgaas
2024-09-05 15:45 ` Wei Huang
14 siblings, 1 reply; 43+ messages in thread
From: Bjorn Helgaas @ 2024-09-04 20:20 UTC (permalink / raw)
To: Wei Huang
Cc: linux-pci, linux-kernel, linux-doc, netdev, Jonathan.Cameron,
corbet, davem, edumazet, kuba, pabeni, alex.williamson, gospo,
michael.chan, ajit.khaparde, somnath.kotur, andrew.gospodarek,
manoj.panicker2, Eric.VanTassell, vadim.fedorenko, horms,
bagasdotme, bhelgaas, lukas, paul.e.luse, jing2.liu
On Thu, Aug 22, 2024 at 03:41:08PM -0500, Wei Huang wrote:
> Hi All,
>
> TPH (TLP Processing Hints) is a PCIe feature that allows endpoint
> devices to provide optimization hints for requests that target memory
> space. These hints, in a format called steering tag (ST), are provided
> in the requester's TLP headers and allow the system hardware, including
> the Root Complex, to optimize the utilization of platform resources
> for the requests.
> Paul Luse (1):
> PCI/TPH: Add save/restore support for TPH
>
> Wei Huang (9):
> PCI: Introduce PCIe TPH support framework
> PCI: Add TPH related register definition
> PCI/TPH: Add pcie_tph_modes() to query TPH modes
> PCI/TPH: Add pcie_enable_tph() to enable TPH
> PCI/TPH: Add pcie_disable_tph() to disable TPH
> PCI/TPH: Add pcie_tph_enabled() to check TPH state
> PCI/TPH: Add pcie_tph_set_st_entry() to set ST tag
> PCI/TPH: Add pcie_tph_get_cpu_st() to get ST tag
> PCI/TPH: Add pci=nostmode to force TPH No ST Mode
To me, this series would make more sense if we squashed these
together:
PCI: Introduce PCIe TPH support framework
PCI: Add TPH related register definition
PCI/TPH: Add pcie_enable_tph() to enable TPH
PCI/TPH: Add pcie_disable_tph() to disable TPH
PCI/TPH: Add save/restore support for TPH
These would add the "minimum viable functionality", e.g., enable TPH
just for Processing Hints, with no Steering Tag support at all. Would
also include "pci=notph".
PCI/TPH: Add pcie_tph_set_st_entry() to set ST tag
PCI/TPH: Add pcie_tph_get_cpu_st() to get ST tag
And squash these also to add Steering Tag support in a single commit,
including enhancing the save/restore.
PCI/TPH: Add pcie_tph_modes() to query TPH modes
PCI/TPH: Add pcie_tph_enabled() to check TPH state
And maybe we can get away without these altogether. I mentioned
pcie_tph_modes() elsewhere; seems possibly unnecessary since drivers
can just request the mode they want and we'll fail if it's not
supported.
Drivers should also be able to remember whether they enabled TPH
successfully without us having to remind them.
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH V4 03/12] PCI/TPH: Add pcie_tph_modes() to query TPH modes
2024-09-04 19:40 ` Bjorn Helgaas
@ 2024-09-05 14:46 ` Wei Huang
2024-09-05 15:12 ` Bjorn Helgaas
0 siblings, 1 reply; 43+ messages in thread
From: Wei Huang @ 2024-09-05 14:46 UTC (permalink / raw)
To: Bjorn Helgaas
Cc: linux-pci, linux-kernel, linux-doc, netdev, Jonathan.Cameron,
corbet, davem, edumazet, kuba, pabeni, alex.williamson, gospo,
michael.chan, ajit.khaparde, somnath.kotur, andrew.gospodarek,
manoj.panicker2, Eric.VanTassell, vadim.fedorenko, horms,
bagasdotme, bhelgaas, lukas, paul.e.luse, jing2.liu
On 9/4/24 14:40, Bjorn Helgaas wrote:
> On Thu, Aug 22, 2024 at 03:41:11PM -0500, Wei Huang wrote:
>> Add pcie_tph_modes() to allow drivers to query the TPH modes supported
>> by an endpoint device, as reported in the TPH Requester Capability
>> register. The modes are reported as a bitmask and current supported
>> modes include:
>>
>> - PCI_TPH_CAP_NO_ST: NO ST Mode Supported
>> - PCI_TPH_CAP_INT_VEC: Interrupt Vector Mode Supported
>> - PCI_TPH_CAP_DEV_SPEC: Device Specific Mode Supported
>
>> + * pcie_tph_modes - Get the ST modes supported by device
>> + * @pdev: PCI device
>> + *
>> + * Returns a bitmask with all TPH modes supported by a device as shown in the
>> + * TPH capability register. Current supported modes include:
>> + * PCI_TPH_CAP_NO_ST - NO ST Mode Supported
>> + * PCI_TPH_CAP_INT_VEC - Interrupt Vector Mode Supported
>> + * PCI_TPH_CAP_DEV_SPEC - Device Specific Mode Supported
>> + *
>> + * Return: 0 when TPH is not supported, otherwise bitmask of supported modes
>> + */
>> +int pcie_tph_modes(struct pci_dev *pdev)
>> +{
>> + if (!pdev->tph_cap)
>> + return 0;
>> +
>> + return get_st_modes(pdev);
>> +}
>> +EXPORT_SYMBOL(pcie_tph_modes);
>
> I'm not sure I see the need for pcie_tph_modes(). The new bnxt code
> looks like this:
>
> bnxt_request_irq
> if (pcie_tph_modes(bp->pdev) & PCI_TPH_CAP_INT_VEC)
> rc = pcie_enable_tph(bp->pdev, PCI_TPH_CAP_INT_VEC);
>
> What is the advantage of this over just this?
>
> bnxt_request_irq
> rc = pcie_enable_tph(bp->pdev, PCI_TPH_CAP_INT_VEC);
>
> It seems like drivers could just ask for what they want since
> pcie_enable_tph() has to verify support for it anyway. If that fails,
> the driver can fall back to another mode.
I can get rid of pcie_tph_modes() if unnecessary.
The design logic was that a driver can be used on various devices from
the same company. Some of these devices might not be TPH capable. So
instead of using trial-and-error (i.e. try INT_VEC ==> DEV_SPEC ==> give
up), we provide a way for the driver to query the device TPH
capabilities and pick a mode explicitly. IMO the code will be a bit cleaner.
>
> Returning a bitmask of supported modes might be useful if the driver
> could combine them, but IIUC the modes are all mutually exclusive, so
> the driver can't request a combination of them.
In the real world cases I saw, this is true. In the spec I didn't find
that these bits are mutually exclusive though.
>
> Bjorn
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH V4 11/12] bnxt_en: Add TPH support in BNXT driver
2024-08-22 20:41 ` [PATCH V4 11/12] bnxt_en: Add TPH support in BNXT driver Wei Huang
2024-08-26 20:22 ` Jakub Kicinski
@ 2024-09-05 15:06 ` Bjorn Helgaas
2024-09-11 15:37 ` Alejandro Lucero Palau
2 siblings, 0 replies; 43+ messages in thread
From: Bjorn Helgaas @ 2024-09-05 15:06 UTC (permalink / raw)
To: Wei Huang
Cc: linux-pci, linux-kernel, linux-doc, netdev, Jonathan.Cameron,
corbet, davem, edumazet, kuba, pabeni, alex.williamson, gospo,
michael.chan, ajit.khaparde, somnath.kotur, andrew.gospodarek,
manoj.panicker2, Eric.VanTassell, vadim.fedorenko, horms,
bagasdotme, bhelgaas, lukas, paul.e.luse, jing2.liu
On Thu, Aug 22, 2024 at 03:41:19PM -0500, Wei Huang wrote:
> From: Manoj Panicker <manoj.panicker2@amd.com>
>
> Implement TPH support in Broadcom BNXT device driver. The driver uses
> TPH functions to retrieve and configure the device's Steering Tags when
> its interrupt affinity is being changed.
> + /* Register IRQ affinility notifier */
s/affinility/affinity/
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH V4 02/12] PCI: Add TPH related register definition
2024-09-04 19:52 ` Bjorn Helgaas
@ 2024-09-05 15:08 ` Wei Huang
2024-09-05 16:41 ` Bjorn Helgaas
0 siblings, 1 reply; 43+ messages in thread
From: Wei Huang @ 2024-09-05 15:08 UTC (permalink / raw)
To: Bjorn Helgaas
Cc: linux-pci, linux-kernel, linux-doc, netdev, Jonathan.Cameron,
corbet, davem, edumazet, kuba, pabeni, alex.williamson, gospo,
michael.chan, ajit.khaparde, somnath.kotur, andrew.gospodarek,
manoj.panicker2, Eric.VanTassell, vadim.fedorenko, horms,
bagasdotme, bhelgaas, lukas, paul.e.luse, jing2.liu
On 9/4/24 14:52, Bjorn Helgaas wrote:
>> -#define PCI_TPH_CAP_ST_MASK 0x07FF0000 /* ST table mask */
>> -#define PCI_TPH_CAP_ST_SHIFT 16 /* ST table shift */
>> -#define PCI_TPH_BASE_SIZEOF 0xc /* size with no ST table */
>> +#define PCI_TPH_CAP_NO_ST 0x00000001 /* No ST Mode Supported */
>> +#define PCI_TPH_CAP_INT_VEC 0x00000002 /* Interrupt Vector Mode Supported */
>> +#define PCI_TPH_CAP_DEV_SPEC 0x00000004 /* Device Specific Mode Supported */
>
> I think these modes should all include "ST" to clearly delineate
> Steering Tags from the Processing Hints. E.g.,
>
> PCI_TPH_CAP_ST_NO_ST or maybe PCI_TPH_CAP_ST_NONE
Can I keep "NO_ST" instead of switching over to "ST_NONE"? First, it
matches with PCIe spec. Secondly, IMO "ST_NONE" implies no ST support at
all.
> PCI_TPH_CAP_ST_INT_VEC
> PCI_TPH_CAP_ST_DEV_SPEC
Will change
>
>> +#define PCI_TPH_CAP_EXT_TPH 0x00000100 /* Ext TPH Requester Supported */
>> +#define PCI_TPH_CAP_LOC_MASK 0x00000600 /* ST Table Location */
>> +#define PCI_TPH_LOC_NONE 0x00000000 /* Not present */
>> +#define PCI_TPH_LOC_CAP 0x00000200 /* In capability */
>> +#define PCI_TPH_LOC_MSIX 0x00000400 /* In MSI-X */
>
> These are existing symbols just being tidied, so can't really add "ST"
> here unless we just add aliases. Since they're just used internally,
> not by drivers, I think they're fine as-is.
>
Yes. In V1 review, Alex Williamson and Jonathan Cameron asked not to
change these definitions as they might be used by existing software.
>> +#define PCI_TPH_CAP_ST_MASK 0x07FF0000 /* ST Table Size */
>> +#define PCI_TPH_CAP_ST_SHIFT 16 /* ST Table Size shift */
>> +#define PCI_TPH_BASE_SIZEOF 0xc /* Size with no ST table */
>> +
>> +#define PCI_TPH_CTRL 8 /* control register */
>> +#define PCI_TPH_CTRL_MODE_SEL_MASK 0x00000007 /* ST Mode Select */
>> +#define PCI_TPH_NO_ST_MODE 0x0 /* No ST Mode */
>> +#define PCI_TPH_INT_VEC_MODE 0x1 /* Interrupt Vector Mode */
>> +#define PCI_TPH_DEV_SPEC_MODE 0x2 /* Device Specific Mode */
>
> These are also internal, but they're new and I think they should also
> include "ST" to match the CAP #defines.
>
> Even better, maybe we only add these and use them for both CAP and
> CTRL since they're defined with identical values.
Can you elaborate here? In CTRL register, "ST Mode Select" is defined as
a 2-bit field. The possible values are 0, 1, 2. But in CAP register, the
modes are individual bit masked. So I cannot use CTRL definitions in CAP
register directly unless I do shifting.
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH V4 03/12] PCI/TPH: Add pcie_tph_modes() to query TPH modes
2024-09-05 14:46 ` Wei Huang
@ 2024-09-05 15:12 ` Bjorn Helgaas
0 siblings, 0 replies; 43+ messages in thread
From: Bjorn Helgaas @ 2024-09-05 15:12 UTC (permalink / raw)
To: Wei Huang
Cc: linux-pci, linux-kernel, linux-doc, netdev, Jonathan.Cameron,
corbet, davem, edumazet, kuba, pabeni, alex.williamson, gospo,
michael.chan, ajit.khaparde, somnath.kotur, andrew.gospodarek,
manoj.panicker2, Eric.VanTassell, vadim.fedorenko, horms,
bagasdotme, bhelgaas, lukas, paul.e.luse, jing2.liu
On Thu, Sep 05, 2024 at 09:46:44AM -0500, Wei Huang wrote:
> On 9/4/24 14:40, Bjorn Helgaas wrote:
> > On Thu, Aug 22, 2024 at 03:41:11PM -0500, Wei Huang wrote:
> >> Add pcie_tph_modes() to allow drivers to query the TPH modes supported
> >> by an endpoint device, as reported in the TPH Requester Capability
> >> register. The modes are reported as a bitmask and current supported
> >> modes include:
> >>
> >> - PCI_TPH_CAP_NO_ST: NO ST Mode Supported
> >> - PCI_TPH_CAP_INT_VEC: Interrupt Vector Mode Supported
> >> - PCI_TPH_CAP_DEV_SPEC: Device Specific Mode Supported
> >
> >> + * pcie_tph_modes - Get the ST modes supported by device
> >> + * @pdev: PCI device
> >> + *
> >> + * Returns a bitmask with all TPH modes supported by a device as shown in the
> >> + * TPH capability register. Current supported modes include:
> >> + * PCI_TPH_CAP_NO_ST - NO ST Mode Supported
> >> + * PCI_TPH_CAP_INT_VEC - Interrupt Vector Mode Supported
> >> + * PCI_TPH_CAP_DEV_SPEC - Device Specific Mode Supported
> >> + *
> >> + * Return: 0 when TPH is not supported, otherwise bitmask of supported modes
> >> + */
> >> +int pcie_tph_modes(struct pci_dev *pdev)
> >> +{
> >> + if (!pdev->tph_cap)
> >> + return 0;
> >> +
> >> + return get_st_modes(pdev);
> >> +}
> >> +EXPORT_SYMBOL(pcie_tph_modes);
> >
> > I'm not sure I see the need for pcie_tph_modes(). The new bnxt code
> > looks like this:
> >
> > bnxt_request_irq
> > if (pcie_tph_modes(bp->pdev) & PCI_TPH_CAP_INT_VEC)
> > rc = pcie_enable_tph(bp->pdev, PCI_TPH_CAP_INT_VEC);
> >
> > What is the advantage of this over just this?
> >
> > bnxt_request_irq
> > rc = pcie_enable_tph(bp->pdev, PCI_TPH_CAP_INT_VEC);
> >
> > It seems like drivers could just ask for what they want since
> > pcie_enable_tph() has to verify support for it anyway. If that fails,
> > the driver can fall back to another mode.
>
> I can get rid of pcie_tph_modes() if unnecessary.
>
> The design logic was that a driver can be used on various devices from
> the same company. Some of these devices might not be TPH capable. So
> instead of using trial-and-error (i.e. try INT_VEC ==> DEV_SPEC ==> give
> up), we provide a way for the driver to query the device TPH
> capabilities and pick a mode explicitly. IMO the code will be a bit cleaner.
>
> > Returning a bitmask of supported modes might be useful if the driver
> > could combine them, but IIUC the modes are all mutually exclusive, so
> > the driver can't request a combination of them.
>
> In the real world cases I saw, this is true. In the spec I didn't find
> that these bits are mutually exclusive though.
A device may advertise *support* for multiple modes in TPH Requester
Capability, of course.
What I meant by "driver can't request a combination" is that we can
only *select* one of them via the ST Mode select in TPH Requester
Control.
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH V4 00/12] PCIe TPH and cache direct injection support
2024-09-04 20:20 ` Bjorn Helgaas
@ 2024-09-05 15:45 ` Wei Huang
2024-09-05 16:44 ` Bjorn Helgaas
0 siblings, 1 reply; 43+ messages in thread
From: Wei Huang @ 2024-09-05 15:45 UTC (permalink / raw)
To: Bjorn Helgaas
Cc: linux-pci, linux-kernel, linux-doc, netdev, Jonathan.Cameron,
corbet, davem, edumazet, kuba, pabeni, alex.williamson, gospo,
michael.chan, ajit.khaparde, somnath.kotur, andrew.gospodarek,
manoj.panicker2, Eric.VanTassell, vadim.fedorenko, horms,
bagasdotme, bhelgaas, lukas, paul.e.luse, jing2.liu
On 9/4/24 15:20, Bjorn Helgaas wrote:
> To me, this series would make more sense if we squashed these
> together:
>
> PCI: Introduce PCIe TPH support framework
> PCI: Add TPH related register definition
> PCI/TPH: Add pcie_enable_tph() to enable TPH
> PCI/TPH: Add pcie_disable_tph() to disable TPH
> PCI/TPH: Add save/restore support for TPH
>
> These would add the "minimum viable functionality", e.g., enable TPH
> just for Processing Hints, with no Steering Tag support at all. Would
> also include "pci=notph".
Will do
>
> PCI/TPH: Add pcie_tph_set_st_entry() to set ST tag
> PCI/TPH: Add pcie_tph_get_cpu_st() to get ST tag
>
> And squash these also to add Steering Tag support in a single commit,
> including enhancing the save/restore.
Can you elaborate on save/restore enhancement? Assuming that the first
combined patch will have save/restore support as suggested. Are you
talking about the ST entries save/restore as the enhancements (see
below), because the second combined patch deals with ST?
st_entry = (u16 *)cap;
offset = PCI_TPH_BASE_SIZEOF;
num_entries = get_st_table_size(pdev);
for (i = 0; i < num_entries; i++) {
pci_write_config_word(pdev, pdev->tph_cap + offset,
*st_entry++);
offset += sizeof(u16);
}
>
> PCI/TPH: Add pcie_tph_modes() to query TPH modes
> PCI/TPH: Add pcie_tph_enabled() to check TPH state
>
> And maybe we can get away without these altogether. I mentioned
> pcie_tph_modes() elsewhere; seems possibly unnecessary since drivers
> can just request the mode they want and we'll fail if it's not
> supported.
>
> Drivers should also be able to remember whether they enabled TPH
> successfully without us having to remind them.
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH V4 02/12] PCI: Add TPH related register definition
2024-09-05 15:08 ` Wei Huang
@ 2024-09-05 16:41 ` Bjorn Helgaas
2024-09-16 21:08 ` Wei Huang
0 siblings, 1 reply; 43+ messages in thread
From: Bjorn Helgaas @ 2024-09-05 16:41 UTC (permalink / raw)
To: Wei Huang
Cc: linux-pci, linux-kernel, linux-doc, netdev, Jonathan.Cameron,
corbet, davem, edumazet, kuba, pabeni, alex.williamson, gospo,
michael.chan, ajit.khaparde, somnath.kotur, andrew.gospodarek,
manoj.panicker2, Eric.VanTassell, vadim.fedorenko, horms,
bagasdotme, bhelgaas, lukas, paul.e.luse, jing2.liu
On Thu, Sep 05, 2024 at 10:08:33AM -0500, Wei Huang wrote:
> On 9/4/24 14:52, Bjorn Helgaas wrote:
> >> -#define PCI_TPH_CAP_ST_MASK 0x07FF0000 /* ST table mask */
> >> -#define PCI_TPH_CAP_ST_SHIFT 16 /* ST table shift */
> >> -#define PCI_TPH_BASE_SIZEOF 0xc /* size with no ST table */
> >> +#define PCI_TPH_CAP_NO_ST 0x00000001 /* No ST Mode Supported */
> >> +#define PCI_TPH_CAP_INT_VEC 0x00000002 /* Interrupt Vector Mode Supported */
> >> +#define PCI_TPH_CAP_DEV_SPEC 0x00000004 /* Device Specific Mode Supported */
> >
> > I think these modes should all include "ST" to clearly delineate
> > Steering Tags from the Processing Hints. E.g.,
> >
> > PCI_TPH_CAP_ST_NO_ST or maybe PCI_TPH_CAP_ST_NONE
>
> Can I keep "NO_ST" instead of switching over to "ST_NONE"? First, it
> matches with PCIe spec. Secondly, IMO "ST_NONE" implies no ST support at
> all.
Sure. Does PCI_TPH_CAP_ST_NO_ST work for you? That follows the same
PCI_TPH_CAP_ST_* pattern as below.
> > PCI_TPH_CAP_ST_INT_VEC
> > PCI_TPH_CAP_ST_DEV_SPEC
>
> Will change
> >> +#define PCI_TPH_CAP_ST_MASK 0x07FF0000 /* ST Table Size */
> >> +#define PCI_TPH_CAP_ST_SHIFT 16 /* ST Table Size shift */
> >> +#define PCI_TPH_BASE_SIZEOF 0xc /* Size with no ST table */
> >> +
> >> +#define PCI_TPH_CTRL 8 /* control register */
> >> +#define PCI_TPH_CTRL_MODE_SEL_MASK 0x00000007 /* ST Mode Select */
> >> +#define PCI_TPH_NO_ST_MODE 0x0 /* No ST Mode */
> >> +#define PCI_TPH_INT_VEC_MODE 0x1 /* Interrupt Vector Mode */
> >> +#define PCI_TPH_DEV_SPEC_MODE 0x2 /* Device Specific Mode */
> >
> > These are also internal, but they're new and I think they should also
> > include "ST" to match the CAP #defines.
> >
> > Even better, maybe we only add these and use them for both CAP and
> > CTRL since they're defined with identical values.
>
> Can you elaborate here? In CTRL register, "ST Mode Select" is defined as
> a 2-bit field. The possible values are 0, 1, 2. But in CAP register, the
> modes are individual bit masked. So I cannot use CTRL definitions in CAP
> register directly unless I do shifting.
Oops, sorry, I thought they were the same values, but they're not, so
ignore this comment.
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH V4 00/12] PCIe TPH and cache direct injection support
2024-09-05 15:45 ` Wei Huang
@ 2024-09-05 16:44 ` Bjorn Helgaas
0 siblings, 0 replies; 43+ messages in thread
From: Bjorn Helgaas @ 2024-09-05 16:44 UTC (permalink / raw)
To: Wei Huang
Cc: linux-pci, linux-kernel, linux-doc, netdev, Jonathan.Cameron,
corbet, davem, edumazet, kuba, pabeni, alex.williamson, gospo,
michael.chan, ajit.khaparde, somnath.kotur, andrew.gospodarek,
manoj.panicker2, Eric.VanTassell, vadim.fedorenko, horms,
bagasdotme, bhelgaas, lukas, paul.e.luse, jing2.liu
On Thu, Sep 05, 2024 at 10:45:57AM -0500, Wei Huang wrote:
> On 9/4/24 15:20, Bjorn Helgaas wrote:
> > To me, this series would make more sense if we squashed these
> > together:
> >
> > PCI: Introduce PCIe TPH support framework
> > PCI: Add TPH related register definition
> > PCI/TPH: Add pcie_enable_tph() to enable TPH
> > PCI/TPH: Add pcie_disable_tph() to disable TPH
> > PCI/TPH: Add save/restore support for TPH
> >
> > These would add the "minimum viable functionality", e.g., enable TPH
> > just for Processing Hints, with no Steering Tag support at all. Would
> > also include "pci=notph".
>
> > PCI/TPH: Add pcie_tph_set_st_entry() to set ST tag
> > PCI/TPH: Add pcie_tph_get_cpu_st() to get ST tag
> >
> > And squash these also to add Steering Tag support in a single commit,
> > including enhancing the save/restore.
>
> Can you elaborate on save/restore enhancement? Assuming that the first
> combined patch will have save/restore support as suggested. Are you
> talking about the ST entries save/restore as the enhancements (see
> below), because the second combined patch deals with ST?
>
> st_entry = (u16 *)cap;
> offset = PCI_TPH_BASE_SIZEOF;
> num_entries = get_st_table_size(pdev);
> for (i = 0; i < num_entries; i++) {
> pci_write_config_word(pdev, pdev->tph_cap + offset,
> *st_entry++);
> offset += sizeof(u16);
> }
I meant that since the first patch knows nothing about ST, it would
save/restore TPH control but not the ST entries.
The second patch would add ST support and also add save/restore of the
ST entries.
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH V4 11/12] bnxt_en: Add TPH support in BNXT driver
2024-08-22 20:41 ` [PATCH V4 11/12] bnxt_en: Add TPH support in BNXT driver Wei Huang
2024-08-26 20:22 ` Jakub Kicinski
2024-09-05 15:06 ` Bjorn Helgaas
@ 2024-09-11 15:37 ` Alejandro Lucero Palau
2024-09-16 18:55 ` Wei Huang
2 siblings, 1 reply; 43+ messages in thread
From: Alejandro Lucero Palau @ 2024-09-11 15:37 UTC (permalink / raw)
To: Wei Huang, linux-pci, linux-kernel, linux-doc, netdev
Cc: Jonathan.Cameron, helgaas, corbet, davem, edumazet, kuba, pabeni,
alex.williamson, gospo, michael.chan, ajit.khaparde,
somnath.kotur, andrew.gospodarek, manoj.panicker2,
Eric.VanTassell, vadim.fedorenko, horms, bagasdotme, bhelgaas,
lukas, paul.e.luse, jing2.liu
On 8/22/24 21:41, Wei Huang wrote:
> From: Manoj Panicker <manoj.panicker2@amd.com>
>
> Implement TPH support in Broadcom BNXT device driver. The driver uses
> TPH functions to retrieve and configure the device's Steering Tags when
> its interrupt affinity is being changed.
>
> Co-developed-by: Wei Huang <wei.huang2@amd.com>
> Signed-off-by: Wei Huang <wei.huang2@amd.com>
> Signed-off-by: Manoj Panicker <manoj.panicker2@amd.com>
> Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
> Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
> Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
> ---
> drivers/net/ethernet/broadcom/bnxt/bnxt.c | 78 +++++++++++++++++++++++
> drivers/net/ethernet/broadcom/bnxt/bnxt.h | 4 ++
> 2 files changed, 82 insertions(+)
>
> diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
> index ffa74c26ee53..5903cd36b54d 100644
> --- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
> +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
> @@ -55,6 +55,7 @@
> #include <net/page_pool/helpers.h>
> #include <linux/align.h>
> #include <net/netdev_queues.h>
> +#include <linux/pci-tph.h>
>
> #include "bnxt_hsi.h"
> #include "bnxt.h"
> @@ -10821,6 +10822,58 @@ int bnxt_reserve_rings(struct bnxt *bp, bool irq_re_init)
> return 0;
> }
>
> +static void __bnxt_irq_affinity_notify(struct irq_affinity_notify *notify,
> + const cpumask_t *mask)
> +{
> + struct bnxt_irq *irq;
> + u16 tag;
> +
> + irq = container_of(notify, struct bnxt_irq, affinity_notify);
> + cpumask_copy(irq->cpu_mask, mask);
> +
> + if (pcie_tph_get_cpu_st(irq->bp->pdev, TPH_MEM_TYPE_VM,
> + cpumask_first(irq->cpu_mask), &tag))
I understand just one cpu from the mask has to be used, but I wonder if
some check should be done for ensuring the mask is not mad.
This is control path and the related queue is going to be restarted, so
maybe a sanity check for ensuring all the cpus in the mask are from the
same CCX complex?
That would be an iteration checking the tag is the same one for all of
them. If not, at least a warning stating the tag/CCX/cpu used.
> + return;
> +
> + if (pcie_tph_set_st_entry(irq->bp->pdev, irq->msix_nr, tag))
> + return;
> +
> + if (netif_running(irq->bp->dev)) {
> + rtnl_lock();
> + bnxt_close_nic(irq->bp, false, false);
> + bnxt_open_nic(irq->bp, false, false);
> + rtnl_unlock();
> + }
> +}
> +
> +static void __bnxt_irq_affinity_release(struct kref __always_unused *ref)
> +{
> +}
> +
> +static void bnxt_release_irq_notifier(struct bnxt_irq *irq)
> +{
> + irq_set_affinity_notifier(irq->vector, NULL);
> +}
> +
> +static void bnxt_register_irq_notifier(struct bnxt *bp, struct bnxt_irq *irq)
> +{
> + struct irq_affinity_notify *notify;
> +
> + /* Nothing to do if TPH is not enabled */
> + if (!pcie_tph_enabled(bp->pdev))
> + return;
> +
> + irq->bp = bp;
> +
> + /* Register IRQ affinility notifier */
> + notify = &irq->affinity_notify;
> + notify->irq = irq->vector;
> + notify->notify = __bnxt_irq_affinity_notify;
> + notify->release = __bnxt_irq_affinity_release;
> +
> + irq_set_affinity_notifier(irq->vector, notify);
> +}
> +
> static void bnxt_free_irq(struct bnxt *bp)
> {
> struct bnxt_irq *irq;
> @@ -10843,11 +10896,17 @@ static void bnxt_free_irq(struct bnxt *bp)
> free_cpumask_var(irq->cpu_mask);
> irq->have_cpumask = 0;
> }
> +
> + bnxt_release_irq_notifier(irq);
> +
> free_irq(irq->vector, bp->bnapi[i]);
> }
>
> irq->requested = 0;
> }
> +
> + /* Disable TPH support */
> + pcie_disable_tph(bp->pdev);
> }
>
> static int bnxt_request_irq(struct bnxt *bp)
> @@ -10870,6 +10929,13 @@ static int bnxt_request_irq(struct bnxt *bp)
> if (!(bp->flags & BNXT_FLAG_USING_MSIX))
> flags = IRQF_SHARED;
>
> + /* Enable TPH support as part of IRQ request */
> + if (pcie_tph_modes(bp->pdev) & PCI_TPH_CAP_INT_VEC) {
> + rc = pcie_enable_tph(bp->pdev, PCI_TPH_CAP_INT_VEC);
> + if (rc)
> + netdev_warn(bp->dev, "failed enabling TPH support\n");
> + }
> +
> for (i = 0, j = 0; i < bp->cp_nr_rings; i++) {
> int map_idx = bnxt_cp_num_to_irq_num(bp, i);
> struct bnxt_irq *irq = &bp->irq_tbl[map_idx];
> @@ -10893,8 +10959,10 @@ static int bnxt_request_irq(struct bnxt *bp)
>
> if (zalloc_cpumask_var(&irq->cpu_mask, GFP_KERNEL)) {
> int numa_node = dev_to_node(&bp->pdev->dev);
> + u16 tag;
>
> irq->have_cpumask = 1;
> + irq->msix_nr = map_idx;
> cpumask_set_cpu(cpumask_local_spread(i, numa_node),
> irq->cpu_mask);
> rc = irq_set_affinity_hint(irq->vector, irq->cpu_mask);
> @@ -10904,6 +10972,16 @@ static int bnxt_request_irq(struct bnxt *bp)
> irq->vector);
> break;
> }
> +
> + bnxt_register_irq_notifier(bp, irq);
> +
> + /* Init ST table entry */
> + if (pcie_tph_get_cpu_st(irq->bp->pdev, TPH_MEM_TYPE_VM,
> + cpumask_first(irq->cpu_mask),
> + &tag))
> + break;
> +
> + pcie_tph_set_st_entry(irq->bp->pdev, irq->msix_nr, tag);
> }
> }
> return rc;
> diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.h b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
> index 6bbdc718c3a7..ae1abcc1bddf 100644
> --- a/drivers/net/ethernet/broadcom/bnxt/bnxt.h
> +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
> @@ -1224,6 +1224,10 @@ struct bnxt_irq {
> u8 have_cpumask:1;
> char name[IFNAMSIZ + 2];
> cpumask_var_t cpu_mask;
> +
> + struct bnxt *bp;
> + int msix_nr;
> + struct irq_affinity_notify affinity_notify;
> };
>
> #define HWRM_RING_ALLOC_TX 0x1
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH V4 04/12] PCI/TPH: Add pcie_enable_tph() to enable TPH
2024-08-22 20:41 ` [PATCH V4 04/12] PCI/TPH: Add pcie_enable_tph() to enable TPH Wei Huang
@ 2024-09-13 11:35 ` Alejandro Lucero Palau
0 siblings, 0 replies; 43+ messages in thread
From: Alejandro Lucero Palau @ 2024-09-13 11:35 UTC (permalink / raw)
To: Wei Huang, linux-pci, linux-kernel, linux-doc, netdev
Cc: Jonathan.Cameron, helgaas, corbet, davem, edumazet, kuba, pabeni,
alex.williamson, gospo, michael.chan, ajit.khaparde,
somnath.kotur, andrew.gospodarek, manoj.panicker2,
Eric.VanTassell, vadim.fedorenko, horms, bagasdotme, bhelgaas,
lukas, paul.e.luse, jing2.liu
On 8/22/24 21:41, Wei Huang wrote:
> Allow drivers to enable TPH support using a specific ST mode. It checks
> whether the mode is actually supported by the device before enabling.
> Additionally determines what types of requests, TPH (8-bit) or Extended
> TPH (16-bit), can be issued by the device based on the device's TPH
> Requester capability and its Root Port's Completer capability.
>
> Co-developed-by: Eric Van Tassell <Eric.VanTassell@amd.com>
> Signed-off-by: Eric Van Tassell <Eric.VanTassell@amd.com>
> Signed-off-by: Wei Huang <wei.huang2@amd.com>
> Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
> Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
> Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
> ---
> drivers/pci/pcie/tph.c | 92 +++++++++++++++++++++++++++++++++++++++++
> include/linux/pci-tph.h | 3 ++
> include/linux/pci.h | 3 ++
> 3 files changed, 98 insertions(+)
>
> diff --git a/drivers/pci/pcie/tph.c b/drivers/pci/pcie/tph.c
> index a28dced3097d..14ad8c5e895c 100644
> --- a/drivers/pci/pcie/tph.c
> +++ b/drivers/pci/pcie/tph.c
> @@ -7,6 +7,7 @@
> * Wei Huang <wei.huang2@amd.com>
> */
> #include <linux/pci.h>
> +#include <linux/bitfield.h>
> #include <linux/pci-tph.h>
>
> #include "../pci.h"
> @@ -21,6 +22,97 @@ static u8 get_st_modes(struct pci_dev *pdev)
> return reg;
> }
>
> +/* Return device's Root Port completer capability */
> +static u8 get_rp_completer_type(struct pci_dev *pdev)
> +{
> + struct pci_dev *rp;
> + u32 reg;
> + int ret;
> +
> + rp = pcie_find_root_port(pdev);
> + if (!rp)
> + return 0;
> +
> + ret = pcie_capability_read_dword(rp, PCI_EXP_DEVCAP2, ®);
> + if (ret)
> + return 0;
> +
> + return FIELD_GET(PCI_EXP_DEVCAP2_TPH_COMP_MASK, reg);
> +}
> +
> +/**
> + * pcie_enable_tph - Enable TPH support for device using a specific ST mode
> + * @pdev: PCI device
> + * @mode: ST mode to enable, as returned by pcie_tph_modes()
> + *
> + * Checks whether the mode is actually supported by the device before enabling
> + * and returns an error if not. Additionally determines what types of requests,
> + * TPH or extended TPH, can be issued by the device based on its TPH requester
> + * capability and the Root Port's completer capability.
> + *
> + * Return: 0 on success, otherwise negative value (-errno)
> + */
> +int pcie_enable_tph(struct pci_dev *pdev, int mode)
> +{
> + u32 reg;
> + u8 dev_modes;
> + u8 rp_req_type;
> +
> + if (!pdev->tph_cap)
> + return -EINVAL;
> +
> + if (pdev->tph_enabled)
> + return -EBUSY;
> +
> + /* Check ST mode comptability */
> + dev_modes = get_st_modes(pdev);
> + if (!(mode & dev_modes))
> + return -EINVAL;
> +
> + /* Select a supported mode */
> + switch (mode) {
> + case PCI_TPH_CAP_INT_VEC:
> + pdev->tph_mode = PCI_TPH_INT_VEC_MODE;
> + break;
> + case PCI_TPH_CAP_DEV_SPEC:
> + pdev->tph_mode = PCI_TPH_DEV_SPEC_MODE;
> + break;
> + case PCI_TPH_CAP_NO_ST:
> + pdev->tph_mode = PCI_TPH_NO_ST_MODE;
> + break;
> + default:
> + return -EINVAL;
> + }
> +
> + /* Get req_type supported by device and its Root Port */
> + reg = pci_read_config_dword(pdev, pdev->tph_cap + PCI_TPH_CAP, ®);
> + if (FIELD_GET(PCI_TPH_CAP_EXT_TPH, reg))
> + pdev->tph_req_type = PCI_TPH_REQ_EXT_TPH;
> + else
> + pdev->tph_req_type = PCI_TPH_REQ_TPH_ONLY;
> +
> + rp_req_type = get_rp_completer_type(pdev);
> +
> + /* Final req_type is the smallest value of two */
> + pdev->tph_req_type = min(pdev->tph_req_type, rp_req_type);
> +
> + /* Write them into TPH control register */
> + pci_read_config_dword(pdev, pdev->tph_cap + PCI_TPH_CTRL, ®);
No error checked. Same with below pci_write_config_dword.
> +
> + reg &= ~PCI_TPH_CTRL_MODE_SEL_MASK;
> + reg |= FIELD_PREP(PCI_TPH_CTRL_MODE_SEL_MASK, pdev->tph_mode);
> +
> + reg &= ~PCI_TPH_CTRL_REQ_EN_MASK;
> + reg |= FIELD_PREP(PCI_TPH_CTRL_REQ_EN_MASK, pdev->tph_req_type);
> +
> + pci_write_config_dword(pdev, pdev->tph_cap + PCI_TPH_CTRL, reg);
> +
> + pdev->tph_enabled = 1;
> +
> + return 0;
> +}
> +EXPORT_SYMBOL(pcie_enable_tph);
> +
> /**
> * pcie_tph_modes - Get the ST modes supported by device
> * @pdev: PCI device
> diff --git a/include/linux/pci-tph.h b/include/linux/pci-tph.h
> index fa378afe9c7e..cdf561076484 100644
> --- a/include/linux/pci-tph.h
> +++ b/include/linux/pci-tph.h
> @@ -10,8 +10,11 @@
> #define LINUX_PCI_TPH_H
>
> #ifdef CONFIG_PCIE_TPH
> +int pcie_enable_tph(struct pci_dev *pdev, int mode);
> int pcie_tph_modes(struct pci_dev *pdev);
> #else
> +static inline int pcie_enable_tph(struct pci_dev *pdev, int mode)
> +{ return -EINVAL; }
> static inline int pcie_tph_modes(struct pci_dev *pdev) { return 0; }
> #endif
>
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index c59e7ecab491..6f05deb6a0bf 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -433,6 +433,7 @@ struct pci_dev {
> unsigned int ats_enabled:1; /* Address Translation Svc */
> unsigned int pasid_enabled:1; /* Process Address Space ID */
> unsigned int pri_enabled:1; /* Page Request Interface */
> + unsigned int tph_enabled:1; /* TLP Processing Hints */
> unsigned int is_managed:1; /* Managed via devres */
> unsigned int is_msi_managed:1; /* MSI release via devres installed */
> unsigned int needs_freset:1; /* Requires fundamental reset */
> @@ -533,6 +534,8 @@ struct pci_dev {
>
> #ifdef CONFIG_PCIE_TPH
> u16 tph_cap; /* TPH capability offset */
> + u8 tph_mode; /* TPH mode */
> + u8 tph_req_type; /* TPH requester type */
> #endif
> };
>
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH V4 08/12] PCI/TPH: Add pcie_tph_get_cpu_st() to get ST tag
2024-08-22 20:41 ` [PATCH V4 08/12] PCI/TPH: Add pcie_tph_get_cpu_st() to get " Wei Huang
@ 2024-09-14 10:10 ` Alejandro Lucero Palau
2024-09-16 18:58 ` Wei Huang
0 siblings, 1 reply; 43+ messages in thread
From: Alejandro Lucero Palau @ 2024-09-14 10:10 UTC (permalink / raw)
To: Wei Huang, linux-pci, linux-kernel, linux-doc, netdev
Cc: Jonathan.Cameron, helgaas, corbet, davem, edumazet, kuba, pabeni,
alex.williamson, gospo, michael.chan, ajit.khaparde,
somnath.kotur, andrew.gospodarek, manoj.panicker2,
Eric.VanTassell, vadim.fedorenko, horms, bagasdotme, bhelgaas,
lukas, paul.e.luse, jing2.liu
On 8/22/24 21:41, Wei Huang wrote:
> Allow a caller to retrieve Steering Tags for a target memory that is
> associated with a specific CPU. The caller must provided two parameters,
> memory type and CPU UID, when calling this function. The tag is
> retrieved by invoking ACPI _DSM of the device's Root Port device.
>
> Co-developed-by: Eric Van Tassell <Eric.VanTassell@amd.com>
> Signed-off-by: Eric Van Tassell <Eric.VanTassell@amd.com>
> Signed-off-by: Wei Huang <wei.huang2@amd.com>
> Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
> Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
> Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
> ---
> drivers/pci/pcie/tph.c | 154 ++++++++++++++++++++++++++++++++++++++++
> include/linux/pci-tph.h | 18 +++++
> 2 files changed, 172 insertions(+)
>
> diff --git a/drivers/pci/pcie/tph.c b/drivers/pci/pcie/tph.c
> index 82189361a2ee..5bd194fb425e 100644
> --- a/drivers/pci/pcie/tph.c
> +++ b/drivers/pci/pcie/tph.c
> @@ -7,12 +7,125 @@
> * Wei Huang <wei.huang2@amd.com>
> */
> #include <linux/pci.h>
> +#include <linux/pci-acpi.h>
> #include <linux/bitfield.h>
> #include <linux/msi.h>
> #include <linux/pci-tph.h>
>
> #include "../pci.h"
>
> +/*
> + * The st_info struct defines the Steering Tag (ST) info returned by the
> + * firmware _DSM method defined in the approved ECN for PCI Firmware Spec,
> + * available at https://members.pcisig.com/wg/PCI-SIG/document/15470.
> + *
> + * @vm_st_valid: 8-bit ST for volatile memory is valid
> + * @vm_xst_valid: 16-bit extended ST for volatile memory is valid
> + * @vm_ph_ignore: 1 => PH was and will be ignored, 0 => PH should be supplied
> + * @vm_st: 8-bit ST for volatile mem
> + * @vm_xst: 16-bit extended ST for volatile mem
> + * @pm_st_valid: 8-bit ST for persistent memory is valid
> + * @pm_xst_valid: 16-bit extended ST for persistent memory is valid
> + * @pm_ph_ignore: 1 => PH was and will be ignored, 0 => PH should be supplied
> + * @pm_st: 8-bit ST for persistent mem
> + * @pm_xst: 16-bit extended ST for persistent mem
> + */
> +union st_info {
> + struct {
> + u64 vm_st_valid : 1;
> + u64 vm_xst_valid : 1;
> + u64 vm_ph_ignore : 1;
> + u64 rsvd1 : 5;
> + u64 vm_st : 8;
> + u64 vm_xst : 16;
> + u64 pm_st_valid : 1;
> + u64 pm_xst_valid : 1;
> + u64 pm_ph_ignore : 1;
> + u64 rsvd2 : 5;
> + u64 pm_st : 8;
> + u64 pm_xst : 16;
> + };
> + u64 value;
> +};
> +
> +static u16 tph_extract_tag(enum tph_mem_type mem_type, u8 req_type,
> + union st_info *info)
> +{
> + switch (req_type) {
> + case PCI_TPH_REQ_TPH_ONLY: /* 8-bit tag */
> + switch (mem_type) {
> + case TPH_MEM_TYPE_VM:
> + if (info->vm_st_valid)
> + return info->vm_st;
> + break;
> + case TPH_MEM_TYPE_PM:
> + if (info->pm_st_valid)
> + return info->pm_st;
> + break;
> + }
> + break;
> + case PCI_TPH_REQ_EXT_TPH: /* 16-bit tag */
> + switch (mem_type) {
> + case TPH_MEM_TYPE_VM:
> + if (info->vm_xst_valid)
> + return info->vm_xst;
> + break;
> + case TPH_MEM_TYPE_PM:
> + if (info->pm_xst_valid)
> + return info->pm_xst;
> + break;
> + }
> + break;
> + default:
> + return 0;
> + }
> +
> + return 0;
> +}
> +
> +#define TPH_ST_DSM_FUNC_INDEX 0xF
Where is this coming from? Any spec to refer to?
> +static acpi_status tph_invoke_dsm(acpi_handle handle, u32 cpu_uid,
> + union st_info *st_out)
> +{
> + union acpi_object arg3[3], in_obj, *out_obj;
> +
> + if (!acpi_check_dsm(handle, &pci_acpi_dsm_guid, 7,
Again, what is this revision 7 based on? Specs?
I'm trying to use this patchset and this call fails. I've tried to use
lower revision numbers with no success.
FWIW, I got no DSM function at all. This could be a problem with my BIOS
or system, but in any case, this should be clearer specified in the code.
> + BIT(TPH_ST_DSM_FUNC_INDEX)))
> + return AE_ERROR;
> +
> + /* DWORD: feature ID (0 for processor cache ST query) */
> + arg3[0].integer.type = ACPI_TYPE_INTEGER;
> + arg3[0].integer.value = 0;
> +
> + /* DWORD: target UID */
> + arg3[1].integer.type = ACPI_TYPE_INTEGER;
> + arg3[1].integer.value = cpu_uid;
> +
> + /* QWORD: properties, all 0's */
> + arg3[2].integer.type = ACPI_TYPE_INTEGER;
> + arg3[2].integer.value = 0;
> +
> + in_obj.type = ACPI_TYPE_PACKAGE;
> + in_obj.package.count = ARRAY_SIZE(arg3);
> + in_obj.package.elements = arg3;
> +
> + out_obj = acpi_evaluate_dsm(handle, &pci_acpi_dsm_guid, 7,
> + TPH_ST_DSM_FUNC_INDEX, &in_obj);
> + if (!out_obj)
> + return AE_ERROR;
> +
> + if (out_obj->type != ACPI_TYPE_BUFFER) {
> + ACPI_FREE(out_obj);
> + return AE_ERROR;
> + }
> +
> + st_out->value = *((u64 *)(out_obj->buffer.pointer));
> +
> + ACPI_FREE(out_obj);
> +
> + return AE_OK;
> +}
> +
> /* Update the TPH Requester Enable field of TPH Control Register */
> static void set_ctrl_reg_req_en(struct pci_dev *pdev, u8 req_type)
> {
> @@ -140,6 +253,47 @@ static int write_tag_to_st_table(struct pci_dev *pdev, int index, u16 tag)
> return pci_write_config_word(pdev, offset, tag);
> }
>
> +/**
> + * pcie_tph_get_cpu_st() - Retrieve Steering Tag for a target memory associated
> + * with a specific CPU
> + * @pdev: PCI device
> + * @mem_type: target memory type (volatile or persistent RAM)
> + * @cpu_uid: associated CPU id
> + * @tag: Steering Tag to be returned
> + *
> + * This function returns the Steering Tag for a target memory that is
> + * associated with a specific CPU as indicated by cpu_uid.
> + *
> + * Returns 0 if success, otherwise negative value (-errno)
> + */
> +int pcie_tph_get_cpu_st(struct pci_dev *pdev, enum tph_mem_type mem_type,
> + unsigned int cpu_uid, u16 *tag)
> +{
> + struct pci_dev *rp;
> + acpi_handle rp_acpi_handle;
> + union st_info info;
> +
> + rp = pcie_find_root_port(pdev);
> + if (!rp || !rp->bus || !rp->bus->bridge)
> + return -ENODEV;
> +
> + rp_acpi_handle = ACPI_HANDLE(rp->bus->bridge);
> +
> + if (tph_invoke_dsm(rp_acpi_handle, cpu_uid, &info) != AE_OK) {
> + *tag = 0;
> + return -EINVAL;
> + }
> +
> + *tag = tph_extract_tag(mem_type, pdev->tph_req_type, &info);
> +
> + pci_dbg(pdev, "get steering tag: mem_type=%s, cpu_uid=%d, tag=%#04x\n",
> + (mem_type == TPH_MEM_TYPE_VM) ? "volatile" : "persistent",
> + cpu_uid, *tag);
> +
> + return 0;
> +}
> +EXPORT_SYMBOL(pcie_tph_get_cpu_st);
> +
> /**
> * pcie_tph_set_st_entry() - Set Steering Tag in the ST table entry
> * @pdev: PCI device
> diff --git a/include/linux/pci-tph.h b/include/linux/pci-tph.h
> index a0c93b97090a..c9f33688b9a9 100644
> --- a/include/linux/pci-tph.h
> +++ b/include/linux/pci-tph.h
> @@ -9,9 +9,23 @@
> #ifndef LINUX_PCI_TPH_H
> #define LINUX_PCI_TPH_H
>
> +/*
> + * According to the ECN for PCI Firmware Spec, Steering Tag can be different
> + * depending on the memory type: Volatile Memory or Persistent Memory. When a
> + * caller query about a target's Steering Tag, it must provide the target's
> + * tph_mem_type. ECN link: https://members.pcisig.com/wg/PCI-SIG/document/15470.
> + */
> +enum tph_mem_type {
> + TPH_MEM_TYPE_VM, /* volatile memory */
> + TPH_MEM_TYPE_PM /* persistent memory */
> +};
> +
> #ifdef CONFIG_PCIE_TPH
> int pcie_tph_set_st_entry(struct pci_dev *pdev,
> unsigned int index, u16 tag);
> +int pcie_tph_get_cpu_st(struct pci_dev *dev,
> + enum tph_mem_type mem_type,
> + unsigned int cpu_uid, u16 *tag);
> bool pcie_tph_enabled(struct pci_dev *pdev);
> void pcie_disable_tph(struct pci_dev *pdev);
> int pcie_enable_tph(struct pci_dev *pdev, int mode);
> @@ -20,6 +34,10 @@ int pcie_tph_modes(struct pci_dev *pdev);
> static inline int pcie_tph_set_st_entry(struct pci_dev *pdev,
> unsigned int index, u16 tag)
> { return -EINVAL; }
> +static inline int pcie_tph_get_cpu_st(struct pci_dev *dev,
> + enum tph_mem_type mem_type,
> + unsigned int cpu_uid, u16 *tag)
> +{ return -EINVAL; }
> static inline bool pcie_tph_enabled(struct pci_dev *pdev) { return false; }
> static inline void pcie_disable_tph(struct pci_dev *pdev) { }
> static inline int pcie_enable_tph(struct pci_dev *pdev, int mode)
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH V4 11/12] bnxt_en: Add TPH support in BNXT driver
2024-09-11 15:37 ` Alejandro Lucero Palau
@ 2024-09-16 18:55 ` Wei Huang
2024-09-18 17:31 ` Alejandro Lucero Palau
0 siblings, 1 reply; 43+ messages in thread
From: Wei Huang @ 2024-09-16 18:55 UTC (permalink / raw)
To: Alejandro Lucero Palau, linux-pci, linux-kernel, linux-doc,
netdev
Cc: Jonathan.Cameron, helgaas, corbet, davem, edumazet, kuba, pabeni,
alex.williamson, gospo, michael.chan, ajit.khaparde,
somnath.kotur, andrew.gospodarek, manoj.panicker2,
Eric.VanTassell, vadim.fedorenko, horms, bagasdotme, bhelgaas,
lukas, paul.e.luse, jing2.liu
On 9/11/24 10:37 AM, Alejandro Lucero Palau wrote:
>
> On 8/22/24 21:41, Wei Huang wrote:
>> From: Manoj Panicker <manoj.panicker2@amd.com>
>>
>> Implement TPH support in Broadcom BNXT device driver. The driver uses
>> TPH functions to retrieve and configure the device's Steering Tags when
>> its interrupt affinity is being changed.
>>
>> Co-developed-by: Wei Huang <wei.huang2@amd.com>
>> Signed-off-by: Wei Huang <wei.huang2@amd.com>
>> Signed-off-by: Manoj Panicker <manoj.panicker2@amd.com>
>> Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
>> Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
>> Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
>> ---
>> drivers/net/ethernet/broadcom/bnxt/bnxt.c | 78 +++++++++++++++++++++++
>> drivers/net/ethernet/broadcom/bnxt/bnxt.h | 4 ++
>> 2 files changed, 82 insertions(+)
>>
>> diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
>> index ffa74c26ee53..5903cd36b54d 100644
>> --- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
>> +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
>> @@ -55,6 +55,7 @@
>> #include <net/page_pool/helpers.h>
>> #include <linux/align.h>
>> #include <net/netdev_queues.h>
>> +#include <linux/pci-tph.h>
>>
>> #include "bnxt_hsi.h"
>> #include "bnxt.h"
>> @@ -10821,6 +10822,58 @@ int bnxt_reserve_rings(struct bnxt *bp, bool irq_re_init)
>> return 0;
>> }
>>
>> +static void __bnxt_irq_affinity_notify(struct irq_affinity_notify *notify,
>> + const cpumask_t *mask)
>> +{
>> + struct bnxt_irq *irq;
>> + u16 tag;
>> +
>> + irq = container_of(notify, struct bnxt_irq, affinity_notify);
>> + cpumask_copy(irq->cpu_mask, mask);
>> +
>> + if (pcie_tph_get_cpu_st(irq->bp->pdev, TPH_MEM_TYPE_VM,
>> + cpumask_first(irq->cpu_mask), &tag))
>
>
> I understand just one cpu from the mask has to be used, but I wonder if
> some check should be done for ensuring the mask is not mad.
>
> This is control path and the related queue is going to be restarted, so
> maybe a sanity check for ensuring all the cpus in the mask are from the
> same CCX complex?
I don't think this is always true and we shouldn't warn when this
happens. There is only one ST can be supported, so the driver need to
make a good judgement on which ST to be used. But no matter what, ST is
just a hint - it shouldn't cause any correctness issues in HW, even when
it is not the optimal target CPU. So warning is unnecessary.
>
> That would be an iteration checking the tag is the same one for all of
> them. If not, at least a warning stating the tag/CCX/cpu used.
>
>
>> + return;
>> +
>> + if (pcie_tph_set_st_entry(irq->bp->pdev, irq->msix_nr, tag))
>> + return;
>> +
>> + if (netif_running(irq->bp->dev)) {
>> + rtnl_lock();
>> + bnxt_close_nic(irq->bp, false, false);
>> + bnxt_open_nic(irq->bp, false, false);
>> + rtnl_unlock();
>> + }
>> +}
>> +
>> +static void __bnxt_irq_affinity_release(struct kref __always_unused *ref)
>> +{
>> +}
>> +
>> +static void bnxt_release_irq_notifier(struct bnxt_irq *irq)
>> +{
>> + irq_set_affinity_notifier(irq->vector, NULL);
>> +}
>> +
>> +static void bnxt_register_irq_notifier(struct bnxt *bp, struct bnxt_irq *irq)
>> +{
>> + struct irq_affinity_notify *notify;
>> +
>> + /* Nothing to do if TPH is not enabled */
>> + if (!pcie_tph_enabled(bp->pdev))
>> + return;
>> +
>> + irq->bp = bp;
>> +
>> + /* Register IRQ affinility notifier */
>> + notify = &irq->affinity_notify;
>> + notify->irq = irq->vector;
>> + notify->notify = __bnxt_irq_affinity_notify;
>> + notify->release = __bnxt_irq_affinity_release;
>> +
>> + irq_set_affinity_notifier(irq->vector, notify);
>> +}
>> +
>> static void bnxt_free_irq(struct bnxt *bp)
>> {
>> struct bnxt_irq *irq;
>> @@ -10843,11 +10896,17 @@ static void bnxt_free_irq(struct bnxt *bp)
>> free_cpumask_var(irq->cpu_mask);
>> irq->have_cpumask = 0;
>> }
>> +
>> + bnxt_release_irq_notifier(irq);
>> +
>> free_irq(irq->vector, bp->bnapi[i]);
>> }
>>
>> irq->requested = 0;
>> }
>> +
>> + /* Disable TPH support */
>> + pcie_disable_tph(bp->pdev);
>> }
>>
>> static int bnxt_request_irq(struct bnxt *bp)
>> @@ -10870,6 +10929,13 @@ static int bnxt_request_irq(struct bnxt *bp)
>> if (!(bp->flags & BNXT_FLAG_USING_MSIX))
>> flags = IRQF_SHARED;
>>
>> + /* Enable TPH support as part of IRQ request */
>> + if (pcie_tph_modes(bp->pdev) & PCI_TPH_CAP_INT_VEC) {
>> + rc = pcie_enable_tph(bp->pdev, PCI_TPH_CAP_INT_VEC);
>> + if (rc)
>> + netdev_warn(bp->dev, "failed enabling TPH support\n");
>> + }
>> +
>> for (i = 0, j = 0; i < bp->cp_nr_rings; i++) {
>> int map_idx = bnxt_cp_num_to_irq_num(bp, i);
>> struct bnxt_irq *irq = &bp->irq_tbl[map_idx];
>> @@ -10893,8 +10959,10 @@ static int bnxt_request_irq(struct bnxt *bp)
>>
>> if (zalloc_cpumask_var(&irq->cpu_mask, GFP_KERNEL)) {
>> int numa_node = dev_to_node(&bp->pdev->dev);
>> + u16 tag;
>>
>> irq->have_cpumask = 1;
>> + irq->msix_nr = map_idx;
>> cpumask_set_cpu(cpumask_local_spread(i, numa_node),
>> irq->cpu_mask);
>> rc = irq_set_affinity_hint(irq->vector, irq->cpu_mask);
>> @@ -10904,6 +10972,16 @@ static int bnxt_request_irq(struct bnxt *bp)
>> irq->vector);
>> break;
>> }
>> +
>> + bnxt_register_irq_notifier(bp, irq);
>> +
>> + /* Init ST table entry */
>> + if (pcie_tph_get_cpu_st(irq->bp->pdev, TPH_MEM_TYPE_VM,
>> + cpumask_first(irq->cpu_mask),
>> + &tag))
>> + break;
>> +
>> + pcie_tph_set_st_entry(irq->bp->pdev, irq->msix_nr, tag);
>> }
>> }
>> return rc;
>> diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.h b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
>> index 6bbdc718c3a7..ae1abcc1bddf 100644
>> --- a/drivers/net/ethernet/broadcom/bnxt/bnxt.h
>> +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
>> @@ -1224,6 +1224,10 @@ struct bnxt_irq {
>> u8 have_cpumask:1;
>> char name[IFNAMSIZ + 2];
>> cpumask_var_t cpu_mask;
>> +
>> + struct bnxt *bp;
>> + int msix_nr;
>> + struct irq_affinity_notify affinity_notify;
>> };
>>
>> #define HWRM_RING_ALLOC_TX 0x1
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH V4 08/12] PCI/TPH: Add pcie_tph_get_cpu_st() to get ST tag
2024-09-14 10:10 ` Alejandro Lucero Palau
@ 2024-09-16 18:58 ` Wei Huang
0 siblings, 0 replies; 43+ messages in thread
From: Wei Huang @ 2024-09-16 18:58 UTC (permalink / raw)
To: Alejandro Lucero Palau, linux-pci, linux-kernel, linux-doc,
netdev
Cc: Jonathan.Cameron, helgaas, corbet, davem, edumazet, kuba, pabeni,
alex.williamson, gospo, michael.chan, ajit.khaparde,
somnath.kotur, andrew.gospodarek, manoj.panicker2,
Eric.VanTassell, vadim.fedorenko, horms, bagasdotme, bhelgaas,
lukas, paul.e.luse, jing2.liu
On 9/14/24 5:10 AM, Alejandro Lucero Palau wrote:
>
> On 8/22/24 21:41, Wei Huang wrote:
>> Allow a caller to retrieve Steering Tags for a target memory that is
>> associated with a specific CPU. The caller must provided two parameters,
>> memory type and CPU UID, when calling this function. The tag is
>> retrieved by invoking ACPI _DSM of the device's Root Port device.
>>
>> Co-developed-by: Eric Van Tassell <Eric.VanTassell@amd.com>
>> Signed-off-by: Eric Van Tassell <Eric.VanTassell@amd.com>
>> Signed-off-by: Wei Huang <wei.huang2@amd.com>
>> Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
>> Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
>> Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
>> ---
>> drivers/pci/pcie/tph.c | 154 ++++++++++++++++++++++++++++++++++++++++
>> include/linux/pci-tph.h | 18 +++++
>> 2 files changed, 172 insertions(+)
>>
>> diff --git a/drivers/pci/pcie/tph.c b/drivers/pci/pcie/tph.c
>> index 82189361a2ee..5bd194fb425e 100644
>> --- a/drivers/pci/pcie/tph.c
>> +++ b/drivers/pci/pcie/tph.c
>> @@ -7,12 +7,125 @@
>> * Wei Huang <wei.huang2@amd.com>
>> */
>> #include <linux/pci.h>
>> +#include <linux/pci-acpi.h>
>> #include <linux/bitfield.h>
>> #include <linux/msi.h>
>> #include <linux/pci-tph.h>
>>
>> #include "../pci.h"
>>
>> +/*
>> + * The st_info struct defines the Steering Tag (ST) info returned by the
>> + * firmware _DSM method defined in the approved ECN for PCI Firmware Spec,
>> + * available at https://members.pcisig.com/wg/PCI-SIG/document/15470.
>> + *
>> + * @vm_st_valid: 8-bit ST for volatile memory is valid
>> + * @vm_xst_valid: 16-bit extended ST for volatile memory is valid
>> + * @vm_ph_ignore: 1 => PH was and will be ignored, 0 => PH should be supplied
>> + * @vm_st: 8-bit ST for volatile mem
>> + * @vm_xst: 16-bit extended ST for volatile mem
>> + * @pm_st_valid: 8-bit ST for persistent memory is valid
>> + * @pm_xst_valid: 16-bit extended ST for persistent memory is valid
>> + * @pm_ph_ignore: 1 => PH was and will be ignored, 0 => PH should be supplied
>> + * @pm_st: 8-bit ST for persistent mem
>> + * @pm_xst: 16-bit extended ST for persistent mem
>> + */
>> +union st_info {
>> + struct {
>> + u64 vm_st_valid : 1;
>> + u64 vm_xst_valid : 1;
>> + u64 vm_ph_ignore : 1;
>> + u64 rsvd1 : 5;
>> + u64 vm_st : 8;
>> + u64 vm_xst : 16;
>> + u64 pm_st_valid : 1;
>> + u64 pm_xst_valid : 1;
>> + u64 pm_ph_ignore : 1;
>> + u64 rsvd2 : 5;
>> + u64 pm_st : 8;
>> + u64 pm_xst : 16;
>> + };
>> + u64 value;
>> +};
>> +
>> +static u16 tph_extract_tag(enum tph_mem_type mem_type, u8 req_type,
>> + union st_info *info)
>> +{
>> + switch (req_type) {
>> + case PCI_TPH_REQ_TPH_ONLY: /* 8-bit tag */
>> + switch (mem_type) {
>> + case TPH_MEM_TYPE_VM:
>> + if (info->vm_st_valid)
>> + return info->vm_st;
>> + break;
>> + case TPH_MEM_TYPE_PM:
>> + if (info->pm_st_valid)
>> + return info->pm_st;
>> + break;
>> + }
>> + break;
>> + case PCI_TPH_REQ_EXT_TPH: /* 16-bit tag */
>> + switch (mem_type) {
>> + case TPH_MEM_TYPE_VM:
>> + if (info->vm_xst_valid)
>> + return info->vm_xst;
>> + break;
>> + case TPH_MEM_TYPE_PM:
>> + if (info->pm_xst_valid)
>> + return info->pm_xst;
>> + break;
>> + }
>> + break;
>> + default:
>> + return 0;
>> + }
>> +
>> + return 0;
>> +}
>> +
>> +#define TPH_ST_DSM_FUNC_INDEX 0xF
>
>
> Where is this coming from? Any spec to refer to?
See comment above: https://members.pcisig.com/wg/PCI-SIG/document/15470
>
>
>> +static acpi_status tph_invoke_dsm(acpi_handle handle, u32 cpu_uid,
>> + union st_info *st_out)
>> +{
>> + union acpi_object arg3[3], in_obj, *out_obj;
>> +
>> + if (!acpi_check_dsm(handle, &pci_acpi_dsm_guid, 7,
>
>
> Again, what is this revision 7 based on? Specs?
https://members.pcisig.com/wg/PCI-SIG/document/15470
>
> I'm trying to use this patchset and this call fails. I've tried to use
> lower revision numbers with no success.
>
> FWIW, I got no DSM function at all. This could be a problem with my BIOS
> or system, but in any case, this should be clearer specified in the code.
Please make sure both BIOS is the latest version and BIOS option is
turned on. You should be able to find the GUID defined in PCIe ECN after
dumping ACPI table.
>
>
>> + BIT(TPH_ST_DSM_FUNC_INDEX)))
>> + return AE_ERROR;
>> +
>> + /* DWORD: feature ID (0 for processor cache ST query) */
>> + arg3[0].integer.type = ACPI_TYPE_INTEGER;
>> + arg3[0].integer.value = 0;
>> +
>> + /* DWORD: target UID */
>> + arg3[1].integer.type = ACPI_TYPE_INTEGER;
>> + arg3[1].integer.value = cpu_uid;
>> +
>> + /* QWORD: properties, all 0's */
>> + arg3[2].integer.type = ACPI_TYPE_INTEGER;
>> + arg3[2].integer.value = 0;
>> +
>> + in_obj.type = ACPI_TYPE_PACKAGE;
>> + in_obj.package.count = ARRAY_SIZE(arg3);
>> + in_obj.package.elements = arg3;
>> +
>> + out_obj = acpi_evaluate_dsm(handle, &pci_acpi_dsm_guid, 7,
>> + TPH_ST_DSM_FUNC_INDEX, &in_obj);
>> + if (!out_obj)
>> + return AE_ERROR;
>> +
>> + if (out_obj->type != ACPI_TYPE_BUFFER) {
>> + ACPI_FREE(out_obj);
>> + return AE_ERROR;
>> + }
>> +
>> + st_out->value = *((u64 *)(out_obj->buffer.pointer));
>> +
>> + ACPI_FREE(out_obj);
>> +
>> + return AE_OK;
>> +}
>> +
>> /* Update the TPH Requester Enable field of TPH Control Register */
>> static void set_ctrl_reg_req_en(struct pci_dev *pdev, u8 req_type)
>> {
>> @@ -140,6 +253,47 @@ static int write_tag_to_st_table(struct pci_dev *pdev, int index, u16 tag)
>> return pci_write_config_word(pdev, offset, tag);
>> }
>>
>> +/**
>> + * pcie_tph_get_cpu_st() - Retrieve Steering Tag for a target memory associated
>> + * with a specific CPU
>> + * @pdev: PCI device
>> + * @mem_type: target memory type (volatile or persistent RAM)
>> + * @cpu_uid: associated CPU id
>> + * @tag: Steering Tag to be returned
>> + *
>> + * This function returns the Steering Tag for a target memory that is
>> + * associated with a specific CPU as indicated by cpu_uid.
>> + *
>> + * Returns 0 if success, otherwise negative value (-errno)
>> + */
>> +int pcie_tph_get_cpu_st(struct pci_dev *pdev, enum tph_mem_type mem_type,
>> + unsigned int cpu_uid, u16 *tag)
>> +{
>> + struct pci_dev *rp;
>> + acpi_handle rp_acpi_handle;
>> + union st_info info;
>> +
>> + rp = pcie_find_root_port(pdev);
>> + if (!rp || !rp->bus || !rp->bus->bridge)
>> + return -ENODEV;
>> +
>> + rp_acpi_handle = ACPI_HANDLE(rp->bus->bridge);
>> +
>> + if (tph_invoke_dsm(rp_acpi_handle, cpu_uid, &info) != AE_OK) {
>> + *tag = 0;
>> + return -EINVAL;
>> + }
>> +
>> + *tag = tph_extract_tag(mem_type, pdev->tph_req_type, &info);
>> +
>> + pci_dbg(pdev, "get steering tag: mem_type=%s, cpu_uid=%d, tag=%#04x\n",
>> + (mem_type == TPH_MEM_TYPE_VM) ? "volatile" : "persistent",
>> + cpu_uid, *tag);
>> +
>> + return 0;
>> +}
>> +EXPORT_SYMBOL(pcie_tph_get_cpu_st);
>> +
>> /**
>> * pcie_tph_set_st_entry() - Set Steering Tag in the ST table entry
>> * @pdev: PCI device
>> diff --git a/include/linux/pci-tph.h b/include/linux/pci-tph.h
>> index a0c93b97090a..c9f33688b9a9 100644
>> --- a/include/linux/pci-tph.h
>> +++ b/include/linux/pci-tph.h
>> @@ -9,9 +9,23 @@
>> #ifndef LINUX_PCI_TPH_H
>> #define LINUX_PCI_TPH_H
>>
>> +/*
>> + * According to the ECN for PCI Firmware Spec, Steering Tag can be different
>> + * depending on the memory type: Volatile Memory or Persistent Memory. When a
>> + * caller query about a target's Steering Tag, it must provide the target's
>> + * tph_mem_type. ECN link: https://members.pcisig.com/wg/PCI-SIG/document/15470.
>> + */
>> +enum tph_mem_type {
>> + TPH_MEM_TYPE_VM, /* volatile memory */
>> + TPH_MEM_TYPE_PM /* persistent memory */
>> +};
>> +
>> #ifdef CONFIG_PCIE_TPH
>> int pcie_tph_set_st_entry(struct pci_dev *pdev,
>> unsigned int index, u16 tag);
>> +int pcie_tph_get_cpu_st(struct pci_dev *dev,
>> + enum tph_mem_type mem_type,
>> + unsigned int cpu_uid, u16 *tag);
>> bool pcie_tph_enabled(struct pci_dev *pdev);
>> void pcie_disable_tph(struct pci_dev *pdev);
>> int pcie_enable_tph(struct pci_dev *pdev, int mode);
>> @@ -20,6 +34,10 @@ int pcie_tph_modes(struct pci_dev *pdev);
>> static inline int pcie_tph_set_st_entry(struct pci_dev *pdev,
>> unsigned int index, u16 tag)
>> { return -EINVAL; }
>> +static inline int pcie_tph_get_cpu_st(struct pci_dev *dev,
>> + enum tph_mem_type mem_type,
>> + unsigned int cpu_uid, u16 *tag)
>> +{ return -EINVAL; }
>> static inline bool pcie_tph_enabled(struct pci_dev *pdev) { return false; }
>> static inline void pcie_disable_tph(struct pci_dev *pdev) { }
>> static inline int pcie_enable_tph(struct pci_dev *pdev, int mode)
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH V4 02/12] PCI: Add TPH related register definition
2024-09-05 16:41 ` Bjorn Helgaas
@ 2024-09-16 21:08 ` Wei Huang
0 siblings, 0 replies; 43+ messages in thread
From: Wei Huang @ 2024-09-16 21:08 UTC (permalink / raw)
To: Bjorn Helgaas
Cc: linux-pci, linux-kernel, linux-doc, netdev, Jonathan.Cameron,
corbet, davem, edumazet, kuba, pabeni, alex.williamson, gospo,
michael.chan, ajit.khaparde, somnath.kotur, andrew.gospodarek,
manoj.panicker2, Eric.VanTassell, vadim.fedorenko, horms,
bagasdotme, bhelgaas, lukas, paul.e.luse, jing2.liu
On 9/5/24 11:41 AM, Bjorn Helgaas wrote:
> On Thu, Sep 05, 2024 at 10:08:33AM -0500, Wei Huang wrote:
>> On 9/4/24 14:52, Bjorn Helgaas wrote:
>>> I think these modes should all include "ST" to clearly delineate
>>> Steering Tags from the Processing Hints. E.g.,
>>>
>>> PCI_TPH_CAP_ST_NO_ST or maybe PCI_TPH_CAP_ST_NONE
>>
>> Can I keep "NO_ST" instead of switching over to "ST_NONE"? First, it
>> matches with PCIe spec. Secondly, IMO "ST_NONE" implies no ST support at
>> all.
>
> Sure. Does PCI_TPH_CAP_ST_NO_ST work for you? That follows the same
> PCI_TPH_CAP_ST_* pattern as below.
So I tweaked a bit in V5 to make them look cleaner: instead of ST_NO_ST,
V5 has ST_NS. Similar pattern applies to other modes:
PCI_TPH_CAP_NO_ST => PCI_TPH_CAP_ST_NS
PCI_TPH_CAP_INT_VEC => PCI_TPH_CAP_ST_IV
PCI_TPH_CAP_DEV_SPEC => PCI_TPH_CAP_ST_DS
Ctrl register has similar changes.
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH V4 11/12] bnxt_en: Add TPH support in BNXT driver
2024-09-16 18:55 ` Wei Huang
@ 2024-09-18 17:31 ` Alejandro Lucero Palau
2024-09-19 16:14 ` Wei Huang
0 siblings, 1 reply; 43+ messages in thread
From: Alejandro Lucero Palau @ 2024-09-18 17:31 UTC (permalink / raw)
To: Wei Huang, linux-pci, linux-kernel, linux-doc, netdev
Cc: Jonathan.Cameron, helgaas, corbet, davem, edumazet, kuba, pabeni,
alex.williamson, gospo, michael.chan, ajit.khaparde,
somnath.kotur, andrew.gospodarek, manoj.panicker2,
Eric.VanTassell, vadim.fedorenko, horms, bagasdotme, bhelgaas,
lukas, paul.e.luse, jing2.liu
On 9/16/24 19:55, Wei Huang wrote:
>
>
> On 9/11/24 10:37 AM, Alejandro Lucero Palau wrote:
>>
>> On 8/22/24 21:41, Wei Huang wrote:
>>> From: Manoj Panicker <manoj.panicker2@amd.com>
>>>
>>> Implement TPH support in Broadcom BNXT device driver. The driver uses
>>> TPH functions to retrieve and configure the device's Steering Tags when
>>> its interrupt affinity is being changed.
>>>
>>> Co-developed-by: Wei Huang <wei.huang2@amd.com>
>>> Signed-off-by: Wei Huang <wei.huang2@amd.com>
>>> Signed-off-by: Manoj Panicker <manoj.panicker2@amd.com>
>>> Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
>>> Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
>>> Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
>>> ---
>>> drivers/net/ethernet/broadcom/bnxt/bnxt.c | 78
>>> +++++++++++++++++++++++
>>> drivers/net/ethernet/broadcom/bnxt/bnxt.h | 4 ++
>>> 2 files changed, 82 insertions(+)
>>>
>>> diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
>>> b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
>>> index ffa74c26ee53..5903cd36b54d 100644
>>> --- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
>>> +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
>>> @@ -55,6 +55,7 @@
>>> #include <net/page_pool/helpers.h>
>>> #include <linux/align.h>
>>> #include <net/netdev_queues.h>
>>> +#include <linux/pci-tph.h>
>>> #include "bnxt_hsi.h"
>>> #include "bnxt.h"
>>> @@ -10821,6 +10822,58 @@ int bnxt_reserve_rings(struct bnxt *bp,
>>> bool irq_re_init)
>>> return 0;
>>> }
>>> +static void __bnxt_irq_affinity_notify(struct
>>> irq_affinity_notify *notify,
>>> + const cpumask_t *mask)
>>> +{
>>> + struct bnxt_irq *irq;
>>> + u16 tag;
>>> +
>>> + irq = container_of(notify, struct bnxt_irq, affinity_notify);
>>> + cpumask_copy(irq->cpu_mask, mask);
>>> +
>>> + if (pcie_tph_get_cpu_st(irq->bp->pdev, TPH_MEM_TYPE_VM,
>>> + cpumask_first(irq->cpu_mask), &tag))
>>
>>
>> I understand just one cpu from the mask has to be used, but I wonder if
>> some check should be done for ensuring the mask is not mad.
>>
>> This is control path and the related queue is going to be restarted, so
>> maybe a sanity check for ensuring all the cpus in the mask are from the
>> same CCX complex?
>
> I don't think this is always true and we shouldn't warn when this
> happens. There is only one ST can be supported, so the driver need to
> make a good judgement on which ST to be used. But no matter what, ST
> is just a hint - it shouldn't cause any correctness issues in HW, even
> when it is not the optimal target CPU. So warning is unnecessary.
>
1) You can use a "mad" mask for avoiding a specific interrupt to disturb
a specific execution is those cores not part of the mask. But I argue
the ST hint should not be set then.
2) Someone, maybe an automatic script, could try to get the best
performance possible, and a "mad" mask could preclude such outcome
inadvertently.
I agree a warning could not be a good idea because 1, but I would say
adding some way of traceability here could be interesting. A tracepoint
or a new ST field for last hint set for that interrupt/queue.
>>
>> That would be an iteration checking the tag is the same one for all of
>> them. If not, at least a warning stating the tag/CCX/cpu used.
>>
>>
>>> + return;
>>> +
>>> + if (pcie_tph_set_st_entry(irq->bp->pdev, irq->msix_nr, tag))
>>> + return;
>>> +
>>> + if (netif_running(irq->bp->dev)) {
>>> + rtnl_lock();
>>> + bnxt_close_nic(irq->bp, false, false);
>>> + bnxt_open_nic(irq->bp, false, false);
>>> + rtnl_unlock();
>>> + }
>>> +}
>>> +
>>> +static void __bnxt_irq_affinity_release(struct kref __always_unused
>>> *ref)
>>> +{
>>> +}
>>> +
>>> +static void bnxt_release_irq_notifier(struct bnxt_irq *irq)
>>> +{
>>> + irq_set_affinity_notifier(irq->vector, NULL);
>>> +}
>>> +
>>> +static void bnxt_register_irq_notifier(struct bnxt *bp, struct
>>> bnxt_irq *irq)
>>> +{
>>> + struct irq_affinity_notify *notify;
>>> +
>>> + /* Nothing to do if TPH is not enabled */
>>> + if (!pcie_tph_enabled(bp->pdev))
>>> + return;
>>> +
>>> + irq->bp = bp;
>>> +
>>> + /* Register IRQ affinility notifier */
>>> + notify = &irq->affinity_notify;
>>> + notify->irq = irq->vector;
>>> + notify->notify = __bnxt_irq_affinity_notify;
>>> + notify->release = __bnxt_irq_affinity_release;
>>> +
>>> + irq_set_affinity_notifier(irq->vector, notify);
>>> +}
>>> +
>>> static void bnxt_free_irq(struct bnxt *bp)
>>> {
>>> struct bnxt_irq *irq;
>>> @@ -10843,11 +10896,17 @@ static void bnxt_free_irq(struct bnxt *bp)
>>> free_cpumask_var(irq->cpu_mask);
>>> irq->have_cpumask = 0;
>>> }
>>> +
>>> + bnxt_release_irq_notifier(irq);
>>> +
>>> free_irq(irq->vector, bp->bnapi[i]);
>>> }
>>> irq->requested = 0;
>>> }
>>> +
>>> + /* Disable TPH support */
>>> + pcie_disable_tph(bp->pdev);
>>> }
>>> static int bnxt_request_irq(struct bnxt *bp)
>>> @@ -10870,6 +10929,13 @@ static int bnxt_request_irq(struct bnxt *bp)
>>> if (!(bp->flags & BNXT_FLAG_USING_MSIX))
>>> flags = IRQF_SHARED;
>>> + /* Enable TPH support as part of IRQ request */
>>> + if (pcie_tph_modes(bp->pdev) & PCI_TPH_CAP_INT_VEC) {
>>> + rc = pcie_enable_tph(bp->pdev, PCI_TPH_CAP_INT_VEC);
>>> + if (rc)
>>> + netdev_warn(bp->dev, "failed enabling TPH support\n");
>>> + }
>>> +
>>> for (i = 0, j = 0; i < bp->cp_nr_rings; i++) {
>>> int map_idx = bnxt_cp_num_to_irq_num(bp, i);
>>> struct bnxt_irq *irq = &bp->irq_tbl[map_idx];
>>> @@ -10893,8 +10959,10 @@ static int bnxt_request_irq(struct bnxt *bp)
>>> if (zalloc_cpumask_var(&irq->cpu_mask, GFP_KERNEL)) {
>>> int numa_node = dev_to_node(&bp->pdev->dev);
>>> + u16 tag;
>>> irq->have_cpumask = 1;
>>> + irq->msix_nr = map_idx;
>>> cpumask_set_cpu(cpumask_local_spread(i, numa_node),
>>> irq->cpu_mask);
>>> rc = irq_set_affinity_hint(irq->vector, irq->cpu_mask);
>>> @@ -10904,6 +10972,16 @@ static int bnxt_request_irq(struct bnxt *bp)
>>> irq->vector);
>>> break;
>>> }
>>> +
>>> + bnxt_register_irq_notifier(bp, irq);
>>> +
>>> + /* Init ST table entry */
>>> + if (pcie_tph_get_cpu_st(irq->bp->pdev, TPH_MEM_TYPE_VM,
>>> + cpumask_first(irq->cpu_mask),
>>> + &tag))
>>> + break;
>>> +
>>> + pcie_tph_set_st_entry(irq->bp->pdev, irq->msix_nr, tag);
>>> }
>>> }
>>> return rc;
>>> diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.h
>>> b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
>>> index 6bbdc718c3a7..ae1abcc1bddf 100644
>>> --- a/drivers/net/ethernet/broadcom/bnxt/bnxt.h
>>> +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
>>> @@ -1224,6 +1224,10 @@ struct bnxt_irq {
>>> u8 have_cpumask:1;
>>> char name[IFNAMSIZ + 2];
>>> cpumask_var_t cpu_mask;
>>> +
>>> + struct bnxt *bp;
>>> + int msix_nr;
>>> + struct irq_affinity_notify affinity_notify;
>>> };
>>> #define HWRM_RING_ALLOC_TX 0x1
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH V4 11/12] bnxt_en: Add TPH support in BNXT driver
2024-09-18 17:31 ` Alejandro Lucero Palau
@ 2024-09-19 16:14 ` Wei Huang
0 siblings, 0 replies; 43+ messages in thread
From: Wei Huang @ 2024-09-19 16:14 UTC (permalink / raw)
To: Alejandro Lucero Palau, linux-pci, linux-kernel, linux-doc,
netdev
Cc: Jonathan.Cameron, helgaas, corbet, davem, edumazet, kuba, pabeni,
alex.williamson, gospo, michael.chan, ajit.khaparde,
somnath.kotur, andrew.gospodarek, manoj.panicker2,
Eric.VanTassell, vadim.fedorenko, horms, bagasdotme, bhelgaas,
lukas, paul.e.luse, jing2.liu
On 9/18/24 12:31, Alejandro Lucero Palau wrote:
>
> On 9/16/24 19:55, Wei Huang wrote:
>>
>>
>> On 9/11/24 10:37 AM, Alejandro Lucero Palau wrote:
>>>
...
>>>
>>> I understand just one cpu from the mask has to be used, but I wonder if
>>> some check should be done for ensuring the mask is not mad.
>>>
>>> This is control path and the related queue is going to be restarted, so
>>> maybe a sanity check for ensuring all the cpus in the mask are from the
>>> same CCX complex?
>>
>> I don't think this is always true and we shouldn't warn when this
>> happens. There is only one ST can be supported, so the driver need to
>> make a good judgement on which ST to be used. But no matter what, ST
>> is just a hint - it shouldn't cause any correctness issues in HW, even
>> when it is not the optimal target CPU. So warning is unnecessary.
>>
>
> 1) You can use a "mad" mask for avoiding a specific interrupt to disturb
> a specific execution is those cores not part of the mask. But I argue
> the ST hint should not be set then.
>
>
> 2) Someone, maybe an automatic script, could try to get the best
> performance possible, and a "mad" mask could preclude such outcome
> inadvertently.
>
For this case, you can use the following command:
echo cpu_id > /proc/irq/nnn/smp_affinity_list
where nnn is the MSI IRQ number associated witht the device. This forces
IRQ to be associated with only one specific CPU.
>
> I agree a warning could not be a good idea because 1, but I would say
> adding some way of traceability here could be interesting. A tracepoint
> or a new ST field for last hint set for that interrupt/queue.
We do have two pci_dbg() in tph.c. You can see the logs with proper
kernel print level. The logs show GET/SET ST values in what PCIe device,
which ST table, and at which index.
>
>
>>>
>>> That would be an iteration checking the tag is the same one for all of
>>> them. If not, at least a warning stating the tag/CCX/cpu used.
>>>
^ permalink raw reply [flat|nested] 43+ messages in thread
end of thread, other threads:[~2024-09-19 16:14 UTC | newest]
Thread overview: 43+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-08-22 20:41 [PATCH V4 00/12] PCIe TPH and cache direct injection support Wei Huang
2024-08-22 20:41 ` [PATCH V4 01/12] PCI: Introduce PCIe TPH support framework Wei Huang
2024-08-22 20:41 ` [PATCH V4 02/12] PCI: Add TPH related register definition Wei Huang
2024-09-04 19:52 ` Bjorn Helgaas
2024-09-05 15:08 ` Wei Huang
2024-09-05 16:41 ` Bjorn Helgaas
2024-09-16 21:08 ` Wei Huang
2024-08-22 20:41 ` [PATCH V4 03/12] PCI/TPH: Add pcie_tph_modes() to query TPH modes Wei Huang
2024-09-04 19:40 ` Bjorn Helgaas
2024-09-05 14:46 ` Wei Huang
2024-09-05 15:12 ` Bjorn Helgaas
2024-08-22 20:41 ` [PATCH V4 04/12] PCI/TPH: Add pcie_enable_tph() to enable TPH Wei Huang
2024-09-13 11:35 ` Alejandro Lucero Palau
2024-08-22 20:41 ` [PATCH V4 05/12] PCI/TPH: Add pcie_disable_tph() to disable TPH Wei Huang
2024-08-22 20:41 ` [PATCH V4 06/12] PCI/TPH: Add pcie_tph_enabled() to check TPH state Wei Huang
2024-08-22 20:41 ` [PATCH V4 07/12] PCI/TPH: Add pcie_tph_set_st_entry() to set ST tag Wei Huang
2024-08-26 11:46 ` kernel test robot
2024-08-22 20:41 ` [PATCH V4 08/12] PCI/TPH: Add pcie_tph_get_cpu_st() to get " Wei Huang
2024-09-14 10:10 ` Alejandro Lucero Palau
2024-09-16 18:58 ` Wei Huang
2024-08-22 20:41 ` [PATCH V4 09/12] PCI/TPH: Add save/restore support for TPH Wei Huang
2024-09-04 20:11 ` Bjorn Helgaas
2024-08-22 20:41 ` [PATCH V4 10/12] PCI/TPH: Add pci=nostmode to force TPH No ST Mode Wei Huang
2024-08-22 20:41 ` [PATCH V4 11/12] bnxt_en: Add TPH support in BNXT driver Wei Huang
2024-08-26 20:22 ` Jakub Kicinski
2024-08-26 20:56 ` Andy Gospodarek
2024-08-26 22:49 ` Jakub Kicinski
2024-08-27 14:50 ` Andy Gospodarek
2024-08-27 19:05 ` Jakub Kicinski
2024-08-27 19:20 ` Michael Chan
2024-09-05 15:06 ` Bjorn Helgaas
2024-09-11 15:37 ` Alejandro Lucero Palau
2024-09-16 18:55 ` Wei Huang
2024-09-18 17:31 ` Alejandro Lucero Palau
2024-09-19 16:14 ` Wei Huang
2024-08-22 20:41 ` [PATCH V4 12/12] bnxt_en: Pass NQ ID to the FW when allocating RX/RX AGG rings Wei Huang
2024-09-03 22:42 ` [PATCH V4 00/12] PCIe TPH and cache direct injection support Wei Huang
2024-09-04 18:49 ` Bjorn Helgaas
2024-09-04 19:48 ` Wei Huang
2024-09-04 20:03 ` Bjorn Helgaas
2024-09-04 20:20 ` Bjorn Helgaas
2024-09-05 15:45 ` Wei Huang
2024-09-05 16:44 ` Bjorn Helgaas
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).