Linux PCI subsystem development
* [PATCH v6 0/6] vfio/pci: Add PCIe TPH support
@ 2026-05-06  9:46 Chengwen Feng
  2026-05-06  9:46 ` [PATCH v6 1/6] PCI/TPH: Fix pcie_tph_get_st_table_loc() field extraction Chengwen Feng
                   ` (5 more replies)
  0 siblings, 6 replies; 13+ messages in thread
From: Chengwen Feng @ 2026-05-06  9:46 UTC (permalink / raw)
  To: alex, jgg
  Cc: wathsala.vithanage, helgaas, wei.huang2, wangzhou1, wangyushan12,
	liuyonglong, kvm, linux-pci

This patchset enables userspace control over PCIe TPH steering tags,
motivated by the following considerations:

1. Why userspace needs the capability to control steering tags:
   When PCIe devices are fully owned by userspace workloads such as DPDK
   and SPDK, only userspace has full knowledge of core binding policies
   and traffic distribution strategies. Without this series, userspace
   cannot enable TPH or configure steering tags, leaving built-in PCIe
   performance optimizations unused in high-throughput polling I/O
   scenarios.

2. Why this interface must be implemented in VFIO:
   VFIO is the standard, secure community solution for granting full
   PCIe device ownership to userspace. Existing kernel TPH interfaces
   are designed purely for in-kernel drivers. For user-owned devices,
   VFIO provides the only isolated and correct path to expose per-device
   TPH management.

TPH supports both interrupt-vector (IV) and device-specific (DS) modes.
Because DS mode introduces cross-VM isolation risks, such as untrusted
guests programming arbitrary steering tags that affect other domains, a new
module parameter `enable_unsafe_tph_ds_mode` is added. It defaults to off
and blocks all unsafe DS-mode TPH operations when disabled.

To restrict abuse of SET_ST and prevent arbitrary steering tag programming
from userspace, the interface only accepts explicit CPU ID, memory type
and index inputs. The kernel resolves the corresponding steering tag
internally before programming, limiting userspace to controlled,
index-based configuration.
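
As a rough illustration of the index-based request described above, the
following userspace sketch builds a SET_ST batch. The demo_* structures are
local mirrors of the UAPI layout proposed in patch 3/6 (flattened, and not
the real headers), and the computed argsz is simply the natural size of the
buffer; the exact value the v6 kernel-side checks expect may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define DEMO_TPH_SET_ST       4u    /* mirrors VFIO_PCI_TPH_SET_ST */
#define DEMO_TPH_MAX_ENTRIES  2048u /* mirrors VFIO_PCI_TPH_MAX_ENTRIES */
#define DEMO_MEM_TYPE_VM      0u    /* mirrors VFIO_PCI_TPH_MEM_TYPE_VM */

/* Local mirror of struct vfio_pci_tph_entry as proposed in patch 3/6. */
struct demo_tph_entry {
	uint32_t cpu;
	uint8_t  mem_type;
	uint8_t  reserved0;
	uint16_t index;
	uint16_t st;
	uint16_t reserved1;
};
static_assert(sizeof(struct demo_tph_entry) == 12, "unexpected padding");

/* Flattened mirror of vfio_device_pci_tph_op using the "st" union member. */
struct demo_tph_req {
	uint32_t argsz;
	uint32_t op;
	uint32_t count;
	uint32_t reserved;
	struct demo_tph_entry ents[];
};

/* Build a zeroed SET_ST request that steers table index i to cpus[i]. */
struct demo_tph_req *demo_build_set_st(const uint32_t *cpus, uint32_t count)
{
	struct demo_tph_req *req;
	size_t sz;
	uint32_t i;

	if (count == 0 || count > DEMO_TPH_MAX_ENTRIES)
		return NULL;

	sz = sizeof(*req) + (size_t)count * sizeof(req->ents[0]);
	req = calloc(1, sz);	/* zeroing keeps all reserved fields at 0 */
	if (!req)
		return NULL;

	req->argsz = (uint32_t)sz;
	req->op = DEMO_TPH_SET_ST;
	req->count = count;
	for (i = 0; i < count; i++) {
		req->ents[i].cpu = cpus[i];
		req->ents[i].mem_type = DEMO_MEM_TYPE_VM;
		req->ents[i].index = (uint16_t)i;
	}
	return req;
}
```

Userspace never supplies a raw steering tag here: the st field stays zero
and the kernel is expected to derive the tag from cpu and mem_type.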

Based on earlier RFC work by Wathsala Vithanage.

v6:
- Address Alex's comment on [1/6] commit.
- Fix Sashiko review comments:
  Add tph_cap validation for pcie_tph_get_st_modes/st_table_loc.
  Add argsz validation for each op cmd.
  Move disable tph from ioctl-reset to register.
  Verify reserved field for get/set ST op.
  Fix ABI mismatch due to pointer arithmetic of get/set ST op.
v5:
- Fix pcie_tph_get_st_table_loc() field extraction bug
- Add disable TPH in vfio_pci_ioctl_reset() to clean software state
v4:
- Address Jason's comment to restrict device-specific mode under module
  param control.
- Rename module param to enable_unsafe_tph_ds_mode
v3:
- Add module param enable_unsafe_tph_ds to guard unsafe usage
  of TPH device-specific mode with no ST table
v2:
- Export pcie_tph_get_st_modes()
- Add detailed comment for UAPI structures and operations
- Add batch entry limit VFIO_TPH_MAX_ENTRIES
- Improve robustness and error handling

Chengwen Feng (6):
  PCI/TPH: Fix pcie_tph_get_st_table_loc() field extraction
  PCI/TPH: Export pcie_tph_get_st_modes() for external use
  vfio/pci: Add PCIe TPH interface with capability query
  vfio/pci: Add PCIe TPH enable/disable support
  vfio/pci: Add PCIe TPH GET_ST interface
  vfio/pci: Add PCIe TPH SET_ST interface

 drivers/pci/tph.c                |  26 +++-
 drivers/vfio/pci/vfio_pci.c      |  13 +-
 drivers/vfio/pci/vfio_pci_core.c | 257 ++++++++++++++++++++++++++++++-
 include/linux/pci-tph.h          |   7 +
 include/linux/vfio_pci_core.h    |   3 +-
 include/uapi/linux/vfio.h        | 131 ++++++++++++++++
 6 files changed, 426 insertions(+), 11 deletions(-)

-- 
2.17.1



* [PATCH v6 1/6] PCI/TPH: Fix pcie_tph_get_st_table_loc() field extraction
  2026-05-06  9:46 [PATCH v6 0/6] vfio/pci: Add PCIe TPH support Chengwen Feng
@ 2026-05-06  9:46 ` Chengwen Feng
  2026-05-06 15:54   ` Bjorn Helgaas
  2026-05-06  9:46 ` [PATCH v6 2/6] PCI/TPH: Export pcie_tph_get_st_modes() for external use Chengwen Feng
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 13+ messages in thread
From: Chengwen Feng @ 2026-05-06  9:46 UTC (permalink / raw)
  To: alex, jgg
  Cc: wathsala.vithanage, helgaas, wei.huang2, wangzhou1, wangyushan12,
	liuyonglong, kvm, linux-pci

pcie_tph_get_st_table_loc() incorrectly uses FIELD_GET(), which shifts the
field value to bit 0. But the function is designed to return raw
PCI_TPH_LOC_* values as defined in the function comment.

This causes incorrect ST table location detection. Fix it by using bitwise
AND with PCI_TPH_CAP_LOC_MASK to return the unshifted field value matching
the function specification.

While this change appears to be a no-op within tph.c, the external caller
mlx5_st_create() relies on the documented function behavior, making this
fix necessary.
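
The difference between the two extraction styles can be sketched in plain C.
The constants below are assumed to match the PCI_TPH_CAP_LOC_MASK and
PCI_TPH_LOC_* definitions in pci_regs.h; the demo_* names are illustrative
only and the shift of 9 follows from that assumed mask.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed to mirror include/uapi/linux/pci_regs.h. */
#define DEMO_TPH_CAP_LOC_MASK 0x00000600u
#define DEMO_TPH_LOC_NONE     0x00000000u
#define DEMO_TPH_LOC_CAP      0x00000200u
#define DEMO_TPH_LOC_MSIX     0x00000400u

/* FIELD_GET() style: mask, then shift the field down to bit 0. */
uint32_t demo_loc_shifted(uint32_t reg)
{
	return (reg & DEMO_TPH_CAP_LOC_MASK) >> 9;
}

/* Style used by the fix: mask only, leaving the field in place. */
uint32_t demo_loc_raw(uint32_t reg)
{
	return reg & DEMO_TPH_CAP_LOC_MASK;
}
```

For a register whose location field encodes "MSI-X", only the raw variant
compares equal to the LOC_MSIX constant; the shifted variant yields 2, which
matches none of the PCI_TPH_LOC_* values except by accident for LOC_NONE.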

Fixes: d2e8a34876ce ("PCI/TPH: Add Steering Tag support")
Cc: stable@vger.kernel.org
Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
Reviewed-by: Alex Williamson <alex.williamson@nvidia.com>
---
 drivers/pci/tph.c | 8 +-------
 1 file changed, 1 insertion(+), 7 deletions(-)

diff --git a/drivers/pci/tph.c b/drivers/pci/tph.c
index 91145e8d9d95..f17b74b5fb1e 100644
--- a/drivers/pci/tph.c
+++ b/drivers/pci/tph.c
@@ -170,7 +170,7 @@ u32 pcie_tph_get_st_table_loc(struct pci_dev *pdev)
 
 	pci_read_config_dword(pdev, pdev->tph_cap + PCI_TPH_CAP, &reg);
 
-	return FIELD_GET(PCI_TPH_CAP_LOC_MASK, reg);
+	return reg & PCI_TPH_CAP_LOC_MASK;
 }
 EXPORT_SYMBOL(pcie_tph_get_st_table_loc);
 
@@ -183,11 +183,7 @@ u16 pcie_tph_get_st_table_size(struct pci_dev *pdev)
 	u32 reg;
 	u32 loc;
 
-	/* Check ST table location first */
 	loc = pcie_tph_get_st_table_loc(pdev);
-
-	/* Convert loc to match with PCI_TPH_LOC_* defined in pci_regs.h */
-	loc = FIELD_PREP(PCI_TPH_CAP_LOC_MASK, loc);
 	if (loc != PCI_TPH_LOC_CAP)
 		return 0;
 
@@ -316,8 +312,6 @@ int pcie_tph_set_st_entry(struct pci_dev *pdev, unsigned int index, u16 tag)
 	set_ctrl_reg_req_en(pdev, PCI_TPH_REQ_DISABLE);
 
 	loc = pcie_tph_get_st_table_loc(pdev);
-	/* Convert loc to match with PCI_TPH_LOC_* */
-	loc = FIELD_PREP(PCI_TPH_CAP_LOC_MASK, loc);
 
 	switch (loc) {
 	case PCI_TPH_LOC_MSIX:
-- 
2.17.1



* [PATCH v6 2/6] PCI/TPH: Export pcie_tph_get_st_modes() for external use
  2026-05-06  9:46 [PATCH v6 0/6] vfio/pci: Add PCIe TPH support Chengwen Feng
  2026-05-06  9:46 ` [PATCH v6 1/6] PCI/TPH: Fix pcie_tph_get_st_table_loc() field extraction Chengwen Feng
@ 2026-05-06  9:46 ` Chengwen Feng
  2026-05-06 12:39   ` sashiko-bot
  2026-05-06  9:46 ` [PATCH v6 3/6] vfio/pci: Add PCIe TPH interface with capability query Chengwen Feng
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 13+ messages in thread
From: Chengwen Feng @ 2026-05-06  9:46 UTC (permalink / raw)
  To: alex, jgg
  Cc: wathsala.vithanage, helgaas, wei.huang2, wangzhou1, wangyushan12,
	liuyonglong, kvm, linux-pci

Export the helper to retrieve supported PCIe TPH steering tag modes so
that drivers like VFIO can query and expose device capabilities to
userspace.

Add stub functions for pcie_tph_get_st_table_size() and
pcie_tph_get_st_table_loc() when !CONFIG_PCIE_TPH.

Add tph_cap validation for pcie_tph_get_st_modes() and
pcie_tph_get_st_table_loc() to prevent invalid PCI configuration
space access when TPH is not supported.
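
A minimal model of the guard and of how a caller consumes the returned
bitmask is sketched below. The bit values are assumed to match
PCI_TPH_CAP_ST_NS/IV/DS, the mode numbers are assumed to be 0 (NS), 1 (IV)
and 2 (DS), and the demo_* functions are stand-ins rather than kernel code.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed to mirror PCI_TPH_CAP_ST_NS / _IV / _DS. */
#define DEMO_CAP_ST_NS 0x1u
#define DEMO_CAP_ST_IV 0x2u
#define DEMO_CAP_ST_DS 0x4u

/*
 * tph_cap == 0 models a device without the TPH extended capability;
 * cap_reg stands in for the value a config-space read would return.
 */
uint8_t demo_get_st_modes(uint16_t tph_cap, uint32_t cap_reg)
{
	if (!tph_cap)
		return 0;
	return (uint8_t)(cap_reg &
			 (DEMO_CAP_ST_NS | DEMO_CAP_ST_IV | DEMO_CAP_ST_DS));
}

/* Same test pcie_enable_tph() applies: bit number "mode" must be set. */
int demo_mode_supported(uint8_t dev_modes, unsigned int mode)
{
	return mode <= 2 && ((1u << mode) & dev_modes) != 0;
}
```

Returning 0 for a missing capability means every later mode check fails
naturally, so callers do not need a separate "has TPH" test.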

Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
---
 drivers/pci/tph.c       | 20 ++++++++++++++++++--
 include/linux/pci-tph.h |  7 +++++++
 2 files changed, 25 insertions(+), 2 deletions(-)

diff --git a/drivers/pci/tph.c b/drivers/pci/tph.c
index f17b74b5fb1e..ba31b010f67a 100644
--- a/drivers/pci/tph.c
+++ b/drivers/pci/tph.c
@@ -145,15 +145,27 @@ static void set_ctrl_reg_req_en(struct pci_dev *pdev, u8 req_type)
 	pci_write_config_dword(pdev, pdev->tph_cap + PCI_TPH_CTRL, reg);
 }
 
-static u8 get_st_modes(struct pci_dev *pdev)
+/**
+ * pcie_tph_get_st_modes - Get supported Steering Tag modes
+ * @pdev: PCI device to query
+ *
+ * Return:
+ *  Bitmask of supported ST modes (PCI_TPH_CAP_ST_NS, PCI_TPH_CAP_ST_IV,
+ *                                 PCI_TPH_CAP_ST_DS)
+ */
+u8 pcie_tph_get_st_modes(struct pci_dev *pdev)
 {
 	u32 reg;
 
+	if (!pdev->tph_cap)
+		return 0;
+
 	pci_read_config_dword(pdev, pdev->tph_cap + PCI_TPH_CAP, &reg);
 	reg &= PCI_TPH_CAP_ST_NS | PCI_TPH_CAP_ST_IV | PCI_TPH_CAP_ST_DS;
 
 	return reg;
 }
+EXPORT_SYMBOL(pcie_tph_get_st_modes);
 
 /**
  * pcie_tph_get_st_table_loc - Return the device's ST table location
@@ -168,6 +180,9 @@ u32 pcie_tph_get_st_table_loc(struct pci_dev *pdev)
 {
 	u32 reg;
 
+	if (!pdev->tph_cap)
+		return PCI_TPH_LOC_NONE;
+
 	pci_read_config_dword(pdev, pdev->tph_cap + PCI_TPH_CAP, &reg);
 
 	return reg & PCI_TPH_CAP_LOC_MASK;
@@ -183,6 +198,7 @@ u16 pcie_tph_get_st_table_size(struct pci_dev *pdev)
 	u32 reg;
 	u32 loc;
 
+	/* Check ST table location first */
 	loc = pcie_tph_get_st_table_loc(pdev);
 	if (loc != PCI_TPH_LOC_CAP)
 		return 0;
@@ -394,7 +410,7 @@ int pcie_enable_tph(struct pci_dev *pdev, int mode)
 
 	/* Sanitize and check ST mode compatibility */
 	mode &= PCI_TPH_CTRL_MODE_SEL_MASK;
-	dev_modes = get_st_modes(pdev);
+	dev_modes = pcie_tph_get_st_modes(pdev);
 	if (!((1 << mode) & dev_modes))
 		return -EINVAL;
 
diff --git a/include/linux/pci-tph.h b/include/linux/pci-tph.h
index be68cd17f2f8..586c75b19e01 100644
--- a/include/linux/pci-tph.h
+++ b/include/linux/pci-tph.h
@@ -30,6 +30,7 @@ void pcie_disable_tph(struct pci_dev *pdev);
 int pcie_enable_tph(struct pci_dev *pdev, int mode);
 u16 pcie_tph_get_st_table_size(struct pci_dev *pdev);
 u32 pcie_tph_get_st_table_loc(struct pci_dev *pdev);
+u8 pcie_tph_get_st_modes(struct pci_dev *pdev);
 #else
 static inline int pcie_tph_set_st_entry(struct pci_dev *pdev,
 					unsigned int index, u16 tag)
@@ -41,6 +42,12 @@ static inline int pcie_tph_get_cpu_st(struct pci_dev *dev,
 static inline void pcie_disable_tph(struct pci_dev *pdev) { }
 static inline int pcie_enable_tph(struct pci_dev *pdev, int mode)
 { return -EINVAL; }
+static inline u16 pcie_tph_get_st_table_size(struct pci_dev *pdev)
+{ return 0; }
+static inline u32 pcie_tph_get_st_table_loc(struct pci_dev *pdev)
+{ return 0x7FF; /* Values that do not appear in normal case */ }
+static inline u8 pcie_tph_get_st_modes(struct pci_dev *pdev)
+{ return 0; }
 #endif
 
 #endif /* LINUX_PCI_TPH_H */
-- 
2.17.1



* [PATCH v6 3/6] vfio/pci: Add PCIe TPH interface with capability query
  2026-05-06  9:46 [PATCH v6 0/6] vfio/pci: Add PCIe TPH support Chengwen Feng
  2026-05-06  9:46 ` [PATCH v6 1/6] PCI/TPH: Fix pcie_tph_get_st_table_loc() field extraction Chengwen Feng
  2026-05-06  9:46 ` [PATCH v6 2/6] PCI/TPH: Export pcie_tph_get_st_modes() for external use Chengwen Feng
@ 2026-05-06  9:46 ` Chengwen Feng
  2026-05-06 18:48   ` sashiko-bot
  2026-05-06  9:46 ` [PATCH v6 4/6] vfio/pci: Add PCIe TPH enable/disable support Chengwen Feng
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 13+ messages in thread
From: Chengwen Feng @ 2026-05-06  9:46 UTC (permalink / raw)
  To: alex, jgg
  Cc: wathsala.vithanage, helgaas, wei.huang2, wangzhou1, wangyushan12,
	liuyonglong, kvm, linux-pci

Add VFIO_DEVICE_PCI_TPH IOCTL to allow userspace to query device TPH
capabilities, supported modes, and steering tag table information.

Add module parameter 'enable_unsafe_tph_ds_mode' to restrict unsafe
device-specific TPH mode to trusted userspace only.
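
The capability query boils down to translating the PCI capability bits into
the VFIO mode flags. The sketch below mirrors that translation as a pure
function; the constants are assumed to match the definitions used in this
patch and the demo_* names are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Assumed to mirror PCI_TPH_CAP_ST_* and VFIO_PCI_TPH_MODE_*. */
#define DEMO_CAP_ST_NS    0x1u
#define DEMO_CAP_ST_IV    0x2u
#define DEMO_CAP_ST_DS    0x4u
#define DEMO_VFIO_MODE_IV (1u << 0)
#define DEMO_VFIO_MODE_DS (1u << 1)

/*
 * Translate the device's ST mode bitmask into VFIO mode flags.
 * A device offering nothing, or only "No ST" mode, is reported as
 * unsupported, matching the GET_CAP handler in this patch.
 */
int demo_translate_modes(uint8_t pci_modes, uint8_t *vfio_modes)
{
	if (pci_modes == 0 || pci_modes == DEMO_CAP_ST_NS)
		return -EOPNOTSUPP;

	*vfio_modes = 0;
	if (pci_modes & DEMO_CAP_ST_IV)
		*vfio_modes |= DEMO_VFIO_MODE_IV;
	if (pci_modes & DEMO_CAP_ST_DS)
		*vfio_modes |= DEMO_VFIO_MODE_DS;
	return 0;
}
```

Note that the DS flag is reported whenever the hardware supports it; whether
DS mode may actually be enabled is decided separately by the module
parameter.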

Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
---
 drivers/vfio/pci/vfio_pci.c      |  13 ++-
 drivers/vfio/pci/vfio_pci_core.c |  55 ++++++++++++-
 include/linux/vfio_pci_core.h    |   3 +-
 include/uapi/linux/vfio.h        | 131 +++++++++++++++++++++++++++++++
 4 files changed, 199 insertions(+), 3 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
index 0c771064c0b8..40bf5aa9fd0b 100644
--- a/drivers/vfio/pci/vfio_pci.c
+++ b/drivers/vfio/pci/vfio_pci.c
@@ -60,6 +60,12 @@ static bool disable_denylist;
 module_param(disable_denylist, bool, 0444);
 MODULE_PARM_DESC(disable_denylist, "Disable use of device denylist. Disabling the denylist allows binding to devices with known errata that may lead to exploitable stability or security issues when accessed by untrusted users.");
 
+#ifdef CONFIG_PCIE_TPH
+static bool enable_unsafe_tph_ds_mode;
+module_param(enable_unsafe_tph_ds_mode, bool, 0444);
+MODULE_PARM_DESC(enable_unsafe_tph_ds_mode, "Enable UNSAFE TPH device-specific (DS) mode. This mode provides weak isolation, cannot be safely used for virtual machines. If you do not know what this is for, step away. (default: false)");
+#endif
+
 static bool vfio_pci_dev_in_denylist(struct pci_dev *pdev)
 {
 	switch (pdev->vendor) {
@@ -257,12 +263,17 @@ static int __init vfio_pci_init(void)
 {
 	int ret;
 	bool is_disable_vga = true;
+	bool is_enable_unsafe_tph_ds_mode = false;
 
 #ifdef CONFIG_VFIO_PCI_VGA
 	is_disable_vga = disable_vga;
 #endif
+#ifdef CONFIG_PCIE_TPH
+	is_enable_unsafe_tph_ds_mode = enable_unsafe_tph_ds_mode;
+#endif
 
-	vfio_pci_core_set_params(nointxmask, is_disable_vga, disable_idle_d3);
+	vfio_pci_core_set_params(nointxmask, is_disable_vga, disable_idle_d3,
+				 is_enable_unsafe_tph_ds_mode);
 
 	/* Register and scan for devices */
 	ret = pci_register_driver(&vfio_pci_driver);
diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
index 3f8d093aacf8..d98d04bad4a3 100644
--- a/drivers/vfio/pci/vfio_pci_core.c
+++ b/drivers/vfio/pci/vfio_pci_core.c
@@ -29,6 +29,7 @@
 #include <linux/sched/mm.h>
 #include <linux/iommufd.h>
 #include <linux/pci-p2pdma.h>
+#include <linux/pci-tph.h>
 #if IS_ENABLED(CONFIG_EEH)
 #include <asm/eeh.h>
 #endif
@@ -41,6 +42,7 @@
 static bool nointxmask;
 static bool disable_vga;
 static bool disable_idle_d3;
+static bool enable_unsafe_tph_ds_mode;
 
 static void vfio_pci_eventfd_rcu_free(struct rcu_head *rcu)
 {
@@ -1461,6 +1463,53 @@ static int vfio_pci_ioctl_ioeventfd(struct vfio_pci_core_device *vdev,
 				  ioeventfd.fd);
 }
 
+static int vfio_pci_tph_get_cap(struct vfio_pci_core_device *vdev,
+				struct vfio_device_pci_tph_op *op,
+				void __user *uarg)
+{
+	struct pci_dev *pdev = vdev->pdev;
+	u8 mode = pcie_tph_get_st_modes(pdev);
+	struct vfio_pci_tph_cap cap = {0};
+
+	if (op->argsz < offsetof(struct vfio_device_pci_tph_op, cap) +
+			sizeof(struct vfio_pci_tph_cap))
+		return -EINVAL;
+
+	if (mode == 0 || mode == PCI_TPH_CAP_ST_NS)
+		return -EOPNOTSUPP;
+
+	if (mode & PCI_TPH_CAP_ST_IV)
+		cap.supported_modes |= VFIO_PCI_TPH_MODE_IV;
+	if (mode & PCI_TPH_CAP_ST_DS)
+		cap.supported_modes |= VFIO_PCI_TPH_MODE_DS;
+
+	if (pcie_tph_get_st_table_loc(pdev) != PCI_TPH_LOC_NONE)
+		cap.st_table_sz = pcie_tph_get_st_table_size(pdev);
+
+	if (copy_to_user(uarg, &cap, sizeof(cap)))
+		return -EFAULT;
+
+	return 0;
+}
+
+static int vfio_pci_ioctl_tph(struct vfio_pci_core_device *vdev,
+			      void __user *uarg)
+{
+	struct vfio_device_pci_tph_op op = {0};
+	size_t minsz = sizeof(op.argsz) + sizeof(op.op);
+
+	if (copy_from_user(&op, uarg, minsz))
+		return -EFAULT;
+
+	switch (op.op) {
+	case VFIO_PCI_TPH_GET_CAP:
+		return vfio_pci_tph_get_cap(vdev, &op, uarg + minsz);
+	default:
+		/* Other ops are not implemented yet */
+		return -EINVAL;
+	}
+}
+
 long vfio_pci_core_ioctl(struct vfio_device *core_vdev, unsigned int cmd,
 			 unsigned long arg)
 {
@@ -1483,6 +1532,8 @@ long vfio_pci_core_ioctl(struct vfio_device *core_vdev, unsigned int cmd,
 		return vfio_pci_ioctl_reset(vdev, uarg);
 	case VFIO_DEVICE_SET_IRQS:
 		return vfio_pci_ioctl_set_irqs(vdev, uarg);
+	case VFIO_DEVICE_PCI_TPH:
+		return vfio_pci_ioctl_tph(vdev, uarg);
 	default:
 		return -ENOTTY;
 	}
@@ -2570,11 +2621,13 @@ static void vfio_pci_dev_set_try_reset(struct vfio_device_set *dev_set)
 }
 
 void vfio_pci_core_set_params(bool is_nointxmask, bool is_disable_vga,
-			      bool is_disable_idle_d3)
+			      bool is_disable_idle_d3,
+			      bool is_enable_unsafe_tph_ds_mode)
 {
 	nointxmask = is_nointxmask;
 	disable_vga = is_disable_vga;
 	disable_idle_d3 = is_disable_idle_d3;
+	enable_unsafe_tph_ds_mode = is_enable_unsafe_tph_ds_mode;
 }
 EXPORT_SYMBOL_GPL(vfio_pci_core_set_params);
 
diff --git a/include/linux/vfio_pci_core.h b/include/linux/vfio_pci_core.h
index 2ebba746c18f..5af2a2e04ca7 100644
--- a/include/linux/vfio_pci_core.h
+++ b/include/linux/vfio_pci_core.h
@@ -157,7 +157,8 @@ int vfio_pci_core_register_dev_region(struct vfio_pci_core_device *vdev,
 				      const struct vfio_pci_regops *ops,
 				      size_t size, u32 flags, void *data);
 void vfio_pci_core_set_params(bool nointxmask, bool is_disable_vga,
-			      bool is_disable_idle_d3);
+			      bool is_disable_idle_d3,
+			      bool is_enable_unsafe_tph_ds_mode);
 void vfio_pci_core_close_device(struct vfio_device *core_vdev);
 int vfio_pci_core_init_dev(struct vfio_device *core_vdev);
 void vfio_pci_core_release_dev(struct vfio_device *core_vdev);
diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index 5de618a3a5ee..f899521e52c6 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -1321,6 +1321,137 @@ struct vfio_precopy_info {
 
 #define VFIO_MIG_GET_PRECOPY_INFO _IO(VFIO_TYPE, VFIO_BASE + 21)
 
+/**
+ * struct vfio_pci_tph_cap - PCIe TPH capability information
+ * @supported_modes: Supported TPH operating modes
+ * @st_table_sz: Number of entries in ST table; 0 means no ST table
+ * @reserved: Must be zero
+ *
+ * Used with VFIO_PCI_TPH_GET_CAP operation to return device
+ * TLP Processing Hints (TPH) capabilities to userspace.
+ */
+struct vfio_pci_tph_cap {
+	__u8  supported_modes;
+#define VFIO_PCI_TPH_MODE_IV	(1u << 0) /* Interrupt vector */
+#define VFIO_PCI_TPH_MODE_DS	(1u << 1) /* Device specific */
+	__u8  reserved0;
+	__u16 st_table_sz;
+	__u32 reserved;
+};
+
+/**
+ * struct vfio_pci_tph_ctrl - TPH enable control structure
+ * @mode: Selected TPH operating mode (VFIO_PCI_TPH_MODE_*)
+ * @reserved: Must be zero
+ *
+ * Used with VFIO_PCI_TPH_ENABLE operation to specify the
+ * operating mode when enabling TPH on the device.
+ */
+struct vfio_pci_tph_ctrl {
+	__u8 mode;
+	__u8 reserved[7];
+};
+
+/**
+ * struct vfio_pci_tph_entry - Single TPH steering tag entry
+ * @cpu: CPU identifier for steering tag calculation
+ * @mem_type: Memory type (VFIO_PCI_TPH_MEM_TYPE_*)
+ * @reserved0: Must be zero
+ * @index: ST table index for programming
+ * @st: Unused for SET_ST
+ * @reserved1: Must be zero
+ *
+ * For VFIO_PCI_TPH_GET_ST:
+ *   Userspace sets @cpu and @mem_type; kernel returns @st.
+ *
+ * For VFIO_PCI_TPH_SET_ST:
+ *   Userspace sets @index, @cpu, and @mem_type.
+ *   Kernel internally computes the steering tag and programs
+ *   it into the specified @index.
+ *
+ *   If @cpu == U32_MAX, kernel clears the steering tag at
+ *   the specified @index.
+ */
+struct vfio_pci_tph_entry {
+	__u32 cpu;
+	__u8  mem_type;
+#define VFIO_PCI_TPH_MEM_TYPE_VM	0
+#define VFIO_PCI_TPH_MEM_TYPE_PM	1
+	__u8  reserved0;
+	__u16 index;
+	__u16 st;
+	__u16 reserved1;
+};
+
+/**
+ * struct vfio_pci_tph_st - Batch steering tag request
+ * @count: Number of entries in the array
+ * @reserved: Must be zero
+ * @ents: Flexible array of steering tag entries
+ *
+ * Container structure for batch get/set operations.
+ * Used with both VFIO_PCI_TPH_GET_ST and VFIO_PCI_TPH_SET_ST.
+ */
+struct vfio_pci_tph_st {
+	__u32 count;
+	__u32 reserved;
+	struct vfio_pci_tph_entry ents[];
+#define VFIO_PCI_TPH_MAX_ENTRIES    2048
+};
+
+/**
+ * struct vfio_device_pci_tph_op - Argument for VFIO_DEVICE_PCI_TPH
+ * @argsz: User allocated size of this structure
+ * @op: TPH operation (VFIO_PCI_TPH_*)
+ * @cap: Capability data for GET_CAP
+ * @ctrl: Control data for ENABLE
+ * @st: Batch entry data for GET_ST/SET_ST
+ *
+ * @argsz must be set by the user to the size of the structure
+ * being executed. Kernel validates input and returns data
+ * only within the specified size.
+ *
+ * Operations:
+ * - VFIO_PCI_TPH_GET_CAP: Query device TPH capabilities.
+ * - VFIO_PCI_TPH_ENABLE:  Enable TPH using mode from &ctrl.
+ * - VFIO_PCI_TPH_DISABLE: Disable TPH on the device.
+ * - VFIO_PCI_TPH_GET_ST:  Retrieve CPU's steering tags.
+ *                         Valid only for Device-Specific mode and
+ *                         no ST table is present.
+ * - VFIO_PCI_TPH_SET_ST:  Program steering tags into device table.
+ *                         If any entry fails, previously programmed entries
+ *                         are rolled back to 0 before returning error.
+ */
+struct vfio_device_pci_tph_op {
+	__u32 argsz;
+	__u32 op;
+#define VFIO_PCI_TPH_GET_CAP	0
+#define VFIO_PCI_TPH_ENABLE	1
+#define VFIO_PCI_TPH_DISABLE	2
+#define VFIO_PCI_TPH_GET_ST	3
+#define VFIO_PCI_TPH_SET_ST	4
+	union {
+		struct vfio_pci_tph_cap cap;
+		struct vfio_pci_tph_ctrl ctrl;
+		struct vfio_pci_tph_st st;
+	};
+};
+
+/**
+ * VFIO_DEVICE_PCI_TPH - _IO(VFIO_TYPE, VFIO_BASE + 22)
+ *
+ * IOCTL for managing PCIe TLP Processing Hints (TPH) on
+ * a VFIO-assigned PCI device. Provides operations to query
+ * device capabilities, enable/disable TPH, retrieve CPU's
+ * steering tags, and program steering tag tables.
+ *
+ * Return: 0 on success, negative errno on failure.
+ *         -EOPNOTSUPP: Operation not supported
+ *         -ENODEV: Device or required functionality not present
+ *         -EINVAL: Invalid argument or TPH not supported
+ */
+#define VFIO_DEVICE_PCI_TPH	_IO(VFIO_TYPE, VFIO_BASE + 22)
+
 /*
  * Upon VFIO_DEVICE_FEATURE_SET, allow the device to be moved into a low power
  * state with the platform-based power management.  Device use of lower power
-- 
2.17.1



* [PATCH v6 4/6] vfio/pci: Add PCIe TPH enable/disable support
  2026-05-06  9:46 [PATCH v6 0/6] vfio/pci: Add PCIe TPH support Chengwen Feng
                   ` (2 preceding siblings ...)
  2026-05-06  9:46 ` [PATCH v6 3/6] vfio/pci: Add PCIe TPH interface with capability query Chengwen Feng
@ 2026-05-06  9:46 ` Chengwen Feng
  2026-05-06 19:21   ` sashiko-bot
  2026-05-06  9:46 ` [PATCH v6 5/6] vfio/pci: Add PCIe TPH GET_ST interface Chengwen Feng
  2026-05-06  9:46 ` [PATCH v6 6/6] vfio/pci: Add PCIe TPH SET_ST interface Chengwen Feng
  5 siblings, 1 reply; 13+ messages in thread
From: Chengwen Feng @ 2026-05-06  9:46 UTC (permalink / raw)
  To: alex, jgg
  Cc: wathsala.vithanage, helgaas, wei.huang2, wangzhou1, wangyushan12,
	liuyonglong, kvm, linux-pci

Add support for enabling and disabling TPH, with mode selection on enable.

Allow the unsafe device-specific TPH mode only when the module parameter
enable_unsafe_tph_ds_mode=1 is set.

Disable TPH when taking over ownership of the device.
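
The ordering of the checks on the enable path can be modelled as a small
pure function. This is only a sketch: demo_check_enable() is hypothetical,
the mode constants are assumed to match VFIO_PCI_TPH_MODE_IV/DS, and
ds_allowed stands in for the module parameter.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed to mirror VFIO_PCI_TPH_MODE_IV / VFIO_PCI_TPH_MODE_DS. */
#define DEMO_VFIO_MODE_IV (1u << 0)
#define DEMO_VFIO_MODE_DS (1u << 1)

/*
 * Validate an ENABLE request in the same order as the handler:
 * exactly one known mode, then the DS gate, then reserved bytes.
 */
int demo_check_enable(uint8_t mode, const uint8_t reserved[7], bool ds_allowed)
{
	size_t i;

	if (mode != DEMO_VFIO_MODE_IV && mode != DEMO_VFIO_MODE_DS)
		return -EINVAL;

	if (mode == DEMO_VFIO_MODE_DS && !ds_allowed)
		return -EOPNOTSUPP;

	for (i = 0; i < 7; i++) {
		if (reserved[i])
			return -EINVAL;
	}
	return 0;
}
```

Because mode is compared for equality rather than tested bitwise, a request
that sets both IV and DS at once is rejected with -EINVAL.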

Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
---
 drivers/vfio/pci/vfio_pci_core.c | 43 ++++++++++++++++++++++++++++++++
 1 file changed, 43 insertions(+)

diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
index d98d04bad4a3..69f666d20c4a 100644
--- a/drivers/vfio/pci/vfio_pci_core.c
+++ b/drivers/vfio/pci/vfio_pci_core.c
@@ -1492,6 +1492,43 @@ static int vfio_pci_tph_get_cap(struct vfio_pci_core_device *vdev,
 	return 0;
 }
 
+static int vfio_pci_tph_enable(struct vfio_pci_core_device *vdev,
+			      struct vfio_device_pci_tph_op *op,
+			      void __user *uarg)
+{
+	struct pci_dev *pdev = vdev->pdev;
+	struct vfio_pci_tph_ctrl ctrl;
+	int mode;
+
+	if (op->argsz < offsetofend(struct vfio_device_pci_tph_op, ctrl) +
+			sizeof(struct vfio_pci_tph_ctrl))
+		return -EINVAL;
+
+	if (copy_from_user(&ctrl, uarg, sizeof(ctrl)))
+		return -EFAULT;
+
+	if (ctrl.mode != VFIO_PCI_TPH_MODE_IV &&
+	    ctrl.mode != VFIO_PCI_TPH_MODE_DS)
+		return -EINVAL;
+
+	if (ctrl.mode == VFIO_PCI_TPH_MODE_DS && !enable_unsafe_tph_ds_mode)
+		return -EOPNOTSUPP;
+
+	/* Reserved must be zero */
+	if (memchr_inv(ctrl.reserved, 0, sizeof(ctrl.reserved)))
+		return -EINVAL;
+
+	mode = (ctrl.mode == VFIO_PCI_TPH_MODE_IV) ? PCI_TPH_ST_IV_MODE :
+						     PCI_TPH_ST_DS_MODE;
+	return pcie_enable_tph(pdev, mode);
+}
+
+static int vfio_pci_tph_disable(struct vfio_pci_core_device *vdev)
+{
+	pcie_disable_tph(vdev->pdev);
+	return 0;
+}
+
 static int vfio_pci_ioctl_tph(struct vfio_pci_core_device *vdev,
 			      void __user *uarg)
 {
@@ -1504,6 +1541,10 @@ static int vfio_pci_ioctl_tph(struct vfio_pci_core_device *vdev,
 	switch (op.op) {
 	case VFIO_PCI_TPH_GET_CAP:
 		return vfio_pci_tph_get_cap(vdev, &op, uarg + minsz);
+	case VFIO_PCI_TPH_ENABLE:
+		return vfio_pci_tph_enable(vdev, &op, uarg + minsz);
+	case VFIO_PCI_TPH_DISABLE:
+		return vfio_pci_tph_disable(vdev);
 	default:
 		/* Other ops are not implemented yet */
 		return -EINVAL;
@@ -2259,6 +2300,8 @@ int vfio_pci_core_register_device(struct vfio_pci_core_device *vdev)
 	ret = vfio_register_group_dev(&vdev->vdev);
 	if (ret)
 		goto out_power;
+	/* Disable TPH when taking over ownership of the device */
+	pcie_disable_tph(pdev);
 	return 0;
 
 out_power:
-- 
2.17.1



* [PATCH v6 5/6] vfio/pci: Add PCIe TPH GET_ST interface
  2026-05-06  9:46 [PATCH v6 0/6] vfio/pci: Add PCIe TPH support Chengwen Feng
                   ` (3 preceding siblings ...)
  2026-05-06  9:46 ` [PATCH v6 4/6] vfio/pci: Add PCIe TPH enable/disable support Chengwen Feng
@ 2026-05-06  9:46 ` Chengwen Feng
  2026-05-06 19:57   ` sashiko-bot
  2026-05-06  9:46 ` [PATCH v6 6/6] vfio/pci: Add PCIe TPH SET_ST interface Chengwen Feng
  5 siblings, 1 reply; 13+ messages in thread
From: Chengwen Feng @ 2026-05-06  9:46 UTC (permalink / raw)
  To: alex, jgg
  Cc: wathsala.vithanage, helgaas, wei.huang2, wangzhou1, wangyushan12,
	liuyonglong, kvm, linux-pci

Add batch retrieval of CPU steering tags for devices in device-specific TPH
mode that do not implement an ST table. This interface requires enabling the
'enable_unsafe_tph_ds_mode' module parameter.
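
The per-entry processing of a GET_ST batch can be sketched in userspace C as
follows. demo_cpu_to_st() is a hypothetical stand-in for
pcie_tph_get_cpu_st() with an invented formula (real tags come from the
platform), and the entry layout is a local mirror of the proposed UAPI
structure.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define DEMO_MAX_ENTRIES 2048u /* mirrors VFIO_PCI_TPH_MAX_ENTRIES */

/* Local mirror of struct vfio_pci_tph_entry. */
struct demo_tph_entry {
	uint32_t cpu;
	uint8_t  mem_type;	/* 0 = volatile, 1 = persistent */
	uint8_t  reserved0;
	uint16_t index;
	uint16_t st;
	uint16_t reserved1;
};

/* Hypothetical lookup: pretends 64 CPUs exist and invents a tag. */
int demo_cpu_to_st(uint8_t mem_type, uint32_t cpu, uint16_t *st)
{
	if (cpu >= 64)
		return -EINVAL;
	*st = (uint16_t)(mem_type * 0x100u + cpu);
	return 0;
}

/*
 * Validate each entry and fill in its steering tag. Unlike the kernel
 * handler, which works on a private copy, this sketch updates in place.
 */
int demo_get_st(struct demo_tph_entry *ents, uint32_t count)
{
	uint32_t i;
	int err;

	if (!count || count > DEMO_MAX_ENTRIES)
		return -EINVAL;

	for (i = 0; i < count; i++) {
		if (ents[i].reserved0 || ents[i].reserved1)
			return -EINVAL;
		if (ents[i].mem_type > 1)
			return -EINVAL;
		err = demo_cpu_to_st(ents[i].mem_type, ents[i].cpu,
				     &ents[i].st);
		if (err)
			return err;
	}
	return 0;
}
```

The index field is ignored on this path; only cpu and mem_type are inputs
and st is the output.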

Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
---
 drivers/vfio/pci/vfio_pci_core.c | 73 ++++++++++++++++++++++++++++++++
 1 file changed, 73 insertions(+)

diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
index 69f666d20c4a..45e641ab2a88 100644
--- a/drivers/vfio/pci/vfio_pci_core.c
+++ b/drivers/vfio/pci/vfio_pci_core.c
@@ -1529,6 +1529,77 @@ static int vfio_pci_tph_disable(struct vfio_pci_core_device *vdev)
 	return 0;
 }
 
+static int vfio_pci_tph_get_st(struct vfio_pci_core_device *vdev,
+			       struct vfio_device_pci_tph_op *op,
+			       void __user *uarg)
+{
+	struct pci_dev *pdev = vdev->pdev;
+	struct vfio_pci_tph_entry *ents;
+	struct vfio_pci_tph_st st;
+	enum tph_mem_type mtype;
+	size_t size, ents_off;
+	int i, err;
+
+	if (!enable_unsafe_tph_ds_mode ||
+		pcie_tph_get_st_table_loc(pdev) != PCI_TPH_LOC_NONE)
+		return -EOPNOTSUPP;
+
+	if (copy_from_user(&st, uarg, sizeof(st)))
+		return -EFAULT;
+
+	/* Check reserved fields are zero */
+	if (memchr_inv(&st.reserved, 0, sizeof(st.reserved)))
+		return -EINVAL;
+
+	if (!st.count || st.count > VFIO_PCI_TPH_MAX_ENTRIES)
+		return -EINVAL;
+
+	size = st.count * sizeof(*ents);
+	if (op->argsz < offsetofend(struct vfio_device_pci_tph_op, st) +
+			sizeof(struct vfio_pci_tph_st) + size)
+		return -EINVAL;
+
+	ents = kvmalloc(size, GFP_KERNEL);
+	if (!ents)
+		return -ENOMEM;
+
+	ents_off = offsetof(struct vfio_pci_tph_st, ents);
+	if (copy_from_user(ents, uarg + ents_off, size)) {
+		err = -EFAULT;
+		goto out;
+	}
+
+	for (i = 0; i < st.count; i++) {
+		/* Check reserved fields are zero */
+		if (memchr_inv(&ents[i].reserved0, 0, sizeof(ents[i].reserved0)) ||
+		    memchr_inv(&ents[i].reserved1, 0, sizeof(ents[i].reserved1))) {
+			err = -EINVAL;
+			goto out;
+		}
+
+		if (ents[i].mem_type == VFIO_PCI_TPH_MEM_TYPE_VM) {
+			mtype = TPH_MEM_TYPE_VM;
+		} else if (ents[i].mem_type == VFIO_PCI_TPH_MEM_TYPE_PM) {
+			mtype = TPH_MEM_TYPE_PM;
+		} else {
+			err = -EINVAL;
+			goto out;
+		}
+
+		err = pcie_tph_get_cpu_st(pdev, mtype, ents[i].cpu,
+					  &ents[i].st);
+		if (err)
+			goto out;
+	}
+
+	if (copy_to_user(uarg + ents_off, ents, size))
+		err = -EFAULT;
+
+out:
+	kvfree(ents);
+	return err;
+}
+
 static int vfio_pci_ioctl_tph(struct vfio_pci_core_device *vdev,
 			      void __user *uarg)
 {
@@ -1545,6 +1616,8 @@ static int vfio_pci_ioctl_tph(struct vfio_pci_core_device *vdev,
 		return vfio_pci_tph_enable(vdev, &op, uarg + minsz);
 	case VFIO_PCI_TPH_DISABLE:
 		return vfio_pci_tph_disable(vdev);
+	case VFIO_PCI_TPH_GET_ST:
+		return vfio_pci_tph_get_st(vdev, &op, uarg + minsz);
 	default:
 		/* Other ops are not implemented yet */
 		return -EINVAL;
-- 
2.17.1



* [PATCH v6 6/6] vfio/pci: Add PCIe TPH SET_ST interface
  2026-05-06  9:46 [PATCH v6 0/6] vfio/pci: Add PCIe TPH support Chengwen Feng
                   ` (4 preceding siblings ...)
  2026-05-06  9:46 ` [PATCH v6 5/6] vfio/pci: Add PCIe TPH GET_ST interface Chengwen Feng
@ 2026-05-06  9:46 ` Chengwen Feng
  2026-05-06 20:22   ` sashiko-bot
  5 siblings, 1 reply; 13+ messages in thread
From: Chengwen Feng @ 2026-05-06  9:46 UTC (permalink / raw)
  To: alex, jgg
  Cc: wathsala.vithanage, helgaas, wei.huang2, wangzhou1, wangyushan12,
	liuyonglong, kvm, linux-pci

Add VFIO_PCI_TPH_SET_ST operation to support batch programming of steering
tag entries. If any entry fails, roll back successfully programmed entries
to 0 to prevent inconsistent device state.
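
The program-then-roll-back pattern can be illustrated with a small
self-contained model. The eight-entry demo_table and demo_set_entry() are
hypothetical stand-ins for the device ST table and pcie_tph_set_st_entry().

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define DEMO_TABLE_SZ 8u

/* Stand-in for the device's ST table. */
uint16_t demo_table[DEMO_TABLE_SZ];

/* Hypothetical stand-in for pcie_tph_set_st_entry(). */
int demo_set_entry(unsigned int index, uint16_t tag)
{
	if (index >= DEMO_TABLE_SZ)
		return -EINVAL;
	demo_table[index] = tag;
	return 0;
}

/*
 * Program entries in order; on the first failure, write 0 to every
 * entry that was already programmed by this call.
 */
int demo_set_batch(const uint16_t *idx, const uint16_t *tags,
		   unsigned int count)
{
	unsigned int i, j;
	int err = 0;

	for (i = 0; i < count; i++) {
		err = demo_set_entry(idx[i], tags[i]);
		if (err)
			break;
	}

	if (err) {
		for (j = 0; j < i; j++)
			demo_set_entry(idx[j], 0);
	}
	return err;
}
```

As in the patch, the rollback writes 0 rather than restoring whatever the
entries held before the call, so a failed batch can still change earlier
table contents.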

Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
---
 drivers/vfio/pci/vfio_pci_core.c | 86 ++++++++++++++++++++++++++++++++
 1 file changed, 86 insertions(+)

diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
index 45e641ab2a88..867d8694b56b 100644
--- a/drivers/vfio/pci/vfio_pci_core.c
+++ b/drivers/vfio/pci/vfio_pci_core.c
@@ -1600,6 +1600,90 @@ static int vfio_pci_tph_get_st(struct vfio_pci_core_device *vdev,
 	return err;
 }
 
+static int vfio_pci_tph_set_st(struct vfio_pci_core_device *vdev,
+			       struct vfio_device_pci_tph_op *op,
+			       void __user *uarg)
+{
+	struct pci_dev *pdev = vdev->pdev;
+	struct vfio_pci_tph_entry *ents;
+	struct vfio_pci_tph_st st;
+	enum tph_mem_type mtype;
+	size_t size, ents_off;
+	int i = 0, j, err;
+	u32 tab_loc;
+	u16 st_val;
+
+	tab_loc = pcie_tph_get_st_table_loc(pdev);
+	if (tab_loc != PCI_TPH_LOC_CAP && tab_loc != PCI_TPH_LOC_MSIX)
+		return -EOPNOTSUPP;
+
+	if (copy_from_user(&st, uarg, sizeof(st)))
+		return -EFAULT;
+
+	if (!st.count || st.count > VFIO_PCI_TPH_MAX_ENTRIES)
+		return -EINVAL;
+
+	/* Check reserved fields are zero */
+	if (memchr_inv(&st.reserved, 0, sizeof(st.reserved)))
+		return -EINVAL;
+
+	size = st.count * sizeof(*ents);
+	if (op->argsz < offsetofend(struct vfio_device_pci_tph_op, st) +
+			sizeof(struct vfio_pci_tph_st) + size)
+		return -EINVAL;
+
+	ents = kvmalloc(size, GFP_KERNEL);
+	if (!ents)
+		return -ENOMEM;
+
+	ents_off = offsetof(struct vfio_pci_tph_st, ents);
+	if (copy_from_user(ents, uarg + ents_off, size)) {
+		err = -EFAULT;
+		goto out;
+	}
+
+	for (; i < st.count; i++) {
+		/* Check reserved fields are zero */
+		if (memchr_inv(&ents[i].reserved0, 0, sizeof(ents[i].reserved0)) ||
+		    memchr_inv(&ents[i].reserved1, 0, sizeof(ents[i].reserved1))) {
+			err = -EINVAL;
+			goto out;
+		}
+
+		if (ents[i].cpu == U32_MAX) {
+			err = pcie_tph_set_st_entry(pdev, ents[i].index, 0);
+			if (err)
+				goto out;
+			continue;
+		}
+
+		if (ents[i].mem_type == VFIO_PCI_TPH_MEM_TYPE_VM) {
+			mtype = TPH_MEM_TYPE_VM;
+		} else if (ents[i].mem_type == VFIO_PCI_TPH_MEM_TYPE_PM) {
+			mtype = TPH_MEM_TYPE_PM;
+		} else {
+			err = -EINVAL;
+			goto out;
+		}
+
+		err = pcie_tph_get_cpu_st(pdev, mtype, ents[i].cpu, &st_val);
+		if (err)
+			goto out;
+		err = pcie_tph_set_st_entry(pdev, ents[i].index, st_val);
+		if (err)
+			goto out;
+	}
+
+out:
+	if (err) {
+		/* Roll back previously programmed entries to 0 */
+		for (j = 0; j < i; j++)
+			pcie_tph_set_st_entry(pdev, ents[j].index, 0);
+	}
+	kvfree(ents);
+	return err;
+}
+
 static int vfio_pci_ioctl_tph(struct vfio_pci_core_device *vdev,
 			      void __user *uarg)
 {
@@ -1618,6 +1702,8 @@ static int vfio_pci_ioctl_tph(struct vfio_pci_core_device *vdev,
 		return vfio_pci_tph_disable(vdev);
 	case VFIO_PCI_TPH_GET_ST:
 		return vfio_pci_tph_get_st(vdev, &op, uarg + minsz);
+	case VFIO_PCI_TPH_SET_ST:
+		return vfio_pci_tph_set_st(vdev, &op, uarg + minsz);
 	default:
 		/* Other ops are not implemented yet */
 		return -EINVAL;
-- 
2.17.1


* Re: [PATCH v6 2/6] PCI/TPH: Export pcie_tph_get_st_modes() for external use
  2026-05-06  9:46 ` [PATCH v6 2/6] PCI/TPH: Export pcie_tph_get_st_modes() for external use Chengwen Feng
@ 2026-05-06 12:39   ` sashiko-bot
  0 siblings, 0 replies; 13+ messages in thread
From: sashiko-bot @ 2026-05-06 12:39 UTC (permalink / raw)
  To: Chengwen Feng; +Cc: linux-pci

Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:
- [Medium] Missing error handling for pci_read_config_dword() in the newly exported pcie_tph_get_st_modes() leads to falsely reporting all TPH modes as supported when a device is disconnected.
- [Low] The !CONFIG_PCI_TPH stub for pcie_tph_get_st_table_loc() returns an inconsistent out-of-bounds value (0x7FF) instead of PCI_TPH_LOC_NONE.
--

commit d23fc69219a490dd1458be7c3d6f50ae929f70cd
Author: Chengwen Feng <fengchengwen@huawei.com>

PCI/TPH: Export pcie_tph_get_st_modes() for external use

This commit exports the pcie_tph_get_st_modes() helper so drivers like VFIO
can query supported PCIe TPH steering tag modes. It also adds stub functions
for when !CONFIG_PCI_TPH and tph_cap validation.

> diff --git a/drivers/pci/tph.c b/drivers/pci/tph.c
> index f17b74b5fb1e3..ba31b010f67ad 100644
> --- a/drivers/pci/tph.c
> +++ b/drivers/pci/tph.c
[ ... ]
> @@ -145,15 +145,27 @@ static void set_ctrl_reg_req_en(struct pci_dev *pdev, u8 req_type)
> -static u8 get_st_modes(struct pci_dev *pdev)
> +/**
> + * pcie_tph_get_st_modes - Get supported Steering Tag modes
> + * @pdev: PCI device to query
> + *
> + * Return:
> + *  Bitmask of supported ST modes (PCI_TPH_CAP_ST_NS, PCI_TPH_CAP_ST_IV,
> + *                                 PCI_TPH_CAP_ST_DS)
> + */
> +u8 pcie_tph_get_st_modes(struct pci_dev *pdev)
>  {
>  	u32 reg;
>  
> +	if (!pdev->tph_cap)
> +		return 0;
> +
>  	pci_read_config_dword(pdev, pdev->tph_cap + PCI_TPH_CAP, &reg);

If pci_read_config_dword() fails (e.g., when a device is disconnected), it
fills the output variable with an all-ones error response (~0, i.e. 0xFFFFFFFF).

Since the return value is ignored here, will this masking operation:

>  	reg &= PCI_TPH_CAP_ST_NS | PCI_TPH_CAP_ST_IV | PCI_TPH_CAP_ST_DS;

evaluate to 0x07 and incorrectly report all three Steering Tag modes as
supported during a device disconnection?
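
The masking effect asked about above can be reproduced outside the kernel.
The following is a minimal sketch, assuming the PCI_TPH_CAP_ST_* bit values
from include/uapi/linux/pci_regs.h (0x1, 0x2, 0x4). st_modes_from_reg() and
st_modes_checked() are hypothetical helper names, and treating an all-ones
dword as a failed read is one possible policy, not what the patch does.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed to match include/uapi/linux/pci_regs.h */
#define PCI_TPH_CAP_ST_NS 0x00000001u
#define PCI_TPH_CAP_ST_IV 0x00000002u
#define PCI_TPH_CAP_ST_DS 0x00000004u
#define ST_MODE_MASK (PCI_TPH_CAP_ST_NS | PCI_TPH_CAP_ST_IV | PCI_TPH_CAP_ST_DS)

static_assert(ST_MODE_MASK == 0x7u, "the three mode bits occupy bits 0..2");

/* Hypothetical helper: same masking as pcie_tph_get_st_modes() in the patch */
static uint8_t st_modes_from_reg(uint32_t reg)
{
	return (uint8_t)(reg & ST_MODE_MASK);
}

/*
 * Hypothetical variant: report "no modes" when the dword looks like the
 * all-ones error response of a failed config read.
 */
static uint8_t st_modes_checked(uint32_t reg)
{
	if (reg == 0xFFFFFFFFu)
		return 0;
	return st_modes_from_reg(reg);
}
```

With these helpers, st_modes_from_reg(0xFFFFFFFF) yields 0x07 (all three
modes), while st_modes_checked(0xFFFFFFFF) yields 0. In the kernel, checking
the return value of pci_read_config_dword() would be the more direct option.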

>  
>  	return reg;
>  }
> +EXPORT_SYMBOL(pcie_tph_get_st_modes);

[ ... ]

> diff --git a/include/linux/pci-tph.h b/include/linux/pci-tph.h
> index be68cd17f2f83..586c75b19e019 100644
> --- a/include/linux/pci-tph.h
> +++ b/include/linux/pci-tph.h
[ ... ]
> @@ -41,6 +42,12 @@ static inline int pcie_tph_get_cpu_st(struct pci_dev *dev,
>  static inline void pcie_disable_tph(struct pci_dev *pdev) { }
>  static inline int pcie_enable_tph(struct pci_dev *pdev, int mode)
>  { return -EINVAL; }
> +static inline u16 pcie_tph_get_st_table_size(struct pci_dev *pdev)
> +{ return 0; }
> +static inline u32 pcie_tph_get_st_table_loc(struct pci_dev *pdev)
> +{ return 0x7FF; /* Values that do not appear in normal case */ }

When CONFIG_PCI_TPH is enabled but a device lacks the TPH capability, the
non-stub function safely returns PCI_TPH_LOC_NONE.

Does returning 0x7FF in this stub introduce an inconsistency in the API that
forces callers to handle arbitrary magic numbers instead of relying on
PCI_TPH_LOC_NONE?
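
To make the concern concrete, here is a small sketch of how a caller that
only tests for PCI_TPH_LOC_NONE would react to each stub value. The
PCI_TPH_LOC_* values are assumed to match include/uapi/linux/pci_regs.h, and
stub_loc_as_posted(), stub_loc_none() and has_st_table() are hypothetical
names used only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed to match include/uapi/linux/pci_regs.h */
#define PCI_TPH_LOC_NONE 0x00000000u
#define PCI_TPH_LOC_CAP  0x00000200u
#define PCI_TPH_LOC_MSIX 0x00000400u

static_assert(PCI_TPH_LOC_NONE == 0u, "NONE is the all-zero location");

/* Stub value as posted in the patch for !CONFIG_PCI_TPH */
static uint32_t stub_loc_as_posted(void)
{
	return 0x7FF;
}

/* Hypothetical alternative stub that reuses the documented constant */
static uint32_t stub_loc_none(void)
{
	return PCI_TPH_LOC_NONE;
}

/* Hypothetical caller: anything other than NONE means "a table exists" */
static bool has_st_table(uint32_t loc)
{
	return loc != PCI_TPH_LOC_NONE;
}
```

Under these assumptions, has_st_table(stub_loc_as_posted()) is true even
though TPH is compiled out, whereas has_st_table(stub_loc_none()) is false.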

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260506094623.29327-1-fengchengwen@huawei.com?part=2

* Re: [PATCH v6 1/6] PCI/TPH: Fix pcie_tph_get_st_table_loc() field extraction
  2026-05-06  9:46 ` [PATCH v6 1/6] PCI/TPH: Fix pcie_tph_get_st_table_loc() field extraction Chengwen Feng
@ 2026-05-06 15:54   ` Bjorn Helgaas
  0 siblings, 0 replies; 13+ messages in thread
From: Bjorn Helgaas @ 2026-05-06 15:54 UTC (permalink / raw)
  To: Chengwen Feng
  Cc: alex, jgg, wathsala.vithanage, wei.huang2, wangzhou1,
	wangyushan12, liuyonglong, kvm, linux-pci

On Wed, May 06, 2026 at 05:46:18PM +0800, Chengwen Feng wrote:
> pcie_tph_get_st_table_loc() incorrectly uses FIELD_GET(), which shifts the
> field value to bit 0. But the function is designed to return raw
> PCI_TPH_LOC_* values as defined in the function comment.
> 
> This causes incorrect ST table location detection. Fix it by using bitwise
> AND with PCI_TPH_CAP_LOC_MASK to return the unshifted field value matching
> the function specification.
> 
> While this change appears to be a no-op within tph.c, the external caller
> mlx5_st_create() relies on the documented function behavior, making this
> fix necessary.

Previously, pcie_tph_get_st_table_loc() returned 0x0 (for
PCI_TPH_LOC_NONE), 0x1 (for PCI_TPH_LOC_CAP), or 0x2 (for
PCI_TPH_LOC_MSIX), because FIELD_GET() shifts the field at bits 10:9
(PCI_TPH_CAP_LOC_MASK, 0x600) down to bit 0.

mlx5_st_create() is currently the only external caller.  It only
checks for PCI_TPH_LOC_NONE, which is 0 regardless of the FIELD_GET(),
so I don't think this actually fixes mlx5_st_create().

Probably still worth a stable backport because other drivers may call
pcie_tph_get_st_table_loc() in the future, and if they depend on
PCI_TPH_LOC_CAP or PCI_TPH_LOC_MSIX, and are backported to stable
kernels, they will need this.

I might reword that last paragraph to avoid the implication that this
actually fixes something in mlx5:

  This doesn't make a difference to mlx5_st_create(), the lone
  external caller, because it only checks for PCI_TPH_LOC_NONE (0),
  but will be needed for callers that check for PCI_TPH_LOC_CAP or
  PCI_TPH_LOC_MSIX.
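
The point about which comparisons are affected can be checked with a small
userspace sketch. It assumes the mask and PCI_TPH_LOC_* values from
include/uapi/linux/pci_regs.h; loc_shifted() imitates what FIELD_GET() does
for this mask and loc_raw() imitates the fixed code. Both helper names and
PCI_TPH_CAP_LOC_SHIFT are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed to match include/uapi/linux/pci_regs.h */
#define PCI_TPH_CAP_LOC_MASK 0x00000600u
#define PCI_TPH_LOC_NONE     0x00000000u
#define PCI_TPH_LOC_CAP      0x00000200u
#define PCI_TPH_LOC_MSIX     0x00000400u

/* Hypothetical: bit position of the lowest set bit of the mask */
#define PCI_TPH_CAP_LOC_SHIFT 9

static_assert(((PCI_TPH_CAP_LOC_MASK >> PCI_TPH_CAP_LOC_SHIFT) & 1u) == 1u,
	      "shift must point at the lowest bit of the mask");

/* Old behaviour: field value shifted down to bit 0, like FIELD_GET() */
static uint32_t loc_shifted(uint32_t reg)
{
	return (reg & PCI_TPH_CAP_LOC_MASK) >> PCI_TPH_CAP_LOC_SHIFT;
}

/* Fixed behaviour: unshifted field, directly comparable to PCI_TPH_LOC_* */
static uint32_t loc_raw(uint32_t reg)
{
	return reg & PCI_TPH_CAP_LOC_MASK;
}
```

For a register with no table, both helpers return PCI_TPH_LOC_NONE, so a
NONE-only check behaves the same either way. For a register whose location
field encodes CAP or MSIX, only loc_raw() compares equal to the matching
PCI_TPH_LOC_* constant.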

> Fixes: d2e8a34876ce ("PCI/TPH: Add Steering Tag support")
> Cc: stable@vger.kernel.org
> Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
> Reviewed-by: Alex Williamson <alex.williamson@nvidia.com>

Reviewed-by: Bjorn Helgaas <bhelgaas@google.com>

Feel free to merge along with the rest, I assume via the VFIO tree.

> ---
>  drivers/pci/tph.c | 8 +-------
>  1 file changed, 1 insertion(+), 7 deletions(-)
> 
> diff --git a/drivers/pci/tph.c b/drivers/pci/tph.c
> index 91145e8d9d95..f17b74b5fb1e 100644
> --- a/drivers/pci/tph.c
> +++ b/drivers/pci/tph.c
> @@ -170,7 +170,7 @@ u32 pcie_tph_get_st_table_loc(struct pci_dev *pdev)
>  
>  	pci_read_config_dword(pdev, pdev->tph_cap + PCI_TPH_CAP, &reg);
>  
> -	return FIELD_GET(PCI_TPH_CAP_LOC_MASK, reg);
> +	return reg & PCI_TPH_CAP_LOC_MASK;
>  }
>  EXPORT_SYMBOL(pcie_tph_get_st_table_loc);
>  
> @@ -183,11 +183,7 @@ u16 pcie_tph_get_st_table_size(struct pci_dev *pdev)
>  	u32 reg;
>  	u32 loc;
>  
> -	/* Check ST table location first */
>  	loc = pcie_tph_get_st_table_loc(pdev);
> -
> -	/* Convert loc to match with PCI_TPH_LOC_* defined in pci_regs.h */
> -	loc = FIELD_PREP(PCI_TPH_CAP_LOC_MASK, loc);
>  	if (loc != PCI_TPH_LOC_CAP)
>  		return 0;
>  
> @@ -316,8 +312,6 @@ int pcie_tph_set_st_entry(struct pci_dev *pdev, unsigned int index, u16 tag)
>  	set_ctrl_reg_req_en(pdev, PCI_TPH_REQ_DISABLE);
>  
>  	loc = pcie_tph_get_st_table_loc(pdev);
> -	/* Convert loc to match with PCI_TPH_LOC_* */
> -	loc = FIELD_PREP(PCI_TPH_CAP_LOC_MASK, loc);
>  
>  	switch (loc) {
>  	case PCI_TPH_LOC_MSIX:
> -- 
> 2.17.1
> 

* Re: [PATCH v6 3/6] vfio/pci: Add PCIe TPH interface with capability query
  2026-05-06  9:46 ` [PATCH v6 3/6] vfio/pci: Add PCIe TPH interface with capability query Chengwen Feng
@ 2026-05-06 18:48   ` sashiko-bot
  0 siblings, 0 replies; 13+ messages in thread
From: sashiko-bot @ 2026-05-06 18:48 UTC (permalink / raw)
  To: Chengwen Feng; +Cc: linux-pci

Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
- [High] The VFIO_DEVICE_PCI_TPH IOCTL reads PCI configuration space without ensuring the device is runtime resumed, potentially reporting bogus capabilities and causing hardware errors.
--

commit c1b7fcc4c0c41ea45b0bcdd16ef1d416fcc244c3
Author: Chengwen Feng <fengchengwen@huawei.com>

vfio/pci: Add PCIe TPH interface with capability query

This commit adds the VFIO_DEVICE_PCI_TPH IOCTL to allow userspace to query
device TPH capabilities. It also introduces a module parameter to restrict
unsafe device-specific TPH mode to trusted userspace.

> diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
> index 3f8d093aacf8a..d98d04bad4a39 100644
> --- a/drivers/vfio/pci/vfio_pci_core.c
> +++ b/drivers/vfio/pci/vfio_pci_core.c
[ ... ]
> @@ -1461,6 +1463,53 @@ static int vfio_pci_ioctl_ioeventfd(struct vfio_pci_core_device *vdev,
>  				  ioeventfd.fd);
>  }
>  
> +static int vfio_pci_tph_get_cap(struct vfio_pci_core_device *vdev,
> +				struct vfio_device_pci_tph_op *op,
> +				void __user *uarg)
> +{
> +	struct pci_dev *pdev = vdev->pdev;
> +	u8 mode = pcie_tph_get_st_modes(pdev);

Does this ioctl need to ensure the device is runtime resumed before reading
its configuration space?

If the device is in a low-power state (like D3cold), won't the configuration
reads in pcie_tph_get_st_modes() return 0xFFFFFFFF?

This could cause the kernel to falsely report that all TPH modes are
supported and advertise an incorrect ST table size.

Should this handler (or vfio_pci_ioctl_tph) call
pm_runtime_resume_and_get() before invoking hardware-touching operations,
and pm_runtime_put() afterward?

> +	struct vfio_pci_tph_cap cap = {0};
> +
> +	if (op->argsz < offsetof(struct vfio_device_pci_tph_op, cap) +
> +			sizeof(struct vfio_pci_tph_cap))
> +		return -EINVAL;
> +
> +	if (mode == 0 || mode == PCI_TPH_CAP_ST_NS)
> +		return -EOPNOTSUPP;
[ ... ]

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260506094623.29327-1-fengchengwen@huawei.com?part=3

* Re: [PATCH v6 4/6] vfio/pci: Add PCIe TPH enable/disable support
  2026-05-06  9:46 ` [PATCH v6 4/6] vfio/pci: Add PCIe TPH enable/disable support Chengwen Feng
@ 2026-05-06 19:21   ` sashiko-bot
  0 siblings, 0 replies; 13+ messages in thread
From: sashiko-bot @ 2026-05-06 19:21 UTC (permalink / raw)
  To: Chengwen Feng; +Cc: linux-pci

Thank you for your contribution! Sashiko AI review found 4 potential issue(s) to consider:
- [High] Incorrect `argsz` validation in `vfio_pci_tph_enable()` incorrectly adds the structure size twice, breaking the UAPI.
- [High] Missing locking in `vfio_pci_ioctl_tph()` leads to config space and `pci_dev` bitfield data corruption.
- [High] Disabling TPH after registering the device exposes it to userspace races.
- [High] TPH state is not explicitly disabled when the VFIO device is closed, leading to state inconsistencies and subsequent failures.
--

commit c8b020c06b71ba6da131d464466e50c2b03f622b
Author: Chengwen Feng <fengchengwen@huawei.com>

vfio/pci: Add PCIe TPH enable/disable support

This patch adds support to enable and disable the TPH function with mode
selection. It restricts the unsafe device-specific TPH mode via a module
parameter and disables TPH when taking over device ownership.

> diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
> index d98d04bad4a39..69f666d20c4af 100644
> --- a/drivers/vfio/pci/vfio_pci_core.c
> +++ b/drivers/vfio/pci/vfio_pci_core.c

[ ... ]

> @@ -1492,6 +1492,43 @@ static int vfio_pci_tph_get_cap(struct vfio_pci_core_device *vdev,
>  	return 0;
>  }
>  
> +static int vfio_pci_tph_enable(struct vfio_pci_core_device *vdev,
> +			      struct vfio_device_pci_tph_op *op,
> +			      void __user *uarg)
> +{
> +	struct pci_dev *pdev = vdev->pdev;
> +	struct vfio_pci_tph_ctrl ctrl;
> +	int mode;
> +
> +	if (op->argsz < offsetofend(struct vfio_device_pci_tph_op, ctrl) +
> +			sizeof(struct vfio_pci_tph_ctrl))
> +		return -EINVAL;

Since offsetofend(TYPE, MEMBER) already includes the size of the ctrl member,
does adding sizeof(struct vfio_pci_tph_ctrl) a second time require more
bytes than the actual structure size?

Could this cause the check to reject valid userspace calls that provide the
exact structure size?

> +
> +	if (copy_from_user(&ctrl, uarg, sizeof(ctrl)))
> +		return -EFAULT;
> +
> +	if (ctrl.mode != VFIO_PCI_TPH_MODE_IV &&
> +	    ctrl.mode != VFIO_PCI_TPH_MODE_DS)
> +		return -EINVAL;
> +
> +	if (ctrl.mode == VFIO_PCI_TPH_MODE_DS && !enable_unsafe_tph_ds_mode)
> +		return -EOPNOTSUPP;
> +
> +	/* Reserved must be zero */
> +	if (memchr_inv(ctrl.reserved, 0, sizeof(ctrl.reserved)))
> +		return -EINVAL;
> +
> +	mode = (ctrl.mode == VFIO_PCI_TPH_MODE_IV) ? PCI_TPH_ST_IV_MODE :
> +						     PCI_TPH_ST_DS_MODE;
> +	return pcie_enable_tph(pdev, mode);
> +}

If userspace enables TPH but crashes or closes the file descriptor without
manually disabling it, is the TPH state left enabled since the release path
does not call pcie_disable_tph()?

If the device is opened again, could subsequent attempts to enable TPH fail
with -EBUSY because pcie_enable_tph() sees it already enabled?

> +
> +static int vfio_pci_tph_disable(struct vfio_pci_core_device *vdev)
> +{
> +	pcie_disable_tph(vdev->pdev);
> +	return 0;
> +}
> +
>  static int vfio_pci_ioctl_tph(struct vfio_pci_core_device *vdev,
>  			      void __user *uarg)
>  {
> @@ -1504,6 +1541,10 @@ static int vfio_pci_ioctl_tph(struct vfio_pci_core_device *vdev,
>  	switch (op.op) {
>  	case VFIO_PCI_TPH_GET_CAP:
>  		return vfio_pci_tph_get_cap(vdev, &op, uarg + minsz);
> +	case VFIO_PCI_TPH_ENABLE:
> +		return vfio_pci_tph_enable(vdev, &op, uarg + minsz);
> +	case VFIO_PCI_TPH_DISABLE:
> +		return vfio_pci_tph_disable(vdev);

Because vfio_pci_ioctl_tph() handles these ioctls locklessly without
holding vdev->igate, does concurrently calling pcie_enable_tph() cause a
data race on the pdev->tph_enabled bitfield?

Could this concurrently overwrite adjacent critical bitfields in
struct pci_dev like msi_enabled or is_managed?

>  	default:
>  		/* Other ops are not implemented yet */
>  		return -EINVAL;

[ ... ]

> @@ -2259,6 +2300,8 @@ int vfio_pci_core_register_device(struct vfio_pci_core_device *vdev)
>  	ret = vfio_register_group_dev(&vdev->vdev);
>  	if (ret)
>  		goto out_power;
> +	/* Disable TPH when taking over ownership of the device */
> +	pcie_disable_tph(pdev);

Is it safe to call pcie_disable_tph() after vfio_register_group_dev()?

Since the device becomes visible to userspace once registered, could a
userspace process instantly open the device and enable TPH, racing with this
initialization thread and overwriting the user's configuration?

Should hardware state initialization happen before publishing the device to
userspace?

>  	return 0;
>  
>  out_power:

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260506094623.29327-1-fengchengwen@huawei.com?part=4

* Re: [PATCH v6 5/6] vfio/pci: Add PCIe TPH GET_ST interface
  2026-05-06  9:46 ` [PATCH v6 5/6] vfio/pci: Add PCIe TPH GET_ST interface Chengwen Feng
@ 2026-05-06 19:57   ` sashiko-bot
  0 siblings, 0 replies; 13+ messages in thread
From: sashiko-bot @ 2026-05-06 19:57 UTC (permalink / raw)
  To: Chengwen Feng; +Cc: linux-pci

Thank you for your contribution! Sashiko AI review found 3 potential issue(s) to consider:
- [High] The `argsz` validation incorrectly double-counts the size of the union member, causing valid requests to be rejected.
- [High] Concurrent TPH ioctls lead to data races on unprotected PCIe TPH state variables and device configuration space.
- [Medium] The unused `index` field in `struct vfio_pci_tph_entry` is not validated to be zero during `VFIO_PCI_TPH_GET_ST`.
--

commit 69a04a0a6e943f789b3703368fa523b4efd43755
Author: Chengwen Feng <fengchengwen@huawei.com>

vfio/pci: Add PCIe TPH GET_ST interface

This commit adds support to batch get CPU steering tags for device-specific
TPH mode that does not implement an ST table. This interface requires
enabling the enable_unsafe_tph_ds_mode module parameter.

> diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
> index 69f666d20c4af..45e641ab2a887 100644
> --- a/drivers/vfio/pci/vfio_pci_core.c
> +++ b/drivers/vfio/pci/vfio_pci_core.c
[ ... ]
> @@ -1529,6 +1529,77 @@ static int vfio_pci_tph_disable(struct vfio_pci_core_device *vdev)
>  	return 0;
>  }
>  
> +static int vfio_pci_tph_get_st(struct vfio_pci_core_device *vdev,
> +			       struct vfio_device_pci_tph_op *op,
> +			       void __user *uarg)
> +{
[ ... ]
> +	if (!st.count || st.count > VFIO_PCI_TPH_MAX_ENTRIES)
> +		return -EINVAL;
> +
> +	size = st.count * sizeof(*ents);
> +	if (op->argsz < offsetofend(struct vfio_device_pci_tph_op, st) +
> +			sizeof(struct vfio_pci_tph_st) + size)
> +		return -EINVAL;

Is it possible the size calculation double-counts
sizeof(struct vfio_pci_tph_st)?

The macro offsetofend(TYPE, MEMBER) evaluates to offsetof(TYPE, MEMBER) +
sizeof_field(TYPE, MEMBER). For the st member, offsetofend() correctly
calculates the end boundary. Adding sizeof(struct vfio_pci_tph_st) a
second time makes the expected minimum argsz larger than it should be.

If userspace passes the properly calculated size, will the kernel treat
argsz as too small and reject valid ioctl calls with -EINVAL?

(Note: The identical calculation error appears to be present in
vfio_pci_tph_enable() introduced in a prior patch.)
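
The arithmetic can be illustrated with a standalone sketch. The struct
layouts below (demo_op, demo_st, demo_entry) are hypothetical stand-ins, since
the real UAPI definitions are not quoted here; only the offsetofend()
definition follows the description above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same meaning as the kernel macro: offset of the first byte after MEMBER */
#define offsetofend(TYPE, MEMBER) \
	(offsetof(TYPE, MEMBER) + sizeof(((TYPE *)0)->MEMBER))

/* Hypothetical layouts, for illustration only */
struct demo_entry { uint32_t cpu; uint16_t index; uint16_t st; };
struct demo_st    { uint32_t count; uint32_t reserved; };
struct demo_op    { uint32_t argsz; uint32_t op; struct demo_st st; };

static_assert(offsetofend(struct demo_op, st) == sizeof(struct demo_op),
	      "st is the last member, so its end is the end of the struct");

/* Minimum argsz as computed by the posted check */
static size_t min_argsz_as_posted(uint32_t count)
{
	return offsetofend(struct demo_op, st) + sizeof(struct demo_st) +
	       (size_t)count * sizeof(struct demo_entry);
}

/* Minimum argsz without counting the st header twice */
static size_t min_argsz_expected(uint32_t count)
{
	return offsetofend(struct demo_op, st) +
	       (size_t)count * sizeof(struct demo_entry);
}
```

In this sketch the posted formula always demands sizeof(struct demo_st) more
bytes than a caller that sizes its buffer as header plus entries would
provide, so such a caller would fail the argsz check.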

> +
> +	ents = kvmalloc(size, GFP_KERNEL);
> +	if (!ents)
> +		return -ENOMEM;
> +
> +	ents_off = offsetof(struct vfio_pci_tph_st, ents);
> +	if (copy_from_user(ents, uarg + ents_off, size)) {
> +		err = -EFAULT;
> +		goto out;
> +	}
> +
> +	for (i = 0; i < st.count; i++) {
> +		/* Check reserved fields are zero */
> +		if (memchr_inv(&ents[i].reserved0, 0, sizeof(ents[i].reserved0)) ||
> +		    memchr_inv(&ents[i].reserved1, 0, sizeof(ents[i].reserved1))) {
> +			err = -EINVAL;
> +			goto out;
> +		}

Does this code need to validate that the unused index field in
struct vfio_pci_tph_entry is zero?

While reserved0 and reserved1 are checked, the index field appears to
be ignored during GET_ST operations. Because userspace is not required to
zero-fill unused arguments, this field may contain uninitialized garbage.

If a future kernel extension assigns meaning to index for GET_ST, will
this lack of validation break backwards compatibility by making it
impossible to distinguish intentional new values from uninitialized data in
older applications?

[ ... ]
>  static int vfio_pci_ioctl_tph(struct vfio_pci_core_device *vdev,
>  			      void __user *uarg)
>  {
> @@ -1545,6 +1616,8 @@ static int vfio_pci_ioctl_tph(struct vfio_pci_core_device *vdev,
>  		return vfio_pci_tph_enable(vdev, &op, uarg + minsz);
>  	case VFIO_PCI_TPH_DISABLE:
>  		return vfio_pci_tph_disable(vdev);
> +	case VFIO_PCI_TPH_GET_ST:
> +		return vfio_pci_tph_get_st(vdev, &op, uarg + minsz);
>  	default:
>  		/* Other ops are not implemented yet */
>  		return -EINVAL;

Are concurrent TPH ioctls adequately serialized here?

The vfio_pci_ioctl_tph() handler processes requests without acquiring any
serialization locks (such as vdev->igate). If userspace issues multiple
TPH ioctls concurrently, vfio_pci_tph_enable(), vfio_pci_tph_disable(),
and vfio_pci_tph_get_st() can execute in parallel.

The underlying PCI core functions pcie_enable_tph() and pcie_disable_tph()
perform lockless read-modify-write operations on the PCI_TPH_CTRL config
register and mutate pdev->tph_mode, pdev->tph_req_type, and
pdev->tph_enabled. Simultaneously, vfio_pci_tph_get_st() invokes
pcie_tph_get_cpu_st(), which relies on these state variables.

Can this lead to data races on unprotected PCIe TPH state variables,
resulting in corrupted device configuration space or inconsistent state
reads?

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260506094623.29327-1-fengchengwen@huawei.com?part=5

* Re: [PATCH v6 6/6] vfio/pci: Add PCIe TPH SET_ST interface
  2026-05-06  9:46 ` [PATCH v6 6/6] vfio/pci: Add PCIe TPH SET_ST interface Chengwen Feng
@ 2026-05-06 20:22   ` sashiko-bot
  0 siblings, 0 replies; 13+ messages in thread
From: sashiko-bot @ 2026-05-06 20:22 UTC (permalink / raw)
  To: Chengwen Feng; +Cc: linux-pci

Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:
- [High] Incorrect `argsz` size validation unconditionally rejects valid ioctl requests.
- [Medium] The unused `st` field in `struct vfio_pci_tph_entry` is not validated for being zero during SET_ST, breaking forward ABI compatibility.
--

commit cad3760ab856857766102c878427138245e1feb6
Author: Chengwen Feng <fengchengwen@huawei.com>

vfio/pci: Add PCIe TPH SET_ST interface

This commit adds the VFIO_PCI_TPH_SET_ST operation to support batch
programming of steering tag entries. It also implements a rollback
mechanism to clear successfully programmed entries if any entry fails
to prevent inconsistent device state.

> diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
> index 45e641ab2a887..867d8694b56bd 100644
> --- a/drivers/vfio/pci/vfio_pci_core.c
> +++ b/drivers/vfio/pci/vfio_pci_core.c
[ ... ]
> @@ -1600,6 +1600,90 @@ static int vfio_pci_tph_get_st(struct vfio_pci_core_device *vdev,
>  	return err;
>  }
>  
> +static int vfio_pci_tph_set_st(struct vfio_pci_core_device *vdev,
> +			       struct vfio_device_pci_tph_op *op,
> +			       void __user *uarg)
> +{
> +	struct pci_dev *pdev = vdev->pdev;
> +	struct vfio_pci_tph_entry *ents;
> +	struct vfio_pci_tph_st st;
[ ... ]
> +	size = st.count * sizeof(*ents);
> +	if (op->argsz < offsetofend(struct vfio_device_pci_tph_op, st) +
> +			sizeof(struct vfio_pci_tph_st) + size)
> +		return -EINVAL;

Does adding sizeof(struct vfio_pci_tph_st) artificially inflate the required
buffer size?

The offsetofend() macro already includes the size of the st member. If a
properly written userspace application computes its argsz exactly as
sizeof(struct vfio_device_pci_tph_op) + size, this check would fail and
reject the request with -EINVAL.

> +
> +	ents = kvmalloc(size, GFP_KERNEL);
> +	if (!ents)
> +		return -ENOMEM;
> +
> +	ents_off = offsetof(struct vfio_pci_tph_st, ents);
> +	if (copy_from_user(ents, uarg + ents_off, size)) {
> +		err = -EFAULT;
> +		goto out;
> +	}
> +
> +	for (; i < st.count; i++) {
> +		/* Check reserved fields are zero */
> +		if (memchr_inv(&ents[i].reserved0, 0, sizeof(ents[i].reserved0)) ||
> +		    memchr_inv(&ents[i].reserved1, 0, sizeof(ents[i].reserved1))) {
> +			err = -EINVAL;
> +			goto out;
> +		}

During the SET_ST operation, the st field in struct vfio_pci_tph_entry
appears to be unused, as the steering tag is instead dynamically fetched
via pcie_tph_get_cpu_st() later in the loop.

Should the st field also be validated to be zero?

If userspace leaves this unused field uninitialized and passes garbage data,
it might break forward ABI compatibility. A future kernel update that attempts
to read explicit steering tags directly from this field wouldn't be able to
distinguish between old programs sending uninitialized garbage and new
programs sending valid data.
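
A minimal sketch of the stricter validation suggested above follows. The
demo_entry layout is a hypothetical stand-in for struct vfio_pci_tph_entry
(only the field names come from the quoted patch), and
demo_check_set_st_entry() is an illustrative helper, not code from the series.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout; field names follow the quoted patch */
struct demo_entry {
	uint32_t cpu;
	uint8_t  mem_type;
	uint8_t  reserved0;
	uint16_t index;
	uint16_t st;
	uint8_t  reserved1[2];
};

/*
 * Reject a SET_ST entry if any reserved byte, or the st field that
 * SET_ST does not consume, is non-zero.
 */
static int demo_check_set_st_entry(const struct demo_entry *e)
{
	assert(e != NULL);

	if (e->reserved0 || e->reserved1[0] || e->reserved1[1])
		return -EINVAL;
	if (e->st)
		return -EINVAL;
	return 0;
}
```

Rejecting non-zero values now leaves room for a later kernel to give the
field a meaning without misreading data from older programs.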

[ ... ]

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260506094623.29327-1-fengchengwen@huawei.com?part=6
