* [PATCH v16 00/12] vfio/pci: Add PCIe TPH support
@ 2026-06-04 13:27 Chengwen Feng
From: Chengwen Feng @ 2026-06-04 13:27 UTC
  To: alex, jgg
  Cc: wathsala.vithanage, helgaas, wei.huang2, zhipingz, wangzhou1,
	wangyushan12, liuyonglong, kvm, linux-pci

This patchset enables fully userspace-configurable PCIe TPH (TLP Processing
Hints) support in VFIO. This brings performance benefits to userspace
workloads such as DPDK and SPDK.

Currently VFIO exposes the TPH capability registers to userspace as
read-only and silently discards all writes. This prevents userspace from
enabling and configuring TPH, and it limits performance optimization
opportunities.

Per PCIe spec section 7.5.3.15, TPH Completer support applies to Root
Ports and Endpoints. Steering Tags can therefore target either host CPUs
or peer devices (for P2P transactions).

The TPH usage model can be divided into three fundamental parts:
1. Retrieve Steering Tags:
   - Tags targeting host CPUs are obtained via platform methods (ACPI _DSM)
     wrapped by pcie_tph_get_cpu_st(). Userspace needs a generic interface
     to query these CPU-associated ST values.
   - Tags targeting peer devices are managed by userspace drivers.
2. Program the Steering Tag table:
   - For devices with a standard ST table (in the TPH capability or in the
     MSI-X table), userspace needs a unified interface to configure ST
     entries.
   - Devices without a standard ST table are handled by userspace itself.
3. Toggle the device's TPH Requester enable state.

To support these scenarios, this series extends VFIO with complete TPH
virtualization:
- New device feature TPH_CPU_ST: a batch query interface that resolves
  CPU-specific Steering Tags.
- New device feature TPH_ST_CONFIG: a batch configuration interface for
  device ST table entries, with a shadow cache and atomic rollback.
- Full TPH capability register virtualization: userspace can toggle the
  TPH Requester state by writing the TPH_CTRL register.

To guarantee isolation and security, this patchset uses a two-level
safety gate design with careful ABI considerations:
1. Global unsafe gate:
   TPH caching behavior may cross isolation domains and affect shared
   platform resources. A new module parameter `enable_unsafe_tph`
   (default off) globally gates all TPH functionality.
2. Per-device opt-in gate:
   To preserve strict ABI compatibility and avoid unexpected hardware
   state changes for existing users, a new VFIO device feature TPH_ENABLE
   is added. TPH functionality is only available after userspace
   explicitly enables it for a given device.
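The following is a minimal userspace model of how the two gates compose. It assumes that every TPH operation checks both gates. The names `vdev_model`, `model_tph_enable_feature()` and `model_tph_usable()` are illustrative assumptions, as are the errno values returned; none of this is code from the series:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Gate 1: models the enable_unsafe_tph module parameter (default off). */
static bool enable_unsafe_tph;

/* Hypothetical per-device state; not the series' real structure. */
struct vdev_model {
	bool tph_supported;  /* device exposes a usable TPH capability */
	bool tph_opted_in;   /* userspace has set the TPH_ENABLE feature */
};

/* Gate 2: per-device opt-in, only possible while gate 1 is open. */
static int model_tph_enable_feature(struct vdev_model *v)
{
	if (!enable_unsafe_tph)
		return -EPERM;       /* assumption: errno chosen for illustration */
	if (!v->tph_supported)
		return -EOPNOTSUPP;
	v->tph_opted_in = true;
	return 0;
}

/* Every TPH operation (ST query, ST config, TPH_CTRL write) checks both. */
static bool model_tph_usable(const struct vdev_model *v)
{
	return enable_unsafe_tph && v->tph_opted_in;
}
```

With the module parameter off, a device stays unusable even when userspace asks for it. Existing users who never set TPH_ENABLE see no behavior change.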

The kernel PCI TPH implementation requires the TPH Requester to be enabled
before ST entries are programmed. To let userspace configure the ST table
in any order, a shadow ST table buffers ST writes made before TPH is
enabled. All cached entries are flushed to hardware when the TPH Requester
is turned on. The shadow table also provides atomic batch rollback for
reliable configuration.
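Below is a simplified sketch of the shadow table behavior, written as a standalone userspace model. The struct and function names and the table size are hypothetical. In this model only an out-of-range index can make a batch fail; the real VFIO code must also handle hardware write errors:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define MODEL_ST_ENTRIES 8  /* hypothetical ST table size */

struct st_model {
	bool     tph_enabled;               /* TPH Requester state */
	uint16_t hw[MODEL_ST_ENTRIES];      /* stands in for the device ST table */
	uint16_t shadow[MODEL_ST_ENTRIES];  /* buffered userspace writes */
};

struct st_write {
	unsigned int index;
	uint16_t     tag;
};

/* Apply a batch atomically: either every entry lands or none does. */
static int st_config_batch(struct st_model *m, const struct st_write *w,
			   size_t n)
{
	uint16_t saved[MODEL_ST_ENTRIES];
	size_t i;

	memcpy(saved, m->shadow, sizeof(saved));
	for (i = 0; i < n; i++) {
		if (w[i].index >= MODEL_ST_ENTRIES) {
			/* Roll back every entry written earlier in this batch. */
			memcpy(m->shadow, saved, sizeof(saved));
			return -EINVAL;
		}
		m->shadow[w[i].index] = w[i].tag;
	}
	/* With TPH already on, the committed batch goes to hardware now. */
	if (m->tph_enabled)
		memcpy(m->hw, m->shadow, sizeof(m->hw));
	return 0;
}

/* Turning the Requester on flushes all cached entries to hardware. */
static void st_enable_tph(struct st_model *m)
{
	m->tph_enabled = true;
	memcpy(m->hw, m->shadow, sizeof(m->hw));
}
```

Writes go to the shadow copy first and are committed only after the whole batch validates. Userspace can therefore program entries before or after enabling TPH without leaving a half-applied batch behind.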

The patchset is split into two logical parts. The first seven patches fix
and refactor the core PCI/TPH code and export the required helper
interfaces. The remaining five patches implement the VFIO TPH
virtualization layer step by step.

Patch breakdown:
 1/12 PCI/TPH: Fix pcie_tph_get_st_table_loc() field extraction
 2/12 PCI/TPH: Fix tph_enabled concurrent update race by bitfield packing
 3/12 PCI/TPH: Cache TPH requester capability at probe time
 4/12 PCI/TPH: Refactor pcie_enable_tph & add explicit requester variant
 5/12 PCI/TPH: Refactor pcie_tph_get_cpu_st & add explicit variant
 6/12 PCI/TPH: expose the enabled TPH requester type
 7/12 PCI/TPH: Add pcie_tph_supported() helper to check TPH capability attrs
 8/12 vfio/pci: Hide TPH capability when TPH is unsupported
 9/12 vfio/pci: Add TPH_ENABLE feature skeleton and unsafe module parameter
10/12 vfio/pci: Add TPH_CPU_ST to query CPU's TPH steering tag
11/12 vfio/pci: Add TPH_ST_CONFIG for PCIe TPH ST configuration
12/12 vfio/pci: Virtualize PCIe TPH capability registers

Based on earlier RFC work by Wathsala Vithanage.

---
v16:
- Support per-device opt-in, addressing Alex's comment.
- Split out a separate commit that hides the TPH capability when TPH is
  unsupported.
- Optimize the layout of the TPH fields in struct pci_dev.
- Rework the TPH capability virtualization commit to roll back when a
  write fails.
- Reorder the PCI/TPH commits so the fixes come first.
- Reorganize the cover letter to serve as the starting point for
  discussion.
v15: Address Alex's comments:
- Drop the TPH capability when tph_cap=0
- Use an _explicit suffix instead of adding a policy parameter for
  enabling TPH and getting the TPH ST.
- Make sure ST entries are set while the device is in D0
- Reimplement TPH capability register virtualization
- Other fixes
v14:
- Return PCI_TPH_LOC_NONE when !CONFIG_PCIE_TPH, per Alex's comment
- Fix Sashiko comments:
  - Clear ST shadow state across user session
  - Fix out-of-bounds byte masking in vfio_pci_tph_config_read

Chengwen Feng (11):
  PCI/TPH: Fix pcie_tph_get_st_table_loc() field extraction
  PCI/TPH: Fix tph_enabled concurrent update race by bitfield packing
  PCI/TPH: Cache TPH requester capability at probe time
  PCI/TPH: Refactor pcie_enable_tph & add explicit requester variant
  PCI/TPH: Refactor pcie_tph_get_cpu_st & add explicit variant
  PCI/TPH: Add pcie_tph_supported() helper to check TPH capability
    attributes
  vfio/pci: Hide TPH capability when TPH is unsupported
  vfio/pci: Add TPH_ENABLE feature skeleton and unsafe module parameter
  vfio/pci: Add TPH_CPU_ST to query CPU's TPH steering tag
  vfio/pci: Add TPH_ST_CONFIG for PCIe TPH ST configuration
  vfio/pci: Virtualize PCIe TPH capability registers

Zhiping Zhang (1):
  PCI/TPH: expose the enabled TPH requester type

 drivers/pci/tph.c                  | 220 ++++++++++++++++++++--------
 drivers/vfio/pci/vfio_pci.c        |  13 +-
 drivers/vfio/pci/vfio_pci_config.c | 111 ++++++++++++++
 drivers/vfio/pci/vfio_pci_core.c   | 228 ++++++++++++++++++++++++++++-
 include/linux/pci-tph.h            |  22 +++
 include/linux/pci.h                |   6 +-
 include/linux/vfio_pci_core.h      |   7 +-
 include/uapi/linux/vfio.h          |  60 ++++++++
 8 files changed, 600 insertions(+), 67 deletions(-)

-- 
2.17.1



* [PATCH v16 01/12] PCI/TPH: Fix pcie_tph_get_st_table_loc() field extraction
From: Chengwen Feng @ 2026-06-04 13:27 UTC
  To: alex, jgg
  Cc: wathsala.vithanage, helgaas, wei.huang2, zhipingz, wangzhou1,
	wangyushan12, liuyonglong, kvm, linux-pci

pcie_tph_get_st_table_loc() incorrectly uses FIELD_GET(), which shifts the
field value down to bit 0. However, the function is documented to return
the raw PCI_TPH_LOC_* values.

This breaks ST table location detection for callers that compare the
result against PCI_TPH_LOC_* values. Fix it by masking with
PCI_TPH_CAP_LOC_MASK so the function returns the unshifted field value, as
its specification requires. Also drop the now-redundant FIELD_PREP()
conversions in pcie_tph_get_st_table_size() and pcie_tph_set_st_entry().
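The snippet below is a small userspace illustration of the difference. The register constants are mirrored from include/uapi/linux/pci_regs.h. FIELD_GET() is approximated by a hypothetical `model_field_get()`, which relies on the GCC/Clang `__builtin_ctz()` builtin:

```c
#include <assert.h>
#include <stdint.h>

/* Mirrored from include/uapi/linux/pci_regs.h. */
#define PCI_TPH_CAP_LOC_MASK	0x00000600	/* ST Table Location, bits 10:9 */
#define PCI_TPH_LOC_NONE	0x00000000
#define PCI_TPH_LOC_CAP		0x00000200
#define PCI_TPH_LOC_MSIX	0x00000400

/* Userspace stand-in for FIELD_GET(): mask, then shift down to bit 0. */
static uint32_t model_field_get(uint32_t mask, uint32_t reg)
{
	return (reg & mask) >> __builtin_ctz(mask);
}

/* Old behavior: shifted value (0, 1 or 2), never equal to LOC_CAP/MSIX. */
static uint32_t loc_old(uint32_t cap_reg)
{
	return model_field_get(PCI_TPH_CAP_LOC_MASK, cap_reg);
}

/* Fixed behavior: unshifted value that compares equal to PCI_TPH_LOC_*. */
static uint32_t loc_new(uint32_t cap_reg)
{
	return cap_reg & PCI_TPH_CAP_LOC_MASK;
}
```

Take a Capability register whose ST table sits in the TPH capability (bits 10:9 = 01b). The old code returns 1, but PCI_TPH_LOC_CAP is 0x200, so the comparison fails. PCI_TPH_LOC_NONE is 0 in both forms.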

This makes no difference to mlx5_st_create(), the only external caller,
because it only checks for PCI_TPH_LOC_NONE (0). The fix is needed for
callers that check for PCI_TPH_LOC_CAP or PCI_TPH_LOC_MSIX.

Also make pcie_tph_get_st_table_loc() return PCI_TPH_LOC_NONE when
tph_cap is zero. This avoids reading config space at an invalid offset
when TPH is not supported. Add stub versions of
pcie_tph_get_st_table_size() and pcie_tph_get_st_table_loc() for
!CONFIG_PCIE_TPH.

Fixes: d2e8a34876ce ("PCI/TPH: Add Steering Tag support")
Cc: stable@vger.kernel.org
Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
Reviewed-by: Alex Williamson <alex.williamson@nvidia.com>
Reviewed-by: Bjorn Helgaas <bhelgaas@google.com>
---
 drivers/pci/tph.c       | 12 +++++-------
 include/linux/pci-tph.h |  5 +++++
 2 files changed, 10 insertions(+), 7 deletions(-)

diff --git a/drivers/pci/tph.c b/drivers/pci/tph.c
index 91145e8d9d95..bef3a55539c4 100644
--- a/drivers/pci/tph.c
+++ b/drivers/pci/tph.c
@@ -166,11 +166,14 @@ static u8 get_st_modes(struct pci_dev *pdev)
  */
 u32 pcie_tph_get_st_table_loc(struct pci_dev *pdev)
 {
-	u32 reg;
+	u32 reg = 0;
+
+	if (!pdev->tph_cap)
+		return PCI_TPH_LOC_NONE;
 
 	pci_read_config_dword(pdev, pdev->tph_cap + PCI_TPH_CAP, &reg);
 
-	return FIELD_GET(PCI_TPH_CAP_LOC_MASK, reg);
+	return reg & PCI_TPH_CAP_LOC_MASK;
 }
 EXPORT_SYMBOL(pcie_tph_get_st_table_loc);
 
@@ -185,9 +188,6 @@ u16 pcie_tph_get_st_table_size(struct pci_dev *pdev)
 
 	/* Check ST table location first */
 	loc = pcie_tph_get_st_table_loc(pdev);
-
-	/* Convert loc to match with PCI_TPH_LOC_* defined in pci_regs.h */
-	loc = FIELD_PREP(PCI_TPH_CAP_LOC_MASK, loc);
 	if (loc != PCI_TPH_LOC_CAP)
 		return 0;
 
@@ -316,8 +316,6 @@ int pcie_tph_set_st_entry(struct pci_dev *pdev, unsigned int index, u16 tag)
 	set_ctrl_reg_req_en(pdev, PCI_TPH_REQ_DISABLE);
 
 	loc = pcie_tph_get_st_table_loc(pdev);
-	/* Convert loc to match with PCI_TPH_LOC_* */
-	loc = FIELD_PREP(PCI_TPH_CAP_LOC_MASK, loc);
 
 	switch (loc) {
 	case PCI_TPH_LOC_MSIX:
diff --git a/include/linux/pci-tph.h b/include/linux/pci-tph.h
index be68cd17f2f8..6f02b020d7d7 100644
--- a/include/linux/pci-tph.h
+++ b/include/linux/pci-tph.h
@@ -8,6 +8,7 @@
  */
 #ifndef LINUX_PCI_TPH_H
 #define LINUX_PCI_TPH_H
+#include <linux/pci.h>
 
 /*
  * According to the ECN for PCI Firmware Spec, Steering Tag can be different
@@ -41,6 +42,10 @@ static inline int pcie_tph_get_cpu_st(struct pci_dev *dev,
 static inline void pcie_disable_tph(struct pci_dev *pdev) { }
 static inline int pcie_enable_tph(struct pci_dev *pdev, int mode)
 { return -EINVAL; }
+static inline u16 pcie_tph_get_st_table_size(struct pci_dev *pdev)
+{ return 0; }
+static inline u32 pcie_tph_get_st_table_loc(struct pci_dev *pdev)
+{ return PCI_TPH_LOC_NONE; }
 #endif
 
 #endif /* LINUX_PCI_TPH_H */
-- 
2.17.1



* [PATCH v16 02/12] PCI/TPH: Fix tph_enabled concurrent update race by bitfield packing
From: Chengwen Feng @ 2026-06-04 13:27 UTC
  To: alex, jgg
  Cc: wathsala.vithanage, helgaas, wei.huang2, zhipingz, wangzhou1,
	wangyushan12, liuyonglong, kvm, linux-pci

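Move tph_enabled out of the shared pci_dev flag bitfield and into the
spare bit of tph_cap's u16. tph_cap is immutable after enumeration, and
its offset fits in 15 bits because extended capability offsets are below
0x1000. The remaining bit can therefore hold tph_enabled.

This removes a concurrent write hazard that Sashiko highlighted once TPH
is exposed through VFIO: a read-modify-write of tph_enabled could race
with updates to neighboring flag bits. No functional changes.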
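The sketch below shows the resulting layout, mirroring the patched fields with userspace types. `struct tph_fields` and `offset_fits()` are hypothetical. The only runtime writer of this 16-bit storage unit is tph_enabled, so a read-modify-write of it can only rewrite tph_cap's unchanging bits:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical mirror of the patched struct pci_dev TPH fields. */
struct tph_fields {
	uint16_t tph_cap:15;     /* capability offset, written once at enumeration */
	uint16_t tph_enabled:1;  /* the only bit in this unit mutated at runtime */
	uint8_t  tph_mode;
	uint8_t  tph_req_type;
};

/* Extended capability offsets are below 0x1000, so 15 bits are ample. */
static int offset_fits(unsigned int off)
{
	return off < (1u << 15);
}
```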

Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
---
 include/linux/pci.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/linux/pci.h b/include/linux/pci.h
index 2c4454583c11..109182658f76 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -471,7 +471,6 @@ struct pci_dev {
 	unsigned int	ats_enabled:1;		/* Address Translation Svc */
 	unsigned int	pasid_enabled:1;	/* Process Address Space ID */
 	unsigned int	pri_enabled:1;		/* Page Request Interface */
-	unsigned int	tph_enabled:1;		/* TLP Processing Hints */
 	unsigned int	fm_enabled:1;		/* Flit Mode (segment captured) */
 	unsigned int	is_managed:1;		/* Managed via devres */
 	unsigned int	is_msi_managed:1;	/* MSI release via devres installed */
@@ -589,7 +588,8 @@ struct pci_dev {
 	u8 reset_methods[PCI_NUM_RESET_METHODS]; /* In priority order */
 
 #ifdef CONFIG_PCIE_TPH
-	u16		tph_cap;	/* TPH capability offset */
+	u16		tph_cap:15;	/* TPH capability offset */
+	u16		tph_enabled:1;	/* Whether TPH is enabled */
 	u8		tph_mode;	/* TPH mode */
 	u8		tph_req_type;	/* TPH requester type */
 #endif
-- 
2.17.1



* [PATCH v16 03/12] PCI/TPH: Cache TPH requester capability at probe time
From: Chengwen Feng @ 2026-06-04 13:27 UTC
  To: alex, jgg
  Cc: wathsala.vithanage, helgaas, wei.huang2, zhipingz, wangzhou1,
	wangyushan12, liuyonglong, kvm, linux-pci

Compute the negotiated TPH requester type from the device and Root Port
capabilities once, in pci_tph_init().

Add a tph_ext_support flag that caches whether the device may issue
Extended TPH requests after topology negotiation. If the negotiated
requester type is PCI_TPH_REQ_DISABLE, clear tph_cap so TPH cannot be
used. To make room for the new flag, narrow tph_cap to 14 bits. This
still covers every extended capability offset, since all are below
0x1000.

Simplify pcie_enable_tph() by using the cached requester capability
instead of recalculating it on every call.
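The probe-time negotiation can be modeled as shown below. The encodings are the PCI_TPH_REQ_* values from include/uapi/linux/pci_regs.h. `negotiate_req_type()` is a hypothetical standalone function. It assumes the Root Port completer type uses the same 0/1/3 encoding, as the min() in the patch implies:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Requester Enable encodings from include/uapi/linux/pci_regs.h. */
#define PCI_TPH_REQ_DISABLE	0x0
#define PCI_TPH_REQ_TPH_ONLY	0x1
#define PCI_TPH_REQ_EXT_TPH	0x3

/*
 * Model of the negotiation in pci_tph_init(). The device's requester
 * capability is capped by the Root Port's completer type, unless the
 * device is an RCiEP with no Root Port above it.
 */
static uint8_t negotiate_req_type(bool dev_ext, bool is_rc_end, uint8_t rp_type)
{
	uint8_t req = dev_ext ? PCI_TPH_REQ_EXT_TPH : PCI_TPH_REQ_TPH_ONLY;

	if (!is_rc_end && rp_type < req)
		req = rp_type;  /* equivalent to min(req, rp_type) */
	return req;
}
```

pci_tph_init() then sets tph_ext_support when the result is PCI_TPH_REQ_EXT_TPH. It clears tph_cap when the result is PCI_TPH_REQ_DISABLE.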

Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
---
 drivers/pci/tph.c   | 43 +++++++++++++++++++++++++------------------
 include/linux/pci.h |  4 +++-
 2 files changed, 28 insertions(+), 19 deletions(-)

diff --git a/drivers/pci/tph.c b/drivers/pci/tph.c
index bef3a55539c4..951f0a33ff66 100644
--- a/drivers/pci/tph.c
+++ b/drivers/pci/tph.c
@@ -384,7 +384,6 @@ int pcie_enable_tph(struct pci_dev *pdev, int mode)
 {
 	u32 reg;
 	u8 dev_modes;
-	u8 rp_req_type;
 
 	/* Honor "notph" kernel parameter */
 	if (pci_tph_disabled)
@@ -404,23 +403,8 @@ int pcie_enable_tph(struct pci_dev *pdev, int mode)
 
 	pdev->tph_mode = mode;
 
-	/* Get req_type supported by device and its Root Port */
-	pci_read_config_dword(pdev, pdev->tph_cap + PCI_TPH_CAP, &reg);
-	if (FIELD_GET(PCI_TPH_CAP_EXT_TPH, reg))
-		pdev->tph_req_type = PCI_TPH_REQ_EXT_TPH;
-	else
-		pdev->tph_req_type = PCI_TPH_REQ_TPH_ONLY;
-
-	/* Check if the device is behind a Root Port */
-	if (pci_pcie_type(pdev) != PCI_EXP_TYPE_RC_END) {
-		rp_req_type = get_rp_completer_type(pdev);
-
-		/* Final req_type is the smallest value of two */
-		pdev->tph_req_type = min(pdev->tph_req_type, rp_req_type);
-	}
-
-	if (pdev->tph_req_type == PCI_TPH_REQ_DISABLE)
-		return -EINVAL;
+	pdev->tph_req_type = pdev->tph_ext_support ? PCI_TPH_REQ_EXT_TPH :
+						     PCI_TPH_REQ_TPH_ONLY;
 
 	/* Write them into TPH control register */
 	pci_read_config_dword(pdev, pdev->tph_cap + PCI_TPH_CTRL, &reg);
@@ -510,13 +494,36 @@ void pci_no_tph(void)
 
 void pci_tph_init(struct pci_dev *pdev)
 {
+	u8 tph_req_type, rp_req_type;
 	int num_entries;
 	u32 save_size;
+	u32 reg = 0;
 
 	pdev->tph_cap = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_TPH);
 	if (!pdev->tph_cap)
 		return;
 
+	/* Get req_type supported by device and its Root Port */
+	pci_read_config_dword(pdev, pdev->tph_cap + PCI_TPH_CAP, &reg);
+	if (FIELD_GET(PCI_TPH_CAP_EXT_TPH, reg))
+		tph_req_type = PCI_TPH_REQ_EXT_TPH;
+	else
+		tph_req_type = PCI_TPH_REQ_TPH_ONLY;
+
+	/* Check if the device is behind a Root Port */
+	if (pci_pcie_type(pdev) != PCI_EXP_TYPE_RC_END) {
+		rp_req_type = get_rp_completer_type(pdev);
+		/* Final req_type is the smallest value of two */
+		tph_req_type = min(tph_req_type, rp_req_type);
+	}
+
+	if (tph_req_type == PCI_TPH_REQ_DISABLE) {
+		pdev->tph_cap = 0;
+		return;
+	}
+
+	pdev->tph_ext_support = !!(tph_req_type == PCI_TPH_REQ_EXT_TPH);
+
 	num_entries = pcie_tph_get_st_table_size(pdev);
 	save_size = sizeof(u32) + num_entries * sizeof(u16);
 	pci_add_ext_cap_save_buffer(pdev, PCI_EXT_CAP_ID_TPH, save_size);
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 109182658f76..285c0f00882e 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -588,7 +588,9 @@ struct pci_dev {
 	u8 reset_methods[PCI_NUM_RESET_METHODS]; /* In priority order */
 
 #ifdef CONFIG_PCIE_TPH
-	u16		tph_cap:15;	/* TPH capability offset */
+	u16		tph_cap:14;	/* TPH capability offset */
+	u16		tph_ext_support:1; /* Indicate whether Extended TPH
+					    * requester is supported */
 	u16		tph_enabled:1;	/* Whether TPH is enabled */
 	u8		tph_mode;	/* TPH mode */
 	u8		tph_req_type;	/* TPH requester type */
-- 
2.17.1



* [PATCH v16 04/12] PCI/TPH: Refactor pcie_enable_tph & add explicit requester variant
From: Chengwen Feng @ 2026-06-04 13:27 UTC
  To: alex, jgg
  Cc: wathsala.vithanage, helgaas, wei.huang2, zhipingz, wangzhou1,
	wangyushan12, liuyonglong, kvm, linux-pci

Refactor pcie_enable_tph(): move the core logic into a static
enable_tph() helper that takes an explicit requester type.

- Keep pcie_enable_tph() as an automatic wrapper with unchanged behavior.
  It selects the Extended or standard TPH requester based on the device's
  cached capability, so existing bnxt/mlx5 callers need no modification.
- Add an exported pcie_enable_tph_explicit() with a bool 'extended'
  parameter for explicit standard/Extended selection. The upcoming VFIO
  TPH support uses it.

The helper validates EXT_TPH availability. It rejects an explicit
Extended TPH request if the device does not support the Extended
requester.
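Here is a userspace model of the resulting call structure. A hypothetical `struct mock_pdev` stands in for struct pci_dev. ST mode validation and the TPH_CTRL register write are omitted; only the requester-type selection and the EXT_TPH check are shown:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define PCI_TPH_REQ_TPH_ONLY	0x1
#define PCI_TPH_REQ_EXT_TPH	0x3

/* Hypothetical stand-in for the pci_dev fields the helpers touch. */
struct mock_pdev {
	bool    tph_ext_support;  /* cached at probe time */
	uint8_t tph_req_type;     /* requester type that was programmed */
};

/* Shared helper: validate the requested type, then "program" it. */
static int mock_enable_tph(struct mock_pdev *pdev, uint8_t req_type)
{
	if (req_type == PCI_TPH_REQ_EXT_TPH && !pdev->tph_ext_support)
		return -EINVAL;
	pdev->tph_req_type = req_type;
	return 0;
}

/* Auto wrapper: picks the widest type the device negotiated. */
static int mock_enable_auto(struct mock_pdev *pdev)
{
	return mock_enable_tph(pdev, pdev->tph_ext_support ?
			       PCI_TPH_REQ_EXT_TPH : PCI_TPH_REQ_TPH_ONLY);
}

/* Explicit variant: the caller chooses, and EXT is refused if unsupported. */
static int mock_enable_explicit(struct mock_pdev *pdev, bool extended)
{
	return mock_enable_tph(pdev, extended ? PCI_TPH_REQ_EXT_TPH :
					       PCI_TPH_REQ_TPH_ONLY);
}
```

The explicit variant lets a device that supports Extended TPH still be enabled in standard TPH mode. The automatic wrapper never does that.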

Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
---
 drivers/pci/tph.c       | 70 ++++++++++++++++++++++++++++-------------
 include/linux/pci-tph.h |  4 +++
 2 files changed, 53 insertions(+), 21 deletions(-)

diff --git a/drivers/pci/tph.c b/drivers/pci/tph.c
index 951f0a33ff66..51009ac9b379 100644
--- a/drivers/pci/tph.c
+++ b/drivers/pci/tph.c
@@ -364,23 +364,7 @@ void pcie_disable_tph(struct pci_dev *pdev)
 }
 EXPORT_SYMBOL(pcie_disable_tph);
 
-/**
- * pcie_enable_tph - Enable TPH support for device using a specific ST mode
- * @pdev: PCI device
- * @mode: ST mode to enable. Current supported modes include:
- *
- *   - PCI_TPH_ST_NS_MODE: NO ST Mode
- *   - PCI_TPH_ST_IV_MODE: Interrupt Vector Mode
- *   - PCI_TPH_ST_DS_MODE: Device Specific Mode
- *
- * Check whether the mode is actually supported by the device before enabling
- * and return an error if not. Additionally determine what types of requests,
- * TPH or extended TPH, can be issued by the device based on its TPH requester
- * capability and the Root Port's completer capability.
- *
- * Return: 0 on success, otherwise negative value (-errno)
- */
-int pcie_enable_tph(struct pci_dev *pdev, int mode)
+static int enable_tph(struct pci_dev *pdev, int mode, u8 req_type)
 {
 	u32 reg;
 	u8 dev_modes;
@@ -401,10 +385,11 @@ int pcie_enable_tph(struct pci_dev *pdev, int mode)
 	if (!((1 << mode) & dev_modes))
 		return -EINVAL;
 
-	pdev->tph_mode = mode;
+	if (req_type == PCI_TPH_REQ_EXT_TPH && !pdev->tph_ext_support)
+		return -EINVAL;
 
-	pdev->tph_req_type = pdev->tph_ext_support ? PCI_TPH_REQ_EXT_TPH :
-						     PCI_TPH_REQ_TPH_ONLY;
+	pdev->tph_mode = mode;
+	pdev->tph_req_type = req_type;
 
 	/* Write them into TPH control register */
 	pci_read_config_dword(pdev, pdev->tph_cap + PCI_TPH_CTRL, &reg);
@@ -413,7 +398,7 @@ int pcie_enable_tph(struct pci_dev *pdev, int mode)
 	reg |= FIELD_PREP(PCI_TPH_CTRL_MODE_SEL_MASK, pdev->tph_mode);
 
 	reg &= ~PCI_TPH_CTRL_REQ_EN_MASK;
-	reg |= FIELD_PREP(PCI_TPH_CTRL_REQ_EN_MASK, pdev->tph_req_type);
+	reg |= FIELD_PREP(PCI_TPH_CTRL_REQ_EN_MASK, req_type);
 
 	pci_write_config_dword(pdev, pdev->tph_cap + PCI_TPH_CTRL, reg);
 
@@ -421,8 +406,51 @@ int pcie_enable_tph(struct pci_dev *pdev, int mode)
 
 	return 0;
 }
+
+/**
+ * pcie_enable_tph - Enable TPH support for device using a specific ST mode
+ * @pdev: PCI device
+ * @mode: ST mode to enable. Current supported modes include:
+ *
+ *   - PCI_TPH_ST_NS_MODE: NO ST Mode
+ *   - PCI_TPH_ST_IV_MODE: Interrupt Vector Mode
+ *   - PCI_TPH_ST_DS_MODE: Device Specific Mode
+ *
+ * Check whether the mode is actually supported by the device before enabling
+ * and return an error if not. Additionally determine what types of requests,
+ * TPH or extended TPH, can be issued by the device based on its TPH requester
+ * capability and the Root Port's completer capability.
+ *
+ * Return: 0 on success, otherwise negative value (-errno)
+ */
+int pcie_enable_tph(struct pci_dev *pdev, int mode)
+{
+	u8 req_type = pdev->tph_ext_support ? PCI_TPH_REQ_EXT_TPH :
+					      PCI_TPH_REQ_TPH_ONLY;
+	return enable_tph(pdev, mode, req_type);
+}
 EXPORT_SYMBOL(pcie_enable_tph);
 
+/**
+ * pcie_enable_tph_explicit - Enable TPH with explicit requester selection
+ * @pdev: PCI device to operate
+ * @mode: ST table operating mode (NS/IV/DS)
+ * @extended: true = EXT_TPH, false = standard TPH only
+ *
+ * Unlike auto-detecting pcie_enable_tph(), caller selects requester type
+ * manually instead of hardware auto-selection. Rejects EXT_TPH request
+ * if device lacks extended requester capability.
+ *
+ * Return: 0 on success, negative errno on failure.
+ */
+int pcie_enable_tph_explicit(struct pci_dev *pdev, int mode, bool extended)
+{
+	u8 req_type = extended ? PCI_TPH_REQ_EXT_TPH : PCI_TPH_REQ_TPH_ONLY;
+
+	return enable_tph(pdev, mode, req_type);
+}
+EXPORT_SYMBOL(pcie_enable_tph_explicit);
+
 void pci_restore_tph_state(struct pci_dev *pdev)
 {
 	struct pci_cap_saved_state *save_state;
diff --git a/include/linux/pci-tph.h b/include/linux/pci-tph.h
index 6f02b020d7d7..ca0faa98afac 100644
--- a/include/linux/pci-tph.h
+++ b/include/linux/pci-tph.h
@@ -29,6 +29,7 @@ int pcie_tph_get_cpu_st(struct pci_dev *dev,
 			unsigned int cpu, u16 *tag);
 void pcie_disable_tph(struct pci_dev *pdev);
 int pcie_enable_tph(struct pci_dev *pdev, int mode);
+int pcie_enable_tph_explicit(struct pci_dev *pdev, int mode, bool extended);
 u16 pcie_tph_get_st_table_size(struct pci_dev *pdev);
 u32 pcie_tph_get_st_table_loc(struct pci_dev *pdev);
 #else
@@ -42,6 +43,9 @@ static inline int pcie_tph_get_cpu_st(struct pci_dev *dev,
 static inline void pcie_disable_tph(struct pci_dev *pdev) { }
 static inline int pcie_enable_tph(struct pci_dev *pdev, int mode)
 { return -EINVAL; }
+static inline int pcie_enable_tph_explicit(struct pci_dev *pdev, int mode,
+					   bool extended)
+{ return -EINVAL; }
 static inline u16 pcie_tph_get_st_table_size(struct pci_dev *pdev)
 { return 0; }
 static inline u32 pcie_tph_get_st_table_loc(struct pci_dev *pdev)
-- 
2.17.1



* [PATCH v16 05/12] PCI/TPH: Refactor pcie_tph_get_cpu_st & add explicit variant
From: Chengwen Feng @ 2026-06-04 13:27 UTC
  To: alex, jgg
  Cc: wathsala.vithanage, helgaas, wei.huang2, zhipingz, wangzhou1,
	wangyushan12, liuyonglong, kvm, linux-pci

Refactor pcie_tph_get_cpu_st(): move the core logic into a static
get_cpu_st() helper that takes an explicit requester type.

- Keep pcie_tph_get_cpu_st() as an automatic wrapper with unchanged
  behavior. It uses the existing pdev->tph_req_type, so existing callers
  need no changes.
- Add an exported pcie_tph_get_cpu_st_explicit() with a bool 'extended'
  parameter for manual standard/Extended requester selection. The
  upcoming VFIO TPH code uses it.
- Add a capability check that rejects an explicit Extended request when
  the device does not support the Extended TPH requester.
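A similar model covers the tag query. For illustration only, it assumes the firmware result for a CPU reduces to two values: one 8-bit Steering Tag for standard TPH and one 16-bit Extended Steering Tag. The requester type picks between them. `struct mock_cpu_st` and both functions are hypothetical:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define PCI_TPH_REQ_TPH_ONLY	0x1
#define PCI_TPH_REQ_EXT_TPH	0x3

/* Hypothetical, simplified view of the firmware-provided tags for one CPU. */
struct mock_cpu_st {
	uint8_t  st;   /* 8-bit Steering Tag for standard TPH requesters */
	uint16_t xst;  /* 16-bit Extended Steering Tag */
};

/* Shared helper: the requester type decides which tag width is returned. */
static int mock_get_cpu_st(uint8_t req_type, const struct mock_cpu_st *fw,
			   uint16_t *tag)
{
	switch (req_type) {
	case PCI_TPH_REQ_TPH_ONLY:
		*tag = fw->st;
		return 0;
	case PCI_TPH_REQ_EXT_TPH:
		*tag = fw->xst;
		return 0;
	default:
		return -EINVAL;  /* model only: no tag without a requester type */
	}
}

/* Explicit variant: refuse EXT if the device cannot issue Extended TPH. */
static int mock_get_cpu_st_explicit(bool ext_support, bool extended,
				    const struct mock_cpu_st *fw, uint16_t *tag)
{
	if (extended && !ext_support)
		return -EINVAL;
	return mock_get_cpu_st(extended ? PCI_TPH_REQ_EXT_TPH :
			       PCI_TPH_REQ_TPH_ONLY, fw, tag);
}
```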

Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
---
 drivers/pci/tph.c       | 68 ++++++++++++++++++++++++++++++-----------
 include/linux/pci-tph.h |  7 +++++
 2 files changed, 57 insertions(+), 18 deletions(-)

diff --git a/drivers/pci/tph.c b/drivers/pci/tph.c
index 51009ac9b379..aca08671fdfe 100644
--- a/drivers/pci/tph.c
+++ b/drivers/pci/tph.c
@@ -231,21 +231,8 @@ static int write_tag_to_st_table(struct pci_dev *pdev, int index, u16 tag)
 	return pci_write_config_word(pdev, offset, tag);
 }
 
-/**
- * pcie_tph_get_cpu_st() - Retrieve Steering Tag for a target memory associated
- * with a specific CPU
- * @pdev: PCI device
- * @mem_type: target memory type (volatile or persistent RAM)
- * @cpu: associated CPU id
- * @tag: Steering Tag to be returned
- *
- * Return the Steering Tag for a target memory that is associated with a
- * specific CPU as indicated by cpu.
- *
- * Return: 0 if success, otherwise negative value (-errno)
- */
-int pcie_tph_get_cpu_st(struct pci_dev *pdev, enum tph_mem_type mem_type,
-			unsigned int cpu, u16 *tag)
+static int get_cpu_st(struct pci_dev *pdev, enum tph_mem_type mem_type,
+		      u8 req_type, unsigned int cpu, u16 *tag)
 {
 #ifdef CONFIG_ACPI
 	struct pci_dev *rp;
@@ -269,19 +256,64 @@ int pcie_tph_get_cpu_st(struct pci_dev *pdev, enum tph_mem_type mem_type,
 		return -EINVAL;
 	}
 
-	*tag = tph_extract_tag(mem_type, pdev->tph_req_type, &info);
+	*tag = tph_extract_tag(mem_type, req_type, &info);
 
-	pci_dbg(pdev, "get steering tag: mem_type=%s, cpu=%d, tag=%#04x\n",
+	pci_dbg(pdev, "get steering tag: mem_type=%s, req_type=%u, cpu=%d, tag=%#04x\n",
 		(mem_type == TPH_MEM_TYPE_VM) ? "volatile" : "persistent",
-		cpu, *tag);
+		req_type, cpu, *tag);
 
 	return 0;
 #else
 	return -ENODEV;
 #endif
 }
+
+/**
+ * pcie_tph_get_cpu_st() - Retrieve Steering Tag for a target memory associated
+ * with a specific CPU
+ * @pdev: PCI device
+ * @mem_type: target memory type (volatile or persistent RAM)
+ * @cpu: associated CPU id
+ * @tag: Steering Tag to be returned
+ *
+ * Return the Steering Tag for a target memory that is associated with a
+ * specific CPU as indicated by cpu.
+ *
+ * Return: 0 if success, otherwise negative value (-errno)
+ */
+int pcie_tph_get_cpu_st(struct pci_dev *pdev, enum tph_mem_type mem_type,
+			unsigned int cpu, u16 *tag)
+{
+	return get_cpu_st(pdev, mem_type, pdev->tph_req_type, cpu, tag);
+}
 EXPORT_SYMBOL(pcie_tph_get_cpu_st);
 
+/**
+ * pcie_tph_get_cpu_st_explicit - Get ST with explicit requester type
+ * @pdev: PCI device
+ * @mem_type: target memory type (volatile or persistent RAM)
+ * @extended: true=EXT_TPH, false=standard TPH only
+ * @cpu: associated CPU id
+ * @tag: output steering tag pointer
+ *
+ * Unlike auto pcie_tph_get_cpu_st(), caller manually picks requester type.
+ * Rejects EXT request if device lacks extended requester capability.
+ *
+ * Return: 0 on success, negative errno on failure.
+ */
+int pcie_tph_get_cpu_st_explicit(struct pci_dev *pdev,
+				 enum tph_mem_type mem_type,
+				 bool extended, unsigned int cpu, u16 *tag)
+{
+	u8 req_type = extended ? PCI_TPH_REQ_EXT_TPH : PCI_TPH_REQ_TPH_ONLY;
+
+	if (extended && !pdev->tph_ext_support)
+		return -EINVAL;
+
+	return get_cpu_st(pdev, mem_type, req_type, cpu, tag);
+}
+EXPORT_SYMBOL(pcie_tph_get_cpu_st_explicit);
+
 /**
  * pcie_tph_set_st_entry() - Set Steering Tag in the ST table entry
  * @pdev: PCI device
diff --git a/include/linux/pci-tph.h b/include/linux/pci-tph.h
index ca0faa98afac..1a508b3d511f 100644
--- a/include/linux/pci-tph.h
+++ b/include/linux/pci-tph.h
@@ -27,6 +27,9 @@ int pcie_tph_set_st_entry(struct pci_dev *pdev,
 int pcie_tph_get_cpu_st(struct pci_dev *dev,
 			enum tph_mem_type mem_type,
 			unsigned int cpu, u16 *tag);
+int pcie_tph_get_cpu_st_explicit(struct pci_dev *pdev,
+				 enum tph_mem_type mem_type,
+				 bool extended, unsigned int cpu, u16 *tag);
 void pcie_disable_tph(struct pci_dev *pdev);
 int pcie_enable_tph(struct pci_dev *pdev, int mode);
 int pcie_enable_tph_explicit(struct pci_dev *pdev, int mode, bool extended);
@@ -40,6 +43,10 @@ static inline int pcie_tph_get_cpu_st(struct pci_dev *dev,
 				      enum tph_mem_type mem_type,
 				      unsigned int cpu, u16 *tag)
 { return -EINVAL; }
+static inline int pcie_tph_get_cpu_st_explicit(struct pci_dev *pdev,
+				enum tph_mem_type mem_type,
+				bool extended, unsigned int cpu, u16 *tag)
+{ return -EINVAL; }
 static inline void pcie_disable_tph(struct pci_dev *pdev) { }
 static inline int pcie_enable_tph(struct pci_dev *pdev, int mode)
 { return -EINVAL; }
-- 
2.17.1



* [PATCH v16 06/12] PCI/TPH: expose the enabled TPH requester type
From: Chengwen Feng @ 2026-06-04 13:27 UTC
  To: alex, jgg
  Cc: wathsala.vithanage, helgaas, wei.huang2, zhipingz, wangzhou1,
	wangyushan12, liuyonglong, kvm, linux-pci

From: Zhiping Zhang <zhipingz@meta.com>

Add pcie_tph_enabled_req_type() so drivers can query the enabled TPH
requester mode without reaching into pci_dev internals.
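The next example is a hypothetical consumer-side helper. It shows how a driver might combine the new accessor with pcie_tph_get_cpu_st_explicit(): it maps the returned requester type to the 'extended' flag and treats PCI_TPH_REQ_DISABLE as an error. The helper name and the -ENODEV choice are assumptions:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define PCI_TPH_REQ_DISABLE	0x0
#define PCI_TPH_REQ_TPH_ONLY	0x1
#define PCI_TPH_REQ_EXT_TPH	0x3

/*
 * Hypothetical driver helper: turn the value returned by
 * pcie_tph_enabled_req_type() into the 'extended' flag expected by
 * pcie_tph_get_cpu_st_explicit(), or fail if TPH is not enabled.
 */
static int req_type_to_extended(uint8_t req_type, bool *extended)
{
	switch (req_type) {
	case PCI_TPH_REQ_TPH_ONLY:
		*extended = false;
		return 0;
	case PCI_TPH_REQ_EXT_TPH:
		*extended = true;
		return 0;
	case PCI_TPH_REQ_DISABLE:
	default:
		return -ENODEV;  /* assumption: "not enabled" treated as absent */
	}
}
```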

Signed-off-by: Zhiping Zhang <zhipingz@meta.com>
Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
---
 drivers/pci/tph.c       | 12 ++++++++++++
 include/linux/pci-tph.h |  3 +++
 2 files changed, 15 insertions(+)

diff --git a/drivers/pci/tph.c b/drivers/pci/tph.c
index aca08671fdfe..6c4623cacc85 100644
--- a/drivers/pci/tph.c
+++ b/drivers/pci/tph.c
@@ -483,6 +483,18 @@ int pcie_enable_tph_explicit(struct pci_dev *pdev, int mode, bool extended)
 }
 EXPORT_SYMBOL(pcie_enable_tph_explicit);
 
+/**
+ * pcie_tph_enabled_req_type - Return the device's enabled TPH requester type
+ * @pdev: PCI device to query
+ *
+ * Return: PCI_TPH_REQ_DISABLE, PCI_TPH_REQ_TPH_ONLY or PCI_TPH_REQ_EXT_TPH.
+ */
+u8 pcie_tph_enabled_req_type(struct pci_dev *pdev)
+{
+	return pdev->tph_req_type;
+}
+EXPORT_SYMBOL(pcie_tph_enabled_req_type);
+
 void pci_restore_tph_state(struct pci_dev *pdev)
 {
 	struct pci_cap_saved_state *save_state;
diff --git a/include/linux/pci-tph.h b/include/linux/pci-tph.h
index 1a508b3d511f..e4f7045fc152 100644
--- a/include/linux/pci-tph.h
+++ b/include/linux/pci-tph.h
@@ -33,6 +33,7 @@ int pcie_tph_get_cpu_st_explicit(struct pci_dev *pdev,
 void pcie_disable_tph(struct pci_dev *pdev);
 int pcie_enable_tph(struct pci_dev *pdev, int mode);
 int pcie_enable_tph_explicit(struct pci_dev *pdev, int mode, bool extended);
+u8 pcie_tph_enabled_req_type(struct pci_dev *pdev);
 u16 pcie_tph_get_st_table_size(struct pci_dev *pdev);
 u32 pcie_tph_get_st_table_loc(struct pci_dev *pdev);
 #else
@@ -47,6 +48,8 @@ static inline int pcie_tph_get_cpu_st_explicit(struct pci_dev *pdev,
 				enum tph_mem_type mem_type,
 				bool extended, unsigned int cpu, u16 *tag)
 { return -EINVAL; }
+static inline u8 pcie_tph_enabled_req_type(struct pci_dev *pdev)
+{ return PCI_TPH_REQ_DISABLE; }
 static inline void pcie_disable_tph(struct pci_dev *pdev) { }
 static inline int pcie_enable_tph(struct pci_dev *pdev, int mode)
 { return -EINVAL; }
-- 
2.17.1



* [PATCH v16 07/12] PCI/TPH: Add pcie_tph_supported() helper to check TPH capability attributes
  2026-06-04 13:27 [PATCH v16 00/12] vfio/pci: Add PCIe TPH support Chengwen Feng
                   ` (5 preceding siblings ...)
  2026-06-04 13:27 ` [PATCH v16 06/12] PCI/TPH: expose the enabled TPH requester type Chengwen Feng
@ 2026-06-04 13:27 ` Chengwen Feng
  2026-06-04 13:28 ` [PATCH v16 08/12] vfio/pci: Hide TPH capability when TPH is unsupported Chengwen Feng
                   ` (5 subsequent siblings)
  12 siblings, 0 replies; 23+ messages in thread
From: Chengwen Feng @ 2026-06-04 13:27 UTC (permalink / raw)
  To: alex, jgg
  Cc: wathsala.vithanage, helgaas, wei.huang2, zhipingz, wangzhou1,
	wangyushan12, liuyonglong, kvm, linux-pci

Add a new helper, pcie_tph_supported(), whose want_ext parameter selects
what is checked:
- want_ext = false: check whether the device has a valid TPH capability;
- want_ext = true: check whether the hardware supports Extended TPH.

The follow-up VFIO TPH virtualization patches use this helper to query
basic TPH presence and Extended TPH support through a single interface.
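
A minimal userspace model of the helper's two modes, and of how a caller
might use want_ext = true to stop advertising the Extended TPH bit when the
hardware lacks it. struct sketch_dev and the sketch_* functions are
hypothetical stand-ins for the pci_dev fields (tph_cap, tph_ext_support)
that the helper reads. This is a sketch, not kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Mirrors PCI_TPH_CAP_EXT_TPH from include/uapi/linux/pci_regs.h. */
#define SKETCH_TPH_CAP_EXT_TPH 0x00000100u

/* Hypothetical stand-in for pdev->tph_cap and pdev->tph_ext_support. */
struct sketch_dev {
	uint16_t tph_cap;       /* capability offset, 0 if absent */
	bool tph_ext_support;   /* hardware supports Extended TPH */
};

static bool sketch_tph_supported(const struct sketch_dev *dev, bool want_ext)
{
	if (!dev->tph_cap)
		return false;           /* no TPH capability at all */
	if (!want_ext)
		return true;            /* basic TPH is present */
	return dev->tph_ext_support;
}

/* Clear EXT_TPH in a CPU-endian copy of the TPH Capability register. */
static uint32_t sketch_mask_ext_tph(const struct sketch_dev *dev, uint32_t cap)
{
	if (!sketch_tph_supported(dev, true))
		cap &= ~SKETCH_TPH_CAP_EXT_TPH;
	return cap;
}
```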

Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
---
 drivers/pci/tph.c       | 19 +++++++++++++++++++
 include/linux/pci-tph.h |  3 +++
 2 files changed, 22 insertions(+)

diff --git a/drivers/pci/tph.c b/drivers/pci/tph.c
index 6c4623cacc85..95280aab4fb5 100644
--- a/drivers/pci/tph.c
+++ b/drivers/pci/tph.c
@@ -600,3 +600,22 @@ void pci_tph_init(struct pci_dev *pdev)
 	save_size = sizeof(u32) + num_entries * sizeof(u16);
 	pci_add_ext_cap_save_buffer(pdev, PCI_EXT_CAP_ID_TPH, save_size);
 }
+
+/**
+ * pcie_tph_supported - Check TPH capability attribute
+ * @pdev: PCI device to query
+ * @want_ext: false - check TPH cap exists; true - check EXT_TPH support
+ *
+ * Return: true if the requested condition holds, false otherwise
+ */
+bool pcie_tph_supported(struct pci_dev *pdev, bool want_ext)
+{
+	if (!pdev->tph_cap)
+		return false;
+
+	if (!want_ext)
+		return true;
+
+	return pdev->tph_ext_support;
+}
+EXPORT_SYMBOL(pcie_tph_supported);
diff --git a/include/linux/pci-tph.h b/include/linux/pci-tph.h
index e4f7045fc152..5917a0694c1d 100644
--- a/include/linux/pci-tph.h
+++ b/include/linux/pci-tph.h
@@ -36,6 +36,7 @@ int pcie_enable_tph_explicit(struct pci_dev *pdev, int mode, bool extended);
 u8 pcie_tph_enabled_req_type(struct pci_dev *pdev);
 u16 pcie_tph_get_st_table_size(struct pci_dev *pdev);
 u32 pcie_tph_get_st_table_loc(struct pci_dev *pdev);
+bool pcie_tph_supported(struct pci_dev *pdev, bool want_ext);
 #else
 static inline int pcie_tph_set_st_entry(struct pci_dev *pdev,
 					unsigned int index, u16 tag)
@@ -60,6 +61,8 @@ static inline u16 pcie_tph_get_st_table_size(struct pci_dev *pdev)
 { return 0; }
 static inline u32 pcie_tph_get_st_table_loc(struct pci_dev *pdev)
 { return PCI_TPH_LOC_NONE; }
+static inline bool pcie_tph_supported(struct pci_dev *pdev, bool want_ext)
+{ return false; }
 #endif
 
 #endif /* LINUX_PCI_TPH_H */
-- 
2.17.1



* [PATCH v16 08/12] vfio/pci: Hide TPH capability when TPH is unsupported
  2026-06-04 13:27 [PATCH v16 00/12] vfio/pci: Add PCIe TPH support Chengwen Feng
                   ` (6 preceding siblings ...)
  2026-06-04 13:27 ` [PATCH v16 07/12] PCI/TPH: Add pcie_tph_supported() helper to check TPH capability attributes Chengwen Feng
@ 2026-06-04 13:28 ` Chengwen Feng
  2026-06-04 13:48   ` sashiko-bot
  2026-06-04 13:28 ` [PATCH v16 09/12] vfio/pci: Add TPH_ENABLE feature skeleton and unsafe module parameter Chengwen Feng
                   ` (4 subsequent siblings)
  12 siblings, 1 reply; 23+ messages in thread
From: Chengwen Feng @ 2026-06-04 13:28 UTC (permalink / raw)
  To: alex, jgg
  Cc: wathsala.vithanage, helgaas, wei.huang2, zhipingz, wangzhou1,
	wangyushan12, liuyonglong, kvm, linux-pci

Check the device's negotiated TPH support status before parsing the TPH
extended capability. If TPH was disabled during topology negotiation,
return a zero length to hide the capability from userspace.

Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
---
 drivers/vfio/pci/vfio_pci_config.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/vfio/pci/vfio_pci_config.c b/drivers/vfio/pci/vfio_pci_config.c
index a10ed733f0e3..5c6ab172df6c 100644
--- a/drivers/vfio/pci/vfio_pci_config.c
+++ b/drivers/vfio/pci/vfio_pci_config.c
@@ -22,6 +22,7 @@
 
 #include <linux/fs.h>
 #include <linux/pci.h>
+#include <linux/pci-tph.h>
 #include <linux/uaccess.h>
 #include <linux/vfio.h>
 #include <linux/slab.h>
@@ -1450,6 +1451,8 @@ static int vfio_ext_cap_len(struct vfio_pci_core_device *vdev, u16 ecap, u16 epo
 		byte &= PCI_DPA_CAP_SUBSTATE_MASK;
 		return PCI_DPA_BASE_SIZEOF + byte + 1;
 	case PCI_EXT_CAP_ID_TPH:
+		if (!pcie_tph_supported(pdev, false))
+			return 0;
 		ret = pci_read_config_dword(pdev, epos + PCI_TPH_CAP, &dword);
 		if (ret)
 			return pcibios_err_to_errno(ret);
-- 
2.17.1



* [PATCH v16 09/12] vfio/pci: Add TPH_ENABLE feature skeleton and unsafe module parameter
  2026-06-04 13:27 [PATCH v16 00/12] vfio/pci: Add PCIe TPH support Chengwen Feng
                   ` (7 preceding siblings ...)
  2026-06-04 13:28 ` [PATCH v16 08/12] vfio/pci: Hide TPH capability when TPH is unsupported Chengwen Feng
@ 2026-06-04 13:28 ` Chengwen Feng
  2026-06-04 13:54   ` sashiko-bot
  2026-06-04 13:28 ` [PATCH v16 10/12] vfio/pci: Add TPH_CPU_ST to query CPU's TPH steering tag Chengwen Feng
                   ` (3 subsequent siblings)
  12 siblings, 1 reply; 23+ messages in thread
From: Chengwen Feng @ 2026-06-04 13:28 UTC (permalink / raw)
  To: alex, jgg
  Cc: wathsala.vithanage, helgaas, wei.huang2, zhipingz, wangzhou1,
	wangyushan12, liuyonglong, kvm, linux-pci

Introduce the enable_unsafe_tph module parameter to gate all TPH-related
features, and add the VFIO_DEVICE_FEATURE_TPH_ENABLE uapi together with a
per-device tph_permit flag.

This is a preparatory change that only adds the feature framework. The
actual TPH_CTRL register permission control and the steering tag features
(TPH_CPU_ST / TPH_ST_CONFIG) are added by the subsequent TPH capability
virtualization commits.
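
To illustrate the intended userspace flow, the sketch below fills in the
header that would be passed to ioctl(device_fd, VFIO_DEVICE_FEATURE, ...)
to opt a device in. It assumes the existing VFIO_DEVICE_FEATURE_GET/SET/PROBE
bit values from <linux/vfio.h> and the feature number 13 proposed here, and
uses local mirror definitions (hypothetical sketch_* names) so it builds
against headers that predate this series. The feature takes no payload, so
argsz covers only the header. In the handler above, the module parameter
check runs before vfio_check_feature(), so even a PROBE request is expected
to fail with EOPNOTSUPP when enable_unsafe_tph is off.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Mirrors of the VFIO_DEVICE_FEATURE_* bits in <linux/vfio.h>. */
#define SKETCH_FEATURE_MASK   0xffffu
#define SKETCH_FEATURE_GET    (1u << 16)
#define SKETCH_FEATURE_SET    (1u << 17)
#define SKETCH_FEATURE_PROBE  (1u << 18)
/* Feature number proposed by this series. */
#define SKETCH_FEATURE_TPH_ENABLE 13u

/* Mirror of struct vfio_device_feature without its flexible data[] tail. */
struct sketch_vfio_device_feature {
	uint32_t argsz;
	uint32_t flags;
};

/* Build a SET (or SET|PROBE) request for the payload-less TPH_ENABLE. */
static void sketch_tph_enable_req(struct sketch_vfio_device_feature *req,
				  int probe)
{
	memset(req, 0, sizeof(*req));
	req->argsz = sizeof(*req);      /* header only: no payload */
	req->flags = SKETCH_FEATURE_SET | SKETCH_FEATURE_TPH_ENABLE;
	if (probe)
		req->flags |= SKETCH_FEATURE_PROBE;
}
```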

Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
---
 drivers/vfio/pci/vfio_pci.c      | 13 ++++++++++++-
 drivers/vfio/pci/vfio_pci_core.c | 25 ++++++++++++++++++++++++-
 include/linux/vfio_pci_core.h    |  4 +++-
 include/uapi/linux/vfio.h        |  7 +++++++
 4 files changed, 46 insertions(+), 3 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
index 0c771064c0b8..6d73668459cf 100644
--- a/drivers/vfio/pci/vfio_pci.c
+++ b/drivers/vfio/pci/vfio_pci.c
@@ -60,6 +60,12 @@ static bool disable_denylist;
 module_param(disable_denylist, bool, 0444);
 MODULE_PARM_DESC(disable_denylist, "Disable use of device denylist. Disabling the denylist allows binding to devices with known errata that may lead to exploitable stability or security issues when accessed by untrusted users.");
 
+#ifdef CONFIG_PCIE_TPH
+static bool enable_unsafe_tph;
+module_param(enable_unsafe_tph, bool, 0444);
+MODULE_PARM_DESC(enable_unsafe_tph, "Enable PCIe TPH (Transaction Processing Hints) support. It may break platform isolation. If you do not know what this is for, step away. (default: false)");
+#endif
+
 static bool vfio_pci_dev_in_denylist(struct pci_dev *pdev)
 {
 	switch (pdev->vendor) {
@@ -257,12 +263,17 @@ static int __init vfio_pci_init(void)
 {
 	int ret;
 	bool is_disable_vga = true;
+	bool is_enable_unsafe_tph = false;
 
 #ifdef CONFIG_VFIO_PCI_VGA
 	is_disable_vga = disable_vga;
 #endif
+#ifdef CONFIG_PCIE_TPH
+	is_enable_unsafe_tph = enable_unsafe_tph;
+#endif
 
-	vfio_pci_core_set_params(nointxmask, is_disable_vga, disable_idle_d3);
+	vfio_pci_core_set_params(nointxmask, is_disable_vga, disable_idle_d3,
+				 is_enable_unsafe_tph);
 
 	/* Register and scan for devices */
 	ret = pci_register_driver(&vfio_pci_driver);
diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
index 050e7542952e..d5e534dd5829 100644
--- a/drivers/vfio/pci/vfio_pci_core.c
+++ b/drivers/vfio/pci/vfio_pci_core.c
@@ -41,6 +41,7 @@
 static bool nointxmask;
 static bool disable_vga;
 static bool disable_idle_d3;
+static bool enable_unsafe_tph;
 
 static void vfio_pci_eventfd_rcu_free(struct rcu_head *rcu)
 {
@@ -1551,6 +1552,24 @@ static int vfio_pci_core_feature_token(struct vfio_pci_core_device *vdev,
 	return 0;
 }
 
+static int vfio_pci_core_feature_tph_enable(struct vfio_pci_core_device *vdev,
+					    u32 flags, size_t argsz)
+{
+	int ret;
+
+	if (!enable_unsafe_tph)
+		return -EOPNOTSUPP;
+
+	ret = vfio_check_feature(flags, argsz, VFIO_DEVICE_FEATURE_SET, 0);
+	if (ret <= 0)
+		return ret;
+
+	if (!vdev->tph_permit)
+		vdev->tph_permit = 1;
+
+	return 0;
+}
+
 int vfio_pci_core_ioctl_feature(struct vfio_device *device, u32 flags,
 				void __user *arg, size_t argsz)
 {
@@ -1569,6 +1588,8 @@ int vfio_pci_core_ioctl_feature(struct vfio_device *device, u32 flags,
 		return vfio_pci_core_feature_token(vdev, flags, arg, argsz);
 	case VFIO_DEVICE_FEATURE_DMA_BUF:
 		return vfio_pci_core_feature_dma_buf(vdev, flags, arg, argsz);
+	case VFIO_DEVICE_FEATURE_TPH_ENABLE:
+		return vfio_pci_core_feature_tph_enable(vdev, flags, argsz);
 	default:
 		return -ENOTTY;
 	}
@@ -2605,11 +2626,13 @@ static void vfio_pci_dev_set_try_reset(struct vfio_device_set *dev_set)
 }
 
 void vfio_pci_core_set_params(bool is_nointxmask, bool is_disable_vga,
-			      bool is_disable_idle_d3)
+			      bool is_disable_idle_d3,
+			      bool is_enable_unsafe_tph)
 {
 	nointxmask = is_nointxmask;
 	disable_vga = is_disable_vga;
 	disable_idle_d3 = is_disable_idle_d3;
+	enable_unsafe_tph = is_enable_unsafe_tph;
 }
 EXPORT_SYMBOL_GPL(vfio_pci_core_set_params);
 
diff --git a/include/linux/vfio_pci_core.h b/include/linux/vfio_pci_core.h
index 89165b769e5c..662a04943511 100644
--- a/include/linux/vfio_pci_core.h
+++ b/include/linux/vfio_pci_core.h
@@ -127,6 +127,7 @@ struct vfio_pci_core_device {
 	bool			needs_pm_restore:1;
 	bool			pm_intx_masked:1;
 	bool			pm_runtime_engaged:1;
+	bool			tph_permit:1;
 	struct pci_saved_state	*pci_saved_state;
 	struct pci_saved_state	*pm_save;
 	int			ioeventfds_nr;
@@ -157,7 +158,8 @@ int vfio_pci_core_register_dev_region(struct vfio_pci_core_device *vdev,
 				      const struct vfio_pci_regops *ops,
 				      size_t size, u32 flags, void *data);
 void vfio_pci_core_set_params(bool nointxmask, bool is_disable_vga,
-			      bool is_disable_idle_d3);
+			      bool is_disable_idle_d3,
+			      bool is_enable_unsafe_tph);
 void vfio_pci_core_close_device(struct vfio_device *core_vdev);
 int vfio_pci_core_init_dev(struct vfio_device *core_vdev);
 void vfio_pci_core_release_dev(struct vfio_device *core_vdev);
diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index 5de618a3a5ee..e5a4d1d7091b 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -1534,6 +1534,13 @@ struct vfio_device_feature_dma_buf {
  */
 #define VFIO_DEVICE_FEATURE_MIG_PRECOPY_INFOv2  12
 
+/*
+ * Device-level opt-in for TPH (Transaction Processing Hints) support.
+ * When set, allows access to the TPH_CPU_ST and TPH_ST_CONFIG features.
+ * Requires the global enable_unsafe_tph module parameter to be enabled.
+ */
+#define VFIO_DEVICE_FEATURE_TPH_ENABLE	13
+
 /* -------- API for Type1 VFIO IOMMU -------- */
 
 /**
-- 
2.17.1



* [PATCH v16 10/12] vfio/pci: Add TPH_CPU_ST to query CPU's TPH steering tag
  2026-06-04 13:27 [PATCH v16 00/12] vfio/pci: Add PCIe TPH support Chengwen Feng
                   ` (8 preceding siblings ...)
  2026-06-04 13:28 ` [PATCH v16 09/12] vfio/pci: Add TPH_ENABLE feature skeleton and unsafe module parameter Chengwen Feng
@ 2026-06-04 13:28 ` Chengwen Feng
  2026-06-04 13:55   ` sashiko-bot
  2026-06-04 13:28 ` [PATCH v16 11/12] vfio/pci: Add TPH_ST_CONFIG for PCIe TPH ST configuration Chengwen Feng
                   ` (2 subsequent siblings)
  12 siblings, 1 reply; 23+ messages in thread
From: Chengwen Feng @ 2026-06-04 13:28 UTC (permalink / raw)
  To: alex, jgg
  Cc: wathsala.vithanage, helgaas, wei.huang2, zhipingz, wangzhou1,
	wangyushan12, liuyonglong, kvm, linux-pci

Add a read-only VFIO device feature, TPH_CPU_ST, that lets userspace
resolve the PCIe TPH Steering Tag for a given CPU ID, memory type (VM/PM)
and TPH requester type (standard/extended).

Define compact flag bits to select the memory type and requester type.
Implement a batch query: the input buffer carries an array of CPU IDs, and
the corresponding resolved ST tags are written back to the same buffer.

Add strict sanity checks for the flags, count and reserved fields.
The feature is double gated:
1. The global enable_unsafe_tph module parameter must be enabled;
2. Userspace must first SET VFIO_DEVICE_FEATURE_TPH_ENABLE to set the
   per-device tph_permit flag before using TPH_CPU_ST.
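
For illustration, here is how a userspace caller might lay out a TPH_CPU_ST
request, assuming the uapi layout added in this patch. The sketch_* names
are hypothetical local mirrors, so the code builds without the new header.
The data buffer is shared: it must be sized for count u32 CPU IDs on input,
and on success the kernel overwrites its start with count u16 tags. count
must also be non-zero and no larger than nr_cpu_ids, otherwise the handler
returns -EINVAL.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* VFIO_DEVICE_FEATURE_GET bit and the feature number from this series. */
#define SKETCH_FEATURE_GET         (1u << 16)
#define SKETCH_FEATURE_TPH_CPU_ST  14u

/* Flag bits of struct vfio_device_feature_tph_cpu_st. */
#define SKETCH_CPU_ST_MEM_PM   (1u << 0) /* clear: volatile memory (VM) */
#define SKETCH_CPU_ST_REQ_EXT  (1u << 1) /* clear: standard TPH requester */

/* Local mirrors of the uapi structures, laid out back to back. */
struct sketch_cpu_st_req {
	struct {
		uint32_t argsz;
		uint32_t flags;
	} hdr;                  /* struct vfio_device_feature header */
	struct {
		uint32_t flags;
		uint16_t count;
		uint16_t reserved;  /* must be zero */
		uint64_t data_uptr; /* __aligned_u64 in the uapi header */
	} st;                   /* struct vfio_device_feature_tph_cpu_st */
};

/* Prepare a GET request; @buf holds @count u32 CPU IDs on input. */
static void sketch_build_cpu_st_req(struct sketch_cpu_st_req *req,
				    uint32_t *buf, uint16_t count,
				    int persistent, int extended)
{
	memset(req, 0, sizeof(*req));
	req->hdr.argsz = sizeof(*req);
	req->hdr.flags = SKETCH_FEATURE_GET | SKETCH_FEATURE_TPH_CPU_ST;
	req->st.flags = (persistent ? SKETCH_CPU_ST_MEM_PM : 0) |
			(extended ? SKETCH_CPU_ST_REQ_EXT : 0);
	req->st.count = count;
	req->st.data_uptr = (uint64_t)(uintptr_t)buf;
}

/* After a successful GET, tag i occupies the i-th u16 slot of the buffer. */
static uint16_t sketch_cpu_st_tag(const uint32_t *buf, unsigned int i)
{
	uint16_t tag;

	memcpy(&tag, (const unsigned char *)buf + i * sizeof(tag), sizeof(tag));
	return tag;
}
```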

Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
---
 drivers/vfio/pci/vfio_pci_core.c | 70 ++++++++++++++++++++++++++++++++
 include/uapi/linux/vfio.h        | 31 ++++++++++++++
 2 files changed, 101 insertions(+)

diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
index d5e534dd5829..91c0737727f2 100644
--- a/drivers/vfio/pci/vfio_pci_core.c
+++ b/drivers/vfio/pci/vfio_pci_core.c
@@ -29,6 +29,7 @@
 #include <linux/sched/mm.h>
 #include <linux/iommufd.h>
 #include <linux/pci-p2pdma.h>
+#include <linux/pci-tph.h>
 #if IS_ENABLED(CONFIG_EEH)
 #include <asm/eeh.h>
 #endif
@@ -1570,6 +1571,72 @@ static int vfio_pci_core_feature_tph_enable(struct vfio_pci_core_device *vdev,
 	return 0;
 }
 
+static int vfio_pci_core_feature_tph_cpu_st(struct vfio_pci_core_device *vdev,
+			u32 flags,
+			struct vfio_device_feature_tph_cpu_st __user *arg,
+			size_t argsz)
+{
+	struct vfio_device_feature_tph_cpu_st cpu_st;
+	struct pci_dev *pdev = vdev->pdev;
+	enum tph_mem_type mtype;
+	void __user *uptr;
+	bool extended;
+	int i, ret;
+	u32 *cpus;
+	u16 *sts;
+	u16 st;
+
+	if (!vdev->tph_permit)
+		return -EOPNOTSUPP;
+
+	ret = vfio_check_feature(flags, argsz, VFIO_DEVICE_FEATURE_GET,
+				 sizeof(cpu_st));
+	if (ret <= 0)
+		return ret;
+
+	if (copy_from_user(&cpu_st, arg, sizeof(cpu_st)))
+		return -EFAULT;
+
+	if (cpu_st.flags & ~(VFIO_TPH_CPU_ST_MEM_TYPE_MASK |
+			     VFIO_TPH_CPU_ST_REQ_TYPE_MASK) ||
+		cpu_st.count == 0 || cpu_st.count > nr_cpu_ids ||
+		cpu_st.reserved != 0)
+		return -EINVAL;
+
+	uptr = u64_to_user_ptr(cpu_st.data_uptr);
+	cpus = memdup_array_user(uptr, cpu_st.count, sizeof(u32));
+	if (IS_ERR(cpus))
+		return PTR_ERR(cpus);
+
+	sts = kcalloc(cpu_st.count, sizeof(u16), GFP_KERNEL);
+	if (!sts) {
+		ret = -ENOMEM;
+		goto out_free_cpus;
+	}
+
+	mtype = (cpu_st.flags & VFIO_TPH_CPU_ST_MEM_TYPE_MASK) ==
+		VFIO_TPH_CPU_ST_MEM_TYPE_VM ? TPH_MEM_TYPE_VM : TPH_MEM_TYPE_PM;
+	extended = !!(cpu_st.flags & VFIO_TPH_CPU_ST_REQ_TYPE_MASK);
+
+	for (i = 0; i < cpu_st.count; i++) {
+		ret = pcie_tph_get_cpu_st_explicit(pdev, mtype, extended,
+						   cpus[i], &st);
+		if (ret)
+			goto out_free_sts;
+		sts[i] = st;
+	}
+
+	ret = copy_to_user(uptr, sts, cpu_st.count * sizeof(u16));
+	if (ret)
+		ret = -EFAULT;
+
+out_free_sts:
+	kfree(sts);
+out_free_cpus:
+	kfree(cpus);
+	return ret;
+}
+
 int vfio_pci_core_ioctl_feature(struct vfio_device *device, u32 flags,
 				void __user *arg, size_t argsz)
 {
@@ -1590,6 +1657,9 @@ int vfio_pci_core_ioctl_feature(struct vfio_device *device, u32 flags,
 		return vfio_pci_core_feature_dma_buf(vdev, flags, arg, argsz);
 	case VFIO_DEVICE_FEATURE_TPH_ENABLE:
 		return vfio_pci_core_feature_tph_enable(vdev, flags, argsz);
+	case VFIO_DEVICE_FEATURE_TPH_CPU_ST:
+		return vfio_pci_core_feature_tph_cpu_st(vdev, flags,
+							arg, argsz);
 	default:
 		return -ENOTTY;
 	}
diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index e5a4d1d7091b..19b5fb7eeae6 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -1541,6 +1541,37 @@ struct vfio_device_feature_dma_buf {
  */
 #define VFIO_DEVICE_FEATURE_TPH_ENABLE	13
 
+/**
+ * VFIO_DEVICE_FEATURE_TPH_CPU_ST - Query TPH ST by CPU, mem and requester type
+ *
+ * Read-only interface to resolve a Steering Tag from:
+ *  - CPU ID
+ *  - Memory type (volatile / persistent)
+ *  - TPH requester type (standard 8-bit / extended 16-bit)
+ *
+ * @flags: bit0 - memory type, bit1 - TPH requester type
+ * @count: Number of CPU IDs to query consecutively
+ * @data_uptr: Userspace buffer: [in] u32 cpu_id array, [out] u16 st_tag array
+ *
+ * This feature requires two preconditions:
+ * 1. Global enable_unsafe_tph module parameter is enabled;
+ * 2. VFIO_DEVICE_FEATURE_TPH_ENABLE has been SET on the device beforehand.
+ */
+#define VFIO_DEVICE_FEATURE_TPH_CPU_ST	14
+
+struct vfio_device_feature_tph_cpu_st {
+	__u32 flags;
+#define VFIO_TPH_CPU_ST_MEM_TYPE_MASK	1u
+#define VFIO_TPH_CPU_ST_MEM_TYPE_VM	(0u << 0)
+#define VFIO_TPH_CPU_ST_MEM_TYPE_PM	(1u << 0)
+#define VFIO_TPH_CPU_ST_REQ_TYPE_MASK	(1u << 1)
+#define VFIO_TPH_CPU_ST_REQ_STANDARD	(0u << 1)
+#define VFIO_TPH_CPU_ST_REQ_EXTENDED	(1u << 1)
+	__u16 count;
+	__u16 reserved; /* Reserved for future use, must be zero */
+	__aligned_u64 data_uptr;
+};
+
 /* -------- API for Type1 VFIO IOMMU -------- */
 
 /**
-- 
2.17.1



* [PATCH v16 11/12] vfio/pci: Add TPH_ST_CONFIG for PCIe TPH ST configuration
  2026-06-04 13:27 [PATCH v16 00/12] vfio/pci: Add PCIe TPH support Chengwen Feng
                   ` (9 preceding siblings ...)
  2026-06-04 13:28 ` [PATCH v16 10/12] vfio/pci: Add TPH_CPU_ST to query CPU's TPH steering tag Chengwen Feng
@ 2026-06-04 13:28 ` Chengwen Feng
  2026-06-04 13:51   ` sashiko-bot
  2026-06-04 13:28 ` [PATCH v16 12/12] vfio/pci: Virtualize PCIe TPH capability registers Chengwen Feng
  2026-06-15 12:19 ` [PATCH v16 00/12] vfio/pci: Add PCIe TPH support fengchengwen
  12 siblings, 1 reply; 23+ messages in thread
From: Chengwen Feng @ 2026-06-04 13:28 UTC (permalink / raw)
  To: alex, jgg
  Cc: wathsala.vithanage, helgaas, wei.huang2, zhipingz, wangzhou1,
	wangyushan12, liuyonglong, kvm, linux-pci

Add a new VFIO device feature, VFIO_DEVICE_FEATURE_TPH_ST_CONFIG, that lets
userspace configure PCIe TPH Steering Tag table entries. The interface only
supports configuration writes (SET); reads (GET) are not permitted.

Implement a shadow ST table that caches the entries, protected against
concurrent access by a per-device mutex. If a batch write fails, the
entries already written are rolled back so that the hardware and the
shadow table stay consistent. While the TPH requester is disabled, only
the shadow table is updated; it is written to hardware once the requester
is enabled.

The feature is double gated:
1. The global enable_unsafe_tph module parameter must be enabled;
2. Userspace must first SET VFIO_DEVICE_FEATURE_TPH_ENABLE to set the
   per-device tph_permit flag before using TPH_ST_CONFIG.
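
The shadow-plus-rollback scheme can be modeled in a few lines of userspace
C. This is a minimal sketch under stated assumptions, not the kernel
implementation: the sketch_* names, the fixed 8-entry table and the
injected write failure are hypothetical, and locking and power management
are omitted. It mirrors three properties of the patch: the bounds check
never computes index + count, so it cannot overflow; entries already
written are restored from the shadow on failure; and while the requester
is disabled only the shadow is updated.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_ST_ENTRIES 8u
#define SKETCH_EINVAL 22
#define SKETCH_EIO     5

/* Hypothetical device: physical ST table, shadow copy, fault injector. */
struct sketch_st_dev {
	uint16_t hw[SKETCH_ST_ENTRIES];
	uint16_t shadow[SKETCH_ST_ENTRIES];
	bool tph_enabled;   /* TPH requester currently enabled? */
	int fail_at;        /* index whose hardware write fails, -1 for none */
};

static int sketch_hw_write(struct sketch_st_dev *d, unsigned int idx,
			   uint16_t tag)
{
	if ((int)idx == d->fail_at)
		return -SKETCH_EIO;
	d->hw[idx] = tag;
	return 0;
}

static int sketch_st_config(struct sketch_st_dev *d, unsigned int index,
			    unsigned int count, const uint16_t *sts)
{
	unsigned int i;
	int ret;

	/* Reject empty and out-of-range batches without computing index + count. */
	if (count == 0 || index >= SKETCH_ST_ENTRIES ||
	    count > SKETCH_ST_ENTRIES - index)
		return -SKETCH_EINVAL;

	if (d->tph_enabled) {
		for (i = 0; i < count; i++) {
			ret = sketch_hw_write(d, index + i, sts[i]);
			if (ret) {
				/* Restore the entries already written. */
				while (i-- > 0)
					sketch_hw_write(d, index + i,
							d->shadow[index + i]);
				return ret;
			}
		}
	}

	/* Update the shadow only after the hardware update fully succeeded. */
	memcpy(&d->shadow[index], sts, count * sizeof(*sts));
	return 0;
}
```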

Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
---
 drivers/vfio/pci/vfio_pci_core.c | 129 +++++++++++++++++++++++++++++++
 include/linux/vfio_pci_core.h    |   3 +
 include/uapi/linux/vfio.h        |  22 ++++++
 3 files changed, 154 insertions(+)

diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
index 91c0737727f2..0851fdf0daac 100644
--- a/drivers/vfio/pci/vfio_pci_core.c
+++ b/drivers/vfio/pci/vfio_pci_core.c
@@ -530,6 +530,50 @@ static const struct dev_pm_ops vfio_pci_core_pm_ops = {
 			   NULL)
 };
 
+static int vfio_pci_tph_st_shadow_size(struct vfio_pci_core_device *vdev)
+{
+	struct pci_dev *pdev = vdev->pdev;
+	u32 loc = pcie_tph_get_st_table_loc(pdev);
+	int ret;
+
+	if (loc == PCI_TPH_LOC_CAP) {
+		return pcie_tph_get_st_table_size(pdev);
+	} else if (loc == PCI_TPH_LOC_MSIX) {
+		ret = pci_msix_vec_count(pdev);
+		if (ret < 0)
+			return 0;
+		return ret;
+	} else {
+		return 0;
+	}
+}
+
+static int vfio_pci_tph_init(struct vfio_pci_core_device *vdev)
+{
+	vdev->tph_st_entries = 0;
+	vdev->tph_st_shadow = NULL;
+
+	if (!enable_unsafe_tph)
+		return 0;
+
+	vdev->tph_st_entries = vfio_pci_tph_st_shadow_size(vdev);
+	if (vdev->tph_st_entries) {
+		vdev->tph_st_shadow = kcalloc(vdev->tph_st_entries, sizeof(u16),
+					      GFP_KERNEL);
+		if (!vdev->tph_st_shadow)
+			return -ENOMEM;
+	}
+
+	return 0;
+}
+
+static void vfio_pci_tph_deinit(struct vfio_pci_core_device *vdev)
+{
+	kfree(vdev->tph_st_shadow);
+	vdev->tph_st_shadow = NULL;
+	vdev->tph_st_entries = 0;
+}
+
 int vfio_pci_core_enable(struct vfio_pci_core_device *vdev)
 {
 	struct pci_dev *pdev = vdev->pdev;
@@ -556,6 +600,11 @@ int vfio_pci_core_enable(struct vfio_pci_core_device *vdev)
 		goto out_disable_device;
 
 	vdev->reset_works = !ret;
+
+	ret = vfio_pci_tph_init(vdev);
+	if (ret)
+		goto out_disable_device;
+
 	pci_save_state(pdev);
 	vdev->pci_saved_state = pci_store_saved_state(pdev);
 	if (!vdev->pci_saved_state)
@@ -613,6 +662,7 @@ int vfio_pci_core_enable(struct vfio_pci_core_device *vdev)
 out_free_state:
 	kfree(vdev->pci_saved_state);
 	vdev->pci_saved_state = NULL;
+	vfio_pci_tph_deinit(vdev);
 out_disable_device:
 	pci_disable_device(pdev);
 out_power:
@@ -681,6 +731,7 @@ void vfio_pci_core_disable(struct vfio_pci_core_device *vdev)
 	kfree(vdev->region);
 	vdev->region = NULL; /* don't krealloc a freed pointer */
 
+	vfio_pci_tph_deinit(vdev);
 	vfio_config_free(vdev);
 
 	for (i = 0; i < PCI_STD_NUM_BARS; i++) {
@@ -1637,6 +1688,79 @@ static int vfio_pci_core_feature_tph_cpu_st(struct vfio_pci_core_device *vdev,
 	return ret;
 }
 
+static int vfio_pci_core_feature_tph_st_config(
+			struct vfio_pci_core_device *vdev,
+			u32 flags,
+			struct vfio_device_feature_tph_st_config __user *arg,
+			size_t argsz)
+{
+	struct vfio_device_feature_tph_st_config config;
+	struct pci_dev *pdev = vdev->pdev;
+	void __user *uptr;
+	int i, idx, ret;
+	size_t sz;
+	u16 *sts;
+
+	if (!vdev->tph_permit || !vdev->tph_st_shadow)
+		return -EOPNOTSUPP;
+
+	ret = vfio_check_feature(flags, argsz, VFIO_DEVICE_FEATURE_SET,
+				 sizeof(config));
+	if (ret <= 0)
+		return ret;
+
+	if (copy_from_user(&config, arg, sizeof(config)))
+		return -EFAULT;
+
+	if (config.count == 0 || config.reserved != 0 ||
+		config.index >= vdev->tph_st_entries ||
+		config.count > vdev->tph_st_entries - config.index)
+		return -EINVAL;
+
+	uptr = u64_to_user_ptr(config.data_uptr);
+	sts = memdup_array_user(uptr, config.count, sizeof(u16));
+	sz = config.count * sizeof(u16);
+	if (IS_ERR(sts))
+		return PTR_ERR(sts);
+
+	down_write(&vdev->memory_lock);
+	ret = vfio_pci_set_power_state(vdev, PCI_D0);
+	if (ret)
+		goto out_unlock_memory;
+
+	mutex_lock(&vdev->tph_lock);
+
+	if (pcie_tph_enabled_req_type(pdev) == PCI_TPH_REQ_DISABLE)
+		goto update_shadow;
+
+	for (i = 0; i < config.count; i++) {
+		idx = config.index + i;
+		ret = pcie_tph_set_st_entry(pdev, idx, sts[i]);
+		if (ret)
+			goto rollback;
+	}
+
+update_shadow:
+	memcpy(&vdev->tph_st_shadow[config.index], sts, sz);
+	ret = 0;
+	goto out_unlock_tph;
+
+rollback:
+	while (i-- > 0) {
+		idx = config.index + i;
+		pcie_tph_set_st_entry(pdev, idx, vdev->tph_st_shadow[idx]);
+	}
+
+out_unlock_tph:
+	mutex_unlock(&vdev->tph_lock);
+
+out_unlock_memory:
+	up_write(&vdev->memory_lock);
+
+	kfree(sts);
+	return ret;
+}
+
 int vfio_pci_core_ioctl_feature(struct vfio_device *device, u32 flags,
 				void __user *arg, size_t argsz)
 {
@@ -1660,6 +1784,9 @@ int vfio_pci_core_ioctl_feature(struct vfio_device *device, u32 flags,
 	case VFIO_DEVICE_FEATURE_TPH_CPU_ST:
 		return vfio_pci_core_feature_tph_cpu_st(vdev, flags,
 							arg, argsz);
+	case VFIO_DEVICE_FEATURE_TPH_ST_CONFIG:
+		return vfio_pci_core_feature_tph_st_config(vdev, flags,
+							   arg, argsz);
 	default:
 		return -ENOTTY;
 	}
@@ -2223,6 +2350,7 @@ int vfio_pci_core_init_dev(struct vfio_device *core_vdev)
 	mutex_init(&vdev->igate);
 	spin_lock_init(&vdev->irqlock);
 	mutex_init(&vdev->ioeventfds_lock);
+	mutex_init(&vdev->tph_lock);
 	INIT_LIST_HEAD(&vdev->dummy_resources_list);
 	INIT_LIST_HEAD(&vdev->ioeventfds_list);
 	INIT_LIST_HEAD(&vdev->sriov_pfs_item);
@@ -2244,6 +2372,7 @@ void vfio_pci_core_release_dev(struct vfio_device *core_vdev)
 
 	mutex_destroy(&vdev->igate);
 	mutex_destroy(&vdev->ioeventfds_lock);
+	mutex_destroy(&vdev->tph_lock);
 	kfree(vdev->region);
 	kfree(vdev->pm_save);
 }
diff --git a/include/linux/vfio_pci_core.h b/include/linux/vfio_pci_core.h
index 662a04943511..652e278974b7 100644
--- a/include/linux/vfio_pci_core.h
+++ b/include/linux/vfio_pci_core.h
@@ -143,6 +143,9 @@ struct vfio_pci_core_device {
 	struct notifier_block	nb;
 	struct rw_semaphore	memory_lock;
 	struct list_head	dmabufs;
+	struct mutex		tph_lock;
+	u16			*tph_st_shadow;
+	u16			tph_st_entries;
 };
 
 enum vfio_pci_io_width {
diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index 19b5fb7eeae6..3df52b4dcf46 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -1572,6 +1572,28 @@ struct vfio_device_feature_tph_cpu_st {
 	__aligned_u64 data_uptr;
 };
 
+/**
+ * VFIO_DEVICE_FEATURE_TPH_ST_CONFIG - Configure PCIe TPH Steering Tag entries
+ *
+ * Provides a SET-only interface to configure PCIe TPH ST table entries.
+ *
+ * @index: Start entry offset within ST table
+ * @count: Number of consecutive entries to configure
+ * @data_uptr: Userspace data buffer for 16-bit raw ST values
+ *
+ * This feature requires two preconditions:
+ * 1. Global enable_unsafe_tph module parameter is enabled;
+ * 2. VFIO_DEVICE_FEATURE_TPH_ENABLE has been SET on the device beforehand.
+ */
+#define VFIO_DEVICE_FEATURE_TPH_ST_CONFIG	15
+
+struct vfio_device_feature_tph_st_config {
+	__u16 index;
+	__u16 count;
+	__u32 reserved; /* Reserved for future use, must be zero */
+	__aligned_u64 data_uptr;
+};
+
 /* -------- API for Type1 VFIO IOMMU -------- */
 
 /**
-- 
2.17.1



* [PATCH v16 12/12] vfio/pci: Virtualize PCIe TPH capability registers
  2026-06-04 13:27 [PATCH v16 00/12] vfio/pci: Add PCIe TPH support Chengwen Feng
                   ` (10 preceding siblings ...)
  2026-06-04 13:28 ` [PATCH v16 11/12] vfio/pci: Add TPH_ST_CONFIG for PCIe TPH ST configuration Chengwen Feng
@ 2026-06-04 13:28 ` Chengwen Feng
  2026-06-04 14:05   ` sashiko-bot
  2026-06-15 12:19 ` [PATCH v16 00/12] vfio/pci: Add PCIe TPH support fengchengwen
  12 siblings, 1 reply; 23+ messages in thread
From: Chengwen Feng @ 2026-06-04 13:28 UTC (permalink / raw)
  To: alex, jgg
  Cc: wathsala.vithanage, helgaas, wei.huang2, zhipingz, wangzhou1,
	wangyushan12, liuyonglong, kvm, linux-pci

Virtualize the TPH extended capability config space registers:
- The TPH capability was previously fully read-only. Split the permissions
  so that the TPH_CAP register stays read-only while the TPH_CTRL register
  becomes writable to toggle the TPH requester enable mode.
- Block direct ST table programming through config space writes: ST entry
  configuration is restricted to the VFIO_DEVICE_FEATURE_TPH_ST_CONFIG
  feature, which is available only after userspace opts in by SETting
  TPH_ENABLE.
- Back up the original virtual config value, and revert vconfig if the
  hardware TPH enable operation fails or an invalid requester mode is
  configured.
- After the TPH requester is enabled via a CTRL write, sync the cached
  shadow ST table to the physical hardware under tph_lock, after bringing
  the device to D0.

Add a new helper, vfio_tph_mask_ext_tph_bit(), that masks the EXT_TPH
capability bit in vconfig when the underlying hardware does not support
Extended TPH.
Reset the hardware TPH state on device open/close so that TPH
configuration does not leak across user sessions (e.g. between VM
lifecycles).
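
A minimal userspace model of the Requester Enable handling described above,
assuming the TPH Control register layout from include/uapi/linux/pci_regs.h
(ST Mode Select in bits 2:0, Requester Enable in bits 9:8). The sketch_*
names are hypothetical. The sketch reproduces only the decision logic of
vfio_tph_config_write(). It leaves out the locking, the power-state
handling and the vconfig writes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Offset and fields as defined in include/uapi/linux/pci_regs.h. */
#define SKETCH_TPH_CTRL           0x8
#define SKETCH_TPH_CTRL_MODE_SEL  0x00000007u
#define SKETCH_TPH_CTRL_REQ_EN    0x00000300u
#define SKETCH_TPH_REQ_DISABLE    0x0u
#define SKETCH_TPH_REQ_TPH_ONLY   0x1u
#define SKETCH_TPH_REQ_EXT_TPH    0x3u

enum sketch_action { SK_ENABLE, SK_DISABLE, SK_REVERT };

/* Does a write of @count bytes at @offset touch the Requester Enable byte? */
static bool sketch_hits_req_en(int offset, int count)
{
	int req_en_byte = SKETCH_TPH_CTRL + 1;  /* REQ_EN lives in bits 9:8 */

	return offset <= req_en_byte && offset + count > req_en_byte;
}

/* Decide what to do with the virtual TPH Control value after a write. */
static enum sketch_action sketch_ctrl_action(uint32_t ctrl, uint8_t *mode,
					     bool *extended)
{
	uint32_t req = (ctrl & SKETCH_TPH_CTRL_REQ_EN) >> 8;

	*mode = (uint8_t)(ctrl & SKETCH_TPH_CTRL_MODE_SEL);
	*extended = (req == SKETCH_TPH_REQ_EXT_TPH);

	if (req == SKETCH_TPH_REQ_TPH_ONLY || req == SKETCH_TPH_REQ_EXT_TPH)
		return SK_ENABLE;   /* enable in HW, then replay the shadow table */
	if (req == SKETCH_TPH_REQ_DISABLE)
		return SK_DISABLE;
	return SK_REVERT;           /* reserved encoding: restore old vconfig */
}
```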

Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
---
 drivers/vfio/pci/vfio_pci_config.c | 108 +++++++++++++++++++++++++++++
 drivers/vfio/pci/vfio_pci_core.c   |   4 ++
 2 files changed, 112 insertions(+)

diff --git a/drivers/vfio/pci/vfio_pci_config.c b/drivers/vfio/pci/vfio_pci_config.c
index 5c6ab172df6c..23c32aa5842f 100644
--- a/drivers/vfio/pci/vfio_pci_config.c
+++ b/drivers/vfio/pci/vfio_pci_config.c
@@ -1086,6 +1086,109 @@ static int __init init_pci_ext_cap_pwr_perm(struct perm_bits *perm)
 	return 0;
 }
 
+/* Permissions for TPH extended capability */
+static int __init init_pci_ext_cap_tph_perm(struct perm_bits *perm)
+{
+	int i;
+
+	if (alloc_perm_bits(perm, pci_ext_cap_length[PCI_EXT_CAP_ID_TPH]))
+		return -ENOMEM;
+
+	p_setd(perm, PCI_TPH_CAP, ALL_VIRT, NO_WRITE);
+
+	p_setd(perm, PCI_TPH_CTRL, ALL_VIRT, ALL_WRITE);
+
+	/*
+	 * Per the PCIe specification, the ST table holds at most 64 entries
+	 * when it is located in the TPH Requester Extended Capability
+	 * structure. pci_ext_cap_length[PCI_EXT_CAP_ID_TPH] is 0xFF, which
+	 * is large enough to cover all 64 entries set up below.
+	 */
+	for (i = 0; i < 64; i++)
+		p_setw(perm, PCI_TPH_BASE_SIZEOF + i * sizeof(u16),
+		       (u16)ALL_VIRT, (u16)ALL_WRITE);
+
+	return 0;
+}
+
+static void vfio_tph_mask_ext_tph_bit(struct vfio_pci_core_device *vdev,
+				      int pos)
+{
+	__le32 *vptr = (__le32 *)&vdev->vconfig[pos + PCI_TPH_CAP];
+	struct pci_dev *pdev = vdev->pdev;
+	u32 val;
+
+	if (!pcie_tph_supported(pdev, true)) {
+		val = le32_to_cpu(*vptr);
+		val &= ~PCI_TPH_CAP_EXT_TPH;
+		*vptr = cpu_to_le32(val);
+	}
+}
+
+static int vfio_find_cap_start(struct vfio_pci_core_device *vdev, int pos);
+static int vfio_tph_config_write(struct vfio_pci_core_device *vdev, int pos,
+				 int count, struct perm_bits *perm,
+				 int offset, __le32 val)
+{
+	int req_en_byte = PCI_TPH_CTRL + 1;
+	struct pci_dev *pdev = vdev->pdev;
+	__le32 org_val = 0;
+	bool extended;
+	u8 mode, req;
+	int i, ret;
+	u16 start;
+	u32 data;
+
+	if (!vdev->tph_permit)
+		return count;
+
+	/* Back up the original value so it can be restored on failure */
+	if (offset <= req_en_byte && offset + count > req_en_byte)
+		vfio_default_config_read(vdev, pos, count, perm, offset,
+					 &org_val);
+
+	ret = vfio_default_config_write(vdev, pos, count, perm, offset, val);
+	if (ret != count)
+		return ret;
+
+	/* Skip if write range does not cover Requester Enable byte */
+	if (offset > req_en_byte || offset + count <= req_en_byte)
+		return count;
+
+	ret = vfio_pci_set_power_state(vdev, PCI_D0);
+	if (ret)
+		return count;
+
+	mutex_lock(&vdev->tph_lock);
+
+	start = vfio_find_cap_start(vdev, pos);
+	data = le32_to_cpu(*(__le32 *)&vdev->vconfig[start + PCI_TPH_CTRL]);
+	mode = FIELD_GET(PCI_TPH_CTRL_MODE_SEL_MASK, data);
+	req = FIELD_GET(PCI_TPH_CTRL_REQ_EN_MASK, data);
+
+	if (req == PCI_TPH_REQ_TPH_ONLY || req == PCI_TPH_REQ_EXT_TPH) {
+		extended = !!(req == PCI_TPH_REQ_EXT_TPH);
+		ret = pcie_enable_tph_explicit(pdev, mode, extended);
+		if (!ret && vdev->tph_st_shadow) {
+			for (i = 0; i < vdev->tph_st_entries; i++)
+				pcie_tph_set_st_entry(pdev, i,
+						      vdev->tph_st_shadow[i]);
+		}
+		if (ret)
+			vfio_default_config_write(vdev, pos, count, perm,
+						  offset, org_val);
+	} else if (req == PCI_TPH_REQ_DISABLE) {
+		pcie_disable_tph(pdev);
+	} else {
+		vfio_default_config_write(vdev, pos, count, perm, offset,
+					  org_val);
+	}
+
+	mutex_unlock(&vdev->tph_lock);
+
+	return count;
+}
+
 /*
  * Initialize the shared permission tables
  */
@@ -1101,6 +1204,7 @@ void vfio_pci_uninit_perm_bits(void)
 
 	free_perm_bits(&ecap_perms[PCI_EXT_CAP_ID_ERR]);
 	free_perm_bits(&ecap_perms[PCI_EXT_CAP_ID_PWR]);
+	free_perm_bits(&ecap_perms[PCI_EXT_CAP_ID_TPH]);
 }
 
 int __init vfio_pci_init_perm_bits(void)
@@ -1121,6 +1225,8 @@ int __init vfio_pci_init_perm_bits(void)
 	/* Extended capabilities */
 	ret |= init_pci_ext_cap_err_perm(&ecap_perms[PCI_EXT_CAP_ID_ERR]);
 	ret |= init_pci_ext_cap_pwr_perm(&ecap_perms[PCI_EXT_CAP_ID_PWR]);
+	ret |= init_pci_ext_cap_tph_perm(&ecap_perms[PCI_EXT_CAP_ID_TPH]);
+	ecap_perms[PCI_EXT_CAP_ID_TPH].writefn = vfio_tph_config_write;
 	ecap_perms[PCI_EXT_CAP_ID_VNDR].writefn = vfio_raw_config_write;
 	ecap_perms[PCI_EXT_CAP_ID_DVSEC].writefn = vfio_raw_config_write;
 
@@ -1704,6 +1810,8 @@ static int vfio_ecap_init(struct vfio_pci_core_device *vdev)
 		ret = vfio_fill_vconfig_bytes(vdev, epos, len);
 		if (ret)
 			return ret;
+		if (ecap == PCI_EXT_CAP_ID_TPH && !hidden)
+			vfio_tph_mask_ext_tph_bit(vdev, epos);
 
 		/*
 		 * If we're just using this capability to anchor the list,
diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
index 0851fdf0daac..615d73766b89 100644
--- a/drivers/vfio/pci/vfio_pci_core.c
+++ b/drivers/vfio/pci/vfio_pci_core.c
@@ -601,6 +601,8 @@ int vfio_pci_core_enable(struct vfio_pci_core_device *vdev)
 
 	vdev->reset_works = !ret;
 
+	/* Reset TPH status on new user session */
+	pcie_disable_tph(vdev->pdev);
 	ret = vfio_pci_tph_init(vdev);
 	if (ret)
 		goto out_disable_device;
@@ -731,6 +733,8 @@ void vfio_pci_core_disable(struct vfio_pci_core_device *vdev)
 	kfree(vdev->region);
 	vdev->region = NULL; /* don't krealloc a freed pointer */
 
+	/* Reset TPH status on session exit */
+	pcie_disable_tph(vdev->pdev);
 	vfio_pci_tph_deinit(vdev);
 	vfio_config_free(vdev);
 
-- 
2.17.1
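
The Requester Enable handling in vfio_tph_config_write() depends on two small computations. The first is whether a write of count bytes at offset touches byte PCI_TPH_CTRL + 1. The second is which ST Mode Select and Requester Enable values end up in the control register. Below is a userspace sketch of both. The macro values are local copies assumed to match include/uapi/linux/pci_regs.h, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Local copies of the TPH Control register layout; assumed to mirror
 * PCI_TPH_CTRL, PCI_TPH_CTRL_MODE_SEL_MASK and PCI_TPH_CTRL_REQ_EN_MASK.
 */
#define TPH_CTRL_OFFSET     0x08        /* offset of TPH Control within the cap */
#define TPH_CTRL_MODE_MASK  0x00000007u /* ST Mode Select, bits [2:0] */
#define TPH_CTRL_REQ_MASK   0x00000300u /* TPH Requester Enable, bits [9:8] */
#define TPH_CTRL_REQ_SHIFT  8

/* Byte holding the Requester Enable field (PCI_TPH_CTRL + 1). */
#define TPH_REQ_EN_BYTE     (TPH_CTRL_OFFSET + 1)

/* True if a config write of @count bytes at @offset touches @byte. */
static bool write_covers_byte(int offset, int count, int byte)
{
	return offset <= byte && offset + count > byte;
}

/* Extract ST Mode Select from a TPH Control register value. */
static uint8_t tph_ctrl_mode(uint32_t ctrl)
{
	return ctrl & TPH_CTRL_MODE_MASK;
}

/* Extract Requester Enable from a TPH Control register value. */
static uint8_t tph_ctrl_req(uint32_t ctrl)
{
	return (ctrl & TPH_CTRL_REQ_MASK) >> TPH_CTRL_REQ_SHIFT;
}
```

For example, a dword write at offset 8 reaches the Requester Enable byte, but a single-byte write at offset 8 does not. A control value of 0x301 decodes to mode 1 with Requester Enable 3.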


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH v16 08/12] vfio/pci: Hide TPH capability when TPH is unsupported
  2026-06-04 13:28 ` [PATCH v16 08/12] vfio/pci: Hide TPH capability when TPH is unsupported Chengwen Feng
@ 2026-06-04 13:48   ` sashiko-bot
  0 siblings, 0 replies; 23+ messages in thread
From: sashiko-bot @ 2026-06-04 13:48 UTC (permalink / raw)
  To: Chengwen Feng; +Cc: linux-pci, kvm

Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:

Pre-existing issues:
- [Critical] Missing bounds check on parsed PCIe capability length causes heap buffer overflow.
--

commit 2c400d6c4f255c3b600592b504b9399d3cd59033
Author: Chengwen Feng <fengchengwen@huawei.com>

vfio/pci: Hide TPH capability when TPH is unsupported

Check the device negotiated TPH support status before parsing the TPH
extended capability. Return zero length to hide the capability from
userspace if TPH is disabled during topology negotiation.

> diff --git a/drivers/vfio/pci/vfio_pci_config.c b/drivers/vfio/pci/vfio_pci_config.c
> index a10ed733f0e3a..5c6ab172df6c0 100644
> --- a/drivers/vfio/pci/vfio_pci_config.c
> +++ b/drivers/vfio/pci/vfio_pci_config.c
[ ... ]
> @@ -1450,6 +1451,8 @@ static int vfio_ext_cap_len(struct vfio_pci_core_device *vdev, u16 ecap, u16 epo
>  		byte &= PCI_DPA_CAP_SUBSTATE_MASK;
>  		return PCI_DPA_BASE_SIZEOF + byte + 1;
>  	case PCI_EXT_CAP_ID_TPH:
> +		if (!pcie_tph_supported(pdev, false))
> +			return 0;
>  		ret = pci_read_config_dword(pdev, epos + PCI_TPH_CAP, &dword);
>  		if (ret)
>  			return pcibios_err_to_errno(ret);

[Severity: Critical]
This is a pre-existing issue, but since we are adding support for another
variable-length capability, can the capability length returned here cause a
heap buffer overflow?

Looking at vfio_ecap_init() in drivers/vfio/pci/vfio_pci_config.c, the
returned length is used in memset() and vfio_fill_vconfig_bytes() without
bounds checking against the size of the config space buffer:

drivers/vfio/pci/vfio_pci_config.c:vfio_ecap_init() {
    ...
    len = vfio_ext_cap_len(vdev, ecap, epos);
    ...
    memset(map + epos, ecap, len);
    ret = vfio_fill_vconfig_bytes(vdev, epos, len);
    ...
}

If a malicious or compromised PCIe endpoint provides forged values for
capabilities like TPH or VNDR, resulting in a length that makes
epos + len > pdev->cfg_size, would this result in an out-of-bounds write
past the boundaries of the map and vconfig heap buffers?

A similar issue appears to exist for standard capabilities in vfio_cap_init()
where length is also not bounds-checked:

drivers/vfio/pci/vfio_pci_config.c:vfio_cap_init() {
    ...
    memset(map + pos, cap, len);
    ret = vfio_fill_vconfig_bytes(vdev, pos, len);
    ...
}
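
Below is a minimal sketch of the kind of check the review asks about. It assumes the parsed length is validated against the config space size (pdev->cfg_size in the kernel) before it is used in memset() or vfio_fill_vconfig_bytes(). The names cap_range_ok() and mark_cap() are hypothetical and are not existing VFIO helpers. The subtraction form avoids unsigned overflow in pos + len.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical check: does [pos, pos + len) fit inside cfg_size bytes? */
static int cap_range_ok(unsigned int pos, unsigned int len,
			unsigned int cfg_size)
{
	if (pos > cfg_size || len > cfg_size - pos)
		return -EINVAL;
	return 0;
}

/* Mark a capability's bytes in a config map, but only after the check. */
static int mark_cap(uint8_t *map, unsigned int cfg_size, unsigned int pos,
		    unsigned int len, uint8_t id)
{
	int ret = cap_range_ok(pos, len, cfg_size);

	if (ret)
		return ret;
	memset(map + pos, id, len);
	return 0;
}
```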

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260604132804.28678-1-fengchengwen@huawei.com?part=8

* Re: [PATCH v16 11/12] vfio/pci: Add TPH_ST_CONFIG for PCIe TPH ST configuration
  2026-06-04 13:28 ` [PATCH v16 11/12] vfio/pci: Add TPH_ST_CONFIG for PCIe TPH ST configuration Chengwen Feng
@ 2026-06-04 13:51   ` sashiko-bot
  0 siblings, 0 replies; 23+ messages in thread
From: sashiko-bot @ 2026-06-04 13:51 UTC (permalink / raw)
  To: Chengwen Feng; +Cc: linux-pci, kvm

Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:
- [High] Lockless modification of the `tph_permit` bitfield races with updates to adjacent bitfields, risking data corruption of critical device state flags.
- [Medium] The `tph_permit` flag is not reset when the VFIO device is closed, leaking state between user sessions.
--

commit 3866f62cd649303d4e63fcdfae7f9e49073c8b9c
Author: Chengwen Feng <fengchengwen@huawei.com>
Subject: vfio/pci: Add TPH_ST_CONFIG for PCIe TPH ST configuration

This patch adds the VFIO_DEVICE_FEATURE_TPH_ST_CONFIG feature to allow
userspace to configure PCIe TPH Steering Tag table entries. It uses a shadow
table to cache entries, paired with a per-device mutex for concurrent access
protection.

> diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
> index 7c6e3e1213d3e..e20c92a3dfcb1 100644
> --- a/drivers/vfio/pci/vfio_pci_core.c
> +++ b/drivers/vfio/pci/vfio_pci_core.c
[ ... ]
> @@ -496,6 +496,50 @@ static const struct dev_pm_ops vfio_pci_core_pm_ops = {
[ ... ]
> +static void vfio_pci_tph_deinit(struct vfio_pci_core_device *vdev)
> +{
> +	kfree(vdev->tph_st_shadow);
> +	vdev->tph_st_shadow = NULL;
> +	vdev->tph_st_entries = 0;
> +}

[Severity: Medium]
When the VFIO device is closed, should tph_permit be reset to 0 here or in
vfio_pci_core_disable()?

Because struct vfio_pci_core_device persists across opens, a subsequent
user opening the same device will inherit tph_permit = 1.

Could this allow the new user to bypass the explicit feature opt-in sequence
required by the API and cause cross-session state leakage?

[ ... ]
> diff --git a/include/linux/vfio_pci_core.h b/include/linux/vfio_pci_core.h
> index 5f32830766a0f..9cac61c63ead2 100644
> --- a/include/linux/vfio_pci_core.h
> +++ b/include/linux/vfio_pci_core.h
> @@ -143,6 +143,9 @@ struct vfio_pci_core_device {
>  	struct notifier_block	nb;
>  	struct rw_semaphore	memory_lock;
>  	struct list_head	dmabufs;
> +	struct mutex		tph_lock;
> +	u16			*tph_st_shadow;
> +	u16			tph_st_entries;
>  };

[Severity: High]
Is it safe to modify tph_permit locklessly?

In struct vfio_pci_core_device, tph_permit is defined as a 1-bit bitfield
adjacent to other state flags like pm_runtime_engaged.

Because bitfields share the same underlying memory word, modifying
tph_permit translates to a non-atomic read-modify-write operation.

If vfio_pci_core_feature_tph_enable() sets vdev->tph_permit = 1 without a
lock, could it race with concurrent ioctls modifying adjacent bitfields
(such as pm_runtime_engaged under vdev->igate) and cause silent state
corruption?
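
To make the read-modify-write concern concrete, the sketch below replays one bad interleaving in a single thread. It is an illustration, not a real concurrency test. The flag bit positions are hypothetical; the only thing that matters is that both flags share one storage word, as 1-bit bitfields do. The second struct shows the standalone-bool layout, where each member is its own memory location.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit positions inside one shared bitfield word. */
#define FLAG_PM_RUNTIME_ENGAGED (1u << 0)
#define FLAG_TPH_PERMIT         (1u << 1)

/*
 * Replays "tph_permit = 1" as load word, OR bit, store word, with
 * another CPU's update to a neighbouring flag landing in between.
 */
static uint32_t interleaved_bitfield_update(uint32_t word)
{
	uint32_t snapshot = word;        /* CPU A: load shared word */
	word |= FLAG_PM_RUNTIME_ENGAGED; /* CPU B: sets its own flag */
	snapshot |= FLAG_TPH_PERMIT;     /* CPU A: modify private copy */
	word = snapshot;                 /* CPU A: store; B's update is lost */
	return word;
}

/* Separate bool members: each store touches only its own byte. */
struct flags_separate {
	bool pm_runtime_engaged;
	bool tph_permit;
};

static struct flags_separate interleaved_separate_update(struct flags_separate f)
{
	f.pm_runtime_engaged = true; /* CPU B */
	f.tph_permit = true;         /* CPU A: independent memory location */
	return f;
}
```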

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260604132804.28678-1-fengchengwen@huawei.com?part=11

* Re: [PATCH v16 09/12] vfio/pci: Add TPH_ENABLE feature skeleton and unsafe module parameter
  2026-06-04 13:28 ` [PATCH v16 09/12] vfio/pci: Add TPH_ENABLE feature skeleton and unsafe module parameter Chengwen Feng
@ 2026-06-04 13:54   ` sashiko-bot
  2026-06-15  1:26     ` fengchengwen
  0 siblings, 1 reply; 23+ messages in thread
From: sashiko-bot @ 2026-06-04 13:54 UTC (permalink / raw)
  To: Chengwen Feng; +Cc: kvm, linux-pci

Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:
- [High] Lockless modification of the `tph_permit` bitfield causes a C11 data race with adjacent bitfields, potentially corrupting device state.
- [Medium] The `tph_permit` flag is not cleared when the device is closed, leaking the enabled state to subsequent users of the device.
--

commit d0a7c28453cdbf1699133633ce00cd13507a2648
Author: Chengwen Feng <fengchengwen@huawei.com>

vfio/pci: Add TPH_ENABLE feature skeleton and unsafe module parameter

This commit introduces the framework for PCIe TPH (Transaction Processing
Hints) support in VFIO. It adds a global enable_unsafe_tph module parameter
and a new VFIO_DEVICE_FEATURE_TPH_ENABLE uapi to allow userspace to set a
per-device tph_permit flag.

> diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
> index 3f8d093aacf8a..ae05343701f76 100644
> --- a/drivers/vfio/pci/vfio_pci_core.c
> +++ b/drivers/vfio/pci/vfio_pci_core.c

[ ... ]

> @@ -1516,6 +1517,24 @@ static int vfio_pci_core_feature_token(struct vfio_pci_core_device *vdev,
>  	return 0;
>  }
>  
> +static int vfio_pci_core_feature_tph_enable(struct vfio_pci_core_device *vdev,
> +					    u32 flags, size_t argsz)
> +{
> +	int ret;
> +
> +	if (!enable_unsafe_tph)
> +		return -EOPNOTSUPP;
> +
> +	ret = vfio_check_feature(flags, argsz, VFIO_DEVICE_FEATURE_SET, 0);
> +	if (ret <= 0)
> +		return ret;
> +
> +	if (!vdev->tph_permit)
> +		vdev->tph_permit = 1;

[Severity: High]
Does setting this bitfield without a lock cause a read-modify-write data
race with adjacent bitfields?

Looking at struct vfio_pci_core_device, tph_permit shares the same memory
word as pm_intx_masked and pm_runtime_engaged. Since C compilers implement
bitfield assignments via read-modify-write operations, could a concurrent
lockless update here overwrite a state change made elsewhere?

For example, vfio_pci_core_runtime_suspend() updates vdev->pm_intx_masked
without holding any locks. If the ioctl and runtime PM suspend execute
concurrently, it seems one could overwrite the other's state change, which
could cause the kernel to fail to unmask INTx interrupts during resume.

[Severity: Medium]
Is there a mechanism to clear this flag when the device is closed?

Because struct vfio_pci_core_device is tied to the PCI device lifetime and
is not reallocated between sessions, it appears a subsequent user opening
the device will inherit tph_permit = 1.

While hardware state might be cleaned up on close in functions like
vfio_pci_core_disable() or vfio_pci_tph_deinit(), could failing to clear
vdev->tph_permit bypass the userspace opt-in requirement for future sessions,
allowing a VMM that never called VFIO_DEVICE_FEATURE_TPH_ENABLE to configure
ST tables?
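
One way to close this gap is to clear the opt-in flag in the same teardown path that already frees the shadow ST table. The sketch below is a userspace model of that idea. struct tph_state is a hypothetical, trimmed-down stand-in for the TPH fields of struct vfio_pci_core_device, and free() plays the role of kfree().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for the per-device TPH state. */
struct tph_state {
	bool tph_permit;
	uint16_t *st_shadow;
	uint16_t st_entries;
};

/*
 * Session teardown: drop the shadow ST table and clear the opt-in flag,
 * so the next open starts from the same state as a freshly probed device.
 */
static void tph_session_reset(struct tph_state *s)
{
	free(s->st_shadow);
	s->st_shadow = NULL;
	s->st_entries = 0;
	s->tph_permit = false;
}
```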

> +
> +	return 0;
> +}
> +
>  int vfio_pci_core_ioctl_feature(struct vfio_device *device, u32 flags,
>  				void __user *arg, size_t argsz)
>  {

[ ... ]

> diff --git a/include/linux/vfio_pci_core.h b/include/linux/vfio_pci_core.h
> index 2ebba746c18f7..5f32830766a0f 100644
> --- a/include/linux/vfio_pci_core.h
> +++ b/include/linux/vfio_pci_core.h

[ ... ]

> @@ -127,6 +127,7 @@ struct vfio_pci_core_device {
>  	bool			needs_pm_restore:1;
>  	bool			pm_intx_masked:1;
>  	bool			pm_runtime_engaged:1;
> +	bool			tph_permit:1;
>  	struct pci_saved_state	*pci_saved_state;
>  	struct pci_saved_state	*pm_save;
>  	int			ioeventfds_nr;

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260604132804.28678-1-fengchengwen@huawei.com?part=9

* Re: [PATCH v16 10/12] vfio/pci: Add TPH_CPU_ST to query CPU's TPH steering tag
  2026-06-04 13:28 ` [PATCH v16 10/12] vfio/pci: Add TPH_CPU_ST to query CPU's TPH steering tag Chengwen Feng
@ 2026-06-04 13:55   ` sashiko-bot
  2026-06-15  1:31     ` fengchengwen
  0 siblings, 1 reply; 23+ messages in thread
From: sashiko-bot @ 2026-06-04 13:55 UTC (permalink / raw)
  To: Chengwen Feng; +Cc: kvm, linux-pci

Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:

New issues:
- [Medium] VFIO feature probing is broken for `VFIO_DEVICE_FEATURE_TPH_CPU_ST` due to premature capability check.

Pre-existing issues:
- [High] Data race on bitfields in `struct vfio_pci_core_device` due to unsynchronized updates to `tph_permit`.
--

commit 8e85ddd8531d274e066b704c2d380e850a4df8e5
Author: Chengwen Feng <fengchengwen@huawei.com>

vfio/pci: Add TPH_CPU_ST to query CPU's TPH steering tag

This commit adds a new read-only VFIO device feature
VFIO_DEVICE_FEATURE_TPH_CPU_ST to allow userspace to query PCIe TPH
Steering Tags by CPU ID. It implements a batch query logic and is
double-gated by a module parameter and a per-device tph_permit flag.

> diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
> index ae05343701f76..7c6e3e1213d3e 100644
> --- a/drivers/vfio/pci/vfio_pci_core.c
> +++ b/drivers/vfio/pci/vfio_pci_core.c
[ ... ]
> @@ -1535,6 +1536,72 @@ static int vfio_pci_core_feature_tph_enable(struct vfio_pci_core_device *vdev,
>  	return 0;
>  }
>  
> +static int vfio_pci_core_feature_tph_cpu_st(struct vfio_pci_core_device *vdev,
> +			u32 flags,
> +			struct vfio_device_feature_tph_cpu_st __user *arg,
> +			size_t argsz)
> +{
> +	struct vfio_device_feature_tph_cpu_st cpu_st;
> +	struct pci_dev *pdev = vdev->pdev;
> +	enum tph_mem_type mtype;
> +	void __user *uptr;
> +	bool extended;
> +	int i, ret;
> +	u32 *cpus;
> +	u16 *sts;
> +	u16 st;
> +
> +	if (!vdev->tph_permit)

[Severity: High]
This is a pre-existing issue, but does the tph_permit flag suffer from a data
race due to being packed in a bitfield without locking?

The tph_permit flag is packed as a bitfield (bool tph_permit:1;) alongside
other volatile device state flags like bardirty and needs_reset in
struct vfio_pci_core_device.

The vfio_pci_core_feature_tph_enable() ioctl modifies vdev->tph_permit
without holding any device-level locks:

    if (!vdev->tph_permit)
        vdev->tph_permit = 1;

Since bitfield modifications are compiled into non-atomic read-modify-write
operations on the underlying word, executing the TPH enable ioctl
concurrently with other state changes (such as vdev->bardirty = true; in
vfio_pci_core_write(), which is executed under memory_lock read-side) could
cause a data race. Can this silently overwrite and drop state changes,
potentially causing the device to malfunction due to missed resets
or unflushed BARs?

> +		return -EOPNOTSUPP;
> +
> +	ret = vfio_check_feature(flags, argsz, VFIO_DEVICE_FEATURE_GET,
> +				 sizeof(cpu_st));

[Severity: Medium]
Will this early tph_permit check break VFIO feature probing for
VFIO_DEVICE_FEATURE_TPH_CPU_ST?

Because the dynamic permission check intercepts the call first, probing
TPH_CPU_ST will incorrectly fail with -EOPNOTSUPP unless the user has
already dynamically enabled TPH via VFIO_DEVICE_FEATURE_TPH_ENABLE.

vfio_check_feature() handles VFIO_DEVICE_FEATURE_PROBE operations, which
userspace uses to discover if the kernel understands a feature ioctl.
Moving the tph_permit check after vfio_check_feature() would resolve this,
as PROBE returns 0 immediately.
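
The suggested ordering can be modelled in userspace. check_feature() below reflects my reading of vfio_check_feature() in include/linux/vfio.h: a negative return is an error, 0 means the request is fully handled (PROBE), and 1 means the caller should proceed. The FEAT_* values are assumed to match the VFIO_DEVICE_FEATURE_GET/SET/PROBE bits in the VFIO uapi header. The handler is a hypothetical simplification of vfio_pci_core_feature_tph_cpu_st().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed to mirror VFIO_DEVICE_FEATURE_GET/SET/PROBE. */
#define FEAT_GET   (1u << 16)
#define FEAT_SET   (1u << 17)
#define FEAT_PROBE (1u << 18)

/* Model of vfio_check_feature(): <0 error, 0 handled (PROBE), 1 proceed. */
static int check_feature(uint32_t flags, size_t argsz,
			 uint32_t supported_ops, size_t minsz)
{
	if ((flags & (FEAT_GET | FEAT_SET)) & ~supported_ops)
		return -EINVAL;
	if (flags & FEAT_PROBE)
		return 0;
	if (!(flags & (FEAT_GET | FEAT_SET)))
		return -EINVAL;
	if (argsz < minsz)
		return -EINVAL;
	return 1;
}

/* Generic flag validation first, runtime permission second. */
static int feature_tph_cpu_st(bool tph_permit, uint32_t flags,
			      size_t argsz, size_t minsz)
{
	int ret = check_feature(flags, argsz, FEAT_GET, minsz);

	if (ret <= 0)
		return ret;
	if (!tph_permit)
		return -EOPNOTSUPP;
	return 1; /* would go on to copy in the request and look up tags */
}
```

With this order, PROBE reports the feature even before TPH_ENABLE has been issued, while a real GET is still refused until tph_permit is set.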

> +	if (ret <= 0)
> +		return ret;

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260604132804.28678-1-fengchengwen@huawei.com?part=10

* Re: [PATCH v16 12/12] vfio/pci: Virtualize PCIe TPH capability registers
  2026-06-04 13:28 ` [PATCH v16 12/12] vfio/pci: Virtualize PCIe TPH capability registers Chengwen Feng
@ 2026-06-04 14:05   ` sashiko-bot
  2026-06-15  3:57     ` fengchengwen
  0 siblings, 1 reply; 23+ messages in thread
From: sashiko-bot @ 2026-06-04 14:05 UTC (permalink / raw)
  To: Chengwen Feng; +Cc: kvm, linux-pci

Thank you for your contribution! Sashiko AI review found 6 potential issue(s) to consider:

New issues:
- [Critical] Concurrent config writes cause Double-Free/Use-After-Free in power management saved state.
- [High] Missing virtual configuration rollback on power state transition failure.
- [High] Missing virtualization of TPH Capability Header leaks the physical Next Capability Pointer to the guest.
- [High] Race condition (TOCTOU) allows concurrent config writes to corrupt virtual state and silently lose write requests.
- [Medium] Silent failure when programming TPH shadow ST table entries into hardware.

Pre-existing issues:
- [High] Out-of-bounds write in extended capability parsing loop due to missing boundary checks.
--

commit 32f3411184cca80f87b3f32db73925b8e64d5792
Author: Chengwen Feng <fengchengwen@huawei.com>

vfio/pci: Virtualize PCIe TPH capability registers

This commit virtualizes TPH extended capability registers, splitting
permissions so the header is read-only and the control register is writable.
It restricts ST-table programming to a specific feature and syncs the
cached shadow table to hardware upon enablement.

> diff --git a/drivers/vfio/pci/vfio_pci_config.c b/drivers/vfio/pci/vfio_pci_config.c
> index 5c6ab172df6c0..23c32aa5842fa 100644
> --- a/drivers/vfio/pci/vfio_pci_config.c
> +++ b/drivers/vfio/pci/vfio_pci_config.c
> @@ -1086,6 +1086,109 @@ static int __init init_pci_ext_cap_pwr_perm(struct perm_bits *perm)
>  	return 0;
>  }
>  
> +/* Permissions for TPH extended capability */
> +static int __init init_pci_ext_cap_tph_perm(struct perm_bits *perm)
> +{
> +	int i;
> +
> +	if (alloc_perm_bits(perm, pci_ext_cap_length[PCI_EXT_CAP_ID_TPH]))
> +		return -ENOMEM;
> +
> +	p_setd(perm, PCI_TPH_CAP, ALL_VIRT, NO_WRITE);

[Severity: High]
Does this miss virtualizing the TPH Capability Header at offset 0?

Other capabilities use p_setd(perm, 0, ALL_VIRT, NO_WRITE) to ensure the
guest reads the virtualized header instead of the physical one. Without this,
could guest reads to the TPH capability header fall through to physical
hardware, leaking the physical Next Capability Pointer and breaking VFIO's
ability to hide capabilities?
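
The concern comes down to how a virtualized read merges two sources. Bits marked virtual come from the emulated vconfig copy, and all other bits come from the device. The sketch below models that merge under the assumption that ALL_VIRT corresponds to an all-ones mask. It does not reproduce the kernel's read path verbatim. The header layout (ID in bits [15:0], version in [19:16], next pointer in [31:20]) follows the PCIe extended capability header, and 0x17 is the TPH capability ID.

```c
#include <assert.h>
#include <stdint.h>

/* Bits set in virt_mask come from vconfig; the rest come from hardware. */
static uint32_t merged_read(uint32_t phys, uint32_t vconfig, uint32_t virt_mask)
{
	return (phys & ~virt_mask) | (vconfig & virt_mask);
}

/* Next Capability Offset field of an extended capability header. */
static uint16_t ecap_next(uint32_t hdr)
{
	return hdr >> 20;
}
```

With a zero mask, the device's next pointer (here 0x200) is visible. With an all-ones mask, the user sees the pointer VFIO wrote into vconfig instead.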

[ ... ]

> +static int vfio_tph_config_write(struct vfio_pci_core_device *vdev, int pos,
> +				 int count, struct perm_bits *perm,
> +				 int offset, __le32 val)
> +{
> +	int req_en_byte = PCI_TPH_CTRL + 1;
> +	struct pci_dev *pdev = vdev->pdev;
> +	__le32 org_val = 0;
> +	bool extended;
> +	u8 mode, req;
> +	int i, ret;
> +	u16 start;
> +	u32 data;
> +
> +	if (!vdev->tph_permit)
> +		return count;
> +
> +	/* Back up the original values in order rollback when fail */
> +	if (offset <= req_en_byte && offset + count > req_en_byte)
> +		vfio_default_config_read(vdev, pos, count, perm, offset,
> +					 &org_val);
> +
> +	ret = vfio_default_config_write(vdev, pos, count, perm, offset, val);
> +	if (ret != count)
> +		return ret;

[Severity: High]
Can this create a race condition where concurrent writes corrupt the virtual
state?

Modifying vconfig via vfio_default_config_write() happens before acquiring
vdev->tph_lock. If two threads concurrently write to the control register,
both modify vconfig unprotected. If the first thread acquires the lock but
fails to program hardware, its rollback would blindly overwrite the second
thread's new state in vconfig.

When the second thread subsequently acquires the lock, wouldn't it read the
reverted vconfig value (the first thread's old state) and write it to
hardware, effectively discarding the second thread's write while returning
success?
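
Below is a userspace sketch of the alternative: take the lock before touching the virtual register, and keep it held through hardware programming and any rollback. A failed write then restores exactly the value it replaced, and it cannot clobber another writer's update. struct tph_dev, a pthread mutex standing in for tph_lock, and the program_hw hook (which could represent the power-up step or the TPH enable) are all hypothetical. Returning the error here is purely for illustration; the handler in the patch returns count.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdint.h>

/* Hypothetical model: one virtual control byte plus the programmed value. */
struct tph_dev {
	pthread_mutex_t lock;
	uint8_t vctrl;   /* emulated register seen by the user */
	uint8_t hw_ctrl; /* value last programmed into hardware */
	int (*program_hw)(struct tph_dev *dev, uint8_t val); /* may fail */
};

/* Backup, virtual update, hardware programming and rollback under one lock. */
static int tph_ctrl_write(struct tph_dev *dev, uint8_t val)
{
	uint8_t old;
	int ret;

	pthread_mutex_lock(&dev->lock);
	old = dev->vctrl;
	dev->vctrl = val;
	ret = dev->program_hw(dev, val);
	if (ret)
		dev->vctrl = old; /* roll back on any failure */
	pthread_mutex_unlock(&dev->lock);
	return ret;
}

static int hw_ok(struct tph_dev *dev, uint8_t val)
{
	dev->hw_ctrl = val;
	return 0;
}

static int hw_fail(struct tph_dev *dev, uint8_t val)
{
	(void)dev;
	(void)val;
	return -EIO;
}
```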

> +
> +	/* Skip if write range does not cover Requester Enable byte */
> +	if (offset > req_en_byte || offset + count <= req_en_byte)
> +		return count;
> +
> +	ret = vfio_pci_set_power_state(vdev, PCI_D0);

[Severity: Critical]
Does calling vfio_pci_set_power_state() without a serializing lock expose a
double-free vulnerability?

If a guest VM concurrently writes to the TPH control register while the
physical device is in D3hot, both threads can evaluate needs_restore = true.
After the first thread transitions the device to D0, both threads will
concurrently execute pci_load_and_free_saved_state(pdev, &vdev->pm_save).

Since pci_load_and_free_saved_state() has no internal locking and directly
frees the shared pointer, could this concurrent execution cause a double-free
in the host kernel?
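
The usual remedy is to make the "check pointer, restore, free, clear" sequence atomic with respect to other callers. The sketch below uses a pthread mutex as a stand-in for whichever kernel lock ends up serializing the power transition. struct pm_state and restore_saved_state() are hypothetical names. The first caller consumes the saved state, and any later or concurrent caller finds NULL and does nothing.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for vdev->pm_save and the lock guarding it. */
struct pm_state {
	pthread_mutex_t lock;
	void *pm_save;
	int restores; /* how many times the saved state was consumed */
};

/* Consume-and-free under the lock so the state is freed exactly once. */
static void restore_saved_state(struct pm_state *s)
{
	pthread_mutex_lock(&s->lock);
	if (s->pm_save) {
		/* ... load the saved config space here ... */
		free(s->pm_save);
		s->pm_save = NULL;
		s->restores++;
	}
	pthread_mutex_unlock(&s->lock);
}
```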

> +	if (ret)
> +		return count;

[Severity: High]
Is a rollback of the virtual configuration missing in this error path?

If the power state transition fails, the function returns count (indicating
success to userspace) but fails to roll back vconfig using the org_val backed
up earlier. Doesn't this leave vconfig falsely indicating that the TPH
Requester is configured while the physical hardware remains unprogrammed?

> +
> +	mutex_lock(&vdev->tph_lock);
> +
> +	start = vfio_find_cap_start(vdev, pos);
> +	data = le32_to_cpu(*(__le32 *)&vdev->vconfig[start + PCI_TPH_CTRL]);
> +	mode = FIELD_GET(PCI_TPH_CTRL_MODE_SEL_MASK, data);
> +	req = FIELD_GET(PCI_TPH_CTRL_REQ_EN_MASK, data);
> +
> +	if (req == PCI_TPH_REQ_TPH_ONLY || req == PCI_TPH_REQ_EXT_TPH) {
> +		extended = !!(req == PCI_TPH_REQ_EXT_TPH);
> +		ret = pcie_enable_tph_explicit(pdev, mode, extended);
> +		if (!ret && vdev->tph_st_shadow) {
> +			for (i = 0; i < vdev->tph_st_entries; i++)
> +				pcie_tph_set_st_entry(pdev, i,
> +						      vdev->tph_st_shadow[i]);

[Severity: Medium]
Should the return value of pcie_tph_set_st_entry() be checked here?

If programming any ST entry fails, such as from hardware rejection or
inaccessible registers, the failure appears to be silently ignored. Could
this leave the hardware partially programmed while userspace is told the
operation succeeded?
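
If the error were propagated, one possible shape is a replay loop that stops at the first failing entry and reports how far it got, so the caller can roll back or report the failure. The sketch below is hypothetical throughout. set_st_fn stands in for pcie_tph_set_st_entry(), and fake_set_st() is a test double that fails from a chosen index onward.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical hook standing in for pcie_tph_set_st_entry(). */
typedef int (*set_st_fn)(void *ctx, unsigned int index, uint16_t tag);

/* Replay the shadow ST table; stop and return the first error. */
static int replay_st_table(void *ctx, set_st_fn set_st,
			   const uint16_t *shadow, unsigned int entries,
			   unsigned int *programmed)
{
	unsigned int i;
	int ret;

	for (i = 0; i < entries; i++) {
		ret = set_st(ctx, i, shadow[i]);
		if (ret) {
			*programmed = i;
			return ret;
		}
	}
	*programmed = entries;
	return 0;
}

/* Test double: accepts writes below fail_at, then returns -EIO. */
struct fake_hw {
	uint16_t table[8];
	unsigned int fail_at;
};

static int fake_set_st(void *ctx, unsigned int index, uint16_t tag)
{
	struct fake_hw *hw = ctx;

	if (index >= hw->fail_at)
		return -EIO;
	hw->table[index] = tag;
	return 0;
}
```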

> +		}

[ ... ]

> @@ -1704,6 +1810,8 @@ static int vfio_ecap_init(struct vfio_pci_core_device *vdev)
>  		ret = vfio_fill_vconfig_bytes(vdev, epos, len);

[Severity: High]
This is a pre-existing issue, but could this capability parsing loop allow
out-of-bounds heap corruption?

If a malicious hardware device or hotplugged peripheral provides a malformed
capability list with a large length near the end of the PCIe config space,
epos + len can exceed pdev->cfg_size. This would cause a buffer overflow in
memset() and vfio_fill_vconfig_bytes(), corrupting the heap-allocated
vdev->pci_config_map and vdev->vconfig buffers.

>  		if (ret)
>  			return ret;
> +		if (ecap == PCI_EXT_CAP_ID_TPH && !hidden)
> +			vfio_tph_mask_ext_tph_bit(vdev, epos);

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260604132804.28678-1-fengchengwen@huawei.com?part=12

* Re: [PATCH v16 09/12] vfio/pci: Add TPH_ENABLE feature skeleton and unsafe module parameter
  2026-06-04 13:54   ` sashiko-bot
@ 2026-06-15  1:26     ` fengchengwen
  0 siblings, 0 replies; 23+ messages in thread
From: fengchengwen @ 2026-06-15  1:26 UTC (permalink / raw)
  To: sashiko-reviews; +Cc: kvm, linux-pci

On 6/4/2026 9:54 PM, sashiko-bot@kernel.org wrote:
> Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:
> - [High] Lockless modification of the `tph_permit` bitfield causes a C11 data race with adjacent bitfields, potentially corrupting device state.
> - [Medium] The `tph_permit` flag is not cleared when the device is closed, leaking the enabled state to subsequent users of the device.

Thanks for the feedback.

I'll address these two points in v17:
- Change tph_permit to a standalone bool to fix the bitfield data race; no lock is needed.
- Reset tph_permit on device close to prevent state leakage across sessions.

Thanks

> --
> 
> commit d0a7c28453cdbf1699133633ce00cd13507a2648
> Author: Chengwen Feng <fengchengwen@huawei.com>
> 
> vfio/pci: Add TPH_ENABLE feature skeleton and unsafe module parameter
> 
> This commit introduces the framework for PCIe TPH (Transaction Processing
> Hints) support in VFIO. It adds a global enable_unsafe_tph module parameter
> and a new VFIO_DEVICE_FEATURE_TPH_ENABLE uapi to allow userspace to set a
> per-device tph_permit flag.
> 
>> diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
>> index 3f8d093aacf8a..ae05343701f76 100644
>> --- a/drivers/vfio/pci/vfio_pci_core.c
>> +++ b/drivers/vfio/pci/vfio_pci_core.c
> 
> [ ... ]
> 
>> @@ -1516,6 +1517,24 @@ static int vfio_pci_core_feature_token(struct vfio_pci_core_device *vdev,
>>  	return 0;
>>  }
>>  
>> +static int vfio_pci_core_feature_tph_enable(struct vfio_pci_core_device *vdev,
>> +					    u32 flags, size_t argsz)
>> +{
>> +	int ret;
>> +
>> +	if (!enable_unsafe_tph)
>> +		return -EOPNOTSUPP;
>> +
>> +	ret = vfio_check_feature(flags, argsz, VFIO_DEVICE_FEATURE_SET, 0);
>> +	if (ret <= 0)
>> +		return ret;
>> +
>> +	if (!vdev->tph_permit)
>> +		vdev->tph_permit = 1;
> 
> [Severity: High]
> Does setting this bitfield without a lock cause a read-modify-write data
> race with adjacent bitfields?
> 
> Looking at struct vfio_pci_core_device, tph_permit shares the same memory
> word as pm_intx_masked and pm_runtime_engaged. Since C compilers implement
> bitfield assignments via read-modify-write operations, could a concurrent
> lockless update here overwrite a state change made elsewhere?
> 
> For example, vfio_pci_core_runtime_suspend() updates vdev->pm_intx_masked
> without holding any locks. If the ioctl and runtime PM suspend execute
> concurrently, it seems one could overwrite the other's state change, which
> could cause the kernel to fail to unmask INTx interrupts during resume.
> 
> [Severity: Medium]
> Is there a mechanism to clear this flag when the device is closed?
> 
> Because struct vfio_pci_core_device is tied to the PCI device lifetime and
> is not reallocated between sessions, it appears a subsequent user opening
> the device will inherit tph_permit = 1.
> 
> While hardware state might be cleaned up on close in functions like
> vfio_pci_core_disable() or vfio_pci_tph_deinit(), could failing to clear
> vdev->tph_permit bypass the userspace opt-in requirement for future sessions,
> allowing a VMM that never called VFIO_DEVICE_FEATURE_TPH_ENABLE to configure
> ST tables?
> 
>> +
>> +	return 0;
>> +}
>> +
>>  int vfio_pci_core_ioctl_feature(struct vfio_device *device, u32 flags,
>>  				void __user *arg, size_t argsz)
>>  {
> 
> [ ... ]
> 
>> diff --git a/include/linux/vfio_pci_core.h b/include/linux/vfio_pci_core.h
>> index 2ebba746c18f7..5f32830766a0f 100644
>> --- a/include/linux/vfio_pci_core.h
>> +++ b/include/linux/vfio_pci_core.h
> 
> [ ... ]
> 
>> @@ -127,6 +127,7 @@ struct vfio_pci_core_device {
>>  	bool			needs_pm_restore:1;
>>  	bool			pm_intx_masked:1;
>>  	bool			pm_runtime_engaged:1;
>> +	bool			tph_permit:1;
>>  	struct pci_saved_state	*pci_saved_state;
>>  	struct pci_saved_state	*pm_save;
>>  	int			ioeventfds_nr;
> 


* Re: [PATCH v16 10/12] vfio/pci: Add TPH_CPU_ST to query CPU's TPH steering tag
  2026-06-04 13:55   ` sashiko-bot
@ 2026-06-15  1:31     ` fengchengwen
  0 siblings, 0 replies; 23+ messages in thread
From: fengchengwen @ 2026-06-15  1:31 UTC (permalink / raw)
  To: sashiko-reviews; +Cc: kvm, linux-pci

On 6/4/2026 9:55 PM, sashiko-bot@kernel.org wrote:
> Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:
> 
> New issues:
> - [Medium] VFIO feature probing is broken for `VFIO_DEVICE_FEATURE_TPH_CPU_ST` due to premature capability check.
> 
> Pre-existing issues:
> - [High] Data race on bitfields in `struct vfio_pci_core_device` due to unsynchronized updates to `tph_permit`.
> --

Thanks for the review.

I will fix both issues in v17:
1. Change tph_permit from bitfield to an independent bool variable.
   Single-byte access is atomic, which resolves the lockless data race completely.
2. Adjust the code order: move the tph_permit permission check after vfio_check_feature(),
   so that VFIO_DEVICE_FEATURE_PROBE works normally for TPH_CPU_ST.

Thanks

> 
> commit 8e85ddd8531d274e066b704c2d380e850a4df8e5
> Author: Chengwen Feng <fengchengwen@huawei.com>
> 
> vfio/pci: Add TPH_CPU_ST to query CPU's TPH steering tag
> 
> This commit adds a new read-only VFIO device feature
> VFIO_DEVICE_FEATURE_TPH_CPU_ST to allow userspace to query PCIe TPH
> Steering Tags by CPU ID. It implements a batch query logic and is
> double-gated by a module parameter and a per-device tph_permit flag.
> 
>> diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
>> index ae05343701f76..7c6e3e1213d3e 100644
>> --- a/drivers/vfio/pci/vfio_pci_core.c
>> +++ b/drivers/vfio/pci/vfio_pci_core.c
> [ ... ]
>> @@ -1535,6 +1536,72 @@ static int vfio_pci_core_feature_tph_enable(struct vfio_pci_core_device *vdev,
>>  	return 0;
>>  }
>>  
>> +static int vfio_pci_core_feature_tph_cpu_st(struct vfio_pci_core_device *vdev,
>> +			u32 flags,
>> +			struct vfio_device_feature_tph_cpu_st __user *arg,
>> +			size_t argsz)
>> +{
>> +	struct vfio_device_feature_tph_cpu_st cpu_st;
>> +	struct pci_dev *pdev = vdev->pdev;
>> +	enum tph_mem_type mtype;
>> +	void __user *uptr;
>> +	bool extended;
>> +	int i, ret;
>> +	u32 *cpus;
>> +	u16 *sts;
>> +	u16 st;
>> +
>> +	if (!vdev->tph_permit)
> 
> [Severity: High]
> This is a pre-existing issue, but does the tph_permit flag suffer from a data
> race due to being packed in a bitfield without locking?
> 
> The tph_permit flag is packed as a bitfield (bool tph_permit:1;) alongside
> other volatile device state flags like bardirty and needs_reset in
> struct vfio_pci_core_device.
> 
> The vfio_pci_core_feature_tph_enable() ioctl modifies vdev->tph_permit
> without holding any device-level locks:
> 
>     if (!vdev->tph_permit)
>         vdev->tph_permit = 1;
> 
> Since bitfield modifications are compiled into non-atomic read-modify-write
> operations on the underlying word, executing the TPH enable ioctl
> concurrently with other state changes (such as vdev->bardirty = true; in
> vfio_pci_core_write(), which is executed under memory_lock read-side) could
> cause a data race. Can this silently overwrite and drop state changes,
> potentially causing the device to malfunction due to missed resets
> or unflushed BARs?
> 
>> +		return -EOPNOTSUPP;
>> +
>> +	ret = vfio_check_feature(flags, argsz, VFIO_DEVICE_FEATURE_GET,
>> +				 sizeof(cpu_st));
> 
> [Severity: Medium]
> Will this early tph_permit check break VFIO feature probing for
> VFIO_DEVICE_FEATURE_TPH_CPU_ST?
> 
> Because the dynamic permission check intercepts the call first, probing
> TPH_CPU_ST will incorrectly fail with -EOPNOTSUPP unless the user has
> already dynamically enabled TPH via VFIO_DEVICE_FEATURE_TPH_ENABLE.
> 
> vfio_check_feature() handles VFIO_DEVICE_FEATURE_PROBE operations, which
> userspace uses to discover if the kernel understands a feature ioctl.
> Moving the tph_permit check after vfio_check_feature() would resolve this,
> as PROBE returns 0 immediately.
> 
>> +	if (ret <= 0)
>> +		return ret;
> 


* Re: [PATCH v16 12/12] vfio/pci: Virtualize PCIe TPH capability registers
  2026-06-04 14:05   ` sashiko-bot
@ 2026-06-15  3:57     ` fengchengwen
  0 siblings, 0 replies; 23+ messages in thread
From: fengchengwen @ 2026-06-15  3:57 UTC (permalink / raw)
  To: sashiko-reviews; +Cc: kvm, linux-pci

On 6/4/2026 10:05 PM, sashiko-bot@kernel.org wrote:
> Thank you for your contribution! Sashiko AI review found 6 potential issue(s) to consider:
> 
> New issues:
> - [Critical] Concurrent config writes cause Double-Free/Use-After-Free in power management saved state.

Will use memory_lock to protect the power state transition.

> - [High] Missing virtual configuration rollback on power state transition failure.

Will do the rollback in v17.

> - [High] Missing virtualization of TPH Capability Header leaks the physical Next Capability Pointer to the guest.

Will be done in v17.

> - [High] Race condition (TOCTOU) allows concurrent config writes to corrupt virtual state and silently lose write requests.

Will use tph_lock to serialize concurrent config writes.

> - [Medium] Silent failure when programming TPH shadow ST table entries into hardware.

I don't think this is needed; the device should already be in a writable state at this point.

Thanks

> 
> Pre-existing issues:
> - [High] Out-of-bounds write in extended capability parsing loop due to missing boundary checks.
> --
> 
> commit 32f3411184cca80f87b3f32db73925b8e64d5792
> Author: Chengwen Feng <fengchengwen@huawei.com>
> 
> vfio/pci: Virtualize PCIe TPH capability registers
> 
> This commit virtualizes TPH extended capability registers, splitting
> permissions so the header is read-only and the control register is writable.
> It restricts ST-table programming to a specific feature and syncs the
> cached shadow table to hardware upon enablement.
> 
>> diff --git a/drivers/vfio/pci/vfio_pci_config.c b/drivers/vfio/pci/vfio_pci_config.c
>> index 5c6ab172df6c0..23c32aa5842fa 100644
>> --- a/drivers/vfio/pci/vfio_pci_config.c
>> +++ b/drivers/vfio/pci/vfio_pci_config.c
>> @@ -1086,6 +1086,109 @@ static int __init init_pci_ext_cap_pwr_perm(struct perm_bits *perm)
>>  	return 0;
>>  }
>>  
>> +/* Permissions for TPH extended capability */
>> +static int __init init_pci_ext_cap_tph_perm(struct perm_bits *perm)
>> +{
>> +	int i;
>> +
>> +	if (alloc_perm_bits(perm, pci_ext_cap_length[PCI_EXT_CAP_ID_TPH]))
>> +		return -ENOMEM;
>> +
>> +	p_setd(perm, PCI_TPH_CAP, ALL_VIRT, NO_WRITE);
> 
> [Severity: High]
> Does this miss virtualizing the TPH Capability Header at offset 0?
> 
> Other capabilities use p_setd(perm, 0, ALL_VIRT, NO_WRITE) to ensure the
> guest reads the virtualized header instead of the physical one. Without this,
> could guest reads to the TPH capability header fall through to physical
> hardware, leaking the physical Next Capability Pointer and breaking VFIO's
> ability to hide capabilities?
> 
> [ ... ]
> 
>> +static int vfio_tph_config_write(struct vfio_pci_core_device *vdev, int pos,
>> +				 int count, struct perm_bits *perm,
>> +				 int offset, __le32 val)
>> +{
>> +	int req_en_byte = PCI_TPH_CTRL + 1;
>> +	struct pci_dev *pdev = vdev->pdev;
>> +	__le32 org_val = 0;
>> +	bool extended;
>> +	u8 mode, req;
>> +	int i, ret;
>> +	u16 start;
>> +	u32 data;
>> +
>> +	if (!vdev->tph_permit)
>> +		return count;
>> +
>> +	/* Back up the original values in order rollback when fail */
>> +	if (offset <= req_en_byte && offset + count > req_en_byte)
>> +		vfio_default_config_read(vdev, pos, count, perm, offset,
>> +					 &org_val);
>> +
>> +	ret = vfio_default_config_write(vdev, pos, count, perm, offset, val);
>> +	if (ret != count)
>> +		return ret;
> 
> [Severity: High]
> Can this create a race condition where concurrent writes corrupt the virtual
> state?
> 
> Modifying vconfig via vfio_default_config_write() happens before acquiring
> vdev->tph_lock. If two threads concurrently write to the control register,
> both modify vconfig unprotected. If the first thread acquires the lock but
> fails to program hardware, its rollback would blindly overwrite the second
> thread's new state in vconfig.
> 
> When the second thread subsequently acquires the lock, wouldn't it read the
> reverted vconfig value (the first thread's old state) and write it to
> hardware, effectively discarding the second thread's write while returning
> success?
> 
>> +
>> +	/* Skip if write range does not cover Requester Enable byte */
>> +	if (offset > req_en_byte || offset + count <= req_en_byte)
>> +		return count;
>> +
>> +	ret = vfio_pci_set_power_state(vdev, PCI_D0);
> 
> [Severity: Critical]
> Does calling vfio_pci_set_power_state() without a serializing lock expose a
> double-free vulnerability?
> 
> If a guest VM concurrently writes to the TPH control register while the
> physical device is in D3hot, both threads can evaluate needs_restore = true.
> After the first thread transitions the device to D0, both threads will
> concurrently execute pci_load_and_free_saved_state(pdev, &vdev->pm_save).
> 
> Since pci_load_and_free_saved_state() has no internal locking and directly
> frees the shared pointer, could this concurrent execution cause a double-free
> in the host kernel?
> 
>> +	if (ret)
>> +		return count;
> 
> [Severity: High]
> Is a rollback of the virtual configuration missing in this error path?
> 
> If the power state transition fails, the function returns count (indicating
> success to userspace) but fails to roll back vconfig using the org_val backed
> up earlier. Doesn't this leave vconfig falsely indicating that the TPH
> Requester is configured while the physical hardware remains unprogrammed?
> 
>> +
>> +	mutex_lock(&vdev->tph_lock);
>> +
>> +	start = vfio_find_cap_start(vdev, pos);
>> +	data = le32_to_cpu(*(__le32 *)&vdev->vconfig[start + PCI_TPH_CTRL]);
>> +	mode = FIELD_GET(PCI_TPH_CTRL_MODE_SEL_MASK, data);
>> +	req = FIELD_GET(PCI_TPH_CTRL_REQ_EN_MASK, data);
>> +
>> +	if (req == PCI_TPH_REQ_TPH_ONLY || req == PCI_TPH_REQ_EXT_TPH) {
>> +		extended = !!(req == PCI_TPH_REQ_EXT_TPH);
>> +		ret = pcie_enable_tph_explicit(pdev, mode, extended);
>> +		if (!ret && vdev->tph_st_shadow) {
>> +			for (i = 0; i < vdev->tph_st_entries; i++)
>> +				pcie_tph_set_st_entry(pdev, i,
>> +						      vdev->tph_st_shadow[i]);
> 
> [Severity: Medium]
> Should the return value of pcie_tph_set_st_entry() be checked here?
> 
> If programming any ST entry fails, such as from hardware rejection or
> inaccessible registers, the failure appears to be silently ignored. Could
> this leave the hardware partially programmed while userspace is told the
> operation succeeded?
> 
>> +		}
> 
> [ ... ]
> 
>> @@ -1704,6 +1810,8 @@ static int vfio_ecap_init(struct vfio_pci_core_device *vdev)
>>  		ret = vfio_fill_vconfig_bytes(vdev, epos, len);
> 
> [Severity: High]
> This is a pre-existing issue, but could this capability parsing loop allow
> out-of-bounds heap corruption?
> 
> If a malicious hardware device or hotplugged peripheral provides a malformed
> capability list with a large length near the end of the PCIe config space,
> epos + len can exceed pdev->cfg_size. This would cause a buffer overflow in
> memset() and vfio_fill_vconfig_bytes(), corrupting the heap-allocated
> vdev->pci_config_map and vdev->vconfig buffers.
> 
>>  		if (ret)
>>  			return ret;
>> +		if (ecap == PCI_EXT_CAP_ID_TPH && !hidden)
>> +			vfio_tph_mask_ext_tph_bit(vdev, epos);
> 
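The pre-existing out-of-bounds concern comes down to validating `epos + len` against the config-space size before any `memset()` or copy into the per-device buffers. A minimal sketch, assuming a flat per-byte map like `pci_config_map`; `model_ecap_fill()` and `MODEL_CFG_SIZE` are hypothetical and this is not the `vfio_ecap_init()` code. The check is written as `len > cfg_size - epos`, after confirming `epos < cfg_size`, so the bounds arithmetic itself cannot overflow.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define MODEL_CFG_SIZE 4096	/* size of PCIe extended config space */

/*
 * Hypothetical guard run before filling the map for an extended
 * capability found at epos with a computed length of len bytes.
 */
static int model_ecap_fill(unsigned char *map, size_t cfg_size,
			   size_t epos, size_t len, unsigned char id)
{
	assert(map != NULL);
	if (epos >= cfg_size || len > cfg_size - epos)
		return -EINVAL;	/* malformed capability would overflow */
	memset(map + epos, id, len);
	return 0;
}
```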

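The header-virtualization question hinges on how VFIO-style config reads merge physical and virtual state: bits marked virtual come from `vconfig`, the rest from hardware. A minimal model, assuming a per-bit virt mask of the kind `p_setd(perm, 0, ALL_VIRT, NO_WRITE)` installs; `model_cfg_read()` and `ext_cap_next()` are hypothetical helpers. In a PCIe extended capability header the Next Capability Offset occupies bits 31:20, so with no virt bits the guest sees the physical pointer, and with all bits virtual it sees whatever `vconfig` holds.

```c
#include <assert.h>
#include <stdint.h>

/* Per-bit merge: virtual bits come from vconfig, the rest from hardware. */
static uint32_t model_cfg_read(uint32_t phys, uint32_t vconfig, uint32_t virt)
{
	return (phys & ~virt) | (vconfig & virt);
}

/* Next Capability Offset lives in bits 31:20 of an extended cap header. */
static uint32_t ext_cap_next(uint32_t header)
{
	return header >> 20;
}
```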

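The locking fixes promised for v17 can be illustrated with a userspace model in which the backup, the virtual-register update, the D0 transition and the hardware programming all happen inside one critical section, with a rollback on either failure. This is a sketch under stated assumptions: a single pthread mutex stands in for tph_lock/memory_lock, fault-injection flags replace real hardware errors, every `model_` name is hypothetical, and returning a negative errno on failure (rather than `count`) is one possible design choice, not the series' behaviour.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

struct model_tph_dev {
	pthread_mutex_t lock;	/* stands in for tph_lock / memory_lock */
	uint32_t vctrl;		/* virtual TPH control register */
	uint32_t hwctrl;	/* value last programmed into hardware */
	bool power_fails;	/* fault injection for the D0 transition */
	bool program_fails;	/* fault injection for hardware programming */
};

static int model_set_d0(struct model_tph_dev *d)
{
	return d->power_fails ? -EIO : 0;
}

static int model_program_hw(struct model_tph_dev *d, uint32_t ctrl)
{
	if (d->program_fails)
		return -EIO;
	d->hwctrl = ctrl;
	return 0;
}

/*
 * Update the virtual register and the hardware as one critical section:
 * a concurrent writer can neither observe nor roll back a half-applied
 * state, and the D0 transition never runs in two threads at once.
 */
static int model_tph_ctrl_write(struct model_tph_dev *d, uint32_t val)
{
	uint32_t saved;
	int ret;

	assert(d != NULL);
	pthread_mutex_lock(&d->lock);
	saved = d->vctrl;		/* backup for rollback */
	d->vctrl = val;

	ret = model_set_d0(d);
	if (!ret)
		ret = model_program_hw(d, d->vctrl);
	if (ret)
		d->vctrl = saved;	/* roll back on either failure */
	pthread_mutex_unlock(&d->lock);
	return ret;
}
```

Because a second writer cannot enter until the first has either committed or rolled back, it never reads a reverted value written by someone else's rollback.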

* Re: [PATCH v16 00/12] vfio/pci: Add PCIe TPH support
  2026-06-04 13:27 [PATCH v16 00/12] vfio/pci: Add PCIe TPH support Chengwen Feng
                   ` (11 preceding siblings ...)
  2026-06-04 13:28 ` [PATCH v16 12/12] vfio/pci: Virtualize PCIe TPH capability registers Chengwen Feng
@ 2026-06-15 12:19 ` fengchengwen
  2026-06-15 14:19   ` Alex Williamson
  12 siblings, 1 reply; 23+ messages in thread
From: fengchengwen @ 2026-06-15 12:19 UTC (permalink / raw)
  To: alex, jgg
  Cc: wathsala.vithanage, helgaas, wei.huang2, zhipingz, wangzhou1,
	wangyushan12, liuyonglong, kvm, linux-pci

Gentle ping for v16 TPH series.

Would appreciate reviews from Alex and Jason. Let me know if major changes are needed.

Thanks,
Chengwen Feng


On 6/4/2026 9:27 PM, Chengwen Feng wrote:
> This patchset enables full userspace configurable PCIe TPH support for
> VFIO, which brings performance benefits for userspace workloads such as
> DPDK and SPDK.
> 
> Currently VFIO only exposes read-only TPH capability registers to 
> userspace, while all write operations are silently discarded. This 
> prevents userspace from enabling and configuring TPH, limiting performance
> optimization opportunities.
> 
> Per PCIe spec 7.5.3.15: TPH Completer support is applicable to Root Ports
> and Endpoints, allowing Steering Tags to target host CPUs or peer devices
> for P2P transactions.
> 
> The TPH usage model can be divided into three fundamental parts:
> 1. Retrieve Steering Tag:
>    - Tags targeting host CPUs are obtained via platform methods (ACPI _DSM)
>      wrapped in pcie_tph_get_cpu_st(). Userspace requires a generic 
>      interface to query these CPU-associated ST values.
>    - Tags targeting peer devices are managed by userspace drivers.
> 2. Program Steering Tag table:
>    - For devices with standard ST table structures (in capability space or
>      MSI-X table), userspace needs a unified interface to configure ST 
>      entries.
>    - Devices without standard ST tables are handled by userspace itself.
> 3. Toggle device TPH Requester enable/disable state.
> 
> To support the above scenarios, this series extends VFIO with complete TPH
> virtualization features:
> - New device feature TPH_CPU_ST: Batch query interface to resolve 
>   CPU-specific ST tags.
> - New device feature TPH_ST_CONFIG: Batch configure interface for device ST
>   table entries, with shadow cache and atomic rollback support.
> - Full TPH capability register virtualization: allow userspace to toggle
>   TPH Requester state via TPH_CTRL register writes.
> 
> To guarantee isolation and security, this patchset adopts a two-level
> safety gate design with careful ABI considerations:
> 1. Global unsafe gate:
>    TPH caching behavior may cross isolation domains and impact shared
>    platform resources. A new module parameter `enable_unsafe_tph` is
>    introduced (default off) to globally gate all TPH functionalities.
> 2. Per-device opt-in gate:
>    To preserve strict ABI compatibility and avoid unexpected hardware
>    state changes for existing users, a new VFIO device feature TPH_ENABLE
>    is added. TPH capabilities are only available after userspace explicitly
>    enables them per device.
> 
> Because the kernel PCI TPH implementation requires the TPH Requester to be
> enabled before ST entries can be programmed, a shadow ST table is
> introduced so that userspace can configure the ST table in any order: ST
> writes are buffered before TPH is enabled, and all cached entries are
> flushed to hardware when the TPH Requester turns on. This also provides
> atomic batch rollback for reliable configuration.
> 
> The patchset is split into two logical parts: the first seven patches fix
> and refactor core PCI/TPH kernel code to export required helper interfaces,
> and the remaining five patches implement the corresponding VFIO TPH
> virtualization layer step by step.
> 
> Patch breakdown:
>  1/12 PCI/TPH: Fix pcie_tph_get_st_table_loc() field extraction
>  2/12 PCI/TPH: Fix tph_enabled concurrent update race by bitfield packing
>  3/12 PCI/TPH: Cache TPH requester capability at probe time
>  4/12 PCI/TPH: Refactor pcie_enable_tph & add explicit requester variant
>  5/12 PCI/TPH: Refactor pcie_tph_get_cpu_st & add explicit variant
>  6/12 PCI/TPH: expose the enabled TPH requester type
>  7/12 PCI/TPH: Add pcie_tph_supported() helper to check TPH capability attrs
>  8/12 vfio/pci: Hide TPH capability when TPH is unsupported
>  9/12 vfio/pci: Add TPH_ENABLE feature skeleton and unsafe module parameter
> 10/12 vfio/pci: Add TPH_CPU_ST to query CPU's TPH steering tag
> 11/12 vfio/pci: Add TPH_ST_CONFIG for PCIe TPH ST configuration
> 12/12 vfio/pci: Virtualize PCIe TPH capability registers
> 
> Based on earlier RFC work by Wathsala Vithanage
> 
> ---
> v16:
> - Support per-device opt-in, addressing Alex's comment.
> - Split out a separate commit that hides the TPH capability when TPH is
>   unsupported.
> - Optimize the layout of the TPH fields in struct pci_dev.
> - Improve the "Virtualize PCIe TPH capability" commit: roll back when a
>   write fails.
> - Reorder the PCI/TPH commits so that the fixes come first.
> - Reorganize the cover letter to serve as the starting point for
>   discussion.
> v15: Address Alex's comments:
> - Drop the TPH capability when tph_cap=0.
> - Use an _explicit suffix instead of adding a policy parameter for
>   enabling TPH and getting the TPH ST.
> - Make sure ST entries are set while the device is in D0.
> - Reimplement TPH capability register virtualization.
> - Other fixes.
> v14:
> - Return PCI_TPH_LOC_NONE when !CONFIG_PCIE_TPH, per Alex's comment.
> - Fix Sashiko comments:
>   - Clear ST shadow state across user sessions
>   - Fix out-of-bounds byte masking in vfio_pci_tph_config_read
> 
> Chengwen Feng (11):
>   PCI/TPH: Fix pcie_tph_get_st_table_loc() field extraction
>   PCI/TPH: Fix tph_enabled concurrent update race by bitfield packing
>   PCI/TPH: Cache TPH requester capability at probe time
>   PCI/TPH: Refactor pcie_enable_tph & add explicit requester variant
>   PCI/TPH: Refactor pcie_tph_get_cpu_st & add explicit variant
>   PCI/TPH: Add pcie_tph_supported() helper to check TPH capability
>     attributes
>   vfio/pci: Hide TPH capability when TPH is unsupported
>   vfio/pci: Add TPH_ENABLE feature skeleton and unsafe module parameter
>   vfio/pci: Add TPH_CPU_ST to query CPU's TPH steering tag
>   vfio/pci: Add TPH_ST_CONFIG for PCIe TPH ST configuration
>   vfio/pci: Virtualize PCIe TPH capability registers
> 
> Zhiping Zhang (1):
>   PCI/TPH: expose the enabled TPH requester type
> 
>  drivers/pci/tph.c                  | 220 ++++++++++++++++++++--------
>  drivers/vfio/pci/vfio_pci.c        |  13 +-
>  drivers/vfio/pci/vfio_pci_config.c | 111 ++++++++++++++
>  drivers/vfio/pci/vfio_pci_core.c   | 228 ++++++++++++++++++++++++++++-
>  include/linux/pci-tph.h            |  22 +++
>  include/linux/pci.h                |   6 +-
>  include/linux/vfio_pci_core.h      |   7 +-
>  include/uapi/linux/vfio.h          |  60 ++++++++
>  8 files changed, 600 insertions(+), 67 deletions(-)
> 


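The shadow ST table can likewise be sketched as a buffer that always records writes, writes through only while the Requester is enabled, and is flushed when TPH turns on. In this model the flush checks every per-entry result, which also illustrates the return-value check raised in the review of patch 12. `model_hw_set()` is a hypothetical stand-in for a per-entry helper such as `pcie_tph_set_st_entry()` and fails while the Requester is off, matching the constraint stated in the cover letter; all `model_` names and the fault-injection field are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_ST_ENTRIES 8

struct model_st_dev {
	bool tph_on;				/* TPH Requester enabled */
	uint16_t shadow[MODEL_ST_ENTRIES];	/* buffered ST values */
	uint16_t hw[MODEL_ST_ENTRIES];		/* device ST table */
	int fail_at;				/* fault injection, -1 = none */
};

/* Per-entry hardware write; refused while the Requester is disabled. */
static int model_hw_set(struct model_st_dev *d, size_t i, uint16_t tag)
{
	if (!d->tph_on || (int)i == d->fail_at)
		return -EIO;
	d->hw[i] = tag;
	return 0;
}

/* Buffer a tag; push it to hardware immediately only if TPH is on. */
static int model_st_write(struct model_st_dev *d, size_t i, uint16_t tag)
{
	if (i >= MODEL_ST_ENTRIES)
		return -EINVAL;
	d->shadow[i] = tag;
	return d->tph_on ? model_hw_set(d, i, tag) : 0;
}

/* Enable the Requester, then flush the shadow, stopping on first error. */
static int model_tph_on(struct model_st_dev *d)
{
	size_t i;
	int ret;

	assert(d != NULL);
	d->tph_on = true;
	for (i = 0; i < MODEL_ST_ENTRIES; i++) {
		ret = model_hw_set(d, i, d->shadow[i]);
		if (ret) {
			/* do not report a half-programmed table as enabled */
			d->tph_on = false;
			return ret;
		}
	}
	return 0;
}
```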

* Re: [PATCH v16 00/12] vfio/pci: Add PCIe TPH support
  2026-06-15 12:19 ` [PATCH v16 00/12] vfio/pci: Add PCIe TPH support fengchengwen
@ 2026-06-15 14:19   ` Alex Williamson
  0 siblings, 0 replies; 23+ messages in thread
From: Alex Williamson @ 2026-06-15 14:19 UTC (permalink / raw)
  To: fengchengwen
  Cc: jgg, wathsala.vithanage, helgaas, wei.huang2, zhipingz, wangzhou1,
	wangyushan12, liuyonglong, kvm, linux-pci, alex

On Mon, 15 Jun 2026 20:19:48 +0800
fengchengwen <fengchengwen@huawei.com> wrote:

> Gentle ping for v16 TPH series.
> 
> Would appreciate reviews from Alex and Jason. Let me know if major
> changes are needed.

The same changes that were still being discussed when v15 and v16 were
posted: I don't think vfio-pci is the right place to host a CPU-to-ST
translation interface.  Thanks,

Alex





Thread overview: 23+ messages
2026-06-04 13:27 [PATCH v16 00/12] vfio/pci: Add PCIe TPH support Chengwen Feng
2026-06-04 13:27 ` [PATCH v16 01/12] PCI/TPH: Fix pcie_tph_get_st_table_loc() field extraction Chengwen Feng
2026-06-04 13:27 ` [PATCH v16 02/12] PCI/TPH: Fix tph_enabled concurrent update race by bitfield packing Chengwen Feng
2026-06-04 13:27 ` [PATCH v16 03/12] PCI/TPH: Cache TPH requester capability at probe time Chengwen Feng
2026-06-04 13:27 ` [PATCH v16 04/12] PCI/TPH: Refactor pcie_enable_tph & add explicit requester variant Chengwen Feng
2026-06-04 13:27 ` [PATCH v16 05/12] PCI/TPH: Refactor pcie_tph_get_cpu_st & add explicit variant Chengwen Feng
2026-06-04 13:27 ` [PATCH v16 06/12] PCI/TPH: expose the enabled TPH requester type Chengwen Feng
2026-06-04 13:27 ` [PATCH v16 07/12] PCI/TPH: Add pcie_tph_supported() helper to check TPH capability attributes Chengwen Feng
2026-06-04 13:28 ` [PATCH v16 08/12] vfio/pci: Hide TPH capability when TPH is unsupported Chengwen Feng
2026-06-04 13:48   ` sashiko-bot
2026-06-04 13:28 ` [PATCH v16 09/12] vfio/pci: Add TPH_ENABLE feature skeleton and unsafe module parameter Chengwen Feng
2026-06-04 13:54   ` sashiko-bot
2026-06-15  1:26     ` fengchengwen
2026-06-04 13:28 ` [PATCH v16 10/12] vfio/pci: Add TPH_CPU_ST to query CPU's TPH steering tag Chengwen Feng
2026-06-04 13:55   ` sashiko-bot
2026-06-15  1:31     ` fengchengwen
2026-06-04 13:28 ` [PATCH v16 11/12] vfio/pci: Add TPH_ST_CONFIG for PCIe TPH ST configuration Chengwen Feng
2026-06-04 13:51   ` sashiko-bot
2026-06-04 13:28 ` [PATCH v16 12/12] vfio/pci: Virtualize PCIe TPH capability registers Chengwen Feng
2026-06-04 14:05   ` sashiko-bot
2026-06-15  3:57     ` fengchengwen
2026-06-15 12:19 ` [PATCH v16 00/12] vfio/pci: Add PCIe TPH support fengchengwen
2026-06-15 14:19   ` Alex Williamson
