* [PATCH v17 00/12] vfio/pci: Add PCIe TPH support
@ 2026-06-16 10:46 Chengwen Feng
2026-06-16 10:46 ` [PATCH v17 01/12] PCI/TPH: Fix pcie_tph_get_st_table_loc() field extraction Chengwen Feng
` (11 more replies)
0 siblings, 12 replies; 25+ messages in thread
From: Chengwen Feng @ 2026-06-16 10:46 UTC (permalink / raw)
To: alex, jgg, helgaas
Cc: wathsala.vithanage, wei.huang2, zhipingz, wangzhou1, wangyushan12,
liuyonglong, kvm, linux-pci
This patchset enables fully userspace-configurable PCIe TPH support for
VFIO, which improves performance for userspace workloads such as DPDK and
SPDK.
Currently VFIO exposes the TPH capability registers to userspace as
read-only and silently discards all writes. This prevents userspace from
enabling and configuring TPH, limiting performance optimization
opportunities.
Per PCIe spec 7.5.3.15: TPH Completer support is applicable to Root Ports
and Endpoints, allowing Steering Tags to target host CPUs or peer devices
for P2P transactions.
The TPH usage model can be divided into three fundamental parts:
1. Retrieve Steering Tag:
- Tags targeting host CPUs are obtained via platform methods (ACPI _DSM)
wrapped in pcie_tph_get_cpu_st(). Userspace requires a generic
interface to query these CPU-associated ST values.
- Tags targeting peer devices are managed by userspace drivers.
2. Program Steering Tag table:
- For devices with standard ST table structures (in capability space or
MSI-X table), userspace needs a unified interface to configure ST
entries.
- Devices without standard ST tables are handled by userspace itself.
3. Toggle device TPH Requester enable/disable state.
To support the above scenarios, this series extends PCI and VFIO with
complete TPH virtualization features:
- [*PCI*] Add a sysfs binary file that exports the CPU to steering-tag
mapping, so that userspace can retrieve a CPU's ST by reading it.
- [*VFIO*] New device feature TPH_ST_CONFIG: Batch configure interface for
device ST table entries, with shadow cache and atomic rollback support.
- [*VFIO*] Full TPH capability register virtualization: allow userspace to
toggle TPH Requester state via TPH_CTRL register writes.
To guarantee isolation and security, this patchset adopts a two-level
safety gate design with careful ABI considerations:
1. Global unsafe gate:
TPH caching behavior may cross isolation domains and impact shared
platform resources. A new module parameter `enable_unsafe_tph` is
introduced (default off) to globally gate all VFIO TPH functionalities.
2. Per-device opt-in gate:
To preserve strict ABI compatibility and avoid unexpected hardware
state changes for existing users, a new VFIO device feature TPH_ENABLE
is added. TPH capabilities are only available after userspace explicitly
enables them per-device.
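As a rough illustration of how the two gates compose, here is a minimal
userspace model. It is only a sketch: struct tph_dev_model, tph_opt_in(),
tph_allowed() and the -EPERM return value are hypothetical names and choices
made for this example, not the code in the series.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the enable_unsafe_tph module parameter. */
static bool enable_unsafe_tph; /* default off */

/* Hypothetical per-device state; not the real vfio_pci_core_device. */
struct tph_dev_model {
	bool tph_permit; /* set once userspace opts in via TPH_ENABLE */
};

/* Per-device opt-in: refused while the global gate is closed.
 * The -EPERM error code is an assumption of this sketch. */
static int tph_opt_in(struct tph_dev_model *dev)
{
	if (!enable_unsafe_tph)
		return -EPERM;
	dev->tph_permit = true;
	return 0;
}

/* Any TPH operation has to pass both gates. */
static bool tph_allowed(const struct tph_dev_model *dev)
{
	return enable_unsafe_tph && dev->tph_permit;
}
```

With the global gate off, tph_opt_in() fails and tph_allowed() stays false,
so an existing user that never opts in sees no change in behavior.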
The kernel PCI TPH implementation requires the TPH Requester to be enabled
before ST entries are programmed. To let userspace configure the ST table
in arbitrary order, a shadow ST table buffers ST writes made before TPH is
enabled, and all cached entries are flushed to hardware when the TPH
Requester is turned on. The shadow table also provides atomic batch
rollback for reliable configuration.
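The buffering idea can be sketched with a small userspace model. Everything
here is an assumption made for illustration: struct shadow_st_model,
st_config(), tph_enable() and the table size are hypothetical. The model
also gets its all-or-nothing behavior by validating the whole batch up
front, which is a simpler substitute for the rollback the series describes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define ST_ENTRIES 8 /* hypothetical table size */

struct shadow_st_model {
	bool     enabled;            /* TPH Requester state */
	uint16_t shadow[ST_ENTRIES]; /* what userspace asked for */
	uint16_t hw[ST_ENTRIES];     /* stand-in for the device ST table */
};

/* Batch update: check every index first so that a bad entry leaves
 * both tables untouched (all-or-nothing). */
static int st_config(struct shadow_st_model *m, const unsigned int *idx,
		     const uint16_t *tag, unsigned int n)
{
	unsigned int i;

	for (i = 0; i < n; i++)
		if (idx[i] >= ST_ENTRIES)
			return -1;

	for (i = 0; i < n; i++) {
		m->shadow[idx[i]] = tag[i];
		if (m->enabled) /* write through only once TPH is on */
			m->hw[idx[i]] = tag[i];
	}
	return 0;
}

/* Enabling the requester flushes every buffered entry to "hardware". */
static void tph_enable(struct shadow_st_model *m)
{
	m->enabled = true;
	memcpy(m->hw, m->shadow, sizeof(m->hw));
}
```

Calling st_config() before tph_enable() only touches the shadow copy; the
entries reach the modeled hardware table at enable time.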
The patchset is split into two logical parts: the first eight patches fix
and refactor core PCI/TPH kernel code to export the required helper
interfaces and the CPU to ST mapping; the remaining four patches implement
the corresponding VFIO TPH virtualization layer step by step.
Based on earlier RFC work by Wathsala Vithanage
---
v17:
- Move the CPU to ST mapping retrieval logic from VFIO to the PCI subsystem
- Remove tph_lock, which appeared to be unused
- Fix Sashiko's review comments on v16:
- tph_permit is a bitfield, which has a concurrency problem
- Fix tph_permit not being reset when the device is re-opened
- TPH capability virtualization writes race with each other and do not
roll back to the original value
- Missing virtualization of the TPH Capability Header leaks the physical
Next Capability Pointer to the guest
v16:
- Support opt-in at the device level, which addresses Alex's comment.
- Split out a sub-commit: hide the TPH capability when TPH is unsupported.
- Optimize the layout of the TPH fields in the pci_dev structure.
- Optimize the PCIe TPH capability virtualization commit: support rollback
when a set fails.
- Reorder the PCI/TPH commits: move the fix commit first.
- Reorganize the cover letter to serve as the starting point for
discussion.
v15: Address Alex's comments:
- Drop the TPH capability when tph_cap=0
- Use an _explicit suffix rather than adding a policy parameter for
enabling TPH and getting the TPH ST.
- Make sure ST entries are set in D0
- Reimplement TPH capability register virtualization
- Other fixes
Chengwen Feng (11):
PCI/TPH: Fix pcie_tph_get_st_table_loc() field extraction
PCI/TPH: Fix tph_enabled concurrent update race by bitfield packing
PCI/TPH: Cache TPH requester capability at probe time
PCI/TPH: Refactor pcie_enable_tph & add explicit requester variant
PCI/TPH: Refactor pcie_tph_get_cpu_st & add explicit variant
PCI/TPH: Add pcie_tph_supported() helper to check TPH capability
attributes
PCI/TPH: Add sysfs binary file to export CPU to steering-tag mapping
vfio/pci: Hide TPH capability when TPH is unsupported
vfio/pci: Add TPH_ENABLE feature skeleton and unsafe module parameter
vfio/pci: Add TPH_ST_CONFIG for PCIe TPH ST configuration
vfio/pci: Virtualize PCIe TPH capability registers
Zhiping Zhang (1):
PCI/TPH: Expose the enabled TPH requester type
drivers/pci/pci-sysfs.c | 3 +
drivers/pci/pci.h | 4 +
drivers/pci/tph.c | 325 ++++++++++++++++++++++-------
drivers/vfio/pci/vfio_pci.c | 13 +-
drivers/vfio/pci/vfio_pci_config.c | 121 +++++++++++
drivers/vfio/pci/vfio_pci_core.c | 152 +++++++++++++-
include/linux/pci-tph.h | 22 ++
include/linux/pci.h | 6 +-
include/linux/vfio_pci_core.h | 6 +-
include/uapi/linux/pci.h | 15 ++
include/uapi/linux/vfio.h | 29 +++
11 files changed, 618 insertions(+), 78 deletions(-)
--
2.17.1
* [PATCH v17 01/12] PCI/TPH: Fix pcie_tph_get_st_table_loc() field extraction
2026-06-16 10:46 [PATCH v17 00/12] vfio/pci: Add PCIe TPH support Chengwen Feng
@ 2026-06-16 10:46 ` Chengwen Feng
2026-06-16 11:00 ` sashiko-bot
2026-06-16 10:46 ` [PATCH v17 02/12] PCI/TPH: Fix tph_enabled concurrent update race by bitfield packing Chengwen Feng
` (10 subsequent siblings)
11 siblings, 1 reply; 25+ messages in thread
From: Chengwen Feng @ 2026-06-16 10:46 UTC (permalink / raw)
To: alex, jgg, helgaas
Cc: wathsala.vithanage, wei.huang2, zhipingz, wangzhou1, wangyushan12,
liuyonglong, kvm, linux-pci
pcie_tph_get_st_table_loc() incorrectly uses FIELD_GET(), which shifts the
field value to bit 0. But the function is designed to return raw
PCI_TPH_LOC_* values as defined in the function comment.
This causes incorrect ST table location detection. Fix it by using bitwise
AND with PCI_TPH_CAP_LOC_MASK to return the unshifted field value matching
the function specification.
This doesn't make a difference to mlx5_st_create(), the lone external
caller, because it only checks for PCI_TPH_LOC_NONE (0), but will be needed
for callers that check for PCI_TPH_LOC_CAP or PCI_TPH_LOC_MSIX.
Also add tph_cap validation for pcie_tph_get_st_table_loc() to prevent
invalid PCI configuration space access when TPH is not supported. Add stub
functions for pcie_tph_get_st_table_size() and pcie_tph_get_st_table_loc()
when !CONFIG_PCIE_TPH.
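To make the mismatch concrete, here is a standalone sketch comparing the
shifted and unshifted forms of the field. The constants are copied in to
mirror the TPH definitions in pci_regs.h and should be treated as
assumptions; loc_shifted() and loc_raw() are hypothetical helpers that
stand in for FIELD_GET() and the plain mask.

```c
#include <assert.h>
#include <stdint.h>

/* Values mirroring include/uapi/linux/pci_regs.h; treat as assumptions. */
#define PCI_TPH_CAP_LOC_MASK 0x00000600u
#define PCI_TPH_LOC_NONE     0x00000000u
#define PCI_TPH_LOC_CAP      0x00000200u
#define PCI_TPH_LOC_MSIX     0x00000400u

/* Old behavior: field shifted down to bit 0, as FIELD_GET() does. */
static uint32_t loc_shifted(uint32_t reg)
{
	return (reg & PCI_TPH_CAP_LOC_MASK) >> 9;
}

/* Fixed behavior: field left in place, comparable to PCI_TPH_LOC_*. */
static uint32_t loc_raw(uint32_t reg)
{
	return reg & PCI_TPH_CAP_LOC_MASK;
}
```

For a register whose location field says "in capability", loc_shifted()
yields 1, which matches none of the PCI_TPH_LOC_* constants, while
loc_raw() yields PCI_TPH_LOC_CAP. Both agree only for PCI_TPH_LOC_NONE,
which is why a caller that checks only for NONE is unaffected.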
Fixes: d2e8a34876ce ("PCI/TPH: Add Steering Tag support")
Cc: stable@vger.kernel.org
Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
Reviewed-by: Alex Williamson <alex.williamson@nvidia.com>
Reviewed-by: Bjorn Helgaas <bhelgaas@google.com>
---
drivers/pci/tph.c | 12 +++++-------
include/linux/pci-tph.h | 5 +++++
2 files changed, 10 insertions(+), 7 deletions(-)
diff --git a/drivers/pci/tph.c b/drivers/pci/tph.c
index 91145e8d9d95..bef3a55539c4 100644
--- a/drivers/pci/tph.c
+++ b/drivers/pci/tph.c
@@ -166,11 +166,14 @@ static u8 get_st_modes(struct pci_dev *pdev)
*/
u32 pcie_tph_get_st_table_loc(struct pci_dev *pdev)
{
- u32 reg;
+ u32 reg = 0;
+
+ if (!pdev->tph_cap)
+ return PCI_TPH_LOC_NONE;
pci_read_config_dword(pdev, pdev->tph_cap + PCI_TPH_CAP, &reg);
- return FIELD_GET(PCI_TPH_CAP_LOC_MASK, reg);
+ return reg & PCI_TPH_CAP_LOC_MASK;
}
EXPORT_SYMBOL(pcie_tph_get_st_table_loc);
@@ -185,9 +188,6 @@ u16 pcie_tph_get_st_table_size(struct pci_dev *pdev)
/* Check ST table location first */
loc = pcie_tph_get_st_table_loc(pdev);
-
- /* Convert loc to match with PCI_TPH_LOC_* defined in pci_regs.h */
- loc = FIELD_PREP(PCI_TPH_CAP_LOC_MASK, loc);
if (loc != PCI_TPH_LOC_CAP)
return 0;
@@ -316,8 +316,6 @@ int pcie_tph_set_st_entry(struct pci_dev *pdev, unsigned int index, u16 tag)
set_ctrl_reg_req_en(pdev, PCI_TPH_REQ_DISABLE);
loc = pcie_tph_get_st_table_loc(pdev);
- /* Convert loc to match with PCI_TPH_LOC_* */
- loc = FIELD_PREP(PCI_TPH_CAP_LOC_MASK, loc);
switch (loc) {
case PCI_TPH_LOC_MSIX:
diff --git a/include/linux/pci-tph.h b/include/linux/pci-tph.h
index be68cd17f2f8..6f02b020d7d7 100644
--- a/include/linux/pci-tph.h
+++ b/include/linux/pci-tph.h
@@ -8,6 +8,7 @@
*/
#ifndef LINUX_PCI_TPH_H
#define LINUX_PCI_TPH_H
+#include <linux/pci.h>
/*
* According to the ECN for PCI Firmware Spec, Steering Tag can be different
@@ -41,6 +42,10 @@ static inline int pcie_tph_get_cpu_st(struct pci_dev *dev,
static inline void pcie_disable_tph(struct pci_dev *pdev) { }
static inline int pcie_enable_tph(struct pci_dev *pdev, int mode)
{ return -EINVAL; }
+static inline u16 pcie_tph_get_st_table_size(struct pci_dev *pdev)
+{ return 0; }
+static inline u32 pcie_tph_get_st_table_loc(struct pci_dev *pdev)
+{ return PCI_TPH_LOC_NONE; }
#endif
#endif /* LINUX_PCI_TPH_H */
--
2.17.1
* [PATCH v17 02/12] PCI/TPH: Fix tph_enabled concurrent update race by bitfield packing
2026-06-16 10:46 [PATCH v17 00/12] vfio/pci: Add PCIe TPH support Chengwen Feng
2026-06-16 10:46 ` [PATCH v17 01/12] PCI/TPH: Fix pcie_tph_get_st_table_loc() field extraction Chengwen Feng
@ 2026-06-16 10:46 ` Chengwen Feng
2026-06-16 10:55 ` sashiko-bot
2026-06-16 10:46 ` [PATCH v17 03/12] PCI/TPH: Cache TPH requester capability at probe time Chengwen Feng
` (9 subsequent siblings)
11 siblings, 1 reply; 25+ messages in thread
From: Chengwen Feng @ 2026-06-16 10:46 UTC (permalink / raw)
To: alex, jgg, helgaas
Cc: wathsala.vithanage, wei.huang2, zhipingz, wangzhou1, wangyushan12,
liuyonglong, kvm, linux-pci
Move tph_enabled out of the shared pci_dev bitfield and into the spare bit
of tph_cap's u16: tph_cap is immutable after enumeration and needs only 15
bits for the offset, so the remaining bit stores tph_enabled. This removes
the cross-bitfield concurrent write hazard highlighted by Sashiko after
VFIO TPH exposure. No functional changes.
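The packing can be illustrated with a standalone model. This is only a
sketch: struct tph_fields_model and offset_fits() are hypothetical, and
using uint16_t as a bitfield type relies on GCC/Clang behavior. The point is
that a write to one bitfield member is a read-modify-write of the whole
storage unit, so tph_enabled is safest next to a field that no longer
changes at runtime.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the new layout; not the real struct pci_dev. */
struct tph_fields_model {
	uint16_t tph_cap:15;    /* capability offset, fixed after enumeration */
	uint16_t tph_enabled:1; /* toggled at runtime */
};

/* PCIe extended config space is 4 KiB, so any capability offset needs
 * at most 12 bits and fits in the bits left for tph_cap. */
#define PCI_CFG_SPACE_EXP_SIZE 4096

/* Returns nonzero if the offset survives a round trip through tph_cap. */
static int offset_fits(unsigned int offset)
{
	struct tph_fields_model f = { 0 };

	f.tph_cap = offset;
	return f.tph_cap == offset;
}
```

Any valid extended capability offset round-trips through the 15-bit field,
and setting tph_enabled leaves tph_cap untouched.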
Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
---
include/linux/pci.h | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 2c4454583c11..109182658f76 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -471,7 +471,6 @@ struct pci_dev {
unsigned int ats_enabled:1; /* Address Translation Svc */
unsigned int pasid_enabled:1; /* Process Address Space ID */
unsigned int pri_enabled:1; /* Page Request Interface */
- unsigned int tph_enabled:1; /* TLP Processing Hints */
unsigned int fm_enabled:1; /* Flit Mode (segment captured) */
unsigned int is_managed:1; /* Managed via devres */
unsigned int is_msi_managed:1; /* MSI release via devres installed */
@@ -589,7 +588,8 @@ struct pci_dev {
u8 reset_methods[PCI_NUM_RESET_METHODS]; /* In priority order */
#ifdef CONFIG_PCIE_TPH
- u16 tph_cap; /* TPH capability offset */
+ u16 tph_cap:15; /* TPH capability offset */
+ u16 tph_enabled:1; /* Whether TPH is enabled */
u8 tph_mode; /* TPH mode */
u8 tph_req_type; /* TPH requester type */
#endif
--
2.17.1
* [PATCH v17 03/12] PCI/TPH: Cache TPH requester capability at probe time
2026-06-16 10:46 [PATCH v17 00/12] vfio/pci: Add PCIe TPH support Chengwen Feng
2026-06-16 10:46 ` [PATCH v17 01/12] PCI/TPH: Fix pcie_tph_get_st_table_loc() field extraction Chengwen Feng
2026-06-16 10:46 ` [PATCH v17 02/12] PCI/TPH: Fix tph_enabled concurrent update race by bitfield packing Chengwen Feng
@ 2026-06-16 10:46 ` Chengwen Feng
2026-06-16 10:55 ` sashiko-bot
2026-06-16 10:46 ` [PATCH v17 04/12] PCI/TPH: Refactor pcie_enable_tph & add explicit requester variant Chengwen Feng
` (8 subsequent siblings)
11 siblings, 1 reply; 25+ messages in thread
From: Chengwen Feng @ 2026-06-16 10:46 UTC (permalink / raw)
To: alex, jgg, helgaas
Cc: wathsala.vithanage, wei.huang2, zhipingz, wangzhou1, wangyushan12,
liuyonglong, kvm, linux-pci
Calculate the negotiated TPH requester type from device and root port
capabilities once in pci_tph_init().
Add tph_ext_support flag to cache whether the device is allowed to
issue Extended TPH requests after topology negotiation. If the final
requester type is disabled, clear TPH capability to prevent usage.
Simplify pcie_enable_tph() by using the cached requester capability
instead of recalculating every time.
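The negotiation rule can be sketched as a pure function. The encodings below
are copied in to mirror pci_regs.h and should be treated as assumptions;
negotiate_req_type() is a hypothetical helper, not a function in this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Requester-enable encodings mirroring pci_regs.h; treat as assumptions. */
#define PCI_TPH_REQ_DISABLE  0x0
#define PCI_TPH_REQ_TPH_ONLY 0x1
#define PCI_TPH_REQ_EXT_TPH  0x3

/*
 * dev_ext:   device advertises Extended TPH Requester support
 * behind_rp: false for Root Complex integrated endpoints
 * rp_type:   completer type reported for the Root Port
 */
static uint8_t negotiate_req_type(bool dev_ext, bool behind_rp, uint8_t rp_type)
{
	uint8_t type = dev_ext ? PCI_TPH_REQ_EXT_TPH : PCI_TPH_REQ_TPH_ONLY;

	/* The final type is the smaller of device and Root Port values. */
	if (behind_rp && rp_type < type)
		type = rp_type;
	return type;
}
```

A result of PCI_TPH_REQ_DISABLE corresponds to the case where the patch
clears tph_cap; a result of PCI_TPH_REQ_EXT_TPH corresponds to
tph_ext_support being set.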
Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
---
drivers/pci/tph.c | 43 +++++++++++++++++++++++++------------------
include/linux/pci.h | 4 +++-
2 files changed, 28 insertions(+), 19 deletions(-)
diff --git a/drivers/pci/tph.c b/drivers/pci/tph.c
index bef3a55539c4..951f0a33ff66 100644
--- a/drivers/pci/tph.c
+++ b/drivers/pci/tph.c
@@ -384,7 +384,6 @@ int pcie_enable_tph(struct pci_dev *pdev, int mode)
{
u32 reg;
u8 dev_modes;
- u8 rp_req_type;
/* Honor "notph" kernel parameter */
if (pci_tph_disabled)
@@ -404,23 +403,8 @@ int pcie_enable_tph(struct pci_dev *pdev, int mode)
pdev->tph_mode = mode;
- /* Get req_type supported by device and its Root Port */
- pci_read_config_dword(pdev, pdev->tph_cap + PCI_TPH_CAP, &reg);
- if (FIELD_GET(PCI_TPH_CAP_EXT_TPH, reg))
- pdev->tph_req_type = PCI_TPH_REQ_EXT_TPH;
- else
- pdev->tph_req_type = PCI_TPH_REQ_TPH_ONLY;
-
- /* Check if the device is behind a Root Port */
- if (pci_pcie_type(pdev) != PCI_EXP_TYPE_RC_END) {
- rp_req_type = get_rp_completer_type(pdev);
-
- /* Final req_type is the smallest value of two */
- pdev->tph_req_type = min(pdev->tph_req_type, rp_req_type);
- }
-
- if (pdev->tph_req_type == PCI_TPH_REQ_DISABLE)
- return -EINVAL;
+ pdev->tph_req_type = pdev->tph_ext_support ? PCI_TPH_REQ_EXT_TPH :
+ PCI_TPH_REQ_TPH_ONLY;
/* Write them into TPH control register */
pci_read_config_dword(pdev, pdev->tph_cap + PCI_TPH_CTRL, &reg);
@@ -510,13 +494,36 @@ void pci_no_tph(void)
void pci_tph_init(struct pci_dev *pdev)
{
+ u8 tph_req_type, rp_req_type;
int num_entries;
u32 save_size;
+ u32 reg = 0;
pdev->tph_cap = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_TPH);
if (!pdev->tph_cap)
return;
+ /* Get req_type supported by device and its Root Port */
+ pci_read_config_dword(pdev, pdev->tph_cap + PCI_TPH_CAP, &reg);
+ if (FIELD_GET(PCI_TPH_CAP_EXT_TPH, reg))
+ tph_req_type = PCI_TPH_REQ_EXT_TPH;
+ else
+ tph_req_type = PCI_TPH_REQ_TPH_ONLY;
+
+ /* Check if the device is behind a Root Port */
+ if (pci_pcie_type(pdev) != PCI_EXP_TYPE_RC_END) {
+ rp_req_type = get_rp_completer_type(pdev);
+ /* Final req_type is the smallest value of two */
+ tph_req_type = min(tph_req_type, rp_req_type);
+ }
+
+ if (tph_req_type == PCI_TPH_REQ_DISABLE) {
+ pdev->tph_cap = 0;
+ return;
+ }
+
+ pdev->tph_ext_support = !!(tph_req_type == PCI_TPH_REQ_EXT_TPH);
+
num_entries = pcie_tph_get_st_table_size(pdev);
save_size = sizeof(u32) + num_entries * sizeof(u16);
pci_add_ext_cap_save_buffer(pdev, PCI_EXT_CAP_ID_TPH, save_size);
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 109182658f76..285c0f00882e 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -588,7 +588,9 @@ struct pci_dev {
u8 reset_methods[PCI_NUM_RESET_METHODS]; /* In priority order */
#ifdef CONFIG_PCIE_TPH
- u16 tph_cap:15; /* TPH capability offset */
+ u16 tph_cap:14; /* TPH capability offset */
+ u16 tph_ext_support:1; /* Indicate whether Extended TPH
+ * requester is supported */
u16 tph_enabled:1; /* Whether TPH is enabled */
u8 tph_mode; /* TPH mode */
u8 tph_req_type; /* TPH requester type */
--
2.17.1
* [PATCH v17 04/12] PCI/TPH: Refactor pcie_enable_tph & add explicit requester variant
2026-06-16 10:46 [PATCH v17 00/12] vfio/pci: Add PCIe TPH support Chengwen Feng
` (2 preceding siblings ...)
2026-06-16 10:46 ` [PATCH v17 03/12] PCI/TPH: Cache TPH requester capability at probe time Chengwen Feng
@ 2026-06-16 10:46 ` Chengwen Feng
2026-06-16 10:53 ` sashiko-bot
2026-06-16 10:46 ` [PATCH v17 05/12] PCI/TPH: Refactor pcie_tph_get_cpu_st & add explicit variant Chengwen Feng
` (7 subsequent siblings)
11 siblings, 1 reply; 25+ messages in thread
From: Chengwen Feng @ 2026-06-16 10:46 UTC (permalink / raw)
To: alex, jgg, helgaas
Cc: wathsala.vithanage, wei.huang2, zhipingz, wangzhou1, wangyushan12,
liuyonglong, kvm, linux-pci
Refactor the pcie_enable_tph() implementation: extract the core logic into
a static internal enable_tph() helper that accepts an explicit requester
type.
- Keep pcie_enable_tph() as an automatic wrapper with unchanged behavior:
it selects the EXT or standard TPH requester according to device
capability, so the existing bnxt/mlx5 callers need no modification.
- Add an exported pcie_enable_tph_explicit() with a bool 'extended'
parameter for explicit STD/EXT selection, used by the upcoming VFIO TPH
support.
Input validation for EXT_TPH availability is kept inside the helper, so an
explicit EXT request is rejected if the hardware does not support the
extended requester.
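The split between the automatic wrapper and the explicit variant can be
modeled outside the kernel as follows. This is a sketch under assumptions:
struct tph_model and the *_model functions are hypothetical stand-ins for
pci_dev, enable_tph(), pcie_enable_tph() and pcie_enable_tph_explicit(), and
the mode checks and register writes are left out.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Encodings mirroring pci_regs.h; treat as assumptions. */
#define PCI_TPH_REQ_TPH_ONLY 0x1
#define PCI_TPH_REQ_EXT_TPH  0x3

struct tph_model {
	bool    ext_support; /* cached negotiation result */
	uint8_t req_type;    /* currently enabled requester type */
};

/* Core helper: the only place that validates the requested type. */
static int enable_tph_model(struct tph_model *d, uint8_t req_type)
{
	if (req_type == PCI_TPH_REQ_EXT_TPH && !d->ext_support)
		return -EINVAL;
	d->req_type = req_type;
	return 0;
}

/* Automatic wrapper: picks the best type the device supports. */
static int enable_auto_model(struct tph_model *d)
{
	return enable_tph_model(d, d->ext_support ? PCI_TPH_REQ_EXT_TPH
						  : PCI_TPH_REQ_TPH_ONLY);
}

/* Explicit variant: the caller chooses; unsupported EXT is rejected. */
static int enable_explicit_model(struct tph_model *d, bool extended)
{
	return enable_tph_model(d, extended ? PCI_TPH_REQ_EXT_TPH
					    : PCI_TPH_REQ_TPH_ONLY);
}
```

Because the automatic wrapper derives its request from ext_support, it can
never hit the rejection path; only an explicit caller can.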
Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
---
drivers/pci/tph.c | 70 ++++++++++++++++++++++++++++-------------
include/linux/pci-tph.h | 4 +++
2 files changed, 53 insertions(+), 21 deletions(-)
diff --git a/drivers/pci/tph.c b/drivers/pci/tph.c
index 951f0a33ff66..51009ac9b379 100644
--- a/drivers/pci/tph.c
+++ b/drivers/pci/tph.c
@@ -364,23 +364,7 @@ void pcie_disable_tph(struct pci_dev *pdev)
}
EXPORT_SYMBOL(pcie_disable_tph);
-/**
- * pcie_enable_tph - Enable TPH support for device using a specific ST mode
- * @pdev: PCI device
- * @mode: ST mode to enable. Current supported modes include:
- *
- * - PCI_TPH_ST_NS_MODE: NO ST Mode
- * - PCI_TPH_ST_IV_MODE: Interrupt Vector Mode
- * - PCI_TPH_ST_DS_MODE: Device Specific Mode
- *
- * Check whether the mode is actually supported by the device before enabling
- * and return an error if not. Additionally determine what types of requests,
- * TPH or extended TPH, can be issued by the device based on its TPH requester
- * capability and the Root Port's completer capability.
- *
- * Return: 0 on success, otherwise negative value (-errno)
- */
-int pcie_enable_tph(struct pci_dev *pdev, int mode)
+static int enable_tph(struct pci_dev *pdev, int mode, u8 req_type)
{
u32 reg;
u8 dev_modes;
@@ -401,10 +385,11 @@ int pcie_enable_tph(struct pci_dev *pdev, int mode)
if (!((1 << mode) & dev_modes))
return -EINVAL;
- pdev->tph_mode = mode;
+ if (req_type == PCI_TPH_REQ_EXT_TPH && !pdev->tph_ext_support)
+ return -EINVAL;
- pdev->tph_req_type = pdev->tph_ext_support ? PCI_TPH_REQ_EXT_TPH :
- PCI_TPH_REQ_TPH_ONLY;
+ pdev->tph_mode = mode;
+ pdev->tph_req_type = req_type;
/* Write them into TPH control register */
pci_read_config_dword(pdev, pdev->tph_cap + PCI_TPH_CTRL, &reg);
@@ -413,7 +398,7 @@ int pcie_enable_tph(struct pci_dev *pdev, int mode)
reg |= FIELD_PREP(PCI_TPH_CTRL_MODE_SEL_MASK, pdev->tph_mode);
reg &= ~PCI_TPH_CTRL_REQ_EN_MASK;
- reg |= FIELD_PREP(PCI_TPH_CTRL_REQ_EN_MASK, pdev->tph_req_type);
+ reg |= FIELD_PREP(PCI_TPH_CTRL_REQ_EN_MASK, req_type);
pci_write_config_dword(pdev, pdev->tph_cap + PCI_TPH_CTRL, reg);
@@ -421,8 +406,51 @@ int pcie_enable_tph(struct pci_dev *pdev, int mode)
return 0;
}
+
+/**
+ * pcie_enable_tph - Enable TPH support for device using a specific ST mode
+ * @pdev: PCI device
+ * @mode: ST mode to enable. Current supported modes include:
+ *
+ * - PCI_TPH_ST_NS_MODE: NO ST Mode
+ * - PCI_TPH_ST_IV_MODE: Interrupt Vector Mode
+ * - PCI_TPH_ST_DS_MODE: Device Specific Mode
+ *
+ * Check whether the mode is actually supported by the device before enabling
+ * and return an error if not. Additionally determine what types of requests,
+ * TPH or extended TPH, can be issued by the device based on its TPH requester
+ * capability and the Root Port's completer capability.
+ *
+ * Return: 0 on success, otherwise negative value (-errno)
+ */
+int pcie_enable_tph(struct pci_dev *pdev, int mode)
+{
+ u8 req_type = pdev->tph_ext_support ? PCI_TPH_REQ_EXT_TPH :
+ PCI_TPH_REQ_TPH_ONLY;
+ return enable_tph(pdev, mode, req_type);
+}
EXPORT_SYMBOL(pcie_enable_tph);
+/**
+ * pcie_enable_tph_explicit - Enable TPH with explicit requester selection
+ * @pdev: PCI device to operate
+ * @mode: ST table operating mode (NS/IV/DS)
+ * @extended: true = EXT_TPH, false = standard TPH only
+ *
+ * Unlike auto-detecting pcie_enable_tph(), caller selects requester type
+ * manually instead of hardware auto-selection. Rejects EXT_TPH request
+ * if device lacks extended requester capability.
+ *
+ * Return: 0 on success, negative errno on failure.
+ */
+int pcie_enable_tph_explicit(struct pci_dev *pdev, int mode, bool extended)
+{
+ u8 req_type = extended ? PCI_TPH_REQ_EXT_TPH : PCI_TPH_REQ_TPH_ONLY;
+
+ return enable_tph(pdev, mode, req_type);
+}
+EXPORT_SYMBOL(pcie_enable_tph_explicit);
+
void pci_restore_tph_state(struct pci_dev *pdev)
{
struct pci_cap_saved_state *save_state;
diff --git a/include/linux/pci-tph.h b/include/linux/pci-tph.h
index 6f02b020d7d7..ca0faa98afac 100644
--- a/include/linux/pci-tph.h
+++ b/include/linux/pci-tph.h
@@ -29,6 +29,7 @@ int pcie_tph_get_cpu_st(struct pci_dev *dev,
unsigned int cpu, u16 *tag);
void pcie_disable_tph(struct pci_dev *pdev);
int pcie_enable_tph(struct pci_dev *pdev, int mode);
+int pcie_enable_tph_explicit(struct pci_dev *pdev, int mode, bool extended);
u16 pcie_tph_get_st_table_size(struct pci_dev *pdev);
u32 pcie_tph_get_st_table_loc(struct pci_dev *pdev);
#else
@@ -42,6 +43,9 @@ static inline int pcie_tph_get_cpu_st(struct pci_dev *dev,
static inline void pcie_disable_tph(struct pci_dev *pdev) { }
static inline int pcie_enable_tph(struct pci_dev *pdev, int mode)
{ return -EINVAL; }
+static inline int pcie_enable_tph_explicit(struct pci_dev *pdev, int mode,
+ bool extended)
+{ return -EINVAL; }
static inline u16 pcie_tph_get_st_table_size(struct pci_dev *pdev)
{ return 0; }
static inline u32 pcie_tph_get_st_table_loc(struct pci_dev *pdev)
--
2.17.1
* [PATCH v17 05/12] PCI/TPH: Refactor pcie_tph_get_cpu_st & add explicit variant
2026-06-16 10:46 [PATCH v17 00/12] vfio/pci: Add PCIe TPH support Chengwen Feng
` (3 preceding siblings ...)
2026-06-16 10:46 ` [PATCH v17 04/12] PCI/TPH: Refactor pcie_enable_tph & add explicit requester variant Chengwen Feng
@ 2026-06-16 10:46 ` Chengwen Feng
2026-06-16 10:53 ` sashiko-bot
2026-06-16 10:46 ` [PATCH v17 06/12] PCI/TPH: Expose the enabled TPH requester type Chengwen Feng
` (6 subsequent siblings)
11 siblings, 1 reply; 25+ messages in thread
From: Chengwen Feng @ 2026-06-16 10:46 UTC (permalink / raw)
To: alex, jgg, helgaas
Cc: wathsala.vithanage, wei.huang2, zhipingz, wangzhou1, wangyushan12,
liuyonglong, kvm, linux-pci
Refactor pcie_tph_get_cpu_st(): extract the core logic into a static
internal get_cpu_st() helper that accepts an explicit requester type
parameter.
- Keep pcie_tph_get_cpu_st() as an automatic wrapper with unchanged
behavior: it uses the existing pdev->tph_req_type, so existing callers
need no change.
- Add an exported pcie_tph_get_cpu_st_explicit() with a bool 'extended'
parameter for manual STD/EXT requester selection, consumed by the
upcoming VFIO TPH code.
- Add a capability check: reject an explicit EXT request when the device
does not support the extended TPH requester.
Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
---
drivers/pci/tph.c | 68 ++++++++++++++++++++++++++++++-----------
include/linux/pci-tph.h | 7 +++++
2 files changed, 57 insertions(+), 18 deletions(-)
diff --git a/drivers/pci/tph.c b/drivers/pci/tph.c
index 51009ac9b379..aca08671fdfe 100644
--- a/drivers/pci/tph.c
+++ b/drivers/pci/tph.c
@@ -231,21 +231,8 @@ static int write_tag_to_st_table(struct pci_dev *pdev, int index, u16 tag)
return pci_write_config_word(pdev, offset, tag);
}
-/**
- * pcie_tph_get_cpu_st() - Retrieve Steering Tag for a target memory associated
- * with a specific CPU
- * @pdev: PCI device
- * @mem_type: target memory type (volatile or persistent RAM)
- * @cpu: associated CPU id
- * @tag: Steering Tag to be returned
- *
- * Return the Steering Tag for a target memory that is associated with a
- * specific CPU as indicated by cpu.
- *
- * Return: 0 if success, otherwise negative value (-errno)
- */
-int pcie_tph_get_cpu_st(struct pci_dev *pdev, enum tph_mem_type mem_type,
- unsigned int cpu, u16 *tag)
+static int get_cpu_st(struct pci_dev *pdev, enum tph_mem_type mem_type,
+ u8 req_type, unsigned int cpu, u16 *tag)
{
#ifdef CONFIG_ACPI
struct pci_dev *rp;
@@ -269,19 +256,64 @@ int pcie_tph_get_cpu_st(struct pci_dev *pdev, enum tph_mem_type mem_type,
return -EINVAL;
}
- *tag = tph_extract_tag(mem_type, pdev->tph_req_type, &info);
+ *tag = tph_extract_tag(mem_type, req_type, &info);
- pci_dbg(pdev, "get steering tag: mem_type=%s, cpu=%d, tag=%#04x\n",
+ pci_dbg(pdev, "get steering tag: mem_type=%s, req_type=%u, cpu=%d, tag=%#04x\n",
(mem_type == TPH_MEM_TYPE_VM) ? "volatile" : "persistent",
- cpu, *tag);
+ req_type, cpu, *tag);
return 0;
#else
return -ENODEV;
#endif
}
+
+/**
+ * pcie_tph_get_cpu_st() - Retrieve Steering Tag for a target memory associated
+ * with a specific CPU
+ * @pdev: PCI device
+ * @mem_type: target memory type (volatile or persistent RAM)
+ * @cpu: associated CPU id
+ * @tag: Steering Tag to be returned
+ *
+ * Return the Steering Tag for a target memory that is associated with a
+ * specific CPU as indicated by cpu.
+ *
+ * Return: 0 if success, otherwise negative value (-errno)
+ */
+int pcie_tph_get_cpu_st(struct pci_dev *pdev, enum tph_mem_type mem_type,
+ unsigned int cpu, u16 *tag)
+{
+ return get_cpu_st(pdev, mem_type, pdev->tph_req_type, cpu, tag);
+}
EXPORT_SYMBOL(pcie_tph_get_cpu_st);
+/**
+ * pcie_tph_get_cpu_st_explicit - Get ST with explicit requester type
+ * @pdev: PCI device
+ * @mem_type: target memory type (volatile or persistent RAM)
+ * @extended: true=EXT_TPH, false=standard TPH only
+ * @cpu: associated CPU id
+ * @tag: output steering tag pointer
+ *
+ * Unlike auto pcie_tph_get_cpu_st(), caller manually picks requester type.
+ * Rejects EXT request if device lacks extended requester capability.
+ *
+ * Return: 0 on success, negative errno on failure.
+ */
+int pcie_tph_get_cpu_st_explicit(struct pci_dev *pdev,
+ enum tph_mem_type mem_type,
+ bool extended, unsigned int cpu, u16 *tag)
+{
+ u8 req_type = extended ? PCI_TPH_REQ_EXT_TPH : PCI_TPH_REQ_TPH_ONLY;
+
+ if (extended && !pdev->tph_ext_support)
+ return -EINVAL;
+
+ return get_cpu_st(pdev, mem_type, req_type, cpu, tag);
+}
+EXPORT_SYMBOL(pcie_tph_get_cpu_st_explicit);
+
/**
* pcie_tph_set_st_entry() - Set Steering Tag in the ST table entry
* @pdev: PCI device
diff --git a/include/linux/pci-tph.h b/include/linux/pci-tph.h
index ca0faa98afac..1a508b3d511f 100644
--- a/include/linux/pci-tph.h
+++ b/include/linux/pci-tph.h
@@ -27,6 +27,9 @@ int pcie_tph_set_st_entry(struct pci_dev *pdev,
int pcie_tph_get_cpu_st(struct pci_dev *dev,
enum tph_mem_type mem_type,
unsigned int cpu, u16 *tag);
+int pcie_tph_get_cpu_st_explicit(struct pci_dev *pdev,
+ enum tph_mem_type mem_type,
+ bool extended, unsigned int cpu, u16 *tag);
void pcie_disable_tph(struct pci_dev *pdev);
int pcie_enable_tph(struct pci_dev *pdev, int mode);
int pcie_enable_tph_explicit(struct pci_dev *pdev, int mode, bool extended);
@@ -40,6 +43,10 @@ static inline int pcie_tph_get_cpu_st(struct pci_dev *dev,
enum tph_mem_type mem_type,
unsigned int cpu, u16 *tag)
{ return -EINVAL; }
+static inline int pcie_tph_get_cpu_st_explicit(struct pci_dev *pdev,
+ enum tph_mem_type mem_type,
+ bool extended, unsigned int cpu, u16 *tag)
+{ return -EINVAL; }
static inline void pcie_disable_tph(struct pci_dev *pdev) { }
static inline int pcie_enable_tph(struct pci_dev *pdev, int mode)
{ return -EINVAL; }
--
2.17.1
* [PATCH v17 06/12] PCI/TPH: Expose the enabled TPH requester type
2026-06-16 10:46 [PATCH v17 00/12] vfio/pci: Add PCIe TPH support Chengwen Feng
` (4 preceding siblings ...)
2026-06-16 10:46 ` [PATCH v17 05/12] PCI/TPH: Refactor pcie_tph_get_cpu_st & add explicit variant Chengwen Feng
@ 2026-06-16 10:46 ` Chengwen Feng
2026-06-16 10:51 ` sashiko-bot
2026-06-16 10:46 ` [PATCH v17 07/12] PCI/TPH: Add pcie_tph_supported() helper to check TPH capability attributes Chengwen Feng
` (5 subsequent siblings)
11 siblings, 1 reply; 25+ messages in thread
From: Chengwen Feng @ 2026-06-16 10:46 UTC (permalink / raw)
To: alex, jgg, helgaas
Cc: wathsala.vithanage, wei.huang2, zhipingz, wangzhou1, wangyushan12,
liuyonglong, kvm, linux-pci
From: Zhiping Zhang <zhipingz@meta.com>
Add pcie_tph_enabled_req_type() so drivers can query the enabled TPH
requester mode without reaching into pci_dev internals.
Signed-off-by: Zhiping Zhang <zhipingz@meta.com>
Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
---
drivers/pci/tph.c | 12 ++++++++++++
include/linux/pci-tph.h | 3 +++
2 files changed, 15 insertions(+)
diff --git a/drivers/pci/tph.c b/drivers/pci/tph.c
index aca08671fdfe..6c4623cacc85 100644
--- a/drivers/pci/tph.c
+++ b/drivers/pci/tph.c
@@ -483,6 +483,18 @@ int pcie_enable_tph_explicit(struct pci_dev *pdev, int mode, bool extended)
}
EXPORT_SYMBOL(pcie_enable_tph_explicit);
+/**
+ * pcie_tph_enabled_req_type - Return the device's enabled TPH requester type
+ * @pdev: PCI device to query
+ *
+ * Return: PCI_TPH_REQ_DISABLE, PCI_TPH_REQ_TPH_ONLY or PCI_TPH_REQ_EXT_TPH.
+ */
+u8 pcie_tph_enabled_req_type(struct pci_dev *pdev)
+{
+ return pdev->tph_req_type;
+}
+EXPORT_SYMBOL(pcie_tph_enabled_req_type);
+
void pci_restore_tph_state(struct pci_dev *pdev)
{
struct pci_cap_saved_state *save_state;
diff --git a/include/linux/pci-tph.h b/include/linux/pci-tph.h
index 1a508b3d511f..e4f7045fc152 100644
--- a/include/linux/pci-tph.h
+++ b/include/linux/pci-tph.h
@@ -33,6 +33,7 @@ int pcie_tph_get_cpu_st_explicit(struct pci_dev *pdev,
void pcie_disable_tph(struct pci_dev *pdev);
int pcie_enable_tph(struct pci_dev *pdev, int mode);
int pcie_enable_tph_explicit(struct pci_dev *pdev, int mode, bool extended);
+u8 pcie_tph_enabled_req_type(struct pci_dev *pdev);
u16 pcie_tph_get_st_table_size(struct pci_dev *pdev);
u32 pcie_tph_get_st_table_loc(struct pci_dev *pdev);
#else
@@ -47,6 +48,8 @@ static inline int pcie_tph_get_cpu_st_explicit(struct pci_dev *pdev,
enum tph_mem_type mem_type,
bool extended, unsigned int cpu, u16 *tag)
{ return -EINVAL; }
+static inline u8 pcie_tph_enabled_req_type(struct pci_dev *pdev)
+{ return PCI_TPH_REQ_DISABLE; }
static inline void pcie_disable_tph(struct pci_dev *pdev) { }
static inline int pcie_enable_tph(struct pci_dev *pdev, int mode)
{ return -EINVAL; }
--
2.17.1
* [PATCH v17 07/12] PCI/TPH: Add pcie_tph_supported() helper to check TPH capability attributes
2026-06-16 10:46 [PATCH v17 00/12] vfio/pci: Add PCIe TPH support Chengwen Feng
` (5 preceding siblings ...)
2026-06-16 10:46 ` [PATCH v17 06/12] PCI/TPH: Expose the enabled TPH requester type Chengwen Feng
@ 2026-06-16 10:46 ` Chengwen Feng
2026-06-16 10:52 ` sashiko-bot
2026-06-16 10:46 ` [PATCH v17 08/12] PCI/TPH: Add sysfs binary file to export CPU to steering-tag mapping Chengwen Feng
` (4 subsequent siblings)
11 siblings, 1 reply; 25+ messages in thread
From: Chengwen Feng @ 2026-06-16 10:46 UTC (permalink / raw)
To: alex, jgg, helgaas
Cc: wathsala.vithanage, wei.huang2, zhipingz, wangzhou1, wangyushan12,
liuyonglong, kvm, linux-pci
Add new helper pcie_tph_supported() with want_ext parameter:
- want_ext = false: Check if device has valid TPH capability;
- want_ext = true: Check hardware Extended TPH support.
This helper is prepared for follow-up VFIO TPH virtualization patches to
uniformly query basic TPH existence and Extended TPH capability.
Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
---
drivers/pci/tph.c | 19 +++++++++++++++++++
include/linux/pci-tph.h | 3 +++
2 files changed, 22 insertions(+)
diff --git a/drivers/pci/tph.c b/drivers/pci/tph.c
index 6c4623cacc85..95280aab4fb5 100644
--- a/drivers/pci/tph.c
+++ b/drivers/pci/tph.c
@@ -600,3 +600,22 @@ void pci_tph_init(struct pci_dev *pdev)
save_size = sizeof(u32) + num_entries * sizeof(u16);
pci_add_ext_cap_save_buffer(pdev, PCI_EXT_CAP_ID_TPH, save_size);
}
+
+/**
+ * pcie_tph_supported - Check TPH capability attribute
+ * @pdev: PCI device to query
+ * @want_ext: false - check TPH cap exists; true - check EXT_TPH support
+ *
+ * Return: true on matched condition, false otherwise
+ */
+bool pcie_tph_supported(struct pci_dev *pdev, bool want_ext)
+{
+ if (!pdev->tph_cap)
+ return false;
+
+ if (!want_ext)
+ return true;
+
+ return pdev->tph_ext_support;
+}
+EXPORT_SYMBOL(pcie_tph_supported);
diff --git a/include/linux/pci-tph.h b/include/linux/pci-tph.h
index e4f7045fc152..5917a0694c1d 100644
--- a/include/linux/pci-tph.h
+++ b/include/linux/pci-tph.h
@@ -36,6 +36,7 @@ int pcie_enable_tph_explicit(struct pci_dev *pdev, int mode, bool extended);
u8 pcie_tph_enabled_req_type(struct pci_dev *pdev);
u16 pcie_tph_get_st_table_size(struct pci_dev *pdev);
u32 pcie_tph_get_st_table_loc(struct pci_dev *pdev);
+bool pcie_tph_supported(struct pci_dev *pdev, bool want_ext);
#else
static inline int pcie_tph_set_st_entry(struct pci_dev *pdev,
unsigned int index, u16 tag)
@@ -60,6 +61,8 @@ static inline u16 pcie_tph_get_st_table_size(struct pci_dev *pdev)
{ return 0; }
static inline u32 pcie_tph_get_st_table_loc(struct pci_dev *pdev)
{ return PCI_TPH_LOC_NONE; }
+static inline bool pcie_tph_supported(struct pci_dev *pdev, bool want_ext)
+{ return false; }
#endif
#endif /* LINUX_PCI_TPH_H */
--
2.17.1
* [PATCH v17 08/12] PCI/TPH: Add sysfs binary file to export CPU to steering-tag mapping
2026-06-16 10:46 [PATCH v17 00/12] vfio/pci: Add PCIe TPH support Chengwen Feng
` (6 preceding siblings ...)
2026-06-16 10:46 ` [PATCH v17 07/12] PCI/TPH: Add pcie_tph_supported() helper to check TPH capability attributes Chengwen Feng
@ 2026-06-16 10:46 ` Chengwen Feng
2026-06-16 11:00 ` sashiko-bot
2026-06-16 10:46 ` [PATCH v17 09/12] vfio/pci: Hide TPH capability when TPH is unsupported Chengwen Feng
` (3 subsequent siblings)
11 siblings, 1 reply; 25+ messages in thread
From: Chengwen Feng @ 2026-06-16 10:46 UTC (permalink / raw)
To: alex, jgg, helgaas
Cc: wathsala.vithanage, wei.huang2, zhipingz, wangzhou1, wangyushan12,
liuyonglong, kvm, linux-pci
Add per-device sysfs binary attribute tph_cpu_st to expose ACPI DSM CPU
to steering-tag data to userspace, resolving the concern that VFIO should
not host CPU-to-ST translation interfaces.
Follow PCI standard binattr framework: dynamic visible group, fixed-size
8-byte packed uapi entry, aligned offset read, root-only 0400 permission.
Refactor duplicate ACPI DSM logic into shared tph_get_cpu_st_info helper.
ABI: /sys/bus/pci/devices/<BDF>/tph_cpu_st
Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
---
drivers/pci/pci-sysfs.c | 3 ++
drivers/pci/pci.h | 4 ++
drivers/pci/tph.c | 113 +++++++++++++++++++++++++++++++++------
include/uapi/linux/pci.h | 15 ++++++
4 files changed, 120 insertions(+), 15 deletions(-)
diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c
index d37860841260..ad9e4e8d320b 100644
--- a/drivers/pci/pci-sysfs.c
+++ b/drivers/pci/pci-sysfs.c
@@ -1832,6 +1832,9 @@ const struct attribute_group *pci_dev_attr_groups[] = {
#ifdef CONFIG_PCI_TSM
&pci_tsm_auth_attr_group,
&pci_tsm_attr_group,
+#endif
+#ifdef CONFIG_PCIE_TPH
+ &pcie_tph_cpu_st_attr_group,
#endif
NULL,
};
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index 4a14f88e543a..09306078a658 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -1366,6 +1366,10 @@ static inline pci_power_t acpi_pci_choose_state(struct pci_dev *pdev)
extern const struct attribute_group aspm_ctrl_attr_group;
#endif
+#ifdef CONFIG_PCIE_TPH
+extern const struct attribute_group pcie_tph_cpu_st_attr_group;
+#endif
+
#ifdef CONFIG_X86_INTEL_MID
bool pci_use_mid_pm(void);
int mid_pci_set_power_state(struct pci_dev *pdev, pci_power_t state);
diff --git a/drivers/pci/tph.c b/drivers/pci/tph.c
index 95280aab4fb5..aca5093e8152 100644
--- a/drivers/pci/tph.c
+++ b/drivers/pci/tph.c
@@ -130,6 +130,29 @@ static acpi_status tph_invoke_dsm(acpi_handle handle, u32 cpu_uid,
return AE_OK;
}
+
+static int tph_get_cpu_st_info(struct pci_dev *pdev, unsigned int cpu,
+ union st_info *info)
+{
+ acpi_handle rp_acpi_handle;
+ struct pci_dev *rp;
+ u32 cpu_uid;
+ int ret;
+
+ ret = acpi_get_cpu_uid(cpu, &cpu_uid);
+ if (ret != 0)
+ return ret;
+
+ rp = pcie_find_root_port(pdev);
+ if (!rp || !rp->bus || !rp->bus->bridge)
+ return -ENODEV;
+
+ rp_acpi_handle = ACPI_HANDLE(rp->bus->bridge);
+ if (tph_invoke_dsm(rp_acpi_handle, cpu_uid, info) != AE_OK)
+ return -EINVAL;
+
+ return 0;
+}
#endif
/* Update the TPH Requester Enable field of TPH Control Register */
@@ -231,31 +254,36 @@ static int write_tag_to_st_table(struct pci_dev *pdev, int index, u16 tag)
return pci_write_config_word(pdev, offset, tag);
}
+static void get_cpu_all_st(struct pci_dev *pdev, unsigned int cpu,
+ struct pci_tph_cpu_st *st)
+{
+ memset(st, 0, sizeof(*st));
+#ifdef CONFIG_ACPI
+ union st_info info;
+ int ret;
+
+ ret = tph_get_cpu_st_info(pdev, cpu, &info);
+ if (ret == 0) {
+ st->vm_st = info.vm_st_valid ? info.vm_st : 0;
+ st->pm_st = info.pm_st_valid ? info.pm_st : 0;
+ st->vm_xst = info.vm_xst_valid ? info.vm_xst : 0;
+ st->pm_xst = info.pm_xst_valid ? info.pm_xst : 0;
+ st->reserved = 0;
+ }
+#endif
+}
+
static int get_cpu_st(struct pci_dev *pdev, enum tph_mem_type mem_type,
u8 req_type, unsigned int cpu, u16 *tag)
{
#ifdef CONFIG_ACPI
- struct pci_dev *rp;
- acpi_handle rp_acpi_handle;
union st_info info;
- u32 cpu_uid;
int ret;
- ret = acpi_get_cpu_uid(cpu, &cpu_uid);
+ ret = tph_get_cpu_st_info(pdev, cpu, &info);
if (ret != 0)
return ret;
- rp = pcie_find_root_port(pdev);
- if (!rp || !rp->bus || !rp->bus->bridge)
- return -ENODEV;
-
- rp_acpi_handle = ACPI_HANDLE(rp->bus->bridge);
-
- if (tph_invoke_dsm(rp_acpi_handle, cpu_uid, &info) != AE_OK) {
- *tag = 0;
- return -EINVAL;
- }
-
*tag = tph_extract_tag(mem_type, req_type, &info);
pci_dbg(pdev, "get steering tag: mem_type=%s, req_type=%u, cpu=%d, tag=%#04x\n",
@@ -619,3 +647,58 @@ bool pcie_tph_supported(struct pci_dev *pdev, bool want_ext)
return pdev->tph_ext_support;
}
EXPORT_SYMBOL(pcie_tph_supported);
+
+static ssize_t tph_cpu_st_read(struct file *filp, struct kobject *kobj,
+ const struct bin_attribute *bin_attr, char *buf,
+ loff_t off, size_t count)
+{
+ struct pci_dev *pdev = to_pci_dev(kobj_to_dev(kobj));
+ size_t entry_sz = PCI_TPH_CPU_ST_ENTRY_SZ;
+ struct pci_tph_cpu_st st;
+ unsigned int target_cpu;
+ size_t copy_len;
+
+ if (off >= nr_cpu_ids * entry_sz || off % entry_sz != 0)
+ return 0;
+
+ target_cpu = off / entry_sz;
+ if (!cpu_possible(target_cpu))
+ return -ENODEV;
+
+ get_cpu_all_st(pdev, target_cpu, &st);
+
+ copy_len = min_t(size_t, entry_sz, count);
+ memcpy(buf, &st, copy_len);
+
+ return copy_len;
+}
+static BIN_ATTR(tph_cpu_st, 0400, tph_cpu_st_read, NULL, 0);
+
+static const struct bin_attribute *const tph_cpu_st_bin_attrs[] = {
+ &bin_attr_tph_cpu_st,
+ NULL,
+};
+
+static size_t tph_cpu_st_bin_size(struct kobject *kobj,
+ const struct bin_attribute *a, int n)
+{
+ return nr_cpu_ids * PCI_TPH_CPU_ST_ENTRY_SZ;
+}
+
+static umode_t tph_cpu_st_attr_is_visible(struct kobject *kobj,
+ const struct bin_attribute *a, int n)
+{
+ struct device *dev = kobj_to_dev(kobj);
+ struct pci_dev *pdev = to_pci_dev(dev);
+
+ if (pcie_tph_supported(pdev, false))
+ return a->attr.mode;
+
+ return 0;
+}
+
+const struct attribute_group pcie_tph_cpu_st_attr_group = {
+ .bin_attrs = tph_cpu_st_bin_attrs,
+ .bin_size = tph_cpu_st_bin_size,
+ .is_bin_visible = tph_cpu_st_attr_is_visible,
+};
diff --git a/include/uapi/linux/pci.h b/include/uapi/linux/pci.h
index 4f150028965d..d8aa9f8d47f6 100644
--- a/include/uapi/linux/pci.h
+++ b/include/uapi/linux/pci.h
@@ -46,4 +46,19 @@ enum pci_hotplug_event {
PCI_HOTPLUG_CARD_NOT_PRESENT,
};
+/*
+ * PCIe TPH sysfs binary entry for CPU-to-ST mapping
+ * Sysfs file: /sys/bus/pci/devices/<BDF>/tph_cpu_st
+ * Each entry is 8 bytes aligned, seek offset = cpu_id * PCI_TPH_CPU_ST_ENTRY_SZ
+ */
+struct pci_tph_cpu_st {
+ __u8 vm_st; /* Volatile Memory Steering Tag (1 byte) */
+ __u8 pm_st; /* Persistent Memory Steering Tag (1 byte) */
+ __u16 vm_xst; /* Volatile Memory Extended Steering Tag (2 bytes) */
+ __u16 pm_xst; /* Persistent Memory Extended Steering Tag (2 bytes) */
+ __u16 reserved; /* Padding to 8 bytes for aligned offset lookup */
+} __packed;
+
+#define PCI_TPH_CPU_ST_ENTRY_SZ sizeof(struct pci_tph_cpu_st)
+
#endif /* _UAPILINUX_PCI_H */
--
2.17.1
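For anyone wanting to consume tph_cpu_st from userspace, here is a minimal sketch that is not part of the patch. It assumes the layout stated in the uapi comment (one 8-byte packed entry per CPU, at offset cpu_id * PCI_TPH_CPU_ST_ENTRY_SZ, fields in host byte order). The names tph_cpu_st_entry, tph_cpu_st_offset() and tph_cpu_st_read() are hypothetical, and stdio is used only to keep the example portable.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Local mirror of struct pci_tph_cpu_st from the patch (assumed layout). */
struct tph_cpu_st_entry {
	uint8_t  vm_st;    /* Volatile Memory Steering Tag */
	uint8_t  pm_st;    /* Persistent Memory Steering Tag */
	uint16_t vm_xst;   /* Volatile Memory Extended Steering Tag */
	uint16_t pm_xst;   /* Persistent Memory Extended Steering Tag */
	uint16_t reserved; /* padding to 8 bytes */
} __attribute__((packed));

#define TPH_CPU_ST_ENTRY_SZ sizeof(struct tph_cpu_st_entry)

_Static_assert(TPH_CPU_ST_ENTRY_SZ == 8, "entry must be 8 bytes");

/* File offset of the entry for a given CPU: cpu_id * entry size. */
static long tph_cpu_st_offset(unsigned int cpu)
{
	return (long)cpu * (long)TPH_CPU_ST_ENTRY_SZ;
}

/*
 * Read the entry for one CPU from an already opened stream.
 * Returns 0 on success, -1 if the seek fails or the read is short.
 */
static int tph_cpu_st_read(FILE *fp, unsigned int cpu,
			   struct tph_cpu_st_entry *out)
{
	unsigned char raw[TPH_CPU_ST_ENTRY_SZ];

	assert(fp != NULL && out != NULL);

	if (fseek(fp, tph_cpu_st_offset(cpu), SEEK_SET) != 0)
		return -1;
	if (fread(raw, 1, sizeof(raw), fp) != sizeof(raw))
		return -1;

	memcpy(out, raw, sizeof(*out));
	return 0;
}
```

Usage would be along the lines of opening /sys/bus/pci/devices/<BDF>/tph_cpu_st as root with fopen(path, "rb") and calling tph_cpu_st_read(fp, cpu, &entry); a real tool would more likely use pread() on a file descriptor. Note that, per get_cpu_all_st() in the patch, a tag of 0 is also what is reported when firmware marks a tag invalid or the DSM call fails, so a reader cannot tell those cases apart from a genuine tag value of 0.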
* [PATCH v17 09/12] vfio/pci: Hide TPH capability when TPH is unsupported
2026-06-16 10:46 [PATCH v17 00/12] vfio/pci: Add PCIe TPH support Chengwen Feng
` (7 preceding siblings ...)
2026-06-16 10:46 ` [PATCH v17 08/12] PCI/TPH: Add sysfs binary file to export CPU to steering-tag mapping Chengwen Feng
@ 2026-06-16 10:46 ` Chengwen Feng
2026-06-16 10:56 ` sashiko-bot
2026-06-16 10:46 ` [PATCH v17 10/12] vfio/pci: Add TPH_ENABLE feature skeleton and unsafe module parameter Chengwen Feng
` (2 subsequent siblings)
11 siblings, 1 reply; 25+ messages in thread
From: Chengwen Feng @ 2026-06-16 10:46 UTC (permalink / raw)
To: alex, jgg, helgaas
Cc: wathsala.vithanage, wei.huang2, zhipingz, wangzhou1, wangyushan12,
liuyonglong, kvm, linux-pci
Check the device's negotiated TPH support status before parsing the TPH
extended capability. Return zero length to hide the capability from
userspace if TPH was disabled during topology negotiation.
Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
---
drivers/vfio/pci/vfio_pci_config.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/vfio/pci/vfio_pci_config.c b/drivers/vfio/pci/vfio_pci_config.c
index a10ed733f0e3..5c6ab172df6c 100644
--- a/drivers/vfio/pci/vfio_pci_config.c
+++ b/drivers/vfio/pci/vfio_pci_config.c
@@ -22,6 +22,7 @@
#include <linux/fs.h>
#include <linux/pci.h>
+#include <linux/pci-tph.h>
#include <linux/uaccess.h>
#include <linux/vfio.h>
#include <linux/slab.h>
@@ -1450,6 +1451,8 @@ static int vfio_ext_cap_len(struct vfio_pci_core_device *vdev, u16 ecap, u16 epo
byte &= PCI_DPA_CAP_SUBSTATE_MASK;
return PCI_DPA_BASE_SIZEOF + byte + 1;
case PCI_EXT_CAP_ID_TPH:
+ if (!pcie_tph_supported(pdev, false))
+ return 0;
ret = pci_read_config_dword(pdev, epos + PCI_TPH_CAP, &dword);
if (ret)
return pcibios_err_to_errno(ret);
--
2.17.1
* [PATCH v17 10/12] vfio/pci: Add TPH_ENABLE feature skeleton and unsafe module parameter
2026-06-16 10:46 [PATCH v17 00/12] vfio/pci: Add PCIe TPH support Chengwen Feng
` (8 preceding siblings ...)
2026-06-16 10:46 ` [PATCH v17 09/12] vfio/pci: Hide TPH capability when TPH is unsupported Chengwen Feng
@ 2026-06-16 10:46 ` Chengwen Feng
2026-06-16 10:55 ` sashiko-bot
2026-06-16 10:46 ` [PATCH v17 11/12] vfio/pci: Add TPH_ST_CONFIG for PCIe TPH ST configuration Chengwen Feng
2026-06-16 10:46 ` [PATCH v17 12/12] vfio/pci: Virtualize PCIe TPH capability registers Chengwen Feng
11 siblings, 1 reply; 25+ messages in thread
From: Chengwen Feng @ 2026-06-16 10:46 UTC (permalink / raw)
To: alex, jgg, helgaas
Cc: wathsala.vithanage, wei.huang2, zhipingz, wangzhou1, wangyushan12,
liuyonglong, kvm, linux-pci
Introduce the module parameter enable_unsafe_tph to gate all TPH-related
features, and add the VFIO_DEVICE_FEATURE_TPH_ENABLE uapi together with a
per-device tph_permit flag.
This is a preparatory change: only the feature framework is added for
now. The actual TPH_CTRL register permission control and the steering tag
features (TPH_CPU_ST / TPH_ST_CONFIG) are attached in the subsequent
TPH capability virtualization commits.
Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
---
drivers/vfio/pci/vfio_pci.c | 13 ++++++++++++-
drivers/vfio/pci/vfio_pci_config.c | 1 +
drivers/vfio/pci/vfio_pci_core.c | 25 ++++++++++++++++++++++++-
include/linux/vfio_pci_core.h | 4 +++-
include/uapi/linux/vfio.h | 7 +++++++
5 files changed, 47 insertions(+), 3 deletions(-)
diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
index 0c771064c0b8..6d73668459cf 100644
--- a/drivers/vfio/pci/vfio_pci.c
+++ b/drivers/vfio/pci/vfio_pci.c
@@ -60,6 +60,12 @@ static bool disable_denylist;
module_param(disable_denylist, bool, 0444);
MODULE_PARM_DESC(disable_denylist, "Disable use of device denylist. Disabling the denylist allows binding to devices with known errata that may lead to exploitable stability or security issues when accessed by untrusted users.");
+#ifdef CONFIG_PCIE_TPH
+static bool enable_unsafe_tph;
+module_param(enable_unsafe_tph, bool, 0444);
+MODULE_PARM_DESC(enable_unsafe_tph, "Enable PCIe TPH (Transaction Processing Hints) support. It may break platform isolation. If you do not know what this is for, step away. (default: false)");
+#endif
+
static bool vfio_pci_dev_in_denylist(struct pci_dev *pdev)
{
switch (pdev->vendor) {
@@ -257,12 +263,17 @@ static int __init vfio_pci_init(void)
{
int ret;
bool is_disable_vga = true;
+ bool is_enable_unsafe_tph = false;
#ifdef CONFIG_VFIO_PCI_VGA
is_disable_vga = disable_vga;
#endif
+#ifdef CONFIG_PCIE_TPH
+ is_enable_unsafe_tph = enable_unsafe_tph;
+#endif
- vfio_pci_core_set_params(nointxmask, is_disable_vga, disable_idle_d3);
+ vfio_pci_core_set_params(nointxmask, is_disable_vga, disable_idle_d3,
+ is_enable_unsafe_tph);
/* Register and scan for devices */
ret = pci_register_driver(&vfio_pci_driver);
diff --git a/drivers/vfio/pci/vfio_pci_config.c b/drivers/vfio/pci/vfio_pci_config.c
index 5c6ab172df6c..251d3ec7fdd4 100644
--- a/drivers/vfio/pci/vfio_pci_config.c
+++ b/drivers/vfio/pci/vfio_pci_config.c
@@ -1783,6 +1783,7 @@ int vfio_config_init(struct vfio_pci_core_device *vdev)
goto out;
vdev->bardirty = true;
+ vdev->tph_permit = false;
/*
* XXX can we just pci_load_saved_state/pci_restore_state?
diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
index 050e7542952e..d5e534dd5829 100644
--- a/drivers/vfio/pci/vfio_pci_core.c
+++ b/drivers/vfio/pci/vfio_pci_core.c
@@ -41,6 +41,7 @@
static bool nointxmask;
static bool disable_vga;
static bool disable_idle_d3;
+static bool enable_unsafe_tph;
static void vfio_pci_eventfd_rcu_free(struct rcu_head *rcu)
{
@@ -1551,6 +1552,24 @@ static int vfio_pci_core_feature_token(struct vfio_pci_core_device *vdev,
return 0;
}
+static int vfio_pci_core_feature_tph_enable(struct vfio_pci_core_device *vdev,
+ u32 flags, size_t argsz)
+{
+ int ret;
+
+ if (!enable_unsafe_tph)
+ return -EOPNOTSUPP;
+
+ ret = vfio_check_feature(flags, argsz, VFIO_DEVICE_FEATURE_SET, 0);
+ if (ret <= 0)
+ return ret;
+
+ if (!vdev->tph_permit)
+ vdev->tph_permit = 1;
+
+ return 0;
+}
+
int vfio_pci_core_ioctl_feature(struct vfio_device *device, u32 flags,
void __user *arg, size_t argsz)
{
@@ -1569,6 +1588,8 @@ int vfio_pci_core_ioctl_feature(struct vfio_device *device, u32 flags,
return vfio_pci_core_feature_token(vdev, flags, arg, argsz);
case VFIO_DEVICE_FEATURE_DMA_BUF:
return vfio_pci_core_feature_dma_buf(vdev, flags, arg, argsz);
+ case VFIO_DEVICE_FEATURE_TPH_ENABLE:
+ return vfio_pci_core_feature_tph_enable(vdev, flags, argsz);
default:
return -ENOTTY;
}
@@ -2605,11 +2626,13 @@ static void vfio_pci_dev_set_try_reset(struct vfio_device_set *dev_set)
}
void vfio_pci_core_set_params(bool is_nointxmask, bool is_disable_vga,
- bool is_disable_idle_d3)
+ bool is_disable_idle_d3,
+ bool is_enable_unsafe_tph)
{
nointxmask = is_nointxmask;
disable_vga = is_disable_vga;
disable_idle_d3 = is_disable_idle_d3;
+ enable_unsafe_tph = is_enable_unsafe_tph;
}
EXPORT_SYMBOL_GPL(vfio_pci_core_set_params);
diff --git a/include/linux/vfio_pci_core.h b/include/linux/vfio_pci_core.h
index 89165b769e5c..a946b35e6b85 100644
--- a/include/linux/vfio_pci_core.h
+++ b/include/linux/vfio_pci_core.h
@@ -127,6 +127,7 @@ struct vfio_pci_core_device {
bool needs_pm_restore:1;
bool pm_intx_masked:1;
bool pm_runtime_engaged:1;
+ bool tph_permit;
struct pci_saved_state *pci_saved_state;
struct pci_saved_state *pm_save;
int ioeventfds_nr;
@@ -157,7 +158,8 @@ int vfio_pci_core_register_dev_region(struct vfio_pci_core_device *vdev,
const struct vfio_pci_regops *ops,
size_t size, u32 flags, void *data);
void vfio_pci_core_set_params(bool nointxmask, bool is_disable_vga,
- bool is_disable_idle_d3);
+ bool is_disable_idle_d3,
+ bool is_enable_unsafe_tph);
void vfio_pci_core_close_device(struct vfio_device *core_vdev);
int vfio_pci_core_init_dev(struct vfio_device *core_vdev);
void vfio_pci_core_release_dev(struct vfio_device *core_vdev);
diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index 5de618a3a5ee..e5a4d1d7091b 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -1534,6 +1534,13 @@ struct vfio_device_feature_dma_buf {
*/
#define VFIO_DEVICE_FEATURE_MIG_PRECOPY_INFOv2 12
+/*
+ * Device-level opt-in for TPH (Transaction Processing Hints) support.
+ * When set, allows access to TPH_CPU_ST and TPH_ST_CONFIG features.
+ * Requires global enable_unsafe_tph module parameter to be enabled.
+ */
+#define VFIO_DEVICE_FEATURE_TPH_ENABLE 13
+
/* -------- API for Type1 VFIO IOMMU -------- */
/**
--
2.17.1
* [PATCH v17 11/12] vfio/pci: Add TPH_ST_CONFIG for PCIe TPH ST configuration
2026-06-16 10:46 [PATCH v17 00/12] vfio/pci: Add PCIe TPH support Chengwen Feng
` (9 preceding siblings ...)
2026-06-16 10:46 ` [PATCH v17 10/12] vfio/pci: Add TPH_ENABLE feature skeleton and unsafe module parameter Chengwen Feng
@ 2026-06-16 10:46 ` Chengwen Feng
2026-06-16 11:05 ` sashiko-bot
2026-06-16 10:46 ` [PATCH v17 12/12] vfio/pci: Virtualize PCIe TPH capability registers Chengwen Feng
11 siblings, 1 reply; 25+ messages in thread
From: Chengwen Feng @ 2026-06-16 10:46 UTC (permalink / raw)
To: alex, jgg, helgaas
Cc: wathsala.vithanage, wei.huang2, zhipingz, wangzhou1, wangyushan12,
liuyonglong, kvm, linux-pci
Add a new VFIO device feature VFIO_DEVICE_FEATURE_TPH_ST_CONFIG to allow
userspace to configure PCIe TPH Steering Tag table entries. This interface
supports only configuration writes; read operations are not permitted.
Implement a shadow ST table to cache entries, protected against concurrent
access by the per-device memory_lock. A batch write failure triggers entry
rollback to keep the hardware and the shadow table consistent.
The feature is double gated:
1. The global enable_unsafe_tph module parameter must be enabled;
2. Userspace must first SET VFIO_DEVICE_FEATURE_TPH_ENABLE
to set the per-device tph_permit flag before using TPH_ST_CONFIG.
Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
---
drivers/vfio/pci/vfio_pci_core.c | 123 +++++++++++++++++++++++++++++++
include/linux/vfio_pci_core.h | 2 +
include/uapi/linux/vfio.h | 22 ++++++
3 files changed, 147 insertions(+)
diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
index d5e534dd5829..0f602faeaef3 100644
--- a/drivers/vfio/pci/vfio_pci_core.c
+++ b/drivers/vfio/pci/vfio_pci_core.c
@@ -29,6 +29,7 @@
#include <linux/sched/mm.h>
#include <linux/iommufd.h>
#include <linux/pci-p2pdma.h>
+#include <linux/pci-tph.h>
#if IS_ENABLED(CONFIG_EEH)
#include <asm/eeh.h>
#endif
@@ -529,6 +530,50 @@ static const struct dev_pm_ops vfio_pci_core_pm_ops = {
NULL)
};
+static int vfio_pci_tph_st_shadow_size(struct vfio_pci_core_device *vdev)
+{
+ struct pci_dev *pdev = vdev->pdev;
+ u32 loc = pcie_tph_get_st_table_loc(pdev);
+ int ret;
+
+ if (loc == PCI_TPH_LOC_CAP) {
+ return pcie_tph_get_st_table_size(pdev);
+ } else if (loc == PCI_TPH_LOC_MSIX) {
+ ret = pci_msix_vec_count(pdev);
+ if (ret < 0)
+ return 0;
+ return ret;
+ } else {
+ return 0;
+ }
+}
+
+static int vfio_pci_tph_init(struct vfio_pci_core_device *vdev)
+{
+ vdev->tph_st_entries = 0;
+ vdev->tph_st_shadow = NULL;
+
+ if (!enable_unsafe_tph)
+ return 0;
+
+ vdev->tph_st_entries = vfio_pci_tph_st_shadow_size(vdev);
+ if (vdev->tph_st_entries) {
+ vdev->tph_st_shadow = kcalloc(vdev->tph_st_entries, sizeof(u16),
+ GFP_KERNEL);
+ if (!vdev->tph_st_shadow)
+ return -ENOMEM;
+ }
+
+ return 0;
+}
+
+static void vfio_pci_tph_deinit(struct vfio_pci_core_device *vdev)
+{
+ kfree(vdev->tph_st_shadow);
+ vdev->tph_st_shadow = NULL;
+ vdev->tph_st_entries = 0;
+}
+
int vfio_pci_core_enable(struct vfio_pci_core_device *vdev)
{
struct pci_dev *pdev = vdev->pdev;
@@ -555,6 +600,11 @@ int vfio_pci_core_enable(struct vfio_pci_core_device *vdev)
goto out_disable_device;
vdev->reset_works = !ret;
+
+ ret = vfio_pci_tph_init(vdev);
+ if (ret)
+ goto out_disable_device;
+
pci_save_state(pdev);
vdev->pci_saved_state = pci_store_saved_state(pdev);
if (!vdev->pci_saved_state)
@@ -612,6 +662,7 @@ int vfio_pci_core_enable(struct vfio_pci_core_device *vdev)
out_free_state:
kfree(vdev->pci_saved_state);
vdev->pci_saved_state = NULL;
+ vfio_pci_tph_deinit(vdev);
out_disable_device:
pci_disable_device(pdev);
out_power:
@@ -680,6 +731,7 @@ void vfio_pci_core_disable(struct vfio_pci_core_device *vdev)
kfree(vdev->region);
vdev->region = NULL; /* don't krealloc a freed pointer */
+ vfio_pci_tph_deinit(vdev);
vfio_config_free(vdev);
for (i = 0; i < PCI_STD_NUM_BARS; i++) {
@@ -1570,6 +1622,74 @@ static int vfio_pci_core_feature_tph_enable(struct vfio_pci_core_device *vdev,
return 0;
}
+static int vfio_pci_core_feature_tph_st_config(
+ struct vfio_pci_core_device *vdev,
+ u32 flags,
+ struct vfio_device_feature_tph_st_config __user *arg,
+ size_t argsz)
+{
+ struct vfio_device_feature_tph_st_config config;
+ struct pci_dev *pdev = vdev->pdev;
+ void __user *uptr;
+ int i, idx, ret;
+ size_t sz;
+ u16 *sts;
+
+ if (!vdev->tph_permit || !vdev->tph_st_shadow)
+ return -EOPNOTSUPP;
+
+ ret = vfio_check_feature(flags, argsz, VFIO_DEVICE_FEATURE_SET,
+ sizeof(config));
+ if (ret <= 0)
+ return ret;
+
+ if (copy_from_user(&config, arg, sizeof(config)))
+ return -EFAULT;
+
+ if (config.count == 0 || config.reserved != 0 ||
+ config.index >= vdev->tph_st_entries ||
+ config.count > vdev->tph_st_entries - config.index)
+ return -EINVAL;
+
+ uptr = u64_to_user_ptr(config.data_uptr);
+ sts = memdup_array_user(uptr, config.count, sizeof(u16));
+ sz = config.count * sizeof(u16);
+ if (IS_ERR(sts))
+ return PTR_ERR(sts);
+
+ down_write(&vdev->memory_lock);
+ ret = vfio_pci_set_power_state(vdev, PCI_D0);
+ if (ret)
+ goto out_unlock_memory;
+
+ if (pcie_tph_enabled_req_type(pdev) == PCI_TPH_REQ_DISABLE)
+ goto update_shadow;
+
+ for (i = 0; i < config.count; i++) {
+ idx = config.index + i;
+ ret = pcie_tph_set_st_entry(pdev, idx, sts[i]);
+ if (ret)
+ goto rollback;
+ }
+
+update_shadow:
+ memcpy(&vdev->tph_st_shadow[config.index], sts, sz);
+ ret = 0;
+ goto out_unlock_memory;
+
+rollback:
+ while (i-- > 0) {
+ idx = config.index + i;
+ pcie_tph_set_st_entry(pdev, idx, vdev->tph_st_shadow[idx]);
+ }
+
+out_unlock_memory:
+ up_write(&vdev->memory_lock);
+
+ kfree(sts);
+ return ret;
+}
+
int vfio_pci_core_ioctl_feature(struct vfio_device *device, u32 flags,
void __user *arg, size_t argsz)
{
@@ -1590,6 +1710,9 @@ int vfio_pci_core_ioctl_feature(struct vfio_device *device, u32 flags,
return vfio_pci_core_feature_dma_buf(vdev, flags, arg, argsz);
case VFIO_DEVICE_FEATURE_TPH_ENABLE:
return vfio_pci_core_feature_tph_enable(vdev, flags, argsz);
+ case VFIO_DEVICE_FEATURE_TPH_ST_CONFIG:
+ return vfio_pci_core_feature_tph_st_config(vdev, flags,
+ arg, argsz);
default:
return -ENOTTY;
}
diff --git a/include/linux/vfio_pci_core.h b/include/linux/vfio_pci_core.h
index a946b35e6b85..4f20d5a1d557 100644
--- a/include/linux/vfio_pci_core.h
+++ b/include/linux/vfio_pci_core.h
@@ -143,6 +143,8 @@ struct vfio_pci_core_device {
struct notifier_block nb;
struct rw_semaphore memory_lock;
struct list_head dmabufs;
+ u16 *tph_st_shadow;
+ u16 tph_st_entries;
};
enum vfio_pci_io_width {
diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index e5a4d1d7091b..61079594a91f 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -1541,6 +1541,28 @@ struct vfio_device_feature_dma_buf {
*/
#define VFIO_DEVICE_FEATURE_TPH_ENABLE 13
+/**
+ * VFIO_DEVICE_FEATURE_TPH_ST_CONFIG - Configure PCIe TPH Steering Tag entries
+ *
+ * Provides userspace interface to configure PCIe TPH ST table entries.
+ *
+ * @index: Start entry offset within ST table
+ * @count: Number of consecutive entries to configure
+ * @data_uptr: Userspace data buffer for 16-bit raw ST values
+ *
+ * This feature requires two preconditions:
+ * 1. Global enable_unsafe_tph module parameter is enabled;
+ * 2. VFIO_DEVICE_FEATURE_TPH_ENABLE has been SET on the device beforehand.
+ */
+#define VFIO_DEVICE_FEATURE_TPH_ST_CONFIG 14
+
+struct vfio_device_feature_tph_st_config {
+ __u16 index;
+ __u16 count;
+ __u32 reserved; /* Reserved for future use, must be zero */
+ __aligned_u64 data_uptr;
+};
+
/* -------- API for Type1 VFIO IOMMU -------- */
/**
--
2.17.1
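To make the batch-write semantics of TPH_ST_CONFIG easier to reason about, here is a small standalone model of the range check and the rollback path. It is only a sketch: struct st_model, hw_write() and the fail_at fault-injection field are hypothetical stand-ins for the device ST table and pcie_tph_set_st_entry(), and the model leaves out the locking, the D0 transition and the "requester disabled, update shadow only" shortcut that the patch has.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

#define ST_ENTRIES 8

struct st_model {
	uint16_t hw[ST_ENTRIES];     /* stands in for the device ST table */
	uint16_t shadow[ST_ENTRIES]; /* last successfully committed values */
	int fail_at;                 /* index whose write fails, -1 for none */
};

/* Hypothetical stand-in for a single hardware ST entry write. */
static int hw_write(struct st_model *m, unsigned int idx, uint16_t tag)
{
	assert(idx < ST_ENTRIES);
	if ((int)idx == m->fail_at)
		return -EIO;
	m->hw[idx] = tag;
	return 0;
}

static int st_batch_write(struct st_model *m, uint16_t index, uint16_t count,
			  const uint16_t *tags)
{
	unsigned int i;

	/* Overflow-safe range check, same shape as in the patch. */
	if (count == 0 || index >= ST_ENTRIES || count > ST_ENTRIES - index)
		return -EINVAL;

	for (i = 0; i < count; i++) {
		if (hw_write(m, index + i, tags[i]) != 0)
			goto rollback;
	}

	/* Commit to the shadow only after every hardware write succeeded. */
	memcpy(&m->shadow[index], tags, count * sizeof(uint16_t));
	return 0;

rollback:
	/* Restore the entries written so far from the untouched shadow. */
	while (i-- > 0)
		hw_write(m, index + i, m->shadow[index + i]);
	return -EIO;
}
```

The point of the ordering is that the shadow table is never modified on the failure path, so it can serve as the source of truth for the rollback. As in the patch, the return value of the restoring writes is not checked, so a failure during rollback itself would still leave hardware and shadow out of sync.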
* [PATCH v17 12/12] vfio/pci: Virtualize PCIe TPH capability registers
2026-06-16 10:46 [PATCH v17 00/12] vfio/pci: Add PCIe TPH support Chengwen Feng
` (10 preceding siblings ...)
2026-06-16 10:46 ` [PATCH v17 11/12] vfio/pci: Add TPH_ST_CONFIG for PCIe TPH ST configuration Chengwen Feng
@ 2026-06-16 10:46 ` Chengwen Feng
2026-06-16 11:03 ` sashiko-bot
11 siblings, 1 reply; 25+ messages in thread
From: Chengwen Feng @ 2026-06-16 10:46 UTC (permalink / raw)
To: alex, jgg, helgaas
Cc: wathsala.vithanage, wei.huang2, zhipingz, wangzhou1, wangyushan12,
liuyonglong, kvm, linux-pci
Virtualize the TPH extended capability config space registers:
- The TPH capability was previously fully read-only. Permissions are now
split: the TPH_CAP header remains read-only, while the TPH_CTRL register
accepts writes to toggle the TPH requester enable mode.
- Block direct ST table programming via config space writes: ST entries
can only be configured through VFIO_DEVICE_FEATURE_TPH_ST_CONFIG, and
only after userspace has opted in by SETting TPH_ENABLE.
- Back up the original virtual config value and revert vconfig if the
hardware TPH enable operation fails or an invalid requester mode is
configured.
- After the TPH requester is enabled via a CTRL write, sync the cached
shadow ST table down to the physical hardware, under memory_lock and
after bringing the device to PCI D0.
Add vconfig masking, via the new vfio_tph_mask_ext_tph_bit() helper, to
hide the EXT_TPH capability bit if the underlying hardware does not
support extended TPH.
Reset the hardware TPH state on device open/close to prevent TPH
configuration from leaking across sessions between different VM lifecycles.
Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
---
drivers/vfio/pci/vfio_pci_config.c | 117 +++++++++++++++++++++++++++++
drivers/vfio/pci/vfio_pci_core.c | 4 +
2 files changed, 121 insertions(+)
diff --git a/drivers/vfio/pci/vfio_pci_config.c b/drivers/vfio/pci/vfio_pci_config.c
index 251d3ec7fdd4..1fcb53803b64 100644
--- a/drivers/vfio/pci/vfio_pci_config.c
+++ b/drivers/vfio/pci/vfio_pci_config.c
@@ -1086,6 +1086,118 @@ static int __init init_pci_ext_cap_pwr_perm(struct perm_bits *perm)
return 0;
}
+/* Permissions for TPH extended capability */
+static int __init init_pci_ext_cap_tph_perm(struct perm_bits *perm)
+{
+ int i;
+
+ if (alloc_perm_bits(perm, pci_ext_cap_length[PCI_EXT_CAP_ID_TPH]))
+ return -ENOMEM;
+
+ p_setd(perm, 0, ALL_VIRT, NO_WRITE);
+ p_setd(perm, PCI_TPH_CAP, ALL_VIRT, NO_WRITE);
+
+ p_setd(perm, PCI_TPH_CTRL, ALL_VIRT, ALL_WRITE);
+
+ /* Per PCI specification: There is an upper limit of 64 entries
+ * when the ST table is located in the TPH Requester Extended
+ * Capability structure.
+ * And the pci_ext_cap_length[PCI_EXT_CAP_ID_TPH] is 0xFF, so the
+ * following operation is fine.
+ */
+ for (i = 0; i < 64; i++)
+ p_setw(perm, PCI_TPH_BASE_SIZEOF + i * sizeof(u16),
+ (u16)ALL_VIRT, (u16)ALL_WRITE);
+
+ return 0;
+}
+
+static void vfio_tph_mask_ext_tph_bit(struct vfio_pci_core_device *vdev,
+ int pos)
+{
+ __le32 *vptr = (__le32 *)&vdev->vconfig[pos + PCI_TPH_CAP];
+ struct pci_dev *pdev = vdev->pdev;
+ u32 val;
+
+ if (!pcie_tph_supported(pdev, true)) {
+ val = le32_to_cpu(*vptr);
+ val &= ~PCI_TPH_CAP_EXT_TPH;
+ *vptr = cpu_to_le32(val);
+ }
+}
+
+static int vfio_find_cap_start(struct vfio_pci_core_device *vdev, int pos);
+static int vfio_tph_config_write(struct vfio_pci_core_device *vdev, int pos,
+ int count, struct perm_bits *perm,
+ int offset, __le32 val)
+{
+ int req_en_byte = PCI_TPH_CTRL + 1;
+ struct pci_dev *pdev = vdev->pdev;
+ __le32 org_val = 0;
+ bool extended;
+ u8 mode, req;
+ int i, ret;
+ u16 start;
+ u32 data;
+
+ if (!vdev->tph_permit)
+ return count;
+
+ down_write(&vdev->memory_lock);
+
+ /* Back up the original value so it can be restored on failure */
+ if (offset <= req_en_byte && offset + count > req_en_byte)
+ vfio_default_config_read(vdev, pos, count, perm, offset,
+ &org_val);
+
+ ret = vfio_default_config_write(vdev, pos, count, perm, offset, val);
+ if (ret != count) {
+ up_write(&vdev->memory_lock);
+ return ret;
+ }
+
+ /* Skip if write range does not cover Requester Enable byte */
+ if (offset > req_en_byte || offset + count <= req_en_byte) {
+ up_write(&vdev->memory_lock);
+ return count;
+ }
+
+ ret = vfio_pci_set_power_state(vdev, PCI_D0);
+ if (ret) {
+ vfio_default_config_write(vdev, pos, count, perm, offset,
+ org_val);
+ up_write(&vdev->memory_lock);
+ return count;
+ }
+
+ start = vfio_find_cap_start(vdev, pos);
+ data = le32_to_cpu(*(__le32 *)&vdev->vconfig[start + PCI_TPH_CTRL]);
+ mode = FIELD_GET(PCI_TPH_CTRL_MODE_SEL_MASK, data);
+ req = FIELD_GET(PCI_TPH_CTRL_REQ_EN_MASK, data);
+
+ if (req == PCI_TPH_REQ_TPH_ONLY || req == PCI_TPH_REQ_EXT_TPH) {
+ extended = !!(req == PCI_TPH_REQ_EXT_TPH);
+ ret = pcie_enable_tph_explicit(pdev, mode, extended);
+ if (!ret && vdev->tph_st_shadow) {
+ for (i = 0; i < vdev->tph_st_entries; i++)
+ pcie_tph_set_st_entry(pdev, i,
+ vdev->tph_st_shadow[i]);
+ }
+ if (ret)
+ vfio_default_config_write(vdev, pos, count, perm,
+ offset, org_val);
+ } else if (req == PCI_TPH_REQ_DISABLE) {
+ pcie_disable_tph(vdev->pdev);
+ } else {
+ vfio_default_config_write(vdev, pos, count, perm, offset,
+ org_val);
+ }
+
+ up_write(&vdev->memory_lock);
+
+ return count;
+}
+
/*
* Initialize the shared permission tables
*/
@@ -1101,6 +1213,7 @@ void vfio_pci_uninit_perm_bits(void)
free_perm_bits(&ecap_perms[PCI_EXT_CAP_ID_ERR]);
free_perm_bits(&ecap_perms[PCI_EXT_CAP_ID_PWR]);
+ free_perm_bits(&ecap_perms[PCI_EXT_CAP_ID_TPH]);
}
int __init vfio_pci_init_perm_bits(void)
@@ -1121,6 +1234,8 @@ int __init vfio_pci_init_perm_bits(void)
/* Extended capabilities */
ret |= init_pci_ext_cap_err_perm(&ecap_perms[PCI_EXT_CAP_ID_ERR]);
ret |= init_pci_ext_cap_pwr_perm(&ecap_perms[PCI_EXT_CAP_ID_PWR]);
+ ret |= init_pci_ext_cap_tph_perm(&ecap_perms[PCI_EXT_CAP_ID_TPH]);
+ ecap_perms[PCI_EXT_CAP_ID_TPH].writefn = vfio_tph_config_write;
ecap_perms[PCI_EXT_CAP_ID_VNDR].writefn = vfio_raw_config_write;
ecap_perms[PCI_EXT_CAP_ID_DVSEC].writefn = vfio_raw_config_write;
@@ -1704,6 +1819,8 @@ static int vfio_ecap_init(struct vfio_pci_core_device *vdev)
ret = vfio_fill_vconfig_bytes(vdev, epos, len);
if (ret)
return ret;
+ if (ecap == PCI_EXT_CAP_ID_TPH && !hidden)
+ vfio_tph_mask_ext_tph_bit(vdev, epos);
/*
* If we're just using this capability to anchor the list,
diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
index 0f602faeaef3..0bf49e36e64b 100644
--- a/drivers/vfio/pci/vfio_pci_core.c
+++ b/drivers/vfio/pci/vfio_pci_core.c
@@ -601,6 +601,8 @@ int vfio_pci_core_enable(struct vfio_pci_core_device *vdev)
vdev->reset_works = !ret;
+ /* Reset TPH status on new user session */
+ pcie_disable_tph(vdev->pdev);
ret = vfio_pci_tph_init(vdev);
if (ret)
goto out_disable_device;
@@ -731,6 +733,8 @@ void vfio_pci_core_disable(struct vfio_pci_core_device *vdev)
kfree(vdev->region);
vdev->region = NULL; /* don't krealloc a freed pointer */
+ /* Reset TPH status on session exit */
+ pcie_disable_tph(vdev->pdev);
vfio_pci_tph_deinit(vdev);
vfio_config_free(vdev);
--
2.17.1
* Re: [PATCH v17 06/12] PCI/TPH: Expose the enabled TPH requester type
2026-06-16 10:46 ` [PATCH v17 06/12] PCI/TPH: Expose the enabled TPH requester type Chengwen Feng
@ 2026-06-16 10:51 ` sashiko-bot
0 siblings, 0 replies; 25+ messages in thread
From: sashiko-bot @ 2026-06-16 10:51 UTC (permalink / raw)
To: Chengwen Feng; +Cc: kvm, linux-pci
> From: Zhiping Zhang <zhipingz@meta.com>
>
> Add pcie_tph_enabled_req_type() so drivers can query the enabled TPH
> requester mode without reaching into pci_dev internals.
>
> Signed-off-by: Zhiping Zhang <zhipingz@meta.com>
> Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
Sashiko has reviewed this patch and found no issues. It looks great!
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260616104621.41915-1-fengchengwen@huawei.com?part=6
* Re: [PATCH v17 07/12] PCI/TPH: Add pcie_tph_supported() helper to check TPH capability attributes
2026-06-16 10:46 ` [PATCH v17 07/12] PCI/TPH: Add pcie_tph_supported() helper to check TPH capability attributes Chengwen Feng
@ 2026-06-16 10:52 ` sashiko-bot
0 siblings, 0 replies; 25+ messages in thread
From: sashiko-bot @ 2026-06-16 10:52 UTC (permalink / raw)
To: Chengwen Feng; +Cc: linux-pci, kvm
> Add new helper pcie_tph_supported() with want_ext parameter:
> - want_ext = false: Check if device has valid TPH capability;
> - want_ext = true: Check hardware Extended TPH support.
>
> This helper is prepared for follow-up VFIO TPH virtualization patches to
> uniformly query basic TPH existence and Extended TPH capability.
>
> Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
Sashiko has reviewed this patch and found no issues. It looks great!
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260616104621.41915-1-fengchengwen@huawei.com?part=7
* Re: [PATCH v17 04/12] PCI/TPH: Refactor pcie_enable_tph & add explicit requester variant
2026-06-16 10:46 ` [PATCH v17 04/12] PCI/TPH: Refactor pcie_enable_tph & add explicit requester variant Chengwen Feng
@ 2026-06-16 10:53 ` sashiko-bot
0 siblings, 0 replies; 25+ messages in thread
From: sashiko-bot @ 2026-06-16 10:53 UTC (permalink / raw)
To: Chengwen Feng; +Cc: linux-pci, kvm
> Refactor pcie_enable_tph implementation: extract core logic into static
> internal enable_tph() helper accepting explicit requester type.
>
> - Preserve original pcie_enable_tph() unchanged as auto wrapper; it
> auto-selects EXT/standard TPH requester per device capability, existing
> bnxt/mlx5 callers require zero modification.
> - Add exported pcie_enable_tph_explicit() with bool 'extended' parameter
> for explicit STD/EXT selection, used by upcoming VFIO TPH support.
>
> Input validation for EXT_TPH availability is retained inside helper to
> reject invalid explicit EXT request if hardware does not support extended
> requester.
>
> Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
Sashiko has reviewed this patch and found no issues. It looks great!
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260616104621.41915-1-fengchengwen@huawei.com?part=4
* Re: [PATCH v17 05/12] PCI/TPH: Refactor pcie_tph_get_cpu_st & add explicit variant
2026-06-16 10:46 ` [PATCH v17 05/12] PCI/TPH: Refactor pcie_tph_get_cpu_st & add explicit variant Chengwen Feng
@ 2026-06-16 10:53 ` sashiko-bot
0 siblings, 0 replies; 25+ messages in thread
From: sashiko-bot @ 2026-06-16 10:53 UTC (permalink / raw)
To: Chengwen Feng; +Cc: kvm, linux-pci
> Refactor pcie_tph_get_cpu_st(): extract core logic into static internal
> get_cpu_st() helper accepting explicit requester type parameter.
>
> - Preserve original pcie_tph_get_cpu_st() unchanged as auto wrapper; it
> uses existing pdev->tph_req_type automatically, existing callers require
> no change.
> - Add exported pcie_tph_get_cpu_st_explicit() with bool 'extended'
> parameter for manual STD/EXT requester selection, consumed by upcoming
> VFIO TPH code.
> - Add capability check: reject explicit EXT request when device does not
> support extended TPH requester.
>
> Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
Sashiko has reviewed this patch and found no issues. It looks great!
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260616104621.41915-1-fengchengwen@huawei.com?part=5
* Re: [PATCH v17 03/12] PCI/TPH: Cache TPH requester capability at probe time
2026-06-16 10:46 ` [PATCH v17 03/12] PCI/TPH: Cache TPH requester capability at probe time Chengwen Feng
@ 2026-06-16 10:55 ` sashiko-bot
0 siblings, 0 replies; 25+ messages in thread
From: sashiko-bot @ 2026-06-16 10:55 UTC (permalink / raw)
To: Chengwen Feng; +Cc: kvm, linux-pci
> Calculate the negotiated TPH requester type from device and root port
> capabilities once in pci_tph_init().
>
> Add tph_ext_support flag to cache whether the device is allowed to
> issue Extended TPH requests after topology negotiation. If the final
> requester type is disabled, clear TPH capability to prevent usage.
>
> Simplify pcie_enable_tph() by using the cached requester capability
> instead of recalculating every time.
>
> Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
Sashiko has reviewed this patch and found no issues. It looks great!
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260616104621.41915-1-fengchengwen@huawei.com?part=3
* Re: [PATCH v17 02/12] PCI/TPH: Fix tph_enabled concurrent update race by bitfield packing
2026-06-16 10:46 ` [PATCH v17 02/12] PCI/TPH: Fix tph_enabled concurrent update race by bitfield packing Chengwen Feng
@ 2026-06-16 10:55 ` sashiko-bot
0 siblings, 0 replies; 25+ messages in thread
From: sashiko-bot @ 2026-06-16 10:55 UTC (permalink / raw)
To: Chengwen Feng; +Cc: linux-pci, kvm
> Split tph_enabled from shared pci_dev bitfield into spare bit of tph_cap's
> u16: tph_cap is immutable post-enumeration (15 bits for offset), remaining
> 1 bit stores tph_enabled. Removes cross-bitfield concurrent write hazards
> highlighted by Sashiko after VFIO TPH exposure. No functional changes.
>
> Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
Sashiko has reviewed this patch and found no issues. It looks great!
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260616104621.41915-1-fengchengwen@huawei.com?part=2
* Re: [PATCH v17 10/12] vfio/pci: Add TPH_ENABLE feature skeleton and unsafe module parameter
2026-06-16 10:46 ` [PATCH v17 10/12] vfio/pci: Add TPH_ENABLE feature skeleton and unsafe module parameter Chengwen Feng
@ 2026-06-16 10:55 ` sashiko-bot
0 siblings, 0 replies; 25+ messages in thread
From: sashiko-bot @ 2026-06-16 10:55 UTC (permalink / raw)
To: Chengwen Feng; +Cc: kvm, linux-pci
> Introduce module param enable_unsafe_tph to gate all TPH related features,
> and add VFIO_DEVICE_FEATURE_TPH_ENABLE uapi together with per-device
> tph_permit flag.
>
> This is a preparatory implementation: only feature framework is added
> for now, actual TPH_CTRL register permission control and steering tag
> features (TPH_CPU_ST / TPH_ST_CONFIG) will be attached in subsequent
> TPH capability virtualization commits.
>
> Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
Sashiko has reviewed this patch and found no issues. It looks great!
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260616104621.41915-1-fengchengwen@huawei.com?part=10
* Re: [PATCH v17 09/12] vfio/pci: Hide TPH capability when TPH is unsupported
2026-06-16 10:46 ` [PATCH v17 09/12] vfio/pci: Hide TPH capability when TPH is unsupported Chengwen Feng
@ 2026-06-16 10:56 ` sashiko-bot
0 siblings, 0 replies; 25+ messages in thread
From: sashiko-bot @ 2026-06-16 10:56 UTC (permalink / raw)
To: Chengwen Feng; +Cc: kvm, linux-pci
Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
Pre-existing issues:
- [Critical] Missing bounds check on dynamically-sized PCIe capabilities allows a malicious device to cause a kernel heap buffer overflow.
--
commit 2b14d080b02da7dcb99b160bf40eeb6f9e38b928
Author: Chengwen Feng <fengchengwen@huawei.com>
vfio/pci: Hide TPH capability when TPH is unsupported
This commit checks the device negotiated TPH support status before parsing
the TPH extended capability. It returns zero length to hide the capability
from userspace if TPH is disabled during topology negotiation.
> diff --git a/drivers/vfio/pci/vfio_pci_config.c b/drivers/vfio/pci/vfio_pci_config.c
> index a10ed733f0e3a..5c6ab172df6c0 100644
> --- a/drivers/vfio/pci/vfio_pci_config.c
> +++ b/drivers/vfio/pci/vfio_pci_config.c
[ ... ]
> @@ -1450,6 +1451,8 @@ static int vfio_ext_cap_len(struct vfio_pci_core_device *vdev, u16 ecap, u16 epo
> byte &= PCI_DPA_CAP_SUBSTATE_MASK;
> return PCI_DPA_BASE_SIZEOF + byte + 1;
> case PCI_EXT_CAP_ID_TPH:
> + if (!pcie_tph_supported(pdev, false))
> + return 0;
> ret = pci_read_config_dword(pdev, epos + PCI_TPH_CAP, &dword);
[Severity: Critical]
This is a pre-existing issue, but does this function safely bound the returned
capability length?
Looking at the rest of the PCI_EXT_CAP_ID_TPH case in vfio_ext_cap_len()
in drivers/vfio/pci/vfio_pci_config.c:
    if ((dword & PCI_TPH_CAP_LOC_MASK) == PCI_TPH_LOC_CAP) {
        int sts;
        sts = dword & PCI_TPH_CAP_ST_MASK;
        sts >>= PCI_TPH_CAP_ST_SHIFT;
        return PCI_TPH_BASE_SIZEOF + (sts * 2) + 2;
    }
Since the ST table size is read directly from the attacker-controlled device
config space, the returned length could be up to 4108 bytes.
Later, in vfio_ecap_init() in the same file, this unbounded length is used
to write into heap buffers:
memset(map + epos, ecap, len);
ret = vfio_fill_vconfig_bytes(vdev, epos, len);
Since the pci_config_map and vconfig arrays are allocated to pdev->cfg_size
(typically 4096 bytes), could a malicious PCIe endpoint or virtual device
provide an oversized TPH capability size to cause a kernel heap buffer
overflow, as there is no check ensuring epos + len fits within pdev->cfg_size?
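The arithmetic behind the 4108-byte figure, and the kind of check that appears to be missing, can be illustrated with a small userspace sketch. The constants are assumed to match include/uapi/linux/pci_regs.h; tph_cap_len() and cap_fits() are hypothetical helpers written for this illustration, not existing VFIO functions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values assumed to match include/uapi/linux/pci_regs.h. */
#define PCI_TPH_BASE_SIZEOF   0xc
#define PCI_TPH_CAP_ST_MASK   0x07FF0000u
#define PCI_TPH_CAP_ST_SHIFT  16

/* Hypothetical helper: same length formula as the quoted TPH case. */
static unsigned int tph_cap_len(uint32_t cap_reg)
{
	unsigned int sts = (cap_reg & PCI_TPH_CAP_ST_MASK) >> PCI_TPH_CAP_ST_SHIFT;

	return PCI_TPH_BASE_SIZEOF + (sts * 2) + 2;
}

/* Hypothetical check: does [epos, epos + len) fit inside config space? */
static bool cap_fits(unsigned int epos, unsigned int len, unsigned int cfg_size)
{
	/* Written as a subtraction so that epos + len cannot wrap. */
	return epos <= cfg_size && len <= cfg_size - epos;
}
```

With the ST field at its maximum of 0x7FF, tph_cap_len() yields 12 + 2047 * 2 + 2 = 4108, which cap_fits() rejects for any offset in a 4096-byte config space. Whether an oversized capability should be hidden or truncated is a policy choice this sketch does not settle.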
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260616104621.41915-1-fengchengwen@huawei.com?part=9
* Re: [PATCH v17 01/12] PCI/TPH: Fix pcie_tph_get_st_table_loc() field extraction
2026-06-16 10:46 ` [PATCH v17 01/12] PCI/TPH: Fix pcie_tph_get_st_table_loc() field extraction Chengwen Feng
@ 2026-06-16 11:00 ` sashiko-bot
0 siblings, 0 replies; 25+ messages in thread
From: sashiko-bot @ 2026-06-16 11:00 UTC (permalink / raw)
To: Chengwen Feng; +Cc: kvm, linux-pci
Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
Pre-existing issues:
- [High] Surprise removal (PCI disconnect) or a malicious PCI endpoint can trigger a heap buffer overflow in `pci_save_tph_state()` and an OOB read in `pci_restore_tph_state()` due to a TOCTOU evaluation of the ST table size.
--
commit fe1abb4bbf50a2222b456026a8b0caa06ea99438
Author: Chengwen Feng <fengchengwen@huawei.com>
PCI/TPH: Fix pcie_tph_get_st_table_loc() field extraction
pcie_tph_get_st_table_loc() incorrectly uses FIELD_GET(), which shifts the
field value to bit 0. But the function is designed to return raw
PCI_TPH_LOC_* values as defined in the function comment.
> diff --git a/drivers/pci/tph.c b/drivers/pci/tph.c
> index 91145e8d9d951..bef3a55539c46 100644
> --- a/drivers/pci/tph.c
> +++ b/drivers/pci/tph.c
[ ... ]
> @@ -185,9 +188,6 @@ u16 pcie_tph_get_st_table_size(struct pci_dev *pdev)
>
> /* Check ST table location first */
> loc = pcie_tph_get_st_table_loc(pdev);
> -
> - /* Convert loc to match with PCI_TPH_LOC_* defined in pci_regs.h */
> - loc = FIELD_PREP(PCI_TPH_CAP_LOC_MASK, loc);
> if (loc != PCI_TPH_LOC_CAP)
> return 0;
[Severity: High]
This is a pre-existing issue, but does this function have a TOCTOU race
that could cause a buffer overflow during a device disconnect?
pcie_tph_get_st_table_loc() indirectly reads the PCI_TPH_CAP register.
Shortly after, pcie_tph_get_st_table_size() reads it again:
drivers/pci/tph.c:pcie_tph_get_st_table_size() {
    ...
    loc = pcie_tph_get_st_table_loc(pdev);
    if (loc != PCI_TPH_LOC_CAP)
        return 0;
    pci_read_config_dword(pdev, pdev->tph_cap + PCI_TPH_CAP, &reg);
    return FIELD_GET(PCI_TPH_CAP_ST_MASK, reg) + 1;
}
If a device disconnects exactly between these two reads, or if a malicious
endpoint provides PCI_TPH_LOC_CAP for the first read but ~0 for the second
read, the FIELD_GET() calculation evaluates to 2048.
Since pci_save_tph_state() iterates based on this dynamically queried size,
could it write up to 4096 bytes of disconnect error data into the
save_state->cap.data buffer, overflowing it?
drivers/pci/tph.c:pci_save_tph_state() {
...
num_entries = pcie_tph_get_st_table_size(pdev);
for (i = 0; i < num_entries; i++) {
pci_read_config_word(pdev, pdev->tph_cap + offset,
st_entry++);
...
}
Similarly, could pci_restore_tph_state() be vulnerable to an out-of-bounds
heap read if the table size evaluates to 2048 on disconnect?
drivers/pci/tph.c:pci_restore_tph_state() {
...
num_entries = pcie_tph_get_st_table_size(pdev);
for (i = 0; i < num_entries; i++) {
pci_write_config_word(pdev, pdev->tph_cap + offset,
*st_entry++);
...
}
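One possible shape for a fix, sketched in userspace C: derive both the table location and the table size from a single register snapshot, and never return more entries than the save buffer was sized for. The mask values are assumed to match pci_regs.h; st_entries_bounded() and its buf_entries parameter are hypothetical names used only for this illustration.

```c
#include <stdint.h>

/* Values assumed to match include/uapi/linux/pci_regs.h. */
#define PCI_TPH_CAP_LOC_MASK  0x00000600u
#define PCI_TPH_LOC_CAP       0x00000200u
#define PCI_TPH_CAP_ST_MASK   0x07FF0000u
#define PCI_TPH_CAP_ST_SHIFT  16

/*
 * Hypothetical helper: location and size come from ONE snapshot of the
 * capability register, and the result is clamped to the number of entries
 * the save buffer was allocated for.
 */
static unsigned int st_entries_bounded(uint32_t cap_reg, unsigned int buf_entries)
{
	unsigned int n;

	if (cap_reg == 0xFFFFFFFFu)	/* all-ones: device likely gone */
		return 0;
	if ((cap_reg & PCI_TPH_CAP_LOC_MASK) != PCI_TPH_LOC_CAP)
		return 0;

	n = ((cap_reg & PCI_TPH_CAP_ST_MASK) >> PCI_TPH_CAP_ST_SHIFT) + 1;
	return n < buf_entries ? n : buf_entries;
}
```

For reference, an all-ones read passed through the unclamped formula gives 0x7FF + 1 = 2048 entries, or 2048 * 2 = 4096 bytes of 16-bit ST entries, which is where the figures above come from.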
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260616104621.41915-1-fengchengwen@huawei.com?part=1
* Re: [PATCH v17 08/12] PCI/TPH: Add sysfs binary file to export CPU to steering-tag mapping
2026-06-16 10:46 ` [PATCH v17 08/12] PCI/TPH: Add sysfs binary file to export CPU to steering-tag mapping Chengwen Feng
@ 2026-06-16 11:00 ` sashiko-bot
0 siblings, 0 replies; 25+ messages in thread
From: sashiko-bot @ 2026-06-16 11:00 UTC (permalink / raw)
To: Chengwen Feng; +Cc: linux-pci, kvm
Thank you for your contribution! Sashiko AI review found 4 potential issue(s) to consider:
New issues:
- [Medium] sysfs binary read callback incorrectly returns 0 (EOF) for unaligned offsets, breaking partial reads.
- [Medium] Returning -ENODEV for impossible CPUs aborts sequential file reads on sparse CPU topologies.
Pre-existing issues:
- [High] Missing length validation when parsing ACPI _DSM output buffer leads to an out-of-bounds read.
- [High] ACPI _DSM buffer parsing uses an unaligned 64-bit cast and C bitfields, breaking on strict-alignment and big-endian architectures.
--
commit 3d85ff02b28aea880afdba0d09e73076f825fdb4
Author: Chengwen Feng <fengchengwen@huawei.com>
PCI/TPH: Add sysfs binary file to export CPU to steering-tag mapping
Add per-device sysfs binary attribute tph_cpu_st to expose ACPI DSM CPU
to steering-tag data to userspace, resolving the concern that VFIO should
not host CPU-to-ST translation interfaces.
> diff --git a/drivers/pci/tph.c b/drivers/pci/tph.c
> index 95280aab4fb51..aca5093e81525 100644
> --- a/drivers/pci/tph.c
> +++ b/drivers/pci/tph.c
> @@ -130,6 +130,29 @@ static acpi_status tph_invoke_dsm(acpi_handle handle, u32 cpu_uid,
>
> return AE_OK;
> }
> +
> +static int tph_get_cpu_st_info(struct pci_dev *pdev, unsigned int cpu,
> + union st_info *info)
> +{
> + acpi_handle rp_acpi_handle;
> + struct pci_dev *rp;
> + u32 cpu_uid;
> + int ret;
> +
> + ret = acpi_get_cpu_uid(cpu, &cpu_uid);
> + if (ret != 0)
> + return ret;
> +
> + rp = pcie_find_root_port(pdev);
> + if (!rp || !rp->bus || !rp->bus->bridge)
> + return -ENODEV;
> +
> + rp_acpi_handle = ACPI_HANDLE(rp->bus->bridge);
> + if (tph_invoke_dsm(rp_acpi_handle, cpu_uid, info) != AE_OK)
[Severity: High]
This isn't a bug introduced by this patch, but tph_invoke_dsm() appears to
have an out-of-bounds read. When parsing the ACPI _DSM output, it
unconditionally dereferences the buffer pointer without validating the
length:
tph_invoke_dsm() {
    ...
    if (out_obj->type != ACPI_TYPE_BUFFER) {
        ACPI_FREE(out_obj);
        return AE_ERROR;
    }
    st_out->value = *((u64 *)(out_obj->buffer.pointer));
    ...
}
If a buggy or malicious ACPI firmware returns a buffer smaller than 8 bytes,
could this lead to an out-of-bounds memory read?
This is also a pre-existing issue, but does casting the byte buffer to a u64*
and dereferencing it break on strict-alignment architectures?
Additionally, union st_info uses architecture-dependent C bitfields to
extract data from this 64-bit value. Could this extract incorrect bits on
big-endian architectures, resulting in corruption of the hardware steering
tags?
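A minimal sketch of a parse that avoids all three concerns (short buffer, unaligned access, bitfield layout), assuming the _DSM result is a little-endian 64-bit value. dsm_buf_to_u64() and field() are hypothetical helpers, and the bit positions used in the usage note are purely illustrative, not taken from the _DSM definition.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: reject short buffers, then assemble the value byte
 * by byte so the result depends on neither host alignment nor endianness.
 */
static int dsm_buf_to_u64(const uint8_t *buf, size_t len, uint64_t *out)
{
	uint64_t v = 0;
	size_t i;

	if (!buf || len < sizeof(uint64_t))
		return -1;

	for (i = 0; i < sizeof(uint64_t); i++)
		v |= (uint64_t)buf[i] << (8 * i);

	*out = v;
	return 0;
}

/* Extract `width` bits starting at bit `shift`; width must be 1..63. */
static uint64_t field(uint64_t v, unsigned int shift, unsigned int width)
{
	return (v >> shift) & ((UINT64_C(1) << width) - 1);
}
```

For example, with a buffer starting 0x01 0xAB, field(v, 0, 1) gives 1 and field(v, 8, 8) gives 0xAB on any host. In the kernel, get_unaligned_le64() would likely replace the open-coded loop, and explicit masks would replace the bitfields in union st_info.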
[ ... ]
> @@ -619,3 +647,58 @@ bool pcie_tph_supported(struct pci_dev *pdev, bool want_ext)
> return pdev->tph_ext_support;
> }
> EXPORT_SYMBOL(pcie_tph_supported);
> +
> +static ssize_t tph_cpu_st_read(struct file *filp, struct kobject *kobj,
> + const struct bin_attribute *bin_attr, char *buf,
> + loff_t off, size_t count)
> +{
> + struct pci_dev *pdev = to_pci_dev(kobj_to_dev(kobj));
> + size_t entry_sz = PCI_TPH_CPU_ST_ENTRY_SZ;
> + struct pci_tph_cpu_st st;
> + unsigned int target_cpu;
> + size_t copy_len;
> +
> + if (off >= nr_cpu_ids * entry_sz || off % entry_sz != 0)
> + return 0;
[Severity: Medium]
Does returning 0 here prematurely signal an End-Of-File (EOF) for unaligned
reads? If a userspace application attempts a short read or reads at an
unaligned offset, should this reject the request with -EINVAL or properly
serve the unaligned data instead of truncating the stream?
> +
> + target_cpu = off / entry_sz;
> + if (!cpu_possible(target_cpu))
> + return -ENODEV;
[Severity: Medium]
Could returning -ENODEV here abort sequential file reads on systems with
sparse CPU topologies? If sequential readers (e.g., using cat or hexdump)
hit this error at the first missing CPU, they might fail to retrieve valid
steering tags for any CPUs located after the topology gap. Would it be
better to skip the missing CPU by returning a zeroed entry instead?
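Both read-path concerns above could be handled by treating the attribute as a flat byte array of nr_cpus * entry_sz bytes. The sketch below is a userspace model of that idea: it serves any offset and length, and reports missing CPUs as zeroed entries. ENTRY_SZ, table_read(), fill_entry_fn and demo_fill() are hypothetical names; the real entry size and lookup are whatever PCI_TPH_CPU_ST_ENTRY_SZ and get_cpu_all_st() define.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ENTRY_SZ 4	/* hypothetical; stands in for PCI_TPH_CPU_ST_ENTRY_SZ */

/* Hypothetical callback: fill one entry, return 0 if the CPU does not exist. */
typedef int (*fill_entry_fn)(unsigned int cpu, uint8_t entry[ENTRY_SZ]);

/* Demo callback: CPU 1 is treated as missing; others get bytes 0x10 + cpu. */
static int demo_fill(unsigned int cpu, uint8_t entry[ENTRY_SZ])
{
	if (cpu == 1)
		return 0;
	memset(entry, (int)(0x10 + cpu), ENTRY_SZ);
	return 1;
}

static long table_read(uint8_t *buf, size_t off, size_t count,
		       unsigned int nr_cpus, fill_entry_fn fill)
{
	size_t total = (size_t)nr_cpus * ENTRY_SZ;
	size_t done = 0;

	if (off >= total)
		return 0;			/* genuine end of file */
	if (count > total - off)
		count = total - off;

	while (done < count) {
		uint8_t entry[ENTRY_SZ] = { 0 };
		size_t pos = off + done;
		size_t in_entry = pos % ENTRY_SZ;	/* offset inside the entry */
		size_t chunk = ENTRY_SZ - in_entry;

		if (chunk > count - done)
			chunk = count - done;
		if (!fill((unsigned int)(pos / ENTRY_SZ), entry))
			memset(entry, 0, sizeof(entry));	/* hole reads as zeros */
		memcpy(buf + done, entry + in_entry, chunk);
		done += chunk;
	}
	return (long)done;
}
```

Reading 8 bytes at offset 2 from a three-CPU table then returns the tail of CPU 0's entry, a zeroed entry for the missing CPU 1, and the head of CPU 2's entry. Returning -EINVAL for unaligned offsets would also be a consistent choice; the sketch only shows the permissive variant.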
> +
> + get_cpu_all_st(pdev, target_cpu, &st);
> +
> + copy_len = min_t(size_t, entry_sz, count);
> + memcpy(buf, &st, copy_len);
> +
> + return copy_len;
> +}
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260616104621.41915-1-fengchengwen@huawei.com?part=8
* Re: [PATCH v17 12/12] vfio/pci: Virtualize PCIe TPH capability registers
2026-06-16 10:46 ` [PATCH v17 12/12] vfio/pci: Virtualize PCIe TPH capability registers Chengwen Feng
@ 2026-06-16 11:03 ` sashiko-bot
0 siblings, 0 replies; 25+ messages in thread
From: sashiko-bot @ 2026-06-16 11:03 UTC (permalink / raw)
To: Chengwen Feng; +Cc: linux-pci, kvm
Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
- [High] Silent state desynchronization between hardware TPH and virtual config due to missing error handling for ST entry updates.
--
commit 22993c676c19c404abf1528141d0dd0e78a51f99
Author: Chengwen Feng <fengchengwen@huawei.com>
vfio/pci: Virtualize PCIe TPH capability registers
This commit virtualizes TPH extended capability config space registers.
It blocks direct ST-table programming via config space, adds vconfig
masking to hide EXT_TPH, and syncs the shadow ST table to physical hardware
when the requester is enabled.
> diff --git a/drivers/vfio/pci/vfio_pci_config.c b/drivers/vfio/pci/vfio_pci_config.c
> --- a/drivers/vfio/pci/vfio_pci_config.c
> +++ b/drivers/vfio/pci/vfio_pci_config.c
[ ... ]
> +static int vfio_tph_config_write(struct vfio_pci_core_device *vdev, int pos,
> + int count, struct perm_bits *perm,
> + int offset, __le32 val)
> +{
[ ... ]
> + if (req == PCI_TPH_REQ_TPH_ONLY || req == PCI_TPH_REQ_EXT_TPH) {
> + extended = !!(req == PCI_TPH_REQ_EXT_TPH);
> + ret = pcie_enable_tph_explicit(pdev, mode, extended);
> + if (!ret && vdev->tph_st_shadow) {
> + for (i = 0; i < vdev->tph_st_entries; i++)
> + pcie_tph_set_st_entry(pdev, i,
> + vdev->tph_st_shadow[i]);
> + }
[Severity: High]
If pcie_tph_set_st_entry() fails (for instance, due to missing MSI-X
allocation), does it leave the hardware and virtual configuration out of
sync?
The loop ignores the return value from pcie_tph_set_st_entry(), meaning
ret will remain 0 even if the ST entry update fails. Since a failure here
can forcefully disable TPH in the physical hardware via pcie_disable_tph(),
the subsequent if (ret) config rollback will be skipped. This would leave
the guest OS believing TPH is enabled while the physical hardware is actually
disabled.
Additionally, in the VFIO_DEVICE_FEATURE_TPH_ST_CONFIG ioctl handler
(vfio_pci_core_feature_tph_st_config), the rollback loop tries to restore
entries using pcie_tph_set_st_entry(). If the initial failure unconditionally
disabled TPH on the physical device, will these rollback calls fail
immediately, leaving the recovery broken?
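A sketch of the error propagation being asked for, modeled in userspace with a fake device. struct fake_hw, fake_set_st_entry() and enable_and_sync() are hypothetical stand-ins; they only mimic the control flow around pcie_enable_tph_explicit() and pcie_tph_set_st_entry(), not their real behavior.

```c
#include <stdint.h>

#define FAKE_ST_MAX 8

/* Hypothetical device model: writing entry `fail_at` fails (never if < 0). */
struct fake_hw {
	uint16_t st[FAKE_ST_MAX];
	int enabled;
	int fail_at;
};

static int fake_set_st_entry(struct fake_hw *hw, unsigned int idx, uint16_t tag)
{
	if (idx >= FAKE_ST_MAX || (int)idx == hw->fail_at)
		return -1;
	hw->st[idx] = tag;
	return 0;
}

/*
 * Enable, replay the shadow table, and undo the enable if any entry write
 * fails, so the caller sees a non-zero return and can roll back the
 * virtual config in the same way it does for an enable failure.
 */
static int enable_and_sync(struct fake_hw *hw, const uint16_t *shadow,
			   unsigned int n)
{
	unsigned int i;
	int ret;

	hw->enabled = 1;
	for (i = 0; i < n; i++) {
		ret = fake_set_st_entry(hw, i, shadow[i]);
		if (ret) {
			hw->enabled = 0;
			return ret;
		}
	}
	return 0;
}
```

The point of the sketch is only that ret carries the first entry-write failure out of the loop; how the ioctl rollback path should behave once TPH has been disabled underneath it remains an open question.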
> + if (ret)
> + vfio_default_config_write(vdev, pos, count, perm,
> + offset, org_val);
> + } else if (req == PCI_TPH_REQ_DISABLE) {
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260616104621.41915-1-fengchengwen@huawei.com?part=12
* Re: [PATCH v17 11/12] vfio/pci: Add TPH_ST_CONFIG for PCIe TPH ST configuration
2026-06-16 10:46 ` [PATCH v17 11/12] vfio/pci: Add TPH_ST_CONFIG for PCIe TPH ST configuration Chengwen Feng
@ 2026-06-16 11:05 ` sashiko-bot
0 siblings, 0 replies; 25+ messages in thread
From: sashiko-bot @ 2026-06-16 11:05 UTC (permalink / raw)
To: Chengwen Feng; +Cc: linux-pci, kvm
Thank you for your contribution! Sashiko AI review found 5 potential issue(s) to consider:
- [High] Improper PM Runtime handling and missing dmabuf restoration in TPH ST configuration ioctl.
- [Medium] Missing __GFP_ACCOUNT in tph_st_shadow allocation.
- [Medium] The `tph_permit` flag is leaked across VFIO user sessions.
- [Medium] Feature PROBE for `TPH_ST_CONFIG` is broken due to an early dynamic permission check.
- [Medium] The TPH ST shadow table is not cleared when the VFIO device is reset, leading to stale data restoration.
--
commit b8ec8f816e02c6def8f7cc759648796f130131e0
Author: Chengwen Feng <fengchengwen@huawei.com>
vfio/pci: Add TPH_ST_CONFIG for PCIe TPH ST configuration
This commit adds a new VFIO device feature VFIO_DEVICE_FEATURE_TPH_ST_CONFIG
to allow userspace to configure PCIe TPH Steering Tag table entries. It
implements a shadow ST table to cache entries, paired with a per-device mutex
for concurrent access protection.
> diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
> index ae05343701f76..cf8f6de025c9a 100644
> --- a/drivers/vfio/pci/vfio_pci_core.c
> +++ b/drivers/vfio/pci/vfio_pci_core.c
[ ... ]
> @@ -495,6 +496,50 @@ static const struct dev_pm_ops vfio_pci_core_pm_ops = {
[ ... ]
> +static int vfio_pci_tph_init(struct vfio_pci_core_device *vdev)
> +{
> + vdev->tph_st_entries = 0;
> + vdev->tph_st_shadow = NULL;
> +
> + if (!enable_unsafe_tph)
> + return 0;
> +
> + vdev->tph_st_entries = vfio_pci_tph_st_shadow_size(vdev);
> + if (vdev->tph_st_entries) {
> + vdev->tph_st_shadow = kcalloc(vdev->tph_st_entries, sizeof(u16),
> + GFP_KERNEL);
[Severity: Medium]
Since this per-device allocation is triggered by a userspace container
session, should it use GFP_KERNEL_ACCOUNT instead of GFP_KERNEL? Omitting
the account flag might allow a container to allocate unaccounted kernel
memory and bypass its memory cgroup limits.
> + if (!vdev->tph_st_shadow)
> + return -ENOMEM;
> + }
> +
> + return 0;
> +}
> +
> +static void vfio_pci_tph_deinit(struct vfio_pci_core_device *vdev)
> +{
> + kfree(vdev->tph_st_shadow);
> + vdev->tph_st_shadow = NULL;
> + vdev->tph_st_entries = 0;
[Severity: Medium]
Should vdev->tph_permit be reset to 0 here during session exit?
Because vdev persists across open/close cycles until unbound from the driver,
a subsequent user session might inherit the tph_permit == 1 state,
bypassing the API requirement to explicitly enable TPH.
> +}
[ ... ]
> @@ -1535,6 +1587,74 @@ static int vfio_pci_core_feature_tph_enable(struct vfio_pci_core_device *vdev,
[ ... ]
> +static int vfio_pci_core_feature_tph_st_config(
> + struct vfio_pci_core_device *vdev,
> + u32 flags,
> + struct vfio_device_feature_tph_st_config __user *arg,
> + size_t argsz)
> +{
> + struct vfio_device_feature_tph_st_config config;
> + struct pci_dev *pdev = vdev->pdev;
> + void __user *uptr;
> + int i, idx, ret;
> + size_t sz;
> + u16 *sts;
> +
> + if (!vdev->tph_permit || !vdev->tph_st_shadow)
> + return -EOPNOTSUPP;
[Severity: Medium]
Will this early check break feature discovery?
VFIO allows discovering supported features using the
VFIO_DEVICE_FEATURE_PROBE flag. By checking !vdev->tph_permit before
calling vfio_check_feature(), any early capability PROBE request returns
-EOPNOTSUPP. This might make userspace incorrectly assume the kernel does
not support configuring the feature.
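The ordering in question can be modeled with a small sketch in which the flag check runs first and the runtime permission check runs only for a real SET. F_SET, F_PROBE, check_feature() and st_config() are hypothetical simplifications for this illustration; check_feature() only imitates the "0 = probe answered, 1 = continue, negative = error" convention implied by the quoted `if (ret <= 0) return ret;`.

```c
#include <errno.h>

#define F_SET    (1u << 0)	/* hypothetical flag bits for this sketch */
#define F_PROBE  (1u << 1)

/* Simplified model: 0 = probe answered, 1 = continue, <0 = error. */
static int check_feature(unsigned int flags, unsigned int supported_ops)
{
	unsigned int ops = flags & ~F_PROBE;

	if (ops & ~supported_ops)
		return -EINVAL;
	if (flags & F_PROBE)
		return 0;
	return 1;
}

static int st_config(unsigned int flags, int permitted)
{
	int ret = check_feature(flags, F_SET);

	if (ret <= 0)
		return ret;	/* PROBE is answered before any runtime state is consulted */
	if (!permitted)
		return -EOPNOTSUPP;
	return 0;		/* the actual SET work would go here */
}
```

With this ordering, st_config(F_PROBE | F_SET, 0) returns 0 while st_config(F_SET, 0) still returns -EOPNOTSUPP. A static "device has no ST table" condition could arguably still be checked before the probe; only the dynamic tph_permit test is moved here.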
> +
> + ret = vfio_check_feature(flags, argsz, VFIO_DEVICE_FEATURE_SET,
> + sizeof(config));
> + if (ret <= 0)
> + return ret;
[ ... ]
> + uptr = u64_to_user_ptr(config.data_uptr);
> + sts = memdup_array_user(uptr, config.count, sizeof(u16));
> + sz = config.count * sizeof(u16);
> + if (IS_ERR(sts))
> + return PTR_ERR(sts);
> +
> + down_write(&vdev->memory_lock);
> + ret = vfio_pci_set_power_state(vdev, PCI_D0);
[Severity: High]
Does this bypass the driver core's power management and miss dmabuf
restoration?
If the parent bridge is in a suspended state (e.g., D3cold), calling
vfio_pci_set_power_state() without first calling pm_runtime_resume_and_get()
can lead to bus errors (Unsupported Request) or system crashes.
Additionally, if the device was previously in a suspended state, does this
path also need to call vfio_pci_dma_buf_move(vdev, false) to restore dmabufs
and prevent permanently breaking dmabuf tracking?
> + if (ret)
> + goto out_unlock_memory;
> +
> + if (pcie_tph_enabled_req_type(pdev) == PCI_TPH_REQ_DISABLE)
> + goto update_shadow;
[ ... ]
> diff --git a/include/linux/vfio_pci_core.h b/include/linux/vfio_pci_core.h
> index 0b411d2136ed5..d5c6f73747ad9 100644
> --- a/include/linux/vfio_pci_core.h
> +++ b/include/linux/vfio_pci_core.h
> @@ -143,6 +143,8 @@ struct vfio_pci_core_device {
> struct notifier_block nb;
> struct rw_semaphore memory_lock;
> struct list_head dmabufs;
> + u16 *tph_st_shadow;
[Severity: Medium]
When a VFIO device is reset (e.g., via the VFIO_DEVICE_RESET ioctl or FLR),
its hardware ST table typically reverts to its default (zeroed) state.
Should the reset path (vfio_pci_ioctl_reset()) clear vdev->tph_st_shadow?
If not, a subsequent partial ST_CONFIG write or TPH enablement via config
space might incorrectly program the hardware with stale, pre-reset ST values
cached in the shadow table.
> + u16 tph_st_entries;
> };
>
> enum vfio_pci_io_width {
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260616104621.41915-1-fengchengwen@huawei.com?part=11
end of thread, other threads:[~2026-06-16 11:05 UTC | newest]
Thread overview: 25+ messages
2026-06-16 10:46 [PATCH v17 00/12] vfio/pci: Add PCIe TPH support Chengwen Feng
2026-06-16 10:46 ` [PATCH v17 01/12] PCI/TPH: Fix pcie_tph_get_st_table_loc() field extraction Chengwen Feng
2026-06-16 11:00 ` sashiko-bot
2026-06-16 10:46 ` [PATCH v17 02/12] PCI/TPH: Fix tph_enabled concurrent update race by bitfield packing Chengwen Feng
2026-06-16 10:55 ` sashiko-bot
2026-06-16 10:46 ` [PATCH v17 03/12] PCI/TPH: Cache TPH requester capability at probe time Chengwen Feng
2026-06-16 10:55 ` sashiko-bot
2026-06-16 10:46 ` [PATCH v17 04/12] PCI/TPH: Refactor pcie_enable_tph & add explicit requester variant Chengwen Feng
2026-06-16 10:53 ` sashiko-bot
2026-06-16 10:46 ` [PATCH v17 05/12] PCI/TPH: Refactor pcie_tph_get_cpu_st & add explicit variant Chengwen Feng
2026-06-16 10:53 ` sashiko-bot
2026-06-16 10:46 ` [PATCH v17 06/12] PCI/TPH: Expose the enabled TPH requester type Chengwen Feng
2026-06-16 10:51 ` sashiko-bot
2026-06-16 10:46 ` [PATCH v17 07/12] PCI/TPH: Add pcie_tph_supported() helper to check TPH capability attributes Chengwen Feng
2026-06-16 10:52 ` sashiko-bot
2026-06-16 10:46 ` [PATCH v17 08/12] PCI/TPH: Add sysfs binary file to export CPU to steering-tag mapping Chengwen Feng
2026-06-16 11:00 ` sashiko-bot
2026-06-16 10:46 ` [PATCH v17 09/12] vfio/pci: Hide TPH capability when TPH is unsupported Chengwen Feng
2026-06-16 10:56 ` sashiko-bot
2026-06-16 10:46 ` [PATCH v17 10/12] vfio/pci: Add TPH_ENABLE feature skeleton and unsafe module parameter Chengwen Feng
2026-06-16 10:55 ` sashiko-bot
2026-06-16 10:46 ` [PATCH v17 11/12] vfio/pci: Add TPH_ST_CONFIG for PCIe TPH ST configuration Chengwen Feng
2026-06-16 11:05 ` sashiko-bot
2026-06-16 10:46 ` [PATCH v17 12/12] vfio/pci: Virtualize PCIe TPH capability registers Chengwen Feng
2026-06-16 11:03 ` sashiko-bot