* [PATCH RESEND v18 00/12] vfio/pci: Add PCIe TPH support
@ 2026-06-23 8:50 Chengwen Feng
2026-06-23 8:50 ` [PATCH RESEND v18 01/12] PCI/TPH: Fix pcie_tph_get_st_table_loc() field extraction Chengwen Feng
` (11 more replies)
0 siblings, 12 replies; 25+ messages in thread
From: Chengwen Feng @ 2026-06-23 8:50 UTC (permalink / raw)
To: alex, jgg, helgaas
Cc: wathsala.vithanage, wei.huang2, zhipingz, wangzhou1, wangyushan12,
liuyonglong, kvm, linux-pci
This patchset enables fully userspace-configurable PCIe TPH support for
VFIO, which brings performance benefits for userspace workloads such as
DPDK and SPDK.
Currently VFIO only exposes read-only TPH capability registers to
userspace, while all write operations are silently discarded. This
prevents userspace from enabling and configuring TPH, limiting performance
optimization opportunities.
Per PCIe spec 7.5.3.15: TPH Completer support is applicable to Root Ports
and Endpoints, allowing Steering Tags to target host CPUs or peer devices
for P2P transactions.
TPH usage model can be divided into three fundamental parts:
1. Retrieve Steering Tag:
- Tags targeting host CPUs are obtained via platform methods (ACPI _DSM)
wrapped in pcie_tph_get_cpu_st(). Userspace requires a generic
interface to query these CPU-associated ST values.
- Tags targeting peer devices are managed by userspace drivers.
2. Program Steering Tag table:
- For devices with standard ST table structures (in capability space or
MSI-X table), userspace needs a unified interface to configure ST
entries.
- Devices without standard ST tables are handled by userspace itself.
3. Toggle device TPH Requester enable/disable state.
To support the above scenarios, this series extends PCI and VFIO with
complete TPH virtualization features:
- [*PCI*] Add a sysfs binary file [under Root Ports that support TPH
Completer and _DSM] that exports the CPU to steering-tag mapping, so that
userspace can retrieve a CPU's ST by reading it.
- [*VFIO*] New device feature TPH_ST_CONFIG: Batch configure interface for
device ST table entries, with shadow cache and atomic rollback support.
- [*VFIO*] Full TPH capability register virtualization: allow userspace to
toggle TPH Requester state via TPH_CTRL register writes.
To guarantee isolation and security, this patchset adopts a two-level
safety gate design with careful ABI considerations:
1. Global unsafe gate:
TPH caching behavior may cross isolation domains and impact shared
platform resources. A new module parameter `enable_unsafe_tph` is
introduced (default off) to globally gate all VFIO TPH functionalities.
2. Per-device opt-in gate:
To preserve strict ABI compatibility and avoid unexpected hardware
state changes for existing users, a new VFIO device feature TPH_ENABLE
is added. TPH capabilities are only available after userspace explicitly
enables it per-device.
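The two gates described above compose as a plain conjunction. A minimal sketch, assuming a hypothetical reduced device structure (`vfio_tph_dev`) and helper (`tph_usable`); only the `enable_unsafe_tph` parameter, the `tph_permit` name, and the TPH_ENABLE feature come from this series, and the real code may be organized differently:

```c
#include <stdbool.h>

/* Hypothetical per-device state; not the series' actual structure. */
struct vfio_tph_dev {
	bool hw_tph_cap;   /* device exposes a usable TPH capability */
	bool tph_permit;   /* assumed: set once userspace issued TPH_ENABLE */
};

/* Stand-in for the enable_unsafe_tph module parameter (default off). */
static bool enable_unsafe_tph;

/* TPH features are reachable only when both gates are open. */
static bool tph_usable(const struct vfio_tph_dev *vdev)
{
	if (!enable_unsafe_tph)      /* gate 1: global, admin-controlled */
		return false;
	if (!vdev->tph_permit)       /* gate 2: per-device opt-in */
		return false;
	return vdev->hw_tph_cap;
}
```

With this shape, an existing user that never issues TPH_ENABLE sees the same behavior as before, even when the administrator has set the module parameter.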
The kernel PCI TPH implementation requires the TPH Requester to be enabled
before ST entries are programmed. To let userspace configure the ST table
in arbitrary order, a shadow ST table is introduced to buffer ST writes
made before TPH is enabled. All cached entries are flushed to hardware when
the TPH Requester turns on. This also provides atomic batch rollback
capability for reliable configuration.
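The shadow-table behavior can be modeled in a few lines. This is an illustrative userspace sketch only: `st_model`, `st_config_batch`, `st_enable`, and the table size are hypothetical names and values, and the `hw[]` array stands in for the device ST table rather than real config-space writes:

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define ST_ENTRIES 8   /* illustrative table size */

/* Hypothetical model: hw[] stands in for the device ST table. */
struct st_model {
	bool     enabled;             /* TPH Requester enable state */
	uint16_t shadow[ST_ENTRIES];  /* values requested by userspace */
	uint16_t hw[ST_ENTRIES];      /* values actually programmed */
};

/* Apply a batch of (index, tag) pairs; all or nothing. */
static int st_config_batch(struct st_model *m, const uint16_t *idx,
			   const uint16_t *tag, size_t n)
{
	uint16_t saved[ST_ENTRIES];
	size_t i;

	memcpy(saved, m->shadow, sizeof(saved));
	for (i = 0; i < n; i++) {
		if (idx[i] >= ST_ENTRIES) {
			/* Roll back every entry touched so far. */
			memcpy(m->shadow, saved, sizeof(saved));
			return -1;
		}
		m->shadow[idx[i]] = tag[i];
	}
	/* Hardware is written only while the requester is enabled. */
	if (m->enabled)
		memcpy(m->hw, m->shadow, sizeof(m->hw));
	return 0;
}

/* Enabling the requester flushes the buffered entries. */
static void st_enable(struct st_model *m)
{
	m->enabled = true;
	memcpy(m->hw, m->shadow, sizeof(m->hw));
}
```

A batch that contains one invalid index leaves the shadow table exactly as it was, which is the rollback property the cover letter refers to.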
The patchset is split into two logical parts: the first eight patches fix
and refactor core PCI/TPH kernel code to export the required helper
interfaces and the CPU to ST mapping; the remaining four patches implement
the corresponding VFIO TPH virtualization layer step by step.
Based on earlier RFC work by Wathsala Vithanage
---
v18-resend:
- Only fix the patch apply failure
v18:
- Address all comments of [08/12] commit from Alex
- Add document in sysfs-bus-pci
- Place the new field at the root port
- Add the new field only when the root port supports TPH completer and the
_DSM method
- Support random offset read, return zero if cpu is offline
- Zero-initialize the buffer instead of using memset and reserved = 0
- Fix git am fail for [10/12] commit
- Fix the following Sashiko review comments of [11/12] commit:
- Add __GFP_ACCOUNT in thp_st_shadow allocation
- Move reset tph_permit in vfio_pci_tph_init/deinit
- Refine feature PROBE for `TPH_ST_CONFIG`, make sure probe OK when
enable_unsafe_tph is set
- Add commit-log about why TPH ST shadow table is not cleared when the
VFIO device is reset
v17:
- Move retrieve CPU to ST mapping logic from VFIO to PCI subsystem
- Remove tph_lock, which appeared to be unused
- Fix Sashiko review comments of v16:
- tph_permit was a bit field, which has concurrency problems
- Fix tph_permit not being reset when the device is re-opened
- TPH capability virtualization writes had concurrency problems and did
not roll back to the original value.
- Missing virtualization of the TPH Capability Header leaked the physical
Next Capability Pointer to the guest
Chengwen Feng (11):
PCI/TPH: Fix pcie_tph_get_st_table_loc() field extraction
PCI/TPH: Fix tph_enabled concurrent update race by bitfield packing
PCI/TPH: Cache TPH requester capability at probe time
PCI/TPH: Refactor pcie_enable_tph & add explicit requester variant
PCI/TPH: Refactor pcie_tph_get_cpu_st & add explicit variant
PCI/TPH: Add pcie_tph_supported() helper to check TPH capability
attributes
PCI/TPH: Add sysfs binary file to export CPU to steering-tag mapping
vfio/pci: Hide TPH capability when TPH is unsupported
vfio/pci: Add TPH_ENABLE feature skeleton and unsafe module parameter
vfio/pci: Add TPH_ST_CONFIG for PCIe TPH ST configuration
vfio/pci: Virtualize PCIe TPH capability registers
Zhiping Zhang (1):
PCI/TPH: Expose the enabled TPH requester type
Documentation/ABI/testing/sysfs-bus-pci | 24 ++
drivers/pci/pci-sysfs.c | 3 +
drivers/pci/pci.h | 4 +
drivers/pci/tph.c | 361 +++++++++++++++++++-----
drivers/vfio/pci/vfio_pci.c | 13 +-
drivers/vfio/pci/vfio_pci_config.c | 120 ++++++++
drivers/vfio/pci/vfio_pci_core.c | 157 ++++++++++-
include/linux/pci-tph.h | 22 ++
include/linux/pci.h | 6 +-
include/linux/vfio_pci_core.h | 6 +-
include/uapi/linux/pci.h | 16 ++
include/uapi/linux/vfio.h | 29 ++
12 files changed, 684 insertions(+), 77 deletions(-)
--
2.17.1
* [PATCH RESEND v18 01/12] PCI/TPH: Fix pcie_tph_get_st_table_loc() field extraction
From: Chengwen Feng @ 2026-06-23 8:50 UTC (permalink / raw)
To: alex, jgg, helgaas
Cc: wathsala.vithanage, wei.huang2, zhipingz, wangzhou1, wangyushan12,
liuyonglong, kvm, linux-pci
pcie_tph_get_st_table_loc() incorrectly uses FIELD_GET(), which shifts the
field value to bit 0. But the function is designed to return raw
PCI_TPH_LOC_* values as defined in the function comment.
This causes incorrect ST table location detection. Fix it by using bitwise
AND with PCI_TPH_CAP_LOC_MASK to return the unshifted field value matching
the function specification.
This doesn't make a difference to mlx5_st_create(), the lone external
caller, because it only checks for PCI_TPH_LOC_NONE (0), but will be needed
for callers that check for PCI_TPH_LOC_CAP or PCI_TPH_LOC_MSIX.
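The difference between the shifted and the raw field value can be checked in isolation. A minimal userspace sketch, assuming the constants below mirror include/uapi/linux/pci_regs.h (location field in bits 10:9); `loc_shifted` and `loc_raw` are hypothetical helper names used only for illustration:

```c
#include <stdint.h>

/* Values assumed to mirror include/uapi/linux/pci_regs.h. */
#define PCI_TPH_CAP_LOC_MASK 0x600u  /* ST Table Location, bits 10:9 */
#define PCI_TPH_LOC_NONE     0x000u
#define PCI_TPH_LOC_CAP      0x200u
#define PCI_TPH_LOC_MSIX     0x400u

/* Old behavior: equivalent to FIELD_GET(), field shifted down to bit 0. */
static uint32_t loc_shifted(uint32_t reg)
{
	return (reg & PCI_TPH_CAP_LOC_MASK) >> 9;
}

/* New behavior: raw field, directly comparable with PCI_TPH_LOC_*. */
static uint32_t loc_raw(uint32_t reg)
{
	return reg & PCI_TPH_CAP_LOC_MASK;
}
```

For a register whose location field says "in capability", `loc_shifted()` yields 1 while `PCI_TPH_LOC_CAP` is 0x200, so the comparison only works by accident for the zero value `PCI_TPH_LOC_NONE`.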
Also add tph_cap validation for pcie_tph_get_st_table_loc() to prevent
invalid PCI configuration space access when TPH is not supported. Add stub
functions for pcie_tph_get_st_table_size() and pcie_tph_get_st_table_loc()
when !CONFIG_PCIE_TPH.
Fixes: d2e8a34876ce ("PCI/TPH: Add Steering Tag support")
Cc: stable@vger.kernel.org
Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
Reviewed-by: Alex Williamson <alex.williamson@nvidia.com>
Reviewed-by: Bjorn Helgaas <bhelgaas@google.com>
---
drivers/pci/tph.c | 12 +++++-------
include/linux/pci-tph.h | 5 +++++
2 files changed, 10 insertions(+), 7 deletions(-)
diff --git a/drivers/pci/tph.c b/drivers/pci/tph.c
index 655ffd60e62f..489487158b99 100644
--- a/drivers/pci/tph.c
+++ b/drivers/pci/tph.c
@@ -165,11 +165,14 @@ static u8 get_st_modes(struct pci_dev *pdev)
*/
u32 pcie_tph_get_st_table_loc(struct pci_dev *pdev)
{
- u32 reg;
+ u32 reg = 0;
+
+ if (!pdev->tph_cap)
+ return PCI_TPH_LOC_NONE;
pci_read_config_dword(pdev, pdev->tph_cap + PCI_TPH_CAP, &reg);
- return FIELD_GET(PCI_TPH_CAP_LOC_MASK, reg);
+ return reg & PCI_TPH_CAP_LOC_MASK;
}
EXPORT_SYMBOL(pcie_tph_get_st_table_loc);
@@ -184,9 +187,6 @@ u16 pcie_tph_get_st_table_size(struct pci_dev *pdev)
/* Check ST table location first */
loc = pcie_tph_get_st_table_loc(pdev);
-
- /* Convert loc to match with PCI_TPH_LOC_* defined in pci_regs.h */
- loc = FIELD_PREP(PCI_TPH_CAP_LOC_MASK, loc);
if (loc != PCI_TPH_LOC_CAP)
return 0;
@@ -315,8 +315,6 @@ int pcie_tph_set_st_entry(struct pci_dev *pdev, unsigned int index, u16 tag)
set_ctrl_reg_req_en(pdev, PCI_TPH_REQ_DISABLE);
loc = pcie_tph_get_st_table_loc(pdev);
- /* Convert loc to match with PCI_TPH_LOC_* */
- loc = FIELD_PREP(PCI_TPH_CAP_LOC_MASK, loc);
switch (loc) {
case PCI_TPH_LOC_MSIX:
diff --git a/include/linux/pci-tph.h b/include/linux/pci-tph.h
index be68cd17f2f8..6f02b020d7d7 100644
--- a/include/linux/pci-tph.h
+++ b/include/linux/pci-tph.h
@@ -8,6 +8,7 @@
*/
#ifndef LINUX_PCI_TPH_H
#define LINUX_PCI_TPH_H
+#include <linux/pci.h>
/*
* According to the ECN for PCI Firmware Spec, Steering Tag can be different
@@ -41,6 +42,10 @@ static inline int pcie_tph_get_cpu_st(struct pci_dev *dev,
static inline void pcie_disable_tph(struct pci_dev *pdev) { }
static inline int pcie_enable_tph(struct pci_dev *pdev, int mode)
{ return -EINVAL; }
+static inline u16 pcie_tph_get_st_table_size(struct pci_dev *pdev)
+{ return 0; }
+static inline u32 pcie_tph_get_st_table_loc(struct pci_dev *pdev)
+{ return PCI_TPH_LOC_NONE; }
#endif
#endif /* LINUX_PCI_TPH_H */
--
2.17.1
* [PATCH RESEND v18 02/12] PCI/TPH: Fix tph_enabled concurrent update race by bitfield packing
From: Chengwen Feng @ 2026-06-23 8:50 UTC (permalink / raw)
To: alex, jgg, helgaas
Cc: wathsala.vithanage, wei.huang2, zhipingz, wangzhou1, wangyushan12,
liuyonglong, kvm, linux-pci
Move tph_enabled out of the shared pci_dev bitfield and into the spare bit
of the u16 that holds tph_cap. tph_cap is immutable after enumeration and
needs only 15 bits for the offset, so the remaining bit stores tph_enabled.
This removes the cross-bitfield concurrent write hazard highlighted by
Sashiko after VFIO TPH exposure. No functional change.
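The hazard comes from bitfields that share one storage unit: updating one flag is a read-modify-write of the whole unit. The sketch below replays one bad interleaving deterministically instead of using real threads; `BIT_A`, `BIT_B`, `racy_interleaving`, and `struct tph_word` are hypothetical names, and the 2-byte layout of the struct is what GCC and Clang produce rather than a language guarantee:

```c
#include <stdint.h>

#define BIT_A 0x1u  /* stands in for a neighbouring flag, e.g. ats_enabled */
#define BIT_B 0x2u  /* stands in for tph_enabled */

/*
 * Replay the interleaving: both writers load the shared word before
 * either stores. The second store overwrites the first writer's bit.
 */
static uint32_t racy_interleaving(uint32_t word)
{
	uint32_t a = word;       /* writer A loads */
	uint32_t b = word;       /* writer B loads */

	a |= BIT_A;
	b |= BIT_B;
	word = a;                /* writer A stores */
	word = b;                /* writer B stores, BIT_A is lost */
	return word;
}

/* After the patch: the flag shares a u16 only with an immutable field. */
struct tph_word {
	uint16_t tph_cap:15;     /* written once during enumeration */
	uint16_t tph_enabled:1;  /* the only field written afterwards */
};
```

Because nothing else writes the shared unit after enumeration, a read-modify-write of `tph_enabled` cannot clobber a concurrent update to a neighbour.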
Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
---
include/linux/pci.h | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/include/linux/pci.h b/include/linux/pci.h
index ebb5b9d76360..fac439ce53b0 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -472,7 +472,6 @@ struct pci_dev {
unsigned int ats_enabled:1; /* Address Translation Svc */
unsigned int pasid_enabled:1; /* Process Address Space ID */
unsigned int pri_enabled:1; /* Page Request Interface */
- unsigned int tph_enabled:1; /* TLP Processing Hints */
unsigned int fm_enabled:1; /* Flit Mode (segment captured) */
unsigned int is_managed:1; /* Managed via devres */
unsigned int is_msi_managed:1; /* MSI release via devres installed */
@@ -588,7 +587,8 @@ struct pci_dev {
u8 reset_methods[PCI_NUM_RESET_METHODS]; /* In priority order */
#ifdef CONFIG_PCIE_TPH
- u16 tph_cap; /* TPH capability offset */
+ u16 tph_cap:15; /* TPH capability offset */
+ u16 tph_enabled:1; /* Whether TPH is enabled */
u8 tph_mode; /* TPH mode */
u8 tph_req_type; /* TPH requester type */
#endif
--
2.17.1
* [PATCH RESEND v18 03/12] PCI/TPH: Cache TPH requester capability at probe time
From: Chengwen Feng @ 2026-06-23 8:50 UTC (permalink / raw)
To: alex, jgg, helgaas
Cc: wathsala.vithanage, wei.huang2, zhipingz, wangzhou1, wangyushan12,
liuyonglong, kvm, linux-pci
Calculate the negotiated TPH requester type from device and root port
capabilities once in pci_tph_init().
Add tph_ext_support flag to cache whether the device is allowed to
issue Extended TPH requests after topology negotiation. If the final
requester type is disabled, clear TPH capability to prevent usage.
Simplify pcie_enable_tph() by using the cached requester capability
instead of recalculating every time.
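The negotiation that moves into pci_tph_init() reduces to taking the smaller of two encoded capabilities. A minimal sketch, assuming the constants mirror include/uapi/linux/pci_regs.h and that the Root Port completer type uses the same encoding; `negotiate_req_type` is a hypothetical name, not a kernel function:

```c
#include <stdbool.h>
#include <stdint.h>

/* Values assumed to mirror include/uapi/linux/pci_regs.h. */
#define PCI_TPH_REQ_DISABLE  0x0
#define PCI_TPH_REQ_TPH_ONLY 0x1
#define PCI_TPH_REQ_EXT_TPH  0x3

/*
 * dev_ext:   device advertises Extended TPH Requester support
 * behind_rp: device sits below a Root Port (not an RC integrated endpoint)
 * rp_type:   Root Port completer type, assumed encoded like PCI_TPH_REQ_*
 */
static uint8_t negotiate_req_type(bool dev_ext, bool behind_rp, uint8_t rp_type)
{
	uint8_t type = dev_ext ? PCI_TPH_REQ_EXT_TPH : PCI_TPH_REQ_TPH_ONLY;

	/* The final type is the smaller of the two capabilities. */
	if (behind_rp && rp_type < type)
		type = rp_type;
	return type;
}
```

A result of `PCI_TPH_REQ_DISABLE` corresponds to the case where the patch clears `tph_cap`; a result of `PCI_TPH_REQ_EXT_TPH` corresponds to `tph_ext_support` being set.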
Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
---
drivers/pci/tph.c | 43 +++++++++++++++++++++++++------------------
include/linux/pci.h | 4 +++-
2 files changed, 28 insertions(+), 19 deletions(-)
diff --git a/drivers/pci/tph.c b/drivers/pci/tph.c
index 489487158b99..4097787ea98a 100644
--- a/drivers/pci/tph.c
+++ b/drivers/pci/tph.c
@@ -383,7 +383,6 @@ int pcie_enable_tph(struct pci_dev *pdev, int mode)
{
u32 reg;
u8 dev_modes;
- u8 rp_req_type;
/* Honor "notph" kernel parameter */
if (pci_tph_disabled)
@@ -403,23 +402,8 @@ int pcie_enable_tph(struct pci_dev *pdev, int mode)
pdev->tph_mode = mode;
- /* Get req_type supported by device and its Root Port */
- pci_read_config_dword(pdev, pdev->tph_cap + PCI_TPH_CAP, &reg);
- if (FIELD_GET(PCI_TPH_CAP_EXT_TPH, reg))
- pdev->tph_req_type = PCI_TPH_REQ_EXT_TPH;
- else
- pdev->tph_req_type = PCI_TPH_REQ_TPH_ONLY;
-
- /* Check if the device is behind a Root Port */
- if (pci_pcie_type(pdev) != PCI_EXP_TYPE_RC_END) {
- rp_req_type = get_rp_completer_type(pdev);
-
- /* Final req_type is the smallest value of two */
- pdev->tph_req_type = min(pdev->tph_req_type, rp_req_type);
- }
-
- if (pdev->tph_req_type == PCI_TPH_REQ_DISABLE)
- return -EINVAL;
+ pdev->tph_req_type = pdev->tph_ext_support ? PCI_TPH_REQ_EXT_TPH :
+ PCI_TPH_REQ_TPH_ONLY;
/* Write them into TPH control register */
pci_read_config_dword(pdev, pdev->tph_cap + PCI_TPH_CTRL, &reg);
@@ -506,13 +490,36 @@ void pci_no_tph(void)
void pci_tph_init(struct pci_dev *pdev)
{
+ u8 tph_req_type, rp_req_type;
int num_entries;
u32 save_size;
+ u32 reg = 0;
pdev->tph_cap = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_TPH);
if (!pdev->tph_cap)
return;
+ /* Get req_type supported by device and its Root Port */
+ pci_read_config_dword(pdev, pdev->tph_cap + PCI_TPH_CAP, &reg);
+ if (FIELD_GET(PCI_TPH_CAP_EXT_TPH, reg))
+ tph_req_type = PCI_TPH_REQ_EXT_TPH;
+ else
+ tph_req_type = PCI_TPH_REQ_TPH_ONLY;
+
+ /* Check if the device is behind a Root Port */
+ if (pci_pcie_type(pdev) != PCI_EXP_TYPE_RC_END) {
+ rp_req_type = get_rp_completer_type(pdev);
+ /* Final req_type is the smallest value of two */
+ tph_req_type = min(tph_req_type, rp_req_type);
+ }
+
+ if (tph_req_type == PCI_TPH_REQ_DISABLE) {
+ pdev->tph_cap = 0;
+ return;
+ }
+
+ pdev->tph_ext_support = !!(tph_req_type == PCI_TPH_REQ_EXT_TPH);
+
num_entries = pcie_tph_get_st_table_size(pdev);
save_size = sizeof(u32) + num_entries * sizeof(u16);
pci_add_ext_cap_save_buffer(pdev, PCI_EXT_CAP_ID_TPH, save_size);
diff --git a/include/linux/pci.h b/include/linux/pci.h
index fac439ce53b0..7ec0726050fe 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -587,7 +587,9 @@ struct pci_dev {
u8 reset_methods[PCI_NUM_RESET_METHODS]; /* In priority order */
#ifdef CONFIG_PCIE_TPH
- u16 tph_cap:15; /* TPH capability offset */
+ u16 tph_cap:14; /* TPH capability offset */
+ u16 tph_ext_support:1; /* Indicate whether Extended TPH
+ * requester is supported */
u16 tph_enabled:1; /* Whether TPH is enabled */
u8 tph_mode; /* TPH mode */
u8 tph_req_type; /* TPH requester type */
--
2.17.1
* [PATCH RESEND v18 04/12] PCI/TPH: Refactor pcie_enable_tph & add explicit requester variant
From: Chengwen Feng @ 2026-06-23 8:50 UTC (permalink / raw)
To: alex, jgg, helgaas
Cc: wathsala.vithanage, wei.huang2, zhipingz, wangzhou1, wangyushan12,
liuyonglong, kvm, linux-pci
Refactor pcie_enable_tph implementation: extract core logic into static
internal enable_tph() helper accepting explicit requester type.
- Preserve original pcie_enable_tph() unchanged as auto wrapper; it
auto-selects EXT/standard TPH requester per device capability, existing
bnxt/mlx5 callers require zero modification.
- Add exported pcie_enable_tph_explicit() with bool 'extended' parameter
for explicit STD/EXT selection, used by upcoming VFIO TPH support.
Input validation for EXT_TPH availability is retained inside helper to
reject invalid explicit EXT request if hardware does not support extended
requester.
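The split into one validating core plus two thin wrappers can be sketched outside the kernel. Everything here is a reduced stand-in: `mock_dev`, `mock_enable`, `mock_enable_auto`, and `mock_enable_explicit` are hypothetical names, and the register write and ST mode check of the real helper are omitted:

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define REQ_TPH_ONLY 0x1  /* stands in for PCI_TPH_REQ_TPH_ONLY */
#define REQ_EXT_TPH  0x3  /* stands in for PCI_TPH_REQ_EXT_TPH */

/* Hypothetical reduced device state. */
struct mock_dev {
	bool    ext_support;  /* cached at probe time */
	uint8_t req_type;     /* type that would go to the control register */
};

/* Shared core: validates the requested type, then commits it. */
static int mock_enable(struct mock_dev *d, uint8_t req_type)
{
	if (req_type == REQ_EXT_TPH && !d->ext_support)
		return -EINVAL;
	d->req_type = req_type;
	return 0;
}

/* Auto variant: picks the best type the device supports. */
static int mock_enable_auto(struct mock_dev *d)
{
	return mock_enable(d, d->ext_support ? REQ_EXT_TPH : REQ_TPH_ONLY);
}

/* Explicit variant: the caller chooses, the core may refuse. */
static int mock_enable_explicit(struct mock_dev *d, bool extended)
{
	return mock_enable(d, extended ? REQ_EXT_TPH : REQ_TPH_ONLY);
}
```

Keeping the capability check in the shared core means the auto wrapper can never trip it, while an explicit caller such as VFIO gets a clean error and unchanged device state.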
Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
---
drivers/pci/tph.c | 68 +++++++++++++++++++++++++++++------------
include/linux/pci-tph.h | 4 +++
2 files changed, 52 insertions(+), 20 deletions(-)
diff --git a/drivers/pci/tph.c b/drivers/pci/tph.c
index 4097787ea98a..c22eb7f4b0bd 100644
--- a/drivers/pci/tph.c
+++ b/drivers/pci/tph.c
@@ -363,23 +363,7 @@ void pcie_disable_tph(struct pci_dev *pdev)
}
EXPORT_SYMBOL(pcie_disable_tph);
-/**
- * pcie_enable_tph - Enable TPH support for device using a specific ST mode
- * @pdev: PCI device
- * @mode: ST mode to enable. Current supported modes include:
- *
- * - PCI_TPH_ST_NS_MODE: NO ST Mode
- * - PCI_TPH_ST_IV_MODE: Interrupt Vector Mode
- * - PCI_TPH_ST_DS_MODE: Device Specific Mode
- *
- * Check whether the mode is actually supported by the device before enabling
- * and return an error if not. Additionally determine what types of requests,
- * TPH or extended TPH, can be issued by the device based on its TPH requester
- * capability and the Root Port's completer capability.
- *
- * Return: 0 on success, otherwise negative value (-errno)
- */
-int pcie_enable_tph(struct pci_dev *pdev, int mode)
+static int enable_tph(struct pci_dev *pdev, int mode, u8 req_type)
{
u32 reg;
u8 dev_modes;
@@ -400,10 +384,11 @@ int pcie_enable_tph(struct pci_dev *pdev, int mode)
if (!((1 << mode) & dev_modes))
return -EINVAL;
- pdev->tph_mode = mode;
+ if (req_type == PCI_TPH_REQ_EXT_TPH && !pdev->tph_ext_support)
+ return -EINVAL;
- pdev->tph_req_type = pdev->tph_ext_support ? PCI_TPH_REQ_EXT_TPH :
- PCI_TPH_REQ_TPH_ONLY;
+ pdev->tph_mode = mode;
+ pdev->tph_req_type = req_type;
/* Write them into TPH control register */
pci_read_config_dword(pdev, pdev->tph_cap + PCI_TPH_CTRL, &reg);
@@ -417,8 +402,51 @@ int pcie_enable_tph(struct pci_dev *pdev, int mode)
return 0;
}
+
+/**
+ * pcie_enable_tph - Enable TPH support for device using a specific ST mode
+ * @pdev: PCI device
+ * @mode: ST mode to enable. Current supported modes include:
+ *
+ * - PCI_TPH_ST_NS_MODE: NO ST Mode
+ * - PCI_TPH_ST_IV_MODE: Interrupt Vector Mode
+ * - PCI_TPH_ST_DS_MODE: Device Specific Mode
+ *
+ * Check whether the mode is actually supported by the device before enabling
+ * and return an error if not. Additionally determine what types of requests,
+ * TPH or extended TPH, can be issued by the device based on its TPH requester
+ * capability and the Root Port's completer capability.
+ *
+ * Return: 0 on success, otherwise negative value (-errno)
+ */
+int pcie_enable_tph(struct pci_dev *pdev, int mode)
+{
+ u8 req_type = pdev->tph_ext_support ? PCI_TPH_REQ_EXT_TPH :
+ PCI_TPH_REQ_TPH_ONLY;
+ return enable_tph(pdev, mode, req_type);
+}
EXPORT_SYMBOL(pcie_enable_tph);
+/**
+ * pcie_enable_tph_explicit - Enable TPH with explicit requester selection
+ * @pdev: PCI device to operate
+ * @mode: ST table operating mode (NS/IV/DS)
+ * @extended: true = EXT_TPH, false = standard TPH only
+ *
+ * Unlike auto-detecting pcie_enable_tph(), caller selects requester type
+ * manually instead of hardware auto-selection. Rejects EXT_TPH request
+ * if device lacks extended requester capability.
+ *
+ * Return: 0 on success, negative errno on failure.
+ */
+int pcie_enable_tph_explicit(struct pci_dev *pdev, int mode, bool extended)
+{
+ u8 req_type = extended ? PCI_TPH_REQ_EXT_TPH : PCI_TPH_REQ_TPH_ONLY;
+
+ return enable_tph(pdev, mode, req_type);
+}
+EXPORT_SYMBOL(pcie_enable_tph_explicit);
+
void pci_restore_tph_state(struct pci_dev *pdev)
{
struct pci_cap_saved_state *save_state;
diff --git a/include/linux/pci-tph.h b/include/linux/pci-tph.h
index 6f02b020d7d7..ca0faa98afac 100644
--- a/include/linux/pci-tph.h
+++ b/include/linux/pci-tph.h
@@ -29,6 +29,7 @@ int pcie_tph_get_cpu_st(struct pci_dev *dev,
unsigned int cpu, u16 *tag);
void pcie_disable_tph(struct pci_dev *pdev);
int pcie_enable_tph(struct pci_dev *pdev, int mode);
+int pcie_enable_tph_explicit(struct pci_dev *pdev, int mode, bool extended);
u16 pcie_tph_get_st_table_size(struct pci_dev *pdev);
u32 pcie_tph_get_st_table_loc(struct pci_dev *pdev);
#else
@@ -42,6 +43,9 @@ static inline int pcie_tph_get_cpu_st(struct pci_dev *dev,
static inline void pcie_disable_tph(struct pci_dev *pdev) { }
static inline int pcie_enable_tph(struct pci_dev *pdev, int mode)
{ return -EINVAL; }
+static inline int pcie_enable_tph_explicit(struct pci_dev *pdev, int mode,
+ bool extended)
+{ return -EINVAL; }
static inline u16 pcie_tph_get_st_table_size(struct pci_dev *pdev)
{ return 0; }
static inline u32 pcie_tph_get_st_table_loc(struct pci_dev *pdev)
--
2.17.1
* [PATCH RESEND v18 05/12] PCI/TPH: Refactor pcie_tph_get_cpu_st & add explicit variant
From: Chengwen Feng @ 2026-06-23 8:50 UTC (permalink / raw)
To: alex, jgg, helgaas
Cc: wathsala.vithanage, wei.huang2, zhipingz, wangzhou1, wangyushan12,
liuyonglong, kvm, linux-pci
Refactor pcie_tph_get_cpu_st(): extract core logic into static internal
get_cpu_st() helper accepting explicit requester type parameter.
- Preserve original pcie_tph_get_cpu_st() unchanged as auto wrapper; it
uses existing pdev->tph_req_type automatically, existing callers require
no change.
- Add exported pcie_tph_get_cpu_st_explicit() with bool 'extended'
parameter for manual STD/EXT requester selection, consumed by upcoming
VFIO TPH code.
- Add capability check: reject explicit EXT request when device does not
support extended TPH requester.
Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
---
drivers/pci/tph.c | 68 ++++++++++++++++++++++++++++++-----------
include/linux/pci-tph.h | 7 +++++
2 files changed, 57 insertions(+), 18 deletions(-)
diff --git a/drivers/pci/tph.c b/drivers/pci/tph.c
index c22eb7f4b0bd..70273c44e8fd 100644
--- a/drivers/pci/tph.c
+++ b/drivers/pci/tph.c
@@ -230,21 +230,8 @@ static int write_tag_to_st_table(struct pci_dev *pdev, int index, u16 tag)
return pci_write_config_word(pdev, offset, tag);
}
-/**
- * pcie_tph_get_cpu_st() - Retrieve Steering Tag for a target memory associated
- * with a specific CPU
- * @pdev: PCI device
- * @mem_type: target memory type (volatile or persistent RAM)
- * @cpu: associated CPU id
- * @tag: Steering Tag to be returned
- *
- * Return the Steering Tag for a target memory that is associated with a
- * specific CPU as indicated by cpu.
- *
- * Return: 0 if success, otherwise negative value (-errno)
- */
-int pcie_tph_get_cpu_st(struct pci_dev *pdev, enum tph_mem_type mem_type,
- unsigned int cpu, u16 *tag)
+static int get_cpu_st(struct pci_dev *pdev, enum tph_mem_type mem_type,
+ u8 req_type, unsigned int cpu, u16 *tag)
{
#ifdef CONFIG_ACPI
struct pci_dev *rp;
@@ -268,19 +255,64 @@ int pcie_tph_get_cpu_st(struct pci_dev *pdev, enum tph_mem_type mem_type,
return -EINVAL;
}
- *tag = tph_extract_tag(mem_type, pdev->tph_req_type, &info);
+ *tag = tph_extract_tag(mem_type, req_type, &info);
- pci_dbg(pdev, "get steering tag: mem_type=%s, cpu=%d, tag=%#04x\n",
+ pci_dbg(pdev, "get steering tag: mem_type=%s, req_type=%u, cpu=%d, tag=%#04x\n",
(mem_type == TPH_MEM_TYPE_VM) ? "volatile" : "persistent",
- cpu, *tag);
+ req_type, cpu, *tag);
return 0;
#else
return -ENODEV;
#endif
}
+
+/**
+ * pcie_tph_get_cpu_st() - Retrieve Steering Tag for a target memory associated
+ * with a specific CPU
+ * @pdev: PCI device
+ * @mem_type: target memory type (volatile or persistent RAM)
+ * @cpu: associated CPU id
+ * @tag: Steering Tag to be returned
+ *
+ * Return the Steering Tag for a target memory that is associated with a
+ * specific CPU as indicated by cpu.
+ *
+ * Return: 0 if success, otherwise negative value (-errno)
+ */
+int pcie_tph_get_cpu_st(struct pci_dev *pdev, enum tph_mem_type mem_type,
+ unsigned int cpu, u16 *tag)
+{
+ return get_cpu_st(pdev, mem_type, pdev->tph_req_type, cpu, tag);
+}
EXPORT_SYMBOL(pcie_tph_get_cpu_st);
+/**
+ * pcie_tph_get_cpu_st_explicit - Get ST with explicit requester type
+ * @pdev: PCI device
+ * @mem_type: target memory type (volatile or persistent RAM)
+ * @extended: true=EXT_TPH, false=standard TPH only
+ * @cpu: associated CPU id
+ * @tag: output steering tag pointer
+ *
+ * Unlike auto pcie_tph_get_cpu_st(), caller manually picks requester type.
+ * Rejects EXT request if device lacks extended requester capability.
+ *
+ * Return: 0 on success, negative errno on failure.
+ */
+int pcie_tph_get_cpu_st_explicit(struct pci_dev *pdev,
+ enum tph_mem_type mem_type,
+ bool extended, unsigned int cpu, u16 *tag)
+{
+ u8 req_type = extended ? PCI_TPH_REQ_EXT_TPH : PCI_TPH_REQ_TPH_ONLY;
+
+ if (extended && !pdev->tph_ext_support)
+ return -EINVAL;
+
+ return get_cpu_st(pdev, mem_type, req_type, cpu, tag);
+}
+EXPORT_SYMBOL(pcie_tph_get_cpu_st_explicit);
+
/**
* pcie_tph_set_st_entry() - Set Steering Tag in the ST table entry
* @pdev: PCI device
diff --git a/include/linux/pci-tph.h b/include/linux/pci-tph.h
index ca0faa98afac..1a508b3d511f 100644
--- a/include/linux/pci-tph.h
+++ b/include/linux/pci-tph.h
@@ -27,6 +27,9 @@ int pcie_tph_set_st_entry(struct pci_dev *pdev,
int pcie_tph_get_cpu_st(struct pci_dev *dev,
enum tph_mem_type mem_type,
unsigned int cpu, u16 *tag);
+int pcie_tph_get_cpu_st_explicit(struct pci_dev *pdev,
+ enum tph_mem_type mem_type,
+ bool extended, unsigned int cpu, u16 *tag);
void pcie_disable_tph(struct pci_dev *pdev);
int pcie_enable_tph(struct pci_dev *pdev, int mode);
int pcie_enable_tph_explicit(struct pci_dev *pdev, int mode, bool extended);
@@ -40,6 +43,10 @@ static inline int pcie_tph_get_cpu_st(struct pci_dev *dev,
enum tph_mem_type mem_type,
unsigned int cpu, u16 *tag)
{ return -EINVAL; }
+static inline int pcie_tph_get_cpu_st_explicit(struct pci_dev *pdev,
+ enum tph_mem_type mem_type,
+ bool extended, unsigned int cpu, u16 *tag)
+{ return -EINVAL; }
static inline void pcie_disable_tph(struct pci_dev *pdev) { }
static inline int pcie_enable_tph(struct pci_dev *pdev, int mode)
{ return -EINVAL; }
--
2.17.1
* [PATCH RESEND v18 06/12] PCI/TPH: Expose the enabled TPH requester type
From: Chengwen Feng @ 2026-06-23 8:50 UTC (permalink / raw)
To: alex, jgg, helgaas
Cc: wathsala.vithanage, wei.huang2, zhipingz, wangzhou1, wangyushan12,
liuyonglong, kvm, linux-pci
From: Zhiping Zhang <zhipingz@meta.com>
Add pcie_tph_enabled_req_type() so drivers can query the enabled TPH
requester mode without reaching into pci_dev internals.
Signed-off-by: Zhiping Zhang <zhipingz@meta.com>
Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
---
drivers/pci/tph.c | 12 ++++++++++++
include/linux/pci-tph.h | 3 +++
2 files changed, 15 insertions(+)
diff --git a/drivers/pci/tph.c b/drivers/pci/tph.c
index 70273c44e8fd..ea777cc33ae6 100644
--- a/drivers/pci/tph.c
+++ b/drivers/pci/tph.c
@@ -479,6 +479,18 @@ int pcie_enable_tph_explicit(struct pci_dev *pdev, int mode, bool extended)
}
EXPORT_SYMBOL(pcie_enable_tph_explicit);
+/**
+ * pcie_tph_enabled_req_type - Return the device's enabled TPH requester type
+ * @pdev: PCI device to query
+ *
+ * Return: PCI_TPH_REQ_DISABLE, PCI_TPH_REQ_TPH_ONLY or PCI_TPH_REQ_EXT_TPH.
+ */
+u8 pcie_tph_enabled_req_type(struct pci_dev *pdev)
+{
+ return pdev->tph_req_type;
+}
+EXPORT_SYMBOL(pcie_tph_enabled_req_type);
+
void pci_restore_tph_state(struct pci_dev *pdev)
{
struct pci_cap_saved_state *save_state;
diff --git a/include/linux/pci-tph.h b/include/linux/pci-tph.h
index 1a508b3d511f..e4f7045fc152 100644
--- a/include/linux/pci-tph.h
+++ b/include/linux/pci-tph.h
@@ -33,6 +33,7 @@ int pcie_tph_get_cpu_st_explicit(struct pci_dev *pdev,
void pcie_disable_tph(struct pci_dev *pdev);
int pcie_enable_tph(struct pci_dev *pdev, int mode);
int pcie_enable_tph_explicit(struct pci_dev *pdev, int mode, bool extended);
+u8 pcie_tph_enabled_req_type(struct pci_dev *pdev);
u16 pcie_tph_get_st_table_size(struct pci_dev *pdev);
u32 pcie_tph_get_st_table_loc(struct pci_dev *pdev);
#else
@@ -47,6 +48,8 @@ static inline int pcie_tph_get_cpu_st_explicit(struct pci_dev *pdev,
enum tph_mem_type mem_type,
bool extended, unsigned int cpu, u16 *tag)
{ return -EINVAL; }
+static inline u8 pcie_tph_enabled_req_type(struct pci_dev *pdev)
+{ return PCI_TPH_REQ_DISABLE; }
static inline void pcie_disable_tph(struct pci_dev *pdev) { }
static inline int pcie_enable_tph(struct pci_dev *pdev, int mode)
{ return -EINVAL; }
--
2.17.1
* [PATCH RESEND v18 07/12] PCI/TPH: Add pcie_tph_supported() helper to check TPH capability attributes
From: Chengwen Feng @ 2026-06-23 8:50 UTC (permalink / raw)
To: alex, jgg, helgaas
Cc: wathsala.vithanage, wei.huang2, zhipingz, wangzhou1, wangyushan12,
liuyonglong, kvm, linux-pci
Add a new helper, pcie_tph_supported(), with a want_ext parameter:
- want_ext = false: check whether the device has a valid TPH capability;
- want_ext = true: check whether the hardware also supports Extended TPH.
This helper prepares for the follow-up VFIO TPH virtualization patches,
which use it to query both basic TPH presence and Extended TPH support.
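
As an illustration only, the decision order of the helper can be modelled
in userspace. The struct below is a hypothetical stand-in for the two
struct pci_dev fields the helper reads; the names mock_pci_dev and
mock_tph_supported are assumptions made for this sketch, not kernel API.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the struct pci_dev fields used by the helper. */
struct mock_pci_dev {
	unsigned short tph_cap;  /* offset of the TPH capability, 0 if absent */
	bool tph_ext_support;    /* hardware supports Extended TPH */
};

/* Same decision order as pcie_tph_supported() in this patch. */
bool mock_tph_supported(const struct mock_pci_dev *pdev, bool want_ext)
{
	if (!pdev->tph_cap)
		return false;   /* no TPH capability: nothing is supported */

	if (!want_ext)
		return true;    /* caller only asked about basic TPH */

	return pdev->tph_ext_support;
}
```

Note that want_ext = true implies the basic check: a device without a TPH
capability never reports Extended TPH support.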
Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
---
drivers/pci/tph.c | 19 +++++++++++++++++++
include/linux/pci-tph.h | 3 +++
2 files changed, 22 insertions(+)
diff --git a/drivers/pci/tph.c b/drivers/pci/tph.c
index ea777cc33ae6..c080cf5401a3 100644
--- a/drivers/pci/tph.c
+++ b/drivers/pci/tph.c
@@ -596,3 +596,22 @@ void pci_tph_init(struct pci_dev *pdev)
save_size = sizeof(u32) + num_entries * sizeof(u16);
pci_add_ext_cap_save_buffer(pdev, PCI_EXT_CAP_ID_TPH, save_size);
}
+
+/**
+ * pcie_tph_supported - Check TPH capability attribute
+ * @pdev: PCI device to query
+ * @want_ext: false - check TPH cap exists; true - check EXT_TPH support
+ *
+ * Return: true on matched condition, false otherwise
+ */
+bool pcie_tph_supported(struct pci_dev *pdev, bool want_ext)
+{
+ if (!pdev->tph_cap)
+ return false;
+
+ if (!want_ext)
+ return true;
+
+ return pdev->tph_ext_support;
+}
+EXPORT_SYMBOL(pcie_tph_supported);
diff --git a/include/linux/pci-tph.h b/include/linux/pci-tph.h
index e4f7045fc152..5917a0694c1d 100644
--- a/include/linux/pci-tph.h
+++ b/include/linux/pci-tph.h
@@ -36,6 +36,7 @@ int pcie_enable_tph_explicit(struct pci_dev *pdev, int mode, bool extended);
u8 pcie_tph_enabled_req_type(struct pci_dev *pdev);
u16 pcie_tph_get_st_table_size(struct pci_dev *pdev);
u32 pcie_tph_get_st_table_loc(struct pci_dev *pdev);
+bool pcie_tph_supported(struct pci_dev *pdev, bool want_ext);
#else
static inline int pcie_tph_set_st_entry(struct pci_dev *pdev,
unsigned int index, u16 tag)
@@ -60,6 +61,8 @@ static inline u16 pcie_tph_get_st_table_size(struct pci_dev *pdev)
{ return 0; }
static inline u32 pcie_tph_get_st_table_loc(struct pci_dev *pdev)
{ return PCI_TPH_LOC_NONE; }
+static inline bool pcie_tph_supported(struct pci_dev *pdev, bool want_ext)
+{ return false; }
#endif
#endif /* LINUX_PCI_TPH_H */
--
2.17.1
* [PATCH RESEND v18 08/12] PCI/TPH: Add sysfs binary file to export CPU to steering-tag mapping
2026-06-23 8:50 [PATCH RESEND v18 00/12] vfio/pci: Add PCIe TPH support Chengwen Feng
` (6 preceding siblings ...)
2026-06-23 8:50 ` [PATCH RESEND v18 07/12] PCI/TPH: Add pcie_tph_supported() helper to check TPH capability attributes Chengwen Feng
@ 2026-06-23 8:50 ` Chengwen Feng
2026-06-23 9:02 ` sashiko-bot
2026-06-23 8:50 ` [PATCH RESEND v18 09/12] vfio/pci: Hide TPH capability when TPH is unsupported Chengwen Feng
` (3 subsequent siblings)
11 siblings, 1 reply; 25+ messages in thread
From: Chengwen Feng @ 2026-06-23 8:50 UTC (permalink / raw)
To: alex, jgg, helgaas
Cc: wathsala.vithanage, wei.huang2, zhipingz, wangzhou1, wangyushan12,
liuyonglong, kvm, linux-pci
Add a sysfs binary attribute, tph_cpu_st, exposed only on Root Ports, to
export the ACPI _DSM CPU-to-ST mapping to userspace. This addresses the
concern that VFIO should not host CPU steering tag translation interfaces.
ABI: /sys/bus/pci/devices/<root-port-bdf>/tph_cpu_st
- Read-only, root-only (0400) binary blob;
- Each entry is a packed 8-byte struct pci_tph_cpu_st, defined in
uapi/linux/pci.h;
- Reads at arbitrary offsets are supported, including partial reads that
extract a sub-field of an entry;
- Non-present/impossible CPUs return zero-filled entries, so that a
sequential read does not abort on a sparse CPU topology;
- cond_resched() is called in the read loop to avoid soft lockups when
dumping the full blob.
Dynamic visibility rules, enforced via is_bin_visible:
1. The file is exposed only on PCIe Root Port devices and hidden on all
endpoints;
2. The Root Port must implement the TPH Completer capability in DevCap2;
3. The platform must provide a valid ACPI _DSM for CPU-to-ST mapping.
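
A minimal userspace sketch of how a consumer might read one CPU's entry,
assuming the ABI described above. The struct mirrors pci_tph_cpu_st from
this patch; the names tph_cpu_st_entry, tph_cpu_st_offset and
tph_cpu_st_read_entry are hypothetical helpers for illustration only.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Userspace mirror of struct pci_tph_cpu_st from this patch (8 bytes). */
struct tph_cpu_st_entry {
	uint8_t  vm_st;     /* Volatile Memory Steering Tag */
	uint8_t  pm_st;     /* Persistent Memory Steering Tag */
	uint16_t vm_xst;    /* Volatile Memory Extended Steering Tag */
	uint16_t pm_xst;    /* Persistent Memory Extended Steering Tag */
	uint16_t reserved;
} __attribute__((packed));

/* Byte offset of a CPU's entry: cpu_id * entry size. */
off_t tph_cpu_st_offset(unsigned int cpu)
{
	return (off_t)cpu * (off_t)sizeof(struct tph_cpu_st_entry);
}

/*
 * Read one CPU's entry from an already-open tph_cpu_st file descriptor.
 * Returns 0 on success, -1 on error or if the file ends before a full
 * entry was read (for example, cpu is beyond the end of the blob).
 */
int tph_cpu_st_read_entry(int fd, unsigned int cpu,
			  struct tph_cpu_st_entry *out)
{
	unsigned char buf[sizeof(*out)];
	size_t got = 0;

	/* Loop because a single pread() may return fewer bytes than asked. */
	while (got < sizeof(buf)) {
		ssize_t n = pread(fd, buf + got, sizeof(buf) - got,
				  tph_cpu_st_offset(cpu) + (off_t)got);
		if (n <= 0)
			return -1;
		got += (size_t)n;
	}

	memcpy(out, buf, sizeof(*out));
	return 0;
}
```

An all-zero entry is ambiguous on its own: it is what the file returns
for CPUs with no mapping, so a caller would probably want to treat zero
tags as "no steering tag available" rather than as a valid tag.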
Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
---
Documentation/ABI/testing/sysfs-bus-pci | 24 ++++
drivers/pci/pci-sysfs.c | 3 +
drivers/pci/pci.h | 4 +
drivers/pci/tph.c | 151 +++++++++++++++++++++---
include/uapi/linux/pci.h | 16 +++
5 files changed, 183 insertions(+), 15 deletions(-)
diff --git a/Documentation/ABI/testing/sysfs-bus-pci b/Documentation/ABI/testing/sysfs-bus-pci
index b767db2c52cb..edc64e4e5640 100644
--- a/Documentation/ABI/testing/sysfs-bus-pci
+++ b/Documentation/ABI/testing/sysfs-bus-pci
@@ -702,3 +702,27 @@ Description:
When present and the tsm/ attribute directory is present, the
authenticated attribute is an alias for the device 'connect'
state. See the 'tsm/connect' attribute for more details.
+
+What: /sys/bus/pci/devices/<root-port-bdf>/tph_cpu_st
+Contact: linux-pci@vger.kernel.org
+Description:
+ Read-only binary attribute only exposed on PCIe Root Ports that
+ support TPH Completer capability and implement the ACPI DSM
+ method for CPU-to-ST mapping. File permission is root-only
+ (0400).
+
+ The blob is a sequence of fixed-size 8-byte entries defined by
+ struct pci_tph_cpu_st in uapi/linux/pci.h:
+ __u8 vm_st;
+ __u8 pm_st;
+ __u16 vm_xst;
+ __u16 pm_xst;
+ __u16 reserved;
+
+ Each entry corresponds to a logical CPU index. Seek offset =
+ cpu_id * PCI_TPH_CPU_ST_ENTRY_SZ. Arbitrary unaligned partial
+ reads are supported; no alignment restriction enforced.
+
+ For CPUs outside cpu_possible_mask or offline CPUs, the entry
+ is filled with all zeros to avoid breaking sequential dump tools
+ like cat/hexdump on sparse CPU topologies.
diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c
index 5ec0b245a69b..b28732bd1554 100644
--- a/drivers/pci/pci-sysfs.c
+++ b/drivers/pci/pci-sysfs.c
@@ -1911,6 +1911,9 @@ const struct attribute_group *pci_dev_attr_groups[] = {
#ifdef CONFIG_PCI_TSM
&pci_tsm_auth_attr_group,
&pci_tsm_attr_group,
+#endif
+#ifdef CONFIG_PCIE_TPH
+ &pcie_tph_cpu_st_attr_group,
#endif
NULL,
};
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index 4469e1a77f3c..5d28bec94add 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -1366,6 +1366,10 @@ static inline pci_power_t acpi_pci_choose_state(struct pci_dev *pdev)
extern const struct attribute_group aspm_ctrl_attr_group;
#endif
+#ifdef CONFIG_PCIE_TPH
+extern const struct attribute_group pcie_tph_cpu_st_attr_group;
+#endif
+
#ifdef CONFIG_X86_INTEL_MID
bool pci_use_mid_pm(void);
int mid_pci_set_power_state(struct pci_dev *pdev, pci_power_t state);
diff --git a/drivers/pci/tph.c b/drivers/pci/tph.c
index c080cf5401a3..bb6f24eccafc 100644
--- a/drivers/pci/tph.c
+++ b/drivers/pci/tph.c
@@ -11,6 +11,7 @@
#include <linux/msi.h>
#include <linux/bitfield.h>
#include <linux/pci-tph.h>
+#include <linux/sched.h>
#include "pci.h"
@@ -130,8 +131,47 @@ static acpi_status tph_invoke_dsm(acpi_handle handle, u32 cpu_uid,
return AE_OK;
}
+
+static int tph_get_cpu_st_info(struct pci_dev *pdev, unsigned int cpu,
+ union st_info *info)
+{
+ acpi_handle rp_acpi_handle;
+ struct pci_dev *rp;
+ u32 cpu_uid;
+ int ret;
+
+ ret = acpi_get_cpu_uid(cpu, &cpu_uid);
+ if (ret != 0)
+ return ret;
+
+ rp = pcie_find_root_port(pdev);
+ if (!rp || !rp->bus || !rp->bus->bridge)
+ return -ENODEV;
+
+ rp_acpi_handle = ACPI_HANDLE(rp->bus->bridge);
+ if (tph_invoke_dsm(rp_acpi_handle, cpu_uid, info) != AE_OK)
+ return -EINVAL;
+
+ return 0;
+}
#endif
+static bool tph_dsm_supported(struct pci_dev *pdev)
+{
+#ifdef CONFIG_ACPI
+ struct pci_dev *rp = pcie_find_root_port(pdev);
+ acpi_handle rp_acpi_handle;
+
+ if (!rp || !rp->bus || !rp->bus->bridge)
+ return false;
+
+ rp_acpi_handle = ACPI_HANDLE(rp->bus->bridge);
+ return acpi_check_dsm(rp_acpi_handle, &pci_acpi_dsm_guid, 7,
+ BIT(TPH_ST_DSM_FUNC_INDEX));
+#endif
+ return false;
+}
+
/* Update the TPH Requester Enable field of TPH Control Register */
static void set_ctrl_reg_req_en(struct pci_dev *pdev, u8 req_type)
{
@@ -230,31 +270,37 @@ static int write_tag_to_st_table(struct pci_dev *pdev, int index, u16 tag)
return pci_write_config_word(pdev, offset, tag);
}
+static int get_cpu_all_st(struct pci_dev *pdev, unsigned int cpu,
+ struct pci_tph_cpu_st *st)
+{
+#ifdef CONFIG_ACPI
+ union st_info info;
+ int ret;
+
+ ret = tph_get_cpu_st_info(pdev, cpu, &info);
+ if (ret == 0) {
+ st->vm_st = info.vm_st_valid ? info.vm_st : 0;
+ st->pm_st = info.pm_st_valid ? info.pm_st : 0;
+ st->vm_xst = info.vm_xst_valid ? info.vm_xst : 0;
+ st->pm_xst = info.pm_xst_valid ? info.pm_xst : 0;
+ }
+
+ return ret;
+#endif
+ return -ENODEV;
+}
+
static int get_cpu_st(struct pci_dev *pdev, enum tph_mem_type mem_type,
u8 req_type, unsigned int cpu, u16 *tag)
{
#ifdef CONFIG_ACPI
- struct pci_dev *rp;
- acpi_handle rp_acpi_handle;
union st_info info;
- u32 cpu_uid;
int ret;
- ret = acpi_get_cpu_uid(cpu, &cpu_uid);
+ ret = tph_get_cpu_st_info(pdev, cpu, &info);
if (ret != 0)
return ret;
- rp = pcie_find_root_port(pdev);
- if (!rp || !rp->bus || !rp->bus->bridge)
- return -ENODEV;
-
- rp_acpi_handle = ACPI_HANDLE(rp->bus->bridge);
-
- if (tph_invoke_dsm(rp_acpi_handle, cpu_uid, &info) != AE_OK) {
- *tag = 0;
- return -EINVAL;
- }
-
*tag = tph_extract_tag(mem_type, req_type, &info);
pci_dbg(pdev, "get steering tag: mem_type=%s, req_type=%u, cpu=%d, tag=%#04x\n",
@@ -615,3 +661,78 @@ bool pcie_tph_supported(struct pci_dev *pdev, bool want_ext)
return pdev->tph_ext_support;
}
EXPORT_SYMBOL(pcie_tph_supported);
+
+static ssize_t tph_cpu_st_read(struct file *filp, struct kobject *kobj,
+ const struct bin_attribute *bin_attr, char *buf,
+ loff_t off, size_t count)
+{
+ struct pci_dev *pdev = to_pci_dev(kobj_to_dev(kobj));
+ const size_t entry_sz = PCI_TPH_CPU_ST_ENTRY_SZ;
+ const size_t total_size = nr_cpu_ids * entry_sz;
+ size_t copied = 0;
+ loff_t pos = off;
+
+ if (pos >= total_size)
+ return 0;
+
+ count = min_t(size_t, count, total_size - pos);
+
+ while (copied < count) {
+ unsigned int cpu_idx = pos / entry_sz;
+ size_t entry_off = pos % entry_sz;
+ size_t remain = entry_sz - entry_off;
+ size_t chunk = min_t(size_t, remain, count - copied);
+ struct pci_tph_cpu_st st = {0};
+
+ if (cpu_possible(cpu_idx))
+ get_cpu_all_st(pdev, cpu_idx, &st);
+
+ memcpy(buf + copied, (char *)&st + entry_off, chunk);
+
+ copied += chunk;
+ pos += chunk;
+
+ cond_resched();
+ }
+
+ return copied;
+}
+static BIN_ATTR(tph_cpu_st, 0400, tph_cpu_st_read, NULL, 0);
+
+static const struct bin_attribute *const tph_cpu_st_bin_attrs[] = {
+ &bin_attr_tph_cpu_st,
+ NULL,
+};
+
+static size_t tph_cpu_st_bin_size(struct kobject *kobj,
+ const struct bin_attribute *a, int n)
+{
+ return nr_cpu_ids * PCI_TPH_CPU_ST_ENTRY_SZ;
+}
+
+static umode_t tph_cpu_st_attr_is_visible(struct kobject *kobj,
+ const struct bin_attribute *a, int n)
+{
+ struct pci_dev *pdev = to_pci_dev(kobj_to_dev(kobj));
+ bool is_root_port = pci_is_pcie(pdev) &&
+ pci_pcie_type(pdev) == PCI_EXP_TYPE_ROOT_PORT;
+ u32 devcap2 = 0;
+
+ if (!is_root_port)
+ return 0;
+
+ pci_read_config_dword(pdev, PCI_EXP_DEVCAP2, &devcap2);
+ if (!(devcap2 & PCI_EXP_DEVCAP2_TPH_COMP_MASK))
+ return 0;
+
+ if (!tph_dsm_supported(pdev))
+ return 0;
+
+ return a->attr.mode;
+}
+
+const struct attribute_group pcie_tph_cpu_st_attr_group = {
+ .bin_attrs = tph_cpu_st_bin_attrs,
+ .bin_size = tph_cpu_st_bin_size,
+ .is_bin_visible = tph_cpu_st_attr_is_visible,
+};
diff --git a/include/uapi/linux/pci.h b/include/uapi/linux/pci.h
index 4f150028965d..5c4ea44d66d2 100644
--- a/include/uapi/linux/pci.h
+++ b/include/uapi/linux/pci.h
@@ -19,6 +19,7 @@
#define _UAPILINUX_PCI_H
#include <linux/pci_regs.h> /* The pci register defines */
+#include <linux/types.h>
/*
* The PCI interface treats multi-function devices as independent
@@ -46,4 +47,19 @@ enum pci_hotplug_event {
PCI_HOTPLUG_CARD_NOT_PRESENT,
};
+/*
+ * PCIe TPH sysfs binary entry for CPU-to-ST mapping
+ * Sysfs file: /sys/bus/pci/devices/<BDF>/tph_cpu_st
+ * Each entry is 8 bytes aligned, seek offset = cpu_id * PCI_TPH_CPU_ST_ENTRY_SZ
+ */
+struct pci_tph_cpu_st {
+ __u8 vm_st; /* Volatile Memory Steering Tag (1 byte) */
+ __u8 pm_st; /* Persistent Memory Steering Tag (1 byte) */
+ __u16 vm_xst; /* Volatile Memory Extended Steering Tag (2 bytes) */
+ __u16 pm_xst; /* Persistent Memory Extended Steering Tag (2 bytes) */
+ __u16 reserved; /* Padding to 8 bytes for aligned offset lookup */
+} __packed;
+
+#define PCI_TPH_CPU_ST_ENTRY_SZ sizeof(struct pci_tph_cpu_st)
+
#endif /* _UAPILINUX_PCI_H */
--
2.17.1
* [PATCH RESEND v18 09/12] vfio/pci: Hide TPH capability when TPH is unsupported
2026-06-23 8:50 [PATCH RESEND v18 00/12] vfio/pci: Add PCIe TPH support Chengwen Feng
` (7 preceding siblings ...)
2026-06-23 8:50 ` [PATCH RESEND v18 08/12] PCI/TPH: Add sysfs binary file to export CPU to steering-tag mapping Chengwen Feng
@ 2026-06-23 8:50 ` Chengwen Feng
2026-06-23 9:07 ` sashiko-bot
2026-06-23 8:50 ` [PATCH RESEND v18 10/12] vfio/pci: Add TPH_ENABLE feature skeleton and unsafe module parameter Chengwen Feng
` (2 subsequent siblings)
11 siblings, 1 reply; 25+ messages in thread
From: Chengwen Feng @ 2026-06-23 8:50 UTC (permalink / raw)
To: alex, jgg, helgaas
Cc: wathsala.vithanage, wei.huang2, zhipingz, wangzhou1, wangyushan12,
liuyonglong, kvm, linux-pci
Check the device's negotiated TPH support status before parsing the TPH
extended capability. Return zero length to hide the capability from
userspace if TPH was disabled during topology negotiation.
Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
---
drivers/vfio/pci/vfio_pci_config.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/vfio/pci/vfio_pci_config.c b/drivers/vfio/pci/vfio_pci_config.c
index a10ed733f0e3..5c6ab172df6c 100644
--- a/drivers/vfio/pci/vfio_pci_config.c
+++ b/drivers/vfio/pci/vfio_pci_config.c
@@ -22,6 +22,7 @@
#include <linux/fs.h>
#include <linux/pci.h>
+#include <linux/pci-tph.h>
#include <linux/uaccess.h>
#include <linux/vfio.h>
#include <linux/slab.h>
@@ -1450,6 +1451,8 @@ static int vfio_ext_cap_len(struct vfio_pci_core_device *vdev, u16 ecap, u16 epo
byte &= PCI_DPA_CAP_SUBSTATE_MASK;
return PCI_DPA_BASE_SIZEOF + byte + 1;
case PCI_EXT_CAP_ID_TPH:
+ if (!pcie_tph_supported(pdev, false))
+ return 0;
ret = pci_read_config_dword(pdev, epos + PCI_TPH_CAP, &dword);
if (ret)
return pcibios_err_to_errno(ret);
--
2.17.1
* [PATCH RESEND v18 10/12] vfio/pci: Add TPH_ENABLE feature skeleton and unsafe module parameter
2026-06-23 8:50 [PATCH RESEND v18 00/12] vfio/pci: Add PCIe TPH support Chengwen Feng
` (8 preceding siblings ...)
2026-06-23 8:50 ` [PATCH RESEND v18 09/12] vfio/pci: Hide TPH capability when TPH is unsupported Chengwen Feng
@ 2026-06-23 8:50 ` Chengwen Feng
2026-06-23 9:03 ` sashiko-bot
2026-06-23 8:50 ` [PATCH RESEND v18 11/12] vfio/pci: Add TPH_ST_CONFIG for PCIe TPH ST configuration Chengwen Feng
2026-06-23 8:50 ` [PATCH RESEND v18 12/12] vfio/pci: Virtualize PCIe TPH capability registers Chengwen Feng
11 siblings, 1 reply; 25+ messages in thread
From: Chengwen Feng @ 2026-06-23 8:50 UTC (permalink / raw)
To: alex, jgg, helgaas
Cc: wathsala.vithanage, wei.huang2, zhipingz, wangzhou1, wangyushan12,
liuyonglong, kvm, linux-pci
Introduce the enable_unsafe_tph module parameter to gate all TPH-related
features, and add the VFIO_DEVICE_FEATURE_TPH_ENABLE uAPI together with a
per-device tph_permit flag.
This is a preparatory change: only the feature framework is added for
now. The actual TPH_CTRL register permission control and steering tag
features (TPH_CPU_ST / TPH_ST_CONFIG) are attached in the subsequent
TPH capability virtualization commits.
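
A rough sketch of the per-device opt-in from userspace, assuming an
already-open VFIO device fd. The feature number 13 is the value proposed
by this patch and is not in released uAPI headers, so it is defined
locally; tph_enable_fill and tph_enable are hypothetical helper names.

```c
#include <assert.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/vfio.h>

/* Value proposed by this patch; assumed absent from installed headers. */
#ifndef VFIO_DEVICE_FEATURE_TPH_ENABLE
#define VFIO_DEVICE_FEATURE_TPH_ENABLE 13
#endif

/* Fill a payload-less VFIO_DEVICE_FEATURE header for SET of TPH_ENABLE. */
void tph_enable_fill(struct vfio_device_feature *feat)
{
	memset(feat, 0, sizeof(*feat));
	feat->argsz = sizeof(*feat);
	feat->flags = VFIO_DEVICE_FEATURE_SET |
		      VFIO_DEVICE_FEATURE_TPH_ENABLE;
}

/* Opt in on an open VFIO device fd; returns the ioctl() result. */
int tph_enable(int device_fd)
{
	struct vfio_device_feature feat;

	tph_enable_fill(&feat);
	return ioctl(device_fd, VFIO_DEVICE_FEATURE, &feat);
}
```

With this patch applied, the call is expected to fail with EOPNOTSUPP
unless the module was loaded with enable_unsafe_tph set.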
Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
---
drivers/vfio/pci/vfio_pci.c | 13 ++++++++++++-
drivers/vfio/pci/vfio_pci_config.c | 1 +
drivers/vfio/pci/vfio_pci_core.c | 25 ++++++++++++++++++++++++-
include/linux/vfio_pci_core.h | 4 +++-
include/uapi/linux/vfio.h | 7 +++++++
5 files changed, 47 insertions(+), 3 deletions(-)
diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
index 0c771064c0b8..6d73668459cf 100644
--- a/drivers/vfio/pci/vfio_pci.c
+++ b/drivers/vfio/pci/vfio_pci.c
@@ -60,6 +60,12 @@ static bool disable_denylist;
module_param(disable_denylist, bool, 0444);
MODULE_PARM_DESC(disable_denylist, "Disable use of device denylist. Disabling the denylist allows binding to devices with known errata that may lead to exploitable stability or security issues when accessed by untrusted users.");
+#ifdef CONFIG_PCIE_TPH
+static bool enable_unsafe_tph;
+module_param(enable_unsafe_tph, bool, 0444);
+MODULE_PARM_DESC(enable_unsafe_tph, "Enable PCIe TPH (Transaction Processing Hints) support. It may break platform isolation. If you do not know what this is for, step away. (default: false)");
+#endif
+
static bool vfio_pci_dev_in_denylist(struct pci_dev *pdev)
{
switch (pdev->vendor) {
@@ -257,12 +263,17 @@ static int __init vfio_pci_init(void)
{
int ret;
bool is_disable_vga = true;
+ bool is_enable_unsafe_tph = false;
#ifdef CONFIG_VFIO_PCI_VGA
is_disable_vga = disable_vga;
#endif
+#ifdef CONFIG_PCIE_TPH
+ is_enable_unsafe_tph = enable_unsafe_tph;
+#endif
- vfio_pci_core_set_params(nointxmask, is_disable_vga, disable_idle_d3);
+ vfio_pci_core_set_params(nointxmask, is_disable_vga, disable_idle_d3,
+ is_enable_unsafe_tph);
/* Register and scan for devices */
ret = pci_register_driver(&vfio_pci_driver);
diff --git a/drivers/vfio/pci/vfio_pci_config.c b/drivers/vfio/pci/vfio_pci_config.c
index 5c6ab172df6c..251d3ec7fdd4 100644
--- a/drivers/vfio/pci/vfio_pci_config.c
+++ b/drivers/vfio/pci/vfio_pci_config.c
@@ -1783,6 +1783,7 @@ int vfio_config_init(struct vfio_pci_core_device *vdev)
goto out;
vdev->bardirty = true;
+ vdev->tph_permit = false;
/*
* XXX can we just pci_load_saved_state/pci_restore_state?
diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
index a28f1e99362c..b0193afca875 100644
--- a/drivers/vfio/pci/vfio_pci_core.c
+++ b/drivers/vfio/pci/vfio_pci_core.c
@@ -41,6 +41,7 @@
static bool nointxmask;
static bool disable_vga;
static bool disable_idle_d3;
+static bool enable_unsafe_tph;
static void vfio_pci_eventfd_rcu_free(struct rcu_head *rcu)
{
@@ -1554,6 +1555,24 @@ static int vfio_pci_core_feature_token(struct vfio_pci_core_device *vdev,
return 0;
}
+static int vfio_pci_core_feature_tph_enable(struct vfio_pci_core_device *vdev,
+ u32 flags, size_t argsz)
+{
+ int ret;
+
+ if (!enable_unsafe_tph)
+ return -EOPNOTSUPP;
+
+ ret = vfio_check_feature(flags, argsz, VFIO_DEVICE_FEATURE_SET, 0);
+ if (ret <= 0)
+ return ret;
+
+ if (!vdev->tph_permit)
+ vdev->tph_permit = 1;
+
+ return 0;
+}
+
int vfio_pci_core_ioctl_feature(struct vfio_device *device, u32 flags,
void __user *arg, size_t argsz)
{
@@ -1572,6 +1591,8 @@ int vfio_pci_core_ioctl_feature(struct vfio_device *device, u32 flags,
return vfio_pci_core_feature_token(vdev, flags, arg, argsz);
case VFIO_DEVICE_FEATURE_DMA_BUF:
return vfio_pci_core_feature_dma_buf(vdev, flags, arg, argsz);
+ case VFIO_DEVICE_FEATURE_TPH_ENABLE:
+ return vfio_pci_core_feature_tph_enable(vdev, flags, argsz);
default:
return -ENOTTY;
}
@@ -2615,11 +2636,13 @@ static void vfio_pci_dev_set_try_reset(struct vfio_device_set *dev_set)
}
void vfio_pci_core_set_params(bool is_nointxmask, bool is_disable_vga,
- bool is_disable_idle_d3)
+ bool is_disable_idle_d3,
+ bool is_enable_unsafe_tph)
{
nointxmask = is_nointxmask;
disable_vga = is_disable_vga;
disable_idle_d3 = is_disable_idle_d3;
+ enable_unsafe_tph = is_enable_unsafe_tph;
}
EXPORT_SYMBOL_GPL(vfio_pci_core_set_params);
diff --git a/include/linux/vfio_pci_core.h b/include/linux/vfio_pci_core.h
index 5fc6ce4dd786..d551e530dd86 100644
--- a/include/linux/vfio_pci_core.h
+++ b/include/linux/vfio_pci_core.h
@@ -128,6 +128,7 @@ struct vfio_pci_core_device {
bool pm_intx_masked:1;
bool pm_runtime_engaged:1;
bool sriov_active;
+ bool tph_permit;
struct pci_saved_state *pci_saved_state;
struct pci_saved_state *pm_save;
int ioeventfds_nr;
@@ -158,7 +159,8 @@ int vfio_pci_core_register_dev_region(struct vfio_pci_core_device *vdev,
const struct vfio_pci_regops *ops,
size_t size, u32 flags, void *data);
void vfio_pci_core_set_params(bool nointxmask, bool is_disable_vga,
- bool is_disable_idle_d3);
+ bool is_disable_idle_d3,
+ bool is_enable_unsafe_tph);
void vfio_pci_core_close_device(struct vfio_device *core_vdev);
int vfio_pci_core_init_dev(struct vfio_device *core_vdev);
void vfio_pci_core_release_dev(struct vfio_device *core_vdev);
diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index 5de618a3a5ee..e5a4d1d7091b 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -1534,6 +1534,13 @@ struct vfio_device_feature_dma_buf {
*/
#define VFIO_DEVICE_FEATURE_MIG_PRECOPY_INFOv2 12
+/*
+ * Device-level opt-in for TPH (Transaction Processing Hints) support.
+ * When set, allows access to TPH_CPU_ST and TPH_ST_CONFIG features.
+ * Requires global enable_unsafe_tph module parameter to be enabled.
+ */
+#define VFIO_DEVICE_FEATURE_TPH_ENABLE 13
+
/* -------- API for Type1 VFIO IOMMU -------- */
/**
--
2.17.1
* [PATCH RESEND v18 11/12] vfio/pci: Add TPH_ST_CONFIG for PCIe TPH ST configuration
2026-06-23 8:50 [PATCH RESEND v18 00/12] vfio/pci: Add PCIe TPH support Chengwen Feng
` (9 preceding siblings ...)
2026-06-23 8:50 ` [PATCH RESEND v18 10/12] vfio/pci: Add TPH_ENABLE feature skeleton and unsafe module parameter Chengwen Feng
@ 2026-06-23 8:50 ` Chengwen Feng
2026-06-23 9:11 ` sashiko-bot
2026-06-23 8:50 ` [PATCH RESEND v18 12/12] vfio/pci: Virtualize PCIe TPH capability registers Chengwen Feng
11 siblings, 1 reply; 25+ messages in thread
From: Chengwen Feng @ 2026-06-23 8:50 UTC (permalink / raw)
To: alex, jgg, helgaas
Cc: wathsala.vithanage, wei.huang2, zhipingz, wangzhou1, wangyushan12,
liuyonglong, kvm, linux-pci
Add a new VFIO device feature, VFIO_DEVICE_FEATURE_TPH_ST_CONFIG, that
lets userspace configure PCIe TPH Steering Tag table entries. The
interface is write-only: only SET is supported, and read operations are
not permitted.
Implement a shadow ST table to cache entries; updates are serialized
under the device's memory_lock. If a batch write fails partway, the
entries already written are rolled back from the shadow table so that
hardware and shadow state stay consistent.
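
The batch-write-with-rollback behaviour can be sketched with a small
userspace model. Everything here is hypothetical scaffolding: mock_dev
stands in for the device and its shadow table, and mock_set_st_entry
stands in for pcie_tph_set_st_entry() with an injectable failure.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MOCK_ENTRIES 8

/* Hypothetical device model: hw[] is the ST table, shadow[] the cache. */
struct mock_dev {
	uint16_t hw[MOCK_ENTRIES];
	uint16_t shadow[MOCK_ENTRIES];
	int fail_at;    /* index whose write fails, or -1 for none */
};

/* Stand-in for pcie_tph_set_st_entry(): 0 on success, -1 on failure. */
int mock_set_st_entry(struct mock_dev *d, unsigned int idx, uint16_t tag)
{
	if ((int)idx == d->fail_at)
		return -1;
	d->hw[idx] = tag;
	return 0;
}

/* Write count tags starting at index; all-or-nothing from hw's view. */
int mock_st_config(struct mock_dev *d, unsigned int index,
		   unsigned int count, const uint16_t *sts)
{
	unsigned int i;

	/* Overflow-safe range check, same shape as in the patch. */
	if (count == 0 || index >= MOCK_ENTRIES ||
	    count > MOCK_ENTRIES - index)
		return -1;

	for (i = 0; i < count; i++) {
		if (mock_set_st_entry(d, index + i, sts[i])) {
			/* Restore entries already written from the shadow. */
			while (i-- > 0)
				mock_set_st_entry(d, index + i,
						  d->shadow[index + i]);
			return -1;
		}
	}

	/* Only a fully successful batch updates the shadow table. */
	memcpy(&d->shadow[index], sts, count * sizeof(*sts));
	return 0;
}
```

The shadow table is updated only after every hardware write succeeded,
which is what makes it a safe source for the rollback values.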
The feature is double gated:
1. The global enable_unsafe_tph module parameter must be enabled;
2. Userspace must first SET VFIO_DEVICE_FEATURE_TPH_ENABLE to set the
per-device tph_permit flag before using TPH_ST_CONFIG.
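
For illustration, a userspace request for TPH_ST_CONFIG might be built
as follows once both gates are open. The feature number and the
vfio_device_feature_tph_st_config layout are copied from this patch and
defined locally because they are assumed absent from installed headers;
tph_st_config_req, tph_st_config_fill and tph_st_config are hypothetical
names for this sketch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/types.h>
#include <linux/vfio.h>

#ifndef VFIO_DEVICE_FEATURE_TPH_ST_CONFIG
#define VFIO_DEVICE_FEATURE_TPH_ST_CONFIG 14
/* Copied from this patch's uAPI addition. */
struct vfio_device_feature_tph_st_config {
	__u16 index;
	__u16 count;
	__u32 reserved;         /* must be zero */
	__aligned_u64 data_uptr;
};
#endif

/* struct vfio_device_feature header (argsz, flags) followed by payload. */
struct tph_st_config_req {
	__u32 argsz;
	__u32 flags;
	struct vfio_device_feature_tph_st_config cfg;
};

/* Describe a write of count 16-bit tags starting at ST entry index. */
void tph_st_config_fill(struct tph_st_config_req *req, uint16_t index,
			uint16_t count, const uint16_t *tags)
{
	memset(req, 0, sizeof(*req));
	req->argsz = sizeof(*req);
	req->flags = VFIO_DEVICE_FEATURE_SET |
		     VFIO_DEVICE_FEATURE_TPH_ST_CONFIG;
	req->cfg.index = index;
	req->cfg.count = count;
	req->cfg.data_uptr = (uintptr_t)tags;
}

/* Issue the request on an open VFIO device fd; returns ioctl()'s result. */
int tph_st_config(int device_fd, uint16_t index, uint16_t count,
		  const uint16_t *tags)
{
	struct tph_st_config_req req;

	tph_st_config_fill(&req, index, count, tags);
	return ioctl(device_fd, VFIO_DEVICE_FEATURE, &req);
}
```

The tags are passed by pointer rather than inline, so argsz covers only
the fixed header and config struct, not the tag array itself.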
Design note on the sashiko-bot warning about the shadow table across
reset:
tph_st_shadow is deliberately not cleared on FLR/device reset. A
userspace VFIO application can detect hardware reset events and reprogram
the full ST table afterward, which resynchronizes the shadow cache with
hardware state. Retaining the cached ST entries also supports offline
error diagnosis and post-reset recovery.
Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
---
drivers/vfio/pci/vfio_pci_config.c | 1 -
drivers/vfio/pci/vfio_pci_core.c | 128 +++++++++++++++++++++++++++++
include/linux/vfio_pci_core.h | 2 +
include/uapi/linux/vfio.h | 22 +++++
4 files changed, 152 insertions(+), 1 deletion(-)
diff --git a/drivers/vfio/pci/vfio_pci_config.c b/drivers/vfio/pci/vfio_pci_config.c
index 251d3ec7fdd4..5c6ab172df6c 100644
--- a/drivers/vfio/pci/vfio_pci_config.c
+++ b/drivers/vfio/pci/vfio_pci_config.c
@@ -1783,7 +1783,6 @@ int vfio_config_init(struct vfio_pci_core_device *vdev)
goto out;
vdev->bardirty = true;
- vdev->tph_permit = false;
/*
* XXX can we just pci_load_saved_state/pci_restore_state?
diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
index b0193afca875..c327eff8e9cc 100644
--- a/drivers/vfio/pci/vfio_pci_core.c
+++ b/drivers/vfio/pci/vfio_pci_core.c
@@ -29,6 +29,7 @@
#include <linux/sched/mm.h>
#include <linux/iommufd.h>
#include <linux/pci-p2pdma.h>
+#include <linux/pci-tph.h>
#if IS_ENABLED(CONFIG_EEH)
#include <asm/eeh.h>
#endif
@@ -532,6 +533,52 @@ static const struct dev_pm_ops vfio_pci_core_pm_ops = {
NULL)
};
+static int vfio_pci_tph_st_shadow_size(struct vfio_pci_core_device *vdev)
+{
+ struct pci_dev *pdev = vdev->pdev;
+ u32 loc = pcie_tph_get_st_table_loc(pdev);
+ int ret;
+
+ if (loc == PCI_TPH_LOC_CAP) {
+ return pcie_tph_get_st_table_size(pdev);
+ } else if (loc == PCI_TPH_LOC_MSIX) {
+ ret = pci_msix_vec_count(pdev);
+ if (ret < 0)
+ return 0;
+ return ret;
+ } else {
+ return 0;
+ }
+}
+
+static int vfio_pci_tph_init(struct vfio_pci_core_device *vdev)
+{
+ vdev->tph_st_entries = 0;
+ vdev->tph_st_shadow = NULL;
+ vdev->tph_permit = false;
+
+ if (!enable_unsafe_tph)
+ return 0;
+
+ vdev->tph_st_entries = vfio_pci_tph_st_shadow_size(vdev);
+ if (vdev->tph_st_entries) {
+ vdev->tph_st_shadow = kcalloc(vdev->tph_st_entries, sizeof(u16),
+ GFP_KERNEL_ACCOUNT);
+ if (!vdev->tph_st_shadow)
+ return -ENOMEM;
+ }
+
+ return 0;
+}
+
+static void vfio_pci_tph_deinit(struct vfio_pci_core_device *vdev)
+{
+ kfree(vdev->tph_st_shadow);
+ vdev->tph_st_shadow = NULL;
+ vdev->tph_st_entries = 0;
+ vdev->tph_permit = false;
+}
+
int vfio_pci_core_enable(struct vfio_pci_core_device *vdev)
{
struct pci_dev *pdev = vdev->pdev;
@@ -558,6 +605,11 @@ int vfio_pci_core_enable(struct vfio_pci_core_device *vdev)
goto out_disable_device;
vdev->reset_works = !ret;
+
+ ret = vfio_pci_tph_init(vdev);
+ if (ret)
+ goto out_disable_device;
+
pci_save_state(pdev);
vdev->pci_saved_state = pci_store_saved_state(pdev);
if (!vdev->pci_saved_state)
@@ -615,6 +667,7 @@ int vfio_pci_core_enable(struct vfio_pci_core_device *vdev)
out_free_state:
kfree(vdev->pci_saved_state);
vdev->pci_saved_state = NULL;
+ vfio_pci_tph_deinit(vdev);
out_disable_device:
pci_disable_device(pdev);
out_power:
@@ -683,6 +736,7 @@ void vfio_pci_core_disable(struct vfio_pci_core_device *vdev)
kfree(vdev->region);
vdev->region = NULL; /* don't krealloc a freed pointer */
+ vfio_pci_tph_deinit(vdev);
vfio_config_free(vdev);
for (i = 0; i < PCI_STD_NUM_BARS; i++) {
@@ -1573,6 +1627,77 @@ static int vfio_pci_core_feature_tph_enable(struct vfio_pci_core_device *vdev,
return 0;
}
+static int vfio_pci_core_feature_tph_st_config(
+ struct vfio_pci_core_device *vdev,
+ u32 flags,
+ struct vfio_device_feature_tph_st_config __user *arg,
+ size_t argsz)
+{
+ struct vfio_device_feature_tph_st_config config;
+ struct pci_dev *pdev = vdev->pdev;
+ void __user *uptr;
+ int i, idx, ret;
+ size_t sz;
+ u16 *sts;
+
+ if (!enable_unsafe_tph)
+ return -EOPNOTSUPP;
+
+ ret = vfio_check_feature(flags, argsz, VFIO_DEVICE_FEATURE_SET,
+ sizeof(config));
+ if (ret <= 0)
+ return ret;
+
+ if (!vdev->tph_permit || !vdev->tph_st_shadow)
+ return -EINVAL;
+
+ if (copy_from_user(&config, arg, sizeof(config)))
+ return -EFAULT;
+
+ if (config.count == 0 || config.reserved != 0 ||
+ config.index >= vdev->tph_st_entries ||
+ config.count > vdev->tph_st_entries - config.index)
+ return -EINVAL;
+
+ uptr = u64_to_user_ptr(config.data_uptr);
+ sts = memdup_array_user(uptr, config.count, sizeof(u16));
+ sz = config.count * sizeof(u16);
+ if (IS_ERR(sts))
+ return PTR_ERR(sts);
+
+ down_write(&vdev->memory_lock);
+ ret = vfio_pci_set_power_state(vdev, PCI_D0);
+ if (ret)
+ goto out_unlock_memory;
+
+ if (pcie_tph_enabled_req_type(pdev) == PCI_TPH_REQ_DISABLE)
+ goto update_shadow;
+
+ for (i = 0; i < config.count; i++) {
+ idx = config.index + i;
+ ret = pcie_tph_set_st_entry(pdev, idx, sts[i]);
+ if (ret)
+ goto rollback;
+ }
+
+update_shadow:
+ memcpy(&vdev->tph_st_shadow[config.index], sts, sz);
+ ret = 0;
+ goto out_unlock_memory;
+
+rollback:
+ while (i-- > 0) {
+ idx = config.index + i;
+ pcie_tph_set_st_entry(pdev, idx, vdev->tph_st_shadow[idx]);
+ }
+
+out_unlock_memory:
+ up_write(&vdev->memory_lock);
+
+ kfree(sts);
+ return ret;
+}
+
int vfio_pci_core_ioctl_feature(struct vfio_device *device, u32 flags,
void __user *arg, size_t argsz)
{
@@ -1593,6 +1718,9 @@ int vfio_pci_core_ioctl_feature(struct vfio_device *device, u32 flags,
return vfio_pci_core_feature_dma_buf(vdev, flags, arg, argsz);
case VFIO_DEVICE_FEATURE_TPH_ENABLE:
return vfio_pci_core_feature_tph_enable(vdev, flags, argsz);
+ case VFIO_DEVICE_FEATURE_TPH_ST_CONFIG:
+ return vfio_pci_core_feature_tph_st_config(vdev, flags,
+ arg, argsz);
default:
return -ENOTTY;
}
diff --git a/include/linux/vfio_pci_core.h b/include/linux/vfio_pci_core.h
index d551e530dd86..527c84f042aa 100644
--- a/include/linux/vfio_pci_core.h
+++ b/include/linux/vfio_pci_core.h
@@ -144,6 +144,8 @@ struct vfio_pci_core_device {
struct notifier_block nb;
struct rw_semaphore memory_lock;
struct list_head dmabufs;
+ u16 *tph_st_shadow;
+ u16 tph_st_entries;
};
enum vfio_pci_io_width {
diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index e5a4d1d7091b..61079594a91f 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -1541,6 +1541,28 @@ struct vfio_device_feature_dma_buf {
*/
#define VFIO_DEVICE_FEATURE_TPH_ENABLE 13
+/**
+ * VFIO_DEVICE_FEATURE_TPH_ST_CONFIG - Configure PCIe TPH Steering Tag entries
+ *
+ * Provides userspace interface to configure PCIe TPH ST table entries.
+ *
+ * @index: Start entry offset within ST table
+ * @count: Number of consecutive entries to configure
+ * @data_uptr: Userspace data buffer for 16-bit raw ST values
+ *
+ * This feature requires two preconditions:
+ * 1. Global enable_unsafe_tph module parameter is enabled;
+ * 2. VFIO_DEVICE_FEATURE_TPH_ENABLE has been SET on the device beforehand.
+ */
+#define VFIO_DEVICE_FEATURE_TPH_ST_CONFIG 14
+
+struct vfio_device_feature_tph_st_config {
+ __u16 index;
+ __u16 count;
+ __u32 reserved; /* Reserved for future use, must be zero */
+ __aligned_u64 data_uptr;
+};
+
/* -------- API for Type1 VFIO IOMMU -------- */
/**
--
2.17.1
* [PATCH RESEND v18 12/12] vfio/pci: Virtualize PCIe TPH capability registers
2026-06-23 8:50 [PATCH RESEND v18 00/12] vfio/pci: Add PCIe TPH support Chengwen Feng
` (10 preceding siblings ...)
2026-06-23 8:50 ` [PATCH RESEND v18 11/12] vfio/pci: Add TPH_ST_CONFIG for PCIe TPH ST configuration Chengwen Feng
@ 2026-06-23 8:50 ` Chengwen Feng
2026-06-23 9:13 ` sashiko-bot
11 siblings, 1 reply; 25+ messages in thread
From: Chengwen Feng @ 2026-06-23 8:50 UTC (permalink / raw)
To: alex, jgg, helgaas
Cc: wathsala.vithanage, wei.huang2, zhipingz, wangzhou1, wangyushan12,
liuyonglong, kvm, linux-pci
Virtualize the TPH extended capability config space registers:
- The TPH capability was previously fully read-only; permissions are now
split: the TPH_CAP header remains read-only, while the TPH_CTRL register
accepts writes to toggle the TPH requester enable mode.
- Block direct ST table programming through config space writes: ST entry
configuration is only possible through VFIO_DEVICE_FEATURE_TPH_ST_CONFIG,
and only after userspace has SET TPH_ENABLE to opt in.
- Back up the original virtual config value and revert vconfig if the
hardware TPH enable operation fails or an invalid requester mode is
written.
- After the TPH requester is enabled via a CTRL write, sync the cached
shadow ST table to the physical hardware, under memory_lock and after
ensuring the device is in PCI D0.
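
The write handler only takes the enable/rollback path when a config write
actually touches the byte holding the Requester Enable field. A small
sketch of that range test, assuming PCI_TPH_CTRL is at offset 8 within
the capability and the field sits in bits 9:8 of that register (so byte
PCI_TPH_CTRL + 1); write_covers_req_en is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>

#define TPH_CTRL_OFFSET 8                      /* assumed PCI_TPH_CTRL */
#define TPH_REQ_EN_BYTE (TPH_CTRL_OFFSET + 1)  /* byte holding bits 9:8 */

/*
 * True if a write of count bytes at offset (relative to the start of the
 * capability) includes the Requester Enable byte.
 */
bool write_covers_req_en(int offset, int count)
{
	return offset <= TPH_REQ_EN_BYTE &&
	       offset + count > TPH_REQ_EN_BYTE;
}
```

A one-byte write to offset 8 changes only the ST Mode Select field and
therefore skips the hardware enable path, while a dword write to offset 8
or a byte write to offset 9 goes through it.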
Add vconfig masking, via the new vfio_tph_mask_ext_tph_bit() helper, to
hide the EXT_TPH capability bit when the underlying hardware does not
support Extended TPH.
Reset the hardware TPH state on device open/close so that TPH
configuration cannot leak between sessions, for example across different
VM lifecycles.
Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
---
drivers/vfio/pci/vfio_pci_config.c | 117 +++++++++++++++++++++++++++++
drivers/vfio/pci/vfio_pci_core.c | 4 +
2 files changed, 121 insertions(+)
diff --git a/drivers/vfio/pci/vfio_pci_config.c b/drivers/vfio/pci/vfio_pci_config.c
index 5c6ab172df6c..10f4e9fabea7 100644
--- a/drivers/vfio/pci/vfio_pci_config.c
+++ b/drivers/vfio/pci/vfio_pci_config.c
@@ -1086,6 +1086,118 @@ static int __init init_pci_ext_cap_pwr_perm(struct perm_bits *perm)
return 0;
}
+/* Permissions for TPH extended capability */
+static int __init init_pci_ext_cap_tph_perm(struct perm_bits *perm)
+{
+ int i;
+
+ if (alloc_perm_bits(perm, pci_ext_cap_length[PCI_EXT_CAP_ID_TPH]))
+ return -ENOMEM;
+
+ p_setd(perm, 0, ALL_VIRT, NO_WRITE);
+ p_setd(perm, PCI_TPH_CAP, ALL_VIRT, NO_WRITE);
+
+ p_setd(perm, PCI_TPH_CTRL, ALL_VIRT, ALL_WRITE);
+
+ /* Per PCI specification: There is an upper limit of 64 entries
+ * when the ST table is located in the TPH Requester Extended
+ * Capability structure.
+ * And the pci_ext_cap_length[PCI_EXT_CAP_ID_TPH] is 0xFF, so the
+ * following operation is fine.
+ */
+ for (i = 0; i < 64; i++)
+ p_setw(perm, PCI_TPH_BASE_SIZEOF + i * sizeof(u16),
+ (u16)ALL_VIRT, (u16)ALL_WRITE);
+
+ return 0;
+}
+
+static void vfio_tph_mask_ext_tph_bit(struct vfio_pci_core_device *vdev,
+ int pos)
+{
+ __le32 *vptr = (__le32 *)&vdev->vconfig[pos + PCI_TPH_CAP];
+ struct pci_dev *pdev = vdev->pdev;
+ u32 val;
+
+ if (!pcie_tph_supported(pdev, true)) {
+ val = le32_to_cpu(*vptr);
+ val &= ~PCI_TPH_CAP_EXT_TPH;
+ *vptr = cpu_to_le32(val);
+ }
+}
+
+static int vfio_find_cap_start(struct vfio_pci_core_device *vdev, int pos);
+static int vfio_tph_config_write(struct vfio_pci_core_device *vdev, int pos,
+ int count, struct perm_bits *perm,
+ int offset, __le32 val)
+{
+ int req_en_byte = PCI_TPH_CTRL + 1;
+ struct pci_dev *pdev = vdev->pdev;
+ __le32 org_val = 0;
+ bool extended;
+ u8 mode, req;
+ int i, ret;
+ u16 start;
+ u32 data;
+
+ if (!vdev->tph_permit)
+ return count;
+
+ down_write(&vdev->memory_lock);
+
+ /* Back up the original value so it can be rolled back on failure */
+ if (offset <= req_en_byte && offset + count > req_en_byte)
+ vfio_default_config_read(vdev, pos, count, perm, offset,
+ &org_val);
+
+ ret = vfio_default_config_write(vdev, pos, count, perm, offset, val);
+ if (ret != count) {
+ up_write(&vdev->memory_lock);
+ return ret;
+ }
+
+ /* Skip if write range does not cover Requester Enable byte */
+ if (offset > req_en_byte || offset + count <= req_en_byte) {
+ up_write(&vdev->memory_lock);
+ return count;
+ }
+
+ ret = vfio_pci_set_power_state(vdev, PCI_D0);
+ if (ret) {
+ vfio_default_config_write(vdev, pos, count, perm, offset,
+ org_val);
+ up_write(&vdev->memory_lock);
+ return count;
+ }
+
+ start = vfio_find_cap_start(vdev, pos);
+ data = le32_to_cpu(*(__le32 *)&vdev->vconfig[start + PCI_TPH_CTRL]);
+ mode = FIELD_GET(PCI_TPH_CTRL_MODE_SEL_MASK, data);
+ req = FIELD_GET(PCI_TPH_CTRL_REQ_EN_MASK, data);
+
+ if (req == PCI_TPH_REQ_TPH_ONLY || req == PCI_TPH_REQ_EXT_TPH) {
+ extended = !!(req == PCI_TPH_REQ_EXT_TPH);
+ ret = pcie_enable_tph_explicit(pdev, mode, extended);
+ if (!ret && vdev->tph_st_shadow) {
+ for (i = 0; i < vdev->tph_st_entries; i++)
+ pcie_tph_set_st_entry(pdev, i,
+ vdev->tph_st_shadow[i]);
+ }
+ if (ret)
+ vfio_default_config_write(vdev, pos, count, perm,
+ offset, org_val);
+ } else if (req == PCI_TPH_REQ_DISABLE) {
+ pcie_disable_tph(vdev->pdev);
+ } else {
+ vfio_default_config_write(vdev, pos, count, perm, offset,
+ org_val);
+ }
+
+ up_write(&vdev->memory_lock);
+
+ return count;
+}
+
/*
* Initialize the shared permission tables
*/
@@ -1101,6 +1213,7 @@ void vfio_pci_uninit_perm_bits(void)
free_perm_bits(&ecap_perms[PCI_EXT_CAP_ID_ERR]);
free_perm_bits(&ecap_perms[PCI_EXT_CAP_ID_PWR]);
+ free_perm_bits(&ecap_perms[PCI_EXT_CAP_ID_TPH]);
}
int __init vfio_pci_init_perm_bits(void)
@@ -1121,6 +1234,8 @@ int __init vfio_pci_init_perm_bits(void)
/* Extended capabilities */
ret |= init_pci_ext_cap_err_perm(&ecap_perms[PCI_EXT_CAP_ID_ERR]);
ret |= init_pci_ext_cap_pwr_perm(&ecap_perms[PCI_EXT_CAP_ID_PWR]);
+ ret |= init_pci_ext_cap_tph_perm(&ecap_perms[PCI_EXT_CAP_ID_TPH]);
+ ecap_perms[PCI_EXT_CAP_ID_TPH].writefn = vfio_tph_config_write;
ecap_perms[PCI_EXT_CAP_ID_VNDR].writefn = vfio_raw_config_write;
ecap_perms[PCI_EXT_CAP_ID_DVSEC].writefn = vfio_raw_config_write;
@@ -1704,6 +1819,8 @@ static int vfio_ecap_init(struct vfio_pci_core_device *vdev)
ret = vfio_fill_vconfig_bytes(vdev, epos, len);
if (ret)
return ret;
+ if (ecap == PCI_EXT_CAP_ID_TPH && !hidden)
+ vfio_tph_mask_ext_tph_bit(vdev, epos);
/*
* If we're just using this capability to anchor the list,
diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
index c327eff8e9cc..1e706a690dbd 100644
--- a/drivers/vfio/pci/vfio_pci_core.c
+++ b/drivers/vfio/pci/vfio_pci_core.c
@@ -606,6 +606,8 @@ int vfio_pci_core_enable(struct vfio_pci_core_device *vdev)
vdev->reset_works = !ret;
+ /* Reset TPH status on new user session */
+ pcie_disable_tph(vdev->pdev);
ret = vfio_pci_tph_init(vdev);
if (ret)
goto out_disable_device;
@@ -736,6 +738,8 @@ void vfio_pci_core_disable(struct vfio_pci_core_device *vdev)
kfree(vdev->region);
vdev->region = NULL; /* don't krealloc a freed pointer */
+ /* Reset TPH status on session exit */
+ pcie_disable_tph(vdev->pdev);
vfio_pci_tph_deinit(vdev);
vfio_config_free(vdev);
--
2.17.1
* Re: [PATCH RESEND v18 06/12] PCI/TPH: Expose the enabled TPH requester type
2026-06-23 8:50 ` [PATCH RESEND v18 06/12] PCI/TPH: Expose the enabled TPH requester type Chengwen Feng
@ 2026-06-23 8:57 ` sashiko-bot
0 siblings, 0 replies; 25+ messages in thread
From: sashiko-bot @ 2026-06-23 8:57 UTC (permalink / raw)
To: Chengwen Feng; +Cc: linux-pci, kvm
> From: Zhiping Zhang <zhipingz@meta.com>
>
> Add pcie_tph_enabled_req_type() so drivers can query the enabled TPH
> requester mode without reaching into pci_dev internals.
>
> Signed-off-by: Zhiping Zhang <zhipingz@meta.com>
> Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
Sashiko has reviewed this patch and found no issues. It looks great!
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260623085048.51139-1-fengchengwen@huawei.com?part=6
* Re: [PATCH RESEND v18 01/12] PCI/TPH: Fix pcie_tph_get_st_table_loc() field extraction
2026-06-23 8:50 ` [PATCH RESEND v18 01/12] PCI/TPH: Fix pcie_tph_get_st_table_loc() field extraction Chengwen Feng
@ 2026-06-23 8:59 ` sashiko-bot
0 siblings, 0 replies; 25+ messages in thread
From: sashiko-bot @ 2026-06-23 8:59 UTC (permalink / raw)
To: Chengwen Feng; +Cc: kvm, linux-pci
> pcie_tph_get_st_table_loc() incorrectly uses FIELD_GET(), which shifts the
> field value to bit 0. But the function is designed to return raw
> PCI_TPH_LOC_* values as defined in the function comment.
>
> This causes incorrect ST table location detection. Fix it by using bitwise
> AND with PCI_TPH_CAP_LOC_MASK to return the unshifted field value matching
> the function specification.
>
> This doesn't make a difference to mlx5_st_create(), the lone external
> caller, because it only checks for PCI_TPH_LOC_NONE (0), but will be needed
> for callers that check for PCI_TPH_LOC_CAP or PCI_TPH_LOC_MSIX.
>
> Also add tph_cap validation for pcie_tph_get_st_table_loc() to prevent
> invalid PCI configuration space access when TPH is not supported. Add stub
> functions for pcie_tph_get_st_table_size() and pcie_tph_get_st_table_loc()
> when !CONFIG_PCIE_TPH.
>
> Fixes: d2e8a34876ce ("PCI/TPH: Add Steering Tag support")
> Cc: stable@vger.kernel.org
> Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
Sashiko has reviewed this patch and found no issues. It looks great!
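For readers following the fix, the difference between a shifted and an
unshifted field can be sketched in userspace C. This is only an
illustration: the constants are assumed to mirror
include/uapi/linux/pci_regs.h, and field_get_loc() is a hypothetical
stand-in for the kernel's FIELD_GET(), not the real macro.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed to mirror include/uapi/linux/pci_regs.h. */
#define PCI_TPH_CAP_LOC_MASK 0x00000600u /* ST table location, bits 10:9 */
#define PCI_TPH_LOC_NONE     0x00000000u
#define PCI_TPH_LOC_CAP      0x00000200u
#define PCI_TPH_LOC_MSIX     0x00000400u

/* Hypothetical stand-in for FIELD_GET(): mask, then shift down to bit 0. */
static uint32_t field_get_loc(uint32_t reg)
{
	return (reg & PCI_TPH_CAP_LOC_MASK) >> 9;
}

/* What the fixed helper is described as returning: the raw field. */
static uint32_t raw_loc(uint32_t reg)
{
	return reg & PCI_TPH_CAP_LOC_MASK;
}
```

With a register value of 0x400, the shifted form yields 2, which matches
none of the PCI_TPH_LOC_* constants except by accident at zero, while the
masked form compares equal to PCI_TPH_LOC_MSIX.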
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260623085048.51139-1-fengchengwen@huawei.com?part=1
* Re: [PATCH RESEND v18 02/12] PCI/TPH: Fix tph_enabled concurrent update race by bitfield packing
2026-06-23 8:50 ` [PATCH RESEND v18 02/12] PCI/TPH: Fix tph_enabled concurrent update race by bitfield packing Chengwen Feng
@ 2026-06-23 9:00 ` sashiko-bot
0 siblings, 0 replies; 25+ messages in thread
From: sashiko-bot @ 2026-06-23 9:00 UTC (permalink / raw)
To: Chengwen Feng; +Cc: kvm, linux-pci
> Split tph_enabled from shared pci_dev bitfield into spare bit of tph_cap's
> u16: tph_cap is immutable post-enumeration (15 bits for offset), remaining
> 1 bit stores tph_enabled. Removes cross-bitfield concurrent write hazards
> highlighted by Sashiko after VFIO TPH exposure. No functional changes.
>
> Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
Sashiko has reviewed this patch and found no issues. It looks great!
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260623085048.51139-1-fengchengwen@huawei.com?part=2
* Re: [PATCH RESEND v18 03/12] PCI/TPH: Cache TPH requester capability at probe time
2026-06-23 8:50 ` [PATCH RESEND v18 03/12] PCI/TPH: Cache TPH requester capability at probe time Chengwen Feng
@ 2026-06-23 9:00 ` sashiko-bot
0 siblings, 0 replies; 25+ messages in thread
From: sashiko-bot @ 2026-06-23 9:00 UTC (permalink / raw)
To: Chengwen Feng; +Cc: kvm, linux-pci
> Calculate the negotiated TPH requester type from device and root port
> capabilities once in pci_tph_init().
>
> Add tph_ext_support flag to cache whether the device is allowed to
> issue Extended TPH requests after topology negotiation. If the final
> requester type is disabled, clear TPH capability to prevent usage.
>
> Simplify pcie_enable_tph() by using the cached requester capability
> instead of recalculating every time.
>
> Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
Sashiko has reviewed this patch and found no issues. It looks great!
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260623085048.51139-1-fengchengwen@huawei.com?part=3
* Re: [PATCH RESEND v18 05/12] PCI/TPH: Refactor pcie_tph_get_cpu_st & add explicit variant
2026-06-23 8:50 ` [PATCH RESEND v18 05/12] PCI/TPH: Refactor pcie_tph_get_cpu_st & add explicit variant Chengwen Feng
@ 2026-06-23 9:02 ` sashiko-bot
0 siblings, 0 replies; 25+ messages in thread
From: sashiko-bot @ 2026-06-23 9:02 UTC (permalink / raw)
To: Chengwen Feng; +Cc: linux-pci, kvm
> Refactor pcie_tph_get_cpu_st(): extract core logic into static internal
> get_cpu_st() helper accepting explicit requester type parameter.
>
> - Preserve original pcie_tph_get_cpu_st() unchanged as auto wrapper; it
> uses existing pdev->tph_req_type automatically, existing callers require
> no change.
> - Add exported pcie_tph_get_cpu_st_explicit() with bool 'extended'
> parameter for manual STD/EXT requester selection, consumed by upcoming
> VFIO TPH code.
> - Add capability check: reject explicit EXT request when device does not
> support extended TPH requester.
>
> Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
Sashiko has reviewed this patch and found no issues. It looks great!
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260623085048.51139-1-fengchengwen@huawei.com?part=5
* Re: [PATCH RESEND v18 08/12] PCI/TPH: Add sysfs binary file to export CPU to steering-tag mapping
2026-06-23 8:50 ` [PATCH RESEND v18 08/12] PCI/TPH: Add sysfs binary file to export CPU to steering-tag mapping Chengwen Feng
@ 2026-06-23 9:02 ` sashiko-bot
0 siblings, 0 replies; 25+ messages in thread
From: sashiko-bot @ 2026-06-23 9:02 UTC (permalink / raw)
To: Chengwen Feng; +Cc: linux-pci, kvm
Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:
New issues:
- [High] Using `pci_read_config_dword()` to read the PCIe capability register `PCI_EXP_DEVCAP2` reads from the wrong offset in the base PCI configuration space.
Pre-existing issues:
- [High] Missing ACPI buffer length check before dereferencing u64 data in `tph_invoke_dsm()`, leading to a potential out-of-bounds read.
--
commit a6ad2b09ffbe79acf9a6193d9d9e38bc463af8b8
Author: Chengwen Feng <fengchengwen@huawei.com>
PCI/TPH: Add sysfs binary file to export CPU to steering-tag mapping
This commit adds a per-Root-Port sysfs binary attribute tph_cpu_st to expose
the ACPI DSM CPU-to-ST mapping to userspace. It supports unaligned partial
reads and implements dynamic visibility rules to ensure it is only exposed
on PCIe Root Ports that support TPH and provide a valid ACPI DSM.
> diff --git a/drivers/pci/tph.c b/drivers/pci/tph.c
> index c080cf5401a31..bb6f24eccafcc 100644
> --- a/drivers/pci/tph.c
> +++ b/drivers/pci/tph.c
[ ... ]
> @@ -130,8 +131,47 @@ static acpi_status tph_invoke_dsm(acpi_handle handle, u32 cpu_uid,
>
> return AE_OK;
> }
> +
> +static int tph_get_cpu_st_info(struct pci_dev *pdev, unsigned int cpu,
> + union st_info *info)
> +{
> + acpi_handle rp_acpi_handle;
> + struct pci_dev *rp;
> + u32 cpu_uid;
> + int ret;
> +
> + ret = acpi_get_cpu_uid(cpu, &cpu_uid);
> + if (ret != 0)
> + return ret;
> +
> + rp = pcie_find_root_port(pdev);
> + if (!rp || !rp->bus || !rp->bus->bridge)
> + return -ENODEV;
> +
> + rp_acpi_handle = ACPI_HANDLE(rp->bus->bridge);
> + if (tph_invoke_dsm(rp_acpi_handle, cpu_uid, info) != AE_OK)
[Severity: High]
This is a pre-existing issue, but does tph_invoke_dsm() lack an ACPI buffer
length check before dereferencing u64 data?
Looking at tph_invoke_dsm(), it appears to unconditionally cast and dereference
the returned ACPI buffer pointer:
drivers/pci/tph.c:tph_invoke_dsm() {
...
if (out_obj->type != ACPI_TYPE_BUFFER) {
ACPI_FREE(out_obj);
return AE_ERROR;
}
st_out->value = *((u64 *)(out_obj->buffer.pointer));
...
}
If a buggy or malicious ACPI BIOS returns a buffer smaller than 8 bytes, could
this result in an out-of-bounds memory read?
Although pre-existing, exposing this via the new tph_cpu_st sysfs attribute
might allow a root user to systematically trigger this evaluation across all
possible CPUs.
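One possible shape for such a check, as a userspace sketch rather than a
proposed patch: struct buf_obj and read_st_value() are hypothetical names
that only model the length/pointer pair of an ACPI buffer object.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the buffer part of an ACPI object. */
struct buf_obj {
	uint32_t length;	/* bytes available at pointer */
	const uint8_t *pointer;
};

/* Copy out a u64 only if the firmware returned at least 8 bytes. */
static int read_st_value(const struct buf_obj *obj, uint64_t *out)
{
	if (!obj->pointer || obj->length < sizeof(*out))
		return -1;	/* a caller would map this to AE_ERROR */

	/* memcpy() also avoids assuming the buffer is 8-byte aligned. */
	memcpy(out, obj->pointer, sizeof(*out));
	return 0;
}
```

A short buffer is rejected before any read takes place, so the output
value is left untouched in that case.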
> + return -EINVAL;
> +
> + return 0;
> +}
[ ... ]
> @@ -615,3 +661,78 @@ bool pcie_tph_supported(struct pci_dev *pdev, bool want_ext)
[ ... ]
> +static umode_t tph_cpu_st_attr_is_visible(struct kobject *kobj,
> + const struct bin_attribute *a, int n)
> +{
> + struct pci_dev *pdev = to_pci_dev(kobj_to_dev(kobj));
> + bool is_root_port = pci_is_pcie(pdev) &&
> + pci_pcie_type(pdev) == PCI_EXP_TYPE_ROOT_PORT;
> + u32 devcap2 = 0;
> +
> + if (!is_root_port)
> + return 0;
> +
> + pci_read_config_dword(pdev, PCI_EXP_DEVCAP2, &devcap2);
[Severity: High]
Does this code read from the wrong offset in the base PCI configuration space?
PCI_EXP_DEVCAP2 is an offset relative to the PCIe Capability structure, but
pci_read_config_dword() takes an absolute offset in the base PCI config space.
Should this use pcie_capability_read_dword() instead so the attribute
visibility is determined correctly?
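To illustrate the offset arithmetic, here is a userspace model of a
256-byte config space. The register constants are assumed to match
pci_regs.h; find_pcie_cap() and read_devcap2() are hypothetical helpers
written for this sketch, not kernel functions.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed to match include/uapi/linux/pci_regs.h. */
#define PCI_STATUS          0x06
#define PCI_STATUS_CAP_LIST 0x10
#define PCI_CAPABILITY_LIST 0x34
#define PCI_CAP_ID_EXP      0x10
#define PCI_EXP_DEVCAP2     0x24	/* relative to the PCIe capability */

/* Walk the capability list and return the PCIe capability base, or 0. */
static uint8_t find_pcie_cap(const uint8_t cfg[256])
{
	uint8_t pos;
	int ttl = 48;	/* bound the walk in case the list loops */

	if (!(cfg[PCI_STATUS] & PCI_STATUS_CAP_LIST))
		return 0;

	pos = cfg[PCI_CAPABILITY_LIST] & 0xfc;
	while (pos >= 0x40 && ttl--) {
		if (cfg[pos] == PCI_CAP_ID_EXP)
			return pos;
		pos = cfg[pos + 1] & 0xfc;
	}
	return 0;
}

/* Read DEVCAP2 at capability base + relative offset (little-endian). */
static int read_devcap2(const uint8_t cfg[256], uint32_t *val)
{
	unsigned int off = find_pcie_cap(cfg);

	if (!off || off + PCI_EXP_DEVCAP2 + 4 > 256)
		return -1;

	off += PCI_EXP_DEVCAP2;
	*val = (uint32_t)cfg[off] | ((uint32_t)cfg[off + 1] << 8) |
	       ((uint32_t)cfg[off + 2] << 16) | ((uint32_t)cfg[off + 3] << 24);
	return 0;
}
```

In this model a read at absolute offset 0x24 lands in the BAR area of the
header, whereas the capability-relative read returns the intended register.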
> + if (!(devcap2 & PCI_EXP_DEVCAP2_TPH_COMP_MASK))
> + return 0;
> +
> + if (!tph_dsm_supported(pdev))
> + return 0;
> +
> + return a->attr.mode;
> +}
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260623085048.51139-1-fengchengwen@huawei.com?part=8
* Re: [PATCH RESEND v18 10/12] vfio/pci: Add TPH_ENABLE feature skeleton and unsafe module parameter
2026-06-23 8:50 ` [PATCH RESEND v18 10/12] vfio/pci: Add TPH_ENABLE feature skeleton and unsafe module parameter Chengwen Feng
@ 2026-06-23 9:03 ` sashiko-bot
0 siblings, 0 replies; 25+ messages in thread
From: sashiko-bot @ 2026-06-23 9:03 UTC (permalink / raw)
To: Chengwen Feng; +Cc: linux-pci, kvm
> Introduce module param enable_unsafe_tph to gate all TPH related features,
> and add VFIO_DEVICE_FEATURE_TPH_ENABLE uapi together with per-device
> tph_permit flag.
>
> This is a preparatory implementation: only feature framework is added
> for now, actual TPH_CTRL register permission control and steering tag
> features (TPH_CPU_ST / TPH_ST_CONFIG) will be attached in subsequent
> TPH capability virtualization commits.
>
> Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
Sashiko has reviewed this patch and found no issues. It looks great!
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260623085048.51139-1-fengchengwen@huawei.com?part=10
* Re: [PATCH RESEND v18 04/12] PCI/TPH: Refactor pcie_enable_tph & add explicit requester variant
2026-06-23 8:50 ` [PATCH RESEND v18 04/12] PCI/TPH: Refactor pcie_enable_tph & add explicit requester variant Chengwen Feng
@ 2026-06-23 9:04 ` sashiko-bot
0 siblings, 0 replies; 25+ messages in thread
From: sashiko-bot @ 2026-06-23 9:04 UTC (permalink / raw)
To: Chengwen Feng; +Cc: kvm, linux-pci
Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
Pre-existing issues:
- [High] Concurrent execution of `pcie_tph_set_st_entry` can violate the PCIe spec and corrupt hardware state by modifying the ST table while TPH is globally enabled.
--
commit e3b30f139c34eadf8039e394826ffa947655fe86
Author: Chengwen Feng <fengchengwen@huawei.com>
PCI/TPH: Refactor pcie_enable_tph & add explicit requester variant
This commit refactors the pcie_enable_tph implementation by extracting the
core logic into a static internal enable_tph helper that accepts an explicit
requester type. It preserves the original pcie_enable_tph behavior as an
auto-detecting wrapper and adds a new pcie_enable_tph_explicit function
for explicit standard or extended TPH selection.
> diff --git a/drivers/pci/tph.c b/drivers/pci/tph.c
> index 4097787ea98a1..c22eb7f4b0bd7 100644
> --- a/drivers/pci/tph.c
> +++ b/drivers/pci/tph.c
[ ... ]
> @@ -400,10 +384,11 @@ int pcie_enable_tph(struct pci_dev *pdev, int mode)
> if (!((1 << mode) & dev_modes))
> return -EINVAL;
>
> - pdev->tph_mode = mode;
> + if (req_type == PCI_TPH_REQ_EXT_TPH && !pdev->tph_ext_support)
> + return -EINVAL;
>
> - pdev->tph_req_type = pdev->tph_ext_support ? PCI_TPH_REQ_EXT_TPH :
> - PCI_TPH_REQ_TPH_ONLY;
> + pdev->tph_mode = mode;
> + pdev->tph_req_type = req_type;
>
> /* Write them into TPH control register */
> pci_read_config_dword(pdev, pdev->tph_cap + PCI_TPH_CTRL, &reg);
[Severity: High]
This is a pre-existing issue, but does the TPH subsystem lack necessary
locking around the TPH Control Register and ST table updates?
If multiple MSI-X vectors' SMP affinities are updated simultaneously by a
user or irqbalance, concurrent calls to pcie_tph_set_st_entry could happen.
Since pcie_tph_set_st_entry temporarily disables TPH system-wide:
pcie_tph_set_st_entry()
set_ctrl_reg_req_en(pdev, PCI_TPH_REQ_DISABLE);
...
loc = pcie_tph_get_st_table_loc(pdev);
... write to ST table ...
...
set_ctrl_reg_req_en(pdev, pdev->tph_req_type);
Would CPU2 re-enable TPH while CPU1 is still writing to the ST table,
violating the PCIe r6.2 sec 6.17.3 specification which cautions against
updating the ST table while TPH is enabled?
Additionally, set_ctrl_reg_req_en performs an unprotected read-modify-write:
set_ctrl_reg_req_en()
pci_read_config_dword(pdev, pdev->tph_cap + PCI_TPH_CTRL, &reg);
FIELD_MODIFY(PCI_TPH_CTRL_REQ_EN_MASK, &reg, req_type);
pci_write_config_dword(pdev, pdev->tph_cap + PCI_TPH_CTRL, reg);
Could concurrent IRQ notifiers cause a data race here, corrupting the TPH
control register state?
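As a rough userspace model of the serialization being asked about, the
sketch below holds one lock across the whole disable, write, re-enable
sequence. Everything here is an assumption for illustration: struct
tph_model, the per-device lock and the helper names are hypothetical, and
the Requester Enable mask is assumed to be bits 9:8.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

#define REQ_EN_MASK  0x00000300u	/* assumed: Requester Enable, bits 9:8 */
#define REQ_EN_SHIFT 8
#define ST_ENTRIES   64

struct tph_model {
	pthread_mutex_t lock;		/* hypothetical per-device lock */
	uint32_t ctrl;			/* models the TPH Control register */
	uint16_t st[ST_ENTRIES];	/* models the ST table */
};

static void tph_model_init(struct tph_model *m, uint32_t req_type)
{
	pthread_mutex_init(&m->lock, NULL);
	m->ctrl = (req_type << REQ_EN_SHIFT) & REQ_EN_MASK;
	for (unsigned int i = 0; i < ST_ENTRIES; i++)
		m->st[i] = 0;
}

/* Read-modify-write of Requester Enable; the caller holds m->lock. */
static void set_req_en(struct tph_model *m, uint32_t req_type)
{
	m->ctrl = (m->ctrl & ~REQ_EN_MASK) |
		  ((req_type << REQ_EN_SHIFT) & REQ_EN_MASK);
}

static int set_st_entry(struct tph_model *m, unsigned int idx, uint16_t tag,
			uint32_t req_type)
{
	if (idx >= ST_ENTRIES)
		return -1;

	pthread_mutex_lock(&m->lock);
	set_req_en(m, 0);			/* disable before the table write */
	assert((m->ctrl & REQ_EN_MASK) == 0);	/* invariant for the write */
	m->st[idx] = tag;
	set_req_en(m, req_type);		/* restore only after the write */
	pthread_mutex_unlock(&m->lock);
	return 0;
}
```

Because the lock covers the full sequence, a second caller cannot re-enable
the requester while the first is still writing its entry; whether a mutex
or another primitive fits the real call paths is left open here.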
> FIELD_MODIFY(PCI_TPH_CTRL_MODE_SEL_MASK, &reg, pdev->tph_mode);
> FIELD_MODIFY(PCI_TPH_CTRL_REQ_EN_MASK, &reg, pdev->tph_req_type);
>
> pci_write_config_dword(pdev, pdev->tph_cap + PCI_TPH_CTRL, reg);
>
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260623085048.51139-1-fengchengwen@huawei.com?part=4
* Re: [PATCH RESEND v18 09/12] vfio/pci: Hide TPH capability when TPH is unsupported
2026-06-23 8:50 ` [PATCH RESEND v18 09/12] vfio/pci: Hide TPH capability when TPH is unsupported Chengwen Feng
@ 2026-06-23 9:07 ` sashiko-bot
0 siblings, 0 replies; 25+ messages in thread
From: sashiko-bot @ 2026-06-23 9:07 UTC (permalink / raw)
To: Chengwen Feng; +Cc: linux-pci, kvm
Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
Pre-existing issues:
- [High] Pre-existing kernel heap buffer overflow in `vfio_ecap_init()` due to unvalidated extended capability lengths.
--
commit 04070fd3c8083d2b670da1000c99d6fc517e4114
Author: Chengwen Feng <fengchengwen@huawei.com>
vfio/pci: Hide TPH capability when TPH is unsupported
Check the device negotiated TPH support status before parsing the TPH
extended capability. Return zero length to hide the capability from
userspace if TPH is disabled during topology negotiation.
> diff --git a/drivers/vfio/pci/vfio_pci_config.c b/drivers/vfio/pci/vfio_pci_config.c
> index a10ed733f0e3a..5c6ab172df6c0 100644
> --- a/drivers/vfio/pci/vfio_pci_config.c
> +++ b/drivers/vfio/pci/vfio_pci_config.c
> @@ -22,6 +22,7 @@
>
> #include <linux/fs.h>
> #include <linux/pci.h>
> +#include <linux/pci-tph.h>
> #include <linux/uaccess.h>
> #include <linux/vfio.h>
> #include <linux/slab.h>
[ ... ]
> @@ -1450,6 +1451,8 @@ static int vfio_ext_cap_len(struct vfio_pci_core_device *vdev, u16 ecap, u16 epo
> byte &= PCI_DPA_CAP_SUBSTATE_MASK;
> return PCI_DPA_BASE_SIZEOF + byte + 1;
> case PCI_EXT_CAP_ID_TPH:
> + if (!pcie_tph_supported(pdev, false))
> + return 0;
> ret = pci_read_config_dword(pdev, epos + PCI_TPH_CAP, &dword);
[Severity: High]
This isn't a bug introduced by this patch, but I noticed a pre-existing
issue in how the returned capability length is handled.
Does the returned length from vfio_ext_cap_len() need to be validated
against pdev->cfg_size?
In vfio_ecap_init(), the length returned here is used directly:
drivers/vfio/pci/vfio_pci_config.c:vfio_ecap_init() {
...
len = vfio_ext_cap_len(vdev, ecap, epos);
if (!len) {
...
continue;
}
...
memset(map + epos, ecap, len);
ret = vfio_fill_vconfig_bytes(vdev, epos, len);
...
}
If a malicious or buggy PCIe device advertises an extended capability like
TPH with a large ST table length (e.g., up to 4108 bytes), could epos + len
exceed pdev->cfg_size?
If so, it appears the subsequent memset() and vfio_fill_vconfig_bytes()
operations could write out of bounds into the map and vconfig allocations.
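A bounds check of the kind suggested here could look roughly like the
following userspace sketch. ecap_fits() is a hypothetical helper name, and
cfg_size stands in for pdev->cfg_size.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* True when [epos, epos + len) lies entirely inside config space. */
static bool ecap_fits(size_t epos, size_t len, size_t cfg_size)
{
	/* Written as a subtraction so that epos + len cannot wrap. */
	return len != 0 && epos <= cfg_size && len <= cfg_size - epos;
}
```

With a 4096-byte config space, a capability at 0x200 that claims 4108
bytes fails the check, so the memset() and fill steps would be skipped.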
> if (ret)
> return pcibios_err_to_errno(ret);
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260623085048.51139-1-fengchengwen@huawei.com?part=9
* Re: [PATCH RESEND v18 07/12] PCI/TPH: Add pcie_tph_supported() helper to check TPH capability attributes
2026-06-23 8:50 ` [PATCH RESEND v18 07/12] PCI/TPH: Add pcie_tph_supported() helper to check TPH capability attributes Chengwen Feng
@ 2026-06-23 9:07 ` sashiko-bot
0 siblings, 0 replies; 25+ messages in thread
From: sashiko-bot @ 2026-06-23 9:07 UTC (permalink / raw)
To: Chengwen Feng; +Cc: linux-pci, kvm
Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
- [Medium] The `pcie_tph_supported()` helper ignores the system-wide `pci_tph_disabled` flag, causing VFIO to incorrectly expose an unusable TPH capability to guests.
--
commit 8a6b48383e4674ff26e8119bccd07a94e9e6b180
Author: Chengwen Feng <fengchengwen@huawei.com>
PCI/TPH: Add pcie_tph_supported() helper to check TPH capability attributes
This commit adds the pcie_tph_supported() helper to check if a PCI device
has a valid TPH capability or Extended TPH support. The helper is intended
for use by VFIO to query TPH existence and capabilities uniformly.
> diff --git a/drivers/pci/tph.c b/drivers/pci/tph.c
> --- a/drivers/pci/tph.c
> +++ b/drivers/pci/tph.c
[ ... ]
> @@ -596,3 +596,22 @@ void pci_tph_init(struct pci_dev *pdev)
> save_size = sizeof(u32) + num_entries * sizeof(u16);
> pci_add_ext_cap_save_buffer(pdev, PCI_EXT_CAP_ID_TPH, save_size);
> }
[ ... ]
> +/**
> + * pcie_tph_supported - Check TPH capability attribute
> + * @pdev: PCI device to query
> + * @want_ext: false - check TPH cap exists; true - check EXT_TPH support
> + *
> + * Return: true on matched condition, false otherwise
> + */
> +bool pcie_tph_supported(struct pci_dev *pdev, bool want_ext)
> +{
> + if (!pdev->tph_cap)
> + return false;
> +
[Severity: Medium]
Should this function also check the system-wide pci_tph_disabled flag?
If the host administrator explicitly disables TPH via the notph boot
parameter, pci_tph_disabled is set to true. Because this helper only
checks pdev->tph_cap, it appears VFIO will still evaluate this as true
and expose a "zombie" TPH capability to guests.
The guest would see the capability but silently fail to enable it since
VFIO's configuration intercept correctly enforces pci_tph_disabled.
Would it be better to return false here if pci_tph_disabled is set, so
we respect the administrator's global disable intent and avoid exposing
unusable capabilities to guests?
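In model form, the suggested behaviour might look like the sketch below.
struct dev_model and tph_supported() are hypothetical userspace stand-ins,
and the tph_disabled variable only models pci_tph_disabled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the fields the helper consults. */
struct dev_model {
	uint16_t tph_cap;	/* 0 when the capability is absent */
	bool tph_ext_support;
};

static bool tph_disabled;	/* models the global pci_tph_disabled flag */

static bool tph_supported(const struct dev_model *d, bool want_ext)
{
	/* A global disable hides TPH regardless of device capability. */
	if (tph_disabled || !d->tph_cap)
		return false;

	return want_ext ? d->tph_ext_support : true;
}
```

With the global flag set, both the basic and the extended query return
false, so a caller such as VFIO would hide the capability entirely.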
> + if (!want_ext)
> + return true;
> +
> + return pdev->tph_ext_support;
> +}
> +EXPORT_SYMBOL(pcie_tph_supported);
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260623085048.51139-1-fengchengwen@huawei.com?part=7
* Re: [PATCH RESEND v18 11/12] vfio/pci: Add TPH_ST_CONFIG for PCIe TPH ST configuration
2026-06-23 8:50 ` [PATCH RESEND v18 11/12] vfio/pci: Add TPH_ST_CONFIG for PCIe TPH ST configuration Chengwen Feng
@ 2026-06-23 9:11 ` sashiko-bot
0 siblings, 0 replies; 25+ messages in thread
From: sashiko-bot @ 2026-06-23 9:11 UTC (permalink / raw)
To: Chengwen Feng; +Cc: kvm, linux-pci
> Add a new VFIO device feature VFIO_DEVICE_FEATURE_TPH_ST_CONFIG to allow
> userspace to configure PCIe TPH Steering Tag table entries. This interface
> supports only configuration writes, read operations are not permitted.
>
> Implement shadow ST table to cache entries, paired with per-device mutex
> for concurrent access protection. Batch write failure triggers entry
> rollback to guarantee hardware and shadow table consistency.
>
> The feature is double gated:
> 1. Global enable_unsafe_tph module parameter must be enabled;
> 2. Userspace must first SET VFIO_DEVICE_FEATURE_TPH_ENABLE to set the
> per-device tph_permit flag before using TPH_ST_CONFIG.
>
> Design note for Sashiko reset shadow table warning:
> Do not clear tph_st_shadow on FLR/device reset. Userspace VFIO application
> can detect hardware reset events and re-initialize full ST table
> configuration to sync shadow cache with hardware state afterward. Retain
> cached ST entries to support offline error diagnosis and post-reset
> recovery.
>
> Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
Sashiko has reviewed this patch and found no issues. It looks great!
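For illustration only, the batch-with-rollback behaviour described in the
commit message can be modelled in userspace as below. The table size,
struct st_update and apply_batch() are hypothetical and do not reflect the
actual uapi layout.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define ST_ENTRIES 8	/* hypothetical table size for the model */

struct st_update {
	unsigned int index;
	uint16_t tag;
};

/* Apply every update, or restore the saved table if any one is invalid. */
static int apply_batch(uint16_t shadow[ST_ENTRIES],
		       const struct st_update *upd, unsigned int n)
{
	uint16_t saved[ST_ENTRIES];
	unsigned int i;

	memcpy(saved, shadow, sizeof(saved));
	for (i = 0; i < n; i++) {
		if (upd[i].index >= ST_ENTRIES) {
			/* Roll back entries already written in this batch. */
			memcpy(shadow, saved, sizeof(saved));
			return -1;
		}
		shadow[upd[i].index] = upd[i].tag;
	}
	return 0;
}
```

A batch whose second entry is out of range leaves the table exactly as it
was before the call, including the slot its first entry had touched.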
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260623085048.51139-1-fengchengwen@huawei.com?part=11
* Re: [PATCH RESEND v18 12/12] vfio/pci: Virtualize PCIe TPH capability registers
2026-06-23 8:50 ` [PATCH RESEND v18 12/12] vfio/pci: Virtualize PCIe TPH capability registers Chengwen Feng
@ 2026-06-23 9:13 ` sashiko-bot
0 siblings, 0 replies; 25+ messages in thread
From: sashiko-bot @ 2026-06-23 9:13 UTC (permalink / raw)
To: Chengwen Feng; +Cc: linux-pci, kvm
Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
- [Medium] Unchecked return value from `pcie_tph_set_st_entry()` causes state desynchronization between the virtual config space and physical hardware.
--
commit 94cd790796325d732c4348b7bef18a12078f330c
Author: Chengwen Feng <fengchengwen@huawei.com>
vfio/pci: Virtualize PCIe TPH capability registers
This commit virtualizes the TPH extended capability config space registers.
It splits permissions, blocks direct ST-table programming via config space,
and syncs the cached shadow ST table to physical hardware when the TPH
requester is enabled.
> diff --git a/drivers/vfio/pci/vfio_pci_config.c b/drivers/vfio/pci/vfio_pci_config.c
> index 5c6ab172df6c0..10f4e9fabea7c 100644
> --- a/drivers/vfio/pci/vfio_pci_config.c
> +++ b/drivers/vfio/pci/vfio_pci_config.c
[ ... ]
> +static int vfio_tph_config_write(struct vfio_pci_core_device *vdev, int pos,
> + int count, struct perm_bits *perm,
> + int offset, __le32 val)
> +{
[ ... ]
> + if (req == PCI_TPH_REQ_TPH_ONLY || req == PCI_TPH_REQ_EXT_TPH) {
> + extended = !!(req == PCI_TPH_REQ_EXT_TPH);
> + ret = pcie_enable_tph_explicit(pdev, mode, extended);
> + if (!ret && vdev->tph_st_shadow) {
> + for (i = 0; i < vdev->tph_st_entries; i++)
> + pcie_tph_set_st_entry(pdev, i,
> + vdev->tph_st_shadow[i]);
[Severity: Medium]
Should this loop check the return value of pcie_tph_set_st_entry()?
If syncing the Steering Tag (ST) table fails (for example, due to MSI-X
table access errors), pcie_tph_set_st_entry() disables TPH on the physical
hardware and returns an error code.
Because the return value is ignored here, it skips the rollback of the
virtual config space that would happen if ret was updated.
Could this result in the guest OS incorrectly believing TPH is enabled,
leading to an inconsistent state between the physical hardware and the
virtual config space?
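One way the check could be structured, sketched against a userspace model
rather than the real driver: struct hw_model, hw_set_st_entry() and
enable_and_sync() are hypothetical, and the failing write is simulated.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical hardware model: a write to index fail_at returns an error. */
struct hw_model {
	uint16_t st[8];
	int fail_at;	/* -1 means never fail */
	int enabled;
};

static int hw_set_st_entry(struct hw_model *hw, int idx, uint16_t tag)
{
	if (idx == hw->fail_at) {
		hw->enabled = 0;	/* models TPH being disabled on error */
		return -5;
	}
	hw->st[idx] = tag;
	return 0;
}

/* Sync the shadow table, stopping at the first failure. */
static int sync_shadow(struct hw_model *hw, const uint16_t *shadow, int n)
{
	int i, ret = 0;

	for (i = 0; i < n && !ret; i++)
		ret = hw_set_st_entry(hw, i, shadow[i]);
	return ret;
}

/* Returns the value the virtual CTRL register should hold afterwards. */
static uint32_t enable_and_sync(struct hw_model *hw, const uint16_t *shadow,
				int n, uint32_t new_ctrl, uint32_t old_ctrl)
{
	hw->enabled = 1;
	if (sync_shadow(hw, shadow, n))
		return old_ctrl;	/* roll the virtual register back */
	return new_ctrl;
}
```

When the simulated sync fails, the virtual register keeps its old value
and the model reports TPH as disabled, so both views agree.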
> + }
> + if (ret)
> + vfio_default_config_write(vdev, pos, count, perm,
> + offset, org_val);
> + } else if (req == PCI_TPH_REQ_DISABLE) {
> + pcie_disable_tph(vdev->pdev);
> + } else {
> + vfio_default_config_write(vdev, pos, count, perm, offset,
> + org_val);
> + }
> +
> + up_write(&vdev->memory_lock);
> +
> + return count;
> +}
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260623085048.51139-1-fengchengwen@huawei.com?part=12