* [PATCH v12 0/6] vfio/pci: Add PCIe TPH support
@ 2026-05-26 4:08 Chengwen Feng
2026-05-26 4:08 ` [PATCH v12 1/6] PCI/TPH: Fix pcie_tph_get_st_table_loc() field extraction Chengwen Feng
` (5 more replies)
0 siblings, 6 replies; 15+ messages in thread
From: Chengwen Feng @ 2026-05-26 4:08 UTC (permalink / raw)
To: alex, jgg
Cc: wathsala.vithanage, helgaas, wei.huang2, wangzhou1, wangyushan12,
liuyonglong, kvm, linux-pci
This patchset enables userspace control over PCIe TPH steering tags,
motivated by the following considerations:
1. Why userspace needs the capability to control steering tags:
When PCIe devices are fully owned by userspace workloads such as DPDK
and SPDK, only userspace has full knowledge of core binding policies
and traffic distribution strategies. Without this series, userspace
cannot enable TPH or configure steering tags, leaving built-in PCIe
performance optimizations unused in high-throughput polling I/O
scenarios.
2. Why this interface must be implemented in VFIO:
VFIO is the standard, secure community solution for granting full
PCIe device ownership to userspace. Existing kernel TPH interfaces
are designed purely for in-kernel drivers. For user-owned devices,
VFIO provides the only isolated and correct path to expose per-device
TPH management.
TPH supports both IV (Interrupt Vector) and DS (Device Specific) modes.
Both modes can introduce cross-VM isolation risks, such as untrusted
guests programming arbitrary steering tags that affect other domains:
1. If the ST table is located in the MSI-X table, untrusted guests may
program the MSI-X table.
2. If the ST table is located in neither the MSI-X table nor the TPH
capability, untrusted guests may program the device-specific register.
Therefore, a new module parameter `enable_unsafe_tph` is added. It
defaults to off, and blocks all unsafe TPH operations when disabled.
Based on earlier RFC work by Wathsala Vithanage
---
v12:
- Address Alex's comments:
- Support enabling NS_MODE from userspace
- Remove the restriction that get/set operations require TPH to be
enabled, by implementing a shadow ST table scheme
- Refine uAPI definition
v11:
- Address Sashiko review comments:
- Add tph_lock to protect all TPH operations issued by the VFIO user
- Fix memory leak when copy_to_user() fails
v10:
- Address Sashiko review comments:
- Fix TPH virtualization write returning a negative value in patch 4/5
- Fix missing support for the probe feature in patch 5/5
- Fix returning a positive value if copy_to_user() fails in patch 5/5
Chengwen Feng (6):
PCI/TPH: Fix pcie_tph_get_st_table_loc() field extraction
PCI/TPH: Export pcie_tph_get_st_modes() for external use
PCI/TPH: Add pcie_tph_enabled_mode() helper
PCI/TPH: Move tph_req_type initialization into pci_tph_init
vfio/pci: Add VFIO_DEVICE_FEATURE_TPH_ST for PCIe TPH steering tag
management
vfio/pci: Add PCIe TPH control register virtualization
drivers/pci/tph.c | 77 +++++++-----
drivers/vfio/pci/vfio_pci.c | 13 ++-
drivers/vfio/pci/vfio_pci_config.c | 38 ++++++
drivers/vfio/pci/vfio_pci_core.c | 182 ++++++++++++++++++++++++++++-
include/linux/pci-tph.h | 10 ++
include/linux/vfio_pci_core.h | 6 +-
include/uapi/linux/vfio.h | 49 ++++++++
7 files changed, 344 insertions(+), 31 deletions(-)
--
2.17.1
* [PATCH v12 1/6] PCI/TPH: Fix pcie_tph_get_st_table_loc() field extraction
2026-05-26 4:08 [PATCH v12 0/6] vfio/pci: Add PCIe TPH support Chengwen Feng
@ 2026-05-26 4:08 ` Chengwen Feng
2026-05-26 4:31 ` sashiko-bot
2026-05-26 4:08 ` [PATCH v12 2/6] PCI/TPH: Export pcie_tph_get_st_modes() for external use Chengwen Feng
` (4 subsequent siblings)
5 siblings, 1 reply; 15+ messages in thread
From: Chengwen Feng @ 2026-05-26 4:08 UTC (permalink / raw)
To: alex, jgg
Cc: wathsala.vithanage, helgaas, wei.huang2, wangzhou1, wangyushan12,
liuyonglong, kvm, linux-pci
pcie_tph_get_st_table_loc() incorrectly uses FIELD_GET(), which shifts the
field value to bit 0. But the function is designed to return raw
PCI_TPH_LOC_* values as defined in the function comment.
This causes incorrect ST table location detection. Fix it by using bitwise
AND with PCI_TPH_CAP_LOC_MASK to return the unshifted field value matching
the function specification.
This doesn't make a difference to mlx5_st_create(), the lone external
caller, because it only checks for PCI_TPH_LOC_NONE (0), but will be needed
for callers that check for PCI_TPH_LOC_CAP or PCI_TPH_LOC_MSIX.
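For illustration only, a standalone userspace sketch of the difference
between the two extractions. The constant values are assumed to mirror
include/uapi/linux/pci_regs.h (please verify against your tree), and
loc_field_get() and loc_raw() are hypothetical names that are not part of
this patch:

```c
#include <stdint.h>

/* Assumed to mirror include/uapi/linux/pci_regs.h. */
#define PCI_TPH_CAP_LOC_MASK 0x600u /* ST table location, bits 10:9 */
#define PCI_TPH_LOC_NONE     0x000u /* no ST table */
#define PCI_TPH_LOC_CAP      0x200u /* in the TPH capability */
#define PCI_TPH_LOC_MSIX     0x400u /* in the MSI-X table */

/* Hypothetical stand-in for FIELD_GET(): mask, then shift down to bit 0. */
static uint32_t loc_field_get(uint32_t reg)
{
	return (reg & PCI_TPH_CAP_LOC_MASK) >> 9;
}

/* Behaviour after this patch: mask only, leaving the field in place. */
static uint32_t loc_raw(uint32_t reg)
{
	return reg & PCI_TPH_CAP_LOC_MASK;
}
```

With a capability register value whose location field says "MSI-X", only
loc_raw() compares equal to PCI_TPH_LOC_MSIX. The shifted value is 2,
which matches none of the non-zero PCI_TPH_LOC_* constants. Both forms
agree only for PCI_TPH_LOC_NONE, which is why the existing caller was
unaffected.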
Fixes: d2e8a34876ce ("PCI/TPH: Add Steering Tag support")
Cc: stable@vger.kernel.org
Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
Reviewed-by: Alex Williamson <alex.williamson@nvidia.com>
Reviewed-by: Bjorn Helgaas <bhelgaas@google.com>
---
drivers/pci/tph.c | 7 +------
1 file changed, 1 insertion(+), 6 deletions(-)
diff --git a/drivers/pci/tph.c b/drivers/pci/tph.c
index 91145e8d9d95..877cf556242b 100644
--- a/drivers/pci/tph.c
+++ b/drivers/pci/tph.c
@@ -170,7 +170,7 @@ u32 pcie_tph_get_st_table_loc(struct pci_dev *pdev)
pci_read_config_dword(pdev, pdev->tph_cap + PCI_TPH_CAP, &reg);
- return FIELD_GET(PCI_TPH_CAP_LOC_MASK, reg);
+ return reg & PCI_TPH_CAP_LOC_MASK;
}
EXPORT_SYMBOL(pcie_tph_get_st_table_loc);
@@ -185,9 +185,6 @@ u16 pcie_tph_get_st_table_size(struct pci_dev *pdev)
/* Check ST table location first */
loc = pcie_tph_get_st_table_loc(pdev);
-
- /* Convert loc to match with PCI_TPH_LOC_* defined in pci_regs.h */
- loc = FIELD_PREP(PCI_TPH_CAP_LOC_MASK, loc);
if (loc != PCI_TPH_LOC_CAP)
return 0;
@@ -316,8 +313,6 @@ int pcie_tph_set_st_entry(struct pci_dev *pdev, unsigned int index, u16 tag)
set_ctrl_reg_req_en(pdev, PCI_TPH_REQ_DISABLE);
loc = pcie_tph_get_st_table_loc(pdev);
- /* Convert loc to match with PCI_TPH_LOC_* */
- loc = FIELD_PREP(PCI_TPH_CAP_LOC_MASK, loc);
switch (loc) {
case PCI_TPH_LOC_MSIX:
--
2.17.1
* [PATCH v12 2/6] PCI/TPH: Export pcie_tph_get_st_modes() for external use
2026-05-26 4:08 [PATCH v12 0/6] vfio/pci: Add PCIe TPH support Chengwen Feng
2026-05-26 4:08 ` [PATCH v12 1/6] PCI/TPH: Fix pcie_tph_get_st_table_loc() field extraction Chengwen Feng
@ 2026-05-26 4:08 ` Chengwen Feng
2026-05-26 4:51 ` sashiko-bot
2026-05-26 4:08 ` [PATCH v12 3/6] PCI/TPH: Add pcie_tph_enabled_mode() helper Chengwen Feng
` (3 subsequent siblings)
5 siblings, 1 reply; 15+ messages in thread
From: Chengwen Feng @ 2026-05-26 4:08 UTC (permalink / raw)
To: alex, jgg
Cc: wathsala.vithanage, helgaas, wei.huang2, wangzhou1, wangyushan12,
liuyonglong, kvm, linux-pci
Export the helper to retrieve supported PCIe TPH steering tag modes so
that drivers like VFIO can query and expose device capabilities to
userspace.
Add stub functions for pcie_tph_get_st_table_size() and
pcie_tph_get_st_table_loc() when !CONFIG_PCIE_TPH.
Add tph_cap validation for pcie_tph_get_st_modes() and
pcie_tph_get_st_table_loc() to prevent invalid PCI configuration
space access when TPH is not supported.
Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
---
drivers/pci/tph.c | 23 +++++++++++++++++++----
include/linux/pci-tph.h | 7 +++++++
2 files changed, 26 insertions(+), 4 deletions(-)
diff --git a/drivers/pci/tph.c b/drivers/pci/tph.c
index 877cf556242b..11a06730e08c 100644
--- a/drivers/pci/tph.c
+++ b/drivers/pci/tph.c
@@ -145,15 +145,27 @@ static void set_ctrl_reg_req_en(struct pci_dev *pdev, u8 req_type)
pci_write_config_dword(pdev, pdev->tph_cap + PCI_TPH_CTRL, reg);
}
-static u8 get_st_modes(struct pci_dev *pdev)
+/**
+ * pcie_tph_get_st_modes - Get supported Steering Tag modes
+ * @pdev: PCI device to query
+ *
+ * Return:
+ * Bitmask of supported ST modes (PCI_TPH_CAP_ST_NS, PCI_TPH_CAP_ST_IV,
+ * PCI_TPH_CAP_ST_DS)
+ */
+u8 pcie_tph_get_st_modes(struct pci_dev *pdev)
{
- u32 reg;
+ u32 reg = 0;
+
+ if (!pdev->tph_cap)
+ return 0;
pci_read_config_dword(pdev, pdev->tph_cap + PCI_TPH_CAP, &reg);
reg &= PCI_TPH_CAP_ST_NS | PCI_TPH_CAP_ST_IV | PCI_TPH_CAP_ST_DS;
return reg;
}
+EXPORT_SYMBOL(pcie_tph_get_st_modes);
/**
* pcie_tph_get_st_table_loc - Return the device's ST table location
@@ -166,7 +178,10 @@ static u8 get_st_modes(struct pci_dev *pdev)
*/
u32 pcie_tph_get_st_table_loc(struct pci_dev *pdev)
{
- u32 reg;
+ u32 reg = 0;
+
+ if (!pdev->tph_cap)
+ return PCI_TPH_LOC_NONE;
pci_read_config_dword(pdev, pdev->tph_cap + PCI_TPH_CAP, &reg);
@@ -395,7 +410,7 @@ int pcie_enable_tph(struct pci_dev *pdev, int mode)
/* Sanitize and check ST mode compatibility */
mode &= PCI_TPH_CTRL_MODE_SEL_MASK;
- dev_modes = get_st_modes(pdev);
+ dev_modes = pcie_tph_get_st_modes(pdev);
if (!((1 << mode) & dev_modes))
return -EINVAL;
diff --git a/include/linux/pci-tph.h b/include/linux/pci-tph.h
index be68cd17f2f8..5c3997f4b51b 100644
--- a/include/linux/pci-tph.h
+++ b/include/linux/pci-tph.h
@@ -30,6 +30,7 @@ void pcie_disable_tph(struct pci_dev *pdev);
int pcie_enable_tph(struct pci_dev *pdev, int mode);
u16 pcie_tph_get_st_table_size(struct pci_dev *pdev);
u32 pcie_tph_get_st_table_loc(struct pci_dev *pdev);
+u8 pcie_tph_get_st_modes(struct pci_dev *pdev);
#else
static inline int pcie_tph_set_st_entry(struct pci_dev *pdev,
unsigned int index, u16 tag)
@@ -41,6 +42,12 @@ static inline int pcie_tph_get_cpu_st(struct pci_dev *dev,
static inline void pcie_disable_tph(struct pci_dev *pdev) { }
static inline int pcie_enable_tph(struct pci_dev *pdev, int mode)
{ return -EINVAL; }
+static inline u16 pcie_tph_get_st_table_size(struct pci_dev *pdev)
+{ return 0; }
+static inline u32 pcie_tph_get_st_table_loc(struct pci_dev *pdev)
+{ return PCI_TPH_LOC_NONE; }
+static inline u8 pcie_tph_get_st_modes(struct pci_dev *pdev)
+{ return 0; }
#endif
#endif /* LINUX_PCI_TPH_H */
--
2.17.1
* [PATCH v12 3/6] PCI/TPH: Add pcie_tph_enabled_mode() helper
2026-05-26 4:08 [PATCH v12 0/6] vfio/pci: Add PCIe TPH support Chengwen Feng
2026-05-26 4:08 ` [PATCH v12 1/6] PCI/TPH: Fix pcie_tph_get_st_table_loc() field extraction Chengwen Feng
2026-05-26 4:08 ` [PATCH v12 2/6] PCI/TPH: Export pcie_tph_get_st_modes() for external use Chengwen Feng
@ 2026-05-26 4:08 ` Chengwen Feng
2026-05-26 4:08 ` [PATCH v12 4/6] PCI/TPH: Move tph_req_type initialization into pci_tph_init Chengwen Feng
` (2 subsequent siblings)
5 siblings, 0 replies; 15+ messages in thread
From: Chengwen Feng @ 2026-05-26 4:08 UTC (permalink / raw)
To: alex, jgg
Cc: wathsala.vithanage, helgaas, wei.huang2, wangzhou1, wangyushan12,
liuyonglong, kvm, linux-pci
Add a helper to query the enabled TPH mode on a PCI device. This is useful for
drivers like VFIO-PCI that need to validate TPH state before allowing
access to steering tag tables.
Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
---
drivers/pci/tph.c | 14 ++++++++++++++
include/linux/pci-tph.h | 3 +++
2 files changed, 17 insertions(+)
diff --git a/drivers/pci/tph.c b/drivers/pci/tph.c
index 11a06730e08c..95e2a95055ee 100644
--- a/drivers/pci/tph.c
+++ b/drivers/pci/tph.c
@@ -451,6 +451,20 @@ int pcie_enable_tph(struct pci_dev *pdev, int mode)
}
EXPORT_SYMBOL(pcie_enable_tph);
+/**
+ * pcie_tph_enabled_mode - Get current enabled TPH mode
+ * @pdev: PCI device
+ *
+ * Return: the enabled TPH mode (one of PCI_TPH_ST_NS_MODE, PCI_TPH_ST_IV_MODE
+ * or PCI_TPH_ST_DS_MODE) on success, or -EINVAL if TPH is not currently
+ * enabled on the device.
+ */
+int pcie_tph_enabled_mode(struct pci_dev *pdev)
+{
+ return pdev->tph_enabled ? pdev->tph_mode : -EINVAL;
+}
+EXPORT_SYMBOL(pcie_tph_enabled_mode);
+
void pci_restore_tph_state(struct pci_dev *pdev)
{
struct pci_cap_saved_state *save_state;
diff --git a/include/linux/pci-tph.h b/include/linux/pci-tph.h
index 5c3997f4b51b..e8c9166e33dd 100644
--- a/include/linux/pci-tph.h
+++ b/include/linux/pci-tph.h
@@ -28,6 +28,7 @@ int pcie_tph_get_cpu_st(struct pci_dev *dev,
unsigned int cpu, u16 *tag);
void pcie_disable_tph(struct pci_dev *pdev);
int pcie_enable_tph(struct pci_dev *pdev, int mode);
+int pcie_tph_enabled_mode(struct pci_dev *pdev);
u16 pcie_tph_get_st_table_size(struct pci_dev *pdev);
u32 pcie_tph_get_st_table_loc(struct pci_dev *pdev);
u8 pcie_tph_get_st_modes(struct pci_dev *pdev);
@@ -42,6 +43,8 @@ static inline int pcie_tph_get_cpu_st(struct pci_dev *dev,
static inline void pcie_disable_tph(struct pci_dev *pdev) { }
static inline int pcie_enable_tph(struct pci_dev *pdev, int mode)
{ return -EINVAL; }
+static inline int pcie_tph_enabled_mode(struct pci_dev *pdev)
+{ return -EINVAL; }
static inline u16 pcie_tph_get_st_table_size(struct pci_dev *pdev)
{ return 0; }
static inline u32 pcie_tph_get_st_table_loc(struct pci_dev *pdev)
--
2.17.1
* [PATCH v12 4/6] PCI/TPH: Move tph_req_type initialization into pci_tph_init
2026-05-26 4:08 [PATCH v12 0/6] vfio/pci: Add PCIe TPH support Chengwen Feng
` (2 preceding siblings ...)
2026-05-26 4:08 ` [PATCH v12 3/6] PCI/TPH: Add pcie_tph_enabled_mode() helper Chengwen Feng
@ 2026-05-26 4:08 ` Chengwen Feng
2026-05-26 5:35 ` sashiko-bot
2026-05-26 22:42 ` Alex Williamson
2026-05-26 4:08 ` [PATCH v12 5/6] vfio/pci: Add VFIO_DEVICE_FEATURE_TPH_ST for PCIe TPH steering tag management Chengwen Feng
2026-05-26 4:08 ` [PATCH v12 6/6] vfio/pci: Add PCIe TPH control register virtualization Chengwen Feng
5 siblings, 2 replies; 15+ messages in thread
From: Chengwen Feng @ 2026-05-26 4:08 UTC (permalink / raw)
To: alex, jgg
Cc: wathsala.vithanage, helgaas, wei.huang2, wangzhou1, wangyushan12,
liuyonglong, kvm, linux-pci
Relocate the tph_req_type resolution logic from pcie_enable_tph() to
pci_tph_init(). The request type is fixed by the device and its Root Port
topology at probe time, so there is no need to recalculate it on each TPH
enable.
Also drop the redundant tph_req_type reset in pcie_disable_tph(); the value
remains valid across disable/enable cycles.
This change allows pcie_tph_get_cpu_st() to work properly and retrieve
valid steering tag values even when TPH is not enabled on the device.
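As a rough standalone model of the resolution that now runs once at init
time: the constant values below are assumed to mirror pci_regs.h, and
resolve_req_type() with its parameters is a hypothetical helper used only
for this sketch, not a kernel function:

```c
#include <stdint.h>

/* Assumed to mirror include/uapi/linux/pci_regs.h. */
#define PCI_TPH_REQ_DISABLE  0x0 /* no TPH requests allowed */
#define PCI_TPH_REQ_TPH_ONLY 0x1 /* TPH only */
#define PCI_TPH_REQ_EXT_TPH  0x3 /* TPH and Extended TPH */

/*
 * Hypothetical helper: pick the request type from the device capability,
 * then clamp it to what the Root Port can complete. The clamp is skipped
 * when the device is not behind a Root Port.
 */
static uint8_t resolve_req_type(int dev_ext_tph, int behind_root_port,
				uint8_t rp_req_type)
{
	uint8_t req = dev_ext_tph ? PCI_TPH_REQ_EXT_TPH : PCI_TPH_REQ_TPH_ONLY;

	/* Final req_type is the smaller of the two values. */
	if (behind_root_port && rp_req_type < req)
		req = rp_req_type;

	return req;
}
```

Because none of the inputs change between enable and disable, caching the
result in pdev->tph_req_type at probe time gives the same answer as
recomputing it in pcie_enable_tph().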
Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
---
drivers/pci/tph.c | 33 ++++++++++++++++-----------------
1 file changed, 16 insertions(+), 17 deletions(-)
diff --git a/drivers/pci/tph.c b/drivers/pci/tph.c
index 95e2a95055ee..3660ad5d3623 100644
--- a/drivers/pci/tph.c
+++ b/drivers/pci/tph.c
@@ -371,7 +371,6 @@ void pcie_disable_tph(struct pci_dev *pdev)
pci_write_config_dword(pdev, pdev->tph_cap + PCI_TPH_CTRL, 0);
pdev->tph_mode = 0;
- pdev->tph_req_type = 0;
pdev->tph_enabled = 0;
}
EXPORT_SYMBOL(pcie_disable_tph);
@@ -396,7 +395,6 @@ int pcie_enable_tph(struct pci_dev *pdev, int mode)
{
u32 reg;
u8 dev_modes;
- u8 rp_req_type;
/* Honor "notph" kernel parameter */
if (pci_tph_disabled)
@@ -416,21 +414,6 @@ int pcie_enable_tph(struct pci_dev *pdev, int mode)
pdev->tph_mode = mode;
- /* Get req_type supported by device and its Root Port */
- pci_read_config_dword(pdev, pdev->tph_cap + PCI_TPH_CAP, &reg);
- if (FIELD_GET(PCI_TPH_CAP_EXT_TPH, reg))
- pdev->tph_req_type = PCI_TPH_REQ_EXT_TPH;
- else
- pdev->tph_req_type = PCI_TPH_REQ_TPH_ONLY;
-
- /* Check if the device is behind a Root Port */
- if (pci_pcie_type(pdev) != PCI_EXP_TYPE_RC_END) {
- rp_req_type = get_rp_completer_type(pdev);
-
- /* Final req_type is the smallest value of two */
- pdev->tph_req_type = min(pdev->tph_req_type, rp_req_type);
- }
-
if (pdev->tph_req_type == PCI_TPH_REQ_DISABLE)
return -EINVAL;
@@ -538,11 +521,27 @@ void pci_tph_init(struct pci_dev *pdev)
{
int num_entries;
u32 save_size;
+ u8 rp_req_type;
+ u32 reg = 0;
pdev->tph_cap = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_TPH);
if (!pdev->tph_cap)
return;
+ /* Get req_type supported by device and its Root Port */
+ pci_read_config_dword(pdev, pdev->tph_cap + PCI_TPH_CAP, &reg);
+ if (FIELD_GET(PCI_TPH_CAP_EXT_TPH, reg))
+ pdev->tph_req_type = PCI_TPH_REQ_EXT_TPH;
+ else
+ pdev->tph_req_type = PCI_TPH_REQ_TPH_ONLY;
+
+ /* Check if the device is behind a Root Port */
+ if (pci_pcie_type(pdev) != PCI_EXP_TYPE_RC_END) {
+ rp_req_type = get_rp_completer_type(pdev);
+ /* Final req_type is the smallest value of two */
+ pdev->tph_req_type = min(pdev->tph_req_type, rp_req_type);
+ }
+
num_entries = pcie_tph_get_st_table_size(pdev);
save_size = sizeof(u32) + num_entries * sizeof(u16);
pci_add_ext_cap_save_buffer(pdev, PCI_EXT_CAP_ID_TPH, save_size);
--
2.17.1
* [PATCH v12 5/6] vfio/pci: Add VFIO_DEVICE_FEATURE_TPH_ST for PCIe TPH steering tag management
2026-05-26 4:08 [PATCH v12 0/6] vfio/pci: Add PCIe TPH support Chengwen Feng
` (3 preceding siblings ...)
2026-05-26 4:08 ` [PATCH v12 4/6] PCI/TPH: Move tph_req_type initialization into pci_tph_init Chengwen Feng
@ 2026-05-26 4:08 ` Chengwen Feng
2026-05-26 6:09 ` sashiko-bot
2026-05-26 22:42 ` Alex Williamson
2026-05-26 4:08 ` [PATCH v12 6/6] vfio/pci: Add PCIe TPH control register virtualization Chengwen Feng
5 siblings, 2 replies; 15+ messages in thread
From: Chengwen Feng @ 2026-05-26 4:08 UTC (permalink / raw)
To: alex, jgg
Cc: wathsala.vithanage, helgaas, wei.huang2, wangzhou1, wangyushan12,
liuyonglong, kvm, linux-pci
Add a new userspace device feature to access and maintain PCIe TPH Steering
Tag table entries, supporting two mutually exclusive operation modes:
1. Raw table read/write: Operate on the shadow ST table and sync updates to
the hardware TPH capability or MSI-X table. A failed batch write triggers
entry rollback to keep hardware consistent.
2. CPU ID lookup: Query the translated steering tag for the specified memory
type; purely read-only, without modifying the ST table.
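The batch write with rollback described in item 1 can be modelled in
userspace roughly as follows. This is only a sketch: hw_table[],
hw_set_entry(), fail_at and write_batch() are hypothetical stand-ins for
the real ST table and pcie_tph_set_st_entry(), and the model covers only
the case where TPH is already enabled:

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>

#define NUM_ENTRIES 8u

static uint16_t hw_table[NUM_ENTRIES]; /* model of the hardware ST table */
static uint16_t shadow[NUM_ENTRIES];   /* last values accepted from the user */
static int fail_at = -1;               /* index at which to inject a failure */

/* Hypothetical stand-in for pcie_tph_set_st_entry(). */
static int hw_set_entry(unsigned int idx, uint16_t tag)
{
	if ((int)idx == fail_at)
		return -EIO;
	hw_table[idx] = tag;
	return 0;
}

static int write_batch(unsigned int index, unsigned int count,
		       const uint16_t *tags)
{
	unsigned int i;
	int ret;

	/* Reject empty or out-of-range requests without overflowing. */
	if (count == 0 || index >= NUM_ENTRIES || count > NUM_ENTRIES - index)
		return -EINVAL;

	for (i = 0; i < count; i++) {
		ret = hw_set_entry(index + i, tags[i]);
		if (ret)
			goto rollback;
	}

	/* Commit to the shadow only once every hardware write succeeded. */
	memcpy(&shadow[index], tags, count * sizeof(*tags));
	return 0;

rollback:
	/* Restore already-written entries from the untouched shadow. */
	while (i-- > 0)
		(void)hw_set_entry(index + i, shadow[index + i]);
	return ret;
}
```

The key design point is ordering: the shadow is updated only after the
whole batch has reached hardware, so it always holds the values needed to
undo a partial write.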
Introduce the enable_unsafe_tph module parameter to gate this non-isolated
TPH-related feature. All operations are protected by a per-device mutex, and
the shadow table lifecycle is tied to device init/release.
Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
---
drivers/vfio/pci/vfio_pci.c | 13 ++-
drivers/vfio/pci/vfio_pci_core.c | 176 ++++++++++++++++++++++++++++++-
include/linux/vfio_pci_core.h | 6 +-
include/uapi/linux/vfio.h | 49 +++++++++
4 files changed, 240 insertions(+), 4 deletions(-)
diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
index 0c771064c0b8..6d73668459cf 100644
--- a/drivers/vfio/pci/vfio_pci.c
+++ b/drivers/vfio/pci/vfio_pci.c
@@ -60,6 +60,12 @@ static bool disable_denylist;
module_param(disable_denylist, bool, 0444);
MODULE_PARM_DESC(disable_denylist, "Disable use of device denylist. Disabling the denylist allows binding to devices with known errata that may lead to exploitable stability or security issues when accessed by untrusted users.");
+#ifdef CONFIG_PCIE_TPH
+static bool enable_unsafe_tph;
+module_param(enable_unsafe_tph, bool, 0444);
+MODULE_PARM_DESC(enable_unsafe_tph, "Enable PCIe TPH (Transaction Processing Hints) support. It may break platform isolation. If you do not know what this is for, step away. (default: false)");
+#endif
+
static bool vfio_pci_dev_in_denylist(struct pci_dev *pdev)
{
switch (pdev->vendor) {
@@ -257,12 +263,17 @@ static int __init vfio_pci_init(void)
{
int ret;
bool is_disable_vga = true;
+ bool is_enable_unsafe_tph = false;
#ifdef CONFIG_VFIO_PCI_VGA
is_disable_vga = disable_vga;
#endif
+#ifdef CONFIG_PCIE_TPH
+ is_enable_unsafe_tph = enable_unsafe_tph;
+#endif
- vfio_pci_core_set_params(nointxmask, is_disable_vga, disable_idle_d3);
+ vfio_pci_core_set_params(nointxmask, is_disable_vga, disable_idle_d3,
+ is_enable_unsafe_tph);
/* Register and scan for devices */
ret = pci_register_driver(&vfio_pci_driver);
diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
index 050e7542952e..6fc7496cb8dd 100644
--- a/drivers/vfio/pci/vfio_pci_core.c
+++ b/drivers/vfio/pci/vfio_pci_core.c
@@ -29,6 +29,7 @@
#include <linux/sched/mm.h>
#include <linux/iommufd.h>
#include <linux/pci-p2pdma.h>
+#include <linux/pci-tph.h>
#if IS_ENABLED(CONFIG_EEH)
#include <asm/eeh.h>
#endif
@@ -41,6 +42,7 @@
static bool nointxmask;
static bool disable_vga;
static bool disable_idle_d3;
+static bool enable_unsafe_tph;
static void vfio_pci_eventfd_rcu_free(struct rcu_head *rcu)
{
@@ -1551,6 +1553,159 @@ static int vfio_pci_core_feature_token(struct vfio_pci_core_device *vdev,
return 0;
}
+static int vfio_pci_tph_st_shadow_size(struct vfio_pci_core_device *vdev)
+{
+ struct pci_dev *pdev = vdev->pdev;
+ u32 loc = pcie_tph_get_st_table_loc(pdev);
+ int ret;
+
+ if (loc == PCI_TPH_LOC_CAP) {
+ return pcie_tph_get_st_table_size(pdev);
+ } else if (loc == PCI_TPH_LOC_MSIX) {
+ ret = pci_msix_vec_count(pdev);
+ if (ret < 0)
+ return 0;
+ return ret;
+ } else {
+ return 0;
+ }
+}
+
+static int vfio_pci_tph_op_raw_table(struct vfio_pci_core_device *vdev,
+ bool is_get,
+ struct vfio_device_feature_tph_st *arg)
+{
+ void __user *uptr = u64_to_user_ptr(arg->data_uptr);
+ size_t sz = arg->count * sizeof(u16);
+ struct pci_dev *pdev = vdev->pdev;
+ int i, idx, ret;
+ u16 *sts;
+
+ if (!vdev->tph_st_shadow)
+ return -EOPNOTSUPP;
+
+ if (arg->flags & ~VFIO_TPH_ST_OP_TYPE_MASK)
+ return -EINVAL;
+ if (arg->count == 0 || arg->index >= vdev->tph_st_entries ||
+ arg->count > vdev->tph_st_entries ||
+ arg->index + arg->count > vdev->tph_st_entries)
+ return -EINVAL;
+
+ if (is_get) {
+ ret = copy_to_user(uptr, &vdev->tph_st_shadow[arg->index], sz);
+ if (ret)
+ return -EFAULT;
+ return 0;
+ }
+
+ sts = memdup_array_user(uptr, arg->count, sizeof(u16));
+ if (IS_ERR(sts))
+ return PTR_ERR(sts);
+
+ if (pcie_tph_enabled_mode(vdev->pdev) < 0) {
+ memcpy(&vdev->tph_st_shadow[arg->index], sts, sz);
+ kfree(sts);
+ return 0;
+ }
+
+ for (i = 0; i < arg->count; i++) {
+ idx = arg->index + i;
+ ret = pcie_tph_set_st_entry(pdev, idx, sts[i]);
+ if (ret)
+ goto rollback;
+ }
+
+ memcpy(&vdev->tph_st_shadow[arg->index], sts, sz);
+ kfree(sts);
+ return 0;
+
+rollback:
+ while (i-- > 0) {
+ idx = arg->index + i;
+ pcie_tph_set_st_entry(pdev, idx, vdev->tph_st_shadow[idx]);
+ }
+ kfree(sts);
+ return ret;
+}
+
+static int vfio_pci_tph_op_cpu_query(struct vfio_pci_core_device *vdev,
+ struct vfio_device_feature_tph_st *arg)
+{
+ void __user *uptr = u64_to_user_ptr(arg->data_uptr);
+ struct pci_dev *pdev = vdev->pdev;
+ enum tph_mem_type mtype;
+ int i, ret;
+ u32 *cpus;
+ u16 st;
+
+ if (arg->flags & ~(VFIO_TPH_ST_OP_TYPE_MASK | VFIO_TPH_ST_MEM_TYPE_MASK))
+ return -EINVAL;
+ if (arg->count == 0 || arg->count > nr_cpu_ids || arg->index != 0)
+ return -EINVAL;
+
+ cpus = memdup_array_user(uptr, arg->count, sizeof(u32));
+ if (IS_ERR(cpus))
+ return PTR_ERR(cpus);
+
+ mtype = (arg->flags & VFIO_TPH_ST_MEM_TYPE_MASK) == VFIO_TPH_ST_MEM_TYPE_VM ?
+ TPH_MEM_TYPE_VM : TPH_MEM_TYPE_PM;
+ for (i = 0; i < arg->count; i++) {
+ ret = pcie_tph_get_cpu_st(pdev, mtype, cpus[i], &st);
+ if (ret)
+ goto out;
+ cpus[i] = st;
+ }
+
+ ret = copy_to_user(uptr, cpus, arg->count * sizeof(u32));
+out:
+ kfree(cpus);
+ return ret;
+}
+
+static int vfio_pci_core_feature_tph_st(struct vfio_pci_core_device *vdev,
+ u32 flags,
+ struct vfio_device_feature_tph_st __user *arg,
+ size_t argsz)
+{
+ struct vfio_device_feature_tph_st tph_st;
+ bool is_get, is_set;
+ u32 op_type;
+ int ret;
+
+ if (!enable_unsafe_tph)
+ return -EOPNOTSUPP;
+
+ ret = vfio_check_feature(flags, argsz,
+ VFIO_DEVICE_FEATURE_GET |
+ VFIO_DEVICE_FEATURE_SET |
+ VFIO_DEVICE_FEATURE_PROBE,
+ sizeof(tph_st));
+ if (ret <= 0)
+ return ret;
+
+ if (copy_from_user(&tph_st, arg, sizeof(tph_st)))
+ return -EFAULT;
+
+ op_type = tph_st.flags & VFIO_TPH_ST_OP_TYPE_MASK;
+ is_get = !!(flags & VFIO_DEVICE_FEATURE_GET);
+ is_set = !!(flags & VFIO_DEVICE_FEATURE_SET);
+
+ guard(mutex)(&vdev->tph_lock);
+
+ switch (op_type) {
+ case VFIO_TPH_ST_OP_RAW_TABLE:
+ if (is_set && is_get)
+ return -EINVAL;
+ return vfio_pci_tph_op_raw_table(vdev, is_get, &tph_st);
+ case VFIO_TPH_ST_OP_CPU_QUERY:
+ if (is_set)
+ return -EOPNOTSUPP;
+ return vfio_pci_tph_op_cpu_query(vdev, &tph_st);
+ default:
+ return -EINVAL;
+ }
+}
+
int vfio_pci_core_ioctl_feature(struct vfio_device *device, u32 flags,
void __user *arg, size_t argsz)
{
@@ -1569,6 +1724,8 @@ int vfio_pci_core_ioctl_feature(struct vfio_device *device, u32 flags,
return vfio_pci_core_feature_token(vdev, flags, arg, argsz);
case VFIO_DEVICE_FEATURE_DMA_BUF:
return vfio_pci_core_feature_dma_buf(vdev, flags, arg, argsz);
+ case VFIO_DEVICE_FEATURE_TPH_ST:
+ return vfio_pci_core_feature_tph_st(vdev, flags, arg, argsz);
default:
return -ENOTTY;
}
@@ -2132,12 +2289,23 @@ int vfio_pci_core_init_dev(struct vfio_device *core_vdev)
mutex_init(&vdev->igate);
spin_lock_init(&vdev->irqlock);
mutex_init(&vdev->ioeventfds_lock);
+ mutex_init(&vdev->tph_lock);
+ vdev->tph_st_entries = vfio_pci_tph_st_shadow_size(vdev);
+ vdev->tph_st_shadow = NULL;
+ if (vdev->tph_st_entries) {
+ vdev->tph_st_shadow = kcalloc(vdev->tph_st_entries, sizeof(u16),
+ GFP_KERNEL);
+ if (!vdev->tph_st_shadow)
+ return -ENOMEM;
+ }
INIT_LIST_HEAD(&vdev->dummy_resources_list);
INIT_LIST_HEAD(&vdev->ioeventfds_list);
INIT_LIST_HEAD(&vdev->sriov_pfs_item);
ret = pcim_p2pdma_init(vdev->pdev);
- if (ret && ret != -EOPNOTSUPP)
+ if (ret && ret != -EOPNOTSUPP) {
+ kfree(vdev->tph_st_shadow);
return ret;
+ }
INIT_LIST_HEAD(&vdev->dmabufs);
init_rwsem(&vdev->memory_lock);
xa_init(&vdev->ctx);
@@ -2153,6 +2321,8 @@ void vfio_pci_core_release_dev(struct vfio_device *core_vdev)
mutex_destroy(&vdev->igate);
mutex_destroy(&vdev->ioeventfds_lock);
+ mutex_destroy(&vdev->tph_lock);
+ kfree(vdev->tph_st_shadow);
kfree(vdev->region);
kfree(vdev->pm_save);
}
@@ -2605,11 +2775,13 @@ static void vfio_pci_dev_set_try_reset(struct vfio_device_set *dev_set)
}
void vfio_pci_core_set_params(bool is_nointxmask, bool is_disable_vga,
- bool is_disable_idle_d3)
+ bool is_disable_idle_d3,
+ bool is_enable_unsafe_tph)
{
nointxmask = is_nointxmask;
disable_vga = is_disable_vga;
disable_idle_d3 = is_disable_idle_d3;
+ enable_unsafe_tph = is_enable_unsafe_tph;
}
EXPORT_SYMBOL_GPL(vfio_pci_core_set_params);
diff --git a/include/linux/vfio_pci_core.h b/include/linux/vfio_pci_core.h
index 89165b769e5c..ae8e7011ab8e 100644
--- a/include/linux/vfio_pci_core.h
+++ b/include/linux/vfio_pci_core.h
@@ -142,6 +142,9 @@ struct vfio_pci_core_device {
struct notifier_block nb;
struct rw_semaphore memory_lock;
struct list_head dmabufs;
+ struct mutex tph_lock;
+ u16 *tph_st_shadow;
+ u16 tph_st_entries;
};
enum vfio_pci_io_width {
@@ -157,7 +160,8 @@ int vfio_pci_core_register_dev_region(struct vfio_pci_core_device *vdev,
const struct vfio_pci_regops *ops,
size_t size, u32 flags, void *data);
void vfio_pci_core_set_params(bool nointxmask, bool is_disable_vga,
- bool is_disable_idle_d3);
+ bool is_disable_idle_d3,
+ bool is_enable_unsafe_tph);
void vfio_pci_core_close_device(struct vfio_device *core_vdev);
int vfio_pci_core_init_dev(struct vfio_device *core_vdev);
void vfio_pci_core_release_dev(struct vfio_device *core_vdev);
diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index 5de618a3a5ee..c76196f93660 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -1534,6 +1534,55 @@ struct vfio_device_feature_dma_buf {
*/
#define VFIO_DEVICE_FEATURE_MIG_PRECOPY_INFOv2 12
+/**
+ * VFIO_DEVICE_FEATURE_TPH_ST - Manage PCIe TPH Steering Tag entries
+ *
+ * Provides userspace interface to manage PCIe TPH ST table entries.
+ *
+ * @flags: Composite control flags
+ * Operation type[bit0~3]: Exclusive operation selection
+ * Attribute bits[bit4~31]: Additional property parameters
+ *
+ * @index: Start entry offset, only valid for raw table operation
+ * @count: Number of consecutive entries to operate
+ * @data_uptr: Aligned userspace data buffer pointer
+ *
+ * VFIO_TPH_ST_OP_RAW_TABLE type:
+ * Access raw ST table entry directly.
+ * Attribute bits are ignored in this operation.
+ * Userspace data buffer stores 16-bit raw steering tag values.
+ * SET writes entries, and GET reads existing raw ST entries back to user
+ * buffer.
+ *
+ * VFIO_TPH_ST_OP_CPU_QUERY type:
+ * Resolve ST tag from CPU ID, only supports GET operation.
+ * Attribute bits carry memory type info.
+ * Userspace data buffer provides 32-bit CPU IDs, kernel returns translated
+ * 16-bit ST tag according to specified memory type. No modification to ST
+ * table during query.
+ *
+ * This feature is gated by enable_unsafe_tph module parameter.
+ */
+#define VFIO_DEVICE_FEATURE_TPH_ST 13
+
+struct vfio_device_feature_tph_st {
+ __u32 flags;
+
+/* Operation type field */
+#define VFIO_TPH_ST_OP_TYPE_MASK 0xFu
+#define VFIO_TPH_ST_OP_RAW_TABLE 0x0u
+#define VFIO_TPH_ST_OP_CPU_QUERY 0x1u
+
+/* Attribute bits for CPU query operation type */
+#define VFIO_TPH_ST_MEM_TYPE_MASK (1u << 4)
+#define VFIO_TPH_ST_MEM_TYPE_VM (0u << 4)
+#define VFIO_TPH_ST_MEM_TYPE_PM (1u << 4)
+
+ __u16 index;
+ __u16 count;
+ __aligned_u64 data_uptr;
+};
+
/* -------- API for Type1 VFIO IOMMU -------- */
/**
--
2.17.1
* [PATCH v12 6/6] vfio/pci: Add PCIe TPH control register virtualization
2026-05-26 4:08 [PATCH v12 0/6] vfio/pci: Add PCIe TPH support Chengwen Feng
` (4 preceding siblings ...)
2026-05-26 4:08 ` [PATCH v12 5/6] vfio/pci: Add VFIO_DEVICE_FEATURE_TPH_ST for PCIe TPH steering tag management Chengwen Feng
@ 2026-05-26 4:08 ` Chengwen Feng
2026-05-26 6:56 ` sashiko-bot
5 siblings, 1 reply; 15+ messages in thread
From: Chengwen Feng @ 2026-05-26 4:08 UTC (permalink / raw)
To: alex, jgg
Cc: wathsala.vithanage, helgaas, wei.huang2, wangzhou1, wangyushan12,
liuyonglong, kvm, linux-pci
Implement virtualized handling for PCIe TPH capability control register
writes. Validate and mediate user write requests to accept only valid TPH
mode configurations. Synchronize shadow steering tag table to hardware when
TPH gets enabled successfully.
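A minimal userspace sketch of how a control register write is split into
its two fields before being mediated. The mask values are assumed to
mirror pci_regs.h; struct tph_ctrl_req and decode_tph_ctrl() are
hypothetical names used only here:

```c
#include <stdint.h>

/* Assumed to mirror include/uapi/linux/pci_regs.h. */
#define PCI_TPH_CTRL_MODE_SEL_MASK 0x00000007u /* ST Mode Select, bits 2:0 */
#define PCI_TPH_CTRL_REQ_EN_MASK   0x00000300u /* TPH Requester Enable, bits 9:8 */

/* Hypothetical container for the decoded fields. */
struct tph_ctrl_req {
	uint8_t mode;   /* requested ST mode */
	uint8_t req_en; /* non-zero means "enable TPH" */
};

static struct tph_ctrl_req decode_tph_ctrl(uint32_t data)
{
	struct tph_ctrl_req r;

	r.mode = (uint8_t)(data & PCI_TPH_CTRL_MODE_SEL_MASK);
	r.req_en = (uint8_t)((data & PCI_TPH_CTRL_REQ_EN_MASK) >> 8);
	return r;
}
```

In the patch, a non-zero req_en leads to pcie_enable_tph() with the decoded
mode followed by a replay of the shadow table, while a zero req_en leads to
pcie_disable_tph(). The user's raw value is never written to the device
directly.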
Automatically disable TPH feature during device ownership handover and
userspace file descriptor closing, ensuring hardware state consistency
after device release.
Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
---
drivers/vfio/pci/vfio_pci_config.c | 38 ++++++++++++++++++++++++++++++
drivers/vfio/pci/vfio_pci_core.c | 8 ++++++-
2 files changed, 45 insertions(+), 1 deletion(-)
diff --git a/drivers/vfio/pci/vfio_pci_config.c b/drivers/vfio/pci/vfio_pci_config.c
index a10ed733f0e3..188845f81626 100644
--- a/drivers/vfio/pci/vfio_pci_config.c
+++ b/drivers/vfio/pci/vfio_pci_config.c
@@ -20,8 +20,10 @@
* must be negotiated with the underlying OS.
*/
+#include <linux/bitfield.h>
#include <linux/fs.h>
#include <linux/pci.h>
+#include <linux/pci-tph.h>
#include <linux/uaccess.h>
#include <linux/vfio.h>
#include <linux/slab.h>
@@ -35,6 +37,8 @@
((offset >= PCI_BASE_ADDRESS_0 && offset < PCI_BASE_ADDRESS_5 + 4) || \
(offset >= PCI_ROM_ADDRESS && offset < PCI_ROM_ADDRESS + 4))
+extern bool enable_unsafe_tph;
+
/*
* Lengths of PCI Config Capabilities
* 0: Removed from the user visible capability list
@@ -313,6 +317,39 @@ static int vfio_virt_config_read(struct vfio_pci_core_device *vdev, int pos,
return count;
}
+static int vfio_pci_tph_config_write(struct vfio_pci_core_device *vdev, int pos,
+ int count, struct perm_bits *perm,
+ int offset, __le32 val)
+{
+ struct pci_dev *pdev = vdev->pdev;
+ u32 data = le32_to_cpu(val);
+ u8 mode, req_en;
+ int i, ret;
+
+ if (!enable_unsafe_tph)
+ return count;
+
+ if (offset != PCI_TPH_CTRL || count < 2)
+ return count;
+
+ guard(mutex)(&vdev->tph_lock);
+
+ mode = FIELD_GET(PCI_TPH_CTRL_MODE_SEL_MASK, data);
+ req_en = FIELD_GET(PCI_TPH_CTRL_REQ_EN_MASK, data);
+ if (req_en) {
+ ret = pcie_enable_tph(pdev, mode);
+ if (ret == 0 && vdev->tph_st_shadow) {
+ for (i = 0; i < vdev->tph_st_entries; i++)
+ pcie_tph_set_st_entry(pdev, i,
+ vdev->tph_st_shadow[i]);
+ }
+ } else {
+ pcie_disable_tph(vdev->pdev);
+ }
+
+ return count;
+}
+
static struct perm_bits direct_ro_perms = {
.readfn = vfio_direct_config_read,
};
@@ -1121,6 +1158,7 @@ int __init vfio_pci_init_perm_bits(void)
ret |= init_pci_ext_cap_err_perm(&ecap_perms[PCI_EXT_CAP_ID_ERR]);
ret |= init_pci_ext_cap_pwr_perm(&ecap_perms[PCI_EXT_CAP_ID_PWR]);
ecap_perms[PCI_EXT_CAP_ID_VNDR].writefn = vfio_raw_config_write;
+ ecap_perms[PCI_EXT_CAP_ID_TPH].writefn = vfio_pci_tph_config_write;
ecap_perms[PCI_EXT_CAP_ID_DVSEC].writefn = vfio_raw_config_write;
if (ret)
diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
index 6fc7496cb8dd..1a9de22202de 100644
--- a/drivers/vfio/pci/vfio_pci_core.c
+++ b/drivers/vfio/pci/vfio_pci_core.c
@@ -42,7 +42,7 @@
static bool nointxmask;
static bool disable_vga;
static bool disable_idle_d3;
-static bool enable_unsafe_tph;
+bool enable_unsafe_tph;
static void vfio_pci_eventfd_rcu_free(struct rcu_head *rcu)
{
@@ -543,6 +543,9 @@ int vfio_pci_core_enable(struct vfio_pci_core_device *vdev)
return ret;
}
+ /* Clear TPH status when taking over ownership */
+ pcie_disable_tph(pdev);
+
/* Don't allow our initial saved state to include busmaster */
pci_clear_master(pdev);
@@ -656,6 +659,9 @@ void vfio_pci_core_disable(struct vfio_pci_core_device *vdev)
*/
vfio_pci_set_power_state(vdev, PCI_D0);
+ /* Clear TPH status when releasing ownership */
+ pcie_disable_tph(pdev);
+
/* Stop the device from further DMA */
pci_clear_master(pdev);
--
2.17.1
^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: [PATCH v12 1/6] PCI/TPH: Fix pcie_tph_get_st_table_loc() field extraction
2026-05-26 4:08 ` [PATCH v12 1/6] PCI/TPH: Fix pcie_tph_get_st_table_loc() field extraction Chengwen Feng
@ 2026-05-26 4:31 ` sashiko-bot
0 siblings, 0 replies; 15+ messages in thread
From: sashiko-bot @ 2026-05-26 4:31 UTC (permalink / raw)
To: Chengwen Feng; +Cc: linux-pci, kvm
Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
Pre-existing issues:
- [High] Exported functions pcie_tph_get_st_table_loc() and pcie_tph_get_st_table_size() do not verify that the device actually supports TPH (pdev->tph_cap != 0) before accessing the capability offset.
--
commit 778a377b4fe5a5811f9d4225d057f7de399a56a3
Author: Chengwen Feng <fengchengwen@huawei.com>
PCI/TPH: Fix pcie_tph_get_st_table_loc() field extraction
This commit fixes the ST table location detection in
pcie_tph_get_st_table_loc() by using bitwise AND with PCI_TPH_CAP_LOC_MASK
instead of FIELD_GET(). The function was designed to return raw
PCI_TPH_LOC_* values without shifting.
> diff --git a/drivers/pci/tph.c b/drivers/pci/tph.c
> index 91145e8d9d951..877cf556242b4 100644
> --- a/drivers/pci/tph.c
> +++ b/drivers/pci/tph.c
> @@ -170,7 +170,7 @@ u32 pcie_tph_get_st_table_loc(struct pci_dev *pdev)
>
> pci_read_config_dword(pdev, pdev->tph_cap + PCI_TPH_CAP, &reg);
[Severity: High]
This is a pre-existing issue, but should this function verify that the device
actually supports TPH (pdev->tph_cap != 0) before reading the capability?
If a device lacks the TPH capability, pdev->tph_cap is 0, which results in a
read of the generic PCI Command & Status registers at offset 0x04. The ST
Table Location mask (bits 9-10) maps exactly to the Fast Back-to-Back Enable
and Interrupt Disable bits in the Command register.
If Interrupt Disable is set (which is standard for MSI/MSI-X devices), would
this incorrectly return PCI_TPH_LOC_MSIX (0x400)? If Fast Back-to-Back Enable
is set, it might return PCI_TPH_LOC_CAP (0x200) and then incorrectly extract
the table size from the PCI Status register in pcie_tph_get_st_table_size().
Since these are EXPORT_SYMBOL APIs, they can be called by external modules
that might not independently validate pdev->tph_cap. This could lead to
falsely detecting TPH support and allocating unnecessary shadow memory.
> - return FIELD_GET(PCI_TPH_CAP_LOC_MASK, reg);
> + return reg & PCI_TPH_CAP_LOC_MASK;
> }
> EXPORT_SYMBOL(pcie_tph_get_st_table_loc);
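A minimal user-space sketch of the guard asked about above. It is not the kernel code: struct model_pdev, model_cfg_read32() and both st_table_loc_*() helpers are hypothetical stand-ins, and the constants only mirror the values quoted in this review (mask bits 9-10, 0x200, 0x400). It shows how a zero tph_cap makes the unchecked read land on the Command register at offset 0x04.

```c
#include <stdint.h>

/* Constants mirror the values quoted in the review (mask bits 9-10). */
#define TPH_CAP_REG       0x04u   /* offset of the TPH Capability register */
#define TPH_CAP_LOC_MASK  0x600u
#define TPH_LOC_NONE      0x000u
#define TPH_LOC_CAP       0x200u
#define TPH_LOC_MSIX      0x400u

/* Hypothetical stand-in for struct pci_dev. */
struct model_pdev {
	uint8_t cfg[4096];   /* config space image */
	uint16_t tph_cap;    /* 0 when the device has no TPH capability */
};

/* Little-endian 32-bit read from the config space image. */
static uint32_t model_cfg_read32(const struct model_pdev *d, unsigned int off)
{
	return (uint32_t)d->cfg[off] |
	       (uint32_t)d->cfg[off + 1] << 8 |
	       (uint32_t)d->cfg[off + 2] << 16 |
	       (uint32_t)d->cfg[off + 3] << 24;
}

/* Models the current behaviour: no check, so tph_cap == 0 reads offset 0x04. */
static uint32_t st_table_loc_unchecked(const struct model_pdev *d)
{
	return model_cfg_read32(d, d->tph_cap + TPH_CAP_REG) & TPH_CAP_LOC_MASK;
}

/* Models the suggested guard: report "no ST table" when TPH is absent. */
static uint32_t st_table_loc_checked(const struct model_pdev *d)
{
	if (!d->tph_cap)
		return TPH_LOC_NONE;
	return model_cfg_read32(d, d->tph_cap + TPH_CAP_REG) & TPH_CAP_LOC_MASK;
}
```

With Interrupt Disable (bit 10 of the Command register) set in the image, the unchecked helper reports the MSI-X location while the checked one reports none.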
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260526040830.52854-1-fengchengwen@huawei.com?part=1
* Re: [PATCH v12 2/6] PCI/TPH: Export pcie_tph_get_st_modes() for external use
2026-05-26 4:08 ` [PATCH v12 2/6] PCI/TPH: Export pcie_tph_get_st_modes() for external use Chengwen Feng
@ 2026-05-26 4:51 ` sashiko-bot
0 siblings, 0 replies; 15+ messages in thread
From: sashiko-bot @ 2026-05-26 4:51 UTC (permalink / raw)
To: Chengwen Feng; +Cc: kvm, linux-pci
Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
- [Low] The header `<linux/pci-tph.h>` is not self-contained due to newly introduced dependencies without corresponding `#include` directives.
--
commit 1f564e5018d91508967ee58123260567c2507196
Author: Chengwen Feng <fengchengwen@huawei.com>
PCI/TPH: Export pcie_tph_get_st_modes() for external use
This commit exports pcie_tph_get_st_modes() to allow drivers like VFIO
to query PCIe TPH capabilities. It also adds stub functions for
!CONFIG_PCIE_TPH and tph_cap validation to prevent invalid configuration
space accesses.
> diff --git a/include/linux/pci-tph.h b/include/linux/pci-tph.h
> --- a/include/linux/pci-tph.h
> +++ b/include/linux/pci-tph.h
[ ... ]
> @@ -41,6 +42,12 @@ static inline int pcie_tph_get_cpu_st(struct pci_dev *dev,
> static inline void pcie_disable_tph(struct pci_dev *pdev) { }
> static inline int pcie_enable_tph(struct pci_dev *pdev, int mode)
> { return -EINVAL; }
> +static inline u16 pcie_tph_get_st_table_size(struct pci_dev *pdev)
> +{ return 0; }
> +static inline u32 pcie_tph_get_st_table_loc(struct pci_dev *pdev)
> +{ return PCI_TPH_LOC_NONE; }
[Severity: Low]
Does this new stub make linux/pci-tph.h rely on implicit includes?
PCI_TPH_LOC_NONE is defined in uapi/linux/pci_regs.h, but it doesn't look
like linux/pci-tph.h includes it (or linux/pci.h). While current users
probably already include linux/pci.h, could this cause compilation errors if
a driver includes linux/pci-tph.h first or on its own?
> +static inline u8 pcie_tph_get_st_modes(struct pci_dev *pdev)
> +{ return 0; }
> #endif
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260526040830.52854-1-fengchengwen@huawei.com?part=2
* Re: [PATCH v12 4/6] PCI/TPH: Move tph_req_type initialization into pci_tph_init
2026-05-26 4:08 ` [PATCH v12 4/6] PCI/TPH: Move tph_req_type initialization into pci_tph_init Chengwen Feng
@ 2026-05-26 5:35 ` sashiko-bot
2026-05-26 22:42 ` Alex Williamson
1 sibling, 0 replies; 15+ messages in thread
From: sashiko-bot @ 2026-05-26 5:35 UTC (permalink / raw)
To: Chengwen Feng; +Cc: linux-pci, kvm
Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:
Pre-existing issues:
- [High] The function pcie_tph_set_st_entry() lacks internal serialization, allowing concurrent modifications to the TPH Control Register and violating the PCIe specification.
- [High] Unprotected modifications to the pdev->tph_enabled bitfield can cause silent data corruption of adjacent pci_dev state flags.
--
commit c7677c01bf11b8624e4ab3ed133719b3aefd9b43
Author: Chengwen Feng <fengchengwen@huawei.com>
PCI/TPH: Move tph_req_type initialization into pci_tph_init
Relocate tph_req_type resolution logic from pcie_enable_tph() to
pci_tph_init(). The request type is fixed per device and root port topology
at probe time, so it does not need to be recalculated on each TPH enable.
> diff --git a/drivers/pci/tph.c b/drivers/pci/tph.c
> index 95e2a95055eeb..3660ad5d36239 100644
> --- a/drivers/pci/tph.c
> +++ b/drivers/pci/tph.c
> @@ -371,7 +371,6 @@ void pcie_disable_tph(struct pci_dev *pdev)
> pci_write_config_dword(pdev, pdev->tph_cap + PCI_TPH_CTRL, 0);
[Severity: High]
This isn't a bug introduced by this patch, but does pcie_tph_set_st_entry()
also need internal serialization when writing to the TPH Control Register?
pcie_tph_set_st_entry() temporarily disables TPH via set_ctrl_reg_req_en(),
updates the ST table, and then re-enables TPH.
Network drivers like bnxt call this function from irq_affinity_notify
callbacks, which execute in multi-threaded workqueues.
If multiple IRQ affinities are changed simultaneously, these callbacks can run
concurrently on different CPUs for the same pdev.
Without a lock protecting this sequence, could one thread re-enable TPH while
another thread is still actively updating its ST table entry?
This would violate PCIe r6.2, sec 6.17.3 (which requires TPH to be disabled
during ST updates) and cause overlapping read-modify-write operations on the
TPH Control Register, potentially leading to device instability.
>
> pdev->tph_mode = 0;
> - pdev->tph_req_type = 0;
> pdev->tph_enabled = 0;
[Severity: High]
This is a pre-existing issue, but can unprotected modifications to the
pdev->tph_enabled bitfield cause silent data corruption of adjacent pci_dev
state flags?
tph_enabled is defined as an unsigned int bitfield in struct pci_dev, sharing
a memory word with critical flags like block_cfg_access, is_busmaster, and
msi_enabled.
Bitfield assignments compile to non-atomic read-modify-write (RMW) operations
on the surrounding word.
pcie_enable_tph() and pcie_disable_tph() modify tph_enabled locklessly from
the PCI subsystem's perspective (e.g., bnxt calls it during device open under
rtnl_lock).
If another subsystem concurrently modifies an adjacent bitfield (e.g.,
pci_cfg_access_lock() updating block_cfg_access while holding pci_lock), will
the overlapping RMW cycles cause one of the updates to be silently lost,
corrupting the overall PCI device state?
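The lost-update pattern described above can be modelled deterministically in user space. This is only an illustration: struct model_flags is a hypothetical two-field cut-down of the pci_dev bitfields named in the review, and the two "threads" are replayed as a fixed interleaving of load, modify and store steps rather than run concurrently.

```c
/* Hypothetical cut-down of the adjacent pci_dev bitfields named above. */
struct model_flags {
	unsigned int block_cfg_access:1;
	unsigned int tph_enabled:1;
};

/*
 * Replay one bad interleaving of two read-modify-write sequences on the
 * same storage unit. Each local copy stands for the word a CPU loaded.
 */
static struct model_flags lost_update(void)
{
	struct model_flags shared = { 0, 0 };
	struct model_flags a = shared;   /* A: load the word */
	struct model_flags b = shared;   /* B: load the word */

	b.block_cfg_access = 1;          /* B: modify ... */
	shared = b;                      /* B: ... and store */

	a.tph_enabled = 1;               /* A: modify its stale copy ... */
	shared = a;                      /* A: ... and store, undoing B */

	return shared;
}
```

In this replay the final word has tph_enabled set and block_cfg_access cleared, which is the silent loss the review is asking about.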
> }
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260526040830.52854-1-fengchengwen@huawei.com?part=4
* Re: [PATCH v12 5/6] vfio/pci: Add VFIO_DEVICE_FEATURE_TPH_ST for PCIe TPH steering tag management
2026-05-26 4:08 ` [PATCH v12 5/6] vfio/pci: Add VFIO_DEVICE_FEATURE_TPH_ST for PCIe TPH steering tag management Chengwen Feng
@ 2026-05-26 6:09 ` sashiko-bot
2026-05-26 22:42 ` Alex Williamson
1 sibling, 0 replies; 15+ messages in thread
From: sashiko-bot @ 2026-05-26 6:09 UTC (permalink / raw)
To: Chengwen Feng; +Cc: kvm, linux-pci
Thank you for your contribution! Sashiko AI review found 3 potential issue(s) to consider:
- [High] The return value of `copy_to_user()` is mishandled in `vfio_pci_tph_op_cpu_query()`, resulting in a positive return value to userspace on memory faults.
- [High] Hardware modifications to the ST table via MSI-X MMIO occur without ensuring the PCI device is in an active power state (e.g., D0), risking PCIe errors.
- [Low] Unconditional memory allocation for the TPH ST shadow table during device initialization, even when the TPH feature is globally disabled.
--
commit 467d413fb2d015dba9001b9372af0200c578a4c6
Author: Chengwen Feng <fengchengwen@huawei.com>
vfio/pci: Add VFIO_DEVICE_FEATURE_TPH_ST for PCIe TPH steering tag management
Add new userspace device feature to access and maintain PCIe TPH Steering
Tag table entries, supporting two mutually exclusive operation modes:
raw table read/write and CPU ID lookup.
> diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
> index 3f8d093aacf8a..8cf341689a478 100644
> --- a/drivers/vfio/pci/vfio_pci_core.c
> +++ b/drivers/vfio/pci/vfio_pci_core.c
[ ... ]
> @@ -1516,6 +1518,159 @@ static int vfio_pci_core_feature_token(struct vfio_pci_core_device *vdev,
[ ... ]
> +static int vfio_pci_tph_op_cpu_query(struct vfio_pci_core_device *vdev,
> + struct vfio_device_feature_tph_st *arg)
> +{
[ ... ]
> + for (i = 0; i < arg->count; i++) {
> + ret = pcie_tph_get_cpu_st(pdev, mtype, cpus[i], &st);
> + if (ret)
> + goto out;
> + cpus[i] = st;
> + }
> +
> + ret = copy_to_user(uptr, cpus, arg->count * sizeof(u32));
> +out:
> + kfree(cpus);
> + return ret;
[Severity: High]
Does this return a positive value to userspace if the copy fails?
copy_to_user() returns the number of bytes that failed to copy, so returning
it directly here instead of a negative error code like -EFAULT might break
the ioctl ABI, as userspace typically checks for negative error codes.
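A small user-space sketch of the difference, assuming a stand-in for copy_to_user(). model_copy_to_user(), query_tail_buggy() and query_tail_fixed() are hypothetical names; the only property carried over from the kernel API is that the copy helper returns the number of bytes it could not copy. A NULL destination models a faulting user pointer.

```c
#include <errno.h>
#include <string.h>

/* Stand-in for copy_to_user(): returns the number of bytes NOT copied. */
static unsigned long model_copy_to_user(void *dst, const void *src,
					unsigned long n)
{
	if (!dst)
		return n;	/* model a fault on the user pointer */
	memcpy(dst, src, n);
	return 0;
}

/* Mirrors the quoted code: the residual byte count leaks out as "ret". */
static int query_tail_buggy(void *uptr, const void *buf, unsigned long n)
{
	return (int)model_copy_to_user(uptr, buf, n);
}

/* Converts any short copy into -EFAULT before returning. */
static int query_tail_fixed(void *uptr, const void *buf, unsigned long n)
{
	if (model_copy_to_user(uptr, buf, n))
		return -EFAULT;
	return 0;
}
```

The fixed variant matches what vfio_pci_tph_op_raw_table() in the same patch already does on its GET path.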
> +}
> +
> +static int vfio_pci_core_feature_tph_st(struct vfio_pci_core_device *vdev,
> + u32 flags,
> + struct vfio_device_feature_tph_st __user *arg,
> + size_t argsz)
> +{
[ ... ]
> + switch (op_type) {
> + case VFIO_TPH_ST_OP_RAW_TABLE:
> + if (is_set && is_get)
> + return -EINVAL;
> + return vfio_pci_tph_op_raw_table(vdev, is_get, &tph_st);
[Severity: High]
Is it possible for the device to be in a suspended power state (e.g., D3hot)
when this ioctl is called?
If userspace triggers this while suspended, the subsequent call to
pcie_tph_set_st_entry() might perform direct MMIO writes to the MSI-X table.
Accessing MMIO on a suspended device can cause PCIe Unsupported Request (UR)
errors or Target Aborts.
Should there be a pm_runtime_resume_and_get() or similar power state check
before accessing hardware registers?
> + case VFIO_TPH_ST_OP_CPU_QUERY:
> + if (is_set)
> + return -EOPNOTSUPP;
> + return vfio_pci_tph_op_cpu_query(vdev, &tph_st);
> + default:
> + return -EINVAL;
> + }
> +}
[ ... ]
> @@ -2097,12 +2254,23 @@ int vfio_pci_core_init_dev(struct vfio_device *core_vdev)
> mutex_init(&vdev->igate);
> spin_lock_init(&vdev->irqlock);
> mutex_init(&vdev->ioeventfds_lock);
> + mutex_init(&vdev->tph_lock);
> + vdev->tph_st_entries = vfio_pci_tph_st_shadow_size(vdev);
> + vdev->tph_st_shadow = NULL;
> + if (vdev->tph_st_entries) {
> + vdev->tph_st_shadow = kcalloc(vdev->tph_st_entries, sizeof(u16),
> + GFP_KERNEL);
[Severity: Low]
Does this code unconditionally allocate memory for the shadow table even when
the enable_unsafe_tph module parameter is false?
While the allocation is relatively small, operations will be rejected with
-EOPNOTSUPP anyway if the feature is disabled globally. Perhaps this
allocation can be avoided entirely in that case to save kernel memory.
> + if (!vdev->tph_st_shadow)
> + return -ENOMEM;
> + }
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260526040830.52854-1-fengchengwen@huawei.com?part=5
* Re: [PATCH v12 6/6] vfio/pci: Add PCIe TPH control register virtualization
2026-05-26 4:08 ` [PATCH v12 6/6] vfio/pci: Add PCIe TPH control register virtualization Chengwen Feng
@ 2026-05-26 6:56 ` sashiko-bot
0 siblings, 0 replies; 15+ messages in thread
From: sashiko-bot @ 2026-05-26 6:56 UTC (permalink / raw)
To: Chengwen Feng; +Cc: kvm, linux-pci
Thank you for your contribution! Sashiko AI review found 3 potential issue(s) to consider:
- [High] The TPH configuration write handler silently drops valid byte-level writes, blocks ST Table config-space writes, and bypasses virtual configuration space updates.
- [High] TPH state is lost and becomes unrecoverable after a device reset due to missing state synchronization.
- [High] `vfio_pci_tph_config_write` performs MMIO accesses without holding `vdev->memory_lock`, racing with concurrent device resets.
--
commit e0337d4fe2416ad1997b3bae043a087facf49ef3
Author: Chengwen Feng <fengchengwen@huawei.com>
vfio/pci: Add PCIe TPH control register virtualization
Implement virtualized handling for PCIe TPH capability control register
writes. Validate and mediate user write requests to accept only valid TPH
mode configurations. Synchronize shadow steering tag table to hardware when
TPH gets enabled successfully.
> diff --git a/drivers/vfio/pci/vfio_pci_config.c b/drivers/vfio/pci/vfio_pci_config.c
> index a10ed733f0e3a..188845f81626d 100644
> --- a/drivers/vfio/pci/vfio_pci_config.c
> +++ b/drivers/vfio/pci/vfio_pci_config.c
[ ... ]
> @@ -313,6 +317,39 @@ static int vfio_virt_config_read(struct vfio_pci_core_device *vdev, int pos,
> return count;
> }
>
> +static int vfio_pci_tph_config_write(struct vfio_pci_core_device *vdev, int pos,
> + int count, struct perm_bits *perm,
> + int offset, __le32 val)
> +{
> + struct pci_dev *pdev = vdev->pdev;
> + u32 data = le32_to_cpu(val);
> + u8 mode, req_en;
> + int i, ret;
> +
> + if (!enable_unsafe_tph)
> + return count;
> +
> + if (offset != PCI_TPH_CTRL || count < 2)
> + return count;
[Severity: High]
Does this silently drop valid 1-byte writes to the control register, or
writes to the upper bytes of the register (e.g. offset 0x09)? This might
break guests that perform byte-wise accesses.
Also, could this drop guest writes to the ST Table if it resides in the
capability space (offset >= 0x0C)?
Additionally, since this function returns without ever updating
vdev->vconfig via vfio_default_config_write() or similar, does this bypass
virtual configuration space updates entirely?
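One way to handle sub-dword accesses is to merge each partial write into a 32-bit virtual copy of the control register and decode the fields from the merged value. The sketch below is a user-space illustration only: merge_partial_write(), ctrl_mode() and ctrl_req_en() are hypothetical helpers, and the masks assume the TPH Control register layout with ST Mode Select in bits 2:0 and TPH Requester Enable in bits 9:8.

```c
#include <stdint.h>

/* Assumed TPH Control register layout (see lead-in). */
#define CTRL_MODE_SEL_MASK  0x00000007u
#define CTRL_REQ_EN_MASK    0x00000300u
#define CTRL_REQ_EN_SHIFT   8

/*
 * Merge a "count"-byte write at byte offset "byte_off" (0..3) inside the
 * register into the current virtual value. Out-of-range requests leave
 * the value untouched.
 */
static uint32_t merge_partial_write(uint32_t cur, unsigned int byte_off,
				    unsigned int count, uint32_t val)
{
	uint32_t mask;

	if (byte_off > 3 || count == 0 || byte_off + count > 4)
		return cur;

	mask = (count == 4) ? 0xffffffffu : ((1u << (count * 8)) - 1);
	mask <<= byte_off * 8;

	return (cur & ~mask) | ((val << (byte_off * 8)) & mask);
}

static unsigned int ctrl_mode(uint32_t reg)
{
	return reg & CTRL_MODE_SEL_MASK;
}

static unsigned int ctrl_req_en(uint32_t reg)
{
	return (reg & CTRL_REQ_EN_MASK) >> CTRL_REQ_EN_SHIFT;
}
```

A guest that sets the requester enable with a 1-byte write at offset 1 and the mode with a 1-byte write at offset 0 then ends up with the same decoded fields as a single dword write.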
> +
> + guard(mutex)(&vdev->tph_lock);
> +
> + mode = FIELD_GET(PCI_TPH_CTRL_MODE_SEL_MASK, data);
> + req_en = FIELD_GET(PCI_TPH_CTRL_REQ_EN_MASK, data);
> + if (req_en) {
> + ret = pcie_enable_tph(pdev, mode);
[Severity: High]
If pcie_enable_tph() succeeds, it updates the hardware register and sets
pdev->tph_enabled = 1. However, since pci_save_state() is not called here,
if a device reset occurs (e.g. via VFIO_DEVICE_RESET), the PCI core
restores the stale saved state where TPH is disabled, clearing the hardware
register.
Since pdev->tph_enabled remains 1 in software, wouldn't subsequent attempts
by the guest to re-enable TPH cause pcie_enable_tph() to immediately return
-EBUSY, permanently blocking the guest from using TPH after a reset?
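The desynchronization described above can be modelled with two pieces of state, a hardware register and a software flag. This is a user-space sketch under stated assumptions: struct model_dev and all model_*() functions are hypothetical, the -EBUSY check follows the behaviour this review attributes to pcie_enable_tph(), and clearing the flag on reset is just one possible resynchronization, not a proposed patch.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: hw_ctrl is the register, sw_enabled is tph_enabled. */
struct model_dev {
	uint32_t hw_ctrl;
	bool sw_enabled;
};

/* Enable refuses to run twice, as the review says pcie_enable_tph() does. */
static int model_enable(struct model_dev *d, uint32_t ctrl)
{
	if (d->sw_enabled)
		return -EBUSY;
	d->hw_ctrl = ctrl;
	d->sw_enabled = true;
	return 0;
}

/* Reset that restores a stale saved state: hardware cleared, flag kept. */
static void model_reset_hw_only(struct model_dev *d)
{
	d->hw_ctrl = 0;
}

/* Reset that also brings the software flag back in line with hardware. */
static void model_reset_with_resync(struct model_dev *d)
{
	model_reset_hw_only(d);
	d->sw_enabled = false;
}
```

After model_reset_hw_only() a second model_enable() fails with -EBUSY even though the register reads zero; after model_reset_with_resync() it succeeds again.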
> + if (ret == 0 && vdev->tph_st_shadow) {
> + for (i = 0; i < vdev->tph_st_entries; i++)
> + pcie_tph_set_st_entry(pdev, i,
> + vdev->tph_st_shadow[i]);
[Severity: High]
If the hardware ST table resides in the MSI-X table, pcie_tph_set_st_entry()
may perform MMIO writes. Since this is executed holding only
vdev->tph_lock, could this race with a concurrent device reset (which
acquires down_write(&vdev->memory_lock) to block MMIO accesses and disables
memory decode)?
Without holding down_read(&vdev->memory_lock) here, could these MMIO writes
interleave with the physical reset or occur while memory decode is disabled,
causing Master Aborts?
> + }
> + } else {
> + pcie_disable_tph(vdev->pdev);
> + }
> +
> + return count;
> +}
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260526040830.52854-1-fengchengwen@huawei.com?part=6
* Re: [PATCH v12 4/6] PCI/TPH: Move tph_req_type initialization into pci_tph_init
2026-05-26 4:08 ` [PATCH v12 4/6] PCI/TPH: Move tph_req_type initialization into pci_tph_init Chengwen Feng
2026-05-26 5:35 ` sashiko-bot
@ 2026-05-26 22:42 ` Alex Williamson
1 sibling, 0 replies; 15+ messages in thread
From: Alex Williamson @ 2026-05-26 22:42 UTC (permalink / raw)
To: Chengwen Feng
Cc: jgg, wathsala.vithanage, helgaas, wei.huang2, wangzhou1,
wangyushan12, liuyonglong, kvm, linux-pci, alex
On Tue, 26 May 2026 12:08:28 +0800
Chengwen Feng <fengchengwen@huawei.com> wrote:
> Relocate tph_req_type resolution logic from pcie_enable_tph() to
> pci_tph_init(). The request type is fixed per device and root port topology
> at probe time, no need recalculation on each TPH enable.
>
> Also drop redundant tph_req_type reset in pcie_disable_tph(), the value
> remains valid across disable/enable cycles.
>
> This change allows pcie_tph_get_cpu_st() to work properly and retrieve
> valid steering tag values even when TPH is not enabled on the device.
>
> Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
> ---
> drivers/pci/tph.c | 33 ++++++++++++++++-----------------
> 1 file changed, 16 insertions(+), 17 deletions(-)
>
> diff --git a/drivers/pci/tph.c b/drivers/pci/tph.c
> index 95e2a95055ee..3660ad5d3623 100644
> --- a/drivers/pci/tph.c
> +++ b/drivers/pci/tph.c
> @@ -371,7 +371,6 @@ void pcie_disable_tph(struct pci_dev *pdev)
> pci_write_config_dword(pdev, pdev->tph_cap + PCI_TPH_CTRL, 0);
>
> pdev->tph_mode = 0;
> - pdev->tph_req_type = 0;
> pdev->tph_enabled = 0;
> }
> EXPORT_SYMBOL(pcie_disable_tph);
> @@ -396,7 +395,6 @@ int pcie_enable_tph(struct pci_dev *pdev, int mode)
> {
> u32 reg;
> u8 dev_modes;
> - u8 rp_req_type;
>
> /* Honor "notph" kernel parameter */
> if (pci_tph_disabled)
> @@ -416,21 +414,6 @@ int pcie_enable_tph(struct pci_dev *pdev, int mode)
>
> pdev->tph_mode = mode;
>
> - /* Get req_type supported by device and its Root Port */
> - pci_read_config_dword(pdev, pdev->tph_cap + PCI_TPH_CAP, &reg);
> - if (FIELD_GET(PCI_TPH_CAP_EXT_TPH, reg))
> - pdev->tph_req_type = PCI_TPH_REQ_EXT_TPH;
> - else
> - pdev->tph_req_type = PCI_TPH_REQ_TPH_ONLY;
> -
> - /* Check if the device is behind a Root Port */
> - if (pci_pcie_type(pdev) != PCI_EXP_TYPE_RC_END) {
> - rp_req_type = get_rp_completer_type(pdev);
> -
> - /* Final req_type is the smallest value of two */
> - pdev->tph_req_type = min(pdev->tph_req_type, rp_req_type);
> - }
> -
> if (pdev->tph_req_type == PCI_TPH_REQ_DISABLE)
> return -EINVAL;
>
> @@ -538,11 +521,27 @@ void pci_tph_init(struct pci_dev *pdev)
> {
> int num_entries;
> u32 save_size;
> + u8 rp_req_type;
> + u32 reg = 0;
>
> pdev->tph_cap = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_TPH);
> if (!pdev->tph_cap)
> return;
>
> + /* Get req_type supported by device and its Root Port */
> + pci_read_config_dword(pdev, pdev->tph_cap + PCI_TPH_CAP, &reg);
> + if (FIELD_GET(PCI_TPH_CAP_EXT_TPH, reg))
> + pdev->tph_req_type = PCI_TPH_REQ_EXT_TPH;
> + else
> + pdev->tph_req_type = PCI_TPH_REQ_TPH_ONLY;
> +
> + /* Check if the device is behind a Root Port */
> + if (pci_pcie_type(pdev) != PCI_EXP_TYPE_RC_END) {
> + rp_req_type = get_rp_completer_type(pdev);
> + /* Final req_type is the smallest value of two */
> + pdev->tph_req_type = min(pdev->tph_req_type, rp_req_type);
> + }
> +
> num_entries = pcie_tph_get_st_table_size(pdev);
> save_size = sizeof(u32) + num_entries * sizeof(u16);
> pci_add_ext_cap_save_buffer(pdev, PCI_EXT_CAP_ID_TPH, save_size);
There's a virtualization problem hidden here that we haven't discussed
yet. tph_req_type goes on to define how pcie_enable_tph(),
pcie_tph_get_cpu_st(), and pcie_tph_set_st_entry() work relative to
standard or extended TPH. The user has no visibility or control of
whether these interfaces enable extended TPH mode or get/set extended
TPH values.
That not only means that the virtualization of the TPH capability is
broken (user writes one value to TPH Requester Enable, sees another),
but I think it also breaks DS mode if the device implementation
depends on the width, and also breaks interoperability with Zhiping's
series. The user doesn't know whether to take the 8-bit or 16-bit
steering tags. In fact, the kernel's mode selection only considers the
path to the root port completer and not to other devices and could
enable TPH between devices in incompatible modes.
Minimally it seems the Extended TPH Requester Supported register needs
to be virtualized, restricting it based on the topology support, and
the user written operating mode needs to be honored, but that also
presents a challenge in how we represent/interpret the steering tags
to/from the user. Thanks,
Alex
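One possible reading of the request above, sketched in user-space C. Everything here is an assumption for illustration, not a proposed uAPI: the helper names are hypothetical, the encodings assume the TPH Requester Enable values 0 (disabled), 1 (TPH only) and 3 (TPH plus Extended TPH) with 2 reserved, and rejecting an unsupported request rather than silently downgrading it is a design choice made only for this sketch.

```c
/* Assumed TPH Requester Enable encodings (2 is reserved). */
enum model_req_type {
	REQ_DISABLE  = 0,
	REQ_TPH_ONLY = 1,
	REQ_EXT_TPH  = 3,
};

/* Topology limit: the smaller of what the device and Root Port support. */
static unsigned int topo_req_type(int dev_ext_tph, unsigned int rp_req_type)
{
	unsigned int dev = dev_ext_tph ? REQ_EXT_TPH : REQ_TPH_ONLY;

	return dev < rp_req_type ? dev : rp_req_type;
}

/* What a virtualized "Extended TPH Requester Supported" bit would show. */
static int virt_ext_tph_supported(unsigned int topo_max)
{
	return topo_max == REQ_EXT_TPH;
}

/*
 * Honor the mode the user wrote: return it unchanged when the topology
 * permits it, or -1 when it is reserved or exceeds the topology limit.
 */
static int resolve_user_req(unsigned int user_req, unsigned int topo_max)
{
	if (user_req == 2 || user_req > REQ_EXT_TPH)
		return -1;
	if (user_req > topo_max)
		return -1;
	return (int)user_req;
}

/* Steering tag width the user should expect for the resolved mode. */
static unsigned int st_width_bits(unsigned int req)
{
	return req == REQ_EXT_TPH ? 16 : 8;
}
```

Under this model the value the user reads back always equals the value the user wrote, and the tag width follows from that value alone.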
* Re: [PATCH v12 5/6] vfio/pci: Add VFIO_DEVICE_FEATURE_TPH_ST for PCIe TPH steering tag management
2026-05-26 4:08 ` [PATCH v12 5/6] vfio/pci: Add VFIO_DEVICE_FEATURE_TPH_ST for PCIe TPH steering tag management Chengwen Feng
2026-05-26 6:09 ` sashiko-bot
@ 2026-05-26 22:42 ` Alex Williamson
2026-05-27 9:54 ` fengchengwen
1 sibling, 1 reply; 15+ messages in thread
From: Alex Williamson @ 2026-05-26 22:42 UTC (permalink / raw)
To: Chengwen Feng
Cc: jgg, wathsala.vithanage, helgaas, wei.huang2, wangzhou1,
wangyushan12, liuyonglong, kvm, linux-pci, alex
On Tue, 26 May 2026 12:08:29 +0800
Chengwen Feng <fengchengwen@huawei.com> wrote:
> Add new userspace device feature to access and maintain PCIe TPH Steering
> Tag table entries, supporting two mutually exclusive operation modes:
> 1. Raw table read/write: Operate shadow ST table, sync updates to hardware
> TPH capability or MSI-X table. Failed batch write triggers entry
> rollback to keep hardware consistent.
> 2. CPU ID lookup: Query translated steering tag by specified memory type,
> pure read-only without modifying ST table.
>
> Introduce enable_unsafe_tph module parameter to gate this non-isolated TPH
> related feature. All operations are protected by per-device mutex, shadow
> table lifecycle is managed along device init/release.
>
> Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
> ---
> drivers/vfio/pci/vfio_pci.c | 13 ++-
> drivers/vfio/pci/vfio_pci_core.c | 176 ++++++++++++++++++++++++++++++-
> include/linux/vfio_pci_core.h | 6 +-
> include/uapi/linux/vfio.h | 49 +++++++++
> 4 files changed, 240 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
> index 0c771064c0b8..6d73668459cf 100644
> --- a/drivers/vfio/pci/vfio_pci.c
> +++ b/drivers/vfio/pci/vfio_pci.c
> @@ -60,6 +60,12 @@ static bool disable_denylist;
> module_param(disable_denylist, bool, 0444);
> MODULE_PARM_DESC(disable_denylist, "Disable use of device denylist. Disabling the denylist allows binding to devices with known errata that may lead to exploitable stability or security issues when accessed by untrusted users.");
>
> +#ifdef CONFIG_PCIE_TPH
> +static bool enable_unsafe_tph;
> +module_param(enable_unsafe_tph, bool, 0444);
> +MODULE_PARM_DESC(enable_unsafe_tph, "Enable PCIe TPH (Transaction Processing Hints) support. It may break platform isolation. If you do not know what this is for, step away. (default: false)");
> +#endif
> +
> static bool vfio_pci_dev_in_denylist(struct pci_dev *pdev)
> {
> switch (pdev->vendor) {
> @@ -257,12 +263,17 @@ static int __init vfio_pci_init(void)
> {
> int ret;
> bool is_disable_vga = true;
> + bool is_enable_unsafe_tph = false;
>
> #ifdef CONFIG_VFIO_PCI_VGA
> is_disable_vga = disable_vga;
> #endif
> +#ifdef CONFIG_PCIE_TPH
> + is_enable_unsafe_tph = enable_unsafe_tph;
> +#endif
>
> - vfio_pci_core_set_params(nointxmask, is_disable_vga, disable_idle_d3);
> + vfio_pci_core_set_params(nointxmask, is_disable_vga, disable_idle_d3,
> + is_enable_unsafe_tph);
>
> /* Register and scan for devices */
> ret = pci_register_driver(&vfio_pci_driver);
> diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
> index 050e7542952e..6fc7496cb8dd 100644
> --- a/drivers/vfio/pci/vfio_pci_core.c
> +++ b/drivers/vfio/pci/vfio_pci_core.c
> @@ -29,6 +29,7 @@
> #include <linux/sched/mm.h>
> #include <linux/iommufd.h>
> #include <linux/pci-p2pdma.h>
> +#include <linux/pci-tph.h>
> #if IS_ENABLED(CONFIG_EEH)
> #include <asm/eeh.h>
> #endif
> @@ -41,6 +42,7 @@
> static bool nointxmask;
> static bool disable_vga;
> static bool disable_idle_d3;
> +static bool enable_unsafe_tph;
>
> static void vfio_pci_eventfd_rcu_free(struct rcu_head *rcu)
> {
> @@ -1551,6 +1553,159 @@ static int vfio_pci_core_feature_token(struct vfio_pci_core_device *vdev,
> return 0;
> }
>
> +static int vfio_pci_tph_st_shadow_size(struct vfio_pci_core_device *vdev)
> +{
> + struct pci_dev *pdev = vdev->pdev;
> + u32 loc = pcie_tph_get_st_table_loc(pdev);
> + int ret;
> +
> + if (loc == PCI_TPH_LOC_CAP) {
> + return pcie_tph_get_st_table_size(pdev);
> + } else if (loc == PCI_TPH_LOC_MSIX) {
> + ret = pci_msix_vec_count(pdev);
> + if (ret < 0)
> + return 0;
> + return ret;
> + } else {
> + return 0;
> + }
> +}
> +
> +static int vfio_pci_tph_op_raw_table(struct vfio_pci_core_device *vdev,
> + bool is_get,
> + struct vfio_device_feature_tph_st *arg)
> +{
> + void __user *uptr = u64_to_user_ptr(arg->data_uptr);
> + size_t sz = arg->count * sizeof(u16);
> + struct pci_dev *pdev = vdev->pdev;
> + int i, idx, ret;
> + u16 *sts;
> +
> + if (!vdev->tph_st_shadow)
> + return -EOPNOTSUPP;
> +
> + if (arg->flags & ~VFIO_TPH_ST_OP_TYPE_MASK)
> + return -EINVAL;
> + if (arg->count == 0 || arg->index >= vdev->tph_st_entries ||
> + arg->count > vdev->tph_st_entries ||
> + arg->index + arg->count > vdev->tph_st_entries)
> + return -EINVAL;
> +
> + if (is_get) {
> + ret = copy_to_user(uptr, &vdev->tph_st_shadow[arg->index], sz);
> + if (ret)
> + return -EFAULT;
> + return 0;
> + }
> +
> + sts = memdup_array_user(uptr, arg->count, sizeof(u16));
> + if (IS_ERR(sts))
> + return PTR_ERR(sts);
> +
> + if (pcie_tph_enabled_mode(vdev->pdev) < 0) {
> + memcpy(&vdev->tph_st_shadow[arg->index], sts, sz);
> + kfree(sts);
> + return 0;
> + }
> +
> + for (i = 0; i < arg->count; i++) {
> + idx = arg->index + i;
> + ret = pcie_tph_set_st_entry(pdev, idx, sts[i]);
> + if (ret)
> + goto rollback;
> + }
> +
> + memcpy(&vdev->tph_st_shadow[arg->index], sts, sz);
> + kfree(sts);
> + return 0;
> +
> +rollback:
> + while (i-- > 0) {
> + idx = arg->index + i;
> + pcie_tph_set_st_entry(pdev, idx, vdev->tph_st_shadow[idx]);
> + }
> + kfree(sts);
> + return ret;
> +}
> +
> +static int vfio_pci_tph_op_cpu_query(struct vfio_pci_core_device *vdev,
> + struct vfio_device_feature_tph_st *arg)
> +{
> + void __user *uptr = u64_to_user_ptr(arg->data_uptr);
> + struct pci_dev *pdev = vdev->pdev;
> + enum tph_mem_type mtype;
> + int i, ret;
> + u32 *cpus;
> + u16 st;
> +
> + if (arg->flags & ~(VFIO_TPH_ST_OP_TYPE_MASK | VFIO_TPH_ST_MEM_TYPE_MASK))
> + return -EINVAL;
> + if (arg->count == 0 || arg->count > nr_cpu_ids || arg->index != 0)
> + return -EINVAL;
> +
> + cpus = memdup_array_user(uptr, arg->count, sizeof(u32));
> + if (IS_ERR(cpus))
> + return PTR_ERR(cpus);
> +
> + mtype = (arg->flags & VFIO_TPH_ST_MEM_TYPE_MASK) == VFIO_TPH_ST_MEM_TYPE_VM ?
> + TPH_MEM_TYPE_VM : TPH_MEM_TYPE_PM;
> + for (i = 0; i < arg->count; i++) {
> + ret = pcie_tph_get_cpu_st(pdev, mtype, cpus[i], &st);
> + if (ret)
> + goto out;
> + cpus[i] = st;
> + }
> +
> + ret = copy_to_user(uptr, cpus, arg->count * sizeof(u32));
> +out:
> + kfree(cpus);
> + return ret;
> +}
> +
> +static int vfio_pci_core_feature_tph_st(struct vfio_pci_core_device *vdev,
> + u32 flags,
> + struct vfio_device_feature_tph_st __user *arg,
> + size_t argsz)
> +{
> + struct vfio_device_feature_tph_st tph_st;
> + bool is_get, is_set;
> + u32 op_type;
> + int ret;
> +
> + if (!enable_unsafe_tph)
> + return -EOPNOTSUPP;
> +
> + ret = vfio_check_feature(flags, argsz,
> + VFIO_DEVICE_FEATURE_GET |
> + VFIO_DEVICE_FEATURE_SET |
> + VFIO_DEVICE_FEATURE_PROBE,
> + sizeof(tph_st));
> + if (ret <= 0)
> + return ret;
> +
> + if (copy_from_user(&tph_st, arg, sizeof(tph_st)))
> + return -EFAULT;
> +
> + op_type = tph_st.flags & VFIO_TPH_ST_OP_TYPE_MASK;
> + is_get = !!(flags & VFIO_DEVICE_FEATURE_GET);
> + is_set = !!(flags & VFIO_DEVICE_FEATURE_SET);
> +
> + guard(mutex)(&vdev->tph_lock);
> +
> + switch (op_type) {
> + case VFIO_TPH_ST_OP_RAW_TABLE:
> + if (is_set && is_get)
> + return -EINVAL;
> + return vfio_pci_tph_op_raw_table(vdev, is_get, &tph_st);
> + case VFIO_TPH_ST_OP_CPU_QUERY:
> + if (is_set)
> + return -EOPNOTSUPP;
> + return vfio_pci_tph_op_cpu_query(vdev, &tph_st);
> + default:
> + return -EINVAL;
> + }
> +}
> +
> int vfio_pci_core_ioctl_feature(struct vfio_device *device, u32 flags,
> void __user *arg, size_t argsz)
> {
> @@ -1569,6 +1724,8 @@ int vfio_pci_core_ioctl_feature(struct vfio_device *device, u32 flags,
> return vfio_pci_core_feature_token(vdev, flags, arg, argsz);
> case VFIO_DEVICE_FEATURE_DMA_BUF:
> return vfio_pci_core_feature_dma_buf(vdev, flags, arg, argsz);
> + case VFIO_DEVICE_FEATURE_TPH_ST:
> + return vfio_pci_core_feature_tph_st(vdev, flags, arg, argsz);
> default:
> return -ENOTTY;
> }
> @@ -2132,12 +2289,23 @@ int vfio_pci_core_init_dev(struct vfio_device *core_vdev)
> mutex_init(&vdev->igate);
> spin_lock_init(&vdev->irqlock);
> mutex_init(&vdev->ioeventfds_lock);
> + mutex_init(&vdev->tph_lock);
> + vdev->tph_st_entries = vfio_pci_tph_st_shadow_size(vdev);
> + vdev->tph_st_shadow = NULL;
> + if (vdev->tph_st_entries) {
> + vdev->tph_st_shadow = kcalloc(vdev->tph_st_entries, sizeof(u16),
> + GFP_KERNEL);
> + if (!vdev->tph_st_shadow)
> + return -ENOMEM;
> + }
> INIT_LIST_HEAD(&vdev->dummy_resources_list);
> INIT_LIST_HEAD(&vdev->ioeventfds_list);
> INIT_LIST_HEAD(&vdev->sriov_pfs_item);
> ret = pcim_p2pdma_init(vdev->pdev);
> - if (ret && ret != -EOPNOTSUPP)
> + if (ret && ret != -EOPNOTSUPP) {
> + kfree(vdev->tph_st_shadow);
> return ret;
> + }
> INIT_LIST_HEAD(&vdev->dmabufs);
> init_rwsem(&vdev->memory_lock);
> xa_init(&vdev->ctx);
> @@ -2153,6 +2321,8 @@ void vfio_pci_core_release_dev(struct vfio_device *core_vdev)
>
> mutex_destroy(&vdev->igate);
> mutex_destroy(&vdev->ioeventfds_lock);
> + mutex_destroy(&vdev->tph_lock);
> + kfree(vdev->tph_st_shadow);
> kfree(vdev->region);
> kfree(vdev->pm_save);
> }
> @@ -2605,11 +2775,13 @@ static void vfio_pci_dev_set_try_reset(struct vfio_device_set *dev_set)
> }
>
> void vfio_pci_core_set_params(bool is_nointxmask, bool is_disable_vga,
> - bool is_disable_idle_d3)
> + bool is_disable_idle_d3,
> + bool is_enable_unsafe_tph)
> {
> nointxmask = is_nointxmask;
> disable_vga = is_disable_vga;
> disable_idle_d3 = is_disable_idle_d3;
> + enable_unsafe_tph = is_enable_unsafe_tph;
> }
> EXPORT_SYMBOL_GPL(vfio_pci_core_set_params);
>
> diff --git a/include/linux/vfio_pci_core.h b/include/linux/vfio_pci_core.h
> index 89165b769e5c..ae8e7011ab8e 100644
> --- a/include/linux/vfio_pci_core.h
> +++ b/include/linux/vfio_pci_core.h
> @@ -142,6 +142,9 @@ struct vfio_pci_core_device {
> struct notifier_block nb;
> struct rw_semaphore memory_lock;
> struct list_head dmabufs;
> + struct mutex tph_lock;
> + u16 *tph_st_shadow;
> + u16 tph_st_entries;
> };
>
> enum vfio_pci_io_width {
> @@ -157,7 +160,8 @@ int vfio_pci_core_register_dev_region(struct vfio_pci_core_device *vdev,
> const struct vfio_pci_regops *ops,
> size_t size, u32 flags, void *data);
> void vfio_pci_core_set_params(bool nointxmask, bool is_disable_vga,
> - bool is_disable_idle_d3);
> + bool is_disable_idle_d3,
> + bool is_enable_unsafe_tph);
> void vfio_pci_core_close_device(struct vfio_device *core_vdev);
> int vfio_pci_core_init_dev(struct vfio_device *core_vdev);
> void vfio_pci_core_release_dev(struct vfio_device *core_vdev);
> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> index 5de618a3a5ee..c76196f93660 100644
> --- a/include/uapi/linux/vfio.h
> +++ b/include/uapi/linux/vfio.h
> @@ -1534,6 +1534,55 @@ struct vfio_device_feature_dma_buf {
> */
> #define VFIO_DEVICE_FEATURE_MIG_PRECOPY_INFOv2 12
>
> +/**
> + * VFIO_DEVICE_FEATURE_TPH_ST - Manage PCIe TPH Steering Tag entries
> + *
> + * Provides userspace interface to manage PCIe TPH ST table entries.
> + *
> + * @flags: Composite control flags
> + * Operation type[bit0~3]: Exclusive operation selection
> + * Attribute bits[bit4~31]: Additional property parameters
> + *
> + * @index: Start entry offset, only valid for raw table operation
> + * @count: Number of consecutive entries to operate
> + * @data_uptr: Aligned userspace data buffer pointer
> + *
> + * VFIO_TPH_ST_OP_RAW_TABLE type:
> + * Access raw ST table entry directly.
> + * Attribute bits are ignored in this operation.
> + * Userspace data buffer stores 16-bit raw steering tag values.
> + * SET writes entries, and GET reads existing raw ST entries back to user
> + * buffer.
> + *
> + * VFIO_TPH_ST_OP_CPU_QUERY type:
> + * Resolve ST tag from CPU ID, only supports GET operation.
> + * Attribute bits carry memory type info.
> + * Userspace data buffer provides 32-bit CPU IDs, kernel returns translated
> + * 16-bit ST tag according to specified memory type. No modification to ST
> + * table during query.
> + *
> + * This feature is gated by enable_unsafe_tph module parameter.
> + */
> +#define VFIO_DEVICE_FEATURE_TPH_ST 13
> +
> +struct vfio_device_feature_tph_st {
> + __u32 flags;
> +
> +/* Operation type field */
> +#define VFIO_TPH_ST_OP_TYPE_MASK 0xFu
> +#define VFIO_TPH_ST_OP_RAW_TABLE 0x0u
> +#define VFIO_TPH_ST_OP_CPU_QUERY 0x1u
> +
> +/* Attribute bits for CPU query operation type */
> +#define VFIO_TPH_ST_MEM_TYPE_MASK (1u << 4)
> +#define VFIO_TPH_ST_MEM_TYPE_VM (0u << 4)
> +#define VFIO_TPH_ST_MEM_TYPE_PM (1u << 4)
> +
> + __u16 index;
> + __u16 count;
> + __aligned_u64 data_uptr;
> +};
This is already a multiplexed ioctl, don't add another level of
multiplexing, use two separate features.

If the user is now handling the raw ST, we don't need GET support on
the ST entry feature.

The CPU lookup/xlate feature is on pretty thin standing for why it
needs to be implemented in vfio-pci. I think it's largely here because
there's no other obvious place for it and a sysfs implementation would
be massive if we need 2 (vm/pm) * 2 (8/16-bit) * NR_CPUS attributes per
root port.

As noted in the previous patch, there still seem to be gaps in the
virtualization and user ownership of the TPH mode enabled on the
device. The interface proposed here doesn't seem to fully support
emulation of the TPH capability from a virtualization perspective.

Thanks,
Alex
* Re: [PATCH v12 5/6] vfio/pci: Add VFIO_DEVICE_FEATURE_TPH_ST for PCIe TPH steering tag management
2026-05-26 22:42 ` Alex Williamson
@ 2026-05-27 9:54 ` fengchengwen
0 siblings, 0 replies; 15+ messages in thread
From: fengchengwen @ 2026-05-27 9:54 UTC (permalink / raw)
To: Alex Williamson
Cc: jgg, wathsala.vithanage, helgaas, wei.huang2, wangzhou1,
wangyushan12, liuyonglong, kvm, linux-pci
On 5/27/2026 6:42 AM, Alex Williamson wrote:
> On Tue, 26 May 2026 12:08:29 +0800
> Chengwen Feng <fengchengwen@huawei.com> wrote:
>
>> Add new userspace device feature to access and maintain PCIe TPH Steering
>> Tag table entries, supporting two mutually exclusive operation modes:
>> 1. Raw table read/write: Operate shadow ST table, sync updates to hardware
>> TPH capability or MSI-X table. Failed batch write triggers entry
>> rollback to keep hardware consistent.
>> 2. CPU ID lookup: Query translated steering tag by specified memory type,
>> pure read-only without modifying ST table.
>>
>> Introduce enable_unsafe_tph module parameter to gate this non-isolated TPH
>> related feature. All operations are protected by per-device mutex, shadow
>> table lifecycle is managed along device init/release.
>>
>> Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
>> ---
>> drivers/vfio/pci/vfio_pci.c | 13 ++-
>> drivers/vfio/pci/vfio_pci_core.c | 176 ++++++++++++++++++++++++++++++-
>> include/linux/vfio_pci_core.h | 6 +-
>> include/uapi/linux/vfio.h | 49 +++++++++
>> 4 files changed, 240 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
>> index 0c771064c0b8..6d73668459cf 100644
>> --- a/drivers/vfio/pci/vfio_pci.c
>> +++ b/drivers/vfio/pci/vfio_pci.c
>> @@ -60,6 +60,12 @@ static bool disable_denylist;
>> module_param(disable_denylist, bool, 0444);
>> MODULE_PARM_DESC(disable_denylist, "Disable use of device denylist. Disabling the denylist allows binding to devices with known errata that may lead to exploitable stability or security issues when accessed by untrusted users.");
>>
>> +#ifdef CONFIG_PCIE_TPH
>> +static bool enable_unsafe_tph;
>> +module_param(enable_unsafe_tph, bool, 0444);
>> +MODULE_PARM_DESC(enable_unsafe_tph, "Enable PCIe TPH (Transaction Processing Hints) support. It may break platform isolation. If you do not know what this is for, step away. (default: false)");
>> +#endif
>> +
>> static bool vfio_pci_dev_in_denylist(struct pci_dev *pdev)
>> {
>> switch (pdev->vendor) {
>> @@ -257,12 +263,17 @@ static int __init vfio_pci_init(void)
>> {
>> int ret;
>> bool is_disable_vga = true;
>> + bool is_enable_unsafe_tph = false;
>>
>> #ifdef CONFIG_VFIO_PCI_VGA
>> is_disable_vga = disable_vga;
>> #endif
>> +#ifdef CONFIG_PCIE_TPH
>> + is_enable_unsafe_tph = enable_unsafe_tph;
>> +#endif
>>
>> - vfio_pci_core_set_params(nointxmask, is_disable_vga, disable_idle_d3);
>> + vfio_pci_core_set_params(nointxmask, is_disable_vga, disable_idle_d3,
>> + is_enable_unsafe_tph);
>>
>> /* Register and scan for devices */
>> ret = pci_register_driver(&vfio_pci_driver);
>> diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
>> index 050e7542952e..6fc7496cb8dd 100644
>> --- a/drivers/vfio/pci/vfio_pci_core.c
>> +++ b/drivers/vfio/pci/vfio_pci_core.c
>> @@ -29,6 +29,7 @@
>> #include <linux/sched/mm.h>
>> #include <linux/iommufd.h>
>> #include <linux/pci-p2pdma.h>
>> +#include <linux/pci-tph.h>
>> #if IS_ENABLED(CONFIG_EEH)
>> #include <asm/eeh.h>
>> #endif
>> @@ -41,6 +42,7 @@
>> static bool nointxmask;
>> static bool disable_vga;
>> static bool disable_idle_d3;
>> +static bool enable_unsafe_tph;
>>
>> static void vfio_pci_eventfd_rcu_free(struct rcu_head *rcu)
>> {
>> @@ -1551,6 +1553,159 @@ static int vfio_pci_core_feature_token(struct vfio_pci_core_device *vdev,
>> return 0;
>> }
>>
>> +static int vfio_pci_tph_st_shadow_size(struct vfio_pci_core_device *vdev)
>> +{
>> + struct pci_dev *pdev = vdev->pdev;
>> + u32 loc = pcie_tph_get_st_table_loc(pdev);
>> + int ret;
>> +
>> + if (loc == PCI_TPH_LOC_CAP) {
>> + return pcie_tph_get_st_table_size(pdev);
>> + } else if (loc == PCI_TPH_LOC_MSIX) {
>> + ret = pci_msix_vec_count(pdev);
>> + if (ret < 0)
>> + return 0;
>> + return ret;
>> + } else {
>> + return 0;
>> + }
>> +}
>> +
>> +static int vfio_pci_tph_op_raw_table(struct vfio_pci_core_device *vdev,
>> + bool is_get,
>> + struct vfio_device_feature_tph_st *arg)
>> +{
>> + void __user *uptr = u64_to_user_ptr(arg->data_uptr);
>> + size_t sz = arg->count * sizeof(u16);
>> + struct pci_dev *pdev = vdev->pdev;
>> + int i, idx, ret;
>> + u16 *sts;
>> +
>> + if (!vdev->tph_st_shadow)
>> + return -EOPNOTSUPP;
>> +
>> + if (arg->flags & ~VFIO_TPH_ST_OP_TYPE_MASK)
>> + return -EINVAL;
>> + if (arg->count == 0 || arg->index >= vdev->tph_st_entries ||
>> + arg->count > vdev->tph_st_entries ||
>> + arg->index + arg->count > vdev->tph_st_entries)
>> + return -EINVAL;
>> +
>> + if (is_get) {
>> + ret = copy_to_user(uptr, &vdev->tph_st_shadow[arg->index], sz);
>> + if (ret)
>> + return -EFAULT;
>> + return 0;
>> + }
>> +
>> + sts = memdup_array_user(uptr, arg->count, sizeof(u16));
>> + if (IS_ERR(sts))
>> + return PTR_ERR(sts);
>> +
>> + if (pcie_tph_enabled_mode(vdev->pdev) < 0) {
>> + memcpy(&vdev->tph_st_shadow[arg->index], sts, sz);
>> + kfree(sts);
>> + return 0;
>> + }
>> +
>> + for (i = 0; i < arg->count; i++) {
>> + idx = arg->index + i;
>> + ret = pcie_tph_set_st_entry(pdev, idx, sts[i]);
>> + if (ret)
>> + goto rollback;
>> + }
>> +
>> + memcpy(&vdev->tph_st_shadow[arg->index], sts, sz);
>> + kfree(sts);
>> + return 0;
>> +
>> +rollback:
>> + while (i-- > 0) {
>> + idx = arg->index + i;
>> + pcie_tph_set_st_entry(pdev, idx, vdev->tph_st_shadow[idx]);
>> + }
>> + kfree(sts);
>> + return ret;
>> +}
>> +
>> +static int vfio_pci_tph_op_cpu_query(struct vfio_pci_core_device *vdev,
>> + struct vfio_device_feature_tph_st *arg)
>> +{
>> + void __user *uptr = u64_to_user_ptr(arg->data_uptr);
>> + struct pci_dev *pdev = vdev->pdev;
>> + enum tph_mem_type mtype;
>> + int i, ret;
>> + u32 *cpus;
>> + u16 st;
>> +
>> + if (arg->flags & ~(VFIO_TPH_ST_OP_TYPE_MASK | VFIO_TPH_ST_MEM_TYPE_MASK))
>> + return -EINVAL;
>> + if (arg->count == 0 || arg->count > nr_cpu_ids || arg->index != 0)
>> + return -EINVAL;
>> +
>> + cpus = memdup_array_user(uptr, arg->count, sizeof(u32));
>> + if (IS_ERR(cpus))
>> + return PTR_ERR(cpus);
>> +
>> + mtype = (arg->flags & VFIO_TPH_ST_MEM_TYPE_MASK) == VFIO_TPH_ST_MEM_TYPE_VM ?
>> + TPH_MEM_TYPE_VM : TPH_MEM_TYPE_PM;
>> + for (i = 0; i < arg->count; i++) {
>> + ret = pcie_tph_get_cpu_st(pdev, mtype, cpus[i], &st);
>> + if (ret)
>> + goto out;
>> + cpus[i] = st;
>> + }
>> +
>> + ret = copy_to_user(uptr, cpus, arg->count * sizeof(u32));
>> +out:
>> + kfree(cpus);
>> + return ret;
>> +}
>> +
>> +static int vfio_pci_core_feature_tph_st(struct vfio_pci_core_device *vdev,
>> + u32 flags,
>> + struct vfio_device_feature_tph_st __user *arg,
>> + size_t argsz)
>> +{
>> + struct vfio_device_feature_tph_st tph_st;
>> + bool is_get, is_set;
>> + u32 op_type;
>> + int ret;
>> +
>> + if (!enable_unsafe_tph)
>> + return -EOPNOTSUPP;
>> +
>> + ret = vfio_check_feature(flags, argsz,
>> + VFIO_DEVICE_FEATURE_GET |
>> + VFIO_DEVICE_FEATURE_SET |
>> + VFIO_DEVICE_FEATURE_PROBE,
>> + sizeof(tph_st));
>> + if (ret <= 0)
>> + return ret;
>> +
>> + if (copy_from_user(&tph_st, arg, sizeof(tph_st)))
>> + return -EFAULT;
>> +
>> + op_type = tph_st.flags & VFIO_TPH_ST_OP_TYPE_MASK;
>> + is_get = !!(flags & VFIO_DEVICE_FEATURE_GET);
>> + is_set = !!(flags & VFIO_DEVICE_FEATURE_SET);
>> +
>> + guard(mutex)(&vdev->tph_lock);
>> +
>> + switch (op_type) {
>> + case VFIO_TPH_ST_OP_RAW_TABLE:
>> + if (is_set && is_get)
>> + return -EINVAL;
>> + return vfio_pci_tph_op_raw_table(vdev, is_get, &tph_st);
>> + case VFIO_TPH_ST_OP_CPU_QUERY:
>> + if (is_set)
>> + return -EOPNOTSUPP;
>> + return vfio_pci_tph_op_cpu_query(vdev, &tph_st);
>> + default:
>> + return -EINVAL;
>> + }
>> +}
>> +
>> int vfio_pci_core_ioctl_feature(struct vfio_device *device, u32 flags,
>> void __user *arg, size_t argsz)
>> {
>> @@ -1569,6 +1724,8 @@ int vfio_pci_core_ioctl_feature(struct vfio_device *device, u32 flags,
>> return vfio_pci_core_feature_token(vdev, flags, arg, argsz);
>> case VFIO_DEVICE_FEATURE_DMA_BUF:
>> return vfio_pci_core_feature_dma_buf(vdev, flags, arg, argsz);
>> + case VFIO_DEVICE_FEATURE_TPH_ST:
>> + return vfio_pci_core_feature_tph_st(vdev, flags, arg, argsz);
>> default:
>> return -ENOTTY;
>> }
>> @@ -2132,12 +2289,23 @@ int vfio_pci_core_init_dev(struct vfio_device *core_vdev)
>> mutex_init(&vdev->igate);
>> spin_lock_init(&vdev->irqlock);
>> mutex_init(&vdev->ioeventfds_lock);
>> + mutex_init(&vdev->tph_lock);
>> + vdev->tph_st_entries = vfio_pci_tph_st_shadow_size(vdev);
>> + vdev->tph_st_shadow = NULL;
>> + if (vdev->tph_st_entries) {
>> + vdev->tph_st_shadow = kcalloc(vdev->tph_st_entries, sizeof(u16),
>> + GFP_KERNEL);
>> + if (!vdev->tph_st_shadow)
>> + return -ENOMEM;
>> + }
>> INIT_LIST_HEAD(&vdev->dummy_resources_list);
>> INIT_LIST_HEAD(&vdev->ioeventfds_list);
>> INIT_LIST_HEAD(&vdev->sriov_pfs_item);
>> ret = pcim_p2pdma_init(vdev->pdev);
>> - if (ret && ret != -EOPNOTSUPP)
>> + if (ret && ret != -EOPNOTSUPP) {
>> + kfree(vdev->tph_st_shadow);
>> return ret;
>> + }
>> INIT_LIST_HEAD(&vdev->dmabufs);
>> init_rwsem(&vdev->memory_lock);
>> xa_init(&vdev->ctx);
>> @@ -2153,6 +2321,8 @@ void vfio_pci_core_release_dev(struct vfio_device *core_vdev)
>>
>> mutex_destroy(&vdev->igate);
>> mutex_destroy(&vdev->ioeventfds_lock);
>> + mutex_destroy(&vdev->tph_lock);
>> + kfree(vdev->tph_st_shadow);
>> kfree(vdev->region);
>> kfree(vdev->pm_save);
>> }
>> @@ -2605,11 +2775,13 @@ static void vfio_pci_dev_set_try_reset(struct vfio_device_set *dev_set)
>> }
>>
>> void vfio_pci_core_set_params(bool is_nointxmask, bool is_disable_vga,
>> - bool is_disable_idle_d3)
>> + bool is_disable_idle_d3,
>> + bool is_enable_unsafe_tph)
>> {
>> nointxmask = is_nointxmask;
>> disable_vga = is_disable_vga;
>> disable_idle_d3 = is_disable_idle_d3;
>> + enable_unsafe_tph = is_enable_unsafe_tph;
>> }
>> EXPORT_SYMBOL_GPL(vfio_pci_core_set_params);
>>
>> diff --git a/include/linux/vfio_pci_core.h b/include/linux/vfio_pci_core.h
>> index 89165b769e5c..ae8e7011ab8e 100644
>> --- a/include/linux/vfio_pci_core.h
>> +++ b/include/linux/vfio_pci_core.h
>> @@ -142,6 +142,9 @@ struct vfio_pci_core_device {
>> struct notifier_block nb;
>> struct rw_semaphore memory_lock;
>> struct list_head dmabufs;
>> + struct mutex tph_lock;
>> + u16 *tph_st_shadow;
>> + u16 tph_st_entries;
>> };
>>
>> enum vfio_pci_io_width {
>> @@ -157,7 +160,8 @@ int vfio_pci_core_register_dev_region(struct vfio_pci_core_device *vdev,
>> const struct vfio_pci_regops *ops,
>> size_t size, u32 flags, void *data);
>> void vfio_pci_core_set_params(bool nointxmask, bool is_disable_vga,
>> - bool is_disable_idle_d3);
>> + bool is_disable_idle_d3,
>> + bool is_enable_unsafe_tph);
>> void vfio_pci_core_close_device(struct vfio_device *core_vdev);
>> int vfio_pci_core_init_dev(struct vfio_device *core_vdev);
>> void vfio_pci_core_release_dev(struct vfio_device *core_vdev);
>> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
>> index 5de618a3a5ee..c76196f93660 100644
>> --- a/include/uapi/linux/vfio.h
>> +++ b/include/uapi/linux/vfio.h
>> @@ -1534,6 +1534,55 @@ struct vfio_device_feature_dma_buf {
>> */
>> #define VFIO_DEVICE_FEATURE_MIG_PRECOPY_INFOv2 12
>>
>> +/**
>> + * VFIO_DEVICE_FEATURE_TPH_ST - Manage PCIe TPH Steering Tag entries
>> + *
>> + * Provides userspace interface to manage PCIe TPH ST table entries.
>> + *
>> + * @flags: Composite control flags
>> + * Operation type[bit0~3]: Exclusive operation selection
>> + * Attribute bits[bit4~31]: Additional property parameters
>> + *
>> + * @index: Start entry offset, only valid for raw table operation
>> + * @count: Number of consecutive entries to operate
>> + * @data_uptr: Aligned userspace data buffer pointer
>> + *
>> + * VFIO_TPH_ST_OP_RAW_TABLE type:
>> + * Access raw ST table entry directly.
>> + * Attribute bits are ignored in this operation.
>> + * Userspace data buffer stores 16-bit raw steering tag values.
>> + * SET writes entries, and GET reads existing raw ST entries back to user
>> + * buffer.
>> + *
>> + * VFIO_TPH_ST_OP_CPU_QUERY type:
>> + * Resolve ST tag from CPU ID, only supports GET operation.
>> + * Attribute bits carry memory type info.
>> + * Userspace data buffer provides 32-bit CPU IDs, kernel returns translated
>> + * 16-bit ST tag according to specified memory type. No modification to ST
>> + * table during query.
>> + *
>> + * This feature is gated by enable_unsafe_tph module parameter.
>> + */
>> +#define VFIO_DEVICE_FEATURE_TPH_ST 13
>> +
>> +struct vfio_device_feature_tph_st {
>> + __u32 flags;
>> +
>> +/* Operation type field */
>> +#define VFIO_TPH_ST_OP_TYPE_MASK 0xFu
>> +#define VFIO_TPH_ST_OP_RAW_TABLE 0x0u
>> +#define VFIO_TPH_ST_OP_CPU_QUERY 0x1u
>> +
>> +/* Attribute bits for CPU query operation type */
>> +#define VFIO_TPH_ST_MEM_TYPE_MASK (1u << 4)
>> +#define VFIO_TPH_ST_MEM_TYPE_VM (0u << 4)
>> +#define VFIO_TPH_ST_MEM_TYPE_PM (1u << 4)
>> +
>> + __u16 index;
>> + __u16 count;
>> + __aligned_u64 data_uptr;
>> +};
>
> This is already a multiplexed ioctl, don't add another level of
> multiplexing, use two separate features.
done in v13
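
For readers following along, here is a minimal sketch of the two-level dispatch being objected to: the outer feature flags select GET or SET, and the inner `flags` field then selects the operation. The inner flag values and the accept/reject rules are copied from the v12 patch quoted above. `SKETCH_FEATURE_GET`, `SKETCH_FEATURE_SET` and `sketch_tph_st_check_v12()` are hypothetical names used only for illustration, and nothing here shows how v13 actually splits the features.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Inner (per-feature) flag values, copied from the v12 uAPI quoted above. */
#define VFIO_TPH_ST_OP_TYPE_MASK  0xFu
#define VFIO_TPH_ST_OP_RAW_TABLE  0x0u
#define VFIO_TPH_ST_OP_CPU_QUERY  0x1u
#define VFIO_TPH_ST_MEM_TYPE_MASK (1u << 4)

/* Hypothetical stand-ins for the outer VFIO_DEVICE_FEATURE_GET/SET bits. */
#define SKETCH_FEATURE_GET (1u << 16)
#define SKETCH_FEATURE_SET (1u << 17)

/*
 * Mirror the v12 dispatch: the outer flags pick GET/SET, the inner flags
 * pick the operation, and each operation applies its own rules.
 * Returns 0 if the combination would be accepted, or a negative errno.
 */
int sketch_tph_st_check_v12(uint32_t outer, uint32_t inner)
{
	int is_get = !!(outer & SKETCH_FEATURE_GET);
	int is_set = !!(outer & SKETCH_FEATURE_SET);

	/* Sanity check: the op-type field and the attribute bit must not overlap. */
	assert((VFIO_TPH_ST_OP_TYPE_MASK & VFIO_TPH_ST_MEM_TYPE_MASK) == 0);

	switch (inner & VFIO_TPH_ST_OP_TYPE_MASK) {
	case VFIO_TPH_ST_OP_RAW_TABLE:
		if (is_set && is_get)
			return -EINVAL;
		/* No attribute bits are accepted for raw table access. */
		if (inner & ~VFIO_TPH_ST_OP_TYPE_MASK)
			return -EINVAL;
		return 0;
	case VFIO_TPH_ST_OP_CPU_QUERY:
		if (is_set)
			return -EOPNOTSUPP;
		/* Only the memory type attribute bit is accepted. */
		if (inner & ~(VFIO_TPH_ST_OP_TYPE_MASK | VFIO_TPH_ST_MEM_TYPE_MASK))
			return -EINVAL;
		return 0;
	default:
		return -EINVAL;
	}
}
```

A caller has to get both layers right: SET combined with the CPU query operation yields -EOPNOTSUPP, while an unknown operation type yields -EINVAL. With two separate features the inner operation field would presumably no longer be needed.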
>
> If the user is now handling the raw ST, we don't need GET support on
> the ST entry feature.
done in v13
>
> The CPU lookup/xlate feature is on pretty thin standing for why it
> needs to be implemented in vfio-pci. I think it's largely here because
> there's no other obvious place for it and a sysfs implementation would
> be massive if we need 2 (vm/pm) * 2 (8/16-bit) * NR_CPUS attributes per
> root port.
Yes
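
For a sense of scale, the attribute count mentioned above can be worked out directly. The sketch below only evaluates 2 (vm/pm) * 2 (8/16-bit) * NR_CPUS; the function name `sketch_tph_sysfs_attrs()` and any CPU count passed to it are illustrative assumptions, not part of the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Count the sysfs attributes one root port would need if every
 * (memory type, tag width, CPU) combination got its own file.
 */
size_t sketch_tph_sysfs_attrs(size_t nr_cpus)
{
	const size_t mem_types = 2;	/* vm and pm */
	const size_t tag_widths = 2;	/* 8-bit and 16-bit */

	/* Guard against overflow in the product below. */
	assert(nr_cpus <= SIZE_MAX / (mem_types * tag_widths));
	return mem_types * tag_widths * nr_cpus;
}
```

With 256 possible CPUs, for example, this comes to 1024 attributes per root port.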
>
> As noted in the previous patch, there still seem to be gaps in the
> virtualization and user ownership of the TPH mode enabled on the
> device. The interface proposed here doesn't seem to fully support
> emulation of the TPH capability from a virtualization perspective.
> Thanks,
v13 adds support for virtualizing the TPH request type.
With v13:
1. Userspace can query the device's TPH request type.
2. Userspace decides which request type to use: standard (8-bit) or
   extended (16-bit).
3. Userspace queries the CPU's ST based on the TPH request type and
   memory type.
4. Userspace sets the device's ST table entries if the ST table is located
   in the TPH capability (CAP) or the MSI-X table.
5. Userspace enables TPH on the device with the chosen request type.
If the device supports both standard and extended, userspace can choose
either of them.
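
The request type choice in step 2 and the tag width it implies for step 4 can be modelled in plain C. This is only a sketch: the v13 uAPI is not shown in this thread, so every name below (`sketch_tph_req`, `SKETCH_TPH_CAP_*`, `sketch_choose_req()`, `sketch_st_fits()`) is hypothetical. The request type values follow the two-bit TPH Requester Enable encoding (1 for standard, 3 for extended), which is assumed here as the representation.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical request types, numbered like the TPH Requester Enable field. */
enum sketch_tph_req {
	SKETCH_TPH_REQ_NONE     = 0,
	SKETCH_TPH_REQ_STANDARD = 1,	/* 8-bit steering tags */
	SKETCH_TPH_REQ_EXTENDED = 3,	/* 16-bit steering tags */
};

/* Hypothetical bits describing what the device reports as supported. */
#define SKETCH_TPH_CAP_STANDARD (1u << 0)
#define SKETCH_TPH_CAP_EXTENDED (1u << 1)

/* Step 2: pick a request type from what the device supports. */
enum sketch_tph_req sketch_choose_req(uint32_t caps, int prefer_extended)
{
	/* Only the two capability bits defined above are meaningful here. */
	assert((caps & ~(SKETCH_TPH_CAP_STANDARD | SKETCH_TPH_CAP_EXTENDED)) == 0);

	if (prefer_extended && (caps & SKETCH_TPH_CAP_EXTENDED))
		return SKETCH_TPH_REQ_EXTENDED;
	if (caps & SKETCH_TPH_CAP_STANDARD)
		return SKETCH_TPH_REQ_STANDARD;
	if (caps & SKETCH_TPH_CAP_EXTENDED)
		return SKETCH_TPH_REQ_EXTENDED;
	return SKETCH_TPH_REQ_NONE;
}

/* Step 4: check that a steering tag fits the width of the chosen type. */
int sketch_st_fits(enum sketch_tph_req req, uint16_t st)
{
	switch (req) {
	case SKETCH_TPH_REQ_STANDARD:
		return st <= 0xff;
	case SKETCH_TPH_REQ_EXTENDED:
		return 1;
	default:
		return 0;
	}
}
```

A device reporting both capabilities gives userspace a free choice, matching the closing remark above; a tag such as 0x100 would only fit once the extended type is selected.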
Thanks
>
> Alex
Thread overview: 15+ messages
2026-05-26 4:08 [PATCH v12 0/6] vfio/pci: Add PCIe TPH support Chengwen Feng
2026-05-26 4:08 ` [PATCH v12 1/6] PCI/TPH: Fix pcie_tph_get_st_table_loc() field extraction Chengwen Feng
2026-05-26 4:31 ` sashiko-bot
2026-05-26 4:08 ` [PATCH v12 2/6] PCI/TPH: Export pcie_tph_get_st_modes() for external use Chengwen Feng
2026-05-26 4:51 ` sashiko-bot
2026-05-26 4:08 ` [PATCH v12 3/6] PCI/TPH: Add pcie_tph_enabled_mode() helper Chengwen Feng
2026-05-26 4:08 ` [PATCH v12 4/6] PCI/TPH: Move tph_req_type initialization into pci_tph_init Chengwen Feng
2026-05-26 5:35 ` sashiko-bot
2026-05-26 22:42 ` Alex Williamson
2026-05-26 4:08 ` [PATCH v12 5/6] vfio/pci: Add VFIO_DEVICE_FEATURE_TPH_ST for PCIe TPH steering tag management Chengwen Feng
2026-05-26 6:09 ` sashiko-bot
2026-05-26 22:42 ` Alex Williamson
2026-05-27 9:54 ` fengchengwen
2026-05-26 4:08 ` [PATCH v12 6/6] vfio/pci: Add PCIe TPH control register virtualization Chengwen Feng
2026-05-26 6:56 ` sashiko-bot