* [PATCH v11 0/5] vfio/pci: Add PCIe TPH support
@ 2026-05-18 7:16 Chengwen Feng
2026-05-18 7:16 ` [PATCH v11 1/5] PCI/TPH: Fix pcie_tph_get_st_table_loc() field extraction Chengwen Feng
` (5 more replies)
0 siblings, 6 replies; 22+ messages in thread
From: Chengwen Feng @ 2026-05-18 7:16 UTC (permalink / raw)
To: alex, jgg
Cc: wathsala.vithanage, helgaas, wei.huang2, wangzhou1, wangyushan12,
liuyonglong, kvm, linux-pci
This patchset enables userspace control over PCIe TPH steering tags,
motivated by the following considerations:
1. Why userspace needs the capability to control steering tags:
When PCIe devices are fully owned by userspace workloads such as DPDK
and SPDK, only userspace has full knowledge of core binding policies
and traffic distribution strategies. Without this series, userspace
cannot enable TPH or configure steering tags, leaving built-in PCIe
performance optimizations unused in high-throughput polling I/O
scenarios.
2. Why this interface must be implemented in VFIO:
VFIO is the standard, secure community solution for granting full
PCIe device ownership to userspace. Existing kernel TPH interfaces
are designed purely for in-kernel drivers. For user-owned devices,
VFIO provides the only isolated and correct path to expose per-device
TPH management.
TPH supports both IV and DS modes. Since both modes could introduces
cross-VM isolation risks such as untrusted guests programming arbitrary
steering tags to impact other domains:
1. If ST location in MSI-X table, untrusted guests may program the
MSI-X table.
2. If ST don't locate in MSI-X or CAP, untrusted guests may program the
device-specific register.
So a new module parameter `enable_unsafe_tph` is added. It defaults to
off, and blocks all unsafe TPH operations when disabled.
Based on earlier RFC work by Wathsala Vithanage
v11:
- Fix Sashiko review comments:
Add tph_lock for protect all tph operations from vfio user
Fix memory-leak when copy_to_user failed
v10:
- Fix Sashiko review comments:
- Fix TPH virtualization write return negetive val of 4/5 commit
- Fix don't support probe feature of 5/5 commit
- Fix return positive val if copy_to_user fail of 5/5 commit
v9:
- Address Alex's comment on VFIO part: adopt device feature scheme
- Remove pcie_tph_get_st_table_size and add pcie_tph_is_enabled
depending on the implementation requirements
Chengwen Feng (5):
PCI/TPH: Fix pcie_tph_get_st_table_loc() field extraction
PCI/TPH: Export pcie_tph_get_st_modes() for external use
PCI/TPH: Add pcie_tph_enabled_mode() helper
vfio/pci: Add PCIe TPH configuration space virtualization
vfio/pci: Add VFIO_DEVICE_FEATURE_TPH_ST for TPH ST entry management
drivers/pci/tph.c | 38 +++++++---
drivers/vfio/pci/vfio_pci.c | 13 +++-
drivers/vfio/pci/vfio_pci_config.c | 28 ++++++++
drivers/vfio/pci/vfio_pci_core.c | 107 ++++++++++++++++++++++++++++-
include/linux/pci-tph.h | 10 +++
include/linux/vfio_pci_core.h | 4 +-
include/uapi/linux/vfio.h | 41 +++++++++++
7 files changed, 230 insertions(+), 11 deletions(-)
--
2.17.1
^ permalink raw reply [flat|nested] 22+ messages in thread
* [PATCH v11 1/5] PCI/TPH: Fix pcie_tph_get_st_table_loc() field extraction
2026-05-18 7:16 [PATCH v11 0/5] vfio/pci: Add PCIe TPH support Chengwen Feng
@ 2026-05-18 7:16 ` Chengwen Feng
2026-05-18 7:16 ` [PATCH v11 2/5] PCI/TPH: Export pcie_tph_get_st_modes() for external use Chengwen Feng
` (4 subsequent siblings)
5 siblings, 0 replies; 22+ messages in thread
From: Chengwen Feng @ 2026-05-18 7:16 UTC (permalink / raw)
To: alex, jgg
Cc: wathsala.vithanage, helgaas, wei.huang2, wangzhou1, wangyushan12,
liuyonglong, kvm, linux-pci
pcie_tph_get_st_table_loc() incorrectly uses FIELD_GET(), which shifts the
field value to bit 0. But the function is designed to return raw
PCI_TPH_LOC_* values as defined in the function comment.
This causes incorrect ST table location detection. Fix it by using bitwise
AND with PCI_TPH_CAP_LOC_MASK to return the unshifted field value matching
the function specification.
This doesn't make a difference to mlx5_st_create(), the lone external
caller, because it only checks for PCI_TPH_LOC_NONE (0), but will be needed
for callers that check for PCI_TPH_LOC_CAP or PCI_TPH_LOC_MSIX.
Fixes: d2e8a34876ce ("PCI/TPH: Add Steering Tag support")
Cc: stable@vger.kernel.org
Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
Reviewed-by: Alex Williamson <alex.williamson@nvidia.com>
Reviewed-by: Bjorn Helgaas <bhelgaas@google.com>
---
drivers/pci/tph.c | 7 +------
1 file changed, 1 insertion(+), 6 deletions(-)
diff --git a/drivers/pci/tph.c b/drivers/pci/tph.c
index 91145e8d9d95..877cf556242b 100644
--- a/drivers/pci/tph.c
+++ b/drivers/pci/tph.c
@@ -170,7 +170,7 @@ u32 pcie_tph_get_st_table_loc(struct pci_dev *pdev)
pci_read_config_dword(pdev, pdev->tph_cap + PCI_TPH_CAP, ®);
- return FIELD_GET(PCI_TPH_CAP_LOC_MASK, reg);
+ return reg & PCI_TPH_CAP_LOC_MASK;
}
EXPORT_SYMBOL(pcie_tph_get_st_table_loc);
@@ -185,9 +185,6 @@ u16 pcie_tph_get_st_table_size(struct pci_dev *pdev)
/* Check ST table location first */
loc = pcie_tph_get_st_table_loc(pdev);
-
- /* Convert loc to match with PCI_TPH_LOC_* defined in pci_regs.h */
- loc = FIELD_PREP(PCI_TPH_CAP_LOC_MASK, loc);
if (loc != PCI_TPH_LOC_CAP)
return 0;
@@ -316,8 +313,6 @@ int pcie_tph_set_st_entry(struct pci_dev *pdev, unsigned int index, u16 tag)
set_ctrl_reg_req_en(pdev, PCI_TPH_REQ_DISABLE);
loc = pcie_tph_get_st_table_loc(pdev);
- /* Convert loc to match with PCI_TPH_LOC_* */
- loc = FIELD_PREP(PCI_TPH_CAP_LOC_MASK, loc);
switch (loc) {
case PCI_TPH_LOC_MSIX:
--
2.17.1
^ permalink raw reply related [flat|nested] 22+ messages in thread
* [PATCH v11 2/5] PCI/TPH: Export pcie_tph_get_st_modes() for external use
2026-05-18 7:16 [PATCH v11 0/5] vfio/pci: Add PCIe TPH support Chengwen Feng
2026-05-18 7:16 ` [PATCH v11 1/5] PCI/TPH: Fix pcie_tph_get_st_table_loc() field extraction Chengwen Feng
@ 2026-05-18 7:16 ` Chengwen Feng
2026-05-18 7:57 ` sashiko-bot
2026-05-22 4:10 ` Alex Williamson
2026-05-18 7:16 ` [PATCH v11 3/5] PCI/TPH: Add pcie_tph_enabled_mode() helper Chengwen Feng
` (3 subsequent siblings)
5 siblings, 2 replies; 22+ messages in thread
From: Chengwen Feng @ 2026-05-18 7:16 UTC (permalink / raw)
To: alex, jgg
Cc: wathsala.vithanage, helgaas, wei.huang2, wangzhou1, wangyushan12,
liuyonglong, kvm, linux-pci
Export the helper to retrieve supported PCIe TPH steering tag modes so
that drivers like VFIO can query and expose device capabilities to
userspace.
Add stub functions for pcie_tph_get_st_table_size() and
pcie_tph_get_st_table_loc() when !CONFIG_PCIE_TPH.
Add tph_cap validation for pcie_tph_get_st_modes() and
pcie_tph_get_st_table_loc() to prevent invalid PCI configuration
space access when TPH is not supported.
Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
---
drivers/pci/tph.c | 19 +++++++++++++++++--
include/linux/pci-tph.h | 7 +++++++
2 files changed, 24 insertions(+), 2 deletions(-)
diff --git a/drivers/pci/tph.c b/drivers/pci/tph.c
index 877cf556242b..ba31b010f67a 100644
--- a/drivers/pci/tph.c
+++ b/drivers/pci/tph.c
@@ -145,15 +145,27 @@ static void set_ctrl_reg_req_en(struct pci_dev *pdev, u8 req_type)
pci_write_config_dword(pdev, pdev->tph_cap + PCI_TPH_CTRL, reg);
}
-static u8 get_st_modes(struct pci_dev *pdev)
+/**
+ * pcie_tph_get_st_modes - Get supported Steering Tag modes
+ * @pdev: PCI device to query
+ *
+ * Return:
+ * Bitmask of supported ST modes (PCI_TPH_CAP_ST_NS, PCI_TPH_CAP_ST_IV,
+ * PCI_TPH_CAP_ST_DS)
+ */
+u8 pcie_tph_get_st_modes(struct pci_dev *pdev)
{
u32 reg;
+ if (!pdev->tph_cap)
+ return 0;
+
pci_read_config_dword(pdev, pdev->tph_cap + PCI_TPH_CAP, ®);
reg &= PCI_TPH_CAP_ST_NS | PCI_TPH_CAP_ST_IV | PCI_TPH_CAP_ST_DS;
return reg;
}
+EXPORT_SYMBOL(pcie_tph_get_st_modes);
/**
* pcie_tph_get_st_table_loc - Return the device's ST table location
@@ -168,6 +180,9 @@ u32 pcie_tph_get_st_table_loc(struct pci_dev *pdev)
{
u32 reg;
+ if (!pdev->tph_cap)
+ return PCI_TPH_LOC_NONE;
+
pci_read_config_dword(pdev, pdev->tph_cap + PCI_TPH_CAP, ®);
return reg & PCI_TPH_CAP_LOC_MASK;
@@ -395,7 +410,7 @@ int pcie_enable_tph(struct pci_dev *pdev, int mode)
/* Sanitize and check ST mode compatibility */
mode &= PCI_TPH_CTRL_MODE_SEL_MASK;
- dev_modes = get_st_modes(pdev);
+ dev_modes = pcie_tph_get_st_modes(pdev);
if (!((1 << mode) & dev_modes))
return -EINVAL;
diff --git a/include/linux/pci-tph.h b/include/linux/pci-tph.h
index be68cd17f2f8..5772d48ea444 100644
--- a/include/linux/pci-tph.h
+++ b/include/linux/pci-tph.h
@@ -30,6 +30,7 @@ void pcie_disable_tph(struct pci_dev *pdev);
int pcie_enable_tph(struct pci_dev *pdev, int mode);
u16 pcie_tph_get_st_table_size(struct pci_dev *pdev);
u32 pcie_tph_get_st_table_loc(struct pci_dev *pdev);
+u8 pcie_tph_get_st_modes(struct pci_dev *pdev);
#else
static inline int pcie_tph_set_st_entry(struct pci_dev *pdev,
unsigned int index, u16 tag)
@@ -41,6 +42,12 @@ static inline int pcie_tph_get_cpu_st(struct pci_dev *dev,
static inline void pcie_disable_tph(struct pci_dev *pdev) { }
static inline int pcie_enable_tph(struct pci_dev *pdev, int mode)
{ return -EINVAL; }
+static inline u16 pcie_tph_get_st_table_size(struct pci_dev *pdev)
+{ return 0; }
+static inline u32 pcie_tph_get_st_table_loc(struct pci_dev *pdev)
+{ return 0; }
+static inline u8 pcie_tph_get_st_modes(struct pci_dev *pdev)
+{ return 0; }
#endif
#endif /* LINUX_PCI_TPH_H */
--
2.17.1
^ permalink raw reply related [flat|nested] 22+ messages in thread
* [PATCH v11 3/5] PCI/TPH: Add pcie_tph_enabled_mode() helper
2026-05-18 7:16 [PATCH v11 0/5] vfio/pci: Add PCIe TPH support Chengwen Feng
2026-05-18 7:16 ` [PATCH v11 1/5] PCI/TPH: Fix pcie_tph_get_st_table_loc() field extraction Chengwen Feng
2026-05-18 7:16 ` [PATCH v11 2/5] PCI/TPH: Export pcie_tph_get_st_modes() for external use Chengwen Feng
@ 2026-05-18 7:16 ` Chengwen Feng
2026-05-18 8:14 ` sashiko-bot
2026-05-22 4:10 ` Alex Williamson
2026-05-18 7:17 ` [PATCH v11 4/5] vfio/pci: Add PCIe TPH configuration space virtualization Chengwen Feng
` (2 subsequent siblings)
5 siblings, 2 replies; 22+ messages in thread
From: Chengwen Feng @ 2026-05-18 7:16 UTC (permalink / raw)
To: alex, jgg
Cc: wathsala.vithanage, helgaas, wei.huang2, wangzhou1, wangyushan12,
liuyonglong, kvm, linux-pci
Add a helper to query enabled TPH mode on a PCI device. This is useful for
drivers like VFIO-PCI that need to validate TPH state before allowing
access to steering tag tables.
Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
---
drivers/pci/tph.c | 12 ++++++++++++
include/linux/pci-tph.h | 3 +++
2 files changed, 15 insertions(+)
diff --git a/drivers/pci/tph.c b/drivers/pci/tph.c
index ba31b010f67a..91c1e83410a3 100644
--- a/drivers/pci/tph.c
+++ b/drivers/pci/tph.c
@@ -451,6 +451,18 @@ int pcie_enable_tph(struct pci_dev *pdev, int mode)
}
EXPORT_SYMBOL(pcie_enable_tph);
+/**
+ * pcie_tph_enabled_mode - Get current enabled TPH mode
+ * @pdev: PCI device
+ *
+ * Return the enabled TPH mode (IV/DS) or 0 if disabled.
+ */
+int pcie_tph_enabled_mode(struct pci_dev *pdev)
+{
+ return pdev->tph_enabled ? pdev->tph_mode : 0;
+}
+EXPORT_SYMBOL(pcie_tph_enabled_mode);
+
void pci_restore_tph_state(struct pci_dev *pdev)
{
struct pci_cap_saved_state *save_state;
diff --git a/include/linux/pci-tph.h b/include/linux/pci-tph.h
index 5772d48ea444..28d0fa762146 100644
--- a/include/linux/pci-tph.h
+++ b/include/linux/pci-tph.h
@@ -28,6 +28,7 @@ int pcie_tph_get_cpu_st(struct pci_dev *dev,
unsigned int cpu, u16 *tag);
void pcie_disable_tph(struct pci_dev *pdev);
int pcie_enable_tph(struct pci_dev *pdev, int mode);
+int pcie_tph_enabled_mode(struct pci_dev *pdev);
u16 pcie_tph_get_st_table_size(struct pci_dev *pdev);
u32 pcie_tph_get_st_table_loc(struct pci_dev *pdev);
u8 pcie_tph_get_st_modes(struct pci_dev *pdev);
@@ -42,6 +43,8 @@ static inline int pcie_tph_get_cpu_st(struct pci_dev *dev,
static inline void pcie_disable_tph(struct pci_dev *pdev) { }
static inline int pcie_enable_tph(struct pci_dev *pdev, int mode)
{ return -EINVAL; }
+static inline int pcie_tph_enabled_mode(struct pci_dev *pdev)
+{ return 0; }
static inline u16 pcie_tph_get_st_table_size(struct pci_dev *pdev)
{ return 0; }
static inline u32 pcie_tph_get_st_table_loc(struct pci_dev *pdev)
--
2.17.1
^ permalink raw reply related [flat|nested] 22+ messages in thread
* [PATCH v11 4/5] vfio/pci: Add PCIe TPH configuration space virtualization
2026-05-18 7:16 [PATCH v11 0/5] vfio/pci: Add PCIe TPH support Chengwen Feng
` (2 preceding siblings ...)
2026-05-18 7:16 ` [PATCH v11 3/5] PCI/TPH: Add pcie_tph_enabled_mode() helper Chengwen Feng
@ 2026-05-18 7:17 ` Chengwen Feng
2026-05-18 8:48 ` sashiko-bot
2026-05-22 4:10 ` Alex Williamson
2026-05-18 7:17 ` [PATCH v11 5/5] vfio/pci: Add VFIO_DEVICE_FEATURE_TPH_ST for TPH ST entry management Chengwen Feng
2026-05-22 1:31 ` [PATCH v11 0/5] vfio/pci: Add PCIe TPH support fengchengwen
5 siblings, 2 replies; 22+ messages in thread
From: Chengwen Feng @ 2026-05-18 7:17 UTC (permalink / raw)
To: alex, jgg
Cc: wathsala.vithanage, helgaas, wei.huang2, wangzhou1, wangyushan12,
liuyonglong, kvm, linux-pci
Add support for virtualizing the PCIe TPH (Transaction Processing Hints)
control register. TPH may break platform isolation, so add a module
parameter "enable_unsafe_tph" to allow administrators to globally control
this feature.
TPH control register writes are mediated to only allow valid mode settings,
and TPH is automatically disabled when VFIO takes ownership of the device
or when userspace closes the device file descriptor.
Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
---
drivers/vfio/pci/vfio_pci.c | 13 ++++++++++++-
drivers/vfio/pci/vfio_pci_config.c | 28 ++++++++++++++++++++++++++++
drivers/vfio/pci/vfio_pci_core.c | 16 +++++++++++++++-
include/linux/vfio_pci_core.h | 4 +++-
4 files changed, 58 insertions(+), 3 deletions(-)
diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
index 0c771064c0b8..6d73668459cf 100644
--- a/drivers/vfio/pci/vfio_pci.c
+++ b/drivers/vfio/pci/vfio_pci.c
@@ -60,6 +60,12 @@ static bool disable_denylist;
module_param(disable_denylist, bool, 0444);
MODULE_PARM_DESC(disable_denylist, "Disable use of device denylist. Disabling the denylist allows binding to devices with known errata that may lead to exploitable stability or security issues when accessed by untrusted users.");
+#ifdef CONFIG_PCIE_TPH
+static bool enable_unsafe_tph;
+module_param(enable_unsafe_tph, bool, 0444);
+MODULE_PARM_DESC(enable_unsafe_tph, "Enable PCIe TPH (Transaction Processing Hints) support. It may break platform isolation. If you do not know what this is for, step away. (default: false)");
+#endif
+
static bool vfio_pci_dev_in_denylist(struct pci_dev *pdev)
{
switch (pdev->vendor) {
@@ -257,12 +263,17 @@ static int __init vfio_pci_init(void)
{
int ret;
bool is_disable_vga = true;
+ bool is_enable_unsafe_tph = false;
#ifdef CONFIG_VFIO_PCI_VGA
is_disable_vga = disable_vga;
#endif
+#ifdef CONFIG_PCIE_TPH
+ is_enable_unsafe_tph = enable_unsafe_tph;
+#endif
- vfio_pci_core_set_params(nointxmask, is_disable_vga, disable_idle_d3);
+ vfio_pci_core_set_params(nointxmask, is_disable_vga, disable_idle_d3,
+ is_enable_unsafe_tph);
/* Register and scan for devices */
ret = pci_register_driver(&vfio_pci_driver);
diff --git a/drivers/vfio/pci/vfio_pci_config.c b/drivers/vfio/pci/vfio_pci_config.c
index a10ed733f0e3..6da35c2313b9 100644
--- a/drivers/vfio/pci/vfio_pci_config.c
+++ b/drivers/vfio/pci/vfio_pci_config.c
@@ -22,6 +22,7 @@
#include <linux/fs.h>
#include <linux/pci.h>
+#include <linux/pci-tph.h>
#include <linux/uaccess.h>
#include <linux/vfio.h>
#include <linux/slab.h>
@@ -35,6 +36,8 @@
((offset >= PCI_BASE_ADDRESS_0 && offset < PCI_BASE_ADDRESS_5 + 4) || \
(offset >= PCI_ROM_ADDRESS && offset < PCI_ROM_ADDRESS + 4))
+extern bool enable_unsafe_tph;
+
/*
* Lengths of PCI Config Capabilities
* 0: Removed from the user visible capability list
@@ -313,6 +316,30 @@ static int vfio_virt_config_read(struct vfio_pci_core_device *vdev, int pos,
return count;
}
+static int vfio_pci_tph_config_write(struct vfio_pci_core_device *vdev, int pos,
+ int count, struct perm_bits *perm,
+ int offset, __le32 val)
+{
+ u32 data = le32_to_cpu(val);
+
+ guard(mutex)(&vdev->tph_lock);
+
+ if (!enable_unsafe_tph)
+ return count;
+
+ if (offset != PCI_TPH_CTRL)
+ return count;
+
+ /* Only permit write TPH mode. */
+ data &= PCI_TPH_CTRL_MODE_SEL_MASK;
+ if (data == PCI_TPH_ST_IV_MODE || data == PCI_TPH_ST_DS_MODE)
+ pcie_enable_tph(vdev->pdev, data);
+ else if (data == PCI_TPH_ST_NS_MODE)
+ pcie_disable_tph(vdev->pdev);
+
+ return count;
+}
+
static struct perm_bits direct_ro_perms = {
.readfn = vfio_direct_config_read,
};
@@ -1121,6 +1148,7 @@ int __init vfio_pci_init_perm_bits(void)
ret |= init_pci_ext_cap_err_perm(&ecap_perms[PCI_EXT_CAP_ID_ERR]);
ret |= init_pci_ext_cap_pwr_perm(&ecap_perms[PCI_EXT_CAP_ID_PWR]);
ecap_perms[PCI_EXT_CAP_ID_VNDR].writefn = vfio_raw_config_write;
+ ecap_perms[PCI_EXT_CAP_ID_TPH].writefn = vfio_pci_tph_config_write;
ecap_perms[PCI_EXT_CAP_ID_DVSEC].writefn = vfio_raw_config_write;
if (ret)
diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
index 050e7542952e..0d0140202280 100644
--- a/drivers/vfio/pci/vfio_pci_core.c
+++ b/drivers/vfio/pci/vfio_pci_core.c
@@ -29,6 +29,7 @@
#include <linux/sched/mm.h>
#include <linux/iommufd.h>
#include <linux/pci-p2pdma.h>
+#include <linux/pci-tph.h>
#if IS_ENABLED(CONFIG_EEH)
#include <asm/eeh.h>
#endif
@@ -41,6 +42,7 @@
static bool nointxmask;
static bool disable_vga;
static bool disable_idle_d3;
+bool enable_unsafe_tph;
static void vfio_pci_eventfd_rcu_free(struct rcu_head *rcu)
{
@@ -771,6 +773,11 @@ void vfio_pci_core_close_device(struct vfio_device *core_vdev)
#endif
vfio_pci_dma_buf_cleanup(vdev);
+ /* Disable TPH when userspace closes the device FD */
+ mutex_lock(&vdev->tph_lock);
+ pcie_disable_tph(vdev->pdev);
+ mutex_unlock(&vdev->tph_lock);
+
vfio_pci_core_disable(vdev);
mutex_lock(&vdev->igate);
@@ -2132,6 +2139,7 @@ int vfio_pci_core_init_dev(struct vfio_device *core_vdev)
mutex_init(&vdev->igate);
spin_lock_init(&vdev->irqlock);
mutex_init(&vdev->ioeventfds_lock);
+ mutex_init(&vdev->tph_lock);
INIT_LIST_HEAD(&vdev->dummy_resources_list);
INIT_LIST_HEAD(&vdev->ioeventfds_list);
INIT_LIST_HEAD(&vdev->sriov_pfs_item);
@@ -2151,6 +2159,7 @@ void vfio_pci_core_release_dev(struct vfio_device *core_vdev)
struct vfio_pci_core_device *vdev =
container_of(core_vdev, struct vfio_pci_core_device, vdev);
+ mutex_destroy(&vdev->tph_lock);
mutex_destroy(&vdev->igate);
mutex_destroy(&vdev->ioeventfds_lock);
kfree(vdev->region);
@@ -2240,6 +2249,9 @@ int vfio_pci_core_register_device(struct vfio_pci_core_device *vdev)
if (!disable_idle_d3)
pm_runtime_put(dev);
+ /* Disable TPH when taking over ownership of the device */
+ pcie_disable_tph(pdev);
+
ret = vfio_register_group_dev(&vdev->vdev);
if (ret)
goto out_power;
@@ -2605,11 +2617,13 @@ static void vfio_pci_dev_set_try_reset(struct vfio_device_set *dev_set)
}
void vfio_pci_core_set_params(bool is_nointxmask, bool is_disable_vga,
- bool is_disable_idle_d3)
+ bool is_disable_idle_d3,
+ bool is_enable_unsafe_tph)
{
nointxmask = is_nointxmask;
disable_vga = is_disable_vga;
disable_idle_d3 = is_disable_idle_d3;
+ enable_unsafe_tph = is_enable_unsafe_tph;
}
EXPORT_SYMBOL_GPL(vfio_pci_core_set_params);
diff --git a/include/linux/vfio_pci_core.h b/include/linux/vfio_pci_core.h
index 89165b769e5c..741d5bcc7ba3 100644
--- a/include/linux/vfio_pci_core.h
+++ b/include/linux/vfio_pci_core.h
@@ -142,6 +142,7 @@ struct vfio_pci_core_device {
struct notifier_block nb;
struct rw_semaphore memory_lock;
struct list_head dmabufs;
+ struct mutex tph_lock;
};
enum vfio_pci_io_width {
@@ -157,7 +158,8 @@ int vfio_pci_core_register_dev_region(struct vfio_pci_core_device *vdev,
const struct vfio_pci_regops *ops,
size_t size, u32 flags, void *data);
void vfio_pci_core_set_params(bool nointxmask, bool is_disable_vga,
- bool is_disable_idle_d3);
+ bool is_disable_idle_d3,
+ bool is_enable_unsafe_tph);
void vfio_pci_core_close_device(struct vfio_device *core_vdev);
int vfio_pci_core_init_dev(struct vfio_device *core_vdev);
void vfio_pci_core_release_dev(struct vfio_device *core_vdev);
--
2.17.1
^ permalink raw reply related [flat|nested] 22+ messages in thread
* [PATCH v11 5/5] vfio/pci: Add VFIO_DEVICE_FEATURE_TPH_ST for TPH ST entry management
2026-05-18 7:16 [PATCH v11 0/5] vfio/pci: Add PCIe TPH support Chengwen Feng
` (3 preceding siblings ...)
2026-05-18 7:17 ` [PATCH v11 4/5] vfio/pci: Add PCIe TPH configuration space virtualization Chengwen Feng
@ 2026-05-18 7:17 ` Chengwen Feng
2026-05-18 9:22 ` sashiko-bot
2026-05-22 4:10 ` Alex Williamson
2026-05-22 1:31 ` [PATCH v11 0/5] vfio/pci: Add PCIe TPH support fengchengwen
5 siblings, 2 replies; 22+ messages in thread
From: Chengwen Feng @ 2026-05-18 7:17 UTC (permalink / raw)
To: alex, jgg
Cc: wathsala.vithanage, helgaas, wei.huang2, wangzhou1, wangyushan12,
liuyonglong, kvm, linux-pci
Add VFIO_DEVICE_FEATURE_TPH_ST to allow userspace to manage PCIe TPH
Steering Tag entries.
SET: Program contiguous ST entries when ST table resides in TPH Capability
or MSI-X table. U32_MAX CPU ID clears the corresponding ST entry.
GET: Retrieve ST values per CPU ID, only available in DS mode.
Both operations are only valid when TPH is enabled on the device.
Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
---
drivers/vfio/pci/vfio_pci_core.c | 91 ++++++++++++++++++++++++++++++++
include/uapi/linux/vfio.h | 41 ++++++++++++++
2 files changed, 132 insertions(+)
diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
index 0d0140202280..13a22c1d6ea6 100644
--- a/drivers/vfio/pci/vfio_pci_core.c
+++ b/drivers/vfio/pci/vfio_pci_core.c
@@ -1558,6 +1558,95 @@ static int vfio_pci_core_feature_token(struct vfio_pci_core_device *vdev,
return 0;
}
+static int vfio_pci_core_feature_tph_st(struct vfio_pci_core_device *vdev,
+ u32 flags,
+ struct vfio_device_feature_tph_st __user *arg,
+ size_t argsz)
+{
+ bool is_set = !!(flags & VFIO_DEVICE_FEATURE_SET);
+ struct vfio_device_feature_tph_st tph_st;
+ struct pci_dev *pdev = vdev->pdev;
+ enum tph_mem_type mtype;
+ int i, j, ret;
+ u32 *cpus;
+ u16 st;
+
+ guard(mutex)(&vdev->tph_lock);
+
+ ret = vfio_check_feature(flags, argsz,
+ VFIO_DEVICE_FEATURE_GET |
+ VFIO_DEVICE_FEATURE_SET,
+ sizeof(tph_st));
+ if (ret <= 0)
+ return ret;
+
+ if (!enable_unsafe_tph ||
+ pcie_tph_enabled_mode(pdev) == PCI_TPH_ST_NS_MODE)
+ return -EOPNOTSUPP;
+ if (!is_set && pcie_tph_enabled_mode(pdev) != PCI_TPH_ST_DS_MODE)
+ return -EOPNOTSUPP;
+ if (is_set && pcie_tph_get_st_table_loc(pdev) == PCI_TPH_LOC_NONE)
+ return -EOPNOTSUPP;
+
+ if (copy_from_user(&tph_st, arg, sizeof(tph_st)))
+ return -EFAULT;
+
+ if (tph_st.count == 0 || tph_st.count > VFIO_TPH_ST_MAX_COUNT ||
+ tph_st.flags > VFIO_TPH_ST_MEM_TYPE_PM)
+ return -EINVAL;
+ if (!is_set && tph_st.index != 0)
+ return -EINVAL;
+ if (is_set && (tph_st.index >= VFIO_TPH_ST_MAX_COUNT ||
+ tph_st.index + tph_st.count > VFIO_TPH_ST_MAX_COUNT))
+ return -EINVAL;
+
+ cpus = memdup_array_user(&arg->data, tph_st.count, sizeof(*cpus));
+ if (IS_ERR(cpus))
+ return PTR_ERR(cpus);
+
+ mtype = tph_st.flags & VFIO_TPH_ST_MEM_TYPE_PM ? TPH_MEM_TYPE_PM :
+ TPH_MEM_TYPE_VM;
+ if (!is_set) {
+ for (i = 0; i < tph_st.count; i++) {
+ ret = pcie_tph_get_cpu_st(pdev, mtype, cpus[i], &st);
+ if (ret)
+ goto out;
+ cpus[i] = st;
+ }
+ goto out;
+ }
+
+ for (i = 0; i < tph_st.count; i++) {
+ if (cpus[i] == U32_MAX) {
+ ret = pcie_tph_set_st_entry(pdev, tph_st.index + i, 0);
+ if (ret)
+ goto out;
+ continue;
+ }
+
+ ret = pcie_tph_get_cpu_st(pdev, mtype, cpus[i], &st);
+ if (ret)
+ goto out;
+ ret = pcie_tph_set_st_entry(pdev, tph_st.index + i, st);
+ if (ret)
+ goto out;
+ }
+
+out:
+ if (!is_set && !ret) {
+ if (copy_to_user(&arg->data, cpus,
+ tph_st.count * sizeof(*cpus)))
+ ret = -EFAULT;
+ }
+ if (is_set && ret) {
+ /* Roll back previously programmed entries to 0 */
+ for (j = 0; j < i; j++)
+ pcie_tph_set_st_entry(pdev, tph_st.index + j, 0);
+ }
+ kfree(cpus);
+ return ret;
+}
+
int vfio_pci_core_ioctl_feature(struct vfio_device *device, u32 flags,
void __user *arg, size_t argsz)
{
@@ -1576,6 +1665,8 @@ int vfio_pci_core_ioctl_feature(struct vfio_device *device, u32 flags,
return vfio_pci_core_feature_token(vdev, flags, arg, argsz);
case VFIO_DEVICE_FEATURE_DMA_BUF:
return vfio_pci_core_feature_dma_buf(vdev, flags, arg, argsz);
+ case VFIO_DEVICE_FEATURE_TPH_ST:
+ return vfio_pci_core_feature_tph_st(vdev, flags, arg, argsz);
default:
return -ENOTTY;
}
diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index 5de618a3a5ee..aca39d4e5307 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -1534,6 +1534,47 @@ struct vfio_device_feature_dma_buf {
*/
#define VFIO_DEVICE_FEATURE_MIG_PRECOPY_INFOv2 12
+/**
+ * VFIO_DEVICE_FEATURE_TPH_ST - Get/Set PCIe TPH Steering Tag (ST) entries
+ *
+ * Provides userspace interface to manage PCIe TPH ST table entries.
+ * This feature is only available when device TPH is enabled.
+ *
+ * Upon VFIO_DEVICE_FEATURE_SET:
+ * Program contiguous ST table entries from the starting @index.
+ * Valid only for hardware with ST table located in TPH Capability
+ * space or MSI-X table. If an entry CPU ID is specified as U32_MAX,
+ * the corresponding ST entry will be cleared. @index and @count define
+ * the contiguous entry range to be programmed.
+ * If any entry programming fails, the operation will roll back and
+ * clear all entries that were successfully programmed before the error.
+ *
+ * Upon VFIO_DEVICE_FEATURE_GET:
+ * Retrieve the ST value mapped to each given CPU ID in the @data array.
+ * Userspace fills @data with CPU ID array, the interface returns each
+ * CPU's corresponding ST value back in place.
+ * Valid only when TPH DS mode is enabled.
+ *
+ * @flags: Operation flags (VFIO_TPH_ST_MEM_TYPE_*).
+ * @index: Starting ST entry index, only valid for FEATURE_SET.
+ * @count: Number of contiguous entries to access.
+ * @data: Array of CPU IDs for both SET and GET. On SET it programs ST for
+ * each CPU; on GET it returns the mapped ST value of each CPU.
+ *
+ * This feature is gated by enable_unsafe_tph module parameter.
+ */
+#define VFIO_DEVICE_FEATURE_TPH_ST 13
+
+struct vfio_device_feature_tph_st {
+ __u32 flags;
+#define VFIO_TPH_ST_MEM_TYPE_VM (0U << 0)
+#define VFIO_TPH_ST_MEM_TYPE_PM (1U << 0)
+ __u16 index;
+ __u16 count;
+#define VFIO_TPH_ST_MAX_COUNT 2048
+ __u32 data[] __counted_by(count);
+};
+
/* -------- API for Type1 VFIO IOMMU -------- */
/**
--
2.17.1
^ permalink raw reply related [flat|nested] 22+ messages in thread
* Re: [PATCH v11 2/5] PCI/TPH: Export pcie_tph_get_st_modes() for external use
2026-05-18 7:16 ` [PATCH v11 2/5] PCI/TPH: Export pcie_tph_get_st_modes() for external use Chengwen Feng
@ 2026-05-18 7:57 ` sashiko-bot
2026-05-22 4:10 ` Alex Williamson
1 sibling, 0 replies; 22+ messages in thread
From: sashiko-bot @ 2026-05-18 7:57 UTC (permalink / raw)
To: Chengwen Feng; +Cc: linux-pci
Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
- [High] Unchecked return value of pci_read_config_dword() leads to reporting false capabilities and bypassing safety checks on disconnected devices.
--
commit 7d284f4750da4fed3caa1876a0ac6b5f46c96ad9
Author: Chengwen Feng <fengchengwen@huawei.com>
PCI/TPH: Export pcie_tph_get_st_modes() for external use
Export the helper to retrieve supported PCIe TPH steering tag modes so that
drivers like VFIO can query and expose device capabilities to userspace.
Add stub functions for when CONFIG_PCIE_TPH is disabled, and add validation
to prevent invalid configuration space access.
> diff --git a/drivers/pci/tph.c b/drivers/pci/tph.c
> index 877cf556242b4..ba31b010f67ad 100644
> --- a/drivers/pci/tph.c
> +++ b/drivers/pci/tph.c
[ ... ]
> @@ -145,15 +145,27 @@ static void set_ctrl_reg_req_en(struct pci_dev *pdev, u8 req_type)
[ ... ]
> +u8 pcie_tph_get_st_modes(struct pci_dev *pdev)
> {
> u32 reg;
>
> + if (!pdev->tph_cap)
> + return 0;
> +
> pci_read_config_dword(pdev, pdev->tph_cap + PCI_TPH_CAP, ®);
> reg &= PCI_TPH_CAP_ST_NS | PCI_TPH_CAP_ST_IV | PCI_TPH_CAP_ST_DS;
>
> return reg;
> }
> +EXPORT_SYMBOL(pcie_tph_get_st_modes);
Is it possible for pci_read_config_dword() to fail here?
If the device is disconnected or the read fails, reg could be set to
0xFFFFFFFF (PCI_ERROR_RESPONSE). It might also be left uninitialized if the
host controller fails the read without updating the variable.
Because the return value of pci_read_config_dword() is not checked, a value
of 0xFFFFFFFF in reg would cause this function to return 0x7 after masking.
With this function now being exported to external drivers like VFIO, could
this result in falsely exposing all ST modes as supported to userspace
instead of returning 0 on failure?
> @@ -168,6 +180,9 @@ u32 pcie_tph_get_st_table_loc(struct pci_dev *pdev)
> {
> u32 reg;
>
> + if (!pdev->tph_cap)
> + return PCI_TPH_LOC_NONE;
> +
> pci_read_config_dword(pdev, pdev->tph_cap + PCI_TPH_CAP, ®);
>
> return reg & PCI_TPH_CAP_LOC_MASK;
> }
Similarly, if pci_read_config_dword() fails here and sets reg to 0xFFFFFFFF,
this will return PCI_TPH_CAP_LOC_MASK (a reserved location) rather than
PCI_TPH_LOC_NONE.
Could this incorrect return value bypass safety checks that expect
PCI_TPH_LOC_NONE for invalid devices, or cause pcie_enable_tph() to
improperly enable TPH when hardware access has failed?
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260518071701.25177-1-fengchengwen@huawei.com?part=2
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v11 3/5] PCI/TPH: Add pcie_tph_enabled_mode() helper
2026-05-18 7:16 ` [PATCH v11 3/5] PCI/TPH: Add pcie_tph_enabled_mode() helper Chengwen Feng
@ 2026-05-18 8:14 ` sashiko-bot
2026-05-22 4:10 ` Alex Williamson
1 sibling, 0 replies; 22+ messages in thread
From: sashiko-bot @ 2026-05-18 8:14 UTC (permalink / raw)
To: Chengwen Feng; +Cc: linux-pci
Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
- [Medium] `pcie_tph_enabled_mode()` conflates the "disabled" state with the valid `PCI_TPH_ST_NS_MODE` (No ST Mode), returning 0 for both.
--
commit 054f3de083c771d0143aa6293e77e165e4c44a41
Author: Chengwen Feng <fengchengwen@huawei.com>
PCI/TPH: Add pcie_tph_enabled_mode() helper
This commit adds a helper to query the enabled TPH mode on a PCI device,
useful for drivers like VFIO-PCI that need to validate TPH state.
> diff --git a/drivers/pci/tph.c b/drivers/pci/tph.c
> index ba31b010f67ad..91c1e83410a3e 100644
> --- a/drivers/pci/tph.c
> +++ b/drivers/pci/tph.c
[ ... ]
> @@ -451,6 +451,18 @@ int pcie_enable_tph(struct pci_dev *pdev, int mode)
> }
> EXPORT_SYMBOL(pcie_enable_tph);
>
> +/**
> + * pcie_tph_enabled_mode - Get current enabled TPH mode
> + * @pdev: PCI device
> + *
> + * Return the enabled TPH mode (IV/DS) or 0 if disabled.
> + */
> +int pcie_tph_enabled_mode(struct pci_dev *pdev)
> +{
> + return pdev->tph_enabled ? pdev->tph_mode : 0;
Will returning 0 when TPH is disabled cause issues for generic callers
trying to differentiate between a disabled state and an active "No ST Mode"?
Since PCI_TPH_ST_NS_MODE is defined as 0, 0 is a valid active TPH mode
supported and enabled via pcie_enable_tph(pdev, 0).
If a generic driver uses this helper as a boolean check, for example:
if (pcie_tph_enabled_mode(pdev))
pcie_disable_tph(pdev);
it will incorrectly evaluate to false for NS mode, failing to properly
disable TPH.
Could this return a negative error code (e.g., -EINVAL or -ENODATA) when TPH
is disabled to avoid aliasing an active state with the disabled state?
> +}
> +EXPORT_SYMBOL(pcie_tph_enabled_mode);
> +
[ ... ]
> diff --git a/include/linux/pci-tph.h b/include/linux/pci-tph.h
> index 5772d48ea4446..28d0fa7621465 100644
> --- a/include/linux/pci-tph.h
> +++ b/include/linux/pci-tph.h
[ ... ]
> @@ -42,6 +43,8 @@ static inline int pcie_tph_get_cpu_st(struct pci_dev *dev,
> static inline void pcie_disable_tph(struct pci_dev *pdev) { }
> static inline int pcie_enable_tph(struct pci_dev *pdev, int mode)
> { return -EINVAL; }
> +static inline int pcie_tph_enabled_mode(struct pci_dev *pdev)
> +{ return 0; }
Does this stub need to be updated to match as well if a negative error code
is used for the disabled state?
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260518071701.25177-1-fengchengwen@huawei.com?part=3
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v11 4/5] vfio/pci: Add PCIe TPH configuration space virtualization
2026-05-18 7:17 ` [PATCH v11 4/5] vfio/pci: Add PCIe TPH configuration space virtualization Chengwen Feng
@ 2026-05-18 8:48 ` sashiko-bot
2026-05-22 4:10 ` Alex Williamson
1 sibling, 0 replies; 22+ messages in thread
From: sashiko-bot @ 2026-05-18 8:48 UTC (permalink / raw)
To: Chengwen Feng; +Cc: linux-pci
Thank you for your contribution! Sashiko AI review found 3 potential issue(s) to consider:
- [Critical] Unsafe PCI configuration space access during PM state transition (Race condition leading to Unsupported Request).
- [High] Dropped configuration writes to the TPH capability break ST Table and Control Register virtualization.
- [High] Host kernel API overrides the guest's explicit TPH Request Enable state, preventing the guest from disabling TPH requests.
--
commit 5ada31d58b6a34d641ac5392c15e54c3404870a2
Author: Chengwen Feng <fengchengwen@huawei.com>
vfio/pci: Add PCIe TPH configuration space virtualization
This commit adds support for virtualizing the PCIe TPH control register.
It introduces a module parameter to globally control the TPH feature and
mediates TPH control register writes to only allow valid mode settings.
> diff --git a/drivers/vfio/pci/vfio_pci_config.c b/drivers/vfio/pci/vfio_pci_config.c
> --- a/drivers/vfio/pci/vfio_pci_config.c
> +++ b/drivers/vfio/pci/vfio_pci_config.c
[ ... ]
> @@ -313,6 +316,30 @@ static int vfio_virt_config_read(struct vfio_pci_core_device *vdev, int pos,
> return count;
> }
>
> +static int vfio_pci_tph_config_write(struct vfio_pci_core_device *vdev, int pos,
> + int count, struct perm_bits *perm,
> + int offset, __le32 val)
> +{
> + u32 data = le32_to_cpu(val);
> +
> + guard(mutex)(&vdev->tph_lock);
> +
> + if (!enable_unsafe_tph)
> + return count;
> +
> + if (offset != PCI_TPH_CTRL)
> + return count;
Does this function silently discard ST Table and other TPH capability writes?
By unconditionally returning count without actually modifying the virtual
configuration space (vconfig) or the physical hardware, guest writes to the ST
Table, the ST Table Select field, or unaligned byte-writes to the control
register are dropped.
Since there is no .readfn, reads will fetch directly from the physical hardware
returning unconfigured values, which could break guest OS configurations.
> +
> + /* Only permit write TPH mode. */
> + data &= PCI_TPH_CTRL_MODE_SEL_MASK;
> + if (data == PCI_TPH_ST_IV_MODE || data == PCI_TPH_ST_DS_MODE)
> + pcie_enable_tph(vdev->pdev, data);
If the guest clears the 'TPH Requester Enable' field but leaves a valid mode
in PCI_TPH_CTRL_MODE_SEL_MASK, does this call to pcie_enable_tph() override
the guest's request?
pcie_enable_tph() force-enables the feature in physical hardware, which prevents
the guest from disabling it.
Additionally, if the guest changes active modes, pcie_enable_tph() will return
-EBUSY and fail silently without informing the user.
> + else if (data == PCI_TPH_ST_NS_MODE)
> + pcie_disable_tph(vdev->pdev);
> +
> + return count;
> +}
> +
> static struct perm_bits direct_ro_perms = {
[ ... ]
> diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
> --- a/drivers/vfio/pci/vfio_pci_core.c
> +++ b/drivers/vfio/pci/vfio_pci_core.c
[ ... ]
> @@ -736,6 +738,11 @@ void vfio_pci_core_close_device(struct vfio_device *core_vdev)
> #endif
> vfio_pci_dma_buf_cleanup(vdev);
>
> + /* Disable TPH when userspace closes the device FD */
> + mutex_lock(&vdev->tph_lock);
> + pcie_disable_tph(vdev->pdev);
> + mutex_unlock(&vdev->tph_lock);
> +
> vfio_pci_core_disable(vdev);
Is it safe to access the PCI config space here?
A malicious or buggy guest could have left the device in the suspended D3 state.
vfio_pci_core_disable() wakes up the device via pm_runtime_resume(), but
calling pcie_disable_tph() before that could access a suspended device. This
would lead to an Unsupported Request (UR) and potentially a host kernel panic.
[ ... ]
> @@ -2205,6 +2214,9 @@ int vfio_pci_core_register_device(struct vfio_pci_core_device *vdev)
> if (!disable_idle_d3)
> pm_runtime_put(dev);
>
> + /* Disable TPH when taking over ownership of the device */
> + pcie_disable_tph(pdev);
> +
Could calling pcie_disable_tph() after pm_runtime_put() risk accessing a
suspended device?
pm_runtime_put() drops the runtime PM reference, allowing the PM core to
asynchronously suspend the device to D3. Accessing the config space of a
suspended device violates PCIe semantics.
> ret = vfio_register_group_dev(&vdev->vdev);
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260518071701.25177-1-fengchengwen@huawei.com?part=4
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v11 5/5] vfio/pci: Add VFIO_DEVICE_FEATURE_TPH_ST for TPH ST entry management
2026-05-18 7:17 ` [PATCH v11 5/5] vfio/pci: Add VFIO_DEVICE_FEATURE_TPH_ST for TPH ST entry management Chengwen Feng
@ 2026-05-18 9:22 ` sashiko-bot
2026-05-22 4:10 ` Alex Williamson
1 sibling, 0 replies; 22+ messages in thread
From: sashiko-bot @ 2026-05-18 9:22 UTC (permalink / raw)
To: Chengwen Feng; +Cc: linux-pci
Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:
- [Medium] The function mixes `guard()`-based cleanup with a `goto`-based cleanup label.
- [Medium] The flag-gated parameter `index` is validated outside the scope of its flag gate.
--
commit 06e9eeaeacf7925e08b8f7df545f7710af906cae
Author: Chengwen Feng <fengchengwen@huawei.com>
vfio/pci: Add VFIO_DEVICE_FEATURE_TPH_ST for TPH ST entry management
This commit introduces VFIO_DEVICE_FEATURE_TPH_ST to allow userspace to manage
PCIe TPH Steering Tag entries. It supports SET operations to program contiguous
ST entries and GET operations to retrieve ST values per CPU ID.
> diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
> --- a/drivers/vfio/pci/vfio_pci_core.c
> +++ b/drivers/vfio/pci/vfio_pci_core.c
[ ... ]
> +static int vfio_pci_core_feature_tph_st(struct vfio_pci_core_device *vdev,
> + u32 flags,
> + struct vfio_device_feature_tph_st __user *arg,
> + size_t argsz)
> +{
> + bool is_set = !!(flags & VFIO_DEVICE_FEATURE_SET);
> + struct vfio_device_feature_tph_st tph_st;
> + struct pci_dev *pdev = vdev->pdev;
> + enum tph_mem_type mtype;
> + int i, j, ret;
> + u32 *cpus;
> + u16 st;
> +
> + guard(mutex)(&vdev->tph_lock);
[ ... ]
> + if (copy_from_user(&tph_st, arg, sizeof(tph_st)))
> + return -EFAULT;
> +
> + if (tph_st.count == 0 || tph_st.count > VFIO_TPH_ST_MAX_COUNT ||
> + tph_st.flags > VFIO_TPH_ST_MEM_TYPE_PM)
> + return -EINVAL;
> + if (!is_set && tph_st.index != 0)
> + return -EINVAL;
Since the index field is documented as only valid for FEATURE_SET operations,
could userspace legitimately pass uninitialized memory in this field during
a GET request?
If userspace is not required to zero-fill unused fields, validating that
index is exactly 0 during a GET operation might cause spurious failures
for valid requests.
[ ... ]
> + cpus = memdup_array_user(&arg->data, tph_st.count, sizeof(*cpus));
> + if (IS_ERR(cpus))
> + return PTR_ERR(cpus);
> +
> + mtype = tph_st.flags & VFIO_TPH_ST_MEM_TYPE_PM ? TPH_MEM_TYPE_PM :
> + TPH_MEM_TYPE_VM;
> + if (!is_set) {
> + for (i = 0; i < tph_st.count; i++) {
> + ret = pcie_tph_get_cpu_st(pdev, mtype, cpus[i], &st);
> + if (ret)
> + goto out;
Is it recommended to mix guard() based cleanup with goto-based error handling
in the same function?
The cleanup guidelines suggest that functions should either use scope-based
cleanups for all resources or convert none of them. Could the cpus array be
managed with __free(kfree) and the rollback handled directly in the error
paths to avoid using goto labels alongside guard()?
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260518071701.25177-1-fengchengwen@huawei.com?part=5
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v11 0/5] vfio/pci: Add PCIe TPH support
2026-05-18 7:16 [PATCH v11 0/5] vfio/pci: Add PCIe TPH support Chengwen Feng
` (4 preceding siblings ...)
2026-05-18 7:17 ` [PATCH v11 5/5] vfio/pci: Add VFIO_DEVICE_FEATURE_TPH_ST for TPH ST entry management Chengwen Feng
@ 2026-05-22 1:31 ` fengchengwen
5 siblings, 0 replies; 22+ messages in thread
From: fengchengwen @ 2026-05-22 1:31 UTC (permalink / raw)
To: alex, jgg
Cc: wathsala.vithanage, helgaas, wei.huang2, wangzhou1, wangyushan12,
liuyonglong, kvm, linux-pci
Hi Alex & Jason,
Gentle reminder for this v11 TPH patchset.
Could you help review it at your convenience?
Your comments will help verify the overall design direction and polish the code further.
Thanks
On 5/18/2026 3:16 PM, Chengwen Feng wrote:
> This patchset enables userspace control over PCIe TPH steering tags,
> motivated by the following considerations:
>
> 1. Why userspace needs the capability to control steering tags:
> When PCIe devices are fully owned by userspace workloads such as DPDK
> and SPDK, only userspace has full knowledge of core binding policies
> and traffic distribution strategies. Without this series, userspace
> cannot enable TPH or configure steering tags, leaving built-in PCIe
> performance optimizations unused in high-throughput polling I/O
> scenarios.
>
> 2. Why this interface must be implemented in VFIO:
> VFIO is the standard, secure community solution for granting full
> PCIe device ownership to userspace. Existing kernel TPH interfaces
> are designed purely for in-kernel drivers. For user-owned devices,
> VFIO provides the only isolated and correct path to expose per-device
> TPH management.
>
> TPH supports both IV and DS modes. Since both modes could introduces
> cross-VM isolation risks such as untrusted guests programming arbitrary
> steering tags to impact other domains:
> 1. If ST location in MSI-X table, untrusted guests may program the
> MSI-X table.
> 2. If ST don't locate in MSI-X or CAP, untrusted guests may program the
> device-specific register.
> So a new module parameter `enable_unsafe_tph` is added. It defaults to
> off, and blocks all unsafe TPH operations when disabled.
>
> Based on earlier RFC work by Wathsala Vithanage
>
> v11:
> - Fix Sashiko review comments:
> Add tph_lock for protect all tph operations from vfio user
> Fix memory-leak when copy_to_user failed
> v10:
> - Fix Sashiko review comments:
> - Fix TPH virtualization write return negetive val of 4/5 commit
> - Fix don't support probe feature of 5/5 commit
> - Fix return positive val if copy_to_user fail of 5/5 commit
> v9:
> - Address Alex's comment on VFIO part: adopt device feature scheme
> - Remove pcie_tph_get_st_table_size and add pcie_tph_is_enabled
> depending on the implementation requirements
>
> Chengwen Feng (5):
> PCI/TPH: Fix pcie_tph_get_st_table_loc() field extraction
> PCI/TPH: Export pcie_tph_get_st_modes() for external use
> PCI/TPH: Add pcie_tph_enabled_mode() helper
> vfio/pci: Add PCIe TPH configuration space virtualization
> vfio/pci: Add VFIO_DEVICE_FEATURE_TPH_ST for TPH ST entry management
>
> drivers/pci/tph.c | 38 +++++++---
> drivers/vfio/pci/vfio_pci.c | 13 +++-
> drivers/vfio/pci/vfio_pci_config.c | 28 ++++++++
> drivers/vfio/pci/vfio_pci_core.c | 107 ++++++++++++++++++++++++++++-
> include/linux/pci-tph.h | 10 +++
> include/linux/vfio_pci_core.h | 4 +-
> include/uapi/linux/vfio.h | 41 +++++++++++
> 7 files changed, 230 insertions(+), 11 deletions(-)
>
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v11 5/5] vfio/pci: Add VFIO_DEVICE_FEATURE_TPH_ST for TPH ST entry management
2026-05-18 7:17 ` [PATCH v11 5/5] vfio/pci: Add VFIO_DEVICE_FEATURE_TPH_ST for TPH ST entry management Chengwen Feng
2026-05-18 9:22 ` sashiko-bot
@ 2026-05-22 4:10 ` Alex Williamson
2026-05-22 10:04 ` fengchengwen
1 sibling, 1 reply; 22+ messages in thread
From: Alex Williamson @ 2026-05-22 4:10 UTC (permalink / raw)
To: Chengwen Feng
Cc: jgg, wathsala.vithanage, helgaas, wei.huang2, wangzhou1,
wangyushan12, liuyonglong, kvm, linux-pci, alex
On Mon, 18 May 2026 15:17:01 +0800
Chengwen Feng <fengchengwen@huawei.com> wrote:
> Add VFIO_DEVICE_FEATURE_TPH_ST to allow userspace to manage PCIe TPH
> Steering Tag entries.
>
> SET: Program contiguous ST entries when ST table resides in TPH Capability
> or MSI-X table. U32_MAX CPU ID clears the corresponding ST entry.
> GET: Retrieve ST values per CPU ID, only available in DS mode.
>
> Both operations are only valid when TPH is enabled on the device.
What? That doesn't align with the hardware programming model. The
specification even suggests that the device should be quiesced or TPH
disabled during programming of the steering tags for deterministic
behavior.
> Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
> ---
> drivers/vfio/pci/vfio_pci_core.c | 91 ++++++++++++++++++++++++++++++++
> include/uapi/linux/vfio.h | 41 ++++++++++++++
> 2 files changed, 132 insertions(+)
>
> diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
> index 0d0140202280..13a22c1d6ea6 100644
> --- a/drivers/vfio/pci/vfio_pci_core.c
> +++ b/drivers/vfio/pci/vfio_pci_core.c
> @@ -1558,6 +1558,95 @@ static int vfio_pci_core_feature_token(struct vfio_pci_core_device *vdev,
> return 0;
> }
>
> +static int vfio_pci_core_feature_tph_st(struct vfio_pci_core_device *vdev,
> + u32 flags,
> + struct vfio_device_feature_tph_st __user *arg,
> + size_t argsz)
> +{
> + bool is_set = !!(flags & VFIO_DEVICE_FEATURE_SET);
> + struct vfio_device_feature_tph_st tph_st;
> + struct pci_dev *pdev = vdev->pdev;
> + enum tph_mem_type mtype;
> + int i, j, ret;
> + u32 *cpus;
> + u16 st;
> +
> + guard(mutex)(&vdev->tph_lock);
Wrong place for this lock. Why do we even need it? If a user races
SET/GET with enable/disable, that's their problem.
> +
> + ret = vfio_check_feature(flags, argsz,
> + VFIO_DEVICE_FEATURE_GET |
> + VFIO_DEVICE_FEATURE_SET,
> + sizeof(tph_st));
> + if (ret <= 0)
> + return ret;
> +
> + if (!enable_unsafe_tph ||
Why did we allow the feature to be probe if it can't be used?
> + pcie_tph_enabled_mode(pdev) == PCI_TPH_ST_NS_MODE)
Here's that ambiguity of reporting NS_MODE for disabled, but as above
it's a bogus requirement.
> + return -EOPNOTSUPP;
> + if (!is_set && pcie_tph_enabled_mode(pdev) != PCI_TPH_ST_DS_MODE)
Why? Let the interface be symmetric that GET works regardless of mode.
> + return -EOPNOTSUPP;
> + if (is_set && pcie_tph_get_st_table_loc(pdev) == PCI_TPH_LOC_NONE)
> + return -EOPNOTSUPP;
> +
> + if (copy_from_user(&tph_st, arg, sizeof(tph_st)))
> + return -EFAULT;
> +
> + if (tph_st.count == 0 || tph_st.count > VFIO_TPH_ST_MAX_COUNT ||
> + tph_st.flags > VFIO_TPH_ST_MEM_TYPE_PM)
Wrong operation for a flags field.
> + return -EINVAL;
> + if (!is_set && tph_st.index != 0)
We can only GET index 0? Why?
> + return -EINVAL;
> + if (is_set && (tph_st.index >= VFIO_TPH_ST_MAX_COUNT ||
> + tph_st.index + tph_st.count > VFIO_TPH_ST_MAX_COUNT))
DS can have more that 2048 steering tags.
> + return -EINVAL;
> +
> + cpus = memdup_array_user(&arg->data, tph_st.count, sizeof(*cpus));
Use an __aligned_u64 data_uptr like iommufd does and avoid copying this
into the kernel. Isn't this steering tags, why's it named cpus?
> + if (IS_ERR(cpus))
> + return PTR_ERR(cpus);
> +
> + mtype = tph_st.flags & VFIO_TPH_ST_MEM_TYPE_PM ? TPH_MEM_TYPE_PM :
> + TPH_MEM_TYPE_VM;
Then why did we use two flag bits for this?
> + if (!is_set) {
> + for (i = 0; i < tph_st.count; i++) {
> + ret = pcie_tph_get_cpu_st(pdev, mtype, cpus[i], &st);
My understanding was that GET would return the translated values from a
previous SET, not translate on the fly. I think in order to support
writing STs before enabling TPH, we're going to need to store them
somewhere anyway so that we can push them to the correct location when
TPH is enabled. That means there a buffer for SET/GET regardless of
the mode that's enabled.
> + if (ret)
> + goto out;
> + cpus[i] = st;
> + }
> + goto out;
> + }
> +
> + for (i = 0; i < tph_st.count; i++) {
> + if (cpus[i] == U32_MAX) {
> + ret = pcie_tph_set_st_entry(pdev, tph_st.index + i, 0);
Why do we need a magic value to set it to zero? Hardware doesn't have
a "clear ST register value".
> + if (ret)
> + goto out;
> + continue;
> + }
> +
> + ret = pcie_tph_get_cpu_st(pdev, mtype, cpus[i], &st);
> + if (ret)
> + goto out;
> + ret = pcie_tph_set_st_entry(pdev, tph_st.index + i, st);
> + if (ret)
> + goto out;
Clearly broken for DS mode.
> + }
> +
> +out:
> + if (!is_set && !ret) {
> + if (copy_to_user(&arg->data, cpus,
> + tph_st.count * sizeof(*cpus)))
> + ret = -EFAULT;
> + }
> + if (is_set && ret) {
> + /* Roll back previously programmed entries to 0 */
> + for (j = 0; j < i; j++)
> + pcie_tph_set_st_entry(pdev, tph_st.index + j, 0);
The magic value is zero, not U32_MAX. Thanks,
Alex
> + }
> + kfree(cpus);
> + return ret;
> +}
> +
> int vfio_pci_core_ioctl_feature(struct vfio_device *device, u32 flags,
> void __user *arg, size_t argsz)
> {
> @@ -1576,6 +1665,8 @@ int vfio_pci_core_ioctl_feature(struct vfio_device *device, u32 flags,
> return vfio_pci_core_feature_token(vdev, flags, arg, argsz);
> case VFIO_DEVICE_FEATURE_DMA_BUF:
> return vfio_pci_core_feature_dma_buf(vdev, flags, arg, argsz);
> + case VFIO_DEVICE_FEATURE_TPH_ST:
> + return vfio_pci_core_feature_tph_st(vdev, flags, arg, argsz);
> default:
> return -ENOTTY;
> }
> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> index 5de618a3a5ee..aca39d4e5307 100644
> --- a/include/uapi/linux/vfio.h
> +++ b/include/uapi/linux/vfio.h
> @@ -1534,6 +1534,47 @@ struct vfio_device_feature_dma_buf {
> */
> #define VFIO_DEVICE_FEATURE_MIG_PRECOPY_INFOv2 12
>
> +/**
> + * VFIO_DEVICE_FEATURE_TPH_ST - Get/Set PCIe TPH Steering Tag (ST) entries
> + *
> + * Provides userspace interface to manage PCIe TPH ST table entries.
> + * This feature is only available when device TPH is enabled.
> + *
> + * Upon VFIO_DEVICE_FEATURE_SET:
> + * Program contiguous ST table entries from the starting @index.
> + * Valid only for hardware with ST table located in TPH Capability
> + * space or MSI-X table. If an entry CPU ID is specified as U32_MAX,
> + * the corresponding ST entry will be cleared. @index and @count define
> + * the contiguous entry range to be programmed.
> + * If any entry programming fails, the operation will roll back and
> + * clear all entries that were successfully programmed before the error.
> + *
> + * Upon VFIO_DEVICE_FEATURE_GET:
> + * Retrieve the ST value mapped to each given CPU ID in the @data array.
> + * Userspace fills @data with CPU ID array, the interface returns each
> + * CPU's corresponding ST value back in place.
> + * Valid only when TPH DS mode is enabled.
> + *
> + * @flags: Operation flags (VFIO_TPH_ST_MEM_TYPE_*).
> + * @index: Starting ST entry index, only valid for FEATURE_SET.
> + * @count: Number of contiguous entries to access.
> + * @data: Array of CPU IDs for both SET and GET. On SET it programs ST for
> + * each CPU; on GET it returns the mapped ST value of each CPU.
> + *
> + * This feature is gated by enable_unsafe_tph module parameter.
> + */
> +#define VFIO_DEVICE_FEATURE_TPH_ST 13
> +
> +struct vfio_device_feature_tph_st {
> + __u32 flags;
> +#define VFIO_TPH_ST_MEM_TYPE_VM (0U << 0)
> +#define VFIO_TPH_ST_MEM_TYPE_PM (1U << 0)
> + __u16 index;
> + __u16 count;
> +#define VFIO_TPH_ST_MAX_COUNT 2048
> + __u32 data[] __counted_by(count);
> +};
> +
> /* -------- API for Type1 VFIO IOMMU -------- */
>
> /**
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v11 4/5] vfio/pci: Add PCIe TPH configuration space virtualization
2026-05-18 7:17 ` [PATCH v11 4/5] vfio/pci: Add PCIe TPH configuration space virtualization Chengwen Feng
2026-05-18 8:48 ` sashiko-bot
@ 2026-05-22 4:10 ` Alex Williamson
2026-05-22 9:39 ` fengchengwen
1 sibling, 1 reply; 22+ messages in thread
From: Alex Williamson @ 2026-05-22 4:10 UTC (permalink / raw)
To: Chengwen Feng
Cc: jgg, wathsala.vithanage, helgaas, wei.huang2, wangzhou1,
wangyushan12, liuyonglong, kvm, linux-pci, alex
On Mon, 18 May 2026 15:17:00 +0800
Chengwen Feng <fengchengwen@huawei.com> wrote:
> Add support for virtualizing the PCIe TPH (Transaction Processing Hints)
> control register. TPH may break platform isolation, so add a module
> parameter "enable_unsafe_tph" to allow administrators to globally control
> this feature.
>
> TPH control register writes are mediated to only allow valid mode settings,
> and TPH is automatically disabled when VFIO takes ownership of the device
> or when userspace closes the device file descriptor.
>
> Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
> ---
> drivers/vfio/pci/vfio_pci.c | 13 ++++++++++++-
> drivers/vfio/pci/vfio_pci_config.c | 28 ++++++++++++++++++++++++++++
> drivers/vfio/pci/vfio_pci_core.c | 16 +++++++++++++++-
> include/linux/vfio_pci_core.h | 4 +++-
> 4 files changed, 58 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
> index 0c771064c0b8..6d73668459cf 100644
> --- a/drivers/vfio/pci/vfio_pci.c
> +++ b/drivers/vfio/pci/vfio_pci.c
> @@ -60,6 +60,12 @@ static bool disable_denylist;
> module_param(disable_denylist, bool, 0444);
> MODULE_PARM_DESC(disable_denylist, "Disable use of device denylist. Disabling the denylist allows binding to devices with known errata that may lead to exploitable stability or security issues when accessed by untrusted users.");
>
> +#ifdef CONFIG_PCIE_TPH
> +static bool enable_unsafe_tph;
> +module_param(enable_unsafe_tph, bool, 0444);
> +MODULE_PARM_DESC(enable_unsafe_tph, "Enable PCIe TPH (Transaction Processing Hints) support. It may break platform isolation. If you do not know what this is for, step away. (default: false)");
> +#endif
> +
> static bool vfio_pci_dev_in_denylist(struct pci_dev *pdev)
> {
> switch (pdev->vendor) {
> @@ -257,12 +263,17 @@ static int __init vfio_pci_init(void)
> {
> int ret;
> bool is_disable_vga = true;
> + bool is_enable_unsafe_tph = false;
>
> #ifdef CONFIG_VFIO_PCI_VGA
> is_disable_vga = disable_vga;
> #endif
> +#ifdef CONFIG_PCIE_TPH
> + is_enable_unsafe_tph = enable_unsafe_tph;
> +#endif
>
> - vfio_pci_core_set_params(nointxmask, is_disable_vga, disable_idle_d3);
> + vfio_pci_core_set_params(nointxmask, is_disable_vga, disable_idle_d3,
> + is_enable_unsafe_tph);
>
> /* Register and scan for devices */
> ret = pci_register_driver(&vfio_pci_driver);
> diff --git a/drivers/vfio/pci/vfio_pci_config.c b/drivers/vfio/pci/vfio_pci_config.c
> index a10ed733f0e3..6da35c2313b9 100644
> --- a/drivers/vfio/pci/vfio_pci_config.c
> +++ b/drivers/vfio/pci/vfio_pci_config.c
> @@ -22,6 +22,7 @@
>
> #include <linux/fs.h>
> #include <linux/pci.h>
> +#include <linux/pci-tph.h>
> #include <linux/uaccess.h>
> #include <linux/vfio.h>
> #include <linux/slab.h>
> @@ -35,6 +36,8 @@
> ((offset >= PCI_BASE_ADDRESS_0 && offset < PCI_BASE_ADDRESS_5 + 4) || \
> (offset >= PCI_ROM_ADDRESS && offset < PCI_ROM_ADDRESS + 4))
>
> +extern bool enable_unsafe_tph;
> +
> /*
> * Lengths of PCI Config Capabilities
> * 0: Removed from the user visible capability list
> @@ -313,6 +316,30 @@ static int vfio_virt_config_read(struct vfio_pci_core_device *vdev, int pos,
> return count;
> }
>
> +static int vfio_pci_tph_config_write(struct vfio_pci_core_device *vdev, int pos,
> + int count, struct perm_bits *perm,
> + int offset, __le32 val)
> +{
> + u32 data = le32_to_cpu(val);
> +
> + guard(mutex)(&vdev->tph_lock);
> +
> + if (!enable_unsafe_tph)
> + return count;
> +
> + if (offset != PCI_TPH_CTRL)
> + return count;
> +
> + /* Only permit write TPH mode. */
> + data &= PCI_TPH_CTRL_MODE_SEL_MASK;
> + if (data == PCI_TPH_ST_IV_MODE || data == PCI_TPH_ST_DS_MODE)
> + pcie_enable_tph(vdev->pdev, data);
> + else if (data == PCI_TPH_ST_NS_MODE)
> + pcie_disable_tph(vdev->pdev);
> +
> + return count;
> +}
I don't understand the virtualization here, the Requester Enable bits
control what's actually enabled, not the Mode Selection bits. All
three of these are valid modes that can be enabled, why aren't they
toggled by the value written to PCI_TPH_CTRL_REQ_EN_MASK?
Isn't this patch ordering wrong, why are we allowing tph to be enabled
and disabled before we've provided a mechanism to set/get steering tags?
> +
> static struct perm_bits direct_ro_perms = {
> .readfn = vfio_direct_config_read,
> };
> @@ -1121,6 +1148,7 @@ int __init vfio_pci_init_perm_bits(void)
> ret |= init_pci_ext_cap_err_perm(&ecap_perms[PCI_EXT_CAP_ID_ERR]);
> ret |= init_pci_ext_cap_pwr_perm(&ecap_perms[PCI_EXT_CAP_ID_PWR]);
> ecap_perms[PCI_EXT_CAP_ID_VNDR].writefn = vfio_raw_config_write;
> + ecap_perms[PCI_EXT_CAP_ID_TPH].writefn = vfio_pci_tph_config_write;
> ecap_perms[PCI_EXT_CAP_ID_DVSEC].writefn = vfio_raw_config_write;
>
> if (ret)
> diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
> index 050e7542952e..0d0140202280 100644
> --- a/drivers/vfio/pci/vfio_pci_core.c
> +++ b/drivers/vfio/pci/vfio_pci_core.c
> @@ -29,6 +29,7 @@
> #include <linux/sched/mm.h>
> #include <linux/iommufd.h>
> #include <linux/pci-p2pdma.h>
> +#include <linux/pci-tph.h>
> #if IS_ENABLED(CONFIG_EEH)
> #include <asm/eeh.h>
> #endif
> @@ -41,6 +42,7 @@
> static bool nointxmask;
> static bool disable_vga;
> static bool disable_idle_d3;
> +bool enable_unsafe_tph;
>
> static void vfio_pci_eventfd_rcu_free(struct rcu_head *rcu)
> {
> @@ -771,6 +773,11 @@ void vfio_pci_core_close_device(struct vfio_device *core_vdev)
> #endif
> vfio_pci_dma_buf_cleanup(vdev);
>
> + /* Disable TPH when userspace closes the device FD */
> + mutex_lock(&vdev->tph_lock);
> + pcie_disable_tph(vdev->pdev);
> + mutex_unlock(&vdev->tph_lock);
> +
We generally don't need to lock on device close, there's nothing it can
race with. Also this should happen in the below function after the
device is known to be back in D0.
In general the tph_lock seems more like it's protecting PCI-core since
tph enable/disable doesn't serialize itself.
> vfio_pci_core_disable(vdev);
>
> mutex_lock(&vdev->igate);
> @@ -2132,6 +2139,7 @@ int vfio_pci_core_init_dev(struct vfio_device *core_vdev)
> mutex_init(&vdev->igate);
> spin_lock_init(&vdev->irqlock);
> mutex_init(&vdev->ioeventfds_lock);
> + mutex_init(&vdev->tph_lock);
> INIT_LIST_HEAD(&vdev->dummy_resources_list);
> INIT_LIST_HEAD(&vdev->ioeventfds_list);
> INIT_LIST_HEAD(&vdev->sriov_pfs_item);
> @@ -2151,6 +2159,7 @@ void vfio_pci_core_release_dev(struct vfio_device *core_vdev)
> struct vfio_pci_core_device *vdev =
> container_of(core_vdev, struct vfio_pci_core_device, vdev);
>
> + mutex_destroy(&vdev->tph_lock);
> mutex_destroy(&vdev->igate);
> mutex_destroy(&vdev->ioeventfds_lock);
> kfree(vdev->region);
> @@ -2240,6 +2249,9 @@ int vfio_pci_core_register_device(struct vfio_pci_core_device *vdev)
> if (!disable_idle_d3)
> pm_runtime_put(dev);
>
> + /* Disable TPH when taking over ownership of the device */
> + pcie_disable_tph(pdev);
> +
Seems like a vfio_pci_core_enable() operation, why do we care if it's
enabled here? Thanks,
Alex
> ret = vfio_register_group_dev(&vdev->vdev);
> if (ret)
> goto out_power;
> @@ -2605,11 +2617,13 @@ static void vfio_pci_dev_set_try_reset(struct vfio_device_set *dev_set)
> }
>
> void vfio_pci_core_set_params(bool is_nointxmask, bool is_disable_vga,
> - bool is_disable_idle_d3)
> + bool is_disable_idle_d3,
> + bool is_enable_unsafe_tph)
> {
> nointxmask = is_nointxmask;
> disable_vga = is_disable_vga;
> disable_idle_d3 = is_disable_idle_d3;
> + enable_unsafe_tph = is_enable_unsafe_tph;
> }
> EXPORT_SYMBOL_GPL(vfio_pci_core_set_params);
>
> diff --git a/include/linux/vfio_pci_core.h b/include/linux/vfio_pci_core.h
> index 89165b769e5c..741d5bcc7ba3 100644
> --- a/include/linux/vfio_pci_core.h
> +++ b/include/linux/vfio_pci_core.h
> @@ -142,6 +142,7 @@ struct vfio_pci_core_device {
> struct notifier_block nb;
> struct rw_semaphore memory_lock;
> struct list_head dmabufs;
> + struct mutex tph_lock;
> };
>
> enum vfio_pci_io_width {
> @@ -157,7 +158,8 @@ int vfio_pci_core_register_dev_region(struct vfio_pci_core_device *vdev,
> const struct vfio_pci_regops *ops,
> size_t size, u32 flags, void *data);
> void vfio_pci_core_set_params(bool nointxmask, bool is_disable_vga,
> - bool is_disable_idle_d3);
> + bool is_disable_idle_d3,
> + bool is_enable_unsafe_tph);
> void vfio_pci_core_close_device(struct vfio_device *core_vdev);
> int vfio_pci_core_init_dev(struct vfio_device *core_vdev);
> void vfio_pci_core_release_dev(struct vfio_device *core_vdev);
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v11 3/5] PCI/TPH: Add pcie_tph_enabled_mode() helper
2026-05-18 7:16 ` [PATCH v11 3/5] PCI/TPH: Add pcie_tph_enabled_mode() helper Chengwen Feng
2026-05-18 8:14 ` sashiko-bot
@ 2026-05-22 4:10 ` Alex Williamson
2026-05-22 9:18 ` fengchengwen
1 sibling, 1 reply; 22+ messages in thread
From: Alex Williamson @ 2026-05-22 4:10 UTC (permalink / raw)
To: Chengwen Feng
Cc: jgg, wathsala.vithanage, helgaas, wei.huang2, wangzhou1,
wangyushan12, liuyonglong, kvm, linux-pci, alex
On Mon, 18 May 2026 15:16:59 +0800
Chengwen Feng <fengchengwen@huawei.com> wrote:
> Add a helper to query enabled TPH mode on a PCI device. This is useful for
> drivers like VFIO-PCI that need to validate TPH state before allowing
> access to steering tag tables.
>
> Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
> ---
> drivers/pci/tph.c | 12 ++++++++++++
> include/linux/pci-tph.h | 3 +++
> 2 files changed, 15 insertions(+)
>
> diff --git a/drivers/pci/tph.c b/drivers/pci/tph.c
> index ba31b010f67a..91c1e83410a3 100644
> --- a/drivers/pci/tph.c
> +++ b/drivers/pci/tph.c
> @@ -451,6 +451,18 @@ int pcie_enable_tph(struct pci_dev *pdev, int mode)
> }
> EXPORT_SYMBOL(pcie_enable_tph);
>
> +/**
> + * pcie_tph_enabled_mode - Get current enabled TPH mode
> + * @pdev: PCI device
> + *
> + * Return the enabled TPH mode (IV/DS) or 0 if disabled.
> + */
> +int pcie_tph_enabled_mode(struct pci_dev *pdev)
> +{
> + return pdev->tph_enabled ? pdev->tph_mode : 0;
> +}
> +EXPORT_SYMBOL(pcie_tph_enabled_mode);
Doesn't this create ambiguity with PCI_TPH_ST_NS_MODE? Maybe return an
-errno if not enabled. Duplicate same for stub below. Thanks,
Alex
> +
> void pci_restore_tph_state(struct pci_dev *pdev)
> {
> struct pci_cap_saved_state *save_state;
> diff --git a/include/linux/pci-tph.h b/include/linux/pci-tph.h
> index 5772d48ea444..28d0fa762146 100644
> --- a/include/linux/pci-tph.h
> +++ b/include/linux/pci-tph.h
> @@ -28,6 +28,7 @@ int pcie_tph_get_cpu_st(struct pci_dev *dev,
> unsigned int cpu, u16 *tag);
> void pcie_disable_tph(struct pci_dev *pdev);
> int pcie_enable_tph(struct pci_dev *pdev, int mode);
> +int pcie_tph_enabled_mode(struct pci_dev *pdev);
> u16 pcie_tph_get_st_table_size(struct pci_dev *pdev);
> u32 pcie_tph_get_st_table_loc(struct pci_dev *pdev);
> u8 pcie_tph_get_st_modes(struct pci_dev *pdev);
> @@ -42,6 +43,8 @@ static inline int pcie_tph_get_cpu_st(struct pci_dev *dev,
> static inline void pcie_disable_tph(struct pci_dev *pdev) { }
> static inline int pcie_enable_tph(struct pci_dev *pdev, int mode)
> { return -EINVAL; }
> +static inline int pcie_tph_enabled_mode(struct pci_dev *pdev)
> +{ return 0; }
> static inline u16 pcie_tph_get_st_table_size(struct pci_dev *pdev)
> { return 0; }
> static inline u32 pcie_tph_get_st_table_loc(struct pci_dev *pdev)
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v11 2/5] PCI/TPH: Export pcie_tph_get_st_modes() for external use
2026-05-18 7:16 ` [PATCH v11 2/5] PCI/TPH: Export pcie_tph_get_st_modes() for external use Chengwen Feng
2026-05-18 7:57 ` sashiko-bot
@ 2026-05-22 4:10 ` Alex Williamson
1 sibling, 0 replies; 22+ messages in thread
From: Alex Williamson @ 2026-05-22 4:10 UTC (permalink / raw)
To: Chengwen Feng
Cc: jgg, wathsala.vithanage, helgaas, wei.huang2, wangzhou1,
wangyushan12, liuyonglong, kvm, linux-pci, alex
On Mon, 18 May 2026 15:16:58 +0800
Chengwen Feng <fengchengwen@huawei.com> wrote:
> Export the helper to retrieve supported PCIe TPH steering tag modes so
> that drivers like VFIO can query and expose device capabilities to
> userspace.
>
> Add stub functions for pcie_tph_get_st_table_size() and
> pcie_tph_get_st_table_loc() when !CONFIG_PCIE_TPH.
>
> Add tph_cap validation for pcie_tph_get_st_modes() and
> pcie_tph_get_st_table_loc() to prevent invalid PCI configuration
> space access when TPH is not supported.
>
> Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
> Acked-by: Bjorn Helgaas <bhelgaas@google.com>
> ---
> drivers/pci/tph.c | 19 +++++++++++++++++--
> include/linux/pci-tph.h | 7 +++++++
> 2 files changed, 24 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/pci/tph.c b/drivers/pci/tph.c
> index 877cf556242b..ba31b010f67a 100644
> --- a/drivers/pci/tph.c
> +++ b/drivers/pci/tph.c
> @@ -145,15 +145,27 @@ static void set_ctrl_reg_req_en(struct pci_dev *pdev, u8 req_type)
> pci_write_config_dword(pdev, pdev->tph_cap + PCI_TPH_CTRL, reg);
> }
>
> -static u8 get_st_modes(struct pci_dev *pdev)
> +/**
> + * pcie_tph_get_st_modes - Get supported Steering Tag modes
> + * @pdev: PCI device to query
> + *
> + * Return:
> + * Bitmask of supported ST modes (PCI_TPH_CAP_ST_NS, PCI_TPH_CAP_ST_IV,
> + * PCI_TPH_CAP_ST_DS)
> + */
> +u8 pcie_tph_get_st_modes(struct pci_dev *pdev)
> {
> u32 reg;
>
> + if (!pdev->tph_cap)
> + return 0;
> +
> pci_read_config_dword(pdev, pdev->tph_cap + PCI_TPH_CAP, ®);
> reg &= PCI_TPH_CAP_ST_NS | PCI_TPH_CAP_ST_IV | PCI_TPH_CAP_ST_DS;
>
> return reg;
> }
> +EXPORT_SYMBOL(pcie_tph_get_st_modes);
>
> /**
> * pcie_tph_get_st_table_loc - Return the device's ST table location
> @@ -168,6 +180,9 @@ u32 pcie_tph_get_st_table_loc(struct pci_dev *pdev)
> {
> u32 reg;
>
> + if (!pdev->tph_cap)
> + return PCI_TPH_LOC_NONE;
> +
> pci_read_config_dword(pdev, pdev->tph_cap + PCI_TPH_CAP, ®);
>
> return reg & PCI_TPH_CAP_LOC_MASK;
> @@ -395,7 +410,7 @@ int pcie_enable_tph(struct pci_dev *pdev, int mode)
>
> /* Sanitize and check ST mode compatibility */
> mode &= PCI_TPH_CTRL_MODE_SEL_MASK;
> - dev_modes = get_st_modes(pdev);
> + dev_modes = pcie_tph_get_st_modes(pdev);
> if (!((1 << mode) & dev_modes))
> return -EINVAL;
>
> diff --git a/include/linux/pci-tph.h b/include/linux/pci-tph.h
> index be68cd17f2f8..5772d48ea444 100644
> --- a/include/linux/pci-tph.h
> +++ b/include/linux/pci-tph.h
> @@ -30,6 +30,7 @@ void pcie_disable_tph(struct pci_dev *pdev);
> int pcie_enable_tph(struct pci_dev *pdev, int mode);
> u16 pcie_tph_get_st_table_size(struct pci_dev *pdev);
> u32 pcie_tph_get_st_table_loc(struct pci_dev *pdev);
> +u8 pcie_tph_get_st_modes(struct pci_dev *pdev);
> #else
> static inline int pcie_tph_set_st_entry(struct pci_dev *pdev,
> unsigned int index, u16 tag)
> @@ -41,6 +42,12 @@ static inline int pcie_tph_get_cpu_st(struct pci_dev *dev,
> static inline void pcie_disable_tph(struct pci_dev *pdev) { }
> static inline int pcie_enable_tph(struct pci_dev *pdev, int mode)
> { return -EINVAL; }
> +static inline u16 pcie_tph_get_st_table_size(struct pci_dev *pdev)
> +{ return 0; }
> +static inline u32 pcie_tph_get_st_table_loc(struct pci_dev *pdev)
> +{ return 0; }
Functionally the same, but this should be PCI_TPH_LOC_NONE for
consistency.
It seems there are a couple instances above where reg is passed
uninitialized to pci_read_config_dword() where the return value is
untested and the function could operate on garbage stack data.
Existing issue. Thanks,
Alex
> +static inline u8 pcie_tph_get_st_modes(struct pci_dev *pdev)
> +{ return 0; }
> #endif
>
> #endif /* LINUX_PCI_TPH_H */
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v11 3/5] PCI/TPH: Add pcie_tph_enabled_mode() helper
2026-05-22 4:10 ` Alex Williamson
@ 2026-05-22 9:18 ` fengchengwen
2026-05-22 14:00 ` Alex Williamson
0 siblings, 1 reply; 22+ messages in thread
From: fengchengwen @ 2026-05-22 9:18 UTC (permalink / raw)
To: Alex Williamson
Cc: jgg, wathsala.vithanage, helgaas, wei.huang2, wangzhou1,
wangyushan12, liuyonglong, kvm, linux-pci
On 5/22/2026 12:10 PM, Alex Williamson wrote:
> On Mon, 18 May 2026 15:16:59 +0800
> Chengwen Feng <fengchengwen@huawei.com> wrote:
>
>> Add a helper to query enabled TPH mode on a PCI device. This is useful for
>> drivers like VFIO-PCI that need to validate TPH state before allowing
>> access to steering tag tables.
>>
>> Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
>> ---
>> drivers/pci/tph.c | 12 ++++++++++++
>> include/linux/pci-tph.h | 3 +++
>> 2 files changed, 15 insertions(+)
>>
>> diff --git a/drivers/pci/tph.c b/drivers/pci/tph.c
>> index ba31b010f67a..91c1e83410a3 100644
>> --- a/drivers/pci/tph.c
>> +++ b/drivers/pci/tph.c
>> @@ -451,6 +451,18 @@ int pcie_enable_tph(struct pci_dev *pdev, int mode)
>> }
>> EXPORT_SYMBOL(pcie_enable_tph);
>>
>> +/**
>> + * pcie_tph_enabled_mode - Get current enabled TPH mode
>> + * @pdev: PCI device
>> + *
>> + * Return the enabled TPH mode (IV/DS) or 0 if disabled.
>> + */
>> +int pcie_tph_enabled_mode(struct pci_dev *pdev)
>> +{
>> + return pdev->tph_enabled ? pdev->tph_mode : 0;
>> +}
>> +EXPORT_SYMBOL(pcie_tph_enabled_mode);
>
> Doesn't this create ambiguity with PCI_TPH_ST_NS_MODE? Maybe return an
> -errno if not enabled. Duplicate same for stub below. Thanks,
Because pcie_enable_tph(PCI_TPH_ST_NS_MODE) could also enable TPH,
but PCI_TPH_ST_NS_MODE mean "The function must use a value of all zeros for all Steering Tags"
So I think pcie_enable_tph() should treat PCI_TPH_ST_NS_MODE as a
invalid input.
Thanks
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v11 4/5] vfio/pci: Add PCIe TPH configuration space virtualization
2026-05-22 4:10 ` Alex Williamson
@ 2026-05-22 9:39 ` fengchengwen
2026-05-22 14:09 ` Alex Williamson
0 siblings, 1 reply; 22+ messages in thread
From: fengchengwen @ 2026-05-22 9:39 UTC (permalink / raw)
To: Alex Williamson
Cc: jgg, wathsala.vithanage, helgaas, wei.huang2, wangzhou1,
wangyushan12, liuyonglong, kvm, linux-pci
On 5/22/2026 12:10 PM, Alex Williamson wrote:
> On Mon, 18 May 2026 15:17:00 +0800
> Chengwen Feng <fengchengwen@huawei.com> wrote:
>
>> Add support for virtualizing the PCIe TPH (Transaction Processing Hints)
>> control register. TPH may break platform isolation, so add a module
>> parameter "enable_unsafe_tph" to allow administrators to globally control
>> this feature.
>>
>> TPH control register writes are mediated to only allow valid mode settings,
>> and TPH is automatically disabled when VFIO takes ownership of the device
>> or when userspace closes the device file descriptor.
>>
>> Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
>> ---
>> drivers/vfio/pci/vfio_pci.c | 13 ++++++++++++-
>> drivers/vfio/pci/vfio_pci_config.c | 28 ++++++++++++++++++++++++++++
>> drivers/vfio/pci/vfio_pci_core.c | 16 +++++++++++++++-
>> include/linux/vfio_pci_core.h | 4 +++-
>> 4 files changed, 58 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
>> index 0c771064c0b8..6d73668459cf 100644
>> --- a/drivers/vfio/pci/vfio_pci.c
>> +++ b/drivers/vfio/pci/vfio_pci.c
>> @@ -60,6 +60,12 @@ static bool disable_denylist;
>> module_param(disable_denylist, bool, 0444);
>> MODULE_PARM_DESC(disable_denylist, "Disable use of device denylist. Disabling the denylist allows binding to devices with known errata that may lead to exploitable stability or security issues when accessed by untrusted users.");
>>
>> +#ifdef CONFIG_PCIE_TPH
>> +static bool enable_unsafe_tph;
>> +module_param(enable_unsafe_tph, bool, 0444);
>> +MODULE_PARM_DESC(enable_unsafe_tph, "Enable PCIe TPH (Transaction Processing Hints) support. It may break platform isolation. If you do not know what this is for, step away. (default: false)");
>> +#endif
>> +
>> static bool vfio_pci_dev_in_denylist(struct pci_dev *pdev)
>> {
>> switch (pdev->vendor) {
>> @@ -257,12 +263,17 @@ static int __init vfio_pci_init(void)
>> {
>> int ret;
>> bool is_disable_vga = true;
>> + bool is_enable_unsafe_tph = false;
>>
>> #ifdef CONFIG_VFIO_PCI_VGA
>> is_disable_vga = disable_vga;
>> #endif
>> +#ifdef CONFIG_PCIE_TPH
>> + is_enable_unsafe_tph = enable_unsafe_tph;
>> +#endif
>>
>> - vfio_pci_core_set_params(nointxmask, is_disable_vga, disable_idle_d3);
>> + vfio_pci_core_set_params(nointxmask, is_disable_vga, disable_idle_d3,
>> + is_enable_unsafe_tph);
>>
>> /* Register and scan for devices */
>> ret = pci_register_driver(&vfio_pci_driver);
>> diff --git a/drivers/vfio/pci/vfio_pci_config.c b/drivers/vfio/pci/vfio_pci_config.c
>> index a10ed733f0e3..6da35c2313b9 100644
>> --- a/drivers/vfio/pci/vfio_pci_config.c
>> +++ b/drivers/vfio/pci/vfio_pci_config.c
>> @@ -22,6 +22,7 @@
>>
>> #include <linux/fs.h>
>> #include <linux/pci.h>
>> +#include <linux/pci-tph.h>
>> #include <linux/uaccess.h>
>> #include <linux/vfio.h>
>> #include <linux/slab.h>
>> @@ -35,6 +36,8 @@
>> ((offset >= PCI_BASE_ADDRESS_0 && offset < PCI_BASE_ADDRESS_5 + 4) || \
>> (offset >= PCI_ROM_ADDRESS && offset < PCI_ROM_ADDRESS + 4))
>>
>> +extern bool enable_unsafe_tph;
>> +
>> /*
>> * Lengths of PCI Config Capabilities
>> * 0: Removed from the user visible capability list
>> @@ -313,6 +316,30 @@ static int vfio_virt_config_read(struct vfio_pci_core_device *vdev, int pos,
>> return count;
>> }
>>
>> +static int vfio_pci_tph_config_write(struct vfio_pci_core_device *vdev, int pos,
>> + int count, struct perm_bits *perm,
>> + int offset, __le32 val)
>> +{
>> + u32 data = le32_to_cpu(val);
>> +
>> + guard(mutex)(&vdev->tph_lock);
>> +
>> + if (!enable_unsafe_tph)
>> + return count;
>> +
>> + if (offset != PCI_TPH_CTRL)
>> + return count;
>> +
>> + /* Only permit write TPH mode. */
>> + data &= PCI_TPH_CTRL_MODE_SEL_MASK;
>> + if (data == PCI_TPH_ST_IV_MODE || data == PCI_TPH_ST_DS_MODE)
>> + pcie_enable_tph(vdev->pdev, data);
>> + else if (data == PCI_TPH_ST_NS_MODE)
>> + pcie_disable_tph(vdev->pdev);
>> +
>> + return count;
>> +}
>
> I don't understand the virtualization here, the Requester Enable bits
> control what's actually enabled, not the Mode Selection bits. All
> three of these are valid modes that can be enabled, why aren't they
> toggled by the value written to PCI_TPH_CTRL_REQ_EN_MASK?
OK, will fix in next version
{
...
mode = data & PCI_TPH_CTRL_MODE_SEL_MASK;
req_en = data & PCI_TPH_CTRL_REQ_EN_MASK;
if (req_en && (mode == PCI_TPH_ST_IV_MODE || mode == PCI_TPH_ST_DS_MODE))
pcie_enable_tph(vdev->pdev, data);
else if (!req_en)
pcie_disable_tph(vdev->pdev);
...
}
>
> Isn't this patch ordering wrong, why are we allowing tph to be enabled
> and disabled before we've provided a mechanism to set/get steering tags?
Regarding patch order: TPH must be enabled first before accessing steering tag entries.
The tag query interfaces rely on runtime TPH state and request type info stored in device struct,
which is only initialized after successful TPH enabling. So current order is functional necessary.
>
>> +
>> static struct perm_bits direct_ro_perms = {
>> .readfn = vfio_direct_config_read,
>> };
>> @@ -1121,6 +1148,7 @@ int __init vfio_pci_init_perm_bits(void)
>> ret |= init_pci_ext_cap_err_perm(&ecap_perms[PCI_EXT_CAP_ID_ERR]);
>> ret |= init_pci_ext_cap_pwr_perm(&ecap_perms[PCI_EXT_CAP_ID_PWR]);
>> ecap_perms[PCI_EXT_CAP_ID_VNDR].writefn = vfio_raw_config_write;
>> + ecap_perms[PCI_EXT_CAP_ID_TPH].writefn = vfio_pci_tph_config_write;
>> ecap_perms[PCI_EXT_CAP_ID_DVSEC].writefn = vfio_raw_config_write;
>>
>> if (ret)
>> diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
>> index 050e7542952e..0d0140202280 100644
>> --- a/drivers/vfio/pci/vfio_pci_core.c
>> +++ b/drivers/vfio/pci/vfio_pci_core.c
>> @@ -29,6 +29,7 @@
>> #include <linux/sched/mm.h>
>> #include <linux/iommufd.h>
>> #include <linux/pci-p2pdma.h>
>> +#include <linux/pci-tph.h>
>> #if IS_ENABLED(CONFIG_EEH)
>> #include <asm/eeh.h>
>> #endif
>> @@ -41,6 +42,7 @@
>> static bool nointxmask;
>> static bool disable_vga;
>> static bool disable_idle_d3;
>> +bool enable_unsafe_tph;
>>
>> static void vfio_pci_eventfd_rcu_free(struct rcu_head *rcu)
>> {
>> @@ -771,6 +773,11 @@ void vfio_pci_core_close_device(struct vfio_device *core_vdev)
>> #endif
>> vfio_pci_dma_buf_cleanup(vdev);
>>
>> + /* Disable TPH when userspace closes the device FD */
>> + mutex_lock(&vdev->tph_lock);
>> + pcie_disable_tph(vdev->pdev);
>> + mutex_unlock(&vdev->tph_lock);
>> +
>
> We generally don't need to lock on device close, there's nothing it can
> race with. Also this should happen in the below function after the
> device is known to be back in D0.
I'll remove unnecessary mutex lock in device close path as there is no concurrent race risk.
Also adjust TPH disable invocation to after device returns to D0 state.
>
> In general the tph_lock seems more like it's protecting PCI-core since
> tph enable/disable doesn't serialize itself.
The tph_lock is designed to serialize all userspace triggered TPH operations on the same device.
>
>> vfio_pci_core_disable(vdev);
>>
>> mutex_lock(&vdev->igate);
>> @@ -2132,6 +2139,7 @@ int vfio_pci_core_init_dev(struct vfio_device *core_vdev)
>> mutex_init(&vdev->igate);
>> spin_lock_init(&vdev->irqlock);
>> mutex_init(&vdev->ioeventfds_lock);
>> + mutex_init(&vdev->tph_lock);
>> INIT_LIST_HEAD(&vdev->dummy_resources_list);
>> INIT_LIST_HEAD(&vdev->ioeventfds_list);
>> INIT_LIST_HEAD(&vdev->sriov_pfs_item);
>> @@ -2151,6 +2159,7 @@ void vfio_pci_core_release_dev(struct vfio_device *core_vdev)
>> struct vfio_pci_core_device *vdev =
>> container_of(core_vdev, struct vfio_pci_core_device, vdev);
>>
>> + mutex_destroy(&vdev->tph_lock);
>> mutex_destroy(&vdev->igate);
>> mutex_destroy(&vdev->ioeventfds_lock);
>> kfree(vdev->region);
>> @@ -2240,6 +2249,9 @@ int vfio_pci_core_register_device(struct vfio_pci_core_device *vdev)
>> if (!disable_idle_d3)
>> pm_runtime_put(dev);
>>
>> + /* Disable TPH when taking over ownership of the device */
>> + pcie_disable_tph(pdev);
>> +
>
> Seems like a vfio_pci_core_enable() operation, why do we care if it's
> enabled here? Thanks,
For TPH disable during device takeover: VFIO ownership transfer triggers device reset,
which clears hardware TPH registers and ST tables. We call pcie_disable_tph to synchronize
software state with hardware reset status.
Thanks
>
> Alex
>
>> ret = vfio_register_group_dev(&vdev->vdev);
>> if (ret)
>> goto out_power;
>> @@ -2605,11 +2617,13 @@ static void vfio_pci_dev_set_try_reset(struct vfio_device_set *dev_set)
>> }
>>
>> void vfio_pci_core_set_params(bool is_nointxmask, bool is_disable_vga,
>> - bool is_disable_idle_d3)
>> + bool is_disable_idle_d3,
>> + bool is_enable_unsafe_tph)
>> {
>> nointxmask = is_nointxmask;
>> disable_vga = is_disable_vga;
>> disable_idle_d3 = is_disable_idle_d3;
>> + enable_unsafe_tph = is_enable_unsafe_tph;
>> }
>> EXPORT_SYMBOL_GPL(vfio_pci_core_set_params);
>>
>> diff --git a/include/linux/vfio_pci_core.h b/include/linux/vfio_pci_core.h
>> index 89165b769e5c..741d5bcc7ba3 100644
>> --- a/include/linux/vfio_pci_core.h
>> +++ b/include/linux/vfio_pci_core.h
>> @@ -142,6 +142,7 @@ struct vfio_pci_core_device {
>> struct notifier_block nb;
>> struct rw_semaphore memory_lock;
>> struct list_head dmabufs;
>> + struct mutex tph_lock;
>> };
>>
>> enum vfio_pci_io_width {
>> @@ -157,7 +158,8 @@ int vfio_pci_core_register_dev_region(struct vfio_pci_core_device *vdev,
>> const struct vfio_pci_regops *ops,
>> size_t size, u32 flags, void *data);
>> void vfio_pci_core_set_params(bool nointxmask, bool is_disable_vga,
>> - bool is_disable_idle_d3);
>> + bool is_disable_idle_d3,
>> + bool is_enable_unsafe_tph);
>> void vfio_pci_core_close_device(struct vfio_device *core_vdev);
>> int vfio_pci_core_init_dev(struct vfio_device *core_vdev);
>> void vfio_pci_core_release_dev(struct vfio_device *core_vdev);
>
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v11 5/5] vfio/pci: Add VFIO_DEVICE_FEATURE_TPH_ST for TPH ST entry management
2026-05-22 4:10 ` Alex Williamson
@ 2026-05-22 10:04 ` fengchengwen
2026-05-22 14:27 ` Alex Williamson
0 siblings, 1 reply; 22+ messages in thread
From: fengchengwen @ 2026-05-22 10:04 UTC (permalink / raw)
To: Alex Williamson
Cc: jgg, wathsala.vithanage, helgaas, wei.huang2, wangzhou1,
wangyushan12, liuyonglong, kvm, linux-pci
On 5/22/2026 12:10 PM, Alex Williamson wrote:
> On Mon, 18 May 2026 15:17:01 +0800
> Chengwen Feng <fengchengwen@huawei.com> wrote:
>
>> Add VFIO_DEVICE_FEATURE_TPH_ST to allow userspace to manage PCIe TPH
>> Steering Tag entries.
>>
>> SET: Program contiguous ST entries when ST table resides in TPH Capability
>> or MSI-X table. U32_MAX CPU ID clears the corresponding ST entry.
>> GET: Retrieve ST values per CPU ID, only available in DS mode.
>>
>> Both operations are only valid when TPH is enabled on the device.
>
> What? That doesn't align with the hardware programming model. The
> specification even suggests that the device should be quiesced or TPH
> disabled during programming of the steering tags for deterministic
> behavior.
As mentioned in the 4/5 commit, obtaining ST must first enable TPH.
Of course, I think it might be necessary to calculate pdev->tph_req_type
during the capability detection phase, so that we can modify ST before
enabling TPH. Do you think such a modification is needed?
>
>> Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
>> ---
>> drivers/vfio/pci/vfio_pci_core.c | 91 ++++++++++++++++++++++++++++++++
>> include/uapi/linux/vfio.h | 41 ++++++++++++++
>> 2 files changed, 132 insertions(+)
>>
>> diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
>> index 0d0140202280..13a22c1d6ea6 100644
>> --- a/drivers/vfio/pci/vfio_pci_core.c
>> +++ b/drivers/vfio/pci/vfio_pci_core.c
>> @@ -1558,6 +1558,95 @@ static int vfio_pci_core_feature_token(struct vfio_pci_core_device *vdev,
>> return 0;
>> }
>>
>> +static int vfio_pci_core_feature_tph_st(struct vfio_pci_core_device *vdev,
>> + u32 flags,
>> + struct vfio_device_feature_tph_st __user *arg,
>> + size_t argsz)
>> +{
>> + bool is_set = !!(flags & VFIO_DEVICE_FEATURE_SET);
>> + struct vfio_device_feature_tph_st tph_st;
>> + struct pci_dev *pdev = vdev->pdev;
>> + enum tph_mem_type mtype;
>> + int i, j, ret;
>> + u32 *cpus;
>> + u16 st;
>> +
>> + guard(mutex)(&vdev->tph_lock);
>
> Wrong place for this lock. Why do we even need it? If a user races
> SET/GET with enable/disable, that's their problem.
OK, will fix in next version
>
>> +
>> + ret = vfio_check_feature(flags, argsz,
>> + VFIO_DEVICE_FEATURE_GET |
>> + VFIO_DEVICE_FEATURE_SET,
>> + sizeof(tph_st));
>> + if (ret <= 0)
>> + return ret;
>> +
>> + if (!enable_unsafe_tph ||
>
> Why did we allow the feature to be probe if it can't be used?
OK, will fix in next version
>
>> + pcie_tph_enabled_mode(pdev) == PCI_TPH_ST_NS_MODE)
>
> Here's that ambiguity of reporting NS_MODE for disabled, but as above
> it's a bogus requirement.
OK, will fix in next version
>
>> + return -EOPNOTSUPP;
>> + if (!is_set && pcie_tph_enabled_mode(pdev) != PCI_TPH_ST_DS_MODE)
>
> Why? Let the interface be symmetric that GET works regardless of mode.
OK, will remove the restrict
>
>> + return -EOPNOTSUPP;
>> + if (is_set && pcie_tph_get_st_table_loc(pdev) == PCI_TPH_LOC_NONE)
>> + return -EOPNOTSUPP;
>> +
>> + if (copy_from_user(&tph_st, arg, sizeof(tph_st)))
>> + return -EFAULT;
>> +
>> + if (tph_st.count == 0 || tph_st.count > VFIO_TPH_ST_MAX_COUNT ||
>> + tph_st.flags > VFIO_TPH_ST_MEM_TYPE_PM)
>
> Wrong operation for a flags field.
Because currently flags only use bit0 (0-VM, 1-PM), this judge make sure
it will return error if userspace passing non-zero in remain bits.
>
>> + return -EINVAL;
>> + if (!is_set && tph_st.index != 0)
>
> We can only GET index 0? Why?
Because GET operation don't need index, this judgment make sure
it will return error if userspace passing non-zero index.
>
>> + return -EINVAL;
>> + if (is_set && (tph_st.index >= VFIO_TPH_ST_MAX_COUNT ||
>> + tph_st.index + tph_st.count > VFIO_TPH_ST_MAX_COUNT))
>
> DS can have more that 2048 steering tags.
I check the PCIE protocol, DS mode with ST in pcie-config or msi-x must not
hold > 2048 steering tags.
>
>> + return -EINVAL;
>> +
>> + cpus = memdup_array_user(&arg->data, tph_st.count, sizeof(*cpus));
>
> Use an __aligned_u64 data_uptr like iommufd does and avoid copying this
> into the kernel. Isn't this steering tags, why's it named cpus?
we reuse the field which according your previous suggestion:
in GET, user pass in cpus, and vfio will feedback steering tags in cpus.
>
>> + if (IS_ERR(cpus))
>> + return PTR_ERR(cpus);
>> +
>> + mtype = tph_st.flags & VFIO_TPH_ST_MEM_TYPE_PM ? TPH_MEM_TYPE_PM :
>> + TPH_MEM_TYPE_VM;
>
> Then why did we use two flag bits for this?
VFIO_TPH_ST_MEM_TYPE_PM is just one bit, could you explain the two flag bits?
>
>> + if (!is_set) {
>> + for (i = 0; i < tph_st.count; i++) {
>> + ret = pcie_tph_get_cpu_st(pdev, mtype, cpus[i], &st);
>
> My understanding was that GET would return the translated values from a
> previous SET, not translate on the fly. I think in order to support
> writing STs before enabling TPH, we're going to need to store them
> somewhere anyway so that we can push them to the correct location when
> TPH is enabled. That means there a buffer for SET/GET regardless of
> the mode that's enabled.
OK, will try to fix
>
>> + if (ret)
>> + goto out;
>> + cpus[i] = st;
>> + }
>> + goto out;
>> + }
>> +
>> + for (i = 0; i < tph_st.count; i++) {
>> + if (cpus[i] == U32_MAX) {
>> + ret = pcie_tph_set_st_entry(pdev, tph_st.index + i, 0);
>
> Why do we need a magic value to set it to zero? Hardware doesn't have
> a "clear ST register value".
So you mean for SET, usespace direct passing steering-tag not cpus in this patch?
>
>> + if (ret)
>> + goto out;
>> + continue;
>> + }
>> +
>> + ret = pcie_tph_get_cpu_st(pdev, mtype, cpus[i], &st);
>> + if (ret)
>> + goto out;
>> + ret = pcie_tph_set_st_entry(pdev, tph_st.index + i, st);
>> + if (ret)
>> + goto out;
>
> Clearly broken for DS mode.
For DS mode:
1\ if ST table in CONFIG or MSI-X, then it could done by this way
2\ if ST table not in CONFIG or MSI-X, then it only need invoke GET, and set by device-specific way.
Thanks
>
>> + }
>> +
>> +out:
>> + if (!is_set && !ret) {
>> + if (copy_to_user(&arg->data, cpus,
>> + tph_st.count * sizeof(*cpus)))
>> + ret = -EFAULT;
>> + }
>> + if (is_set && ret) {
>> + /* Roll back previously programmed entries to 0 */
>> + for (j = 0; j < i; j++)
>> + pcie_tph_set_st_entry(pdev, tph_st.index + j, 0);
>
> The magic value is zero, not U32_MAX. Thanks,
>
> Alex
>
>> + }
>> + kfree(cpus);
>> + return ret;
>> +}
>> +
>> int vfio_pci_core_ioctl_feature(struct vfio_device *device, u32 flags,
>> void __user *arg, size_t argsz)
>> {
>> @@ -1576,6 +1665,8 @@ int vfio_pci_core_ioctl_feature(struct vfio_device *device, u32 flags,
>> return vfio_pci_core_feature_token(vdev, flags, arg, argsz);
>> case VFIO_DEVICE_FEATURE_DMA_BUF:
>> return vfio_pci_core_feature_dma_buf(vdev, flags, arg, argsz);
>> + case VFIO_DEVICE_FEATURE_TPH_ST:
>> + return vfio_pci_core_feature_tph_st(vdev, flags, arg, argsz);
>> default:
>> return -ENOTTY;
>> }
>> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
>> index 5de618a3a5ee..aca39d4e5307 100644
>> --- a/include/uapi/linux/vfio.h
>> +++ b/include/uapi/linux/vfio.h
>> @@ -1534,6 +1534,47 @@ struct vfio_device_feature_dma_buf {
>> */
>> #define VFIO_DEVICE_FEATURE_MIG_PRECOPY_INFOv2 12
>>
>> +/**
>> + * VFIO_DEVICE_FEATURE_TPH_ST - Get/Set PCIe TPH Steering Tag (ST) entries
>> + *
>> + * Provides userspace interface to manage PCIe TPH ST table entries.
>> + * This feature is only available when device TPH is enabled.
>> + *
>> + * Upon VFIO_DEVICE_FEATURE_SET:
>> + * Program contiguous ST table entries from the starting @index.
>> + * Valid only for hardware with ST table located in TPH Capability
>> + * space or MSI-X table. If an entry CPU ID is specified as U32_MAX,
>> + * the corresponding ST entry will be cleared. @index and @count define
>> + * the contiguous entry range to be programmed.
>> + * If any entry programming fails, the operation will roll back and
>> + * clear all entries that were successfully programmed before the error.
>> + *
>> + * Upon VFIO_DEVICE_FEATURE_GET:
>> + * Retrieve the ST value mapped to each given CPU ID in the @data array.
>> + * Userspace fills @data with CPU ID array, the interface returns each
>> + * CPU's corresponding ST value back in place.
>> + * Valid only when TPH DS mode is enabled.
>> + *
>> + * @flags: Operation flags (VFIO_TPH_ST_MEM_TYPE_*).
>> + * @index: Starting ST entry index, only valid for FEATURE_SET.
>> + * @count: Number of contiguous entries to access.
>> + * @data: Array of CPU IDs for both SET and GET. On SET it programs ST for
>> + * each CPU; on GET it returns the mapped ST value of each CPU.
>> + *
>> + * This feature is gated by enable_unsafe_tph module parameter.
>> + */
>> +#define VFIO_DEVICE_FEATURE_TPH_ST 13
>> +
>> +struct vfio_device_feature_tph_st {
>> + __u32 flags;
>> +#define VFIO_TPH_ST_MEM_TYPE_VM (0U << 0)
>> +#define VFIO_TPH_ST_MEM_TYPE_PM (1U << 0)
>> + __u16 index;
>> + __u16 count;
>> +#define VFIO_TPH_ST_MAX_COUNT 2048
>> + __u32 data[] __counted_by(count);
>> +};
>> +
>> /* -------- API for Type1 VFIO IOMMU -------- */
>>
>> /**
>
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v11 3/5] PCI/TPH: Add pcie_tph_enabled_mode() helper
2026-05-22 9:18 ` fengchengwen
@ 2026-05-22 14:00 ` Alex Williamson
0 siblings, 0 replies; 22+ messages in thread
From: Alex Williamson @ 2026-05-22 14:00 UTC (permalink / raw)
To: fengchengwen
Cc: jgg, wathsala.vithanage, helgaas, wei.huang2, wangzhou1,
wangyushan12, liuyonglong, kvm, linux-pci, alex
On Fri, 22 May 2026 17:18:39 +0800
fengchengwen <fengchengwen@huawei.com> wrote:
> On 5/22/2026 12:10 PM, Alex Williamson wrote:
> > On Mon, 18 May 2026 15:16:59 +0800
> > Chengwen Feng <fengchengwen@huawei.com> wrote:
> >
> >> Add a helper to query enabled TPH mode on a PCI device. This is useful for
> >> drivers like VFIO-PCI that need to validate TPH state before allowing
> >> access to steering tag tables.
> >>
> >> Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
> >> ---
> >> drivers/pci/tph.c | 12 ++++++++++++
> >> include/linux/pci-tph.h | 3 +++
> >> 2 files changed, 15 insertions(+)
> >>
> >> diff --git a/drivers/pci/tph.c b/drivers/pci/tph.c
> >> index ba31b010f67a..91c1e83410a3 100644
> >> --- a/drivers/pci/tph.c
> >> +++ b/drivers/pci/tph.c
> >> @@ -451,6 +451,18 @@ int pcie_enable_tph(struct pci_dev *pdev, int mode)
> >> }
> >> EXPORT_SYMBOL(pcie_enable_tph);
> >>
> >> +/**
> >> + * pcie_tph_enabled_mode - Get current enabled TPH mode
> >> + * @pdev: PCI device
> >> + *
> >> + * Return the enabled TPH mode (IV/DS) or 0 if disabled.
> >> + */
> >> +int pcie_tph_enabled_mode(struct pci_dev *pdev)
> >> +{
> >> + return pdev->tph_enabled ? pdev->tph_mode : 0;
> >> +}
> >> +EXPORT_SYMBOL(pcie_tph_enabled_mode);
> >
> > Doesn't this create ambiguity with PCI_TPH_ST_NS_MODE? Maybe return an
> > -errno if not enabled. Duplicate same for stub below. Thanks,
>
> Because pcie_enable_tph(PCI_TPH_ST_NS_MODE) could also enable TPH,
> but PCI_TPH_ST_NS_MODE mean "The function must use a value of all
> zeros for all Steering Tags" So I think pcie_enable_tph() should
> treat PCI_TPH_ST_NS_MODE as a invalid input.
This doesn't make any sense to me, PCI_TPH_ST_NS_MODE is a valid enable
mode for the TPH capability. Thanks,
Alex
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v11 4/5] vfio/pci: Add PCIe TPH configuration space virtualization
2026-05-22 9:39 ` fengchengwen
@ 2026-05-22 14:09 ` Alex Williamson
0 siblings, 0 replies; 22+ messages in thread
From: Alex Williamson @ 2026-05-22 14:09 UTC (permalink / raw)
To: fengchengwen
Cc: jgg, wathsala.vithanage, helgaas, wei.huang2, wangzhou1,
wangyushan12, liuyonglong, kvm, linux-pci, alex
On Fri, 22 May 2026 17:39:09 +0800
fengchengwen <fengchengwen@huawei.com> wrote:
> On 5/22/2026 12:10 PM, Alex Williamson wrote:
> > On Mon, 18 May 2026 15:17:00 +0800
> > Chengwen Feng <fengchengwen@huawei.com> wrote:
> >
> >> Add support for virtualizing the PCIe TPH (Transaction Processing Hints)
> >> control register. TPH may break platform isolation, so add a module
> >> parameter "enable_unsafe_tph" to allow administrators to globally control
> >> this feature.
> >>
> >> TPH control register writes are mediated to only allow valid mode settings,
> >> and TPH is automatically disabled when VFIO takes ownership of the device
> >> or when userspace closes the device file descriptor.
> >>
> >> Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
> >> ---
> >> drivers/vfio/pci/vfio_pci.c | 13 ++++++++++++-
> >> drivers/vfio/pci/vfio_pci_config.c | 28 ++++++++++++++++++++++++++++
> >> drivers/vfio/pci/vfio_pci_core.c | 16 +++++++++++++++-
> >> include/linux/vfio_pci_core.h | 4 +++-
> >> 4 files changed, 58 insertions(+), 3 deletions(-)
> >>
> >> diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
> >> index 0c771064c0b8..6d73668459cf 100644
> >> --- a/drivers/vfio/pci/vfio_pci.c
> >> +++ b/drivers/vfio/pci/vfio_pci.c
> >> @@ -60,6 +60,12 @@ static bool disable_denylist;
> >> module_param(disable_denylist, bool, 0444);
> >> MODULE_PARM_DESC(disable_denylist, "Disable use of device denylist. Disabling the denylist allows binding to devices with known errata that may lead to exploitable stability or security issues when accessed by untrusted users.");
> >>
> >> +#ifdef CONFIG_PCIE_TPH
> >> +static bool enable_unsafe_tph;
> >> +module_param(enable_unsafe_tph, bool, 0444);
> >> +MODULE_PARM_DESC(enable_unsafe_tph, "Enable PCIe TPH (Transaction Processing Hints) support. It may break platform isolation. If you do not know what this is for, step away. (default: false)");
> >> +#endif
> >> +
> >> static bool vfio_pci_dev_in_denylist(struct pci_dev *pdev)
> >> {
> >> switch (pdev->vendor) {
> >> @@ -257,12 +263,17 @@ static int __init vfio_pci_init(void)
> >> {
> >> int ret;
> >> bool is_disable_vga = true;
> >> + bool is_enable_unsafe_tph = false;
> >>
> >> #ifdef CONFIG_VFIO_PCI_VGA
> >> is_disable_vga = disable_vga;
> >> #endif
> >> +#ifdef CONFIG_PCIE_TPH
> >> + is_enable_unsafe_tph = enable_unsafe_tph;
> >> +#endif
> >>
> >> - vfio_pci_core_set_params(nointxmask, is_disable_vga, disable_idle_d3);
> >> + vfio_pci_core_set_params(nointxmask, is_disable_vga, disable_idle_d3,
> >> + is_enable_unsafe_tph);
> >>
> >> /* Register and scan for devices */
> >> ret = pci_register_driver(&vfio_pci_driver);
> >> diff --git a/drivers/vfio/pci/vfio_pci_config.c b/drivers/vfio/pci/vfio_pci_config.c
> >> index a10ed733f0e3..6da35c2313b9 100644
> >> --- a/drivers/vfio/pci/vfio_pci_config.c
> >> +++ b/drivers/vfio/pci/vfio_pci_config.c
> >> @@ -22,6 +22,7 @@
> >>
> >> #include <linux/fs.h>
> >> #include <linux/pci.h>
> >> +#include <linux/pci-tph.h>
> >> #include <linux/uaccess.h>
> >> #include <linux/vfio.h>
> >> #include <linux/slab.h>
> >> @@ -35,6 +36,8 @@
> >> ((offset >= PCI_BASE_ADDRESS_0 && offset < PCI_BASE_ADDRESS_5 + 4) || \
> >> (offset >= PCI_ROM_ADDRESS && offset < PCI_ROM_ADDRESS + 4))
> >>
> >> +extern bool enable_unsafe_tph;
> >> +
> >> /*
> >> * Lengths of PCI Config Capabilities
> >> * 0: Removed from the user visible capability list
> >> @@ -313,6 +316,30 @@ static int vfio_virt_config_read(struct vfio_pci_core_device *vdev, int pos,
> >> return count;
> >> }
> >>
> >> +static int vfio_pci_tph_config_write(struct vfio_pci_core_device *vdev, int pos,
> >> + int count, struct perm_bits *perm,
> >> + int offset, __le32 val)
> >> +{
> >> + u32 data = le32_to_cpu(val);
> >> +
> >> + guard(mutex)(&vdev->tph_lock);
> >> +
> >> + if (!enable_unsafe_tph)
> >> + return count;
> >> +
> >> + if (offset != PCI_TPH_CTRL)
> >> + return count;
> >> +
> >> + /* Only permit write TPH mode. */
> >> + data &= PCI_TPH_CTRL_MODE_SEL_MASK;
> >> + if (data == PCI_TPH_ST_IV_MODE || data == PCI_TPH_ST_DS_MODE)
> >> + pcie_enable_tph(vdev->pdev, data);
> >> + else if (data == PCI_TPH_ST_NS_MODE)
> >> + pcie_disable_tph(vdev->pdev);
> >> +
> >> + return count;
> >> +}
> >
> > I don't understand the virtualization here, the Requester Enable bits
> > control what's actually enabled, not the Mode Selection bits. All
> > three of these are valid modes that can be enabled, why aren't they
> > toggled by the value written to PCI_TPH_CTRL_REQ_EN_MASK?
>
> OK, will fix in next version
> {
> ...
> mode = data & PCI_TPH_CTRL_MODE_SEL_MASK;
> req_en = data & PCI_TPH_CTRL_REQ_EN_MASK;
> if (req_en && (mode == PCI_TPH_ST_IV_MODE || mode == PCI_TPH_ST_DS_MODE))
> pcie_enable_tph(vdev->pdev, data);
> else if (!req_en)
> pcie_disable_tph(vdev->pdev);
> ...
> }
Again, NS_MODE is a valid enabled mode of TPH.
> > Isn't this patch ordering wrong, why are we allowing tph to be enabled
> > and disabled before we've provided a mechanism to set/get steering tags?
>
> Regarding patch order: TPH must be enabled first before accessing steering tag entries.
> The tag query interfaces rely on runtime TPH state and request type info stored in device struct,
> which is only initialized after successful TPH enabling. So current order is functional necessary.
As discussed in the next patch, I disagree with the direction of the
uAPI proposed here.
> >> +
> >> static struct perm_bits direct_ro_perms = {
> >> .readfn = vfio_direct_config_read,
> >> };
> >> @@ -1121,6 +1148,7 @@ int __init vfio_pci_init_perm_bits(void)
> >> ret |= init_pci_ext_cap_err_perm(&ecap_perms[PCI_EXT_CAP_ID_ERR]);
> >> ret |= init_pci_ext_cap_pwr_perm(&ecap_perms[PCI_EXT_CAP_ID_PWR]);
> >> ecap_perms[PCI_EXT_CAP_ID_VNDR].writefn = vfio_raw_config_write;
> >> + ecap_perms[PCI_EXT_CAP_ID_TPH].writefn = vfio_pci_tph_config_write;
> >> ecap_perms[PCI_EXT_CAP_ID_DVSEC].writefn = vfio_raw_config_write;
> >>
> >> if (ret)
> >> diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
> >> index 050e7542952e..0d0140202280 100644
> >> --- a/drivers/vfio/pci/vfio_pci_core.c
> >> +++ b/drivers/vfio/pci/vfio_pci_core.c
> >> @@ -29,6 +29,7 @@
> >> #include <linux/sched/mm.h>
> >> #include <linux/iommufd.h>
> >> #include <linux/pci-p2pdma.h>
> >> +#include <linux/pci-tph.h>
> >> #if IS_ENABLED(CONFIG_EEH)
> >> #include <asm/eeh.h>
> >> #endif
> >> @@ -41,6 +42,7 @@
> >> static bool nointxmask;
> >> static bool disable_vga;
> >> static bool disable_idle_d3;
> >> +bool enable_unsafe_tph;
> >>
> >> static void vfio_pci_eventfd_rcu_free(struct rcu_head *rcu)
> >> {
> >> @@ -771,6 +773,11 @@ void vfio_pci_core_close_device(struct vfio_device *core_vdev)
> >> #endif
> >> vfio_pci_dma_buf_cleanup(vdev);
> >>
> >> + /* Disable TPH when userspace closes the device FD */
> >> + mutex_lock(&vdev->tph_lock);
> >> + pcie_disable_tph(vdev->pdev);
> >> + mutex_unlock(&vdev->tph_lock);
> >> +
> >
> > We generally don't need to lock on device close, there's nothing it can
> > race with. Also this should happen in the below function after the
> > device is known to be back in D0.
>
> I'll remove unnecessary mutex lock in device close path as there is no concurrent race risk.
> Also adjust TPH disable invocation to after device returns to D0 state.
>
> >
> > In general the tph_lock seems more like it's protecting PCI-core since
> > tph enable/disable doesn't serialize itself.
>
> The tph_lock is designed to serialize all userspace triggered TPH operations on the same device.
I'm not sure it's our job to serialize everything. Serializing to
protect PCI-core from entering an inconsistent state is perhaps valid.
> >> vfio_pci_core_disable(vdev);
> >>
> >> mutex_lock(&vdev->igate);
> >> @@ -2132,6 +2139,7 @@ int vfio_pci_core_init_dev(struct vfio_device *core_vdev)
> >> mutex_init(&vdev->igate);
> >> spin_lock_init(&vdev->irqlock);
> >> mutex_init(&vdev->ioeventfds_lock);
> >> + mutex_init(&vdev->tph_lock);
> >> INIT_LIST_HEAD(&vdev->dummy_resources_list);
> >> INIT_LIST_HEAD(&vdev->ioeventfds_list);
> >> INIT_LIST_HEAD(&vdev->sriov_pfs_item);
> >> @@ -2151,6 +2159,7 @@ void vfio_pci_core_release_dev(struct vfio_device *core_vdev)
> >> struct vfio_pci_core_device *vdev =
> >> container_of(core_vdev, struct vfio_pci_core_device, vdev);
> >>
> >> + mutex_destroy(&vdev->tph_lock);
> >> mutex_destroy(&vdev->igate);
> >> mutex_destroy(&vdev->ioeventfds_lock);
> >> kfree(vdev->region);
> >> @@ -2240,6 +2249,9 @@ int vfio_pci_core_register_device(struct vfio_pci_core_device *vdev)
> >> if (!disable_idle_d3)
> >> pm_runtime_put(dev);
> >>
> >> + /* Disable TPH when taking over ownership of the device */
> >> + pcie_disable_tph(pdev);
> >> +
> >
> > Seems like a vfio_pci_core_enable() operation, why do we care if it's
> > enabled here? Thanks,
>
> For TPH disable during device takeover: VFIO ownership transfer triggers device reset,
> which clears hardware TPH registers and ST tables. We call pcie_disable_tph to synchronize
> software state with hardware reset status.
So why wouldn't we do that prior to giving the device to userspace
rather than upon registering the device? We try not to modify the
device state more than necessary on probe. Thanks,
Alex
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v11 5/5] vfio/pci: Add VFIO_DEVICE_FEATURE_TPH_ST for TPH ST entry management
2026-05-22 10:04 ` fengchengwen
@ 2026-05-22 14:27 ` Alex Williamson
2026-05-26 6:29 ` fengchengwen
0 siblings, 1 reply; 22+ messages in thread
From: Alex Williamson @ 2026-05-22 14:27 UTC (permalink / raw)
To: fengchengwen
Cc: jgg, wathsala.vithanage, helgaas, wei.huang2, wangzhou1,
wangyushan12, liuyonglong, kvm, linux-pci, alex
On Fri, 22 May 2026 18:04:13 +0800
fengchengwen <fengchengwen@huawei.com> wrote:
> On 5/22/2026 12:10 PM, Alex Williamson wrote:
> > On Mon, 18 May 2026 15:17:01 +0800
> > Chengwen Feng <fengchengwen@huawei.com> wrote:
> >
> >> Add VFIO_DEVICE_FEATURE_TPH_ST to allow userspace to manage PCIe TPH
> >> Steering Tag entries.
> >>
> >> SET: Program contiguous ST entries when ST table resides in TPH Capability
> >> or MSI-X table. U32_MAX CPU ID clears the corresponding ST entry.
> >> GET: Retrieve ST values per CPU ID, only available in DS mode.
> >>
> >> Both operations are only valid when TPH is enabled on the device.
> >
> > What? That doesn't align with the hardware programming model. The
> > specification even suggests that the device should be quiesced or TPH
> > disabled during programming of the steering tags for deterministic
> > behavior.
>
> As mentioned in the 4/5 commit, obtaining ST must first enable TPH.
> Of course, I think it might be necessary to calculate pdev->tph_req_type
> during the capability detection phase, so that we can modify ST before
> enabling TPH. Do you think such a modification is needed?
I think a uAPI that doesn't align with the recommended programming
order of the specification, potentially resulting in non-deterministic
behavior is bogus.
> >> Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
> >> ---
> >> drivers/vfio/pci/vfio_pci_core.c | 91 ++++++++++++++++++++++++++++++++
> >> include/uapi/linux/vfio.h | 41 ++++++++++++++
> >> 2 files changed, 132 insertions(+)
> >>
> >> diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
> >> index 0d0140202280..13a22c1d6ea6 100644
> >> --- a/drivers/vfio/pci/vfio_pci_core.c
> >> +++ b/drivers/vfio/pci/vfio_pci_core.c
> >> @@ -1558,6 +1558,95 @@ static int vfio_pci_core_feature_token(struct vfio_pci_core_device *vdev,
> >> return 0;
> >> }
> >>
> >> +static int vfio_pci_core_feature_tph_st(struct vfio_pci_core_device *vdev,
> >> + u32 flags,
> >> + struct vfio_device_feature_tph_st __user *arg,
> >> + size_t argsz)
> >> +{
> >> + bool is_set = !!(flags & VFIO_DEVICE_FEATURE_SET);
> >> + struct vfio_device_feature_tph_st tph_st;
> >> + struct pci_dev *pdev = vdev->pdev;
> >> + enum tph_mem_type mtype;
> >> + int i, j, ret;
> >> + u32 *cpus;
> >> + u16 st;
> >> +
> >> + guard(mutex)(&vdev->tph_lock);
> >
> > Wrong place for this lock. Why do we even need it? If a user races
> > SET/GET with enable/disable, that's their problem.
>
> OK, will fix in next version
>
> >
> >> +
> >> + ret = vfio_check_feature(flags, argsz,
> >> + VFIO_DEVICE_FEATURE_GET |
> >> + VFIO_DEVICE_FEATURE_SET,
> >> + sizeof(tph_st));
> >> + if (ret <= 0)
> >> + return ret;
> >> +
> >> + if (!enable_unsafe_tph ||
> >
> > Why did we allow the feature to be probe if it can't be used?
>
> OK, will fix in next version
>
> >
> >> + pcie_tph_enabled_mode(pdev) == PCI_TPH_ST_NS_MODE)
> >
> > Here's that ambiguity of reporting NS_MODE for disabled, but as above
> > it's a bogus requirement.
>
> OK, will fix in next version
>
> >
> >> + return -EOPNOTSUPP;
> >> + if (!is_set && pcie_tph_enabled_mode(pdev) != PCI_TPH_ST_DS_MODE)
> >
> > Why? Let the interface be symmetric that GET works regardless of mode.
>
> OK, will remove the restrict
>
> >
> >> + return -EOPNOTSUPP;
> >> + if (is_set && pcie_tph_get_st_table_loc(pdev) == PCI_TPH_LOC_NONE)
> >> + return -EOPNOTSUPP;
> >> +
> >> + if (copy_from_user(&tph_st, arg, sizeof(tph_st)))
> >> + return -EFAULT;
> >> +
> >> + if (tph_st.count == 0 || tph_st.count > VFIO_TPH_ST_MAX_COUNT ||
> >> + tph_st.flags > VFIO_TPH_ST_MEM_TYPE_PM)
> >
> > Wrong operation for a flags field.
>
> Because currently flags only use bit0 (0-VM, 1-PM), this judge make sure
> it will return error if userspace passing non-zero in remain bits.
This should be a mask to test undefined bits are not set, not a value
comparison.
> >> + return -EINVAL;
> >> + if (!is_set && tph_st.index != 0)
> >
> > We can only GET index 0? Why?
>
> Because GET operation don't need index, this judgment make sure
> it will return error if userspace passing non-zero index.
The interface is not symmetric.
> >> + return -EINVAL;
> >> + if (is_set && (tph_st.index >= VFIO_TPH_ST_MAX_COUNT ||
> >> + tph_st.index + tph_st.count > VFIO_TPH_ST_MAX_COUNT))
> >
> > DS can have more that 2048 steering tags.
>
> I check the PCIE protocol, DS mode with ST in pcie-config or msi-x must not
> hold > 2048 steering tags.
The MSI-X storage location is limited to 2048, but DS mode doesn't
necessarily use either of these storage locations.
> >> + return -EINVAL;
> >> +
> >> + cpus = memdup_array_user(&arg->data, tph_st.count, sizeof(*cpus));
> >
> > Use an __aligned_u64 data_uptr like iommufd does and avoid copying this
> > into the kernel. Isn't this steering tags, why's it named cpus?
>
> we reuse the field which according your previous suggestion:
> in GET, user pass in cpus, and vfio will feedback steering tags in cpus.
Perhaps this was over stated, but aligning to the generic uAPI name
since it's dual-purpose would seem to make sense.
> >> + if (IS_ERR(cpus))
> >> + return PTR_ERR(cpus);
> >> +
> >> + mtype = tph_st.flags & VFIO_TPH_ST_MEM_TYPE_PM ? TPH_MEM_TYPE_PM :
> >> + TPH_MEM_TYPE_VM;
> >
> > Then why did we use two flag bits for this?
>
> VFIO_TPH_ST_MEM_TYPE_PM is just one bit, could you explain the two flag bits?
My bad, I overlooked that they're both shifted by 0. Typically we
wouldn't define both the 0 and 1 value, only the latter.
> >> + if (!is_set) {
> >> + for (i = 0; i < tph_st.count; i++) {
> >> + ret = pcie_tph_get_cpu_st(pdev, mtype, cpus[i], &st);
> >
> > My understanding was that GET would return the translated values from a
> > previous SET, not translate on the fly. I think in order to support
> > writing STs before enabling TPH, we're going to need to store them
> > somewhere anyway so that we can push them to the correct location when
> > TPH is enabled. That means there a buffer for SET/GET regardless of
> > the mode that's enabled.
>
> OK, will try to fix
>
> >
> >> + if (ret)
> >> + goto out;
> >> + cpus[i] = st;
> >> + }
> >> + goto out;
> >> + }
> >> +
> >> + for (i = 0; i < tph_st.count; i++) {
> >> + if (cpus[i] == U32_MAX) {
> >> + ret = pcie_tph_set_st_entry(pdev, tph_st.index + i, 0);
> >
> > Why do we need a magic value to set it to zero? Hardware doesn't have
> > a "clear ST register value".
>
> So you mean for SET, usespace direct passing steering-tag not cpus in this patch?
How does this work with Zhiping's series where there might be TPH values
exposed through the dma-buf to support p2p DMA? Where would that level
of TPH programming occur?
> >> + if (ret)
> >> + goto out;
> >> + continue;
> >> + }
> >> +
> >> + ret = pcie_tph_get_cpu_st(pdev, mtype, cpus[i], &st);
> >> + if (ret)
> >> + goto out;
> >> + ret = pcie_tph_set_st_entry(pdev, tph_st.index + i, st);
> >> + if (ret)
> >> + goto out;
> >
> > Clearly broken for DS mode.
>
> For DS mode:
> 1\ if ST table in CONFIG or MSI-X, then it could done by this way
> 2\ if ST table not in CONFIG or MSI-X, then it only need invoke GET, and set by device-specific way.
The SET only after enable model is broken, DS mode needs to support
SET, where GET returns the host translations. Thanks,
Alex
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v11 5/5] vfio/pci: Add VFIO_DEVICE_FEATURE_TPH_ST for TPH ST entry management
2026-05-22 14:27 ` Alex Williamson
@ 2026-05-26 6:29 ` fengchengwen
0 siblings, 0 replies; 22+ messages in thread
From: fengchengwen @ 2026-05-26 6:29 UTC (permalink / raw)
To: Alex Williamson
Cc: jgg, wathsala.vithanage, helgaas, wei.huang2, wangzhou1,
wangyushan12, liuyonglong, kvm, linux-pci
On 5/22/2026 10:27 PM, Alex Williamson wrote:
> On Fri, 22 May 2026 18:04:13 +0800
> fengchengwen <fengchengwen@huawei.com> wrote:
>
>> On 5/22/2026 12:10 PM, Alex Williamson wrote:
>>> On Mon, 18 May 2026 15:17:01 +0800
>>> Chengwen Feng <fengchengwen@huawei.com> wrote:
>>>
>>>> Add VFIO_DEVICE_FEATURE_TPH_ST to allow userspace to manage PCIe TPH
>>>> Steering Tag entries.
>>>>
>>>> SET: Program contiguous ST entries when ST table resides in TPH Capability
>>>> or MSI-X table. U32_MAX CPU ID clears the corresponding ST entry.
>>>> GET: Retrieve ST values per CPU ID, only available in DS mode.
>>>>
>>>> Both operations are only valid when TPH is enabled on the device.
>>>
>>> What? That doesn't align with the hardware programming model. The
>>> specification even suggests that the device should be quiesced or TPH
>>> disabled during programming of the steering tags for deterministic
>>> behavior.
>>
>> As mentioned in the 4/5 commit, obtaining ST must first enable TPH.
>> Of course, I think it might be necessary to calculate pdev->tph_req_type
>> during the capability detection phase, so that we can modify ST before
>> enabling TPH. Do you think such a modification is needed?
>
> I think a uAPI that doesn't align with the recommended programming
> order of the specification, potentially resulting in non-deterministic
> behavior is bogus.
>
>>>> Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
>>>> ---
>>>> drivers/vfio/pci/vfio_pci_core.c | 91 ++++++++++++++++++++++++++++++++
>>>> include/uapi/linux/vfio.h | 41 ++++++++++++++
>>>> 2 files changed, 132 insertions(+)
>>>>
>>>> diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
>>>> index 0d0140202280..13a22c1d6ea6 100644
>>>> --- a/drivers/vfio/pci/vfio_pci_core.c
>>>> +++ b/drivers/vfio/pci/vfio_pci_core.c
>>>> @@ -1558,6 +1558,95 @@ static int vfio_pci_core_feature_token(struct vfio_pci_core_device *vdev,
>>>> return 0;
>>>> }
>>>>
>>>> +static int vfio_pci_core_feature_tph_st(struct vfio_pci_core_device *vdev,
>>>> + u32 flags,
>>>> + struct vfio_device_feature_tph_st __user *arg,
>>>> + size_t argsz)
>>>> +{
>>>> + bool is_set = !!(flags & VFIO_DEVICE_FEATURE_SET);
>>>> + struct vfio_device_feature_tph_st tph_st;
>>>> + struct pci_dev *pdev = vdev->pdev;
>>>> + enum tph_mem_type mtype;
>>>> + int i, j, ret;
>>>> + u32 *cpus;
>>>> + u16 st;
>>>> +
>>>> + guard(mutex)(&vdev->tph_lock);
>>>
>>> Wrong place for this lock. Why do we even need it? If a user races
>>> SET/GET with enable/disable, that's their problem.
>>
>> OK, will fix in next version
I still keep the lock which used to protect the ST table shadow
>>
>>>
>>>> +
>>>> + ret = vfio_check_feature(flags, argsz,
>>>> + VFIO_DEVICE_FEATURE_GET |
>>>> + VFIO_DEVICE_FEATURE_SET,
>>>> + sizeof(tph_st));
>>>> + if (ret <= 0)
>>>> + return ret;
>>>> +
>>>> + if (!enable_unsafe_tph ||
>>>
>>> Why did we allow the feature to be probe if it can't be used?
>>
>> OK, will fix in next version
done in v12
>>
>>>
>>>> + pcie_tph_enabled_mode(pdev) == PCI_TPH_ST_NS_MODE)
>>>
>>> Here's that ambiguity of reporting NS_MODE for disabled, but as above
>>> it's a bogus requirement.
>>
>> OK, will fix in next version
done in v12
>>
>>>
>>>> + return -EOPNOTSUPP;
>>>> + if (!is_set && pcie_tph_enabled_mode(pdev) != PCI_TPH_ST_DS_MODE)
>>>
>>> Why? Let the interface be symmetric that GET works regardless of mode.
>>
>> OK, will remove the restrict
done in v12
>>
>>>
>>>> + return -EOPNOTSUPP;
>>>> + if (is_set && pcie_tph_get_st_table_loc(pdev) == PCI_TPH_LOC_NONE)
>>>> + return -EOPNOTSUPP;
>>>> +
>>>> + if (copy_from_user(&tph_st, arg, sizeof(tph_st)))
>>>> + return -EFAULT;
>>>> +
>>>> + if (tph_st.count == 0 || tph_st.count > VFIO_TPH_ST_MAX_COUNT ||
>>>> + tph_st.flags > VFIO_TPH_ST_MEM_TYPE_PM)
>>>
>>> Wrong operation for a flags field.
>>
>> Because currently flags only use bit0 (0-VM, 1-PM), this judge make sure
>> it will return error if userspace passing non-zero in remain bits.
>
> This should be a mask to test undefined bits are not set, not a value
> comparison.
done in v12
>
>>>> + return -EINVAL;
>>>> + if (!is_set && tph_st.index != 0)
>>>
>>> We can only GET index 0? Why?
>>
>> Because GET operation don't need index, this judgment make sure
>> it will return error if userspace passing non-zero index.
>
> The interface is not symmetric.
Referring to your suggestion, I redesigned the interface (v12).
supporting two mutually exclusive operation modes:
1. Raw table read/write: Operate shadow ST table, sync updates to hardware
TPH capability or MSI-X table. Failed batch write triggers entry
rollback to keep hardware consistent.
2. CPU ID lookup: Query translated steering tag by specified memory type,
pure read-only without modifying ST table.
The first operation (raw table) is symmetric.
>
>>>> + return -EINVAL;
>>>> + if (is_set && (tph_st.index >= VFIO_TPH_ST_MAX_COUNT ||
>>>> + tph_st.index + tph_st.count > VFIO_TPH_ST_MAX_COUNT))
>>>
>>> DS can have more that 2048 steering tags.
>>
>> I check the PCIE protocol, DS mode with ST in pcie-config or msi-x must not
>> hold > 2048 steering tags.
>
> The MSI-X storage location is limited to 2048, but DS mode doesn't
> necessarily use either of these storage locations.
>
>>>> + return -EINVAL;
>>>> +
>>>> + cpus = memdup_array_user(&arg->data, tph_st.count, sizeof(*cpus));
>>>
>>> Use an __aligned_u64 data_uptr like iommufd does and avoid copying this
>>> into the kernel. Isn't this steering tags, why's it named cpus?
>>
>> we reuse the field which according your previous suggestion:
>> in GET, user pass in cpus, and vfio will feedback steering tags in cpus.
>
> Perhaps this was over stated, but aligning to the generic uAPI name
> since it's dual-purpose would seem to make sense.
>
>>>> + if (IS_ERR(cpus))
>>>> + return PTR_ERR(cpus);
>>>> +
>>>> + mtype = tph_st.flags & VFIO_TPH_ST_MEM_TYPE_PM ? TPH_MEM_TYPE_PM :
>>>> + TPH_MEM_TYPE_VM;
>>>
>>> Then why did we use two flag bits for this?
>>
>> VFIO_TPH_ST_MEM_TYPE_PM is just one bit, could you explain the two flag bits?
>
> My bad, I overlooked that they're both shifted by 0. Typically we
> wouldn't define both the 0 and 1 value, only the latter.
>
>>>> + if (!is_set) {
>>>> + for (i = 0; i < tph_st.count; i++) {
>>>> + ret = pcie_tph_get_cpu_st(pdev, mtype, cpus[i], &st);
>>>
>>> My understanding was that GET would return the translated values from a
>>> previous SET, not translate on the fly. I think in order to support
>>> writing STs before enabling TPH, we're going to need to store them
>>> somewhere anyway so that we can push them to the correct location when
>>> TPH is enabled. That means there a buffer for SET/GET regardless of
>>> the mode that's enabled.
>>
>> OK, will try to fix
done in v12
>>
>>>
>>>> + if (ret)
>>>> + goto out;
>>>> + cpus[i] = st;
>>>> + }
>>>> + goto out;
>>>> + }
>>>> +
>>>> + for (i = 0; i < tph_st.count; i++) {
>>>> + if (cpus[i] == U32_MAX) {
>>>> + ret = pcie_tph_set_st_entry(pdev, tph_st.index + i, 0);
>>>
>>> Why do we need a magic value to set it to zero? Hardware doesn't have
>>> a "clear ST register value".
>>
>> So you mean for SET, usespace direct passing steering-tag not cpus in this patch?
>
> How does this work with Zhiping's series where there might be TPH values
> exposed through the dma-buf to support p2p DMA? Where would that level
> of TPH programming occur?
My patch set has two basic functions:
1. Obtain the ST corresponding to the specified CPU.
2. Set and read the ST table.
Zhiping's patchset is used to cache the P2P data to an Lx cache in the GPU, it
include two part:
1. Set the ST corresponding to a DMABUF of the GPU (the ST is obtained in another
way and is not in the kernel).
2. Set the ST to the ST table of the MLX NIC.
The two can be unified to some extent, as long as the following two points are met:
1. An interface is provided to obtain whether the req_tph_type is normal or extend.
In this way, the 8-bit ST or 16-bit ST of the GPU can be selected based on the
type.
2. An interface is provided to obtain the target ST index. In this way, the
corresponding ST index can be directly written based on my patchset.
The following uses the setting of the ST cache stash for P2P as an example:
1. The application obtains the ST corresponding to a DMABUF of the GPU in other ways.
The ST can be 8-bit or 16-bit or both.
2. The application calls the VFIO interface to obtain the req_tph_type supported by
the NIC device (whether only 8-bit or 16-bit is supported).
3. The application determines which ST to be selected based on 1 and 2.
4. The application calls the interface to obtain the target ST index. This interface
may be provided by the vendor driver.
5. The application calls the SET ST ENTRY VFIO interface to set the ST to the NIC.
>
>>>> + if (ret)
>>>> + goto out;
>>>> + continue;
>>>> + }
>>>> +
>>>> + ret = pcie_tph_get_cpu_st(pdev, mtype, cpus[i], &st);
>>>> + if (ret)
>>>> + goto out;
>>>> + ret = pcie_tph_set_st_entry(pdev, tph_st.index + i, st);
>>>> + if (ret)
>>>> + goto out;
>>>
>>> Clearly broken for DS mode.
>>
>> For DS mode:
>> 1\ if ST table in CONFIG or MSI-X, then it could done by this way
>> 2\ if ST table not in CONFIG or MSI-X, then it only need invoke GET, and set by device-specific way.
>
> The SET only after enable model is broken, DS mode needs to support
> SET, where GET returns the host translations. Thanks,
done in v12
BTW, there are two points I would like to retain:
1. Whether to allocate temporary cache in ioctl—I retained it in version v12
because the interface supports batch operations, and copying only a portion
each time would affect performance.
2. After the device is taken over and released by vfio, I still maintained the
call to pcie_disable_tph in v12 because this ensures that the software state
remains consistent with the hardware state.
Thanks
>
> Alex
^ permalink raw reply [flat|nested] 22+ messages in thread
end of thread, other threads:[~2026-05-26 6:29 UTC | newest]
Thread overview: 22+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-05-18 7:16 [PATCH v11 0/5] vfio/pci: Add PCIe TPH support Chengwen Feng
2026-05-18 7:16 ` [PATCH v11 1/5] PCI/TPH: Fix pcie_tph_get_st_table_loc() field extraction Chengwen Feng
2026-05-18 7:16 ` [PATCH v11 2/5] PCI/TPH: Export pcie_tph_get_st_modes() for external use Chengwen Feng
2026-05-18 7:57 ` sashiko-bot
2026-05-22 4:10 ` Alex Williamson
2026-05-18 7:16 ` [PATCH v11 3/5] PCI/TPH: Add pcie_tph_enabled_mode() helper Chengwen Feng
2026-05-18 8:14 ` sashiko-bot
2026-05-22 4:10 ` Alex Williamson
2026-05-22 9:18 ` fengchengwen
2026-05-22 14:00 ` Alex Williamson
2026-05-18 7:17 ` [PATCH v11 4/5] vfio/pci: Add PCIe TPH configuration space virtualization Chengwen Feng
2026-05-18 8:48 ` sashiko-bot
2026-05-22 4:10 ` Alex Williamson
2026-05-22 9:39 ` fengchengwen
2026-05-22 14:09 ` Alex Williamson
2026-05-18 7:17 ` [PATCH v11 5/5] vfio/pci: Add VFIO_DEVICE_FEATURE_TPH_ST for TPH ST entry management Chengwen Feng
2026-05-18 9:22 ` sashiko-bot
2026-05-22 4:10 ` Alex Williamson
2026-05-22 10:04 ` fengchengwen
2026-05-22 14:27 ` Alex Williamson
2026-05-26 6:29 ` fengchengwen
2026-05-22 1:31 ` [PATCH v11 0/5] vfio/pci: Add PCIe TPH support fengchengwen
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.