* [PATCH v7 0/6] vfio/pci: Add PCIe TPH support
@ 2026-05-07 13:09 Chengwen Feng
2026-05-07 13:09 ` [PATCH v7 1/6] PCI/TPH: Fix pcie_tph_get_st_table_loc() field extraction Chengwen Feng
` (5 more replies)
0 siblings, 6 replies; 12+ messages in thread
From: Chengwen Feng @ 2026-05-07 13:09 UTC (permalink / raw)
To: alex, jgg
Cc: wathsala.vithanage, helgaas, wei.huang2, wangzhou1, wangyushan12,
liuyonglong, kvm, linux-pci
This patchset enables userspace control over PCIe TPH steering tags,
motivated by the following considerations:
1. Why userspace needs the capability to control steering tags:
When PCIe devices are fully owned by userspace workloads such as DPDK
and SPDK, only userspace has full knowledge of core binding policies
and traffic distribution strategies. Without this series, userspace
cannot enable TPH or configure steering tags, leaving built-in PCIe
performance optimizations unused in high-throughput polling I/O
scenarios.
2. Why this interface must be implemented in VFIO:
VFIO is the standard, secure community solution for granting full
PCIe device ownership to userspace. Existing kernel TPH interfaces
are designed purely for in-kernel drivers. For user-owned devices,
VFIO provides the only isolated and correct path to expose per-device
TPH management.
TPH supports both interrupt-vector (IV) and device-specific (DS) modes.
DS mode introduces cross-VM isolation risks, such as untrusted guests
programming arbitrary steering tags that affect other domains, so a new
module parameter `enable_unsafe_tph_ds_mode` is added. It defaults to off
and blocks all unsafe DS-mode TPH operations when disabled.
To restrict abuse of SET_ST and prevent arbitrary steering tag programming
from userspace, the interface only accepts explicit CPU ID, memory type
and index inputs. The kernel resolves the corresponding steering tag
internally before programming, limiting userspace to controlled,
index-based configuration.
Based on earlier RFC work by Wathsala Vithanage
v7:
- Address Bjorn's comment on [1/6] commit.
- Don't report DS mode by default (enable_unsafe_tph_ds_mode=0)
- Fix Sashiko review comments:
1. Make the pcie_tph_get_st_table_loc() stub return 0
2. Fix wrong use of offsetofend in TPH ioctl argsz validation
3. Disable TPH when the device is taken over and when userspace closes it
4. Serialize all TPH operations under vdev->igate to prevent hardware
control and bitfield races.
5. Check that unused ioctl fields are zero.
v6:
- Address Alex's comment on [1/6] commit.
- Fix Sashiko review comments:
Add tph_cap validation for pcie_tph_get_st_modes/st_table_loc.
Add argsz validation for each op cmd.
Move disable tph from ioctl-reset to register.
Verify reserved field for get/set ST op.
Fix ABI mismatch due to pointer arithmetic of get/set ST op.
v5:
- Fix pcie_tph_get_st_table_loc() field extraction bug
- Add disable TPH in vfio_pci_ioctl_reset() to clean software state
v4:
- Address Jason's comment to restrict device-specific mode under module
param control.
- Rename module param to enable_unsafe_tph_ds_mode
v3:
- Add module param enable_unsafe_tph_ds to guard unsafe usage
of TPH device-specific mode with no ST table
v2:
- Export pcie_tph_get_st_modes()
- Add detailed comment for UAPI structures and operations
- Add batch entry limit VFIO_TPH_MAX_ENTRIES
- Improve robustness and error handling
Chengwen Feng (6):
PCI/TPH: Fix pcie_tph_get_st_table_loc() field extraction
PCI/TPH: Export pcie_tph_get_st_modes() for external use
vfio/pci: Add PCIe TPH interface with capability query
vfio/pci: Add PCIe TPH enable/disable support
vfio/pci: Add PCIe TPH GET_ST interface
vfio/pci: Add PCIe TPH SET_ST interface
drivers/pci/tph.c | 26 ++-
drivers/vfio/pci/vfio_pci.c | 13 +-
drivers/vfio/pci/vfio_pci_core.c | 264 ++++++++++++++++++++++++++++++-
include/linux/pci-tph.h | 7 +
include/linux/vfio_pci_core.h | 3 +-
include/uapi/linux/vfio.h | 131 +++++++++++++++
6 files changed, 433 insertions(+), 11 deletions(-)
--
2.17.1
^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH v7 1/6] PCI/TPH: Fix pcie_tph_get_st_table_loc() field extraction
2026-05-07 13:09 [PATCH v7 0/6] vfio/pci: Add PCIe TPH support Chengwen Feng
@ 2026-05-07 13:09 ` Chengwen Feng
2026-05-07 13:09 ` [PATCH v7 2/6] PCI/TPH: Export pcie_tph_get_st_modes() for external use Chengwen Feng
` (4 subsequent siblings)
5 siblings, 0 replies; 12+ messages in thread
From: Chengwen Feng @ 2026-05-07 13:09 UTC (permalink / raw)
To: alex, jgg
Cc: wathsala.vithanage, helgaas, wei.huang2, wangzhou1, wangyushan12,
liuyonglong, kvm, linux-pci
pcie_tph_get_st_table_loc() incorrectly uses FIELD_GET(), which shifts the
field value to bit 0. But the function is designed to return raw
PCI_TPH_LOC_* values as defined in the function comment.
This causes incorrect ST table location detection. Fix it by using bitwise
AND with PCI_TPH_CAP_LOC_MASK to return the unshifted field value matching
the function specification.
This doesn't make a difference to mlx5_st_create(), the lone external
caller, because it only checks for PCI_TPH_LOC_NONE (0), but will be needed
for callers that check for PCI_TPH_LOC_CAP or PCI_TPH_LOC_MSIX.
Fixes: d2e8a34876ce ("PCI/TPH: Add Steering Tag support")
Cc: stable@vger.kernel.org
Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
Reviewed-by: Alex Williamson <alex.williamson@nvidia.com>
Reviewed-by: Bjorn Helgaas <bhelgaas@google.com>
---
drivers/pci/tph.c | 7 +------
1 file changed, 1 insertion(+), 6 deletions(-)
diff --git a/drivers/pci/tph.c b/drivers/pci/tph.c
index 91145e8d9d95..877cf556242b 100644
--- a/drivers/pci/tph.c
+++ b/drivers/pci/tph.c
@@ -170,7 +170,7 @@ u32 pcie_tph_get_st_table_loc(struct pci_dev *pdev)
pci_read_config_dword(pdev, pdev->tph_cap + PCI_TPH_CAP, &reg);
- return FIELD_GET(PCI_TPH_CAP_LOC_MASK, reg);
+ return reg & PCI_TPH_CAP_LOC_MASK;
}
EXPORT_SYMBOL(pcie_tph_get_st_table_loc);
@@ -185,9 +185,6 @@ u16 pcie_tph_get_st_table_size(struct pci_dev *pdev)
/* Check ST table location first */
loc = pcie_tph_get_st_table_loc(pdev);
-
- /* Convert loc to match with PCI_TPH_LOC_* defined in pci_regs.h */
- loc = FIELD_PREP(PCI_TPH_CAP_LOC_MASK, loc);
if (loc != PCI_TPH_LOC_CAP)
return 0;
@@ -316,8 +313,6 @@ int pcie_tph_set_st_entry(struct pci_dev *pdev, unsigned int index, u16 tag)
set_ctrl_reg_req_en(pdev, PCI_TPH_REQ_DISABLE);
loc = pcie_tph_get_st_table_loc(pdev);
- /* Convert loc to match with PCI_TPH_LOC_* */
- loc = FIELD_PREP(PCI_TPH_CAP_LOC_MASK, loc);
switch (loc) {
case PCI_TPH_LOC_MSIX:
--
2.17.1
^ permalink raw reply related [flat|nested] 12+ messages in thread
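
The difference between the old FIELD_GET() return value and the masked value described in the commit message above can be illustrated with a small userspace sketch. The register constants are written from memory of include/uapi/linux/pci_regs.h and should be treated as assumptions; demo_field_get() is a hypothetical stand-in for the kernel macro, not the macro itself.

```c
#include <assert.h>
#include <stdint.h>

/* Values as recalled from pci_regs.h (assumption, not taken from the patch). */
#define PCI_TPH_CAP_LOC_MASK 0x00000600u /* ST table location, bits 10:9 */
#define PCI_TPH_LOC_NONE     0x00000000u
#define PCI_TPH_LOC_CAP      0x00000200u
#define PCI_TPH_LOC_MSIX     0x00000400u

/*
 * Hypothetical stand-in for FIELD_GET(): mask the register, then shift the
 * field down to bit 0. The mask must be non-zero.
 */
static uint32_t demo_field_get(uint32_t mask, uint32_t reg)
{
	uint32_t shift = 0;

	while (!((mask >> shift) & 1u))
		shift++;
	return (reg & mask) >> shift;
}

/* Before the fix: shifted value, not comparable to PCI_TPH_LOC_*. */
static uint32_t st_table_loc_old(uint32_t reg)
{
	return demo_field_get(PCI_TPH_CAP_LOC_MASK, reg);
}

/* After the fix: raw masked value, directly comparable to PCI_TPH_LOC_*. */
static uint32_t st_table_loc_new(uint32_t reg)
{
	return reg & PCI_TPH_CAP_LOC_MASK;
}
```

Only the PCI_TPH_LOC_NONE case (0) yields the same result from both variants, which is consistent with the note that mlx5_st_create() was unaffected.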
* [PATCH v7 2/6] PCI/TPH: Export pcie_tph_get_st_modes() for external use
2026-05-07 13:09 [PATCH v7 0/6] vfio/pci: Add PCIe TPH support Chengwen Feng
2026-05-07 13:09 ` [PATCH v7 1/6] PCI/TPH: Fix pcie_tph_get_st_table_loc() field extraction Chengwen Feng
@ 2026-05-07 13:09 ` Chengwen Feng
2026-05-07 22:19 ` sashiko-bot
2026-05-07 13:09 ` [PATCH v7 3/6] vfio/pci: Add PCIe TPH interface with capability query Chengwen Feng
` (3 subsequent siblings)
5 siblings, 1 reply; 12+ messages in thread
From: Chengwen Feng @ 2026-05-07 13:09 UTC (permalink / raw)
To: alex, jgg
Cc: wathsala.vithanage, helgaas, wei.huang2, wangzhou1, wangyushan12,
liuyonglong, kvm, linux-pci
Export the helper to retrieve supported PCIe TPH steering tag modes so
that drivers like VFIO can query and expose device capabilities to
userspace.
Add stub functions for pcie_tph_get_st_table_size() and
pcie_tph_get_st_table_loc() when !CONFIG_PCI_TPH.
Add tph_cap validation for pcie_tph_get_st_modes() and
pcie_tph_get_st_table_loc() to prevent invalid PCI configuration
space access when TPH is not supported.
Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
---
drivers/pci/tph.c | 19 +++++++++++++++++--
include/linux/pci-tph.h | 7 +++++++
2 files changed, 24 insertions(+), 2 deletions(-)
diff --git a/drivers/pci/tph.c b/drivers/pci/tph.c
index 877cf556242b..ba31b010f67a 100644
--- a/drivers/pci/tph.c
+++ b/drivers/pci/tph.c
@@ -145,15 +145,27 @@ static void set_ctrl_reg_req_en(struct pci_dev *pdev, u8 req_type)
pci_write_config_dword(pdev, pdev->tph_cap + PCI_TPH_CTRL, reg);
}
-static u8 get_st_modes(struct pci_dev *pdev)
+/**
+ * pcie_tph_get_st_modes - Get supported Steering Tag modes
+ * @pdev: PCI device to query
+ *
+ * Return:
+ * Bitmask of supported ST modes (PCI_TPH_CAP_ST_NS, PCI_TPH_CAP_ST_IV,
+ * PCI_TPH_CAP_ST_DS)
+ */
+u8 pcie_tph_get_st_modes(struct pci_dev *pdev)
{
u32 reg;
+ if (!pdev->tph_cap)
+ return 0;
+
pci_read_config_dword(pdev, pdev->tph_cap + PCI_TPH_CAP, &reg);
reg &= PCI_TPH_CAP_ST_NS | PCI_TPH_CAP_ST_IV | PCI_TPH_CAP_ST_DS;
return reg;
}
+EXPORT_SYMBOL(pcie_tph_get_st_modes);
/**
* pcie_tph_get_st_table_loc - Return the device's ST table location
@@ -168,6 +180,9 @@ u32 pcie_tph_get_st_table_loc(struct pci_dev *pdev)
{
u32 reg;
+ if (!pdev->tph_cap)
+ return PCI_TPH_LOC_NONE;
+
pci_read_config_dword(pdev, pdev->tph_cap + PCI_TPH_CAP, &reg);
return reg & PCI_TPH_CAP_LOC_MASK;
@@ -395,7 +410,7 @@ int pcie_enable_tph(struct pci_dev *pdev, int mode)
/* Sanitize and check ST mode compatibility */
mode &= PCI_TPH_CTRL_MODE_SEL_MASK;
- dev_modes = get_st_modes(pdev);
+ dev_modes = pcie_tph_get_st_modes(pdev);
if (!((1 << mode) & dev_modes))
return -EINVAL;
diff --git a/include/linux/pci-tph.h b/include/linux/pci-tph.h
index be68cd17f2f8..5772d48ea444 100644
--- a/include/linux/pci-tph.h
+++ b/include/linux/pci-tph.h
@@ -30,6 +30,7 @@ void pcie_disable_tph(struct pci_dev *pdev);
int pcie_enable_tph(struct pci_dev *pdev, int mode);
u16 pcie_tph_get_st_table_size(struct pci_dev *pdev);
u32 pcie_tph_get_st_table_loc(struct pci_dev *pdev);
+u8 pcie_tph_get_st_modes(struct pci_dev *pdev);
#else
static inline int pcie_tph_set_st_entry(struct pci_dev *pdev,
unsigned int index, u16 tag)
@@ -41,6 +42,12 @@ static inline int pcie_tph_get_cpu_st(struct pci_dev *dev,
static inline void pcie_disable_tph(struct pci_dev *pdev) { }
static inline int pcie_enable_tph(struct pci_dev *pdev, int mode)
{ return -EINVAL; }
+static inline u16 pcie_tph_get_st_table_size(struct pci_dev *pdev)
+{ return 0; }
+static inline u32 pcie_tph_get_st_table_loc(struct pci_dev *pdev)
+{ return 0; }
+static inline u8 pcie_tph_get_st_modes(struct pci_dev *pdev)
+{ return 0; }
#endif
#endif /* LINUX_PCI_TPH_H */
--
2.17.1
^ permalink raw reply related [flat|nested] 12+ messages in thread
* [PATCH v7 3/6] vfio/pci: Add PCIe TPH interface with capability query
2026-05-07 13:09 [PATCH v7 0/6] vfio/pci: Add PCIe TPH support Chengwen Feng
2026-05-07 13:09 ` [PATCH v7 1/6] PCI/TPH: Fix pcie_tph_get_st_table_loc() field extraction Chengwen Feng
2026-05-07 13:09 ` [PATCH v7 2/6] PCI/TPH: Export pcie_tph_get_st_modes() for external use Chengwen Feng
@ 2026-05-07 13:09 ` Chengwen Feng
2026-05-07 23:20 ` sashiko-bot
2026-05-07 13:09 ` [PATCH v7 4/6] vfio/pci: Add PCIe TPH enable/disable support Chengwen Feng
` (2 subsequent siblings)
5 siblings, 1 reply; 12+ messages in thread
From: Chengwen Feng @ 2026-05-07 13:09 UTC (permalink / raw)
To: alex, jgg
Cc: wathsala.vithanage, helgaas, wei.huang2, wangzhou1, wangyushan12,
liuyonglong, kvm, linux-pci
Add VFIO_DEVICE_PCI_TPH IOCTL to allow userspace to query device TPH
capabilities, supported modes, and steering tag table information.
Add module parameter 'enable_unsafe_tph_ds_mode' to restrict unsafe
device-specific TPH mode to trusted userspace only.
Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
---
drivers/vfio/pci/vfio_pci.c | 13 ++-
drivers/vfio/pci/vfio_pci_core.c | 57 +++++++++++++-
include/linux/vfio_pci_core.h | 3 +-
include/uapi/linux/vfio.h | 131 +++++++++++++++++++++++++++++++
4 files changed, 201 insertions(+), 3 deletions(-)
diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
index 0c771064c0b8..40bf5aa9fd0b 100644
--- a/drivers/vfio/pci/vfio_pci.c
+++ b/drivers/vfio/pci/vfio_pci.c
@@ -60,6 +60,12 @@ static bool disable_denylist;
module_param(disable_denylist, bool, 0444);
MODULE_PARM_DESC(disable_denylist, "Disable use of device denylist. Disabling the denylist allows binding to devices with known errata that may lead to exploitable stability or security issues when accessed by untrusted users.");
+#ifdef CONFIG_PCIE_TPH
+static bool enable_unsafe_tph_ds_mode;
+module_param(enable_unsafe_tph_ds_mode, bool, 0444);
+MODULE_PARM_DESC(enable_unsafe_tph_ds_mode, "Enable UNSAFE TPH device-specific (DS) mode. This mode provides weak isolation, cannot be safely used for virtual machines. If you do not know what this is for, step away. (default: false)");
+#endif
+
static bool vfio_pci_dev_in_denylist(struct pci_dev *pdev)
{
switch (pdev->vendor) {
@@ -257,12 +263,17 @@ static int __init vfio_pci_init(void)
{
int ret;
bool is_disable_vga = true;
+ bool is_enable_unsafe_tph_ds_mode = false;
#ifdef CONFIG_VFIO_PCI_VGA
is_disable_vga = disable_vga;
#endif
+#ifdef CONFIG_PCIE_TPH
+ is_enable_unsafe_tph_ds_mode = enable_unsafe_tph_ds_mode;
+#endif
- vfio_pci_core_set_params(nointxmask, is_disable_vga, disable_idle_d3);
+ vfio_pci_core_set_params(nointxmask, is_disable_vga, disable_idle_d3,
+ is_enable_unsafe_tph_ds_mode);
/* Register and scan for devices */
ret = pci_register_driver(&vfio_pci_driver);
diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
index 3f8d093aacf8..e7efa8f230be 100644
--- a/drivers/vfio/pci/vfio_pci_core.c
+++ b/drivers/vfio/pci/vfio_pci_core.c
@@ -29,6 +29,7 @@
#include <linux/sched/mm.h>
#include <linux/iommufd.h>
#include <linux/pci-p2pdma.h>
+#include <linux/pci-tph.h>
#if IS_ENABLED(CONFIG_EEH)
#include <asm/eeh.h>
#endif
@@ -41,6 +42,7 @@
static bool nointxmask;
static bool disable_vga;
static bool disable_idle_d3;
+static bool enable_unsafe_tph_ds_mode;
static void vfio_pci_eventfd_rcu_free(struct rcu_head *rcu)
{
@@ -1461,6 +1463,55 @@ static int vfio_pci_ioctl_ioeventfd(struct vfio_pci_core_device *vdev,
ioeventfd.fd);
}
+static int vfio_pci_tph_get_cap(struct vfio_pci_core_device *vdev,
+ struct vfio_device_pci_tph_op *op,
+ void __user *uarg)
+{
+ struct pci_dev *pdev = vdev->pdev;
+ struct vfio_pci_tph_cap cap = {0};
+ u8 mode;
+
+ if (op->argsz < offsetofend(struct vfio_device_pci_tph_op, cap))
+ return -EINVAL;
+
+ mode = pcie_tph_get_st_modes(pdev);
+ if (!enable_unsafe_tph_ds_mode)
+ mode &= ~PCI_TPH_CAP_ST_DS;
+ if (mode == 0 || mode == PCI_TPH_CAP_ST_NS)
+ return -EOPNOTSUPP;
+
+ if (mode & PCI_TPH_CAP_ST_IV)
+ cap.supported_modes |= VFIO_PCI_TPH_MODE_IV;
+ if (mode & PCI_TPH_CAP_ST_DS)
+ cap.supported_modes |= VFIO_PCI_TPH_MODE_DS;
+
+ if (pcie_tph_get_st_table_loc(pdev) != PCI_TPH_LOC_NONE)
+ cap.st_table_sz = pcie_tph_get_st_table_size(pdev);
+
+ if (copy_to_user(uarg, &cap, sizeof(cap)))
+ return -EFAULT;
+
+ return 0;
+}
+
+static int vfio_pci_ioctl_tph(struct vfio_pci_core_device *vdev,
+ void __user *uarg)
+{
+ struct vfio_device_pci_tph_op op = {0};
+ size_t minsz = sizeof(op.argsz) + sizeof(op.op);
+
+ if (copy_from_user(&op, uarg, minsz))
+ return -EFAULT;
+
+ switch (op.op) {
+ case VFIO_PCI_TPH_GET_CAP:
+ return vfio_pci_tph_get_cap(vdev, &op, uarg + minsz);
+ default:
+ /* Other ops are not implemented yet */
+ return -EINVAL;
+ }
+}
+
long vfio_pci_core_ioctl(struct vfio_device *core_vdev, unsigned int cmd,
unsigned long arg)
{
@@ -1483,6 +1534,8 @@ long vfio_pci_core_ioctl(struct vfio_device *core_vdev, unsigned int cmd,
return vfio_pci_ioctl_reset(vdev, uarg);
case VFIO_DEVICE_SET_IRQS:
return vfio_pci_ioctl_set_irqs(vdev, uarg);
+ case VFIO_DEVICE_PCI_TPH:
+ return vfio_pci_ioctl_tph(vdev, uarg);
default:
return -ENOTTY;
}
@@ -2570,11 +2623,13 @@ static void vfio_pci_dev_set_try_reset(struct vfio_device_set *dev_set)
}
void vfio_pci_core_set_params(bool is_nointxmask, bool is_disable_vga,
- bool is_disable_idle_d3)
+ bool is_disable_idle_d3,
+ bool is_enable_unsafe_tph_ds_mode)
{
nointxmask = is_nointxmask;
disable_vga = is_disable_vga;
disable_idle_d3 = is_disable_idle_d3;
+ enable_unsafe_tph_ds_mode = is_enable_unsafe_tph_ds_mode;
}
EXPORT_SYMBOL_GPL(vfio_pci_core_set_params);
diff --git a/include/linux/vfio_pci_core.h b/include/linux/vfio_pci_core.h
index 2ebba746c18f..5af2a2e04ca7 100644
--- a/include/linux/vfio_pci_core.h
+++ b/include/linux/vfio_pci_core.h
@@ -157,7 +157,8 @@ int vfio_pci_core_register_dev_region(struct vfio_pci_core_device *vdev,
const struct vfio_pci_regops *ops,
size_t size, u32 flags, void *data);
void vfio_pci_core_set_params(bool nointxmask, bool is_disable_vga,
- bool is_disable_idle_d3);
+ bool is_disable_idle_d3,
+ bool is_enable_unsafe_tph_ds_mode);
void vfio_pci_core_close_device(struct vfio_device *core_vdev);
int vfio_pci_core_init_dev(struct vfio_device *core_vdev);
void vfio_pci_core_release_dev(struct vfio_device *core_vdev);
diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index 5de618a3a5ee..f899521e52c6 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -1321,6 +1321,137 @@ struct vfio_precopy_info {
#define VFIO_MIG_GET_PRECOPY_INFO _IO(VFIO_TYPE, VFIO_BASE + 21)
+/**
+ * struct vfio_pci_tph_cap - PCIe TPH capability information
+ * @supported_modes: Supported TPH operating modes
+ * @st_table_sz: Number of entries in ST table; 0 means no ST table
+ * @reserved: Must be zero
+ *
+ * Used with VFIO_PCI_TPH_GET_CAP operation to return device
+ * TLP Processing Hints (TPH) capabilities to userspace.
+ */
+struct vfio_pci_tph_cap {
+ __u8 supported_modes;
+#define VFIO_PCI_TPH_MODE_IV (1u << 0) /* Interrupt vector */
+#define VFIO_PCI_TPH_MODE_DS (1u << 1) /* Device specific */
+ __u8 reserved0;
+ __u16 st_table_sz;
+ __u32 reserved;
+};
+
+/**
+ * struct vfio_pci_tph_ctrl - TPH enable control structure
+ * @mode: Selected TPH operating mode (VFIO_PCI_TPH_MODE_*)
+ * @reserved: Must be zero
+ *
+ * Used with VFIO_PCI_TPH_ENABLE operation to specify the
+ * operating mode when enabling TPH on the device.
+ */
+struct vfio_pci_tph_ctrl {
+ __u8 mode;
+ __u8 reserved[7];
+};
+
+/**
+ * struct vfio_pci_tph_entry - Single TPH steering tag entry
+ * @cpu: CPU identifier for steering tag calculation
+ * @mem_type: Memory type (VFIO_PCI_TPH_MEM_TYPE_*)
+ * @reserved0: Must be zero
+ * @index: ST table index for programming
+ * @st: Unused for SET_ST
+ * @reserved1: Must be zero
+ *
+ * For VFIO_PCI_TPH_GET_ST:
+ * Userspace sets @cpu and @mem_type; kernel returns @st.
+ *
+ * For VFIO_PCI_TPH_SET_ST:
+ * Userspace sets @index, @cpu, and @mem_type.
+ * Kernel internally computes the steering tag and programs
+ * it into the specified @index.
+ *
+ * If @cpu == U32_MAX, kernel clears the steering tag at
+ * the specified @index.
+ */
+struct vfio_pci_tph_entry {
+ __u32 cpu;
+ __u8 mem_type;
+#define VFIO_PCI_TPH_MEM_TYPE_VM 0
+#define VFIO_PCI_TPH_MEM_TYPE_PM 1
+ __u8 reserved0;
+ __u16 index;
+ __u16 st;
+ __u16 reserved1;
+};
+
+/**
+ * struct vfio_pci_tph_st - Batch steering tag request
+ * @count: Number of entries in the array
+ * @reserved: Must be zero
+ * @ents: Flexible array of steering tag entries
+ *
+ * Container structure for batch get/set operations.
+ * Used with both VFIO_PCI_TPH_GET_ST and VFIO_PCI_TPH_SET_ST.
+ */
+struct vfio_pci_tph_st {
+ __u32 count;
+ __u32 reserved;
+ struct vfio_pci_tph_entry ents[];
+#define VFIO_PCI_TPH_MAX_ENTRIES 2048
+};
+
+/**
+ * struct vfio_device_pci_tph_op - Argument for VFIO_DEVICE_PCI_TPH
+ * @argsz: User allocated size of this structure
+ * @op: TPH operation (VFIO_PCI_TPH_*)
+ * @cap: Capability data for GET_CAP
+ * @ctrl: Control data for ENABLE
+ * @st: Batch entry data for GET_ST/SET_ST
+ *
+ * @argsz must be set by the user to the size of the structure
+ * being executed. Kernel validates input and returns data
+ * only within the specified size.
+ *
+ * Operations:
+ * - VFIO_PCI_TPH_GET_CAP: Query device TPH capabilities.
+ * - VFIO_PCI_TPH_ENABLE: Enable TPH using mode from &ctrl.
+ * - VFIO_PCI_TPH_DISABLE: Disable TPH on the device.
+ * - VFIO_PCI_TPH_GET_ST: Retrieve CPU's steering tags.
+ * Valid only for Device-Specific mode and
+ * no ST table is present.
+ * - VFIO_PCI_TPH_SET_ST: Program steering tags into device table.
+ * If any entry fails, previously programmed entries
+ * are rolled back to 0 before returning error.
+ */
+struct vfio_device_pci_tph_op {
+ __u32 argsz;
+ __u32 op;
+#define VFIO_PCI_TPH_GET_CAP 0
+#define VFIO_PCI_TPH_ENABLE 1
+#define VFIO_PCI_TPH_DISABLE 2
+#define VFIO_PCI_TPH_GET_ST 3
+#define VFIO_PCI_TPH_SET_ST 4
+ union {
+ struct vfio_pci_tph_cap cap;
+ struct vfio_pci_tph_ctrl ctrl;
+ struct vfio_pci_tph_st st;
+ };
+};
+
+/**
+ * VFIO_DEVICE_PCI_TPH - _IO(VFIO_TYPE, VFIO_BASE + 22)
+ *
+ * IOCTL for managing PCIe TLP Processing Hints (TPH) on
+ * a VFIO-assigned PCI device. Provides operations to query
+ * device capabilities, enable/disable TPH, retrieve CPU's
+ * steering tags, and program steering tag tables.
+ *
+ * Return: 0 on success, negative errno on failure.
+ * -EOPNOTSUPP: Operation not supported
+ * -ENODEV: Device or required functionality not present
+ * -EINVAL: Invalid argument or TPH not supported
+ */
+#define VFIO_DEVICE_PCI_TPH _IO(VFIO_TYPE, VFIO_BASE + 22)
+
/*
* Upon VFIO_DEVICE_FEATURE_SET, allow the device to be moved into a low power
* state with the platform-based power management. Device use of lower power
--
2.17.1
^ permalink raw reply related [flat|nested] 12+ messages in thread
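
The mode filtering done by vfio_pci_tph_get_cap() in the patch above can be modelled in userspace roughly as follows. This is a sketch only: the PCI_TPH_CAP_ST_* values are recalled from pci_regs.h and are assumptions, the VFIO_PCI_TPH_MODE_* flags are copied from the proposed UAPI, and demo_supported_modes() is a hypothetical helper that returns either a negative errno or the mode bitmask that would be reported to userspace.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Capability bits as recalled from pci_regs.h (assumption). */
#define PCI_TPH_CAP_ST_NS 0x1u /* No ST mode */
#define PCI_TPH_CAP_ST_IV 0x2u /* Interrupt vector mode */
#define PCI_TPH_CAP_ST_DS 0x4u /* Device specific mode */

/* Mode flags copied from the proposed UAPI in this patch. */
#define VFIO_PCI_TPH_MODE_IV (1u << 0)
#define VFIO_PCI_TPH_MODE_DS (1u << 1)

/*
 * Hypothetical model of the GET_CAP filtering: hide DS unless the module
 * parameter allows it, and report "not supported" when nothing but the
 * no-ST mode remains. Returns a negative errno or the VFIO mode bitmask.
 */
static int demo_supported_modes(uint8_t dev_modes, bool allow_ds)
{
	uint8_t mode = dev_modes;
	int out = 0;

	if (!allow_ds)
		mode &= (uint8_t)~PCI_TPH_CAP_ST_DS;
	if (mode == 0 || mode == PCI_TPH_CAP_ST_NS)
		return -EOPNOTSUPP;

	if (mode & PCI_TPH_CAP_ST_IV)
		out |= VFIO_PCI_TPH_MODE_IV;
	if (mode & PCI_TPH_CAP_ST_DS)
		out |= VFIO_PCI_TPH_MODE_DS;
	return out;
}
```

Under this model, a device that supports only NS and DS modes reports -EOPNOTSUPP while enable_unsafe_tph_ds_mode is off, because masking DS leaves only the no-ST mode.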
* [PATCH v7 4/6] vfio/pci: Add PCIe TPH enable/disable support
2026-05-07 13:09 [PATCH v7 0/6] vfio/pci: Add PCIe TPH support Chengwen Feng
` (2 preceding siblings ...)
2026-05-07 13:09 ` [PATCH v7 3/6] vfio/pci: Add PCIe TPH interface with capability query Chengwen Feng
@ 2026-05-07 13:09 ` Chengwen Feng
2026-05-07 23:49 ` sashiko-bot
2026-05-07 13:09 ` [PATCH v7 5/6] vfio/pci: Add PCIe TPH GET_ST interface Chengwen Feng
2026-05-07 13:09 ` [PATCH v7 6/6] vfio/pci: Add PCIe TPH SET_ST interface Chengwen Feng
5 siblings, 1 reply; 12+ messages in thread
From: Chengwen Feng @ 2026-05-07 13:09 UTC (permalink / raw)
To: alex, jgg
Cc: wathsala.vithanage, helgaas, wei.huang2, wangzhou1, wangyushan12,
liuyonglong, kvm, linux-pci
Add support to enable and disable TPH function with mode selection.
Allow the unsafe device-specific TPH mode only when the module parameter
enable_unsafe_tph_ds_mode=1 is set.
Disable TPH when:
1) Taking over ownership of the device (before user visibility),
2) Userspace closes the device FD to clean up state.
Serialize all TPH operations under vdev->igate mutex using scope-based
automatic locking to prevent hardware control and bitfield races.
Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
---
drivers/vfio/pci/vfio_pci_core.c | 48 ++++++++++++++++++++++++++++++++
1 file changed, 48 insertions(+)
diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
index e7efa8f230be..7a5dc2bfe2e9 100644
--- a/drivers/vfio/pci/vfio_pci_core.c
+++ b/drivers/vfio/pci/vfio_pci_core.c
@@ -738,6 +738,9 @@ void vfio_pci_core_close_device(struct vfio_device *core_vdev)
#endif
vfio_pci_dma_buf_cleanup(vdev);
+ /* Disable TPH when userspace closes the device FD */
+ pcie_disable_tph(vdev->pdev);
+
vfio_pci_core_disable(vdev);
mutex_lock(&vdev->igate);
@@ -1494,18 +1497,60 @@ static int vfio_pci_tph_get_cap(struct vfio_pci_core_device *vdev,
return 0;
}
+static int vfio_pci_tph_enable(struct vfio_pci_core_device *vdev,
+ struct vfio_device_pci_tph_op *op,
+ void __user *uarg)
+{
+ struct pci_dev *pdev = vdev->pdev;
+ struct vfio_pci_tph_ctrl ctrl;
+ int mode;
+
+ if (op->argsz < offsetofend(struct vfio_device_pci_tph_op, ctrl))
+ return -EINVAL;
+
+ if (copy_from_user(&ctrl, uarg, sizeof(ctrl)))
+ return -EFAULT;
+
+ if (ctrl.mode != VFIO_PCI_TPH_MODE_IV &&
+ ctrl.mode != VFIO_PCI_TPH_MODE_DS)
+ return -EINVAL;
+
+ if (ctrl.mode == VFIO_PCI_TPH_MODE_DS && !enable_unsafe_tph_ds_mode)
+ return -EOPNOTSUPP;
+
+ /* Reserved must be zero */
+ if (memchr_inv(ctrl.reserved, 0, sizeof(ctrl.reserved)))
+ return -EINVAL;
+
+ mode = (ctrl.mode == VFIO_PCI_TPH_MODE_IV) ? PCI_TPH_ST_IV_MODE :
+ PCI_TPH_ST_DS_MODE;
+ return pcie_enable_tph(pdev, mode);
+}
+
+static int vfio_pci_tph_disable(struct vfio_pci_core_device *vdev)
+{
+ pcie_disable_tph(vdev->pdev);
+ return 0;
+}
+
static int vfio_pci_ioctl_tph(struct vfio_pci_core_device *vdev,
void __user *uarg)
{
struct vfio_device_pci_tph_op op = {0};
size_t minsz = sizeof(op.argsz) + sizeof(op.op);
+ guard(mutex)(&vdev->igate);
+
if (copy_from_user(&op, uarg, minsz))
return -EFAULT;
switch (op.op) {
case VFIO_PCI_TPH_GET_CAP:
return vfio_pci_tph_get_cap(vdev, &op, uarg + minsz);
+ case VFIO_PCI_TPH_ENABLE:
+ return vfio_pci_tph_enable(vdev, &op, uarg + minsz);
+ case VFIO_PCI_TPH_DISABLE:
+ return vfio_pci_tph_disable(vdev);
default:
/* Other ops are not implemented yet */
return -EINVAL;
@@ -2258,6 +2303,9 @@ int vfio_pci_core_register_device(struct vfio_pci_core_device *vdev)
if (!disable_idle_d3)
pm_runtime_put(dev);
+ /* Disable TPH when taking over ownership of the device */
+ pcie_disable_tph(pdev);
+
ret = vfio_register_group_dev(&vdev->vdev);
if (ret)
goto out_power;
--
2.17.1
^ permalink raw reply related [flat|nested] 12+ messages in thread
* [PATCH v7 5/6] vfio/pci: Add PCIe TPH GET_ST interface
2026-05-07 13:09 [PATCH v7 0/6] vfio/pci: Add PCIe TPH support Chengwen Feng
` (3 preceding siblings ...)
2026-05-07 13:09 ` [PATCH v7 4/6] vfio/pci: Add PCIe TPH enable/disable support Chengwen Feng
@ 2026-05-07 13:09 ` Chengwen Feng
2026-05-08 0:18 ` sashiko-bot
2026-05-07 13:09 ` [PATCH v7 6/6] vfio/pci: Add PCIe TPH SET_ST interface Chengwen Feng
5 siblings, 1 reply; 12+ messages in thread
From: Chengwen Feng @ 2026-05-07 13:09 UTC (permalink / raw)
To: alex, jgg
Cc: wathsala.vithanage, helgaas, wei.huang2, wangzhou1, wangyushan12,
liuyonglong, kvm, linux-pci
Add support for batch retrieval of CPU steering tags in device-specific TPH
mode for devices that do not implement an ST table. This interface requires
enabling the 'enable_unsafe_tph_ds_mode' module parameter.
Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
---
drivers/vfio/pci/vfio_pci_core.c | 73 ++++++++++++++++++++++++++++++++
1 file changed, 73 insertions(+)
diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
index 7a5dc2bfe2e9..c328515bcaaf 100644
--- a/drivers/vfio/pci/vfio_pci_core.c
+++ b/drivers/vfio/pci/vfio_pci_core.c
@@ -1533,6 +1533,77 @@ static int vfio_pci_tph_disable(struct vfio_pci_core_device *vdev)
return 0;
}
+static int vfio_pci_tph_get_st(struct vfio_pci_core_device *vdev,
+ struct vfio_device_pci_tph_op *op,
+ void __user *uarg)
+{
+ struct pci_dev *pdev = vdev->pdev;
+ struct vfio_pci_tph_entry *ents;
+ struct vfio_pci_tph_st st;
+ enum tph_mem_type mtype;
+ size_t size, ents_off;
+ int i, err;
+
+ if (!enable_unsafe_tph_ds_mode ||
+ pcie_tph_get_st_table_loc(pdev) != PCI_TPH_LOC_NONE)
+ return -EOPNOTSUPP;
+
+ if (copy_from_user(&st, uarg, sizeof(st)))
+ return -EFAULT;
+
+ /* Check reserved fields are zero */
+ if (memchr_inv(&st.reserved, 0, sizeof(st.reserved)))
+ return -EINVAL;
+
+ if (!st.count || st.count > VFIO_PCI_TPH_MAX_ENTRIES)
+ return -EINVAL;
+
+ size = st.count * sizeof(*ents);
+ if (op->argsz < offsetofend(struct vfio_device_pci_tph_op, st) + size)
+ return -EINVAL;
+
+ ents = kvmalloc(size, GFP_KERNEL);
+ if (!ents)
+ return -ENOMEM;
+
+ ents_off = offsetof(struct vfio_pci_tph_st, ents);
+ if (copy_from_user(ents, uarg + ents_off, size)) {
+ err = -EFAULT;
+ goto out;
+ }
+
+ for (i = 0; i < st.count; i++) {
+ /* Check reserved fields and index are zero */
+ if (memchr_inv(&ents[i].reserved0, 0, sizeof(ents[i].reserved0)) ||
+ memchr_inv(&ents[i].reserved1, 0, sizeof(ents[i].reserved1)) ||
+ ents[i].index != 0) {
+ err = -EINVAL;
+ goto out;
+ }
+
+ if (ents[i].mem_type == VFIO_PCI_TPH_MEM_TYPE_VM) {
+ mtype = TPH_MEM_TYPE_VM;
+ } else if (ents[i].mem_type == VFIO_PCI_TPH_MEM_TYPE_PM) {
+ mtype = TPH_MEM_TYPE_PM;
+ } else {
+ err = -EINVAL;
+ goto out;
+ }
+
+ err = pcie_tph_get_cpu_st(pdev, mtype, ents[i].cpu,
+ &ents[i].st);
+ if (err)
+ goto out;
+ }
+
+ if (copy_to_user(uarg + ents_off, ents, size))
+ err = -EFAULT;
+
+out:
+ kvfree(ents);
+ return err;
+}
+
static int vfio_pci_ioctl_tph(struct vfio_pci_core_device *vdev,
void __user *uarg)
{
@@ -1551,6 +1622,8 @@ static int vfio_pci_ioctl_tph(struct vfio_pci_core_device *vdev,
return vfio_pci_tph_enable(vdev, &op, uarg + minsz);
case VFIO_PCI_TPH_DISABLE:
return vfio_pci_tph_disable(vdev);
+ case VFIO_PCI_TPH_GET_ST:
+ return vfio_pci_tph_get_st(vdev, &op, uarg + minsz);
default:
/* Other ops are not implemented yet */
return -EINVAL;
--
2.17.1
^ permalink raw reply related [flat|nested] 12+ messages in thread
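
For the batch operations, the argsz check in the patch above compares against offsetofend(struct vfio_device_pci_tph_op, st) plus the size of the entry array. A userspace caller could compute the minimum argsz along the following lines. The structs here are local mirrors of the proposed UAPI layouts with a hypothetical demo_ prefix, split so that no flexible array member is nested; the sizes assume a common ABI with natural alignment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local mirror of struct vfio_pci_tph_entry from the proposed UAPI. */
struct demo_tph_entry {
	uint32_t cpu;
	uint8_t  mem_type;
	uint8_t  reserved0;
	uint16_t index;
	uint16_t st;
	uint16_t reserved1;
};

/* struct vfio_pci_tph_st without its flexible array member. */
struct demo_tph_st_hdr {
	uint32_t count;
	uint32_t reserved;
};

/* argsz + op, the part the ioctl handler copies in first. */
struct demo_tph_op_hdr {
	uint32_t argsz;
	uint32_t op;
};

/* Mirrors VFIO_PCI_TPH_MAX_ENTRIES from the proposed UAPI. */
#define DEMO_TPH_MAX_ENTRIES 2048u

/* Smallest argsz accepted for a batch of count entries, or 0 if invalid. */
static size_t demo_tph_st_argsz(uint32_t count)
{
	if (count == 0 || count > DEMO_TPH_MAX_ENTRIES)
		return 0;
	return sizeof(struct demo_tph_op_hdr) + sizeof(struct demo_tph_st_hdr) +
	       (size_t)count * sizeof(struct demo_tph_entry);
}
```

A caller would allocate at least this many bytes, zero the buffer so that all reserved fields pass the kernel's checks, and store the same value in argsz.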
* [PATCH v7 6/6] vfio/pci: Add PCIe TPH SET_ST interface
2026-05-07 13:09 [PATCH v7 0/6] vfio/pci: Add PCIe TPH support Chengwen Feng
` (4 preceding siblings ...)
2026-05-07 13:09 ` [PATCH v7 5/6] vfio/pci: Add PCIe TPH GET_ST interface Chengwen Feng
@ 2026-05-07 13:09 ` Chengwen Feng
2026-05-08 0:52 ` sashiko-bot
5 siblings, 1 reply; 12+ messages in thread
From: Chengwen Feng @ 2026-05-07 13:09 UTC (permalink / raw)
To: alex, jgg
Cc: wathsala.vithanage, helgaas, wei.huang2, wangzhou1, wangyushan12,
liuyonglong, kvm, linux-pci
Add VFIO_PCI_TPH_SET_ST operation to support batch programming of steering
tag entries. If any entry fails, roll back successfully programmed entries
to 0 to prevent inconsistent device state.
Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
---
drivers/vfio/pci/vfio_pci_core.c | 86 ++++++++++++++++++++++++++++++++
1 file changed, 86 insertions(+)
diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
index c328515bcaaf..5d10de546d5c 100644
--- a/drivers/vfio/pci/vfio_pci_core.c
+++ b/drivers/vfio/pci/vfio_pci_core.c
@@ -1604,6 +1604,90 @@ static int vfio_pci_tph_get_st(struct vfio_pci_core_device *vdev,
return err;
}
+static int vfio_pci_tph_set_st(struct vfio_pci_core_device *vdev,
+ struct vfio_device_pci_tph_op *op,
+ void __user *uarg)
+{
+ struct pci_dev *pdev = vdev->pdev;
+ struct vfio_pci_tph_entry *ents;
+ struct vfio_pci_tph_st st;
+ enum tph_mem_type mtype;
+ size_t size, ents_off;
+ int i = 0, j, err;
+ u32 tab_loc;
+ u16 st_val;
+
+ tab_loc = pcie_tph_get_st_table_loc(pdev);
+ if (tab_loc != PCI_TPH_LOC_CAP && tab_loc != PCI_TPH_LOC_MSIX)
+ return -EOPNOTSUPP;
+
+ if (copy_from_user(&st, uarg, sizeof(st)))
+ return -EFAULT;
+
+ if (!st.count || st.count > VFIO_PCI_TPH_MAX_ENTRIES)
+ return -EINVAL;
+
+ /* Check reserved fields are zero */
+ if (memchr_inv(&st.reserved, 0, sizeof(st.reserved)))
+ return -EINVAL;
+
+ size = st.count * sizeof(*ents);
+ if (op->argsz < offsetofend(struct vfio_device_pci_tph_op, st) + size)
+ return -EINVAL;
+
+ ents = kvmalloc(size, GFP_KERNEL);
+ if (!ents)
+ return -ENOMEM;
+
+ ents_off = offsetof(struct vfio_pci_tph_st, ents);
+ if (copy_from_user(ents, uarg + ents_off, size)) {
+ err = -EFAULT;
+ goto out;
+ }
+
+ for (; i < st.count; i++) {
+ /* Check reserved fields and st are zero */
+ if (memchr_inv(&ents[i].reserved0, 0, sizeof(ents[i].reserved0)) ||
+ memchr_inv(&ents[i].reserved1, 0, sizeof(ents[i].reserved1)) ||
+ ents[i].st != 0) {
+ err = -EINVAL;
+ goto out;
+ }
+
+ if (ents[i].cpu == U32_MAX) {
+ err = pcie_tph_set_st_entry(pdev, ents[i].index, 0);
+ if (err)
+ goto out;
+ continue;
+ }
+
+ if (ents[i].mem_type == VFIO_PCI_TPH_MEM_TYPE_VM) {
+ mtype = TPH_MEM_TYPE_VM;
+ } else if (ents[i].mem_type == VFIO_PCI_TPH_MEM_TYPE_PM) {
+ mtype = TPH_MEM_TYPE_PM;
+ } else {
+ err = -EINVAL;
+ goto out;
+ }
+
+ err = pcie_tph_get_cpu_st(pdev, mtype, ents[i].cpu, &st_val);
+ if (err)
+ goto out;
+ err = pcie_tph_set_st_entry(pdev, ents[i].index, st_val);
+ if (err)
+ goto out;
+ }
+
+out:
+ if (err) {
+ /* Roll back previously programmed entries to 0 */
+ for (j = 0; j < i; j++)
+ pcie_tph_set_st_entry(pdev, ents[j].index, 0);
+ }
+ kvfree(ents);
+ return err;
+}
+
static int vfio_pci_ioctl_tph(struct vfio_pci_core_device *vdev,
void __user *uarg)
{
@@ -1624,6 +1708,8 @@ static int vfio_pci_ioctl_tph(struct vfio_pci_core_device *vdev,
return vfio_pci_tph_disable(vdev);
case VFIO_PCI_TPH_GET_ST:
return vfio_pci_tph_get_st(vdev, &op, uarg + minsz);
+ case VFIO_PCI_TPH_SET_ST:
+ return vfio_pci_tph_set_st(vdev, &op, uarg + minsz);
default:
/* Other ops are not implemented yet */
return -EINVAL;
--
2.17.1
* Re: [PATCH v7 2/6] PCI/TPH: Export pcie_tph_get_st_modes() for external use
2026-05-07 13:09 ` [PATCH v7 2/6] PCI/TPH: Export pcie_tph_get_st_modes() for external use Chengwen Feng
@ 2026-05-07 22:19 ` sashiko-bot
0 siblings, 0 replies; 12+ messages in thread
From: sashiko-bot @ 2026-05-07 22:19 UTC (permalink / raw)
To: Chengwen Feng; +Cc: linux-pci
Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:
- [High] The `pcie_tph_get_st_modes()` and `pcie_tph_get_st_table_loc()` functions ignore the return value of `pci_read_config_dword()`, leading to fabricated capabilities being exposed to userspace.
- [Low] The commit message incorrectly refers to the configuration option as `!CONFIG_PCI_TPH` instead of `!CONFIG_PCIE_TPH`.
--
commit 4a10965994892856e6b789d0c75e23d0e2e09be7
Author: Chengwen Feng <fengchengwen@huawei.com>
PCI/TPH: Export pcie_tph_get_st_modes() for external use
This commit exports the pcie_tph_get_st_modes() helper so drivers like VFIO
can query and expose device capabilities to userspace. It also adds stub
functions and tph_cap validation to prevent invalid configuration access.
> Add stub functions for pcie_tph_get_st_table_size() and
> pcie_tph_get_st_table_loc() when !CONFIG_PCI_TPH.
This isn't a bug, but there is a typo in the configuration option name.
The actual kernel configuration option for PCIe TLP Processing Hints is
CONFIG_PCIE_TPH rather than CONFIG_PCI_TPH.
> diff --git a/drivers/pci/tph.c b/drivers/pci/tph.c
> index 877cf556242b4..ba31b010f67ad 100644
> --- a/drivers/pci/tph.c
> +++ b/drivers/pci/tph.c
[ ... ]
> -static u8 get_st_modes(struct pci_dev *pdev)
> +/**
> + * pcie_tph_get_st_modes - Get supported Steering Tag modes
> + * @pdev: PCI device to query
> + *
> + * Return:
> + * Bitmask of supported ST modes (PCI_TPH_CAP_ST_NS, PCI_TPH_CAP_ST_IV,
> + * PCI_TPH_CAP_ST_DS)
> + */
> +u8 pcie_tph_get_st_modes(struct pci_dev *pdev)
> {
> u32 reg;
>
> + if (!pdev->tph_cap)
> + return 0;
> +
> pci_read_config_dword(pdev, pdev->tph_cap + PCI_TPH_CAP, &reg);
Should we check the return value of pci_read_config_dword() here?
If the device is disconnected or unresponsive, pci_read_config_dword()
returns an error and populates the variable with the standard PCI error
response 0xFFFFFFFF.
When the mask (PCI_TPH_CAP_ST_NS | PCI_TPH_CAP_ST_IV | PCI_TPH_CAP_ST_DS)
is applied to 0xFFFFFFFF, it results in 7. This falsely indicates that all
Steering Tag modes are supported.
Since this function is now exported for external subsystems like VFIO, does
this risk exposing fabricated hardware capabilities to userspace when a
device becomes unresponsive?
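To make the arithmetic concrete, here is a minimal user-space sketch, not
kernel code. The bit values are assumed to match the ST-mode bits of the TPH
Capability register, and model_read_cap() is a hypothetical stand-in for a
config read that fails with an all-ones response.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed to mirror the ST-mode bits of the TPH Capability register. */
#define TPH_CAP_ST_NS   0x1u
#define TPH_CAP_ST_IV   0x2u
#define TPH_CAP_ST_DS   0x4u
#define TPH_CAP_ST_MASK (TPH_CAP_ST_NS | TPH_CAP_ST_IV | TPH_CAP_ST_DS)

/*
 * Hypothetical stand-in for a config read. A nonzero return means the read
 * failed, in which case *val holds the all-ones error response.
 */
static int model_read_cap(int device_present, uint32_t hw_val, uint32_t *val)
{
	assert(val != NULL);
	if (!device_present) {
		*val = 0xFFFFFFFFu;
		return -1;
	}
	*val = hw_val;
	return 0;
}

/* Unchecked variant: the error response is masked as if it were real data. */
static uint8_t st_modes_unchecked(int device_present, uint32_t hw_val)
{
	uint32_t reg;

	(void)model_read_cap(device_present, hw_val, &reg);
	return (uint8_t)(reg & TPH_CAP_ST_MASK);
}

/* Checked variant: report "no modes" when the read fails. */
static uint8_t st_modes_checked(int device_present, uint32_t hw_val)
{
	uint32_t reg;

	if (model_read_cap(device_present, hw_val, &reg))
		return 0;
	return (uint8_t)(reg & TPH_CAP_ST_MASK);
}
```

With an absent device, st_modes_unchecked() returns 7 while
st_modes_checked() returns 0. Returning 0 on failure is only one possible
policy; propagating an error code would be another.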
> reg &= PCI_TPH_CAP_ST_NS | PCI_TPH_CAP_ST_IV | PCI_TPH_CAP_ST_DS;
>
> return reg;
> }
> +EXPORT_SYMBOL(pcie_tph_get_st_modes);
[ ... ]
> u32 pcie_tph_get_st_table_loc(struct pci_dev *pdev)
> {
> u32 reg;
>
> + if (!pdev->tph_cap)
> + return PCI_TPH_LOC_NONE;
> +
> pci_read_config_dword(pdev, pdev->tph_cap + PCI_TPH_CAP, &reg);
Similarly, does this need error checking for the config read?
Applying the PCI_TPH_CAP_LOC_MASK (3 << 9) to 0xFFFFFFFF results in 0x600,
which is an invalid, undefined location enum.
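A small user-space sketch of the same point for the location field. The mask
is the (3 << 9) value quoted above; the three location encodings are
assumptions in the masked, unshifted form that the quoted return statement
produces, and both helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define TPH_CAP_LOC_MASK (3u << 9)   /* as stated in the review above */

/* Assumed encodings of the masked (unshifted) location field. */
#define TPH_LOC_NONE 0x000u
#define TPH_LOC_CAP  0x200u
#define TPH_LOC_MSIX 0x400u

/* Returns 1 when the masked location is one of the defined encodings. */
static int tph_loc_is_valid(uint32_t reg)
{
	uint32_t loc = reg & TPH_CAP_LOC_MASK;

	return loc == TPH_LOC_NONE || loc == TPH_LOC_CAP ||
	       loc == TPH_LOC_MSIX;
}

/* Hypothetical policy: treat an undefined encoding as "no ST table". */
static uint32_t tph_loc_or_none(uint32_t reg)
{
	/* The mask must cover exactly bits 9 and 10. */
	assert(TPH_CAP_LOC_MASK == 0x600u);
	return tph_loc_is_valid(reg) ? (reg & TPH_CAP_LOC_MASK) : TPH_LOC_NONE;
}
```

An all-ones read masks to 0x600, which tph_loc_is_valid() rejects, so
tph_loc_or_none() falls back to TPH_LOC_NONE instead of handing an undefined
value to callers.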
>
> return reg & PCI_TPH_CAP_LOC_MASK;
> }
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260507130956.34441-1-fengchengwen@huawei.com?part=2
* Re: [PATCH v7 3/6] vfio/pci: Add PCIe TPH interface with capability query
2026-05-07 13:09 ` [PATCH v7 3/6] vfio/pci: Add PCIe TPH interface with capability query Chengwen Feng
@ 2026-05-07 23:20 ` sashiko-bot
0 siblings, 0 replies; 12+ messages in thread
From: sashiko-bot @ 2026-05-07 23:20 UTC (permalink / raw)
To: Chengwen Feng; +Cc: linux-pci
Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
- [Medium] The ST table size (`st_table_sz`) is incorrectly reported as 0 when the Steering Tag (ST) table is located in the MSI-X table.
--
commit 84cffb4ace52d740319be3c3a2f634b68268963c
Author: Chengwen Feng <fengchengwen@huawei.com>
vfio/pci: Add PCIe TPH interface with capability query
This commit adds the VFIO_DEVICE_PCI_TPH IOCTL to allow userspace to query
device TPH capabilities, supported modes, and steering tag table information.
It also introduces a module parameter to restrict unsafe device-specific TPH
mode to trusted userspace only.
> diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
> index 3f8d093aacf8a..e7efa8f230bec 100644
> --- a/drivers/vfio/pci/vfio_pci_core.c
> +++ b/drivers/vfio/pci/vfio_pci_core.c
[ ... ]
> @@ -1461,6 +1463,55 @@ static int vfio_pci_ioctl_ioeventfd(struct vfio_pci_core_device *vdev,
> ioeventfd.fd);
> }
>
> +static int vfio_pci_tph_get_cap(struct vfio_pci_core_device *vdev,
> + struct vfio_device_pci_tph_op *op,
> + void __user *uarg)
> +{
> + struct pci_dev *pdev = vdev->pdev;
> + struct vfio_pci_tph_cap cap = {0};
> + u8 mode;
> +
> + if (op->argsz < offsetofend(struct vfio_device_pci_tph_op, cap))
> + return -EINVAL;
> +
> + mode = pcie_tph_get_st_modes(pdev);
> + if (!enable_unsafe_tph_ds_mode)
> + mode &= ~PCI_TPH_CAP_ST_DS;
> + if (mode == 0 || mode == PCI_TPH_CAP_ST_NS)
> + return -EOPNOTSUPP;
> +
> + if (mode & PCI_TPH_CAP_ST_IV)
> + cap.supported_modes |= VFIO_PCI_TPH_MODE_IV;
> + if (mode & PCI_TPH_CAP_ST_DS)
> + cap.supported_modes |= VFIO_PCI_TPH_MODE_DS;
> +
> + if (pcie_tph_get_st_table_loc(pdev) != PCI_TPH_LOC_NONE)
> + cap.st_table_sz = pcie_tph_get_st_table_size(pdev);
Is it possible that st_table_sz is incorrectly reported as 0 here when the
Steering Tag (ST) table is located in the MSI-X table?
The kernel function pcie_tph_get_st_table_size() returns 0 if the ST table
location is anything other than PCI_TPH_LOC_CAP, meaning it returns 0 for
PCI_TPH_LOC_MSIX.
As a result, userspace would receive st_table_sz == 0, which the UAPI
documentation explicitly defines as 'no ST table'.
Furthermore, because the location is not PCI_TPH_LOC_NONE, any subsequent
userspace attempt to use VFIO_PCI_TPH_GET_ST would be rejected by the kernel
with -EOPNOTSUPP.
This would leave userspace completely unable to properly program the ST table
using SET_ST because it falsely believes the table size is 0 and cannot query
the CPU tags.
Should the MSI-X table size (e.g., via pci_msix_vec_count(pdev)) be used to
populate st_table_sz when the location is PCI_TPH_LOC_MSIX?
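One way to picture the suggestion is the user-space model below. It is only
a sketch: struct model_dev and its fields are hypothetical, and
msix_vec_count stands in for whatever pci_msix_vec_count(pdev) would return.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum model_loc { MODEL_LOC_NONE, MODEL_LOC_CAP, MODEL_LOC_MSIX };

/* Hypothetical model of the inputs that the size report depends on. */
struct model_dev {
	enum model_loc loc;
	uint16_t cap_table_entries; /* entries when the table is in the TPH cap */
	int msix_vec_count;         /* negative when MSI-X is unavailable */
};

/* Pick the reported ST table size according to where the table lives. */
static uint16_t model_st_table_sz(const struct model_dev *d)
{
	assert(d != NULL);
	switch (d->loc) {
	case MODEL_LOC_CAP:
		return d->cap_table_entries;
	case MODEL_LOC_MSIX:
		/* One ST entry per MSI-X vector; 0 if the count is an error. */
		return d->msix_vec_count > 0 ? (uint16_t)d->msix_vec_count : 0;
	default:
		return 0;
	}
}
```

In this model a device with its table in MSI-X and 64 vectors reports 64
rather than 0, so userspace can tell "table in MSI-X" apart from "no ST
table".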
> +
> + if (copy_to_user(uarg, &cap, sizeof(cap)))
> + return -EFAULT;
> +
> + return 0;
> +}
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260507130956.34441-1-fengchengwen@huawei.com?part=3
* Re: [PATCH v7 4/6] vfio/pci: Add PCIe TPH enable/disable support
2026-05-07 13:09 ` [PATCH v7 4/6] vfio/pci: Add PCIe TPH enable/disable support Chengwen Feng
@ 2026-05-07 23:49 ` sashiko-bot
0 siblings, 0 replies; 12+ messages in thread
From: sashiko-bot @ 2026-05-07 23:49 UTC (permalink / raw)
To: Chengwen Feng; +Cc: linux-pci
Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:
- [High] PCIe configuration accesses for TPH in device register/close paths occur while the device may be suspended (D3).
- [High] Missing Runtime PM wake-up in TPH ioctl handlers allows hardware access on suspended devices.
--
commit 7ffb59002e7aaac5444cc6452b34924776dcf634
Author: Chengwen Feng <fengchengwen@huawei.com>
vfio/pci: Add PCIe TPH enable/disable support
This commit adds support to enable and disable the PCIe TPH function
with mode selection, restricting unsafe modes. It also disables TPH when
ownership is taken or when userspace closes the device file descriptor.
> diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
> index e7efa8f230bec..7a5dc2bfe2e9d 100644
> --- a/drivers/vfio/pci/vfio_pci_core.c
> +++ b/drivers/vfio/pci/vfio_pci_core.c
> @@ -738,6 +738,9 @@ void vfio_pci_core_close_device(struct vfio_device *core_vdev)
> #endif
> vfio_pci_dma_buf_cleanup(vdev);
>
> + /* Disable TPH when userspace closes the device FD */
> + pcie_disable_tph(vdev->pdev);
> +
Is it safe to call pcie_disable_tph() here before vfio_pci_core_disable()
wakes the device up?
A device might be in a suspended state when its file descriptor is closed.
vfio_pci_core_disable() explicitly handles waking the device up via
pm_runtime_resume(&pdev->dev).
If the device is in D3, won't accessing the configuration space cause
dropped writes or PCIe bus faults?
> vfio_pci_core_disable(vdev);
>
> mutex_lock(&vdev->igate);
[ ... ]
> static int vfio_pci_ioctl_tph(struct vfio_pci_core_device *vdev,
> void __user *uarg)
> {
> struct vfio_device_pci_tph_op op = {0};
> size_t minsz = sizeof(op.argsz) + sizeof(op.op);
>
> + guard(mutex)(&vdev->igate);
> +
Could this ioctl be called while the device is suspended?
Neither vfio_pci_core_ioctl() nor vfio_pci_ioctl_tph() appears to call
pm_runtime_resume_and_get() before dispatching these ioctls.
Since userspace can place the device into a low-power state, accessing
the TPH configuration space registers via pcie_tph_get_st_modes(),
pcie_enable_tph(), or pcie_disable_tph() might read invalid data or
drop writes.
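The bracket being asked for can be sketched with a toy user-space model. All
names here are hypothetical; model_pm_resume_and_get() and model_pm_put()
only imitate the usage-count behaviour of the runtime PM helpers and do not
reproduce their real semantics.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical device model: powered == false stands for D3. */
struct model_dev {
	int pm_usage;
	bool powered;
	int cfg_accesses_while_suspended;
};

/* Take a PM reference and make sure the device is powered. */
static int model_pm_resume_and_get(struct model_dev *d)
{
	d->pm_usage++;
	d->powered = true;
	return 0;
}

/* Drop the PM reference; the device may suspend once the count hits 0. */
static void model_pm_put(struct model_dev *d)
{
	assert(d->pm_usage > 0);
	if (--d->pm_usage == 0)
		d->powered = false;
}

/* Any TPH config-space access; counts accesses made while suspended. */
static void model_cfg_access(struct model_dev *d)
{
	if (!d->powered)
		d->cfg_accesses_while_suspended++;
}

/* ioctl body with the hardware access bracketed by a PM reference. */
static int model_tph_ioctl(struct model_dev *d)
{
	int ret = model_pm_resume_and_get(d);

	if (ret)
		return ret;
	model_cfg_access(d);
	model_pm_put(d);
	return 0;
}
```

Calling model_cfg_access() directly on a suspended device bumps the counter,
while going through model_tph_ioctl() does not, and the usage count is back
to 0 afterwards.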
> if (copy_from_user(&op, uarg, minsz))
> return -EFAULT;
>
> switch (op.op) {
> case VFIO_PCI_TPH_GET_CAP:
> return vfio_pci_tph_get_cap(vdev, &op, uarg + minsz);
> + case VFIO_PCI_TPH_ENABLE:
> + return vfio_pci_tph_enable(vdev, &op, uarg + minsz);
> + case VFIO_PCI_TPH_DISABLE:
> + return vfio_pci_tph_disable(vdev);
> default:
> /* Other ops are not implemented yet */
> return -EINVAL;
[ ... ]
> @@ -2258,6 +2303,9 @@ int vfio_pci_core_register_device(struct vfio_pci_core_device *vdev)
> if (!disable_idle_d3)
> pm_runtime_put(dev);
>
> + /* Disable TPH when taking over ownership of the device */
> + pcie_disable_tph(pdev);
> +
Does this call to pcie_disable_tph() happen after the device may have
already entered a low-power state?
pm_runtime_put() can asynchronously drop the PM usage counter and allow
the device to enter D3.
Shouldn't the TPH disablement occur before dropping the PM reference to
avoid accessing the configuration space of a suspended device?
> ret = vfio_register_group_dev(&vdev->vdev);
> if (ret)
> goto out_power;
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260507130956.34441-1-fengchengwen@huawei.com?part=4
* Re: [PATCH v7 5/6] vfio/pci: Add PCIe TPH GET_ST interface
2026-05-07 13:09 ` [PATCH v7 5/6] vfio/pci: Add PCIe TPH GET_ST interface Chengwen Feng
@ 2026-05-08 0:18 ` sashiko-bot
0 siblings, 0 replies; 12+ messages in thread
From: sashiko-bot @ 2026-05-08 0:18 UTC (permalink / raw)
To: Chengwen Feng; +Cc: linux-pci
Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
- [Low] Missing op->argsz validation before copying the st structure from user space.
--
commit 3e63e4fb1478346ccd55bac2783b7a7e1dce2cc3
Author: Chengwen Feng <fengchengwen@huawei.com>
vfio/pci: Add PCIe TPH GET_ST interface
Add support to batch get CPU steering tags for device-specific TPH mode
that does not implement an ST table. This interface requires enabling the
'enable_unsafe_tph_ds_mode' module parameter.
> diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
> --- a/drivers/vfio/pci/vfio_pci_core.c
> +++ b/drivers/vfio/pci/vfio_pci_core.c
[ ... ]
> @@ -1533,6 +1533,77 @@ static int vfio_pci_tph_disable(struct vfio_pci_core_device *vdev)
> +static int vfio_pci_tph_get_st(struct vfio_pci_core_device *vdev,
> + struct vfio_device_pci_tph_op *op,
> + void __user *uarg)
> +{
> + struct pci_dev *pdev = vdev->pdev;
> + struct vfio_pci_tph_entry *ents;
> + struct vfio_pci_tph_st st;
> + enum tph_mem_type mtype;
> + size_t size, ents_off;
> + int i, err;
> +
> + if (!enable_unsafe_tph_ds_mode ||
> + pcie_tph_get_st_table_loc(pdev) != PCI_TPH_LOC_NONE)
> + return -EOPNOTSUPP;
> +
> + if (copy_from_user(&st, uarg, sizeof(st)))
> + return -EFAULT;
Should this code validate op->argsz before copying the structure from user
space?
If a user program passes an argsz too small to cover the st structure, and
the provided buffer ends at the very end of a mapped page, copy_from_user()
will attempt to read into unmapped memory and return -EFAULT instead of the
expected -EINVAL.
Other VFIO ioctl implementations, such as vfio_pci_tph_enable(), check
op->argsz before calling copy_from_user().
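A user-space sketch of the ordering, with memcpy() standing in for
copy_from_user() and mapped_len standing in for how much of the user buffer
is actually mapped. The struct layouts and the function name are
hypothetical and do not reproduce the UAPI from this series.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layouts, not the real UAPI structures. */
struct model_hdr { uint32_t argsz; uint32_t op; };
struct model_st  { uint32_t count; uint32_t reserved; };

/*
 * Copy the st header that follows the op header. argsz is validated
 * before any byte past the op header is touched.
 */
static int model_get_st_header(const void *ubuf, size_t mapped_len,
			       struct model_st *out)
{
	struct model_hdr hdr;

	assert(ubuf != NULL && out != NULL);
	if (mapped_len < sizeof(hdr))
		return -EFAULT;
	memcpy(&hdr, ubuf, sizeof(hdr));

	/* Undersized argsz is a caller error, reported as -EINVAL. */
	if (hdr.argsz < sizeof(hdr) + sizeof(*out))
		return -EINVAL;

	/* Only now can a short mapping legitimately produce -EFAULT. */
	if (mapped_len < sizeof(hdr) + sizeof(*out))
		return -EFAULT;
	memcpy(out, (const unsigned char *)ubuf + sizeof(hdr), sizeof(*out));
	return 0;
}
```

With argsz set to 8 and only 8 bytes mapped, the function returns -EINVAL;
-EFAULT is reserved for the case where argsz claims more than is mapped.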
> +
> + /* Check reserved fields are zero */
> + if (memchr_inv(&st.reserved, 0, sizeof(st.reserved)))
> + return -EINVAL;
> +
> + if (!st.count || st.count > VFIO_PCI_TPH_MAX_ENTRIES)
> + return -EINVAL;
> +
> + size = st.count * sizeof(*ents);
> + if (op->argsz < offsetofend(struct vfio_device_pci_tph_op, st) + size)
> + return -EINVAL;
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260507130956.34441-1-fengchengwen@huawei.com?part=5
* Re: [PATCH v7 6/6] vfio/pci: Add PCIe TPH SET_ST interface
2026-05-07 13:09 ` [PATCH v7 6/6] vfio/pci: Add PCIe TPH SET_ST interface Chengwen Feng
@ 2026-05-08 0:52 ` sashiko-bot
0 siblings, 0 replies; 12+ messages in thread
From: sashiko-bot @ 2026-05-08 0:52 UTC (permalink / raw)
To: Chengwen Feng; +Cc: linux-pci
Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:
- [High] The rollback logic for successfully programmed steering tag entries silently fails when an entry write fails, leaving the device in an inconsistent state.
- [Medium] The `mem_type` field of `struct vfio_pci_tph_entry` is not validated when `cpu == U32_MAX`, breaking future ABI extensibility.
--
commit b4acf86102a9bd7b08e0de22fe836e8de9341956
Author: Chengwen Feng <fengchengwen@huawei.com>
vfio/pci: Add PCIe TPH SET_ST interface
This commit adds the VFIO_PCI_TPH_SET_ST operation to support batch
programming of steering tag entries. It includes logic to roll back
successfully programmed entries to 0 if any entry fails, aiming to prevent
inconsistent device states.
> diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
> index c328515bcaaff..5d10de546d5c6 100644
> --- a/drivers/vfio/pci/vfio_pci_core.c
> +++ b/drivers/vfio/pci/vfio_pci_core.c
[ ... ]
> @@ -1604,6 +1604,90 @@ static int vfio_pci_tph_get_st(struct vfio_pci_core_device *vdev,
> + for (; i < st.count; i++) {
> + /* Check reserved fields and st are zero */
> + if (memchr_inv(&ents[i].reserved0, 0, sizeof(ents[i].reserved0)) ||
> + memchr_inv(&ents[i].reserved1, 0, sizeof(ents[i].reserved1)) ||
> + ents[i].st != 0) {
> + err = -EINVAL;
> + goto out;
> + }
> +
> + if (ents[i].cpu == U32_MAX) {
> + err = pcie_tph_set_st_entry(pdev, ents[i].index, 0);
> + if (err)
> + goto out;
> + continue;
If userspace sets cpu to U32_MAX to clear a steering tag, does this bypass
the validation of mem_type?
By skipping validation and allowing userspace to pass uninitialized data in
this field, could this break forward compatibility if mem_type needs to be
repurposed later for this case?
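One possible shape for the stricter check is sketched below as user-space C.
The entry layout and the MODEL_* constants are hypothetical, and requiring
mem_type to be zero on a clear request is an assumption about the desired
policy, not something the patch specifies.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical values; the real UAPI constants may differ. */
#define MODEL_MEM_TYPE_VM 0u
#define MODEL_MEM_TYPE_PM 1u
#define MODEL_CPU_CLEAR   UINT32_MAX

/* Hypothetical entry layout, not the real struct vfio_pci_tph_entry. */
struct model_entry {
	uint32_t cpu;
	uint8_t  mem_type;
	uint8_t  reserved0;
	uint16_t index;
	uint16_t st;
	uint16_t reserved1;
};

/* Validate one entry before any hardware access is attempted. */
static int model_validate_entry(const struct model_entry *e)
{
	assert(e != NULL);
	if (e->reserved0 || e->reserved1 || e->st)
		return -EINVAL;

	/* Clear request: mem_type is unused, so insist that it is zero. */
	if (e->cpu == MODEL_CPU_CLEAR)
		return e->mem_type ? -EINVAL : 0;

	if (e->mem_type != MODEL_MEM_TYPE_VM &&
	    e->mem_type != MODEL_MEM_TYPE_PM)
		return -EINVAL;
	return 0;
}
```

Rejecting nonzero mem_type on the clear path keeps the field available for a
later extension, at the cost of failing callers that leave it uninitialized.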
> + }
> +
> + if (ents[i].mem_type == VFIO_PCI_TPH_MEM_TYPE_VM) {
> + mtype = TPH_MEM_TYPE_VM;
> + } else if (ents[i].mem_type == VFIO_PCI_TPH_MEM_TYPE_PM) {
> + mtype = TPH_MEM_TYPE_PM;
> + } else {
> + err = -EINVAL;
> + goto out;
> + }
> +
> + err = pcie_tph_get_cpu_st(pdev, mtype, ents[i].cpu, &st_val);
> + if (err)
> + goto out;
> + err = pcie_tph_set_st_entry(pdev, ents[i].index, st_val);
> + if (err)
> + goto out;
> + }
> +
> +out:
> + if (err) {
> + /* Roll back previously programmed entries to 0 */
> + for (j = 0; j < i; j++)
> + pcie_tph_set_st_entry(pdev, ents[j].index, 0);
If a prior call to pcie_tph_set_st_entry() fails (e.g., due to an
out-of-bounds index) and disables TPH on the device internally, will these
rollback calls silently fail?
Since pcie_tph_set_st_entry() checks whether TPH is enabled and returns
-EINVAL if it is disabled, the partially programmed entries might remain on
the device without being cleared.
If the user later re-enables TPH, could the device end up using an
inconsistent steering tag table?
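The failure mode can be reproduced in a toy user-space model. Everything
here is hypothetical: model_set_st_entry() only imitates the behaviour
described above (it refuses to write while TPH is disabled, and a bad index
disables TPH), and the validate-first variant is one possible alternative
rather than a proposed fix.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_TABLE_SZ 4

struct model_dev {
	bool tph_enabled;
	uint16_t table[MODEL_TABLE_SZ];
};

/* Imitates the described behaviour of the entry-programming helper. */
static int model_set_st_entry(struct model_dev *d, unsigned int index,
			      uint16_t tag)
{
	if (!d->tph_enabled)
		return -EINVAL;
	if (index >= MODEL_TABLE_SZ) {
		d->tph_enabled = false;   /* failure disables TPH */
		return -EINVAL;
	}
	d->table[index] = tag;
	return 0;
}

/* Program, then roll back on error, mirroring the loop in the patch. */
static int model_set_batch_rollback(struct model_dev *d,
				    const unsigned int *idx,
				    const uint16_t *tags, int n)
{
	int i, j, err = 0;

	assert(n >= 0);
	for (i = 0; i < n; i++) {
		err = model_set_st_entry(d, idx[i], tags[i]);
		if (err)
			break;
	}
	if (err)
		for (j = 0; j < i; j++)
			(void)model_set_st_entry(d, idx[j], 0); /* may fail */
	return err;
}

/* Alternative: check every index first, so nothing is written on error. */
static int model_set_batch_validated(struct model_dev *d,
				     const unsigned int *idx,
				     const uint16_t *tags, int n)
{
	int i, err;

	assert(n >= 0);
	for (i = 0; i < n; i++)
		if (idx[i] >= MODEL_TABLE_SZ)
			return -EINVAL;
	for (i = 0; i < n; i++) {
		err = model_set_st_entry(d, idx[i], tags[i]);
		if (err)
			return err;
	}
	return 0;
}
```

For indices {0, 1, 9}, the rollback variant leaves entries 0 and 1
programmed and TPH disabled, whereas the validate-first variant returns
-EINVAL with the table untouched and TPH still enabled.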
> + }
> + kvfree(ents);
> + return err;
> +}
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260507130956.34441-1-fengchengwen@huawei.com?part=6
end of thread, other threads:[~2026-05-08 0:52 UTC | newest]
Thread overview: 12+ messages
2026-05-07 13:09 [PATCH v7 0/6] vfio/pci: Add PCIe TPH support Chengwen Feng
2026-05-07 13:09 ` [PATCH v7 1/6] PCI/TPH: Fix pcie_tph_get_st_table_loc() field extraction Chengwen Feng
2026-05-07 13:09 ` [PATCH v7 2/6] PCI/TPH: Export pcie_tph_get_st_modes() for external use Chengwen Feng
2026-05-07 22:19 ` sashiko-bot
2026-05-07 13:09 ` [PATCH v7 3/6] vfio/pci: Add PCIe TPH interface with capability query Chengwen Feng
2026-05-07 23:20 ` sashiko-bot
2026-05-07 13:09 ` [PATCH v7 4/6] vfio/pci: Add PCIe TPH enable/disable support Chengwen Feng
2026-05-07 23:49 ` sashiko-bot
2026-05-07 13:09 ` [PATCH v7 5/6] vfio/pci: Add PCIe TPH GET_ST interface Chengwen Feng
2026-05-08 0:18 ` sashiko-bot
2026-05-07 13:09 ` [PATCH v7 6/6] vfio/pci: Add PCIe TPH SET_ST interface Chengwen Feng
2026-05-08 0:52 ` sashiko-bot