public inbox for linux-rdma@vger.kernel.org
* [RFC v2 0/2] Retrieve tph from dmabuf for PCIe P2P memory access
@ 2026-03-24 23:46 Zhiping Zhang
  2026-03-24 23:46 ` [RFC v2 1/2] vfio: add callback to get tph info for dmabuf Zhiping Zhang
  2026-03-24 23:46 ` [RFC v2 2/2] RDMA/mlx5: get tph for p2p access when registering dmabuf mr Zhiping Zhang
  0 siblings, 2 replies; 7+ messages in thread
From: Zhiping Zhang @ 2026-03-24 23:46 UTC
  To: Jason Gunthorpe, Leon Romanovsky, Bjorn Helgaas, linux-rdma,
	linux-pci, netdev, dri-devel, Keith Busch, Yochai Cohen,
	Yishai Hadas
  Cc: Bjorn Helgaas, Zhiping Zhang

Currently, the steering tag can be used to target a CPU on the motherboard;
an ACPI query is in place to obtain the supported TPH settings. Here we
intend to use the TPH info to improve RDMA NIC access to the memory of a
vfio-based accelerator device via PCIe peer-to-peer (P2P). When an
application registers an RDMA memory region with DMABUF so that the RDMA NIC
can access the device memory, the TPH info of the memory region can be
retrieved and used to set the steering tag and processing hint (ph).
Additional instructions or hints can then be passed to the GPU or
accelerator device for advanced memory operations, such as read cache
selection.

Note that this RFC is meant for discussion of the direction and is not
intended to be a complete implementation. If there are no objections, we
will convert it into a patch set.

Changes v1 -> v2:
- Encode steering tag and ph in vfio_device_feature_dma_buf flags field
  instead of adding new uapi struct fields, to preserve ABI compatibility
- Fix subject prefixes: "Vfio:" -> "vfio:", "RMDA MLX5:" -> "RDMA/mlx5:"
- Fix raw steering tag vs st_index mismatch: get_tph() now returns a raw
  steering tag, and the mlx5 consumer converts it to an st_index via the
  new mlx5_st_alloc_index_by_tag() API
- Fix @tph doc typo to @steering_tag in dma-buf.h
- Remove unused variable, fix parameter alignment, fix trailing semicolon

Zhiping Zhang (2):
  vfio: add callback to get tph info for dmabuf
  RDMA/mlx5: get tph for p2p access when registering dmabuf mr

 drivers/infiniband/hw/mlx5/mr.c               | 34 +++++++++++++++++++
 .../net/ethernet/mellanox/mlx5/core/lib/st.c  | 23 +++++++++----
 drivers/vfio/pci/vfio_pci_dmabuf.c            | 26 ++++++++++++--
 include/linux/dma-buf.h                       | 30 ++++++++++++++++
 include/linux/mlx5/driver.h                   |  7 ++++
 include/uapi/linux/vfio.h                     |  9 +++--
 6 files changed, 118 insertions(+), 11 deletions(-)

--
2.52.0

* [RFC v2 1/2] vfio: add callback to get tph info for dmabuf
  2026-03-24 23:46 [RFC v2 0/2] Retrieve tph from dmabuf for PCIe P2P memory access Zhiping Zhang
@ 2026-03-24 23:46 ` Zhiping Zhang
  2026-03-25  8:25   ` Leon Romanovsky
  2026-03-28  2:21   ` fengchengwen
  2026-03-24 23:46 ` [RFC v2 2/2] RDMA/mlx5: get tph for p2p access when registering dmabuf mr Zhiping Zhang
  1 sibling, 2 replies; 7+ messages in thread
From: Zhiping Zhang @ 2026-03-24 23:46 UTC
  To: Jason Gunthorpe, Leon Romanovsky, Bjorn Helgaas, linux-rdma,
	linux-pci, netdev, dri-devel, Keith Busch, Yochai Cohen,
	Yishai Hadas
  Cc: Bjorn Helgaas, Zhiping Zhang

This patch adds a callback to get the TPH info from DMA buffer exporters.
The TPH info includes both the steering tag and the processing hint (ph).

The steering tag and ph are encoded in the flags field of
vfio_device_feature_dma_buf instead of adding new fields to the uapi
struct, to preserve ABI compatibility.

Signed-off-by: Zhiping Zhang <zhipingz@meta.com>
---
 drivers/vfio/pci/vfio_pci_dmabuf.c | 26 ++++++++++++++++++++++++--
 include/linux/dma-buf.h            | 30 ++++++++++++++++++++++++++++++
 include/uapi/linux/vfio.h          |  9 +++++++--
 3 files changed, 61 insertions(+), 4 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci_dmabuf.c b/drivers/vfio/pci/vfio_pci_dmabuf.c
index 478beafc6ac3..c45cb3884b85 100644
--- a/drivers/vfio/pci/vfio_pci_dmabuf.c
+++ b/drivers/vfio/pci/vfio_pci_dmabuf.c
@@ -17,6 +17,8 @@ struct vfio_pci_dma_buf {
 	struct phys_vec *phys_vec;
 	struct p2pdma_provider *provider;
 	u32 nr_ranges;
+	u16 steering_tag;
+	u8 ph;
 	u8 revoked : 1;
 };

@@ -60,6 +62,15 @@ vfio_pci_dma_buf_map(struct dma_buf_attachment *attachment,
 				       priv->size, dir);
 }

+static int vfio_pci_dma_buf_get_tph(struct dma_buf *dmabuf, u16 *steering_tag,
+				    u8 *ph)
+{
+	struct vfio_pci_dma_buf *priv = dmabuf->priv;
+	*steering_tag = priv->steering_tag;
+	*ph = priv->ph;
+	return 0;
+}
+
 static void vfio_pci_dma_buf_unmap(struct dma_buf_attachment *attachment,
 				   struct sg_table *sgt,
 				   enum dma_data_direction dir)
@@ -90,6 +101,7 @@ static const struct dma_buf_ops vfio_pci_dmabuf_ops = {
 	.unpin = vfio_pci_dma_buf_unpin,
 	.attach = vfio_pci_dma_buf_attach,
 	.map_dma_buf = vfio_pci_dma_buf_map,
+	.get_tph = vfio_pci_dma_buf_get_tph,
 	.unmap_dma_buf = vfio_pci_dma_buf_unmap,
 	.release = vfio_pci_dma_buf_release,
 };
@@ -228,7 +240,10 @@ int vfio_pci_core_feature_dma_buf(struct vfio_pci_core_device *vdev, u32 flags,
 	if (copy_from_user(&get_dma_buf, arg, sizeof(get_dma_buf)))
 		return -EFAULT;

-	if (!get_dma_buf.nr_ranges || get_dma_buf.flags)
+	if (!get_dma_buf.nr_ranges ||
+	    (get_dma_buf.flags & ~(VFIO_DMABUF_FL_TPH |
+				   VFIO_DMABUF_TPH_PH_MASK |
+				   VFIO_DMABUF_TPH_ST_MASK)))
 		return -EINVAL;

 	/*
@@ -285,7 +300,14 @@ int vfio_pci_core_feature_dma_buf(struct vfio_pci_core_device *vdev, u32 flags,
 		ret = PTR_ERR(priv->dmabuf);
 		goto err_dev_put;
 	}
-
+	if (get_dma_buf.flags & VFIO_DMABUF_FL_TPH) {
+		priv->steering_tag = (get_dma_buf.flags &
+				      VFIO_DMABUF_TPH_ST_MASK) >>
+				     VFIO_DMABUF_TPH_ST_SHIFT;
+		priv->ph = (get_dma_buf.flags &
+			    VFIO_DMABUF_TPH_PH_MASK) >>
+			   VFIO_DMABUF_TPH_PH_SHIFT;
+	}
 	/* dma_buf_put() now frees priv */
 	INIT_LIST_HEAD(&priv->dmabufs_elm);
 	down_write(&vdev->memory_lock);
diff --git a/include/linux/dma-buf.h b/include/linux/dma-buf.h
index 133b9e637b55..26705c83ad80 100644
--- a/include/linux/dma-buf.h
+++ b/include/linux/dma-buf.h
@@ -113,6 +113,36 @@ struct dma_buf_ops {
 	 */
 	void (*unpin)(struct dma_buf_attachment *attach);

+	/**
+	 * @get_tph:
+	 *
+	 * Get the TPH (TLP Processing Hints) for this DMA buffer.
+	 *
+	 * This callback allows DMA buffer exporters to provide TPH including
+	 * both the steering tag and the process hints (ph), which can be used
+	 * to optimize peer-to-peer (P2P) memory access. The TPH info is typically
+	 * used in scenarios where:
+	 * - A PCIe device (e.g., RDMA NIC) needs to access memory on another
+	 *   PCIe device (e.g., GPU),
+	 * - The system supports TPH and can use steering tags / ph to optimize
+	 *   cache placement and memory access patterns,
+	 * - The memory is exported via DMABUF for cross-device sharing.
+	 *
+	 * @dmabuf: [in] The DMA buffer for which to retrieve TPH
+	 * @steering_tag: [out] Pointer to store the 16-bit TPH steering tag value
+	 * @ph: [out] Pointer to store the 8-bit TPH processing-hint value
+	 *
+	 * Returns:
+	 * * 0 - Success, steering tag stored in @steering_tag
+	 * * -EOPNOTSUPP - TPH steering tags not supported for this buffer
+	 * * -EINVAL - Invalid parameters
+	 *
+	 * This callback is optional. If not implemented, the buffer does not
+	 * support TPH.
+	 *
+	 */
+	int (*get_tph)(struct dma_buf *dmabuf, u16 *steering_tag, u8 *ph);
+
 	/**
 	 * @map_dma_buf:
 	 *
diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index bb7b89330d35..e2a8962641d2 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -1505,8 +1505,13 @@ struct vfio_region_dma_range {
 struct vfio_device_feature_dma_buf {
 	__u32	region_index;
 	__u32	open_flags;
-	__u32   flags;
-	__u32   nr_ranges;
+	__u32	flags;
+#define VFIO_DMABUF_FL_TPH		(1U << 0) /* TPH info is present */
+#define VFIO_DMABUF_TPH_PH_SHIFT	1         /* bits 1-2: PH (2-bit) */
+#define VFIO_DMABUF_TPH_PH_MASK	0x6U
+#define VFIO_DMABUF_TPH_ST_SHIFT	16        /* bits 16-31: steering tag */
+#define VFIO_DMABUF_TPH_ST_MASK		0xffff0000U
+	__u32	nr_ranges;
 	struct vfio_region_dma_range dma_ranges[] __counted_by(nr_ranges);
 };

--
2.52.0
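
For illustration only (not part of the patch), the flags encoding described
in this patch could be exercised from userspace roughly as follows. The
macro values are copied from the uapi hunk above; the helper names
tph_pack_flags() and tph_unpack_flags() are hypothetical and exist only in
this sketch.

```c
#include <stdint.h>

/* Bit layout copied from the uapi hunk in this RFC (not in a released header). */
#define VFIO_DMABUF_FL_TPH		(1U << 0)
#define VFIO_DMABUF_TPH_PH_SHIFT	1
#define VFIO_DMABUF_TPH_PH_MASK		0x6U
#define VFIO_DMABUF_TPH_ST_SHIFT	16
#define VFIO_DMABUF_TPH_ST_MASK		0xffff0000U

/* Hypothetical helper: pack a steering tag and a 2-bit PH into flags. */
static inline uint32_t tph_pack_flags(uint16_t steering_tag, uint8_t ph)
{
	return VFIO_DMABUF_FL_TPH |
	       (((uint32_t)ph << VFIO_DMABUF_TPH_PH_SHIFT) &
		VFIO_DMABUF_TPH_PH_MASK) |
	       (((uint32_t)steering_tag << VFIO_DMABUF_TPH_ST_SHIFT) &
		VFIO_DMABUF_TPH_ST_MASK);
}

/*
 * Hypothetical mirror of the kernel-side decode in this patch.
 * Returns 1 and fills the outputs if the TPH bit is set, 0 otherwise.
 */
static inline int tph_unpack_flags(uint32_t flags, uint16_t *steering_tag,
				   uint8_t *ph)
{
	if (!(flags & VFIO_DMABUF_FL_TPH))
		return 0;

	*steering_tag = (flags & VFIO_DMABUF_TPH_ST_MASK) >>
			VFIO_DMABUF_TPH_ST_SHIFT;
	*ph = (flags & VFIO_DMABUF_TPH_PH_MASK) >> VFIO_DMABUF_TPH_PH_SHIFT;
	return 1;
}
```

Note that the PH field is only two bits wide, so the pack helper silently
truncates larger values; a real caller would probably want to reject them
instead.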


* [RFC v2 2/2] RDMA/mlx5: get tph for p2p access when registering dmabuf mr
  2026-03-24 23:46 [RFC v2 0/2] Retrieve tph from dmabuf for PCIe P2P memory access Zhiping Zhang
  2026-03-24 23:46 ` [RFC v2 1/2] vfio: add callback to get tph info for dmabuf Zhiping Zhang
@ 2026-03-24 23:46 ` Zhiping Zhang
  1 sibling, 0 replies; 7+ messages in thread
From: Zhiping Zhang @ 2026-03-24 23:46 UTC
  To: Jason Gunthorpe, Leon Romanovsky, Bjorn Helgaas, linux-rdma,
	linux-pci, netdev, dri-devel, Keith Busch, Yochai Cohen,
	Yishai Hadas
  Cc: Bjorn Helgaas, Zhiping Zhang

This patch adds support for retrieving TPH info from a dmabuf during MR
registration for P2P memory access. A new helper, get_tph_mr_dmabuf(),
queries the dmabuf exporter for TPH (steering tag and processing hint)
and converts the raw steering tag to an st_index via
mlx5_st_alloc_index_by_tag(). The DMAH workflow for CPU still takes
precedence: the dmabuf exporter is queried only in the else branch of the
existing DMAH handling.

The new mlx5_st_alloc_index_by_tag() API is extracted from
mlx5_st_alloc_index() to allow callers that already have a raw steering
tag value (e.g., from a dmabuf) to allocate an st_index directly,
without requiring a CPU ID and memory type lookup.

Signed-off-by: Zhiping Zhang <zhipingz@meta.com>
---
 drivers/infiniband/hw/mlx5/mr.c               | 34 +++++++++++++++++++
 .../net/ethernet/mellanox/mlx5/core/lib/st.c  | 23 +++++++++----
 include/linux/mlx5/driver.h                   |  7 ++++
 3 files changed, 57 insertions(+), 7 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/mr.c b/drivers/infiniband/hw/mlx5/mr.c
index 665323b90b64..041922ba3bff 100644
--- a/drivers/infiniband/hw/mlx5/mr.c
+++ b/drivers/infiniband/hw/mlx5/mr.c
@@ -46,6 +46,8 @@
 #include "data_direct.h"
 #include "dmah.h"

+MODULE_IMPORT_NS("DMA_BUF");
+
 enum {
 	MAX_PENDING_REG_MR = 8,
 };
@@ -1622,6 +1624,36 @@ static struct dma_buf_attach_ops mlx5_ib_dmabuf_attach_ops = {
 	.move_notify = mlx5_ib_dmabuf_invalidate_cb,
 };

+static void get_tph_mr_dmabuf(struct mlx5_ib_dev *dev, int fd, u16 *st_index,
+			      u8 *ph)
+{
+	struct dma_buf *dmabuf;
+	u16 steering_tag;
+	int ret;
+
+	dmabuf = dma_buf_get(fd);
+	if (IS_ERR(dmabuf))
+		return;
+
+	if (!dmabuf->ops->get_tph)
+		goto end_dbuf_put;
+
+	ret = dmabuf->ops->get_tph(dmabuf, &steering_tag, ph);
+	if (ret) {
+		mlx5_ib_dbg(dev, "get_tph failed (%d)\n", ret);
+		goto end_dbuf_put;
+	}
+
+	ret = mlx5_st_alloc_index_by_tag(dev->mdev, steering_tag, st_index);
+	if (ret) {
+		*ph = MLX5_IB_NO_PH;
+		mlx5_ib_dbg(dev, "st_alloc_index_by_tag failed (%d)\n", ret);
+	}
+
+end_dbuf_put:
+	dma_buf_put(dmabuf);
+}
+
 static struct ib_mr *
 reg_user_mr_dmabuf(struct ib_pd *pd, struct device *dma_device,
 		   u64 offset, u64 length, u64 virt_addr,
@@ -1664,6 +1696,8 @@ reg_user_mr_dmabuf(struct ib_pd *pd, struct device *dma_device,
 		ph = dmah->ph;
 		if (dmah->valid_fields & BIT(IB_DMAH_CPU_ID_EXISTS))
 			st_index = mdmah->st_index;
+	} else {
+		get_tph_mr_dmabuf(dev, fd, &st_index, &ph);
 	}

 	mr = alloc_cacheable_mr(pd, &umem_dmabuf->umem, virt_addr,
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lib/st.c b/drivers/net/ethernet/mellanox/mlx5/core/lib/st.c
index 997be91f0a13..112c55ede731 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/lib/st.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/st.c
@@ -92,23 +92,18 @@ void mlx5_st_destroy(struct mlx5_core_dev *dev)
 	kfree(st);
 }

-int mlx5_st_alloc_index(struct mlx5_core_dev *dev, enum tph_mem_type mem_type,
-			unsigned int cpu_uid, u16 *st_index)
+int mlx5_st_alloc_index_by_tag(struct mlx5_core_dev *dev, u16 tag,
+			       u16 *st_index)
 {
 	struct mlx5_st_idx_data *idx_data;
 	struct mlx5_st *st = dev->st;
 	unsigned long index;
 	u32 xa_id;
-	u16 tag;
 	int ret;

 	if (!st)
 		return -EOPNOTSUPP;

-	ret = pcie_tph_get_cpu_st(dev->pdev, mem_type, cpu_uid, &tag);
-	if (ret)
-		return ret;
-
 	if (st->direct_mode) {
 		*st_index = tag;
 		return 0;
@@ -152,6 +147,20 @@ int mlx5_st_alloc_index(struct mlx5_core_dev *dev, enum tph_mem_type mem_type,
 	mutex_unlock(&st->lock);
 	return ret;
 }
+EXPORT_SYMBOL_GPL(mlx5_st_alloc_index_by_tag);
+
+int mlx5_st_alloc_index(struct mlx5_core_dev *dev, enum tph_mem_type mem_type,
+			unsigned int cpu_uid, u16 *st_index)
+{
+	u16 tag;
+	int ret;
+
+	ret = pcie_tph_get_cpu_st(dev->pdev, mem_type, cpu_uid, &tag);
+	if (ret)
+		return ret;
+
+	return mlx5_st_alloc_index_by_tag(dev, tag, st_index);
+}
 EXPORT_SYMBOL_GPL(mlx5_st_alloc_index);

 int mlx5_st_dealloc_index(struct mlx5_core_dev *dev, u16 st_index)
diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h
index 04dcd09f7517..c1d2d603bd96 100644
--- a/include/linux/mlx5/driver.h
+++ b/include/linux/mlx5/driver.h
@@ -1177,10 +1177,17 @@ int mlx5_dm_sw_icm_dealloc(struct mlx5_core_dev *dev, enum mlx5_sw_icm_type type
 			   u64 length, u16 uid, phys_addr_t addr, u32 obj_id);

 #ifdef CONFIG_PCIE_TPH
+int mlx5_st_alloc_index_by_tag(struct mlx5_core_dev *dev, u16 tag,
+			       u16 *st_index);
 int mlx5_st_alloc_index(struct mlx5_core_dev *dev, enum tph_mem_type mem_type,
 			unsigned int cpu_uid, u16 *st_index);
 int mlx5_st_dealloc_index(struct mlx5_core_dev *dev, u16 st_index);
 #else
+static inline int mlx5_st_alloc_index_by_tag(struct mlx5_core_dev *dev,
+					     u16 tag, u16 *st_index)
+{
+	return -EOPNOTSUPP;
+}
 static inline int mlx5_st_alloc_index(struct mlx5_core_dev *dev,
 				      enum tph_mem_type mem_type,
 				      unsigned int cpu_uid, u16 *st_index)
--
2.52.0


* Re: [RFC v2 1/2] vfio: add callback to get tph info for dmabuf
  2026-03-24 23:46 ` [RFC v2 1/2] vfio: add callback to get tph info for dmabuf Zhiping Zhang
@ 2026-03-25  8:25   ` Leon Romanovsky
  2026-03-26 22:41     ` Keith Busch
  2026-03-28  2:21   ` fengchengwen
  1 sibling, 1 reply; 7+ messages in thread
From: Leon Romanovsky @ 2026-03-25  8:25 UTC
  To: Zhiping Zhang
  Cc: Jason Gunthorpe, Bjorn Helgaas, linux-rdma, linux-pci, netdev,
	dri-devel, Keith Busch, Yochai Cohen, Yishai Hadas, Bjorn Helgaas

On Tue, Mar 24, 2026 at 04:46:02PM -0700, Zhiping Zhang wrote:
> This patch adds a callback to get the tph info on DMA buffer exporters.
> The tph info includes both the steering tag and the process hint (ph).
> 
> The steering tag and ph are encoded in the flags field of
> vfio_device_feature_dma_buf instead of adding new fields to the uapi
> struct, to preserve ABI compatibility.
> 
> Signed-off-by: Zhiping Zhang <zhipingz@meta.com>
> ---
>  drivers/vfio/pci/vfio_pci_dmabuf.c | 26 ++++++++++++++++++++++++--
>  include/linux/dma-buf.h            | 30 ++++++++++++++++++++++++++++++
>  include/uapi/linux/vfio.h          |  9 +++++++--
>  3 files changed, 61 insertions(+), 4 deletions(-)

<...>

> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> index bb7b89330d35..e2a8962641d2 100644
> --- a/include/uapi/linux/vfio.h
> +++ b/include/uapi/linux/vfio.h
> @@ -1505,8 +1505,13 @@ struct vfio_region_dma_range {
>  struct vfio_device_feature_dma_buf {
>  	__u32	region_index;
>  	__u32	open_flags;
> -	__u32   flags;
> -	__u32   nr_ranges;
> +	__u32	flags;
> +#define VFIO_DMABUF_FL_TPH		(1U << 0) /* TPH info is present */
> +#define VFIO_DMABUF_TPH_PH_SHIFT	1         /* bits 1-2: PH (2-bit) */
> +#define VFIO_DMABUF_TPH_PH_MASK	0x6U
> +#define VFIO_DMABUF_TPH_ST_SHIFT	16        /* bits 16-31: steering tag */
> +#define VFIO_DMABUF_TPH_ST_MASK		0xffff0000U

This extension of flags basically kills future extension of this
struct for anything that includes TPH.

Add a new
enum vfio_device_feature_dma_buf_flags {
    VFIO_DMABUF_FL_TPH  = 1 << 0,
};

> +	__u32	nr_ranges;

Add your "__u16 steering_tag" and "__u8 ph" fields here.

>  	struct vfio_region_dma_range dma_ranges[] __counted_by(nr_ranges);
>  };
> 
> --
> 2.52.0
> 
> 

* Re: [RFC v2 1/2] vfio: add callback to get tph info for dmabuf
  2026-03-25  8:25   ` Leon Romanovsky
@ 2026-03-26 22:41     ` Keith Busch
  2026-03-26 22:55       ` Zhiping Zhang
  0 siblings, 1 reply; 7+ messages in thread
From: Keith Busch @ 2026-03-26 22:41 UTC
  To: Leon Romanovsky
  Cc: Zhiping Zhang, Jason Gunthorpe, Bjorn Helgaas, linux-rdma,
	linux-pci, netdev, dri-devel, Yochai Cohen, Yishai Hadas,
	Bjorn Helgaas

On Wed, Mar 25, 2026 at 10:25:34AM +0200, Leon Romanovsky wrote:
> On Tue, Mar 24, 2026 at 04:46:02PM -0700, Zhiping Zhang wrote:
> >  struct vfio_device_feature_dma_buf {
> >  	__u32	region_index;
> >  	__u32	open_flags;
> > -	__u32   flags;
> > -	__u32   nr_ranges;
> > +	__u32	flags;
> > +#define VFIO_DMABUF_FL_TPH		(1U << 0) /* TPH info is present */
> > +#define VFIO_DMABUF_TPH_PH_SHIFT	1         /* bits 1-2: PH (2-bit) */
> > +#define VFIO_DMABUF_TPH_PH_MASK	0x6U
> > +#define VFIO_DMABUF_TPH_ST_SHIFT	16        /* bits 16-31: steering tag */
> > +#define VFIO_DMABUF_TPH_ST_MASK		0xffff0000U
> 
> This extension of flags is basically kills future extension of this
> struct for anything that includes TPH.
> 
> Add new
> enum vfio_device_feature_dma_buf_flags {
>     VFIO_DMABUF_FL_TPH  = 1 << 0
> }
> 
> > +	__u32	nr_ranges;
> 
> add your "__u16 steering_tag" and "__u8 ph" fields here.

You're suggesting that Zhiping append the new fields to the end of this
struct? I don't think we can modify the layout of a uapi struct.

If we can't carve the space for this out of the existing unused flags
field, I think we'd have to introduce a new vfio device feature that
basically copies VFIO_DEVICE_FEATURE_DMA_BUF with the extra hints
fields.
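
To make the layout concern concrete, here is a small userspace sketch. The
struct names are stand-ins invented for this example, and the range struct
is assumed to be two 64-bit fields. It compares where dma_ranges[] starts
with and without hint fields placed after nr_ranges.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed shape of struct vfio_region_dma_range: two 64-bit fields. */
struct region_dma_range {
	uint64_t offset;
	uint64_t length;
};

/* Stand-in for the current struct vfio_device_feature_dma_buf layout. */
struct dma_buf_feature_cur {
	uint32_t region_index;
	uint32_t open_flags;
	uint32_t flags;
	uint32_t nr_ranges;
	struct region_dma_range dma_ranges[];
};

/* Hypothetical layout with the hint fields added after nr_ranges. */
struct dma_buf_feature_ext {
	uint32_t region_index;
	uint32_t open_flags;
	uint32_t flags;
	uint32_t nr_ranges;
	uint16_t steering_tag;
	uint8_t  ph;
	struct region_dma_range dma_ranges[];
};

/* Byte offset at which each layout expects the first range to start. */
static inline size_t ranges_offset_cur(void)
{
	return offsetof(struct dma_buf_feature_cur, dma_ranges);
}

static inline size_t ranges_offset_ext(void)
{
	return offsetof(struct dma_buf_feature_ext, dma_ranges);
}
```

On x86-64 the first offset is 16 bytes and the second is 24, so an existing
binary that places its ranges at offset 16 would be misread by a kernel
expecting the extended layout.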
 
> >  	struct vfio_region_dma_range dma_ranges[] __counted_by(nr_ranges);
> >  };

* Re: [RFC v2 1/2] vfio: add callback to get tph info for dmabuf
  2026-03-26 22:41     ` Keith Busch
@ 2026-03-26 22:55       ` Zhiping Zhang
  0 siblings, 0 replies; 7+ messages in thread
From: Zhiping Zhang @ 2026-03-26 22:55 UTC
  To: Keith Busch
  Cc: Leon Romanovsky, Jason Gunthorpe, Bjorn Helgaas, linux-rdma,
	linux-pci, netdev, dri-devel, Yochai Cohen, Yishai Hadas,
	Bjorn Helgaas

On Thu, Mar 26, 2026 at 3:41 PM Keith Busch <kbusch@kernel.org> wrote:
>
> >
> On Wed, Mar 25, 2026 at 10:25:34AM +0200, Leon Romanovsky wrote:
> > On Tue, Mar 24, 2026 at 04:46:02PM -0700, Zhiping Zhang wrote:
> > >  struct vfio_device_feature_dma_buf {
> > >     __u32   region_index;
> > >     __u32   open_flags;
> > > -   __u32   flags;
> > > -   __u32   nr_ranges;
> > > +   __u32   flags;
> > > +#define VFIO_DMABUF_FL_TPH         (1U << 0) /* TPH info is present */
> > > +#define VFIO_DMABUF_TPH_PH_SHIFT   1         /* bits 1-2: PH (2-bit) */
> > > +#define VFIO_DMABUF_TPH_PH_MASK    0x6U
> > > +#define VFIO_DMABUF_TPH_ST_SHIFT   16        /* bits 16-31: steering tag */
> > > +#define VFIO_DMABUF_TPH_ST_MASK            0xffff0000U
> >
> > This extension of flags is basically kills future extension of this
> > struct for anything that includes TPH.
> >
> > Add new
> > enum vfio_device_feature_dma_buf_flags {
> >     VFIO_DMABUF_FL_TPH  = 1 << 0
> > }

Yes, we can do that.

> >
> > > +   __u32   nr_ranges;
> >
> > add your "__u16 steering_tag" and "__u8 ph" fields here.
>
That is what I did in V1, Leon.

> You're suggesting that Ziping append the new fields to the end of this
> struct? I don't think we can modify the layout of a uapi.
>
> If we can't carve the space for this out of the existing unused flags
> field, I think we'd have to introduce a new vfio device feature that
> basically copies VFIO_DEVICE_FEATURE_DMA_BUF with the extra hints
> fields.
>
If we don't carve the fields out of the flags field, then we will probably
have to introduce a new vfio device feature.

> > >     struct vfio_region_dma_range dma_ranges[] __counted_by(nr_ranges);
> > >  };

* Re: [RFC v2 1/2] vfio: add callback to get tph info for dmabuf
  2026-03-24 23:46 ` [RFC v2 1/2] vfio: add callback to get tph info for dmabuf Zhiping Zhang
  2026-03-25  8:25   ` Leon Romanovsky
@ 2026-03-28  2:21   ` fengchengwen
  1 sibling, 0 replies; 7+ messages in thread
From: fengchengwen @ 2026-03-28  2:21 UTC
  To: Zhiping Zhang, Jason Gunthorpe, Leon Romanovsky, Bjorn Helgaas,
	linux-rdma, linux-pci, netdev, dri-devel, Keith Busch,
	Yochai Cohen, Yishai Hadas
  Cc: Bjorn Helgaas

Hi Zhiping,

On 3/25/2026 7:46 AM, Zhiping Zhang wrote:
> This patch adds a callback to get the tph info on DMA buffer exporters.
> The tph info includes both the steering tag and the process hint (ph).
> 
> The steering tag and ph are encoded in the flags field of
> vfio_device_feature_dma_buf instead of adding new fields to the uapi
> struct, to preserve ABI compatibility.
> 
> Signed-off-by: Zhiping Zhang <zhipingz@meta.com>
> ---
>  drivers/vfio/pci/vfio_pci_dmabuf.c | 26 ++++++++++++++++++++++++--
>  include/linux/dma-buf.h            | 30 ++++++++++++++++++++++++++++++
>  include/uapi/linux/vfio.h          |  9 +++++++--
>  3 files changed, 61 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/vfio/pci/vfio_pci_dmabuf.c b/drivers/vfio/pci/vfio_pci_dmabuf.c
> index 478beafc6ac3..c45cb3884b85 100644
> --- a/drivers/vfio/pci/vfio_pci_dmabuf.c
> +++ b/drivers/vfio/pci/vfio_pci_dmabuf.c
> @@ -17,6 +17,8 @@ struct vfio_pci_dma_buf {
>  	struct phys_vec *phys_vec;
>  	struct p2pdma_provider *provider;
>  	u32 nr_ranges;
> +	u16 steering_tag;
> +	u8 ph;
>  	u8 revoked : 1;
>  };
> 
> @@ -60,6 +62,15 @@ vfio_pci_dma_buf_map(struct dma_buf_attachment *attachment,
>  				       priv->size, dir);
>  }
> 
> +static int vfio_pci_dma_buf_get_tph(struct dma_buf *dmabuf, u16 *steering_tag,
> +				    u8 *ph)
> +{
> +	struct vfio_pci_dma_buf *priv = dmabuf->priv;
> +	*steering_tag = priv->steering_tag;
> +	*ph = priv->ph;

If the dmabuf exporter doesn't provide the ST and PH, this op should return
an error.
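
One possible shape for this, as a userspace sketch rather than a tested
kernel change: track whether userspace actually supplied TPH and return
-EOPNOTSUPP (already listed in the kernel-doc below) otherwise. The struct
tph_priv, its tph_valid bit, and get_tph_checked() are all hypothetical
names for this example.

```c
#include <errno.h>
#include <stdint.h>

/* Minimal stand-in for the TPH-related part of struct vfio_pci_dma_buf. */
struct tph_priv {
	uint16_t steering_tag;
	uint8_t ph;
	unsigned int revoked : 1;
	/* Hypothetical: set only when VFIO_DMABUF_FL_TPH was passed in flags. */
	unsigned int tph_valid : 1;
};

/* Sketch of the getter: report an error instead of handing back zeroed hints. */
static int get_tph_checked(const struct tph_priv *priv, uint16_t *steering_tag,
			   uint8_t *ph)
{
	if (!priv->tph_valid)
		return -EOPNOTSUPP;

	*steering_tag = priv->steering_tag;
	*ph = priv->ph;
	return 0;
}
```

With this, a consumer can tell "no TPH provided" apart from a genuine
steering tag of 0.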

> +	return 0;
> +}
> +
>  static void vfio_pci_dma_buf_unmap(struct dma_buf_attachment *attachment,
>  				   struct sg_table *sgt,
>  				   enum dma_data_direction dir)
> @@ -90,6 +101,7 @@ static const struct dma_buf_ops vfio_pci_dmabuf_ops = {
>  	.unpin = vfio_pci_dma_buf_unpin,
>  	.attach = vfio_pci_dma_buf_attach,
>  	.map_dma_buf = vfio_pci_dma_buf_map,
> +	.get_tph = vfio_pci_dma_buf_get_tph,
>  	.unmap_dma_buf = vfio_pci_dma_buf_unmap,
>  	.release = vfio_pci_dma_buf_release,
>  };
> @@ -228,7 +240,10 @@ int vfio_pci_core_feature_dma_buf(struct vfio_pci_core_device *vdev, u32 flags,
>  	if (copy_from_user(&get_dma_buf, arg, sizeof(get_dma_buf)))
>  		return -EFAULT;
> 
> -	if (!get_dma_buf.nr_ranges || get_dma_buf.flags)
> +	if (!get_dma_buf.nr_ranges ||
> +	    (get_dma_buf.flags & ~(VFIO_DMABUF_FL_TPH |
> +				   VFIO_DMABUF_TPH_PH_MASK |
> +				   VFIO_DMABUF_TPH_ST_MASK)))
>  		return -EINVAL;
> 
>  	/*
> @@ -285,7 +300,14 @@ int vfio_pci_core_feature_dma_buf(struct vfio_pci_core_device *vdev, u32 flags,
>  		ret = PTR_ERR(priv->dmabuf);
>  		goto err_dev_put;
>  	}
> -
> +	if (get_dma_buf.flags & VFIO_DMABUF_FL_TPH) {
> +		priv->steering_tag = (get_dma_buf.flags &
> +				      VFIO_DMABUF_TPH_ST_MASK) >>
> +				     VFIO_DMABUF_TPH_ST_SHIFT;
> +		priv->ph = (get_dma_buf.flags &
> +			    VFIO_DMABUF_TPH_PH_MASK) >>
> +			   VFIO_DMABUF_TPH_PH_SHIFT;
> +	}
>  	/* dma_buf_put() now frees priv */
>  	INIT_LIST_HEAD(&priv->dmabufs_elm);
>  	down_write(&vdev->memory_lock);
> diff --git a/include/linux/dma-buf.h b/include/linux/dma-buf.h
> index 133b9e637b55..26705c83ad80 100644
> --- a/include/linux/dma-buf.h
> +++ b/include/linux/dma-buf.h
> @@ -113,6 +113,36 @@ struct dma_buf_ops {
>  	 */
>  	void (*unpin)(struct dma_buf_attachment *attach);
> 
> +	/**
> +	 * @get_tph:
> +	 *
> +	 * Get the TPH (TLP Processing Hints) for this DMA buffer.
> +	 *
> +	 * This callback allows DMA buffer exporters to provide TPH including
> +	 * both the steering tag and the process hints (ph), which can be used
> +	 * to optimize peer-to-peer (P2P) memory access. The TPH info is typically
> +	 * used in scenarios where:
> +	 * - A PCIe device (e.g., RDMA NIC) needs to access memory on another
> +	 *   PCIe device (e.g., GPU),
> +	 * - The system supports TPH and can use steering tags / ph to optimize
> +	 *   cache placement and memory access patterns,
> +	 * - The memory is exported via DMABUF for cross-device sharing.
> +	 *
> +	 * @dmabuf: [in] The DMA buffer for which to retrieve TPH
> +	 * @steering_tag: [out] Pointer to store the 16-bit TPH steering tag value
> +	 * @ph: [out] Pointer to store the 8-bit TPH processing-hint value
> +	 *
> +	 * Returns:
> +	 * * 0 - Success, steering tag stored in @steering_tag
> +	 * * -EOPNOTSUPP - TPH steering tags not supported for this buffer
> +	 * * -EINVAL - Invalid parameters
> +	 *
> +	 * This callback is optional. If not implemented, the buffer does not
> +	 * support TPH.

It seems to be already implemented...

> +	 *
> +	 */
> +	int (*get_tph)(struct dma_buf *dmabuf, u16 *steering_tag, u8 *ph);
> +
>  	/**
>  	 * @map_dma_buf:
>  	 *
> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> index bb7b89330d35..e2a8962641d2 100644
> --- a/include/uapi/linux/vfio.h
> +++ b/include/uapi/linux/vfio.h
> @@ -1505,8 +1505,13 @@ struct vfio_region_dma_range {
>  struct vfio_device_feature_dma_buf {
>  	__u32	region_index;
>  	__u32	open_flags;
> -	__u32   flags;
> -	__u32   nr_ranges;
> +	__u32	flags;
> +#define VFIO_DMABUF_FL_TPH		(1U << 0) /* TPH info is present */
> +#define VFIO_DMABUF_TPH_PH_SHIFT	1         /* bits 1-2: PH (2-bit) */
> +#define VFIO_DMABUF_TPH_PH_MASK	0x6U
> +#define VFIO_DMABUF_TPH_ST_SHIFT	16        /* bits 16-31: steering tag */
> +#define VFIO_DMABUF_TPH_ST_MASK		0xffff0000U
> +	__u32	nr_ranges;
>  	struct vfio_region_dma_range dma_ranges[] __counted_by(nr_ranges);
>  };

Another question:
1. The PCIe spec defines both 8-bit and 16-bit STs.
2. In the host-device ST implementation, ACPI provides both 8-bit and 16-bit
   STs; which one is used depends on the minimum range supported by the
   device and the RP (root port).
3. So in this P2P scenario, if the exporter (e.g. GPU) supports 16-bit STs
   but the consumer (e.g. RDMA NIC) only supports 8-bit STs, this may lead
   to a mismatch.
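
One way to make such a mismatch explicit would be for the consumer to check
the exporter's raw tag against its own supported ST width before using it.
A minimal sketch, assuming the consumer knows its width as st_bits (8 or
16); the function name and the choice of error codes are illustrative and
not taken from the patch.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical check: does a raw steering tag fit the requester's ST width? */
static int st_fits_requester(uint16_t steering_tag, unsigned int st_bits)
{
	if (st_bits != 8 && st_bits != 16)
		return -EINVAL;

	/* A tag above 0xff cannot be carried by an 8-bit-only requester. */
	if (st_bits == 8 && steering_tag > 0xff)
		return -ERANGE;

	return 0;
}
```

A caller could fall back to "no TPH" when the check fails, rather than
silently truncating the tag.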

> 
> --
> 2.52.0
> 
> 
> 

