* [PATCH v2 0/2] vfio/dma-buf: add TPH support for peer-to-peer access
@ 2026-04-30 20:06 Zhiping Zhang
2026-04-30 20:06 ` [PATCH v2 1/2] vfio: add dma-buf get_tph callback and DMA_BUF_TPH feature Zhiping Zhang
2026-04-30 20:06 ` [PATCH v2 2/2] RDMA/mlx5: get tph for p2p access when registering dma-buf mr Zhiping Zhang
0 siblings, 2 replies; 9+ messages in thread
From: Zhiping Zhang @ 2026-04-30 20:06 UTC (permalink / raw)
To: Alex Williamson, Jason Gunthorpe, Leon Romanovsky
Cc: Bjorn Helgaas, linux-rdma, linux-pci, netdev, dri-devel,
Keith Busch, Yochai Cohen, Yishai Hadas, Zhiping Zhang

This series adds TLP Processing Hints (TPH) support to the VFIO dma-buf
export path, allowing importing drivers (e.g. mlx5) to use the exporter's
steering tag when performing peer-to-peer DMA into a VFIO-owned device.

Changes since v1:
- VFIO_DEVICE_FEATURE_DMA_BUF is now unchanged: dma_ranges[],
  __counted_by(nr_ranges), and flags==0 are all preserved
- Added a new VFIO_DEVICE_FEATURE_DMA_BUF_TPH (feature 13) as a separate
  SET ioctl that takes a dmabuf fd, validates it belongs to this vfio
  device, and stores the steering tag + processing hint under memory_lock
- Kept the dma_buf_ops.get_tph callback as the general exporter-side
  interface for importing drivers

Patch 1 adds the dma-buf get_tph callback and the new vfio uAPI.
Patch 2 wires up the mlx5 RDMA driver as a consumer.

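
As an illustration of the exporter-side interface described above, here is a minimal sketch of how an importing driver might consume an optional get_tph callback. It is not taken from the mlx5 patch: the mock_* types, importer_query_tph(), and the two sample exporters are hypothetical stand-ins for the kernel structures, and falling back to "no TPH" on -EOPNOTSUPP is an assumed policy.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

struct mock_dma_buf;

/* Hypothetical stand-in for the get_tph member of struct dma_buf_ops. */
struct mock_dma_buf_ops {
	int (*get_tph)(struct mock_dma_buf *dmabuf, uint16_t *steering_tag,
		       uint8_t *ph, uint8_t st_width);
};

struct mock_dma_buf {
	const struct mock_dma_buf_ops *ops;
};

/*
 * Returns 1 if TPH values were fetched, 0 if the importer should proceed
 * without TPH, and a negative errno for any other failure.
 */
static int importer_query_tph(struct mock_dma_buf *dmabuf, uint8_t st_width,
			      uint16_t *st, uint8_t *ph)
{
	int ret;

	if (!dmabuf->ops->get_tph)
		return 0;	/* the callback is optional */

	ret = dmabuf->ops->get_tph(dmabuf, st, ph, st_width);
	if (ret == -EOPNOTSUPP)
		return 0;	/* exporter has no TPH metadata */
	if (ret)
		return ret;	/* e.g. -EINVAL: tag wider than st_width */
	return 1;
}

/* Sample exporter with no TPH metadata set. */
static int exporter_without_tph(struct mock_dma_buf *dmabuf, uint16_t *st,
				uint8_t *ph, uint8_t st_width)
{
	(void)dmabuf; (void)st; (void)ph; (void)st_width;
	return -EOPNOTSUPP;
}

/* Sample exporter holding a tag that only fits in 16 bits. */
static int exporter_with_tph(struct mock_dma_buf *dmabuf, uint16_t *st,
			     uint8_t *ph, uint8_t st_width)
{
	const uint16_t tag = 0x1234;

	(void)dmabuf;
	if (st_width < 16 && tag > ((1U << st_width) - 1))
		return -EINVAL;
	*st = tag;
	*ph = 1;
	return 0;
}
```

Whether an importer should fail the registration or silently drop TPH when the exporter returns -EINVAL is a policy choice this sketch leaves to the caller.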
Previous links:
https://lore.kernel.org/linux-pci/20260324234615.3731237-1-zhipingz@meta.com/
https://lore.kernel.org/dri-devel/20260420183920.3626389-1-zhipingz@meta.com/
Zhiping Zhang (2):
vfio: add dma-buf get_tph callback and DMA_BUF_TPH feature
RDMA/mlx5: get tph for p2p access when registering dma-buf mr
drivers/infiniband/hw/mlx5/mr.c | 38 +++++++
drivers/net/ethernet/mellanox/mlx5/core/lib/st.c | 25 +++--
drivers/vfio/pci/vfio_pci_core.c | 3 +
drivers/vfio/pci/vfio_pci_dmabuf.c | 65 ++++++++++++
drivers/vfio/pci/vfio_pci_priv.h | 11 ++
include/linux/dma-buf.h | 17 +++
include/linux/mlx5/driver.h | 7 ++
include/uapi/linux/vfio.h | 22 ++++
8 files changed, 180 insertions(+), 8 deletions(-)
--
2.47.1

* [PATCH v2 1/2] vfio: add dma-buf get_tph callback and DMA_BUF_TPH feature
2026-04-30 20:06 [PATCH v2 0/2] vfio/dma-buf: add TPH support for peer-to-peer access Zhiping Zhang
@ 2026-04-30 20:06 ` Zhiping Zhang
2026-05-04 21:44 ` Alex Williamson
2026-05-06 6:58 ` fengchengwen
2026-04-30 20:06 ` [PATCH v2 2/2] RDMA/mlx5: get tph for p2p access when registering dma-buf mr Zhiping Zhang
1 sibling, 2 replies; 9+ messages in thread
From: Zhiping Zhang @ 2026-04-30 20:06 UTC (permalink / raw)
To: Alex Williamson, Jason Gunthorpe, Leon Romanovsky
Cc: Bjorn Helgaas, linux-rdma, linux-pci, netdev, dri-devel,
Keith Busch, Yochai Cohen, Yishai Hadas, Zhiping Zhang

Add a dma-buf callback that returns raw TPH metadata from the exporter
so peer devices can reuse the steering tag and processing hint
associated with a VFIO-exported buffer.

Add a new VFIO_DEVICE_FEATURE_DMA_BUF_TPH ioctl that takes the fd from
VFIO_DEVICE_FEATURE_DMA_BUF along with a steering tag and processing
hint, validates the fd is a vfio-exported dma-buf belonging to this
device, and stores the TPH values under memory_lock. This keeps the
existing VFIO_DEVICE_FEATURE_DMA_BUF uAPI completely unchanged.

The user sequences setting TPH on the dma-buf before the importer
consumes it.

Add an st_width parameter to get_tph() so the exporter can reject
steering tags that exceed the consumer's supported width (8 vs 16 bit).
When no TPH metadata was supplied, get_tph() returns -EOPNOTSUPP.

Signed-off-by: Zhiping Zhang <zhipingz@meta.com>

diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
--- a/drivers/vfio/pci/vfio_pci_core.c
+++ b/drivers/vfio/pci/vfio_pci_core.c
@@ -1534,6 +1534,9 @@ int vfio_pci_core_ioctl_feature(struct vfio_device *device, u32 flags,
 		return vfio_pci_core_feature_token(vdev, flags, arg, argsz);
 	case VFIO_DEVICE_FEATURE_DMA_BUF:
 		return vfio_pci_core_feature_dma_buf(vdev, flags, arg, argsz);
+	case VFIO_DEVICE_FEATURE_DMA_BUF_TPH:
+		return vfio_pci_core_feature_dma_buf_tph(vdev, flags, arg,
+							 argsz);
 	default:
 		return -ENOTTY;
 	}
diff --git a/drivers/vfio/pci/vfio_pci_dmabuf.c b/drivers/vfio/pci/vfio_pci_dmabuf.c
--- a/drivers/vfio/pci/vfio_pci_dmabuf.c
+++ b/drivers/vfio/pci/vfio_pci_dmabuf.c
@@ -19,6 +19,9 @@ struct vfio_pci_dma_buf {
 	u32 nr_ranges;
 	struct kref kref;
 	struct completion comp;
+	u16 steering_tag;
+	u8 ph;
+	u8 tph_present : 1;
 	u8 revoked : 1;
 };

@@ -69,6 +72,22 @@ vfio_pci_dma_buf_map(struct dma_buf_attachment *attachment,
 	return ret;
 }

+static int vfio_pci_dma_buf_get_tph(struct dma_buf *dmabuf, u16 *steering_tag,
+				    u8 *ph, u8 st_width)
+{
+	struct vfio_pci_dma_buf *priv = dmabuf->priv;
+
+	if (!priv->tph_present)
+		return -EOPNOTSUPP;
+
+	if (st_width < 16 && priv->steering_tag > ((1U << st_width) - 1))
+		return -EINVAL;
+
+	*steering_tag = priv->steering_tag;
+	*ph = priv->ph;
+	return 0;
+}
+
 static void vfio_pci_dma_buf_unmap(struct dma_buf_attachment *attachment,
 				   struct sg_table *sgt,
 				   enum dma_data_direction dir)
@@ -101,6 +120,7 @@ static void vfio_pci_dma_buf_release(struct dma_buf *dmabuf)

 static const struct dma_buf_ops vfio_pci_dmabuf_ops = {
 	.attach = vfio_pci_dma_buf_attach,
+	.get_tph = vfio_pci_dma_buf_get_tph,
 	.map_dma_buf = vfio_pci_dma_buf_map,
 	.unmap_dma_buf = vfio_pci_dma_buf_unmap,
 	.release = vfio_pci_dma_buf_release,
@@ -331,6 +351,55 @@ int vfio_pci_core_feature_dma_buf(struct vfio_pci_core_device *vdev, u32 flags,
 	return ret;
 }

+int vfio_pci_core_feature_dma_buf_tph(struct vfio_pci_core_device *vdev,
+				      u32 flags,
+				      struct vfio_device_feature_dma_buf_tph __user *arg,
+				      size_t argsz)
+{
+	struct vfio_device_feature_dma_buf_tph set_tph;
+	struct vfio_pci_dma_buf *priv;
+	struct dma_buf *dmabuf;
+	int ret;
+
+	ret = vfio_check_feature(flags, argsz, VFIO_DEVICE_FEATURE_SET,
+				 sizeof(set_tph));
+	if (ret != 1)
+		return ret;
+
+	if (copy_from_user(&set_tph, arg, sizeof(set_tph)))
+		return -EFAULT;
+
+	if (set_tph.reserved)
+		return -EINVAL;
+
+	dmabuf = dma_buf_get(set_tph.dmabuf_fd);
+	if (IS_ERR(dmabuf))
+		return PTR_ERR(dmabuf);
+
+	if (dmabuf->ops != &vfio_pci_dmabuf_ops) {
+		ret = -EINVAL;
+		goto out_put;
+	}
+
+	priv = dmabuf->priv;
+	down_write(&vdev->memory_lock);
+	if (priv->vdev != vdev) {
+		ret = -EINVAL;
+		goto out_unlock;
+	}
+
+	priv->steering_tag = set_tph.steering_tag;
+	priv->ph = set_tph.ph;
+	priv->tph_present = 1;
+	ret = 0;
+
+out_unlock:
+	up_write(&vdev->memory_lock);
+out_put:
+	dma_buf_put(dmabuf);
+	return ret;
+}
+
 void vfio_pci_dma_buf_move(struct vfio_pci_core_device *vdev, bool revoked)
 {
 	struct vfio_pci_dma_buf *priv;
diff --git a/drivers/vfio/pci/vfio_pci_priv.h b/drivers/vfio/pci/vfio_pci_priv.h
--- a/drivers/vfio/pci/vfio_pci_priv.h
+++ b/drivers/vfio/pci/vfio_pci_priv.h
@@ -118,6 +118,10 @@ static inline bool vfio_pci_is_vga(struct pci_dev *pdev)
 int vfio_pci_core_feature_dma_buf(struct vfio_pci_core_device *vdev, u32 flags,
 				  struct vfio_device_feature_dma_buf __user *arg,
 				  size_t argsz);
+int vfio_pci_core_feature_dma_buf_tph(struct vfio_pci_core_device *vdev,
+				      u32 flags,
+				      struct vfio_device_feature_dma_buf_tph __user *arg,
+				      size_t argsz);
 void vfio_pci_dma_buf_cleanup(struct vfio_pci_core_device *vdev);
 void vfio_pci_dma_buf_move(struct vfio_pci_core_device *vdev, bool revoked);
 #else
@@ -128,6 +132,13 @@ vfio_pci_core_feature_dma_buf(struct vfio_pci_core_device *vdev, u32 flags,
 {
 	return -ENOTTY;
 }
+static inline int
+vfio_pci_core_feature_dma_buf_tph(struct vfio_pci_core_device *vdev, u32 flags,
+				  struct vfio_device_feature_dma_buf_tph __user *arg,
+				  size_t argsz)
+{
+	return -ENOTTY;
+}
 static inline void vfio_pci_dma_buf_cleanup(struct vfio_pci_core_device *vdev)
 {
 }
diff --git a/include/linux/dma-buf.h b/include/linux/dma-buf.h
--- a/include/linux/dma-buf.h
+++ b/include/linux/dma-buf.h
@@ -113,6 +113,23 @@ struct dma_buf_ops {
 	 */
 	void (*unpin)(struct dma_buf_attachment *attach);

+	/**
+	 * @get_tph:
+	 * @dmabuf: DMA buffer for which to retrieve TPH metadata
+	 * @steering_tag: Returns the raw TPH steering tag
+	 * @ph: Returns the TPH processing hint
+	 * @st_width: Consumer's supported steering tag width in bits (8 or 16)
+	 *
+	 * Return the TPH (TLP Processing Hints) metadata associated with this
+	 * DMA buffer. Exporters that do not provide TPH metadata should return
+	 * -EOPNOTSUPP. If the steering tag exceeds @st_width bits, return
+	 * -EINVAL.
+	 *
+	 * This callback is optional.
+	 */
+	int (*get_tph)(struct dma_buf *dmabuf, u16 *steering_tag, u8 *ph,
+		       u8 st_width);
+
 	/**
 	 * @map_dma_buf:
 	 *
diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -1534,6 +1534,28 @@ struct vfio_device_feature_dma_buf {
  */
 #define VFIO_DEVICE_FEATURE_MIG_PRECOPY_INFOv2 12

+/**
+ * Upon VFIO_DEVICE_FEATURE_SET associate TPH (TLP Processing Hints) metadata
+ * with a vfio-exported dma-buf. The dma-buf must have been created by
+ * VFIO_DEVICE_FEATURE_DMA_BUF on this device.
+ *
+ * dmabuf_fd is the file descriptor returned by VFIO_DEVICE_FEATURE_DMA_BUF.
+ * steering_tag and ph are the raw TPH values that importing drivers should use
+ * when accessing the buffer.
+ *
+ * The user must set TPH on the dma-buf before the importer consumes it.
+ *
+ * Return: 0 on success, -errno on failure.
+ */
+#define VFIO_DEVICE_FEATURE_DMA_BUF_TPH 13
+
+struct vfio_device_feature_dma_buf_tph {
+	__s32 dmabuf_fd;
+	__u16 steering_tag;
+	__u8 ph;
+	__u8 reserved;
+};
+
 /* -------- API for Type1 VFIO IOMMU -------- */

 /**
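
For readers who want to picture the userspace side of the new uAPI, the following is a rough sketch of how the SET request might be laid out. It assumes the usual VFIO_DEVICE_FEATURE convention of a header (argsz, flags) followed by the feature payload; the sketch_* names are hypothetical local mirrors of the uAPI definitions so that the example builds without patched headers, and feature number 13 is the value proposed in this patch, not a merged one.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical local mirrors of the uAPI constants. */
#define SKETCH_FEATURE_SET		(1u << 17)	/* VFIO_DEVICE_FEATURE_SET */
#define SKETCH_FEATURE_DMA_BUF_TPH	13u		/* proposed by this patch */

/* Mirrors struct vfio_device_feature: header followed by payload bytes. */
struct sketch_feature_hdr {
	uint32_t argsz;
	uint32_t flags;
	uint8_t data[];
};

/* Mirrors struct vfio_device_feature_dma_buf_tph from the patch. */
struct sketch_dma_buf_tph {
	int32_t dmabuf_fd;
	uint16_t steering_tag;
	uint8_t ph;
	uint8_t reserved;	/* must be zero, the kernel side returns -EINVAL otherwise */
};

/*
 * Build the request buffer. The caller would pass it to
 * ioctl(device_fd, VFIO_DEVICE_FEATURE, req) and free() it afterwards.
 */
static struct sketch_feature_hdr *build_tph_request(int dmabuf_fd,
						    uint16_t steering_tag,
						    uint8_t ph)
{
	struct sketch_dma_buf_tph tph = {
		.dmabuf_fd = dmabuf_fd,
		.steering_tag = steering_tag,
		.ph = ph,
		.reserved = 0,
	};
	size_t sz = sizeof(struct sketch_feature_hdr) + sizeof(tph);
	struct sketch_feature_hdr *req;

	/* The payload is expected to be 8 bytes with no padding. */
	assert(sizeof(tph) == 8);

	req = calloc(1, sz);
	if (!req)
		return NULL;

	req->argsz = (uint32_t)sz;
	req->flags = SKETCH_FEATURE_SET | SKETCH_FEATURE_DMA_BUF_TPH;
	memcpy(req->data, &tph, sizeof(tph));
	return req;
}
```

Per the commit message, this call has to happen after VFIO_DEVICE_FEATURE_DMA_BUF has returned the fd and before the importer consumes the dma-buf.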

* Re: [PATCH v2 1/2] vfio: add dma-buf get_tph callback and DMA_BUF_TPH feature
2026-04-30 20:06 ` [PATCH v2 1/2] vfio: add dma-buf get_tph callback and DMA_BUF_TPH feature Zhiping Zhang
@ 2026-05-04 21:44 ` Alex Williamson
2026-05-05 6:54 ` Zhiping Zhang
2026-05-06 6:58 ` fengchengwen
1 sibling, 1 reply; 9+ messages in thread
From: Alex Williamson @ 2026-05-04 21:44 UTC (permalink / raw)
To: Zhiping Zhang
Cc: Jason Gunthorpe, Leon Romanovsky, Bjorn Helgaas, linux-rdma,
linux-pci, netdev, dri-devel, Keith Busch, Yochai Cohen,
Yishai Hadas, alex

On Thu, 30 Apr 2026 13:06:56 -0700
Zhiping Zhang <zhipingz@meta.com> wrote:

> Add a dma-buf callback that returns raw TPH metadata from the exporter
> so peer devices can reuse the steering tag and processing hint
> associated with a VFIO-exported buffer.
>
> Add a new VFIO_DEVICE_FEATURE_DMA_BUF_TPH ioctl that takes the fd from
> VFIO_DEVICE_FEATURE_DMA_BUF along with a steering tag and processing
> hint, validates the fd is a vfio-exported dma-buf belonging to this
> device, and stores the TPH values under memory_lock. This keeps the
> existing VFIO_DEVICE_FEATURE_DMA_BUF uAPI completely unchanged.
>
> The user sequences setting TPH on the dma-buf before the importer
> consumes it.
>
> Add an st_width parameter to get_tph() so the exporter can reject
> steering tags that exceed the consumer's supported width (8 vs 16 bit).
> When no TPH metadata was supplied, get_tph() returns -EOPNOTSUPP.
>
> Signed-off-by: Zhiping Zhang <zhipingz@meta.com>

The uAPI is better, but sashiko has some review comments[1] for you.

Please also copy the kvm list for vfio related development. Thanks,

Alex

[1]https://sashiko.dev/#/patchset/20260430200704.352228-1-zhipingz@meta.com

> [...]

* Re: [PATCH v2 1/2] vfio: add dma-buf get_tph callback and DMA_BUF_TPH feature
2026-05-04 21:44 ` Alex Williamson
@ 2026-05-05 6:54 ` Zhiping Zhang
0 siblings, 0 replies; 9+ messages in thread
From: Zhiping Zhang @ 2026-05-05 6:54 UTC (permalink / raw)
To: Alex Williamson
Cc: Jason Gunthorpe, Leon Romanovsky, Bjorn Helgaas, linux-rdma,
linux-pci, netdev, dri-devel, Keith Busch, Yochai Cohen,
Yishai Hadas, kvm

On Mon, May 4, 2026 at 2:45 PM Alex Williamson <alex@shazbot.org> wrote:
>
> On Thu, 30 Apr 2026 13:06:56 -0700
> Zhiping Zhang <zhipingz@meta.com> wrote:
>
> > Add a dma-buf callback that returns raw TPH metadata from the exporter
> > so peer devices can reuse the steering tag and processing hint
> > associated with a VFIO-exported buffer.
> >
> > Add a new VFIO_DEVICE_FEATURE_DMA_BUF_TPH ioctl that takes the fd from
> > VFIO_DEVICE_FEATURE_DMA_BUF along with a steering tag and processing
> > hint, validates the fd is a vfio-exported dma-buf belonging to this
> > device, and stores the TPH values under memory_lock. This keeps the
> > existing VFIO_DEVICE_FEATURE_DMA_BUF uAPI completely unchanged.
> >
> > The user sequences setting TPH on the dma-buf before the importer
> > consumes it.
> >
> > Add an st_width parameter to get_tph() so the exporter can reject
> > steering tags that exceed the consumer's supported width (8 vs 16 bit).
> > When no TPH metadata was supplied, get_tph() returns -EOPNOTSUPP.
> >
> > Signed-off-by: Zhiping Zhang <zhipingz@meta.com>
>
> The uAPI is better, but sashiko has some review comments[1] for you.
>
> Please also copy the kvm list for vfio related development. Thanks,
>
> Alex

Got it, thanks Alex. Let me check sashiko's comments and post a new
revision. I also copied kvm@vger.kernel.org and will include it in future
revisions.

Zhiping

> [1]https://sashiko.dev/#/patchset/20260430200704.352228-1-zhipingz@meta.com
>
> > [...]
* Re: [PATCH v2 1/2] vfio: add dma-buf get_tph callback and DMA_BUF_TPH feature 2026-04-30 20:06 ` [PATCH v2 1/2] vfio: add dma-buf get_tph callback and DMA_BUF_TPH feature Zhiping Zhang 2026-05-04 21:44 ` Alex Williamson @ 2026-05-06 6:58 ` fengchengwen 2026-05-06 18:23 ` Zhiping Zhang 1 sibling, 1 reply; 9+ messages in thread From: fengchengwen @ 2026-05-06 6:58 UTC (permalink / raw) To: Zhiping Zhang, Alex Williamson, Jason Gunthorpe, Leon Romanovsky Cc: Bjorn Helgaas, linux-rdma, linux-pci, netdev, dri-devel, Keith Busch, Yochai Cohen, Yishai Hadas On 5/1/2026 4:06 AM, Zhiping Zhang wrote: > Add a dma-buf callback that returns raw TPH metadata from the exporter > so peer devices can reuse the steering tag and processing hint > associated with a VFIO-exported buffer. > > Add a new VFIO_DEVICE_FEATURE_DMA_BUF_TPH ioctl that takes the fd from > VFIO_DEVICE_FEATURE_DMA_BUF along with a steering tag and processing > hint, validates the fd is a vfio-exported dma-buf belonging to this > device, and stores the TPH values under memory_lock. This keeps the > existing VFIO_DEVICE_FEATURE_DMA_BUF uAPI completely unchanged. > > The user sequences setting TPH on the dma-buf before the importer > consumes it. > > Add an st_width parameter to get_tph() so the exporter can reject > steering tags that exceed the consumer's supported width (8 vs 16 bit). > When no TPH metadata was supplied, get_tph() returns -EOPNOTSUPP. 
> > Signed-off-by: Zhiping Zhang <zhipingz@meta.com> > > diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c > --- a/drivers/vfio/pci/vfio_pci_core.c > +++ b/drivers/vfio/pci/vfio_pci_core.c > @@ -1534,6 +1534,9 @@ int vfio_pci_core_ioctl_feature(struct vfio_device *device, u32 flags, > return vfio_pci_core_feature_token(vdev, flags, arg, argsz); > case VFIO_DEVICE_FEATURE_DMA_BUF: > return vfio_pci_core_feature_dma_buf(vdev, flags, arg, argsz); > + case VFIO_DEVICE_FEATURE_DMA_BUF_TPH: > + return vfio_pci_core_feature_dma_buf_tph(vdev, flags, arg, > + argsz); > default: > return -ENOTTY; > } > diff --git a/drivers/vfio/pci/vfio_pci_dmabuf.c b/drivers/vfio/pci/vfio_pci_dmabuf.c > --- a/drivers/vfio/pci/vfio_pci_dmabuf.c > +++ b/drivers/vfio/pci/vfio_pci_dmabuf.c > @@ -19,6 +19,9 @@ struct vfio_pci_dma_buf { > u32 nr_ranges; > struct kref kref; > struct completion comp; > + u16 steering_tag; > + u8 ph; > + u8 tph_present : 1; > u8 revoked : 1; > }; > > @@ -69,6 +72,22 @@ vfio_pci_dma_buf_map(struct dma_buf_attachment *attachment, > return ret; > } > > +static int vfio_pci_dma_buf_get_tph(struct dma_buf *dmabuf, u16 *steering_tag, > + u8 *ph, u8 st_width) > +{ > + struct vfio_pci_dma_buf *priv = dmabuf->priv; > + > + if (!priv->tph_present) > + return -EOPNOTSUPP; > + > + if (st_width < 16 && priv->steering_tag > ((1U << st_width) - 1)) > + return -EINVAL; The checker will failed in following cases: 1. If the exporter passed 8bit st, and importer support 16bit st, then it will pass the checker. 2. The exporter enabled 16bit st and its st is < 256 (note: the pcie protocol doesn't restrict 16bit-st must >=256), and importer only support 8bit st, then it will also pass the checker Suggest userspace passing both st(8bit) and extend-st(16bit), and importer chose the right one. 
> + > + *steering_tag = priv->steering_tag; > + *ph = priv->ph; > + return 0; > +} > + > static void vfio_pci_dma_buf_unmap(struct dma_buf_attachment *attachment, > struct sg_table *sgt, > enum dma_data_direction dir) > @@ -101,6 +120,7 @@ static void vfio_pci_dma_buf_release(struct dma_buf *dmabuf) > > static const struct dma_buf_ops vfio_pci_dmabuf_ops = { > .attach = vfio_pci_dma_buf_attach, > + .get_tph = vfio_pci_dma_buf_get_tph, > .map_dma_buf = vfio_pci_dma_buf_map, > .unmap_dma_buf = vfio_pci_dma_buf_unmap, > .release = vfio_pci_dma_buf_release, > @@ -331,6 +351,55 @@ int vfio_pci_core_feature_dma_buf(struct vfio_pci_core_device *vdev, u32 flags, > return ret; > } > > +int vfio_pci_core_feature_dma_buf_tph(struct vfio_pci_core_device *vdev, > + u32 flags, > + struct vfio_device_feature_dma_buf_tph __user *arg, > + size_t argsz) > +{ > + struct vfio_device_feature_dma_buf_tph set_tph; > + struct vfio_pci_dma_buf *priv; > + struct dma_buf *dmabuf; > + int ret; > + > + ret = vfio_check_feature(flags, argsz, VFIO_DEVICE_FEATURE_SET, > + sizeof(set_tph)); > + if (ret != 1) > + return ret; > + > + if (copy_from_user(&set_tph, arg, sizeof(set_tph))) > + return -EFAULT; > + > + if (set_tph.reserved) > + return -EINVAL; > + > + dmabuf = dma_buf_get(set_tph.dmabuf_fd); > + if (IS_ERR(dmabuf)) > + return PTR_ERR(dmabuf); > + > + if (dmabuf->ops != &vfio_pci_dmabuf_ops) { > + ret = -EINVAL; > + goto out_put; > + } > + > + priv = dmabuf->priv; > + down_write(&vdev->memory_lock); > + if (priv->vdev != vdev) { > + ret = -EINVAL; > + goto out_unlock; > + } > + > + priv->steering_tag = set_tph.steering_tag; > + priv->ph = set_tph.ph; > + priv->tph_present = 1; > + ret = 0; > + > +out_unlock: > + up_write(&vdev->memory_lock); > +out_put: > + dma_buf_put(dmabuf); > + return ret; > +} > + > void vfio_pci_dma_buf_move(struct vfio_pci_core_device *vdev, bool revoked) > { > struct vfio_pci_dma_buf *priv; > diff --git a/drivers/vfio/pci/vfio_pci_priv.h 
b/drivers/vfio/pci/vfio_pci_priv.h > --- a/drivers/vfio/pci/vfio_pci_priv.h > +++ b/drivers/vfio/pci/vfio_pci_priv.h > @@ -118,6 +118,10 @@ static inline bool vfio_pci_is_vga(struct pci_dev *pdev) > int vfio_pci_core_feature_dma_buf(struct vfio_pci_core_device *vdev, u32 flags, > struct vfio_device_feature_dma_buf __user *arg, > size_t argsz); > +int vfio_pci_core_feature_dma_buf_tph(struct vfio_pci_core_device *vdev, > + u32 flags, > + struct vfio_device_feature_dma_buf_tph __user *arg, > + size_t argsz); > void vfio_pci_dma_buf_cleanup(struct vfio_pci_core_device *vdev); > void vfio_pci_dma_buf_move(struct vfio_pci_core_device *vdev, bool revoked); > #else > @@ -128,6 +132,13 @@ vfio_pci_core_feature_dma_buf(struct vfio_pci_core_device *vdev, u32 flags, > { > return -ENOTTY; > } > +static inline int > +vfio_pci_core_feature_dma_buf_tph(struct vfio_pci_core_device *vdev, u32 flags, > + struct vfio_device_feature_dma_buf_tph __user *arg, > + size_t argsz) > +{ > + return -ENOTTY; > +} > static inline void vfio_pci_dma_buf_cleanup(struct vfio_pci_core_device *vdev) > { > } > diff --git a/include/linux/dma-buf.h b/include/linux/dma-buf.h > --- a/include/linux/dma-buf.h > +++ b/include/linux/dma-buf.h > @@ -113,6 +113,23 @@ struct dma_buf_ops { > */ > void (*unpin)(struct dma_buf_attachment *attach); > > + /** > + * @get_tph: > + * @dmabuf: DMA buffer for which to retrieve TPH metadata > + * @steering_tag: Returns the raw TPH steering tag > + * @ph: Returns the TPH processing hint > + * @st_width: Consumer's supported steering tag width in bits (8 or 16) > + * > + * Return the TPH (TLP Processing Hints) metadata associated with this > + * DMA buffer. Exporters that do not provide TPH metadata should return > + * -EOPNOTSUPP. If the steering tag exceeds @st_width bits, return > + * -EINVAL. > + * > + * This callback is optional. 
> + */ > + int (*get_tph)(struct dma_buf *dmabuf, u16 *steering_tag, u8 *ph, > + u8 st_width); > + > /** > * @map_dma_buf: > * > diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h > --- a/include/uapi/linux/vfio.h > +++ b/include/uapi/linux/vfio.h > @@ -1534,6 +1534,28 @@ struct vfio_device_feature_dma_buf { > */ > #define VFIO_DEVICE_FEATURE_MIG_PRECOPY_INFOv2 12 > > +/** > + * Upon VFIO_DEVICE_FEATURE_SET associate TPH (TLP Processing Hints) metadata > + * with a vfio-exported dma-buf. The dma-buf must have been created by > + * VFIO_DEVICE_FEATURE_DMA_BUF on this device. > + * > + * dmabuf_fd is the file descriptor returned by VFIO_DEVICE_FEATURE_DMA_BUF. > + * steering_tag and ph are the raw TPH values that importing drivers should use > + * when accessing the buffer. > + * > + * The user must set TPH on the dma-buf before the importer consumes it. > + * > + * Return: 0 on success, -errno on failure. > + */ > +#define VFIO_DEVICE_FEATURE_DMA_BUF_TPH 13 > + > +struct vfio_device_feature_dma_buf_tph { > + __s32 dmabuf_fd; > + __u16 steering_tag; > + __u8 ph; > + __u8 reserved; > +}; > + > /* -------- API for Type1 VFIO IOMMU -------- */ > > /** > > ^ permalink raw reply [flat|nested] 9+ messages in thread
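Editor's sketch: the uAPI comment in the quoted patch says userspace issues a VFIO_DEVICE_FEATURE_SET with the dma-buf fd, steering tag, and processing hint. A minimal userspace sketch of that call follows, assuming the v2 layout from this patch. The feature number 13 and the payload layout come from the patch and are not in released headers, so they are defined locally; `struct tph_payload`, `tph_feature_alloc()`, and `tph_feature_set()` are hypothetical names used only for illustration.

```c
#include <linux/vfio.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>

/* Proposed by this patch (v2); defined locally because it is not upstream. */
#ifndef VFIO_DEVICE_FEATURE_DMA_BUF_TPH
#define VFIO_DEVICE_FEATURE_DMA_BUF_TPH 13
#endif

/* Hypothetical local mirror of the v2 struct vfio_device_feature_dma_buf_tph. */
struct tph_payload {
	int32_t dmabuf_fd;
	uint16_t steering_tag;
	uint8_t ph;
	uint8_t reserved;	/* must be zero, the kernel side rejects non-zero */
};

/* Build the feature header followed by the payload. The caller frees it. */
struct vfio_device_feature *tph_feature_alloc(int dmabuf_fd, uint16_t st,
					       uint8_t ph)
{
	size_t sz = sizeof(struct vfio_device_feature) +
		    sizeof(struct tph_payload);
	struct tph_payload payload = {
		.dmabuf_fd = dmabuf_fd,
		.steering_tag = st,
		.ph = ph,
	};
	struct vfio_device_feature *feat = calloc(1, sz);

	if (!feat)
		return NULL;

	/* argsz covers the header and the payload together. */
	feat->argsz = (uint32_t)sz;
	feat->flags = VFIO_DEVICE_FEATURE_SET | VFIO_DEVICE_FEATURE_DMA_BUF_TPH;
	memcpy(feat->data, &payload, sizeof(payload));
	return feat;
}

/* Issue the SET on a vfio device fd. Returns 0, or -1 with errno set. */
int tph_feature_set(int device_fd, int dmabuf_fd, uint16_t st, uint8_t ph)
{
	struct vfio_device_feature *feat = tph_feature_alloc(dmabuf_fd, st, ph);
	int ret;

	if (!feat)
		return -1;

	ret = ioctl(device_fd, VFIO_DEVICE_FEATURE, feat);
	free(feat);
	return ret;
}
```

Per the uAPI comment, this would be called after VFIO_DEVICE_FEATURE_DMA_BUF has returned the dma-buf fd and before the importer registers the buffer.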
* Re: [PATCH v2 1/2] vfio: add dma-buf get_tph callback and DMA_BUF_TPH feature
2026-05-06 6:58 ` fengchengwen
@ 2026-05-06 18:23 ` Zhiping Zhang
0 siblings, 0 replies; 9+ messages in thread
From: Zhiping Zhang @ 2026-05-06 18:23 UTC (permalink / raw)
To: fengchengwen
Cc: Alex Williamson, Jason Gunthorpe, Leon Romanovsky, Bjorn Helgaas,
linux-rdma, linux-pci, netdev, dri-devel, Keith Busch, Yochai Cohen,
Yishai Hadas, kvm

On Tue, May 5, 2026 at 11:58 PM fengchengwen <fengchengwen@huawei.com> wrote:
>
>
>
> On 5/1/2026 4:06 AM, Zhiping Zhang wrote:
> > Add a dma-buf callback that returns raw TPH metadata from the exporter
> > so peer devices can reuse the steering tag and processing hint
> > associated with a VFIO-exported buffer.
> >
> > Add a new VFIO_DEVICE_FEATURE_DMA_BUF_TPH ioctl that takes the fd from
> > VFIO_DEVICE_FEATURE_DMA_BUF along with a steering tag and processing
> > hint, validates the fd is a vfio-exported dma-buf belonging to this
> > device, and stores the TPH values under memory_lock. This keeps the
> > existing VFIO_DEVICE_FEATURE_DMA_BUF uAPI completely unchanged.
> >
> > The user sequences setting TPH on the dma-buf before the importer
> > consumes it.
> >
> > Add an st_width parameter to get_tph() so the exporter can reject
> > steering tags that exceed the consumer's supported width (8 vs 16 bit).
> > When no TPH metadata was supplied, get_tph() returns -EOPNOTSUPP.
> >
> > Signed-off-by: Zhiping Zhang <zhipingz@meta.com>
> >
> > diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
> > --- a/drivers/vfio/pci/vfio_pci_core.c
> > +++ b/drivers/vfio/pci/vfio_pci_core.c
> > @@ -1534,6 +1534,9 @@ int vfio_pci_core_ioctl_feature(struct vfio_device *device, u32 flags,
> >  		return vfio_pci_core_feature_token(vdev, flags, arg, argsz);
> >  	case VFIO_DEVICE_FEATURE_DMA_BUF:
> >  		return vfio_pci_core_feature_dma_buf(vdev, flags, arg, argsz);
> > +	case VFIO_DEVICE_FEATURE_DMA_BUF_TPH:
> > +		return vfio_pci_core_feature_dma_buf_tph(vdev, flags, arg,
> > +							 argsz);
> >  	default:
> >  		return -ENOTTY;
> >  	}
> > diff --git a/drivers/vfio/pci/vfio_pci_dmabuf.c b/drivers/vfio/pci/vfio_pci_dmabuf.c
> > --- a/drivers/vfio/pci/vfio_pci_dmabuf.c
> > +++ b/drivers/vfio/pci/vfio_pci_dmabuf.c
> > @@ -19,6 +19,9 @@ struct vfio_pci_dma_buf {
> >  	u32 nr_ranges;
> >  	struct kref kref;
> >  	struct completion comp;
> > +	u16 steering_tag;
> > +	u8 ph;
> > +	u8 tph_present : 1;
> >  	u8 revoked : 1;
> >  };
> >
> > @@ -69,6 +72,22 @@ vfio_pci_dma_buf_map(struct dma_buf_attachment *attachment,
> >  	return ret;
> >  }
> >
> > +static int vfio_pci_dma_buf_get_tph(struct dma_buf *dmabuf, u16 *steering_tag,
> > +				    u8 *ph, u8 st_width)
> > +{
> > +	struct vfio_pci_dma_buf *priv = dmabuf->priv;
> > +
> > +	if (!priv->tph_present)
> > +		return -EOPNOTSUPP;
> > +
> > +	if (st_width < 16 && priv->steering_tag > ((1U << st_width) - 1))
> > +		return -EINVAL;
>
> The checker will failed in following cases:
> 1. If the exporter passed 8bit st, and importer support 16bit st, then it will pass
> the checker.
> 2. The exporter enabled 16bit st and its st is < 256 (note: the pcie protocol doesn't
> restrict 16bit-st must >=256), and importer only support 8bit st, then it will also
> pass the checker
>
> Suggest userspace passing both st(8bit) and extend-st(16bit), and importer chose the
> right one.
>

Agreed. 8-bit ST and 16-bit Extended ST are distinct namespaces (firmware
returns them as separate fields with separate validity bits), so a numeric
range check is insufficient.

For v3 I'll change the uAPI to carry both, gated by a flags field:

#define VFIO_DMA_BUF_TPH_ST     (1 << 0) /* steering_tag valid */
#define VFIO_DMA_BUF_TPH_ST_EXT (1 << 1) /* steering_tag_ext valid */

struct vfio_device_feature_dma_buf_tph {
	__s32 dmabuf_fd;
	__u32 flags;
	__u16 steering_tag;     /* 8-bit ST */
	__u16 steering_tag_ext; /* 16-bit Extended ST */
	__u8 ph;
	__u8 reserved[3];
};

get_tph() then picks the field matching the importer's st_width and
returns -EOPNOTSUPP if that one isn't valid.

Thanks,
Zhiping

^ permalink raw reply [flat|nested] 9+ messages in thread
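Editor's sketch: the reply describes get_tph() picking the steering-tag field that matches the importer's st_width and returning -EOPNOTSUPP when that field is not marked valid. The selection can be sketched in plain C as below. The flag names and field layout follow the v3 proposal in the reply; `struct tph_state` and `tph_pick()` are hypothetical stand-ins for the exporter's private state and callback, not the actual v3 code, and the -EINVAL for an unexpected width is an assumption.

```c
#include <errno.h>
#include <stdint.h>

/* Flag values as proposed for v3 in the reply. */
#define VFIO_DMA_BUF_TPH_ST     (1u << 0) /* steering_tag valid */
#define VFIO_DMA_BUF_TPH_ST_EXT (1u << 1) /* steering_tag_ext valid */

/* Hypothetical exporter-side state; not the real struct vfio_pci_dma_buf. */
struct tph_state {
	uint32_t flags;
	uint16_t steering_tag;     /* 8-bit ST */
	uint16_t steering_tag_ext; /* 16-bit Extended ST */
	uint8_t ph;
};

/*
 * Return the tag from the namespace that matches the importer's width.
 * No numeric range check is involved: validity comes from the flags only.
 */
int tph_pick(const struct tph_state *s, uint8_t st_width,
	     uint16_t *steering_tag, uint8_t *ph)
{
	switch (st_width) {
	case 8:
		if (!(s->flags & VFIO_DMA_BUF_TPH_ST))
			return -EOPNOTSUPP;
		*steering_tag = s->steering_tag;
		break;
	case 16:
		if (!(s->flags & VFIO_DMA_BUF_TPH_ST_EXT))
			return -EOPNOTSUPP;
		*steering_tag = s->steering_tag_ext;
		break;
	default:
		/* Assumption: widths other than 8 or 16 are rejected. */
		return -EINVAL;
	}

	*ph = s->ph;
	return 0;
}
```

With this shape, an exporter that supplied only an 8-bit ST is never handed to a 16-bit importer, and a small-valued Extended ST is never mistaken for an 8-bit one, which are the two cases the review pointed out.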
* [PATCH v2 2/2] RDMA/mlx5: get tph for p2p access when registering dma-buf mr
2026-04-30 20:06 [PATCH v2 0/2] vfio/dma-buf: add TPH support for peer-to-peer access Zhiping Zhang
2026-04-30 20:06 ` [PATCH v2 1/2] vfio: add dma-buf get_tph callback and DMA_BUF_TPH feature Zhiping Zhang
@ 2026-04-30 20:06 ` Zhiping Zhang
2026-05-06 7:04 ` fengchengwen
1 sibling, 1 reply; 9+ messages in thread
From: Zhiping Zhang @ 2026-04-30 20:06 UTC (permalink / raw)
To: Alex Williamson, Jason Gunthorpe, Leon Romanovsky
Cc: Bjorn Helgaas, linux-rdma, linux-pci, netdev, dri-devel,
Keith Busch, Yochai Cohen, Yishai Hadas, Zhiping Zhang

Query dma-buf TPH metadata when registering a dma-buf MR for peer to
peer access and translate the raw steering tag into an mlx5 steering tag
index. Factor mlx5_st_alloc_index() so callers that already have a raw
steering tag can allocate the corresponding mlx5 index directly. Keep the
DMAH path as the first priority and only fall back to dma-buf metadata when
no DMAH is supplied.

Pass the device's supported ST width (8 or 16 bit, derived from
pdev->tph_req_type) to get_tph() so the exporter can reject tags that
exceed the consumer's capability. Initialize ret in mlx5_st_create() so the
cached steering-tag path returns success cleanly under clang builds.
Signed-off-by: Zhiping Zhang <zhipingz@meta.com> diff --git a/drivers/infiniband/hw/mlx5/mr.c b/drivers/infiniband/hw/mlx5/mr.c --- a/drivers/infiniband/hw/mlx5/mr.c +++ b/drivers/infiniband/hw/mlx5/mr.c @@ -46,6 +46,8 @@ #include "data_direct.h" #include "dmah.h" +MODULE_IMPORT_NS("DMA_BUF"); + static int mkey_max_umr_order(struct mlx5_ib_dev *dev) { if (MLX5_CAP_GEN(dev->mdev, umr_extended_translation_offset)) @@ -899,6 +901,40 @@ static struct dma_buf_attach_ops mlx5_ib_dmabuf_attach_ops = { .invalidate_mappings = mlx5_ib_dmabuf_invalidate_cb, }; +static void get_tph_mr_dmabuf(struct mlx5_ib_dev *dev, int fd, u16 *st_index, + u8 *ph) +{ + struct pci_dev *pdev = dev->mdev->pdev; + struct dma_buf *dmabuf; + u16 steering_tag; + u8 st_width; + int ret; + + st_width = (pdev->tph_req_type == PCI_TPH_REQ_EXT_TPH) ? 16 : 8; + + dmabuf = dma_buf_get(fd); + if (IS_ERR(dmabuf)) + return; + + if (!dmabuf->ops->get_tph) + goto end_dbuf_put; + + ret = dmabuf->ops->get_tph(dmabuf, &steering_tag, ph, st_width); + if (ret) { + mlx5_ib_dbg(dev, "get_tph failed (%d)\n", ret); + goto end_dbuf_put; + } + + ret = mlx5_st_alloc_index_by_tag(dev->mdev, steering_tag, st_index); + if (ret) { + *ph = MLX5_IB_NO_PH; + mlx5_ib_dbg(dev, "st_alloc_index_by_tag failed (%d)\n", ret); + } + +end_dbuf_put: + dma_buf_put(dmabuf); +} + static struct ib_mr * reg_user_mr_dmabuf(struct ib_pd *pd, struct device *dma_device, u64 offset, u64 length, u64 virt_addr, @@ -941,6 +977,8 @@ reg_user_mr_dmabuf(struct ib_pd *pd, struct device *dma_device, ph = dmah->ph; if (dmah->valid_fields & BIT(IB_DMAH_CPU_ID_EXISTS)) st_index = mdmah->st_index; + } else { + get_tph_mr_dmabuf(dev, fd, &st_index, &ph); } mr = alloc_cacheable_mr(pd, &umem_dmabuf->umem, virt_addr, diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lib/st.c b/drivers/net/ethernet/mellanox/mlx5/core/lib/st.c --- a/drivers/net/ethernet/mellanox/mlx5/core/lib/st.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/st.c @@ -29,7 +29,7 @@ struct 
mlx5_st *mlx5_st_create(struct mlx5_core_dev *dev) u8 direct_mode = 0; u16 num_entries; u32 tbl_loc; - int ret; + int ret = 0; if (!MLX5_CAP_GEN(dev, mkey_pcie_tph)) return NULL; @@ -92,23 +92,18 @@ void mlx5_st_destroy(struct mlx5_core_dev *dev) kfree(st); } -int mlx5_st_alloc_index(struct mlx5_core_dev *dev, enum tph_mem_type mem_type, - unsigned int cpu_uid, u16 *st_index) +int mlx5_st_alloc_index_by_tag(struct mlx5_core_dev *dev, u16 tag, + u16 *st_index) { struct mlx5_st_idx_data *idx_data; struct mlx5_st *st = dev->st; unsigned long index; u32 xa_id; - u16 tag; - int ret; + int ret = 0; if (!st) return -EOPNOTSUPP; - ret = pcie_tph_get_cpu_st(dev->pdev, mem_type, cpu_uid, &tag); - if (ret) - return ret; - if (st->direct_mode) { *st_index = tag; return 0; @@ -152,6 +147,20 @@ int mlx5_st_alloc_index(struct mlx5_core_dev *dev, enum tph_mem_type mem_type, mutex_unlock(&st->lock); return ret; } +EXPORT_SYMBOL_GPL(mlx5_st_alloc_index_by_tag); + +int mlx5_st_alloc_index(struct mlx5_core_dev *dev, enum tph_mem_type mem_type, + unsigned int cpu_uid, u16 *st_index) +{ + u16 tag; + int ret; + + ret = pcie_tph_get_cpu_st(dev->pdev, mem_type, cpu_uid, &tag); + if (ret) + return ret; + + return mlx5_st_alloc_index_by_tag(dev, tag, st_index); +} EXPORT_SYMBOL_GPL(mlx5_st_alloc_index); int mlx5_st_dealloc_index(struct mlx5_core_dev *dev, u16 st_index) diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h --- a/include/linux/mlx5/driver.h +++ b/include/linux/mlx5/driver.h @@ -1166,10 +1166,17 @@ int mlx5_dm_sw_icm_dealloc(struct mlx5_core_dev *dev, enum mlx5_sw_icm_type type u64 length, u16 uid, phys_addr_t addr, u32 obj_id); #ifdef CONFIG_PCIE_TPH +int mlx5_st_alloc_index_by_tag(struct mlx5_core_dev *dev, u16 tag, + u16 *st_index); int mlx5_st_alloc_index(struct mlx5_core_dev *dev, enum tph_mem_type mem_type, unsigned int cpu_uid, u16 *st_index); int mlx5_st_dealloc_index(struct mlx5_core_dev *dev, u16 st_index); #else +static inline int 
mlx5_st_alloc_index_by_tag(struct mlx5_core_dev *dev, + u16 tag, u16 *st_index) +{ + return -EOPNOTSUPP; +} static inline int mlx5_st_alloc_index(struct mlx5_core_dev *dev, enum tph_mem_type mem_type, unsigned int cpu_uid, u16 *st_index) ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH v2 2/2] RDMA/mlx5: get tph for p2p access when registering dma-buf mr
2026-04-30 20:06 ` [PATCH v2 2/2] RDMA/mlx5: get tph for p2p access when registering dma-buf mr Zhiping Zhang
@ 2026-05-06 7:04 ` fengchengwen
2026-05-06 18:13 ` Zhiping Zhang
0 siblings, 1 reply; 9+ messages in thread
From: fengchengwen @ 2026-05-06 7:04 UTC (permalink / raw)
To: Zhiping Zhang, Alex Williamson, Jason Gunthorpe, Leon Romanovsky
Cc: Bjorn Helgaas, linux-rdma, linux-pci, netdev, dri-devel,
Keith Busch, Yochai Cohen, Yishai Hadas

On 5/1/2026 4:06 AM, Zhiping Zhang wrote:
> Query dma-buf TPH metadata when registering a dma-buf MR for peer to
> peer access and translate the raw steering tag into an mlx5 steering tag
> index. Factor mlx5_st_alloc_index() so callers that already have a raw
> steering tag can allocate the corresponding mlx5 index directly. Keep the
> DMAH path as the first priority and only fall back to dma-buf metadata when
> no DMAH is supplied.
>
> Pass the device's supported ST width (8 or 16 bit, derived from
> pdev->tph_req_type) to get_tph() so the exporter can reject tags that
> exceed the consumer's capability. Initialize ret in mlx5_st_create() so the
> cached steering-tag path returns success cleanly under clang builds.
> > Signed-off-by: Zhiping Zhang <zhipingz@meta.com> > > diff --git a/drivers/infiniband/hw/mlx5/mr.c b/drivers/infiniband/hw/mlx5/mr.c > --- a/drivers/infiniband/hw/mlx5/mr.c > +++ b/drivers/infiniband/hw/mlx5/mr.c > @@ -46,6 +46,8 @@ > #include "data_direct.h" > #include "dmah.h" > > +MODULE_IMPORT_NS("DMA_BUF"); > + > static int mkey_max_umr_order(struct mlx5_ib_dev *dev) > { > if (MLX5_CAP_GEN(dev->mdev, umr_extended_translation_offset)) > @@ -899,6 +901,40 @@ static struct dma_buf_attach_ops mlx5_ib_dmabuf_attach_ops = { > .invalidate_mappings = mlx5_ib_dmabuf_invalidate_cb, > }; > > +static void get_tph_mr_dmabuf(struct mlx5_ib_dev *dev, int fd, u16 *st_index, > + u8 *ph) > +{ > + struct pci_dev *pdev = dev->mdev->pdev; > + struct dma_buf *dmabuf; > + u16 steering_tag; > + u8 st_width; > + int ret; > + > + st_width = (pdev->tph_req_type == PCI_TPH_REQ_EXT_TPH) ? 16 : 8; The tph_req_type is defined under CONFIG_PCIE_TPH, how about add a wrap function to query it. > + > + dmabuf = dma_buf_get(fd); > + if (IS_ERR(dmabuf)) > + return; > + > + if (!dmabuf->ops->get_tph) > + goto end_dbuf_put; > + > + ret = dmabuf->ops->get_tph(dmabuf, &steering_tag, ph, st_width); > + if (ret) { > + mlx5_ib_dbg(dev, "get_tph failed (%d)\n", ret); > + goto end_dbuf_put; > + } > + > + ret = mlx5_st_alloc_index_by_tag(dev->mdev, steering_tag, st_index); > + if (ret) { > + *ph = MLX5_IB_NO_PH; > + mlx5_ib_dbg(dev, "st_alloc_index_by_tag failed (%d)\n", ret); > + } > + > +end_dbuf_put: > + dma_buf_put(dmabuf); > +} > + > static struct ib_mr * > reg_user_mr_dmabuf(struct ib_pd *pd, struct device *dma_device, > u64 offset, u64 length, u64 virt_addr, > @@ -941,6 +977,8 @@ reg_user_mr_dmabuf(struct ib_pd *pd, struct device *dma_device, > ph = dmah->ph; > if (dmah->valid_fields & BIT(IB_DMAH_CPU_ID_EXISTS)) > st_index = mdmah->st_index; > + } else { > + get_tph_mr_dmabuf(dev, fd, &st_index, &ph); > } > > mr = alloc_cacheable_mr(pd, &umem_dmabuf->umem, virt_addr, > diff --git 
a/drivers/net/ethernet/mellanox/mlx5/core/lib/st.c b/drivers/net/ethernet/mellanox/mlx5/core/lib/st.c > --- a/drivers/net/ethernet/mellanox/mlx5/core/lib/st.c > +++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/st.c > @@ -29,7 +29,7 @@ struct mlx5_st *mlx5_st_create(struct mlx5_core_dev *dev) > u8 direct_mode = 0; > u16 num_entries; > u32 tbl_loc; > - int ret; > + int ret = 0; > > if (!MLX5_CAP_GEN(dev, mkey_pcie_tph)) > return NULL; > @@ -92,23 +92,18 @@ void mlx5_st_destroy(struct mlx5_core_dev *dev) > kfree(st); > } > > -int mlx5_st_alloc_index(struct mlx5_core_dev *dev, enum tph_mem_type mem_type, > - unsigned int cpu_uid, u16 *st_index) > +int mlx5_st_alloc_index_by_tag(struct mlx5_core_dev *dev, u16 tag, > + u16 *st_index) > { > struct mlx5_st_idx_data *idx_data; > struct mlx5_st *st = dev->st; > unsigned long index; > u32 xa_id; > - u16 tag; > - int ret; > + int ret = 0; > > if (!st) > return -EOPNOTSUPP; > > - ret = pcie_tph_get_cpu_st(dev->pdev, mem_type, cpu_uid, &tag); > - if (ret) > - return ret; > - > if (st->direct_mode) { > *st_index = tag; > return 0; > @@ -152,6 +147,20 @@ int mlx5_st_alloc_index(struct mlx5_core_dev *dev, enum tph_mem_type mem_type, > mutex_unlock(&st->lock); > return ret; > } > +EXPORT_SYMBOL_GPL(mlx5_st_alloc_index_by_tag); > + > +int mlx5_st_alloc_index(struct mlx5_core_dev *dev, enum tph_mem_type mem_type, > + unsigned int cpu_uid, u16 *st_index) > +{ > + u16 tag; > + int ret; > + > + ret = pcie_tph_get_cpu_st(dev->pdev, mem_type, cpu_uid, &tag); > + if (ret) > + return ret; > + > + return mlx5_st_alloc_index_by_tag(dev, tag, st_index); > +} > EXPORT_SYMBOL_GPL(mlx5_st_alloc_index); > > int mlx5_st_dealloc_index(struct mlx5_core_dev *dev, u16 st_index) > diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h > --- a/include/linux/mlx5/driver.h > +++ b/include/linux/mlx5/driver.h > @@ -1166,10 +1166,17 @@ int mlx5_dm_sw_icm_dealloc(struct mlx5_core_dev *dev, enum mlx5_sw_icm_type type > u64 length, u16 uid, 
phys_addr_t addr, u32 obj_id); > > #ifdef CONFIG_PCIE_TPH > +int mlx5_st_alloc_index_by_tag(struct mlx5_core_dev *dev, u16 tag, > + u16 *st_index); > int mlx5_st_alloc_index(struct mlx5_core_dev *dev, enum tph_mem_type mem_type, > unsigned int cpu_uid, u16 *st_index); > int mlx5_st_dealloc_index(struct mlx5_core_dev *dev, u16 st_index); > #else > +static inline int mlx5_st_alloc_index_by_tag(struct mlx5_core_dev *dev, > + u16 tag, u16 *st_index) > +{ > + return -EOPNOTSUPP; > +} > static inline int mlx5_st_alloc_index(struct mlx5_core_dev *dev, > enum tph_mem_type mem_type, > unsigned int cpu_uid, u16 *st_index) > > ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH v2 2/2] RDMA/mlx5: get tph for p2p access when registering dma-buf mr
2026-05-06 7:04 ` fengchengwen
@ 2026-05-06 18:13 ` Zhiping Zhang
0 siblings, 0 replies; 9+ messages in thread
From: Zhiping Zhang @ 2026-05-06 18:13 UTC (permalink / raw)
To: fengchengwen
Cc: Alex Williamson, Jason Gunthorpe, Leon Romanovsky, Bjorn Helgaas,
linux-rdma, linux-pci, netdev, dri-devel, Keith Busch, Yochai Cohen,
Yishai Hadas, kvm

On Wed, May 6, 2026 at 12:04 AM fengchengwen <fengchengwen@huawei.com> wrote:
>
>
>
> On 5/1/2026 4:06 AM, Zhiping Zhang wrote:
> > Query dma-buf TPH metadata when registering a dma-buf MR for peer to
> > peer access and translate the raw steering tag into an mlx5 steering tag
> > index. Factor mlx5_st_alloc_index() so callers that already have a raw
> > steering tag can allocate the corresponding mlx5 index directly. Keep the
> > DMAH path as the first priority and only fall back to dma-buf metadata when
> > no DMAH is supplied.
> >
> > Pass the device's supported ST width (8 or 16 bit, derived from
> > pdev->tph_req_type) to get_tph() so the exporter can reject tags that
> > exceed the consumer's capability. Initialize ret in mlx5_st_create() so the
> > cached steering-tag path returns success cleanly under clang builds.
> >
> > Signed-off-by: Zhiping Zhang <zhipingz@meta.com>
> >
> > diff --git a/drivers/infiniband/hw/mlx5/mr.c b/drivers/infiniband/hw/mlx5/mr.c
> > --- a/drivers/infiniband/hw/mlx5/mr.c
> > +++ b/drivers/infiniband/hw/mlx5/mr.c
> > @@ -46,6 +46,8 @@
> >  #include "data_direct.h"
> >  #include "dmah.h"
> >
> > +MODULE_IMPORT_NS("DMA_BUF");
> > +
> >  static int mkey_max_umr_order(struct mlx5_ib_dev *dev)
> >  {
> >  	if (MLX5_CAP_GEN(dev->mdev, umr_extended_translation_offset))
> > @@ -899,6 +901,40 @@ static struct dma_buf_attach_ops mlx5_ib_dmabuf_attach_ops = {
> >  	.invalidate_mappings = mlx5_ib_dmabuf_invalidate_cb,
> >  };
> >
> > +static void get_tph_mr_dmabuf(struct mlx5_ib_dev *dev, int fd, u16 *st_index,
> > +			      u8 *ph)
> > +{
> > +	struct pci_dev *pdev = dev->mdev->pdev;
> > +	struct dma_buf *dmabuf;
> > +	u16 steering_tag;
> > +	u8 st_width;
> > +	int ret;
> > +
> > +	st_width = (pdev->tph_req_type == PCI_TPH_REQ_EXT_TPH) ? 16 : 8;
>
> The tph_req_type is defined under CONFIG_PCIE_TPH, how about add a wrap function
> to query it.
>

Good catch! The direct dereference here will break the build when TPH is
disabled. I'll add a small wrapper in include/linux/pci-tph.h alongside the
existing helpers, e.g.:

#ifdef CONFIG_PCIE_TPH
u8 pcie_tph_get_st_width(struct pci_dev *pdev);
#else
static inline u8 pcie_tph_get_st_width(struct pci_dev *pdev)
{ return 0; }
#endif

with the implementation in drivers/pci/pcie/tph.c returning 16 for
PCI_TPH_REQ_EXT_TPH and 8 otherwise. Then get_tph_mr_dmabuf() becomes:

	st_width = pcie_tph_get_st_width(pdev);
	if (!st_width)
		goto end_dbuf_put;

which also gives us a clean early-out when TPH isn't supported on the
device.

Will fix in v3.

Thanks,
Zhiping

^ permalink raw reply [flat|nested] 9+ messages in thread
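Editor's sketch: the reply proposes a config-guarded wrapper that reports the steering-tag width and a caller that bails out when the width is 0. The same pattern can be shown as a small userspace mock. Everything prefixed `mock_` or `MOCK_` is hypothetical: the struct, the constants, and the config macro only stand in for struct pci_dev, the PCI_TPH_REQ_* values, and CONFIG_PCIE_TPH. One thing worth noting: in the v2 function st_width is computed before dma_buf_get(), so an early exit at that point would presumably need to be a plain return (as in the mock) unless the lookup is moved after dma_buf_get().

```c
#include <errno.h>
#include <stdint.h>

/* Stand-in for struct pci_dev, carrying only the field used here. */
struct mock_pci_dev {
	uint8_t tph_req_type;
};

/* Stand-in requester-type values, assumed for this sketch only. */
enum {
	MOCK_TPH_REQ_TPH_ONLY = 1,
	MOCK_TPH_REQ_EXT_TPH = 3,
};

/* Stand-in for CONFIG_PCIE_TPH; build with -DMOCK_CONFIG_PCIE_TPH=0 to test the stub. */
#ifndef MOCK_CONFIG_PCIE_TPH
#define MOCK_CONFIG_PCIE_TPH 1
#endif

#if MOCK_CONFIG_PCIE_TPH
/* 16 for Extended TPH requesters and 8 otherwise, as described in the reply. */
uint8_t mock_tph_get_st_width(const struct mock_pci_dev *pdev)
{
	return pdev->tph_req_type == MOCK_TPH_REQ_EXT_TPH ? 16 : 8;
}
#else
/* Stub: the field is never touched, so the build works without TPH support. */
uint8_t mock_tph_get_st_width(const struct mock_pci_dev *pdev)
{
	(void)pdev;
	return 0;
}
#endif

/* Caller side: a width of 0 means "no TPH", so skip the lookup entirely. */
int mock_tph_lookup_width(const struct mock_pci_dev *pdev, uint8_t *st_width)
{
	*st_width = mock_tph_get_st_width(pdev);
	if (!*st_width)
		return -EOPNOTSUPP; /* nothing acquired yet, so just return */
	return 0;
}
```

The caller never references tph_req_type directly, which is the point of the review comment: only the wrapper depends on the config option.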
Thread overview: 9+ messages
2026-04-30 20:06 [PATCH v2 0/2] vfio/dma-buf: add TPH support for peer-to-peer access Zhiping Zhang
2026-04-30 20:06 ` [PATCH v2 1/2] vfio: add dma-buf get_tph callback and DMA_BUF_TPH feature Zhiping Zhang
2026-05-04 21:44 ` Alex Williamson
2026-05-05 6:54 ` Zhiping Zhang
2026-05-06 6:58 ` fengchengwen
2026-05-06 18:23 ` Zhiping Zhang
2026-04-30 20:06 ` [PATCH v2 2/2] RDMA/mlx5: get tph for p2p access when registering dma-buf mr Zhiping Zhang
2026-05-06 7:04 ` fengchengwen
2026-05-06 18:13 ` Zhiping Zhang