public inbox for netdev@vger.kernel.org
* Re: [RFC v2 1/2] vfio: add callback to get tph info for dmabuf
       [not found] ` <20260324234615.3731237-2-zhipingz@meta.com>
@ 2026-03-25  8:25   ` Leon Romanovsky
  2026-03-26 22:41     ` Keith Busch
  2026-03-28  2:21   ` fengchengwen
  1 sibling, 1 reply; 14+ messages in thread
From: Leon Romanovsky @ 2026-03-25  8:25 UTC (permalink / raw)
  To: Zhiping Zhang
  Cc: Jason Gunthorpe, Bjorn Helgaas, linux-rdma, linux-pci, netdev,
	dri-devel, Keith Busch, Yochai Cohen, Yishai Hadas, Bjorn Helgaas

On Tue, Mar 24, 2026 at 04:46:02PM -0700, Zhiping Zhang wrote:
> This patch adds a callback to get the TPH info from DMA buffer exporters.
> The TPH info includes both the steering tag and the processing hint (PH).
> 
> The steering tag and ph are encoded in the flags field of
> vfio_device_feature_dma_buf instead of adding new fields to the uapi
> struct, to preserve ABI compatibility.
> 
> Signed-off-by: Zhiping Zhang <zhipingz@meta.com>
> ---
>  drivers/vfio/pci/vfio_pci_dmabuf.c | 26 ++++++++++++++++++++++++--
>  include/linux/dma-buf.h            | 30 ++++++++++++++++++++++++++++++
>  include/uapi/linux/vfio.h          |  9 +++++++--
>  3 files changed, 61 insertions(+), 4 deletions(-)

<...>

> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> index bb7b89330d35..e2a8962641d2 100644
> --- a/include/uapi/linux/vfio.h
> +++ b/include/uapi/linux/vfio.h
> @@ -1505,8 +1505,13 @@ struct vfio_region_dma_range {
>  struct vfio_device_feature_dma_buf {
>  	__u32	region_index;
>  	__u32	open_flags;
> -	__u32   flags;
> -	__u32   nr_ranges;
> +	__u32	flags;
> +#define VFIO_DMABUF_FL_TPH		(1U << 0) /* TPH info is present */
> +#define VFIO_DMABUF_TPH_PH_SHIFT	1         /* bits 1-2: PH (2-bit) */
> +#define VFIO_DMABUF_TPH_PH_MASK	0x6U
> +#define VFIO_DMABUF_TPH_ST_SHIFT	16        /* bits 16-31: steering tag */
> +#define VFIO_DMABUF_TPH_ST_MASK		0xffff0000U

This extension of flags basically kills future extension of this
struct for anything that includes TPH.

Add a new
enum vfio_device_feature_dma_buf_flags {
    VFIO_DMABUF_FL_TPH  = 1 << 0
};

> +	__u32	nr_ranges;

add your "__u16 steering_tag" and "__u8 ph" fields here.

>  	struct vfio_region_dma_range dma_ranges[] __counted_by(nr_ranges);
>  };
> 
> --
> 2.52.0
> 
> 

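For reference, here is a small userspace sketch of the v2 encoding. The masks and shifts are copied from the patch; the helper names are hypothetical. It makes the objection concrete: once TPH is packed in, only bits 3-15 of flags remain for any other flag, and a wider hint or an extra TPH field has nowhere to go.

```c
#include <assert.h>
#include <stdint.h>

/* Bit layout copied from the v2 patch under review. */
#define VFIO_DMABUF_FL_TPH		(1U << 0)	/* TPH info is present */
#define VFIO_DMABUF_TPH_PH_SHIFT	1		/* bits 1-2: PH (2-bit) */
#define VFIO_DMABUF_TPH_PH_MASK		0x6U
#define VFIO_DMABUF_TPH_ST_SHIFT	16		/* bits 16-31: steering tag */
#define VFIO_DMABUF_TPH_ST_MASK		0xffff0000U

/* Hypothetical helper: pack a steering tag and a 2-bit PH into flags. */
static uint32_t tph_pack_flags(uint16_t st, uint8_t ph)
{
	assert(ph <= 3);	/* PH is a 2-bit field */
	return VFIO_DMABUF_FL_TPH |
	       (((uint32_t)ph << VFIO_DMABUF_TPH_PH_SHIFT) & VFIO_DMABUF_TPH_PH_MASK) |
	       (((uint32_t)st << VFIO_DMABUF_TPH_ST_SHIFT) & VFIO_DMABUF_TPH_ST_MASK);
}

/* Mirrors the extraction done in vfio_pci_core_feature_dma_buf(). */
static void tph_unpack_flags(uint32_t flags, uint16_t *st, uint8_t *ph)
{
	*st = (uint16_t)((flags & VFIO_DMABUF_TPH_ST_MASK) >> VFIO_DMABUF_TPH_ST_SHIFT);
	*ph = (uint8_t)((flags & VFIO_DMABUF_TPH_PH_MASK) >> VFIO_DMABUF_TPH_PH_SHIFT);
}

/* Bits of flags still free for unrelated future use: only bits 3..15. */
static uint32_t tph_free_flag_bits(void)
{
	return ~(VFIO_DMABUF_FL_TPH | VFIO_DMABUF_TPH_PH_MASK |
		 VFIO_DMABUF_TPH_ST_MASK);
}
```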

* Re: [RFC v2 1/2] vfio: add callback to get tph info for dmabuf
  2026-03-25  8:25   ` [RFC v2 1/2] vfio: add callback to get tph info for dmabuf Leon Romanovsky
@ 2026-03-26 22:41     ` Keith Busch
  2026-03-26 22:55       ` Zhiping Zhang
  2026-03-31  8:37       ` Leon Romanovsky
  0 siblings, 2 replies; 14+ messages in thread
From: Keith Busch @ 2026-03-26 22:41 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Zhiping Zhang, Jason Gunthorpe, Bjorn Helgaas, linux-rdma,
	linux-pci, netdev, dri-devel, Yochai Cohen, Yishai Hadas,
	Bjorn Helgaas

On Wed, Mar 25, 2026 at 10:25:34AM +0200, Leon Romanovsky wrote:
> On Tue, Mar 24, 2026 at 04:46:02PM -0700, Zhiping Zhang wrote:
> >  struct vfio_device_feature_dma_buf {
> >  	__u32	region_index;
> >  	__u32	open_flags;
> > -	__u32   flags;
> > -	__u32   nr_ranges;
> > +	__u32	flags;
> > +#define VFIO_DMABUF_FL_TPH		(1U << 0) /* TPH info is present */
> > +#define VFIO_DMABUF_TPH_PH_SHIFT	1         /* bits 1-2: PH (2-bit) */
> > +#define VFIO_DMABUF_TPH_PH_MASK	0x6U
> > +#define VFIO_DMABUF_TPH_ST_SHIFT	16        /* bits 16-31: steering tag */
> > +#define VFIO_DMABUF_TPH_ST_MASK		0xffff0000U
> 
> This extension of flags basically kills future extension of this
> struct for anything that includes TPH.
> 
> Add a new
> enum vfio_device_feature_dma_buf_flags {
>     VFIO_DMABUF_FL_TPH  = 1 << 0
> };
> 
> > +	__u32	nr_ranges;
> 
> add your "__u16 steering_tag" and "__u8 ph" fields here.

You're suggesting that Zhiping append the new fields to the end of this
struct? I don't think we can modify the layout of a uapi.

If we can't carve the space for this out of the existing unused flags
field, I think we'd have to introduce a new vfio device feature that
basically copies VFIO_DEVICE_FEATURE_DMA_BUF with the extra hints
fields.
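
A rough userspace sketch of that alternative, for illustration only: every name here (struct feat_dma_buf_tph, FEAT_DMA_BUF_TPH, the reserved padding) is hypothetical, and struct dma_range is assumed to mirror vfio_region_dma_range as two __u64 fields. The point is that the feature ID, not a flag inside the payload, selects the layout, so the kernel never has to guess where dma_ranges starts.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed mirror of struct vfio_region_dma_range. */
struct dma_range { uint64_t offset; uint64_t length; };

/* Existing layout: must never change. */
struct feat_dma_buf {
	uint32_t region_index, open_flags, flags, nr_ranges;
	struct dma_range dma_ranges[];
};

/* Hypothetical new feature payload carrying the hints explicitly. */
struct feat_dma_buf_tph {
	uint32_t region_index, open_flags, flags, nr_ranges;
	uint16_t steering_tag;
	uint8_t ph;
	uint8_t reserved[5];	/* explicit padding, must be zero */
	struct dma_range dma_ranges[];
};

enum feat_id { FEAT_DMA_BUF, FEAT_DMA_BUF_TPH };	/* hypothetical IDs */

/* The feature ID alone decides where dma_ranges starts. */
static size_t dma_ranges_offset(enum feat_id id)
{
	return id == FEAT_DMA_BUF_TPH ?
	       offsetof(struct feat_dma_buf_tph, dma_ranges) :
	       offsetof(struct feat_dma_buf, dma_ranges);
}

/* Copy out the first range from a user-supplied buffer. */
static struct dma_range first_range(const void *buf, enum feat_id id)
{
	struct dma_range r;

	assert(buf);
	memcpy(&r, (const char *)buf + dma_ranges_offset(id), sizeof(r));
	return r;
}
```

Old binaries and binaries recompiled against the new header both keep using the original feature and layout; only code that explicitly asks for the new feature sees the new offset.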
 
> >  	struct vfio_region_dma_range dma_ranges[] __counted_by(nr_ranges);
> >  };


* Re: [RFC v2 1/2] vfio: add callback to get tph info for dmabuf
  2026-03-26 22:41     ` Keith Busch
@ 2026-03-26 22:55       ` Zhiping Zhang
  2026-03-31  8:39         ` Leon Romanovsky
  2026-03-31  8:37       ` Leon Romanovsky
  1 sibling, 1 reply; 14+ messages in thread
From: Zhiping Zhang @ 2026-03-26 22:55 UTC (permalink / raw)
  To: Keith Busch
  Cc: Leon Romanovsky, Jason Gunthorpe, Bjorn Helgaas, linux-rdma,
	linux-pci, netdev, dri-devel, Yochai Cohen, Yishai Hadas,
	Bjorn Helgaas

On Thu, Mar 26, 2026 at 3:41 PM Keith Busch <kbusch@kernel.org> wrote:
>
> >
> On Wed, Mar 25, 2026 at 10:25:34AM +0200, Leon Romanovsky wrote:
> > On Tue, Mar 24, 2026 at 04:46:02PM -0700, Zhiping Zhang wrote:
> > >  struct vfio_device_feature_dma_buf {
> > >     __u32   region_index;
> > >     __u32   open_flags;
> > > -   __u32   flags;
> > > -   __u32   nr_ranges;
> > > +   __u32   flags;
> > > +#define VFIO_DMABUF_FL_TPH         (1U << 0) /* TPH info is present */
> > > +#define VFIO_DMABUF_TPH_PH_SHIFT   1         /* bits 1-2: PH (2-bit) */
> > > +#define VFIO_DMABUF_TPH_PH_MASK    0x6U
> > > +#define VFIO_DMABUF_TPH_ST_SHIFT   16        /* bits 16-31: steering tag */
> > > +#define VFIO_DMABUF_TPH_ST_MASK            0xffff0000U
> >
> > This extension of flags basically kills future extension of this
> > struct for anything that includes TPH.
> >
> > Add a new
> > enum vfio_device_feature_dma_buf_flags {
> >     VFIO_DMABUF_FL_TPH  = 1 << 0
> > };

yes we can do that.

> >
> > > +   __u32   nr_ranges;
> >
> > add your "__u16 steering_tag" and "__u8 ph" fields here.
>
That is what I did in V1, Leon.

> You're suggesting that Zhiping append the new fields to the end of this
> struct? I don't think we can modify the layout of a uapi.
>
> If we can't carve the space for this out of the existing unused flags
> field, I think we'd have to introduce a new vfio device feature that
> basically copies VFIO_DEVICE_FEATURE_DMA_BUF with the extra hints
> fields.
>
If we don't use bits in the flags field, then we probably have to
introduce a new vfio device feature.

> > >     struct vfio_region_dma_range dma_ranges[] __counted_by(nr_ranges);
> > >  };


* Re: [RFC v2 1/2] vfio: add callback to get tph info for dmabuf
       [not found] ` <20260324234615.3731237-2-zhipingz@meta.com>
  2026-03-25  8:25   ` [RFC v2 1/2] vfio: add callback to get tph info for dmabuf Leon Romanovsky
@ 2026-03-28  2:21   ` fengchengwen
  2026-03-31  0:49     ` Zhiping Zhang
  1 sibling, 1 reply; 14+ messages in thread
From: fengchengwen @ 2026-03-28  2:21 UTC (permalink / raw)
  To: Zhiping Zhang, Jason Gunthorpe, Leon Romanovsky, Bjorn Helgaas,
	linux-rdma, linux-pci, netdev, dri-devel, Keith Busch,
	Yochai Cohen, Yishai Hadas
  Cc: Bjorn Helgaas

Hi Zhiping,

On 3/25/2026 7:46 AM, Zhiping Zhang wrote:
> This patch adds a callback to get the TPH info from DMA buffer exporters.
> The TPH info includes both the steering tag and the processing hint (PH).
> 
> The steering tag and ph are encoded in the flags field of
> vfio_device_feature_dma_buf instead of adding new fields to the uapi
> struct, to preserve ABI compatibility.
> 
> Signed-off-by: Zhiping Zhang <zhipingz@meta.com>
> ---
>  drivers/vfio/pci/vfio_pci_dmabuf.c | 26 ++++++++++++++++++++++++--
>  include/linux/dma-buf.h            | 30 ++++++++++++++++++++++++++++++
>  include/uapi/linux/vfio.h          |  9 +++++++--
>  3 files changed, 61 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/vfio/pci/vfio_pci_dmabuf.c b/drivers/vfio/pci/vfio_pci_dmabuf.c
> index 478beafc6ac3..c45cb3884b85 100644
> --- a/drivers/vfio/pci/vfio_pci_dmabuf.c
> +++ b/drivers/vfio/pci/vfio_pci_dmabuf.c
> @@ -17,6 +17,8 @@ struct vfio_pci_dma_buf {
>  	struct phys_vec *phys_vec;
>  	struct p2pdma_provider *provider;
>  	u32 nr_ranges;
> +	u16 steering_tag;
> +	u8 ph;
>  	u8 revoked : 1;
>  };
> 
> @@ -60,6 +62,15 @@ vfio_pci_dma_buf_map(struct dma_buf_attachment *attachment,
>  				       priv->size, dir);
>  }
> 
> +static int vfio_pci_dma_buf_get_tph(struct dma_buf *dmabuf, u16 *steering_tag,
> +				    u8 *ph)
> +{
> +	struct vfio_pci_dma_buf *priv = dmabuf->priv;
> +	*steering_tag = priv->steering_tag;
> +	*ph = priv->ph;

If the dmabuf exporter doesn't provide ST & PH, this op should return an error.

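A minimal sketch of that behaviour, modelled in userspace C rather than kernel code. The tph_valid field is hypothetical (the patch has no such field); it records whether userspace actually supplied TPH. Returning -EOPNOTSUPP matches the error already listed in the @get_tph kernel-doc, whereas returning 0 with zeroed outputs would be indistinguishable from a real ST of 0.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Userspace stand-in for struct vfio_pci_dma_buf (TPH part only). */
struct fake_vfio_dma_buf {
	uint16_t steering_tag;
	uint8_t ph;
	bool tph_valid;		/* hypothetical: set only when TPH was supplied */
};

static int fake_get_tph(const struct fake_vfio_dma_buf *priv,
			uint16_t *steering_tag, uint8_t *ph)
{
	assert(priv && steering_tag && ph);
	if (!priv->tph_valid)
		return -EOPNOTSUPP;	/* no hints: say so, leave outputs untouched */
	*steering_tag = priv->steering_tag;
	*ph = priv->ph;
	return 0;
}
```
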
> +	return 0;
> +}
> +
>  static void vfio_pci_dma_buf_unmap(struct dma_buf_attachment *attachment,
>  				   struct sg_table *sgt,
>  				   enum dma_data_direction dir)
> @@ -90,6 +101,7 @@ static const struct dma_buf_ops vfio_pci_dmabuf_ops = {
>  	.unpin = vfio_pci_dma_buf_unpin,
>  	.attach = vfio_pci_dma_buf_attach,
>  	.map_dma_buf = vfio_pci_dma_buf_map,
> +	.get_tph = vfio_pci_dma_buf_get_tph,
>  	.unmap_dma_buf = vfio_pci_dma_buf_unmap,
>  	.release = vfio_pci_dma_buf_release,
>  };
> @@ -228,7 +240,10 @@ int vfio_pci_core_feature_dma_buf(struct vfio_pci_core_device *vdev, u32 flags,
>  	if (copy_from_user(&get_dma_buf, arg, sizeof(get_dma_buf)))
>  		return -EFAULT;
> 
> -	if (!get_dma_buf.nr_ranges || get_dma_buf.flags)
> +	if (!get_dma_buf.nr_ranges ||
> +	    (get_dma_buf.flags & ~(VFIO_DMABUF_FL_TPH |
> +				   VFIO_DMABUF_TPH_PH_MASK |
> +				   VFIO_DMABUF_TPH_ST_MASK)))
>  		return -EINVAL;
> 
>  	/*
> @@ -285,7 +300,14 @@ int vfio_pci_core_feature_dma_buf(struct vfio_pci_core_device *vdev, u32 flags,
>  		ret = PTR_ERR(priv->dmabuf);
>  		goto err_dev_put;
>  	}
> -
> +	if (get_dma_buf.flags & VFIO_DMABUF_FL_TPH) {
> +		priv->steering_tag = (get_dma_buf.flags &
> +				      VFIO_DMABUF_TPH_ST_MASK) >>
> +				     VFIO_DMABUF_TPH_ST_SHIFT;
> +		priv->ph = (get_dma_buf.flags &
> +			    VFIO_DMABUF_TPH_PH_MASK) >>
> +			   VFIO_DMABUF_TPH_PH_SHIFT;
> +	}
>  	/* dma_buf_put() now frees priv */
>  	INIT_LIST_HEAD(&priv->dmabufs_elm);
>  	down_write(&vdev->memory_lock);
> diff --git a/include/linux/dma-buf.h b/include/linux/dma-buf.h
> index 133b9e637b55..26705c83ad80 100644
> --- a/include/linux/dma-buf.h
> +++ b/include/linux/dma-buf.h
> @@ -113,6 +113,36 @@ struct dma_buf_ops {
>  	 */
>  	void (*unpin)(struct dma_buf_attachment *attach);
> 
> +	/**
> +	 * @get_tph:
> +	 *
> +	 * Get the TPH (TLP Processing Hints) for this DMA buffer.
> +	 *
> +	 * This callback allows DMA buffer exporters to provide TPH including
> +	 * both the steering tag and the process hints (ph), which can be used
> +	 * to optimize peer-to-peer (P2P) memory access. The TPH info is typically
> +	 * used in scenarios where:
> +	 * - A PCIe device (e.g., RDMA NIC) needs to access memory on another
> +	 *   PCIe device (e.g., GPU),
> +	 * - The system supports TPH and can use steering tags / ph to optimize
> +	 *   cache placement and memory access patterns,
> +	 * - The memory is exported via DMABUF for cross-device sharing.
> +	 *
> +	 * @dmabuf: [in] The DMA buffer for which to retrieve TPH
> +	 * @steering_tag: [out] Pointer to store the 16-bit TPH steering tag value
> +	 * @ph: [out] Pointer to store the 8-bit TPH processing-hint value
> +	 *
> +	 * Returns:
> +	 * * 0 - Success, steering tag stored in @steering_tag
> +	 * * -EOPNOTSUPP - TPH steering tags not supported for this buffer
> +	 * * -EINVAL - Invalid parameters
> +	 *
> +	 * This callback is optional. If not implemented, the buffer does not
> +	 * support TPH.

It seems this is already implemented...
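
Since the kernel-doc describes @get_tph as optional, exporters other than vfio may leave it NULL, and an importer then has to treat a missing hook as "no TPH". A userspace sketch of such a wrapper follows; the fake_* types, the wrapper and the example exporter are hypothetical stand-ins, not an existing dma-buf API.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

struct fake_dma_buf;

/* Cut-down stand-in for struct dma_buf_ops with only the new hook. */
struct fake_dma_buf_ops {
	int (*get_tph)(struct fake_dma_buf *dmabuf, uint16_t *st, uint8_t *ph);
};

struct fake_dma_buf {
	const struct fake_dma_buf_ops *ops;
	void *priv;
};

/* Hypothetical importer-side wrapper: a missing hook means "no TPH". */
static int fake_dma_buf_get_tph(struct fake_dma_buf *dmabuf,
				uint16_t *st, uint8_t *ph)
{
	assert(dmabuf && dmabuf->ops);
	if (!dmabuf->ops->get_tph)
		return -EOPNOTSUPP;
	return dmabuf->ops->get_tph(dmabuf, st, ph);
}

/* Example exporter that always reports ST 0x10, PH 0. */
static int example_get_tph(struct fake_dma_buf *dmabuf, uint16_t *st, uint8_t *ph)
{
	(void)dmabuf;
	*st = 0x10;
	*ph = 0;
	return 0;
}
```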

> +	 *
> +	 */
> +	int (*get_tph)(struct dma_buf *dmabuf, u16 *steering_tag, u8 *ph);
> +
>  	/**
>  	 * @map_dma_buf:
>  	 *
> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> index bb7b89330d35..e2a8962641d2 100644
> --- a/include/uapi/linux/vfio.h
> +++ b/include/uapi/linux/vfio.h
> @@ -1505,8 +1505,13 @@ struct vfio_region_dma_range {
>  struct vfio_device_feature_dma_buf {
>  	__u32	region_index;
>  	__u32	open_flags;
> -	__u32   flags;
> -	__u32   nr_ranges;
> +	__u32	flags;
> +#define VFIO_DMABUF_FL_TPH		(1U << 0) /* TPH info is present */
> +#define VFIO_DMABUF_TPH_PH_SHIFT	1         /* bits 1-2: PH (2-bit) */
> +#define VFIO_DMABUF_TPH_PH_MASK	0x6U
> +#define VFIO_DMABUF_TPH_ST_SHIFT	16        /* bits 16-31: steering tag */
> +#define VFIO_DMABUF_TPH_ST_MASK		0xffff0000U
> +	__u32	nr_ranges;
>  	struct vfio_region_dma_range dma_ranges[] __counted_by(nr_ranges);
>  };

Another question:
1) The PCIe spec defines both 8-bit and 16-bit STs.
2) In the host-device ST implementation, ACPI provides both an 8-bit and a 16-bit ST; which
   one to use depends on the narrower ST width supported by the device and the RP.
3) So in this P2P scenario, the exporter (e.g. GPU) may support 16-bit STs while the consumer
   (e.g. RDMA NIC) supports only 8-bit STs, which may lead to a mismatch.

> 
> --
> 2.52.0
> 
> 
> 



* Re: [RFC v2 1/2] vfio: add callback to get tph info for dmabuf
  2026-03-28  2:21   ` fengchengwen
@ 2026-03-31  0:49     ` Zhiping Zhang
  0 siblings, 0 replies; 14+ messages in thread
From: Zhiping Zhang @ 2026-03-31  0:49 UTC (permalink / raw)
  To: fengchengwen
  Cc: Jason Gunthorpe, Leon Romanovsky, Bjorn Helgaas, linux-rdma,
	linux-pci, netdev, dri-devel, Keith Busch, Yochai Cohen,
	Yishai Hadas, Bjorn Helgaas

On Fri, Mar 27, 2026 at 7:22 PM fengchengwen <fengchengwen@huawei.com> wrote:
>
> >
> Hi Zhiping,
>
> On 3/25/2026 7:46 AM, Zhiping Zhang wrote:
> > This patch adds a callback to get the TPH info from DMA buffer exporters.
> > The TPH info includes both the steering tag and the processing hint (PH).
> >
> > The steering tag and ph are encoded in the flags field of
> > vfio_device_feature_dma_buf instead of adding new fields to the uapi
> > struct, to preserve ABI compatibility.
> >
> > Signed-off-by: Zhiping Zhang <zhipingz@meta.com>
> > ---
> >  drivers/vfio/pci/vfio_pci_dmabuf.c | 26 ++++++++++++++++++++++++--
> >  include/linux/dma-buf.h            | 30 ++++++++++++++++++++++++++++++
> >  include/uapi/linux/vfio.h          |  9 +++++++--
> >  3 files changed, 61 insertions(+), 4 deletions(-)
> >
> > diff --git a/drivers/vfio/pci/vfio_pci_dmabuf.c b/drivers/vfio/pci/vfio_pci_dmabuf.c
> > index 478beafc6ac3..c45cb3884b85 100644
> > --- a/drivers/vfio/pci/vfio_pci_dmabuf.c
> > +++ b/drivers/vfio/pci/vfio_pci_dmabuf.c
> > @@ -17,6 +17,8 @@ struct vfio_pci_dma_buf {
> >       struct phys_vec *phys_vec;
> >       struct p2pdma_provider *provider;
> >       u32 nr_ranges;
> > +     u16 steering_tag;
> > +     u8 ph;
> >       u8 revoked : 1;
> >  };
> >
> > @@ -60,6 +62,15 @@ vfio_pci_dma_buf_map(struct dma_buf_attachment *attachment,
> >                                      priv->size, dir);
> >  }
> >
> > +static int vfio_pci_dma_buf_get_tph(struct dma_buf *dmabuf, u16 *steering_tag,
> > +                                 u8 *ph)
> > +{
> > +     struct vfio_pci_dma_buf *priv = dmabuf->priv;
> > +     *steering_tag = priv->steering_tag;
> > +     *ph = priv->ph;
>
> If the dmabuf exporter doesn't provide ST & PH, this op should return an error.

That is a good call, let me address that in the new revision.

>
> > +     return 0;
> > +}
> > +
> >  static void vfio_pci_dma_buf_unmap(struct dma_buf_attachment *attachment,
> >                                  struct sg_table *sgt,
> >                                  enum dma_data_direction dir)
> > @@ -90,6 +101,7 @@ static const struct dma_buf_ops vfio_pci_dmabuf_ops = {
> >       .unpin = vfio_pci_dma_buf_unpin,
> >       .attach = vfio_pci_dma_buf_attach,
> >       .map_dma_buf = vfio_pci_dma_buf_map,
> > +     .get_tph = vfio_pci_dma_buf_get_tph,
> >       .unmap_dma_buf = vfio_pci_dma_buf_unmap,
> >       .release = vfio_pci_dma_buf_release,
> >  };
> > @@ -228,7 +240,10 @@ int vfio_pci_core_feature_dma_buf(struct vfio_pci_core_device *vdev, u32 flags,
> >       if (copy_from_user(&get_dma_buf, arg, sizeof(get_dma_buf)))
> >               return -EFAULT;
> >
> > -     if (!get_dma_buf.nr_ranges || get_dma_buf.flags)
> > +     if (!get_dma_buf.nr_ranges ||
> > +         (get_dma_buf.flags & ~(VFIO_DMABUF_FL_TPH |
> > +                                VFIO_DMABUF_TPH_PH_MASK |
> > +                                VFIO_DMABUF_TPH_ST_MASK)))
> >               return -EINVAL;
> >
> >       /*
> > @@ -285,7 +300,14 @@ int vfio_pci_core_feature_dma_buf(struct vfio_pci_core_device *vdev, u32 flags,
> >               ret = PTR_ERR(priv->dmabuf);
> >               goto err_dev_put;
> >       }
> > -
> > +     if (get_dma_buf.flags & VFIO_DMABUF_FL_TPH) {
> > +             priv->steering_tag = (get_dma_buf.flags &
> > +                                   VFIO_DMABUF_TPH_ST_MASK) >>
> > +                                  VFIO_DMABUF_TPH_ST_SHIFT;
> > +             priv->ph = (get_dma_buf.flags &
> > +                         VFIO_DMABUF_TPH_PH_MASK) >>
> > +                        VFIO_DMABUF_TPH_PH_SHIFT;
> > +     }
> >       /* dma_buf_put() now frees priv */
> >       INIT_LIST_HEAD(&priv->dmabufs_elm);
> >       down_write(&vdev->memory_lock);
> > diff --git a/include/linux/dma-buf.h b/include/linux/dma-buf.h
> > index 133b9e637b55..26705c83ad80 100644
> > --- a/include/linux/dma-buf.h
> > +++ b/include/linux/dma-buf.h
> > @@ -113,6 +113,36 @@ struct dma_buf_ops {
> >        */
> >       void (*unpin)(struct dma_buf_attachment *attach);
> >
> > +     /**
> > +      * @get_tph:
> > +      *
> > +      * Get the TPH (TLP Processing Hints) for this DMA buffer.
> > +      *
> > +      * This callback allows DMA buffer exporters to provide TPH including
> > +      * both the steering tag and the process hints (ph), which can be used
> > +      * to optimize peer-to-peer (P2P) memory access. The TPH info is typically
> > +      * used in scenarios where:
> > +      * - A PCIe device (e.g., RDMA NIC) needs to access memory on another
> > +      *   PCIe device (e.g., GPU),
> > +      * - The system supports TPH and can use steering tags / ph to optimize
> > +      *   cache placement and memory access patterns,
> > +      * - The memory is exported via DMABUF for cross-device sharing.
> > +      *
> > +      * @dmabuf: [in] The DMA buffer for which to retrieve TPH
> > +      * @steering_tag: [out] Pointer to store the 16-bit TPH steering tag value
> > +      * @ph: [out] Pointer to store the 8-bit TPH processing-hint value
> > +      *
> > +      * Returns:
> > +      * * 0 - Success, steering tag stored in @steering_tag
> > +      * * -EOPNOTSUPP - TPH steering tags not supported for this buffer
> > +      * * -EINVAL - Invalid parameters
> > +      *
> > +      * This callback is optional. If not implemented, the buffer does not
> > +      * support TPH.
>
> It seems this is already implemented...

Yup, it's supposed to be implemented.

>
> > +      *
> > +      */
> > +     int (*get_tph)(struct dma_buf *dmabuf, u16 *steering_tag, u8 *ph);
> > +
> >       /**
> >        * @map_dma_buf:
> >        *
> > diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> > index bb7b89330d35..e2a8962641d2 100644
> > --- a/include/uapi/linux/vfio.h
> > +++ b/include/uapi/linux/vfio.h
> > @@ -1505,8 +1505,13 @@ struct vfio_region_dma_range {
> >  struct vfio_device_feature_dma_buf {
> >       __u32   region_index;
> >       __u32   open_flags;
> > -     __u32   flags;
> > -     __u32   nr_ranges;
> > +     __u32   flags;
> > +#define VFIO_DMABUF_FL_TPH           (1U << 0) /* TPH info is present */
> > +#define VFIO_DMABUF_TPH_PH_SHIFT     1         /* bits 1-2: PH (2-bit) */
> > +#define VFIO_DMABUF_TPH_PH_MASK      0x6U
> > +#define VFIO_DMABUF_TPH_ST_SHIFT     16        /* bits 16-31: steering tag */
> > +#define VFIO_DMABUF_TPH_ST_MASK              0xffff0000U
> > +     __u32   nr_ranges;
> >       struct vfio_region_dma_range dma_ranges[] __counted_by(nr_ranges);
> >  };
>
> Another question:
> 1) The PCIe spec defines both 8-bit and 16-bit STs.
> 2) In the host-device ST implementation, ACPI provides both an 8-bit and a 16-bit ST; which
>    one to use depends on the narrower ST width supported by the device and the RP.
> 3) So in this P2P scenario, the exporter (e.g. GPU) may support 16-bit STs while the consumer
>    (e.g. RDMA NIC) supports only 8-bit STs, which may lead to a mismatch.
>

Hmm, let me check how we can address this mismatch. One option is to
add an additional parameter and fail the get_tph call when a mismatch is found.

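One possible shape for that extra parameter, sketched in userspace C under stated assumptions: the exporter keeps both an 8-bit and a 16-bit ST (mirroring the two values ACPI provides for host STs) and the importer says which width it can emit. All names are hypothetical. The call fails instead of truncating a 16-bit ST; whether a 16-bit-capable importer should fall back to the 8-bit value is left open.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical: which ST width the importing device can emit. */
enum tph_st_width { TPH_ST_8BIT, TPH_ST_16BIT };

/* Hypothetical exporter state holding both ST variants, if known. */
struct tph_info {
	uint8_t st8;
	uint16_t st16;
	bool st8_valid;
	bool st16_valid;
	uint8_t ph;
};

/* Return the ST matching the importer's width, or fail on a mismatch. */
static int tph_get_for_width(const struct tph_info *info,
			     enum tph_st_width width, uint16_t *st, uint8_t *ph)
{
	assert(info && st && ph);
	if (width == TPH_ST_8BIT) {
		if (!info->st8_valid)
			return -EOPNOTSUPP;	/* exporter has only a 16-bit ST */
		*st = info->st8;
	} else {
		if (!info->st16_valid)
			return -EOPNOTSUPP;
		*st = info->st16;
	}
	*ph = info->ph;
	return 0;
}
```
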
> >
> > --
> > 2.52.0
> >
> >
> >
>


* Re: [RFC v2 1/2] vfio: add callback to get tph info for dmabuf
  2026-03-26 22:41     ` Keith Busch
  2026-03-26 22:55       ` Zhiping Zhang
@ 2026-03-31  8:37       ` Leon Romanovsky
  2026-03-31 13:00         ` Keith Busch
  1 sibling, 1 reply; 14+ messages in thread
From: Leon Romanovsky @ 2026-03-31  8:37 UTC (permalink / raw)
  To: Keith Busch
  Cc: Zhiping Zhang, Jason Gunthorpe, Bjorn Helgaas, linux-rdma,
	linux-pci, netdev, dri-devel, Yochai Cohen, Yishai Hadas,
	Bjorn Helgaas

On Thu, Mar 26, 2026 at 04:41:11PM -0600, Keith Busch wrote:
> On Wed, Mar 25, 2026 at 10:25:34AM +0200, Leon Romanovsky wrote:
> > On Tue, Mar 24, 2026 at 04:46:02PM -0700, Zhiping Zhang wrote:
> > >  struct vfio_device_feature_dma_buf {
> > >  	__u32	region_index;
> > >  	__u32	open_flags;
> > > -	__u32   flags;
> > > -	__u32   nr_ranges;
> > > +	__u32	flags;
> > > +#define VFIO_DMABUF_FL_TPH		(1U << 0) /* TPH info is present */
> > > +#define VFIO_DMABUF_TPH_PH_SHIFT	1         /* bits 1-2: PH (2-bit) */
> > > +#define VFIO_DMABUF_TPH_PH_MASK	0x6U
> > > +#define VFIO_DMABUF_TPH_ST_SHIFT	16        /* bits 16-31: steering tag */
> > > +#define VFIO_DMABUF_TPH_ST_MASK		0xffff0000U
> > 
> > This extension of flags basically kills future extension of this
> > struct for anything that includes TPH.
> > 
> > Add a new
> > enum vfio_device_feature_dma_buf_flags {
> >     VFIO_DMABUF_FL_TPH  = 1 << 0
> > };
> > 
> > > +	__u32	nr_ranges;
> > 
> > add your "__u16 steering_tag" and "__u8 ph" fields here.
> 
> You're suggesting that Zhiping append the new fields to the end of this
> struct? I don't think we can modify the layout of a uapi.

He needs to add them before the flex array. This struct is submitted by the user,
and the kernel can easily calculate the position of that array.

Something like this:
diff --git a/drivers/vfio/pci/vfio_pci_dmabuf.c b/drivers/vfio/pci/vfio_pci_dmabuf.c
index b1d658b8f7b51..d78d915992232 100644
--- a/drivers/vfio/pci/vfio_pci_dmabuf.c
+++ b/drivers/vfio/pci/vfio_pci_dmabuf.c
@@ -237,7 +237,11 @@ int vfio_pci_core_feature_dma_buf(struct vfio_pci_core_device *vdev, u32 flags,
        if (get_dma_buf.region_index >= VFIO_PCI_ROM_REGION_INDEX)
                return -ENODEV;

-       dma_ranges = memdup_array_user(&arg->dma_ranges, get_dma_buf.nr_ranges,
+       if (!tph_supplied)
+               dma_ranges = memdup_array_user(old_dma_ranges_pos, get_dma_buf.nr_ranges,
+                                      sizeof(*dma_ranges));
+       else
+               dma_ranges = memdup_array_user(&arg->dma_ranges, get_dma_buf.nr_ranges,
                                       sizeof(*dma_ranges));
        if (IS_ERR(dma_ranges))
                return PTR_ERR(dma_ranges);


Thanks

> 
> If we can't carve the space for this out of the existing unused flags
> field, I think we'd have to introduce a new vfio device feature that
> basically copies VFIO_DEVICE_FEATURE_DMA_BUF with the extra hints
> fields.
>  
> > >  	struct vfio_region_dma_range dma_ranges[] __counted_by(nr_ranges);
> > >  };


* Re: [RFC v2 1/2] vfio: add callback to get tph info for dmabuf
  2026-03-26 22:55       ` Zhiping Zhang
@ 2026-03-31  8:39         ` Leon Romanovsky
  0 siblings, 0 replies; 14+ messages in thread
From: Leon Romanovsky @ 2026-03-31  8:39 UTC (permalink / raw)
  To: Zhiping Zhang
  Cc: Keith Busch, Jason Gunthorpe, Bjorn Helgaas, linux-rdma,
	linux-pci, netdev, dri-devel, Yochai Cohen, Yishai Hadas,
	Bjorn Helgaas

On Thu, Mar 26, 2026 at 03:55:44PM -0700, Zhiping Zhang wrote:
> On Thu, Mar 26, 2026 at 3:41 PM Keith Busch <kbusch@kernel.org> wrote:
> >
> > >
> > On Wed, Mar 25, 2026 at 10:25:34AM +0200, Leon Romanovsky wrote:
> > > On Tue, Mar 24, 2026 at 04:46:02PM -0700, Zhiping Zhang wrote:
> > > >  struct vfio_device_feature_dma_buf {
> > > >     __u32   region_index;
> > > >     __u32   open_flags;
> > > > -   __u32   flags;
> > > > -   __u32   nr_ranges;
> > > > +   __u32   flags;
> > > > +#define VFIO_DMABUF_FL_TPH         (1U << 0) /* TPH info is present */
> > > > +#define VFIO_DMABUF_TPH_PH_SHIFT   1         /* bits 1-2: PH (2-bit) */
> > > > +#define VFIO_DMABUF_TPH_PH_MASK    0x6U
> > > > +#define VFIO_DMABUF_TPH_ST_SHIFT   16        /* bits 16-31: steering tag */
> > > > +#define VFIO_DMABUF_TPH_ST_MASK            0xffff0000U
> > >
> > > This extension of flags basically kills future extension of this
> > > struct for anything that includes TPH.
> > >
> > > Add a new
> > > enum vfio_device_feature_dma_buf_flags {
> > >     VFIO_DMABUF_FL_TPH  = 1 << 0
> > > };
> 
> yes we can do that.
> 
> > >
> > > > +   __u32   nr_ranges;
> > >
> > > add your "__u16 steering_tag" and "__u8 ph" fields here.
> >
> That is what I did in V1, Leon.

Not really, you did only half of the work. You didn't introduce a new flag
and didn't calculate the "old dma range" position.

Thanks


* Re: [RFC v2 1/2] vfio: add callback to get tph info for dmabuf
  2026-03-31  8:37       ` Leon Romanovsky
@ 2026-03-31 13:00         ` Keith Busch
  2026-03-31 13:29           ` Leon Romanovsky
  0 siblings, 1 reply; 14+ messages in thread
From: Keith Busch @ 2026-03-31 13:00 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Zhiping Zhang, Jason Gunthorpe, Bjorn Helgaas, linux-rdma,
	linux-pci, netdev, dri-devel, Yochai Cohen, Yishai Hadas,
	Bjorn Helgaas

On Tue, Mar 31, 2026 at 11:37:58AM +0300, Leon Romanovsky wrote:
> On Thu, Mar 26, 2026 at 04:41:11PM -0600, Keith Busch wrote:
> > 
> > You're suggesting that Zhiping append the new fields to the end of this
> > struct? I don't think we can modify the layout of a uapi.
> 
> He needs to add them before the flex array. This struct is submitted by the user,
> and the kernel can easily calculate the position of that array.

No, you can't just do that. Existing applications would break when they
compile against the updated kernel header. They don't know about this
new "tph" supplied flag, but they'll all accidentally use the new
dma_ranges offset. 
 
> Something like this:
> diff --git a/drivers/vfio/pci/vfio_pci_dmabuf.c b/drivers/vfio/pci/vfio_pci_dmabuf.c
> index b1d658b8f7b51..d78d915992232 100644
> --- a/drivers/vfio/pci/vfio_pci_dmabuf.c
> +++ b/drivers/vfio/pci/vfio_pci_dmabuf.c
> @@ -237,7 +237,11 @@ int vfio_pci_core_feature_dma_buf(struct vfio_pci_core_device *vdev, u32 flags,
>         if (get_dma_buf.region_index >= VFIO_PCI_ROM_REGION_INDEX)
>                 return -ENODEV;
> 
> -       dma_ranges = memdup_array_user(&arg->dma_ranges, get_dma_buf.nr_ranges,
> +       if (!tph_supplied)
> +               dma_ranges = memdup_array_user(old_dma_ranges_pos, get_dma_buf.nr_ranges,
> +                                      sizeof(*dma_ranges));
> +       else
> +               dma_ranges = memdup_array_user(&arg->dma_ranges, get_dma_buf.nr_ranges,
>                                        sizeof(*dma_ranges));
>         if (IS_ERR(dma_ranges))
>                 return PTR_ERR(dma_ranges);


* Re: [RFC v2 1/2] vfio: add callback to get tph info for dmabuf
  2026-03-31 13:00         ` Keith Busch
@ 2026-03-31 13:29           ` Leon Romanovsky
  2026-03-31 13:35             ` Keith Busch
  0 siblings, 1 reply; 14+ messages in thread
From: Leon Romanovsky @ 2026-03-31 13:29 UTC (permalink / raw)
  To: Keith Busch
  Cc: Zhiping Zhang, Jason Gunthorpe, Bjorn Helgaas, linux-rdma,
	linux-pci, netdev, dri-devel, Yochai Cohen, Yishai Hadas,
	Bjorn Helgaas

On Tue, Mar 31, 2026 at 07:00:07AM -0600, Keith Busch wrote:
> On Tue, Mar 31, 2026 at 11:37:58AM +0300, Leon Romanovsky wrote:
> > On Thu, Mar 26, 2026 at 04:41:11PM -0600, Keith Busch wrote:
> > > 
> > > You're suggesting that Zhiping append the new fields to the end of this
> > > struct? I don't think we can modify the layout of a uapi.
> > 
> > He needs to add them before the flex array. This struct is submitted by the user,
> > and the kernel can easily calculate the position of that array.
> 
> No, you can't just do that. Existing applications would break when they
> compile against the updated kernel header. They don't know about this
> new "tph" supplied flag, but they'll all accidentally use the new
> dma_ranges offset. 

So we need to always pass the TPH flag and treat 0 as a do-nothing field.

Thanks


* Re: [RFC v2 1/2] vfio: add callback to get tph info for dmabuf
  2026-03-31 13:29           ` Leon Romanovsky
@ 2026-03-31 13:35             ` Keith Busch
  2026-03-31 14:03               ` Leon Romanovsky
  0 siblings, 1 reply; 14+ messages in thread
From: Keith Busch @ 2026-03-31 13:35 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Zhiping Zhang, Jason Gunthorpe, Bjorn Helgaas, linux-rdma,
	linux-pci, netdev, dri-devel, Yochai Cohen, Yishai Hadas,
	Bjorn Helgaas

On Tue, Mar 31, 2026 at 04:29:42PM +0300, Leon Romanovsky wrote:
> On Tue, Mar 31, 2026 at 07:00:07AM -0600, Keith Busch wrote:
> > On Tue, Mar 31, 2026 at 11:37:58AM +0300, Leon Romanovsky wrote:
> > > On Thu, Mar 26, 2026 at 04:41:11PM -0600, Keith Busch wrote:
> > > > 
> > > > You're suggesting that Zhiping append the new fields to the end of this
> > > > struct? I don't think we can modify the layout of a uapi.
> > > 
> > > He needs to add them before the flex array. This struct is submitted by the user,
> > > and the kernel can easily calculate the position of that array.
> > 
> > No, you can't just do that. Existing applications would break when they
> > compile against the updated kernel header. They don't know about this
> > new "tph" supplied flag, but they'll all accidentally use the new
> > dma_ranges offset. 
> 
> So we need to always pass the TPH flag and treat 0 as a do-nothing field.

I don't think you're understanding the implications. If Zhiping appends
new fields in front of the flex array dma_ranges, then existing
applications will implicitly use the new offset if they are recompiled
against the new kernel header. But if the binary was compiled against
the older kernel header, then that application would use the previous
offset. Both applications have the TPH flag cleared to 0. How is the
kernel supposed to know which offset the application used?
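
The offset shift can be checked with offsetof. A minimal sketch with local mirror structs (names hypothetical; struct range is assumed to match vfio_region_dma_range as two __u64 fields). Everything up to and including nr_ranges sits at the same place in both layouts, so a flags value of 0 carries no information about which one the caller used.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed mirror of struct vfio_region_dma_range. */
struct range { uint64_t offset; uint64_t length; };

/* Layout an old binary was built against. */
struct dma_buf_old {
	uint32_t region_index, open_flags, flags, nr_ranges;
	struct range dma_ranges[];
};

/* Layout after inserting the hint fields before the flex array. */
struct dma_buf_new {
	uint32_t region_index, open_flags, flags, nr_ranges;
	uint16_t steering_tag;
	uint8_t ph;
	struct range dma_ranges[];
};

/* The fixed header, including flags, is identical in both layouts... */
static_assert(offsetof(struct dma_buf_old, flags) ==
	      offsetof(struct dma_buf_new, flags), "flags did not move");
/*
 * ...but the array moves: 16 bytes in the old layout; on x86-64 the new one
 * is 24 (19 bytes padded to the 8-byte alignment of struct range), and on
 * ABIs that align uint64_t to 4 bytes it is 20.
 */
static_assert(offsetof(struct dma_buf_old, dma_ranges) !=
	      offsetof(struct dma_buf_new, dma_ranges), "dma_ranges moved");
```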


* Re: [RFC v2 1/2] vfio: add callback to get tph info for dmabuf
  2026-03-31 13:35             ` Keith Busch
@ 2026-03-31 14:03               ` Leon Romanovsky
  2026-03-31 14:13                 ` Keith Busch
  0 siblings, 1 reply; 14+ messages in thread
From: Leon Romanovsky @ 2026-03-31 14:03 UTC (permalink / raw)
  To: Keith Busch
  Cc: Zhiping Zhang, Jason Gunthorpe, Bjorn Helgaas, linux-rdma,
	linux-pci, netdev, dri-devel, Yochai Cohen, Yishai Hadas,
	Bjorn Helgaas

On Tue, Mar 31, 2026 at 07:35:46AM -0600, Keith Busch wrote:
> On Tue, Mar 31, 2026 at 04:29:42PM +0300, Leon Romanovsky wrote:
> > On Tue, Mar 31, 2026 at 07:00:07AM -0600, Keith Busch wrote:
> > > On Tue, Mar 31, 2026 at 11:37:58AM +0300, Leon Romanovsky wrote:
> > > > On Thu, Mar 26, 2026 at 04:41:11PM -0600, Keith Busch wrote:
> > > > > 
> > > > > You're suggesting that Zhiping append the new fields to the end of this
> > > > > struct? I don't think we can modify the layout of a uapi.
> > > > 
> > > > He needs to add them before the flex array. This struct is submitted by the user,
> > > > and the kernel can easily calculate the position of that array.
> > > 
> > > No, you can't just do that. Existing applications would break when they
> > > compile against the updated kernel header. They don't know about this
> > > new "tph" supplied flag, but they'll all accidentally use the new
> > > dma_ranges offset.
> > 
> > So we need to always pass the TPH flag and treat 0 as a do-nothing field.
> 
> I don't think you're understanding the implications. If Zhiping appends
> new fields in front of the flex array dma_ranges, then existing
> applications will implicitly use the new offset if they are recompiled
> against the new kernel header. But if the binary was compiled against
> the older kernel header, then that application would use the previous
> offset. Both applications have the TPH flag cleared to 0. How is the
> kernel supposed to know which offset the application used?

I understand; my proposal is to always set the TPH flag when the new
struct is used. Everything would be much easier if we could add fields
after the flex array.

Thanks
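C only permits a flexible array member as the last member of a struct, so "fields after the flex array" cannot be declared in the struct itself; the kernel would have to locate such a trailer by arithmetic from `nr_ranges`. A minimal sketch, assuming a 16-byte `struct vfio_region_dma_range` element; `struct hdr`, `struct tph_trailer` and `tph_trailer_offset()` are hypothetical names for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct range { uint64_t offset; uint64_t length; };   /* assumed layout */

struct hdr {
	uint32_t region_index, open_flags, flags, nr_ranges;
	struct range dma_ranges[];      /* must stay the last member */
};

struct tph_trailer {                    /* hypothetical */
	uint16_t steering_tag;
	uint8_t  ph;
	uint8_t  reserved[5];
};

/* Byte offset where a trailer would start, computed from nr_ranges. */
static size_t tph_trailer_offset(uint32_t nr_ranges)
{
	return offsetof(struct hdr, dma_ranges) +
	       (size_t)nr_ranges * sizeof(struct range);
}
```

Because the trailer sits after the array, `dma_ranges` keeps its old offset; old binaries simply never provide a trailer, so its presence would still need an explicit signal such as the TPH flag.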

* Re: [RFC v2 1/2] vfio: add callback to get tph info for dmabuf
  2026-03-31 14:03               ` Leon Romanovsky
@ 2026-03-31 14:13                 ` Keith Busch
  2026-03-31 19:02                   ` Leon Romanovsky
  0 siblings, 1 reply; 14+ messages in thread
From: Keith Busch @ 2026-03-31 14:13 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Zhiping Zhang, Jason Gunthorpe, Bjorn Helgaas, linux-rdma,
	linux-pci, netdev, dri-devel, Yochai Cohen, Yishai Hadas,
	Bjorn Helgaas

On Tue, Mar 31, 2026 at 05:03:09PM +0300, Leon Romanovsky wrote:
> I understand; my proposal is to always set the TPH flag when the new
> struct is used.

An existing application recompiled against the new kernel API implicitly
uses the new struct layout without setting the TPH flag, so with your
proposal the kernel and application are out of sync on where dma_ranges
starts.
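To make the mismatch concrete, here is a sketch of the kernel-side rule under discussion (use the new layout only when the TPH flag is set), built on hypothetical simplified copies of the structs and assuming a 16-byte `struct vfio_region_dma_range` element. `FL_TPH` mirrors `VFIO_DMABUF_FL_TPH` from the RFC; `kernel_ranges_offset()` and the struct names are illustrative, not real kernel code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FL_TPH (1U << 0)   /* stand-in for VFIO_DMABUF_FL_TPH in the RFC */

struct range { uint64_t offset; uint64_t length; };   /* assumed layout */

struct hdr_old {
	uint32_t region_index, open_flags, flags, nr_ranges;
	struct range dma_ranges[];
};

struct hdr_new {                 /* hypothetical: TPH fields before the array */
	uint32_t region_index, open_flags, flags, nr_ranges;
	uint16_t steering_tag;
	uint8_t  ph;
	uint8_t  reserved[5];
	struct range dma_ranges[];
};

/* Hypothetical kernel rule: use the new layout only if userspace set FL_TPH. */
static size_t kernel_ranges_offset(uint32_t flags)
{
	return (flags & FL_TPH) ? offsetof(struct hdr_new, dma_ranges)
				: offsetof(struct hdr_old, dma_ranges);
}

/* An unmodified application rebuilt against the new header fills
 * dma_ranges through struct hdr_new, but it predates TPH and leaves
 * flags at 0. Returns nonzero when the kernel would read elsewhere. */
static int rebuilt_legacy_app_misparsed(void)
{
	size_t written_at = offsetof(struct hdr_new, dma_ranges);
	uint32_t flags = 0;

	return kernel_ranges_offset(flags) != written_at;
}
```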

* Re: [RFC v2 1/2] vfio: add callback to get tph info for dmabuf
  2026-03-31 14:13                 ` Keith Busch
@ 2026-03-31 19:02                   ` Leon Romanovsky
  2026-03-31 19:44                     ` Keith Busch
  0 siblings, 1 reply; 14+ messages in thread
From: Leon Romanovsky @ 2026-03-31 19:02 UTC (permalink / raw)
  To: Keith Busch
  Cc: Zhiping Zhang, Jason Gunthorpe, Bjorn Helgaas, linux-rdma,
	linux-pci, netdev, dri-devel, Yochai Cohen, Yishai Hadas,
	Bjorn Helgaas

On Tue, Mar 31, 2026 at 08:13:58AM -0600, Keith Busch wrote:
> On Tue, Mar 31, 2026 at 05:03:09PM +0300, Leon Romanovsky wrote:
> > I understand; my proposal is to always set the TPH flag when the new
> > struct is used.
> 
> An existing application recompiled against the new kernel API implicitly
> uses the new struct layout without setting the TPH flag, so with your
> proposal the kernel and application are out of sync on where dma_ranges
> starts.

Right, what about adding TPH fields to struct vfio_region_dma_range
instead of struct vfio_device_feature_dma_buf?

Thanks

* Re: [RFC v2 1/2] vfio: add callback to get tph info for dmabuf
  2026-03-31 19:02                   ` Leon Romanovsky
@ 2026-03-31 19:44                     ` Keith Busch
  0 siblings, 0 replies; 14+ messages in thread
From: Keith Busch @ 2026-03-31 19:44 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Zhiping Zhang, Jason Gunthorpe, Bjorn Helgaas, linux-rdma,
	linux-pci, netdev, dri-devel, Yochai Cohen, Yishai Hadas,
	Bjorn Helgaas

On Tue, Mar 31, 2026 at 10:02:20PM +0300, Leon Romanovsky wrote:
> 
> Right, what about adding TPH fields to struct vfio_region_dma_range
> instead of struct vfio_device_feature_dma_buf?

You might have to show me with code what you're talking about because I
can't see any way we can add fields to any struct here without breaking
backward compatibility.
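One way to see why adding fields to `struct vfio_region_dma_range` runs into the same wall: `dma_ranges` is an array, so growing its element type changes the stride the kernel uses to walk it. A sketch assuming the current element is two 64-bit fields; `range_old`, `range_new` and the helper functions are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed current element layout of struct vfio_region_dma_range. */
struct range_old {
	uint64_t offset;
	uint64_t length;
};

/* Hypothetical element with per-range TPH fields appended. */
struct range_new {
	uint64_t offset;
	uint64_t length;
	uint16_t steering_tag;
	uint8_t  ph;
	uint8_t  reserved[5];
};

/* Byte offset of element i, relative to the start of dma_ranges. */
static size_t elem_offset_old(size_t i) { return i * sizeof(struct range_old); }
static size_t elem_offset_new(size_t i) { return i * sizeof(struct range_new); }

/* Bytes the kernel would copy in for nr ranges under each layout;
 * an old binary only supplies the smaller amount. */
static size_t array_bytes_old(uint32_t nr) { return nr * sizeof(struct range_old); }
static size_t array_bytes_new(uint32_t nr) { return nr * sizeof(struct range_new); }
```

Only element 0 lines up; every later range would be misread, and a kernel using the larger element size would read past the end of an old binary's array.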

If we can't claim bits out of the unused "flags" field for this feature,
then my initial reply is the only sane approach: we can introduce a new
feature and struct for it that closely mirrors the existing one, but
with the extra hint fields.
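A sketch of what such a separate feature could look like, under stated assumptions: `FEAT_DMA_BUF` and `FEAT_DMA_BUF_TPH` are placeholder IDs (not real `VFIO_DEVICE_FEATURE_*` values), `struct vfio_region_dma_range` is assumed to be two 64-bit fields, and `struct feat_dma_buf_tph` is a hypothetical mirror of the existing struct with the hint fields placed before `dma_ranges`. Because the feature ID selects the struct, the kernel never has to guess the offset. The 2-bit PH and 16-bit steering tag widths follow the RFC's flag encoding.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder feature IDs for illustration only. */
enum hyp_feature { FEAT_DMA_BUF = 1, FEAT_DMA_BUF_TPH = 2 };

struct range { uint64_t offset; uint64_t length; };   /* assumed layout */

/* Existing request, unchanged. */
struct feat_dma_buf {
	uint32_t region_index;
	uint32_t open_flags;
	uint32_t flags;
	uint32_t nr_ranges;
	struct range dma_ranges[];
};

/* Hypothetical new request: same leading fields plus the hints. */
struct feat_dma_buf_tph {
	uint32_t region_index;
	uint32_t open_flags;
	uint32_t flags;
	uint32_t nr_ranges;
	uint16_t steering_tag;
	uint8_t  ph;              /* processing hint, only 2 bits valid */
	uint8_t  reserved[5];     /* must be zero so it can be reused later */
	struct range dma_ranges[];
};

/* The feature ID alone tells the kernel where dma_ranges starts. */
static size_t ranges_offset_for(enum hyp_feature f)
{
	switch (f) {
	case FEAT_DMA_BUF:
		return offsetof(struct feat_dma_buf, dma_ranges);
	case FEAT_DMA_BUF_TPH:
		return offsetof(struct feat_dma_buf_tph, dma_ranges);
	}
	return 0;
}

/* Reject out-of-range hints and nonzero reserved bytes. */
static int tph_hdr_valid(const struct feat_dma_buf_tph *h)
{
	if (h->ph > 3)
		return 0;
	for (size_t i = 0; i < sizeof(h->reserved); i++)
		if (h->reserved[i])
			return 0;
	return 1;
}
```

Requiring the reserved bytes to be zero is a common uapi convention that leaves room to assign them meaning later without another new struct.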

Thread overview: 14+ messages
     [not found] <20260324234615.3731237-1-zhipingz@meta.com>
     [not found] ` <20260324234615.3731237-2-zhipingz@meta.com>
2026-03-25  8:25   ` [RFC v2 1/2] vfio: add callback to get tph info for dmabuf Leon Romanovsky
2026-03-26 22:41     ` Keith Busch
2026-03-26 22:55       ` Zhiping Zhang
2026-03-31  8:39         ` Leon Romanovsky
2026-03-31  8:37       ` Leon Romanovsky
2026-03-31 13:00         ` Keith Busch
2026-03-31 13:29           ` Leon Romanovsky
2026-03-31 13:35             ` Keith Busch
2026-03-31 14:03               ` Leon Romanovsky
2026-03-31 14:13                 ` Keith Busch
2026-03-31 19:02                   ` Leon Romanovsky
2026-03-31 19:44                     ` Keith Busch
2026-03-28  2:21   ` fengchengwen
2026-03-31  0:49     ` Zhiping Zhang
