From: Leon Romanovsky <leon@kernel.org>
To: Zhiping Zhang <zhipingz@meta.com>
Cc: Alex Williamson <alex@shazbot.org>,
Stanislav Fomichev <sdf@meta.com>,
Keith Busch <kbusch@kernel.org>, Jason Gunthorpe <jgg@ziepe.ca>,
Bjorn Helgaas <helgaas@kernel.org>,
linux-rdma@vger.kernel.org, linux-pci@vger.kernel.org,
netdev@vger.kernel.org, dri-devel@lists.freedesktop.org,
Yochai Cohen <yochai@nvidia.com>,
Yishai Hadas <yishaih@nvidia.com>
Subject: Re: [PATCH v1 1/2] vfio: add callback to get tph info for dma-buf
Date: Mon, 27 Apr 2026 21:35:13 +0300
Message-ID: <20260427183513.GK440345@unreal>
In-Reply-To: <CAH3zFs2Sy0mv=QkK4VSV+MVR=ef_CdoxMhTFgzaqoZ+uSOpxoQ@mail.gmail.com>
On Mon, Apr 27, 2026 at 07:28:57AM -0700, Zhiping Zhang wrote:
> On Mon, Apr 27, 2026 at 6:37 AM Leon Romanovsky <leon@kernel.org> wrote:
> >
> > >
> > On Wed, Apr 22, 2026 at 09:23:27AM -0600, Alex Williamson wrote:
> > > On Mon, 20 Apr 2026 11:39:15 -0700
> > > Zhiping Zhang <zhipingz@meta.com> wrote:
> > >
> > > > Add a dma-buf callback that returns raw TPH metadata from the exporter
> > > > so peer devices can reuse the steering tag and processing hint
> > > > associated with a VFIO-exported buffer.
> > > >
> > > > Keep the existing VFIO_DEVICE_FEATURE_DMA_BUF uAPI layout intact by
> > > > using a flag plus one extra trailing entries[] object for the optional
> > > > TPH metadata. Rename the uAPI field dma_ranges to entries. The
> > > > nr_ranges field remains the DMA range count; when VFIO_DMABUF_FLAG_TPH
> > > > is set the kernel reads one extra entry beyond nr_ranges for the TPH
> > > > metadata.
> > > >
> > > > Add an st_width parameter to get_tph() so the exporter can reject
> > > > steering tags that exceed the consumer's supported width (8 vs 16 bit).
> > > > When no TPH metadata was supplied, make get_tph() return -EOPNOTSUPP.
> > > >
> > > > Signed-off-by: Zhiping Zhang <zhipingz@meta.com>
> > > > ---
> > > > drivers/vfio/pci/vfio_pci_dmabuf.c | 62 +++++++++++++++++++++++-------
> > > > include/linux/dma-buf.h | 17 ++++++++
> > > > include/uapi/linux/vfio.h | 28 ++++++++++++--
> > > > 3 files changed, 89 insertions(+), 18 deletions(-)
> >
> > <...>
> >
> > > > diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> > > > index bb7b89330d35..a0bd24623c52 100644
> > > > --- a/include/uapi/linux/vfio.h
> > > > +++ b/include/uapi/linux/vfio.h
> > > > @@ -1490,16 +1490,36 @@ struct vfio_device_feature_bus_master {
> > > > * open_flags are the typical flags passed to open(2), eg O_RDWR, O_CLOEXEC,
> > > > * etc. offset/length specify a slice of the region to create the dmabuf from.
> > > > * nr_ranges is the total number of (P2P DMA) ranges that comprise the dmabuf.
> > > > + * When VFIO_DMABUF_FLAG_TPH is set, entries[] contains one extra trailing
> > > > + * object after the nr_ranges DMA ranges carrying the TPH steering tag and
> > > > + * processing hint.
> > >
> > > I really don't think we want to design an API where entries is
> > > implicitly one-off from what's actually there. This feeds back into
> > > the below removal of the __counted_by attribute, which is a red flag
> > > that this is the wrong approach.
> >
> > I believe removing `__counted_by` is a mistake. In my proposal, the intent
> > was to adjust the meaning of the storage object based on the flag bit.
> > The size of the array should still be represented correctly.
> >
> > Thanks
>
> Thanks Leon — you're right that __counted_by should be preserved. In
> your approach, when the flag is set, the last entry in the array
> carries the TPH data, so the effective DMA range count is nr_ranges -
> 1.

That is correct only if you keep the original name *nr_ranges*. However, the
field is worth renaming to something clearer, such as *nr_array_elements*,
so that it always describes the full size of the array.

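A minimal sketch of that accounting, assuming a renamed count field that
always covers the whole array. Every name below (DEMO_FLAG_TPH, demo_entry,
demo_dmabuf, demo_alloc, demo_nr_dma_ranges) is hypothetical and is not taken
from the patch or from the VFIO uAPI:

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical flag: the last array element carries TPH data, not a range. */
#define DEMO_FLAG_TPH (1u << 0)

/* Hypothetical element whose meaning depends on its position and the flag. */
union demo_entry {
	struct {
		uint64_t offset;
		uint64_t length;
	} range;
	struct {
		uint16_t steering_tag;
		uint8_t ph;
	} tph;
};

struct demo_dmabuf {
	uint32_t flags;
	uint32_t nr_array_elements;	/* counts every element of entries[] */
	/* In kernel code this would carry __counted_by(nr_array_elements). */
	union demo_entry entries[];
};

/* Allocate a zeroed object with room for nr array elements. */
static struct demo_dmabuf *demo_alloc(uint32_t flags, uint32_t nr)
{
	struct demo_dmabuf *d;

	d = calloc(1, sizeof(*d) + (size_t)nr * sizeof(d->entries[0]));
	if (!d)
		return NULL;
	d->flags = flags;
	d->nr_array_elements = nr;
	return d;
}

/* Derive the DMA range count; the array size itself is never adjusted. */
static int demo_nr_dma_ranges(const struct demo_dmabuf *d, uint32_t *nr)
{
	uint32_t reserved = (d->flags & DEMO_FLAG_TPH) ? 1 : 0;

	if (d->nr_array_elements <= reserved)
		return -EINVAL;	/* need at least one DMA range */
	*nr = d->nr_array_elements - reserved;
	return 0;
}
```

With the count describing the array itself, the bounds annotation stays
accurate and only the derived range count depends on the flag.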
>
> That said, after discussing internally, we're leaning toward
> introducing a new VFIO device feature with dedicated TPH fields (as
> Alex suggested too), to avoid overloading vfio_region_dma_range with a
> union that changes semantics based on position.
>
> Would you have concerns with that direction? I'll post a v3 with the
> new approach.

I don’t have any concerns. My only worry was about “stealing” too many
bits from the flags variable, and you’ve avoided that here.

Thanks
Thread overview: 13+ messages
2026-04-20 18:39 [PATCH v1 0/2] Retrieve TPH from dma-buf for PCIe P2P memory access Zhiping Zhang
2026-04-20 18:39 ` [PATCH v1 1/2] vfio: add callback to get tph info for dma-buf Zhiping Zhang
2026-04-22 15:23 ` Alex Williamson
2026-04-22 16:29 ` Jason Gunthorpe
2026-04-22 19:27 ` Alex Williamson
2026-04-23 14:28 ` Jason Gunthorpe
2026-04-23 19:20 ` Alex Williamson
2026-04-23 22:46 ` Jason Gunthorpe
2026-04-24 5:41 ` Zhiping Zhang
2026-04-27 13:37 ` Leon Romanovsky
2026-04-27 14:28 ` Zhiping Zhang
2026-04-27 18:35 ` Leon Romanovsky [this message]
2026-04-20 18:39 ` [PATCH v1 2/2] RDMA/mlx5: get tph for p2p access when registering dma-buf mr Zhiping Zhang