From: Leon Romanovsky <leon@kernel.org>
To: Zhiping Zhang <zhipingz@meta.com>
Cc: Alex Williamson <alex@shazbot.org>,
Stanislav Fomichev <sdf@meta.com>,
Keith Busch <kbusch@kernel.org>, Jason Gunthorpe <jgg@ziepe.ca>,
Bjorn Helgaas <helgaas@kernel.org>,
linux-rdma@vger.kernel.org, linux-pci@vger.kernel.org,
netdev@vger.kernel.org, dri-devel@lists.freedesktop.org,
Yochai Cohen <yochai@nvidia.com>,
Yishai Hadas <yishaih@nvidia.com>
Subject: Re: [PATCH v1 1/2] vfio: add callback to get tph info for dma-buf
Date: Mon, 27 Apr 2026 21:35:13 +0300
Message-ID: <20260427183513.GK440345@unreal>
In-Reply-To: <CAH3zFs2Sy0mv=QkK4VSV+MVR=ef_CdoxMhTFgzaqoZ+uSOpxoQ@mail.gmail.com>
On Mon, Apr 27, 2026 at 07:28:57AM -0700, Zhiping Zhang wrote:
> On Mon, Apr 27, 2026 at 6:37 AM Leon Romanovsky <leon@kernel.org> wrote:
> >
> > >
> > On Wed, Apr 22, 2026 at 09:23:27AM -0600, Alex Williamson wrote:
> > > On Mon, 20 Apr 2026 11:39:15 -0700
> > > Zhiping Zhang <zhipingz@meta.com> wrote:
> > >
> > > > Add a dma-buf callback that returns raw TPH metadata from the exporter
> > > > so peer devices can reuse the steering tag and processing hint
> > > > associated with a VFIO-exported buffer.
> > > >
> > > > Keep the existing VFIO_DEVICE_FEATURE_DMA_BUF uAPI layout intact by
> > > > using a flag plus one extra trailing entries[] object for the optional
> > > > TPH metadata. Rename the uAPI field dma_ranges to entries. The
> > > > nr_ranges field remains the DMA range count; when VFIO_DMABUF_FLAG_TPH
> > > > is set the kernel reads one extra entry beyond nr_ranges for the TPH
> > > > metadata.
> > > >
> > > > Add an st_width parameter to get_tph() so the exporter can reject
> > > > steering tags that exceed the consumer's supported width (8 vs 16 bit).
> > > > When no TPH metadata was supplied, make get_tph() return -EOPNOTSUPP.
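A minimal sketch of the st_width check described above. It uses a hypothetical exporter-side struct example_tph and a hypothetical callback example_get_tph(); neither is the patch's actual code. The thread does not say which errno an over-wide steering tag produces, so -EINVAL here is an assumption. Only the -EOPNOTSUPP case for missing metadata comes from the patch description.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical exporter-side TPH state (illustrative names only). */
struct example_tph {
	bool     present;       /* userspace supplied TPH metadata */
	uint16_t steering_tag;  /* raw steering tag value */
	uint8_t  ph;            /* processing hint */
};

/*
 * Sketch of a get_tph()-style callback. st_width is the steering-tag
 * width the consumer supports (8 or 16 bits).
 */
static int example_get_tph(const struct example_tph *tph,
			   unsigned int st_width, uint16_t *st, uint8_t *ph)
{
	if (!tph->present)
		return -EOPNOTSUPP;     /* no TPH metadata was supplied */
	if (st_width != 8 && st_width != 16)
		return -EINVAL;         /* assumption: only 8/16 are valid */
	if (st_width == 8 && tph->steering_tag > 0xff)
		return -EINVAL;         /* assumption: tag too wide for consumer */
	*st = tph->steering_tag;
	*ph = tph->ph;
	return 0;
}
```

A 16-bit tag such as 0x1234 is accepted for a 16-bit consumer and rejected for an 8-bit one, while an exporter with no TPH metadata always reports -EOPNOTSUPP.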
> > > >
> > > > Signed-off-by: Zhiping Zhang <zhipingz@meta.com>
> > > > ---
> > > > drivers/vfio/pci/vfio_pci_dmabuf.c | 62 +++++++++++++++++++++++-------
> > > > include/linux/dma-buf.h | 17 ++++++++
> > > > include/uapi/linux/vfio.h | 28 ++++++++++++--
> > > > 3 files changed, 89 insertions(+), 18 deletions(-)
> >
> > <...>
> >
> > > > diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> > > > index bb7b89330d35..a0bd24623c52 100644
> > > > --- a/include/uapi/linux/vfio.h
> > > > +++ b/include/uapi/linux/vfio.h
> > > > @@ -1490,16 +1490,36 @@ struct vfio_device_feature_bus_master {
> > > > * open_flags are the typical flags passed to open(2), eg O_RDWR, O_CLOEXEC,
> > > > * etc. offset/length specify a slice of the region to create the dmabuf from.
> > > > * nr_ranges is the total number of (P2P DMA) ranges that comprise the dmabuf.
> > > > + * When VFIO_DMABUF_FLAG_TPH is set, entries[] contains one extra trailing
> > > > + * object after the nr_ranges DMA ranges carrying the TPH steering tag and
> > > > + * processing hint.
> > >
> > > I really don't think we want to design an API where entries is
> > > implicitly one-off from what's actually there. This feeds back into
> > > the below removal of the __counted_by attribute, which is a red flag
> > > that this is the wrong approach.
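To make the objection concrete, the sketch below models the v1 layout with hypothetical stand-in types: example_dma_buf_v1, example_range, and EXAMPLE_DMABUF_FLAG_TPH, whose bit value is assumed. It counts how many entries[] elements the v1 scheme would read. When the TPH flag is set, that count is one more than nr_ranges, so a __counted_by(nr_ranges) annotation would no longer describe the array. The annotation appears only as a comment because it is a kernel/compiler attribute.

```c
#include <stddef.h>
#include <stdint.h>

#define EXAMPLE_DMABUF_FLAG_TPH (1u << 0)  /* assumed bit position */

/* Hypothetical stand-in for one DMA range entry. */
struct example_range {
	uint64_t offset;
	uint64_t length;
};

/* Hypothetical stand-in for the v1 feature payload. */
struct example_dma_buf_v1 {
	uint32_t flags;
	uint32_t nr_ranges;               /* counts DMA ranges only */
	struct example_range entries[];   /* would be __counted_by(nr_ranges) */
};

/* Number of entries[] elements the v1 scheme actually reads. */
static size_t example_v1_entries_read(const struct example_dma_buf_v1 *hdr)
{
	return (size_t)hdr->nr_ranges +
	       ((hdr->flags & EXAMPLE_DMABUF_FLAG_TPH) ? 1 : 0);
}

/* Bytes of argument that would have to be copied from userspace. */
static size_t example_v1_arg_size(const struct example_dma_buf_v1 *hdr)
{
	return sizeof(*hdr) +
	       example_v1_entries_read(hdr) * sizeof(struct example_range);
}
```

With nr_ranges = 2 and the flag set, example_v1_entries_read() returns 3 while the declared count stays 2. That implicit off-by-one is the mismatch being flagged here.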
> >
> > I believe removing `__counted_by` is a mistake. In my proposal, the intent
> > was to adjust the meaning of the storage object based on the flag bit.
> > The size of the array should still be represented correctly.
> >
> > Thanks
>
> Thanks Leon — you're right that __counted_by should be preserved. In
> your approach, when the flag is set, the last entry in the array
> carries the TPH data, so the effective DMA range count is nr_ranges -
> 1.
That holds only if you keep the original name *nr_ranges*. However, the
field is worth renaming to something clearer, such as
*nr_array_elements*.
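A sketch of the alternative discussed here, under stated assumptions. The count field, renamed nr_array_elements as suggested, covers every slot, so a __counted_by annotation stays accurate. When the TPH flag is set, the last slot is reinterpreted as TPH metadata through a union. All type names, the flag value, and the rule that at least one DMA range must precede the TPH slot are hypothetical.

```c
#include <errno.h>
#include <stdint.h>

#define EXAMPLE_DMABUF_FLAG_TPH (1u << 0)  /* assumed bit position */

struct example_range { uint64_t offset; uint64_t length; };

/* Hypothetical TPH payload padded to the size of one range (16 bytes). */
struct example_tph_entry {
	uint16_t steering_tag;
	uint8_t  ph;
	uint8_t  rsvd[13];
};

/* One storage slot whose meaning depends on position and the flag. */
union example_slot {
	struct example_range     range;
	struct example_tph_entry tph;
};

struct example_dma_buf_alt {
	uint32_t flags;
	uint32_t nr_array_elements;       /* every slot, TPH slot included */
	union example_slot entries[];     /* __counted_by(nr_array_elements) */
};

/* Number of DMA ranges, or a negative errno for an invalid layout. */
static int64_t example_nr_dma_ranges(const struct example_dma_buf_alt *hdr)
{
	uint32_t n = hdr->nr_array_elements;

	if (hdr->flags & EXAMPLE_DMABUF_FLAG_TPH) {
		if (n < 2)  /* assumption: need >= 1 range plus the TPH slot */
			return -EINVAL;
		return (int64_t)n - 1;  /* last slot carries TPH */
	}
	return n ? (int64_t)n : -EINVAL;
}
```

The array size is always stated exactly. The cost is that the meaning of the last slot depends on both its position and a flag bit. That union-by-position is what the next paragraph proposes to avoid.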
>
> That said, after discussing internally, we're leaning toward
> introducing a new VFIO device feature with dedicated TPH fields (as
> Alex suggested too), to avoid overloading vfio_region_dma_range with a
> union that changes semantics based on position.
>
> Would you have concerns with that direction? I'll post a v3 with the
> new approach.
I don’t have any concerns. My only worry was about “stealing” too many
bits from the flags variable, and you’ve avoided that here.
Thanks
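For comparison, a dedicated feature can carry TPH in named fields. With that layout, entries[] only ever holds DMA ranges and nr_ranges stays exact. No flag bits are consumed either. The struct below is purely hypothetical: the thread only says that v3 will use dedicated TPH fields, not what the layout will be.

```c
#include <stddef.h>
#include <stdint.h>

struct example_range { uint64_t offset; uint64_t length; };

/* Hypothetical dedicated-feature payload with explicit TPH fields. */
struct example_dma_buf_tph {
	uint32_t region_index;
	uint32_t open_flags;
	uint16_t steering_tag;            /* named field, no positional union */
	uint8_t  ph;                      /* processing hint */
	uint8_t  reserved;                /* keeps nr_ranges 4-byte aligned */
	uint32_t nr_ranges;               /* counts DMA ranges only */
	struct example_range entries[];   /* __counted_by(nr_ranges) */
};

/* Argument size: the header plus exactly nr_ranges ranges. */
static size_t example_tph_arg_size(const struct example_dma_buf_tph *hdr)
{
	return sizeof(*hdr) + (size_t)hdr->nr_ranges * sizeof(struct example_range);
}
```

Here the declared count and the number of elements read always agree, which is the property the earlier messages wanted preserved.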
Thread overview: 10+ messages
[not found] <20260420183920.3626389-1-zhipingz@meta.com>
[not found] ` <20260420183920.3626389-2-zhipingz@meta.com>
2026-04-22 15:23 ` [PATCH v1 1/2] vfio: add callback to get tph info for dma-buf Alex Williamson
2026-04-22 16:29 ` Jason Gunthorpe
2026-04-22 19:27 ` Alex Williamson
2026-04-23 14:28 ` Jason Gunthorpe
2026-04-23 19:20 ` Alex Williamson
2026-04-23 22:46 ` Jason Gunthorpe
2026-04-24 5:41 ` Zhiping Zhang
2026-04-27 13:37 ` Leon Romanovsky
2026-04-27 14:28 ` Zhiping Zhang
2026-04-27 18:35 ` Leon Romanovsky [this message]