From: Pranjal Shrivastava <praan@google.com>
To: Jason Gunthorpe <jgg@ziepe.ca>
Cc: linux-pci@vger.kernel.org, linux-kernel@vger.kernel.org,
kvm@vger.kernel.org, Bjorn Helgaas <bhelgaas@google.com>,
Logan Gunthorpe <logang@deltatee.com>,
Alex Williamson <alex@shazbot.org>,
Kevin Tian <kevin.tian@intel.com>,
Ankit Agrawal <ankita@nvidia.com>, Matt Evans <mattev@meta.com>,
Vivek Kasireddy <vivek.kasireddy@intel.com>,
Leon Romanovsky <leon@kernel.org>,
Shivaji Kant <shivajikant@google.com>,
Samiullah Khawaja <skhawaja@google.com>
Subject: Re: [RFC PATCH 0/5] vfio/pci: Support ZONE_DEVICE-backed P2P Registration
Date: Fri, 12 Jun 2026 14:50:18 +0000
Message-ID: <aiwcquSgAonkh_6L@google.com>
In-Reply-To: <20260611221447.GH1066031@ziepe.ca>

On Thu, Jun 11, 2026 at 07:14:47PM -0300, Jason Gunthorpe wrote:
> On Thu, Jun 11, 2026 at 02:40:17PM +0000, Pranjal Shrivastava wrote:
> > On Wed, Jun 10, 2026 at 01:28:48PM -0300, Jason Gunthorpe wrote:
> > > On Wed, Jun 10, 2026 at 03:18:48PM +0000, Pranjal Shrivastava wrote:
> > >
> > > > Users utilize the standard sysfs p2pmem/allocate interface for managing
> > > > memory slices once a BAR is registered.
> > >
> > > I'm shocked someone wants to use API, what are you expecting to do
> > > with it??
> >
> > Our primary use-case is PCIe BAR (DDR / HBM) -> NFS via P2PDMA while the
> > PCIe device is managed by a user-space driver based on vfio-pci. While
> > kernel drivers (e.g. drm) can register BARs with ZONE_DEVICE natively to
> > enable this, VFIO currently lacks an equivalent mechanism.
>
> I mean the weird sysfs mmap API. It is only useful if the device is
> basically pure memory with no functionality. You can't even learn what
> MMIO offset the returned allocation gives so it is almost completely
> useless.
>
> nvme could use it because CMB is pure memory and you reference it by
> its MMIO address, but that doesn't apply to VFIO..
>

Ack, I agree: the sysfs allocation interface doesn't provide offset-level
control. I'll pivot entirely to the DMABUF approach.

> > > > > An alternative implementation has been explored which integrates with the
> > > > ongoing VFIO DMABUF-mmap refactor [1]. In that approach, rather than
> > > > registering a BAR as a system-wide P2P provider, VFIO optionally
> > > > allocates ZONE_DEVICE pages only for specifically exported DMABUFs via a
> > > > new VFIO_DMA_BUF_FLAG_ALLOC_STRUCT_PAGES flag.
> > >
> > > That's probably more sensible but you can't have a DMABUF mmap
> > > actually install non-special memory. The native vfio mmap still can,
> > > but not mmap on the dmabuf fd. That's still workable, just keep in
> > > mind.
> >
> > Ack. I guess, we could have a separate mmap path in case of BARs that are
> > struct page backed which doesn't go through the dmabuf exporter.
>
> The dmabuf export is perfectly fine, you just have to think very
> carefully about the mmap path.
>
> I suppose if you build the proper revocation fence for zone device
> pages as part of the vfio implementation it would be OK for dmabuf
> mmap to expose them as well since it would have the right lifecycle
> model.
>

Ack, I'll move forward with adding a flag to request a ZONE_DEVICE-backed
DMABUF export (the 'Alternative Approach' mentioned in the cover letter).
And yes, I agree we need to ensure the mmap path is handled carefully
with the correct lifecycle in mind.

> That's the tricky thing with zone_device, you have to be careful to
> wait for all the page references to be put back at all the right
> times.

Yeah, that's going to be tricky. I'm wondering whether we could have a zap
model there: if the device is gone or going through a reset, could we
handle the refcounts accordingly?
>
> Come to think of it, since the sysfs API cannot do that in the way
> VFIO wants I actually think you can't use it..

Ack. Baking this into the VFIO DMABUF path allows us to enforce the right
lifecycle.

My plan for RFC v2 is to add a flag like VFIO_DMA_BUF_FLAG_ZONE_DEVICE
to struct vfio_device_feature_dma_buf, which allows the caller to opt in
to ZONE_DEVICE backing specifically for that export.

Does this opt-in flag sound like a reasonable uAPI, or do you see any
concerns with this direction?

Otherwise, as you noted, the lifecycle and the mmap path remain the main
problems to solve.

Thanks,
Praan