From: Jason Gunthorpe <jgg@ziepe.ca>
To: Alex Williamson <alex.williamson@redhat.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>,
"kvm@vger.kernel.org" <kvm@vger.kernel.org>,
"Kumar, Praveen" <pravkmr@amazon.de>,
"Adam, Mahmoud" <mngyadam@amazon.de>,
"Woodhouse, David" <dwmw@amazon.co.uk>,
"nagy@khwaternagy.com" <nagy@khwaternagy.com>
Subject: Re: [RFC PATCH 0/9] vfio: Introduce mmap maple tree
Date: Tue, 12 Aug 2025 21:17:10 -0300 [thread overview]
Message-ID: <20250813001710.GD599331@ziepe.ca> (raw)
In-Reply-To: <20250812132642.634b542d.alex.williamson@redhat.com>
On Tue, Aug 12, 2025 at 01:26:42PM -0600, Alex Williamson wrote:
> > On Mon, Aug 11, 2025 at 04:07:10PM -0600, Alex Williamson wrote:
> > > We do this today with device specific regions, see
> > > vfio_pci_core_register_dev_region(). We use this to provide several
> > > additional regions for IGD. If we had an interface for users to
> > > trigger new regions we'd need some protection for exceeding the index
> > > space (-ENOSPC), but adding a small number of regions is not a problem.
> >
> > That is pretty incomplete..
> >
> > If we go down the maple tree direction I expect to eliminate
> > vfio_pci_core_register_dev_region() and replace it with core code
> > handling the dispatch of mmap and rw through the struct vfio_mmap.
>
> Do note the inconsistency of having a vfio_mmap object that's used for
> read/write.
We can bikeshed the naming at some later point.
> > I know, but I really dislike this as a uAPI. It becomes confusing for
> > the user to get a list of, what should be, physical regions and now
> > suddenly has to deal with non-physial alias regions it may have
> > created.
>
> This is subjective though. Personally I find it confusing to have this
> fixation that BAR0 is _only_ accessed through region index 0.
I don't see why. The user story here is "give me a WC mapping for 4096
bytes at offset 0x1245000 in BAR 0"
My version of the uAPI is very simple and logical:
ioctl(fd, GET_MMAP, {.region_index = VFIO_PCI_BAR0_REGION_INDEX,
.map_flags = WC,
.mmap_pgoff = &pgoff})
reg = mmap(NULL, 4098, prot, MMAP_SHARED, fd, pgoff + 0x1245000)
Simple direct, exactly matches the user story.
Why force the user to mess with some virtual region index, I can't see
any justification for that, and it only makes the user's job more
complicated.
Even if you take the position that region indexes are dynamic, it
still makes sense, for a user story, to have an API flow like
region_index = get_region_by_name("registers");
ioctl(fd, GET_MMAP, {.region_index = region_index,
.map_flags = WC,
.mmap_pgoff = &pgoff})
reg = mmap(NULL, 4098, prot, MMAP_SHARED, fd, pgoff + 0x1245000)
And I can easially imagine something like the VFIO abstraction layer
in DPDK happily working like the above.
In terms of uAPI usage from a user there is zero value in adding a
virtual region index to this flow. It does nothing to help the user at
all, it just some weird hoop to jump through.
> > int vfio_pci_core_mmap(struct vfio_device *core_vdev, struct vm_area_struct *vma, struct vfio_mmap *mmap)
> > {
> > index = mmap->index;
> >
> > The code is simpler and cleaner, it generalizes outside of PCI. No
> > more open coding vm_pgoff shifting. We get nice things like pgoff
> > packing, better 32 bit compatability, and a huge number of mmap
> > cookies for whatever we need.
>
> We're in agreement that we need to derive an object from a pgoff and
> that will universally replace embedding the region index in the high
> order bits of the pgoff.
Ok
> However everything else here is just the implementation of that
> returned object. The object could easily have a region field that
> maps to the uAPI region and a device region field that maps to the
> physical resource. Some objects have the same value for these
> fields, some objects don't. It's not fundamentally important.
Again, I agree we can implement what you are proposing, and it can be
implemented with the maple tree pgoff cleanups too (maybe even
requires it to really be a clean implementation as well).
I just don't think it is a good uapi design for what is ultimately a
very simple user action.
> > > > It would not be another region index. That is the whole point. It is
> > > > another pgoff for an existing index.
> > >
> > > I think this is turning a region index into something it was not meant
> > > to be.
> >
> > What do you mean? Region index is the uAPI we have to refer to a fixed
> > physical part of the device.
> >
> > Reallly, what makes more sense as userspace operations:
> >
> > 'give me a WC mapping for region index 0 which is BAR 0'
> >
> > 'Make a new region index for region index 0 which is bar 0 and then give me a mapping for region X"
> >
> > The first is very logical, the second is pretty obfuscated.
>
> Current situation: We already have an API that says give me a mapping
> for region X.
>
> Therefore a new operation that says "give me a new region index for
> mapping index 0 with attribute FOO" is much more congruent to our
> current API.
I think that is extremely subjective.
I view it as we have a api to get a mapping for region X, that is
nonextensible and lacks a feature we need.
The logical answer is to improve the API to get a mapping for region
X.
There is many precendents for this in Linux. We have an endless number
of systemcall extensions, eg open, openat(), openat2().
I view my suggestion of GET_REGION_MMAP as the openat() to
GET_REGION_INFO as open().
So I don't agree with your leap that GET_REGION_INFO is perfect and
the issue is we need an entirely new conceputal model for region_index
to keep GET_REGION_INFO unchanged. :(
> Oh! This was the proposal to use flags as an input value when we call
> REGION_INFO. Yeah, that's not possible. But the above two new APIs
> don't logically then follow as our only path forward though.
I'm not fussy here, any option to let us extend GET_REGION_INFO gets
my vote :)
> > > Introducing mmap cookies as a new mapping to a region where we can have
> > > N:1 cookies to region really seems unnecessarily complicated vs a 1:1
> > > cookie to region space.
> >
> > Your version is creating a 1:N:1 mapping, which I think is more
> > complicated. One physical region, N virtual regions, one pgoff.
>
> No, the region index to pgoff is 1:1, REGION_INFO returns 1 offset per
> region. Therefore I think it's 1:N:N in this denotation vs I think you
> want 1:1:N, but here you've referred to it as a _virtual_ region, which
> is exactly what it is, there is no 1:1 requirement for physical regions
> to virtual regions.
Ultimately the user is only dealing in the physical regions. That is
how it knows what MMIO to reach.
Adding an indirection where the user has to go from the physical
region, be it statically set as uAPI or discovered once dynamically,
to some other virtual region, and then to a pgoff, is confusing and
uncessary. It is not 1:1:N, I'm aiming for a simple 1:N with no new
virtual region concept at all.
> uAPI where a user could invoke a DEVICE_FEATURE (or pick your own
> IOCTL, but DEVICE_FEATURE is very extensible) to dynamically generate a
> new region with specified mmap attributes, DEVICE_INFO.num_regions is
> updated, the offset/mmap_cookie is obtained via REGION_INFO, along with
> a capability in the return buffer allowing introspection. New ioctls:
> potentially zero.
DEVICE_FEATURE_XX is basically a new ioctl, just multiplexed.
> vfio-core. I think this is where BenH and I are confused by the
> insistence to start with the maple tree rather than focusing on the API.
I guided working on the maple tree because doing so makes it trivial
and clean to implement the GET_MMAP style API which is what I have
been consistently advocating for.
Anyhow, I think we should stop writing these emails, this has gone on
long enough. If this last round hasn't convinced either one of us lets
call it quits. Tell them to do the extra region and help them make a
clean implementation if that is what you want to do.
Jason
next prev parent reply other threads:[~2025-08-13 0:17 UTC|newest]
Thread overview: 29+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-08-04 10:39 [RFC PATCH 0/9] vfio: Introduce mmap maple tree Mahmoud Adam
2025-08-04 10:39 ` [RFC PATCH 1/9] vfio: add mmap maple tree to vfio Mahmoud Adam
2025-08-04 10:39 ` [RFC PATCH 2/9] vfio: add transient ops to support vfio mmap mt Mahmoud Adam
2025-08-04 10:39 ` [RFC PATCH 3/9] vfio-pci-core: rename vm operations Mahmoud Adam
2025-08-04 10:39 ` [RFC PATCH 4/9] vfio-pci-core: remove redundant offset calculations Mahmoud Adam
2025-08-04 10:39 ` [RFC PATCH 5/9] vfio-pci-core: add vfio_pci_mmap & helpers Mahmoud Adam
2025-08-04 10:39 ` [RFC PATCH 6/9] vfio-pci-core: support the new vfio ops Mahmoud Adam
2025-08-04 10:40 ` [RFC PATCH 7/9] vfio-pci: use " Mahmoud Adam
2025-08-04 10:40 ` [RFC PATCH 8/9] vfio: UAPI for setting mmap attributes Mahmoud Adam
2025-08-04 10:40 ` [RFC PATCH 9/9] vfio_pci_core: support mmap attrs uapi & WC Mahmoud Adam
2025-08-04 18:49 ` [RFC PATCH 0/9] vfio: Introduce mmap maple tree Alex Williamson
2025-08-04 20:09 ` Mahmoud Nagy Adam
2025-08-05 14:31 ` Jason Gunthorpe
2025-08-05 15:48 ` Mahmoud Nagy Adam
2025-08-05 18:50 ` [RFC " Jason Gunthorpe
2025-08-05 19:00 ` Alex Williamson
[not found] ` <80dc87730f694b2d6e6aabbd29df49cf3c7c44fb.camel@amazon.com>
[not found] ` <20250806115224.GB377696@ziepe.ca>
2025-08-07 8:12 ` Herrenschmidt, Benjamin
2025-08-07 19:06 ` Alex Williamson
2025-08-11 15:55 ` Jason Gunthorpe
2025-08-11 22:07 ` Alex Williamson
2025-08-12 0:30 ` Jason Gunthorpe
2025-08-12 19:26 ` Alex Williamson
2025-08-13 0:17 ` Jason Gunthorpe [this message]
2025-08-14 8:39 ` Mahmoud Nagy Adam
2025-08-14 9:52 ` Mahmoud Nagy Adam
2025-08-14 17:52 ` Alex Williamson
2025-08-28 8:53 ` Mahmoud Nagy Adam
2025-08-28 19:17 ` Alex Williamson
2025-08-07 8:13 ` Benjamin Herrenschmidt
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20250813001710.GD599331@ziepe.ca \
--to=jgg@ziepe.ca \
--cc=alex.williamson@redhat.com \
--cc=benh@kernel.crashing.org \
--cc=dwmw@amazon.co.uk \
--cc=kvm@vger.kernel.org \
--cc=mngyadam@amazon.de \
--cc=nagy@khwaternagy.com \
--cc=pravkmr@amazon.de \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).