From: Benjamin Herrenschmidt <benh@kernel.crashing.org>
To: kvm@vger.kernel.org
Cc: "Adam, Mahmoud" <mngyadam@amazon.de>
Subject: Re: [RFC PATCH 0/9] vfio: Introduce mmap maple tree
Date: Thu, 07 Aug 2025 18:13:07 +1000 [thread overview]
Message-ID: <431011ca82cd7db46a704b43f122f845af0d26f1.camel@kernel.crashing.org> (raw)
In-Reply-To: <20250806115224.GB377696@ziepe.ca>
Re-sending with the list this time. Also I've fixed my email so back to
benh@kernel.crashing.org :)
1. On Wed, 2025-08-06 at 08:52 -0300, Jason Gunthorpe wrote:
>
>
> On Tue, Aug 05, 2025 at 10:19:29PM +0000, Herrenschmidt, Benjamin
> wrote:
> >
> > Right, that's my main objection here too. The way I see things is
> > that
> > we are somewhat conflating two different issues:
> >
> > * Wanting to change the index -> offset "cooking" mapping to
> > create a
> > more dense cookie space
> >
> > * Wanting to define attributes for mappings...
>
> I don't think that is right. The first is, yes, creating more dense
> cookie space which I think we need to do the second item, which is
> really 'add 2x more cookies, or maybe more'.
Hrm... the end *goal* is to define attribute for mappings. This whole
conversation is "we want to map things WC". "defining more cookies" is
a possible *how* here.
The reason I don't like presenting it that way is that you conflate
what we are trying to achieve with an implementation detail.
By defining more cookies, what you mean is creating multiple cookies
pointing to the same physical regions with different mapping
attributes, correct ? This is very similar to my sub-region idea in
practice I think unless I misunderstand.
> One of the things we want to do with more cookies is to create unique
> cookies for WC mmap requests.
Well, we want to be able to WC map. Introducing "more cookies" again is
just one way to get there. How do you create those cookies ? Upon
request or each region automatically gets multiple with different
attributes ? Do they represent entire regions or subsets ? etc...
Here too, I'd rather we start from a clear idea of the actual UAPI we
want, then we can look at how it gets plumbed under the hood.
> I'm not convinced we should try to introduce more cookies without
> improving the infrastructure. It is already pretty weak, I think
> trying to mangle the indexes or something "clever" would result in
> something even harder to maintain even if it might be less patches to
> write in the first place.
VFIO-pci already allocates dynamic indexes but yes, I don't disagree
that cleaning up the infrastructure first is a good idea. However, I
would prefer if we had a clearer idea of the end goal first,
specifically the details/shape of the UAPI and underlying structure
*before* we start hacking at this all.
> > * What I originally proposed ages ago when we discussed it at LPC
> > which is to have an ioctl to create "subregions" of a region with
> > different attributes. This means creating a new index (and a new
> > corresponding "cookie") that represents a portion of an existing
> > region
> > with a different attribute set to be later used by mmap.
>
> I never liked this, and it is still creating more cookies but now
> with
> weird hacky looking uapi.
"weird hacky looking" isn't a great way of understanding your
objections :-) Yes, it's a bit more complicated, it allows to break
down BARs basically into sub-regions with different attributes which is
fairly close to what users really want to do, but it does complexify
the UAPI a bit.
That said I'm not married to the idea, I just don't completely
understand the alternative you are proposing...
> > * A simpler approach which is to have an ioctl to change the
> > attributes over a range of a region, so that *any* mapping of that
> > range now uses that attribute. This can be done completely
> > dynamically
> > (see below).
>
> And I really, really don't like this. The meaning of a memmap cookie
> should not be changing dynamically. That is a bad precedent.
What do you mean "a cookie changing dynamically" ? The cookie doesn't
change. The attributes of a portion of a mapping change based on what
userspace requested. It is also a pretty close match to the use case
which is to define that a given part of the device MMIO space is meant
to be accessed WC. There is no fundamental change to the "cookies".
The biggest advantage of that approach is that it completely precludes
multiple conflicting mappings for a given region (at least within a
given process, though it might be possible to extend it globally if we
want) which has been a concern expressed last time we talked about this
by the folks from the ARM world.
> The uAPI I prefer is a 'get region info2' which would be simplified
> and allow userspace to pass in some flags. One of those flags can be
> 'request wc'.
So talking of "weird hacky looking" having something called "get_info"
taking "request" flags ... ahem ... :-)
If what you really mean is that under the hood, a given "index"
produces multiple "regions" (cookies) and "get_info2" allows to specify
which one you want to get info about ?
I mean ... this would work. But I don't see how it's superior to any of
the above. I don't care *that much*, but it does make it a bit harder
to avoid the multiple mappings issue and will waste much more cookie
space than any of the alternatives.
> This is logical and future extensible.
I disagree ... I find it actually cumbersome but we can just agree to
disagree here and as I said, I don't care that much as long as there is
no fundamental reason why it wouldn't work in the end.
Talking of cookie space, one thing we do need to preserve is the
natural alignment to the BAR size. Userspace *will* do "|" instead of
"+" on top of a cookie when mmap'ing (beyond mmap own alignment
requirements).
> > Now, within the context of VFIO PCI, since we use a fault handler,
> > it
> > doesn't even have to be done before mmap, we can just dynamically
> > fault
> > a given page with the right attributes (and zap mappings on
> > attributes
> > changes). I would go down that path personally and generalize the
> > fault
> > handler to all VFIO implementations.
>
> Even worse. fault by fault changing of attributes? No way, that's a
> completely crazy uAPI!
Why ? first userspace doesn't know or see it happens fault by fault
(and it could be full established at mmap time in fact, but fault time
makes the implementation a lot easier).
Here you are conflating implementation details with "uAPI" ... uAPI is
what is presented to userspace, which in this case is "set the
attributes for this portion of a region". Then at map time, that
portion of the region gets the requested attributes. There is nothing
in the uAPI that carries the fact that it happens at fault time.
You keep coming up with "ugly", "crazy" etc... without every actually
spelling out the technical pro/cons that would actually substantiate
those adjectives. Basically anything that isn't your
get_region2 is "crazy" or "ugly" ... NIH syndrome ?
> > For example, if I put a 4k WC page in a middle of index 1, then the
> > tree would have an entry for before the WC page, an entry for the
> > WC
> > page and an entry for after. They all point to region index 1, but
> > the
> > fault handler can use the maple tree to find the right attributes
> > at
> > fault time for the page.
>
> I don't know what this is about, I don't want to see APIs to slice up
> regions. The vfio driver should report full regions in the pgoff
> space, and VMAs should be linked to single maple tree ranges only. If
> userspace wants something weird it can call mmap multiple times and
> get different VMAs.
Why ? What are the pros/cons of each approach ?
> Each VMA should refer to a single vfio_mmap struct with the same
> pgprot for every page. It is easy and simple.
Maybe ... we have precedents of doing differently but I don't have big
beef in that game.
The big advantages of that latter approach is that it's very clear and
simple for userspace (this bit of the BAR is meant to be WC) and
completely
avoids the multiple mapping problem. The inconvenient is that we have
to
generalize the fault handler mechanism to all backends, and it makes
the
multi-mapping impossible (in the maybe possible case where might want
to
allow it in some cases).
Overall I find a lot of putting cart before horses here. Can we first
agree
on a clear definition of the uAPI first, then we can figure out the
implementation details ?
Cheers,
Ben.
prev parent reply other threads:[~2025-08-07 11:13 UTC|newest]
Thread overview: 29+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-08-04 10:39 [RFC PATCH 0/9] vfio: Introduce mmap maple tree Mahmoud Adam
2025-08-04 10:39 ` [RFC PATCH 1/9] vfio: add mmap maple tree to vfio Mahmoud Adam
2025-08-04 10:39 ` [RFC PATCH 2/9] vfio: add transient ops to support vfio mmap mt Mahmoud Adam
2025-08-04 10:39 ` [RFC PATCH 3/9] vfio-pci-core: rename vm operations Mahmoud Adam
2025-08-04 10:39 ` [RFC PATCH 4/9] vfio-pci-core: remove redundant offset calculations Mahmoud Adam
2025-08-04 10:39 ` [RFC PATCH 5/9] vfio-pci-core: add vfio_pci_mmap & helpers Mahmoud Adam
2025-08-04 10:39 ` [RFC PATCH 6/9] vfio-pci-core: support the new vfio ops Mahmoud Adam
2025-08-04 10:40 ` [RFC PATCH 7/9] vfio-pci: use " Mahmoud Adam
2025-08-04 10:40 ` [RFC PATCH 8/9] vfio: UAPI for setting mmap attributes Mahmoud Adam
2025-08-04 10:40 ` [RFC PATCH 9/9] vfio_pci_core: support mmap attrs uapi & WC Mahmoud Adam
2025-08-04 18:49 ` [RFC PATCH 0/9] vfio: Introduce mmap maple tree Alex Williamson
2025-08-04 20:09 ` Mahmoud Nagy Adam
2025-08-05 14:31 ` Jason Gunthorpe
2025-08-05 15:48 ` Mahmoud Nagy Adam
2025-08-05 18:50 ` [RFC " Jason Gunthorpe
2025-08-05 19:00 ` Alex Williamson
[not found] ` <80dc87730f694b2d6e6aabbd29df49cf3c7c44fb.camel@amazon.com>
[not found] ` <20250806115224.GB377696@ziepe.ca>
2025-08-07 8:12 ` Herrenschmidt, Benjamin
2025-08-07 19:06 ` Alex Williamson
2025-08-11 15:55 ` Jason Gunthorpe
2025-08-11 22:07 ` Alex Williamson
2025-08-12 0:30 ` Jason Gunthorpe
2025-08-12 19:26 ` Alex Williamson
2025-08-13 0:17 ` Jason Gunthorpe
2025-08-14 8:39 ` Mahmoud Nagy Adam
2025-08-14 9:52 ` Mahmoud Nagy Adam
2025-08-14 17:52 ` Alex Williamson
2025-08-28 8:53 ` Mahmoud Nagy Adam
2025-08-28 19:17 ` Alex Williamson
2025-08-07 8:13 ` Benjamin Herrenschmidt [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=431011ca82cd7db46a704b43f122f845af0d26f1.camel@kernel.crashing.org \
--to=benh@kernel.crashing.org \
--cc=kvm@vger.kernel.org \
--cc=mngyadam@amazon.de \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).