kvm.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Alex Williamson <alex.williamson@redhat.com>
To: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: "kvm@vger.kernel.org" <kvm@vger.kernel.org>,
	"Kumar, Praveen" <pravkmr@amazon.de>,
	"Adam, Mahmoud" <mngyadam@amazon.de>,
	"Woodhouse, David" <dwmw@amazon.co.uk>,
	"nagy@khwaternagy.com" <nagy@khwaternagy.com>,
	"jgg@ziepe.ca" <jgg@ziepe.ca>
Subject: Re: [RFC PATCH 0/9] vfio: Introduce mmap maple tree
Date: Thu, 7 Aug 2025 13:06:05 -0600	[thread overview]
Message-ID: <20250807130605.644ac9f6.alex.williamson@redhat.com> (raw)
In-Reply-To: <cec694f109f705ab9e20c2641c1558aa19bcb25b.camel@amazon.com>

On Thu, 7 Aug 2025 08:12:04 +0000
"Herrenschmidt, Benjamin" <benh@amazon.com> wrote:

> Re-sending with the list this time. Also I've fixed my email so back to
> benh@kernel.crashing.org :)

Replacing the To: in this reply as well, hopefully it reaches you.

>    1. On Wed, 2025-08-06 at 08:52 -0300, Jason Gunthorpe wrote:
> > 
> > 
> > On Tue, Aug 05, 2025 at 10:19:29PM +0000, Herrenschmidt, Benjamin
> > wrote:  
> > > 
> > > Right, that's my main objection here too. The way I see things is
> > > that
> > > we are somewhat conflating two different issues:
> > > 
> > >  * Wanting to change the index -> offset "cooking" mapping to
> > > create a
> > > more dense cookie space
> > > 
> > >  * Wanting to define attributes for mappings...  
> > 
> > I don't think that is right. The first is, yes, creating more dense
> > cookie space which I think we need to do the second item, which is
> > really 'add 2x more cookies, or maybe more'.  
> 
> Hrm... the end *goal* is to define attribute for mappings. This whole
> conversation is "we want to map things WC". "defining more cookies" is
> a possible *how* here.
> 
> The reason I don't like presenting it that way is that you conflate
> what we are trying to achieve with an implementation detail.
> 
> By defining more cookies, what you mean is creating multiple cookies
> pointing to the same physical regions with different mapping
> attributes, correct ? This is very similar to my sub-region idea in
> practice I think unless I misunderstand.

I agree and I think this is likely heading down the path of rehashing
the previous thread

https://lore.kernel.org/kvm/20240801141914.GC3030761@ziepe.ca/

I honestly still don't have a firm idea how "mmap cookies" are
different from some sort of extended region ID space, whether it's
sub-regions, device specific regions, or just dynamic regions.  I think
there is an underlying desire to say, for example, that "region 0" is
BAR0, period, full stop, but there should be a way to get multiple mmap
cookies to region 0 with different mmap semantics.  Of course
REGION_INFO provides the default mmap cookie via the offset field, but
since we don't want to dynamically create region N+1 (under this
interpretation) as an alias of region 0 with different mmap semantics,
now we're stuck in a conundrum of how to describe different mmap cookies
for one region.

So to a large extent I think we're causing our own grief here and I do
agree that if we created an API that allows new regions to be created
as lightweight (mmap-only) aliases of other regions, then we could just
re-use REGION_INFO with the new region index, get the offset/mmap
cookie, return -ENOSPC when we run out of indexes, and implement a
maple tree to get a more compact region space as a follow-on.

> > One of the things we want to do with more cookies is to create unique
> > cookies for WC mmap requests.  
> 
> Well, we want to be able to WC map. Introducing "more cookies" again is
> just one way to get there. How do you create those cookies ? Upon
> request or each region automatically gets multiple with different
> attributes ? Do they represent entire regions or subsets ? etc... 
> 
> Here too, I'd rather we start from a clear idea of the actual UAPI we
> want, then we can look at how it gets plumbed under the hood.
> 
> > I'm not convinced we should try to introduce more cookies without
> > improving the infrastructure. It is already pretty weak, I think
> > trying to mangle the indexes or something "clever" would result in
> > something even harder to maintain even if it might be less patches to
> > write in the first place.  
> 
> VFIO-pci already allocates dynamic indexes but yes, I don't disagree
> that cleaning up the infrastructure first is a good idea. However, I
> would prefer if we had a clearer idea of the end goal first,
> specifically the details/shape of the UAPI and underlying structure
> *before* we start hacking at this all.

Yes, the UAPI here is shadowed by a maple tree conversion that isn't
really required by the UAPI.  We need to lead with a good UAPI.

> > >  * What I originally proposed ages ago when we discussed it at LPC
> > > which is to have an ioctl to create "subregions" of a region with
> > > different attributes. This means creating a new index (and a new
> > > corresponding "cookie") that represents a portion of an existing
> > > region
> > > with a different attribute set to be later used by mmap.  
> > 
> > I never liked this, and it is still creating more cookies but now
> > with
> > weird hacky looking uapi.  
> 
> "weird hacky looking" isn't a great way of understanding your
> objections :-) Yes, it's a bit more complicated, it allows to break
> down BARs basically into sub-regions with different attributes which is
> fairly close to what users really want to do, but it does complexify
> the UAPI a bit.
> 
> That said I'm not married to the idea, I just don't completely
> understand the alternative you are proposing... 

Obviously Jason can correct if the interpretation I gleaned from the
previous thread above is incorrect, but I don't really see that new
region indexes are weird or hacky.  They fit into the REGION_INFO
scheme, they could be managed via DEVICE_FEATURE.

> > >  * A simpler approach which is to have an ioctl to change the
> > > attributes over a range of a region, so that *any* mapping of that
> > > range now uses that attribute. This can be done completely
> > > dynamically
> > > (see below).  
> > 
> > And I really, really don't like this. The meaning of a memmap cookie
> > should not be changing dynamically. That is a bad precedent.  
> 
> What do you mean "a cookie changing dynamically" ? The cookie doesn't
> change. The attributes of a portion of a mapping change based on what
> userspace requested. It is also a pretty close match to the use case
> which is to define that a given part of the device MMIO space is meant
> to be accessed WC. There is no fundamental change to the "cookies".
> 
> The biggest advantage of that approach is that it completely precludes
> multiple conflicting mappings for a given region (at least within a
> given process, though it might be possible to extend it globally if we
> want) which has been a concern expressed last time we talked about this
> by the folks from the ARM world.

This precludes that a user could simultaneously have mappings to the
same device memory with different mmap attributes, but otherwise it
honestly sounds like the most consistent mechanism if we don't want to
expand the region index space.

> > The uAPI I prefer is a 'get region info2' which would be simplified
> > and allow userspace to pass in some flags. One of those flags can be
> > 'request wc'.  
> 
> So talking of "weird hacky looking" having something called "get_info"
> taking "request" flags ... ahem ... :-)

Yup...

> If what you really mean is that under the hood, a given "index"
> produces multiple "regions" (cookies) and "get_info2" allows to specify
> which one you want to get info about ?
> 
> I mean ... this would work. But I don't see how it's superior to any of
> the above. I don't care *that much*, but it does make it a bit harder
> to avoid the multiple mappings issue and will waste much more cookie
> space than any of the alternatives.
> 
> > This is logical and future extensible.  
> 
> I disagree ... I find it actually cumbersome but we can just agree to
> disagree here and as I said, I don't care that much as long as there is
> no fundamental reason why it wouldn't work in the end.

I'm not a fan of the REGION_INFO2 idea either.  A new region index that
provides an alias to another region with different mmap semantics is
much more intuitive in our existing UAPI, imo.

> Talking of cookie space, one thing we do need to preserve is the
> natural alignment to the BAR size. Userspace *will* do "|" instead of
> "+" on top of a cookie when mmap'ing (beyond mmap own alignment
> requirements).
> 
> > > Now, within the context of VFIO PCI, since we use a fault handler,
> > > it
> > > doesn't even have to be done before mmap, we can just dynamically
> > > fault
> > > a given page with the right attributes (and zap mappings on
> > > attributes
> > > changes). I would go down that path personally and generalize the
> > > fault
> > > handler to all VFIO implementations.  
> > 
> > Even worse. fault by fault changing of attributes? No way, that's a
> > completely crazy uAPI!  
> 
> Why ? first userspace doesn't know or see it happens fault by fault
> (and it could be full established at mmap time in fact, but fault time
> makes the implementation a lot easier).
> 
> Here you are conflating implementation details with "uAPI" ... uAPI is
> what is presented to userspace, which in this case is "set the
> attributes for this portion of a region". Then at map time, that
> portion of the region gets the requested attributes. There is nothing
> in the uAPI that carries the fact that it happens at fault time.
> 
> You keep coming up with "ugly", "crazy" etc... without every actually
> spelling out the technical pro/cons that would actually substantiate
> those adjectives. Basically anything that isn't your 
> get_region2 is "crazy" or "ugly" ... NIH syndrome ?
> 
> > > For example, if I put a 4k WC page in a middle of index 1, then the
> > > tree would have an entry for before the WC page, an entry for the
> > > WC
> > > page and an entry for after. They all point to region index 1, but
> > > the
> > > fault handler can use the maple tree to find the right attributes
> > > at
> > > fault time for the page.  
> > 
> > I don't know what this is about, I don't want to see APIs to slice up
> > regions. The vfio driver should report full regions in the pgoff
> > space, and VMAs should be linked to single maple tree ranges only. If
> > userspace wants something weird it can call mmap multiple times and
> > get different VMAs.  
> 
> Why ? What are the pros/cons of each approach ?
> 
> > Each VMA should refer to a single vfio_mmap struct with the same
> > pgprot for every page. It is easy and simple.  
> 
> Maybe ... we have precedents of doing differently but I don't have big
> beef in that game.
> 
> The big advantages of that latter approach is that it's very clear and
> simple for userspace (this bit of the BAR is meant to be WC) and
> completely
> avoids the multiple mapping problem. The inconvenient is that we have
> to
> generalize the fault handler mechanism to all backends, and it makes
> the
> multi-mapping impossible (in the maybe possible case where might want
> to
> allow it in some cases).
> 
> Overall I find a lot of putting cart before horses here. Can we first
> agree
> on a clear definition of the uAPI first, then we can figure out the
> implementation details ?

+1.  Thanks,

Alex


  reply	other threads:[~2025-08-07 19:06 UTC|newest]

Thread overview: 29+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-08-04 10:39 [RFC PATCH 0/9] vfio: Introduce mmap maple tree Mahmoud Adam
2025-08-04 10:39 ` [RFC PATCH 1/9] vfio: add mmap maple tree to vfio Mahmoud Adam
2025-08-04 10:39 ` [RFC PATCH 2/9] vfio: add transient ops to support vfio mmap mt Mahmoud Adam
2025-08-04 10:39 ` [RFC PATCH 3/9] vfio-pci-core: rename vm operations Mahmoud Adam
2025-08-04 10:39 ` [RFC PATCH 4/9] vfio-pci-core: remove redundant offset calculations Mahmoud Adam
2025-08-04 10:39 ` [RFC PATCH 5/9] vfio-pci-core: add vfio_pci_mmap & helpers Mahmoud Adam
2025-08-04 10:39 ` [RFC PATCH 6/9] vfio-pci-core: support the new vfio ops Mahmoud Adam
2025-08-04 10:40 ` [RFC PATCH 7/9] vfio-pci: use " Mahmoud Adam
2025-08-04 10:40 ` [RFC PATCH 8/9] vfio: UAPI for setting mmap attributes Mahmoud Adam
2025-08-04 10:40 ` [RFC PATCH 9/9] vfio_pci_core: support mmap attrs uapi & WC Mahmoud Adam
2025-08-04 18:49 ` [RFC PATCH 0/9] vfio: Introduce mmap maple tree Alex Williamson
2025-08-04 20:09   ` Mahmoud Nagy Adam
2025-08-05 14:31     ` Jason Gunthorpe
2025-08-05 15:48       ` Mahmoud Nagy Adam
2025-08-05 18:50         ` [RFC " Jason Gunthorpe
2025-08-05 19:00         ` Alex Williamson
     [not found]           ` <80dc87730f694b2d6e6aabbd29df49cf3c7c44fb.camel@amazon.com>
     [not found]             ` <20250806115224.GB377696@ziepe.ca>
2025-08-07  8:12               ` Herrenschmidt, Benjamin
2025-08-07 19:06                 ` Alex Williamson [this message]
2025-08-11 15:55                   ` Jason Gunthorpe
2025-08-11 22:07                     ` Alex Williamson
2025-08-12  0:30                       ` Jason Gunthorpe
2025-08-12 19:26                         ` Alex Williamson
2025-08-13  0:17                           ` Jason Gunthorpe
2025-08-14  8:39                       ` Mahmoud Nagy Adam
2025-08-14  9:52                       ` Mahmoud Nagy Adam
2025-08-14 17:52                         ` Alex Williamson
2025-08-28  8:53                           ` Mahmoud Nagy Adam
2025-08-28 19:17                             ` Alex Williamson
2025-08-07  8:13               ` Benjamin Herrenschmidt

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20250807130605.644ac9f6.alex.williamson@redhat.com \
    --to=alex.williamson@redhat.com \
    --cc=benh@kernel.crashing.org \
    --cc=dwmw@amazon.co.uk \
    --cc=jgg@ziepe.ca \
    --cc=kvm@vger.kernel.org \
    --cc=mngyadam@amazon.de \
    --cc=nagy@khwaternagy.com \
    --cc=pravkmr@amazon.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).