From: Sean Christopherson <seanjc@google.com>
To: Fuad Tabba <tabba@google.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>,
David Hildenbrand <david@redhat.com>,
John Hubbard <jhubbard@nvidia.com>,
Elliot Berman <quic_eberman@quicinc.com>,
Andrew Morton <akpm@linux-foundation.org>,
Shuah Khan <shuah@kernel.org>,
Matthew Wilcox <willy@infradead.org>,
maz@kernel.org, kvm@vger.kernel.org,
linux-arm-msm@vger.kernel.org, linux-mm@kvack.org,
linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org,
pbonzini@redhat.com
Subject: Re: [PATCH RFC 0/5] mm/gup: Introduce exclusive GUP pinning
Date: Thu, 20 Jun 2024 08:37:03 -0700 [thread overview]
Message-ID: <ZnRMn1ObU8TFrms3@google.com> (raw)
In-Reply-To: <CA+EHjTz_=J+bDpqciaMnNja4uz1Njcpg5NVh_GW2tya-suA7kQ@mail.gmail.com>
On Wed, Jun 19, 2024, Fuad Tabba wrote:
> Hi Jason,
>
> On Wed, Jun 19, 2024 at 12:51 PM Jason Gunthorpe <jgg@nvidia.com> wrote:
> >
> > On Wed, Jun 19, 2024 at 10:11:35AM +0100, Fuad Tabba wrote:
> >
> > > To be honest, personally (speaking only for myself, not necessarily
> > > for Elliot and not for anyone else in the pKVM team), I still would
> > > prefer to use guest_memfd(). I think that having one solution for
> > > confidential computing that rules them all would be best. But we do
> > > need to be able to share memory in place, have a plan for supporting
> > > huge pages in the near future, and migration in the not-too-distant
> > > future.
> >
> > I think using a FD to control this special lifetime stuff is
> > dramatically better than trying to force the MM to do it with struct
> > page hacks.
> >
> > If you can't agree with the guest_memfd people on how to get there
> > then maybe you need a guest_memfd2 for this slightly different special
> > stuff instead of intruding on the core mm so much. (though that would
> > be sad)
> >
> > We really need to be thinking more about containing these special
> > things and not just sprinkling them everywhere.
>
> I agree that we need to agree :) This discussion has been going on
> since before LPC last year, and the consensus from the guest_memfd()
> folks (if I understood it correctly) is that guest_memfd() is what it
> is: designed for a specific type of confidential computing, in the
> style of TDX and CCA perhaps, and that it cannot (or will not) perform
> the role of being a general solution for all confidential computing.
That isn't remotely accurate. I have stated multiple times that I want guest_memfd
to be a vehicle for all VM types, i.e. not just CoCo VMs, and most definitely not
just TDX/SNP/CCA VMs.
What I am staunchly against is piling features onto guest_memfd that will cause
it to eventually become virtually indistinguishable from any other file-based
backing store. I.e. while I want to make guest_memfd usable for all VM *types*,
making guest_memfd the preferred backing store for all *VMs* and use cases is
very much a non-goal.
From an earlier conversation[1]:
: In other words, ditch the complexity for features that are well served by existing
: general purpose solutions, so that guest_memfd can take on a bit of complexity to
: serve use cases that are unique to KVM guests, without becoming an unmaintainble
: mess due to cross-products.
> > > Also, since pin is already overloading the refcount, having the
> > > exclusive pin there helps in ensuring atomic accesses and avoiding
> > > races.
> >
> > Yeah, but every time someone does this and then links it to a uAPI it
> > becomes utterly baked in concrete for the MM forever.
>
> I agree. But if we can't modify guest_memfd() to fit our needs (pKVM,
> Gunyah), then we don't really have that many other options.
What _are_ your needs? There are multiple unanswered questions from our last
conversation[2]. And by "needs" I don't mean "what changes do you want to make
to guest_memfd?", I mean "what are the use cases, patterns, and scenarios that
you want to support?".
: What's "hypervisor-assisted page migration"? More specifically, what's the
: mechanism that drives it?
: Do you happen to have a list of exactly what you mean by "normal mm stuff"? I
: am not at all opposed to supporting .mmap(), because long term I also want to
: use guest_memfd for non-CoCo VMs. But I want to be very conservative with respect
: to what is allowed for guest_memfd. E.g. host userspace can map guest_memfd,
: and do operations that are directly related to its mapping, but that's about it.
That distinction matters, because as I have stated in that thread, I am not
opposed to page migration itself:
: I am not opposed to page migration itself, what I am opposed to is adding deep
: integration with core MM to do some of the fancy/complex things that lead to page
: migration.
I am generally aware of the core pKVM use cases, but I AFAIK I haven't seen a
complete picture of everything you want to do, and _why_.
E.g. if one of your requirements is that guest memory is managed by core-mm the
same as all other memory in the system, then yeah, guest_memfd isn't for you.
Integrating guest_memfd deeply into core-mm simply isn't realistic, at least not
without *massive* changes to core-mm, as the whole point of guest_memfd is that
it is guest-first memory, i.e. it is NOT memory that is managed by core-mm (primary
MMU) and optionally mapped into KVM (secondary MMU).
Again from that thread, one of most important aspects guest_memfd is that VMAs
are not required. Stating the obvious, lack of VMAs makes it really hard to drive
swap, reclaim, migration, etc. from code that fundamentally operates on VMAs.
: More broadly, no VMAs are required. The lack of stage-1 page tables are nice to
: have; the lack of VMAs means that guest_memfd isn't playing second fiddle, e.g.
: it's not subject to VMA protections, isn't restricted to host mapping size, etc.
[1] https://lore.kernel.org/all/Zfmpby6i3PfBEcCV@google.com
[2] https://lore.kernel.org/all/Zg3xF7dTtx6hbmZj@google.com
next prev parent reply other threads:[~2024-06-20 15:37 UTC|newest]
Thread overview: 71+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-06-19 0:05 [PATCH RFC 0/5] mm/gup: Introduce exclusive GUP pinning Elliot Berman
2024-06-19 0:05 ` [PATCH RFC 1/5] mm/gup: Move GUP_PIN_COUNTING_BIAS to page_ref.h Elliot Berman
2024-06-19 0:05 ` [PATCH RFC 2/5] mm/gup: Add an option for obtaining an exclusive pin Elliot Berman
2024-06-19 22:40 ` kernel test robot
2024-06-19 0:05 ` [PATCH RFC 3/5] mm/gup: Add support for re-pinning a normal pinned page as exclusive Elliot Berman
2024-06-19 0:05 ` [PATCH RFC 4/5] mm/gup-test: Verify exclusive pinned Elliot Berman
2024-06-19 0:05 ` [PATCH RFC 5/5] mm/gup_test: Verify GUP grabs same pages twice Elliot Berman
2024-06-19 0:11 ` [PATCH RFC 0/5] mm/gup: Introduce exclusive GUP pinning Elliot Berman
2024-06-19 2:44 ` John Hubbard
2024-06-19 7:37 ` David Hildenbrand
2024-06-19 9:11 ` Fuad Tabba
2024-06-19 11:51 ` Jason Gunthorpe
2024-06-19 12:01 ` Fuad Tabba
2024-06-19 12:42 ` Jason Gunthorpe
2024-06-20 15:37 ` Sean Christopherson [this message]
2024-06-21 8:23 ` Fuad Tabba
2024-06-21 8:43 ` David Hildenbrand
2024-06-21 8:54 ` Fuad Tabba
2024-06-21 9:10 ` David Hildenbrand
2024-06-21 10:16 ` Fuad Tabba
2024-06-21 16:54 ` Elliot Berman
2024-06-24 19:03 ` Sean Christopherson
2024-06-24 21:50 ` David Rientjes
2024-06-26 3:19 ` Vishal Annapurve
2024-06-26 5:20 ` Pankaj Gupta
2024-06-19 12:17 ` David Hildenbrand
2024-06-20 4:11 ` Christoph Hellwig
2024-06-20 8:32 ` Fuad Tabba
2024-06-20 13:55 ` Jason Gunthorpe
2024-06-20 14:01 ` David Hildenbrand
2024-06-20 14:29 ` Jason Gunthorpe
2024-06-20 14:45 ` David Hildenbrand
2024-06-20 16:04 ` Sean Christopherson
2024-06-20 18:56 ` David Hildenbrand
2024-06-20 16:36 ` Jason Gunthorpe
2024-06-20 18:53 ` David Hildenbrand
2024-06-20 20:30 ` Sean Christopherson
2024-06-20 20:47 ` David Hildenbrand
2024-06-20 22:32 ` Sean Christopherson
2024-06-20 23:00 ` Jason Gunthorpe
2024-06-20 23:11 ` Jason Gunthorpe
2024-06-20 23:54 ` Sean Christopherson
2024-06-21 7:43 ` David Hildenbrand
2024-06-21 12:39 ` Jason Gunthorpe
2024-06-20 23:08 ` Jason Gunthorpe
2024-06-20 22:47 ` Elliot Berman
2024-06-20 23:18 ` Jason Gunthorpe
2024-06-21 7:32 ` Quentin Perret
2024-06-21 8:02 ` David Hildenbrand
2024-06-21 9:25 ` Quentin Perret
2024-06-21 9:37 ` David Hildenbrand
2024-06-21 16:48 ` Elliot Berman
2024-06-21 12:26 ` Jason Gunthorpe
2024-06-19 12:16 ` David Hildenbrand
2024-06-20 8:47 ` Fuad Tabba
2024-06-20 9:00 ` David Hildenbrand
2024-06-20 14:01 ` Jason Gunthorpe
2024-06-20 13:08 ` Mostafa Saleh
2024-06-20 14:14 ` David Hildenbrand
2024-06-20 14:34 ` Jason Gunthorpe
2024-08-02 8:26 ` Tian, Kevin
2024-08-02 11:22 ` Jason Gunthorpe
2024-08-05 2:24 ` Tian, Kevin
2024-08-05 23:22 ` Jason Gunthorpe
2024-08-06 0:50 ` Tian, Kevin
2024-06-20 16:33 ` Mostafa Saleh
2024-07-12 23:29 ` Ackerley Tng
2024-07-16 16:03 ` Sean Christopherson
2024-07-16 16:08 ` Jason Gunthorpe
2024-07-16 17:34 ` Sean Christopherson
2024-07-16 20:11 ` Jason Gunthorpe
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ZnRMn1ObU8TFrms3@google.com \
--to=seanjc@google.com \
--cc=akpm@linux-foundation.org \
--cc=david@redhat.com \
--cc=jgg@nvidia.com \
--cc=jhubbard@nvidia.com \
--cc=kvm@vger.kernel.org \
--cc=linux-arm-msm@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-kselftest@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=maz@kernel.org \
--cc=pbonzini@redhat.com \
--cc=quic_eberman@quicinc.com \
--cc=shuah@kernel.org \
--cc=tabba@google.com \
--cc=willy@infradead.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.