Re: [ANNOUNCE] PUCK Agenda - 2024.08.07 - KVM userfault (guest_memfd/HugeTLB postcopy)

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

From: Sean Christopherson <seanjc@google.com>
To: James Houghton <jthoughton@google.com>
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
	 Peter Xu <peterx@redhat.com>,
	Paolo Bonzini <pbonzini@redhat.com>,
	 Oliver Upton <oliver.upton@linux.dev>,
	Axel Rasmussen <axelrasmussen@google.com>,
	 David Matlack <dmatlack@google.com>,
	Anish Moorthy <amoorthy@google.com>
Subject: Re: [ANNOUNCE] PUCK Agenda - 2024.08.07 - KVM userfault (guest_memfd/HugeTLB postcopy)
Date: Wed, 7 Aug 2024 17:17:45 -0700	[thread overview]
Message-ID: <ZrQOqVsyEulBt7S9@google.com> (raw)
In-Reply-To: <CADrL8HXVNcbcuu9qF3wtkccpW6_QEnXQ1ViWEceeS9QGdQUTiw@mail.gmail.com>

On Wed, Aug 07, 2024, James Houghton wrote:
> On Thu, Aug 1, 2024 at 3:44 PM Sean Christopherson <seanjc@google.com> wrote:
> >
> > Early warning for next week's PUCK since there's actually a topic this time.
> > James is going to lead a discussion on KVM userfault[*](name subject to change).
> 
> Thanks for attending, everyone!
> 
> We seemed to arrive at the following conclusions:
> 
> 1. For guest_memfd, stage 2 mapping installation will never go through
> GUP / virtual addresses to do the GFN --> PFN translation, including
> when it supports non-private memory.
> 2. Something like KVM Userfault is indeed necessary to handle
> post-copy for guest_memfd VMs, especially when guest_memfd supports
> non-private memory.
> 3. We should not hook into the overall GFN --> HVA translation, we
> should only be hooking the GFN --> PFN translation steps to figure out
> how to create stage 2 mappings. That is, KVM's own accesses to guest
> memory should just go through mm/userfaultfd.
> 4. We don't need the concept of "async userfaults" (making KVM block
> when attempting to access userfault memory) in KVM Userfault.
> 
> So I need to think more about what exactly the API should look like
> for controlling if a page should exit to userspace before KVM is
> allowed to map it into stage 2 and if this should apply to all of
> guest memory or only guest_memfd.
> 
> It sounds like it may most likely be something like a per-VM bitmap
> that describes which pages are allowed to be mapped into stage 2,
> applying to all memory, not just guest_memfd memory. Even though it is
> solving a problem for guest_memfd specifically, it is slightly cleaner
> to have it apply to all memory.
> 
> If this per-VM bitmap applies to all memory, then we don't need to
> wait for guest_memfd to support non-private memory before working on a
> full implementation. But if not, perhaps it makes sense to wait.

Per-memslot likely makes more sense.  Unlike attributes, the bitmap only needs
to exist during post-copy, and unless we do something clever, i.e. use something
other than a bitmap, the bitmap needs to be fully allocated, which would result
in unnecessary overhead if there are gaps in guest physical memory.

The other hiccup with a per-VM bitmap is that it would force us to define ABI
for things we don't care about.  E.g. what happens if the local APIC is in-kernel
and userspace marks the APIC page as USERFAULT?  Ditto for gfns without memslots.

E.g. add a KVM_MEM_USERFAULT flag along with a userfault_bitmap user pointer
that is valid when the flag is set.  Unlike dirty logging, KVM is only a reader
of the bitmap, so I'm pretty sure we don't need a copy in KVM.

When userspace creates the VM on the target, it allocates a bitmap for each
memslot and sets KVM_MEM_USERFAULT.  When migration completes, userspace clears
KVM_MEM_USERFAULT for each memslot, and then deletes the associated bitmap.

next prev parent reply	other threads:[~2024-08-08  0:17 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-08-01 22:43 [ANNOUNCE] PUCK Agenda - 2024.08.07 - KVM userfault (guest_memfd/HugeTLB postcopy) Sean Christopherson
2024-08-07 17:21 ` James Houghton
2024-08-08  0:17   ` Sean Christopherson [this message]
2024-08-08 12:15   ` Wang, Wei W
2024-08-08 19:04     ` James Houghton
2024-08-09 13:51       ` Wang, Wei W
2024-08-09 19:04         ` Sean Christopherson
2024-08-12 14:12           ` Wang, Wei W
2024-08-12 15:24             ` Peter Xu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZrQOqVsyEulBt7S9@google.com \
    --to=seanjc@google.com \
    --cc=amoorthy@google.com \
    --cc=axelrasmussen@google.com \
    --cc=dmatlack@google.com \
    --cc=jthoughton@google.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=oliver.upton@linux.dev \
    --cc=pbonzini@redhat.com \
    --cc=peterx@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox