Re: [ANNOUNCE] PUCK Agenda - 2024.08.07 - KVM userfault (guest_memfd/HugeTLB postcopy)

From: Sean Christopherson <seanjc@google.com>
To: James Houghton <jthoughton@google.com>
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
	 Peter Xu <peterx@redhat.com>,
	Paolo Bonzini <pbonzini@redhat.com>,
	 Oliver Upton <oliver.upton@linux.dev>,
	Axel Rasmussen <axelrasmussen@google.com>,
	 David Matlack <dmatlack@google.com>,
	Anish Moorthy <amoorthy@google.com>
Subject: Re: [ANNOUNCE] PUCK Agenda - 2024.08.07 - KVM userfault (guest_memfd/HugeTLB postcopy)
Date: Wed, 7 Aug 2024 17:17:45 -0700
Message-ID: <ZrQOqVsyEulBt7S9@google.com>
In-Reply-To: <CADrL8HXVNcbcuu9qF3wtkccpW6_QEnXQ1ViWEceeS9QGdQUTiw@mail.gmail.com>

On Wed, Aug 07, 2024, James Houghton wrote:
> On Thu, Aug 1, 2024 at 3:44 PM Sean Christopherson <seanjc@google.com> wrote:
> >
> > Early warning for next week's PUCK since there's actually a topic this time.
> > James is going to lead a discussion on KVM userfault[*] (name subject to change).
> 
> Thanks for attending, everyone!
> 
> We seemed to arrive at the following conclusions:
> 
> 1. For guest_memfd, stage 2 mapping installation will never go through
> GUP / virtual addresses to do the GFN --> PFN translation, including
> when it supports non-private memory.
> 2. Something like KVM Userfault is indeed necessary to handle
> post-copy for guest_memfd VMs, especially when guest_memfd supports
> non-private memory.
> 3. We should not hook into the overall GFN --> HVA translation, we
> should only be hooking the GFN --> PFN translation steps to figure out
> how to create stage 2 mappings. That is, KVM's own accesses to guest
> memory should just go through mm/userfaultfd.
> 4. We don't need the concept of "async userfaults" (making KVM block
> when attempting to access userfault memory) in KVM Userfault.
> 
> So I need to think more about what exactly the API should look like
> for controlling if a page should exit to userspace before KVM is
> allowed to map it into stage 2 and if this should apply to all of
> guest memory or only guest_memfd.
> 
> It sounds like it may most likely be something like a per-VM bitmap
> that describes which pages are allowed to be mapped into stage 2,
> applying to all memory, not just guest_memfd memory. Even though it is
> solving a problem for guest_memfd specifically, it is slightly cleaner
> to have it apply to all memory.
> 
> If this per-VM bitmap applies to all memory, then we don't need to
> wait for guest_memfd to support non-private memory before working on a
> full implementation. But if not, perhaps it makes sense to wait.

Per-memslot likely makes more sense.  Unlike attributes, the bitmap only needs
to exist during post-copy, and unless we do something clever, i.e. use something
other than a bitmap, the bitmap needs to be fully allocated, which would result
in unnecessary overhead if there are gaps in guest physical memory.
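
[Editor's sketch, not part of the original message.] The overhead argument
above can be put in numbers. The snippet below is a rough illustration only:
it assumes 4 KiB pages and one bit per page, and the helper names and the
example memory layout are hypothetical, not anything from KVM.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one memslot: first gfn and length in pages. */
struct sketch_slot_range {
	uint64_t base_gfn;
	uint64_t npages;
};

/* Bytes needed for a one-bit-per-page bitmap, rounded up to 64-bit words. */
static uint64_t sketch_bitmap_bytes(uint64_t npages)
{
	return ((npages + 63) / 64) * 8;
}

/* Per-memslot scheme: each slot only pays for the pages it covers. */
static uint64_t sketch_per_slot_bytes(const struct sketch_slot_range *s, size_t n)
{
	uint64_t total = 0;

	for (size_t i = 0; i < n; i++)
		total += sketch_bitmap_bytes(s[i].npages);
	return total;
}

/*
 * Fully allocated per-VM scheme: one bitmap from gfn 0 up to the highest
 * gfn in any slot, so holes in guest physical memory are paid for too.
 */
static uint64_t sketch_per_vm_bytes(const struct sketch_slot_range *s, size_t n)
{
	uint64_t max_gfn = 0;

	for (size_t i = 0; i < n; i++) {
		uint64_t end = s[i].base_gfn + s[i].npages;

		if (end > max_gfn)
			max_gfn = end;
	}
	return sketch_bitmap_bytes(max_gfn);
}
```

For a made-up layout with 3 GiB of memory at GPA 0 and 4 GiB at GPA 1 TiB,
the per-memslot bitmaps add up to 229376 bytes, while a single flat bitmap
covering the whole range needs 33685504 bytes.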

The other hiccup with a per-VM bitmap is that it would force us to define ABI
for things we don't care about.  E.g. what happens if the local APIC is in-kernel
and userspace marks the APIC page as USERFAULT?  Ditto for gfns without memslots.

E.g. add a KVM_MEM_USERFAULT flag along with a userfault_bitmap user pointer
that is valid when the flag is set.  Unlike dirty logging, KVM is only a reader
of the bitmap, so I'm pretty sure we don't need a copy in KVM.
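
[Editor's sketch, not part of the original message.] One way to picture the
reader side of such a flag-plus-bitmap scheme is below. Everything here is an
assumption made for illustration: the flag's bit position, the struct layout,
the helper name, and the choice that a set bit means "exit to userspace
before mapping". It is plain userspace C; real kernel code could not
dereference a user pointer directly like this.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag value; the actual bit, if any, is not defined here. */
#define SKETCH_KVM_MEM_USERFAULT (1u << 3)

/* Hypothetical, simplified stand-in for a memslot. */
struct sketch_memslot {
	uint64_t base_gfn;
	uint64_t npages;
	uint32_t flags;
	/* Owned and written by userspace; only ever read on this side. */
	const uint64_t *userfault_bitmap;
};

/*
 * Returns true if mapping @gfn should be deferred to userspace.
 * Assumption: bit i of the bitmap covers page (base_gfn + i), and a set
 * bit means the page has not been populated yet.
 */
static bool sketch_gfn_is_userfault(const struct sketch_memslot *slot,
				    uint64_t gfn)
{
	uint64_t idx;

	/* The bitmap pointer is only meaningful while the flag is set. */
	if (!(slot->flags & SKETCH_KVM_MEM_USERFAULT))
		return false;

	/* gfns outside the slot are simply not covered by this slot's bitmap. */
	if (gfn < slot->base_gfn || gfn - slot->base_gfn >= slot->npages)
		return false;

	idx = gfn - slot->base_gfn;
	return (slot->userfault_bitmap[idx / 64] >> (idx % 64)) & 1;
}
```

Because the lookup is scoped to a slot, gfns without a memslot never reach
the bitmap at all, which is the ABI question the per-memslot approach avoids.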

When userspace creates the VM on the target, it allocates a bitmap for each
memslot and sets KVM_MEM_USERFAULT.  When migration completes, userspace clears
KVM_MEM_USERFAULT for each memslot, and then deletes the associated bitmap.
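
[Editor's sketch, not part of the original message.] The userspace lifecycle
described above might look roughly like the following. The flag value, struct,
and function names are all hypothetical, the memslot update ioctl is left out
entirely, and clearing a page's bit once its contents arrive is an assumption
about how the bitmap would be used during post-copy.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical flag value, for illustration only. */
#define SKETCH_KVM_MEM_USERFAULT (1u << 3)

/* Hypothetical userspace bookkeeping for one memslot. */
struct sketch_slot {
	uint64_t npages;
	uint32_t flags;
	uint64_t *userfault_bitmap;
};

/* On the target: allocate the bitmap, mark every page, then set the flag. */
static int sketch_postcopy_begin(struct sketch_slot *slot)
{
	size_t bytes = ((slot->npages + 63) / 64) * sizeof(uint64_t);

	slot->userfault_bitmap = malloc(bytes);
	if (!slot->userfault_bitmap)
		return -1;
	memset(slot->userfault_bitmap, 0xff, bytes);
	slot->flags |= SKETCH_KVM_MEM_USERFAULT;
	/* A real VMM would now update the memslot in KVM (not modelled). */
	return 0;
}

/*
 * Assumed usage: once a page's contents have been received, clear its bit.
 * A real VMM would need an atomic update since vCPUs may fault concurrently.
 */
static void sketch_page_arrived(struct sketch_slot *slot, uint64_t idx)
{
	slot->userfault_bitmap[idx / 64] &= ~(1ull << (idx % 64));
}

/* Migration complete: clear the flag first, then delete the bitmap. */
static void sketch_postcopy_end(struct sketch_slot *slot)
{
	slot->flags &= ~SKETCH_KVM_MEM_USERFAULT;
	/* A real VMM would update the memslot in KVM before freeing. */
	free(slot->userfault_bitmap);
	slot->userfault_bitmap = NULL;
}
```

The ordering in sketch_postcopy_end() mirrors the description: the flag goes
away before the bitmap does, so nothing is left reading freed memory.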

Thread overview: 9+ messages
2024-08-01 22:43 [ANNOUNCE] PUCK Agenda - 2024.08.07 - KVM userfault (guest_memfd/HugeTLB postcopy) Sean Christopherson
2024-08-07 17:21 ` James Houghton
2024-08-08  0:17   ` Sean Christopherson [this message]
2024-08-08 12:15   ` Wang, Wei W
2024-08-08 19:04     ` James Houghton
2024-08-09 13:51       ` Wang, Wei W
2024-08-09 19:04         ` Sean Christopherson
2024-08-12 14:12           ` Wang, Wei W
2024-08-12 15:24             ` Peter Xu