Re: [PATCH v2 00/13] KVM: Introduce KVM Userfault

linux-arm-kernel.lists.infradead.org archive mirror
 help / color / mirror / Atom feed

From: Sean Christopherson <seanjc@google.com>
To: James Houghton <jthoughton@google.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>,
	Jonathan Corbet <corbet@lwn.net>, Marc Zyngier <maz@kernel.org>,
	 Oliver Upton <oliver.upton@linux.dev>,
	Yan Zhao <yan.y.zhao@intel.com>,
	 Nikita Kalyazin <kalyazin@amazon.com>,
	Anish Moorthy <amoorthy@google.com>,
	 Peter Gonda <pgonda@google.com>, Peter Xu <peterx@redhat.com>,
	 David Matlack <dmatlack@google.com>,
	wei.w.wang@intel.com, kvm@vger.kernel.org,
	 linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
	 linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev,
	 Jiaqi Yan <jiaqiyan@google.com>
Subject: Re: [PATCH v2 00/13] KVM: Introduce KVM Userfault
Date: Thu, 29 May 2025 08:28:26 -0700	[thread overview]
Message-ID: <aDh9GtncjlVvvVJ1@google.com> (raw)
In-Reply-To: <CADrL8HXjLjVyFiFee9Q58TQ9zBfXiO+VG=25Rw4UD+fbDmxQFg@mail.gmail.com>

On Wed, May 28, 2025, James Houghton wrote:
> The only thing that I want to call out again is that this UAPI works
> great for when we are going from userfault --> !userfault. That is, it
> works well for postcopy (both for guest_memfd and for standard
> memslots where userfaultfd scalability is a concern).
> 
> But there is another use case worth bringing up: unmapping pages that
> the VMM is emulating as poisoned.
> 
> Normally this can be handled by mm (e.g. with UFFDIO_POISON), but for
> 4K poison within a HugeTLB-backed memslot (if the HugeTLB page remains
> mapped in userspace), KVM Userfault is the only option (if we don't
> want to punch holes in memslots). This leaves us with three problems:
> 
> 1. If using KVM Userfault to emulate poison, we are stuck with small
> pages in stage 2 for the entire memslot.
> 2. We must unmap everything when toggling on KVM Userfault just to
> unmap a single page.
> 3. If KVM Userfault is already enabled, we have no choice but to
> toggle KVM Userfault off and on again to unmap the newly poisoned
> pages (i.e., there is no ioctl to scan the bitmap and unmap
> newly-userfault pages).
> 
> All of these are non-issues if we emulate poison by removing memslots,
> and I think that's possible. But if that proves too slow, we'd need to
> be a little bit more clever with hugepage recovery and with unmapping
> newly-userfault pages, both of which I think can be solved by adding
> some kind of bitmap re-scan ioctl. We can do that later if the need
> arises.

Hmm.

On the one hand, punching a hole in a memslot is generally gross, e.g. requires
deleting the entire memslot and thus unmapping large swaths of guest memory (or
all of guest memory for most x86 VMs).

On the other hand, unless userspace sets KVM_MEM_USERFAULT from time zero, KVM
will need to unmap guest memory (or demote the mapping size a la eager page
splitting?) when KVM_MEM_USERFAULT is toggled from 0=>1.

One thought would be to change the behavior of KVM's processing of the userfault
bitmap, such that KVM doesn't infer *anything* about the mapping sizes, and instead
give userspace more explicit control over the mapping size.  However, on non-x86
architectures, implementing such a control would require a non-trivial amount of
code and complexity, and would incur overhead that doesn't exist today (i.e. we'd
need to implement equivalent infrastructure to x86's disallow_lpage tracking).

And IIUC, another problem with KVM Userfault is that it wouldn't Just Work for
KVM accesses to guest memory.  E.g. if the HugeTLB page is still mapped into
userspace, then depending on the flow that gets hit, I'm pretty sure that emulating
an access to the poisoned memory would result in KVM_EXIT_INTERNAL_ERROR, whereas
punching a hole in a memslot would result in a much more friendly KVM_EXIT_MMIO.

All in all, given that KVM needs to correctly handle hugepage vs. memslot
alignment/size issues no matter what, and that KVM has well-established behavior
for handling no-memslot accesses, I'm leaning towards saying userspace should
punch a hole in the memslot in order to emulate a poisoned page.  The only reason
I can think of for preferring a different approach is if userspace can't provide
the desired latency/performance characteristics when punching a hole in a memslot.
Hopefully reacting to a poisoned page is a fairly slow path?

next prev parent reply	other threads:[~2025-05-29 15:46 UTC|newest]

Thread overview: 34+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-01-09 20:49 [PATCH v2 00/13] KVM: Introduce KVM Userfault James Houghton
2025-01-09 20:49 ` [PATCH v2 01/13] KVM: Add KVM_MEM_USERFAULT memslot flag and bitmap James Houghton
2025-05-07  0:01   ` Sean Christopherson
2025-05-28 15:21     ` James Houghton
2025-01-09 20:49 ` [PATCH v2 02/13] KVM: Add KVM_MEMORY_EXIT_FLAG_USERFAULT James Houghton
2025-01-09 20:49 ` [PATCH v2 03/13] KVM: Allow late setting of KVM_MEM_USERFAULT on guest_memfd memslot James Houghton
2025-05-07  0:03   ` Sean Christopherson
2025-01-09 20:49 ` [PATCH v2 04/13] KVM: Advertise KVM_CAP_USERFAULT in KVM_CHECK_EXTENSION James Houghton
2025-01-09 20:49 ` [PATCH v2 05/13] KVM: x86/mmu: Add support for KVM_MEM_USERFAULT James Houghton
2025-05-07  0:05   ` Sean Christopherson
2025-05-28 20:21     ` Oliver Upton
2025-05-28 21:22       ` Sean Christopherson
2025-05-29 14:56         ` Sean Christopherson
2025-05-29 15:37           ` James Houghton
2025-01-09 20:49 ` [PATCH v2 06/13] KVM: arm64: " James Houghton
2025-05-07  0:06   ` Sean Christopherson
2025-05-28 15:09     ` James Houghton
2025-05-28 15:25       ` James Houghton
2025-05-28 17:30         ` Sean Christopherson
2025-05-28 20:17           ` James Houghton
2025-05-28 23:25             ` Sean Christopherson
2025-06-09 23:04               ` James Houghton
2025-01-09 20:49 ` [PATCH v2 07/13] KVM: selftests: Fix vm_mem_region_set_flags docstring James Houghton
2025-01-09 20:49 ` [PATCH v2 08/13] KVM: selftests: Fix prefault_mem logic James Houghton
2025-01-09 20:49 ` [PATCH v2 09/13] KVM: selftests: Add va_start/end into uffd_desc James Houghton
2025-01-09 20:49 ` [PATCH v2 10/13] KVM: selftests: Add KVM Userfault mode to demand_paging_test James Houghton
2025-01-09 20:49 ` [PATCH v2 11/13] KVM: selftests: Inform set_memory_region_test of KVM_MEM_USERFAULT James Houghton
2025-01-09 20:49 ` [PATCH v2 12/13] KVM: selftests: Add KVM_MEM_USERFAULT + guest_memfd toggle tests James Houghton
2025-01-09 20:49 ` [PATCH v2 13/13] KVM: Documentation: Add KVM_CAP_USERFAULT and KVM_MEM_USERFAULT details James Houghton
2025-05-06 23:48 ` [PATCH v2 00/13] KVM: Introduce KVM Userfault Sean Christopherson
2025-05-07  0:13 ` Sean Christopherson
2025-05-28 15:48   ` James Houghton
2025-05-29 15:28     ` Sean Christopherson [this message]
2025-05-29 16:17       ` James Houghton

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aDh9GtncjlVvvVJ1@google.com \
    --to=seanjc@google.com \
    --cc=amoorthy@google.com \
    --cc=corbet@lwn.net \
    --cc=dmatlack@google.com \
    --cc=jiaqiyan@google.com \
    --cc=jthoughton@google.com \
    --cc=kalyazin@amazon.com \
    --cc=kvm@vger.kernel.org \
    --cc=kvmarm@lists.linux.dev \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=maz@kernel.org \
    --cc=oliver.upton@linux.dev \
    --cc=pbonzini@redhat.com \
    --cc=peterx@redhat.com \
    --cc=pgonda@google.com \
    --cc=wei.w.wang@intel.com \
    --cc=yan.y.zhao@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).