Re: [RFC PATCH] KVM: Ignore MMU notifiers for guest_memfd-only memslots

Kernel KVM virtualization development

From: Alexandru Elisei <alexandru.elisei@arm.com>
To: Sean Christopherson <seanjc@google.com>
Cc: sashiko-reviews@lists.linux.dev, Marc Zyngier <maz@kernel.org>,
	kvm@vger.kernel.org, kvmarm@lists.linux.dev,
	Oliver Upton <oupton@kernel.org>
Subject: Re: [RFC PATCH] KVM: Ignore MMU notifiers for guest_memfd-only memslots
Date: Thu, 18 Jun 2026 11:19:56 +0100
Message-ID: <ajPGTF8p_wFfogn_@raptor>
In-Reply-To: <ajMPvgA2eHQCj-B0@google.com>

Hi Sean,

On Wed, Jun 17, 2026 at 02:21:02PM -0700, Sean Christopherson wrote:
> On Wed, Jun 17, 2026, Alexandru Elisei wrote:
> > Hi Sean,
> > 
> > Thanks for the reply. Just to make sure, once the bugs have been ironed out,
> > you're ok with this idea?
> > 
> > On Mon, Jun 15, 2026 at 10:47:14AM -0700, Sean Christopherson wrote:
> > > On Mon, Jun 15, 2026, sashiko-bot@kernel.org wrote:
> > > > > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> > > > > --- a/virt/kvm/kvm_main.c
> > > > > +++ b/virt/kvm/kvm_main.c
> > > > [ ... ]
> > > > > @@ -592,6 +592,10 @@ static __always_inline kvm_mn_ret_t kvm_handle_hva_range(struct kvm *kvm,
> > > > >  			unsigned long hva_start, hva_end;
> > > > >  
> > > > >  			slot = container_of(node, struct kvm_memory_slot, hva_node[slots->node_idx]);
> > > > > +
> > > > > +			if (kvm_slot_has_gmem(slot) && kvm_memslot_is_gmem_only(slot))
> > > > > +				continue;
> > > > > +
> > > > 
> > > > [Severity: Critical]
> > > > Does this change inadvertently introduce a regression in the pfncache retry
> > > > protocol?
> > > > 
> > > > Looking at the pfncache framework, it maps guest memory into kernel space and
> > > > explicitly drops the page reference after mapping it:
> > > > 
> > > > virt/kvm/pfncache.c:hva_to_pfn_retry() {
> > > >     ...
> > > >     kvm_release_page_clean(page);
> > > >     ...
> > > > }
> > > > 
> > > > It appears to rely entirely on KVM's MMU notifiers (kvm->mmu_invalidate_seq)
> > > > to invalidate the cache when the page is unmapped by the host.
> > > > 
> > > > If a VMM defines a guest_memfd-backed memslot with KVM_MEMSLOT_GMEM_ONLY
> > > > but still provides a valid anonymous user mapping as its userspace_addr,
> > > > could this regression lead to a use-after-free?
> > > 
> > > Sadly, yes.  To land this, we would need to first teach the gfn_to_pfn_cache code
> > > to be able to pull directly from guest_memfd.  I forget if anyone is working on
> > > that.
> > 
> > I've been trying to wrap my head around this, and I just can't seem to
> > figure it out.
> > 
> > kvm_mmu_notifier_invalidate_range_start(), before handle_hva_range(), calls
> > gfn_to_pfn_cache_invalidate_start() for the MMU notifier range, and that
> > marks all caches that overlap the range as invalid. kvm_gpc_check() returns
> > false for an invalid cache, so how can the memory still be accessed via the
> > pfncache?
> 
> That just forces gpcs to be refreshed, mmu_notifier_retry_cache() still relies
> on mmu_invalidate_seq being bumped to avoid consuming stale state.

Yes.
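
For my own understanding, here is a toy userspace model of the check you describe (hypothetical names, memory barriers and locking omitted; this is not KVM code). Invalidating the gpc only forces a refresh. What catches a racing unmap is the refresh comparing its sequence-count snapshot, in the same shape as mmu_notifier_retry_cache(), and that only works if the notifier actually reaches KVM's begin/end hooks for the range:

```c
#include <stdbool.h>

/*
 * Toy model of KVM's invalidation sequence count. The struct and function
 * names are hypothetical simplifications, not the real kernel definitions.
 */
struct toy_kvm {
	unsigned long mmu_invalidate_seq;         /* bumped when an invalidation ends */
	unsigned long mmu_invalidate_in_progress; /* non-zero between begin and end */
};

static void toy_invalidate_begin(struct toy_kvm *kvm)
{
	kvm->mmu_invalidate_in_progress++;
}

static void toy_invalidate_end(struct toy_kvm *kvm)
{
	kvm->mmu_invalidate_seq++;
	kvm->mmu_invalidate_in_progress--;
}

/*
 * Same shape as the check in mmu_notifier_retry_cache(): a pfn resolved
 * without the relevant locks may only be consumed if no invalidation is
 * running and none has completed since 'snapshot' was taken.
 */
static bool toy_retry_needed(const struct toy_kvm *kvm, unsigned long snapshot)
{
	if (kvm->mmu_invalidate_in_progress)
		return true;
	return kvm->mmu_invalidate_seq != snapshot;
}

/*
 * Simulates a cache refresh racing with a host unmap. 'notifier_reaches_kvm'
 * selects whether begin/end actually run for the unmapped range.
 * Returns true if the refresh would (correctly) retry.
 */
static bool toy_refresh_races_with_unmap(bool notifier_reaches_kvm)
{
	struct toy_kvm kvm = { 0, 0 };
	unsigned long snapshot = kvm.mmu_invalidate_seq; /* taken before hva -> pfn */

	/* ... the hva -> pfn lookup would happen here ... */

	if (notifier_reaches_kvm) {
		toy_invalidate_begin(&kvm);
		toy_invalidate_end(&kvm);
	}

	return toy_retry_needed(&kvm, snapshot);
}
```

In this model toy_refresh_races_with_unmap(true) retries, while toy_refresh_races_with_unmap(false), i.e. the unmap never reaching begin/end, does not, so the stale pfn would be consumed.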

> 
> > > > By unmapping the anonymous memory, the host would trigger MMU notifiers, but
> > > > this new check skips the memslot. As a result, kvm->mmu_invalidate_seq
> > > > wouldn't increment, and KVM might retain a kernel mapping to a freed physical
> > > > page.
> > 
> > kvm->mmu_invalidate_seq is incremented in kvm_mmu_invalidate_end(), I don't see
> > how that is affected by skipping a memslot in handle_hva_range().
> 
> handle_hva_range() only invokes on_lock() if a memslot is found.  By skipping the
> memslot entirely, kvm_mmu_invalidate_{begin,end}() won't be called and so
> mmu_invalidate_seq won't be bumped.

I see it now; for some reason I completely missed that
kvm_mmu_invalidate_{begin,end}() are called from the ->on_lock() callback :(
I was under the impression that they were called directly from the
->invalidate_range_{start,end}() MMU notifier callbacks.
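
For the archives, a toy model of the control flow I had missed. The names and structures are hypothetical simplifications; the real kvm_handle_hva_range() walks each address space's memslots and takes mmu_lock before calling ->on_lock(). The point is only that ->on_lock() runs once, and only if at least one overlapping memslot is actually processed, so a `continue` on gmem-only slots means neither begin nor end runs:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for the real KVM structures. */
struct toy_slot {
	unsigned long hva_start, hva_end; /* [start, end) userspace range */
	bool gmem_only;                   /* models kvm_memslot_is_gmem_only() */
};

struct toy_kvm {
	unsigned long mmu_invalidate_seq;
	unsigned long mmu_invalidate_in_progress;
};

typedef void (*toy_on_lock_fn)(struct toy_kvm *kvm);

static void toy_invalidate_begin(struct toy_kvm *kvm)
{
	kvm->mmu_invalidate_in_progress++;
}

static void toy_invalidate_end(struct toy_kvm *kvm)
{
	kvm->mmu_invalidate_seq++;
	kvm->mmu_invalidate_in_progress--;
}

/*
 * Models the shape of kvm_handle_hva_range(): on_lock() runs at most once,
 * and only when an overlapping memslot is processed. 'skip_gmem_only'
 * models the 'continue' added by the RFC patch.
 */
static bool toy_handle_hva_range(struct toy_kvm *kvm, const struct toy_slot *slots,
				 size_t nr_slots, unsigned long start, unsigned long end,
				 toy_on_lock_fn on_lock, bool skip_gmem_only)
{
	bool locked = false;

	for (size_t i = 0; i < nr_slots; i++) {
		const struct toy_slot *slot = &slots[i];

		if (slot->hva_end <= start || slot->hva_start >= end)
			continue; /* no overlap with the notifier range */
		if (skip_gmem_only && slot->gmem_only)
			continue; /* the RFC's early skip */

		if (!locked) {
			locked = true;
			/* the real code takes mmu_lock here, then calls on_lock() */
			on_lock(kvm);
		}
		/* the real code would run the per-slot handler here */
	}
	return locked;
}

/* One start/end notifier pair for a range covered only by a gmem-only slot. */
static unsigned long toy_seq_after_unmap(bool skip_gmem_only)
{
	struct toy_kvm kvm = { 0, 0 };
	struct toy_slot slot = { 0x1000, 0x5000, true };

	toy_handle_hva_range(&kvm, &slot, 1, 0x2000, 0x3000,
			     toy_invalidate_begin, skip_gmem_only);
	toy_handle_hva_range(&kvm, &slot, 1, 0x2000, 0x3000,
			     toy_invalidate_end, skip_gmem_only);
	return kvm.mmu_invalidate_seq;
}
```

Without the skip the sequence count goes from 0 to 1; with it, the count stays at 0, which is exactly what lets a concurrent pfncache refresh miss the unmap.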

> 
> > > > Could this allow the guest to read or write arbitrary host physical memory?
> > 
> > The KVM_MEMSLOT_GMEM_ONLY flag is set if the backing guest_memfd has been
> > created with GUEST_MEMFD_FLAG_MMAP. The documentation for the flag says
> > that '[..] the fault will always be consumed from guest_memfd, regardless
> > of whether it is a shared or private fault'.  As far as I can tell, this
> > means that, absent a fallocate(FALLOC_FL_PUNCH_HOLE) call, the page is
> > still in the page cache for the guest_memfd file after userspace has
> > unmapped it, so the guest will not be accessing a freed page.
> 
> KVM_MEMSLOT_GMEM_ONLY is somewhat misleading, it only applies to KVM's MMU.
> For other cases where KVM accesses guest memory, KVM still follows the host virtual
> address, e.g. so that copy_{to,from}_user() Just Works.  But userspace isn't
> strictly *required* to keep the userspace mapping coherent with guest_memfd, nor
> is userspace required to make the userspace mapping fully RWX.  And so if
> userspace modifies the VMA, KVM needs to react accordingly.
> 
> When in-place conversion comes along, KVM will also rely on userspace mappings
> being torn down before allowing a SHARED page to become PRIVATE (for all intents
> and purposes, we're conceptually treating conversions as free()+re-alloc()).  So
> while the page might still be in the page cache, it's effectively been "freed".
> So in that case, KVM really does need to ensure it handles mmu_notifier events
> correctly to avoid UAF.

Everything makes more sense now, thanks for your patience in explaining it.

Thanks,
Alex


Thread overview: 17+ messages
2026-06-15 15:52 [RFC PATCH] KVM: Ignore MMU notifiers for guest_memfd-only memslots Alexandru Elisei
2026-06-15 16:09 ` sashiko-bot
2026-06-15 17:47   ` Sean Christopherson
2026-06-15 18:09     ` Sean Christopherson
2026-06-18 11:09       ` Alexandru Elisei
2026-06-17 13:07     ` Alexandru Elisei
2026-06-17 21:21       ` Sean Christopherson
2026-06-18 10:19         ` Alexandru Elisei [this message]
2026-06-23 23:41         ` Ackerley Tng
2026-06-24 17:32           ` Sean Christopherson
2026-06-17 21:22       ` Sean Christopherson
2026-06-18 11:26     ` David Hildenbrand (Arm)
2026-06-15 19:07 ` David Hildenbrand
2026-06-17 13:23   ` Alexandru Elisei
2026-06-17 13:41     ` David Hildenbrand
2026-06-17 13:50       ` Alexandru Elisei
2026-06-21  0:02 ` XIAO WU
