Re: [RFC PATCH] KVM: Ignore MMU notifiers for guest_memfd-only memslots

From: Alexandru Elisei <alexandru.elisei@arm.com>
To: Sean Christopherson <seanjc@google.com>
Cc: sashiko-reviews@lists.linux.dev, Marc Zyngier <maz@kernel.org>,
	kvm@vger.kernel.org, kvmarm@lists.linux.dev,
	Oliver Upton <oupton@kernel.org>
Subject: Re: [RFC PATCH] KVM: Ignore MMU notifiers for guest_memfd-only memslots
Date: Thu, 18 Jun 2026 11:19:56 +0100
Message-ID: <ajPGTF8p_wFfogn_@raptor>
In-Reply-To: <ajMPvgA2eHQCj-B0@google.com>

Hi Sean,

On Wed, Jun 17, 2026 at 02:21:02PM -0700, Sean Christopherson wrote:
> On Wed, Jun 17, 2026, Alexandru Elisei wrote:
> > Hi Sean,
> > 
> > Thanks for the reply. Just to make sure, once the bugs have been ironed out,
> > you're ok with this idea?
> > 
> > On Mon, Jun 15, 2026 at 10:47:14AM -0700, Sean Christopherson wrote:
> > > On Mon, Jun 15, 2026, sashiko-bot@kernel.org wrote:
> > > > > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> > > > > --- a/virt/kvm/kvm_main.c
> > > > > +++ b/virt/kvm/kvm_main.c
> > > > [ ... ]
> > > > > @@ -592,6 +592,10 @@ static __always_inline kvm_mn_ret_t kvm_handle_hva_range(struct kvm *kvm,
> > > > >  			unsigned long hva_start, hva_end;
> > > > >  
> > > > >  			slot = container_of(node, struct kvm_memory_slot, hva_node[slots->node_idx]);
> > > > > +
> > > > > +			if (kvm_slot_has_gmem(slot) && kvm_memslot_is_gmem_only(slot))
> > > > > +				continue;
> > > > > +
> > > > 
> > > > [Severity: Critical]
> > > > Does this change inadvertently introduce a regression in the pfncache retry
> > > > protocol?
> > > > 
> > > > Looking at the pfncache framework, it maps guest memory into kernel space and
> > > > explicitly drops the page reference after mapping it:
> > > > 
> > > > virt/kvm/pfncache.c:hva_to_pfn_retry() {
> > > >     ...
> > > >     kvm_release_page_clean(page);
> > > >     ...
> > > > }
> > > > 
> > > > It appears to rely entirely on KVM's MMU notifiers (kvm->mmu_invalidate_seq)
> > > > to invalidate the cache when the page is unmapped by the host.
> > > > 
> > > > If a VMM defines a guest_memfd-backed memslot with KVM_MEMSLOT_GMEM_ONLY
> > > > but still provides a valid anonymous user mapping as its userspace_addr,
> > > > could this regression lead to a use-after-free?
> > > 
> > > Sadly, yes.  To land this, we would need to first teach the gfn_to_pfn_cache code
> > > to be able to pull directly from guest_memfd.  I forget if anyone is working on
> > > that.
> > 
> > I've been trying to wrap my head around this, and I just can't seem to
> > figure it out.
> > 
> > kvm_mmu_notifier_invalidate_range_start(), before handle_hva_range(), calls
> > gfn_to_pfn_cache_invalidate_start() for the MMU notifier range, and that
> > marks all caches that overlap the range as invalid. kvm_gpc_check() returns
> > false for an invalid cache, so how can the memory still be accessed via the
> > pfncache?
> 
> That just forces gpcs to be refreshed, mmu_notifier_retry_cache() still relies
> on mmu_invalidate_seq being bumped to avoid consuming stale state.

Yes.
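
As a minimal sketch of the sequence check described above (a userspace model
only; struct model_mmu and every model_* function are hypothetical names, not
the kernel's API, and the model covers only the sequence counter):

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the part of struct kvm that matters here. */
struct model_mmu {
	unsigned long invalidate_seq;	/* models kvm->mmu_invalidate_seq */
};

/* A cache refresh samples the sequence before resolving the pfn. */
static unsigned long model_sample_seq(const struct model_mmu *m)
{
	assert(m);
	return m->invalidate_seq;
}

/* Models the end of an invalidation that was actually processed. */
static void model_invalidate_end(struct model_mmu *m)
{
	assert(m);
	m->invalidate_seq++;
}

/*
 * Returns true if the pfn resolved after sampling must be thrown away.
 * If an invalidation never bumps the counter, this keeps returning false
 * and the stale pfn is consumed.
 */
static bool model_retry_cache(const struct model_mmu *m,
			      unsigned long sampled_seq)
{
	assert(m);
	return m->invalidate_seq != sampled_seq;
}
```

In this model, marking a cache invalid only forces a refresh; whether that
refresh ends up with a stale pfn depends on the counter having moved.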

> 
> > > > By unmapping the anonymous memory, the host would trigger MMU notifiers, but
> > > > this new check skips the memslot. As a result, kvm->mmu_invalidate_seq
> > > > wouldn't increment, and KVM might retain a kernel mapping to a freed physical
> > > > page.
> > 
> > kvm->mmu_invalidate_seq is incremented in kvm_mmu_invalidate_end(), I don't see
> > how that is affected by skipping a memslot in handle_hva_range().
> 
> handle_hva_range() only invokes on_lock() if a memslot is found.  By skipping the
> memslot entirely, kvm_mmu_invalidate_{begin,end}() won't be called and so
> mmu_invalidate_seq won't be bumped.

I see it now. For some reason I completely missed the part where
kvm_mmu_invalidate_{begin,end}() are called via ->on_lock() :( I was under
the impression that they are called directly from the
->invalidate_range_{start,end}() MMU notifier callbacks.
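
A minimal sketch of that control flow, as a userspace model (all model_*
names are hypothetical, and the begin/end pair is collapsed into a single
counter bump for brevity; this is not the kernel code):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a memslot: HVA range [hva_start, hva_end). */
struct model_slot {
	unsigned long hva_start, hva_end;
	bool gmem_only;
};

struct model_kvm {
	unsigned long invalidate_seq;	/* models kvm->mmu_invalidate_seq */
	const struct model_slot *slots;
	size_t nr_slots;
};

/* Models the on_lock() callback; it is the only place the counter moves. */
static void model_on_lock(struct model_kvm *kvm)
{
	kvm->invalidate_seq++;
}

/*
 * Walk the slots overlapping [start, end). on_lock() runs at most once,
 * and only when a slot is actually visited. With skip_gmem_only set (the
 * check added by the RFC), a gmem-only slot is never visited, so the
 * counter stays where it was.
 */
static bool model_handle_hva_range(struct model_kvm *kvm, unsigned long start,
				   unsigned long end, bool skip_gmem_only)
{
	bool found = false;

	assert(start < end);
	for (size_t i = 0; i < kvm->nr_slots; i++) {
		const struct model_slot *slot = &kvm->slots[i];

		if (slot->hva_end <= start || slot->hva_start >= end)
			continue;	/* no overlap with the notifier range */
		if (skip_gmem_only && slot->gmem_only)
			continue;	/* slot ignored: on_lock() not reached */
		if (!found) {
			found = true;
			model_on_lock(kvm);
		}
	}
	return found;
}
```

Calling model_handle_hva_range() over a range covered only by a gmem-only
slot leaves invalidate_seq unchanged when skip_gmem_only is true, which is
the situation the review comment describes.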

> 
> > > > Could this allow the guest to read or write arbitrary host physical memory?
> > 
> > The KVM_MEMSLOT_GMEM_ONLY flag is set if the backing guest_memfd has been
> > created with GUEST_MEMFD_FLAG_MMAP. The documentation for the flag says
> > that '[..] the fault will always be consumed from guest_memfd, regardless
> > of whether it is a shared or private fault'.  As far as I can tell, this
> > means that, absent a fallocate(FALLOC_FL_PUNCH_HOLE) call, the page is
> > still in the page cache for the guest_memfd file after userspace has
> > unmapped it, so the guest will not be accessing a freed page.
> 
> KVM_MEMSLOT_GMEM_ONLY is somewhat misleading, it only applies to KVM's MMU.
> For other cases where KVM accesses guest memory, KVM still follows the host virtual
> address, e.g. so that copy_{to,from}_user() Just Works.  But userspace isn't
> strictly *required* to keep the userspace mapping coherent with guest_memfd, nor
> is userspace required to make the userspace mapping fully RWX.  And so if
> userspace modifies the VMA, KVM needs to react accordingly.
> 
> When in-place conversion comes along, KVM will also rely on userspace mappings
> being torn down before allowing a SHARED page to become PRIVATE (for all intents
> and purposes, we're conceptually treating conversions as free()+re-alloc()).  So
> while the page might still be in the page cache, it's effectively been "freed".
> So in that case, KVM really does need to ensure it handles mmu_notifier events
> correctly to avoid UAF.

Everything makes more sense now, thanks for your patience in explaining it.

Thanks,
Alex
