From: Alexandru Elisei <alexandru.elisei@arm.com>
To: Sean Christopherson <seanjc@google.com>
Cc: sashiko-reviews@lists.linux.dev, Marc Zyngier <maz@kernel.org>,
	kvm@vger.kernel.org, kvmarm@lists.linux.dev,
	Oliver Upton <oupton@kernel.org>
Subject: Re: [RFC PATCH] KVM: Ignore MMU notifiers for guest_memfd-only memslots
Date: Thu, 18 Jun 2026 12:09:40 +0100
Message-ID: <ajPR9PrmNN3y8EC7@raptor>
In-Reply-To: <ajA_0TQCQLSz7qcU@google.com>
Hi Sean,

On Mon, Jun 15, 2026 at 11:09:21AM -0700, Sean Christopherson wrote:
> On Mon, Jun 15, 2026, Sean Christopherson wrote:
> > On Mon, Jun 15, 2026, sashiko-bot@kernel.org wrote:
> > > > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> > > > --- a/virt/kvm/kvm_main.c
> > > > +++ b/virt/kvm/kvm_main.c
> > > [ ... ]
> > > > @@ -592,6 +592,10 @@ static __always_inline kvm_mn_ret_t kvm_handle_hva_range(struct kvm *kvm,
> > > >  			unsigned long hva_start, hva_end;
> > > >  
> > > >  			slot = container_of(node, struct kvm_memory_slot, hva_node[slots->node_idx]);
> > > > +
> > > > +			if (kvm_slot_has_gmem(slot) && kvm_memslot_is_gmem_only(slot))
> > > > +				continue;
> > > > +
> > >
> > > [Severity: Critical]
> > > Does this change inadvertently introduce a regression in the pfncache retry
> > > protocol?
> > >
> > > Looking at the pfncache framework, it maps guest memory into kernel space and
> > > explicitly drops the page reference after mapping it:
> > >
> > > virt/kvm/pfncache.c:hva_to_pfn_retry() {
> > > ...
> > > kvm_release_page_clean(page);
> > > ...
> > > }
> > >
> > > It appears to rely entirely on KVM's MMU notifiers (kvm->mmu_invalidate_seq)
> > > to invalidate the cache when the page is unmapped by the host.
> > >
> > > If a VMM defines a guest_memfd-backed memslot with KVM_MEMSLOT_GMEM_ONLY
> > > but still provides a valid anonymous user mapping as its userspace_addr,
> > > could this regression lead to a use-after-free?
> >
> > Sadly, yes. To land this, we would need to first teach the gfn_to_pfn_cache code
> > to be able to pull directly from guest_memfd. I forget if anyone is working on
> > that.
>
> Actually, we just need to ensure the invalidation tracking is updated, the MMU
> itself can be left as-is.
>
> Compile tested only, but this?
>
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index 27498e990dff..690ab707816b 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -260,6 +260,7 @@ union kvm_mmu_notifier_arg {
>  enum kvm_gfn_range_filter {
>  	KVM_FILTER_SHARED = BIT(0),
>  	KVM_FILTER_PRIVATE = BIT(1),
> +	KVM_FILTER_USERSPACE_MAPPINGS = BIT(2),
>  };
>
> struct kvm_gfn_range {
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index e44c20c04961..84b693de7e35 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -608,7 +608,8 @@ static __always_inline kvm_mn_ret_t kvm_handle_hva_range(struct kvm *kvm,
>  			 * HVA-based notifications aren't relevant to private
>  			 * mappings as they don't have a userspace mapping.
>  			 */
> -			gfn_range.attr_filter = KVM_FILTER_SHARED;
> +			gfn_range.attr_filter = KVM_FILTER_SHARED |
> +						KVM_FILTER_USERSPACE_MAPPINGS;
>  
>  			/*
>  			 * {gfn(page) | page intersects with [hva_start, hva_end)} =
> @@ -715,6 +716,21 @@ void kvm_mmu_invalidate_range_add(struct kvm *kvm, gfn_t start, gfn_t end)
>  bool kvm_mmu_unmap_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range)
>  {
>  	kvm_mmu_invalidate_range_add(kvm, range->start, range->end);
> +
> +	/*
> +	 * When reacting to changes in userspace mappings, don't unmap memslots
> +	 * that are guest_memfd-only, in which case KVM's MMU mappings are
> +	 * pulled directly from guest_memfd, i.e. don't depend on the userspace
> +	 * mappings.
> +	 *
> +	 * TODO: Skip gmem-only memslots on mmu_notifier events entirely, once
> +	 * gfn_to_pfn_cache is also wired up to directly pull from guest_memfd.
> +	 */
> +	if (range->attr_filter & KVM_FILTER_USERSPACE_MAPPINGS &&
> +	    kvm_slot_has_gmem(range->slot) &&
> +	    kvm_memslot_is_gmem_only(range->slot))
> +		return false;
> +
>  	return kvm_unmap_gfn_range(kvm, range);
>  }
Looks correct to me. This way the invalidation tracking is still updated,
so we also make sure we don't hit the WARN_ON_ONCE() from
mmu_invalidate_retry_pfn().
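
To spell out the behavior I'm agreeing with, here is a small userspace
model of the quoted kvm_mmu_unmap_gfn_range() change. It is only a sketch:
every toy_* name, TOY_FILTER_* flag and struct field below is hypothetical
and merely stands in for the kernel's types, and the range bookkeeping is
simplified compared to kvm_mmu_invalidate_range_add(). The point it shows
is the ordering: the invalidation range is always recorded, and only the
unmap itself is skipped for a guest_memfd-only slot when the event comes
from a userspace mapping change.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kvm_gfn_range_filter bits. */
enum toy_filter {
	TOY_FILTER_SHARED             = 1u << 0,
	TOY_FILTER_PRIVATE            = 1u << 1,
	TOY_FILTER_USERSPACE_MAPPINGS = 1u << 2,
};

/* Only the two memslot properties the check looks at. */
struct toy_slot {
	bool has_gmem;
	bool gmem_only;
};

struct toy_range {
	const struct toy_slot *slot;
	uint64_t start, end;		/* gfn range, end exclusive */
	unsigned int attr_filter;
};

/* Simplified model of the per-VM invalidation tracking. */
struct toy_vm {
	bool range_active;
	uint64_t inv_start, inv_end;
	unsigned int nr_unmaps;		/* how often the "MMU" was unmapped */
};

/* Record the range, widening an already recorded one if needed. */
static void toy_invalidate_range_add(struct toy_vm *vm, uint64_t start,
				     uint64_t end)
{
	if (!vm->range_active) {
		vm->inv_start = start;
		vm->inv_end = end;
		vm->range_active = true;
		return;
	}
	if (start < vm->inv_start)
		vm->inv_start = start;
	if (end > vm->inv_end)
		vm->inv_end = end;
}

/* Returns true if the (modelled) MMU mappings were unmapped. */
static bool toy_unmap_gfn_range(struct toy_vm *vm,
				const struct toy_range *range)
{
	assert(range->slot);

	/* Tracking is updated unconditionally, so cached pfns get retried. */
	toy_invalidate_range_add(vm, range->start, range->end);

	/* gmem-only slots don't depend on the userspace mapping. */
	if ((range->attr_filter & TOY_FILTER_USERSPACE_MAPPINGS) &&
	    range->slot->has_gmem && range->slot->gmem_only)
		return false;

	vm->nr_unmaps++;
	return true;
}
```

With a gmem-only slot and the userspace-mappings filter set, the model
records the range but leaves nr_unmaps at zero; any other slot is unmapped
as before.
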

How about the ->{clear_flush,clear,test}_young() MMU notifier callbacks?
Shouldn't they receive the same treatment?

Thanks,
Alex