Date: Thu, 18 Jun 2026 12:09:40 +0100
From: Alexandru Elisei
To: Sean Christopherson
Cc: sashiko-reviews@lists.linux.dev, Marc Zyngier, kvm@vger.kernel.org,
	kvmarm@lists.linux.dev, Oliver Upton
Subject: Re: [RFC PATCH] KVM: Ignore MMU notifiers for guest_memfd-only memslots
References: <20260615155244.183044-1-alexandru.elisei@arm.com>
	<20260615160901.9A1A61F000E9@smtp.kernel.org>

Hi Sean,

On Mon, Jun 15, 2026 at 11:09:21AM -0700, Sean Christopherson wrote:
> On Mon, Jun 15, 2026, Sean Christopherson wrote:
> > On Mon, Jun 15, 2026, sashiko-bot@kernel.org wrote:
> > > > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> > > > --- a/virt/kvm/kvm_main.c
> > > > +++ b/virt/kvm/kvm_main.c
> > > [ ... ]
> > > > @@ -592,6 +592,10 @@ static __always_inline kvm_mn_ret_t kvm_handle_hva_range(struct kvm *kvm,
> > > >  			unsigned long hva_start, hva_end;
> > > >  
> > > >  			slot = container_of(node, struct kvm_memory_slot, hva_node[slots->node_idx]);
> > > > +
> > > > +			if (kvm_slot_has_gmem(slot) && kvm_memslot_is_gmem_only(slot))
> > > > +				continue;
> > > > +
> > >
> > > [Severity: Critical]
> > > Does this change inadvertently introduce a regression in the pfncache retry
> > > protocol?
> > >
> > > Looking at the pfncache framework, it maps guest memory into kernel space and
> > > explicitly drops the page reference after mapping it:
> > >
> > > virt/kvm/pfncache.c:hva_to_pfn_retry() {
> > >     ...
> > >         kvm_release_page_clean(page);
> > >     ...
> > > }
> > >
> > > It appears to rely entirely on KVM's MMU notifiers (kvm->mmu_invalidate_seq)
> > > to invalidate the cache when the page is unmapped by the host.
> > >
> > > If a VMM defines a guest_memfd-backed memslot with KVM_MEMSLOT_GMEM_ONLY
> > > but still provides a valid anonymous user mapping as its userspace_addr,
> > > could this regression lead to a use-after-free?
> >
> > Sadly, yes.  To land this, we would need to first teach the gfn_to_pfn_cache code
> > to be able to pull directly from guest_memfd.  I forget if anyone is working on
> > that.
>
> Actually, we just need to ensure the invalidation tracking is updated, the MMU
> itself can be left as-is.
>
> Compile tested only, but this?
>
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index 27498e990dff..690ab707816b 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -260,6 +260,7 @@ union kvm_mmu_notifier_arg {
>  enum kvm_gfn_range_filter {
>  	KVM_FILTER_SHARED		= BIT(0),
>  	KVM_FILTER_PRIVATE		= BIT(1),
> +	KVM_FILTER_USERSPACE_MAPPINGS	= BIT(2),
>  };
>  
>  struct kvm_gfn_range {
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index e44c20c04961..84b693de7e35 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -608,7 +608,8 @@ static __always_inline kvm_mn_ret_t kvm_handle_hva_range(struct kvm *kvm,
>  			 * HVA-based notifications aren't relevant to private
>  			 * mappings as they don't have a userspace mapping.
>  			 */
> -			gfn_range.attr_filter = KVM_FILTER_SHARED;
> +			gfn_range.attr_filter = KVM_FILTER_SHARED |
> +						KVM_FILTER_USERSPACE_MAPPINGS;
>  
>  			/*
>  			 * {gfn(page) | page intersects with [hva_start, hva_end)} =
> @@ -715,6 +716,21 @@ void kvm_mmu_invalidate_range_add(struct kvm *kvm, gfn_t start, gfn_t end)
>  bool kvm_mmu_unmap_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range)
>  {
>  	kvm_mmu_invalidate_range_add(kvm, range->start, range->end);
> +
> +	/*
> +	 * When reacting to changes in userspace mappings, don't unmap memslots
> +	 * that are guest_memfd-only, in which case KVM's MMU mappings are
> +	 * pulled directly from guest_memfd, i.e. don't depend on the userspace
> +	 * mappings.
> +	 *
> +	 * TODO: Skip gmem-only memslots on mmu_notifier events entirely, once
> +	 * gfn_to_pfn_cache is also wired up to directly pull from guest_memfd.
> +	 */
> +	if (range->attr_filter & KVM_FILTER_USERSPACE_MAPPINGS &&
> +	    kvm_slot_has_gmem(range->slot) &&
> +	    kvm_memslot_is_gmem_only(range->slot))
> +		return false;
> +
>  	return kvm_unmap_gfn_range(kvm, range);
>  }

Looks correct to me, this way we also make sure we don't hit the
WARN_ON_ONCE() from mmu_invalidate_retry_pfn().

How about the ->{clear_flush,clear,test}_young() MMU notifier callbacks?
Shouldn't they receive the same treatment?

Thanks,
Alex
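
For anyone following the thread, the ordering in the quoted
kvm_mmu_unmap_gfn_range() hunk can be modelled in a few lines of standalone
C. This is only a userspace sketch under loud assumptions: every toy_* name
is hypothetical and merely stands in for the kernel types and helpers in the
diff, the two counters replace the real invalidation tracking and MMU zap,
and the return value is simplified to "something was unmapped". It is not
the KVM API.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for enum kvm_gfn_range_filter in the quoted diff. */
enum toy_filter {
	TOY_FILTER_SHARED             = 1u << 0,
	TOY_FILTER_PRIVATE            = 1u << 1,
	TOY_FILTER_USERSPACE_MAPPINGS = 1u << 2,
};

/* Hypothetical memslot: only the two properties the check looks at. */
struct toy_slot {
	bool has_gmem;   /* models kvm_slot_has_gmem() */
	bool gmem_only;  /* models kvm_memslot_is_gmem_only() */
};

/* Hypothetical VM state: counters instead of real tracking and zapping. */
struct toy_vm {
	unsigned long invalidations; /* ranges added to invalidation tracking */
	unsigned long unmaps;        /* ranges actually zapped from the MMU */
};

struct toy_range {
	const struct toy_slot *slot;
	unsigned int attr_filter;
};

/*
 * Mirrors the ordering in the quoted hunk: the invalidation is always
 * recorded first, and only the MMU unmap is skipped for gmem-only slots
 * when the event originates from a change in userspace mappings.
 */
static bool toy_unmap_gfn_range(struct toy_vm *vm, const struct toy_range *range)
{
	vm->invalidations++;

	if ((range->attr_filter & TOY_FILTER_USERSPACE_MAPPINGS) &&
	    range->slot->has_gmem && range->slot->gmem_only)
		return false;

	vm->unmaps++;
	return true;
}
```

With a gmem-only slot and TOY_FILTER_USERSPACE_MAPPINGS set, the call bumps
invalidations but leaves unmaps untouched. Any other combination, e.g. a slot
without guest_memfd or a range without the new filter bit, takes the regular
unmap path.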