[PATCH] KVM: SEV: Don't return a still-assigned gmem page to the host

All of lore.kernel.org
 help / color / mirror / Atom feed

* [PATCH] KVM: SEV: Don't return a still-assigned gmem page to the host
@ 2026-06-10 16:10 Hyunwoo Kim
  2026-06-10 16:26 ` sashiko-bot
  2026-06-10 22:16 ` Michael Roth
  0 siblings, 2 replies; 4+ messages in thread
From: Hyunwoo Kim @ 2026-06-10 16:10 UTC (permalink / raw)
  To: seanjc, pbonzini, tglx, mingo, bp, dave.hansen, x86, hpa,
	michael.roth
  Cc: kvm, imv4bel

sev_gmem_invalidate() is called when guest_memfd frees a gmem page.
For each PFN that is still assigned to the guest in the RMP table, it
transitions the page back to hypervisor-owned via rmp_make_shared()
before the page is returned to the host.

A guest-assigned page can reach this path while still private,
because the free path does not transition it beforehand and
sev_gmem_invalidate() is the only place that does. A gmem page used
as a vCPU's VMSA after SEV-SNP AP creation is one such case. When
rmp_make_shared() fails, the RMP entry remains guest-owned and the
host cannot use the page because of RMP protection, so it must not be
returned to the host. The existing code only issues WARN_ONCE() and
continues to the next PFN, returning the page to the host allocator.

Leak the page instead of freeing it, as kvm_rmp_make_shared(),
snp_page_reclaim() and sev_free_vcpu() already do when a transition
back to shared fails. snp_leak_pages() does not take a reference of
its own, and on this path the page is freed right after the hook
returns, so take a reference with folio_get() first to keep the page
from being freed.

Fixes: 8eb01900b018 ("KVM: SEV: Implement gmem hook for invalidating private pages")
Signed-off-by: Hyunwoo Kim <imv4bel@gmail.com>
---
 arch/x86/kvm/svm/sev.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 6c6a6d663e29..8fee6ec529f9 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -5178,8 +5178,12 @@ void sev_gmem_invalidate(kvm_pfn_t start, kvm_pfn_t end)

 		rc = rmp_make_shared(pfn, use_2m_update ? PG_LEVEL_2M : PG_LEVEL_4K);
 		if (WARN_ONCE(rc, "SEV: Failed to update RMP entry for PFN 0x%llx error %d\n",
-			      pfn, rc))
+			      pfn, rc)) {
+			/* Still assigned to the guest; pin and leak rather than freeing. */
+			folio_get(page_folio(pfn_to_page(pfn)));
+			snp_leak_pages(pfn, use_2m_update ? PTRS_PER_PMD : 1);
 			goto next_pfn;
+		}

 		/*
 		 * SEV-ES avoids host/guest cache coherency issues through
-- 
2.43.0

^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH] KVM: SEV: Don't return a still-assigned gmem page to the host
  2026-06-10 16:10 [PATCH] KVM: SEV: Don't return a still-assigned gmem page to the host Hyunwoo Kim
@ 2026-06-10 16:26 ` sashiko-bot
  2026-06-10 18:25   ` Sean Christopherson
  2026-06-10 22:16 ` Michael Roth
  1 sibling, 1 reply; 4+ messages in thread
From: sashiko-bot @ 2026-06-10 16:26 UTC (permalink / raw)
  To: Hyunwoo Kim; +Cc: kvm

Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:
- [Critical] Calling snp_leak_pages() on a pinned guest_memfd folio corrupts the system-wide LRU list by overwriting the folio's LRU pointers.
- [High] Unconditional PAGE_SIZE-stepping PFN iteration on large folios results in duplicate folio_get() increments on the same folio and failure to track leaked pages.
--

commit 5d574de6d0d362139fa648b3657e2f24c06fd5ee
Author: Hyunwoo Kim <imv4bel@gmail.com>

KVM: SEV: Don't return a still-assigned gmem page to the host

This commit addresses an issue where guest_memfd frees a page that is
still assigned to the guest. To prevent returning it to the host, the
page is pinned and leaked.

> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> index f59c65abe3cfa..dbde6dce79b76 100644
> --- a/arch/x86/kvm/svm/sev.c
> +++ b/arch/x86/kvm/svm/sev.c
[ ... ]
> @@ -5036,8 +5036,12 @@ void sev_gmem_invalidate(kvm_pfn_t start, kvm_pfn_t end)
>  
>  		rc = rmp_make_shared(pfn, use_2m_update ? PG_LEVEL_2M : PG_LEVEL_4K);
>  		if (WARN_ONCE(rc, "SEV: Failed to update RMP entry for PFN 0x%llx error %d\n",
> -			      pfn, rc))
> +			      pfn, rc)) {
> +			/* Still assigned to the guest; pin and leak rather than freeing. */
> +			folio_get(page_folio(pfn_to_page(pfn)));
> +			snp_leak_pages(pfn, use_2m_update ? PTRS_PER_PMD : 1);

[Severity: Critical]
Does calling snp_leak_pages() on a pinned guest_memfd folio corrupt the
system-wide LRU list?

Memory allocated by guest_memfd is added to the page cache, which
unconditionally places the memory on the global unevictable LRU list. When
guest_memfd memory is freed, the folio has been removed from the page cache
but is still actively linked in the global LRU list until its reference
count drops to zero.

By calling folio_get() here, the reference count is elevated, keeping the
folio on the LRU list. Then, snp_leak_pages() unconditionally links the
page into a tracking list via list_add_tail(&page->buddy_list, ...).

Because buddy_list and lru share the exact same union inside struct page
and struct folio, modifying buddy_list silently overwrites the lru.next and
lru.prev pointers of the folio, severing the global LRU linked list.

[Severity: High]
Does this code also double-account references and fail to track the leaked
pages when processing large folios?

When guest_memfd allocates a large 2MB PMD-size folio, but the RMP table
has it mapped as 4K entries, use_2m_update will be false and the loop
unconditionally steps by 1 PFN.

Because page_folio(pfn_to_page(pfn)) returns the head folio for any subpage,
folio_get() will be redundantly called up to 512 times on the exact same
head folio.

Furthermore, passing use_2m_update ? PTRS_PER_PMD : 1 (which is 1) to
snp_leak_pages() results in the internal check compound_nr(page) <= npages
evaluating to 512 <= 1 (false). This causes the entire compound page to be
completely skipped and omitted from the snp_leaked_pages_list.

>  			goto next_pfn;
> +		}
>  
>  		/*
>  		 * SEV-ES avoids host/guest cache coherency issues through

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/aimMWzAf5b3luM0b@v4bel?part=1

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] KVM: SEV: Don't return a still-assigned gmem page to the host
  2026-06-10 16:26 ` sashiko-bot
@ 2026-06-10 18:25   ` Sean Christopherson
  0 siblings, 0 replies; 4+ messages in thread
From: Sean Christopherson @ 2026-06-10 18:25 UTC (permalink / raw)
  To: sashiko-reviews; +Cc: Hyunwoo Kim, kvm

On Wed, Jun 10, 2026, sashiko-bot@kernel.org wrote:
> > diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> > index f59c65abe3cfa..dbde6dce79b76 100644
> > --- a/arch/x86/kvm/svm/sev.c
> > +++ b/arch/x86/kvm/svm/sev.c
> [ ... ]
> > @@ -5036,8 +5036,12 @@ void sev_gmem_invalidate(kvm_pfn_t start, kvm_pfn_t end)
> >  
> >  		rc = rmp_make_shared(pfn, use_2m_update ? PG_LEVEL_2M : PG_LEVEL_4K);
> >  		if (WARN_ONCE(rc, "SEV: Failed to update RMP entry for PFN 0x%llx error %d\n",
> > -			      pfn, rc))
> > +			      pfn, rc)) {
> > +			/* Still assigned to the guest; pin and leak rather than freeing. */
> > +			folio_get(page_folio(pfn_to_page(pfn)));
> > +			snp_leak_pages(pfn, use_2m_update ? PTRS_PER_PMD : 1);
> 
> [Severity: Critical]
> Does calling snp_leak_pages() on a pinned guest_memfd folio corrupt the
> system-wide LRU list?
> 
> Memory allocated by guest_memfd is added to the page cache, which
> unconditionally places the memory on the global unevictable LRU list. When
> guest_memfd memory is freed, the folio has been removed from the page cache
> but is still actively linked in the global LRU list until its reference
> count drops to zero.
> 
> By calling folio_get() here, the reference count is elevated, keeping the
> folio on the LRU list. Then, snp_leak_pages() unconditionally links the
> page into a tracking list via list_add_tail(&page->buddy_list, ...).
> 
> Because buddy_list and lru share the exact same union inside struct page
> and struct folio, modifying buddy_list silently overwrites the lru.next and
> lru.prev pointers of the folio, severing the global LRU linked list.

Yeah, trying to prevent a folio/page from being freed from within ->free_folio()
seems like a fool's errand.

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] KVM: SEV: Don't return a still-assigned gmem page to the host
  2026-06-10 16:10 [PATCH] KVM: SEV: Don't return a still-assigned gmem page to the host Hyunwoo Kim
  2026-06-10 16:26 ` sashiko-bot
@ 2026-06-10 22:16 ` Michael Roth
  1 sibling, 0 replies; 4+ messages in thread
From: Michael Roth @ 2026-06-10 22:16 UTC (permalink / raw)
  To: Hyunwoo Kim; +Cc: seanjc, pbonzini, tglx, mingo, bp, dave.hansen, x86, hpa, kvm

On Thu, Jun 11, 2026 at 01:10:03AM +0900, Hyunwoo Kim wrote:
> [You don't often get email from imv4bel@gmail.com. Learn why this is important at https://aka.ms/LearnAboutSenderIdentification ]
> 
> sev_gmem_invalidate() is called when guest_memfd frees a gmem page.
> For each PFN that is still assigned to the guest in the RMP table, it
> transitions the page back to hypervisor-owned via rmp_make_shared()
> before the page is returned to the host.
> 
> A guest-assigned page can reach this path while still private,
> because the free path does not transition it beforehand and
> sev_gmem_invalidate() is the only place that does. A gmem page used
> as a vCPU's VMSA after SEV-SNP AP creation is one such case. When
> rmp_make_shared() fails, the RMP entry remains guest-owned and the
> host cannot use the page because of RMP protection, so it must not be
> returned to the host. The existing code only issues WARN_ONCE() and
> continues to the next PFN, returning the page to the host allocator.
> 
> Leak the page instead of freeing it, as kvm_rmp_make_shared(),
> snp_page_reclaim() and sev_free_vcpu() already do when a transition
> back to shared fails. snp_leak_pages() does not take a reference of
> its own, and on this path the page is freed right after the hook
> returns, so take a reference with folio_get() first to keep the page
> from being freed.
> 
> Fixes: 8eb01900b018 ("KVM: SEV: Implement gmem hook for invalidating private pages")
> Signed-off-by: Hyunwoo Kim <imv4bel@gmail.com>
> ---
>  arch/x86/kvm/svm/sev.c | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> index 6c6a6d663e29..8fee6ec529f9 100644
> --- a/arch/x86/kvm/svm/sev.c
> +++ b/arch/x86/kvm/svm/sev.c
> @@ -5178,8 +5178,12 @@ void sev_gmem_invalidate(kvm_pfn_t start, kvm_pfn_t end)
> 
>                 rc = rmp_make_shared(pfn, use_2m_update ? PG_LEVEL_2M : PG_LEVEL_4K);
>                 if (WARN_ONCE(rc, "SEV: Failed to update RMP entry for PFN 0x%llx error %d\n",
> -                             pfn, rc))
> +                             pfn, rc)) {
> +                       /* Still assigned to the guest; pin and leak rather than freeing. */
> +                       folio_get(page_folio(pfn_to_page(pfn)));
> +                       snp_leak_pages(pfn, use_2m_update ? PTRS_PER_PMD : 1);
>                         goto next_pfn;
> +               }

This roughly aligns with what would happen if snp_page_reclaim() fails
in sev_gmem_post_populate(), while the guest is being initialized via
KVM_SEV_SNP_LAUNCH_UPDATE ioctl, which calls into kvm_gmem_populate().

However, in kvm_gmem_populate(), we still free the page. Maybe, to
address both cases, we should just add a parameter to snp_leak_pages()
to tell it to take an extra ref and use that in both of these paths.

Or we can just do the direct folio_get() in both cases, the above
formalizes the handling convention a little better though IMO.

Thanks,

Mike

> 
>                 /*
>                  * SEV-ES avoids host/guest cache coherency issues through
> --
> 2.43.0
> 

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2026-06-10 22:17 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-06-10 16:10 [PATCH] KVM: SEV: Don't return a still-assigned gmem page to the host Hyunwoo Kim
2026-06-10 16:26 ` sashiko-bot
2026-06-10 18:25   ` Sean Christopherson
2026-06-10 22:16 ` Michael Roth

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.