Kernel KVM virtualization development
 help / color / mirror / Atom feed
From: sashiko-bot@kernel.org
To: "Shivank Garg" <shivankg@amd.com>
Cc: kvm@vger.kernel.org
Subject: Re: [PATCH RFC 2/3] KVM: guest_memfd: support folio migration for non-confidential VMs
Date: Thu, 11 Jun 2026 13:25:00 +0000	[thread overview]
Message-ID: <20260611132501.2A2661F00893@smtp.kernel.org> (raw)
In-Reply-To: <20260611-shivank-gmem-migrate-v1-2-2d266bfc6f95@amd.com>

Thank you for your contribution! Sashiko AI review found 3 potential issue(s) to consider:

New issues:
- [Critical] Conditionally skipping `mapping_set_unmovable()` for non-confidential VMs inadvertently removes the `AS_UNEVICTABLE` flag, allowing silent eviction of guest memory and leading to host use-after-free and privilege escalation.
- [High] `kvm_gmem_migrate_folio` can use-after-free a `kvm_memory_slot` and permanently leak `mmu_invalidate_in_progress` if it races with memslot deletion after the file descriptor is closed.

Pre-existing issues:
- [High] `kvm_gmem_error_folio` acquires `filemap_invalidate_lock_shared` while holding `folio_lock`, creating a permanent AB-BA deadlock with truncate operations.
--

commit 91a9697e1cf39521338cfd839599e2eeea129a1e
Author: Shivank Garg <shivankg@amd.com>
Date:   Thu Jun 11 13:05:09 2026 +0000

    KVM: guest_memfd: support folio migration for non-confidential VMs
    
    guest_memfd folios are currently marked unmmovable, so the kernel
    cannot perform NUMA-balancing, memory compaction, etc.
    This is unavoidable for confidential VMs (SEV-SNP, TDX),
    since memory is encrypted and copying it need firmware assistance.
    However, for non-cofidential VMs (like firecracker), we can migrate
    the folios.

> diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
> index 806a42f0e031a..e4470106fc779 100644
> --- a/virt/kvm/guest_memfd.c
> +++ b/virt/kvm/guest_memfd.c
> @@ -487,13 +487,45 @@ static struct file_operations kvm_gmem_fops = {
>  	.fallocate	= kvm_gmem_fallocate,
>  };
>  
> +#ifdef CONFIG_MIGRATION
>  static int kvm_gmem_migrate_folio(struct address_space *mapping,
>  				  struct folio *dst, struct folio *src,
>  				  enum migrate_mode mode)
>  {
> -	WARN_ON_ONCE(1);
> -	return -EINVAL;
> +	struct inode *inode = mapping->host;
> +	pgoff_t start, end;
> +	int ret;
> +
> +	if (!filemap_invalidate_trylock_shared(mapping))
> +		return -EAGAIN;
> +
> +	start = src->index;
> +	end = start + folio_nr_pages(src);
> +
> +	kvm_gmem_invalidate_begin(inode, start, end);

[Severity: High]
Does this lockless XArray iteration risk a use-after-free and invalidate
leakage if it races with kvm_gmem_unbind() during file close?

If the file descriptor is closed (f_count == 0), kvm_gmem_unbind() bypasses
filemap_invalidate_lock because it assumes reliance on kvm->slots_lock. Since
migration via compaction can occur before evict_inode() tears down the page
cache, kvm_gmem_migrate_folio() can run concurrently with kvm_gmem_unbind()
and memslot deletion without holding kvm->slots_lock or
srcu_read_lock(&kvm->srcu).

The yielded slot could be freed concurrently by KVM memslot deletion, causing
a use-after-free. Additionally, if the slot is removed before
kvm_gmem_invalidate_end(), xa_find() would return false and
kvm_mmu_invalidate_end() is skipped, permanently leaking
mmu_invalidate_in_progress and hanging future VM page faults.

> +
> +	/*
> +	 * For non-confidential guest_memfd the folio is host-readable,
> +	 * so filemap_migrate_folio() can copy the contents itself via
> +	 * folio_mc_copy().
> +	 *
> +	 * This is also the hook point for confidential VMs (SEV-SNP, TDX) once
> +	 * they are made movable: the host cannot copy encrypted/private memory,
> +	 * so a firmware-assisted copy would run here.
> +	 * Idea: https://lore.kernel.org/r/20260428155043.39251-8-shivankg@amd.com
> +	 * Mark the @dst->migrate_info field with FOLIO_CONTENT_COPIED, so
> +	 * __migrate_folio() skip folio_mc_copy() for confidential VMs.
> +	 */
> +	ret = filemap_migrate_folio(mapping, dst, src, mode);
> +
> +	kvm_gmem_invalidate_end(inode, start, end);
> +
> +	filemap_invalidate_unlock_shared(mapping);
> +	return ret;
>  }
> +#else
> +#define kvm_gmem_migrate_folio NULL
> +#endif
>  
>  static int kvm_gmem_error_folio(struct address_space *mapping, struct folio *folio)
>  {

[Severity: High]
This is a pre-existing issue, but does kvm_gmem_error_folio() acquire
filemap_invalidate_lock_shared() while holding the folio lock, creating a
permanent AB-BA deadlock with truncate operations?

Truncate operations (like fallocate(PUNCH_HOLE) via kvm_gmem_punch_hole())
establish a lock ordering by acquiring filemap_invalidate_lock in write mode
first, and then locking individual folios via truncate_inode_pages_range().

However, the memory failure (HWPOISON) path (me_pagecache_clean() ->
truncate_error_folio()) locks the affected folio first and then invokes the
mapping's error_remove_folio callback. kvm_gmem_error_folio() unconditionally
acquires filemap_invalidate_lock_shared(mapping) while the folio lock is
already held. If a hardware memory failure races with a hole punch on the
same guest_memfd file, this reverse lock ordering could result in an AB-BA
deadlock that hangs the kernel.

[ ... ]

> @@ -592,9 +624,17 @@ static int __kvm_gmem_create(struct kvm *kvm, loff_t size, u64 flags)
>  	inode->i_size = size;
>  	mapping_set_gfp_mask(inode->i_mapping, GFP_HIGHUSER);
>  	mapping_set_inaccessible(inode->i_mapping);
> -	mapping_set_unmovable(inode->i_mapping);
> -	/* Unmovable mappings are supposed to be marked unevictable as well. */
> -	WARN_ON_ONCE(!mapping_unevictable(inode->i_mapping));
> +
> +	/*
> +	 * Confidential VMs (SEV-SNP, TDX) bind encryption to the physical
> +	 * address and require firmware assisted copy, so their folios cannot
> +	 * be migrated yet.
> +	 */
> +	if (kvm_arch_has_private_mem(kvm)) {
> +		mapping_set_unmovable(inode->i_mapping);
> +		/* Unmovable mappings are supposed to be marked unevictable as well. */
> +		WARN_ON_ONCE(!mapping_unevictable(inode->i_mapping));
> +	}

[Severity: Critical]
By conditionally skipping mapping_set_unmovable() for non-confidential VMs,
do we inadvertently leave the mapping without the AS_UNEVICTABLE flag?

mapping_set_unmovable() is the only function that sets both AS_UNMOVABLE and
AS_UNEVICTABLE. guest_memfd folios lack a backing store and a writepage
callback (noop_dirty_folio). To the MM reclaimer (shrink_folio_list()), these
folios appear clean and unmapped since there are no userspace PTEs.

Under memory pressure, the reclaimer will silently evict them and free their
physical pages to the host buddy allocator. Since eviction does not invoke KVM
MMU notifiers, KVM SPTEs remain valid and point to the freed pages. When the
host reallocates these pages, the guest could read or write arbitrary host
memory. Should mapping_set_unevictable() be called explicitly for
non-confidential VMs?

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260611-shivank-gmem-migrate-v1-0-2d266bfc6f95@amd.com?part=2

  reply	other threads:[~2026-06-11 13:25 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-06-11 13:05 [PATCH RFC 0/3] KVM: guest_memfd: folio migration for non-confidential VMs Shivank Garg
2026-06-11 13:05 ` [PATCH RFC 1/3] mm: split AS_UNMOVABLE back out of AS_INACCESSIBLE Shivank Garg
2026-06-11 13:05 ` [PATCH RFC 2/3] KVM: guest_memfd: support folio migration for non-confidential VMs Shivank Garg
2026-06-11 13:25   ` sashiko-bot [this message]
2026-06-15 18:35   ` David Hildenbrand (Arm)
2026-06-11 13:05 ` [PATCH RFC 3/3] KVM: selftests: exercise guest_memfd folio migration Shivank Garg
2026-06-11 13:18   ` sashiko-bot
2026-06-15 10:43 ` [PATCH RFC 0/3] KVM: guest_memfd: folio migration for non-confidential VMs Alexandru Elisei
2026-06-15 11:04   ` Alexandru Elisei
2026-06-15 17:39     ` Sean Christopherson
2026-06-15 18:24       ` David Hildenbrand (Arm)
2026-06-15 18:30   ` David Hildenbrand (Arm)

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260611132501.2A2661F00893@smtp.kernel.org \
    --to=sashiko-bot@kernel.org \
    --cc=kvm@vger.kernel.org \
    --cc=sashiko-reviews@lists.linux.dev \
    --cc=shivankg@amd.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox