* [PATCH] KVM: guest_memfd: Fix ABBA deadlock in error_remove_folio
@ 2026-06-11 8:18 zhanghao
2026-06-11 10:32 ` sashiko-bot
2026-06-11 14:18 ` Sean Christopherson
0 siblings, 2 replies; 3+ messages in thread
From: zhanghao @ 2026-06-11 8:18 UTC (permalink / raw)
To: Sean Christopherson; +Cc: Paolo Bonzini, kvm
From b164e59d4068226dfb33babe49292c7a685cacd9 Mon Sep 17 00:00:00 2001
From: Hao Zhang <zhanghao1@kylinos.cn>
Date: Thu, 11 Jun 2026 15:27:27 +0800
memory_failure() calls ->error_remove_folio() while holding the global
mf_mutex and the poisoned folio lock. guest_memfd's implementation takes
mapping.invalidate_lock for read before zapping KVM mappings.
That lock ordering can deadlock against guest_memfd punch-hole, which
holds mapping.invalidate_lock for write and can then wait on the same
folio lock in truncate_inode_pages_range().
Use a trylock in kvm_gmem_error_folio(). If mapping.invalidate_lock is
contended, fail recovery instead of blocking in the memory-failure path,
and instead of reporting MF_DELAYED without actually zapping KVM mappings.
Fixes: a7800aa80ea4 ("KVM: Add KVM_CREATE_GUEST_MEMFD ioctl() for guest-specific backing memory")
Signed-off-by: Hao Zhang <zhanghao1@kylinos.cn>
---
virt/kvm/guest_memfd.c | 11 ++++++++++-
1 file changed, 10 insertions(+), 1 deletion(-)
diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
index 69c9d6d546b2..9417be3049cf 100644
--- a/virt/kvm/guest_memfd.c
+++ b/virt/kvm/guest_memfd.c
@@ -499,7 +499,16 @@ static int kvm_gmem_error_folio(struct address_space *mapping, struct folio *fol
{
pgoff_t start, end;
- filemap_invalidate_lock_shared(mapping);
+ /*
+ * memory_failure() holds mf_mutex globally. We must not block
+ * on filemap_invalidate_lock here, as it can be held exclusive
+ * by kvm_gmem_fallocate() (MADV_REMOVE/FALLOC_FL_PUNCH_HOLE
+ * path), creating an ABBA deadlock with the poisoned folio lock.
+ * If the invalidation lock is contended, fail the recovery instead
+ * of reporting MF_DELAYED without zapping KVM mappings.
+ */
+ if (!filemap_invalidate_trylock_shared(mapping))
+ return -EBUSY;
start = folio->index;
end = start + folio_nr_pages(folio);
base-commit: 9716c086c8e8b141d35aa61f2e96a2e83de212a7
--
2.15.0
^ permalink raw reply related [flat|nested] 3+ messages in thread
* Re: [PATCH] KVM: guest_memfd: Fix ABBA deadlock in error_remove_folio
2026-06-11 8:18 [PATCH] KVM: guest_memfd: Fix ABBA deadlock in error_remove_folio zhanghao
@ 2026-06-11 10:32 ` sashiko-bot
2026-06-11 14:18 ` Sean Christopherson
1 sibling, 0 replies; 3+ messages in thread
From: sashiko-bot @ 2026-06-11 10:32 UTC (permalink / raw)
To: zhanghao; +Cc: kvm
Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:
Pre-existing issues:
- [High] kvm_gmem_error_folio incorrectly returns the internal enum MF_DELAYED instead of 0 on success, causing memory_failure() to treat successful recovery as a failure.
- [High] Leaving the poisoned folio in the page cache causes memory_failure() to fail due to an unexpected extra reference during me_pagecache_clean().
--
commit f0c468f1d22ef5955d11ed4dde621b8eef0c2808
Author: Hao Zhang <zhanghao1@kylinos.cn>
KVM: guest_memfd: Fix ABBA deadlock in error_remove_folio
This commit modifies kvm_gmem_error_folio() to use a trylock when acquiring
the mapping's invalidate_lock. This prevents an ABBA deadlock with guest_memfd
punch-hole operations, returning -EBUSY instead if the lock is contended.
> diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
> index 69c9d6d546b28..9417be3049cf0 100644
> --- a/virt/kvm/guest_memfd.c
> +++ b/virt/kvm/guest_memfd.c
> @@ -499,7 +499,16 @@ static int kvm_gmem_error_folio(struct address_space *mapping, struct folio *fol
> {
> pgoff_t start, end;
>
> - filemap_invalidate_lock_shared(mapping);
> + /*
> + * memory_failure() holds mf_mutex globally. We must not block
> + * on filemap_invalidate_lock here, as it can be held exclusive
> + * by kvm_gmem_fallocate() (MADV_REMOVE/FALLOC_FL_PUNCH_HOLE
> + * path), creating an ABBA deadlock with the poisoned folio lock.
> + * If the invalidation lock is contended, fail the recovery instead
> + * of reporting MF_DELAYED without zapping KVM mappings.
> + */
> + if (!filemap_invalidate_trylock_shared(mapping))
> + return -EBUSY;
>
> start = folio->index;
> end = start + folio_nr_pages(folio);
[Severity: High]
This is a pre-existing issue, but does leaving the poisoned folio in the page
cache cause memory_failure() to fail?
In kvm_gmem_error_folio(), guest_memfd intentionally does not truncate the
poisoned folio from the page cache so that KVM can gracefully handle guest
access to the poisoned page. However, in
mm/memory-failure.c:me_pagecache_clean(), the code checks for extra references
after truncate_error_folio():
extra_pins = shmem_mapping(mapping);
...
if (has_extra_refcount(ps, p, extra_pins))
ret = MF_FAILED;
Because guest_memfd is not shmem, extra_pins is false. has_extra_refcount()
then sees the lingering page cache reference on the folio as an unexpected
user and returns true. Does this guarantee that me_pagecache_clean() will
always return MF_FAILED for guest_memfd, leading to the VM being hard-killed
by the MCE handler instead of KVM gracefully recovering?
[Severity: High]
This isn't a bug introduced by this patch, but should this callback return 0
on success instead of MF_DELAYED?
The VFS address_space_operations->error_remove_folio callback contract
requires returning 0 on success or a standard negative error code on failure.
However, kvm_gmem_error_folio() returns the internal memory-failure enum
MF_DELAYED (2).
The caller truncate_error_folio() in mm/memory-failure.c checks this:
if (err != 0)
pr_info("%#lx: Failed to punch page: %d\n", pfn, err);
Since it mistakenly treats the positive enum value as an error, will this
log "Failed to punch page: 2" and return MF_FAILED, entirely defeating the
graceful recovery feature?
--
Sashiko AI review · https://sashiko.dev/#/patchset/tencent_4B051B84CDCC0D1964EC00337AA32E40DC07@qq.com?part=1
^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: [PATCH] KVM: guest_memfd: Fix ABBA deadlock in error_remove_folio
2026-06-11 8:18 [PATCH] KVM: guest_memfd: Fix ABBA deadlock in error_remove_folio zhanghao
2026-06-11 10:32 ` sashiko-bot
@ 2026-06-11 14:18 ` Sean Christopherson
1 sibling, 0 replies; 3+ messages in thread
From: Sean Christopherson @ 2026-06-11 14:18 UTC (permalink / raw)
To: zhanghao; +Cc: Paolo Bonzini, kvm, Ackerley Tng, Lisa Wang, David Hildenbrand
+Lisa, David, and Ackerley
On Thu, Jun 11, 2026, zhanghao wrote:
> >From b164e59d4068226dfb33babe49292c7a685cacd9 Mon Sep 17 00:00:00 2001
> From: Hao Zhang <zhanghao1@kylinos.cn>
> Date: Thu, 11 Jun 2026 15:27:27 +0800
>
> memory_failure() calls ->error_remove_folio() while holding the global
> mf_mutex and the poisoned folio lock. guest_memfd's implementation takes
> mapping.invalidate_lock for read before zapping KVM mappings.
>
> That lock ordering can deadlock against guest_memfd punch-hole, which
> holds mapping.invalidate_lock for write and can then wait on the same
> folio lock in truncate_inode_pages_range().
I assume this was found by lockdep? If so, please provide a lockdep splat so that
it's easier to understand exactly what all is problematic.
> Use a trylock in kvm_gmem_error_folio(). If mapping.invalidate_lock is
> contended, fail recovery instead of blocking in the memory-failure path,
> and instead of reporting MF_DELAYED without actually zapping KVM mappings.
>
> Fixes: a7800aa80ea4 ("KVM: Add KVM_CREATE_GUEST_MEMFD ioctl() for guest-specific backing memory")
> Signed-off-by: Hao Zhang <zhanghao1@kylinos.cn>
> ---
> virt/kvm/guest_memfd.c | 11 ++++++++++-
> 1 file changed, 10 insertions(+), 1 deletion(-)
>
> diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
> index 69c9d6d546b2..9417be3049cf 100644
> --- a/virt/kvm/guest_memfd.c
> +++ b/virt/kvm/guest_memfd.c
> @@ -499,7 +499,16 @@ static int kvm_gmem_error_folio(struct address_space *mapping, struct folio *fol
> {
> pgoff_t start, end;
>
> - filemap_invalidate_lock_shared(mapping);
> + /*
> + * memory_failure() holds mf_mutex globally.
Why does mf_mutex matter?
> We must not block
> + * on filemap_invalidate_lock here, as it can be held exclusive
> + * by kvm_gmem_fallocate() (MADV_REMOVE/FALLOC_FL_PUNCH_HOLE
> + * path), creating an ABBA deadlock with the poisoned folio lock.
It's not just kvm_gmem_fallocate() that's problematic. Due to the fairness of
r/w semaphores, waiting writers block future readers, and so any path that takes
a folio lock inside of mapping->invalidate_lock will effectively create the same
scenario.
I really don't like doing a trylock here, it feels like we're hacking around a
poor locking scheme in guest_memfd. Can't we also fix this by using a dedicated
lock for bindings (and for link_at() in the future)?
E.g. untested, but something like this?
diff --git virt/kvm/guest_memfd.c virt/kvm/guest_memfd.c
index 86690683b2fe..6ff005978ae1 100644
--- virt/kvm/guest_memfd.c
+++ virt/kvm/guest_memfd.c
@@ -32,6 +32,8 @@ struct gmem_inode {
struct inode vfs_inode;
struct list_head gmem_file_list;
+ struct rw_semaphore bindings_lock;
+
u64 flags;
};
@@ -500,7 +502,7 @@ static int kvm_gmem_error_folio(struct address_space *mapping, struct folio *fol
{
pgoff_t start, end;
- filemap_invalidate_lock_shared(mapping);
+ down_read(&GMEM_I(mapping->host)->bindings_lock);
start = folio->index;
end = start + folio_nr_pages(folio);
@@ -518,7 +520,7 @@ static int kvm_gmem_error_folio(struct address_space *mapping, struct folio *fol
kvm_gmem_invalidate_end(mapping->host, start, end);
- filemap_invalidate_unlock_shared(mapping);
+ up_read(&GMEM_I(mapping->host)->bindings_lock);
return MF_DELAYED;
}
@@ -597,6 +599,7 @@ static int __kvm_gmem_create(struct kvm *kvm, loff_t size, u64 flags)
WARN_ON_ONCE(!mapping_unevictable(inode->i_mapping));
GMEM_I(inode)->flags = flags;
+ init_rwsem(&GMEM_I(inode)->bindings_lock);
file = alloc_file_pseudo(inode, kvm_gmem_mnt, name, O_RDWR, &kvm_gmem_fops);
if (IS_ERR(file)) {
@@ -691,7 +694,10 @@ int kvm_gmem_bind(struct kvm *kvm, struct kvm_memory_slot *slot,
if (kvm_gmem_supports_mmap(inode))
slot->flags |= KVM_MEMSLOT_GMEM_ONLY;
+ down_write(&GMEM_I(inode)->bindings_lock);
xa_store_range(&f->bindings, start, end - 1, slot, GFP_KERNEL);
+ up_write(&GMEM_I(inode)->bindings_lock);
+
filemap_invalidate_unlock(inode->i_mapping);
/*
@@ -746,7 +752,9 @@ void kvm_gmem_unbind(struct kvm_memory_slot *slot)
}
filemap_invalidate_lock(file->f_mapping);
+ down_write(&GMEM_I(file_inode(file))->bindings_lock);
__kvm_gmem_unbind(slot, file->private_data);
+ up_write(&GMEM_I(file_inode(file))->bindings_lock);
filemap_invalidate_unlock(file->f_mapping);
}
^ permalink raw reply related [flat|nested] 3+ messages in thread
end of thread, other threads:[~2026-06-11 14:18 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-06-11 8:18 [PATCH] KVM: guest_memfd: Fix ABBA deadlock in error_remove_folio zhanghao
2026-06-11 10:32 ` sashiko-bot
2026-06-11 14:18 ` Sean Christopherson
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.