From: Sean Christopherson <seanjc@google.com>
To: zhanghao <76824143@qq.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>,
kvm@vger.kernel.org, Ackerley Tng <ackerleytng@google.com>,
Lisa Wang <wyihan@google.com>,
David Hildenbrand <david@kernel.org>
Subject: Re: [PATCH] KVM: guest_memfd: Fix ABBA deadlock in error_remove_folio
Date: Thu, 11 Jun 2026 07:18:11 -0700 [thread overview]
Message-ID: <airDo5qNxn-gCsKY@google.com> (raw)
In-Reply-To: <tencent_4B051B84CDCC0D1964EC00337AA32E40DC07@qq.com>
+Lisa, David, and Ackerley
On Thu, Jun 11, 2026, zhanghao wrote:
> >From b164e59d4068226dfb33babe49292c7a685cacd9 Mon Sep 17 00:00:00 2001
> From: Hao Zhang <zhanghao1@kylinos.cn>
> Date: Thu, 11 Jun 2026 15:27:27 +0800
>
> memory_failure() calls ->error_remove_folio() while holding the global
> mf_mutex and the poisoned folio lock. guest_memfd's implementation takes
> mapping.invalidate_lock for read before zapping KVM mappings.
>
> That lock ordering can deadlock against guest_memfd punch-hole, which
> holds mapping.invalidate_lock for write and can then wait on the same
> folio lock in truncate_inode_pages_range().
I assume this was found by lockdep? If so, please provide a lockdep splat so that
it's easier to understand exactly what all is problematic.
> Use a trylock in kvm_gmem_error_folio(). If mapping.invalidate_lock is
> contended, fail recovery instead of blocking in the memory-failure path,
> and instead of reporting MF_DELAYED without actually zapping KVM mappings.
>
> Fixes: a7800aa80ea4 ("KVM: Add KVM_CREATE_GUEST_MEMFD ioctl() for guest-specific backing memory")
> Signed-off-by: Hao Zhang <zhanghao1@kylinos.cn>
> ---
> virt/kvm/guest_memfd.c | 11 ++++++++++-
> 1 file changed, 10 insertions(+), 1 deletion(-)
>
> diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
> index 69c9d6d546b2..9417be3049cf 100644
> --- a/virt/kvm/guest_memfd.c
> +++ b/virt/kvm/guest_memfd.c
> @@ -499,7 +499,16 @@ static int kvm_gmem_error_folio(struct address_space *mapping, struct folio *fol
> {
> pgoff_t start, end;
>
> - filemap_invalidate_lock_shared(mapping);
> + /*
> + * memory_failure() holds mf_mutex globally.
Why does mf_mutex matter?
> We must not block
> + * on filemap_invalidate_lock here, as it can be held exclusive
> + * by kvm_gmem_fallocate() (MADV_REMOVE/FALLOC_FL_PUNCH_HOLE
> + * path), creating an ABBA deadlock with the poisoned folio lock.
It's not just kvm_gmem_fallocate() that's problematic. Due to the fairness of
r/w semaphores, waiting writers block future readers, and so any path that takes
a folio lock inside of mapping->invalidate_lock will effectively create the same
scenario.
I really don't like doing a trylock here, it feels like we're hacking around a
poor locking scheme in guest_memfd. Can't we also fix this by using a dedicated
lock for bindings (and for link_at() in the future)?
E.g. untested, but something like this?
diff --git virt/kvm/guest_memfd.c virt/kvm/guest_memfd.c
index 86690683b2fe..6ff005978ae1 100644
--- virt/kvm/guest_memfd.c
+++ virt/kvm/guest_memfd.c
@@ -32,6 +32,8 @@ struct gmem_inode {
struct inode vfs_inode;
struct list_head gmem_file_list;
+ struct rw_semaphore bindings_lock;
+
u64 flags;
};
@@ -500,7 +502,7 @@ static int kvm_gmem_error_folio(struct address_space *mapping, struct folio *fol
{
pgoff_t start, end;
- filemap_invalidate_lock_shared(mapping);
+ down_read(&GMEM_I(mapping->host)->bindings_lock);
start = folio->index;
end = start + folio_nr_pages(folio);
@@ -518,7 +520,7 @@ static int kvm_gmem_error_folio(struct address_space *mapping, struct folio *fol
kvm_gmem_invalidate_end(mapping->host, start, end);
- filemap_invalidate_unlock_shared(mapping);
+ up_read(&GMEM_I(mapping->host)->bindings_lock);
return MF_DELAYED;
}
@@ -597,6 +599,7 @@ static int __kvm_gmem_create(struct kvm *kvm, loff_t size, u64 flags)
WARN_ON_ONCE(!mapping_unevictable(inode->i_mapping));
GMEM_I(inode)->flags = flags;
+ init_rwsem(&GMEM_I(inode)->bindings_lock);
file = alloc_file_pseudo(inode, kvm_gmem_mnt, name, O_RDWR, &kvm_gmem_fops);
if (IS_ERR(file)) {
@@ -691,7 +694,10 @@ int kvm_gmem_bind(struct kvm *kvm, struct kvm_memory_slot *slot,
if (kvm_gmem_supports_mmap(inode))
slot->flags |= KVM_MEMSLOT_GMEM_ONLY;
+ down_write(&GMEM_I(inode)->bindings_lock);
xa_store_range(&f->bindings, start, end - 1, slot, GFP_KERNEL);
+ up_write(&GMEM_I(inode)->bindings_lock);
+
filemap_invalidate_unlock(inode->i_mapping);
/*
@@ -746,7 +752,9 @@ void kvm_gmem_unbind(struct kvm_memory_slot *slot)
}
filemap_invalidate_lock(file->f_mapping);
+ down_write(&GMEM_I(file_inode(file))->bindings_lock);
__kvm_gmem_unbind(slot, file->private_data);
+ up_write(&GMEM_I(file_inode(file))->bindings_lock);
filemap_invalidate_unlock(file->f_mapping);
}
prev parent reply other threads:[~2026-06-11 14:18 UTC|newest]
Thread overview: 3+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-06-11 8:18 [PATCH] KVM: guest_memfd: Fix ABBA deadlock in error_remove_folio zhanghao
2026-06-11 10:32 ` sashiko-bot
2026-06-11 14:18 ` Sean Christopherson [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=airDo5qNxn-gCsKY@google.com \
--to=seanjc@google.com \
--cc=76824143@qq.com \
--cc=ackerleytng@google.com \
--cc=david@kernel.org \
--cc=kvm@vger.kernel.org \
--cc=pbonzini@redhat.com \
--cc=wyihan@google.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.