From: Sean Christopherson <seanjc@google.com>
To: zhanghao <76824143@qq.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>,
kvm@vger.kernel.org, Ackerley Tng <ackerleytng@google.com>,
Lisa Wang <wyihan@google.com>,
David Hildenbrand <david@kernel.org>
Subject: Re: [PATCH] KVM: guest_memfd: Fix ABBA deadlock in error_remove_folio
Date: Thu, 11 Jun 2026 07:18:11 -0700 [thread overview]
Message-ID: <airDo5qNxn-gCsKY@google.com> (raw)
In-Reply-To: <tencent_4B051B84CDCC0D1964EC00337AA32E40DC07@qq.com>
+Lisa, David, and Ackerley
On Thu, Jun 11, 2026, zhanghao wrote:
> >From b164e59d4068226dfb33babe49292c7a685cacd9 Mon Sep 17 00:00:00 2001
> From: Hao Zhang <zhanghao1@kylinos.cn>
> Date: Thu, 11 Jun 2026 15:27:27 +0800
>
> memory_failure() calls ->error_remove_folio() while holding the global
> mf_mutex and the poisoned folio lock. guest_memfd's implementation takes
> mapping.invalidate_lock for read before zapping KVM mappings.
>
> That lock ordering can deadlock against guest_memfd punch-hole, which
> holds mapping.invalidate_lock for write and can then wait on the same
> folio lock in truncate_inode_pages_range().
I assume this was found by lockdep? If so, please provide a lockdep splat so that
it's easier to understand exactly what all is problematic.
> Use a trylock in kvm_gmem_error_folio(). If mapping.invalidate_lock is
> contended, fail recovery instead of blocking in the memory-failure path,
> and instead of reporting MF_DELAYED without actually zapping KVM mappings.
>
> Fixes: a7800aa80ea4 ("KVM: Add KVM_CREATE_GUEST_MEMFD ioctl() for guest-specific backing memory")
> Signed-off-by: Hao Zhang <zhanghao1@kylinos.cn>
> ---
> virt/kvm/guest_memfd.c | 11 ++++++++++-
> 1 file changed, 10 insertions(+), 1 deletion(-)
>
> diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
> index 69c9d6d546b2..9417be3049cf 100644
> --- a/virt/kvm/guest_memfd.c
> +++ b/virt/kvm/guest_memfd.c
> @@ -499,7 +499,16 @@ static int kvm_gmem_error_folio(struct address_space *mapping, struct folio *fol
> {
> pgoff_t start, end;
>
> - filemap_invalidate_lock_shared(mapping);
> + /*
> + * memory_failure() holds mf_mutex globally.
Why does mf_mutex matter?
> We must not block
> + * on filemap_invalidate_lock here, as it can be held exclusive
> + * by kvm_gmem_fallocate() (MADV_REMOVE/FALLOC_FL_PUNCH_HOLE
> + * path), creating an ABBA deadlock with the poisoned folio lock.
It's not just kvm_gmem_fallocate() that's problematic. Due to the fairness of
r/w semaphores, waiting writers block future readers, and so any path that takes
a folio lock inside of mapping->invalidate_lock will effectively create the same
scenario.
I really don't like doing a trylock here, it feels like we're hacking around a
poor locking scheme in guest_memfd. Can't we also fix this by using a dedicated
lock for bindings (and for link_at() in the future)?
E.g. untested, but something like this?
diff --git virt/kvm/guest_memfd.c virt/kvm/guest_memfd.c
index 86690683b2fe..6ff005978ae1 100644
--- virt/kvm/guest_memfd.c
+++ virt/kvm/guest_memfd.c
@@ -32,6 +32,8 @@ struct gmem_inode {
struct inode vfs_inode;
struct list_head gmem_file_list;
+ struct rw_semaphore bindings_lock;
+
u64 flags;
};
@@ -500,7 +502,7 @@ static int kvm_gmem_error_folio(struct address_space *mapping, struct folio *fol
{
pgoff_t start, end;
- filemap_invalidate_lock_shared(mapping);
+ down_read(&GMEM_I(mapping->host)->bindings_lock);
start = folio->index;
end = start + folio_nr_pages(folio);
@@ -518,7 +520,7 @@ static int kvm_gmem_error_folio(struct address_space *mapping, struct folio *fol
kvm_gmem_invalidate_end(mapping->host, start, end);
- filemap_invalidate_unlock_shared(mapping);
+ up_read(&GMEM_I(mapping->host)->bindings_lock);
return MF_DELAYED;
}
@@ -597,6 +599,7 @@ static int __kvm_gmem_create(struct kvm *kvm, loff_t size, u64 flags)
WARN_ON_ONCE(!mapping_unevictable(inode->i_mapping));
GMEM_I(inode)->flags = flags;
+ init_rwsem(&GMEM_I(inode)->bindings_lock);
file = alloc_file_pseudo(inode, kvm_gmem_mnt, name, O_RDWR, &kvm_gmem_fops);
if (IS_ERR(file)) {
@@ -691,7 +694,10 @@ int kvm_gmem_bind(struct kvm *kvm, struct kvm_memory_slot *slot,
if (kvm_gmem_supports_mmap(inode))
slot->flags |= KVM_MEMSLOT_GMEM_ONLY;
+ down_write(&GMEM_I(inode)->bindings_lock);
xa_store_range(&f->bindings, start, end - 1, slot, GFP_KERNEL);
+ up_write(&GMEM_I(inode)->bindings_lock);
+
filemap_invalidate_unlock(inode->i_mapping);
/*
@@ -746,7 +752,9 @@ void kvm_gmem_unbind(struct kvm_memory_slot *slot)
}
filemap_invalidate_lock(file->f_mapping);
+ down_write(&GMEM_I(file_inode(file))->bindings_lock);
__kvm_gmem_unbind(slot, file->private_data);
+ up_write(&GMEM_I(file_inode(file))->bindings_lock);
filemap_invalidate_unlock(file->f_mapping);
}
prev parent reply other threads:[~2026-06-11 14:18 UTC|newest]
Thread overview: 3+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-06-11 8:18 [PATCH] KVM: guest_memfd: Fix ABBA deadlock in error_remove_folio zhanghao
2026-06-11 10:32 ` sashiko-bot
2026-06-11 14:18 ` Sean Christopherson [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=airDo5qNxn-gCsKY@google.com \
--to=seanjc@google.com \
--cc=76824143@qq.com \
--cc=ackerleytng@google.com \
--cc=david@kernel.org \
--cc=kvm@vger.kernel.org \
--cc=pbonzini@redhat.com \
--cc=wyihan@google.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox