Kernel KVM virtualization development
 help / color / mirror / Atom feed
From: Sean Christopherson <seanjc@google.com>
To: zhanghao <76824143@qq.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>,
	kvm@vger.kernel.org,  Ackerley Tng <ackerleytng@google.com>,
	Lisa Wang <wyihan@google.com>,
	 David Hildenbrand <david@kernel.org>
Subject: Re: [PATCH] KVM: guest_memfd: Fix ABBA deadlock in error_remove_folio
Date: Thu, 11 Jun 2026 07:18:11 -0700	[thread overview]
Message-ID: <airDo5qNxn-gCsKY@google.com> (raw)
In-Reply-To: <tencent_4B051B84CDCC0D1964EC00337AA32E40DC07@qq.com>

+Lisa, David, and Ackerley

On Thu, Jun 11, 2026, zhanghao wrote:
> >From b164e59d4068226dfb33babe49292c7a685cacd9 Mon Sep 17 00:00:00 2001
> From: Hao Zhang <zhanghao1@kylinos.cn>
> Date: Thu, 11 Jun 2026 15:27:27 +0800
> 
> memory_failure() calls ->error_remove_folio() while holding the global
> mf_mutex and the poisoned folio lock. guest_memfd's implementation takes
> mapping.invalidate_lock for read before zapping KVM mappings.
> 
> That lock ordering can deadlock against guest_memfd punch-hole, which
> holds mapping.invalidate_lock for write and can then wait on the same
> folio lock in truncate_inode_pages_range().

I assume this was found by lockdep?  If so, please provide a lockdep splat so that
it's easier to understand exactly what all is problematic.

> Use a trylock in kvm_gmem_error_folio(). If mapping.invalidate_lock is
> contended, fail recovery instead of blocking in the memory-failure path,
> and instead of reporting MF_DELAYED without actually zapping KVM mappings.
> 
> Fixes: a7800aa80ea4 ("KVM: Add KVM_CREATE_GUEST_MEMFD ioctl() for guest-specific backing memory")
> Signed-off-by: Hao Zhang <zhanghao1@kylinos.cn>
> ---
>  virt/kvm/guest_memfd.c | 11 ++++++++++-
>  1 file changed, 10 insertions(+), 1 deletion(-)
> 
> diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
> index 69c9d6d546b2..9417be3049cf 100644
> --- a/virt/kvm/guest_memfd.c
> +++ b/virt/kvm/guest_memfd.c
> @@ -499,7 +499,16 @@ static int kvm_gmem_error_folio(struct address_space *mapping, struct folio *fol
>  {
>  	pgoff_t start, end;
>  
> -	filemap_invalidate_lock_shared(mapping);
> +	/*
> +	 * memory_failure() holds mf_mutex globally.  

Why does mf_mutex matter? 

> We must not block
> +	 * on filemap_invalidate_lock here, as it can be held exclusive
> +	 * by kvm_gmem_fallocate() (MADV_REMOVE/FALLOC_FL_PUNCH_HOLE
> +	 * path), creating an ABBA deadlock with the poisoned folio lock.

It's not just kvm_gmem_fallocate() that's problematic.  Due to the fairness of
r/w semaphores, waiting writers block future readers, and so any path that takes
a folio lock inside of mapping->invalidate_lock will effectively create the same
scenario.

I really don't like doing a trylock here, it feels like we're hacking around a
poor locking scheme in guest_memfd.  Can't we also fix this by using a dedicated
lock for bindings (and for link_at() in the future)?

E.g. untested, but something like this?

diff --git virt/kvm/guest_memfd.c virt/kvm/guest_memfd.c
index 86690683b2fe..6ff005978ae1 100644
--- virt/kvm/guest_memfd.c
+++ virt/kvm/guest_memfd.c
@@ -32,6 +32,8 @@ struct gmem_inode {
        struct inode vfs_inode;
        struct list_head gmem_file_list;
 
+       struct rw_semaphore bindings_lock;
+
        u64 flags;
 };
 
@@ -500,7 +502,7 @@ static int kvm_gmem_error_folio(struct address_space *mapping, struct folio *fol
 {
        pgoff_t start, end;
 
-       filemap_invalidate_lock_shared(mapping);
+       down_read(&GMEM_I(mapping->host)->bindings_lock);
 
        start = folio->index;
        end = start + folio_nr_pages(folio);
@@ -518,7 +520,7 @@ static int kvm_gmem_error_folio(struct address_space *mapping, struct folio *fol
 
        kvm_gmem_invalidate_end(mapping->host, start, end);
 
-       filemap_invalidate_unlock_shared(mapping);
+       up_read(&GMEM_I(mapping->host)->bindings_lock);
 
        return MF_DELAYED;
 }
@@ -597,6 +599,7 @@ static int __kvm_gmem_create(struct kvm *kvm, loff_t size, u64 flags)
        WARN_ON_ONCE(!mapping_unevictable(inode->i_mapping));
 
        GMEM_I(inode)->flags = flags;
+       init_rwsem(&GMEM_I(inode)->bindings_lock);
 
        file = alloc_file_pseudo(inode, kvm_gmem_mnt, name, O_RDWR, &kvm_gmem_fops);
        if (IS_ERR(file)) {
@@ -691,7 +694,10 @@ int kvm_gmem_bind(struct kvm *kvm, struct kvm_memory_slot *slot,
        if (kvm_gmem_supports_mmap(inode))
                slot->flags |= KVM_MEMSLOT_GMEM_ONLY;
 
+       down_write(&GMEM_I(inode)->bindings_lock);
        xa_store_range(&f->bindings, start, end - 1, slot, GFP_KERNEL);
+       up_write(&GMEM_I(inode)->bindings_lock);
+
        filemap_invalidate_unlock(inode->i_mapping);
 
        /*
@@ -746,7 +752,9 @@ void kvm_gmem_unbind(struct kvm_memory_slot *slot)
        }
 
        filemap_invalidate_lock(file->f_mapping);
+       down_write(&GMEM_I(file_inode(file))->bindings_lock);
        __kvm_gmem_unbind(slot, file->private_data);
+       up_write(&GMEM_I(file_inode(file))->bindings_lock);
        filemap_invalidate_unlock(file->f_mapping);
 }

      parent reply	other threads:[~2026-06-11 14:18 UTC|newest]

Thread overview: 3+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-06-11  8:18 [PATCH] KVM: guest_memfd: Fix ABBA deadlock in error_remove_folio zhanghao
2026-06-11 10:32 ` sashiko-bot
2026-06-11 14:18 ` Sean Christopherson [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=airDo5qNxn-gCsKY@google.com \
    --to=seanjc@google.com \
    --cc=76824143@qq.com \
    --cc=ackerleytng@google.com \
    --cc=david@kernel.org \
    --cc=kvm@vger.kernel.org \
    --cc=pbonzini@redhat.com \
    --cc=wyihan@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox