linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/2] Introduce RET_PF_RETRY_INVALID_SLOT
@ 2025-05-19  2:36 Yan Zhao
  2025-05-19  2:37 ` [PATCH 1/2] KVM: x86/mmu: Add RET_PF_RETRY_INVALID_SLOT for fault retry on invalid slot Yan Zhao
  2025-05-19  2:38 ` [PATCH 2/2] KVM: selftests: Test prefault memory with concurrent memslot removal Yan Zhao
  0 siblings, 2 replies; 18+ messages in thread
From: Yan Zhao @ 2025-05-19  2:36 UTC (permalink / raw)
  To: pbonzini, seanjc
  Cc: reinette.chatre, rick.p.edgecombe, linux-kernel, kvm, Yan Zhao

Introduce a new return value RET_PF_RETRY_INVALID_SLOT to inform callers of
kvm_mmu_do_page_fault() that a fault retry is caused by an invalid memslot.
This helps prevent deadlock when a memslot is removed during pre-faulting
or local retry of faulting private pages in TDX.

Take pre-faulting as an example. Removing a memslot during the ioctl
KVM_PRE_FAULT_MEMORY on x86 can lead to a kernel deadlock.

"slot deleting" thread                    "pre-fault" thread
-----------------------------             ----------------------
                                          srcu_read_lock();
(A)
invalid_slot->flags |= KVM_MEMSLOT_INVALID
rcu_assign_pointer();

                                          kvm_tdp_map_page();
                                          (B)
                                            do {
                                               r = kvm_mmu_do_page_fault();

(C) synchronize_srcu_expedited();

                                            } while (r == RET_PF_RETRY);

                                         (D) srcu_read_unlock();
    
 
The deadlock occurs because (C) is waiting for (D) to complete, but (B)
repeatedly encounters an invalid slot and retries before (C) can finish,
thus preventing (D) from being executed.
(If the srcu_read_lock() in the pre-fault thread is invoked after (A)
and before (C), deadlock can also occur).

The local retry code in TDX's EPT violation handler faces a similar issue,
where deadlock can occur if an invalid slot is found when faulting a
private GFN.

To resolve the deadlock, modify kvm_mmu_do_page_fault() to return
RET_PF_RETRY_INVALID_SLOT instead of RET_PF_RETRY when encountering an
invalid memslot. This change enables the pre-fault thread or TDX's EPT
violation handler to exit the loop and release the srcu lock.

There're some alternative solutions to address the deadlock.
1) Return -EAGAIN for an invalid slot.
   Cons: This approach involves an uAPI change, requiring userspace to
         ignore error -EAGAIN and re-run the vCPU. While QEMU already
         ignores -EAGAIN, selftests may need updates.

2) Call kvm_mmu_prepare_memory_fault_exit() and return -EFAULT for an
   invalid slot.
   Cons: - kvm_mmu_prepare_memory_fault_exit() is not recognized by
           userspace for normal VMs.
         - Returning -EFAULT when a memslot is still invalid (i.e., not
           completely removed) may confuse userspace.

3) Release the srcu lock before cond_resched() and re-acquire it afterward
   in kvm_tdp_map_page() and tdx_handle_ept_violation():
   Cons: - It allows each kvm_vcpu_pre_fault_memory() to pre-fault memory
           with different memslot layouts.
         - It requires tdx_gmem_post_populate() to acquire the srcu lock
           before invoking kvm_tdp_map_page(), which is redundant since
           tdx_gmem_post_populate() already holds the kvm->slots_lock.


Patch 1 opts to introduce RET_PF_RETRY_INVALID_SLOT, which prevents endless
retries in kvm_tdp_map_page() and tdx_handle_ept_violation() while avoiding
exiting to userspace.

Patch 2 provides a selftest to reproduce the deadlock when patch 1 is
absent and verifies the pre-fault can correctly completes when the faulting
range is in a removing memslot.

Note: The selftest pre_fault_memory_test relies on commit 20a6cff3b283
("KVM: x86/mmu: Check and free obsolete roots in kvm_mmu_reload()") to run
the second ioctl KVM_PRE_FAULT_MEMORY successfully after a memory slot is
removed during the first ioctl KVM_PRE_FAULT_MEMORY.
Base commit:
kvm-x86/next or
kvm/next + commit 20a6cff3b283 ("KVM: x86/mmu: Check and free obsolete
roots in kvm_mmu_reload()") 

Yan Zhao (2):
  KVM: x86/mmu: Add RET_PF_RETRY_INVALID_SLOT for fault retry on invalid
    slot
  KVM: selftests: Test prefault memory with concurrent memslot removal

 arch/x86/kvm/mmu/mmu.c                        |  3 +-
 arch/x86/kvm/mmu/mmu_internal.h               |  3 +
 .../selftests/kvm/pre_fault_memory_test.c     | 82 +++++++++++++++----
 3 files changed, 72 insertions(+), 16 deletions(-)


base-commit: e59ae112a7c2e0f58d9f5905de299fe206f84af0
-- 
2.43.2


^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2025-05-22  0:42 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-05-19  2:36 [PATCH 0/2] Introduce RET_PF_RETRY_INVALID_SLOT Yan Zhao
2025-05-19  2:37 ` [PATCH 1/2] KVM: x86/mmu: Add RET_PF_RETRY_INVALID_SLOT for fault retry on invalid slot Yan Zhao
2025-05-19 13:33   ` Sean Christopherson
2025-05-19 15:05     ` Reinette Chatre
2025-05-19 15:22       ` Edgecombe, Rick P
2025-05-19 15:53         ` Sean Christopherson
2025-05-19 16:17           ` Edgecombe, Rick P
2025-05-19 16:12     ` Edgecombe, Rick P
2025-05-19 17:06       ` Sean Christopherson
2025-05-19 17:49         ` Reinette Chatre
2025-05-19 20:14         ` Edgecombe, Rick P
2025-05-20  5:33         ` Yan Zhao
2025-05-20 16:13           ` Sean Christopherson
2025-05-21  1:45             ` Yan Zhao
2025-05-21 15:45               ` Sean Christopherson
2025-05-22  0:40                 ` Yan Zhao
2025-05-20  5:27     ` Yan Zhao
2025-05-19  2:38 ` [PATCH 2/2] KVM: selftests: Test prefault memory with concurrent memslot removal Yan Zhao

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).