* Re: [PATCH v5] KVM: x86/mmu: Retry fault before acquiring mmu_lock if mapping is changing
2024-02-22 1:26 [PATCH v5] KVM: x86/mmu: Retry fault before acquiring mmu_lock if mapping is changing Sean Christopherson
@ 2024-02-22 10:09 ` Huang, Kai
2024-02-22 16:59 ` Friedrich Weber
` (3 subsequent siblings)
4 siblings, 0 replies; 7+ messages in thread
From: Huang, Kai @ 2024-02-22 10:09 UTC (permalink / raw)
To: pbonzini@redhat.com, seanjc@google.com
Cc: yilun.xu@linux.intel.com, kvm@vger.kernel.org,
f.weber@proxmox.com, linux-kernel@vger.kernel.org, Zhao, Yan Y,
yuan.yao@linux.intel.com
On Wed, 2024-02-21 at 17:26 -0800, Sean Christopherson wrote:
> Retry page faults without acquiring mmu_lock, and without even faulting
> the page into the primary MMU, if the resolved gfn is covered by an active
> invalidation. Contending for mmu_lock is especially problematic on
> preemptible kernels as the mmu_notifier invalidation task will yield
> mmu_lock (see rwlock_needbreak()), delay the in-progress invalidation, and
> ultimately increase the latency of resolving the page fault. And in the
> worst case scenario, yielding will be accompanied by a remote TLB flush,
> e.g. if the invalidation covers a large range of memory and vCPUs are
> accessing addresses that were already zapped.
>
> Faulting the page into the primary MMU is similarly problematic, as doing
> so may acquire locks that need to be taken for the invalidation to
> complete (the primary MMU has finer grained locks than KVM's MMU), and/or
> may cause unnecessary churn (getting/putting pages, marking them accessed,
> etc).
>
> Alternatively, the yielding issue could be mitigated by teaching KVM's MMU
> iterators to perform more work before yielding, but that wouldn't solve
> the lock contention and would negatively affect scenarios where a vCPU is
> trying to fault in an address that is NOT covered by the in-progress
> invalidation.
>
> Add a dedicated lockless version of the range-based retry check to avoid
> false positives on the sanity check on start+end WARN, and so that it's
> super obvious that checking for a racing invalidation without holding
> mmu_lock is unsafe (though obviously useful).
>
> Wrap mmu_invalidate_in_progress in READ_ONCE() to ensure that pre-checking
> invalidation in a loop won't put KVM into an infinite loop, e.g. due to
> caching the in-progress flag and never seeing it go to '0'.
>
> Force a load of mmu_invalidate_seq as well, even though it isn't strictly
> necessary to avoid an infinite loop, as doing so improves the probability
> that KVM will detect an invalidation that already completed before
> acquiring mmu_lock and bailing anyways.
>
> Do the pre-check even for non-preemptible kernels, as waiting to detect
> the invalidation until mmu_lock is held guarantees the vCPU will observe
> the worst case latency in terms of handling the fault, and can generate
> even more mmu_lock contention. E.g. the vCPU will acquire mmu_lock,
> detect retry, drop mmu_lock, re-enter the guest, retake the fault, and
> eventually re-acquire mmu_lock. This behavior is also why there are no
> new starvation issues due to losing the fairness guarantees provided by
> rwlocks: if the vCPU needs to retry, it _must_ drop mmu_lock, i.e. waiting
> on mmu_lock doesn't guarantee forward progress in the face of _another_
> mmu_notifier invalidation event.
>
> Note, adding READ_ONCE() isn't entirely free, e.g. on x86, the READ_ONCE()
> may generate a load into a register instead of doing a direct comparison
> (MOV+TEST+Jcc instead of CMP+Jcc), but practically speaking the added cost
> is a few bytes of code and maaaaybe a cycle or three.
>
> Reported-by: Yan Zhao <yan.y.zhao@intel.com>
> Closes: https://lore.kernel.org/all/ZNnPF4W26ZbAyGto@yzhao56-desk.sh.intel.com
> Reported-by: Friedrich Weber <f.weber@proxmox.com>
> Cc: Kai Huang <kai.huang@intel.com>
> Cc: Yan Zhao <yan.y.zhao@intel.com>
> Cc: Yuan Yao <yuan.yao@linux.intel.com>
> Cc: Xu Yilun <yilun.xu@linux.intel.com>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>
Acked-by: Kai Huang <kai.huang@intel.com>
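For readers following along, the lockless pre-check described in the changelog can be sketched in user-space C roughly as follows. This is only an illustration under stated assumptions, not the patch itself: struct kvm_sketch, READ_ONCE_SKETCH(), retry_gfn_unsafe_sketch() and demo_precheck() are made-up names, and the volatile cast only approximates what the kernel's READ_ONCE() does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical user-space stand-in for the kernel's READ_ONCE(): the volatile
 * access forces the compiler to emit a real load each time it is evaluated.
 * Relies on the GCC/Clang __typeof__ extension.
 */
#define READ_ONCE_SKETCH(x) (*(const volatile __typeof__(x) *)&(x))

/* Hypothetical subset of the per-VM invalidation state (names are assumed). */
struct kvm_sketch {
	long in_progress;       /* non-zero while an invalidation is running */
	unsigned long seq;      /* bumped in this sketch when one completes */
	uint64_t range_start;   /* first gfn covered by the invalidation */
	uint64_t range_end;     /* one past the last covered gfn */
};

/*
 * Lockless pre-check: may return false negatives and false positives, so a
 * caller must repeat the check under the lock before installing a mapping.
 */
static bool retry_gfn_unsafe_sketch(struct kvm_sketch *kvm,
				    unsigned long snap_seq, uint64_t gfn)
{
	if (READ_ONCE_SKETCH(kvm->in_progress) &&
	    gfn >= kvm->range_start && gfn < kvm->range_end)
		return true;

	return READ_ONCE_SKETCH(kvm->seq) != snap_seq;
}

/* Usage: walk through the interesting states of a single invalidation. */
static void demo_precheck(void)
{
	struct kvm_sketch kvm = { .in_progress = 0, .seq = 7,
				  .range_start = 0, .range_end = 0 };
	unsigned long snap = kvm.seq;	/* snapshot taken at fault time */

	assert(!retry_gfn_unsafe_sketch(&kvm, snap, 0x1000));

	/* An invalidation of gfns [0x1000, 0x2000) begins. */
	kvm.in_progress = 1;
	kvm.range_start = 0x1000;
	kvm.range_end = 0x2000;
	assert(retry_gfn_unsafe_sketch(&kvm, snap, 0x1800));	/* covered */
	assert(!retry_gfn_unsafe_sketch(&kvm, snap, 0x2000));	/* end is exclusive */

	/* The invalidation completes: seq bumps, in-progress drops to zero. */
	kvm.seq++;
	kvm.in_progress = 0;
	assert(retry_gfn_unsafe_sketch(&kvm, snap, 0x9000));	/* stale snapshot */
}
```

The last assertion illustrates why the sequence counter is read as well: once the invalidation has finished, the in-progress test no longer fires, but the stale snapshot still forces a retry.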
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH v5] KVM: x86/mmu: Retry fault before acquiring mmu_lock if mapping is changing
2024-02-22 1:26 [PATCH v5] KVM: x86/mmu: Retry fault before acquiring mmu_lock if mapping is changing Sean Christopherson
2024-02-22 10:09 ` Huang, Kai
@ 2024-02-22 16:59 ` Friedrich Weber
2024-02-23 1:35 ` Sean Christopherson
` (2 subsequent siblings)
4 siblings, 0 replies; 7+ messages in thread
From: Friedrich Weber @ 2024-02-22 16:59 UTC (permalink / raw)
To: Sean Christopherson, Paolo Bonzini
Cc: kvm, linux-kernel, Yan Zhao, Kai Huang, Yuan Yao, Xu Yilun
On 22/02/2024 02:26, Sean Christopherson wrote:
> Retry page faults without acquiring mmu_lock, and without even faulting
> the page into the primary MMU, if the resolved gfn is covered by an active
> invalidation. Contending for mmu_lock is especially problematic on
> preemptible kernels as the mmu_notifier invalidation task will yield
> mmu_lock (see rwlock_needbreak()), delay the in-progress invalidation, and
> ultimately increase the latency of resolving the page fault. And in the
> worst case scenario, yielding will be accompanied by a remote TLB flush,
> e.g. if the invalidation covers a large range of memory and vCPUs are
> accessing addresses that were already zapped.
>
> Faulting the page into the primary MMU is similarly problematic, as doing
> so may acquire locks that need to be taken for the invalidation to
> complete (the primary MMU has finer grained locks than KVM's MMU), and/or
> may cause unnecessary churn (getting/putting pages, marking them accessed,
> etc).
>
> Alternatively, the yielding issue could be mitigated by teaching KVM's MMU
> iterators to perform more work before yielding, but that wouldn't solve
> the lock contention and would negatively affect scenarios where a vCPU is
> trying to fault in an address that is NOT covered by the in-progress
> invalidation.
>
> Add a dedicated lockless version of the range-based retry check to avoid
> false positives on the sanity check on start+end WARN, and so that it's
> super obvious that checking for a racing invalidation without holding
> mmu_lock is unsafe (though obviously useful).
>
> Wrap mmu_invalidate_in_progress in READ_ONCE() to ensure that pre-checking
> invalidation in a loop won't put KVM into an infinite loop, e.g. due to
> caching the in-progress flag and never seeing it go to '0'.
>
> Force a load of mmu_invalidate_seq as well, even though it isn't strictly
> necessary to avoid an infinite loop, as doing so improves the probability
> that KVM will detect an invalidation that already completed before
> acquiring mmu_lock and bailing anyways.
>
> Do the pre-check even for non-preemptible kernels, as waiting to detect
> the invalidation until mmu_lock is held guarantees the vCPU will observe
> the worst case latency in terms of handling the fault, and can generate
> even more mmu_lock contention. E.g. the vCPU will acquire mmu_lock,
> detect retry, drop mmu_lock, re-enter the guest, retake the fault, and
> eventually re-acquire mmu_lock. This behavior is also why there are no
> new starvation issues due to losing the fairness guarantees provided by
> rwlocks: if the vCPU needs to retry, it _must_ drop mmu_lock, i.e. waiting
> on mmu_lock doesn't guarantee forward progress in the face of _another_
> mmu_notifier invalidation event.
>
> Note, adding READ_ONCE() isn't entirely free, e.g. on x86, the READ_ONCE()
> may generate a load into a register instead of doing a direct comparison
> (MOV+TEST+Jcc instead of CMP+Jcc), but practically speaking the added cost
> is a few bytes of code and maaaaybe a cycle or three.
>
> Reported-by: Yan Zhao <yan.y.zhao@intel.com>
> Closes: https://lore.kernel.org/all/ZNnPF4W26ZbAyGto@yzhao56-desk.sh.intel.com
> Reported-by: Friedrich Weber <f.weber@proxmox.com>
> Cc: Kai Huang <kai.huang@intel.com>
> Cc: Yan Zhao <yan.y.zhao@intel.com>
> Cc: Yuan Yao <yuan.yao@linux.intel.com>
> Cc: Xu Yilun <yilun.xu@linux.intel.com>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
Couldn't find the base-commit 21dbc438 (might not have looked at the
right place though), so I applied this patch on top of c48617fb ("Merge
tag 'kvmarm-fixes-6.8-3' ...") from kvm/kvm.git. Can confirm the patch
fixes the temporary guest hangs in combination with KSM and NUMA
balancing [1]. Thanks!
[1]
https://lore.kernel.org/kvm/832697b9-3652-422d-a019-8c0574a188ac@proxmox.com/
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH v5] KVM: x86/mmu: Retry fault before acquiring mmu_lock if mapping is changing
2024-02-22 1:26 [PATCH v5] KVM: x86/mmu: Retry fault before acquiring mmu_lock if mapping is changing Sean Christopherson
2024-02-22 10:09 ` Huang, Kai
2024-02-22 16:59 ` Friedrich Weber
@ 2024-02-23 1:35 ` Sean Christopherson
2024-02-23 5:08 ` Yan Zhao
2024-02-24 16:44 ` Xu Yilun
4 siblings, 0 replies; 7+ messages in thread
From: Sean Christopherson @ 2024-02-23 1:35 UTC (permalink / raw)
To: Sean Christopherson, Paolo Bonzini
Cc: kvm, linux-kernel, Yan Zhao, Friedrich Weber, Kai Huang, Yuan Yao,
Xu Yilun
On Wed, 21 Feb 2024 17:26:40 -0800, Sean Christopherson wrote:
> Retry page faults without acquiring mmu_lock, and without even faulting
> the page into the primary MMU, if the resolved gfn is covered by an active
> invalidation. Contending for mmu_lock is especially problematic on
> preemptible kernels as the mmu_notifier invalidation task will yield
> mmu_lock (see rwlock_needbreak()), delay the in-progress invalidation, and
> ultimately increase the latency of resolving the page fault. And in the
> worst case scenario, yielding will be accompanied by a remote TLB flush,
> e.g. if the invalidation covers a large range of memory and vCPUs are
> accessing addresses that were already zapped.
>
> [...]
Applied (quickly) to kvm-x86 fixes, as I want to get this into -next for at
least a day or two before sending it to Paolo for 6.8. But I'm more than happy
to squash in reviews/acks, especially since many people gave very helpful
feedback on earlier versions.
[1/1] KVM: x86/mmu: Retry fault before acquiring mmu_lock if mapping is changing
https://github.com/kvm-x86/linux/commit/67e4022ffad6
--
https://github.com/kvm-x86/linux/tree/next
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH v5] KVM: x86/mmu: Retry fault before acquiring mmu_lock if mapping is changing
2024-02-22 1:26 [PATCH v5] KVM: x86/mmu: Retry fault before acquiring mmu_lock if mapping is changing Sean Christopherson
` (2 preceding siblings ...)
2024-02-23 1:35 ` Sean Christopherson
@ 2024-02-23 5:08 ` Yan Zhao
2024-02-23 18:15 ` Sean Christopherson
2024-02-24 16:44 ` Xu Yilun
4 siblings, 1 reply; 7+ messages in thread
From: Yan Zhao @ 2024-02-23 5:08 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, kvm, linux-kernel, Friedrich Weber, Kai Huang,
Yuan Yao, Xu Yilun
Reviewed-by: Yan Zhao <yan.y.zhao@intel.com>
On Wed, Feb 21, 2024 at 05:26:40PM -0800, Sean Christopherson wrote:
> Retry page faults without acquiring mmu_lock, and without even faulting
> the page into the primary MMU, if the resolved gfn is covered by an active
> invalidation. Contending for mmu_lock is especially problematic on
> preemptible kernels as the mmu_notifier invalidation task will yield
> mmu_lock (see rwlock_needbreak()), delay the in-progress invalidation, and
> ultimately increase the latency of resolving the page fault. And in the
> worst case scenario, yielding will be accompanied by a remote TLB flush,
> e.g. if the invalidation covers a large range of memory and vCPUs are
> accessing addresses that were already zapped.
>
> Faulting the page into the primary MMU is similarly problematic, as doing
> so may acquire locks that need to be taken for the invalidation to
> complete (the primary MMU has finer grained locks than KVM's MMU), and/or
> may cause unnecessary churn (getting/putting pages, marking them accessed,
> etc).
>
> Alternatively, the yielding issue could be mitigated by teaching KVM's MMU
> iterators to perform more work before yielding, but that wouldn't solve
> the lock contention and would negatively affect scenarios where a vCPU is
> trying to fault in an address that is NOT covered by the in-progress
> invalidation.
>
> Add a dedicated lockless version of the range-based retry check to avoid
> false positives on the sanity check on start+end WARN, and so that it's
> super obvious that checking for a racing invalidation without holding
> mmu_lock is unsafe (though obviously useful).
>
> Wrap mmu_invalidate_in_progress in READ_ONCE() to ensure that pre-checking
> invalidation in a loop won't put KVM into an infinite loop, e.g. due to
> caching the in-progress flag and never seeing it go to '0'.
>
> Force a load of mmu_invalidate_seq as well, even though it isn't strictly
> necessary to avoid an infinite loop, as doing so improves the probability
> that KVM will detect an invalidation that already completed before
> acquiring mmu_lock and bailing anyways.
>
> Do the pre-check even for non-preemptible kernels, as waiting to detect
> the invalidation until mmu_lock is held guarantees the vCPU will observe
> the worst case latency in terms of handling the fault, and can generate
> even more mmu_lock contention. E.g. the vCPU will acquire mmu_lock,
> detect retry, drop mmu_lock, re-enter the guest, retake the fault, and
> eventually re-acquire mmu_lock. This behavior is also why there are no
> new starvation issues due to losing the fairness guarantees provided by
> rwlocks: if the vCPU needs to retry, it _must_ drop mmu_lock, i.e. waiting
> on mmu_lock doesn't guarantee forward progress in the face of _another_
> mmu_notifier invalidation event.
>
> Note, adding READ_ONCE() isn't entirely free, e.g. on x86, the READ_ONCE()
> may generate a load into a register instead of doing a direct comparison
> (MOV+TEST+Jcc instead of CMP+Jcc), but practically speaking the added cost
> is a few bytes of code and maaaaybe a cycle or three.
>
> Reported-by: Yan Zhao <yan.y.zhao@intel.com>
> Closes: https://lore.kernel.org/all/ZNnPF4W26ZbAyGto@yzhao56-desk.sh.intel.com
> Reported-by: Friedrich Weber <f.weber@proxmox.com>
> Cc: Kai Huang <kai.huang@intel.com>
> Cc: Yan Zhao <yan.y.zhao@intel.com>
> Cc: Yuan Yao <yuan.yao@linux.intel.com>
> Cc: Xu Yilun <yilun.xu@linux.intel.com>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>
> v5:
> - Fix the inverted slot check. [Xu]
> - Drop all the other patches (will post separately).
>
> arch/x86/kvm/mmu/mmu.c | 42 ++++++++++++++++++++++++++++++++++++++++
> include/linux/kvm_host.h | 26 +++++++++++++++++++++++++
> 2 files changed, 68 insertions(+)
>
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index 3c193b096b45..274acc53f0e9 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -4405,6 +4405,31 @@ static int kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault,
> fault->mmu_seq = vcpu->kvm->mmu_invalidate_seq;
> smp_rmb();
>
> + /*
> + * Check for a relevant mmu_notifier invalidation event before getting
> + * the pfn from the primary MMU, and before acquiring mmu_lock.
> + *
> + * For mmu_lock, if there is an in-progress invalidation and the kernel
> + * allows preemption, the invalidation task may drop mmu_lock and yield
> + * in response to mmu_lock being contended, which is *very* counter-
> + * productive as this vCPU can't actually make forward progress until
> + * the invalidation completes.
> + *
> + * Retrying now can also avoid unnecessary lock contention in the primary
> + * MMU, as the primary MMU doesn't necessarily hold a single lock for
> + * the duration of the invalidation, i.e. faulting in a conflicting pfn
> + * can cause the invalidation to take longer by holding locks that are
> + * needed to complete the invalidation.
> + *
> + * Do the pre-check even for non-preemptible kernels, i.e. even if KVM
> + * will never yield mmu_lock in response to contention, as this vCPU is
> + * *guaranteed* to need to retry, i.e. waiting until mmu_lock is held
> + * to detect retry guarantees the worst case latency for the vCPU.
> + */
> + if (fault->slot &&
> + mmu_invalidate_retry_gfn_unsafe(vcpu->kvm, fault->mmu_seq, fault->gfn))
> + return RET_PF_RETRY;
> +
> ret = __kvm_faultin_pfn(vcpu, fault);
> if (ret != RET_PF_CONTINUE)
> return ret;
> @@ -4415,6 +4440,18 @@ static int kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault,
> if (unlikely(!fault->slot))
> return kvm_handle_noslot_fault(vcpu, fault, access);
>
> + /*
> + * Check again for a relevant mmu_notifier invalidation event purely to
> + * avoid contending mmu_lock. Most invalidations will be detected by
> + * the previous check, but checking is extremely cheap relative to the
> + * overall cost of failing to detect the invalidation until after
> + * mmu_lock is acquired.
> + */
> + if (mmu_invalidate_retry_gfn_unsafe(vcpu->kvm, fault->mmu_seq, fault->gfn)) {
> + kvm_release_pfn_clean(fault->pfn);
> + return RET_PF_RETRY;
> + }
> +
> return RET_PF_CONTINUE;
> }
>
> @@ -4442,6 +4479,11 @@ static bool is_page_fault_stale(struct kvm_vcpu *vcpu,
> if (!sp && kvm_test_request(KVM_REQ_MMU_FREE_OBSOLETE_ROOTS, vcpu))
> return true;
>
> + /*
> + * Check for a relevant mmu_notifier invalidation event one last time
> + * now that mmu_lock is held, as the "unsafe" checks performed without
> + * holding mmu_lock can get false negatives.
> + */
> return fault->slot &&
> mmu_invalidate_retry_gfn(vcpu->kvm, fault->mmu_seq, fault->gfn);
> }
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index 18e28610749e..97afe4519772 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -2062,6 +2062,32 @@ static inline int mmu_invalidate_retry_gfn(struct kvm *kvm,
> return 1;
> return 0;
> }
> +
> +/*
> + * This lockless version of the range-based retry check *must* be paired with a
> + * call to the locked version after acquiring mmu_lock, i.e. this is safe to
> + * use only as a pre-check to avoid contending mmu_lock. This version *will*
> + * get false negatives and false positives.
> + */
> +static inline bool mmu_invalidate_retry_gfn_unsafe(struct kvm *kvm,
> + unsigned long mmu_seq,
> + gfn_t gfn)
> +{
> + /*
> + * Use READ_ONCE() to ensure the in-progress flag and sequence counter
> + * are always read from memory, e.g. so that checking for retry in a
> + * loop won't result in an infinite retry loop. Don't force loads for
> + * start+end, as the key to avoiding infinite retry loops is observing
> + * the 1=>0 transition of in-progress, i.e. getting false negatives
> + * due to stale start+end values is acceptable.
> + */
> + if (unlikely(READ_ONCE(kvm->mmu_invalidate_in_progress)) &&
> + gfn >= kvm->mmu_invalidate_range_start &&
> + gfn < kvm->mmu_invalidate_range_end)
> + return true;
> +
> + return READ_ONCE(kvm->mmu_invalidate_seq) != mmu_seq;
> +}
> #endif
>
> #ifdef CONFIG_HAVE_KVM_IRQ_ROUTING
>
> base-commit: 21dbc438dde69ff630b3264c54b94923ee9fcdcf
> --
> 2.44.0.rc0.258.g7320e95886-goog
>
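As a rough illustration of how the three checks in the diff above fit together, here is a single-threaded user-space sketch. Everything in it is an assumption made for illustration: struct vm_sketch, handle_fault_sketch() and the two counters are invented, the "lock" and the "primary MMU" are modeled as plain counters, the fault->slot test is omitted, and the smp_rmb() that follows the snapshot in the real code is left out because nothing runs concurrently here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for READ_ONCE(); uses the GCC/Clang __typeof__. */
#define READ_ONCE_SKETCH(x) (*(const volatile __typeof__(x) *)&(x))

enum fault_ret { FAULT_RETRY, FAULT_FIXED };

struct vm_sketch {
	long in_progress;
	unsigned long seq;
	uint64_t range_start, range_end;
	unsigned int lock_acquisitions;	/* times the modeled mmu_lock was taken */
	unsigned int pfn_lookups;	/* trips into the modeled primary MMU */
};

static bool range_hit(struct vm_sketch *vm, uint64_t gfn)
{
	return gfn >= vm->range_start && gfn < vm->range_end;
}

/* Lockless check: cheap, but only a hint. */
static bool retry_unsafe(struct vm_sketch *vm, unsigned long snap, uint64_t gfn)
{
	if (READ_ONCE_SKETCH(vm->in_progress) && range_hit(vm, gfn))
		return true;
	return READ_ONCE_SKETCH(vm->seq) != snap;
}

/* Authoritative check: the caller is assumed to hold the lock. */
static bool retry_locked(struct vm_sketch *vm, unsigned long snap, uint64_t gfn)
{
	if (vm->in_progress && range_hit(vm, gfn))
		return true;
	return vm->seq != snap;
}

static enum fault_ret handle_fault_sketch(struct vm_sketch *vm, uint64_t gfn)
{
	unsigned long snap = vm->seq;	/* snapshot the sequence counter */

	/* Check 1: before touching the primary MMU or the lock. */
	if (retry_unsafe(vm, snap, gfn))
		return FAULT_RETRY;

	vm->pfn_lookups++;		/* stand-in for resolving the pfn */

	/* Check 2: again, purely to avoid contending the lock. */
	if (retry_unsafe(vm, snap, gfn))
		return FAULT_RETRY;	/* the real code also releases the pfn */

	vm->lock_acquisitions++;	/* stand-in for taking mmu_lock */

	/* Check 3: the only check whose answer can be trusted. */
	if (retry_locked(vm, snap, gfn))
		return FAULT_RETRY;

	return FAULT_FIXED;
}

static void demo_fault_flow(void)
{
	struct vm_sketch vm = { .seq = 1 };

	/* No invalidation: the fault goes all the way through. */
	assert(handle_fault_sketch(&vm, 0x42) == FAULT_FIXED);
	assert(vm.lock_acquisitions == 1 && vm.pfn_lookups == 1);

	/* Invalidation covering the gfn: bail before the pfn lookup and lock. */
	vm.in_progress = 1;
	vm.range_start = 0x40;
	vm.range_end = 0x50;
	assert(handle_fault_sketch(&vm, 0x42) == FAULT_RETRY);
	assert(vm.lock_acquisitions == 1 && vm.pfn_lookups == 1);

	/* A gfn outside the invalidated range is not penalized. */
	assert(handle_fault_sketch(&vm, 0x60) == FAULT_FIXED);
	assert(vm.lock_acquisitions == 2 && vm.pfn_lookups == 2);
}
```

The counters make the intent of the pre-checks visible: a fault on a gfn under invalidation returns without adding to either the lock or the pfn-lookup count, while an unrelated gfn proceeds as before.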
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH v5] KVM: x86/mmu: Retry fault before acquiring mmu_lock if mapping is changing
2024-02-23 5:08 ` Yan Zhao
@ 2024-02-23 18:15 ` Sean Christopherson
0 siblings, 0 replies; 7+ messages in thread
From: Sean Christopherson @ 2024-02-23 18:15 UTC (permalink / raw)
To: Yan Zhao
Cc: Paolo Bonzini, kvm, linux-kernel, Friedrich Weber, Kai Huang,
Yuan Yao, Xu Yilun
On Fri, Feb 23, 2024, Yan Zhao wrote:
> Reviewed-by: Yan Zhao <yan.y.zhao@intel.com>
Squashed, new hash below. Thanks!
[1/1] KVM: x86/mmu: Retry fault before acquiring mmu_lock if mapping is changing
https://github.com/kvm-x86/linux/commit/d02c357e5bfa
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH v5] KVM: x86/mmu: Retry fault before acquiring mmu_lock if mapping is changing
2024-02-22 1:26 [PATCH v5] KVM: x86/mmu: Retry fault before acquiring mmu_lock if mapping is changing Sean Christopherson
` (3 preceding siblings ...)
2024-02-23 5:08 ` Yan Zhao
@ 2024-02-24 16:44 ` Xu Yilun
4 siblings, 0 replies; 7+ messages in thread
From: Xu Yilun @ 2024-02-24 16:44 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, kvm, linux-kernel, Yan Zhao, Friedrich Weber,
Kai Huang, Yuan Yao
On Wed, Feb 21, 2024 at 05:26:40PM -0800, Sean Christopherson wrote:
> Retry page faults without acquiring mmu_lock, and without even faulting
> the page into the primary MMU, if the resolved gfn is covered by an active
> invalidation. Contending for mmu_lock is especially problematic on
> preemptible kernels as the mmu_notifier invalidation task will yield
> mmu_lock (see rwlock_needbreak()), delay the in-progress invalidation, and
> ultimately increase the latency of resolving the page fault. And in the
> worst case scenario, yielding will be accompanied by a remote TLB flush,
> e.g. if the invalidation covers a large range of memory and vCPUs are
> accessing addresses that were already zapped.
>
> Faulting the page into the primary MMU is similarly problematic, as doing
> so may acquire locks that need to be taken for the invalidation to
> complete (the primary MMU has finer grained locks than KVM's MMU), and/or
> may cause unnecessary churn (getting/putting pages, marking them accessed,
> etc).
>
> Alternatively, the yielding issue could be mitigated by teaching KVM's MMU
> iterators to perform more work before yielding, but that wouldn't solve
> the lock contention and would negatively affect scenarios where a vCPU is
> trying to fault in an address that is NOT covered by the in-progress
> invalidation.
>
> Add a dedicated lockless version of the range-based retry check to avoid
> false positives on the sanity check on start+end WARN, and so that it's
> super obvious that checking for a racing invalidation without holding
> mmu_lock is unsafe (though obviously useful).
>
> Wrap mmu_invalidate_in_progress in READ_ONCE() to ensure that pre-checking
> invalidation in a loop won't put KVM into an infinite loop, e.g. due to
> caching the in-progress flag and never seeing it go to '0'.
>
> Force a load of mmu_invalidate_seq as well, even though it isn't strictly
> necessary to avoid an infinite loop, as doing so improves the probability
> that KVM will detect an invalidation that already completed before
> acquiring mmu_lock and bailing anyways.
>
> Do the pre-check even for non-preemptible kernels, as waiting to detect
> the invalidation until mmu_lock is held guarantees the vCPU will observe
> the worst case latency in terms of handling the fault, and can generate
> even more mmu_lock contention. E.g. the vCPU will acquire mmu_lock,
> detect retry, drop mmu_lock, re-enter the guest, retake the fault, and
> eventually re-acquire mmu_lock. This behavior is also why there are no
> new starvation issues due to losing the fairness guarantees provided by
> rwlocks: if the vCPU needs to retry, it _must_ drop mmu_lock, i.e. waiting
> on mmu_lock doesn't guarantee forward progress in the face of _another_
> mmu_notifier invalidation event.
>
> Note, adding READ_ONCE() isn't entirely free, e.g. on x86, the READ_ONCE()
> may generate a load into a register instead of doing a direct comparison
> (MOV+TEST+Jcc instead of CMP+Jcc), but practically speaking the added cost
> is a few bytes of code and maaaaybe a cycle or three.
>
> Reported-by: Yan Zhao <yan.y.zhao@intel.com>
> Closes: https://lore.kernel.org/all/ZNnPF4W26ZbAyGto@yzhao56-desk.sh.intel.com
> Reported-by: Friedrich Weber <f.weber@proxmox.com>
> Cc: Kai Huang <kai.huang@intel.com>
> Cc: Yan Zhao <yan.y.zhao@intel.com>
> Cc: Yuan Yao <yuan.yao@linux.intel.com>
> Cc: Xu Yilun <yilun.xu@linux.intel.com>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Xu Yilun <yilun.xu@linux.intel.com>
^ permalink raw reply [flat|nested] 7+ messages in thread