* [PATCH RESEND] x86/paravirt: add backoff mechanism to virt_spin_lock
@ 2025-07-03 2:23 Wangyang Guo
2025-07-16 9:25 ` Guo, Wangyang
From: Wangyang Guo @ 2025-07-03 2:23 UTC
To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
H. Peter Anvin, linux-kernel, kvm
Cc: wangyang.guo, Tianyou Li, Tim Chen
When multiple threads are waiting for a lock at the same time, they all
see the lock become available as soon as the owner releases it and all
try to take it, which may cause an expensive CAS storm.

Introduce binary exponential backoff: as failed try-lock attempts add
up, it is more likely that a large number of threads are competing for
the same lock, so increase the wait time exponentially.

The optimization improves the SPEC CPU 2017 502.gcc_r benchmark by ~4%
for a 288-core VM on an Intel Xeon 6 E-core platform.

In micro-benchmarks, the patch shows significant performance gains in
the high-contention cases. A slight regression is seen in some
mid-contention cases because the last waiter may take longer to notice
that the lock has been released. Low-contention scenarios are
unchanged, as expected.
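
For reference, the patched virt_spin_lock() fast path is roughly
equivalent to the following linear sketch (reconstructed from the diff
below; a simplified rendering, not the literal patch text):

#define MAX_BACKOFF 64  /* cap the wait at 64 cpu_relax() iterations */

static inline bool virt_spin_lock(struct qspinlock *lock)
{
        int val, locked;
        int backoff = 1;        /* current wait, in cpu_relax() iterations */

        if (!static_branch_likely(&virt_spin_lock_key))
                return false;

__retry:
        val = atomic_read(&lock->val);
        locked = val;           /* non-zero: lock was observed held */

        if (locked || !atomic_try_cmpxchg(&lock->val, &val, _Q_LOCKED_VAL)) {
                int spin_count = backoff;

                while (spin_count--)
                        cpu_relax();

                /* Double the wait only after losing the cmpxchg race. */
                if (!locked)
                        backoff = (backoff < MAX_BACKOFF) ? backoff << 1 : backoff;

                goto __retry;
        }

        return true;
}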
Micro Bench: https://github.com/guowangy/kernel-lock-bench
Test Platform: Xeon 8380L
First Row: critical section length
First Col: CPU core number
Values: backoff vs linux-6.15, throughput based, higher is better
non-critical-length: 1
0 1 2 4 8 16 32 64 128
1 1.01 1.00 1.00 1.00 1.01 1.01 1.01 1.01 1.00
2 1.02 1.01 1.02 0.97 1.02 1.05 1.01 1.00 1.01
4 1.15 1.20 1.14 1.11 1.34 1.26 0.99 0.93 0.98
8 1.59 1.71 1.18 1.80 1.95 1.45 1.05 0.99 1.17
16 1.04 1.37 1.08 1.31 1.85 1.50 1.24 0.99 1.24
32 1.24 1.36 1.23 1.40 1.50 1.86 1.45 1.18 1.48
64 1.12 1.24 1.11 1.31 1.34 1.37 2.01 1.60 1.43
non-critical-length: 32
0 1 2 4 8 16 32 64 128
1 1.00 1.00 1.00 1.00 1.00 0.99 1.00 1.00 1.01
2 1.00 1.00 1.00 1.00 1.00 1.00 1.00 0.99 1.00
4 1.12 1.25 1.09 1.07 1.12 1.16 1.13 1.16 1.09
8 1.02 1.16 1.03 1.02 1.04 1.07 1.04 0.99 0.98
16 0.97 0.95 0.84 0.96 0.99 0.98 0.98 1.01 1.03
32 1.05 1.03 0.87 1.05 1.25 1.16 1.25 1.30 1.27
64 1.83 1.10 1.07 1.02 1.19 1.18 1.21 1.14 1.13
non-critical-length: 128
0 1 2 4 8 16 32 64 128
1 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00
2 0.99 1.02 1.00 1.00 1.00 1.00 1.00 1.00 1.00
4 0.98 0.99 1.00 1.00 0.99 1.04 0.99 0.99 1.02
8 1.08 1.08 1.08 1.07 1.15 1.12 1.03 0.94 1.00
16 1.00 1.00 1.00 1.01 1.01 1.01 1.36 1.06 1.02
32 1.07 1.08 1.07 1.07 1.09 1.10 1.22 1.36 1.25
64 1.03 1.04 1.04 1.06 1.13 1.18 0.82 1.02 1.14
Reviewed-by: Tianyou Li <tianyou.li@intel.com>
Reviewed-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Wangyang Guo <wangyang.guo@intel.com>
---
arch/x86/include/asm/qspinlock.h | 28 +++++++++++++++++++++++++---
1 file changed, 25 insertions(+), 3 deletions(-)
diff --git a/arch/x86/include/asm/qspinlock.h b/arch/x86/include/asm/qspinlock.h
index 68da67df304d..ac6e1bbd9ba4 100644
--- a/arch/x86/include/asm/qspinlock.h
+++ b/arch/x86/include/asm/qspinlock.h
@@ -87,7 +87,7 @@ DECLARE_STATIC_KEY_FALSE(virt_spin_lock_key);
 #define virt_spin_lock virt_spin_lock
 static inline bool virt_spin_lock(struct qspinlock *lock)
 {
-        int val;
+        int val, locked;
 
         if (!static_branch_likely(&virt_spin_lock_key))
                 return false;
@@ -98,11 +98,33 @@ static inline bool virt_spin_lock(struct qspinlock *lock)
          * horrible lock 'holder' preemption issues.
          */
 
+#define MAX_BACKOFF 64
+        int backoff = 1;
+
 __retry:
         val = atomic_read(&lock->val);
+        locked = val;
+
+        if (locked || !atomic_try_cmpxchg(&lock->val, &val, _Q_LOCKED_VAL)) {
+                int spin_count = backoff;
+
+                while (spin_count--)
+                        cpu_relax();
+
+                /*
+                 * Here !locked means the try-lock was attempted but failed.
+                 *
+                 * When multiple threads wait for the lock at the same time,
+                 * they all see it become available once the owner releases
+                 * it and all try to take it, causing an expensive CAS storm.
+                 *
+                 * Use binary exponential backoff: as failed try-lock attempts
+                 * add up, more threads are likely competing for the same
+                 * lock, so increase the wait time exponentially.
+                 */
+                if (!locked)
+                        backoff = (backoff < MAX_BACKOFF) ? backoff << 1 : backoff;
 
-        if (val || !atomic_try_cmpxchg(&lock->val, &val, _Q_LOCKED_VAL)) {
-                cpu_relax();
                 goto __retry;
         }
--
2.43.5
* Re: [PATCH RESEND] x86/paravirt: add backoff mechanism to virt_spin_lock
2025-07-03 2:23 [PATCH RESEND] x86/paravirt: add backoff mechanism to virt_spin_lock Wangyang Guo
@ 2025-07-16 9:25 ` Guo, Wangyang
From: Guo, Wangyang @ 2025-07-16 9:25 UTC
To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
H. Peter Anvin, linux-kernel, kvm, Paolo Bonzini,
Vitaly Kuznetsov, Sean Christopherson
Cc: Tianyou Li, Tim Chen
Any comments or suggestions on this patch? Are any further updates or
changes needed?
BR
Wangyang
On 7/3/2025 10:23 AM, Wangyang Guo wrote:
> When multiple threads are waiting for a lock at the same time, they all
> see the lock become available as soon as the owner releases it and all
> try to take it, which may cause an expensive CAS storm.
>
> Introduce binary exponential backoff: as failed try-lock attempts add
> up, it is more likely that a large number of threads are competing for
> the same lock, so increase the wait time exponentially.
>
> The optimization improves the SPEC CPU 2017 502.gcc_r benchmark by ~4%
> for a 288-core VM on an Intel Xeon 6 E-core platform.
>
> In micro-benchmarks, the patch shows significant performance gains in
> the high-contention cases. A slight regression is seen in some
> mid-contention cases because the last waiter may take longer to notice
> that the lock has been released. Low-contention scenarios are
> unchanged, as expected.
>
> Micro Bench: https://github.com/guowangy/kernel-lock-bench
> Test Platform: Xeon 8380L
> First Row: critical section length
> First Col: CPU core number
> Values: backoff vs linux-6.15, throughput based, higher is better
>
> non-critical-length: 1
> 0 1 2 4 8 16 32 64 128
> 1 1.01 1.00 1.00 1.00 1.01 1.01 1.01 1.01 1.00
> 2 1.02 1.01 1.02 0.97 1.02 1.05 1.01 1.00 1.01
> 4 1.15 1.20 1.14 1.11 1.34 1.26 0.99 0.93 0.98
> 8 1.59 1.71 1.18 1.80 1.95 1.45 1.05 0.99 1.17
> 16 1.04 1.37 1.08 1.31 1.85 1.50 1.24 0.99 1.24
> 32 1.24 1.36 1.23 1.40 1.50 1.86 1.45 1.18 1.48
> 64 1.12 1.24 1.11 1.31 1.34 1.37 2.01 1.60 1.43
>
> non-critical-length: 32
> 0 1 2 4 8 16 32 64 128
> 1 1.00 1.00 1.00 1.00 1.00 0.99 1.00 1.00 1.01
> 2 1.00 1.00 1.00 1.00 1.00 1.00 1.00 0.99 1.00
> 4 1.12 1.25 1.09 1.07 1.12 1.16 1.13 1.16 1.09
> 8 1.02 1.16 1.03 1.02 1.04 1.07 1.04 0.99 0.98
> 16 0.97 0.95 0.84 0.96 0.99 0.98 0.98 1.01 1.03
> 32 1.05 1.03 0.87 1.05 1.25 1.16 1.25 1.30 1.27
> 64 1.83 1.10 1.07 1.02 1.19 1.18 1.21 1.14 1.13
>
> non-critical-length: 128
> 0 1 2 4 8 16 32 64 128
> 1 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00
> 2 0.99 1.02 1.00 1.00 1.00 1.00 1.00 1.00 1.00
> 4 0.98 0.99 1.00 1.00 0.99 1.04 0.99 0.99 1.02
> 8 1.08 1.08 1.08 1.07 1.15 1.12 1.03 0.94 1.00
> 16 1.00 1.00 1.00 1.01 1.01 1.01 1.36 1.06 1.02
> 32 1.07 1.08 1.07 1.07 1.09 1.10 1.22 1.36 1.25
> 64 1.03 1.04 1.04 1.06 1.13 1.18 0.82 1.02 1.14
>
> Reviewed-by: Tianyou Li <tianyou.li@intel.com>
> Reviewed-by: Tim Chen <tim.c.chen@linux.intel.com>
> Signed-off-by: Wangyang Guo <wangyang.guo@intel.com>
> ---
> arch/x86/include/asm/qspinlock.h | 28 +++++++++++++++++++++++++---
> 1 file changed, 25 insertions(+), 3 deletions(-)
>
> diff --git a/arch/x86/include/asm/qspinlock.h b/arch/x86/include/asm/qspinlock.h
> index 68da67df304d..ac6e1bbd9ba4 100644
> --- a/arch/x86/include/asm/qspinlock.h
> +++ b/arch/x86/include/asm/qspinlock.h
> @@ -87,7 +87,7 @@ DECLARE_STATIC_KEY_FALSE(virt_spin_lock_key);
>  #define virt_spin_lock virt_spin_lock
>  static inline bool virt_spin_lock(struct qspinlock *lock)
>  {
> -        int val;
> +        int val, locked;
>  
>          if (!static_branch_likely(&virt_spin_lock_key))
>                  return false;
> @@ -98,11 +98,33 @@ static inline bool virt_spin_lock(struct qspinlock *lock)
>           * horrible lock 'holder' preemption issues.
>           */
>  
> +#define MAX_BACKOFF 64
> +        int backoff = 1;
> +
>  __retry:
>          val = atomic_read(&lock->val);
> +        locked = val;
> +
> +        if (locked || !atomic_try_cmpxchg(&lock->val, &val, _Q_LOCKED_VAL)) {
> +                int spin_count = backoff;
> +
> +                while (spin_count--)
> +                        cpu_relax();
> +
> +                /*
> +                 * Here !locked means the try-lock was attempted but failed.
> +                 *
> +                 * When multiple threads wait for the lock at the same time,
> +                 * they all see it become available once the owner releases
> +                 * it and all try to take it, causing an expensive CAS storm.
> +                 *
> +                 * Use binary exponential backoff: as failed try-lock attempts
> +                 * add up, more threads are likely competing for the same
> +                 * lock, so increase the wait time exponentially.
> +                 */
> +                if (!locked)
> +                        backoff = (backoff < MAX_BACKOFF) ? backoff << 1 : backoff;
>  
> -        if (val || !atomic_try_cmpxchg(&lock->val, &val, _Q_LOCKED_VAL)) {
> -                cpu_relax();
>                  goto __retry;
>          }
>