* [PATCH 1/1] locking/qspinlock: Make the 1st spinner only spin on locked_pending bits
@ 2023-05-08  8:15 Qiuxu Zhuo
  2023-05-08 15:31 ` Waiman Long
  2023-05-15  6:42 ` kernel test robot
  0 siblings, 2 replies; 7+ messages in thread
From: Qiuxu Zhuo @ 2023-05-08  8:15 UTC (permalink / raw)
  To: Peter Zijlstra, Waiman Long, Ingo Molnar, Will Deacon
  Cc: Qiuxu Zhuo, Boqun Feng, linux-kernel

The 1st spinner (the head of the MCS queue) spins on the whole qspinlock
variable to check whether the lock has been released. For a contended
qspinlock, this spinning is a hotspot because each CPU queued in the MCS
queue performs it in turn when it becomes the 1st spinner.

Among SMT h/w threads in the same core, the granularity at which the
Load-Store Unit (LSU) inside the core handles memory accesses could be
as fine as a "byte". Making the 1st spinner spin only on the
locked_pending bits (not the whole qspinlock) can therefore avoid the
false dependency between the tail field and the locked_pending field.
This micro-optimization helps the h/w thread (the 1st spinner) stay in
a low power state and prevents it from being woken up by other h/w
threads in the same core when they perform xchg_tail() to update the
tail field. Please see a similar discussion in the link [1].

[1] https://lore.kernel.org/r/20230105021952.3090070-1-guoren@kernel.org

Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
---
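Note: the false dependency described above can be pictured with a small
userspace model of the lock word. This is only a sketch under stated
assumptions: it assumes the little-endian layout used when
_Q_PENDING_BITS == 8, the names model_qspinlock and model_xchg_tail()
are hypothetical, and the tail update is a plain (non-atomic) store
rather than the kernel's xchg.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical userspace model of the qspinlock word, assuming the
 * little-endian layout used when _Q_PENDING_BITS == 8. It is not the
 * kernel's definition.
 */
union model_qspinlock {
	uint32_t val;			/* the whole 4-byte lock word */
	struct {
		uint8_t locked;		/* lock holder byte */
		uint8_t pending;	/* pending waiter byte */
	};
	struct {
		uint16_t locked_pending;	/* what the patched spinner polls */
		uint16_t tail;			/* what xchg_tail() updates */
	};
};

/*
 * Non-atomic stand-in for xchg_tail(): it replaces only the 2-byte tail
 * field and returns the previous tail.
 */
static uint16_t model_xchg_tail(union model_qspinlock *lock, uint16_t new_tail)
{
	uint16_t old = lock->tail;

	lock->tail = new_tail;
	return old;
}
```

In this model, a tail update changes lock->val (so a spinner polling
the 4-byte word observes a change) but leaves lock->locked_pending
untouched (so a spinner polling only those 2 bytes does not).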
 kernel/locking/qspinlock.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/kernel/locking/qspinlock.c b/kernel/locking/qspinlock.c
index efebbf19f887..e7b990b28610 100644
--- a/kernel/locking/qspinlock.c
+++ b/kernel/locking/qspinlock.c
@@ -513,7 +513,20 @@ void __lockfunc queued_spin_lock_slowpath(struct qspinlock *lock, u32 val)
 	if ((val = pv_wait_head_or_lock(lock, node)))
 		goto locked;
 
+#if _Q_PENDING_BITS == 8
+	/*
+	 * Spinning on the 2-byte locked_pending instead of the 4-byte qspinlock
+	 * variable can avoid the false dependency between the tail field and
+	 * the locked_pending field. This helps the h/w thread (the 1st spinner)
+	 * stay in a low power state and prevents it from being woken up by other
+	 * h/w threads in the same core when they perform xchg_tail() to update
+	 * the tail field only.
+	 */
+	smp_cond_load_acquire(&lock->locked_pending, !VAL);
+	val = atomic_read_acquire(&lock->val);
+#else
 	val = atomic_cond_read_acquire(&lock->val, !(VAL & _Q_LOCKED_PENDING_MASK));
+#endif
 
 locked:
 	/*
-- 
2.17.1
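
For comparison, the two wait sequences in the hunk above can be
modelled in userspace roughly as follows. This is a sketch, not the
kernel implementation: it assumes a little-endian layout and GCC/Clang
__atomic builtins, it busy-polls instead of using the architecture's
low-power wait, and all wait_model_* names are hypothetical. The
patched variant reloads the full word after the 2-byte wait because
the code following the "locked:" label still needs the tail bits in
val.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical little-endian model of the lock word (_Q_PENDING_BITS == 8). */
union wait_model_lock {
	uint32_t val;
	struct {
		uint16_t locked_pending;
		uint16_t tail;
	};
};

/* Low 16 bits hold locked + pending in this little-endian model. */
#define WAIT_MODEL_LOCKED_PENDING_MASK 0x0000ffffu

/* Unpatched shape: poll the whole 4-byte word until locked/pending clear. */
static uint32_t wait_model_on_val(union wait_model_lock *lock)
{
	uint32_t val;

	do {
		val = __atomic_load_n(&lock->val, __ATOMIC_ACQUIRE);
	} while (val & WAIT_MODEL_LOCKED_PENDING_MASK);

	return val;
}

/*
 * Patched shape: poll only the 2-byte locked_pending field, then reload
 * the whole word so the caller still sees the current tail.
 */
static uint32_t wait_model_on_locked_pending(union wait_model_lock *lock)
{
	while (__atomic_load_n(&lock->locked_pending, __ATOMIC_ACQUIRE))
		;

	return __atomic_load_n(&lock->val, __ATOMIC_ACQUIRE);
}
```

Both helpers return the same value once locked and pending are clear;
the difference is only which bytes are polled while waiting.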


