public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH v3 next 0/5] locking/osq_lock: Optimisations to osq_lock code
@ 2026-03-06 22:51 david.laight.linux
  2026-03-06 22:51 ` [PATCH v3 next 1/5] Only clear node->locked in the slow osq_lock() path david.laight.linux
                   ` (5 more replies)
  0 siblings, 6 replies; 20+ messages in thread
From: david.laight.linux @ 2026-03-06 22:51 UTC (permalink / raw)
  To: Waiman Long, Peter Zijlstra, Ingo Molnar, Will Deacon, Boqun Feng,
	linux-kernel, Linus Torvalds, Yafang Shao, Steven Rostedt
  Cc: David Laight

From: David Laight <david.laight.linux@gmail.com>

This is a slightly edited copy of v2 from 2 years ago.
I've re-read the comments (on v1 and v2).
Patch #3 now unconditionally calls decode_cpu() when stabilizing @prev
(I'm not at all sure the cpu number can ever be unchanged.)
Patch #4 now converts almost all the cpu numbers to 'unsigned int'.

For patch #2 I've found a note that:
kernel test robot noticed a 10.7% improvement of stress-ng.netlink-task.ops_per_sec

Notes from v2:
Patch #1 is the node->locked part of v1's patch #2.

Patch #2 removes the pretty much guaranteed cache line reload getting
the cpu number (from node->prev) for the vcpu_is_preempted() check.
It is (basically) the old #5 with the addition of a READ_ONCE()
and leaving the '+ 1' offset (for patch 3).

Patch #3 ends up removing both node->cpu and node->prev.
This avoids the need to initialise node->cpu.
Basically node->cpu was only ever read as node->prev->cpu in the unqueue code.
Most of the time it is the value read from lock->tail that was used to
obtain 'prev' in the first place.
The only time it is different is in the unlock race path where 'prev'
is re-read from node->prev - updated right at the bottom of osq_lock().
So the updated node->prev_cpu can be used (and prev obtained from it) without
worrying about only one of node->prev and node->prev_cpu being updated.

Linus did suggest just saving the cpu numbers instead of pointers.
It actually works for 'prev' but not 'next'.

Patch #5 removes the unnecessary node->next = NULL
assignment from the top of osq_lock().

Patch #4 just stops gcc using two separate instructions to decrement
the offset cpu number and then convert it to 64 bits.
Linus got annoyed with it, and I'd spotted it as well.
I don't seem to be able to get gcc to convert __per_cpu_offset[cpu - 1]
to (__per_cpu_offset - 1)[cpu] (cpu is offset by one) but, in any case,
it would still need zero extending in the common case.

David Laight (5):
  Defer clearing node->locked until the slow osq_lock() path.
  Optimise vcpu_is_preempted() check.
  Use node->prev_cpu instead of saving node->prev.
  Optimise decode_cpu() and per_cpu_ptr().
  Avoid writing to node->next in the osq_lock() fast path.

 kernel/locking/osq_lock.c | 56 +++++++++++++++++++--------------------
 1 file changed, 27 insertions(+), 29 deletions(-)

-- 
2.39.5


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH v3 next 1/5] Only clear node->locked in the slow osq_lock() path
  2026-03-06 22:51 [PATCH v3 next 0/5] locking/osq_lock: Optimisations to osq_lock code david.laight.linux
@ 2026-03-06 22:51 ` david.laight.linux
  2026-03-06 23:01   ` David Laight
  2026-03-06 22:51 ` [PATCH v3 next 2/5] Optimise vcpu_is_preempted() check david.laight.linux
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 20+ messages in thread
From: david.laight.linux @ 2026-03-06 22:51 UTC (permalink / raw)
  To: Waiman Long, Peter Zijlstra, Ingo Molnar, Will Deacon, Boqun Feng,
	linux-kernel, Linus Torvalds, Yafang Shao, Steven Rostedt
  Cc: David Laight, David Laight

From: David Laight <david.laight@aculab.com>

node->locked is used to indicate that the owner of the lock has handed it
off to the waiting CPU.
As such its value is only relevant in the slow path, so it need not be
initialised in the fast path.

Signed-off-by: David Laight <david.laight.linux@gmail.com>
---
 kernel/locking/osq_lock.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/locking/osq_lock.c b/kernel/locking/osq_lock.c
index b4233dc2c2b0..96c6094157b5 100644
--- a/kernel/locking/osq_lock.c
+++ b/kernel/locking/osq_lock.c
@@ -97,7 +97,6 @@ bool osq_lock(struct optimistic_spin_queue *lock)
 	int curr = encode_cpu(smp_processor_id());
 	int old;
 
-	node->locked = 0;
 	node->next = NULL;
 	node->cpu = curr;
 
@@ -113,6 +112,7 @@ bool osq_lock(struct optimistic_spin_queue *lock)
 
 	prev = decode_cpu(old);
 	node->prev = prev;
+	node->locked = 0;
 
 	/*
 	 * osq_lock()			unqueue
-- 
2.39.5


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v3 next 2/5] Optimise vcpu_is_preempted() check
  2026-03-06 22:51 [PATCH v3 next 0/5] locking/osq_lock: Optimisations to osq_lock code david.laight.linux
  2026-03-06 22:51 ` [PATCH v3 next 1/5] Only clear node->locked in the slow osq_lock() path david.laight.linux
@ 2026-03-06 22:51 ` david.laight.linux
  2026-03-06 23:01   ` David Laight
  2026-03-06 23:03   ` David Laight
  2026-03-06 22:51 ` [PATCH v3 next 3/5] Use node->prev_cpu instead of saving node->prev david.laight.linux
                   ` (3 subsequent siblings)
  5 siblings, 2 replies; 20+ messages in thread
From: david.laight.linux @ 2026-03-06 22:51 UTC (permalink / raw)
  To: Waiman Long, Peter Zijlstra, Ingo Molnar, Will Deacon, Boqun Feng,
	linux-kernel, Linus Torvalds, Yafang Shao, Steven Rostedt
  Cc: David Laight

From: David Laight <david.laight.linux@gmail.com>

The vcpu_is_preempted() test stops osq_lock() spinning if a virtual
CPU is no longer running.
Although patched out for bare-metal, the code still needs the CPU number.
Reading this from 'prev->cpu' is pretty much guaranteed to cause a cache
miss when osq_unlock() is waking up the next CPU.

Instead save 'prev->cpu' in 'node->prev_cpu' and use that value.
Update it in the osq_lock() 'unqueue' path when 'node->prev' is changed.

Signed-off-by: David Laight <david.laight.linux@gmail.com>
---
 kernel/locking/osq_lock.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/kernel/locking/osq_lock.c b/kernel/locking/osq_lock.c
index 96c6094157b5..0e1c7d11b6c0 100644
--- a/kernel/locking/osq_lock.c
+++ b/kernel/locking/osq_lock.c
@@ -16,6 +16,7 @@ struct optimistic_spin_node {
 	struct optimistic_spin_node *next, *prev;
 	int locked; /* 1 if lock acquired */
 	int cpu; /* encoded CPU # + 1 value */
+	int prev_cpu; /* encoded CPU # + 1 value */
 };
 
 static DEFINE_PER_CPU_SHARED_ALIGNED(struct optimistic_spin_node, osq_node);
@@ -29,9 +30,9 @@ static inline int encode_cpu(int cpu_nr)
 	return cpu_nr + 1;
 }
 
-static inline int node_cpu(struct optimistic_spin_node *node)
+static inline int prev_cpu_nr(struct optimistic_spin_node *node)
 {
-	return node->cpu - 1;
+	return READ_ONCE(node->prev_cpu) - 1;
 }
 
 static inline struct optimistic_spin_node *decode_cpu(int encoded_cpu_val)
@@ -110,6 +111,7 @@ bool osq_lock(struct optimistic_spin_queue *lock)
 	if (old == OSQ_UNLOCKED_VAL)
 		return true;
 
+	WRITE_ONCE(node->prev_cpu, old);
 	prev = decode_cpu(old);
 	node->prev = prev;
 	node->locked = 0;
@@ -144,7 +146,7 @@ bool osq_lock(struct optimistic_spin_queue *lock)
 	 * polling, be careful.
 	 */
 	if (smp_cond_load_relaxed(&node->locked, VAL || need_resched() ||
-				  vcpu_is_preempted(node_cpu(node->prev))))
+				  vcpu_is_preempted(prev_cpu_nr(node))))
 		return true;
 
 	/* unqueue */
@@ -201,6 +203,7 @@ bool osq_lock(struct optimistic_spin_queue *lock)
 	 * it will wait in Step-A.
 	 */
 
+	WRITE_ONCE(next->prev_cpu, prev->cpu);
 	WRITE_ONCE(next->prev, prev);
 	WRITE_ONCE(prev->next, next);
 
-- 
2.39.5


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v3 next 3/5] Use node->prev_cpu instead of saving node->prev
  2026-03-06 22:51 [PATCH v3 next 0/5] locking/osq_lock: Optimisations to osq_lock code david.laight.linux
  2026-03-06 22:51 ` [PATCH v3 next 1/5] Only clear node->locked in the slow osq_lock() path david.laight.linux
  2026-03-06 22:51 ` [PATCH v3 next 2/5] Optimise vcpu_is_preempted() check david.laight.linux
@ 2026-03-06 22:51 ` david.laight.linux
  2026-03-06 23:01   ` David Laight
  2026-03-06 23:03   ` David Laight
  2026-03-06 22:51 ` [PATCH v3 next 4/5] Optimise decode_cpu() and per_cpu_ptr() david.laight.linux
                   ` (2 subsequent siblings)
  5 siblings, 2 replies; 20+ messages in thread
From: david.laight.linux @ 2026-03-06 22:51 UTC (permalink / raw)
  To: Waiman Long, Peter Zijlstra, Ingo Molnar, Will Deacon, Boqun Feng,
	linux-kernel, Linus Torvalds, Yafang Shao, Steven Rostedt
  Cc: David Laight

From: David Laight <david.laight.linux@gmail.com>

node->prev is only used to update 'prev' in the unlikely case
of concurrent unqueues.
The new 'prev' pointer can be obtained from prev_cpu.

node->cpu (or, more particularly, prev->cpu) is only used for the
osq_wait_next() call in the unqueue path.
Normally this is exactly the value that the initial xchg() read
from lock->tail (used to obtain 'prev'), but can get updated
by concurrent unqueues.

Both the 'prev' and 'cpu' members of optimistic_spin_node are
now unused and can be deleted.

Signed-off-by: David Laight <david.laight.linux@gmail.com>
---
 kernel/locking/osq_lock.c | 31 ++++++++++++++-----------------
 1 file changed, 14 insertions(+), 17 deletions(-)

diff --git a/kernel/locking/osq_lock.c b/kernel/locking/osq_lock.c
index 0e1c7d11b6c0..5dd7e08d4fda 100644
--- a/kernel/locking/osq_lock.c
+++ b/kernel/locking/osq_lock.c
@@ -13,9 +13,8 @@
  */
 
 struct optimistic_spin_node {
-	struct optimistic_spin_node *next, *prev;
+	struct optimistic_spin_node *next;
 	int locked; /* 1 if lock acquired */
-	int cpu; /* encoded CPU # + 1 value */
 	int prev_cpu; /* encoded CPU # + 1 value */
 };
 
@@ -96,10 +95,9 @@ bool osq_lock(struct optimistic_spin_queue *lock)
 	struct optimistic_spin_node *node = this_cpu_ptr(&osq_node);
 	struct optimistic_spin_node *prev, *next;
 	int curr = encode_cpu(smp_processor_id());
-	int old;
+	int prev_cpu;
 
 	node->next = NULL;
-	node->cpu = curr;
 
 	/*
 	 * We need both ACQUIRE (pairs with corresponding RELEASE in
@@ -107,23 +105,22 @@ bool osq_lock(struct optimistic_spin_queue *lock)
 	 * the node fields we just initialised) semantics when updating
 	 * the lock tail.
 	 */
-	old = atomic_xchg(&lock->tail, curr);
-	if (old == OSQ_UNLOCKED_VAL)
+	prev_cpu = atomic_xchg(&lock->tail, curr);
+	if (prev_cpu == OSQ_UNLOCKED_VAL)
 		return true;
 
-	WRITE_ONCE(node->prev_cpu, old);
-	prev = decode_cpu(old);
-	node->prev = prev;
+	WRITE_ONCE(node->prev_cpu, prev_cpu);
+	prev = decode_cpu(prev_cpu);
 	node->locked = 0;
 
 	/*
 	 * osq_lock()			unqueue
 	 *
-	 * node->prev = prev		osq_wait_next()
+	 * node->prev_cpu = prev_cpu	osq_wait_next()
 	 * WMB				MB
-	 * prev->next = node		next->prev = prev // unqueue-C
+	 * prev->next = node		next->prev_cpu = prev_cpu // unqueue-C
 	 *
-	 * Here 'node->prev' and 'next->prev' are the same variable and we need
+	 * Here 'node->prev_cpu' and 'next->prev_cpu' are the same variable and we need
 	 * to ensure these stores happen in-order to avoid corrupting the list.
 	 */
 	smp_wmb();
@@ -179,9 +176,10 @@ bool osq_lock(struct optimistic_spin_queue *lock)
 
 		/*
 		 * Or we race against a concurrent unqueue()'s step-B, in which
-		 * case its step-C will write us a new @node->prev pointer.
+		 * case its step-C will write us a new @node->prev_cpu value.
 		 */
-		prev = READ_ONCE(node->prev);
+		prev_cpu = READ_ONCE(node->prev_cpu);
+		prev = decode_cpu(prev_cpu);
 	}
 
 	/*
@@ -191,7 +189,7 @@ bool osq_lock(struct optimistic_spin_queue *lock)
 	 * back to @prev.
 	 */
 
-	next = osq_wait_next(lock, node, prev->cpu);
+	next = osq_wait_next(lock, node, prev_cpu);
 	if (!next)
 		return false;
 
@@ -203,8 +201,7 @@ bool osq_lock(struct optimistic_spin_queue *lock)
 	 * it will wait in Step-A.
 	 */
 
-	WRITE_ONCE(next->prev_cpu, prev->cpu);
-	WRITE_ONCE(next->prev, prev);
+	WRITE_ONCE(next->prev_cpu, prev_cpu);
 	WRITE_ONCE(prev->next, next);
 
 	return false;
-- 
2.39.5


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v3 next 4/5] Optimise decode_cpu() and per_cpu_ptr()
  2026-03-06 22:51 [PATCH v3 next 0/5] locking/osq_lock: Optimisations to osq_lock code david.laight.linux
                   ` (2 preceding siblings ...)
  2026-03-06 22:51 ` [PATCH v3 next 3/5] Use node->prev_cpu instead of saving node->prev david.laight.linux
@ 2026-03-06 22:51 ` david.laight.linux
  2026-03-06 23:01   ` David Laight
  2026-03-06 23:03   ` David Laight
  2026-03-06 22:51 ` [PATCH v3 next 5/5] Avoid writing to node->next in the osq_lock() fast path david.laight.linux
  2026-03-06 22:59 ` [PATCH v3 next 0/5] locking/osq_lock: Optimisations to osq_lock code David Laight
  5 siblings, 2 replies; 20+ messages in thread
From: david.laight.linux @ 2026-03-06 22:51 UTC (permalink / raw)
  To: Waiman Long, Peter Zijlstra, Ingo Molnar, Will Deacon, Boqun Feng,
	linux-kernel, Linus Torvalds, Yafang Shao, Steven Rostedt
  Cc: David Laight

From: David Laight <david.laight.linux@gmail.com>

Changing the 'cpu number' variables to 'unsigned int' generates
slightly better code (and the values can never be negative).

More specifically, gcc knows that decrementing the 'encoded' value
zeros the high 32 bits (on sane 64-bit architectures), so it doesn't
need to zero/sign extend the value to index __per_cpu_offset[].

Not a massive win, but it saves two instructions.

Signed-off-by: David Laight <david.laight.linux@gmail.com>
---

Proposed by Linus.
Part of a discussion from v1 about whether removing the offset would help.

 kernel/locking/osq_lock.c | 18 +++++++++---------
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/kernel/locking/osq_lock.c b/kernel/locking/osq_lock.c
index 5dd7e08d4fda..0619691e2756 100644
--- a/kernel/locking/osq_lock.c
+++ b/kernel/locking/osq_lock.c
@@ -15,7 +15,7 @@
 struct optimistic_spin_node {
 	struct optimistic_spin_node *next;
 	int locked; /* 1 if lock acquired */
-	int prev_cpu; /* encoded CPU # + 1 value */
+	unsigned int prev_cpu; /* encoded CPU # + 1 value */
 };
 
 static DEFINE_PER_CPU_SHARED_ALIGNED(struct optimistic_spin_node, osq_node);
@@ -24,19 +24,19 @@ static DEFINE_PER_CPU_SHARED_ALIGNED(struct optimistic_spin_node, osq_node);
  * We use the value 0 to represent "no CPU", thus the encoded value
  * will be the CPU number incremented by 1.
  */
-static inline int encode_cpu(int cpu_nr)
+static inline unsigned int encode_cpu(unsigned int cpu_nr)
 {
 	return cpu_nr + 1;
 }
 
-static inline int prev_cpu_nr(struct optimistic_spin_node *node)
+static inline unsigned int prev_cpu_nr(struct optimistic_spin_node *node)
 {
 	return READ_ONCE(node->prev_cpu) - 1;
 }
 
-static inline struct optimistic_spin_node *decode_cpu(int encoded_cpu_val)
+static inline struct optimistic_spin_node *decode_cpu(unsigned int encoded_cpu_val)
 {
-	int cpu_nr = encoded_cpu_val - 1;
+	unsigned int cpu_nr = encoded_cpu_val - 1;
 
 	return per_cpu_ptr(&osq_node, cpu_nr);
 }
@@ -53,9 +53,9 @@ static inline struct optimistic_spin_node *decode_cpu(int encoded_cpu_val)
 static inline struct optimistic_spin_node *
 osq_wait_next(struct optimistic_spin_queue *lock,
 	      struct optimistic_spin_node *node,
-	      int old_cpu)
+	      unsigned int old_cpu)
 {
-	int curr = encode_cpu(smp_processor_id());
+	unsigned int curr = encode_cpu(smp_processor_id());
 
 	for (;;) {
 		if (atomic_read(&lock->tail) == curr &&
@@ -94,8 +94,8 @@ bool osq_lock(struct optimistic_spin_queue *lock)
 {
 	struct optimistic_spin_node *node = this_cpu_ptr(&osq_node);
 	struct optimistic_spin_node *prev, *next;
-	int curr = encode_cpu(smp_processor_id());
-	int prev_cpu;
+	unsigned int curr = encode_cpu(smp_processor_id());
+	unsigned int prev_cpu;
 
 	node->next = NULL;
 
-- 
2.39.5


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v3 next 5/5] Avoid writing to node->next in the osq_lock() fast path
  2026-03-06 22:51 [PATCH v3 next 0/5] locking/osq_lock: Optimisations to osq_lock code david.laight.linux
                   ` (3 preceding siblings ...)
  2026-03-06 22:51 ` [PATCH v3 next 4/5] Optimise decode_cpu() and per_cpu_ptr() david.laight.linux
@ 2026-03-06 22:51 ` david.laight.linux
  2026-03-06 23:04   ` David Laight
                     ` (2 more replies)
  2026-03-06 22:59 ` [PATCH v3 next 0/5] locking/osq_lock: Optimisations to osq_lock code David Laight
  5 siblings, 3 replies; 20+ messages in thread
From: david.laight.linux @ 2026-03-06 22:51 UTC (permalink / raw)
  To: Waiman Long, Peter Zijlstra, Ingo Molnar, Will Deacon, Boqun Feng,
	linux-kernel, Linus Torvalds, Yafang Shao, Steven Rostedt
  Cc: David Laight

From: David Laight <david.laight.linux@gmail.com>

Static analysis shows that node->next should always be NULL when
osq_lock() returns false or when osq_unlock() returns.
This means that it isn't necessary to explicitly set it to NULL
prior to atomic_xchg(&lock->tail, curr) on entry to osq_lock().

Defer determining the address of the CPU's 'node' until after the
atomic_xchg() so that it isn't done in the uncontended path.

Signed-off-by: David Laight <david.laight.linux@gmail.com>
---
 kernel/locking/osq_lock.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/kernel/locking/osq_lock.c b/kernel/locking/osq_lock.c
index 0619691e2756..3f0cfdf1cd0f 100644
--- a/kernel/locking/osq_lock.c
+++ b/kernel/locking/osq_lock.c
@@ -92,13 +92,10 @@ osq_wait_next(struct optimistic_spin_queue *lock,
 
 bool osq_lock(struct optimistic_spin_queue *lock)
 {
-	struct optimistic_spin_node *node = this_cpu_ptr(&osq_node);
-	struct optimistic_spin_node *prev, *next;
+	struct optimistic_spin_node *node, *prev, *next;
 	unsigned int curr = encode_cpu(smp_processor_id());
 	unsigned int prev_cpu;
 
-	node->next = NULL;
-
 	/*
 	 * We need both ACQUIRE (pairs with corresponding RELEASE in
 	 * unlock() uncontended, or fastpath) and RELEASE (to publish
@@ -109,6 +106,7 @@ bool osq_lock(struct optimistic_spin_queue *lock)
 	if (prev_cpu == OSQ_UNLOCKED_VAL)
 		return true;
 
+	node = this_cpu_ptr(&osq_node);
 	WRITE_ONCE(node->prev_cpu, prev_cpu);
 	prev = decode_cpu(prev_cpu);
 	node->locked = 0;
-- 
2.39.5


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH v3 next 0/5] locking/osq_lock: Optimisations to osq_lock code
  2026-03-06 22:51 [PATCH v3 next 0/5] locking/osq_lock: Optimisations to osq_lock code david.laight.linux
                   ` (4 preceding siblings ...)
  2026-03-06 22:51 ` [PATCH v3 next 5/5] Avoid writing to node->next in the osq_lock() fast path david.laight.linux
@ 2026-03-06 22:59 ` David Laight
  5 siblings, 0 replies; 20+ messages in thread
From: David Laight @ 2026-03-06 22:59 UTC (permalink / raw)
  To: Waiman Long, Peter Zijlstra, Ingo Molnar, Will Deacon, Boqun Feng,
	linux-kernel, Linus Torvalds, Yafang Shao, Steven Rostedt

On Fri,  6 Mar 2026 22:51:45 +0000
david.laight.linux@gmail.com wrote:

Apologies to Yafang for mistyping his address....


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v3 next 1/5] Only clear node->locked in the slow osq_lock() path
  2026-03-06 22:51 ` [PATCH v3 next 1/5] Only clear node->locked in the slow osq_lock() path david.laight.linux
@ 2026-03-06 23:01   ` David Laight
  0 siblings, 0 replies; 20+ messages in thread
From: David Laight @ 2026-03-06 23:01 UTC (permalink / raw)
  To: Waiman Long, Peter Zijlstra, Ingo Molnar, Will Deacon, Boqun Feng,
	linux-kernel, Linus Torvalds, Yafang Shao, Steven Rostedt

On Fri,  6 Mar 2026 22:51:46 +0000
david.laight.linux@gmail.com wrote:

Apologies to Yafang for mistyping his address...


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v3 next 2/5] Optimise vcpu_is_preempted() check
  2026-03-06 22:51 ` [PATCH v3 next 2/5] Optimise vcpu_is_preempted() check david.laight.linux
@ 2026-03-06 23:01   ` David Laight
  2026-03-06 23:03   ` David Laight
  1 sibling, 0 replies; 20+ messages in thread
From: David Laight @ 2026-03-06 23:01 UTC (permalink / raw)
  To: Waiman Long, Peter Zijlstra, Ingo Molnar, Will Deacon, Boqun Feng,
	linux-kernel, Linus Torvalds, Yafang Shao, Steven Rostedt

On Fri,  6 Mar 2026 22:51:47 +0000
david.laight.linux@gmail.com wrote:

Apologies to Yafang for mistyping his address...


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v3 next 3/5] Use node->prev_cpu instead of saving node->prev
  2026-03-06 22:51 ` [PATCH v3 next 3/5] Use node->prev_cpu instead of saving node->prev david.laight.linux
@ 2026-03-06 23:01   ` David Laight
  2026-03-06 23:03   ` David Laight
  1 sibling, 0 replies; 20+ messages in thread
From: David Laight @ 2026-03-06 23:01 UTC (permalink / raw)
  To: Waiman Long, Peter Zijlstra, Ingo Molnar, Will Deacon, Boqun Feng,
	linux-kernel, Linus Torvalds, Yafang Shao, Steven Rostedt

On Fri,  6 Mar 2026 22:51:48 +0000
david.laight.linux@gmail.com wrote:

Apologies to Yafang for mistyping his address...


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v3 next 4/5] Optimise decode_cpu() and per_cpu_ptr()
  2026-03-06 22:51 ` [PATCH v3 next 4/5] Optimise decode_cpu() and per_cpu_ptr() david.laight.linux
@ 2026-03-06 23:01   ` David Laight
  2026-03-06 23:03   ` David Laight
  1 sibling, 0 replies; 20+ messages in thread
From: David Laight @ 2026-03-06 23:01 UTC (permalink / raw)
  To: Waiman Long, Peter Zijlstra, Ingo Molnar, Will Deacon, Boqun Feng,
	linux-kernel, Linus Torvalds, Yafang Shao, Steven Rostedt

On Fri,  6 Mar 2026 22:51:49 +0000
david.laight.linux@gmail.com wrote:

Apologies to Yafang for mistyping his address...

> -	int prev_cpu;
> +	unsigned int curr = encode_cpu(smp_processor_id());
> +	unsigned int prev_cpu;
>  
>  	node->next = NULL;
>  



* Re: [PATCH v3 next 2/5] Optimise vcpu_is_preempted() check
  2026-03-06 22:51 ` [PATCH v3 next 2/5] Optimise vcpu_is_preempted() check david.laight.linux
  2026-03-06 23:01   ` David Laight
@ 2026-03-06 23:03   ` David Laight
  1 sibling, 0 replies; 20+ messages in thread
From: David Laight @ 2026-03-06 23:03 UTC (permalink / raw)
  To: Waiman Long, Peter Zijlstra, Ingo Molnar, Will Deacon, Boqun Feng,
	linux-kernel, Linus Torvalds, Yafang Shao, Steven Rostedt

On Fri,  6 Mar 2026 22:51:47 +0000
david.laight.linux@gmail.com wrote:

Apologies to Yafang for mistyping his address....
(and actually corrected this time - it's getting late)

> From: David Laight <david.laight.linux@gmail.com>
> 
> The vcpu_is_preempted() test stops osq_lock() spinning if a virtual
> CPU is no longer running.
> Although patched out for bare-metal, the code still needs the CPU number.
> Reading this from 'prev->cpu' is pretty much guaranteed to cause a cache
> miss when osq_unlock() is waking up the next cpu.
> 
> Instead save 'prev->cpu' in 'node->prev_cpu' and use that value instead.
> Update in the osq_lock() 'unqueue' path when 'node->prev' is changed.
> 
> Signed-off-by: David Laight <david.laight.linux@gmail.com>
> ---
>  kernel/locking/osq_lock.c | 9 ++++++---
>  1 file changed, 6 insertions(+), 3 deletions(-)
> 
> diff --git a/kernel/locking/osq_lock.c b/kernel/locking/osq_lock.c
> index 96c6094157b5..0e1c7d11b6c0 100644
> --- a/kernel/locking/osq_lock.c
> +++ b/kernel/locking/osq_lock.c
> @@ -16,6 +16,7 @@ struct optimistic_spin_node {
>  	struct optimistic_spin_node *next, *prev;
>  	int locked; /* 1 if lock acquired */
>  	int cpu; /* encoded CPU # + 1 value */
> +	int prev_cpu; /* encoded CPU # + 1 value */
>  };
>  
>  static DEFINE_PER_CPU_SHARED_ALIGNED(struct optimistic_spin_node, osq_node);
> @@ -29,9 +30,9 @@ static inline int encode_cpu(int cpu_nr)
>  	return cpu_nr + 1;
>  }
>  
> -static inline int node_cpu(struct optimistic_spin_node *node)
> +static inline int prev_cpu_nr(struct optimistic_spin_node *node)
>  {
> -	return node->cpu - 1;
> +	return READ_ONCE(node->prev_cpu) - 1;
>  }
>  
>  static inline struct optimistic_spin_node *decode_cpu(int encoded_cpu_val)
> @@ -110,6 +111,7 @@ bool osq_lock(struct optimistic_spin_queue *lock)
>  	if (old == OSQ_UNLOCKED_VAL)
>  		return true;
>  
> +	WRITE_ONCE(node->prev_cpu, old);
>  	prev = decode_cpu(old);
>  	node->prev = prev;
>  	node->locked = 0;
> @@ -144,7 +146,7 @@ bool osq_lock(struct optimistic_spin_queue *lock)
>  	 * polling, be careful.
>  	 */
>  	if (smp_cond_load_relaxed(&node->locked, VAL || need_resched() ||
> -				  vcpu_is_preempted(node_cpu(node->prev))))
> +				  vcpu_is_preempted(prev_cpu_nr(node))))
>  		return true;
>  
>  	/* unqueue */
> @@ -201,6 +203,7 @@ bool osq_lock(struct optimistic_spin_queue *lock)
>  	 * it will wait in Step-A.
>  	 */
>  
> +	WRITE_ONCE(next->prev_cpu, prev->cpu);
>  	WRITE_ONCE(next->prev, prev);
>  	WRITE_ONCE(prev->next, next);
>  



* Re: [PATCH v3 next 3/5] Use node->prev_cpu instead of saving node->prev
  2026-03-06 22:51 ` [PATCH v3 next 3/5] Use node->prev_cpu instead of saving node->prev david.laight.linux
  2026-03-06 23:01   ` David Laight
@ 2026-03-06 23:03   ` David Laight
  1 sibling, 0 replies; 20+ messages in thread
From: David Laight @ 2026-03-06 23:03 UTC (permalink / raw)
  To: Waiman Long, Peter Zijlstra, Ingo Molnar, Will Deacon, Boqun Feng,
	linux-kernel, Linus Torvalds, Yafang Shao, Steven Rostedt

On Fri,  6 Mar 2026 22:51:48 +0000
david.laight.linux@gmail.com wrote:

Apologies to Yafang for mistyping his address....

> From: David Laight <david.laight.linux@gmail.com>
> 
> node->prev is only used to update 'prev' in the unlikely case
> of concurrent unqueues.
> The new 'prev' pointer can be obtained from prev_cpu.
> 
> node->cpu (or, more particularly, prev->cpu) is only used for the
> osq_wait_next() call in the unqueue path.
> Normally this is exactly the value that the initial xchg() read
> from lock->tail (used to obtain 'prev'), but can get updated
> by concurrent unqueues.
> 
> Both the 'prev' and 'cpu' members of optimistic_spin_node are
> now unused and can be deleted.
> 
> Signed-off-by: David Laight <david.laight.linux@gmail.com>
> ---
>  kernel/locking/osq_lock.c | 31 ++++++++++++++-----------------
>  1 file changed, 14 insertions(+), 17 deletions(-)
> 
> diff --git a/kernel/locking/osq_lock.c b/kernel/locking/osq_lock.c
> index 0e1c7d11b6c0..5dd7e08d4fda 100644
> --- a/kernel/locking/osq_lock.c
> +++ b/kernel/locking/osq_lock.c
> @@ -13,9 +13,8 @@
>   */
>  
>  struct optimistic_spin_node {
> -	struct optimistic_spin_node *next, *prev;
> +	struct optimistic_spin_node *next;
>  	int locked; /* 1 if lock acquired */
> -	int cpu; /* encoded CPU # + 1 value */
>  	int prev_cpu; /* encoded CPU # + 1 value */
>  };
>  
> @@ -96,10 +95,9 @@ bool osq_lock(struct optimistic_spin_queue *lock)
>  	struct optimistic_spin_node *node = this_cpu_ptr(&osq_node);
>  	struct optimistic_spin_node *prev, *next;
>  	int curr = encode_cpu(smp_processor_id());
> -	int old;
> +	int prev_cpu;
>  
>  	node->next = NULL;
> -	node->cpu = curr;
>  
>  	/*
>  	 * We need both ACQUIRE (pairs with corresponding RELEASE in
> @@ -107,23 +105,22 @@ bool osq_lock(struct optimistic_spin_queue *lock)
>  	 * the node fields we just initialised) semantics when updating
>  	 * the lock tail.
>  	 */
> -	old = atomic_xchg(&lock->tail, curr);
> -	if (old == OSQ_UNLOCKED_VAL)
> +	prev_cpu = atomic_xchg(&lock->tail, curr);
> +	if (prev_cpu == OSQ_UNLOCKED_VAL)
>  		return true;
>  
> -	WRITE_ONCE(node->prev_cpu, old);
> -	prev = decode_cpu(old);
> -	node->prev = prev;
> +	WRITE_ONCE(node->prev_cpu, prev_cpu);
> +	prev = decode_cpu(prev_cpu);
>  	node->locked = 0;
>  
>  	/*
>  	 * osq_lock()			unqueue
>  	 *
> -	 * node->prev = prev		osq_wait_next()
> +	 * node->prev_cpu = prev_cpu	osq_wait_next()
>  	 * WMB				MB
> -	 * prev->next = node		next->prev = prev // unqueue-C
> +	 * prev->next = node		next->prev_cpu = prev_cpu // unqueue-C
>  	 *
> -	 * Here 'node->prev' and 'next->prev' are the same variable and we need
> +	 * Here 'node->prev_cpu' and 'next->prev_cpu' are the same variable and we need
>  	 * to ensure these stores happen in-order to avoid corrupting the list.
>  	 */
>  	smp_wmb();
> @@ -179,9 +176,10 @@ bool osq_lock(struct optimistic_spin_queue *lock)
>  
>  		/*
>  		 * Or we race against a concurrent unqueue()'s step-B, in which
> -		 * case its step-C will write us a new @node->prev pointer.
> +		 * case its step-C will write us a new @node->prev_cpu value.
>  		 */
> -		prev = READ_ONCE(node->prev);
> +		prev_cpu = READ_ONCE(node->prev_cpu);
> +		prev = decode_cpu(prev_cpu);
>  	}
>  
>  	/*
> @@ -191,7 +189,7 @@ bool osq_lock(struct optimistic_spin_queue *lock)
>  	 * back to @prev.
>  	 */
>  
> -	next = osq_wait_next(lock, node, prev->cpu);
> +	next = osq_wait_next(lock, node, prev_cpu);
>  	if (!next)
>  		return false;
>  
> @@ -203,8 +201,7 @@ bool osq_lock(struct optimistic_spin_queue *lock)
>  	 * it will wait in Step-A.
>  	 */
>  
> -	WRITE_ONCE(next->prev_cpu, prev->cpu);
> -	WRITE_ONCE(next->prev, prev);
> +	WRITE_ONCE(next->prev_cpu, prev_cpu);
>  	WRITE_ONCE(prev->next, next);
>  
>  	return false;



* Re: [PATCH v3 next 4/5] Optimise decode_cpu() and per_cpu_ptr()
  2026-03-06 22:51 ` [PATCH v3 next 4/5] Optimise decode_cpu() and per_cpu_ptr() david.laight.linux
  2026-03-06 23:01   ` David Laight
@ 2026-03-06 23:03   ` David Laight
  1 sibling, 0 replies; 20+ messages in thread
From: David Laight @ 2026-03-06 23:03 UTC (permalink / raw)
  To: Waiman Long, Peter Zijlstra, Ingo Molnar, Will Deacon, Boqun Feng,
	linux-kernel, Linus Torvalds, Yafang Shao, Steven Rostedt

On Fri,  6 Mar 2026 22:51:49 +0000
david.laight.linux@gmail.com wrote:

> From: David Laight <david.laight.linux@gmail.com>
> 
> Changing the 'cpu number' variables to 'unsigned int' generates
> slightly better code (and the values can never be negative).
> 
> More specifically gcc knows that decrementing the 'encoded' value
> zeros the high 32bits (on sane 64bit architectures) so that it doesn't
> need to zero/sign extend the value to index __per_cpu_offset[].
> 
> Not massive but saves two instructions.
> 
> Signed-off-by: David Laight <david.laight.linux@gmail.com>
> ---
> 
> Proposed by Linus.
> Part of a discussion from v1 about whether removing the offset would help.
> 
>  kernel/locking/osq_lock.c | 18 +++++++++---------
>  1 file changed, 9 insertions(+), 9 deletions(-)
> 
> diff --git a/kernel/locking/osq_lock.c b/kernel/locking/osq_lock.c
> index 5dd7e08d4fda..0619691e2756 100644
> --- a/kernel/locking/osq_lock.c
> +++ b/kernel/locking/osq_lock.c
> @@ -15,7 +15,7 @@
>  struct optimistic_spin_node {
>  	struct optimistic_spin_node *next;
>  	int locked; /* 1 if lock acquired */
> -	int prev_cpu; /* encoded CPU # + 1 value */
> +	unsigned int prev_cpu; /* encoded CPU # + 1 value */
>  };
>  
>  static DEFINE_PER_CPU_SHARED_ALIGNED(struct optimistic_spin_node, osq_node);
> @@ -24,19 +24,19 @@ static DEFINE_PER_CPU_SHARED_ALIGNED(struct optimistic_spin_node, osq_node);
>   * We use the value 0 to represent "no CPU", thus the encoded value
>   * will be the CPU number incremented by 1.
>   */
> -static inline int encode_cpu(int cpu_nr)
> +static inline unsigned int encode_cpu(unsigned int cpu_nr)
>  {
>  	return cpu_nr + 1;
>  }
>  
> -static inline int prev_cpu_nr(struct optimistic_spin_node *node)
> +static inline unsigned int prev_cpu_nr(struct optimistic_spin_node *node)
>  {
>  	return READ_ONCE(node->prev_cpu) - 1;
>  }
>  
> -static inline struct optimistic_spin_node *decode_cpu(int encoded_cpu_val)
> +static inline struct optimistic_spin_node *decode_cpu(unsigned int encoded_cpu_val)
>  {
> -	int cpu_nr = encoded_cpu_val - 1;
> +	unsigned int cpu_nr = encoded_cpu_val - 1;
>  
>  	return per_cpu_ptr(&osq_node, cpu_nr);
>  }
> @@ -53,9 +53,9 @@ static inline struct optimistic_spin_node *decode_cpu(int encoded_cpu_val)
>  static inline struct optimistic_spin_node *
>  osq_wait_next(struct optimistic_spin_queue *lock,
>  	      struct optimistic_spin_node *node,
> -	      int old_cpu)
> +	      unsigned int old_cpu)
>  {
> -	int curr = encode_cpu(smp_processor_id());
> +	unsigned int curr = encode_cpu(smp_processor_id());
>  
>  	for (;;) {
>  		if (atomic_read(&lock->tail) == curr &&
> @@ -94,8 +94,8 @@ bool osq_lock(struct optimistic_spin_queue *lock)
>  {
>  	struct optimistic_spin_node *node = this_cpu_ptr(&osq_node);
>  	struct optimistic_spin_node *prev, *next;
> -	int curr = encode_cpu(smp_processor_id());
> -	int prev_cpu;
> +	unsigned int curr = encode_cpu(smp_processor_id());
> +	unsigned int prev_cpu;
>  
>  	node->next = NULL;
>  



* Re: [PATCH v3 next 5/5] Avoid writing to node->next in the osq_lock() fast path
  2026-03-06 22:51 ` [PATCH v3 next 5/5] Avoid writing to node->next in the osq_lock() fast path david.laight.linux
@ 2026-03-06 23:04   ` David Laight
  2026-03-07  0:06   ` Linus Torvalds
  2026-03-11 19:27   ` Waiman Long
  2 siblings, 0 replies; 20+ messages in thread
From: David Laight @ 2026-03-06 23:04 UTC (permalink / raw)
  To: Waiman Long, Peter Zijlstra, Ingo Molnar, Will Deacon, Boqun Feng,
	linux-kernel, Linus Torvalds, Yafang Shao, Steven Rostedt

On Fri,  6 Mar 2026 22:51:50 +0000
david.laight.linux@gmail.com wrote:

> From: David Laight <david.laight.linux@gmail.com>
> 
> When osq_lock() returns false or osq_unlock() returns, static
> analysis shows that node->next should always be NULL.
> This means that it isn't necessary to explicitly set it to NULL
> prior to atomic_xchg(&lock->tail, curr) on entry to osq_lock().
> 
> Defer determining the address of the CPU's 'node' until after the
> atomic_xchg() so that it isn't done in the uncontended path.
> 
> Signed-off-by: David Laight <david.laight.linux@gmail.com>
> ---
>  kernel/locking/osq_lock.c | 6 ++----
>  1 file changed, 2 insertions(+), 4 deletions(-)
> 
> diff --git a/kernel/locking/osq_lock.c b/kernel/locking/osq_lock.c
> index 0619691e2756..3f0cfdf1cd0f 100644
> --- a/kernel/locking/osq_lock.c
> +++ b/kernel/locking/osq_lock.c
> @@ -92,13 +92,10 @@ osq_wait_next(struct optimistic_spin_queue *lock,
>  
>  bool osq_lock(struct optimistic_spin_queue *lock)
>  {
> -	struct optimistic_spin_node *node = this_cpu_ptr(&osq_node);
> -	struct optimistic_spin_node *prev, *next;
> +	struct optimistic_spin_node *node, *prev, *next;
>  	unsigned int curr = encode_cpu(smp_processor_id());
>  	unsigned int prev_cpu;
>  
> -	node->next = NULL;
> -
>  	/*
>  	 * We need both ACQUIRE (pairs with corresponding RELEASE in
>  	 * unlock() uncontended, or fastpath) and RELEASE (to publish
> @@ -109,6 +106,7 @@ bool osq_lock(struct optimistic_spin_queue *lock)
>  	if (prev_cpu == OSQ_UNLOCKED_VAL)
>  		return true;
>  
> +	node = this_cpu_ptr(&osq_node);
>  	WRITE_ONCE(node->prev_cpu, prev_cpu);
>  	prev = decode_cpu(prev_cpu);
>  	node->locked = 0;



* Re: [PATCH v3 next 5/5] Avoid writing to node->next in the osq_lock() fast path
  2026-03-06 22:51 ` [PATCH v3 next 5/5] Avoid writing to node->next in the osq_lock() fast path david.laight.linux
  2026-03-06 23:04   ` David Laight
@ 2026-03-07  0:06   ` Linus Torvalds
  2026-03-07 11:32     ` David Laight
  2026-03-11 19:27   ` Waiman Long
  2 siblings, 1 reply; 20+ messages in thread
From: Linus Torvalds @ 2026-03-07  0:06 UTC (permalink / raw)
  To: david.laight.linux
  Cc: Waiman Long, Peter Zijlstra, Ingo Molnar, Will Deacon, Boqun Feng,
	linux-kernel, Steven Rostedt, Yafang Shao

On Fri, 6 Mar 2026 at 14:52, <david.laight.linux@gmail.com> wrote:
>
> From: David Laight <david.laight.linux@gmail.com>
>
> > When osq_lock() returns false or osq_unlock() returns, static
> > analysis shows that node->next should always be NULL.

This explanation makes me nervous.

*What* static analysis? It's very unclear. And the "should be NULL"
doesn't make me get the warm and fuzzies.

For example, osq_unlock() does do

        node = this_cpu_ptr(&osq_node);
        next = xchg(&node->next, NULL);

so it's clearly NULL after that. But it's not obvious this will be
reached, because osq_unlock() does that

        /*
         * Fast path for the uncontended case.
         */
        if (atomic_try_cmpxchg_release(&lock->tail, &curr, OSQ_UNLOCKED_VAL))
                return;

before it actually gets to this point.

And yes, I'm very willing to believe that if we hit that fast-path,
node->next (which is "curr->next" in that path) is indeed NULL, but I
think this commit message really needs to spell it all out.

No "should be NULL", in other words. I want a rock-solid "node->next
is always NULL because XYZ" explanation, not a wishy-washy "static
analysis says" without spelling it out.

            Linus


* Re: [PATCH v3 next 5/5] Avoid writing to node->next in the osq_lock() fast path
  2026-03-07  0:06   ` Linus Torvalds
@ 2026-03-07 11:32     ` David Laight
  0 siblings, 0 replies; 20+ messages in thread
From: David Laight @ 2026-03-07 11:32 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Waiman Long, Peter Zijlstra, Ingo Molnar, Will Deacon, Boqun Feng,
	linux-kernel, Steven Rostedt, Yafang Shao

On Fri, 6 Mar 2026 16:06:25 -0800
Linus Torvalds <torvalds@linux-foundation.org> wrote:

> On Fri, 6 Mar 2026 at 14:52, <david.laight.linux@gmail.com> wrote:
> >
> > From: David Laight <david.laight.linux@gmail.com>
> >
> > > When osq_lock() returns false or osq_unlock() returns, static
> > > analysis shows that node->next should always be NULL.  
> 
> This explanation makes me nervous.
> 
> *What* static analysis? It's very unclear. And the "should be NULL"
> doesn't make me get the warm and fuzzies.

The analysis was 'my brain' about two years ago :-)

> For example, osq_unlock() does do
> 
>         node = this_cpu_ptr(&osq_node);
>         next = xchg(&node->next, NULL);
> 
> so it's clearly NULL after that. But it's not obvious this will be
> reached, because osq_unlock() does that
> 
>         /*
>          * Fast path for the uncontended case.
>          */
>         if (atomic_try_cmpxchg_release(&lock->tail, &curr, OSQ_UNLOCKED_VAL))
>                 return;
> 
> before it actually gets to this point.

That is (should be) checking that the list only contains a single node.
So the 'next' pointer can't be set.

I'll drink 10 double-espressos and read the code again.

I'll also check what happens if the code is hit by an ethernet interrupt
just after the initial xchg().
Other cpus can also acquire the lock and then get preempted (so needing
to unlink themselves) at the same time as the holder is trying to
release the lock.

It might be that the initial xchg needs to be a cmpxchg - which would
massively simplify the 'lock' path.
It does have the 'no forwards progress' issue.
I can't remember whether xchg has to be implemented as cmpxchg on any
architectures; if it does, then the current complex locking code
is unnecessary.

	David


> 
> And yes, I'm very willing to believe that if we hit that fast-path,
> node->next (which is "curr->next" in that path) is indeed NULL, but I
> think this commit message really needs to spell it all out.
> 
> No "should be NULL", in other words. I want a rock-solid "node->next
> is always NULL because XYZ" explanation, not a wishy-washy "static
> analysis says" without spelling it out.
> 
>             Linus



* Re: [PATCH v3 next 5/5] Avoid writing to node->next in the osq_lock() fast path
  2026-03-06 22:51 ` [PATCH v3 next 5/5] Avoid writing to node->next in the osq_lock() fast path david.laight.linux
  2026-03-06 23:04   ` David Laight
  2026-03-07  0:06   ` Linus Torvalds
@ 2026-03-11 19:27   ` Waiman Long
  2026-03-11 19:40     ` Waiman Long
  2026-03-11 21:50     ` David Laight
  2 siblings, 2 replies; 20+ messages in thread
From: Waiman Long @ 2026-03-11 19:27 UTC (permalink / raw)
  To: david.laight.linux, Peter Zijlstra, Ingo Molnar, Will Deacon,
	Boqun Feng, linux-kernel, Linus Torvalds, Steven Rostedt,
	Yafang Shao

On 3/6/26 5:51 PM, david.laight.linux@gmail.com wrote:
> From: David Laight <david.laight.linux@gmail.com>
>
> When osq_lock() returns false or osq_unlock() returns, static
> analysis shows that node->next should always be NULL.
> This means that it isn't necessary to explicitly set it to NULL
> prior to atomic_xchg(&lock->tail, curr) on entry to osq_lock().
>
> Defer determining the address of the CPU's 'node' until after the
> atomic_xchg() so that it isn't done in the uncontended path.
>
> Signed-off-by: David Laight <david.laight.linux@gmail.com>
> ---
>   kernel/locking/osq_lock.c | 6 ++----
>   1 file changed, 2 insertions(+), 4 deletions(-)
>
> diff --git a/kernel/locking/osq_lock.c b/kernel/locking/osq_lock.c
> index 0619691e2756..3f0cfdf1cd0f 100644
> --- a/kernel/locking/osq_lock.c
> +++ b/kernel/locking/osq_lock.c
> @@ -92,13 +92,10 @@ osq_wait_next(struct optimistic_spin_queue *lock,
>   
>   bool osq_lock(struct optimistic_spin_queue *lock)
>   {
> -	struct optimistic_spin_node *node = this_cpu_ptr(&osq_node);
> -	struct optimistic_spin_node *prev, *next;
> +	struct optimistic_spin_node *node, *prev, *next;
>   	unsigned int curr = encode_cpu(smp_processor_id());
>   	unsigned int prev_cpu;
>   
> -	node->next = NULL;

Although it does look like node->next should always be NULL when 
entering osq_lock(), any future change may invalidate this assumption. I 
know you want to not touch the osq_node cacheline on fast path, but we 
will need a big comment here to explicitly spell out this assumption to 
make sure that we won't break it in the future.

BTW, how much performance gain have you measured with this change? Can
we just leave it there to be safe?

> -
>   	/*
>   	 * We need both ACQUIRE (pairs with corresponding RELEASE in
>   	 * unlock() uncontended, or fastpath) and RELEASE (to publish
> @@ -109,6 +106,7 @@ bool osq_lock(struct optimistic_spin_queue *lock)
>   	if (prev_cpu == OSQ_UNLOCKED_VAL)
>   		return true;
>   
> +	node = this_cpu_ptr(&osq_node);
>   	WRITE_ONCE(node->prev_cpu, prev_cpu);
>   	prev = decode_cpu(prev_cpu);
>   	node->locked = 0;

I am fine with moving the initialization here. The other patches also 
look good to me.

Cheers,
Longman



* Re: [PATCH v3 next 5/5] Avoid writing to node->next in the osq_lock() fast path
  2026-03-11 19:27   ` Waiman Long
@ 2026-03-11 19:40     ` Waiman Long
  2026-03-11 21:50     ` David Laight
  1 sibling, 0 replies; 20+ messages in thread
From: Waiman Long @ 2026-03-11 19:40 UTC (permalink / raw)
  To: david.laight.linux, Peter Zijlstra, Ingo Molnar, Will Deacon,
	Boqun Feng, linux-kernel, Linus Torvalds, Steven Rostedt,
	Yafang Shao

On 3/11/26 3:27 PM, Waiman Long wrote:
> On 3/6/26 5:51 PM, david.laight.linux@gmail.com wrote:
>> From: David Laight <david.laight.linux@gmail.com>
>>
>> When osq_lock() returns false or osq_unlock() returns, static
>> analysis shows that node->next should always be NULL.
>> This means that it isn't necessary to explicitly set it to NULL
>> prior to atomic_xchg(&lock->tail, curr) on entry to osq_lock().
>>
>> Defer determining the address of the CPU's 'node' until after the
>> atomic_xchg() so that it isn't done in the uncontended path.
>>
>> Signed-off-by: David Laight <david.laight.linux@gmail.com>
>> ---
>>   kernel/locking/osq_lock.c | 6 ++----
>>   1 file changed, 2 insertions(+), 4 deletions(-)
>>
>> diff --git a/kernel/locking/osq_lock.c b/kernel/locking/osq_lock.c
>> index 0619691e2756..3f0cfdf1cd0f 100644
>> --- a/kernel/locking/osq_lock.c
>> +++ b/kernel/locking/osq_lock.c
>> @@ -92,13 +92,10 @@ osq_wait_next(struct optimistic_spin_queue *lock,
>>     bool osq_lock(struct optimistic_spin_queue *lock)
>>   {
>> -    struct optimistic_spin_node *node = this_cpu_ptr(&osq_node);
>> -    struct optimistic_spin_node *prev, *next;
>> +    struct optimistic_spin_node *node, *prev, *next;
>>       unsigned int curr = encode_cpu(smp_processor_id());
>>       unsigned int prev_cpu;
>>   -    node->next = NULL;
>
> Although it does look like node->next should always be NULL when 
> entering osq_lock(), any future change may invalidate this assumption. 
> I know you want to not touch the osq_node cacheline on fast path, but 
> we will need a big comment here to explicitly spell out this 
> assumption to make sure that we won't break it in the future.
>
> BTW, how much performance gain have you measured with this change? Can
> we just leave it there to be safe?
>
Or you could have something like

         if (IS_ENABLED(CONFIG_PROVE_LOCKING))
                 WARN_ON_ONCE(node->next != NULL);

Cheers,
Longman




* Re: [PATCH v3 next 5/5] Avoid writing to node->next in the osq_lock() fast path
  2026-03-11 19:27   ` Waiman Long
  2026-03-11 19:40     ` Waiman Long
@ 2026-03-11 21:50     ` David Laight
  1 sibling, 0 replies; 20+ messages in thread
From: David Laight @ 2026-03-11 21:50 UTC (permalink / raw)
  To: Waiman Long
  Cc: Peter Zijlstra, Ingo Molnar, Will Deacon, Boqun Feng,
	linux-kernel, Linus Torvalds, Steven Rostedt, Yafang Shao

On Wed, 11 Mar 2026 15:27:08 -0400
Waiman Long <longman@redhat.com> wrote:

> On 3/6/26 5:51 PM, david.laight.linux@gmail.com wrote:
> > From: David Laight <david.laight.linux@gmail.com>
> >
> > When osq_lock() returns false or osq_unlock() returns, static
> > analysis shows that node->next should always be NULL.
> > This means that it isn't necessary to explicitly set it to NULL
> > prior to atomic_xchg(&lock->tail, curr) on entry to osq_lock().
> >
> > Defer determining the address of the CPU's 'node' until after the
> > atomic_xchg() so that it isn't done in the uncontended path.
> >
> > Signed-off-by: David Laight <david.laight.linux@gmail.com>
> > ---
> >   kernel/locking/osq_lock.c | 6 ++----
> >   1 file changed, 2 insertions(+), 4 deletions(-)
> >
> > diff --git a/kernel/locking/osq_lock.c b/kernel/locking/osq_lock.c
> > index 0619691e2756..3f0cfdf1cd0f 100644
> > --- a/kernel/locking/osq_lock.c
> > +++ b/kernel/locking/osq_lock.c
> > @@ -92,13 +92,10 @@ osq_wait_next(struct optimistic_spin_queue *lock,
> >   
> >   bool osq_lock(struct optimistic_spin_queue *lock)
> >   {
> > -	struct optimistic_spin_node *node = this_cpu_ptr(&osq_node);
> > -	struct optimistic_spin_node *prev, *next;
> > +	struct optimistic_spin_node *node, *prev, *next;
> >   	unsigned int curr = encode_cpu(smp_processor_id());
> >   	unsigned int prev_cpu;
> >   
> > -	node->next = NULL;  
> 
> Although it does look like node->next should always be NULL when 
> entering osq_lock(), any future change may invalidate this assumption. I 
> know you want to not touch the osq_node cacheline on fast path, but we 
> will need a big comment here to explicitly spell out this assumption to 
> make sure that we won't break it in the future.

I've been looking at the code some more.
The first new patch adds a load of comments.
Unfortunately they are rather wrong (or in the wrong place) by the time
I finished. So I need to go around again at least once.
The main extra changes:
- Set node->prev to zero instead of node->locked = 1.
  This makes the loop after the smp_cond_load_relaxed() better.
  And the write to next->prev can move into osq_wait_next().
- Just call osq_wait_next() from osq_unlock() - it is the same code.
  Removes the unconditional cmpxchg on lock->tail (which needs to be
  _release in both places).
- Change node->next to be the cpu number - it is only used as a pointer
  once (I think Linus suggested that once).
- Stop 'node' being cache-line aligned - it only contains two cpu numbers.
  Just 8 byte align so it is all in one cache line.
  The data for the 'wrong' cpu is hardly ever accessed.

> 
> BTW, how much performance gain have you measured with this change? Can
> we just leave it there to be safe?

You think I've been brave enough to run my changes :-)
Actually I've written a little test harness that lets me compile and run
the actual kernel source in usespace and 'prod' it with requests.
That let me add delays at particular points to check the unusual paths
(but not the effects of acquire/release).

I will run the next changes, hopefully there won't be anything nasty in
the rc kernel to catch me out.
But I've no real idea of how to exercise the code.

	David

> 
> > -
> >   	/*
> >   	 * We need both ACQUIRE (pairs with corresponding RELEASE in
> >   	 * unlock() uncontended, or fastpath) and RELEASE (to publish
> > @@ -109,6 +106,7 @@ bool osq_lock(struct optimistic_spin_queue *lock)
> >   	if (prev_cpu == OSQ_UNLOCKED_VAL)
> >   		return true;
> >   
> > +	node = this_cpu_ptr(&osq_node);
> >   	WRITE_ONCE(node->prev_cpu, prev_cpu);
> >   	prev = decode_cpu(prev_cpu);
> >   	node->locked = 0;  
> 
> I am fine with moving the initialization here. The other patches also 
> look good to me.
> 
> Cheers,
> Longman
> 



end of thread, other threads:[~2026-03-11 21:50 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2026-03-06 22:51 [PATCH v3 next 0/5] locking/osq_lock: Optimisations to osq_lock code david.laight.linux
2026-03-06 22:51 ` [PATCH v3 next 1/5] Only clear node->locked in the slow osq_lock() path david.laight.linux
2026-03-06 23:01   ` David Laight
2026-03-06 22:51 ` [PATCH v3 next 2/5] Optimise vcpu_is_preempted() check david.laight.linux
2026-03-06 23:01   ` David Laight
2026-03-06 23:03   ` David Laight
2026-03-06 22:51 ` [PATCH v3 next 3/5] Use node->prev_cpu instead of saving node->prev david.laight.linux
2026-03-06 23:01   ` David Laight
2026-03-06 23:03   ` David Laight
2026-03-06 22:51 ` [PATCH v3 next 4/5] Optimise decode_cpu() and per_cpu_ptr() david.laight.linux
2026-03-06 23:01   ` David Laight
2026-03-06 23:03   ` David Laight
2026-03-06 22:51 ` [PATCH v3 next 5/5] Avoid writing to node->next in the osq_lock() fast path david.laight.linux
2026-03-06 23:04   ` David Laight
2026-03-07  0:06   ` Linus Torvalds
2026-03-07 11:32     ` David Laight
2026-03-11 19:27   ` Waiman Long
2026-03-11 19:40     ` Waiman Long
2026-03-11 21:50     ` David Laight
2026-03-06 22:59 ` [PATCH v3 next 0/5] locking/osq_lock: Optimisations to osq_lock code David Laight

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox