public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH] atomics: Use atomic_try_cmpxchg_release in rcuref_put_slowpath()
@ 2023-05-09 15:02 Uros Bizjak
  2023-05-10 11:35 ` David Laight
  2023-10-10  8:27 ` [tip: locking/core] locking/atomics: Use atomic_try_cmpxchg_release() to micro-optimize rcuref_put_slowpath() tip-bot2 for Uros Bizjak
  0 siblings, 2 replies; 3+ messages in thread
From: Uros Bizjak @ 2023-05-09 15:02 UTC (permalink / raw)
  To: linux-kernel; +Cc: Uros Bizjak, Thomas Gleixner

Use atomic_try_cmpxchg instead of atomic_cmpxchg (*ptr, old, new) == old
in rcuref_put_slowpath(). 86 CMPXCHG instruction returns success in
ZF flag, so this change saves a compare after cmpxchg.  Additionaly,
the compiler reorders some code blocks to follow likely/unlikely
annotations in the atomic_try_cmpxchg macro, improving the code from

  9a:	f0 0f b1 0b          	lock cmpxchg %ecx,(%rbx)
  9e:	83 f8 ff             	cmp    $0xffffffff,%eax
  a1:	74 04                	je     a7 <rcuref_put_slowpath+0x27>
  a3:	31 c0                	xor    %eax,%eax

to

  9a:	f0 0f b1 0b          	lock cmpxchg %ecx,(%rbx)
  9e:	75 4c                	jne    ec <rcuref_put_slowpath+0x6c>
  a0:	b0 01                	mov    $0x1,%al

No functional change intended.

Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
---
 lib/rcuref.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/rcuref.c b/lib/rcuref.c
index 5ec00a4a64d1..97f300eca927 100644
--- a/lib/rcuref.c
+++ b/lib/rcuref.c
@@ -248,7 +248,7 @@ bool rcuref_put_slowpath(rcuref_t *ref)
 		 * require a retry. If this fails the caller is not
 		 * allowed to deconstruct the object.
 		 */
-		if (atomic_cmpxchg_release(&ref->refcnt, RCUREF_NOREF, RCUREF_DEAD) != RCUREF_NOREF)
+		if (!atomic_try_cmpxchg_release(&ref->refcnt, &cnt, RCUREF_DEAD))
 			return false;
 
 		/*
-- 
2.40.1


^ permalink raw reply related	[flat|nested] 3+ messages in thread

* RE: [PATCH] atomics: Use atomic_try_cmpxchg_release in rcuref_put_slowpath()
  2023-05-09 15:02 [PATCH] atomics: Use atomic_try_cmpxchg_release in rcuref_put_slowpath() Uros Bizjak
@ 2023-05-10 11:35 ` David Laight
  2023-10-10  8:27 ` [tip: locking/core] locking/atomics: Use atomic_try_cmpxchg_release() to micro-optimize rcuref_put_slowpath() tip-bot2 for Uros Bizjak
  1 sibling, 0 replies; 3+ messages in thread
From: David Laight @ 2023-05-10 11:35 UTC (permalink / raw)
  To: 'Uros Bizjak', linux-kernel@vger.kernel.org; +Cc: Thomas Gleixner

From: Uros Bizjak
> Sent: 09 May 2023 16:03
> 
> Use atomic_try_cmpxchg instead of atomic_cmpxchg (*ptr, old, new) == old
> in rcuref_put_slowpath(). 86 CMPXCHG instruction returns success in
> ZF flag, so this change saves a compare after cmpxchg.  Additionaly,
> the compiler reorders some code blocks to follow likely/unlikely
> annotations in the atomic_try_cmpxchg macro, improving the code from
> 
>   9a:	f0 0f b1 0b          	lock cmpxchg %ecx,(%rbx)
>   9e:	83 f8 ff             	cmp    $0xffffffff,%eax
>   a1:	74 04                	je     a7 <rcuref_put_slowpath+0x27>
>   a3:	31 c0                	xor    %eax,%eax
> 
> to
> 
>   9a:	f0 0f b1 0b          	lock cmpxchg %ecx,(%rbx)
>   9e:	75 4c                	jne    ec <rcuref_put_slowpath+0x6c>
>   a0:	b0 01                	mov    $0x1,%al
> 
> No functional change intended.

While I'm not against the change I bet you can't detect
any actual difference. IIRC:
- The 'cmp+je' get merged into a single u-op.
- The 'lock cmpxchg' will take long enough that the instruction
  decoder won't be a bottleneck.
- Whether the je/jne is predicted taken is pretty much random.
  So you'll speculatively execute somewhere (could be anywhere)
  while the locked cycle completes.
So the only change is three less bytes of object code.
That will change the cache line alignment of later code.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)


^ permalink raw reply	[flat|nested] 3+ messages in thread

* [tip: locking/core] locking/atomics: Use atomic_try_cmpxchg_release() to micro-optimize rcuref_put_slowpath()
  2023-05-09 15:02 [PATCH] atomics: Use atomic_try_cmpxchg_release in rcuref_put_slowpath() Uros Bizjak
  2023-05-10 11:35 ` David Laight
@ 2023-10-10  8:27 ` tip-bot2 for Uros Bizjak
  1 sibling, 0 replies; 3+ messages in thread
From: tip-bot2 for Uros Bizjak @ 2023-10-10  8:27 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Uros Bizjak, Ingo Molnar, Peter Zijlstra, Paul E. McKenney, x86,
	linux-kernel

The following commit has been merged into the locking/core branch of tip:

Commit-ID:     4fbf8b136ded943f8661cf48270482ad1f5ce7bd
Gitweb:        https://git.kernel.org/tip/4fbf8b136ded943f8661cf48270482ad1f5ce7bd
Author:        Uros Bizjak <ubizjak@gmail.com>
AuthorDate:    Tue, 09 May 2023 17:02:55 +02:00
Committer:     Ingo Molnar <mingo@kernel.org>
CommitterDate: Tue, 10 Oct 2023 10:14:27 +02:00

locking/atomics: Use atomic_try_cmpxchg_release() to micro-optimize rcuref_put_slowpath()

Use atomic_try_cmpxchg() instead of atomic_cmpxchg(*ptr, old, new) == old
in rcuref_put_slowpath(). On x86 the CMPXCHG instruction returns success in the
ZF flag, so this change saves a compare after CMPXCHG.  Additionaly,
the compiler reorders some code blocks to follow likely/unlikely
annotations in the atomic_try_cmpxchg() macro, improving the code from:

  9a:	f0 0f b1 0b          	lock cmpxchg %ecx,(%rbx)
  9e:	83 f8 ff             	cmp    $0xffffffff,%eax
  a1:	74 04                	je     a7 <rcuref_put_slowpath+0x27>
  a3:	31 c0                	xor    %eax,%eax

to:

  9a:	f0 0f b1 0b          	lock cmpxchg %ecx,(%rbx)
  9e:	75 4c                	jne    ec <rcuref_put_slowpath+0x6c>
  a0:	b0 01                	mov    $0x1,%al

No functional change intended.

Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Paul E. McKenney <paulmck@kernel.org>
Link: https://lore.kernel.org/r/20230509150255.3691-1-ubizjak@gmail.com
---
 lib/rcuref.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/rcuref.c b/lib/rcuref.c
index 5ec00a4..97f300e 100644
--- a/lib/rcuref.c
+++ b/lib/rcuref.c
@@ -248,7 +248,7 @@ bool rcuref_put_slowpath(rcuref_t *ref)
 		 * require a retry. If this fails the caller is not
 		 * allowed to deconstruct the object.
 		 */
-		if (atomic_cmpxchg_release(&ref->refcnt, RCUREF_NOREF, RCUREF_DEAD) != RCUREF_NOREF)
+		if (!atomic_try_cmpxchg_release(&ref->refcnt, &cnt, RCUREF_DEAD))
 			return false;
 
 		/*

^ permalink raw reply related	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2023-10-10  8:27 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-05-09 15:02 [PATCH] atomics: Use atomic_try_cmpxchg_release in rcuref_put_slowpath() Uros Bizjak
2023-05-10 11:35 ` David Laight
2023-10-10  8:27 ` [tip: locking/core] locking/atomics: Use atomic_try_cmpxchg_release() to micro-optimize rcuref_put_slowpath() tip-bot2 for Uros Bizjak

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox