public inbox for linux-kernel@vger.kernel.org
* [PATCH 0/6] locking/atomic/x86: Improve arch_atomic*() family of functions
@ 2024-04-09 10:03 Uros Bizjak
  2024-04-09 10:03 ` [PATCH 1/6] locking/atomic/x86: Introduce arch_atomic64_try_cmpxchg to x86_32 Uros Bizjak
                   ` (5 more replies)
  0 siblings, 6 replies; 15+ messages in thread
From: Uros Bizjak @ 2024-04-09 10:03 UTC (permalink / raw)
  To: x86, linux-kernel
  Cc: Uros Bizjak, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, H. Peter Anvin, Peter Zijlstra

The following patch series improves the x86 arch_atomic*()
family of functions and merges the x86_32 and x86_64
arch_atomic64_fetch_{and,or,xor}() functions.

The series enables substantial assembly code reductions
for the x86_32 target and lowers future maintenance
burden and technical debt of the source code by unifying
several functions between the x86_32 and x86_64 targets.

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Peter Zijlstra <peterz@infradead.org>

Uros Bizjak (6):
  locking/atomic/x86: Introduce arch_atomic64_try_cmpxchg to x86_32
  locking/atomic/x86: Rewrite x86_32
    arch_atomic64_{,fetch}_{and,or,xor}() functions
  locking/atomic/x86: Use READ_ONCE before atomic{,64}_try_cmpxchg loops
  locking/atomic/x86: Merge x86_32 and x86_64
    arch_atomic64_fetch_{and,or,xor}() functions
  locking/atomic/x86: Define arch_atomic_sub() family using
    arch_atomic_add() functions
  locking/atomic/x86: Reorder a couple of arch_atomic64 functions

 arch/x86/include/asm/atomic.h      |  50 ++++++++++----
 arch/x86/include/asm/atomic64_32.h | 103 ++++++++++-------------------
 arch/x86/include/asm/atomic64_64.h |  44 +-----------
 3 files changed, 75 insertions(+), 122 deletions(-)

-- 
2.44.0


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH 1/6] locking/atomic/x86: Introduce arch_atomic64_try_cmpxchg to x86_32
  2024-04-09 10:03 [PATCH 0/6] locking/atomic/x86: Improve arch_atomic*() family of functions Uros Bizjak
@ 2024-04-09 10:03 ` Uros Bizjak
  2024-04-09 10:03 ` [PATCH 2/6] locking/atomic/x86: Rewrite x86_32 arch_atomic64_{,fetch}_{and,or,xor}() functions Uros Bizjak
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 15+ messages in thread
From: Uros Bizjak @ 2024-04-09 10:03 UTC (permalink / raw)
  To: x86, linux-kernel
  Cc: Uros Bizjak, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, H. Peter Anvin, Peter Zijlstra

Introduce arch_atomic64_try_cmpxchg() for 32-bit targets, so they use
an optimized, target-specific implementation instead of the generic one.
This implementation eliminates the dual-word compare after the
cmpxchg8b instruction and improves the generated asm code from:

    2273:	f0 0f c7 0f          	lock cmpxchg8b (%edi)
    2277:	8b 74 24 2c          	mov    0x2c(%esp),%esi
    227b:	89 d3                	mov    %edx,%ebx
    227d:	89 c2                	mov    %eax,%edx
    227f:	89 5c 24 10          	mov    %ebx,0x10(%esp)
    2283:	8b 7c 24 30          	mov    0x30(%esp),%edi
    2287:	89 44 24 1c          	mov    %eax,0x1c(%esp)
    228b:	31 f2                	xor    %esi,%edx
    228d:	89 d0                	mov    %edx,%eax
    228f:	89 da                	mov    %ebx,%edx
    2291:	31 fa                	xor    %edi,%edx
    2293:	09 d0                	or     %edx,%eax
    2295:	0f 85 a5 00 00 00    	jne    2340 <...>

to:

    2270:	f0 0f c7 0f          	lock cmpxchg8b (%edi)
    2274:	0f 85 a6 00 00 00    	jne    2320 <...>
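
For illustration, the same difference at the C level as a stand-alone
user-space sketch using the GCC/Clang __atomic builtins (an analogue
only, not the kernel primitives): a try_cmpxchg-style operation returns
the comparison result and updates the expected value on failure, so the
caller needs no separate 64-bit compare:

#include <stdbool.h>
#include <stdint.h>

/* Returns true on success; on failure *old is updated to the current value. */
static inline bool try_cmpxchg64(int64_t *ptr, int64_t *old, int64_t new)
{
	return __atomic_compare_exchange_n(ptr, old, new, false,
					   __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
}

static void atomic_add64(int64_t *ptr, int64_t i)
{
	int64_t val = __atomic_load_n(ptr, __ATOMIC_RELAXED);

	/* On failure, val already holds the fresh value; just retry. */
	while (!try_cmpxchg64(ptr, &val, val + i))
		;
}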

Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Peter Zijlstra <peterz@infradead.org>
---
 arch/x86/include/asm/atomic64_32.h | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/atomic64_32.h b/arch/x86/include/asm/atomic64_32.h
index d510405e4e1d..11e817dab44a 100644
--- a/arch/x86/include/asm/atomic64_32.h
+++ b/arch/x86/include/asm/atomic64_32.h
@@ -61,12 +61,18 @@ ATOMIC64_DECL(add_unless);
 #undef __ATOMIC64_DECL
 #undef ATOMIC64_EXPORT
 
-static __always_inline s64 arch_atomic64_cmpxchg(atomic64_t *v, s64 o, s64 n)
+static __always_inline s64 arch_atomic64_cmpxchg(atomic64_t *v, s64 old, s64 new)
 {
-	return arch_cmpxchg64(&v->counter, o, n);
+	return arch_cmpxchg64(&v->counter, old, new);
 }
 #define arch_atomic64_cmpxchg arch_atomic64_cmpxchg
 
+static __always_inline bool arch_atomic64_try_cmpxchg(atomic64_t *v, s64 *old, s64 new)
+{
+	return arch_try_cmpxchg64(&v->counter, old, new);
+}
+#define arch_atomic64_try_cmpxchg arch_atomic64_try_cmpxchg
+
 static __always_inline s64 arch_atomic64_xchg(atomic64_t *v, s64 n)
 {
 	s64 o;
-- 
2.44.0


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 2/6] locking/atomic/x86: Rewrite x86_32 arch_atomic64_{,fetch}_{and,or,xor}() functions
  2024-04-09 10:03 [PATCH 0/6] locking/atomic/x86: Improve arch_atomic*() family of functions Uros Bizjak
  2024-04-09 10:03 ` [PATCH 1/6] locking/atomic/x86: Introduce arch_atomic64_try_cmpxchg to x86_32 Uros Bizjak
@ 2024-04-09 10:03 ` Uros Bizjak
  2024-04-09 11:13   ` Mark Rutland
  2024-04-09 10:03 ` [PATCH 3/6] locking/atomic/x86: Use READ_ONCE before atomic{,64}_try_cmpxchg loops Uros Bizjak
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 15+ messages in thread
From: Uros Bizjak @ 2024-04-09 10:03 UTC (permalink / raw)
  To: x86, linux-kernel
  Cc: Uros Bizjak, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, H. Peter Anvin, Peter Zijlstra

Rewrite x86_32 arch_atomic64_{,fetch}_{and,or,xor}() functions to
use arch_atomic64_try_cmpxchg.  This implementation avoids one extra
trip through the cmpxchg loop.

The value preload before the cmpxchg loop does not need to be atomic,
but should use READ_ONCE to prevent compiler from merging, refetching
or reordering the read.

The generated code improves from:

  1917d5:	31 c9                	xor    %ecx,%ecx
  1917d7:	31 db                	xor    %ebx,%ebx
  1917d9:	89 4c 24 3c          	mov    %ecx,0x3c(%esp)
  1917dd:	8b 74 24 24          	mov    0x24(%esp),%esi
  1917e1:	89 c8                	mov    %ecx,%eax
  1917e3:	89 5c 24 34          	mov    %ebx,0x34(%esp)
  1917e7:	8b 7c 24 28          	mov    0x28(%esp),%edi
  1917eb:	21 ce                	and    %ecx,%esi
  1917ed:	89 74 24 4c          	mov    %esi,0x4c(%esp)
  1917f1:	21 df                	and    %ebx,%edi
  1917f3:	89 de                	mov    %ebx,%esi
  1917f5:	89 7c 24 50          	mov    %edi,0x50(%esp)
  1917f9:	8b 54 24 4c          	mov    0x4c(%esp),%edx
  1917fd:	8b 7c 24 2c          	mov    0x2c(%esp),%edi
  191801:	8b 4c 24 50          	mov    0x50(%esp),%ecx
  191805:	89 d3                	mov    %edx,%ebx
  191807:	89 f2                	mov    %esi,%edx
  191809:	f0 0f c7 0f          	lock cmpxchg8b (%edi)
  19180d:	89 c1                	mov    %eax,%ecx
  19180f:	8b 74 24 34          	mov    0x34(%esp),%esi
  191813:	89 d3                	mov    %edx,%ebx
  191815:	89 44 24 4c          	mov    %eax,0x4c(%esp)
  191819:	8b 44 24 3c          	mov    0x3c(%esp),%eax
  19181d:	89 df                	mov    %ebx,%edi
  19181f:	89 54 24 44          	mov    %edx,0x44(%esp)
  191823:	89 ca                	mov    %ecx,%edx
  191825:	31 de                	xor    %ebx,%esi
  191827:	31 c8                	xor    %ecx,%eax
  191829:	09 f0                	or     %esi,%eax
  19182b:	75 ac                	jne    1917d9 <...>

to:

  1912ba:	8b 06                	mov    (%esi),%eax
  1912bc:	8b 56 04             	mov    0x4(%esi),%edx
  1912bf:	89 44 24 3c          	mov    %eax,0x3c(%esp)
  1912c3:	89 c1                	mov    %eax,%ecx
  1912c5:	23 4c 24 34          	and    0x34(%esp),%ecx
  1912c9:	89 d3                	mov    %edx,%ebx
  1912cb:	23 5c 24 38          	and    0x38(%esp),%ebx
  1912cf:	89 54 24 40          	mov    %edx,0x40(%esp)
  1912d3:	89 4c 24 2c          	mov    %ecx,0x2c(%esp)
  1912d7:	89 5c 24 30          	mov    %ebx,0x30(%esp)
  1912db:	8b 5c 24 2c          	mov    0x2c(%esp),%ebx
  1912df:	8b 4c 24 30          	mov    0x30(%esp),%ecx
  1912e3:	f0 0f c7 0e          	lock cmpxchg8b (%esi)
  1912e7:	0f 85 f3 02 00 00    	jne    1915e0 <...>
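
At the source level the change amounts to the following (the function
names are illustrative; the bodies are taken from the diff below). The
old form starts from a guessed value of 0, so unless the counter
happens to be 0 the first locked cmpxchg can only fail and fetch the
real value, which is the extra trip mentioned above:

static __always_inline void old_arch_atomic64_and(s64 i, atomic64_t *v)
{
	s64 old, c = 0;

	while ((old = arch_atomic64_cmpxchg(v, c, c & i)) != c)
		c = old;
}

static __always_inline void new_arch_atomic64_and(s64 i, atomic64_t *v)
{
	s64 val = __READ_ONCE(v->counter);

	do { } while (!arch_atomic64_try_cmpxchg(v, &val, val & i));
}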

Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Peter Zijlstra <peterz@infradead.org>
---
 arch/x86/include/asm/atomic64_32.h | 44 ++++++++++++------------------
 1 file changed, 18 insertions(+), 26 deletions(-)

diff --git a/arch/x86/include/asm/atomic64_32.h b/arch/x86/include/asm/atomic64_32.h
index 11e817dab44a..84affd7a5d1c 100644
--- a/arch/x86/include/asm/atomic64_32.h
+++ b/arch/x86/include/asm/atomic64_32.h
@@ -201,69 +201,61 @@ static __always_inline s64 arch_atomic64_dec_if_positive(atomic64_t *v)
 
 static __always_inline void arch_atomic64_and(s64 i, atomic64_t *v)
 {
-	s64 old, c = 0;
+	s64 val = __READ_ONCE(v->counter);
 
-	while ((old = arch_atomic64_cmpxchg(v, c, c & i)) != c)
-		c = old;
+	do { } while (!arch_atomic64_try_cmpxchg(v, &val, val & i));
 }
 
 static __always_inline s64 arch_atomic64_fetch_and(s64 i, atomic64_t *v)
 {
-	s64 old, c = 0;
+	s64 val = __READ_ONCE(v->counter);
 
-	while ((old = arch_atomic64_cmpxchg(v, c, c & i)) != c)
-		c = old;
+	do { } while (!arch_atomic64_try_cmpxchg(v, &val, val & i));
 
-	return old;
+	return val;
 }
 #define arch_atomic64_fetch_and arch_atomic64_fetch_and
 
 static __always_inline void arch_atomic64_or(s64 i, atomic64_t *v)
 {
-	s64 old, c = 0;
+	s64 val = __READ_ONCE(v->counter);
 
-	while ((old = arch_atomic64_cmpxchg(v, c, c | i)) != c)
-		c = old;
+	do { } while (!arch_atomic64_try_cmpxchg(v, &val, val | i));
 }
 
 static __always_inline s64 arch_atomic64_fetch_or(s64 i, atomic64_t *v)
 {
-	s64 old, c = 0;
+	s64 val = __READ_ONCE(v->counter);
 
-	while ((old = arch_atomic64_cmpxchg(v, c, c | i)) != c)
-		c = old;
+	do { } while (!arch_atomic64_try_cmpxchg(v, &val, val | i));
 
-	return old;
+	return val;
 }
 #define arch_atomic64_fetch_or arch_atomic64_fetch_or
 
 static __always_inline void arch_atomic64_xor(s64 i, atomic64_t *v)
 {
-	s64 old, c = 0;
+	s64 val = __READ_ONCE(v->counter);
 
-	while ((old = arch_atomic64_cmpxchg(v, c, c ^ i)) != c)
-		c = old;
+	do { } while (!arch_atomic64_try_cmpxchg(v, &val, val ^ i));
 }
 
 static __always_inline s64 arch_atomic64_fetch_xor(s64 i, atomic64_t *v)
 {
-	s64 old, c = 0;
+	s64 val = __READ_ONCE(v->counter);
 
-	while ((old = arch_atomic64_cmpxchg(v, c, c ^ i)) != c)
-		c = old;
+	do { } while (!arch_atomic64_try_cmpxchg(v, &val, val ^ i));
 
-	return old;
+	return val;
 }
 #define arch_atomic64_fetch_xor arch_atomic64_fetch_xor
 
 static __always_inline s64 arch_atomic64_fetch_add(s64 i, atomic64_t *v)
 {
-	s64 old, c = 0;
+	s64 val = __READ_ONCE(v->counter);
 
-	while ((old = arch_atomic64_cmpxchg(v, c, c + i)) != c)
-		c = old;
-
-	return old;
+	do { } while (!arch_atomic64_try_cmpxchg(v, &val, val + i));
+	return val;
 }
 #define arch_atomic64_fetch_add arch_atomic64_fetch_add
 
-- 
2.44.0


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 3/6] locking/atomic/x86: Use READ_ONCE before atomic{,64}_try_cmpxchg loops
  2024-04-09 10:03 [PATCH 0/6] locking/atomic/x86: Improve arch_atomic*() family of functions Uros Bizjak
  2024-04-09 10:03 ` [PATCH 1/6] locking/atomic/x86: Introduce arch_atomic64_try_cmpxchg to x86_32 Uros Bizjak
  2024-04-09 10:03 ` [PATCH 2/6] locking/atomic/x86: Rewrite x86_32 arch_atomic64_{,fetch}_{and,or,xor}() functions Uros Bizjak
@ 2024-04-09 10:03 ` Uros Bizjak
  2024-04-09 11:07   ` Mark Rutland
  2024-04-09 10:03 ` [PATCH 4/6] locking/atomic/x86: Merge x86_32 and x86_64 arch_atomic64_fetch_{and,or,xor}() functions Uros Bizjak
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 15+ messages in thread
From: Uros Bizjak @ 2024-04-09 10:03 UTC (permalink / raw)
  To: x86, linux-kernel
  Cc: Uros Bizjak, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, H. Peter Anvin, Peter Zijlstra

The value preload before the cmpxchg loop does not need to be atomic,
but should use READ_ONCE to prevent compiler from merging, refetching
or reordering the read.
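
For illustration only, a user-space analogue of what the READ_ONCE()
preload boils down to (a sketch, not the kernel macro): the
volatile-qualified access forces exactly one plain load, which the
compiler may not merge with other reads, refetch, or reorder relative
to the loop:

#include <stdint.h>

/* Simplified READ_ONCE() analogue: a single forced load via a volatile cast. */
#define read_once_sketch(x)	(*(const volatile __typeof__(x) *)&(x))

static int32_t preload(const int32_t *counter)
{
	return read_once_sketch(*counter);
}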

This patch unifies arch_atomic{,64}_{,fetch}_{and,or,xor}() macros
between x86_32 and x86_64 targets.

No functional changes intended.

Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Peter Zijlstra <peterz@infradead.org>
---
 arch/x86/include/asm/atomic.h      |  8 ++++----
 arch/x86/include/asm/atomic64_64.h | 20 ++++++++++----------
 2 files changed, 14 insertions(+), 14 deletions(-)

diff --git a/arch/x86/include/asm/atomic.h b/arch/x86/include/asm/atomic.h
index 55a55ec04350..b166da21ee98 100644
--- a/arch/x86/include/asm/atomic.h
+++ b/arch/x86/include/asm/atomic.h
@@ -20,7 +20,7 @@ static __always_inline int arch_atomic_read(const atomic_t *v)
 	 * Note for KASAN: we deliberately don't use READ_ONCE_NOCHECK() here,
 	 * it's non-inlined function that increases binary size and stack usage.
 	 */
-	return __READ_ONCE((v)->counter);
+	return __READ_ONCE(v->counter);
 }
 
 static __always_inline void arch_atomic_set(atomic_t *v, int i)
@@ -132,7 +132,7 @@ static __always_inline void arch_atomic_and(int i, atomic_t *v)
 
 static __always_inline int arch_atomic_fetch_and(int i, atomic_t *v)
 {
-	int val = arch_atomic_read(v);
+	int val = __READ_ONCE(v->counter);
 
 	do { } while (!arch_atomic_try_cmpxchg(v, &val, val & i));
 
@@ -150,7 +150,7 @@ static __always_inline void arch_atomic_or(int i, atomic_t *v)
 
 static __always_inline int arch_atomic_fetch_or(int i, atomic_t *v)
 {
-	int val = arch_atomic_read(v);
+	int val = __READ_ONCE(v->counter);
 
 	do { } while (!arch_atomic_try_cmpxchg(v, &val, val | i));
 
@@ -168,7 +168,7 @@ static __always_inline void arch_atomic_xor(int i, atomic_t *v)
 
 static __always_inline int arch_atomic_fetch_xor(int i, atomic_t *v)
 {
-	int val = arch_atomic_read(v);
+	int val = __READ_ONCE(v->counter);
 
 	do { } while (!arch_atomic_try_cmpxchg(v, &val, val ^ i));
 
diff --git a/arch/x86/include/asm/atomic64_64.h b/arch/x86/include/asm/atomic64_64.h
index 3165c0feedf7..e7b12a48fecb 100644
--- a/arch/x86/include/asm/atomic64_64.h
+++ b/arch/x86/include/asm/atomic64_64.h
@@ -12,7 +12,7 @@
 
 static __always_inline s64 arch_atomic64_read(const atomic64_t *v)
 {
-	return __READ_ONCE((v)->counter);
+	return __READ_ONCE(v->counter);
 }
 
 static __always_inline void arch_atomic64_set(atomic64_t *v, s64 i)
@@ -126,10 +126,10 @@ static __always_inline void arch_atomic64_and(s64 i, atomic64_t *v)
 
 static __always_inline s64 arch_atomic64_fetch_and(s64 i, atomic64_t *v)
 {
-	s64 val = arch_atomic64_read(v);
+	s64 val = __READ_ONCE(v->counter);
+
+	do { } while (!arch_atomic64_try_cmpxchg(v, &val, val & i));
 
-	do {
-	} while (!arch_atomic64_try_cmpxchg(v, &val, val & i));
 	return val;
 }
 #define arch_atomic64_fetch_and arch_atomic64_fetch_and
@@ -144,10 +144,10 @@ static __always_inline void arch_atomic64_or(s64 i, atomic64_t *v)
 
 static __always_inline s64 arch_atomic64_fetch_or(s64 i, atomic64_t *v)
 {
-	s64 val = arch_atomic64_read(v);
+	s64 val = __READ_ONCE(v->counter);
+
+	do { } while (!arch_atomic64_try_cmpxchg(v, &val, val | i));
 
-	do {
-	} while (!arch_atomic64_try_cmpxchg(v, &val, val | i));
 	return val;
 }
 #define arch_atomic64_fetch_or arch_atomic64_fetch_or
@@ -162,10 +162,10 @@ static __always_inline void arch_atomic64_xor(s64 i, atomic64_t *v)
 
 static __always_inline s64 arch_atomic64_fetch_xor(s64 i, atomic64_t *v)
 {
-	s64 val = arch_atomic64_read(v);
+	s64 val = __READ_ONCE(v->counter);
+
+	do { } while (!arch_atomic64_try_cmpxchg(v, &val, val ^ i));
 
-	do {
-	} while (!arch_atomic64_try_cmpxchg(v, &val, val ^ i));
 	return val;
 }
 #define arch_atomic64_fetch_xor arch_atomic64_fetch_xor
-- 
2.44.0


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 4/6] locking/atomic/x86: Merge x86_32 and x86_64 arch_atomic64_fetch_{and,or,xor}() functions
  2024-04-09 10:03 [PATCH 0/6] locking/atomic/x86: Improve arch_atomic*() family of functions Uros Bizjak
                   ` (2 preceding siblings ...)
  2024-04-09 10:03 ` [PATCH 3/6] locking/atomic/x86: Use READ_ONCE before atomic{,64}_try_cmpxchg loops Uros Bizjak
@ 2024-04-09 10:03 ` Uros Bizjak
  2024-04-09 10:03 ` [PATCH 5/6] locking/atomic/x86: Define arch_atomic_sub() family using arch_atomic_add() functions Uros Bizjak
  2024-04-09 10:03 ` [PATCH 6/6] locking/atomic/x86: Reorder a couple of arch_atomic64 functions Uros Bizjak
  5 siblings, 0 replies; 15+ messages in thread
From: Uros Bizjak @ 2024-04-09 10:03 UTC (permalink / raw)
  To: x86, linux-kernel
  Cc: Uros Bizjak, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, H. Peter Anvin, Peter Zijlstra

Move the identical definitions of arch_atomic64_fetch_{and,or,xor}()
from arch/x86/include/asm/atomic64_32.h and
arch/x86/include/asm/atomic64_64.h to the common place in
arch/x86/include/asm/atomic.h.

No functional changes intended.

Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Peter Zijlstra <peterz@infradead.org>
---
 arch/x86/include/asm/atomic.h      | 30 ++++++++++++++++++++++++++++++
 arch/x86/include/asm/atomic64_32.h | 30 ------------------------------
 arch/x86/include/asm/atomic64_64.h | 30 ------------------------------
 3 files changed, 30 insertions(+), 60 deletions(-)

diff --git a/arch/x86/include/asm/atomic.h b/arch/x86/include/asm/atomic.h
index b166da21ee98..b2e44de36934 100644
--- a/arch/x86/include/asm/atomic.h
+++ b/arch/x86/include/asm/atomic.h
@@ -182,4 +182,34 @@ static __always_inline int arch_atomic_fetch_xor(int i, atomic_t *v)
 # include <asm/atomic64_64.h>
 #endif
 
+static __always_inline s64 arch_atomic64_fetch_and(s64 i, atomic64_t *v)
+{
+	s64 val = __READ_ONCE(v->counter);
+
+	do { } while (!arch_atomic64_try_cmpxchg(v, &val, val & i));
+
+	return val;
+}
+#define arch_atomic64_fetch_and arch_atomic64_fetch_and
+
+static __always_inline s64 arch_atomic64_fetch_or(s64 i, atomic64_t *v)
+{
+	s64 val = __READ_ONCE(v->counter);
+
+	do { } while (!arch_atomic64_try_cmpxchg(v, &val, val | i));
+
+	return val;
+}
+#define arch_atomic64_fetch_or arch_atomic64_fetch_or
+
+static __always_inline s64 arch_atomic64_fetch_xor(s64 i, atomic64_t *v)
+{
+	s64 val = __READ_ONCE(v->counter);
+
+	do { } while (!arch_atomic64_try_cmpxchg(v, &val, val ^ i));
+
+	return val;
+}
+#define arch_atomic64_fetch_xor arch_atomic64_fetch_xor
+
 #endif /* _ASM_X86_ATOMIC_H */
diff --git a/arch/x86/include/asm/atomic64_32.h b/arch/x86/include/asm/atomic64_32.h
index 84affd7a5d1c..4f79198da98e 100644
--- a/arch/x86/include/asm/atomic64_32.h
+++ b/arch/x86/include/asm/atomic64_32.h
@@ -206,16 +206,6 @@ static __always_inline void arch_atomic64_and(s64 i, atomic64_t *v)
 	do { } while (!arch_atomic64_try_cmpxchg(v, &val, val & i));
 }
 
-static __always_inline s64 arch_atomic64_fetch_and(s64 i, atomic64_t *v)
-{
-	s64 val = __READ_ONCE(v->counter);
-
-	do { } while (!arch_atomic64_try_cmpxchg(v, &val, val & i));
-
-	return val;
-}
-#define arch_atomic64_fetch_and arch_atomic64_fetch_and
-
 static __always_inline void arch_atomic64_or(s64 i, atomic64_t *v)
 {
 	s64 val = __READ_ONCE(v->counter);
@@ -223,16 +213,6 @@ static __always_inline void arch_atomic64_or(s64 i, atomic64_t *v)
 	do { } while (!arch_atomic64_try_cmpxchg(v, &val, val | i));
 }
 
-static __always_inline s64 arch_atomic64_fetch_or(s64 i, atomic64_t *v)
-{
-	s64 val = __READ_ONCE(v->counter);
-
-	do { } while (!arch_atomic64_try_cmpxchg(v, &val, val | i));
-
-	return val;
-}
-#define arch_atomic64_fetch_or arch_atomic64_fetch_or
-
 static __always_inline void arch_atomic64_xor(s64 i, atomic64_t *v)
 {
 	s64 val = __READ_ONCE(v->counter);
@@ -240,16 +220,6 @@ static __always_inline void arch_atomic64_xor(s64 i, atomic64_t *v)
 	do { } while (!arch_atomic64_try_cmpxchg(v, &val, val ^ i));
 }
 
-static __always_inline s64 arch_atomic64_fetch_xor(s64 i, atomic64_t *v)
-{
-	s64 val = __READ_ONCE(v->counter);
-
-	do { } while (!arch_atomic64_try_cmpxchg(v, &val, val ^ i));
-
-	return val;
-}
-#define arch_atomic64_fetch_xor arch_atomic64_fetch_xor
-
 static __always_inline s64 arch_atomic64_fetch_add(s64 i, atomic64_t *v)
 {
 	s64 val = __READ_ONCE(v->counter);
diff --git a/arch/x86/include/asm/atomic64_64.h b/arch/x86/include/asm/atomic64_64.h
index e7b12a48fecb..b2c9974ba971 100644
--- a/arch/x86/include/asm/atomic64_64.h
+++ b/arch/x86/include/asm/atomic64_64.h
@@ -124,16 +124,6 @@ static __always_inline void arch_atomic64_and(s64 i, atomic64_t *v)
 			: "memory");
 }
 
-static __always_inline s64 arch_atomic64_fetch_and(s64 i, atomic64_t *v)
-{
-	s64 val = __READ_ONCE(v->counter);
-
-	do { } while (!arch_atomic64_try_cmpxchg(v, &val, val & i));
-
-	return val;
-}
-#define arch_atomic64_fetch_and arch_atomic64_fetch_and
-
 static __always_inline void arch_atomic64_or(s64 i, atomic64_t *v)
 {
 	asm volatile(LOCK_PREFIX "orq %1,%0"
@@ -142,16 +132,6 @@ static __always_inline void arch_atomic64_or(s64 i, atomic64_t *v)
 			: "memory");
 }
 
-static __always_inline s64 arch_atomic64_fetch_or(s64 i, atomic64_t *v)
-{
-	s64 val = __READ_ONCE(v->counter);
-
-	do { } while (!arch_atomic64_try_cmpxchg(v, &val, val | i));
-
-	return val;
-}
-#define arch_atomic64_fetch_or arch_atomic64_fetch_or
-
 static __always_inline void arch_atomic64_xor(s64 i, atomic64_t *v)
 {
 	asm volatile(LOCK_PREFIX "xorq %1,%0"
@@ -160,14 +140,4 @@ static __always_inline void arch_atomic64_xor(s64 i, atomic64_t *v)
 			: "memory");
 }
 
-static __always_inline s64 arch_atomic64_fetch_xor(s64 i, atomic64_t *v)
-{
-	s64 val = __READ_ONCE(v->counter);
-
-	do { } while (!arch_atomic64_try_cmpxchg(v, &val, val ^ i));
-
-	return val;
-}
-#define arch_atomic64_fetch_xor arch_atomic64_fetch_xor
-
 #endif /* _ASM_X86_ATOMIC64_64_H */
-- 
2.44.0


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 5/6] locking/atomic/x86: Define arch_atomic_sub() family using arch_atomic_add() functions
  2024-04-09 10:03 [PATCH 0/6] locking/atomic/x86: Improve arch_atomic*() family of functions Uros Bizjak
                   ` (3 preceding siblings ...)
  2024-04-09 10:03 ` [PATCH 4/6] locking/atomic/x86: Merge x86_32 and x86_64 arch_atomic64_fetch_{and,or,xor}() functions Uros Bizjak
@ 2024-04-09 10:03 ` Uros Bizjak
  2024-04-09 10:03 ` [PATCH 6/6] locking/atomic/x86: Reorder a couple of arch_atomic64 functions Uros Bizjak
  5 siblings, 0 replies; 15+ messages in thread
From: Uros Bizjak @ 2024-04-09 10:03 UTC (permalink / raw)
  To: x86, linux-kernel
  Cc: Uros Bizjak, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, H. Peter Anvin, Peter Zijlstra

There is no need to implement the arch_atomic_sub() family of inline
functions; the corresponding macros can be implemented directly using
arch_atomic_add() inlines with a negated argument.
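
Concretely, the inline functions collapse into one-line macros (as in
the diff below); -(i) is parenthesized so that compound expressions
passed as i are negated as a whole:

#define arch_atomic_sub_return(i, v)	arch_atomic_add_return(-(i), v)
#define arch_atomic_fetch_sub(i, v)	arch_atomic_fetch_add(-(i), v)

#define arch_atomic64_sub_return(i, v)	arch_atomic64_add_return(-(i), v)
#define arch_atomic64_fetch_sub(i, v)	arch_atomic64_fetch_add(-(i), v)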

No functional changes intended.

Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Peter Zijlstra <peterz@infradead.org>
---
 arch/x86/include/asm/atomic.h      | 12 ++----------
 arch/x86/include/asm/atomic64_32.h | 23 ++++++++++++-----------
 arch/x86/include/asm/atomic64_64.h | 12 ++----------
 3 files changed, 16 insertions(+), 31 deletions(-)

diff --git a/arch/x86/include/asm/atomic.h b/arch/x86/include/asm/atomic.h
index b2e44de36934..77526257dacf 100644
--- a/arch/x86/include/asm/atomic.h
+++ b/arch/x86/include/asm/atomic.h
@@ -86,11 +86,7 @@ static __always_inline int arch_atomic_add_return(int i, atomic_t *v)
 }
 #define arch_atomic_add_return arch_atomic_add_return
 
-static __always_inline int arch_atomic_sub_return(int i, atomic_t *v)
-{
-	return arch_atomic_add_return(-i, v);
-}
-#define arch_atomic_sub_return arch_atomic_sub_return
+#define arch_atomic_sub_return(i, v) arch_atomic_add_return(-(i), v)
 
 static __always_inline int arch_atomic_fetch_add(int i, atomic_t *v)
 {
@@ -98,11 +94,7 @@ static __always_inline int arch_atomic_fetch_add(int i, atomic_t *v)
 }
 #define arch_atomic_fetch_add arch_atomic_fetch_add
 
-static __always_inline int arch_atomic_fetch_sub(int i, atomic_t *v)
-{
-	return xadd(&v->counter, -i);
-}
-#define arch_atomic_fetch_sub arch_atomic_fetch_sub
+#define arch_atomic_fetch_sub(i, v) arch_atomic_fetch_add(-(i), v)
 
 static __always_inline int arch_atomic_cmpxchg(atomic_t *v, int old, int new)
 {
diff --git a/arch/x86/include/asm/atomic64_32.h b/arch/x86/include/asm/atomic64_32.h
index 4f79198da98e..862448db1207 100644
--- a/arch/x86/include/asm/atomic64_32.h
+++ b/arch/x86/include/asm/atomic64_32.h
@@ -199,6 +199,18 @@ static __always_inline s64 arch_atomic64_dec_if_positive(atomic64_t *v)
 #undef alternative_atomic64
 #undef __alternative_atomic64
 
+static __always_inline s64 arch_atomic64_fetch_add(s64 i, atomic64_t *v)
+{
+	s64 val = __READ_ONCE(v->counter);
+
+	do { } while (!arch_atomic64_try_cmpxchg(v, &val, val + i));
+
+	return val;
+}
+#define arch_atomic64_fetch_add arch_atomic64_fetch_add
+
+#define arch_atomic64_fetch_sub(i, v) arch_atomic64_fetch_add(-(i), v)
+
 static __always_inline void arch_atomic64_and(s64 i, atomic64_t *v)
 {
 	s64 val = __READ_ONCE(v->counter);
@@ -220,15 +232,4 @@ static __always_inline void arch_atomic64_xor(s64 i, atomic64_t *v)
 	do { } while (!arch_atomic64_try_cmpxchg(v, &val, val ^ i));
 }
 
-static __always_inline s64 arch_atomic64_fetch_add(s64 i, atomic64_t *v)
-{
-	s64 val = __READ_ONCE(v->counter);
-
-	do { } while (!arch_atomic64_try_cmpxchg(v, &val, val + i));
-	return val;
-}
-#define arch_atomic64_fetch_add arch_atomic64_fetch_add
-
-#define arch_atomic64_fetch_sub(i, v)	arch_atomic64_fetch_add(-(i), (v))
-
 #endif /* _ASM_X86_ATOMIC64_32_H */
diff --git a/arch/x86/include/asm/atomic64_64.h b/arch/x86/include/asm/atomic64_64.h
index b2c9974ba971..a96a4b9acfcb 100644
--- a/arch/x86/include/asm/atomic64_64.h
+++ b/arch/x86/include/asm/atomic64_64.h
@@ -80,11 +80,7 @@ static __always_inline s64 arch_atomic64_add_return(s64 i, atomic64_t *v)
 }
 #define arch_atomic64_add_return arch_atomic64_add_return
 
-static __always_inline s64 arch_atomic64_sub_return(s64 i, atomic64_t *v)
-{
-	return arch_atomic64_add_return(-i, v);
-}
-#define arch_atomic64_sub_return arch_atomic64_sub_return
+#define arch_atomic64_sub_return(i, v) arch_atomic64_add_return(-(i), v)
 
 static __always_inline s64 arch_atomic64_fetch_add(s64 i, atomic64_t *v)
 {
@@ -92,11 +88,7 @@ static __always_inline s64 arch_atomic64_fetch_add(s64 i, atomic64_t *v)
 }
 #define arch_atomic64_fetch_add arch_atomic64_fetch_add
 
-static __always_inline s64 arch_atomic64_fetch_sub(s64 i, atomic64_t *v)
-{
-	return xadd(&v->counter, -i);
-}
-#define arch_atomic64_fetch_sub arch_atomic64_fetch_sub
+#define arch_atomic64_fetch_sub(i, v) arch_atomic64_fetch_add(-(i), v)
 
 static __always_inline s64 arch_atomic64_cmpxchg(atomic64_t *v, s64 old, s64 new)
 {
-- 
2.44.0


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 6/6] locking/atomic/x86: Reorder a couple of arch_atomic64 functions
  2024-04-09 10:03 [PATCH 0/6] locking/atomic/x86: Improve arch_atomic*() family of functions Uros Bizjak
                   ` (4 preceding siblings ...)
  2024-04-09 10:03 ` [PATCH 5/6] locking/atomic/x86: Define arch_atomic_sub() family using arch_atomic_add() functions Uros Bizjak
@ 2024-04-09 10:03 ` Uros Bizjak
  5 siblings, 0 replies; 15+ messages in thread
From: Uros Bizjak @ 2024-04-09 10:03 UTC (permalink / raw)
  To: x86, linux-kernel
  Cc: Uros Bizjak, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, H. Peter Anvin, Peter Zijlstra

Reorder a couple of arch_atomic64 functions in
arch/x86/include/asm/atomic64_32.h so that the sequence of
declarations better matches between the x86_32 and x86_64 headers.

No functional changes intended.

Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Peter Zijlstra <peterz@infradead.org>
---
 arch/x86/include/asm/atomic64_32.h | 46 +++++++++++++++---------------
 1 file changed, 23 insertions(+), 23 deletions(-)

diff --git a/arch/x86/include/asm/atomic64_32.h b/arch/x86/include/asm/atomic64_32.h
index 862448db1207..3864d82a9339 100644
--- a/arch/x86/include/asm/atomic64_32.h
+++ b/arch/x86/include/asm/atomic64_32.h
@@ -12,7 +12,7 @@ typedef struct {
 	s64 __aligned(8) counter;
 } atomic64_t;
 
-#define ATOMIC64_INIT(val)	{ (val) }
+#define ATOMIC64_INIT(i)	{ (i) }
 
 #define __ATOMIC64_DECL(sym) void atomic64_##sym(atomic64_t *, ...)
 #ifndef ATOMIC64_EXPORT
@@ -61,17 +61,21 @@ ATOMIC64_DECL(add_unless);
 #undef __ATOMIC64_DECL
 #undef ATOMIC64_EXPORT
 
-static __always_inline s64 arch_atomic64_cmpxchg(atomic64_t *v, s64 old, s64 new)
+static __always_inline s64 arch_atomic64_read(const atomic64_t *v)
 {
-	return arch_cmpxchg64(&v->counter, old, new);
+	s64 r;
+	alternative_atomic64(read, "=&A" (r), "c" (v) : "memory");
+	return r;
 }
-#define arch_atomic64_cmpxchg arch_atomic64_cmpxchg
 
-static __always_inline bool arch_atomic64_try_cmpxchg(atomic64_t *v, s64 *old, s64 new)
+static __always_inline void arch_atomic64_set(atomic64_t *v, s64 i)
 {
-	return arch_try_cmpxchg64(&v->counter, old, new);
+	unsigned high = (unsigned)(i >> 32);
+	unsigned low = (unsigned)i;
+	alternative_atomic64(set, /* no output */,
+			     "S" (v), "b" (low), "c" (high)
+			     : "eax", "edx", "memory");
 }
-#define arch_atomic64_try_cmpxchg arch_atomic64_try_cmpxchg
 
 static __always_inline s64 arch_atomic64_xchg(atomic64_t *v, s64 n)
 {
@@ -85,22 +89,6 @@ static __always_inline s64 arch_atomic64_xchg(atomic64_t *v, s64 n)
 }
 #define arch_atomic64_xchg arch_atomic64_xchg
 
-static __always_inline void arch_atomic64_set(atomic64_t *v, s64 i)
-{
-	unsigned high = (unsigned)(i >> 32);
-	unsigned low = (unsigned)i;
-	alternative_atomic64(set, /* no output */,
-			     "S" (v), "b" (low), "c" (high)
-			     : "eax", "edx", "memory");
-}
-
-static __always_inline s64 arch_atomic64_read(const atomic64_t *v)
-{
-	s64 r;
-	alternative_atomic64(read, "=&A" (r), "c" (v) : "memory");
-	return r;
-}
-
 static __always_inline s64 arch_atomic64_add_return(s64 i, atomic64_t *v)
 {
 	alternative_atomic64(add_return,
@@ -199,6 +187,18 @@ static __always_inline s64 arch_atomic64_dec_if_positive(atomic64_t *v)
 #undef alternative_atomic64
 #undef __alternative_atomic64
 
+static __always_inline s64 arch_atomic64_cmpxchg(atomic64_t *v, s64 old, s64 new)
+{
+	return arch_cmpxchg64(&v->counter, old, new);
+}
+#define arch_atomic64_cmpxchg arch_atomic64_cmpxchg
+
+static __always_inline bool arch_atomic64_try_cmpxchg(atomic64_t *v, s64 *old, s64 new)
+{
+	return arch_try_cmpxchg64(&v->counter, old, new);
+}
+#define arch_atomic64_try_cmpxchg arch_atomic64_try_cmpxchg
+
 static __always_inline s64 arch_atomic64_fetch_add(s64 i, atomic64_t *v)
 {
 	s64 val = __READ_ONCE(v->counter);
-- 
2.44.0


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH 3/6] locking/atomic/x86: Use READ_ONCE before atomic{,64}_try_cmpxchg loops
  2024-04-09 10:03 ` [PATCH 3/6] locking/atomic/x86: Use READ_ONCE before atomic{,64}_try_cmpxchg loops Uros Bizjak
@ 2024-04-09 11:07   ` Mark Rutland
  2024-04-09 11:59     ` Uros Bizjak
  0 siblings, 1 reply; 15+ messages in thread
From: Mark Rutland @ 2024-04-09 11:07 UTC (permalink / raw)
  To: Uros Bizjak
  Cc: x86, linux-kernel, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, H. Peter Anvin, Peter Zijlstra

On Tue, Apr 09, 2024 at 12:03:54PM +0200, Uros Bizjak wrote:
> The value preload before the cmpxchg loop does not need to be atomic,
> but should use READ_ONCE to prevent compiler from merging, refetching
> or reordering the read.
> 

Yes, and that's what arch_atomic_read() and arch_atomic64_read() do...

> This patch unifies arch_atomic{,64}_{,fetch}_{and,or,xor}() macros
> between x86_32 and x86_64 targets.
> 
> No functional changes intended.
> 
> Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Ingo Molnar <mingo@kernel.org>
> Cc: Borislav Petkov <bp@alien8.de>
> Cc: Dave Hansen <dave.hansen@linux.intel.com>
> Cc: "H. Peter Anvin" <hpa@zytor.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> ---
>  arch/x86/include/asm/atomic.h      |  8 ++++----
>  arch/x86/include/asm/atomic64_64.h | 20 ++++++++++----------
>  2 files changed, 14 insertions(+), 14 deletions(-)
> 
> diff --git a/arch/x86/include/asm/atomic.h b/arch/x86/include/asm/atomic.h
> index 55a55ec04350..b166da21ee98 100644
> --- a/arch/x86/include/asm/atomic.h
> +++ b/arch/x86/include/asm/atomic.h
> @@ -20,7 +20,7 @@ static __always_inline int arch_atomic_read(const atomic_t *v)
>  	 * Note for KASAN: we deliberately don't use READ_ONCE_NOCHECK() here,
>  	 * it's non-inlined function that increases binary size and stack usage.
>  	 */
> -	return __READ_ONCE((v)->counter);
> +	return __READ_ONCE(v->counter);

Removing the unnecessary brackets is fine, but the commit message didn't mention this.

[...]

>  static __always_inline int arch_atomic_fetch_and(int i, atomic_t *v)
>  {
> -	int val = arch_atomic_read(v);
> +	int val = __READ_ONCE(v->counter);

This is the wrong thing to do; arch_atomic_read() already has the required
semantic, and it more clearly aligns with the use of arch_atomic_try_cmpxchg()
below. It contains the documentation regarding why we use __READ_ONCE()
specifically (which we should probably note in arch_atomic64_read()).

Please leave this as-is, and likewise for the other cases below. Similarly, the
prior patch should use arch_atomic{,64}_read() rather than using
__READ_ONCE().

[...]

>  static __always_inline s64 arch_atomic64_fetch_and(s64 i, atomic64_t *v)
>  {
> -	s64 val = arch_atomic64_read(v);
> +	s64 val = __READ_ONCE(v->counter);
> +
> +	do { } while (!arch_atomic64_try_cmpxchg(v, &val, val & i));
>  
> -	do {
> -	} while (!arch_atomic64_try_cmpxchg(v, &val, val & i));
>  	return val;

I think this reformatting is what you meant in the commit message when you said:

| This patch unifies arch_atomic{,64}_{,fetch}_{and,or,xor}() macros
| between x86_32 and x86_64 targets.

Assuming so, can you please just do that, and say:

  This patch reformats the x86_64 arch_atomic{,64}_{,fetch}_{and,or,xor}()
  functions to match the x86_32 versions.

Mark.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/6] locking/atomic/x86: Rewrite x86_32 arch_atomic64_{,fetch}_{and,or,xor}() functions
  2024-04-09 10:03 ` [PATCH 2/6] locking/atomic/x86: Rewrite x86_32 arch_atomic64_{,fetch}_{and,or,xor}() functions Uros Bizjak
@ 2024-04-09 11:13   ` Mark Rutland
  2024-04-09 12:03     ` Uros Bizjak
  0 siblings, 1 reply; 15+ messages in thread
From: Mark Rutland @ 2024-04-09 11:13 UTC (permalink / raw)
  To: Uros Bizjak
  Cc: x86, linux-kernel, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, H. Peter Anvin, Peter Zijlstra

On Tue, Apr 09, 2024 at 12:03:53PM +0200, Uros Bizjak wrote:
> Rewrite x86_32 arch_atomic64_{,fetch}_{and,or,xor}() functions to
> use arch_atomic64_try_cmpxchg.  This implementation avoids one extra
> trip through the cmpxchg loop.
> 
> The value preload before the cmpxchg loop does not need to be atomic,
> but should use READ_ONCE to prevent compiler from merging, refetching
> or reordering the read.
> 
> The generated code improves from:
> 
>   1917d5:	31 c9                	xor    %ecx,%ecx
>   1917d7:	31 db                	xor    %ebx,%ebx
>   1917d9:	89 4c 24 3c          	mov    %ecx,0x3c(%esp)
>   1917dd:	8b 74 24 24          	mov    0x24(%esp),%esi
>   1917e1:	89 c8                	mov    %ecx,%eax
>   1917e3:	89 5c 24 34          	mov    %ebx,0x34(%esp)
>   1917e7:	8b 7c 24 28          	mov    0x28(%esp),%edi
>   1917eb:	21 ce                	and    %ecx,%esi
>   1917ed:	89 74 24 4c          	mov    %esi,0x4c(%esp)
>   1917f1:	21 df                	and    %ebx,%edi
>   1917f3:	89 de                	mov    %ebx,%esi
>   1917f5:	89 7c 24 50          	mov    %edi,0x50(%esp)
>   1917f9:	8b 54 24 4c          	mov    0x4c(%esp),%edx
>   1917fd:	8b 7c 24 2c          	mov    0x2c(%esp),%edi
>   191801:	8b 4c 24 50          	mov    0x50(%esp),%ecx
>   191805:	89 d3                	mov    %edx,%ebx
>   191807:	89 f2                	mov    %esi,%edx
>   191809:	f0 0f c7 0f          	lock cmpxchg8b (%edi)
>   19180d:	89 c1                	mov    %eax,%ecx
>   19180f:	8b 74 24 34          	mov    0x34(%esp),%esi
>   191813:	89 d3                	mov    %edx,%ebx
>   191815:	89 44 24 4c          	mov    %eax,0x4c(%esp)
>   191819:	8b 44 24 3c          	mov    0x3c(%esp),%eax
>   19181d:	89 df                	mov    %ebx,%edi
>   19181f:	89 54 24 44          	mov    %edx,0x44(%esp)
>   191823:	89 ca                	mov    %ecx,%edx
>   191825:	31 de                	xor    %ebx,%esi
>   191827:	31 c8                	xor    %ecx,%eax
>   191829:	09 f0                	or     %esi,%eax
>   19182b:	75 ac                	jne    1917d9 <...>
> 
> to:
> 
>   1912ba:	8b 06                	mov    (%esi),%eax
>   1912bc:	8b 56 04             	mov    0x4(%esi),%edx
>   1912bf:	89 44 24 3c          	mov    %eax,0x3c(%esp)
>   1912c3:	89 c1                	mov    %eax,%ecx
>   1912c5:	23 4c 24 34          	and    0x34(%esp),%ecx
>   1912c9:	89 d3                	mov    %edx,%ebx
>   1912cb:	23 5c 24 38          	and    0x38(%esp),%ebx
>   1912cf:	89 54 24 40          	mov    %edx,0x40(%esp)
>   1912d3:	89 4c 24 2c          	mov    %ecx,0x2c(%esp)
>   1912d7:	89 5c 24 30          	mov    %ebx,0x30(%esp)
>   1912db:	8b 5c 24 2c          	mov    0x2c(%esp),%ebx
>   1912df:	8b 4c 24 30          	mov    0x30(%esp),%ecx
>   1912e3:	f0 0f c7 0e          	lock cmpxchg8b (%esi)
>   1912e7:	0f 85 f3 02 00 00    	jne    1915e0 <...>
> 
> Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Ingo Molnar <mingo@kernel.org>
> Cc: Borislav Petkov <bp@alien8.de>
> Cc: Dave Hansen <dave.hansen@linux.intel.com>
> Cc: "H. Peter Anvin" <hpa@zytor.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> ---
>  arch/x86/include/asm/atomic64_32.h | 44 ++++++++++++------------------
>  1 file changed, 18 insertions(+), 26 deletions(-)
> 
> diff --git a/arch/x86/include/asm/atomic64_32.h b/arch/x86/include/asm/atomic64_32.h
> index 11e817dab44a..84affd7a5d1c 100644
> --- a/arch/x86/include/asm/atomic64_32.h
> +++ b/arch/x86/include/asm/atomic64_32.h
> @@ -201,69 +201,61 @@ static __always_inline s64 arch_atomic64_dec_if_positive(atomic64_t *v)
>  
>  static __always_inline void arch_atomic64_and(s64 i, atomic64_t *v)
>  {
> -	s64 old, c = 0;
> +	s64 val = __READ_ONCE(v->counter);

I reckon it's worth placing this in a helper with a big comment, e.g.

static __always_inline s64 arch_atomic64_read_tearable(atomic64_t *v)
{
	/*
	 * TODO: explain that this might be torn, but it occurs *once*, and can
	 * safely be consumed by atomic64_try_cmpxchg().
	 *
	 * TODO: point to the existing commentary regarding why we use
	 * __READ_ONCE() for KASAN reasons.
	 */
	return __READ_ONCE(v->counter);
}

... and then use that in each of the instances below.

That way the subtlety is clearly documented, and it'd more clearly align with
the x86_64 verions.

Mark.

>  
> -	while ((old = arch_atomic64_cmpxchg(v, c, c & i)) != c)
> -		c = old;
> +	do { } while (!arch_atomic64_try_cmpxchg(v, &val, val & i));
>  }
>  
>  static __always_inline s64 arch_atomic64_fetch_and(s64 i, atomic64_t *v)
>  {
> -	s64 old, c = 0;
> +	s64 val = __READ_ONCE(v->counter);
>  
> -	while ((old = arch_atomic64_cmpxchg(v, c, c & i)) != c)
> -		c = old;
> +	do { } while (!arch_atomic64_try_cmpxchg(v, &val, val & i));
>  
> -	return old;
> +	return val;
>  }
>  #define arch_atomic64_fetch_and arch_atomic64_fetch_and
>  
>  static __always_inline void arch_atomic64_or(s64 i, atomic64_t *v)
>  {
> -	s64 old, c = 0;
> +	s64 val = __READ_ONCE(v->counter);
>  
> -	while ((old = arch_atomic64_cmpxchg(v, c, c | i)) != c)
> -		c = old;
> +	do { } while (!arch_atomic64_try_cmpxchg(v, &val, val | i));
>  }
>  
>  static __always_inline s64 arch_atomic64_fetch_or(s64 i, atomic64_t *v)
>  {
> -	s64 old, c = 0;
> +	s64 val = __READ_ONCE(v->counter);
>  
> -	while ((old = arch_atomic64_cmpxchg(v, c, c | i)) != c)
> -		c = old;
> +	do { } while (!arch_atomic64_try_cmpxchg(v, &val, val | i));
>  
> -	return old;
> +	return val;
>  }
>  #define arch_atomic64_fetch_or arch_atomic64_fetch_or
>  
>  static __always_inline void arch_atomic64_xor(s64 i, atomic64_t *v)
>  {
> -	s64 old, c = 0;
> +	s64 val = __READ_ONCE(v->counter);
>  
> -	while ((old = arch_atomic64_cmpxchg(v, c, c ^ i)) != c)
> -		c = old;
> +	do { } while (!arch_atomic64_try_cmpxchg(v, &val, val ^ i));
>  }
>  
>  static __always_inline s64 arch_atomic64_fetch_xor(s64 i, atomic64_t *v)
>  {
> -	s64 old, c = 0;
> +	s64 val = __READ_ONCE(v->counter);
>  
> -	while ((old = arch_atomic64_cmpxchg(v, c, c ^ i)) != c)
> -		c = old;
> +	do { } while (!arch_atomic64_try_cmpxchg(v, &val, val ^ i));
>  
> -	return old;
> +	return val;
>  }
>  #define arch_atomic64_fetch_xor arch_atomic64_fetch_xor
>  
>  static __always_inline s64 arch_atomic64_fetch_add(s64 i, atomic64_t *v)
>  {
> -	s64 old, c = 0;
> +	s64 val = __READ_ONCE(v->counter);
>  
> -	while ((old = arch_atomic64_cmpxchg(v, c, c + i)) != c)
> -		c = old;
> -
> -	return old;
> +	do { } while (!arch_atomic64_try_cmpxchg(v, &val, val + i));
> +	return val;
>  }
>  #define arch_atomic64_fetch_add arch_atomic64_fetch_add
>  
> -- 
> 2.44.0
> 
> 

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 3/6] locking/atomic/x86: Use READ_ONCE before atomic{,64}_try_cmpxchg loops
  2024-04-09 11:07   ` Mark Rutland
@ 2024-04-09 11:59     ` Uros Bizjak
  0 siblings, 0 replies; 15+ messages in thread
From: Uros Bizjak @ 2024-04-09 11:59 UTC (permalink / raw)
  To: Mark Rutland
  Cc: x86, linux-kernel, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, H. Peter Anvin, Peter Zijlstra

On Tue, Apr 9, 2024 at 1:07 PM Mark Rutland <mark.rutland@arm.com> wrote:
>
> On Tue, Apr 09, 2024 at 12:03:54PM +0200, Uros Bizjak wrote:
> > The value preload before the cmpxchg loop does not need to be atomic,
> > but should use READ_ONCE to prevent compiler from merging, refetching
> > or reordering the read.
> >
>
> Yes, and that's what arch_atomic_read() and arch_atomic64_read() do...
>
> > This patch unifies arch_atomic{,64}_{,fetch}_{and,or,xor}() macros
> > between x86_32 and x86_64 targets.
> >
> > No functional changes intended.
> >
> > Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
> > Cc: Thomas Gleixner <tglx@linutronix.de>
> > Cc: Ingo Molnar <mingo@kernel.org>
> > Cc: Borislav Petkov <bp@alien8.de>
> > Cc: Dave Hansen <dave.hansen@linux.intel.com>
> > Cc: "H. Peter Anvin" <hpa@zytor.com>
> > Cc: Peter Zijlstra <peterz@infradead.org>
> > ---
> >  arch/x86/include/asm/atomic.h      |  8 ++++----
> >  arch/x86/include/asm/atomic64_64.h | 20 ++++++++++----------
> >  2 files changed, 14 insertions(+), 14 deletions(-)
> >
> > diff --git a/arch/x86/include/asm/atomic.h b/arch/x86/include/asm/atomic.h
> > index 55a55ec04350..b166da21ee98 100644
> > --- a/arch/x86/include/asm/atomic.h
> > +++ b/arch/x86/include/asm/atomic.h
> > @@ -20,7 +20,7 @@ static __always_inline int arch_atomic_read(const atomic_t *v)
> >        * Note for KASAN: we deliberately don't use READ_ONCE_NOCHECK() here,
> >        * it's non-inlined function that increases binary size and stack usage.
> >        */
> > -     return __READ_ONCE((v)->counter);
> > +     return __READ_ONCE(v->counter);
>
> Removing the unncessary brackets is fine, but the commit message didn't mention this.

It was just a change in passing. I didn't think it even needed a comment.

> [...]
>
> >  static __always_inline int arch_atomic_fetch_and(int i, atomic_t *v)
> >  {
> > -     int val = arch_atomic_read(v);
> > +     int val = __READ_ONCE(v->counter);
>
> This is the wrong thing to do; arch_atomic_read() already has the required
> semantic, and it more clearly aligns with the use of arch_atomic_try_cmpxchg()
> below. It contains the documentation regarding why we use __READ_ONCE()
> specifically (which we should probably note in arch_atomic64_read()).
>
> Please leave this as-is, and likewise for the other cases below. Similarly, the
> prior patch should use arch_atomic{,_64}_read() rather than using
> __READ_ONCE().

Please note that arch_atomic64_read() implements a true 64-bit atomic read
on x86_32. I tried to bypass this using __READ_ONCE(), but your
suggestion to use arch_atomic64_read_tearable() is indeed a much
better approach.
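
For reference, this is the x86_32 arch_atomic64_read() (it also shows
up in patch 6 below); it goes through the atomic64 alternative, i.e. a
genuinely atomic 64-bit load:

static __always_inline s64 arch_atomic64_read(const atomic64_t *v)
{
	s64 r;
	alternative_atomic64(read, "=&A" (r), "c" (v) : "memory");
	return r;
}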

> [...]
>
> >  static __always_inline s64 arch_atomic64_fetch_and(s64 i, atomic64_t *v)
> >  {
> > -     s64 val = arch_atomic64_read(v);
> > +     s64 val = __READ_ONCE(v->counter);
> > +
> > +     do { } while (!arch_atomic64_try_cmpxchg(v, &val, val & i));
> >
> > -     do {
> > -     } while (!arch_atomic64_try_cmpxchg(v, &val, val & i));
> >       return val;
>
> I think this reformatting is what you meant in the commit message when you said:
>
> | This patch unifies arch_atomic{,64}_{,fetch}_{and,or,xor}() macros
> | between x86_32 and x86_64 targets.

Actually, it was the change to use __READ_ONCE(). The reformatting
follows the functions from atomic.h, e.g. arch_atomic_fetch_and(), and
was another case of a change in passing, not worth mentioning in the
commit message. I will rewrite this in v2 of the patch, so these
functions will uniformly use arch_atomic64_read_tearable().

> Assuming so, can you please jsut do that, and say:
>
>   This patch reformats the x86_64 arch_atomic{,64}_{,fetch}_{and,or,xor}()
>   functions to match the x86_32 versions.

Thanks,
Uros.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/6] locking/atomic/x86: Rewrite x86_32 arch_atomic64_{,fetch}_{and,or,xor}() functions
  2024-04-09 11:13   ` Mark Rutland
@ 2024-04-09 12:03     ` Uros Bizjak
  2024-04-09 12:50       ` Uros Bizjak
  0 siblings, 1 reply; 15+ messages in thread
From: Uros Bizjak @ 2024-04-09 12:03 UTC (permalink / raw)
  To: Mark Rutland
  Cc: x86, linux-kernel, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, H. Peter Anvin, Peter Zijlstra

On Tue, Apr 9, 2024 at 1:13 PM Mark Rutland <mark.rutland@arm.com> wrote:

> >  static __always_inline void arch_atomic64_and(s64 i, atomic64_t *v)
> >  {
> > -     s64 old, c = 0;
> > +     s64 val = __READ_ONCE(v->counter);
>
> I reckon it's worth placing this in a helper with a big comment, e.g.
>
> static __always_inline s64 arch_atomic64_read_tearable(atomic64_t *v)
> {
>         /*
>          * TODO: explain that this might be torn, but it occurs *once*, and can
>          * safely be consumed by atomic64_try_cmpxchg().
>          *
>          * TODO: point to the existing commentary regarding why we use
>          * __READ_ONCE() for KASAN reasons.
>          */
>         return __READ_ONCE(v->counter);
> }
>
> ... and then use that in each of the instances below.
>
> That way the subtlety is clearly documented, and it'd more clearly align with
> the x86_64 verions.

This is an excellent idea. The separate definitions need to be placed
in atomic64_32.h and atomic64_64.h (due to the use of the atomic64_t
typedef), but it will allow the same unification of functions between
x86_32 and x86_64 as the approach with __READ_ONCE().

Thanks,
Uros.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/6] locking/atomic/x86: Rewrite x86_32 arch_atomic64_{,fetch}_{and,or,xor}() functions
  2024-04-09 12:03     ` Uros Bizjak
@ 2024-04-09 12:50       ` Uros Bizjak
  2024-04-09 16:34         ` Mark Rutland
  0 siblings, 1 reply; 15+ messages in thread
From: Uros Bizjak @ 2024-04-09 12:50 UTC (permalink / raw)
  To: Mark Rutland
  Cc: x86, linux-kernel, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, H. Peter Anvin, Peter Zijlstra

On Tue, Apr 9, 2024 at 2:03 PM Uros Bizjak <ubizjak@gmail.com> wrote:
>
> On Tue, Apr 9, 2024 at 1:13 PM Mark Rutland <mark.rutland@arm.com> wrote:
>
> > >  static __always_inline void arch_atomic64_and(s64 i, atomic64_t *v)
> > >  {
> > > -     s64 old, c = 0;
> > > +     s64 val = __READ_ONCE(v->counter);
> >
> > I reckon it's worth placing this in a helper with a big comment, e.g.
> >
> > static __always_inline s64 arch_atomic64_read_tearable(atomic64_t *v)
> > {
> >         /*
> >          * TODO: explain that this might be torn, but it occurs *once*, and can
> >          * safely be consumed by atomic64_try_cmpxchg().
> >          *
> >          * TODO: point to the existing commentary regarding why we use
> >          * __READ_ONCE() for KASAN reasons.
> >          */
> >         return __READ_ONCE(v->counter);
> > }
> >
> > ... and then use that in each of the instances below.
> >
> > That way the subtlety is clearly documented, and it'd more clearly align with
> > the x86_64 verions.
>
> This is an excellent idea. The separate definitions need to be placed
> in atomic64_32.h and atomic64_64.h (due to the use of the atomic64_t
> typedef), but it will allow the same unification of functions between
> x86_32 and x86_64 as the approach with __READ_ONCE().

Something like this:

--cut here--
/*
 * This function is intended to preload the value from atomic64_t
 * location in a non-atomic way. The read might be torn, but can
 * safely be consumed by the compare-and-swap loop.
 */
static __always_inline s64 arch_atomic64_read_tearable(atomic64_t *v)
{
    /*
     * See the comment in arch_atomic_read() on why we use
     * __READ_ONCE() instead of READ_ONCE_NOCHECK() here.
     */
    return __READ_ONCE(v->counter);
}
--cut here--

Thanks,
Uros.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/6] locking/atomic/x86: Rewrite x86_32 arch_atomic64_{,fetch}_{and,or,xor}() functions
  2024-04-09 12:50       ` Uros Bizjak
@ 2024-04-09 16:34         ` Mark Rutland
  2024-04-09 16:39           ` Uros Bizjak
  2024-04-09 16:53           ` Uros Bizjak
  0 siblings, 2 replies; 15+ messages in thread
From: Mark Rutland @ 2024-04-09 16:34 UTC (permalink / raw)
  To: Uros Bizjak
  Cc: x86, linux-kernel, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, H. Peter Anvin, Peter Zijlstra

On Tue, Apr 09, 2024 at 02:50:19PM +0200, Uros Bizjak wrote:
> On Tue, Apr 9, 2024 at 2:03 PM Uros Bizjak <ubizjak@gmail.com> wrote:
> >
> > On Tue, Apr 9, 2024 at 1:13 PM Mark Rutland <mark.rutland@arm.com> wrote:
> >
> > > >  static __always_inline void arch_atomic64_and(s64 i, atomic64_t *v)
> > > >  {
> > > > -     s64 old, c = 0;
> > > > +     s64 val = __READ_ONCE(v->counter);
> > >
> > > I reckon it's worth placing this in a helper with a big comment, e.g.
> > >
> > > static __always_inline s64 arch_atomic64_read_tearable(atomic64_t *v)
> > > {
> > >         /*
> > >          * TODO: explain that this might be torn, but it occurs *once*, and can
> > >          * safely be consumed by atomic64_try_cmpxchg().
> > >          *
> > >          * TODO: point to the existing commentary regarding why we use
> > >          * __READ_ONCE() for KASAN reasons.
> > >          */
> > >         return __READ_ONCE(v->counter);
> > > }
> > >
> > > ... and then use that in each of the instances below.
> > >
> > > That way the subtlety is clearly documented, and it'd more clearly align with
> > > the x86_64 verions.
> >
> > This is an excellent idea. The separate definitions need to be placed
> > in atomic64_32.h and atomic64_64.h (due to the use of the atomic64_t
> > typedef), but it will allow the same unification of functions between
> > x86_32 and x86_64 as the approach with __READ_ONCE().
> 
> Something like this:
> 
> --cut here--
> /*
>  * This function is intended to preload the value from atomic64_t
>  * location in a non-atomic way. The read might be torn, but can
>  * safely be consumed by the compare-and-swap loop.
>  */
> static __always_inline s64 arch_atomic64_read_tearable(atomic64_t *v)
> {
>     /*
>      * See the comment in arch_atomic_read() on why we use
>      * __READ_ONCE() instead of READ_ONCE_NOCHECK() here.
>      */
>     return __READ_ONCE(v->counter);
> }
> --cut here--
> 
> Thanks,
> Uros.

Yeah, something of that shape.

Having thought for a bit longer, it's probably better to use '_torn' rather
than '_tearable' (i.e. name this arch_atomic64_read_torn()).

It'd be nice if we could specify the usage restrictions a bit more clearly,
since this can only be used for compare-and-swap loops that implement
unconditional atomics. (e.g. arch_atomic64_and(), but not
arch_atomic_add_unless()).

So I'd suggest:

/*
 * Read an atomic64_t non-atomically.
 *
 * This is intended to be used in cases where a subsequent atomic operation
 * will handle the torn value, and can be used to prime the first iteration of
 * unconditional try_cmpxchg() loops, e.g.
 *
 * 	s64 val = arch_atomic64_read_torn(v);
 * 	do { } while (!arch_atomic64_try_cmpxchg(v, &val, val OP i));
 *
 * This is NOT safe to use where the value is not always checked by a
 * subsequent atomic operation, such as in conditional try_cmpxchg() loops that
 * can break before the atomic, e.g.
 *
 * 	s64 val = arch_atomic64_read_torn(v);
 * 	do {
 * 		if (condition(val))
 * 			break;
 * 	} while (!arch_atomic64_try_cmpxchg(v, &val, val OP i));
 */
static __always_inline s64 arch_atomic64_read_torn(atomic64_t *v)
{
    /* See comment in arch_atomic_read() */
    return __READ_ONCE(v->counter);
}

Mark.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/6] locking/atomic/x86: Rewrite x86_32 arch_atomic64_{,fetch}_{and,or,xor}() functions
  2024-04-09 16:34         ` Mark Rutland
@ 2024-04-09 16:39           ` Uros Bizjak
  2024-04-09 16:53           ` Uros Bizjak
  1 sibling, 0 replies; 15+ messages in thread
From: Uros Bizjak @ 2024-04-09 16:39 UTC (permalink / raw)
  To: Mark Rutland
  Cc: x86, linux-kernel, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, H. Peter Anvin, Peter Zijlstra

On Tue, Apr 9, 2024 at 6:34 PM Mark Rutland <mark.rutland@arm.com> wrote:

> > > > ... and then use that in each of the instances below.
> > > >
> > > > That way the subtlety is clearly documented, and it'd more clearly align with
> > > > the x86_64 verions.
> > >
> > > This is an excellent idea. The separate definitions needs to be placed
> > > in atomic64_32.h and atomic_64_64.h (due to use of atomic64_t
> > > typedef), but it will allow the same unification of functions between
> > > x64_32 and x64_64 as the approach with __READ_ONCE().
> >
> > Something like this:
> >
> > --cut here--
> > /*
> >  * This function is intended to preload the value from atomic64_t
> >  * location in a non-atomic way. The read might be torn, but can
> >  * safely be consumed by the compare-and-swap loop.
> >  */
> > static __always_inline s64 arch_atomic64_read_tearable(atomic64_t *v)
> > {
> >     /*
> >      * See the comment in arch_atomic_read() on why we use
> >      * __READ_ONCE() instead of READ_ONCE_NOCHECK() here.
> >      */
> >     return __READ_ONCE(v->counter);
> > }
> > --cut here--
> >
> > Thanks,
> > Uros.
>
> Yeah, something of that shape.
>
> Having thought for a bit longer, it's probably better to use '_torn' rather
> than '_tearable' (i.e. name this arch_atomic64_read_torn()).
>
> It'd be nice if we could specify the usage restrictions a bit more clearly,
> since this can only be used for compare-and-swap loops that implement
> unconditional atomics. (e.g. arch_atomic64_and(), but not
> arch_atomic_add_unless()).
>
> So I'd suggest:

Eh, I just sent a v2 a second before I received your mail. I'll respin
the patchset tomorrow to include your suggested text. Please note that
the v2 patch set avoids all cosmetic changes.

Thanks,
Uros.

>
> /*
>  * Read an atomic64_t non-atomically.
>  *
>  * This is intended to be used in cases where a subsequent atomic operation
>  * will handle the torn value, and can be used to prime the first iteration of
>  * unconditional try_cmpxchg() loops, e.g.
>  *
>  *      s64 val = arch_atomic64_read_torn(v);
>  *      do { } while (!arch_atomic64_try_cmpxchg(v, &val, val OP i));
>  *
>  * This is NOT safe to use where the value is not always checked by a
>  * subsequent atomic operation, such as in conditional try_cmpxchg() loops that
>  * can break before the atomic, e.g.
>  *
>  *      s64 val = arch_atomic64_read_torn(v);
>  *      do {
>  *              if (condition(val))
>  *                      break;
>  *      } while (!arch_atomic64_try_cmpxchg(v, &val, val OP i));
>  */
> static __always_inline s64 arch_atomic64_read_torn(atomic64_t *v)
> {
>     /* See comment in arch_atomic_read() */
>     return __READ_ONCE(v->counter);
> }
>
> Mark.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/6] locking/atomic/x86: Rewrite x86_32 arch_atomic64_{,fetch}_{and,or,xor}() functions
  2024-04-09 16:34         ` Mark Rutland
  2024-04-09 16:39           ` Uros Bizjak
@ 2024-04-09 16:53           ` Uros Bizjak
  1 sibling, 0 replies; 15+ messages in thread
From: Uros Bizjak @ 2024-04-09 16:53 UTC (permalink / raw)
  To: Mark Rutland
  Cc: x86, linux-kernel, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, H. Peter Anvin, Peter Zijlstra

On Tue, Apr 9, 2024 at 6:34 PM Mark Rutland <mark.rutland@arm.com> wrote:

> > Something like this:
> >
> > --cut here--
> > /*
> >  * This function is intended to preload the value from atomic64_t
> >  * location in a non-atomic way. The read might be torn, but can
> >  * safely be consumed by the compare-and-swap loop.
> >  */
> > static __always_inline s64 arch_atomic64_read_tearable(atomic64_t *v)
> > {
> >     /*
> >      * See the comment in arch_atomic_read() on why we use
> >      * __READ_ONCE() instead of READ_ONCE_NOCHECK() here.
> >      */
> >     return __READ_ONCE(v->counter);
> > }
> > --cut here--
>
> Yeah, something of that shape.
>
> Having thought for a bit longer, it's probably better to use '_torn' rather
> than '_tearable' (i.e. name this arch_atomic64_read_torn()).

How about we simply name the function

arch_atomic64_read_nonatomic()

in the sense that it reads atomic64_t variables in a non-atomic way?
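
I.e. something along these lines (just a sketch that combines your
suggested comment with the new name, not the final version):

--cut here--
/*
 * Read an atomic64_t non-atomically.
 *
 * This is intended to be used in cases where a subsequent atomic operation
 * will handle the torn value, and can be used to prime the first iteration
 * of unconditional try_cmpxchg() loops, e.g.:
 *
 *	s64 val = arch_atomic64_read_nonatomic(v);
 *	do { } while (!arch_atomic64_try_cmpxchg(v, &val, val OP i));
 */
static __always_inline s64 arch_atomic64_read_nonatomic(const atomic64_t *v)
{
	/* See the comment in arch_atomic_read() on why we use __READ_ONCE(). */
	return __READ_ONCE(v->counter);
}
--cut here--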

Uros.

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread

Thread overview: 15+ messages
2024-04-09 10:03 [PATCH 0/6] locking/atomic/x86: Improve arch_atomic*() family of functions Uros Bizjak
2024-04-09 10:03 ` [PATCH 1/6] locking/atomic/x86: Introduce arch_atomic64_try_cmpxchg to x86_32 Uros Bizjak
2024-04-09 10:03 ` [PATCH 2/6] locking/atomic/x86: Rewrite x86_32 arch_atomic64_{,fetch}_{and,or,xor}() functions Uros Bizjak
2024-04-09 11:13   ` Mark Rutland
2024-04-09 12:03     ` Uros Bizjak
2024-04-09 12:50       ` Uros Bizjak
2024-04-09 16:34         ` Mark Rutland
2024-04-09 16:39           ` Uros Bizjak
2024-04-09 16:53           ` Uros Bizjak
2024-04-09 10:03 ` [PATCH 3/6] locking/atomic/x86: Use READ_ONCE before atomic{,64}_try_cmpxchg loops Uros Bizjak
2024-04-09 11:07   ` Mark Rutland
2024-04-09 11:59     ` Uros Bizjak
2024-04-09 10:03 ` [PATCH 4/6] locking/atomic/x86: Merge x86_32 and x86_64 arch_atomic64_fetch_{and,or,xor}() functions Uros Bizjak
2024-04-09 10:03 ` [PATCH 5/6] locking/atomic/x86: Define arch_atomic_sub() family using arch_atomic_add() functions Uros Bizjak
2024-04-09 10:03 ` [PATCH 6/6] locking/atomic/x86: Reorder a couple of arch_atomic64 functions Uros Bizjak
