* [PATCH 0/6] locking/atomic/x86: Improve arch_atomic*() family of functions
@ 2024-04-09 10:03 Uros Bizjak
2024-04-09 10:03 ` [PATCH 1/6] locking/atomic/x86: Introduce arch_atomic64_try_cmpxchg to x86_32 Uros Bizjak
` (5 more replies)
0 siblings, 6 replies; 15+ messages in thread
From: Uros Bizjak @ 2024-04-09 10:03 UTC (permalink / raw)
To: x86, linux-kernel
Cc: Uros Bizjak, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, H. Peter Anvin, Peter Zijlstra
The following patch series improves the x86 arch_atomic*()
family of functions and merges the x86_32 and x86_64
arch_atomic64_fetch_{and,or,xor}() functions.
The series enables impressive assembly code
reductions for the x86_32 target and lowers the future maintenance
burden and technical debt of the source code by unifying
several functions between the x86_32 and x86_64 targets.
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Uros Bizjak (6):
locking/atomic/x86: Introduce arch_atomic64_try_cmpxchg to x86_32
locking/atomic/x86: Rewrite x86_32
arch_atomic64_{,fetch}_{and,or,xor}() functions
locking/atomic/x86: Use READ_ONCE before atomic{,64}_try_cmpxchg loops
locking/atomic/x86: Merge x86_32 and x86_64
arch_atomic64_fetch_{and,or,xor}() functions
locking/atomic/x86: Define arch_atomic_sub() family using
arch_atomic_add() functions
locking/atomic/x86: Reorder a couple of arch_atomic64 functions
arch/x86/include/asm/atomic.h | 50 ++++++++++----
arch/x86/include/asm/atomic64_32.h | 103 ++++++++++-------------------
arch/x86/include/asm/atomic64_64.h | 44 +-----------
3 files changed, 75 insertions(+), 122 deletions(-)
--
2.44.0
^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH 1/6] locking/atomic/x86: Introduce arch_atomic64_try_cmpxchg to x86_32
2024-04-09 10:03 [PATCH 0/6] locking/atomic/x86: Improve arch_atomic*() family of functions Uros Bizjak
@ 2024-04-09 10:03 ` Uros Bizjak
2024-04-09 10:03 ` [PATCH 2/6] locking/atomic/x86: Rewrite x86_32 arch_atomic64_{,fetch}_{and,or,xor}() functions Uros Bizjak
` (4 subsequent siblings)
5 siblings, 0 replies; 15+ messages in thread
From: Uros Bizjak @ 2024-04-09 10:03 UTC (permalink / raw)
To: x86, linux-kernel
Cc: Uros Bizjak, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, H. Peter Anvin, Peter Zijlstra
Introduce arch_atomic64_try_cmpxchg() for 32-bit targets to use
an optimized target-specific implementation instead of the generic one.
This implementation eliminates the dual-word compare after the
cmpxchg8b instruction and improves the generated asm code from:
2273: f0 0f c7 0f lock cmpxchg8b (%edi)
2277: 8b 74 24 2c mov 0x2c(%esp),%esi
227b: 89 d3 mov %edx,%ebx
227d: 89 c2 mov %eax,%edx
227f: 89 5c 24 10 mov %ebx,0x10(%esp)
2283: 8b 7c 24 30 mov 0x30(%esp),%edi
2287: 89 44 24 1c mov %eax,0x1c(%esp)
228b: 31 f2 xor %esi,%edx
228d: 89 d0 mov %edx,%eax
228f: 89 da mov %ebx,%edx
2291: 31 fa xor %edi,%edx
2293: 09 d0 or %edx,%eax
2295: 0f 85 a5 00 00 00 jne 2340 <...>
to:
2270: f0 0f c7 0f lock cmpxchg8b (%edi)
2274: 0f 85 a6 00 00 00 jne 2320 <...>
Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Peter Zijlstra <peterz@infradead.org>
---
arch/x86/include/asm/atomic64_32.h | 10 ++++++++--
1 file changed, 8 insertions(+), 2 deletions(-)
diff --git a/arch/x86/include/asm/atomic64_32.h b/arch/x86/include/asm/atomic64_32.h
index d510405e4e1d..11e817dab44a 100644
--- a/arch/x86/include/asm/atomic64_32.h
+++ b/arch/x86/include/asm/atomic64_32.h
@@ -61,12 +61,18 @@ ATOMIC64_DECL(add_unless);
#undef __ATOMIC64_DECL
#undef ATOMIC64_EXPORT
-static __always_inline s64 arch_atomic64_cmpxchg(atomic64_t *v, s64 o, s64 n)
+static __always_inline s64 arch_atomic64_cmpxchg(atomic64_t *v, s64 old, s64 new)
{
- return arch_cmpxchg64(&v->counter, o, n);
+ return arch_cmpxchg64(&v->counter, old, new);
}
#define arch_atomic64_cmpxchg arch_atomic64_cmpxchg
+static __always_inline bool arch_atomic64_try_cmpxchg(atomic64_t *v, s64 *old, s64 new)
+{
+ return arch_try_cmpxchg64(&v->counter, old, new);
+}
+#define arch_atomic64_try_cmpxchg arch_atomic64_try_cmpxchg
+
static __always_inline s64 arch_atomic64_xchg(atomic64_t *v, s64 n)
{
s64 o;
--
2.44.0
* [PATCH 2/6] locking/atomic/x86: Rewrite x86_32 arch_atomic64_{,fetch}_{and,or,xor}() functions
2024-04-09 10:03 [PATCH 0/6] locking/atomic/x86: Improve arch_atomic*() family of functions Uros Bizjak
2024-04-09 10:03 ` [PATCH 1/6] locking/atomic/x86: Introduce arch_atomic64_try_cmpxchg to x86_32 Uros Bizjak
@ 2024-04-09 10:03 ` Uros Bizjak
2024-04-09 11:13 ` Mark Rutland
2024-04-09 10:03 ` [PATCH 3/6] locking/atomic/x86: Use READ_ONCE before atomic{,64}_try_cmpxchg loops Uros Bizjak
` (3 subsequent siblings)
5 siblings, 1 reply; 15+ messages in thread
From: Uros Bizjak @ 2024-04-09 10:03 UTC (permalink / raw)
To: x86, linux-kernel
Cc: Uros Bizjak, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, H. Peter Anvin, Peter Zijlstra
Rewrite the x86_32 arch_atomic64_{,fetch}_{and,or,xor}() functions
to use arch_atomic64_try_cmpxchg(). This implementation avoids one
extra trip through the cmpxchg loop.
The value preload before the cmpxchg loop does not need to be atomic,
but should use READ_ONCE to prevent the compiler from merging,
refetching or reordering the read.
The generated code improves from:
1917d5: 31 c9 xor %ecx,%ecx
1917d7: 31 db xor %ebx,%ebx
1917d9: 89 4c 24 3c mov %ecx,0x3c(%esp)
1917dd: 8b 74 24 24 mov 0x24(%esp),%esi
1917e1: 89 c8 mov %ecx,%eax
1917e3: 89 5c 24 34 mov %ebx,0x34(%esp)
1917e7: 8b 7c 24 28 mov 0x28(%esp),%edi
1917eb: 21 ce and %ecx,%esi
1917ed: 89 74 24 4c mov %esi,0x4c(%esp)
1917f1: 21 df and %ebx,%edi
1917f3: 89 de mov %ebx,%esi
1917f5: 89 7c 24 50 mov %edi,0x50(%esp)
1917f9: 8b 54 24 4c mov 0x4c(%esp),%edx
1917fd: 8b 7c 24 2c mov 0x2c(%esp),%edi
191801: 8b 4c 24 50 mov 0x50(%esp),%ecx
191805: 89 d3 mov %edx,%ebx
191807: 89 f2 mov %esi,%edx
191809: f0 0f c7 0f lock cmpxchg8b (%edi)
19180d: 89 c1 mov %eax,%ecx
19180f: 8b 74 24 34 mov 0x34(%esp),%esi
191813: 89 d3 mov %edx,%ebx
191815: 89 44 24 4c mov %eax,0x4c(%esp)
191819: 8b 44 24 3c mov 0x3c(%esp),%eax
19181d: 89 df mov %ebx,%edi
19181f: 89 54 24 44 mov %edx,0x44(%esp)
191823: 89 ca mov %ecx,%edx
191825: 31 de xor %ebx,%esi
191827: 31 c8 xor %ecx,%eax
191829: 09 f0 or %esi,%eax
19182b: 75 ac jne 1917d9 <...>
to:
1912ba: 8b 06 mov (%esi),%eax
1912bc: 8b 56 04 mov 0x4(%esi),%edx
1912bf: 89 44 24 3c mov %eax,0x3c(%esp)
1912c3: 89 c1 mov %eax,%ecx
1912c5: 23 4c 24 34 and 0x34(%esp),%ecx
1912c9: 89 d3 mov %edx,%ebx
1912cb: 23 5c 24 38 and 0x38(%esp),%ebx
1912cf: 89 54 24 40 mov %edx,0x40(%esp)
1912d3: 89 4c 24 2c mov %ecx,0x2c(%esp)
1912d7: 89 5c 24 30 mov %ebx,0x30(%esp)
1912db: 8b 5c 24 2c mov 0x2c(%esp),%ebx
1912df: 8b 4c 24 30 mov 0x30(%esp),%ecx
1912e3: f0 0f c7 0e lock cmpxchg8b (%esi)
1912e7: 0f 85 f3 02 00 00 jne 1915e0 <...>
Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Peter Zijlstra <peterz@infradead.org>
---
arch/x86/include/asm/atomic64_32.h | 44 ++++++++++++------------------
1 file changed, 18 insertions(+), 26 deletions(-)
diff --git a/arch/x86/include/asm/atomic64_32.h b/arch/x86/include/asm/atomic64_32.h
index 11e817dab44a..84affd7a5d1c 100644
--- a/arch/x86/include/asm/atomic64_32.h
+++ b/arch/x86/include/asm/atomic64_32.h
@@ -201,69 +201,61 @@ static __always_inline s64 arch_atomic64_dec_if_positive(atomic64_t *v)
static __always_inline void arch_atomic64_and(s64 i, atomic64_t *v)
{
- s64 old, c = 0;
+ s64 val = __READ_ONCE(v->counter);
- while ((old = arch_atomic64_cmpxchg(v, c, c & i)) != c)
- c = old;
+ do { } while (!arch_atomic64_try_cmpxchg(v, &val, val & i));
}
static __always_inline s64 arch_atomic64_fetch_and(s64 i, atomic64_t *v)
{
- s64 old, c = 0;
+ s64 val = __READ_ONCE(v->counter);
- while ((old = arch_atomic64_cmpxchg(v, c, c & i)) != c)
- c = old;
+ do { } while (!arch_atomic64_try_cmpxchg(v, &val, val & i));
- return old;
+ return val;
}
#define arch_atomic64_fetch_and arch_atomic64_fetch_and
static __always_inline void arch_atomic64_or(s64 i, atomic64_t *v)
{
- s64 old, c = 0;
+ s64 val = __READ_ONCE(v->counter);
- while ((old = arch_atomic64_cmpxchg(v, c, c | i)) != c)
- c = old;
+ do { } while (!arch_atomic64_try_cmpxchg(v, &val, val | i));
}
static __always_inline s64 arch_atomic64_fetch_or(s64 i, atomic64_t *v)
{
- s64 old, c = 0;
+ s64 val = __READ_ONCE(v->counter);
- while ((old = arch_atomic64_cmpxchg(v, c, c | i)) != c)
- c = old;
+ do { } while (!arch_atomic64_try_cmpxchg(v, &val, val | i));
- return old;
+ return val;
}
#define arch_atomic64_fetch_or arch_atomic64_fetch_or
static __always_inline void arch_atomic64_xor(s64 i, atomic64_t *v)
{
- s64 old, c = 0;
+ s64 val = __READ_ONCE(v->counter);
- while ((old = arch_atomic64_cmpxchg(v, c, c ^ i)) != c)
- c = old;
+ do { } while (!arch_atomic64_try_cmpxchg(v, &val, val ^ i));
}
static __always_inline s64 arch_atomic64_fetch_xor(s64 i, atomic64_t *v)
{
- s64 old, c = 0;
+ s64 val = __READ_ONCE(v->counter);
- while ((old = arch_atomic64_cmpxchg(v, c, c ^ i)) != c)
- c = old;
+ do { } while (!arch_atomic64_try_cmpxchg(v, &val, val ^ i));
- return old;
+ return val;
}
#define arch_atomic64_fetch_xor arch_atomic64_fetch_xor
static __always_inline s64 arch_atomic64_fetch_add(s64 i, atomic64_t *v)
{
- s64 old, c = 0;
+ s64 val = __READ_ONCE(v->counter);
- while ((old = arch_atomic64_cmpxchg(v, c, c + i)) != c)
- c = old;
-
- return old;
+ do { } while (!arch_atomic64_try_cmpxchg(v, &val, val + i));
+ return val;
}
#define arch_atomic64_fetch_add arch_atomic64_fetch_add
--
2.44.0
* [PATCH 3/6] locking/atomic/x86: Use READ_ONCE before atomic{,64}_try_cmpxchg loops
2024-04-09 10:03 [PATCH 0/6] locking/atomic/x86: Improve arch_atomic*() family of functions Uros Bizjak
2024-04-09 10:03 ` [PATCH 1/6] locking/atomic/x86: Introduce arch_atomic64_try_cmpxchg to x86_32 Uros Bizjak
2024-04-09 10:03 ` [PATCH 2/6] locking/atomic/x86: Rewrite x86_32 arch_atomic64_{,fetch}_{and,or,xor}() functions Uros Bizjak
@ 2024-04-09 10:03 ` Uros Bizjak
2024-04-09 11:07 ` Mark Rutland
2024-04-09 10:03 ` [PATCH 4/6] locking/atomic/x86: Merge x86_32 and x86_64 arch_atomic64_fetch_{and,or,xor}() functions Uros Bizjak
` (2 subsequent siblings)
5 siblings, 1 reply; 15+ messages in thread
From: Uros Bizjak @ 2024-04-09 10:03 UTC (permalink / raw)
To: x86, linux-kernel
Cc: Uros Bizjak, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, H. Peter Anvin, Peter Zijlstra
The value preload before the cmpxchg loop does not need to be atomic,
but should use READ_ONCE to prevent the compiler from merging,
refetching or reordering the read.
This patch unifies arch_atomic{,64}_{,fetch}_{and,or,xor}() macros
between x86_32 and x86_64 targets.
No functional changes intended.
Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Peter Zijlstra <peterz@infradead.org>
---
arch/x86/include/asm/atomic.h | 8 ++++----
arch/x86/include/asm/atomic64_64.h | 20 ++++++++++----------
2 files changed, 14 insertions(+), 14 deletions(-)
diff --git a/arch/x86/include/asm/atomic.h b/arch/x86/include/asm/atomic.h
index 55a55ec04350..b166da21ee98 100644
--- a/arch/x86/include/asm/atomic.h
+++ b/arch/x86/include/asm/atomic.h
@@ -20,7 +20,7 @@ static __always_inline int arch_atomic_read(const atomic_t *v)
* Note for KASAN: we deliberately don't use READ_ONCE_NOCHECK() here,
* it's non-inlined function that increases binary size and stack usage.
*/
- return __READ_ONCE((v)->counter);
+ return __READ_ONCE(v->counter);
}
static __always_inline void arch_atomic_set(atomic_t *v, int i)
@@ -132,7 +132,7 @@ static __always_inline void arch_atomic_and(int i, atomic_t *v)
static __always_inline int arch_atomic_fetch_and(int i, atomic_t *v)
{
- int val = arch_atomic_read(v);
+ int val = __READ_ONCE(v->counter);
do { } while (!arch_atomic_try_cmpxchg(v, &val, val & i));
@@ -150,7 +150,7 @@ static __always_inline void arch_atomic_or(int i, atomic_t *v)
static __always_inline int arch_atomic_fetch_or(int i, atomic_t *v)
{
- int val = arch_atomic_read(v);
+ int val = __READ_ONCE(v->counter);
do { } while (!arch_atomic_try_cmpxchg(v, &val, val | i));
@@ -168,7 +168,7 @@ static __always_inline void arch_atomic_xor(int i, atomic_t *v)
static __always_inline int arch_atomic_fetch_xor(int i, atomic_t *v)
{
- int val = arch_atomic_read(v);
+ int val = __READ_ONCE(v->counter);
do { } while (!arch_atomic_try_cmpxchg(v, &val, val ^ i));
diff --git a/arch/x86/include/asm/atomic64_64.h b/arch/x86/include/asm/atomic64_64.h
index 3165c0feedf7..e7b12a48fecb 100644
--- a/arch/x86/include/asm/atomic64_64.h
+++ b/arch/x86/include/asm/atomic64_64.h
@@ -12,7 +12,7 @@
static __always_inline s64 arch_atomic64_read(const atomic64_t *v)
{
- return __READ_ONCE((v)->counter);
+ return __READ_ONCE(v->counter);
}
static __always_inline void arch_atomic64_set(atomic64_t *v, s64 i)
@@ -126,10 +126,10 @@ static __always_inline void arch_atomic64_and(s64 i, atomic64_t *v)
static __always_inline s64 arch_atomic64_fetch_and(s64 i, atomic64_t *v)
{
- s64 val = arch_atomic64_read(v);
+ s64 val = __READ_ONCE(v->counter);
+
+ do { } while (!arch_atomic64_try_cmpxchg(v, &val, val & i));
- do {
- } while (!arch_atomic64_try_cmpxchg(v, &val, val & i));
return val;
}
#define arch_atomic64_fetch_and arch_atomic64_fetch_and
@@ -144,10 +144,10 @@ static __always_inline void arch_atomic64_or(s64 i, atomic64_t *v)
static __always_inline s64 arch_atomic64_fetch_or(s64 i, atomic64_t *v)
{
- s64 val = arch_atomic64_read(v);
+ s64 val = __READ_ONCE(v->counter);
+
+ do { } while (!arch_atomic64_try_cmpxchg(v, &val, val | i));
- do {
- } while (!arch_atomic64_try_cmpxchg(v, &val, val | i));
return val;
}
#define arch_atomic64_fetch_or arch_atomic64_fetch_or
@@ -162,10 +162,10 @@ static __always_inline void arch_atomic64_xor(s64 i, atomic64_t *v)
static __always_inline s64 arch_atomic64_fetch_xor(s64 i, atomic64_t *v)
{
- s64 val = arch_atomic64_read(v);
+ s64 val = __READ_ONCE(v->counter);
+
+ do { } while (!arch_atomic64_try_cmpxchg(v, &val, val ^ i));
- do {
- } while (!arch_atomic64_try_cmpxchg(v, &val, val ^ i));
return val;
}
#define arch_atomic64_fetch_xor arch_atomic64_fetch_xor
--
2.44.0
* [PATCH 4/6] locking/atomic/x86: Merge x86_32 and x86_64 arch_atomic64_fetch_{and,or,xor}() functions
2024-04-09 10:03 [PATCH 0/6] locking/atomic/x86: Improve arch_atomic*() family of functions Uros Bizjak
` (2 preceding siblings ...)
2024-04-09 10:03 ` [PATCH 3/6] locking/atomic/x86: Use READ_ONCE before atomic{,64}_try_cmpxchg loops Uros Bizjak
@ 2024-04-09 10:03 ` Uros Bizjak
2024-04-09 10:03 ` [PATCH 5/6] locking/atomic/x86: Define arch_atomic_sub() family using arch_atomic_add() functions Uros Bizjak
2024-04-09 10:03 ` [PATCH 6/6] locking/atomic/x86: Reorder a couple of arch_atomic64 functions Uros Bizjak
5 siblings, 0 replies; 15+ messages in thread
From: Uros Bizjak @ 2024-04-09 10:03 UTC (permalink / raw)
To: x86, linux-kernel
Cc: Uros Bizjak, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, H. Peter Anvin, Peter Zijlstra
Move the identical definitions of arch_atomic64_fetch_{and,or,xor}()
from arch/x86/include/asm/atomic64_32.h and
arch/x86/include/asm/atomic64_64.h to a common place in
arch/x86/include/asm/atomic.h.
No functional changes intended.
Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Peter Zijlstra <peterz@infradead.org>
---
arch/x86/include/asm/atomic.h | 30 ++++++++++++++++++++++++++++++
arch/x86/include/asm/atomic64_32.h | 30 ------------------------------
arch/x86/include/asm/atomic64_64.h | 30 ------------------------------
3 files changed, 30 insertions(+), 60 deletions(-)
diff --git a/arch/x86/include/asm/atomic.h b/arch/x86/include/asm/atomic.h
index b166da21ee98..b2e44de36934 100644
--- a/arch/x86/include/asm/atomic.h
+++ b/arch/x86/include/asm/atomic.h
@@ -182,4 +182,34 @@ static __always_inline int arch_atomic_fetch_xor(int i, atomic_t *v)
# include <asm/atomic64_64.h>
#endif
+static __always_inline s64 arch_atomic64_fetch_and(s64 i, atomic64_t *v)
+{
+ s64 val = __READ_ONCE(v->counter);
+
+ do { } while (!arch_atomic64_try_cmpxchg(v, &val, val & i));
+
+ return val;
+}
+#define arch_atomic64_fetch_and arch_atomic64_fetch_and
+
+static __always_inline s64 arch_atomic64_fetch_or(s64 i, atomic64_t *v)
+{
+ s64 val = __READ_ONCE(v->counter);
+
+ do { } while (!arch_atomic64_try_cmpxchg(v, &val, val | i));
+
+ return val;
+}
+#define arch_atomic64_fetch_or arch_atomic64_fetch_or
+
+static __always_inline s64 arch_atomic64_fetch_xor(s64 i, atomic64_t *v)
+{
+ s64 val = __READ_ONCE(v->counter);
+
+ do { } while (!arch_atomic64_try_cmpxchg(v, &val, val ^ i));
+
+ return val;
+}
+#define arch_atomic64_fetch_xor arch_atomic64_fetch_xor
+
#endif /* _ASM_X86_ATOMIC_H */
diff --git a/arch/x86/include/asm/atomic64_32.h b/arch/x86/include/asm/atomic64_32.h
index 84affd7a5d1c..4f79198da98e 100644
--- a/arch/x86/include/asm/atomic64_32.h
+++ b/arch/x86/include/asm/atomic64_32.h
@@ -206,16 +206,6 @@ static __always_inline void arch_atomic64_and(s64 i, atomic64_t *v)
do { } while (!arch_atomic64_try_cmpxchg(v, &val, val & i));
}
-static __always_inline s64 arch_atomic64_fetch_and(s64 i, atomic64_t *v)
-{
- s64 val = __READ_ONCE(v->counter);
-
- do { } while (!arch_atomic64_try_cmpxchg(v, &val, val & i));
-
- return val;
-}
-#define arch_atomic64_fetch_and arch_atomic64_fetch_and
-
static __always_inline void arch_atomic64_or(s64 i, atomic64_t *v)
{
s64 val = __READ_ONCE(v->counter);
@@ -223,16 +213,6 @@ static __always_inline void arch_atomic64_or(s64 i, atomic64_t *v)
do { } while (!arch_atomic64_try_cmpxchg(v, &val, val | i));
}
-static __always_inline s64 arch_atomic64_fetch_or(s64 i, atomic64_t *v)
-{
- s64 val = __READ_ONCE(v->counter);
-
- do { } while (!arch_atomic64_try_cmpxchg(v, &val, val | i));
-
- return val;
-}
-#define arch_atomic64_fetch_or arch_atomic64_fetch_or
-
static __always_inline void arch_atomic64_xor(s64 i, atomic64_t *v)
{
s64 val = __READ_ONCE(v->counter);
@@ -240,16 +220,6 @@ static __always_inline void arch_atomic64_xor(s64 i, atomic64_t *v)
do { } while (!arch_atomic64_try_cmpxchg(v, &val, val ^ i));
}
-static __always_inline s64 arch_atomic64_fetch_xor(s64 i, atomic64_t *v)
-{
- s64 val = __READ_ONCE(v->counter);
-
- do { } while (!arch_atomic64_try_cmpxchg(v, &val, val ^ i));
-
- return val;
-}
-#define arch_atomic64_fetch_xor arch_atomic64_fetch_xor
-
static __always_inline s64 arch_atomic64_fetch_add(s64 i, atomic64_t *v)
{
s64 val = __READ_ONCE(v->counter);
diff --git a/arch/x86/include/asm/atomic64_64.h b/arch/x86/include/asm/atomic64_64.h
index e7b12a48fecb..b2c9974ba971 100644
--- a/arch/x86/include/asm/atomic64_64.h
+++ b/arch/x86/include/asm/atomic64_64.h
@@ -124,16 +124,6 @@ static __always_inline void arch_atomic64_and(s64 i, atomic64_t *v)
: "memory");
}
-static __always_inline s64 arch_atomic64_fetch_and(s64 i, atomic64_t *v)
-{
- s64 val = __READ_ONCE(v->counter);
-
- do { } while (!arch_atomic64_try_cmpxchg(v, &val, val & i));
-
- return val;
-}
-#define arch_atomic64_fetch_and arch_atomic64_fetch_and
-
static __always_inline void arch_atomic64_or(s64 i, atomic64_t *v)
{
asm volatile(LOCK_PREFIX "orq %1,%0"
@@ -142,16 +132,6 @@ static __always_inline void arch_atomic64_or(s64 i, atomic64_t *v)
: "memory");
}
-static __always_inline s64 arch_atomic64_fetch_or(s64 i, atomic64_t *v)
-{
- s64 val = __READ_ONCE(v->counter);
-
- do { } while (!arch_atomic64_try_cmpxchg(v, &val, val | i));
-
- return val;
-}
-#define arch_atomic64_fetch_or arch_atomic64_fetch_or
-
static __always_inline void arch_atomic64_xor(s64 i, atomic64_t *v)
{
asm volatile(LOCK_PREFIX "xorq %1,%0"
@@ -160,14 +140,4 @@ static __always_inline void arch_atomic64_xor(s64 i, atomic64_t *v)
: "memory");
}
-static __always_inline s64 arch_atomic64_fetch_xor(s64 i, atomic64_t *v)
-{
- s64 val = __READ_ONCE(v->counter);
-
- do { } while (!arch_atomic64_try_cmpxchg(v, &val, val ^ i));
-
- return val;
-}
-#define arch_atomic64_fetch_xor arch_atomic64_fetch_xor
-
#endif /* _ASM_X86_ATOMIC64_64_H */
--
2.44.0
* [PATCH 5/6] locking/atomic/x86: Define arch_atomic_sub() family using arch_atomic_add() functions
2024-04-09 10:03 [PATCH 0/6] locking/atomic/x86: Improve arch_atomic*() family of functions Uros Bizjak
` (3 preceding siblings ...)
2024-04-09 10:03 ` [PATCH 4/6] locking/atomic/x86: Merge x86_32 and x86_64 arch_atomic64_fetch_{and,or,xor}() functions Uros Bizjak
@ 2024-04-09 10:03 ` Uros Bizjak
2024-04-09 10:03 ` [PATCH 6/6] locking/atomic/x86: Reorder a couple of arch_atomic64 functions Uros Bizjak
5 siblings, 0 replies; 15+ messages in thread
From: Uros Bizjak @ 2024-04-09 10:03 UTC (permalink / raw)
To: x86, linux-kernel
Cc: Uros Bizjak, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, H. Peter Anvin, Peter Zijlstra
There is no need to implement the arch_atomic_sub() family of inline
functions; the corresponding macros can be implemented directly using
the arch_atomic_add() inlines with a negated argument.
No functional changes intended.
Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Peter Zijlstra <peterz@infradead.org>
---
arch/x86/include/asm/atomic.h | 12 ++----------
arch/x86/include/asm/atomic64_32.h | 23 ++++++++++++-----------
arch/x86/include/asm/atomic64_64.h | 12 ++----------
3 files changed, 16 insertions(+), 31 deletions(-)
diff --git a/arch/x86/include/asm/atomic.h b/arch/x86/include/asm/atomic.h
index b2e44de36934..77526257dacf 100644
--- a/arch/x86/include/asm/atomic.h
+++ b/arch/x86/include/asm/atomic.h
@@ -86,11 +86,7 @@ static __always_inline int arch_atomic_add_return(int i, atomic_t *v)
}
#define arch_atomic_add_return arch_atomic_add_return
-static __always_inline int arch_atomic_sub_return(int i, atomic_t *v)
-{
- return arch_atomic_add_return(-i, v);
-}
-#define arch_atomic_sub_return arch_atomic_sub_return
+#define arch_atomic_sub_return(i, v) arch_atomic_add_return(-(i), v)
static __always_inline int arch_atomic_fetch_add(int i, atomic_t *v)
{
@@ -98,11 +94,7 @@ static __always_inline int arch_atomic_fetch_add(int i, atomic_t *v)
}
#define arch_atomic_fetch_add arch_atomic_fetch_add
-static __always_inline int arch_atomic_fetch_sub(int i, atomic_t *v)
-{
- return xadd(&v->counter, -i);
-}
-#define arch_atomic_fetch_sub arch_atomic_fetch_sub
+#define arch_atomic_fetch_sub(i, v) arch_atomic_fetch_add(-(i), v)
static __always_inline int arch_atomic_cmpxchg(atomic_t *v, int old, int new)
{
diff --git a/arch/x86/include/asm/atomic64_32.h b/arch/x86/include/asm/atomic64_32.h
index 4f79198da98e..862448db1207 100644
--- a/arch/x86/include/asm/atomic64_32.h
+++ b/arch/x86/include/asm/atomic64_32.h
@@ -199,6 +199,18 @@ static __always_inline s64 arch_atomic64_dec_if_positive(atomic64_t *v)
#undef alternative_atomic64
#undef __alternative_atomic64
+static __always_inline s64 arch_atomic64_fetch_add(s64 i, atomic64_t *v)
+{
+ s64 val = __READ_ONCE(v->counter);
+
+ do { } while (!arch_atomic64_try_cmpxchg(v, &val, val + i));
+
+ return val;
+}
+#define arch_atomic64_fetch_add arch_atomic64_fetch_add
+
+#define arch_atomic64_fetch_sub(i, v) arch_atomic64_fetch_add(-(i), v)
+
static __always_inline void arch_atomic64_and(s64 i, atomic64_t *v)
{
s64 val = __READ_ONCE(v->counter);
@@ -220,15 +232,4 @@ static __always_inline void arch_atomic64_xor(s64 i, atomic64_t *v)
do { } while (!arch_atomic64_try_cmpxchg(v, &val, val ^ i));
}
-static __always_inline s64 arch_atomic64_fetch_add(s64 i, atomic64_t *v)
-{
- s64 val = __READ_ONCE(v->counter);
-
- do { } while (!arch_atomic64_try_cmpxchg(v, &val, val + i));
- return val;
-}
-#define arch_atomic64_fetch_add arch_atomic64_fetch_add
-
-#define arch_atomic64_fetch_sub(i, v) arch_atomic64_fetch_add(-(i), (v))
-
#endif /* _ASM_X86_ATOMIC64_32_H */
diff --git a/arch/x86/include/asm/atomic64_64.h b/arch/x86/include/asm/atomic64_64.h
index b2c9974ba971..a96a4b9acfcb 100644
--- a/arch/x86/include/asm/atomic64_64.h
+++ b/arch/x86/include/asm/atomic64_64.h
@@ -80,11 +80,7 @@ static __always_inline s64 arch_atomic64_add_return(s64 i, atomic64_t *v)
}
#define arch_atomic64_add_return arch_atomic64_add_return
-static __always_inline s64 arch_atomic64_sub_return(s64 i, atomic64_t *v)
-{
- return arch_atomic64_add_return(-i, v);
-}
-#define arch_atomic64_sub_return arch_atomic64_sub_return
+#define arch_atomic64_sub_return(i, v) arch_atomic64_add_return(-(i), v)
static __always_inline s64 arch_atomic64_fetch_add(s64 i, atomic64_t *v)
{
@@ -92,11 +88,7 @@ static __always_inline s64 arch_atomic64_fetch_add(s64 i, atomic64_t *v)
}
#define arch_atomic64_fetch_add arch_atomic64_fetch_add
-static __always_inline s64 arch_atomic64_fetch_sub(s64 i, atomic64_t *v)
-{
- return xadd(&v->counter, -i);
-}
-#define arch_atomic64_fetch_sub arch_atomic64_fetch_sub
+#define arch_atomic64_fetch_sub(i, v) arch_atomic64_fetch_add(-(i), v)
static __always_inline s64 arch_atomic64_cmpxchg(atomic64_t *v, s64 old, s64 new)
{
--
2.44.0
* [PATCH 6/6] locking/atomic/x86: Reorder a couple of arch_atomic64 functions
2024-04-09 10:03 [PATCH 0/6] locking/atomic/x86: Improve arch_atomic*() family of functions Uros Bizjak
` (4 preceding siblings ...)
2024-04-09 10:03 ` [PATCH 5/6] locking/atomic/x86: Define arch_atomic_sub() family using arch_atomic_add() functions Uros Bizjak
@ 2024-04-09 10:03 ` Uros Bizjak
5 siblings, 0 replies; 15+ messages in thread
From: Uros Bizjak @ 2024-04-09 10:03 UTC (permalink / raw)
To: x86, linux-kernel
Cc: Uros Bizjak, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, H. Peter Anvin, Peter Zijlstra
Reorder a couple of arch_atomic64 functions in
arch/x86/include/asm/atomic64_32.h to better match the
sequence of their declarations between x86_32 and x86_64.
No functional changes intended.
Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Peter Zijlstra <peterz@infradead.org>
---
arch/x86/include/asm/atomic64_32.h | 46 +++++++++++++++---------------
1 file changed, 23 insertions(+), 23 deletions(-)
diff --git a/arch/x86/include/asm/atomic64_32.h b/arch/x86/include/asm/atomic64_32.h
index 862448db1207..3864d82a9339 100644
--- a/arch/x86/include/asm/atomic64_32.h
+++ b/arch/x86/include/asm/atomic64_32.h
@@ -12,7 +12,7 @@ typedef struct {
s64 __aligned(8) counter;
} atomic64_t;
-#define ATOMIC64_INIT(val) { (val) }
+#define ATOMIC64_INIT(i) { (i) }
#define __ATOMIC64_DECL(sym) void atomic64_##sym(atomic64_t *, ...)
#ifndef ATOMIC64_EXPORT
@@ -61,17 +61,21 @@ ATOMIC64_DECL(add_unless);
#undef __ATOMIC64_DECL
#undef ATOMIC64_EXPORT
-static __always_inline s64 arch_atomic64_cmpxchg(atomic64_t *v, s64 old, s64 new)
+static __always_inline s64 arch_atomic64_read(const atomic64_t *v)
{
- return arch_cmpxchg64(&v->counter, old, new);
+ s64 r;
+ alternative_atomic64(read, "=&A" (r), "c" (v) : "memory");
+ return r;
}
-#define arch_atomic64_cmpxchg arch_atomic64_cmpxchg
-static __always_inline bool arch_atomic64_try_cmpxchg(atomic64_t *v, s64 *old, s64 new)
+static __always_inline void arch_atomic64_set(atomic64_t *v, s64 i)
{
- return arch_try_cmpxchg64(&v->counter, old, new);
+ unsigned high = (unsigned)(i >> 32);
+ unsigned low = (unsigned)i;
+ alternative_atomic64(set, /* no output */,
+ "S" (v), "b" (low), "c" (high)
+ : "eax", "edx", "memory");
}
-#define arch_atomic64_try_cmpxchg arch_atomic64_try_cmpxchg
static __always_inline s64 arch_atomic64_xchg(atomic64_t *v, s64 n)
{
@@ -85,22 +89,6 @@ static __always_inline s64 arch_atomic64_xchg(atomic64_t *v, s64 n)
}
#define arch_atomic64_xchg arch_atomic64_xchg
-static __always_inline void arch_atomic64_set(atomic64_t *v, s64 i)
-{
- unsigned high = (unsigned)(i >> 32);
- unsigned low = (unsigned)i;
- alternative_atomic64(set, /* no output */,
- "S" (v), "b" (low), "c" (high)
- : "eax", "edx", "memory");
-}
-
-static __always_inline s64 arch_atomic64_read(const atomic64_t *v)
-{
- s64 r;
- alternative_atomic64(read, "=&A" (r), "c" (v) : "memory");
- return r;
-}
-
static __always_inline s64 arch_atomic64_add_return(s64 i, atomic64_t *v)
{
alternative_atomic64(add_return,
@@ -199,6 +187,18 @@ static __always_inline s64 arch_atomic64_dec_if_positive(atomic64_t *v)
#undef alternative_atomic64
#undef __alternative_atomic64
+static __always_inline s64 arch_atomic64_cmpxchg(atomic64_t *v, s64 old, s64 new)
+{
+ return arch_cmpxchg64(&v->counter, old, new);
+}
+#define arch_atomic64_cmpxchg arch_atomic64_cmpxchg
+
+static __always_inline bool arch_atomic64_try_cmpxchg(atomic64_t *v, s64 *old, s64 new)
+{
+ return arch_try_cmpxchg64(&v->counter, old, new);
+}
+#define arch_atomic64_try_cmpxchg arch_atomic64_try_cmpxchg
+
static __always_inline s64 arch_atomic64_fetch_add(s64 i, atomic64_t *v)
{
s64 val = __READ_ONCE(v->counter);
--
2.44.0
* Re: [PATCH 3/6] locking/atomic/x86: Use READ_ONCE before atomic{,64}_try_cmpxchg loops
2024-04-09 10:03 ` [PATCH 3/6] locking/atomic/x86: Use READ_ONCE before atomic{,64}_try_cmpxchg loops Uros Bizjak
@ 2024-04-09 11:07 ` Mark Rutland
2024-04-09 11:59 ` Uros Bizjak
0 siblings, 1 reply; 15+ messages in thread
From: Mark Rutland @ 2024-04-09 11:07 UTC (permalink / raw)
To: Uros Bizjak
Cc: x86, linux-kernel, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, H. Peter Anvin, Peter Zijlstra
On Tue, Apr 09, 2024 at 12:03:54PM +0200, Uros Bizjak wrote:
> The value preload before the cmpxchg loop does not need to be atomic,
> but should use READ_ONCE to prevent the compiler from merging, refetching
> or reordering the read.
>
Yes, and that's what arch_atomic_read() and arch_atomic64_read() do...
> This patch unifies arch_atomic{,64}_{,fetch}_{and,or,xor}() macros
> between x86_32 and x86_64 targets.
>
> No functional changes intended.
>
> Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Ingo Molnar <mingo@kernel.org>
> Cc: Borislav Petkov <bp@alien8.de>
> Cc: Dave Hansen <dave.hansen@linux.intel.com>
> Cc: "H. Peter Anvin" <hpa@zytor.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> ---
> arch/x86/include/asm/atomic.h | 8 ++++----
> arch/x86/include/asm/atomic64_64.h | 20 ++++++++++----------
> 2 files changed, 14 insertions(+), 14 deletions(-)
>
> diff --git a/arch/x86/include/asm/atomic.h b/arch/x86/include/asm/atomic.h
> index 55a55ec04350..b166da21ee98 100644
> --- a/arch/x86/include/asm/atomic.h
> +++ b/arch/x86/include/asm/atomic.h
> @@ -20,7 +20,7 @@ static __always_inline int arch_atomic_read(const atomic_t *v)
> * Note for KASAN: we deliberately don't use READ_ONCE_NOCHECK() here,
> * it's non-inlined function that increases binary size and stack usage.
> */
> - return __READ_ONCE((v)->counter);
> + return __READ_ONCE(v->counter);
Removing the unnecessary brackets is fine, but the commit message didn't mention this.
[...]
> static __always_inline int arch_atomic_fetch_and(int i, atomic_t *v)
> {
> - int val = arch_atomic_read(v);
> + int val = __READ_ONCE(v->counter);
This is the wrong thing to do; arch_atomic_read() already has the required
semantic, and it more clearly aligns with the use of arch_atomic_try_cmpxchg()
below. It contains the documentation regarding why we use __READ_ONCE()
specifically (which we should probably note in arch_atomic64_read()).
Please leave this as-is, and likewise for the other cases below. Similarly, the
prior patch should use arch_atomic{,_64}_read() rather than using
__READ_ONCE().
[...]
> static __always_inline s64 arch_atomic64_fetch_and(s64 i, atomic64_t *v)
> {
> - s64 val = arch_atomic64_read(v);
> + s64 val = __READ_ONCE(v->counter);
> +
> + do { } while (!arch_atomic64_try_cmpxchg(v, &val, val & i));
>
> - do {
> - } while (!arch_atomic64_try_cmpxchg(v, &val, val & i));
> return val;
I think this reformatting is what you meant in the commit message when you said:
| This patch unifies arch_atomic{,64}_{,fetch}_{and,or,xor}() macros
| between x86_32 and x86_64 targets.
Assuming so, can you please just do that, and say:
This patch reformats the x86_64 arch_atomic{,64}_{,fetch}_{and,or,xor}()
functions to match the x86_32 versions.
Mark.
* Re: [PATCH 2/6] locking/atomic/x86: Rewrite x86_32 arch_atomic64_{,fetch}_{and,or,xor}() functions
2024-04-09 10:03 ` [PATCH 2/6] locking/atomic/x86: Rewrite x86_32 arch_atomic64_{,fetch}_{and,or,xor}() functions Uros Bizjak
@ 2024-04-09 11:13 ` Mark Rutland
2024-04-09 12:03 ` Uros Bizjak
0 siblings, 1 reply; 15+ messages in thread
From: Mark Rutland @ 2024-04-09 11:13 UTC (permalink / raw)
To: Uros Bizjak
Cc: x86, linux-kernel, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, H. Peter Anvin, Peter Zijlstra
On Tue, Apr 09, 2024 at 12:03:53PM +0200, Uros Bizjak wrote:
> Rewrite x86_32 arch_atomic64_{,fetch}_{and,or,xor}() functions to
> use arch_atomic64_try_cmpxchg. This implementation avoids one extra
> trip through the cmpxchg loop.
>
> The value preload before the cmpxchg loop does not need to be atomic,
> but should use READ_ONCE to prevent compiler from merging, refetching
> or reordering the read.
>
> The generated code improves from:
>
> 1917d5: 31 c9 xor %ecx,%ecx
> 1917d7: 31 db xor %ebx,%ebx
> 1917d9: 89 4c 24 3c mov %ecx,0x3c(%esp)
> 1917dd: 8b 74 24 24 mov 0x24(%esp),%esi
> 1917e1: 89 c8 mov %ecx,%eax
> 1917e3: 89 5c 24 34 mov %ebx,0x34(%esp)
> 1917e7: 8b 7c 24 28 mov 0x28(%esp),%edi
> 1917eb: 21 ce and %ecx,%esi
> 1917ed: 89 74 24 4c mov %esi,0x4c(%esp)
> 1917f1: 21 df and %ebx,%edi
> 1917f3: 89 de mov %ebx,%esi
> 1917f5: 89 7c 24 50 mov %edi,0x50(%esp)
> 1917f9: 8b 54 24 4c mov 0x4c(%esp),%edx
> 1917fd: 8b 7c 24 2c mov 0x2c(%esp),%edi
> 191801: 8b 4c 24 50 mov 0x50(%esp),%ecx
> 191805: 89 d3 mov %edx,%ebx
> 191807: 89 f2 mov %esi,%edx
> 191809: f0 0f c7 0f lock cmpxchg8b (%edi)
> 19180d: 89 c1 mov %eax,%ecx
> 19180f: 8b 74 24 34 mov 0x34(%esp),%esi
> 191813: 89 d3 mov %edx,%ebx
> 191815: 89 44 24 4c mov %eax,0x4c(%esp)
> 191819: 8b 44 24 3c mov 0x3c(%esp),%eax
> 19181d: 89 df mov %ebx,%edi
> 19181f: 89 54 24 44 mov %edx,0x44(%esp)
> 191823: 89 ca mov %ecx,%edx
> 191825: 31 de xor %ebx,%esi
> 191827: 31 c8 xor %ecx,%eax
> 191829: 09 f0 or %esi,%eax
> 19182b: 75 ac jne 1917d9 <...>
>
> to:
>
> 1912ba: 8b 06 mov (%esi),%eax
> 1912bc: 8b 56 04 mov 0x4(%esi),%edx
> 1912bf: 89 44 24 3c mov %eax,0x3c(%esp)
> 1912c3: 89 c1 mov %eax,%ecx
> 1912c5: 23 4c 24 34 and 0x34(%esp),%ecx
> 1912c9: 89 d3 mov %edx,%ebx
> 1912cb: 23 5c 24 38 and 0x38(%esp),%ebx
> 1912cf: 89 54 24 40 mov %edx,0x40(%esp)
> 1912d3: 89 4c 24 2c mov %ecx,0x2c(%esp)
> 1912d7: 89 5c 24 30 mov %ebx,0x30(%esp)
> 1912db: 8b 5c 24 2c mov 0x2c(%esp),%ebx
> 1912df: 8b 4c 24 30 mov 0x30(%esp),%ecx
> 1912e3: f0 0f c7 0e lock cmpxchg8b (%esi)
> 1912e7: 0f 85 f3 02 00 00 jne 1915e0 <...>
>
> Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Ingo Molnar <mingo@kernel.org>
> Cc: Borislav Petkov <bp@alien8.de>
> Cc: Dave Hansen <dave.hansen@linux.intel.com>
> Cc: "H. Peter Anvin" <hpa@zytor.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> ---
> arch/x86/include/asm/atomic64_32.h | 44 ++++++++++++------------------
> 1 file changed, 18 insertions(+), 26 deletions(-)
>
> diff --git a/arch/x86/include/asm/atomic64_32.h b/arch/x86/include/asm/atomic64_32.h
> index 11e817dab44a..84affd7a5d1c 100644
> --- a/arch/x86/include/asm/atomic64_32.h
> +++ b/arch/x86/include/asm/atomic64_32.h
> @@ -201,69 +201,61 @@ static __always_inline s64 arch_atomic64_dec_if_positive(atomic64_t *v)
>
> static __always_inline void arch_atomic64_and(s64 i, atomic64_t *v)
> {
> - s64 old, c = 0;
> + s64 val = __READ_ONCE(v->counter);
I reckon it's worth placing this in a helper with a big comment, e.g.
static __always_inline s64 arch_atomic64_read_tearable(atomic64_t *v)
{
/*
* TODO: explain that this might be torn, but it occurs *once*, and can
* safely be consumed by atomic64_try_cmpxchg().
*
* TODO: point to the existing commentary regarding why we use
* __READ_ONCE() for KASAN reasons.
*/
return __READ_ONCE(v->counter);
}
... and then use that in each of the instances below.
That way the subtlety is clearly documented, and it'd more clearly align with
the x86_64 versions.
Mark.
>
> - while ((old = arch_atomic64_cmpxchg(v, c, c & i)) != c)
> - c = old;
> + do { } while (!arch_atomic64_try_cmpxchg(v, &val, val & i));
> }
>
> static __always_inline s64 arch_atomic64_fetch_and(s64 i, atomic64_t *v)
> {
> - s64 old, c = 0;
> + s64 val = __READ_ONCE(v->counter);
>
> - while ((old = arch_atomic64_cmpxchg(v, c, c & i)) != c)
> - c = old;
> + do { } while (!arch_atomic64_try_cmpxchg(v, &val, val & i));
>
> - return old;
> + return val;
> }
> #define arch_atomic64_fetch_and arch_atomic64_fetch_and
>
> static __always_inline void arch_atomic64_or(s64 i, atomic64_t *v)
> {
> - s64 old, c = 0;
> + s64 val = __READ_ONCE(v->counter);
>
> - while ((old = arch_atomic64_cmpxchg(v, c, c | i)) != c)
> - c = old;
> + do { } while (!arch_atomic64_try_cmpxchg(v, &val, val | i));
> }
>
> static __always_inline s64 arch_atomic64_fetch_or(s64 i, atomic64_t *v)
> {
> - s64 old, c = 0;
> + s64 val = __READ_ONCE(v->counter);
>
> - while ((old = arch_atomic64_cmpxchg(v, c, c | i)) != c)
> - c = old;
> + do { } while (!arch_atomic64_try_cmpxchg(v, &val, val | i));
>
> - return old;
> + return val;
> }
> #define arch_atomic64_fetch_or arch_atomic64_fetch_or
>
> static __always_inline void arch_atomic64_xor(s64 i, atomic64_t *v)
> {
> - s64 old, c = 0;
> + s64 val = __READ_ONCE(v->counter);
>
> - while ((old = arch_atomic64_cmpxchg(v, c, c ^ i)) != c)
> - c = old;
> + do { } while (!arch_atomic64_try_cmpxchg(v, &val, val ^ i));
> }
>
> static __always_inline s64 arch_atomic64_fetch_xor(s64 i, atomic64_t *v)
> {
> - s64 old, c = 0;
> + s64 val = __READ_ONCE(v->counter);
>
> - while ((old = arch_atomic64_cmpxchg(v, c, c ^ i)) != c)
> - c = old;
> + do { } while (!arch_atomic64_try_cmpxchg(v, &val, val ^ i));
>
> - return old;
> + return val;
> }
> #define arch_atomic64_fetch_xor arch_atomic64_fetch_xor
>
> static __always_inline s64 arch_atomic64_fetch_add(s64 i, atomic64_t *v)
> {
> - s64 old, c = 0;
> + s64 val = __READ_ONCE(v->counter);
>
> - while ((old = arch_atomic64_cmpxchg(v, c, c + i)) != c)
> - c = old;
> -
> - return old;
> + do { } while (!arch_atomic64_try_cmpxchg(v, &val, val + i));
> + return val;
> }
> #define arch_atomic64_fetch_add arch_atomic64_fetch_add
>
> --
> 2.44.0
>
>
* Re: [PATCH 3/6] locking/atomic/x86: Use READ_ONCE before atomic{,64}_try_cmpxchg loops
2024-04-09 11:07 ` Mark Rutland
@ 2024-04-09 11:59 ` Uros Bizjak
0 siblings, 0 replies; 15+ messages in thread
From: Uros Bizjak @ 2024-04-09 11:59 UTC (permalink / raw)
To: Mark Rutland
Cc: x86, linux-kernel, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, H. Peter Anvin, Peter Zijlstra
On Tue, Apr 9, 2024 at 1:07 PM Mark Rutland <mark.rutland@arm.com> wrote:
>
> On Tue, Apr 09, 2024 at 12:03:54PM +0200, Uros Bizjak wrote:
> > The value preload before the cmpxchg loop does not need to be atomic,
> > but should use READ_ONCE to prevent compiler from merging, refetching
> > or reordering the read.
> >
>
> Yes, and that's what arch_atomic_read() and arch_atomic64_read() do...
>
> > This patch unifies arch_atomic{,64}_{,fetch}_{and,or,xor}() macros
> > between x86_32 and x86_64 targets.
> >
> > No functional changes intended.
> >
> > Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
> > Cc: Thomas Gleixner <tglx@linutronix.de>
> > Cc: Ingo Molnar <mingo@kernel.org>
> > Cc: Borislav Petkov <bp@alien8.de>
> > Cc: Dave Hansen <dave.hansen@linux.intel.com>
> > Cc: "H. Peter Anvin" <hpa@zytor.com>
> > Cc: Peter Zijlstra <peterz@infradead.org>
> > ---
> > arch/x86/include/asm/atomic.h | 8 ++++----
> > arch/x86/include/asm/atomic64_64.h | 20 ++++++++++----------
> > 2 files changed, 14 insertions(+), 14 deletions(-)
> >
> > diff --git a/arch/x86/include/asm/atomic.h b/arch/x86/include/asm/atomic.h
> > index 55a55ec04350..b166da21ee98 100644
> > --- a/arch/x86/include/asm/atomic.h
> > +++ b/arch/x86/include/asm/atomic.h
> > @@ -20,7 +20,7 @@ static __always_inline int arch_atomic_read(const atomic_t *v)
> > * Note for KASAN: we deliberately don't use READ_ONCE_NOCHECK() here,
> > * it's non-inlined function that increases binary size and stack usage.
> > */
> > - return __READ_ONCE((v)->counter);
> > + return __READ_ONCE(v->counter);
>
> Removing the unnecessary brackets is fine, but the commit message didn't mention this.
It was just a change in passing. I didn't think it even needed a comment.
> [...]
>
> > static __always_inline int arch_atomic_fetch_and(int i, atomic_t *v)
> > {
> > - int val = arch_atomic_read(v);
> > + int val = __READ_ONCE(v->counter);
>
> This is the wrong thing to do; arch_atomic_read() already has the required
> semantic, and it more clearly aligns with the use of arch_atomic_try_cmpxchg()
> below. It contains the documentation regarding why we use __READ_ONCE()
> specifically (which we should probably note in arch_atomic64_read()).
>
> Please leave this as-is, and likewise for the other cases below. Similarly, the
> prior patch should use arch_atomic{,_64}_read() rather than using
> __READ_ONCE().
Please note that arch_atomic64_read implements a true 64-bit atomic read
on x86_32. I tried to bypass this using __READ_ONCE(), but your
suggestion to use arch_atomic64_read_tearable() is indeed a much
better approach.
> [...]
>
> > static __always_inline s64 arch_atomic64_fetch_and(s64 i, atomic64_t *v)
> > {
> > - s64 val = arch_atomic64_read(v);
> > + s64 val = __READ_ONCE(v->counter);
> > +
> > + do { } while (!arch_atomic64_try_cmpxchg(v, &val, val & i));
> >
> > - do {
> > - } while (!arch_atomic64_try_cmpxchg(v, &val, val & i));
> > return val;
>
> I think this reformatting is what you meant in the commit message when you said:
>
> | This patch unifies arch_atomic{,64}_{,fetch}_{and,or,xor}() macros
> | between x86_32 and x86_64 targets.
Actually, it was the change to use __READ_ONCE(). The reformatting
follows the functions from atomic.h, e.g. arch_atomic_fetch_and(), and
was another case of a change in passing, not worth mentioning in the
commit message. I will rewrite this in v2 of the patch, so these
functions will uniformly use arch_atomic64_read_tearable().
> Assuming so, can you please just do that, and say:
>
> This patch reformats the x86_64 arch_atomic{,64}_{,fetch}_{and,or,xor}()
> functions to match the x86_32 versions.
Thanks,
Uros.
* Re: [PATCH 2/6] locking/atomic/x86: Rewrite x86_32 arch_atomic64_{,fetch}_{and,or,xor}() functions
2024-04-09 11:13 ` Mark Rutland
@ 2024-04-09 12:03 ` Uros Bizjak
2024-04-09 12:50 ` Uros Bizjak
0 siblings, 1 reply; 15+ messages in thread
From: Uros Bizjak @ 2024-04-09 12:03 UTC (permalink / raw)
To: Mark Rutland
Cc: x86, linux-kernel, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, H. Peter Anvin, Peter Zijlstra
On Tue, Apr 9, 2024 at 1:13 PM Mark Rutland <mark.rutland@arm.com> wrote:
> > static __always_inline void arch_atomic64_and(s64 i, atomic64_t *v)
> > {
> > - s64 old, c = 0;
> > + s64 val = __READ_ONCE(v->counter);
>
> I reckon it's worth placing this in a helper with a big comment, e.g.
>
> static __always_inline s64 arch_atomic64_read_tearable(atomic64_t *v)
> {
> /*
> * TODO: explain that this might be torn, but it occurs *once*, and can
> * safely be consumed by atomic64_try_cmpxchg().
> *
> * TODO: point to the existing commentary regarding why we use
> * __READ_ONCE() for KASAN reasons.
> */
> return __READ_ONCE(v->counter);
> }
>
> ... and then use that in each of the instances below.
>
> That way the subtlety is clearly documented, and it'd more clearly align with
> the x86_64 versions.
This is an excellent idea. The separate definitions need to be placed
in atomic64_32.h and atomic64_64.h (due to the use of the atomic64_t
typedef), but it will allow the same unification of functions between
x86_32 and x86_64 as the approach with __READ_ONCE().
Thanks,
Uros.
* Re: [PATCH 2/6] locking/atomic/x86: Rewrite x86_32 arch_atomic64_{,fetch}_{and,or,xor}() functions
2024-04-09 12:03 ` Uros Bizjak
@ 2024-04-09 12:50 ` Uros Bizjak
2024-04-09 16:34 ` Mark Rutland
0 siblings, 1 reply; 15+ messages in thread
From: Uros Bizjak @ 2024-04-09 12:50 UTC (permalink / raw)
To: Mark Rutland
Cc: x86, linux-kernel, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, H. Peter Anvin, Peter Zijlstra
On Tue, Apr 9, 2024 at 2:03 PM Uros Bizjak <ubizjak@gmail.com> wrote:
>
> On Tue, Apr 9, 2024 at 1:13 PM Mark Rutland <mark.rutland@arm.com> wrote:
>
> > > static __always_inline void arch_atomic64_and(s64 i, atomic64_t *v)
> > > {
> > > - s64 old, c = 0;
> > > + s64 val = __READ_ONCE(v->counter);
> >
> > I reckon it's worth placing this in a helper with a big comment, e.g.
> >
> > static __always_inline s64 arch_atomic64_read_tearable(atomic64_t *v)
> > {
> > /*
> > * TODO: explain that this might be torn, but it occurs *once*, and can
> > * safely be consumed by atomic64_try_cmpxchg().
> > *
> > * TODO: point to the existing commentary regarding why we use
> > * __READ_ONCE() for KASAN reasons.
> > */
> > return __READ_ONCE(v->counter);
> > }
> >
> > ... and then use that in each of the instances below.
> >
> > That way the subtlety is clearly documented, and it'd more clearly align with
> > the x86_64 versions.
>
> This is an excellent idea. The separate definitions need to be placed
> in atomic64_32.h and atomic64_64.h (due to the use of the atomic64_t
> typedef), but it will allow the same unification of functions between
> x86_32 and x86_64 as the approach with __READ_ONCE().
Something like this:
--cut here--
/*
* This function is intended to preload the value from atomic64_t
* location in a non-atomic way. The read might be torn, but can
* safely be consumed by the compare-and-swap loop.
*/
static __always_inline s64 arch_atomic64_read_tearable(atomic64_t *v)
{
/*
* See the comment in arch_atomic_read() on why we use
* __READ_ONCE() instead of READ_ONCE_NOCHECK() here.
*/
return __READ_ONCE(v->counter);
}
--cut here--
Thanks,
Uros.
* Re: [PATCH 2/6] locking/atomic/x86: Rewrite x86_32 arch_atomic64_{,fetch}_{and,or,xor}() functions
2024-04-09 12:50 ` Uros Bizjak
@ 2024-04-09 16:34 ` Mark Rutland
2024-04-09 16:39 ` Uros Bizjak
2024-04-09 16:53 ` Uros Bizjak
0 siblings, 2 replies; 15+ messages in thread
From: Mark Rutland @ 2024-04-09 16:34 UTC (permalink / raw)
To: Uros Bizjak
Cc: x86, linux-kernel, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, H. Peter Anvin, Peter Zijlstra
On Tue, Apr 09, 2024 at 02:50:19PM +0200, Uros Bizjak wrote:
> On Tue, Apr 9, 2024 at 2:03 PM Uros Bizjak <ubizjak@gmail.com> wrote:
> >
> > On Tue, Apr 9, 2024 at 1:13 PM Mark Rutland <mark.rutland@arm.com> wrote:
> >
> > > > static __always_inline void arch_atomic64_and(s64 i, atomic64_t *v)
> > > > {
> > > > - s64 old, c = 0;
> > > > + s64 val = __READ_ONCE(v->counter);
> > >
> > > I reckon it's worth placing this in a helper with a big comment, e.g.
> > >
> > > static __always_inline s64 arch_atomic64_read_tearable(atomic64_t *v)
> > > {
> > > /*
> > > * TODO: explain that this might be torn, but it occurs *once*, and can
> > > * safely be consumed by atomic64_try_cmpxchg().
> > > *
> > > * TODO: point to the existing commentary regarding why we use
> > > * __READ_ONCE() for KASAN reasons.
> > > */
> > > return __READ_ONCE(v->counter);
> > > }
> > >
> > > ... and then use that in each of the instances below.
> > >
> > > That way the subtlety is clearly documented, and it'd more clearly align with
> > > the x86_64 versions.
> >
> > This is an excellent idea. The separate definitions need to be placed
> > in atomic64_32.h and atomic64_64.h (due to the use of the atomic64_t
> > typedef), but it will allow the same unification of functions between
> > x86_32 and x86_64 as the approach with __READ_ONCE().
>
> Something like this:
>
> --cut here--
> /*
> * This function is intended to preload the value from atomic64_t
> * location in a non-atomic way. The read might be torn, but can
> * safely be consumed by the compare-and-swap loop.
> */
> static __always_inline s64 arch_atomic64_read_tearable(atomic64_t *v)
> {
> /*
> * See the comment in arch_atomic_read() on why we use
> * __READ_ONCE() instead of READ_ONCE_NOCHECK() here.
> */
> return __READ_ONCE(v->counter);
> }
> --cut here--
>
> Thanks,
> Uros.
Yeah, something of that shape.
Having thought for a bit longer, it's probably better to use '_torn' rather
than '_tearable' (i.e. name this arch_atomic64_read_torn()).
It'd be nice if we could specify the usage restrictions a bit more clearly,
since this can only be used for compare-and-swap loops that implement
unconditional atomics. (e.g. arch_atomic64_and(), but not
arch_atomic_add_unless()).
So I'd suggest:
/*
* Read an atomic64_t non-atomically.
*
* This is intended to be used in cases where a subsequent atomic operation
* will handle the torn value, and can be used to prime the first iteration of
* unconditional try_cmpxchg() loops, e.g.
*
* s64 val = arch_atomic64_read_torn(v);
 * do { } while (!arch_atomic64_try_cmpxchg(v, &val, val OP i));
*
* This is NOT safe to use where the value is not always checked by a
* subsequent atomic operation, such as in conditional try_cmpxchg() loops that
* can break before the atomic, e.g.
*
* s64 val = arch_atomic64_read_torn(v);
* do {
* if (condition(val))
* break;
 * } while (!arch_atomic64_try_cmpxchg(v, &val, val OP i));
*/
static __always_inline s64 arch_atomic64_read_torn(atomic64_t *v)
{
/* See comment in arch_atomic_read() */
return __READ_ONCE(v->counter);
}
Mark.
* Re: [PATCH 2/6] locking/atomic/x86: Rewrite x86_32 arch_atomic64_{,fetch}_{and,or,xor}() functions
2024-04-09 16:34 ` Mark Rutland
@ 2024-04-09 16:39 ` Uros Bizjak
2024-04-09 16:53 ` Uros Bizjak
1 sibling, 0 replies; 15+ messages in thread
From: Uros Bizjak @ 2024-04-09 16:39 UTC (permalink / raw)
To: Mark Rutland
Cc: x86, linux-kernel, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, H. Peter Anvin, Peter Zijlstra
On Tue, Apr 9, 2024 at 6:34 PM Mark Rutland <mark.rutland@arm.com> wrote:
> > > > ... and then use that in each of the instances below.
> > > >
> > > > That way the subtlety is clearly documented, and it'd more clearly align with
> > > > the x86_64 versions.
> > >
> > > This is an excellent idea. The separate definitions need to be placed
> > > in atomic64_32.h and atomic64_64.h (due to the use of the atomic64_t
> > > typedef), but it will allow the same unification of functions between
> > > x86_32 and x86_64 as the approach with __READ_ONCE().
> >
> > Something like this:
> >
> > --cut here--
> > /*
> > * This function is intended to preload the value from atomic64_t
> > * location in a non-atomic way. The read might be torn, but can
> > * safely be consumed by the compare-and-swap loop.
> > */
> > static __always_inline s64 arch_atomic64_read_tearable(atomic64_t *v)
> > {
> > /*
> > * See the comment in arch_atomic_read() on why we use
> > * __READ_ONCE() instead of READ_ONCE_NOCHECK() here.
> > */
> > return __READ_ONCE(v->counter);
> > }
> > --cut here--
> >
> > Thanks,
> > Uros.
>
> Yeah, something of that shape.
>
> Having thought for a bit longer, it's probably better to use '_torn' rather
> than '_tearable' (i.e. name this arch_atomic64_read_torn()).
>
> It'd be nice if we could specify the usage restrictions a bit more clearly,
> since this can only be used for compare-and-swap loops that implement
> unconditional atomics. (e.g. arch_atomic64_and(), but not
> arch_atomic_add_unless()).
>
> So I'd suggest:
Eh, I just sent a v2 a second before I received your mail. I'll respin
the patchset tomorrow to include your suggested text. Please note that
the v2 patch set avoids all cosmetic changes.
Thanks,
Uros.
>
> /*
> * Read an atomic64_t non-atomically.
> *
> * This is intended to be used in cases where a subsequent atomic operation
> * will handle the torn value, and can be used to prime the first iteration of
> * unconditional try_cmpxchg() loops, e.g.
> *
> * s64 val = arch_atomic64_read_torn(v);
> * do { } while (!arch_atomic64_try_cmpxchg(v, &val, val OP i));
> *
> * This is NOT safe to use where the value is not always checked by a
> * subsequent atomic operation, such as in conditional try_cmpxchg() loops that
> * can break before the atomic, e.g.
> *
> * s64 val = arch_atomic64_read_torn(v);
> * do {
> * if (condition(val))
> * break;
> * } while (!arch_atomic64_try_cmpxchg(v, &val, val OP i));
> */
> static __always_inline s64 arch_atomic64_read_torn(atomic64_t *v)
> {
> /* See comment in arch_atomic_read() */
> return __READ_ONCE(v->counter);
> }
>
> Mark.
* Re: [PATCH 2/6] locking/atomic/x86: Rewrite x86_32 arch_atomic64_{,fetch}_{and,or,xor}() functions
2024-04-09 16:34 ` Mark Rutland
2024-04-09 16:39 ` Uros Bizjak
@ 2024-04-09 16:53 ` Uros Bizjak
1 sibling, 0 replies; 15+ messages in thread
From: Uros Bizjak @ 2024-04-09 16:53 UTC (permalink / raw)
To: Mark Rutland
Cc: x86, linux-kernel, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, H. Peter Anvin, Peter Zijlstra
On Tue, Apr 9, 2024 at 6:34 PM Mark Rutland <mark.rutland@arm.com> wrote:
> > Something like this:
> >
> > --cut here--
> > /*
> > * This function is intended to preload the value from atomic64_t
> > * location in a non-atomic way. The read might be torn, but can
> > * safely be consumed by the compare-and-swap loop.
> > */
> > static __always_inline s64 arch_atomic64_read_tearable(atomic64_t *v)
> > {
> > /*
> > * See the comment in arch_atomic_read() on why we use
> > * __READ_ONCE() instead of READ_ONCE_NOCHECK() here.
> > */
> > return __READ_ONCE(v->counter);
> > }
> > --cut here--
>
> Yeah, something of that shape.
>
> Having thought for a bit longer, it's probably better to use '_torn' rather
> than '_tearable' (i.e. name this arch_atomic64_read_torn()).
How about we simply name the function
arch_atomic64_read_nonatomic()
in the sense that it reads atomic64_t variables in a non-atomic way?
Uros.
end of thread, other threads: [~2024-04-09 16:53 UTC | newest]
Thread overview: 15+ messages
2024-04-09 10:03 [PATCH 0/6] locking/atomic/x86: Improve arch_atomic*() family of functions Uros Bizjak
2024-04-09 10:03 ` [PATCH 1/6] locking/atomic/x86: Introduce arch_atomic64_try_cmpxchg to x86_32 Uros Bizjak
2024-04-09 10:03 ` [PATCH 2/6] locking/atomic/x86: Rewrite x86_32 arch_atomic64_{,fetch}_{and,or,xor}() functions Uros Bizjak
2024-04-09 11:13 ` Mark Rutland
2024-04-09 12:03 ` Uros Bizjak
2024-04-09 12:50 ` Uros Bizjak
2024-04-09 16:34 ` Mark Rutland
2024-04-09 16:39 ` Uros Bizjak
2024-04-09 16:53 ` Uros Bizjak
2024-04-09 10:03 ` [PATCH 3/6] locking/atomic/x86: Use READ_ONCE before atomic{,64}_try_cmpxchg loops Uros Bizjak
2024-04-09 11:07 ` Mark Rutland
2024-04-09 11:59 ` Uros Bizjak
2024-04-09 10:03 ` [PATCH 4/6] locking/atomic/x86: Merge x86_32 and x86_64 arch_atomic64_fetch_{and,or,xor}() functions Uros Bizjak
2024-04-09 10:03 ` [PATCH 5/6] locking/atomic/x86: Define arch_atomic_sub() family using arch_atomic_add() functions Uros Bizjak
2024-04-09 10:03 ` [PATCH 6/6] locking/atomic/x86: Reorder a couple of arch_atomic64 functions Uros Bizjak