* [RFC PATCH 0/5] Rewrite asm-generic/bitops/atomic.h and use on arm64
@ 2018-02-15 15:29 Will Deacon
2018-02-15 15:29 ` [RFC PATCH 1/5] arm64: fpsimd: include <linux/init.h> in fpsimd.h Will Deacon
` (4 more replies)
0 siblings, 5 replies; 15+ messages in thread
From: Will Deacon @ 2018-02-15 15:29 UTC (permalink / raw)
To: linux-arm-kernel
Hi all,
Whilst investigating a livelock in fs/dcache.c [1], I noticed that the
arm64 test_and_set operation always writes back to memory even if the
value is already set. This led me to start hacking on improved versions
of our bitops, including an acquire version of test_and_set_bit_lock.
Since there was nothing arm64-specific about the resulting code, I figured
I'd replace what's sitting in asm-generic/bitops/atomic.h and simply include
that instead. I had to rejig a couple of #includes so that I could call into
atomic_* ops from bitops.h, but that was actually pretty straightforward.
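For reference, the crux of the rewrite (see patch 3 for the full version) is
to peek at the word and skip the atomic RMW entirely when the bit is already
in the desired state, roughly:

static inline int test_and_set_bit(unsigned int nr, volatile unsigned long *p)
{
	unsigned long mask = BIT_MASK(nr);

	p += BIT_WORD(nr);

	/* Bit already set: avoid dirtying the cacheline with a store. */
	if (READ_ONCE(*p) & mask)
		return 1;

	return !!(atomic_long_fetch_or(mask, (atomic_long_t *)p) & mask);
}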
Feedback welcome,
Will
[1] https://lkml.org/lkml/2018/2/13/414
--->8
Will Deacon (5):
arm64: fpsimd: include <linux/init.h> in fpsimd.h
asm-generic: Avoid including linux/kernel.h in asm-generic/bug.h
asm-generic/bitops/atomic.h: Rewrite using atomic_fetch_*
arm64: Replace our atomic bitops implementation with asm-generic
arm64: bitops: Include <asm-generic/bitops/ext2-atomic-setbit.h>
arch/arm64/include/asm/bitops.h | 20 +---
arch/arm64/include/asm/fpsimd.h | 1 +
arch/arm64/lib/Makefile | 2 +-
arch/arm64/lib/bitops.S | 76 -------------
include/asm-generic/bitops/atomic.h | 219 ++++++++++++------------------------
include/asm-generic/bug.h | 2 +-
lib/errseq.c | 1 +
7 files changed, 77 insertions(+), 244 deletions(-)
delete mode 100644 arch/arm64/lib/bitops.S
--
2.1.4
^ permalink raw reply [flat|nested] 15+ messages in thread
* [RFC PATCH 1/5] arm64: fpsimd: include <linux/init.h> in fpsimd.h
2018-02-15 15:29 [RFC PATCH 0/5] Rewrite asm-generic/bitops/atomic.h and use on arm64 Will Deacon
@ 2018-02-15 15:29 ` Will Deacon
2018-02-15 15:29 ` [RFC PATCH 2/5] asm-generic: Avoid including linux/kernel.h in asm-generic/bug.h Will Deacon
` (3 subsequent siblings)
4 siblings, 0 replies; 15+ messages in thread
From: Will Deacon @ 2018-02-15 15:29 UTC (permalink / raw)
To: linux-arm-kernel
fpsimd.h uses the __init annotation, so pull in linux/init.h
Signed-off-by: Will Deacon <will.deacon@arm.com>
---
arch/arm64/include/asm/fpsimd.h | 1 +
1 file changed, 1 insertion(+)
diff --git a/arch/arm64/include/asm/fpsimd.h b/arch/arm64/include/asm/fpsimd.h
index 8857a0f0d0f7..fc3527b985ca 100644
--- a/arch/arm64/include/asm/fpsimd.h
+++ b/arch/arm64/include/asm/fpsimd.h
@@ -22,6 +22,7 @@
#ifndef __ASSEMBLY__
#include <linux/cache.h>
+#include <linux/init.h>
#include <linux/stddef.h>
/*
--
2.1.4
^ permalink raw reply related [flat|nested] 15+ messages in thread
* [RFC PATCH 2/5] asm-generic: Avoid including linux/kernel.h in asm-generic/bug.h
2018-02-15 15:29 [RFC PATCH 0/5] Rewrite asm-generic/bitops/atomic.h and use on arm64 Will Deacon
2018-02-15 15:29 ` [RFC PATCH 1/5] arm64: fpsimd: include <linux/init.h> in fpsimd.h Will Deacon
@ 2018-02-15 15:29 ` Will Deacon
2018-02-15 15:29 ` [RFC PATCH 3/5] asm-generic/bitops/atomic.h: Rewrite using atomic_fetch_* Will Deacon
` (2 subsequent siblings)
4 siblings, 0 replies; 15+ messages in thread
From: Will Deacon @ 2018-02-15 15:29 UTC (permalink / raw)
To: linux-arm-kernel
asm-generic/bug.h unnecessarily includes linux/kernel.h when it can get away
with linux/types.h instead. lib/errseq.c relies on this transitive include,
so update it to pull in linux/kernel.h directly.
Signed-off-by: Will Deacon <will.deacon@arm.com>
---
include/asm-generic/bug.h | 2 +-
lib/errseq.c | 1 +
2 files changed, 2 insertions(+), 1 deletion(-)
diff --git a/include/asm-generic/bug.h b/include/asm-generic/bug.h
index 963b755d19b0..ffda1247f639 100644
--- a/include/asm-generic/bug.h
+++ b/include/asm-generic/bug.h
@@ -15,7 +15,7 @@
#endif
#ifndef __ASSEMBLY__
-#include <linux/kernel.h>
+#include <linux/types.h>
#ifdef CONFIG_BUG
diff --git a/lib/errseq.c b/lib/errseq.c
index df782418b333..ef3b10516eab 100644
--- a/lib/errseq.c
+++ b/lib/errseq.c
@@ -3,6 +3,7 @@
#include <linux/bug.h>
#include <linux/atomic.h>
#include <linux/errseq.h>
+#include <linux/kernel.h>
/*
* An errseq_t is a way of recording errors in one place, and allowing any
--
2.1.4
^ permalink raw reply related [flat|nested] 15+ messages in thread
* [RFC PATCH 3/5] asm-generic/bitops/atomic.h: Rewrite using atomic_fetch_*
2018-02-15 15:29 [RFC PATCH 0/5] Rewrite asm-generic/bitops/atomic.h and use on arm64 Will Deacon
2018-02-15 15:29 ` [RFC PATCH 1/5] arm64: fpsimd: include <linux/init.h> in fpsimd.h Will Deacon
2018-02-15 15:29 ` [RFC PATCH 2/5] asm-generic: Avoid including linux/kernel.h in asm-generic/bug.h Will Deacon
@ 2018-02-15 15:29 ` Will Deacon
2018-02-15 17:08 ` Peter Zijlstra
2018-02-15 15:29 ` [RFC PATCH 4/5] arm64: Replace our atomic bitops implementation with asm-generic Will Deacon
2018-02-15 15:29 ` [RFC PATCH 5/5] arm64: bitops: Include <asm-generic/bitops/ext2-atomic-setbit.h> Will Deacon
4 siblings, 1 reply; 15+ messages in thread
From: Will Deacon @ 2018-02-15 15:29 UTC (permalink / raw)
To: linux-arm-kernel
The atomic bitops can actually be implemented pretty efficiently using
the atomic_fetch_* ops, rather than explicit use of spinlocks.
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
---
include/asm-generic/bitops/atomic.h | 219 ++++++++++++------------------------
1 file changed, 71 insertions(+), 148 deletions(-)
diff --git a/include/asm-generic/bitops/atomic.h b/include/asm-generic/bitops/atomic.h
index 04deffaf5f7d..69b7915e291d 100644
--- a/include/asm-generic/bitops/atomic.h
+++ b/include/asm-generic/bitops/atomic.h
@@ -2,189 +2,112 @@
#ifndef _ASM_GENERIC_BITOPS_ATOMIC_H_
#define _ASM_GENERIC_BITOPS_ATOMIC_H_
-#include <asm/types.h>
-#include <linux/irqflags.h>
+#include <linux/atomic.h>
+#include <linux/compiler.h>
+#include <asm/barrier.h>
-#ifdef CONFIG_SMP
-#include <asm/spinlock.h>
-#include <asm/cache.h> /* we use L1_CACHE_BYTES */
-
-/* Use an array of spinlocks for our atomic_ts.
- * Hash function to index into a different SPINLOCK.
- * Since "a" is usually an address, use one spinlock per cacheline.
+/*
+ * Implementation of atomic bitops using atomic-fetch ops.
+ * See Documentation/atomic_bitops.txt for details.
*/
-# define ATOMIC_HASH_SIZE 4
-# define ATOMIC_HASH(a) (&(__atomic_hash[ (((unsigned long) a)/L1_CACHE_BYTES) & (ATOMIC_HASH_SIZE-1) ]))
-
-extern arch_spinlock_t __atomic_hash[ATOMIC_HASH_SIZE] __lock_aligned;
-
-/* Can't use raw_spin_lock_irq because of #include problems, so
- * this is the substitute */
-#define _atomic_spin_lock_irqsave(l,f) do { \
- arch_spinlock_t *s = ATOMIC_HASH(l); \
- local_irq_save(f); \
- arch_spin_lock(s); \
-} while(0)
-
-#define _atomic_spin_unlock_irqrestore(l,f) do { \
- arch_spinlock_t *s = ATOMIC_HASH(l); \
- arch_spin_unlock(s); \
- local_irq_restore(f); \
-} while(0)
+static inline void set_bit(unsigned int nr, volatile unsigned long *p)
+{
+ p += BIT_WORD(nr);
+ atomic_long_fetch_or_relaxed(BIT_MASK(nr), (atomic_long_t *)p);
+}
-#else
-# define _atomic_spin_lock_irqsave(l,f) do { local_irq_save(f); } while (0)
-# define _atomic_spin_unlock_irqrestore(l,f) do { local_irq_restore(f); } while (0)
-#endif
+static inline void clear_bit(unsigned int nr, volatile unsigned long *p)
+{
+ p += BIT_WORD(nr);
+ atomic_long_fetch_andnot_relaxed(BIT_MASK(nr), (atomic_long_t *)p);
+}
-/*
- * NMI events can occur at any time, including when interrupts have been
- * disabled by *_irqsave(). So you can get NMI events occurring while a
- * *_bit function is holding a spin lock. If the NMI handler also wants
- * to do bit manipulation (and they do) then you can get a deadlock
- * between the original caller of *_bit() and the NMI handler.
- *
- * by Keith Owens
- */
+static inline void change_bit(unsigned int nr, volatile unsigned long *p)
+{
+ p += BIT_WORD(nr);
+ atomic_long_fetch_xor_relaxed(BIT_MASK(nr), (atomic_long_t *)p);
+}
-/**
- * set_bit - Atomically set a bit in memory
- * @nr: the bit to set
- * @addr: the address to start counting from
- *
- * This function is atomic and may not be reordered. See __set_bit()
- * if you do not require the atomic guarantees.
- *
- * Note: there are no guarantees that this function will not be reordered
- * on non x86 architectures, so if you are writing portable code,
- * make sure not to rely on its reordering guarantees.
- *
- * Note that @nr may be almost arbitrarily large; this function is not
- * restricted to acting on a single-word quantity.
- */
-static inline void set_bit(int nr, volatile unsigned long *addr)
+static inline int test_and_set_bit(unsigned int nr, volatile unsigned long *p)
{
+ long old;
unsigned long mask = BIT_MASK(nr);
- unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
- unsigned long flags;
- _atomic_spin_lock_irqsave(p, flags);
- *p |= mask;
- _atomic_spin_unlock_irqrestore(p, flags);
+ p += BIT_WORD(nr);
+ if (READ_ONCE(*p) & mask)
+ return 1;
+
+ old = atomic_long_fetch_or(mask, (atomic_long_t *)p);
+ return !!(old & mask);
}
-/**
- * clear_bit - Clears a bit in memory
- * @nr: Bit to clear
- * @addr: Address to start counting from
- *
- * clear_bit() is atomic and may not be reordered. However, it does
- * not contain a memory barrier, so if it is used for locking purposes,
- * you should call smp_mb__before_atomic() and/or smp_mb__after_atomic()
- * in order to ensure changes are visible on other processors.
- */
-static inline void clear_bit(int nr, volatile unsigned long *addr)
+static inline int test_and_clear_bit(unsigned int nr, volatile unsigned long *p)
{
+ long old;
unsigned long mask = BIT_MASK(nr);
- unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
- unsigned long flags;
- _atomic_spin_lock_irqsave(p, flags);
- *p &= ~mask;
- _atomic_spin_unlock_irqrestore(p, flags);
+ p += BIT_WORD(nr);
+ if (!(READ_ONCE(*p) & mask))
+ return 0;
+
+ old = atomic_long_fetch_andnot(mask, (atomic_long_t *)p);
+ return !!(old & mask);
}
-/**
- * change_bit - Toggle a bit in memory
- * @nr: Bit to change
- * @addr: Address to start counting from
- *
- * change_bit() is atomic and may not be reordered. It may be
- * reordered on other architectures than x86.
- * Note that @nr may be almost arbitrarily large; this function is not
- * restricted to acting on a single-word quantity.
- */
-static inline void change_bit(int nr, volatile unsigned long *addr)
+static inline int test_and_change_bit(unsigned int nr, volatile unsigned long *p)
{
+ long old;
unsigned long mask = BIT_MASK(nr);
- unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
- unsigned long flags;
- _atomic_spin_lock_irqsave(p, flags);
- *p ^= mask;
- _atomic_spin_unlock_irqrestore(p, flags);
+ p += BIT_WORD(nr);
+ old = atomic_long_fetch_xor(mask, (atomic_long_t *)p);
+ return !!(old & mask);
}
-/**
- * test_and_set_bit - Set a bit and return its old value
- * @nr: Bit to set
- * @addr: Address to count from
- *
- * This operation is atomic and cannot be reordered.
- * It may be reordered on other architectures than x86.
- * It also implies a memory barrier.
- */
-static inline int test_and_set_bit(int nr, volatile unsigned long *addr)
+static inline int test_and_set_bit_lock(unsigned int nr,
+ volatile unsigned long *p)
{
+ long old;
unsigned long mask = BIT_MASK(nr);
- unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
- unsigned long old;
- unsigned long flags;
- _atomic_spin_lock_irqsave(p, flags);
- old = *p;
- *p = old | mask;
- _atomic_spin_unlock_irqrestore(p, flags);
+ p += BIT_WORD(nr);
+ if (READ_ONCE(*p) & mask)
+ return 1;
- return (old & mask) != 0;
+ old = atomic_long_fetch_or_acquire(mask, (atomic_long_t *)p);
+ return !!(old & mask);
}
-/**
- * test_and_clear_bit - Clear a bit and return its old value
- * @nr: Bit to clear
- * @addr: Address to count from
- *
- * This operation is atomic and cannot be reordered.
- * It can be reorderdered on other architectures other than x86.
- * It also implies a memory barrier.
- */
-static inline int test_and_clear_bit(int nr, volatile unsigned long *addr)
+static inline void clear_bit_unlock(unsigned int nr, volatile unsigned long *p)
{
- unsigned long mask = BIT_MASK(nr);
- unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
- unsigned long old;
- unsigned long flags;
+ p += BIT_WORD(nr);
+ atomic_long_fetch_andnot_release(BIT_MASK(nr), (atomic_long_t *)p);
+}
- _atomic_spin_lock_irqsave(p, flags);
- old = *p;
- *p = old & ~mask;
- _atomic_spin_unlock_irqrestore(p, flags);
+static inline void __clear_bit_unlock(unsigned int nr,
+ volatile unsigned long *p)
+{
+ unsigned long old;
- return (old & mask) != 0;
+ p += BIT_WORD(nr);
+ old = READ_ONCE(*p);
+ old &= ~BIT_MASK(nr);
+ smp_store_release(p, old);
}
-/**
- * test_and_change_bit - Change a bit and return its old value
- * @nr: Bit to change
- * @addr: Address to count from
- *
- * This operation is atomic and cannot be reordered.
- * It also implies a memory barrier.
- */
-static inline int test_and_change_bit(int nr, volatile unsigned long *addr)
+#ifndef clear_bit_unlock_is_negative_byte
+static inline bool clear_bit_unlock_is_negative_byte(unsigned int nr,
+ volatile unsigned long *p)
{
+ long old;
unsigned long mask = BIT_MASK(nr);
- unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
- unsigned long old;
- unsigned long flags;
-
- _atomic_spin_lock_irqsave(p, flags);
- old = *p;
- *p = old ^ mask;
- _atomic_spin_unlock_irqrestore(p, flags);
- return (old & mask) != 0;
+ p += BIT_WORD(nr);
+ old = atomic_long_fetch_andnot_release(mask, (atomic_long_t *)p);
+ return !!(old & BIT(7));
}
+#define clear_bit_unlock_is_negative_byte clear_bit_unlock_is_negative_byte
+#endif
#endif /* _ASM_GENERIC_BITOPS_ATOMIC_H */
--
2.1.4
^ permalink raw reply related [flat|nested] 15+ messages in thread
* [RFC PATCH 4/5] arm64: Replace our atomic bitops implementation with asm-generic
2018-02-15 15:29 [RFC PATCH 0/5] Rewrite asm-generic/bitops/atomic.h and use on arm64 Will Deacon
` (2 preceding siblings ...)
2018-02-15 15:29 ` [RFC PATCH 3/5] asm-generic/bitops/atomic.h: Rewrite using atomic_fetch_* Will Deacon
@ 2018-02-15 15:29 ` Will Deacon
2018-02-15 15:29 ` [RFC PATCH 5/5] arm64: bitops: Include <asm-generic/bitops/ext2-atomic-setbit.h> Will Deacon
4 siblings, 0 replies; 15+ messages in thread
From: Will Deacon @ 2018-02-15 15:29 UTC (permalink / raw)
To: linux-arm-kernel
The asm-generic/bitops/atomic.h implementation is built around the
atomic-fetch ops, which we implement efficiently for both LSE and LL/SC
systems. Use that instead of our hand-rolled, out-of-line bitops.S.
Signed-off-by: Will Deacon <will.deacon@arm.com>
---
arch/arm64/include/asm/bitops.h | 13 +------
arch/arm64/lib/Makefile | 2 +-
arch/arm64/lib/bitops.S | 76 -----------------------------------------
3 files changed, 2 insertions(+), 89 deletions(-)
delete mode 100644 arch/arm64/lib/bitops.S
diff --git a/arch/arm64/include/asm/bitops.h b/arch/arm64/include/asm/bitops.h
index 9c19594ce7cb..817233fbf249 100644
--- a/arch/arm64/include/asm/bitops.h
+++ b/arch/arm64/include/asm/bitops.h
@@ -17,22 +17,11 @@
#define __ASM_BITOPS_H
#include <linux/compiler.h>
-#include <asm/barrier.h>
#ifndef _LINUX_BITOPS_H
#error only <linux/bitops.h> can be included directly
#endif
-/*
- * Little endian assembly atomic bitops.
- */
-extern void set_bit(int nr, volatile unsigned long *p);
-extern void clear_bit(int nr, volatile unsigned long *p);
-extern void change_bit(int nr, volatile unsigned long *p);
-extern int test_and_set_bit(int nr, volatile unsigned long *p);
-extern int test_and_clear_bit(int nr, volatile unsigned long *p);
-extern int test_and_change_bit(int nr, volatile unsigned long *p);
-
#include <asm-generic/bitops/builtin-__ffs.h>
#include <asm-generic/bitops/builtin-ffs.h>
#include <asm-generic/bitops/builtin-__fls.h>
@@ -44,8 +33,8 @@ extern int test_and_change_bit(int nr, volatile unsigned long *p);
#include <asm-generic/bitops/sched.h>
#include <asm-generic/bitops/hweight.h>
-#include <asm-generic/bitops/lock.h>
+#include <asm-generic/bitops/atomic.h>
#include <asm-generic/bitops/non-atomic.h>
#include <asm-generic/bitops/le.h>
diff --git a/arch/arm64/lib/Makefile b/arch/arm64/lib/Makefile
index 4e696f96451f..73095a04c0ad 100644
--- a/arch/arm64/lib/Makefile
+++ b/arch/arm64/lib/Makefile
@@ -1,5 +1,5 @@
# SPDX-License-Identifier: GPL-2.0
-lib-y := bitops.o clear_user.o delay.o copy_from_user.o \
+lib-y := clear_user.o delay.o copy_from_user.o \
copy_to_user.o copy_in_user.o copy_page.o \
clear_page.o memchr.o memcpy.o memmove.o memset.o \
memcmp.o strcmp.o strncmp.o strlen.o strnlen.o \
diff --git a/arch/arm64/lib/bitops.S b/arch/arm64/lib/bitops.S
deleted file mode 100644
index 43ac736baa5b..000000000000
--- a/arch/arm64/lib/bitops.S
+++ /dev/null
@@ -1,76 +0,0 @@
-/*
- * Based on arch/arm/lib/bitops.h
- *
- * Copyright (C) 2013 ARM Ltd.
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License version 2 as
- * published by the Free Software Foundation.
- *
- * This program is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
- * GNU General Public License for more details.
- *
- * You should have received a copy of the GNU General Public License
- * along with this program. If not, see <http://www.gnu.org/licenses/>.
- */
-
-#include <linux/linkage.h>
-#include <asm/assembler.h>
-#include <asm/lse.h>
-
-/*
- * x0: bits 5:0 bit offset
- * bits 31:6 word offset
- * x1: address
- */
- .macro bitop, name, llsc, lse
-ENTRY( \name )
- and w3, w0, #63 // Get bit offset
- eor w0, w0, w3 // Clear low bits
- mov x2, #1
- add x1, x1, x0, lsr #3 // Get word offset
-alt_lse " prfm pstl1strm, [x1]", "nop"
- lsl x3, x2, x3 // Create mask
-
-alt_lse "1: ldxr x2, [x1]", "\lse x3, [x1]"
-alt_lse " \llsc x2, x2, x3", "nop"
-alt_lse " stxr w0, x2, [x1]", "nop"
-alt_lse " cbnz w0, 1b", "nop"
-
- ret
-ENDPROC(\name )
- .endm
-
- .macro testop, name, llsc, lse
-ENTRY( \name )
- and w3, w0, #63 // Get bit offset
- eor w0, w0, w3 // Clear low bits
- mov x2, #1
- add x1, x1, x0, lsr #3 // Get word offset
-alt_lse " prfm pstl1strm, [x1]", "nop"
- lsl x4, x2, x3 // Create mask
-
-alt_lse "1: ldxr x2, [x1]", "\lse x4, x2, [x1]"
- lsr x0, x2, x3
-alt_lse " \llsc x2, x2, x4", "nop"
-alt_lse " stlxr w5, x2, [x1]", "nop"
-alt_lse " cbnz w5, 1b", "nop"
-alt_lse " dmb ish", "nop"
-
- and x0, x0, #1
- ret
-ENDPROC(\name )
- .endm
-
-/*
- * Atomic bit operations.
- */
- bitop change_bit, eor, steor
- bitop clear_bit, bic, stclr
- bitop set_bit, orr, stset
-
- testop test_and_change_bit, eor, ldeoral
- testop test_and_clear_bit, bic, ldclral
- testop test_and_set_bit, orr, ldsetal
--
2.1.4
^ permalink raw reply related [flat|nested] 15+ messages in thread
* [RFC PATCH 5/5] arm64: bitops: Include <asm-generic/bitops/ext2-atomic-setbit.h>
2018-02-15 15:29 [RFC PATCH 0/5] Rewrite asm-generic/bitops/atomic.h and use on arm64 Will Deacon
` (3 preceding siblings ...)
2018-02-15 15:29 ` [RFC PATCH 4/5] arm64: Replace our atomic bitops implementation with asm-generic Will Deacon
@ 2018-02-15 15:29 ` Will Deacon
4 siblings, 0 replies; 15+ messages in thread
From: Will Deacon @ 2018-02-15 15:29 UTC (permalink / raw)
To: linux-arm-kernel
asm-generic/bitops/ext2-atomic-setbit.h provides the ext2 atomic bitop
definitions, so we don't need to define our own.
Signed-off-by: Will Deacon <will.deacon@arm.com>
---
arch/arm64/include/asm/bitops.h | 7 +------
1 file changed, 1 insertion(+), 6 deletions(-)
diff --git a/arch/arm64/include/asm/bitops.h b/arch/arm64/include/asm/bitops.h
index 817233fbf249..564ffe562b7a 100644
--- a/arch/arm64/include/asm/bitops.h
+++ b/arch/arm64/include/asm/bitops.h
@@ -37,11 +37,6 @@
#include <asm-generic/bitops/atomic.h>
#include <asm-generic/bitops/non-atomic.h>
#include <asm-generic/bitops/le.h>
-
-/*
- * Ext2 is defined to use little-endian byte ordering.
- */
-#define ext2_set_bit_atomic(lock, nr, p) test_and_set_bit_le(nr, p)
-#define ext2_clear_bit_atomic(lock, nr, p) test_and_clear_bit_le(nr, p)
+#include <asm-generic/bitops/ext2-atomic-setbit.h>
#endif /* __ASM_BITOPS_H */
--
2.1.4
^ permalink raw reply related [flat|nested] 15+ messages in thread
* [RFC PATCH 3/5] asm-generic/bitops/atomic.h: Rewrite using atomic_fetch_*
2018-02-15 15:29 ` [RFC PATCH 3/5] asm-generic/bitops/atomic.h: Rewrite using atomic_fetch_* Will Deacon
@ 2018-02-15 17:08 ` Peter Zijlstra
2018-02-15 18:20 ` Will Deacon
0 siblings, 1 reply; 15+ messages in thread
From: Peter Zijlstra @ 2018-02-15 17:08 UTC (permalink / raw)
To: linux-arm-kernel
On Thu, Feb 15, 2018 at 03:29:33PM +0000, Will Deacon wrote:
> +static inline void set_bit(unsigned int nr, volatile unsigned long *p)
> +{
> + p += BIT_WORD(nr);
> + atomic_long_fetch_or_relaxed(BIT_MASK(nr), (atomic_long_t *)p);
> +}
>
> +static inline void clear_bit(unsigned int nr, volatile unsigned long *p)
> +{
> + p += BIT_WORD(nr);
> + atomic_long_fetch_andnot_relaxed(BIT_MASK(nr), (atomic_long_t *)p);
> +}
>
> +static inline void change_bit(unsigned int nr, volatile unsigned long *p)
> +{
> + p += BIT_WORD(nr);
> + atomic_long_fetch_xor_relaxed(BIT_MASK(nr), (atomic_long_t *)p);
> +}
>
> +static inline int test_and_set_bit(unsigned int nr, volatile unsigned long *p)
> {
> + long old;
> unsigned long mask = BIT_MASK(nr);
>
> + p += BIT_WORD(nr);
> + if (READ_ONCE(*p) & mask)
> + return 1;
> +
> + old = atomic_long_fetch_or(mask, (atomic_long_t *)p);
> + return !!(old & mask);
> }
>
> +static inline int test_and_clear_bit(unsigned int nr, volatile unsigned long *p)
> {
> + long old;
> unsigned long mask = BIT_MASK(nr);
>
> + p += BIT_WORD(nr);
> + if (!(READ_ONCE(*p) & mask))
> + return 0;
> +
> + old = atomic_long_fetch_andnot(mask, (atomic_long_t *)p);
> + return !!(old & mask);
> }
>
> +static inline int test_and_change_bit(unsigned int nr, volatile unsigned long *p)
> {
> + long old;
> unsigned long mask = BIT_MASK(nr);
>
> + p += BIT_WORD(nr);
> + old = atomic_long_fetch_xor(mask, (atomic_long_t *)p);
> + return !!(old & mask);
> }
>
> +static inline int test_and_set_bit_lock(unsigned int nr,
> + volatile unsigned long *p)
> {
> + long old;
> unsigned long mask = BIT_MASK(nr);
>
> + p += BIT_WORD(nr);
> + if (READ_ONCE(*p) & mask)
> + return 1;
>
> + old = atomic_long_fetch_or_acquire(mask, (atomic_long_t *)p);
> + return !!(old & mask);
> }
>
> +static inline void clear_bit_unlock(unsigned int nr, volatile unsigned long *p)
> {
> + p += BIT_WORD(nr);
> + atomic_long_fetch_andnot_release(BIT_MASK(nr), (atomic_long_t *)p);
> +}
>
> +static inline void __clear_bit_unlock(unsigned int nr,
> + volatile unsigned long *p)
> +{
> + unsigned long old;
>
> + p += BIT_WORD(nr);
> + old = READ_ONCE(*p);
> + old &= ~BIT_MASK(nr);
> + smp_store_release(p, old);
This should be atomic_set_release() I think, for the special case where
atomics are implemented with spinlocks, see the 'fun' case in
Documentation/atomic_t.txt.
> }
The only other comment is that I think it would be better if you use
atomic_t instead of atomic_long_t. It would just mean changing
BIT_WORD() and BIT_MASK().
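For reference, the current definitions in linux/bitops.h are:

#define BIT_MASK(nr)	(1UL << ((nr) % BITS_PER_LONG))
#define BIT_WORD(nr)	((nr) / BITS_PER_LONG)

so a u32 variant would just divide and modulo by 32 instead.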
The reason is that we generate a pretty sane set of atomic_t primitives
as long as the architecture supplies cmpxchg, but atomic64 defaults to
utter crap, even on 64bit platforms.
Otherwise this looks pretty neat.
^ permalink raw reply [flat|nested] 15+ messages in thread
* [RFC PATCH 3/5] asm-generic/bitops/atomic.h: Rewrite using atomic_fetch_*
2018-02-15 17:08 ` Peter Zijlstra
@ 2018-02-15 18:20 ` Will Deacon
2018-02-16 10:21 ` Peter Zijlstra
2018-02-16 10:35 ` Peter Zijlstra
0 siblings, 2 replies; 15+ messages in thread
From: Will Deacon @ 2018-02-15 18:20 UTC (permalink / raw)
To: linux-arm-kernel
Hi Peter,
Thanks for having a look.
On Thu, Feb 15, 2018 at 06:08:47PM +0100, Peter Zijlstra wrote:
> On Thu, Feb 15, 2018 at 03:29:33PM +0000, Will Deacon wrote:
> > +static inline void __clear_bit_unlock(unsigned int nr,
> > + volatile unsigned long *p)
> > +{
> > + unsigned long old;
> >
> > + p += BIT_WORD(nr);
> > + old = READ_ONCE(*p);
> > + old &= ~BIT_MASK(nr);
> > + smp_store_release(p, old);
>
> This should be atomic_set_release() I think, for the special case where
> atomics are implemented with spinlocks, see the 'fun' case in
> Documentation/atomic_t.txt.
My understanding of __clear_bit_unlock is that there are guaranteed to be
no concurrent accesses to the same word, so why would it matter whether
locks are used to implement atomics?
> The only other comment is that I think it would be better if you use
> atomic_t instead of atomic_long_t. It would just mean changing
> BIT_WORD() and BIT_MASK().
It would make it pretty messy for big-endian architectures, I think...
> The reason is that we generate a pretty sane set of atomic_t primitives
> as long as the architecture supplies cmpxchg, but atomic64 defaults to
> utter crap, even on 64bit platforms.
I think all the architectures using this today are 32-bit:
blackfin
c6x
cris
metag
openrisc
sh
xtensa
and I don't know how much we should care about optimising the generic atomic
bitops for 64-bit architectures that rely on spinlocks for 64-bit atomics!
Will
^ permalink raw reply [flat|nested] 15+ messages in thread
* [RFC PATCH 3/5] asm-generic/bitops/atomic.h: Rewrite using atomic_fetch_*
2018-02-15 18:20 ` Will Deacon
@ 2018-02-16 10:21 ` Peter Zijlstra
2018-02-19 14:01 ` Will Deacon
2018-02-16 10:35 ` Peter Zijlstra
1 sibling, 1 reply; 15+ messages in thread
From: Peter Zijlstra @ 2018-02-16 10:21 UTC (permalink / raw)
To: linux-arm-kernel
On Thu, Feb 15, 2018 at 06:20:49PM +0000, Will Deacon wrote:
> On Thu, Feb 15, 2018 at 06:08:47PM +0100, Peter Zijlstra wrote:
> > On Thu, Feb 15, 2018 at 03:29:33PM +0000, Will Deacon wrote:
> > > +static inline void __clear_bit_unlock(unsigned int nr,
> > > + volatile unsigned long *p)
> > > +{
> > > + unsigned long old;
> > >
> > > + p += BIT_WORD(nr);
> > > + old = READ_ONCE(*p);
> > > + old &= ~BIT_MASK(nr);
> > > + smp_store_release(p, old);
> >
> > This should be atomic_set_release() I think, for the special case where
> > atomics are implemented with spinlocks, see the 'fun' case in
> > Documentation/atomic_t.txt.
>
> > My understanding of __clear_bit_unlock is that there are guaranteed to be
> no concurrent accesses to the same word, so why would it matter whether
> locks are used to implement atomics?
commit f75d48644c56a31731d17fa693c8175328957e1d
Author: Peter Zijlstra <peterz@infradead.org>
Date: Wed Mar 9 12:40:54 2016 +0100
bitops: Do not default to __clear_bit() for __clear_bit_unlock()
__clear_bit_unlock() is a special little snowflake. While it carries the
non-atomic '__' prefix, it is specifically documented to pair with
test_and_set_bit() and therefore should be 'somewhat' atomic.
Therefore the generic implementation of __clear_bit_unlock() cannot use
the fully non-atomic __clear_bit() as a default.
If an arch is able to do better, it must provide an implementation of
__clear_bit_unlock() itself.
Specifically, this came up as a result of hackbench livelock'ing in
slab_lock() on ARC with SMP + SLUB + !LLSC.
The issue was incorrect pairing of atomic ops.
slab_lock() -> bit_spin_lock() -> test_and_set_bit()
slab_unlock() -> __bit_spin_unlock() -> __clear_bit()
The non serializing __clear_bit() was getting "lost"
80543b8e: ld_s r2,[r13,0] <--- (A) Finds PG_locked is set
80543b90: or r3,r2,1 <--- (B) other core unlocks right here
80543b94: st_s r3,[r13,0] <--- (C) sets PG_locked (overwrites unlock)
Fixes ARC STAR 9000817404 (and probably more).
Reported-by: Vineet Gupta <Vineet.Gupta1@synopsys.com>
Tested-by: Vineet Gupta <Vineet.Gupta1@synopsys.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Christoph Lameter <cl@linux.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Helge Deller <deller@gmx.de>
Cc: James E.J. Bottomley <jejb@parisc-linux.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Noam Camus <noamc@ezchip.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/20160309114054.GJ6356@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
diff --git a/include/asm-generic/bitops/lock.h b/include/asm-generic/bitops/lock.h
index c30266e94806..8ef0ccbf8167 100644
--- a/include/asm-generic/bitops/lock.h
+++ b/include/asm-generic/bitops/lock.h
@@ -29,16 +29,16 @@ do { \
* @nr: the bit to set
* @addr: the address to start counting from
*
- * This operation is like clear_bit_unlock, however it is not atomic.
- * It does provide release barrier semantics so it can be used to unlock
- * a bit lock, however it would only be used if no other CPU can modify
- * any bits in the memory until the lock is released (a good example is
- * if the bit lock itself protects access to the other bits in the word).
+ * A weaker form of clear_bit_unlock() as used by __bit_lock_unlock(). If all
+ * the bits in the word are protected by this lock some archs can use weaker
+ * ops to safely unlock.
+ *
+ * See for example x86's implementation.
*/
#define __clear_bit_unlock(nr, addr) \
do { \
- smp_mb(); \
- __clear_bit(nr, addr); \
+ smp_mb__before_atomic(); \
+ clear_bit(nr, addr); \
} while (0)
#endif /* _ASM_GENERIC_BITOPS_LOCK_H_ */
^ permalink raw reply related [flat|nested] 15+ messages in thread
* [RFC PATCH 3/5] asm-generic/bitops/atomic.h: Rewrite using atomic_fetch_*
2018-02-15 18:20 ` Will Deacon
2018-02-16 10:21 ` Peter Zijlstra
@ 2018-02-16 10:35 ` Peter Zijlstra
2018-02-16 14:18 ` Peter Zijlstra
2018-02-19 14:01 ` Will Deacon
1 sibling, 2 replies; 15+ messages in thread
From: Peter Zijlstra @ 2018-02-16 10:35 UTC (permalink / raw)
To: linux-arm-kernel
On Thu, Feb 15, 2018 at 06:20:49PM +0000, Will Deacon wrote:
> > The only other comment is that I think it would be better if you use
> > atomic_t instead of atomic_long_t. It would just mean changing
> > BIT_WORD() and BIT_MASK().
>
> It would make it pretty messy for big-endian architectures, I think...
Urgh, the big.little indians strike again.. Bah I always forget about
that.
#define BIT_U32_MASK(nr) (1UL << ((nr) % 32))
#define BIT_U32_WORD(nr) (((nr) / 32) ^ (4 * __BIG_ENDIAN__))
Or something like that might work, but I always get these things wrong.
> > The reason is that we generate a pretty sane set of atomic_t primitives
> > as long as the architecture supplies cmpxchg, but atomic64 defaults to
> > utter crap, even on 64bit platforms.
>
> I think all the architectures using this today are 32-bit:
>
> blackfin
> c6x
> cris
> metag
> openrisc
> sh
> xtensa
>
> and I don't know how much we should care about optimising the generic atomic
> bitops for 64-bit architectures that rely on spinlocks for 64-bit atomics!
You're probably right, but it just bugs me that we default to such
horrible crap. Arguably we should do a better default for atomic64_t on
64bit archs. But that's for another time.
^ permalink raw reply [flat|nested] 15+ messages in thread
* [RFC PATCH 3/5] asm-generic/bitops/atomic.h: Rewrite using atomic_fetch_*
2018-02-16 10:35 ` Peter Zijlstra
@ 2018-02-16 14:18 ` Peter Zijlstra
2018-02-19 14:01 ` Will Deacon
1 sibling, 0 replies; 15+ messages in thread
From: Peter Zijlstra @ 2018-02-16 14:18 UTC (permalink / raw)
To: linux-arm-kernel
On Fri, Feb 16, 2018 at 11:35:20AM +0100, Peter Zijlstra wrote:
> #define BIT_U32_MASK(nr) (1UL << ((nr) % 32))
> #define BIT_U32_WORD(nr) (((nr) / 32) ^ (4 * __BIG_ENDIAN__))
s/4 *//
it's already a 4 byte offset.
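IOW, the corrected sketch (still assuming 64-bit longs and that __BIG_ENDIAN__
expands to 1 on big-endian targets and 0 otherwise):

#define BIT_U32_MASK(nr)	(1UL << ((nr) % 32))
#define BIT_U32_WORD(nr)	(((nr) / 32) ^ __BIG_ENDIAN__)

i.e. the word index is already in units of u32, so big-endian only needs the
low bit flipped to pick the other half of the containing long.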
^ permalink raw reply [flat|nested] 15+ messages in thread
* [RFC PATCH 3/5] asm-generic/bitops/atomic.h: Rewrite using atomic_fetch_*
2018-02-16 10:21 ` Peter Zijlstra
@ 2018-02-19 14:01 ` Will Deacon
2018-02-19 14:04 ` Peter Zijlstra
0 siblings, 1 reply; 15+ messages in thread
From: Will Deacon @ 2018-02-19 14:01 UTC (permalink / raw)
To: linux-arm-kernel
Hi Peter,
On Fri, Feb 16, 2018 at 11:21:00AM +0100, Peter Zijlstra wrote:
> On Thu, Feb 15, 2018 at 06:20:49PM +0000, Will Deacon wrote:
> > On Thu, Feb 15, 2018 at 06:08:47PM +0100, Peter Zijlstra wrote:
> > > On Thu, Feb 15, 2018 at 03:29:33PM +0000, Will Deacon wrote:
> > > > +static inline void __clear_bit_unlock(unsigned int nr,
> > > > + volatile unsigned long *p)
> > > > +{
> > > > + unsigned long old;
> > > >
> > > > + p += BIT_WORD(nr);
> > > > + old = READ_ONCE(*p);
> > > > + old &= ~BIT_MASK(nr);
> > > > + smp_store_release(p, old);
> > >
> > > This should be atomic_set_release() I think, for the special case where
> > > atomics are implemented with spinlocks, see the 'fun' case in
> > > Documentation/atomic_t.txt.
> >
> > My understanding of __clear_bit_unlock is that there are guaranteed to be
> > no concurrent accesses to the same word, so why would it matter whether
> > locks are used to implement atomics?
>
>
> commit f75d48644c56a31731d17fa693c8175328957e1d
> Author: Peter Zijlstra <peterz@infradead.org>
> Date: Wed Mar 9 12:40:54 2016 +0100
>
> bitops: Do not default to __clear_bit() for __clear_bit_unlock()
>
> __clear_bit_unlock() is a special little snowflake. While it carries the
> non-atomic '__' prefix, it is specifically documented to pair with
> test_and_set_bit() and therefore should be 'somewhat' atomic.
>
> Therefore the generic implementation of __clear_bit_unlock() cannot use
> the fully non-atomic __clear_bit() as a default.
>
> If an arch is able to do better, it must provide an implementation of
> __clear_bit_unlock() itself.
>
> Specifically, this came up as a result of hackbench livelock'ing in
> slab_lock() on ARC with SMP + SLUB + !LLSC.
>
> The issue was incorrect pairing of atomic ops.
>
> slab_lock() -> bit_spin_lock() -> test_and_set_bit()
> slab_unlock() -> __bit_spin_unlock() -> __clear_bit()
>
> The non serializing __clear_bit() was getting "lost"
>
> 80543b8e: ld_s r2,[r13,0] <--- (A) Finds PG_locked is set
> 80543b90: or r3,r2,1 <--- (B) other core unlocks right here
> 80543b94: st_s r3,[r13,0] <--- (C) sets PG_locked (overwrites unlock)
Ah, so it's problematic for the case where atomics are built using locks.
Got it. I'll err on the side of caution here and have the asm-generic header
(which should be bitops/lock.h not bitops/atomic.h) conditionally define
__clear_bit_unlock as clear_bit_unlock unless the architecture has provided
its own implementation.
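Something like this (sketch only):

#ifndef __clear_bit_unlock
#define __clear_bit_unlock(nr, addr)	clear_bit_unlock(nr, addr)
#endif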
Thanks,
Will
^ permalink raw reply [flat|nested] 15+ messages in thread
* [RFC PATCH 3/5] asm-generic/bitops/atomic.h: Rewrite using atomic_fetch_*
2018-02-16 10:35 ` Peter Zijlstra
2018-02-16 14:18 ` Peter Zijlstra
@ 2018-02-19 14:01 ` Will Deacon
2018-02-19 14:05 ` Peter Zijlstra
1 sibling, 1 reply; 15+ messages in thread
From: Will Deacon @ 2018-02-19 14:01 UTC (permalink / raw)
To: linux-arm-kernel
On Fri, Feb 16, 2018 at 11:35:20AM +0100, Peter Zijlstra wrote:
> On Thu, Feb 15, 2018 at 06:20:49PM +0000, Will Deacon wrote:
>
> > > The only other comment is that I think it would be better if you use
> > > atomic_t instead of atomic_long_t. It would just mean changing
> > > BIT_WORD() and BIT_MASK().
> >
> > It would make it pretty messy for big-endian architectures, I think...
>
> Urgh, the big.little indians strike again.. Bah I always forget about
> that.
>
> #define BIT_U32_MASK(nr) (1UL << ((nr) % 32))
> #define BIT_U32_WORD(nr) (((nr) / 32) ^ (4 * __BIG_ENDIAN__))
>
> Or something like that might work, but I always get these things wrong.
>
> > > The reason is that we generate a pretty sane set of atomic_t primitives
> > > as long as the architecture supplies cmpxchg, but atomic64 defaults to
> > > utter crap, even on 64bit platforms.
> >
> > I think all the architectures using this today are 32-bit:
> >
> > blackfin
> > c6x
> > cris
> > metag
> > openrisc
> > sh
> > xtensa
> >
> > and I don't know how much we should care about optimising the generic atomic
> > bitops for 64-bit architectures that rely on spinlocks for 64-bit atomics!
>
> You're probably right, but it just bugs me that we default to such
> horrible crap. Arguably we should do a better default for atomic64_t on
> 64bit archs. But that's for another time.
If it's defined, then we could consider using cmpxchg64 to build atomic64
instead of the locks. But even then, I'm not sure we're really helping
anybody out in practice.
Will
^ permalink raw reply [flat|nested] 15+ messages in thread
* [RFC PATCH 3/5] asm-generic/bitops/atomic.h: Rewrite using atomic_fetch_*
2018-02-19 14:01 ` Will Deacon
@ 2018-02-19 14:04 ` Peter Zijlstra
0 siblings, 0 replies; 15+ messages in thread
From: Peter Zijlstra @ 2018-02-19 14:04 UTC (permalink / raw)
To: linux-arm-kernel
On Mon, Feb 19, 2018 at 02:01:43PM +0000, Will Deacon wrote:
> > The non serializing __clear_bit() was getting "lost"
> >
> > 80543b8e: ld_s r2,[r13,0] <--- (A) Finds PG_locked is set
> > 80543b90: or r3,r2,1 <--- (B) other core unlocks right here
> > 80543b94: st_s r3,[r13,0] <--- (C) sets PG_locked (overwrites unlock)
>
> Ah, so it's problematic for the case where atomics are built using locks.
> Got it. I'll err on the side of caution here and have the asm-generic header
> (which should be bitops/lock.h not bitops/atomic.h) conditionally define
> __clear_bit_unlock as clear_bit_lock unless the architecture has provided
> its own implementation.
So I think we get it all right if we use atomic_set_release(). If the
atomics are implemented using locks, atomic_set*() should be implemented
like atomic_xchg() and avoid the above problem.
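i.e. the generic __clear_bit_unlock() could keep the plain load but do the
store through the atomic API, something like:

static inline void __clear_bit_unlock(unsigned int nr, volatile unsigned long *p)
{
	unsigned long old;

	p += BIT_WORD(nr);
	old = READ_ONCE(*p);
	old &= ~BIT_MASK(nr);
	atomic_long_set_release((atomic_long_t *)p, old);
}

On an architecture with real atomics that's just a release store, while a
lock-based atomic implementation can then serialise the set against concurrent
RMWs, per the above.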
^ permalink raw reply [flat|nested] 15+ messages in thread
* [RFC PATCH 3/5] asm-generic/bitops/atomic.h: Rewrite using atomic_fetch_*
2018-02-19 14:01 ` Will Deacon
@ 2018-02-19 14:05 ` Peter Zijlstra
0 siblings, 0 replies; 15+ messages in thread
From: Peter Zijlstra @ 2018-02-19 14:05 UTC (permalink / raw)
To: linux-arm-kernel
On Mon, Feb 19, 2018 at 02:01:51PM +0000, Will Deacon wrote:
> If it's defined, then we could consider using cmpxchg64 to build atomic64
> instead of the locks. But even then, I'm not sure we're really helping
> anybody out in practice.
yeah, most 64bit archs have more atomics or ll/sc and would not use it
anyway I suppose.
^ permalink raw reply [flat|nested] 15+ messages in thread
Thread overview: 15+ messages
2018-02-15 15:29 [RFC PATCH 0/5] Rewrite asm-generic/bitops/atomic.h and use on arm64 Will Deacon
2018-02-15 15:29 ` [RFC PATCH 1/5] arm64: fpsimd: include <linux/init.h> in fpsimd.h Will Deacon
2018-02-15 15:29 ` [RFC PATCH 2/5] asm-generic: Avoid including linux/kernel.h in asm-generic/bug.h Will Deacon
2018-02-15 15:29 ` [RFC PATCH 3/5] asm-generic/bitops/atomic.h: Rewrite using atomic_fetch_* Will Deacon
2018-02-15 17:08 ` Peter Zijlstra
2018-02-15 18:20 ` Will Deacon
2018-02-16 10:21 ` Peter Zijlstra
2018-02-19 14:01 ` Will Deacon
2018-02-19 14:04 ` Peter Zijlstra
2018-02-16 10:35 ` Peter Zijlstra
2018-02-16 14:18 ` Peter Zijlstra
2018-02-19 14:01 ` Will Deacon
2018-02-19 14:05 ` Peter Zijlstra
2018-02-15 15:29 ` [RFC PATCH 4/5] arm64: Replace our atomic bitops implementation with asm-generic Will Deacon
2018-02-15 15:29 ` [RFC PATCH 5/5] arm64: bitops: Include <asm-generic/bitops/ext2-atomic-setbit.h> Will Deacon