* [PATCH] fix a race condition in cancelable mcs spinlocks
@ 2014-06-01 17:53 Mikulas Patocka
2014-06-01 19:20 ` Peter Zijlstra
` (2 more replies)
0 siblings, 3 replies; 61+ messages in thread
From: Mikulas Patocka @ 2014-06-01 17:53 UTC (permalink / raw)
To: Linus Torvalds, Peter Zijlstra, jejb, deller, John David Anglin,
linux-parisc, linux-kernel
Cc: chegu_vinod, paulmck, Waiman.Long, tglx, riel, akpm, davidlohr,
hpa, andi, aswin, scott.norton, Jason Low
The cancelable MCS spinlocks introduced in
fb0527bd5ea99bfeb2dd91e3c1433ecf745d6b99 break the kernel on PA-RISC.
How to reproduce:
* Use a machine with two dual-core PA-8900 processors.
* Run the LVM testsuite and compile the kernel in an endless loop at the
same time.
* Wait for an hour or two and the kernel locks up.
You see some processes locked up in osd_lock and osq_unlock:
INFO: rcu_sched self-detected stall on CPU { 2} (t=18000 jiffies g=247335 c=247334 q=101)
CPU: 2 PID: 21006 Comm: lvm Tainted: G O 3.15.0-rc7 #9
Backtrace:
[<000000004013e8a4>] show_stack+0x14/0x20
[<00000000403016f0>] dump_stack+0x88/0x100
[<00000000401b8738>] rcu_check_callbacks+0x4a8/0x900
[<00000000401714c4>] update_process_times+0x64/0xc0
[<000000004013fa24>] timer_interrupt+0x19c/0x200
[<00000000401ad8d8>] handle_irq_event_percpu+0xa8/0x238
[<00000000401b2454>] handle_percpu_irq+0x9c/0xd0
[<00000000401acc40>] generic_handle_irq+0x40/0x50
[<00000000401408cc>] do_cpu_irq_mask+0x1ac/0x298
[<000000004012c074>] intr_return+0x0/0xc
[<00000000401a609c>] osq_lock+0xc4/0x178
[<0000000040138d24>] __mutex_lock_slowpath+0x1cc/0x290
[<0000000040138e78>] mutex_lock+0x90/0x98
[<00000000402a5614>] kernfs_activate+0x6c/0x1a0
[<00000000402a59e0>] kernfs_add_one+0x140/0x190
[<00000000402a75ec>] __kernfs_create_file+0xa4/0xf8
INFO: rcu_sched self-detected stall on CPU { 3} (t=18473 jiffies g=247335 c=247334 q=101)
CPU: 3 PID: 21051 Comm: udevd Tainted: G O 3.15.0-rc7 #9
Backtrace:
[<000000004013e8a4>] show_stack+0x14/0x20
[<00000000403016f0>] dump_stack+0x88/0x100
[<00000000401b8738>] rcu_check_callbacks+0x4a8/0x900
[<00000000401714c4>] update_process_times+0x64/0xc0
[<000000004013fa24>] timer_interrupt+0x19c/0x200
[<00000000401ad8d8>] handle_irq_event_percpu+0xa8/0x238
[<00000000401b2454>] handle_percpu_irq+0x9c/0xd0
[<00000000401acc40>] generic_handle_irq+0x40/0x50
[<00000000401408cc>] do_cpu_irq_mask+0x1ac/0x298
[<000000004012c074>] intr_return+0x0/0xc
[<00000000401a6220>] osq_unlock+0xd0/0xf8
[<0000000040138dcc>] __mutex_lock_slowpath+0x274/0x290
[<0000000040138e78>] mutex_lock+0x90/0x98
[<00000000402a3a90>] kernfs_dop_revalidate+0x48/0x108
[<0000000040233310>] lookup_fast+0x320/0x348
[<0000000040234600>] link_path_walk+0x190/0x9d8
The code in kernel/locking/mcs_spinlock.c is broken.
PA-RISC doesn't have xchg or cmpxchg atomic instructions like other
processors. It only has ldcw and ldcd instructions that load a word (or
doubleword) from memory and atomically store zero at the same location.
These instructions can only be used to implement spinlocks, direct
implementation of other atomic operations is impossible.
Consequently, Linux xchg and cmpxchg functions are implemented in such a
way that they hash the address, use the hash to index a spinlock, take the
spinlock, perform the xchg or cmpxchg operation non-atomically and drop
the spinlock.
If you write to some variable with ACCESS_ONCE and use cmpxchg or xchg at
the same time, you break it. ACCESS_ONCE doesn't take the hashed spinlock,
so, in this case, cmpxchg or xchg isn't really atomic at all.
This patch fixes the bug by introducing a new type atomic_pointer_t
(backed by atomic_long_t) and replacing the offending pointer with it.
atomic_long_set takes the hashed spinlock, so it avoids the race
condition.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
---
include/asm-generic/atomic-long.h | 24 ++++++++++++++++++++++++
kernel/locking/mcs_spinlock.c | 16 ++++++++--------
kernel/locking/mcs_spinlock.h | 4 +++-
3 files changed, 35 insertions(+), 9 deletions(-)
Index: linux-3.15-rc7/kernel/locking/mcs_spinlock.c
===================================================================
--- linux-3.15-rc7.orig/kernel/locking/mcs_spinlock.c 2014-05-31 19:05:24.000000000 +0200
+++ linux-3.15-rc7/kernel/locking/mcs_spinlock.c 2014-06-01 14:31:58.000000000 +0200
@@ -47,8 +47,8 @@ osq_wait_next(struct optimistic_spin_que
* wait for either @lock to point to us, through its Step-B, or
* wait for a new @node->next from its Step-C.
*/
- if (node->next) {
- next = xchg(&node->next, NULL);
+ if (atomic_pointer_read(&node->next)) {
+ next = atomic_pointer_xchg(&node->next, NULL);
if (next)
break;
}
@@ -65,13 +65,13 @@ bool osq_lock(struct optimistic_spin_que
struct optimistic_spin_queue *prev, *next;
node->locked = 0;
- node->next = NULL;
+ atomic_pointer_set(&node->next, NULL);
node->prev = prev = xchg(lock, node);
if (likely(prev == NULL))
return true;
- ACCESS_ONCE(prev->next) = node;
+ atomic_pointer_set(&prev->next, node);
/*
* Normally @prev is untouchable after the above store; because at that
@@ -103,8 +103,8 @@ unqueue:
*/
for (;;) {
- if (prev->next == node &&
- cmpxchg(&prev->next, node, NULL) == node)
+ if (atomic_pointer_read(&prev->next) == node &&
+ atomic_pointer_cmpxchg(&prev->next, node, NULL) == node)
break;
/*
@@ -144,7 +144,7 @@ unqueue:
*/
ACCESS_ONCE(next->prev) = prev;
- ACCESS_ONCE(prev->next) = next;
+ atomic_pointer_set(&prev->next, next);
return false;
}
@@ -163,7 +163,7 @@ void osq_unlock(struct optimistic_spin_q
/*
* Second most likely case.
*/
- next = xchg(&node->next, NULL);
+ next = atomic_pointer_xchg(&node->next, NULL);
if (next) {
ACCESS_ONCE(next->locked) = 1;
return;
Index: linux-3.15-rc7/kernel/locking/mcs_spinlock.h
===================================================================
--- linux-3.15-rc7.orig/kernel/locking/mcs_spinlock.h 2014-05-31 19:01:01.000000000 +0200
+++ linux-3.15-rc7/kernel/locking/mcs_spinlock.h 2014-06-01 14:17:49.000000000 +0200
@@ -13,6 +13,7 @@
#define __LINUX_MCS_SPINLOCK_H
#include <asm/mcs_spinlock.h>
+#include <linux/atomic.h>
struct mcs_spinlock {
struct mcs_spinlock *next;
@@ -119,7 +120,8 @@ void mcs_spin_unlock(struct mcs_spinlock
*/
struct optimistic_spin_queue {
- struct optimistic_spin_queue *next, *prev;
+ atomic_pointer_t next;
+ struct optimistic_spin_queue *prev;
int locked; /* 1 if lock acquired */
};
Index: linux-3.15-rc7/include/asm-generic/atomic-long.h
===================================================================
--- linux-3.15-rc7.orig/include/asm-generic/atomic-long.h 2014-06-01 14:04:17.000000000 +0200
+++ linux-3.15-rc7/include/asm-generic/atomic-long.h 2014-06-01 14:30:19.000000000 +0200
@@ -255,4 +255,28 @@ static inline long atomic_long_add_unles
#endif /* BITS_PER_LONG == 64 */
+typedef atomic_long_t atomic_pointer_t;
+
+#define ATOMIC_POINTER_INIT(i) ATOMIC_LONG_INIT((long)(i))
+
+static inline void *atomic_pointer_read(atomic_pointer_t *v)
+{
+ return (void *)atomic_long_read(v);
+}
+
+static inline void atomic_pointer_set(atomic_pointer_t *v, void *i)
+{
+ atomic_long_set(v, (long)i);
+}
+
+static inline void *atomic_pointer_xchg(atomic_pointer_t *v, void *i)
+{
+ return (void *)atomic_long_xchg(v, (long)i);
+}
+
+static inline void *atomic_pointer_cmpxchg(atomic_pointer_t *v, void *old, void *new)
+{
+ return (void *)atomic_long_cmpxchg(v, (long)old, (long)new);
+}
+
#endif /* _ASM_GENERIC_ATOMIC_LONG_H */
^ permalink raw reply [flat|nested] 61+ messages in thread* Re: [PATCH] fix a race condition in cancelable mcs spinlocks 2014-06-01 17:53 [PATCH] fix a race condition in cancelable mcs spinlocks Mikulas Patocka @ 2014-06-01 19:20 ` Peter Zijlstra 2014-06-01 20:46 ` John David Anglin 2014-06-02 10:34 ` Mikulas Patocka 2014-06-02 14:14 ` Waiman Long 2014-06-02 16:00 ` [PATCH v2] introduce atomic_pointer to " Mikulas Patocka 2 siblings, 2 replies; 61+ messages in thread From: Peter Zijlstra @ 2014-06-01 19:20 UTC (permalink / raw) To: Mikulas Patocka Cc: Linus Torvalds, jejb, deller, John David Anglin, linux-parisc, linux-kernel, chegu_vinod, paulmck, Waiman.Long, tglx, riel, akpm, davidlohr, hpa, andi, aswin, scott.norton, Jason Low On Sun, Jun 01, 2014 at 01:53:11PM -0400, Mikulas Patocka wrote: > PA-RISC doesn't have xchg or cmpxchg atomic instructions like other > processors. It only has ldcw and ldcd instructions that load a word (or > doubleword) from memory and atomically store zero at the same location. > These instructions can only be used to implement spinlocks, direct > implementation of other atomic operations is impossible. > > Consequently, Linux xchg and cmpxchg functions are implemented in such a > way that they hash the address, use the hash to index a spinlock, take the > spinlock, perform the xchg or cmpxchg operation non-atomically and drop > the spinlock. > > If you write to some variable with ACCESS_ONCE and use cmpxchg or xchg at > the same time, you break it. ACCESS_ONCE doesn't take the hashed spinlock, > so, in this case, cmpxchg or xchg isn't really atomic at all. And this is really the first place in the kernel that breaks like this? I've been using xchg() and cmpxchg() without such consideration for quite a while. Doesn't sparc32 have similarly broken atomic ops? Ideally, if we really want to preserve such broken-ness, we'd add some debugging infrastructure to detect such nonsense. > This patch fixes the bug by introducing a new type atomic_pointer_t > (backed by atomic_long_t) and replacing the offending pointer with it. > atomic_long_set takes the hashed spinlock, so it avoids the race > condition. So I hate that twice, once since xchg() and cmpxchg() do not share the atomic_ prefix, its inappropriate and misleading here, and secondly, non of this is specific to pointers, xchg() and cmpxchg() take any (naturally aligned) 'native' size type. ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH] fix a race condition in cancelable mcs spinlocks 2014-06-01 19:20 ` Peter Zijlstra @ 2014-06-01 20:46 ` John David Anglin 2014-06-01 21:30 ` Peter Zijlstra 2014-06-02 13:58 ` Mikulas Patocka 2014-06-02 10:34 ` Mikulas Patocka 1 sibling, 2 replies; 61+ messages in thread From: John David Anglin @ 2014-06-01 20:46 UTC (permalink / raw) To: Peter Zijlstra Cc: Mikulas Patocka, Linus Torvalds, jejb, deller, linux-parisc, linux-kernel, chegu_vinod, paulmck, Waiman.Long, tglx, riel, akpm, davidlohr, hpa, andi, aswin, scott.norton, Jason Low On 1-Jun-14, at 3:20 PM, Peter Zijlstra wrote: >> If you write to some variable with ACCESS_ONCE and use cmpxchg or >> xchg at >> the same time, you break it. ACCESS_ONCE doesn't take the hashed >> spinlock, >> so, in this case, cmpxchg or xchg isn't really atomic at all. > > And this is really the first place in the kernel that breaks like > this? > I've been using xchg() and cmpxchg() without such consideration for > quite a while. I believe Mikulas is correct. Even in a controlled situation where a cmpxchg operation is used to implement pthread_spin_lock() in userspace, we found recently that the lock must be released with a cmpxchg operation and not a simple write on SMP systems. There is a race in the cache operations or instruction ordering that's not present with the ldcw instruction. Dave -- John David Anglin dave.anglin@bell.net ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH] fix a race condition in cancelable mcs spinlocks 2014-06-01 20:46 ` John David Anglin @ 2014-06-01 21:30 ` Peter Zijlstra 2014-06-01 21:46 ` Paul E. McKenney ` (2 more replies) 2014-06-02 13:58 ` Mikulas Patocka 1 sibling, 3 replies; 61+ messages in thread From: Peter Zijlstra @ 2014-06-01 21:30 UTC (permalink / raw) To: John David Anglin Cc: Mikulas Patocka, Linus Torvalds, jejb, deller, linux-parisc, linux-kernel, chegu_vinod, paulmck, Waiman.Long, tglx, riel, akpm, davidlohr, hpa, andi, aswin, scott.norton, Jason Low On Sun, Jun 01, 2014 at 04:46:26PM -0400, John David Anglin wrote: > On 1-Jun-14, at 3:20 PM, Peter Zijlstra wrote: > > >>If you write to some variable with ACCESS_ONCE and use cmpxchg or xchg > >>at > >>the same time, you break it. ACCESS_ONCE doesn't take the hashed > >>spinlock, > >>so, in this case, cmpxchg or xchg isn't really atomic at all. > > > >And this is really the first place in the kernel that breaks like this? > >I've been using xchg() and cmpxchg() without such consideration for > >quite a while. > > I believe Mikulas is correct. Even in a controlled situation where a > cmpxchg operation > is used to implement pthread_spin_lock() in userspace, we found recently > that the lock > must be released with a cmpxchg operation and not a simple write on SMP > systems. > There is a race in the cache operations or instruction ordering that's not > present with > the ldcw instruction. Oh, I'm not arguing that. He's quite right that its broken, but this form of atomic ops is also quite insane and unusual. Most sane machines don't have this problem. My main concern is how are we going to avoid breaking parisc (and I think sparc32, which is similarly retarded) in the future; we should invest in machinery to find and detect these things. ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH] fix a race condition in cancelable mcs spinlocks 2014-06-01 21:30 ` Peter Zijlstra @ 2014-06-01 21:46 ` Paul E. McKenney 2014-06-02 9:19 ` Mikulas Patocka 2014-06-02 19:56 ` James Bottomley 2 siblings, 0 replies; 61+ messages in thread From: Paul E. McKenney @ 2014-06-01 21:46 UTC (permalink / raw) To: Peter Zijlstra Cc: John David Anglin, Mikulas Patocka, Linus Torvalds, jejb, deller, linux-parisc, linux-kernel, chegu_vinod, Waiman.Long, tglx, riel, akpm, davidlohr, hpa, andi, aswin, scott.norton, Jason Low On Sun, Jun 01, 2014 at 11:30:03PM +0200, Peter Zijlstra wrote: > On Sun, Jun 01, 2014 at 04:46:26PM -0400, John David Anglin wrote: > > On 1-Jun-14, at 3:20 PM, Peter Zijlstra wrote: > > > > >>If you write to some variable with ACCESS_ONCE and use cmpxchg or xchg > > >>at > > >>the same time, you break it. ACCESS_ONCE doesn't take the hashed > > >>spinlock, > > >>so, in this case, cmpxchg or xchg isn't really atomic at all. > > > > > >And this is really the first place in the kernel that breaks like this? > > >I've been using xchg() and cmpxchg() without such consideration for > > >quite a while. > > > > I believe Mikulas is correct. Even in a controlled situation where a > > cmpxchg operation > > is used to implement pthread_spin_lock() in userspace, we found recently > > that the lock > > must be released with a cmpxchg operation and not a simple write on SMP > > systems. > > There is a race in the cache operations or instruction ordering that's not > > present with > > the ldcw instruction. > > Oh, I'm not arguing that. He's quite right that its broken, but this > form of atomic ops is also quite insane and unusual. Most sane machines > don't have this problem. > > My main concern is how are we going to avoid breaking parisc (and I > think sparc32, which is similarly retarded) in the future; we should > invest in machinery to find and detect these things. I cannot see an easy way to fix this by making ACCESS_ONCE() arch-dependent. But could the compiler help out by recognizing ACCESS_ONCE() and generating the needed code for it on sparc and pa-risc? Thanx, Paul ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH] fix a race condition in cancelable mcs spinlocks 2014-06-01 21:30 ` Peter Zijlstra 2014-06-01 21:46 ` Paul E. McKenney @ 2014-06-02 9:19 ` Mikulas Patocka 2014-06-02 13:24 ` Paul E. McKenney 2014-06-02 19:56 ` James Bottomley 2 siblings, 1 reply; 61+ messages in thread From: Mikulas Patocka @ 2014-06-02 9:19 UTC (permalink / raw) To: Peter Zijlstra Cc: John David Anglin, Linus Torvalds, jejb, deller, linux-parisc, linux-kernel, chegu_vinod, paulmck, Waiman.Long, tglx, riel, akpm, davidlohr, hpa, andi, aswin, scott.norton, Jason Low On Sun, 1 Jun 2014, Peter Zijlstra wrote: > On Sun, Jun 01, 2014 at 04:46:26PM -0400, John David Anglin wrote: > > On 1-Jun-14, at 3:20 PM, Peter Zijlstra wrote: > > > > >>If you write to some variable with ACCESS_ONCE and use cmpxchg or xchg > > >>at > > >>the same time, you break it. ACCESS_ONCE doesn't take the hashed > > >>spinlock, > > >>so, in this case, cmpxchg or xchg isn't really atomic at all. > > > > > >And this is really the first place in the kernel that breaks like this? > > >I've been using xchg() and cmpxchg() without such consideration for > > >quite a while. > > > > I believe Mikulas is correct. Even in a controlled situation where a > > cmpxchg operation > > is used to implement pthread_spin_lock() in userspace, we found recently > > that the lock > > must be released with a cmpxchg operation and not a simple write on SMP > > systems. > > There is a race in the cache operations or instruction ordering that's not > > present with > > the ldcw instruction. > > Oh, I'm not arguing that. He's quite right that its broken, but this > form of atomic ops is also quite insane and unusual. Most sane machines > don't have this problem. > > My main concern is how are we going to avoid breaking parisc (and I > think sparc32, which is similarly retarded) in the future; we should > invest in machinery to find and detect these things. Grep the kernel for "\<xchg\>" and "\<cmpxchg\>" and replace them with atomic types and atomic access functions. Mikulas ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH] fix a race condition in cancelable mcs spinlocks 2014-06-02 9:19 ` Mikulas Patocka @ 2014-06-02 13:24 ` Paul E. McKenney 2014-06-02 15:57 ` Mikulas Patocka 0 siblings, 1 reply; 61+ messages in thread From: Paul E. McKenney @ 2014-06-02 13:24 UTC (permalink / raw) To: Mikulas Patocka Cc: Peter Zijlstra, John David Anglin, Linus Torvalds, jejb, deller, linux-parisc, linux-kernel, chegu_vinod, Waiman.Long, tglx, riel, akpm, davidlohr, hpa, andi, aswin, scott.norton, Jason Low On Mon, Jun 02, 2014 at 05:19:39AM -0400, Mikulas Patocka wrote: > > > On Sun, 1 Jun 2014, Peter Zijlstra wrote: > > > On Sun, Jun 01, 2014 at 04:46:26PM -0400, John David Anglin wrote: > > > On 1-Jun-14, at 3:20 PM, Peter Zijlstra wrote: > > > > > > >>If you write to some variable with ACCESS_ONCE and use cmpxchg or xchg > > > >>at > > > >>the same time, you break it. ACCESS_ONCE doesn't take the hashed > > > >>spinlock, > > > >>so, in this case, cmpxchg or xchg isn't really atomic at all. > > > > > > > >And this is really the first place in the kernel that breaks like this? > > > >I've been using xchg() and cmpxchg() without such consideration for > > > >quite a while. > > > > > > I believe Mikulas is correct. Even in a controlled situation where a > > > cmpxchg operation > > > is used to implement pthread_spin_lock() in userspace, we found recently > > > that the lock > > > must be released with a cmpxchg operation and not a simple write on SMP > > > systems. > > > There is a race in the cache operations or instruction ordering that's not > > > present with > > > the ldcw instruction. > > > > Oh, I'm not arguing that. He's quite right that its broken, but this > > form of atomic ops is also quite insane and unusual. Most sane machines > > don't have this problem. > > > > My main concern is how are we going to avoid breaking parisc (and I > > think sparc32, which is similarly retarded) in the future; we should > > invest in machinery to find and detect these things. > > Grep the kernel for "\<xchg\>" and "\<cmpxchg\>" and replace them with > atomic types and atomic access functions. Not so good for pointers, though. Defeats type-checking, for one thing. An example of this is use of xchg() for atomically enqueuing RCU callbacks in kernel/rcu/tree_plugin.h. I still like the idea of PA-RISC's compiler implementing ACCESS_ONCE() as needed to make things work on that architecture. Thanx, Paul ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH] fix a race condition in cancelable mcs spinlocks 2014-06-02 13:24 ` Paul E. McKenney @ 2014-06-02 15:57 ` Mikulas Patocka 2014-06-02 16:39 ` Paul E. McKenney 0 siblings, 1 reply; 61+ messages in thread From: Mikulas Patocka @ 2014-06-02 15:57 UTC (permalink / raw) To: Paul E. McKenney Cc: Peter Zijlstra, John David Anglin, Linus Torvalds, jejb, deller, linux-parisc, linux-kernel, chegu_vinod, Waiman.Long, tglx, riel, akpm, davidlohr, hpa, andi, aswin, scott.norton, Jason Low On Mon, 2 Jun 2014, Paul E. McKenney wrote: > On Mon, Jun 02, 2014 at 05:19:39AM -0400, Mikulas Patocka wrote: > > > > > > On Sun, 1 Jun 2014, Peter Zijlstra wrote: > > > > > On Sun, Jun 01, 2014 at 04:46:26PM -0400, John David Anglin wrote: > > > > On 1-Jun-14, at 3:20 PM, Peter Zijlstra wrote: > > > > > > > > >>If you write to some variable with ACCESS_ONCE and use cmpxchg or xchg > > > > >>at > > > > >>the same time, you break it. ACCESS_ONCE doesn't take the hashed > > > > >>spinlock, > > > > >>so, in this case, cmpxchg or xchg isn't really atomic at all. > > > > > > > > > >And this is really the first place in the kernel that breaks like this? > > > > >I've been using xchg() and cmpxchg() without such consideration for > > > > >quite a while. > > > > > > > > I believe Mikulas is correct. Even in a controlled situation where a > > > > cmpxchg operation > > > > is used to implement pthread_spin_lock() in userspace, we found recently > > > > that the lock > > > > must be released with a cmpxchg operation and not a simple write on SMP > > > > systems. > > > > There is a race in the cache operations or instruction ordering that's not > > > > present with > > > > the ldcw instruction. > > > > > > Oh, I'm not arguing that. He's quite right that its broken, but this > > > form of atomic ops is also quite insane and unusual. Most sane machines > > > don't have this problem. > > > > > > My main concern is how are we going to avoid breaking parisc (and I > > > think sparc32, which is similarly retarded) in the future; we should > > > invest in machinery to find and detect these things. > > > > Grep the kernel for "\<xchg\>" and "\<cmpxchg\>" and replace them with > > atomic types and atomic access functions. > > Not so good for pointers, though. Defeats type-checking, for one thing. > An example of this is use of xchg() for atomically enqueuing RCU callbacks > in kernel/rcu/tree_plugin.h. > > I still like the idea of PA-RISC's compiler implementing ACCESS_ONCE() > as needed to make things work on that architecture. > > Thanx, Paul We can perform some preprocessor tricks to check the pointer type. See my next patch that adds type checking - you declare the variable with atomic_pointer(struct optimistic_spin_queue *) next; and the pointer type is checked on all atomic operations involving this variable. The problem with ACCESS_ONCE is that people omit it. There's plenty of places in the kernel where ACCESS_ONCE should be used and isn't (i_size_read, i_size_write, rt_mutex_is_locked...). Nothing really forces people to write the code correctly and use it. atomic_pointer (and other atomic types) have the advantage that they force people to use the atomic functions to access them. If you read or write to the variable directly, it won't compile. I think the best solution is to wrap the critical pointers with atomic_pointer(pointer_type *) and let the compiler report errors on all places where it is used unsafely. Mikulas ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH] fix a race condition in cancelable mcs spinlocks 2014-06-02 15:57 ` Mikulas Patocka @ 2014-06-02 16:39 ` Paul E. McKenney 0 siblings, 0 replies; 61+ messages in thread From: Paul E. McKenney @ 2014-06-02 16:39 UTC (permalink / raw) To: Mikulas Patocka Cc: Peter Zijlstra, John David Anglin, Linus Torvalds, jejb, deller, linux-parisc, linux-kernel, chegu_vinod, Waiman.Long, tglx, riel, akpm, davidlohr, hpa, andi, aswin, scott.norton, Jason Low On Mon, Jun 02, 2014 at 11:57:14AM -0400, Mikulas Patocka wrote: > > > On Mon, 2 Jun 2014, Paul E. McKenney wrote: > > > On Mon, Jun 02, 2014 at 05:19:39AM -0400, Mikulas Patocka wrote: > > > > > > > > > On Sun, 1 Jun 2014, Peter Zijlstra wrote: > > > > > > > On Sun, Jun 01, 2014 at 04:46:26PM -0400, John David Anglin wrote: > > > > > On 1-Jun-14, at 3:20 PM, Peter Zijlstra wrote: > > > > > > > > > > >>If you write to some variable with ACCESS_ONCE and use cmpxchg or xchg > > > > > >>at > > > > > >>the same time, you break it. ACCESS_ONCE doesn't take the hashed > > > > > >>spinlock, > > > > > >>so, in this case, cmpxchg or xchg isn't really atomic at all. > > > > > > > > > > > >And this is really the first place in the kernel that breaks like this? > > > > > >I've been using xchg() and cmpxchg() without such consideration for > > > > > >quite a while. > > > > > > > > > > I believe Mikulas is correct. Even in a controlled situation where a > > > > > cmpxchg operation > > > > > is used to implement pthread_spin_lock() in userspace, we found recently > > > > > that the lock > > > > > must be released with a cmpxchg operation and not a simple write on SMP > > > > > systems. > > > > > There is a race in the cache operations or instruction ordering that's not > > > > > present with > > > > > the ldcw instruction. > > > > > > > > Oh, I'm not arguing that. He's quite right that its broken, but this > > > > form of atomic ops is also quite insane and unusual. Most sane machines > > > > don't have this problem. > > > > > > > > My main concern is how are we going to avoid breaking parisc (and I > > > > think sparc32, which is similarly retarded) in the future; we should > > > > invest in machinery to find and detect these things. > > > > > > Grep the kernel for "\<xchg\>" and "\<cmpxchg\>" and replace them with > > > atomic types and atomic access functions. > > > > Not so good for pointers, though. Defeats type-checking, for one thing. > > An example of this is use of xchg() for atomically enqueuing RCU callbacks > > in kernel/rcu/tree_plugin.h. > > > > I still like the idea of PA-RISC's compiler implementing ACCESS_ONCE() > > as needed to make things work on that architecture. > > > > Thanx, Paul > > We can perform some preprocessor tricks to check the pointer type. See my > next patch that adds type checking - you declare the variable with > > atomic_pointer(struct optimistic_spin_queue *) next; > > and the pointer type is checked on all atomic operations involving this > variable. The special handling of ACCESS_ONCE() on architectures needing it is way better than this sort of modification, from what I can see. > The problem with ACCESS_ONCE is that people omit it. There's plenty of > places in the kernel where ACCESS_ONCE should be used and isn't > (i_size_read, i_size_write, rt_mutex_is_locked...). Nothing really forces > people to write the code correctly and use it. Well, that would be another thing to add to the compiler modification, have it check for a variable passed to xchg() or cmpxchg() and assigned without the benefit of ACCESS_ONCE(). Of course, there will be false positives, such as non-atomic assignments during initialization and cleanup that cannot race with xchg() or cmpxchg(). Also cases where all the xchg() and cmpxchg() are done under a lock, so that normal assignments under that lock are OK. Alternatively, perhaps a coccinelle script or change to sparse, smatch, or whatever could help here. > atomic_pointer (and other atomic types) have the advantage that they force > people to use the atomic functions to access them. If you read or write to > the variable directly, it won't compile. Including the safe uses of normal assignment called out above? > I think the best solution is to wrap the critical pointers with > atomic_pointer(pointer_type *) and let the compiler report errors on all > places where it is used unsafely. I understand that you like this approach, but I am not at all convinced. Thanx, Paul ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH] fix a race condition in cancelable mcs spinlocks 2014-06-01 21:30 ` Peter Zijlstra 2014-06-01 21:46 ` Paul E. McKenney 2014-06-02 9:19 ` Mikulas Patocka @ 2014-06-02 19:56 ` James Bottomley 2014-06-03 7:56 ` Peter Zijlstra 2 siblings, 1 reply; 61+ messages in thread From: James Bottomley @ 2014-06-02 19:56 UTC (permalink / raw) To: Peter Zijlstra Cc: John David Anglin, Mikulas Patocka, Linus Torvalds, jejb, deller, linux-parisc, linux-kernel, chegu_vinod, paulmck, Waiman.Long, tglx, riel, akpm, davidlohr, hpa, andi, aswin, scott.norton, Jason Low On Sun, 2014-06-01 at 23:30 +0200, Peter Zijlstra wrote: > On Sun, Jun 01, 2014 at 04:46:26PM -0400, John David Anglin wrote: > > On 1-Jun-14, at 3:20 PM, Peter Zijlstra wrote: > > > > >>If you write to some variable with ACCESS_ONCE and use cmpxchg or xchg > > >>at > > >>the same time, you break it. ACCESS_ONCE doesn't take the hashed > > >>spinlock, > > >>so, in this case, cmpxchg or xchg isn't really atomic at all. > > > > > >And this is really the first place in the kernel that breaks like this? > > >I've been using xchg() and cmpxchg() without such consideration for > > >quite a while. > > > > I believe Mikulas is correct. Even in a controlled situation where a > > cmpxchg operation > > is used to implement pthread_spin_lock() in userspace, we found recently > > that the lock > > must be released with a cmpxchg operation and not a simple write on SMP > > systems. > > There is a race in the cache operations or instruction ordering that's not > > present with > > the ldcw instruction. > > Oh, I'm not arguing that. He's quite right that its broken, but this > form of atomic ops is also quite insane and unusual. Most sane machines > don't have this problem. > > My main concern is how are we going to avoid breaking parisc (and I > think sparc32, which is similarly retarded) in the future; we should > invest in machinery to find and detect these things. Architecturally, there is a way we could emulate the atomic exchange instructions. We could have a special section of memory that always triggers a page trap. In the Q state dtlb trap handlers we could recognise the "atomic" section of memory and wrap the attempted modification in a semaphore. This would add a bit of overhead, but not a huge amount if we do it in the trap handlers like the TMPALIAS flushes. This involves a lot of work for us because we have to decode the instructions in software, recognise the operations and manually apply the hashed semaphores around them. If we did it like this, all we'd need by way of mainline support is that variables treated as atomically exchangeable should be in a separate section (because it's a page fault handler effectively, we need them all separated from "normal" code). This would probably require some type of variable marker and if we ever saw a xchg or cmpxchg on a variable without the marker, we could break the build. The way we'd implement is the memory region would be read and write protected, so all loads and stores trap to the dtlb absent handlers. For a ldX instruction, if it were not followed by a stX to the same location, we'd simply give it the value. For stX followed by ldX for xchg, we'd take the lock, do the exchange and drop the lock and for stX not preceded by ldX, we'd take the lock, do the store and drop the lock. To avoid compromising the protected region, we'd actually back it by a different area of kernel memory where we make the real modifications, rather than trying to muck with temporarily inserting a TLB entry. On return we'd have to nullify the instructions to avoid re-trapping. Effectively this has us emulating all load and store operations with a shadow memory region ... if you know the number of possible address modes for PARISC, you'll realise that's a non-trivial amount of code. Plus we'd either have to ensure the shadow region had a permanent TLB entry or do a full fault and exit the TLB handler (we can't take a nested TLB fault within the TLB fault handler). Is it worth it ... definitely not if we can just prevent mainline from using xchg on our architecture. James ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH] fix a race condition in cancelable mcs spinlocks 2014-06-02 19:56 ` James Bottomley @ 2014-06-03 7:56 ` Peter Zijlstra 2014-06-04 12:53 ` Mikulas Patocka 0 siblings, 1 reply; 61+ messages in thread From: Peter Zijlstra @ 2014-06-03 7:56 UTC (permalink / raw) To: James Bottomley Cc: John David Anglin, Mikulas Patocka, Linus Torvalds, jejb, deller, linux-parisc, linux-kernel, chegu_vinod, paulmck, Waiman.Long, tglx, riel, akpm, davidlohr, hpa, andi, aswin, scott.norton, Jason Low [-- Attachment #1: Type: text/plain, Size: 1229 bytes --] On Mon, Jun 02, 2014 at 12:56:40PM -0700, James Bottomley wrote: > Architecturally, there is a way we could emulate the atomic exchange > instructions. We could have a special section of memory that always > triggers a page trap. In the Q state dtlb trap handlers we could > recognise the "atomic" section of memory and wrap the attempted > modification in a semaphore. This would add a bit of overhead, but not > a huge amount if we do it in the trap handlers like the TMPALIAS > flushes. This involves a lot of work for us because we have to decode > the instructions in software, recognise the operations and manually > apply the hashed semaphores around them. If we did it like this, all > we'd need by way of mainline support is that variables treated as > atomically exchangeable should be in a separate section (because it's a > page fault handler effectively, we need them all separated from "normal" > code). This would probably require some type of variable marker and if > we ever saw a xchg or cmpxchg on a variable without the marker, we could > break the build. Cute, but I don't think that's entirely feasible given how these things can be embedded in other structures (some dynamically allocated etc..). [-- Attachment #2: Type: application/pgp-signature, Size: 836 bytes --] ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH] fix a race condition in cancelable mcs spinlocks 2014-06-03 7:56 ` Peter Zijlstra @ 2014-06-04 12:53 ` Mikulas Patocka 0 siblings, 0 replies; 61+ messages in thread From: Mikulas Patocka @ 2014-06-04 12:53 UTC (permalink / raw) To: Peter Zijlstra Cc: James Bottomley, John David Anglin, Linus Torvalds, jejb, deller, linux-parisc, linux-kernel, chegu_vinod, paulmck, Waiman.Long, tglx, riel, akpm, davidlohr, hpa, andi, aswin, scott.norton, Jason Low On Tue, 3 Jun 2014, Peter Zijlstra wrote: > On Mon, Jun 02, 2014 at 12:56:40PM -0700, James Bottomley wrote: > > Architecturally, there is a way we could emulate the atomic exchange > > instructions. We could have a special section of memory that always > > triggers a page trap. In the Q state dtlb trap handlers we could > > recognise the "atomic" section of memory and wrap the attempted > > modification in a semaphore. This would add a bit of overhead, but not > > a huge amount if we do it in the trap handlers like the TMPALIAS > > flushes. This involves a lot of work for us because we have to decode > > the instructions in software, recognise the operations and manually > > apply the hashed semaphores around them. If we did it like this, all > > we'd need by way of mainline support is that variables treated as > > atomically exchangeable should be in a separate section (because it's a > > page fault handler effectively, we need them all separated from "normal" > > code). This would probably require some type of variable marker and if > > we ever saw a xchg or cmpxchg on a variable without the marker, we could > > break the build. > > Cute, but I don't think that's entirely feasible given how these things > can be embedded in other structures (some dynamically allocated etc..). We could deliberately misalign all the atomic variables - then, we would take the alignment trap (that is already written) and take the atomic spinlock in it. I've got another idea - we could stop the other CPUs while xchg or cmpxchg is being executed. But there is a problem if the other CPU has interrupts disabled. Could we mask interrupts on PA-RISC in such a way that they are all disabled except one IPI that stops the CPU temporarily? Maybe do not mask interrupts with PSW I-bit and mask them with EIEM instead (leaving the one interrupt for cmpxchg IPI enabled)? Mikulas ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH] fix a race condition in cancelable mcs spinlocks 2014-06-01 20:46 ` John David Anglin 2014-06-01 21:30 ` Peter Zijlstra @ 2014-06-02 13:58 ` Mikulas Patocka 2014-06-02 14:02 ` Mikulas Patocka 1 sibling, 1 reply; 61+ messages in thread From: Mikulas Patocka @ 2014-06-02 13:58 UTC (permalink / raw) To: John David Anglin Cc: Peter Zijlstra, Linus Torvalds, jejb, deller, linux-parisc, linux-kernel, chegu_vinod, paulmck, Waiman.Long, tglx, riel, akpm, davidlohr, hpa, andi, aswin, scott.norton, Jason Low On Sun, 1 Jun 2014, John David Anglin wrote: > On 1-Jun-14, at 3:20 PM, Peter Zijlstra wrote: > > > > If you write to some variable with ACCESS_ONCE and use cmpxchg or xchg at > > > the same time, you break it. ACCESS_ONCE doesn't take the hashed spinlock, > > > so, in this case, cmpxchg or xchg isn't really atomic at all. > > > > And this is really the first place in the kernel that breaks like this? > > I've been using xchg() and cmpxchg() without such consideration for > > quite a while. > > I believe Mikulas is correct. Even in a controlled situation where a > cmpxchg operation is used to implement pthread_spin_lock() in userspace, > we found recently that the lock must be released with a cmpxchg > operation and not a simple write on SMP systems. There is a race in the > cache operations or instruction ordering that's not present with the > ldcw instruction. > > Dave > -- > John David Anglin dave.anglin@bell.net That is strange. Spinlock with cmpxchg on lock and a single write on unlock should work, assuming that cmpxchg doesn't write to the target address when it detects mismatch (the cmpxchg in the kernel syscall page doesn't do it, it nullifies the write instruction on mismatch). Do you have some code that reproduces this misbehavior? We really need to find out why does it behave this way: - is PA-RISC really out of order? (we used to believe that it is in-order and we have empty barrier instructions in the kernel). Does adding the "SYNC" instruction before the write in pthread_spin_unlock fix it? - does the processor performs nullified writes unconditionally? Does moving the write in the cmpxchg implementation from the nullified slot to is own branch fix it? - does adding a dummy "ldcw" instruction to an unrelated address fix it? Is it that "ldcw" has some magic barrier properties? I think we need to perform these tests and maybe some more to find out what really happened there... BTW. in Debian 5 libc 2.7, pthread_spin_lock uses ldcw and pthread_spin_unlock uses a single write (just like the kernel spinlock implementation). In Debian-ports libc 2.18, both pthread_spin_lock and pthread_spin_unlock call the kernel syscall page. What was the reason for switching to a less efficient implementation? Mikulas ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH] fix a race condition in cancelable mcs spinlocks 2014-06-02 13:58 ` Mikulas Patocka @ 2014-06-02 14:02 ` Mikulas Patocka 2014-06-02 15:39 ` John David Anglin 0 siblings, 1 reply; 61+ messages in thread From: Mikulas Patocka @ 2014-06-02 14:02 UTC (permalink / raw) To: John David Anglin Cc: Peter Zijlstra, Linus Torvalds, jejb, deller, linux-parisc, linux-kernel, chegu_vinod, paulmck, Waiman.Long, tglx, riel, akpm, davidlohr, hpa, andi, aswin, scott.norton, Jason Low On Mon, 2 Jun 2014, Mikulas Patocka wrote: > > > On Sun, 1 Jun 2014, John David Anglin wrote: > > > On 1-Jun-14, at 3:20 PM, Peter Zijlstra wrote: > > > > > > If you write to some variable with ACCESS_ONCE and use cmpxchg or xchg at > > > > the same time, you break it. ACCESS_ONCE doesn't take the hashed spinlock, > > > > so, in this case, cmpxchg or xchg isn't really atomic at all. > > > > > > And this is really the first place in the kernel that breaks like this? > > > I've been using xchg() and cmpxchg() without such consideration for > > > quite a while. > > > > I believe Mikulas is correct. Even in a controlled situation where a > > cmpxchg operation is used to implement pthread_spin_lock() in userspace, > > we found recently that the lock must be released with a cmpxchg > > operation and not a simple write on SMP systems. There is a race in the > > cache operations or instruction ordering that's not present with the > > ldcw instruction. > > > > Dave > > -- > > John David Anglin dave.anglin@bell.net > > That is strange. > > Spinlock with cmpxchg on lock and a single write on unlock should work, > assuming that cmpxchg doesn't write to the target address when it detects > mismatch (the cmpxchg in the kernel syscall page doesn't do it, it > nullifies the write instruction on mismatch). > > Do you have some code that reproduces this misbehavior? > > We really need to find out why does it behave this way: > - is PA-RISC really out of order? (we used to believe that it is in-order > and we have empty barrier instructions in the kernel). Does adding the > "SYNC" instruction before the write in pthread_spin_unlock fix it? > - does the processor performs nullified writes unconditionally? Does > moving the write in the cmpxchg implementation from the nullified slot > to is own branch fix it? > - does adding a dummy "ldcw" instruction to an unrelated address fix it? > Is it that "ldcw" has some magic barrier properties? - and there is "stw,o" instruction that does ordered store according to the specification, so we should test it too... > I think we need to perform these tests and maybe some more to find out > what really happened there... > > BTW. in Debian 5 libc 2.7, pthread_spin_lock uses ldcw and > pthread_spin_unlock uses a single write (just like the kernel spinlock > implementation). In Debian-ports libc 2.18, both pthread_spin_lock and > pthread_spin_unlock call the kernel syscall page. What was the reason for > switching to a less efficient implementation? > > Mikulas > ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH] fix a race condition in cancelable mcs spinlocks 2014-06-02 14:02 ` Mikulas Patocka @ 2014-06-02 15:39 ` John David Anglin 0 siblings, 0 replies; 61+ messages in thread From: John David Anglin @ 2014-06-02 15:39 UTC (permalink / raw) To: Mikulas Patocka Cc: Peter Zijlstra, Linus Torvalds, jejb, deller, linux-parisc, linux-kernel, chegu_vinod, paulmck, Waiman.Long, tglx, riel, akpm, davidlohr, hpa, andi, aswin, scott.norton, Jason Low On 6/2/2014 10:02 AM, Mikulas Patocka wrote: > > On Mon, 2 Jun 2014, Mikulas Patocka wrote: > >> >> On Sun, 1 Jun 2014, John David Anglin wrote: >> >>> On 1-Jun-14, at 3:20 PM, Peter Zijlstra wrote: >>> >>>>> If you write to some variable with ACCESS_ONCE and use cmpxchg or xchg at >>>>> the same time, you break it. ACCESS_ONCE doesn't take the hashed spinlock, >>>>> so, in this case, cmpxchg or xchg isn't really atomic at all. >>>> And this is really the first place in the kernel that breaks like this? >>>> I've been using xchg() and cmpxchg() without such consideration for >>>> quite a while. >>> I believe Mikulas is correct. Even in a controlled situation where a >>> cmpxchg operation is used to implement pthread_spin_lock() in userspace, >>> we found recently that the lock must be released with a cmpxchg >>> operation and not a simple write on SMP systems. There is a race in the >>> cache operations or instruction ordering that's not present with the >>> ldcw instruction. >>> >>> Dave >>> -- >>> John David Anglin dave.anglin@bell.net >> That is strange. >> >> Spinlock with cmpxchg on lock and a single write on unlock should work, >> assuming that cmpxchg doesn't write to the target address when it detects >> mismatch (the cmpxchg in the kernel syscall page doesn't do it, it >> nullifies the write instruction on mismatch). >> >> Do you have some code that reproduces this misbehavior? There is a pthread_spin_lock test in the kyotocabinet package that reproduces this misbehavior. Essentially, it creates four threads which loop doing pthread_spin_lock(), sched_yield() and then pthread_spin_unlock(). On SMP systems, the test hangs with the pthread_spin_lock locked and no thread holding lock (i.e., unlock failed). The pthread support uses the cmpxchg code in arch/parisc/kernel/syscall.S. This uses "hashed" locks, etc, in a manner similar to the kernel code. >> >> We really need to find out why does it behave this way: >> - is PA-RISC really out of order? (we used to believe that it is in-order >> and we have empty barrier instructions in the kernel). Does adding the >> "SYNC" instruction before the write in pthread_spin_unlock fix it? I tried "SYNC" instruction before write and after the cmpxchg operation both with. In the cmpxchg operation, I also tried it with cache flush. I was trying to simulated ldcw behavior. >> - does the processor performs nullified writes unconditionally? Does >> moving the write in the cmpxchg implementation from the nullified slot >> to is own branch fix it? I don't see how the processor can perform nullified writes unconditionally although that might explain the observed symptom. Didn't try moving the cmpxchg write. >> - does adding a dummy "ldcw" instruction to an unrelated address fix it? >> Is it that "ldcw" has some magic barrier properties? I had wondered about that. One can't use %r0 as the instruction target as the architecture manual says that it may then be implemented as a normal load. "ldcw" definitely has some magic cache and barrier properties. A normal store definitely works with it to reset the semaphore. > - and there is "stw,o" instruction that does ordered store according to > the specification, so we should test it too... This doesn't help. Currently, the Debian eglibc has a pthread_spin_unlock.diff patch that resolves the kyotocabinet bug. See: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=725508 > >> I think we need to perform these tests and maybe some more to find out >> what really happened there... >> >> BTW. in Debian 5 libc 2.7, pthread_spin_lock uses ldcw and >> pthread_spin_unlock uses a single write (just like the kernel spinlock >> implementation). In Debian-ports libc 2.18, both pthread_spin_lock and >> pthread_spin_unlock call the kernel syscall page. What was the reason for >> switching to a less efficient implementation? >> >> Mikulas >> > Dave -- John David Anglin dave.anglin@bell.net ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH] fix a race condition in cancelable mcs spinlocks 2014-06-01 19:20 ` Peter Zijlstra 2014-06-01 20:46 ` John David Anglin @ 2014-06-02 10:34 ` Mikulas Patocka 1 sibling, 0 replies; 61+ messages in thread From: Mikulas Patocka @ 2014-06-02 10:34 UTC (permalink / raw) To: Peter Zijlstra Cc: Linus Torvalds, jejb, deller, John David Anglin, linux-parisc, linux-kernel, chegu_vinod, paulmck, Waiman.Long, tglx, riel, akpm, davidlohr, hpa, andi, aswin, scott.norton, Jason Low On Sun, 1 Jun 2014, Peter Zijlstra wrote: > On Sun, Jun 01, 2014 at 01:53:11PM -0400, Mikulas Patocka wrote: > > PA-RISC doesn't have xchg or cmpxchg atomic instructions like other > > processors. It only has ldcw and ldcd instructions that load a word (or > > doubleword) from memory and atomically store zero at the same location. > > These instructions can only be used to implement spinlocks, direct > > implementation of other atomic operations is impossible. > > > > Consequently, Linux xchg and cmpxchg functions are implemented in such a > > way that they hash the address, use the hash to index a spinlock, take the > > spinlock, perform the xchg or cmpxchg operation non-atomically and drop > > the spinlock. > > > > If you write to some variable with ACCESS_ONCE and use cmpxchg or xchg at > > the same time, you break it. ACCESS_ONCE doesn't take the hashed spinlock, > > so, in this case, cmpxchg or xchg isn't really atomic at all. > > And this is really the first place in the kernel that breaks like this? > I've been using xchg() and cmpxchg() without such consideration for > quite a while. It happens on a common mutex operation and it took an hour of stress-testing to trigger it. The other cases may be buggy too, but no one has written a stress test specifically tailored for them. > Doesn't sparc32 have similarly broken atomic ops? Yes. tile32, arc and metag seem to be broken by this too. hexagon also has non-standard atomic_set, so it may be broken too. > Ideally, if we really want to preserve such broken-ness, we'd add some > debugging infrastructure to detect such nonsense. > > > This patch fixes the bug by introducing a new type atomic_pointer_t > > (backed by atomic_long_t) and replacing the offending pointer with it. > > atomic_long_set takes the hashed spinlock, so it avoids the race > > condition. > > So I hate that twice, once since xchg() and cmpxchg() do not share the > atomic_ prefix, its inappropriate and misleading here, and secondly, > non of this is specific to pointers, xchg() and cmpxchg() take any > (naturally aligned) 'native' size type. I think we don't need xchg() and cmpxchg() at all, because we have atomic types. Mikulas ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH] fix a race condition in cancelable mcs spinlocks 2014-06-01 17:53 [PATCH] fix a race condition in cancelable mcs spinlocks Mikulas Patocka 2014-06-01 19:20 ` Peter Zijlstra @ 2014-06-02 14:14 ` Waiman Long 2014-06-02 15:27 ` Jason Low 2014-06-02 16:00 ` [PATCH v2] introduce atomic_pointer to " Mikulas Patocka 2 siblings, 1 reply; 61+ messages in thread From: Waiman Long @ 2014-06-02 14:14 UTC (permalink / raw) To: Mikulas Patocka Cc: Linus Torvalds, Peter Zijlstra, jejb, deller, John David Anglin, linux-parisc, linux-kernel, chegu_vinod, paulmck, tglx, riel, akpm, davidlohr, hpa, andi, aswin, scott.norton, Jason Low On 06/01/2014 01:53 PM, Mikulas Patocka wrote: > The code in kernel/locking/mcs_spinlock.c is broken. The osq_lock and osq_unlock functions aren't the only ones that need to be changed, the mcs_spin_lock and mcs_spin_unlock have exactly the same problem. There aren't certainly problems in other places as well. > PA-RISC doesn't have xchg or cmpxchg atomic instructions like other > processors. It only has ldcw and ldcd instructions that load a word (or > doubleword) from memory and atomically store zero at the same location. > These instructions can only be used to implement spinlocks, direct > implementation of other atomic operations is impossible. > > Consequently, Linux xchg and cmpxchg functions are implemented in such a > way that they hash the address, use the hash to index a spinlock, take the > spinlock, perform the xchg or cmpxchg operation non-atomically and drop > the spinlock. > > If you write to some variable with ACCESS_ONCE and use cmpxchg or xchg at > the same time, you break it. ACCESS_ONCE doesn't take the hashed spinlock, > so, in this case, cmpxchg or xchg isn't really atomic at all. > > This patch fixes the bug by introducing a new type atomic_pointer_t > (backed by atomic_long_t) and replacing the offending pointer with it. > atomic_long_set takes the hashed spinlock, so it avoids the race > condition. I believe the mixing of cmpxchg/xchg and ACCESS_ONCE() is fairly common in the kernel, it will be an additional burden on the kernel developers to make sure that this kind of breakage won't happen. We also need clear documentation somewhere to document this kind of architecture specific behavior, maybe in the memory-barrier.txt. > Index: linux-3.15-rc7/kernel/locking/mcs_spinlock.h > =================================================================== > --- linux-3.15-rc7.orig/kernel/locking/mcs_spinlock.h 2014-05-31 19:01:01.000000000 +0200 > +++ linux-3.15-rc7/kernel/locking/mcs_spinlock.h 2014-06-01 14:17:49.000000000 +0200 > @@ -13,6 +13,7 @@ > #define __LINUX_MCS_SPINLOCK_H > > #include<asm/mcs_spinlock.h> > +#include<linux/atomic.h> > > struct mcs_spinlock { > struct mcs_spinlock *next; > @@ -119,7 +120,8 @@ void mcs_spin_unlock(struct mcs_spinlock > */ > > struct optimistic_spin_queue { > - struct optimistic_spin_queue *next, *prev; > + atomic_pointer_t next; > + struct optimistic_spin_queue *prev; > int locked; /* 1 if lock acquired */ > }; Is there a way to do it without changing the pointer type? It will make the code harder to read and understand. -Longman ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH] fix a race condition in cancelable mcs spinlocks 2014-06-02 14:14 ` Waiman Long @ 2014-06-02 15:27 ` Jason Low 0 siblings, 0 replies; 61+ messages in thread From: Jason Low @ 2014-06-02 15:27 UTC (permalink / raw) To: Waiman Long Cc: Mikulas Patocka, Linus Torvalds, Peter Zijlstra, jejb, deller, John David Anglin, linux-parisc, linux-kernel, chegu_vinod, paulmck, tglx, riel, akpm, davidlohr, hpa, andi, aswin, scott.norton On Mon, 2014-06-02 at 10:14 -0400, Waiman Long wrote: > On 06/01/2014 01:53 PM, Mikulas Patocka wrote: > > struct optimistic_spin_queue { > > - struct optimistic_spin_queue *next, *prev; > > + atomic_pointer_t next; > > + struct optimistic_spin_queue *prev; > > int locked; /* 1 if lock acquired */ > > }; > > Is there a way to do it without changing the pointer type? It will make > the code harder to read and understand. I agree that it would be nice if there is a way to fix this without changing the pointer type of "next". The change of the type to atomic_pointer_t might make it less obvious what "next" is for. This is then compounded with "prev" being kept as a pointer to optimistic_spin_queue, which can further make it appear as if "next" may potentially point to something different. ^ permalink raw reply [flat|nested] 61+ messages in thread
* [PATCH v2] introduce atomic_pointer to fix a race condition in cancelable mcs spinlocks 2014-06-01 17:53 [PATCH] fix a race condition in cancelable mcs spinlocks Mikulas Patocka 2014-06-01 19:20 ` Peter Zijlstra 2014-06-02 14:14 ` Waiman Long @ 2014-06-02 16:00 ` Mikulas Patocka 2014-06-02 16:25 ` Peter Zijlstra 2014-06-02 16:50 ` Jason Low 2 siblings, 2 replies; 61+ messages in thread From: Mikulas Patocka @ 2014-06-02 16:00 UTC (permalink / raw) To: Linus Torvalds, Peter Zijlstra, jejb, deller, John David Anglin, linux-parisc, linux-kernel, paulmck Cc: chegu_vinod, Waiman.Long, tglx, riel, akpm, davidlohr, hpa, andi, aswin, scott.norton, Jason Low The cancelable MCS spinlocks introduced in fb0527bd5ea99bfeb2dd91e3c1433ecf745d6b99 break the kernel on PA-RISC. How to reproduce: * Use a machine with two dual-core PA-8900 processors. * Run the LVM testsuite and compile the kernel in an endless loop at the same time. * Wait for an hour or two and the kernel locks up. You see some process locked up in osd_lock and osq_unlock: INFO: rcu_sched self-detected stall on CPU { 2} (t=18000 jiffies g=247335 c=247334 q=101) CPU: 2 PID: 21006 Comm: lvm Tainted: G O 3.15.0-rc7 #9 Backtrace: [<000000004013e8a4>] show_stack+0x14/0x20 [<00000000403016f0>] dump_stack+0x88/0x100 [<00000000401b8738>] rcu_check_callbacks+0x4a8/0x900 [<00000000401714c4>] update_process_times+0x64/0xc0 [<000000004013fa24>] timer_interrupt+0x19c/0x200 [<00000000401ad8d8>] handle_irq_event_percpu+0xa8/0x238 [<00000000401b2454>] handle_percpu_irq+0x9c/0xd0 [<00000000401acc40>] generic_handle_irq+0x40/0x50 [<00000000401408cc>] do_cpu_irq_mask+0x1ac/0x298 [<000000004012c074>] intr_return+0x0/0xc [<00000000401a609c>] osq_lock+0xc4/0x178 [<0000000040138d24>] __mutex_lock_slowpath+0x1cc/0x290 [<0000000040138e78>] mutex_lock+0x90/0x98 [<00000000402a5614>] kernfs_activate+0x6c/0x1a0 [<00000000402a59e0>] kernfs_add_one+0x140/0x190 [<00000000402a75ec>] __kernfs_create_file+0xa4/0xf8 INFO: rcu_sched self-detected stall on CPU { 3} (t=18473 jiffies g=247335 c=247334 q=101) CPU: 3 PID: 21051 Comm: udevd Tainted: G O 3.15.0-rc7 #9 Backtrace: [<000000004013e8a4>] show_stack+0x14/0x20 [<00000000403016f0>] dump_stack+0x88/0x100 [<00000000401b8738>] rcu_check_callbacks+0x4a8/0x900 [<00000000401714c4>] update_process_times+0x64/0xc0 [<000000004013fa24>] timer_interrupt+0x19c/0x200 [<00000000401ad8d8>] handle_irq_event_percpu+0xa8/0x238 [<00000000401b2454>] handle_percpu_irq+0x9c/0xd0 [<00000000401acc40>] generic_handle_irq+0x40/0x50 [<00000000401408cc>] do_cpu_irq_mask+0x1ac/0x298 [<000000004012c074>] intr_return+0x0/0xc [<00000000401a6220>] osq_unlock+0xd0/0xf8 [<0000000040138dcc>] __mutex_lock_slowpath+0x274/0x290 [<0000000040138e78>] mutex_lock+0x90/0x98 [<00000000402a3a90>] kernfs_dop_revalidate+0x48/0x108 [<0000000040233310>] lookup_fast+0x320/0x348 [<0000000040234600>] link_path_walk+0x190/0x9d8 The code in kernel/locking/mcs_spinlock.c is broken. PA-RISC doesn't have xchg or cmpxchg atomic instructions like other processors. It only has ldcw and ldcd instructions that load a word (or doubleword) from memory and atomically store zero at the same location. These instructions can only be used to implement spinlocks, direct implementation of other atomic operations is impossible. Consequently, Linux xchg and cmpxchg functions are implemented in such a way that they hash the address, use the hash to index a spinlock, take the spinlock, perform the xchg or cmpxchg operation non-atomically and drop the spinlock. If you write to some variable with ACCESS_ONCE and use cmpxchg or xchg at the same time, you break it. ACCESS_ONCE doesn't take the hashed spinlock, so, in this case, cmpxchg or xchg isn't really atomic at all. This patch fixes the bug by introducing a new type atomic_pointer and replacing the offending pointer with it. atomic_pointer_set (calling atomic_long_set) takes the hashed spinlock, so it avoids the race condition. We perform some gcc-specific compiler tricks to warn on pointer type mismatch. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> --- include/asm-generic/atomic-long.h | 27 +++++++++++++++++++++++++++ kernel/locking/mcs_spinlock.c | 16 ++++++++-------- kernel/locking/mcs_spinlock.h | 4 +++- 3 files changed, 38 insertions(+), 9 deletions(-) Index: linux-3.15-rc8/kernel/locking/mcs_spinlock.c =================================================================== --- linux-3.15-rc8.orig/kernel/locking/mcs_spinlock.c 2014-06-02 17:11:16.000000000 +0200 +++ linux-3.15-rc8/kernel/locking/mcs_spinlock.c 2014-06-02 17:11:50.000000000 +0200 @@ -47,8 +47,8 @@ osq_wait_next(struct optimistic_spin_que * wait for either @lock to point to us, through its Step-B, or * wait for a new @node->next from its Step-C. */ - if (node->next) { - next = xchg(&node->next, NULL); + if (atomic_pointer_read(&node->next)) { + next = atomic_pointer_xchg(&node->next, NULL); if (next) break; } @@ -65,13 +65,13 @@ bool osq_lock(struct optimistic_spin_que struct optimistic_spin_queue *prev, *next; node->locked = 0; - node->next = NULL; + atomic_pointer_set(&node->next, NULL); node->prev = prev = xchg(lock, node); if (likely(prev == NULL)) return true; - ACCESS_ONCE(prev->next) = node; + atomic_pointer_set(&prev->next, node); /* * Normally @prev is untouchable after the above store; because at that @@ -103,8 +103,8 @@ unqueue: */ for (;;) { - if (prev->next == node && - cmpxchg(&prev->next, node, NULL) == node) + if (atomic_pointer_read(&prev->next) == node && + atomic_pointer_cmpxchg(&prev->next, node, NULL) == node) break; /* @@ -144,7 +144,7 @@ unqueue: */ ACCESS_ONCE(next->prev) = prev; - ACCESS_ONCE(prev->next) = next; + atomic_pointer_set(&prev->next, next); return false; } @@ -163,7 +163,7 @@ void osq_unlock(struct optimistic_spin_q /* * Second most likely case. */ - next = xchg(&node->next, NULL); + next = atomic_pointer_xchg(&node->next, NULL); if (next) { ACCESS_ONCE(next->locked) = 1; return; Index: linux-3.15-rc8/kernel/locking/mcs_spinlock.h =================================================================== --- linux-3.15-rc8.orig/kernel/locking/mcs_spinlock.h 2014-06-02 17:11:16.000000000 +0200 +++ linux-3.15-rc8/kernel/locking/mcs_spinlock.h 2014-06-02 17:11:50.000000000 +0200 @@ -13,6 +13,7 @@ #define __LINUX_MCS_SPINLOCK_H #include <asm/mcs_spinlock.h> +#include <linux/atomic.h> struct mcs_spinlock { struct mcs_spinlock *next; @@ -119,7 +120,8 @@ void mcs_spin_unlock(struct mcs_spinlock */ struct optimistic_spin_queue { - struct optimistic_spin_queue *next, *prev; + atomic_pointer(struct optimistic_spin_queue *) next; + struct optimistic_spin_queue *prev; int locked; /* 1 if lock acquired */ }; Index: linux-3.15-rc8/include/asm-generic/atomic-long.h =================================================================== --- linux-3.15-rc8.orig/include/asm-generic/atomic-long.h 2014-06-02 17:11:17.000000000 +0200 +++ linux-3.15-rc8/include/asm-generic/atomic-long.h 2014-06-02 17:11:50.000000000 +0200 @@ -255,4 +255,31 @@ static inline long atomic_long_add_unles #endif /* BITS_PER_LONG == 64 */ +#define atomic_pointer(type) \ +union { \ + atomic_long_t __a; \ + type __t; \ + char __check_sizeof[sizeof(type) == sizeof(long) ? 1 : -1]; \ +} + +#define ATOMIC_POINTER_INIT(i) { .__t = (i) } + +#define atomic_pointer_read(v) ((typeof((v)->__t))atomic_long_read(&(v)->__a)) + +#define atomic_pointer_set(v, i) ({ \ + typeof((v)->__t) __i = (i); \ + atomic_long_set(&(v)->__a, (long)(__i)); \ +}) + +#define atomic_pointer_xchg(v, i) ({ \ + typeof((v)->__t) __i = (i); \ + (typeof((v)->__t))atomic_long_xchg(&(v)->__a, (long)(__i)); \ +}) + +#define atomic_pointer_cmpxchg(v, old, new) ({ \ + typeof((v)->__t) __old = (old); \ + typeof((v)->__t) __new = (new); \ + (typeof((v)->__t))atomic_long_cmpxchg(&(v)->__a, (long)(__old), (long)(__new));\ +}) + #endif /* _ASM_GENERIC_ATOMIC_LONG_H */ ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH v2] introduce atomic_pointer to fix a race condition in cancelable mcs spinlocks 2014-06-02 16:00 ` [PATCH v2] introduce atomic_pointer to " Mikulas Patocka @ 2014-06-02 16:25 ` Peter Zijlstra 2014-06-02 16:30 ` Peter Zijlstra ` (2 more replies) 2014-06-02 16:50 ` Jason Low 1 sibling, 3 replies; 61+ messages in thread From: Peter Zijlstra @ 2014-06-02 16:25 UTC (permalink / raw) To: Mikulas Patocka Cc: Linus Torvalds, jejb, deller, John David Anglin, linux-parisc, linux-kernel, paulmck, chegu_vinod, Waiman.Long, tglx, riel, akpm, davidlohr, hpa, andi, aswin, scott.norton, Jason Low On Mon, Jun 02, 2014 at 12:00:45PM -0400, Mikulas Patocka wrote: > struct optimistic_spin_queue { > - struct optimistic_spin_queue *next, *prev; > + atomic_pointer(struct optimistic_spin_queue *) next; > + struct optimistic_spin_queue *prev; > int locked; /* 1 if lock acquired */ > }; > > Index: linux-3.15-rc8/include/asm-generic/atomic-long.h > =================================================================== > --- linux-3.15-rc8.orig/include/asm-generic/atomic-long.h 2014-06-02 17:11:17.000000000 +0200 > +++ linux-3.15-rc8/include/asm-generic/atomic-long.h 2014-06-02 17:11:50.000000000 +0200 > @@ -255,4 +255,31 @@ static inline long atomic_long_add_unles > > #endif /* BITS_PER_LONG == 64 */ > > +#define atomic_pointer(type) \ > +union { \ > + atomic_long_t __a; \ > + type __t; \ > + char __check_sizeof[sizeof(type) == sizeof(long) ? 1 : -1]; \ > +} That's still entirely disgusting, and afaict entirely redundant. You can do that test in the operators below just fine. > +#define ATOMIC_POINTER_INIT(i) { .__t = (i) } > + > +#define atomic_pointer_read(v) ((typeof((v)->__t))atomic_long_read(&(v)->__a)) > + > +#define atomic_pointer_set(v, i) ({ \ > + typeof((v)->__t) __i = (i); \ > + atomic_long_set(&(v)->__a, (long)(__i)); \ > +}) > + > +#define atomic_pointer_xchg(v, i) ({ \ > + typeof((v)->__t) __i = (i); \ > + (typeof((v)->__t))atomic_long_xchg(&(v)->__a, (long)(__i)); \ > +}) > + > +#define atomic_pointer_cmpxchg(v, old, new) ({ \ > + typeof((v)->__t) __old = (old); \ > + typeof((v)->__t) __new = (new); \ > + (typeof((v)->__t))atomic_long_cmpxchg(&(v)->__a, (long)(__old), (long)(__new));\ > +}) And I can't say I'm a particular fan of these ops either, as alternative I'm almost inclined to just exclude parisc from using opt spinning. That said, this patch still doesn't address the far more interesting problem of actually finding these issues for these few weird archs. ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH v2] introduce atomic_pointer to fix a race condition in cancelable mcs spinlocks 2014-06-02 16:25 ` Peter Zijlstra @ 2014-06-02 16:30 ` Peter Zijlstra 2014-06-02 16:46 ` Paul E. McKenney 2014-06-02 17:33 ` Waiman Long 2014-06-02 16:43 ` Paul E. McKenney 2014-06-02 17:09 ` Linus Torvalds 2 siblings, 2 replies; 61+ messages in thread From: Peter Zijlstra @ 2014-06-02 16:30 UTC (permalink / raw) To: Mikulas Patocka Cc: Linus Torvalds, jejb, deller, John David Anglin, linux-parisc, linux-kernel, paulmck, chegu_vinod, Waiman.Long, tglx, riel, akpm, davidlohr, hpa, andi, aswin, scott.norton, Jason Low On Mon, Jun 02, 2014 at 06:25:25PM +0200, Peter Zijlstra wrote: > I'm almost inclined to just exclude parisc from using opt spinning. > > That said, this patch still doesn't address the far more interesting > problem of actually finding these issues for these few weird archs. So why do these archs provide xchg() and cmpxchg() at all? Wouldn't it be much simpler if archs that cannot sanely do this, not provide these primitives at all? ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH v2] introduce atomic_pointer to fix a race condition in cancelable mcs spinlocks 2014-06-02 16:30 ` Peter Zijlstra @ 2014-06-02 16:46 ` Paul E. McKenney 2014-06-02 17:33 ` Waiman Long 1 sibling, 0 replies; 61+ messages in thread From: Paul E. McKenney @ 2014-06-02 16:46 UTC (permalink / raw) To: Peter Zijlstra Cc: Mikulas Patocka, Linus Torvalds, jejb, deller, John David Anglin, linux-parisc, linux-kernel, chegu_vinod, Waiman.Long, tglx, riel, akpm, davidlohr, hpa, andi, aswin, scott.norton, Jason Low On Mon, Jun 02, 2014 at 06:30:32PM +0200, Peter Zijlstra wrote: > On Mon, Jun 02, 2014 at 06:25:25PM +0200, Peter Zijlstra wrote: > > I'm almost inclined to just exclude parisc from using opt spinning. > > > > That said, this patch still doesn't address the far more interesting > > problem of actually finding these issues for these few weird archs. > > So why do these archs provide xchg() and cmpxchg() at all? Wouldn't it > be much simpler if archs that cannot sanely do this, not provide these > primitives at all? Such architectures would also need to avoid NO_HZ_FULL_SYSIDLE and RCU_NOCB_CPU, but those are probably entirely reasonable restrictions. Thanx, Paul ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH v2] introduce atomic_pointer to fix a race condition in cancelable mcs spinlocks 2014-06-02 16:30 ` Peter Zijlstra 2014-06-02 16:46 ` Paul E. McKenney @ 2014-06-02 17:33 ` Waiman Long 2014-06-02 20:05 ` Peter Zijlstra 1 sibling, 1 reply; 61+ messages in thread From: Waiman Long @ 2014-06-02 17:33 UTC (permalink / raw) To: Peter Zijlstra Cc: Mikulas Patocka, Linus Torvalds, jejb, deller, John David Anglin, linux-parisc, linux-kernel, paulmck, chegu_vinod, tglx, riel, akpm, davidlohr, hpa, andi, aswin, scott.norton, Jason Low On 06/02/2014 12:30 PM, Peter Zijlstra wrote: > On Mon, Jun 02, 2014 at 06:25:25PM +0200, Peter Zijlstra wrote: >> I'm almost inclined to just exclude parisc from using opt spinning. >> >> That said, this patch still doesn't address the far more interesting >> problem of actually finding these issues for these few weird archs. > So why do these archs provide xchg() and cmpxchg() at all? Wouldn't it > be much simpler if archs that cannot sanely do this, not provide these > primitives at all? I believe xchg() and cmpxchg() are used in quite a number of places within the generic kernel code. So kernel compilation will fail if those APIs aren't provided by an architecture. -Longman ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH v2] introduce atomic_pointer to fix a race condition in cancelable mcs spinlocks 2014-06-02 17:33 ` Waiman Long @ 2014-06-02 20:05 ` Peter Zijlstra 2014-06-02 20:22 ` Linus Torvalds 2014-06-02 20:24 ` James Bottomley 0 siblings, 2 replies; 61+ messages in thread From: Peter Zijlstra @ 2014-06-02 20:05 UTC (permalink / raw) To: Waiman Long Cc: Mikulas Patocka, Linus Torvalds, jejb, deller, John David Anglin, linux-parisc, linux-kernel, paulmck, chegu_vinod, tglx, riel, akpm, davidlohr, hpa, andi, aswin, scott.norton, Jason Low On Mon, Jun 02, 2014 at 01:33:34PM -0400, Waiman Long wrote: > On 06/02/2014 12:30 PM, Peter Zijlstra wrote: > >On Mon, Jun 02, 2014 at 06:25:25PM +0200, Peter Zijlstra wrote: > >>I'm almost inclined to just exclude parisc from using opt spinning. > >> > >>That said, this patch still doesn't address the far more interesting > >>problem of actually finding these issues for these few weird archs. > >So why do these archs provide xchg() and cmpxchg() at all? Wouldn't it > >be much simpler if archs that cannot sanely do this, not provide these > >primitives at all? > > I believe xchg() and cmpxchg() are used in quite a number of places within > the generic kernel code. So kernel compilation will fail if those APIs > aren't provided by an architecture. Yep.. so this is going to be painful for a while. But given their (parisc, sparc32, metag-lock1) constraints, who knows how many of those uses are actually broken. So the question is, do you prefer subtly broken code or hard compile fails? Me, I go for the compile fail. In any case, this all goes towards what hpa said, what are the minimal requirements we have for running Linux. ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH v2] introduce atomic_pointer to fix a race condition in cancelable mcs spinlocks 2014-06-02 20:05 ` Peter Zijlstra @ 2014-06-02 20:22 ` Linus Torvalds 2014-06-02 21:02 ` Paul E. McKenney 2014-06-03 7:54 ` Peter Zijlstra 2014-06-02 20:24 ` James Bottomley 1 sibling, 2 replies; 61+ messages in thread From: Linus Torvalds @ 2014-06-02 20:22 UTC (permalink / raw) To: Peter Zijlstra Cc: Waiman Long, Mikulas Patocka, James E.J. Bottomley, Helge Deller, John David Anglin, Parisc List, Linux Kernel Mailing List, Paul McKenney, Vinod, Chegu, Thomas Gleixner, Rik van Riel, Andrew Morton, Davidlohr Bueso, Peter Anvin, Andi Kleen, Chandramouleeswaran, Aswin, Norton, Scott J, Jason Low On Mon, Jun 2, 2014 at 1:05 PM, Peter Zijlstra <peterz@infradead.org> wrote: > > So the question is, do you prefer subtly broken code or hard compile > fails? Me, I go for the compile fail. The thing is, parisc has a perfectly fine "cmpxchg" implementation in practice, and ACCESS_ONCE() and friends work fine too for reading. What the "use a spinlock" approach cannot generally do is: - ACCESS_ONCE() to _write_ things doesn't work well. You really should use "atomic_set()". - you may not necessarily be able to mix partial updates (ie differently sized updates to the same thing) depending on just how the spinlock hashing works but both of those are really rare issues and don't affect normal code. I would not necessarily be opposed to splitting up ACCESS_ONCE() for reading and for writing, and maybe we could do something special for the writing path (which tends to be less ctitical). It's really mixing "ACCESS_ONCE(x)" to _set_ a value, together with atomic ops to update it, that ends up being problematic. Maybe there are other issues I can't think of right now. But basically, parisc _can_ do cmpxchg, it's just that the code needs to be somewhat sanitized. Side note: some of the RCU code uses "ACCESS_ONCE()" for read-modify-write code, which is just f*cking crazy. The semantics are dubious, and it generally makes gcc create bad code too. Linus ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH v2] introduce atomic_pointer to fix a race condition in cancelable mcs spinlocks 2014-06-02 20:22 ` Linus Torvalds @ 2014-06-02 21:02 ` Paul E. McKenney 2014-06-02 21:12 ` Linus Torvalds 2014-06-03 7:54 ` Peter Zijlstra 1 sibling, 1 reply; 61+ messages in thread From: Paul E. McKenney @ 2014-06-02 21:02 UTC (permalink / raw) To: Linus Torvalds Cc: Peter Zijlstra, Waiman Long, Mikulas Patocka, James E.J. Bottomley, Helge Deller, John David Anglin, Parisc List, Linux Kernel Mailing List, Vinod, Chegu, Thomas Gleixner, Rik van Riel, Andrew Morton, Davidlohr Bueso, Peter Anvin, Andi Kleen, Chandramouleeswaran, Aswin, Norton, Scott J, Jason Low On Mon, Jun 02, 2014 at 01:22:10PM -0700, Linus Torvalds wrote: > On Mon, Jun 2, 2014 at 1:05 PM, Peter Zijlstra <peterz@infradead.org> wrote: > > > > So the question is, do you prefer subtly broken code or hard compile > > fails? Me, I go for the compile fail. > > The thing is, parisc has a perfectly fine "cmpxchg" implementation in > practice, and ACCESS_ONCE() and friends work fine too for reading. > > What the "use a spinlock" approach cannot generally do is: > > - ACCESS_ONCE() to _write_ things doesn't work well. You really > should use "atomic_set()". > > - you may not necessarily be able to mix partial updates (ie > differently sized updates to the same thing) depending on just how the > spinlock hashing works > > but both of those are really rare issues and don't affect normal code. > > I would not necessarily be opposed to splitting up ACCESS_ONCE() for > reading and for writing, and maybe we could do something special for > the writing path (which tends to be less ctitical). It's really mixing > "ACCESS_ONCE(x)" to _set_ a value, together with atomic ops to update > it, that ends up being problematic. Knowing what I know now about how ACCESS_ONCE() is used, I would have split it for reading and writing to begin with. Where is that time machine when you need it? ;-) > Maybe there are other issues I can't think of right now. But > basically, parisc _can_ do cmpxchg, it's just that the code needs to > be somewhat sanitized. > > Side note: some of the RCU code uses "ACCESS_ONCE()" for > read-modify-write code, which is just f*cking crazy. The semantics are > dubious, and it generally makes gcc create bad code too. A couple of the places are admittedly overkill, for example the pair of: ACCESS_ONCE(rsp->n_force_qs_lh)++; which is just for debug statistics in any case. I could put these back to "rsp->n_force_qs_lh++;" without problems, if desired. (Yeah, I know, I am overly paranoid.) However, these cases need a bit more care: ACCESS_ONCE(rdp->qlen)++; ACCESS_ONCE(rsp->n_barrier_done)++; ACCESS_ONCE(sync_rcu_preempt_exp_count)++; In the ->qlen case, interrupts are disabled and the current CPU is the only one who can write, so the read need not be volatile. In the ->n_barrier_done, modifications are done holding ->barrier_mutex, so again the read need not be volatile. In the sync_rcu_preempt_exp_count case, modifications are done holding sync_rcu_preempt_exp_mutex, so once again, the read need not be volatile. So I could do something like: ACCESS_ONCE(rdp->qlen) = rdp->qlen + 1; But that still makes gcc generate bad code. The reason I was not all that worried about this is that these are not in fastpaths, and the last two are especially not in fastpaths. Suggestions? Thanx, Paul ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH v2] introduce atomic_pointer to fix a race condition in cancelable mcs spinlocks 2014-06-02 21:02 ` Paul E. McKenney @ 2014-06-02 21:12 ` Linus Torvalds 2014-06-02 22:08 ` Paul E. McKenney 0 siblings, 1 reply; 61+ messages in thread From: Linus Torvalds @ 2014-06-02 21:12 UTC (permalink / raw) To: Paul McKenney Cc: Peter Zijlstra, Waiman Long, Mikulas Patocka, James E.J. Bottomley, Helge Deller, John David Anglin, Parisc List, Linux Kernel Mailing List, Vinod, Chegu, Thomas Gleixner, Rik van Riel, Andrew Morton, Davidlohr Bueso, Peter Anvin, Andi Kleen, Chandramouleeswaran, Aswin, Norton, Scott J, Jason Low On Mon, Jun 2, 2014 at 2:02 PM, Paul E. McKenney <paulmck@linux.vnet.ibm.com> wrote: > > In the ->qlen case, interrupts are disabled and the current CPU is > the only one who can write, so the read need not be volatile. In the > ->n_barrier_done, modifications are done holding ->barrier_mutex, so again > the read need not be volatile. In the sync_rcu_preempt_exp_count case, > modifications are done holding sync_rcu_preempt_exp_mutex, so once again, > the read need not be volatile. So I could do something like: > > ACCESS_ONCE(rdp->qlen) = rdp->qlen + 1; > > But that still makes gcc generate bad code > > The reason I was not all that worried about this is that these are not > in fastpaths, and the last two are especially not in fastpaths. > > Suggestions? So I think it probably *works*, but even so splitting it up to use ACCESS_ONCE() on just the write is probably a better option, if only because it would then make it much easier to change if we do end up splitting reads and writes. Because from a gcc code generation standpoint, using "volatile" will always be horrible, because gcc will never be able to turn it into a read-modify-write cycle. Arguable gcc _should_ be able to do that (it is certainly allowable within the virtual machine definition), but I understand why it doesn't ("volatile? Let's not optimize anything at all, because it's special"). So "ACCESS_ONCE() + R-M-W" operation is actually pretty much guaranteed to be "ACCESS_TWICE()", which may well be ok (performance may not matter, and even when it does most architectures don't actually have r-m-w instructions and when they do they aren't always even faster), but I think it's just horribly horribly bad from a conceptual and readability standpoint because it's so misleading. So I'd actually rather see two explicit ACCESS_ONCE() calls - once to read, once to write. Because that at least describes what is happening, unlike the current situation. Put another way: I can understand why you do it, and I can even agree that it is "correct" from a functionality standpoint. But even despite that all, I really don't like the construct very much.. Linus ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH v2] introduce atomic_pointer to fix a race condition in cancelable mcs spinlocks 2014-06-02 21:12 ` Linus Torvalds @ 2014-06-02 22:08 ` Paul E. McKenney 2014-06-02 22:44 ` Eric Dumazet 2014-06-02 22:55 ` Linus Torvalds 0 siblings, 2 replies; 61+ messages in thread From: Paul E. McKenney @ 2014-06-02 22:08 UTC (permalink / raw) To: Linus Torvalds Cc: Peter Zijlstra, Waiman Long, Mikulas Patocka, James E.J. Bottomley, Helge Deller, John David Anglin, Parisc List, Linux Kernel Mailing List, Vinod, Chegu, Thomas Gleixner, Rik van Riel, Andrew Morton, Davidlohr Bueso, Peter Anvin, Andi Kleen, Chandramouleeswaran, Aswin, Norton, Scott J, Jason Low On Mon, Jun 02, 2014 at 02:12:30PM -0700, Linus Torvalds wrote: > On Mon, Jun 2, 2014 at 2:02 PM, Paul E. McKenney > <paulmck@linux.vnet.ibm.com> wrote: > > > > In the ->qlen case, interrupts are disabled and the current CPU is > > the only one who can write, so the read need not be volatile. In the > > ->n_barrier_done, modifications are done holding ->barrier_mutex, so again > > the read need not be volatile. In the sync_rcu_preempt_exp_count case, > > modifications are done holding sync_rcu_preempt_exp_mutex, so once again, > > the read need not be volatile. So I could do something like: > > > > ACCESS_ONCE(rdp->qlen) = rdp->qlen + 1; > > > > But that still makes gcc generate bad code > > > > The reason I was not all that worried about this is that these are not > > in fastpaths, and the last two are especially not in fastpaths. > > > > Suggestions? > > So I think it probably *works*, but even so splitting it up to use > ACCESS_ONCE() on just the write is probably a better option, if only > because it would then make it much easier to change if we do end up > splitting reads and writes. > > Because from a gcc code generation standpoint, using "volatile" will > always be horrible, because gcc will never be able to turn it into a > read-modify-write cycle. Arguable gcc _should_ be able to do that (it > is certainly allowable within the virtual machine definition), but I > understand why it doesn't ("volatile? Let's not optimize anything at > all, because it's special"). > > So "ACCESS_ONCE() + R-M-W" operation is actually pretty much > guaranteed to be "ACCESS_TWICE()", which may well be ok (performance > may not matter, and even when it does most architectures don't > actually have r-m-w instructions and when they do they aren't always > even faster), but I think it's just horribly horribly bad from a > conceptual and readability standpoint because it's so misleading. > > So I'd actually rather see two explicit ACCESS_ONCE() calls - once to > read, once to write. Because that at least describes what is > happening, unlike the current situation. > > Put another way: I can understand why you do it, and I can even agree > that it is "correct" from a functionality standpoint. But even despite > that all, I really don't like the construct very much.. OK, I have queued the following commit for 3.17. Is this what you had in mind? Thanx, Paul ------------------------------------------------------------------------ rcu: Eliminate read-modify-write ACCESS_ONCE() calls RCU contains code of the following forms: ACCESS_ONCE(x)++; ACCESS_ONCE(x) += y; ACCESS_ONCE(x) -= y; Now these constructs do operate correctly, but they really result in a pair of volatile accesses, one to do the load and another to do the store. This can be confusing, as the casual reader might well assume that (for example) gcc might generate a memory-to-memory add instruction for each of these three cases. In fact, gcc will do no such thing. Also, there is a good chance that the kernel will move to separate load and store variants of ACCESS_ONCE(), and constructs like the above could easily confuse both people and scripts attempting to make that sort of change. Finally, most of RCU's read-modify-write uses of ACCESS_ONCE() really only need the store to be volatile, so that the read-modify-write form might be misleading. This commit therefore changes the above forms in RCU so that each instance of ACCESS_ONCE() either does a load or a store, but not both. In a few cases, ACCESS_ONCE() was not critical, for example, for maintaining statisitics. In these cases, ACCESS_ONCE() has been dispensed with entirely Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> diff --git a/kernel/rcu/srcu.c b/kernel/rcu/srcu.c index c639556f3fa0..c0120279dead 100644 --- a/kernel/rcu/srcu.c +++ b/kernel/rcu/srcu.c @@ -295,12 +295,15 @@ EXPORT_SYMBOL_GPL(cleanup_srcu_struct); int __srcu_read_lock(struct srcu_struct *sp) { int idx; + unsigned long *lp; idx = ACCESS_ONCE(sp->completed) & 0x1; preempt_disable(); - ACCESS_ONCE(this_cpu_ptr(sp->per_cpu_ref)->c[idx]) += 1; + lp = this_cpu_ptr(&sp->per_cpu_ref->c[idx]); + ACCESS_ONCE(*lp) = *lp + 1; smp_mb(); /* B */ /* Avoid leaking the critical section. */ - ACCESS_ONCE(this_cpu_ptr(sp->per_cpu_ref)->seq[idx]) += 1; + lp = this_cpu_ptr(&sp->per_cpu_ref->seq[idx]); + ACCESS_ONCE(*lp) = *lp + 1; preempt_enable(); return idx; } diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c index d1c8e4a85b92..f0ed867070cd 100644 --- a/kernel/rcu/tree.c +++ b/kernel/rcu/tree.c @@ -2275,7 +2275,7 @@ static void rcu_do_batch(struct rcu_state *rsp, struct rcu_data *rdp) } smp_mb(); /* List handling before counting for rcu_barrier(). */ rdp->qlen_lazy -= count_lazy; - ACCESS_ONCE(rdp->qlen) -= count; + ACCESS_ONCE(rdp->qlen) = rdp->qlen - count; rdp->n_cbs_invoked += count; /* Reinstate batch limit if we have worked down the excess. */ @@ -2420,7 +2420,7 @@ static void force_quiescent_state(struct rcu_state *rsp) if (rnp_old != NULL) raw_spin_unlock(&rnp_old->fqslock); if (ret) { - ACCESS_ONCE(rsp->n_force_qs_lh)++; + rsp->n_force_qs_lh++; return; } rnp_old = rnp; @@ -2432,7 +2432,7 @@ static void force_quiescent_state(struct rcu_state *rsp) smp_mb__after_unlock_lock(); raw_spin_unlock(&rnp_old->fqslock); if (ACCESS_ONCE(rsp->gp_flags) & RCU_GP_FLAG_FQS) { - ACCESS_ONCE(rsp->n_force_qs_lh)++; + rsp->n_force_qs_lh++; raw_spin_unlock_irqrestore(&rnp_old->lock, flags); return; /* Someone beat us to it. */ } @@ -2621,7 +2621,7 @@ __call_rcu(struct rcu_head *head, void (*func)(struct rcu_head *rcu), local_irq_restore(flags); return; } - ACCESS_ONCE(rdp->qlen)++; + ACCESS_ONCE(rdp->qlen) = rdp->qlen + 1; if (lazy) rdp->qlen_lazy++; else @@ -3185,7 +3185,7 @@ static void _rcu_barrier(struct rcu_state *rsp) * ACCESS_ONCE() to prevent the compiler from speculating * the increment to precede the early-exit check. */ - ACCESS_ONCE(rsp->n_barrier_done)++; + ACCESS_ONCE(rsp->n_barrier_done) = rsp->n_barrier_done + 1; WARN_ON_ONCE((rsp->n_barrier_done & 0x1) != 1); _rcu_barrier_trace(rsp, "Inc1", -1, rsp->n_barrier_done); smp_mb(); /* Order ->n_barrier_done increment with below mechanism. */ @@ -3235,7 +3235,7 @@ static void _rcu_barrier(struct rcu_state *rsp) /* Increment ->n_barrier_done to prevent duplicate work. */ smp_mb(); /* Keep increment after above mechanism. */ - ACCESS_ONCE(rsp->n_barrier_done)++; + ACCESS_ONCE(rsp->n_barrier_done) = rsp->n_barrier_done + 1; WARN_ON_ONCE((rsp->n_barrier_done & 0x1) != 0); _rcu_barrier_trace(rsp, "Inc2", -1, rsp->n_barrier_done); smp_mb(); /* Keep increment before caller's subsequent code. */ diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h index aee1e924b048..7ce734040a5e 100644 --- a/kernel/rcu/tree_plugin.h +++ b/kernel/rcu/tree_plugin.h @@ -2274,8 +2274,8 @@ static int rcu_nocb_kthread(void *arg) tail = xchg(&rdp->nocb_tail, &rdp->nocb_head); c = atomic_long_xchg(&rdp->nocb_q_count, 0); cl = atomic_long_xchg(&rdp->nocb_q_count_lazy, 0); - ACCESS_ONCE(rdp->nocb_p_count) += c; - ACCESS_ONCE(rdp->nocb_p_count_lazy) += cl; + rdp->nocb_p_count += c; + rdp->nocb_p_count_lazy += cl; rcu_nocb_wait_gp(rdp); /* Each pass through the following loop invokes a callback. */ ^ permalink raw reply related [flat|nested] 61+ messages in thread
* Re: [PATCH v2] introduce atomic_pointer to fix a race condition in cancelable mcs spinlocks 2014-06-02 22:08 ` Paul E. McKenney @ 2014-06-02 22:44 ` Eric Dumazet 2014-06-02 23:17 ` Paul E. McKenney 2014-06-02 22:55 ` Linus Torvalds 1 sibling, 1 reply; 61+ messages in thread From: Eric Dumazet @ 2014-06-02 22:44 UTC (permalink / raw) To: paulmck Cc: Linus Torvalds, Peter Zijlstra, Waiman Long, Mikulas Patocka, James E.J. Bottomley, Helge Deller, John David Anglin, Parisc List, Linux Kernel Mailing List, Vinod, Chegu, Thomas Gleixner, Rik van Riel, Andrew Morton, Davidlohr Bueso, Peter Anvin, Andi Kleen, Chandramouleeswaran, Aswin, Norton, Scott J, Jason Low On Mon, 2014-06-02 at 15:08 -0700, Paul E. McKenney wrote: > diff --git a/kernel/rcu/srcu.c b/kernel/rcu/srcu.c > index c639556f3fa0..c0120279dead 100644 > --- a/kernel/rcu/srcu.c > +++ b/kernel/rcu/srcu.c > @@ -295,12 +295,15 @@ EXPORT_SYMBOL_GPL(cleanup_srcu_struct); > int __srcu_read_lock(struct srcu_struct *sp) > { > int idx; > + unsigned long *lp; > > idx = ACCESS_ONCE(sp->completed) & 0x1; > preempt_disable(); > - ACCESS_ONCE(this_cpu_ptr(sp->per_cpu_ref)->c[idx]) += 1; > + lp = this_cpu_ptr(&sp->per_cpu_ref->c[idx]); > + ACCESS_ONCE(*lp) = *lp + 1; > smp_mb(); /* B */ /* Avoid leaking the critical section. */ > - ACCESS_ONCE(this_cpu_ptr(sp->per_cpu_ref)->seq[idx]) += 1; > + lp = this_cpu_ptr(&sp->per_cpu_ref->seq[idx]); > + ACCESS_ONCE(*lp) = *lp + 1; > preempt_enable(); > return idx; > This probably could use the following diff --git a/kernel/rcu/srcu.c b/kernel/rcu/srcu.c index c639556f3fa0..3a97eb6f9076 100644 --- a/kernel/rcu/srcu.c +++ b/kernel/rcu/srcu.c @@ -298,9 +298,9 @@ int __srcu_read_lock(struct srcu_struct *sp) idx = ACCESS_ONCE(sp->completed) & 0x1; preempt_disable(); - ACCESS_ONCE(this_cpu_ptr(sp->per_cpu_ref)->c[idx]) += 1; + this_cpu_inc(sp->per_cpu_ref->c[idx]); smp_mb(); /* B */ /* Avoid leaking the critical section. */ - ACCESS_ONCE(this_cpu_ptr(sp->per_cpu_ref)->seq[idx]) += 1; + this_cpu_inc(sp->per_cpu_ref->seq[idx]); preempt_enable(); return idx; } ^ permalink raw reply related [flat|nested] 61+ messages in thread
* Re: [PATCH v2] introduce atomic_pointer to fix a race condition in cancelable mcs spinlocks 2014-06-02 22:44 ` Eric Dumazet @ 2014-06-02 23:17 ` Paul E. McKenney 2014-06-02 23:53 ` Eric Dumazet 0 siblings, 1 reply; 61+ messages in thread From: Paul E. McKenney @ 2014-06-02 23:17 UTC (permalink / raw) To: Eric Dumazet Cc: Linus Torvalds, Peter Zijlstra, Waiman Long, Mikulas Patocka, James E.J. Bottomley, Helge Deller, John David Anglin, Parisc List, Linux Kernel Mailing List, Vinod, Chegu, Thomas Gleixner, Rik van Riel, Andrew Morton, Davidlohr Bueso, Peter Anvin, Andi Kleen, Chandramouleeswaran, Aswin, Norton, Scott J, Jason Low On Mon, Jun 02, 2014 at 03:44:48PM -0700, Eric Dumazet wrote: > On Mon, 2014-06-02 at 15:08 -0700, Paul E. McKenney wrote: > > > diff --git a/kernel/rcu/srcu.c b/kernel/rcu/srcu.c > > index c639556f3fa0..c0120279dead 100644 > > --- a/kernel/rcu/srcu.c > > +++ b/kernel/rcu/srcu.c > > @@ -295,12 +295,15 @@ EXPORT_SYMBOL_GPL(cleanup_srcu_struct); > > int __srcu_read_lock(struct srcu_struct *sp) > > { > > int idx; > > + unsigned long *lp; > > > > idx = ACCESS_ONCE(sp->completed) & 0x1; > > preempt_disable(); > > - ACCESS_ONCE(this_cpu_ptr(sp->per_cpu_ref)->c[idx]) += 1; > > + lp = this_cpu_ptr(&sp->per_cpu_ref->c[idx]); > > + ACCESS_ONCE(*lp) = *lp + 1; > > smp_mb(); /* B */ /* Avoid leaking the critical section. */ > > - ACCESS_ONCE(this_cpu_ptr(sp->per_cpu_ref)->seq[idx]) += 1; > > + lp = this_cpu_ptr(&sp->per_cpu_ref->seq[idx]); > > + ACCESS_ONCE(*lp) = *lp + 1; > > preempt_enable(); > > return idx; > > > > This probably could use the following > > diff --git a/kernel/rcu/srcu.c b/kernel/rcu/srcu.c > index c639556f3fa0..3a97eb6f9076 100644 > --- a/kernel/rcu/srcu.c > +++ b/kernel/rcu/srcu.c > @@ -298,9 +298,9 @@ int __srcu_read_lock(struct srcu_struct *sp) > > idx = ACCESS_ONCE(sp->completed) & 0x1; > preempt_disable(); > - ACCESS_ONCE(this_cpu_ptr(sp->per_cpu_ref)->c[idx]) += 1; > + this_cpu_inc(sp->per_cpu_ref->c[idx]); > smp_mb(); /* B */ /* Avoid leaking the critical section. */ > - ACCESS_ONCE(this_cpu_ptr(sp->per_cpu_ref)->seq[idx]) += 1; > + this_cpu_inc(sp->per_cpu_ref->seq[idx]); > preempt_enable(); > return idx; > } Good point! But given that I already have preemption disabled and given that __srcu_read_lock() is not to be used by irq handlers, I should be able to use __this_cpu_inc(), correct? Just to avoid unnecessary irq disabling on non-x86 platforms... Seems to pass a quick build, so trying a bit heavier testing. Thanx, Paul ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH v2] introduce atomic_pointer to fix a race condition in cancelable mcs spinlocks 2014-06-02 23:17 ` Paul E. McKenney @ 2014-06-02 23:53 ` Eric Dumazet 2014-06-03 0:28 ` Paul E. McKenney 0 siblings, 1 reply; 61+ messages in thread From: Eric Dumazet @ 2014-06-02 23:53 UTC (permalink / raw) To: paulmck Cc: Linus Torvalds, Peter Zijlstra, Waiman Long, Mikulas Patocka, James E.J. Bottomley, Helge Deller, John David Anglin, Parisc List, Linux Kernel Mailing List, Vinod, Chegu, Thomas Gleixner, Rik van Riel, Andrew Morton, Davidlohr Bueso, Peter Anvin, Andi Kleen, Chandramouleeswaran, Aswin, Norton, Scott J, Jason Low On Mon, 2014-06-02 at 16:17 -0700, Paul E. McKenney wrote: > But given that I already have preemption disabled and given that > __srcu_read_lock() is not to be used by irq handlers, I should be able to > use __this_cpu_inc(), correct? Just to avoid unnecessary irq disabling > on non-x86 platforms... Absolutely, __this_cpu_inc() is OK here. ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH v2] introduce atomic_pointer to fix a race condition in cancelable mcs spinlocks 2014-06-02 23:53 ` Eric Dumazet @ 2014-06-03 0:28 ` Paul E. McKenney 0 siblings, 0 replies; 61+ messages in thread From: Paul E. McKenney @ 2014-06-03 0:28 UTC (permalink / raw) To: Eric Dumazet Cc: Linus Torvalds, Peter Zijlstra, Waiman Long, Mikulas Patocka, James E.J. Bottomley, Helge Deller, John David Anglin, Parisc List, Linux Kernel Mailing List, Vinod, Chegu, Thomas Gleixner, Rik van Riel, Andrew Morton, Davidlohr Bueso, Peter Anvin, Andi Kleen, Chandramouleeswaran, Aswin, Norton, Scott J, Jason Low On Mon, Jun 02, 2014 at 04:53:44PM -0700, Eric Dumazet wrote: > On Mon, 2014-06-02 at 16:17 -0700, Paul E. McKenney wrote: > > > But given that I already have preemption disabled and given that > > __srcu_read_lock() is not to be used by irq handlers, I should be able to > > use __this_cpu_inc(), correct? Just to avoid unnecessary irq disabling > > on non-x86 platforms... > > Absolutely, __this_cpu_inc() is OK here. Cool, giving it a test... Thanx, Paul ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH v2] introduce atomic_pointer to fix a race condition in cancelable mcs spinlocks 2014-06-02 22:08 ` Paul E. McKenney 2014-06-02 22:44 ` Eric Dumazet @ 2014-06-02 22:55 ` Linus Torvalds 2014-06-03 16:48 ` Paul E. McKenney 1 sibling, 1 reply; 61+ messages in thread From: Linus Torvalds @ 2014-06-02 22:55 UTC (permalink / raw) To: Paul McKenney Cc: Peter Zijlstra, Waiman Long, Mikulas Patocka, James E.J. Bottomley, Helge Deller, John David Anglin, Parisc List, Linux Kernel Mailing List, Vinod, Chegu, Thomas Gleixner, Rik van Riel, Andrew Morton, Davidlohr Bueso, Peter Anvin, Andi Kleen, Chandramouleeswaran, Aswin, Norton, Scott J, Jason Low On Mon, Jun 2, 2014 at 3:08 PM, Paul E. McKenney <paulmck@linux.vnet.ibm.com> wrote: > > rcu: Eliminate read-modify-write ACCESS_ONCE() calls > > preempt_disable(); > - ACCESS_ONCE(this_cpu_ptr(sp->per_cpu_ref)->c[idx]) += 1; > + lp = this_cpu_ptr(&sp->per_cpu_ref->c[idx]); > + ACCESS_ONCE(*lp) = *lp + 1; > smp_mb(); /* B */ /* Avoid leaking the critical section. */ > - ACCESS_ONCE(this_cpu_ptr(sp->per_cpu_ref)->seq[idx]) += 1; > + lp = this_cpu_ptr(&sp->per_cpu_ref->seq[idx]); > + ACCESS_ONCE(*lp) = *lp + 1; > preempt_enable(); > return idx; What Eric said. This should just use "this_cpu_inc()" instead. Particularly with the smp_mb() and the preempt_enable(), there's no way that could/should leak, and the ACCESS_ONCE() seems pointless and ugly. And the good news is, gcc _will_ generate good code for that. Linus ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH v2] introduce atomic_pointer to fix a race condition in cancelable mcs spinlocks 2014-06-02 22:55 ` Linus Torvalds @ 2014-06-03 16:48 ` Paul E. McKenney 0 siblings, 0 replies; 61+ messages in thread From: Paul E. McKenney @ 2014-06-03 16:48 UTC (permalink / raw) To: Linus Torvalds Cc: Peter Zijlstra, Waiman Long, Mikulas Patocka, James E.J. Bottomley, Helge Deller, John David Anglin, Parisc List, Linux Kernel Mailing List, Vinod, Chegu, Thomas Gleixner, Rik van Riel, Andrew Morton, Davidlohr Bueso, Peter Anvin, Andi Kleen, Chandramouleeswaran, Aswin, Norton, Scott J, Jason Low On Mon, Jun 02, 2014 at 03:55:57PM -0700, Linus Torvalds wrote: > On Mon, Jun 2, 2014 at 3:08 PM, Paul E. McKenney > <paulmck@linux.vnet.ibm.com> wrote: > > > > rcu: Eliminate read-modify-write ACCESS_ONCE() calls > > > > preempt_disable(); > > - ACCESS_ONCE(this_cpu_ptr(sp->per_cpu_ref)->c[idx]) += 1; > > + lp = this_cpu_ptr(&sp->per_cpu_ref->c[idx]); > > + ACCESS_ONCE(*lp) = *lp + 1; > > smp_mb(); /* B */ /* Avoid leaking the critical section. */ > > - ACCESS_ONCE(this_cpu_ptr(sp->per_cpu_ref)->seq[idx]) += 1; > > + lp = this_cpu_ptr(&sp->per_cpu_ref->seq[idx]); > > + ACCESS_ONCE(*lp) = *lp + 1; > > preempt_enable(); > > return idx; > > What Eric said. This should just use "this_cpu_inc()" instead. > Particularly with the smp_mb() and the preempt_enable(), there's no > way that could/should leak, and the ACCESS_ONCE() seems pointless and > ugly. > > And the good news is, gcc _will_ generate good code for that. And here is the update, which passes light rcutorture testing. Thanx, Paul ------------------------------------------------------------------------ rcu: Eliminate read-modify-write ACCESS_ONCE() calls RCU contains code of the following forms: ACCESS_ONCE(x)++; ACCESS_ONCE(x) += y; ACCESS_ONCE(x) -= y; Now these constructs do operate correctly, but they really result in a pair of volatile accesses, one to do the load and another to do the store. This can be confusing, as the casual reader might well assume that (for example) gcc might generate a memory-to-memory add instruction for each of these three cases. In fact, gcc will do no such thing. Also, there is a good chance that the kernel will move to separate load and store variants of ACCESS_ONCE(), and constructs like the above could easily confuse both people and scripts attempting to make that sort of change. Finally, most of RCU's read-modify-write uses of ACCESS_ONCE() really only need the store to be volatile, so that the read-modify-write form might be misleading. This commit therefore changes the above forms in RCU so that each instance of ACCESS_ONCE() either does a load or a store, but not both. In a few cases, ACCESS_ONCE() was not critical, for example, for maintaining statisitics. In these cases, ACCESS_ONCE() has been dispensed with entirely. Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> diff --git a/kernel/rcu/srcu.c b/kernel/rcu/srcu.c index c639556f3fa0..e037f3eb2f7b 100644 --- a/kernel/rcu/srcu.c +++ b/kernel/rcu/srcu.c @@ -298,9 +298,9 @@ int __srcu_read_lock(struct srcu_struct *sp) idx = ACCESS_ONCE(sp->completed) & 0x1; preempt_disable(); - ACCESS_ONCE(this_cpu_ptr(sp->per_cpu_ref)->c[idx]) += 1; + __this_cpu_inc(sp->per_cpu_ref->c[idx]); smp_mb(); /* B */ /* Avoid leaking the critical section. */ - ACCESS_ONCE(this_cpu_ptr(sp->per_cpu_ref)->seq[idx]) += 1; + __this_cpu_inc(sp->per_cpu_ref->seq[idx]); preempt_enable(); return idx; } diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c index d1c8e4a85b92..f0ed867070cd 100644 --- a/kernel/rcu/tree.c +++ b/kernel/rcu/tree.c @@ -2275,7 +2275,7 @@ static void rcu_do_batch(struct rcu_state *rsp, struct rcu_data *rdp) } smp_mb(); /* List handling before counting for rcu_barrier(). */ rdp->qlen_lazy -= count_lazy; - ACCESS_ONCE(rdp->qlen) -= count; + ACCESS_ONCE(rdp->qlen) = rdp->qlen - count; rdp->n_cbs_invoked += count; /* Reinstate batch limit if we have worked down the excess. */ @@ -2420,7 +2420,7 @@ static void force_quiescent_state(struct rcu_state *rsp) if (rnp_old != NULL) raw_spin_unlock(&rnp_old->fqslock); if (ret) { - ACCESS_ONCE(rsp->n_force_qs_lh)++; + rsp->n_force_qs_lh++; return; } rnp_old = rnp; @@ -2432,7 +2432,7 @@ static void force_quiescent_state(struct rcu_state *rsp) smp_mb__after_unlock_lock(); raw_spin_unlock(&rnp_old->fqslock); if (ACCESS_ONCE(rsp->gp_flags) & RCU_GP_FLAG_FQS) { - ACCESS_ONCE(rsp->n_force_qs_lh)++; + rsp->n_force_qs_lh++; raw_spin_unlock_irqrestore(&rnp_old->lock, flags); return; /* Someone beat us to it. */ } @@ -2621,7 +2621,7 @@ __call_rcu(struct rcu_head *head, void (*func)(struct rcu_head *rcu), local_irq_restore(flags); return; } - ACCESS_ONCE(rdp->qlen)++; + ACCESS_ONCE(rdp->qlen) = rdp->qlen + 1; if (lazy) rdp->qlen_lazy++; else @@ -3185,7 +3185,7 @@ static void _rcu_barrier(struct rcu_state *rsp) * ACCESS_ONCE() to prevent the compiler from speculating * the increment to precede the early-exit check. */ - ACCESS_ONCE(rsp->n_barrier_done)++; + ACCESS_ONCE(rsp->n_barrier_done) = rsp->n_barrier_done + 1; WARN_ON_ONCE((rsp->n_barrier_done & 0x1) != 1); _rcu_barrier_trace(rsp, "Inc1", -1, rsp->n_barrier_done); smp_mb(); /* Order ->n_barrier_done increment with below mechanism. */ @@ -3235,7 +3235,7 @@ static void _rcu_barrier(struct rcu_state *rsp) /* Increment ->n_barrier_done to prevent duplicate work. */ smp_mb(); /* Keep increment after above mechanism. */ - ACCESS_ONCE(rsp->n_barrier_done)++; + ACCESS_ONCE(rsp->n_barrier_done) = rsp->n_barrier_done + 1; WARN_ON_ONCE((rsp->n_barrier_done & 0x1) != 0); _rcu_barrier_trace(rsp, "Inc2", -1, rsp->n_barrier_done); smp_mb(); /* Keep increment before caller's subsequent code. */ diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h index aee1e924b048..7ce734040a5e 100644 --- a/kernel/rcu/tree_plugin.h +++ b/kernel/rcu/tree_plugin.h @@ -2274,8 +2274,8 @@ static int rcu_nocb_kthread(void *arg) tail = xchg(&rdp->nocb_tail, &rdp->nocb_head); c = atomic_long_xchg(&rdp->nocb_q_count, 0); cl = atomic_long_xchg(&rdp->nocb_q_count_lazy, 0); - ACCESS_ONCE(rdp->nocb_p_count) += c; - ACCESS_ONCE(rdp->nocb_p_count_lazy) += cl; + rdp->nocb_p_count += c; + rdp->nocb_p_count_lazy += cl; rcu_nocb_wait_gp(rdp); /* Each pass through the following loop invokes a callback. */ ^ permalink raw reply related [flat|nested] 61+ messages in thread
* Re: [PATCH v2] introduce atomic_pointer to fix a race condition in cancelable mcs spinlocks 2014-06-02 20:22 ` Linus Torvalds 2014-06-02 21:02 ` Paul E. McKenney @ 2014-06-03 7:54 ` Peter Zijlstra 1 sibling, 0 replies; 61+ messages in thread From: Peter Zijlstra @ 2014-06-03 7:54 UTC (permalink / raw) To: Linus Torvalds Cc: Waiman Long, Mikulas Patocka, James E.J. Bottomley, Helge Deller, John David Anglin, Parisc List, Linux Kernel Mailing List, Paul McKenney, Vinod, Chegu, Thomas Gleixner, Rik van Riel, Andrew Morton, Davidlohr Bueso, Peter Anvin, Andi Kleen, Chandramouleeswaran, Aswin, Norton, Scott J, Jason Low [-- Attachment #1: Type: text/plain, Size: 1093 bytes --] On Mon, Jun 02, 2014 at 01:22:10PM -0700, Linus Torvalds wrote: > On Mon, Jun 2, 2014 at 1:05 PM, Peter Zijlstra <peterz@infradead.org> wrote: > > > > So the question is, do you prefer subtly broken code or hard compile > > fails? Me, I go for the compile fail. > > The thing is, parisc has a perfectly fine "cmpxchg" implementation in > practice, and ACCESS_ONCE() and friends work fine too for reading. > > What the "use a spinlock" approach cannot generally do is: > > - ACCESS_ONCE() to _write_ things doesn't work well. You really > should use "atomic_set()". > > - you may not necessarily be able to mix partial updates (ie > differently sized updates to the same thing) depending on just how the > spinlock hashing works > > but both of those are really rare issues and don't affect normal code. Agreed on the second, although that would be fairly easy to fix by masking out the lower few bits in the pointer address before hashing. The first, you're probably, right, but seeing how its a completely silent fail atm I'm not at all comfortable with it. [-- Attachment #2: Type: application/pgp-signature, Size: 836 bytes --] ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH v2] introduce atomic_pointer to fix a race condition in cancelable mcs spinlocks 2014-06-02 20:05 ` Peter Zijlstra 2014-06-02 20:22 ` Linus Torvalds @ 2014-06-02 20:24 ` James Bottomley 1 sibling, 0 replies; 61+ messages in thread From: James Bottomley @ 2014-06-02 20:24 UTC (permalink / raw) To: Peter Zijlstra Cc: Waiman Long, Mikulas Patocka, Linus Torvalds, jejb, deller, John David Anglin, linux-parisc, linux-kernel, paulmck, chegu_vinod, tglx, riel, akpm, davidlohr, hpa, andi, aswin, scott.norton, Jason Low On Mon, 2014-06-02 at 22:05 +0200, Peter Zijlstra wrote: > On Mon, Jun 02, 2014 at 01:33:34PM -0400, Waiman Long wrote: > > On 06/02/2014 12:30 PM, Peter Zijlstra wrote: > > >On Mon, Jun 02, 2014 at 06:25:25PM +0200, Peter Zijlstra wrote: > > >>I'm almost inclined to just exclude parisc from using opt spinning. > > >> > > >>That said, this patch still doesn't address the far more interesting > > >>problem of actually finding these issues for these few weird archs. > > >So why do these archs provide xchg() and cmpxchg() at all? Wouldn't it > > >be much simpler if archs that cannot sanely do this, not provide these > > >primitives at all? > > > > I believe xchg() and cmpxchg() are used in quite a number of places within > > the generic kernel code. So kernel compilation will fail if those APIs > > aren't provided by an architecture. > > Yep.. so this is going to be painful for a while. But given their > (parisc, sparc32, metag-lock1) constraints, who knows how many of those > uses are actually broken. > > So the question is, do you prefer subtly broken code or hard compile > fails? Me, I go for the compile fail. The failure is only when a variable that will have an atomic exchange done on it is updated by a simple operation. To do this properly, we'd probably need an update macro we could supply the locking to, and a way of marking the variable to get the compiler to cause a build error if it was ever updated improperly, but that's starting to look very similar to Mikulas' proposal. James ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH v2] introduce atomic_pointer to fix a race condition in cancelable mcs spinlocks 2014-06-02 16:25 ` Peter Zijlstra 2014-06-02 16:30 ` Peter Zijlstra @ 2014-06-02 16:43 ` Paul E. McKenney 2014-06-02 17:14 ` James Bottomley 2014-06-02 17:29 ` Waiman Long 2014-06-02 17:09 ` Linus Torvalds 2 siblings, 2 replies; 61+ messages in thread From: Paul E. McKenney @ 2014-06-02 16:43 UTC (permalink / raw) To: Peter Zijlstra Cc: Mikulas Patocka, Linus Torvalds, jejb, deller, John David Anglin, linux-parisc, linux-kernel, chegu_vinod, Waiman.Long, tglx, riel, akpm, davidlohr, hpa, andi, aswin, scott.norton, Jason Low On Mon, Jun 02, 2014 at 06:25:25PM +0200, Peter Zijlstra wrote: > On Mon, Jun 02, 2014 at 12:00:45PM -0400, Mikulas Patocka wrote: > > struct optimistic_spin_queue { > > - struct optimistic_spin_queue *next, *prev; > > + atomic_pointer(struct optimistic_spin_queue *) next; > > + struct optimistic_spin_queue *prev; > > int locked; /* 1 if lock acquired */ > > }; > > > > Index: linux-3.15-rc8/include/asm-generic/atomic-long.h > > =================================================================== > > --- linux-3.15-rc8.orig/include/asm-generic/atomic-long.h 2014-06-02 17:11:17.000000000 +0200 > > +++ linux-3.15-rc8/include/asm-generic/atomic-long.h 2014-06-02 17:11:50.000000000 +0200 > > @@ -255,4 +255,31 @@ static inline long atomic_long_add_unles > > > > #endif /* BITS_PER_LONG == 64 */ > > > > +#define atomic_pointer(type) \ > > +union { \ > > + atomic_long_t __a; \ > > + type __t; \ > > + char __check_sizeof[sizeof(type) == sizeof(long) ? 1 : -1]; \ > > +} > > That's still entirely disgusting, and afaict entirely redundant. You can > do that test in the operators below just fine. > > > +#define ATOMIC_POINTER_INIT(i) { .__t = (i) } > > + > > +#define atomic_pointer_read(v) ((typeof((v)->__t))atomic_long_read(&(v)->__a)) > > + > > +#define atomic_pointer_set(v, i) ({ \ > > + typeof((v)->__t) __i = (i); \ > > + atomic_long_set(&(v)->__a, (long)(__i)); \ > > +}) > > + > > +#define atomic_pointer_xchg(v, i) ({ \ > > + typeof((v)->__t) __i = (i); \ > > + (typeof((v)->__t))atomic_long_xchg(&(v)->__a, (long)(__i)); \ > > +}) > > + > > +#define atomic_pointer_cmpxchg(v, old, new) ({ \ > > + typeof((v)->__t) __old = (old); \ > > + typeof((v)->__t) __new = (new); \ > > + (typeof((v)->__t))atomic_long_cmpxchg(&(v)->__a, (long)(__old), (long)(__new));\ > > +}) > > And I can't say I'm a particular fan of these ops either, as alternative > I'm almost inclined to just exclude parisc from using opt spinning. That is an excellent point for this particular issue. Do parisc systems really support enough CPUs to make queued spinlocks worthwhile? If not, maybe we should just have parisc stick with traditional spinlocks. > That said, this patch still doesn't address the far more interesting > problem of actually finding these issues for these few weird archs. Indeed. And finding other lower-probability failures due to other atomic manipulations of pointers that are also accessed with normal loads and stores. Thanx, Paul ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH v2] introduce atomic_pointer to fix a race condition in cancelable mcs spinlocks 2014-06-02 16:43 ` Paul E. McKenney @ 2014-06-02 17:14 ` James Bottomley 2014-06-02 17:29 ` Waiman Long 1 sibling, 0 replies; 61+ messages in thread From: James Bottomley @ 2014-06-02 17:14 UTC (permalink / raw) To: paulmck Cc: Peter Zijlstra, Mikulas Patocka, Linus Torvalds, jejb, deller, John David Anglin, linux-parisc, linux-kernel, chegu_vinod, Waiman.Long, tglx, riel, akpm, davidlohr, hpa, andi, aswin, scott.norton, Jason Low On Mon, 2014-06-02 at 09:43 -0700, Paul E. McKenney wrote: > On Mon, Jun 02, 2014 at 06:25:25PM +0200, Peter Zijlstra wrote: > > On Mon, Jun 02, 2014 at 12:00:45PM -0400, Mikulas Patocka wrote: > > > struct optimistic_spin_queue { > > > - struct optimistic_spin_queue *next, *prev; > > > + atomic_pointer(struct optimistic_spin_queue *) next; > > > + struct optimistic_spin_queue *prev; > > > int locked; /* 1 if lock acquired */ > > > }; > > > > > > Index: linux-3.15-rc8/include/asm-generic/atomic-long.h > > > =================================================================== > > > --- linux-3.15-rc8.orig/include/asm-generic/atomic-long.h 2014-06-02 17:11:17.000000000 +0200 > > > +++ linux-3.15-rc8/include/asm-generic/atomic-long.h 2014-06-02 17:11:50.000000000 +0200 > > > @@ -255,4 +255,31 @@ static inline long atomic_long_add_unles > > > > > > #endif /* BITS_PER_LONG == 64 */ > > > > > > +#define atomic_pointer(type) \ > > > +union { \ > > > + atomic_long_t __a; \ > > > + type __t; \ > > > + char __check_sizeof[sizeof(type) == sizeof(long) ? 1 : -1]; \ > > > +} > > > > That's still entirely disgusting, and afaict entirely redundant. You can > > do that test in the operators below just fine. > > > > > +#define ATOMIC_POINTER_INIT(i) { .__t = (i) } > > > + > > > +#define atomic_pointer_read(v) ((typeof((v)->__t))atomic_long_read(&(v)->__a)) > > > + > > > +#define atomic_pointer_set(v, i) ({ \ > > > + typeof((v)->__t) __i = (i); \ > > > + atomic_long_set(&(v)->__a, (long)(__i)); \ > > > +}) > > > + > > > +#define atomic_pointer_xchg(v, i) ({ \ > > > + typeof((v)->__t) __i = (i); \ > > > + (typeof((v)->__t))atomic_long_xchg(&(v)->__a, (long)(__i)); \ > > > +}) > > > + > > > +#define atomic_pointer_cmpxchg(v, old, new) ({ \ > > > + typeof((v)->__t) __old = (old); \ > > > + typeof((v)->__t) __new = (new); \ > > > + (typeof((v)->__t))atomic_long_cmpxchg(&(v)->__a, (long)(__old), (long)(__new));\ > > > +}) > > > > And I can't say I'm a particular fan of these ops either, as alternative > > I'm almost inclined to just exclude parisc from using opt spinning. > > That is an excellent point for this particular issue. Do parisc systems > really support enough CPUs to make queued spinlocks worthwhile? If not, > maybe we should just have parisc stick with traditional spinlocks. Yes and No. No for Linux because the only hyper CPU system is the superdome, which we've never managed to boot linux on (it has some complexities in the Bus architecture) and we're not likely to try because the installations tend to cost north of US$1m. For the Server systems we do have a few high CPU count ones, but we lost access to them when HP dismantled the parisc linux lab. Currently the standard is about 4 cpus. I think just not using queued spinlocks is fine for us. Should anyone ever try the large CPU systems, we can revisit. James ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH v2] introduce atomic_pointer to fix a race condition in cancelable mcs spinlocks 2014-06-02 16:43 ` Paul E. McKenney 2014-06-02 17:14 ` James Bottomley @ 2014-06-02 17:29 ` Waiman Long 1 sibling, 0 replies; 61+ messages in thread From: Waiman Long @ 2014-06-02 17:29 UTC (permalink / raw) To: paulmck Cc: Peter Zijlstra, Mikulas Patocka, Linus Torvalds, jejb, deller, John David Anglin, linux-parisc, linux-kernel, chegu_vinod, tglx, riel, akpm, davidlohr, hpa, andi, aswin, scott.norton, Jason Low On 06/02/2014 12:43 PM, Paul E. McKenney wrote: > On Mon, Jun 02, 2014 at 06:25:25PM +0200, Peter Zijlstra wrote: >> On Mon, Jun 02, 2014 at 12:00:45PM -0400, Mikulas Patocka wrote: >>> struct optimistic_spin_queue { >>> - struct optimistic_spin_queue *next, *prev; >>> + atomic_pointer(struct optimistic_spin_queue *) next; >>> + struct optimistic_spin_queue *prev; >>> int locked; /* 1 if lock acquired */ >>> }; >>> >>> Index: linux-3.15-rc8/include/asm-generic/atomic-long.h >>> =================================================================== >>> --- linux-3.15-rc8.orig/include/asm-generic/atomic-long.h 2014-06-02 17:11:17.000000000 +0200 >>> +++ linux-3.15-rc8/include/asm-generic/atomic-long.h 2014-06-02 17:11:50.000000000 +0200 >>> @@ -255,4 +255,31 @@ static inline long atomic_long_add_unles >>> >>> #endif /* BITS_PER_LONG == 64 */ >>> >>> +#define atomic_pointer(type) \ >>> +union { \ >>> + atomic_long_t __a; \ >>> + type __t; \ >>> + char __check_sizeof[sizeof(type) == sizeof(long) ? 1 : -1]; \ >>> +} >> That's still entirely disgusting, and afaict entirely redundant. You can >> do that test in the operators below just fine. >> >>> +#define ATOMIC_POINTER_INIT(i) { .__t = (i) } >>> + >>> +#define atomic_pointer_read(v) ((typeof((v)->__t))atomic_long_read(&(v)->__a)) >>> + >>> +#define atomic_pointer_set(v, i) ({ \ >>> + typeof((v)->__t) __i = (i); \ >>> + atomic_long_set(&(v)->__a, (long)(__i)); \ >>> +}) >>> + >>> +#define atomic_pointer_xchg(v, i) ({ \ >>> + typeof((v)->__t) __i = (i); \ >>> + (typeof((v)->__t))atomic_long_xchg(&(v)->__a, (long)(__i)); \ >>> +}) >>> + >>> +#define atomic_pointer_cmpxchg(v, old, new) ({ \ >>> + typeof((v)->__t) __old = (old); \ >>> + typeof((v)->__t) __new = (new); \ >>> + (typeof((v)->__t))atomic_long_cmpxchg(&(v)->__a, (long)(__old), (long)(__new));\ >>> +}) >> And I can't say I'm a particular fan of these ops either, as alternative >> I'm almost inclined to just exclude parisc from using opt spinning. > That is an excellent point for this particular issue. Do parisc systems > really support enough CPUs to make queued spinlocks worthwhile? If not, > maybe we should just have parisc stick with traditional spinlocks. The operation in question is the optimistic spinning code of mutex which is currently active, I think, for all architectures. It is not related to the queued spinlock, though it will have the same problem. Yes, by disabling the MUTEX_SPIN_ON_OWNER config variable from PA-RISC, we can disable optimistic spinning. -Longman ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH v2] introduce atomic_pointer to fix a race condition in cancelable mcs spinlocks 2014-06-02 16:25 ` Peter Zijlstra 2014-06-02 16:30 ` Peter Zijlstra 2014-06-02 16:43 ` Paul E. McKenney @ 2014-06-02 17:09 ` Linus Torvalds 2014-06-02 17:12 ` Davidlohr Bueso ` (3 more replies) 2 siblings, 4 replies; 61+ messages in thread From: Linus Torvalds @ 2014-06-02 17:09 UTC (permalink / raw) To: Peter Zijlstra Cc: Mikulas Patocka, James E.J. Bottomley, Helge Deller, John David Anglin, Parisc List, Linux Kernel Mailing List, Paul McKenney, Vinod, Chegu, Waiman Long, Thomas Gleixner, Rik van Riel, Andrew Morton, Davidlohr Bueso, Peter Anvin, Andi Kleen, Chandramouleeswaran, Aswin, Norton, Scott J, Jason Low On Mon, Jun 2, 2014 at 9:25 AM, Peter Zijlstra <peterz@infradead.org> wrote: > > And I can't say I'm a particular fan of these ops either, as alternative > I'm almost inclined to just exclude parisc from using opt spinning. Please do. There is no way in hell that we should introduce a magic new atomic_pointer thing for parisc. And the idea somebody had to change ACCESS_ONCE() for PA-RISC (I'm not going to go back to find who to blame) is just horribly wrong too, since it's not even necessary for any normal use: the special "load-and-store-zero" instruction isn't actually used for "real" data, it's used only for the special spinlocks afaik, so doing it for all ACCESS_ONCE() users would be wrong even on PA-RISC. For any normal data, the usual "just load the value, making sure the compiler doesn't reload it" is perfectly fine - even on PA-RISC. Now, if PA-RISC was a major architecture, we'd have to figure this out. But as it is, PA-RISC is just about the shittiest RISC ever invented (with original sparc being a strong contender), and let's face it, nobody really uses it. It's a "fun project", but it is not something that we should use to mess up either ACCESS_ONCE() or the MCS locks. Just make PA-RISC use its own locks, not any of the new fancy ones. Linus ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH v2] introduce atomic_pointer to fix a race condition in cancelable mcs spinlocks 2014-06-02 17:09 ` Linus Torvalds @ 2014-06-02 17:12 ` Davidlohr Bueso 2014-06-02 17:42 ` Waiman Long 2014-06-02 20:46 ` Mikulas Patocka ` (2 subsequent siblings) 3 siblings, 1 reply; 61+ messages in thread From: Davidlohr Bueso @ 2014-06-02 17:12 UTC (permalink / raw) To: Linus Torvalds Cc: Peter Zijlstra, Mikulas Patocka, James E.J. Bottomley, Helge Deller, John David Anglin, Parisc List, Linux Kernel Mailing List, Paul McKenney, Vinod, Chegu, Waiman Long, Thomas Gleixner, Rik van Riel, Andrew Morton, Peter Anvin, Andi Kleen, Chandramouleeswaran, Aswin, Norton, Scott J, Jason Low On Mon, 2014-06-02 at 10:09 -0700, Linus Torvalds wrote: > On Mon, Jun 2, 2014 at 9:25 AM, Peter Zijlstra <peterz@infradead.org> wrote: > > > > And I can't say I'm a particular fan of these ops either, as alternative > > I'm almost inclined to just exclude parisc from using opt spinning. > > Please do. I agree, this is the best way out of this mess. Furthermore, it would also be nice to consolidate opt spinning in a common CONFIG option -- right now mutexes and rwsems create their own dependencies. ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH v2] introduce atomic_pointer to fix a race condition in cancelable mcs spinlocks 2014-06-02 17:12 ` Davidlohr Bueso @ 2014-06-02 17:42 ` Waiman Long 0 siblings, 0 replies; 61+ messages in thread From: Waiman Long @ 2014-06-02 17:42 UTC (permalink / raw) To: Davidlohr Bueso Cc: Linus Torvalds, Peter Zijlstra, Mikulas Patocka, James E.J. Bottomley, Helge Deller, John David Anglin, Parisc List, Linux Kernel Mailing List, Paul McKenney, Vinod, Chegu, Thomas Gleixner, Rik van Riel, Andrew Morton, Peter Anvin, Andi Kleen, Chandramouleeswaran, Aswin, Norton, Scott J, Jason Low On 06/02/2014 01:12 PM, Davidlohr Bueso wrote: > On Mon, 2014-06-02 at 10:09 -0700, Linus Torvalds wrote: >> On Mon, Jun 2, 2014 at 9:25 AM, Peter Zijlstra<peterz@infradead.org> wrote: >>> And I can't say I'm a particular fan of these ops either, as alternative >>> I'm almost inclined to just exclude parisc from using opt spinning. >> Please do. > I agree, this is the best way out of this mess. Furthermore, it would > also be nice to consolidate opt spinning in a common CONFIG option -- > right now mutexes and rwsems create their own dependencies. > I would suggest adding a RWSEM_SPIN_ON_OWNER to control opt spinning in rwsem. Currently MUTEX_SPIN_ON_OWNER is doing that for mutex, and it is disabled when mutex debugging is turned on. So I think it is better to allow them to be disabled separately. -Longman ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH v2] introduce atomic_pointer to fix a race condition in cancelable mcs spinlocks 2014-06-02 17:09 ` Linus Torvalds 2014-06-02 17:12 ` Davidlohr Bueso @ 2014-06-02 20:46 ` Mikulas Patocka 2014-06-02 20:53 ` Linus Torvalds 2014-06-03 7:20 ` Peter Zijlstra 2014-06-02 21:03 ` Paul E. McKenney 2014-06-06 15:06 ` Peter Zijlstra 3 siblings, 2 replies; 61+ messages in thread From: Mikulas Patocka @ 2014-06-02 20:46 UTC (permalink / raw) To: Linus Torvalds Cc: Peter Zijlstra, James E.J. Bottomley, Helge Deller, John David Anglin, Parisc List, Linux Kernel Mailing List, Paul McKenney, Vinod, Chegu, Waiman Long, Thomas Gleixner, Rik van Riel, Andrew Morton, Davidlohr Bueso, Peter Anvin, Andi Kleen, Chandramouleeswaran, Aswin, Norton, Scott J, Jason Low On Mon, 2 Jun 2014, Linus Torvalds wrote: > On Mon, Jun 2, 2014 at 9:25 AM, Peter Zijlstra <peterz@infradead.org> wrote: > > > > And I can't say I'm a particular fan of these ops either, as alternative > > I'm almost inclined to just exclude parisc from using opt spinning. > > Please do. > > There is no way in hell that we should introduce a magic new > atomic_pointer thing for parisc. And the idea somebody had to change > ACCESS_ONCE() for PA-RISC (I'm not going to go back to find who to > blame) is just horribly wrong too, since it's not even necessary for > any normal use: the special "load-and-store-zero" instruction isn't > actually used for "real" data, it's used only for the special > spinlocks afaik, so doing it for all ACCESS_ONCE() users would be > wrong even on PA-RISC. For any normal data, the usual "just load the > value, making sure the compiler doesn't reload it" is perfectly fine - > even on PA-RISC. > > Now, if PA-RISC was a major architecture, we'd have to figure this > out. But as it is, PA-RISC is just about the shittiest RISC ever > invented (with original sparc being a strong contender), and let's > face it, nobody really uses it. It's a "fun project", but it is not > something that we should use to mess up either ACCESS_ONCE() or the > MCS locks. > > Just make PA-RISC use its own locks, not any of the new fancy ones. > > Linus And what else do you want to do? Peter Zijlstra said "I've been using xchg() and cmpxchg() without such consideration for quite a while." - so it basically implies that the kernel is full of such races, mcs_spinlock is just the most visible one that crashes the kernel first. It's not only parisc - tile32, arc, metag (maybe hexagon) are broken too, because they don't have cmpxchg in hardware. We have atomic_t, atomic64_t, atomic_long_t that can be sanely used even on architectures without hardware cmpxchg - so I ask - why can't we have atomic_pointer_t with the same semantics? (pointer type conversion issues can be solved, as it is done in the PATCH v2) Mikulas ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH v2] introduce atomic_pointer to fix a race condition in cancelable mcs spinlocks 2014-06-02 20:46 ` Mikulas Patocka @ 2014-06-02 20:53 ` Linus Torvalds 2014-06-02 21:12 ` Mikulas Patocka 2014-06-03 7:20 ` Peter Zijlstra 1 sibling, 1 reply; 61+ messages in thread From: Linus Torvalds @ 2014-06-02 20:53 UTC (permalink / raw) To: Mikulas Patocka Cc: Peter Zijlstra, James E.J. Bottomley, Helge Deller, John David Anglin, Parisc List, Linux Kernel Mailing List, Paul McKenney, Vinod, Chegu, Waiman Long, Thomas Gleixner, Rik van Riel, Andrew Morton, Davidlohr Bueso, Peter Anvin, Andi Kleen, Chandramouleeswaran, Aswin, Norton, Scott J, Jason Low On Mon, Jun 2, 2014 at 1:46 PM, Mikulas Patocka <mpatocka@redhat.com> wrote: > > And what else do you want to do? > > Peter Zijlstra said "I've been using xchg() and cmpxchg() without such > consideration for quite a while." - so it basically implies that the > kernel is full of such races, mcs_spinlock is just the most visible one > that crashes the kernel first. .. so your whole argument is bogus, because it doesn't actually fix anything else. Now, something that *would* fix something else is (for example) to just make "ACCESS_ONCE()" a rvalue so that you cannot use it for assignments, and then trying to sort out what happens then. It's possible that the "atomic_pointer_t" would be a part of the solution to that "what happens then", but THERE IS NO WAY IN HELL we're adding it for just one architecture and one use that doesn't warrant even _existing_ on that architecture. See what I'm saying? You're not fixing the problem, you're fixing one unimportant detail that isn't worth fixing that way. Linus ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH v2] introduce atomic_pointer to fix a race condition in cancelable mcs spinlocks 2014-06-02 20:53 ` Linus Torvalds @ 2014-06-02 21:12 ` Mikulas Patocka 2014-06-03 7:36 ` Peter Zijlstra 0 siblings, 1 reply; 61+ messages in thread From: Mikulas Patocka @ 2014-06-02 21:12 UTC (permalink / raw) To: Linus Torvalds Cc: Peter Zijlstra, James E.J. Bottomley, Helge Deller, John David Anglin, Parisc List, Linux Kernel Mailing List, Paul McKenney, Vinod, Chegu, Waiman Long, Thomas Gleixner, Rik van Riel, Andrew Morton, Davidlohr Bueso, Peter Anvin, Andi Kleen, Chandramouleeswaran, Aswin, Norton, Scott J, Jason Low On Mon, 2 Jun 2014, Linus Torvalds wrote: > On Mon, Jun 2, 2014 at 1:46 PM, Mikulas Patocka <mpatocka@redhat.com> wrote: > > > > And what else do you want to do? > > > > Peter Zijlstra said "I've been using xchg() and cmpxchg() without such > > consideration for quite a while." - so it basically implies that the > > kernel is full of such races, mcs_spinlock is just the most visible one > > that crashes the kernel first. > > .. so your whole argument is bogus, because it doesn't actually fix > anything else. > > Now, something that *would* fix something else is (for example) to > just make "ACCESS_ONCE()" a rvalue so that you cannot use it for > assignments, and then trying to sort out what happens then. It's > possible that the "atomic_pointer_t" would be a part of the solution > to that "what happens then", but THERE IS NO WAY IN HELL we're adding > it for just one architecture and one use that doesn't warrant even > _existing_ on that architecture. The patch adds atomic_pointer_t for all architectures - it is in the common code and it is backed by atomic_long_t (that already exists for all architectures). There is no new arch-specific code at all. When we have atomic_pointer_t, we can find the instances of xchg() and cmpxchg() and convert them to atomic_pointer_t (or to other atomic*_t types). When we convert them all, we can drop xchg() and cmpxchg() at all (at least from architecture-neutral code). The problem with xchg() and cmpxchg() is that they are very easy to misuse. Peter Zijlstra didn't know that they are not atomic w.r.t. normal stores, a lot of other people don't know it too - and if we allow these functions to be used, this race condition will reappear in the future again and again. That's why I'm proposing atomic_pointer_t - it guarantees that this race condition can't be made. > See what I'm saying? > > You're not fixing the problem, you're fixing one unimportant detail > that isn't worth fixing that way. > > Linus Regarding reworking ACCESS_ONCE() for reads and writes - the problem is - how do you make people use it? ACCESS_ONCE() is already missing at a lot of places (it doesn't cause any visible bug on the condition that the compiler doesn't split the load or store to multiple accesses), I can assume that people will omit ATOMIC_ONCE_STORE() too even if we make it. Mikulas ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH v2] introduce atomic_pointer to fix a race condition in cancelable mcs spinlocks 2014-06-02 21:12 ` Mikulas Patocka @ 2014-06-03 7:36 ` Peter Zijlstra 2014-06-03 11:14 ` Mikulas Patocka 2014-06-03 14:07 ` Paul E. McKenney 0 siblings, 2 replies; 61+ messages in thread From: Peter Zijlstra @ 2014-06-03 7:36 UTC (permalink / raw) To: Mikulas Patocka Cc: Linus Torvalds, James E.J. Bottomley, Helge Deller, John David Anglin, Parisc List, Linux Kernel Mailing List, Paul McKenney, Vinod, Chegu, Waiman Long, Thomas Gleixner, Rik van Riel, Andrew Morton, Davidlohr Bueso, Peter Anvin, Andi Kleen, Chandramouleeswaran, Aswin, Norton, Scott J, Jason Low [-- Attachment #1: Type: text/plain, Size: 1724 bytes --] On Mon, Jun 02, 2014 at 05:12:16PM -0400, Mikulas Patocka wrote: > The patch adds atomic_pointer_t for all architectures - it is in the > common code and it is backed by atomic_long_t (that already exists for all > architectures). There is no new arch-specific code at all. > > When we have atomic_pointer_t, we can find the instances of xchg() and > cmpxchg() and convert them to atomic_pointer_t (or to other atomic*_t > types). > > When we convert them all, we can drop xchg() and cmpxchg() at all (at > least from architecture-neutral code). > > The problem with xchg() and cmpxchg() is that they are very easy to > misuse. Peter Zijlstra didn't know that they are not atomic w.r.t. normal > stores, a lot of other people don't know it too - and if we allow these > functions to be used, this race condition will reappear in the future > again and again. > > That's why I'm proposing atomic_pointer_t - it guarantees that this race > condition can't be made. But its horrible, and doesn't have any benefit afaict. So if we really want to keep supporting these platforms; I would propose something like: #ifdef __CHECKER__ #define __atomic __attribute__((address_space(5))) #else #define __atomic #endif #define store(p, v) (*(p) = (typeof(*(p)) __force __atomic)(v)) #define load(p) ((typeof(*p) __force)ACCESS_ONCE(*(p))) Along with changes to xchg() and cmpxchg() that require them to take pointers to __atomic. That way we keep the flexibility of xchg() and cmpxchg() for being (mostly) type and size invariant, and get sparse to find wrong usage. Then parisc, sparc32, tile32, metag-lock1 and arc-!llsc can go implement store() however they like. [-- Attachment #2: Type: application/pgp-signature, Size: 836 bytes --] ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH v2] introduce atomic_pointer to fix a race condition in cancelable mcs spinlocks 2014-06-03 7:36 ` Peter Zijlstra @ 2014-06-03 11:14 ` Mikulas Patocka 2014-06-03 13:24 ` Peter Zijlstra 2014-06-03 14:07 ` Paul E. McKenney 1 sibling, 1 reply; 61+ messages in thread From: Mikulas Patocka @ 2014-06-03 11:14 UTC (permalink / raw) To: Peter Zijlstra Cc: Linus Torvalds, James E.J. Bottomley, Helge Deller, John David Anglin, Parisc List, Linux Kernel Mailing List, Paul McKenney, Vinod, Chegu, Waiman Long, Thomas Gleixner, Rik van Riel, Andrew Morton, Davidlohr Bueso, Peter Anvin, Andi Kleen, Chandramouleeswaran, Aswin, Norton, Scott J, Jason Low On Tue, 3 Jun 2014, Peter Zijlstra wrote: > On Mon, Jun 02, 2014 at 05:12:16PM -0400, Mikulas Patocka wrote: > > The patch adds atomic_pointer_t for all architectures - it is in the > > common code and it is backed by atomic_long_t (that already exists for all > > architectures). There is no new arch-specific code at all. > > > > When we have atomic_pointer_t, we can find the instances of xchg() and > > cmpxchg() and convert them to atomic_pointer_t (or to other atomic*_t > > types). > > > > When we convert them all, we can drop xchg() and cmpxchg() at all (at > > least from architecture-neutral code). > > > > The problem with xchg() and cmpxchg() is that they are very easy to > > misuse. Peter Zijlstra didn't know that they are not atomic w.r.t. normal > > stores, a lot of other people don't know it too - and if we allow these > > functions to be used, this race condition will reappear in the future > > again and again. > > > > That's why I'm proposing atomic_pointer_t - it guarantees that this race > > condition can't be made. > > But its horrible, and doesn't have any benefit afaict. > > So if we really want to keep supporting these platforms; I would propose > something like: > > #ifdef __CHECKER__ > #define __atomic __attribute__((address_space(5))) > #else > #define __atomic > #endif > > #define store(p, v) (*(p) = (typeof(*(p)) __force __atomic)(v)) > #define load(p) ((typeof(*p) __force)ACCESS_ONCE(*(p))) > > Along with changes to xchg() and cmpxchg() that require them to take > pointers to __atomic. > > That way we keep the flexibility of xchg() and cmpxchg() for being > (mostly) type and size invariant, and get sparse to find wrong usage. > > Then parisc, sparc32, tile32, metag-lock1 and arc-!llsc can go implement > store() however they like. Your proposal is very good because it warns about incorrect usage automatically. Your usage is very similar to what my patch at the top of this thread does: Instead of "__atomic struct s *p;" declaration, my patch uses "atomic_pointer(struct s*) p;" as the declaration Instead of store(&p, v), my patch uses atomic_pointer_set(&p, v); Instead of load(&p), my patch uses atomic_pointer_get(&p); Instead of xchg(&p, v), my patch uses atomic_pointer_xchg(&p, v); Instead of cmpxchg(&p, v1, v2), my patch uses atomic_pointer_cmpxchg(&p1, v1, v2); > But its horrible, and doesn't have any benefit afaict. See the five cases above - why do you say that the operation on the left is good and the operation on the right is horrible? To me, it looks like they are both similar, they are just named differently. Both check the type of the pointer and warns if the user passes incompatible pointer. If I rename the operations in my patch to store(), load(), xchg(), cmpxchg(), would you like it? My patch has advantage (over your #define __atomic __attribute__((address_space(5))) ) that it checks the mismatches at compile time. Your proposal only check them with sparse. But either way - it is very good that the mismatches are being checked automatically. We need some method to catch these races automatically. There are places where people xchg() or cmpxchg() with direct modifications, see for example this: $ grep -w mnt_expiry_mark */*.c fs/namespace.c: if (unlikely(m->mnt_expiry_mark)) fs/namespace.c: m->mnt_expiry_mark = 0; fs/namespace.c: if (!xchg(&mnt->mnt_expiry_mark, 1)) fs/namespace.c: /* we mustn't call path_put() as that would clear mnt_expiry_mark */ fs/namespace.c: if (!xchg(&mnt->mnt_expiry_mark, 1) || $ grep "sub_info->complete" */*.c kernel/kmod.c: struct completion *comp = xchg(&sub_info->complete, NULL); kernel/kmod.c: sub_info->complete = &done; kernel/kmod.c: if (xchg(&sub_info->complete, NULL)) $ grep -w "fdt->fd" */*.c fs/file.c: free_fdmem(fdt->fd); fs/file.c: fdt->fd = data; fs/file.c: free_fdmem(fdt->fd); fs/file.c: struct file * file = xchg(&fdt->fd[i], NULL); fs/file.c: if (rcu_access_pointer(fdt->fd[fd]) != NULL) { fs/file.c: rcu_assign_pointer(fdt->fd[fd], NULL); fs/file.c: BUG_ON(fdt->fd[fd] != NULL); fs/file.c: rcu_assign_pointer(fdt->fd[fd], file); fs/file.c: file = fdt->fd[fd]; fs/file.c: rcu_assign_pointer(fdt->fd[fd], NULL); fs/file.c: file = fdt->fd[fd]; fs/file.c: rcu_assign_pointer(fdt->fd[fd], NULL); fs/file.c: tofree = fdt->fd[fd]; fs/file.c: rcu_assign_pointer(fdt->fd[fd], file); fs/file.c: file = rcu_dereference_check_fdtable(files, fdt->fd[n]); $ grep -w exit_state */*.c fs/exec.c: if (likely(leader->exit_state)) fs/exec.c: BUG_ON(leader->exit_state != EXIT_ZOMBIE); fs/exec.c: leader->exit_state = EXIT_DEAD; kernel/exit.c: if (leader != p && thread_group_empty(leader) && leader->exit_state == EXIT_ZOMBIE) { kernel/exit.c: leader->exit_state = EXIT_DEAD; kernel/exit.c: (p->exit_state && thread_group_empty(p)) || kernel/exit.c: if (p->exit_state == EXIT_DEAD) kernel/exit.c: p->exit_state == EXIT_ZOMBIE && thread_group_empty(p)) { kernel/exit.c: p->exit_state = EXIT_DEAD; kernel/exit.c: tsk->exit_state = autoreap ? EXIT_DEAD : EXIT_ZOMBIE; kernel/exit.c: if (cmpxchg(&p->exit_state, EXIT_ZOMBIE, state) != EXIT_ZOMBIE) kernel/exit.c: p->exit_state = state; kernel/exit.c: if (unlikely(p->exit_state == EXIT_DEAD)) kernel/exit.c: if (unlikely(p->exit_state == EXIT_TRACE)) { kernel/exit.c: if (p->exit_state == EXIT_ZOMBIE) { kernel/fork.c: WARN_ON(!tsk->exit_state); kernel/posix-cpu-timers.c: if (unlikely(p->exit_state)) kernel/posix-cpu-timers.c: } else if (unlikely(p->exit_state) && thread_group_empty(p)) { kernel/ptrace.c: if (unlikely(task->exit_state)) kernel/ptrace.c: if (p->exit_state != EXIT_ZOMBIE) kernel/ptrace.c: p->exit_state = EXIT_DEAD; kernel/ptrace.c: if (child->exit_state) /* already dead */ kernel/signal.c: if (t->exit_state) kernel/taskstats.c: if (tsk->exit_state) mm/oom_kill.c: if (task->exit_state) Mikulas ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH v2] introduce atomic_pointer to fix a race condition in cancelable mcs spinlocks 2014-06-03 11:14 ` Mikulas Patocka @ 2014-06-03 13:24 ` Peter Zijlstra 2014-06-03 14:18 ` Mikulas Patocka 0 siblings, 1 reply; 61+ messages in thread From: Peter Zijlstra @ 2014-06-03 13:24 UTC (permalink / raw) To: Mikulas Patocka Cc: Linus Torvalds, James E.J. Bottomley, Helge Deller, John David Anglin, Parisc List, Linux Kernel Mailing List, Paul McKenney, Vinod, Chegu, Waiman Long, Thomas Gleixner, Rik van Riel, Andrew Morton, Davidlohr Bueso, Peter Anvin, Andi Kleen, Chandramouleeswaran, Aswin, Norton, Scott J, Jason Low [-- Attachment #1: Type: text/plain, Size: 3963 bytes --] On Tue, Jun 03, 2014 at 07:14:31AM -0400, Mikulas Patocka wrote: > > So if we really want to keep supporting these platforms; I would propose > > something like: > > > > #ifdef __CHECKER__ > > #define __atomic __attribute__((address_space(5))) > > #else > > #define __atomic > > #endif > > > > #define store(p, v) (*(p) = (typeof(*(p)) __force __atomic)(v)) > > #define load(p) ((typeof(*p) __force)ACCESS_ONCE(*(p))) > > > > Along with changes to xchg() and cmpxchg() that require them to take > > pointers to __atomic. > > > > That way we keep the flexibility of xchg() and cmpxchg() for being > > (mostly) type and size invariant, and get sparse to find wrong usage. > > > > Then parisc, sparc32, tile32, metag-lock1 and arc-!llsc can go implement > > store() however they like. > > Your proposal is very good because it warns about incorrect usage > automatically. Exactly the point. > Your usage is very similar to what my patch at the top of this thread > does: > > Instead of "__atomic struct s *p;" declaration, my patch uses > "atomic_pointer(struct s*) p;" as the declaration > Instead of store(&p, v), my patch uses atomic_pointer_set(&p, v); > Instead of load(&p), my patch uses atomic_pointer_get(&p); > Instead of xchg(&p, v), my patch uses atomic_pointer_xchg(&p, v); > Instead of cmpxchg(&p, v1, v2), my patch uses atomic_pointer_cmpxchg(&p1, v1, v2); > > > But its horrible, and doesn't have any benefit afaict. > > See the five cases above - why do you say that the operation on the left > is good and the operation on the right is horrible? To me, it looks like > they are both similar, they are just named differently. Both check the > type of the pointer and warns if the user passes incompatible pointer. If > I rename the operations in my patch to store(), load(), xchg(), cmpxchg(), > would you like it? Nope.. because the above store,load,xchg,cmpxchg are type invariant and work for anything of size (1),2,4,(8). So I dislike your proposal on a number of points: 1) its got pointer in, and while the immediate problem is indeed with pointers, there is no reason it always should be, so we'll keep on introducing new APIs; 2) its got a fixed length, nl. sizeof(void *), if we were to find another case which had the same problem which used 'int' we'd have to again create new APIs; 3) you only fixed the one site; 4) I'm the lazy kind and atomic_foo_* is just too much typing, let alone remembering all the various new atomic_foo_ APIs resulting from all this. This is the place where I really miss C++ templates; and yes before people shoot me in the head for that, I do know about all the various pitfalls and down sides of those too. > My patch has advantage (over your #define __atomic > __attribute__((address_space(5))) ) that it checks the mismatches at > compile time. Your proposal only check them with sparse. But either way - > it is very good that the mismatches are being checked automatically. So my proposal goes a lot further in that by making xchg() and cmpxchg() require pointer to __atomic, all sites get coverage, not only the one case where you found was a problem. Yes, this requires a lot more effort, for we'll have to pretty much audit and annotate the entire tree, but such things can be done, see for example the introduction of __rcu. Also, these days we get automagic emails if we introduce new sparse fails, so it being sparse and not gcc isn't really any threshold at all. > We need some method to catch these races automatically. There are places > where people xchg() or cmpxchg() with direct modifications, see for > example this: Yep, so all those places will immediately stand out, the first fail will be that those variables aren't marked __atomic, once you do that, the direct assignment will complain about crossing the address_space marker. Voila, sorted. [-- Attachment #2: Type: application/pgp-signature, Size: 836 bytes --] ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH v2] introduce atomic_pointer to fix a race condition in cancelable mcs spinlocks 2014-06-03 13:24 ` Peter Zijlstra @ 2014-06-03 14:18 ` Mikulas Patocka 0 siblings, 0 replies; 61+ messages in thread From: Mikulas Patocka @ 2014-06-03 14:18 UTC (permalink / raw) To: Peter Zijlstra Cc: Linus Torvalds, James E.J. Bottomley, Helge Deller, John David Anglin, Parisc List, Linux Kernel Mailing List, Paul McKenney, Vinod, Chegu, Waiman Long, Thomas Gleixner, Rik van Riel, Andrew Morton, Davidlohr Bueso, Peter Anvin, Andi Kleen, Chandramouleeswaran, Aswin, Norton, Scott J, Jason Low On Tue, 3 Jun 2014, Peter Zijlstra wrote: > On Tue, Jun 03, 2014 at 07:14:31AM -0400, Mikulas Patocka wrote: > > > So if we really want to keep supporting these platforms; I would propose > > > something like: > > > > > > #ifdef __CHECKER__ > > > #define __atomic __attribute__((address_space(5))) > > > #else > > > #define __atomic > > > #endif > > > > > > #define store(p, v) (*(p) = (typeof(*(p)) __force __atomic)(v)) > > > #define load(p) ((typeof(*p) __force)ACCESS_ONCE(*(p))) > > > > > > Along with changes to xchg() and cmpxchg() that require them to take > > > pointers to __atomic. > > > > > > That way we keep the flexibility of xchg() and cmpxchg() for being > > > (mostly) type and size invariant, and get sparse to find wrong usage. > > > > > > Then parisc, sparc32, tile32, metag-lock1 and arc-!llsc can go implement > > > store() however they like. > > > > Your proposal is very good because it warns about incorrect usage > > automatically. > > Exactly the point. > > > Your usage is very similar to what my patch at the top of this thread > > does: > > > > Instead of "__atomic struct s *p;" declaration, my patch uses > > "atomic_pointer(struct s*) p;" as the declaration > > Instead of store(&p, v), my patch uses atomic_pointer_set(&p, v); > > Instead of load(&p), my patch uses atomic_pointer_get(&p); > > Instead of xchg(&p, v), my patch uses atomic_pointer_xchg(&p, v); > > Instead of cmpxchg(&p, v1, v2), my patch uses atomic_pointer_cmpxchg(&p1, v1, v2); > > > > > But its horrible, and doesn't have any benefit afaict. > > > > See the five cases above - why do you say that the operation on the left > > is good and the operation on the right is horrible? To me, it looks like > > they are both similar, they are just named differently. Both check the > > type of the pointer and warns if the user passes incompatible pointer. If > > I rename the operations in my patch to store(), load(), xchg(), cmpxchg(), > > would you like it? > > Nope.. because the above store,load,xchg,cmpxchg are type invariant and > work for anything of size (1),2,4,(8). > > So I dislike your proposal on a number of points: > > 1) its got pointer in, and while the immediate problem is indeed with > pointers, there is no reason it always should be, so we'll keep on > introducing new APIs; > > 2) its got a fixed length, nl. sizeof(void *), if we were to find > another case which had the same problem which used 'int' we'd have to > again create new APIs; > > 3) you only fixed the one site; > > 4) I'm the lazy kind and atomic_foo_* is just too much typing, let > alone remembering all the various new atomic_foo_ APIs resulting from > all this. > > This is the place where I really miss C++ templates; and yes before > people shoot me in the head for that, I do know about all the various > pitfalls and down sides of those too. > > > My patch has advantage (over your #define __atomic > > __attribute__((address_space(5))) ) that it checks the mismatches at > > compile time. Your proposal only check them with sparse. But either way - > > it is very good that the mismatches are being checked automatically. > > So my proposal goes a lot further in that by making xchg() and cmpxchg() > require pointer to __atomic, all sites get coverage, not only the one > case where you found was a problem. > > Yes, this requires a lot more effort, for we'll have to pretty much > audit and annotate the entire tree, but such things can be done, see for > example the introduction of __rcu. > > Also, these days we get automagic emails if we introduce new sparse > fails, so it being sparse and not gcc isn't really any threshold at all. > > > We need some method to catch these races automatically. There are places > > where people xchg() or cmpxchg() with direct modifications, see for > > example this: > > Yep, so all those places will immediately stand out, the first fail will > be that those variables aren't marked __atomic, once you do that, the > direct assignment will complain about crossing the address_space marker. > > Voila, sorted. I originally wanted to remove PA-RISC xchg and cmpxchg, force compile failure on places where it is used and convert them to atomic operations. But there's a lot of such places, the patch would be big and it would probably trigger some compile failures in configurations that I can't test. So, I agree that your approch with sparse tagging is better, it only warns about unsafe use and it won't be breaking compilation for so many people. Mikulas ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH v2] introduce atomic_pointer to fix a race condition in cancelable mcs spinlocks 2014-06-03 7:36 ` Peter Zijlstra 2014-06-03 11:14 ` Mikulas Patocka @ 2014-06-03 14:07 ` Paul E. McKenney 2014-06-03 15:09 ` Peter Zijlstra 1 sibling, 1 reply; 61+ messages in thread From: Paul E. McKenney @ 2014-06-03 14:07 UTC (permalink / raw) To: Peter Zijlstra Cc: Mikulas Patocka, Linus Torvalds, James E.J. Bottomley, Helge Deller, John David Anglin, Parisc List, Linux Kernel Mailing List, Vinod, Chegu, Waiman Long, Thomas Gleixner, Rik van Riel, Andrew Morton, Davidlohr Bueso, Peter Anvin, Andi Kleen, Chandramouleeswaran, Aswin, Norton, Scott J, Jason Low On Tue, Jun 03, 2014 at 09:36:13AM +0200, Peter Zijlstra wrote: > On Mon, Jun 02, 2014 at 05:12:16PM -0400, Mikulas Patocka wrote: > > The patch adds atomic_pointer_t for all architectures - it is in the > > common code and it is backed by atomic_long_t (that already exists for all > > architectures). There is no new arch-specific code at all. > > > > When we have atomic_pointer_t, we can find the instances of xchg() and > > cmpxchg() and convert them to atomic_pointer_t (or to other atomic*_t > > types). > > > > When we convert them all, we can drop xchg() and cmpxchg() at all (at > > least from architecture-neutral code). > > > > The problem with xchg() and cmpxchg() is that they are very easy to > > misuse. Peter Zijlstra didn't know that they are not atomic w.r.t. normal > > stores, a lot of other people don't know it too - and if we allow these > > functions to be used, this race condition will reappear in the future > > again and again. > > > > That's why I'm proposing atomic_pointer_t - it guarantees that this race > > condition can't be made. > > But its horrible, and doesn't have any benefit afaict. > > So if we really want to keep supporting these platforms; I would propose > something like: > > #ifdef __CHECKER__ > #define __atomic __attribute__((address_space(5))) > #else > #define __atomic > #endif > > #define store(p, v) (*(p) = (typeof(*(p)) __force __atomic)(v)) > #define load(p) ((typeof(*p) __force)ACCESS_ONCE(*(p))) > > Along with changes to xchg() and cmpxchg() that require them to take > pointers to __atomic. > > That way we keep the flexibility of xchg() and cmpxchg() for being > (mostly) type and size invariant, and get sparse to find wrong usage. > > Then parisc, sparc32, tile32, metag-lock1 and arc-!llsc can go implement > store() however they like. Should be fun interacting with atomic operations on __rcu variables (address space 4). Of course, that is already fun... Thanx, Paul ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH v2] introduce atomic_pointer to fix a race condition in cancelable mcs spinlocks 2014-06-03 14:07 ` Paul E. McKenney @ 2014-06-03 15:09 ` Peter Zijlstra 2014-06-03 15:56 ` Paul E. McKenney 0 siblings, 1 reply; 61+ messages in thread From: Peter Zijlstra @ 2014-06-03 15:09 UTC (permalink / raw) To: Paul E. McKenney Cc: Mikulas Patocka, Linus Torvalds, James E.J. Bottomley, Helge Deller, John David Anglin, Parisc List, Linux Kernel Mailing List, Vinod, Chegu, Waiman Long, Thomas Gleixner, Rik van Riel, Andrew Morton, Davidlohr Bueso, Peter Anvin, Andi Kleen, Chandramouleeswaran, Aswin, Norton, Scott J, Jason Low, sparse [-- Attachment #1: Type: text/plain, Size: 1530 bytes --] On Tue, Jun 03, 2014 at 07:07:27AM -0700, Paul E. McKenney wrote: > On Tue, Jun 03, 2014 at 09:36:13AM +0200, Peter Zijlstra wrote: > > > > #ifdef __CHECKER__ > > #define __atomic __attribute__((address_space(5))) > > #else > > #define __atomic > > #endif > > > > #define store(p, v) (*(p) = (typeof(*(p)) __force __atomic)(v)) > > #define load(p) ((typeof(*p) __force)ACCESS_ONCE(*(p))) > > > > Along with changes to xchg() and cmpxchg() that require them to take > > pointers to __atomic. > > > > That way we keep the flexibility of xchg() and cmpxchg() for being > > (mostly) type and size invariant, and get sparse to find wrong usage. > > > > Then parisc, sparc32, tile32, metag-lock1 and arc-!llsc can go implement > > store() however they like. > > Should be fun interacting with atomic operations on __rcu variables > (address space 4). Of course, that is already fun... > Hmm, good point, I suppose sparse doesn't like two different address_space annotations on the same variable ? /me adds Christpoher Li to the CC list. ISTR Mikulas actually listing one such, me digs in recent email.. > $ grep -w "fdt->fd" */*.c > fs/file.c: free_fdmem(fdt->fd); > fs/file.c: fdt->fd = data; > fs/file.c: free_fdmem(fdt->fd); > fs/file.c: struct file * file = xchg(&fdt->fd[i], NULL); So yes, that's going to be fun, mostly because rcu_assign_pointer() doesn't actually do the right magic for this to be safe on their platform(s). [-- Attachment #2: Type: application/pgp-signature, Size: 836 bytes --] ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH v2] introduce atomic_pointer to fix a race condition in cancelable mcs spinlocks 2014-06-03 15:09 ` Peter Zijlstra @ 2014-06-03 15:56 ` Paul E. McKenney 0 siblings, 0 replies; 61+ messages in thread From: Paul E. McKenney @ 2014-06-03 15:56 UTC (permalink / raw) To: Peter Zijlstra Cc: Mikulas Patocka, Linus Torvalds, James E.J. Bottomley, Helge Deller, John David Anglin, Parisc List, Linux Kernel Mailing List, Vinod, Chegu, Waiman Long, Thomas Gleixner, Rik van Riel, Andrew Morton, Davidlohr Bueso, Peter Anvin, Andi Kleen, Chandramouleeswaran, Aswin, Norton, Scott J, Jason Low, sparse On Tue, Jun 03, 2014 at 05:09:08PM +0200, Peter Zijlstra wrote: > On Tue, Jun 03, 2014 at 07:07:27AM -0700, Paul E. McKenney wrote: > > On Tue, Jun 03, 2014 at 09:36:13AM +0200, Peter Zijlstra wrote: > > > > > > #ifdef __CHECKER__ > > > #define __atomic __attribute__((address_space(5))) > > > #else > > > #define __atomic > > > #endif > > > > > > #define store(p, v) (*(p) = (typeof(*(p)) __force __atomic)(v)) > > > #define load(p) ((typeof(*p) __force)ACCESS_ONCE(*(p))) > > > > > > Along with changes to xchg() and cmpxchg() that require them to take > > > pointers to __atomic. > > > > > > That way we keep the flexibility of xchg() and cmpxchg() for being > > > (mostly) type and size invariant, and get sparse to find wrong usage. > > > > > > Then parisc, sparc32, tile32, metag-lock1 and arc-!llsc can go implement > > > store() however they like. > > > > Should be fun interacting with atomic operations on __rcu variables > > (address space 4). Of course, that is already fun... > > > > Hmm, good point, I suppose sparse doesn't like two different > address_space annotations on the same variable ? > > /me adds Christpoher Li to the CC list. > > ISTR Mikulas actually listing one such, me digs in recent email.. > > > $ grep -w "fdt->fd" */*.c > > fs/file.c: free_fdmem(fdt->fd); > > fs/file.c: fdt->fd = data; > > fs/file.c: free_fdmem(fdt->fd); > > fs/file.c: struct file * file = xchg(&fdt->fd[i], NULL); > > So yes, that's going to be fun, mostly because rcu_assign_pointer() > doesn't actually do the right magic for this to be safe on their > platform(s). Maybe at some point sparse needs to keep a bit mask for the address spaces, so that you caould say somthing like: struct foo __atomic __rcu *p; Thanx, Paul ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH v2] introduce atomic_pointer to fix a race condition in cancelable mcs spinlocks 2014-06-02 20:46 ` Mikulas Patocka 2014-06-02 20:53 ` Linus Torvalds @ 2014-06-03 7:20 ` Peter Zijlstra 1 sibling, 0 replies; 61+ messages in thread From: Peter Zijlstra @ 2014-06-03 7:20 UTC (permalink / raw) To: Mikulas Patocka Cc: Linus Torvalds, James E.J. Bottomley, Helge Deller, John David Anglin, Parisc List, Linux Kernel Mailing List, Paul McKenney, Vinod, Chegu, Waiman Long, Thomas Gleixner, Rik van Riel, Andrew Morton, Davidlohr Bueso, Peter Anvin, Andi Kleen, Chandramouleeswaran, Aswin, Norton, Scott J, Jason Low [-- Attachment #1: Type: text/plain, Size: 1460 bytes --] On Mon, Jun 02, 2014 at 04:46:32PM -0400, Mikulas Patocka wrote: > It's not only parisc - tile32, arc, metag (maybe hexagon) are broken too, > because they don't have cmpxchg in hardware. metag actually does, and the lock1 thing is a fallback/test thing: config METAG_ATOMICITY_LOCK1 depends on SMP bool "lock1" help This option uses the LOCK1 instruction for atomicity. This is mainly provided as a debugging aid if the lnkget/lnkset atomicity primitive isn't working properly. Then again, metag has qualiteee bits like: config METAG_SMP_WRITE_REORDERING bool help This attempts to prevent cache-memory incoherence due to external reordering of writes from different hardware threads when SMP is enabled. It adds fences (system event 0) to smp_mb and smp_rmb in an attempt to catch some of the cases, and also before writes to shared memory in LOCK1 protected atomics and spinlocks. This will not completely prevent cache incoherency on affected cores. Which makes me back away slowly before starting to run. And there is sane arc hardware: config ARC_HAS_LLSC bool "Insn: LLOCK/SCOND (efficient atomic ops)" default y depends on ARC_CPU_770 && !ARC_CANT_LLSC tile32 is indeed equally wrecked, but at least they have a tile64 system that is useful (albeit somewhat strange). Same for sparc32, there's sparc64 which is sane. parisc otoh is a dead arch that never got sane. [-- Attachment #2: Type: application/pgp-signature, Size: 836 bytes --] ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH v2] introduce atomic_pointer to fix a race condition in cancelable mcs spinlocks 2014-06-02 17:09 ` Linus Torvalds 2014-06-02 17:12 ` Davidlohr Bueso 2014-06-02 20:46 ` Mikulas Patocka @ 2014-06-02 21:03 ` Paul E. McKenney 2014-06-06 15:06 ` Peter Zijlstra 3 siblings, 0 replies; 61+ messages in thread From: Paul E. McKenney @ 2014-06-02 21:03 UTC (permalink / raw) To: Linus Torvalds Cc: Peter Zijlstra, Mikulas Patocka, James E.J. Bottomley, Helge Deller, John David Anglin, Parisc List, Linux Kernel Mailing List, Vinod, Chegu, Waiman Long, Thomas Gleixner, Rik van Riel, Andrew Morton, Davidlohr Bueso, Peter Anvin, Andi Kleen, Chandramouleeswaran, Aswin, Norton, Scott J, Jason Low On Mon, Jun 02, 2014 at 10:09:35AM -0700, Linus Torvalds wrote: > On Mon, Jun 2, 2014 at 9:25 AM, Peter Zijlstra <peterz@infradead.org> wrote: > > > > And I can't say I'm a particular fan of these ops either, as alternative > > I'm almost inclined to just exclude parisc from using opt spinning. > > Please do. > > There is no way in hell that we should introduce a magic new > atomic_pointer thing for parisc. And the idea somebody had to change > ACCESS_ONCE() for PA-RISC (I'm not going to go back to find who to > blame) is just horribly wrong too, since it's not even necessary for > any normal use: the special "load-and-store-zero" instruction isn't > actually used for "real" data, it's used only for the special > spinlocks afaik, so doing it for all ACCESS_ONCE() users would be > wrong even on PA-RISC. For any normal data, the usual "just load the > value, making sure the compiler doesn't reload it" is perfectly fine - > even on PA-RISC. Guilty to charges as read on suggesting PA-RISC-specific ACCESS_ONCE(). ;-) Thanx, Paul > Now, if PA-RISC was a major architecture, we'd have to figure this > out. But as it is, PA-RISC is just about the shittiest RISC ever > invented (with original sparc being a strong contender), and let's > face it, nobody really uses it. It's a "fun project", but it is not > something that we should use to mess up either ACCESS_ONCE() or the > MCS locks. > > Just make PA-RISC use its own locks, not any of the new fancy ones. > > Linus > ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH v2] introduce atomic_pointer to fix a race condition in cancelable mcs spinlocks 2014-06-02 17:09 ` Linus Torvalds ` (2 preceding siblings ...) 2014-06-02 21:03 ` Paul E. McKenney @ 2014-06-06 15:06 ` Peter Zijlstra 2014-06-06 15:15 ` Paul E. McKenney 2014-06-06 15:42 ` Davidlohr Bueso 3 siblings, 2 replies; 61+ messages in thread From: Peter Zijlstra @ 2014-06-06 15:06 UTC (permalink / raw) To: Linus Torvalds Cc: Mikulas Patocka, James E.J. Bottomley, Helge Deller, John David Anglin, Parisc List, Linux Kernel Mailing List, Paul McKenney, Vinod, Chegu, Waiman Long, Thomas Gleixner, Rik van Riel, Andrew Morton, Davidlohr Bueso, Peter Anvin, Andi Kleen, Chandramouleeswaran, Aswin, Norton, Scott J, Jason Low, mingo On Mon, Jun 02, 2014 at 10:09:35AM -0700, Linus Torvalds wrote: > On Mon, Jun 2, 2014 at 9:25 AM, Peter Zijlstra <peterz@infradead.org> wrote: > > > > And I can't say I'm a particular fan of these ops either, as alternative > > I'm almost inclined to just exclude parisc from using opt spinning. > > Please do. Something like so; if the rwsem stuff lands in .15 we need more for that, it doesn't have a convenient CONFIG symbol like this. Linus will you take this from email, or should I get it through tip/locking/urgent or so? --- Subject: locking, mutex: Disable optimistic spinning for PA-RISC PA-RISC's cmpxchg is not save against normal stores and the code used for optimistic spinning is known broken because of this. Disable for now. Reported-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> --- kernel/Kconfig.locks | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/kernel/Kconfig.locks b/kernel/Kconfig.locks index 35536d9c0964..9c239e080c2d 100644 --- a/kernel/Kconfig.locks +++ b/kernel/Kconfig.locks @@ -222,7 +222,7 @@ endif config MUTEX_SPIN_ON_OWNER def_bool y - depends on SMP && !DEBUG_MUTEXES + depends on SMP && !DEBUG_MUTEXES && !PARISC config ARCH_USE_QUEUE_RWLOCK bool ^ permalink raw reply related [flat|nested] 61+ messages in thread
* Re: [PATCH v2] introduce atomic_pointer to fix a race condition in cancelable mcs spinlocks 2014-06-06 15:06 ` Peter Zijlstra @ 2014-06-06 15:15 ` Paul E. McKenney 2014-06-06 15:42 ` Davidlohr Bueso 1 sibling, 0 replies; 61+ messages in thread From: Paul E. McKenney @ 2014-06-06 15:15 UTC (permalink / raw) To: Peter Zijlstra Cc: Linus Torvalds, Mikulas Patocka, James E.J. Bottomley, Helge Deller, John David Anglin, Parisc List, Linux Kernel Mailing List, Vinod, Chegu, Waiman Long, Thomas Gleixner, Rik van Riel, Andrew Morton, Davidlohr Bueso, Peter Anvin, Andi Kleen, Chandramouleeswaran, Aswin, Norton, Scott J, Jason Low, mingo On Fri, Jun 06, 2014 at 05:06:07PM +0200, Peter Zijlstra wrote: > On Mon, Jun 02, 2014 at 10:09:35AM -0700, Linus Torvalds wrote: > > On Mon, Jun 2, 2014 at 9:25 AM, Peter Zijlstra <peterz@infradead.org> wrote: > > > > > > And I can't say I'm a particular fan of these ops either, as alternative > > > I'm almost inclined to just exclude parisc from using opt spinning. > > > > Please do. > > Something like so; if the rwsem stuff lands in .15 we need more for > that, it doesn't have a convenient CONFIG symbol like this. > > Linus will you take this from email, or should I get it through > tip/locking/urgent or so? > > --- > Subject: locking, mutex: Disable optimistic spinning for PA-RISC > > PA-RISC's cmpxchg is not save against normal stores and the code used > for optimistic spinning is known broken because of this. > > Disable for now. > > Reported-by: Mikulas Patocka <mpatocka@redhat.com> > Signed-off-by: Peter Zijlstra <peterz@infradead.org> Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> > --- > kernel/Kconfig.locks | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/kernel/Kconfig.locks b/kernel/Kconfig.locks > index 35536d9c0964..9c239e080c2d 100644 > --- a/kernel/Kconfig.locks > +++ b/kernel/Kconfig.locks > @@ -222,7 +222,7 @@ endif > > config MUTEX_SPIN_ON_OWNER > def_bool y > - depends on SMP && !DEBUG_MUTEXES > + depends on SMP && !DEBUG_MUTEXES && !PARISC > > config ARCH_USE_QUEUE_RWLOCK > bool > ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH v2] introduce atomic_pointer to fix a race condition in cancelable mcs spinlocks 2014-06-06 15:06 ` Peter Zijlstra 2014-06-06 15:15 ` Paul E. McKenney @ 2014-06-06 15:42 ` Davidlohr Bueso 1 sibling, 0 replies; 61+ messages in thread From: Davidlohr Bueso @ 2014-06-06 15:42 UTC (permalink / raw) To: Peter Zijlstra Cc: Linus Torvalds, Mikulas Patocka, James E.J. Bottomley, Helge Deller, John David Anglin, Parisc List, Linux Kernel Mailing List, Paul McKenney, Vinod, Chegu, Waiman Long, Thomas Gleixner, Rik van Riel, Andrew Morton, Peter Anvin, Andi Kleen, Chandramouleeswaran, Aswin, Norton, Scott J, Jason Low, mingo On Fri, 2014-06-06 at 17:06 +0200, Peter Zijlstra wrote: > On Mon, Jun 02, 2014 at 10:09:35AM -0700, Linus Torvalds wrote: > > On Mon, Jun 2, 2014 at 9:25 AM, Peter Zijlstra <peterz@infradead.org> wrote: > > > > > > And I can't say I'm a particular fan of these ops either, as alternative > > > I'm almost inclined to just exclude parisc from using opt spinning. > > > > Please do. > > Something like so; if the rwsem stuff lands in .15 we need more for > that, it doesn't have a convenient CONFIG symbol like this. > > Linus will you take this from email, or should I get it through > tip/locking/urgent or so? > > --- > Subject: locking, mutex: Disable optimistic spinning for PA-RISC > > PA-RISC's cmpxchg is not save against normal stores and the code used > for optimistic spinning is known broken because of this. > > Disable for now. I almost hit the send button :) > > Reported-by: Mikulas Patocka <mpatocka@redhat.com> > Signed-off-by: Peter Zijlstra <peterz@infradead.org> > --- Reviewed-by: Davidlohr Bueso <davidlohr@hp.com> ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH v2] introduce atomic_pointer to fix a race condition in cancelable mcs spinlocks 2014-06-02 16:00 ` [PATCH v2] introduce atomic_pointer to " Mikulas Patocka 2014-06-02 16:25 ` Peter Zijlstra @ 2014-06-02 16:50 ` Jason Low 2014-06-02 17:03 ` Paul E. McKenney 2014-06-02 17:25 ` Waiman Long 1 sibling, 2 replies; 61+ messages in thread From: Jason Low @ 2014-06-02 16:50 UTC (permalink / raw) To: Mikulas Patocka Cc: Linus Torvalds, Peter Zijlstra, jejb, deller, John David Anglin, linux-parisc, linux-kernel, paulmck, chegu_vinod, Waiman.Long, tglx, riel, akpm, davidlohr, hpa, andi, aswin, scott.norton On Mon, 2014-06-02 at 12:00 -0400, Mikulas Patocka wrote: > If you write to some variable with ACCESS_ONCE and use cmpxchg or xchg at > the same time, you break it. ACCESS_ONCE doesn't take the hashed spinlock, > so, in this case, cmpxchg or xchg isn't really atomic at all. So if the problem is using ACCESS_ONCE writes with cmpxchg and xchg at the same time, would the below change address this problem? ----- diff --git a/kernel/locking/mcs_spinlock.c b/kernel/locking/mcs_spinlock.c index 838dc9e..8396721 100644 --- a/kernel/locking/mcs_spinlock.c +++ b/kernel/locking/mcs_spinlock.c @@ -71,7 +71,7 @@ bool osq_lock(struct optimistic_spin_queue **lock) if (likely(prev == NULL)) return true; - ACCESS_ONCE(prev->next) = node; + xchg(&prev->next, node); /* * Normally @prev is untouchable after the above store; because at that @@ -144,7 +144,7 @@ unqueue: */ ACCESS_ONCE(next->prev) = prev; - ACCESS_ONCE(prev->next) = next; + xchg(&prev->next, next); return false; } ^ permalink raw reply related [flat|nested] 61+ messages in thread
* Re: [PATCH v2] introduce atomic_pointer to fix a race condition in cancelable mcs spinlocks 2014-06-02 16:50 ` Jason Low @ 2014-06-02 17:03 ` Paul E. McKenney 2014-06-02 17:25 ` Waiman Long 1 sibling, 0 replies; 61+ messages in thread From: Paul E. McKenney @ 2014-06-02 17:03 UTC (permalink / raw) To: Jason Low Cc: Mikulas Patocka, Linus Torvalds, Peter Zijlstra, jejb, deller, John David Anglin, linux-parisc, linux-kernel, chegu_vinod, Waiman.Long, tglx, riel, akpm, davidlohr, hpa, andi, aswin, scott.norton On Mon, Jun 02, 2014 at 09:50:10AM -0700, Jason Low wrote: > On Mon, 2014-06-02 at 12:00 -0400, Mikulas Patocka wrote: > > If you write to some variable with ACCESS_ONCE and use cmpxchg or xchg at > > the same time, you break it. ACCESS_ONCE doesn't take the hashed spinlock, > > so, in this case, cmpxchg or xchg isn't really atomic at all. > > So if the problem is using ACCESS_ONCE writes with cmpxchg and xchg at > the same time, would the below change address this problem? And one could use cmpxchg() or atomic_add_return(..., 0) to read a value out. Probably at the cost of some performance impact, though. Thanx, Paul > ----- > diff --git a/kernel/locking/mcs_spinlock.c b/kernel/locking/mcs_spinlock.c > index 838dc9e..8396721 100644 > --- a/kernel/locking/mcs_spinlock.c > +++ b/kernel/locking/mcs_spinlock.c > @@ -71,7 +71,7 @@ bool osq_lock(struct optimistic_spin_queue **lock) > if (likely(prev == NULL)) > return true; > > - ACCESS_ONCE(prev->next) = node; > + xchg(&prev->next, node); > > /* > * Normally @prev is untouchable after the above store; because at that > @@ -144,7 +144,7 @@ unqueue: > */ > > ACCESS_ONCE(next->prev) = prev; > - ACCESS_ONCE(prev->next) = next; > + xchg(&prev->next, next); > > return false; > } > > ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH v2] introduce atomic_pointer to fix a race condition in cancelable mcs spinlocks 2014-06-02 16:50 ` Jason Low 2014-06-02 17:03 ` Paul E. McKenney @ 2014-06-02 17:25 ` Waiman Long 2014-06-02 17:38 ` H. Peter Anvin 1 sibling, 1 reply; 61+ messages in thread From: Waiman Long @ 2014-06-02 17:25 UTC (permalink / raw) To: Jason Low Cc: Mikulas Patocka, Linus Torvalds, Peter Zijlstra, jejb, deller, John David Anglin, linux-parisc, linux-kernel, paulmck, chegu_vinod, tglx, riel, akpm, davidlohr, hpa, andi, aswin, scott.norton On 06/02/2014 12:50 PM, Jason Low wrote: > On Mon, 2014-06-02 at 12:00 -0400, Mikulas Patocka wrote: >> If you write to some variable with ACCESS_ONCE and use cmpxchg or xchg at >> the same time, you break it. ACCESS_ONCE doesn't take the hashed spinlock, >> so, in this case, cmpxchg or xchg isn't really atomic at all. > So if the problem is using ACCESS_ONCE writes with cmpxchg and xchg at > the same time, would the below change address this problem? > > ----- > diff --git a/kernel/locking/mcs_spinlock.c b/kernel/locking/mcs_spinlock.c > index 838dc9e..8396721 100644 > --- a/kernel/locking/mcs_spinlock.c > +++ b/kernel/locking/mcs_spinlock.c > @@ -71,7 +71,7 @@ bool osq_lock(struct optimistic_spin_queue **lock) > if (likely(prev == NULL)) > return true; > > - ACCESS_ONCE(prev->next) = node; > + xchg(&prev->next, node); > > /* > * Normally @prev is untouchable after the above store; because at that > @@ -144,7 +144,7 @@ unqueue: > */ > > ACCESS_ONCE(next->prev) = prev; > - ACCESS_ONCE(prev->next) = next; > + xchg(&prev->next, next); > > return false; > } > > Doing an xchg is a very expensive operation compared with ACCESS_ONCE. I will not suggest doing that to make it right for PA-RISC at the expense of performance in other architectures. -Longman ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH v2] introduce atomic_pointer to fix a race condition in cancelable mcs spinlocks 2014-06-02 17:25 ` Waiman Long @ 2014-06-02 17:38 ` H. Peter Anvin 0 siblings, 0 replies; 61+ messages in thread From: H. Peter Anvin @ 2014-06-02 17:38 UTC (permalink / raw) To: Waiman Long, Jason Low Cc: Mikulas Patocka, Linus Torvalds, Peter Zijlstra, jejb, deller, John David Anglin, linux-parisc, linux-kernel, paulmck, chegu_vinod, tglx, riel, akpm, davidlohr, andi, aswin, scott.norton On 06/02/2014 10:25 AM, Waiman Long wrote: > > Doing an xchg is a very expensive operation compared with ACCESS_ONCE. I > will not suggest doing that to make it right for PA-RISC at the expense > of performance in other architectures. > And of course, this gets into the toxic question: what are reasonable minimum requirements for Linux? How far do we need to stretch to support niche architectures which have very small (Linux) userbases? -hpa ^ permalink raw reply [flat|nested] 61+ messages in thread
end of thread, other threads:[~2014-06-06 15:42 UTC | newest] Thread overview: 61+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2014-06-01 17:53 [PATCH] fix a race condition in cancelable mcs spinlocks Mikulas Patocka 2014-06-01 19:20 ` Peter Zijlstra 2014-06-01 20:46 ` John David Anglin 2014-06-01 21:30 ` Peter Zijlstra 2014-06-01 21:46 ` Paul E. McKenney 2014-06-02 9:19 ` Mikulas Patocka 2014-06-02 13:24 ` Paul E. McKenney 2014-06-02 15:57 ` Mikulas Patocka 2014-06-02 16:39 ` Paul E. McKenney 2014-06-02 19:56 ` James Bottomley 2014-06-03 7:56 ` Peter Zijlstra 2014-06-04 12:53 ` Mikulas Patocka 2014-06-02 13:58 ` Mikulas Patocka 2014-06-02 14:02 ` Mikulas Patocka 2014-06-02 15:39 ` John David Anglin 2014-06-02 10:34 ` Mikulas Patocka 2014-06-02 14:14 ` Waiman Long 2014-06-02 15:27 ` Jason Low 2014-06-02 16:00 ` [PATCH v2] introduce atomic_pointer to " Mikulas Patocka 2014-06-02 16:25 ` Peter Zijlstra 2014-06-02 16:30 ` Peter Zijlstra 2014-06-02 16:46 ` Paul E. McKenney 2014-06-02 17:33 ` Waiman Long 2014-06-02 20:05 ` Peter Zijlstra 2014-06-02 20:22 ` Linus Torvalds 2014-06-02 21:02 ` Paul E. McKenney 2014-06-02 21:12 ` Linus Torvalds 2014-06-02 22:08 ` Paul E. McKenney 2014-06-02 22:44 ` Eric Dumazet 2014-06-02 23:17 ` Paul E. McKenney 2014-06-02 23:53 ` Eric Dumazet 2014-06-03 0:28 ` Paul E. McKenney 2014-06-02 22:55 ` Linus Torvalds 2014-06-03 16:48 ` Paul E. McKenney 2014-06-03 7:54 ` Peter Zijlstra 2014-06-02 20:24 ` James Bottomley 2014-06-02 16:43 ` Paul E. McKenney 2014-06-02 17:14 ` James Bottomley 2014-06-02 17:29 ` Waiman Long 2014-06-02 17:09 ` Linus Torvalds 2014-06-02 17:12 ` Davidlohr Bueso 2014-06-02 17:42 ` Waiman Long 2014-06-02 20:46 ` Mikulas Patocka 2014-06-02 20:53 ` Linus Torvalds 2014-06-02 21:12 ` Mikulas Patocka 2014-06-03 7:36 ` Peter Zijlstra 2014-06-03 11:14 ` Mikulas Patocka 2014-06-03 13:24 ` Peter Zijlstra 2014-06-03 14:18 ` Mikulas Patocka 2014-06-03 14:07 ` Paul E. McKenney 2014-06-03 15:09 ` Peter Zijlstra 2014-06-03 15:56 ` Paul E. McKenney 2014-06-03 7:20 ` Peter Zijlstra 2014-06-02 21:03 ` Paul E. McKenney 2014-06-06 15:06 ` Peter Zijlstra 2014-06-06 15:15 ` Paul E. McKenney 2014-06-06 15:42 ` Davidlohr Bueso 2014-06-02 16:50 ` Jason Low 2014-06-02 17:03 ` Paul E. McKenney 2014-06-02 17:25 ` Waiman Long 2014-06-02 17:38 ` H. Peter Anvin
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox