* [PATCH net-next] af_unix: fix a fatal race with bit fields
@ 2013-05-01 1:12 Eric Dumazet
  2013-05-01 1:39 ` Benjamin Herrenschmidt
  2013-05-01 1:51 ` Anton Blanchard
  0 siblings, 2 replies; 23+ messages in thread
From: Eric Dumazet @ 2013-05-01 1:12 UTC
To: David Miller
Cc: netdev, Benjamin Herrenschmidt, Paul Mackerras, Ambrose Feinstein, linuxppc-dev

From: Eric Dumazet <edumazet@google.com>

Using bit fields is dangerous on ppc64, as the compiler uses 64bit
instructions to manipulate them. If the 64bit word includes any
atomic_t or spinlock_t, we can lose critical concurrent changes.

This is happening in af_unix, where unix_sk(sk)->gc_candidate/
gc_maybe_cycle/lock share the same 64bit word.

This leads to fatal deadlock, as one/several cpus spin forever
on a spinlock that will never be available again.

Reported-by: Ambrose Feinstein <ambrose@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
---

Could ppc64 experts confirm using bytes is safe, or should we really add
a 32bit hole after the spinlock? If so, I wonder how many other places
need a change...

 include/net/af_unix.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/net/af_unix.h b/include/net/af_unix.h
index a8836e8..4520a23f 100644
--- a/include/net/af_unix.h
+++ b/include/net/af_unix.h
@@ -57,8 +57,8 @@ struct unix_sock {
 	struct list_head link;
 	atomic_long_t inflight;
 	spinlock_t lock;
-	unsigned int gc_candidate : 1;
-	unsigned int gc_maybe_cycle : 1;
+	unsigned char gc_candidate;
+	unsigned char gc_maybe_cycle;
 	unsigned char recursion_level;
 	struct socket_wq peer_wq;
 };

^ permalink raw reply related [flat|nested] 23+ messages in thread
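The layout problem described in the patch can be checked from userspace. This is a minimal sketch, assuming an LP64 target (such as x86-64 or ppc64) built with GCC. `struct buggy_layout`, `struct fixed_layout` and both helpers are hypothetical names: the first struct is modeled on the `try.c` test case Eric posts further down the thread, and the second on the v2 patch. The sketch locates the byte that holds `f1` and checks whether it falls in the same naturally aligned 8-byte word as `lock`. That is exactly the word a 64-bit load/store pair would rewrite.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical layout modeled on the gcc test case in this thread:
 * a 32-bit "lock" followed by two 1-bit fields in the next 32-bit unit. */
struct buggy_layout {
	unsigned int lock;
	unsigned int f1 : 1;
	unsigned int f2 : 1;
	void *ptr;
};

/* Hypothetical layout modeled on the v2 fix: flags get their own long. */
struct fixed_layout {
	unsigned int lock;
	unsigned char recursion_level;
	unsigned long gc_flags;
};

/* Return 1 if byte offsets a and b lie in the same naturally aligned
 * 8-byte word, i.e. one 64-bit load/store would touch both. */
static int same_64bit_word(size_t a, size_t b)
{
	return a / 8 == b / 8;
}

/* offsetof() cannot be applied to a bit-field, so find f1 empirically:
 * zero the struct, set f1, and report the first byte that changed. */
static size_t f1_byte_offset(void)
{
	struct buggy_layout s;
	unsigned char raw[sizeof s];
	size_t i;

	memset(&s, 0, sizeof s);
	s.f1 = 1;
	memcpy(raw, &s, sizeof s);
	for (i = 0; i < sizeof s; i++)
		if (raw[i] != 0)
			return i;
	return (size_t)-1;
}
```

On typical LP64 ABIs, `f1` lands in byte 4. That puts it inside bytes 0..7 together with `lock`. In the v2-style layout, `gc_flags` starts a word of its own.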
* Re: [PATCH net-next] af_unix: fix a fatal race with bit fields
From: Benjamin Herrenschmidt @ 2013-05-01 1:39 UTC
To: Eric Dumazet
Cc: David Miller, netdev, Paul Mackerras, Ambrose Feinstein, linuxppc-dev

On Tue, 2013-04-30 at 18:12 -0700, Eric Dumazet wrote:
> From: Eric Dumazet <edumazet@google.com>
>
> Using bit fields is dangerous on ppc64, as the compiler uses 64bit
> instructions to manipulate them. If the 64bit word includes any
> atomic_t or spinlock_t, we can lose critical concurrent changes.
>
> This is happening in af_unix, where unix_sk(sk)->gc_candidate/
> gc_maybe_cycle/lock share the same 64bit word.
>
> This leads to fatal deadlock, as one/several cpus spin forever
> on a spinlock that will never be available again.
>
> Reported-by: Ambrose Feinstein <ambrose@google.com>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> Cc: Paul Mackerras <paulus@samba.org>
> ---
>
> Could ppc64 experts confirm using bytes is safe, or should we really add
> a 32bit hole after the spinlock? If so, I wonder how many other places
> need a change...

Wow, nice one!

I'm not even completely certain bytes are safe to be honest, though
probably more than bitfields. I'll poke our compiler people.

The worry is of course how many more of these do we potentially have?
We might be able to automate finding these issues with sparse, I
suppose.

Also I'd be surprised if ppc64 is the only one with that problem... what
about sparc64 and arm64?

Cheers,
Ben.
>  include/net/af_unix.h | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/include/net/af_unix.h b/include/net/af_unix.h
> index a8836e8..4520a23f 100644
> --- a/include/net/af_unix.h
> +++ b/include/net/af_unix.h
> @@ -57,8 +57,8 @@ struct unix_sock {
>  	struct list_head link;
>  	atomic_long_t inflight;
>  	spinlock_t lock;
> -	unsigned int gc_candidate : 1;
> -	unsigned int gc_maybe_cycle : 1;
> +	unsigned char gc_candidate;
> +	unsigned char gc_maybe_cycle;
>  	unsigned char recursion_level;
>  	struct socket_wq peer_wq;
>  };
* Re: [PATCH net-next] af_unix: fix a fatal race with bit fields
From: David Miller @ 2013-05-01 7:36 UTC
To: benh; +Cc: eric.dumazet, netdev, paulus, ambrose, linuxppc-dev

From: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Date: Wed, 01 May 2013 11:39:53 +1000

> I'm not even completely certain bytes are safe to be honest, though
> probably more than bitfields. I'll poke our compiler people.

Older Alpha only has 32-bit and 64-bit loads and stores, so byte-sized
accesses are not atomic, and therefore use racy read-modify-write
sequences.
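To make the Alpha point concrete, here is a minimal C model (not real Alpha code) of a byte store that the compiler has to emulate with a 32-bit read-modify-write. The model is replayed against one unlucky interleaving with another CPU. The function names are hypothetical. "Lanes" are bytes of the integer value (lane i = bits 8*i..8*i+7), which sidesteps endianness.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: without byte stores, "store one byte" becomes
 * load the aligned 32-bit word, merge the byte, store the whole word. */
struct rmw_ctx {
	uint32_t snapshot;	/* word value seen by the load step */
};

static void byte_store_load(struct rmw_ctx *c, const uint32_t *word)
{
	c->snapshot = *word;
}

static void byte_store_commit(const struct rmw_ctx *c, uint32_t *word,
			      unsigned lane, uint8_t value)
{
	uint32_t shift = 8u * lane;

	/* Writes back all four lanes, including ones that may be stale. */
	*word = (c->snapshot & ~(0xffu << shift)) | ((uint32_t)value << shift);
}

/* CPU A sets lane 1 (a flag) with the emulated byte store; CPU B sets
 * lane 0 (a lock byte) between A's load and A's store. */
static uint32_t replay_lost_update(void)
{
	uint32_t word = 0;
	struct rmw_ctx cpu_a;

	byte_store_load(&cpu_a, &word);		/* A: load word           */
	word = (word & ~0xffu) | 0x01u;		/* B: take lock (lane 0)  */
	byte_store_commit(&cpu_a, &word, 1, 1);	/* A: store merged word   */
	return word;
}

/* Same updates with a true byte store: only lane 1 is written. */
static uint32_t replay_with_byte_store(void)
{
	uint32_t word = 0;

	word = (word & ~0xffu) | 0x01u;			/* B: take lock  */
	word = (word & ~(0xffu << 8)) | (1u << 8);	/* A: lane 1 only */
	return word;
}
```

With the emulated store, B's lock update is silently undone (the result is `0x100`). With a real byte store, both updates survive (`0x101`).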
* Re: [PATCH net-next] af_unix: fix a fatal race with bit fields
From: Benjamin Herrenschmidt @ 2013-05-01 8:08 UTC
To: David Miller; +Cc: netdev, linuxppc-dev, paulus, ambrose, eric.dumazet

On Wed, 2013-05-01 at 03:36 -0400, David Miller wrote:
> From: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> Date: Wed, 01 May 2013 11:39:53 +1000
>
> > I'm not even completely certain bytes are safe to be honest, though
> > probably more than bitfields. I'll poke our compiler people.
>
> Older Alpha only has 32-bit and 64-bit loads and stores, so byte-sized
> accesses are not atomic, and therefore use racy read-modify-write
> sequences.

In this case it depends on whether the compiler will "choose" the
smaller (32-bit) size, which hopefully won't overlap with the
atomic/lock provided the latter is aligned... lots of ifs here, makes
me nervous...

At least the bytes seem to fix it for ppc64 so far...

It would make me feel generally better if we could get gcc to guarantee
us to always use the smallest access size that encompasses the whole
bitfield (or at least not go larger than int when the bitfield is
defined as unsigned int). This would take care of all the cases we
haven't spotted yet (hopefully).

For all intents and purposes those two fields are bits of an unsigned
int, why the heck would the compiler use a larger access size anyway?

I seem to recall that we have other places where such an assumption is
made that ints are accessed atomically, and Linus stating in the past
that a compiler doing anything else was not worth bothering with. I
don't see why bitfields of such int would be an exception to that rule
(though again, this is probably not a rule stated in the standard...
oh well).

/me goes to have a glass of wine and not think about this until
tomorrow.

Cheers,
Ben.
* [PATCH v2 net-next] af_unix: fix a fatal race with bit fields
From: Eric Dumazet @ 2013-05-01 15:24 UTC
To: David Miller; +Cc: benh, netdev, paulus, ambrose, linuxppc-dev

On Wed, 2013-05-01 at 03:36 -0400, David Miller wrote:
> From: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> Date: Wed, 01 May 2013 11:39:53 +1000
>
> > I'm not even completely certain bytes are safe to be honest, though
> > probably more than bitfields. I'll poke our compiler people.
>
> Older Alpha only has 32-bit and 64-bit loads and stores, so byte-sized
> accesses are not atomic, and therefore use racy read-modify-write
> sequences.

Right, so what about the following more general fix?

Thanks!

[PATCH v2] af_unix: fix a fatal race with bit fields

Using bit fields is dangerous on ppc64/sparc64, as the compiler [1]
uses 64bit instructions to manipulate them.
If the 64bit word includes any atomic_t or spinlock_t, we can lose
critical concurrent changes.

This is happening in af_unix, where unix_sk(sk)->gc_candidate/
gc_maybe_cycle/lock share the same 64bit word.

This leads to fatal deadlock, as one/several cpus spin forever
on a spinlock that will never be available again.

A safer way would be to use a long to store flags.
This way we are sure compiler/arch won't do bad things.

As we own unix_gc_lock spinlock when clearing or setting bits,
we can use the non-atomic __set_bit()/__clear_bit().

recursion_level can share the same 64bit location with the spinlock,
as it is set only with this spinlock held.
[1] bug fixed in gcc-4.8.0:
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52080

Reported-by: Ambrose Feinstein <ambrose@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
---
 include/net/af_unix.h |  5 +++--
 net/unix/garbage.c    | 12 ++++++------
 2 files changed, 9 insertions(+), 8 deletions(-)

diff --git a/include/net/af_unix.h b/include/net/af_unix.h
index a8836e8..dbdfd2b 100644
--- a/include/net/af_unix.h
+++ b/include/net/af_unix.h
@@ -57,9 +57,10 @@ struct unix_sock {
 	struct list_head link;
 	atomic_long_t inflight;
 	spinlock_t lock;
-	unsigned int gc_candidate : 1;
-	unsigned int gc_maybe_cycle : 1;
 	unsigned char recursion_level;
+	unsigned long gc_flags;
+#define UNIX_GC_CANDIDATE 0
+#define UNIX_GC_MAYBE_CYCLE 1
 	struct socket_wq peer_wq;
 };
 #define unix_sk(__sk) ((struct unix_sock *)__sk)
diff --git a/net/unix/garbage.c b/net/unix/garbage.c
index d0f6545..9c6cc08 100644
--- a/net/unix/garbage.c
+++ b/net/unix/garbage.c
@@ -185,7 +185,7 @@ static void scan_inflight(struct sock *x, void (*func)(struct unix_sock *),
 					 * have been added to the queues after
 					 * starting the garbage collection
 					 */
-					if (u->gc_candidate) {
+					if (test_bit(UNIX_GC_CANDIDATE, &u->gc_flags)) {
 						hit = true;
 						func(u);
 					}
@@ -254,7 +254,7 @@ static void inc_inflight_move_tail(struct unix_sock *u)
 	 * of the list, so that it's checked even if it was already
 	 * passed over
 	 */
-	if (u->gc_maybe_cycle)
+	if (test_bit(UNIX_GC_MAYBE_CYCLE, &u->gc_flags))
 		list_move_tail(&u->link, &gc_candidates);
 }
 
@@ -315,8 +315,8 @@ void unix_gc(void)
 		BUG_ON(total_refs < inflight_refs);
 		if (total_refs == inflight_refs) {
 			list_move_tail(&u->link, &gc_candidates);
-			u->gc_candidate = 1;
-			u->gc_maybe_cycle = 1;
+			__set_bit(UNIX_GC_CANDIDATE, &u->gc_flags);
+			__set_bit(UNIX_GC_MAYBE_CYCLE, &u->gc_flags);
 		}
 	}
 
@@ -344,7 +344,7 @@ void unix_gc(void)
 
 		if (atomic_long_read(&u->inflight) > 0) {
 			list_move_tail(&u->link, &not_cycle_list);
-			u->gc_maybe_cycle = 0;
+			__clear_bit(UNIX_GC_MAYBE_CYCLE, &u->gc_flags);
 			scan_children(&u->sk, inc_inflight_move_tail, NULL);
 		}
 	}
@@ -356,7 +356,7 @@ void unix_gc(void)
 	 */
 	while (!list_empty(&not_cycle_list)) {
 		u = list_entry(not_cycle_list.next, struct unix_sock, link);
-		u->gc_candidate = 0;
+		__clear_bit(UNIX_GC_CANDIDATE, &u->gc_flags);
 		list_move_tail(&u->link, &gc_inflight_list);
 	}
 
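The flag handling in the v2 patch can be mirrored in userspace to show what the operations do. This is a minimal sketch. `sketch_set_bit()` and its companions are hypothetical stand-ins that take their arguments in the same order as the kernel's `__set_bit()`, `__clear_bit()` and `test_bit()`; they are not those functions' implementations. The key property is that every update reads and writes one whole, naturally aligned `unsigned long` that holds nothing but flags, so no neighboring field (such as the spinlock) can be clobbered. The non-atomic variants are still correct only because every writer holds `unix_gc_lock`.

```c
#include <assert.h>
#include <limits.h>

#define SKETCH_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Bit numbers as defined in the v2 patch. */
#define UNIX_GC_CANDIDATE	0
#define UNIX_GC_MAYBE_CYCLE	1

/* Hypothetical non-atomic helpers: plain read-modify-write of a whole
 * unsigned long. Safe only if all writers hold the same lock. */
static void sketch_set_bit(unsigned nr, unsigned long *addr)
{
	addr[nr / SKETCH_BITS_PER_LONG] |= 1UL << (nr % SKETCH_BITS_PER_LONG);
}

static void sketch_clear_bit(unsigned nr, unsigned long *addr)
{
	addr[nr / SKETCH_BITS_PER_LONG] &= ~(1UL << (nr % SKETCH_BITS_PER_LONG));
}

static int sketch_test_bit(unsigned nr, const unsigned long *addr)
{
	return (addr[nr / SKETCH_BITS_PER_LONG] >> (nr % SKETCH_BITS_PER_LONG)) & 1UL;
}
```

As a usage example, mirroring `unix_gc()`: mark a socket as a candidate that may be part of a cycle, then clear only `UNIX_GC_MAYBE_CYCLE` once it turns out to still have external references.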
* RE: [PATCH v2 net-next] af_unix: fix a fatal race with bit fields
From: David Laight @ 2013-05-01 15:53 UTC
To: Eric Dumazet, David Miller; +Cc: benh, netdev, paulus, ambrose, linuxppc-dev

> diff --git a/include/net/af_unix.h b/include/net/af_unix.h
> index a8836e8..dbdfd2b 100644
> --- a/include/net/af_unix.h
> +++ b/include/net/af_unix.h
> @@ -57,9 +57,10 @@ struct unix_sock {
>  	struct list_head link;
>  	atomic_long_t inflight;
>  	spinlock_t lock;
> -	unsigned int gc_candidate : 1;
> -	unsigned int gc_maybe_cycle : 1;
>  	unsigned char recursion_level;
> +	unsigned long gc_flags;
> +#define UNIX_GC_CANDIDATE 0
> +#define UNIX_GC_MAYBE_CYCLE 1
>  	struct socket_wq peer_wq;
>  };

Why not just change gc_candidate and gc_maybe_cycle to
unsigned char?
It might reduce the number of pad bytes somewhat.

David
* RE: [PATCH v2 net-next] af_unix: fix a fatal race with bit fields
From: Eric Dumazet @ 2013-05-01 16:00 UTC
To: David Laight; +Cc: David Miller, benh, netdev, paulus, ambrose, linuxppc-dev

On Wed, 2013-05-01 at 16:53 +0100, David Laight wrote:

> Why not just change gc_candidate and gc_maybe_cycle to
> unsigned char?
> It might reduce the number of pad bytes somewhat.

You didn't quite follow the discussion.

I used bytes in v1, and we are not 100% sure it's safe on all arches.

unsigned long is guaranteed to be safe. We absolutely rely on this.

Better to use more bytes on a socket (with no impact on real memory
use) than to spend hours debugging some strange issues.
* Re: [PATCH v2 net-next] af_unix: fix a fatal race with bit fields
From: David Miller @ 2013-05-01 19:14 UTC
To: eric.dumazet; +Cc: benh, netdev, paulus, ambrose, linuxppc-dev

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Wed, 01 May 2013 08:24:03 -0700

> [PATCH v2] af_unix: fix a fatal race with bit fields
>
> Using bit fields is dangerous on ppc64/sparc64, as the compiler [1]
> uses 64bit instructions to manipulate them.
> If the 64bit word includes any atomic_t or spinlock_t, we can lose
> critical concurrent changes.
>
> This is happening in af_unix, where unix_sk(sk)->gc_candidate/
> gc_maybe_cycle/lock share the same 64bit word.
>
> This leads to fatal deadlock, as one/several cpus spin forever
> on a spinlock that will never be available again.
>
> A safer way would be to use a long to store flags.
> This way we are sure compiler/arch won't do bad things.
>
> As we own unix_gc_lock spinlock when clearing or setting bits,
> we can use the non-atomic __set_bit()/__clear_bit().
>
> recursion_level can share the same 64bit location with the spinlock,
> as it is set only with this spinlock held.
>
> [1] bug fixed in gcc-4.8.0:
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52080
>
> Reported-by: Ambrose Feinstein <ambrose@google.com>
> Signed-off-by: Eric Dumazet <edumazet@google.com>

Applied and queued up for -stable, thanks Eric.
* Re: [PATCH net-next] af_unix: fix a fatal race with bit fields
From: Ben Hutchings @ 2013-05-01 12:08 UTC
To: Benjamin Herrenschmidt
Cc: Eric Dumazet, David Miller, netdev, Paul Mackerras, Ambrose Feinstein, linuxppc-dev

On Wed, 2013-05-01 at 11:39 +1000, Benjamin Herrenschmidt wrote:
> On Tue, 2013-04-30 at 18:12 -0700, Eric Dumazet wrote:
> > From: Eric Dumazet <edumazet@google.com>
> >
> > Using bit fields is dangerous on ppc64, as the compiler uses 64bit
> > instructions to manipulate them. If the 64bit word includes any
> > atomic_t or spinlock_t, we can lose critical concurrent changes.
> >
> > This is happening in af_unix, where unix_sk(sk)->gc_candidate/
> > gc_maybe_cycle/lock share the same 64bit word.
> >
> > This leads to fatal deadlock, as one/several cpus spin forever
> > on a spinlock that will never be available again.
> >
> > Reported-by: Ambrose Feinstein <ambrose@google.com>
> > Signed-off-by: Eric Dumazet <edumazet@google.com>
> > Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> > Cc: Paul Mackerras <paulus@samba.org>
> > ---
> >
> > Could ppc64 experts confirm using bytes is safe, or should we really add
> > a 32bit hole after the spinlock? If so, I wonder how many other places
> > need a change...
>
> Wow, nice one!
>
> I'm not even completely certain bytes are safe to be honest, though
> probably more than bitfields. I'll poke our compiler people.

There is a longstanding and hard-to-fix bug in gcc that is specific to
bitfields. I think that the underlying type isn't propagated, so when
it comes to code generation the compiler doesn't know the natural width
for the memory access.

As for bytes - early Alphas couldn't load/store less than 32 bits, but
I doubt anyone cares any more.
> The worry is of course how many more of these do we potentially have?
> We might be able to automate finding these issues with sparse, I
> suppose.
>
> Also I'd be surprised if ppc64 is the only one with that problem... what
> about sparc64 and arm64?

I expect they can have the same general problem, but gcc may be more or
less keen to generate 64-bit load/store instructions for bitfields on
different architectures.

Ben.

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.
* RE: [PATCH net-next] af_unix: fix a fatal race with bit fields
From: David Laight @ 2013-05-03 14:29 UTC
To: Benjamin Herrenschmidt, Eric Dumazet
Cc: David Miller, netdev, Paul Mackerras, Ambrose Feinstein, linuxppc-dev

> > Could ppc64 experts confirm using bytes is safe, or should we really add
> > a 32bit hole after the spinlock? If so, I wonder how many other places
> > need a change...
...
> Also I'd be surprised if ppc64 is the only one with that problem... what
> about sparc64 and arm64?

Even x86 could be affected.
The width of the memory cycles used by the 'bit set and bit clear'
instructions isn't documented. They are certainly allowed to do
RMW on adjacent bytes.
I don't remember whether they are constrained to only do
32bit accesses, but nothing used to say that they wouldn't
do 32bit misaligned ones! (although I suspect they never have).

David
* RE: [PATCH net-next] af_unix: fix a fatal race with bit fields
From: Eric Dumazet @ 2013-05-03 15:02 UTC
To: David Laight
Cc: Benjamin Herrenschmidt, David Miller, netdev, Paul Mackerras, Ambrose Feinstein, linuxppc-dev

On Fri, 2013-05-03 at 15:29 +0100, David Laight wrote:
> > > Could ppc64 experts confirm using bytes is safe, or should we really add
> > > a 32bit hole after the spinlock? If so, I wonder how many other places
> > > need a change...
> ...
> > Also I'd be surprised if ppc64 is the only one with that problem... what
> > about sparc64 and arm64?
>
> Even x86 could be affected.
> The width of the memory cycles used by the 'bit set and bit clear'
> instructions isn't documented. They are certainly allowed to do
> RMW on adjacent bytes.
> I don't remember whether they are constrained to only do
> 32bit accesses, but nothing used to say that they wouldn't
> do 32bit misaligned ones! (although I suspect they never have).

x86 is not affected (or else we would have found the bug much earlier).

Setting a 1-bit field to one/zero uses OR/AND instructions.

	orb $4,724(%reg)

doesn't load/store 64 bits but 8 bits.
* RE: [PATCH net-next] af_unix: fix a fatal race with bit fields
From: David Laight @ 2013-05-03 15:44 UTC
To: Eric Dumazet
Cc: Benjamin Herrenschmidt, David Miller, netdev, Paul Mackerras, Ambrose Feinstein, linuxppc-dev

> > > Also I'd be surprised if ppc64 is the only one with that problem... what
> > > about sparc64 and arm64?
> >
> > Even x86 could be affected.
> > The width of the memory cycles used by the 'bit set and bit clear'
> > instructions isn't documented. They are certainly allowed to do
> > RMW on adjacent bytes.
> > I don't remember whether they are constrained to only do
> > 32bit accesses, but nothing used to say that they wouldn't
> > do 32bit misaligned ones! (although I suspect they never have).
>
> x86 is not affected (or else we would have found the bug much earlier).
>
> Setting a 1-bit field to one/zero uses OR/AND instructions.
>
> 	orb $4,724(%reg)
>
> doesn't load/store 64 bits but 8 bits.

I was thinking of code that might be using BT, BTC, BTR or BTS.
These are probably used with the 'lock' prefix - which would
(I think) make them safe.

The documented constraint is more specific than it used to be;
the Intel version reads:

  When accessing a bit in memory, the processor may access 4 bytes
  starting from the memory address for a 32-bit operand size, using
  by the following relationship:

      Effective Address + (4 ∗ (BitOffset DIV 32))

  Or, it may access 2 bytes starting from the memory address for a
  16-bit operand, using this relationship:

      Effective Address + (2 ∗ (BitOffset DIV 16))

  It may do so even when only a single byte needs to be accessed to
  reach the given bit. When using this bit addressing mechanism,
  software should avoid referencing areas of memory close to address
  space holes. In particular, it should avoid references to
  memory-mapped I/O registers. Instead, software should use the MOV
  instructions to load from or store to these addresses, and use the
  register form of these instructions to manipulate the data.

  In 64-bit mode, the instruction’s default operation size is 32 bits.
  Using a REX prefix in the form of REX.R permits access to additional
  registers (R8-R15). Using a REX prefix in the form of REX.W promotes
  operation to 64 bit operands.

David
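The quoted relationship can be turned into a small calculation of which bytes a bit-test instruction with a register bit offset may touch. This is a minimal sketch. The struct and function names are hypothetical, and only non-negative bit offsets are modeled. The quoted text gives formulas only for 16- and 32-bit operands, so other operand sizes are not covered.

```c
#include <assert.h>
#include <stdint.h>

/* Byte window [start, start + size) that BT/BTS/BTR/BTC with a memory
 * operand may access, per the quoted Intel relationship. operand_bits
 * is 16 or 32 (the only cases the quoted text gives); the bit offset
 * is assumed non-negative. */
struct bt_window {
	uint64_t start;
	unsigned size;
};

static struct bt_window bt_access_window(uint64_t ea, uint64_t bit_offset,
					 unsigned operand_bits)
{
	struct bt_window w;
	unsigned bytes = operand_bits / 8;	/* 2 or 4 */

	/* Effective Address + (bytes * (BitOffset DIV operand_bits)) */
	w.start = ea + (uint64_t)bytes * (bit_offset / operand_bits);
	w.size = bytes;
	return w;
}

/* Does the window overlap the byte range [lo, lo + len)? */
static int bt_window_overlaps(struct bt_window w, uint64_t lo, uint64_t len)
{
	return w.start < lo + len && lo < w.start + w.size;
}
```

For example, a 32-bit BTS aimed at bit 3 of a flag byte at 0x1006 may access 0x1006..0x1009. If a lock starts at 0x1008, the access may therefore reach the lock's bytes, even though only one byte holds the target bit. This is the "may access 4 bytes ... even when only a single byte needs to be accessed" case from the quote.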
* Re: [PATCH net-next] af_unix: fix a fatal race with bit fields
From: Anton Blanchard @ 2013-05-01 1:51 UTC
To: Eric Dumazet
Cc: David Miller, netdev, linuxppc-dev, Paul Mackerras, Ambrose Feinstein, amodra

Hi Eric,

> From: Eric Dumazet <edumazet@google.com>
>
> Using bit fields is dangerous on ppc64, as the compiler uses 64bit
> instructions to manipulate them. If the 64bit word includes any
> atomic_t or spinlock_t, we can lose critical concurrent changes.
>
> This is happening in af_unix, where unix_sk(sk)->gc_candidate/
> gc_maybe_cycle/lock share the same 64bit word.
>
> This leads to fatal deadlock, as one/several cpus spin forever
> on a spinlock that will never be available again.

I just spoke to Alan Modra and he suspects this is a compiler
bug. Can you give us your compiler version info?

Anton

> Reported-by: Ambrose Feinstein <ambrose@google.com>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> Cc: Paul Mackerras <paulus@samba.org>
> ---
>
> Could ppc64 experts confirm using bytes is safe, or should we really
> add a 32bit hole after the spinlock? If so, I wonder how many other
> places need a change...
>
>  include/net/af_unix.h | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/include/net/af_unix.h b/include/net/af_unix.h
> index a8836e8..4520a23f 100644
> --- a/include/net/af_unix.h
> +++ b/include/net/af_unix.h
> @@ -57,8 +57,8 @@ struct unix_sock {
>  	struct list_head link;
>  	atomic_long_t inflight;
>  	spinlock_t lock;
> -	unsigned int gc_candidate : 1;
> -	unsigned int gc_maybe_cycle : 1;
> +	unsigned char gc_candidate;
> +	unsigned char gc_maybe_cycle;
>  	unsigned char recursion_level;
>  	struct socket_wq peer_wq;
>  };
* Re: [PATCH net-next] af_unix: fix a fatal race with bit fields
From: Eric Dumazet @ 2013-05-01 2:24 UTC
To: Anton Blanchard
Cc: David Miller, netdev, linuxppc-dev, Paul Mackerras, Ambrose Feinstein, amodra

On Wed, 2013-05-01 at 11:51 +1000, Anton Blanchard wrote:
> Hi Eric,
>
> > From: Eric Dumazet <edumazet@google.com>
> >
> > Using bit fields is dangerous on ppc64, as the compiler uses 64bit
> > instructions to manipulate them. If the 64bit word includes any
> > atomic_t or spinlock_t, we can lose critical concurrent changes.
> >
> > This is happening in af_unix, where unix_sk(sk)->gc_candidate/
> > gc_maybe_cycle/lock share the same 64bit word.
> >
> > This leads to fatal deadlock, as one/several cpus spin forever
> > on a spinlock that will never be available again.
>
> I just spoke to Alan Modra and he suspects this is a compiler
> bug. Can you give us your compiler version info?

$ gcc-4.6.3-nolibc/powerpc64-linux/bin/powerpc64-linux-gcc -v
Using built-in specs.
COLLECT_GCC=gcc-4.6.3-nolibc/powerpc64-linux/bin/powerpc64-linux-gcc
COLLECT_LTO_WRAPPER=/usr/local/google/home/edumazet/cross/gcc-4.6.3-nolibc/powerpc64-linux/bin/../libexec/gcc/powerpc64-linux/4.6.3/lto-wrapper
Target: powerpc64-linux
Configured with: /home/tony/buildall/src/gcc/configure --target=powerpc64-linux --host=x86_64-linux-gnu --build=x86_64-linux-gnu --enable-targets=all --prefix=/opt/cross/gcc-4.6.3-nolibc/powerpc64-linux/ --enable-languages=c --with-newlib --without-headers --enable-sjlj-exceptions --with-system-libunwind --disable-nls --disable-threads --disable-shared --disable-libmudflap --disable-libssp --disable-libgomp --disable-decimal-float --enable-checking=release --with-mpfr=/home/tony/buildall/src/sys-x86_64 --with-gmp=/home/tony/buildall/src/sys-x86_64 --disable-bootstrap --disable-libquadmath
Thread model: single
gcc version 4.6.3 (GCC)

$ cat try.c ; gcc-4.6.3-nolibc/powerpc64-linux/bin/powerpc64-linux-gcc -O2 -S try.c ; cat try.s
struct s {
	unsigned int lock;
	unsigned int f1 : 1;
	unsigned int f2 : 1;
	void *ptr;
} *p ;

showbug()
{
	p->lock++;
	p->f1 = 1;
}
	.file "try.c"
	.section ".toc","aw"
	.section ".text"
	.section ".toc","aw"
.LC0:
	.tc p[TC],p
	.section ".text"
	.align 2
	.globl showbug
	.section ".opd","aw"
	.align 3
showbug:
	.quad .L.showbug,.TOC.@tocbase,0
	.previous
	.type showbug, @function
.L.showbug:
	addis 9,2,.LC0@toc@ha
	ld 9,.LC0@toc@l(9)
	ld 9,0(9)
	lwz 11,0(9)
	addi 0,11,1
	stw 0,0(9)
	li 11,1
	ld 0,0(9)
	rldimi 0,11,31,32
	std 0,0(9)
	blr
	.long 0
	.byte 0,0,0,0,0,0,0,0
	.size showbug,.-.L.showbug
	.comm p,8,8
	.ident "GCC: (GNU) 4.6.3"

You can see "ld 0,0(9)" is used: it's a 64-bit load.
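The `ld` / `rldimi` / `std` sequence above can be replayed step by step in plain C to see how the deadlock arises. This is a minimal model, assuming big-endian ppc64. There, `lock` occupies the upper 32 bits of the doubleword loaded by `ld 0,0(9)`, and `rldimi 0,11,31,32` sets bit 31 of the lower half, where `f1` lives. The "lock" is modeled as a plain 32-bit value rather than the real spinlock encoding, and the function names are hypothetical. In the replay, CPU B releases the lock between CPU A's `ld` and `std`. The stale doubleword that A writes back marks the lock as held again, so nobody can take it afterwards.

```c
#include <assert.h>
#include <stdint.h>

/* Model of the doubleword at &p->lock on big-endian ppc64:
 * upper 32 bits = lock, lower 32 bits = bit-field unit (f1 = bit 31). */
#define F1_MASK 0x80000000ULL

static uint32_t lock_of(uint64_t dw)
{
	return (uint32_t)(dw >> 32);
}

/* gcc 4.6.3 code for "p->f1 = 1": 64-bit load, insert, 64-bit store.
 * CPU B releases the lock in the window between A's ld and std. */
static uint64_t replay_doubleword_rmw(void)
{
	uint64_t dw = (uint64_t)1 << 32;	/* lock held by B, f1 = 0 */
	uint64_t reg;

	reg = dw;			/* A: ld     0,0(9)            */
	dw &= 0xffffffffULL;		/* B: spin_unlock -> lock = 0  */
	reg |= F1_MASK;			/* A: rldimi 0,11,31,32        */
	dw = reg;			/* A: std    0,0(9)  stale lock */
	return dw;
}

/* Same interleaving if only the 32-bit unit holding f1 is accessed
 * (lwz/stw of the lower word): B's unlock survives. */
static uint64_t replay_word_rmw(void)
{
	uint64_t dw = (uint64_t)1 << 32;
	uint32_t reg;

	reg = (uint32_t)dw;			/* A: load low word only  */
	dw &= 0xffffffffULL;			/* B: lock = 0            */
	reg |= (uint32_t)F1_MASK;		/* A: set f1              */
	dw = (dw & 0xffffffff00000000ULL) | reg;/* A: store low word only */
	return dw;
}
```

In both replays `f1` ends up set. Only the 64-bit variant resurrects the released lock.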
* Re: [PATCH net-next] af_unix: fix a fatal race with bit fields
From: Alan Modra @ 2013-05-01 3:54 UTC
To: Eric Dumazet
Cc: Anton Blanchard, David Miller, netdev, linuxppc-dev, Paul Mackerras, Ambrose Feinstein

On Tue, Apr 30, 2013 at 07:24:20PM -0700, Eric Dumazet wrote:
> 	li 11,1
> 	ld 0,0(9)
> 	rldimi 0,11,31,32
> 	std 0,0(9)
> 	blr
> 	.ident "GCC: (GNU) 4.6.3"
>
> You can see "ld 0,0(9)" is used: it's a 64-bit load.

Yup. This is not a powerpc64-specific problem. See
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52080
Fixed in 4.8.0 and 4.7.3.

-- 
Alan Modra
Australia Development Lab, IBM
* Re: [PATCH net-next] af_unix: fix a fatal race with bit fields
From: Eric Dumazet @ 2013-05-01 5:04 UTC
To: Alan Modra
Cc: Anton Blanchard, David Miller, netdev, linuxppc-dev, Paul Mackerras, Ambrose Feinstein

On Wed, 2013-05-01 at 13:24 +0930, Alan Modra wrote:
> On Tue, Apr 30, 2013 at 07:24:20PM -0700, Eric Dumazet wrote:
> > 	li 11,1
> > 	ld 0,0(9)
> > 	rldimi 0,11,31,32
> > 	std 0,0(9)
> > 	blr
> > 	.ident "GCC: (GNU) 4.6.3"
> >
> > You can see "ld 0,0(9)" is used: it's a 64-bit load.
>
> Yup. This is not a powerpc64-specific problem. See
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52080
> Fixed in 4.8.0 and 4.7.3.

Ah, thanks.

This seems a pretty serious issue; is it documented somewhere that
ppc64, sparc64 and others need such a compiler version?

These kinds of errors are pretty hard to find; it's a pity to spend time
on them.
* Re: [PATCH net-next] af_unix: fix a fatal race with bit fields
From: Stephen Hemminger @ 2013-05-01 15:10 UTC
To: Eric Dumazet
Cc: Alan Modra, Anton Blanchard, David Miller, netdev, linuxppc-dev, Paul Mackerras, Ambrose Feinstein

On Tue, 30 Apr 2013 22:04:32 -0700
Eric Dumazet <eric.dumazet@gmail.com> wrote:

> On Wed, 2013-05-01 at 13:24 +0930, Alan Modra wrote:
> > On Tue, Apr 30, 2013 at 07:24:20PM -0700, Eric Dumazet wrote:
> > > 	li 11,1
> > > 	ld 0,0(9)
> > > 	rldimi 0,11,31,32
> > > 	std 0,0(9)
> > > 	blr
> > > 	.ident "GCC: (GNU) 4.6.3"
> > >
> > > You can see "ld 0,0(9)" is used: it's a 64-bit load.
> >
> > Yup. This is not a powerpc64-specific problem. See
> > http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52080
> > Fixed in 4.8.0 and 4.7.3.
>
> Ah, thanks.
>
> This seems a pretty serious issue; is it documented somewhere that
> ppc64, sparc64 and others need such a compiler version?
>
> These kinds of errors are pretty hard to find; it's a pity to spend time
> on them.

There is a checkbin target inside arch/powerpc/Makefile.
Shouldn't a check be added there to block building the kernel with known
bad GCC versions?
* Re: [PATCH net-next] af_unix: fix a fatal race with bit fields
From: Benjamin Herrenschmidt @ 2013-05-02 21:11 UTC
To: Stephen Hemminger
Cc: Eric Dumazet, Alan Modra, Anton Blanchard, David Miller, netdev, linuxppc-dev, Paul Mackerras, Ambrose Feinstein

On Wed, 2013-05-01 at 08:10 -0700, Stephen Hemminger wrote:
> > These kinds of errors are pretty hard to find; it's a pity to spend
> > time on them.
>
> There is a checkbin target inside arch/powerpc/Makefile.
> Shouldn't a check be added there to block building the kernel with known
> bad GCC versions?

In this case that makes it all GCC versions except the *very latest*...
not practical.

I suppose we should try to make sure that at least the next batch of
enterprise distros get that fix on the gcc side.

Ben.
* Re: [PATCH net-next] af_unix: fix a fatal race with bit fields
From: Alan Modra @ 2013-05-03 1:31 UTC
To: Eric Dumazet
Cc: Anton Blanchard, David Miller, netdev, linuxppc-dev, Paul Mackerras, Ambrose Feinstein

On Tue, Apr 30, 2013 at 10:04:32PM -0700, Eric Dumazet wrote:
> These kinds of errors are pretty hard to find; it's a pity to spend time
> on them.

Well, yes. From the first comment in gcc PR52080: "For the following
testcase we generate a 8 byte RMW cycle on IA64 which causes locking
problems in the linux kernel btrfs filesystem."

Did someone fix btrfs, but not check other kernel locks? Having now
hit the same problem again, have you checked that other kernel locks
don't have adjacent bit fields in the same 64-bit word? And comment
the struct to ensure someone doesn't optimize those unsigned chars
back to bit fields.

-- 
Alan Modra
Australia Development Lab, IBM
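Alan's two suggestions (check what shares a word with a lock, and comment the struct so the flags are not turned back into bit fields) can be combined into a compile-time guard. This is a minimal C11 sketch. `struct unix_sock_sketch` is a hypothetical stand-in for the tail of `struct unix_sock` after v2, with `spinlock_t` modeled as a 4-byte `unsigned int`; debug spinlock configurations are larger. `SAME_U64_WORD` is also hypothetical. It compares only starting offsets and assumes the struct is at least 8-byte aligned. In kernel code, `BUILD_BUG_ON()` would play the role of `_Static_assert`. Such a guard only protects the fields someone thought to check; it does not find other occurrences.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the tail of struct unix_sock after v2. */
struct unix_sock_sketch {
	void *link_next, *link_prev;	/* struct list_head link */
	long inflight;			/* atomic_long_t inflight */
	unsigned int lock;		/* spinlock_t lock (modeled) */
	unsigned char recursion_level;	/* only written with lock held */
	/* gc_flags must not share a 64-bit word with lock: do not turn
	 * these flags back into bit fields (see gcc PR52080). */
	unsigned long gc_flags;
};

/* Nonzero if members a and b start in the same 8-byte word, assuming
 * the struct itself is at least 8-byte aligned. */
#define SAME_U64_WORD(type, a, b) \
	(offsetof(type, a) / 8 == offsetof(type, b) / 8)

_Static_assert(!SAME_U64_WORD(struct unix_sock_sketch, lock, gc_flags),
	       "gc_flags shares a 64-bit word with lock");
```

On LP64, `recursion_level` still shares the lock's word. That matches the v2 changelog, which accepts this because the field is written only with that spinlock held.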
* RE: [PATCH net-next] af_unix: fix a fatal race with bit fields
From: David Laight @ 2013-05-03 8:20 UTC
To: Alan Modra, Eric Dumazet
Cc: Anton Blanchard, David Miller, netdev, linuxppc-dev, Paul Mackerras, Ambrose Feinstein

> Did someone fix btrfs, but not check other kernel locks?  Having now
> hit the same problem again, have you checked that other kernel locks
> don't have adjacent bit fields in the same 64-bit word?  And comment
> the struct to ensure someone doesn't optimize those unsigned chars
> back to bit fields.

Seems a good reason to have a general policy of not using bit fields!

Separate char fields normally generate faster code - possibly at
the expense of an increase in the allocated size of a structure.

David
* Re: [PATCH net-next] af_unix: fix a fatal race with bit fields
From: Benjamin Herrenschmidt @ 2013-05-03 12:57 UTC
To: Alan Modra
Cc: Eric Dumazet, netdev, Ambrose Feinstein, Paul Mackerras, Anton Blanchard, linuxppc-dev, David Miller

On Fri, 2013-05-03 at 11:01 +0930, Alan Modra wrote:
> On Tue, Apr 30, 2013 at 10:04:32PM -0700, Eric Dumazet wrote:
> > These kind of errors are pretty hard to find, its a pity to spend time
> > on them.
>
> Well, yes.  From the first comment in gcc PR52080.  "For the following
> testcase we generate a 8 byte RMW cycle on IA64 which causes locking
> problems in the linux kernel btrfs filesystem."
>
> Did someone fix btrfs, but not check other kernel locks?  Having now
> hit the same problem again, have you checked that other kernel locks
> don't have adjacent bit fields in the same 64-bit word?  And comment
> the struct to ensure someone doesn't optimize those unsigned chars
> back to bit fields.

Unfortunately, fixing "other" kernel locks is near impossible. One could
try to grep for all spinlock_t and maybe even all atomic_t, and maybe even
write a script to spot automatically whether a bitfield appears to be
nearby (though it could be hidden behind a structure etc...), but what
about an int accessed with cmpxchg (a kernel macro doing a lwarx/stwcx.
loop on a value), for example? There are plenty of these...

I don't think we can realistically "fix" all potential occurrences of
that bug in the kernel short of getting rid of all bitfields, which isn't
going to happen any time soon.

I'm afraid this *must* be fixed at the compiler level, with as many
backports to distros as can realistically be done.

Ben.
* Re: [PATCH net-next] af_unix: fix a fatal race with bit fields
From: Eric Dumazet @ 2013-05-03 14:14 UTC
To: Alan Modra
Cc: Anton Blanchard, David Miller, netdev, linuxppc-dev, Paul Mackerras, Ambrose Feinstein

On Fri, 2013-05-03 at 11:01 +0930, Alan Modra wrote:
> On Tue, Apr 30, 2013 at 10:04:32PM -0700, Eric Dumazet wrote:
> > These kind of errors are pretty hard to find, its a pity to spend time
> > on them.
>
> Well, yes.  From the first comment in gcc PR52080.  "For the following
> testcase we generate a 8 byte RMW cycle on IA64 which causes locking
> problems in the linux kernel btrfs filesystem."
>
> Did someone fix btrfs, but not check other kernel locks?  Having now
> hit the same problem again, have you checked that other kernel locks
> don't have adjacent bit fields in the same 64-bit word?  And comment
> the struct to ensure someone doesn't optimize those unsigned chars
> back to bit fields.

Not only spinlocks, but also atomic_t followed by bit fields.

BTW, if a spinlock is followed by bit fields, but the bit fields are only
changed while this spinlock is held, there is no problem, unless the
spinlock is a ticket spinlock.

In af_unix, the bug happens because the bit fields were changed without
the spinlock being held (another, global spinlock is used instead).

(ppc64 doesn't use ticket spinlocks yet)
* Re: [PATCH net-next] af_unix: fix a fatal race with bit fields
From: Scott Wood @ 2013-05-02 17:02 UTC
To: Alan Modra
Cc: Eric Dumazet, netdev, Ambrose Feinstein, Paul Mackerras, Anton Blanchard, linuxppc-dev, David Miller

On 04/30/2013 10:54:25 PM, Alan Modra wrote:
> On Tue, Apr 30, 2013 at 07:24:20PM -0700, Eric Dumazet wrote:
> > li 11,1
> > ld 0,0(9)
> > rldimi 0,11,31,32
> > std 0,0(9)
> > blr
> > .ident "GCC: (GNU) 4.6.3"
> >
> > You can see "ld 0,0(9)" is used : its a 64 bit load.
>
> Yup.  This is not a powerpc64 specific problem.  See
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52080
> Fixed in 4.8.0 and 4.7.3.

FWIW (especially if a GCC version check is added), it seems to have been
fixed as far back as 4.7.1, not just 4.7.3.

-Scott