From: Eric Dumazet <eric.dumazet@gmail.com>
To: David Miller <davem@davemloft.net>
Cc: linuxppc-dev@lists.ozlabs.org, paulus@samba.org,
ambrose@google.com, netdev@vger.kernel.org
Subject: [PATCH v2 net-next] af_unix: fix a fatal race with bit fields
Date: Wed, 01 May 2013 08:24:03 -0700
Message-ID: <1367421843.11020.43.camel@edumazet-glaptop>
In-Reply-To: <20130501.033650.703182794549888825.davem@davemloft.net>
On Wed, 2013-05-01 at 03:36 -0400, David Miller wrote:
> From: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> Date: Wed, 01 May 2013 11:39:53 +1000
>
> > I'm not even completely certain bytes are safe to be honest, though
> > probably more than bitfields. I'll poke our compiler people.
>
> Older Alpha only has 32-bit and 64-bit loads and stores, so byte sized
> accesses are not atomic, and therefore use racey read-modify-write
> sequences.
Right, so what about the following more general fix ?
Thanks !
[PATCH v2] af_unix: fix a fatal race with bit fields
Using bit fields is dangerous on ppc64/sparc64, as the compiler [1]
uses 64bit instructions to manipulate them.
If the 64bit word includes any atomic_t or spinlock_t, we can lose
critical concurrent changes.
This is happening in af_unix, where unix_sk(sk)->gc_candidate/
gc_maybe_cycle/lock share the same 64bit word.
This leads to a fatal deadlock, as one or several CPUs spin forever
on a spinlock that will never be available again.
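
To make the lost update concrete, here is a minimal user-space sketch of
the interleaving described above. It is a single-threaded simulation, not
kernel code: `struct packed`, its field names, and `lost_unlock_demo()` are
hypothetical stand-ins for the lock and bit fields that share one 64bit
word in unix_sock.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout: a 32-bit lock word and 32 bits of flags that
 * together occupy one 64-bit word. */
struct packed {
	uint32_t lock;   /* 1 = held, 0 = free (stand-in for spinlock_t) */
	uint32_t flags;  /* stand-in for gc_candidate / gc_maybe_cycle */
};

/* Replays, step by step, a 64-bit read-modify-write of the flags on
 * "CPU A" while "CPU B" releases the lock in between. Returns the final
 * value of the lock word. */
uint32_t lost_unlock_demo(void)
{
	struct packed p = { .lock = 1, .flags = 0 };
	struct packed tmp;
	uint64_t word;

	assert(sizeof(struct packed) == sizeof(uint64_t));

	memcpy(&word, &p, sizeof(word));   /* CPU A: 64-bit load (lock == 1) */
	p.lock = 0;                        /* CPU B: releases the lock */

	memcpy(&tmp, &word, sizeof(tmp));
	tmp.flags |= 1u;                   /* CPU A: sets a flag in its stale copy */
	memcpy(&word, &tmp, sizeof(word));
	memcpy(&p, &word, sizeof(p));      /* CPU A: 64-bit store of the stale word */

	return p.lock;                     /* the release by CPU B has been undone */
}
```

In this replay the lock word ends up held again although nobody holds it,
which is the situation in which other CPUs would spin forever.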
A safer way is to store the flags in an unsigned long.
This way we are sure the compiler/arch won't do bad things.
As we own unix_gc_lock spinlock when clearing or setting bits,
we can use the non atomic __set_bit()/__clear_bit().
recursion_level can share the same 64bit location with the spinlock,
as it is set only with this spinlock held.
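
As a rough illustration of the approach, the sketch below keeps the flags
in their own unsigned long and manipulates them with plain, non-atomic
helpers. The helpers are simplified user-space stand-ins written for this
example (they handle a single word only) and are not the kernel's
__set_bit()/__clear_bit()/test_bit() implementations; the GC_* names mirror
the UNIX_GC_* defines in the patch.

```c
#include <assert.h>
#include <stdbool.h>

#define GC_CANDIDATE   0   /* mirrors UNIX_GC_CANDIDATE */
#define GC_MAYBE_CYCLE 1   /* mirrors UNIX_GC_MAYBE_CYCLE */

/* Non-atomic helpers: only safe if the caller already serializes all
 * writers, as unix_gc() does by holding unix_gc_lock. */
static void set_bit_nonatomic(int nr, unsigned long *addr)
{
	*addr |= 1UL << nr;
}

static void clear_bit_nonatomic(int nr, unsigned long *addr)
{
	*addr &= ~(1UL << nr);
}

static bool test_bit_plain(int nr, const unsigned long *addr)
{
	return (*addr >> nr) & 1UL;
}

/* Walks through the same sequence of flag updates as unix_gc() for one
 * socket that turns out not to be part of a cycle candidate set. */
unsigned long gc_flags_demo(void)
{
	unsigned long gc_flags = 0;

	set_bit_nonatomic(GC_CANDIDATE, &gc_flags);
	set_bit_nonatomic(GC_MAYBE_CYCLE, &gc_flags);
	clear_bit_nonatomic(GC_MAYBE_CYCLE, &gc_flags);

	assert(test_bit_plain(GC_CANDIDATE, &gc_flags));
	assert(!test_bit_plain(GC_MAYBE_CYCLE, &gc_flags));
	return gc_flags;
}
```

Because the flags occupy a full word of their own, a load/store of that
word never touches the neighbouring spinlock.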
[1] bug fixed in gcc-4.8.0 :
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52080
Reported-by: Ambrose Feinstein <ambrose@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
---
include/net/af_unix.h | 5 +++--
net/unix/garbage.c | 12 ++++++------
2 files changed, 9 insertions(+), 8 deletions(-)
diff --git a/include/net/af_unix.h b/include/net/af_unix.h
index a8836e8..dbdfd2b 100644
--- a/include/net/af_unix.h
+++ b/include/net/af_unix.h
@@ -57,9 +57,10 @@ struct unix_sock {
struct list_head link;
atomic_long_t inflight;
spinlock_t lock;
- unsigned int gc_candidate : 1;
- unsigned int gc_maybe_cycle : 1;
unsigned char recursion_level;
+ unsigned long gc_flags;
+#define UNIX_GC_CANDIDATE 0
+#define UNIX_GC_MAYBE_CYCLE 1
struct socket_wq peer_wq;
};
#define unix_sk(__sk) ((struct unix_sock *)__sk)
diff --git a/net/unix/garbage.c b/net/unix/garbage.c
index d0f6545..9c6cc08 100644
--- a/net/unix/garbage.c
+++ b/net/unix/garbage.c
@@ -185,7 +185,7 @@ static void scan_inflight(struct sock *x, void (*func)(struct unix_sock *),
* have been added to the queues after
* starting the garbage collection
*/
- if (u->gc_candidate) {
+ if (test_bit(UNIX_GC_CANDIDATE, &u->gc_flags)) {
hit = true;
func(u);
}
@@ -254,7 +254,7 @@ static void inc_inflight_move_tail(struct unix_sock *u)
* of the list, so that it's checked even if it was already
* passed over
*/
- if (u->gc_maybe_cycle)
+ if (test_bit(UNIX_GC_MAYBE_CYCLE, &u->gc_flags))
list_move_tail(&u->link, &gc_candidates);
}
@@ -315,8 +315,8 @@ void unix_gc(void)
BUG_ON(total_refs < inflight_refs);
if (total_refs == inflight_refs) {
list_move_tail(&u->link, &gc_candidates);
- u->gc_candidate = 1;
- u->gc_maybe_cycle = 1;
+ __set_bit(UNIX_GC_CANDIDATE, &u->gc_flags);
+ __set_bit(UNIX_GC_MAYBE_CYCLE, &u->gc_flags);
}
}
@@ -344,7 +344,7 @@ void unix_gc(void)
if (atomic_long_read(&u->inflight) > 0) {
list_move_tail(&u->link, &not_cycle_list);
- u->gc_maybe_cycle = 0;
+ __clear_bit(UNIX_GC_MAYBE_CYCLE, &u->gc_flags);
scan_children(&u->sk, inc_inflight_move_tail, NULL);
}
}
@@ -356,7 +356,7 @@ void unix_gc(void)
*/
while (!list_empty(&not_cycle_list)) {
u = list_entry(not_cycle_list.next, struct unix_sock, link);
- u->gc_candidate = 0;
+ __clear_bit(UNIX_GC_CANDIDATE, &u->gc_flags);
list_move_tail(&u->link, &gc_inflight_list);
}