All of lore.kernel.org
 help / color / mirror / Atom feed
From: Willy Tarreau <w@1wt.eu>
To: linux-kernel@vger.kernel.org, stable@vger.kernel.org
Cc: Ambrose Feinstein <ambrose@google.com>,
	Eric Dumazet <edumazet@google.com>,
	Benjamin Herrenschmidt <benh@kernel.crashing.org>,
	Paul Mackerras <paulus@samba.org>,
	"David S. Miller" <davem@davemloft.net>,
	Ben Hutchings <ben@decadent.org.uk>, Willy Tarreau <w@1wt.eu>
Subject: [PATCH 2.6.32 26/42] af_unix: fix a fatal race with bit fields
Date: Sat, 23 Jan 2016 15:12:47 +0100	[thread overview]
Message-ID: <20160123141223.114779810@1wt.eu> (raw)
In-Reply-To: <aa387f55227cb730b41e3d621bf460ff@local>

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Eric Dumazet <eric.dumazet@gmail.com>

commit 60bc851ae59bfe99be6ee89d6bc50008c85ec75d upstream.

Using bit fields is dangerous on ppc64/sparc64, as the compiler [1]
uses 64bit instructions to manipulate them.
If the 64bit word includes any atomic_t or spinlock_t, we can lose
critical concurrent changes.

This is happening in af_unix, where unix_sk(sk)->gc_candidate/
gc_maybe_cycle/lock share the same 64bit word.

This leads to fatal deadlock, as one/several cpus spin forever
on a spinlock that will never be available again.

A safer way would be to use a long to store flags.
This way we are sure compiler/arch wont do bad things.

As we own unix_gc_lock spinlock when clearing or setting bits,
we can use the non atomic __set_bit()/__clear_bit().

recursion_level can share the same 64bit location with the spinlock,
as it is set only with this spinlock held.

[1] bug fixed in gcc-4.8.0 :
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52080

Reported-by: Ambrose Feinstein <ambrose@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
(cherry picked from commit 2ee9cbe7e7bfe2d36374288b818aa31b2c4981db)
[wt: adjusted context]
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 include/net/af_unix.h |  5 +++--
 net/unix/garbage.c    | 12 ++++++------
 2 files changed, 9 insertions(+), 8 deletions(-)

diff --git a/include/net/af_unix.h b/include/net/af_unix.h
index c364711..faf1d6d 100644
--- a/include/net/af_unix.h
+++ b/include/net/af_unix.h
@@ -55,9 +55,10 @@ struct unix_sock {
 	struct list_head	link;
         atomic_long_t           inflight;
         spinlock_t		lock;
-	unsigned int		gc_candidate : 1;
-	unsigned int		gc_maybe_cycle : 1;
 	unsigned char		recursion_level;
+	unsigned long		gc_flags;
+#define UNIX_GC_CANDIDATE	0
+#define UNIX_GC_MAYBE_CYCLE	1
         wait_queue_head_t       peer_wait;
 	wait_queue_t		peer_wake;
 };
diff --git a/net/unix/garbage.c b/net/unix/garbage.c
index cb72e91..de93193 100644
--- a/net/unix/garbage.c
+++ b/net/unix/garbage.c
@@ -195,7 +195,7 @@ static void scan_inflight(struct sock *x, void (*func)(struct unix_sock *),
 					 * have been added to the queues after
 					 * starting the garbage collection
 					 */
-					if (u->gc_candidate) {
+					if (test_bit(UNIX_GC_CANDIDATE, &u->gc_flags)) {
 						hit = true;
 						func(u);
 					}
@@ -264,7 +264,7 @@ static void inc_inflight_move_tail(struct unix_sock *u)
 	 * of the list, so that it's checked even if it was already
 	 * passed over
 	 */
-	if (u->gc_maybe_cycle)
+	if (test_bit(UNIX_GC_MAYBE_CYCLE, &u->gc_flags))
 		list_move_tail(&u->link, &gc_candidates);
 }
 
@@ -325,8 +325,8 @@ void unix_gc(void)
 		BUG_ON(total_refs < inflight_refs);
 		if (total_refs == inflight_refs) {
 			list_move_tail(&u->link, &gc_candidates);
-			u->gc_candidate = 1;
-			u->gc_maybe_cycle = 1;
+			__set_bit(UNIX_GC_CANDIDATE, &u->gc_flags);
+			__set_bit(UNIX_GC_MAYBE_CYCLE, &u->gc_flags);
 		}
 	}
 
@@ -354,7 +354,7 @@ void unix_gc(void)
 
 		if (atomic_long_read(&u->inflight) > 0) {
 			list_move_tail(&u->link, &not_cycle_list);
-			u->gc_maybe_cycle = 0;
+			__clear_bit(UNIX_GC_MAYBE_CYCLE, &u->gc_flags);
 			scan_children(&u->sk, inc_inflight_move_tail, NULL);
 		}
 	}
@@ -366,7 +366,7 @@ void unix_gc(void)
 	 */
 	while (!list_empty(&not_cycle_list)) {
 		u = list_entry(not_cycle_list.next, struct unix_sock, link);
-		u->gc_candidate = 0;
+		__clear_bit(UNIX_GC_CANDIDATE, &u->gc_flags);
 		list_move_tail(&u->link, &gc_inflight_list);
 	}
 
-- 
1.7.12.2.21.g234cd45.dirty

  parent reply	other threads:[~2016-01-23 14:51 UTC|newest]

Thread overview: 43+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <aa387f55227cb730b41e3d621bf460ff@local>
2016-01-23 14:12 ` [PATCH 2.6.32 01/42] ip6mr: call del_timer_sync() in ip6mr_free_table() Willy Tarreau
2016-01-23 14:12 ` [PATCH 2.6.32 02/42] isdn_ppp: Add checks for allocation failure in isdn_ppp_open() Willy Tarreau
2016-01-23 14:12 ` [PATCH 2.6.32 04/42] RDS: fix race condition when sending a message on unbound socket Willy Tarreau
2016-01-23 14:12 ` [PATCH 2.6.32 05/42] unix: avoid use-after-free in ep_remove_wait_queue Willy Tarreau
2016-01-23 14:12 ` [PATCH 2.6.32 06/42] ext4: Fix null dereference in ext4_fill_super() Willy Tarreau
2016-01-23 14:12 ` [PATCH 2.6.32 07/42] Revert "net: add length argument to skb_copy_and_csum_datagram_iovec" Willy Tarreau
2016-01-23 14:12 ` [PATCH 2.6.32 08/42] udp: properly support MSG_PEEK with truncated buffers Willy Tarreau
2016-01-23 14:12 ` [PATCH 2.6.32 09/42] KEYS: Fix race between read and revoke Willy Tarreau
2016-01-23 14:12 ` [PATCH 2.6.32 11/42] net: fix warnings in make htmldocs by moving macro definition out of field declaration Willy Tarreau
2016-01-23 14:12 ` [PATCH 2.6.32 12/42] bluetooth: Validate socket address length in sco_sock_bind() Willy Tarreau
2016-01-23 14:12 ` [PATCH 2.6.32 13/42] sctp: translate host order to network order when setting a hmacid Willy Tarreau
2016-01-23 14:12 ` [PATCH 2.6.32 14/42] fuse: break infinite loop in fuse_fill_write_pages() Willy Tarreau
2016-01-23 14:12 ` [PATCH 2.6.32 15/42] fix sysvfs symlinks Willy Tarreau
2016-01-23 14:12 ` [PATCH 2.6.32 16/42] vfs: Avoid softlockups with sendfile(2) Willy Tarreau
2016-01-23 14:12 ` [PATCH 2.6.32 17/42] ext4: Fix handling of extended tv_sec Willy Tarreau
2016-01-23 14:12 ` [PATCH 2.6.32 18/42] nfs: if we have no valid attrs, then dont declare the attribute cache valid Willy Tarreau
2016-01-23 14:12 ` [PATCH 2.6.32 19/42] wan/x25: Fix use-after-free in x25_asy_open_tty() Willy Tarreau
2016-01-23 14:12 ` [PATCH 2.6.32 20/42] ipv4: igmp: Allow removing groups from a removed interface Willy Tarreau
2016-01-23 14:12 ` [PATCH 2.6.32 21/42] sched/core: Remove false-positive warning from wake_up_process() Willy Tarreau
2016-01-23 14:12 ` [PATCH 2.6.32 22/42] ipmi: move timer init to before irq is setup Willy Tarreau
2016-01-23 14:12 ` [PATCH 2.6.32 23/42] tcp: initialize tp->copied_seq in case of cross SYN connection Willy Tarreau
2016-01-23 14:12 ` [PATCH 2.6.32 24/42] net, scm: fix PaX detected msg_controllen overflow in scm_detach_fds Willy Tarreau
2016-01-23 14:12 ` [PATCH 2.6.32 25/42] sctp: update the netstamp_needed counter when copying sockets Willy Tarreau
2016-01-23 14:12 ` Willy Tarreau [this message]
2016-01-23 14:12 ` [PATCH 2.6.32 27/42] rfkill: copy the name into the rfkill struct Willy Tarreau
2016-01-23 14:12 ` [PATCH 2.6.32 28/42] ses: Fix problems with simple enclosures Willy Tarreau
2016-01-23 14:12 ` [PATCH 2.6.32 29/42] ses: fix additional element traversal bug Willy Tarreau
2016-01-23 14:12 ` [PATCH 2.6.32 30/42] tty: Fix GPF in flush_to_ldisc() Willy Tarreau
2016-01-23 14:12 ` [PATCH 2.6.32 31/42] mISDN: fix a loop count Willy Tarreau
2016-01-23 14:12 ` [PATCH 2.6.32 32/42] ser_gigaset: fix deallocation of platform device structure Willy Tarreau
2016-01-23 14:12 ` [PATCH 2.6.32 33/42] spi: fix parent-device reference leak Willy Tarreau
2016-01-23 14:12 ` [PATCH 2.6.32 34/42] s390/dis: Fix handling of format specifiers Willy Tarreau
2016-01-23 14:12 ` [PATCH 2.6.32 35/42] USB: ipaq.c: fix a timeout loop Willy Tarreau
2016-01-23 14:12 ` [PATCH 2.6.32 36/42] USB: fix invalid memory access in hub_activate() Willy Tarreau
2016-01-23 14:12 ` [PATCH 2.6.32 37/42] MIPS: Fix restart of indirect syscalls Willy Tarreau
2016-01-23 14:12 ` [PATCH 2.6.32 38/42] parisc: Fix syscall restarts Willy Tarreau
2016-01-23 14:13 ` [PATCH 2.6.32 39/42] ipv6/addrlabel: fix ip6addrlbl_get() Willy Tarreau
2016-01-23 14:13 ` [PATCH 2.6.32 40/42] mm/memory_hotplug.c: check for missing sections in test_pages_in_a_zone() Willy Tarreau
2016-01-23 18:13   ` Ben Hutchings
2016-01-23 18:29     ` Willy Tarreau
2016-01-23 19:05       ` Willy Tarreau
2016-01-23 14:13 ` [PATCH 2.6.32 41/42] KVM: x86: Reload pit counters for all channels when restoring state Willy Tarreau
2016-01-23 14:13 ` [PATCH 2.6.32 42/42] kvm: x86: only channel 0 of the i8254 is linked to the HPET Willy Tarreau

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20160123141223.114779810@1wt.eu \
    --to=w@1wt.eu \
    --cc=ambrose@google.com \
    --cc=ben@decadent.org.uk \
    --cc=benh@kernel.crashing.org \
    --cc=davem@davemloft.net \
    --cc=edumazet@google.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=paulus@samba.org \
    --cc=stable@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.