From: Willy Tarreau <w@1wt.eu>
To: linux-kernel@vger.kernel.org, stable@vger.kernel.org
Cc: Ambrose Feinstein <ambrose@google.com>,
Eric Dumazet <edumazet@google.com>,
Benjamin Herrenschmidt <benh@kernel.crashing.org>,
Paul Mackerras <paulus@samba.org>,
"David S. Miller" <davem@davemloft.net>,
Ben Hutchings <ben@decadent.org.uk>, Willy Tarreau <w@1wt.eu>
Subject: [PATCH 2.6.32 26/42] af_unix: fix a fatal race with bit fields
Date: Sat, 23 Jan 2016 15:12:47 +0100 [thread overview]
Message-ID: <20160123141223.114779810@1wt.eu> (raw)
In-Reply-To: <aa387f55227cb730b41e3d621bf460ff@local>
2.6.32-longterm review patch. If anyone has any objections, please let me know.
------------------
From: Eric Dumazet <eric.dumazet@gmail.com>
commit 60bc851ae59bfe99be6ee89d6bc50008c85ec75d upstream.
Using bit fields is dangerous on ppc64/sparc64, as the compiler [1]
uses 64bit instructions to manipulate them.
If the 64bit word includes any atomic_t or spinlock_t, we can lose
critical concurrent changes.
This is happening in af_unix, where unix_sk(sk)->gc_candidate/
gc_maybe_cycle/lock share the same 64bit word.
This leads to fatal deadlock, as one/several cpus spin forever
on a spinlock that will never be available again.
A safer way would be to use a long to store flags.
This way we are sure compiler/arch wont do bad things.
As we own unix_gc_lock spinlock when clearing or setting bits,
we can use the non atomic __set_bit()/__clear_bit().
recursion_level can share the same 64bit location with the spinlock,
as it is set only with this spinlock held.
[1] bug fixed in gcc-4.8.0 :
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52080
Reported-by: Ambrose Feinstein <ambrose@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
(cherry picked from commit 2ee9cbe7e7bfe2d36374288b818aa31b2c4981db)
[wt: adjusted context]
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
include/net/af_unix.h | 5 +++--
net/unix/garbage.c | 12 ++++++------
2 files changed, 9 insertions(+), 8 deletions(-)
diff --git a/include/net/af_unix.h b/include/net/af_unix.h
index c364711..faf1d6d 100644
--- a/include/net/af_unix.h
+++ b/include/net/af_unix.h
@@ -55,9 +55,10 @@ struct unix_sock {
struct list_head link;
atomic_long_t inflight;
spinlock_t lock;
- unsigned int gc_candidate : 1;
- unsigned int gc_maybe_cycle : 1;
unsigned char recursion_level;
+ unsigned long gc_flags;
+#define UNIX_GC_CANDIDATE 0
+#define UNIX_GC_MAYBE_CYCLE 1
wait_queue_head_t peer_wait;
wait_queue_t peer_wake;
};
diff --git a/net/unix/garbage.c b/net/unix/garbage.c
index cb72e91..de93193 100644
--- a/net/unix/garbage.c
+++ b/net/unix/garbage.c
@@ -195,7 +195,7 @@ static void scan_inflight(struct sock *x, void (*func)(struct unix_sock *),
* have been added to the queues after
* starting the garbage collection
*/
- if (u->gc_candidate) {
+ if (test_bit(UNIX_GC_CANDIDATE, &u->gc_flags)) {
hit = true;
func(u);
}
@@ -264,7 +264,7 @@ static void inc_inflight_move_tail(struct unix_sock *u)
* of the list, so that it's checked even if it was already
* passed over
*/
- if (u->gc_maybe_cycle)
+ if (test_bit(UNIX_GC_MAYBE_CYCLE, &u->gc_flags))
list_move_tail(&u->link, &gc_candidates);
}
@@ -325,8 +325,8 @@ void unix_gc(void)
BUG_ON(total_refs < inflight_refs);
if (total_refs == inflight_refs) {
list_move_tail(&u->link, &gc_candidates);
- u->gc_candidate = 1;
- u->gc_maybe_cycle = 1;
+ __set_bit(UNIX_GC_CANDIDATE, &u->gc_flags);
+ __set_bit(UNIX_GC_MAYBE_CYCLE, &u->gc_flags);
}
}
@@ -354,7 +354,7 @@ void unix_gc(void)
if (atomic_long_read(&u->inflight) > 0) {
list_move_tail(&u->link, ¬_cycle_list);
- u->gc_maybe_cycle = 0;
+ __clear_bit(UNIX_GC_MAYBE_CYCLE, &u->gc_flags);
scan_children(&u->sk, inc_inflight_move_tail, NULL);
}
}
@@ -366,7 +366,7 @@ void unix_gc(void)
*/
while (!list_empty(¬_cycle_list)) {
u = list_entry(not_cycle_list.next, struct unix_sock, link);
- u->gc_candidate = 0;
+ __clear_bit(UNIX_GC_CANDIDATE, &u->gc_flags);
list_move_tail(&u->link, &gc_inflight_list);
}
--
1.7.12.2.21.g234cd45.dirty
next prev parent reply other threads:[~2016-01-23 14:12 UTC|newest]
Thread overview: 43+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <aa387f55227cb730b41e3d621bf460ff@local>
2016-01-23 14:12 ` [PATCH 2.6.32 01/42] ip6mr: call del_timer_sync() in ip6mr_free_table() Willy Tarreau
2016-01-23 14:12 ` [PATCH 2.6.32 02/42] isdn_ppp: Add checks for allocation failure in isdn_ppp_open() Willy Tarreau
2016-01-23 14:12 ` [PATCH 2.6.32 04/42] RDS: fix race condition when sending a message on unbound socket Willy Tarreau
2016-01-23 14:12 ` [PATCH 2.6.32 05/42] unix: avoid use-after-free in ep_remove_wait_queue Willy Tarreau
2016-01-23 14:12 ` [PATCH 2.6.32 06/42] ext4: Fix null dereference in ext4_fill_super() Willy Tarreau
2016-01-23 14:12 ` [PATCH 2.6.32 07/42] Revert "net: add length argument to skb_copy_and_csum_datagram_iovec" Willy Tarreau
2016-01-23 14:12 ` [PATCH 2.6.32 08/42] udp: properly support MSG_PEEK with truncated buffers Willy Tarreau
2016-01-23 14:12 ` [PATCH 2.6.32 09/42] KEYS: Fix race between read and revoke Willy Tarreau
2016-01-23 14:12 ` [PATCH 2.6.32 11/42] net: fix warnings in make htmldocs by moving macro definition out of field declaration Willy Tarreau
2016-01-23 14:12 ` [PATCH 2.6.32 12/42] bluetooth: Validate socket address length in sco_sock_bind() Willy Tarreau
2016-01-23 14:12 ` [PATCH 2.6.32 13/42] sctp: translate host order to network order when setting a hmacid Willy Tarreau
2016-01-23 14:12 ` [PATCH 2.6.32 14/42] fuse: break infinite loop in fuse_fill_write_pages() Willy Tarreau
2016-01-23 14:12 ` [PATCH 2.6.32 15/42] fix sysvfs symlinks Willy Tarreau
2016-01-23 14:12 ` [PATCH 2.6.32 16/42] vfs: Avoid softlockups with sendfile(2) Willy Tarreau
2016-01-23 14:12 ` [PATCH 2.6.32 17/42] ext4: Fix handling of extended tv_sec Willy Tarreau
2016-01-23 14:12 ` [PATCH 2.6.32 18/42] nfs: if we have no valid attrs, then dont declare the attribute cache valid Willy Tarreau
2016-01-23 14:12 ` [PATCH 2.6.32 19/42] wan/x25: Fix use-after-free in x25_asy_open_tty() Willy Tarreau
2016-01-23 14:12 ` [PATCH 2.6.32 20/42] ipv4: igmp: Allow removing groups from a removed interface Willy Tarreau
2016-01-23 14:12 ` [PATCH 2.6.32 21/42] sched/core: Remove false-positive warning from wake_up_process() Willy Tarreau
2016-01-23 14:12 ` [PATCH 2.6.32 22/42] ipmi: move timer init to before irq is setup Willy Tarreau
2016-01-23 14:12 ` [PATCH 2.6.32 23/42] tcp: initialize tp->copied_seq in case of cross SYN connection Willy Tarreau
2016-01-23 14:12 ` [PATCH 2.6.32 24/42] net, scm: fix PaX detected msg_controllen overflow in scm_detach_fds Willy Tarreau
2016-01-23 14:12 ` [PATCH 2.6.32 25/42] sctp: update the netstamp_needed counter when copying sockets Willy Tarreau
2016-01-23 14:12 ` Willy Tarreau [this message]
2016-01-23 14:12 ` [PATCH 2.6.32 27/42] rfkill: copy the name into the rfkill struct Willy Tarreau
2016-01-23 14:12 ` [PATCH 2.6.32 28/42] ses: Fix problems with simple enclosures Willy Tarreau
2016-01-23 14:12 ` [PATCH 2.6.32 29/42] ses: fix additional element traversal bug Willy Tarreau
2016-01-23 14:12 ` [PATCH 2.6.32 30/42] tty: Fix GPF in flush_to_ldisc() Willy Tarreau
2016-01-23 14:12 ` [PATCH 2.6.32 31/42] mISDN: fix a loop count Willy Tarreau
2016-01-23 14:12 ` [PATCH 2.6.32 32/42] ser_gigaset: fix deallocation of platform device structure Willy Tarreau
2016-01-23 14:12 ` [PATCH 2.6.32 33/42] spi: fix parent-device reference leak Willy Tarreau
2016-01-23 14:12 ` [PATCH 2.6.32 34/42] s390/dis: Fix handling of format specifiers Willy Tarreau
2016-01-23 14:12 ` [PATCH 2.6.32 35/42] USB: ipaq.c: fix a timeout loop Willy Tarreau
2016-01-23 14:12 ` [PATCH 2.6.32 36/42] USB: fix invalid memory access in hub_activate() Willy Tarreau
2016-01-23 14:12 ` [PATCH 2.6.32 37/42] MIPS: Fix restart of indirect syscalls Willy Tarreau
2016-01-23 14:12 ` [PATCH 2.6.32 38/42] parisc: Fix syscall restarts Willy Tarreau
2016-01-23 14:13 ` [PATCH 2.6.32 39/42] ipv6/addrlabel: fix ip6addrlbl_get() Willy Tarreau
2016-01-23 14:13 ` [PATCH 2.6.32 40/42] mm/memory_hotplug.c: check for missing sections in test_pages_in_a_zone() Willy Tarreau
2016-01-23 18:13 ` Ben Hutchings
2016-01-23 18:29 ` Willy Tarreau
2016-01-23 19:05 ` Willy Tarreau
2016-01-23 14:13 ` [PATCH 2.6.32 41/42] KVM: x86: Reload pit counters for all channels when restoring state Willy Tarreau
2016-01-23 14:13 ` [PATCH 2.6.32 42/42] kvm: x86: only channel 0 of the i8254 is linked to the HPET Willy Tarreau
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20160123141223.114779810@1wt.eu \
--to=w@1wt.eu \
--cc=ambrose@google.com \
--cc=ben@decadent.org.uk \
--cc=benh@kernel.crashing.org \
--cc=davem@davemloft.net \
--cc=edumazet@google.com \
--cc=linux-kernel@vger.kernel.org \
--cc=paulus@samba.org \
--cc=stable@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).