From: Willy Tarreau <w@1wt.eu>
To: linux-kernel@vger.kernel.org, stable@vger.kernel.org
Cc: Rainer Weikusat <rweikusat@mobileactivedefense.com>,
Jason Baron <jbaron@akamai.com>,
"David S. Miller" <davem@davemloft.net>,
Ben Hutchings <ben@decadent.org.uk>, Willy Tarreau <w@1wt.eu>
Subject: [PATCH 2.6.32 05/42] unix: avoid use-after-free in ep_remove_wait_queue
Date: Sat, 23 Jan 2016 15:12:26 +0100
Message-ID: <20160123141222.198334616@1wt.eu>
In-Reply-To: <aa387f55227cb730b41e3d621bf460ff@local>
2.6.32-longterm review patch. If anyone has any objections, please let me know.
------------------
From: Rainer Weikusat <rweikusat@mobileactivedefense.com>
commit 7d267278a9ece963d77eefec61630223fce08c6c upstream.
Rainer Weikusat <rweikusat@mobileactivedefense.com> writes:
An AF_UNIX datagram socket being the client in an n:1 association with
some server socket is only allowed to send messages to the server if the
receive queue of the server socket contains at most sk_max_ack_backlog
datagrams. This implies that prospective writers might be forced to go
to sleep even though none of the messages presently enqueued on the
server receive queue were sent by them. In order to ensure that these
will be woken up once space becomes available again, the present
unix_dgram_poll routine does a second sock_poll_wait call with the
peer_wait wait queue of the server socket as queue argument
(unix_dgram_recvmsg does a wake up on this queue after a datagram was
received). This is inherently problematic because the server socket is
only guaranteed to remain alive for as long as the client still holds a
reference to it. In case the connection is dissolved via connect or by
the dead peer detection logic in unix_dgram_sendmsg, the server socket
may be freed even though "the polling mechanism" (in particular, epoll)
still has a pointer to the corresponding peer_wait queue. There's no way
to forcibly deregister a wait queue with epoll.
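
The flow-control condition described above can be observed from user
space. The following is only a minimal sketch, not part of the patch: it
assumes a Linux system, and the names demo_result, poll_writable,
run_demo and the abstract socket name "dgram-demo-<pid>" are made up for
illustration. One client connects to a bound server socket that is not
connected back, sends one-byte datagrams without blocking until the
kernel refuses, and then polls for writeability before and after the
server reads a single datagram.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Hypothetical result record for this illustration. */
struct demo_result {
	int sent;           /* datagrams accepted before send() failed */
	int writable_full;  /* POLLOUT seen right after send() failed */
	int writable_after; /* POLLOUT seen after the server read one */
};

/* Returns 1 if poll() reports POLLOUT, 0 if not, -1 on error. */
static int poll_writable(int fd)
{
	struct pollfd p = { .fd = fd, .events = POLLOUT };

	if (poll(&p, 1, 0) < 0)
		return -1;
	return (p.revents & POLLOUT) != 0;
}

/* Returns 0 if the scenario ran to completion, -1 otherwise. */
int run_demo(struct demo_result *r)
{
	struct sockaddr_un addr;
	socklen_t len;
	char byte = 'x';
	int srv = -1, cli = -1, n, rc = -1;

	memset(r, 0, sizeof(*r));
	memset(&addr, 0, sizeof(addr));
	addr.sun_family = AF_UNIX;
	/* Abstract address (leading NUL byte): nothing to unlink later. */
	n = snprintf(addr.sun_path + 1, sizeof(addr.sun_path) - 1,
		     "dgram-demo-%ld", (long)getpid());
	assert(n > 0);
	len = (socklen_t)(offsetof(struct sockaddr_un, sun_path) + 1 + n);

	srv = socket(AF_UNIX, SOCK_DGRAM, 0);
	cli = socket(AF_UNIX, SOCK_DGRAM, 0);
	if (srv < 0 || cli < 0)
		goto out;
	if (bind(srv, (struct sockaddr *)&addr, len) < 0)
		goto out;
	/* Only the client connects: an asymmetric (n:1 style) association. */
	if (connect(cli, (struct sockaddr *)&addr, len) < 0)
		goto out;

	/* Send without blocking until the kernel refuses (bounded loop). */
	while (r->sent < 4096 && send(cli, &byte, 1, MSG_DONTWAIT) == 1)
		r->sent++;
	if (r->sent == 4096 || (errno != EAGAIN && errno != EWOULDBLOCK))
		goto out;

	r->writable_full = poll_writable(cli);

	/* The server dequeues one datagram, which wakes peer_wait. */
	if (recv(srv, &byte, 1, 0) != 1)
		goto out;
	r->writable_after = poll_writable(cli);
	rc = 0;
out:
	if (srv >= 0)
		close(srv);
	if (cli >= 0)
		close(cli);
	return rc;
}
```

With the default net.unix.max_dgram_qlen the client would be expected to
become writable again after the single recv(); with a much larger limit
the send buffer may fill first, in which case writable_after need not
change, so the sketch only records both values instead of asserting them.
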
Based on an idea by Jason Baron, the patch below changes the code such
that a wait_queue_t belonging to the client socket is enqueued on the
peer_wait queue of the server whenever the peer receive queue full
condition is detected by either a sendmsg or a poll. A wake up on the
peer queue is then relayed to the ordinary wait queue of the client
socket via the wake function of that wait_queue_t. The connection to the
peer wait queue is dissolved again if a wake up is about to be relayed,
the client socket reconnects, a dead peer is detected, or the client
socket is itself closed. This enables removing the second sock_poll_wait from
unix_dgram_poll, thus avoiding the use-after-free, while still ensuring
that no blocked writer sleeps forever.
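
The relay idea can be modelled outside the kernel. What follows is a
single-threaded user-space sketch under stated assumptions, not the
kernel code: wq_head, wq_entry, model_sock and all function names are
hypothetical stand-ins, there is no locking, the wakeups counter stands
in for the client's ordinary wait queue, and queued_on plays the role of
peer_wake.private (it points at the queue head rather than at the peer
socket). It shows an entry owned by the client sitting on the server's
queue and removing itself when it relays a wake up, so the server's
queue never retains a reference the client cannot withdraw.

```c
#include <assert.h>
#include <stddef.h>

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct wq_head;

/* A wait queue entry with a custom wake function. */
struct wq_entry {
	void (*func)(struct wq_entry *e);
	struct wq_head *queued_on; /* NULL while not enqueued anywhere */
	struct wq_entry *next;
};

struct wq_head {
	struct wq_entry *first;
};

struct model_sock {
	struct wq_head peer_wait;  /* woken when this socket dequeues data */
	struct wq_entry peer_wake; /* this socket's entry on a peer's queue */
	int wakeups;               /* stand-in for the socket's own wait queue */
};

void wq_add(struct wq_head *h, struct wq_entry *e)
{
	e->next = h->first;
	h->first = e;
}

void wq_remove(struct wq_head *h, struct wq_entry *e)
{
	struct wq_entry **pp = &h->first;

	while (*pp && *pp != e)
		pp = &(*pp)->next;
	if (*pp)
		*pp = e->next;
	e->next = NULL;
}

/* Entries may unlink themselves while running, so save next first. */
void wq_wake_all(struct wq_head *h)
{
	struct wq_entry *e = h->first, *next;

	while (e) {
		next = e->next;
		e->func(e);
		e = next;
	}
}

/* Wake function: leave the peer's queue, then wake the owning socket. */
void peer_wake_relay(struct wq_entry *e)
{
	struct model_sock *u = container_of(e, struct model_sock, peer_wake);

	assert(e->queued_on != NULL);
	wq_remove(e->queued_on, e);
	e->queued_on = NULL;
	u->wakeups++;
}

void model_sock_init(struct model_sock *u)
{
	u->peer_wait.first = NULL;
	u->peer_wake.func = peer_wake_relay;
	u->peer_wake.queued_on = NULL;
	u->peer_wake.next = NULL;
	u->wakeups = 0;
}

/* Returns 1 if the entry was newly enqueued, 0 if already enqueued. */
int peer_wake_connect(struct model_sock *u, struct model_sock *other)
{
	if (u->peer_wake.queued_on)
		return 0;
	u->peer_wake.queued_on = &other->peer_wait;
	wq_add(&other->peer_wait, &u->peer_wake);
	return 1;
}

void peer_wake_disconnect(struct model_sock *u, struct model_sock *other)
{
	if (u->peer_wake.queued_on == &other->peer_wait) {
		wq_remove(&other->peer_wait, &u->peer_wake);
		u->peer_wake.queued_on = NULL;
	}
}
```

In this model a caller would initialise one server and several clients
with model_sock_init(), call peer_wake_connect() for each client that
found the server queue full, and call wq_wake_all() on the server's
peer_wait where the receive path dequeues a datagram. Each relay is
one-shot: after it fires, the client has to connect again the next time
it runs into the full-queue condition.
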
Signed-off-by: Rainer Weikusat <rweikusat@mobileactivedefense.com>
Fixes: ec0d215f9420 ("af_unix: fix 'poll for write'/connected DGRAM sockets")
Reviewed-by: Jason Baron <jbaron@akamai.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 2.6.32:
- Access sk_sleep directly, not through sk_sleep() function
- Adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
include/net/af_unix.h | 1 +
net/unix/af_unix.c | 183 ++++++++++++++++++++++++++++++++++++++++++++------
2 files changed, 163 insertions(+), 21 deletions(-)
diff --git a/include/net/af_unix.h b/include/net/af_unix.h
index 861045f..c364711 100644
--- a/include/net/af_unix.h
+++ b/include/net/af_unix.h
@@ -59,6 +59,7 @@ struct unix_sock {
unsigned int gc_maybe_cycle : 1;
unsigned char recursion_level;
wait_queue_head_t peer_wait;
+ wait_queue_t peer_wake;
};
#define unix_sk(__sk) ((struct unix_sock *)__sk)
diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index 79c1dce..8e6a609 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -306,6 +306,118 @@ found:
return s;
}
+/* Support code for asymmetrically connected dgram sockets
+ *
+ * If a datagram socket is connected to a socket not itself connected
+ * to the first socket (eg, /dev/log), clients may only enqueue more
+ * messages if the present receive queue of the server socket is not
+ * "too large". This means there's a second writeability condition
+ * poll and sendmsg need to test. The dgram recv code will do a wake
+ * up on the peer_wait wait queue of a socket upon reception of a
+ * datagram which needs to be propagated to sleeping would-be writers
+ * since these might not have sent anything so far. This can't be
+ * accomplished via poll_wait because the lifetime of the server
+ * socket might be less than that of its clients if these break their
+ * association with it or if the server socket is closed while clients
+ * are still connected to it and there's no way to inform "a polling
+ * implementation" that it should let go of a certain wait queue
+ *
+ * In order to propagate a wake up, a wait_queue_t of the client
+ * socket is enqueued on the peer_wait queue of the server socket
+ * whose wake function does a wake_up on the ordinary client socket
+ * wait queue. This connection is established whenever a write (or
+ * poll for write) hit the flow control condition and broken when the
+ * association to the server socket is dissolved or after a wake up
+ * was relayed.
+ */
+
+static int unix_dgram_peer_wake_relay(wait_queue_t *q, unsigned mode, int flags,
+ void *key)
+{
+ struct unix_sock *u;
+ wait_queue_head_t *u_sleep;
+
+ u = container_of(q, struct unix_sock, peer_wake);
+
+ __remove_wait_queue(&unix_sk(u->peer_wake.private)->peer_wait,
+ q);
+ u->peer_wake.private = NULL;
+
+ /* relaying can only happen while the wq still exists */
+ u_sleep = u->sk.sk_sleep;
+ if (u_sleep)
+ wake_up_interruptible_poll(u_sleep, key);
+
+ return 0;
+}
+
+static int unix_dgram_peer_wake_connect(struct sock *sk, struct sock *other)
+{
+ struct unix_sock *u, *u_other;
+ int rc;
+
+ u = unix_sk(sk);
+ u_other = unix_sk(other);
+ rc = 0;
+ spin_lock(&u_other->peer_wait.lock);
+
+ if (!u->peer_wake.private) {
+ u->peer_wake.private = other;
+ __add_wait_queue(&u_other->peer_wait, &u->peer_wake);
+
+ rc = 1;
+ }
+
+ spin_unlock(&u_other->peer_wait.lock);
+ return rc;
+}
+
+static void unix_dgram_peer_wake_disconnect(struct sock *sk,
+ struct sock *other)
+{
+ struct unix_sock *u, *u_other;
+
+ u = unix_sk(sk);
+ u_other = unix_sk(other);
+ spin_lock(&u_other->peer_wait.lock);
+
+ if (u->peer_wake.private == other) {
+ __remove_wait_queue(&u_other->peer_wait, &u->peer_wake);
+ u->peer_wake.private = NULL;
+ }
+
+ spin_unlock(&u_other->peer_wait.lock);
+}
+
+static void unix_dgram_peer_wake_disconnect_wakeup(struct sock *sk,
+ struct sock *other)
+{
+ unix_dgram_peer_wake_disconnect(sk, other);
+ wake_up_interruptible_poll(sk->sk_sleep,
+ POLLOUT |
+ POLLWRNORM |
+ POLLWRBAND);
+}
+
+/* preconditions:
+ * - unix_peer(sk) == other
+ * - association is stable
+ */
+static int unix_dgram_peer_wake_me(struct sock *sk, struct sock *other)
+{
+ int connected;
+
+ connected = unix_dgram_peer_wake_connect(sk, other);
+
+ if (unix_recvq_full(other))
+ return 1;
+
+ if (connected)
+ unix_dgram_peer_wake_disconnect(sk, other);
+
+ return 0;
+}
+
static inline int unix_writable(struct sock *sk)
{
return (atomic_read(&sk->sk_wmem_alloc) << 2) <= sk->sk_sndbuf;
@@ -410,6 +522,8 @@ static void unix_release_sock(struct sock *sk, int embrion)
sk_wake_async(skpair, SOCK_WAKE_WAITD, POLL_HUP);
read_unlock(&skpair->sk_callback_lock);
}
+
+ unix_dgram_peer_wake_disconnect(sk, skpair);
sock_put(skpair); /* It may now die */
unix_peer(sk) = NULL;
}
@@ -609,6 +723,7 @@ static struct sock *unix_create1(struct net *net, struct socket *sock)
INIT_LIST_HEAD(&u->link);
mutex_init(&u->readlock); /* single task reading lock */
init_waitqueue_head(&u->peer_wait);
+ init_waitqueue_func_entry(&u->peer_wake, unix_dgram_peer_wake_relay);
unix_insert_socket(unix_sockets_unbound, sk);
out:
if (sk == NULL)
@@ -987,6 +1102,8 @@ restart:
if (unix_peer(sk)) {
struct sock *old_peer = unix_peer(sk);
unix_peer(sk) = other;
+ unix_dgram_peer_wake_disconnect_wakeup(sk, old_peer);
+
unix_state_double_unlock(sk, other);
if (other != old_peer)
@@ -1385,6 +1502,7 @@ static int unix_dgram_sendmsg(struct kiocb *kiocb, struct socket *sock,
long timeo;
struct scm_cookie tmp_scm;
int max_level = 0;
+ int sk_locked;
if (NULL == siocb->scm)
siocb->scm = &tmp_scm;
@@ -1450,12 +1568,14 @@ restart:
goto out_free;
}
+ sk_locked = 0;
unix_state_lock(other);
+restart_locked:
err = -EPERM;
if (!unix_may_send(sk, other))
goto out_unlock;
- if (sock_flag(other, SOCK_DEAD)) {
+ if (unlikely(sock_flag(other, SOCK_DEAD))) {
/*
* Check with 1003.1g - what should
* datagram error
@@ -1463,10 +1583,14 @@ restart:
unix_state_unlock(other);
sock_put(other);
+ if (!sk_locked)
+ unix_state_lock(sk);
+
err = 0;
- unix_state_lock(sk);
if (unix_peer(sk) == other) {
unix_peer(sk) = NULL;
+ unix_dgram_peer_wake_disconnect_wakeup(sk, other);
+
unix_state_unlock(sk);
unix_dgram_disconnected(sk, other);
@@ -1492,21 +1616,38 @@ restart:
goto out_unlock;
}
- if (unix_peer(other) != sk && unix_recvq_full(other)) {
- if (!timeo) {
- err = -EAGAIN;
- goto out_unlock;
+ if (unlikely(unix_peer(other) != sk && unix_recvq_full(other))) {
+ if (timeo) {
+ timeo = unix_wait_for_peer(other, timeo);
+
+ err = sock_intr_errno(timeo);
+ if (signal_pending(current))
+ goto out_free;
+
+ goto restart;
}
- timeo = unix_wait_for_peer(other, timeo);
+ if (!sk_locked) {
+ unix_state_unlock(other);
+ unix_state_double_lock(sk, other);
+ }
- err = sock_intr_errno(timeo);
- if (signal_pending(current))
- goto out_free;
+ if (unix_peer(sk) != other ||
+ unix_dgram_peer_wake_me(sk, other)) {
+ err = -EAGAIN;
+ sk_locked = 1;
+ goto out_unlock;
+ }
- goto restart;
+ if (!sk_locked) {
+ sk_locked = 1;
+ goto restart_locked;
+ }
}
+ if (unlikely(sk_locked))
+ unix_state_unlock(sk);
+
skb_queue_tail(&other->sk_receive_queue, skb);
if (max_level > unix_sk(other)->recursion_level)
unix_sk(other)->recursion_level = max_level;
@@ -1517,6 +1658,8 @@ restart:
return len;
out_unlock:
+ if (sk_locked)
+ unix_state_unlock(sk);
unix_state_unlock(other);
out_free:
kfree_skb(skb);
@@ -2103,17 +2246,15 @@ static unsigned int unix_dgram_poll(struct file *file, struct socket *sock,
/* writable? */
writable = unix_writable(sk);
if (writable) {
- other = unix_peer_get(sk);
- if (other) {
- if (unix_peer(other) != sk) {
- sock_poll_wait(file, &unix_sk(other)->peer_wait,
- wait);
- if (unix_recvq_full(other))
- writable = 0;
- }
+ unix_state_lock(sk);
- sock_put(other);
- }
+ other = unix_peer(sk);
+ if (other && unix_peer(other) != sk &&
+ unix_recvq_full(other) &&
+ unix_dgram_peer_wake_me(sk, other))
+ writable = 0;
+
+ unix_state_unlock(sk);
}
if (writable)
--
1.7.12.2.21.g234cd45.dirty