From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
stable@vger.kernel.org,
Rainer Weikusat <rweikusat@mobileactivedefense.com>,
Jason Baron <jbaron@akamai.com>,
"David S. Miller" <davem@davemloft.net>
Subject: [PATCH 3.14 01/47] unix: avoid use-after-free in ep_remove_wait_queue
Date: Wed, 20 Jan 2016 14:00:33 -0800 [thread overview]
Message-ID: <20160120215507.817355776@linuxfoundation.org> (raw)
In-Reply-To: <20160120215507.575738941@linuxfoundation.org>
3.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Rainer Weikusat <rweikusat@mobileactivedefense.com>
[ Upstream commit 7d267278a9ece963d77eefec61630223fce08c6c ]
Rainer Weikusat <rweikusat@mobileactivedefense.com> writes:
An AF_UNIX datagram socket being the client in an n:1 association with
some server socket is only allowed to send messages to the server if the
receive queue of this socket contains at most sk_max_ack_backlog
datagrams. This implies that prospective writers might be forced to go
to sleep despite none of the message presently enqueued on the server
receive queue were sent by them. In order to ensure that these will be
woken up once space becomes again available, the present unix_dgram_poll
routine does a second sock_poll_wait call with the peer_wait wait queue
of the server socket as queue argument (unix_dgram_recvmsg does a wake
up on this queue after a datagram was received). This is inherently
problematic because the server socket is only guaranteed to remain alive
for as long as the client still holds a reference to it. In case the
connection is dissolved via connect or by the dead peer detection logic
in unix_dgram_sendmsg, the server socket may be freed despite "the
polling mechanism" (in particular, epoll) still has a pointer to the
corresponding peer_wait queue. There's no way to forcibly deregister a
wait queue with epoll.
Based on an idea by Jason Baron, the patch below changes the code such
that a wait_queue_t belonging to the client socket is enqueued on the
peer_wait queue of the server whenever the peer receive queue full
condition is detected by either a sendmsg or a poll. A wake up on the
peer queue is then relayed to the ordinary wait queue of the client
socket via wake function. The connection to the peer wait queue is again
dissolved if either a wake up is about to be relayed or the client
socket reconnects or a dead peer is detected or the client socket is
itself closed. This enables removing the second sock_poll_wait from
unix_dgram_poll, thus avoiding the use-after-free, while still ensuring
that no blocked writer sleeps forever.
Signed-off-by: Rainer Weikusat <rweikusat@mobileactivedefense.com>
Fixes: ec0d215f9420 ("af_unix: fix 'poll for write'/connected DGRAM sockets")
Reviewed-by: Jason Baron <jbaron@akamai.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
include/net/af_unix.h | 1
net/unix/af_unix.c | 183 ++++++++++++++++++++++++++++++++++++++++++++------
2 files changed, 165 insertions(+), 19 deletions(-)
--- a/include/net/af_unix.h
+++ b/include/net/af_unix.h
@@ -63,6 +63,7 @@ struct unix_sock {
#define UNIX_GC_CANDIDATE 0
#define UNIX_GC_MAYBE_CYCLE 1
struct socket_wq peer_wq;
+ wait_queue_t peer_wake;
};
static inline struct unix_sock *unix_sk(struct sock *sk)
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -316,6 +316,118 @@ found:
return s;
}
+/* Support code for asymmetrically connected dgram sockets
+ *
+ * If a datagram socket is connected to a socket not itself connected
+ * to the first socket (eg, /dev/log), clients may only enqueue more
+ * messages if the present receive queue of the server socket is not
+ * "too large". This means there's a second writeability condition
+ * poll and sendmsg need to test. The dgram recv code will do a wake
+ * up on the peer_wait wait queue of a socket upon reception of a
+ * datagram which needs to be propagated to sleeping would-be writers
+ * since these might not have sent anything so far. This can't be
+ * accomplished via poll_wait because the lifetime of the server
+ * socket might be less than that of its clients if these break their
+ * association with it or if the server socket is closed while clients
+ * are still connected to it and there's no way to inform "a polling
+ * implementation" that it should let go of a certain wait queue
+ *
+ * In order to propagate a wake up, a wait_queue_t of the client
+ * socket is enqueued on the peer_wait queue of the server socket
+ * whose wake function does a wake_up on the ordinary client socket
+ * wait queue. This connection is established whenever a write (or
+ * poll for write) hit the flow control condition and broken when the
+ * association to the server socket is dissolved or after a wake up
+ * was relayed.
+ */
+
+static int unix_dgram_peer_wake_relay(wait_queue_t *q, unsigned mode, int flags,
+ void *key)
+{
+ struct unix_sock *u;
+ wait_queue_head_t *u_sleep;
+
+ u = container_of(q, struct unix_sock, peer_wake);
+
+ __remove_wait_queue(&unix_sk(u->peer_wake.private)->peer_wait,
+ q);
+ u->peer_wake.private = NULL;
+
+ /* relaying can only happen while the wq still exists */
+ u_sleep = sk_sleep(&u->sk);
+ if (u_sleep)
+ wake_up_interruptible_poll(u_sleep, key);
+
+ return 0;
+}
+
+static int unix_dgram_peer_wake_connect(struct sock *sk, struct sock *other)
+{
+ struct unix_sock *u, *u_other;
+ int rc;
+
+ u = unix_sk(sk);
+ u_other = unix_sk(other);
+ rc = 0;
+ spin_lock(&u_other->peer_wait.lock);
+
+ if (!u->peer_wake.private) {
+ u->peer_wake.private = other;
+ __add_wait_queue(&u_other->peer_wait, &u->peer_wake);
+
+ rc = 1;
+ }
+
+ spin_unlock(&u_other->peer_wait.lock);
+ return rc;
+}
+
+static void unix_dgram_peer_wake_disconnect(struct sock *sk,
+ struct sock *other)
+{
+ struct unix_sock *u, *u_other;
+
+ u = unix_sk(sk);
+ u_other = unix_sk(other);
+ spin_lock(&u_other->peer_wait.lock);
+
+ if (u->peer_wake.private == other) {
+ __remove_wait_queue(&u_other->peer_wait, &u->peer_wake);
+ u->peer_wake.private = NULL;
+ }
+
+ spin_unlock(&u_other->peer_wait.lock);
+}
+
+static void unix_dgram_peer_wake_disconnect_wakeup(struct sock *sk,
+ struct sock *other)
+{
+ unix_dgram_peer_wake_disconnect(sk, other);
+ wake_up_interruptible_poll(sk_sleep(sk),
+ POLLOUT |
+ POLLWRNORM |
+ POLLWRBAND);
+}
+
+/* preconditions:
+ * - unix_peer(sk) == other
+ * - association is stable
+ */
+static int unix_dgram_peer_wake_me(struct sock *sk, struct sock *other)
+{
+ int connected;
+
+ connected = unix_dgram_peer_wake_connect(sk, other);
+
+ if (unix_recvq_full(other))
+ return 1;
+
+ if (connected)
+ unix_dgram_peer_wake_disconnect(sk, other);
+
+ return 0;
+}
+
static inline int unix_writable(struct sock *sk)
{
return (atomic_read(&sk->sk_wmem_alloc) << 2) <= sk->sk_sndbuf;
@@ -420,6 +532,8 @@ static void unix_release_sock(struct soc
skpair->sk_state_change(skpair);
sk_wake_async(skpair, SOCK_WAKE_WAITD, POLL_HUP);
}
+
+ unix_dgram_peer_wake_disconnect(sk, skpair);
sock_put(skpair); /* It may now die */
unix_peer(sk) = NULL;
}
@@ -653,6 +767,7 @@ static struct sock *unix_create1(struct
INIT_LIST_HEAD(&u->link);
mutex_init(&u->readlock); /* single task reading lock */
init_waitqueue_head(&u->peer_wait);
+ init_waitqueue_func_entry(&u->peer_wake, unix_dgram_peer_wake_relay);
unix_insert_socket(unix_sockets_unbound(sk), sk);
out:
if (sk == NULL)
@@ -1020,6 +1135,8 @@ restart:
if (unix_peer(sk)) {
struct sock *old_peer = unix_peer(sk);
unix_peer(sk) = other;
+ unix_dgram_peer_wake_disconnect_wakeup(sk, old_peer);
+
unix_state_double_unlock(sk, other);
if (other != old_peer)
@@ -1459,6 +1576,7 @@ static int unix_dgram_sendmsg(struct kio
struct scm_cookie tmp_scm;
int max_level;
int data_len = 0;
+ int sk_locked;
if (NULL == siocb->scm)
siocb->scm = &tmp_scm;
@@ -1536,12 +1654,14 @@ restart:
goto out_free;
}
+ sk_locked = 0;
unix_state_lock(other);
+restart_locked:
err = -EPERM;
if (!unix_may_send(sk, other))
goto out_unlock;
- if (sock_flag(other, SOCK_DEAD)) {
+ if (unlikely(sock_flag(other, SOCK_DEAD))) {
/*
* Check with 1003.1g - what should
* datagram error
@@ -1549,10 +1669,14 @@ restart:
unix_state_unlock(other);
sock_put(other);
+ if (!sk_locked)
+ unix_state_lock(sk);
+
err = 0;
- unix_state_lock(sk);
if (unix_peer(sk) == other) {
unix_peer(sk) = NULL;
+ unix_dgram_peer_wake_disconnect_wakeup(sk, other);
+
unix_state_unlock(sk);
unix_dgram_disconnected(sk, other);
@@ -1578,21 +1702,38 @@ restart:
goto out_unlock;
}
- if (unix_peer(other) != sk && unix_recvq_full(other)) {
- if (!timeo) {
- err = -EAGAIN;
- goto out_unlock;
+ if (unlikely(unix_peer(other) != sk && unix_recvq_full(other))) {
+ if (timeo) {
+ timeo = unix_wait_for_peer(other, timeo);
+
+ err = sock_intr_errno(timeo);
+ if (signal_pending(current))
+ goto out_free;
+
+ goto restart;
}
- timeo = unix_wait_for_peer(other, timeo);
+ if (!sk_locked) {
+ unix_state_unlock(other);
+ unix_state_double_lock(sk, other);
+ }
- err = sock_intr_errno(timeo);
- if (signal_pending(current))
- goto out_free;
+ if (unix_peer(sk) != other ||
+ unix_dgram_peer_wake_me(sk, other)) {
+ err = -EAGAIN;
+ sk_locked = 1;
+ goto out_unlock;
+ }
- goto restart;
+ if (!sk_locked) {
+ sk_locked = 1;
+ goto restart_locked;
+ }
}
+ if (unlikely(sk_locked))
+ unix_state_unlock(sk);
+
if (sock_flag(other, SOCK_RCVTSTAMP))
__net_timestamp(skb);
maybe_add_creds(skb, sock, other);
@@ -1606,6 +1747,8 @@ restart:
return len;
out_unlock:
+ if (sk_locked)
+ unix_state_unlock(sk);
unix_state_unlock(other);
out_free:
kfree_skb(skb);
@@ -2263,14 +2406,16 @@ static unsigned int unix_dgram_poll(stru
return mask;
writable = unix_writable(sk);
- other = unix_peer_get(sk);
- if (other) {
- if (unix_peer(other) != sk) {
- sock_poll_wait(file, &unix_sk(other)->peer_wait, wait);
- if (unix_recvq_full(other))
- writable = 0;
- }
- sock_put(other);
+ if (writable) {
+ unix_state_lock(sk);
+
+ other = unix_peer(sk);
+ if (other && unix_peer(other) != sk &&
+ unix_recvq_full(other) &&
+ unix_dgram_peer_wake_me(sk, other))
+ writable = 0;
+
+ unix_state_unlock(sk);
}
if (writable)
next prev parent reply other threads:[~2016-01-20 22:13 UTC|newest]
Thread overview: 48+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-01-20 22:00 [PATCH 3.14 00/47] 3.14.59-stable review Greg Kroah-Hartman
2016-01-20 22:00 ` Greg Kroah-Hartman [this message]
2016-01-20 22:00 ` [PATCH 3.14 02/47] tools/net: Use include/uapi with __EXPORTED_HEADERS__ Greg Kroah-Hartman
2016-01-20 22:00 ` [PATCH 3.14 03/47] packet: do skb_probe_transport_header when we actually have data Greg Kroah-Hartman
2016-01-20 22:00 ` [PATCH 3.14 04/47] packet: always probe for transport header Greg Kroah-Hartman
2016-01-20 22:00 ` [PATCH 3.14 05/47] packet: infer protocol from ethernet header if unset Greg Kroah-Hartman
2016-01-20 22:00 ` [PATCH 3.14 06/47] sctp: translate host order to network order when setting a hmacid Greg Kroah-Hartman
2016-01-20 22:00 ` [PATCH 3.14 07/47] ip_tunnel: disable preemption when updating per-cpu tstats Greg Kroah-Hartman
2016-01-20 22:00 ` [PATCH 3.14 08/47] snmp: Remove duplicate OUTMCAST stat increment Greg Kroah-Hartman
2016-01-20 22:00 ` [PATCH 3.14 10/47] tcp: md5: fix lockdep annotation Greg Kroah-Hartman
2016-01-20 22:00 ` [PATCH 3.14 11/47] tcp: initialize tp->copied_seq in case of cross SYN connection Greg Kroah-Hartman
2016-01-20 22:00 ` [PATCH 3.14 12/47] net, scm: fix PaX detected msg_controllen overflow in scm_detach_fds Greg Kroah-Hartman
2016-01-20 22:00 ` [PATCH 3.14 13/47] net: ipmr: fix static mfc/dev leaks on table destruction Greg Kroah-Hartman
2016-01-20 22:00 ` [PATCH 3.14 14/47] net: ip6mr: " Greg Kroah-Hartman
2016-01-20 22:00 ` [PATCH 3.14 15/47] broadcom: fix PHY_ID_BCM5481 entry in the id table Greg Kroah-Hartman
2016-01-20 22:00 ` [PATCH 3.14 16/47] ipv6: distinguish frag queues by device for multicast and link-local packets Greg Kroah-Hartman
2016-01-20 22:00 ` [PATCH 3.14 17/47] ipv6: sctp: implement sctp_v6_destroy_sock() Greg Kroah-Hartman
2016-01-20 22:00 ` [PATCH 3.14 18/47] Btrfs: fix race leading to incorrect item deletion when dropping extents Greg Kroah-Hartman
2016-01-20 22:00 ` [PATCH 3.14 19/47] Btrfs: fix race leading to BUG_ON when running delalloc for nodatacow Greg Kroah-Hartman
2016-01-20 22:00 ` [PATCH 3.14 20/47] ext4: fix potential use after free in __ext4_journal_stop Greg Kroah-Hartman
2016-01-20 22:00 ` [PATCH 3.14 21/47] ext4, jbd2: ensure entering into panic after recording an error in superblock Greg Kroah-Hartman
2016-01-20 22:00 ` [PATCH 3.14 22/47] firewire: ohci: fix JMicron JMB38x IT context discovery Greg Kroah-Hartman
2016-01-20 22:00 ` [PATCH 3.14 23/47] nfs4: start callback_ident at idr 1 Greg Kroah-Hartman
2016-01-20 22:00 ` [PATCH 3.14 24/47] nfs: if we have no valid attrs, then dont declare the attribute cache valid Greg Kroah-Hartman
2016-01-20 22:00 ` [PATCH 3.14 25/47] ocfs2: fix umask ignored issue Greg Kroah-Hartman
2016-01-20 22:00 ` [PATCH 3.14 26/47] USB: cdc_acm: Ignore Infineon Flash Loader utility Greg Kroah-Hartman
2016-01-20 22:00 ` [PATCH 3.14 27/47] USB: serial: Another Infineon flash loader USB ID Greg Kroah-Hartman
2016-01-20 22:01 ` [PATCH 3.14 28/47] USB: cp210x: Remove CP2110 ID from compatibility list Greg Kroah-Hartman
2016-01-20 22:01 ` [PATCH 3.14 29/47] USB: add quirk for devices with broken LPM Greg Kroah-Hartman
2016-01-20 22:01 ` [PATCH 3.14 30/47] USB: whci-hcd: add check for dma mapping error Greg Kroah-Hartman
2016-01-20 22:01 ` [PATCH 3.14 31/47] usb: Use the USB_SS_MULT() macro to decode burst multiplier for log message Greg Kroah-Hartman
2016-01-20 22:01 ` [PATCH 3.14 32/47] gre6: allow to update all parameters via rtnl Greg Kroah-Hartman
2016-01-20 22:01 ` [PATCH 3.14 33/47] atl1c: Improve driver not to do order 4 GFP_ATOMIC allocation Greg Kroah-Hartman
2016-01-20 22:01 ` [PATCH 3.14 34/47] sctp: use the same clock as if sock source timestamps were on Greg Kroah-Hartman
2016-01-20 22:01 ` [PATCH 3.14 35/47] sctp: update the netstamp_needed counter when copying sockets Greg Kroah-Hartman
2016-01-20 22:01 ` [PATCH 3.14 36/47] ipv6: sctp: clone options to avoid use after free Greg Kroah-Hartman
2016-01-20 22:01 ` [PATCH 3.14 38/47] sh_eth: fix kernel oops in skb_put() Greg Kroah-Hartman
2016-01-20 22:01 ` [PATCH 3.14 39/47] vlan: Fix untag operations of stacked vlans with REORDER_HEADER off Greg Kroah-Hartman
2016-01-20 22:01 ` [PATCH 3.14 40/47] skbuff: Fix offset error in skb_reorder_vlan_header Greg Kroah-Hartman
2016-01-20 22:01 ` [PATCH 3.14 41/47] pptp: verify sockaddr_len in pptp_bind() and pptp_connect() Greg Kroah-Hartman
2016-01-20 22:01 ` [PATCH 3.14 42/47] bluetooth: Validate socket address length in sco_sock_bind() Greg Kroah-Hartman
2016-01-20 22:01 ` [PATCH 3.14 43/47] af_unix: Revert lock_interruptible in stream receive code Greg Kroah-Hartman
2016-01-20 22:01 ` [PATCH 3.14 44/47] KEYS: Fix race between key destruction and finding a keyring by name Greg Kroah-Hartman
2016-01-20 22:01 ` [PATCH 3.14 45/47] KEYS: Fix crash when attempt to garbage collect an uninstantiated keyring Greg Kroah-Hartman
2016-01-20 22:01 ` [PATCH 3.14 46/47] KEYS: Fix race between read and revoke Greg Kroah-Hartman
2016-01-20 22:01 ` [PATCH 3.14 47/47] KEYS: Fix keyring ref leak in join_session_keyring() Greg Kroah-Hartman
2016-01-20 23:15 ` [PATCH 3.14 00/47] 3.14.59-stable review Shuah Khan
2016-01-21 12:21 ` Guenter Roeck
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20160120215507.817355776@linuxfoundation.org \
--to=gregkh@linuxfoundation.org \
--cc=davem@davemloft.net \
--cc=jbaron@akamai.com \
--cc=linux-kernel@vger.kernel.org \
--cc=rweikusat@mobileactivedefense.com \
--cc=stable@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).