All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jakub Sitnicki <jakub@cloudflare.com>
To: John Fastabend <john.fastabend@gmail.com>
Cc: bpf@vger.kernel.org, netdev@vger.kernel.org,
	yangyingliang@huawei.com, martin.lau@kernel.org
Subject: Re: [PATCH bpf 1/2] bpf: sockmap, af_unix sockets need to hold ref for pair sock
Date: Fri, 27 Oct 2023 15:32:15 +0200	[thread overview]
Message-ID: <87o7gk18ja.fsf@cloudflare.com> (raw)
In-Reply-To: <65383999941f3_1969a2083e@john.notmuch>

On Tue, Oct 24, 2023 at 02:39 PM -07, John Fastabend wrote:
> Jakub Sitnicki wrote:
>> On Mon, Oct 16, 2023 at 12:08 PM -07, John Fastabend wrote:
>> > AF_UNIX sockets are a paired socket. So sending on one of the pairs
>> > will lookup the paired socket as part of the send operation. It is
>> > possible however to put just one of the pairs in a BPF map. This
>> > currently increments the refcnt on the sock in the sockmap to
>> > ensure it is not free'd by the stack before sockmap cleans up its
>> > state and stops any skbs being sent/recv'd to that socket.
>> >
>> > But we missed a case. If the peer socket is closed it will be
>> > free'd by the stack. However, the paired socket can still be
>> > referenced from BPF sockmap side because we hold a reference
>> > there. Then if we are sending traffic through BPF sockmap to
>> > that socket it will try to dereference the free'd pair in its
>> > send logic creating a use after free.  And following splat,
>> >
>> >    [59.900375] BUG: KASAN: slab-use-after-free in sk_wake_async+0x31/0x1b0
>> >    [59.901211] Read of size 8 at addr ffff88811acbf060 by task kworker/1:2/954
>> >    [...]
>> >    [59.905468] Call Trace:
>> >    [59.905787]  <TASK>
>> >    [59.906066]  dump_stack_lvl+0x130/0x1d0
>> >    [59.908877]  print_report+0x16f/0x740
>> >    [59.910629]  kasan_report+0x118/0x160
>> >    [59.912576]  sk_wake_async+0x31/0x1b0
>> >    [59.913554]  sock_def_readable+0x156/0x2a0
>> >    [59.914060]  unix_stream_sendmsg+0x3f9/0x12a0
>> >    [59.916398]  sock_sendmsg+0x20e/0x250
>> >    [59.916854]  skb_send_sock+0x236/0xac0
>> >    [59.920527]  sk_psock_backlog+0x287/0xaa0
>> 
>> Isn't the problem here that unix_stream_sendmsg doesn't grab a ref to
>> peer sock? Unlike unix_dgram_sendmsg which uses the unix_peer_get
>> helper.
>
> It does by my read. In unix_stream_connect we have,
>
> 	sock_hold(sk);
> 	unix_peer(newsk)	= sk;
> 	newsk->sk_state		= TCP_ESTABLISHED;
>
> where it assigns the peer sock. unix_dgram_connect() also calls
> sock_hold() but through the path that does the socket lookup, such as
> unix_find_other().
>
> The problem I see is before the socket does the kfree on the
> sock we need to be sure the backlog is canceled and the skb list
> ingress_skb is purged. If we don't ensure this then the redirect
> will 
>
> My model is this,
>
>          s1            c1
> refcnt    1             1
> connect   2             2
> psock     3             3
> send(s1) ...
> close(s1) 2             1 <- close drops psock count also
> close(c1) 0             0
>
> The important bit here is the psock has a refcnt on the
> underlying sock (psock->sk) and wont dec that until after
> cancel_delayed_work_sync() completes. This ensures the
> backlog wont try to sendmsg() on that sock after we free
> it. We also check for SOCK_DEAD and abort to avoid sending
> over a socket that has been marked DEAD.
>
> So... After close(s1) the only thing keeping that sock
> around is c1. Then we close(c1) that call path is
>
>  unix_release
>    close() 
>    unix_release_sock()
>      skpair = unix_peer(sk);
>      ...
>      sock_put(skpair);  <- trouble here
>
> The release will call sock_put() on the pair socket and
> dec it to 0 where it gets free'd through sk_free(). But
> now the trouble is we haven't waited for cancel_delayed_work_sync()
> on the c1 socket yet so backlog can still run. When it does
> run it may try to send a pkg over socket s1. OK right up until
> the sendmsg(s1, ...) does a peer lookup and derefs the peer
> socket. The peer socket was free'd earlier so use after free. 
>
> The question I had originally was this is odd, we are allowing
> a sendmsg(s1) over a socket while its in unix_release(). We
> used to take the sock lock from the backlog that was dropped
> in the name of performance, but it creates these races.
>
> Other fixes I considered. First adding sock lock back to
> backlog. But that punishes the UDP and TCP cases that don't
> have this problem. Set the SOCK_DEAD flag earlier or check
> later but this just makes the race smaller doesn't really
> eliminate it.
>
> So this patch is what I came up with. 

What I was getting at is that we could make it safe to call sendmsg on a
unix stream sock while its peer is being release. And not just for
sockmap. I expect io_uring might have the same problem. But I didn't
actually check yet.

For that we could keep a ref to peer for the duration of sendmsg call,
like unix dgram does. Then 'other' doesn't become a stale pointer before
we're done with it.

Bumping ref count on each sendmsg is not free, but maybe its
acceptable. Unix dgram sockets live with it.

With a patch like below, I'm no longer able to trigger an UAF splat.

WDYT?

---8<---

diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index 3e8a04a13668..48cf19ea9294 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -2198,7 +2198,7 @@ static int unix_stream_sendmsg(struct socket *sock, struct msghdr *msg,
 		goto out_err;
 	} else {
 		err = -ENOTCONN;
-		other = unix_peer(sk);
+		other = unix_peer_get(sk);
 		if (!other)
 			goto out_err;
 	}
@@ -2282,6 +2282,7 @@ static int unix_stream_sendmsg(struct socket *sock, struct msghdr *msg,
 	}
 #endif
 
+	sock_put(other);
 	scm_destroy(&scm);
 
 	return sent;
@@ -2294,6 +2295,8 @@ static int unix_stream_sendmsg(struct socket *sock, struct msghdr *msg,
 		send_sig(SIGPIPE, current, 0);
 	err = -EPIPE;
 out_err:
+	if (other)
+		sock_put(other);
 	scm_destroy(&scm);
 	return sent ? : err;
 }

  reply	other threads:[~2023-10-27 13:47 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-10-16 19:08 [PATCH bpf 0/2] sockmap fix for KASAN_VMALLOC and af_unix John Fastabend
2023-10-16 19:08 ` [PATCH bpf 1/2] bpf: sockmap, af_unix sockets need to hold ref for pair sock John Fastabend
2023-10-18 10:40   ` Jakub Sitnicki
2023-10-24 21:39     ` John Fastabend
2023-10-27 13:32       ` Jakub Sitnicki [this message]
2023-10-27 17:38         ` Kuniyuki Iwashima
2023-10-28  7:33           ` Jakub Sitnicki
2023-11-04  3:38             ` John Fastabend
2023-11-06 10:15               ` Jakub Sitnicki
2023-11-06 12:35   ` Jakub Sitnicki
2023-11-20 21:13     ` Martin KaFai Lau
2023-11-21 20:40       ` John Fastabend
2023-11-22 19:26         ` John Fastabend
2023-10-16 19:08 ` [PATCH bpf 2/2] bpf: sockmap, add af_unix test with both sockets in map John Fastabend
2023-11-06 12:44   ` Jakub Sitnicki
2023-11-06 14:42     ` Jakub Sitnicki

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87o7gk18ja.fsf@cloudflare.com \
    --to=jakub@cloudflare.com \
    --cc=bpf@vger.kernel.org \
    --cc=john.fastabend@gmail.com \
    --cc=martin.lau@kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=yangyingliang@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.