From: Jakub Sitnicki <jakub@cloudflare.com>
To: Jiang Wang <jiang.wang@bytedance.com>
Cc: netdev@vger.kernel.org, cong.wang@bytedance.com,
duanxiongchun@bytedance.com, xieyongji@bytedance.com,
chaiwen.cc@bytedance.com, "David S. Miller" <davem@davemloft.net>,
Jakub Kicinski <kuba@kernel.org>,
John Fastabend <john.fastabend@gmail.com>,
Daniel Borkmann <daniel@iogearbox.net>,
Lorenz Bauer <lmb@cloudflare.com>,
Alexei Starovoitov <ast@kernel.org>,
Andrii Nakryiko <andrii@kernel.org>,
Martin KaFai Lau <kafai@fb.com>, Song Liu <songliubraving@fb.com>,
Yonghong Song <yhs@fb.com>, KP Singh <kpsingh@kernel.org>,
Shuah Khan <shuah@kernel.org>,
Johan Almbladh <johan.almbladh@anyfinetworks.com>,
linux-kernel@vger.kernel.org, bpf@vger.kernel.org,
linux-kselftest@vger.kernel.org
Subject: Re: [PATCH bpf-next v3 2/5] af_unix: add unix_stream_proto for sockmap
Date: Wed, 04 Aug 2021 18:59:35 +0200 [thread overview]
Message-ID: <87zgtxcfig.fsf@cloudflare.com> (raw)
In-Reply-To: <20210802211912.116329-3-jiang.wang@bytedance.com>
On Mon, Aug 02, 2021 at 11:19 PM CEST, Jiang Wang wrote:
[...]
> diff --git a/net/core/sock_map.c b/net/core/sock_map.c
> index ae5fa4338..42f50ea7a 100644
> --- a/net/core/sock_map.c
> +++ b/net/core/sock_map.c
> @@ -517,9 +517,15 @@ static bool sk_is_tcp(const struct sock *sk)
> sk->sk_protocol == IPPROTO_TCP;
> }
>
> +static bool sk_is_unix_stream(const struct sock *sk)
> +{
> + return sk->sk_type == SOCK_STREAM &&
> + sk->sk_protocol == PF_UNIX;
> +}
> +
> static bool sock_map_redirect_allowed(const struct sock *sk)
> {
> - if (sk_is_tcp(sk))
> + if (sk_is_tcp(sk) || sk_is_unix_stream(sk))
> return sk->sk_state != TCP_LISTEN;
> else
> return sk->sk_state == TCP_ESTABLISHED;
Let me provide some context.
The reason why we check != TCP_LISTEN for TCP sockets is that we want to
allow redirect redirect to sockets that are about to transition from
TCP_SYN_RECV to TCP_ESTABLISHED, in addition to sockets already in
TCP_ESTABLISHED state.
That's because BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB callback happens when
socket is still in TCP_SYN_RECV state. With BPF sockops program, we can
insert such socket into a sockmap. Hence, there is a short window of
opportunity when we could redirect to a socket in TCP_SYN_RECV.
UNIX sockets can be only in TCP_{CLOSE,LISTEN,ESTABLISHED} state,
AFAIK. So it is sufficient to rely on the default == TCP_ESTABLISHED
check.
> diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
> index 0ae3fc4c8..9c1711c67 100644
> --- a/net/unix/af_unix.c
> +++ b/net/unix/af_unix.c
> @@ -791,17 +791,35 @@ static void unix_close(struct sock *sk, long timeout)
> */
> }
>
> -struct proto unix_proto = {
> - .name = "UNIX",
> +static void unix_unhash(struct sock *sk)
> +{
> + /* Nothing to do here, unix socket does not need a ->unhash().
> + * This is merely for sockmap.
> + */
> +}
> +
> +struct proto unix_dgram_proto = {
> + .name = "UNIX-DGRAM",
> + .owner = THIS_MODULE,
> + .obj_size = sizeof(struct unix_sock),
> + .close = unix_close,
> +#ifdef CONFIG_BPF_SYSCALL
> + .psock_update_sk_prot = unix_dgram_bpf_update_proto,
> +#endif
> +};
> +
> +struct proto unix_stream_proto = {
> + .name = "UNIX-STREAM",
> .owner = THIS_MODULE,
> .obj_size = sizeof(struct unix_sock),
> .close = unix_close,
> + .unhash = unix_unhash,
> #ifdef CONFIG_BPF_SYSCALL
> - .psock_update_sk_prot = unix_bpf_update_proto,
> + .psock_update_sk_prot = unix_stream_bpf_update_proto,
> #endif
> };
>
> -static struct sock *unix_create1(struct net *net, struct socket *sock, int kern)
> +static struct sock *unix_create1(struct net *net, struct socket *sock, int kern, int type)
> {
> struct sock *sk = NULL;
> struct unix_sock *u;
> @@ -810,7 +828,11 @@ static struct sock *unix_create1(struct net *net, struct socket *sock, int kern)
> if (atomic_long_read(&unix_nr_socks) > 2 * get_max_files())
> goto out;
>
> - sk = sk_alloc(net, PF_UNIX, GFP_KERNEL, &unix_proto, kern);
> + if (type == SOCK_STREAM)
> + sk = sk_alloc(net, PF_UNIX, GFP_KERNEL, &unix_stream_proto, kern);
> + else /*dgram and seqpacket */
> + sk = sk_alloc(net, PF_UNIX, GFP_KERNEL, &unix_dgram_proto, kern);
> +
> if (!sk)
> goto out;
>
> @@ -872,7 +894,7 @@ static int unix_create(struct net *net, struct socket *sock, int protocol,
> return -ESOCKTNOSUPPORT;
> }
>
> - return unix_create1(net, sock, kern) ? 0 : -ENOMEM;
> + return unix_create1(net, sock, kern, sock->type) ? 0 : -ENOMEM;
> }
>
> static int unix_release(struct socket *sock)
> @@ -1286,7 +1308,7 @@ static int unix_stream_connect(struct socket *sock, struct sockaddr *uaddr,
> err = -ENOMEM;
>
> /* create new sock for complete connection */
> - newsk = unix_create1(sock_net(sk), NULL, 0);
> + newsk = unix_create1(sock_net(sk), NULL, 0, sock->type);
> if (newsk == NULL)
> goto out;
>
> @@ -2214,7 +2236,7 @@ static int unix_dgram_recvmsg(struct socket *sock, struct msghdr *msg, size_t si
> struct sock *sk = sock->sk;
>
> #ifdef CONFIG_BPF_SYSCALL
> - if (sk->sk_prot != &unix_proto)
> + if (sk->sk_prot != &unix_dgram_proto)
> return sk->sk_prot->recvmsg(sk, msg, size, flags & MSG_DONTWAIT,
> flags & ~MSG_DONTWAIT, NULL);
> #endif
KASAN might be unhappy about access to sk->sk_prot not annotated with
READ_ONCE. In unix_bpf we have WRITE_ONCE(sk->sk_prot, ...) [1]
[...]
[1] https://github.com/google/ktsan/wiki/READ_ONCE-and-WRITE_ONCE#why-kernel-code-should-use-read_once-and-write_once-for-shared-memory-accesses
next prev parent reply other threads:[~2021-08-04 16:59 UTC|newest]
Thread overview: 11+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-08-02 21:19 [PATCH bpf-next v3 0/5] sockmap: add sockmap support for unix stream socket Jiang Wang
2021-08-02 21:19 ` [PATCH bpf-next v3 1/5] af_unix: add read_sock for stream socket types Jiang Wang
2021-08-02 21:19 ` [PATCH bpf-next v3 2/5] af_unix: add unix_stream_proto for sockmap Jiang Wang
2021-08-03 0:24 ` Jiang Wang .
2021-08-03 0:44 ` kernel test robot
2021-08-03 0:44 ` kernel test robot
2021-08-04 16:59 ` Jakub Sitnicki [this message]
2021-08-04 17:20 ` Jiang Wang .
2021-08-02 21:19 ` [PATCH bpf-next v3 3/5] selftest/bpf: add tests for sockmap with unix stream type Jiang Wang
2021-08-02 21:19 ` [PATCH bpf-next v3 4/5] selftest/bpf: change udp to inet in some function names Jiang Wang
2021-08-02 21:19 ` [PATCH bpf-next v3 5/5] selftest/bpf: add new tests in sockmap for unix stream to tcp Jiang Wang
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87zgtxcfig.fsf@cloudflare.com \
--to=jakub@cloudflare.com \
--cc=andrii@kernel.org \
--cc=ast@kernel.org \
--cc=bpf@vger.kernel.org \
--cc=chaiwen.cc@bytedance.com \
--cc=cong.wang@bytedance.com \
--cc=daniel@iogearbox.net \
--cc=davem@davemloft.net \
--cc=duanxiongchun@bytedance.com \
--cc=jiang.wang@bytedance.com \
--cc=johan.almbladh@anyfinetworks.com \
--cc=john.fastabend@gmail.com \
--cc=kafai@fb.com \
--cc=kpsingh@kernel.org \
--cc=kuba@kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-kselftest@vger.kernel.org \
--cc=lmb@cloudflare.com \
--cc=netdev@vger.kernel.org \
--cc=shuah@kernel.org \
--cc=songliubraving@fb.com \
--cc=xieyongji@bytedance.com \
--cc=yhs@fb.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.