From: Jakub Sitnicki <jakub@cloudflare.com>
To: Martin Lau <kafai@fb.com>
Cc: "bpf@vger.kernel.org" <bpf@vger.kernel.org>,
 "netdev@vger.kernel.org" <netdev@vger.kernel.org>,
 "kernel-team@cloudflare.com" <kernel-team@cloudflare.com>,
 John Fastabend <john.fastabend@gmail.com>
Subject: Re: [PATCH bpf-next 4/8] bpf, sockmap: Don't let child socket inherit psock or its ops on copy
Date: Tue, 26 Nov 2019 19:36:09 +0100
Message-ID: <87d0deo57q.fsf@cloudflare.com>
In-Reply-To: <20191126171607.pzrg5qhbavh7enwh@kafai-mbp.dhcp.thefacebook.com>

On Tue, Nov 26, 2019 at 06:16 PM CET, Martin Lau wrote:
> On Tue, Nov 26, 2019 at 04:54:33PM +0100, Jakub Sitnicki wrote:
>> On Mon, Nov 25, 2019 at 11:38 PM CET, Martin Lau wrote:
>> > On Sat, Nov 23, 2019 at 12:07:47PM +0100, Jakub Sitnicki wrote:
>> > [ ... ]
>> >
>> >> @@ -370,6 +378,11 @@ static inline void sk_psock_restore_proto(struct sock *sk,
>> >>  		sk->sk_prot = psock->sk_proto;
>> >>  		psock->sk_proto = NULL;
>> >>  	}
>> >> +
>> >> +	if (psock->icsk_af_ops) {
>> >> +		icsk->icsk_af_ops = psock->icsk_af_ops;
>> >> +		psock->icsk_af_ops = NULL;
>> >> +	}
>> >>  }
>> >
>> > [ ... ]
>> >
>> >> +static struct sock *tcp_bpf_syn_recv_sock(const struct sock *sk,
>> >> +					  struct sk_buff *skb,
>> >> +					  struct request_sock *req,
>> >> +					  struct dst_entry *dst,
>> >> +					  struct request_sock *req_unhash,
>> >> +					  bool *own_req)
>> >> +{
>> >> +	const struct inet_connection_sock_af_ops *ops;
>> >> +	void (*write_space)(struct sock *sk);
>> >> +	struct sk_psock *psock;
>> >> +	struct proto *proto;
>> >> +	struct sock *child;
>> >> +
>> >> +	rcu_read_lock();
>> >> +	psock = sk_psock(sk);
>> >> +	if (likely(psock)) {
>> >> +		proto = psock->sk_proto;
>> >> +		write_space = psock->saved_write_space;
>> >> +		ops = psock->icsk_af_ops;
>> > It is not immediately clear to me what ensures
>> > ops is not NULL here.
>> >
>> > It is likely I missed something. A short comment would
>> > be very useful here.
>>
>> I can see the readability problem. Looking at it now, perhaps it should
>> be rewritten, to the same effect, as:
>>
>> static struct sock *tcp_bpf_syn_recv_sock(...)
>> {
>> 	const struct inet_connection_sock_af_ops *ops = NULL;
>> 	...
>>
>> 	rcu_read_lock();
>> 	psock = sk_psock(sk);
>> 	if (likely(psock)) {
>> 		proto = psock->sk_proto;
>> 		write_space = psock->saved_write_space;
>> 		ops = psock->icsk_af_ops;
>> 	}
>> 	rcu_read_unlock();
>>
>> 	if (!ops)
>> 		ops = inet_csk(sk)->icsk_af_ops;
>> 	child = ops->syn_recv_sock(sk, skb, req, dst, req_unhash, own_req);
>>
>> If psock->icsk_af_ops were NULL, it would mean we haven't initialized it
>> properly. To double check what happens here:
> I did not mean the init path. The init path is fine since it inits
> everything on psock before publishing the sk to the sock_map.
>
> I was thinking of the delete path (e.g. sock_map_delete_elem). It is not
> clear to me what prevents the earlier pasted sk_psock_restore_proto(),
> which sets psock->icsk_af_ops to NULL, from running in parallel with
> tcp_bpf_syn_recv_sock(). An explanation would be useful.
Ah, I misunderstood. Nothing prevents the race, AFAIK.

Setting psock->icsk_af_ops to null on restore and not checking for it
here was a bad move on my side. Also I need to revisit what to do about
psock->sk_proto so the child socket doesn't end up with null sk_proto.

This race should be easy enough to trigger. Will give it a shot.

Thank you for bringing this up,
Jakub
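
For readers following the thread, the hazard described above can be modelled in userspace. What follows is only a minimal sketch, not the fix that was proposed or merged: `struct af_ops`, `struct psock_model`, `restore_ops()` and `pick_ops()` are hypothetical stand-ins for the kernel structures, and C11 atomics stand in for whatever synchronization the kernel code would actually use. The point it illustrates is that the reader loads the saved pointer exactly once and falls back to the socket's current ops if a concurrent restore has already cleared the slot.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for struct inet_connection_sock_af_ops. */
struct af_ops {
	const char *name;
};

/* Hypothetical stand-in for the psock field discussed in the thread. */
struct psock_model {
	_Atomic(const struct af_ops *) saved_ops;
};

/* Models the restore path: hand back the saved ops and clear the slot. */
static const struct af_ops *restore_ops(struct psock_model *psock)
{
	return atomic_exchange(&psock->saved_ops, (const struct af_ops *)NULL);
}

/*
 * Models the reader: load the saved pointer once, then fall back to the
 * socket's current ops when there is no psock or the slot was cleared.
 */
static const struct af_ops *pick_ops(struct psock_model *psock,
				     const struct af_ops *current_ops)
{
	const struct af_ops *ops = NULL;

	if (psock)
		ops = atomic_load(&psock->saved_ops);

	return ops ? ops : current_ops;
}
```

Because the pointer is read into a local before it is tested, the reader never dereferences a value that was cleared between the check and the use. Whether this is sufficient in the kernel also depends on the ordering of the two stores in the restore path, which this model does not cover.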
>
>>
>> In sock_map_link we do a setup dance where we first create the psock and
>> later initialize the socket callbacks (tcp_bpf_init).
>>
>> static int sock_map_link(struct bpf_map *map, struct sk_psock_progs *progs,
>> 			 struct sock *sk)
>> {
>> 	...
>> 	if (psock) {
>> 		...
>> 	} else {
>> 		psock = sk_psock_init(sk, map->numa_node);
>> 		if (!psock) {
>> 			ret = -ENOMEM;
>> 			goto out_progs;
>> 		}
>> 		sk_psock_is_new = true;
>> 	}
>> 	...
>> 	if (sk_psock_is_new) {
>> 		ret = tcp_bpf_init(sk);
>> 		if (ret < 0)
>> 			goto out_drop;
>> 	} else {
>> 		tcp_bpf_reinit(sk);
>> 	}
>>
>> The "if (sk_psock_is_new)" branch triggers the call chain that leads to
>> saving & overriding socket callbacks.
>>
>> tcp_bpf_init -> tcp_bpf_update_sk_prot -> sk_psock_update_proto
>>
>> Among them, icsk_af_ops.
>>
>> static inline void sk_psock_update_proto(...)
>> {
>> 	...
>> 	psock->icsk_af_ops = icsk->icsk_af_ops;
>> 	icsk->icsk_af_ops = af_ops;
>> }
>>
>> Goes without saying that a comment is needed.
>>
>> Thanks for the feedback,
>> Jakub
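
The save-and-override step quoted above, together with its inverse from the sk_psock_restore_proto() hunk, can be sketched as a pair of plain C helpers. This is an illustrative userspace model under stated assumptions: `struct conn_sock`, `struct saved_state`, `model_update_proto()` and `model_restore_proto()` are hypothetical names, and no locking or RCU is modelled.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct inet_connection_sock_af_ops. */
struct af_ops {
	int id;
};

/* Models the socket side: icsk->icsk_af_ops. */
struct conn_sock {
	const struct af_ops *af_ops;
};

/* Models the psock side: psock->icsk_af_ops. */
struct saved_state {
	const struct af_ops *af_ops;
};

/* Save the socket's current ops, then install the override. */
static void model_update_proto(struct conn_sock *sk, struct saved_state *psock,
			       const struct af_ops *override)
{
	psock->af_ops = sk->af_ops;
	sk->af_ops = override;
}

/* Put the saved ops back and clear the saved slot, as in the quoted hunk. */
static void model_restore_proto(struct conn_sock *sk, struct saved_state *psock)
{
	if (psock->af_ops) {
		sk->af_ops = psock->af_ops;
		psock->af_ops = NULL;
	}
}
```

After the restore the saved slot is NULL, which is exactly the state a concurrent reader of the saved pointer has to be prepared for.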