From: Jakub Sitnicki <jakub@cloudflare.com>
To: John Fastabend <john.fastabend@gmail.com>
Cc: bpf@vger.kernel.org, netdev@vger.kernel.org,
daniel@iogearbox.net, ast@kernel.org, song@kernel.org,
jonathan.lemon@gmail.com
Subject: Re: [bpf PATCH v2 2/8] bpf: sockmap, ensure sock lock held during tear down
Date: Thu, 06 Feb 2020 13:26:30 +0100 [thread overview]
Message-ID: <871rr7oqa1.fsf@cloudflare.com> (raw)
In-Reply-To: <5e3ba96ca7889_6b512aafe4b145b812@john-XPS-13-9370.notmuch>
On Thu, Feb 06, 2020 at 06:51 AM CET, John Fastabend wrote:
> Jakub Sitnicki wrote:
>> On Sat, Jan 11, 2020 at 07:12 AM CET, John Fastabend wrote:
>> > The sock_map_free() and sock_hash_free() paths used to delete sockmap
>> > and sockhash maps walk the maps and destroy psock and bpf state associated
>> > with the socks in the map. When done the socks no longer have BPF programs
>> > attached and will function normally. This can happen while the socks in
>> > the map are still "live" meaning data may be sent/received during the walk.
>> >
>> > Currently, though, we don't take the sock lock when the psock and bpf
>> > state is removed through this path. Specifically, this means we can be
>> > writing into the ops structure pointers such as sendmsg, sendpage,
>> > recvmsg, etc. while they are also being called from the networking
>> > side. This is not safe; we never used proper READ_ONCE/WRITE_ONCE
>> > semantics here if we believed it was safe. Further, it's not clear to
>> > me it's even a good idea to try to do this on "live" sockets while the
>> > networking side might also be using the socket. Instead of trying to
>> > reason about using the socks from both sides, let's recognize that
>> > every use case I'm aware of rarely deletes maps; in fact, the
>> > Kubernetes/Cilium case builds the map at init and never tears it down
>> > except on errors. So let's do the simple fix and grab the sock lock.
>> >
>> > This patch wraps sock deletes from maps in sock lock and adds some
>> > annotations so we catch any other cases easier.
>> >
>> > Fixes: 604326b41a6fb ("bpf, sockmap: convert to generic sk_msg interface")
>> > Cc: stable@vger.kernel.org
>> > Acked-by: Song Liu <songliubraving@fb.com>
>> > Signed-off-by: John Fastabend <john.fastabend@gmail.com>
>> > ---
>> > net/core/skmsg.c | 2 ++
>> > net/core/sock_map.c | 7 ++++++-
>> > 2 files changed, 8 insertions(+), 1 deletion(-)
>> >
>> > diff --git a/net/core/skmsg.c b/net/core/skmsg.c
>> > index ded2d5227678..3866d7e20c07 100644
>> > --- a/net/core/skmsg.c
>> > +++ b/net/core/skmsg.c
>> > @@ -594,6 +594,8 @@ EXPORT_SYMBOL_GPL(sk_psock_destroy);
>> >
>> > void sk_psock_drop(struct sock *sk, struct sk_psock *psock)
>> > {
>> > + sock_owned_by_me(sk);
>> > +
>> > sk_psock_cork_free(psock);
>> > sk_psock_zap_ingress(psock);
>> >
>> > diff --git a/net/core/sock_map.c b/net/core/sock_map.c
>> > index eb114ee419b6..8998e356f423 100644
>> > --- a/net/core/sock_map.c
>> > +++ b/net/core/sock_map.c
>> > @@ -241,8 +241,11 @@ static void sock_map_free(struct bpf_map *map)
>> > struct sock *sk;
>> >
>> > sk = xchg(psk, NULL);
>> > - if (sk)
>> > + if (sk) {
>> > + lock_sock(sk);
>> > sock_map_unref(sk, psk);
>> > + release_sock(sk);
>> > + }
>> > }
>> > raw_spin_unlock_bh(&stab->lock);
>> > rcu_read_unlock();
>>
>> John, I've noticed this is triggering warnings that we might sleep in
>> lock_sock while (1) inside an RCU read-side section, and (2) holding a
>> spin lock:
[...]
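
To make the nesting concrete, here is roughly what sock_map_free looks
like with the patch above applied (abridged sketch, only the locking
calls shown; not the literal source):

```c
/* Abridged view of sock_map_free() with the patch applied.
 * lock_sock() can sleep, but here it is reached while we hold both
 * an RCU read-side section and a BH spinlock, hence the warnings.
 */
rcu_read_lock();                 /* sleeping not allowed from here on */
raw_spin_lock_bh(&stab->lock);   /* nor from here */
for (i = 0; i < stab->map.max_entries; i++) {
	struct sock *sk = xchg(psk, NULL);

	if (sk) {
		lock_sock(sk);   /* might sleep -> triggers the splat */
		sock_map_unref(sk, psk);
		release_sock(sk);
	}
}
raw_spin_unlock_bh(&stab->lock);
rcu_read_unlock();
```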
>>
>> Here's an idea how to change the locking. I'm still wrapping my head
>> around what protects what in sock_map_free, so please bear with me:
>>
>> 1. synchronize_rcu before we iterate over the array is not needed,
>> AFAICT. We are not free'ing the map just yet, hence any readers
>> accessing the map via the psock are not in danger of use-after-free.
>
> Agreed. When we added 2bb90e5cc90e ("bpf: sockmap, synchronize_rcu before
> free'ing map") we could have done this.
>
>>
>> 2. rcu_read_lock is needed to protect access to psock inside
>> sock_map_unref, but we can't sleep while in RCU read-side. So push
>> it down, after we grab the sock lock.
>
> yes this looks better.
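
Spelling it out, the loop body I have in mind would be ordered like this
(a sketch against today's sock_map_free, setting the stab->lock question
aside for now):

```c
sk = xchg(psk, NULL);
if (sk) {
	lock_sock(sk);      /* may sleep; no RCU section or spinlock held */
	rcu_read_lock();    /* protects the psock access in sock_map_unref */
	sock_map_unref(sk, psk);
	rcu_read_unlock();
	release_sock(sk);
}
```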
>
>>
>> 3. Grabbing stab->lock seems not needed, either. We get called from
>> bpf_map_free_deferred, after map refcnt dropped to 0, so we're not
>> racing with any other map user to modify its contents.
>
> This I'll need to think on a bit. We have the link-lock there so
> probably should be safe to drop. But will need to trace this through
> git history to be sure.
>
[...]
>> WDYT?
>
> Can you push the fix to bpf but leave the stab->lock for now. I think
> we can do a slightly better cleanup on stab->lock in bpf-next.
Here it is:
https://lore.kernel.org/bpf/20200206111652.694507-1-jakub@cloudflare.com/T/#t
I left the "extra" synchronize_rcu before walking the map in place. On
second thought, this isn't a bug; it just adds an extra wait. bpf-next
material?
>
>>
>> Reproducer follows.
>
> push reproducer into selftests?
Included the reproducer with the fixes. If it gets dropped from the
series, I'll resubmit it once bpf-next reopens.