From: Stanislav Fomichev <sdf@fomichev.me>
To: Daniel Borkmann <daniel@iogearbox.net>
Cc: Stanislav Fomichev <sdf@google.com>,
netdev@vger.kernel.org, bpf@vger.kernel.org, davem@davemloft.net,
ast@kernel.org, Martin KaFai Lau <kafai@fb.com>,
Yonghong Song <yhs@fb.com>
Subject: Re: [PATCH bpf-next v2 2/4] bpf: support cloning sk storage on accept()
Date: Mon, 12 Aug 2019 10:52:49 -0700
Message-ID: <20190812175249.GF2820@mini-arch>
In-Reply-To: <db5ec323-1126-d461-bc65-27ccc1414589@iogearbox.net>
On 08/12, Daniel Borkmann wrote:
> On 8/9/19 6:10 PM, Stanislav Fomichev wrote:
> > Add new helper bpf_sk_storage_clone which optionally clones sk storage
> > and call it from sk_clone_lock.
> >
> > Cc: Martin KaFai Lau <kafai@fb.com>
> > Cc: Yonghong Song <yhs@fb.com>
> > Signed-off-by: Stanislav Fomichev <sdf@google.com>
> [...]
> > +int bpf_sk_storage_clone(const struct sock *sk, struct sock *newsk)
> > +{
> > +	struct bpf_sk_storage *new_sk_storage = NULL;
> > +	struct bpf_sk_storage *sk_storage;
> > +	struct bpf_sk_storage_elem *selem;
> > +	int ret;
> > +
> > +	RCU_INIT_POINTER(newsk->sk_bpf_storage, NULL);
> > +
> > +	rcu_read_lock();
> > +	sk_storage = rcu_dereference(sk->sk_bpf_storage);
> > +
> > +	if (!sk_storage || hlist_empty(&sk_storage->list))
> > +		goto out;
> > +
> > +	hlist_for_each_entry_rcu(selem, &sk_storage->list, snode) {
> > +		struct bpf_sk_storage_elem *copy_selem;
> > +		struct bpf_sk_storage_map *smap;
> > +		struct bpf_map *map;
> > +		int refold;
> > +
> > +		smap = rcu_dereference(SDATA(selem)->smap);
> > +		if (!(smap->map.map_flags & BPF_F_CLONE))
> > +			continue;
> > +
> > +		map = bpf_map_inc_not_zero(&smap->map, false);
> > +		if (IS_ERR(map))
> > +			continue;
> > +
> > +		copy_selem = bpf_sk_storage_clone_elem(newsk, smap, selem);
> > +		if (!copy_selem) {
> > +			ret = -ENOMEM;
> > +			bpf_map_put(map);
> > +			goto err;
> > +		}
> > +
> > +		if (new_sk_storage) {
> > +			selem_link_map(smap, copy_selem);
> > +			__selem_link_sk(new_sk_storage, copy_selem);
> > +		} else {
> > +			ret = sk_storage_alloc(newsk, smap, copy_selem);
> > +			if (ret) {
> > +				kfree(copy_selem);
> > +				atomic_sub(smap->elem_size,
> > +					   &newsk->sk_omem_alloc);
> > +				bpf_map_put(map);
> > +				goto err;
> > +			}
> > +
> > +			new_sk_storage = rcu_dereference(copy_selem->sk_storage);
> > +		}
> > +		bpf_map_put(map);
>
> The map get/put combination /under/ RCU read lock seems a bit odd to me, could
> you exactly describe the race that this would be preventing?
There is a race between sk storage release and sk storage clone.
bpf_sk_storage_map_free uses synchronize_rcu to wait for all existing
users to finish, and new users are prevented by the map's refcnt being
zero; the clone path needs a similar guard.
Martin suggested using bpf_map_inc_not_zero/bpf_map_put.
If I read everything correctly, without map_inc/map_put we
get the following race:
CPU0                                  CPU1

bpf_map_put
  bpf_sk_storage_map_free(smap)
    synchronize_rcu

    // no more users via bpf or
    // syscall, but clone
    // can still happen

    for each (bucket)
      selem_unlink
        selem_unlink_map(smap)

        // adding anything at
        // this point to the
        // bucket will leak

                                      rcu_read_lock
                                      tcp_v4_rcv
                                        tcp_v4_do_rcv
                                          // sk is lockless TCP_LISTEN
                                          tcp_v4_cookie_check
                                            tcp_v4_syn_recv_sock
                                              bpf_sk_storage_clone
                                                rcu_dereference(sk->sk_bpf_storage)
                                                selem_link_map(smap, copy)
                                                // adding new element to the
                                                // map -> leak
                                      rcu_read_unlock

        selem_unlink_sk
          sk->sk_bpf_storage = NULL

    synchronize_rcu