From: Stanislav Fomichev <sdf@fomichev.me>
To: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: Stanislav Fomichev <sdf@google.com>,
Network Development <netdev@vger.kernel.org>,
bpf <bpf@vger.kernel.org>,
"David S. Miller" <davem@davemloft.net>,
Alexei Starovoitov <ast@kernel.org>,
Daniel Borkmann <daniel@iogearbox.net>
Subject: Re: [PATCH bpf 0/4] bpf: remove __rcu annotations from bpf_prog_array
Date: Tue, 14 May 2019 10:53:32 -0700 [thread overview]
Message-ID: <20190514175332.GC10244@mini-arch> (raw)
In-Reply-To: <20190514174523.myybhjzfhmxdycgf@ast-mbp>
On 05/14, Alexei Starovoitov wrote:
> On Tue, May 14, 2019 at 10:30:02AM -0700, Stanislav Fomichev wrote:
> > On 05/14, Alexei Starovoitov wrote:
> > > On Mon, May 13, 2019 at 11:57 AM Stanislav Fomichev <sdf@fomichev.me> wrote:
> > > >
> > > > On 05/08, Stanislav Fomichev wrote:
> > > > > On 05/08, Alexei Starovoitov wrote:
> > > > > > On Wed, May 08, 2019 at 10:18:41AM -0700, Stanislav Fomichev wrote:
> > > > > > > Right now we are not using rcu api correctly: we pass __rcu pointers
> > > > > > > to bpf_prog_array_xyz routines but don't use rcu_dereference on them
> > > > > > > (see bpf_prog_array_delete_safe and bpf_prog_array_copy in particular).
> > > > > > > Instead of sprinkling rcu_dereferences, let's just get rid of those
> > > > > > > __rcu annotations and move rcu handling to a higher level.
> > > > > > >
> > > > > > > It looks like all those routines are called from the rcu update
> > > > > > > side and we can use simple rcu_dereference_protected to get a
> > > > > > > reference that is valid as long as we hold a mutex (i.e. no other
> > > > > > > updater can change the pointer, no need for rcu read section and
> > > > > > > there should not be a use-after-free problem).
> > > > > > >
> > > > > > > To be fair, there is currently no issue with the existing approach
> > > > > > > since the calls are mutex-protected, pointer values don't change,
> > > > > > > __rcu annotations are ignored. But it's still nice to use proper api.
> > > > > > >
> > > > > > > The series fixes the following sparse warnings:
> > > > > >
> > > > > > Absolutely not.
> > > > > > please fix it properly.
> > > > > > Removing annotations is not a fix.
> > > > > I'm fixing it properly by removing the annotations and moving lifetime
> > > > > management to the upper layer. See commits 2-4 where I fix the users, the
> > > > > first patch is just the "preparation".
> > > > >
> > > > > The users are supposed to do:
> > > > >
> > > > > mutex_lock(&x);
> > > > > p = rcu_dereference_protected(prog_array, lockdep_is_held(&x))
> > > > > // ...
> > > > > // call bpf_prog_array helpers while mutex guarantees that
> > > > > // the object referenced by p is valid (i.e. no need for bpf_prog_array
> > > > > // helpers to care about rcu lifetime)
> > > > > // ...
> > > > > mutex_unlock(&x);
> > > > >
> > > > > What am I missing here?
> > > >
> > > > Just to give you my perspective on why current api with __rcu annotations
> > > > is working, but not optimal (even if used from the rcu read section).
> > > >
> > > > Sample code:
> > > >
> > > > struct bpf_prog_array __rcu *progs = <comes from somewhere>;
> > > > int n;
> > > >
> > > > rcu_read_lock();
> > > > n = bpf_prog_array_length(progs);
> > > > if (n > 0) {
> > > > // do something with __rcu progs
> > > > do_something(progs);
> > > > }
> > > > rcu_read_unlock();
> > > >
> > > > Since progs is __rcu annotated, do_something() would need to do
> > > > rcu_dereference again and it might get a different value compared to
> > > > whatever bpf_prog_array_free got while doing its dereference.
> > >
> > > correct and I believe the code deals with it fine.
> > > cnt could be different between two calls.
> > Yes, currently there is no problem, all users of these apis
> > are fine because they are holding a mutex (and are hence in the rcu update
> > path, i.e. the pointer can't change and they have a consistent view
> > between the calls).
> >
> > For example, we currently have:
> >
> > int n1, n2;
> > mutex_lock(&x);
> > n1 = bpf_prog_array_length(progs);
> > n2 = bpf_prog_array_length(progs);
> > // n1 is guaranteed to be the same as n2 as long as we
> > // hold a mutex; single updater only
> > ...
> > mutex_unlock(&x);
> >
> > Versus:
> >
> > rcu_read_lock();
> > n1 = bpf_prog_array_length(progs);
> > n2 = bpf_prog_array_length(progs);
> > // n1 and n2 can be different; rcu_read_lock is all about
> > // lifetime
> > ...
> > rcu_read_unlock();
> >
> > But, as I said, we don't use those __rcu annotated bpf_prog_array
> > routines in the rcu read section, so we are fine.
> >
> > (I'm just showing that potentially there might be a problem if we don't move
> > rcu management away from bpf_prog_array routines and if someone
> > decides to call them under rcu_read_lock).
> >
> > > > A better way is not to deal with rcu inside those helpers and let
> > > > higher layers do that:
> > > >
> > > > struct bpf_prog_array __rcu *progs = <comes from somewhere>;
> > > > struct bpf_prog_array *p;
> > > > int n;
> > > >
> > > > rcu_read_lock();
> > > > p = rcu_dereference(p);
> > > > n = bpf_prog_array_length(p);
> > > > if (n > 0) {
> > > > do_something(p); // do_something sees the same p as bpf_prog_array_length
> > > > }
> > > > rcu_read_unlock();
> > > >
> > > > What do you think?
> > >
> > > I'm not sure that generically applicable.
> > > Which piece of code do you have in mind for such refactoring?
> > All existing callers of (take a look at patches 2-4):
> >
> > * bpf_prog_array_free
> > * bpf_prog_array_length
> > * bpf_prog_array_copy_to_user
> > * bpf_prog_array_delete_safe
> > * bpf_prog_array_copy_info
> > * bpf_prog_array_copy
> >
> > They are:
> >
> > * perf event bpf attach/detach/query (under bpf_event_mutex)
> > * cgroup attach/detach/query (management of cgroup_bpf->effective, under
> > cgroup_mutex)
> > * lirc bpf attach/detach/query (under ir_raw_handler_lock)
> >
> > Nobody uses these apis in the rcu read section, so we can remove the
> > annotations and use rcu_dereference_protected on the higher layer.
> >
> > Side bonus is that we can also remove __rcu from cgroup_bpf.inactive
> > (which is just a temp storage and not updated in the rcu fashion) and
> > from old_array in activate_effective_progs (which is an on-stack
> > array and, I guess, only has __rcu annotation to satisfy sparse).
> >
> > So nothing is changing functionality-wise, but it becomes a bit easier
> > to reason about by being more explicit with rcu api.
>
> disagree. mutex is necesary.
Oh, for sure, mutex is necessary, I'm not taking away the mutex.
All I'm doing is I'm asking the users of those apis to be more
explicit by using rcu_dereference_protected(progs, lockdep_is_held(mtx)),
which can be read as "I know I'm holding a mutex and I'm in rcu
update section, progs pointer is not going to change. So as long
as I'm holding a mutex, I can call bpf_prog_array helpers and
get consistent result between the calls".
Existing __rcu annotations don't add anything to the safety.
See, for example, bpf_prog_array_copy_to_user and bpf_prog_array_length
which currently do rcu_read_lock/rcu_read_unlock inside. If we expect
the callers to hold a mutex, why do we start rcu-read section inside
and rcu-update section?
next prev parent reply other threads:[~2019-05-14 17:53 UTC|newest]
Thread overview: 21+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-05-08 17:18 [PATCH bpf 0/4] bpf: remove __rcu annotations from bpf_prog_array Stanislav Fomichev
2019-05-08 17:18 ` [PATCH bpf 1/4] " Stanislav Fomichev
2019-05-08 17:18 ` [PATCH bpf 2/4] bpf: media: properly use bpf_prog_array api Stanislav Fomichev
2019-05-08 17:18 ` [PATCH bpf 3/4] bpf: cgroup: " Stanislav Fomichev
2019-05-08 17:18 ` [PATCH bpf 4/4] bpf: tracing: " Stanislav Fomichev
2019-05-08 17:56 ` [PATCH bpf 0/4] bpf: remove __rcu annotations from bpf_prog_array Alexei Starovoitov
2019-05-08 18:12 ` Stanislav Fomichev
2019-05-13 18:57 ` Stanislav Fomichev
2019-05-14 16:55 ` Alexei Starovoitov
2019-05-14 17:30 ` Stanislav Fomichev
2019-05-14 17:45 ` Alexei Starovoitov
2019-05-14 17:53 ` Stanislav Fomichev [this message]
2019-05-15 1:25 ` Alexei Starovoitov
2019-05-15 2:11 ` Stanislav Fomichev
2019-05-15 2:27 ` Alexei Starovoitov
2019-05-15 2:44 ` Eric Dumazet
2019-05-15 2:56 ` Stanislav Fomichev
2019-05-15 3:16 ` Alexei Starovoitov
2019-05-15 3:38 ` Stanislav Fomichev
2019-05-15 3:42 ` Alexei Starovoitov
2019-05-15 3:49 ` Stanislav Fomichev
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20190514175332.GC10244@mini-arch \
--to=sdf@fomichev.me \
--cc=alexei.starovoitov@gmail.com \
--cc=ast@kernel.org \
--cc=bpf@vger.kernel.org \
--cc=daniel@iogearbox.net \
--cc=davem@davemloft.net \
--cc=netdev@vger.kernel.org \
--cc=sdf@google.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).