From: sdf@google.com
To: Martin KaFai Lau <kafai@fb.com>
Cc: Andrii Nakryiko <andrii.nakryiko@gmail.com>,
	Networking <netdev@vger.kernel.org>, bpf <bpf@vger.kernel.org>,
	Alexei Starovoitov <ast@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Andrii Nakryiko <andrii@kernel.org>
Subject: Re: [PATCH bpf-next] bpf: move rcu lock management out of BPF_PROG_RUN routines
Date: Wed, 13 Apr 2022 15:52:18 -0700
Message-ID: <YldUIipJvL/7tK4P@google.com>
In-Reply-To: <20220413223216.7lrdbizxg4g2bv5i@kafai-mbp.dhcp.thefacebook.com>

On 04/13, Martin KaFai Lau wrote:
> On Wed, Apr 13, 2022 at 12:52:53PM -0700, Andrii Nakryiko wrote:
> > On Wed, Apr 13, 2022 at 12:39 PM <sdf@google.com> wrote:
> > >
> > > On 04/13, Andrii Nakryiko wrote:
> > > > On Wed, Apr 13, 2022 at 11:33 AM Stanislav Fomichev <sdf@google.com>
> > > > wrote:
> > > > >
> > > > > Commit 7d08c2c91171 ("bpf: Refactor BPF_PROG_RUN_ARRAY family of macros
> > > > > into functions") switched a bunch of BPF_PROG_RUN macros to inline
> > > > > routines. This changed the semantics a bit. Due to argument expansion
> > > > > of macros, it used to be:
> > > > >
> > > > >         rcu_read_lock();
> > > > >         array = rcu_dereference(cgrp->bpf.effective[atype]);
> > > > >         ...
> > > > >
> > > > > Now, with inline routines, we have:
> > > > >         array_rcu = rcu_dereference(cgrp->bpf.effective[atype]);
> > > > >         /* array_rcu can be kfree'd here */
> > > > >         rcu_read_lock();
> > > > >         array = rcu_dereference(array_rcu);
> > > > >
> > >
> > > > Such a subtle difference, wow...
> > >
> > > > But this open-coding of rcu_read_lock() seems very unfortunate as
> > > > well. Would turning BPF_PROG_RUN_ARRAY back into a macro that only
> > > > does rcu lock/unlock, grabs the effective array, and then calls a
> > > > static inline function be a viable solution?
> > >
> > > > #define BPF_PROG_RUN_ARRAY_CG_FLAGS(array_rcu, ctx, run_prog, ret_flags) \
> > > >    ({ \
> > > >        int ret; \
> > > >        \
> > > >        rcu_read_lock(); \
> > > >        ret = __BPF_PROG_RUN_ARRAY_CG_FLAGS(rcu_dereference(array_rcu), ....); \
> > > >        rcu_read_unlock(); \
> > > >        ret; \
> > > >    })
> > >
> > >
> > > > where __BPF_PROG_RUN_ARRAY_CG_FLAGS is what
> > > > BPF_PROG_RUN_ARRAY_CG_FLAGS is today but with __rcu annotation dropped
> > > > (and no internal rcu stuff)?
> > >
> > > Yeah, that should work. But why do you think it's better to hide them?
> > > I find those automatic rcu locks deep in the call stack a bit obscure
> > > (when reasoning about sleepable vs non-sleepable contexts/bpf).
> > >
> > > I, as the caller, know that the effective array is rcu-managed (it
> > > has __rcu annotation) and it seems natural for me to grab rcu lock
> > > while working with it; I might grab it for some other things like cgroup
> > > anyway.
> >
> > If you think that having this more explicitly is better, I'm fine with
> > that as well. I thought a simpler invocation pattern would be good,
> > given we call bpf_prog_run_array variants in quite a lot of places. So
> > count me indifferent. I'm curious what others think.

> Would it work if the bpf_prog_run_array_cg() directly takes the
> 'struct cgroup *cgrp' argument instead of the array ?
> bpf_prog_run_array_cg() should know what protection is needed
> to get a member from the cgrp ptr. The sk call path should be able
> to provide a cgrp ptr.  For current cgrp, pass NULL as the cgrp
> pointer and then current will be used in bpf_prog_run_array_cg().
> A rcu_read_lock() is needed anyway to get the current's cgrp
> and can be done together in bpf_prog_run_array_cg().

> Then there are only two remaining bpf_prog_run_array() usages,
> from lirc and bpf_trace. Wouldn't it be acceptable to have them
> directly do rcu_read_lock on their own struct?
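
A minimal user-space sketch of the evaluation-order difference described in the quoted patch. It assumes mock rcu_read_lock()/rcu_read_unlock() that only record call order; every other name (load_effective(), run_array_fn(), RUN_ARRAY_MACRO(), run_array_nolock(), RUN_ARRAY_WRAPPED(), order_*()) is hypothetical and none of it is kernel code. A function argument is evaluated before the callee runs, so the __rcu pointer is loaded outside the read-side section; a macro argument is substituted into the body, so the load happens inside it. The third variant mirrors the shape Andrii suggests: a lock-taking macro around a lock-free inline helper.

```c
#include <string.h>

/* Mock stand-ins for the kernel primitives (hypothetical): each one only
 * appends a letter to 'trace' so the order of operations can be inspected. */
static char trace[8];

static void rcu_read_lock(void)   { strcat(trace, "L"); }
static void rcu_read_unlock(void) { strcat(trace, "U"); }

/* Models loading the __rcu pointer cgrp->bpf.effective[atype]. */
static const int *load_effective(const int **slot)
{
	strcat(trace, "D");
	return *slot;
}

/* Inline-function shape: the argument is evaluated before this body runs,
 * so the load happens before rcu_read_lock(). */
static inline int run_array_fn(const int *array)
{
	int ret;

	rcu_read_lock();
	ret = *array;
	rcu_read_unlock();
	return ret;
}

/* Old macro shape: the argument expression is pasted into the body, so the
 * load happens inside the read-side critical section. */
#define RUN_ARRAY_MACRO(out, array_expr)		\
	do {						\
		rcu_read_lock();			\
		(out) = *(array_expr);			\
		rcu_read_unlock();			\
	} while (0)

/* Proposed shape: a lock-free inline helper ... */
static inline int run_array_nolock(const int *array)
{
	return *array;
}

/* ... wrapped by a macro that owns the lock and evaluates the argument
 * inside it. */
#define RUN_ARRAY_WRAPPED(out, array_expr)		\
	do {						\
		rcu_read_lock();			\
		(out) = run_array_nolock(array_expr);	\
		rcu_read_unlock();			\
	} while (0)

/* Each helper resets the trace, performs one call, and returns the order. */
static const char *order_inline(const int **slot)
{
	trace[0] = '\0';
	(void)run_array_fn(load_effective(slot));
	return trace;
}

static const char *order_macro(const int **slot)
{
	int ret;

	trace[0] = '\0';
	RUN_ARRAY_MACRO(ret, load_effective(slot));
	(void)ret;
	return trace;
}

static const char *order_wrapped(const int **slot)
{
	int ret;

	trace[0] = '\0';
	RUN_ARRAY_WRAPPED(ret, load_effective(slot));
	(void)ret;
	return trace;
}
```

With these hypothetical helpers, order_inline() records "DLU" (load, lock, unlock), while order_macro() and order_wrapped() both record "LDU", i.e. the load stays inside the lock.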

From Andrii's original commit message:

     I think BPF_PROG_RUN_ARRAY_CG would benefit from further refactoring to accept
     struct cgroup and enum bpf_attach_type instead of bpf_prog_array, fetching
     cgrp->bpf.effective[type] and RCU-dereferencing it internally. But that
     required including include/linux/cgroup-defs.h, which I wasn't sure is ok with
     everyone.

I guess including cgroup-defs.h/bpf-cgroup-defs.h into bpf.h might still
be somewhat problematic?

But even if we pass the cgroup pointer, I'm assuming that this cgroup pointer
is still rcu-managed, right? So the callers still have to rcu-lock.
However, in most places we don't care and just do
"cgrp = sock_cgroup_ptr(&sk->sk_cgrp_data);", but that seems to depend on
the fact that sockets can't (yet?) change their cgroup association, so it's
fine not to rcu-lock that cgroup. Seems fragile, but ok. It always trips me
up when I see:

cgrp = sock_cgroup_ptr(&sk->sk_cgrp_data);
bpf_prog_run_array_cg_flags(cgrp->bpf.effective[atype], ...);

But then, with current, it becomes:

rcu_read_lock();
cgrp = task_dfl_cgroup(current);
bpf_prog_run_array_cg_flags(cgrp->bpf.effective[atype], ...);
rcu_read_unlock();

Idk, I might be overthinking it. I'll try to see if including
bpf-cgroup-defs.h and passing cgroup_bpf is workable.
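
One possible shape for the refactoring discussed in this thread, sketched in user space under loudly stated assumptions: the enum, struct cgroup_bpf and struct bpf_prog_array here are simplified stand-ins that only borrow the kernel's names, RCU is mocked as a depth counter, and run_array_cg() is a hypothetical function, not the actual kernel API. The runner takes the struct cgroup_bpf container plus the attach type, so the rcu_dereference() of effective[atype] happens inside the runner's own read-side section.

```c
/* Simplified stand-ins (hypothetical) that only mirror the shape of the
 * kernel types from bpf-cgroup-defs.h; not the real definitions. */
enum cgroup_bpf_attach_type {
	CGROUP_INET_INGRESS,
	CGROUP_INET_EGRESS,
	MAX_CGROUP_BPF_ATTACH_TYPE,
};

struct bpf_prog_array {
	int nr_progs;
	int verdicts[4];	/* 1 = allow, 0 = deny (mock programs) */
};

struct cgroup_bpf {
	struct bpf_prog_array *effective[MAX_CGROUP_BPF_ATTACH_TYPE];
};

/* Mock RCU: tracks nesting depth and records the depth at dereference time.
 * A real rcu_dereference() also provides memory-ordering guarantees. */
static int rcu_depth;
static int deref_depth = -1;

static void rcu_read_lock(void)   { rcu_depth++; }
static void rcu_read_unlock(void) { rcu_depth--; }
#define rcu_dereference(p) (deref_depth = rcu_depth, (p))

/* Hypothetical API shape: the runner receives the cgroup_bpf container and
 * the attach type, and both fetches and walks the effective array inside its
 * own read-side critical section, so callers need no rcu_read_lock() for it.
 * Assumes the effective array is never NULL. */
static int run_array_cg(const struct cgroup_bpf *cgrp_bpf,
			enum cgroup_bpf_attach_type atype)
{
	const struct bpf_prog_array *array;
	int ret = 1;
	int i;

	rcu_read_lock();
	array = rcu_dereference(cgrp_bpf->effective[atype]);
	for (i = 0; i < array->nr_progs; i++)
		ret &= array->verdicts[i];	/* every program must allow */
	rcu_read_unlock();
	return ret;
}
```

A caller would then pass the container and the attach type (run_array_cg(&cgrp->bpf, atype) in this sketch) and drop its own rcu_read_lock() around the array; whether the cgroup pointer itself still needs RCU protection is the separate question raised above.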


Thread overview: 10+ messages
2022-04-13 18:32 [PATCH bpf-next] bpf: move rcu lock management out of BPF_PROG_RUN routines Stanislav Fomichev
2022-04-13 19:23 ` Andrii Nakryiko
2022-04-13 19:39   ` sdf
2022-04-13 19:52     ` Andrii Nakryiko
2022-04-13 22:31       ` Daniel Borkmann
2022-04-13 22:32       ` Martin KaFai Lau
2022-04-13 22:52         ` sdf [this message]
2022-04-13 23:56           ` Martin KaFai Lau
2022-04-14 21:41             ` Andrii Nakryiko
2022-04-14  9:30           ` Jakub Kicinski
