Re: [PATCH bpf-next v2] bpf: move rcu lock management out of BPF_PROG_RUN routines

From: sdf@google.com
To: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: Network Development <netdev@vger.kernel.org>,
	bpf <bpf@vger.kernel.org>, Alexei Starovoitov <ast@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Andrii Nakryiko <andrii@kernel.org>,
	Martin KaFai Lau <kafai@fb.com>
Subject: Re: [PATCH bpf-next v2] bpf: move rcu lock management out of BPF_PROG_RUN routines
Date: Tue, 19 Apr 2022 08:42:21 -0700
Message-ID: <Yl7YXXIG/EECZxd9@google.com>
In-Reply-To: <CAADnVQ+X5HPDsqXX6mHWV4sT9=2gQSag5cc9w6iJG_YE577ZEw@mail.gmail.com>

On 04/18, Alexei Starovoitov wrote:
> On Mon, Apr 18, 2022 at 9:50 AM <sdf@google.com> wrote:
> >
> > On 04/16, Alexei Starovoitov wrote:
> > > On Thu, Apr 14, 2022 at 9:12 AM Stanislav Fomichev <sdf@google.com> wrote:
> > > > +static int
> > > > +bpf_prog_run_array_cg_flags(const struct cgroup_bpf *cgrp,
> > > > +                           enum cgroup_bpf_attach_type atype,
> > > > +                           const void *ctx, bpf_prog_run_fn run_prog,
> > > > +                           int retval, u32 *ret_flags)
> > > > +{
> > > > +       const struct bpf_prog_array_item *item;
> > > > +       const struct bpf_prog *prog;
> > > > +       const struct bpf_prog_array *array;
> > > > +       struct bpf_run_ctx *old_run_ctx;
> > > > +       struct bpf_cg_run_ctx run_ctx;
> > > > +       u32 func_ret;
> > > > +
> > > > +       run_ctx.retval = retval;
> > > > +       migrate_disable();
> > > > +       rcu_read_lock();
> > > > +       array = rcu_dereference(cgrp->effective[atype]);
> > > > +       item = &array->items[0];
> > > > +       old_run_ctx = bpf_set_run_ctx(&run_ctx.run_ctx);
> > > > +       while ((prog = READ_ONCE(item->prog))) {
> > > > +               run_ctx.prog_item = item;
> > > > +               func_ret = run_prog(prog, ctx);
> > > ...
> > > > +       ret = bpf_prog_run_array_cg(&cgrp->bpf, CGROUP_GETSOCKOPT,
> > > >                                     &ctx, bpf_prog_run, retval);
> >
> > > Did you check in the asm that bpf_prog_run gets inlined
> > > after being passed as a pointer to a function?
> > > Crossing fingers... I suspect not every compiler can do that :(
> > > De-virtualization optimization used to be tricky.
> >
> > No, I didn't, but looking at it right now, both gcc and clang
> > seem to inline everything all the way up to bpf_dispatcher_nop_func.
> >
> > clang:
> >
> >    0000000000001750 <__cgroup_bpf_run_filter_sock_addr>:
> >    __cgroup_bpf_run_filter_sock_addr():
> >    ./kernel/bpf/cgroup.c:1226
> >    int __cgroup_bpf_run_filter_sock_addr(struct sock *sk,
> >                                       struct sockaddr *uaddr,
> >                                       enum cgroup_bpf_attach_type atype,
> >                                       void *t_ctx,
> >                                       u32 *flags)
> >    {
> >
> >    ...
> >
> >    ./include/linux/filter.h:628
> >                 ret = dfunc(ctx, prog->insnsi, prog->bpf_func);
> >        1980:    49 8d 75 48             lea    0x48(%r13),%rsi
> >    bpf_dispatcher_nop_func():
> >    ./include/linux/bpf.h:804
> >         return bpf_func(ctx, insnsi);
> >        1984:    4c 89 f7                mov    %r14,%rdi
> >        1987:    41 ff 55 30             call   *0x30(%r13)
> >        198b:    89 c3                   mov    %eax,%ebx
> >
> > gcc (w/retpoline):
> >
> >    0000000000001110 <__cgroup_bpf_run_filter_sock_addr>:
> >    __cgroup_bpf_run_filter_sock_addr():
> >    kernel/bpf/cgroup.c:1226
> >    {
> >
> >    ...
> >
> >    ./include/linux/filter.h:628
> >                 ret = dfunc(ctx, prog->insnsi, prog->bpf_func);
> >        11c5:    49 8d 75 48             lea    0x48(%r13),%rsi
> >    bpf_dispatcher_nop_func():
> >    ./include/linux/bpf.h:804
> >        11c9:    48 8d 7c 24 10          lea    0x10(%rsp),%rdi
> >        11ce:    e8 00 00 00 00          call   11d3 <__cgroup_bpf_run_filter_sock_addr+0xc3>
> >                         11cf: R_X86_64_PLT32     __x86_indirect_thunk_rax-0x4
> >        11d3:    89 c3                   mov    %eax,%ebx

> Hmm. I'm not sure how you've got this asm.
> Here is what I see with gcc 8 and gcc 10:
> bpf_prog_run_array_cg:
> ...
>          movq    %rcx, %r12      # run_prog, run_prog
> ...
> # ../kernel/bpf/cgroup.c:77:            run_ctx.prog_item = item;
>          movq    %rbx, (%rsp)    # item, run_ctx.prog_item
> # ../kernel/bpf/cgroup.c:78:            if (!run_prog(prog, ctx) && !IS_ERR_VALUE((long)run_ctx.retval))
>          movq    %rbp, %rsi      # ctx,
>          call    *%r12   # run_prog

> __cgroup_bpf_run_filter_sk:
>          movq    $bpf_prog_run, %rcx     #,
> # ../kernel/bpf/cgroup.c:1202:  return bpf_prog_run_array_cg(&cgrp->bpf, atype, sk, bpf_prog_run, 0);
>          leaq    1520(%rax), %rdi        #, tmp92
> # ../kernel/bpf/cgroup.c:1202:  return bpf_prog_run_array_cg(&cgrp->bpf, atype, sk, bpf_prog_run, 0);
>          jmp     bpf_prog_run_array_cg   #

> This is without kasan, lockdep and all debug configs are off.

> So the generated code is pretty bad as I predicted :(

> So I'm afraid this approach is no go.
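
A minimal standalone sketch of the pattern in question, assuming simplified hypothetical stand-ins (struct prog, prog_run(), run_array() and run_filter() are illustrative names, not kernel code). The runner reaches the array helper as a function pointer, so whether the loop ends up with a direct call depends on the compiler inlining the helper into each caller and propagating the constant pointer, which is exactly where the two asm listings above disagree.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct bpf_prog and bpf_prog_run();
 * these are not the kernel's definitions. */
struct prog {
	int (*func)(const void *ctx);	/* the program body, e.g. JITed code */
};

typedef int (*run_fn)(const struct prog *prog, const void *ctx);

static int prog_run(const struct prog *prog, const void *ctx)
{
	return prog->func(ctx);
}

/* Same shape as bpf_prog_run_array_cg(): the runner arrives as a
 * function pointer. Unless this helper is inlined into each caller,
 * run_prog() remains an indirect call, like "call *%r12" above. */
static int run_array(const struct prog *const *progs, const void *ctx,
		     run_fn run_prog)
{
	int ret = 1;

	assert(run_prog != NULL);
	for (; *progs; progs++)
		if (!run_prog(*progs, ctx))
			ret = 0;	/* any program returning 0 denies */
	return ret;
}

/* The constant argument becomes a direct call only if run_array() is
 * inlined here and the compiler propagates prog_run into the loop. */
int run_filter(const struct prog *const *progs, const void *ctx)
{
	return run_array(progs, ctx, prog_run);
}

/* Two trivial "programs" for exercising the sketch. */
int allow(const void *ctx) { (void)ctx; return 1; }
int deny(const void *ctx) { (void)ctx; return 0; }
```

The result is identical whether or not the call is devirtualized; only the generated code differs, which is why the question has to be settled by reading the asm for each compiler.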

I've retested, and gcc 11 still inlines it for me :-/
Anyway, I guess we have two options:

1. Go back to the macros (#define).
2. Instead of passing a function pointer, pass an enum that indicates
    whether to use bpf_prog_run or __bpf_prog_run_save_cb. In that
    case the compiler shouldn't have any trouble resolving the call
    statically, right?
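
Both options can be sketched in standalone C. This is only a sketch under assumptions: struct prog and prog_run() are hypothetical stand-ins, prog_run_save_cb() merely counts calls where __bpf_prog_run_save_cb() would save and restore part of skb->cb, and RUN_ARRAY, run_array_mode(), run_macro() and run_enum() are illustrative names, not the eventual patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; not the kernel's types or helpers. */
struct prog {
	int (*func)(const void *ctx);
};

int save_cb_calls;	/* counts calls that took the "save cb" path */

static int prog_run(const struct prog *prog, const void *ctx)
{
	return prog->func(ctx);
}

/* Stand-in for __bpf_prog_run_save_cb(): extra work around the call. */
static int prog_run_save_cb(const struct prog *prog, const void *ctx)
{
	save_cb_calls++;	/* the real helper saves/restores skb->cb */
	return prog->func(ctx);
}

/* Option (1): a macro pastes the runner's name into each expansion,
 * so every call site contains a direct call to it. */
#define RUN_ARRAY(ret, progs, ctx, run_prog)			\
	do {							\
		const struct prog *const *it = (progs);		\
		(ret) = 1;					\
		for (; *it; it++)				\
			if (!run_prog(*it, (ctx)))		\
				(ret) = 0;			\
	} while (0)

/* Option (2): pass an enum instead of a function pointer. Even if this
 * helper is not inlined, both arms are direct calls. */
enum run_mode { RUN_PLAIN, RUN_SAVE_CB };

static int run_array_mode(const struct prog *const *progs,
			  const void *ctx, enum run_mode mode)
{
	int ret = 1;

	assert(mode == RUN_PLAIN || mode == RUN_SAVE_CB);
	for (; *progs; progs++) {
		int r = mode == RUN_SAVE_CB ? prog_run_save_cb(*progs, ctx)
					    : prog_run(*progs, ctx);
		if (!r)
			ret = 0;	/* any program returning 0 denies */
	}
	return ret;
}

int run_macro(const struct prog *const *progs, const void *ctx)
{
	int ret;

	RUN_ARRAY(ret, progs, ctx, prog_run_save_cb);
	return ret;
}

int run_enum(const struct prog *const *progs, const void *ctx,
	     enum run_mode mode)
{
	return run_array_mode(progs, ctx, mode);
}

int allow(const void *ctx) { (void)ctx; return 1; }
int deny(const void *ctx) { (void)ctx; return 0; }
```

The macro yields a direct call by construction at every expansion. The enum keeps an ordinary function; if the helper is not inlined, the cost is a compare-and-branch on mode rather than an indirect call (a retpoline thunk on mitigated builds, as in the gcc listing above).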

I'll prototype (2) and send it. If it doesn't work out, we can always
go back to (1).

Thread overview: 12+ messages
2022-04-14 16:12 [PATCH bpf-next v2] bpf: move rcu lock management out of BPF_PROG_RUN routines Stanislav Fomichev
2022-04-14 20:23 ` Martin KaFai Lau
2022-04-16  1:28 ` Alexei Starovoitov
2022-04-18 16:50   ` sdf
2022-04-19  5:18     ` Alexei Starovoitov
2022-04-19 15:42       ` sdf [this message]
2022-04-19 16:20         ` Alexei Starovoitov
2022-04-19 16:32           ` Alexei Starovoitov
2022-04-19 16:35             ` Stanislav Fomichev
2022-04-19 16:48               ` Alexei Starovoitov
2022-04-19 17:01                 ` Stanislav Fomichev
2022-04-19 17:05                   ` Alexei Starovoitov
