BPF List
 help / color / mirror / Atom feed
From: John Fastabend <john.fastabend@gmail.com>
To: Jiri Olsa <olsajiri@gmail.com>,
	 Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: Leon Hwang <hffilwlqm@gmail.com>,  bpf <bpf@vger.kernel.org>,
	 Alexei Starovoitov <ast@kernel.org>,
	 Daniel Borkmann <daniel@iogearbox.net>,
	 Andrii Nakryiko <andrii@kernel.org>,
	 "Fijalkowski, Maciej" <maciej.fijalkowski@intel.com>,
	 Jakub Sitnicki <jakub@cloudflare.com>,
	 Ilya Leoshkevich <iii@linux.ibm.com>,
	 Hengqi Chen <hengqi.chen@gmail.com>,
	 kernel-patches-bot@fb.com
Subject: Re: [PATCH bpf-next 2/4] bpf, x64: Fix tailcall hierarchy
Date: Fri, 05 Jan 2024 16:18:13 -0800	[thread overview]
Message-ID: <65989c459a6b6_3a2dc208c1@john.notmuch> (raw)
In-Reply-To: <ZZf4uXuSvFq1JwU1@krava>

Jiri Olsa wrote:
> On Thu, Jan 04, 2024 at 08:15:36PM -0800, Alexei Starovoitov wrote:
> > On Thu, Jan 4, 2024 at 6:23 AM Leon Hwang <hffilwlqm@gmail.com> wrote:
> > >
> > >
> > > diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
> > > index fe30b9ebb8de4..67fa337fc2e0c 100644
> > > --- a/arch/x86/net/bpf_jit_comp.c
> > > +++ b/arch/x86/net/bpf_jit_comp.c
> > > @@ -259,7 +259,7 @@ struct jit_context {
> > >  /* Number of bytes emit_patch() needs to generate instructions */
> > >  #define X86_PATCH_SIZE         5
> > >  /* Number of bytes that will be skipped on tailcall */
> > > -#define X86_TAIL_CALL_OFFSET   (11 + ENDBR_INSN_SIZE)
> > > +#define X86_TAIL_CALL_OFFSET   (22 + ENDBR_INSN_SIZE)
> > >
> > >  static void push_r12(u8 **pprog)
> > >  {
> > > @@ -406,14 +406,21 @@ static void emit_prologue(u8 **pprog, u32 stack_depth, bool ebpf_from_cbpf,
> > >          */
> > >         emit_nops(&prog, X86_PATCH_SIZE);
> > >         if (!ebpf_from_cbpf) {
> > > -               if (tail_call_reachable && !is_subprog)
> > > +               if (tail_call_reachable && !is_subprog) {
> > >                         /* When it's the entry of the whole tailcall context,
> > >                          * zeroing rax means initialising tail_call_cnt.
> > >                          */
> > > -                       EMIT2(0x31, 0xC0); /* xor eax, eax */
> > > -               else
> > > -                       /* Keep the same instruction layout. */
> > > -                       EMIT2(0x66, 0x90); /* nop2 */
> > > +                       EMIT2(0x31, 0xC0);       /* xor eax, eax */
> > > +                       EMIT1(0x50);             /* push rax */
> > > +                       /* Make rax as ptr that points to tail_call_cnt. */
> > > +                       EMIT3(0x48, 0x89, 0xE0); /* mov rax, rsp */
> > > +                       EMIT1_off32(0xE8, 2);    /* call main prog */
> > > +                       EMIT1(0x59);             /* pop rcx, get rid of tail_call_cnt */
> > > +                       EMIT1(0xC3);             /* ret */
> > > +               } else {
> > > +                       /* Keep the same instruction size. */
> > > +                       emit_nops(&prog, 13);
> > > +               }
> > 
> > I'm afraid the extra call breaks stack unwinding and many other things.
> > The proper frame needs to be setup (push rbp; etc)
> > and 'leave' + emit_return() is used.
> > Plain 'ret' is not ok.
> > x86_call_depth_emit_accounting() needs to be used too.
> > That will make X86_TAIL_CALL_OFFSET adjustment very complicated.
> > Also the fix doesn't address the stack size issue.
> > We shouldn't allow all the extra frames at run-time.
> > 
> > The tail_cnt_ptr approach is interesting but too heavy,
> > since arm64, s390 and other JITs would need to repeat it with equally
> > complicated calculations in TAIL_CALL_OFFSET.
> > 
> > The fix should really be thought through for all JITs. Not just x86.
> > 
> > I'm thinking whether we should do the following instead:
> > 
> > diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c
> > index 0bdbbbeab155..0b45571559be 100644
> > --- a/kernel/bpf/arraymap.c
> > +++ b/kernel/bpf/arraymap.c
> > @@ -910,7 +910,7 @@ static void *prog_fd_array_get_ptr(struct bpf_map *map,
> >         if (IS_ERR(prog))
> >                 return prog;
> > 
> > -       if (!bpf_prog_map_compatible(map, prog)) {
> > +       if (!bpf_prog_map_compatible(map, prog) || prog->aux->func_cnt) {
> >                 bpf_prog_put(prog);
> >                 return ERR_PTR(-EINVAL);
> >         }
> > 
> > This will stop stack growth, but it will break a few existing tests.
> > I feel it's a price worth paying.
> > 
> > John, Daniel,
> > 
> > do you see anything breaking on cilium side if we disallow
> > progs with subprogs to be inserted in prog_array ?
> 
> FWIW tetragon should be ok with this.. we use few subprograms in
> hubble, but most of them are not called from tail called programs

We actually do this in some of the l7 parsers where we try to use
subprogs as much as possible but still use prog_array for calls
that might end up being recursive.

I'll think about it Monday my first thought is some refactoring and
using bpf_loop or other helpers could resolve this.

If its just bpf-next and not backported onto stable we can probably
figure something out.

> 
> jirka
> 
> > 
> > Other alternatives?
> > 
> 

  reply	other threads:[~2024-01-06  0:18 UTC|newest]

Thread overview: 27+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-01-04 14:22 [PATCH bpf-next 0/4] bpf, x64: Fix tailcall hierarchy Leon Hwang
2024-01-04 14:22 ` [PATCH bpf-next 1/4] bpf, x64: Use emit_nops() to replace memcpy()'ing x86_nops[5] Leon Hwang
2024-01-04 14:22 ` [PATCH bpf-next 2/4] bpf, x64: Fix tailcall hierarchy Leon Hwang
2024-01-05  4:15   ` Alexei Starovoitov
2024-01-05  6:15     ` Leon Hwang
2024-01-05 17:43       ` Alexei Starovoitov
2024-01-06  2:38         ` Leon Hwang
2024-01-05 10:33     ` Leon Hwang
2024-01-05 17:47       ` Alexei Starovoitov
2024-01-06  2:33         ` Leon Hwang
2024-01-06  3:34           ` Alexei Starovoitov
2024-01-05 12:40     ` Jiri Olsa
2024-01-06  0:18       ` John Fastabend [this message]
2024-01-06  3:46         ` Alexei Starovoitov
2024-02-14  5:47     ` Leon Hwang
2024-02-14 11:25       ` Maciej Fijalkowski
2024-02-14 16:31         ` Leon Hwang
2024-02-14 23:16       ` Alexei Starovoitov
2024-02-15 13:16         ` Leon Hwang
2024-02-16  2:18           ` Alexei Starovoitov
2024-02-17 13:43             ` Leon Hwang
2024-02-20  5:13               ` Leon Hwang
2024-02-20 17:34                 ` Alexei Starovoitov
2024-02-20 17:33               ` Alexei Starovoitov
2024-02-21 14:42                 ` Leon Hwang
2024-01-04 14:22 ` [PATCH bpf-next 3/4] bpf, x64: Rename RESTORE_TAIL_CALL_CNT() to LOAD_TAIL_CALL_CNT_PTR() Leon Hwang
2024-01-04 14:22 ` [PATCH bpf-next 4/4] selftests/bpf: Add testcases for tailcall hierarchy fixing Leon Hwang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=65989c459a6b6_3a2dc208c1@john.notmuch \
    --to=john.fastabend@gmail.com \
    --cc=alexei.starovoitov@gmail.com \
    --cc=andrii@kernel.org \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=hengqi.chen@gmail.com \
    --cc=hffilwlqm@gmail.com \
    --cc=iii@linux.ibm.com \
    --cc=jakub@cloudflare.com \
    --cc=kernel-patches-bot@fb.com \
    --cc=maciej.fijalkowski@intel.com \
    --cc=olsajiri@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox