From: Jiri Olsa <olsajiri@gmail.com>
To: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: Jiri Olsa <olsajiri@gmail.com>, Hao Sun <sunhao.th@gmail.com>,
	Alexei Starovoitov <ast@kernel.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Andrii Nakryiko <andrii@kernel.org>, bpf <bpf@vger.kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Hao Luo <haoluo@google.com>,
	John Fastabend <john.fastabend@gmail.com>,
	KP Singh <kpsingh@kernel.org>,
	Martin KaFai Lau <martin.lau@linux.dev>,
	Stanislav Fomichev <sdf@google.com>, Song Liu <song@kernel.org>,
	Yonghong Song <yhs@fb.com>
Subject: Re: WARNING in bpf_bprintf_prepare
Date: Wed, 9 Nov 2022 14:49:48 +0100
Message-ID: <Y2uv/GjnSdr/ujOj@krava>
In-Reply-To: <CAADnVQKXcdVa-gDj2_QTrBuVea+KMuFUdabR--HCf7x6Vo6uXg@mail.gmail.com>

On Mon, Nov 07, 2022 at 12:49:01PM -0800, Alexei Starovoitov wrote:
> On Mon, Nov 7, 2022 at 4:31 AM Jiri Olsa <olsajiri@gmail.com> wrote:
> >
> > On Wed, Nov 02, 2022 at 03:28:47PM +0100, Jiri Olsa wrote:
> > > On Thu, Oct 27, 2022 at 07:45:16PM +0800, Hao Sun wrote:
> > > > > On Thu, Oct 27, 2022 at 19:24, Jiri Olsa <olsajiri@gmail.com> wrote:
> > > > >
> > > > > On Thu, Oct 27, 2022 at 10:27:28AM +0800, Hao Sun wrote:
> > > > > > Hi,
> > > > > >
> > > > > > The following warning can be triggered with the C reproducer in the link.
> > > > > > > Syzbot also reported this several days ago, and Jiri posted a patch that
> > > > > > > uses the bpf prog `active` field to fix it: 05b24ff9b2cfab (bpf:
> > > > > > > Prevent bpf program recursion...), according to the syzbot dashboard
> > > > > > > (https://syzkaller.appspot.com/bug?id=179313fb375161d50a98311a28b8e2fc5f7350f9).
> > > > > > > But this warning can still be triggered on 247f34f7b803
> > > > > > > (Linux 6.1-rc2), which already includes the patch, so it seems that this
> > > > > > > is still an issue.
> > > > > >
> > > > > > HEAD commit: 247f34f7b803 Linux 6.1-rc2
> > > > > > git tree: upstream
> > > > > > console output: https://pastebin.com/raw/kNw8JCu5
> > > > > > kernel config: https://pastebin.com/raw/sE5QK5HL
> > > > > > C reproducer: https://pastebin.com/raw/X96ASi27
> > > > >
> > > > > hi,
> > > > > > right, that fix addressed the issue for a single bpf program,
> > > > > > but it won't prevent it when multiple programs hook on the
> > > > > > contention_begin tracepoint and call bpf_trace_printk,
> > > > > >
> > > > > > I'm not sure we can do something there.. will check
> > > > > >
> > > > > > do you run just the reproducer, or do you load the server somehow?
> > > > > I cannot hit the issue so far
> > > > >
> > > >
> > > > Hi,
> > > >
> > > > The last email had formatting issues, so I am resending it here.
> > > >
> > > > I built the kernel with the config in the link, which contains
> > > > “CONFIG_CMDLINE="earlyprintk=serial net.ifnames=0
> > > > sysctl.kernel.hung_task_all_cpu_backtrace=1 panic_on_warn=1 …”, and
> > > > booted the kernel with a normal qemu setup; the warning can then be
> > > > triggered by executing the reproducer.
> > > >
> > > > Also, I'm willing to test a proposed patch, if there is one.
> > >
> > > fyi I reproduced that.. will check if we can do anything about that
> >
> > I reproduced this with a set of 8 programs all hooked to the contention_begin
> > tracepoint and here's what I think is happening:
> >
> > all programs (prog1 .. prog8) call just the bpf_trace_printk helper and I'm
> > running 'perf bench sched messaging' to load the machine
> >
> > at some point some contended lock triggers trace_contention_begin:
> >
> >   trace_contention_begin
> >     __traceiter_contention_begin                                <-- iterates all functions attached to tracepoint
> >       __bpf_trace_run(prog1)
> >         prog1->active = 1
> >         bpf_prog_run(prog1)
> >           bpf_trace_printk
> >             bpf_bprintf_prepare                                 <-- takes buffer 1 out of 3
> >             raw_spin_lock_irqsave(trace_printk_lock)
> >
> >               # we have global single trace_printk_lock, so we will trigger
> >               # its trace_contention_begin at some point
> >
> >               trace_contention_begin
> >                 __traceiter_contention_begin
> >                   __bpf_trace_run(prog1)
> >                     prog1->active block                         <-- prog1 is already 'running', skipping the execution
> >                   __bpf_trace_run(prog2)
> >                     prog2->active = 1
> >                     bpf_prog_run(prog2)
> >                       bpf_trace_printk
> >                         bpf_bprintf_prepare                     <-- takes buffer 2 out of 3
> >                         raw_spin_lock_irqsave(trace_printk_lock)
> >                           trace_contention_begin
> >                             __traceiter_contention_begin
> >                               __bpf_trace_run(prog1)
> >                                 prog1->active block             <-- prog1 is already 'running', skipping the execution
> >                               __bpf_trace_run(prog2)
> >                                 prog2->active block             <-- prog2 is already 'running', skipping the execution
> >                               __bpf_trace_run(prog3)
> >                                  prog3->active = 1
> >                                  bpf_prog_run(prog3)
> >                                    bpf_trace_printk
> >                                      bpf_bprintf_prepare        <-- takes buffer 3 out of 3
> >                                      raw_spin_lock_irqsave(trace_printk_lock)
> >                                        trace_contention_begin
> >                                          __traceiter_contention_begin
> >                                            __bpf_trace_run(prog1)
> >                                              prog1->active block      <-- prog1 is already 'running', skipping the execution
> >                                            __bpf_trace_run(prog2)
> >                                              prog2->active block      <-- prog2 is already 'running', skipping the execution
> >                                            __bpf_trace_run(prog3)
> >                                              prog3->active block      <-- prog3 is already 'running', skipping the execution
> >                                            __bpf_trace_run(prog4)
> >                                              prog4->active = 1
> >                                              bpf_prog_run(prog4)
> >                                                bpf_trace_printk
> >                                                  bpf_bprintf_prepare  <-- tries to take buffer 4 out of 3 -> WARNING
> >
> >
> > the code path may vary based on the contention of the trace_printk_lock,
> > so I saw different nesting within 8 programs, but all eventually ended up
> > at 4 levels of nesting and hit the warning
> >
> > I think we could perhaps move the 'active' flag protection from the program
> > to the tracepoint level (in the patch below), to prevent nested execution
> > of the same tracepoint, so it'd look like:
> >
> >   trace_contention_begin
> >     __traceiter_contention_begin
> >       __bpf_trace_run(prog1) {
> >         contention_begin.active = 1
> >         bpf_prog_run(prog1)
> >           bpf_trace_printk
> >             bpf_bprintf_prepare
> >             raw_spin_lock_irqsave(trace_printk_lock)
> >               trace_contention_begin
> >                 __traceiter_contention_begin
> >                   __bpf_trace_run(prog1)
> >                     blocked because contention_begin.active == 1
> >                   __bpf_trace_run(prog2)
> >                     blocked because contention_begin.active == 1
> >                   __bpf_trace_run(prog3)
> >                   ...
> >                   __bpf_trace_run(prog8)
> >                     blocked because contention_begin.active == 1
> >
> >             raw_spin_unlock_irqrestore
> >             bpf_bprintf_cleanup
> >
> >         contention_begin.active = 0
> >       }
> >
> >       __bpf_trace_run(prog2) {
> >         contention_begin.active = 1
> >         bpf_prog_run(prog2)
> >           ...
> >         contention_begin.active = 0
> >       }
> >
> > do we need bpf program execution in nested tracepoints?
> > we could actually allow 3 nesting levels for this case.. thoughts?
> >
> > thanks,
> > jirka
> >
> >
> > ---
> > diff --git a/include/trace/bpf_probe.h b/include/trace/bpf_probe.h
> > index 6a13220d2d27..5a354ae096e5 100644
> > --- a/include/trace/bpf_probe.h
> > +++ b/include/trace/bpf_probe.h
> > @@ -78,11 +78,15 @@
> >  #define CAST_TO_U64(...) CONCATENATE(__CAST, COUNT_ARGS(__VA_ARGS__))(__VA_ARGS__)
> >
> >  #define __BPF_DECLARE_TRACE(call, proto, args)                         \
> > +static DEFINE_PER_CPU(int, __bpf_trace_tp_active_##call);              \
> >  static notrace void                                                    \
> >  __bpf_trace_##call(void *__data, proto)                                        \
> >  {                                                                      \
> >         struct bpf_prog *prog = __data;                                 \
> > -       CONCATENATE(bpf_trace_run, COUNT_ARGS(args))(prog, CAST_TO_U64(args));  \
> > +                                                                       \
> > +       if (likely(this_cpu_inc_return(__bpf_trace_tp_active_##call) == 1))             \
> > +               CONCATENATE(bpf_trace_run, COUNT_ARGS(args))(prog, CAST_TO_U64(args));  \
> > +       this_cpu_dec(__bpf_trace_tp_active_##call);                                     \
> >  }
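
The nesting traced in the quoted analysis can be replayed in user space. What follows is a minimal sketch, not kernel code: it assumes the worst case where every trace_printk_lock acquisition is contended and re-fires contention_begin, and it models the 3 printf buffers as a plain counter. All names in it (NR_PROGS, NR_BUFFERS, struct sim, simulate, the model_* helpers) are made up for illustration; only the "increment, run if the result is 1, decrement" guard pattern is taken from the quoted patch.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_PROGS   8	/* programs attached to the tracepoint (as in the report) */
#define NR_BUFFERS 3	/* "buffer N out of 3" in the trace above */

enum guard_mode { GUARD_PER_PROG, GUARD_PER_TRACEPOINT };

struct sim {
	enum guard_mode mode;
	int prog_active[NR_PROGS];	/* models prog->active */
	int tp_active;			/* models a per-tracepoint counter */
	int nest_level;			/* buffers currently taken */
	int peak_level;			/* highest nesting level attempted */
	bool overflow;			/* set where the kernel would WARN */
};

static void model_fire_contention_begin(struct sim *s);

/* Models bpf_trace_printk: take a buffer, then hit the contended lock. */
static void model_trace_printk(struct sim *s)
{
	s->nest_level++;
	if (s->nest_level > s->peak_level)
		s->peak_level = s->nest_level;

	if (s->nest_level > NR_BUFFERS)
		s->overflow = true;	/* no buffer left: the WARNING case */
	else
		model_fire_contention_begin(s);	/* assumed: lock is contended */

	s->nest_level--;
	assert(s->nest_level >= 0);
}

/* Run one program behind either the per-program or per-tracepoint guard. */
static void model_run_prog(struct sim *s, int i)
{
	int *active = (s->mode == GUARD_PER_PROG) ? &s->prog_active[i]
						  : &s->tp_active;

	if (++(*active) == 1)
		model_trace_printk(s);
	--(*active);
}

/* Models __traceiter_contention_begin: iterate all attached programs. */
static void model_fire_contention_begin(struct sim *s)
{
	for (int i = 0; i < NR_PROGS; i++)
		model_run_prog(s, i);
}

struct sim simulate(enum guard_mode mode)
{
	struct sim s = { .mode = mode };

	model_fire_contention_begin(&s);
	return s;
}
```

Under these assumptions simulate(GUARD_PER_PROG) reaches a 4th nesting level and sets overflow, while simulate(GUARD_PER_TRACEPOINT) never goes past level 1.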
> 
> This approach will hurt real use cases where
> multiple and different raw_tp progs run on the same cpu.

would allowing 2 levels of nesting help here?

I can imagine the change above would break the use case where we want to
trigger a tracepoint in irq context that interrupted a task which is already
in the same tracepoint

with 2 levels of nesting we would still trigger that tracepoint from irq and
would still be safe with the bpf_bprintf_prepare buffers
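
To make the 2-level idea a bit more concrete, here is a rough user-space model, not a patch. It assumes the worst case where every bpf_trace_printk call re-fires the tracepoint (via lock contention or an irq), and it replaces the "== 1" check with a "<= max_nesting" check on the per-tracepoint counter. The names struct nest_sim, max_nesting and peak_buffers are hypothetical, and the model covers a single tracepoint only; buffers taken by programs on other hooks are not counted.

```c
#include <assert.h>

struct nest_sim {
	int max_nesting;	/* allowed nested entries into the tracepoint */
	int nr_progs;		/* programs attached to the tracepoint */
	int depth;		/* models the per-tracepoint counter */
	int buffers;		/* printf buffers currently taken */
	int peak;		/* highest number of buffers taken at once */
};

static void nest_fire_tracepoint(struct nest_sim *s);

/* Models bpf_trace_printk: take a buffer, then re-enter the tracepoint. */
static void nest_trace_printk(struct nest_sim *s)
{
	s->buffers++;
	if (s->buffers > s->peak)
		s->peak = s->buffers;
	nest_fire_tracepoint(s);	/* assumed worst case: always re-fires */
	s->buffers--;
}

/* Run the program only while the nesting depth is within the limit. */
static void nest_run_prog(struct nest_sim *s)
{
	if (++s->depth <= s->max_nesting)
		nest_trace_printk(s);
	s->depth--;
}

static void nest_fire_tracepoint(struct nest_sim *s)
{
	for (int i = 0; i < s->nr_progs; i++)
		nest_run_prog(s);
}

/* Peak number of buffers taken at once for a given nesting limit. */
int peak_buffers(int max_nesting, int nr_progs)
{
	struct nest_sim s = { .max_nesting = max_nesting, .nr_progs = nr_progs };

	nest_fire_tracepoint(&s);
	assert(s.depth == 0 && s.buffers == 0);
	return s.peak;
}
```

In this model the peak number of buffers equals the nesting limit, so peak_buffers(2, 8) stays below the 3 available buffers and leaves one spare.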

what other use cases am I missing?

thanks,
jirka

> Instead let's disallow attaching to trace_contention and
> potentially any other hook with similar recursion properties.
> Another option is to add a recursion check to trace_contention itself.


Thread overview: 15+ messages
2022-10-27  2:27 WARNING in bpf_bprintf_prepare Hao Sun
2022-10-27 11:24 ` Jiri Olsa
2022-10-27 11:45   ` Hao Sun
2022-11-02 14:28     ` Jiri Olsa
2022-11-07 12:31       ` Jiri Olsa
2022-11-07 20:49         ` Alexei Starovoitov
2022-11-09 13:49           ` Jiri Olsa [this message]
2022-11-09 19:41             ` Alexei Starovoitov
2022-11-09 23:53               ` Jiri Olsa
2022-11-11 14:45                 ` Jiri Olsa
2022-11-11 16:02                   ` Hao Sun
2022-11-14  8:04                     ` Jiri Olsa
2022-11-14 22:47                       ` Jiri Olsa
2022-11-15  1:48                         ` Hao Sun
2022-11-15 17:01                           ` Jiri Olsa
