From: Jiri Olsa <olsajiri@gmail.com>
To: Hao Sun <sunhao.th@gmail.com>, ast@kernel.org
Cc: Jiri Olsa <olsajiri@gmail.com>,
Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
andrii@kernel.org, bpf <bpf@vger.kernel.org>,
daniel@iogearbox.net, haoluo@google.com,
john.fastabend@gmail.com, kpsingh@kernel.org,
martin.lau@linux.dev, sdf@google.com, song@kernel.org,
yhs@fb.com
Subject: Re: WARNING in bpf_bprintf_prepare
Date: Mon, 7 Nov 2022 13:31:06 +0100 [thread overview]
Message-ID: <Y2j6ivTwFmA0FtvY@krava> (raw)
In-Reply-To: <Y2J+n7SqmtfyA7ZM@krava>
On Wed, Nov 02, 2022 at 03:28:47PM +0100, Jiri Olsa wrote:
> On Thu, Oct 27, 2022 at 07:45:16PM +0800, Hao Sun wrote:
> > On Thu, Oct 27, 2022 at 19:24, Jiri Olsa <olsajiri@gmail.com> wrote:
> > >
> > > On Thu, Oct 27, 2022 at 10:27:28AM +0800, Hao Sun wrote:
> > > > Hi,
> > > >
> > > > The following warning can be triggered with the C reproducer in the link.
> > > > Syzbot also reported this several days ago; according to the syzbot
> > > > dashboard
> > > > (https://syzkaller.appspot.com/bug?id=179313fb375161d50a98311a28b8e2fc5f7350f9),
> > > > Jiri posted a patch that uses the bpf prog `active` field to fix it,
> > > > merged as 05b24ff9b2cfab (bpf: Prevent bpf program recursion...).
> > > > But the warning can still be triggered on 247f34f7b803 (Linux v6.1-rc2),
> > > > which already contains that patch, so this still seems to be an issue.
> > > >
> > > > HEAD commit: 247f34f7b803 Linux 6.1-rc2
> > > > git tree: upstream
> > > > console output: https://pastebin.com/raw/kNw8JCu5
> > > > kernel config: https://pastebin.com/raw/sE5QK5HL
> > > > C reproducer: https://pastebin.com/raw/X96ASi27
> > >
> > > hi,
> > > right, that fix addressed the issue for a single bpf program,
> > > but it won't prevent it when there are multiple programs hooked
> > > on the contention_begin tracepoint, all calling bpf_trace_printk,
> > >
> > > I'm not sure we can do anything there.. will check
> > >
> > > do you run just the reproducer, or do you load the server somehow?
> > > I cannot hit the issue so far
> > >
> >
> > Hi,
> >
> > My last email had format issues, so I'm resending it here.
> >
> > I built the kernel with the config in the link, which contains
> > "CONFIG_CMDLINE="earlyprintk=serial net.ifnames=0
> > sysctl.kernel.hung_task_all_cpu_backtrace=1 panic_on_warn=1 ...", and
> > booted the kernel with a normal qemu setup; the warning can then be
> > triggered by executing the reproducer.
> >
> > Also, I'm willing to test any proposed patch.
>
> fyi I reproduced that.. will check if we can do anything about that
I reproduced this with a set of 8 programs, all hooked to the
contention_begin tracepoint, and here's what I think is happening:

All programs (prog1 .. prog8) just call the bpf_trace_printk helper, and
I'm running 'perf bench sched messaging' to load the machine.
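For illustration, each of the 8 programs is essentially a minimal
printk-only tracepoint program. A sketch of one (the program name and
section are mine, assuming libbpf conventions; the actual reproducer is
in the pastebin link above):

  #include "vmlinux.h"
  #include <bpf/bpf_helpers.h>
  #include <bpf/bpf_tracing.h>

  char LICENSE[] SEC("license") = "GPL";

  /* one of the 8 programs; the others are identical up to the name */
  SEC("raw_tp/contention_begin")
  int BPF_PROG(prog1, void *lock, unsigned int flags)
  {
          /* bpf_printk() expands to a bpf_trace_printk() call */
          bpf_printk("contention_begin: lock %p", lock);
          return 0;
  }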
At some point some contended lock triggers trace_contention_begin:
  trace_contention_begin
    __traceiter_contention_begin  <-- iterates all functions attached to tracepoint
      __bpf_trace_run(prog1)
        prog1->active = 1
        bpf_prog_run(prog1)
          bpf_trace_printk
            bpf_bprintf_prepare  <-- takes buffer 1 out of 3
            raw_spin_lock_irqsave(trace_printk_lock)

            # we have global single trace_printk_lock, so we will trigger
            # its trace_contention_begin at some point

            trace_contention_begin
              __traceiter_contention_begin
                __bpf_trace_run(prog1)
                  prog1->active block  <-- prog1 is already 'running', skipping the execution
                __bpf_trace_run(prog2)
                  prog2->active = 1
                  bpf_prog_run(prog2)
                    bpf_trace_printk
                      bpf_bprintf_prepare  <-- takes buffer 2 out of 3
                      raw_spin_lock_irqsave(trace_printk_lock)
                      trace_contention_begin
                        __traceiter_contention_begin
                          __bpf_trace_run(prog1)
                            prog1->active block  <-- prog1 is already 'running', skipping the execution
                          __bpf_trace_run(prog2)
                            prog2->active block  <-- prog2 is already 'running', skipping the execution
                          __bpf_trace_run(prog3)
                            prog3->active = 1
                            bpf_prog_run(prog3)
                              bpf_trace_printk
                                bpf_bprintf_prepare  <-- takes buffer 3 out of 3
                                raw_spin_lock_irqsave(trace_printk_lock)
                                trace_contention_begin
                                  __traceiter_contention_begin
                                    __bpf_trace_run(prog1)
                                      prog1->active block  <-- prog1 is already 'running', skipping the execution
                                    __bpf_trace_run(prog2)
                                      prog2->active block  <-- prog2 is already 'running', skipping the execution
                                    __bpf_trace_run(prog3)
                                      prog3->active block  <-- prog3 is already 'running', skipping the execution
                                    __bpf_trace_run(prog4)
                                      prog4->active = 1
                                      bpf_prog_run(prog4)
                                        bpf_trace_printk
                                          bpf_bprintf_prepare  <-- tries to take buffer 4 out of 3 -> WARNING
The code path may vary based on contention of the trace_printk_lock, so
I saw different nesting among the 8 programs, but all runs eventually
ended up at 4 levels of nesting and hit the warning.
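For reference, the per-program guard from 05b24ff9b2cfab that does the
blocking above sits in __bpf_trace_run; roughly (paraphrased from
kernel/trace/bpf_trace.c, details may differ):

  static __always_inline void
  __bpf_trace_run(struct bpf_prog *prog, u64 *args)
  {
          cant_sleep();
          /* prog->active is a per-cpu counter; any value other than 1
           * after the increment means this program is already running
           * on this cpu, so skip the nested execution */
          if (unlikely(this_cpu_inc_return(*(prog->active)) != 1)) {
                  bpf_prog_inc_misses_counter(prog);
                  goto out;
          }
          rcu_read_lock();
          (void) bpf_prog_run(prog, args);
          rcu_read_unlock();
  out:
          this_cpu_dec(*(prog->active));
  }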
I think we could perhaps move the 'active' flag protection from the
program to the tracepoint level (patch below), to prevent nested
execution of the same tracepoint, so it'd look like:
  trace_contention_begin
    __traceiter_contention_begin
      __bpf_trace_run(prog1) {
        contention_begin.active = 1
        bpf_prog_run(prog1)
          bpf_trace_printk
            bpf_bprintf_prepare
            raw_spin_lock_irqsave(trace_printk_lock)
            trace_contention_begin
              __traceiter_contention_begin
                __bpf_trace_run(prog1)
                  blocked because contention_begin.active == 1
                __bpf_trace_run(prog2)
                  blocked because contention_begin.active == 1
                __bpf_trace_run(prog3)
                  ...
                __bpf_trace_run(prog8)
                  blocked because contention_begin.active == 1
            raw_spin_unlock_irqrestore
            bpf_bprintf_cleanup
        contention_begin.active = 0
      }
      __bpf_trace_run(prog2) {
        contention_begin.active = 1
        bpf_prog_run(prog2)
        ...
        contention_begin.active = 0
      }
Do we need bpf program execution in nested tracepoints? We could
actually allow 3 nesting levels for this case.. thoughts?
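As a hypothetical variant, the check in the patch below could tolerate
nesting up to the number of bpf_bprintf buffers (3) instead of blocking
all nesting:

  /* hypothetical: allow up to 3 nested executions per cpu, matching
   * the 3 preallocated bpf_bprintf buffers */
  if (likely(this_cpu_inc_return(__bpf_trace_tp_active_##call) <= 3))     \
          CONCATENATE(bpf_trace_run, COUNT_ARGS(args))(prog, CAST_TO_U64(args)); \
  this_cpu_dec(__bpf_trace_tp_active_##call);                             \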
thanks,
jirka
---
diff --git a/include/trace/bpf_probe.h b/include/trace/bpf_probe.h
index 6a13220d2d27..5a354ae096e5 100644
--- a/include/trace/bpf_probe.h
+++ b/include/trace/bpf_probe.h
@@ -78,11 +78,15 @@
 #define CAST_TO_U64(...) CONCATENATE(__CAST, COUNT_ARGS(__VA_ARGS__))(__VA_ARGS__)
 
 #define __BPF_DECLARE_TRACE(call, proto, args)				\
+static DEFINE_PER_CPU(int, __bpf_trace_tp_active_##call);		\
 static notrace void							\
 __bpf_trace_##call(void *__data, proto)					\
 {									\
 	struct bpf_prog *prog = __data;					\
-	CONCATENATE(bpf_trace_run, COUNT_ARGS(args))(prog, CAST_TO_U64(args));	\
+									\
+	if (likely(this_cpu_inc_return(__bpf_trace_tp_active_##call) == 1))	\
+		CONCATENATE(bpf_trace_run, COUNT_ARGS(args))(prog, CAST_TO_U64(args));	\
+	this_cpu_dec(__bpf_trace_tp_active_##call);			\
 }
 
 #undef DECLARE_EVENT_CLASS
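For a concrete tracepoint such as contention_begin, the patched macro
would expand to roughly this (hand-expanded by me, whitespace and the
CAST_TO_U64 casts simplified):

  static DEFINE_PER_CPU(int, __bpf_trace_tp_active_contention_begin);

  static notrace void
  __bpf_trace_contention_begin(void *__data, void *lock, unsigned int flags)
  {
          struct bpf_prog *prog = __data;

          /* run this program only if the tracepoint is not already
           * executing a bpf program on this cpu */
          if (likely(this_cpu_inc_return(__bpf_trace_tp_active_contention_begin) == 1))
                  bpf_trace_run2(prog, (u64)lock, (u64)flags);
          this_cpu_dec(__bpf_trace_tp_active_contention_begin);
  }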