linux-trace-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Masami Hiramatsu (Google) <mhiramat@kernel.org>
To: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>,
	"Masami Hiramatsu (Google)" <mhiramat@kernel.org>,
	Andrii Nakryiko <andrii@kernel.org>,
	linux-trace-kernel@vger.kernel.org, bpf@vger.kernel.org,
	jolsa@kernel.org, "Paul E . McKenney" <paulmck@kernel.org>,
	Peter Zijlstra <peterz@infradead.org>
Subject: Re: [PATCH] ftrace: make extra rcu_is_watching() validation check optional
Date: Sat, 6 Apr 2024 12:41:05 +0900	[thread overview]
Message-ID: <20240406124105.5a31c488b00c36432cc81446@kernel.org> (raw)
In-Reply-To: <CAEf4BzauQ2WKMjZdc9s0rBWa01BYbgwHN6aNDXQSHYia47pQ-w@mail.gmail.com>

On Tue, 2 Apr 2024 22:21:00 -0700
Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:

> On Tue, Apr 2, 2024 at 9:00 PM Andrii Nakryiko
> <andrii.nakryiko@gmail.com> wrote:
> >
> > On Tue, Apr 2, 2024 at 5:52 PM Steven Rostedt <rostedt@goodmis.org> wrote:
> > >
> > > On Wed, 3 Apr 2024 09:40:48 +0900
> > > Masami Hiramatsu (Google) <mhiramat@kernel.org> wrote:
> > >
> > > > OK, for me, this last sentence is preferred for the help message. That explains
> > > > what this is for.
> > > >
> > > >         All callbacks that attach to the function tracing have some sort
> > > >         of protection against recursion. This option is only to verify that
> > > >        ftrace (and other users of ftrace_test_recursion_trylock()) are not
> > > >         called outside of RCU, as if they are, it can cause a race.
> > > >         But it also has a noticeable overhead when enabled.
> >
> > Sounds good to me, I can add this to the description of the Kconfig option.
> >
> > > >
> > > > BTW, how much overhead does this introduce? and the race case a kernel crash?
> >
> > I just checked our fleet-wide production data for the last 24 hours.
> > Within the kprobe/kretprobe code path (ftrace_trampoline and
> > everything called from it), rcu_is_watching (both calls, see below)
> > cause 0.484% CPU cycles usage, which isn't nothing. So definitely we'd
> > prefer to be able to avoid that in production use cases.
> >
> 
> I just ran synthetic microbenchmark testing multi-kretprobe
> throughput. We get (in millions of BPF kretprobe-multi program
> invocations per second):
>   - 5.568M/s as baseline;
>   - 5.679M/s with changes in this patch (+2% throughput improvement);
>   - 5.808M/s with disabling rcu_is_watching in rethook_try_get()
> (+2.3% more vs just one of rcu_is_watching, and +4.3% vs baseline).
> 
> It's definitely noticeable.

Thanks for checking the overhead! Hmm, it is considerable.

> > > > or just messed up the ftrace buffer?
> > >
> > > There's a hypothetical race where it can cause a use after free.

Hmm, so it might not lead a kernel crash but is better to enable with
other debugging options.

> > >
> > > That is, after you shutdown ftrace, you need to call synchronize_rcu_tasks(),
> > > which requires RCU to be watching. There's a theoretical case where that
> > > task calls the trampoline and misses the synchronization. Note, these
> > > locations are with preemption disabled, as rcu is always watching when
> > > preemption is enabled. Thus it would be extremely fast where as the
> > > synchronize_rcu_tasks() is rather slow.
> > >
> > > We also have synchronize_rcu_tasks_rude() which would actually keep the
> > > trace from happening, as it would schedule on each CPU forcing all CPUs to
> > > have RCU watching.
> > >
> > > I have never heard of this race being hit. I guess it could happen on a VM
> > > where a vCPU gets preempted at the right moment for a long time and the
> > > other CPUs synchronize.
> > >
> > > But like lockdep, where deadlocks can crash the kernel, we don't enable it
> > > for production.
> > >
> > > The overhead is another function call within the function tracer. I had
> > > numbers before, but I guess I could run tests again and get new numbers.
> > >
> >
> > I just noticed another rcu_is_watching() call, in rethook_try_get(),
> > which seems to be a similar and complementary validation check to the
> > one we are putting under CONFIG_FTRACE_VALIDATE_RCU_IS_WATCHING option
> > in this patch. It feels like both of them should be controlled by the
> > same settings. WDYT? Can I add CONFIG_FTRACE_VALIDATE_RCU_IS_WATCHING
> > guard around rcu_is_watching() check in rethook_try_get() as well?

Hmmm, no, I think it should not change the rethook side because rethook
can be used with kprobes without ftrace. If we can detect it is used from
the ftrace, we can skip it. (From this reason, I would like to remove
return probe from kprobes...)

Thank you,

> >
> >
> > > Thanks,
> > >
> > > -- Steve


-- 
Masami Hiramatsu (Google) <mhiramat@kernel.org>

  parent reply	other threads:[~2024-04-06  3:41 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-03-22 16:03 [PATCH] ftrace: make extra rcu_is_watching() validation check optional Andrii Nakryiko
2024-03-25  2:38 ` Masami Hiramatsu
2024-03-25 16:56   ` Andrii Nakryiko
2024-03-25 22:13   ` Steven Rostedt
2024-03-26 16:16     ` Andrii Nakryiko
2024-03-26 19:01       ` Steven Rostedt
2024-03-29 16:39         ` Andrii Nakryiko
2024-04-01 11:25         ` Masami Hiramatsu
2024-04-01 16:09           ` Steven Rostedt
2024-04-02  0:38             ` Masami Hiramatsu
2024-04-02  2:29               ` Andrii Nakryiko
2024-04-02  2:47                 ` Steven Rostedt
2024-04-03  0:40                   ` Masami Hiramatsu
2024-04-03  0:54                     ` Steven Rostedt
2024-04-03  4:00                       ` Andrii Nakryiko
2024-04-03  5:21                         ` Andrii Nakryiko
2024-04-03 13:55                           ` Steven Rostedt
2024-04-06  3:41                           ` Masami Hiramatsu [this message]
2024-04-06 16:06                             ` Andrii Nakryiko
2024-04-03 13:53                         ` Steven Rostedt

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20240406124105.5a31c488b00c36432cc81446@kernel.org \
    --to=mhiramat@kernel.org \
    --cc=andrii.nakryiko@gmail.com \
    --cc=andrii@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=jolsa@kernel.org \
    --cc=linux-trace-kernel@vger.kernel.org \
    --cc=paulmck@kernel.org \
    --cc=peterz@infradead.org \
    --cc=rostedt@goodmis.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).