All of lore.kernel.org
 help / color / mirror / Atom feed
From: Joel Fernandes <joel@joelfernandes.org>
To: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	linux-kernel <linux-kernel@vger.kernel.org>,
	Peter Zijlstra <peterz@infradead.org>,
	rostedt <rostedt@goodmis.org>,
	Masami Hiramatsu <mhiramat@kernel.org>,
	Alexei Starovoitov <ast@kernel.org>, paulmck <paulmck@kernel.org>,
	Frederic Weisbecker <frederic@kernel.org>,
	bpf@vger.kernel.org
Subject: Re: Instrumentation and RCU
Date: Tue, 17 Mar 2020 13:56:14 -0400	[thread overview]
Message-ID: <20200317175614.GA13090@google.com> (raw)
In-Reply-To: <20200310014043.4dbagqbr2wsbuarm@ast-mbp>

On Mon, Mar 09, 2020 at 06:40:45PM -0700, Alexei Starovoitov wrote:
> On Mon, Mar 09, 2020 at 02:37:40PM -0400, Mathieu Desnoyers wrote:
> > > 
> > >    But what's relevant is the tracer overhead which is e.g. inflicted
> > >    with todays trace_hardirqs_off/on() implementation because that
> > >    unconditionally uses the rcuidle variant with the scru/rcu_irq dance
> > >    around every tracepoint.
> > 
> > I think one of the big issues here is that most of the uses of
> > trace_hardirqs_off() are from sites which already have RCU watching,
> > so we are doing heavy-weight operations for nothing.
> 
> I think kernel/trace/trace_preemptirq.c created too many problems for the
> kernel without providing tangible benefits. My understanding no one is using it
> in production.

Hi Alexei,
There are various people use the preempt/irq disable tracepoints for last 2
years at Google and ARM. There's also a BPF tool (in BCC) that uses those for
tracing critical sections. Also Daniel Bristot's entire Preempt-IRQ formal
verification stuff depends on it.

> It's a tool to understand how kernel works. And such debugging
> tool can and should be removed.

If we go by that line of reasoning, then function tracing also should be
removed from the kernel.

I am glad Thomas and Peter are working on it and looking forward to seeing
the patches,

thanks,

 - Joel


> One of Thomas's patches mentioned that bpf can be invoked from hardirq and
> preempt tracers. This connection doesn't exist in a direct way, but
> theoretically it's possible. There is no practical use though and I would be
> happy to blacklist such bpf usage at a minimum.
> 
> > We could use the approach proposed by Peterz's and Steven's patches to basically
> > do a lightweight "is_rcu_watching()" check for rcuidle tracepoint, and only enable
> > RCU for those cases. We could then simply go back on using regular RCU like so:
> > 
> > #define __DO_TRACE(tp, proto, args, cond, rcuidle)                      \
> >         do {                                                            \
> >                 struct tracepoint_func *it_func_ptr;                    \
> >                 void *it_func;                                          \
> >                 void *__data;                                           \
> >                 bool exit_rcu = false;                                  \
> >                                                                         \
> >                 if (!(cond))                                            \
> >                         return;                                         \
> >                                                                         \
> >                 if (rcuidle && !rcu_is_watching()) {                    \
> >                         rcu_irq_enter_irqson();                         \
> >                         exit_rcu = true;                                \
> >                 }                                                       \
> >                 preempt_disable_notrace();                              \
> >                 it_func_ptr = rcu_dereference_raw((tp)->funcs);         \
> >                 if (it_func_ptr) {                                      \
> >                         do {                                            \
> >                                 it_func = (it_func_ptr)->func;          \
> >                                 __data = (it_func_ptr)->data;           \
> >                                 ((void(*)(proto))(it_func))(args);      \
> >                         } while ((++it_func_ptr)->func);                \
> >                 }                                                       \
> >                 preempt_enable_notrace();                               \
> >                 if (exit_rcu)                                           \
> >                         rcu_irq_exit_irqson();                          \
> >         } while (0)
> 
> I think it's a fine approach interim.
> 
> Long term sounds like Paul is going to provide sleepable and low overhead
> rcu_read_lock_for_tracers() that will include bpf.
> My understanding that this new rcu flavor won't have "idle" issues,
> so rcu_is_watching() checks will not be necessary.
> And if we remove trace_preemptirq.c the only thing left will be Thomas's points
> 1 (low level entry) and 2 (breakpoints) that can be addressed without
> creating fancy .text annotations and teach objtool about it.
> 
> In the mean time I've benchmarked srcu for sleepable bpf and it's quite heavy.
> srcu_read_lock+unlock roughly adds 10x execution cost to trivial bpf prog.
> I'm proceeding with it anyway, but really hoping that
> rcu_read_lock_for_tracers() will materialize soon.
> 
> In general I'm sceptical that .text annotations will work. Let's say all of
> idle is a red zone. But a ton of normal functions are called when idle. So
> objtool will go and mark them as red zone too. This way large percent of the
> kernel will be off limits for tracers. Which is imo not a good trade off. I
> think addressing 1 and 2 with explicit notrace/nokprobe annotations will cover
> all practical cases where people can shot themselves in a foot with a tracer. I
> realize that there will be forever whack-a-mole game and these annotations will
> never reach 100%. I think it's a fine trade off. Security is never 100% either.
> Tracing is never going to be 100% safe too.

  parent reply	other threads:[~2020-03-17 17:56 UTC|newest]

Thread overview: 44+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-03-09 17:02 Instrumentation and RCU Thomas Gleixner
2020-03-09 18:15 ` Steven Rostedt
2020-03-09 18:42   ` Joel Fernandes
2020-03-09 19:07     ` Steven Rostedt
2020-03-09 19:20       ` Mathieu Desnoyers
2020-03-16 15:02       ` Joel Fernandes
2020-03-09 18:59   ` Thomas Gleixner
2020-03-10  8:09     ` Masami Hiramatsu
2020-03-10 11:43       ` Thomas Gleixner
2020-03-10 15:31         ` Mathieu Desnoyers
2020-03-10 15:46           ` Steven Rostedt
2020-03-10 16:21             ` Mathieu Desnoyers
2020-03-11  0:18               ` Masami Hiramatsu
2020-03-11  0:37                 ` Mathieu Desnoyers
2020-03-11  7:48                   ` Masami Hiramatsu
2020-03-10 16:06         ` Masami Hiramatsu
2020-03-12 13:53         ` Peter Zijlstra
2020-03-10 15:24       ` Mathieu Desnoyers
2020-03-10 17:05       ` Daniel Thompson
2020-03-09 18:37 ` Mathieu Desnoyers
2020-03-09 18:44   ` Steven Rostedt
2020-03-09 18:52     ` Mathieu Desnoyers
2020-03-09 19:09       ` Steven Rostedt
2020-03-09 19:25         ` Mathieu Desnoyers
2020-03-09 19:52   ` Thomas Gleixner
2020-03-10 15:03     ` Mathieu Desnoyers
2020-03-10 16:48       ` Thomas Gleixner
2020-03-10 17:40         ` Mathieu Desnoyers
2020-03-10 18:31           ` Thomas Gleixner
2020-03-10 18:37             ` Mathieu Desnoyers
2020-03-10  1:40   ` Alexei Starovoitov
2020-03-10  8:02     ` Thomas Gleixner
2020-03-10 16:54     ` Paul E. McKenney
2020-03-17 17:56     ` Joel Fernandes [this message]
2020-03-09 20:18 ` Peter Zijlstra
2020-03-09 20:47 ` Paul E. McKenney
2020-03-09 20:58   ` Steven Rostedt
2020-03-09 21:25     ` Paul E. McKenney
2020-03-09 23:52   ` Frederic Weisbecker
2020-03-10  2:26     ` Paul E. McKenney
2020-03-10 15:13   ` Mathieu Desnoyers
2020-03-10 16:49     ` Paul E. McKenney
2020-03-10 17:22       ` Mathieu Desnoyers
2020-03-10 17:26         ` Paul E. McKenney

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200317175614.GA13090@google.com \
    --to=joel@joelfernandes.org \
    --cc=alexei.starovoitov@gmail.com \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=frederic@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mathieu.desnoyers@efficios.com \
    --cc=mhiramat@kernel.org \
    --cc=paulmck@kernel.org \
    --cc=peterz@infradead.org \
    --cc=rostedt@goodmis.org \
    --cc=tglx@linutronix.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.