public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Peter Zijlstra <peterz@infradead.org>
To: Steven Rostedt <rostedt@goodmis.org>
Cc: linux-kernel@vger.kernel.org, Ingo Molnar <mingo@redhat.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Frederic Weisbecker <fweisbec@gmail.com>,
	Thomas Gleixner <tglx@linutronix.de>
Subject: Re: [PATCH 19/21] tracing: Account for preempt off in preempt_schedule()
Date: Fri, 23 Sep 2011 16:00:51 +0200	[thread overview]
Message-ID: <1316786451.9084.41.camel@twins> (raw)
In-Reply-To: <1316780676.29966.184.camel@gandalf.stny.rr.com>

On Fri, 2011-09-23 at 08:24 -0400, Steven Rostedt wrote:
> On Fri, 2011-09-23 at 13:22 +0200, Peter Zijlstra wrote:
> > On Fri, 2011-09-23 at 07:19 -0400, Steven Rostedt wrote:
> > 
> > > What would you suggest? Just ignore the latencies that schedule
> > > produces, even though its been one of the top causes of latencies?
> > 
> > I would like to actually understand the issue first.. so far all I've
> > got is confusion.
> 
> Simple. The preemptoff and even preemptirqsoff latency tracers record
> every time preemption is disabled and enabled. For preemptoff, this only
> includes modification of the preempt count. For preemptirqsoff, it
> includes both preempt count increment and interrupts being
> disabled/enabled.

Simple, except for this blob below..

> Currently, the preempt check is done in add/sub_preempt_count(). But in
> preempt_schedule() we call add/sub_preempt_count_notrace() which updates
> the preempt_count directly without any of the preempt off/on checks.
> 
> The changelog I referenced talked about why we use the notrace versions.

Which is the important bit, because that's icky and obviously buggy
(otherwise you wouldn't need to fix it), so one cannot judge the fix
without understanding the initial problem. 

> Some function tracing hooks use the preempt_enable/disable_notrace().

That's pretty much a non-statement in that it doesn't explain anything.

> Function tracer is not the only user of the function tracing facility.

So what?!

> With the original preempt_diable(), when we have preempt tracing
> enabled, the add/sub_preempt_count()s become traced by the function
> tracer (which is also a good thing as I've used that info). 

Why and how? It would seem to me marking those notrace would cure some
of the issues.

> The issue is
> in preempt_schedule() which is called by preempt_enable() if
> NEED_RESCHED is set and PREEMPT_ACTIVE is not set. One of the first
> things that preempt_schedule() does is call
> add_preempt_count(PREEMPT_ACTIVE), to add the PREEMPT_ACTIVE to preempt
> count and not come back into preempt_schedule() when interrupted again.

That sentence doesn't really parse for me, but I think you mean to say
that PREEMPT_ACTIVE guards against recursive preemption?

> But! If add_preempt_count(PREEPMT_ACTIVE) is traced, we call into the
> function tracing mechanism *before* it adds PREEMPT_ACTIVE, and when the
> function hook calls preempt_enable_notrace() it will notice the
> NEED_RESCHED set and PREEMPT_ACTIVE not set and recurse back into the
> preempt_schedule() and boom!

Right, so you've got the mess that there's two different notrace options
for the {add,sub}_preempt_count(). One is not doing the
trace_preempt_{off,on} thing, and one is marking the function notrace.

The normal {add,sub}_preempt_count() muck does everything, the
_notrace() one doesn't do either, and here you need one that does one
but not the other.

So you then open code things, but instead of using
trace_preempt_{off,on} you use {start,stop}_critical_timings(), which is
a totally different beast, confusing matters more.

What's worse, there's not a single comment near preempt_schedule()
explaining any of this mess.

> By making preempt_schedule() use notrace we avoid this issue with the
> function tracing hooks, but in the mean time, we just lost the check
> that preemption was disabled.

Uhm, no, by marking preempt_schedule() notrace you don't call into the
function tracer for this call. The preemption check was lost by using
the _notrace() preempt_count fiddling.

Hiding preempt_schedule() from the function tracer isn't exactly nice
though. Although I think we can reconstruct it was called by seeing
schedule() called and seeing N set (except of course that the default
function tracer output doesn't show that (one more reason the the
default output format is broken).

>  Since we know that preemption and
> interrupts were both enabled before calling into preempt_schedule()
> (otherwise it is a bug), we can just tell the latency tracers that
> preemption is being disabled manually with the
> start/stop_critical_timings(). Note, these function names comes from the
> original latency_tracer that was in -rt.

Doesn't matter where they come from, they're still confusing for anybody
trying to make sense of the {add,sub}_preempt_count vs &_notrace
functions.

> There's another location in the kernel that we need to manually call
> into the latency tracer and that's in idle. The cpu_idle() calls
> disables preemption then disables interrupts and may call some assembly
> instruction that puts the system into idle but wakes up on interrupts.
> Then on return, interrupts are enabled and preemption is again enabled.

> Since we don't know about this wakeup on interrupts, the latency tracers
> would count this idle wait as a latency, which obviously is not what we
> want. Which is where the start/stop_critical_timings() was created for.
> The preempt_schedule() case is similar in an opposite way. Instead of
> not wanting to trace, we want to trace, and the code works for this
> location too.

Wouldn't it be much more sensible to mark all that preempt/irq fiddling
in the idle path with _notrace instead? And stick a comment near.

Having two things to do the same thing just doesn't make sense.


  reply	other threads:[~2011-09-23 14:01 UTC|newest]

Thread overview: 34+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-09-22 22:09 [PATCH 00/21] [GIT PULL][v3.2] tracing: queued up stuff waiting for k.org to come back on line Steven Rostedt
2011-09-22 22:09 ` [PATCH 01/21] tracing: Clean up tb_fmt to not give faulty compile warning Steven Rostedt
2011-09-22 22:09 ` [PATCH 02/21] Tracepoint: Dissociate from module mutex Steven Rostedt
2011-09-22 22:09 ` [PATCH 03/21] x86: jump_label: arch_jump_label_text_poke_early: add missing __init Steven Rostedt
2011-09-22 22:09 ` [PATCH 04/21] tracing/filter: Use static allocation for filter predicates Steven Rostedt
2011-09-22 22:09 ` [PATCH 05/21] tracing/filter: Separate predicate init and filter addition Steven Rostedt
2011-09-22 22:09 ` [PATCH 06/21] tracing/filter: Remove field_name from filter_pred struct Steven Rostedt
2011-09-22 22:09 ` [PATCH 07/21] tracing/filter: Simplify tracepoint event lookup Steven Rostedt
2011-09-22 22:09 ` [PATCH 08/21] tracing/filter: Unify predicate tree walking, change check_pred_tree Steven Rostedt
2011-09-22 22:09 ` [PATCH 09/21] tracing/filter: Change count_leafs function to use walk_pred_tree Steven Rostedt
2011-09-22 22:09 ` [PATCH 10/21] tracing/filter: Change fold_pred_tree " Steven Rostedt
2011-09-22 22:09 ` [PATCH 11/21] tracing/filter: Change fold_pred " Steven Rostedt
2011-09-22 22:09 ` [PATCH 12/21] tracing/filter: Change filter_match_preds function to use Steven Rostedt
2011-09-22 22:09 ` [PATCH 13/21] tracing/filter: Add startup tests for events filter Steven Rostedt
2011-09-22 22:09 ` [PATCH 14/21] tracing: Add preempt disable for filter self test Steven Rostedt
2011-09-22 22:09 ` [PATCH 15/21] trace: Add a new readonly entry to report total buffer size Steven Rostedt
2011-09-22 22:09 ` [PATCH 16/21] trace: Add ring buffer stats to measure rate of events Steven Rostedt
2011-09-22 22:09 ` [PATCH 17/21] tracing: Add a counter clock for those that do not trust clocks Steven Rostedt
2011-09-22 22:09 ` [PATCH 18/21] tracing: Fix preemptirqsoff tracer to not stop at preempt off Steven Rostedt
2011-09-22 22:09 ` [PATCH 19/21] tracing: Account for preempt off in preempt_schedule() Steven Rostedt
2011-09-23 11:00   ` Peter Zijlstra
2011-09-23 11:19     ` Steven Rostedt
2011-09-23 11:22       ` Peter Zijlstra
2011-09-23 12:24         ` Steven Rostedt
2011-09-23 14:00           ` Peter Zijlstra [this message]
2011-09-23 14:07           ` Peter Zijlstra
2011-09-23 13:24       ` Peter Zijlstra
2011-09-22 22:09 ` [PATCH 20/21] tracing: Do not allocate buffer for trace_marker Steven Rostedt
2011-09-22 22:09 ` [PATCH 21/21] tracing: Add optional percpu buffers for trace_printk() Steven Rostedt
2011-09-23 11:02   ` Peter Zijlstra
2011-09-23 11:07   ` Peter Zijlstra
2011-09-23 11:16     ` Steven Rostedt
2011-09-23 11:28       ` Peter Zijlstra
2011-09-23 11:39         ` Steven Rostedt

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1316786451.9084.41.camel@twins \
    --to=peterz@infradead.org \
    --cc=akpm@linux-foundation.org \
    --cc=fweisbec@gmail.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@redhat.com \
    --cc=rostedt@goodmis.org \
    --cc=tglx@linutronix.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox