All of lore.kernel.org
 help / color / mirror / Atom feed
From: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
To: Arun Sharma <asharma@fb.com>
Cc: linux-kernel@vger.kernel.org,
	Peter Zijlstra <a.p.zijlstra@chello.nl>,
	Steven Rostedt <rostedt@goodmis.org>,
	Arnaldo Carvalho de Melo <acme@infradead.org>,
	Andrew Vagin <avagin@openvz.org>,
	Frederic Weisbecker <fweisbec@gmail.com>,
	Ingo Molnar <mingo@elte.hu>
Subject: Re: [PATCH 1/2] tracing, sched: move the sched_switch tracepoint
Date: Mon, 19 Dec 2011 18:40:58 -0500	[thread overview]
Message-ID: <20111219234058.GA19915@Krystal> (raw)
In-Reply-To: <1324337005-31718-2-git-send-email-asharma@fb.com>

* Arun Sharma (asharma@fb.com) wrote:
> Move the tracepoint from prepare_task_switch() to
> finish_task_switch().
> 
> Without this, the event gets attributed to prev rather
> than next.

I would have thought that the semantic of "being scheduled"
(sched_switch) would belong to the context of the thread being scheduled
out ?

> For sleep profiling purposes, we want to collect
> the stacktrace of "next". This also makes the event usable in
> per-process mode, without root privileges.
>
> I can't think of this breaking the semantics of the existing
> uses of this tracepoint.

See comments inlined below,

> If there are any concerns, we should
> be able to define a new tracepoint usable for sleep profiling.

If it can be achieved by adding analysis to user-level tools, without
extracting more data, I think a less intrusive option should be
preferred. But for your per-process use-case, you might need a second
trace point.

> 
> Signed-off-by: Arun Sharma <asharma@fb.com>
> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
> Cc: Steven Rostedt <rostedt@goodmis.org>
> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
> Cc: Andrew Vagin <avagin@openvz.org>
> Cc: Frederic Weisbecker <fweisbec@gmail.com>
> Cc: Ingo Molnar <mingo@elte.hu>
> Cc: linux-kernel@vger.kernel.org
> ---
>  kernel/sched.c |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
> 
> diff --git a/kernel/sched.c b/kernel/sched.c
> index d6b149c..b06b3c6 100644
> --- a/kernel/sched.c
> +++ b/kernel/sched.c
> @@ -3157,7 +3157,6 @@ prepare_task_switch(struct rq *rq, struct task_struct *prev,
>  	fire_sched_out_preempt_notifiers(prev, next);
>  	prepare_lock_switch(rq, next);
>  	prepare_arch_switch(next);
> -	trace_sched_switch(prev, next);
>  }
>  
>  /**
> @@ -3205,6 +3204,7 @@ static void finish_task_switch(struct rq *rq, struct task_struct *prev)
>  #endif /* __ARCH_WANT_INTERRUPTS_ON_CTXSW */
>  	finish_lock_switch(rq, prev);
>  
> +	trace_sched_switch(prev, current, rq->clock);

Hrm, this is moving the event outside of the lock_switch, which has
reenabled interrupts and released the lock. This would therefore
introduce new "interesting" sequences of events where an IRQ could
appear to nest over the wrong PID. Also, I'm not certain it is still
safe to access "prev->" at this point. So the patch definitely needs to
be reviewed as to what event order it will change and to make sure it
does not introduce races.

I'd also be tempted to keep the trace_sched_switch in
prepare_task_switch, and create a new event for finish_task_switch.

Best regards,

Mathieu

>  	fire_sched_in_preempt_notifiers(current);
>  	if (mm)
>  		mmdrop(mm);
> -- 
> 1.7.4
> 

-- 
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com

  parent reply	other threads:[~2011-12-19 23:41 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-12-19 23:23 [PATCH 0/2] Sleep profiling Arun Sharma
2011-12-19 23:23 ` [PATCH 1/2] tracing, sched: move the sched_switch tracepoint Arun Sharma
2011-12-19 23:29   ` Peter Zijlstra
2011-12-19 23:36   ` Peter Zijlstra
2011-12-19 23:40   ` Mathieu Desnoyers [this message]
2011-12-19 23:23 ` [PATCH 2/2] tracing, sched: Add delay info to sched_switch Arun Sharma
2011-12-19 23:28   ` Peter Zijlstra
2011-12-19 23:34     ` Steven Rostedt
2011-12-19 23:40       ` Arun Sharma

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20111219234058.GA19915@Krystal \
    --to=mathieu.desnoyers@efficios.com \
    --cc=a.p.zijlstra@chello.nl \
    --cc=acme@infradead.org \
    --cc=asharma@fb.com \
    --cc=avagin@openvz.org \
    --cc=fweisbec@gmail.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@elte.hu \
    --cc=rostedt@goodmis.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.