* [PATCH] tracing: Add task activate/deactivate tracepoints
@ 2010-05-28 14:26 Frederic Weisbecker
2010-05-28 15:15 ` Peter Zijlstra
0 siblings, 1 reply; 15+ messages in thread
From: Frederic Weisbecker @ 2010-05-28 14:26 UTC (permalink / raw)
To: LKML; +Cc: LKML, Frederic Weisbecker, Peter Zijlstra, Ingo Molnar,
Steven Rostedt
We have various tracepoints that tell us when a task is going to
be enqueued in a runqueue: fork, wakeup, migrate.
But they don't always give us the level of information necessary
to know what is actually in which runqueue, precisely because the
migrate event is only fired if the task is queued on a different
cpu than its previous one. So we don't always know where a waking
task goes.
Moreover, we don't have events that tell us when a task goes to sleep,
and even that wouldn't cover every case in which a task is dequeued.
So add these two new tracepoints to get information about the
load of each runqueue.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Steven Rostedt <rostedt@goodmis.org>
---
include/trace/events/sched.h | 36 ++++++++++++++++++++++++++++++++++++
kernel/sched.c | 2 ++
2 files changed, 38 insertions(+), 0 deletions(-)
diff --git a/include/trace/events/sched.h b/include/trace/events/sched.h
index 4f733ec..f7f94af 100644
--- a/include/trace/events/sched.h
+++ b/include/trace/events/sched.h
@@ -185,6 +185,42 @@ TRACE_EVENT(sched_migrate_task,
__entry->orig_cpu, __entry->dest_cpu)
);
+DECLARE_EVENT_CLASS(sched_activation_template,
+
+ TP_PROTO(struct task_struct *p, int cpu),
+
+ TP_ARGS(p, cpu),
+
+ TP_STRUCT__entry(
+ __field( pid_t, pid )
+ __field( int, cpu )
+ ),
+
+ TP_fast_assign(
+ __entry->pid = p->pid;
+ __entry->cpu = cpu;
+ ),
+
+ TP_printk("pid=%d cpu=%d", __entry->pid, __entry->cpu)
+);
+
+
+/*
+ * Tracepoint for activating a task, pulling it in a runqueue
+ */
+DEFINE_EVENT(sched_activation_template, sched_activate_task,
+ TP_PROTO(struct task_struct *p, int cpu),
+ TP_ARGS(p, cpu));
+
+
+/*
+ * Tracepoint for deactivating a task, pushing it out a runqueue
+ */
+DEFINE_EVENT(sched_activation_template, sched_deactivate_task,
+ TP_PROTO(struct task_struct *p, int cpu),
+ TP_ARGS(p, cpu));
+
+
DECLARE_EVENT_CLASS(sched_process_template,
TP_PROTO(struct task_struct *p),
diff --git a/kernel/sched.c b/kernel/sched.c
index 1d93cd0..8c0b90d 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -1904,6 +1904,7 @@ static void activate_task(struct rq *rq, struct task_struct *p, int flags)
if (task_contributes_to_load(p))
rq->nr_uninterruptible--;
+ trace_sched_activate_task(p, cpu_of(rq));
enqueue_task(rq, p, flags);
inc_nr_running(rq);
}
@@ -1916,6 +1917,7 @@ static void deactivate_task(struct rq *rq, struct task_struct *p, int flags)
if (task_contributes_to_load(p))
rq->nr_uninterruptible++;
+ trace_sched_deactivate_task(p, cpu_of(rq));
dequeue_task(rq, p, flags);
dec_nr_running(rq);
}
--
1.6.2.3
* Re: [PATCH] tracing: Add task activate/deactivate tracepoints
2010-05-28 14:26 [PATCH] tracing: Add task activate/deactivate tracepoints Frederic Weisbecker
@ 2010-05-28 15:15 ` Peter Zijlstra
2010-05-31 8:00 ` Ingo Molnar
0 siblings, 1 reply; 15+ messages in thread
From: Peter Zijlstra @ 2010-05-28 15:15 UTC (permalink / raw)
To: Frederic Weisbecker; +Cc: LKML, Ingo Molnar, Steven Rostedt
On Fri, 2010-05-28 at 16:26 +0200, Frederic Weisbecker wrote:
> We have various tracepoints that tell us when a task is going to
> be enqueued in a runqueue: fork, wakeup, migrate.
>
> But they don't always provide us the level of information necessary
> to know what is actually in which runqueue, precisely because the
> migrate event is only fired if the task is queued on another
> cpu than its previous one. So we don't always know where a waking up
> task goes.
>
> And moreover we don't have events that tells a task goes to sleep,
> and even that wouldn't cover every cases when a task is dequeued.
>
> So bring these two new tracepoints to get informations about the
> load of each runqueues.
NAK, aside from a few corner cases wakeup and sleep are the important
points.
The activate and deactivate functions are implementation details.
* Re: [PATCH] tracing: Add task activate/deactivate tracepoints
2010-05-28 15:15 ` Peter Zijlstra
@ 2010-05-31 8:00 ` Ingo Molnar
2010-05-31 8:12 ` Peter Zijlstra
0 siblings, 1 reply; 15+ messages in thread
From: Ingo Molnar @ 2010-05-31 8:00 UTC (permalink / raw)
To: Peter Zijlstra; +Cc: Frederic Weisbecker, LKML, Steven Rostedt
* Peter Zijlstra <peterz@infradead.org> wrote:
> On Fri, 2010-05-28 at 16:26 +0200, Frederic Weisbecker wrote:
> > We have various tracepoints that tell us when a task is going to
> > be enqueued in a runqueue: fork, wakeup, migrate.
> >
> > But they don't always provide us the level of information necessary
> > to know what is actually in which runqueue, precisely because the
> > migrate event is only fired if the task is queued on another
> > cpu than its previous one. So we don't always know where a waking up
> > task goes.
> >
> > And moreover we don't have events that tells a task goes to sleep,
> > and even that wouldn't cover every cases when a task is dequeued.
> >
> > So bring these two new tracepoints to get informations about the
> > load of each runqueues.
>
> NAK, aside from a few corner cases wakeup and sleep are the important
> points.
>
> The activate and deactivate functions are implementation details.
Frederic, can you show us a concrete example of where we don't know what is
going on due to inadequate instrumentation? Can we fix that by extending the
existing tracepoints?
Ingo
* Re: [PATCH] tracing: Add task activate/deactivate tracepoints
2010-05-31 8:00 ` Ingo Molnar
@ 2010-05-31 8:12 ` Peter Zijlstra
2010-05-31 8:54 ` Peter Zijlstra
0 siblings, 1 reply; 15+ messages in thread
From: Peter Zijlstra @ 2010-05-31 8:12 UTC (permalink / raw)
To: Ingo Molnar; +Cc: Frederic Weisbecker, LKML, Steven Rostedt
On Mon, 2010-05-31 at 10:00 +0200, Ingo Molnar wrote:
> >
> > NAK, aside from a few corner cases wakeup and sleep are the important
> > points.
> >
> > The activate and deactivate functions are implementation details.
>
> Frederic, can you show us a concrete example of where we dont know what is
> going on due to inadequate instrumentation? Can we fix that be extending the
> existing tracepoints?
Right, so a few of those corner cases I mentioned above are things like
re-nice, PI-boosts, etc. Those use deactivate, modify task-state,
activate cycles, so if you want to see those, we can add an explicit
tracepoint for those actions.
An explicit nice/PI-boost tracepoint is much clearer than trying to
figure out wth the deactivate/activate cycle was for.
* Re: [PATCH] tracing: Add task activate/deactivate tracepoints
2010-05-31 8:12 ` Peter Zijlstra
@ 2010-05-31 8:54 ` Peter Zijlstra
2010-05-31 14:36 ` Frederic Weisbecker
0 siblings, 1 reply; 15+ messages in thread
From: Peter Zijlstra @ 2010-05-31 8:54 UTC (permalink / raw)
To: Ingo Molnar; +Cc: Frederic Weisbecker, LKML, Steven Rostedt
On Mon, 2010-05-31 at 10:12 +0200, Peter Zijlstra wrote:
> On Mon, 2010-05-31 at 10:00 +0200, Ingo Molnar wrote:
> > >
> > > NAK, aside from a few corner cases wakeup and sleep are the important
> > > points.
> > >
> > > The activate and deactivate functions are implementation details.
> >
> > Frederic, can you show us a concrete example of where we dont know what is
> > going on due to inadequate instrumentation? Can we fix that be extending the
> > existing tracepoints?
>
> Right, so a few of those corner cases I mentioned above are things like
> re-nice, PI-boosts etc.. Those use deactivate, modify task-state,
> activate cycles. so if you want to see those, we can add an explicit
> tracepoint for those actions.
>
> An explicit nice/PI-boost tracepoint is much clearer than trying to
> figure out wth the deactivate/activate cycle was for.
Another advantage of explicit tracepoints is that you'd see them even
for non-running tasks, because we only do the deactivate/activate thingy
for runnable tasks.
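The point about runnable versus non-running tasks can be illustrated with a small userspace model. This is only a sketch under stated assumptions: `toy_task`, `toy_trace`, and `toy_set_nice` are hypothetical names that do not exist in the kernel, and the "explicit renice event" is a counter standing in for a tracepoint that the thread only proposes.

```c
#include <stdbool.h>

/* Toy model only: none of these names exist in the kernel. */
struct toy_task {
	int nice;
	bool on_rq;	/* true if the task is currently runnable */
};

struct toy_trace {
	int deactivate;	/* low-level dequeue events seen */
	int activate;	/* low-level enqueue events seen */
	int renice;	/* hypothetical explicit renice events seen */
};

/* Change a task's nice value using the cycle described in the thread. */
static void toy_set_nice(struct toy_task *t, int nice, struct toy_trace *tr)
{
	bool queued = t->on_rq;

	tr->renice++;		/* explicit event: recorded for every task */

	if (queued) {		/* only runnable tasks are taken off the queue */
		tr->deactivate++;
		t->on_rq = false;
	}

	t->nice = nice;		/* modify the task while it is off the queue */

	if (queued) {		/* and put it back afterwards */
		tr->activate++;
		t->on_rq = true;
	}
}
```

Calling `toy_set_nice()` on a sleeping task bumps only the `renice` counter, while a runnable task also produces one deactivate/activate pair that, taken alone, says nothing about why it happened.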
* Re: [PATCH] tracing: Add task activate/deactivate tracepoints
2010-05-31 8:54 ` Peter Zijlstra
@ 2010-05-31 14:36 ` Frederic Weisbecker
2010-05-31 14:43 ` Peter Zijlstra
0 siblings, 1 reply; 15+ messages in thread
From: Frederic Weisbecker @ 2010-05-31 14:36 UTC (permalink / raw)
To: Peter Zijlstra; +Cc: Ingo Molnar, LKML, Steven Rostedt
On Mon, May 31, 2010 at 10:54:59AM +0200, Peter Zijlstra wrote:
> On Mon, 2010-05-31 at 10:12 +0200, Peter Zijlstra wrote:
> > On Mon, 2010-05-31 at 10:00 +0200, Ingo Molnar wrote:
> > > >
> > > > NAK, aside from a few corner cases wakeup and sleep are the important
> > > > points.
> > > >
> > > > The activate and deactivate functions are implementation details.
> > >
> > > Frederic, can you show us a concrete example of where we dont know what is
> > > going on due to inadequate instrumentation? Can we fix that be extending the
> > > existing tracepoints?
> >
> > Right, so a few of those corner cases I mentioned above are things like
> > re-nice, PI-boosts etc.. Those use deactivate, modify task-state,
> > activate cycles. so if you want to see those, we can add an explicit
> > tracepoint for those actions.
> >
> > An explicit nice/PI-boost tracepoint is much clearer than trying to
> > figure out wth the deactivate/activate cycle was for.
>
> Another advantage of explicit tracepoints is that you'd see them even
> for non-running tasks, because we only do the deactivate/activate thingy
> for runnable tasks.
Yeah. So I agree with you that activate/deactivate are too
implementation-specific; they don't even make much sense on their own, as we
don't know the cause of the event: it could be a simple renice, or
it could be a sleep.
So agreed, this sucks.
For the corner cases like re-nice and PI-boost or so, we can indeed plug
some higher level tracepoints there.
But there is one more important problem these tracepoints were solving
that still needs something:
We don't know when a task goes to sleep. We have two wait tracepoints,
sched_wait_task() to wait for a task to unschedule, and sched_process_wait()
that is a hook for the waitid and wait4 syscalls. So we are missing all
the event waiting from inside the kernel. But even with that, wait and sleep
don't mean the same thing. Sleeping doesn't always involve using the waiting
API.
I think we need a tracepoint like this:
diff --git a/kernel/sched.c b/kernel/sched.c
index 8c0b90d..5f67c04 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -3628,8 +3628,10 @@ need_resched_nonpreemptible:
if (prev->state && !(preempt_count() & PREEMPT_ACTIVE)) {
if (unlikely(signal_pending_state(prev->state, prev)))
prev->state = TASK_RUNNING;
- else
+ else {
+ trace_sched_task_sleep(prev);
deactivate_task(rq, prev, DEQUEUE_SLEEP);
+ }
switch_count = &prev->nvcsw;
}
And if people need tracepoints in the event waiting API, we can add them
later.
And concerning the task waking up, if it is not migrated, it means it stays
on its orig cpu. This is something that can be dealt with in post-processing.
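As a rough illustration of that post-processing idea, the userspace sketch below replays a simplified event stream and resolves the cpu of a wakeup from the last migrate event seen for the same pid. Everything here is an assumption made for the example: the `toy_event` layout, the `TOY_MAX_PID` limit, and the rule that a task's cpu is unknown (-1) until a migrate event has been seen. It is not the format of any real trace tool.

```c
#include <stddef.h>

#define TOY_MAX_PID 64	/* arbitrary bound for this sketch */

enum toy_event_type { TOY_MIGRATE, TOY_WAKEUP };

struct toy_event {
	enum toy_event_type type;
	int pid;
	int cpu;	/* destination cpu for TOY_MIGRATE, unused for TOY_WAKEUP */
};

/*
 * Replay events in order and return the cpu on which `pid` last woke up,
 * or -1 if it never woke up or its cpu was never learned.
 */
static int toy_wakeup_cpu(const struct toy_event *ev, size_t n, int pid)
{
	int last_cpu[TOY_MAX_PID];
	int woke_on = -1;
	size_t i;

	for (i = 0; i < TOY_MAX_PID; i++)
		last_cpu[i] = -1;

	for (i = 0; i < n; i++) {
		if (ev[i].pid < 0 || ev[i].pid >= TOY_MAX_PID)
			continue;	/* ignore pids outside the toy table */
		if (ev[i].type == TOY_MIGRATE)
			last_cpu[ev[i].pid] = ev[i].cpu;
		else if (ev[i].pid == pid)
			woke_on = last_cpu[pid];	/* no migrate: same cpu as before */
	}
	return woke_on;
}
```

The -1 result shows the limit of this approach: a task that was already placed before tracing started has no known cpu until some event reveals it.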
* Re: [PATCH] tracing: Add task activate/deactivate tracepoints
2010-05-31 14:36 ` Frederic Weisbecker
@ 2010-05-31 14:43 ` Peter Zijlstra
2010-05-31 14:48 ` Frederic Weisbecker
0 siblings, 1 reply; 15+ messages in thread
From: Peter Zijlstra @ 2010-05-31 14:43 UTC (permalink / raw)
To: Frederic Weisbecker; +Cc: Ingo Molnar, LKML, Steven Rostedt
On Mon, 2010-05-31 at 16:36 +0200, Frederic Weisbecker wrote:
> On Mon, May 31, 2010 at 10:54:59AM +0200, Peter Zijlstra wrote:
> > On Mon, 2010-05-31 at 10:12 +0200, Peter Zijlstra wrote:
> > > On Mon, 2010-05-31 at 10:00 +0200, Ingo Molnar wrote:
> > > > >
> > > > > NAK, aside from a few corner cases wakeup and sleep are the important
> > > > > points.
> > > > >
> > > > > The activate and deactivate functions are implementation details.
> > > >
> > > > Frederic, can you show us a concrete example of where we dont know what is
> > > > going on due to inadequate instrumentation? Can we fix that be extending the
> > > > existing tracepoints?
> > >
> > > Right, so a few of those corner cases I mentioned above are things like
> > > re-nice, PI-boosts etc.. Those use deactivate, modify task-state,
> > > activate cycles. so if you want to see those, we can add an explicit
> > > tracepoint for those actions.
> > >
> > > An explicit nice/PI-boost tracepoint is much clearer than trying to
> > > figure out wth the deactivate/activate cycle was for.
> >
> > Another advantage of explicit tracepoints is that you'd see them even
> > for non-running tasks, because we only do the deactivate/activate thingy
> > for runnable tasks.
>
>
> Yeah. So I agree with you that activate/deactivate are too much
> implementation related, they even don't give much sense as we
> don't know the cause of the event, could be a simple renice, or
> could be a sleep.
>
> So agreed, this sucks.
>
> For the corner cases like re-nice and PI-boost or so, we can indeed plug
> some higher level tracepoints there.
>
> But there is one more important problem these tracepoints were solving and
> that still need something:
>
> We don't know when a task goes to sleep. We have two wait tracepoints,
> sched_wait_task() to wait for a task to unschedule, and sched_process_wait()
> that is a hooks for waitid and wait4 syscalls. So we are missing all
> the event waiting from inside the kernel. But even with that, wait and sleep
> doesn't mean the same thing. Sleeping don't always involve using the waiting
> API.
>
> I think we need such tracepoint:
>
> diff --git a/kernel/sched.c b/kernel/sched.c
> index 8c0b90d..5f67c04 100644
> --- a/kernel/sched.c
> +++ b/kernel/sched.c
> @@ -3628,8 +3628,10 @@ need_resched_nonpreemptible:
> if (prev->state && !(preempt_count() & PREEMPT_ACTIVE)) {
> if (unlikely(signal_pending_state(prev->state, prev)))
> prev->state = TASK_RUNNING;
> - else
> + else {
> + trace_sched_task_sleep(prev);
> deactivate_task(rq, prev, DEQUEUE_SLEEP);
> + }
> switch_count = &prev->nvcsw;
> }
> And concerning the task waking up, if it is not migrated, it means it stays
> on its orig cpu. This is something that can be dealt from the post-processing.
Hurm... I was thinking trace_sched_switch(.prev_state != TASK_RUNNING)
would be enough, but it's not for preemptible kernels.
Should we maybe cure this and rely on sched_switch() to detect sleeps?
It seems natural since only the current task can go to sleep, it's just
that the whole preempt state gets a bit iffy.
* Re: [PATCH] tracing: Add task activate/deactivate tracepoints
2010-05-31 14:43 ` Peter Zijlstra
@ 2010-05-31 14:48 ` Frederic Weisbecker
2010-05-31 16:18 ` Peter Zijlstra
0 siblings, 1 reply; 15+ messages in thread
From: Frederic Weisbecker @ 2010-05-31 14:48 UTC (permalink / raw)
To: Peter Zijlstra; +Cc: Ingo Molnar, LKML, Steven Rostedt
On Mon, May 31, 2010 at 04:43:33PM +0200, Peter Zijlstra wrote:
> On Mon, 2010-05-31 at 16:36 +0200, Frederic Weisbecker wrote:
> > On Mon, May 31, 2010 at 10:54:59AM +0200, Peter Zijlstra wrote:
> > > On Mon, 2010-05-31 at 10:12 +0200, Peter Zijlstra wrote:
> > > > On Mon, 2010-05-31 at 10:00 +0200, Ingo Molnar wrote:
> > > > > >
> > > > > > NAK, aside from a few corner cases wakeup and sleep are the important
> > > > > > points.
> > > > > >
> > > > > > The activate and deactivate functions are implementation details.
> > > > >
> > > > > Frederic, can you show us a concrete example of where we dont know what is
> > > > > going on due to inadequate instrumentation? Can we fix that be extending the
> > > > > existing tracepoints?
> > > >
> > > > Right, so a few of those corner cases I mentioned above are things like
> > > > re-nice, PI-boosts etc.. Those use deactivate, modify task-state,
> > > > activate cycles. so if you want to see those, we can add an explicit
> > > > tracepoint for those actions.
> > > >
> > > > An explicit nice/PI-boost tracepoint is much clearer than trying to
> > > > figure out wth the deactivate/activate cycle was for.
> > >
> > > Another advantage of explicit tracepoints is that you'd see them even
> > > for non-running tasks, because we only do the deactivate/activate thingy
> > > for runnable tasks.
> >
> >
> > Yeah. So I agree with you that activate/deactivate are too much
> > implementation related, they even don't give much sense as we
> > don't know the cause of the event, could be a simple renice, or
> > could be a sleep.
> >
> > So agreed, this sucks.
> >
> > For the corner cases like re-nice and PI-boost or so, we can indeed plug
> > some higher level tracepoints there.
> >
> > But there is one more important problem these tracepoints were solving and
> > that still need something:
> >
> > We don't know when a task goes to sleep. We have two wait tracepoints,
> > sched_wait_task() to wait for a task to unschedule, and sched_process_wait()
> > that is a hooks for waitid and wait4 syscalls. So we are missing all
> > the event waiting from inside the kernel. But even with that, wait and sleep
> > doesn't mean the same thing. Sleeping don't always involve using the waiting
> > API.
> >
> > I think we need such tracepoint:
> >
> > diff --git a/kernel/sched.c b/kernel/sched.c
> > index 8c0b90d..5f67c04 100644
> > --- a/kernel/sched.c
> > +++ b/kernel/sched.c
> > @@ -3628,8 +3628,10 @@ need_resched_nonpreemptible:
> > if (prev->state && !(preempt_count() & PREEMPT_ACTIVE)) {
> > if (unlikely(signal_pending_state(prev->state, prev)))
> > prev->state = TASK_RUNNING;
> > - else
> > + else {
> > + trace_sched_task_sleep(prev);
> > deactivate_task(rq, prev, DEQUEUE_SLEEP);
> > + }
> > switch_count = &prev->nvcsw;
> > }
>
> > And concerning the task waking up, if it is not migrated, it means it stays
> > on its orig cpu. This is something that can be dealt from the post-processing.
>
> Hurm,.. I was thinking trace_sched_switch(.prev_state != TASK_RUNNING)
> would be enough, but its not for preemptible kernels.
>
> Should we maybe cure this and rely on sched_switch() to detect sleeps?
> It seems natural since only the current task can go to sleep, its just
> that the whole preempt state gets a bit iffy.
Sounds good, we have the preempt depth in the common tracepoint headers, I'll
try to rebuild a reliable cpu runqueue from post-processing and see if all that
is enough.
Thanks.
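A minimal sketch of what rebuilding a per-cpu runnable count from post-processing might look like, assuming that a sched_switch event with a non-zero prev_state reliably means the outgoing task went to sleep. The `toy_rq_event` structure, the event names, and the `initial` parameter are inventions for this example, and the model ignores migrations, forks, and exits.

```c
#include <stddef.h>

enum toy_rq_event_type { TOY_EV_WAKEUP, TOY_EV_SWITCH };

struct toy_rq_event {
	enum toy_rq_event_type type;
	int cpu;
	long prev_state;	/* TOY_EV_SWITCH only: 0 means prev stayed runnable */
};

/*
 * Replay events and return the number of runnable tasks on `cpu`,
 * starting from an assumed `initial` count.
 */
static int toy_nr_running(const struct toy_rq_event *ev, size_t n,
			  int cpu, int initial)
{
	int nr = initial;
	size_t i;

	for (i = 0; i < n; i++) {
		if (ev[i].cpu != cpu)
			continue;
		if (ev[i].type == TOY_EV_WAKEUP)
			nr++;			/* a task joined this runqueue */
		else if (ev[i].prev_state != 0)
			nr--;			/* prev left the runqueue to sleep */
	}
	return nr;
}
```

A switch with `prev_state == 0` leaves the count unchanged, which is why the count is only as trustworthy as the reported prev_state.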
* Re: [PATCH] tracing: Add task activate/deactivate tracepoints
2010-05-31 14:48 ` Frederic Weisbecker
@ 2010-05-31 16:18 ` Peter Zijlstra
2010-05-31 16:37 ` Steven Rostedt
` (2 more replies)
0 siblings, 3 replies; 15+ messages in thread
From: Peter Zijlstra @ 2010-05-31 16:18 UTC (permalink / raw)
To: Frederic Weisbecker; +Cc: Ingo Molnar, LKML, Steven Rostedt
On Mon, 2010-05-31 at 16:48 +0200, Frederic Weisbecker wrote:
> > Should we maybe cure this and rely on sched_switch() to detect sleeps?
> > It seems natural since only the current task can go to sleep, its just
> > that the whole preempt state gets a bit iffy.
How about something like the below?
Steve, is that proper usage of CREATE_TRACE_POINTS?
---
Subject: sched, trace: Fix sched_switch() prev_state argument
From: Peter Zijlstra <a.p.zijlstra@chello.nl>
Date: Mon May 31 18:13:25 CEST 2010
For CONFIG_PREEMPT=y kernels the sched_switch(.prev_state) argument
isn't useful because we can get preempted with current->state !=
TASK_RUNNING without actually getting removed from the runqueue.
Cure this by treating all preempted tasks as runnable from the
tracer's point of view.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
include/trace/events/sched.h | 19 ++++++++++++++++++-
1 file changed, 18 insertions(+), 1 deletion(-)
Index: linux-2.6/include/trace/events/sched.h
===================================================================
--- linux-2.6.orig/include/trace/events/sched.h
+++ linux-2.6/include/trace/events/sched.h
@@ -115,6 +115,23 @@ DEFINE_EVENT(sched_wakeup_template, sche
TP_PROTO(struct task_struct *p, int success),
TP_ARGS(p, success));
+#ifdef CREATE_TRACE_POINTS
+static inline long __trace_sched_switch_state(struct task_struct *p)
+{
+ long state = p->state;
+
+#ifdef CONFIG_PREEMPT
+ /*
+ * For all intents and purposes a preempted task is a running task.
+ */
+ if (task_thread_info(p)->preempt_count & PREEMPT_ACTIVE)
+ state = TASK_RUNNING;
+#endif
+
+ return state;
+}
+#endif
+
/*
* Tracepoint for task switches, performed by the scheduler:
*/
@@ -139,7 +156,7 @@ TRACE_EVENT(sched_switch,
memcpy(__entry->next_comm, next->comm, TASK_COMM_LEN);
__entry->prev_pid = prev->pid;
__entry->prev_prio = prev->prio;
- __entry->prev_state = prev->state;
+ __entry->prev_state = __trace_sched_switch_state(prev);
memcpy(__entry->prev_comm, prev->comm, TASK_COMM_LEN);
__entry->next_pid = next->pid;
__entry->next_prio = next->prio;
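The decision made by the helper in this patch can be tried out in userspace with the sketch below. It mirrors only the logic, taking the state and preempt count as plain arguments. The `TOY_` constants are stand-ins chosen for illustration; in particular the kernel's PREEMPT_ACTIVE bit is architecture-specific, so the value used here is an assumption.

```c
/* Stand-in values for illustration, not taken from kernel headers. */
#define TOY_TASK_RUNNING	0L
#define TOY_TASK_INTERRUPTIBLE	1L
#define TOY_PREEMPT_ACTIVE	0x10000000

/*
 * Report the state a tracer should see for the outgoing task:
 * a task that is being preempted is reported as running, whatever
 * its state field currently says.
 */
static long toy_switch_state(long state, int preempt_count)
{
	if (preempt_count & TOY_PREEMPT_ACTIVE)
		return TOY_TASK_RUNNING;

	return state;
}
```

For example, `toy_switch_state(TOY_TASK_INTERRUPTIBLE, TOY_PREEMPT_ACTIVE)` yields `TOY_TASK_RUNNING`, while the same state with a zero preempt count is passed through unchanged.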
* Re: [PATCH] tracing: Add task activate/deactivate tracepoints
2010-05-31 16:18 ` Peter Zijlstra
@ 2010-05-31 16:37 ` Steven Rostedt
2010-05-31 18:28 ` Peter Zijlstra
2010-05-31 16:51 ` Frederic Weisbecker
2010-06-01 9:13 ` [tip:sched/urgent] sched, trace: Fix sched_switch() prev_state argument tip-bot for Peter Zijlstra
2 siblings, 1 reply; 15+ messages in thread
From: Steven Rostedt @ 2010-05-31 16:37 UTC (permalink / raw)
To: Peter Zijlstra; +Cc: Frederic Weisbecker, Ingo Molnar, LKML
Expect slow responses from me today. It's a US Holiday.
On Mon, 2010-05-31 at 18:18 +0200, Peter Zijlstra wrote:
> On Mon, 2010-05-31 at 16:48 +0200, Frederic Weisbecker wrote:
> > > Should we maybe cure this and rely on sched_switch() to detect sleeps?
> > > It seems natural since only the current task can go to sleep, its just
> > > that the whole preempt state gets a bit iffy.
>
> How about something like the below?
>
> Steve, is that proper usage of CREATE_TRACE_POINT?
>
> ---
> Subject: sched, trace: Fix sched_switch() prev_state argument
> From: Peter Zijlstra <a.p.zijlstra@chello.nl>
> Date: Mon May 31 18:13:25 CEST 2010
>
> For CONFIG_PREEMPT=y kernels the sched_switch(.prev_state) argument
> isn't useful because we can get preempted with current->state !=
> TASK_RUNNING without actually getting removed from the runqueue.
>
> Cure this by treating all preempted tasks as runnable from the
> tracer's point of view.
>
> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
> ---
> include/trace/events/sched.h | 19 ++++++++++++++++++-
> 1 file changed, 18 insertions(+), 1 deletion(-)
>
> Index: linux-2.6/include/trace/events/sched.h
> ===================================================================
> --- linux-2.6.orig/include/trace/events/sched.h
> +++ linux-2.6/include/trace/events/sched.h
> @@ -115,6 +115,23 @@ DEFINE_EVENT(sched_wakeup_template, sche
> TP_PROTO(struct task_struct *p, int success),
> TP_ARGS(p, success));
>
> +#ifdef CREATE_TRACE_POINTS
I guess this could work. I can't think of anything that would cause this
to fail. But this is not exactly what the CREATE_TRACE_POINTS macro was
for.
Maybe we could make a CREATE_UTIL_FUNCTIONS macro that the
define_trace.h can unset like it does with CREATE_TRACE_POINTS before
recursively including the trace headers.
Maybe I'm a bit paranoid, but I'm a little nervous to extend the
CREATE_TRACE_POINTS macro to be used within the header to create utility
functions, although, currently I don't think there's anything
technically wrong in doing so.
-- Steve
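The guard pattern under discussion can be sketched in a single file. This is a simplified illustration and not the real define_trace.h mechanism: `TOY_CREATE_HELPERS` and `toy_helper` are hypothetical names, and the "second pass" simply repeats the guarded text to imitate a header being included again after the guard macro has been undefined.

```c
/* Set once by the one file that instantiates things. */
#define TOY_CREATE_HELPERS

/* ---- first pass over the "header" text ---- */
#ifdef TOY_CREATE_HELPERS
static inline long toy_helper(long x)
{
	return x + 1;
}
#endif

/* The guard is dropped before the header text is processed again. */
#undef TOY_CREATE_HELPERS

/* ---- second pass over the same "header" text ---- */
#ifdef TOY_CREATE_HELPERS
static inline long toy_helper(long x)	/* skipped: the guard is gone */
{
	return x + 1;
}
#endif
```

Without the `#undef`, the second pass would define `toy_helper` twice and the compiler would reject the file, which matches the remark later in the thread that the compiler complains loudly when the guard is wrong.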
> +static inline long __trace_sched_switch_state(struct task_struct *p)
> +{
> + long state = p->state;
> +
> +#ifdef CONFIG_PREEMPT
> + /*
> + * For all intents and purposes a preempted task is a running task.
> + */
> + if (task_thread_info(p)->preempt_count & PREEMPT_ACTIVE)
> + state = TASK_RUNNING;
> +#endif
> +
> + return state;
> +}
> +#endif
> +
> /*
> * Tracepoint for task switches, performed by the scheduler:
> */
> @@ -139,7 +156,7 @@ TRACE_EVENT(sched_switch,
> memcpy(__entry->next_comm, next->comm, TASK_COMM_LEN);
> __entry->prev_pid = prev->pid;
> __entry->prev_prio = prev->prio;
> - __entry->prev_state = prev->state;
> + __entry->prev_state = __trace_sched_switch_state(prev);
> memcpy(__entry->prev_comm, prev->comm, TASK_COMM_LEN);
> __entry->next_pid = next->pid;
> __entry->next_prio = next->prio;
>
* Re: [PATCH] tracing: Add task activate/deactivate tracepoints
2010-05-31 16:18 ` Peter Zijlstra
2010-05-31 16:37 ` Steven Rostedt
@ 2010-05-31 16:51 ` Frederic Weisbecker
2010-06-01 9:13 ` [tip:sched/urgent] sched, trace: Fix sched_switch() prev_state argument tip-bot for Peter Zijlstra
2 siblings, 0 replies; 15+ messages in thread
From: Frederic Weisbecker @ 2010-05-31 16:51 UTC (permalink / raw)
To: Peter Zijlstra; +Cc: Ingo Molnar, LKML, Steven Rostedt
On Mon, May 31, 2010 at 06:18:35PM +0200, Peter Zijlstra wrote:
> On Mon, 2010-05-31 at 16:48 +0200, Frederic Weisbecker wrote:
> > > Should we maybe cure this and rely on sched_switch() to detect sleeps?
> > > It seems natural since only the current task can go to sleep, its just
> > > that the whole preempt state gets a bit iffy.
>
> How about something like the below?
>
> Steve, is that proper usage of CREATE_TRACE_POINT?
>
> ---
> Subject: sched, trace: Fix sched_switch() prev_state argument
> From: Peter Zijlstra <a.p.zijlstra@chello.nl>
> Date: Mon May 31 18:13:25 CEST 2010
>
> For CONFIG_PREEMPT=y kernels the sched_switch(.prev_state) argument
> isn't useful because we can get preempted with current->state !=
> TASK_RUNNING without actually getting removed from the runqueue.
>
> Cure this by treating all preempted tasks as runnable from the
> tracer's point of view.
>
> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
> ---
Other than what Steve said, this looks good.
* Re: [PATCH] tracing: Add task activate/deactivate tracepoints
2010-05-31 16:37 ` Steven Rostedt
@ 2010-05-31 18:28 ` Peter Zijlstra
2010-05-31 19:14 ` Steven Rostedt
2010-05-31 19:16 ` Steven Rostedt
0 siblings, 2 replies; 15+ messages in thread
From: Peter Zijlstra @ 2010-05-31 18:28 UTC (permalink / raw)
To: rostedt; +Cc: Frederic Weisbecker, Ingo Molnar, LKML
On Mon, 2010-05-31 at 12:37 -0400, Steven Rostedt wrote:
> >
> > +#ifdef CREATE_TRACE_POINTS
>
> I guess this could work. I can't think of anything that would cause this
> to fail. But this is not exactly what the CREATE_TRACE_POINTS macro was
> for.
>
> Maybe we could make a CREATE_UTIL_FUNCTIONS macro that the
> define_trace.h can unset like it does with CREATE_TRACE_POINTS before
> recursively including the trace headers.
>
> Maybe I'm a bit paranoid, but I'm a little nervous to extend the
> CREATE_TRACE_POINTS macro to be used within the header to create utility
> functions, although, currently I don't think there's anything
> technically wrong in doing so.
Right, I can attest to the compile mess that results in not having
it :-) Given that, I think we're fairly safe with stretching it like
this, the compiler will yell real loud if you mess this up. So I'm not
sure you need to be very paranoid about this.
Duplicating the whole CREATE_TRACE_POINTS logic just for a different name
doesn't seem worth the effort at this time, esp. given the compiler
results if you get it wrong.
So do you object if I merge this for now, or would you really rather see
something else?
* Re: [PATCH] tracing: Add task activate/deactivate tracepoints
2010-05-31 18:28 ` Peter Zijlstra
@ 2010-05-31 19:14 ` Steven Rostedt
2010-05-31 19:16 ` Steven Rostedt
1 sibling, 0 replies; 15+ messages in thread
From: Steven Rostedt @ 2010-05-31 19:14 UTC (permalink / raw)
To: Peter Zijlstra; +Cc: Frederic Weisbecker, Ingo Molnar, LKML
On Mon, 2010-05-31 at 20:28 +0200, Peter Zijlstra wrote:
> On Mon, 2010-05-31 at 12:37 -0400, Steven Rostedt wrote:
> > >
> So do you object if I merge this for now, or would you really rather see
> something else?
>
No, I don't object.
Cautiously-acked-by: Steven Rostedt <rostedt@goodmis.org>
-- Steve
* Re: [PATCH] tracing: Add task activate/deactivate tracepoints
2010-05-31 18:28 ` Peter Zijlstra
2010-05-31 19:14 ` Steven Rostedt
@ 2010-05-31 19:16 ` Steven Rostedt
1 sibling, 0 replies; 15+ messages in thread
From: Steven Rostedt @ 2010-05-31 19:16 UTC (permalink / raw)
To: Peter Zijlstra; +Cc: Frederic Weisbecker, Ingo Molnar, LKML
On Mon, 2010-05-31 at 20:28 +0200, Peter Zijlstra wrote:
> On Mon, 2010-05-31 at 12:37 -0400, Steven Rostedt wrote:
> > Maybe I'm a bit paranoid, but I'm a little nervous to extend the
> > CREATE_TRACE_POINTS macro to be used within the header to create utility
> > functions, although, currently I don't think there's anything
> > technically wrong in doing so.
>
> Right, I can attest to the compile mess that results in not having
> it :-) Given that, I think we're fairly safe with stretching it like
> this, the compiler will yell real loud if you mess this up. So I'm not
> sure you need to be very paranoid about this.
Actually, I'm not worried about getting the utility functions right. I'm
actually more worried about extending TRACE_EVENT() and having this be a
thorn in our side when doing so.
-- Steve
* [tip:sched/urgent] sched, trace: Fix sched_switch() prev_state argument
2010-05-31 16:18 ` Peter Zijlstra
2010-05-31 16:37 ` Steven Rostedt
2010-05-31 16:51 ` Frederic Weisbecker
@ 2010-06-01 9:13 ` tip-bot for Peter Zijlstra
2 siblings, 0 replies; 15+ messages in thread
From: tip-bot for Peter Zijlstra @ 2010-06-01 9:13 UTC (permalink / raw)
To: linux-tip-commits
Cc: linux-kernel, hpa, mingo, rostedt, a.p.zijlstra, tglx, mingo
Commit-ID: 02f726949f2be0967aa4871dd4e47d3967779b26
Gitweb: http://git.kernel.org/tip/02f726949f2be0967aa4871dd4e47d3967779b26
Author: Peter Zijlstra <a.p.zijlstra@chello.nl>
AuthorDate: Mon, 31 May 2010 18:13:25 +0200
Committer: Ingo Molnar <mingo@elte.hu>
CommitDate: Tue, 1 Jun 2010 09:27:17 +0200
sched, trace: Fix sched_switch() prev_state argument
For CONFIG_PREEMPT=y kernels the sched_switch(.prev_state) argument isn't
useful because we can get preempted with current->state != TASK_RUNNING
without actually getting removed from the runqueue.
Cure this by treating all preempted tasks as runnable from the tracer's
point of view.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cautiously-acked-by: Steven Rostedt <rostedt@goodmis.org>
LKML-Reference: <1275322715.27810.23323.camel@twins>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
include/trace/events/sched.h | 19 ++++++++++++++++++-
1 files changed, 18 insertions(+), 1 deletions(-)
diff --git a/include/trace/events/sched.h b/include/trace/events/sched.h
index 4f733ec..b9e1dd6 100644
--- a/include/trace/events/sched.h
+++ b/include/trace/events/sched.h
@@ -115,6 +115,23 @@ DEFINE_EVENT(sched_wakeup_template, sched_wakeup_new,
TP_PROTO(struct task_struct *p, int success),
TP_ARGS(p, success));
+#ifdef CREATE_TRACE_POINTS
+static inline long __trace_sched_switch_state(struct task_struct *p)
+{
+ long state = p->state;
+
+#ifdef CONFIG_PREEMPT
+ /*
+ * For all intents and purposes a preempted task is a running task.
+ */
+ if (task_thread_info(p)->preempt_count & PREEMPT_ACTIVE)
+ state = TASK_RUNNING;
+#endif
+
+ return state;
+}
+#endif
+
/*
* Tracepoint for task switches, performed by the scheduler:
*/
@@ -139,7 +156,7 @@ TRACE_EVENT(sched_switch,
memcpy(__entry->next_comm, next->comm, TASK_COMM_LEN);
__entry->prev_pid = prev->pid;
__entry->prev_prio = prev->prio;
- __entry->prev_state = prev->state;
+ __entry->prev_state = __trace_sched_switch_state(prev);
memcpy(__entry->prev_comm, prev->comm, TASK_COMM_LEN);
__entry->next_pid = next->pid;
__entry->next_prio = next->prio;