public inbox for linux-kernel@vger.kernel.org
* [ANNOUNCE] sched: schedtop utility
@ 2008-05-22 14:06 Gregory Haskins
  2008-05-22 14:33 ` Steven Rostedt
                   ` (2 more replies)
  0 siblings, 3 replies; 13+ messages in thread
From: Gregory Haskins @ 2008-05-22 14:06 UTC (permalink / raw)
  To: linux-kernel
  Cc: Ingo Molnar, rostedt, Peter Zijlstra, suresh.b.siddha,
	aneesh.kumar, dhaval, vatsa, David Bahi

Hi all scheduler developers,
  I had an itch to scratch w.r.t. watching the stats in /proc/schedstat, and it appears that the Perl scripts referenced in Documentation/scheduler/sched-stats.txt do not support v14 from HEAD, so I whipped up a little utility I call "schedtop".

This utility processes statistics from /proc/schedstat such that the busiest stats bubble up to the top.  It can alternatively be sorted by the largest stat, or by name.  Stats can be included or excluded based on regex pattern matching.
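
As a rough illustration of the "busiest first" ordering described above, here is a minimal sketch in C. It is not schedtop's actual code: the struct, the function names, and the idea of keeping a previous and a current value per counter are assumptions made for this example, and it assumes the counters only ever increase between refreshes.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical record: one named counter sampled at two refreshes. */
struct stat_sample {
	const char *name;          /* label chosen for the example */
	unsigned long long prev;   /* value at the previous refresh */
	unsigned long long curr;   /* value at the current refresh */
};

/* Activity during the interval; assumes the counter never decreases. */
static unsigned long long stat_delta(const struct stat_sample *s)
{
	assert(s->curr >= s->prev);
	return s->curr - s->prev;
}

/* qsort() comparator: largest delta first, ties broken by name. */
static int by_delta_desc(const void *a, const void *b)
{
	const struct stat_sample *x = a, *y = b;
	unsigned long long dx = stat_delta(x), dy = stat_delta(y);

	if (dx != dy)
		return dx < dy ? 1 : -1;
	return strcmp(x->name, y->name);
}

/* Reorder the samples so the busiest counters come first. */
static void sort_busiest_first(struct stat_sample *v, size_t n)
{
	qsort(v, n, sizeof(*v), by_delta_desc);
}
```

Sorting by the largest stat or by name would only need a different comparator; the rest of the loop (sample, diff, sort, print) stays the same.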

You can download the tarball here:

ftp://ftp.novell.com/dev/ghaskins/schedtop.tar.gz

I have also posted it to the openSUSE Build Service, which generates RPMs for a handful of 32/64-bit x86 distros, for your convenience:

http://download.opensuse.org/repositories/home:/ghaskins/

(Note that the build is still in progress for some of the flavors, so if you do not see the flavor you are looking for, check back in a little while.)

Comments/feedback/bug-fixes welcome!

Regards
-Greg



* Re: [ANNOUNCE] sched: schedtop utility
  2008-05-22 14:06 [ANNOUNCE] sched: schedtop utility Gregory Haskins
@ 2008-05-22 14:33 ` Steven Rostedt
  2008-06-02 12:48 ` Ankita Garg
  2008-06-03  3:21 ` [ANNOUNCE] sched: schedtop utility v0.3 Gregory Haskins
  2 siblings, 0 replies; 13+ messages in thread
From: Steven Rostedt @ 2008-05-22 14:33 UTC (permalink / raw)
  To: Gregory Haskins
  Cc: linux-kernel, Ingo Molnar, Peter Zijlstra, suresh.b.siddha,
	aneesh.kumar, dhaval, vatsa, David Bahi


On Thu, 22 May 2008, Gregory Haskins wrote:

> Hi all scheduler developers,
>   I had an itch to scratch w.r.t. watching the stats in
> /proc/schedstats, and it appears that the perl scripts referenced in
> Documentation/scheduler/sched-stats.txt do not support v14 from HEAD so
> I whipped up a little utility I call "schedtop".

Greg,

Cool, I'll have to take a look when things get a little slower for me.
When I get a chance, I'll send my comments.

-- Steve



* Re: [ANNOUNCE] sched: schedtop utility
  2008-05-22 14:06 [ANNOUNCE] sched: schedtop utility Gregory Haskins
  2008-05-22 14:33 ` Steven Rostedt
@ 2008-06-02 12:48 ` Ankita Garg
  2008-06-02 13:07   ` Peter Zijlstra
  2008-06-03  3:21 ` [ANNOUNCE] sched: schedtop utility v0.3 Gregory Haskins
  2 siblings, 1 reply; 13+ messages in thread
From: Ankita Garg @ 2008-06-02 12:48 UTC (permalink / raw)
  To: Gregory Haskins
  Cc: linux-kernel, Ingo Molnar, rostedt, Peter Zijlstra,
	suresh.b.siddha, aneesh.kumar, dhaval, vatsa, David Bahi

Hi Gregory,

On Thu, May 22, 2008 at 08:06:44AM -0600, Gregory Haskins wrote:
> Hi all scheduler developers,
>   I had an itch to scratch w.r.t. watching the stats in /proc/schedstats, and it appears that the perl scripts referenced in Documentation/scheduler/sched-stats.txt do not support v14 from HEAD so I whipped up a little utility I call "schedtop".
>

Nice tool! It helps in better visualization of the data in schedstat.

While using the tool, I realized that most of the timing-related stats
therein might not be completely usable in many scenarios, as might already
be known.

Without any additional load on the system, all the stats are nice and
sane. But as soon as I ran my particular testcase, the data
pertaining to the delta of run_delay/cpu_time went haywire! I understand
that all the values are based on rq->clock, which relies on the tsc, which
is not synced across cpus and would result in skewed/incorrect values.
So the data turns out to be not so reliable for debugging. This is
of course not related to the tool, but to schedstat in general; it just
adds to the already existing woes with non-synced tscs :-)

> This utility will process statistics from /proc/schedstat such that the busiest stats will bubble up to the top.  It can alternately be sorted by the largest stat, or by name.  Stats can be included or excluded based on reg-ex pattern matching.
> 
> You can download the tarball here:
> 
> ftp://ftp.novell.com/dev/ghaskins/schedtop.tar.gz
> 
> I have also posted it to the opensuse build service for generating RPMS for a handful of 32/64-bit x86 distros for your convenience:
> 
> http://download.opensuse.org/repositories/home:/ghaskins/
> 
> (Note that the build is still in progress for some of the favors, so if you do not see the flavor you are looking for, check back in a little while)
> 
> Comments/feedback/bug-fixes welcome!

Maybe support for displaying process-specific schedstat info would help?
We already capture it under /proc/<pid>/schedstat. So, a -p switch for the
pid, maybe? The reason is that when looking at the stats for the entire
system, I might not get much information specific to my particular
application, e.g. when a particular thread suffers a lot of delay
waiting on the runqueue... just a thought!
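
For reference, a per-task reader could look roughly like the sketch below. It assumes the three-field layout of /proc/<pid>/schedstat described in Documentation/scheduler/sched-stats.txt (time spent on the cpu, time spent waiting on a runqueue, number of timeslices run); the struct and function names are made up for the example, and the units of the two time fields depend on the kernel version.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical container for the three per-task fields. */
struct pid_schedstat {
	unsigned long long cpu_time;   /* time spent running on a cpu */
	unsigned long long run_delay;  /* time spent waiting on a runqueue */
	unsigned long pcount;          /* number of timeslices run */
};

/* Parse one line in /proc/<pid>/schedstat format; returns 0 on success. */
static int parse_pid_schedstat(const char *line, struct pid_schedstat *out)
{
	int n;

	assert(line && out);
	n = sscanf(line, "%llu %llu %lu",
		   &out->cpu_time, &out->run_delay, &out->pcount);
	return n == 3 ? 0 : -1;
}

/* Read and parse the schedstat file of one task; returns 0 on success. */
static int read_pid_schedstat(int pid, struct pid_schedstat *out)
{
	char path[64], line[128];
	FILE *f;
	int ret = -1;

	snprintf(path, sizeof(path), "/proc/%d/schedstat", pid);
	f = fopen(path, "r");
	if (!f)
		return -1;
	if (fgets(line, sizeof(line), f))
		ret = parse_pid_schedstat(line, out);
	fclose(f);
	return ret;
}
```

Sampling read_pid_schedstat() twice and subtracting run_delay gives the wait time a single thread accumulated over the interval, which is the per-application view asked for above.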

> 
> Regards
> -Greg
> 

-- 
Regards,
Ankita Garg (ankita@in.ibm.com)
Linux Technology Center
IBM India Systems & Technology Labs, 
Bangalore, India   


* Re: [ANNOUNCE] sched: schedtop utility
  2008-06-02 12:48 ` Ankita Garg
@ 2008-06-02 13:07   ` Peter Zijlstra
  2008-06-02 13:20     ` Gregory Haskins
  2008-06-05  5:20     ` Ankita Garg
  0 siblings, 2 replies; 13+ messages in thread
From: Peter Zijlstra @ 2008-06-02 13:07 UTC (permalink / raw)
  To: Ankita Garg
  Cc: Gregory Haskins, linux-kernel, Ingo Molnar, rostedt,
	suresh.b.siddha, aneesh.kumar, dhaval, vatsa, David Bahi

On Mon, 2008-06-02 at 18:18 +0530, Ankita Garg wrote:
> Hi Gregory,
> 
> On Thu, May 22, 2008 at 08:06:44AM -0600, Gregory Haskins wrote:
> > Hi all scheduler developers,
> >   I had an itch to scratch w.r.t. watching the stats in /proc/schedstats, and it appears that the perl scripts referenced in Documentation/scheduler/sched-stats.txt do not support v14 from HEAD so I whipped up a little utility I call "schedtop".
> >
> 
> Nice tool! Helps in better visualization of the data in schedstats. 
> 
> Using the tool, realized that most of the timing related stats therein
> might not be completely usable in many scenarios, as might already be
> known.
> 
> Without any additional load on the system, all the stats are nice and
> sane. But, as soon as I ran my particular testcase, the data
> pertaining to the delta of run_delay/cpu_time went haywire! I understand
> that all the values are based on top of rq->clock, which relies on tsc that 
> is not synced across cpus and would result in skews/incorrect values.
> But, turns out to be not so reliable data for debugging. This is
> ofcourse nothing related to the tool, but for schedstat in
> general...rather just adding on to the already existing woes with non-syned 
> tscs :-)

Thing is, runtime should be calculated by using per-cpu deltas.
You take a stamp when you get scheduled on the cpu and another one when
you stop running, then the delta is added to runtime.

This is always on the same cpu - when you get migrated you're stopped
and re-scheduled so that should work out nicely.

So in that sense it shouldn't matter that the rq->clock values can get
skewed between cpus.

So I'm still a little puzzled by your observations; though it could be
that the schedstat stuff got broken - I've never really looked too
closely at it.
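
The argument above can be checked with a toy model. This is only a sketch under a simplifying assumption: each cpu's clock is the true time plus a fixed offset, which is not how rq->clock actually behaves in detail. All names are invented for the example.

```c
#include <assert.h>

/* Toy model: a cpu clock is the true time plus a fixed per-cpu offset. */
struct toy_cpu {
	long long offset;   /* skew of this cpu's clock, arbitrary ticks */
};

/* What this cpu's clock reads at a given true time. */
static long long toy_clock(const struct toy_cpu *cpu, long long true_time)
{
	return true_time + cpu->offset;
}

/*
 * Runtime of one slice.  Both stamps come from the same cpu's clock,
 * so the offset appears in both terms and cancels in the subtraction.
 */
static long long slice_runtime(const struct toy_cpu *cpu,
			       long long sched_in, long long sched_out)
{
	assert(sched_out >= sched_in);
	return toy_clock(cpu, sched_out) - toy_clock(cpu, sched_in);
}
```

With a task that runs for 30 ticks on a cpu with offset 0 and then for 50 ticks on a cpu with offset 5000, the sum of the two slice_runtime() results is 80, regardless of the skew between the two clocks.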





* Re: [ANNOUNCE] sched: schedtop utility
  2008-06-02 13:07   ` Peter Zijlstra
@ 2008-06-02 13:20     ` Gregory Haskins
  2008-06-05  5:20     ` Ankita Garg
  1 sibling, 0 replies; 13+ messages in thread
From: Gregory Haskins @ 2008-06-02 13:20 UTC (permalink / raw)
  To: Ankita Garg, Peter Zijlstra
  Cc: Ingo Molnar, rostedt, suresh.b.siddha, aneesh.kumar, dhaval,
	vatsa, David Bahi, linux-kernel

Hi Ankita,
  For some reason, I didn't get your original email.  I had to go find it in the lkml.org archives.

But anyway, see inline.

>>> On Mon, Jun 2, 2008 at  9:07 AM, in message
<1212412051.6269.5.camel@lappy.programming.kicks-ass.net>, Peter Zijlstra
<peterz@infradead.org> wrote: 
> On Mon, 2008-06-02 at 18:18 +0530, Ankita Garg wrote:
>> Hi Gregory,
>> 
>> On Thu, May 22, 2008 at 08:06:44AM -0600, Gregory Haskins wrote:
>> > Hi all scheduler developers,
>> >   I had an itch to scratch w.r.t. watching the stats in /proc/schedstats, 
> and it appears that the perl scripts referenced in 
> Documentation/scheduler/sched-stats.txt do not support v14 from HEAD so I 
> whipped up a little utility I call "schedtop".
>> >
>> 
>> Nice tool! Helps in better visualization of the data in schedstats. 
>> 
>> Using the tool, realized that most of the timing related stats therein
>> might not be completely usable in many scenarios, as might already be
>> known.
>> 
>> Without any additional load on the system, all the stats are nice and
>> sane. But, as soon as I ran my particular testcase, the data
>> pertaining to the delta of run_delay/cpu_time went haywire! I understand
>> that all the values are based on top of rq->clock, which relies on tsc that 
>> is not synced across cpus and would result in skews/incorrect values.
>> But, turns out to be not so reliable data for debugging. This is
>> ofcourse nothing related to the tool, but for schedstat in
>> general...rather just adding on to the already existing woes with non-syned 
>> tscs :-)
> 
> Thing is, things runtime should be calculated by using per cpu deltas.
> You take a stamp when you get scheduled on the cpu and another one when
> you stop running, then the delta is added to runtime.
> 
> This is always on the same cpu - when you get migrated you're stopped
> and re-scheduled so that should work out nicely.
> 
> So in that sense it shouldn't matter that the rq->clock values can get
> skewed between cpus.
> 
> So I'm still a little puzzled by your observations; though it could be
> that the schedstat stuff got broken - I've never really looked too
> closely at it.

I suspect we are talking about those stats that are always moving pretty fast.  I see that too, and I use the (potentially unknown) filtering feature of schedtop: "-i REGEX" sets the include filter, and "-x REGEX" sets the exclude filter.  By default, include allows everything, and exclude filters nothing.  Using "-x sched_info" will exclude all those pesky stats that move fast and do not convey useful (to me, anyway) data.  I hope that helps!

Also, about your idea for /proc/<pid>/schedstat, I was thinking the same thing while on my trip on Friday.  I will add this feature.  Thanks!

-Greg




* [ANNOUNCE] sched: schedtop utility v0.3
  2008-05-22 14:06 [ANNOUNCE] sched: schedtop utility Gregory Haskins
  2008-05-22 14:33 ` Steven Rostedt
  2008-06-02 12:48 ` Ankita Garg
@ 2008-06-03  3:21 ` Gregory Haskins
  2008-06-17 12:18   ` [ANNOUNCE] sched: schedtop utility v0.5 Gregory Haskins
  2 siblings, 1 reply; 13+ messages in thread
From: Gregory Haskins @ 2008-06-03  3:21 UTC (permalink / raw)
  To: Ankita Garg, linux-kernel
  Cc: Ingo Molnar, rostedt, Peter Zijlstra, suresh.b.siddha,
	aneesh.kumar, dhaval, vatsa, David Bahi

>>> On Thu, May 22, 2008 at 10:06 AM, in message <483545B4.BA47.005A.0@novell.com>,
Gregory Haskins wrote: 
> Hi all scheduler developers,
>   I had an itch to scratch w.r.t. watching the stats in /proc/schedstats, 
> and it appears that the perl scripts referenced in 
> Documentation/scheduler/sched-stats.txt do not support v14 from HEAD so I 
> whipped up a little utility I call "schedtop".
> 
> This utility will process statistics from /proc/schedstat such that the 
> busiest stats will bubble up to the top.  It can alternately be sorted by the 
> largest stat, or by name.  Stats can be included or excluded based on reg-ex 
> pattern matching.
> 
> You can download the tarball here:
> 
> ftp://ftp.novell.com/dev/ghaskins/schedtop.tar.gz
> 
> I have also posted it to the opensuse build service for generating RPMS for 
> a handful of 32/64-bit x86 distros for your convenience:
> 
> http://download.opensuse.org/repositories/home:/ghaskins/
> 
> (Note that the build is still in progress for some of the favors, so if you 
> do not see the flavor you are looking for, check back in a little while)
> 
> Comments/feedback/bug-fixes welcome!
> 
> Regards
> -Greg

Hi All,
  I have posted an update to schedtop (v0.3) which adds the /proc/<pid>/schedstat and /proc/<pid>/sched stats to the mix.

Also note that there is a comprehensive filtering mechanism built into all versions of schedtop:

"-i <REGEX>" sets the *include* pattern, and
"-x <REGEX>" sets the *exclude* pattern.

By default, -i is set to allow everything, and -x is set to exclude nothing.  A common config for me is to use "-x sched_info" since those sched_info stats seem to always be moving rapidly and can cloud stats that are more interesting (to me, anyway).
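
The include/exclude semantics can be sketched with POSIX regcomp()/regexec() as below. This is an illustration only, not schedtop's implementation: the function name, the use of NULL to mean "no pattern given", and the stat names in the usage note are assumptions for the example.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/*
 * Return 1 if `name` should be displayed, 0 if it is filtered out, and
 * -1 if a pattern fails to compile.  A NULL include pattern allows
 * everything; a NULL exclude pattern excludes nothing.
 */
static int stat_visible(const char *name, const char *include,
			const char *exclude)
{
	regex_t re;
	int hit;

	assert(name);
	if (include) {
		if (regcomp(&re, include, REG_EXTENDED | REG_NOSUB))
			return -1;
		hit = regexec(&re, name, 0, NULL, 0) == 0;
		regfree(&re);
		if (!hit)
			return 0;	/* not matched by the include pattern */
	}
	if (exclude) {
		if (regcomp(&re, exclude, REG_EXTENDED | REG_NOSUB))
			return -1;
		hit = regexec(&re, name, 0, NULL, 0) == 0;
		regfree(&re);
		if (hit)
			return 0;	/* matched by the exclude pattern */
	}
	return 1;
}
```

With the exclude pattern "sched_info", a hypothetical stat named "cpu0/sched_info/run_delay" is hidden while "cpu0/ttwu_count" stays visible. A real tool would compile each pattern once rather than on every call.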

Let me know if you have any questions.  Comments/feedback are welcome.

-Greg





* Re: [ANNOUNCE] sched: schedtop utility
  2008-06-02 13:07   ` Peter Zijlstra
  2008-06-02 13:20     ` Gregory Haskins
@ 2008-06-05  5:20     ` Ankita Garg
  2008-06-19 10:27       ` Peter Zijlstra
  1 sibling, 1 reply; 13+ messages in thread
From: Ankita Garg @ 2008-06-05  5:20 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Gregory Haskins, linux-kernel, Ingo Molnar, rostedt,
	suresh.b.siddha, aneesh.kumar, dhaval, vatsa, David Bahi

Hi,

On Mon, Jun 02, 2008 at 03:07:31PM +0200, Peter Zijlstra wrote:
> On Mon, 2008-06-02 at 18:18 +0530, Ankita Garg wrote:
> > Hi Gregory,
> > 
> > On Thu, May 22, 2008 at 08:06:44AM -0600, Gregory Haskins wrote:
> > > Hi all scheduler developers,
> > >   I had an itch to scratch w.r.t. watching the stats in /proc/schedstats, and it appears that the perl scripts referenced in Documentation/scheduler/sched-stats.txt do not support v14 from HEAD so I whipped up a little utility I call "schedtop".
> > >
> > 
> > Nice tool! Helps in better visualization of the data in schedstats. 
> > 
> > Using the tool, realized that most of the timing related stats therein
> > might not be completely usable in many scenarios, as might already be
> > known.
> > 
> > Without any additional load on the system, all the stats are nice and
> > sane. But, as soon as I ran my particular testcase, the data
> > pertaining to the delta of run_delay/cpu_time went haywire! I understand
> > that all the values are based on top of rq->clock, which relies on tsc that 
> > is not synced across cpus and would result in skews/incorrect values.
> > But, turns out to be not so reliable data for debugging. This is
> > ofcourse nothing related to the tool, but for schedstat in
> > general...rather just adding on to the already existing woes with non-syned 
> > tscs :-)
> 
> Thing is, things runtime should be calculated by using per cpu deltas.
> You take a stamp when you get scheduled on the cpu and another one when
> you stop running, then the delta is added to runtime.
> 
> This is always on the same cpu - when you get migrated you're stopped
> and re-scheduled so that should work out nicely.
> 
> So in that sense it shouldn't matter that the rq->clock values can get
> skewed between cpus.
>
> So I'm still a little puzzled by your observations; though it could be
> that the schedstat stuff got broken - I've never really looked too
> closely at it.
> 

Thanks Peter for the explanation...
I agree with the above, and that is the reason why I did not see weird
values with cpu_time. But run_delay would still suffer skews, as the end
points for the delta could be taken on different cpus due to migration (more
so on the RT kernel due to the push-pull operations). With the below patch,
I could not reproduce the issue I had seen earlier. After every dequeue,
we take the delta and start wait measurements from zero when moved to a
different rq.
 

Signed-off-by: Ankita Garg <ankita@in.ibm.com> 

Index: linux-2.6.24.4/kernel/sched.c
===================================================================
--- linux-2.6.24.4.orig/kernel/sched.c	2008-06-03 14:14:07.000000000 +0530
+++ linux-2.6.24.4/kernel/sched.c	2008-06-04 12:48:34.000000000 +0530
@@ -948,6 +948,7 @@
 
 static void dequeue_task(struct rq *rq, struct task_struct *p, int sleep)
 {
+	sched_info_dequeued(p);
 	p->sched_class->dequeue_task(rq, p, sleep);
 	p->se.on_rq = 0;
 }
Index: linux-2.6.24.4/kernel/sched_stats.h
===================================================================
--- linux-2.6.24.4.orig/kernel/sched_stats.h	2008-06-03 14:14:28.000000000 +0530
+++ linux-2.6.24.4/kernel/sched_stats.h	2008-06-05 10:39:39.000000000 +0530
@@ -113,6 +113,13 @@
 	if (rq)
 		rq->rq_sched_info.cpu_time += delta;
 }
+
+static inline void
+rq_sched_info_dequeued(struct rq *rq, unsigned long long delta)
+{
+	if (rq)
+		rq->rq_sched_info.run_delay += delta;
+}
 # define schedstat_inc(rq, field)	do { (rq)->field++; } while (0)
 # define schedstat_add(rq, field, amt)	do { (rq)->field += (amt); } while (0)
 # define schedstat_set(var, val)	do { var = (val); } while (0)
@@ -129,6 +136,11 @@
 #endif
 
 #if defined(CONFIG_SCHEDSTATS) || defined(CONFIG_TASK_DELAY_ACCT)
+static inline void sched_info_reset_dequeued(struct task_struct *t)
+{
+	t->sched_info.last_queued = 0;
+}
+
 /*
  * Called when a process is dequeued from the active array and given
  * the cpu.  We should note that with the exception of interactive
@@ -138,15 +150,22 @@
  * active queue, thus delaying tasks in the expired queue from running;
  * see scheduler_tick()).
  *
- * This function is only called from sched_info_arrive(), rather than
- * dequeue_task(). Even though a task may be queued and dequeued multiple
- * times as it is shuffled about, we're really interested in knowing how
- * long it was from the *first* time it was queued to the time that it
- * finally hit a cpu.
+ * Though we are interested in knowing how long it was from the *first* time a
+ * task was queued to the time that it finally hit a cpu, we call this routine
+ * from dequeue_task() to account for possible rq->clock skew across cpus. The
+ * delta taken on each cpu would annul the skew.
  */
 static inline void sched_info_dequeued(struct task_struct *t)
 {
-	t->sched_info.last_queued = 0;
+	unsigned long long now = task_rq(t)->clock, delta = 0;
+
+	if (unlikely(sched_info_on()))
+		if (t->sched_info.last_queued)
+			delta = now - t->sched_info.last_queued;
+	sched_info_reset_dequeued(t);
+	t->sched_info.run_delay += delta;
+
+	rq_sched_info_dequeued(task_rq(t), delta);
 }
 
 /*
@@ -160,7 +179,7 @@
 
 	if (t->sched_info.last_queued)
 		delta = now - t->sched_info.last_queued;
-	sched_info_dequeued(t);
+	sched_info_reset_dequeued(t);
 	t->sched_info.run_delay += delta;
 	t->sched_info.last_arrival = now;
 	t->sched_info.pcount++;

-- 
Regards,
Ankita Garg (ankita@in.ibm.com)
Linux Technology Center
IBM India Systems & Technology Labs, 
Bangalore, India   


* [ANNOUNCE] sched: schedtop utility v0.5
  2008-06-03  3:21 ` [ANNOUNCE] sched: schedtop utility v0.3 Gregory Haskins
@ 2008-06-17 12:18   ` Gregory Haskins
  0 siblings, 0 replies; 13+ messages in thread
From: Gregory Haskins @ 2008-06-17 12:18 UTC (permalink / raw)
  To: Gregory Haskins
  Cc: Ankita Garg, linux-kernel, Ingo Molnar, rostedt, Peter Zijlstra,
	suresh.b.siddha, aneesh.kumar, dhaval, vatsa, David Bahi

Gregory Haskins wrote:
>>>> On Thu, May 22, 2008 at 10:06 AM, in message <483545B4.BA47.005A.0@novell.com>,
> Gregory Haskins wrote: 
>> Hi all scheduler developers,
>>   I had an itch to scratch w.r.t. watching the stats in /proc/schedstats, 
>> and it appears that the perl scripts referenced in 
>> Documentation/scheduler/sched-stats.txt do not support v14 from HEAD so I 
>> whipped up a little utility I call "schedtop".
>>
>> This utility will process statistics from /proc/schedstat such that the 
>> busiest stats will bubble up to the top.  It can alternately be sorted by the 
>> largest stat, or by name.  Stats can be included or excluded based on reg-ex 
>> pattern matching.
>>
>> You can download the tarball here:
>>
>> ftp://ftp.novell.com/dev/ghaskins/schedtop.tar.gz
>>
>> I have also posted it to the opensuse build service for generating RPMS for 
>> a handful of 32/64-bit x86 distros for your convenience:
>>
>> http://download.opensuse.org/repositories/home:/ghaskins/
>>
>> (Note that the build is still in progress for some of the favors, so if you 
>> do not see the flavor you are looking for, check back in a little while)
>>
>> Comments/feedback/bug-fixes welcome!
>>
>> Regards
>> -Greg
> 
> Hi All,
>   I have posted an update to schedtop (v0.3) which adds /proc/<pid>/schedstats and /proc/<pid>/sched stats to the mix.
> 
> Also note that there is a comprehensive filtering mechanism built into all versions of schedtop:
> 
> "-i <REGEX>" sets the *include* pattern, and
> "-x <REGEX> sets the *exclude* pattern.
> 
> By default, -i is set to allow everything, and -x is set to exclude nothing.  A common config for me is to use "-x sched_info" since those sched_info stats seem to always be moving rapidly and can cloud stats that are more interesting (to me, anyway).
> 
> Let me know if you have any questions.  Comments/feedback are welcome
> 
> -Greg

Hi All,
  I have released a new version (v0.5) which fixes a compile error on 
some platforms (like Mandriva, RHEL, and opensuse-factory) that did not 
like the usage of operator/ in boost::filesystem::path code.  v0.5 is 
now fully supported on all of the original platforms that v0.2 worked on.




* Re: [ANNOUNCE] sched: schedtop utility
  2008-06-05  5:20     ` Ankita Garg
@ 2008-06-19 10:27       ` Peter Zijlstra
  2008-07-01  9:00         ` Ankita Garg
  0 siblings, 1 reply; 13+ messages in thread
From: Peter Zijlstra @ 2008-06-19 10:27 UTC (permalink / raw)
  To: Ankita Garg
  Cc: Gregory Haskins, linux-kernel, Ingo Molnar, rostedt,
	suresh.b.siddha, aneesh.kumar, dhaval, vatsa, David Bahi

On Thu, 2008-06-05 at 10:50 +0530, Ankita Garg wrote:

> Thanks Peter for the explanation...
>
> I agree with the above and that is the reason why I did not see weird
> values with cpu_time. But, run_delay still would suffer skews as the end
> points for delta could be taken on different cpus due to migration (more
> so on RT kernel due to the push-pull operations). With the below patch,
> I could not reproduce the issue I had seen earlier. After every dequeue,
> we take the delta and start wait measurements from zero when moved to a 
> different rq.

OK, so task delay accounting is broken because it doesn't take
migration into account.

What you've done is make it symmetric wrt enqueue, and account it like

  cpu0      cpu1

enqueue
 <wait-d1>
dequeue
            enqueue
             <wait-d2>
            run

Where you add both d1 and d2 to the run_delay, right?
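
A small worked example of that accounting, as a sketch only: the struct and helpers below are a toy model with invented names, not the kernel code, and the clock values are arbitrary ticks chosen for illustration.

```c
#include <assert.h>

/* Toy task state mirroring the two sched_info fields involved. */
struct toy_task {
	long long last_queued;  /* local clock stamp at enqueue, 0 if unset */
	long long run_delay;    /* accumulated wait time */
};

/* Stamp the task with the local clock of the cpu it is queued on. */
static void toy_enqueue(struct toy_task *t, long long local_now)
{
	assert(local_now > 0);
	t->last_queued = local_now;
}

/*
 * Close the current wait interval using the same cpu's clock.  Called
 * both when the task is dequeued for migration and when it finally runs.
 */
static void toy_end_wait(struct toy_task *t, long long local_now)
{
	if (t->last_queued)
		t->run_delay += local_now - t->last_queued;
	t->last_queued = 0;
}
```

Suppose cpu1's clock is 1000000 ticks ahead of cpu0's. The task is enqueued on cpu0 at local time 100 and dequeued at 130 (d1 = 30), then enqueued on cpu1 at local time 1000130 and runs at 1000150 (d2 = 20). Closing the interval on each cpu gives run_delay = 50, whereas a single delta from the cpu0 enqueue stamp to the cpu1 run stamp would give 1000050.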

This seems like a good fix; however, it looks like the patch will break
compilation with !CONFIG_SCHEDSTATS && !CONFIG_TASK_DELAY_ACCT, because it
fails to provide a stub for sched_info_dequeued() in that case.
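
The stub pattern in question looks roughly like this outside the kernel. TOY_STATS is a hypothetical switch standing in for the config options, and the struct and function names are invented for the sketch.

```c
#include <assert.h>

/* Define TOY_STATS to compile the accounting in; left undefined here. */

struct demo_task {
	long long last_queued;  /* accounting field */
	int on_rq;              /* 1 while the task sits on a runqueue */
};

#ifdef TOY_STATS
static inline void toy_info_dequeued(struct demo_task *t)
{
	t->last_queued = 0;
}
#else
/* Stub: keeps callers compiling when the feature is configured out. */
#define toy_info_dequeued(t)	do { } while (0)
#endif

/* The call site needs no #ifdef of its own in either configuration. */
static void demo_dequeue(struct demo_task *t)
{
	assert(t);
	toy_info_dequeued(t);
	t->on_rq = 0;
}
```

Without the #else branch, demo_dequeue() would fail to build whenever TOY_STATS is undefined, which is the failure mode described above.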


> Signed-off-by: Ankita Garg <ankita@in.ibm.com> 
> 
> Index: linux-2.6.24.4/kernel/sched.c
> ===================================================================
> --- linux-2.6.24.4.orig/kernel/sched.c	2008-06-03 14:14:07.000000000 +0530
> +++ linux-2.6.24.4/kernel/sched.c	2008-06-04 12:48:34.000000000 +0530
> @@ -948,6 +948,7 @@
>  
>  static void dequeue_task(struct rq *rq, struct task_struct *p, int sleep)
>  {
> +	sched_info_dequeued(p);
>  	p->sched_class->dequeue_task(rq, p, sleep);
>  	p->se.on_rq = 0;
>  }
> Index: linux-2.6.24.4/kernel/sched_stats.h
> ===================================================================
> --- linux-2.6.24.4.orig/kernel/sched_stats.h	2008-06-03 14:14:28.000000000 +0530
> +++ linux-2.6.24.4/kernel/sched_stats.h	2008-06-05 10:39:39.000000000 +0530
> @@ -113,6 +113,13 @@
>  	if (rq)
>  		rq->rq_sched_info.cpu_time += delta;
>  }
> +
> +static inline void
> +rq_sched_info_dequeued(struct rq *rq, unsigned long long delta)
> +{
> +	if (rq)
> +		rq->rq_sched_info.run_delay += delta;
> +}
>  # define schedstat_inc(rq, field)	do { (rq)->field++; } while (0)
>  # define schedstat_add(rq, field, amt)	do { (rq)->field += (amt); } while (0)
>  # define schedstat_set(var, val)	do { var = (val); } while (0)
> @@ -129,6 +136,11 @@
>  #endif
>  
>  #if defined(CONFIG_SCHEDSTATS) || defined(CONFIG_TASK_DELAY_ACCT)
> +static inline void sched_info_reset_dequeued(struct task_struct *t)
> +{
> +	t->sched_info.last_queued = 0;
> +}
> +
>  /*
>   * Called when a process is dequeued from the active array and given
>   * the cpu.  We should note that with the exception of interactive
> @@ -138,15 +150,22 @@
>   * active queue, thus delaying tasks in the expired queue from running;
>   * see scheduler_tick()).
>   *
> - * This function is only called from sched_info_arrive(), rather than
> - * dequeue_task(). Even though a task may be queued and dequeued multiple
> - * times as it is shuffled about, we're really interested in knowing how
> - * long it was from the *first* time it was queued to the time that it
> - * finally hit a cpu.
> + * Though we are interested in knowing how long it was from the *first* time a
> + * task was queued to the time that it finally hit a cpu, we call this routine
> + * from dequeue_task() to account for possible rq->clock skew across cpus. The
> + * delta taken on each cpu would annul the skew.
>   */
>  static inline void sched_info_dequeued(struct task_struct *t)
>  {
> -	t->sched_info.last_queued = 0;
> +	unsigned long long now = task_rq(t)->clock, delta = 0;
> +
> +	if(unlikely(sched_info_on()))
> +		if(t->sched_info.last_queued)
> +				delta = now - t->sched_info.last_queued;
> +	sched_info_reset_dequeued(t);
> +	t->sched_info.run_delay += delta;
> +
> +	rq_sched_info_dequeued(task_rq(t), delta);
>  }
>  
>  /*
> @@ -160,7 +179,7 @@
>  
>  	if (t->sched_info.last_queued)
>  		delta = now - t->sched_info.last_queued;
> -	sched_info_dequeued(t);
> +	sched_info_reset_dequeued(t);
>  	t->sched_info.run_delay += delta;
>  	t->sched_info.last_arrival = now;
>  	t->sched_info.pcount++;
> 



* Re: [ANNOUNCE] sched: schedtop utility
  2008-06-19 10:27       ` Peter Zijlstra
@ 2008-07-01  9:00         ` Ankita Garg
  2008-07-03  8:34           ` Ingo Molnar
  0 siblings, 1 reply; 13+ messages in thread
From: Ankita Garg @ 2008-07-01  9:00 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Gregory Haskins, linux-kernel, Ingo Molnar, rostedt,
	suresh.b.siddha, aneesh.kumar, dhaval, vatsa, David Bahi

Hi Peter,

On Thu, Jun 19, 2008 at 12:27:14PM +0200, Peter Zijlstra wrote:
> On Thu, 2008-06-05 at 10:50 +0530, Ankita Garg wrote:
> 
> > Thanks Peter for the explanation...
> >
> > I agree with the above and that is the reason why I did not see weird
> > values with cpu_time. But, run_delay still would suffer skews as the end
> > points for delta could be taken on different cpus due to migration (more
> > so on RT kernel due to the push-pull operations). With the below patch,
> > I could not reproduce the issue I had seen earlier. After every dequeue,
> > we take the delta and start wait measurements from zero when moved to a 
> > different rq.
> 
> OK, so task delay delay accounting is broken because it doesn't take
> migration into account.
> 
> What you've done is make it symmetric wrt enqueue, and account it like
> 
>   cpu0      cpu1
> 
> enqueue
>  <wait-d1>
> dequeue
>             enqueue
>              <wait-d2>
>             run
> 
> Where you add both d1 and d2 to the run_delay,.. right?
>

Thanks for reviewing the patch. The above is exactly what I have done.

> This seems like a good fix, however it looks like the patch will break
> compilation in !CONFIG_SCHEDSTATS && !CONFIG_TASK_DELAY_ACCT, of it
> failing to provide a stub for sched_info_dequeue() in that case.
>

Fixed. Please find the new patch below.
 
Signed-off-by: Ankita Garg <ankita@in.ibm.com> 

Index: linux-2.6.24.4/kernel/sched.c
===================================================================
--- linux-2.6.24.4.orig/kernel/sched.c	2008-06-05 13:31:53.000000000 +0530
+++ linux-2.6.24.4/kernel/sched.c	2008-07-01 14:03:58.000000000 +0530
@@ -948,6 +948,7 @@
 
 static void dequeue_task(struct rq *rq, struct task_struct *p, int sleep)
 {
+	sched_info_dequeued(p);
 	p->sched_class->dequeue_task(rq, p, sleep);
 	p->se.on_rq = 0;
 }
Index: linux-2.6.24.4/kernel/sched_stats.h
===================================================================
--- linux-2.6.24.4.orig/kernel/sched_stats.h	2008-06-05 13:31:53.000000000 +0530
+++ linux-2.6.24.4/kernel/sched_stats.h	2008-07-01 14:23:32.000000000 +0530
@@ -113,6 +113,13 @@
 	if (rq)
 		rq->rq_sched_info.cpu_time += delta;
 }
+
+static inline void
+rq_sched_info_dequeued(struct rq *rq, unsigned long long delta)
+{
+	if (rq)
+		rq->rq_sched_info.run_delay += delta;
+}
 # define schedstat_inc(rq, field)	do { (rq)->field++; } while (0)
 # define schedstat_add(rq, field, amt)	do { (rq)->field += (amt); } while (0)
 # define schedstat_set(var, val)	do { var = (val); } while (0)
@@ -121,6 +128,9 @@
 rq_sched_info_arrive(struct rq *rq, unsigned long long delta)
 {}
 static inline void
+rq_sched_info_dequeued(struct rq *rq, unsigned long long delta)
+{}
+static inline void
 rq_sched_info_depart(struct rq *rq, unsigned long long delta)
 {}
 # define schedstat_inc(rq, field)	do { } while (0)
@@ -129,6 +139,11 @@
 #endif
 
 #if defined(CONFIG_SCHEDSTATS) || defined(CONFIG_TASK_DELAY_ACCT)
+static inline void sched_info_reset_dequeued(struct task_struct *t)
+{
+	t->sched_info.last_queued = 0;
+}
+
 /*
  * Called when a process is dequeued from the active array and given
  * the cpu.  We should note that with the exception of interactive
@@ -138,15 +153,22 @@
  * active queue, thus delaying tasks in the expired queue from running;
  * see scheduler_tick()).
  *
- * This function is only called from sched_info_arrive(), rather than
- * dequeue_task(). Even though a task may be queued and dequeued multiple
- * times as it is shuffled about, we're really interested in knowing how
- * long it was from the *first* time it was queued to the time that it
- * finally hit a cpu.
+ * Though we are interested in knowing how long it was from the *first* time a
+ * task was queued to the time that it finally hit a cpu, we call this routine
+ * from dequeue_task() to account for possible rq->clock skew across cpus. The
+ * delta taken on each cpu would annul the skew.
  */
 static inline void sched_info_dequeued(struct task_struct *t)
 {
-	t->sched_info.last_queued = 0;
+	unsigned long long now = task_rq(t)->clock, delta = 0;
+
+	if (unlikely(sched_info_on()))
+		if (t->sched_info.last_queued)
+			delta = now - t->sched_info.last_queued;
+	sched_info_reset_dequeued(t);
+	t->sched_info.run_delay += delta;
+
+	rq_sched_info_dequeued(task_rq(t), delta);
 }
 
 /*
@@ -160,7 +182,7 @@
 
 	if (t->sched_info.last_queued)
 		delta = now - t->sched_info.last_queued;
-	sched_info_dequeued(t);
+	sched_info_reset_dequeued(t);
 	t->sched_info.run_delay += delta;
 	t->sched_info.last_arrival = now;
 	t->sched_info.pcount++;
@@ -231,7 +253,9 @@
 		__sched_info_switch(prev, next);
 }
 #else
-#define sched_info_queued(t)		do { } while (0)
-#define sched_info_switch(t, next)	do { } while (0)
+#define sched_info_queued(t)			do { } while (0)
+#define sched_info_reset_dequeued(t)	do { } while (0)
+#define sched_info_dequeued(t)			do { } while (0)
+#define sched_info_switch(t, next)		do { } while (0)
 #endif /* CONFIG_SCHEDSTATS || CONFIG_TASK_DELAY_ACCT */
 

-- 
Regards,
Ankita Garg (ankita@in.ibm.com)
Linux Technology Center
IBM India Systems & Technology Labs, 
Bangalore, India   


* Re: [ANNOUNCE] sched: schedtop utility
  2008-07-01  9:00         ` Ankita Garg
@ 2008-07-03  8:34           ` Ingo Molnar
  2008-07-03 20:56             ` Peter Zijlstra
  2008-10-21 16:29             ` Gregory Haskins
  0 siblings, 2 replies; 13+ messages in thread
From: Ingo Molnar @ 2008-07-03  8:34 UTC (permalink / raw)
  To: Ankita Garg
  Cc: Peter Zijlstra, Gregory Haskins, linux-kernel, rostedt,
	suresh.b.siddha, aneesh.kumar, dhaval, vatsa, David Bahi


* Ankita Garg <ankita@in.ibm.com> wrote:

> > OK, so task delay delay accounting is broken because it doesn't take
> > migration into account.

> Fixed. Pl. find the new patch below.

applied to tip/sched/devel - thanks Ankita.

	Ingo


* Re: [ANNOUNCE] sched: schedtop utility
  2008-07-03  8:34           ` Ingo Molnar
@ 2008-07-03 20:56             ` Peter Zijlstra
  2008-10-21 16:29             ` Gregory Haskins
  1 sibling, 0 replies; 13+ messages in thread
From: Peter Zijlstra @ 2008-07-03 20:56 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Ankita Garg, Gregory Haskins, linux-kernel, rostedt,
	suresh.b.siddha, aneesh.kumar, dhaval, vatsa, David Bahi

On Thu, 2008-07-03 at 10:34 +0200, Ingo Molnar wrote:
> * Ankita Garg <ankita@in.ibm.com> wrote:
> 
> > > OK, so task delay delay accounting is broken because it doesn't take
> > > migration into account.
> 
> > Fixed. Pl. find the new patch below.
> 
> applied to tip/sched/devel - thanks Ankita.

ACK



* Re: [ANNOUNCE] sched: schedtop utility
  2008-07-03  8:34           ` Ingo Molnar
  2008-07-03 20:56             ` Peter Zijlstra
@ 2008-10-21 16:29             ` Gregory Haskins
  1 sibling, 0 replies; 13+ messages in thread
From: Gregory Haskins @ 2008-10-21 16:29 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Ankita Garg, Peter Zijlstra, Gregory Haskins, linux-kernel,
	rostedt, suresh.b.siddha, aneesh.kumar, dhaval, vatsa, David Bahi

I finally put up a wiki page for my schedtop utility:

http://rt.wiki.kernel.org/index.php/Schedtop_utility

Hopefully someone out there besides myself finds this tool useful!

Regards,
-Greg


end of thread, other threads:[~2008-10-21 16:25 UTC | newest]

Thread overview: 13+ messages
2008-05-22 14:06 [ANNOUNCE] sched: schedtop utility Gregory Haskins
2008-05-22 14:33 ` Steven Rostedt
2008-06-02 12:48 ` Ankita Garg
2008-06-02 13:07   ` Peter Zijlstra
2008-06-02 13:20     ` Gregory Haskins
2008-06-05  5:20     ` Ankita Garg
2008-06-19 10:27       ` Peter Zijlstra
2008-07-01  9:00         ` Ankita Garg
2008-07-03  8:34           ` Ingo Molnar
2008-07-03 20:56             ` Peter Zijlstra
2008-10-21 16:29             ` Gregory Haskins
2008-06-03  3:21 ` [ANNOUNCE] sched: schedtop utility v0.3 Gregory Haskins
2008-06-17 12:18   ` [ANNOUNCE] sched: schedtop utility v0.5 Gregory Haskins
