* [ANNOUNCE] sched: schedtop utility
@ 2008-05-22 14:06 Gregory Haskins
2008-05-22 14:33 ` Steven Rostedt
` (2 more replies)
0 siblings, 3 replies; 13+ messages in thread
From: Gregory Haskins @ 2008-05-22 14:06 UTC (permalink / raw)
To: linux-kernel
Cc: Ingo Molnar, rostedt, Peter Zijlstra, suresh.b.siddha,
aneesh.kumar, dhaval, vatsa, David Bahi
Hi all scheduler developers,
I had an itch to scratch w.r.t. watching the stats in /proc/schedstats, and it appears that the perl scripts referenced in Documentation/scheduler/sched-stats.txt do not support v14 from HEAD so I whipped up a little utility I call "schedtop".
This utility will process statistics from /proc/schedstat such that the busiest stats will bubble up to the top. It can alternately be sorted by the largest stat, or by name. Stats can be included or excluded based on reg-ex pattern matching.
You can download the tarball here:
ftp://ftp.novell.com/dev/ghaskins/schedtop.tar.gz
I have also posted it to the opensuse build service for generating RPMS for a handful of 32/64-bit x86 distros for your convenience:
http://download.opensuse.org/repositories/home:/ghaskins/
(Note that the build is still in progress for some of the favors, so if you do not see the flavor you are looking for, check back in a little while)
Comments/feedback/bug-fixes welcome!
Regards
-Greg
^ permalink raw reply [flat|nested] 13+ messages in thread* Re: [ANNOUNCE] sched: schedtop utility 2008-05-22 14:06 [ANNOUNCE] sched: schedtop utility Gregory Haskins @ 2008-05-22 14:33 ` Steven Rostedt 2008-06-02 12:48 ` Ankita Garg 2008-06-03 3:21 ` [ANNOUNCE] sched: schedtop utility v0.3 Gregory Haskins 2 siblings, 0 replies; 13+ messages in thread From: Steven Rostedt @ 2008-05-22 14:33 UTC (permalink / raw) To: Gregory Haskins Cc: linux-kernel, Ingo Molnar, Peter Zijlstra, suresh.b.siddha, aneesh.kumar, dhaval, vatsa, David Bahi On Thu, 22 May 2008, Gregory Haskins wrote: > Hi all scheduler developers, > I had an itch to scratch w.r.t. watching the stats in > /proc/schedstats, and it appears that the perl scripts referenced in > Documentation/scheduler/sched-stats.txt do not support v14 from HEAD so > I whipped up a little utility I call "schedtop". Greg, Cool, I'll have to take a look when things get a little slower for me. But when I get a chance, I'll make my comments then. -- Steve ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [ANNOUNCE] sched: schedtop utility 2008-05-22 14:06 [ANNOUNCE] sched: schedtop utility Gregory Haskins 2008-05-22 14:33 ` Steven Rostedt @ 2008-06-02 12:48 ` Ankita Garg 2008-06-02 13:07 ` Peter Zijlstra 2008-06-03 3:21 ` [ANNOUNCE] sched: schedtop utility v0.3 Gregory Haskins 2 siblings, 1 reply; 13+ messages in thread From: Ankita Garg @ 2008-06-02 12:48 UTC (permalink / raw) To: Gregory Haskins Cc: linux-kernel, Ingo Molnar, rostedt, Peter Zijlstra, suresh.b.siddha, aneesh.kumar, dhaval, vatsa, David Bahi Hi Gregory, On Thu, May 22, 2008 at 08:06:44AM -0600, Gregory Haskins wrote: > Hi all scheduler developers, > I had an itch to scratch w.r.t. watching the stats in /proc/schedstats, and it appears that the perl scripts referenced in Documentation/scheduler/sched-stats.txt do not support v14 from HEAD so I whipped up a little utility I call "schedtop". > Nice tool! Helps in better visualization of the data in schedstats. Using the tool, realized that most of the timing related stats therein might not be completely usable in many scenarios, as might already be known. Without any additional load on the system, all the stats are nice and sane. But, as soon as I ran my particular testcase, the data pertaining to the delta of run_delay/cpu_time went haywire! I understand that all the values are based on top of rq->clock, which relies on tsc that is not synced across cpus and would result in skews/incorrect values. But, turns out to be not so reliable data for debugging. This is ofcourse nothing related to the tool, but for schedstat in general...rather just adding on to the already existing woes with non-syned tscs :-) > This utility will process statistics from /proc/schedstat such that the busiest stats will bubble up to the top. It can alternately be sorted by the largest stat, or by name. Stats can be included or excluded based on reg-ex pattern matching. > > You can download the tarball here: > > ftp://ftp.novell.com/dev/ghaskins/schedtop.tar.gz > > I have also posted it to the opensuse build service for generating RPMS for a handful of 32/64-bit x86 distros for your convenience: > > http://download.opensuse.org/repositories/home:/ghaskins/ > > (Note that the build is still in progress for some of the favors, so if you do not see the flavor you are looking for, check back in a little while) > > Comments/feedback/bug-fixes welcome! Maybe support for displaying process specific schedstat info might help? We already do capture it under /proc/<pid>/schedstat. So, a -p switch for pid maybe? Reason for this is when looking at the entire system stats, I might not be able to get lot of information specific to my particular application...like when a particular thread suffers a lot of delay waiting for the runqueue... just a thought! > > Regards > -Greg > -- Regards, Ankita Garg (ankita@in.ibm.com) Linux Technology Center IBM India Systems & Technology Labs, Bangalore, India ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [ANNOUNCE] sched: schedtop utility 2008-06-02 12:48 ` Ankita Garg @ 2008-06-02 13:07 ` Peter Zijlstra 2008-06-02 13:20 ` Gregory Haskins 2008-06-05 5:20 ` Ankita Garg 0 siblings, 2 replies; 13+ messages in thread From: Peter Zijlstra @ 2008-06-02 13:07 UTC (permalink / raw) To: Ankita Garg Cc: Gregory Haskins, linux-kernel, Ingo Molnar, rostedt, suresh.b.siddha, aneesh.kumar, dhaval, vatsa, David Bahi On Mon, 2008-06-02 at 18:18 +0530, Ankita Garg wrote: > Hi Gregory, > > On Thu, May 22, 2008 at 08:06:44AM -0600, Gregory Haskins wrote: > > Hi all scheduler developers, > > I had an itch to scratch w.r.t. watching the stats in /proc/schedstats, and it appears that the perl scripts referenced in Documentation/scheduler/sched-stats.txt do not support v14 from HEAD so I whipped up a little utility I call "schedtop". > > > > Nice tool! Helps in better visualization of the data in schedstats. > > Using the tool, realized that most of the timing related stats therein > might not be completely usable in many scenarios, as might already be > known. > > Without any additional load on the system, all the stats are nice and > sane. But, as soon as I ran my particular testcase, the data > pertaining to the delta of run_delay/cpu_time went haywire! I understand > that all the values are based on top of rq->clock, which relies on tsc that > is not synced across cpus and would result in skews/incorrect values. > But, turns out to be not so reliable data for debugging. This is > ofcourse nothing related to the tool, but for schedstat in > general...rather just adding on to the already existing woes with non-syned > tscs :-) Thing is, things runtime should be calculated by using per cpu deltas. You take a stamp when you get scheduled on the cpu and another one when you stop running, then the delta is added to runtime. This is always on the same cpu - when you get migrated you're stopped and re-scheduled so that should work out nicely. So in that sense it shouldn't matter that the rq->clock values can get skewed between cpus. So I'm still a little puzzled by your observations; though it could be that the schedstat stuff got broken - I've never really looked too closely at it. ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [ANNOUNCE] sched: schedtop utility 2008-06-02 13:07 ` Peter Zijlstra @ 2008-06-02 13:20 ` Gregory Haskins 2008-06-05 5:20 ` Ankita Garg 1 sibling, 0 replies; 13+ messages in thread From: Gregory Haskins @ 2008-06-02 13:20 UTC (permalink / raw) To: Ankita Garg, Peter Zijlstra Cc: Ingo Molnar, rostedt, suresh.b.siddha, aneesh.kumar, dhaval, vatsa, David Bahi, linux-kernel Hi Ankita, For some reason, I didn't get your original email. I had to go find it on the lkml.org archives. But anyway, see inline >>> On Mon, Jun 2, 2008 at 9:07 AM, in message <1212412051.6269.5.camel@lappy.programming.kicks-ass.net>, Peter Zijlstra <peterz@infradead.org> wrote: > On Mon, 2008-06-02 at 18:18 +0530, Ankita Garg wrote: >> Hi Gregory, >> >> On Thu, May 22, 2008 at 08:06:44AM -0600, Gregory Haskins wrote: >> > Hi all scheduler developers, >> > I had an itch to scratch w.r.t. watching the stats in /proc/schedstats, > and it appears that the perl scripts referenced in > Documentation/scheduler/sched-stats.txt do not support v14 from HEAD so I > whipped up a little utility I call "schedtop". >> > >> >> Nice tool! Helps in better visualization of the data in schedstats. >> >> Using the tool, realized that most of the timing related stats therein >> might not be completely usable in many scenarios, as might already be >> known. >> >> Without any additional load on the system, all the stats are nice and >> sane. But, as soon as I ran my particular testcase, the data >> pertaining to the delta of run_delay/cpu_time went haywire! I understand >> that all the values are based on top of rq->clock, which relies on tsc that >> is not synced across cpus and would result in skews/incorrect values. >> But, turns out to be not so reliable data for debugging. This is >> ofcourse nothing related to the tool, but for schedstat in >> general...rather just adding on to the already existing woes with non-syned >> tscs :-) > > Thing is, things runtime should be calculated by using per cpu deltas. > You take a stamp when you get scheduled on the cpu and another one when > you stop running, then the delta is added to runtime. > > This is always on the same cpu - when you get migrated you're stopped > and re-scheduled so that should work out nicely. > > So in that sense it shouldn't matter that the rq->clock values can get > skewed between cpus. > > So I'm still a little puzzled by your observations; though it could be > that the schedstat stuff got broken - I've never really looked too > closely at it. I suspect we must be talking about those stats that are always moving pretty fast. I see that too, and I use the (potentially unknown) filtering feature of schedtop: "-i REGEX" will set the include filter, and "-x REGEX" will set the exclude filter. By default, include allows everything, and exclude filters nothing. Changing it to "-x sched_info" will exclude all those pesky stats that move fast and do not convey useful (to me, anyway) data. I hope that helps! Also, about your idea for the /proc/<pid>/schedstats, I was thinking the same thing while on my trip on Friday. I will add this feature. Thanks! -Greg ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [ANNOUNCE] sched: schedtop utility 2008-06-02 13:07 ` Peter Zijlstra 2008-06-02 13:20 ` Gregory Haskins @ 2008-06-05 5:20 ` Ankita Garg 2008-06-19 10:27 ` Peter Zijlstra 1 sibling, 1 reply; 13+ messages in thread From: Ankita Garg @ 2008-06-05 5:20 UTC (permalink / raw) To: Peter Zijlstra Cc: Gregory Haskins, linux-kernel, Ingo Molnar, rostedt, suresh.b.siddha, aneesh.kumar, dhaval, vatsa, David Bahi Hi, On Mon, Jun 02, 2008 at 03:07:31PM +0200, Peter Zijlstra wrote: > On Mon, 2008-06-02 at 18:18 +0530, Ankita Garg wrote: > > Hi Gregory, > > > > On Thu, May 22, 2008 at 08:06:44AM -0600, Gregory Haskins wrote: > > > Hi all scheduler developers, > > > I had an itch to scratch w.r.t. watching the stats in /proc/schedstats, and it appears that the perl scripts referenced in Documentation/scheduler/sched-stats.txt do not support v14 from HEAD so I whipped up a little utility I call "schedtop". > > > > > > > Nice tool! Helps in better visualization of the data in schedstats. > > > > Using the tool, realized that most of the timing related stats therein > > might not be completely usable in many scenarios, as might already be > > known. > > > > Without any additional load on the system, all the stats are nice and > > sane. But, as soon as I ran my particular testcase, the data > > pertaining to the delta of run_delay/cpu_time went haywire! I understand > > that all the values are based on top of rq->clock, which relies on tsc that > > is not synced across cpus and would result in skews/incorrect values. > > But, turns out to be not so reliable data for debugging. This is > > ofcourse nothing related to the tool, but for schedstat in > > general...rather just adding on to the already existing woes with non-syned > > tscs :-) > > Thing is, things runtime should be calculated by using per cpu deltas. > You take a stamp when you get scheduled on the cpu and another one when > you stop running, then the delta is added to runtime. > > This is always on the same cpu - when you get migrated you're stopped > and re-scheduled so that should work out nicely. > > So in that sense it shouldn't matter that the rq->clock values can get > skewed between cpus. > > So I'm still a little puzzled by your observations; though it could be > that the schedstat stuff got broken - I've never really looked too > closely at it. > Thanks Peter for the explanation... I agree with the above and that is the reason why I did not see weird values with cpu_time. But, run_delay still would suffer skews as the end points for delta could be taken on different cpus due to migration (more so on RT kernel due to the push-pull operations). With the below patch, I could not reproduce the issue I had seen earlier. After every dequeue, we take the delta and start wait measurements from zero when moved to a different rq. Signed-off-by: Ankita Garg <ankita@in.ibm.com> Index: linux-2.6.24.4/kernel/sched.c =================================================================== --- linux-2.6.24.4.orig/kernel/sched.c 2008-06-03 14:14:07.000000000 +0530 +++ linux-2.6.24.4/kernel/sched.c 2008-06-04 12:48:34.000000000 +0530 @@ -948,6 +948,7 @@ static void dequeue_task(struct rq *rq, struct task_struct *p, int sleep) { + sched_info_dequeued(p); p->sched_class->dequeue_task(rq, p, sleep); p->se.on_rq = 0; } Index: linux-2.6.24.4/kernel/sched_stats.h =================================================================== --- linux-2.6.24.4.orig/kernel/sched_stats.h 2008-06-03 14:14:28.000000000 +0530 +++ linux-2.6.24.4/kernel/sched_stats.h 2008-06-05 10:39:39.000000000 +0530 @@ -113,6 +113,13 @@ if (rq) rq->rq_sched_info.cpu_time += delta; } + +static inline void +rq_sched_info_dequeued(struct rq *rq, unsigned long long delta) +{ + if (rq) + rq->rq_sched_info.run_delay += delta; +} # define schedstat_inc(rq, field) do { (rq)->field++; } while (0) # define schedstat_add(rq, field, amt) do { (rq)->field += (amt); } while (0) # define schedstat_set(var, val) do { var = (val); } while (0) @@ -129,6 +136,11 @@ #endif #if defined(CONFIG_SCHEDSTATS) || defined(CONFIG_TASK_DELAY_ACCT) +static inline void sched_info_reset_dequeued(struct task_struct *t) +{ + t->sched_info.last_queued = 0; +} + /* * Called when a process is dequeued from the active array and given * the cpu. We should note that with the exception of interactive @@ -138,15 +150,22 @@ * active queue, thus delaying tasks in the expired queue from running; * see scheduler_tick()). * - * This function is only called from sched_info_arrive(), rather than - * dequeue_task(). Even though a task may be queued and dequeued multiple - * times as it is shuffled about, we're really interested in knowing how - * long it was from the *first* time it was queued to the time that it - * finally hit a cpu. + * Though we are interested in knowing how long it was from the *first* time a + * task was queued to the time that it finally hit a cpu, we call this routine + * from dequeue_task() to account for possible rq->clock skew across cpus. The + * delta taken on each cpu would annul the skew. */ static inline void sched_info_dequeued(struct task_struct *t) { - t->sched_info.last_queued = 0; + unsigned long long now = task_rq(t)->clock, delta = 0; + + if(unlikely(sched_info_on())) + if(t->sched_info.last_queued) + delta = now - t->sched_info.last_queued; + sched_info_reset_dequeued(t); + t->sched_info.run_delay += delta; + + rq_sched_info_dequeued(task_rq(t), delta); } /* @@ -160,7 +179,7 @@ if (t->sched_info.last_queued) delta = now - t->sched_info.last_queued; - sched_info_dequeued(t); + sched_info_reset_dequeued(t); t->sched_info.run_delay += delta; t->sched_info.last_arrival = now; t->sched_info.pcount++; -- Regards, Ankita Garg (ankita@in.ibm.com) Linux Technology Center IBM India Systems & Technology Labs, Bangalore, India ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [ANNOUNCE] sched: schedtop utility 2008-06-05 5:20 ` Ankita Garg @ 2008-06-19 10:27 ` Peter Zijlstra 2008-07-01 9:00 ` Ankita Garg 0 siblings, 1 reply; 13+ messages in thread From: Peter Zijlstra @ 2008-06-19 10:27 UTC (permalink / raw) To: Ankita Garg Cc: Gregory Haskins, linux-kernel, Ingo Molnar, rostedt, suresh.b.siddha, aneesh.kumar, dhaval, vatsa, David Bahi On Thu, 2008-06-05 at 10:50 +0530, Ankita Garg wrote: > Thanks Peter for the explanation... > > I agree with the above and that is the reason why I did not see weird > values with cpu_time. But, run_delay still would suffer skews as the end > points for delta could be taken on different cpus due to migration (more > so on RT kernel due to the push-pull operations). With the below patch, > I could not reproduce the issue I had seen earlier. After every dequeue, > we take the delta and start wait measurements from zero when moved to a > different rq. OK, so task delay delay accounting is broken because it doesn't take migration into account. What you've done is make it symmetric wrt enqueue, and account it like cpu0 cpu1 enqueue <wait-d1> dequeue enqueue <wait-d2> run Where you add both d1 and d2 to the run_delay,.. right? This seems like a good fix, however it looks like the patch will break compilation in !CONFIG_SCHEDSTATS && !CONFIG_TASK_DELAY_ACCT, of it failing to provide a stub for sched_info_dequeue() in that case. > Signed-off-by: Ankita Garg <ankita@in.ibm.com> > > Index: linux-2.6.24.4/kernel/sched.c > =================================================================== > --- linux-2.6.24.4.orig/kernel/sched.c 2008-06-03 14:14:07.000000000 +0530 > +++ linux-2.6.24.4/kernel/sched.c 2008-06-04 12:48:34.000000000 +0530 > @@ -948,6 +948,7 @@ > > static void dequeue_task(struct rq *rq, struct task_struct *p, int sleep) > { > + sched_info_dequeued(p); > p->sched_class->dequeue_task(rq, p, sleep); > p->se.on_rq = 0; > } > Index: linux-2.6.24.4/kernel/sched_stats.h > =================================================================== > --- linux-2.6.24.4.orig/kernel/sched_stats.h 2008-06-03 14:14:28.000000000 +0530 > +++ linux-2.6.24.4/kernel/sched_stats.h 2008-06-05 10:39:39.000000000 +0530 > @@ -113,6 +113,13 @@ > if (rq) > rq->rq_sched_info.cpu_time += delta; > } > + > +static inline void > +rq_sched_info_dequeued(struct rq *rq, unsigned long long delta) > +{ > + if (rq) > + rq->rq_sched_info.run_delay += delta; > +} > # define schedstat_inc(rq, field) do { (rq)->field++; } while (0) > # define schedstat_add(rq, field, amt) do { (rq)->field += (amt); } while (0) > # define schedstat_set(var, val) do { var = (val); } while (0) > @@ -129,6 +136,11 @@ > #endif > > #if defined(CONFIG_SCHEDSTATS) || defined(CONFIG_TASK_DELAY_ACCT) > +static inline void sched_info_reset_dequeued(struct task_struct *t) > +{ > + t->sched_info.last_queued = 0; > +} > + > /* > * Called when a process is dequeued from the active array and given > * the cpu. We should note that with the exception of interactive > @@ -138,15 +150,22 @@ > * active queue, thus delaying tasks in the expired queue from running; > * see scheduler_tick()). > * > - * This function is only called from sched_info_arrive(), rather than > - * dequeue_task(). Even though a task may be queued and dequeued multiple > - * times as it is shuffled about, we're really interested in knowing how > - * long it was from the *first* time it was queued to the time that it > - * finally hit a cpu. > + * Though we are interested in knowing how long it was from the *first* time a > + * task was queued to the time that it finally hit a cpu, we call this routine > + * from dequeue_task() to account for possible rq->clock skew across cpus. The > + * delta taken on each cpu would annul the skew. > */ > static inline void sched_info_dequeued(struct task_struct *t) > { > - t->sched_info.last_queued = 0; > + unsigned long long now = task_rq(t)->clock, delta = 0; > + > + if(unlikely(sched_info_on())) > + if(t->sched_info.last_queued) > + delta = now - t->sched_info.last_queued; > + sched_info_reset_dequeued(t); > + t->sched_info.run_delay += delta; > + > + rq_sched_info_dequeued(task_rq(t), delta); > } > > /* > @@ -160,7 +179,7 @@ > > if (t->sched_info.last_queued) > delta = now - t->sched_info.last_queued; > - sched_info_dequeued(t); > + sched_info_reset_dequeued(t); > t->sched_info.run_delay += delta; > t->sched_info.last_arrival = now; > t->sched_info.pcount++; > ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [ANNOUNCE] sched: schedtop utility 2008-06-19 10:27 ` Peter Zijlstra @ 2008-07-01 9:00 ` Ankita Garg 2008-07-03 8:34 ` Ingo Molnar 0 siblings, 1 reply; 13+ messages in thread From: Ankita Garg @ 2008-07-01 9:00 UTC (permalink / raw) To: Peter Zijlstra Cc: Gregory Haskins, linux-kernel, Ingo Molnar, rostedt, suresh.b.siddha, aneesh.kumar, dhaval, vatsa, David Bahi Hi Peter, On Thu, Jun 19, 2008 at 12:27:14PM +0200, Peter Zijlstra wrote: > On Thu, 2008-06-05 at 10:50 +0530, Ankita Garg wrote: > > > Thanks Peter for the explanation... > > > > I agree with the above and that is the reason why I did not see weird > > values with cpu_time. But, run_delay still would suffer skews as the end > > points for delta could be taken on different cpus due to migration (more > > so on RT kernel due to the push-pull operations). With the below patch, > > I could not reproduce the issue I had seen earlier. After every dequeue, > > we take the delta and start wait measurements from zero when moved to a > > different rq. > > OK, so task delay delay accounting is broken because it doesn't take > migration into account. > > What you've done is make it symmetric wrt enqueue, and account it like > > cpu0 cpu1 > > enqueue > <wait-d1> > dequeue > enqueue > <wait-d2> > run > > Where you add both d1 and d2 to the run_delay,.. right? > Thanks for reviewing the patch. The above is exactly what I have done. > This seems like a good fix, however it looks like the patch will break > compilation in !CONFIG_SCHEDSTATS && !CONFIG_TASK_DELAY_ACCT, of it > failing to provide a stub for sched_info_dequeue() in that case. > Fixed. Pl. find the new patch below. Signed-off-by: Ankita Garg <ankita@in.ibm.com> Index: linux-2.6.24.4/kernel/sched.c =================================================================== --- linux-2.6.24.4.orig/kernel/sched.c 2008-06-05 13:31:53.000000000 +0530 +++ linux-2.6.24.4/kernel/sched.c 2008-07-01 14:03:58.000000000 +0530 @@ -948,6 +948,7 @@ static void dequeue_task(struct rq *rq, struct task_struct *p, int sleep) { + sched_info_dequeued(p); p->sched_class->dequeue_task(rq, p, sleep); p->se.on_rq = 0; } Index: linux-2.6.24.4/kernel/sched_stats.h =================================================================== --- linux-2.6.24.4.orig/kernel/sched_stats.h 2008-06-05 13:31:53.000000000 +0530 +++ linux-2.6.24.4/kernel/sched_stats.h 2008-07-01 14:23:32.000000000 +0530 @@ -113,6 +113,13 @@ if (rq) rq->rq_sched_info.cpu_time += delta; } + +static inline void +rq_sched_info_dequeued(struct rq *rq, unsigned long long delta) +{ + if (rq) + rq->rq_sched_info.run_delay += delta; +} # define schedstat_inc(rq, field) do { (rq)->field++; } while (0) # define schedstat_add(rq, field, amt) do { (rq)->field += (amt); } while (0) # define schedstat_set(var, val) do { var = (val); } while (0) @@ -121,6 +128,9 @@ rq_sched_info_arrive(struct rq *rq, unsigned long long delta) {} static inline void +rq_sched_info_dequeued(struct rq *rq, unsigned long long delta) +{} +static inline void rq_sched_info_depart(struct rq *rq, unsigned long long delta) {} # define schedstat_inc(rq, field) do { } while (0) @@ -129,6 +139,11 @@ #endif #if defined(CONFIG_SCHEDSTATS) || defined(CONFIG_TASK_DELAY_ACCT) +static inline void sched_info_reset_dequeued(struct task_struct *t) +{ + t->sched_info.last_queued = 0; +} + /* * Called when a process is dequeued from the active array and given * the cpu. We should note that with the exception of interactive @@ -138,15 +153,22 @@ * active queue, thus delaying tasks in the expired queue from running; * see scheduler_tick()). * - * This function is only called from sched_info_arrive(), rather than - * dequeue_task(). Even though a task may be queued and dequeued multiple - * times as it is shuffled about, we're really interested in knowing how - * long it was from the *first* time it was queued to the time that it - * finally hit a cpu. + * Though we are interested in knowing how long it was from the *first* time a + * task was queued to the time that it finally hit a cpu, we call this routine + * from dequeue_task() to account for possible rq->clock skew across cpus. The + * delta taken on each cpu would annul the skew. */ static inline void sched_info_dequeued(struct task_struct *t) { - t->sched_info.last_queued = 0; + unsigned long long now = task_rq(t)->clock, delta = 0; + + if(unlikely(sched_info_on())) + if(t->sched_info.last_queued) + delta = now - t->sched_info.last_queued; + sched_info_reset_dequeued(t); + t->sched_info.run_delay += delta; + + rq_sched_info_dequeued(task_rq(t), delta); } /* @@ -160,7 +182,7 @@ if (t->sched_info.last_queued) delta = now - t->sched_info.last_queued; - sched_info_dequeued(t); + sched_info_reset_dequeued(t); t->sched_info.run_delay += delta; t->sched_info.last_arrival = now; t->sched_info.pcount++; @@ -231,7 +253,9 @@ __sched_info_switch(prev, next); } #else -#define sched_info_queued(t) do { } while (0) -#define sched_info_switch(t, next) do { } while (0) +#define sched_info_queued(t) do { } while (0) +#define sched_info_reset_dequeued(t) do { } while (0) +#define sched_info_dequeued(t) do { } while (0) +#define sched_info_switch(t, next) do { } while (0) #endif /* CONFIG_SCHEDSTATS || CONFIG_TASK_DELAY_ACCT */ -- Regards, Ankita Garg (ankita@in.ibm.com) Linux Technology Center IBM India Systems & Technology Labs, Bangalore, India ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [ANNOUNCE] sched: schedtop utility 2008-07-01 9:00 ` Ankita Garg @ 2008-07-03 8:34 ` Ingo Molnar 2008-07-03 20:56 ` Peter Zijlstra 2008-10-21 16:29 ` Gregory Haskins 0 siblings, 2 replies; 13+ messages in thread From: Ingo Molnar @ 2008-07-03 8:34 UTC (permalink / raw) To: Ankita Garg Cc: Peter Zijlstra, Gregory Haskins, linux-kernel, rostedt, suresh.b.siddha, aneesh.kumar, dhaval, vatsa, David Bahi * Ankita Garg <ankita@in.ibm.com> wrote: > > OK, so task delay delay accounting is broken because it doesn't take > > migration into account. > Fixed. Pl. find the new patch below. applied to tip/sched/devel - thanks Ankita. Ingo ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [ANNOUNCE] sched: schedtop utility 2008-07-03 8:34 ` Ingo Molnar @ 2008-07-03 20:56 ` Peter Zijlstra 2008-10-21 16:29 ` Gregory Haskins 1 sibling, 0 replies; 13+ messages in thread From: Peter Zijlstra @ 2008-07-03 20:56 UTC (permalink / raw) To: Ingo Molnar Cc: Ankita Garg, Gregory Haskins, linux-kernel, rostedt, suresh.b.siddha, aneesh.kumar, dhaval, vatsa, David Bahi On Thu, 2008-07-03 at 10:34 +0200, Ingo Molnar wrote: > * Ankita Garg <ankita@in.ibm.com> wrote: > > > > OK, so task delay delay accounting is broken because it doesn't take > > > migration into account. > > > Fixed. Pl. find the new patch below. > > applied to tip/sched/devel - thanks Ankita. ACK ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [ANNOUNCE] sched: schedtop utility 2008-07-03 8:34 ` Ingo Molnar 2008-07-03 20:56 ` Peter Zijlstra @ 2008-10-21 16:29 ` Gregory Haskins 1 sibling, 0 replies; 13+ messages in thread From: Gregory Haskins @ 2008-10-21 16:29 UTC (permalink / raw) To: Ingo Molnar Cc: Ankita Garg, Peter Zijlstra, Gregory Haskins, linux-kernel, rostedt, suresh.b.siddha, aneesh.kumar, dhaval, vatsa, David Bahi I finally put up a wiki for my schedtop utility http://rt.wiki.kernel.org/index.php/Schedtop_utility Hopefully someone out there besides myself finds this tool useful! Regards, -Greg ^ permalink raw reply [flat|nested] 13+ messages in thread
* [ANNOUNCE] sched: schedtop utility v0.3 2008-05-22 14:06 [ANNOUNCE] sched: schedtop utility Gregory Haskins 2008-05-22 14:33 ` Steven Rostedt 2008-06-02 12:48 ` Ankita Garg @ 2008-06-03 3:21 ` Gregory Haskins 2008-06-17 12:18 ` [ANNOUNCE] sched: schedtop utility v0.5 Gregory Haskins 2 siblings, 1 reply; 13+ messages in thread From: Gregory Haskins @ 2008-06-03 3:21 UTC (permalink / raw) To: Ankita Garg, linux-kernel Cc: Ingo Molnar, rostedt, Peter Zijlstra, suresh.b.siddha, aneesh.kumar, dhaval, vatsa, David Bahi >>> On Thu, May 22, 2008 at 10:06 AM, in message <483545B4.BA47.005A.0@novell.com>, Gregory Haskins wrote: > Hi all scheduler developers, > I had an itch to scratch w.r.t. watching the stats in /proc/schedstats, > and it appears that the perl scripts referenced in > Documentation/scheduler/sched-stats.txt do not support v14 from HEAD so I > whipped up a little utility I call "schedtop". > > This utility will process statistics from /proc/schedstat such that the > busiest stats will bubble up to the top. It can alternately be sorted by the > largest stat, or by name. Stats can be included or excluded based on reg-ex > pattern matching. > > You can download the tarball here: > > ftp://ftp.novell.com/dev/ghaskins/schedtop.tar.gz > > I have also posted it to the opensuse build service for generating RPMS for > a handful of 32/64-bit x86 distros for your convenience: > > http://download.opensuse.org/repositories/home:/ghaskins/ > > (Note that the build is still in progress for some of the favors, so if you > do not see the flavor you are looking for, check back in a little while) > > Comments/feedback/bug-fixes welcome! > > Regards > -Greg Hi All, I have posted an update to schedtop (v0.3) which adds /proc/<pid>/schedstats and /proc/<pid>/sched stats to the mix. Also note that there is a comprehensive filtering mechanism built into all versions of schedtop: "-i <REGEX>" sets the *include* pattern, and "-x <REGEX> sets the *exclude* pattern. By default, -i is set to allow everything, and -x is set to exclude nothing. A common config for me is to use "-x sched_info" since those sched_info stats seem to always be moving rapidly and can cloud stats that are more interesting (to me, anyway). Let me know if you have any questions. Comments/feedback are welcome -Greg ^ permalink raw reply [flat|nested] 13+ messages in thread
* [ANNOUNCE] sched: schedtop utility v0.5 2008-06-03 3:21 ` [ANNOUNCE] sched: schedtop utility v0.3 Gregory Haskins @ 2008-06-17 12:18 ` Gregory Haskins 0 siblings, 0 replies; 13+ messages in thread From: Gregory Haskins @ 2008-06-17 12:18 UTC (permalink / raw) To: Gregory Haskins Cc: Ankita Garg, linux-kernel, Ingo Molnar, rostedt, Peter Zijlstra, suresh.b.siddha, aneesh.kumar, dhaval, vatsa, David Bahi Gregory Haskins wrote: >>>> On Thu, May 22, 2008 at 10:06 AM, in message <483545B4.BA47.005A.0@novell.com>, > Gregory Haskins wrote: >> Hi all scheduler developers, >> I had an itch to scratch w.r.t. watching the stats in /proc/schedstats, >> and it appears that the perl scripts referenced in >> Documentation/scheduler/sched-stats.txt do not support v14 from HEAD so I >> whipped up a little utility I call "schedtop". >> >> This utility will process statistics from /proc/schedstat such that the >> busiest stats will bubble up to the top. It can alternately be sorted by the >> largest stat, or by name. Stats can be included or excluded based on reg-ex >> pattern matching. >> >> You can download the tarball here: >> >> ftp://ftp.novell.com/dev/ghaskins/schedtop.tar.gz >> >> I have also posted it to the opensuse build service for generating RPMS for >> a handful of 32/64-bit x86 distros for your convenience: >> >> http://download.opensuse.org/repositories/home:/ghaskins/ >> >> (Note that the build is still in progress for some of the favors, so if you >> do not see the flavor you are looking for, check back in a little while) >> >> Comments/feedback/bug-fixes welcome! >> >> Regards >> -Greg > > Hi All, > I have posted an update to schedtop (v0.3) which adds /proc/<pid>/schedstats and /proc/<pid>/sched stats to the mix. > > Also note that there is a comprehensive filtering mechanism built into all versions of schedtop: > > "-i <REGEX>" sets the *include* pattern, and > "-x <REGEX> sets the *exclude* pattern. > > By default, -i is set to allow everything, and -x is set to exclude nothing. A common config for me is to use "-x sched_info" since those sched_info stats seem to always be moving rapidly and can cloud stats that are more interesting (to me, anyway). > > Let me know if you have any questions. Comments/feedback are welcome > > -Greg Hi All, I have released a new version (v0.5) which fixes a compile error on some platforms (like Mandriva, RHEL, and opensuse-factory) that did not like the usage of operator/ in boost::filesystem::path code. v0.5 is now fully supported on all of the original platforms that v0.2 worked on. > > > > -- > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ > ^ permalink raw reply [flat|nested] 13+ messages in thread
end of thread, other threads:[~2008-10-21 16:25 UTC | newest] Thread overview: 13+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2008-05-22 14:06 [ANNOUNCE] sched: schedtop utility Gregory Haskins 2008-05-22 14:33 ` Steven Rostedt 2008-06-02 12:48 ` Ankita Garg 2008-06-02 13:07 ` Peter Zijlstra 2008-06-02 13:20 ` Gregory Haskins 2008-06-05 5:20 ` Ankita Garg 2008-06-19 10:27 ` Peter Zijlstra 2008-07-01 9:00 ` Ankita Garg 2008-07-03 8:34 ` Ingo Molnar 2008-07-03 20:56 ` Peter Zijlstra 2008-10-21 16:29 ` Gregory Haskins 2008-06-03 3:21 ` [ANNOUNCE] sched: schedtop utility v0.3 Gregory Haskins 2008-06-17 12:18 ` [ANNOUNCE] sched: schedtop utility v0.5 Gregory Haskins
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox