* [sched] redundant reschedule when set_user_nice() boosts a prio of a task from the "expired" array
From: Dmitry Adamushko @ 2007-04-04 14:04 UTC
To: Ingo Molnar; +Cc: Andrew Morton, Linux Kernel
Hello,
Scenario:
The currently running [task1] boosts the priority (lowers the static_prio) of
[task2] via { setpriority -> set_user_nice }, and [task2] happens to be
in the "expired" array at the moment.
In set_user_nice(), "delta" is then negative (the prio is
boosted) and, hence, resched_task(rq->curr) is called.
As [task2] is in the "expired" array and there are still tasks (at
least [task1]) in the "active" one, the triggered reschedule is simply
useless (i.e. control just goes back to [task1]).
Am I missing something?
The same is applicable to rt_mutex_setprio().
Of course, it's not a big deal, but it's easily avoidable, e.g. with (delta < 0
&& array == rq->active).
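A minimal user-space model of the two conditions, assuming simplified stand-ins for the runqueue and its priority arrays (`struct prio_array_model`, `struct rq_model`, `resched_current_rule()` and `resched_proposed_rule()` are hypothetical names, not kernel code). It only covers the boost path: the current `delta < 0` test asks for a reschedule even when the boosted task sits in the expired array, while the proposed `array == rq->active` guard does not.

```c
#include <assert.h>

/* Hypothetical stand-in for one of the O(1) scheduler's priority arrays. */
struct prio_array_model {
    int id;
};

/* Hypothetical runqueue holding its two arrays. */
struct rq_model {
    struct prio_array_model *active;
    struct prio_array_model *expired;
};

/* Current boost rule in set_user_nice(): any boost (delta < 0) reschedules. */
int resched_current_rule(int delta)
{
    return delta < 0;
}

/* Proposed rule: only boosting a task queued on the active array reschedules. */
int resched_proposed_rule(int delta, const struct rq_model *rq,
                          const struct prio_array_model *array)
{
    assert(rq);
    return delta < 0 && array == rq->active;
}
```

For a task boosted while on `rq->expired`, the first rule returns 1 and the second returns 0, which is exactly the redundant reschedule described above.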
Thanks in advance for any comments.
--
Best regards,
Dmitry Adamushko
* Re: [sched] redundant reschedule when set_user_nice() boosts a prio of a task from the "expired" array
From: Ingo Molnar @ 2007-04-04 14:15 UTC
To: Dmitry Adamushko; +Cc: Andrew Morton, Linux Kernel
* Dmitry Adamushko <dmitry.adamushko@gmail.com> wrote:
> Hello,
>
> Scenario:
>
> Currently running [task1] boosts a priority (lowers a static_prio) of
> [task2] via { setpriority -> set_user_nice } and [task2] happens to be
> in the "expired" array at the moment.
>
> According to the set_user_nice(), "delta" is negative (the prio is
> boosted) and, hence, resched_task(rq->curr) is called.
>
> As the [task2] is in the "expired" array and there are still tasks (at
> least [task1]) in the "active" one, the triggered reschedule is just
> useless (e.g. gets control back to [task1]).
>
> Am I missing something?
>
> The same is applicable to rt_mutex_setprio().
>
> Of course, not a big deal, but it's easily avoidable, e.g. (delta < 0
> && array == rq->active).
i think you are right and a micro-optimization could be done here. Would
you like to do a patch for this?
Ingo
* Re: [sched] redundant reschedule when set_user_nice() boosts a prio of a task from the "expired" array
From: Dmitry Adamushko @ 2007-04-04 15:23 UTC
To: Ingo Molnar; +Cc: Andrew Morton, Linux Kernel
On 04/04/07, Ingo Molnar <mingo@elte.hu> wrote:
>
> * Dmitry Adamushko <dmitry.adamushko@gmail.com> wrote:
> > [...]
> >
> > The same is applicable to rt_mutex_setprio().
> >
> > Of course, not a big deal, but it's easily avoidable, e.g. (delta < 0
> > && array == rq->active).
>
> i think you are right and a micro-optimization could be done here. Would
> you like to do a patch for this?
Yes, I'll do it.
In fact, "delta < 0 && array == rq->active" is also sub-optimal: a
boosted task may still have a lower priority than the current one.
"TASK_PREEMPTS_CURR(p, rq) && array == rq->active" seems to be OK... or
maybe TASK_PREEMPTS_CURR() itself should check for "p->array
== rq->active"...
I'll come up with some solution.
Thanks.
>
> Ingo
>
--
Best regards,
Dmitry Adamushko
* [PATCH] [sched] redundant reschedule when set_user_nice() boosts a prio of a task from the "expired" array
From: Dmitry Adamushko @ 2007-04-04 20:05 UTC
To: Ingo Molnar; +Cc: Andrew Morton, Linux Kernel
Ingo,
following the conversation on "a redundant reschedule call in set_user_nice()",
here is a possible approach.
The patch is somewhat intrusive as it even dares to adapt TASK_PREEMPTS_CURR().
Nevertheless, this adaptation seems to be ok with all the current use-cases.
Presupposition: TASK_PREEMPTS_CURR(p, rq) will /never/ be used as "a
mere prio comparator" - e.g. to make decisions on which array a task
has to be placed in.
=====
o Make TASK_PREEMPTS_CURR(task, rq) return "true" only if the task's
prio is higher than the current task's and the task is in the "active"
array.
This ensures we don't make redundant resched_task() calls when the
task is in the "expired" array (as may currently happen in set_user_nice(),
rt_mutex_setprio() and pull_task());
o generalise the conditions for calling resched_task() in
set_user_nice(), rt_mutex_setprio() and sched_setscheduler()
Signed-off-by: Dmitry Adamushko <dmitry.adamushko@gmail.com>
--
--- linux-2.6.21-rc5/kernel/sched-orig.c 2007-04-04 18:26:19.000000000 +0200
+++ linux-2.6.21-rc5/kernel/sched.c 2007-04-04 18:26:43.000000000 +0200
@@ -168,7 +168,7 @@ unsigned long long __attribute__((weak))
(MAX_BONUS / 2 + DELTA((p)) + 1) / MAX_BONUS - 1))
#define TASK_PREEMPTS_CURR(p, rq) \
- ((p)->prio < (rq)->curr->prio)
+ (((p)->prio < (rq)->curr->prio) && ((p)->array == (rq)->active))
#define SCALE_PRIO(x, prio) \
max(x * (MAX_PRIO - prio) / (MAX_USER_PRIO / 2), MIN_TIMESLICE)
@@ -3847,13 +3847,13 @@ void rt_mutex_setprio(struct task_struct
struct prio_array *array;
unsigned long flags;
struct rq *rq;
- int oldprio;
+ int delta;
BUG_ON(prio < 0 || prio > MAX_PRIO);
rq = task_rq_lock(p, &flags);
- oldprio = p->prio;
+ delta = prio - p->prio;
array = p->array;
if (array)
dequeue_task(p, array);
@@ -3869,13 +3869,10 @@ void rt_mutex_setprio(struct task_struct
enqueue_task(p, array);
/*
* Reschedule if we are currently running on this runqueue and
- * our priority decreased, or if we are not currently running on
- * this runqueue and our priority is higher than the current's
+ * our priority decreased, or if our priority became higher
+ * than the current's.
*/
- if (task_running(rq, p)) {
- if (p->prio > oldprio)
- resched_task(rq->curr);
- } else if (TASK_PREEMPTS_CURR(p, rq))
+ if (TASK_PREEMPTS_CURR(p, rq) || (delta > 0 && task_running(rq, p)))
resched_task(rq->curr);
}
task_rq_unlock(rq, &flags);
@@ -3923,10 +3920,11 @@ void set_user_nice(struct task_struct *p
enqueue_task(p, array);
inc_raw_weighted_load(rq, p);
/*
- * If the task increased its priority or is running and
- * lowered its priority, then reschedule its CPU:
+ * Reschedule if we are currently running on this runqueue and
+ * our priority decreased, or if our priority became higher
+ * than the current's.
*/
- if (delta < 0 || (delta > 0 && task_running(rq, p)))
+ if (TASK_PREEMPTS_CURR(p, rq) || (delta > 0 && task_running(rq, p)))
resched_task(rq->curr);
}
out_unlock:
@@ -4153,13 +4151,10 @@ recheck:
__activate_task(p, rq);
/*
* Reschedule if we are currently running on this runqueue and
- * our priority decreased, or if we are not currently running on
- * this runqueue and our priority is higher than the current's
+ * our priority decreased, or our priority became higher
+ * than the current's.
*/
- if (task_running(rq, p)) {
- if (p->prio > oldprio)
- resched_task(rq->curr);
- } else if (TASK_PREEMPTS_CURR(p, rq))
+ if (TASK_PREEMPTS_CURR(p, rq) || (task_running(rq, p) && p->prio > oldprio))
resched_task(rq->curr);
}
__task_rq_unlock(rq);
=====
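The changelog's combined condition can be modeled in user space as below. This is a sketch under assumptions: `struct task_model`, `struct runqueue_model`, `task_preempts_curr()` and `should_resched()` are hypothetical stand-ins, and "running" is approximated as `p == rq->curr`. It also covers the case raised earlier in the thread, where a boosted task is still weaker than `curr` and so should not trigger a reschedule.

```c
#include <assert.h>

/* Hypothetical, simplified stand-ins; not the kernel's structures. */
struct task_model {
    int prio;           /* lower value = higher priority, as in sched.c */
    const void *array;  /* priority array the task is queued on */
};

struct runqueue_model {
    const struct task_model *curr;  /* currently running task */
    const void *active;             /* the runqueue's active array */
};

/* Patched TASK_PREEMPTS_CURR(): better prio than curr AND in the active array. */
int task_preempts_curr(const struct task_model *p,
                       const struct runqueue_model *rq)
{
    return p->prio < rq->curr->prio && p->array == rq->active;
}

/*
 * Generalised resched condition: p now preempts the current task,
 * or p is itself running and its priority got worse (delta > 0).
 */
int should_resched(const struct task_model *p,
                   const struct runqueue_model *rq, int delta)
{
    int running;

    assert(p && rq && rq->curr);
    running = (p == rq->curr);
    return task_preempts_curr(p, rq) || (delta > 0 && running);
}
```

Folding the array check into the macro, rather than repeating it at each call site, is what lets all three callers share one condition; the price is the stated presupposition that the macro is never used as a plain priority comparator.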
--
Best regards,
Dmitry Adamushko
* Re: [PATCH] [sched] redundant reschedule when set_user_nice() boosts a prio of a task from the "expired" array
From: Andrew Morton @ 2007-04-07 0:03 UTC
To: Dmitry Adamushko; +Cc: Ingo Molnar, Linux Kernel, Con Kolivas, Mike Galbraith
On Wed, 4 Apr 2007 22:05:40 +0200 "Dmitry Adamushko" <dmitry.adamushko@gmail.com> wrote:
> Ingo,
>
> following the conversation on "a redundant reschedule call in set_user_nice()",
> here is a possible approach.
>
> The patch is somewhat intrusive as it even dares to adapt TASK_PREEMPTS_CURR().
>
> Nevertheless, this adaptation seems to be ok with all the current use-cases.
>
> Presupposition: TASK_PREEMPTS_CURR(p, rq) will /never/ be used as "a
> mere prio comparator" - e.g. to make decisions on which array a task
> has to be placed in.
>
>
> =====
>
> o Make TASK_PREEMPTS_CURR(task, rq) return "true" only if the task's
> prio is higher than the current's one and the task is in the "active"
> array.
> This ensures we don't make redundant resched_task() calls when the
> task is in the "expired" array (as may happen now in set_user_nice(),
> rt_mutex_setprio() and pull_task() ) ;
>
> o generalise conditions for a call to resched_task() in
> set_user_nice(), rt_mutex_setprio() and sched_setscheduler()
>
grief. This patch conflicts seriously with the staircase scheduler in -mm.
So to merge it I need to
- apply it
- then apply a revert-it-again patch
- then apply staircase
- then ask Con to cook up a staircase-based equivalent of your change.
so
- your code only gets publicly tested in its against-staircase version
- the against-mainline version will get merged without having been
publicly tested outside of staircase
which is probably all OK for a 2.6.22-rc1 thing, provided Ingo can give a
confident ack.
Where are we at with staircase anyway? Is it looking like a 2.6.22 thing?
I don't personally think we've yet seen enough serious performance testing
to permit a merge, apart from other issues...
> --- linux-2.6.21-rc5/kernel/sched-orig.c 2007-04-04
> 18:26:19.000000000 +0200
> +++ linux-2.6.21-rc5/kernel/sched.c 2007-04-04 18:26:43.000000000 +0200
> @@ -168,7 +168,7 @@ unsigned long long __attribute__((weak))
> (MAX_BONUS / 2 + DELTA((p)) + 1) / MAX_BONUS - 1))
>
> #define TASK_PREEMPTS_CURR(p, rq) \
> - ((p)->prio < (rq)->curr->prio)
> + (((p)->prio < (rq)->curr->prio) && ((p)->array == (rq)->active))
Your patch was wordwrapped and had its tabs replaced with spaces. Please
fix your email client.
(I might as well make that paragraph my .signature)
* Re: [PATCH] [sched] redundant reschedule when set_user_nice() boosts a prio of a task from the "expired" array
From: Dmitry Adamushko @ 2007-04-07 9:16 UTC
To: Andrew Morton; +Cc: Ingo Molnar, Con Kolivas, Mike Galbraith, Linux Kernel
On 07/04/07, Andrew Morton <akpm@linux-foundation.org> wrote:
> On Wed, 4 Apr 2007 22:05:40 +0200 "Dmitry Adamushko"
> > [...]
> >
> > o Make TASK_PREEMPTS_CURR(task, rq) return "true" only if the task's
> > prio is higher than the current's one and the task is in the "active"
> > array.
> > This ensures we don't make redundant resched_task() calls when the
> > task is in the "expired" array (as may happen now in set_user_nice(),
> > rt_mutex_setprio() and pull_task() ) ;
> >
> > o generalise conditions for a call to resched_task() in
> > set_user_nice(), rt_mutex_setprio() and sched_setscheduler()
> >
>
> grief. This patch conflicts seriously with the staircase scheduler in -mm.
> So to merge it I need to
>
> - apply it
> - then apply a revert-it-again patch
> - then apply staircase
> - then ask Con to cook up a staircase-based equivalent of your change.
I'll make an SD-based version and send it to Con.
> so
>
> - your code only gets publicly tested in its against-staircase version
>
> - the against-mainline version will get merged without having been
> publicly tested outside of staircase
>
> which is probably all OK for a 2.6.22-rc1 thing, provided Ingo can give a
> confident ack.
Ok, thanks.
BTW, just out of curiosity: the very first approach I was thinking of
was to move a task from the "expired" to the "active" array when its
priority is boosted (like rt_mutex_setprio() does for rt tasks).
Reasoning: getting a higher static_prio means getting an additional
timeslice quota, which could still be used during this rotation:
delta = task_timeslice(p->static_prio) - task_timeslice(old_static_prio)
Aha... (I'm looking at mainline now) another funny thing is
that p->time_slice is not immediately affected by the change of
static_prio in set_user_nice(). If a task is in the expired array, it
will run the next rotation with the *old* time_slice (i.e. the one
calculated in task_running_tick() before the task was put in the
expired array, based on the *old* static_prio).
In theory, set_user_nice() could adjust p->time_slice by "delta"
calculated as shown above. But OK, it's no more than a minor
inconsistency (unless, of course, I'm missing something).
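To put numbers on that "delta", here is a user-space sketch of the 2.6-era timeslice calculation built around the SCALE_PRIO() formula visible in the patch context. The constants (MAX_RT_PRIO, DEF_TIMESLICE, MIN_TIMESLICE, the 4x base for negative nice) and HZ=1000 are assumptions modeled on mainline kernel/sched.c of that period; `static_prio_timeslice()` and `timeslice_delta()` here are illustrative re-implementations, not the kernel functions themselves.

```c
#include <assert.h>

/*
 * Assumed constants, modeled on 2.6.2x kernel/sched.c with HZ=1000
 * (so one jiffy = 1 ms). Illustrative only.
 */
#define MAX_RT_PRIO      100
#define MAX_USER_PRIO    40
#define MAX_PRIO         (MAX_RT_PRIO + MAX_USER_PRIO)
#define NICE_TO_PRIO(n)  (MAX_RT_PRIO + (n) + 20)
#define DEF_TIMESLICE    100   /* ms */
#define MIN_TIMESLICE    5     /* ms */

static int max_int(int a, int b)
{
    return a > b ? a : b;
}

/* Same formula as the SCALE_PRIO() macro shown in the patch context. */
static int scale_prio(int x, int prio)
{
    return max_int(x * (MAX_PRIO - prio) / (MAX_USER_PRIO / 2), MIN_TIMESLICE);
}

/* Timeslice (ms) for a static priority; negative nice gets a 4x base. */
int static_prio_timeslice(int static_prio)
{
    assert(static_prio >= MAX_RT_PRIO && static_prio < MAX_PRIO);
    if (static_prio < NICE_TO_PRIO(0))
        return scale_prio(DEF_TIMESLICE * 4, static_prio);
    return scale_prio(DEF_TIMESLICE, static_prio);
}

/* Extra (or lost) quota implied by a renice: the "delta" from the mail. */
int timeslice_delta(int old_nice, int new_nice)
{
    return static_prio_timeslice(NICE_TO_PRIO(new_nice)) -
           static_prio_timeslice(NICE_TO_PRIO(old_nice));
}
```

Under these assumptions nice 0 maps to a 100 ms slice and nice -10 to 600 ms, so renicing from 0 to -10 corresponds to a 500 ms difference in slice length.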
>
> > --- linux-2.6.21-rc5/kernel/sched-orig.c 2007-04-04
> > 18:26:19.000000000 +0200
> > +++ linux-2.6.21-rc5/kernel/sched.c 2007-04-04 18:26:43.000000000 +0200
> > @@ -168,7 +168,7 @@ unsigned long long __attribute__((weak))
> > (MAX_BONUS / 2 + DELTA((p)) + 1) / MAX_BONUS - 1))
> >
> > #define TASK_PREEMPTS_CURR(p, rq) \
> > - ((p)->prio < (rq)->curr->prio)
> > + (((p)->prio < (rq)->curr->prio) && ((p)->array == (rq)->active))
>
> Your patch was wordwrapped and had its tabs replaced with spaces. Please
> fix your email client.
I apologize for this. Will fix.
--
Best regards,
Dmitry Adamushko
* Re: [PATCH] [sched] redundant reschedule when set_user_nice() boosts a prio of a task from the "expired" array
From: Ingo Molnar @ 2007-04-07 9:19 UTC
To: Dmitry Adamushko; +Cc: Andrew Morton, Linux Kernel
* Dmitry Adamushko <dmitry.adamushko@gmail.com> wrote:
> following the conversation on "a redundant reschedule call in
> set_user_nice()", here is a possible approach.
>
> The patch is somewhat intrusive as it even dares to adapt
> TASK_PREEMPTS_CURR().
looks good to me, but the patch seems seriously whitespace-damaged: all
tabs were converted to spaces.
Ingo
* Re: [PATCH] [sched] redundant reschedule when set_user_nice() boosts a prio of a task from the "expired" array
From: Ingo Molnar @ 2007-04-07 9:24 UTC
To: Andrew Morton; +Cc: Dmitry Adamushko, Linux Kernel, Con Kolivas, Mike Galbraith
* Andrew Morton <akpm@linux-foundation.org> wrote:
> so
>
> - your code only gets publicly tested in its against-staircase
> version
>
> - the against-mainline version will get merged without having been
> publicly tested outside of staircase
>
> which is probably all OK for a 2.6.22-rc1 thing, provided Ingo can
> give a confident ack.
it looks good to me - and once i get a non-whitespace-damaged patch i'll
put it into -rt so we'll have testing. (this patch should have at most a
latency impact, if we forget to preempt somewhere, and -rt users are
quite touchy about latencies.)
> Where are we at with staircase anyway? Is it looking like a 2.6.22
> thing? I don't personally think we've yet seen enough serious
> performance testing to permit a merge, apart from other issues...
yes, that's my thinking too at the moment. I'd also like to see a
summary of 'open design questions' list from Mike (if Mike has
time/energy for that?) - many questions were raised, a good number of
them were answered, various changes done to SD but there's no good
summary of the current state of affairs.
Ingo
* SD scheduler testing hitch
From: Mike Galbraith @ 2007-04-07 16:20 UTC
To: Ingo Molnar; +Cc: Andrew Morton, Dmitry Adamushko, Linux Kernel, Con Kolivas
[-- Attachment #1: Type: text/plain, Size: 5293 bytes --]
On Sat, 2007-04-07 at 11:24 +0200, Ingo Molnar wrote:
> * Andrew Morton <akpm@linux-foundation.org> wrote:
> > Where are we at with staircase anyway? Is it looking like a 2.6.22
> > thing? I don't personally think we've yet seen enough serious
> > performance testing to permit a merge, apart from other issues...
>
> yes, that's my thinking too at the moment. I'd also like to see a
> summary of 'open design questions' list from Mike (if Mike has
> time/energy for that?) - many questions were raised, a good number of
> them were answered, various changes done to SD but there's no good
> summary of the current state of affairs.
I'm working on it. I started testing fairness, but ran into a snag.
What I was testing was my theory that SD can't possibly be fair to
sleeping tasks, because the differential between long-burn/short-sleep
tasks and long-sleep/short-burn tasks is discarded at the end of every
rotation. That theory seems to be true, but here's the snag...
2.6.21-rc6-sd-0.39, box is 3GHz P4/HT
tenpercent: tenpercent.c compiled to run one task with a 10% duty cycle.
100ms and friends: tenpercent.c hard-coded for an N ms burn + 1 usec sleep.
taskset -c 1 ./tenpercent
taskset -c 1 ./100ms (or ilk)
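One way to read the `top` numbers that follow is against what a strictly max-min fair scheduler would hand out on a single CPU. The sketch below is the textbook water-filling computation, offered only as a yardstick under that assumption, not as SD's actual policy; `maxmin_share()` and `MAX_TASKS` are hypothetical names.

```c
#include <assert.h>

#define MAX_TASKS 64   /* arbitrary limit for this sketch */

/*
 * Max-min ("water-filling") fair allocation of `capacity` (here: percent
 * of one CPU) among n tasks. Tasks demanding no more than an equal share
 * get their full demand; the leftover is split evenly among the rest.
 */
void maxmin_share(const double *demand, double *alloc, int n, double capacity)
{
    int done[MAX_TASKS] = { 0 };
    int left = n;
    double cap = capacity;

    assert(n > 0 && n <= MAX_TASKS);
    for (int i = 0; i < n; i++)
        alloc[i] = 0.0;

    while (left > 0) {
        double share = cap / left;
        int progressed = 0;

        /* Satisfy every task whose demand fits within the current share. */
        for (int i = 0; i < n; i++) {
            if (!done[i] && demand[i] <= share) {
                alloc[i] = demand[i];
                cap -= demand[i];
                done[i] = 1;
                left--;
                progressed = 1;
            }
        }
        /* Nobody else fits: split what remains evenly and stop. */
        if (!progressed) {
            for (int i = 0; i < n; i++)
                if (!done[i])
                    alloc[i] = share;
            break;
        }
    }
}
```

Under this yardstick, a task demanding 10% next to a CPU hog demanding 100% keeps its full 10% and the hog gets 90%; only when total demand exceeds capacity do tasks fall back to an equal split.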
top - 10:47:57 up 3:11, 13 users, load average: 1.65, 1.63, 2.50
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ P COMMAND
7357 root 9 0 1568 440 360 R 92 0.0 10:55.01 1 100ms
7356 root 1 0 1568 444 360 S 8 0.0 1:00.01 1 tenpercent
5557 root 1 0 164m 21m 4876 S 0 2.1 1:58.90 0 Xorg
6343 root 3 0 2376 1068 768 R 0 0.1 2:51.19 0 top
top - 11:05:16 up 3:29, 13 users, load average: 1.52, 1.50, 1.81
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ P COMMAND
7395 root 5 0 1568 444 360 R 90 0.0 8:54.25 1 100ms
7394 root 0 -10 1568 440 360 S 10 0.0 1:00.21 1 tenpercent
6343 root 3 0 2376 1068 768 R 0 0.1 3:04.16 0 top
1 root 1 0 736 288 240 S 0 0.0 0:00.90 0 init
top - 11:20:58 up 3:44, 13 users, load average: 1.89, 1.87, 1.78
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ P COMMAND
7429 root 2 -10 1568 444 360 R 92 0.0 12:03.81 1 100ms
7428 root 0 -10 1568 444 360 R 8 0.0 1:00.08 1 tenpercent
6343 root 3 0 2376 1068 768 R 1 0.1 3:19.36 0 top
1 root 1 0 736 288 240 S 0 0.0 0:00.90 0 init
top - 12:22:27 up 4:46, 13 users, load average: 1.90, 1.92, 1.94
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ P COMMAND
8235 root 1 -20 1568 444 360 R 95 0.0 19:31.20 1 100ms
8234 root 0 -20 1568 444 360 S 5 0.0 1:00.01 1 tenpercent
6343 root 3 0 2376 1068 768 R 1 0.1 4:24.24 0 top
4926 root 1 0 1820 632 544 S 0 0.1 0:02.34 0 hald-addon-stor
top - 13:38:22 up 6:02, 13 users, load average: 1.53, 1.51, 1.51
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ P COMMAND
8643 root 5 0 1564 444 360 R 93 0.0 12:15.49 1 50ms
8642 root 1 0 1564 444 360 S 7 0.0 1:00.28 1 tenpercent
6343 root 3 0 2376 1080 768 R 0 0.1 5:27.22 0 top
1 root 1 0 736 288 240 S 0 0.0 0:00.91 0 init
top - 14:02:39 up 6:26, 13 users, load average: 1.75, 1.71, 1.56
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ P COMMAND
8726 root 5 0 1564 444 360 R 94 0.0 15:19.07 1 8ms
8727 root 1 0 1564 444 360 R 6 0.0 1:00.11 1 tenpercent
5557 root 1 0 164m 21m 4632 S 0 2.1 3:20.92 0 Xorg
6079 root 1 0 31584 17m 12m S 0 1.7 0:04.35 0 konsole
top - 16:22:01 up 8:45, 13 users, load average: 1.73, 1.81, 1.60
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ P COMMAND
10622 root 1 0 1428 264 212 R 98 0.0 10:00.43 1 xx
10621 root 1 0 1564 440 360 S 1 0.0 0:06.49 1 tenpercent
10423 root 3 0 2248 1052 764 R 0 0.1 0:27.45 0 top
1 root 1 0 736 288 240 S 0 0.0 0:00.91 0 init
xx.c just tries to terminate the rotation if it gets preempted, and
seems to succeed. It usually isn't this bad, but every few starts it
gets this bad. I thought it might be screwing up the calibration of
tenpercent if xx started first, but I plugged it into tenp.c (attached)
after the calibration, and still see this every few starts. It always
gets more CPU than it should, but sometimes it's extreme.
I have yet to see tenpercent start at 1 percent usage in many, many
tries, but I just repeated it with the attached program in seven tries.
xx.c
#include <stdio.h>
#include <sys/time.h>
#include <time.h> /* nanosleep(), struct timespec */
#define max(a,b) ((a) > (b) ? (a) : (b))
#define min(a,b) ((a) < (b) ? (a) : (b))
int main(void)
{
struct timeval then, now;
struct timespec t = {0, 1000}, r;
for(;;) {
int t1, t2;
short i;
if (gettimeofday(&then, 0))
break;
for (i = 1; i > 0; i++);
if (gettimeofday(&now, 0))
break;
t2 = max(then.tv_usec, now.tv_usec);
t1 = min(then.tv_usec, now.tv_usec);
if (t2 - t1 >= 1000 && nanosleep(&t, &r))
break;
}
return 0;
}
[-- Attachment #2: fairtest.c --]
[-- Type: text/x-csrc, Size: 4379 bytes --]
// gcc -O2 -o tenp tenp.c -lrt
// code from interbench.c
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/time.h> /* gettimeofday(), struct timeval used by steal() */
/*
* Start $forks processes that run for 10% cpu time each. Set this to
* 15 * number of cpus for best effect.
*/
int forks = 1;
unsigned long run_us = 1000000000, sleep_us;
unsigned long loops_per_ms;
void terminal_error(const char *name)
{
fprintf(stderr, "\n");
perror(name);
exit (1);
}
unsigned long long get_nsecs(struct timespec *myts)
{
if (clock_gettime(CLOCK_REALTIME, myts))
terminal_error("clock_gettime");
/* widen first: tv_sec * 1000000000 overflows a 32-bit long */
return ((unsigned long long)myts->tv_sec * 1000000000ULL + myts->tv_nsec);
}
void burn_loops(unsigned long loops)
{
unsigned long i;
/*
* We need some magic here to prevent the compiler from optimising
* this loop away. Otherwise trying to emulate a fixed cpu load
* with this loop will not work.
*/
for (i = 0 ; i < loops ; i++)
asm volatile("" : : : "memory");
}
/* Use this many usecs of cpu time */
void burn_usecs(unsigned long usecs)
{
unsigned long ms_loops;
ms_loops = loops_per_ms / 1000 * usecs;
burn_loops(ms_loops);
}
void microsleep(unsigned long long usecs)
{
struct timespec req, rem;
rem.tv_sec = rem.tv_nsec = 0;
req.tv_sec = usecs / 1000000;
req.tv_nsec = (usecs - (req.tv_sec * 1000000)) * 1000;
continue_sleep:
if ((nanosleep(&req, &rem)) == -1) {
if (errno == EINTR) {
if (rem.tv_sec || rem.tv_nsec) {
req.tv_sec = rem.tv_sec;
req.tv_nsec = rem.tv_nsec;
goto continue_sleep;
}
goto out;
}
terminal_error("nanosleep");
}
out:
return;
}
/*
* In an unoptimised loop we try to benchmark how many meaningless loops
* per second we can perform on this hardware to fairly accurately
* reproduce certain percentage cpu usage
*/
void calibrate_loop(void)
{
unsigned long long start_time, loops_per_msec, run_time = 0,
min_run_us = run_us;
unsigned long loops;
struct timespec myts;
int i;
printf("Calibrating loop\n");
loops_per_msec = 1000000;
redo:
/* Calibrate to within 1% accuracy */
while (run_time > 1010000 || run_time < 990000) {
loops = loops_per_msec;
start_time = get_nsecs(&myts);
burn_loops(loops);
run_time = get_nsecs(&myts) - start_time;
loops_per_msec = (1000000 * loops_per_msec / run_time ? :
loops_per_msec);
}
/* Rechecking after a pause increases reproducibility */
microsleep(1);
loops = loops_per_msec;
start_time = get_nsecs(&myts);
burn_loops(loops);
run_time = get_nsecs(&myts) - start_time;
/* Tolerate 5% difference on checking */
if (run_time > 1050000 || run_time < 950000)
goto redo;
loops_per_ms=loops_per_msec;
printf("Calibrating sleep interval\n");
microsleep(1);
/* Find the smallest time interval close to 1ms that we can sleep */
for (i = 0; i < 100; i++) {
start_time=get_nsecs(&myts);
microsleep(1000);
run_time=get_nsecs(&myts)-start_time;
run_time /= 1000;
if (run_time < run_us && run_us > 1000)
run_us = run_time;
}
/* Then set run_us to that duration and sleep_us to 9 x that */
sleep_us = run_us * 9;
printf("Calibrating run interval\n");
microsleep(1);
/* Do a few runs to see what really gets us run_us runtime */
for (i = 0; i < 100; i++) {
start_time=get_nsecs(&myts);
burn_usecs(run_us);
run_time=get_nsecs(&myts)-start_time;
run_time /= 1000;
if (run_time < min_run_us && run_time > run_us)
min_run_us = run_time;
}
if (min_run_us < run_us)
run_us = run_us * run_us / min_run_us;
printf("Each fork will run for %lu usecs and sleep for %lu usecs\n",
run_us, sleep_us);
}
#define max(a,b) ((a) > (b) ? (a) : (b))
#define min(a,b) ((a) < (b) ? (a) : (b))
void steal(void)
{
struct timeval then, now;
struct timespec t = {0, 1000}, r;
for(;;) {
int t1, t2;
short i;
if (gettimeofday(&then, 0))
break;
for (i = 1; i > 0; i++);
if (gettimeofday(&now, 0))
break;
t2 = max(then.tv_usec, now.tv_usec);
t1 = min(then.tv_usec, now.tv_usec);
if (t2 - t1 >= 1000 && nanosleep(&t, &r))
break;
}
}
int main(void){
int i, child;
calibrate_loop();
printf("starting %d forks\n", forks);
for(i = 0; i < forks; i++){
if(!(child = fork()))
break;
}
if (child)
steal();
else while(1){
burn_usecs(run_us);
microsleep(sleep_us);
}
return 0;
}
* Re: SD scheduler testing hitch
From: Mike Galbraith @ 2007-04-07 17:17 UTC
To: Ingo Molnar; +Cc: Andrew Morton, Dmitry Adamushko, Linux Kernel, Con Kolivas
[-- Attachment #1: Type: text/plain, Size: 915 bytes --]
On Sat, 2007-04-07 at 18:20 +0200, Mike Galbraith wrote:
> xx.c
>
> #include <stdio.h>
> #include <sys/time.h>
>
> #define max(a,b) ((a) > (b) ? (a) : (b))
> #define min(a,b) ((a) < (b) ? (a) : (b))
>
> int main(void)
> {
> struct timeval then, now;
> struct timespec t = {0, 1000}, r;
>
> for(;;) {
> int t1, t2;
> short i;
>
> if (gettimeofday(&then, 0))
> break;
> for (i = 1; i > 0; i++);
> if (gettimeofday(&now, 0))
> break;
> t2 = max(then.tv_usec, now.tv_usec);
> t1 = min(then.tv_usec, now.tv_usec);
> if (t2 - t1 >= 1000 && nanosleep(&t, &r))
> break;
> }
> return 0;
> }
I lowered the time to 500us and ran it at nice -10 (as taskset -c 1
nice -n -10 ./fairtest). It starves tenpercent here every time: the
starving 10% duty cycle task has trouble getting 1% CPU.
-Mike
[-- Attachment #2: fairtest.c --]
[-- Type: text/x-csrc, Size: 4377 bytes --]
// gcc -O2 -o tenp tenp.c -lrt
// code from interbench.c
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/time.h> /* gettimeofday(), struct timeval used by steal() */
/*
* Start $forks processes that run for 10% cpu time each. Set this to
* 15 * number of cpus for best effect.
*/
int forks = 1;
unsigned long run_us = 1000000000, sleep_us;
unsigned long loops_per_ms;
void terminal_error(const char *name)
{
fprintf(stderr, "\n");
perror(name);
exit (1);
}
unsigned long long get_nsecs(struct timespec *myts)
{
if (clock_gettime(CLOCK_REALTIME, myts))
terminal_error("clock_gettime");
/* widen first: tv_sec * 1000000000 overflows a 32-bit long */
return ((unsigned long long)myts->tv_sec * 1000000000ULL + myts->tv_nsec);
}
void burn_loops(unsigned long loops)
{
unsigned long i;
/*
* We need some magic here to prevent the compiler from optimising
* this loop away. Otherwise trying to emulate a fixed cpu load
* with this loop will not work.
*/
for (i = 0 ; i < loops ; i++)
asm volatile("" : : : "memory");
}
/* Use this many usecs of cpu time */
void burn_usecs(unsigned long usecs)
{
unsigned long ms_loops;
ms_loops = loops_per_ms / 1000 * usecs;
burn_loops(ms_loops);
}
void microsleep(unsigned long long usecs)
{
struct timespec req, rem;
rem.tv_sec = rem.tv_nsec = 0;
req.tv_sec = usecs / 1000000;
req.tv_nsec = (usecs - (req.tv_sec * 1000000)) * 1000;
continue_sleep:
if ((nanosleep(&req, &rem)) == -1) {
if (errno == EINTR) {
if (rem.tv_sec || rem.tv_nsec) {
req.tv_sec = rem.tv_sec;
req.tv_nsec = rem.tv_nsec;
goto continue_sleep;
}
goto out;
}
terminal_error("nanosleep");
}
out:
return;
}
/*
* In an unoptimised loop we try to benchmark how many meaningless loops
* per second we can perform on this hardware to fairly accurately
* reproduce certain percentage cpu usage
*/
void calibrate_loop(void)
{
unsigned long long start_time, loops_per_msec, run_time = 0,
min_run_us = run_us;
unsigned long loops;
struct timespec myts;
int i;
printf("Calibrating loop\n");
loops_per_msec = 1000000;
redo:
/* Calibrate to within 1% accuracy */
while (run_time > 1010000 || run_time < 990000) {
loops = loops_per_msec;
start_time = get_nsecs(&myts);
burn_loops(loops);
run_time = get_nsecs(&myts) - start_time;
loops_per_msec = (1000000 * loops_per_msec / run_time ? :
loops_per_msec);
}
/* Rechecking after a pause increases reproducibility */
microsleep(1);
loops = loops_per_msec;
start_time = get_nsecs(&myts);
burn_loops(loops);
run_time = get_nsecs(&myts) - start_time;
/* Tolerate 5% difference on checking */
if (run_time > 1050000 || run_time < 950000)
goto redo;
loops_per_ms=loops_per_msec;
printf("Calibrating sleep interval\n");
microsleep(1);
/* Find the smallest time interval close to 1ms that we can sleep */
for (i = 0; i < 100; i++) {
start_time=get_nsecs(&myts);
microsleep(1000);
run_time=get_nsecs(&myts)-start_time;
run_time /= 1000;
if (run_time < run_us && run_us > 1000)
run_us = run_time;
}
/* Then set run_us to that duration and sleep_us to 9 x that */
sleep_us = run_us * 9;
printf("Calibrating run interval\n");
microsleep(1);
/* Do a few runs to see what really gets us run_us runtime */
for (i = 0; i < 100; i++) {
start_time=get_nsecs(&myts);
burn_usecs(run_us);
run_time=get_nsecs(&myts)-start_time;
run_time /= 1000;
if (run_time < min_run_us && run_time > run_us)
min_run_us = run_time;
}
if (min_run_us < run_us)
run_us = run_us * run_us / min_run_us;
printf("Each fork will run for %lu usecs and sleep for %lu usecs\n",
run_us, sleep_us);
}
#define max(a,b) ((a) > (b) ? (a) : (b))
#define min(a,b) ((a) < (b) ? (a) : (b))
void steal(void)
{
struct timeval then, now;
struct timespec t = {0, 500}, r;
for(;;) {
int t1, t2;
short i;
if (gettimeofday(&then, 0))
break;
for (i = 1; i > 0; i++);
if (gettimeofday(&now, 0))
break;
t2 = max(then.tv_usec, now.tv_usec);
t1 = min(then.tv_usec, now.tv_usec);
if (t2 - t1 >= 500 && nanosleep(&t, &r))
break;
}
}
int main(void){
int i, child;
calibrate_loop();
printf("starting %d forks\n", forks);
for(i = 0; i < forks; i++){
if(!(child = fork()))
break;
}
if (child)
steal();
else while(1){
burn_usecs(run_us);
microsleep(sleep_us);
}
return 0;
}
* Re: SD scheduler testing hitch
From: Mike Galbraith @ 2007-04-08 8:02 UTC
To: Ingo Molnar; +Cc: Andrew Morton, Dmitry Adamushko, Linux Kernel, Con Kolivas
On Sat, 2007-04-07 at 19:17 +0200, Mike Galbraith wrote:
> I lowered the time to 500us, and ran at nice -10.. it starves tenpercent
> here every time. (ran as taskset -c 1 nice -n -10 ./fairtest) The
> starving 10% duty cycle task has trouble getting 1% CPU.
Hmm. Playing with it some more today, it still happens, but it's not
very repeatable. Something is odd. I wonder if any SD-using readers
will try it.
-Mike
* Re: SD scheduler testing hitch
From: Dmitry Adamushko @ 2007-04-09 0:14 UTC
To: Mike Galbraith; +Cc: Ingo Molnar, Andrew Morton, Con Kolivas, Linux Kernel
On 08/04/07, Mike Galbraith <efault@gmx.de> wrote:
> On Sat, 2007-04-07 at 19:17 +0200, Mike Galbraith wrote:
>
> > I lowered the time to 500us, and ran at nice -10.. it starves tenpercent
> > here every time. (ran as taskset -c 1 nice -n -10 ./fairtest) The
> > starving 10% duty cycle task has trouble getting 1% CPU.
>
Something is odd, very odd indeed. But surprise-surprise, it does not
seem to be something SD-specific.
In short, the question is: can we always trust the statistics reported
by "top" (i.e. the way they are collected by the kernel)?
The tests are below. Somewhere in the middle are some thoughts on how HZ
and the length of a task's CPU bursts may be connected to such
behaviour.
The system: Pentium 3 Coppermine 750 MHz (ThinkPad T21), 256 MB RAM.
I tested 3 configurations:
(1) 2.6.13-15 (default in SuSE 10)
(2) 2.6.19
(3) 2.6.21-rc5 + sd-0.39
TEST: plain tenp.c, i.e. without Mike's "steal" part (either as xx.c or
as part of the modified fairtest.c):
tenp   -- tenp.c with a single running copy;
tenp2  -- tenp.c with 2 running copies (1 additionally forked);
tenp15 -- 15 copies (SD only).
(1) 2.6.13-15
Tasks: 74 total, 1 running, 73 sleeping, 0 stopped, 0 zombie
Cpu(s): 8.6% us, 0.7% sy, 0.0% ni, 90.4% id, 0.0% wa, 0.3% hi, 0.0% si
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM     TIME+ COMMAND
 5582 dimm      15   0  1460  428  348 S  6.0  0.2   0:02.03 tenp
 4047 messageb  17   0  3520 1584 1324 S  1.3  0.6   0:00.28 dbus-daemon

Tasks: 76 total, 1 running, 75 sleeping, 0 stopped, 0 zombie
Cpu(s): 14.9% us, 0.3% sy, 0.0% ni, 84.8% id, 0.0% wa, 0.0% hi, 0.0% si
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM     TIME+ COMMAND
 5598 dimm      15   0  1460  428  348 S  7.2  0.2   0:01.42 tenp2
 5599 dimm      15   0  1460  432  352 S  6.9  0.2   0:00.87 tenp2
 5591 dimm      16   0  2108  988  764 R  0.3  0.4   0:00.47 top
    1 root      16   0   688  260  224 S  0.0  0.1   0:01.78 init
I repeated each test (tenp and tenp2) 7 times. All runs were OK.
Now the interesting part starts.
(2) 2.6.19
[ 2.1 ]
Tasks: 73 total, 1 running, 72 sleeping, 0 stopped, 0 zombie
Cpu(s): 1.3% us, 0.7% sy, 0.0% ni, 98.0% id, 0.0% wa, 0.0% hi, 0.0% si
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM     TIME+ COMMAND
 8312 root      15   0 27168  14m 2128 S  0.7  5.6   0:08.29 X
 8640 dimm      15   0 28656  13m   9m S  0.7  5.4   0:03.44 konsole
 8813 dimm      15   0  1460  432  352 S  0.3  0.2   0:00.32 tenp
    1 root      15   0   696  268  228 S  0.0  0.1   0:01.12 init
[ 2.2 ]
Tasks: 73 total, 3 running, 70 sleeping, 0 stopped, 0 zombie
Cpu(s): 6.6% us, 0.7% sy, 0.0% ni, 92.7% id, 0.0% wa, 0.0% hi, 0.0% si
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM     TIME+ COMMAND
 8816 dimm      15   0  1464  432  352 S  5.0  0.2   0:01.49 tenp
 8312 root      15   0 27168  14m 2128 R  1.7  5.6   0:09.08 X
See the difference between [ 2.1 ] and [ 2.2 ]? [ 2.2 ] (which is OK)
happened only 3 times out of 10.
Now for tenp2
[ 2.3 ]
Tasks: 74 total, 1 running, 73 sleeping, 0 stopped, 0 zombie
Cpu(s): 14.6% us, 0.3% sy, 0.0% ni, 85.1% id, 0.0% wa, 0.0% hi, 0.0% si
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM     TIME+ COMMAND
 8850 dimm      15   0  1460  432  352 S  6.6  0.2   0:01.32 tenp2
 8851 dimm      15   0  1460  112   32 S  6.3  0.0   0:00.77 tenp2
 8312 root      15   0 27168  14m 2128 S  0.7  5.6   0:11.73 X
[ 2.4 ]
Tasks: 74 total, 2 running, 72 sleeping, 0 stopped, 0 zombie
Cpu(s): 3.3% us, 0.3% sy, 0.0% ni, 96.3% id, 0.0% wa, 0.0% hi, 0.0% si
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM     TIME+ COMMAND
 8312 root      15   0 27168  14m 2128 S  2.0  5.6   0:12.97 X
 8640 dimm      15   0 28748  13m   9m R  0.7  5.4   0:07.22 konsole
 8532 dimm      18   0  2476  416  268 S  0.3  0.2   0:00.04 gpg-agent
 8852 dimm      15   0  2116  996  772 R  0.3  0.4   0:00.27 top
 8859 dimm      15   0  1460  432  352 S  0.3  0.2   0:00.44 tenp2
 8860 dimm      15   0  1460  112   32 S  0.3  0.0   0:00.02 tenp2
    1 root      15   0   696  268  228 S  0.0  0.1   0:01.12 init
Again, [ 2.3 ] took place only 3 times.
Some observations:
/1/ For the "ok" cases ([ 2.2 ] and [ 2.3 ]), the "will run" and
"will sleep" times from tenp's calibration output look /higher/ than
average, e.g.
Each fork will run for 5863 usecs and sleep for 52767 usecs
vs. something in the range
Each fork will run for 2392 usecs and sleep for 21528 usecs
Each fork will run for 3880 usecs and sleep for 34920 usecs
in most of the other cases (when tenp's cpu% is ~0.3).
/2/ HZ = 250 for 2.6.19, and I think it was still 1000 for 2.6.13
(argh.. I forgot to check and would like to avoid a reboot at this
already late hour... but I believe 1000 was still the default back
then).
=============
(*)
HZ == 250 ==> a timer tick every 4 ms. So a "will run for" interval
< 4 ms may well go unaccounted? :o)
The funny thing is that (at least in theory), if a task uses the CPU in
chunks shorter than 1/HZ s and is suitably phase-shifted against the
timer interrupts (-> scheduler_tick()), its time_slice does not decrease
at all (very theoretically), or just decreases more slowly (with other
load, the moments when the task runs should drift relative to the timer
interrupts and hit them from time to time -> get accounted).
=============
(3) 2.6.21-rc5 + sd-0.39
Here the results are similar to (2).
Cpu(s): 10.6% us, 0.0% sy, 0.0% ni, 89.4% id, 0.0% wa, 0.0% hi, 0.0% si
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM     TIME+ COMMAND
 3624 dimm      20   0  1460  428  352 S  5.0  0.2   0:01.50 tenp2
 3625 dimm      20   0  1460  108   32 S  4.6  0.0   0:00.76 tenp2
 2797 root      20   0 27112  13m 2128 S  0.3  5.6   0:18.62 X

Cpu(s): 2.0% us, 0.3% sy, 0.0% ni, 97.7% id, 0.0% wa, 0.0% hi, 0.0% si
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM     TIME+ COMMAND
 3738 dimm      20   0  1460  432  352 S  0.3  0.2   0:00.60 tenp2
 3739 dimm      20   0  1460  112   32 S  0.3  0.0   0:00.18 tenp2
And now let's run both tenp and tenp15;
look at "tenp" below:
/1/
Tasks: 82 total, 10 running, 72 sleeping, 0 stopped, 0 zombie
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM     TIME+ COMMAND
 3760 dimm      31   0  1464  112   32 S  7.9  0.0   0:01.82 tenp15
 3773 dimm      31   0  1460  432  352 R  7.2  0.2   0:01.24 tenp
 3758 dimm      31   0  1464  112   32 S  6.9  0.0   0:01.73 tenp15
 3759 dimm      31   0  1464  112   32 S  6.9  0.0   0:01.65 tenp15
 3757 dimm      31   0  1464  432  352 R  6.2  0.2   0:01.89 tenp15
 3762 dimm      31   0  1464  112   32 R  6.2  0.0   0:01.70 tenp15
 3763 dimm      31   0  1464  112   32 S  6.2  0.0   0:01.75 tenp15
 3765 dimm      31   0  1464  112   32 S  6.2  0.0   0:01.82 tenp15
 3767 dimm      31   0  1464  112   32 S  6.2  0.0   0:01.73 tenp15
 3764 dimm      31   0  1464  112   32 R  5.9  0.0   0:01.70 tenp15
 3769 dimm      31   0  1464  112   32 R  5.9  0.0   0:01.65 tenp15
 3761 dimm      31   0  1464  112   32 S  5.6  0.0   0:01.66 tenp15
 3766 dimm      31   0  1464  112   32 R  5.6  0.0   0:01.68 tenp15
 3768 dimm      31   0  1464  112   32 R  5.6  0.0   0:01.77 tenp15
 3771 dimm      31   0  1464  112   32 R  5.6  0.0   0:01.72 tenp15
 3770 dimm      31   0  1464  112   32 S  5.2  0.0   0:01.54 tenp15
 3178 dimm      20   0 28996  13m   9m R  0.7  5.6   0:17.49 konsole
And now let's kill tenp15 so that tenp remains alone:
Cpu(s): 0.0% us, 0.3% sy, 0.0% ni, 99.7% id, 0.0% wa, 0.0% hi, 0.0% si
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM     TIME+ COMMAND
 3773 dimm      20   0  1460  432  352 S  0.3  0.2   0:03.59 tenp
 3775 dimm      20   0  2120 1000  772 R  0.3  0.4   0:00.19 top
Strange.
It doesn't happen for tenp15 (each copy always consumes ~6%). I repeated
this about 10 times.
Well, it's late, so maybe I'm missing something... but it does look
like an issue related to HZ and the "will run" interval, as described
in (*). Or maybe we are both observing similar symptoms with different
causes behind them.
> -Mike
--
Best regards,
Dmitry Adamushko
* Re: SD scheduler testing hitch
2007-04-09 0:14 ` Dmitry Adamushko
@ 2007-04-09 0:23 ` Dmitry Adamushko
2007-04-09 5:54 ` Mike Galbraith
0 siblings, 1 reply; 14+ messages in thread
From: Dmitry Adamushko @ 2007-04-09 0:23 UTC (permalink / raw)
To: Mike Galbraith; +Cc: Linux Kernel
> [...]
> Well, it's a late hour, so maybe I'm missing something... but it does
> look to be HZ and "will run" time interval related issue. Like
> described in (*). Or maybe we both observe similar situations but have
> different reasons behind them.
I meant that account_user_time(), like scheduler_tick(), is called from
the timer ISR -> update_process_times(). So if a task's running
intervals are shorter than 1/HZ, its time is not always accounted -->
the cpu% may be wrong for such a task...
--
Best regards,
Dmitry Adamushko
* Re: SD scheduler testing hitch
2007-04-09 0:23 ` Dmitry Adamushko
@ 2007-04-09 5:54 ` Mike Galbraith
0 siblings, 0 replies; 14+ messages in thread
From: Mike Galbraith @ 2007-04-09 5:54 UTC (permalink / raw)
To: Dmitry Adamushko; +Cc: Linux Kernel
On Mon, 2007-04-09 at 02:23 +0200, Dmitry Adamushko wrote:
> > [...]
> > Well, it's a late hour, so maybe I'm missing something... but it does
> > look to be HZ and "will run" time interval related issue. Like
> > described in (*). Or maybe we both observe similar situations but have
> > different reasons behind them.
>
> I meant that account_user_time() is also called from timer_ISR ->
> update_process_times() like scheduler_tick(). So if task's running
> intervals are shorter than 1/HZ, it's not always accounted --> so cpu%
> may be wrong for such a task...
I think you're right wrt percentages, and that makes accurate
measurement of SD fairness difficult. However, total runtime for user
tasks should be pretty accurate for kernels that use nanoseconds,
because it is added every time a task passes through schedule().
BTW, the aberration I noticed with my unverified "testcase" does _seem_
to be repeatable here. Once behavior changes, after a reboot the
repeatability returns. I have no idea what's going on, but something is
sure fishy.
-Mike
end of thread [2007-04-09 5:54 UTC]
Thread overview: 14+ messages
2007-04-04 14:04 [sched] redundant reschedule when set_user_nice() boosts a prio of a task from the "expired" array Dmitry Adamushko
2007-04-04 14:15 ` Ingo Molnar
2007-04-04 15:23 ` Dmitry Adamushko
2007-04-04 20:05 ` [PATCH] " Dmitry Adamushko
2007-04-07 0:03 ` Andrew Morton
2007-04-07 9:16 ` Dmitry Adamushko
2007-04-07 9:24 ` Ingo Molnar
2007-04-07 16:20 ` SD scheduler testing hitch Mike Galbraith
2007-04-07 17:17 ` Mike Galbraith
2007-04-08 8:02 ` Mike Galbraith
2007-04-09 0:14 ` Dmitry Adamushko
2007-04-09 0:23 ` Dmitry Adamushko
2007-04-09 5:54 ` Mike Galbraith
2007-04-07 9:19 ` [PATCH] [sched] redundant reschedule when set_user_nice() boosts a prio of a task from the "expired" array Ingo Molnar