public inbox for linux-kernel@vger.kernel.org
* [sched] redundant reschedule when set_user_nice() boosts a prio of a task from the "expired" array
@ 2007-04-04 14:04 Dmitry Adamushko
  2007-04-04 14:15 ` Ingo Molnar
  0 siblings, 1 reply; 14+ messages in thread
From: Dmitry Adamushko @ 2007-04-04 14:04 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Andrew Morton, Linux Kernel

Hello,

Scenario:

The currently running [task1] boosts the priority (lowers the static_prio)
of [task2] via { setpriority -> set_user_nice } and [task2] happens to be
in the "expired" array at the moment.

In set_user_nice(), "delta" is negative (the prio is boosted) and,
hence, resched_task(rq->curr) is called.

As [task2] is in the "expired" array and there are still tasks (at
least [task1]) in the "active" one, the triggered reschedule is simply
useless (i.e. control just goes back to [task1]).

Am I missing something?

The same is applicable to rt_mutex_setprio().

Of course, not a big deal, but it's easily avoidable, e.g. (delta < 0
&& array == rq->active).
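A minimal user-space model of the two conditions may make the scenario concrete. It assumes the reschedule test that set_user_nice() uses in 2.6.21-rc5, delta < 0 || (delta > 0 && task_running(rq, p)), as quoted in the patch later in this thread. The enum and function names are hypothetical, and nothing here is kernel code.

```c
#include <stdbool.h>

/* Which of the two O(1) priority arrays a queued task sits on (model only). */
enum array_id { ARRAY_ACTIVE, ARRAY_EXPIRED };

/*
 * Condition set_user_nice() uses today: any boost (delta < 0) reschedules,
 * as does lowering the priority (delta > 0) of the running task.
 */
static bool resched_now(int delta, bool running)
{
	return delta < 0 || (delta > 0 && running);
}

/*
 * Suggested condition: a boost only reschedules when the task is on the
 * active array; an expired-array task cannot run before the arrays are
 * switched, so waking the scheduler just hands the CPU back to curr.
 */
static bool resched_suggested(int delta, bool running, enum array_id array)
{
	return (delta < 0 && array == ARRAY_ACTIVE) || (delta > 0 && running);
}
```

With [task2] boosted (delta < 0) while sitting on the expired array, resched_now() returns true and resched_suggested() returns false, which is exactly the redundant reschedule described above.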

Thanks in advance for any comments.

-- 
Best regards,
Dmitry Adamushko

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [sched] redundant reschedule when set_user_nice() boosts a prio of a task from the "expired" array
  2007-04-04 14:04 [sched] redundant reschedule when set_user_nice() boosts a prio of a task from the "expired" array Dmitry Adamushko
@ 2007-04-04 14:15 ` Ingo Molnar
  2007-04-04 15:23   ` Dmitry Adamushko
  2007-04-04 20:05   ` [PATCH] " Dmitry Adamushko
  0 siblings, 2 replies; 14+ messages in thread
From: Ingo Molnar @ 2007-04-04 14:15 UTC (permalink / raw)
  To: Dmitry Adamushko; +Cc: Andrew Morton, Linux Kernel


* Dmitry Adamushko <dmitry.adamushko@gmail.com> wrote:

> Hello,
> 
> Scenario:
> 
> The currently running [task1] boosts the priority (lowers the static_prio)
> of [task2] via { setpriority -> set_user_nice } and [task2] happens to be
> in the "expired" array at the moment.
> 
> In set_user_nice(), "delta" is negative (the prio is boosted) and,
> hence, resched_task(rq->curr) is called.
> 
> As [task2] is in the "expired" array and there are still tasks (at
> least [task1]) in the "active" one, the triggered reschedule is simply
> useless (i.e. control just goes back to [task1]).
> 
> Am I missing something?
> 
> The same is applicable to rt_mutex_setprio().
> 
> Of course, not a big deal, but it's easily avoidable, e.g. (delta < 0 
> && array == rq->active).

i think you are right and a micro-optimization could be done here. Would 
you like to do a patch for this?

	Ingo


* Re: [sched] redundant reschedule when set_user_nice() boosts a prio of a task from the "expired" array
  2007-04-04 14:15 ` Ingo Molnar
@ 2007-04-04 15:23   ` Dmitry Adamushko
  2007-04-04 20:05   ` [PATCH] " Dmitry Adamushko
  1 sibling, 0 replies; 14+ messages in thread
From: Dmitry Adamushko @ 2007-04-04 15:23 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Andrew Morton, Linux Kernel

On 04/04/07, Ingo Molnar <mingo@elte.hu> wrote:
>
> * Dmitry Adamushko <dmitry.adamushko@gmail.com> wrote:
> > [...]
> >
> > The same is applicable to rt_mutex_setprio().
> >
> > Of course, not a big deal, but it's easily avoidable, e.g. (delta < 0
> > && array == rq->active).
>
> i think you are right and a micro-optimization could be done here. Would
> you like to do a patch for this?

Yes, I'll do it.

In fact, "delta < 0 && array == rq->active" is also sub-optimal: a boosted
task may still have a lower priority than the current one, in which case
the reschedule is useless as well.

"TASK_PREEMPTS_CURR(p, rq) && array == rq->active" seems to be OK... or
maybe TASK_PREEMPTS_CURR() itself should check for "p->array ==
rq->active"...

I'll come up with a solution.
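To illustrate why the first suggestion is still sub-optimal, here is a hypothetical user-space model (the names are made up; as in the kernel, a lower prio value means a higher priority). A task boosted from prio 130 to 125 on the active array satisfies delta < 0, yet it still cannot preempt a current task at prio 120; a TASK_PREEMPTS_CURR-style comparison against curr catches that case.

```c
#include <stdbool.h>

enum array_id { ARRAY_ACTIVE, ARRAY_EXPIRED };

/* Model of a queued task: effective prio (lower = higher priority) and array. */
struct task_model {
	int prio;
	enum array_id array;
};

/* First suggestion: any boost of an active-array task reschedules. */
static bool boost_rule(int delta, const struct task_model *p)
{
	return delta < 0 && p->array == ARRAY_ACTIVE;
}

/*
 * Refined suggestion: reschedule only if p would actually preempt the
 * current task, i.e. it now has a better prio than curr AND it is on
 * the active array.
 */
static bool preempt_rule(const struct task_model *p, int curr_prio)
{
	return p->prio < curr_prio && p->array == ARRAY_ACTIVE;
}
```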

Thanks.

>
>         Ingo
>

-- 
Best regards,
Dmitry Adamushko


* [PATCH] [sched] redundant reschedule when set_user_nice() boosts a prio of a task from the "expired" array
  2007-04-04 14:15 ` Ingo Molnar
  2007-04-04 15:23   ` Dmitry Adamushko
@ 2007-04-04 20:05   ` Dmitry Adamushko
  2007-04-07  0:03     ` Andrew Morton
  2007-04-07  9:19     ` [PATCH] [sched] redundant reschedule when set_user_nice() boosts a prio of a task from the "expired" array Ingo Molnar
  1 sibling, 2 replies; 14+ messages in thread
From: Dmitry Adamushko @ 2007-04-04 20:05 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Andrew Morton, Linux Kernel

Ingo,

following the conversation on "a redundant reschedule call in set_user_nice()",
here is a possible approach.

The patch is somewhat intrusive as it even dares to adapt TASK_PREEMPTS_CURR().

Nevertheless, this adaptation seems to be ok with all the current use-cases.

Presupposition: TASK_PREEMPTS_CURR(p, rq) will /never/ be used as "a
mere prio comparator" - e.g. to make decisions on which array a task
has to be placed in.


=====

o  Make TASK_PREEMPTS_CURR(task, rq) return "true" only if the task's
prio is higher than the current task's and the task is in the "active"
array.
This ensures we don't make redundant resched_task() calls when the
task is in the "expired" array (as may happen now in set_user_nice(),
rt_mutex_setprio() and pull_task());

o  generalise the conditions for a call to resched_task() in
set_user_nice(), rt_mutex_setprio() and sched_setscheduler()

Signed-off-by: Dmitry Adamushko <dmitry.adamushko@gmail.com>
--

--- linux-2.6.21-rc5/kernel/sched-orig.c        2007-04-04 18:26:19.000000000 +0200
+++ linux-2.6.21-rc5/kernel/sched.c     2007-04-04 18:26:43.000000000 +0200
@@ -168,7 +168,7 @@ unsigned long long __attribute__((weak))
                (MAX_BONUS / 2 + DELTA((p)) + 1) / MAX_BONUS - 1))

 #define TASK_PREEMPTS_CURR(p, rq) \
-       ((p)->prio < (rq)->curr->prio)
+       (((p)->prio < (rq)->curr->prio) && ((p)->array == (rq)->active))

 #define SCALE_PRIO(x, prio) \
        max(x * (MAX_PRIO - prio) / (MAX_USER_PRIO / 2), MIN_TIMESLICE)
@@ -3847,13 +3847,13 @@ void rt_mutex_setprio(struct task_struct
        struct prio_array *array;
        unsigned long flags;
        struct rq *rq;
-       int oldprio;
+       int delta;

        BUG_ON(prio < 0 || prio > MAX_PRIO);

        rq = task_rq_lock(p, &flags);

-       oldprio = p->prio;
+       delta = prio - p->prio;
        array = p->array;
        if (array)
                dequeue_task(p, array);
@@ -3869,13 +3869,10 @@ void rt_mutex_setprio(struct task_struct
                enqueue_task(p, array);
                /*
                 * Reschedule if we are currently running on this runqueue and
-                * our priority decreased, or if we are not currently running on
-                * this runqueue and our priority is higher than the current's
+                * our priority decreased, or if our priority became higher
+                * than the current's.
                 */
-               if (task_running(rq, p)) {
-                       if (p->prio > oldprio)
-                               resched_task(rq->curr);
-               } else if (TASK_PREEMPTS_CURR(p, rq))
+               if (TASK_PREEMPTS_CURR(p, rq) || (delta > 0 && task_running(rq, p)))
                        resched_task(rq->curr);
        }
        task_rq_unlock(rq, &flags);
@@ -3923,10 +3920,11 @@ void set_user_nice(struct task_struct *p
                enqueue_task(p, array);
                inc_raw_weighted_load(rq, p);
                /*
-                * If the task increased its priority or is running and
-                * lowered its priority, then reschedule its CPU:
+                * Reschedule if we are currently running on this runqueue and
+                * our priority decreased, or if our priority became higher
+                * than the current's.
                 */
-               if (delta < 0 || (delta > 0 && task_running(rq, p)))
+               if (TASK_PREEMPTS_CURR(p, rq) || (delta > 0 && task_running(rq, p)))
                        resched_task(rq->curr);
        }
 out_unlock:
@@ -4153,13 +4151,10 @@ recheck:
                __activate_task(p, rq);
                /*
                 * Reschedule if we are currently running on this runqueue and
-                * our priority decreased, or if we are not currently running on
-                * this runqueue and our priority is higher than the current's
+                * our priority decreased, or our priority became higher
+                * than the current's.
                 */
-               if (task_running(rq, p)) {
-                       if (p->prio > oldprio)
-                               resched_task(rq->curr);
-               } else if (TASK_PREEMPTS_CURR(p, rq))
+               if (TASK_PREEMPTS_CURR(p, rq) || (task_running(rq, p) && p->prio > oldprio))
                        resched_task(rq->curr);
        }
        __task_rq_unlock(rq);

=====
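The patched TASK_PREEMPTS_CURR() and the generalised reschedule condition from the patch above can be exercised outside the kernel with a small model. The struct definitions are minimal stand-ins invented for illustration (the real task_struct, rq and prio_array are far larger), and task_running() is simplified to rq->curr == p.

```c
#include <stdbool.h>

/* Minimal stand-ins for the kernel types the macro touches (model only). */
struct prio_array { int unused; };

struct task_struct {
	int prio;                  /* lower value = higher priority */
	struct prio_array *array;  /* array the task is queued on */
};

struct rq {
	struct task_struct *curr;
	struct prio_array *active;
	struct prio_array *expired;
};

/* The macro as changed by the patch. */
#define TASK_PREEMPTS_CURR(p, rq) \
	(((p)->prio < (rq)->curr->prio) && ((p)->array == (rq)->active))

/* Simplified model of task_running(): p is the current task of rq. */
static bool task_running(struct rq *rq, struct task_struct *p)
{
	return rq->curr == p;
}

/*
 * Generalised condition from set_user_nice()/rt_mutex_setprio():
 * p now preempts curr, or p is running and its priority got worse.
 */
static bool need_resched_after_prio_change(struct rq *rq,
					   struct task_struct *p, int delta)
{
	return TASK_PREEMPTS_CURR(p, rq) || (delta > 0 && task_running(rq, p));
}
```

The model also shows the presupposition stated in the description: the macro is no longer a plain priority comparison, since a task with a better prio that sits on the expired array yields false.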

-- 
Best regards,
Dmitry Adamushko


* Re: [PATCH] [sched] redundant reschedule when set_user_nice() boosts a prio of a task from the "expired" array
  2007-04-04 20:05   ` [PATCH] " Dmitry Adamushko
@ 2007-04-07  0:03     ` Andrew Morton
  2007-04-07  9:16       ` Dmitry Adamushko
  2007-04-07  9:24       ` Ingo Molnar
  2007-04-07  9:19     ` [PATCH] [sched] redundant reschedule when set_user_nice() boosts a prio of a task from the "expired" array Ingo Molnar
  1 sibling, 2 replies; 14+ messages in thread
From: Andrew Morton @ 2007-04-07  0:03 UTC (permalink / raw)
  To: Dmitry Adamushko; +Cc: Ingo Molnar, Linux Kernel, Con Kolivas, Mike Galbraith

On Wed, 4 Apr 2007 22:05:40 +0200 "Dmitry Adamushko" <dmitry.adamushko@gmail.com> wrote:

> Ingo,
> 
> following the conversation on "a redundant reschedule call in set_user_nice()",
> here is a possible approach.
> 
> The patch is somewhat intrusive as it even dares to adapt TASK_PREEMPTS_CURR().
> 
> Nevertheless, this adaptation seems to be ok with all the current use-cases.
> 
> Presupposition: TASK_PREEMPTS_CURR(p, rq) will /never/ be used as "a
> mere prio comparator" - e.g. to make decisions on which array a task
> has to be placed in.
> 
> 
> =====
> 
> o  Make TASK_PREEMPTS_CURR(task, rq) return "true" only if the task's
> prio is higher than the current task's and the task is in the "active"
> array.
> This ensures we don't make redundant resched_task() calls when the
> task is in the "expired" array (as may happen now in set_user_nice(),
> rt_mutex_setprio() and pull_task());
> 
> o  generalise the conditions for a call to resched_task() in
> set_user_nice(), rt_mutex_setprio() and sched_setscheduler()
> 

grief.  This patch conflicts seriously with the staircase scheduler in -mm.
So to merge it I need to

- apply it 
- then apply a revert-it-again patch
- then apply staircase
- then ask Con to cook up a staircase-based equivalent of your change.

so

- your code only gets publicly tested in its against-staircase version

- the against-mainline version will get merged without having been
  publicly tested outside of staircase

which is probably all OK for a 2.6.22-rc1 thing, provided Ingo can give a
confident ack.


Where are we at with staircase anyway?  Is it looking like a 2.6.22 thing? 
I don't personally think we've yet seen enough serious performance testing
to permit a merge, apart from other issues...



> --- linux-2.6.21-rc5/kernel/sched-orig.c        2007-04-04
> 18:26:19.000000000 +0200
> +++ linux-2.6.21-rc5/kernel/sched.c     2007-04-04 18:26:43.000000000 +0200
> @@ -168,7 +168,7 @@ unsigned long long __attribute__((weak))
>                 (MAX_BONUS / 2 + DELTA((p)) + 1) / MAX_BONUS - 1))
> 
>  #define TASK_PREEMPTS_CURR(p, rq) \
> -       ((p)->prio < (rq)->curr->prio)
> +       (((p)->prio < (rq)->curr->prio) && ((p)->array == (rq)->active))

Your patch was wordwrapped and had its tabs replaced with spaces.  Please
fix your email client.

(I might as well make that paragraph my .signature)



* Re: [PATCH] [sched] redundant reschedule when set_user_nice() boosts a prio of a task from the "expired" array
  2007-04-07  0:03     ` Andrew Morton
@ 2007-04-07  9:16       ` Dmitry Adamushko
  2007-04-07  9:24       ` Ingo Molnar
  1 sibling, 0 replies; 14+ messages in thread
From: Dmitry Adamushko @ 2007-04-07  9:16 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Ingo Molnar, Con Kolivas, Mike Galbraith, Linux Kernel

On 07/04/07, Andrew Morton <akpm@linux-foundation.org> wrote:
> On Wed, 4 Apr 2007 22:05:40 +0200 "Dmitry Adamushko"
> > [...]
> >
> > o  Make TASK_PREEMPTS_CURR(task, rq) return "true" only if the task's
> > prio is higher than the current task's and the task is in the "active"
> > array.
> > This ensures we don't make redundant resched_task() calls when the
> > task is in the "expired" array (as may happen now in set_user_nice(),
> > rt_mutex_setprio() and pull_task());
> >
> > o  generalise the conditions for a call to resched_task() in
> > set_user_nice(), rt_mutex_setprio() and sched_setscheduler()
> >
>
> grief.  This patch conflicts seriously with the staircase scheduler in -mm.
> So to merge it I need to
>
> - apply it
> - then apply a revert-it-again patch
> - then apply staircase
> - then ask Con to cook up a staircase-based equivalent of your change.

I'll make an SD-based version and send it to Con.


> so
>
> - your code only gets publicly tested in its against-staircase version
>
> - the against-mainline version will get merged without having been
>   publicly tested outside of staircase
>
> which is probably all OK for a 2.6.22-rc1 thing, provided Ingo can give a
> confident ack.

Ok, thanks.

BTW, just out of curiosity: the very first approach I was thinking of
was to move a task from the "expired" to the "active" array when its
priority is boosted (like rt_mutex_setprio() does for RT tasks).

Reasoning: getting a higher static_prio means getting an additional
timeslice quota, which could still be used during this rotation:

delta = task_timeslice(p->static_prio) - task_timeslice(old_static_prio)

Aha... (I'm looking at mainline now) another funny thing is that
time_slice is not immediately affected by a change of static_prio in
set_user_nice(). If a task is in the expired array, it will run the next
rotation with the *old* time_slice (i.e. the one calculated in
task_running_tick() before the task was put in the expired array, based
on the *old* static_prio).
In theory, set_user_nice() could adjust p->time_slice by "delta",
calculated as shown above... but it's no more than a minor inconsistency
(unless, of course, I'm missing something).
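The timeslice delta discussed above can be computed with a small model of the O(1) scheduler's scaling. The constants (MAX_RT_PRIO 100, 40 nice levels, DEF_TIMESLICE 100 ms, MIN_TIMESLICE 5 ms, and a 4x base for negative nice levels) are assumed to match 2.6.21-era sched.c with HZ=1000, so one jiffy is one millisecond. static_prio_timeslice_ms(), timeslice_delta_ms() and adjusted_time_slice_ms() are hypothetical helpers, and clamping the adjusted slice to the minimum is only one possible choice, not something proposed in this thread.

```c
/* Assumed 2.6.21-style constants, expressed in milliseconds (HZ=1000). */
#define MAX_RT_PRIO      100
#define MAX_PRIO         (MAX_RT_PRIO + 40)
#define MAX_USER_PRIO    (MAX_PRIO - MAX_RT_PRIO)
#define NICE_TO_PRIO(n)  (MAX_RT_PRIO + (n) + 20)
#define DEF_TIMESLICE_MS 100
#define MIN_TIMESLICE_MS 5

#define max(a, b) ((a) > (b) ? (a) : (b))
#define SCALE_PRIO(x, prio) \
	max((x) * (MAX_PRIO - (prio)) / (MAX_USER_PRIO / 2), MIN_TIMESLICE_MS)

/* Timeslice for a given static_prio: negative nice scales from 4x the base. */
static int static_prio_timeslice_ms(int static_prio)
{
	if (static_prio < NICE_TO_PRIO(0))
		return SCALE_PRIO(DEF_TIMESLICE_MS * 4, static_prio);
	return SCALE_PRIO(DEF_TIMESLICE_MS, static_prio);
}

/* delta = timeslice(new static_prio) - timeslice(old static_prio) */
static int timeslice_delta_ms(int old_nice, int new_nice)
{
	return static_prio_timeslice_ms(NICE_TO_PRIO(new_nice)) -
	       static_prio_timeslice_ms(NICE_TO_PRIO(old_nice));
}

/* Hypothetical: apply delta to the remaining slice, never below the minimum. */
static int adjusted_time_slice_ms(int time_slice, int old_nice, int new_nice)
{
	int ts = time_slice + timeslice_delta_ms(old_nice, new_nice);

	return ts < MIN_TIMESLICE_MS ? MIN_TIMESLICE_MS : ts;
}
```

Under these assumptions, renicing from 0 to -10 raises the full slice from 100 ms to 600 ms, so an expired task with 30 ms left would get 530 ms instead of running the next rotation with its old slice.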


>
> > --- linux-2.6.21-rc5/kernel/sched-orig.c        2007-04-04
> > 18:26:19.000000000 +0200
> > +++ linux-2.6.21-rc5/kernel/sched.c     2007-04-04 18:26:43.000000000 +0200
> > @@ -168,7 +168,7 @@ unsigned long long __attribute__((weak))
> >                 (MAX_BONUS / 2 + DELTA((p)) + 1) / MAX_BONUS - 1))
> >
> >  #define TASK_PREEMPTS_CURR(p, rq) \
> > -       ((p)->prio < (rq)->curr->prio)
> > +       (((p)->prio < (rq)->curr->prio) && ((p)->array == (rq)->active))
>
> Your patch was wordwrapped and had its tabs replaced with spaces.  Please
> fix your email client.

I apologize for this. Will fix.


-- 
Best regards,
Dmitry Adamushko


* Re: [PATCH] [sched] redundant reschedule when set_user_nice() boosts a prio of a task from the "expired" array
  2007-04-04 20:05   ` [PATCH] " Dmitry Adamushko
  2007-04-07  0:03     ` Andrew Morton
@ 2007-04-07  9:19     ` Ingo Molnar
  1 sibling, 0 replies; 14+ messages in thread
From: Ingo Molnar @ 2007-04-07  9:19 UTC (permalink / raw)
  To: Dmitry Adamushko; +Cc: Andrew Morton, Linux Kernel


* Dmitry Adamushko <dmitry.adamushko@gmail.com> wrote:

> following the conversation on "a redundant reschedule call in 
> set_user_nice()", here is a possible approach.
> 
> The patch is somewhat intrusive as it even dares to adapt 
> TASK_PREEMPTS_CURR().

looks good to me, but the patch seems seriously whitespace-damaged: all 
tabs were converted to spaces.

	Ingo


* Re: [PATCH] [sched] redundant reschedule when set_user_nice() boosts a prio of a task from the "expired" array
  2007-04-07  0:03     ` Andrew Morton
  2007-04-07  9:16       ` Dmitry Adamushko
@ 2007-04-07  9:24       ` Ingo Molnar
  2007-04-07 16:20         ` SD scheduler testing hitch Mike Galbraith
  1 sibling, 1 reply; 14+ messages in thread
From: Ingo Molnar @ 2007-04-07  9:24 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Dmitry Adamushko, Linux Kernel, Con Kolivas, Mike Galbraith


* Andrew Morton <akpm@linux-foundation.org> wrote:

> so
> 
> - your code only gets publicly tested in its against-staircase 
>   version
> 
> - the against-mainline version will get merged without having been
>   publicly tested outside of staircase
> 
> which is probably all OK for a 2.6.22-rc1 thing, provided Ingo can 
> give a confident ack.

it looks good to me - and once i get a non-whitespace-damaged patch i'll 
put it into -rt so we'll have testing. (this patch should have at most a 
latency impact, if we forget to preempt somewhere, and -rt users are 
quite touchy about latencies.)

> Where are we at with staircase anyway?  Is it looking like a 2.6.22 
> thing? I don't personally think we've yet seen enough serious 
> performance testing to permit a merge, apart from other issues...

yes, that's my thinking too at the moment. I'd also like to see a 
summary of 'open design questions' list from Mike (if Mike has 
time/energy for that?) - many questions were raised, a good number of 
them were answered, various changes done to SD but there's no good 
summary of the current state of affairs.

	Ingo


* SD scheduler testing hitch
  2007-04-07  9:24       ` Ingo Molnar
@ 2007-04-07 16:20         ` Mike Galbraith
  2007-04-07 17:17           ` Mike Galbraith
  0 siblings, 1 reply; 14+ messages in thread
From: Mike Galbraith @ 2007-04-07 16:20 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Andrew Morton, Dmitry Adamushko, Linux Kernel, Con Kolivas

[-- Attachment #1: Type: text/plain, Size: 5293 bytes --]

On Sat, 2007-04-07 at 11:24 +0200, Ingo Molnar wrote:
> * Andrew Morton <akpm@linux-foundation.org> wrote:

> > Where are we at with staircase anyway?  Is it looking like a 2.6.22 
> > thing? I don't personally think we've yet seen enough serious 
> > performance testing to permit a merge, apart from other issues...
> 
> yes, that's my thinking too at the moment. I'd also like to see a 
> summary of 'open design questions' list from Mike (if Mike has 
> time/energy for that?) - many questions were raised, a good number of 
> them were answered, various changes done to SD but there's no good 
> summary of the current state of affairs.

I'm working on it. I started testing fairness, but ran into a snag.

What I was testing was my theory that SD can't possibly be fair to
sleeping tasks because the differential between long burn short sleep
tasks and long sleep short burn tasks is tossed at the end of every
rotation.  That theory seems to be true, but here's the snag...

2.6.21-rc6-sd-0.39, box is 3GHz P4/HT

tenpercent: tenpercent.c compiled to run 1 10% duty cycle task.
100ms and friends:  tenpercent.c hard coded for N ms burn + 1 usec sleep.

taskset -c 1 ./tenpercent
taskset -c 1 ./100ms (or ilk)

top - 10:47:57 up  3:11, 13 users,  load average: 1.65, 1.63, 2.50

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  P COMMAND
 7357 root       9   0  1568  440  360 R   92  0.0  10:55.01 1 100ms
 7356 root       1   0  1568  444  360 S    8  0.0   1:00.01 1 tenpercent
 5557 root       1   0  164m  21m 4876 S    0  2.1   1:58.90 0 Xorg
 6343 root       3   0  2376 1068  768 R    0  0.1   2:51.19 0 top

top - 11:05:16 up  3:29, 13 users,  load average: 1.52, 1.50, 1.81

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  P COMMAND
 7395 root       5   0  1568  444  360 R   90  0.0   8:54.25 1 100ms
 7394 root       0 -10  1568  440  360 S   10  0.0   1:00.21 1 tenpercent
 6343 root       3   0  2376 1068  768 R    0  0.1   3:04.16 0 top
    1 root       1   0   736  288  240 S    0  0.0   0:00.90 0 init

top - 11:20:58 up  3:44, 13 users,  load average: 1.89, 1.87, 1.78

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  P COMMAND
 7429 root       2 -10  1568  444  360 R   92  0.0  12:03.81 1 100ms
 7428 root       0 -10  1568  444  360 R    8  0.0   1:00.08 1 tenpercent
 6343 root       3   0  2376 1068  768 R    1  0.1   3:19.36 0 top
    1 root       1   0   736  288  240 S    0  0.0   0:00.90 0 init

top - 12:22:27 up  4:46, 13 users,  load average: 1.90, 1.92, 1.94

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  P COMMAND
 8235 root       1 -20  1568  444  360 R   95  0.0  19:31.20 1 100ms
 8234 root       0 -20  1568  444  360 S    5  0.0   1:00.01 1 tenpercent
 6343 root       3   0  2376 1068  768 R    1  0.1   4:24.24 0 top
 4926 root       1   0  1820  632  544 S    0  0.1   0:02.34 0 hald-addon-stor

top - 13:38:22 up  6:02, 13 users,  load average: 1.53, 1.51, 1.51

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  P COMMAND
 8643 root       5   0  1564  444  360 R   93  0.0  12:15.49 1 50ms
 8642 root       1   0  1564  444  360 S    7  0.0   1:00.28 1 tenpercent
 6343 root       3   0  2376 1080  768 R    0  0.1   5:27.22 0 top
    1 root       1   0   736  288  240 S    0  0.0   0:00.91 0 init

top - 14:02:39 up  6:26, 13 users,  load average: 1.75, 1.71, 1.56

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  P COMMAND
 8726 root       5   0  1564  444  360 R   94  0.0  15:19.07 1 8ms
 8727 root       1   0  1564  444  360 R    6  0.0   1:00.11 1 tenpercent
 5557 root       1   0  164m  21m 4632 S    0  2.1   3:20.92 0 Xorg
 6079 root       1   0 31584  17m  12m S    0  1.7   0:04.35 0 konsole

top - 16:22:01 up  8:45, 13 users,  load average: 1.73, 1.81, 1.60

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  P COMMAND
10622 root       1   0  1428  264  212 R   98  0.0  10:00.43 1 xx
10621 root       1   0  1564  440  360 S    1  0.0   0:06.49 1 tenpercent
10423 root       3   0  2248 1052  764 R    0  0.1   0:27.45 0 top
    1 root       1   0   736  288  240 S    0  0.0   0:00.91 0 init

xx.c just tries to terminate the rotation if it gets preempted, and
seems to succeed.  It usually isn't this bad, but every few starts it
gets this bad.  I thought it might be screwing up the calibration of
tenpercent if xx started first, but I plugged it into tenp.c (attached)
after the calibration, and still see this every few starts.  It always
gets more cpu than it should, but sometimes it's extreme.

I have yet to see tenpercent start at 1 percent usage in many many
tries, but I just repeated it with the attached in seven tries.

xx.c

#include <stdio.h>
#include <sys/time.h>
#include <time.h>   /* nanosleep(), struct timespec */

#define max(a,b) ((a) > (b) ? (a) : (b))
#define min(a,b) ((a) < (b) ? (a) : (b))

int main(void)
{
    struct timeval then, now;
    struct timespec t = {0, 1000}, r;

    for(;;) {
        int t1, t2;
        short i;

        if (gettimeofday(&then, 0))
            break;
        for (i = 1; i > 0; i++);
        if (gettimeofday(&now, 0))
            break;
        t2 = max(then.tv_usec, now.tv_usec);
        t1 = min(then.tv_usec, now.tv_usec);
        if (t2 - t1 >= 1000 && nanosleep(&t, &r))
            break;
    }
    return 0;
}
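xx.c (and steal() in the attachment) decides whether it was preempted by comparing only the tv_usec fields via max()/min(), ignoring tv_sec. A pass that straddles a second boundary (e.g. tv_usec 999500 then 200, really 700 us) therefore looks like a 999300 us gap, while a gap of just over one second can look tiny. Whether that matters for the numbers above is unclear; the sketch below is an assumed alternative that measures the full interval with clock_gettime(CLOCK_MONOTONIC). elapsed_us() and steal_pass() are hypothetical names, and the volatile counter is an assumption made to stop the compiler from deleting the empty busy loop.

```c
#define _POSIX_C_SOURCE 199309L
#include <time.h>

/* Elapsed microseconds between two timespecs, using both fields. */
static long long elapsed_us(const struct timespec *then,
			    const struct timespec *now)
{
	return (long long)(now->tv_sec - then->tv_sec) * 1000000LL +
	       (now->tv_nsec - then->tv_nsec) / 1000;
}

/*
 * One pass of the "steal" loop: burn a little CPU and, if the pass took at
 * least threshold_us of wall time (i.e. we were probably preempted), take a
 * short nap. Returns -1 if a clock or sleep call fails, 0 otherwise.
 */
static int steal_pass(long long threshold_us)
{
	struct timespec then, now, nap = { 0, 1000 };  /* 1000 ns, as in xx.c */

	if (clock_gettime(CLOCK_MONOTONIC, &then))
		return -1;
	for (volatile int i = 0; i < 32767; i++)      /* busy loop */
		;
	if (clock_gettime(CLOCK_MONOTONIC, &now))
		return -1;
	if (elapsed_us(&then, &now) >= threshold_us && nanosleep(&nap, NULL))
		return -1;
	return 0;
}
```

Calling steal_pass(1000) in a loop would mirror xx.c; older glibc needs -lrt for clock_gettime(), as in the attachment's build line.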

[-- Attachment #2: fairtest.c --]
[-- Type: text/x-csrc, Size: 4379 bytes --]

// gcc -O2 -o tenp tenp.c -lrt
// code from interbench.c
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/time.h>  /* gettimeofday(), struct timeval */
/*
 * Start $forks processes that run for 10% cpu time each. Set this to
 * 15 * number of cpus for best effect.
 */
int forks = 1;

unsigned long run_us = 1000000000, sleep_us;
unsigned long loops_per_ms;

void terminal_error(const char *name)
{
	fprintf(stderr, "\n");
	perror(name);
	exit (1);
}

unsigned long long get_nsecs(struct timespec *myts)
{
	if (clock_gettime(CLOCK_REALTIME, myts))
		terminal_error("clock_gettime");
	/* widen first: tv_sec * 1e9 overflows a 32-bit time_t/long */
	return ((unsigned long long)myts->tv_sec * 1000000000ULL + myts->tv_nsec);
}

void burn_loops(unsigned long loops)
{
	unsigned long i;

	/*
	 * We need some magic here to prevent the compiler from optimising
	 * this loop away. Otherwise trying to emulate a fixed cpu load
	 * with this loop will not work.
	 */
	for (i = 0 ; i < loops ; i++)
	     asm volatile("" : : : "memory");
}

/* Use this many usecs of cpu time */
void burn_usecs(unsigned long usecs)
{
	unsigned long ms_loops;

	ms_loops = loops_per_ms / 1000 * usecs;
	burn_loops(ms_loops);
}

void microsleep(unsigned long long usecs)
{
	struct timespec req, rem;

	rem.tv_sec = rem.tv_nsec = 0;

	req.tv_sec = usecs / 1000000;
	req.tv_nsec = (usecs - (req.tv_sec * 1000000)) * 1000;
continue_sleep:
	if ((nanosleep(&req, &rem)) == -1) {
		if (errno == EINTR) {
			if (rem.tv_sec || rem.tv_nsec) {
				req.tv_sec = rem.tv_sec;
				req.tv_nsec = rem.tv_nsec;
				goto continue_sleep;
			}
			goto out;
		}
		terminal_error("nanosleep");
	}
out:
	return;
}

/*
 * In an unoptimised loop we try to benchmark how many meaningless loops
 * per second we can perform on this hardware to fairly accurately
 * reproduce certain percentage cpu usage
 */
void calibrate_loop(void)
{
	unsigned long long start_time, loops_per_msec, run_time = 0,
		min_run_us = run_us;
	unsigned long loops;
	struct timespec myts;
	int i;

	printf("Calibrating loop\n");
	loops_per_msec = 1000000;
redo:
	/* Calibrate to within 1% accuracy */
	while (run_time > 1010000 || run_time < 990000) {
		loops = loops_per_msec;
		start_time = get_nsecs(&myts);
		burn_loops(loops);
		run_time = get_nsecs(&myts) - start_time;
		loops_per_msec = (1000000 * loops_per_msec / run_time ? :
			loops_per_msec);
	}

	/* Rechecking after a pause increases reproducibility */
	microsleep(1);
	loops = loops_per_msec;
	start_time = get_nsecs(&myts);
	burn_loops(loops);
	run_time = get_nsecs(&myts) - start_time;

	/* Tolerate 5% difference on checking */
	if (run_time > 1050000 || run_time < 950000)
		goto redo;
	loops_per_ms=loops_per_msec;
	printf("Calibrating sleep interval\n");
	microsleep(1);
	/* Find the smallest time interval close to 1ms that we can sleep */
	for (i = 0; i < 100; i++) {
		start_time=get_nsecs(&myts);
		microsleep(1000);
		run_time=get_nsecs(&myts)-start_time;
		run_time /= 1000;
		if (run_time < run_us && run_us > 1000)
			run_us = run_time;
	}
	/* Then set run_us to that duration and sleep_us to 9 x that */
	sleep_us = run_us * 9;
	printf("Calibrating run interval\n");
	microsleep(1);
	/* Do a few runs to see what really gets us run_us runtime */
	for (i = 0; i < 100; i++) {
		start_time=get_nsecs(&myts);
		burn_usecs(run_us);
		run_time=get_nsecs(&myts)-start_time;
		run_time /= 1000;
		if (run_time < min_run_us && run_time > run_us)
			min_run_us = run_time;
	}
	if (min_run_us < run_us)
		run_us = run_us * run_us / min_run_us;
	printf("Each fork will run for %lu usecs and sleep for %lu usecs\n",
		run_us, sleep_us);
}


#define max(a,b) ((a) > (b) ? (a) : (b))
#define min(a,b) ((a) < (b) ? (a) : (b))

void steal(void)
{
    struct timeval then, now;
    struct timespec t = {0, 1000}, r;

    for(;;) {
        int t1, t2;
        short i;

        if (gettimeofday(&then, 0))
            break;
        for (i = 1; i > 0; i++);
        if (gettimeofday(&now, 0))
            break;
        t2 = max(then.tv_usec, now.tv_usec);
        t1 = min(then.tv_usec, now.tv_usec);
        if (t2 - t1 >= 1000 && nanosleep(&t, &r))
            break;
    }
}

int main(void){
	int i, child;

	calibrate_loop();
	printf("starting %d forks\n", forks);
	for(i = 0; i < forks; i++){
		if(!(child = fork()))
			break;
	}
	if (child)
		steal();
	else while(1){
		burn_usecs(run_us);
		microsleep(sleep_us);
	}
	return 0;
}


* Re: SD scheduler testing hitch
  2007-04-07 16:20         ` SD scheduler testing hitch Mike Galbraith
@ 2007-04-07 17:17           ` Mike Galbraith
  2007-04-08  8:02             ` Mike Galbraith
  0 siblings, 1 reply; 14+ messages in thread
From: Mike Galbraith @ 2007-04-07 17:17 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Andrew Morton, Dmitry Adamushko, Linux Kernel, Con Kolivas

[-- Attachment #1: Type: text/plain, Size: 915 bytes --]

On Sat, 2007-04-07 at 18:20 +0200, Mike Galbraith wrote:

> xx.c
> 
> #include <stdio.h>
> #include <sys/time.h>
> 
> #define max(a,b) ((a) > (b) ? (a) : (b))
> #define min(a,b) ((a) < (b) ? (a) : (b))
> 
> int main(void)
> {
>     struct timeval then, now;
>     struct timespec t = {0, 1000}, r;
> 
>     for(;;) {
>         int t1, t2;
>         short i;
> 
>         if (gettimeofday(&then, 0))
>             break;
>         for (i = 1; i > 0; i++);
>         if (gettimeofday(&now, 0))
>             break;
>         t2 = max(then.tv_usec, now.tv_usec);
>         t1 = min(then.tv_usec, now.tv_usec);
>         if (t2 - t1 >= 1000 && nanosleep(&t, &r))
>             break;
>     }
>     return 0;
> }

I lowered the time to 500us and ran at nice -10; it starves tenpercent
here every time (ran as taskset -c 1 nice -n -10 ./fairtest).  The
starving 10% duty cycle task has trouble getting even 1% CPU.

	-Mike

[-- Attachment #2: fairtest.c --]
[-- Type: text/x-csrc, Size: 4377 bytes --]

// gcc -O2 -o tenp tenp.c -lrt
// code from interbench.c
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/time.h>  /* gettimeofday(), struct timeval */
/*
 * Start $forks processes that run for 10% cpu time each. Set this to
 * 15 * number of cpus for best effect.
 */
int forks = 1;

unsigned long run_us = 1000000000, sleep_us;
unsigned long loops_per_ms;

void terminal_error(const char *name)
{
	fprintf(stderr, "\n");
	perror(name);
	exit (1);
}

unsigned long long get_nsecs(struct timespec *myts)
{
	if (clock_gettime(CLOCK_REALTIME, myts))
		terminal_error("clock_gettime");
	/* widen first: tv_sec * 1e9 overflows a 32-bit time_t/long */
	return ((unsigned long long)myts->tv_sec * 1000000000ULL + myts->tv_nsec);
}

void burn_loops(unsigned long loops)
{
	unsigned long i;

	/*
	 * We need some magic here to prevent the compiler from optimising
	 * this loop away. Otherwise trying to emulate a fixed cpu load
	 * with this loop will not work.
	 */
	for (i = 0 ; i < loops ; i++)
	     asm volatile("" : : : "memory");
}

/* Use this many usecs of cpu time */
void burn_usecs(unsigned long usecs)
{
	unsigned long ms_loops;

	ms_loops = loops_per_ms / 1000 * usecs;
	burn_loops(ms_loops);
}

void microsleep(unsigned long long usecs)
{
	struct timespec req, rem;

	rem.tv_sec = rem.tv_nsec = 0;

	req.tv_sec = usecs / 1000000;
	req.tv_nsec = (usecs - (req.tv_sec * 1000000)) * 1000;
continue_sleep:
	if ((nanosleep(&req, &rem)) == -1) {
		if (errno == EINTR) {
			if (rem.tv_sec || rem.tv_nsec) {
				req.tv_sec = rem.tv_sec;
				req.tv_nsec = rem.tv_nsec;
				goto continue_sleep;
			}
			goto out;
		}
		terminal_error("nanosleep");
	}
out:
	return;
}

/*
 * In an unoptimised loop we try to benchmark how many meaningless loops
 * per second we can perform on this hardware to fairly accurately
 * reproduce certain percentage cpu usage
 */
void calibrate_loop(void)
{
	unsigned long long start_time, loops_per_msec, run_time = 0,
		min_run_us = run_us;
	unsigned long loops;
	struct timespec myts;
	int i;

	printf("Calibrating loop\n");
	loops_per_msec = 1000000;
redo:
	/* Calibrate to within 1% accuracy */
	while (run_time > 1010000 || run_time < 990000) {
		loops = loops_per_msec;
		start_time = get_nsecs(&myts);
		burn_loops(loops);
		run_time = get_nsecs(&myts) - start_time;
		loops_per_msec = (1000000 * loops_per_msec / run_time ? :
			loops_per_msec);
	}

	/* Rechecking after a pause increases reproducibility */
	microsleep(1);
	loops = loops_per_msec;
	start_time = get_nsecs(&myts);
	burn_loops(loops);
	run_time = get_nsecs(&myts) - start_time;

	/* Tolerate 5% difference on checking */
	if (run_time > 1050000 || run_time < 950000)
		goto redo;
	loops_per_ms=loops_per_msec;
	printf("Calibrating sleep interval\n");
	microsleep(1);
	/* Find the smallest time interval close to 1ms that we can sleep */
	for (i = 0; i < 100; i++) {
		start_time=get_nsecs(&myts);
		microsleep(1000);
		run_time=get_nsecs(&myts)-start_time;
		run_time /= 1000;
		if (run_time < run_us && run_us > 1000)
			run_us = run_time;
	}
	/* Then set run_us to that duration and sleep_us to 9 x that */
	sleep_us = run_us * 9;
	printf("Calibrating run interval\n");
	microsleep(1);
	/* Do a few runs to see what really gets us run_us runtime */
	for (i = 0; i < 100; i++) {
		start_time=get_nsecs(&myts);
		burn_usecs(run_us);
		run_time=get_nsecs(&myts)-start_time;
		run_time /= 1000;
		if (run_time < min_run_us && run_time > run_us)
			min_run_us = run_time;
	}
	if (min_run_us < run_us)
		run_us = run_us * run_us / min_run_us;
	printf("Each fork will run for %lu usecs and sleep for %lu usecs\n",
		run_us, sleep_us);
}


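/*
 * Mike's "steal" load: spin for a short while, and if the spin took
 * 500us or more of wall-clock time, sleep briefly before spinning again.
 */
void steal(void)
{
	struct timeval then, now;
	/* 500 ns; nanosleep rounds this up to the timer granularity */
	struct timespec t = {0, 500}, r;

	for (;;) {
		long elapsed_us;
		/* volatile keeps the compiler from deleting the empty spin loop */
		volatile short i;

		if (gettimeofday(&then, 0))
			break;
		/* Spin until the short counter wraps negative (~32k iterations) */
		for (i = 1; i > 0; i++)
			;
		if (gettimeofday(&now, 0))
			break;
		/* Full elapsed time, so crossing a second boundary is handled */
		elapsed_us = (now.tv_sec - then.tv_sec) * 1000000L +
			     (now.tv_usec - then.tv_usec);
		if (elapsed_us >= 500 && nanosleep(&t, &r))
			break;
	}
}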

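int main(void)
{
	int i, child = 1;	/* nonzero: with forks == 0 the parent just runs steal() */

	calibrate_loop();
	printf("starting %d forks\n", forks);
	for (i = 0; i < forks; i++) {
		child = fork();
		if (child < 0)
			terminal_error("fork");
		if (!child)
			break;
	}
	if (child)
		steal();	/* parent: runs the steal() load */
	else
		while (1) {	/* children: run_us on, sleep_us (9 x run_us) off */
			burn_usecs(run_us);
			microsleep(sleep_us);
		}
	return 0;
}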

* Re: SD scheduler testing hitch
  2007-04-07 17:17           ` Mike Galbraith
@ 2007-04-08  8:02             ` Mike Galbraith
  2007-04-09  0:14               ` Dmitry Adamushko
  0 siblings, 1 reply; 14+ messages in thread
From: Mike Galbraith @ 2007-04-08  8:02 UTC
  To: Ingo Molnar; +Cc: Andrew Morton, Dmitry Adamushko, Linux Kernel, Con Kolivas

On Sat, 2007-04-07 at 19:17 +0200, Mike Galbraith wrote:

> I lowered the time to 500us, and ran at nice -10.. it starves tenpercent
> here every time.  (ran as taskset -c 1 nice -n -10 ./fairtest)  The
> starving 10% duty cycle task has trouble getting 1% CPU.

Hmm.  Playing with it some more today, it still happens, but it's not
very repeatable.  Something is odd.  I wonder if any readers running SD
will try it.

	-Mike


* Re: SD scheduler testing hitch
  2007-04-08  8:02             ` Mike Galbraith
@ 2007-04-09  0:14               ` Dmitry Adamushko
  2007-04-09  0:23                 ` Dmitry Adamushko
  0 siblings, 1 reply; 14+ messages in thread
From: Dmitry Adamushko @ 2007-04-09  0:14 UTC
  To: Mike Galbraith; +Cc: Ingo Molnar, Andrew Morton, Con Kolivas, Linux Kernel

On 08/04/07, Mike Galbraith <efault@gmx.de> wrote:
> On Sat, 2007-04-07 at 19:17 +0200, Mike Galbraith wrote:
>
> > I lowered the time to 500us, and ran at nice -10.. it starves tenpercent
> > here every time.  (ran as taskset -c 1 nice -n -10 ./fairtest)  The
> > starving 10% duty cycle task has trouble getting 1% CPU.
>

Something is odd, very odd indeed. But surprise-surprise, it does not
seem to be something specific to SD.

In short, the question is: can we always trust the statistics
provided by "top" (i.e. the way the kernel collects them)?

The tests are below. In the middle are some thoughts on how HZ and
the length of a task's CPU bursts may be connected to such behaviour.

The system: Pentium III Coppermine 750 MHz (IBM ThinkPad T21), 256 MB RAM.

I tested 3 configurations:

(1)  2.6.13-15 (default in SuSE 10)
(2)  2.6.19
(3)  2.6.21-rc5 + sd-0.39

TEST: plain tenp.c, i.e. without Mike's "steal" thingy (either as xx.c
or as part of the modified fairtest.c):

tenp   -- tenp.c with a single running copy;
tenp2  -- tenp.c with 2 running copies (1 additionally forked);
tenp15 -- 15 copies (SD only).


(1)  2.6.13-15

Tasks:  74 total,   1 running,  73 sleeping,   0 stopped,   0 zombie
Cpu(s):  8.6% us,  0.7% sy,  0.0% ni, 90.4% id,  0.0% wa,  0.3% hi,  0.0% si

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 5582 dimm      15   0  1460  428  348 S  6.0  0.2   0:02.03 tenp
 4047 messageb  17   0  3520 1584 1324 S  1.3  0.6   0:00.28 dbus-daemon


Tasks:  76 total,   1 running,  75 sleeping,   0 stopped,   0 zombie
Cpu(s): 14.9% us,  0.3% sy,  0.0% ni, 84.8% id,  0.0% wa,  0.0% hi,  0.0% si

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 5598 dimm      15   0  1460  428  348 S  7.2  0.2   0:01.42 tenp2
 5599 dimm      15   0  1460  432  352 S  6.9  0.2   0:00.87 tenp2
 5591 dimm      16   0  2108  988  764 R  0.3  0.4   0:00.47 top
    1 root      16   0   688  260  224 S  0.0  0.1   0:01.78 init

I repeated each test (tenp and tenp2) 7 times. All were OK.


Now the interesting part starts.

(2)  2.6.19

[ 2.1 ]

Tasks:  73 total,   1 running,  72 sleeping,   0 stopped,   0 zombie
Cpu(s):  1.3% us,  0.7% sy,  0.0% ni, 98.0% id,  0.0% wa,  0.0% hi,  0.0% si

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 8312 root      15   0 27168  14m 2128 S  0.7  5.6   0:08.29 X
 8640 dimm      15   0 28656  13m   9m S  0.7  5.4   0:03.44 konsole
 8813 dimm      15   0  1460  432  352 S  0.3  0.2   0:00.32 tenp
    1 root      15   0   696  268  228 S  0.0  0.1   0:01.12 init

[ 2.2 ]

Tasks:  73 total,   3 running,  70 sleeping,   0 stopped,   0 zombie
Cpu(s):  6.6% us,  0.7% sy,  0.0% ni, 92.7% id,  0.0% wa,  0.0% hi,  0.0% si

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 8816 dimm      15   0  1464  432  352 S  5.0  0.2   0:01.49 tenp
 8312 root      15   0 27168  14m 2128 R  1.7  5.6   0:09.08 X

See the difference between [ 2.1 ] and [ 2.2 ]?  [ 2.2 ] (the correct
case) happened 3 out of 10 times.

Now for tenp2

[ 2.3 ]

Tasks:  74 total,   1 running,  73 sleeping,   0 stopped,   0 zombie
Cpu(s): 14.6% us,  0.3% sy,  0.0% ni, 85.1% id,  0.0% wa,  0.0% hi,  0.0% si

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 8850 dimm      15   0  1460  432  352 S  6.6  0.2   0:01.32 tenp2
 8851 dimm      15   0  1460  112   32 S  6.3  0.0   0:00.77 tenp2
 8312 root      15   0 27168  14m 2128 S  0.7  5.6   0:11.73 X

[ 2.4 ]

Tasks:  74 total,   2 running,  72 sleeping,   0 stopped,   0 zombie
Cpu(s):  3.3% us,  0.3% sy,  0.0% ni, 96.3% id,  0.0% wa,  0.0% hi,  0.0% si

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 8312 root      15   0 27168  14m 2128 S  2.0  5.6   0:12.97 X
 8640 dimm      15   0 28748  13m   9m R  0.7  5.4   0:07.22 konsole
 8532 dimm      18   0  2476  416  268 S  0.3  0.2   0:00.04 gpg-agent
 8852 dimm      15   0  2116  996  772 R  0.3  0.4   0:00.27 top
 8859 dimm      15   0  1460  432  352 S  0.3  0.2   0:00.44 tenp2
 8860 dimm      15   0  1460  112   32 S  0.3  0.0   0:00.02 tenp2
    1 root      15   0   696  268  228 S  0.0  0.1   0:01.12 init

Again, [ 2.3 ] took place only 3 times.

Some observations:

/1/  In the "ok" cases ([ 2.2 ] and [ 2.3 ]), the "will run" and
"will sleep" times in tenp's calibration output look /higher/ than
average, e.g.

Each fork will run for 5863 usecs and sleep for 52767 usecs

vs. something in the range of

Each fork will run for 2392 usecs and sleep for 21528 usecs
Each fork will run for 3880 usecs and sleep for 34920 usecs

in most of the other cases (when tenp's CPU% is ~0.3).


/2/  HZ = 250 for 2.6.19, and I think it was still 1000 for 2.6.13
(argh, I forgot to check, and I'd like to avoid a reboot at this already
late hour... but I believe 1000 was still the default back then).

=============

(*)

HZ == 250 ==> the timer tick fires once every 4 ms. So a "will run for"
interval < 4 ms may well go unaccounted? :o)

The funny thing is that, at least in theory, if a task uses the CPU in
portions shorter than 1/HZ s that happen to be shifted relative to the
timer interrupts (-> scheduler_tick()), its time_slice does not decrease
at all (very theoretically), or just decreases more slowly: with other
load, the moments when the task runs should drift relative to the timer
interrupts and hit them from time to time, so it does get accounted
occasionally.

=============

(3)  2.6.21-rc5 + sd-0.39

Here the results are similar to (2).

Cpu(s): 10.6% us,  0.0% sy,  0.0% ni, 89.4% id,  0.0% wa,  0.0% hi,  0.0% si

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 3624 dimm      20   0  1460  428  352 S  5.0  0.2   0:01.50 tenp2
 3625 dimm      20   0  1460  108   32 S  4.6  0.0   0:00.76 tenp2
 2797 root      20   0 27112  13m 2128 S  0.3  5.6   0:18.62 X

Cpu(s):  2.0% us,  0.3% sy,  0.0% ni, 97.7% id,  0.0% wa,  0.0% hi,  0.0% si

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 3738 dimm      20   0  1460  432  352 S  0.3  0.2   0:00.60 tenp2
 3739 dimm      20   0  1460  112   32 S  0.3  0.0   0:00.18 tenp2

And now let's run both tenp and tenp15.

Look at "tenp" below:

/1/

 Tasks:  82 total,  10 running,  72 sleeping,   0 stopped,   0 zombie

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 3760 dimm      31   0  1464  112   32 S  7.9  0.0   0:01.82 tenp15
 3773 dimm      31   0  1460  432  352 R  7.2  0.2   0:01.24 tenp
 3758 dimm      31   0  1464  112   32 S  6.9  0.0   0:01.73 tenp15
 3759 dimm      31   0  1464  112   32 S  6.9  0.0   0:01.65 tenp15
 3757 dimm      31   0  1464  432  352 R  6.2  0.2   0:01.89 tenp15
 3762 dimm      31   0  1464  112   32 R  6.2  0.0   0:01.70 tenp15
 3763 dimm      31   0  1464  112   32 S  6.2  0.0   0:01.75 tenp15
 3765 dimm      31   0  1464  112   32 S  6.2  0.0   0:01.82 tenp15
 3767 dimm      31   0  1464  112   32 S  6.2  0.0   0:01.73 tenp15
 3764 dimm      31   0  1464  112   32 R  5.9  0.0   0:01.70 tenp15
 3769 dimm      31   0  1464  112   32 R  5.9  0.0   0:01.65 tenp15
 3761 dimm      31   0  1464  112   32 S  5.6  0.0   0:01.66 tenp15
 3766 dimm      31   0  1464  112   32 R  5.6  0.0   0:01.68 tenp15
 3768 dimm      31   0  1464  112   32 R  5.6  0.0   0:01.77 tenp15
 3771 dimm      31   0  1464  112   32 R  5.6  0.0   0:01.72 tenp15
 3770 dimm      31   0  1464  112   32 S  5.2  0.0   0:01.54 tenp15
 3178 dimm      20   0 28996  13m   9m R  0.7  5.6   0:17.49 konsole


And now let's kill tenp15 so that tenp runs alone.

Cpu(s):  0.0% us,  0.3% sy,  0.0% ni, 99.7% id,  0.0% wa,  0.0% hi,  0.0% si

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 3773 dimm      20   0  1460  432  352 S  0.3  0.2   0:03.59 tenp
 3775 dimm      20   0  2120 1000  772 R  0.3  0.4   0:00.19 top

Strange.

It doesn't happen for tenp15 (each copy always consumes ~6%). I repeated
this about 10 times.

Well, it's a late hour, so maybe I'm missing something... but it does
look like an issue related to HZ and the "will run" interval, as
described in (*). Or maybe we are both observing similar situations
with different reasons behind them.


>         -Mike

-- 
Best regards,
Dmitry Adamushko

* Re: SD scheduler testing hitch
  2007-04-09  0:14               ` Dmitry Adamushko
@ 2007-04-09  0:23                 ` Dmitry Adamushko
  2007-04-09  5:54                   ` Mike Galbraith
  0 siblings, 1 reply; 14+ messages in thread
From: Dmitry Adamushko @ 2007-04-09  0:23 UTC
  To: Mike Galbraith; +Cc: Linux Kernel

> [...]
> Well, it's a late hour, so maybe I'm missing something... but it does
> look to be HZ and "will run" time interval related issue. Like
> described in (*). Or maybe we both observe similar situations but have
> different reasons behind them.

What I meant is that account_user_time(), just like scheduler_tick(),
is called from timer_ISR -> update_process_times(). So if a task's
running intervals are shorter than 1/HZ, it is not always accounted,
and its cpu% may be wrong...


-- 
Best regards,
Dmitry Adamushko

* Re: SD scheduler testing hitch
  2007-04-09  0:23                 ` Dmitry Adamushko
@ 2007-04-09  5:54                   ` Mike Galbraith
  0 siblings, 0 replies; 14+ messages in thread
From: Mike Galbraith @ 2007-04-09  5:54 UTC
  To: Dmitry Adamushko; +Cc: Linux Kernel

On Mon, 2007-04-09 at 02:23 +0200, Dmitry Adamushko wrote:
> > [...]
> > Well, it's a late hour, so maybe I'm missing something... but it does
> > look to be HZ and "will run" time interval related issue. Like
> > described in (*). Or maybe we both observe similar situations but have
> > different reasons behind them.
> 
> I meant that account_user_time() is also called from timer_ISR ->
> update_process_times() like scheduler_tick(). So if task's running
> intervals are shorter than 1/HZ, it's not always accounted --> so cpu%
> may be wrong for such a task...

I think you're right wrt percentages, and that's making accurate
measurement of SD fairness difficult.  However, total runtime for user
tasks should be pretty accurate for kernels that use nanoseconds,
because it is added every time a task passes through schedule().

BTW, the aberration I noticed with my unverified "testcase" does _seem_
to be repeatable here.  Once behavior changes, after a reboot the
repeatability returns.  I have no idea what's going on, but something is
sure fishy.

	-Mike


Thread overview: 14+ messages
2007-04-04 14:04 [sched] redundant reschedule when set_user_nice() boosts a prio of a task from the "expired" array Dmitry Adamushko
2007-04-04 14:15 ` Ingo Molnar
2007-04-04 15:23   ` Dmitry Adamushko
2007-04-04 20:05   ` [PATCH] " Dmitry Adamushko
2007-04-07  0:03     ` Andrew Morton
2007-04-07  9:16       ` Dmitry Adamushko
2007-04-07  9:24       ` Ingo Molnar
2007-04-07 16:20         ` SD scheduler testing hitch Mike Galbraith
2007-04-07 17:17           ` Mike Galbraith
2007-04-08  8:02             ` Mike Galbraith
2007-04-09  0:14               ` Dmitry Adamushko
2007-04-09  0:23                 ` Dmitry Adamushko
2007-04-09  5:54                   ` Mike Galbraith
2007-04-07  9:19     ` [PATCH] [sched] redundant reschedule when set_user_nice() boosts a prio of a task from the "expired" array Ingo Molnar
