public inbox for linux-kernel@vger.kernel.org
* Re: Fwd: Re: [patch][rfc] quell interactive feeding frenzy
       [not found] <200604112100.28725.kernel@kolivas.org>
@ 2006-04-11 17:03 ` Al Boldi
  2006-04-11 22:56   ` Con Kolivas
  0 siblings, 1 reply; 27+ messages in thread
From: Al Boldi @ 2006-04-11 17:03 UTC (permalink / raw)
  To: Con Kolivas; +Cc: linux-kernel, Mike Galbraith

Con Kolivas wrote:
> Hi Al

Hi Con!

> On Tuesday 11 April 2006 00:43, Al Boldi wrote:
> > After that the loadavg starts to wrap.
> > And even then it is possible to login.
> > And that's not with the default 2.6 scheduler, but rather w/ spa.
>
> Since you seem to use plugsched, I wonder if you could tell me how does
> current staircase perform with a load like that?

With plugsched-2.6.16 your staircase sched reaches about 40 then slows down, 
maxing around 100.  Setting sched_compute=1 causes console lock-ups.

With staircase14.2-test3 it reaches around 300 then slows down, halting at 
around 500.

Your scheduler seems to be tuned for single-user multi-tasking, i.e. 
concurrent tasks around 10, where its aggressive nature is sustained by a 
short run-queue.  Once you go above 50, this aggressiveness starts to 
express itself as very jumpy behavior.

This is of course very cpu/mem/ctxt dependent, and it would be great if your 
scheduler could maybe do some simple on-the-fly benchmarking as it 
reschedules, thus adjusting this aggressiveness depending on its 
sustainability.

Thanks!

--
Al


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [patch][rfc] quell interactive feeding frenzy
  2006-04-11 17:03 ` Fwd: Re: [patch][rfc] quell interactive feeding frenzy Al Boldi
@ 2006-04-11 22:56   ` Con Kolivas
  2006-04-12  5:41     ` Al Boldi
  0 siblings, 1 reply; 27+ messages in thread
From: Con Kolivas @ 2006-04-11 22:56 UTC (permalink / raw)
  To: Al Boldi, ck list; +Cc: linux-kernel, Mike Galbraith

On Wednesday 12 April 2006 03:03, Al Boldi wrote:
> With plugsched-2.6.16 your staircase sched reaches about 40 then slows
> down, maxing around 100.  Setting sched_compute=1 causes console lock-ups.

Which is fine because sched_compute isn't designed for heavily multithreaded 
usage.

> With staircase14.2-test3 it reaches around 300 then slows down, halting at
> around 500.

Oh that's good because staircase14.2_test3 is basically staircase15 which is 
in the current plugsched (ie newer than the staircase you tested in 
plugsched-2.6.16 above). So it tolerates a load of up to 500 on single cpu? 
That seems very robust to me. 

> Your scheduler seems to be tuned for single-user multi-tasking, i.e.
> concurrent tasks around 10, where its aggressive nature is sustained by a
> short run-queue.  Once you go above 50, this aggressiveness starts to
> express itself as very jumpy behavior.

Oh no it's nothing like "tuned for single-user multi tasking". It seems a 
common misconception because interactivity is a prime concern for staircase 
but the idea is that we should be able to do interactivity without 
sacrificing fairness. The same mechanism that is responsible for maintaining 
fairness is also responsible for creating its interactivity. That's what I 
mean by "interactive by design", and what makes it different from extracting 
interactivity out of other designs that have some form of estimator to add 
unfairness to create that interactivity.

> This is of course very cpu/mem/ctxt dependent, and it would be great if
> your scheduler could maybe do some simple on-the-fly benchmarking as it
> reschedules, thus adjusting this aggressiveness depending on its
> sustainability.

I know you're _very_ keen on the idea of some autotuning but I think this is 
the wrong thing to autotune. The whole point of staircase is it's a simple 
design without any interactivity estimator. It uses pure cpu accounting to 
change priority and that is a percentage which is effectively already tuned 
to the underlying cpu. Any benchmarking/aggressiveness "tuning" would undo 
the (effectively) very simple design. 

Feel free to look at the code. Sleep for time Y, increase priority by 
Y/RR_INTERVAL. Run for time X, drop priority by X/RR_INTERVAL. If it drops to 
lowest priority it then jumps back up to best priority again (to prevent it 
being "batch starved").

Thanks very much for testing :)

-- 
-ck

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [patch][rfc] quell interactive feeding frenzy
  2006-04-11 22:56   ` Con Kolivas
@ 2006-04-12  5:41     ` Al Boldi
  2006-04-12  6:22       ` Con Kolivas
  0 siblings, 1 reply; 27+ messages in thread
From: Al Boldi @ 2006-04-12  5:41 UTC (permalink / raw)
  To: Con Kolivas, ck list; +Cc: linux-kernel, Mike Galbraith

Con Kolivas wrote:
> On Wednesday 12 April 2006 03:03, Al Boldi wrote:
> > With plugsched-2.6.16 your staircase sched reaches about 40 then slows
> > down, maxing around 100.  Setting sched_compute=1 causes console
> > lock-ups.
>
> Which is fine because sched_compute isn't designed for heavily
> multithreaded usage.

What's it good for?

> > With staircase14.2-test3 it reaches around 300 then slows down, halting
> > at around 500.
>
> Oh that's good because staircase14.2_test3 is basically staircase15 which
> is in the current plugsched (ie newer than the staircase you tested in
> plugsched-2.6.16 above). So it tolerates a load of up to 500 on single
> cpu? That seems very robust to me.

Yes, better than the default 2.6 scheduler.

> > Your scheduler seems to be tuned for single-user multi-tasking, i.e.
> > concurrent tasks around 10, where its aggressive nature is sustained by
> > a short run-queue.  Once you go above 50, this aggressiveness starts to
> > express itself as very jumpy behavior.
>
> Oh no it's nothing like "tuned for single-user multi tasking". It seems a
> common misconception because interactivity is a prime concern for
> staircase but the idea is that we should be able to do interactivity
> without sacrificing fairness.

Agreed.

> The same mechanism that is responsible for
> maintaining fairness is also responsible for creating its interactivity.
> That's what I mean by "interactive by design", and what makes it different
> from extracting interactivity out of other designs that have some form of
> estimator to add unfairness to create that interactivity.

Yes, but staircase isn't really fair, and it's definitely not smooth.  You 
are trying to get ia by aggressively attacking priority which kills 
smoothness, and is only fair with a short run-queue.

> > This is of course very cpu/mem/ctxt dependent, and it would be great if
> > your scheduler could maybe do some simple on-the-fly benchmarking as it
> > reschedules, thus adjusting this aggressiveness depending on its
> > sustainability.
>
> I know you're _very_ keen on the idea of some autotuning but I think this
> is the wrong thing to autotune. The whole point of staircase is it's a
> simple design without any interactivity estimator. It uses pure cpu
> accounting to change priority and that is a percentage which is
> effectively already tuned to the underlying cpu. Any
> benchmarking/aggressiveness "tuning" would undo the (effectively) very
> simple design.

I like simple designs.  They tend to keep things to the point and aid 
efficiency.  But staircase doesn't look efficient to me under heavy load, 
and I would think this may be easily improved.

> Feel free to look at the code. Sleep for time Y, increase priority by
> Y/RR_INTERVAL. Run for time X, drop priority by X/RR_INTERVAL. If it drops
> to lowest priority it then jumps back up to best priority again (to
> prevent it being "batch starved").

Looks simple enough, and should work for short run'q's, but this looks 
unsustainable for long run'q's, due to the unconditional jump from lowest to 
best prio.  Making it conditional and maybe moderating X,Y,RR_INTERVAL could 
be helpful.

Also, can you export lowest/best prio as well as timeslice and friends to 
procfs/sysfs?

> Thanks very much for testing :)

Thank you!

--
Al


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [patch][rfc] quell interactive feeding frenzy
  2006-04-12  5:41     ` Al Boldi
@ 2006-04-12  6:22       ` Con Kolivas
  2006-04-12  8:17         ` Al Boldi
  0 siblings, 1 reply; 27+ messages in thread
From: Con Kolivas @ 2006-04-12  6:22 UTC (permalink / raw)
  To: Al Boldi; +Cc: ck list, linux-kernel, Mike Galbraith

On Wed, 12 Apr 2006 03:41 pm, Al Boldi wrote:
> Con Kolivas wrote:
> > Which is fine because sched_compute isn't designed for heavily
> > multithreaded usage.
>
> What's it good for?

Single heavily cpu bound computationally intensive tasks (think rendering 
etc).

> > Oh that's good because staircase14.2_test3 is basically staircase15 which
> > is in the current plugsched (ie newer than the staircase you tested in
> > plugsched-2.6.16 above). So it tolerates a load of up to 500 on single
> > cpu? That seems very robust to me.
>
> Yes, better than the default 2.6 scheduler.
>
> > > Your scheduler seems to be tuned for single-user multi-tasking, i.e.
> > > concurrent tasks around 10, where its aggressive nature is sustained by
> > > a short run-queue.  Once you go above 50, this aggressiveness starts to
> > > express itself as very jumpy behavior.
> >
> > Oh no it's nothing like "tuned for single-user multi tasking". It seems a
> > common misconception because interactivity is a prime concern for
> > staircase but the idea is that we should be able to do interactivity
> > without sacrificing fairness.
>
> Agreed.
>
> > The same mechanism that is responsible for
> > maintaining fairness is also responsible for creating its interactivity.
> > That's what I mean by "interactive by design", and what makes it
> > different from extracting interactivity out of other designs that have
> > some form of estimator to add unfairness to create that interactivity.
>
> Yes, but staircase isn't really fair, and it's definitely not smooth.  You
> are trying to get ia by aggressively attacking priority which kills
> smoothness, and is only fair with a short run-queue.

Sorry I don't understand what you mean. Why do you say it's not fair (got a 
testcase?). What do you mean by "definitely not smooth". What is smoothness 
and on what workloads is it not smooth? Also by ia you mean what? 

> > I know you're _very_ keen on the idea of some autotuning but I think this
> > is the wrong thing to autotune. The whole point of staircase is it's a
> > simple design without any interactivity estimator. It uses pure cpu
> > accounting to change priority and that is a percentage which is
> > effectively already tuned to the underlying cpu. Any
> > benchmarking/aggressiveness "tuning" would undo the (effectively) very
> > simple design.
>
> I like simple designs.  They tend to keep things to the point and aid
> efficiency.  But staircase doesn't look efficient to me under heavy load,
> and I would think this may be easily improved.

Again I don't understand. Just how heavy a load is heavy? Your testcases are 
already in what I would call stratospheric range. I don't personally think a 
cpu scheduler should be optimised for load infinity. And how are you defining 
efficient? You say it doesn't "look" efficient? What "looks" inefficient 
about it?

> > Feel free to look at the code. Sleep for time Y, increase priority by
> > Y/RR_INTERVAL. Run for time X, drop priority by X/RR_INTERVAL. If it
> > drops to lowest priority it then jumps back up to best priority again (to
> > prevent it being "batch starved").
>
> Looks simple enough, and should work for short run'q's, but this looks
> unsustainable for long run'q's, due to the unconditional jump from lowest
> to best prio.

Looks? How? You've shown what I consider very long runqueues work fine 
already.

> Making it conditional and maybe moderating X,Y,RR_INTERVAL 
> could be helpful.

I think it works over all meaningful loads, and even into absurdly high load 
ranges.  I don't think the incredible simplicity that works over all that 
range should be undone just to optimise for loads even greater than that.

> Also, can you export  lowest/best prio as well as timeslice and friends to
> procfs/sysfs?

You want tunables? The only tunable in staircase is rr_interval which (in -ck) 
has an on/off for big/small (sched_compute) since most other numbers in 
between (in my experience) are pretty meaningless. I could export rr_interval 
directly instead... I've not seen a good argument for doing that. Got one? 
However there are no other tunables at all (just look at the code). All tasks 
of any nice level have available the whole priority range from 100-139 which 
appears as PRIO 0-39 on top. Limiting that (again) changes the semantics.

> > Thanks very much for testing :)
>
> Thank you!

And another round of thanks :) But many more questions.

--
-ck

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [patch][rfc] quell interactive feeding frenzy
  2006-04-12  6:22       ` Con Kolivas
@ 2006-04-12  8:17         ` Al Boldi
  2006-04-12  9:36           ` Con Kolivas
  0 siblings, 1 reply; 27+ messages in thread
From: Al Boldi @ 2006-04-12  8:17 UTC (permalink / raw)
  To: Con Kolivas; +Cc: ck list, linux-kernel, Mike Galbraith

Con Kolivas wrote:
> On Wed, 12 Apr 2006 03:41 pm, Al Boldi wrote:
> > Con Kolivas wrote:
> > > Which is fine because sched_compute isn't designed for heavily
> > > multithreaded usage.
> >
> > What's it good for?
>
> Single heavily cpu bound computationally intensive tasks (think rendering
> etc).

Why do you need a switch for that?

> > > The same mechanism that is responsible for
> > > maintaining fairness is also responsible for creating its
> > > interactivity. That's what I mean by "interactive by design", and what
> > > makes it different from extracting interactivity out of other designs
> > > that have some form of estimator to add unfairness to create that
> > > interactivity.
> >
> > Yes, but staircase isn't really fair, and it's definitely not smooth. 
> > You are trying to get ia by aggressively attacking priority which kills
> > smoothness, and is only fair with a short run-queue.
>
> Sorry I don't understand what you mean. Why do you say it's not fair (got
> a testcase?). What do you mean by "definitely not smooth". What is
> smoothness and on what workloads is it not smooth? Also by ia you mean
> what?

ia=interactivity i.e: responsiveness under high load.
smooth=not jumpy i.e: run '# gears & morph3d & reflect &' w/o stutter
fair=non hogging i.e: spreading cpu-load across tasks evenly (top d.1)

> > > I know you're _very_ keen on the idea of some autotuning but I think
> > > this is the wrong thing to autotune. The whole point of staircase is
> > > it's a simple design without any interactivity estimator. It uses pure
> > > cpu accounting to change priority and that is a percentage which is
> > > effectively already tuned to the underlying cpu. Any
> > > benchmarking/aggressiveness "tuning" would undo the (effectively) very
> > > simple design.
> >
> > I like simple designs.  They tend to keep things to the point and aid
> > efficiency.  But staircase doesn't look efficient to me under heavy
> > load, and I would think this may be easily improved.
>
> Again I don't understand. Just how heavy a load is heavy? Your testcases
> are already in what I would call stratospheric range. I don't personally
> think a cpu scheduler should be optimised for load infinity. And how are
> you defining efficient? You say it doesn't "look" efficient? What "looks"
> inefficient about it?

The idea here is to expose inefficiencies by driving the system into 
saturation, and although staircase is more efficient than the default 2.6 
scheduler, it is obviously less efficient than spa.

> > Also, can you export  lowest/best prio as well as timeslice and friends
> > to procfs/sysfs?
>
> You want tunables? The only tunable in staircase is rr_interval which (in
> -ck) has an on/off for big/small (sched_compute) since most other numbers
> in between (in my experience) are pretty meaningless. I could export
> rr_interval directly instead... I've not seen a good argument for doing
> that. Got one? 

Smoothness control, maybe?

> However there are no other tunables at all (just look at
> the code). All tasks of any nice level have available the whole priority
> range from 100-139 which appears as PRIO 0-39 on top. Limiting that
> (again) changes the semantics.

Yes, limiting this could change the semantics for the sake of fairness, it's 
up to you.

> And another round of thanks :) But many more questions.

No problem.

Thanks!

--
Al


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [patch][rfc] quell interactive feeding frenzy
  2006-04-12  8:17         ` Al Boldi
@ 2006-04-12  9:36           ` Con Kolivas
  2006-04-12 10:39             ` Al Boldi
  0 siblings, 1 reply; 27+ messages in thread
From: Con Kolivas @ 2006-04-12  9:36 UTC (permalink / raw)
  To: Al Boldi; +Cc: ck list, linux-kernel, Mike Galbraith

On Wednesday 12 April 2006 18:17, Al Boldi wrote:
> Con Kolivas wrote:
> > Single heavily cpu bound computationally intensive tasks (think rendering
> > etc).
>
> Why do you need a switch for that?

Because checking need_resched and reassessing priority at less regular 
intervals means less overhead, and there is always something else running on 
a PC.  At low loads the longer timeslices and delayed preemption contribute 
considerably to cache warmth and throughput. Comparing staircase's 
sched_compute mode on kernbench at "optimal loads" (make -j4 x num_cpus) 
showed the best throughput of all the schedulers tested.

> > Sorry I don't understand what you mean. Why do you say it's not fair (got
> > a testcase?). What do you mean by "definitely not smooth". What is
> > smoothness and on what workloads is it not smooth? Also by ia you mean
> > what?
>
> ia=interactivity i.e: responsiveness under high load.
> smooth=not jumpy i.e: run '# gears & morph3d & reflect &' w/o stutter

Installed and tested here just now. They run smoothly concurrently here. Are 
you testing on staircase15?

> fair=non hogging i.e: spreading cpu-load across tasks evenly (top d.1)

Only unblocked processes/threads where one depends on the other don't get 
equal share, which is as broken a testcase as relying on sched_yield. I have 
not seen a testcase demonstrating unfairness on current staircase. top shows 
me fair cpu usage.

> > Again I don't understand. Just how heavy a load is heavy? Your testcases
> > are already in what I would call stratospheric range. I don't personally
> > think a cpu scheduler should be optimised for load infinity. And how are
> > you defining efficient? You say it doesn't "look" efficient? What "looks"
> > inefficient about it?
>
> The idea here is to expose inefficiencies by driving the system into
> saturation, and although staircase is more efficient than the default 2.6
> scheduler, it is obviously less efficient than spa.

Where do you stop calling something saturation and start calling it absurd? By 
your reckoning staircase is stable to loads of 300 on one cpu. spa being 
stable to higher loads is hardly comparable given the interactivity disparity 
between it and staircase. A compromise is one that does both very well; not 
one perfectly and the other poorly.

> > You want tunables? The only tunable in staircase is rr_interval which (in
> > -ck) has an on/off for big/small (sched_compute) since most other numbers
> > in between (in my experience) are pretty meaningless. I could export
> > rr_interval directly instead... I've not seen a good argument for doing
> > that. Got one?
>
> Smoothness control, maybe?

Have to think about that one. I'm not seeing a smoothness issue.

> > However there are no other tunables at all (just look at
> > the code). All tasks of any nice level have available the whole priority
> > range from 100-139 which appears as PRIO 0-39 on top. Limiting that
> > (again) changes the semantics.
>
> Yes, limiting this could change the semantics for the sake of fairness,
> it's up to you.

There is no problem with fairness that I am aware of.

Thanks!

-- 
-ck

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [patch][rfc] quell interactive feeding frenzy
  2006-04-12  9:36           ` Con Kolivas
@ 2006-04-12 10:39             ` Al Boldi
  2006-04-12 11:27               ` Con Kolivas
  0 siblings, 1 reply; 27+ messages in thread
From: Al Boldi @ 2006-04-12 10:39 UTC (permalink / raw)
  To: Con Kolivas; +Cc: ck list, linux-kernel, Mike Galbraith

Con Kolivas wrote:
> On Wednesday 12 April 2006 18:17, Al Boldi wrote:
> > Con Kolivas wrote:
> > > Single heavily cpu bound computationally intensive tasks (think
> > > rendering etc).
> >
> > Why do you need a switch for that?
>
> Because checking need_resched and reassessing priority at less
> regular intervals means less overhead, and there is always something else
> running on a PC.  At low loads the longer timeslices and delayed preemption
> contribute considerably to cache warmth and throughput. Comparing
> staircase's sched_compute mode on kernbench at "optimal loads" (make -j4 x
> num_cpus) showed the best throughput of all the schedulers tested.

Great!

> > > Sorry I don't understand what you mean. Why do you say it's not fair
> > > (got a testcase?). What do you mean by "definitely not smooth". What
> > > is smoothness and on what workloads is it not smooth? Also by ia you
> > > mean what?
> >
> > ia=interactivity i.e: responsiveness under high load.
> > smooth=not jumpy i.e: run '# gears & morph3d & reflect &' w/o stutter
>
> Installed and tested here just now. They run smoothly concurrently here.
> Are you testing on staircase15?

staircase14.2-test3.  Are you testing w/ DRM?  If not, then all mesa requests 
will be queued into X, which then runs as one task (check top d.1).

> > fair=non hogging i.e: spreading cpu-load across tasks evenly (top d.1)
>
> Only unblocked processes/threads where one depends on the other don't get
> equal share, which is as broken a testcase as relying on sched_yield. I
> have not seen a testcase demonstrating unfairness on current staircase.
> top shows me fair cpu usage.

Try ping -A (10x).  top d.1 should show skewed times.  If you have a fast 
machine, you may have to increase the load.

> > > Again I don't understand. Just how heavy a load is heavy? Your
> > > testcases are already in what I would call stratospheric range. I
> > > don't personally think a cpu scheduler should be optimised for load
> > > infinity. And how are you defining efficient? You say it doesn't
> > > "look" efficient? What "looks" inefficient about it?
> >
> > The idea here is to expose inefficiencies by driving the system into
> > saturation, and although staircase is more efficient than the default
> > 2.6 scheduler, it is obviously less efficient than spa.
>
> Where do you stop calling something saturation and start calling it
> absurd? By your reckoning staircase is stable to loads of 300 on one cpu.
> spa being stable to higher loads is hardly comparable given the
> interactivity disparity between it and staircase. A compromise is one that
> does both very well; not one perfectly and the other poorly.
>
> > > You want tunables? The only tunable in staircase is rr_interval which
> > > (in -ck) has an on/off for big/small (sched_compute) since most other
> > > numbers in between (in my experience) are pretty meaningless. I could
> > > export rr_interval directly instead... I've not seen a good argument
> > > for doing that. Got one?
> >
> > Smoothness control, maybe?
>
> Have to think about that one. I'm not seeing a smoothness issue.
>
> > > However there are no other tunables at all (just look at
> > > the code). All tasks of any nice level have available the whole
> > > priority range from 100-139 which appears as PRIO 0-39 on top.
> > > Limiting that (again) changes the semantics.
> >
> > Yes, limiting this could change the semantics for the sake of fairness,
> > it's up to you.
>
> There is no problem with fairness that I am aware of.

Let's see after you retry the tests.

Thanks!

--
Al


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [patch][rfc] quell interactive feeding frenzy
  2006-04-12 10:39             ` Al Boldi
@ 2006-04-12 11:27               ` Con Kolivas
  2006-04-12 15:25                 ` Al Boldi
  0 siblings, 1 reply; 27+ messages in thread
From: Con Kolivas @ 2006-04-12 11:27 UTC (permalink / raw)
  To: Al Boldi; +Cc: ck list, linux-kernel, Mike Galbraith

On Wednesday 12 April 2006 20:39, Al Boldi wrote:
> Con Kolivas wrote:
> > Installed and tested here just now. They run smoothly concurrently here.
> > Are you testing on staircase15?
>
> staircase14.2-test3.  Are you testing w/ DRM?  If not, then all mesa
> requests will be queued into X, which then runs as one task (check top d.1)

Nvidia driver; all separate tasks in top.

> Try ping -A (10x).  top d.1 should show skewed times.  If you have a fast
> machine, you may have to increase the load.

Ran for a bit over 10 mins outside of X to avoid other tasks influencing 
results. I was too lazy to go to init 1.

ps -eALo pid,spid,user,priority,ni,pcpu,vsize,time,args

15648 15648 root      39   0  9.2  1740 00:01:03 ping -A localhost
15649 15649 root      28   0  9.8  1740 00:01:06 ping -A localhost
15650 15650 root      39   0  9.9  1744 00:01:07 ping -A localhost
15651 15651 root      39   0  9.3  1740 00:01:03 ping -A localhost
15652 15652 root      39   0 10.3  1740 00:01:10 ping -A localhost
15653 15653 root      39   0 10.8  1740 00:01:13 ping -A localhost
15654 15654 root      39   0 10.0  1740 00:01:08 ping -A localhost
15655 15655 root      39   0 10.5  1740 00:01:11 ping -A localhost
15656 15656 root      39   0  9.9  1740 00:01:07 ping -A localhost
15657 15657 root      39   0 10.2  1740 00:01:09 ping -A localhost

mean 68.7 seconds

range 63-73 seconds.

For a load that wakes up so frequently for such a short period of time, I think 
that is pretty fair cpu distribution over 10 mins.  Over shorter periods top 
is hopeless at representing accurate cpu usage, especially at low HZ settings 
of the kernel.  The cpu distribution ps shows there is pretty consistent 
across the tasks, and gives what I consider quite a fair distribution over 
10 mins.

-- 
-ck

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [patch][rfc] quell interactive feeding frenzy
  2006-04-12 11:27               ` Con Kolivas
@ 2006-04-12 15:25                 ` Al Boldi
  2006-04-13 11:51                   ` Con Kolivas
  2006-04-16  6:02                   ` Con Kolivas
  0 siblings, 2 replies; 27+ messages in thread
From: Al Boldi @ 2006-04-12 15:25 UTC (permalink / raw)
  To: Con Kolivas; +Cc: ck list, linux-kernel, Mike Galbraith

Con Kolivas wrote:
> On Wednesday 12 April 2006 20:39, Al Boldi wrote:
> > Con Kolivas wrote:
> > > Installed and tested here just now. They run smoothly concurrently
> > > here. Are you testing on staircase15?
> >
> > staircase14.2-test3.  Are you testing w/ DRM?  If not, then all mesa
> > requests will be queued into X, which then runs as one task (check top
> > d.1)
>
> Nvidia driver; all separate tasks in top.

On a 400MHz P2 w/ i810 DRM and kernel HZ=1000 it stutters.
You may want to compensate for nvidia w/ a few cpu-hogs.
How many gears fps do you get?

> > Try ping -A (10x).  top d.1 should show skewed times.  If you have a
> > fast machine, you may have to increase the load.
>
> Ran for a bit over 10 mins outside of X to avoid other tasks influencing
> results. I was too lazy to go to init 1.
>
> ps -eALo pid,spid,user,priority,ni,pcpu,vsize,time,args
>
> 15648 15648 root      39   0  9.2  1740 00:01:03 ping -A localhost
> 15649 15649 root      28   0  9.8  1740 00:01:06 ping -A localhost
> 15650 15650 root      39   0  9.9  1744 00:01:07 ping -A localhost
> 15651 15651 root      39   0  9.3  1740 00:01:03 ping -A localhost
> 15652 15652 root      39   0 10.3  1740 00:01:10 ping -A localhost
> 15653 15653 root      39   0 10.8  1740 00:01:13 ping -A localhost
> 15654 15654 root      39   0 10.0  1740 00:01:08 ping -A localhost
> 15655 15655 root      39   0 10.5  1740 00:01:11 ping -A localhost
> 15656 15656 root      39   0  9.9  1740 00:01:07 ping -A localhost
> 15657 15657 root      39   0 10.2  1740 00:01:09 ping -A localhost
>
> mean 68.7 seconds
>
> range 63-73 seconds.

Could this 10s skew be improved to around 1s to aid smoothness?

Thanks!

--
Al


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [patch][rfc] quell interactive feeding frenzy
  2006-04-12 15:25                 ` Al Boldi
@ 2006-04-13 11:51                   ` Con Kolivas
  2006-04-14  3:16                     ` Al Boldi
  2006-04-16  6:02                   ` Con Kolivas
  1 sibling, 1 reply; 27+ messages in thread
From: Con Kolivas @ 2006-04-13 11:51 UTC (permalink / raw)
  To: Al Boldi; +Cc: ck list, linux-kernel, Mike Galbraith

On Thursday 13 April 2006 01:25, Al Boldi wrote:
> Con Kolivas wrote:
> > Nvidia driver; all separate tasks in top.
>
> On a 400MHz P2 w/ i810 DRM and kernel HZ=1000 it stutters.
> You may want to compensate for nvidia w/ a few cpu-hogs.

I tried adding cpu hogs and it gets extremely slow very soon but still doesn't 
stutter here.

> How many gears fps do you get?

When those 3 are running concurrently (without any other cpu hogs) gears is 
showing 317 fps.

> > range 63-73 seconds.
>
> Could this 10s skew be improved to around 1s to aid smoothness?

I'm happy to try... but I doubt it. 10% difference over 10 tasks over 10 mins 
of tasks of that wake/sleep nature is pretty good IMO. I'll see if there's 
anywhere else I can make the cpu accounting any better. 

As an aside, note that sched_clock and nanosecond timing with TSC isn't 
actually used if you use the pm timer which undoes any high res accounting 
the cpu scheduler can do (I noticed this when playing with pm timer that 
sched_clock just returns jiffies resolution instead of real nanosecond res). 
This could undo any smoothness that good cpu accounting can do.

-- 
-ck

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [patch][rfc] quell interactive feeding frenzy
  2006-04-13 11:51                   ` Con Kolivas
@ 2006-04-14  3:16                     ` Al Boldi
  2006-04-15  7:05                       ` Con Kolivas
  0 siblings, 1 reply; 27+ messages in thread
From: Al Boldi @ 2006-04-14  3:16 UTC (permalink / raw)
  To: Con Kolivas; +Cc: ck list, linux-kernel, Mike Galbraith

[-- Attachment #1: Type: text/plain, Size: 2136 bytes --]

Con Kolivas wrote:
> On Thursday 13 April 2006 01:25, Al Boldi wrote:
> > Con Kolivas wrote:
> > > Nvidia driver; all separate tasks in top.
> >
> > On a 400MHz P2 w/ i810 DRM and kernel HZ=1000 it stutters.
> > You may want to compensate for nvidia w/ a few cpu-hogs.
>
> I tried adding cpu hogs and it gets extremely slow very soon but still
> doesn't stutter here.
>
> > How many gears fps do you get?
>
> When those 3 are running concurrently (without any other cpu hogs) gears
> is showing 317 fps.

Your machine is probably too fast to show the problem, as it has enough 
cpu cycles per timeslice to complete the request.

Can you try the attached mem-eater, passing it the number of KB to be eaten?

        i.e. '# while :; do ./eatm 9999 ; done' 

This will print the number of bytes eaten and the timing in ms.

Assuming timeslice=100ms, adjust the number of KB to be eaten such that the 
timing comes in under the timeslice (something like 60ms).  Switch to another 
VT and start another eatm w/ a KB count yielding more than the timeslice 
(something like 140ms).  This second eatm should starve completely after 
exceeding the timeslice.

This problem also exists in mainline, but it is able to break out of it to 
some extent.  Setting eatm kb to a timing larger than timeslice does not 
exhibit this problem.

> > > range 63-73 seconds.
> >
> > Could this 10s skew be improved to around 1s to aid smoothness?
>
> I'm happy to try... but I doubt it. 10% difference over 10 tasks over 10
> mins of tasks of that wake/sleep nature is pretty good IMO. I'll see if
> there's anywhere else I can make the cpu accounting any better.

Great!

> As an aside, note that sched_clock and nanosecond timing with TSC isn't
> actually used if you use the pm timer which undoes any high res accounting
> the cpu scheduler can do (I noticed this when playing with pm timer that
> sched_clock just returns jiffies resolution instead of real nanosecond
> res). This could undo any smoothness that good cpu accounting can do.

Yes, pm-timer looks rather broken, at least on my machine.  Too bad it's on 
by default, as I always have to turn it off.

Thanks!

--
Al



[-- Attachment #2: eatm.c --]
[-- Type: text/x-csrc, Size: 810 bytes --]

#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>

/* elapsed(1) records a start time; elapsed(0) returns ms since then. */
unsigned long elapsed(int start) {

	static struct timeval s,e;

	if (start) return gettimeofday(&s, NULL);

	gettimeofday(&e, NULL);

	return ((e.tv_sec - s.tv_sec) * 1000 + (e.tv_usec - s.tv_usec) / 1000);

}

int main(int argc, char **argv) {

    unsigned long int i,j,max;
    unsigned char *p;

    if (argc>1)
	max=atol(argv[1]);
    else
	max=0x60000;


    elapsed(1); 

    /* Eat whole megabytes, touching one byte per KB to force real allocation. */
    for (i=0;((i<max/1024) && (p = malloc(1024*1024)));i++) {
        for (j=0;j<1024;p[1024*j++]=0);
	fprintf(stderr,"\r%lu MB ",i+1);
    }

    /* Eat the remaining kilobytes one KB at a time. */
    for (j=max-(i*=1024);((i<max) && (p = malloc(1024)));i++) {
	*p = 0;
    }
    fprintf(stderr,"%lu KB ",j-(max-i));

    fprintf(stderr,"eaten in %lu msec (%lu MB/s)\n",elapsed(0),i/(elapsed(0)?:1)*1000/1024);

    return 0;
}


* Re: [patch][rfc] quell interactive feeding frenzy
  2006-04-14  3:16                     ` Al Boldi
@ 2006-04-15  7:05                       ` Con Kolivas
  2006-04-15 18:23                         ` [ck] " Michael Gerdau
                                           ` (2 more replies)
  0 siblings, 3 replies; 27+ messages in thread
From: Con Kolivas @ 2006-04-15  7:05 UTC (permalink / raw)
  To: Al Boldi; +Cc: ck list, linux-kernel, Mike Galbraith

On Friday 14 April 2006 13:16, Al Boldi wrote:
> Can you try the attached mem-eater passing it the number of kb to be eaten.
>
>         i.e. '# while :; do ./eatm 9999 ; done'
>
> This will print the number of bytes eaten and the timing in ms.
>
> Assuming timeslice=100, adjust the number of kb to be eaten such that the
> timing will be less than timeslice (something like 60ms).  Switch to
> another vt and start another eatm w/ the number of kb yielding more than
> timeslice (something like 140ms).  This eatm should starve completely after
> exceeding timeslice.
>
> This problem also exists in mainline, but it is able to break out of it to
> some extent.  Setting eatm kb to a timing larger than timeslice does not
> exhibit this problem.

Thanks for bringing this to my attention. A while back I had different 
management of forked tasks and merged it with PF_NONSLEEP. Since then I've 
changed the management of NONSLEEP tasks and didn't realise it had adversely 
affected the accounting of forking tasks. This patch should rectify it.

Thanks!
---
 include/linux/sched.h |    1 +
 kernel/sched.c        |    9 ++++++---
 2 files changed, 7 insertions(+), 3 deletions(-)

Index: linux-2.6.16-ck5/include/linux/sched.h
===================================================================
--- linux-2.6.16-ck5.orig/include/linux/sched.h	2006-04-15 16:32:18.000000000 +1000
+++ linux-2.6.16-ck5/include/linux/sched.h	2006-04-15 16:34:36.000000000 +1000
@@ -961,6 +961,7 @@ static inline void put_task_struct(struc
 #define PF_SWAPWRITE	0x01000000	/* Allowed to write to swap */
 #define PF_NONSLEEP	0x02000000	/* Waiting on in kernel activity */
 #define PF_ISOREF	0x04000000	/* SCHED_ISO task has used up quota */
+#define PF_FORKED	0x08000000	/* Task just forked another process */
 
 /*
  * Only the _current_ task can read/write to tsk->flags, but other
Index: linux-2.6.16-ck5/kernel/sched.c
===================================================================
--- linux-2.6.16-ck5.orig/kernel/sched.c	2006-04-15 16:32:18.000000000 +1000
+++ linux-2.6.16-ck5/kernel/sched.c	2006-04-15 16:34:35.000000000 +1000
@@ -18,7 +18,7 @@
  *  2004-04-02	Scheduler domains code by Nick Piggin
  *  2006-04-02	Staircase scheduling policy by Con Kolivas with help
  *		from William Lee Irwin III, Zwane Mwaikambo & Peter Williams.
- *		Staircase v15
+ *		Staircase v15_test2
  */
 
 #include <linux/mm.h>
@@ -809,6 +809,9 @@ static inline void recalc_task_prio(task
 	else
 		sleep_time = 0;
 
+	if (unlikely(p->flags & PF_FORKED))
+		sleep_time = 0;
+
 	/*
 	 * If we sleep longer than our running total and have not set the
 	 * PF_NONSLEEP flag we gain a bonus.
@@ -847,7 +850,7 @@ static void activate_task(task_t *p, run
 	p->time_slice = p->slice % rr ? : rr;
 	if (!rt_task(p)) {
 		recalc_task_prio(p, now);
-		p->flags &= ~PF_NONSLEEP;
+		p->flags &= ~(PF_NONSLEEP | PF_FORKED);
 		p->systime = 0;
 		p->prio = effective_prio(p);
 	}
@@ -1464,7 +1467,7 @@ void fastcall wake_up_new_task(task_t *p
 
 	/* Forked process gets no bonus to prevent fork bombs. */
 	p->bonus = 0;
-	current->flags |= PF_NONSLEEP;
+	current->flags |= PF_FORKED;
 
 	if (likely(cpu == this_cpu)) {
 		activate_task(p, rq, 1);
-- 
-ck


* Re: [ck] Re: [patch][rfc] quell interactive feeding frenzy
  2006-04-15  7:05                       ` Con Kolivas
@ 2006-04-15 18:23                         ` Michael Gerdau
  2006-04-15 20:45                         ` Al Boldi
  2006-04-15 22:32                         ` jos poortvliet
  2 siblings, 0 replies; 27+ messages in thread
From: Michael Gerdau @ 2006-04-15 18:23 UTC (permalink / raw)
  To: ck; +Cc: Con Kolivas, Al Boldi, Mike Galbraith, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 640 bytes --]

> Thanks for bringing this to my attention. A while back I had different 
> management of forked tasks and merged it with PF_NONSLEEP. Since then I've 
> changed the management of NONSLEEP tasks and didn't realise it had adversely 
> affected the accounting of forking tasks. This patch should rectify it.

[snip]

At least here this patch solves the "previously working" (i.e. starving)
testcase, i.e. now all eatm-processes continue to work.

Best,
Michael
-- 
 Vote against SPAM - see http://www.politik-digital.de/spam/
 Michael Gerdau       email: mgd@technosis.de
 GPG-keys available on request or at public keyserver

[-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --]


* Re: [patch][rfc] quell interactive feeding frenzy
  2006-04-15  7:05                       ` Con Kolivas
  2006-04-15 18:23                         ` [ck] " Michael Gerdau
@ 2006-04-15 20:45                         ` Al Boldi
  2006-04-15 23:22                           ` Con Kolivas
  2006-04-15 22:32                         ` jos poortvliet
  2 siblings, 1 reply; 27+ messages in thread
From: Al Boldi @ 2006-04-15 20:45 UTC (permalink / raw)
  To: Con Kolivas; +Cc: ck list, linux-kernel, Mike Galbraith

Con Kolivas wrote:
> On Friday 14 April 2006 13:16, Al Boldi wrote:
> > Can you try the attached mem-eater passing it the number of kb to be
> > eaten.
> >
> >         i.e. '# while :; do ./eatm 9999 ; done'
> >
> > This will print the number of bytes eaten and the timing in ms.
> >
> > Assuming timeslice=100, adjust the number of kb to be eaten such that
> > the timing will be less than timeslice (something like 60ms).  Switch to
> > another vt and start another eatm w/ the number of kb yielding more than
> > timeslice (something like 140ms).  This eatm should starve completely
> > after exceeding timeslice.
> >
> > This problem also exists in mainline, but it is able to break out of it
> > to some extent.  Setting eatm kb to a timing larger than timeslice does
> > not exhibit this problem.
>
> Thanks for bringing this to my attention. A while back I had different
> management of forked tasks and merged it with PF_NONSLEEP. Since then I've
> changed the management of NONSLEEP tasks and didn't realise it had
> adversely affected the accounting of forking tasks. This patch should
> rectify it.

Congrats!

Much smoother, but I still get this choke w/ 2 eatm 9999 loops running:

9 MB 783 KB eaten in 131 msec (74 MB/s)
9 MB 783 KB eaten in 129 msec (75 MB/s)
9 MB 783 KB eaten in 129 msec (75 MB/s)
9 MB 783 KB eaten in 131 msec (74 MB/s)
9 MB 783 KB eaten in 133 msec (73 MB/s)
9 MB 783 KB eaten in 132 msec (73 MB/s)
9 MB 783 KB eaten in 128 msec (76 MB/s)
9 MB 783 KB eaten in 133 msec (73 MB/s)
9 MB 783 KB eaten in 129 msec (75 MB/s)
9 MB 783 KB eaten in 130 msec (74 MB/s)
9 MB 783 KB eaten in 2416 msec (3 MB/s)		<<<<<<<<<<<<<
9 MB 783 KB eaten in 197 msec (48 MB/s)
9 MB 783 KB eaten in 133 msec (73 MB/s)
9 MB 783 KB eaten in 132 msec (73 MB/s)
9 MB 783 KB eaten in 132 msec (73 MB/s)
9 MB 783 KB eaten in 126 msec (77 MB/s)
9 MB 783 KB eaten in 135 msec (72 MB/s)
9 MB 783 KB eaten in 132 msec (73 MB/s)
9 MB 783 KB eaten in 132 msec (73 MB/s)
9 MB 783 KB eaten in 134 msec (72 MB/s)
9 MB 783 KB eaten in 64 msec (152 MB/s)
9 MB 783 KB eaten in 63 msec (154 MB/s)
9 MB 783 KB eaten in 63 msec (154 MB/s)
9 MB 783 KB eaten in 63 msec (154 MB/s)
9 MB 783 KB eaten in 63 msec (154 MB/s)
9 MB 783 KB eaten in 64 msec (152 MB/s)
9 MB 783 KB eaten in 63 msec (154 MB/s)
9 MB 783 KB eaten in 64 msec (152 MB/s)
9 MB 783 KB eaten in 63 msec (154 MB/s)
9 MB 783 KB eaten in 63 msec (154 MB/s)
9 MB 783 KB eaten in 63 msec (154 MB/s)
9 MB 783 KB eaten in 63 msec (154 MB/s)
9 MB 783 KB eaten in 63 msec (154 MB/s)

You may have to adjust the kb to get the same effect.

Thanks!

--
Al



* Re: [ck] Re: [patch][rfc] quell interactive feeding frenzy
  2006-04-15  7:05                       ` Con Kolivas
  2006-04-15 18:23                         ` [ck] " Michael Gerdau
  2006-04-15 20:45                         ` Al Boldi
@ 2006-04-15 22:32                         ` jos poortvliet
  2006-04-15 23:06                           ` Con Kolivas
  2 siblings, 1 reply; 27+ messages in thread
From: jos poortvliet @ 2006-04-15 22:32 UTC (permalink / raw)
  To: ck; +Cc: Con Kolivas, Al Boldi, Mike Galbraith, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1848 bytes --]

On Saturday 15 April 2006 09:05, Con Kolivas wrote:
> Thanks for bringing this to my attention. A while back I had different
> management of forked tasks and merged it with PF_NONSLEEP. Since then I've
> changed the management of NONSLEEP tasks and didn't realise it had
> adversely affected the accounting of forking tasks. This patch should
> rectify it.
>
> Thanks!

hey con, i get this:

can't find file to patch at input line 9
Perhaps you used the wrong -p or --strip option?
The text leading up to this was:
--------------------------
| include/linux/sched.h |    1 +
| kernel/sched.c        |    9 ++++++---
| 2 files changed, 7 insertions(+), 3 deletions(-)
|
|Index: linux-2.6.16-ck5/include/linux/sched.h
|===================================================================
|--- linux-2.6.16-ck5.orig/include/linux/sched.h 2006-04-15 16:32:18.000000000 
+1000
|+++ linux-2.6.16-ck5/include/linux/sched.h      2006-04-15 16:34:36.000000000 
+1000
--------------------------
File to patch: include/linux/sched.h
patching file include/linux/sched.h
patch: **** malformed patch at line 10:  #define 
PF_SWAPWRITE   0x01000000      /* Allowed to write to swap */

doesn't compile if i add the patch by hand... tried on 2.6.17-rc1-ck1 and on 
2.6.16-ck3. 

----------------
In file included from include/linux/mm.h:4,
                 from kernel/sched.c:24:
include/linux/sched.h:975:18: warning: missing whitespace after the macro name
kernel/sched.c: In function ‘recalc_task_prio’:
kernel/sched.c:820: error: stray ‘\194’ in program
-----------

i'm no coder at all, and have no idea what's goin on.

grtz

Jos

ps 2.6.17-rc2 won't be long i guess, maybe you can just roll this up in -ck...

-- 
You will have good luck and overcome many hardships.

[-- Attachment #2: Type: application/pgp-signature, Size: 191 bytes --]


* Re: [ck] Re: [patch][rfc] quell interactive feeding frenzy
  2006-04-15 22:32                         ` jos poortvliet
@ 2006-04-15 23:06                           ` Con Kolivas
  0 siblings, 0 replies; 27+ messages in thread
From: Con Kolivas @ 2006-04-15 23:06 UTC (permalink / raw)
  To: jos poortvliet; +Cc: ck, Al Boldi, Mike Galbraith, linux-kernel

On Sunday 16 April 2006 08:32, jos poortvliet wrote:
> On Saturday 15 April 2006 09:05, Con Kolivas wrote:
> > Thanks for bringing this to my attention. A while back I had different
> > management of forked tasks and merged it with PF_NONSLEEP. Since then
> > I've changed the management of NONSLEEP tasks and didn't realise it had
> > adversely affected the accounting of forking tasks. This patch should
> > rectify it.
> >
> > Thanks!
>
> hey con, i get this:
>
> can't find file to patch at input line 9
> Perhaps you used the wrong -p or --strip option?
> The text leading up to this was:

That's because it's not an attachment but inserted into the mail and your 
mailer is mangling it on extraction. Save the whole email unmodified (eg with 
a "save as" function) and use that as the patch. Don't fret as it's a 
non-critical fix since it's a corner case only and it will be in the next -ck 
anyway.

-- 
-ck


* Re: [patch][rfc] quell interactive feeding frenzy
  2006-04-15 20:45                         ` Al Boldi
@ 2006-04-15 23:22                           ` Con Kolivas
  2006-04-16 18:44                             ` [ck] " Andreas Mohr
  0 siblings, 1 reply; 27+ messages in thread
From: Con Kolivas @ 2006-04-15 23:22 UTC (permalink / raw)
  To: Al Boldi; +Cc: ck list, linux-kernel, Mike Galbraith

On Sunday 16 April 2006 06:45, Al Boldi wrote:
> Con Kolivas wrote:
> > Thanks for bringing this to my attention. A while back I had different
> > management of forked tasks and merged it with PF_NONSLEEP. Since then
> > I've changed the management of NONSLEEP tasks and didn't realise it had
> > adversely affected the accounting of forking tasks. This patch should
> > rectify it.
>
> Congrats!
>
> Much smoother, but I still get this choke w/ 2 eatm 9999 loops running:

> 9 MB 783 KB eaten in 130 msec (74 MB/s)
> 9 MB 783 KB eaten in 2416 msec (3 MB/s)		<<<<<<<<<<<<<
> 9 MB 783 KB eaten in 197 msec (48 MB/s)

> You may have to adjust the kb to get the same effect.

I've seen it. It's an artefact of timekeeping that it takes an accumulation of 
data to get all the information. Not much I can do about it except to have 
timeslices so small that they thrash the crap out of cpu caches and 
completely destroy throughput.

The current value, 6ms at 1000HZ, is chosen because it's the largest value 
that can schedule a task in less than normal human perceptible range when two 
competing heavily cpu bound tasks are the same priority. At 250HZ it works 
out to 7.5ms and 10ms at 100HZ. Ironically in my experimenting I found the 
cpu cache improvements become much less significant above 7ms so I'm very 
happy with this compromise.

Thanks!

-- 
-ck


* Re: [patch][rfc] quell interactive feeding frenzy
  2006-04-12 15:25                 ` Al Boldi
  2006-04-13 11:51                   ` Con Kolivas
@ 2006-04-16  6:02                   ` Con Kolivas
  2006-04-16  8:31                     ` Al Boldi
  1 sibling, 1 reply; 27+ messages in thread
From: Con Kolivas @ 2006-04-16  6:02 UTC (permalink / raw)
  To: Al Boldi; +Cc: ck list, linux-kernel, Mike Galbraith

On Thursday 13 April 2006 01:25, Al Boldi wrote:
> Con Kolivas wrote:
> > mean 68.7 seconds
> >
> > range 63-73 seconds.
>
> Could this 10s skew be improved to around 1s to aid smoothness?

It turns out to be dependent on accounting of system time, which only staircase 
does at the moment btw. Currently it's done on a jiffy basis. To increase the 
accuracy of this would incur incredible cost which I don't consider worth it.

-- 
-ck


* Re: [patch][rfc] quell interactive feeding frenzy
  2006-04-16  6:02                   ` Con Kolivas
@ 2006-04-16  8:31                     ` Al Boldi
  2006-04-16  8:58                       ` Con Kolivas
  2006-04-16 10:37                       ` was " Con Kolivas
  0 siblings, 2 replies; 27+ messages in thread
From: Al Boldi @ 2006-04-16  8:31 UTC (permalink / raw)
  To: Con Kolivas; +Cc: ck list, linux-kernel, Mike Galbraith

Con Kolivas wrote:
> On Thursday 13 April 2006 01:25, Al Boldi wrote:
> > Con Kolivas wrote:
> > > mean 68.7 seconds
> > >
> > > range 63-73 seconds.
> >
> > Could this 10s skew be improved to around 1s to aid smoothness?
>
> It turns out to be dependent on accounting of system time which only
> staircase does at the moment btw. Currently it's done on a jiffy basis. To
> increase the accuracy of this would incur incredible cost which I don't
> consider worth it.

Is this also related to that?

> > Much smoother, but I still get this choke w/ 2 eatm 9999 loops running:
> >
> > 9 MB 783 KB eaten in 130 msec (74 MB/s)
> > 9 MB 783 KB eaten in 2416 msec (3 MB/s)		<<<<<<<<<<<<<
> > 9 MB 783 KB eaten in 197 msec (48 MB/s)
> >
> > You may have to adjust the kb to get the same effect.
>
> I've seen it. It's an artefact of timekeeping that it takes an
> accumulation of data to get all the information. Not much I can do about
> it except to have timeslices so small that they thrash the crap out of cpu
> caches and completely destroy throughput.

So why is this not visible in other schedulers?

Are you sure this is not a priority boost problem?

> The current value, 6ms at 1000HZ, is chosen because it's the largest value
> that can schedule a task in less than normal human perceptible range when
> two competing heavily cpu bound tasks are the same priority. At 250HZ it
> works out to 7.5ms and 10ms at 100HZ. Ironically in my experimenting I
> found the cpu cache improvements become much less significant above 7ms so
> I'm very happy with this compromise.

Would you think this is dependent on cache-size and cpu-speed?

Also, what's this iso_cpu thing?

> Thanks!

Thank you!

--
Al



* Re: [patch][rfc] quell interactive feeding frenzy
  2006-04-16  8:31                     ` Al Boldi
@ 2006-04-16  8:58                       ` Con Kolivas
  2006-04-16 10:37                       ` was " Con Kolivas
  1 sibling, 0 replies; 27+ messages in thread
From: Con Kolivas @ 2006-04-16  8:58 UTC (permalink / raw)
  To: Al Boldi; +Cc: ck list, linux-kernel, Mike Galbraith

On Sunday 16 April 2006 18:31, Al Boldi wrote:
> Con Kolivas wrote:
> > On Thursday 13 April 2006 01:25, Al Boldi wrote:
> > > Con Kolivas wrote:
> > > > mean 68.7 seconds
> > > >
> > > > range 63-73 seconds.
> > >
> > > Could this 10s skew be improved to around 1s to aid smoothness?
> >
> > It turns out to be dependent on accounting of system time which only
> > staircase does at the moment btw. Currently it's done on a jiffy basis.
> > To increase the accuracy of this would incur incredible cost which I
> > don't consider worth it.
>
> Is this also related to that?

No.

> > > Much smoother, but I still get this choke w/ 2 eatm 9999 loops running:
> > >
> > > 9 MB 783 KB eaten in 130 msec (74 MB/s)
> > > 9 MB 783 KB eaten in 2416 msec (3 MB/s)		<<<<<<<<<<<<<
> > > 9 MB 783 KB eaten in 197 msec (48 MB/s)
> > >
> > > You may have to adjust the kb to get the same effect.
> >
> > I've seen it. It's an artefact of timekeeping that it takes an
> > accumulation of data to get all the information. Not much I can do about
> > it except to have timeslices so small that they thrash the crap out of
> > cpu caches and completely destroy throughput.
>
> So why is this not visible in other schedulers?

When I said there's not much I can do about it I mean with respect to the 
design.

> Are you sure this is not a priority boost problem?

Indeed it is related to the way cpu is proportioned out in staircase being 
both priority and slice. Problem? The magnitude of said problem is up to the 
observer to decide. It's a phenomenon of only two infinitely repeating 
concurrent, rapidly forking workloads, where one forks in under 100ms and 
the other in over 100ms; i.e. your test case. I'm sure there's a real world workload 
somewhere somehow that exhibits this, but it's important to remember that 
overall it's fair with the occasional blip.

> > The current value, 6ms at 1000HZ, is chosen because it's the largest
> > value that can schedule a task in less than normal human perceptible
> > range when two competing heavily cpu bound tasks are the same priority.
> > At 250HZ it works out to 7.5ms and 10ms at 100HZ. Ironically in my
> > experimenting I found the cpu cache improvements become much less
> > significant above 7ms so I'm very happy with this compromise.
>
> Would you think this is dependent on cache-size and cpu-speed?

It is. Cache warmth time varies with architecture and design. Of course you're 
going to tell me to add a tunable and/or autotune this. Then that undoes 
limiting it to the human perception range. It really does cost us to export these 
things which are otherwise compile time constants... sigh.

> Also, what's this iso_cpu thing?

SCHED_ISO cpu usage which you're not using.

-- 
-ck


* was Re: quell interactive feeding frenzy
  2006-04-16  8:31                     ` Al Boldi
  2006-04-16  8:58                       ` Con Kolivas
@ 2006-04-16 10:37                       ` Con Kolivas
  2006-04-16 19:03                         ` Al Boldi
  1 sibling, 1 reply; 27+ messages in thread
From: Con Kolivas @ 2006-04-16 10:37 UTC (permalink / raw)
  To: Al Boldi; +Cc: ck list, linux-kernel

Al, since you have an unhealthy interest in cpu schedulers you may also want 
to look at my ultimate fairness with mild interactivity builtin cpu scheduler 
I hacked on briefly. I was bored for a couple of days and came up with the 
design and hacked it together. I never got around to finishing it to live up 
fully to its design intent but it's working embarrassingly well at the moment. 
It makes no effort to optimise for interactivity in any way. Maybe if I ever 
find some spare time I'll give it more polish and port it to plugsched. 
Ignore the lovely name I gave it; the patch is for 2.6.16. It's a dual 
priority array rr scheduler that iterates over all priorities. This is as 
opposed to staircase which is a single priority array scheduler where the 
tasks themselves iterate over all priorities.

http://ck.kolivas.org/patches/crap/sched-crap-1.patch

-- 
-ck


* Re: [ck] Re: [patch][rfc] quell interactive feeding frenzy
  2006-04-15 23:22                           ` Con Kolivas
@ 2006-04-16 18:44                             ` Andreas Mohr
  2006-04-17  0:08                               ` Con Kolivas
  0 siblings, 1 reply; 27+ messages in thread
From: Andreas Mohr @ 2006-04-16 18:44 UTC (permalink / raw)
  To: Con Kolivas; +Cc: Al Boldi, ck list, Mike Galbraith, linux-kernel

Hi,

On Sun, Apr 16, 2006 at 09:22:59AM +1000, Con Kolivas wrote:
> The current value, 6ms at 1000HZ, is chosen because it's the largest value 
> that can schedule a task in less than normal human perceptible range when two 
> competing heavily cpu bound tasks are the same priority. At 250HZ it works 
> out to 7.5ms and 10ms at 100HZ. Ironically in my experimenting I found the 
> cpu cache improvements become much less significant above 7ms so I'm very 
> happy with this compromise.

Heh, this part is *EXACTLY* a fully sufficient explanation of what I was
wondering about myself just these days ;)
(I'm experimenting with different timeslice values on my P3/450 to verify
what performance impact exactly it has)
However with a measly 256kB cache it probably doesn't matter too much,
I think.

But I think it's still important to mention that your perception might be
twisted by your P4 limitation (no testing with slower and really slow
machines).

Andreas


* Re: was Re: quell interactive feeding frenzy
  2006-04-16 10:37                       ` was " Con Kolivas
@ 2006-04-16 19:03                         ` Al Boldi
  2006-04-16 23:26                           ` Con Kolivas
  0 siblings, 1 reply; 27+ messages in thread
From: Al Boldi @ 2006-04-16 19:03 UTC (permalink / raw)
  To: Con Kolivas; +Cc: ck list, linux-kernel

Con Kolivas wrote:
> Al Since you have an unhealthy interest in cpu schedulers you may also
> want to look at my ultimate fairness with mild interactivity builtin cpu
> scheduler I hacked on briefly. I was bored for a couple of days and came
> up with the design and hacked it together. I never got around to finishing
> it to live up fully to its design intent but it's working embarrassingly
> well at the moment. It makes no effort to optimise for interactivity in
> any way. Maybe if I ever find some spare time I'll give it more polish
> and port it to plugsched. Ignore the lovely name I give it; the patch is
> for 2.6.16. It's a dual priority array rr scheduler that iterates over all
> priorities. This is as opposed to staircase which is a single priority
> array scheduler where the tasks themselves iterate over all priorities.

It's not bad, but it seems to allow cpu-hogs to steal left-over timeslices, 
which increases unfairness as the proc load increases.  Conditionalizing 
prio-boosting based on hogginess may be one way to compensate for this.  This 
would involve resisting any prio-change unless hogged, which should be 
scaled by hogginess, something like SleepAVG but much simpler and less 
fluctuating.

Really, the key to a successful scheduler would be to build it step by step 
by way of abstraction, modularization, and extension.  Starting w/ a 
noop/RR-scheduler, each step would need to be analyzed for stability and 
efficiency, before moving to the next step, thus exposing problems as you 
move from step to step.

Thanks!

--
Al



* Re: was Re: quell interactive feeding frenzy
  2006-04-16 19:03                         ` Al Boldi
@ 2006-04-16 23:26                           ` Con Kolivas
  0 siblings, 0 replies; 27+ messages in thread
From: Con Kolivas @ 2006-04-16 23:26 UTC (permalink / raw)
  To: Al Boldi; +Cc: ck list, linux-kernel

On Monday 17 April 2006 05:03, Al Boldi wrote:
> It's not bad, but it seems to allow cpu-hogs to steal left-over timeslices,
> which increases unfairness as the proc load increases.

Spot on.

> Conditionalizing 
> prio-boosting based on hogginess may be one way to compensate for this. 
> This would involve resisting any prio-change unless hogged, which should be
> scaled by hogginess, something like SleepAVG but much simpler and less
> fluctuating.

Not interested in hacking on something like that onto it. It was more of an 
experiment in the simplest possible starvation free design that still 
supported nice levels.

> Really, the key to a successful scheduler would be to build it step by step
> by way of abstraction, modularization, and extension.  Starting w/ a
> noop/RR-scheduler, each step would need to be analyzed for stability and
> efficiency, before moving to the next step, thus exposing problems as you
> move from step to step.

While this may be the key, it is not the reason we aren't getting maximum 
roundness in our designs in linux. Our major enemy is cpu accounting of work 
done in kernel context on behalf of everyone else. Putting architecture 
dependant hooks into the assembly code to account for entry and exit would be 
the accurate way of doing this.

-- 
-ck


* Re: [ck] Re: [patch][rfc] quell interactive feeding frenzy
  2006-04-16 18:44                             ` [ck] " Andreas Mohr
@ 2006-04-17  0:08                               ` Con Kolivas
  2006-04-19  8:37                                 ` Andreas Mohr
  0 siblings, 1 reply; 27+ messages in thread
From: Con Kolivas @ 2006-04-17  0:08 UTC (permalink / raw)
  To: Andreas Mohr; +Cc: Al Boldi, ck list, Mike Galbraith, linux-kernel

On Monday 17 April 2006 04:44, Andreas Mohr wrote:
> Hi,
>
> On Sun, Apr 16, 2006 at 09:22:59AM +1000, Con Kolivas wrote:
> > The current value, 6ms at 1000HZ, is chosen because it's the largest
> > value that can schedule a task in less than normal human perceptible
> > range when two competing heavily cpu bound tasks are the same priority.
> > At 250HZ it works out to 7.5ms and 10ms at 100HZ. Ironically in my
> > experimenting I found the cpu cache improvements become much less
> > significant above 7ms so I'm very happy with this compromise.
>
> Heh, this part is *EXACTLY* a fully sufficient explanation of what I was
> wondering about myself just these days ;)
> (I'm experimenting with different timeslice values on my P3/450 to verify
> what performance impact exactly it has)
> However with a measly 256kB cache it probably doesn't matter too much,
> I think.
>
> But I think it's still important to mention that your perception might be
> twisted by your P4 limitation (no testing with slower and really slow
> machines).

You underestimate me. Those cpu cache effects were performance effects 
measured down to a PII 233, but all were i386 architecture. As for 
"perception" this isn't my testing I'm talking about; these are 
neuropsychiatric tests that have nothing to do with pcs or what processor you 
use ;)

-- 
-ck


* Re: [ck] Re: [patch][rfc] quell interactive feeding frenzy
  2006-04-17  0:08                               ` Con Kolivas
@ 2006-04-19  8:37                                 ` Andreas Mohr
  2006-04-19  8:59                                   ` jos poortvliet
  0 siblings, 1 reply; 27+ messages in thread
From: Andreas Mohr @ 2006-04-19  8:37 UTC (permalink / raw)
  To: Con Kolivas; +Cc: Al Boldi, ck list, Mike Galbraith, linux-kernel

Hi,

On Mon, Apr 17, 2006 at 10:08:08AM +1000, Con Kolivas wrote:
> On Monday 17 April 2006 04:44, Andreas Mohr wrote:
> > Hi,
> >
> > On Sun, Apr 16, 2006 at 09:22:59AM +1000, Con Kolivas wrote:
> > > The current value, 6ms at 1000HZ, is chosen because it's the largest
> > > value that can schedule a task in less than normal human perceptible
> > > range when two competing heavily cpu bound tasks are the same priority.
> > > At 250HZ it works out to 7.5ms and 10ms at 100HZ. Ironically in my
> > > experimenting I found the cpu cache improvements become much less
> > > significant above 7ms so I'm very happy with this compromise.
> >
> > Heh, this part is *EXACTLY* a fully sufficient explanation of what I was
> > wondering about myself just these days ;)
> > (I'm experimenting with different timeslice values on my P3/450 to verify
> > what performance impact exactly it has)
> > However with a measly 256kB cache it probably doesn't matter too much,
> > I think.
> >
> > But I think it's still important to mention that your perception might be
> > twisted by your P4 limitation (no testing with slower and really slow
> > machines).
> 
> You underestimate me. Those cpu cache effects were performance effects 
> measured down to a PII 233, but all were i386 architecture. As for 
> "perception" this isn't my testing I'm talking about; these are 
> neuropsychiatric tests that have nothing to do with pcs or what processor you 
> use ;)

OK, but I was not worrying about the interactivity aspects, rather the
performance aspects (GUI updates of KDE 3.5.2 on P3/450/256MB on Ubuntu are
about as slow as medium-hot lava). While of course it's mostly KDE (and
probably also the S3 Savage driver/card) which is to blame here,
I'm trying to first do as much as possible at kernel level before eventually
going higher up the chain...

Andreas

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [ck] Re: [patch][rfc] quell interactive feeding frenzy
  2006-04-19  8:37                                 ` Andreas Mohr
@ 2006-04-19  8:59                                   ` jos poortvliet
  0 siblings, 0 replies; 27+ messages in thread
From: jos poortvliet @ 2006-04-19  8:59 UTC (permalink / raw)
  To: ck; +Cc: Andreas Mohr, Con Kolivas, Al Boldi, linux-kernel, Mike Galbraith

[-- Attachment #1: Type: text/plain, Size: 2816 bytes --]

Op woensdag 19 april 2006 10:37, schreef Andreas Mohr:
> Hi,
>
> On Mon, Apr 17, 2006 at 10:08:08AM +1000, Con Kolivas wrote:
> > On Monday 17 April 2006 04:44, Andreas Mohr wrote:
> > > Hi,
> > >
> > > On Sun, Apr 16, 2006 at 09:22:59AM +1000, Con Kolivas wrote:
> > > > The current value, 6ms at 1000HZ, is chosen because it's the largest
> > > > value that can schedule a task in less than normal human perceptible
> > > > range when two competing heavily cpu bound tasks are the same
> > > > priority. At 250HZ it works out to 7.5ms and 10ms at 100HZ.
> > > > Ironically in my experimenting I found the cpu cache improvements
> > > > become much less significant above 7ms so I'm very happy with this
> > > > compromise.
> > >
> > > Heh, this part is *EXACTLY* a fully sufficient explanation of what I
> > > was wondering about myself just these days ;)
> > > (I'm experimenting with different timeslice values on my P3/450 to
> > > verify what performance impact exactly it has)
> > > However with a measly 256kB cache it probably doesn't matter too much,
> > > I think.
> > >
> > > But I think it's still important to mention that your perception might
> > > be twisted by your P4 limitation (no testing with slower and really
> > > slow machines).
> >
> > You underestimate me. Those cpu cache effects were performance effects
> > measured down to a PII 233, but all were i386 architecture. As for
> > "perception" this isn't my testing I'm talking about; these are
> > neuropsychiatric tests that have nothing to do with pcs or what processor
> > you use ;)
>
> OK, but I was not worrying about the interactivity aspects, rather the
> performance aspects (GUI updates of KDE 3.5.2 on P3/450/256MB on Ubuntu are
> about as slow as medium-hot lava). While of course it's mostly KDE (and
> probably also the S3 Savage driver/card) which is to blame here,
> I'm trying to first do as much as possible at kernel level before
> eventually going higher up the chain...

if it's slow redrawing, i'd blame the video driver or X, not KDE/Qt - though 
you might want to try another style and window decoration. plastik is 
generally speaking quite fast, or try the 'light' style. Xorg xcomposite 
might help too, but i guess your video card won't like that :D

slow app startup will be better in Kubuntu Dapper+1, as they will finally 
incorporate the new fontconfig (*g* i hope so) and use GCC's C++ symbol 
visibility in KDE/Qt. those both can give speedups in the area of 10-20%, 
so that should be noticeable. KDE 3.5.3 will also have a few other 
patches to speed up the KDE startup process, so - faster login, too.

now with all this, it's hard to imagine KDE 4 will again sport some 20-30% 
speedup due to reduced memory usage/binary size ;-)

> Andreas

[-- Attachment #2: Type: application/pgp-signature, Size: 191 bytes --]

^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2006-04-19  8:59 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <200604112100.28725.kernel@kolivas.org>
2006-04-11 17:03 ` Fwd: Re: [patch][rfc] quell interactive feeding frenzy Al Boldi
2006-04-11 22:56   ` Con Kolivas
2006-04-12  5:41     ` Al Boldi
2006-04-12  6:22       ` Con Kolivas
2006-04-12  8:17         ` Al Boldi
2006-04-12  9:36           ` Con Kolivas
2006-04-12 10:39             ` Al Boldi
2006-04-12 11:27               ` Con Kolivas
2006-04-12 15:25                 ` Al Boldi
2006-04-13 11:51                   ` Con Kolivas
2006-04-14  3:16                     ` Al Boldi
2006-04-15  7:05                       ` Con Kolivas
2006-04-15 18:23                         ` [ck] " Michael Gerdau
2006-04-15 20:45                         ` Al Boldi
2006-04-15 23:22                           ` Con Kolivas
2006-04-16 18:44                             ` [ck] " Andreas Mohr
2006-04-17  0:08                               ` Con Kolivas
2006-04-19  8:37                                 ` Andreas Mohr
2006-04-19  8:59                                   ` jos poortvliet
2006-04-15 22:32                         ` jos poortvliet
2006-04-15 23:06                           ` Con Kolivas
2006-04-16  6:02                   ` Con Kolivas
2006-04-16  8:31                     ` Al Boldi
2006-04-16  8:58                       ` Con Kolivas
2006-04-16 10:37                       ` was " Con Kolivas
2006-04-16 19:03                         ` Al Boldi
2006-04-16 23:26                           ` Con Kolivas

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox