Re: [PATCH]O14int

public inbox for linux-kernel@vger.kernel.org

* Re: [PATCH]O14int
@ 2003-08-08 20:08 Voluspa
  2003-08-09  0:36 ` [PATCH]O14int Con Kolivas
  0 siblings, 1 reply; 16+ messages in thread
From: Voluspa @ 2003-08-08 20:08 UTC (permalink / raw)
  To: linux-kernel

On 2003-08-08 15:49:25 Con Kolivas wrote:

> More duck tape interactivity tweaks

Do you have a premonition... Game-test goes down in flames. Volatile to
the extent where I can't catch head or tail. It can behave like in
A3-O12.2 or as an unpatched 2.6.0-test2. Trigger badness by switching to
a text console. Sometimes it recovers, sometimes not. Sometimes fast,
sometimes slowly (when it does recover).

I'll withdraw under my rock now. Won't come forth until everything
smells of roses. Getting stressed by being a bringer of bad news only.
Please speak up, all you other testers. Divide the burden. Even out the
scores.

Greetings,
Mats Johannesson

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH]O14int
  2003-08-08 20:08 [PATCH]O14int Voluspa
@ 2003-08-09  0:36 ` Con Kolivas
  2003-08-10  8:48   ` [PATCH]O14int Simon Kirby
  0 siblings, 1 reply; 16+ messages in thread
From: Con Kolivas @ 2003-08-09  0:36 UTC (permalink / raw)
  To: Voluspa, linux-kernel

On Sat, 9 Aug 2003 06:08, Voluspa wrote:
> On 2003-08-08 15:49:25 Con Kolivas wrote:
> > More duck tape interactivity tweaks
>
> Do you have a premonition... Game-test goes down in flames. Volatile to
> the extent where I can't catch head or tail. It can behave like in
> A3-O12.2 or as an unpatched 2.6.0-test2. Trigger badness by switching to
> a text console. 

Ah. There's the answer. You've totally changed the behaviour of the 
application in question by moving to the text console. No longer is it the 
sizable cpu hog that it is when it's in the foreground on X, so you've 
totally changed its behaviour and how it is treated.

> Sometimes it recovers, sometimes not. Sometimes fast, 
> sometimes slowly (when it does recover).

Depends on whether the scheduler has decided firmly "you're interactive or 
not". 

Your question of course is can this be changed? Well of course everything 
_can_ be... It may be simple tuning. In the meantime the answer is don't 
switch to the text console. (Doc it hurts when I do this... Well don't do 
that). Might be useful for you to see how long it has run when it recovers, 
and how long when it no longer recovers.

> I'll withdraw under my rock now. Won't come forth until everything
> smells of roses. Getting stressed by being a bringer of bad news only.
> Please speak up, all you other testers. Divide the burden. Even out the
> scores.

Wine, women and song^H^H^H games and scheduling are not a good mix. It's not 
your fault. Please do not hold back any reports.

Con

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH]O14int
  2003-08-09  0:36 ` [PATCH]O14int Con Kolivas
@ 2003-08-10  8:48   ` Simon Kirby
  2003-08-10  9:06     ` [PATCH]O14int Con Kolivas
                       ` (2 more replies)
  0 siblings, 3 replies; 16+ messages in thread
From: Simon Kirby @ 2003-08-10  8:48 UTC (permalink / raw)
  To: Con Kolivas; +Cc: linux-kernel

On Sat, Aug 09, 2003 at 10:36:17AM +1000, Con Kolivas wrote:

> On Sat, 9 Aug 2003 06:08, Voluspa wrote:
> > On 2003-08-08 15:49:25 Con Kolivas wrote:
> > > More duck tape interactivity tweaks
> >
> > Do you have a premonition... Game-test goes down in flames. Volatile to
> > the extent where I can't catch head or tail. It can behave like in
> > A3-O12.2 or as an unpatched 2.6.0-test2. Trigger badness by switching to
> > a text console. 
> 
> Ah. There's the answer. You've totally changed the behaviour of the 
> application in question by moving to the text console. No longer is it the 
> sizable cpu hog that it is when it's in the foreground on X, so you've 
> totally changed its behaviour and how it is treated.

I haven't been following this as closely as I would have liked to
(recent vacation and all), but I am definitely seeing issues with the
recent 2.5.x, 2.6.x-testx scheduler code and have been looking over
these threads.

I don't really understand why these changes were made at all to the
scheduler.  As I understand it, the 2.2.x and older 2.4.x scheduler was
simple in that it allowed any process to wake up if it had available
ticks, and would switch to that process if any new event occurred and
woke it up.  The rest was just limiting the ticks based on nice value
and remembering to switch when the ticks run out.
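
A toy model of the tick-based scheme as described above may make the later
comparison easier to follow. This is only a sketch: the Task fields, the
refill() mapping from nice value to ticks, and the half-of-leftover carry
for sleepers are assumptions chosen for illustration, not the kernel's code.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    nice: int              # -20 (favoured) .. 19 (least favoured)
    ticks: int = 0         # ticks left in the current epoch
    runnable: bool = True


def refill(task):
    """Hypothetical mapping from nice value to ticks granted per epoch."""
    return max(1, (20 - task.nice) // 2)


def pick_next(tasks):
    """Pick the runnable task with the most ticks left.

    When every runnable task is out of ticks, start a new epoch: all
    tasks, sleepers included, get a refill on top of half of whatever
    they had left over.
    """
    runnable = [t for t in tasks if t.runnable]
    if not runnable:
        return None
    if all(t.ticks == 0 for t in runnable):
        for t in tasks:
            t.ticks = t.ticks // 2 + refill(t)
    return max(runnable, key=lambda t: t.ticks)


def run(tasks, n_ticks):
    """Run the model for n_ticks and return the names in execution order."""
    trace = []
    for _ in range(n_ticks):
        task = pick_next(tasks)
        if task is None:
            break
        task.ticks -= 1
        trace.append(task.name)
    return trace


if __name__ == "__main__":
    print(run([Task("hog", 0), Task("batch", 19)], 11))
```

In this model a task that sleeps through an epoch carries ticks forward, so
it tends to win the next pick as soon as it wakes, which is the "switch to
it when an event wakes it up" behaviour the paragraph describes.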

It seems that newer schedulers are now temporarily postponing the
waking up of other processes when the running process is running with
"preemptive" ticks, and that there's all sorts of hacks involved in
trying to hide the bad effects of this decision.

If this is indeed what is going on, what is the reasoning behind it?
I didn't really see any problems before with the simple scheduler, so
it seems to me like this may just be a hack to make poorly-written
applications seem to be a bit "faster" by starving other processes of
CPU when the poorly-written applications decide they want to do
something (such as rendering a page with a large table in Mozilla
-- grr).  Is this really making a large enough difference to be worth
all of this trouble?

To me it would seem the best algorithm would be what we had before all
of this started.  Isn't it best to switch to a task as soon as an event
(such as disk I/O finishing or a mouse move waking up X to read mouse
input) occurs for both latency and cache reasons (queued in LIFO
order)?  DMA may make some of this more complicated, I don't know.

I am seeing similar starvation problems that others are seeing in these
threads.  At first it was whenever I clicked a link in Mozilla -- xmms
would stop, sometimes for a second or so, on a Celeron 466 MHz machine.
More recently I found that loading a web page consisting of several
large animated gif images (a security camera web page) caused
absolutely horrible jerking of mouse and keyboard input in all other
windows, even when the browser window was minimized or hidden.  What's
worse is the jerking tends to subside if I do a lot of typing or move
the mouse a lot, probably because I'm changing the scheduler's idea of
what "kind" of processes are running (which makes this stuff even
harder to debug).

Simon-

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH]O14int
  2003-08-10  8:48   ` [PATCH]O14int Simon Kirby
@ 2003-08-10  9:06     ` Con Kolivas
  2003-08-12 17:56       ` [PATCH]O14int Simon Kirby
  2003-08-10 10:08     ` [PATCH]O14int William Lee Irwin III
  2003-08-10 11:17     ` [PATCH]O14int Mike Galbraith
  2 siblings, 1 reply; 16+ messages in thread
From: Con Kolivas @ 2003-08-10  9:06 UTC (permalink / raw)
  To: Simon Kirby; +Cc: linux-kernel

On Sun, 10 Aug 2003 18:48, Simon Kirby wrote:
> On Sat, Aug 09, 2003 at 10:36:17AM +1000, Con Kolivas wrote:
> > On Sat, 9 Aug 2003 06:08, Voluspa wrote:
> > > On 2003-08-08 15:49:25 Con Kolivas wrote:
> > > > More duck tape interactivity tweaks
> > >
> > > Do you have a premonition... Game-test goes down in flames. Volatile to
> > > the extent where I can't catch head or tail. It can behave like in
> > > A3-O12.2 or as an unpatched 2.6.0-test2. Trigger badness by switching
> > > to a text console.
> >
> > Ah. There's the answer. You've totally changed the behaviour of the
> > application in question by moving to the text console. No longer is it
> > the sizable cpu hog that it is when it's in the foreground on X, so
> > you've totally changed its behaviour and how it is treated.
>
> I haven't been following this as closely as I would have liked to
> (recent vacation and all), but I am definitely seeing issues with the
> recent 2.5.x, 2.6.x-testx scheduler code and have been looking over
> these threads.
>
> I don't really understand why these changes were made at all to the
> scheduler.  As I understand it, the 2.2.x and older 2.4.x scheduler was
> simple in that it allowed any process to wake up if it had available
> ticks, and would switch to that process if any new event occurred and
> woke it up.  The rest was just limiting the ticks based on nice value
> and remembering to switch when the ticks run out.
>
> It seems that newer schedulers are now temporarily postponing the
> waking up of other processes when the running process is running with
> "preemptive" ticks, and that there's all sorts of hacks involved in
> trying to hide the bad effects of this decision.
>
> If this is indeed what is going on, what is the reasoning behind it?
> I didn't really see any problems before with the simple scheduler, so
> it seems to me like this may just be a hack to make poorly-written
> applications seem to be a bit "faster" by starving other processes of
> CPU when the poorly-written applications decide they want to do
> something (such as rendering a page with a large table in Mozilla
> -- grr).  Is this really making a large enough difference to be worth
> all of this trouble?
>
> To me it would seem the best algorithm would be what we had before all
> of this started.  Isn't it best to switch to a task as soon as an event
> (such as disk I/O finishing or a mouse move waking up X to read mouse
> input) occurs for both latency and cache reasons (queued in LIFO
> order)?  DMA may make some of this more complicated, I don't know.
>
> I am seeing similar starvation problems that others are seeing in these
> threads.  At first it was whenever I clicked a link in Mozilla -- xmms
> would stop, sometimes for a second or so, on a Celeron 466 MHz machine.
> More recently I found that loading a web page consisting of several
> large animated gif images (a security camera web page) caused
> absolutely horrible jerking of mouse and keyboard input in all other
> windows, even when the browser window was minimized or hidden.  What's
> worse is the jerking tends to subside if I do a lot of typing or move
> the mouse a lot, probably because I'm changing the scheduler's idea of
> what "kind" of processes are running (which makes this stuff even
> harder to debug).

Is this with or without my changes? The old scheduler was not very scalable; 
that's why we moved. The new one has other intrinsic issues that I (and 
others) have been trying to address, but is much much more scalable. It was 
not possible to make the old one more scalable, but it is possible to make 
this one more interactive.

Con


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH]O14int
  2003-08-10  8:48   ` [PATCH]O14int Simon Kirby
  2003-08-10  9:06     ` [PATCH]O14int Con Kolivas
@ 2003-08-10 10:08     ` William Lee Irwin III
  2003-08-12 18:36       ` [PATCH]O14int Simon Kirby
  2003-08-10 11:17     ` [PATCH]O14int Mike Galbraith
  2 siblings, 1 reply; 16+ messages in thread
From: William Lee Irwin III @ 2003-08-10 10:08 UTC (permalink / raw)
  To: Simon Kirby; +Cc: Con Kolivas, linux-kernel

On Sun, Aug 10, 2003 at 01:48:27AM -0700, Simon Kirby wrote:
> I haven't been following this as closely as I would have liked to
> (recent vacation and all), but I am definitely seeing issues with the
> recent 2.5.x, 2.6.x-testx scheduler code and have been looking over
> these threads.
> I don't really understand why these changes were made at all to the
> scheduler.  As I understand it, the 2.2.x and older 2.4.x scheduler was
> simple in that it allowed any process to wake up if it had available
> ticks, and would switch to that process if any new event occurred and
> woke it up.  The rest was just limiting the ticks based on nice value
> and remembering to switch when the ticks run out.

Most of this isn't of much concern; most of the 2.4.x semantics have
largely been carried over to 2.6.x with algorithmic improvements, apart
from the same-mm heuristic (which was of dubious value anyway). Even
epochs are still there in the form of the duelling arrays, which
renders the thing vaguely timeout-based like 2.4.x.

On Sun, Aug 10, 2003 at 01:48:27AM -0700, Simon Kirby wrote:
> It seems that newer schedulers are now temporarily postponing the
> waking up of other processes when the running process is running with
> "preemptive" ticks, and that there's all sorts of hacks involved in
> trying to hide the bad effects of this decision.

If this were deliberate it would be a "selfish" scheduling algorithm,
where the delay in preemptively capturing the cpu is a number of ticks
equal to whatever the value of beta/alpha was chosen to be, and some
raw scheduling algorithm is used otherwise unaltered for those tasks in
the service box. I see no evidence of such an organization (it'd be
really obvious, as a queue box and service box would need to exist),
hence this is probably just something in need of a performance tweak
if it's a real problem.

On Sun, Aug 10, 2003 at 01:48:27AM -0700, Simon Kirby wrote:
> If this is indeed what is going on, what is the reasoning behind it?
> I didn't really see any problems before with the simple scheduler, so
> it seems to me like this may just be a hack to make poorly-written
> applications seem to be a bit "faster" by starving other processes of
> CPU when the poorly-written applications decide they want to do
> something (such as rendering a page with a large table in Mozilla
> -- grr).  Is this really making a large enough difference to be worth
> all of this trouble?

Yes. The SMP issues addressed by the algorithmic improvements in the
scheduler are performance issues so severe, they may safely be called
functional issues.

On Sun, Aug 10, 2003 at 01:48:27AM -0700, Simon Kirby wrote:
> To me it would seem the best algorithm would be what we had before all
> of this started.  Isn't it best to switch to a task as soon as an event
> (such as disk I/O finishing or a mouse move waking up X to read mouse
> input) occurs for both latency and cache reasons (queued in LIFO
> order)?  DMA may make some of this more complicated, I don't know.

This sounds like either LCFS or FB. FB's not usable out of the box for
long-running tasks, as its context switch rates are excessive there.
LCFS has some rather undesirable properties that render it unsuitable
for general purpose operating systems. Something like multilevel
processor sharing would be a much better alternative, as long-running
tasks can be classified and scheduled according to a more appropriate
discipline with a lower context switch rate while maintaining the
(essentially infinitely) strong preference for short-running tasks.
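
The context switch concern can be illustrated with a small model. It reads
"FB" as foreground-background scheduling, i.e. always running the job with
the least attained service; that reading, the fixed quantum, and the
function name are assumptions made for this sketch.

```python
def fb_context_switches(work, quantum=1):
    """Count context switches when the job with the least attained
    service always runs next.

    work maps a job name to the total ticks of CPU it needs.
    Ties are broken by name so the result is deterministic.
    """
    attained = {name: 0 for name in work}
    remaining = dict(work)
    switches = 0
    current = None
    while remaining:
        name = min(remaining, key=lambda n: (attained[n], n))
        if current is not None and name != current:
            switches += 1
        current = name
        step = min(quantum, remaining[name])
        attained[name] += step
        remaining[name] -= step
        if remaining[name] == 0:
            del remaining[name]
    return switches


if __name__ == "__main__":
    print(fb_context_switches({"a": 100, "b": 100}))
```

Two equally long jobs end up alternating every quantum, whereas running
each to completion would need a single switch; a short job arriving among
long ones still runs almost immediately, which is the preference for
short-running tasks mentioned above.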

On Sun, Aug 10, 2003 at 01:48:27AM -0700, Simon Kirby wrote:
> I am seeing similar starvation problems that others are seeing in these
> threads.  At first it was whenever I clicked a link in Mozilla -- xmms
> would stop, sometimes for a second or so, on a Celeron 466 MHz machine.
> More recently I found that loading a web page consisting of several
> large animated gif images (a security camera web page) caused
> absolutely horrible jerking of mouse and keyboard input in all other
> windows, even when the browser window was minimized or hidden.  What's
> worse is the jerking tends to subside if I do a lot of typing or move
> the mouse a lot, probably because I'm changing the scheduler's idea of
> what "kind" of processes are running (which makes this stuff even
> harder to debug).

One problem with these kinds of reports is that they aren't coming with
enough information to determine if the scheduler truly is the cause of
the problem, and worse yet, assuming the scheduler did cause these
problems, this isn't enough actual information to address it. We're
going to need proper instrumentation at some point here.

Until then, when you deliver these reports, could you do the following:

(a) vmstat 1 | cat -n | tee -a vmstat.log

(b) run top under script

(c) regularly snapshot profiles with
	n=1
	while true
	do
		readprofile -n -m /boot/System.map-`uname -r` \
			| sort -k 2,2 > prof.$n
		n=`expr $n + 1`
		sleep 1
	done

while running interactivity tests?

(a) will give some moderately useful information about how much io is
	going on and interrupt and context switch rates.

(b) will report dynamic priorities and other general conditions so the
	scheduler's decisions can be examined.

(c) will determine if the issue is due to in-kernel algorithms consuming
	excessive amounts of cpu and causing application-level latency
	issues via cpu burn

Also, send in bootlogs (dmesg), so that general information about the
system can be communicated.
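
For anyone collecting the prof.$n snapshots from (c), a small helper to
compare two of them might look like the following. It is a sketch only: it
assumes each readprofile line is "ticks symbol load" and that a summary
line named "total" may be present; the function names are made up here.

```python
def parse_profile(text):
    """Turn 'ticks symbol load' lines into a {symbol: ticks} dict."""
    counts = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2 or not fields[0].isdigit():
            continue                    # skip blank or malformed lines
        if fields[1] == "total":
            continue                    # assumed summary line, not a symbol
        counts[fields[1]] = int(fields[0])
    return counts


def hottest_delta(before, after, top=5):
    """Return the symbols whose tick counts grew most between snapshots."""
    deltas = {sym: after[sym] - before.get(sym, 0) for sym in after}
    ranked = sorted(deltas.items(), key=lambda kv: kv[1], reverse=True)
    return [(sym, d) for sym, d in ranked[:top] if d > 0]
```

Feeding it the contents of two consecutive snapshots, for example
hottest_delta(parse_profile(open("prof.1").read()),
parse_profile(open("prof.2").read())), lists the kernel functions that
accumulated the most ticks during that second.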

-- wli

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH]O14int
  2003-08-10  8:48   ` [PATCH]O14int Simon Kirby
  2003-08-10  9:06     ` [PATCH]O14int Con Kolivas
  2003-08-10 10:08     ` [PATCH]O14int William Lee Irwin III
@ 2003-08-10 11:17     ` Mike Galbraith
  2003-08-11 18:19       ` [PATCH]O14int [SCHED_SOFTRR please] Roger Larsson
       [not found]       ` <200308112019.38613.roger.larsson@skelleftea.mail.telia.com >
  2 siblings, 2 replies; 16+ messages in thread
From: Mike Galbraith @ 2003-08-10 11:17 UTC (permalink / raw)
  To: Simon Kirby; +Cc: Con Kolivas, linux-kernel

At 01:48 AM 8/10/2003 -0700, Simon Kirby wrote:
>On Sat, Aug 09, 2003 at 10:36:17AM +1000, Con Kolivas wrote:
>
> > On Sat, 9 Aug 2003 06:08, Voluspa wrote:
> > > On 2003-08-08 15:49:25 Con Kolivas wrote:
> > > > More duck tape interactivity tweaks
> > >
> > > Do you have a premonition... Game-test goes down in flames. Volatile to
> > > the extent where I can't catch head or tail. It can behave like in
> > > A3-O12.2 or as an unpatched 2.6.0-test2. Trigger badness by switching to
> > > a text console.
> >
> > Ah. There's the answer. You've totally changed the behaviour of the
> > application in question by moving to the text console. No longer is it the
> > sizable cpu hog that it is when it's in the foreground on X, so you've
> > totally changed its behaviour and how it is treated.
>
>I haven't been following this as closely as I would have liked to
>(recent vacation and all), but I am definitely seeing issues with the
>recent 2.5.x, 2.6.x-testx scheduler code and have been looking over
>these threads.
>
>I don't really understand why these changes were made at all to the
>scheduler.  As I understand it, the 2.2.x and older 2.4.x scheduler was
>simple in that it allowed any process to wake up if it had available
>ticks, and would switch to that process if any new event occurred and
>woke it up.  The rest was just limiting the ticks based on nice value
>and remembering to switch when the ticks run out.
>
>It seems that newer schedulers are now temporarily postponing the
>waking up of other processes when the running process is running with
>"preemptive" ticks, and that there's all sorts of hacks involved in
>trying to hide the bad effects of this decision.

I don't see this as a bad decision at all, it's just that there are some 
annoying cases where the deliberate starvation which works nicely in my 
favor for both interactivity and throughput in most cases can and does kick 
my ass in others.  This is nothing new.  I have no memory of the scheduler 
ever being perfect (0.96->today).  This scheduler is very nice to me; it's 
very simple, it's generally highly effective, and it's easily 
tweakable.  It just has some irritating rough edges.

>If this is indeed what is going on, what is the reasoning behind it?
>I didn't really see any problems before with the simple scheduler, so
>it seems to me like this may just be a hack to make poorly-written
>applications seem to be a bit "faster" by starving other processes of
>CPU when the poorly-written applications decide they want to do
>something (such as rendering a page with a large table in Mozilla
>-- grr).  Is this really making a large enough difference to be worth
>all of this trouble?
>
>To me it would seem the best algorithm would be what we had before all
>of this started.  Isn't it best to switch to a task as soon as an event
>(such as disk I/O finishing or a mouse move waking up X to read mouse
>input) occurs for both latency and cache reasons (queued in LIFO
>order)?  DMA may make some of this more complicated, I don't know.

Hmm.  If a mouse event happened to be queued but not yet run when a slew of 
disk events arrived, LIFO would immediately suck.  LIFO may be good for the 
cache, but it doesn't seem like it could be good for average 
latency.  Other than that, what you describe is generally what 
happens.  Tasks which are waiting for hardware a lot rapidly attain a very 
high priority, and preempt whoever happened to service the interrupt 
(waker) almost instantly.  I'd have to look closer at the old scheduler to 
be sure, but I don't think there's anything much different between old/new 
handling.
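
The mouse-versus-disk case can be put in numbers with a toy queue. The
event names and the 2 ms service cost are hypothetical, and all events are
assumed to be queued before any of them is serviced.

```python
def service_order(queued, lifo):
    """Return event names in the order they would be serviced.

    queued lists the events in arrival order, oldest first.
    """
    return list(reversed(queued)) if lifo else list(queued)


def wait_before_service(queued, cost, lifo):
    """Map each event to the time it waits before its service starts."""
    waits = {}
    clock = 0
    for name in service_order(queued, lifo):
        waits[name] = clock
        clock += cost[name]
    return waits


if __name__ == "__main__":
    queued = ["mouse"] + [f"disk{i}" for i in range(5)]
    cost = {name: 2 for name in queued}     # hypothetical 2 ms each
    print(wait_before_service(queued, cost, lifo=True)["mouse"])
    print(wait_before_service(queued, cost, lifo=False)["mouse"])
```

With equal costs and nothing else arriving, the mean wait is the same in
both orders; what LIFO changes is that the oldest event, here the mouse
event, is the one that waits behind everything else.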

>I am seeing similar starvation problems that others are seeing in these
>threads.  At first it was whenever I clicked a link in Mozilla -- xmms
>would stop, sometimes for a second or so, on a Celeron 466 MHz machine.

Do you see this with test-X and Ingo's latest changes too?  I can only 
imagine one scenario off the top of my head where this could happen; if 
xmms exhausted a slice while STARVATION_LIMIT is exceeded, it could land in 
the expired array and remain unserviced for the period of time it takes for 
all tasks remaining in the active array to exhaust their slices.  Seems 
like that should be pretty rare though.
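
A minimal sketch of that scenario, assuming every task left in the active
array is CPU-bound and burns its whole remaining slice. FRESH_SLICE_MS,
the task names, and the function are hypothetical, not taken from
kernel/sched.c.

```python
from collections import deque

FRESH_SLICE_MS = 100    # hypothetical full timeslice


def delay_until_rerun(target, active_slices):
    """Return how long (ms) a task that just expired waits to run again.

    active_slices maps the tasks still in the active array to the
    milliseconds left in their slices. The expired task cannot run
    until the active array is empty and the two arrays are swapped.
    """
    active = deque(active_slices.items())
    expired = deque([(target, FRESH_SLICE_MS)])
    clock = 0
    while True:
        if not active:
            active, expired = expired, active      # array swap
        name, remaining = active.popleft()
        if name == target:
            return clock
        clock += remaining                          # burns its slice
        expired.append((name, FRESH_SLICE_MS))


if __name__ == "__main__":
    print(delay_until_rerun("xmms", {"mozilla": 80, "cc1": 100, "X": 20}))
```

In this model the stall is simply the sum of the slices still outstanding
in the active array, so a handful of CPU-bound tasks is enough to produce
a pause long enough to hear.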

         -Mike 

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH]O14int [SCHED_SOFTRR please]
  2003-08-10 11:17     ` [PATCH]O14int Mike Galbraith
@ 2003-08-11 18:19       ` Roger Larsson
  2003-08-11 21:53         ` Con Kolivas
       [not found]       ` <200308112019.38613.roger.larsson@skelleftea.mail.telia.com >
  1 sibling, 1 reply; 16+ messages in thread
From: Roger Larsson @ 2003-08-11 18:19 UTC (permalink / raw)
  To: linux-kernel

On Sunday 10 August 2003 13.17, Mike Galbraith wrote:
> At 01:48 AM 8/10/2003 -0700, Simon Kirby wrote:
> >I am seeing similar starvation problems that others are seeing in these
> >threads.  At first it was whenever I clicked a link in Mozilla -- xmms
> >would stop, sometimes for a second or so, on a Celeron 466 MHz machine.
>
> Do you see this with test-X and Ingo's latest changes too?  I can only
> imagine one scenario off the top of my head where this could happen; if
> xmms exhausted a slice while STARVATION_LIMIT is exceeded, it could land in
> the expired array and remain unserviced for the period of time it takes for
> all tasks remaining in the active array to exhaust their slices.  Seems
> like that should be pretty rare though.
>

xmms is a RT process - it does not really have interactivity problems...
It will be extremely hard to fix this in a generic scheduler, instead
let xmms be the RT process it is with SCHED_SOFTRR (or whatever
it will be named).
Do this for arts, and other audio/video path applications.

Then start the race for interactivity tuning
 (X, X applications, console, login, etc)

interactivity = two-way
	http://www.m-w.com/cgi-bin/dictionary?va=interactive

Listening to music is not interactive.

Changing equalization during media playback needs to be interactive in
two ways.
1) The slider should move in the GUI.
2) The volume should change, but the big buffers needed in today's audio path
   will delay the audible changes...
Note: audio path starvation is not one of them...

/RogerL

-- 
Roger Larsson
Skellefteå
Sweden

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH]O14int [SCHED_SOFTRR please]
       [not found]       ` <200308112019.38613.roger.larsson@skelleftea.mail.telia.com >
@ 2003-08-11 19:46         ` Mike Galbraith
  2003-08-12  0:26           ` What is interactivity? " Roger Larsson
       [not found]           ` <200308120226.35580.roger.larsson@skelleftea.mail.telia.com >
  0 siblings, 2 replies; 16+ messages in thread
From: Mike Galbraith @ 2003-08-11 19:46 UTC (permalink / raw)
  To: Roger Larsson; +Cc: linux-kernel

At 08:19 PM 8/11/2003 +0200, Roger Larsson wrote:
>On Sunday 10 August 2003 13.17, Mike Galbraith wrote:
> > At 01:48 AM 8/10/2003 -0700, Simon Kirby wrote:
> > >I am seeing similar starvation problems that others are seeing in these
> > >threads.  At first it was whenever I clicked a link in Mozilla -- xmms
> > >would stop, sometimes for a second or so, on a Celeron 466 MHz machine.
> >
> > Do you see this with test-X and Ingo's latest changes too?  I can only
> > imagine one scenario off the top of my head where this could happen; if
> > xmms exhausted a slice while STARVATION_LIMIT is exceeded, it could land in
> > the expired array and remain unserviced for the period of time it takes for
> > all tasks remaining in the active array to exhaust their slices.  Seems
> > like that should be pretty rare though.
> >
>
>xmms is a RT process - it does not really have interactivity problems...
>It will be extremely hard to fix this in a generic scheduler, instead
>let xmms be the RT process it is with SCHED_SOFTRR (or whatever
>it will be named).
>Do this for arts, and other audio/video path applications.

(For the scenario described, it doesn't matter what scheduler policy is used)

>Then start the race for interactivity tuning
>  (X, X applications, console, login, etc)
>
>interactivity = two-way
>         http://www.m-w.com/cgi-bin/dictionary?va=interactive
>
>Listening to music is not interactive.

?!?  <tilt> What makes you say that?  What in the world am I doing when I 
fire up xmms?
(can't be the two way thing... that's happening until I stop listening)

         -Mike


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH]O14int [SCHED_SOFTRR please]
  2003-08-11 18:19       ` [PATCH]O14int [SCHED_SOFTRR please] Roger Larsson
@ 2003-08-11 21:53         ` Con Kolivas
  0 siblings, 0 replies; 16+ messages in thread
From: Con Kolivas @ 2003-08-11 21:53 UTC (permalink / raw)
  To: Roger Larsson, linux-kernel

On Tue, 12 Aug 2003 04:19, Roger Larsson wrote:
> xmms is a RT process - it does not really have interactivity problems...
> It will be extremely hard to fix this in a generic scheduler, instead
> let xmms be the RT process it is with SCHED_SOFTRR (or whatever
> it will be named).

Have you actually _tried_ the tweaked generic scheduler before this big claim?

Con


^ permalink raw reply	[flat|nested] 16+ messages in thread

* What is interactivity? Re: [PATCH]O14int [SCHED_SOFTRR please]
  2003-08-11 19:46         ` Mike Galbraith
@ 2003-08-12  0:26           ` Roger Larsson
       [not found]           ` <200308120226.35580.roger.larsson@skelleftea.mail.telia.com >
  1 sibling, 0 replies; 16+ messages in thread
From: Roger Larsson @ 2003-08-12  0:26 UTC (permalink / raw)
  To: Mike Galbraith; +Cc: linux-kernel

On Monday 11 August 2003 21.46, Mike Galbraith wrote:
> At 08:19 PM 8/11/2003 +0200, Roger Larsson wrote:
> >On Sunday 10 August 2003 13.17, Mike Galbraith wrote:
> > > At 01:48 AM 8/10/2003 -0700, Simon Kirby wrote:
> > > >I am seeing similar starvation problems that others are seeing in
> > > > these threads.  At first it was whenever I clicked a link in Mozilla
> > > > -- xmms would stop, sometimes for a second or so, on a Celeron 466
> > > > MHz machine.
> > >
> > > Do you see this with test-X and Ingo's latest changes too?  I can only
> > > imagine one scenario off the top of my head where this could happen; if
> > > xmms exhausted a slice while STARVATION_LIMIT is exceeded, it could
> > > land in the expired array and remain unserviced for the period of time
> > > it takes for all tasks remaining in the active array to exhaust their
> > > slices.  Seems like that should be pretty rare though.
> >
> >xmms is a RT process - it does not really have interactivity problems...
> >It will be extremely hard to fix this in a generic scheduler, instead
> >let xmms be the RT process it is with SCHED_SOFTRR (or whatever
> >it will be named).
> >Do this for arts, and other audio/video path applications.
>
> (For the scenario described, it doesn't matter what scheduler policy is
> used)

It matters if the SOFTRR processes are well behaved, they will get their share
as long as _they_ do not overuse CPU.

Suppose you have xmms running SOFTRR. Whatever you do that is not SOFTRR
(or higher SCHED_FIFO, SCHED_RR) can't touch it scheduler-wise.
It will remain SOFTRR and will not run out of its timeslice unless it uses too 
much CPU - its timeslice is refilled immediately whenever it gets empty (it 
is put last on the SOFTRR run queue - not in the expired array...)
But if SOFTRR processes have used too much CPU there are no guarantees.
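
The behaviour described above can be sketched as a round-robin queue with
a CPU budget. This is one hypothetical reading, not the SCHED_SOFTRR patch
itself: the per-window budget, the parameter names, and the choice to hand
the rest of the window to "other" once the budget is spent are all
assumptions.

```python
from collections import deque


def softrr_window(demands, slice_ticks, budget, window):
    """Simulate one accounting window and return who ran on each tick.

    demands maps each SOFTRR task to the ticks of CPU it wants in this
    window. A task whose slice runs out is refilled at once and moved
    to the tail of the SOFTRR queue. Once the class as a whole has used
    `budget` ticks, the remaining ticks go to ordinary tasks ("other").
    """
    queue = deque((name, need, slice_ticks) for name, need in demands.items())
    used = 0
    trace = []
    for _ in range(window):
        if queue and used < budget:
            name, need, left = queue.popleft()
            trace.append(name)
            used += 1
            need -= 1
            left -= 1
            if need == 0:
                continue                                  # sleeps until next window
            if left == 0:
                queue.append((name, need, slice_ticks))   # refill, go to tail
            else:
                queue.appendleft((name, need, left))      # keeps the CPU
        else:
            trace.append("other")
    return trace


if __name__ == "__main__":
    print(softrr_window({"xmms": 3, "arts": 2}, 2, 8, 10))
```

Well-behaved tasks get everything they ask for ahead of ordinary tasks,
while a task that asks for more than the budget is cut off, which is the
"no guarantees" case.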

>
> >Then start the race for interactivity tuning
> >  (X, X applications, console, login, etc)
> >
> >interactivity = two-way
> >         http://www.m-w.com/cgi-bin/dictionary?va=interactive
> >
> >Listening to music is not interactive.
>
> ?!?  <tilt> What makes you say that?  What in the world am I doing when I
> fire up xmms?
> --- snip ---

You expect sound to start soon - that is the interactive behaviour.

Suppose xmms starts after four seconds and then won't miss a beat.
Compare with if it starts after ten seconds and then won't miss a beat.
If you relate each frame to the start action then you will see that _every_
frame in the first case is four seconds late, and in the second case ten
seconds late. (Best possible interactivity would be an immediate start - don't 
you agree?)

xmms is interactive if you see the audioboard as the second part.
But I think that if we could concentrate on human users the problem will
become easier. If I leave home while compiling KDE and playing audio with xmms 
- is xmms still interactive? (this will be hard to fix but it is not 
impossible; someone (on a Mac, I think) has done an application that logged in 
when you arrived with your bluetooth device and logged off when you left)

"make all" - interactive? It depends on my expectations, my expectations
depends on how big the _total task_ is.
* If it is run from a shell script - like the kde-build I have in the
  background right now. No way!
* If it is my kdeveloper test project ("Hello world" for remote debugging).
  Yes it is! I am waiting for it and expect it to be ready NOW.

make bzImage - total rebuild, Not interactive - I expect to be able to get a 
cup of coffee while waiting.
make bzImage - one .c file changed, interactive

I think that the work done this far is great. It is great that the scheduler
almost can handle xmms under all kinds of loads - but enough is enough.

/RogerL

-- 
Roger Larsson
Skellefteå
Sweden

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: What is interactivity? Re: [PATCH]O14int [SCHED_SOFTRR please]
       [not found]           ` <200308120226.35580.roger.larsson@skelleftea.mail.telia.com >
@ 2003-08-12  5:40             ` Mike Galbraith
  2003-08-12 15:29               ` Timothy Miller
  2003-08-13  1:43               ` Rob Landley
  0 siblings, 2 replies; 16+ messages in thread
From: Mike Galbraith @ 2003-08-12  5:40 UTC (permalink / raw)
  To: Roger Larsson; +Cc: linux-kernel

At 02:26 AM 8/12/2003 +0200, Roger Larsson wrote:
>On Monday 11 August 2003 21.46, Mike Galbraith wrote:
> > At 08:19 PM 8/11/2003 +0200, Roger Larsson wrote:
> > >On Sunday 10 August 2003 13.17, Mike Galbraith wrote:
> > > > At 01:48 AM 8/10/2003 -0700, Simon Kirby wrote:
> > > > >I am seeing similar starvation problems that others are seeing in
> > > > > these threads.  At first it was whenever I clicked a link in Mozilla
> > > > > -- xmms would stop, sometimes for a second or so, on a Celeron 466
> > > > > MHz machine.
> > > >
> > > > Do you see this with test-X and Ingo's latest changes too?  I can only
> > > > imagine one scenario off the top of my head where this could happen; if
> > > > xmms exhausted a slice while STARVATION_LIMIT is exceeded, it could
> > > > land in the expired array and remain unserviced for the period of time
> > > > it takes for all tasks remaining in the active array to exhaust their
> > > > slices.  Seems like that should be pretty rare though.
> > >
> > >xmms is a RT process - it does not really have interactivity problems...
> > >It will be extremely hard to fix this in a generic scheduler, instead
> > >let xmms be the RT process it is with SCHED_SOFTRR (or whatever
> > >it will be named).
> > >Do this for arts, and other audio/video path applications.
> >
> > (For the scenario described, it doesn't matter what scheduler policy is
> > used)
>
>It matters if the SOFTRR processes are well behaved, they will get their share
>as long as _they_ do not overuse CPU.
>
>Suppose you have xmms running SOFTRR. Whatever you do that is not SOFTRR
>(or higher SCHED_FIFO, SCHED_RR) can't touch it scheduler-wise.
>It will remain SOFTRR and will not run out of its timeslice unless it uses
>too much CPU - its timeslice is refilled immediately whenever it gets empty
>(it is put last on the SOFTRR run queue - not in the expired array...)

Yup, brainfart on my part.  Realtime tasks are immune.
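
A rough way to picture the difference, as a toy model only and not kernel
code. The function name, the queue layout and the slice lengths are all made
up for illustration. It assumes xmms is the only realtime task, that realtime
tasks always run ahead of normal ones, and that nothing in the active array
sleeps before its slice is used up.

```python
from collections import deque


def first_rerun_delay(policy, active_tasks, slice_ticks=3):
    """Ticks xmms waits to run again after exhausting a slice (toy model).

    policy: "softrr" requeues xmms on a separate realtime queue,
            anything else drops it into the expired array.
    active_tasks: list of (name, remaining_ticks) still in the active array.
    """
    rt_queue = deque()
    active = deque(active_tasks)
    expired = deque()

    # xmms has just used up its timeslice; decide where it lands.
    if policy == "softrr":
        rt_queue.append(("xmms", slice_ticks))
    else:
        expired.append(("xmms", slice_ticks))

    waited = 0
    while True:
        # Realtime queue is served first; xmms is its only member here.
        if rt_queue:
            return waited
        # Arrays swap only once the active array has fully drained.
        if not active:
            active, expired = expired, active
        name, left = active.popleft()
        if name == "xmms":
            return waited
        # Some other task burns one tick of its slice.
        waited += 1
        left -= 1
        if left > 0:
            active.appendleft((name, left))
        else:
            expired.append((name, slice_ticks))


others = [("cc1", 3), ("mozilla", 2)]
print(first_rerun_delay("normal", others))  # waits for the active array to drain
print(first_rerun_delay("softrr", others))  # runs again straight away
```

In this model the expired-array wait is just the sum of the slices still
left in the active array, which is the window described in the scenario
above.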

>But if the SOFTRR processes have used too much CPU there are no guarantees.
>
> >
> > >Then start the race for interactivity tuning
> > >  (X, X applications, console, login, etc)
> > >
> > >interactivity = two-way
> > >         http://www.m-w.com/cgi-bin/dictionary?va=interactive
> > >
> > >Listening to music is not interactive.
> >
> > ?!?  <tilt> What makes you say that?  What in the world am I doing when I
> > fire up xmms?
> > --- snip ---
>
>You expect sound to start soon - that is the interactive behaviour.
>
>Suppose xmms starts after four seconds and then won't miss a beat.
>Compare with if it starts after ten seconds and then won't miss a beat.
>If you relate each frame to the start action then you will see that _every_
>frame in the first case is one second late, and in the second case ten
>seconds late. (Best possible interactivity would be an immediate start -
>don't you agree?)
>
>xmms is interactive if you see the audioboard as the second part.
>But I think that if we could concentrate on human users the problem will
>become easier. If I leave home while compiling KDE and playing audio with
>xmms - is xmms still interactive? (this will be hard to fix but it is not
>impossible, someone (on a Mac I think) has done an application that logged
>in when you arrived with your bluetooth device and logged off when you left)

If I leave the room, or even become distracted enough, xmms ceases to be 
interactive.

>"make all" - interactive? It depends on my expectations, my expectations
>depend on how big the _total task_ is.

If you're watching it, I'd call it interactive.  I see no difference 
between watching a movie and watching compiler output scroll by.

>* If it is run from a shell script - like the kde-build I have in the
>   background right now. No way!

Agreed.  If you're not watching the output scroll by, it's not interactive.

>* If it is my kdeveloper test project ("Hello world" for remote debugging).
>   Yes it is! I am waiting for it and expect it to be ready NOW.
>
>make bzImage - total rebuild, not interactive - I expect to be able to get a
>cup of coffee while waiting.
>make bzImage - one .c file changed, interactive

Well, interactivity can certainly be viewed like one of those tricky 
philosophy questions (bears farting in the woods, trees falling over etc;), 
but I consider any task which is connected to a human via any of our senses 
to be interactive.  Perhaps it's not a 100% accurate use of the term, but 
for lack of a better term...

>I think that the work done thus far is great. It is great that the scheduler
>can almost handle xmms under all kinds of loads - but enough is enough.

I don't care if xmms skips or my mouse pointer stalls while I'm testing at 
the heavy end of the load scale, you flat can't have low latency and max 
throughput at the same time.  If xmms skips and the mouse becomes sticky at 
less than "heavy" though, something is wrong (defining heavy is one of 
those tricky judgement calls).  It's the mozilla loading a webpage type of 
reports that I worry about.

         -Mike 


* Re: What is interactivity? Re: [PATCH]O14int [SCHED_SOFTRR  please]
  2003-08-12  5:40             ` Mike Galbraith
@ 2003-08-12 15:29               ` Timothy Miller
  2003-08-13  1:43               ` Rob Landley
  1 sibling, 0 replies; 16+ messages in thread
From: Timothy Miller @ 2003-08-12 15:29 UTC (permalink / raw)
  To: Mike Galbraith; +Cc: Roger Larsson, linux-kernel



Mike Galbraith wrote:

> 
> If I leave the room, or even become distracted enough, xmms ceases to be 
> interactive.
> 

xmms skipping can be very distracting.  A small skip in the background 
brain stimulus can cause a major skip in the foreground concentration.


* Re: [PATCH]O14int
  2003-08-10  9:06     ` [PATCH]O14int Con Kolivas
@ 2003-08-12 17:56       ` Simon Kirby
  2003-08-12 21:21         ` [PATCH]O14int Con Kolivas
  0 siblings, 1 reply; 16+ messages in thread
From: Simon Kirby @ 2003-08-12 17:56 UTC (permalink / raw)
  To: Con Kolivas; +Cc: linux-kernel

On Sun, Aug 10, 2003 at 07:06:34PM +1000, Con Kolivas wrote:

> Is this with or without my changes? The old scheduler was not very scalable; 
> that's why we moved. The new one has other intrinsic issues that I (and 
> others) have been trying to address, but is much much more scalable. It was 
> not possible to make the old one more scalable, but it is possible to make 
> this one more interactive.

Without your changes.  Are you changing the design or just tuning certain
cases?  I was talking more about the theory behind the scheduling
decisions and not about particular cases.

The O(1) scheduler changes definitely help scalability and I don't have
any problem with that change (unless it introduced the behavior I'm
talking about).

Simon-

* Re: [PATCH]O14int
  2003-08-10 10:08     ` [PATCH]O14int William Lee Irwin III
@ 2003-08-12 18:36       ` Simon Kirby
  0 siblings, 0 replies; 16+ messages in thread
From: Simon Kirby @ 2003-08-12 18:36 UTC (permalink / raw)
  To: William Lee Irwin III, Con Kolivas, linux-kernel

On Sun, Aug 10, 2003 at 03:08:36AM -0700, William Lee Irwin III wrote:

> Most of this isn't of much concern; most of the 2.4.x semantics have
> largely been carried over to 2.6.x with algorithmic improvements, apart
> from the same-mm heuristic (which was of dubious value anyway). Even
> epochs are still there in the form of the duelling arrays, which
> renders the thing vaguely timeout-based like 2.4.x.

Hmm.  I admit I haven't read the code enough to understand really what is
going on -- I'm just guessing how it is working (and how it did work)
based on experiences I've had with it over the years.

> On Sun, Aug 10, 2003 at 01:48:27AM -0700, Simon Kirby wrote:
> > It seems that newer schedulers are now temporarily postponing the
> > waking up of other processes when the running process is running with
> > "preemptive" ticks, and that there's all sorts of hacks involved in
> > trying to hide the bad effects of this decision.
> 
> If this were deliberate it would be a "selfish" scheduling algorithm,
> where the delay in preemptively capturing the cpu is a number of ticks
> equal to whatever the value of beta/alpha was chosen to be, and some
> raw scheduling algorithm is used otherwise unaltered for those tasks in
> the service box. I see no evidence of such an organization (it'd be
> really obvious, as a queue box and service box would need to exist),
> hence this is probably just something in need of a performance tweak
> if it's a real problem.

Perhaps I should read the code to see what is actually going on (though
it is now fairly complex), but it definitely feels like this is
happening.  Why else would my keystrokes to an otherwise-idle rxvt be
delayed while my browser is rendering a page?  I suppose there may be
interactions with X.  This never used to happen, however.

The simple question: Does the scheduler ever intend to delay a context
switch to a process (which has been idle long enough to rebuild its
maximum timeslice) when a wake up event occurs?  If so, what is the
reasoning for this?

> > If this is indeed what is going on, what is the reasoning behind it?
> > I didn't really see any problems before with the simple scheduler, so
> > it seems to me like this may just be a hack to make poorly-written
> > applications seem to be a bit "faster" by starving other processes of
> > CPU when the poorly-written applications decide they want to do
> > something (such as rendering a page with a large table in Mozilla
> > -- grr).  Is this really making a large enough difference to be worth
> > all of this trouble?
> 
> Yes. The SMP issues addressed by the algorithmic improvements in the
> scheduler are performance issues so severe, they may safely be called
> functional issues.

Obviously the scheduler O(1) changes and other scalability improvements
are worthwhile, but I don't think (unless I'm missing something) they
explain the problem I'm seeing.

> On Sun, Aug 10, 2003 at 01:48:27AM -0700, Simon Kirby wrote:
> > To me it would seem the best algorithm would be what we had before all
> > of this started.  Isn't it best to switch to a task as soon as an event
> > (such as disk I/O finishing or a mouse move waking up X to read mouse
> > input) occurs for both latency and cache reasons (queued in LIFO
> > order)?  DMA may make some of this more complicated, I don't know.
> 
> This sounds like either LCFS or FB. FB's not usable out of the box for
> long-running tasks, as its context switch rates are excessive there.
> LCFS has some rather undesirable properties that render it unsuitable
> for general purpose operating systems. Something like multilevel
> processor sharing would be a much better alternative, as long-running
> tasks can be classified and scheduled according to a more appropriate
> discipline with a lower context switch rate while maintaining the
> (essentially infinitely) strong preference for short-running tasks.

What makes the context switches excessive?  As far as I can see, the
only things that can initiate a context switch are a process sleeping or
finishing, a timer tick and the scheduler deciding to switch, or a device
causing a wake up event.  I was also wondering: Isn't it best to always
switch to the process which has just had an event for cache coherency?

> > I am seeing similar starvation problems that others are seeing in these
> > threads.  At first it was whenever I clicked a link in Mozilla -- xmms
> > would stop, sometimes for a second or so, on a Celeron 466 MHz machine.
> > More recently I found that loading a web page consisting of several
> > large animated gif images (a security camera web page) caused
> > absolutely horrible jerking of mouse and keyboard input in all other
> > windows, even when the browser window was minimized or hidden.  What's
> > worse is the jerking tends to subside if I do a lot of typing or move
> > the mouse a lot, probably because I'm changing the scheduler's idea of
> > what "kind" of processes are running (which makes this stuff even
> > harder to debug).
> 
> One problem with these kinds of reports is that they aren't coming with
> enough information to determine if the scheduler truly is the cause of
> the problem, and worse yet, assuming the scheduler did cause these
> problems, this isn't enough actual information to address it. We're
> going to need proper instrumentation at some point here.

I can do this, but I'm not seeing inefficiency, I'm seeing large decision
problems.  If the context switches were up in the hundreds of thousands
or higher, I would understand, but they're in the low hundreds.  Isn't
top far too slow to figure out what is actually going on?  Also, kernel
time is less than 10 percent, so I don't think kernel profiles will help.
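
For anyone wanting a number for the context switch rate without going
through top, a small sketch follows. It relies only on the `ctxt` line in
/proc/stat (total context switches since boot); the function names and the
one-second interval are my own choices, nothing standard.

```python
import time


def parse_ctxt(proc_stat_text):
    """Return the cumulative context switch count from /proc/stat text."""
    for line in proc_stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "ctxt":
            return int(fields[1])
    raise ValueError("no ctxt line found")


def switches_per_second(before_text, after_text, interval_seconds):
    """Rate between two /proc/stat snapshots taken interval_seconds apart."""
    delta = parse_ctxt(after_text) - parse_ctxt(before_text)
    return delta / interval_seconds


def sample(interval_seconds=1.0, path="/proc/stat"):
    """Take two snapshots on a live Linux system and return the rate."""
    with open(path) as f:
        before = f.read()
    time.sleep(interval_seconds)
    with open(path) as f:
        after = f.read()
    return switches_per_second(before, after, interval_seconds)
```

Calling `sample()` in a loop while the browser is rendering gives a
system-wide rate only; it says nothing about which task was switched in or
how long a woken task had to wait.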

Maybe I'm dreaming, but shouldn't the scheduler be simple enough so that
it can be considered "obviously correct"?  ...Or close to that? :)

Simon-

* Re: [PATCH]O14int
  2003-08-12 17:56       ` [PATCH]O14int Simon Kirby
@ 2003-08-12 21:21         ` Con Kolivas
  0 siblings, 0 replies; 16+ messages in thread
From: Con Kolivas @ 2003-08-12 21:21 UTC (permalink / raw)
  To: Simon Kirby; +Cc: linux-kernel

On Wed, 13 Aug 2003 03:56, Simon Kirby wrote:
> On Sun, Aug 10, 2003 at 07:06:34PM +1000, Con Kolivas wrote:
> > Is this with or without my changes? The old scheduler was not very
> > scalable; that's why we moved. The new one has other intrinsic issues
> > that I (and others) have been trying to address, but is much much more
> > scalable. It was not possible to make the old one more scalable, but it
> > is possible to make this one more interactive.
>
> Without your changes.  Are you changing the design or just tuning certain
> cases?  I was talking more about the theory behind the scheduling
> decisions and not about particular cases.

I'm just changing the algorithm that gives priority boost or penalty, and 
creating code to further feedback into that algorithm.

> The O(1) scheduler changes definitely help scalability and I don't have
> any problem with that change (unless it introduced the behavior I'm
> talking about).


* Re: What is interactivity? Re: [PATCH]O14int [SCHED_SOFTRR  please]
  2003-08-12  5:40             ` Mike Galbraith
  2003-08-12 15:29               ` Timothy Miller
@ 2003-08-13  1:43               ` Rob Landley
  1 sibling, 0 replies; 16+ messages in thread
From: Rob Landley @ 2003-08-13  1:43 UTC (permalink / raw)
  To: Mike Galbraith, Roger Larsson; +Cc: linux-kernel

On Tuesday 12 August 2003 01:40, Mike Galbraith wrote:

> Well, interactivity can certainly be viewed like one of those tricky
> philosophy questions (bears farting in the woods, trees falling over etc;),
> but I consider any task which is connected to a human via any of our senses
> to be interactive.  Perhaps it's not a 100% accurate use of the term, but
> for lack of a better term...

"Interactivity"  is being used as a proxy for at least two different 
conditions: smooth spooling and snappy response to (possibly repeated) 
asynchronous wakeups.

The smooth spooler problem is where you're trying to input or output stuff at 
a constant rate, somewhere below your theoretical maximum capacity.  Sound 
output is like this.  Whether you're listening or not, the tree in the forest 
still falls.  A skip is a skip, the output could be being recorded to tape or 
who knows what.  Correctness here is empirical; if it skips something went 
wrong.

Sound is just one example, and a relatively easy one since the CPU 
requirements are so low on modern machines.  Personal Video Recorders ala 
Tivo are a more demanding application (often coming perilously close to your 
memory or disk bandwidth capacity), and skips or dropouts are saved for 
posterity there.  A human doesn't even have to be in the room, that task is 
still "interactive".

Repeated asynchronous wakeups come from typing on the keyboard and wiggling 
the mouse.  If your mouse is dragging a window, the asynchronous wakeups 
could provoke a lot of CPU activity.

The difference between these two is that they are different types of waits.  
Smooth spooling involves waiting for a known period of time, and being woken 
up by a timer.  Asynchronous wakeups come out of the blue, the application 
has no way of knowing the mouse is about to move or a key is about 
to be pressed until it happens.

(Some things combine these behaviors.  First person shooters (30 frames per 
second, plus responding to the joystick NOW), but that kind of thing could 
also collapse into the smooth spooler case if the frame rate's high enough 
and polling for input is cheap...)

True CPU hogs do block, but they only block when they're requesting more work.  
Any read or write to a block device is a "request more work" type of block, 
for example.  If the block device gets faster, the app runs faster.

With a CPU hog, there is no system so powerful that this thing won't try to 
speed to completion as fast as it can.  With an "interactive" task, the speed 
of the system is not the limiting factor (or at least shouldn't be).
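
To put rough numbers on that distinction, here is a back-of-the-envelope
sketch. The function names and the figures are invented for illustration;
"work units" stands in for whatever CPU effort the task needs.

```python
def hog_wall_time(work_units, units_per_second):
    """A CPU hog finishes as soon as the machine can get through the work."""
    return work_units / units_per_second


def spooler_wall_time(stream_seconds, work_units, units_per_second):
    """A spooler is paced by its stream; CPU only matters if it can't keep up."""
    return max(stream_seconds, work_units / units_per_second)


# Doubling machine speed halves the hog's run time...
print(hog_wall_time(100, 10), hog_wall_time(100, 20))
# ...but a three-minute song still takes three minutes to play.
print(spooler_wall_time(180, 18, 10), spooler_wall_time(180, 18, 20))
```

If the second argument to max() ever wins, the machine is too slow for the
stream and you get skips no matter what the scheduler does.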

Now there's a lot of fuzzy bits where you can't tell what kind of block you're 
doing.  Blocking on the network, blocking on pipes, etc.  Could be anything.  
But I think it's pretty safe to say that a timer is always an interactive 
wait, and a block device never is.  (And considering that the I/O scheduler 
and the CPU scheduler may have to work together in the future to make things 
like the anticipatory scheduler work properly, it shouldn't be TOO much of a 
stretch to distinguish between waiting on a block device and waiting on 
something else...)
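
As a sketch of what that heuristic might look like, with the caveat that
this is not how any existing scheduler works: the wait-reason strings and
the labels below are hypothetical, and everything other than timer and
block device waits is deliberately left as "can't tell".

```python
from collections import Counter


def classify_by_waits(wait_reasons):
    """Guess a task's type from the reasons it blocked (illustrative only).

    wait_reasons: iterable of strings such as "timer", "block_device",
    "network" or "pipe". Only the first two are treated as informative.
    """
    counts = Counter(wait_reasons)
    timer_waits = counts["timer"]          # timer sleeps: interactive-style wait
    block_waits = counts["block_device"]   # block I/O: "request more work" wait
    if timer_waits > block_waits:
        return "interactive"
    if block_waits > timer_waits:
        return "cpu_hog"
    return "unknown"


print(classify_by_waits(["timer", "timer", "pipe"]))
print(classify_by_waits(["block_device", "network", "block_device"]))
print(classify_by_waits(["network", "pipe"]))
```

A plain majority vote is the crudest possible rule; the fuzzy cases
(network, pipes) are exactly where something smarter would be needed.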

> >I think that the work done this far is great. It is great that the
> > scheduler almost can handle xmms under all kinds of loads - but enough is
> > enough.
>
> I don't care if xmms skips or my mouse pointer stalls while I'm testing at
> the heavy end of the load scale,

I do.  I believe you're in the minority here.

> you flat can't have low latency and max
> throughput at the same time.

If you're talking about keeping your cache hot, I agree.  But a lot of times, 
minimizing latency DOES help throughput.  (Anticipatory scheduler, case in 
point. :)

What you're saying is that you want your CPU hog loads to complete as quickly 
as possible at the expense of smooth mouse movement.  This is what "nice" is 
for, isn't it?  (If you've got a dedicated, throughput-optimized server 
running X in the first place, you have more fundamental problems.)

And your uber-optimized configuration is still going to lose out to an 
unoptimized configuration running on hardware that's three months newer... :)

The linux-kernel gurus focused their optimizations almost exclusively on 
throughput for almost the first full decade of kernel development.  
Interactive latency started explicitly showing up as a concern in 2.4, and 
has only really become a priority in 2.5.  There are a few tradeoffs, but 
some of them are a bit overdue if you ask me.

If you can document a throughput degradation and give a repeatable benchmark, 
I'm sure Con and Ingo will be thrilled to address it.  A lot of contest is 
about throughput, you know.  They're trying very hard to avoid regressions...

>  If xmms skips and the mouse becomes sticky at
> less than "heavy" though, something is wrong (defining heavy is one of
> those tricky judgement calls).

You know, I used to beat OS/2 to DEATH, and the mouse never went funky on me.  
(Of course the mouse was updated directly from an interrupt routine in kernel 
memory that never swapped out.  But still... :)

> It's the mozilla loading a webpage type of reports that I worry about.

It could be worse.  It could be OpenOffice. :)

>          -Mike

Rob

end of thread, other threads:[~2003-08-13  6:35 UTC | newest]

Thread overview: 16+ messages
2003-08-08 20:08 [PATCH]O14int Voluspa
2003-08-09  0:36 ` [PATCH]O14int Con Kolivas
2003-08-10  8:48   ` [PATCH]O14int Simon Kirby
2003-08-10  9:06     ` [PATCH]O14int Con Kolivas
2003-08-12 17:56       ` [PATCH]O14int Simon Kirby
2003-08-12 21:21         ` [PATCH]O14int Con Kolivas
2003-08-10 10:08     ` [PATCH]O14int William Lee Irwin III
2003-08-12 18:36       ` [PATCH]O14int Simon Kirby
2003-08-10 11:17     ` [PATCH]O14int Mike Galbraith
2003-08-11 18:19       ` [PATCH]O14int [SCHED_SOFTRR please] Roger Larsson
2003-08-11 21:53         ` Con Kolivas
     [not found]       ` <200308112019.38613.roger.larsson@skelleftea.mail.telia.com >
2003-08-11 19:46         ` Mike Galbraith
2003-08-12  0:26           ` What is interactivity? " Roger Larsson
     [not found]           ` <200308120226.35580.roger.larsson@skelleftea.mail.telia.com >
2003-08-12  5:40             ` Mike Galbraith
2003-08-12 15:29               ` Timothy Miller
2003-08-13  1:43               ` Rob Landley
