2.6 vs 2.4, ssh terminal slowdown

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

* 2.6 vs 2.4, ssh terminal slowdown
@ 2006-02-08  2:11 Grant Coady
  2006-02-08  2:24 ` Benjamin LaHaise
  2006-02-08  2:35 ` Con Kolivas
  0 siblings, 2 replies; 41+ messages in thread
From: Grant Coady @ 2006-02-08  2:11 UTC (permalink / raw)
  To: linux-kernel

Hi there,

grant@deltree:~$ uname -r
2.6.15.3a
grant@deltree:~$ time grep -v 192\.168\. /var/log/apache/access_log| cut -c-95
...
2006-02-08 12:38:13 +1100: bugsplatter.mine.nu 193.196.182.215 "GET /test/linux-2.6/tosh/ HTTP/

real    0m8.537s
user    0m0.970s
sys     0m1.100s

--> reboot to 2.4.32-hf32.2

grant@deltree:~$ uname -r
2.4.32-hf32.2
grant@deltree:~$ time grep -v 192\.168\. /var/log/apache/access_log| cut -c-95
...
2006-02-08 12:38:13 +1100: bugsplatter.mine.nu 193.196.182.215 "GET /test/linux-2.6/tosh/ HTTP/

real    0m2.271s
user    0m0.730s
sys     0m0.540s

Still a 4:1 slowdown, machine .config and dmesg info:
  http://bugsplatter.mine.nu/test/boxen/deltree/

While it is very nice to know I can run recent 2.6 kernels without a fuss, 
2.4.latest feels better in this particular context.

This console sluggishness is noticeable enough on older hardware for me to 
forgo exercising 2.6.latest.stable bugs for much time on it ;)

For those suffering deja vu, yes, I reported this last month (or, recently).

Grant.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: 2.6 vs 2.4, ssh terminal slowdown
  2006-02-08  2:11 2.6 vs 2.4, ssh terminal slowdown Grant Coady
@ 2006-02-08  2:24 ` Benjamin LaHaise
  2006-02-08  2:50   ` Grant Coady
  2006-02-08  2:35 ` Con Kolivas
  1 sibling, 1 reply; 41+ messages in thread
From: Benjamin LaHaise @ 2006-02-08  2:24 UTC (permalink / raw)
  To: Grant Coady; +Cc: linux-kernel

On Wed, Feb 08, 2006 at 01:11:49PM +1100, Grant Coady wrote:
> This console sluggishness is noticeable enough on older hardware for me to 
> forgo exercising 2.6.latest.stable bugs for much time on it ;)
> 
> For those suffering deja vu, yes, I reported this last month (or, recently).

This bug report is a bit vague in terms of what the problem is -- the 
test case hits 3 major subsystems (io, vm, net), all of which have changed 
rather substantially in the course of 2.6 development.  Would it be possible 
to profile the system using oprofile to get an idea what the hotspots are?  
Have you compared basic hard disk throughput with hdparm, as well as 
ensuring DMA is enabled with 32 bit io?  What about testing network 
performance with netperf (or a netcat of /dev/zero)?  A few more data points 
would be quite helpful.  Cheers,

		-ben
-- 
"Ladies and gentlemen, I'm sorry to interrupt, but the police are here 
and they've asked us to stop the party."  Don't Email: <dont@kvack.org>.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: 2.6 vs 2.4, ssh terminal slowdown
  2006-02-08  2:11 2.6 vs 2.4, ssh terminal slowdown Grant Coady
  2006-02-08  2:24 ` Benjamin LaHaise
@ 2006-02-08  2:35 ` Con Kolivas
  2006-02-08  2:55   ` Grant Coady
  2006-02-09 17:06   ` Jan Engelhardt
  1 sibling, 2 replies; 41+ messages in thread
From: Con Kolivas @ 2006-02-08  2:35 UTC (permalink / raw)
  To: gcoady; +Cc: linux-kernel

On Wed, 8 Feb 2006 01:11 pm, Grant Coady wrote:
> Hi there,
>
> grant@deltree:~$ uname -r
> 2.6.15.3a
> grant@deltree:~$ time grep -v 192\.168\. /var/log/apache/access_log| cut
> -c-95 ...
> 2006-02-08 12:38:13 +1100: bugsplatter.mine.nu 193.196.182.215 "GET
> /test/linux-2.6/tosh/ HTTP/
>
> real    0m8.537s
> user    0m0.970s
> sys     0m1.100s
>
> --> reboot to 2.4.32-hf32.2
>
> grant@deltree:~$ uname -r
> 2.4.32-hf32.2
> grant@deltree:~$ time grep -v 192\.168\. /var/log/apache/access_log| cut
> -c-95 ...
> 2006-02-08 12:38:13 +1100: bugsplatter.mine.nu 193.196.182.215 "GET
> /test/linux-2.6/tosh/ HTTP/
>
> real    0m2.271s
> user    0m0.730s
> sys     0m0.540s
>
> Still a 4:1 slowdown, machine .config and dmesg info:
>   http://bugsplatter.mine.nu/test/boxen/deltree/

What happens if you add "| cat" on the end of your command?

Cheers,
Con

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: 2.6 vs 2.4, ssh terminal slowdown
  2006-02-08  2:24 ` Benjamin LaHaise
@ 2006-02-08  2:50   ` Grant Coady
  2006-02-08  3:02     ` Benjamin LaHaise
  0 siblings, 1 reply; 41+ messages in thread
From: Grant Coady @ 2006-02-08  2:50 UTC (permalink / raw)
  To: Benjamin LaHaise; +Cc: linux-kernel

On Tue, 7 Feb 2006 21:24:11 -0500, Benjamin LaHaise <bcrl@kvack.org> wrote:

>On Wed, Feb 08, 2006 at 01:11:49PM +1100, Grant Coady wrote:
>> This console sluggishness is noticeable enough on older hardware for me to 
>> forgo exercising 2.6.latest.stable bugs for much time on it ;)
>> 
>> For those suffering deja vu, yes, I reported this last month (or, recently).
>
>This bug report is a bit vague in terms of what the problem is -- the 
>test case hits 3 major subsystems (io, vm, net), all of which have changed 
>rather substantially in the course of 2.6 development.

Vague 'cos I do not know where the problem is.  One might say slowdown 
is like a near a 1ms delay per line output, but slowdown does not 
correlate to kernel tick frequency.  :(

> Would it be possible 
>to profile the system using oprofile to get an idea what the hotspots are?  
Perhaps, I've yet to try that.  

>Have you compared basic hard disk throughput with hdparm, as well as 
>ensuring DMA is enabled with 32 bit io?  What about testing network 
>performance with netperf (or a netcat of /dev/zero)?  A few more data points 
>would be quite helpful.

Yes, the gross datapoints such as basic net i/o, disk i/o, say 2.6 is 
better/faster than 2.4 on this hardware, my problem is how to describe 
measured console slowdown in terms meaningful to you. 

I'll take a look at oprofile, report back if I can make sense of it ;)

Grant.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: 2.6 vs 2.4, ssh terminal slowdown
  2006-02-08  2:35 ` Con Kolivas
@ 2006-02-08  2:55   ` Grant Coady
  2006-02-08  3:00     ` Con Kolivas
  2006-02-09 17:06   ` Jan Engelhardt
  1 sibling, 1 reply; 41+ messages in thread
From: Grant Coady @ 2006-02-08  2:55 UTC (permalink / raw)
  To: Con Kolivas; +Cc: linux-kernel

On Wed, 8 Feb 2006 13:35:18 +1100, Con Kolivas <kernel@kolivas.org> wrote:

>On Wed, 8 Feb 2006 01:11 pm, Grant Coady wrote:
>> Hi there,
>>
>> grant@deltree:~$ uname -r
>> 2.6.15.3a
>> grant@deltree:~$ time grep -v 192\.168\. /var/log/apache/access_log| cut
>> -c-95 ...
>> 2006-02-08 12:38:13 +1100: bugsplatter.mine.nu 193.196.182.215 "GET
>> /test/linux-2.6/tosh/ HTTP/
>>
>> real    0m8.537s
>> user    0m0.970s
>> sys     0m1.100s
>>
>> --> reboot to 2.4.32-hf32.2
>>
>> grant@deltree:~$ uname -r
>> 2.4.32-hf32.2
>> grant@deltree:~$ time grep -v 192\.168\. /var/log/apache/access_log| cut
>> -c-95 ...
>> 2006-02-08 12:38:13 +1100: bugsplatter.mine.nu 193.196.182.215 "GET
>> /test/linux-2.6/tosh/ HTTP/
>>
>> real    0m2.271s
>> user    0m0.730s
>> sys     0m0.540s
>>
>> Still a 4:1 slowdown, machine .config and dmesg info:
>>   http://bugsplatter.mine.nu/test/boxen/deltree/
>
>What happens if you add "| cat" on the end of your command?

It gets faster with 2.4.32-hf32.2 by a little bit (I forgot to copy)

reboot to 2.6.15.3a, without...

real    0m8.737s
user    0m1.030s
sys     0m1.200s

with...  oh shit / surprise!!

real    0m1.861s
user    0m0.560s
sys     0m0.370s

What is that telling me / you / us?

Thanks,
Grant
>
>Cheers,
>Con


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: 2.6 vs 2.4, ssh terminal slowdown
  2006-02-08  2:55   ` Grant Coady
@ 2006-02-08  3:00     ` Con Kolivas
  2006-02-08  4:12       ` Barry K. Nathan
                         ` (2 more replies)
  0 siblings, 3 replies; 41+ messages in thread
From: Con Kolivas @ 2006-02-08  3:00 UTC (permalink / raw)
  To: gcoady; +Cc: linux-kernel

On Wed, 8 Feb 2006 01:55 pm, Grant Coady wrote:
> On Wed, 8 Feb 2006 13:35:18 +1100, Con Kolivas <kernel@kolivas.org> wrote:
> >On Wed, 8 Feb 2006 01:11 pm, Grant Coady wrote:
> >> Hi there,
> >>
> >> grant@deltree:~$ uname -r
> >> 2.6.15.3a
> >> grant@deltree:~$ time grep -v 192\.168\. /var/log/apache/access_log| cut
> >> -c-95 ...
> >> 2006-02-08 12:38:13 +1100: bugsplatter.mine.nu 193.196.182.215 "GET
> >> /test/linux-2.6/tosh/ HTTP/
> >>
> >> real    0m8.537s
> >> user    0m0.970s
> >> sys     0m1.100s
> >>
> >> --> reboot to 2.4.32-hf32.2
> >>
> >> grant@deltree:~$ uname -r
> >> 2.4.32-hf32.2
> >> grant@deltree:~$ time grep -v 192\.168\. /var/log/apache/access_log| cut
> >> -c-95 ...
> >> 2006-02-08 12:38:13 +1100: bugsplatter.mine.nu 193.196.182.215 "GET
> >> /test/linux-2.6/tosh/ HTTP/
> >>
> >> real    0m2.271s
> >> user    0m0.730s
> >> sys     0m0.540s
> >>
> >> Still a 4:1 slowdown, machine .config and dmesg info:
> >>   http://bugsplatter.mine.nu/test/boxen/deltree/
> >
> >What happens if you add "| cat" on the end of your command?
>
> It gets faster with 2.4.32-hf32.2 by a little bit (I forgot to copy)
>
> reboot to 2.6.15.3a, without...
>
> real    0m8.737s
> user    0m1.030s
> sys     0m1.200s
>
> with...  oh shit / surprise!!
>
> real    0m1.861s
> user    0m0.560s
> sys     0m0.370s
>
> What is that telling me / you / us?

Heh.

This is the terminal's fault. xterm et al use an algorithm to determine how 
fast your machine is and decide whether to jump scroll or smooth scroll. This 
algorithm is basically broken with the 2.6 scheduler and it decides to mostly 
smooth scroll.

Cheers,
Con

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: 2.6 vs 2.4, ssh terminal slowdown
  2006-02-08  2:50   ` Grant Coady
@ 2006-02-08  3:02     ` Benjamin LaHaise
  0 siblings, 0 replies; 41+ messages in thread
From: Benjamin LaHaise @ 2006-02-08  3:02 UTC (permalink / raw)
  To: Grant Coady; +Cc: linux-kernel

On Wed, Feb 08, 2006 at 01:50:10PM +1100, Grant Coady wrote:
> Vague 'cos I do not know where the problem is.  One might say slowdown 
> is like a near a 1ms delay per line output, but slowdown does not 
> correlate to kernel tick frequency.  :(

Two things come to mind: can you try doing a vmstat 1 while running the 
test and compare 2.4 vs 2.6?  Also, does it make a difference if you switch 
from the e100 driver to eepro100?

> I'll take a look at oprofile, report back if I can make sense of it ;)

If the CPU is pegged that will guide fixing things quite nicely, but the 
fact that it's 1ms per line sounds like something more sinister.

		-ben
-- 
"Ladies and gentlemen, I'm sorry to interrupt, but the police are here 
and they've asked us to stop the party."  Don't Email: <dont@kvack.org>.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: 2.6 vs 2.4, ssh terminal slowdown
  2006-02-08  3:00     ` Con Kolivas
@ 2006-02-08  4:12       ` Barry K. Nathan
  2006-02-08  4:41         ` Grant Coady
  2006-02-08  4:51       ` Grant Coady
  2006-02-08  7:43       ` Lee Revell
  2 siblings, 1 reply; 41+ messages in thread
From: Barry K. Nathan @ 2006-02-08  4:12 UTC (permalink / raw)
  To: Con Kolivas; +Cc: gcoady, linux-kernel

On 2/7/06, Con Kolivas <kernel@kolivas.org> wrote:
> This is the terminal's fault. xterm et al use an algorithm to determine how
> fast your machine is and decide whether to jump scroll or smooth scroll. This
> algorithm is basically broken with the 2.6 scheduler and it decides to mostly
> smooth scroll.

Recent versions of xterm are supposed to fix this. (Skimming xterm's
changelog, I think it might have been fixed in version 201, but I'm
not completely sure.)
--
-Barry K. Nathan <barryn@pobox.com>

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: 2.6 vs 2.4, ssh terminal slowdown
  2006-02-08  4:12       ` Barry K. Nathan
@ 2006-02-08  4:41         ` Grant Coady
  0 siblings, 0 replies; 41+ messages in thread
From: Grant Coady @ 2006-02-08  4:41 UTC (permalink / raw)
  To: Barry K. Nathan; +Cc: Con Kolivas, linux-kernel

On Tue, 7 Feb 2006 20:12:40 -0800, "Barry K. Nathan" <barryn@pobox.com> wrote:

>On 2/7/06, Con Kolivas <kernel@kolivas.org> wrote:
>> This is the terminal's fault. xterm et al use an algorithm to determine how
>> fast your machine is and decide whether to jump scroll or smooth scroll. This
>> algorithm is basically broken with the 2.6 scheduler and it decides to mostly
>> smooth scroll.
>
>Recent versions of xterm are supposed to fix this. (Skimming xterm's
>changelog, I think it might have been fixed in version 201, but I'm
>not completely sure.)

Yeah but I'm using ssh to PuTTY on windoze.  No GUI on linux box.

Grant.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: 2.6 vs 2.4, ssh terminal slowdown
  2006-02-08  3:00     ` Con Kolivas
  2006-02-08  4:12       ` Barry K. Nathan
@ 2006-02-08  4:51       ` Grant Coady
  2006-02-08  5:17         ` Willy Tarreau
  2006-02-08  7:43       ` Lee Revell
  2 siblings, 1 reply; 41+ messages in thread
From: Grant Coady @ 2006-02-08  4:51 UTC (permalink / raw)
  To: Con Kolivas; +Cc: linux-kernel

On Wed, 8 Feb 2006 14:00:59 +1100, Con Kolivas <kernel@kolivas.org> wrote:

>This is the terminal's fault. xterm et al use an algorithm to determine how 
>fast your machine is and decide whether to jump scroll or smooth scroll. This 
>algorithm is basically broken with the 2.6 scheduler and it decides to mostly 
>smooth scroll.

Strange it does that over localnet to a PuTTY terminal on windoze.

Seems a strange thing to do in the kernel though, presentation 
buffering / management surely can be done in userspace?

Grant.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: 2.6 vs 2.4, ssh terminal slowdown
  2006-02-08  4:51       ` Grant Coady
@ 2006-02-08  5:17         ` Willy Tarreau
  2006-02-08  5:39           ` Grant Coady
  0 siblings, 1 reply; 41+ messages in thread
From: Willy Tarreau @ 2006-02-08  5:17 UTC (permalink / raw)
  To: Grant Coady; +Cc: Con Kolivas, linux-kernel

Hi Grant,

On Wed, Feb 08, 2006 at 03:51:24PM +1100, Grant Coady wrote:
> On Wed, 8 Feb 2006 14:00:59 +1100, Con Kolivas <kernel@kolivas.org> wrote:
> 
> >This is the terminal's fault. xterm et al use an algorithm to determine how 
> >fast your machine is and decide whether to jump scroll or smooth scroll. This 
> >algorithm is basically broken with the 2.6 scheduler and it decides to mostly 
> >smooth scroll.
> 
> Strange it does that over localnet to a PuTTY terminal on windoze.
> 
> Seems a strange thing to do in the kernel though, presentation 
> buffering / management surely can be done in userspace?

I suspect the sshd on the firewall gets woken up for each line and it
behaves exactly like an xterm. After having done a lot of "ls -l|cat"
on 2.6, I'm not surprized at all :-/

A good test would be to strace sshd under 2.4 and 2.6. You could even
use strace -tt. Probably that you will see something like 1 ms between
two reads on 2.6 and nearly nothing between them in 2.4.

> Grant.

Cheers,
Willy


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: 2.6 vs 2.4, ssh terminal slowdown
  2006-02-08  5:17         ` Willy Tarreau
@ 2006-02-08  5:39           ` Grant Coady
  0 siblings, 0 replies; 41+ messages in thread
From: Grant Coady @ 2006-02-08  5:39 UTC (permalink / raw)
  To: Willy Tarreau; +Cc: Con Kolivas, linux-kernel

Hi Willy,
On Wed, 8 Feb 2006 06:17:09 +0100, Willy Tarreau <willy@w.ods.org> wrote:

>Hi Grant,
>
>On Wed, Feb 08, 2006 at 03:51:24PM +1100, Grant Coady wrote:
>> On Wed, 8 Feb 2006 14:00:59 +1100, Con Kolivas <kernel@kolivas.org> wrote:
>> 
>> >This is the terminal's fault. xterm et al use an algorithm to determine how 
>> >fast your machine is and decide whether to jump scroll or smooth scroll. This 
>> >algorithm is basically broken with the 2.6 scheduler and it decides to mostly 
>> >smooth scroll.
>> 
>> Strange it does that over localnet to a PuTTY terminal on windoze.
>> 
>> Seems a strange thing to do in the kernel though, presentation 
>> buffering / management surely can be done in userspace?
>
>I suspect the sshd on the firewall gets woken up for each line and it
>behaves exactly like an xterm. After having done a lot of "ls -l|cat"
>on 2.6, I'm not surprized at all :-/
>
>A good test would be to strace sshd under 2.4 and 2.6. You could even
>use strace -tt. Probably that you will see something like 1 ms between
>two reads on 2.6 and nearly nothing between them in 2.4.

Yes, it is nearly 1ms per line delay with 2.6, but 2.4 and 2.6 with the 
trailing '|cat' give similar times, didn't try that notion last time.  

We know now it isn't the network cards, disk I/O, just an oddness in 2.6 ;)

Cheers,
Grant.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: 2.6 vs 2.4, ssh terminal slowdown
  2006-02-08  3:00     ` Con Kolivas
  2006-02-08  4:12       ` Barry K. Nathan
  2006-02-08  4:51       ` Grant Coady
@ 2006-02-08  7:43       ` Lee Revell
  2 siblings, 0 replies; 41+ messages in thread
From: Lee Revell @ 2006-02-08  7:43 UTC (permalink / raw)
  To: Con Kolivas; +Cc: gcoady, linux-kernel

On Wed, 2006-02-08 at 14:00 +1100, Con Kolivas wrote:
> This is the terminal's fault. xterm et al use an algorithm to
> determine how fast your machine is and decide whether to jump scroll
> or smooth scroll. This algorithm is basically broken with the 2.6
> scheduler and it decides to mostly smooth scroll.
> 

Hmm, I've been having a similar problem for ages.  If I just do "ls" in
my home directory 10 or 20 times, approximately 20% of the time it's
"fast":

real    0m0.177s
user    0m0.028s
sys     0m0.027s

And the rest of the times it's "slow":

real    0m1.240s
user    0m0.036s
sys     0m0.040s

I rarely get anything in between - it's either ~1.2s or ~0.2s.

"time ls | cat" is always fast - 0.18 - 0.35s.

real    0m0.188s
user    0m0.014s
sys     0m0.018s

It has been this way as long as I can remember and it never made
sense...

Lee


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: 2.6 vs 2.4, ssh terminal slowdown
  2006-02-08  2:35 ` Con Kolivas
  2006-02-08  2:55   ` Grant Coady
@ 2006-02-09 17:06   ` Jan Engelhardt
  2006-02-09 20:06     ` Lee Revell
  1 sibling, 1 reply; 41+ messages in thread
From: Jan Engelhardt @ 2006-02-09 17:06 UTC (permalink / raw)
  To: Con Kolivas; +Cc: gcoady, linux-kernel

>> grant@deltree:~$ time grep -v 192\.168\. /var/log/apache/access_log| cut
>> -c-95 ...
>
>What happens if you add "| cat" on the end of your command?
>
Do you think it's the new pipe buffering thing? (Introduced 2.6.10-.12, 
don't remember exactly)


Jan Engelhardt
-- 

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: 2.6 vs 2.4, ssh terminal slowdown
  2006-02-09 17:06   ` Jan Engelhardt
@ 2006-02-09 20:06     ` Lee Revell
  2006-02-10  6:35       ` MIke Galbraith
  0 siblings, 1 reply; 41+ messages in thread
From: Lee Revell @ 2006-02-09 20:06 UTC (permalink / raw)
  To: Jan Engelhardt; +Cc: Con Kolivas, gcoady, linux-kernel

On Thu, 2006-02-09 at 18:06 +0100, Jan Engelhardt wrote:
> >> grant@deltree:~$ time grep -v 192\.168\. /var/log/apache/access_log| cut
> >> -c-95 ...
> >
> >What happens if you add "| cat" on the end of your command?
> >
> Do you think it's the new pipe buffering thing? (Introduced 2.6.10-.12, 
> don't remember exactly)

If it's the same problem I've been seeing it goes back much farther than
2.6.10.

Lately I suspect the scheduler.

Lee


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: 2.6 vs 2.4, ssh terminal slowdown
  2006-02-09 20:06     ` Lee Revell
@ 2006-02-10  6:35       ` MIke Galbraith
  2006-02-12 13:47         ` MIke Galbraith
  0 siblings, 1 reply; 41+ messages in thread
From: MIke Galbraith @ 2006-02-10  6:35 UTC (permalink / raw)
  To: Lee Revell; +Cc: Jan Engelhardt, Con Kolivas, gcoady, linux-kernel

On Thu, 2006-02-09 at 15:06 -0500, Lee Revell wrote:
> On Thu, 2006-02-09 at 18:06 +0100, Jan Engelhardt wrote:
> > >> grant@deltree:~$ time grep -v 192\.168\. /var/log/apache/access_log| cut
> > >> -c-95 ...
> > >
> > >What happens if you add "| cat" on the end of your command?
> > >
> > Do you think it's the new pipe buffering thing? (Introduced 2.6.10-.12, 
> > don't remember exactly)
> 
> If it's the same problem I've been seeing it goes back much farther than
> 2.6.10.
> 
> Lately I suspect the scheduler.

Hmm.  I ran into an oddity while testing a modified kernel, and see
something in schedule() that I don't think is right...

Down where it does requeue_task(next, array) if a freshly awakened task
is to possibly receive a priority boost for the time it sat on the
runqueue, I see a potential problem.  If the task didn't sit on the
queue long enough to be promoted, and isn't at the very top, it is going
to the back of the bus as soon it gets preempted by say xmms.  For a
task that possibly just sat through the full rotation of a busy queue
waiting for a shot at the cpu, that has got to hurt.  Speculating, that
requeue looks like it's there to increase the queue rotation rate, ie to
reduce latency, but it looks to me like it can also accomplish the
opposite if the context switch rate for your queue isn't very high.

... I ended up sharing a queue with a few rampaging irman2 threads, and
each keystroke took ages.  [btw, i wonder how the heck next->array could
not be rq->active there]  

	-Mike

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: 2.6 vs 2.4, ssh terminal slowdown
  2006-02-10  6:35       ` MIke Galbraith
@ 2006-02-12 13:47         ` MIke Galbraith
  2006-02-12 19:03           ` Lee Revell
  0 siblings, 1 reply; 41+ messages in thread
From: MIke Galbraith @ 2006-02-12 13:47 UTC (permalink / raw)
  To: Lee Revell; +Cc: Jan Engelhardt, Con Kolivas, gcoady, linux-kernel, Ingo Molnar

[-- Attachment #1: Type: text/plain, Size: 20672 bytes --]

On Fri, 2006-02-10 at 07:35 +0100, MIke Galbraith wrote:
> On Thu, 2006-02-09 at 15:06 -0500, Lee Revell wrote:
> > On Thu, 2006-02-09 at 18:06 +0100, Jan Engelhardt wrote:
> > > >> grant@deltree:~$ time grep -v 192\.168\. /var/log/apache/access_log| cut
> > > >> -c-95 ...
> > > >
> > > >What happens if you add "| cat" on the end of your command?
> > > >
> > > Do you think it's the new pipe buffering thing? (Introduced 2.6.10-.12, 
> > > don't remember exactly)
> > 
> > If it's the same problem I've been seeing it goes back much farther than
> > 2.6.10.
> > 
> > Lately I suspect the scheduler.
> 
> Hmm.  I ran into an oddity while testing a modified kernel, and see
> something in schedule() that I don't think is right...
> 
> Down where it does requeue_task(next, array) if a freshly awakened task
> is to possibly receive a priority boost for the time it sat on the
> runqueue, I see a potential problem.  If the task didn't sit on the
> queue long enough to be promoted, and isn't at the very top, it is going
> to the back of the bus as soon it gets preempted by say xmms.  For a
> task that possibly just sat through the full rotation of a busy queue
> waiting for a shot at the cpu, that has got to hurt.  Speculating, that
> requeue looks like it's there to increase the queue rotation rate, ie to
> reduce latency, but it looks to me like it can also accomplish the
> opposite if the context switch rate for your queue isn't very high.
> 
> ... I ended up sharing a queue with a few rampaging irman2 threads, and
> each keystroke took ages.  [btw, i wonder how the heck next->array could
> not be rq->active there]  

I guess you didn't try my pseudo suggestion.  Since I happen to be
actively tinkering in this very area, I'll be a bit more direct :)

If you think it's the scheduler, how about try the patch below.  It's
against 2.6.16-rc2-mm1, and should tell you if it is the interactivity
logic in the scheduler or not.  I don't see other candidates in there,
not that that means there aren't any of course.

With this patch in place, running an irman2 in one window, a make -j4
over nfs in another, and multimedia_sim as a test application in
another, I get these results...
 
[mikeg@Homer]:> ./multimedia_sim 0 60
nice_level = 0
duration = 60 seconds
[frames] received: 1784 dropped: 0
[latency] mean: 0.000627 max: 0.019647 stddev: 0.000786
score: 0.004240

.... test proggy attached for inspection.

To be extra sure, set both /proc/sys/kernel/sched_g1 and g2 to 0.  That
will (I mean should of course;) more or less restore the original O(1)
scheduler behavior.

	-Mike

Maybe not pretty, but effective counts too...

--- linux-2.6.16-rc2-mm1x/include/linux/sched.h.org	2006-02-09 13:15:50.000000000 +0100
+++ linux-2.6.16-rc2-mm1x/include/linux/sched.h	2006-02-09 13:16:30.000000000 +0100
@@ -721,14 +721,14 @@
 
 	unsigned short ioprio;
 
-	unsigned long sleep_avg;
+	unsigned long sleep_avg, last_slice, throttle_stamp;
 	unsigned long long timestamp, last_ran;
 	unsigned long long sched_time; /* sched_clock time spent running */
 	enum sleep_type sleep_type;
 
 	unsigned long policy;
 	cpumask_t cpus_allowed;
-	unsigned int time_slice, first_time_slice;
+	unsigned int time_slice, slice_info;
 
 #ifdef CONFIG_SCHEDSTATS
 	struct sched_info sched_info;
--- linux-2.6.16-rc2-mm1x/include/linux/sysctl.h.org	2006-02-09 13:16:02.000000000 +0100
+++ linux-2.6.16-rc2-mm1x/include/linux/sysctl.h	2006-02-09 13:16:30.000000000 +0100
@@ -147,6 +147,8 @@
 	KERN_SETUID_DUMPABLE=69, /* int: behaviour of dumps for setuid core */
 	KERN_SPIN_RETRY=70,	/* int: number of spinlock retries */
 	KERN_ACPI_VIDEO_FLAGS=71, /* int: flags for setting up video after ACPI sleep */
+	KERN_SCHED_THROTTLE1=72,  /* int: throttling grace period 1 in secs */
+	KERN_SCHED_THROTTLE2=73,  /* int: throttling grace period 2 in secs */
 };
 
 
--- linux-2.6.16-rc2-mm1x/kernel/sched.c.org	2006-02-09 13:15:04.000000000 +0100
+++ linux-2.6.16-rc2-mm1x/kernel/sched.c	2006-02-10 16:55:09.000000000 +0100
@@ -158,9 +158,195 @@
 #define TASK_INTERACTIVE(p) \
 	((p)->prio <= (p)->static_prio - DELTA(p))
 
-#define INTERACTIVE_SLEEP(p) \
-	(JIFFIES_TO_NS(MAX_SLEEP_AVG * \
-		(MAX_BONUS / 2 + DELTA((p)) + 1) / MAX_BONUS - 1))
+/*
+ * Interactivity boost can lead to serious starvation problems if the
+ * task being boosted turns out to be a cpu hog.  To combat this, we
+ * compute a running slice_avg, which is the sane upper limit for the
+ * task's sleep_avg.  If an 'interactive' task begins burning cpu, it's
+ * slice_avg will decay, making it visible as a problem so corrective
+ * measures can be applied.
+ *
+ * /proc/sys/kernel tunables.
+ *
+ * sched_g1: Grace period in seconds that a task is allowed to run unchecked.
+ * sched_g2: seconds thereafter, to force a priority adjustment.
+ */
+
+int sched_g1 = 20;
+int sched_g2 = 10;
+
+/*
+ * Offset from the time we noticed a potential problem until we disable the
+ * interactive bonus multiplier, and adjust sleep_avg consumption rate.
+ */
+#define G1 (sched_g1 * HZ)
+
+/*
+ * Offset thereafter that we disable the interactive bonus divisor, and adjust
+ * a runaway task's priority.
+ */
+#define G2 (sched_g2 * HZ + G1)
+
+/*
+ * Grace period has expired.
+ */
+#define grace_expired(p, grace) ((p)->throttle_stamp && \
+	time_after_eq(jiffies, (p)->throttle_stamp + (grace)))
+
+#define NEXT_PRIO (NS_MAX_SLEEP_AVG / MAX_BONUS)
+
+/*
+ * Warning: do not reduce threshold below NS_MAX_SLEEP_AVG / MAX_BONUS
+ * else you may break the case where one of a pair of communicating tasks
+ * only sleeps a miniscule amount of time, but must to be able to preempt
+ * it's partner in order to get any cpu time to speak of.  If you push that
+ * task to the same level or below it's partner, it will not be able to
+ * preempt and will starve.  This scenario was fixed for bonus calculation
+ * by converting sleep_avg to ns.
+ */
+#define THROTTLE_THRESHOLD (NEXT_PRIO)
+
+#define NS_MAX_SLEEP_AVG_PCNT (NS_MAX_SLEEP_AVG / 100)
+
+/*
+ * Masks for p->slice_info, formerly p->first_time_slice.
+ * SLICE_FTS:   0x80000000  Task is in it's first ever timeslice.
+ * SLICE_NEW:   0x40000000  Slice refreshed.
+ * SLICE_SPA:   0x3FFF8000  Spare bits.
+ * SLICE_LTS:   0x00007F80  Last time slice
+ * SLICE_AVG:   0x0000007F  Task slice_avg stored as percentage.
+ */
+#define SLICE_AVG_BITS    7
+#define SLICE_LTS_BITS   10
+#define SLICE_SPA_BITS   13
+#define SLICE_NEW_BITS    1
+#define SLICE_FTS_BITS    1
+
+#define SLICE_AVG_SHIFT   0
+#define SLICE_LTS_SHIFT   (SLICE_AVG_SHIFT + SLICE_AVG_BITS)
+#define SLICE_SPA_SHIFT   (SLICE_LTS_SHIFT + SLICE_LTS_BITS) 
+#define SLICE_NEW_SHIFT   (SLICE_SPA_SHIFT + SLICE_SPA_BITS)
+#define SLICE_FTS_SHIFT   (SLICE_NEW_SHIFT + SLICE_NEW_BITS)
+
+#define INFO_MASK(x)      ((1U << (x))-1)
+#define SLICE_AVG_MASK    (INFO_MASK(SLICE_AVG_BITS) << SLICE_AVG_SHIFT)
+#define SLICE_LTS_MASK    (INFO_MASK(SLICE_LTS_BITS) << SLICE_LTS_SHIFT)
+#define SLICE_SPA_MASK    (INFO_MASK(SLICE_SPA_BITS) << SLICE_SPA_SHIFT)
+#define SLICE_NEW_MASK    (INFO_MASK(SLICE_NEW_BITS) << SLICE_NEW_SHIFT)
+#define SLICE_FTS_MASK    (INFO_MASK(SLICE_FTS_BITS) << SLICE_FTS_SHIFT)
+
+#define first_time_slice(p) ((p)->slice_info & SLICE_FTS_MASK)
+#define set_first_time_slice(p) ((p)->slice_info |= SLICE_FTS_MASK)
+#define clr_first_time_slice(p) ((p)->slice_info &= ~SLICE_FTS_MASK)
+
+#define slice_is_new(p) ((p)->slice_info & SLICE_NEW_MASK)
+#define set_slice_is_new(p) ((p)->slice_info |= SLICE_NEW_MASK)
+#define clr_slice_is_new(p) ((p)->slice_info &= ~SLICE_NEW_MASK)
+
+#define last_slice(p) \
+	((((p)->slice_info & SLICE_LTS_MASK) >> SLICE_LTS_SHIFT) ? : \
+	DEF_TIMESLICE)
+#define set_last_slice(p, n) ((p)->slice_info = (((p)->slice_info & \
+	~SLICE_LTS_MASK) | (((n) << SLICE_LTS_SHIFT) & SLICE_LTS_MASK))) 
+
+#define slice_avg(p) \
+	((((p)->slice_info & SLICE_AVG_MASK) >> SLICE_AVG_SHIFT) * \
+	NS_MAX_SLEEP_AVG_PCNT)
+#define set_slice_avg(p, n) ((p)->slice_info = (((p)->slice_info & \
+	~SLICE_AVG_MASK) | ((((n) / NS_MAX_SLEEP_AVG_PCNT) \
+	<< SLICE_AVG_SHIFT) & SLICE_AVG_MASK))) 
+#define slice_avg_raw(p)  \
+	(((p)->slice_info & SLICE_AVG_MASK) >> SLICE_AVG_SHIFT)
+#define set_slice_avg_raw(p, n) ((p)->slice_info = (((p)->slice_info & \
+	~SLICE_AVG_MASK) | (((n) << SLICE_AVG_SHIFT) & SLICE_AVG_MASK))) 
+
+#define cpu_avg(p) \
+	(100 - slice_avg_raw(p))
+
+#define slice_time_avg(p) \
+	(100 * last_slice(p) / max((unsigned)cpu_avg(p), 1U))
+
+#define time_this_slice(p) \
+	(jiffies - (p)->last_slice)
+
+#define cpu_this_slice(p) \
+	(100 * last_slice(p) / max((unsigned)time_this_slice(p), \
+	(unsigned)last_slice(p)))
+
+#define this_slice_avg(p) \
+	((100 - cpu_this_slice(p)) * NS_MAX_SLEEP_AVG_PCNT)
+
+/*
+ * In order to prevent tasks from thrashing between domesticated livestock
+ * and irate rhino, once a throttle is hung on a task, the only way to get
+ * rid of it is to change behavior.  We push the throttle stamp forward in
+ * time as things improve until the stamp is in the future.  Only then may
+ * we safely pull our 'tranquilizer dart'. 
+ */
+#define conditional_tag(p) ((!(p)->throttle_stamp && 			\
+	(p)->sleep_avg > slice_avg(p) + THROTTLE_THRESHOLD) ?		\
+({									\
+	((p)->throttle_stamp = jiffies) ? : 1;				\
+}) : 0)
+
+/*
+ * Those who use the least cpu receive the most encouragement.
+ */
+#define SLICE_AVG_MULTIPLIER(p) \
+	(1 + NS_TO_JIFFIES(this_slice_avg(p)) * MAX_BONUS / MAX_SLEEP_AVG)
+
+#define conditional_release(p) (((p)->throttle_stamp &&			\
+	(p)->sched_time >= (G2 ? JIFFIES_TO_NS(HZ) : ~0ULL) &&		\
+	((20 + cpu_this_slice(p) < cpu_avg(p) && (p)->sleep_avg < 	\
+	slice_avg(p) + THROTTLE_THRESHOLD) || cpu_avg(p) <= 5)) ?	\
+({									\
+	int __ret = 0;							\
+	int delay = slice_time_avg(p) - last_slice(p);			\
+	if (delay > 0) {						\
+		delay *= SLICE_AVG_MULTIPLIER(p);			\
+		(p)->throttle_stamp += delay;				\
+	}								\
+	if (time_before(jiffies, (p)->throttle_stamp)) {		\
+		(p)->throttle_stamp = 0;				\
+		__ret++;						\
+		if (!((p)->state & TASK_NONINTERACTIVE))		\
+			(p)->sleep_type = SLEEP_NORMAL;			\
+	}								\
+	__ret;								\
+}) : 0)
+
+/*
+ * CURRENT_BONUS(p) adjusted to match slice_avg after grace expiration.
+ */
+#define ADJUSTED_BONUS(p, grace)				\
+({								\
+	unsigned long sleep_avg = (p)->sleep_avg;		\
+	if (grace_expired(p, (grace)))				\
+		sleep_avg = min((unsigned long)(p)->sleep_avg,	\
+		(unsigned long)slice_avg(p));			\
+	NS_TO_JIFFIES(sleep_avg) * MAX_BONUS / MAX_SLEEP_AVG;	\
+})
+
+#define BONUS_MULTIPLIER(p) \
+	(grace_expired(p, G1) ? : SLICE_AVG_MULTIPLIER(p))
+
+#define BONUS_DIVISOR(p) \
+	(grace_expired(p, G2) ? : (1 + ADJUSTED_BONUS(p, G1)))
+
+#define INTERACTIVE_SLEEP_AVG(p) \
+	(min(JIFFIES_TO_NS(MAX_SLEEP_AVG * (MAX_BONUS / 2 + DELTA(p)) / MAX_BONUS), \
+	NS_MAX_SLEEP_AVG))
+
+/*
+ * The quantity of sleep quaranteed to elevate a task to interactive status,
+ * or if already there, to elevate it to the next priority or beyond.
+ */
+#define INTERACTIVE_SLEEP_NS(p, ns) \
+	(BONUS_MULTIPLIER(p) * (ns) >= INTERACTIVE_SLEEP_AVG(p)	|| \
+	((p)->sleep_avg < INTERACTIVE_SLEEP_AVG(p) && BONUS_MULTIPLIER(p) * \
+	(ns) + (p)->sleep_avg >= INTERACTIVE_SLEEP_AVG(p))      || \
+	((p)->sleep_avg >= INTERACTIVE_SLEEP_AVG(p) && BONUS_MULTIPLIER(p) * \
+	(ns) + ((p)->sleep_avg % NEXT_PRIO) >= NEXT_PRIO))
 
 #define TASK_PREEMPTS_CURR(p, rq) \
 	((p)->prio < (rq)->curr->prio)
@@ -668,7 +854,7 @@
 	if (rt_task(p))
 		return p->prio;
 
-	bonus = CURRENT_BONUS(p) - MAX_BONUS / 2;
+	bonus = ADJUSTED_BONUS(p, G2) - MAX_BONUS / 2;
 
 	prio = p->static_prio - bonus;
 	if (prio < MAX_RT_PRIO)
@@ -794,19 +980,39 @@
 
 	if (likely(sleep_time > 0)) {
 		/*
-		 * User tasks that sleep a long time are categorised as
-		 * idle. They will only have their sleep_avg increased to a
+		 * Tasks that sleep a long time are categorised as idle.
+		 * They will only have their sleep_avg increased to a
 		 * level that makes them just interactive priority to stay
 		 * active yet prevent them suddenly becoming cpu hogs and
-		 * starving other processes.
+		 * starving other processes.  All tasks must stop at each
+		 * TASK_INTERACTIVE boundry before moving on so that no
+		 * single sleep slams it straight into NS_MAX_SLEEP_AVG.
 		 */
-		if (p->mm && sleep_time > INTERACTIVE_SLEEP(p)) {
-				unsigned long ceiling;
+		if (INTERACTIVE_SLEEP_NS(p, sleep_time)) {
+			int ticks = last_slice(p) / BONUS_DIVISOR(p);
+			unsigned long ceiling = INTERACTIVE_SLEEP_AVG(p);
+ 
+			ticks = JIFFIES_TO_NS(ticks);
+
+			if (grace_expired(p, G2) && slice_avg(p) < ceiling)
+				ceiling = slice_avg(p);
+			/* Promote previously interactive task. */
+			else if (p->sleep_avg >= INTERACTIVE_SLEEP_AVG(p) &&
+					!grace_expired(p, G2)) {
+
+				ceiling = p->sleep_avg / NEXT_PRIO;
+				if (ceiling < MAX_BONUS)
+					ceiling++;
+				ceiling *= NEXT_PRIO;
+			}
 
-				ceiling = JIFFIES_TO_NS(MAX_SLEEP_AVG -
-					DEF_TIMESLICE);
-				if (p->sleep_avg < ceiling)
-					p->sleep_avg = ceiling;
+			ceiling += ticks;
+
+			if (ceiling > NS_MAX_SLEEP_AVG)
+				ceiling = NS_MAX_SLEEP_AVG;
+
+			if (p->sleep_avg < ceiling)
+				p->sleep_avg = ceiling;
 		} else {
 
 			/*
@@ -816,9 +1022,8 @@
 			 * If a task was sleeping with the noninteractive
 			 * label do not apply this non-linear boost
 			 */
-			if (p->sleep_type != SLEEP_NONINTERACTIVE || !p->mm)
-				sleep_time *=
-					(MAX_BONUS - CURRENT_BONUS(p)) ? : 1;
+			if (p->sleep_type != SLEEP_NONINTERACTIVE)
+				sleep_time *= BONUS_MULTIPLIER(p);
 
 			/*
 			 * This code gives a bonus to interactive tasks.
@@ -1367,7 +1572,10 @@
 
 out_activate:
 #endif /* CONFIG_SMP */
-	if (old_state == TASK_UNINTERRUPTIBLE) {
+
+	conditional_release(p);
+
+	if (old_state & TASK_UNINTERRUPTIBLE) {
 		rq->nr_uninterruptible--;
 		/*
 		 * Tasks waking from uninterruptible sleep are likely
@@ -1468,9 +1676,27 @@
 	 * The remainder of the first timeslice might be recovered by
 	 * the parent if the child exits early enough.
 	 */
-	p->first_time_slice = 1;
+	set_first_time_slice(p);
 	current->time_slice >>= 1;
 	p->timestamp = sched_clock();
+
+	/*
+	 * Set up slice_info for the child.
+	 *
+	 * Note: The child inherits the parent's throttle,
+	 * and must shake it loose.  It does not inherit
+	 * the parent's slice_avg.
+	 */
+	set_slice_avg(p, NS_MAX_SLEEP_AVG);
+	set_last_slice(p, p->time_slice);
+	set_slice_is_new(p);
+	p->last_slice = jiffies;
+	/*
+	 * Limit the difficulty to what the parent faced.
+	 */
+	if (p->throttle_stamp && grace_expired(p, G2))
+		p->throttle_stamp = jiffies - G2;
+
 	if (unlikely(!current->time_slice)) {
 		/*
 		 * This case is rare, it happens when the parent has only
@@ -1584,7 +1810,7 @@
 	 * the sleep_avg of the parent as well.
 	 */
 	rq = task_rq_lock(p->parent, &flags);
-	if (p->first_time_slice && task_cpu(p) == task_cpu(p->parent)) {
+	if (first_time_slice(p) && task_cpu(p) == task_cpu(p->parent)) {
 		p->parent->time_slice += p->time_slice;
 		if (unlikely(p->parent->time_slice > task_timeslice(p)))
 			p->parent->time_slice = task_timeslice(p);
@@ -2665,6 +2891,51 @@
 }
 
 /*
+ * Calculate a task's average cpu usage rate in terms of sleep_avg, and
+ * check whether the task may soon need throttling.  Must be called after
+ * refreshing the task's time slice.
+ * @p: task for which slice_avg should be computed.
+ */
+static void recalc_task_slice_avg(task_t *p)
+{
+	unsigned int slice_avg = slice_avg_raw(p);
+	unsigned int time_slice = last_slice(p);
+	int w = MAX_BONUS, idle;
+
+	if (unlikely(!time_slice))
+		set_last_slice(p, p->time_slice);
+
+	idle = 100 - cpu_this_slice(p);
+
+	/*
+	 * If the task is lowering it's cpu usage, speed up the
+	 * effect on slice_avg so we don't over-throttle.
+	 */
+	if (idle > slice_avg) {
+		w -= idle / w;
+		if (!w)
+			w = 1;
+	}
+
+	slice_avg = (w * (slice_avg ? : 1) + idle) / (w + 1);
+
+	/* Check to see if we should start/stop throttling. */
+	if(!rt_task(p) && !conditional_release(p))
+		conditional_tag(p);
+
+	/* Update slice_avg. */
+	set_slice_avg_raw(p, slice_avg);
+
+	/* Update cached slice length. */
+	if (time_slice != p->time_slice)
+		set_last_slice(p, p->time_slice);
+
+	/* And finally, stamp and tag the new slice. */
+	set_slice_is_new(p);
+	p->last_slice = jiffies;
+}
+
+/*
  * This function gets called by the timer code, with HZ frequency.
  * We call it with interrupts disabled.
  *
@@ -2709,20 +2980,24 @@
 		 */
 		if ((p->policy == SCHED_RR) && !--p->time_slice) {
 			p->time_slice = task_timeslice(p);
-			p->first_time_slice = 0;
+			recalc_task_slice_avg(p);
+			clr_first_time_slice(p);
 			set_tsk_need_resched(p);
 
 			/* put it at the end of the queue: */
 			requeue_task(p, rq->active);
 		}
+		if (unlikely(p->throttle_stamp))
+			p->throttle_stamp = 0;
 		goto out_unlock;
 	}
 	if (!--p->time_slice) {
 		dequeue_task(p, rq->active);
 		set_tsk_need_resched(p);
-		p->prio = effective_prio(p);
 		p->time_slice = task_timeslice(p);
-		p->first_time_slice = 0;
+		recalc_task_slice_avg(p);
+		p->prio = effective_prio(p);
+		clr_first_time_slice(p);
 
 		if (!rq->expired_timestamp)
 			rq->expired_timestamp = jiffies;
@@ -3033,7 +3308,7 @@
 	 * Tasks charged proportionately less run_time at high sleep_avg to
 	 * delay them losing their interactive status
 	 */
-	run_time /= (CURRENT_BONUS(prev) ? : 1);
+	run_time /= BONUS_DIVISOR(prev);
 
 	spin_lock_irq(&rq->lock);
 
@@ -3047,7 +3322,7 @@
 				unlikely(signal_pending(prev))))
 			prev->state = TASK_RUNNING;
 		else {
-			if (prev->state == TASK_UNINTERRUPTIBLE)
+			if (prev->state & TASK_UNINTERRUPTIBLE)
 				rq->nr_uninterruptible++;
 			deactivate_task(prev, rq);
 		}
@@ -3096,6 +3371,7 @@
 		rq->best_expired_prio = MAX_PRIO;
 	}
 
+repeat_selection:
 	idx = sched_find_first_bit(array->bitmap);
 	queue = array->queue + idx;
 	next = list_entry(queue->next, task_t, run_list);
@@ -3115,8 +3391,14 @@
 			dequeue_task(next, array);
 			next->prio = new_prio;
 			enqueue_task(next, array);
-		} else
-			requeue_task(next, array);
+
+			/*
+			 * We may have just been demoted below other
+			 * runnable tasks in our previous queue.
+			 */
+			next->sleep_type = SLEEP_NORMAL;
+			goto repeat_selection;
+		}
 	}
 	next->sleep_type = SLEEP_NORMAL;
 switch_tasks:
@@ -3134,6 +3416,14 @@
 		prev->sleep_avg = 0;
 	prev->timestamp = prev->last_ran = now;
 
+	/*
+	 * Tag start of execution of a new timeslice.
+	 */
+	if (unlikely(slice_is_new(next))) {
+		next->last_slice = jiffies;
+		clr_slice_is_new(next);
+	}
+
 	sched_info_switch(prev, next);
 	if (likely(prev != next)) {
 		next->timestamp = now;
--- linux-2.6.16-rc2-mm1x/kernel/sysctl.c.org	2006-02-09 13:15:17.000000000 +0100
+++ linux-2.6.16-rc2-mm1x/kernel/sysctl.c	2006-02-09 13:16:30.000000000 +0100
@@ -69,6 +69,8 @@
 extern int pid_max_min, pid_max_max;
 extern int sysctl_drop_caches;
 extern int percpu_pagelist_fraction;
+extern int sched_g1;
+extern int sched_g2;
 
 #if defined(CONFIG_X86_LOCAL_APIC) && defined(CONFIG_X86)
 int unknown_nmi_panic;
@@ -224,6 +226,11 @@
 	{ .ctl_name = 0 }
 };
 
+/* Constants for minimum and maximum testing in vm_table and
+ * kern_table.  We use these as one-element integer vectors. */
+static int zero;
+static int one_hundred = 100;
+
 static ctl_table kern_table[] = {
 	{
 		.ctl_name	= KERN_OSTYPE,
@@ -666,15 +673,29 @@
 		.proc_handler	= &proc_dointvec,
 	},
 #endif
+	{
+		.ctl_name	= KERN_SCHED_THROTTLE1,
+		.procname	= "sched_g1",
+		.data		= &sched_g1,
+		.maxlen		= sizeof (int),
+		.mode		= 0644,
+		.proc_handler	= &proc_dointvec,
+		.strategy	= &sysctl_intvec,
+		.extra1		= &zero,
+	},
+	{
+		.ctl_name	= KERN_SCHED_THROTTLE2,
+		.procname	= "sched_g2",
+		.data		= &sched_g2,
+		.maxlen		= sizeof (int),
+		.mode		= 0644,
+		.proc_handler	= &proc_dointvec,
+		.strategy	= &sysctl_intvec,
+		.extra1		= &zero,
+	},
 	{ .ctl_name = 0 }
 };
 
-/* Constants for minimum and maximum testing in vm_table.
-   We use these as one-element integer vectors. */
-static int zero;
-static int one_hundred = 100;
-
-
 static ctl_table vm_table[] = {
 	{
 		.ctl_name	= VM_OVERCOMMIT_MEMORY,
--- linux-2.6.16-rc2-mm1x/fs/pipe.c.org	2006-02-09 13:15:35.000000000 +0100
+++ linux-2.6.16-rc2-mm1x/fs/pipe.c	2006-02-09 13:16:30.000000000 +0100
@@ -39,11 +39,7 @@
 {
 	DEFINE_WAIT(wait);
 
-	/*
-	 * Pipes are system-local resources, so sleeping on them
-	 * is considered a noninteractive wait:
-	 */
-	prepare_to_wait(PIPE_WAIT(*inode), &wait, TASK_INTERRUPTIBLE|TASK_NONINTERACTIVE);
+	prepare_to_wait(PIPE_WAIT(*inode), &wait, TASK_INTERRUPTIBLE);
 	mutex_unlock(PIPE_MUTEX(*inode));
 	schedule();
 	finish_wait(PIPE_WAIT(*inode), &wait);


[-- Attachment #2: multimedia_sim.c --]
[-- Type: text/x-csrc, Size: 9678 bytes --]

/* multimedia_sim.c v0.3
 *
 * Dec 2002 - Miguel Freitas
 *
 * this is free software; you can redistribute it and/or modify
 * it under the terms of the GNU General Public License as published by
 * the Free Software Foundation; either version 2 of the License, or
 * (at your option) any later version.
 *
 *
 * this program is meant to simulate a dummy multimedia application and
 * measure how it would perform in a loaded system. it basicaly tries to
 * identify frame skipping problems that would have affected the movie
 * playback.
 *
 * although the model of threads is heavily based on xine's architecture,
 * its results should be also comparable with any other player program
 * like mplayer or avifile. the idea is to measure when the player isn't 
 * scheduled in time for sending images at full frame rate and if X server
 * would also be scheduled in time for displaying.
 *
 * of course one might try some tricks to improve performance like decreasing
 * nice values for both XFree86 and player. however some distros don't
 * ship the X reniced and modifying desktop menu entries to add "nice" and
 * "sudo" commands is beyond most of users who just want to play their dvds...
 * 
 * compile with: gcc -o multimedia_sim multimedia_sim.c -lpthread -lm
 * run as: ./multimedia_sim [nice_level] [test_duration]
 *
 * note1: default CPU_BURNING value should simulate more or less a mpeg2-class
 * decoding cpu usage. that will require, at least, a 300MHz processor.
 * 
 * note2: a better simulation of xine's backend/frontend architecture would
 * also include another thread (frontend) to receive the xshm completion
 * events. i have intentionally not implemented it here.
 */
 
#include <stdio.h>
#include <pthread.h>
#include <unistd.h>
#include <sys/time.h>
#include <math.h>
#include <inttypes.h>

#define FRAME_PERIOD     1000000/30    /* NTSC period in us */
#define FRAME_SIZE       720*480*3/2   /* std resolution in yv12 format */
#define PREBUFFER_FRAMES 15            /* how many frames to "decode" ahead */
#define CPU_BURNING      8             /* reduce if your cpu isn't fast enough */

int nice_level = 0;

int decoder_running;
int video_out_running;
int server_running;

pthread_mutex_t counters_lock;
pthread_cond_t wakeup_server;
pthread_mutex_t queue_lock;
pthread_cond_t enqueued;
pthread_cond_t dequeued;
int frames_sent = 0;
int frames_received = 0;
int frames_enqueued = 0;
double start_time;

pthread_mutex_t counters_lock;
pthread_cond_t wakeup_server;


/* statistics */
double total_latency = 0.0;
double total_square = 0.0;
double max_latency = 0.0;
int frames_dropped = 0;
int dropped_in_burst = 0;
int bursts = 0;


static void *alloc_frame( void ) {
  void *frame;
  
  frame = (void *)malloc(FRAME_SIZE);
  memset(frame,0,FRAME_SIZE);
  return frame;  
}

static void dummy_enqueue_frame( void ) {
  pthread_mutex_lock( &queue_lock ); 
  while( frames_enqueued == PREBUFFER_FRAMES )
    pthread_cond_wait( &dequeued, &queue_lock );
  frames_enqueued++;  
  pthread_cond_signal( &enqueued );
  pthread_mutex_unlock( &queue_lock ); 
}

static void dummy_dequeue_frame( void ) {
  pthread_mutex_lock( &queue_lock ); 
  while( !frames_enqueued )
    pthread_cond_wait( &enqueued, &queue_lock );
  frames_enqueued--;  
  pthread_cond_signal( &dequeued );
  pthread_mutex_unlock( &queue_lock ); 
}

static double get_us_time( void ) {
  struct timeval tv;
  double us;
   
  gettimeofday(&tv, NULL);
  us = tv.tv_sec * 1e6;
  us += tv.tv_usec;
  
  return us;
}

/* from libmpeg2, used to burn cpu cycles */
#define W1 2841 /* 2048*sqrt (2)*cos (1*pi/16) */
#define W2 2676 /* 2048*sqrt (2)*cos (2*pi/16) */
#define W3 2408 /* 2048*sqrt (2)*cos (3*pi/16) */
#define W5 1609 /* 2048*sqrt (2)*cos (5*pi/16) */
#define W6 1108 /* 2048*sqrt (2)*cos (6*pi/16) */
#define W7 565  /* 2048*sqrt (2)*cos (7*pi/16) */
static void idct_row (int16_t * block)
{
    int x0, x1, x2, x3, x4, x5, x6, x7, x8;

    x1 = block[4] << 11;
    x2 = block[6];
    x3 = block[2];
    x4 = block[1];
    x5 = block[7];
    x6 = block[5];
    x7 = block[3];

    x0 = (block[0] << 11) + 128; /* for proper rounding in the fourth stage */

    /* first stage */
    x8 = W7 * (x4 + x5);
    x4 = x8 + (W1 - W7) * x4;
    x5 = x8 - (W1 + W7) * x5;
    x8 = W3 * (x6 + x7);
    x6 = x8 - (W3 - W5) * x6;
    x7 = x8 - (W3 + W5) * x7;
 
    /* second stage */
    x8 = x0 + x1;
    x0 -= x1;
    x1 = W6 * (x3 + x2);
    x2 = x1 - (W2 + W6) * x2;
    x3 = x1 + (W2 - W6) * x3;
    x1 = x4 + x6;
    x4 -= x6;
    x6 = x5 + x7;
    x5 -= x7;
 
    /* third stage */
    x7 = x8 + x3;
    x8 -= x3;
    x3 = x0 + x2;
    x0 -= x2;
    x2 = (181 * (x4 + x5) + 128) >> 8;
    x4 = (181 * (x4 - x5) + 128) >> 8;
 
    /* fourth stage */
    block[0] = (x7 + x1) >> 8;
    block[1] = (x3 + x2) >> 8;
    block[2] = (x0 + x4) >> 8;
    block[3] = (x8 + x6) >> 8;
    block[4] = (x8 - x6) >> 8;
    block[5] = (x0 - x4) >> 8;
    block[6] = (x3 - x2) >> 8;
    block[7] = (x7 - x1) >> 8;
}


static void *decoder_loop (void *this_gen) {

  int16_t *frame;
  int i,j;
  
  frame = alloc_frame();
  /* dummy data */
  for( i = 0; i < FRAME_SIZE/sizeof(int16_t); i++ )
    frame[i] = i;

  while( decoder_running )
  {
    /* eat some cpu cycles */
    for( j = 0; j < CPU_BURNING; j++ )
     for( i = 0; i < FRAME_SIZE/8/sizeof(int16_t); i+=8 )
       idct_row( &frame[i] );
    
    dummy_enqueue_frame();
  }
  
  free(frame);
  
  pthread_exit(NULL);  
}


static void *video_out_loop (void *this_gen) {

  double ttf; /* time to frame */
  void *frame1, *frame2;
  
  nice(nice_level);
  
  frame1 = alloc_frame();
  frame2 = alloc_frame();

  start_time = get_us_time() + (FRAME_PERIOD * PREBUFFER_FRAMES);
  
  while( video_out_running )
  {
    dummy_dequeue_frame();
    
    /* eat some cpu cycles */
    /*memcpy(frame1, frame2, FRAME_SIZE);*/
    
    ttf = start_time + (frames_sent * FRAME_PERIOD);
    ttf -= get_us_time();
                   
    if( ttf > 0 )
      usleep( ttf );
    
    pthread_mutex_lock( &counters_lock ); 
    frames_sent++;
    pthread_cond_signal( &wakeup_server );
    pthread_mutex_unlock( &counters_lock ); 
  }
  
  free(frame1);
  free(frame2);
  
  pthread_exit(NULL);  
}


static void *server_loop (void *this_gen) {

  double estimated;
  double latency;
  void *frame1, *frame2;
  
  nice(nice_level);

  frame1 = alloc_frame();
  frame2 = alloc_frame();
  
  while( server_running )
  {
    pthread_mutex_lock( &counters_lock ); 
    while( frames_sent <= frames_received )
      pthread_cond_wait( &wakeup_server, &counters_lock );
    pthread_mutex_unlock( &counters_lock ); 
    
    estimated = start_time + (frames_received * FRAME_PERIOD);
    latency = (get_us_time() - estimated)/1.0e6;
     
    if( latency > max_latency )
      max_latency = latency;
       
    frames_received++;
    if( latency > FRAME_PERIOD/1.0e6 ) {
      frames_dropped++;
      dropped_in_burst++;
    } else if (dropped_in_burst) {
      dropped_in_burst = 0;
      bursts++;
    }
    
    total_latency += latency;
    total_square += latency * latency;
    
    /* eat some cpu cycles */
    memcpy(frame1, frame2, FRAME_SIZE);
  }
    
  if (dropped_in_burst)
    bursts++;
  
  free(frame1);
  free(frame2);
  
  pthread_exit(NULL);  
}


int main(int argc, char *argv[])
{
  pthread_t decoder_thread;
  pthread_t video_thread;
  pthread_t server_thread;
  void *p;
  double mean, var, stddev;
  double burst_size;
  double score;
  int duration = 10;

  if( argc > 1 ) {
    nice_level = atoi(argv[1]);
    printf("nice_level = %d\n", nice_level );
    if( nice_level < 0 )
      printf("(make sure you are root for negative nice)\n");
    if( argc > 2 ) {
      duration = atoi(argv[2]);
      printf("duration = %d seconds\n", duration );
    }
  }
  
  pthread_mutex_init (&counters_lock, NULL);
  pthread_cond_init  (&wakeup_server, NULL);
  pthread_mutex_init (&queue_lock, NULL);
  pthread_cond_init  (&enqueued, NULL);
  pthread_cond_init  (&dequeued, NULL);

  server_running = 1;
  if ( pthread_create (&server_thread, NULL, server_loop, NULL) != 0) {
    printf("Error creating server thread.\n");
    return 1;
  }
    
  video_out_running = 1;
  if ( pthread_create (&video_thread, NULL, video_out_loop, NULL) != 0) {
    printf("Error creating video thread.\n");
    return 1;
  }

  decoder_running = 1;
  if ( pthread_create (&decoder_thread, NULL, decoder_loop, NULL) != 0) {
    printf("Error creating decoder thread.\n");
    return 1;
  }
  
  sleep(duration);
  
  server_running = 0;
  pthread_join(server_thread,&p);
  video_out_running = 0;
  pthread_join(video_thread, &p);
  
  printf("[frames] received: %d dropped: %d\n", frames_received, frames_dropped );
  if( bursts ) {
    burst_size = (double)frames_dropped / bursts;
    printf("[frames] mean dropped per burst: %lf\n", burst_size );
  } else
    burst_size = 1.0;
  
  mean = total_latency / frames_received;
  var = total_square / frames_received - mean*mean;
  stddev = sqrt(var);
  printf("[latency] mean: %lf max: %lf stddev: %lf\n", mean, max_latency, stddev);
  
  /* score: lower is better. it tries to measure "how bad" the playback
   * was to the user. it counts fraction of dropped frames, if the dropped
   * frames were somewhat evenly distributed (instead of in bursts), and
   * also the mean and standard deviation. 
   * note that the formula is actually arbitrary, i'm just trying to count
   * all these factors and weight them.
   */
  
  score = 0.0; 
  score += 0.90 * (double) frames_dropped / frames_received * sqrt(burst_size);
  score += 0.10 * (mean + stddev)/(FRAME_PERIOD/1.0e6);
  printf("score: %lf\n", score );
}

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: 2.6 vs 2.4, ssh terminal slowdown
  2006-02-12 13:47         ` MIke Galbraith
@ 2006-02-12 19:03           ` Lee Revell
  2006-02-12 21:36             ` MIke Galbraith
  0 siblings, 1 reply; 41+ messages in thread
From: Lee Revell @ 2006-02-12 19:03 UTC (permalink / raw)
  To: MIke Galbraith
  Cc: Jan Engelhardt, Con Kolivas, gcoady, linux-kernel, Ingo Molnar

On Sun, 2006-02-12 at 14:47 +0100, MIke Galbraith wrote:
> If you think it's the scheduler, how about try the patch below.  It's
> against 2.6.16-rc2-mm1, and should tell you if it is the interactivity
> logic in the scheduler or not.  I don't see other candidates in there,
> not that that means there aren't any of course. 

I'll try, but it's a serious pain for me to build an -mm kernel.  A
patch against 2.6.16-rc1 would be much easier.

Lee


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: 2.6 vs 2.4, ssh terminal slowdown
  2006-02-12 19:03           ` Lee Revell
@ 2006-02-12 21:36             ` MIke Galbraith
  2006-02-12 23:23               ` Lee Revell
  2006-02-12 23:39               ` Lee Revell
  0 siblings, 2 replies; 41+ messages in thread
From: MIke Galbraith @ 2006-02-12 21:36 UTC (permalink / raw)
  To: Lee Revell; +Cc: Jan Engelhardt, Con Kolivas, gcoady, linux-kernel, Ingo Molnar

On Sun, 2006-02-12 at 14:03 -0500, Lee Revell wrote:
> On Sun, 2006-02-12 at 14:47 +0100, MIke Galbraith wrote:
> > If you think it's the scheduler, how about try the patch below.  It's
> > against 2.6.16-rc2-mm1, and should tell you if it is the interactivity
> > logic in the scheduler or not.  I don't see other candidates in there,
> > not that that means there aren't any of course. 
> 
> I'll try, but it's a serious pain for me to build an -mm kernel.  A
> patch against 2.6.16-rc1 would be much easier.

Ok, here she comes.  It's a bit too reluctant to release a task so it
can reach interactive status at the moment, but for this test, that's a
feature. In fact, for this test, it's probably best to jump straight to
setting both g1 and g2 to zero.

	-Mike

--- linux-2.6.16-rc1/include/linux/sched.h.org	2006-02-12 21:28:28.000000000 +0100
+++ linux-2.6.16-rc1/include/linux/sched.h	2006-02-12 21:54:40.000000000 +0100
@@ -688,6 +688,13 @@
 struct audit_context;		/* See audit.c */
 struct mempolicy;
 
+enum sleep_type {
+	SLEEP_NORMAL,
+	SLEEP_NONINTERACTIVE,
+	SLEEP_INTERACTIVE,
+	SLEEP_INTERRUPTED,
+};
+
 struct task_struct {
 	volatile long state;	/* -1 unrunnable, 0 runnable, >0 stopped */
 	struct thread_info *thread_info;
@@ -709,14 +716,14 @@
 
 	unsigned short ioprio;
 
-	unsigned long sleep_avg;
+	unsigned long sleep_avg, last_slice, throttle_stamp;
 	unsigned long long timestamp, last_ran;
 	unsigned long long sched_time; /* sched_clock time spent running */
-	int activated;
+	enum sleep_type sleep_type;
 
 	unsigned long policy;
 	cpumask_t cpus_allowed;
-	unsigned int time_slice, first_time_slice;
+	unsigned int time_slice, slice_info;
 
 #ifdef CONFIG_SCHEDSTATS
 	struct sched_info sched_info;
--- linux-2.6.16-rc1/include/linux/sysctl.h.org	2006-02-12 21:28:44.000000000 +0100
+++ linux-2.6.16-rc1/include/linux/sysctl.h	2006-02-12 21:34:46.000000000 +0100
@@ -146,6 +146,8 @@
 	KERN_RANDOMIZE=68, /* int: randomize virtual address space */
 	KERN_SETUID_DUMPABLE=69, /* int: behaviour of dumps for setuid core */
 	KERN_SPIN_RETRY=70,	/* int: number of spinlock retries */
+	KERN_SCHED_THROTTLE1=71,  /* int: throttling grace period 1 in secs */
+	KERN_SCHED_THROTTLE2=72,  /* int: throttling grace period 2 in secs */
 };
 
 
--- linux-2.6.16-rc1/kernel/sched.c.org	2006-02-12 21:29:13.000000000 +0100
+++ linux-2.6.16-rc1/kernel/sched.c	2006-02-12 21:58:14.000000000 +0100
@@ -149,9 +149,195 @@
 #define TASK_INTERACTIVE(p) \
 	((p)->prio <= (p)->static_prio - DELTA(p))
 
-#define INTERACTIVE_SLEEP(p) \
-	(JIFFIES_TO_NS(MAX_SLEEP_AVG * \
-		(MAX_BONUS / 2 + DELTA((p)) + 1) / MAX_BONUS - 1))
+/*
+ * Interactivity boost can lead to serious starvation problems if the
+ * task being boosted turns out to be a cpu hog.  To combat this, we
+ * compute a running slice_avg, which is the sane upper limit for the
+ * task's sleep_avg.  If an 'interactive' task begins burning cpu, it's
+ * slice_avg will decay, making it visible as a problem so corrective
+ * measures can be applied.
+ *
+ * /proc/sys/kernel tunables.
+ *
+ * sched_g1: Grace period in seconds that a task is allowed to run unchecked.
+ * sched_g2: seconds thereafter, to force a priority adjustment.
+ */
+
+int sched_g1 = 20;
+int sched_g2 = 10;
+
+/*
+ * Offset from the time we noticed a potential problem until we disable the
+ * interactive bonus multiplier, and adjust sleep_avg consumption rate.
+ */
+#define G1 (sched_g1 * HZ)
+
+/*
+ * Offset thereafter that we disable the interactive bonus divisor, and adjust
+ * a runaway task's priority.
+ */
+#define G2 (sched_g2 * HZ + G1)
+
+/*
+ * Grace period has expired.
+ */
+#define grace_expired(p, grace) ((p)->throttle_stamp && \
+	time_after_eq(jiffies, (p)->throttle_stamp + (grace)))
+
+#define NEXT_PRIO (NS_MAX_SLEEP_AVG / MAX_BONUS)
+
+/*
+ * Warning: do not reduce threshold below NS_MAX_SLEEP_AVG / MAX_BONUS
+ * else you may break the case where one of a pair of communicating tasks
+ * only sleeps a miniscule amount of time, but must to be able to preempt
+ * it's partner in order to get any cpu time to speak of.  If you push that
+ * task to the same level or below it's partner, it will not be able to
+ * preempt and will starve.  This scenario was fixed for bonus calculation
+ * by converting sleep_avg to ns.
+ */
+#define THROTTLE_THRESHOLD (NEXT_PRIO)
+
+#define NS_MAX_SLEEP_AVG_PCNT (NS_MAX_SLEEP_AVG / 100)
+
+/*
+ * Masks for p->slice_info, formerly p->first_time_slice.
+ * SLICE_FTS:   0x80000000  Task is in it's first ever timeslice.
+ * SLICE_NEW:   0x40000000  Slice refreshed.
+ * SLICE_SPA:   0x3FFF8000  Spare bits.
+ * SLICE_LTS:   0x00007F80  Last time slice
+ * SLICE_AVG:   0x0000007F  Task slice_avg stored as percentage.
+ */
+#define SLICE_AVG_BITS    7
+#define SLICE_LTS_BITS   10
+#define SLICE_SPA_BITS   13
+#define SLICE_NEW_BITS    1
+#define SLICE_FTS_BITS    1
+
+#define SLICE_AVG_SHIFT   0
+#define SLICE_LTS_SHIFT   (SLICE_AVG_SHIFT + SLICE_AVG_BITS)
+#define SLICE_SPA_SHIFT   (SLICE_LTS_SHIFT + SLICE_LTS_BITS) 
+#define SLICE_NEW_SHIFT   (SLICE_SPA_SHIFT + SLICE_SPA_BITS)
+#define SLICE_FTS_SHIFT   (SLICE_NEW_SHIFT + SLICE_NEW_BITS)
+
+#define INFO_MASK(x)      ((1U << (x))-1)
+#define SLICE_AVG_MASK    (INFO_MASK(SLICE_AVG_BITS) << SLICE_AVG_SHIFT)
+#define SLICE_LTS_MASK    (INFO_MASK(SLICE_LTS_BITS) << SLICE_LTS_SHIFT)
+#define SLICE_SPA_MASK    (INFO_MASK(SLICE_SPA_BITS) << SLICE_SPA_SHIFT)
+#define SLICE_NEW_MASK    (INFO_MASK(SLICE_NEW_BITS) << SLICE_NEW_SHIFT)
+#define SLICE_FTS_MASK    (INFO_MASK(SLICE_FTS_BITS) << SLICE_FTS_SHIFT)
+
+#define first_time_slice(p) ((p)->slice_info & SLICE_FTS_MASK)
+#define set_first_time_slice(p) ((p)->slice_info |= SLICE_FTS_MASK)
+#define clr_first_time_slice(p) ((p)->slice_info &= ~SLICE_FTS_MASK)
+
+#define slice_is_new(p) ((p)->slice_info & SLICE_NEW_MASK)
+#define set_slice_is_new(p) ((p)->slice_info |= SLICE_NEW_MASK)
+#define clr_slice_is_new(p) ((p)->slice_info &= ~SLICE_NEW_MASK)
+
+#define last_slice(p) \
+	((((p)->slice_info & SLICE_LTS_MASK) >> SLICE_LTS_SHIFT) ? : \
+	DEF_TIMESLICE)
+#define set_last_slice(p, n) ((p)->slice_info = (((p)->slice_info & \
+	~SLICE_LTS_MASK) | (((n) << SLICE_LTS_SHIFT) & SLICE_LTS_MASK))) 
+
+#define slice_avg(p) \
+	((((p)->slice_info & SLICE_AVG_MASK) >> SLICE_AVG_SHIFT) * \
+	NS_MAX_SLEEP_AVG_PCNT)
+#define set_slice_avg(p, n) ((p)->slice_info = (((p)->slice_info & \
+	~SLICE_AVG_MASK) | ((((n) / NS_MAX_SLEEP_AVG_PCNT) \
+	<< SLICE_AVG_SHIFT) & SLICE_AVG_MASK))) 
+#define slice_avg_raw(p)  \
+	(((p)->slice_info & SLICE_AVG_MASK) >> SLICE_AVG_SHIFT)
+#define set_slice_avg_raw(p, n) ((p)->slice_info = (((p)->slice_info & \
+	~SLICE_AVG_MASK) | (((n) << SLICE_AVG_SHIFT) & SLICE_AVG_MASK))) 
+
+#define cpu_avg(p) \
+	(100 - slice_avg_raw(p))
+
+#define slice_time_avg(p) \
+	(100 * last_slice(p) / max((unsigned)cpu_avg(p), 1U))
+
+#define time_this_slice(p) \
+	(jiffies - (p)->last_slice)
+
+#define cpu_this_slice(p) \
+	(100 * last_slice(p) / max((unsigned)time_this_slice(p), \
+	(unsigned)last_slice(p)))
+
+#define this_slice_avg(p) \
+	((100 - cpu_this_slice(p)) * NS_MAX_SLEEP_AVG_PCNT)
+
+/*
+ * In order to prevent tasks from thrashing between domesticated livestock
+ * and irate rhino, once a throttle is hung on a task, the only way to get
+ * rid of it is to change behavior.  We push the throttle stamp forward in
+ * time as things improve until the stamp is in the future.  Only then may
+ * we safely pull our 'tranquilizer dart'. 
+ */
+#define conditional_tag(p) ((!(p)->throttle_stamp && 			\
+	(p)->sleep_avg > slice_avg(p) + THROTTLE_THRESHOLD) ?		\
+({									\
+	((p)->throttle_stamp = jiffies) ? : 1;				\
+}) : 0)
+
+/*
+ * Those who use the least cpu receive the most encouragement.
+ */
+#define SLICE_AVG_MULTIPLIER(p) \
+	(1 + NS_TO_JIFFIES(this_slice_avg(p)) * MAX_BONUS / MAX_SLEEP_AVG)
+
+#define conditional_release(p) (((p)->throttle_stamp &&			\
+	(p)->sched_time >= (G2 ? JIFFIES_TO_NS(HZ) : ~0ULL) &&		\
+	((20 + cpu_this_slice(p) < cpu_avg(p) && (p)->sleep_avg < 	\
+	slice_avg(p) + THROTTLE_THRESHOLD) || cpu_avg(p) <= 5)) ?	\
+({									\
+	int __ret = 0;							\
+	int delay = slice_time_avg(p) - last_slice(p);			\
+	if (delay > 0) {						\
+		delay *= SLICE_AVG_MULTIPLIER(p);			\
+		(p)->throttle_stamp += delay;				\
+	}								\
+	if (time_before(jiffies, (p)->throttle_stamp)) {		\
+		(p)->throttle_stamp = 0;				\
+		__ret++;						\
+		if (!((p)->state & TASK_NONINTERACTIVE))		\
+			(p)->sleep_type = SLEEP_NORMAL;			\
+	}								\
+	__ret;								\
+}) : 0)
+
+/*
+ * CURRENT_BONUS(p) adjusted to match slice_avg after grace expiration.
+ */
+#define ADJUSTED_BONUS(p, grace)				\
+({								\
+	unsigned long sleep_avg = (p)->sleep_avg;		\
+	if (grace_expired(p, (grace)))				\
+		sleep_avg = min((unsigned long)(p)->sleep_avg,	\
+		(unsigned long)slice_avg(p));			\
+	NS_TO_JIFFIES(sleep_avg) * MAX_BONUS / MAX_SLEEP_AVG;	\
+})
+
+#define BONUS_MULTIPLIER(p) \
+	(grace_expired(p, G1) ? : SLICE_AVG_MULTIPLIER(p))
+
+#define BONUS_DIVISOR(p) \
+	(grace_expired(p, G2) ? : (1 + ADJUSTED_BONUS(p, G1)))
+
+#define INTERACTIVE_SLEEP_AVG(p) \
+	(min(JIFFIES_TO_NS(MAX_SLEEP_AVG * (MAX_BONUS / 2 + DELTA(p)) / MAX_BONUS), \
+	NS_MAX_SLEEP_AVG))
+
+/*
+ * The quantity of sleep quaranteed to elevate a task to interactive status,
+ * or if already there, to elevate it to the next priority or beyond.
+ */
+#define INTERACTIVE_SLEEP_NS(p, ns) \
+	(BONUS_MULTIPLIER(p) * (ns) >= INTERACTIVE_SLEEP_AVG(p)	|| \
+	((p)->sleep_avg < INTERACTIVE_SLEEP_AVG(p) && BONUS_MULTIPLIER(p) * \
+	(ns) + (p)->sleep_avg >= INTERACTIVE_SLEEP_AVG(p))      || \
+	((p)->sleep_avg >= INTERACTIVE_SLEEP_AVG(p) && BONUS_MULTIPLIER(p) * \
+	(ns) + ((p)->sleep_avg % NEXT_PRIO) >= NEXT_PRIO))
 
 #define TASK_PREEMPTS_CURR(p, rq) \
 	((p)->prio < (rq)->curr->prio)
@@ -659,7 +845,7 @@
 	if (rt_task(p))
 		return p->prio;
 
-	bonus = CURRENT_BONUS(p) - MAX_BONUS / 2;
+	bonus = ADJUSTED_BONUS(p, G2) - MAX_BONUS / 2;
 
 	prio = p->static_prio - bonus;
 	if (prio < MAX_RT_PRIO)
@@ -759,36 +945,50 @@
 
 	if (likely(sleep_time > 0)) {
 		/*
-		 * User tasks that sleep a long time are categorised as
-		 * idle and will get just interactive status to stay active &
-		 * prevent them suddenly becoming cpu hogs and starving
-		 * other processes.
+		 * Tasks that sleep a long time are categorised as idle.
+		 * They will only have their sleep_avg increased to a
+		 * level that makes them just interactive priority to stay
+		 * active yet prevent them suddenly becoming cpu hogs and
+		 * starving other processes.  All tasks must stop at each
+		 * TASK_INTERACTIVE boundry before moving on so that no
+		 * single sleep slams it straight into NS_MAX_SLEEP_AVG.
 		 */
-		if (p->mm && p->activated != -1 &&
-			sleep_time > INTERACTIVE_SLEEP(p)) {
-				p->sleep_avg = JIFFIES_TO_NS(MAX_SLEEP_AVG -
-						DEF_TIMESLICE);
+		if (INTERACTIVE_SLEEP_NS(p, sleep_time)) {
+			int ticks = last_slice(p) / BONUS_DIVISOR(p);
+			unsigned long ceiling = INTERACTIVE_SLEEP_AVG(p);
+
+			ticks = JIFFIES_TO_NS(ticks);
+
+			if (grace_expired(p, G2) && slice_avg(p) < ceiling)
+				ceiling = slice_avg(p);
+			/* Promote previously interactive task. */
+			else if (p->sleep_avg >= INTERACTIVE_SLEEP_AVG(p) &&
+					!grace_expired(p, G2)) {
+
+				ceiling = p->sleep_avg / NEXT_PRIO;
+				if (ceiling < MAX_BONUS)
+					ceiling++;
+				ceiling *= NEXT_PRIO;
+			}
+
+			ceiling += ticks;
+
+			if (ceiling > NS_MAX_SLEEP_AVG)
+				ceiling = NS_MAX_SLEEP_AVG;
+
+			if (p->sleep_avg < ceiling)
+				p->sleep_avg = ceiling;
 		} else {
-			/*
-			 * The lower the sleep avg a task has the more
-			 * rapidly it will rise with sleep time.
-			 */
-			sleep_time *= (MAX_BONUS - CURRENT_BONUS(p)) ? : 1;
 
 			/*
-			 * Tasks waking from uninterruptible sleep are
-			 * limited in their sleep_avg rise as they
-			 * are likely to be waiting on I/O
+			 * The lower the sleep avg a task has the more
+			 * rapidly it will rise with sleep time. This enables
+			 * tasks to rapidly recover to a low latency priority.
+			 * If a task was sleeping with the noninteractive
+			 * label do not apply this non-linear boost
 			 */
-			if (p->activated == -1 && p->mm) {
-				if (p->sleep_avg >= INTERACTIVE_SLEEP(p))
-					sleep_time = 0;
-				else if (p->sleep_avg + sleep_time >=
-						INTERACTIVE_SLEEP(p)) {
-					p->sleep_avg = INTERACTIVE_SLEEP(p);
-					sleep_time = 0;
-				}
-			}
+			if (p->sleep_type != SLEEP_NONINTERACTIVE)
+				sleep_time *= BONUS_MULTIPLIER(p);
 
 			/*
 			 * This code gives a bonus to interactive tasks.
@@ -835,7 +1035,7 @@
 	 * This checks to make sure it's not an uninterruptible task
 	 * that is now waking up.
 	 */
-	if (!p->activated) {
+	if (p->sleep_type != SLEEP_NONINTERACTIVE) {
 		/*
 		 * Tasks which were woken up by interrupts (ie. hw events)
 		 * are most likely of interactive nature. So we give them
@@ -844,13 +1044,13 @@
 		 * on a CPU, first time around:
 		 */
 		if (in_interrupt())
-			p->activated = 2;
+			p->sleep_type = SLEEP_INTERRUPTED;
 		else {
 			/*
 			 * Normal first-time wakeups get a credit too for
 			 * on-runqueue time, but it will be weighted down:
 			 */
-			p->activated = 1;
+			p->sleep_type = SLEEP_INTERACTIVE;
 		}
 	}
 	p->timestamp = now;
@@ -1371,25 +1571,28 @@
 
 out_activate:
 #endif /* CONFIG_SMP */
-	if (old_state == TASK_UNINTERRUPTIBLE) {
-		rq->nr_uninterruptible--;
+
+	conditional_release(p);
+
+	if (old_state & TASK_UNINTERRUPTIBLE) {
 		/*
-		 * Tasks on involuntary sleep don't earn
-		 * sleep_avg beyond just interactive state.
+		 * Tasks waking from uninterruptible sleep are likely
+		 * to be sleeping involuntarily on I/O and are otherwise
+		 * cpu bound so label them as noninteractive.
 		 */
-		p->activated = -1;
-	}
+		p->sleep_type = SLEEP_NONINTERACTIVE;
+	} else
 
 	/*
 	 * Tasks that have marked their sleep as noninteractive get
-	 * woken up without updating their sleep average. (i.e. their
-	 * sleep is handled in a priority-neutral manner, no priority
-	 * boost and no penalty.)
+	 * woken up with their sleep average not weighted in an
+	 * interactive way.
 	 */
-	if (old_state & TASK_NONINTERACTIVE)
-		__activate_task(p, rq);
-	else
-		activate_task(p, rq, cpu == this_cpu);
+		if (old_state & TASK_NONINTERACTIVE)
+			p->sleep_type = SLEEP_NONINTERACTIVE;
+
+
+	activate_task(p, rq, cpu == this_cpu);
 	/*
 	 * Sync wakeups (i.e. those types of wakeups where the waker
 	 * has indicated that it will leave the CPU in short order)
@@ -1471,9 +1674,27 @@
 	 * The remainder of the first timeslice might be recovered by
 	 * the parent if the child exits early enough.
 	 */
-	p->first_time_slice = 1;
+	set_first_time_slice(p);
 	current->time_slice >>= 1;
 	p->timestamp = sched_clock();
+
+	/*
+	 * Set up slice_info for the child.
+	 *
+	 * Note: The child inherits the parent's throttle,
+	 * and must shake it loose.  It does not inherit
+	 * the parent's slice_avg.
+	 */
+	set_slice_avg(p, NS_MAX_SLEEP_AVG);
+	set_last_slice(p, p->time_slice);
+	set_slice_is_new(p);
+	p->last_slice = jiffies;
+	/*
+	 * Limit the difficulty to what the parent faced.
+	 */
+	if (p->throttle_stamp && grace_expired(p, G2))
+		p->throttle_stamp = jiffies - G2;
+
 	if (unlikely(!current->time_slice)) {
 		/*
 		 * This case is rare, it happens when the parent has only
@@ -1587,7 +1808,7 @@
 	 * the sleep_avg of the parent as well.
 	 */
 	rq = task_rq_lock(p->parent, &flags);
-	if (p->first_time_slice && task_cpu(p) == task_cpu(p->parent)) {
+	if (first_time_slice(p) && task_cpu(p) == task_cpu(p->parent)) {
 		p->parent->time_slice += p->time_slice;
 		if (unlikely(p->parent->time_slice > task_timeslice(p)))
 			p->parent->time_slice = task_timeslice(p);
@@ -2655,6 +2876,51 @@
 }
 
 /*
+ * Calculate a task's average cpu usage rate in terms of sleep_avg, and
+ * check whether the task may soon need throttling.  Must be called after
+ * refreshing the task's time slice.
+ * @p: task for which slice_avg should be computed.
+ */
+static void recalc_task_slice_avg(task_t *p)
+{
+	unsigned int slice_avg = slice_avg_raw(p);
+	unsigned int time_slice = last_slice(p);
+	int w = MAX_BONUS, idle;
+
+	if (unlikely(!time_slice))
+		set_last_slice(p, p->time_slice);
+
+	idle = 100 - cpu_this_slice(p);
+
+	/*
+	 * If the task is lowering it's cpu usage, speed up the
+	 * effect on slice_avg so we don't over-throttle.
+	 */
+	if (idle > slice_avg) {
+		w -= idle / w;
+		if (!w)
+			w = 1;
+	}
+
+	slice_avg = (w * (slice_avg ? : 1) + idle) / (w + 1);
+
+	/* Check to see if we should start/stop throttling. */
+	if(!rt_task(p) && !conditional_release(p))
+		conditional_tag(p);
+
+	/* Update slice_avg. */
+	set_slice_avg_raw(p, slice_avg);
+
+	/* Update cached slice length. */
+	if (time_slice != p->time_slice)
+		set_last_slice(p, p->time_slice);
+
+	/* And finally, stamp and tag the new slice. */
+	set_slice_is_new(p);
+	p->last_slice = jiffies;
+}
+
+/*
  * This function gets called by the timer code, with HZ frequency.
  * We call it with interrupts disabled.
  *
@@ -2699,20 +2965,24 @@
 		 */
 		if ((p->policy == SCHED_RR) && !--p->time_slice) {
 			p->time_slice = task_timeslice(p);
-			p->first_time_slice = 0;
+			recalc_task_slice_avg(p);
+			clr_first_time_slice(p);
 			set_tsk_need_resched(p);
 
 			/* put it at the end of the queue: */
 			requeue_task(p, rq->active);
 		}
+		if (unlikely(p->throttle_stamp))
+			p->throttle_stamp = 0;
 		goto out_unlock;
 	}
 	if (!--p->time_slice) {
 		dequeue_task(p, rq->active);
 		set_tsk_need_resched(p);
-		p->prio = effective_prio(p);
 		p->time_slice = task_timeslice(p);
-		p->first_time_slice = 0;
+		recalc_task_slice_avg(p);
+		p->prio = effective_prio(p);
+		clr_first_time_slice(p);
 
 		if (!rq->expired_timestamp)
 			rq->expired_timestamp = jiffies;
@@ -2959,6 +3229,12 @@
 
 #endif
 
+static inline int interactive_sleep(enum sleep_type sleep_type)
+{
+	return (sleep_type == SLEEP_INTERACTIVE ||
+		sleep_type == SLEEP_INTERRUPTED);
+}
+
 /*
  * schedule() is the main scheduler function.
  */
@@ -3017,7 +3293,7 @@
 	 * Tasks charged proportionately less run_time at high sleep_avg to
 	 * delay them losing their interactive status
 	 */
-	run_time /= (CURRENT_BONUS(prev) ? : 1);
+	run_time /= BONUS_DIVISOR(prev);
 
 	spin_lock_irq(&rq->lock);
 
@@ -3031,7 +3307,7 @@
 				unlikely(signal_pending(prev))))
 			prev->state = TASK_RUNNING;
 		else {
-			if (prev->state == TASK_UNINTERRUPTIBLE)
+			if (prev->state & TASK_UNINTERRUPTIBLE)
 				rq->nr_uninterruptible++;
 			deactivate_task(prev, rq);
 		}
@@ -3080,16 +3356,17 @@
 		rq->best_expired_prio = MAX_PRIO;
 	}
 
+repeat_selection:
 	idx = sched_find_first_bit(array->bitmap);
 	queue = array->queue + idx;
 	next = list_entry(queue->next, task_t, run_list);
 
-	if (!rt_task(next) && next->activated > 0) {
+	if (!rt_task(next) && interactive_sleep(next->sleep_type)) {
 		unsigned long long delta = now - next->timestamp;
 		if (unlikely((long long)(now - next->timestamp) < 0))
 			delta = 0;
 
-		if (next->activated == 1)
+		if (next->sleep_type == SLEEP_INTERACTIVE)
 			delta = delta * (ON_RUNQUEUE_WEIGHT * 128 / 100) / 128;
 
 		array = next->array;
@@ -3099,10 +3376,16 @@
 			dequeue_task(next, array);
 			next->prio = new_prio;
 			enqueue_task(next, array);
-		} else
-			requeue_task(next, array);
+
+			/*
+			 * We may have just been demoted below other
+			 * runnable tasks in our previous queue.
+			 */
+			next->sleep_type = SLEEP_NORMAL;
+			goto repeat_selection;
+		}
 	}
-	next->activated = 0;
+	next->sleep_type = SLEEP_NORMAL;
 switch_tasks:
 	if (next == rq->idle)
 		schedstat_inc(rq, sched_goidle);
@@ -3118,6 +3401,14 @@
 		prev->sleep_avg = 0;
 	prev->timestamp = prev->last_ran = now;
 
+	/*
+	 * Tag start of execution of a new timeslice.
+	 */
+	if (unlikely(slice_is_new(next))) {
+		next->last_slice = jiffies;
+		clr_slice_is_new(next);
+	}
+
 	sched_info_switch(prev, next);
 	if (likely(prev != next)) {
 		next->timestamp = now;
--- linux-2.6.16-rc1/kernel/sysctl.c.org	2006-02-12 21:29:24.000000000 +0100
+++ linux-2.6.16-rc1/kernel/sysctl.c	2006-02-12 21:29:53.000000000 +0100
@@ -71,6 +71,8 @@
 extern int pid_max_min, pid_max_max;
 extern int sysctl_drop_caches;
 extern int percpu_pagelist_fraction;
+extern int sched_g1;
+extern int sched_g2;
 
 #if defined(CONFIG_X86_LOCAL_APIC) && defined(CONFIG_X86)
 int unknown_nmi_panic;
@@ -226,6 +228,11 @@
 	{ .ctl_name = 0 }
 };
 
+/* Constants for minimum and maximum testing in vm_table and
+ * kern_table.  We use these as one-element integer vectors. */
+static int zero;
+static int one_hundred = 100;
+
 static ctl_table kern_table[] = {
 	{
 		.ctl_name	= KERN_OSTYPE,
@@ -658,15 +665,29 @@
 		.proc_handler	= &proc_dointvec,
 	},
 #endif
+	{
+		.ctl_name	= KERN_SCHED_THROTTLE1,
+		.procname	= "sched_g1",
+		.data		= &sched_g1,
+		.maxlen		= sizeof (int),
+		.mode		= 0644,
+		.proc_handler	= &proc_dointvec,
+		.strategy	= &sysctl_intvec,
+		.extra1		= &zero,
+	},
+	{
+		.ctl_name	= KERN_SCHED_THROTTLE2,
+		.procname	= "sched_g2",
+		.data		= &sched_g2,
+		.maxlen		= sizeof (int),
+		.mode		= 0644,
+		.proc_handler	= &proc_dointvec,
+		.strategy	= &sysctl_intvec,
+		.extra1		= &zero,
+	},
 	{ .ctl_name = 0 }
 };
 
-/* Constants for minimum and maximum testing in vm_table.
-   We use these as one-element integer vectors. */
-static int zero;
-static int one_hundred = 100;
-
-
 static ctl_table vm_table[] = {
 	{
 		.ctl_name	= VM_OVERCOMMIT_MEMORY,
--- linux-2.6.16-rc1/fs/pipe.c.org	2006-02-12 21:29:35.000000000 +0100
+++ linux-2.6.16-rc1/fs/pipe.c	2006-02-12 21:29:53.000000000 +0100
@@ -39,11 +39,7 @@
 {
 	DEFINE_WAIT(wait);
 
-	/*
-	 * Pipes are system-local resources, so sleeping on them
-	 * is considered a noninteractive wait:
-	 */
-	prepare_to_wait(PIPE_WAIT(*inode), &wait, TASK_INTERRUPTIBLE|TASK_NONINTERACTIVE);
+	prepare_to_wait(PIPE_WAIT(*inode), &wait, TASK_INTERRUPTIBLE);
 	mutex_unlock(PIPE_MUTEX(*inode));
 	schedule();
 	finish_wait(PIPE_WAIT(*inode), &wait);



^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: 2.6 vs 2.4, ssh terminal slowdown
  2006-02-12 21:36             ` MIke Galbraith
@ 2006-02-12 23:23               ` Lee Revell
  2006-02-12 23:39               ` Lee Revell
  1 sibling, 0 replies; 41+ messages in thread
From: Lee Revell @ 2006-02-12 23:23 UTC (permalink / raw)
  To: MIke Galbraith
  Cc: Jan Engelhardt, Con Kolivas, gcoady, linux-kernel, Ingo Molnar

On Sun, 2006-02-12 at 22:36 +0100, MIke Galbraith wrote:
> On Sun, 2006-02-12 at 14:03 -0500, Lee Revell wrote:
> > On Sun, 2006-02-12 at 14:47 +0100, MIke Galbraith wrote:
> > > If you think it's the scheduler, how about try the patch below.  It's
> > > against 2.6.16-rc2-mm1, and should tell you if it is the interactivity
> > > logic in the scheduler or not.  I don't see other candidates in there,
> > > not that that means there aren't any of course. 
> > 
> > I'll try, but it's a serious pain for me to build an -mm kernel.  A
> > patch against 2.6.16-rc1 would be much easier.
> 
> Ok, here she comes.  It's a bit too reluctant to release a task so it
> can reach interactive status at the moment, but for this test, that's a
> feature. In fact, for this test, it's probably best to jump straight to
> setting both g1 and g2 to zero.

Thanks, this solves the "ls" problem I was having!  The strange
"oscillating" behavior is gone, now it consistently takes 0.19-0.45s.

It's still not as consistent as "time ls | cat", which takes 0.19-0.26s,
but MUCH better.

Lee


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: 2.6 vs 2.4, ssh terminal slowdown
  2006-02-12 21:36             ` MIke Galbraith
  2006-02-12 23:23               ` Lee Revell
@ 2006-02-12 23:39               ` Lee Revell
  2006-02-13  3:09                 ` MIke Galbraith
  1 sibling, 1 reply; 41+ messages in thread
From: Lee Revell @ 2006-02-12 23:39 UTC (permalink / raw)
  To: MIke Galbraith
  Cc: Jan Engelhardt, Con Kolivas, gcoady, linux-kernel, Ingo Molnar

On Sun, 2006-02-12 at 22:36 +0100, MIke Galbraith wrote:
> On Sun, 2006-02-12 at 14:03 -0500, Lee Revell wrote:
> > On Sun, 2006-02-12 at 14:47 +0100, MIke Galbraith wrote:
> > > If you think it's the scheduler, how about try the patch below.  It's
> > > against 2.6.16-rc2-mm1, and should tell you if it is the interactivity
> > > logic in the scheduler or not.  I don't see other candidates in there,
> > > not that that means there aren't any of course. 
> > 
> > I'll try, but it's a serious pain for me to build an -mm kernel.  A
> > patch against 2.6.16-rc1 would be much easier.
> 
> Ok, here she comes.  It's a bit too reluctant to release a task so it
> can reach interactive status at the moment, but for this test, that's a
> feature. In fact, for this test, it's probably best to jump straight to
> setting both g1 and g2 to zero.

Not only does this fix my "time ls" test case, it seems to drastically
improve interactivity for my desktop apps.  I was really being plagued
by weird stalls, it's much smoother now.

Now to regression test it...

Lee


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: 2.6 vs 2.4, ssh terminal slowdown
  2006-02-12 23:39               ` Lee Revell
@ 2006-02-13  3:09                 ` MIke Galbraith
  2006-02-13  3:39                   ` Lee Revell
  0 siblings, 1 reply; 41+ messages in thread
From: MIke Galbraith @ 2006-02-13  3:09 UTC (permalink / raw)
  To: Lee Revell; +Cc: Jan Engelhardt, Con Kolivas, gcoady, linux-kernel, Ingo Molnar

On Sun, 2006-02-12 at 18:39 -0500, Lee Revell wrote:
> On Sun, 2006-02-12 at 22:36 +0100, MIke Galbraith wrote:
> > On Sun, 2006-02-12 at 14:03 -0500, Lee Revell wrote:
> > > On Sun, 2006-02-12 at 14:47 +0100, MIke Galbraith wrote:
> > > > If you think it's the scheduler, how about try the patch below.  It's
> > > > against 2.6.16-rc2-mm1, and should tell you if it is the interactivity
> > > > logic in the scheduler or not.  I don't see other candidates in there,
> > > > not that that means there aren't any of course. 
> > > 
> > > I'll try, but it's a serious pain for me to build an -mm kernel.  A
> > > patch against 2.6.16-rc1 would be much easier.
> > 
> > Ok, here she comes.  It's a bit too reluctant to release a task so it
> > can reach interactive status at the moment, but for this test, that's a
> > feature. In fact, for this test, it's probably best to jump straight to
> > setting both g1 and g2 to zero.
> 
> Not only does this fix my "time ls" test case, it seems to drastically
> improve interactivity for my desktop apps.  I was really being plagued
> by weird stalls, it's much smoother now.

Yeah, but under load, that reluctance to release is fairly annoying...

> 
> Now to regression test it...

...and may cause test applications to not reach their proper priority
before measurement begins.

	-Mike


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: 2.6 vs 2.4, ssh terminal slowdown
  2006-02-13  3:09                 ` MIke Galbraith
@ 2006-02-13  3:39                   ` Lee Revell
  2006-02-13  4:59                     ` MIke Galbraith
  0 siblings, 1 reply; 41+ messages in thread
From: Lee Revell @ 2006-02-13  3:39 UTC (permalink / raw)
  To: MIke Galbraith
  Cc: Jan Engelhardt, Con Kolivas, gcoady, linux-kernel, Ingo Molnar

On Mon, 2006-02-13 at 04:09 +0100, MIke Galbraith wrote:
> On Sun, 2006-02-12 at 18:39 -0500, Lee Revell wrote:
> > On Sun, 2006-02-12 at 22:36 +0100, MIke Galbraith wrote:
> > > On Sun, 2006-02-12 at 14:03 -0500, Lee Revell wrote:
> > > > On Sun, 2006-02-12 at 14:47 +0100, MIke Galbraith wrote:
> > > > > If you think it's the scheduler, how about try the patch below.  It's
> > > > > against 2.6.16-rc2-mm1, and should tell you if it is the interactivity
> > > > > logic in the scheduler or not.  I don't see other candidates in there,
> > > > > not that that means there aren't any of course. 
> > > > 
> > > > I'll try, but it's a serious pain for me to build an -mm kernel.  A
> > > > patch against 2.6.16-rc1 would be much easier.
> > > 
> > > Ok, here she comes.  It's a bit too reluctant to release a task so it
> > > can reach interactive status at the moment, but for this test, that's a
> > > feature. In fact, for this test, it's probably best to jump straight to
> > > setting both g1 and g2 to zero.
> > 
> > Not only does this fix my "time ls" test case, it seems to drastically
> > improve interactivity for my desktop apps.  I was really being plagued
> > by weird stalls, it's much smoother now.
> 
> Yeah, but under load, that reluctance to release is fairly annoying...

This seems to manifest on my system as the mouse getting jerky under
load.  Still, I don't mind - the overall feel is still smoother - as if
the X server was getting too much CPU before.

Lee


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: 2.6 vs 2.4, ssh terminal slowdown
  2006-02-13  3:39                   ` Lee Revell
@ 2006-02-13  4:59                     ` MIke Galbraith
  2006-02-13  5:05                       ` Con Kolivas
  0 siblings, 1 reply; 41+ messages in thread
From: MIke Galbraith @ 2006-02-13  4:59 UTC (permalink / raw)
  To: Lee Revell; +Cc: Jan Engelhardt, Con Kolivas, gcoady, linux-kernel, Ingo Molnar

On Sun, 2006-02-12 at 22:39 -0500, Lee Revell wrote:
> On Mon, 2006-02-13 at 04:09 +0100, MIke Galbraith wrote:
> > On Sun, 2006-02-12 at 18:39 -0500, Lee Revell wrote:

> > > Not only does this fix my "time ls" test case, it seems to drastically
> > > improve interactivity for my desktop apps.  I was really being plagued
> > > by weird stalls, it's much smoother now.
> > 
> > Yeah, but under load, that reluctance to release is fairly annoying...
> 
> This seems to manifest on my system as the mouse getting jerky under
> load.  Still, I don't mind - the overall feel is still smoother - as if
> the X server was getting too much CPU before.

Not only the X server, the interactivity boost in general is way too
severe.  It still needs work, but it's shaping up.  Who knows, it may
even get to the point of applying for inclusion.

Now, let's see if we can get your problem fixed with something that can
possibly go into 2.6.16 as a bugfix.  Can you please try the below?

This patch fixes two of what I would call thinkos in the interactivity
logic.  One, the requeuing of a freshly awakened task can lead to that
task losing it's slice of cpu upon preempt if it isn't alone in it's
queue.  Two, tasks which sleep for > INTERACTIVE_SLEEP(p), approximately
700ms, are treated as being idle, and prevented from slamming straight
to max dynamic priority for obvious reasons, yet a pure cpu hog that
sleeps for 100ms will fail the test, and subsequently have it's sleep
time multiplied by MAX_BONUS.

Just in case it does fix it for you, I'll even add a blame line.

Signed-off-by: Mike Galbraith <efault@gmx.de>

--- linux-2.6.16-rc3/kernel/sched.c.org	2006-02-13 04:44:57.000000000 +0100
+++ linux-2.6.16-rc3/kernel/sched.c	2006-02-13 05:11:18.000000000 +0100
@@ -150,8 +150,11 @@
 	((p)->prio <= (p)->static_prio - DELTA(p))
 
 #define INTERACTIVE_SLEEP(p) \
-	(JIFFIES_TO_NS(MAX_SLEEP_AVG * \
-		(MAX_BONUS / 2 + DELTA((p)) + 1) / MAX_BONUS - 1))
+	(min(JIFFIES_TO_NS(MAX_SLEEP_AVG * (MAX_BONUS / 2 + DELTA((p)) + 1) / \
+	MAX_BONUS - 1), NS_MAX_SLEEP_AVG))
+
+#define BONUS_MULTIPLIER(p) \
+	((MAX_BONUS - CURRENT_BONUS(p)) ? : 1)
 
 #define TASK_PREEMPTS_CURR(p, rq) \
 	((p)->prio < (rq)->curr->prio)
@@ -708,7 +711,7 @@
 		 * prevent them suddenly becoming cpu hogs and starving
 		 * other processes.
 		 */
-		if (p->mm && p->activated != -1 &&
+		if (p->mm && p->activated != -1 && BONUS_MULTIPLIER(p) *
 			sleep_time > INTERACTIVE_SLEEP(p)) {
 				p->sleep_avg = JIFFIES_TO_NS(MAX_SLEEP_AVG -
 						DEF_TIMESLICE);
@@ -717,7 +720,7 @@
 			 * The lower the sleep avg a task has the more
 			 * rapidly it will rise with sleep time.
 			 */
-			sleep_time *= (MAX_BONUS - CURRENT_BONUS(p)) ? : 1;
+			sleep_time *= BONUS_MULTIPLIER(p);
 
 			/*
 			 * Tasks waking from uninterruptible sleep are
@@ -3009,8 +3012,7 @@
 			dequeue_task(next, array);
 			next->prio = new_prio;
 			enqueue_task(next, array);
-		} else
-			requeue_task(next, array);
+		}
 	}
 	next->activated = 0;
 switch_tasks:



^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: 2.6 vs 2.4, ssh terminal slowdown
  2006-02-13  4:59                     ` MIke Galbraith
@ 2006-02-13  5:05                       ` Con Kolivas
  2006-02-13  5:32                         ` MIke Galbraith
  0 siblings, 1 reply; 41+ messages in thread
From: Con Kolivas @ 2006-02-13  5:05 UTC (permalink / raw)
  To: MIke Galbraith
  Cc: Lee Revell, Jan Engelhardt, gcoady, linux-kernel, Ingo Molnar

On Monday 13 February 2006 15:59, MIke Galbraith wrote:
> Now, let's see if we can get your problem fixed with something that can
> possibly go into 2.6.16 as a bugfix.  Can you please try the below?

These sorts of changes definitely need to pass through -mm first... and don't 
forget -mm looks quite different to mainline.

Cheers,
Con


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: 2.6 vs 2.4, ssh terminal slowdown
  2006-02-13  5:05                       ` Con Kolivas
@ 2006-02-13  5:32                         ` MIke Galbraith
  2006-02-13  5:37                           ` Con Kolivas
  0 siblings, 1 reply; 41+ messages in thread
From: MIke Galbraith @ 2006-02-13  5:32 UTC (permalink / raw)
  To: Con Kolivas; +Cc: Lee Revell, Jan Engelhardt, gcoady, linux-kernel, Ingo Molnar

On Mon, 2006-02-13 at 16:05 +1100, Con Kolivas wrote:
> On Monday 13 February 2006 15:59, MIke Galbraith wrote:
> > Now, let's see if we can get your problem fixed with something that can
> > possibly go into 2.6.16 as a bugfix.  Can you please try the below?
> 
> These sorts of changes definitely need to pass through -mm first... and don't 
> forget -mm looks quite different to mainline.

I'll leave that up to Ingo of course, and certainly have no problem with
them burning in mm.  However, I must say that I personally classify
these two changes as being trivial and obviously correct enough to be
included in 2.6.16.  I realize that mm is different in this area (we
talked about it), but the problems addressed by this patch still remain.
Anyway, let's wait and see if it fixes Lee's problem.

	-Mike

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: 2.6 vs 2.4, ssh terminal slowdown
  2006-02-13  5:32                         ` MIke Galbraith
@ 2006-02-13  5:37                           ` Con Kolivas
  2006-02-13  5:57                             ` MIke Galbraith
  0 siblings, 1 reply; 41+ messages in thread
From: Con Kolivas @ 2006-02-13  5:37 UTC (permalink / raw)
  To: MIke Galbraith
  Cc: Lee Revell, Jan Engelhardt, gcoady, linux-kernel, Ingo Molnar

On Monday 13 February 2006 16:32, MIke Galbraith wrote:
> On Mon, 2006-02-13 at 16:05 +1100, Con Kolivas wrote:
> > On Monday 13 February 2006 15:59, MIke Galbraith wrote:
> > > Now, let's see if we can get your problem fixed with something that can
> > > possibly go into 2.6.16 as a bugfix.  Can you please try the below?
> >
> > These sorts of changes definitely need to pass through -mm first... and
> > don't forget -mm looks quite different to mainline.
>
> I'll leave that up to Ingo of course, and certainly have no problem with
> them burning in mm.  However, I must say that I personally classify
> these two changes as being trivial and obviously correct enough to be
> included in 2.6.16.  

This part I agree with:
-               } else
-                       requeue_task(next, array);
+               }

The rest changes behaviour; it's not a "bug" so needs testing, should be a 
separate patch from this part, and modified to suit -mm.

Cheers,
Con

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: 2.6 vs 2.4, ssh terminal slowdown
  2006-02-13  5:37                           ` Con Kolivas
@ 2006-02-13  5:57                             ` MIke Galbraith
  2006-02-13  6:08                               ` Con Kolivas
  0 siblings, 1 reply; 41+ messages in thread
From: MIke Galbraith @ 2006-02-13  5:57 UTC (permalink / raw)
  To: Con Kolivas; +Cc: Lee Revell, Jan Engelhardt, gcoady, linux-kernel, Ingo Molnar

On Mon, 2006-02-13 at 16:37 +1100, Con Kolivas wrote:
> On Monday 13 February 2006 16:32, MIke Galbraith wrote:
> > On Mon, 2006-02-13 at 16:05 +1100, Con Kolivas wrote:
> > > On Monday 13 February 2006 15:59, MIke Galbraith wrote:
> > > > Now, let's see if we can get your problem fixed with something that can
> > > > possibly go into 2.6.16 as a bugfix.  Can you please try the below?
> > >
> > > These sorts of changes definitely need to pass through -mm first... and
> > > don't forget -mm looks quite different to mainline.
> >
> > I'll leave that up to Ingo of course, and certainly have no problem with
> > them burning in mm.  However, I must say that I personally classify
> > these two changes as being trivial and obviously correct enough to be
> > included in 2.6.16.  
> 
> This part I agree with:
> -               } else
> -                       requeue_task(next, array);
> +               }
> 
> The rest changes behaviour; it's not a "bug" so needs testing, should be a 
> separate patch from this part, and modified to suit -mm.

Well, both change behavior, and I heartily disagree.  Blocking a 700ms
sleep while allowing a 100ms sleep to bypass the same checkpoint only to
then be multiplied by 10 is a bug.

Actually, the point at which a task becomes interactive is the point at
which scheduler semantics change.  Ergo, as far as I'm concerned, this
should be a boundary which must be crossed before proceeding further.
That, I agree, would be a behavioral change which should be baked in mm.

	-Mike


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: 2.6 vs 2.4, ssh terminal slowdown
  2006-02-13  5:57                             ` MIke Galbraith
@ 2006-02-13  6:08                               ` Con Kolivas
  2006-02-13  6:35                                 ` MIke Galbraith
  0 siblings, 1 reply; 41+ messages in thread
From: Con Kolivas @ 2006-02-13  6:08 UTC (permalink / raw)
  To: MIke Galbraith
  Cc: Lee Revell, Jan Engelhardt, gcoady, linux-kernel, Ingo Molnar

On Monday 13 February 2006 16:57, MIke Galbraith wrote:
> On Mon, 2006-02-13 at 16:37 +1100, Con Kolivas wrote:
> > On Monday 13 February 2006 16:32, MIke Galbraith wrote:
> > > On Mon, 2006-02-13 at 16:05 +1100, Con Kolivas wrote:
> > > > On Monday 13 February 2006 15:59, MIke Galbraith wrote:
> > > > > Now, let's see if we can get your problem fixed with something that
> > > > > can possibly go into 2.6.16 as a bugfix.  Can you please try the
> > > > > below?
> > > >
> > > > These sorts of changes definitely need to pass through -mm first...
> > > > and don't forget -mm looks quite different to mainline.
> > >
> > > I'll leave that up to Ingo of course, and certainly have no problem
> > > with them burning in mm.  However, I must say that I personally
> > > classify these two changes as being trivial and obviously correct
> > > enough to be included in 2.6.16.
> >
> > This part I agree with:
> > -               } else
> > -                       requeue_task(next, array);
> > +               }
> >
> > The rest changes behaviour; it's not a "bug" so needs testing, should be
> > a separate patch from this part, and modified to suit -mm.
>
> Well, both change behavior, and I heartily disagree.  

The first change was the previous behaviour for some time. Your latter change 
while it makes sense has never been in the kernel. Either way I don't 
disagree with your reasoning but most things that change behaviour should go 
through -mm. The first as I said was the behaviour in mainline for some time 
till my silly requeue change.

Cheers,
Con

> Blocking a 700ms 
> sleep while allowing a 100ms sleep to bypass the same checkpoint only to
> then be multiplied by 10 is a bug.
>
> Actually, the point at which a task becomes interactive is the point at
> which scheduler semantics change.  Ergo, as far as I'm concerned, this
> should be a boundary which must be crossed before proceeding further.
> That, I agree, would be a behavioral change which should be baked in mm.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: 2.6 vs 2.4, ssh terminal slowdown
  2006-02-13  6:08                               ` Con Kolivas
@ 2006-02-13  6:35                                 ` MIke Galbraith
  2006-02-13  6:38                                   ` Lee Revell
  2006-02-13  7:15                                   ` Con Kolivas
  0 siblings, 2 replies; 41+ messages in thread
From: MIke Galbraith @ 2006-02-13  6:35 UTC (permalink / raw)
  To: Con Kolivas; +Cc: Lee Revell, Jan Engelhardt, gcoady, linux-kernel, Ingo Molnar

On Mon, 2006-02-13 at 17:08 +1100, Con Kolivas wrote:
> On Monday 13 February 2006 16:57, MIke Galbraith wrote:
> > On Mon, 2006-02-13 at 16:37 +1100, Con Kolivas wrote:
> > > On Monday 13 February 2006 16:32, MIke Galbraith wrote:
> > > > On Mon, 2006-02-13 at 16:05 +1100, Con Kolivas wrote:
> > > > > On Monday 13 February 2006 15:59, MIke Galbraith wrote:
> > > > > > Now, let's see if we can get your problem fixed with something that
> > > > > > can possibly go into 2.6.16 as a bugfix.  Can you please try the
> > > > > > below?
> > > > >
> > > > > These sorts of changes definitely need to pass through -mm first...
> > > > > and don't forget -mm looks quite different to mainline.
> > > >
> > > > I'll leave that up to Ingo of course, and certainly have no problem
> > > > with them burning in mm.  However, I must say that I personally
> > > > classify these two changes as being trivial and obviously correct
> > > > enough to be included in 2.6.16.
> > >
> > > This part I agree with:
> > > -               } else
> > > -                       requeue_task(next, array);
> > > +               }
> > >
> > > The rest changes behaviour; it's not a "bug" so needs testing, should be
> > > a separate patch from this part, and modified to suit -mm.
> >
> > Well, both change behavior, and I heartily disagree.  
> 
> The first change was the previous behaviour for some time. Your latter change 
> while it makes sense has never been in the kernel. Either way I don't 
> disagree with your reasoning but most things that change behaviour should go 
> through -mm. The first as I said was the behaviour in mainline for some time 
> till my silly requeue change.

Ok, we're basically in agreement on these changes, it's just a matter of
when.  As maintainer, Ingo has to weigh the benefit, danger, etc etc.

	-Mike


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: 2.6 vs 2.4, ssh terminal slowdown
  2006-02-13  6:35                                 ` MIke Galbraith
@ 2006-02-13  6:38                                   ` Lee Revell
  2006-02-13  7:08                                     ` MIke Galbraith
  2006-02-13  7:15                                   ` Con Kolivas
  1 sibling, 1 reply; 41+ messages in thread
From: Lee Revell @ 2006-02-13  6:38 UTC (permalink / raw)
  To: MIke Galbraith
  Cc: Con Kolivas, Jan Engelhardt, gcoady, linux-kernel, Ingo Molnar

On Mon, 2006-02-13 at 07:35 +0100, MIke Galbraith wrote:
> On Mon, 2006-02-13 at 17:08 +1100, Con Kolivas wrote:
> > On Monday 13 February 2006 16:57, MIke Galbraith wrote:
> > > On Mon, 2006-02-13 at 16:37 +1100, Con Kolivas wrote:
> > > > On Monday 13 February 2006 16:32, MIke Galbraith wrote:
> > > > > On Mon, 2006-02-13 at 16:05 +1100, Con Kolivas wrote:
> > > > > > On Monday 13 February 2006 15:59, MIke Galbraith wrote:
> > > > > > > Now, let's see if we can get your problem fixed with something that
> > > > > > > can possibly go into 2.6.16 as a bugfix.  Can you please try the
> > > > > > > below?
> > > > > >
> > > > > > These sorts of changes definitely need to pass through -mm first...
> > > > > > and don't forget -mm looks quite different to mainline.
> > > > >
> > > > > I'll leave that up to Ingo of course, and certainly have no problem
> > > > > with them burning in mm.  However, I must say that I personally
> > > > > classify these two changes as being trivial and obviously correct
> > > > > enough to be included in 2.6.16.
> > > >
> > > > This part I agree with:
> > > > -               } else
> > > > -                       requeue_task(next, array);
> > > > +               }
> > > >
> > > > The rest changes behaviour; it's not a "bug" so needs testing, should be
> > > > a separate patch from this part, and modified to suit -mm.
> > >
> > > Well, both change behavior, and I heartily disagree.  
> > 
> > The first change was the previous behaviour for some time. Your latter change 
> > while it makes sense has never been in the kernel. Either way I don't 
> > disagree with your reasoning but most things that change behaviour should go 
> > through -mm. The first as I said was the behaviour in mainline for some time 
> > till my silly requeue change.
> 
> Ok, we're basically in agreement on these changes, it's just a matter of
> when.  As maintainer, Ingo has to weigh the benefit, danger, etc etc.

Do you know which of those changes fixes the "ls" problem?

Lee


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: 2.6 vs 2.4, ssh terminal slowdown
  2006-02-13  6:38                                   ` Lee Revell
@ 2006-02-13  7:08                                     ` MIke Galbraith
  2006-02-13  8:43                                       ` Lee Revell
  0 siblings, 1 reply; 41+ messages in thread
From: MIke Galbraith @ 2006-02-13  7:08 UTC (permalink / raw)
  To: Lee Revell; +Cc: Con Kolivas, Jan Engelhardt, gcoady, linux-kernel, Ingo Molnar

On Mon, 2006-02-13 at 01:38 -0500, Lee Revell wrote:
> Do you know which of those changes fixes the "ls" problem?

No, it could be either, both, or neither.  Heck, it _could_ be a
combination of all of the things in my experimental tree for that
matter.  I put this patch out there because I know they're both bugs,
and strongly suspect it'll cure the worst of the interactivity related
delays.

I'm hoping you'll test it and confirm that it fixes yours.

	-Mike

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: 2.6 vs 2.4, ssh terminal slowdown
  2006-02-13  6:35                                 ` MIke Galbraith
  2006-02-13  6:38                                   ` Lee Revell
@ 2006-02-13  7:15                                   ` Con Kolivas
  2006-02-13  7:41                                     ` MIke Galbraith
  1 sibling, 1 reply; 41+ messages in thread
From: Con Kolivas @ 2006-02-13  7:15 UTC (permalink / raw)
  To: MIke Galbraith
  Cc: Lee Revell, Jan Engelhardt, gcoady, linux-kernel, Ingo Molnar

On Monday 13 February 2006 17:35, MIke Galbraith wrote:
> On Mon, 2006-02-13 at 17:08 +1100, Con Kolivas wrote:
> > On Monday 13 February 2006 16:57, MIke Galbraith wrote:
> > > On Mon, 2006-02-13 at 16:37 +1100, Con Kolivas wrote:
> > > > On Monday 13 February 2006 16:32, MIke Galbraith wrote:
> > > > > On Mon, 2006-02-13 at 16:05 +1100, Con Kolivas wrote:
> > > > > > On Monday 13 February 2006 15:59, MIke Galbraith wrote:
> > > > > > > Now, let's see if we can get your problem fixed with something
> > > > > > > that can possibly go into 2.6.16 as a bugfix.  Can you please
> > > > > > > try the below?
> > > > > >
> > > > > > These sorts of changes definitely need to pass through -mm
> > > > > > first... and don't forget -mm looks quite different to mainline.
> > > > >
> > > > > I'll leave that up to Ingo of course, and certainly have no problem
> > > > > with them burning in mm.  However, I must say that I personally
> > > > > classify these two changes as being trivial and obviously correct
> > > > > enough to be included in 2.6.16.
> > > >
> > > > This part I agree with:
> > > > -               } else
> > > > -                       requeue_task(next, array);
> > > > +               }
> > > >
> > > > The rest changes behaviour; it's not a "bug" so needs testing, should
> > > > be a separate patch from this part, and modified to suit -mm.
> > >
> > > Well, both change behavior, and I heartily disagree.
> >
> > The first change was the previous behaviour for some time. Your latter
> > change while it makes sense has never been in the kernel. Either way I
> > don't disagree with your reasoning but most things that change behaviour
> > should go through -mm. The first as I said was the behaviour in mainline
> > for some time till my silly requeue change.
>
> Ok, we're basically in agreement on these changes, it's just a matter of
> when.  As maintainer, Ingo has to weigh the benefit, danger, etc etc.

Aye and do note the change to the idle detection code currently in -mm will 
already make substantial difference there, possibly related to fluctuating 
behaviour.

Cheers,
Con

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: 2.6 vs 2.4, ssh terminal slowdown
  2006-02-13  7:15                                   ` Con Kolivas
@ 2006-02-13  7:41                                     ` MIke Galbraith
  0 siblings, 0 replies; 41+ messages in thread
From: MIke Galbraith @ 2006-02-13  7:41 UTC (permalink / raw)
  To: Con Kolivas; +Cc: Lee Revell, Jan Engelhardt, gcoady, linux-kernel, Ingo Molnar

On Mon, 2006-02-13 at 18:15 +1100, Con Kolivas wrote:
> On Monday 13 February 2006 17:35, MIke Galbraith wrote:
> >
> > Ok, we're basically in agreement on these changes, it's just a matter of
> > when.  As maintainer, Ingo has to weigh the benefit, danger, etc etc.
> 
> Aye and do note the change to the idle detection code currently in -mm will 
> already make substantial difference there, possibly related to fluctuating 
> behaviour.

Possibly...  but there also lies a two edged sword.  Previously, some
bad boys were being truncated back to user_prio 17.  In mm, that's no
longer true.

	-Mike


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: 2.6 vs 2.4, ssh terminal slowdown
  2006-02-13  7:08                                     ` MIke Galbraith
@ 2006-02-13  8:43                                       ` Lee Revell
  2006-02-13 10:06                                         ` MIke Galbraith
  2006-02-13 12:35                                         ` MIke Galbraith
  0 siblings, 2 replies; 41+ messages in thread
From: Lee Revell @ 2006-02-13  8:43 UTC (permalink / raw)
  To: MIke Galbraith
  Cc: Con Kolivas, Jan Engelhardt, gcoady, linux-kernel, Ingo Molnar

On Mon, 2006-02-13 at 08:08 +0100, MIke Galbraith wrote:
> On Mon, 2006-02-13 at 01:38 -0500, Lee Revell wrote:
> > Do you know which of those changes fixes the "ls" problem?
> 
> No, it could be either, both, or neither.  Heck, it _could_ be a
> combination of all of the things in my experimental tree for that
> matter.  I put this patch out there because I know they're both bugs,
> and strongly suspect it'll cure the worst of the interactivity related
> delays.
> 
> I'm hoping you'll test it and confirm that it fixes yours.

Nope, this does not fix it.  "time ls" ping-pongs back and forth between
~0.1s and ~0.9s.  Must have been something else in the first patch.

Lee


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: 2.6 vs 2.4, ssh terminal slowdown
  2006-02-13  8:43                                       ` Lee Revell
@ 2006-02-13 10:06                                         ` MIke Galbraith
  2006-02-13 12:35                                         ` MIke Galbraith
  1 sibling, 0 replies; 41+ messages in thread
From: MIke Galbraith @ 2006-02-13 10:06 UTC (permalink / raw)
  To: Lee Revell; +Cc: Con Kolivas, Jan Engelhardt, gcoady, linux-kernel, Ingo Molnar

On Mon, 2006-02-13 at 03:43 -0500, Lee Revell wrote:
> On Mon, 2006-02-13 at 08:08 +0100, MIke Galbraith wrote:
> > On Mon, 2006-02-13 at 01:38 -0500, Lee Revell wrote:
> > > Do you know which of those changes fixes the "ls" problem?
> > 
> > No, it could be either, both, or neither.  Heck, it _could_ be a
> > combination of all of the things in my experimental tree for that
> > matter.  I put this patch out there because I know they're both bugs,
> > and strongly suspect it'll cure the worst of the interactivity related
> > delays.
> > 
> > I'm hoping you'll test it and confirm that it fixes yours.
> 
> Nope, this does not fix it.  "time ls" ping-pongs back and forth between
> ~0.1s and ~0.9s.  Must have been something else in the first patch.

Oh well. Thanks for testing Lee.  I was hoping this would be a case of
instant gratification, 2.6.16 being near, but it's not to be.

	-Mike


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: 2.6 vs 2.4, ssh terminal slowdown
  2006-02-13  8:43                                       ` Lee Revell
  2006-02-13 10:06                                         ` MIke Galbraith
@ 2006-02-13 12:35                                         ` MIke Galbraith
  2006-02-15  4:22                                           ` Lee Revell
  1 sibling, 1 reply; 41+ messages in thread
From: MIke Galbraith @ 2006-02-13 12:35 UTC (permalink / raw)
  To: Lee Revell; +Cc: Con Kolivas, Jan Engelhardt, gcoady, linux-kernel, Ingo Molnar

On Mon, 2006-02-13 at 03:43 -0500, Lee Revell wrote:
> On Mon, 2006-02-13 at 08:08 +0100, MIke Galbraith wrote:
> > On Mon, 2006-02-13 at 01:38 -0500, Lee Revell wrote:
> > > Do you know which of those changes fixes the "ls" problem?
> > 
> > No, it could be either, both, or neither.  Heck, it _could_ be a
> > combination of all of the things in my experimental tree for that
> > matter.  I put this patch out there because I know they're both bugs,
> > and strongly suspect it'll cure the worst of the interactivity related
> > delays.
> > 
> > I'm hoping you'll test it and confirm that it fixes yours.
> 
> Nope, this does not fix it.  "time ls" ping-pongs back and forth between
> ~0.1s and ~0.9s.  Must have been something else in the first patch.

Hmm.  Thinking about it some more, it's probably more than this alone,
but it could well be the boost qualifier I'm using...

Instead of declaring a task to be deserving of large quantities of boost
based upon their present shortage of sleep_avg, I based it upon their
not using their full slice.  He who uses the least gets the most.  This
made a large contribution to mitigating the parallel compile over NFS
problem the current scheduler has.  The fact that (current) heuristics
which mandate that any task which sleeps for 5% of it's slice may use
95% cpu practically forever can not only work, but work quite well in
the general case, tells me that the vast majority of all tasks are, and
will forever remain, cpu hogs.

The present qualifier creates positive feedback for cpu hogs by giving
them the most reward for being the biggest hog by our own definition.
If you'll pardon the pun, we gives pigs wings, and hope that they don't
actually use them and fly directly over head.  This is the root problem
as I see it, that and the fact that even if sleep_avg acquisition and
consumption were purely 1:1 as the original O(1) scheduler was, if you
sleep 1 ns longer than you run, you'll eventually be up to you neck in
sleep_avg. (a darn good reason to use something like slice_avg to help
determine when to drain off the excess)

Changing that qualifier would also mean that he who is _getting_ the
least cpu would get the most boost as well, so it should help with
fairness, and things like the test case mentioned in comments in the
patch where one task can end up starving it's own partner.

Is there any reason that "he who uses the least gets the most" would be
inferior to "he who has the least for whatever reason gets the most"?

If I were to put a patch together that did only that (IMHO sensible)
thing, would anyone be interested in trying it?

	-Mike

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: 2.6 vs 2.4, ssh terminal slowdown
  2006-02-13 12:35                                         ` MIke Galbraith
@ 2006-02-15  4:22                                           ` Lee Revell
  2006-02-15  5:22                                             ` MIke Galbraith
  0 siblings, 1 reply; 41+ messages in thread
From: Lee Revell @ 2006-02-15  4:22 UTC (permalink / raw)
  To: MIke Galbraith
  Cc: Con Kolivas, Jan Engelhardt, gcoady, linux-kernel, Ingo Molnar

On Mon, 2006-02-13 at 13:35 +0100, MIke Galbraith wrote:
> On Mon, 2006-02-13 at 03:43 -0500, Lee Revell wrote:
> > On Mon, 2006-02-13 at 08:08 +0100, MIke Galbraith wrote:
> > > On Mon, 2006-02-13 at 01:38 -0500, Lee Revell wrote:
> > > > Do you know which of those changes fixes the "ls" problem?
> > > 
> > > No, it could be either, both, or neither.  Heck, it _could_ be a
> > > combination of all of the things in my experimental tree for that
> > > matter.  I put this patch out there because I know they're both bugs,
> > > and strongly suspect it'll cure the worst of the interactivity related
> > > delays.
> > > 
> > > I'm hoping you'll test it and confirm that it fixes yours.
> > 
> > Nope, this does not fix it.  "time ls" ping-pongs back and forth between
> > ~0.1s and ~0.9s.  Must have been something else in the first patch.
> 
> Hmm.  Thinking about it some more, it's probably more than this alone,
> but it could well be the boost qualifier I'm using...

OK, with 2.6.16-rc2-mm1, "ls" bounces around between 0.15s and 0.50s.
Better than mainline but the large seemingly random variance is still
perceptible and annoying.  And, "ls | cat" behaves about the same as
"ls", while on mainline it was consistently faster (!).

Do you have an updated patch against -mm that I can test?

Lee


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: 2.6 vs 2.4, ssh terminal slowdown
  2006-02-15  4:22                                           ` Lee Revell
@ 2006-02-15  5:22                                             ` MIke Galbraith
  2006-02-15  6:11                                               ` Lee Revell
  0 siblings, 1 reply; 41+ messages in thread
From: MIke Galbraith @ 2006-02-15  5:22 UTC (permalink / raw)
  To: Lee Revell; +Cc: Con Kolivas, Jan Engelhardt, gcoady, linux-kernel, Ingo Molnar

On Tue, 2006-02-14 at 23:22 -0500, Lee Revell wrote:
> On Mon, 2006-02-13 at 13:35 +0100, MIke Galbraith wrote:
> > On Mon, 2006-02-13 at 03:43 -0500, Lee Revell wrote:
> > > On Mon, 2006-02-13 at 08:08 +0100, MIke Galbraith wrote:
> > > > On Mon, 2006-02-13 at 01:38 -0500, Lee Revell wrote:
> > > > > Do you know which of those changes fixes the "ls" problem?
> > > > 
> > > > No, it could be either, both, or neither.  Heck, it _could_ be a
> > > > combination of all of the things in my experimental tree for that
> > > > matter.  I put this patch out there because I know they're both bugs,
> > > > and strongly suspect it'll cure the worst of the interactivity related
> > > > delays.
> > > > 
> > > > I'm hoping you'll test it and confirm that it fixes yours.
> > > 
> > > Nope, this does not fix it.  "time ls" ping-pongs back and forth between
> > > ~0.1s and ~0.9s.  Must have been something else in the first patch.
> > 
> > Hmm.  Thinking about it some more, it's probably more than this alone,
> > but it could well be the boost qualifier I'm using...
> 
> OK, with 2.6.16-rc2-mm1, "ls" bounces around between 0.15s and 0.50s.
> Better than mainline but the large seemingly random variance is still
> perceptible and annoying.  And, "ls | cat" behaves about the same as
> "ls", while on mainline it was consistently faster (!).

Ok.  That means the reduction in fluctuation had nothing to do with my
changes.  It also suggests that there may be something of a regression
in the changes that are in mm, which I also carried in my patch, since
the timing for both kernels appear to be ~identical with or without my
bits.  That seems a little odd to me considering what those changes do.

> 
> Do you have an updated patch against -mm that I can test?

I will soon if you still want to try it. I've fixed the throttle release
thing, and am fine tuning the interactivity bits.  I have it working
very well now, but want to try to squeeze some more from it.

Drop me a line if you're still interested from the interactivity side,
but I think the ls delay reduction has turned out to be a red herring.

	-Mike


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: 2.6 vs 2.4, ssh terminal slowdown
  2006-02-15  5:22                                             ` MIke Galbraith
@ 2006-02-15  6:11                                               ` Lee Revell
  2006-02-15  7:17                                                 ` MIke Galbraith
  0 siblings, 1 reply; 41+ messages in thread
From: Lee Revell @ 2006-02-15  6:11 UTC (permalink / raw)
  To: MIke Galbraith
  Cc: Con Kolivas, Jan Engelhardt, gcoady, linux-kernel, Ingo Molnar

On Wed, 2006-02-15 at 06:22 +0100, MIke Galbraith wrote:
> > OK, with 2.6.16-rc2-mm1, "ls" bounces around between 0.15s and
> 0.50s.
> > Better than mainline but the large seemingly random variance is
> still
> > perceptible and annoying.  And, "ls | cat" behaves about the same as
> > "ls", while on mainline it was consistently faster (!).
> 
> Ok.  That means the reduction in fluctuation had nothing to do with my
> changes.  It also suggests that there may be something of a regression
> in the changes that are in mm, which I also carried in my patch, since
> the timing for both kernels appear to be ~identical with or without my
> bits.  That seems a little odd to me considering what those changes
> do.
> 
> > 
> > Do you have an updated patch against -mm that I can test?
> 
> I will soon if you still want to try it. I've fixed the throttle
> release
> thing, and am fine tuning the interactivity bits.  I have it working
> very well now, but want to try to squeeze some more from it.
> 
> Drop me a line if you're still interested from the interactivity side,
> but I think the ls delay reduction has turned out to be a red
> herring. 

Just to be clear - this is 2.6.16-rc2-mm1 *without* your patch that I am
talking about.

Lee


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: 2.6 vs 2.4, ssh terminal slowdown
  2006-02-15  6:11                                               ` Lee Revell
@ 2006-02-15  7:17                                                 ` MIke Galbraith
  0 siblings, 0 replies; 41+ messages in thread
From: MIke Galbraith @ 2006-02-15  7:17 UTC (permalink / raw)
  To: Lee Revell; +Cc: Con Kolivas, Jan Engelhardt, gcoady, linux-kernel, Ingo Molnar

On Wed, 2006-02-15 at 01:11 -0500, Lee Revell wrote:
> On Wed, 2006-02-15 at 06:22 +0100, MIke Galbraith wrote:
> > > OK, with 2.6.16-rc2-mm1, "ls" bounces around between 0.15s and
> > 0.50s.
> > > Better than mainline but the large seemingly random variance is
> > still
> > > perceptible and annoying.  And, "ls | cat" behaves about the same as
> > > "ls", while on mainline it was consistently faster (!).
> > 
> > Ok.  That means the reduction in fluctuation had nothing to do with my
> > changes.  It also suggests that there may be something of a regression
> > in the changes that are in mm, which I also carried in my patch, since
> > the timing for both kernels appear to be ~identical with or without my
> > bits.  That seems a little odd to me considering what those changes
> > do.
> > 
> > > 
> > > Do you have an updated patch against -mm that I can test?
> > 
> > I will soon if you still want to try it. I've fixed the throttle
> > release
> > thing, and am fine tuning the interactivity bits.  I have it working
> > very well now, but want to try to squeeze some more from it.
> > 
> > Drop me a line if you're still interested from the interactivity side,
> > but I think the ls delay reduction has turned out to be a red
> > herring. 
> 
> Just to be clear - this is 2.6.16-rc2-mm1 *without* your patch that I am
> talking about.

Exactly.  2.6.16-rc2-mm1 without my patch has a delay of .15 to .50s.
2.6.16-rc1 with my patch had a reported delay of from .19 to .45s.
That's identical in my book.  My patch to rc1 also contained Con's
changes that are in mm, that's constant.  Subtracting the variable, my
patch, made no difference.  Con's changes may be responsible for the
behavior change, but mine are certainly not.

	-Mike


^ permalink raw reply	[flat|nested] 41+ messages in thread

end of thread, other threads:[~2006-02-15  7:17 UTC | newest]

Thread overview: 41+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2006-02-08  2:11 2.6 vs 2.4, ssh terminal slowdown Grant Coady
2006-02-08  2:24 ` Benjamin LaHaise
2006-02-08  2:50   ` Grant Coady
2006-02-08  3:02     ` Benjamin LaHaise
2006-02-08  2:35 ` Con Kolivas
2006-02-08  2:55   ` Grant Coady
2006-02-08  3:00     ` Con Kolivas
2006-02-08  4:12       ` Barry K. Nathan
2006-02-08  4:41         ` Grant Coady
2006-02-08  4:51       ` Grant Coady
2006-02-08  5:17         ` Willy Tarreau
2006-02-08  5:39           ` Grant Coady
2006-02-08  7:43       ` Lee Revell
2006-02-09 17:06   ` Jan Engelhardt
2006-02-09 20:06     ` Lee Revell
2006-02-10  6:35       ` MIke Galbraith
2006-02-12 13:47         ` MIke Galbraith
2006-02-12 19:03           ` Lee Revell
2006-02-12 21:36             ` MIke Galbraith
2006-02-12 23:23               ` Lee Revell
2006-02-12 23:39               ` Lee Revell
2006-02-13  3:09                 ` MIke Galbraith
2006-02-13  3:39                   ` Lee Revell
2006-02-13  4:59                     ` MIke Galbraith
2006-02-13  5:05                       ` Con Kolivas
2006-02-13  5:32                         ` MIke Galbraith
2006-02-13  5:37                           ` Con Kolivas
2006-02-13  5:57                             ` MIke Galbraith
2006-02-13  6:08                               ` Con Kolivas
2006-02-13  6:35                                 ` MIke Galbraith
2006-02-13  6:38                                   ` Lee Revell
2006-02-13  7:08                                     ` MIke Galbraith
2006-02-13  8:43                                       ` Lee Revell
2006-02-13 10:06                                         ` MIke Galbraith
2006-02-13 12:35                                         ` MIke Galbraith
2006-02-15  4:22                                           ` Lee Revell
2006-02-15  5:22                                             ` MIke Galbraith
2006-02-15  6:11                                               ` Lee Revell
2006-02-15  7:17                                                 ` MIke Galbraith
2006-02-13  7:15                                   ` Con Kolivas
2006-02-13  7:41                                     ` MIke Galbraith

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox