linux scheduler limitations?

public inbox for linux-kernel@vger.kernel.org

* linux scheduler limitations?
@ 2001-03-29 21:19 Fabio Riccardi
  2001-03-29 21:26 ` David Lang
                   ` (3 more replies)
  0 siblings, 4 replies; 11+ messages in thread
From: Fabio Riccardi @ 2001-03-29 21:19 UTC (permalink / raw)
  To: linux-kernel

Hello,

I'm working on an enhanced version of Apache and I'm hitting my head
against something I don't understand.

I've found a (to me) inexplicable system behaviour: when the number of
Apache forked instances goes somewhere beyond 1050, the machine
suddenly slows down almost to a halt and becomes totally unresponsive
until I stop the test (SpecWeb).

Profiling the kernel shows that the scheduler and the interrupt handler
are taking most of the CPU time.

I understand that there must be a limit to the number of processes that
the scheduler can efficiently handle, but I would expect some sort of
gradual performance degradation when increasing the number of tasks.
Instead I observe that increasing Apache's MaxClients limit by as
little as 10 can cause a sudden transition from smooth operation with
lots (30-40%) of CPU idle to a total lock-up.

Moreover, the maximum number of processes is not even constant. If I increase
the server load gradually then I manage to have 1500 processes running
with no problem, but if the transition is sharp (the SpecWeb case) then
I end up with a lock-up.
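
One possible reading of this cliff-edge behaviour is a feedback loop: if every
scheduling decision has to look at every runnable task (as a scheduler with a
single linear run queue would), then a longer queue makes each decision more
expensive, which in turn lets the queue grow further. The Python sketch below
is only a toy model of that idea, not a diagnosis of this machine. The function
name, the tick length and both cost parameters are invented for illustration.

```python
def simulate(arrivals_per_tick, ticks, burst=0,
             service_cost=0.001, scan_cost_per_task=0.000001):
    """Return how many tasks are still runnable after `ticks` time slices.

    Toy model, not kernel code: each dispatch costs a fixed service time
    plus a scan over every currently runnable task. All numbers are
    hypothetical.
    """
    runnable = burst  # tasks that all become runnable at once at the start
    for _ in range(ticks):
        runnable += arrivals_per_tick
        budget = 1.0  # CPU time available in this tick
        while runnable > 0:
            cost = service_cost + scan_cost_per_task * runnable
            if cost > budget:
                break  # out of CPU time; the remaining tasks stay queued
            budget -= cost
            runnable -= 1
    return runnable


if __name__ == "__main__":
    print("steady load:            ", simulate(500, 200))
    print("same load after a burst:", simulate(500, 200, burst=5000))
```

With these made-up parameters the steady load drains completely every tick and
leaves roughly a third of the CPU idle, while the same load preceded by a burst
never recovers, because the backlog itself makes every dispatch slower.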

Anybody seen this before? Any clues?

 - Fabio


* Re: linux scheduler limitations?
  2001-03-29 21:19 linux scheduler limitations? Fabio Riccardi
@ 2001-03-29 21:26 ` David Lang
  2001-03-29 21:55   ` Fabio Riccardi
  2001-03-29 21:35 ` J . A . Magallon
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 11+ messages in thread
From: David Lang @ 2001-03-29 21:26 UTC (permalink / raw)
  To: Fabio Riccardi; +Cc: linux-kernel

2.2 or 2.4 kernel?

The 2.4 kernel does a MUCH better job of dealing with large numbers of processes.

David Lang




* Re: linux scheduler limitations?
  2001-03-29 21:19 linux scheduler limitations? Fabio Riccardi
  2001-03-29 21:26 ` David Lang
@ 2001-03-29 21:35 ` J . A . Magallon
  2001-03-29 22:12   ` Fabio Riccardi
  2001-03-30  6:52 ` Giuliano Pochini
  2001-04-02 22:58 ` Alan Cox
  3 siblings, 1 reply; 11+ messages in thread
From: J . A . Magallon @ 2001-03-29 21:35 UTC (permalink / raw)
  To: Fabio Riccardi; +Cc: linux-kernel

On 03.29 Fabio Riccardi wrote:
> 
> I've found a (to me) inexplicable system behaviour: when the number of
> Apache forked instances goes somewhere beyond 1050, the machine
> suddenly slows down almost to a halt and becomes totally unresponsive
> until I stop the test (SpecWeb).
> 

Have you thought about pthreads (when you talk about fork, I suppose you
literally mean 'fork()')?

I teach a course on Parallel Programming at the university, and the practical
work was done with POSIX threads. One of my students picked up the idea and
used it in an assignment for another course, on Networks: he replaced the
traditional 'fork()' in a simple ftp server he had to implement with
'pthread_create' and got a 10-30 speedup (conns per second).

You will also get rid of the process-per-user limit, although you will run
into a threads-per-user limit, if there is one.

And you can also control its scheduling, to make each thread compete against
the whole system or only against its siblings.
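
For anyone who wants to try the fork-versus-thread comparison described above,
here is a rough sketch. It is written in Python rather than C purely for
brevity (os.fork() wraps fork(), and threading.Thread uses the platform thread
library). The function names and the iteration count are illustrative, and
interpreter overhead means the absolute numbers say little about a C server.

```python
import os
import threading
import time


def time_forks(count):
    """Create `count` child processes one at a time; return elapsed seconds."""
    start = time.perf_counter()
    for _ in range(count):
        pid = os.fork()
        if pid == 0:
            os._exit(0)  # child: exit immediately, skipping cleanup handlers
        os.waitpid(pid, 0)  # parent: reap the child before forking again
    return time.perf_counter() - start


def time_threads(count):
    """Create `count` threads one at a time; return elapsed seconds."""
    start = time.perf_counter()
    for _ in range(count):
        worker = threading.Thread(target=lambda: None)
        worker.start()
        worker.join()
    return time.perf_counter() - start


if __name__ == "__main__":
    n = 100  # hypothetical iteration count
    print(f"{n} forks:   {time_forks(n):.4f} s")
    print(f"{n} threads: {time_threads(n):.4f} s")
```

This only measures creation and teardown cost. It says nothing about how either
model behaves once a thousand workers are competing for the CPU.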

-- 
J.A. Magallon                                          #  Let the source
mailto:jamagallon@able.es                              #  be with you, Luke... 

Linux werewolf 2.4.2-ac28 #1 SMP Thu Mar 29 16:41:17 CEST 2001 i686


* Re: linux scheduler limitations?
  2001-03-29 21:26 ` David Lang
@ 2001-03-29 21:55   ` Fabio Riccardi
  2001-03-30  1:45     ` Mike Kravetz
  0 siblings, 1 reply; 11+ messages in thread
From: Fabio Riccardi @ 2001-03-29 21:55 UTC (permalink / raw)
  To: David Lang; +Cc: linux-kernel

I'm using 2.4.2-ac26, but I've noticed the same behavior with all the 2.4
kernels I've seen so far.

I haven't even tried on 2.2

 - Fabio




* Re: linux scheduler limitations?
  2001-03-29 21:35 ` J . A . Magallon
@ 2001-03-29 22:12   ` Fabio Riccardi
  2001-03-29 22:33     ` J . A . Magallon
  0 siblings, 1 reply; 11+ messages in thread
From: Fabio Riccardi @ 2001-03-29 22:12 UTC (permalink / raw)
  To: J . A . Magallon; +Cc: linux-kernel

Apache uses a pre-fork "threading" mechanism: it spawns (fork()s) new instances
of itself whenever it finds that the number of idle "threads" is below a
certain (configurable) threshold.
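
The spawning decision just described can be sketched as a small function. This
is a simplified illustration in Python, not Apache's actual code: the parameter
names are borrowed from the MinSpareServers and MaxClients directives, the
function name is hypothetical, and the real server's spawning policy is more
elaborate than a single subtraction.

```python
def children_to_spawn(idle, total, min_spare, max_clients):
    """How many new children a pre-forking parent would fork right now.

    Simplified sketch: top the idle pool back up to `min_spare`, but never
    let the total number of children exceed `max_clients`.
    """
    if idle >= min_spare:
        return 0  # enough idle children are already waiting for connections
    wanted = min_spare - idle
    headroom = max(0, max_clients - total)  # room left under the hard cap
    return min(wanted, headroom)


if __name__ == "__main__":
    print(children_to_spawn(idle=2, total=100, min_spare=5, max_clients=150))
    print(children_to_spawn(idle=0, total=148, min_spare=5, max_clients=150))
```

The second call shows the cap at work: five more idle children are wanted, but
only two can be forked before the total reaches the limit.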

Despite all appearances, this method performs beautifully on Linux; pthreads are
actually slower in many cases, since you incur some additional overhead due
to thread synchronization and scheduling.

The problem is that beyond a certain number of processes the scheduler just goes
bananas, or so it seems to me.

Since Linux threads are mapped onto processes, I don't think that (p)threads would
help in any way, unless it is the VM context switch overhead that is playing a
role here, which I wouldn't think is the case.

 - Fabio



* Re: linux scheduler limitations?
  2001-03-29 22:12   ` Fabio Riccardi
@ 2001-03-29 22:33     ` J . A . Magallon
  2001-03-29 22:51       ` Fabio Riccardi
  0 siblings, 1 reply; 11+ messages in thread
From: J . A . Magallon @ 2001-03-29 22:33 UTC (permalink / raw)
  To: Fabio Riccardi; +Cc: linux-kernel


On 03.30 Fabio Riccardi wrote:
> 
> Despite all appearances, this method performs beautifully on Linux; pthreads
> are actually slower in many cases, since you incur some additional overhead
> due to thread synchronization and scheduling.
>

It all depends on your app, as every parallel algorithm. In a web-ftp-whatever
server, you do not need any synchro. You can start threads in free run and
let them die alone.

> The problem is that beyond a certain number of processes the scheduler just
> goes bananas, or so it seems to me.
> 
> Since Linux threads are mapped onto processes, I don't think that (p)threads
> would help in any way, unless it is the VM context switch overhead that is
> playing a role here, which I wouldn't think is the case.
> 

You said 'mapped'.
AFAIK, that is the advantage: you can avoid the VM switch by sharing memory.

-- 
J.A. Magallon                                          #  Let the source
mailto:jamagallon@able.es                              #  be with you, Luke... 

Linux werewolf 2.4.2-ac28 #1 SMP Thu Mar 29 16:41:17 CEST 2001 i686



* Re: linux scheduler limitations?
  2001-03-29 22:33     ` J . A . Magallon
@ 2001-03-29 22:51       ` Fabio Riccardi
  0 siblings, 0 replies; 11+ messages in thread
From: Fabio Riccardi @ 2001-03-29 22:51 UTC (permalink / raw)
  To: J . A . Magallon; +Cc: linux-kernel

"J . A . Magallon" wrote:

> It all depends on your app, as every parallel algorithm. In a web-ftp-whatever
> server, you do not need any synchro. You can start threads in free run and
> let them die alone.

Even if you don't need synchronization you pay for it anyway, since you have
to use the pthread version of libc, which is reentrant. Moreover, many calls (e.g.
accept) are "scheduling points" for pthreads: whenever you call them the runtime
performs quite a bit of bookkeeping.

It is instructive to use a profiler on your application and see what happens when
you use pthreads...

> You said, 'mapped'.
> AFAIK, that is the advantage, you can avoid the VM switch by sharing memory.

If your application uses lots of memory then I agree. Apache only uses a tiny
amount of RAM per instance though, so I don't think that is my case.

 - Fabio


* Re: linux scheduler limitations?
  2001-03-29 21:55   ` Fabio Riccardi
@ 2001-03-30  1:45     ` Mike Kravetz
  2001-03-30  2:58       ` Fabio Riccardi
  0 siblings, 1 reply; 11+ messages in thread
From: Mike Kravetz @ 2001-03-30  1:45 UTC (permalink / raw)
  To: Fabio Riccardi; +Cc: linux-kernel

On Thu, Mar 29, 2001 at 01:55:11PM -0800, Fabio Riccardi wrote:
> I'm using 2.4.2-ac26, but I've noticed the same behavior with all the 2.4
> kernels I've seen so far.
> 
> I haven't even tried on 2.2
> 
>  - Fabio

Fabio,

Just for fun, you might want to try out some of our scheduler patches
located at:

http://lse.sourceforge.net/scheduling/

I would be interested in your observations.

-- 
Mike Kravetz                                 mkravetz@sequent.com
IBM Linux Technology Center


* Re: linux scheduler limitations?
  2001-03-30  1:45     ` Mike Kravetz
@ 2001-03-30  2:58       ` Fabio Riccardi
  0 siblings, 0 replies; 11+ messages in thread
From: Fabio Riccardi @ 2001-03-30  2:58 UTC (permalink / raw)
  To: Mike Kravetz, linux-kernel

Hi Mike,

Somebody else on the list already pointed me at your stuff and I quickly
downloaded your multiqueue patch for 2.4.1 to try it out.

It works great! I finally manage to have 100% CPU utilization and keep the
machine decently responsive.

On a dual 1GHz Pentium box I went from 1300 SpecWeb to 1600. That's pretty
amazing.

There is a bit more overhead though, I'd say around 5%, when the CPU is not
fully loaded.

What is the status of your code? Is it going to end up in the mainstream
kernel?

Do you have a port to the 2.4.2x kernels?

In my enthusiasm I tried to port the patch to 2.4.2-ac26 but I broke
something and it didn't work anymore... :)

I haven't tried the pooling patch yet; it didn't seem to make much sense on a
2-way box. I have an 8-way on which I'm planning to bench my web server
enhancements, and I'll try the pooling stuff on it.

BTW: interested in the fastest Linux web server?

BTW2: what about the HP scheduler patches?

Thanks, ciao,

 - Fabio



* RE: linux scheduler limitations?
  2001-03-29 21:19 linux scheduler limitations? Fabio Riccardi
  2001-03-29 21:26 ` David Lang
  2001-03-29 21:35 ` J . A . Magallon
@ 2001-03-30  6:52 ` Giuliano Pochini
  2001-04-02 22:58 ` Alan Cox
  3 siblings, 0 replies; 11+ messages in thread
From: Giuliano Pochini @ 2001-03-30  6:52 UTC (permalink / raw)
  To: Fabio Riccardi; +Cc: linux-kernel


On 29-Mar-01 Fabio Riccardi wrote:
> Hello,
> 
> I'm working on an enhanced version of Apache and I'm hitting my head
> against something I don't understand.
> 
> I've found a (to me) inexplicable system behaviour: when the number of
> Apache forked instances goes somewhere beyond 1050, the machine
> suddenly slows down almost to a halt and becomes totally unresponsive
> until I stop the test (SpecWeb).

Are you using 2.2.x? I had the same problem here until I switched
to 2.4.x. The 2.2 internal locks are not fine-grained enough.


Bye.
    Giuliano Pochini ->)|(<- Shiny Network {AS6665} ->)|(<-



* Re: linux scheduler limitations?
  2001-03-29 21:19 linux scheduler limitations? Fabio Riccardi
                   ` (2 preceding siblings ...)
  2001-03-30  6:52 ` Giuliano Pochini
@ 2001-04-02 22:58 ` Alan Cox
  3 siblings, 0 replies; 11+ messages in thread
From: Alan Cox @ 2001-04-02 22:58 UTC (permalink / raw)
  To: Fabio Riccardi; +Cc: linux-kernel

> I've found a (to me) inexplicable system behaviour: when the number of
> Apache forked instances goes somewhere beyond 1050, the machine
> suddenly slows down almost to a halt and becomes totally unresponsive
> until I stop the test (SpecWeb).

I'm surprised it gets that far.

> Moreover, the maximum number of processes is not even constant. If I increase
> the server load gradually then I manage to have 1500 processes running
> with no problem, but if the transition is sharp (the SpecWeb case) then
> I end up with a lock-up.

With that many servers and a sudden load you are probably causing a lot of
paging. What kernel version? And while this isn't a solution to kernel issues,
take a look at thttpd instead (www.acme.com). If you have 1500 8K stacks
thrashing in your cache you are not going to have good performance.
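
The arithmetic behind that last remark is easy to check. The sketch below, in
Python, multiplies out the 1500 processes and 8K stacks mentioned above and
compares the result with a cache size. The 256 KiB cache figure is purely an
assumption for illustration, since the thread does not say what cache the
machine has, and the function names are hypothetical.

```python
def stack_footprint_kib(processes, stack_kib=8):
    """Total memory covered by per-process stacks, in KiB."""
    return processes * stack_kib


def times_cache_size(footprint_kib, cache_kib):
    """How many times over the footprint exceeds a cache of `cache_kib`."""
    return footprint_kib / cache_kib


if __name__ == "__main__":
    footprint = stack_footprint_kib(1500)  # 1500 * 8 = 12000 KiB
    print(footprint, "KiB of stacks")
    # 256 KiB is an assumed cache size, not a figure from the thread.
    print(times_cache_size(footprint, 256), "times a 256 KiB cache")
```

Under that assumption the stacks alone come to about 11.7 MiB, dozens of times
more than the cache can hold, so each switch between processes is likely to
start with cold stack data.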

