public inbox for linux-kernel@vger.kernel.org
* CPU scheduler question: processes created faster than destroyed?
@ 2002-04-18 20:06 Nicolae P. Costescu
  2002-04-18 21:20 ` Anders Peter Fugmann
  0 siblings, 1 reply; 3+ messages in thread
From: Nicolae P. Costescu @ 2002-04-18 20:06 UTC (permalink / raw)
  To: linux-kernel

First, thanks to all for doing a great job with the Linux kernel. We 
appreciate your work!

I've been seeing this behavior on our servers (2.4.2 and 2.4.18 kernel) and 
would like to ask your opinion. I have read the linux internals doc. 
section on the scheduler, and have read through the sched.c source code.

I have a certain # of client processes on various machines that connect to 
a linux server box. A forking server (DAServer.t) waits for client 
connections, then forks a copy to handle the client request. Each forked 
DAServer.t connects to three database server back ends (postgres), all of 
which fork from a postmaster process.

So each client connection causes 4 forks on the server.

The forked server does some work for the client, communicating with the 3 
database backends, then replies to the client, and the client disconnects. 
Now the server does some cleanup and exits.

Once the server has replied to the client, the client disconnects and is 
ready to send another message to the master server, which will cause another 
4 forks, etc.

Given a fixed # of clients (say 14) I'd expect the # of processes to have 
some upper bound, but what I see is that sometimes the # of sleeping 
processes will grow unbounded (limited only by the 512 limit on the # of 
database backends).

I've logged the load average, # of processes, # of DAServer.t processes, and 
# of database server processes, and plotted these at

http://www.strongholdtech.com/plots/

The # of sleeping processes will grow to, say, 900, and then suddenly 140 of 
them will become ready to run, run for a few seconds (driving the load avg to 
around 100), and then disappear. Shortly after, the system returns to normal.

Swap is not an issue: we have 1 gig RAM and 2 gig swap, and the swap isn't 
being used. We usually have only about 400 meg in use when we hit the 
900-process mark.

Is this just bad design on our part, or is there something in the CPU 
scheduler that leads to this behavior - where processes are started quicker 
than they die?

This problem is only exacerbated on a dual CPU box (goes unstable quicker).

Any suggestions?

We're going to try throttling the forked DAServer.t when the # of sleeping 
processes is large, making it sleep so that hopefully the other processes 
(some of which hang around in the sleeping state for 30 minutes or longer) 
will be able to run and die.
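In rough C, the throttling we have in mind looks something like the sketch 
below. This is not our actual code; the function names, the child counter, 
and the threshold value are all made up for illustration:

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define MAX_CHILDREN 64              /* illustrative threshold */

static int live_children;

/* Reap finished children; if 'block' is set, wait for at least one
 * child to die before returning, then drain the rest non-blockingly. */
static void reap_finished(int block)
{
    int st;
    while (live_children > 0 &&
           waitpid(-1, &st, block ? 0 : WNOHANG) > 0) {
        live_children--;
        block = 0;                   /* only the first wait may block */
    }
}

/* fork(), but stall first when too many children are still alive,
 * so the sleepers get a chance to run and die */
static pid_t throttled_fork(void)
{
    if (live_children >= MAX_CHILDREN)
        reap_finished(1);            /* block until one child exits */
    else
        reap_finished(0);            /* opportunistic cleanup */

    pid_t pid = fork();
    if (pid > 0)
        live_children++;             /* parent tracks live children */
    return pid;
}
```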

Thanks for your help,
Nick
****************************************************
Nicolae P. Costescu, Ph.D.  / Senior Developer
Stronghold Technologies
46040 Center Oak Plaza, Suite 160 / Sterling, Va 20166
Tel: 571-434-1472 / Fax: 571-434-1478


^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: CPU scheduler question: processes created faster than  destroyed?
  2002-04-18 20:06 CPU scheduler question: processes created faster than destroyed? Nicolae P. Costescu
@ 2002-04-18 21:20 ` Anders Peter Fugmann
  2002-04-19  5:57   ` Nicolae P. Costescu
  0 siblings, 1 reply; 3+ messages in thread
From: Anders Peter Fugmann @ 2002-04-18 21:20 UTC (permalink / raw)
  To: Nicolae P. Costescu; +Cc: linux-kernel

Nicolae P. Costescu wrote:
> Once the server has replied to the client, the client disconnects and is 
> ready to send another message to the master server, which will cause 
> another 4 forks, etc.

So, your clients are contacting the server repeatedly...

First, something in your description was not entirely clear: after the server
has received a request and has spawned four processes, does it sleep while
waiting for data?

If yes, the server would keep a high counter. This means that the "dynamic priority" of the server
process would be higher than that of the spawned processes, so it can starve the child processes
for a small amount of time. It is therefore able to send the answer back to the client and receive a
new request before any of the first spawned processes has terminated. The newly spawned children
will also possibly have a higher "dynamic priority" (I really hate to use this term) than the first
spawned processes.

Try to understand the line:
     p->counter = (p->counter >> 1) + NICE_TO_TICKS(p->nice);
at kernel/sched.c line 624 (2.4.18), especially when a process is sleeping.
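You can see the effect of that line by iterating it in user space (this is
just arithmetic on the 2.4 formula, not kernel code; helper names are mine).
A process that sleeps through every recalculation round halves its counter
and gets NICE_TO_TICKS added each time, a geometric series converging from
below toward 2*ticks, so a long sleeper wakes with roughly twice the quantum
of a process that burns its CPU time:

```c
/* One recalculation round of the 2.4 scheduler:
 *     p->counter = (p->counter >> 1) + NICE_TO_TICKS(p->nice);
 * applied repeatedly to a process that stays asleep. */
static int recharge(int counter, int ticks)
{
    return (counter >> 1) + ticks;   /* one recalculation round */
}

/* Counter after 'rounds' consecutive rounds asleep, starting from an
 * exhausted quantum (counter == 0). */
static int counter_after(int rounds, int ticks)
{
    int c = 0;
    while (rounds-- > 0)
        c = recharge(c, ticks);
    return c;                        /* approaches 2*ticks - 1 from below */
}
```

With ticks = 6 (NICE_TO_TICKS(0) at HZ=100 in 2.4) the sequence runs
6, 9, 10, 11, 11, ... and saturates at 11 = 2*6 - 1.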

 > Is this just bad design on our part, or is there something in the CPU scheduler that leads to this
 > behavior - where processes are started quicker than they die?

Well, if my assumptions are correct, it is both. The scheduler does work this way, but
that should not harm your application (it actually speeds it up). You might, however, want
to redesign your application to avoid spawning too many processes.

I would suggest one of two ways to do this.

1) Let the server wait for the spawned processes to die before accepting new requests.
The drawback might be that it will slow down the server process a bit.
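Something like this untested sketch (function names are mine, and the child's
work is elided): the parent blocks in waitpid() after each fork, so children
can never pile up, at the cost of serving one client at a time.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Option 1: fork one worker per request, but wait for it to die
 * before going back to accept(). */
static int handle_one_request(void)
{
    pid_t pid = fork();
    if (pid == 0) {
        /* child: talk to the database backends, reply, clean up */
        _exit(0);
    }
    if (pid < 0)
        return -1;                   /* fork failed */
    waitpid(pid, NULL, 0);           /* parent: block until the child dies */
    return 0;
}
```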

2) Don't spawn new processes all the time. Spawn the four needed processes once and for all,
and instead of terminating after processing, let them wait for a new command (acting just like
your server).
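That is essentially a pre-forked worker pool. A stripped-down sketch, with a
pipe standing in for whatever command channel you would actually use (names
and the pipe-based loop are illustrative only):

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Option 2: fork a worker once; it loops reading commands instead of
 * exiting after one job, so there is no fork()/exit() churn per request.
 * The parent gets back a descriptor it writes commands into; closing it
 * (EOF) retires the worker. */
static pid_t spawn_worker(int *cmd_fd)
{
    int pipefd[2];
    if (pipe(pipefd) < 0)
        return -1;

    pid_t pid = fork();
    if (pid == 0) {                  /* worker: serve until EOF */
        char cmd[64];
        close(pipefd[1]);
        while (read(pipefd[0], cmd, sizeof(cmd)) > 0)
            ;                        /* ... process one command ... */
        _exit(0);
    }
    close(pipefd[0]);
    *cmd_fd = pipefd[1];             /* parent writes commands here */
    return pid;
}
```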

Hope it helps.
Anders Fugmann


* Re: CPU scheduler question: processes created faster than  destroyed?
  2002-04-18 21:20 ` Anders Peter Fugmann
@ 2002-04-19  5:57   ` Nicolae P. Costescu
  0 siblings, 0 replies; 3+ messages in thread
From: Nicolae P. Costescu @ 2002-04-19  5:57 UTC (permalink / raw)
  To: Anders Peter Fugmann; +Cc: linux-kernel


>So, your clients are contacting the server repeatedly...
>
>First, something in your description was not entirely clear: after the 
>server has received a request and has spawned four processes, does it 
>sleep while waiting for data?

The master server is always blocked on accept(), waiting for clients to 
connect. When a client connects, it forks() and the child takes over. The 
parent closes its file descriptors to decrease the ref count and then gets 
back to waiting for another client.
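For reference, the loop looks roughly like the sketch below. This is a 
simplification, not our actual code; the SIGCHLD-based reaping and the 
function names are illustrative:

```c
#include <signal.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Reap finished children asynchronously so they don't linger as zombies. */
static void reap_children(int sig)
{
    (void)sig;
    while (waitpid(-1, NULL, WNOHANG) > 0)
        ;                            /* collect every finished child */
}

/* Fork a child to handle one accepted connection; the parent closes
 * its copy of the fd to drop the ref count. */
static pid_t dispatch(int conn)
{
    pid_t pid = fork();
    if (pid == 0) {
        /* child: talk to the three database backends, reply, exit */
        close(conn);
        _exit(0);
    }
    close(conn);                     /* parent: decrease the ref count */
    return pid;
}

/* Master server: blocked in accept() except for the brief fork. */
static void serve(int listen_fd)
{
    signal(SIGCHLD, reap_children);
    for (;;) {
        int conn = accept(listen_fd, NULL, NULL);
        if (conn >= 0)
            dispatch(conn);          /* then straight back to accept() */
    }
}
```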

>If yes, the server would keep a high counter. This means that the "dynamic 
>priority" of the server process would be higher than that of the spawned 
>processes, so it can starve the child processes for a small amount of 
>time. It is therefore able to send the answer back to the client and 
>receive a new request before any of the first spawned processes has 
>terminated. The newly spawned children will also possibly have a higher 
>"dynamic priority" (I really hate to use this term) than the first 
>spawned processes.

That makes sense.

>Try to understand the line:
>     p->counter = (p->counter >> 1) + NICE_TO_TICKS(p->nice);
>at kernel/sched.c line 624 (2.4.18), especially when a process is sleeping.

OK, I will review it, thanks. Your explanation certainly makes sense, since 
the master server uses only a fraction of its time quantum and then goes 
back to sleep (wake up on accept(), fork, block on accept() again).

>Well, if my assumptions are correct, it is both. The scheduler does work 
>this way, but that should not harm your application (it actually speeds 
>it up). You might, however, want to redesign your application to avoid 
>spawning too many processes.
>
>I would suggest one of two ways to do this.
>
>1) Let the server wait for the spawned processes to die before accepting 
>new requests. The drawback might be that it will slow down the server 
>process a bit.

Not possible because we must handle multiple simultaneous clients.

>2) Don't spawn new processes all the time. Spawn the four needed processes 
>once and for all, and instead of terminating after processing, let them 
>wait for a new command (acting just like your server).

Again, we need multiple simultaneous processes.

>Hope it helps.
>Anders Fugmann

It did, thank you for taking the time to read & reply! I think our long-term 
solution is to use a pool of servers instead of constantly forking. Short 
term, we need to keep the client connected while the forked server closes 
its database connections; that way the client can't jump back in and cause 
another set of servers to fork.

Thanks again,
Nick

****************************************************
Nicolae P. Costescu, Ph.D.  / Senior Developer
Stronghold Technologies
46040 Center Oak Plaza, Suite 160 / Sterling, Va 20166
Tel: 571-434-1472 / Fax: 571-434-1478


