* Process Creation Speed
From: Stephan T. Lavavej @ 2004-04-17  2:16 UTC
To: linux-kernel

Why does creating and then terminating a process in GNU/Linux take about
6.3 ms on a Prestonia-2.2?  I observe basically the same thing on a
PIII-600.

I'm pretty sure both systems run 2.4.x kernels.  Does this suck less under
2.6.x?  Not sucking at all would mean about 100 microseconds to me.  I
don't understand why it doesn't scale with processor speed.  Does this
interact with the length of a timeslice?

It matters to me because the Common Gateway Interface spawns and destroys
a process to handle each request, and I wish it were just fast, rather
than having to use FastCGI.

A fair amount of Googling and RTFFAQ didn't answer this.

Stephan T. Lavavej
http://nuwen.net
* Re: Process Creation Speed
From: Eric @ 2004-04-18  5:44 UTC
To: stl; +Cc: linux-kernel

On Friday 16 April 2004 21:16, Stephan T. Lavavej wrote:
> Why does creating and then terminating a process in GNU/Linux take about
> 6.3 ms on a Prestonia-2.2?  I observe basically the same thing on a
> PIII-600.
>
> I'm pretty sure both systems run 2.4.x kernels.  Does this suck less
> under 2.6.x?  Not sucking at all would mean about 100 microseconds to
> me.  I don't understand why it doesn't scale with processor speed.  Does
> this interact with the length of a timeslice?
>
> It matters to me because the Common Gateway Interface spawns and
> destroys a process to handle each request, and I wish it were just fast,
> rather than having to use FastCGI.

The difference in speed between regular CGI and FastCGI shouldn't be
related to process creation time.  The speedup you see from FastCGI is
because it doesn't have to be read from disk each time.  So you're really
looking for performance enhancements in the wrong place.  Tweaking process
creation can't make your platters spin faster.

> A fair amount of Googling and RTFFAQ didn't answer this.
>
> Stephan T. Lavavej
> http://nuwen.net
>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
* Re: Process Creation Speed
From: Jamie Lokier @ 2004-04-19  0:30 UTC
To: Eric; +Cc: stl, linux-kernel

Eric wrote:
> > It matters to me because the Common Gateway Interface spawns and
> > destroys a process to handle each request, and I wish it were just
> > fast, rather than having to use FastCGI.
>
> The difference in speed between regular CGI and FastCGI shouldn't
> be related to process creation time.  The speedup you see from
> FastCGI is because it doesn't have to be read from disk each
> time.  So you're really looking for performance enhancements in the
> wrong place.  Tweaking process creation can't make your platters spin
> faster.

Wrong explanation.  CGI does not "read from disk each time".  Files,
including executables, are cached in RAM.  Platter speed is irrelevant
unless your server is overloaded, which this one plainly isn't.

-- Jamie
* Re: Process Creation Speed
From: Eric @ 2004-04-19  2:15 UTC
To: Jamie Lokier; +Cc: linux-kernel

On Sunday 18 April 2004 19:30, you wrote:
> Eric wrote:
> > > It matters to me because the Common Gateway Interface spawns and
> > > destroys a process to handle each request, and I wish it were just
> > > fast, rather than having to use FastCGI.
> >
> > The difference in speed between regular CGI and FastCGI shouldn't
> > be related to process creation time.  The speedup you see from
> > FastCGI is because it doesn't have to be read from disk each
> > time.  So you're really looking for performance enhancements in the
> > wrong place.  Tweaking process creation can't make your platters spin
> > faster.
>
> Wrong explanation.  CGI does not "read from disk each time".  Files,
> including executables, are cached in RAM.  Platter speed is irrelevant
> unless your server is overloaded, which this one plainly isn't.

OK, my explanation is a bit off.  But you're still looking in the wrong
place.  100 ms isn't that long, and by just tweaking this you won't
achieve with regular CGI what FastCGI does.

And what happens when your CGI is removed from disk cache due to a spike
in requests?  It has to be read again, degrading performance.  You can't
count on an object being in disk cache every time when the system is
under load.

What about filesystems that use access timestamps?  These will have to be
written to disk every time the application is run, so under some
circumstances just being in disk cache isn't enough.

From http://www.fastcgi.com/devkit/doc/fcgi-perf.htm

"CGI applications couldn't perform in-memory caching, because they exited
after processing just one request.  Web server APIs promised to solve this
problem.  But how effective is the solution?"

"FastCGI is designed to allow effective in-memory caching.  Requests are
routed from any child process to a FastCGI application server.  The
FastCGI application process maintains an in-memory cache."

Look at these two statements and you will realize that they are optimizing
memory access patterns too.  Normally, even if the file is in disk cache
it will still have to get copied to an area that the webserver child
process can work with.  This wastes memory.  So if you have 100-1000
clients and a 100k CGI application, it may be in disk cache once, but
parts of it are getting fed to child processes each time it needs to be
run.  How long, or how many clients, before it gets bumped out of disk
cache?  Or how about a plain waste of memory that could go to more
webserver children?

"With multi-threading you run an application process that is designed to
handle several requests at the same time.  The threads handling concurrent
requests share process memory, so they all have access to the same cache.
Multi-threaded programming is complex -- concurrency makes programs
difficult to test and debug -- but with FastCGI you can write single
threaded or multithreaded applications."

Moreover, they can turn a normal application into a (pseudo)threaded
application, which has significant benefits for SMP systems as well as for
a system that just handles many concurrent connections.

IMHO, the problem still isn't related to creation time, but is an inherent
problem of the webserver's APIs.  Furthermore, if I read correctly,
FastCGI still has to spawn a child process each time a request comes in,
so even if you tuned process creation time, FastCGI would STILL be faster.

Look at it mathematically.  Say the time it takes for FastCGI to run a
CGI (F) is 10 units, and a regular server CGI implementation (C) is 100.
If you shorten process creation time by five units (S), then C-S > F-S
ALWAYS; you would just be helping both implementations by the SAME AMOUNT.

If you want CGI to perform faster, you will need a solution like FastCGI,
or to rewrite your webserver's CGI APIs.  If you want information on how
to optimize CGI, post on your webserver's mailing list or the FastCGI
lists; there is no need to toy with the kernel.  IMHO this is a userspace
issue.

To answer your other question, 2.6 should perform better in a webserver
application because of improvements to the VM system and the scheduler,
but not directly because of shortened process creation time (if it was
even shortened in 2.6).  I would benchmark the server under both kernels.
Also remember there are different scheduler algorithms and VM tunables.
Check the Documentation folder in the kernel source.  However, I have
never tweaked those for a webserver, so someone else would have to
recommend a good setup for a webserver.

Anyone feel free to correct me if I'm wrong on some parts.  Sorry for the
long-winded reply, but I could use a good refresher on this.

--Eric Bambach
* Re: Process Creation Speed
From: Jamie Lokier @ 2004-04-19  3:04 UTC
To: Eric; +Cc: linux-kernel

Eric wrote:
> > Wrong explanation.  CGI does not "read from disk each time".  Files,
> > including executables, are cached in RAM.  Platter speed is irrelevant
> > unless your server is overloaded, which this one plainly isn't.
>
> OK, my explanation is a bit off.  But you're still looking in the
> wrong place.  100 ms isn't that long, and by just tweaking this you
> won't achieve with regular CGI what FastCGI does.

That's true: FastCGI is a good solution, as are mod_perl and similar.

But the reasons you give for it are bogus.

> And what happens when your CGI is removed from disk cache due to a
> spike in requests?  It has to be read again, degrading
> performance.  You can't count on an object being in disk cache every
> time when the system is under load.

What you miss is that pages being removed from the cache affects FastCGI
and CGI identically.

_Parts_ of a CGI image will be removed from memory if there is paging
pressure due to other requests (not for this CGI).  The whole file is not
dropped out in one go, but individual pages are.  If it isn't used for a
long time, it may all go.

Exactly the same thing happens to a long-running FastCGI process:
individual pages are dropped under memory pressure, when those pages
aren't currently being used.  This occurs even though the FastCGI process
lasts over multiple requests.

File paging is determined by the pattern in which pages are actually being
used at any time, and has very little to do with whether pages are part of
a running process.

> What about filesystems that use access timestamps?  These will
> have to be written to disk every time the application is run, so
> under some circumstances just being in disk cache isn't enough.

No: the timestamp is written to disk later, asynchronously, and if there
are many requests which update the timestamp it will still only be written
once per update period (30 seconds or so on ext2, 5 seconds on ext3 I
think).  It's very unlikely to affect response time.

Note that static pages (served directly by the webserver) also update the
access timestamp: the effect of these is much worse than that of any CGI
program.  You should use the "noatime" mount option if this is ever a
problem.

> From http://www.fastcgi.com/devkit/doc/fcgi-perf.htm
>
> "CGI applications couldn't perform in-memory caching, because they
> exited after processing just one request.  Web server APIs promised to
> solve this problem.  But how effective is the solution?"
>
> "FastCGI is designed to allow effective in-memory caching.  Requests
> are routed from any child process to a FastCGI application
> server.  The FastCGI application process maintains an in-memory
> cache."
>
> Look at these two statements and you will realize that they are
> optimizing memory access patterns too.  Normally, even if the file is
> in disk cache it will still have to get copied to an area that the
> webserver child process can work with.  This wastes memory.  So if you
> have 100-1000 clients and a 100k CGI application, it may be in disk
> cache once, but parts of it are getting fed to child processes each
> time it needs to be run.  How long, or how many clients, before it
> gets bumped out of disk cache?  Or how about a plain waste of memory
> that could go to more webserver children?

Every one of these claims is technically bogus, even the ones from
fastcgi.com, but the gist is accurate.

They are bogus because CGI programs are able to maintain in-memory caches
as well.  That's what storing data in cache files, database servers,
shared memory, memcached and so forth accomplishes.  Also, the copying
you describe is not necessary.  That's why we have mmap().

They are accurate because it is much more complicated to do those things
in single-request CGI than in FastCGI (or an equivalent like mod_perl),
and there is no point: writing a persistent server is much easier than
writing a complicated sharing scheme among CGI processes.

Probably the biggest speedup in practice is when people write CGI
programs in scripting languages, or with complex libraries, which incurs
a huge initialisation cost for each request.  The initialisation doesn't
occur with every request when using FastCGI.  That tends to make the
difference between 0.5 requests per second and 100 requests per second.
It's a shame you didn't mention that :)

> "With multi-threading you run an application process that is
> designed to handle several requests at the same time.  The threads
> handling concurrent requests share process memory, so they all have
> access to the same cache.  Multi-threaded programming is complex --
> concurrency makes programs difficult to test and debug -- but with
> FastCGI you can write single threaded or multithreaded
> applications."
>
> Moreover, they can turn a normal application into a (pseudo)threaded
> application, which has significant benefits for SMP systems as well as
> for a system that just handles many concurrent connections.

True, although sometimes you find that forked applications run faster
than threaded, especially on SMP.

> If you want CGI to perform faster, you will need a solution like
> FastCGI, or to rewrite your webserver's CGI APIs.  If you want
> information on how to optimize CGI, post on your webserver's mailing
> list or the FastCGI lists; there is no need to toy with the kernel.
> IMHO this is a userspace issue.
> [...]
> I would benchmark the server under both kernels.  Also remember there
> are different scheduler algorithms and VM tunables.  Check the
> Documentation folder in the kernel source.  However, I have never
> tweaked those for a webserver, so someone else would have to
> recommend a good setup for a webserver.

With all of this I agree.  Especially that it's a userspace issue.

Fwiw, all good webservers have built-in capabilities for persistent
CGI-handling processes, more or less equivalent to FastCGI.  You said
that FastCGI requires a process to be created for every request.  I
thought this wasn't true, as the protocol doesn't require it, but if it
is true that's a large overhead, as 7.5 ms per request is significant,
and that would be a reason to _not_ use FastCGI and to use the web
server's built-in capabilities instead.

None of this answers the question which is relevant to linux-kernel: why
does process creation take 7.5 ms and fail to scale with CPU internal
clock speed over a factor of 4 (600 MHz x86 to 2.2 GHz x86)?

Perhaps it is because we still don't have shared page tables.  That would
be the most likely dominant overhead of fork().

Alternatively, the original poster may have included program
initialisation time in the 7.5 ms, and that could be substantial if there
are many complex libraries being loaded.

-- Jamie
* Re: Process Creation Speed
From: Eric @ 2004-04-19  5:43 UTC
To: Jamie Lokier; +Cc: linux-kernel, Stephan T. Lavavej

On Sunday 18 April 2004 22:04, you wrote:
> Eric wrote:
> > > Wrong explanation.  CGI does not "read from disk each time".  Files,
> > > including executables, are cached in RAM.  Platter speed is
> > > irrelevant unless your server is overloaded, which this one plainly
> > > isn't.
> >
> > OK, my explanation is a bit off.  But you're still looking in the
> > wrong place.  100 ms isn't that long, and by just tweaking this you
> > won't achieve with regular CGI what FastCGI does.
>
> That's true: FastCGI is a good solution, as are mod_perl and similar.
>
> But the reasons you give for it are bogus.

> Every one of these claims is technically bogus, even the ones from
> fastcgi.com, but the gist is accurate.

Yeah, I have the gist down, but I'm by no means an expert programmer
(yet), so I appreciate your well-thought-out and detailed responses
instead of just flaming or something equally bad.

> They are bogus because CGI programs are able to maintain in-memory
> caches as well.

Across instances and concurrently?  It sounds complicated unless you are
using a persistent server CGI application... like FastCGI.

> That's what storing data in cache files,

A file is much different from in-memory.

> database servers, shared memory,

Yes, and these would be persistent applications.

> memcached and so forth accomplishes.  Also, the
> copying you describe is not necessary.  That's why we have mmap().

OK, shot down.  Fair enough.

> They are accurate because it is much more complicated to do those
> things in single-request CGI than in FastCGI (or an equivalent like
> mod_perl), and there is no point: writing a persistent server is much
> easier than writing a complicated sharing scheme among CGI processes.

Yes, that is what I was going for... just in a sketchy fashion.

> Probably the biggest speedup in practice is when people write CGI
> programs in scripting languages, or with complex libraries, which
> incurs a huge initialisation cost for each request.  The
> initialisation doesn't occur with every request when using FastCGI.
> That tends to make the difference between 0.5 requests per second and
> 100 requests per second.  It's a shame you didn't mention that :)

It is.  Now that you mention it, I'm surprised I didn't think of it.  I
did some research in this area because the SysV init scheme uses huge
amounts of scripts, and the total initialization cost per bootup is
probably on the order of 5-15 seconds depending on the machine, etc.
This is the situation I was thinking of: Perl and other interpreted
languages that have a huge startup cost would benefit from FastCGI.
There isn't a whole lot you can do with regular CGI.

> > "With multi-threading you run an application process that is
> > designed to handle several requests at the same time.  The threads
> > handling concurrent requests share process memory, so they all have
> > access to the same cache.  Multi-threaded programming is complex --
> > concurrency makes programs difficult to test and debug -- but with
> > FastCGI you can write single threaded or multithreaded
> > applications."
> >
> > Moreover, they can turn a normal application into a (pseudo)threaded
> > application, which has significant benefits for SMP systems as well
> > as for a system that just handles many concurrent connections.
>
> True, although sometimes you find that forked applications run faster
> than threaded, especially on SMP.

Either way it is still faster.  I haven't looked at the FastCGI specs,
but it seems like they were claiming to do some sort of pseudo
threading/concurrency for performance reasons.

> > If you want CGI to perform faster, you will need a solution like
> > FastCGI, or to rewrite your webserver's CGI APIs.  If you want
> > information on how to optimize CGI, post on your webserver's mailing
> > list or the FastCGI lists; there is no need to toy with the kernel.
> > IMHO this is a userspace issue.
> > [...]
> > I would benchmark the server under both kernels.  Also remember there
> > are different scheduler algorithms and VM tunables.  Check the
> > Documentation folder in the kernel source.  However, I have never
> > tweaked those for a webserver, so someone else would have to
> > recommend a good setup for a webserver.
>
> With all of this I agree.  Especially that it's a userspace issue.

Yep.  There isn't a whole lot the kernel can do to help you here.

> Fwiw, all good webservers have built-in capabilities for persistent
> CGI-handling processes, more or less equivalent to FastCGI.  You said
> that FastCGI requires a process to be created for every request.  I
> thought this wasn't true, as the protocol doesn't require it, but if
> it is true that's a large overhead, as 7.5 ms per request is
> significant, and that would be a reason to _not_ use FastCGI and to
> use the web server's built-in capabilities instead.

Hmmm... let me look a little more deeply into that.  After research I
realize that I glossed over the FastCGI whitepaper a little too much.

http://www.fastcgi.com/devkit/doc/fastcgi-whitepaper/fastcgi.htm

"For each request, the server creates a new process and the process
initializes itself."

is referring to other CGI implementations and not to FastCGI itself.
D'oh.  However, just as with a normal server, performance will suffer if
FastCGI processes have to be created on demand:

"The Web server creates FastCGI application processes to handle requests.
The processes may be created at startup, or created on demand."

> None of this answers the question which is relevant to linux-kernel:
> why does process creation take 7.5 ms and fail to scale with CPU
> internal clock speed over a factor of 4 (600 MHz x86 to 2.2 GHz x86)?

The reason it doesn't scale is probably because the kernel always runs at
a specified speed, 100 Hz, which gives 10 ms timeslices (I believe?).  I
would try a HZ patch and bump it up to 1000; I bet you would see a big
difference then.

> Perhaps it is because we still don't have shared page tables.
> That would be the most likely dominant overhead of fork().
>
> Alternatively, the original poster may have included program
> initialisation time in the 7.5 ms, and that could be substantial if
> there are many complex libraries being loaded.

Yeah, hopefully Stephan can provide a little more insight into how he
obtained 6.3 ms.

> -- Jamie
* Re: Process Creation Speed
From: Jamie Lokier @ 2004-04-19  9:48 UTC
To: Eric; +Cc: linux-kernel, Stephan T. Lavavej

Eric wrote:
> > None of this answers the question which is relevant to linux-kernel:
> > why does process creation take 7.5 ms and fail to scale with CPU
> > internal clock speed over a factor of 4 (600 MHz x86 to 2.2 GHz x86)?
>
> The reason it doesn't scale is probably because the kernel always runs
> at a specified speed, 100 Hz, which gives 10 ms timeslices (I believe?).
> I would try a HZ patch and bump it up to 1000; I bet you would see a big
> difference then.

Hmm.  The timer speed shouldn't affect the measured speed of fork() at
all.  It might show up if the measuring program is dependent on the timer
in some way, though.

-- Jamie
* Re: Process Creation Speed
From: Johannes Stezenbach @ 2004-04-19 12:09 UTC
To: Jamie Lokier; +Cc: Eric, linux-kernel, Stephan T. Lavavej

Jamie Lokier wrote:
> Eric wrote:
> > > None of this answers the question which is relevant to linux-kernel:
> > > why does process creation take 7.5 ms and fail to scale with CPU
> > > internal clock speed over a factor of 4 (600 MHz x86 to 2.2 GHz x86)?
> >
> > The reason it doesn't scale is probably because the kernel always runs
> > at a specified speed, 100 Hz, which gives 10 ms timeslices (I
> > believe?).  I would try a HZ patch and bump it up to 1000; I bet you
> > would see a big difference then.
>
> Hmm.  The timer speed shouldn't affect the measured speed of fork() at
> all.  It might show up if the measuring program is dependent on the
> timer in some way, though.

http://bulk.fefe.de/scalability/ has some benchmarks on the issue.
But I guess the numbers depend heavily on the server/CGI software used.

Johannes
* RE: Process Creation Speed
From: Stephan T. Lavavej @ 2004-04-19 12:44 UTC
To: linux-kernel

Thanks to all who have responded.

I had been measuring the time to create and terminate a do-nothing
program.  I had not been measuring CGI programs, though that was why I
was doing the measurement in the first place.

I changed my measurement strategy, and I now get about 110 microseconds
for creation and termination of a do-nothing process (fork() followed by
execve()).  Statically linking everything gave a significant speedup,
which allowed me to reach that value.  This was on a 2.6.x kernel.  110
microseconds is well within my "doesn't suck" range, so I'm happy - CGI
will be fast enough for my needs, and I can always turn to FastCGI later
if necessary.

I am writing a web-based forum entirely in C++, rejecting interpreted
languages (Perl, PHP, ASP, etc.) and relational databases (MySQL,
PostgreSQL, etc.) entirely.  My forum consists of "kiddy" CGI processes
which talk over the network to a persistent "mommy" daemon who keeps all
forum state in main memory.

My code runs on both Windows and GNU/Linux with no configuration needed,
but separate measurements indicate that XP takes about 3.3 ms to create
and terminate a do-nothing process.  Thus it looks like Linux 2.6.x will
be the kernel of choice for my forum.

Thanks again!

Stephan T. Lavavej
http://nuwen.net
* RE: Process Creation Speed
From: David Lang @ 2004-04-19 22:48 UTC
To: Stephan T. Lavavej; +Cc: linux-kernel

The 2.6 kernel does have significant advantages in process
creation/shutdown time when you have large numbers of processes.  I have
a box that sits with ~3000 processes on it; with a 2.4.25 kernel it
resulted in 400 connections/sec, and with 2.6.4 it got 650
connections/sec (cutting the number of processes down to <50 resulted in
~620 connections/sec with 2.4.25).

However, if you can compile your code statically and not do shared
library lookups, you may see even more drastic improvements.  With the
code above I hard-coded a protocol lookup, eliminated a few hostname
lookups, and under 2.6.4 got it up to 2500 connections/sec!  (I
eliminated hostname lookups and got up to ~700/sec, then changed nsswitch
so that it didn't look for protocols.db and it climbed to 850/sec, I
shortened /etc/protocols to a minimal set and it climbed to 900/sec, and
I eliminated the getprotobyname("ip") call and it jumped to 2500/sec.)

Measurements were on a dual Athlon 2100 box with 1 GB of RAM.  Note that
the code was statically compiled to start with, but doing a name lookup
invoked nsswitch and loaded in libraries from that.

Do a strace of the app, dumping it to a file, and see what files it
opens.  Especially if you have SMP, try the 2.6 kernel, and keep tweaking
your CGIs.

David Lang

On Mon, 19 Apr 2004, Stephan T. Lavavej wrote:
> Date: Mon, 19 Apr 2004 05:44:12 -0700
> From: Stephan T. Lavavej <stl@nuwen.net>
> To: linux-kernel@vger.kernel.org
> Subject: RE: Process Creation Speed
>
> Thanks to all who have responded.
>
> I had been measuring the time to create and terminate a do-nothing
> program.  I had not been measuring CGI programs, though that was why I
> was doing the measurement in the first place.
>
> I changed my measurement strategy, and I now get about 110 microseconds
> for creation and termination of a do-nothing process (fork() followed
> by execve()).  Statically linking everything gave a significant
> speedup, which allowed me to reach that value.  This was on a 2.6.x
> kernel.  110 microseconds is well within my "doesn't suck" range, so
> I'm happy - CGI will be fast enough for my needs, and I can always turn
> to FastCGI later if necessary.
>
> I am writing a web-based forum entirely in C++, rejecting interpreted
> languages (Perl, PHP, ASP, etc.) and relational databases (MySQL,
> PostgreSQL, etc.) entirely.  My forum consists of "kiddy" CGI processes
> which talk over the network to a persistent "mommy" daemon who keeps
> all forum state in main memory.
>
> My code runs on both Windows and GNU/Linux with no configuration
> needed, but separate measurements indicate that XP takes about 3.3 ms
> to create and terminate a do-nothing process.  Thus it looks like Linux
> 2.6.x will be the kernel of choice for my forum.
>
> Thanks again!
>
> Stephan T. Lavavej
> http://nuwen.net

--
"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are, by
definition, not smart enough to debug it." - Brian W. Kernighan
* Re: Process Creation Speed
From: Jakob Oestergaard @ 2004-04-22 13:40 UTC
To: Stephan T. Lavavej; +Cc: linux-kernel

On Mon, Apr 19, 2004 at 05:44:12AM -0700, Stephan T. Lavavej wrote:
> Thanks to all who have responded.
> ...
> I am writing a web-based forum entirely in C++, rejecting interpreted
> languages (Perl, PHP, ASP, etc.) and relational databases (MySQL,
> PostgreSQL, etc.) entirely.  My forum consists of "kiddy" CGI processes
> which talk over the network to a persistent "mommy" daemon who keeps
> all forum state in main memory.

You could consider loading your .o as an Apache module, rather than
executing it as a CGI program.  I was involved in one project where we
did this with good success.

Even segfaults in our module would "only" take down one of the Apache
sub-processes.  So while segfaults incur performance overhead - and of
course should be fixed no matter what, which luckily is very easy to
debug (using, for example, { if (!fork()) abort(); } to create snapshot
coredumps) - they are not catastrophic.

It's entirely realistic to write a good module for Apache in a fairly
short timespan.

/ jakob
* Re: Process Creation Speed
From: Jamie Lokier @ 2004-04-19 13:28 UTC
To: Johannes Stezenbach, Eric, linux-kernel, Stephan T. Lavavej

Johannes Stezenbach wrote:
> > > > None of this answers the question which is relevant to
> > > > linux-kernel: why does process creation take 7.5 ms and fail to
> > > > scale with CPU internal clock speed over a factor of 4 (600 MHz
> > > > x86 to 2.2 GHz x86)?
>
> http://bulk.fefe.de/scalability/ has some benchmarks on the issue.
> But I guess the numbers depend heavily on the server/CGI software used.

Nice page.  The graphs there show fork() taking 250-350 microseconds,
which is quite fast.  Where is the 7.5 ms complaint coming from?

-- Jamie
* Re: Process Creation Speed
From: Andi Kleen @ 2004-04-19 15:43 UTC
To: stl; +Cc: linux-kernel

"Stephan T. Lavavej" <stl@nuwen.net> writes:
> I changed my measurement strategy, and I now get about 110 microseconds
> for creation and termination of a do-nothing process (fork() followed
> by execve()).  Statically linking everything gave a significant
> speedup, which allowed me to reach that value.  This was on a 2.6.x
> kernel.  110 microseconds is well within my "doesn't suck" range, so
> I'm happy - CGI will be fast enough for my needs, and I can always turn
> to FastCGI later if necessary.

This just means ld.so is too slow for you.  Perhaps you should complain
to the glibc people about that?

-Andi