public inbox for linux-kernel@vger.kernel.org
* dynamic linking files slow fork down significantly
@ 2007-03-03  0:53 David Lang
  2007-03-03  7:48 ` Eric Dumazet
  0 siblings, 1 reply; 4+ messages in thread
From: David Lang @ 2007-03-03  0:53 UTC (permalink / raw)
  To: linux-kernel

I have a fork-heavy workload (a proxy that forks per connection, I know it's not 
the most efficient design) and I discovered a 2x performance difference between 
a static and dynamically linked version of the same program (2200 connections/sec 
vs 4700 connections/sec).

I know that there is overhead on program startup, but didn't expect to find it 
on a fork with no exec. If I had been asked, I would have guessed that the static 
version would have been slower due to the need to mark more memory as COW.

What is it that costs so much with dynamic libraries on a fork/clone?

According to strace, the clone call that's being made is:
clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, 
child_tidptr=0xb7c92c08)

David Lang

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: dynamic linking files slow fork down significantly
  2007-03-03  0:53 dynamic linking files slow fork down significantly David Lang
@ 2007-03-03  7:48 ` Eric Dumazet
  2007-03-04  8:22   ` David Lang
  0 siblings, 1 reply; 4+ messages in thread
From: Eric Dumazet @ 2007-03-03  7:48 UTC (permalink / raw)
  To: David Lang; +Cc: linux-kernel

David Lang wrote:
> I have a fork-heavy workload (a proxy that forks per connection, I know 
> it's not the most efficient design) and I discovered a 2x performance 
> difference between a static and dynamically linked version of the same 
> program (2200 connections/sec vs 4700 connections/sec).
> 
> I know that there is overhead on program startup, but didn't expect to 
> find it on a fork with no exec. If I had been asked, I would have guessed 
> that the static version would have been slower due to the need to mark 
> more memory as COW.
> 
> What is it that costs so much with dynamic libraries on a fork/clone?
> 

man ld.so

        LD_BIND_NOW
               If set to a non-empty string, causes the dynamic linker to
               resolve all symbols at program startup instead of deferring
               function call resolution to the point when they are first
               referenced.


If you do:
export LD_BIND_NOW=1
before starting your dynamically linked version, do you get different numbers?

If some symbols are resolved dynamically after your fork()s, the dynamic 
linker has to dirty some parts of memory, and each child gets its own copy of 
the modified pages. The CPU cost is paid in every child instead of once, and 
memory needs are larger, so CPU caches are less efficient.

With LD_BIND_NOW=1, the initial exec of your program will be a little bit 
longer, but in the end you win.

You can see the effect of immediate binding with the ldd command:
its -r option asks it to do the full binding:

# time ldd ./groff
         libstdc++.so.5 => /usr/lib/libstdc++.so.5 (0xf7ea0000)
         libm.so.6 => /lib/tls/libm.so.6 (0xf7e7e000)
         libgcc_s.so.1 => /lib/libgcc_s.so.1 (0xf7e73000)
         libc.so.6 => /lib/tls/libc.so.6 (0x42000000)
         /lib/ld-linux.so.2 => /lib/ld-linux.so.2 (0xf7f5d000)
0.00user 0.00system 0:00.00elapsed 80%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+696minor)pagefaults 0swaps

# time ldd -r ./groff
         libstdc++.so.5 => /usr/lib/libstdc++.so.5 (0xf7e8f000)
         libm.so.6 => /lib/tls/libm.so.6 (0xf7e6d000)
         libgcc_s.so.1 => /lib/libgcc_s.so.1 (0xf7e62000)
         libc.so.6 => /lib/tls/libc.so.6 (0x42000000)
         /lib/ld-linux.so.2 => /lib/ld-linux.so.2 (0xf7f4c000)
0.00user 0.00system 0:00.00elapsed 50%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+777minor)pagefaults 0swaps

You can see 777 page faults instead of 696 in this example.



* Re: dynamic linking files slow fork down significantly
  2007-03-03  7:48 ` Eric Dumazet
@ 2007-03-04  8:22   ` David Lang
  2007-03-04 10:14     ` Eric Dumazet
  0 siblings, 1 reply; 4+ messages in thread
From: David Lang @ 2007-03-04  8:22 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: linux-kernel


On Sat, 3 Mar 2007, Eric Dumazet wrote:

> David Lang wrote:
>> I have a fork-heavy workload (a proxy that forks per connection, I know 
>> it's not the most efficient design) and I discovered a 2x performance 
>> difference between a static and dynamically linked version of the same 
>> program (2200 connections/sec vs 4700 connections/sec).
>> 
>> I know that there is overhead on program startup, but didn't expect to find 
>> it on a fork with no exec. If I had been asked, I would have guessed that 
>> the static version would have been slower due to the need to mark more 
>> memory as COW.
>> 
>> What is it that costs so much with dynamic libraries on a fork/clone?
>> 
>
> man ld.so
>
>       LD_BIND_NOW
>              If set to a non-empty string, causes the dynamic linker to
>              resolve all symbols at program startup instead of deferring
>              function call resolution to the point when they are first
>              referenced.
>
>
> If you do:
> export LD_BIND_NOW=1
> before starting your dynamically linked version, do you get different
> numbers?

Slightly; with LD_BIND_NOW=1 the dynamically linked version goes up to 3000 
connections/sec, gaining back about half the difference.

> If some symbols are resolved dynamically after your fork()s, the dynamic 
> linker has to dirty some parts of memory, and each child gets its own copy 
> of the modified pages. The CPU cost is paid in every child instead of once, 
> and memory needs are larger, so CPU caches are less efficient.

Thanks, this makes sense.

> With LD_BIND_NOW=1, the initial exec of your program will be a little bit 
> longer, but in the end you win.

Definitely; any cost that can be moved to startup instead of the per-connection 
handling pays off _very_ quickly.

> You can see the effect of immediate binding with the ldd command:
> its -r option asks it to do the full binding:
>
> # time ldd ./groff
>        libstdc++.so.5 => /usr/lib/libstdc++.so.5 (0xf7ea0000)
>        libm.so.6 => /lib/tls/libm.so.6 (0xf7e7e000)
>        libgcc_s.so.1 => /lib/libgcc_s.so.1 (0xf7e73000)
>        libc.so.6 => /lib/tls/libc.so.6 (0x42000000)
>        /lib/ld-linux.so.2 => /lib/ld-linux.so.2 (0xf7f5d000)
> 0.00user 0.00system 0:00.00elapsed 80%CPU (0avgtext+0avgdata 0maxresident)k
> 0inputs+0outputs (0major+696minor)pagefaults 0swaps
>
> # time ldd -r ./groff
>        libstdc++.so.5 => /usr/lib/libstdc++.so.5 (0xf7e8f000)
>        libm.so.6 => /lib/tls/libm.so.6 (0xf7e6d000)
>        libgcc_s.so.1 => /lib/libgcc_s.so.1 (0xf7e62000)
>        libc.so.6 => /lib/tls/libc.so.6 (0x42000000)
>        /lib/ld-linux.so.2 => /lib/ld-linux.so.2 (0xf7f4c000)
> 0.00user 0.00system 0:00.00elapsed 50%CPU (0avgtext+0avgdata 0maxresident)k
> 0inputs+0outputs (0major+777minor)pagefaults 0swaps
>
> You can see 777 page faults instead of 696 in this example.

The version of time on my system (Debian 3.1) doesn't give me the CPU or 
page-fault info, just the times. Where should I get the version that gives 
more info? (Although I don't think I can apply it directly to my 
troubleshooting, as the work is all being done in child processes.)

David Lang


* Re: dynamic linking files slow fork down significantly
  2007-03-04  8:22   ` David Lang
@ 2007-03-04 10:14     ` Eric Dumazet
  0 siblings, 0 replies; 4+ messages in thread
From: Eric Dumazet @ 2007-03-04 10:14 UTC (permalink / raw)
  To: David Lang; +Cc: linux-kernel

David Lang wrote:
> On Sat, 3 Mar 2007, Eric Dumazet wrote:
>>
>> # time ldd -r ./groff
>>        libstdc++.so.5 => /usr/lib/libstdc++.so.5 (0xf7e8f000)
>>        libm.so.6 => /lib/tls/libm.so.6 (0xf7e6d000)
>>        libgcc_s.so.1 => /lib/libgcc_s.so.1 (0xf7e62000)
>>        libc.so.6 => /lib/tls/libc.so.6 (0x42000000)
>>        /lib/ld-linux.so.2 => /lib/ld-linux.so.2 (0xf7f4c000)
>> 0.00user 0.00system 0:00.00elapsed 50%CPU (0avgtext+0avgdata 
>> 0maxresident)k
>> 0inputs+0outputs (0major+777minor)pagefaults 0swaps
>>
>> You can see 777 page faults instead of 696 in this example.
> 
> The version of time on my system (Debian 3.1) doesn't give me the CPU or 
> page-fault info, just the times. Where should I get the version that 
> gives more info? (Although I don't think I can apply it directly to my 
> troubleshooting, as the work is all being done in child processes.)

time is a shell built-in (bash reports it as a keyword), but there is
normally a standalone /usr/bin/time program:

# type time
time is a shell keyword
# time date
Sun Mar  4 12:15:54 CET 2007

real    0m0.003s
user    0m0.000s
sys     0m0.000s

# /usr/bin/time date
Sun Mar  4 12:15:56 CET 2007
0.00user 0.00system 0:00.00elapsed ?%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+192minor)pagefaults 0swaps


end of thread, other threads:[~2007-03-04 10:14 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2007-03-03  0:53 dynamic linking files slow fork down significantly David Lang
2007-03-03  7:48 ` Eric Dumazet
2007-03-04  8:22   ` David Lang
2007-03-04 10:14     ` Eric Dumazet

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox