* Throughput Bug?
@ 2007-10-18 15:54 Matthew Faulkner
  2007-10-18 17:11 ` Rick Jones
  2007-10-19  5:44 ` Bill Fink
  0 siblings, 2 replies; 5+ messages in thread
From: Matthew Faulkner @ 2007-10-18 15:54 UTC (permalink / raw)
  To: netdev

Hey all

I'm using netperf to perform TCP throughput tests via the localhost
interface. This is being done on an SMP machine. I'm forcing the
netperf server and client to run on the same core. However, for any
message size of 523 bytes or less, the throughput is much lower than
it is for message sizes of 524 bytes or more.

Recv   Send    Send                          Utilization       Service Demand
Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
Size   Size    Size     Time     Throughput  local    remote   local   remote
bytes  bytes   bytes    secs.    MBytes  /s  % S      % S      us/KB   us/KB
 65536  65536    523    30.01        81.49   50.00    50.00    11.984  11.984
 65536  65536    524    30.01       460.61   49.99    49.99    2.120   2.120

The chances are I'm being stupid and there is an obvious reason for
this, but when I put the server and client on different cores I don't
see this effect.

Any help explaining this will be greatly appreciated.

Machine details:

Linux 2.6.22-2-amd64 #1 SMP Thu Aug 30 23:43:59 UTC 2007 x86_64 GNU/Linux

sched_affinity is used by netperf internally to set the core affinity.
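
(For reference, the affinity mechanism here is the Linux sched_setaffinity()
call, which is presumably what netperf's -T option boils down to on this
platform.  A minimal sketch of that kind of pinning, as an illustration
rather than netperf's actual code:)

#define _GNU_SOURCE
#include <sched.h>

/* Pin the calling process to a single CPU, roughly what each end of
 * the test is asked to do with -T 0,0 (sketch only). */
static int pin_to_cpu(int cpu)
{
        cpu_set_t mask;

        CPU_ZERO(&mask);
        CPU_SET(cpu, &mask);
        /* pid 0 means "the calling process" */
        return sched_setaffinity(0, sizeof(mask), &mask);
}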

I tried this on 2.6.18 and I got the same problem!


* Re: Throughput Bug?
  2007-10-18 15:54 Throughput Bug? Matthew Faulkner
@ 2007-10-18 17:11 ` Rick Jones
  2007-10-19  5:44 ` Bill Fink
  1 sibling, 0 replies; 5+ messages in thread
From: Rick Jones @ 2007-10-18 17:11 UTC (permalink / raw)
  To: Matthew Faulkner; +Cc: netdev

Matthew Faulkner wrote:
> Hey all
> 
> I'm using netperf to perform TCP throughput tests via the localhost
> interface. This is being done on an SMP machine. I'm forcing the
> netperf server and client to run on the same core. However, for any
> message size of 523 bytes or less, the throughput is much lower than
> it is for message sizes of 524 bytes or more.
> 
> Recv   Send    Send                          Utilization       Service Demand
> Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
> Size   Size    Size     Time     Throughput  local    remote   local   remote
> bytes  bytes   bytes    secs.    MBytes  /s  % S      % S      us/KB   us/KB
>  65536  65536    523    30.01        81.49   50.00    50.00    11.984  11.984
>  65536  65536    524    30.01       460.61   49.99    49.99    2.120   2.120
> 
> The chances are I'm being stupid and there is an obvious reason for
> this, but when I put the server and client on different cores I don't
> see this effect.
> 
> Any help explaining this will be greatly appreciated.

One minor nit, but perhaps one that may help in the diagnosis: unless you set 
-D (the lack of the full test banner, or a copy of the command line, precludes 
knowing), and perhaps even then, all the -m option _really_ does for a 
TCP_STREAM test is set the size of the buffer passed to the transport on each 
send() call.  It is then entirely up to TCP how that gets merged/sliced/diced 
into TCP segments.
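
(To make that concrete, the sending side of a TCP_STREAM test is conceptually
just the loop below: the -m value is the length handed to each send(), and -D
maps to TCP_NODELAY.  This is a simplified sketch, not netperf's source:)

#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Simplified view of a TCP_STREAM sender (illustration only). */
static void send_loop(int sock, char *buf, size_t msg_size, int nodelay)
{
        if (nodelay)    /* what the test-specific -D option requests */
                setsockopt(sock, IPPROTO_TCP, TCP_NODELAY,
                           &nodelay, sizeof(nodelay));

        for (;;) {
                /* -m only controls how much is handed to the kernel per
                 * call; TCP decides how that becomes segments on the wire. */
                if (send(sock, buf, msg_size, 0) < 0)
                        break;
        }
}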

I forget what the MTU of loopback is, but you can get netperf to report the MSS 
for the connection by setting verbosity to 2 or more with the global -v option.
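
(The MSS reported there is the stack's value for the connection; presumably it
comes from the standard TCP_MAXSEG socket option, which any program can query.
A minimal sketch:)

#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Ask the stack for the MSS of a connected TCP socket. */
static int get_mss(int sock)
{
        int mss = 0;
        socklen_t len = sizeof(mss);

        if (getsockopt(sock, IPPROTO_TCP, TCP_MAXSEG, &mss, &len) < 0)
                return -1;
        return mss;
}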

A packet trace might be interesting.  That seems to be possible under Linux with 
tcpdump.  Failing that, another netperf-level thing I might do is configure with 
--enable-histogram and recompile netperf (netserver does not need to be 
recompiled, although it doesn't take much longer once netperf is recompiled) and 
use -v 2 again.  That will give you a histogram of the time spent in the send() 
call, which might be interesting if it ever blocks.
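
(That histogram is built from little more than a timestamp pair taken around
each send(); in spirit it is the sketch below, not netperf's actual code:)

#include <sys/socket.h>
#include <sys/time.h>

/* Time a single send() call in microseconds, the kind of per-call
 * measurement an --enable-histogram build buckets up (sketch only). */
static long timed_send(int sock, const void *buf, size_t len)
{
        struct timeval t0, t1;

        gettimeofday(&t0, NULL);
        if (send(sock, buf, len, 0) < 0)
                return -1;
        gettimeofday(&t1, NULL);

        return (t1.tv_sec - t0.tv_sec) * 1000000L +
               (t1.tv_usec - t0.tv_usec);
}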


> Machine details:
> 
> Linux 2.6.22-2-amd64 #1 SMP Thu Aug 30 23:43:59 UTC 2007 x86_64 GNU/Linux

FWIW, with an "earlier" kernel I am not sure I can name, since I'm not sure it is 
shipping (sorry, it was just what was on my system at the moment), I don't see 
that _big_ a difference between 523 and 524 regardless of TCP_NODELAY:

[root@hpcpc105 netperf2_trunk]# netperf -T 0 -c -C -- -m 524
TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to localhost.localdomain 
(127.0.0.1) port 0 AF_INET : cpu bind
Recv   Send    Send                          Utilization       Service Demand
Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
Size   Size    Size     Time     Throughput  local    remote   local   remote
bytes  bytes   bytes    secs.    10^6bits/s  % S      % S      us/KB   us/KB

  87380  87380    524    10.00      2264.18   25.00    25.00    3.618   3.618
[root@hpcpc105 netperf2_trunk]# netperf -T 0 -c -C -- -m 523
TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to localhost.localdomain 
(127.0.0.1) port 0 AF_INET : cpu bind
Recv   Send    Send                          Utilization       Service Demand
Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
Size   Size    Size     Time     Throughput  local    remote   local   remote
bytes  bytes   bytes    secs.    10^6bits/s  % S      % S      us/KB   us/KB

  87380  87380    523    10.00      3356.05   25.01    25.01    2.442   2.442


[root@hpcpc105 netperf2_trunk]# netperf -T 0 -c -C -- -m 523 -D
TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to localhost.localdomain 
(127.0.0.1) port 0 AF_INET : nodelay : cpu bind
Recv   Send    Send                          Utilization       Service Demand
Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
Size   Size    Size     Time     Throughput  local    remote   local   remote
bytes  bytes   bytes    secs.    10^6bits/s  % S      % S      us/KB   us/KB

  87380  87380    523    10.00       398.87   25.00    25.00    20.539  20.537
[root@hpcpc105 netperf2_trunk]# netperf -T 0 -c -C -- -m 524 -D
TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to localhost.localdomain 
(127.0.0.1) port 0 AF_INET : nodelay : cpu bind
Recv   Send    Send                          Utilization       Service Demand
Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
Size   Size    Size     Time     Throughput  local    remote   local   remote
bytes  bytes   bytes    secs.    10^6bits/s  % S      % S      us/KB   us/KB

  87380  87380    524    10.00       439.33   25.00    25.00    18.646  18.644

Although, if I do constrain the socket buffers to 64KB I _do_ see the behaviour 
on the older kernel as well:

[root@hpcpc105 netperf2_trunk]# netperf -T 0 -c -C -- -m 523 -s 64K -S 64K
TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to localhost.localdomain 
(127.0.0.1) port 0 AF_INET : cpu bind
Recv   Send    Send                          Utilization       Service Demand
Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
Size   Size    Size     Time     Throughput  local    remote   local   remote
bytes  bytes   bytes    secs.    10^6bits/s  % S      % S      us/KB   us/KB

131072 131072    523    10.00       406.61   25.00    25.00    20.146  20.145
[root@hpcpc105 netperf2_trunk]# netperf -T 0 -c -C -- -m 524 -s 64K -S 64K
TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to localhost.localdomain 
(127.0.0.1) port 0 AF_INET : cpu bind
Recv   Send    Send                          Utilization       Service Demand
Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
Size   Size    Size     Time     Throughput  local    remote   local   remote
bytes  bytes   bytes    secs.    10^6bits/s  % S      % S      us/KB   us/KB

131072 131072    524    10.00      2017.12   25.02    25.03    4.065   4.066


(yes, this is a four-core system, hence 25% CPU util reported by netperf).
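
(For anyone reproducing this: the -s and -S options boil down to SO_SNDBUF /
SO_RCVBUF setsockopt() calls made before the connection is established.  A
sketch, not netperf's actual code; note that Linux doubles the value you set,
which is why the banner above reports 131072 for a 64K request:)

#include <sys/socket.h>

/* Roughly what netperf's -s 64K (local) and -S 64K (remote) request. */
static void set_socket_buffers(int sock, int bytes)
{
        setsockopt(sock, SOL_SOCKET, SO_SNDBUF, &bytes, sizeof(bytes));
        setsockopt(sock, SOL_SOCKET, SO_RCVBUF, &bytes, sizeof(bytes));
}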

> sched_affinity is used by netperf internally to set the core affinity.
> 
> I tried this on 2.6.18 and I got the same problem!

I can say that the kernel I tried was based on 2.6.18...  So, due diligence and 
no good deed going unpunished suggest that Matthew and I are now in a race to 
take some tcpdump traces :)

rick jones




* Re: Throughput Bug?
  2007-10-18 15:54 Throughput Bug? Matthew Faulkner
  2007-10-18 17:11 ` Rick Jones
@ 2007-10-19  5:44 ` Bill Fink
  2007-10-19 15:41   ` Matthew Faulkner
  1 sibling, 1 reply; 5+ messages in thread
From: Bill Fink @ 2007-10-19  5:44 UTC (permalink / raw)
  To: Matthew Faulkner; +Cc: netdev

On Thu, 18 Oct 2007, Matthew Faulkner wrote:

> Hey all
> 
> I'm using netperf to perform TCP throughput tests via the localhost
> interface. This is being done on an SMP machine. I'm forcing the
> netperf server and client to run on the same core. However, for any
> message size of 523 bytes or less, the throughput is much lower than
> it is for message sizes of 524 bytes or more.
> 
> Recv   Send    Send                          Utilization       Service Demand
> Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
> Size   Size    Size     Time     Throughput  local    remote   local   remote
> bytes  bytes   bytes    secs.    MBytes  /s  % S      % S      us/KB   us/KB
>  65536  65536    523    30.01        81.49   50.00    50.00    11.984  11.984
>  65536  65536    524    30.01       460.61   49.99    49.99    2.120   2.120
> 
> The chances are I'm being stupid and there is an obvious reason for
> this, but when I put the server and client on different cores I don't
> see this effect.
> 
> Any help explaining this will be greatly appreciated.
> 
> Machine details:
> 
> Linux 2.6.22-2-amd64 #1 SMP Thu Aug 30 23:43:59 UTC 2007 x86_64 GNU/Linux
> 
> sched_affinity is used by netperf internally to set the core affinity.

I don't know if it's relevant, but note that 524 bytes + 52 bytes
of IP(20)/TCP(20)/TimeStamp(12) overhead gives a 576 byte packet,
which is the specified size that all IP routers must handle (and
the smallest value possible during PMTU discovery I believe).  A
message size of 523 bytes would be 1 less than that.  Could this
possibly have to do with ABC (possibly try disabling it if set)?
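
(ABC here being Appropriate Byte Counting, RFC 3465, which on kernels of this
vintage is controlled by the net.ipv4.tcp_abc sysctl.  Disabling it is just
"sysctl -w net.ipv4.tcp_abc=0", or programmatically something like this sketch:)

#include <stdio.h>

/* Write a value to net.ipv4.tcp_abc via procfs; equivalent to
 * "sysctl -w net.ipv4.tcp_abc=<value>" (sketch only). */
static int set_tcp_abc(int value)
{
        FILE *f = fopen("/proc/sys/net/ipv4/tcp_abc", "w");

        if (f == NULL)
                return -1;
        fprintf(f, "%d\n", value);
        return fclose(f);
}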

						-Bill


* Re: Throughput Bug?
  2007-10-19  5:44 ` Bill Fink
@ 2007-10-19 15:41   ` Matthew Faulkner
  2007-10-19 17:42     ` Rick Jones
  0 siblings, 1 reply; 5+ messages in thread
From: Matthew Faulkner @ 2007-10-19 15:41 UTC (permalink / raw)
  To: rick.jones2; +Cc: netdev, netperf-feedback, netperf-talk

I removed the socket sizes in an attempt to reproduce your results, Rick,
and I managed to do so, but only when I launch netperf by typing the
following command into the bash shell.

/home/cheka/netperf-2.4.4/src/netperf -T 0,0 -l 10 -t TCP_STREAM -c 100 -C 100 -f M -P 0 -- -m 523

As soon as I try to launch netperf (with the same command line as I use
manually) from within a script of any form (be it PHP or bash), the
difference between 523 and 524 appears again.

The PHP script I'm using is pasted below (it's the same as the bash
script that comes with netperf to drive the TCP_STREAM test).

<?php
        $START = 522;
        $END = 524;
        $MULT = 1;
        $ADD = 1;
        // Highest CPU number available, so we can bind the client to
        // different CPUs and show that the difference between 523 and
        // 524 does not occur unless both ends are on CPU 0.
        $MAXPROC = 1;

        $DURATION = 10;          // Length of each test
        $LOC_CPU = "-c 100";     // Report the local CPU utilization
        $REM_CPU = "-C 100";     // Report the remote CPU utilization

        $NETSERVER = "netserver"; // path to netserver
        $NETPERF = "netperf";     // path to netperf

        for ($i = 0; $i <= $MAXPROC; $i++) {
                echo "0,$i\n";
                $MESSAGE = $START;

                while ($MESSAGE <= $END) {
                        // tried it with and without these restarts of netserver
                        passthru('killall netserver > /dev/null');
                        passthru('sleep 5');
                        passthru("$NETSERVER");
                        passthru('sleep 5');
                        // show what we are about to run, then run it
                        // (passthru also prints its output to the screen)
                        echo "$NETPERF -T 0,$i -l $DURATION -t TCP_STREAM $LOC_CPU $REM_CPU -f M -P 0 -- -m $MESSAGE\n";
                        passthru("$NETPERF -T 0,$i -l $DURATION -t TCP_STREAM $LOC_CPU $REM_CPU -f M -P 0 -- -m $MESSAGE");
                        passthru('sleep 5');
                        $MESSAGE += $ADD;
                        $MESSAGE *= $MULT;
                }
        }
?>


On 19/10/2007, Bill Fink <billfink@mindspring.com> wrote:
> On Thu, 18 Oct 2007, Matthew Faulkner wrote:
>
> > Hey all
> >
> > I'm using netperf to perform TCP throughput tests via the localhost
> > interface. This is being done on an SMP machine. I'm forcing the
> > netperf server and client to run on the same core. However, for any
> > message size of 523 bytes or less, the throughput is much lower than
> > it is for message sizes of 524 bytes or more.
> >
> > Recv   Send    Send                          Utilization       Service Demand
> > Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
> > Size   Size    Size     Time     Throughput  local    remote   local   remote
> > bytes  bytes   bytes    secs.    MBytes  /s  % S      % S      us/KB   us/KB
> >  65536  65536    523    30.01        81.49   50.00    50.00    11.984  11.984
> >  65536  65536    524    30.01       460.61   49.99    49.99    2.120   2.120
> >
> > The chances are I'm being stupid and there is an obvious reason for
> > this, but when I put the server and client on different cores I don't
> > see this effect.
> >
> > Any help explaining this will be greatly appreciated.
> >
> > Machine details:
> >
> > Linux 2.6.22-2-amd64 #1 SMP Thu Aug 30 23:43:59 UTC 2007 x86_64 GNU/Linux
> >
> > sched_affinity is used by netperf internally to set the core affinity.
>
> I don't know if it's relevant, but note that 524 bytes + 52 bytes
> of IP(20)/TCP(20)/TimeStamp(12) overhead gives a 576 byte packet,
> which is the specified size that all IP routers must handle (and
> the smallest value possible during PMTU discovery I believe).  A
> message size of 523 bytes would be 1 less than that.  Could this
> possibly have to do with ABC (possibly try disabling it if set)?
>
>                                                 -Bill
>


* Re: Throughput Bug?
  2007-10-19 15:41   ` Matthew Faulkner
@ 2007-10-19 17:42     ` Rick Jones
  0 siblings, 0 replies; 5+ messages in thread
From: Rick Jones @ 2007-10-19 17:42 UTC (permalink / raw)
  To: Matthew Faulkner; +Cc: netdev

Matthew Faulkner wrote:
> I removed the socket sizes in an attempt to reproduce your results, Rick,
> and I managed to do so, but only when I launch netperf by typing the
> following command into the bash shell.
> 
> /home/cheka/netperf-2.4.4/src/netperf -T 0,0 -l 10 -t TCP_STREAM -c 100 -C 100 -f M -P 0 -- -m 523
> 
> As soon as I try to launch netperf (with the same command line as I use
> manually) from within a script of any form (be it PHP or bash), the
> difference between 523 and 524 appears again.
> 
> The PHP script I'm using is pasted below (it's the same as the bash
> script that comes with netperf to drive the TCP_STREAM test).

Well, bash on some platforms I guess - when those netperf scripts were first 
written, I'm not even sure bash was a gleam in its author's eye :)

> <?php
>         $START = 522;
>         $END = 524;
>         $MULT = 1;
>         $ADD = 1;
>         // Highest CPU number available, so we can bind the client to
>         // different CPUs and show that the difference between 523 and
>         // 524 does not occur unless both ends are on CPU 0.
>         $MAXPROC = 1;
> 
>         $DURATION = 10;          // Length of each test
>         $LOC_CPU = "-c 100";     // Report the local CPU utilization
>         $REM_CPU = "-C 100";     // Report the remote CPU utilization
> 
>         $NETSERVER = "netserver"; // path to netserver
>         $NETPERF = "netperf";     // path to netperf
> 
>         for ($i = 0; $i <= $MAXPROC; $i++) {
>                 echo "0,$i\n";
>                 $MESSAGE = $START;
> 
>                 while ($MESSAGE <= $END) {
>                         // tried it with and without these restarts of netserver
>                         passthru('killall netserver > /dev/null');
>                         passthru('sleep 5');
>                         passthru("$NETSERVER");
>                         passthru('sleep 5');
>                         // show what we are about to run, then run it
>                         // (passthru also prints its output to the screen)
>                         echo "$NETPERF -T 0,$i -l $DURATION -t TCP_STREAM $LOC_CPU $REM_CPU -f M -P 0 -- -m $MESSAGE\n";
>                         passthru("$NETPERF -T 0,$i -l $DURATION -t TCP_STREAM $LOC_CPU $REM_CPU -f M -P 0 -- -m $MESSAGE");
>                         passthru('sleep 5');
>                         $MESSAGE += $ADD;
>                         $MESSAGE *= $MULT;
>                 }
>         }
> ?>

While I wouldn't know broken PHP if it reared up and bit me on the backside, the 
above looks like a fairly straightforward translation of some of the old netperf 
scripts, with the add and mult bits.  I don't see anything amiss there.

While it is often a case of famous last words, I've dropped netperf-talk from 
this, as I don't think there is a netperf issue, just an issue demonstrated with 
netperf.  Besides, netperf-talk, being a closed list (my simplistic attempt to 
deal with spam), would cause problems for most readers of netdev when/if they 
were to contribute to the thread...

> 
> 
> On 19/10/2007, Bill Fink <billfink@mindspring.com> wrote:
>>I don't know if it's relevant, but note that 524 bytes + 52 bytes
>>of IP(20)/TCP(20)/TimeStamp(12) overhead gives a 576 byte packet,
>>which is the specified size that all IP routers must handle (and
>>the smallest value possible during PMTU discovery I believe).  A
>>message size of 523 bytes would be 1 less than that.  Could this
>>possibly have to do with ABC (possibly try disabling it if set)?

ABC might be good to check.  It might also be worthwhile to try setting the 
lowlatency sysctl (net.ipv4.tcp_low_latency) - both processes being on the same 
CPU might interact poorly with the attempts to run things on the receiver's 
stack.

rick jones

I guess I've not managed to lose the race to a packet trace... :)


