From: Rick Jones <rick.jones2@hp.com>
To: Matthew Faulkner <matthew.faulkner@gmail.com>
Cc: netdev@vger.kernel.org
Subject: Re: Throughput Bug?
Date: Thu, 18 Oct 2007 10:11:04 -0700
Message-ID: <471793A8.20205@hp.com>
In-Reply-To: <c565abbb0710180854j6f2f756sdd390161bafd1c4a@mail.gmail.com>

Matthew Faulkner wrote:
> Hey all
> 
> I'm using netperf to perform TCP throughput tests via the localhost
> interface. This is being done on an SMP machine. I'm forcing the
> netperf server and client to run on the same core. However, for
> packet sizes of 523 bytes or below the throughput is much lower than
> for sizes of 524 bytes or above.
> 
> Recv   Send    Send                          Utilization       Service Demand
> Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
> Size   Size    Size     Time     Throughput  local    remote   local   remote
> bytes  bytes   bytes    secs.    MBytes  /s  % S      % S      us/KB   us/KB
>  65536  65536    523    30.01        81.49   50.00    50.00    11.984  11.984
>  65536  65536    524    30.01       460.61   49.99    49.99    2.120   2.120
> 
> The chances are I'm being stupid and there is an obvious reason for
> this, but when I put the server and client on different cores I don't
> see this effect.
> 
> Any help explaining this will be greatly appreciated.

One minor nit, but perhaps one that may help in the diagnosis: unless you set
-D (the lack of the full test banner, or a copy of the command line, precludes
knowing), and perhaps even then, all the -m option _really_ does for a
TCP_STREAM test is set the size of the buffer passed to the transport on each
send() call.  It is then entirely up to TCP how that gets merged/sliced/diced
into TCP segments.
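
For example, something along these lines (untested on my end just now, but
those are all stock netperf options) would pin both ends to core 0, make
523-byte send() calls, and set TCP_NODELAY on the data connection:

  netperf -H 127.0.0.1 -T 0 -l 30 -c -C -- -m 523 -D

Without the test-specific -D, TCP remains free to coalesce those 523-byte
sends into larger segments.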

I forget what the MTU of loopback is, but you can get netperf to report the
MSS for the connection by setting verbosity to 2 or more with the global -v
option.
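
Something like the following should show both (the exact output format will
vary a bit with the netperf version; ifconfig works just as well as ip):

  ip link show lo                      # reports the loopback MTU
  netperf -v 2 -T 0 -c -C -- -m 523    # verbosity 2 includes the MSS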

A packet trace might be interesting; that seems to be possible under Linux
with tcpdump.  If it were not, another netperf-level thing I might do is
configure with --enable-histogram, recompile netperf (netserver does not need
to be recompiled, although rebuilding it doesn't take much longer once netperf
is recompiled), and use -v 2 again.  That will give you a histogram of the
time spent in the send() call, which might be interesting if it ever blocks.
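
Roughly, and assuming the defaults (12865 is netperf's default control port,
so the filter below skips the control connection; the snaplen and filename are
just for illustration):

  tcpdump -i lo -s 128 -w loopback.pcap tcp and not port 12865

and for the histogram build:

  cd netperf2_trunk && ./configure --enable-histogram && make
  src/netperf -v 2 -T 0 -c -C -- -m 523    # path to the rebuilt binary may differ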


> Machine details:
> 
> Linux 2.6.22-2-amd64 #1 SMP Thu Aug 30 23:43:59 UTC 2007 x86_64 GNU/Linux

FWIW, with an "earlier" kernel I am not sure I can name, since I'm not sure it
is shipping (sorry, it was just what was on my system at the moment), I don't
see that _big_ a difference between 523 and 524 regardless of TCP_NODELAY:

[root@hpcpc105 netperf2_trunk]# netperf -T 0 -c -C -- -m 524
TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to localhost.localdomain 
(127.0.0.1) port 0 AF_INET : cpu bind
Recv   Send    Send                          Utilization       Service Demand
Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
Size   Size    Size     Time     Throughput  local    remote   local   remote
bytes  bytes   bytes    secs.    10^6bits/s  % S      % S      us/KB   us/KB

  87380  87380    524    10.00      2264.18   25.00    25.00    3.618   3.618
[root@hpcpc105 netperf2_trunk]# netperf -T 0 -c -C -- -m 523
TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to localhost.localdomain 
(127.0.0.1) port 0 AF_INET : cpu bind
Recv   Send    Send                          Utilization       Service Demand
Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
Size   Size    Size     Time     Throughput  local    remote   local   remote
bytes  bytes   bytes    secs.    10^6bits/s  % S      % S      us/KB   us/KB

  87380  87380    523    10.00      3356.05   25.01    25.01    2.442   2.442


[root@hpcpc105 netperf2_trunk]# netperf -T 0 -c -C -- -m 523 -D
TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to localhost.localdomain 
(127.0.0.1) port 0 AF_INET : nodelay : cpu bind
Recv   Send    Send                          Utilization       Service Demand
Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
Size   Size    Size     Time     Throughput  local    remote   local   remote
bytes  bytes   bytes    secs.    10^6bits/s  % S      % S      us/KB   us/KB

  87380  87380    523    10.00       398.87   25.00    25.00    20.539  20.537
[root@hpcpc105 netperf2_trunk]# netperf -T 0 -c -C -- -m 524 -D
TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to localhost.localdomain 
(127.0.0.1) port 0 AF_INET : nodelay : cpu bind
Recv   Send    Send                          Utilization       Service Demand
Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
Size   Size    Size     Time     Throughput  local    remote   local   remote
bytes  bytes   bytes    secs.    10^6bits/s  % S      % S      us/KB   us/KB

  87380  87380    524    10.00       439.33   25.00    25.00    18.646  18.644

Although, if I do constrain the socket buffers to 64KB I _do_ see the behaviour 
on the older kernel as well:

[root@hpcpc105 netperf2_trunk]# netperf -T 0 -c -C -- -m 523 -s 64K -S 64K
TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to localhost.localdomain 
(127.0.0.1) port 0 AF_INET : cpu bind
Recv   Send    Send                          Utilization       Service Demand
Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
Size   Size    Size     Time     Throughput  local    remote   local   remote
bytes  bytes   bytes    secs.    10^6bits/s  % S      % S      us/KB   us/KB

131072 131072    523    10.00       406.61   25.00    25.00    20.146  20.145
[root@hpcpc105 netperf2_trunk]# netperf -T 0 -c -C -- -m 524 -s 64K -S 64K
TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to localhost.localdomain 
(127.0.0.1) port 0 AF_INET : cpu bind
Recv   Send    Send                          Utilization       Service Demand
Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
Size   Size    Size     Time     Throughput  local    remote   local   remote
bytes  bytes   bytes    secs.    10^6bits/s  % S      % S      us/KB   us/KB

131072 131072    524    10.00      2017.12   25.02    25.03    4.065   4.066


(yes, this is a four-core system, hence 25% CPU util reported by netperf).

> sched_affinity is used by netperf internally to set the core affinity.
> 
> I tried this on 2.6.18 and i got the same problem!

I can say that the kernel I tried was based on 2.6.18...  So, due diligence
and no good deed going unpunished suggest that Matthew and I are now in a race
to take some tcpdump traces :)

rick jones


