Re: Bonding, GRO and tcp_reordering

netdev.vger.kernel.org archive mirror

From: Simon Horman <horms@verge.net.au>
To: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Ben Hutchings <bhutchings@solarflare.com>, netdev@vger.kernel.org
Subject: Re: Bonding, GRO and tcp_reordering
Date: Fri, 3 Dec 2010 22:38:00 +0900
Message-ID: <20101203133800.GA26038@verge.net.au>
In-Reply-To: <20101201043445.GC3485@verge.net.au>

On Wed, Dec 01, 2010 at 01:34:45PM +0900, Simon Horman wrote:
> On Tue, Nov 30, 2010 at 05:04:33PM +0100, Eric Dumazet wrote:
> > On Tue, 30 Nov 2010 at 15:42 +0000, Ben Hutchings wrote:
> > > On Tue, 2010-11-30 at 22:55 +0900, Simon Horman wrote:

To clarify my statement in a previous email that GSO had no effect: I
re-ran the tests and I still haven't observed any effect of GSO on my
results. However, I did notice that in order for GRO on the server to have
an effect I also need TSO enabled on the client.  I thought that I had
previously checked that but I was mistaken.

Enabling TSO on the client while leaving GSO disabled on the server
resulted in increased CPU utilisation on the client, from ~15% to ~20%.
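For anyone reproducing these runs, the client/server settings listed in the result headers map onto standard ethtool, sysctl and ip commands. This is only a sketch, not the script used for these tests: the interface name eth0, the DRY_RUN switch and the client_setup/server_setup helper names are all hypothetical, and by default it only prints the commands.

```shell
#!/bin/sh
# Hypothetical helper: print (and optionally run) the commands behind one
# test configuration. IFACE defaults to eth0 (an assumption). Set DRY_RUN=0
# and run as root to actually apply the settings.
IFACE=${IFACE:-eth0}
DRY_RUN=${DRY_RUN:-1}

run() {
    echo "+ $*"
    # Only execute when DRY_RUN is explicitly disabled.
    if [ "$DRY_RUN" = 0 ]; then
        "$@"
    fi
}

# Client side: MTU 1500, TSO on, GSO off, default tcp_reordering.
client_setup() {
    run ip link set dev "$IFACE" mtu 1500
    run ethtool -K "$IFACE" tso on gso off
    run sysctl -w net.ipv4.tcp_reordering=3
}

# Server side: GRO on or off ($1) and an rx-usecs coalescing value ($2).
server_setup() {
    run ip link set dev "$IFACE" mtu 1500
    run ethtool -K "$IFACE" gro "$1"
    run ethtool -C "$IFACE" rx-usecs "$2"
    run sysctl -w net.ipv4.tcp_reordering=3
}
```

For example, `server_setup on 45` on the server plus `client_setup` on the client corresponds to the last configuration in this message; `ethtool -k eth0` can then confirm which offloads actually took effect.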

> > > > The only other parameter that seemed to have a significant effect was
> > > > increasing the MTU.  In the case of MTU=9000, GRO seemed to have a negative
> > > > impact on throughput, though a significant positive effect on CPU
> > > > utilisation.
> > > [...]
> > > 
> > > Increasing MTU also increases the interval between packets on a TCP flow
> > > using maximum segment size so that it is more likely to exceed the
> > > difference in delay.
> > > 
> > 
> > GRO is only really effective _if_ we receive several packets for the
> > same flow in the same NAPI run.
> > 
> > As soon as we exit NAPI mode, GRO packets are flushed.
> > 
> > Big MTU --> bigger delays between packets, so there is a good chance that
> > GRO cannot trigger at all, since NAPI runs for one packet only.
> > 
> > One possibility with big MTU is to tweak "ethtool -c eth0" params
> > rx-usecs: 20
> > rx-frames: 5
> > rx-usecs-irq: 0
> > rx-frames-irq: 5
> > so that "rx-usecs" is bigger than the delay between two full-sized MTU
> > packets.
> > 
> > Gigabit speed means 1 nanosecond per bit, and MTU=9000 means 72 us
> > delay between packets.
> > 
> > So try :
> > 
> > ethtool -C eth0 rx-usecs 100
> > 
> > to give the NIC a chance to deliver several packets at once.
> > 
> > Unfortunately, this also adds some latency, so it helps bulk transfers
> > and slows down interactive traffic.
> 
> Thanks Eric,
> 
> I was tweaking those values recently for some latency tuning
> but I didn't think of them in relation to last night's tests.
> 
> In terms of my measurements, it's just benchmarking at this stage.
> So a trade-off between throughput and latency is acceptable, so long
> as I remember to measure what it is.
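Eric's back-of-the-envelope figure can be checked with shell arithmetic. This is a sketch: pkt_time_ns and rx_usecs_from_ethtool are hypothetical helper names, a single gigabit link is assumed, and the optional 38-byte per-frame overhead (preamble/SFD, Ethernet header, FCS and inter-frame gap on an untagged link) is an addition that the quoted estimate ignores.

```shell
#!/bin/sh
# Serialisation time of one full-sized packet, in nanoseconds.
# At 1000 Mbit/s one bit takes 1 ns, so bits * 1000 / Mbit/s = ns.
# Usage: pkt_time_ns MTU_BYTES LINK_MBPS [OVERHEAD_BYTES]
pkt_time_ns() {
    mtu=$1
    mbps=$2
    overhead=${3:-0}   # e.g. 38 bytes of Ethernet framing; 0 matches the estimate above
    echo $(( (mtu + overhead) * 8 * 1000 / mbps ))
}

# Read the current rx-usecs value from `ethtool -c IFACE` output on stdin,
# so it can be restored after a benchmark run. Matching the whole first
# field avoids picking up rx-usecs-irq, rx-usecs-low, etc.
rx_usecs_from_ethtool() {
    awk '$1 == "rx-usecs:" { print $2 }'
}
```

With these helpers, `pkt_time_ns 9000 1000` gives 72000 ns (72 us) and `pkt_time_ns 1500 1000` gives 12000 ns, so at gigabit line rate the 15 and 45 us values tried below span roughly one to four 1500-byte packet times. A run can save and restore the setting with `old=$(ethtool -c eth0 | rx_usecs_from_ethtool)` before it and `ethtool -C eth0 rx-usecs "$old"` after it.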

Thanks, rx-usecs was set to 3, and changing it to 15 on the server
did seem to increase throughput with 1500-byte packets, although
CPU utilisation increased too, disproportionately so on the client.

MTU=1500, client,server:tcp_reordering=3, client:GSO=off,
	client:TSO=on, server:GRO=off, server:rx-usecs=3(default)
# netperf -c -4 -t TCP_STREAM -H 172.17.60.216
TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 172.17.60.216 (172.17.60.216) port 0 AF_INET
Recv   Send    Send                          Utilization       Service Demand
Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
Size   Size    Size     Time     Throughput  local    remote   local   remote
bytes  bytes   bytes    secs.    10^6bits/s  % S      % U      us/KB   us/KB

 87380  16384  16384    10.00      1591.34   16.35    5.80     1.683   2.390

MTU=1500, client,server:tcp_reordering=3(default), client:GSO=off,
	client:TSO=on, server:GRO=off, server:rx-usecs=15
# netperf -c -4 -t TCP_STREAM -H 172.17.60.216
TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 172.17.60.216 (172.17.60.216) port 0 AF_INET
Recv   Send    Send                          Utilization       Service Demand
Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
Size   Size    Size     Time     Throughput  local    remote   local   remote
bytes  bytes   bytes    secs.    10^6bits/s  % S      % U      us/KB   us/KB

 87380  16384  16384    10.00      1774.38   23.75    7.58     2.193   2.801
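To put the two GRO=off runs side by side, a small awk helper (pct_change is a hypothetical name) computes the relative change in each column between rx-usecs=3 and rx-usecs=15.

```shell
#!/bin/sh
# Percentage change from an old to a new netperf figure, one decimal place.
# Usage: pct_change OLD NEW
pct_change() {
    awk -v old="$1" -v new="$2" 'BEGIN { printf "%.1f%%\n", (new - old) / old * 100 }'
}

# MTU=1500, GRO=off: rx-usecs=3 versus rx-usecs=15 (figures from the two runs above).
echo "throughput: $(pct_change 1591.34 1774.38)"
echo "client CPU: $(pct_change 16.35 23.75)"
echo "server CPU: $(pct_change 5.80 7.58)"
```

On these figures it reports 11.5% more throughput for 45.3% more client (local) CPU and 30.7% more server (remote) CPU.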

I also saw an improvement with GRO enabled on the server and TSO enabled on
the client, although in this case I found rx-usecs=45 to be the best
value.

MTU=1500, client,server:tcp_reordering=3(default), client:GSO=off,
	client:TSO=on, server:GRO=on, server:rx-usecs=3(default)
# netperf -c -4 -t TCP_STREAM -H 172.17.60.216
TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 172.17.60.216 (172.17.60.216) port 0 AF_INET
Recv   Send    Send                          Utilization       Service Demand
Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
Size   Size    Size     Time     Throughput  local    remote   local   remote
bytes  bytes   bytes    secs.    10^6bits/s  % S      % U      us/KB   us/KB

 87380  16384  16384    10.00      2553.27   13.31    3.35     0.854   0.860

MTU=1500, client,server:tcp_reordering=3(default), client:GSO=off,
	client:TSO=on, server:GRO=on, server:rx-usecs=45
# netperf -c -4 -t TCP_STREAM -H 172.17.60.216
TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 172.17.60.216 (172.17.60.216) port 0 AF_INET
Recv   Send    Send                          Utilization       Service Demand
Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
Size   Size    Size     Time     Throughput  local    remote   local   remote
bytes  bytes   bytes    secs.    10^6bits/s  % S      % U      us/KB   us/KB

 87380  16384  16384    10.00      2727.53   29.45    9.48     1.769   2.278
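The service-demand columns fold CPU utilisation and throughput into CPU time per KB transferred. The sketch below assumes netperf's usual definition (utilisation times CPU count, divided by throughput in KB/s, with KB = 1024 bytes); the CPU count is not stated in this thread, and service_demand is a hypothetical helper.

```shell
#!/bin/sh
# Service demand in us of CPU per KB, assuming:
#   SD = (util% / 100) * ncpus * 1e6 us/s / (throughput in KB/s)
# Throughput is given in 10^6 bits/s, as in netperf's output.
# Usage: service_demand UTIL_PERCENT NCPUS THROUGHPUT_MBITS
service_demand() {
    awk -v util="$1" -v ncpu="$2" -v mbits="$3" 'BEGIN {
        kb_per_s = mbits * 1e6 / 8 / 1024   # bytes/s -> KB/s (1 KB = 1024 bytes)
        printf "%.3f\n", (util / 100) * ncpu * 1e6 / kb_per_s
    }'
}
```

Plugging in a CPU count of 2 reproduces the local (client) figures reported above, e.g. `service_demand 16.35 2 1591.34` gives 1.683 and `service_demand 29.45 2 2727.53` gives 1.769. On that measure the GRO=on, rx-usecs=45 run used about twice the client CPU per KB of the GRO=on, rx-usecs=3 run (1.769 vs 0.854 us/KB).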

I did not observe any improvement in throughput when increasing rx-usecs
from 3 with MTU=9000, although there was perhaps a slight increase in CPU
utilisation (there is quite a lot of noise in the results).


Thread overview: 12+ messages
2010-11-30 13:55 Bonding, GRO and tcp_reordering Simon Horman
2010-11-30 15:42 ` Ben Hutchings
2010-11-30 16:04   ` Eric Dumazet
2010-12-01  4:34     ` Simon Horman
2010-12-01  4:47       ` Eric Dumazet
2010-12-02  6:39         ` Simon Horman
2010-12-03 13:38       ` Simon Horman [this message]
2010-12-01  4:31   ` Simon Horman
2010-11-30 17:56 ` Rick Jones
2010-11-30 18:14   ` Eric Dumazet
2010-12-01  4:30   ` Simon Horman
2010-12-01 19:42     ` Rick Jones
