From: Rick Jones <rick.jones2@hp.com>
To: Simon Horman <horms@verge.net.au>
Cc: netdev@vger.kernel.org
Subject: Re: Bonding, GRO and tcp_reordering
Date: Tue, 30 Nov 2010 09:56:02 -0800
Message-ID: <4CF53AB2.60209@hp.com>
In-Reply-To: <20101130135549.GA22688@verge.net.au>
Simon Horman wrote:
> Hi,
>
> I just wanted to share what is a rather pleasing,
> though to me somewhat surprising result.
>
> I am testing bonding using balance-rr mode with three physical links to try
> to get > gigabit speed for a single stream. Why? Because I'd like to run
> various tests at > gigabit speed and I don't have any 10G hardware at my
> disposal.
>
> The result I have is that with a 1500 byte MTU, tcp_reordering=3 and both
> LSO and GSO disabled on both the sender and receiver I see:
>
> # netperf -c -4 -t TCP_STREAM -H 172.17.60.216 -- -m 1472
Why 1472 bytes per send? If you wanted a 1-1 correspondence between the send
size and the MSS, I would guess that 1448 would have been in order (1500 minus
20 bytes of IPv4 header, 20 bytes of TCP header and 12 bytes of TCP timestamp
option). 1472 would be the maximum data payload for a UDP/IPv4 datagram; TCP
carries more header than UDP.
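The arithmetic behind those two numbers, as a small Python sketch. It assumes
IPv4 without IP options and, on the TCP side, that the timestamp option (12
bytes once padded) is the only option carried on data segments, which is the
usual case on Linux with net.ipv4.tcp_timestamps left at its default of 1.

```python
# Header sizes in bytes (assumed: IPv4 without options).
IPV4_HEADER = 20
TCP_HEADER = 20        # base TCP header, no options
TCP_TIMESTAMPS = 12    # timestamp option (10 bytes) padded to 12
UDP_HEADER = 8


def tcp_payload(mtu, timestamps=True):
    """Largest TCP payload per IPv4 packet for the given MTU."""
    overhead = IPV4_HEADER + TCP_HEADER + (TCP_TIMESTAMPS if timestamps else 0)
    return mtu - overhead


def udp_payload(mtu):
    """Largest UDP/IPv4 payload that fits in one unfragmented datagram."""
    return mtu - IPV4_HEADER - UDP_HEADER


if __name__ == "__main__":
    print("TCP, timestamps on :", tcp_payload(1500))
    print("TCP, timestamps off:", tcp_payload(1500, timestamps=False))
    print("UDP                :", udp_payload(1500))
```

The same arithmetic gives 8948 bytes of TCP payload per segment for a
9000-byte MTU.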
> TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 172.17.60.216
> (172.17.60.216) port 0 AF_INET
> Recv Send Send Utilization Service Demand
> Socket Socket Message Elapsed Send Recv Send Recv
> Size Size Size Time Throughput local remote local remote
> bytes bytes bytes secs. 10^6bits/s % S % U us/KB us/KB
>
> 87380 16384 1472 10.01 1646.13 40.01 -1.00 3.982 -1.000
>
> But with GRO enabled on the receiver I see.
>
> # netperf -c -4 -t TCP_STREAM -H 172.17.60.216 -- -m 1472
> TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 172.17.60.216
> (172.17.60.216) port 0 AF_INET
> Recv Send Send Utilization Service Demand
> Socket Socket Message Elapsed Send Recv Send Recv
> Size Size Size Time Throughput local remote local remote
> bytes bytes bytes secs. 10^6bits/s % S % U us/KB us/KB
>
> 87380 16384 1472 10.01 2613.83 19.32 -1.00 1.211 -1.000
If you are changing things on the receiver, you should probably enable remote
CPU utilization measurement with the -C option.
> Which is much better than any result I get tweaking tcp_reordering when
> GRO is disabled on the receiver.
>
> Tweaking tcp_reordering when GRO is enabled on the receiver seems to have
> negligible effect. Which is interesting, because my brief reading on the
> subject indicated that tcp_reordering was the key tuning parameter for
> bonding with balance-rr.
You are in a maze of twisty heuristics and algorithms, all interacting :) If
there are only three links in the bond, I suspect the chances of a spurious
fast retransmission are somewhat smaller than if you had, say, four. That is
based on nothing more than hand-waving: three duplicate ACKs require the
receipt of perhaps four out-of-order segments.
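To make that hand-waving slightly more concrete, here is a toy Python model,
not Linux's actual recovery logic. Assumptions: the receiver sends one
cumulative ACK per arriving segment, each segment landing above a hole yields
one duplicate ACK, and the sender fast-retransmits once the run of duplicates
reaches a fixed threshold standing in for tcp_reordering (the real stack can
raise its reordering estimate as it observes reordering). The helper
round_robin_arrivals is hypothetical: it assumes link 0 is just slow enough
that, in each round of striping, its segment lands after the other links'.

```python
def max_dupacks(arrivals):
    """Longest run of duplicate ACKs a cumulative-ACK receiver would send
    for the given arrival order of segment numbers (0-based)."""
    rcv_nxt = 0        # next in-order segment the receiver expects
    buffered = set()   # out-of-order segments held for later
    run = worst = 0
    for seg in arrivals:
        if seg == rcv_nxt:
            # Fills the hole: advance past anything already buffered.
            rcv_nxt += 1
            while rcv_nxt in buffered:
                buffered.remove(rcv_nxt)
                rcv_nxt += 1
            run = 0    # a new cumulative ACK ends the duplicate run
        elif seg > rcv_nxt:
            buffered.add(seg)
            run += 1   # ACK for rcv_nxt is repeated: a duplicate ACK
            worst = max(worst, run)
        # seg < rcv_nxt (an old duplicate) is ignored in this sketch.
    return worst


def spurious_fast_retransmit(arrivals, tcp_reordering=3):
    """True if the toy sender would fast-retransmit on pure reordering."""
    return max_dupacks(arrivals) >= tcp_reordering


def round_robin_arrivals(n_segments, n_links):
    """Hypothetical arrival order: segments striped round-robin over
    n_links, with link 0's segment arriving last within each round."""
    order = []
    for start in range(0, n_segments, n_links):
        rnd = list(range(start, min(start + n_links, n_segments)))
        order.extend(rnd[1:] + rnd[:1])
    return order


if __name__ == "__main__":
    for links in (3, 4):
        arrivals = round_robin_arrivals(12, links)
        print(links, "links:", max_dupacks(arrivals), "dupacks,",
              "spurious retransmit:", spurious_fast_retransmit(arrivals))
```

Under these assumptions three links produce at most two duplicate ACKs per
round, while four links reach three and trip the default threshold. Real
arrival patterns on a bond will of course be messier.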
> The only other parameter that seemed to have significant effect was to
> increase the mtu. In the case of MTU=9000, GRO seemed to have a negative
> impact on throughput, though a significant positive effect on CPU
> utilisation.
>
> MTU=9000, sender,receiver:tcp_reordering=3(default), receiver:GRO=off
> netperf -c -4 -t TCP_STREAM -H 172.17.60.216 -- -m 9872
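9872? With a 9000-byte MTU the MSS would be 8948 bytes (with TCP timestamps),
so a 9872-byte send will not map one-to-one onto segments.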
> Recv Send Send Utilization Service Demand
> Socket Socket Message Elapsed Send Recv Send Recv
> Size Size Size Time Throughput local remote local remote
> bytes bytes bytes secs. 10^6bits/s % S % U us/KB us/KB
>
> 87380 16384 9872 10.01 2957.52 14.89 -1.00 0.825 -1.000
>
> MTU=9000, sender,receiver:tcp_reordering=3(default), receiver:GRO=on
> netperf -c -4 -t TCP_STREAM -H 172.17.60.216 -- -m 9872
> Recv Send Send Utilization Service Demand
> Socket Socket Message Elapsed Send Recv Send Recv
> Size Size Size Time Throughput local remote local remote
> bytes bytes bytes secs. 10^6bits/s % S % U us/KB us/KB
>
> 87380 16384 9872 10.01 2847.64 10.84 -1.00 0.624 -1.000
Short of packet traces, taking snapshots of netstat statistics before and after
each netperf run might be goodness - you can look at things like ratio of ACKs
to data segments/bytes and such. LRO/GRO can have a non-trivial effect on the
number of ACKs, and ACKs are what matter for fast retransmit.
netstat -s > before
netperf ...
netstat -s > after
beforeafter before after > delta
where beforeafter, which will subtract before from after, comes from
ftp://ftp.cup.hp.com/dist/networking/tools/ (for now; the site will have to go
away before long, as the campus on which it is located has been sold).
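In case that site is unreachable, here is a rough Python stand-in for the
subtraction step. It is a sketch, not the beforeafter tool itself, and it
assumes the Linux netstat -s layout: unindented section headers such as "Tcp:",
and indented counters written either as "12345 segments received" or as
"Name: value" for counters netstat has no description for. Lines whose number
sits mid-sentence are skipped. The script name netstat_delta.py is
hypothetical.

```python
import re
import sys

# Assumed counter line shapes in `netstat -s` output:
#   "    12345 segments received"   (value first)
#   "    SomeCounter: 678"          (name first)
VALUE_FIRST = re.compile(r"^\s+(\d+)\s+(.+?)\s*$")
NAME_FIRST = re.compile(r"^\s+(.+?):\s+(\d+)\s*$")


def parse_netstat_s(text):
    """Map (section, counter description) -> integer value."""
    counters = {}
    section = ""
    for line in text.splitlines():
        if not line.strip():
            continue
        if not line[0].isspace():
            # Unindented lines such as "Tcp:" or "TcpExt:" start a section.
            section = line.strip().rstrip(":")
            continue
        m = VALUE_FIRST.match(line)
        if m:
            counters[(section, m.group(2))] = int(m.group(1))
            continue
        m = NAME_FIRST.match(line)
        if m:
            counters[(section, m.group(1))] = int(m.group(2))
        # Lines with the number mid-sentence are skipped in this sketch.
    return counters


def delta(before_text, after_text):
    """Subtract the 'before' snapshot from the 'after' snapshot."""
    before = parse_netstat_s(before_text)
    after = parse_netstat_s(after_text)
    return {key: value - before.get(key, 0) for key, value in after.items()}


if __name__ == "__main__" and len(sys.argv) == 3:
    with open(sys.argv[1]) as f_before, open(sys.argv[2]) as f_after:
        changes = delta(f_before.read(), f_after.read())
    for (section, name), value in sorted(changes.items()):
        if value:
            print(f"{section}: {value} {name}")
```

Used as: python netstat_delta.py before after > delta, after taking the two
snapshots as above.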
happy benchmarking,
rick jones