* Bonding, GRO and tcp_reordering
From: Simon Horman @ 2010-11-30 13:55 UTC
To: netdev

Hi,

I just wanted to share what is a rather pleasing, though to me somewhat
surprising, result.

I am testing bonding in balance-rr mode with three physical links to try to
get > gigabit speed for a single stream. Why? Because I'd like to run various
tests at > gigabit speed and I don't have any 10G hardware at my disposal.

The result I have is that with a 1500 byte MTU, tcp_reordering=3 and both LSO
and GSO disabled on both the sender and receiver I see:

# netperf -c -4 -t TCP_STREAM -H 172.17.60.216 -- -m 1472
TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 172.17.60.216 (172.17.60.216) port 0 AF_INET
Recv   Send    Send                          Utilization       Service Demand
Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
Size   Size    Size     Time     Throughput  local    remote   local   remote
bytes  bytes   bytes    secs.    10^6bits/s  % S      % U      us/KB   us/KB

 87380  16384   1472    10.01      1646.13   40.01    -1.00    3.982   -1.000

But with GRO enabled on the receiver I see:

# netperf -c -4 -t TCP_STREAM -H 172.17.60.216 -- -m 1472
TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 172.17.60.216 (172.17.60.216) port 0 AF_INET
Recv   Send    Send                          Utilization       Service Demand
Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
Size   Size    Size     Time     Throughput  local    remote   local   remote
bytes  bytes   bytes    secs.    10^6bits/s  % S      % U      us/KB   us/KB

 87380  16384   1472    10.01      2613.83   19.32    -1.00    1.211   -1.000

which is much better than any result I get by tweaking tcp_reordering when
GRO is disabled on the receiver.

Tweaking tcp_reordering when GRO is enabled on the receiver seems to have a
negligible effect. Which is interesting, because my brief reading on the
subject indicated that tcp_reordering was the key tuning parameter for
bonding with balance-rr.

The only other parameter that seemed to have a significant effect was
increasing the MTU. In the case of MTU=9000, GRO seemed to have a negative
impact on throughput, though a significant positive effect on CPU
utilisation.

MTU=9000, sender,receiver:tcp_reordering=3 (default), receiver:GRO=off
netperf -c -4 -t TCP_STREAM -H 172.17.60.216 -- -m 9872
Recv   Send    Send                          Utilization       Service Demand
Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
Size   Size    Size     Time     Throughput  local    remote   local   remote
bytes  bytes   bytes    secs.    10^6bits/s  % S      % U      us/KB   us/KB

 87380  16384   9872    10.01      2957.52   14.89    -1.00    0.825   -1.000

MTU=9000, sender,receiver:tcp_reordering=3 (default), receiver:GRO=on
netperf -c -4 -t TCP_STREAM -H 172.17.60.216 -- -m 9872
Recv   Send    Send                          Utilization       Service Demand
Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
Size   Size    Size     Time     Throughput  local    remote   local   remote
bytes  bytes   bytes    secs.    10^6bits/s  % S      % U      us/KB   us/KB

 87380  16384   9872    10.01      2847.64   10.84    -1.00    0.624   -1.000

Tests were run using 2.6.37-rc1.
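For anyone trying to reproduce a setup like the one above, here is a minimal
sketch of the bonding and offload configuration being described. The interface
names, the sysfs-based enslaving and the address assignment are assumptions for
illustration; only the balance-rr mode, the offload toggles, tcp_reordering and
the MTU come from the message itself.

    # Assumed three-slave round-robin bond; eth0-eth2 are illustrative names.
    modprobe bonding mode=balance-rr miimon=100
    ip link set bond0 up
    for slave in eth0 eth1 eth2; do
        ip link set "$slave" down
        echo "+$slave" > /sys/class/net/bond0/bonding/slaves
    done
    ip addr add 172.17.60.216/24 dev bond0   # receiver-side address used in the tests

    # Offload toggles exercised in the thread (receiver shown; sender analogous).
    for dev in bond0 eth0 eth1 eth2; do
        ethtool -K "$dev" tso off gso off gro off   # flip gro on for the second run
    done

    # TCP reordering tolerance (3 is the default) and jumbo frames for the 9000-byte runs.
    sysctl -w net.ipv4.tcp_reordering=3
    ip link set bond0 mtu 9000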
* Re: Bonding, GRO and tcp_reordering
From: Ben Hutchings @ 2010-11-30 15:42 UTC
To: Simon Horman; Cc: netdev

On Tue, 2010-11-30 at 22:55 +0900, Simon Horman wrote:
> Hi,
>
> I just wanted to share what is a rather pleasing,
> though to me somewhat surprising result.
>
> I am testing bonding using balance-rr mode with three physical links to try
> to get > gigabit speed for a single stream. Why? Because I'd like to run
> various tests at > gigabit speed and I don't have any 10G hardware at my
> disposal.
>
> The result I have is that with a 1500 byte MTU, tcp_reordering=3 and both
> LSO and GSO disabled on both the sender and receiver I see:
>
> # netperf -c -4 -t TCP_STREAM -H 172.17.60.216 -- -m 1472
> [...]
>  87380  16384   1472    10.01      1646.13   40.01    -1.00    3.982   -1.000
>
> But with GRO enabled on the receiver I see:
>
> # netperf -c -4 -t TCP_STREAM -H 172.17.60.216 -- -m 1472
> [...]
>  87380  16384   1472    10.01      2613.83   19.32    -1.00    1.211   -1.000
>
> which is much better than any result I get by tweaking tcp_reordering when
> GRO is disabled on the receiver.

Did you also enable TSO/GSO on the sender?

What TSO/GSO will do is to change the round-robin scheduling from one packet
per interface to one super-packet per interface. GRO then coalesces the
physical packets back into a super-packet. The intervals between receiving
super-packets then tend to exceed the difference in delay between interfaces,
hiding the reordering.

If you only enabled GRO then I don't understand this.

> Tweaking tcp_reordering when GRO is enabled on the receiver seems to have a
> negligible effect. Which is interesting, because my brief reading on the
> subject indicated that tcp_reordering was the key tuning parameter for
> bonding with balance-rr.
>
> The only other parameter that seemed to have a significant effect was
> increasing the MTU. In the case of MTU=9000, GRO seemed to have a negative
> impact on throughput, though a significant positive effect on CPU
> utilisation.
[...]

Increasing the MTU also increases the interval between packets on a TCP flow
using maximum segment size, so that it is more likely to exceed the
difference in delay.

Ben.

--
Ben Hutchings, Senior Software Engineer, Solarflare Communications
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.
* Re: Bonding, GRO and tcp_reordering
From: Eric Dumazet @ 2010-11-30 16:04 UTC
To: Ben Hutchings; Cc: Simon Horman, netdev

On Tue, 30 Nov 2010 at 15:42 +0000, Ben Hutchings wrote:
> On Tue, 2010-11-30 at 22:55 +0900, Simon Horman wrote:
> > The only other parameter that seemed to have a significant effect was
> > increasing the MTU. In the case of MTU=9000, GRO seemed to have a negative
> > impact on throughput, though a significant positive effect on CPU
> > utilisation.
> [...]
>
> Increasing the MTU also increases the interval between packets on a TCP flow
> using maximum segment size, so that it is more likely to exceed the
> difference in delay.

GRO really is operational only if we receive several packets for the same
flow in the same NAPI run. As soon as we exit NAPI mode, GRO packets are
flushed.

A big MTU means bigger delays between packets, so there is a good chance that
GRO cannot trigger at all, since each NAPI run handles only one packet.

One possibility with a big MTU is to tweak the "ethtool -c eth0" parameters

    rx-usecs: 20
    rx-frames: 5
    rx-usecs-irq: 0
    rx-frames-irq: 5

so that "rx-usecs" is bigger than the delay between two MTU-sized packets.

Gigabit speed means 1 nanosecond per bit, so MTU=9000 means a 72 us delay
between packets. So try:

    ethtool -C eth0 rx-usecs 100

to give the NIC a chance to deliver several packets at once.

Unfortunately, this also adds some latency, so it helps bulk transfers and
slows down interactive traffic.
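Because the receive path here is a three-slave bond, the coalescing change
suggested above would normally be applied to each underlying NIC rather than
to the bond device itself; a small sketch, with interface names assumed:

    # Inspect the current interrupt-coalescing settings on one slave.
    ethtool -c eth0

    # Raise the rx timer on every slave so several back-to-back jumbo frames
    # can be picked up in a single NAPI poll (the value 100 is from the mail;
    # the per-slave loop is an assumption).
    for dev in eth0 eth1 eth2; do
        ethtool -C "$dev" rx-usecs 100
    done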
* Re: Bonding, GRO and tcp_reordering
From: Simon Horman @ 2010-12-01 4:34 UTC
To: Eric Dumazet; Cc: Ben Hutchings, netdev

On Tue, Nov 30, 2010 at 05:04:33PM +0100, Eric Dumazet wrote:
> GRO really is operational only if we receive several packets for the same
> flow in the same NAPI run. As soon as we exit NAPI mode, GRO packets are
> flushed.
>
> A big MTU means bigger delays between packets, so there is a good chance
> that GRO cannot trigger at all, since each NAPI run handles only one packet.
>
> One possibility with a big MTU is to tweak the "ethtool -c eth0" parameters
> (rx-usecs, rx-frames, rx-usecs-irq, rx-frames-irq) so that "rx-usecs" is
> bigger than the delay between two MTU-sized packets.
>
> Gigabit speed means 1 nanosecond per bit, so MTU=9000 means a 72 us delay
> between packets. So try:
>
>     ethtool -C eth0 rx-usecs 100
>
> to give the NIC a chance to deliver several packets at once.
>
> Unfortunately, this also adds some latency, so it helps bulk transfers and
> slows down interactive traffic.

Thanks Eric,

I was tweaking those values recently for some latency tuning but I didn't
think of them in relation to last night's tests.

In terms of my measurements, it's just benchmarking at this stage, so a
trade-off between throughput and latency is acceptable, so long as I
remember to measure what it is.
* Re: Bonding, GRO and tcp_reordering
From: Eric Dumazet @ 2010-12-01 4:47 UTC
To: Simon Horman; Cc: Ben Hutchings, netdev

On Wed, 01 Dec 2010 at 13:34 +0900, Simon Horman wrote:
> I was tweaking those values recently for some latency tuning but I didn't
> think of them in relation to last night's tests.
>
> In terms of my measurements, it's just benchmarking at this stage, so a
> trade-off between throughput and latency is acceptable, so long as I
> remember to measure what it is.

I was thinking again this morning about GRO and bonding, and I don't know
whether it actually works...

Is GRO enabled on the individual eth0/eth1/eth2 interfaces you use, or on
the bonding device itself?
* Re: Bonding, GRO and tcp_reordering
From: Simon Horman @ 2010-12-02 6:39 UTC
To: Eric Dumazet; Cc: Ben Hutchings, netdev

On Wed, Dec 01, 2010 at 05:47:06AM +0100, Eric Dumazet wrote:
> I was thinking again this morning about GRO and bonding, and I don't know
> whether it actually works...
>
> Is GRO enabled on the individual eth0/eth1/eth2 interfaces you use, or on
> the bonding device itself?

All of the above. I can check different combinations if it helps.
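Checking which of those combinations is actually in effect is quick with
ethtool; a small sketch (interface names assumed, feature names as printed by
ethtool -k):

    # Show the relevant offload state on the bond and on each slave.
    for dev in bond0 eth0 eth1 eth2; do
        echo "== $dev =="
        ethtool -k "$dev" | grep -E 'generic-receive-offload|segmentation-offload'
    done

    # Example combination: GRO on the slaves only, off on the bond device.
    ethtool -K bond0 gro off
    for dev in eth0 eth1 eth2; do
        ethtool -K "$dev" gro on
    done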
* Re: Bonding, GRO and tcp_reordering
From: Simon Horman @ 2010-12-03 13:38 UTC
To: Eric Dumazet; Cc: Ben Hutchings, netdev

On Wed, Dec 01, 2010 at 01:34:45PM +0900, Simon Horman wrote:
> On Tue, Nov 30, 2010 at 05:04:33PM +0100, Eric Dumazet wrote:

To clarify my statement in a previous email that GSO had no effect: I re-ran
the tests and I still haven't observed any effect of GSO on my results.

However, I did notice that in order for GRO on the server to have an effect
I also need TSO enabled on the client. I thought that I had previously
checked that, but I was mistaken. Enabling TSO on the client while leaving
GSO disabled on the server resulted in increased CPU utilisation on the
client, from ~15% to ~20%.

> > One possibility with a big MTU is to tweak the "ethtool -c eth0"
> > parameters so that "rx-usecs" is bigger than the delay between two
> > MTU-sized packets.
> >
> > Gigabit speed means 1 nanosecond per bit, so MTU=9000 means a 72 us delay
> > between packets. So try:
> >
> >     ethtool -C eth0 rx-usecs 100
> >
> > to give the NIC a chance to deliver several packets at once.
> >
> > Unfortunately, this also adds some latency, so it helps bulk transfers
> > and slows down interactive traffic.
>
> Thanks Eric,
>
> I was tweaking those values recently for some latency tuning but I didn't
> think of them in relation to last night's tests.
>
> In terms of my measurements, it's just benchmarking at this stage, so a
> trade-off between throughput and latency is acceptable, so long as I
> remember to measure what it is.

Thanks. rx-usecs was set to 3, and changing it to 15 on the server did seem
to increase throughput with 1500 byte packets, although CPU utilisation
increased too, disproportionately so on the client.

MTU=1500, client,server:tcp_reordering=3 (default), client:GSO=off,
client:TSO=on, server:GRO=off, server:rx-usecs=3 (default)

# netperf -c -4 -t TCP_STREAM -H 172.17.60.216
TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 172.17.60.216 (172.17.60.216) port 0 AF_INET
Recv   Send    Send                          Utilization       Service Demand
Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
Size   Size    Size     Time     Throughput  local    remote   local   remote
bytes  bytes   bytes    secs.    10^6bits/s  % S      % U      us/KB   us/KB

 87380  16384  16384    10.00      1591.34   16.35    5.80     1.683   2.390

MTU=1500, client,server:tcp_reordering=3 (default), client:GSO=off,
client:TSO=on, server:GRO=off, server:rx-usecs=15

# netperf -c -4 -t TCP_STREAM -H 172.17.60.216
TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 172.17.60.216 (172.17.60.216) port 0 AF_INET
Recv   Send    Send                          Utilization       Service Demand
Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
Size   Size    Size     Time     Throughput  local    remote   local   remote
bytes  bytes   bytes    secs.    10^6bits/s  % S      % U      us/KB   us/KB

 87380  16384  16384    10.00      1774.38   23.75    7.58     2.193   2.801

I also saw an improvement with GRO enabled on the server and TSO enabled on
the client, although in this case I found rx-usecs=45 to be the best value.

MTU=1500, client,server:tcp_reordering=3 (default), client:GSO=off,
client:TSO=on, server:GRO=on, server:rx-usecs=3 (default)

# netperf -c -4 -t TCP_STREAM -H 172.17.60.216
TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 172.17.60.216 (172.17.60.216) port 0 AF_INET
Recv   Send    Send                          Utilization       Service Demand
Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
Size   Size    Size     Time     Throughput  local    remote   local   remote
bytes  bytes   bytes    secs.    10^6bits/s  % S      % U      us/KB   us/KB

 87380  16384  16384    10.00      2553.27   13.31    3.35     0.854   0.860

MTU=1500, client,server:tcp_reordering=3 (default), client:GSO=off,
client:TSO=on, server:GRO=on, server:rx-usecs=45

# netperf -c -4 -t TCP_STREAM -H 172.17.60.216
TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 172.17.60.216 (172.17.60.216) port 0 AF_INET
Recv   Send    Send                          Utilization       Service Demand
Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
Size   Size    Size     Time     Throughput  local    remote   local   remote
bytes  bytes   bytes    secs.    10^6bits/s  % S      % U      us/KB   us/KB

 87380  16384  16384    10.00      2727.53   29.45    9.48     1.769   2.278

I did not observe any improvement in throughput when increasing rx-usecs
from 3 when using MTU=9000, although there was a slight increase in CPU
utilisation (maybe; there is quite a lot of noise in the results).
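The rx-usecs sweep above lends itself to a small script; a rough sketch (the
netperf invocation and the 172.17.60.216 address come from the thread, while
the interface names, ssh access and the particular list of values are
assumptions):

    #!/bin/sh
    # Sweep the server-side receive coalescing timer and measure single-stream
    # throughput for each setting; -C also reports remote CPU utilisation.
    SERVER=172.17.60.216
    for usecs in 3 15 30 45 100; do
        ssh "$SERVER" 'for dev in eth0 eth1 eth2; do ethtool -C "$dev" rx-usecs '"$usecs"'; done'
        echo "== rx-usecs=$usecs =="
        netperf -c -C -4 -t TCP_STREAM -H "$SERVER"
    done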
* Re: Bonding, GRO and tcp_reordering
From: Simon Horman @ 2010-12-01 4:31 UTC
To: Ben Hutchings; Cc: netdev

On Tue, Nov 30, 2010 at 03:42:56PM +0000, Ben Hutchings wrote:
> On Tue, 2010-11-30 at 22:55 +0900, Simon Horman wrote:
> > The result I have is that with a 1500 byte MTU, tcp_reordering=3 and both
> > LSO and GSO disabled on both the sender and receiver I see:
> >
> > # netperf -c -4 -t TCP_STREAM -H 172.17.60.216 -- -m 1472
> > [...]
> >  87380  16384   1472    10.01      1646.13   40.01    -1.00    3.982   -1.000
> >
> > But with GRO enabled on the receiver I see:
> >
> > # netperf -c -4 -t TCP_STREAM -H 172.17.60.216 -- -m 1472
> > [...]
> >  87380  16384   1472    10.01      2613.83   19.32    -1.00    1.211   -1.000
> >
> > which is much better than any result I get by tweaking tcp_reordering when
> > GRO is disabled on the receiver.
>
> Did you also enable TSO/GSO on the sender?

It didn't seem to make any difference either way. I'll re-test just in case
I missed something.

> What TSO/GSO will do is to change the round-robin scheduling from one packet
> per interface to one super-packet per interface. GRO then coalesces the
> physical packets back into a super-packet. The intervals between receiving
> super-packets then tend to exceed the difference in delay between
> interfaces, hiding the reordering.
>
> If you only enabled GRO then I don't understand this.
>
> > Tweaking tcp_reordering when GRO is enabled on the receiver seems to have
> > a negligible effect. [...]
> >
> > The only other parameter that seemed to have a significant effect was
> > increasing the MTU. In the case of MTU=9000, GRO seemed to have a negative
> > impact on throughput, though a significant positive effect on CPU
> > utilisation.
> [...]
>
> Increasing the MTU also increases the interval between packets on a TCP flow
> using maximum segment size, so that it is more likely to exceed the
> difference in delay.

I hadn't considered that, thanks.
* Re: Bonding, GRO and tcp_reordering
From: Rick Jones @ 2010-11-30 17:56 UTC
To: Simon Horman; Cc: netdev

Simon Horman wrote:
> Hi,
>
> I just wanted to share what is a rather pleasing,
> though to me somewhat surprising result.
>
> I am testing bonding using balance-rr mode with three physical links to try
> to get > gigabit speed for a single stream. Why? Because I'd like to run
> various tests at > gigabit speed and I don't have any 10G hardware at my
> disposal.
>
> The result I have is that with a 1500 byte MTU, tcp_reordering=3 and both
> LSO and GSO disabled on both the sender and receiver I see:
>
> # netperf -c -4 -t TCP_STREAM -H 172.17.60.216 -- -m 1472

Why 1472 bytes per send? If you wanted a 1-1 correspondence between the send
size and the MSS, I would guess that 1448 would have been in order. 1472
would be the maximum data payload for a UDP/IPv4 datagram; TCP will have
more header than UDP.

> [...]
>  87380  16384   1472    10.01      1646.13   40.01    -1.00    3.982   -1.000
>
> But with GRO enabled on the receiver I see:
>
> # netperf -c -4 -t TCP_STREAM -H 172.17.60.216 -- -m 1472
> [...]
>  87380  16384   1472    10.01      2613.83   19.32    -1.00    1.211   -1.000

If you are changing things on the receiver, you should probably enable
remote CPU utilization measurement with the -C option.

> which is much better than any result I get by tweaking tcp_reordering when
> GRO is disabled on the receiver.
>
> Tweaking tcp_reordering when GRO is enabled on the receiver seems to have a
> negligible effect. Which is interesting, because my brief reading on the
> subject indicated that tcp_reordering was the key tuning parameter for
> bonding with balance-rr.

You are in a maze of twisty heuristics and algorithms, all interacting :)
If there are only three links in the bond, I suspect the chances for
spurious fast retransmission are somewhat smaller than if you had, say,
four, based on just hand-waving that three duplicate ACKs require receipt
of perhaps four out-of-order segments.

> The only other parameter that seemed to have a significant effect was
> increasing the MTU. In the case of MTU=9000, GRO seemed to have a negative
> impact on throughput, though a significant positive effect on CPU
> utilisation.
>
> MTU=9000, sender,receiver:tcp_reordering=3 (default), receiver:GRO=off
> netperf -c -4 -t TCP_STREAM -H 172.17.60.216 -- -m 9872

9872?

> [...]
>  87380  16384   9872    10.01      2957.52   14.89    -1.00    0.825   -1.000
>
> MTU=9000, sender,receiver:tcp_reordering=3 (default), receiver:GRO=on
> netperf -c -4 -t TCP_STREAM -H 172.17.60.216 -- -m 9872
> [...]
>  87380  16384   9872    10.01      2847.64   10.84    -1.00    0.624   -1.000

Short of packet traces, taking snapshots of netstat statistics before and
after each netperf run might be goodness - you can look at things like the
ratio of ACKs to data segments/bytes and such. LRO/GRO can have a
non-trivial effect on the number of ACKs, and ACKs are what matter for fast
retransmit.

    netstat -s > before
    netperf ...
    netstat -s > after
    beforeafter before after > delta

where beforeafter comes from ftp://ftp.cup.hp.com/dist/networking/tools/
(for now; the site will have to go away before long, as the campus on which
it is located has been sold) and will subtract before from after.

happy benchmarking,

rick jones
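For anyone without the beforeafter tool, nstat from iproute2 can report
per-run counter deltas directly, since it keeps a history between invocations;
a rough sketch (the counters grepped for are examples, and names vary somewhat
between kernel versions):

    nstat -n                                  # silently reset nstat's baseline
    netperf -c -C -4 -t TCP_STREAM -H 172.17.60.216
    nstat | grep -E 'TcpInSegs|TcpOutSegs|TcpRetransSegs|TcpExt'   # deltas for this run only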
* Re: Bonding, GRO and tcp_reordering
From: Eric Dumazet @ 2010-11-30 18:14 UTC
To: Rick Jones; Cc: Simon Horman, netdev

On Tue, 30 Nov 2010 at 09:56 -0800, Rick Jones wrote:
> Short of packet traces, taking snapshots of netstat statistics before and
> after each netperf run might be goodness - you can look at things like the
> ratio of ACKs to data segments/bytes and such. LRO/GRO can have a
> non-trivial effect on the number of ACKs, and ACKs are what matter for fast
> retransmit.
>
>     netstat -s > before
>     netperf ...
>     netstat -s > after
>     beforeafter before after > delta
>
> where beforeafter comes from ftp://ftp.cup.hp.com/dist/networking/tools/
> (for now; the site will have to go away before long, as the campus on which
> it is located has been sold) and will subtract before from after.
>
> happy benchmarking,

Yes indeed. With a fast enough medium (or small MTUs), we can run into a
backlog-processing problem (filling huge receive queues), as seen on
loopback lately...

netstat -s can show some receive queue overruns in this case:

    TCPBacklogDrop: xxx
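A quick way to watch for the drops mentioned above between runs, assuming an
iproute2 with nstat (the counter name as spelled by nstat is shown; older
net-tools versions may not pretty-print it in netstat -s at all):

    # Absolute value of the backlog-drop counter, even if zero.
    nstat -az TcpExtTCPBacklogDrop
    # Or eyeball it in netstat -s output, if this net-tools version knows it.
    netstat -s | grep -i backlog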
* Re: Bonding, GRO and tcp_reordering
From: Simon Horman @ 2010-12-01 4:30 UTC
To: Rick Jones; Cc: netdev

On Tue, Nov 30, 2010 at 09:56:02AM -0800, Rick Jones wrote:
> Simon Horman wrote:
> > I am testing bonding using balance-rr mode with three physical links to
> > try to get > gigabit speed for a single stream. [...]
> >
> > # netperf -c -4 -t TCP_STREAM -H 172.17.60.216 -- -m 1472
>
> Why 1472 bytes per send? If you wanted a 1-1 correspondence between the
> send size and the MSS, I would guess that 1448 would have been in order.
> 1472 would be the maximum data payload for a UDP/IPv4 datagram; TCP will
> have more header than UDP.

Only to be consistent with UDP testing that I was doing at the same time.
I'll re-test with 1448.

> If you are changing things on the receiver, you should probably enable
> remote CPU utilization measurement with the -C option.

Thanks, I will do so.

> You are in a maze of twisty heuristics and algorithms, all interacting :)
> If there are only three links in the bond, I suspect the chances for
> spurious fast retransmission are somewhat smaller than if you had, say,
> four, based on just hand-waving that three duplicate ACKs require receipt
> of perhaps four out-of-order segments.

Unfortunately NIC/slot availability only stretches to three links :-(
If you think it's really worthwhile I can obtain some more dual-port NICs.

> > MTU=9000, sender,receiver:tcp_reordering=3 (default), receiver:GRO=off
> > netperf -c -4 -t TCP_STREAM -H 172.17.60.216 -- -m 9872
>
> 9872?

It should have been 8972. I'll retest with 8948, as per your suggestion
above.

> Short of packet traces, taking snapshots of netstat statistics before and
> after each netperf run might be goodness - you can look at things like the
> ratio of ACKs to data segments/bytes and such. LRO/GRO can have a
> non-trivial effect on the number of ACKs, and ACKs are what matter for fast
> retransmit.
>
>     netstat -s > before
>     netperf ...
>     netstat -s > after
>     beforeafter before after > delta
>
> where beforeafter comes from ftp://ftp.cup.hp.com/dist/networking/tools/
> (for now; the site will have to go away before long, as the campus on which
> it is located has been sold) and will subtract before from after.

Thanks, I'll take a look into that.
* Re: Bonding, GRO and tcp_reordering
From: Rick Jones @ 2010-12-01 19:42 UTC
To: Simon Horman; Cc: netdev

> > You are in a maze of twisty heuristics and algorithms, all interacting :)
> > If there are only three links in the bond, I suspect the chances for
> > spurious fast retransmission are somewhat smaller than if you had, say,
> > four, based on just hand-waving that three duplicate ACKs require receipt
> > of perhaps four out-of-order segments.
>
> Unfortunately NIC/slot availability only stretches to three links :-(
> If you think it's really worthwhile I can obtain some more dual-port NICs.

Only if you want to increase the chances of reordering that triggers
spurious fast retransmits.

rick jones