* Use of 802.3ad bonding for increasing link throughput
@ 2011-08-10 12:07 Tom Brown
2011-08-10 13:23 ` Chris Adams
` (2 more replies)
0 siblings, 3 replies; 5+ messages in thread
From: Tom Brown @ 2011-08-10 12:07 UTC (permalink / raw)
To: netdev
[couldn't thread with '802.3ad bonding brain damaged', as I've just
signed up]
So, under what circumstances would a user actually use 802.3ad mode to
"increase" link throughput, rather than just for redundancy? Are there
any circumstances in which a single file, for example, could be
transferred at multiple-NIC speed? The 3 hashing options are:
- layer 2: presumably this always puts traffic on the same NIC, even in
a LAG with multiple NICs? Should layer 2 ever be used?
- layer2+3: can't be used for a single file, since it still hashes to
the same NIC, and can't be used for load-balancing, since different IP
endpoints go unintelligently to different NICs
- layer3+4: seems to have exactly the same issue as layer2+3, as well as
being non-compliant (the three hash policies are sketched in code just
after this list)
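
For reference, the three policies reduce to XOR hashes over different
header fields. Below is a minimal Python sketch of roughly what they
compute, following the formulas described in the bonding documentation
(bonding.txt) rather than the kernel's actual code; the MACs, IPs and
ports are made-up examples.

    # Rough sketch of the three xmit_hash_policy settings, simplified from
    # the formulas in bonding.txt; not the kernel's actual code.

    def mac_low(mac):
        """Low byte of a MAC address string such as '00:16:3e:00:00:01'."""
        return int(mac.split(":")[-1], 16)

    def ip32(ip):
        """IPv4 dotted quad -> 32-bit integer."""
        a, b, c, d = (int(x) for x in ip.split("."))
        return (a << 24) | (b << 16) | (c << 8) | d

    def hash_layer2(src_mac, dst_mac, n_slaves):
        # layer2: (source MAC XOR destination MAC) modulo slave count
        return (mac_low(src_mac) ^ mac_low(dst_mac)) % n_slaves

    def hash_layer2_3(src_mac, dst_mac, src_ip, dst_ip, n_slaves):
        # layer2+3: (((src IP ^ dst IP) & 0xffff) ^ src MAC ^ dst MAC) % count
        ip_part = (ip32(src_ip) ^ ip32(dst_ip)) & 0xffff
        return (ip_part ^ mac_low(src_mac) ^ mac_low(dst_mac)) % n_slaves

    def hash_layer3_4(src_ip, dst_ip, src_port, dst_port, n_slaves):
        # layer3+4: ((src port ^ dst port) ^ ((src IP ^ dst IP) & 0xffff)) % count
        ip_part = (ip32(src_ip) ^ ip32(dst_ip)) & 0xffff
        return ((src_port ^ dst_port) ^ ip_part) % n_slaves

    if __name__ == "__main__":
        n = 2  # two slaves in the aggregate
        # One client fetching one file: every policy maps all of its frames
        # to the same slave, so a single transfer never exceeds one NIC.
        print(hash_layer2("00:16:3e:00:00:01", "00:16:3e:00:00:02", n))
        # Several TCP connections between the same two hosts: only layer3+4
        # can split them, because only the ports differ.
        for sport in (40000, 40001, 40002, 40003):
            print(hash_layer3_4("192.168.1.10", "192.168.1.20", sport, 80, n))

With two slaves, layer2 and layer2+3 pin all traffic between a given pair
of hosts to one slave, while layer3+4 can place separate TCP connections
between the same hosts on different slaves.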
I guess my problem is in understanding whether the 802.3/802.1AX spec
has any use at all beyond redundancy. Given the requirement to maintain
frame order at the distributor, I can't immediately see how having a
bonded group of, say, 3 NICs is any better than having 3 separate NICs.
Have I missed something obvious?
And, having said that, the redundancy features seem limited. For hot
standby, when the main link fails, you have to wait for both ends to
time out, and re-negotiate via LACP, and hopefully pick up the same
lower-priority NIC, and then rely on a higher layer to request
retransmission of the missing frame. Do any of you have any experience
of using 802.1AX for anything useful and non-trivial?
So, to get multiple-NIC speed, are we stuck with balance-rr? But
presumably this only works if the other end of the link is also running
the bonding driver?
Thanks -
Tom
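
For context, the mode and hash policy under discussion are selected
through the bonding driver's sysfs interface described in bonding.txt. A
minimal sketch, assuming device names bond0/eth0/eth1 and a loaded
bonding module (writing "balance-rr" instead of "802.3ad" selects the
round-robin mode asked about above):

    #!/usr/bin/env python
    # Sketch of creating an 802.3ad bond via the bonding sysfs interface
    # (paths per bonding.txt). Device names are assumptions; run as root
    # with the bonding module loaded.
    import subprocess

    BOND = "bond0"
    SLAVES = ["eth0", "eth1"]
    SYS = "/sys/class/net"

    def write(path, value):
        with open(path, "w") as f:
            f.write(value)

    write("%s/bonding_masters" % SYS, "+%s" % BOND)           # create the bond
    write("%s/%s/bonding/mode" % (SYS, BOND), "802.3ad")      # set while bond is down
    write("%s/%s/bonding/miimon" % (SYS, BOND), "100")        # link monitoring, ms
    write("%s/%s/bonding/xmit_hash_policy" % (SYS, BOND), "layer2+3")

    for dev in SLAVES:
        # slaves generally need to be down before being enslaved via sysfs
        subprocess.check_call(["ip", "link", "set", "dev", dev, "down"])
        write("%s/%s/bonding/slaves" % (SYS, BOND), "+%s" % dev)

    subprocess.check_call(["ip", "link", "set", "dev", BOND, "up"])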
* Re: Use of 802.3ad bonding for increasing link throughput
From: Chris Adams @ 2011-08-10 13:23 UTC (permalink / raw)
To: netdev

Once upon a time, Tom Brown <sa212+glibc@cyconix.com> said:
> So, under what circumstances would a user actually use 802.3ad mode to
> "increase" link throughput, rather than just for redundancy? Are there
> any circumstances in which a single file, for example, could be
> transferred at multiple-NIC speed? The 3 hashing options are:

It isn't going to increase the rate for a single stream. However, few
setups have only a single TCP stream going across a segment, so this is
still quite useful for real-world setups to increase the total
throughput.
--
Chris Adams <cmadams@hiwaay.net>
Systems and Network Administrator - HiWAAY Internet Services
I don't speak for anybody but myself - that's enough trouble.
* Re: Use of 802.3ad bonding for increasing link throughput
From: Simon Farnsworth @ 2011-08-10 13:50 UTC (permalink / raw)
To: netdev

Tom Brown wrote:
> [couldn't thread with '802.3ad bonding brain damaged', as I've just
> signed up]
>
> So, under what circumstances would a user actually use 802.3ad mode to
> "increase" link throughput, rather than just for redundancy? Are there
> any circumstances in which a single file, for example, could be
> transferred at multiple-NIC speed? The 3 hashing options are:
>
As an example, from my server room here: I have an install server (TFTP,
FTP and HTTP) connected by a 2x1G LACP bond to the switch. When I have
multiple clients installing simultaneously, the layer2 hash distributes
the load nicely across both NICs - I can reach saturation on both NICs
together.

If I had routers between my clients and the install server, I'd need
layer2+3 hashing to spread the clients over the links, but I'd still be
able to push over a gigabit per second to the clients in total, despite
being limited to 1 Gbit/s to each individual client by the packet
distribution.

I'm sure that you can think of lots of other situations in which you have
multiple conversations sharing a link - those are the situations that
gain speed from 802.3ad.
--
Simon Farnsworth
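
To make the router case concrete: once frames cross a router, every frame
the server sends carries the router's MAC as its destination, so a pure
layer2 hash lands every client on the same slave; folding the IP
addresses in (layer2+3) spreads them out again. A toy demonstration using
the same simplified hash formulas as in the earlier sketch (all addresses
invented):

    # Toy demo of why a pure layer2 hash degenerates once traffic crosses a
    # router: every outgoing frame then carries the router's MAC, so the MAC
    # XOR is constant for all clients. All addresses below are invented.

    def mac_low(mac):                      # low byte of a MAC string
        return int(mac.split(":")[-1], 16)

    def ip32(ip):                          # IPv4 dotted quad -> 32-bit int
        a, b, c, d = (int(x) for x in ip.split("."))
        return (a << 24) | (b << 16) | (c << 8) | d

    def l2(src_mac, dst_mac, n):
        return (mac_low(src_mac) ^ mac_low(dst_mac)) % n

    def l23(src_mac, dst_mac, src_ip, dst_ip, n):
        return ((((ip32(src_ip) ^ ip32(dst_ip)) & 0xffff))
                ^ mac_low(src_mac) ^ mac_low(dst_mac)) % n

    SERVER_MAC, ROUTER_MAC = "00:16:3e:00:00:01", "00:16:3e:00:00:fe"
    SERVER_IP = "192.168.1.10"
    clients = ["10.1.0.%d" % i for i in range(1, 9)]

    print("layer2  :", [l2(SERVER_MAC, ROUTER_MAC, 2) for _ in clients])
    print("layer2+3:", [l23(SERVER_MAC, ROUTER_MAC, SERVER_IP, c, 2) for c in clients])
    # layer2 picks the same slave for every client (one NIC busy, one idle);
    # layer2+3 spreads these clients across both slaves.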
* Re: Use of 802.3ad bonding for increasing link throughput
From: Jay Vosburgh @ 2011-08-10 17:46 UTC (permalink / raw)
To: Tom Brown; +Cc: netdev

Tom Brown <sa212+glibc@cyconix.com> wrote:
>[couldn't thread with '802.3ad bonding brain damaged', as I've just signed
>up]
>
>So, under what circumstances would a user actually use 802.3ad mode to
>"increase" link throughput, rather than just for redundancy? Are there any
>circumstances in which a single file, for example, could be transferred at
>multiple-NIC speed?

Network load balancing, by and large, increases throughput in aggregate,
not for individual connections.

[...] The 3 hashing options are:
>
>- layer 2: presumably this always puts traffic on the same NIC, even in a
>LAG with multiple NICs? Should layer 2 ever be used?

Perhaps the network is such that the destinations are not bonded, and
can't handle more than 1 interface's worth of throughput. Having the
"server" end bonded still permits the clients to deal with a single IP
address, handle failures of devices on the server, etc.

>- layer2+3: can't be used for a single file, since it still hashes to the
>same NIC, and can't be used for load-balancing, since different IP
>endpoints go unintelligently to different NICs
>
>- layer3+4: seems to have exactly the same issue as layer2+3, as well as
>being non-compliant
>
>I guess my problem is in understanding whether the 802.3/802.1AX spec has
>any use at all beyond redundancy. Given the requirement to maintain frame
>order at the distributor, I can't immediately see how having a bonded
>group of, say, 3 NICs is any better than having 3 separate NICs. Have I
>missed something obvious?

Others have answered this part already (that it permits larger aggregate
throughput to/from the host, but not single-stream throughput greater
than one interface's worth). This is by design, to prevent out of order
delivery of packets.

An aggregate of N devices can be better than N individual devices in that
it will gracefully handle failure of one of the devices in the aggregate,
and permits sharing of the bandwidth in aggregate without the peers
having to be hard-coded to specific destinations.

>And, having said that, the redundancy features seem limited. For hot
>standby, when the main link fails, you have to wait for both ends to
>timeout, and re-negotiate via LACP, and hopefully pick up the same
>lower-priority NIC, and then rely on a higher layer to request
>retransmission of the missing frame. Do any of you have any experience of
>using 802.1AX for anything useful and non-trivial?

In the linux implementation, as soon as the link goes down, that port is
removed from the aggregator and a new aggregator is selected (which may
be the same aggregator, depending on the option and configuration).
Language in 802.1AX section 5.3.13 permits us to immediately remove a
failed port from an aggregator without waiting for LACP to time out.

>So, to get multiple-NIC speed, are we stuck with balance-rr? But
>presumably this only works if the other end of the link is also running
>the bonding driver?

Striping a single connection across multiple network interfaces is very
difficult to do without causing packets to be delivered out of order.

Now, that said, if you want to have one TCP connection utilize more than
one interface's worth of throughput, then yes, balance-rr is the only
mode that may do that. The other end doesn't have to run bonding, but it
must have sufficient aggregate bandwidth to accommodate the aggregate
rate (e.g., N slower devices feeding into one faster device).

Running balance-rr itself can be tricky to configure. An unmanaged switch
may not handle multiple ports with the same MAC address very well (e.g.,
sending everything to one port, or sending everything to all the ports).
A managed switch must have the relevant ports configured for etherchannel
("static link aggregation" in some documentation), and the switch may
balance the traffic when it leaves the switch using its transmit
algorithm. I'm not aware of any switches that have a round-robin balance
policy, so the switch may end up hashing your traffic anyway (which will
probably drop some of your packets, because you're feeding them in faster
than the switch can send them out after they're hashed to one switch
port).

It's possible to play games on managed switches and, e.g., put each pair
of ports (one at each end) into a separate VLAN, but schemes like that
will fail badly if a link goes down somewhere.

If each member of the bond goes through a different unmanaged and not
interconnected switch, that may avoid those issues (and this was a common
configuration back in the 10 Mb/sec days; it's described in bonding.txt
in more detail). That configuration still has issues if a link fails.
Connecting systems directly, back-to-back, should also avoid those
issues.

Lastly, balance-rr will deliver traffic out of order. Even the best case,
N slow links feeding one faster link, delivers some small percentage out
of order (in the low single digits). On linux, the tcp_reordering sysctl
value can be raised to compensate, but it will still result in increased
packet overhead, and is not likely to be very efficient, and doesn't help
with anything that's not TCP/IP. I have not tested balance-rr in a few
years now, but my recollection is that, as a best case, throughput of one
TCP connection could reach about 1.5x with 2 slaves, or about 2.5x with 4
slaves (where the multipliers are in units of "bandwidth of one slave").

	-J

---
	-Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com
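
To illustrate where that reordering comes from, here is a toy simulation
of strict round-robin striping over two slaves whose per-frame latencies
differ slightly. The timing numbers are invented, so only the shape of
the result matters, not the exact percentage:

    # Toy simulation of why round-robin striping reorders frames: a frame
    # queued on the slower slave can be overtaken by the next frame, which
    # went to the faster slave. All timing numbers are invented.
    import random

    random.seed(42)
    SEND_GAP = 0.10                  # ms between frames handed to the bond
    LINK_DELAY = [1.00, 1.07]        # per-frame latency of each slave (ms)
    JITTER = 0.05                    # extra queueing jitter per frame (ms)
    N = 10000

    arrivals = []
    for seq in range(N):
        slave = seq % 2              # balance-rr: strict alternation
        t = seq * SEND_GAP + LINK_DELAY[slave] + random.uniform(0.0, JITTER)
        arrivals.append((t, seq))

    late = 0
    highest_seen = -1
    for _, seq in sorted(arrivals):  # order in which the receiver sees them
        if seq < highest_seen:
            late += 1                # arrived behind a higher-numbered frame
        else:
            highest_seen = seq

    # With these made-up numbers a few percent of frames arrive out of
    # order; TCP treats deep reordering as loss unless tcp_reordering is
    # raised.
    print("%.1f%% of frames arrived out of order" % (100.0 * late / N))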
* Re: Use of 802.3ad bonding for increasing link throughput
From: Simon Horman @ 2011-08-25  9:35 UTC (permalink / raw)
To: Jay Vosburgh; +Cc: Tom Brown, netdev

On Wed, Aug 10, 2011 at 10:46:12AM -0700, Jay Vosburgh wrote:

[snip]

> On linux, the tcp_reordering sysctl value can be raised to
> compensate, but it will still result in increased packet overhead, and
> is not likely to be very efficient, and doesn't help with anything
> that's not TCP/IP. I have not tested balance-rr in a few years now, but
> my recollection is that, as a best case, throughput of one TCP
> connection could reach about 1.5x with 2 slaves, or about 2.5x with 4
> slaves (where the multipliers are in units of "bandwidth of one slave").

Hi Jay,

for what it is worth I would like to chip in with the results of some
testing I did using balance-rr and 3 gigabit NICs late last year. The
link was three direct ("cross-over") cables to a machine that was also
using balance-rr.

I found that by increasing both rx-usecs (from 3 to 45) and enabling GRO
and TSO I was able to push 2.7*10^9 bits/s. Local CPU utilisation was 30%
and remote CPU utilisation was 10%. Local service demand was 1.7 us/KB
and remote service demand was 2.2 us/KB. The MTU was 1500 bytes.

In this configuration, with the tuning options described above,
increasing tcp_reordering (to 127) did not have a noticeable effect on
throughput but did increase local CPU utilisation to about 50% and local
service demand to 3.0 us/KB. There was also increased remote CPU
utilisation and service demand, although not as significant.

By using a 9000 byte MTU I was able to get close to 3*10^9 bits/s with
other parameters at their default values. Local CPU utilisation was 15%
and remote CPU utilisation was 5%. Local service demand was 0.8 us/KB and
remote service demand was 1.1 us/KB.

Increasing rx-usecs was suggested to me by Eric Dumazet on this list.

I no longer have access to the systems that I used to run these tests but
I do have other results that I have omitted from this email for the sake
of brevity.

Anecdotally, my opinion after running these and other tests is that if
you want to push more than a gigabit/s over a single TCP stream then you
would be well advised to get a faster link rather than bond gigabit
devices. I believe you stated something similar earlier on in this
thread.
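
For anyone wanting to reproduce this, the tuning described above maps
onto ordinary ethtool, sysctl and ip invocations; the sketch below wraps
them in a small script. The interface names (bond0, eth0-eth2) are
assumptions, not something stated in the thread:

    #!/usr/bin/env python
    # Sketch of the tuning described above as a small helper script (needs
    # root, ethtool and iproute2). Interface names are assumptions; adjust
    # them to your setup.
    import subprocess

    SLAVES = ["eth0", "eth1", "eth2"]   # the three gigabit slaves (assumed names)
    BOND = "bond0"                      # the balance-rr bond device (assumed name)

    def run(*cmd):
        print("+ " + " ".join(cmd))
        subprocess.check_call(cmd)

    for dev in SLAVES:
        # Raise interrupt coalescing from the default (3 -> 45 usecs above)
        run("ethtool", "-C", dev, "rx-usecs", "45")
        # Enable GRO and TSO on each slave
        run("ethtool", "-K", dev, "gro", "on", "tso", "on")

    # In the tests above, raising tcp_reordering on top of these settings
    # only cost CPU; left commented out for experimentation.
    # run("sysctl", "-w", "net.ipv4.tcp_reordering=127")

    # Alternative approach: jumbo frames with everything else at defaults.
    # run("ip", "link", "set", "dev", BOND, "mtu", "9000")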
end of thread, other threads:[~2011-08-25  9:35 UTC | newest]

Thread overview: 5+ messages
2011-08-10 12:07 Use of 802.3ad bonding for increasing link throughput Tom Brown
2011-08-10 13:23 ` Chris Adams
2011-08-10 13:50 ` Simon Farnsworth
2011-08-10 17:46 ` Jay Vosburgh
2011-08-25  9:35 ` Simon Horman