* Use of 802.3ad bonding for increasing link throughput
@ 2011-08-10 12:07 Tom Brown
2011-08-10 13:23 ` Chris Adams
0 siblings, 3 replies; 5+ messages in thread
From: Tom Brown @ 2011-08-10 12:07 UTC (permalink / raw)
To: netdev
[couldn't thread with '802.3ad bonding brain damaged', as I've just
signed up]
So, under what circumstances would a user actually use 802.3ad mode to
"increase" link throughput, rather than just for redundancy? Are there
any circumstances in which a single file, for example, could be
transferred at multiple-NIC speed? The 3 hashing options are:
- layer 2: presumably this always puts traffic on the same NIC, even in
a LAG with multiple NICs? Should layer 2 ever be used?
- layer2+3: can't be used for a single file, since it still hashes to
the same NIC, and can't be used for load-balancing, since different IP
endpoints go unintelligently to different NICs
- layer3+4: seems to have exactly the same issue as layer2+3, as well as
being non-compliant
I guess my problem is in understanding whether the 802.3/802.1AX spec
has any use at all beyond redundancy. Given the requirement to maintain
frame order at the distributor, I can't immediately see how having a
bonded group of, say, 3 NICs is any better than having 3 separate NICs.
Have I missed something obvious?
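For concreteness, here is how I read the three hash formulas described
in Documentation/networking/bonding.txt, as a rough Python sketch (the
kernel works on raw header bytes rather than whole addresses, but the
flow-to-slave idea should be the same; the addresses and ports below
are made up):

def layer2(src_mac, dst_mac, n_slaves):
    # Same src/dst MAC pair always lands on the same slave.
    return (src_mac ^ dst_mac) % n_slaves

def layer2_3(src_mac, dst_mac, src_ip, dst_ip, n_slaves):
    # Mixes the IP pair into the MAC hash.
    return (((src_ip ^ dst_ip) & 0xffff) ^ (src_mac ^ dst_mac)) % n_slaves

def layer3_4(src_ip, dst_ip, src_port, dst_port, n_slaves):
    # Mixes the TCP/UDP ports in as well, so separate connections between
    # the same pair of hosts can land on different slaves.
    return ((src_port ^ dst_port) ^ ((src_ip ^ dst_ip) & 0xffff)) % n_slaves

# A single TCP connection has fixed MACs, IPs and ports, so every policy
# maps it to exactly one slave -- hence no single-stream speed-up.
n = 3  # a 3-NIC aggregate
print(layer3_4(0x0a000001, 0x0a000002, 40000, 80, n))
print(layer3_4(0x0a000001, 0x0a000002, 40001, 80, n))  # a second connection may go elsewhere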
And, having said that, the redundancy features seem limited. For hot
standby, when the main link fails, you have to wait for both ends to
time out, and re-negotiate via LACP, and hopefully pick up the same
lower-priority NIC, and then rely on a higher layer to request
retransmission of the missing frame. Do any of you have any experience
of using 802.1AX for anything useful and non-trivial?
So, to get multiple-NIC speed, are we stuck with balance-rr? But
presumably this only works if the other end of the link is also running
the bonding driver?
Thanks -
Tom
* Re: Use of 802.3ad bonding for increasing link throughput
2011-08-10 12:07 Use of 802.3ad bonding for increasing link throughput Tom Brown
@ 2011-08-10 13:23 ` Chris Adams
2011-08-10 13:50 ` Simon Farnsworth
2011-08-10 17:46 ` Jay Vosburgh
2 siblings, 0 replies; 5+ messages in thread
From: Chris Adams @ 2011-08-10 13:23 UTC (permalink / raw)
To: netdev
Once upon a time, Tom Brown <sa212+glibc@cyconix.com> said:
> So, under what circumstances would a user actually use 802.3ad mode to
> "increase" link throughput, rather than just for redundancy? Are there
> any circumstances in which a single file, for example, could be
> transferred at multiple-NIC speed? The 3 hashing options are:
It isn't going to increase the rate for a single stream. However, few
setups have only a single TCP stream going across a segment, so this is
still quite useful in the real world for increasing total throughput.
--
Chris Adams <cmadams@hiwaay.net>
Systems and Network Administrator - HiWAAY Internet Services
I don't speak for anybody but myself - that's enough trouble.
* Re: Use of 802.3ad bonding for increasing link throughput
2011-08-10 12:07 Use of 802.3ad bonding for increasing link throughput Tom Brown
2011-08-10 13:23 ` Chris Adams
@ 2011-08-10 13:50 ` Simon Farnsworth
2011-08-10 17:46 ` Jay Vosburgh
2 siblings, 0 replies; 5+ messages in thread
From: Simon Farnsworth @ 2011-08-10 13:50 UTC (permalink / raw)
To: netdev
Tom Brown wrote:
> [couldn't thread with '802.3ad bonding brain damaged', as I've just
> signed up]
>
> So, under what circumstances would a user actually use 802.3ad mode to
> "increase" link throughput, rather than just for redundancy? Are there
> any circumstances in which a single file, for example, could be
> transferred at multiple-NIC speed? The 3 hashing options are:
>
As an example from my server room here: I have an install server (TFTP, FTP
and HTTP) connected by a 2x1G LACP bond to the switch. When I have multiple
clients installing simultaneously, the layer2 hash distributes the load
nicely across both NICs - I can reach saturation on both NICs together.
If I had routers between my clients and the install server, I'd need
layer2+3 hashing to spread the clients over the links, but I'd still be able
to push over a gigabit per second to the clients, despite being limited to
1GBit/s to each individual client by the packet distribution.
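To make the router point concrete, here is a rough sketch using the
simplified hash formulas from bonding.txt (all addresses made up):

N_SLAVES = 2
SERVER_MAC = 0x020000000001
ROUTER_MAC = 0x020000000002

def layer2(src_mac, dst_mac):
    return (src_mac ^ dst_mac) % N_SLAVES

def layer2_3(src_mac, dst_mac, src_ip, dst_ip):
    return (((src_ip ^ dst_ip) & 0xffff) ^ (src_mac ^ dst_mac)) % N_SLAVES

server_ip = 0x0a000001
client_ips = [0x0a000100 + i for i in range(8)]      # eight clients
client_macs = [0x020000000100 + i for i in range(8)]

# On the local segment each client's own MAC reaches the bond, so the
# layer2 hash spreads the flows over both slaves:
print({layer2(SERVER_MAC, mac) for mac in client_macs})

# Behind a router every frame is addressed to the router's MAC, so layer2
# collapses onto one slave, while layer2+3 spreads again:
print({layer2(SERVER_MAC, ROUTER_MAC) for _ in client_ips})
print({layer2_3(SERVER_MAC, ROUTER_MAC, server_ip, ip) for ip in client_ips})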
I'm sure that you can think of lots of other situations in which you have
multiple conversations sharing a link - those are the situations that gain
speed from 802.3ad.
--
Simon Farnsworth
* Re: Use of 802.3ad bonding for increasing link throughput
2011-08-10 12:07 Use of 802.3ad bonding for increasing link throughput Tom Brown
2011-08-10 13:23 ` Chris Adams
2011-08-10 13:50 ` Simon Farnsworth
@ 2011-08-10 17:46 ` Jay Vosburgh
2011-08-25 9:35 ` Simon Horman
2 siblings, 1 reply; 5+ messages in thread
From: Jay Vosburgh @ 2011-08-10 17:46 UTC (permalink / raw)
To: Tom Brown; +Cc: netdev
Tom Brown <sa212+glibc@cyconix.com> wrote:
>[couldn't thread with '802.3ad bonding brain damaged', as I've just signed
>up]
>
>So, under what circumstances would a user actually use 802.3ad mode to
>"increase" link throughput, rather than just for redundancy? Are there any
>circumstances in which a single file, for example, could be transferred at
>multiple-NIC speed?
Network load balancing, by and large, increases throughput in
aggregate, not for individual connections.
>[...] The 3 hashing options are:
>
>- layer 2: presumably this always puts traffic on the same NIC, even in a
>LAG with multiple NICs? Should layer 2 ever be used?
Perhaps the network is such that the destinations are not
bonded, and can't handle more than 1 interface's worth of throughput.
Having the "server" end bonded still lets the clients deal with a single
IP address, handles failures of devices on the server, etc.
>- layer2+3: can't be used for a single file, since it still hashes to the
>same NIC, and can't be used for load-balancing, since different IP
>endpoints go unintelligently to different NICs
>
>- layer3+4: seems to have exactly the same issue as layer2+3, as well as
>being non-compliant
>
>I guess my problem is in understanding whether the 802.3/802.1AX spec has
>any use at all beyond redundancy. Given the requirement to maintain frame
>order at the distributor, I can't immediately see how having a bonded
>group of, say, 3 NICs is any better than having 3 separate NICs. Have I
>missed something obvious?
Others have answered this part already (that it permits larger
aggregate throughput to/from the host, but not single-stream throughput
greater than one interface's worth). This is by design, to prevent out
of order delivery of packets.
An aggregate of N devices can be better than 3 individual
devices in that it will gracefully handle failure of one of the devices
in the aggregate, and permits sharing of the bandwidth in aggregate
without the peers having to be hard-coded to specific destinations.
>And, having said that, the redundancy features seem limited. For hot
>standby, when the main link fails, you have to wait for both ends to
>time out, and re-negotiate via LACP, and hopefully pick up the same
>lower-priority NIC, and then rely on a higher layer to request
>retransmission of the missing frame. Do any of you have any experience of
>using 802.1AX for anything useful and non-trivial?
In the linux implementation, as soon as the link goes down, that
port is removed from the aggregator and a new aggregator is selected
(which may be the same aggregator, depending on the option and
configuration). Language in 802.1AX section 5.3.13 permits us to
immediately remove a failed port from an aggregator without waiting for
LACP to time out.
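One quick way to watch this from userspace is to scrape
/proc/net/bonding/<bond>. A rough sketch (the exact field labels may
vary a bit between kernel versions, and "bond0" is assumed):

def bond_slaves(bond="bond0"):
    # Collect each slave's MII status and 802.3ad aggregator ID from the
    # bonding driver's /proc file.
    slaves, current = [], None
    with open("/proc/net/bonding/" + bond) as f:
        for line in f:
            key, _, value = line.strip().partition(": ")
            if key == "Slave Interface":
                current = {"name": value}
                slaves.append(current)
            elif current is not None and key == "MII Status":
                current["mii"] = value
            elif current is not None and key == "Aggregator ID":
                current["aggregator"] = value
    return slaves

for slave in bond_slaves():
    print(slave)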
>So, to get multiple-NIC speed, are we stuck with balance-rr? But
>presumably this only works if the other end of the link is also running
>the bonding driver?
Striping a single connection across multiple network interfaces
is very difficult to do without causing packets to be delivered out of
order.
Now, that said, if you want to have one TCP connection utilize
more than one interface's worth of throughput, then yes, balance-rr is
the only mode that may do that. The other end doesn't have to run
bonding, but it must have sufficient aggregate bandwidth to accommodate
the aggregate rate (e.g., N slower devices feeding into one faster
device).
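As a starting point, a minimal sketch of bringing up a balance-rr bond
through the sysfs interface described in bonding.txt (needs root and
the bonding module loaded; the NIC names are placeholders):

import subprocess

def sysfs_write(path, value):
    with open(path, "w") as f:
        f.write(value)

sysfs_write("/sys/class/net/bonding_masters", "+bond0")        # create bond0
sysfs_write("/sys/class/net/bond0/bonding/mode", "balance-rr") # set mode while the bond is down
sysfs_write("/sys/class/net/bond0/bonding/miimon", "100")      # link monitoring interval, ms
subprocess.run(["ip", "link", "set", "bond0", "up"], check=True)
for nic in ("eth0", "eth1"):                                   # placeholder slave names
    subprocess.run(["ip", "link", "set", nic, "down"], check=True)  # slaves must be down to enslave
    sysfs_write("/sys/class/net/bond0/bonding/slaves", "+" + nic)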
Balance-rr itself can be tricky to configure. An
unmanaged switch may not handle multiple ports with the same MAC address
very well (e.g., sending everything to one port, or sending everything
to all the ports). A managed switch must have the relevant ports
configured for etherchannel ("static link aggregation" in some
documentation), and the switch may balance the traffic when it leaves
the switch using its transmit algorithm. I'm not aware of any switches
that have a round-robin balance policy, so the switch may end up hashing
your traffic anyway (which will probably drop some of your packets,
because you're feeding them in faster than the switch can send them out
after they're hashed to one switch port).
It's possible to play games on managed switches and, e.g., put
each pair of ports (one at each end) into a separate VLAN, but schemes
like that will fail badly if a link goes down somewhere.
If each member of the bond goes through a different unmanaged
and not interconnected switch, that may avoid those issues (and this was
a common configuration back in the 10 Mb/sec days; it's described in
bonding.txt in more detail). That configuration still has issues if a
link fails. Connecting systems directly, back-to-back, should also
avoid those issues.
Lastly, balance-rr will deliver traffic out of order. Even the
best case, N slow links feeding one faster link, delivers some small
percentage out of order (in the low single digits).
On Linux, the tcp_reordering sysctl value can be raised to
compensate, but it will still result in increased packet overhead, and
is not likely to be very efficient, and doesn't help with anything
that's not TCP/IP. I have not tested balance-rr in a few years now, but
my recollection is that, as a best case, throughput of one TCP
connection could reach about 1.5x with 2 slaves, or about 2.5x with 4
slaves (where the multipliers are in units of "bandwidth of one slave").
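For example, roughly (the value here is just an example; writing it
needs root):

# Bump net.ipv4.tcp_reordering to tolerate more reordering per connection.
PATH = "/proc/sys/net/ipv4/tcp_reordering"
with open(PATH) as f:
    print("current:", f.read().strip())   # default is 3
with open(PATH, "w") as f:
    f.write("10")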
-J
---
-Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com
* Re: Use of 802.3ad bonding for increasing link throughput
2011-08-10 17:46 ` Jay Vosburgh
@ 2011-08-25 9:35 ` Simon Horman
0 siblings, 0 replies; 5+ messages in thread
From: Simon Horman @ 2011-08-25 9:35 UTC (permalink / raw)
To: Jay Vosburgh; +Cc: Tom Brown, netdev
On Wed, Aug 10, 2011 at 10:46:12AM -0700, Jay Vosburgh wrote:
[snip]
> On Linux, the tcp_reordering sysctl value can be raised to
> compensate, but it will still result in increased packet overhead, and
> is not likely to be very efficient, and doesn't help with anything
> that's not TCP/IP. I have not tested balance-rr in a few years now, but
> my recollection is that, as a best case, throughput of one TCP
> connection could reach about 1.5x with 2 slaves, or about 2.5x with 4
> slaves (where the multipliers are in units of "bandwidth of one slave").
Hi Jay,
for what it is worth I would like to chip in with the results of some
testing I did using balance-rr and 3 gigabit NICs late last year. The
link was three direct ("cross-over") cables to a machine that was also
using balance-rr.
I found that by both increasing rx-usecs (from 3 to 45) and enabling GRO
and TSO I was able to push 2.7*10^9 bits/s.
Local CPU utilisation was 30% and remote CPU utilisation was 10%.
Local service demand was 1.7 us/KB and remote service demand was 2.2 us/KB.
The MTU was 1500 bytes.
In this configuration, with the tuning options described above, increasing
tcp_reordering (to 127) did not have a noticeable effect on throughput but
did increase local CPU utilisation to about 50% and local service demand to
3.0 us/KB. There was also increased remote CPU utilisation and service
demand, although not as significant.
By using a 9000 byte MTU I was able to get close to 3*10^9 bits/s
with other parameters at their default values.
Local CPU utilisation was 15% and remote CPU utilisation was 5%.
Local service demand was 0.8 us/KB and remote service demand was 1.1 us/KB.
Increasing rx-usecs was suggested to me by Eric Dumazet on this list.
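In case it is useful, this is roughly what that tuning amounts to,
applied per slave (interface names are placeholders; the coalescing and
offload settings depend on driver support and need root):

import subprocess

def run(*cmd):
    subprocess.run(cmd, check=True)

for nic in ("eth0", "eth1", "eth2"):             # the three bonded gigabit NICs
    run("ethtool", "-C", nic, "rx-usecs", "45")  # interrupt coalescing, up from 3
    run("ethtool", "-K", nic, "gro", "on")       # generic receive offload
    run("ethtool", "-K", nic, "tso", "on")       # TCP segmentation offload

# For the jumbo-frame variant; the bond propagates its MTU to the slaves.
run("ip", "link", "set", "dev", "bond0", "mtu", "9000")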
I no longer have access to the systems that I used to run these tests but I
do have other results that I have omitted from this email for the sake of
brevity.
Anecdotally my opinion after running these and other tests is that if you
want to push more than a gigabit/s over a single TCP stream then you would
be well advised to get a faster link rather than bond gigabit devices. I
believe you stated something similar earlier on in this thread.