* 802.3ad bonding brain damaged?
From: Phillip Susi @ 2011-08-07 19:52 UTC
To: netdev

From Documentation/networking/bonding.txt:

    Additionally, the linux bonding 802.3ad implementation
    distributes traffic by peer (using an XOR of MAC addresses),

This is counter to the entire point of 802.3ad. Distributing traffic by
hash of the destination address is poor man's load balancing for
systems not supporting 802.3ad. When in 802.3ad mode, packets are supposed to
be queued to whichever interface has the shortest tx length so a single
stream to a single host can be balanced across all links instead of
being restricted to one, while the other is idle.
* Re: 802.3ad bonding brain damaged?
From: David Lamparter @ 2011-08-08 7:57 UTC
To: Phillip Susi; +Cc: netdev

On Sunday, 07.08.2011 at 15:52 -0400, Phillip Susi wrote:
> From Documentation/networking/bonding.txt:
>
>     Additionally, the linux bonding 802.3ad implementation
>     distributes traffic by peer (using an XOR of MAC addresses),
>
> This is counter to the entire point of 802.3ad. Distributing traffic by
> hash of the destination address is poor man's load balancing for
> systems not supporting 802.3ad.

No, it isn't. 802.3ad/.1AX explicitly requires that no packet
re-ordering may ever occur, which can only be guaranteed by enqueueing
packets for one host on one TX interface. This behaviour is mandated by
802.1AX-2008 page 15, which reads:

  This standard does not mandate any particular distribution
  algorithm(s); however, any distribution algorithm shall ensure that,
  when frames are received by a Frame Collector as specified in 5.2.3,
  the algorithm shall not cause
  a) Misordering of frames that are part of any given conversation, or
  b) Duplication of frames.
| The above requirement to maintain frame ordering is met by ensuring
| that all frames that compose a given conversation are transmitted on a
| single link in the order that they are generated by the MAC Client;
  hence, this requirement does not involve the addition (or
  modification) of any information to the MAC frame, nor any buffering
  or processing on the part of the corresponding Frame Collector in
  order to reorder frames. This approach to the operation of the
  distribution function permits a wide variety of distribution and load
  balancing algorithms to be used, while also ensuring interoperability
  between devices that adopt differing algorithms.

(IMHO it is 802.3ad/.1AX that is brain damaged; I would've made this a
weak requirement with an optional mode switch for networks that require
a strong one.)

It is true that it might be possible to fulfill this requirement with a
more sophisticated approach or just ignore edge cases, but in that case
you can just use a different bonding algorithm.

> When in 802.3ad mode, packets are supposed to
> be queued to whichever interface has the shortest tx length so a single
> stream to a single host can be balanced across all links instead of
> being restricted to one, while the other is idle.

Please make sure to tag your assumptions as such and don't make
assertions without reading the specification.

-David
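The per-peer distribution quoted from bonding.txt can be sketched as below. This is only a minimal illustration, assuming the layer2 formula as bonding.txt describes it (source MAC XOR destination MAC, modulo slave count); the function names and MAC addresses are invented for the example, and the kernel's actual code may differ in detail.

```python
def mac_to_int(mac: str) -> int:
    """Parse a colon-separated MAC address into an integer."""
    return int.from_bytes(bytes(int(part, 16) for part in mac.split(":")), "big")


def layer2_slave(src_mac: str, dst_mac: str, slave_count: int) -> int:
    """Pick a slave index from (source MAC XOR destination MAC) % slave count.

    Assumption: this mirrors the formula documented for the layer2 policy,
    not necessarily the exact kernel implementation.
    """
    return (mac_to_int(src_mac) ^ mac_to_int(dst_mac)) % slave_count


# Hypothetical hosts, for illustration only.
HOST_A = "00:11:22:33:44:01"
HOST_B = "00:11:22:33:44:02"
HOST_C = "00:11:22:33:44:03"

# Every frame from A to B lands on the same slave, whatever the TCP/UDP port,
# so a single peer can never use more than one link's worth of bandwidth.
link_to_b = layer2_slave(HOST_A, HOST_B, 2)
# A different peer may (or may not) hash to a different slave.
link_to_c = layer2_slave(HOST_A, HOST_C, 2)
```

Because the hash depends only on the two MAC addresses, ordering within any conversation between A and B is trivially preserved, at the cost of never spreading that pair's traffic across links.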
* Re: 802.3ad bonding brain damaged?
From: Eric Dumazet @ 2011-08-08 15:59 UTC
To: David Lamparter; +Cc: Phillip Susi, netdev

On Monday, 08 August 2011 at 09:57 +0200, David Lamparter wrote:
> On Sunday, 07.08.2011 at 15:52 -0400, Phillip Susi wrote:
> > From Documentation/networking/bonding.txt:
> >
> >     Additionally, the linux bonding 802.3ad implementation
> >     distributes traffic by peer (using an XOR of MAC addresses),
> >
> > This is counter to the entire point of 802.3ad. Distributing traffic by
> > hash of the destination address is poor man's load balancing for
> > systems not supporting 802.3ad.
>
> No, it isn't. 802.3ad/.1AX explicitly requires that no packet
> re-ordering may ever occur, which can only be guaranteed by enqueueing
> packets for one host on one TX interface. This behaviour is mandated by
> 802.1AX-2008 page 15, which reads:
>
>   This standard does not mandate any particular distribution
>   algorithm(s); however, any distribution algorithm shall ensure that,
>   when frames are received by a Frame Collector as specified in 5.2.3,
>   the algorithm shall not cause
>   a) Misordering of frames that are part of any given conversation, or
>   b) Duplication of frames.
> | The above requirement to maintain frame ordering is met by ensuring
> | that all frames that compose a given conversation are transmitted on a
> | single link in the order that they are generated by the MAC Client;
>   hence, this requirement does not involve the addition (or
>   modification) of any information to the MAC frame, nor any buffering
>   or processing on the part of the corresponding Frame Collector in
>   order to reorder frames. This approach to the operation of the
>   distribution function permits a wide variety of distribution and load
>   balancing algorithms to be used, while also ensuring interoperability
>   between devices that adopt differing algorithms.

It all depends on the definition of 'conversation'.

Phillip assumed two (or more) TCP flows from machine A to machine B
could use two different links, while you assert they MUST use a single
link.
* Re: 802.3ad bonding brain damaged?
From: Jay Vosburgh @ 2011-08-08 16:44 UTC
To: Eric Dumazet; +Cc: David Lamparter, Phillip Susi, netdev

Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Monday, 08 August 2011 at 09:57 +0200, David Lamparter wrote:
>> On Sunday, 07.08.2011 at 15:52 -0400, Phillip Susi wrote:
>> > From Documentation/networking/bonding.txt:
>> >
>> >     Additionally, the linux bonding 802.3ad implementation
>> >     distributes traffic by peer (using an XOR of MAC addresses),
>> >
>> > This is counter to the entire point of 802.3ad. Distributing traffic by
>> > hash of the destination address is poor man's load balancing for
>> > systems not supporting 802.3ad.
>>
>> No, it isn't. 802.3ad/.1AX explicitly requires that no packet
>> re-ordering may ever occur, which can only be guaranteed by enqueueing
>> packets for one host on one TX interface. This behaviour is mandated by
>> 802.1AX-2008 page 15, which reads:
>>
>>   This standard does not mandate any particular distribution
>>   algorithm(s); however, any distribution algorithm shall ensure that,
>>   when frames are received by a Frame Collector as specified in 5.2.3,
>>   the algorithm shall not cause
>>   a) Misordering of frames that are part of any given conversation, or
>>   b) Duplication of frames.
>> | The above requirement to maintain frame ordering is met by ensuring
>> | that all frames that compose a given conversation are transmitted on a
>> | single link in the order that they are generated by the MAC Client;
>>   hence, this requirement does not involve the addition (or
>>   modification) of any information to the MAC frame, nor any buffering
>>   or processing on the part of the corresponding Frame Collector in
>>   order to reorder frames. This approach to the operation of the
>>   distribution function permits a wide variety of distribution and load
>>   balancing algorithms to be used, while also ensuring interoperability
>>   between devices that adopt differing algorithms.
>
> It all depends on the definition of 'conversation'.

The definition from 802.1AX is:

    3.8 conversation: A set of frames transmitted from one end station
    to another, where all of the frames form an ordered sequence, and
    where the communicating end stations require the ordering to be
    maintained among the set of frames exchanged. (See IEEE Std 802.1AX,
    Clause 5.)

So, basically, a TCP connection or a sequence of UDP datagrams from one
IP.port to another and optionally the reverse.

> Phillip assumed two (or more) TCP flows from machine A to machine B
> could use two different links, while you assert they MUST use a single
> link.

The standard permits us to place separate conversations on different
ports, even if they are going to the same MAC destination.

802.1AX 5.2.1:

    f) Frame ordering must be maintained for certain sequences of frame
       exchanges between MAC Clients (known as conversations, see
       Clause 3). The Distributor ensures that all frames of a given
       conversation are passed to a single port. For any given port, the
       Collector is required to pass frames to the MAC Client in the
       order that they are received from that port. The Collector is
       otherwise free to select frames received from the aggregated
       ports in any order. Since there are no means for frames to be
       misordered on a single link, this guarantees that frame ordering
       is maintained for any conversation.
    g) Conversations may be moved among ports within an aggregation,
       both for load balancing and to maintain availability in the event
       of link failures.

The standard requires ordering for frames within any one conversation,
but does not require ordering of frames between conversations.

The layer2 (MAC) and layer3 (MAC + IP) hashes in bonding are compliant
with this. The layer3+4 (IP + TCP/UDP port) hash is not, because
fragmented datagrams will hash differently than unfragmented datagrams.
I've not heard that this noncompliance has been a problem in actual
practice.

-J

---
-Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com
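The fragmentation caveat for the layer3+4 hash can be illustrated with a small sketch. It assumes the layer3+4 formula as I recall it from bonding.txt of that period, ((source port XOR dest port) XOR ((source IP XOR dest IP) AND 0xffff)) modulo slave count, with the port terms dropped for fragmented packets; the function names, addresses, and ports are hypothetical, and the real kernel code may differ.

```python
import ipaddress


def ip_xor16(src_ip: str, dst_ip: str) -> int:
    """Low 16 bits of (source IP XOR destination IP)."""
    src = int(ipaddress.IPv4Address(src_ip))
    dst = int(ipaddress.IPv4Address(dst_ip))
    return (src ^ dst) & 0xFFFF


def layer34_slave(src_ip: str, dst_ip: str, src_port: int, dst_port: int,
                  slave_count: int, fragmented: bool = False) -> int:
    """Pick a slave index for one packet under the assumed layer3+4 formula.

    Fragments are hashed on the IP addresses alone (port terms omitted),
    which is the behaviour the message above describes.
    """
    if fragmented:
        return ip_xor16(src_ip, dst_ip) % slave_count
    return ((src_port ^ dst_port) ^ ip_xor16(src_ip, dst_ip)) % slave_count


A, B = "10.0.0.1", "10.0.0.2"  # hypothetical hosts

# Two conversations between the same pair of hosts can use different links.
flow1 = layer34_slave(A, B, 40001, 80, 2)
flow2 = layer34_slave(A, B, 40000, 80, 2)

# But fragments of flow1 hash on IP only and may land on another link than
# flow1's unfragmented packets, which can misorder that one conversation.
flow1_fragment = layer34_slave(A, B, 40001, 80, 2, fragmented=True)
```

With these example values the two flows are spread over both links, while fragments belonging to the first flow end up on the other link from its unfragmented packets, which is the ordering hazard that makes the policy non-compliant.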
* Re: 802.3ad bonding brain damaged?
From: Phillip Susi @ 2011-08-08 20:06 UTC
To: David Lamparter; +Cc: netdev

On 8/8/2011 3:57 AM, David Lamparter wrote:
> No, it isn't. 802.3ad/.1AX explicitly requires that no packet
> re-ordering may ever occur, which can only be guaranteed by enqueueing
> packets for one host on one TX interface. This behaviour is mandated by
> 802.1AX-2008 page 15, which reads:

Ouch, that does cause a big problem for store-and-forward switching.
You basically can't split up packets from a single stream without very
careful cut-through switching, which we obviously can't do in Linux.
That seems a rather silly requirement given that higher level protocols
already deal with packet reordering. Why not an option to say stuff the
standard?
* Re: 802.3ad bonding brain damaged?
From: Chris Adams @ 2011-08-08 20:08 UTC
To: netdev

Once upon a time, Phillip Susi <psusi@cfl.rr.com> said:
> On 8/8/2011 3:57 AM, David Lamparter wrote:
> > No, it isn't. 802.3ad/.1AX explicitly requires that no packet
> > re-ordering may ever occur, which can only be guaranteed by enqueueing
> > packets for one host on one TX interface. This behaviour is mandated by
> > 802.1AX-2008 page 15, which reads:
>
> Ouch, that does cause a big problem for store-and-forward switching.
> You basically can't split up packets from a single stream without very
> careful cut-through switching, which we obviously can't do in Linux.
> That seems a rather silly requirement given that higher level protocols
> already deal with packet reordering. Why not an option to say stuff the
> standard?

Packet reordering introduces jitter, which is bad for things like VOIP.

--
Chris Adams <cmadams@hiwaay.net>
Systems and Network Administrator - HiWAAY Internet Services
I don't speak for anybody but myself - that's enough trouble.
* Re: 802.3ad bonding brain damaged?
From: Chris Friesen @ 2011-08-08 20:14 UTC
To: Phillip Susi; +Cc: David Lamparter, netdev

On 08/08/2011 02:06 PM, Phillip Susi wrote:
> On 8/8/2011 3:57 AM, David Lamparter wrote:
>> No, it isn't. 802.3ad/.1AX explicitly requires that no packet
>> re-ordering may ever occur, which can only be guaranteed by enqueueing
>> packets for one host on one TX interface. This behaviour is mandated by
>> 802.1AX-2008 page 15, which reads:
>
> Ouch, that does cause a big problem for store-and-forward switching.
> You basically can't split up packets from a single stream without very
> careful cut-through switching, which we obviously can't do in Linux.
> That seems a rather silly requirement given that higher level protocols
> already deal with packet reordering. Why not an option to say stuff the
> standard?

Bonding doesn't know about "higher level protocols". Also, assuming that
higher level protocols already deal with reordering can be dangerous.
I've dealt with network protocols and apps that assumed there would be
no reordering because at the time they were written they used
point-to-point links. They actually work fairly well with single links,
so it would be reasonable to try and keep them working with bonded links.

Chris

--
Chris Friesen
Software Developer
GENBAND
chris.friesen@genband.com
www.genband.com
* Re: 802.3ad bonding brain damaged?
From: Phillip Susi @ 2011-08-08 20:32 UTC
To: Chris Friesen; +Cc: David Lamparter, netdev

On 8/8/2011 4:14 PM, Chris Friesen wrote:
> Bonding doesn't know about "higher level protocols". Also, assuming that
> higher level protocols already deal with reordering can be dangerous.
> I've dealt with network protocols and apps that assumed there would be
> no reordering because at the time they were written they used
> point-to-point links. They actually work fairly well with single links,
> so it would be reasonable to try and keep them working with bonded links.

Try, sure, but if you can't without seriously affecting performance,
then having a knob for damn the torpedoes, full speed ahead mode seems
reasonable.

I wonder how it is that people have reported that Windows machines
manage to do this? Come to think of it, can Windows even bond in
software? Maybe it's only possible on Windows with dual port cards
where the drivers and hardware can make sure that the bonded interfaces
service a single queue and maintain ordering that way?
* Re: 802.3ad bonding brain damaged?
From: Ben Hutchings @ 2011-08-08 20:42 UTC
To: Phillip Susi; +Cc: Chris Friesen, David Lamparter, netdev

On Mon, 2011-08-08 at 16:32 -0400, Phillip Susi wrote:
> On 8/8/2011 4:14 PM, Chris Friesen wrote:
> > Bonding doesn't know about "higher level protocols". Also, assuming that
> > higher level protocols already deal with reordering can be dangerous.
> > I've dealt with network protocols and apps that assumed there would be
> > no reordering because at the time they were written they used
> > point-to-point links. They actually work fairly well with single links,
> > so it would be reasonable to try and keep them working with bonded links.
>
> Try, sure, but if you can't without seriously affecting performance,
> then having a knob for damn the torpedoes, full speed ahead mode seems
> reasonable.
>
> I wonder how it is that people have reported that Windows machines
> manage to do this? Come to think of it, can Windows even bond in
> software? Maybe it's only possible on Windows with dual port cards
> where the drivers and hardware can make sure that the bonded interfaces
> service a single queue and maintain ordering that way?

Microsoft doesn't provide a generic bonding driver for Windows. (This
is probably a sensible choice, considering how many different things
people expect the Linux bonding driver to do.) Some hardware vendors
provide bonding or 'teaming' drivers that work with their own hardware,
and sometimes with other drivers as well. So if people report that
'Windows machines manage to do this' then you need to ask those people
*which* driver they are using.

Ben.

--
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.
* Re: 802.3ad bonding brain damaged?
From: Benny Amorsen @ 2011-08-09 11:24 UTC
To: Phillip Susi; +Cc: Chris Friesen, David Lamparter, netdev

Phillip Susi <psusi@cfl.rr.com> writes:
> Try, sure, but if you can't without seriously affecting performance,
> then having a knob for damn the torpedoes, full speed ahead mode seems
> reasonable.

Packet reordering often affects performance. It can easily be more
costly than losing half the bandwidth of a bundled link. These days you
are even dependent on the NIC firmware for good performance with TCP for
reordered packets -- if the NIC is bad at handling them, you don't get
any performance boost from GRO. For UDP, applications have to handle it
on their own.

/Benny
* Re: 802.3ad bonding brain damaged?
From: Rick Jones @ 2011-08-08 20:54 UTC
To: Phillip Susi; +Cc: David Lamparter, netdev

On 08/08/2011 01:06 PM, Phillip Susi wrote:
> On 8/8/2011 3:57 AM, David Lamparter wrote:
>> No, it isn't. 802.3ad/.1AX explicitly requires that no packet
>> re-ordering may ever occur, which can only be guaranteed by enqueueing
>> packets for one host on one TX interface. This behaviour is mandated by
>> 802.1AX-2008 page 15, which reads:
>
> Ouch, that does cause a big problem for store-and-forward switching.
> You basically can't split up packets from a single stream without very
> careful cut-through switching, which we obviously can't do in Linux.
> That seems a rather silly requirement given that higher level protocols
> already deal with packet reordering. Why not an option to say stuff the
> standard?

Even in the case of protocols that deal with packet reordering, it is
still quite possible to be sub-optimal. Try running a TCP_STREAM test
through a mode-rr bond with 4 or more links in it. I suspect that even
without injecting the occasional "other" packet there can be enough
re-ordering to trigger spurious fast retransmissions. At the very least
it will trigger lots of immediate ACKnowledgements, which will drive up
the CPU utilization per KB transferred. And if these spread packets
arrive still spread at the receiver, round-robin will probably preclude
effective GRO and certainly preclude LRO.

Apart from some very carefully controlled conditions, if one needs a
single flow to go faster than a single link, it is probably time to move
up to the next higher link speed.

rick jones
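The reordering effect of spraying one flow round-robin across links can be seen in a toy model. This is only a sketch under invented assumptions: packets are sent at a fixed interval, each link adds a fixed one-way delay, and the delay values are made up; it is not a measurement of a real mode-rr bond, and the function names are hypothetical.

```python
from typing import List, Sequence


def arrival_order(num_packets: int, link_delays: Sequence[float],
                  send_interval: float = 1.0) -> List[int]:
    """Return sequence numbers in arrival order for round-robin transmit.

    Packet `seq` is sent at time seq * send_interval on link
    seq % len(link_delays) and arrives after that link's fixed delay.
    """
    arrivals = []
    for seq in range(num_packets):
        link = seq % len(link_delays)
        arrivals.append((seq * send_interval + link_delays[link], seq))
    arrivals.sort()
    return [seq for _, seq in arrivals]


def count_out_of_order(order: Sequence[int]) -> int:
    """Count packets that arrive after a higher sequence number was seen."""
    highest = -1
    late = 0
    for seq in order:
        if seq < highest:
            late += 1  # a TCP receiver would send an immediate duplicate ACK here
        else:
            highest = seq
    return late


# One flow over two links whose delays differ by more than the send interval.
skewed = arrival_order(8, [0.0, 2.5])
# The same flow pinned to a single link, as a per-conversation hash would do.
pinned = arrival_order(8, [0.0])
```

In this model every packet sent on the slower link arrives after its successor, while the single-link case stays in order; with more links and real queueing the pattern is less regular, which is where the spurious fast retransmissions mentioned above would come from.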