netdev.vger.kernel.org archive mirror
* 802.3ad bonding brain damaged?
@ 2011-08-07 19:52 Phillip Susi
  2011-08-08  7:57 ` David Lamparter
  0 siblings, 1 reply; 11+ messages in thread
From: Phillip Susi @ 2011-08-07 19:52 UTC (permalink / raw)
  To: netdev

From Documentation/networking/bonding.txt:

	Additionally, the linux bonding 802.3ad implementation
	distributes traffic by peer (using an XOR of MAC addresses),

This is counter to the entire point of 802.3ad.  Distributing traffic by
hash of the destination address is poor man's load balancing for systems
not supporting 802.3ad.  When in 802.3ad mode, packets are supposed to
be queued to whichever interface has the shortest tx queue length, so a
single stream to a single host can be balanced across all links instead
of being restricted to one while the other is idle.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: 802.3ad bonding brain damaged?
  2011-08-07 19:52 802.3ad bonding brain damaged? Phillip Susi
@ 2011-08-08  7:57 ` David Lamparter
  2011-08-08 15:59   ` Eric Dumazet
  2011-08-08 20:06   ` Phillip Susi
  0 siblings, 2 replies; 11+ messages in thread
From: David Lamparter @ 2011-08-08  7:57 UTC (permalink / raw)
  To: Phillip Susi; +Cc: netdev

On Sunday, 07.08.2011, at 15:52 -0400, Phillip Susi wrote:
> - From Documentation/networking/bonding.txt:
> 
> 	Additionally, the linux bonding 802.3ad implementation
> 	distributes traffic by peer (using an XOR of MAC addresses),
> 
> This is counter to the entire point of 802.3ad. Distributing traffic by
> hash of the destination address is poor mans load balancing for
> systems not supporting 802.3ad. 

No, it isn't. 802.3ad/.1AX explicitly requires that no packet
re-ordering may ever occur, which can only be guaranteed by enqueueing
packets for one host on one TX interface. This behaviour is mandated by
802.1AX-2008 page 15 which reads:

  This standard does not mandate any particular distribution
  algorithm(s); however, any distribution algorithm shall ensure that,
  when frames are received by a Frame Collector as specified in 5.2.3,
  the algorithm shall not cause
  a) Misordering of frames that are part of any given conversation, or
  b) Duplication of frames.
| The above requirement to maintain frame ordering is met by ensuring
| that all frames that compose a given conversation are transmitted on a
| single link in the order that they are generated by the MAC Client;
  hence, this requirement does not involve the addition (or
  modification) of any information to the MAC frame, nor any buffering
  or processing on the part of the corresponding Frame Collector in
  order to reorder frames. This approach to the operation of the
  distribution function permits a wide variety of distribution and load
  balancing algorithms to be used, while also ensuring interoperability
  between devices that adopt differing algorithms.

(IMHO it is 802.3ad/.1AX that is brain damaged, I would've made this a
weak requirement with an optional mode switch for networks that require
a strong one.)

It is true that it might be possible to fulfill this requirement with a
more sophisticated approach or just ignore edge cases, but in that case
you can just use a different bonding algorithm.
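
For illustration, the per-peer distribution that bonding.txt describes can
be sketched roughly as follows. This is a minimal sketch, not the kernel
code: `layer2_hash`, `mac_to_bytes` and the sample addresses are
hypothetical names, and it assumes the documented layer2 formula, (source
MAC XOR destination MAC) modulo slave count, applied to the last byte of
each address.

```python
def mac_to_bytes(mac: str) -> bytes:
    """Parse 'aa:bb:cc:dd:ee:ff' into six raw bytes."""
    return bytes(int(part, 16) for part in mac.split(":"))


def layer2_hash(src_mac: str, dst_mac: str, slave_count: int) -> int:
    """Pick a slave index from the two MAC addresses.

    Assumption: only the last byte of each address is XORed, then the
    result is reduced modulo the number of slaves.
    """
    src = mac_to_bytes(src_mac)
    dst = mac_to_bytes(dst_mac)
    return (src[5] ^ dst[5]) % slave_count


# Hypothetical addresses, chosen only for the example.
LOCAL = "00:11:22:33:44:01"
PEER_A = "00:11:22:33:44:10"
PEER_B = "00:11:22:33:44:11"

# Every frame to PEER_A maps to the same link, so frames to that peer
# cannot be reordered by the bond itself.
links_a = {layer2_hash(LOCAL, PEER_A, 2) for _ in range(1000)}

# A different peer may land on the other link.
link_b = layer2_hash(LOCAL, PEER_B, 2)
```

The cost is visible in the same sketch: all traffic to one peer is pinned
to one link, no matter how many separate flows it consists of.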

> When in 802.3ad mode, packets are supposed to
> be queued to whichever interface has the shortest tx length so a single
> stream to a single host can be balanced across all links instead of
> being restricted to one, while the other is idle.

Please make sure to tag your assumptions as such and don't make
assertions without reading the specification.


-David



* Re: 802.3ad bonding brain damaged?
  2011-08-08  7:57 ` David Lamparter
@ 2011-08-08 15:59   ` Eric Dumazet
  2011-08-08 16:44     ` Jay Vosburgh
  2011-08-08 20:06   ` Phillip Susi
  1 sibling, 1 reply; 11+ messages in thread
From: Eric Dumazet @ 2011-08-08 15:59 UTC (permalink / raw)
  To: David Lamparter; +Cc: Phillip Susi, netdev

On Monday, 08 August 2011, at 09:57 +0200, David Lamparter wrote:
> On Sunday, 07.08.2011, at 15:52 -0400, Phillip Susi wrote:
> > - From Documentation/networking/bonding.txt:
> > 
> > 	Additionally, the linux bonding 802.3ad implementation
> > 	distributes traffic by peer (using an XOR of MAC addresses),
> > 
> > This is counter to the entire point of 802.3ad. Distributing traffic by
> > hash of the destination address is poor mans load balancing for
> > systems not supporting 802.3ad. 
> 
> No, it isn't. 802.3ad/.1AX explicitly requires that no packet
> re-ordering may ever occur, which can only be guaranteed by enqueueing
> packets for one host on one TX interface. This behaviour is mandated by
> 802.1AX-2008 page 15 which reads:
> 
>   This standard does not mandate any particular distribution
>   algorithm(s); however, any distribution algorithm shall ensure that,
>   when frames are received by a Frame Collector as specified in 5.2.3,
>   the algorithm shall not cause
>   a) Misordering of frames that are part of any given conversation, or
>   b) Duplication of frames.
> | The above requirement to maintain frame ordering is met by ensuring
> | that all frames that compose a given conversation are transmitted on a
> | single link in the order that they are generated by the MAC Client;
>   hence, this requirement does not involve the addition (or
>   modification) of any information to the MAC frame, nor any buffering
>   or processing on the part of the corresponding Frame Collector in
>   order to reorder frames. This approach to the operation of the
>   distribution function permits a wide variety of distribution and load
>   balancing algorithms to be used, while also ensuring interoperability
>   between devices that adopt differing algorithms.
> 

It all depends on the definition of 'conversation'

Phillip assumed two (or more) TCP flows from machine A to machine B
could use two different links, while you assert they MUST use a single
link.





* Re: 802.3ad bonding brain damaged?
  2011-08-08 15:59   ` Eric Dumazet
@ 2011-08-08 16:44     ` Jay Vosburgh
  0 siblings, 0 replies; 11+ messages in thread
From: Jay Vosburgh @ 2011-08-08 16:44 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Lamparter, Phillip Susi, netdev

Eric Dumazet <eric.dumazet@gmail.com> wrote:

>On Monday, 08 August 2011, at 09:57 +0200, David Lamparter wrote:
>> On Sunday, 07.08.2011, at 15:52 -0400, Phillip Susi wrote:
>> > - From Documentation/networking/bonding.txt:
>> > 
>> > 	Additionally, the linux bonding 802.3ad implementation
>> > 	distributes traffic by peer (using an XOR of MAC addresses),
>> > 
>> > This is counter to the entire point of 802.3ad. Distributing traffic by
>> > hash of the destination address is poor mans load balancing for
>> > systems not supporting 802.3ad. 
>> 
>> No, it isn't. 802.3ad/.1AX explicitly requires that no packet
>> re-ordering may ever occur, which can only be guaranteed by enqueueing
>> packets for one host on one TX interface. This behaviour is mandated by
>> 802.1AX-2008 page 15 which reads:
>> 
>>   This standard does not mandate any particular distribution
>>   algorithm(s); however, any distribution algorithm shall ensure that,
>>   when frames are received by a Frame Collector as specified in 5.2.3,
>>   the algorithm shall not cause
>>   a) Misordering of frames that are part of any given conversation, or
>>   b) Duplication of frames.
>> | The above requirement to maintain frame ordering is met by ensuring
>> | that all frames that compose a given conversation are transmitted on a
>> | single link in the order that they are generated by the MAC Client;
>>   hence, this requirement does not involve the addition (or
>>   modification) of any information to the MAC frame, nor any buffering
>>   or processing on the part of the corresponding Frame Collector in
>>   order to reorder frames. This approach to the operation of the
>>   distribution function permits a wide variety of distribution and load
>>   balancing algorithms to be used, while also ensuring interoperability
>>   between devices that adopt differing algorithms.
>> 
>
>It all depends on the definition of 'conversation'

	The definition from 802.1AX is:

3.8 conversation: A set of frames transmitted from one end station to
another, where all of the frames form an ordered sequence, and where the
communicating end stations require the ordering to be maintained among
the set of frames exchanged. (See IEEE Std 802.1AX, Clause 5.)

	So, basically, a TCP connection or a sequence of UDP datagrams
from one IP.port to another and optionally the reverse.

>Phillip assumed two (or more) TCP flows from machine A to machine B
>could use two different links, while you assert they MUST use a single
>link.

	The standard permits us to place separate conversations on
different ports, even if they are going to the same MAC destination.  

	802.1AX 5.2.1:

f) Frame ordering must be maintained for certain sequences of frame
exchanges between MAC Clients (known as conversations, see Clause
3). The Distributor ensures that all frames of a given conversation are
passed to a single port. For any given port, the Collector is required
to pass frames to the MAC Client in the order that they are received
from that port. The Collector is otherwise free to select frames
received from the aggregated ports in any order. Since there are no
means for frames to be misordered on a single link, this guarantees that
frame ordering is maintained for any conversation.

g) Conversations may be moved among ports within an aggregation, both
for load balancing and to maintain availability in the event of link
failures.

	The standard requires ordering for frames within any one
conversation, but does not require ordering of frames between
conversations.
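
	A toy model of the Distributor/Collector behaviour in f) and g)
may make this concrete. It is only a sketch under stated assumptions: the
names `distribute`, `collect` and `in_order_per_conversation` are
hypothetical, conversations are plain integers, and the port is chosen by
a simple modulo rather than by any hash the bonding driver uses.

```python
import random
from collections import deque


def distribute(frames, port_count):
    """Distributor: put every frame of a conversation on one port."""
    ports = [deque() for _ in range(port_count)]
    for conv, seq in frames:
        ports[conv % port_count].append((conv, seq))
    return ports


def collect(ports, rng):
    """Collector: FIFO within a port, arbitrary choice between ports."""
    received = []
    pending = [p for p in ports if p]
    while pending:
        port = rng.choice(pending)
        received.append(port.popleft())
        pending = [p for p in ports if p]
    return received


def in_order_per_conversation(received):
    """True if sequence numbers never go backwards within a conversation."""
    last = {}
    for conv, seq in received:
        if seq < last.get(conv, -1):
            return False
        last[conv] = seq
    return True


rng = random.Random(0)
# Four interleaved conversations, 50 frames each.
frames = [(conv, seq) for seq in range(50) for conv in range(4)]
received = collect(distribute(frames, 2), rng)
ordered = in_order_per_conversation(received)
```

	Frames from different conversations interleave differently on
each run of the Collector, but each conversation's own frames stay in
order because they share a single port.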

	The layer2 (MAC) and layer3 (MAC + IP) hashes in bonding are
compliant with this.  The layer3+4 (IP + TCP/UDP port) hash is not, because
fragmented datagrams will hash differently than unfragmented datagrams.
I've not heard that this noncompliance has been a problem in actual
practice.
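
	The fragment problem can be shown with a small sketch. It
assumes the layer3+4 formula as documented in bonding.txt, ((source port
XOR dest port) XOR ((source IP XOR dest IP) AND 0xffff)) modulo slave
count, with the port term omitted for fragmented packets; `layer34_hash`
and the addresses and ports below are hypothetical and are not taken from
the driver source.

```python
import ipaddress
from typing import Optional


def layer34_hash(src_ip: str, dst_ip: str,
                 src_port: Optional[int], dst_port: Optional[int],
                 slave_count: int) -> int:
    """Pick a slave index from IP addresses and, when known, ports.

    Pass None for the ports to model a fragmented datagram, for which
    the port information is left out of the hash.
    """
    ip_part = (int(ipaddress.IPv4Address(src_ip))
               ^ int(ipaddress.IPv4Address(dst_ip))) & 0xFFFF
    if src_port is None or dst_port is None:
        port_part = 0
    else:
        port_part = src_port ^ dst_port
    return (port_part ^ ip_part) % slave_count


# One UDP conversation between two hypothetical hosts.
SRC, DST = "10.0.0.1", "10.0.0.2"
SPORT, DPORT = 40000, 2049

# A small datagram that fits in one frame: ports are hashed.
unfragmented_link = layer34_hash(SRC, DST, SPORT, DPORT, 2)

# A large datagram of the same conversation, sent as fragments.
fragmented_link = layer34_hash(SRC, DST, None, None, 2)
```

	With these values the two datagrams of the same conversation
land on different links, so they can overtake each other.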

	-J

---
	-Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com


* Re: 802.3ad bonding brain damaged?
  2011-08-08  7:57 ` David Lamparter
  2011-08-08 15:59   ` Eric Dumazet
@ 2011-08-08 20:06   ` Phillip Susi
  2011-08-08 20:08     ` Chris Adams
                       ` (2 more replies)
  1 sibling, 3 replies; 11+ messages in thread
From: Phillip Susi @ 2011-08-08 20:06 UTC (permalink / raw)
  To: David Lamparter; +Cc: netdev

On 8/8/2011 3:57 AM, David Lamparter wrote:
> No, it isn't. 802.3ad/.1AX explicitly requires that no packet
> re-ordering may ever occur, which can only be guaranteed by enqueueing
> packets for one host on one TX interface. This behaviour is mandated by
> 802.1AX-2008 page 15 which reads:

Ouch, that does cause a big problem for store-and-forward switching.
You basically can't split up packets from a single stream without very
careful cut-through switching, which we obviously can't do in Linux.
That seems a rather silly requirement given that higher level protocols
already deal with packet reordering.  Why not an option to say "stuff the
standard"?


* Re: 802.3ad bonding brain damaged?
  2011-08-08 20:06   ` Phillip Susi
@ 2011-08-08 20:08     ` Chris Adams
  2011-08-08 20:14     ` Chris Friesen
  2011-08-08 20:54     ` Rick Jones
  2 siblings, 0 replies; 11+ messages in thread
From: Chris Adams @ 2011-08-08 20:08 UTC (permalink / raw)
  To: netdev

Once upon a time, Phillip Susi <psusi@cfl.rr.com> said:
> On 8/8/2011 3:57 AM, David Lamparter wrote:
> >No, it isn't. 802.3ad/.1AX explicitly requires that no packet
> >re-ordering may ever occur, which can only be guaranteed by enqueueing
> >packets for one host on one TX interface. This behaviour is mandated by
> >802.1AX-2008 page 15 which reads:
> 
> Outch, that does cause a big problem for store-and-forward switching. 
> You basically can't split up packets from a single stream without very 
> careful cut-through switching, which we obviously can't do in Linux. 
> That seems a rather silly requirement given that higher level protocols 
> already deal with packet reordering.  Why not an option to say stuff the 
> standard?

Packet reordering introduces jitter, which is bad for things like VOIP.
-- 
Chris Adams <cmadams@hiwaay.net>
Systems and Network Administrator - HiWAAY Internet Services
I don't speak for anybody but myself - that's enough trouble.


* Re: 802.3ad bonding brain damaged?
  2011-08-08 20:06   ` Phillip Susi
  2011-08-08 20:08     ` Chris Adams
@ 2011-08-08 20:14     ` Chris Friesen
  2011-08-08 20:32       ` Phillip Susi
  2011-08-08 20:54     ` Rick Jones
  2 siblings, 1 reply; 11+ messages in thread
From: Chris Friesen @ 2011-08-08 20:14 UTC (permalink / raw)
  To: Phillip Susi; +Cc: David Lamparter, netdev

On 08/08/2011 02:06 PM, Phillip Susi wrote:
> On 8/8/2011 3:57 AM, David Lamparter wrote:
>> No, it isn't. 802.3ad/.1AX explicitly requires that no packet
>> re-ordering may ever occur, which can only be guaranteed by enqueueing
>> packets for one host on one TX interface. This behaviour is mandated by
>> 802.1AX-2008 page 15 which reads:
>
> Outch, that does cause a big problem for store-and-forward switching.
> You basically can't split up packets from a single stream without very
> careful cut-through switching, which we obviously can't do in Linux.
> That seems a rather silly requirement given that higher level protocols
> already deal with packet reordering. Why not an option to say stuff the
> standard?

Bonding doesn't know about "higher level protocols".  Also, assuming 
that higher level protocols already deal with reordering can be 
dangerous.  I've dealt with network protocols and apps that assumed 
there would be no reordering because at the time they were written they 
used point-to-point links.  They actually work fairly well with single 
links, so it would be reasonable to try and keep them working with 
bonded links.

Chris

-- 
Chris Friesen
Software Developer
GENBAND
chris.friesen@genband.com
www.genband.com


* Re: 802.3ad bonding brain damaged?
  2011-08-08 20:14     ` Chris Friesen
@ 2011-08-08 20:32       ` Phillip Susi
  2011-08-08 20:42         ` Ben Hutchings
  2011-08-09 11:24         ` Benny Amorsen
  0 siblings, 2 replies; 11+ messages in thread
From: Phillip Susi @ 2011-08-08 20:32 UTC (permalink / raw)
  To: Chris Friesen; +Cc: David Lamparter, netdev

On 8/8/2011 4:14 PM, Chris Friesen wrote:
> Bonding doesn't know about "higher level protocols". Also, assuming that
> higher level protocols already deal with reordering can be dangerous.
> I've dealt with network protocols and apps that assumed there would be
> no reordering because at the time they were written they used
> point-to-point links. They actually work fairly well with single links,
> so it would be reasonable to try and keep them working with bonded links.

Try, sure, but if you can't without seriously affecting performance,
then having a knob for a "damn the torpedoes, full speed ahead" mode seems
reasonable.

I wonder how it is that people have reported that Windows machines
manage to do this?  Come to think of it, can Windows even bond in
software?  Maybe it's only possible on Windows with dual port cards
where the drivers and hardware can make sure that the bonded interfaces
service a single queue and maintain ordering that way?




* Re: 802.3ad bonding brain damaged?
  2011-08-08 20:32       ` Phillip Susi
@ 2011-08-08 20:42         ` Ben Hutchings
  2011-08-09 11:24         ` Benny Amorsen
  1 sibling, 0 replies; 11+ messages in thread
From: Ben Hutchings @ 2011-08-08 20:42 UTC (permalink / raw)
  To: Phillip Susi; +Cc: Chris Friesen, David Lamparter, netdev

On Mon, 2011-08-08 at 16:32 -0400, Phillip Susi wrote:
> On 8/8/2011 4:14 PM, Chris Friesen wrote:
> > Bonding doesn't know about "higher level protocols". Also, assuming that
> > higher level protocols already deal with reordering can be dangerous.
> > I've dealt with network protocols and apps that assumed there would be
> > no reordering because at the time they were written they used
> > point-to-point links. They actually work fairly well with single links,
> > so it would be reasonable to try and keep them working with bonded links.
> 
> Try, sure, but if you can't without seriously affecting performance, 
> then having a knob for damn the torpedoes, full speed ahead mode seems 
> reasonable.
> 
> I wonder how it is that people have reported that Windows machines 
> manage to do this?  Come to think of it, can windows even bond in 
> software?  Maybe it's only possible on Windows with dual port cards 
> where the drivers and hardware can make sure that the bonded interfaces 
> service a single queue and maintain ordering that way?

Microsoft doesn't provide a generic bonding driver for Windows.  (This is
probably a sensible choice, considering how many different things people
expect the Linux bonding driver to do.)  Some hardware vendors provide
bonding or 'teaming' drivers that work with their own hardware, and
sometimes with other drivers as well.  So if people report that 'Windows
machines manage to do this' then you need to ask those people *which*
driver they are using.

Ben.

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.



* Re: 802.3ad bonding brain damaged?
  2011-08-08 20:06   ` Phillip Susi
  2011-08-08 20:08     ` Chris Adams
  2011-08-08 20:14     ` Chris Friesen
@ 2011-08-08 20:54     ` Rick Jones
  2 siblings, 0 replies; 11+ messages in thread
From: Rick Jones @ 2011-08-08 20:54 UTC (permalink / raw)
  To: Phillip Susi; +Cc: David Lamparter, netdev

On 08/08/2011 01:06 PM, Phillip Susi wrote:
> On 8/8/2011 3:57 AM, David Lamparter wrote:
>> No, it isn't. 802.3ad/.1AX explicitly requires that no packet
>> re-ordering may ever occur, which can only be guaranteed by enqueueing
>> packets for one host on one TX interface. This behaviour is mandated by
>> 802.1AX-2008 page 15 which reads:
>
> Outch, that does cause a big problem for store-and-forward switching.
> You basically can't split up packets from a single stream without very
> careful cut-through switching, which we obviously can't do in Linux.
> That seems a rather silly requirement given that higher level protocols
> already deal with packet reordering. Why not an option to say stuff the
> standard?


Even in the case of protocols that deal with packet reordering, it is
still quite possible to be sub-optimal.  Try running a TCP_STREAM test
through a mode-rr bond with 4 or more links in it.  I suspect that even
without injecting the occasional "other" packet there can be enough
re-ordering to trigger spurious fast retransmissions.  At the very least
it will trigger lots of immediate ACKnowledgements, which will drive up
the CPU utilization per KB transferred.  And if these spread packets
arrive still spread at the receiver, round-robin will probably preclude
effective GRO and certainly preclude LRO.
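
A toy model shows how little it takes for round-robin to reorder a single
flow. This is a sketch under invented assumptions, not a measurement: the
function names, the per-link delays and the send interval are all
hypothetical, time is in arbitrary integer ticks, and nothing here models
TCP itself.

```python
def round_robin_arrivals(packet_count, link_delays, send_interval):
    """Send packets round-robin over the links; return arrival order.

    Packet `seq` leaves at seq * send_interval on link seq % len(links)
    and arrives after that link's fixed delay.
    """
    arrivals = []
    for seq in range(packet_count):
        link = seq % len(link_delays)
        arrival_time = seq * send_interval + link_delays[link]
        arrivals.append((arrival_time, seq))
    arrivals.sort()
    return [seq for _, seq in arrivals]


def count_out_of_order(order):
    """Count packets that arrive after a higher-numbered packet."""
    highest = -1
    late = 0
    for seq in order:
        if seq < highest:
            late += 1
        else:
            highest = seq
    return late


# Four links, one of which has a deeper queue (delay 5 instead of 1).
rr_order = round_robin_arrivals(8, [1, 5, 1, 1], 1)
rr_late = count_out_of_order(rr_order)

# The same flow kept on a single link never reorders.
single_late = count_out_of_order(round_robin_arrivals(8, [1], 1))
```

In a real TCP receiver, each packet counted as late here corresponds to a
hole in the sequence space while the later packets arrive, which is what
produces the immediate ACKs and, with enough of them, a spurious fast
retransmit.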

Apart from some very carefully controlled conditions, if one needs a 
single flow to go faster than a single link, it is probably time to move 
up to the next higher link speed.

rick jones


* Re: 802.3ad bonding brain damaged?
  2011-08-08 20:32       ` Phillip Susi
  2011-08-08 20:42         ` Ben Hutchings
@ 2011-08-09 11:24         ` Benny Amorsen
  1 sibling, 0 replies; 11+ messages in thread
From: Benny Amorsen @ 2011-08-09 11:24 UTC (permalink / raw)
  To: Phillip Susi; +Cc: Chris Friesen, David Lamparter, netdev

Phillip Susi <psusi@cfl.rr.com> writes:

> Try, sure, but if you can't without seriously affecting performance,
> then having a knob for damn the torpedoes, full speed ahead mode seems
> reasonable.

Packet reordering often affects performance. It can easily be more
costly than losing half the bandwidth of a bundled link.

These days you are even dependent on the NIC firmware for good
performance with TCP for reordered packets -- if the NIC is bad at
handling them, you don't get any performance boost from GRO.

For UDP, applications have to handle it on their own.


/Benny


Thread overview: 11+ messages
2011-08-07 19:52 802.3ad bonding brain damaged? Phillip Susi
2011-08-08  7:57 ` David Lamparter
2011-08-08 15:59   ` Eric Dumazet
2011-08-08 16:44     ` Jay Vosburgh
2011-08-08 20:06   ` Phillip Susi
2011-08-08 20:08     ` Chris Adams
2011-08-08 20:14     ` Chris Friesen
2011-08-08 20:32       ` Phillip Susi
2011-08-08 20:42         ` Ben Hutchings
2011-08-09 11:24         ` Benny Amorsen
2011-08-08 20:54     ` Rick Jones
