From: Jay Vosburgh <fubar@us.ibm.com>
To: Eric Dumazet <eric.dumazet@gmail.com>
Cc: David Lamparter <equinox@diac24.net>,
	Phillip Susi <psusi@cfl.rr.com>,
	netdev@vger.kernel.org
Subject: Re: 802.3ad bonding brain damaged?
Date: Mon, 08 Aug 2011 09:44:59 -0700
Message-ID: <25847.1312821899@death>
In-Reply-To: <1312819168.2531.3.camel@edumazet-laptop>

Eric Dumazet <eric.dumazet@gmail.com> wrote:

>On Monday, 08 August 2011 at 09:57 +0200, David Lamparter wrote:
>> On Sunday, 07.08.2011, 15:52 -0400, Phillip Susi wrote:
>> > - From Documentation/networking/bonding.txt:
>> > 
>> > 	Additionally, the linux bonding 802.3ad implementation
>> > 	distributes traffic by peer (using an XOR of MAC addresses),
>> > 
>> > This is counter to the entire point of 802.3ad. Distributing traffic by
>> > hash of the destination address is poor man's load balancing for
>> > systems not supporting 802.3ad. 
>> 
>> No, it isn't. 802.3ad/.1AX explicitly requires that no packet
>> re-ordering may ever occur, which can only be guaranteed by enqueueing
>> packets for one host on one TX interface. This behaviour is mandated by
>> 802.1AX-2008 page 15 which reads:
>> 
>>   This standard does not mandate any particular distribution
>>   algorithm(s); however, any distribution algorithm shall ensure that,
>>   when frames are received by a Frame Collector as specified in 5.2.3,
>>   the algorithm shall not cause
>>   a) Misordering of frames that are part of any given conversation, or
>>   b) Duplication of frames.
>> | The above requirement to maintain frame ordering is met by ensuring
>> | that all frames that compose a given conversation are transmitted on a
>> | single link in the order that they are generated by the MAC Client;
>>   hence, this requirement does not involve the addition (or
>>   modification) of any information to the MAC frame, nor any buffering
>>   or processing on the part of the corresponding Frame Collector in
>>   order to reorder frames. This approach to the operation of the
>>   distribution function permits a wide variety of distribution and load
>>   balancing algorithms to be used, while also ensuring interoperability
>>   between devices that adopt differing algorithms.
>> 
>
>It all depends on the definition of 'conversation'

	The definition from 802.1AX is:

3.8 conversation: A set of frames transmitted from one end station to
another, where all of the frames form an ordered sequence, and where the
communicating end stations require the ordering to be maintained among
the set of frames exchanged. (See IEEE Std 802.1AX, Clause 5.)

	So, basically, a TCP connection or a sequence of UDP datagrams
from one IP.port to another and optionally the reverse.

>Phillip assumed two (or more) TCP flows from machine A to machine B
>could use two different links, while you assert they MUST use a single
>link.

	The standard permits us to place separate conversations on
different ports, even if they are going to the same MAC destination.  

	802.1AX 5.2.1:

f) Frame ordering must be maintained for certain sequences of frame
exchanges between MAC Clients (known as conversations, see Clause
3). The Distributor ensures that all frames of a given conversation are
passed to a single port. For any given port, the Collector is required
to pass frames to the MAC Client in the order that they are received
from that port. The Collector is otherwise free to select frames
received from the aggregated ports in any order. Since there are no
means for frames to be misordered on a single link, this guarantees that
frame ordering is maintained for any conversation.

g) Conversations may be moved among ports within an aggregation, both
for load balancing and to maintain availability in the event of link
failures.

	The standard requires ordering for frames within any one
conversation, but does not require ordering of frames between
conversations.
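
	To make the distinction concrete, here is a minimal sketch of a
distributor, assuming a conversation is identified by an (IP, port, IP,
port) tuple. The names pick_port and distribute and the use of CRC32
are illustrative assumptions only; neither the standard nor the bonding
driver prescribes them.

```python
import zlib
from collections import defaultdict

def pick_port(conversation, n_ports):
    """Map a conversation key to one port index (hypothetical hash choice)."""
    key = "|".join(str(field) for field in conversation).encode()
    return zlib.crc32(key) % n_ports

def distribute(frames, n_ports):
    """Queue each (conversation, seq) frame on the port chosen for its conversation."""
    queues = defaultdict(list)
    for conversation, seq in frames:
        queues[pick_port(conversation, n_ports)].append((conversation, seq))
    return queues

# Two conversations between the same pair of hosts, frames interleaved.
conv_a = ("10.0.0.1", 40000, "10.0.0.2", 80)
conv_b = ("10.0.0.1", 40001, "10.0.0.2", 80)
frames = [(conv_a, 0), (conv_b, 0), (conv_a, 1), (conv_b, 1), (conv_a, 2)]
queues = distribute(frames, n_ports=2)

for port, queued in sorted(queues.items()):
    print(port, queued)
```

	Whichever ports the hash happens to select, every frame of conv_a
sits on one port in its original order, and likewise for conv_b. The
two conversations may or may not share a port; nothing in the ordering
requirement cares either way.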

	The layer2 (MAC) and layer2+3 (MAC + IP) hashes in bonding
comply with this.  The layer3+4 (IP + TCP/UDP port) hash does not,
because fragmented datagrams hash differently from unfragmented
datagrams of the same conversation.  I've not heard that this
noncompliance has been a problem in actual practice.
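
	A simplified sketch of why fragments cause trouble, loosely
following the formulas described in bonding.txt as I understand them.
This is not the kernel code: the function names are hypothetical, and
the assumption that fragments are hashed with the port term left out is
mine.

```python
import ipaddress

def mac_to_int(mac):
    """Convert 'aa:bb:cc:dd:ee:ff' to an integer."""
    return int(mac.replace(":", ""), 16)

def layer2_hash(src_mac, dst_mac, n_slaves):
    """(source MAC XOR destination MAC) modulo slave count."""
    return (mac_to_int(src_mac) ^ mac_to_int(dst_mac)) % n_slaves

def layer34_hash(src_ip, dst_ip, src_port, dst_port, n_slaves, fragmented=False):
    """Port XOR combined with the low 16 bits of the IP XOR.

    Assumption: fragments are hashed on the IP part alone, since only
    the first fragment carries the TCP/UDP header.
    """
    ip_part = (int(ipaddress.IPv4Address(src_ip))
               ^ int(ipaddress.IPv4Address(dst_ip))) & 0xFFFF
    if fragmented:
        return ip_part % n_slaves
    return ((src_port ^ dst_port) ^ ip_part) % n_slaves

l2 = layer2_hash("00:11:22:33:44:55", "00:11:22:33:44:66", n_slaves=2)
whole = layer34_hash("10.0.0.1", "10.0.0.2", 5000, 53, n_slaves=2)
frag = layer34_hash("10.0.0.1", "10.0.0.2", 5000, 53, n_slaves=2, fragmented=True)
print(l2, whole, frag)  # prints "1 0 1"
```

	With these example values, the unfragmented datagram goes out on
slave 0 and a fragmented datagram of the same UDP flow on slave 1, so
the two can be reordered relative to each other. The layer2 result
depends only on the MAC pair, so it cannot split a conversation this
way.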

	-J

---
	-Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com
