From mboxrd@z Thu Jan 1 00:00:00 1970
From: Jay Vosburgh
Subject: Re: 802.3ad bonding brain damaged?
Date: Mon, 08 Aug 2011 09:44:59 -0700
Message-ID: <25847.1312821899@death>
References: <4E3EECF6.90409@cfl.rr.com> <1312790234.7020.26.camel@arkology.n2.diac24.net> <1312819168.2531.3.camel@edumazet-laptop>
In-reply-to: <1312819168.2531.3.camel@edumazet-laptop>
To: Eric Dumazet
Cc: David Lamparter, Phillip Susi, netdev@vger.kernel.org

Eric Dumazet wrote:

>On Monday, 08 August 2011 at 09:57 +0200, David Lamparter wrote:
>> On Sunday, 07.08.2011, 15:52 -0400, Phillip Susi wrote:
>> > From Documentation/networking/bonding.txt:
>> >
>> > 	Additionally, the linux bonding 802.3ad implementation
>> > 	distributes traffic by peer (using an XOR of MAC addresses),
>> >
>> > This is counter to the entire point of 802.3ad.  Distributing traffic by
>> > hash of the destination address is poor mans load balancing for
>> > systems not supporting 802.3ad.
>>
>> No, it isn't.
>> 802.3ad/.1AX explicitly requires that no packet
>> re-ordering may ever occur, which can only be guaranteed by enqueueing
>> packets for one host on one TX interface. This behaviour is mandated by
>> 802.1AX-2008 page 15 which reads:
>>
>>   This standard does not mandate any particular distribution
>>   algorithm(s); however, any distribution algorithm shall ensure that,
>>   when frames are received by a Frame Collector as specified in 5.2.3,
>>   the algorithm shall not cause
>>   a) Misordering of frames that are part of any given conversation, or
>>   b) Duplication of frames.
>> | The above requirement to maintain frame ordering is met by ensuring
>> | that all frames that compose a given conversation are transmitted on a
>> | single link in the order that they are generated by the MAC Client;
>>   hence, this requirement does not involve the addition (or
>>   modification) of any information to the MAC frame, nor any buffering
>>   or processing on the part of the corresponding Frame Collector in
>>   order to reorder frames. This approach to the operation of the
>>   distribution function permits a wide variety of distribution and load
>>   balancing algorithms to be used, while also ensuring interoperability
>>   between devices that adopt differing algorithms.
>>
>
>It all depends on the definition of 'conversation'

	The definition from 802.1AX is:

3.8 conversation: A set of frames transmitted from one end station to
another, where all of the frames form an ordered sequence, and where the
communicating end stations require the ordering to be maintained among
the set of frames exchanged. (See IEEE Std 802.1AX, Clause 5.)

	So, basically, a TCP connection or a sequence of UDP datagrams
from one IP.port to another and optionally the reverse.

>Phillip assumed two (or more) TCP flows from machine A to machine B
>could use two different links, while you assert they MUST use a single
>link.

	The standard permits us to place separate conversations on
different ports, even if they are going to the same MAC destination.

802.1AX 5.2.1:

f) Frame ordering must be maintained for certain sequences of frame
exchanges between MAC Clients (known as conversations, see Clause
3). The Distributor ensures that all frames of a given conversation are
passed to a single port. For any given port, the Collector is required
to pass frames to the MAC Client in the order that they are received
from that port. The Collector is otherwise free to select frames
received from the aggregated ports in any order. Since there are no
means for frames to be misordered on a single link, this guarantees that
frame ordering is maintained for any conversation.

g) Conversations may be moved among ports within an aggregation, both
for load balancing and to maintain availability in the event of link
failures.

	The standard requires ordering for frames within any one
conversation, but does not require ordering of frames between
conversations.

	The layer2 (MAC) and layer3 (MAC + IP) hashes in bonding are
compliant with this. The layer3+4 (IP + TCP/UDP port) hash is not,
because fragmented datagrams hash differently than unfragmented
datagrams, so frames of a single conversation may end up on different
ports. I've not heard that this noncompliance has been a problem in
actual practice.

	-J

---
	-Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com
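
	To illustrate the point about the hash policies, here is a
minimal sketch in Python. It is a simplified model and not the kernel's
code: the Frame type, the function names, and the exact XOR/modulo
formulas are assumptions chosen for illustration, loosely following the
way bonding.txt describes the transmit hash policies. It shows two TCP
flows between the same pair of hosts staying on one port under the MAC
and MAC + IP hashes, being split under the layer3+4 hash, and a
fragment of one flow landing on a different port than the unfragmented
frames of that same flow.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Frame:
    """Hypothetical, simplified view of the fields a hash policy looks at."""
    src_mac: int
    dst_mac: int
    src_ip: int
    dst_ip: int
    src_port: int
    dst_port: int
    fragment: bool = False  # True for an IP fragment


def hash_layer2(f: Frame, n_ports: int) -> int:
    # MAC-only hash: every frame between two stations maps to one port.
    return (f.src_mac ^ f.dst_mac) % n_ports


def hash_layer2_3(f: Frame, n_ports: int) -> int:
    # MAC + IP hash: still identical for all flows between two hosts.
    return (((f.src_ip ^ f.dst_ip) & 0xFFFF) ^ (f.src_mac ^ f.dst_mac)) % n_ports


def hash_layer3_4(f: Frame, n_ports: int) -> int:
    # IP + TCP/UDP port hash. In this model the port numbers are left out
    # for fragments, so fragments of a flow hash on the IP part alone.
    ip_part = (f.src_ip ^ f.dst_ip) & 0xFFFF
    if f.fragment:
        return ip_part % n_ports
    return ((f.src_port ^ f.dst_port) ^ ip_part) % n_ports


N_PORTS = 2  # two links in the aggregation (assumed)

# Two TCP conversations from host A (10.0.0.1) to host B (10.0.0.2).
flow_a = Frame(0x020000000001, 0x020000000002,
               0x0A000001, 0x0A000002, 40000, 80)
flow_b = replace(flow_a, src_port=40001)
# A fragmented datagram belonging to conversation flow_b.
flow_b_frag = replace(flow_b, fragment=True)

for name, fn in (("layer2", hash_layer2),
                 ("MAC + IP", hash_layer2_3),
                 ("layer3+4", hash_layer3_4)):
    print(name, fn(flow_a, N_PORTS), fn(flow_b, N_PORTS),
          fn(flow_b_frag, N_PORTS))
```

	With these made-up addresses, the first two policies put all
three frames on the same port, while the layer3+4 model separates the
two flows and also sends the fragment of flow_b to a different port
than flow_b's unfragmented frames, which is the ordering hazard
described above.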