From: Nicolas de Pesloüan
Subject: Re: [PATCH net-2.6] bonding: drop frames received with master's source MAC
Date: Wed, 02 Mar 2011 00:08:22 +0100
Message-ID: <4D6D7C66.6050205@gmail.com>
References: <1298668408-14849-1-git-send-email-andy@greyhouse.net>
 <4D68276B.90104@gmail.com> <20110225222455.GI11864@gospo.rdu.redhat.com>
 <4D683653.4050409@gmail.com> <20110228163255.GJ11864@gospo.rdu.redhat.com>
 <4D6C1764.1040008@gmail.com> <20110301023525.GK11864@gospo.rdu.redhat.com>
 <9882.1298958366@death> <20110301181624.GM11864@gospo.rdu.redhat.com>
 <4D6D658C.90300@gmail.com> <20893.1299018331@death>
In-Reply-To: <20893.1299018331@death>
To: Jay Vosburgh
Cc: Andy Gospodarek, netdev@vger.kernel.org, David Miller, Herbert Xu, Jiri Pirko

On 01/03/2011 23:25, Jay Vosburgh wrote:
> Nicolas de Pesloüan wrote:
>
>> On 01/03/2011 19:16, Andy Gospodarek wrote:
>>
>> [snip]
>>
>>> Knowing that I'm using an unmanaged switch with balance-rr probably
>>> helps understand how this is happening.  I'll clarify this however, so
>>> we are all on the same page.
>>>
>>> In my situation, eth2 and eth3 are in bond0.  When bond0 transmits the
>>> NS, let's say it goes out eth3.  Since it is a multicast frame my switch
>>> will broadcast this to all ports and eth2 will receive the frame with
>>> the source MAC address being the same as bond0's MAC address.
>>> This frame is passed up the stack to the ipv6 layer and appears to be a
>>> response to the NS from another host and is dropped.
>>
>> 'sounds perfectly normal.
>>
>> This problem is described in detail in chapter 5.4.3 and appendix A of
>> RFC4862 "IPv6 Stateless Address Autoconfiguration".
>>
>> As this is clearly IPv6 related, it sounds normal from my point of view to
>> fix it at the ndisc_recv_ns() level.
>
> Andy's immediate problem is IPv6 related, but the issue itself
> is generic: how to deal with broadcast / multicasts arriving at a -rr or
> -xor bond, because we do not and cannot know if the switch is going to
> flood to the slaves or not.  There may be other instances wherein that
> bonus copy of some packet confuses things.

Agreed, even if the only known instance that currently exposes the problem
is IPv6. Anyway, let's try and fix it at the bonding level...

> My view is that -rr and -xor are intended to interoperate with
> Etherchannel.  Yes, they will often work tolerably well when connected
> to a non-Etherchannel switch.  But, if the host and the switch are not
> in agreement on the link aggregation status of the ports, some level of
> misbehavior is expected.  If that misbehavior can be corrected without
> adversely affecting a properly configured host and switch, then I don't
> see much problem with fixing it.
>
> For the IPv6 case here, I think there's a problem with any fix,
> and that is that there's no way for bonding to know if the switch ports
> are configured properly or not.  I'm using "properly" to mean that the
> switch ports corresponding to the bonding slaves are configured into an
> Etherchannel-type channel group.
>
> If the switch ports are grouped, then if IPv6 sees one of these
> messages coming in, it's actually a duplicate detection.  This because
> the switch won't loop the broadcast / multicast back around to a member
> of the channel group.
>
> If the switch ports are not grouped, then the switch will
> happily send broadcasts and multicasts to all ports of the bond, because
> it doesn't know about the aggregation.  In this case, I suspect there's
> no way to reliably determine if the incoming packet is a switch artifact
> or an actual duplicate detection.  Anybody know for sure if this is the
> case?
>
> For the generic case, I'm not seeing a way to distinguish actual
> repeated packets from switch artifact duplicate packets without adding
> another knob to bonding to tell it if the switch does etherchannel or
> not (which I'm not in favor of doing).

I originally thought about such a knob and agree with you that we should
avoid adding one more...

>> Quoting the RFC:
>>
>>   "In those cases where the hardware cannot suppress loopbacks, however,
>>    one possible software heuristic to filter out unwanted loopbacks is
>>    to discard any received packet whose link-layer source address is the
>>    same as the receiving interface's.  There is even a link-layer
>>    specification that requires that any such packets be discarded
>>    [IEEE802.11].  Unfortunately, use of that criteria also results in
>>    the discarding of all packets sent by another node using the same
>>    link-layer address.  Duplicate Address Detection will fail on
>>    interfaces that filter received packets in this manner:
>>
>>    [snip]
>>
>>    Thus, to perform Duplicate Address Detection correctly in the case
>>    where two interfaces are using the same link-layer address, an
>>    implementation must have a good understanding of the interface's
>>    multicast loopback semantics, and the interface cannot discard
>>    received packets simply because the source link-layer address is the
>>    same as the interface's."
>>
>> So, simply dropping frames whose source MAC == local MAC is apparently
>> not the right solution.
>
> I tend to agree here, because this would break DAD for properly
> configured (meaning etherchannel on the switch ports) installations.
>
> Is there a way to fix bonding and/or ndisc_recv_ns to work
> correctly for both cases (have/don't have etherchannel on the switch)?

Can we imagine that, at the time we change the bonding mode to -rr or -xor,
we simply broadcast or multicast one or two frames with some random data and
wait to see whether we receive the frame back? If we receive at least one
frame with the same random data on one of the slave interfaces of this bond,
we know for sure the switch configuration is not "multicast loop safe".
Bonding already sends ARP requests/replies in many situations. Adding one
broadcast/multicast frame at bond setup time is probably acceptable.

And to ensure consistent results, we need to send such a broadcast/multicast
every time the link goes up for an already enslaved slave. This is not
perfect, as the switch topology may change in a way that won't be detected
by bonding but still causes a new multicast loop, but...

Knowing the switch configuration is not "multicast loop safe", we can, at a
minimum, issue a warning, telling the user she should expect strange
behaviors, like false duplicate address detection.

And we can probably use this information in the should-drop logic, for modes
that lack "inactive" slaves.

	Nicolas.
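
To make the probe idea above a bit more concrete, here is a rough user-space
sketch in Python. It is only a toy simulation, not bonding driver code: the
Switch and Bond classes, probe(), receive() and should_drop() are all
hypothetical names invented for illustration, and the switch model simply
assumes that multicast is flooded to every port except the ingress port (or
except the whole channel group when the ports are aggregated).

```python
import os


class Switch:
    """Toy switch model (an assumption, not a real device).

    Multicast is flooded to every port except the ingress port. If the
    ingress port belongs to a channel group, the whole group is excluded.
    """

    def __init__(self, ports, channel_group=None):
        self.ports = list(ports)
        self.channel_group = set(channel_group or [])

    def flood(self, ingress_port):
        if ingress_port in self.channel_group:
            excluded = self.channel_group
        else:
            excluded = {ingress_port}
        return [p for p in self.ports if p not in excluded]


class Bond:
    """Hypothetical bond that probes the switch for multicast loops."""

    def __init__(self, mac, slaves, switch):
        self.mac = mac
        self.slaves = list(slaves)
        self.switch = switch
        self.pending_token = None
        self.loop_detected = False

    def probe(self):
        # Send one multicast frame carrying random data out of one slave.
        self.pending_token = os.urandom(8)
        tx_slave = self.slaves[0]
        for port in self.switch.flood(tx_slave):
            if port in self.slaves:
                self.receive(port, self.mac, self.pending_token)
        return self.loop_detected

    def receive(self, port, src_mac, payload):
        # Our own MAC plus our own random data coming back on a slave
        # means the switch is not "multicast loop safe".
        if src_mac == self.mac and payload == self.pending_token:
            self.loop_detected = True

    def should_drop(self, src_mac):
        # Only drop frames carrying our own source MAC when a loop was seen.
        # Caveat from the thread: in that case a genuine duplicate using
        # the same MAC is dropped too, so DAD can still miss it.
        return self.loop_detected and src_mac == self.mac


flat = Bond("00:11:22:33:44:55", ["eth2", "eth3"],
            Switch(["eth2", "eth3", "host"]))
grouped = Bond("00:11:22:33:44:55", ["eth2", "eth3"],
               Switch(["eth2", "eth3", "host"],
                      channel_group=["eth2", "eth3"]))
flat_looped = flat.probe()
grouped_looped = grouped.probe()
```

With the unaggregated switch the probe comes back on eth3, so the bond could
warn the user and enable the source-MAC drop; with the channel group
configured, nothing comes back and received frames are left alone, which
keeps DAD working for properly configured installations.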