From: "Nicolas de Pesloüan" <nicolas.2p.debian@gmail.com>
To: Jay Vosburgh <fubar@us.ibm.com>
Cc: Andy Gospodarek <andy@greyhouse.net>,
	netdev@vger.kernel.org, David Miller <davem@davemloft.net>,
	Herbert Xu <herbert@gondor.hengli.com.au>,
	Jiri Pirko <jpirko@redhat.com>
Subject: Re: [PATCH net-2.6] bonding: drop frames received with master's source MAC
Date: Wed, 02 Mar 2011 00:08:22 +0100
Message-ID: <4D6D7C66.6050205@gmail.com>
In-Reply-To: <20893.1299018331@death>

On 01/03/2011 23:25, Jay Vosburgh wrote:
> Nicolas de Pesloüan 	<nicolas.2p.debian@gmail.com>  wrote:
>
>> On 01/03/2011 19:16, Andy Gospodarek wrote:
>>
>> [snip]
>>
>>> Knowing that I'm using an unmanaged switch with balance-rr probably
>>> helps understand how this is happening.  I'll clarify this however, so
>>> we are all on the same page.
>>>
>>> In my situation, eth2 and eth3 are in bond0.  When bond0 transmits the
>>> NS, let's say it goes out eth3.  Since it is a multicast frame my switch
>>> will broadcast this to all ports and eth2 will receive the frame with
>>> the source MAC address being the same as bond0's MAC address.  This
>>> frame is passed up the stack to the ipv6 layer and appears to be a
>>> response to the NS from another host and is dropped.
>>
>> 'sounds perfectly normal.
>>
>> This problem is described in detail in chapter 5.4.3 and appendix A of
>> RFC4862 "IPv6 Stateless Address Autoconfiguration".
>>
>> As this is clearly IPv6 related, it sounds normal from my point of view to
>> fix it at the ndisc_recv_ns() level.
>
> 	Andy's immediate problem is IPv6 related, but the issue itself
> is generic: how to deal with broadcast / multicasts arriving at a -rr or
> -xor bond, because we do not and cannot know if the switch is going to
> flood to the slaves or not.  There may be other instances wherein that
> bonus copy of some packet confuses things.

Agreed, even if the only known instance that currently exposes the problem is IPv6.

Anyway, let's try and fix it at the bonding level...

> 	My view is that -rr and -xor are intended to interoperate with
> Etherchannel.  Yes, they will often work tolerably well when connected
> to a non-Etherchannel switch.  But, if the host and the switch are not
> in agreement on the link aggregation status of the ports, some level of
> misbehavior is expected.  If that misbehavior can be corrected without
> adversely affecting a properly configured host and switch, then I don't
> see much problem with fixing it.
>
> 	For the IPv6 case here, I think there's a problem with any fix,
> and that is that there's no way for bonding to know if the switch ports
> are configured properly or not.  I'm using "properly" to mean that the
> switch ports corresponding to the bonding slaves are configured into an
> Etherchannel-type channel group.
>
> 	If the switch ports are grouped, then if IPv6 sees one of these
> messages coming in, it's actually a duplicate detection.  This because
> the switch won't loop the broadcast / multicast back around to a member
> of the channel group.
>
> 	If the switch ports are not grouped, then the switch will
> happily send broadcasts and multicasts to all ports of the bond, because
> it doesn't know about the aggregation.  In this case, I suspect there's
> no way to reliably determine if the incoming packet is a switch artifact
> or an actual duplicate detection.  Anybody know for sure if this is the
> case?
>
> 	For the generic case, I'm not seeing a way to distinguish actual
> repeated packets from switch artifact duplicate packets without adding
> another knob to bonding to tell it if the switch does etherchannel or
> not (which I'm not in favor of doing).

I originally thought about such a knob, and I agree with you that we should avoid adding one more...

>> Quoting the RFC:
>>
>>   "In those cases where the hardware cannot suppress loopbacks, however,
>>    one possible software heuristic to filter out unwanted loopbacks is
>>    to discard any received packet whose link-layer source address is the
>>    same as the receiving interface's.  There is even a link-layer
>>    specification that requires that any such packets be discarded
>>    [IEEE802.11].  Unfortunately, use of that criteria also results in
>>    the discarding of all packets sent by another node using the same
>>    link-layer address.  Duplicate Address Detection will fail on
>>    interfaces that filter received packets in this manner:
>>
>>    [snip]
>>
>>    Thus, to perform Duplicate Address Detection correctly in the case
>>    where two interfaces are using the same link-layer address, an
>>    implementation must have a good understanding of the interface's
>>    multicast loopback semantics, and the interface cannot discard
>>    received packets simply because the source link-layer address is the
>>    same as the interface's."
>>
>> So, simply dropping frames whose source MAC == local MAC is apparently not the right solution.
>
> 	I tend to agree here, because this would break DAD for properly
> configured (meaning etherchannel on the switch ports) installations.
>
> 	Is there a way to fix bonding and/or ndisc_recv_ns to work
> correctly for both cases (have/don't have etherchannel on the switch)?

Could we, at the time the bonding mode is changed to -rr or -xor, simply broadcast or multicast one 
or two frames carrying some random data and wait to see whether a frame comes back? If at least one 
frame with the same random data arrives on one of this bond's slave interfaces, we know for sure 
that the switch configuration is not "multicast loop safe". Bonding already sends ARP 
requests/replies in many situations, so adding one broadcast/multicast frame at bond setup time is 
probably acceptable.
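
To make the idea concrete, here is a minimal userspace sketch of the probe check, in Python rather 
than kernel C. It is only a model under assumptions: the function names, the 16-byte payload size 
and the two toy switches are all hypothetical and do not correspond to anything in the bonding 
driver.

```python
import os

PROBE_LEN = 16  # bytes of random payload; an arbitrary choice for this sketch


def make_probe():
    """Return a fresh random payload to embed in the probe frame."""
    return os.urandom(PROBE_LEN)


def loop_detected(probe, received, slaves):
    """True if any slave of this bond received a frame carrying our probe.

    `received` is a list of (interface name, payload) pairs.
    """
    return any(iface in slaves and payload == probe
               for iface, payload in received)


def ungrouped_switch(tx_iface, payload, ports):
    """Toy switch with no channel group: floods to every port but ingress."""
    return [(p, payload) for p in ports if p != tx_iface]


def grouped_switch(tx_iface, payload, ports, group):
    """Toy switch with a channel group: never floods back into the group
    that the frame arrived on."""
    return [(p, payload) for p in ports
            if p != tx_iface and not (tx_iface in group and p in group)]


if __name__ == "__main__":
    slaves = {"eth2", "eth3"}
    ports = ["eth2", "eth3", "host-b"]
    probe = make_probe()
    # Probe leaves via eth3; the ungrouped switch floods it back to eth2.
    print(loop_detected(probe, ungrouped_switch("eth3", probe, ports), slaves))
    # prints True
    print(loop_detected(probe, grouped_switch("eth3", probe, ports, slaves), slaves))
    # prints False
```

Matching on the random payload rather than on the source MAC is the point of the probe: a frame 
from another host that merely shares our MAC would not carry the same random data.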

To keep the result consistent, we would also need to send such a broadcast/multicast every time 
the link comes up on an already enslaved slave. This is not perfect, as the switch topology may 
change in a way that bonding won't detect but that still causes a new multicast loop, but...

Knowing that the switch configuration is not "multicast loop safe", we can, at a minimum, issue a 
warning telling the user to expect strange behavior, such as false duplicate address detection.

And we can probably feed this information into the should-drop logic for modes that lack 
"inactive" slaves.
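
As a rough illustration of that last point, the decision could look something like the following 
userspace model (Python, not kernel code). The `Bond` class, the `should_drop` name and the mode 
set are hypothetical, invented for this sketch; it assumes a loop-detection flag has already been 
set by a probe like the one proposed above.

```python
from dataclasses import dataclass

# Modes discussed in this thread as having no "inactive" slaves.
MODES_WITHOUT_INACTIVE_SLAVES = {"balance-rr", "balance-xor"}


@dataclass
class Bond:
    mac: str             # MAC address of the bond master
    mode: str            # e.g. "balance-rr", "balance-xor", "active-backup"
    loop_detected: bool  # True if a probe frame came back on a slave


def should_drop(bond, src_mac):
    """Drop a received frame only when all three conditions hold:
    the mode has no inactive slaves, the switch was seen looping our
    multicasts back, and the frame's source MAC is the bond's own."""
    return (bond.mode in MODES_WITHOUT_INACTIVE_SLAVES
            and bond.loop_detected
            and src_mac.lower() == bond.mac.lower())


if __name__ == "__main__":
    looped = Bond(mac="00:11:22:33:44:55", mode="balance-rr", loop_detected=True)
    clean = Bond(mac="00:11:22:33:44:55", mode="balance-rr", loop_detected=False)
    print(should_drop(looped, "00:11:22:33:44:55"))  # prints True
    print(should_drop(clean, "00:11:22:33:44:55"))   # prints False
```

With no loop detected, frames carrying our own source MAC are still delivered, so DAD keeps 
working behind a properly grouped switch. When a loop is detected, a genuine duplicate from 
another node using the same MAC would still be dropped; this sketch does not solve that case.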

	Nicolas.
