netdev.vger.kernel.org archive mirror
* unresponsive vlan on top of bond with fail_over_mac=active
@ 2012-10-10 23:11 Michal Kubecek
  2012-10-11  3:34 ` Jay Vosburgh
  0 siblings, 1 reply; 5+ messages in thread
From: Michal Kubecek @ 2012-10-10 23:11 UTC (permalink / raw)
  To: netdev; +Cc: Jay Vosburgh, Andy Gospodarek

Hello,

a customer of ours has the following problem:

A bond is set up in active-backup mode with fail_over_mac=1 (active). On
top of it, a VLAN is created so that it inherits MAC address of the bond
which is the same as address of its active slave.

When failover occurs, the bond switches its MAC address to address of
the new active slave but VLAN interface keeps the old address and it
stops receiving packets from outside.

The customer suggested that upon failover, not only bond should switch
its MAC address to the new active slave but also all VLAN interfaces on
top of it. I don't like this approach too much as there is already a
different mechanism for the problem: network device's uc list. Since
commits

  7d26bb10  bonding: emit event when bonding changes MAC
  2af73d4b  net/bonding: emit address change event also in bond_release

the VLAN device's MAC address is copied into the bond's uc list.
Unfortunately, no code syncs the bond's uc list to its slaves, so the
active slave drops packets addressed to the VLAN. My idea is to do the
syncing either via the ndo_set_rx_mode method or in response to an event.
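
The failure and the proposed uc list sync can be sketched with a toy model.
This is plain Python, not kernel code: the Slave and Bond classes, the
sync_uc switch and the MAC addresses are all made up for illustration, and
the model assumes the slaves do hardware unicast filtering.

```python
from dataclasses import dataclass, field


@dataclass
class Slave:
    name: str
    mac: str
    # Extra unicast addresses programmed into the (modelled) hardware filter.
    uc_filter: set = field(default_factory=set)

    def accepts(self, dst_mac: str) -> bool:
        # A filtering NIC passes only its own address plus its filter entries.
        return dst_mac == self.mac or dst_mac in self.uc_filter


@dataclass
class Bond:
    slaves: list
    active: int = 0
    uc_list: set = field(default_factory=set)
    sync_uc: bool = False  # hypothetical: push uc_list down to the active slave

    @property
    def mac(self) -> str:
        # fail_over_mac=active: the bond uses the active slave's address.
        return self.slaves[self.active].mac

    def _set_rx_mode(self) -> None:
        if self.sync_uc:
            self.slaves[self.active].uc_filter = set(self.uc_list)

    def uc_add(self, mac: str) -> None:
        self.uc_list.add(mac)
        self._set_rx_mode()

    def failover(self, new_active: int) -> None:
        self.slaves[self.active].uc_filter.clear()
        self.active = new_active
        self._set_rx_mode()

    def receives(self, dst_mac: str) -> bool:
        return self.slaves[self.active].accepts(dst_mac)


def vlan_reachable_after_failover(sync_uc: bool) -> bool:
    bond = Bond(
        [Slave("eth0", "02:00:00:00:00:01"), Slave("eth1", "02:00:00:00:00:02")],
        sync_uc=sync_uc,
    )
    vlan_mac = bond.mac        # the VLAN inherits the bond's address
    bond.failover(1)           # the bond's address becomes eth1's
    if vlan_mac != bond.mac:
        bond.uc_add(vlan_mac)  # models the VLAN reacting to the address change
    return bond.receives(vlan_mac)
```

With sync_uc=False the model reproduces the report: the VLAN address sits
in the bond's uc list but never reaches the slave's filter. With
sync_uc=True the active slave accepts frames for the VLAN again.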

But before proposing a patch, I would like to ask which approach is
preferable: copying the active slave's hw address to all VLAN devices
defined on top of the bond, or syncing the bond's uc list to its slaves?

Thanks in advance,
                                                         Michal Kubecek


* Re: unresponsive vlan on top of bond with fail_over_mac=active
  2012-10-10 23:11 unresponsive vlan on top of bond with fail_over_mac=active Michal Kubecek
@ 2012-10-11  3:34 ` Jay Vosburgh
  2012-10-11 10:37   ` Michal Kubecek
  0 siblings, 1 reply; 5+ messages in thread
From: Jay Vosburgh @ 2012-10-11  3:34 UTC (permalink / raw)
  To: Michal Kubecek; +Cc: netdev, Andy Gospodarek

Michal Kubecek <mkubecek@suse.cz> wrote:

>Hello,
>
>a customer of ours has the following problem:
>
>A bond is set up in active-backup mode with fail_over_mac=1 (active). On
>top of it, a VLAN is created so that it inherits MAC address of the bond
>which is the same as address of its active slave.
>
>When failover occurs, the bond switches its MAC address to address of
>the new active slave but VLAN interface keeps the old address and it
>stops receiving packets from outside.

	What network device are they using that requires fail_over_mac
to be set to active?  The intended user of this facility is IPoIB, which
does not support VLANs (and therefore does not have this problem).  For
regular Ethernet, the active setting is not generally a good choice, as
network peers must be updated via gratuitous ARP when a failover occurs,
so there is really no advantage to using it.

>The customer suggested that upon failover, not only bond should switch
>its MAC address to the new active slave but also all VLAN interfaces on
>top of it. I don't like this approach too much as there is already a
>different mechanism for the problem: network device's uc list. Since
>commits
>
>  7d26bb10  bonding: emit event when bonding changes MAC
>  2af73d4b  net/bonding: emit address change event also in bond_release
>
>the VLAN device's MAC address is copied into the bond's uc list.
>Unfortunately, no code syncs the bond's uc list to its slaves, so the
>active slave drops packets addressed to the VLAN. My idea is to do the
>syncing either via the ndo_set_rx_mode method or in response to an event.
>
>But before proposing a patch, I would like to ask which approach is
>preferable: copying the active slave's hw address to all VLAN devices
>defined on top of the bond, or syncing the bond's uc list to its slaves?

	I tested some of this out earlier this year, and I don't recall
having problems (although I'm not sure I did this exact test).  The
dev_uc_add() logic (in __dev_set_rx_mode) would put the underlying
device into promiscuous mode if the hardware didn't support multiple
unicast MAC addresses. dev_uc_add() was invoked by vlan_sync_address(),
which is called by the vlan NETDEV_CHANGEADDR notifier callback.

	Bonding does propagate promisc to its slaves, but (as you point
out) not the uc lists; is the hardware in question something that
supports multiple unicast addresses (IFF_UNICAST_FLT)?  The device I
tested with does not support IFF_UNICAST_FLT, and (as I recall) would
end up in promisc mode.
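
The promiscuous-mode fallback described above can be modelled roughly as
follows. This is a simplified Python sketch of the behaviour, not the
kernel's implementation: the NetDev class and its field names are
illustrative stand-ins, with unicast_flt playing the role of
IFF_UNICAST_FLT.

```python
from dataclasses import dataclass, field


@dataclass
class NetDev:
    unicast_flt: bool        # stands in for IFF_UNICAST_FLT
    uc: set = field(default_factory=set)
    uc_promisc: bool = False
    promiscuity: int = 0     # reference count; nonzero means promiscuous

    def set_rx_mode(self) -> None:
        if self.unicast_flt:
            # The driver is expected to program its own filter from self.uc.
            return
        if self.uc and not self.uc_promisc:
            # No hardware filter: fall back to promiscuous mode once.
            self.promiscuity += 1
            self.uc_promisc = True
        elif not self.uc and self.uc_promisc:
            # Last secondary address removed: drop the fallback again.
            self.promiscuity -= 1
            self.uc_promisc = False

    def uc_add(self, mac: str) -> None:
        self.uc.add(mac)
        self.set_rx_mode()

    def uc_del(self, mac: str) -> None:
        self.uc.discard(mac)
        self.set_rx_mode()
```

In this model a device without unicast filtering goes promiscuous on the
first uc_add() and leaves that state when its uc list becomes empty, while
a filtering device never touches promiscuity.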

	-J

---
	-Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com


* Re: unresponsive vlan on top of bond with fail_over_mac=active
  2012-10-11  3:34 ` Jay Vosburgh
@ 2012-10-11 10:37   ` Michal Kubecek
  2012-10-12  0:33     ` Jay Vosburgh
  0 siblings, 1 reply; 5+ messages in thread
From: Michal Kubecek @ 2012-10-11 10:37 UTC (permalink / raw)
  To: Jay Vosburgh; +Cc: netdev, Andy Gospodarek

On Wed, Oct 10, 2012 at 08:34:31PM -0700, Jay Vosburgh wrote:
> Michal Kubecek <mkubecek@suse.cz> wrote:
> 	What network device are they using that requires fail_over_mac
> to be set to active?

I would have to ask for the exact configuration; all I know for sure is
that the problem was reported for the s390x (S390-64) architecture. I
reproduced it with VMware Workstation virtual devices which emulate Intel
e1000.

> 	I tested some of this out earlier this year, and I don't recall
> having problems (although I'm not sure I did this exact test).  The
> dev_uc_add() logic (in __dev_set_rx_mode) would put the underlying
> device into promiscuous mode if the hardware didn't support multiple
> unicast MAC addresses. dev_uc_add() was invoked by vlan_sync_address(),
> which is called by the vlan NETDEV_CHANGEADDR notifier callback.

Yes, this part works fine; I checked the uc list in a live crash session.
But as the bonding driver doesn't set its ndo_set_rx_mode method, the
information about the second MAC address doesn't propagate down to the
slaves.

> 	Bonding does propagate promisc to its slaves, but (as you point
> out) not the uc lists; is the hardware in question something that
> supports multiple unicast addresses (IFF_UNICAST_FLT)?  The device I
> tested with does not support IFF_UNICAST_FLT, and (as I recall) would
> end up in promisc mode.

My tests were done with (emulated) e1000 which supports unicast
filtering (up to 14 addresses, according to what I've seen in the
driver). I'm not sure about the devices on s390x.

                                                      Michal Kubecek


* Re: unresponsive vlan on top of bond with fail_over_mac=active
  2012-10-11 10:37   ` Michal Kubecek
@ 2012-10-12  0:33     ` Jay Vosburgh
  2012-10-17 11:08       ` Michal Kubecek
  0 siblings, 1 reply; 5+ messages in thread
From: Jay Vosburgh @ 2012-10-12  0:33 UTC (permalink / raw)
  To: Michal Kubecek; +Cc: netdev, Andy Gospodarek

Michal Kubecek <mkubecek@suse.cz> wrote:

>On Wed, Oct 10, 2012 at 08:34:31PM -0700, Jay Vosburgh wrote:
>> Michal Kubecek <mkubecek@suse.cz> wrote:
>> 	What network device are they using that requires fail_over_mac
>> to be set to active?
>
>I would have to ask for the exact configuration; all I know for sure is
>that the problem was reported for the s390x (S390-64) architecture. I
>reproduced it with VMware Workstation virtual devices which emulate Intel
>e1000.

	Have you tried the "follow" setting to fail_over_mac?

	I looked into this very topic (VLAN address propagation on s390)
earlier this year.  The eventual solution for that case was to use the
"follow" fail_over_mac option, which resolved the problem for the OSA
device (qeth).

	I did submit a patch to do MAC address propagation to VLANs, but
then withdrew it after I figured out that "follow" would also resolve
the problem without code changes.

http://patchwork.ozlabs.org/patch/153551/
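
To illustrate why "follow" avoids the problem, here is a small Python model
of the two policies. It is only a sketch: the failover() helper and the
placeholder addresses "A" and "B" are invented for the example, and it
assumes the old active slave held the bond's address before the failover.

```python
def failover(policy, bond_mac, slave_macs, old_active, new_active):
    """Return (bond_mac, slave_macs) as they look after a failover."""
    macs = list(slave_macs)
    if policy == "active":
        # Slaves keep their own addresses; the bond adopts the address
        # of the newly active slave.
        return macs[new_active], macs
    if policy == "follow":
        # The bond keeps its address; the newly active slave is programmed
        # with it and the formerly active slave takes the other address.
        macs[old_active], macs[new_active] = macs[new_active], macs[old_active]
        return bond_mac, macs
    raise ValueError(f"unknown fail_over_mac policy: {policy}")


vlan_mac = "A"  # the VLAN inherited the bond's address before the failover

bond_mac_active, _ = failover("active", "A", ["A", "B"], 0, 1)
bond_mac_follow, macs_follow = failover("follow", "A", ["A", "B"], 0, 1)
```

Under "active" the bond's address moves away from the one the VLAN
inherited, whereas under "follow" the bond (and therefore the VLAN) keeps
its address and the newly active slave is the one that changes.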

>> 	I tested some of this out earlier this year, and I don't recall
>> having problems (although I'm not sure I did this exact test).  The
>> dev_uc_add() logic (in __dev_set_rx_mode) would put the underlying
>> device into promiscuous mode if the hardware didn't support multiple
>> unicast MAC addresses. dev_uc_add() was invoked by vlan_sync_address(),
>> which is called by the vlan NETDEV_CHANGEADDR notifier callback.
>
>Yes, this part works fine; I checked the uc list in a live crash session.
>But as the bonding driver doesn't set its ndo_set_rx_mode method, the
>information about the second MAC address doesn't propagate down to the
>slaves.

	What kernel are you looking at?  In current mainline, bonding
does have bond_set_multicast_list as ndo_set_rx_mode, although it
doesn't propagate unicast address information, only multicast.  I
believe this has been the case for a long time.

>> 	Bonding does propagate promisc to its slaves, but (as you point
>> out) not the uc lists; is the hardware in question something that
>> supports multiple unicast addresses (IFF_UNICAST_FLT)?  The device I
>> tested with does not support IFF_UNICAST_FLT, and (as I recall) would
>> end up in promisc mode.
>
>My tests were done with (emulated) e1000 which supports unicast
>filtering (up to 14 addresses, according to what I've seen in the
>driver). I'm not sure about the devices on s390x.

	-J

---
	-Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com


* Re: unresponsive vlan on top of bond with fail_over_mac=active
  2012-10-12  0:33     ` Jay Vosburgh
@ 2012-10-17 11:08       ` Michal Kubecek
  0 siblings, 0 replies; 5+ messages in thread
From: Michal Kubecek @ 2012-10-17 11:08 UTC (permalink / raw)
  To: Jay Vosburgh; +Cc: netdev, Andy Gospodarek

On Thu, Oct 11, 2012 at 05:33:48PM -0700, Jay Vosburgh wrote:
> 
> 	Have you tried the "follow" setting to fail_over_mac?

Yes, it works both for me and for the customer. They actually tried it
themselves before reporting the bug. I explained that "follow" is more
suitable also for other reasons and I hope they will accept it.

> 	What kernel are you looking at?  In current mainline, bonding
> does have bond_set_multicast_list as ndo_set_rx_mode, although it
> doesn't propagate unicast address information, only multicast.  I
> believe this has been the case for a long time.

I'm sorry, you are right. What I was looking at was actually 3.0 where
.ndo_set_multicast_list is used rather than .ndo_set_rx_mode (changed by
commit afc4b13d in v3.2-rc1).

Thank you for your help,
                                                        Michal Kubeček
