Bonding and Neighbour Discovery on IPv6-only devices

netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* Bonding and Neighbour Discovery on IPv6-only devices
@ 2008-09-15 17:35 Alex Sidorenko
  2008-09-15 18:00 ` Jeff Garzik
  0 siblings, 1 reply; 18+ messages in thread
From: Alex Sidorenko @ 2008-09-15 17:35 UTC (permalink / raw)
  To: netdev@vger.kernel.org

Hello,

I suspect that there are some problems while switching slaves manually on 
IPv6-only bonds. I have tested that this does not work on 2.6.18 kernel 
(RHEL5). I looked at recent sources (2.6.26) and even though there were 
multiple changes in bonding code, I still don't see how this could work.

Testing setup
-------------

Two ethernets enslaved (eth2 and eh3), active-backup bond has IPv6-only 
address (no IPv4)

Everything works fine, but if we switch the slave doing

# ifenslave -c bond0 eth3

the incoming ping6 to this host stops for up to several minutes (waiting until 
the switch updates its caches).

Analysis
--------

Switching slaves invokes bond_change_active_slave(). In case of IPv4, we send 
gratuitous ARP doing bond_send_gratuitous_arp(). This informs switches that 
our active slave has changed, they update their ARP-tables and there is no 
delay.

In case of IPv6 only I would have expected NS-packet to be sent (doing the 
same work as gratuitous ARP). But testing shows that there are no packets 
sent when we switch the slave (checked both by network traces and just 
looking at the number of RX/TX packets per interface). So the switches are 
unaware that active slave has changed and we have a delay.

Inspecting the sources, I hoped that we'll send NS as a result of migrating 
multicast addresses from one slave to another doing bond_mc_swap() and then 
doing bond_resend_igmp_join_requests(). But this does not work either:

bond_mc_swap() tries to swap link-multicasts only (bond->dev->mc_list), but 
this does not work as we have users=2 and dev_mc_delete() does not delete 
addresses in this case. Here is what I see on slaves and bond (eth2 is 
active, ip maddr show):

2:      eth2
        link  33:33:00:00:00:fb users 2
        link  33:33:ff:5c:fd:de users 2
        link  33:33:00:00:00:01 users 2
        inet6 ff02::fb
        inet6 ff02::1:ff5c:fdde
        inet6 ff02::1
3:      eth3
        link  33:33:00:00:00:fb
        link  33:33:ff:5c:fd:de
        link  33:33:00:00:00:01
        inet6 ff02::fb
        inet6 ff02::1:ff5c:fdde
        inet6 ff02::1
<snip>
7:      bond0
        link  33:33:00:00:00:fb
        link  33:33:ff:5c:fd:de
        link  33:33:00:00:00:01
        inet6 ff02::fb
        inet6 ff02::1:ff5c:fdde users 2
        inet6 ff02::1

bond_resend_igmp_join_requests() works for IP-device only, not IPv6 - so once 
again, this does not help.

So how is it supposed to work for IPv6-only case? Please note that I am 
talking about switching slaves via 'ifenslave -c', not disconnecting the 
cable (in the latter case the switch will see carrier lost and we'll change 
the slave device state sending an event).

Once again, the testing was done on a rather old (2.6.18) kernel, so I 
apologize if this is already fixed in 2.6.26 (in this case I would appreciate 
if you briefly explain how this works).

Regards,
Alex
-- 
------------------------------------------------------------------
Alexandre Sidorenko             email: alexs@hplinux.canada.hp.com
Global Solutions Engineering:   Unix Networking
Hewlett-Packard (Canada)
------------------------------------------------------------------

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Bonding and Neighbour Discovery on IPv6-only devices
  2008-09-15 17:35 Bonding and Neighbour Discovery on IPv6-only devices Alex Sidorenko
@ 2008-09-15 18:00 ` Jeff Garzik
  2008-09-15 18:16   ` Jay Vosburgh
  2008-09-15 18:16   ` Alex Sidorenko
  0 siblings, 2 replies; 18+ messages in thread
From: Jeff Garzik @ 2008-09-15 18:00 UTC (permalink / raw)
  To: Alex Sidorenko; +Cc: netdev@vger.kernel.org

On Mon, Sep 15, 2008 at 01:35:16PM -0400, Alex Sidorenko wrote:
> Hello,
> 
> I suspect that there are some problems while switching slaves manually on 
> IPv6-only bonds. I have tested that this does not work on 2.6.18 kernel 
> (RHEL5). I looked at recent sources (2.6.26) and even though there were 
> multiple changes in bonding code, I still don't see how this could work.
> 
> Testing setup
> -------------
> 
> Two ethernets enslaved (eth2 and eh3), active-backup bond has IPv6-only 
> address (no IPv4)
> 
> Everything works fine, but if we switch the slave doing
> 
> # ifenslave -c bond0 eth3
> 
> the incoming ping6 to this host stops for up to several minutes (waiting until 
> the switch updates its caches).

We _just_ put in an IPv6 fix, FWIW...

	Jeff




^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Bonding and Neighbour Discovery on IPv6-only devices
  2008-09-15 18:00 ` Jeff Garzik
@ 2008-09-15 18:16   ` Jay Vosburgh
  2008-09-15 18:16   ` Alex Sidorenko
  1 sibling, 0 replies; 18+ messages in thread
From: Jay Vosburgh @ 2008-09-15 18:16 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: Alex Sidorenko, netdev@vger.kernel.org

Jeff Garzik <jeff@garzik.org> wrote:

>On Mon, Sep 15, 2008 at 01:35:16PM -0400, Alex Sidorenko wrote:
[...]
>> Everything works fine, but if we switch the slave doing
>> 
>> # ifenslave -c bond0 eth3
>> 
>> the incoming ping6 to this host stops for up to several minutes (waiting until 
>> the switch updates its caches).
>
>We _just_ put in an IPv6 fix, FWIW...

	The fix that just went in has to do with traffic balancing in
alb/tlb modes; this problem has to do with the lack of an IPv6
equivalent to a gratuitous ARP during a user-induced failover.

	I'm looking into this; it might be simple to fix.

	-J

---
	-Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Bonding and Neighbour Discovery on IPv6-only devices
  2008-09-15 18:00 ` Jeff Garzik
  2008-09-15 18:16   ` Jay Vosburgh
@ 2008-09-15 18:16   ` Alex Sidorenko
  2008-09-24 16:58     ` Vlad Yasevich
  1 sibling, 1 reply; 18+ messages in thread
From: Alex Sidorenko @ 2008-09-15 18:16 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: netdev@vger.kernel.org

On September 15, 2008 02:00:15 pm Jeff Garzik wrote:
> On Mon, Sep 15, 2008 at 01:35:16PM -0400, Alex Sidorenko wrote:
> > Hello,
> >
> > I suspect that there are some problems while switching slaves manually on
> > IPv6-only bonds. I have tested that this does not work on 2.6.18 kernel
> > (RHEL5). I looked at recent sources (2.6.26) and even though there were
> > multiple changes in bonding code, I still don't see how this could work.
> >
> > Testing setup
> > -------------
> >
> > Two ethernets enslaved (eth2 and eh3), active-backup bond has IPv6-only
> > address (no IPv4)
> >
> > Everything works fine, but if we switch the slave doing
> >
> > # ifenslave -c bond0 eth3
> >
> > the incoming ping6 to this host stops for up to several minutes (waiting
> > until the switch updates its caches).
>
> We _just_ put in an IPv6 fix, FWIW...
>

Hi Jeff,

do you mean the patch for ALB/TLB bonds? I looked at it but I still don't 
understand how this helps for active-backup case.

Alex


-- 
------------------------------------------------------------------
Alexandre Sidorenko             email: alexs@hplinux.canada.hp.com
Global Solutions Engineering:   Unix Networking
Hewlett-Packard (Canada)
------------------------------------------------------------------

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Bonding and Neighbour Discovery on IPv6-only devices
  2008-09-15 18:16   ` Alex Sidorenko
@ 2008-09-24 16:58     ` Vlad Yasevich
  2008-09-24 20:29       ` Jay Vosburgh
  0 siblings, 1 reply; 18+ messages in thread
From: Vlad Yasevich @ 2008-09-24 16:58 UTC (permalink / raw)
  To: Alex Sidorenko; +Cc: Jeff Garzik, netdev@vger.kernel.org

Alex Sidorenko wrote:
> On September 15, 2008 02:00:15 pm Jeff Garzik wrote:
>> On Mon, Sep 15, 2008 at 01:35:16PM -0400, Alex Sidorenko wrote:
>>> Hello,
>>>
>>> I suspect that there are some problems while switching slaves manually on
>>> IPv6-only bonds. I have tested that this does not work on 2.6.18 kernel
>>> (RHEL5). I looked at recent sources (2.6.26) and even though there were
>>> multiple changes in bonding code, I still don't see how this could work.
>>>
>>> Testing setup
>>> -------------
>>>
>>> Two ethernets enslaved (eth2 and eh3), active-backup bond has IPv6-only
>>> address (no IPv4)
>>>
>>> Everything works fine, but if we switch the slave doing
>>>
>>> # ifenslave -c bond0 eth3
>>>
>>> the incoming ping6 to this host stops for up to several minutes (waiting
>>> until the switch updates its caches).
>> We _just_ put in an IPv6 fix, FWIW...
>>
> 
> Hi Jeff,
> 
> do you mean the patch for ALB/TLB bonds? I looked at it but I still don't 
> understand how this helps for active-backup case.

It doesn't.  What appears to happen in the active-backup case the MLD reports
are not generated on all of the slave devices, just the active one.  Thus the
switch knows only about the active device.

When you force the failover, no reports happen because the device believes that
the MLD group was already reported.  Thus there needs to be some trigger after
the failover to tell the switch that the mac address and multicast group have
moved to a different port.

-vlad

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Bonding and Neighbour Discovery on IPv6-only devices
  2008-09-24 16:58     ` Vlad Yasevich
@ 2008-09-24 20:29       ` Jay Vosburgh
  2008-09-24 21:07         ` Brian Haley
  2008-09-25  2:46         ` [RFC] bonding: add better ipv6 failover support Brian Haley
  0 siblings, 2 replies; 18+ messages in thread
From: Jay Vosburgh @ 2008-09-24 20:29 UTC (permalink / raw)
  To: Vlad Yasevich; +Cc: Alex Sidorenko, Jeff Garzik, netdev@vger.kernel.org

Vlad Yasevich <vladislav.yasevich@hp.com> wrote:
[...]
>It doesn't.  What appears to happen in the active-backup case the MLD reports
>are not generated on all of the slave devices, just the active one.  Thus the
>switch knows only about the active device.

	This behavior is by design.  Previously, it worked the other way
(every slave did IPv6 addrconf stuff, etc), but that confused the
switches because they'd snoop the addrconf traffic whenever a backup
slave came up, and update the switch forwarding table.

>When you force the failover, no reports happen because the device believes that
>the MLD group was already reported.  Thus there needs to be some trigger after
>the failover to tell the switch that the mac address and multicast group have
>moved to a different port.

	Yes, there needs to be an ipv6 equivalent of the gratuitous ARP
and IGMP rejoin that's done for ipv4.  I haven't figured out how to
accomplish that yet; suggestions are welcome.

	-J

---
	-Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Bonding and Neighbour Discovery on IPv6-only devices
  2008-09-24 20:29       ` Jay Vosburgh
@ 2008-09-24 21:07         ` Brian Haley
  2008-09-25  2:46         ` [RFC] bonding: add better ipv6 failover support Brian Haley
  1 sibling, 0 replies; 18+ messages in thread
From: Brian Haley @ 2008-09-24 21:07 UTC (permalink / raw)
  To: Jay Vosburgh
  Cc: Vlad Yasevich, Alex Sidorenko, Jeff Garzik,
	netdev@vger.kernel.org

Jay Vosburgh wrote:
> Vlad Yasevich <vladislav.yasevich@hp.com> wrote:
> [...]
>> It doesn't.  What appears to happen in the active-backup case the MLD reports
>> are not generated on all of the slave devices, just the active one.  Thus the
>> switch knows only about the active device.
> 
> 	This behavior is by design.  Previously, it worked the other way
> (every slave did IPv6 addrconf stuff, etc), but that confused the
> switches because they'd snoop the addrconf traffic whenever a backup
> slave came up, and update the switch forwarding table.
> 
>> When you force the failover, no reports happen because the device believes that
>> the MLD group was already reported.  Thus there needs to be some trigger after
>> the failover to tell the switch that the mac address and multicast group have
>> moved to a different port.
> 
> 	Yes, there needs to be an ipv6 equivalent of the gratuitous ARP
> and IGMP rejoin that's done for ipv4.  I haven't figured out how to
> accomplish that yet; suggestions are welcome.

I have a patch that does it, just doing some final testing before 
sending it out, hopefully later today...

-Brian

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [RFC] bonding: add better ipv6 failover support
  2008-09-24 20:29       ` Jay Vosburgh
  2008-09-24 21:07         ` Brian Haley
@ 2008-09-25  2:46         ` Brian Haley
  2008-09-25 15:07           ` Jay Vosburgh
  2008-09-26 18:51           ` David Stevens
  1 sibling, 2 replies; 18+ messages in thread
From: Brian Haley @ 2008-09-25  2:46 UTC (permalink / raw)
  To: Jay Vosburgh
  Cc: Vlad Yasevich, Alex Sidorenko, Jeff Garzik,
	netdev@vger.kernel.org

[-- Attachment #1: Type: text/plain, Size: 1298 bytes --]

This is an RFC patch to add better IPv6 failover support for bonding 
devices, especially when in active-backup mode, as reported by Alex 
Sidorenko.

What this patch does:

- Creates a new Kconfig option in the IPv6 Networking section to
   compile-in the support in the bonding driver.  This also forces
   IPV6=y since that's required to link everything.

- Creates a new file, net/drivers/bonding/bond_ipv6.c, for the
   IPv6-specific routines.

- Adds a new master_ipv6 address member to the bonding struct to
   hold a copy of the primary IPv6 address on the bond.

- Adds a new tunable, num_grat_ns, to limit the number of gratuitous
   Neighbor Solicitations that are sent on a failover event.  Default
   is 1.

On failover, this new code will generate two packets:

- An MLD report for the bond, on the current active slave.

- An IPv6 "gratuitous" Neighbor Solicitation, which helps the switch
   learn that the address has moved to the new slave.

Testing has shown that sending just the NS results in pretty good 
behavior when in active-back mode, I saw no lost ping packets for 
example.  Sending just the MLD packet didn't seem to have the same 
effect.  Sending both seems like the right thing to do.

Comments welcome.

-Brian

Signed-off-by: Brian Haley <brian.haley@hp.com>
---

[-- Attachment #2: bonding-ipv6.patch --]
[-- Type: text/x-patch, Size: 15552 bytes --]

diff --git a/drivers/net/bonding/Makefile b/drivers/net/bonding/Makefile
index 5cdae2b..5136115 100644
--- a/drivers/net/bonding/Makefile
+++ b/drivers/net/bonding/Makefile
@@ -6,3 +6,6 @@ obj-$(CONFIG_BONDING) += bonding.o
 
 bonding-objs := bond_main.o bond_3ad.o bond_alb.o bond_sysfs.o
 
+ipv6-$(CONFIG_IPV6_BONDING) += bond_ipv6.o
+bonding-objs += $(ipv6-y)
+
diff --git a/drivers/net/bonding/bond_ipv6.c b/drivers/net/bonding/bond_ipv6.c
new file mode 100644
index 0000000..931c3c2
--- /dev/null
+++ b/drivers/net/bonding/bond_ipv6.c
@@ -0,0 +1,166 @@
+/*
+ * Copyright(c) 2008 Hewlett-Packard Development Company, L.P.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the
+ * Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, write to the Free Software Foundation, Inc.,
+ * 59 Temple Place - Suite 330, Boston, MA  02111-1307, USA.
+ *
+ * The full GNU General Public License is included in this distribution in the
+ * file called LICENSE.
+ *
+ */
+
+//#define BONDING_DEBUG 1
+
+#include <linux/types.h>
+#include <net/ipv6.h>
+#include <net/ndisc.h>
+#include <net/addrconf.h>
+#include "bonding.h"
+
+/*
+ * Assign bond->master_ipv6 to the next IPv6 address in the list, or
+ * zero it out if there are none.
+ */
+static void bond_glean_dev_ipv6(struct net_device *dev, struct in6_addr *addr)
+{
+	struct inet6_dev *idev;
+	struct inet6_ifaddr *ifa;
+
+	if (!dev)
+		return;
+
+	idev = in6_dev_get(dev);
+	if (!idev)
+		return;
+
+	ifa = idev->addr_list;
+	if (ifa)
+		ipv6_addr_copy(addr, &ifa->addr);
+	else
+		ipv6_addr_set(addr, 0, 0, 0, 0);
+
+	in6_dev_put(idev);
+}
+
+/*
+ * Resend an IPv6 MLD report for the bonding device on the current
+ * active slave.
+ */
+void bond_resend_ipv6_mld_report(struct bonding *bond)
+{
+	struct inet6_dev *in6_dev;
+	struct slave *slave = bond->curr_active_slave;
+
+	dprintk("bond_resend_ipv6_mld_report: bond %s slave %s\n",
+				bond->dev->name,
+				slave ? slave->dev->name : "NULL");
+
+	if (!slave ||
+	    test_bit(__LINK_STATE_LINKWATCH_PENDING, &slave->dev->state))
+		return;
+
+	if (ipv6_addr_any(&bond->master_ipv6))
+		return;
+
+	dprintk("ipv6 mld report on slave %s\n", slave->name);
+
+	in6_dev = in6_dev_get(bond->dev);
+	if (in6_dev) {
+		mld_send_report(in6_dev, NULL, slave->dev);
+		in6_dev_put(in6_dev);
+	}
+}
+
+/*
+ * Kick out a gratuitous Neighbor Solicitation for an IPv6 address on
+ * the bonding master.  This will help the switch learn our address
+ * if in active-back mode.
+ *
+ * Caller must hold curr_slave_lock for read or better
+ */
+void bond_send_gratuitous_ns(struct bonding *bond)
+{
+	struct in6_addr mcaddr;
+	struct slave *slave = bond->curr_active_slave;
+
+	dprintk("bond_send_grat_ns: bond %s slave %s\n", bond->dev->name,
+				slave ? slave->dev->name : "NULL");
+
+	if (!slave || !bond->send_grat_ns ||
+	    test_bit(__LINK_STATE_LINKWATCH_PENDING, &slave->dev->state))
+		return;
+
+	bond->send_grat_ns--;
+
+	if (ipv6_addr_any(&bond->master_ipv6))
+		return;
+
+	dprintk("ipv6 ns on slave %s: target %s\n" NIP6_FMT,
+	       slave->name, NIP6(&bond->master_ipv6));
+
+	addrconf_addr_solict_mult(&bond->master_ipv6, &mcaddr);
+	ndisc_send_ns(slave->dev, NULL, &bond->master_ipv6, &mcaddr, &bond->master_ipv6);
+}
+
+/*
+ * bond_inet6addr_event: handle inet6addr notifier chain events.
+ *
+ * We keep track of device IPv6 addresses primarily to use as source
+ * addresses in NS probes.
+ *
+ * We track one IPv6 for the main device (if it has one).
+ */
+static int bond_inet6addr_event(struct notifier_block *this,
+				unsigned long event,
+				void *ptr)
+{
+	struct inet6_ifaddr *ifa = ptr;
+	struct net_device *event_dev = ifa->idev->dev;
+	struct bonding *bond;
+
+	if (dev_net(event_dev) != &init_net)
+		return NOTIFY_DONE;
+
+	list_for_each_entry(bond, &bond_dev_list, bond_list) {
+		if (bond->dev == event_dev) {
+			switch (event) {
+			case NETDEV_UP:
+				ipv6_addr_copy(&bond->master_ipv6, &ifa->addr);
+				return NOTIFY_OK;
+			case NETDEV_DOWN:
+				bond_glean_dev_ipv6(bond->dev,
+						    &bond->master_ipv6);
+				return NOTIFY_OK;
+			default:
+				return NOTIFY_DONE;
+			}
+		}
+	}
+	return NOTIFY_DONE;
+}
+
+static struct notifier_block bond_inet6addr_notifier = {
+	.notifier_call = bond_inet6addr_event,
+};
+
+void bond_register_ipv6_notifier(void)
+{
+	register_inet6addr_notifier(&bond_inet6addr_notifier);
+}
+
+void bond_unregister_ipv6_notifier(void)
+{
+	unregister_inet6addr_notifier(&bond_inet6addr_notifier);
+}
+
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index babe461..5c62626 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -89,6 +89,7 @@
 
 static int max_bonds	= BOND_DEFAULT_MAX_BONDS;
 static int num_grat_arp = 1;
+static int num_grat_ns  = 1;
 static int miimon	= BOND_LINK_MON_INTERV;
 static int updelay	= 0;
 static int downdelay	= 0;
@@ -107,6 +108,8 @@ module_param(max_bonds, int, 0);
 MODULE_PARM_DESC(max_bonds, "Max number of bonded devices");
 module_param(num_grat_arp, int, 0644);
 MODULE_PARM_DESC(num_grat_arp, "Number of gratuitous ARP packets to send on failover event");
+module_param(num_grat_ns, int, 0644);
+MODULE_PARM_DESC(num_grat_ns, "Number of gratuitous IPv6 Neighbor Solicitation packets to send on failover event");
 module_param(miimon, int, 0);
 MODULE_PARM_DESC(miimon, "Link check interval in milliseconds");
 module_param(updelay, int, 0);
@@ -988,6 +991,7 @@ static void bond_mc_swap(struct bonding *bond, struct slave *new_active, struct
 			dev_mc_add(new_active->dev, dmi->dmi_addr, dmi->dmi_addrlen, 0);
 		}
 		bond_resend_igmp_join_requests(bond);
+		bond_resend_ipv6_mld_report(bond);
 	}
 }
 
@@ -1208,6 +1212,9 @@ void bond_change_active_slave(struct bonding *bond, struct slave *new_active)
 			bond->send_grat_arp = bond->params.num_grat_arp;
 			bond_send_gratuitous_arp(bond);
 
+			bond->send_grat_ns = bond->params.num_grat_ns;
+			bond_send_gratuitous_ns(bond);
+
 			write_unlock_bh(&bond->curr_slave_lock);
 			read_unlock(&bond->lock);
 
@@ -2441,6 +2448,12 @@ void bond_mii_monitor(struct work_struct *work)
 		read_unlock(&bond->curr_slave_lock);
 	}
 
+	if (bond->send_grat_ns) {
+		read_lock(&bond->curr_slave_lock);
+		bond_send_gratuitous_ns(bond);
+		read_unlock(&bond->curr_slave_lock);
+	}
+
 	if (bond_miimon_inspect(bond)) {
 		read_unlock(&bond->lock);
 		rtnl_lock();
@@ -3138,6 +3151,12 @@ void bond_activebackup_arp_mon(struct work_struct *work)
 		read_unlock(&bond->curr_slave_lock);
 	}
 
+	if (bond->send_grat_ns) {
+		read_lock(&bond->curr_slave_lock);
+		bond_send_gratuitous_ns(bond);
+		read_unlock(&bond->curr_slave_lock);
+	}
+
 	if (bond_ab_arp_inspect(bond, delta_in_ticks)) {
 		read_unlock(&bond->lock);
 		rtnl_lock();
@@ -3813,6 +3832,7 @@ static int bond_close(struct net_device *bond_dev)
 	write_lock_bh(&bond->lock);
 
 	bond->send_grat_arp = 0;
+	bond->send_grat_ns = 0;
 
 	/* signal timers not to re-arm */
 	bond->kill_timers = 1;
@@ -4522,6 +4542,7 @@ static int bond_init(struct net_device *bond_dev, struct bond_params *params)
 	bond->primary_slave = NULL;
 	bond->dev = bond_dev;
 	bond->send_grat_arp = 0;
+	bond->send_grat_ns = 0;
 	bond->setup_by_slave = 0;
 	INIT_LIST_HEAD(&bond->vlan_list);
 
@@ -4770,6 +4791,13 @@ static int bond_check_params(struct bond_params *params)
 		num_grat_arp = 1;
 	}
 
+	if (num_grat_ns < 0 || num_grat_ns > 255) {
+		printk(KERN_WARNING DRV_NAME
+		       ": Warning: num_grat_ns (%d) not in range 0-255 so it "
+		       "was reset to 1 \n", num_grat_ns);
+		num_grat_ns = 1;
+	}
+
 	/* reset values for 802.3ad */
 	if (bond_mode == BOND_MODE_8023AD) {
 		if (!miimon) {
@@ -4971,6 +4999,7 @@ static int bond_check_params(struct bond_params *params)
 	params->xmit_policy = xmit_hashtype;
 	params->miimon = miimon;
 	params->num_grat_arp = num_grat_arp;
+	params->num_grat_ns = num_grat_ns;
 	params->arp_interval = arp_interval;
 	params->arp_validate = arp_validate_value;
 	params->updelay = updelay;
@@ -5123,6 +5152,7 @@ static int __init bonding_init(void)
 
 	register_netdevice_notifier(&bond_netdev_notifier);
 	register_inetaddr_notifier(&bond_inetaddr_notifier);
+	bond_register_ipv6_notifier();
 
 	goto out;
 err:
@@ -5145,6 +5175,7 @@ static void __exit bonding_exit(void)
 {
 	unregister_netdevice_notifier(&bond_netdev_notifier);
 	unregister_inetaddr_notifier(&bond_inetaddr_notifier);
+	bond_unregister_ipv6_notifier();
 
 	bond_destroy_sysfs();
 
diff --git a/drivers/net/bonding/bond_sysfs.c b/drivers/net/bonding/bond_sysfs.c
index 3bdb473..4079295 100644
--- a/drivers/net/bonding/bond_sysfs.c
+++ b/drivers/net/bonding/bond_sysfs.c
@@ -981,6 +981,46 @@ out:
 	return ret;
 }
 static DEVICE_ATTR(num_grat_arp, S_IRUGO | S_IWUSR, bonding_show_n_grat_arp, bonding_store_n_grat_arp);
+
+/*
+ * Show and set the number of grat NS to send after a failover event.
+ */
+static ssize_t bonding_show_n_grat_ns(struct device *d,
+				   struct device_attribute *attr,
+				   char *buf)
+{
+	struct bonding *bond = to_bond(d);
+
+	return sprintf(buf, "%d\n", bond->params.num_grat_ns);
+}
+
+static ssize_t bonding_store_n_grat_ns(struct device *d,
+				    struct device_attribute *attr,
+				    const char *buf, size_t count)
+{
+	int new_value, ret = count;
+	struct bonding *bond = to_bond(d);
+
+	if (sscanf(buf, "%d", &new_value) != 1) {
+		printk(KERN_ERR DRV_NAME
+		       ": %s: no num_grat_ns value specified.\n",
+		       bond->dev->name);
+		ret = -EINVAL;
+		goto out;
+	}
+	if (new_value < 0 || new_value > 255) {
+		printk(KERN_ERR DRV_NAME
+		       ": %s: Invalid num_grat_ns value %d not in range 0-255; rejected.\n",
+		       bond->dev->name, new_value);
+		ret = -EINVAL;
+		goto out;
+	} else {
+		bond->params.num_grat_ns = new_value;
+	}
+out:
+	return ret;
+}
+static DEVICE_ATTR(num_grat_ns, S_IRUGO | S_IWUSR, bonding_show_n_grat_ns, bonding_store_n_grat_ns);
 /*
  * Show and set the MII monitor interval.  There are two tricky bits
  * here.  First, if MII monitoring is activated, then we must disable
@@ -1419,6 +1459,7 @@ static struct attribute *per_bond_attrs[] = {
 	&dev_attr_lacp_rate.attr,
 	&dev_attr_xmit_hash_policy.attr,
 	&dev_attr_num_grat_arp.attr,
+	&dev_attr_num_grat_ns.attr,
 	&dev_attr_miimon.attr,
 	&dev_attr_primary.attr,
 	&dev_attr_use_carrier.attr,
diff --git a/drivers/net/bonding/bonding.h b/drivers/net/bonding/bonding.h
index fb730ec..a113c06 100644
--- a/drivers/net/bonding/bonding.h
+++ b/drivers/net/bonding/bonding.h
@@ -19,6 +19,7 @@
 #include <linux/proc_fs.h>
 #include <linux/if_bonding.h>
 #include <linux/kobject.h>
+#include <linux/in6.h>
 #include "bond_3ad.h"
 #include "bond_alb.h"
 
@@ -29,6 +30,8 @@
 
 #define BOND_MAX_ARP_TARGETS	16
 
+extern struct list_head bond_dev_list;
+
 #ifdef BONDING_DEBUG
 #define dprintk(fmt, args...) \
 	printk(KERN_DEBUG     \
@@ -126,6 +129,7 @@ struct bond_params {
 	int xmit_policy;
 	int miimon;
 	int num_grat_arp;
+	int num_grat_ns;
 	int arp_interval;
 	int arp_validate;
 	int use_carrier;
@@ -195,6 +199,7 @@ struct bonding {
 	rwlock_t curr_slave_lock;
 	s8       kill_timers;
 	s8	 send_grat_arp;
+	s8	 send_grat_ns;
 	s8	 setup_by_slave;
 	struct   net_device_stats stats;
 #ifdef CONFIG_PROC_FS
@@ -207,6 +212,7 @@ struct bonding {
 	__be32   master_ip;
 	u16      flags;
 	u16      rr_tx_counter;
+	struct   in6_addr master_ipv6;
 	struct   ad_bond_info ad_info;
 	struct   alb_bond_info alb_info;
 	struct   bond_params params;
@@ -333,5 +339,29 @@ void bond_change_active_slave(struct bonding *bond, struct slave *new_active);
 void bond_register_arp(struct bonding *);
 void bond_unregister_arp(struct bonding *);
 
+#ifdef CONFIG_IPV6_BONDING
+void bond_resend_ipv6_mld_report(struct bonding *bond);
+void bond_send_gratuitous_ns(struct bonding *bond);
+void bond_register_ipv6_notifier(void);
+void bond_unregister_ipv6_notifier(void);
+#else
+static inline void bond_resend_ipv6_mld_report(struct bonding *bond)
+{
+	return;
+}
+static inline void bond_send_gratuitous_ns(struct bonding *bond)
+{
+	return;
+}
+static inline void bond_register_ipv6_notifier(void)
+{
+	return;
+}
+static inline void bond_unregister_ipv6_notifier(void)
+{
+	return;
+}
+#endif
+
 #endif /* _LINUX_BONDING_H */
 
diff --git a/include/net/ipv6.h b/include/net/ipv6.h
index 113028f..6f04d60 100644
--- a/include/net/ipv6.h
+++ b/include/net/ipv6.h
@@ -577,6 +577,12 @@ extern int ip6_mc_msfget(struct sock *sk, struct group_filter *gsf,
 			 struct group_filter __user *optval,
 			 int __user *optlen);
 
+/*
+ * mcast.c
+ */
+extern void mld_send_report(struct inet6_dev *idev, struct ifmcaddr6 *pmc,
+			    struct net_device *dev);
+
 #ifdef CONFIG_PROC_FS
 extern int  ac6_proc_init(struct net *net);
 extern void ac6_proc_exit(struct net *net);
diff --git a/net/ipv6/Kconfig b/net/ipv6/Kconfig
index ec99215..bcaf3d4 100644
--- a/net/ipv6/Kconfig
+++ b/net/ipv6/Kconfig
@@ -217,4 +217,11 @@ config IPV6_PIMSM_V2
 	  Support for IPv6 PIM multicast routing protocol PIM-SMv2.
 	  If unsure, say N.
 
+config IPV6_BONDING
+	bool "IPv6: Bonding driver support (EXPERIMENTAL)"
+	depends on IPV6=y && BONDING && EXPERIMENTAL
+	---help---
+	  Support for IPv6 in the bonding driver.
+	  If unsure, say N.
+
 endif # IPV6
diff --git a/net/ipv6/mcast.c b/net/ipv6/mcast.c
index e7c03bc..59a8a8b 100644
--- a/net/ipv6/mcast.c
+++ b/net/ipv6/mcast.c
@@ -1628,7 +1628,8 @@ empty_source:
 	return skb;
 }
 
-static void mld_send_report(struct inet6_dev *idev, struct ifmcaddr6 *pmc)
+void mld_send_report(struct inet6_dev *idev, struct ifmcaddr6 *pmc,
+		     struct net_device *dev)
 {
 	struct sk_buff *skb = NULL;
 	int type;
@@ -1656,9 +1657,14 @@ static void mld_send_report(struct inet6_dev *idev, struct ifmcaddr6 *pmc)
 		skb = add_grec(skb, pmc, type, 0, 0);
 		spin_unlock_bh(&pmc->mca_lock);
 	}
-	if (skb)
+	if (skb) {
+		/* caller can override device to xmit on */
+		if (dev)
+			skb->dev = dev;
 		mld_sendpack(skb);
+	}
 }
+EXPORT_SYMBOL_GPL(mld_send_report);
 
 /*
  * remove zero-count source records from a source filter list
@@ -2197,7 +2203,7 @@ static void mld_gq_timer_expire(unsigned long data)
 	struct inet6_dev *idev = (struct inet6_dev *)data;
 
 	idev->mc_gq_running = 0;
-	mld_send_report(idev, NULL);
+	mld_send_report(idev, NULL, NULL);
 	__in6_dev_put(idev);
 }
 
@@ -2230,7 +2236,7 @@ static void igmp6_timer_handler(unsigned long data)
 	if (MLD_V1_SEEN(ma->idev))
 		igmp6_send(&ma->mca_addr, ma->idev->dev, ICMPV6_MGM_REPORT);
 	else
-		mld_send_report(ma->idev, ma);
+		mld_send_report(ma->idev, ma, NULL);
 
 	spin_lock(&ma->mca_lock);
 	ma->mca_flags |=  MAF_LAST_REPORTER;
diff --git a/net/ipv6/ndisc.c b/net/ipv6/ndisc.c
index f1c62ba..2599484 100644
--- a/net/ipv6/ndisc.c
+++ b/net/ipv6/ndisc.c
@@ -586,6 +586,8 @@ void ndisc_send_ns(struct net_device *dev, struct neighbour *neigh,
 		     !ipv6_addr_any(saddr) ? ND_OPT_SOURCE_LL_ADDR : 0);
 }
 
+EXPORT_SYMBOL_GPL(ndisc_send_ns);
+
 void ndisc_send_rs(struct net_device *dev, const struct in6_addr *saddr,
 		   const struct in6_addr *daddr)
 {

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [RFC] bonding: add better ipv6 failover support
  2008-09-25  2:46         ` [RFC] bonding: add better ipv6 failover support Brian Haley
@ 2008-09-25 15:07           ` Jay Vosburgh
  2008-09-25 15:42             ` Brian Haley
  2008-09-26 18:51           ` David Stevens
  1 sibling, 1 reply; 18+ messages in thread
From: Jay Vosburgh @ 2008-09-25 15:07 UTC (permalink / raw)
  To: Brian Haley
  Cc: Vlad Yasevich, Alex Sidorenko, Jeff Garzik,
	netdev@vger.kernel.org

Brian Haley <brian.haley@hp.com> wrote:

>This is an RFC patch to add better IPv6 failover support for bonding
>devices, especially when in active-backup mode, as reported by Alex
>Sidorenko.
>
>What this patch does:
>
>- Creates a new Kconfig option in the IPv6 Networking section to
>  compile-in the support in the bonding driver.  This also forces
>  IPV6=y since that's required to link everything.

	I think it's probably better to have the IPV6 dependent bits
somehow depend on CONFIG_IPV6 rather than having a Kconfig entry.  I
doubt that many real-world users will say yes to IPv6 and bonding, but
no to the bonding IPv6 support.  I also suspect that the IPV6=y
requirement won't fly with distros.

>- Creates a new file, net/drivers/bonding/bond_ipv6.c, for the
>  IPv6-specific routines.

	Handy.

>- Adds a new master_ipv6 address member to the bonding struct to
>  hold a copy of the primary IPv6 address on the bond.

	Do we need to issue an NS for each ipv6 address, or is one
sufficient?

	Do ipv6 addresses configured on VLANs need one (or more) NS per
VLAN?

>- Adds a new tunable, num_grat_ns, to limit the number of gratuitous
>  Neighbor Solicitations that are sent on a failover event.  Default
>  is 1.
>
>On failover, this new code will generate two packets:
>
>- An MLD report for the bond, on the current active slave.
>
>- An IPv6 "gratuitous" Neighbor Solicitation, which helps the switch
>  learn that the address has moved to the new slave.
>
>Testing has shown that sending just the NS results in pretty good behavior
>when in active-back mode, I saw no lost ping packets for example.  Sending
>just the MLD packet didn't seem to have the same effect.  Sending both
>seems like the right thing to do.

	 I haven't tried the patch yet, so I'll comment further once
I've had a chance to test it (which may not be until tomorrow).

	-J

---
	-Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC] bonding: add better ipv6 failover support
  2008-09-25 15:07           ` Jay Vosburgh
@ 2008-09-25 15:42             ` Brian Haley
  2008-10-01  5:53               ` Simon Horman
  0 siblings, 1 reply; 18+ messages in thread
From: Brian Haley @ 2008-09-25 15:42 UTC (permalink / raw)
  To: Jay Vosburgh
  Cc: Vlad Yasevich, Alex Sidorenko, Jeff Garzik,
	netdev@vger.kernel.org

Jay Vosburgh wrote:
> Brian Haley <brian.haley@hp.com> wrote:
> 
>> This is an RFC patch to add better IPv6 failover support for bonding
>> devices, especially when in active-backup mode, as reported by Alex
>> Sidorenko.
>>
>> What this patch does:
>>
>> - Creates a new Kconfig option in the IPv6 Networking section to
>>  compile-in the support in the bonding driver.  This also forces
>>  IPV6=y since that's required to link everything.
> 
> 	I think it's probably better to have the IPV6 dependent bits
> somehow depend on CONFIG_IPV6 rather than having a Kconfig entry.  I
> doubt that many real-world users will say yes to IPv6 and bonding, but
> no to the bonding IPv6 support.  I also suspect that the IPV6=y
> requirement won't fly with distros.

I'm sure there's a way to do this better, for example, SCTP can be built 
as a module with IPv6 support and have IPV6=m.  I'll try to make it work 
without the option when IPV6=y or m.

>> - Adds a new master_ipv6 address member to the bonding struct to
>>  hold a copy of the primary IPv6 address on the bond.
> 
> 	Do we need to issue an NS for each ipv6 address, or is one
> sufficient?

It didn't seem like it from my testing, that single NS was enough to 
wake-up the switch when pinging either the link-local or global.  I'd 
have to add another global with a different prefix and re-test.

> 	Do ipv6 addresses configured on VLANs need one (or more) NS per
> VLAN?

I didn't test with VLANs, there would probably need to be some 
additional work there.

> 	 I haven't tried the patch yet, so I'll comment further once
> I've had a chance to test it (which may not be until tomorrow).

Thanks,

-Brian

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC] bonding: add better ipv6 failover support
  2008-09-25  2:46         ` [RFC] bonding: add better ipv6 failover support Brian Haley
  2008-09-25 15:07           ` Jay Vosburgh
@ 2008-09-26 18:51           ` David Stevens
  2008-09-26 19:09             ` Jay Vosburgh
                               ` (2 more replies)
  1 sibling, 3 replies; 18+ messages in thread
From: David Stevens @ 2008-09-26 18:51 UTC (permalink / raw)
  To: Brian Haley
  Cc: Alex Sidorenko, fubar, Jeff Garzik, netdev@vger.kernel.org,
	netdev-owner, Vlad Yasevich

1) You're calling mld_send_report() directly, which will send the MLD
        report synchronously. It should use the randomized timer (see 
igmp6_join_group).
        A mass failover (e.g., a power event in a cluster) would blast all 
of these at once,
        which is why the randomized timer is required for gratuitous 
reports. This
        should use a randomized timer, like mld_ifc_start_timer(), but 
joining the
        group all by itself will do that.
2) There is already a configurable and code for unsolicited neighbor 
advertisements
        when adding an address-- why not use that? In fact, wouldn't just 
moving the
        failing device's address list to the new device do everything you 
want, since
        adding an address already sends unsolicited neighbor 
advertisements,
        joins the solicited node address, etc.? Or am I missing something?
3) MLD has a lot of state and it's all associated with the device. 
Changing the sending
        device out from under it seems risky to me. I don't know enough 
about
        bonding, but I think you really just want all the group 
memberships and
        MLD state to be with the master device and the master should just 
go
        through the multicast list for the master and join those groups on 
the
        new slave. The MLD code will already resolve the filters 
appropriately
        for joins and filters already done directly on the new slave that 
way.
                Actually, I thought that's what Jay's prior patch was all 
about, and
        those joins should trigger MLD reports where needed, so I'm 
definitely
        confused on what the problem with multicasts is beyond the 
solicited-node
        addresses (which just needs to mimic the address add code, or use 
it
        directly).

                                                        +-DLS


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC] bonding: add better ipv6 failover support
  2008-09-26 18:51           ` David Stevens
@ 2008-09-26 19:09             ` Jay Vosburgh
  2008-09-26 19:28             ` Brian Haley
  2008-09-26 19:46             ` Vlad Yasevich
  2 siblings, 0 replies; 18+ messages in thread
From: Jay Vosburgh @ 2008-09-26 19:09 UTC (permalink / raw)
  To: David Stevens
  Cc: Brian Haley, Alex Sidorenko, Jeff Garzik, netdev@vger.kernel.org,
	netdev-owner, Vlad Yasevich


David Stevens <dlstevens@us.ibm.com> wrote:
>1) You're calling mld_send_report() directly, which will send the MLD
>        report synchronously. It should use the randomized timer (see 
>igmp6_join_group).
>        A mass failover (e.g., a power event in a cluster) would blast all 
>of these at once,
>        which is why the randomized timer is required for gratuitous 
>reports. This
>        should use a randomized timer, like mld_ifc_start_timer(), but 
>joining the
>        group all by itself will do that.

	I need to do some more reading to have an informed response on
this one (not that I don't believe you; I'm just not familiar with the
MLD specs).

>2) There is already a configurable and code for unsolicited neighbor 
>advertisements
>        when adding an address-- why not use that? In fact, wouldn't just 
>moving the
>        failing device's address list to the new device do everything you 
>want, since
>        adding an address already sends unsolicited neighbor 
>advertisements,
>        joins the solicited node address, etc.? Or am I missing something?

	Ooh, ooh, I can answer this one: The protocol addresses don't
move, they're attached to the bonding master.  The slaves have no
protocol level addresses of their own, so some kind of extra magic has
to take place.

>3) MLD has a lot of state and it's all associated with the device. 
>Changing the sending
>        device out from under it seems risky to me. I don't know enough 
>about
>        bonding, but I think you really just want all the group 
>memberships and
>        MLD state to be with the master device and the master should just 
>go
>        through the multicast list for the master and join those groups on 
>the
>        new slave. The MLD code will already resolve the filters 
>appropriately
>        for joins and filters already done directly on the new slave that 
>way.

	This sound analagous to the IPv4 multicast address handling,
wherein the multicast address list is moved from one slave to another.
Is that a reasonable parallel?

>                Actually, I thought that's what Jay's prior patch was all 
>about, and
>        those joins should trigger MLD reports where needed, so I'm 
>definitely
>        confused on what the problem with multicasts is beyond the 
>solicited-node
>        addresses (which just needs to mimic the address add code, or use 
>it
>        directly).

	I haven't posted any prior patch for this, so I'm not sure what
you're talking about here.

	-J

---
	-Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC] bonding: add better ipv6 failover support
  2008-09-26 18:51           ` David Stevens
  2008-09-26 19:09             ` Jay Vosburgh
@ 2008-09-26 19:28             ` Brian Haley
  2008-09-26 19:55               ` Vlad Yasevich
  2008-09-26 19:46             ` Vlad Yasevich
  2 siblings, 1 reply; 18+ messages in thread
From: Brian Haley @ 2008-09-26 19:28 UTC (permalink / raw)
  To: David Stevens
  Cc: Alex Sidorenko, fubar, Jeff Garzik, netdev@vger.kernel.org,
	netdev-owner, Vlad Yasevich

David Stevens wrote:
> 1) You're calling mld_send_report() directly, which will send the MLD
>         report synchronously. It should use the randomized timer (see 
> igmp6_join_group).
>         A mass failover (e.g., a power event in a cluster) would blast all 
> of these at once,
>         which is why the randomized timer is required for gratuitous 
> reports. This
>         should use a randomized timer, like mld_ifc_start_timer(), but 
> joining the
>         group all by itself will do that.

Ok, I'll try and change this code to spin through all the multicast 
addresses on the master and call igmp6_join_group() instead.

> 2) There is already a configurable and code for unsolicited neighbor 
> advertisements
>         when adding an address-- why not use that? In fact, wouldn't just 
> moving the
>         failing device's address list to the new device do everything you 
> want, since
>         adding an address already sends unsolicited neighbor 
> advertisements,
>         joins the solicited node address, etc.? Or am I missing something?

In this case the address is configured on the bond master, each slave is 
just used for transmit/receive.  While I could have sent an unsolicited 
NA, sending an NS is much easier, especially since it's only notifying 
the switch that the address has moved.

> 3) MLD has a lot of state and it's all associated with the device. 
> Changing the sending
>         device out from under it seems risky to me. I don't know enough 
> about
>         bonding, but I think you really just want all the group 
> memberships and
>         MLD state to be with the master device and the master should just 
> go
>         through the multicast list for the master and join those groups on 
> the
>         new slave. The MLD code will already resolve the filters 
> appropriately
>         for joins and filters already done directly on the new slave that 
> way.
>                 Actually, I thought that's what Jay's prior patch was all 
> about, and
>         those joins should trigger MLD reports where needed, so I'm 
> definitely
>         confused on what the problem with multicasts is beyond the 
> solicited-node
>         addresses (which just needs to mimic the address add code, or use 
> it
>         directly).

Like #1, I'll try changing the code.

Thanks for the comments.

-Brian

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC] bonding: add better ipv6 failover support
  2008-09-26 18:51           ` David Stevens
  2008-09-26 19:09             ` Jay Vosburgh
  2008-09-26 19:28             ` Brian Haley
@ 2008-09-26 19:46             ` Vlad Yasevich
  2 siblings, 0 replies; 18+ messages in thread
From: Vlad Yasevich @ 2008-09-26 19:46 UTC (permalink / raw)
  To: Brian Haley
  Cc: David Stevens, Alex Sidorenko, fubar, Jeff Garzik,
	netdev@vger.kernel.org, netdev-owner

David Stevens wrote:
> 1) You're calling mld_send_report() directly, which will send the MLD
>         report synchronously. It should use the randomized timer (see 
> igmp6_join_group).
>         A mass failover (e.g., a power event in a cluster) would blast all 
> of these at once,
>         which is why the randomized timer is required for gratuitous 
> reports. This
>         should use a randomized timer, like mld_ifc_start_timer(), but 
> joining the
>         group all by itself will do that.

To add to what David said, looks like mld_send_report will always send a
Version 2 report.  This should honor correctly V1 or v2 configuration.

However, to address the random delay, this would have to be static delay
of at most 1 sec.  Otherwise any NUD probes would be lost.

> 3) MLD has a lot of state and it's all associated with the device. 
> Changing the sending device out from under it seems risky to me. I don't know enough 
> about bonding, but I think you really just want all the group 
> memberships and MLD state to be with the master device and the master should just 
> go through the multicast list for the master and join those groups on 
> the new slave. The MLD code will already resolve the filters 
> appropriately for joins and filters already done directly on the new slave that 
> way.  Actually, I thought that's what Jay's prior patch was all 
> about, and those joins should trigger MLD reports where needed, so I'm 
> definitely confused on what the problem with multicasts is beyond the 
> solicited-node addresses (which just needs to mimic the address add code, or use 
> it directly).

Yes, I think this needs a little more thought.  The multicast addresses are
already on the master and also on the active slave.  However, at failover time,
I think those memberships needs to be removed from the old slave, and added
to the new slave.  Alex mentioned that there were some refcounts that didn't
allow for this to happen, but I don't see any.

The trouble I see is that the MLD/IGMPv6 is only sent when an IPv6 multicast
address is added.  In the failover scenario, since IPv6 address is joined on
the bond, we only move the link multicast address from one interface to
another.  This doesn't normally trigger and new report, but this is just what
we want.

Additionally, I think the code should be using an unsolicited NA instead of the
NS, since we do really want to trigger a rediscovery the address and
the associated MAC to make sure that all forwarding state is updated on link.

-vlad

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC] bonding: add better ipv6 failover support
  2008-09-26 19:28             ` Brian Haley
@ 2008-09-26 19:55               ` Vlad Yasevich
  0 siblings, 0 replies; 18+ messages in thread
From: Vlad Yasevich @ 2008-09-26 19:55 UTC (permalink / raw)
  To: Brian Haley
  Cc: David Stevens, Alex Sidorenko, fubar, Jeff Garzik,
	netdev@vger.kernel.org, netdev-owner

Brian Haley wrote:
> David Stevens wrote:
>> 1) You're calling mld_send_report() directly, which will send the MLD
>>         report synchronously. It should use the randomized timer (see
>> igmp6_join_group).
>>         A mass failover (e.g., a power event in a cluster) would blast
>> all of these at once,
>>         which is why the randomized timer is required for gratuitous
>> reports. This
>>         should use a randomized timer, like mld_ifc_start_timer(), but
>> joining the
>>         group all by itself will do that.
> 
> Ok, I'll try and change this code to spin through all the multicast
> addresses on the master and call igmp6_join_group() instead.

I think you would need to call igmp6_leave_group before switching the active
and then join group on the new active.

> 
>> 2) There is already a configurable and code for unsolicited neighbor
>> advertisements
>>         when adding an address-- why not use that? In fact, wouldn't
>> just moving the
>>         failing device's address list to the new device do everything
>> you want, since
>>         adding an address already sends unsolicited neighbor
>> advertisements,
>>         joins the solicited node address, etc.? Or am I missing
>> something?
> 
> In this case the address is configured on the bond master, each slave is
> just used for transmit/receive.  While I could have sent an unsolicited
> NA, sending an NS is much easier, especially since it's only notifying
> the switch that the address has moved.

Why?  NS and NA take exactly the same code paths.  The only difference
appears to be source address lookup, but you don't need to worry about it
since you know the source address ahead of time.

-vlad

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC] bonding: add better ipv6 failover support
  2008-09-25 15:42             ` Brian Haley
@ 2008-10-01  5:53               ` Simon Horman
  2008-10-01 13:24                 ` Brian Haley
  0 siblings, 1 reply; 18+ messages in thread
From: Simon Horman @ 2008-10-01  5:53 UTC (permalink / raw)
  To: Brian Haley
  Cc: Jay Vosburgh, Vlad Yasevich, Alex Sidorenko, Jeff Garzik, netdev

On Thu, Sep 25, 2008 at 11:42:57AM -0400, Brian Haley wrote:
> Jay Vosburgh wrote:
>> Brian Haley <brian.haley@hp.com> wrote:
>>
>>> This is an RFC patch to add better IPv6 failover support for bonding
>>> devices, especially when in active-backup mode, as reported by Alex
>>> Sidorenko.
>>>
>>> What this patch does:
>>>
>>> - Creates a new Kconfig option in the IPv6 Networking section to
>>>  compile-in the support in the bonding driver.  This also forces
>>>  IPV6=y since that's required to link everything.
>>
>> 	I think it's probably better to have the IPV6 dependent bits
>> somehow depend on CONFIG_IPV6 rather than having a Kconfig entry.  I
>> doubt that many real-world users will say yes to IPv6 and bonding, but
>> no to the bonding IPv6 support.  I also suspect that the IPV6=y
>> requirement won't fly with distros.
>
> I'm sure there's a way to do this better, for example, SCTP can be built  
> as a module with IPv6 support and have IPV6=m.  I'll try to make it work  
> without the option when IPV6=y or m.

Hi,

I took a bit of a stab at this, and here is what I cam up with.

It should enable IPV6_BONDING if one of the following is true
  * EXPERIMENTAL=y && IPV6=m && BONDING=m
  * EXPERIMENTAL=y && IPV6=y && BONDING=m
  * EXPERIMENTAL=y && IPV6=y && BONDING=y
And disable IPV6_BONDING in all other cases.

Compile tested only.

diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig
index 69c81da..60c06f8 100644
--- a/drivers/net/Kconfig
+++ b/drivers/net/Kconfig
@@ -61,6 +61,7 @@ config DUMMY
 config BONDING
 	tristate "Bonding driver support"
 	depends on INET
+	select IPV6_BONDING if (IPV6=y || IPV6=BONDING) && EXPERIMENTAL
 	---help---
 	  Say 'Y' or 'M' if you wish to be able to 'bond' multiple Ethernet
 	  Channels together. This is called 'Etherchannel' by Cisco,
diff --git a/net/ipv6/Kconfig b/net/ipv6/Kconfig
index ec99215..49108be 100644
--- a/net/ipv6/Kconfig
+++ b/net/ipv6/Kconfig
@@ -217,4 +217,7 @@ config IPV6_PIMSM_V2
 	  Support for IPv6 PIM multicast routing protocol PIM-SMv2.
 	  If unsure, say N.
 
+config IPV6_BONDING
+	bool
+
 endif # IPV6

-- 
Simon Horman
  VA Linux Systems Japan K.K., Sydney, Australia Satellite Office
  H: www.vergenet.net/~horms/             W: www.valinux.co.jp/en


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [RFC] bonding: add better ipv6 failover support
  2008-10-01  5:53               ` Simon Horman
@ 2008-10-01 13:24                 ` Brian Haley
  2008-10-01 13:36                   ` David Miller
  0 siblings, 1 reply; 18+ messages in thread
From: Brian Haley @ 2008-10-01 13:24 UTC (permalink / raw)
  To: Simon Horman
  Cc: Jay Vosburgh, Vlad Yasevich, Alex Sidorenko, Jeff Garzik, netdev

Hi Simon,

Simon Horman wrote:
> I took a bit of a stab at this, and here is what I cam up with.
> 
> It should enable IPV6_BONDING if one of the following is true
>   * EXPERIMENTAL=y && IPV6=m && BONDING=m
>   * EXPERIMENTAL=y && IPV6=y && BONDING=m
>   * EXPERIMENTAL=y && IPV6=y && BONDING=y
> And disable IPV6_BONDING in all other cases.
> 
> Compile tested only.
> 
> diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig
> index 69c81da..60c06f8 100644
> --- a/drivers/net/Kconfig
> +++ b/drivers/net/Kconfig
> @@ -61,6 +61,7 @@ config DUMMY
>  config BONDING
>  	tristate "Bonding driver support"
>  	depends on INET
> +	select IPV6_BONDING if (IPV6=y || IPV6=BONDING) && EXPERIMENTAL
>  	---help---
>  	  Say 'Y' or 'M' if you wish to be able to 'bond' multiple Ethernet
>  	  Channels together. This is called 'Etherchannel' by Cisco,
> diff --git a/net/ipv6/Kconfig b/net/ipv6/Kconfig
> index ec99215..49108be 100644
> --- a/net/ipv6/Kconfig
> +++ b/net/ipv6/Kconfig
> @@ -217,4 +217,7 @@ config IPV6_PIMSM_V2
>  	  Support for IPv6 PIM multicast routing protocol PIM-SMv2.
>  	  If unsure, say N.
>  
> +config IPV6_BONDING
> +	bool
> +
>  endif # IPV6

I actually figured out how to do this without having a new Kconfig 
option at all, and building bonding_ipv6.c without getting a linker 
error.  I'll send out an updated patch soon, just sorting out the vlan 
support right now.

Thanks,

-Brian

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC] bonding: add better ipv6 failover support
  2008-10-01 13:24                 ` Brian Haley
@ 2008-10-01 13:36                   ` David Miller
  0 siblings, 0 replies; 18+ messages in thread
From: David Miller @ 2008-10-01 13:36 UTC (permalink / raw)
  To: brian.haley
  Cc: horms, fubar, vladislav.yasevich, alexandre.sidorenko, jeff,
	netdev

From: Brian Haley <brian.haley@hp.com>
Date: Wed, 01 Oct 2008 09:24:08 -0400

> I actually figured out how to do this without having a new Kconfig
> option at all, and building bonding_ipv6.c without getting a linker
> error.  I'll send out an updated patch soon, just sorting out the
> vlan support right now.
> 

Great, I'll wait for that meanwhile, thanks everyone.

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2008-10-01 13:36 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-09-15 17:35 Bonding and Neighbour Discovery on IPv6-only devices Alex Sidorenko
2008-09-15 18:00 ` Jeff Garzik
2008-09-15 18:16   ` Jay Vosburgh
2008-09-15 18:16   ` Alex Sidorenko
2008-09-24 16:58     ` Vlad Yasevich
2008-09-24 20:29       ` Jay Vosburgh
2008-09-24 21:07         ` Brian Haley
2008-09-25  2:46         ` [RFC] bonding: add better ipv6 failover support Brian Haley
2008-09-25 15:07           ` Jay Vosburgh
2008-09-25 15:42             ` Brian Haley
2008-10-01  5:53               ` Simon Horman
2008-10-01 13:24                 ` Brian Haley
2008-10-01 13:36                   ` David Miller
2008-09-26 18:51           ` David Stevens
2008-09-26 19:09             ` Jay Vosburgh
2008-09-26 19:28             ` Brian Haley
2008-09-26 19:55               ` Vlad Yasevich
2008-09-26 19:46             ` Vlad Yasevich

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).