From: Nikolay Aleksandrov <nikolay@redhat.com>
To: Santiago Garcia Mantinan <manty@manty.net>
Cc: netdev@vger.kernel.org
Subject: Re: bonding + arp monitoring fails if interface is a vlan
Date: Fri, 02 Aug 2013 13:58:29 +0200
Message-ID: <51FB9EE5.3040907@redhat.com>
In-Reply-To: <20130801121142.GA444@www.manty.net>

[-- Attachment #1: Type: text/plain, Size: 2800 bytes --]

On 08/01/2013 02:11 PM, Santiago Garcia Mantinan wrote:
> Hi!
> 
> I'm trying to set up a bond of a couple of vlans; these vlans are different
> paths to an upstream switch from a local switch.  I want to do arp
> monitoring of the link in order for the bonding interface to know which path
> is ok and which one is broken.  If I set it up using arp monitoring and
> without using vlans it works ok, it also works if I set it up using vlans
> but without arp monitoring, so the broken setup seems to be with bonding +
> arp monitoring + vlans. Here is a schema:
> 
>  -------------
> |Remote Switch|
>  -------------
>    |      |
>    P      P
>    A      A
>    T      T
>    H      H
>    1      2
>    |      |
>  ------------
> |Local switch|
>  ------------
>       |
>       | VLAN for PATH1
>       | VLAN for PATH2
>       |
>  Linux machine
> 
> The broken setup seems to work but arp monitoring makes it lose the logical
> link from time to time, thus changing to the other slave if available.  What I
> saw when monitoring this with tcpdump is that all the arp requests were
> going out and that all the replies were coming in, so according to the
> traffic seen on tcpdump the link should have been stable, but
> /proc/net/bonding/bond0 showed the link failures increasing and when testing
> with just a vlan interface I was losing ping when the link was going down.
> 
> I've tried this on Debian wheezy with its 3.2.46 kernel and also the 3.10.3
> version in unstable; the tests were done on a couple of machines using a
> 32-bit kernel with different NICs (r8169 and skge).
> 
> I created a small lab to replicate the problem. In this setup I avoided all
> the switching and directly connected the machine with bonding to another
> Linux machine on which I just had eth0.1002 configured with ip 192.168.1.1.
> The results were the same as in the full scenario: link on the bonding slave
> was going down from time to time.
> 
> This is the setup on the bonding interface.
> 
> auto bond0
> iface bond0 inet static
>         address 192.168.1.2
>         netmask 255.255.255.0
>         bond-slaves eth0.1002
>         bond-mode active-backup
>         bond-arp_validate 0
>         bond-arp_interval 5000
>         bond-arp_ip_target 192.168.1.1
>         pre-up ip link set eth0 up || true
>         pre-up ip link add link eth0 name eth0.1002 type vlan id 1002 || true
>         down ip link delete eth0.1002 || true
> 
I believe this is because dev_trans_start() returns 0 for 8021q devices, so
the checks for whether the slave has transmitted recently come out wrong and
the link flip-flops.
Please try the attached patch; it should resolve your issue (basically, if the
slave is a vlan, it uses the dev_trans_start of the vlan's underlying device).

The patch is against Linus' tree.
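
To illustrate the reasoning above, here is a minimal user-space sketch. It is
not kernel code: every mock_* name, the struct layout, and the flag value are
assumptions made up for the illustration, and the window check only mirrors
the general shape of the time_in_range() tests in the arp monitor. It shows
how a last-transmit timestamp of 0 falls outside the "recently transmitted"
window, while the timestamp of the underlying device does not.

```c
#include <stddef.h>

/* Hypothetical stand-in for the kernel's IFF_802_1Q_VLAN private flag. */
#define MOCK_IFF_802_1Q_VLAN 0x1u

/* Hypothetical, heavily reduced stand-in for struct net_device. */
struct mock_net_device {
	const char *name;
	unsigned int priv_flags;
	unsigned long trans_start;        /* tick count of the last transmit */
	struct mock_net_device *real_dev; /* underlying device for a vlan */
};

/* Wrap-safe "a is at or after b" on unsigned long tick counters. */
static int mock_time_after_eq(unsigned long a, unsigned long b)
{
	return (long)(a - b) >= 0;
}

/* Wrap-safe "lo <= t <= hi". */
static int mock_time_in_range(unsigned long t, unsigned long lo,
			      unsigned long hi)
{
	return mock_time_after_eq(t, lo) && mock_time_after_eq(hi, t);
}

/* Same idea as the patch: for a vlan, look at the underlying device. */
unsigned long mock_bond_dev_trans_start(const struct mock_net_device *dev)
{
	if ((dev->priv_flags & MOCK_IFF_802_1Q_VLAN) && dev->real_dev)
		dev = dev->real_dev;

	return dev->trans_start;
}

/* Returns 1 if "now" lies within +/- delta ticks of trans_start. */
int mock_recently_transmitted(unsigned long now, unsigned long trans_start,
			      unsigned long delta)
{
	return mock_time_in_range(now, trans_start - delta,
				  trans_start + delta);
}
```

For example, with now = 100000 and delta = 1250, a vlan device whose own
trans_start is 0 and whose underlying device last transmitted at 99990 fails
mock_recently_transmitted() when its own trans_start is used, and passes when
the value from mock_bond_dev_trans_start() is used instead.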

Cheers,
 Nik



[-- Attachment #2: bond-trans-start.patch --]
[-- Type: text/x-patch, Size: 1729 bytes --]

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 07f257d4..6aac0ae 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -665,6 +665,16 @@ static int bond_check_dev_link(struct bonding *bond,
 	return reporting ? -1 : BMSR_LSTATUS;
 }
 
+static unsigned long bond_dev_trans_start(struct net_device *dev)
+{
+        struct net_device *real_dev = dev;
+
+        if (dev->priv_flags & IFF_802_1Q_VLAN)
+                real_dev = vlan_dev_real_dev(dev);
+
+        return dev_trans_start(real_dev);
+}
+
 /*----------------------------- Multicast list ------------------------------*/
 
 /*
@@ -2750,7 +2760,7 @@ void bond_loadbalance_arp_mon(struct work_struct *work)
 	 *       so it can wait
 	 */
 	bond_for_each_slave(bond, slave, i) {
-		unsigned long trans_start = dev_trans_start(slave->dev);
+		unsigned long trans_start = bond_dev_trans_start(slave->dev);
 
 		if (slave->link != BOND_LINK_UP) {
 			if (time_in_range(jiffies,
@@ -2912,7 +2922,7 @@ static int bond_ab_arp_inspect(struct bonding *bond, int delta_in_ticks)
 		 * - (more than 2*delta since receive AND
 		 *    the bond has an IP address)
 		 */
-		trans_start = dev_trans_start(slave->dev);
+		trans_start = bond_dev_trans_start(slave->dev);
 		if (bond_is_active_slave(slave) &&
 		    (!time_in_range(jiffies,
 			trans_start - delta_in_ticks,
@@ -2947,7 +2957,7 @@ static void bond_ab_arp_commit(struct bonding *bond, int delta_in_ticks)
 			continue;
 
 		case BOND_LINK_UP:
-			trans_start = dev_trans_start(slave->dev);
+			trans_start = bond_dev_trans_start(slave->dev);
 			if ((!bond->curr_active_slave &&
 			     time_in_range(jiffies,
 					   trans_start - delta_in_ticks,
