netdev.vger.kernel.org archive mirror
From: Nikolay Aleksandrov <nikolay@redhat.com>
To: Santiago Garcia Mantinan <manty@manty.net>
Cc: netdev@vger.kernel.org
Subject: Re: bonding + arp monitoring fails if interface is a vlan
Date: Fri, 02 Aug 2013 13:58:29 +0200
Message-ID: <51FB9EE5.3040907@redhat.com>
In-Reply-To: <20130801121142.GA444@www.manty.net>

[-- Attachment #1: Type: text/plain, Size: 2800 bytes --]

On 08/01/2013 02:11 PM, Santiago Garcia Mantinan wrote:
> Hi!
> 
> I'm trying to set up a bond of a couple of vlans; these vlans are different
> paths to an upstream switch from a local switch.  I want to do arp
> monitoring of the link in order for the bonding interface to know which path
> is ok and which one is broken.  If I set it up using arp monitoring and
> without using vlans it works ok; it also works if I set it up using vlans
> but without arp monitoring, so the broken setup seems to be bonding +
> arp monitoring + vlans. Here is a schema:
> 
>  -------------
> |Remote Switch|
>  -------------
>    |      |
>    P      P
>    A      A
>    T      T
>    H      H
>    1      2
>    |      |
>  ------------
> |Local switch|
>  ------------
>       |
>       | VLAN for PATH1
>       | VLAN for PATH2
>       |
>  Linux machine
> 
> The broken setup seems to work, but arp monitoring makes it lose the logical
> link from time to time, thus changing to the other slave if available.  What I
> saw when monitoring this with tcpdump is that all the arp requests were
> going out and that all the replies were coming in, so according to the
> traffic seen on tcpdump the link should have been stable, but
> /proc/net/bonding/bond0 showed the link failures increasing, and when testing
> with just a vlan interface I was losing ping when the link was going down.
> 
> I've tried this on Debian wheezy with its 3.2.46 kernel and also the 3.10.3
> version in unstable; the tests were done on a couple of machines using a
> 32-bit kernel with different NICs (r8169 and skge).
> 
> I created a small lab to replicate the problem. In this setup I avoided all
> the switching and directly connected the machine with bonding to another
> Linux machine on which I just had eth0.1002 configured with ip 192.168.1.1.
> The results were the same as in the full scenario: the link on the bonding
> slave was going down from time to time.
> 
> This is the setup on the bonding interface.
> 
> auto bond0
> iface bond0 inet static
>         address 192.168.1.2
>         netmask 255.255.255.0
>         bond-slaves eth0.1002
>         bond-mode active-backup
>         bond-arp_validate 0
>         bond-arp_interval 5000
>         bond-arp_ip_target 192.168.1.1
>         pre-up ip link set eth0 up || true
>         pre-up ip link add link eth0 name eth0.1002 type vlan id 1002 || true
>         down ip link delete eth0.1002 || true
> 
I believe this happens because dev_trans_start() returns 0 for 8021q devices,
so the check for whether the slave has transmitted recently gives the wrong
result and the link flip-flops.
Please try the attached patch; it should resolve your issue (basically, it uses
the dev_trans_start() of the vlan's underlying device when the slave is a vlan).

The patch is against Linus' tree.

Cheers,
 Nik



[-- Attachment #2: bond-trans-start.patch --]
[-- Type: text/x-patch, Size: 1729 bytes --]

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 07f257d4..6aac0ae 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -665,6 +665,16 @@ static int bond_check_dev_link(struct bonding *bond,
 	return reporting ? -1 : BMSR_LSTATUS;
 }
 
+static unsigned long bond_dev_trans_start(struct net_device *dev)
+{
+        struct net_device *real_dev = dev;
+
+        if (dev->priv_flags & IFF_802_1Q_VLAN)
+                real_dev = vlan_dev_real_dev(dev);
+
+        return dev_trans_start(real_dev);
+}
+
 /*----------------------------- Multicast list ------------------------------*/
 
 /*
@@ -2750,7 +2760,7 @@ void bond_loadbalance_arp_mon(struct work_struct *work)
 	 *       so it can wait
 	 */
 	bond_for_each_slave(bond, slave, i) {
-		unsigned long trans_start = dev_trans_start(slave->dev);
+		unsigned long trans_start = bond_dev_trans_start(slave->dev);
 
 		if (slave->link != BOND_LINK_UP) {
 			if (time_in_range(jiffies,
@@ -2912,7 +2922,7 @@ static int bond_ab_arp_inspect(struct bonding *bond, int delta_in_ticks)
 		 * - (more than 2*delta since receive AND
 		 *    the bond has an IP address)
 		 */
-		trans_start = dev_trans_start(slave->dev);
+		trans_start = bond_dev_trans_start(slave->dev);
 		if (bond_is_active_slave(slave) &&
 		    (!time_in_range(jiffies,
 			trans_start - delta_in_ticks,
@@ -2947,7 +2957,7 @@ static void bond_ab_arp_commit(struct bonding *bond, int delta_in_ticks)
 			continue;
 
 		case BOND_LINK_UP:
-			trans_start = dev_trans_start(slave->dev);
+			trans_start = bond_dev_trans_start(slave->dev);
 			if ((!bond->curr_active_slave &&
 			     time_in_range(jiffies,
 					   trans_start - delta_in_ticks,
