netdev.vger.kernel.org archive mirror
From: Jarod Wilson <jarod@redhat.com>
To: Jay Vosburgh <jay.vosburgh@canonical.com>
Cc: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>,
	linux-kernel@vger.kernel.org,
	Uwe Koziolek <uwe.koziolek@redknee.com>,
	Andy Gospodarek <gospo@cumulusnetworks.com>,
	Veaceslav Falico <vfalico@gmail.com>,
	netdev@vger.kernel.org
Subject: Re: [PATCH v4] net/bonding: send arp in interval if no active slave
Date: Mon, 12 Oct 2015 11:33:03 -0400
Message-ID: <561BD2AF.5040803@redhat.com>
In-Reply-To: <11019.1444404713@famine>

Jay Vosburgh wrote:
> Jarod Wilson <jarod@redhat.com> wrote:
>
>> Jarod Wilson wrote:
>> ...
>>> As Andy already stated I'm not a fan of such workarounds either but it's
>>> necessary sometimes so if this is going to be actually considered then a
>>> few things need to be fixed. Please make this a proper bonding option
>>> which can be changed at runtime and not only via a module parameter.
>> Is there any particular userspace tool that would need some updating, or
>> is adding the sysfs knobs sufficient here? I think I've got all the sysfs
>> stuff thrown together now, but still need to test.
>
> 	Most (all?) bonding options should be configurable via iproute
> (netlink) now.

D'oh, of course. I've done the kernel-side netlink bits now too, and 
started looking at the iproute source. However...


>>>> Now, I saw that you've only tested with 500 ms, can't this be fixed by
>>>> using
>>>> a different interval ? This seems like a very specific problem to have a
>>>> whole new option for.
>>> ...I'll wait until we've heard confirmation from Uwe that intervals
>>> other than 500ms don't fix things.
>> Okay, so I believe the "only tested with 500ms" was in reference to
>> testing with Uwe's initial patch. I do have supporting evidence in a
>> bugzilla report showing the problem still occurs with intervals of
>> 5000ms and up.
>
> 	I did set up some switches and attempt to reproduce this
> yesterday; I daisy-chained three switches (two Cisco and an HP) together
> and connected the bonded interfaces to the "end" switches.  I tried
> various ARP targets (the switch, hosts on various points of the switch)
> and varying arp_intervals and was unable to reproduce the problem.
>
> 	As I understand it, the working theory is something like this:
>
> 	- host with two bonded interfaces, A and B.  For active-backup
> mode, the interfaces have been assigned the same MAC address.
>
> 	- switch has MAC for B in its forwarding table
>
> 	- bonding goes from down to up, and thinks all its slaves are
> down, and starts the "curr_arp_slave" search for an active
> arp_ip_target.  In this case, it starts with A, and sends an ARP from A.
>
> 	As an aside, I'm not 100% clear on what exactly is going on in
> the "bonding goes from down to up" transition; this seems to be key in
> reproducing the issue.
>
> 	- switch sees source mac coming from port A, starts to update
> its forwarding table
>
> 	- meanwhile, switch forwards ARP request, and receives ARP
> reply, which it forwards to port B.  Bonding drops this, as the slave is
> inactive.
>
> 	- switch finishes updating forwarding table, MAC is now assigned
> to port A.
>
> 	- bonding now tries sending on port B, and the cycle repeats.
>
> 	If this is what's taking place, then the arp_interval itself is
> irrelevant, the race is between the switch table update and the
> generation of the ARP reply.
>
> 	Also, presuming the above is what's going on, we could modify
> the ARP "curr_arp_slave" logic a bit to resolve this without requiring
> any magic knobs.

I really like this idea. Still trying to grasp exactly how we get into 
this situation and what everything looks like as we hop through the 
various bond_ab_arp_* functions though.

> 	For example, we could change the "drop on inactive" logic to
> recognise the "curr_arp_slave" search and accept the unicast ARP reply,
> and perhaps make that receiving slave the next curr_arp_slave
> automatically.

The fact that nothing ever actually gets picked as curr_arp_slave does
appear to be the problem, so that does sound like it could do the trick.

> 	I also wonder if the fail_over_mac option would affect this
> behavior, as it would cause the slaves to keep their MAC address for the
> duration, so the switch would not see the MAC move from port to port.

Not sure if that's an option for the particular environment, but we 
could certainly ask Uwe to give it a try.

> 	Another thought would be to have the curr_arp_slave cycle
> through the slaves in random order, but that could create
> non-deterministic results even when things are working correctly.

I'd say we should avoid this route if at all possible; I would rather
not make things less predictable.

-- 
Jarod Wilson
jarod@redhat.com


Thread overview: 22+ messages
2015-08-17 16:23 [PATCH] net/bonding: send arp in interval if no active slave Jarod Wilson
2015-08-17 16:55 ` Veaceslav Falico
2015-08-17 17:12   ` Jarod Wilson
2015-08-17 18:56     ` Uwe Koziolek
2015-08-17 19:14       ` Jay Vosburgh
2015-08-17 20:51         ` Uwe Koziolek
2015-08-31 22:21           ` Jarod Wilson
2015-09-01 23:15             ` Uwe Koziolek
2015-09-01 15:41           ` Andy Gospodarek
2015-09-01 23:10             ` Uwe Koziolek
2015-09-03 15:05               ` Jay Vosburgh
2015-09-04 11:04                 ` Uwe Koziolek
2015-09-28 13:31                   ` Jarod Wilson
2015-10-06 19:53                     ` [PATCH v4] " Jarod Wilson
2015-10-06 19:58                       ` Jarod Wilson
2015-10-07 12:03                       ` Nikolay Aleksandrov
2015-10-07 13:29                         ` Jarod Wilson
2015-10-09 14:36                           ` Jarod Wilson
2015-10-09 15:25                             ` Nikolay Aleksandrov
2015-10-09 15:31                             ` Jay Vosburgh
2015-10-12 15:33                               ` Jarod Wilson [this message]
2015-10-30 18:59                           ` Uwe Koziolek
