From: Jarod Wilson <jarod@redhat.com>
To: Veaceslav Falico <vfalico@gmail.com>
Cc: linux-kernel@vger.kernel.org,
	Uwe Koziolek <uwe.koziolek@redknee.com>,
	Jay Vosburgh <j.vosburgh@gmail.com>,
	Andy Gospodarek <gospo@cumulusnetworks.com>,
	netdev@vger.kernel.org
Subject: Re: [PATCH] net/bonding: send arp in interval if no active slave
Date: Mon, 17 Aug 2015 13:12:23 -0400
Message-ID: <55D215F7.3080905@redhat.com>
In-Reply-To: <20150817165500.GA21512@vps.falico.eu>

On 2015-08-17 12:55 PM, Veaceslav Falico wrote:
> On Mon, Aug 17, 2015 at 12:23:03PM -0400, Jarod Wilson wrote:
>> From: Uwe Koziolek <uwe.koziolek@redknee.com>
>>
>> With some very finicky switch hardware, active backup bonding can get into
>> a situation where we play ping-pong between interfaces, trying to get one
>> to come up as the active slave. There seems to be an issue with the
>> switch's arp replies either taking too long, or simply getting lost, so we
>> wind up unable to get any interface up and active. Sometimes, the issue
>> sorts itself out after a while, sometimes it doesn't.
>>
>> Testing with num_grat_arp has proven fruitless, but sending an additional
>> arp on curr_arp_slave if we're still in the arp_interval timeslice in
>> bond_ab_arp_probe(), has shown to produce 100% reliability in testing with
>> this hardware combination.
>
> Sorry, I don't understand the logic of why it works, and what exactly are
> we fixing here.
>
> It also breaks completely the logic for link state management in case of no
> current active slave for 2*arp_interval.
>
> Could you please elaborate what exactly is fixed here, and how it works? :)

Either Uwe or I can duplicate some information from the bug to
illustrate the exact nature of the problem.

> p.s. num_grat_arp maybe could help?

That was my thought as well, but as I understand it, that route was 
explored, and it didn't help any. I don't actually have a reproducer 
setup of my own, unfortunately, so I'm kind of caught in the middle here...

Uwe, can you perhaps further enlighten us as to what num_grat_arp 
settings were tried that didn't help? I'm still of the mind that if 
num_grat_arp *didn't* help, we probably need to do something keyed off 
num_grat_arp.


>> [jarod: manufacturing of changelog]
>> CC: Jay Vosburgh <j.vosburgh@gmail.com>
>> CC: Veaceslav Falico <vfalico@gmail.com>
>> CC: Andy Gospodarek <gospo@cumulusnetworks.com>
>> CC: netdev@vger.kernel.org
>> Signed-off-by: Uwe Koziolek <uwe.koziolek@redknee.com>
>> Signed-off-by: Jarod Wilson <jarod@redhat.com>
>> ---
>> drivers/net/bonding/bond_main.c | 5 +++++
>> 1 file changed, 5 insertions(+)
>>
>> diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
>> index 0c627b4..60b9483 100644
>> --- a/drivers/net/bonding/bond_main.c
>> +++ b/drivers/net/bonding/bond_main.c
>> @@ -2794,6 +2794,11 @@ static bool bond_ab_arp_probe(struct bonding *bond)
>>             return should_notify_rtnl;
>>     }
>>
>> +    if (bond_time_in_interval(bond, curr_arp_slave->last_link_up, 2)) {
>> +        bond_arp_send_all(bond, curr_arp_slave);
>> +        return should_notify_rtnl;
>> +    }
>> +
>>     bond_set_slave_inactive_flags(curr_arp_slave, BOND_SLAVE_NOTIFY_LATER);
>>
>>     bond_for_each_slave_rcu(bond, slave, iter) {
>> --
>> 1.8.3.1
>>
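
For readers following the timing argument, the check added by the quoted
hunk can be sketched in userspace C. This is only an illustration under
stated assumptions: time_in_interval() is a simplified stand-in for what
bond_time_in_interval() is understood to do (the exact bounds in the kernel
may differ), plain long tick counts replace jiffies, wraparound is ignored,
and probe_decision() and the enum are hypothetical names that do not exist
in bond_main.c.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical outcome of one probe pass; not a kernel type. */
enum probe_action {
    PROBE_RESEND_ARP,     /* stay on curr_arp_slave and send ARPs again */
    PROBE_TRY_NEXT_SLAVE  /* fall through and rotate to another slave */
};

/*
 * Simplified stand-in for bond_time_in_interval() (assumed bounds):
 * true while `now` lies between last_act - interval and
 * last_act + mod * interval + interval / 2. Counter wraparound, which
 * the kernel handles, is deliberately ignored here.
 */
static bool time_in_interval(long now, long last_act, long interval, int mod)
{
    assert(interval > 0);
    return now >= last_act - interval &&
           now <= last_act + mod * interval + interval / 2;
}

/*
 * Mirrors the shape of the added hunk: with mod == 2, the probe keeps
 * re-sending ARPs on the current slave for roughly 2.5 intervals after
 * it was last brought up before moving on to the next slave.
 */
static enum probe_action probe_decision(long now, long last_link_up,
                                        long interval)
{
    if (time_in_interval(now, last_link_up, interval, 2))
        return PROBE_RESEND_ARP;
    return PROBE_TRY_NEXT_SLAVE;
}
```

Under these assumptions, with an interval of 100 ticks and last_link_up at
tick 1000, the sketch keeps re-sending through tick 1250 and only rotates
to the next slave afterwards, which is the 2*arp_interval hold that
Veaceslav's question is about.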


-- 
Jarod Wilson
jarod@redhat.com
