From: Chris Friesen <chris.friesen@genband.com>
To: Jay Vosburgh <fubar@us.ibm.com>
Cc: Jiri Bohac <jbohac@suse.cz>, Andy Gospodarek <andy@greyhouse.net>,
	netdev@vger.kernel.org, Petr Tesarik <ptesarik@suse.cz>
Subject: Re: bonding: time limits too tight in bond_ab_arp_inspect
Date: Wed, 22 Aug 2012 12:58:24 -0600
Message-ID: <50352BD0.3060409@genband.com>
In-Reply-To: <24655.1345660922@death.nxdomain>

On 08/22/2012 12:42 PM, Jay Vosburgh wrote:
> Chris Friesen <chris.friesen@genband.com> wrote:
>
>> On 08/22/2012 11:45 AM, Jiri Bohac wrote:
>>
>>> This code is run from bond_activebackup_arp_mon() about
>>> delta_in_ticks jiffies after the previous ARP probe has been
>>> sent. If the delayed work gets executed exactly in delta_in_ticks
>>> jiffies, there is a chance the slave will be brought up.  If the
>>> delayed work runs one jiffy later, the slave will stay down.
>
> 	Presumably the ARP reply is coming back in less than one jiffy,
> then, so the slave_last_rx() value is the same jiffy as when the
> _inspect was previously called?
>
>> <snip>
>>
>>> Should they perhaps all be increased by, say, delta_in_ticks/2, to make this
>>> less dependent on the current scheduling latencies?
>>
>> We have been using a patch that tracks the arpmon requested sleep time vs
>> the actual sleep time and adds any scheduling latency to the allowed
>> delta.  That way if we sleep too long due to scheduling latency it doesn't
>> affect the calculation.
>
> 	How much scheduling latency do you see?
>
> 	Is that really better than just permitting a bit more slack in
> the timing window?

We hit enough latency that it triggered arpmon to falsely mark multiple
links as lost.  This triggered our system maintenance code to go into an
"oh no, we can't talk to the outside world" scenario, which does fairly
intrusive things to try to bring connectivity back up.  Basically a bad
thing to happen just because of a random scheduler latency spike.

I should note that we added this some time back and are still running 
older kernels so I have no idea what latency on modern kernels is like.
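
For illustration only, the latency-compensation idea described above
(compare requested vs. actual sleep time, add the overshoot to the allowed
delta) could be sketched roughly as follows.  This is not the actual patch
and not bonding driver code: struct arpmon_state, sched_latency() and
rx_within_limit() are hypothetical names, and plain unsigned tick counters
stand in for jiffies.

```c
#include <stdbool.h>

typedef unsigned long ticks_t;	/* stand-in for jiffies (assumption) */

/* Hypothetical monitor state; not a real bonding structure. */
struct arpmon_state {
	ticks_t delta_in_ticks;	/* requested sleep between monitor passes */
	ticks_t last_run;	/* tick at which the previous pass ran */
};

/*
 * Ticks slept beyond the requested interval.  Unsigned subtraction keeps
 * the elapsed time correct across counter wraparound.
 */
static ticks_t sched_latency(const struct arpmon_state *st, ticks_t now)
{
	ticks_t slept = now - st->last_run;

	return slept > st->delta_in_ticks ? slept - st->delta_in_ticks : 0;
}

/*
 * True if last_rx falls inside the allowed window of 'intervals' monitor
 * periods, widened by whatever scheduling latency this pass suffered.
 */
static bool rx_within_limit(const struct arpmon_state *st, ticks_t now,
			    ticks_t last_rx, unsigned int intervals)
{
	ticks_t allowed = intervals * st->delta_in_ticks +
			  sched_latency(st, now);

	return now - last_rx <= allowed;
}
```

With a 100-tick interval, a pass that runs 3 ticks late gets a 103-tick
window for a one-interval check, so the overshoot alone cannot push a
recently seen reply outside the limit.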

Chris
