* Re: bonding: time limits too tight in bond_ab_arp_inspect
2012-08-22 18:42 ` Jay Vosburgh
@ 2012-08-22 18:58 ` Chris Friesen
2012-08-23 7:34 ` Petr Tesarik
2012-08-30 22:02 ` [PATCH] bonding: add some slack to arp monitoring time limits Jiri Bohac
2 siblings, 0 replies; 7+ messages in thread
From: Chris Friesen @ 2012-08-22 18:58 UTC (permalink / raw)
To: Jay Vosburgh; +Cc: Jiri Bohac, Andy Gospodarek, netdev, Petr Tesarik
On 08/22/2012 12:42 PM, Jay Vosburgh wrote:
> Chris Friesen<chris.friesen@genband.com> wrote:
>
>> On 08/22/2012 11:45 AM, Jiri Bohac wrote:
>>
>>> This code is run from bond_activebackup_arp_mon() about
>>> delta_in_ticks jiffies after the previous ARP probe has been
>>> sent. If the delayed work gets executed exactly in delta_in_ticks
>>> jiffies, there is a chance the slave will be brought up. If the
>>> delayed work runs one jiffy later, the slave will stay down.
>
> Presumably the ARP reply is coming back in less than one jiffy,
> then, so the slave_last_rx() value is the same jiffy as when the
> _inspect was previously called?
>
>> <snip>
>>
>>> Should they perhaps all be increased by, say, delta_in_ticks/2, to make this
>>> less dependent on the current scheduling latencies?
>>
>> We have been using a patch that tracks the arpmon requested sleep time vs
>> the actual sleep time and adds any scheduling latency to the allowed
>> delta. That way if we sleep too long due to scheduling latency it doesn't
>> affect the calculation.
>
> How much scheduling latency do you see?
>
> Is that really better than just permitting a bit more slack in
> the timing window?
We hit enough latency that it triggered arpmon to falsely mark multiple
links as lost. This triggered our system maintenance code to go into a
"oh no we can't talk to the outside world" secenario, which does fairly
intrusive things to try and bring connectivity back up. Basically a bad
thing to happen just because of a random scheduler latency spike.
I should note that we added this some time back and are still running
older kernels so I have no idea what latency on modern kernels is like.
Chris
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: bonding: time limits too tight in bond_ab_arp_inspect
2012-08-22 18:42 ` Jay Vosburgh
2012-08-22 18:58 ` Chris Friesen
@ 2012-08-23 7:34 ` Petr Tesarik
2012-08-30 22:02 ` [PATCH] bonding: add some slack to arp monitoring time limits Jiri Bohac
2 siblings, 0 replies; 7+ messages in thread
From: Petr Tesarik @ 2012-08-23 7:34 UTC (permalink / raw)
To: Jay Vosburgh; +Cc: Chris Friesen, Jiri Bohac, Andy Gospodarek, netdev
Dne St 22. srpna 2012 20:42:02 Jay Vosburgh napsal(a):
> Chris Friesen <chris.friesen@genband.com> wrote:
> >On 08/22/2012 11:45 AM, Jiri Bohac wrote:
> >> This code is run from bond_activebackup_arp_mon() about
> >> delta_in_ticks jiffies after the previous ARP probe has been
> >> sent. If the delayed work gets executed exactly in delta_in_ticks
> >> jiffies, there is a chance the slave will be brought up. If the
> >> delayed work runs one jiffy later, the slave will stay down.
>
> Presumably the ARP reply is coming back in less than one jiffy,
> then, so the slave_last_rx() value is the same jiffy as when the
> _inspect was previously called?
Yes, that's what happens. Keep in mind that the backup slave validates the
original ARP query, so on a fast network, you get it more or less immediately
(for my case, I can see a delay of ~70us).
Anyway, why do we have to wait until the next ARP send? Couldn't we simply
kick the work queue when we receive a valid packet on a down interface?
Petr Tesarik
SUSE Linux
^ permalink raw reply [flat|nested] 7+ messages in thread
* [PATCH] bonding: add some slack to arp monitoring time limits
2012-08-22 18:42 ` Jay Vosburgh
2012-08-22 18:58 ` Chris Friesen
2012-08-23 7:34 ` Petr Tesarik
@ 2012-08-30 22:02 ` Jiri Bohac
2012-08-31 20:37 ` David Miller
2 siblings, 1 reply; 7+ messages in thread
From: Jiri Bohac @ 2012-08-30 22:02 UTC (permalink / raw)
To: Jay Vosburgh
Cc: Chris Friesen, Jiri Bohac, Andy Gospodarek, netdev, Petr Tesarik,
davem
Currently, all the time limits in the bonding ARP monitor are in
multiples of arp_interval -- the time interval at which the ARP
monitor is periodically scheduled.
With a fast network round-trip and a little scheduling latency
of the ARP monitor work, a limit of n*delta_in_ticks may
effectively mean (n-1)*delta_in_ticks.
This is fatal in case of n==1 (the link will stay down
forever) and makes the behaviour non-deterministic in all the
other cases.
Add a delta_in_ticks/2 time slack to all the time limits.
Signed-off-by: Jiri Bohac <jbohac@suse.cz>
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 6fae5f3..0f04115 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -2813,12 +2813,13 @@ void bond_loadbalance_arp_mon(struct work_struct *work)
arp_work.work);
struct slave *slave, *oldcurrent;
int do_failover = 0;
- int delta_in_ticks;
+ int delta_in_ticks, extra_ticks;
int i;
read_lock(&bond->lock);
delta_in_ticks = msecs_to_jiffies(bond->params.arp_interval);
+ extra_ticks = delta_in_ticks / 2;
if (bond->slave_cnt == 0)
goto re_arm;
@@ -2841,10 +2842,10 @@ void bond_loadbalance_arp_mon(struct work_struct *work)
if (slave->link != BOND_LINK_UP) {
if (time_in_range(jiffies,
trans_start - delta_in_ticks,
- trans_start + delta_in_ticks) &&
+ trans_start + delta_in_ticks + extra_ticks) &&
time_in_range(jiffies,
slave->dev->last_rx - delta_in_ticks,
- slave->dev->last_rx + delta_in_ticks)) {
+ slave->dev->last_rx + delta_in_ticks + extra_ticks)) {
slave->link = BOND_LINK_UP;
bond_set_active_slave(slave);
@@ -2874,10 +2875,10 @@ void bond_loadbalance_arp_mon(struct work_struct *work)
*/
if (!time_in_range(jiffies,
trans_start - delta_in_ticks,
- trans_start + 2 * delta_in_ticks) ||
+ trans_start + 2 * delta_in_ticks + extra_ticks) ||
!time_in_range(jiffies,
slave->dev->last_rx - delta_in_ticks,
- slave->dev->last_rx + 2 * delta_in_ticks)) {
+ slave->dev->last_rx + 2 * delta_in_ticks + extra_ticks)) {
slave->link = BOND_LINK_DOWN;
bond_set_backup_slave(slave);
@@ -2935,6 +2936,14 @@ static int bond_ab_arp_inspect(struct bonding *bond, int delta_in_ticks)
struct slave *slave;
int i, commit = 0;
unsigned long trans_start;
+ int extra_ticks;
+
+ /* All the time comparisons below need some extra time. Otherwise, on
+ * fast networks the ARP probe/reply may arrive within the same jiffy
+ * as it was sent. Then, the next time the ARP monitor is run, one
+ * arp_interval will already have passed in the comparisons.
+ */
+ extra_ticks = delta_in_ticks / 2;
bond_for_each_slave(bond, slave, i) {
slave->new_link = BOND_LINK_NOCHANGE;
@@ -2942,7 +2951,7 @@ static int bond_ab_arp_inspect(struct bonding *bond, int delta_in_ticks)
if (slave->link != BOND_LINK_UP) {
if (time_in_range(jiffies,
slave_last_rx(bond, slave) - delta_in_ticks,
- slave_last_rx(bond, slave) + delta_in_ticks)) {
+ slave_last_rx(bond, slave) + delta_in_ticks + extra_ticks)) {
slave->new_link = BOND_LINK_UP;
commit++;
@@ -2958,7 +2967,7 @@ static int bond_ab_arp_inspect(struct bonding *bond, int delta_in_ticks)
*/
if (time_in_range(jiffies,
slave->jiffies - delta_in_ticks,
- slave->jiffies + 2 * delta_in_ticks))
+ slave->jiffies + 2 * delta_in_ticks + extra_ticks))
continue;
/*
@@ -2978,7 +2987,7 @@ static int bond_ab_arp_inspect(struct bonding *bond, int delta_in_ticks)
!bond->current_arp_slave &&
!time_in_range(jiffies,
slave_last_rx(bond, slave) - delta_in_ticks,
- slave_last_rx(bond, slave) + 3 * delta_in_ticks)) {
+ slave_last_rx(bond, slave) + 3 * delta_in_ticks + extra_ticks)) {
slave->new_link = BOND_LINK_DOWN;
commit++;
@@ -2994,10 +3003,10 @@ static int bond_ab_arp_inspect(struct bonding *bond, int delta_in_ticks)
if (bond_is_active_slave(slave) &&
(!time_in_range(jiffies,
trans_start - delta_in_ticks,
- trans_start + 2 * delta_in_ticks) ||
+ trans_start + 2 * delta_in_ticks + extra_ticks) ||
!time_in_range(jiffies,
slave_last_rx(bond, slave) - delta_in_ticks,
- slave_last_rx(bond, slave) + 2 * delta_in_ticks))) {
+ slave_last_rx(bond, slave) + 2 * delta_in_ticks + extra_ticks))) {
slave->new_link = BOND_LINK_DOWN;
commit++;
@@ -3029,7 +3038,7 @@ static void bond_ab_arp_commit(struct bonding *bond, int delta_in_ticks)
if ((!bond->curr_active_slave &&
time_in_range(jiffies,
trans_start - delta_in_ticks,
- trans_start + delta_in_ticks)) ||
+ trans_start + delta_in_ticks + delta_in_ticks / 2)) ||
bond->curr_active_slave != slave) {
slave->link = BOND_LINK_UP;
if (bond->current_arp_slave) {
--
Jiri Bohac <jbohac@suse.cz>
SUSE Labs, SUSE CZ
^ permalink raw reply related [flat|nested] 7+ messages in thread