Re: [BUG?] bonding, slave selection, carrier loss, etc.

netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: Jay Vosburgh <fubar@us.ibm.com>
To: Chris Friesen <chris.friesen@genband.com>
Cc: andy@greyhouse.net, netdev <netdev@vger.kernel.org>
Subject: Re: [BUG?] bonding, slave selection, carrier loss, etc.
Date: Fri, 10 Feb 2012 17:53:53 -0800	[thread overview]
Message-ID: <28766.1328925233@death.nxdomain> (raw)
In-Reply-To: <4F35AC78.3010907@genband.com>

Chris Friesen <chris.friesen@genband.com> wrote:

>I'm resurrecting an ancient discussion I had with Jay, because I think
>the issue described below is still present and the code he talked about
>submitting to close it doesn't appear to have ever gone in.

	Yah, I never got it to work quite right; I don't remember
exactly why.

>Basically in active/backup mode with mii monitoring there is a window
>between the active slave device losing carrier and calling
>netif_carrier_off() and the miimon code actually detecting the loss of
>the carrier and selecting a new active slave.
>
>The best solution would be for bonding to just register for notification
>of the link going down.  Presumably most drivers should be doing that
>properly by now, and for devices that get interrupt-driven notification
>of link status changes this would allow the bonding code to react much
>quicker.

	A quick look at some drivers shows that at least acenic still
doesn't do netif_carrier_off, so converting entirely to a notifier-based
failover mechanism would break drivers that work today.

	Adding a notifier callback as an additional path into something
like bond_miimon_commit may be feasible.

>Barring that, I think something like the following is needed.  This is
>against 2.6.27, but could easily be reworked against current.
>
>
>
>---------------------- drivers/net/bonding/bond_main.c -----------------------
>index 8499558..e4445d8 100644
>@@ -4313,20 +4313,33 @@ static int bond_xmit_activebackup(struct sk_buff *skb, struct net_device *bond_d
> 	read_lock(&bond->lock);
> 	read_lock(&bond->curr_slave_lock);
>
> 	if (!BOND_IS_OK(bond)) {
> 		goto out;
> 	}
>
> 	if (!bond->curr_active_slave)
> 		goto out;
>
>+	/* Verify that the active slave is actually up before
>+	 * trying to send packets.  If it isn't, then
>+	 * trigger the selection of a new active slave.
>+	 */
>+	if (!IS_UP(bond->curr_active_slave->dev)) {
>+		read_unlock(&bond->curr_slave_lock);
>+		write_lock(&bond->curr_slave_lock);
>+		bond_select_active_slave(bond);
>+		write_unlock(&bond->curr_slave_lock);
>+		read_lock(&bond->curr_slave_lock);
>+		if (!bond->curr_active_slave)
>+			goto out;
>+	}

	The problem here is going to be that bond_select_active_slave()
should be called with RTNL held (because the notifier calls it makes
require RTNL), and I'm not sure it's permissible to acquire RTNL in a
driver transmit function.

	-J	


> 	res = bond_dev_queue_xmit(bond, skb, bond->curr_active_slave->dev);
>
> out:
> 	if (res) {
> 		/* no suitable interface, frame not sent */
> 		dev_kfree_skb(skb);
> 	}
> 	read_unlock(&bond->curr_slave_lock);
> 	read_unlock(&bond->lock);
> 	return 0;
>
>Chris
>
>
>
>
>On 03/27/2009 06:00 PM, Jay Vosburgh wrote:
>> Chris Friesen<cfriesen@nortel.com>  wrote:
>> 
>>> In a much earlier version of the bonding driver we ran into problems
>>> where we could have lost carrier on one of the slaves, but at the time
>>> of xmit the bonding driver hadn't yet switched to a better slave.
>>> Because of this we added a patch very much like the one below.
>>>
>>> A quick glance at the current bonding code would seem to indicate that
>>> there could still be a window between the active slave device losing
>>> carrier and calling netif_carrier_off() and the miimon code actually
>>> detecting the loss of the carrier and selecting a new active slave.
>>> Do I have this correct?  If so, would the patch below be correct?
>> 
>> 	Yes, the window is equal to whatever the monitoring interval is
>> (for miimon) or double the interval for ARP.
>> 
>> 	Your patch, I think, would work, but it's suboptimal in that it
>> only affects one mode, and doesn't resolve any of the bigger issues with
>> the link monitoring system in bonding (see below).  Trying to do the
>> equivalent in other modes may have issues; some modes require RTNL to be
>> held when changing slave states, so it's difficult to do that from the
>> transmit routine.
>> 
>>> On a related note--assuming the net driver can detect link loss and
>>> is properly calling netif_carrier_off() why do we still need to poll
>>> the status in the bonding driver?  Isn't there some way to hook into
>>> the network stack and get notified when the carrier goes down?
>> 
>> 	This is actually something I'm working on now.
>> 
>> 	There are notifier callbacks that are tied to a driver calling
>> netif_carrier_on or _off.  The problem is that a bunch of older (mostly
>> 10/100, although acenic is a gigabit) drivers don't do netif_carrier_on
>> or _off, or check their link state on a ridiculously long interval, so
>> simply dropping the current miimon implementation and replacing it with
>> the event notifier may not be feasible for backwards compatibility
>> reasons.  Heck, I've still got 3c59x and acenic cards in my test
>> systems, neither of which do netif_carrier correctly; I can't be the
>> only one.
>> 
>> 	An additional goal is to permit the state change notifications
>> (or miimon) and the ARP monitor to run concurrently.  Sadly, the current
>> "link state" system can't handle two things simultaneously poking at the
>> slave's link state; if, e.g., ARP says down, but MII/notifiers says up,
>> then the link state can flap, so it needs a sort of "arbitrator."
>> 
>> 	A minor advantage of reworking all of that is that it should end
>> up being less code when all done, and should be more modular, so it'd be
>> easier if somebody wanted to add, say, an ICMP probe monitor.
>> 
>> 	I'll probably be posting an RFC patch next week.
>> 
>> 	-J
>> 
>> ---
>> 	-Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com
>> 
>
>
>-- 
>Chris Friesen
>Software Developer
>GENBAND
>chris.friesen@genband.com
>www.genband.com
>

next prev parent reply	other threads:[~2012-02-11  1:54 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <49CD5B93.7010407@nortel.com>
     [not found] ` <31087.1238198438@death.nxdomain.ibm.com>
2012-02-10 23:47   ` [BUG?] bonding, slave selection, carrier loss, etc Chris Friesen
2012-02-11  1:53     ` Jay Vosburgh [this message]
2012-02-11 18:52       ` Ben Hutchings
2012-02-13 18:16         ` Chris Friesen
2012-02-13 18:48           ` Stephen Hemminger
2012-02-13 19:18             ` Chris Friesen
2012-02-13 20:24             ` Ben Hutchings
2012-02-13 20:37               ` Jay Vosburgh

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=28766.1328925233@death.nxdomain \
    --to=fubar@us.ibm.com \
    --cc=andy@greyhouse.net \
    --cc=chris.friesen@genband.com \
    --cc=netdev@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).