From: Jay Vosburgh <fubar@us.ibm.com>
To: Andy Gospodarek <andy@greyhouse.net>
Cc: netdev@vger.kernel.org, lhh@redhat.com,
	bonding-devel@lists.sourceforge.net
Subject: Re: [net-2.6 PATCH] bonding: fix broken multicast with round-robin mode
Date: Thu, 25 Mar 2010 15:31:11 -0700
Message-ID: <24080.1269556271@death.nxdomain.ibm.com>
In-Reply-To: <20100325214033.GA28741@gospo.rdu.redhat.com>

Andy Gospodarek <andy@greyhouse.net> wrote:

>Round-robin (mode 0) does nothing to ensure that any multicast traffic
>originally destined for the host will continue to arrive at the host when
>the link that sent the IGMP join or membership report goes down.  One of
>the benefits of absolute round-robin transmit.
>
>Keeping track of subscribed multicast groups for each slave did not seem
>like a good use of resources, so I decided to simply send on the
>curr_active slave of the bond (typically the first enslaved device that
>is up).  This makes failover management simple as IGMP membership
>reports only need to be sent when the curr_active_slave changes.  I
>tested this patch and it appears to work as expected.
>
>Originally reported by Lon Hohberger <lhh@redhat.com>.
>
>Signed-off-by: Andy Gospodarek <andy@greyhouse.net>

	Seems reasonable, modulo a couple of minor things (see below).

	I checked, and the link failover logic appears to maintain
curr_active_slave even for round robin mode, which, prior to this patch,
didn't use it.

>CC: Lon Hohberger <lhh@redhat.com>
>CC: Jay Vosburgh <fubar@us.ibm.com>
>
>---
> drivers/net/bonding/bond_main.c |   34 ++++++++++++++++++++++++++--------
> 1 files changed, 26 insertions(+), 8 deletions(-)
>
>diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
>index 430c022..0b38455 100644
>--- a/drivers/net/bonding/bond_main.c
>+++ b/drivers/net/bonding/bond_main.c
>@@ -1235,6 +1235,11 @@ void bond_change_active_slave(struct bonding *bond, struct slave *new_active)
> 			write_lock_bh(&bond->curr_slave_lock);
> 		}
> 	}
>+
>+	/* resend IGMP joins since all were sent on curr_active_slave */
>+	if (bond->params.mode == BOND_MODE_ROUNDROBIN) {
>+		bond_resend_igmp_join_requests(bond);
>+	}
> }
>
> /**
>@@ -4138,22 +4143,35 @@ static int bond_xmit_roundrobin(struct sk_buff *skb, struct net_device *bond_dev
> 	struct bonding *bond = netdev_priv(bond_dev);
> 	struct slave *slave, *start_at;
> 	int i, slave_no, res = 1;
>+	struct iphdr *iph = ip_hdr(skb);
>
> 	read_lock(&bond->lock);
>
> 	if (!BOND_IS_OK(bond))
> 		goto out;
>-
> 	/*
>-	 * Concurrent TX may collide on rr_tx_counter; we accept that
>-	 * as being rare enough not to justify using an atomic op here
>+	 * Start with the curr_active_slave that joined the bond as the
>+	 * default for sending IGMP traffic.  For failover purposes one
>+	 * needs to maintain some consistency for the interface that will
>+	 * send the join/membership reports.  The curr_active_slave found
>+	 * will send all of this type of traffic.
> 	 */
>-	slave_no = bond->rr_tx_counter++ % bond->slave_cnt;
>+	if ((skb->protocol == htons(ETH_P_IP)) &&
>+	    (iph->protocol == htons(IPPROTO_IGMP))) {
>+		slave = bond->curr_active_slave;

	Technically, this should acquire bond->curr_slave_lock for read
around the inspection of curr_active_slave.

	I believe you'll also want a test for curr_active_slave == NULL,
and free the skb if so (or do something else).  There's a race window in
bond_release: when releasing the curr_active_slave, the field is left
momentarily NULL with the bond unlocked.  This occurs after the
bond_change_active_slave(bond, NULL) call, during the lock dance prior
to the call to bond_select_active_slave:

bond_main.c:bond_release():
[...]
	if (oldcurrent == slave)
		bond_change_active_slave(bond, NULL);
[...]
	if (oldcurrent == slave) {
		/*
		 * Note that we hold RTNL over this sequence, so there
		 * is no concern that another slave add/remove event
		 * will interfere.
		 */
		write_unlock_bh(&bond->lock);

		[ race window is here ]

		read_lock(&bond->lock);
		write_lock_bh(&bond->curr_slave_lock);

		bond_select_active_slave(bond);

		write_unlock_bh(&bond->curr_slave_lock);
		read_unlock(&bond->lock);
		write_lock_bh(&bond->lock);
	}

	I'm reasonably sure the other TX functions (that need to) will
handle the case that curr_active_slave is NULL.

>+	} else {
>+		/*
>+		 * Concurrent TX may collide on rr_tx_counter; we accept
>+		 * that as being rare enough not to justify using an
>+		 * atomic op here.
>+		 */
>+		slave_no = bond->rr_tx_counter++ % bond->slave_cnt;
>
>-	bond_for_each_slave(bond, slave, i) {
>-		slave_no--;
>-		if (slave_no < 0)
>-			break;
>+		bond_for_each_slave(bond, slave, i) {
>+			slave_no--;
>+			if (slave_no < 0)
>+				break;
>+		}
> 	}
>
> 	start_at = slave;

	-J

---
	-Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com
