netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH net-next-2.6] bonding: fix user-controlled queuing issues
@ 2011-02-21 22:53 Andy Gospodarek
  2011-02-22 15:55 ` Phil Oester
  0 siblings, 1 reply; 5+ messages in thread
From: Andy Gospodarek @ 2011-02-21 22:53 UTC (permalink / raw)
  To: netdev, Ben Hutchings, Jay Vosburgh, Phil Oester

Users noticed the following messages were filling their logs when using
queue 16 for user-mode bonding:

kernel: bond0 selects TX queue 16, but real number of TX queues is 16

These messages were showing up since the code for setting queues
presumed that queues 1-16 were usable and queue 0 was reserved, but
parts of the code didn't account for this correctly.  This update now
allows 16 queues (numbered 0-15) to be used (in the nominal case) and
queue awareness can be completely and clearly disabled rather than by
simply setting the queue for a device to 0.

Configuration syntax has changed a little, but is similar to the
existing method.  To associate a queue with an output device, you can do
this:

# echo "+eth1:2" > /sys/class/net/bond0/bonding/queue_id

and to clear it this:

# echo "-eth1:2" > /sys/class/net/bond0/bonding/queue_id

The orginal report that prompted this patch changed bond_select_queue to
return a value different than sinply skb->queue_mapping.  This should
now be a proper return value since the interal queues were remapped
internally and queues 0-15 are the usable queues.

Signed-off-by: Andy Gospodarek <andy@greyhouse.net>
Reported-by: Phil Oester <kernel@linuxace.com>

---

My tests all seem to work well, but more testing/feedback is obviously
appreciated.

---
 Documentation/networking/bonding.txt |   20 ++++++----
 drivers/net/bonding/bond_main.c      |   16 ++++---
 drivers/net/bonding/bond_sysfs.c     |   71 +++++++++++++++++++++------------
 drivers/net/bonding/bonding.h        |   20 +++++++++
 4 files changed, 86 insertions(+), 41 deletions(-)

diff --git a/Documentation/networking/bonding.txt b/Documentation/networking/bonding.txt
index 25d2f41..8a27f0f 100644
--- a/Documentation/networking/bonding.txt
+++ b/Documentation/networking/bonding.txt
@@ -1404,10 +1404,11 @@ By default the bonding driver is multiqueue aware and 16 queues are created
 when the driver initializes (see Documentation/networking/multiqueue.txt
 for details).  If more or less queues are desired the module parameter
 tx_queues can be used to change this value.  There is no sysfs parameter
-available as the allocation is done at module init time.
+available as the allocation is done at module init time.  Currently only one
+queue per device can be set.
 
 The output of the file /proc/net/bonding/bondX has changed so the output Queue
-ID is now printed for each slave:
+ID is now printed for each slave that has a queue configured:
 
 Bonding Mode: fault-tolerance (active-backup)
 Primary Slave: None
@@ -1421,7 +1422,6 @@ Slave Interface: eth0
 MII Status: up
 Link Failure Count: 0
 Permanent HW addr: 00:1a:a0:12:8f:cb
-Slave queue ID: 0
 
 Slave Interface: eth1
 MII Status: up
@@ -1431,12 +1431,16 @@ Slave queue ID: 2
 
 The queue_id for a slave can be set using the command:
 
-# echo "eth1:2" > /sys/class/net/bond0/bonding/queue_id
+# echo "+eth1:2" > /sys/class/net/bond0/bonding/queue_id
 
-Any interface that needs a queue_id set should set it with multiple calls
-like the one above until proper priorities are set for all interfaces.  On
-distributions that allow configuration via initscripts, multiple 'queue_id'
-arguments can be added to BONDING_OPTS to set all needed slave queues.
+and removed using the command:
+
+# echo "-eth1:2" > /sys/class/net/bond0/bonding/queue_id
+
+Any interface that needs a queue_id set should set it with multiple calls until
+proper priorities are set or cleared for all interfaces.  On distributions that
+allow configuration via initscripts, multiple 'queue_id' arguments can be added
+to BONDING_OPTS to set all needed slave queues.
 
 These queue id's can be used in conjunction with the tc utility to configure
 a multiqueue qdisc and filters to bias certain traffic to transmit on certain
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 77e3c6a..c9723d6 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -1560,8 +1560,8 @@ int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev)
 	}
 
 	/*
-	 * Set the new_slave's queue_id to be zero.  Queue ID mapping
-	 * is set via sysfs or module option if desired.
+	 * Set the new_slave's queue_id to be zero to disable.  Queue ID
+	 * mapping is set via sysfs.
 	 */
 	new_slave->queue_id = 0;
 
@@ -3355,7 +3355,8 @@ static void bond_info_show_slave(struct seq_file *seq,
 		else
 			seq_puts(seq, "Aggregator ID: N/A\n");
 	}
-	seq_printf(seq, "Slave queue ID: %d\n", slave->queue_id);
+	if (slave_tx_queue_recorded(slave))
+		seq_printf(seq, "Slave queue ID: %d\n", slave_get_tx_queue(slave));
 }
 
 static int bond_info_seq_show(struct seq_file *seq, void *v)
@@ -4516,15 +4517,16 @@ static inline int bond_slave_override(struct bonding *bond,
 
 	/* Find out if any slaves have the same mapping as this skb. */
 	bond_for_each_slave(bond, check_slave, i) {
-		if (check_slave->queue_id == skb->queue_mapping) {
+		if (slave_tx_queue_recorded(check_slave) &&
+		    slave_get_tx_queue(check_slave) == skb->queue_mapping) {
 			slave = check_slave;
 			break;
 		}
 	}
 
 	/* If the slave isn't UP, use default transmit policy. */
-	if (slave && slave->queue_id && IS_UP(slave->dev) &&
-	    (slave->link == BOND_LINK_UP)) {
+	if (slave && slave_tx_queue_recorded(slave) &&
+	    IS_UP(slave->dev) && (slave->link == BOND_LINK_UP)) {
 		res = bond_dev_queue_xmit(bond, skb, slave->dev);
 	}
 
@@ -4537,7 +4539,7 @@ static u16 bond_select_queue(struct net_device *dev, struct sk_buff *skb)
 {
 	/*
 	 * This helper function exists to help dev_pick_tx get the correct
-	 * destination queue.  Using a helper function skips the a call to
+	 * destination queue.  Using a helper function skips the call to
 	 * skb_tx_hash and will put the skbs in the queue we expect on their
 	 * way down to the bonding driver.
 	 */
diff --git a/drivers/net/bonding/bond_sysfs.c b/drivers/net/bonding/bond_sysfs.c
index 72bb0f6..d09701f 100644
--- a/drivers/net/bonding/bond_sysfs.c
+++ b/drivers/net/bonding/bond_sysfs.c
@@ -1448,15 +1448,18 @@ static ssize_t bonding_show_queue_id(struct device *d,
 
 	read_lock(&bond->lock);
 	bond_for_each_slave(bond, slave, i) {
-		if (res > (PAGE_SIZE - IFNAMSIZ - 6)) {
-			/* not enough space for another interface_name:queue_id pair */
-			if ((PAGE_SIZE - res) > 10)
-				res = PAGE_SIZE - 10;
-			res += sprintf(buf + res, "++more++ ");
-			break;
+		if (slave_tx_queue_recorded(slave)) {
+			if (res > (PAGE_SIZE - IFNAMSIZ - 6)) {
+				/* Not enough space for another
+				 * interface_name:queue_id pair. */
+				if ((PAGE_SIZE - res) > 10)
+					res = PAGE_SIZE - 10;
+				res += sprintf(buf + res, "++more++ ");
+				break;
+			}
+			res += sprintf(buf + res, "%s:%d ", slave->dev->name,
+				       slave_get_tx_queue(slave));
 		}
-		res += sprintf(buf + res, "%s:%d ",
-			       slave->dev->name, slave->queue_id);
 	}
 	read_unlock(&bond->lock);
 	if (res)
@@ -1479,55 +1482,71 @@ static ssize_t bonding_store_queue_id(struct device *d,
 	int i, ret = count;
 	char *delim;
 	struct net_device *sdev = NULL;
+	char command[IFNAMSIZ + 7] = {0, };
+	char *ifname;
 
 	if (!rtnl_trylock())
 		return restart_syscall();
 
+	sscanf(buffer, "%22s", command); /* IFNAMSIZ*/
+
 	/* delim will point to queue id if successful */
-	delim = strchr(buffer, ':');
+	delim = strchr(command + 1, ':');
 	if (!delim)
 		goto err_no_cmd;
-
 	/*
 	 * Terminate string that points to device name and bump it
 	 * up one, so we can read the queue id there.
 	 */
 	*delim = '\0';
-	if (sscanf(++delim, "%hd\n", &qid) != 1)
+	if (sscanf(++delim, "%hd\n", &qid) != 1) {
 		goto err_no_cmd;
+	}
 
-	/* Check buffer length, valid ifname and queue id */
-	if (strlen(buffer) > IFNAMSIZ ||
-	    !dev_valid_name(buffer) ||
-	    qid > bond->params.tx_queues)
+	/* ifname will now be command + 1 */
+	ifname = command + 1;
+	if ((strlen(command) <= 1) ||
+	    !dev_valid_name(ifname) ||
+	    qid >= bond->params.tx_queues)
 		goto err_no_cmd;
 
 	/* Get the pointer to that interface if it exists */
-	sdev = __dev_get_by_name(dev_net(bond->dev), buffer);
+	sdev = __dev_get_by_name(dev_net(bond->dev), ifname);
 	if (!sdev)
 		goto err_no_cmd;
 
 	read_lock(&bond->lock);
 
-	/* Search for thes slave and check for duplicate qids */
+	/* Search for the slave needing the change */
 	update_slave = NULL;
 	bond_for_each_slave(bond, slave, i) {
 		if (sdev == slave->dev)
-			/*
-			 * We don't need to check the matching
-			 * slave for dups, since we're overwriting it
-			 */
 			update_slave = slave;
-		else if (qid && qid == slave->queue_id) {
+		else if (slave_tx_queue_recorded(slave) &&
+			 qid == slave_get_tx_queue(slave)) {
 			goto err_no_cmd_unlock;
 		}
 	}
-
 	if (!update_slave)
 		goto err_no_cmd_unlock;
 
-	/* Actually set the qids for the slave */
-	update_slave->queue_id = qid;
+	/* at this point we have a valid slave */
+	switch (command[0]) {
+	case '+':
+		if (slave_tx_queue_recorded(update_slave))
+			goto err_no_cmd_unlock;
+		slave_record_tx_queue(update_slave, qid);
+		break;
+	case '-':
+		if (!slave_tx_queue_recorded(update_slave) ||
+		    slave_get_tx_queue(update_slave) != qid)
+			goto err_no_cmd_unlock;
+		slave_clear_tx_queue(update_slave);
+		break;
+
+	default:
+		goto err_no_cmd;
+	}
 
 	read_unlock(&bond->lock);
 out:
@@ -1537,7 +1556,7 @@ out:
 err_no_cmd_unlock:
 	read_unlock(&bond->lock);
 err_no_cmd:
-	pr_info("invalid input for queue_id set for %s.\n",
+	pr_info("cannot modify queue_id for %s.\n",
 		bond->dev->name);
 	ret = -EPERM;
 	goto out;
diff --git a/drivers/net/bonding/bonding.h b/drivers/net/bonding/bonding.h
index 31fe980..c9059f4 100644
--- a/drivers/net/bonding/bonding.h
+++ b/drivers/net/bonding/bonding.h
@@ -422,4 +422,24 @@ static inline void bond_unregister_ipv6_notifier(void)
 }
 #endif
 
+static inline void slave_record_tx_queue(struct slave *slave, u16 tx_queue)
+{
+	slave->queue_id = tx_queue + 1;
+}
+
+static inline u16 slave_get_tx_queue(const struct slave *slave)
+{
+	return slave->queue_id - 1;
+}
+
+static inline bool slave_tx_queue_recorded(const struct slave *slave)
+{
+	return slave->queue_id != 0;
+}
+
+static inline void slave_clear_tx_queue(struct slave *slave)
+{
+	slave->queue_id = 0;
+}
+
 #endif /* _LINUX_BONDING_H */

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH net-next-2.6] bonding: fix user-controlled queuing issues
  2011-02-21 22:53 [PATCH net-next-2.6] bonding: fix user-controlled queuing issues Andy Gospodarek
@ 2011-02-22 15:55 ` Phil Oester
  2011-02-22 16:19   ` Andy Gospodarek
  0 siblings, 1 reply; 5+ messages in thread
From: Phil Oester @ 2011-02-22 15:55 UTC (permalink / raw)
  To: Andy Gospodarek; +Cc: netdev, Ben Hutchings, Jay Vosburgh

On Mon, Feb 21, 2011 at 05:53:03PM -0500, Andy Gospodarek wrote:
> Users noticed the following messages were filling their logs when using
> queue 16 for user-mode bonding:
> 
> kernel: bond0 selects TX queue 16, but real number of TX queues is 16
> 
> ---
> 
> My tests all seem to work well, but more testing/feedback is obviously
> appreciated.

Sorry, this patch does not fix the issue:

Feb 22 10:46:50 foo kernel: bond0 selects TX queue 16, but real number of TX queues is 16

Adding the below debugging patch confirms:

--- a/net/core/dev.c.orig       2011-02-17 11:42:09.110280300 -0500
+++ b/net/core/dev.c    2011-02-17 11:40:42.110280300 -0500
@@ -2060,6 +2060,11 @@
        int queue_index;
        const struct net_device_ops *ops = dev->netdev_ops;
 
+                if (net_ratelimit()) {
+                        pr_warning("dev_pick_tx: %s queue_mapping=%d, real_num=%d\n",
+                                dev->name, skb->queue_mapping, dev->real_num_tx_queues);
+                }
+
        if (ops->ndo_select_queue) {
                queue_index = ops->ndo_select_queue(dev, skb);
                queue_index = dev_cap_txqueue(dev, queue_index);


Produces a number of these in the logs:

Feb 22 10:46:50 foo kernel: dev_pick_tx: bond0 queue_mapping=16, real_num=16

Phil

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH net-next-2.6] bonding: fix user-controlled queuing issues
  2011-02-22 15:55 ` Phil Oester
@ 2011-02-22 16:19   ` Andy Gospodarek
  2011-02-22 16:48     ` Phil Oester
  0 siblings, 1 reply; 5+ messages in thread
From: Andy Gospodarek @ 2011-02-22 16:19 UTC (permalink / raw)
  To: Phil Oester; +Cc: Andy Gospodarek, netdev, Ben Hutchings, Jay Vosburgh

On Tue, Feb 22, 2011 at 07:55:18AM -0800, Phil Oester wrote:
> On Mon, Feb 21, 2011 at 05:53:03PM -0500, Andy Gospodarek wrote:
> > Users noticed the following messages were filling their logs when using
> > queue 16 for user-mode bonding:
> > 
> > kernel: bond0 selects TX queue 16, but real number of TX queues is 16
> > 
> > ---
> > 
> > My tests all seem to work well, but more testing/feedback is obviously
> > appreciated.
> 
> Sorry, this patch does not fix the issue:
> 
> Feb 22 10:46:50 foo kernel: bond0 selects TX queue 16, but real number of TX queues is 16
> 
> Adding the below debugging patch confirms:
> 
> --- a/net/core/dev.c.orig       2011-02-17 11:42:09.110280300 -0500
> +++ b/net/core/dev.c    2011-02-17 11:40:42.110280300 -0500
> @@ -2060,6 +2060,11 @@
>         int queue_index;
>         const struct net_device_ops *ops = dev->netdev_ops;
>  
> +                if (net_ratelimit()) {
> +                        pr_warning("dev_pick_tx: %s queue_mapping=%d, real_num=%d\n",
> +                                dev->name, skb->queue_mapping, dev->real_num_tx_queues);
> +                }
> +
>         if (ops->ndo_select_queue) {
>                 queue_index = ops->ndo_select_queue(dev, skb);
>                 queue_index = dev_cap_txqueue(dev, queue_index);
> 
> 
> Produces a number of these in the logs:
> 
> Feb 22 10:46:50 foo kernel: dev_pick_tx: bond0 queue_mapping=16, real_num=16
> 

Phil,

Can you send me the minimal set of tc rules that selects output queue 16
and the output of /proc/net/bonding/bond0?

Private email is fine if you do not want to post it to the list.

Thanks,

-andy


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH net-next-2.6] bonding: fix user-controlled queuing issues
  2011-02-22 16:19   ` Andy Gospodarek
@ 2011-02-22 16:48     ` Phil Oester
  2011-02-22 21:24       ` Andy Gospodarek
  0 siblings, 1 reply; 5+ messages in thread
From: Phil Oester @ 2011-02-22 16:48 UTC (permalink / raw)
  To: Andy Gospodarek; +Cc: netdev, Ben Hutchings, Jay Vosburgh

On Tue, Feb 22, 2011 at 11:19:41AM -0500, Andy Gospodarek wrote:
> Phil,
> 
> Can you send me the minimal set of tc rules that selects output queue 16
> and the output of /proc/net/bonding/bond0?
> 
> Private email is fine if you do not want to post it to the list.
> 
> Thanks,
> 
> -andy

I have no tc rules which select output queue 16 (I have no tc rules at all
in fact). 

Output of /proc/net/bonding/bond0 below.  The bond consists of two Intel
igb nics, which only have 8 queues.  Note, however, that eth0 is an ixgbe,
which has 16 queues:

ixgbe 0000:0b:00.0: Multiqueue Enabled: Rx Queue count = 16, Tx Queue count = 16

which may answer your question as to what is selecting queue 16.  

Phil 


Ethernet Channel Bonding Driver: v3.7.0 (June 2, 2010)

Bonding Mode: IEEE 802.3ad Dynamic link aggregation
Transmit Hash Policy: layer2+3 (2)
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0

802.3ad info
LACP rate: slow
Aggregator selection policy (ad_select): stable
Active Aggregator Info:
        Aggregator ID: 1
        Number of ports: 2
        Actor Key: 17
        Partner Key: 6
        Partner Mac Address: 00:d0:05:xx:xx:xx

Slave Interface: eth1
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 00:1b:21:xx:xx:xx
Aggregator ID: 1

Slave Interface: eth2
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 00:1b:21:xx:xx:xx
Aggregator ID: 1

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH net-next-2.6] bonding: fix user-controlled queuing issues
  2011-02-22 16:48     ` Phil Oester
@ 2011-02-22 21:24       ` Andy Gospodarek
  0 siblings, 0 replies; 5+ messages in thread
From: Andy Gospodarek @ 2011-02-22 21:24 UTC (permalink / raw)
  To: Phil Oester; +Cc: Andy Gospodarek, netdev, Ben Hutchings, Jay Vosburgh

On Tue, Feb 22, 2011 at 08:48:36AM -0800, Phil Oester wrote:
> On Tue, Feb 22, 2011 at 11:19:41AM -0500, Andy Gospodarek wrote:
> > Phil,
> > 
> > Can you send me the minimal set of tc rules that selects output queue 16
> > and the output of /proc/net/bonding/bond0?
> > 
> > Private email is fine if you do not want to post it to the list.
> > 
> > Thanks,
> > 
> > -andy
> 
> I have no tc rules which select output queue 16 (I have no tc rules at all
> in fact). 
> 
> Output of /proc/net/bonding/bond0 below.  The bond consists of two Intel
> igb nics, which only have 8 queues.  Note, however, that eth0 is an ixgbe,
> which has 16 queues:
> 
> ixgbe 0000:0b:00.0: Multiqueue Enabled: Rx Queue count = 16, Tx Queue count = 16
> 
> which may answer your question as to what is selecting queue 16.  
> 
> Phil 
> 

OK, I've got it now!  I can reproduce this when I recieve frames from a
device not in the bond and transmit those frames out of the bond device.
I suspect that is what you are doing.  That was a case I did not
consider with the original patch.

I will test all of this and post this as a 2-part series with the first
patch being one that should apply to stable as well, so you and others
can have it there.


^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2011-02-22 21:25 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2011-02-21 22:53 [PATCH net-next-2.6] bonding: fix user-controlled queuing issues Andy Gospodarek
2011-02-22 15:55 ` Phil Oester
2011-02-22 16:19   ` Andy Gospodarek
2011-02-22 16:48     ` Phil Oester
2011-02-22 21:24       ` Andy Gospodarek

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).