[NET-NEXT PATCH 1/3] vlan: vlan device not reading gso max size of parent.

netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [NET-NEXT PATCH 1/3] vlan: vlan device not reading gso max size of parent.
@ 2008-09-12  3:07 Jeff Kirsher
  2008-09-12  3:08 ` [NET-NEXT PATCH 2/3][UPDATED] pkt_sched: Add multiqueue scheduler support Jeff Kirsher
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Jeff Kirsher @ 2008-09-12  3:07 UTC (permalink / raw)
  To: davem; +Cc: jeff, netdev, akpm, Alexander Duyck, Jeff Kirsher

From: Alexander Duyck <alexander.h.duyck@intel.com>

The vlan devices are not reading the gso max size of the parent device.  As
a result devices that do not support 64K max gso size are currently
failing.

This issue is seen on 2.6.26 kernels as well and the same patch should be
able to be applied without any issues.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---

 net/8021q/vlan.c     |    1 +
 net/8021q/vlan_dev.c |    1 +
 2 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/net/8021q/vlan.c b/net/8021q/vlan.c
index b661f47..f0e335a 100644
--- a/net/8021q/vlan.c
+++ b/net/8021q/vlan.c
@@ -394,6 +394,7 @@ static void vlan_transfer_features(struct net_device *dev,
 
 	vlandev->features &= ~dev->vlan_features;
 	vlandev->features |= dev->features & dev->vlan_features;
+	vlandev->gso_max_size = dev->gso_max_size;
 
 	if (old_features != vlandev->features)
 		netdev_features_change(vlandev);
diff --git a/net/8021q/vlan_dev.c b/net/8021q/vlan_dev.c
index 4bf014e..97688cd 100644
--- a/net/8021q/vlan_dev.c
+++ b/net/8021q/vlan_dev.c
@@ -607,6 +607,7 @@ static int vlan_dev_init(struct net_device *dev)
 		      (1<<__LINK_STATE_PRESENT);
 
 	dev->features |= real_dev->features & real_dev->vlan_features;
+	dev->gso_max_size = real_dev->gso_max_size;
 
 	/* ipv6 shared card related stuff */
 	dev->dev_id = real_dev->dev_id;


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [NET-NEXT PATCH 2/3][UPDATED] pkt_sched: Add multiqueue scheduler support
  2008-09-12  3:07 [NET-NEXT PATCH 1/3] vlan: vlan device not reading gso max size of parent Jeff Kirsher
@ 2008-09-12  3:08 ` Jeff Kirsher
  2008-09-12  3:19   ` David Miller
  2008-09-12 22:42   ` Jarek Poplawski
  2008-09-12  3:08 ` [NET-NEXT PATCH 3/3][UPDATED] pkt_action: add new action skbedit Jeff Kirsher
  2008-09-12  3:17 ` [NET-NEXT PATCH 1/3] vlan: vlan device not reading gso max size of parent David Miller
  2 siblings, 2 replies; 8+ messages in thread
From: Jeff Kirsher @ 2008-09-12  3:08 UTC (permalink / raw)
  To: davem; +Cc: jeff, netdev, akpm, Alexander Duyck, Jeff Kirsher

From: Alexander Duyck <alexander.h.duyck@intel.com>

This patch is intended to add a qdisc to support the new tx multiqueue
architecture by providing a band for each hardware queue.  By doing
this it is possible to support a different qdisc per physical hardware
queue.

This qdisc uses the skb->queue_mapping to select which band to place
the traffic onto.  It then uses a round robin w/ a check to see if the
subqueue is stopped to determine which band to dequeue the packet from.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---

 Documentation/networking/multiqueue.txt |   47 +++
 include/linux/pkt_sched.h               |    7 
 net/sched/Kconfig                       |    9 +
 net/sched/Makefile                      |    1 
 net/sched/sch_multiq.c                  |  469 +++++++++++++++++++++++++++++++
 5 files changed, 532 insertions(+), 1 deletions(-)
 create mode 100644 net/sched/sch_multiq.c

diff --git a/Documentation/networking/multiqueue.txt b/Documentation/networking/multiqueue.txt
index d391ea6..5787ee6 100644
--- a/Documentation/networking/multiqueue.txt
+++ b/Documentation/networking/multiqueue.txt
@@ -24,4 +24,49 @@ netif_{start|stop|wake}_subqueue() functions to manage each queue while the
 device is still operational.  netdev->queue_lock is still used when the device
 comes online or when it's completely shut down (unregister_netdev(), etc.).
 
-Author: Peter P. Waskiewicz Jr. <peter.p.waskiewicz.jr@intel.com>
+
+Section 2: Qdisc support for multiqueue devices
+
+-----------------------------------------------
+
+Currently two qdiscs support multiqueue devices.  The first is the default
+pfifo_fast qdisc.  This qdisc supports one qdisc per hardware queue.  A new
+round-robin qdisc, sch_multiq also supports multiple hardware queues. The
+qdisc is responsible for classifying the skb's and then directing the skb's to
+bands and queues based on the value in skb->queue_mapping.  Use this field in
+the base driver to determine which queue to send the skb to.
+
+sch_multiq has been added for hardware that wishes to avoid unnecessary
+requeuing.  It will cycle though the bands and verify that the hardware queue
+associated with the band is not stopped prior to dequeuing a packet.
+
+On qdisc load, the number of bands is based on the number of queues on the
+hardware.  Once the association is made, any skb with skb->queue_mapping set,
+will be queued to the band associated with the hardware queue.
+
+
+Section 3: Brief howto using MULTIQ for multiqueue devices
+---------------------------------------------------------------
+
+The userspace command 'tc,' part of the iproute2 package, is used to configure
+qdiscs.  To add the MULTIQ qdisc to your network device, assuming the device
+is called eth0, run the following command:
+
+# tc qdisc add dev eth0 root handle 1: multiq
+
+The qdisc will allocate the number of bands to equal the number of queues that
+the device reports, and bring the qdisc online.  Assuming eth0 has 4 Tx
+queues, the band mapping would look like:
+
+band 0 => queue 0
+band 1 => queue 1
+band 2 => queue 2
+band 3 => queue 3
+
+Traffic will begin flowing through each queue if your base device has either
+the default simple_tx_hash or a custom netdev->select_queue() defined.
+
+The behavior of tc filters remains the same.
+
+Author: Alexander Duyck <alexander.h.duyck@intel.com>
+Original Author: Peter P. Waskiewicz Jr. <peter.p.waskiewicz.jr@intel.com>
diff --git a/include/linux/pkt_sched.h b/include/linux/pkt_sched.h
index e5de421..5d921fa 100644
--- a/include/linux/pkt_sched.h
+++ b/include/linux/pkt_sched.h
@@ -123,6 +123,13 @@ struct tc_prio_qopt
 	__u8	priomap[TC_PRIO_MAX+1];	/* Map: logical priority -> PRIO band */
 };
 
+/* MULTIQ section */
+
+struct tc_multiq_qopt {
+	__u16	bands;			/* Number of bands */
+	__u16	max_bands;		/* Maximum number of queues */
+};
+
 /* TBF section */
 
 struct tc_tbf_qopt
diff --git a/net/sched/Kconfig b/net/sched/Kconfig
index 9437b27..efaa7a7 100644
--- a/net/sched/Kconfig
+++ b/net/sched/Kconfig
@@ -106,6 +106,15 @@ config NET_SCH_PRIO
 	  To compile this code as a module, choose M here: the
 	  module will be called sch_prio.
 
+config NET_SCH_MULTIQ
+	tristate "Hardware Multiqueue-aware Multi Band Queuing (MULTIQ)"
+	---help---
+	  Say Y here if you want to use an n-band queue packet scheduler
+	  to support devices that have multiple hardware transmit queues.
+
+	  To compile this code as a module, choose M here: the
+	  module will be called sch_multiq.
+
 config NET_SCH_RED
 	tristate "Random Early Detection (RED)"
 	---help---
diff --git a/net/sched/Makefile b/net/sched/Makefile
index 1d2b0f7..3d9b953 100644
--- a/net/sched/Makefile
+++ b/net/sched/Makefile
@@ -26,6 +26,7 @@ obj-$(CONFIG_NET_SCH_SFQ)	+= sch_sfq.o
 obj-$(CONFIG_NET_SCH_TBF)	+= sch_tbf.o
 obj-$(CONFIG_NET_SCH_TEQL)	+= sch_teql.o
 obj-$(CONFIG_NET_SCH_PRIO)	+= sch_prio.o
+obj-$(CONFIG_NET_SCH_MULTIQ)	+= sch_multiq.o
 obj-$(CONFIG_NET_SCH_ATM)	+= sch_atm.o
 obj-$(CONFIG_NET_SCH_NETEM)	+= sch_netem.o
 obj-$(CONFIG_NET_CLS_U32)	+= cls_u32.o
diff --git a/net/sched/sch_multiq.c b/net/sched/sch_multiq.c
new file mode 100644
index 0000000..ce00df4
--- /dev/null
+++ b/net/sched/sch_multiq.c
@@ -0,0 +1,469 @@
+/*
+ * Copyright (c) 2008, Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc., 59 Temple
+ * Place - Suite 330, Boston, MA 02111-1307 USA.
+ *
+ * Author: Alexander Duyck <alexander.h.duyck@intel.com>
+ */
+
+#include <linux/module.h>
+#include <linux/types.h>
+#include <linux/kernel.h>
+#include <linux/string.h>
+#include <linux/errno.h>
+#include <linux/skbuff.h>
+#include <net/netlink.h>
+#include <net/pkt_sched.h>
+
+
+struct multiq_sched_data {
+	u16 bands;
+	u16 max_bands;
+	u16 curband;
+	struct tcf_proto *filter_list;
+	struct Qdisc **queues;
+};
+
+
+static struct Qdisc *
+multiq_classify(struct sk_buff *skb, struct Qdisc *sch, int *qerr)
+{
+	struct multiq_sched_data *q = qdisc_priv(sch);
+	u32 band;
+	struct tcf_result res;
+	int err;
+
+	*qerr = NET_XMIT_SUCCESS | __NET_XMIT_BYPASS;
+	err = tc_classify(skb, q->filter_list, &res);
+#ifdef CONFIG_NET_CLS_ACT
+	switch (err) {
+	case TC_ACT_STOLEN:
+	case TC_ACT_QUEUED:
+		*qerr = NET_XMIT_SUCCESS | __NET_XMIT_STOLEN;
+	case TC_ACT_SHOT:
+		return NULL;
+	}
+#endif
+	band = skb_get_queue_mapping(skb);
+
+	if (band >= q->bands)
+		return q->queues[0];
+
+	return q->queues[band];
+}
+
+static int
+multiq_enqueue(struct sk_buff *skb, struct Qdisc *sch)
+{
+	struct Qdisc *qdisc;
+	int ret;
+
+	qdisc = multiq_classify(skb, sch, &ret);
+#ifdef CONFIG_NET_CLS_ACT
+	if (qdisc == NULL) {
+
+		if (ret & __NET_XMIT_BYPASS)
+			sch->qstats.drops++;
+		kfree_skb(skb);
+		return ret;
+	}
+#endif
+
+	ret = qdisc_enqueue(skb, qdisc);
+	if (ret == NET_XMIT_SUCCESS) {
+		sch->bstats.bytes += qdisc_pkt_len(skb);
+		sch->bstats.packets++;
+		sch->q.qlen++;
+		return NET_XMIT_SUCCESS;
+	}
+	if (net_xmit_drop_count(ret))
+		sch->qstats.drops++;
+	return ret;
+}
+
+
+static int
+multiq_requeue(struct sk_buff *skb, struct Qdisc *sch)
+{
+	struct Qdisc *qdisc;
+	int ret;
+
+	qdisc = multiq_classify(skb, sch, &ret);
+#ifdef CONFIG_NET_CLS_ACT
+	if (qdisc == NULL) {
+		if (ret & __NET_XMIT_BYPASS)
+			sch->qstats.drops++;
+		kfree_skb(skb);
+		return ret;
+	}
+#endif
+
+	ret = qdisc->ops->requeue(skb, qdisc);
+	if (ret == NET_XMIT_SUCCESS) {
+		sch->q.qlen++;
+		sch->qstats.requeues++;
+		return NET_XMIT_SUCCESS;
+	}
+	if (net_xmit_drop_count(ret))
+		sch->qstats.drops++;
+	return ret;
+}
+
+
+static struct sk_buff *multiq_dequeue(struct Qdisc *sch)
+{
+	struct multiq_sched_data *q = qdisc_priv(sch);
+	struct Qdisc *qdisc;
+	struct sk_buff *skb;
+	int band;
+
+	for (band = 0; band < q->bands; band++) {
+		/* cycle through bands to ensure fairness */
+		q->curband++;
+		if (q->curband >= q->bands)
+			q->curband = 0;
+
+		/* Check that target subqueue is available before
+		 * pulling an skb to avoid excessive requeues
+		 */
+		if (!__netif_subqueue_stopped(qdisc_dev(sch), q->curband)) {
+			qdisc = q->queues[q->curband];
+			skb = qdisc->dequeue(qdisc);
+			if (skb) {
+				sch->q.qlen--;
+				return skb;
+			}
+		}
+	}
+	return NULL;
+
+}
+
+static unsigned int multiq_drop(struct Qdisc *sch)
+{
+	struct multiq_sched_data *q = qdisc_priv(sch);
+	int band;
+	unsigned int len;
+	struct Qdisc *qdisc;
+
+	for (band = q->bands-1; band >= 0; band--) {
+		qdisc = q->queues[band];
+		if (qdisc->ops->drop) {
+			len = qdisc->ops->drop(qdisc);
+			if (len != 0) {
+				sch->q.qlen--;
+				return len;
+			}
+		}
+	}
+	return 0;
+}
+
+
+static void
+multiq_reset(struct Qdisc *sch)
+{
+	u16 band;
+	struct multiq_sched_data *q = qdisc_priv(sch);
+
+	for (band = 0; band < q->bands; band++)
+		qdisc_reset(q->queues[band]);
+	sch->q.qlen = 0;
+	q->curband = 0;
+}
+
+static void
+multiq_destroy(struct Qdisc *sch)
+{
+	int band;
+	struct multiq_sched_data *q = qdisc_priv(sch);
+
+	tcf_destroy_chain(&q->filter_list);
+	for (band = 0; band < q->bands; band++)
+		qdisc_destroy(q->queues[band]);
+
+	kfree(q->queues);
+}
+
+static int multiq_tune(struct Qdisc *sch, struct nlattr *opt)
+{
+	struct multiq_sched_data *q = qdisc_priv(sch);
+	struct tc_multiq_qopt *qopt;
+	int i;
+
+	if (sch->parent != TC_H_ROOT)
+		return -EINVAL;
+	if (!netif_is_multiqueue(qdisc_dev(sch)))
+		return -EINVAL;
+	if (nla_len(opt) < sizeof(*qopt))
+		return -EINVAL;
+
+	qopt = nla_data(opt);
+
+	qopt->bands = qdisc_dev(sch)->real_num_tx_queues;
+
+	sch_tree_lock(sch);
+	q->bands = qopt->bands;
+	for (i = q->bands; i < q->max_bands; i++) {
+		struct Qdisc *child = xchg(&q->queues[i], &noop_qdisc);
+		if (child != &noop_qdisc) {
+			qdisc_tree_decrease_qlen(child, child->q.qlen);
+			qdisc_destroy(child);
+		}
+	}
+
+	sch_tree_unlock(sch);
+
+	for (i = 0; i < q->bands; i++) {
+		if (q->queues[i] == &noop_qdisc) {
+			struct Qdisc *child;
+			child = qdisc_create_dflt(qdisc_dev(sch),
+						  sch->dev_queue,
+						  &pfifo_qdisc_ops,
+						  TC_H_MAKE(sch->handle,
+							    i + 1));
+			if (child) {
+				sch_tree_lock(sch);
+				child = xchg(&q->queues[i], child);
+
+				if (child != &noop_qdisc) {
+					qdisc_tree_decrease_qlen(child,
+								 child->q.qlen);
+					qdisc_destroy(child);
+				}
+				sch_tree_unlock(sch);
+			}
+		}
+	}
+	return 0;
+}
+
+static int multiq_init(struct Qdisc *sch, struct nlattr *opt)
+{
+	struct multiq_sched_data *q = qdisc_priv(sch);
+	int i;
+
+	q->queues = NULL;
+
+	if (opt == NULL)
+		return -EINVAL;
+
+	q->max_bands = qdisc_dev(sch)->num_tx_queues;
+
+	q->queues = kcalloc(q->max_bands, sizeof(struct Qdisc *), GFP_KERNEL);
+	if (!q->queues)
+		return -ENOBUFS;
+	for (i = 0; i < q->max_bands; i++)
+		q->queues[i] = &noop_qdisc;
+
+	return multiq_tune(sch, opt);
+}
+
+static int multiq_dump(struct Qdisc *sch, struct sk_buff *skb)
+{
+	struct multiq_sched_data *q = qdisc_priv(sch);
+	unsigned char *b = skb_tail_pointer(skb);
+	struct tc_multiq_qopt opt;
+
+	opt.bands = q->bands;
+	opt.max_bands = q->max_bands;
+
+	NLA_PUT(skb, TCA_OPTIONS, sizeof(opt), &opt);
+
+	return skb->len;
+
+nla_put_failure:
+	nlmsg_trim(skb, b);
+	return -1;
+}
+
+static int multiq_graft(struct Qdisc *sch, unsigned long arg, struct Qdisc *new,
+		      struct Qdisc **old)
+{
+	struct multiq_sched_data *q = qdisc_priv(sch);
+	unsigned long band = arg - 1;
+
+	if (band >= q->bands)
+		return -EINVAL;
+
+	if (new == NULL)
+		new = &noop_qdisc;
+
+	sch_tree_lock(sch);
+	*old = q->queues[band];
+	q->queues[band] = new;
+	qdisc_tree_decrease_qlen(*old, (*old)->q.qlen);
+	qdisc_reset(*old);
+	sch_tree_unlock(sch);
+
+	return 0;
+}
+
+static struct Qdisc *
+multiq_leaf(struct Qdisc *sch, unsigned long arg)
+{
+	struct multiq_sched_data *q = qdisc_priv(sch);
+	unsigned long band = arg - 1;
+
+	if (band >= q->bands)
+		return NULL;
+
+	return q->queues[band];
+}
+
+static unsigned long multiq_get(struct Qdisc *sch, u32 classid)
+{
+	struct multiq_sched_data *q = qdisc_priv(sch);
+	unsigned long band = TC_H_MIN(classid);
+
+	if (band - 1 >= q->bands)
+		return 0;
+	return band;
+}
+
+static unsigned long multiq_bind(struct Qdisc *sch, unsigned long parent,
+				 u32 classid)
+{
+	return multiq_get(sch, classid);
+}
+
+
+static void multiq_put(struct Qdisc *q, unsigned long cl)
+{
+	return;
+}
+
+static int multiq_change(struct Qdisc *sch, u32 handle, u32 parent,
+			 struct nlattr **tca, unsigned long *arg)
+{
+	unsigned long cl = *arg;
+	struct multiq_sched_data *q = qdisc_priv(sch);
+
+	if (cl - 1 > q->bands)
+		return -ENOENT;
+	return 0;
+}
+
+static int multiq_delete(struct Qdisc *sch, unsigned long cl)
+{
+	struct multiq_sched_data *q = qdisc_priv(sch);
+	if (cl - 1 > q->bands)
+		return -ENOENT;
+	return 0;
+}
+
+
+static int multiq_dump_class(struct Qdisc *sch, unsigned long cl,
+			     struct sk_buff *skb, struct tcmsg *tcm)
+{
+	struct multiq_sched_data *q = qdisc_priv(sch);
+
+	if (cl - 1 > q->bands)
+		return -ENOENT;
+	tcm->tcm_handle |= TC_H_MIN(cl);
+	if (q->queues[cl-1])
+		tcm->tcm_info = q->queues[cl-1]->handle;
+	return 0;
+}
+
+static int multiq_dump_class_stats(struct Qdisc *sch, unsigned long cl,
+				 struct gnet_dump *d)
+{
+	struct multiq_sched_data *q = qdisc_priv(sch);
+	struct Qdisc *cl_q;
+
+	cl_q = q->queues[cl - 1];
+	if (gnet_stats_copy_basic(d, &cl_q->bstats) < 0 ||
+	    gnet_stats_copy_queue(d, &cl_q->qstats) < 0)
+		return -1;
+
+	return 0;
+}
+
+static void multiq_walk(struct Qdisc *sch, struct qdisc_walker *arg)
+{
+	struct multiq_sched_data *q = qdisc_priv(sch);
+	int band;
+
+	if (arg->stop)
+		return;
+
+	for (band = 0; band < q->bands; band++) {
+		if (arg->count < arg->skip) {
+			arg->count++;
+			continue;
+		}
+		if (arg->fn(sch, band+1, arg) < 0) {
+			arg->stop = 1;
+			break;
+		}
+		arg->count++;
+	}
+}
+
+static struct tcf_proto **multiq_find_tcf(struct Qdisc *sch, unsigned long cl)
+{
+	struct multiq_sched_data *q = qdisc_priv(sch);
+
+	if (cl)
+		return NULL;
+	return &q->filter_list;
+}
+
+static const struct Qdisc_class_ops multiq_class_ops = {
+	.graft		=	multiq_graft,
+	.leaf		=	multiq_leaf,
+	.get		=	multiq_get,
+	.put		=	multiq_put,
+	.change		=	multiq_change,
+	.delete		=	multiq_delete,
+	.walk		=	multiq_walk,
+	.tcf_chain	=	multiq_find_tcf,
+	.bind_tcf	=	multiq_bind,
+	.unbind_tcf	=	multiq_put,
+	.dump		=	multiq_dump_class,
+	.dump_stats	=	multiq_dump_class_stats,
+};
+
+static struct Qdisc_ops multiq_qdisc_ops __read_mostly = {
+	.next		=	NULL,
+	.cl_ops		=	&multiq_class_ops,
+	.id		=	"multiq",
+	.priv_size	=	sizeof(struct multiq_sched_data),
+	.enqueue	=	multiq_enqueue,
+	.dequeue	=	multiq_dequeue,
+	.requeue	=	multiq_requeue,
+	.drop		=	multiq_drop,
+	.init		=	multiq_init,
+	.reset		=	multiq_reset,
+	.destroy	=	multiq_destroy,
+	.change		=	multiq_tune,
+	.dump		=	multiq_dump,
+	.owner		=	THIS_MODULE,
+};
+
+static int __init multiq_module_init(void)
+{
+	return register_qdisc(&multiq_qdisc_ops);
+}
+
+static void __exit multiq_module_exit(void)
+{
+	unregister_qdisc(&multiq_qdisc_ops);
+}
+
+module_init(multiq_module_init)
+module_exit(multiq_module_exit)
+
+MODULE_LICENSE("GPL");


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [NET-NEXT PATCH 3/3][UPDATED] pkt_action: add new action skbedit
  2008-09-12  3:07 [NET-NEXT PATCH 1/3] vlan: vlan device not reading gso max size of parent Jeff Kirsher
  2008-09-12  3:08 ` [NET-NEXT PATCH 2/3][UPDATED] pkt_sched: Add multiqueue scheduler support Jeff Kirsher
@ 2008-09-12  3:08 ` Jeff Kirsher
  2008-09-12  3:17 ` [NET-NEXT PATCH 1/3] vlan: vlan device not reading gso max size of parent David Miller
  2 siblings, 0 replies; 8+ messages in thread
From: Jeff Kirsher @ 2008-09-12  3:08 UTC (permalink / raw)
  To: davem; +Cc: jeff, netdev, akpm, Alexander Duyck, Jeff Kirsher

From: Alexander Duyck <alexander.h.duyck@intel.com>

This new action will have the ability to change the priority and/or
queue_mapping fields on an sk_buff.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---

 Documentation/networking/multiqueue.txt |    9 +
 include/linux/tc_act/Kbuild             |    1 
 include/linux/tc_act/tc_skbedit.h       |   44 +++++++
 include/net/tc_act/tc_skbedit.h         |   34 +++++
 net/sched/Kconfig                       |   11 ++
 net/sched/Makefile                      |    1 
 net/sched/act_skbedit.c                 |  203 +++++++++++++++++++++++++++++++
 7 files changed, 302 insertions(+), 1 deletions(-)
 create mode 100644 include/linux/tc_act/tc_skbedit.h
 create mode 100644 include/net/tc_act/tc_skbedit.h
 create mode 100644 net/sched/act_skbedit.c

diff --git a/Documentation/networking/multiqueue.txt b/Documentation/networking/multiqueue.txt
index 5787ee6..10113ff 100644
--- a/Documentation/networking/multiqueue.txt
+++ b/Documentation/networking/multiqueue.txt
@@ -66,7 +66,14 @@ band 3 => queue 3
 Traffic will begin flowing through each queue if your base device has either
 the default simple_tx_hash or a custom netdev->select_queue() defined.
 
-The behavior of tc filters remains the same.
+The behavior of tc filters remains the same.  However a new tc action,
+skbedit, has been added.  Assuming you wanted to route all traffic to a
+specific host, for example 192.168.0.3, though a specific queue you could use
+this action and establish a filter such as:
+
+tc filter add dev eth0 parent 1: protocol ip prio 1 u32 \
+	match ip dst 192.168.0.3 \
+	action skbedit queue_mapping 3
 
 Author: Alexander Duyck <alexander.h.duyck@intel.com>
 Original Author: Peter P. Waskiewicz Jr. <peter.p.waskiewicz.jr@intel.com>
diff --git a/include/linux/tc_act/Kbuild b/include/linux/tc_act/Kbuild
index 6dac0d7..7699093 100644
--- a/include/linux/tc_act/Kbuild
+++ b/include/linux/tc_act/Kbuild
@@ -3,3 +3,4 @@ header-y += tc_ipt.h
 header-y += tc_mirred.h
 header-y += tc_pedit.h
 header-y += tc_nat.h
+header-y += tc_skbedit.h
diff --git a/include/linux/tc_act/tc_skbedit.h b/include/linux/tc_act/tc_skbedit.h
new file mode 100644
index 0000000..a14e461
--- /dev/null
+++ b/include/linux/tc_act/tc_skbedit.h
@@ -0,0 +1,44 @@
+/*
+ * Copyright (c) 2008, Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc., 59 Temple
+ * Place - Suite 330, Boston, MA 02111-1307 USA.
+ *
+ * Author: Alexander Duyck <alexander.h.duyck@intel.com>
+ */
+
+#ifndef __LINUX_TC_SKBEDIT_H
+#define __LINUX_TC_SKBEDIT_H
+
+#include <linux/pkt_cls.h>
+
+#define TCA_ACT_SKBEDIT 11
+
+#define SKBEDIT_F_PRIORITY		0x1
+#define SKBEDIT_F_QUEUE_MAPPING		0x2
+
+struct tc_skbedit {
+	tc_gen;
+};
+
+enum {
+	TCA_SKBEDIT_UNSPEC,
+	TCA_SKBEDIT_TM,
+	TCA_SKBEDIT_PARMS,
+	TCA_SKBEDIT_PRIORITY,
+	TCA_SKBEDIT_QUEUE_MAPPING,
+	__TCA_SKBEDIT_MAX
+};
+#define TCA_SKBEDIT_MAX (__TCA_SKBEDIT_MAX - 1)
+
+#endif
diff --git a/include/net/tc_act/tc_skbedit.h b/include/net/tc_act/tc_skbedit.h
new file mode 100644
index 0000000..6abb3ed
--- /dev/null
+++ b/include/net/tc_act/tc_skbedit.h
@@ -0,0 +1,34 @@
+/*
+ * Copyright (c) 2008, Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc., 59 Temple
+ * Place - Suite 330, Boston, MA 02111-1307 USA.
+ *
+ * Author: Alexander Duyck <alexander.h.duyck@intel.com>
+ */
+
+#ifndef __NET_TC_SKBEDIT_H
+#define __NET_TC_SKBEDIT_H
+
+#include <net/act_api.h>
+
+struct tcf_skbedit {
+	struct tcf_common	common;
+	u32			flags;
+	u32     		priority;
+	u16			queue_mapping;
+};
+#define to_skbedit(pc) \
+	container_of(pc, struct tcf_skbedit, common)
+
+#endif /* __NET_TC_SKBEDIT_H */
diff --git a/net/sched/Kconfig b/net/sched/Kconfig
index efaa7a7..6767e54 100644
--- a/net/sched/Kconfig
+++ b/net/sched/Kconfig
@@ -485,6 +485,17 @@ config NET_ACT_SIMP
 	  To compile this code as a module, choose M here: the
 	  module will be called simple.
 
+config NET_ACT_SKBEDIT
+        tristate "SKB Editing"
+        depends on NET_CLS_ACT
+        ---help---
+	  Say Y here to change skb priority or queue_mapping settings.
+
+	  If unsure, say N.
+
+	  To compile this code as a module, choose M here: the
+	  module will be called skbedit.
+
 config NET_CLS_IND
 	bool "Incoming device classification"
 	depends on NET_CLS_U32 || NET_CLS_FW
diff --git a/net/sched/Makefile b/net/sched/Makefile
index 3d9b953..e60c992 100644
--- a/net/sched/Makefile
+++ b/net/sched/Makefile
@@ -14,6 +14,7 @@ obj-$(CONFIG_NET_ACT_IPT)	+= act_ipt.o
 obj-$(CONFIG_NET_ACT_NAT)	+= act_nat.o
 obj-$(CONFIG_NET_ACT_PEDIT)	+= act_pedit.o
 obj-$(CONFIG_NET_ACT_SIMP)	+= act_simple.o
+obj-$(CONFIG_NET_ACT_SKBEDIT)	+= act_skbedit.o
 obj-$(CONFIG_NET_SCH_FIFO)	+= sch_fifo.o
 obj-$(CONFIG_NET_SCH_CBQ)	+= sch_cbq.o
 obj-$(CONFIG_NET_SCH_HTB)	+= sch_htb.o
diff --git a/net/sched/act_skbedit.c b/net/sched/act_skbedit.c
new file mode 100644
index 0000000..fe9777e
--- /dev/null
+++ b/net/sched/act_skbedit.c
@@ -0,0 +1,203 @@
+/*
+ * Copyright (c) 2008, Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc., 59 Temple
+ * Place - Suite 330, Boston, MA 02111-1307 USA.
+ *
+ * Author: Alexander Duyck <alexander.h.duyck@intel.com>
+ */
+
+#include <linux/module.h>
+#include <linux/init.h>
+#include <linux/kernel.h>
+#include <linux/skbuff.h>
+#include <linux/rtnetlink.h>
+#include <net/netlink.h>
+#include <net/pkt_sched.h>
+
+#include <linux/tc_act/tc_skbedit.h>
+#include <net/tc_act/tc_skbedit.h>
+
+#define SKBEDIT_TAB_MASK     15
+static struct tcf_common *tcf_skbedit_ht[SKBEDIT_TAB_MASK + 1];
+static u32 skbedit_idx_gen;
+static DEFINE_RWLOCK(skbedit_lock);
+
+static struct tcf_hashinfo skbedit_hash_info = {
+	.htab	=	tcf_skbedit_ht,
+	.hmask	=	SKBEDIT_TAB_MASK,
+	.lock	=	&skbedit_lock,
+};
+
+static int tcf_skbedit(struct sk_buff *skb, struct tc_action *a,
+		       struct tcf_result *res)
+{
+	struct tcf_skbedit *d = a->priv;
+
+	spin_lock(&d->tcf_lock);
+	d->tcf_tm.lastuse = jiffies;
+	d->tcf_bstats.bytes += qdisc_pkt_len(skb);
+	d->tcf_bstats.packets++;
+
+	if (d->flags & SKBEDIT_F_PRIORITY)
+		skb->priority = d->priority;
+	if (d->flags & SKBEDIT_F_QUEUE_MAPPING &&
+	    skb->dev->real_num_tx_queues > d->queue_mapping)
+		skb_set_queue_mapping(skb, d->queue_mapping);
+
+	spin_unlock(&d->tcf_lock);
+	return d->tcf_action;
+}
+
+static const struct nla_policy skbedit_policy[TCA_SKBEDIT_MAX + 1] = {
+	[TCA_SKBEDIT_PARMS]		= { .len = sizeof(struct tc_skbedit) },
+	[TCA_SKBEDIT_PRIORITY]		= { .len = sizeof(u32) },
+	[TCA_SKBEDIT_QUEUE_MAPPING]	= { .len = sizeof(u16) },
+};
+
+static int tcf_skbedit_init(struct nlattr *nla, struct nlattr *est,
+			 struct tc_action *a, int ovr, int bind)
+{
+	struct nlattr *tb[TCA_SKBEDIT_MAX + 1];
+	struct tc_skbedit *parm;
+	struct tcf_skbedit *d;
+	struct tcf_common *pc;
+	u32 flags = 0, *priority = NULL;
+	u16 *queue_mapping = NULL;
+	int ret = 0, err;
+
+	if (nla == NULL)
+		return -EINVAL;
+
+	err = nla_parse_nested(tb, TCA_SKBEDIT_MAX, nla, skbedit_policy);
+	if (err < 0)
+		return err;
+
+	if (tb[TCA_SKBEDIT_PARMS] == NULL)
+		return -EINVAL;
+
+	if (tb[TCA_SKBEDIT_PRIORITY] != NULL) {
+		flags |= SKBEDIT_F_PRIORITY;
+		priority = nla_data(tb[TCA_SKBEDIT_PRIORITY]);
+	}
+
+	if (tb[TCA_SKBEDIT_QUEUE_MAPPING] != NULL) {
+		flags |= SKBEDIT_F_QUEUE_MAPPING;
+		queue_mapping = nla_data(tb[TCA_SKBEDIT_QUEUE_MAPPING]);
+	}
+	if (!flags)
+		return -EINVAL;
+
+	parm = nla_data(tb[TCA_SKBEDIT_PARMS]);
+
+	pc = tcf_hash_check(parm->index, a, bind, &skbedit_hash_info);
+	if (!pc) {
+		pc = tcf_hash_create(parm->index, est, a, sizeof(*d), bind,
+				     &skbedit_idx_gen, &skbedit_hash_info);
+		if (unlikely(!pc))
+			return -ENOMEM;
+
+		d = to_skbedit(pc);
+		ret = ACT_P_CREATED;
+	} else {
+		d = to_skbedit(pc);
+		if (!ovr) {
+			tcf_hash_release(pc, bind, &skbedit_hash_info);
+			return -EEXIST;
+		}
+	}
+
+	spin_lock_bh(&d->tcf_lock);
+
+	d->flags = flags;
+	if (flags & SKBEDIT_F_PRIORITY)
+		d->priority = *priority;
+	if (flags & SKBEDIT_F_QUEUE_MAPPING)
+		d->queue_mapping = *queue_mapping;
+	d->tcf_action = parm->action;
+
+	spin_unlock_bh(&d->tcf_lock);
+
+	if (ret == ACT_P_CREATED)
+		tcf_hash_insert(pc, &skbedit_hash_info);
+	return ret;
+}
+
+static inline int tcf_skbedit_cleanup(struct tc_action *a, int bind)
+{
+	struct tcf_skbedit *d = a->priv;
+
+	if (d)
+		return tcf_hash_release(&d->common, bind, &skbedit_hash_info);
+	return 0;
+}
+
+static inline int tcf_skbedit_dump(struct sk_buff *skb, struct tc_action *a,
+				int bind, int ref)
+{
+	unsigned char *b = skb_tail_pointer(skb);
+	struct tcf_skbedit *d = a->priv;
+	struct tc_skbedit opt;
+	struct tcf_t t;
+
+	opt.index = d->tcf_index;
+	opt.refcnt = d->tcf_refcnt - ref;
+	opt.bindcnt = d->tcf_bindcnt - bind;
+	opt.action = d->tcf_action;
+	NLA_PUT(skb, TCA_SKBEDIT_PARMS, sizeof(opt), &opt);
+	if (d->flags & SKBEDIT_F_PRIORITY)
+		NLA_PUT(skb, TCA_SKBEDIT_PRIORITY, sizeof(d->priority),
+			&d->priority);
+	if (d->flags & SKBEDIT_F_QUEUE_MAPPING)
+		NLA_PUT(skb, TCA_SKBEDIT_QUEUE_MAPPING,
+			sizeof(d->queue_mapping), &d->queue_mapping);
+	t.install = jiffies_to_clock_t(jiffies - d->tcf_tm.install);
+	t.lastuse = jiffies_to_clock_t(jiffies - d->tcf_tm.lastuse);
+	t.expires = jiffies_to_clock_t(d->tcf_tm.expires);
+	NLA_PUT(skb, TCA_SKBEDIT_TM, sizeof(t), &t);
+	return skb->len;
+
+nla_put_failure:
+	nlmsg_trim(skb, b);
+	return -1;
+}
+
+static struct tc_action_ops act_skbedit_ops = {
+	.kind		=	"skbedit",
+	.hinfo		=	&skbedit_hash_info,
+	.type		=	TCA_ACT_SKBEDIT,
+	.capab		=	TCA_CAP_NONE,
+	.owner		=	THIS_MODULE,
+	.act		=	tcf_skbedit,
+	.dump		=	tcf_skbedit_dump,
+	.cleanup	=	tcf_skbedit_cleanup,
+	.init		=	tcf_skbedit_init,
+	.walk		=	tcf_generic_walker,
+};
+
+MODULE_AUTHOR("Alexander Duyck, <alexander.h.duyck@intel.com>");
+MODULE_DESCRIPTION("SKB Editing");
+MODULE_LICENSE("GPL");
+
+static int __init skbedit_init_module(void)
+{
+	return tcf_register_action(&act_skbedit_ops);
+}
+
+static void __exit skbedit_cleanup_module(void)
+{
+	tcf_unregister_action(&act_skbedit_ops);
+}
+
+module_init(skbedit_init_module);
+module_exit(skbedit_cleanup_module);


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [NET-NEXT PATCH 1/3] vlan: vlan device not reading gso max size of parent.
  2008-09-12  3:07 [NET-NEXT PATCH 1/3] vlan: vlan device not reading gso max size of parent Jeff Kirsher
  2008-09-12  3:08 ` [NET-NEXT PATCH 2/3][UPDATED] pkt_sched: Add multiqueue scheduler support Jeff Kirsher
  2008-09-12  3:08 ` [NET-NEXT PATCH 3/3][UPDATED] pkt_action: add new action skbedit Jeff Kirsher
@ 2008-09-12  3:17 ` David Miller
  2 siblings, 0 replies; 8+ messages in thread
From: David Miller @ 2008-09-12  3:17 UTC (permalink / raw)
  To: jeffrey.t.kirsher; +Cc: jeff, netdev, akpm, alexander.h.duyck

From: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Date: Thu, 11 Sep 2008 20:07:47 -0700

> From: Alexander Duyck <alexander.h.duyck@intel.com>
> 
> The vlan devices are not reading the gso max size of the parent device.  As
> a result devices that do not support 64K max gso size are currently
> failing.
> 
> This issue is seen on 2.6.26 kernels as well and the same patch should be
> able to be applied without any issues.
> 
> Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

Good catch, applied to net-next-2.6

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [NET-NEXT PATCH 2/3][UPDATED] pkt_sched: Add multiqueue scheduler support
  2008-09-12  3:08 ` [NET-NEXT PATCH 2/3][UPDATED] pkt_sched: Add multiqueue scheduler support Jeff Kirsher
@ 2008-09-12  3:19   ` David Miller
  2008-09-12  4:42     ` Alexander Duyck
  2008-09-12 22:42   ` Jarek Poplawski
  1 sibling, 1 reply; 8+ messages in thread
From: David Miller @ 2008-09-12  3:19 UTC (permalink / raw)
  To: jeffrey.t.kirsher; +Cc: jeff, netdev, akpm, alexander.h.duyck

From: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Date: Thu, 11 Sep 2008 20:08:31 -0700

> From: Alexander Duyck <alexander.h.duyck@intel.com>
> 
> This patch is intended to add a qdisc to support the new tx multiqueue
> architecture by providing a band for each hardware queue.  By doing
> this it is possible to support a different qdisc per physical hardware
> queue.
> 
> This qdisc uses the skb->queue_mapping to select which band to place
> the traffic onto.  It then uses a round robin w/ a check to see if the
> subqueue is stopped to determine which band to dequeue the packet from.
> 
> Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

I'm more and more agreeing with Jarek that the root-only restriction
is rediculious.

What if I want to do PRIO multiqueue for certain classes of traffic?

If I want to shoot myself in the foot trying to use a setup like that,
this thing damn well better let me.  You don't unilaterally know better
than the user.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [NET-NEXT PATCH 2/3][UPDATED] pkt_sched: Add multiqueue scheduler support
  2008-09-12  3:19   ` David Miller
@ 2008-09-12  4:42     ` Alexander Duyck
  0 siblings, 0 replies; 8+ messages in thread
From: Alexander Duyck @ 2008-09-12  4:42 UTC (permalink / raw)
  To: David Miller; +Cc: jeffrey.t.kirsher, jeff, netdev, akpm, alexander.h.duyck

On Thu, Sep 11, 2008 at 8:19 PM, David Miller <davem@davemloft.net> wrote:
> I'm more and more agreeing with Jarek that the root-only restriction
> is rediculious.
>
> What if I want to do PRIO multiqueue for certain classes of traffic?
>
> If I want to shoot myself in the foot trying to use a setup like that,
> this thing damn well better let me.  You don't unilaterally know better
> than the user.

I will submit an additional patch in the next day or so to pull the
root-only requirement.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [NET-NEXT PATCH 2/3][UPDATED] pkt_sched: Add multiqueue scheduler support
  2008-09-12  3:08 ` [NET-NEXT PATCH 2/3][UPDATED] pkt_sched: Add multiqueue scheduler support Jeff Kirsher
  2008-09-12  3:19   ` David Miller
@ 2008-09-12 22:42   ` Jarek Poplawski
  2008-09-12 23:33     ` Duyck, Alexander H
  1 sibling, 1 reply; 8+ messages in thread
From: Jarek Poplawski @ 2008-09-12 22:42 UTC (permalink / raw)
  To: Jeff Kirsher; +Cc: davem, jeff, netdev, akpm, Alexander Duyck

Jeff Kirsher wrote, On 09/12/2008 05:08 AM:

> From: Alexander Duyck <alexander.h.duyck@intel.com>
...
> diff --git a/Documentation/networking/multiqueue.txt b/Documentation/networking/multiqueue.txt
> index d391ea6..5787ee6 100644
> --- a/Documentation/networking/multiqueue.txt
> +++ b/Documentation/networking/multiqueue.txt
> @@ -24,4 +24,49 @@ netif_{start|stop|wake}_subqueue() functions to manage each queue while the
>  device is still operational.  netdev->queue_lock is still used when the device
>  comes online or when it's completely shut down (unregister_netdev(), etc.).
>  
> -Author: Peter P. Waskiewicz Jr. <peter.p.waskiewicz.jr@intel.com>
> +
> +Section 2: Qdisc support for multiqueue devices
> +
> +-----------------------------------------------
> +
> +Currently two qdiscs support multiqueue devices.

This looks like a bit misleading: one could think it's necessary to use
one of these two with mq devices.

>  The first is the default
> +pfifo_fast qdisc.  This qdisc supports one qdisc per hardware queue.  A new
> +round-robin qdisc, sch_multiq also supports multiple hardware queues. The
> +qdisc is responsible for classifying the skb's and then directing the skb's to
> +bands and queues based on the value in skb->queue_mapping.  Use this field in
> +the base driver to determine which queue to send the skb to.
> +
> +sch_multiq has been added for hardware that wishes to avoid unnecessary
> +requeuing.

This could suggest that other qdiscs don't wish the same or even
generate excessive requeuing, which IMHO could be true, but it's not
proven, and might be only temporary situation anyway...

Anyway, I wonder why you don't mention here anything from your first
nice explantion to my doubts around the first attempt to restore
multiqueue to prio, like EEDC/DCB/etc?

>   It will cycle though the bands and verify that the hardware queue
> +associated with the band is not stopped prior to dequeuing a packet.
> +
> +On qdisc load, the number of bands is based on the number of queues on the
> +hardware.  Once the association is made, any skb with skb->queue_mapping set,
> +will be queued to the band associated with the hardware queue.
> +
> +
> +Section 3: Brief howto using MULTIQ for multiqueue devices
> +---------------------------------------------------------------
> +
> +The userspace command 'tc,' part of the iproute2 package, is used to configure
> +qdiscs.  To add the MULTIQ qdisc to your network device, assuming the device
> +is called eth0, run the following command:
> +
> +# tc qdisc add dev eth0 root handle 1: multiq
> +
> +The qdisc will allocate the number of bands to equal the number of queues that
> +the device reports, and bring the qdisc online.  Assuming eth0 has 4 Tx
> +queues, the band mapping would look like:
> +
> +band 0 => queue 0
> +band 1 => queue 1
> +band 2 => queue 2
> +band 3 => queue 3
> +
> +Traffic will begin flowing through each queue if your base device has either
> +the default simple_tx_hash or a custom netdev->select_queue() defined.

I think a device doesn't need to have simple_tx_hash defined?

...
> +static int multiq_tune(struct Qdisc *sch, struct nlattr *opt)
> +{
> +	struct multiq_sched_data *q = qdisc_priv(sch);
> +	struct tc_multiq_qopt *qopt;
> +	int i;
> +
> +	if (sch->parent != TC_H_ROOT)
> +		return -EINVAL;
> +	if (!netif_is_multiqueue(qdisc_dev(sch)))
> +		return -EINVAL;
> +	if (nla_len(opt) < sizeof(*qopt))
> +		return -EINVAL;
> +
> +	qopt = nla_data(opt);
> +
> +	qopt->bands = qdisc_dev(sch)->real_num_tx_queues;
> +
> +	sch_tree_lock(sch);
> +	q->bands = qopt->bands;
> +	for (i = q->bands; i < q->max_bands; i++) {
> +		struct Qdisc *child = xchg(&q->queues[i], &noop_qdisc);
> +		if (child != &noop_qdisc) {

Or maybe?:
		if (q->queues[i] != &noop_qdisc) {
			struct Qdisc *child = xchg(&q->queues[i], &noop_qdisc);

> +			qdisc_tree_decrease_qlen(child, child->q.qlen);
> +			qdisc_destroy(child);
> +		}
> +	}
...
> +static int multiq_init(struct Qdisc *sch, struct nlattr *opt)
> +{
> +	struct multiq_sched_data *q = qdisc_priv(sch);
> +	int i;
> +
> +	q->queues = NULL;
> +
> +	if (opt == NULL)
> +		return -EINVAL;
> +
> +	q->max_bands = qdisc_dev(sch)->num_tx_queues;
> +
> +	q->queues = kcalloc(q->max_bands, sizeof(struct Qdisc *), GFP_KERNEL);
> +	if (!q->queues)
> +		return -ENOBUFS;
> +	for (i = 0; i < q->max_bands; i++)
> +		q->queues[i] = &noop_qdisc;
> +
> +	return multiq_tune(sch, opt);

But multiq_tune() can EINVAL, so kfree is needed here.

Jarek P.

PS: to save some "paper" a typo, I guess, from PATCH 3/3:

> +The behavior of tc filters remains the same.  However a new tc action,
> +skbedit, has been added.  Assuming you wanted to route all traffic to a
> +specific host, for example 192.168.0.3, though a specific queue you could use

 +specific host, for example 192.168.0.3, through a specific queue you could use

^ permalink raw reply	[flat|nested] 8+ messages in thread

* RE: [NET-NEXT PATCH 2/3][UPDATED] pkt_sched: Add multiqueue scheduler support
  2008-09-12 22:42   ` Jarek Poplawski
@ 2008-09-12 23:33     ` Duyck, Alexander H
  0 siblings, 0 replies; 8+ messages in thread
From: Duyck, Alexander H @ 2008-09-12 23:33 UTC (permalink / raw)
  To: Jarek Poplawski, Kirsher, Jeffrey T
  Cc: davem@davemloft.net, jeff@garzik.org, netdev@vger.kernel.org,
	akpm@linux-foundation.org

Jarek Poplawski wrote:
----------------------------------------------
>> +
>> +Currently two qdiscs support multiqueue devices.
>
> This looks like a bit misleading: one could think it's necessary to
> use one of these two with mq devices.
>
How about I change this from "support" to "are optimized for".  Would
that work better for you?

>> +
>> +sch_multiq has been added for hardware that wishes to avoid
>> +unnecessary requeuing.
>
> This could suggest that other qdiscs don't wish the same or even
> generate excessive requeuing, which IMHO could be true, but it's not
> proven, and might be only temporary situation anyway...
>
> Anyway, I wonder why you don't mention here anything from your first
> nice explantion to my doubts around the first attempt to restore
> multiqueue to prio, like EEDC/DCB/etc?

Actually this was just a poor choice of words on my part.  I should
probably replace "unnecessary requeueing" with "head-of-line blocking".


>> +Traffic will begin flowing through each queue if your base device
>> +has either the default simple_tx_hash or a custom
>> +netdev->select_queue() defined.
>
> I think a device doesn't need to have simple_tx_hash defined?

Actually simple_tx_hash is the default hash function that netdev uses
if select_queue is not defined.  Another poor choice of wording.  It
Should have been "..each queue via either the default simple_tx_hash
or the netdev->select_queue function if you have it defined".

>> +     for (i = q->bands; i < q->max_bands; i++) {
>> +             struct Qdisc *child = xchg(&q->queues[i], &noop_qdisc);
>> +             if (child != &noop_qdisc) {
>
> Or maybe?:
>                 if (q->queues[i] != &noop_qdisc) {
>                         struct Qdisc *child = xchg(&q->queues[i],
>                                                    &noop_qdisc);
>
>> +                     qdisc_tree_decrease_qlen(child, child->q.qlen);
>> +                     qdisc_destroy(child);
>> +             }
>> +     }
I suppose I can make this change since we are already protected by the
tree lock already.

>> +static int multiq_init(struct Qdisc *sch, struct nlattr *opt) +{
>> +     struct multiq_sched_data *q = qdisc_priv(sch); +     int i;
>> +
>> +     q->queues = NULL;
>> +
>> +     if (opt == NULL)
>> +             return -EINVAL;
>> +
>> +     q->max_bands = qdisc_dev(sch)->num_tx_queues; +
>> +     q->queues = kcalloc(q->max_bands, sizeof(struct Qdisc *),
...
>> +
>> +     return multiq_tune(sch, opt);
>
> But multiq_tune() can EINVAL, so kfree is needed here.

Noted and corrected.

> Jarek P.
>
> PS: to save some "paper" a typo, I guess, from PATCH 3/3:
>
>> +specific host, for example 192.168.0.3, though a specific queue
>> +you could use
>
>  +specific host, for example 192.168.0.3, through a specific queue
> you could use

Typo noted and will be corrected shortly.

Thanks,

Alex

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2008-09-12 23:33 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-09-12  3:07 [NET-NEXT PATCH 1/3] vlan: vlan device not reading gso max size of parent Jeff Kirsher
2008-09-12  3:08 ` [NET-NEXT PATCH 2/3][UPDATED] pkt_sched: Add multiqueue scheduler support Jeff Kirsher
2008-09-12  3:19   ` David Miller
2008-09-12  4:42     ` Alexander Duyck
2008-09-12 22:42   ` Jarek Poplawski
2008-09-12 23:33     ` Duyck, Alexander H
2008-09-12  3:08 ` [NET-NEXT PATCH 3/3][UPDATED] pkt_action: add new action skbedit Jeff Kirsher
2008-09-12  3:17 ` [NET-NEXT PATCH 1/3] vlan: vlan device not reading gso max size of parent David Miller

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).