* [net-next-2.6 PATCH 1/4] net: implement mechanism for HW based QOS
@ 2010-12-17 15:34 John Fastabend
2010-12-17 15:34 ` [net-next-2.6 PATCH 2/4] net_sched: Allow multiple mq qdisc to be used as non-root John Fastabend
` (3 more replies)
0 siblings, 4 replies; 9+ messages in thread
From: John Fastabend @ 2010-12-17 15:34 UTC (permalink / raw)
To: davem; +Cc: netdev, hadi, shemminger, tgraf, eric.dumazet, nhorman
This patch provides a mechanism for lower layer devices to
steer traffic to tx queues using skb->priority. This allows
hardware-based QOS schemes to use the default qdisc without
incurring the penalties related to global state and the qdisc
lock, while still reliably placing skbs on the correct tx ring
and avoiding the head-of-line blocking that results from
shuffling in the LLD. Finally, all the goodness from txq
caching and xps/rps can still be leveraged.
Many drivers and devices can implement QOS schemes in
hardware, but currently these drivers tend to rely on firmware
to reroute specific traffic, a driver-specific select_queue,
or the queue_mapping action in the qdisc.
With select_queue, drivers need to be updated for each and
every traffic type and we lose the goodness of much of the
upstream work. Firmware solutions are inherently inflexible.
And finally, if admins are expected to build a qdisc and
filter rules to steer traffic, this requires knowledge of how
the hardware is currently configured; the number of tx queues
and the queue offsets may change depending on resources.
This approach also incurs all the overhead of a qdisc with filters.
With the mechanism in this patch users can set the skb priority
using the expected methods, i.e. setsockopt(SO_PRIORITY), or the
stack can set the priority directly. The skb is then steered to
a tx queue aligned with the hardware QOS traffic classes. In the
normal case, with a single traffic class containing all queues,
everything works as is until the LLD enables multiple tcs.
To steer the skb we use the lower 4 bits of the priority as an
index into the priority-to-tc map, allowing the hardware to
configure up to 16 distinct classes of traffic. This is expected
to be sufficient for most applications; at any rate it is more
than the 802.1Q spec designates and is equal to the number of
prio bands currently implemented in the default qdisc.
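A userspace sketch of that mapping logic (the names here are illustrative; the real helpers are the netdev_get/set_prio_tc_map() inlines added by the patch):

```c
#include <stdint.h>

/* Illustrative copies of the per-netdev tables this patch adds:
 * a 16-entry map from (skb->priority & 15) to a traffic class. */
static int get_prio_tc_map(const uint8_t map[16], uint32_t prio)
{
	/* Only the lower 4 bits of the priority select a class */
	return map[prio & 15];
}

static int set_prio_tc_map(uint8_t map[16], uint8_t num_tc,
			   uint8_t prio, uint8_t tc)
{
	if (tc >= num_tc)
		return -1;	/* the kernel helper returns -EINVAL */
	map[prio & 15] = tc & 15;
	return 0;
}
```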
This, in conjunction with a userspace application such as
lldpad, can be used to implement the 802.1Q transmission
selection algorithms, one of these being the enhanced
transmission selection (ETS) algorithm currently used for DCB.
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
---
include/linux/netdevice.h | 60 +++++++++++++++++++++++++++++++++++++++++++++
net/core/dev.c | 10 +++++++-
2 files changed, 69 insertions(+), 1 deletions(-)
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index a9ac5dc..9694138 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -646,6 +646,12 @@ struct xps_dev_maps {
(nr_cpu_ids * sizeof(struct xps_map *)))
#endif /* CONFIG_XPS */
+/* HW-offloaded queueing discipline txq count and offset map */
+struct netdev_tc_txq {
+ u16 count;
+ u16 offset;
+};
+
/*
* This structure defines the management hooks for network devices.
* The following hooks can be defined; unless noted otherwise, they are
@@ -1146,6 +1152,9 @@ struct net_device {
/* Data Center Bridging netlink ops */
const struct dcbnl_rtnl_ops *dcbnl_ops;
#endif
+ u8 num_tc;
+ struct netdev_tc_txq tc_to_txq[16];
+ u8 prio_tc_map[16];
#if defined(CONFIG_FCOE) || defined(CONFIG_FCOE_MODULE)
/* max exchange id for FCoE LRO by ddp */
@@ -1162,6 +1171,57 @@ struct net_device {
#define NETDEV_ALIGN 32
static inline
+int netdev_get_prio_tc_map(const struct net_device *dev, u32 prio)
+{
+ return dev->prio_tc_map[prio & 15];
+}
+
+static inline
+int netdev_set_prio_tc_map(struct net_device *dev, u8 prio, u8 tc)
+{
+ if (tc >= dev->num_tc)
+ return -EINVAL;
+
+ dev->prio_tc_map[prio & 15] = tc & 15;
+ return 0;
+}
+
+static inline
+void netdev_reset_tc(struct net_device *dev)
+{
+ dev->num_tc = 0;
+ memset(dev->tc_to_txq, 0, sizeof(dev->tc_to_txq));
+ memset(dev->prio_tc_map, 0, sizeof(dev->prio_tc_map));
+}
+
+static inline
+int netdev_set_tc_queue(struct net_device *dev, u8 tc, u16 count, u16 offset)
+{
+ if (tc >= dev->num_tc)
+ return -EINVAL;
+
+ dev->tc_to_txq[tc].count = count;
+ dev->tc_to_txq[tc].offset = offset;
+ return 0;
+}
+
+static inline
+int netdev_set_num_tc(struct net_device *dev, u8 num_tc)
+{
+ if (num_tc > 16)
+ return -EINVAL;
+
+ dev->num_tc = num_tc;
+ return 0;
+}
+
+static inline
+u8 netdev_get_num_tc(const struct net_device *dev)
+{
+ return dev->num_tc;
+}
+
+static inline
struct netdev_queue *netdev_get_tx_queue(const struct net_device *dev,
unsigned int index)
{
diff --git a/net/core/dev.c b/net/core/dev.c
index 55ff66f..58e04ba 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2118,6 +2118,8 @@ static u32 hashrnd __read_mostly;
u16 skb_tx_hash(const struct net_device *dev, const struct sk_buff *skb)
{
u32 hash;
+ u16 qoffset = 0;
+ u16 qcount = dev->real_num_tx_queues;
if (skb_rx_queue_recorded(skb)) {
hash = skb_get_rx_queue(skb);
@@ -2126,13 +2128,19 @@ u16 skb_tx_hash(const struct net_device *dev, const struct sk_buff *skb)
return hash;
}
+ if (dev->num_tc) {
+ u8 tc = netdev_get_prio_tc_map(dev, skb->priority);
+ qoffset = dev->tc_to_txq[tc].offset;
+ qcount = dev->tc_to_txq[tc].count;
+ }
+
if (skb->sk && skb->sk->sk_hash)
hash = skb->sk->sk_hash;
else
hash = (__force u16) skb->protocol ^ skb->rxhash;
hash = jhash_1word(hash, hashrnd);
- return (u16) (((u64) hash * dev->real_num_tx_queues) >> 32);
+ return (u16) ((((u64) hash * qcount)) >> 32) + qoffset;
}
EXPORT_SYMBOL(skb_tx_hash);
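The updated return statement scales the 32-bit hash into the class's queue range without a divide. A standalone sketch of the arithmetic (hypothetical helper name):

```c
#include <stdint.h>

/* Map a 32-bit hash onto [qoffset, qoffset + qcount).  Multiplying
 * the hash by qcount and shifting right by 32 is a cheap,
 * division-free way to scale the hash into the class's queue range,
 * matching the patched skb_tx_hash() return statement. */
static uint16_t pick_txq(uint32_t hash, uint16_t qcount, uint16_t qoffset)
{
	return (uint16_t)(((uint64_t)hash * qcount) >> 32) + qoffset;
}
```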
^ permalink raw reply related [flat|nested] 9+ messages in thread
* [net-next-2.6 PATCH 2/4] net_sched: Allow multiple mq qdisc to be used as non-root
2010-12-17 15:34 [net-next-2.6 PATCH 1/4] net: implement mechanism for HW based QOS John Fastabend
@ 2010-12-17 15:34 ` John Fastabend
2010-12-20 23:12 ` Ben Hutchings
2010-12-17 15:34 ` [net-next-2.6 PATCH 3/4] net_sched: implement a root container qdisc sch_mclass John Fastabend
` (2 subsequent siblings)
3 siblings, 1 reply; 9+ messages in thread
From: John Fastabend @ 2010-12-17 15:34 UTC (permalink / raw)
To: davem; +Cc: netdev, hadi, shemminger, tgraf, eric.dumazet, nhorman
This patch modifies the mq qdisc to allow multiple mq qdiscs
to be used, which allows tx queues to be grouped for management.
A root container qdisc can then create multiple traffic classes
and use the mq qdisc as the default queueing discipline for each
class. It is expected that other queueing disciplines can then be
grafted onto the container as needed.
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
---
net/sched/sch_mq.c | 70 ++++++++++++++++++++++++++++++++++++++++------------
1 files changed, 54 insertions(+), 16 deletions(-)
diff --git a/net/sched/sch_mq.c b/net/sched/sch_mq.c
index ecc302f..35ed26d 100644
--- a/net/sched/sch_mq.c
+++ b/net/sched/sch_mq.c
@@ -19,17 +19,39 @@
struct mq_sched {
struct Qdisc **qdiscs;
+ u8 num_tc;
};
+static void mq_queues(struct net_device *dev, struct Qdisc *sch,
+ unsigned int *count, unsigned int *offset)
+{
+ struct mq_sched *priv = qdisc_priv(sch);
+ if (priv->num_tc) {
+ int queue = TC_H_MIN(sch->parent) - 1;
+ if (count)
+ *count = dev->tc_to_txq[queue].count;
+ if (offset)
+ *offset = dev->tc_to_txq[queue].offset;
+ } else {
+ if (count)
+ *count = dev->num_tx_queues;
+ if (offset)
+ *offset = 0;
+ }
+}
+
static void mq_destroy(struct Qdisc *sch)
{
struct net_device *dev = qdisc_dev(sch);
struct mq_sched *priv = qdisc_priv(sch);
- unsigned int ntx;
+ unsigned int ntx, count;
if (!priv->qdiscs)
return;
- for (ntx = 0; ntx < dev->num_tx_queues && priv->qdiscs[ntx]; ntx++)
+
+ mq_queues(dev, sch, &count, NULL);
+
+ for (ntx = 0; ntx < count && priv->qdiscs[ntx]; ntx++)
qdisc_destroy(priv->qdiscs[ntx]);
kfree(priv->qdiscs);
}
@@ -41,21 +63,26 @@ static int mq_init(struct Qdisc *sch, struct nlattr *opt)
struct netdev_queue *dev_queue;
struct Qdisc *qdisc;
unsigned int ntx;
+ unsigned int count, offset;
- if (sch->parent != TC_H_ROOT)
+ if (sch->parent != TC_H_ROOT && !dev->num_tc)
return -EOPNOTSUPP;
if (!netif_is_multiqueue(dev))
return -EOPNOTSUPP;
+ /* Record num tc's in priv so we can tear down cleanly */
+ priv->num_tc = dev->num_tc;
+ mq_queues(dev, sch, &count, &offset);
+
/* pre-allocate qdiscs, attachment can't fail */
- priv->qdiscs = kcalloc(dev->num_tx_queues, sizeof(priv->qdiscs[0]),
+ priv->qdiscs = kcalloc(count, sizeof(priv->qdiscs[0]),
GFP_KERNEL);
if (priv->qdiscs == NULL)
return -ENOMEM;
- for (ntx = 0; ntx < dev->num_tx_queues; ntx++) {
- dev_queue = netdev_get_tx_queue(dev, ntx);
+ for (ntx = 0; ntx < count; ntx++) {
+ dev_queue = netdev_get_tx_queue(dev, ntx + offset);
qdisc = qdisc_create_dflt(dev_queue, &pfifo_fast_ops,
TC_H_MAKE(TC_H_MAJ(sch->handle),
TC_H_MIN(ntx + 1)));
@@ -65,7 +92,8 @@ static int mq_init(struct Qdisc *sch, struct nlattr *opt)
priv->qdiscs[ntx] = qdisc;
}
- sch->flags |= TCQ_F_MQROOT;
+ if (!priv->num_tc)
+ sch->flags |= TCQ_F_MQROOT;
return 0;
err:
@@ -78,9 +106,11 @@ static void mq_attach(struct Qdisc *sch)
struct net_device *dev = qdisc_dev(sch);
struct mq_sched *priv = qdisc_priv(sch);
struct Qdisc *qdisc;
- unsigned int ntx;
+ unsigned int ntx, count;
- for (ntx = 0; ntx < dev->num_tx_queues; ntx++) {
+ mq_queues(dev, sch, &count, NULL);
+
+ for (ntx = 0; ntx < count; ntx++) {
qdisc = priv->qdiscs[ntx];
qdisc = dev_graft_qdisc(qdisc->dev_queue, qdisc);
if (qdisc)
@@ -94,14 +124,17 @@ static int mq_dump(struct Qdisc *sch, struct sk_buff *skb)
{
struct net_device *dev = qdisc_dev(sch);
struct Qdisc *qdisc;
- unsigned int ntx;
+ unsigned int ntx, count, offset;
+
+ mq_queues(dev, sch, &count, &offset);
sch->q.qlen = 0;
memset(&sch->bstats, 0, sizeof(sch->bstats));
memset(&sch->qstats, 0, sizeof(sch->qstats));
- for (ntx = 0; ntx < dev->num_tx_queues; ntx++) {
- qdisc = netdev_get_tx_queue(dev, ntx)->qdisc_sleeping;
+ for (ntx = 0; ntx < count; ntx++) {
+ int txq = ntx + offset;
+ qdisc = netdev_get_tx_queue(dev, txq)->qdisc_sleeping;
spin_lock_bh(qdisc_lock(qdisc));
sch->q.qlen += qdisc->q.qlen;
sch->bstats.bytes += qdisc->bstats.bytes;
@@ -120,10 +153,13 @@ static struct netdev_queue *mq_queue_get(struct Qdisc *sch, unsigned long cl)
{
struct net_device *dev = qdisc_dev(sch);
unsigned long ntx = cl - 1;
+ unsigned int count, offset;
+
+ mq_queues(dev, sch, &count, &offset);
- if (ntx >= dev->num_tx_queues)
+ if (ntx >= count)
return NULL;
- return netdev_get_tx_queue(dev, ntx);
+ return netdev_get_tx_queue(dev, offset + ntx);
}
static struct netdev_queue *mq_select_queue(struct Qdisc *sch,
@@ -203,13 +239,15 @@ static int mq_dump_class_stats(struct Qdisc *sch, unsigned long cl,
static void mq_walk(struct Qdisc *sch, struct qdisc_walker *arg)
{
struct net_device *dev = qdisc_dev(sch);
- unsigned int ntx;
+ unsigned int ntx, count;
+
+ mq_queues(dev, sch, &count, NULL);
if (arg->stop)
return;
arg->count = arg->skip;
- for (ntx = arg->skip; ntx < dev->num_tx_queues; ntx++) {
+ for (ntx = arg->skip; ntx < count; ntx++) {
if (arg->fn(sch, ntx + 1, arg) < 0) {
arg->stop = 1;
break;
* [net-next-2.6 PATCH 3/4] net_sched: implement a root container qdisc sch_mclass
2010-12-17 15:34 [net-next-2.6 PATCH 1/4] net: implement mechanism for HW based QOS John Fastabend
2010-12-17 15:34 ` [net-next-2.6 PATCH 2/4] net_sched: Allow multiple mq qdisc to be used as non-root John Fastabend
@ 2010-12-17 15:34 ` John Fastabend
2010-12-17 15:34 ` [net-next-2.6 PATCH 4/4] net_sched: add MQSAFE flag to qdisc to identify mq like qdiscs John Fastabend
2010-12-17 16:54 ` [net-next-2.6 PATCH 1/4] net: implement mechanism for HW based QOS John Fastabend
3 siblings, 0 replies; 9+ messages in thread
From: John Fastabend @ 2010-12-17 15:34 UTC (permalink / raw)
To: davem; +Cc: netdev, hadi, shemminger, tgraf, eric.dumazet, nhorman
This implements an mclass 'multi-class' queueing discipline that
by default creates multiple mq qdiscs, one for each traffic class.
Each mq qdisc then owns a range of queues per the netdev_tc_txq
mappings. Using the mclass qdisc, the number of tcs currently in
use, along with the range of queues allotted to each class, can
be configured. By default skbs are mapped to traffic classes
using the skb priority; this mapping is configurable.
Configurable parameters,
struct tc_mclass_qopt {
__u8 num_tc;
__u8 prio_tc_map[16];
__u8 hw;
__u16 count[16];
__u16 offset[16];
};
Here the count/offset pairing gives the queue alignment and the
prio_tc_map gives the mapping from skb->priority to tc. The
hw bit determines whether the hardware should configure the count
and offset values. If the hardware bit is set, the operation
fails if the hardware does not implement the ndo_setup_tc
operation; this avoids undetermined states where the hardware
may or may not control the queue mapping. Minimal bounds
checking is also done on the count/offset values to verify a
queue range does not exceed num_tx_queues and that queue ranges
do not overlap. Otherwise it is left to user policy or hardware
configuration to create useful mappings.
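A userspace sketch of that bounds check (mirroring mclass_parse_opt() in the patch below; like the kernel code it assumes classes are supplied in ascending offset order):

```c
#include <stdint.h>

/* Reject tc count/offset maps where a class's queue range runs past
 * num_txq or overlaps a later class's range. */
static int validate_tc_map(const uint16_t *count, const uint16_t *offset,
			   int num_tc, int num_txq)
{
	int i, j;

	for (i = 0; i < num_tc; i++) {
		int last = offset[i] + count[i];

		if (last > num_txq)
			return -1;	/* range exceeds num_tx_queues */
		for (j = i + 1; j < num_tc; j++)
			if (last > offset[j])
				return -1;	/* ranges overlap */
	}
	return 0;
}
```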
It is expected that hardware QOS schemes can be implemented by
creating appropriate mappings of queues in ndo_setup_tc(). The
scheme can be expanded as needed, with additional qdiscs grafted
onto the root qdisc to provide per-tc queueing disciplines,
allowing software and hardware queueing disciplines to be used
together.
One expected use case is that drivers will use ndo_setup_tc to
map queue ranges onto 802.1Q traffic classes. This provides a
generic mechanism to map network traffic onto these traffic
classes and removes the need for lower layer drivers to know
specifics about traffic types.
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
---
include/linux/netdevice.h | 3
include/linux/pkt_sched.h | 9 +
include/net/sch_generic.h | 1
net/sched/Makefile | 2
net/sched/sch_api.c | 1
net/sched/sch_generic.c | 8 +
net/sched/sch_mclass.c | 375 +++++++++++++++++++++++++++++++++++++++++++++
7 files changed, 397 insertions(+), 2 deletions(-)
create mode 100644 net/sched/sch_mclass.c
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 9694138..169a23f 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -762,6 +762,8 @@ struct netdev_tc_txq {
* int (*ndo_set_vf_port)(struct net_device *dev, int vf,
* struct nlattr *port[]);
* int (*ndo_get_vf_port)(struct net_device *dev, int vf, struct sk_buff *skb);
+ *
+ * int (*ndo_setup_tc)(struct net_device *dev, int tc);
*/
#define HAVE_NET_DEVICE_OPS
struct net_device_ops {
@@ -820,6 +822,7 @@ struct net_device_ops {
struct nlattr *port[]);
int (*ndo_get_vf_port)(struct net_device *dev,
int vf, struct sk_buff *skb);
+ int (*ndo_setup_tc)(struct net_device *dev, u8 tc);
#if defined(CONFIG_FCOE) || defined(CONFIG_FCOE_MODULE)
int (*ndo_fcoe_enable)(struct net_device *dev);
int (*ndo_fcoe_disable)(struct net_device *dev);
diff --git a/include/linux/pkt_sched.h b/include/linux/pkt_sched.h
index 2cfa4bc..0134ed4 100644
--- a/include/linux/pkt_sched.h
+++ b/include/linux/pkt_sched.h
@@ -481,4 +481,13 @@ struct tc_drr_stats {
__u32 deficit;
};
+/* MCLASS */
+struct tc_mclass_qopt {
+ __u8 num_tc;
+ __u8 prio_tc_map[16];
+ __u8 hw;
+ __u16 count[16];
+ __u16 offset[16];
+};
+
#endif
diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index ea1f8a8..2bbcd09 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -276,6 +276,7 @@ extern struct Qdisc noop_qdisc;
extern struct Qdisc_ops noop_qdisc_ops;
extern struct Qdisc_ops pfifo_fast_ops;
extern struct Qdisc_ops mq_qdisc_ops;
+extern struct Qdisc_ops mclass_qdisc_ops;
struct Qdisc_class_common {
u32 classid;
diff --git a/net/sched/Makefile b/net/sched/Makefile
index 960f5db..76dcf5b 100644
--- a/net/sched/Makefile
+++ b/net/sched/Makefile
@@ -2,7 +2,7 @@
# Makefile for the Linux Traffic Control Unit.
#
-obj-y := sch_generic.o sch_mq.o
+obj-y := sch_generic.o sch_mq.o sch_mclass.o
obj-$(CONFIG_NET_SCHED) += sch_api.o sch_blackhole.o
obj-$(CONFIG_NET_CLS) += cls_api.o
diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
index b22ca2d..24f40e0 100644
--- a/net/sched/sch_api.c
+++ b/net/sched/sch_api.c
@@ -1770,6 +1770,7 @@ static int __init pktsched_init(void)
register_qdisc(&bfifo_qdisc_ops);
register_qdisc(&pfifo_head_drop_qdisc_ops);
register_qdisc(&mq_qdisc_ops);
+ register_qdisc(&mclass_qdisc_ops);
rtnl_register(PF_UNSPEC, RTM_NEWQDISC, tc_modify_qdisc, NULL);
rtnl_register(PF_UNSPEC, RTM_DELQDISC, tc_get_qdisc, NULL);
diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index 0918834..73ed9b7 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -709,7 +709,13 @@ static void attach_default_qdiscs(struct net_device *dev)
dev->qdisc = txq->qdisc_sleeping;
atomic_inc(&dev->qdisc->refcnt);
} else {
- qdisc = qdisc_create_dflt(txq, &mq_qdisc_ops, TC_H_ROOT);
+ if (dev->num_tc)
+ qdisc = qdisc_create_dflt(txq, &mclass_qdisc_ops,
+ TC_H_ROOT);
+ else
+ qdisc = qdisc_create_dflt(txq, &mq_qdisc_ops,
+ TC_H_ROOT);
+
if (qdisc) {
qdisc->ops->attach(qdisc);
dev->qdisc = qdisc;
diff --git a/net/sched/sch_mclass.c b/net/sched/sch_mclass.c
new file mode 100644
index 0000000..551b660
--- /dev/null
+++ b/net/sched/sch_mclass.c
@@ -0,0 +1,375 @@
+/*
+ * net/sched/sch_mclass.c
+ *
+ * Copyright (c) 2010 John Fastabend <john.r.fastabend@intel.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * version 2 as published by the Free Software Foundation.
+ */
+
+#include <linux/types.h>
+#include <linux/slab.h>
+#include <linux/kernel.h>
+#include <linux/string.h>
+#include <linux/errno.h>
+#include <linux/skbuff.h>
+#include <net/netlink.h>
+#include <net/pkt_sched.h>
+#include <net/sch_generic.h>
+
+struct mclass_sched {
+ struct Qdisc **qdiscs;
+ int hw_owned;
+};
+
+static void mclass_destroy(struct Qdisc *sch)
+{
+ struct net_device *dev = qdisc_dev(sch);
+ struct mclass_sched *priv = qdisc_priv(sch);
+ unsigned int ntc;
+
+ if (!priv->qdiscs)
+ return;
+
+ for (ntc = 0; ntc < dev->num_tc && priv->qdiscs[ntc]; ntc++)
+ qdisc_destroy(priv->qdiscs[ntc]);
+
+ if (priv->hw_owned && dev->netdev_ops->ndo_setup_tc)
+ dev->netdev_ops->ndo_setup_tc(dev, 0);
+ else
+ netdev_set_num_tc(dev, 0);
+
+ kfree(priv->qdiscs);
+}
+
+static int mclass_parse_opt(struct net_device *dev, struct tc_mclass_qopt *qopt)
+{
+ int i, j;
+
+ /* Verify TC offset and count are sane */
+ for (i = 0; i < qopt->num_tc; i++) {
+ int last = qopt->offset[i] + qopt->count[i];
+ if (last > dev->num_tx_queues)
+ return -EINVAL;
+ for (j = i + 1; j < qopt->num_tc; j++) {
+ if (last > qopt->offset[j])
+ return -EINVAL;
+ }
+ }
+
+ return 0;
+}
+
+static int mclass_init(struct Qdisc *sch, struct nlattr *opt)
+{
+ struct net_device *dev = qdisc_dev(sch);
+ struct mclass_sched *priv = qdisc_priv(sch);
+ struct netdev_queue *dev_queue;
+ struct Qdisc *qdisc;
+ int i, err = -EOPNOTSUPP;
+ struct tc_mclass_qopt *qopt = NULL;
+
+ /* Unwind attributes on failure */
+ u8 unwnd_tc = dev->num_tc;
+ u8 unwnd_map[16];
+ struct netdev_tc_txq unwnd_txq[16];
+
+ if (sch->parent != TC_H_ROOT)
+ return -EOPNOTSUPP;
+
+ if (!netif_is_multiqueue(dev))
+ return -EOPNOTSUPP;
+
+ if (nla_len(opt) < sizeof(*qopt))
+ return -EINVAL;
+ qopt = nla_data(opt);
+
+ memcpy(unwnd_map, dev->prio_tc_map, sizeof(unwnd_map));
+ memcpy(unwnd_txq, dev->tc_to_txq, sizeof(unwnd_txq));
+
+ /* If the mclass options indicate that hardware should own
+ * the queue mapping then run ndo_setup_tc; if this cannot
+ * be done, fail immediately.
+ */
+ if (qopt->hw && dev->netdev_ops->ndo_setup_tc) {
+ priv->hw_owned = 1;
+ if (dev->netdev_ops->ndo_setup_tc(dev, qopt->num_tc))
+ return -EINVAL;
+ } else if (!qopt->hw) {
+ if (mclass_parse_opt(dev, qopt))
+ return -EINVAL;
+
+ if (netdev_set_num_tc(dev, qopt->num_tc))
+ return -ENOMEM;
+
+ for (i = 0; i < qopt->num_tc; i++)
+ netdev_set_tc_queue(dev, i,
+ qopt->count[i], qopt->offset[i]);
+ } else {
+ return -EINVAL;
+ }
+
+ /* Always use supplied priority mappings */
+ for (i = 0; i < 16; i++) {
+ if (netdev_set_prio_tc_map(dev, i, qopt->prio_tc_map[i])) {
+ err = -EINVAL;
+ goto tc_err;
+ }
+ }
+
+ /* pre-allocate qdisc, attachment can't fail */
+ priv->qdiscs = kcalloc(qopt->num_tc,
+ sizeof(priv->qdiscs[0]), GFP_KERNEL);
+ if (priv->qdiscs == NULL) {
+ err = -ENOMEM;
+ goto tc_err;
+ }
+
+ for (i = 0; i < dev->num_tc; i++) {
+ dev_queue = netdev_get_tx_queue(dev, dev->tc_to_txq[i].offset);
+ qdisc = qdisc_create_dflt(dev_queue, &mq_qdisc_ops,
+ TC_H_MAKE(TC_H_MAJ(sch->handle),
+ TC_H_MIN(i + 1)));
+ if (qdisc == NULL) {
+ err = -ENOMEM;
+ goto err;
+ }
+ qdisc->flags |= TCQ_F_CAN_BYPASS;
+ priv->qdiscs[i] = qdisc;
+ }
+
+ sch->flags |= TCQ_F_MQROOT;
+ return 0;
+
+err:
+ mclass_destroy(sch);
+tc_err:
+ if (priv->hw_owned)
+ dev->netdev_ops->ndo_setup_tc(dev, unwnd_tc);
+ else
+ netdev_set_num_tc(dev, unwnd_tc);
+
+ memcpy(dev->prio_tc_map, unwnd_map, sizeof(unwnd_map));
+ memcpy(dev->tc_to_txq, unwnd_txq, sizeof(unwnd_txq));
+
+ return err;
+}
+
+static void mclass_attach(struct Qdisc *sch)
+{
+ struct net_device *dev = qdisc_dev(sch);
+ struct mclass_sched *priv = qdisc_priv(sch);
+ struct Qdisc *qdisc;
+ unsigned int ntc;
+
+ /* Attach underlying qdisc */
+ for (ntc = 0; ntc < dev->num_tc; ntc++) {
+ qdisc = priv->qdiscs[ntc];
+ if (qdisc->ops && qdisc->ops->attach)
+ qdisc->ops->attach(qdisc);
+ }
+}
+
+static int mclass_graft(struct Qdisc *sch, unsigned long cl, struct Qdisc *new,
+ struct Qdisc **old)
+{
+ struct net_device *dev = qdisc_dev(sch);
+ struct mclass_sched *priv = qdisc_priv(sch);
+ unsigned long ntc = cl - 1;
+
+ if (ntc >= dev->num_tc)
+ return -EINVAL;
+
+ if (dev->flags & IFF_UP)
+ dev_deactivate(dev);
+
+ *old = priv->qdiscs[ntc];
+ if (new == NULL)
+ new = &noop_qdisc;
+ priv->qdiscs[ntc] = new;
+ qdisc_reset(*old);
+
+ if (dev->flags & IFF_UP)
+ dev_activate(dev);
+
+ return 0;
+}
+
+static int mclass_dump(struct Qdisc *sch, struct sk_buff *skb)
+{
+ struct net_device *dev = qdisc_dev(sch);
+ struct mclass_sched *priv = qdisc_priv(sch);
+ unsigned char *b = skb_tail_pointer(skb);
+ struct tc_mclass_qopt opt;
+ struct Qdisc *qdisc;
+ unsigned int i;
+
+ sch->q.qlen = 0;
+ memset(&sch->bstats, 0, sizeof(sch->bstats));
+ memset(&sch->qstats, 0, sizeof(sch->qstats));
+
+ for (i = 0; i < dev->num_tx_queues; i++) {
+ qdisc = netdev_get_tx_queue(dev, i)->qdisc;
+ spin_lock_bh(qdisc_lock(qdisc));
+ sch->q.qlen += qdisc->q.qlen;
+ sch->bstats.bytes += qdisc->bstats.bytes;
+ sch->bstats.packets += qdisc->bstats.packets;
+ sch->qstats.qlen += qdisc->qstats.qlen;
+ sch->qstats.backlog += qdisc->qstats.backlog;
+ sch->qstats.drops += qdisc->qstats.drops;
+ sch->qstats.requeues += qdisc->qstats.requeues;
+ sch->qstats.overlimits += qdisc->qstats.overlimits;
+ spin_unlock_bh(qdisc_lock(qdisc));
+ }
+
+ opt.num_tc = dev->num_tc;
+ memcpy(opt.prio_tc_map, dev->prio_tc_map, 16);
+ opt.hw = priv->hw_owned;
+
+ for (i = 0; i < dev->num_tc; i++) {
+ opt.count[i] = dev->tc_to_txq[i].count;
+ opt.offset[i] = dev->tc_to_txq[i].offset;
+ }
+
+ NLA_PUT(skb, TCA_OPTIONS, sizeof(opt), &opt);
+
+ return skb->len;
+nla_put_failure:
+ nlmsg_trim(skb, b);
+ return -1;
+}
+
+static struct Qdisc *mclass_leaf(struct Qdisc *sch, unsigned long cl)
+{
+ struct net_device *dev = qdisc_dev(sch);
+ struct mclass_sched *priv = qdisc_priv(sch);
+ unsigned long ntc = cl - 1;
+
+ if (ntc >= dev->num_tc)
+ return NULL;
+ return priv->qdiscs[ntc];
+}
+
+static unsigned long mclass_get(struct Qdisc *sch, u32 classid)
+{
+ struct net_device *dev = qdisc_dev(sch);
+ unsigned int ntc = TC_H_MIN(classid);
+
+ if (ntc >= dev->num_tc)
+ return 0;
+ return ntc;
+}
+
+static void mclass_put(struct Qdisc *sch, unsigned long cl)
+{
+}
+
+static int mclass_dump_class(struct Qdisc *sch, unsigned long cl,
+ struct sk_buff *skb, struct tcmsg *tcm)
+{
+ struct Qdisc *class;
+ struct net_device *dev = qdisc_dev(sch);
+ struct mclass_sched *priv = qdisc_priv(sch);
+ unsigned long ntc = cl - 1;
+
+ if (ntc >= dev->num_tc)
+ return -EINVAL;
+
+ class = priv->qdiscs[ntc];
+
+ tcm->tcm_parent = TC_H_ROOT;
+ tcm->tcm_handle |= TC_H_MIN(cl);
+ tcm->tcm_info = class->handle;
+ return 0;
+}
+
+static int mclass_dump_class_stats(struct Qdisc *sch, unsigned long cl,
+ struct gnet_dump *d)
+{
+ struct Qdisc *class, *qdisc;
+ struct net_device *dev = qdisc_dev(sch);
+ struct mclass_sched *priv = qdisc_priv(sch);
+ unsigned long ntc = cl - 1;
+ unsigned int i;
+ u16 count, offset;
+
+ if (ntc >= dev->num_tc)
+ return -EINVAL;
+
+ class = priv->qdiscs[ntc];
+ count = dev->tc_to_txq[ntc].count;
+ offset = dev->tc_to_txq[ntc].offset;
+
+ memset(&class->bstats, 0, sizeof(class->bstats));
+ memset(&class->qstats, 0, sizeof(class->qstats));
+
+ /* Drop the lock here; it will be reclaimed before touching statistics.
+ * This is required because the qdisc_root_sleeping_lock we hold
+ * here is the lock on dev_queue->qdisc_sleeping, also acquired
+ * below.
+ */
+ spin_unlock_bh(d->lock);
+
+ for (i = offset; i < offset + count; i++) {
+ qdisc = netdev_get_tx_queue(dev, i)->qdisc;
+ spin_lock_bh(qdisc_lock(qdisc));
+ class->q.qlen += qdisc->q.qlen;
+ class->bstats.bytes += qdisc->bstats.bytes;
+ class->bstats.packets += qdisc->bstats.packets;
+ class->qstats.qlen += qdisc->qstats.qlen;
+ class->qstats.backlog += qdisc->qstats.backlog;
+ class->qstats.drops += qdisc->qstats.drops;
+ class->qstats.requeues += qdisc->qstats.requeues;
+ class->qstats.overlimits += qdisc->qstats.overlimits;
+ spin_unlock_bh(qdisc_lock(qdisc));
+ }
+
+ /* Reclaim root sleeping lock before completing stats */
+ spin_lock_bh(d->lock);
+
+ class->qstats.qlen = class->q.qlen;
+ if (gnet_stats_copy_basic(d, &class->bstats) < 0 ||
+ gnet_stats_copy_queue(d, &class->qstats) < 0)
+ return -1;
+ return 0;
+}
+
+static void mclass_walk(struct Qdisc *sch, struct qdisc_walker *arg)
+{
+ struct net_device *dev = qdisc_dev(sch);
+ unsigned long ntc;
+
+ if (arg->stop)
+ return;
+
+ arg->count = arg->skip;
+ for (ntc = arg->skip; ntc < dev->num_tc; ntc++) {
+ if (arg->fn(sch, ntc + 1, arg) < 0) {
+ arg->stop = 1;
+ break;
+ }
+ arg->count++;
+ }
+}
+
+static const struct Qdisc_class_ops mclass_class_ops = {
+ .graft = mclass_graft,
+ .leaf = mclass_leaf,
+ .get = mclass_get,
+ .put = mclass_put,
+ .walk = mclass_walk,
+ .dump = mclass_dump_class,
+ .dump_stats = mclass_dump_class_stats,
+};
+
+struct Qdisc_ops mclass_qdisc_ops __read_mostly = {
+ .cl_ops = &mclass_class_ops,
+ .id = "mclass",
+ .priv_size = sizeof(struct mclass_sched),
+ .init = mclass_init,
+ .destroy = mclass_destroy,
+ .attach = mclass_attach,
+ .dump = mclass_dump,
+ .owner = THIS_MODULE,
+};
* [net-next-2.6 PATCH 4/4] net_sched: add MQSAFE flag to qdisc to identify mq like qdiscs
2010-12-17 15:34 [net-next-2.6 PATCH 1/4] net: implement mechanism for HW based QOS John Fastabend
2010-12-17 15:34 ` [net-next-2.6 PATCH 2/4] net_sched: Allow multiple mq qdisc to be used as non-root John Fastabend
2010-12-17 15:34 ` [net-next-2.6 PATCH 3/4] net_sched: implement a root container qdisc sch_mclass John Fastabend
@ 2010-12-17 15:34 ` John Fastabend
2010-12-20 23:22 ` Ben Hutchings
2010-12-17 16:54 ` [net-next-2.6 PATCH 1/4] net: implement mechanism for HW based QOS John Fastabend
3 siblings, 1 reply; 9+ messages in thread
From: John Fastabend @ 2010-12-17 15:34 UTC (permalink / raw)
To: davem; +Cc: netdev, hadi, shemminger, tgraf, eric.dumazet, nhorman
Add a MQSAFE flag to mark the qdisc schedulers that can be safely
managed by sch_mclass. Without this flag, schedulers that are
not aware of multiple tx queues could be grafted under the
mclass qdisc; allowing incorrect qdiscs to be grafted results in
an invalid mapping from qdiscs to netdevice queues.
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
---
include/net/sch_generic.h | 1 +
net/sched/sch_generic.c | 2 +-
net/sched/sch_mclass.c | 5 +++--
net/sched/sch_mq.c | 3 +++
4 files changed, 8 insertions(+), 3 deletions(-)
diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 2bbcd09..791df75 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -50,6 +50,7 @@ struct Qdisc {
#define TCQ_F_INGRESS 4
#define TCQ_F_CAN_BYPASS 8
#define TCQ_F_MQROOT 16
+#define TCQ_F_MQSAFE 32
#define TCQ_F_WARN_NONWC (1 << 16)
int padded;
struct Qdisc_ops *ops;
diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index 73ed9b7..1bcc0ed 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -376,7 +376,7 @@ static struct netdev_queue noop_netdev_queue = {
struct Qdisc noop_qdisc = {
.enqueue = noop_enqueue,
.dequeue = noop_dequeue,
- .flags = TCQ_F_BUILTIN,
+ .flags = TCQ_F_BUILTIN | TCQ_F_MQSAFE,
.ops = &noop_qdisc_ops,
.list = LIST_HEAD_INIT(noop_qdisc.list),
.q.lock = __SPIN_LOCK_UNLOCKED(noop_qdisc.q.lock),
diff --git a/net/sched/sch_mclass.c b/net/sched/sch_mclass.c
index 551b660..444492a 100644
--- a/net/sched/sch_mclass.c
+++ b/net/sched/sch_mclass.c
@@ -178,15 +178,16 @@ static int mclass_graft(struct Qdisc *sch, unsigned long cl, struct Qdisc *new,
struct mclass_sched *priv = qdisc_priv(sch);
unsigned long ntc = cl - 1;
- if (ntc >= dev->num_tc)
+ if (ntc >= dev->num_tc || (new && !(new->flags & TCQ_F_MQSAFE)))
return -EINVAL;
if (dev->flags & IFF_UP)
dev_deactivate(dev);
- *old = priv->qdiscs[ntc];
if (new == NULL)
new = &noop_qdisc;
+
+ *old = priv->qdiscs[ntc];
priv->qdiscs[ntc] = new;
qdisc_reset(*old);
diff --git a/net/sched/sch_mq.c b/net/sched/sch_mq.c
index 35ed26d..493eaab 100644
--- a/net/sched/sch_mq.c
+++ b/net/sched/sch_mq.c
@@ -94,6 +94,9 @@ static int mq_init(struct Qdisc *sch, struct nlattr *opt)
if (!priv->num_tc)
sch->flags |= TCQ_F_MQROOT;
+ else
+ sch->flags |= TCQ_F_MQSAFE;
+
return 0;
err:
* Re: [net-next-2.6 PATCH 1/4] net: implement mechanism for HW based QOS
2010-12-17 15:34 [net-next-2.6 PATCH 1/4] net: implement mechanism for HW based QOS John Fastabend
` (2 preceding siblings ...)
2010-12-17 15:34 ` [net-next-2.6 PATCH 4/4] net_sched: add MQSAFE flag to qdisc to identify mq like qdiscs John Fastabend
@ 2010-12-17 16:54 ` John Fastabend
3 siblings, 0 replies; 9+ messages in thread
From: John Fastabend @ 2010-12-17 16:54 UTC (permalink / raw)
To: davem@davemloft.net
Cc: netdev@vger.kernel.org, hadi@cyberus.ca, shemminger@vyatta.com,
tgraf@infradead.org, eric.dumazet@gmail.com,
nhorman@tuxdriver.com
On 12/17/2010 7:34 AM, John Fastabend wrote:
> This patch provides a mechanism for lower layer devices to
> steer traffic using skb->priority to tx queues. This allows
> for hardware based QOS schemes to use the default qdisc without
> incurring the penalties related to global state and the qdisc
> lock. While reliably receiving skbs on the correct tx ring
> to avoid head of line blocking resulting from shuffling in
> the LLD. Finally, all the goodness from txq caching and xps/rps
> can still be leveraged.
>
> Many drivers and hardware exist with the ability to implement
> QOS schemes in the hardware but currently these drivers tend
> to rely on firmware to reroute specific traffic, a driver
> specific select_queue or the queue_mapping action in the
> qdisc.
>
> By using select_queue for this drivers need to be updated for
> each and every traffic type and we lose the goodness of much
> of the upstream work. Firmware solutions are inherently
> inflexible. And finally if admins are expected to build a
> qdisc and filter rules to steer traffic this requires knowledge
> of how the hardware is currently configured. The number of tx
> queues and the queue offsets may change depending on resources.
> Also this approach incurs all the overhead of a qdisc with filters.
>
> With the mechanism in this patch, users can set skb priority via
> the expected methods, e.g. setsockopt(), or the stack can set the priority
> directly. Then the skb will be steered to the correct tx queues
> aligned with hardware QOS traffic classes. In the normal case with
> a single traffic class and all queues in this class everything
> works as is until the LLD enables multiple tcs.
>
> To steer the skb we mask out the lower 4 bits of the priority
> and allow the hardware to configure up to 15 distinct classes
> of traffic. This is expected to be sufficient for most applications;
> in any case it is more than the 802.1Q spec designates and is
> equal to the number of prio bands currently implemented in
> the default qdisc.
>
> This, in conjunction with a userspace application such as
> lldpad, can be used to implement 802.1Q transmission selection
> algorithms, one of these being the extended transmission
> selection algorithm currently used for DCB.
>
> Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
> ---
>
This conflicts with Vladislav Zolotarov's patch
[PATCH net-next 1/9] Take the distribution range definition out of skb_tx_hash()
The conflict is easily resolved, but I'll post a v2 of my patch to make it clean.
--John.
* Re: [net-next-2.6 PATCH 2/4] net_sched: Allow multiple mq qdisc to be used as non-root
2010-12-17 15:34 ` [net-next-2.6 PATCH 2/4] net_sched: Allow multiple mq qdisc to be used as non-root John Fastabend
@ 2010-12-20 23:12 ` Ben Hutchings
2010-12-21 19:21 ` John Fastabend
0 siblings, 1 reply; 9+ messages in thread
From: Ben Hutchings @ 2010-12-20 23:12 UTC (permalink / raw)
To: John Fastabend
Cc: davem, netdev, hadi, shemminger, tgraf, eric.dumazet, nhorman
On Fri, 2010-12-17 at 07:34 -0800, John Fastabend wrote:
[...]
> diff --git a/net/sched/sch_mq.c b/net/sched/sch_mq.c
> index ecc302f..35ed26d 100644
> --- a/net/sched/sch_mq.c
> +++ b/net/sched/sch_mq.c
> @@ -19,17 +19,39 @@
>
> struct mq_sched {
> struct Qdisc **qdiscs;
> + u8 num_tc;
> };
>
> +static void mq_queues(struct net_device *dev, struct Qdisc *sch,
> + unsigned int *count, unsigned int *offset)
> +{
> + struct mq_sched *priv = qdisc_priv(sch);
> + if (priv->num_tc) {
> + int queue = TC_H_MIN(sch->parent) - 1;
> + if (count)
> + *count = dev->tc_to_txq[queue].count;
> + if (offset)
> + *offset = dev->tc_to_txq[queue].offset;
> + } else {
> + if (count)
> + *count = dev->num_tx_queues;
> + if (offset)
> + *offset = 0;
> + }
> +}
[...]
It looks like num_tc will be set even for the root qdisc if the device
is capable of QoS. Would mq_queues() behave correctly then, i.e. is the
queue range for priority 0 required to be [0, dev->num_tx_queues)?
Also it would be neater to return count and offset together as struct
netdev_tc_txq, rather than through optional out-parameters. Even better
would be to cache these in struct mq_sched, if that's possible.
Ben.
--
Ben Hutchings, Senior Software Engineer, Solarflare Communications
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.
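A minimal userspace sketch of the refactor Ben suggests: return count and offset together as one struct instead of through optional out-parameters. The types here are illustrative stand-ins mirroring the patch, not the exact kernel definitions.

```c
/* stand-ins for the kernel's netdev_tc_txq and the relevant netdev fields */
struct netdev_tc_txq { unsigned short count, offset; };

struct mq_dev {
    unsigned num_tx_queues;
    struct netdev_tc_txq tc_to_txq[16];
};

/* return the queue range as a single value; no NULL-checked out-params */
static struct netdev_tc_txq mq_queues(const struct mq_dev *dev,
                                      int num_tc, int queue)
{
    if (num_tc)
        return dev->tc_to_txq[queue];  /* per-class range */
    /* no traffic classes: one range covering every tx queue */
    return (struct netdev_tc_txq){ .count = dev->num_tx_queues, .offset = 0 };
}
```

Returning the pair by value also makes it trivial to cache the result in struct mq_sched at init time, as suggested.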
* Re: [net-next-2.6 PATCH 4/4] net_sched: add MQSAFE flag to qdisc to identify mq like qdiscs
2010-12-17 15:34 ` [net-next-2.6 PATCH 4/4] net_sched: add MQSAFE flag to qdisc to identify mq like qdiscs John Fastabend
@ 2010-12-20 23:22 ` Ben Hutchings
2010-12-21 19:21 ` John Fastabend
0 siblings, 1 reply; 9+ messages in thread
From: Ben Hutchings @ 2010-12-20 23:22 UTC (permalink / raw)
To: John Fastabend
Cc: davem, netdev, hadi, shemminger, tgraf, eric.dumazet, nhorman
On Fri, 2010-12-17 at 07:34 -0800, John Fastabend wrote:
> Add a MQSAFE flag to the qdisc schedulers that can be safely
> managed by sch_mclass. Without this flag, schedulers that are
> not aware of multiple tx queues can be grafted under the
> mclass qdisc. Allowing incorrect qdiscs to be grafted causes
> an invalid mapping from qdiscs to netdevice queues.
[...]
This should be defined before adding sch_mclass, or at the same time,
not after.
Ben.
--
Ben Hutchings, Senior Software Engineer, Solarflare Communications
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.
* Re: [net-next-2.6 PATCH 2/4] net_sched: Allow multiple mq qdisc to be used as non-root
2010-12-20 23:12 ` Ben Hutchings
@ 2010-12-21 19:21 ` John Fastabend
0 siblings, 0 replies; 9+ messages in thread
From: John Fastabend @ 2010-12-21 19:21 UTC (permalink / raw)
To: Ben Hutchings
Cc: davem@davemloft.net, netdev@vger.kernel.org, hadi@cyberus.ca,
shemminger@vyatta.com, tgraf@infradead.org,
eric.dumazet@gmail.com, nhorman@tuxdriver.com
On 12/20/2010 3:12 PM, Ben Hutchings wrote:
> On Fri, 2010-12-17 at 07:34 -0800, John Fastabend wrote:
> [...]
>> diff --git a/net/sched/sch_mq.c b/net/sched/sch_mq.c
>> index ecc302f..35ed26d 100644
>> --- a/net/sched/sch_mq.c
>> +++ b/net/sched/sch_mq.c
>> @@ -19,17 +19,39 @@
>>
>> struct mq_sched {
>> struct Qdisc **qdiscs;
>> + u8 num_tc;
>> };
>>
>> +static void mq_queues(struct net_device *dev, struct Qdisc *sch,
>> + unsigned int *count, unsigned int *offset)
>> +{
>> + struct mq_sched *priv = qdisc_priv(sch);
>> + if (priv->num_tc) {
>> + int queue = TC_H_MIN(sch->parent) - 1;
>> + if (count)
>> + *count = dev->tc_to_txq[queue].count;
>> + if (offset)
>> + *offset = dev->tc_to_txq[queue].offset;
>> + } else {
>> + if (count)
>> + *count = dev->num_tx_queues;
>> + if (offset)
>> + *offset = 0;
>> + }
>> +}
> [...]
>
> It looks like num_tc will be set even for the root qdisc if the device
> is capable of QoS. Would mq_queues() behave correctly then, i.e. is the
> queue range for priority 0 required to be [0, dev->num_tx_queues)?
If num_tc is set, the mclass qdisc is loaded by default instead of
the mq qdisc, and when mclass is destroyed it sets num_tc back to
zero. So I believe mq_queues() will behave correctly, i.e. if mq is
the root qdisc, [0, dev->num_tx_queues) will be used.
>
> Also it would be neater to return count and offset together as struct
> netdev_tc_txq, rather than through optional out-parameters. Even better
> would be to cache these in struct mq_sched, if that's possible.
>
Yes, it should be possible to embed this in mq_sched.
> Ben.
>
* Re: [net-next-2.6 PATCH 4/4] net_sched: add MQSAFE flag to qdisc to identify mq like qdiscs
2010-12-20 23:22 ` Ben Hutchings
@ 2010-12-21 19:21 ` John Fastabend
0 siblings, 0 replies; 9+ messages in thread
From: John Fastabend @ 2010-12-21 19:21 UTC (permalink / raw)
To: Ben Hutchings
Cc: davem@davemloft.net, netdev@vger.kernel.org, hadi@cyberus.ca,
shemminger@vyatta.com, tgraf@infradead.org,
eric.dumazet@gmail.com, nhorman@tuxdriver.com
On 12/20/2010 3:22 PM, Ben Hutchings wrote:
> On Fri, 2010-12-17 at 07:34 -0800, John Fastabend wrote:
>> Add a MQSAFE flag to the qdisc schedulers that can be safely
>> managed by sch_mclass. Without this flag, schedulers that are
>> not aware of multiple tx queues can be grafted under the
>> mclass qdisc. Allowing incorrect qdiscs to be grafted causes
>> an invalid mapping from qdiscs to netdevice queues.
> [...]
>
> This should be defined before adding sch_mclass, or at the same time,
> not after.
>
> Ben.
>
Right. I'll roll it into the initial mclass patch. Thanks.
Thread overview: 9+ messages
2010-12-17 15:34 [net-next-2.6 PATCH 1/4] net: implement mechanism for HW based QOS John Fastabend
2010-12-17 15:34 ` [net-next-2.6 PATCH 2/4] net_sched: Allow multiple mq qdisc to be used as non-root John Fastabend
2010-12-20 23:12 ` Ben Hutchings
2010-12-21 19:21 ` John Fastabend
2010-12-17 15:34 ` [net-next-2.6 PATCH 3/4] net_sched: implement a root container qdisc sch_mclass John Fastabend
2010-12-17 15:34 ` [net-next-2.6 PATCH 4/4] net_sched: add MQSAFE flag to qdisc to identify mq like qdiscs John Fastabend
2010-12-20 23:22 ` Ben Hutchings
2010-12-21 19:21 ` John Fastabend
2010-12-17 16:54 ` [net-next-2.6 PATCH 1/4] net: implement mechanism for HW based QOS John Fastabend