[PATCH] NET: Multiple queue hardware support

netdev.vger.kernel.org archive mirror

* [PATCH] NET: Multiple queue hardware support
@ 2007-06-18 18:42 PJ Waskiewicz
  0 siblings, 0 replies; 20+ messages in thread
From: PJ Waskiewicz @ 2007-06-18 18:42 UTC (permalink / raw)
  To: davem; +Cc: netdev, jeff, auke-jan.h.kok, hadi, kaber

Please consider these patches for 2.6.23 inclusion.

This patchset is an updated version of previous multiqueue network device
support patches.  The general approach of introducing a new API for multiqueue
network devices to register with the stack has remained.  The changes include
adding a round-robin qdisc, heavily based on sch_prio, which will allow
queueing to hardware with no OS-enforced queuing policy.  sch_prio still has
the multiqueue code in it, but has a Kconfig option to compile it out of the
qdisc.  This allows people with hardware containing scheduling policies to
use sch_rr (round-robin), and others without scheduling policies in hardware
to continue using sch_prio if they wish to have some notion of scheduling
priority.
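
To illustrate the difference between the two dequeue policies, here is a
minimal userspace sketch.  It is not the kernel code: struct sched_model,
prio_pick() and rr_pick() are hypothetical names, the per-band packet counts
stand in for the child qdiscs, and the stopped[] flags stand in for
netif_subqueue_stopped().

```c
#include <assert.h>
#include <stdbool.h>

#define NBANDS 4

/* Hypothetical model of the qdisc state: one packet counter per band,
 * a per-queue "stopped" flag, and the round-robin cursor.
 */
struct sched_model {
	int qlen[NBANDS];	/* packets waiting in each band */
	bool stopped[NBANDS];	/* models netif_subqueue_stopped() */
	int curband;		/* next band to try for round-robin */
};

/* Strict priority: always scan from band 0 and skip stopped subqueues.
 * Returns the band a packet was taken from, or -1 if nothing can be sent.
 */
static int prio_pick(struct sched_model *m)
{
	assert(m);
	for (int band = 0; band < NBANDS; band++) {
		if (!m->stopped[band] && m->qlen[band] > 0) {
			m->qlen[band]--;
			return band;
		}
	}
	return -1;
}

/* Round-robin: take one pass over the bands starting at curband.  The
 * cursor advances whether or not the band yielded a packet, so the next
 * call starts at the following band.
 */
static int rr_pick(struct sched_model *m)
{
	assert(m);
	for (int tries = 0; tries < NBANDS; tries++) {
		int band = m->curband;

		m->curband = (m->curband + 1) % NBANDS;
		if (!m->stopped[band] && m->qlen[band] > 0) {
			m->qlen[band]--;
			return band;
		}
	}
	return -1;
}
```

With two packets in every band and queue 1 stopped, prio_pick() drains the
bands in order 0, 0, 2, 2, 3, 3, while rr_pick() alternates 0, 2, 3, 0, 2, 3.
In both cases band 1 is skipped until its queue is started again.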

The patches being sent are split into Documentation, Qdisc changes, and
core stack changes.  The requested e1000 changes are still being resolved,
and will be sent at a later date.

I did not modify other users of netif_queue_stopped() in net/core/netpoll.c,
net/core/dev.c, or net/core/pktgen.c, since no classification occurs for
the skb being sent to the device.  Therefore, packets should always be
ending up in queue 0, so there's no need to check the subqueue status either.

The patches to iproute2 for tc will be sent separately, to support sch_rr.

-- 
PJ Waskiewicz <peter.p.waskiewicz.jr@intel.com>

* RE: [PATCH] NET: Multiple queue hardware support
@ 2007-06-20 14:58 Jan-Bernd Themann
  2007-06-20 17:21 ` Waskiewicz Jr, Peter P
  2007-06-20 21:51 ` David Miller
  0 siblings, 2 replies; 20+ messages in thread
From: Jan-Bernd Themann @ 2007-06-20 14:58 UTC (permalink / raw)
  To: peter.p.waskiewicz.jr
  Cc: netdev, Christoph Raisch, Thomas Klein, Jan-Bernd Themann

Hi,

to me it seems that this patch set only includes multiple transmit queue support
(for qdisc). Am I right with this observation? If so, are there also plans to 
support multiple receive queues to allow the queues to be processed in parallel
on different CPUs via a standard interface? Currently, some drivers use 
"fake netdevices" to feed netif_rx_schedule().

Thanks,
Jan-Bernd

* RE: [PATCH] NET: Multiple queue hardware support
  2007-06-20 14:58 Jan-Bernd Themann
@ 2007-06-20 17:21 ` Waskiewicz Jr, Peter P
  2007-06-20 21:51 ` David Miller
  1 sibling, 0 replies; 20+ messages in thread
From: Waskiewicz Jr, Peter P @ 2007-06-20 17:21 UTC (permalink / raw)
  To: Jan-Bernd Themann
  Cc: netdev, Christoph Raisch, Thomas Klein, Jan-Bernd Themann

> to me it seems that this patch set only includes multiple
> transmit queue support (for qdisc). Am I right with this 
> observation? If so, are there also plans to support multiple 
> receive queues to allow the queues to be processed in 
> parallel on different CPUs via a standard interface? 
> Currently, some drivers use "fake netdevices" to feed 
> netif_rx_schedule().
> 
> Thanks,
> Jan-Bernd

Jan,
	Yes, these patches are for transmit multiqueue hardware support
only.  There are other efforts to stop using the fake netdevices for Rx
polling, such as using MSI-X vectors assigned to particular receive
queues, but this is still done in the driver (which is ok).  I'm
currently not working on anything in the kernel to support multiqueue Rx
for non MSI-X devices though.

Thanks,
-PJ Waskiewicz

* Re: [PATCH] NET: Multiple queue hardware support
  2007-06-20 14:58 Jan-Bernd Themann
  2007-06-20 17:21 ` Waskiewicz Jr, Peter P
@ 2007-06-20 21:51 ` David Miller
  1 sibling, 0 replies; 20+ messages in thread
From: David Miller @ 2007-06-20 21:51 UTC (permalink / raw)
  To: ossthema; +Cc: peter.p.waskiewicz.jr, netdev, raisch, osstklei, themann

From: Jan-Bernd Themann <ossthema@de.ibm.com>
Date: Wed, 20 Jun 2007 16:58:43 +0200

> to me it seems that this patch set only includes multiple transmit
> queue support (for qdisc). Am I right with this observation? If so,
> are there also plans to support multiple receive queues to allow the
> queues to be processed in parallel on different CPUs via a standard
> interface? Currently, some drivers use "fake netdevices" to feed
> netif_rx_schedule().

Yes.

See the "struct net_poll" patches that went out several months
ago, that will help things out in that area.

* [PATCH] NET: Multiple queue hardware support
@ 2007-06-21 21:26 PJ Waskiewicz
  2007-06-21 21:26 ` [PATCH 1/3] NET: [DOC] Multiqueue hardware support documentation PJ Waskiewicz
                   ` (3 more replies)
  0 siblings, 4 replies; 20+ messages in thread
From: PJ Waskiewicz @ 2007-06-21 21:26 UTC (permalink / raw)
  To: davem; +Cc: netdev, jeff, auke-jan.h.kok, kaber, hadi

Please consider these patches for 2.6.23 inclusion.

Updates since the last submission:

1. skb->queue_mapping moved into the iif cacheline.  I looked at moving
   iif and queue_mapping, but there wasn't enough room anywhere else to
   logically group these in a different cacheline that I could see.  Thanks
   Patrick McHardy.

2. netdev->egress_subqueue is now indexed thanks to Dave Miller.

3. sch_rr is now a MODULE_ALIAS of sch_prio.  Thanks Patrick McHardy.

4. Both sch_rr and multiqueue sch_prio expect the number of bands to
   equal the number of queues on the netdev.
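
As a rough illustration of item 4, the band/queue check and the resulting
one-to-one mapping could look like the sketch below.  It is a userspace
model, not the prio_tune() code itself: map_bands_to_queues() is a
hypothetical helper, and the MAX_BANDS limit of 16 and the lower bound of 2
bands are assumptions about the existing prio limits.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define MAX_BANDS 16	/* assumption: mirrors TCQ_PRIO_BANDS */

/* Hypothetical helper: refuse a band count that differs from the number
 * of queues on the device, otherwise map band i to queue i.
 */
static int map_bands_to_queues(int bands, int dev_queue_count,
			       uint16_t band2queue[MAX_BANDS])
{
	assert(band2queue);

	/* assumed sanity limits on the band count */
	if (bands < 2 || bands > MAX_BANDS)
		return -EINVAL;

	/* bands must equal the number of queues on the netdev */
	if (bands != dev_queue_count)
		return -EINVAL;

	for (int i = 0; i < bands; i++)
		band2queue[i] = (uint16_t)i;

	return 0;
}
```

So a request for 4 bands on a 4-queue device succeeds with bands 0..3 mapped
to queues 0..3, while 4 bands on a 2-queue device is rejected with -EINVAL
and the qdisc is not loaded.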

This patchset is an updated version of previous multiqueue network device
support patches.  The general approach of introducing a new API for multiqueue
network devices to register with the stack has remained.  The changes include
adding a round-robin qdisc, heavily based on sch_prio, which will allow
queueing to hardware with no OS-enforced queuing policy.  sch_prio still has
the multiqueue code in it, but has a Kconfig option to compile it out of the
qdisc.  This allows people with hardware containing scheduling policies to
use sch_rr (round-robin), and others without scheduling policies in hardware
to continue using sch_prio if they wish to have some notion of scheduling
priority.

The patches being sent are split into Documentation, Qdisc changes, and
core stack changes.  The requested e1000 changes are still being resolved,
and will be sent at a later date.

I did not modify other users of netif_queue_stopped() in net/core/netpoll.c,
net/core/dev.c, or net/core/pktgen.c, since no classification occurs for
the skb being sent to the device.  Therefore, packets should always be
ending up in queue 0, so there's no need to check the subqueue status either.

The patches to iproute2 for tc will be sent separately, to support sch_rr.

-- 
PJ Waskiewicz <peter.p.waskiewicz.jr@intel.com>

* [PATCH 1/3] NET: [DOC] Multiqueue hardware support documentation
  2007-06-21 21:26 [PATCH] NET: Multiple queue hardware support PJ Waskiewicz
@ 2007-06-21 21:26 ` PJ Waskiewicz
  2007-06-21 21:26 ` [PATCH 2/3] NET: [CORE] Stack changes to add multiqueue hardware support API PJ Waskiewicz
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 20+ messages in thread
From: PJ Waskiewicz @ 2007-06-21 21:26 UTC (permalink / raw)
  To: davem; +Cc: netdev, jeff, auke-jan.h.kok, kaber, hadi

Add a brief howto to Documentation/networking for multiqueue.  It
explains how to use the multiqueue API in a driver to support
multiqueue paths from the stack, as well as the qdiscs to use for
feeding a multiqueue device.

Signed-off-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
---

 Documentation/networking/multiqueue.txt |  100 +++++++++++++++++++++++++++++++
 1 files changed, 100 insertions(+), 0 deletions(-)

diff --git a/Documentation/networking/multiqueue.txt b/Documentation/networking/multiqueue.txt
new file mode 100644
index 0000000..55b2db8
--- /dev/null
+++ b/Documentation/networking/multiqueue.txt
@@ -0,0 +1,100 @@
+
+		HOWTO for multiqueue network device support
+		===========================================
+
+Section 1: Base driver requirements for implementing multiqueue support
+Section 2: Qdisc support for multiqueue devices
+Section 3: Brief howto using PRIO and RR for multiqueue devices
+
+
+Intro: Kernel support for multiqueue devices
+--------------------------------------------
+
+Kernel support for multiqueue devices is only an API that is presented to the
+netdevice layer for base drivers to implement.  This feature is part of the
+core networking stack, and all network devices will be running on the
+multiqueue-aware stack.  If a base driver only has one queue, then these
+changes are transparent to that driver.
+
+
+Section 1: Base driver requirements for implementing multiqueue support
+-----------------------------------------------------------------------
+
+Base drivers are required to use the new alloc_etherdev_mq() or
+alloc_netdev_mq() functions to allocate the subqueues for the device.  The
+underlying kernel API will take care of the allocation and deallocation of
+the subqueue memory, as well as netdev configuration of where the queues
+exist in memory.
+
+The base driver will also need to manage the queues as it does the global
+netdev->queue_lock today.  Therefore base drivers should use the
+netif_{start|stop|wake}_subqueue() functions to manage each queue while the
+device is still operational.  netdev->queue_lock is still used when the device
+comes online or when it's completely shut down (unregister_netdev(), etc.).
+
+Finally, the base driver should indicate that it is a multiqueue device.  The
+feature flag NETIF_F_MULTI_QUEUE should be added to the netdev->features
+bitmap on device initialization.  Below is an example from e1000:
+
+#ifdef CONFIG_E1000_MQ
+	if ( (adapter->hw.mac.type == e1000_82571) ||
+	     (adapter->hw.mac.type == e1000_82572) ||
+	     (adapter->hw.mac.type == e1000_80003es2lan))
+		netdev->features |= NETIF_F_MULTI_QUEUE;
+#endif
+
+
+Section 2: Qdisc support for multiqueue devices
+-----------------------------------------------
+
+Currently two qdiscs support multiqueue devices: a new round-robin qdisc,
+sch_rr, and sch_prio.  The qdisc is responsible for classifying the skb's to
+bands and queues, and will store the queue mapping into skb->queue_mapping.
+Use this field in the base driver to determine which queue to send the skb
+to.
+
+sch_rr has been added for hardware that doesn't want scheduling policies from
+software, so it's a straight round-robin qdisc.  It uses the same syntax and
+classification priomap that sch_prio uses, so it should be intuitive to
+configure for people who've used sch_prio.
+
+The PRIO qdisc naturally plugs into a multiqueue device.  If PRIO has been
+built with NET_SCH_PRIO_MQ, then upon load, it will make sure the number of
+bands requested is equal to the number of queues on the hardware.  If they
+are equal, it sets a one-to-one mapping up between the queues and bands.  If
+they're not equal, it will not load the qdisc.  This is the same behavior
+for RR.  Once the association is made, any skb that is classified will have
+skb->queue_mapping set, which will allow the driver to properly queue skb's
+to multiple queues.
+
+
+Section 3: Brief howto using PRIO and RR for multiqueue devices
+---------------------------------------------------------------
+
+The userspace command 'tc,' part of the iproute2 package, is used to configure
+qdiscs.  To add the PRIO qdisc to your network device, assuming the device is
+called eth0, run the following command:
+
+# tc qdisc add dev eth0 root handle 1: prio bands 4
+
+This will create 4 bands, 0 being highest priority, and associate those bands
+to the queues on your NIC.  Assuming eth0 has 4 Tx queues, the band mapping
+would look like:
+
+band 0 => queue 0
+band 1 => queue 1
+band 2 => queue 2
+band 3 => queue 3
+
+Traffic will begin flowing through each queue if your TOS values are assigning
+traffic across the various bands.  For example, ssh traffic will always try to
+go out band 0 based on TOS -> Linux priority conversion (realtime traffic),
+so it will be sent out queue 0.  ICMP traffic (pings) falls into the "normal"
+traffic classification, which is band 1.  Therefore pings will be sent out
+queue 1 on the NIC.
+
+The behavior of tc filters remains the same: they will override TOS priority
+classification.
+
+
+Author: Peter P. Waskiewicz Jr. <peter.p.waskiewicz.jr@intel.com>
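
For reference, the driver-side flow described in Section 1 and Section 2 of
the howto (pick the Tx ring from skb->queue_mapping, stop only that subqueue
when its ring fills, wake it again after cleanup) might be modelled roughly
as follows.  This is a userspace sketch under stated assumptions, not driver
code: struct adapter_model, model_xmit() and model_tx_clean() are
hypothetical, and the stopped flag stands in for the state changed by
netif_stop_subqueue() and netif_wake_subqueue().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NUM_TX_RINGS	4
#define RING_SIZE	2	/* tiny ring so it fills quickly */

struct tx_ring_model {
	int used;		/* descriptors currently in use */
	bool stopped;		/* models the subqueue XOFF state */
};

struct adapter_model {
	struct tx_ring_model ring[NUM_TX_RINGS];
};

/* Queue one packet on the ring selected by queue_mapping.  Returns 0 on
 * success and 1 if that subqueue is stopped (similar in spirit to
 * NETDEV_TX_BUSY).
 */
static int model_xmit(struct adapter_model *a, uint16_t queue_mapping)
{
	struct tx_ring_model *r;

	assert(queue_mapping < NUM_TX_RINGS);
	r = &a->ring[queue_mapping];

	if (r->stopped)
		return 1;

	r->used++;
	if (r->used == RING_SIZE)
		r->stopped = true;	/* like netif_stop_subqueue() */

	return 0;
}

/* Tx cleanup for one ring: reclaim descriptors and restart the subqueue. */
static void model_tx_clean(struct adapter_model *a, uint16_t queue_index)
{
	assert(queue_index < NUM_TX_RINGS);
	a->ring[queue_index].used = 0;
	a->ring[queue_index].stopped = false;	/* like netif_wake_subqueue() */
}
```

The point of the model is that a full ring 2 only blocks traffic mapped to
queue 2; packets mapped to the other rings keep flowing.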

* [PATCH 2/3] NET: [CORE] Stack changes to add multiqueue hardware support API
  2007-06-21 21:26 [PATCH] NET: Multiple queue hardware support PJ Waskiewicz
  2007-06-21 21:26 ` [PATCH 1/3] NET: [DOC] Multiqueue hardware support documentation PJ Waskiewicz
@ 2007-06-21 21:26 ` PJ Waskiewicz
  2007-06-21 21:26 ` [PATCH 3/3] NET: [SCHED] Qdisc changes and sch_rr added for multiqueue PJ Waskiewicz
  2007-06-21 21:31 ` [PATCH] NET: Multiple queue hardware support Patrick McHardy
  3 siblings, 0 replies; 20+ messages in thread
From: PJ Waskiewicz @ 2007-06-21 21:26 UTC (permalink / raw)
  To: davem; +Cc: netdev, jeff, auke-jan.h.kok, kaber, hadi

Add the multiqueue hardware device support API to the core network
stack.  Allow drivers to allocate multiple queues and manage them
at the netdev level if they choose to do so.

Added a new field to sk_buff, namely queue_mapping, for drivers to
know which tx_ring to select based on OS classification of the flow.

Signed-off-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
---

 include/linux/etherdevice.h |    3 +-
 include/linux/netdevice.h   |   62 ++++++++++++++++++++++++++++++++++++++++++-
 include/linux/skbuff.h      |    4 ++-
 net/core/dev.c              |   20 ++++++++++----
 net/core/skbuff.c           |    3 ++
 net/ethernet/eth.c          |    9 +++---
 6 files changed, 87 insertions(+), 14 deletions(-)

diff --git a/include/linux/etherdevice.h b/include/linux/etherdevice.h
index f48eb89..b3fbb54 100644
--- a/include/linux/etherdevice.h
+++ b/include/linux/etherdevice.h
@@ -39,7 +39,8 @@ extern void		eth_header_cache_update(struct hh_cache *hh, struct net_device *dev
 extern int		eth_header_cache(struct neighbour *neigh,
 					 struct hh_cache *hh);
 
-extern struct net_device *alloc_etherdev(int sizeof_priv);
+extern struct net_device *alloc_etherdev_mq(int sizeof_priv, int queue_count);
+#define alloc_etherdev(sizeof_priv) alloc_etherdev_mq(sizeof_priv, 1)
 
 /**
  * is_zero_ether_addr - Determine if give Ethernet address is all zeros.
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index e7913ee..6509eb4 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -108,6 +108,14 @@ struct wireless_dev;
 #define MAX_HEADER (LL_MAX_HEADER + 48)
 #endif
 
+struct net_device_subqueue
+{
+	/* Give a control state for each queue.  This struct may contain
+	 * per-queue locks in the future.
+	 */
+	unsigned long	state;
+};
+
 /*
  *	Network device statistics. Akin to the 2.0 ether stats but
  *	with byte counters.
@@ -325,6 +333,7 @@ struct net_device
 #define NETIF_F_VLAN_CHALLENGED	1024	/* Device cannot handle VLAN packets */
 #define NETIF_F_GSO		2048	/* Enable software GSO. */
 #define NETIF_F_LLTX		4096	/* LockLess TX */
+#define NETIF_F_MULTI_QUEUE	16384	/* Has multiple TX/RX queues */
 
 	/* Segmentation offload features */
 #define NETIF_F_GSO_SHIFT	16
@@ -543,6 +552,10 @@ struct net_device
 
 	/* rtnetlink link ops */
 	const struct rtnl_link_ops *rtnl_link_ops;
+
+	/* The TX queue control structures */
+	int				egress_subqueue_count;
+	struct net_device_subqueue	egress_subqueue[0];
 };
 #define to_net_dev(d) container_of(d, struct net_device, dev)
 
@@ -705,6 +718,48 @@ static inline int netif_running(const struct net_device *dev)
 	return test_bit(__LINK_STATE_START, &dev->state);
 }
 
+/*
+ * Routines to manage the subqueues on a device.  We only need start,
+ * stop, wake, and a check for whether a subqueue is stopped.  All other
+ * device management is done at the overall netdevice level.
+ * netif_is_multiqueue() tests whether the device is multiqueue.
+ */
+static inline void netif_start_subqueue(struct net_device *dev, u16 queue_index)
+{
+	clear_bit(__LINK_STATE_XOFF, &dev->egress_subqueue[queue_index].state);
+}
+
+static inline void netif_stop_subqueue(struct net_device *dev, u16 queue_index)
+{
+#ifdef CONFIG_NETPOLL_TRAP
+	if (netpoll_trap())
+		return;
+#endif
+	set_bit(__LINK_STATE_XOFF, &dev->egress_subqueue[queue_index].state);
+}
+
+static inline int netif_subqueue_stopped(const struct net_device *dev,
+                                         u16 queue_index)
+{
+	return test_bit(__LINK_STATE_XOFF,
+	                &dev->egress_subqueue[queue_index].state);
+}
+
+static inline void netif_wake_subqueue(struct net_device *dev, u16 queue_index)
+{
+#ifdef CONFIG_NETPOLL_TRAP
+	if (netpoll_trap())
+		return;
+#endif
+	if (test_and_clear_bit(__LINK_STATE_XOFF,
+	                       &dev->egress_subqueue[queue_index].state))
+		__netif_schedule(dev);
+}
+
+static inline int netif_is_multiqueue(const struct net_device *dev)
+{
+	return (!!(NETIF_F_MULTI_QUEUE & dev->features));
+}
 
 /* Use this variant when it is known for sure that it
  * is executing from interrupt context.
@@ -995,8 +1050,11 @@ static inline void netif_tx_disable(struct net_device *dev)
 extern void		ether_setup(struct net_device *dev);
 
 /* Support for loadable net-drivers */
-extern struct net_device *alloc_netdev(int sizeof_priv, const char *name,
-				       void (*setup)(struct net_device *));
+extern struct net_device *alloc_netdev_mq(int sizeof_priv, const char *name,
+					  void (*setup)(struct net_device *),
+					  int queue_count);
+#define alloc_netdev(sizeof_priv, name, setup) \
+	alloc_netdev_mq(sizeof_priv, name, setup, 1)
 extern int		register_netdev(struct net_device *dev);
 extern void		unregister_netdev(struct net_device *dev);
 /* Functions used for multicast support */
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index e7367c7..01b5e25 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -197,6 +197,7 @@ typedef unsigned char *sk_buff_data_t;
  *	@tstamp: Time we arrived
  *	@dev: Device we arrived on/are leaving by
  *	@iif: ifindex of device we arrived on
+ *	@queue_mapping: Queue mapping for multiqueue devices
  *	@transport_header: Transport layer header
  *	@network_header: Network layer header
  *	@mac_header: Link layer header
@@ -246,7 +247,8 @@ struct sk_buff {
 	ktime_t			tstamp;
 	struct net_device	*dev;
 	int			iif;
-	/* 4 byte hole on 64 bit*/
+	__u16			queue_mapping;
+	/* 2 byte hole on 64 bit*/
 
 	struct  dst_entry	*dst;
 	struct	sec_path	*sp;
diff --git a/net/core/dev.c b/net/core/dev.c
index 2609062..66909aa 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1545,6 +1545,8 @@ gso:
 		spin_lock(&dev->queue_lock);
 		q = dev->qdisc;
 		if (q->enqueue) {
+			/* reset queue_mapping to zero */
+			skb->queue_mapping = 0;
 			rc = q->enqueue(skb, q);
 			qdisc_run(dev);
 			spin_unlock(&dev->queue_lock);
@@ -3343,16 +3345,18 @@ static struct net_device_stats *internal_stats(struct net_device *dev)
 }
 
 /**
- *	alloc_netdev - allocate network device
+ *	alloc_netdev_mq - allocate network device
  *	@sizeof_priv:	size of private data to allocate space for
  *	@name:		device name format string
  *	@setup:		callback to initialize device
+ *	@queue_count:	the number of subqueues to allocate
  *
  *	Allocates a struct net_device with private data area for driver use
- *	and performs basic initialization.
+ *	and performs basic initialization.  Also allocates subqueue structs
+ *	for each queue on the device at the end of the netdevice.
  */
-struct net_device *alloc_netdev(int sizeof_priv, const char *name,
-		void (*setup)(struct net_device *))
+struct net_device *alloc_netdev_mq(int sizeof_priv, const char *name,
+		void (*setup)(struct net_device *), int queue_count)
 {
 	void *p;
 	struct net_device *dev;
@@ -3361,7 +3365,9 @@ struct net_device *alloc_netdev(int sizeof_priv, const char *name,
 	BUG_ON(strlen(name) >= sizeof(dev->name));
 
 	/* ensure 32-byte alignment of both the device and private area */
-	alloc_size = (sizeof(*dev) + NETDEV_ALIGN_CONST) & ~NETDEV_ALIGN_CONST;
+	alloc_size = (sizeof(*dev) + NETDEV_ALIGN_CONST +
+		     (sizeof(struct net_device_subqueue) * queue_count)) &
+		     ~NETDEV_ALIGN_CONST;
 	alloc_size += sizeof_priv + NETDEV_ALIGN_CONST;
 
 	p = kzalloc(alloc_size, GFP_KERNEL);
@@ -3377,12 +3383,14 @@ struct net_device *alloc_netdev(int sizeof_priv, const char *name,
 	if (sizeof_priv)
 		dev->priv = netdev_priv(dev);
 
+	dev->egress_subqueue_count = queue_count;
+
 	dev->get_stats = internal_stats;
 	setup(dev);
 	strcpy(dev->name, name);
 	return dev;
 }
-EXPORT_SYMBOL(alloc_netdev);
+EXPORT_SYMBOL(alloc_netdev_mq);
 
 /**
  *	free_netdev - free network device
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 7c6a34e..7bbed45 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -418,6 +418,7 @@ struct sk_buff *skb_clone(struct sk_buff *skb, gfp_t gfp_mask)
 	n->nohdr = 0;
 	C(pkt_type);
 	C(ip_summed);
+	C(queue_mapping);
 	C(priority);
 #if defined(CONFIG_IP_VS) || defined(CONFIG_IP_VS_MODULE)
 	C(ipvs_property);
@@ -459,6 +460,7 @@ static void copy_skb_header(struct sk_buff *new, const struct sk_buff *old)
 #endif
 	new->sk		= NULL;
 	new->dev	= old->dev;
+	new->queue_mapping = old->queue_mapping;
 	new->priority	= old->priority;
 	new->protocol	= old->protocol;
 	new->dst	= dst_clone(old->dst);
@@ -1925,6 +1927,7 @@ struct sk_buff *skb_segment(struct sk_buff *skb, int features)
 		tail = nskb;
 
 		nskb->dev = skb->dev;
+		nskb->queue_mapping = skb->queue_mapping;
 		nskb->priority = skb->priority;
 		nskb->protocol = skb->protocol;
 		nskb->dst = dst_clone(skb->dst);
diff --git a/net/ethernet/eth.c b/net/ethernet/eth.c
index 0ac2524..87a509c 100644
--- a/net/ethernet/eth.c
+++ b/net/ethernet/eth.c
@@ -316,9 +316,10 @@ void ether_setup(struct net_device *dev)
 EXPORT_SYMBOL(ether_setup);
 
 /**
- * alloc_etherdev - Allocates and sets up an Ethernet device
+ * alloc_etherdev_mq - Allocates and sets up an Ethernet device
  * @sizeof_priv: Size of additional driver-private structure to be allocated
  *	for this Ethernet device
+ * @queue_count: The number of queues this device has.
  *
  * Fill in the fields of the device structure with Ethernet-generic
  * values. Basically does everything except registering the device.
@@ -328,8 +329,8 @@ EXPORT_SYMBOL(ether_setup);
  * this private data area.
  */
 
-struct net_device *alloc_etherdev(int sizeof_priv)
+struct net_device *alloc_etherdev_mq(int sizeof_priv, int queue_count)
 {
-	return alloc_netdev(sizeof_priv, "eth%d", ether_setup);
+	return alloc_netdev_mq(sizeof_priv, "eth%d", ether_setup, queue_count);
 }
-EXPORT_SYMBOL(alloc_etherdev);
+EXPORT_SYMBOL(alloc_etherdev_mq);

* [PATCH 3/3] NET: [SCHED] Qdisc changes and sch_rr added for multiqueue
  2007-06-21 21:26 [PATCH] NET: Multiple queue hardware support PJ Waskiewicz
  2007-06-21 21:26 ` [PATCH 1/3] NET: [DOC] Multiqueue hardware support documentation PJ Waskiewicz
  2007-06-21 21:26 ` [PATCH 2/3] NET: [CORE] Stack changes to add multiqueue hardware support API PJ Waskiewicz
@ 2007-06-21 21:26 ` PJ Waskiewicz
  2007-06-21 23:47   ` Patrick McHardy
  2007-06-21 21:31 ` [PATCH] NET: Multiple queue hardware support Patrick McHardy
  3 siblings, 1 reply; 20+ messages in thread
From: PJ Waskiewicz @ 2007-06-21 21:26 UTC (permalink / raw)
  To: davem; +Cc: netdev, jeff, auke-jan.h.kok, kaber, hadi

Add the new sch_rr qdisc for multiqueue network device support.
Allow sch_prio to be compiled with or without multiqueue hardware
support.

sch_rr is part of sch_prio, and is referenced from MODULE_ALIAS.  This
was done since sch_prio and sch_rr only differ in their dequeue routine.

Signed-off-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
---

 net/sched/Kconfig       |   32 ++++++++++++
 net/sched/sch_generic.c |    3 +
 net/sched/sch_prio.c    |  123 ++++++++++++++++++++++++++++++++++++++++++++---
 3 files changed, 150 insertions(+), 8 deletions(-)

diff --git a/net/sched/Kconfig b/net/sched/Kconfig
index 475df84..ca0b352 100644
--- a/net/sched/Kconfig
+++ b/net/sched/Kconfig
@@ -102,8 +102,16 @@ config NET_SCH_ATM
 	  To compile this code as a module, choose M here: the
 	  module will be called sch_atm.
 
+config NET_SCH_BANDS
+	bool "Multi Band Queueing (PRIO and RR)"
+	---help---
+	  Say Y here if you want to use n-band multiqueue packet
+	  schedulers.  These include a priority-based scheduler and
+	  a round-robin scheduler.
+
 config NET_SCH_PRIO
 	tristate "Multi Band Priority Queueing (PRIO)"
+	depends on NET_SCH_BANDS
 	---help---
 	  Say Y here if you want to use an n-band priority queue packet
 	  scheduler.
@@ -111,6 +119,30 @@ config NET_SCH_PRIO
 	  To compile this code as a module, choose M here: the
 	  module will be called sch_prio.
 
+config NET_SCH_PRIO_MQ
+	bool "Multiple hardware queue support for PRIO"
+	depends on NET_SCH_PRIO
+	---help---
+	  Say Y here if you want to allow the PRIO qdisc to assign
+	  flows to multiple hardware queues on an ethernet device.  This
+	  will still work on devices with 1 queue.
+
+	  Consider this scheduler for devices that do not use
+	  hardware-based scheduling policies.  Otherwise, use NET_SCH_RR.
+
+	  Most people will say N here.
+
+config NET_SCH_RR
+	bool "Multi Band Round Robin Queuing (RR)"
+	depends on NET_SCH_BANDS && NET_SCH_PRIO
+	---help---
+	  Say Y here if you want to use an n-band round robin packet
+	  scheduler.
+
+	  The module uses sch_prio for its framework and is aliased as
+	  sch_rr, so it will load sch_prio, although it is referred
+	  to using sch_rr.
+
 config NET_SCH_RED
 	tristate "Random Early Detection (RED)"
 	---help---
diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index 9461e8a..203d5c4 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -168,7 +168,8 @@ static inline int qdisc_restart(struct net_device *dev)
 	spin_unlock(&dev->queue_lock);
 
 	ret = NETDEV_TX_BUSY;
-	if (!netif_queue_stopped(dev))
+	if (!netif_queue_stopped(dev) &&
+	    !netif_subqueue_stopped(dev, skb->queue_mapping))
 		/* churn baby churn .. */
 		ret = dev_hard_start_xmit(skb, dev);
 
diff --git a/net/sched/sch_prio.c b/net/sched/sch_prio.c
index 6d7542c..4eb3ba5 100644
--- a/net/sched/sch_prio.c
+++ b/net/sched/sch_prio.c
@@ -9,6 +9,8 @@
  * Authors:	Alexey Kuznetsov, <kuznet@ms2.inr.ac.ru>
  * Fixes:       19990609: J Hadi Salim <hadi@nortelnetworks.com>:
  *              Init --  EINVAL when opt undefined
+ * Additions:	Peter P. Waskiewicz Jr. <peter.p.waskiewicz.jr@intel.com>
+ *		Added round-robin scheduling for selection at load-time
  */
 
 #include <linux/module.h>
@@ -40,9 +42,13 @@
 struct prio_sched_data
 {
 	int bands;
+#ifdef CONFIG_NET_SCH_RR
+	int curband; /* for round-robin */
+#endif
 	struct tcf_proto *filter_list;
 	u8  prio2band[TC_PRIO_MAX+1];
 	struct Qdisc *queues[TCQ_PRIO_BANDS];
+	u16 band2queue[TC_PRIO_MAX + 1];
 };
 
 
@@ -70,14 +76,19 @@ prio_classify(struct sk_buff *skb, struct Qdisc *sch, int *qerr)
 #endif
 			if (TC_H_MAJ(band))
 				band = 0;
+			skb->queue_mapping =
+				q->band2queue[q->prio2band[band&TC_PRIO_MAX]];
 			return q->queues[q->prio2band[band&TC_PRIO_MAX]];
 		}
 		band = res.classid;
 	}
 	band = TC_H_MIN(band) - 1;
-	if (band >= q->bands)
+	if (band >= q->bands) {
+		skb->queue_mapping = q->band2queue[q->prio2band[0]];
 		return q->queues[q->prio2band[0]];
+	}
 
+	skb->queue_mapping = q->band2queue[band];
 	return q->queues[band];
 }
 
@@ -144,17 +155,59 @@ prio_dequeue(struct Qdisc* sch)
 	struct Qdisc *qdisc;
 
 	for (prio = 0; prio < q->bands; prio++) {
-		qdisc = q->queues[prio];
-		skb = qdisc->dequeue(qdisc);
-		if (skb) {
-			sch->q.qlen--;
-			return skb;
+		/* Check if the target subqueue is available before
+		 * pulling an skb.  This way we avoid excessive requeues
+		 * for slower queues.
+		 */
+		if (!netif_subqueue_stopped(sch->dev, q->band2queue[prio])) {
+			qdisc = q->queues[prio];
+			skb = qdisc->dequeue(qdisc);
+			if (skb) {
+				sch->q.qlen--;
+				return skb;
+			}
 		}
 	}
 	return NULL;
 
 }
 
+#ifdef CONFIG_NET_SCH_RR
+static struct sk_buff *rr_dequeue(struct Qdisc* sch)
+{
+	struct sk_buff *skb;
+	struct prio_sched_data *q = qdisc_priv(sch);
+	struct Qdisc *qdisc;
+	int bandcount;
+
+	/* Only take one pass through the queues.  If nothing is available,
+	 * return nothing.
+	 */
+	for (bandcount = 0; bandcount < q->bands; bandcount++) {
+		/* Check if the target subqueue is available before
+		 * pulling an skb.  This way we avoid excessive requeues
+		 * for slower queues.  If the queue is stopped, try the
+		 * next queue.
+		 */
+		if (!netif_subqueue_stopped(sch->dev, q->band2queue[q->curband])) {
+			qdisc = q->queues[q->curband];
+			skb = qdisc->dequeue(qdisc);
+			if (skb) {
+				sch->q.qlen--;
+				q->curband++;
+				if (q->curband >= q->bands)
+					q->curband = 0;
+				return skb;
+			}
+		}
+		q->curband++;
+		if (q->curband >= q->bands)
+			q->curband = 0;
+	}
+	return NULL;
+}
+#endif
+
 static unsigned int prio_drop(struct Qdisc* sch)
 {
 	struct prio_sched_data *q = qdisc_priv(sch);
@@ -200,6 +253,7 @@ static int prio_tune(struct Qdisc *sch, struct rtattr *opt)
 	struct prio_sched_data *q = qdisc_priv(sch);
 	struct tc_prio_qopt *qopt = RTA_DATA(opt);
 	int i;
+	int queue;
 
 	if (opt->rta_len < RTA_LENGTH(sizeof(*qopt)))
 		return -EINVAL;
@@ -211,6 +265,22 @@ static int prio_tune(struct Qdisc *sch, struct rtattr *opt)
 			return -EINVAL;
 	}
 
+	/* If we're prio multiqueue or are using round-robin, make
+	 * sure the number of incoming bands matches the number of
+	 * queues on the device we're associating with.
+	 */
+#ifdef CONFIG_NET_SCH_RR
+	if (strcmp("rr", sch->ops->id) == 0)
+		if (qopt->bands != sch->dev->egress_subqueue_count)
+			return -EINVAL;
+#endif
+
+#ifdef CONFIG_NET_SCH_PRIO_MQ
+	if (strcmp("prio", sch->ops->id) == 0)
+		if (qopt->bands != sch->dev->egress_subqueue_count)
+			return -EINVAL;
+#endif
+
 	sch_tree_lock(sch);
 	q->bands = qopt->bands;
 	memcpy(q->prio2band, qopt->priomap, TC_PRIO_MAX+1);
@@ -242,6 +312,18 @@ static int prio_tune(struct Qdisc *sch, struct rtattr *opt)
 			}
 		}
 	}
+
+	/* setup queue to band mapping */
+	for (i = 0, queue = 0; i < q->bands; i++, queue++)
+		q->band2queue[i] = queue;
+
+#ifndef CONFIG_NET_SCH_PRIO_MQ
+	/* for non-mq prio */
+	if (strcmp("prio", sch->ops->id) == 0)
+		for (i = 0; i < q->bands; i++)
+			q->band2queue[i] = 0;
+#endif
+
 	return 0;
 }
 
@@ -443,17 +525,44 @@ static struct Qdisc_ops prio_qdisc_ops = {
 	.owner		=	THIS_MODULE,
 };
 
+#ifdef CONFIG_NET_SCH_RR
+static struct Qdisc_ops rr_qdisc_ops = {
+	.next		=	NULL,
+	.cl_ops		=	&prio_class_ops,
+	.id		=	"rr",
+	.priv_size	=	sizeof(struct prio_sched_data),
+	.enqueue	=	prio_enqueue,
+	.dequeue	=	rr_dequeue,
+	.requeue	=	prio_requeue,
+	.drop		=	prio_drop,
+	.init		=	prio_init,
+	.reset		=	prio_reset,
+	.destroy	=	prio_destroy,
+	.change		=	prio_tune,
+	.dump		=	prio_dump,
+	.owner		=	THIS_MODULE,
+};
+#endif
+
 static int __init prio_module_init(void)
 {
-	return register_qdisc(&prio_qdisc_ops);
+	register_qdisc(&prio_qdisc_ops);
+#ifdef CONFIG_NET_SCH_RR
+	register_qdisc(&rr_qdisc_ops);
+#endif
+	return 0;
 }
 
 static void __exit prio_module_exit(void)
 {
 	unregister_qdisc(&prio_qdisc_ops);
+#ifdef CONFIG_NET_SCH_RR
+	unregister_qdisc(&rr_qdisc_ops);
+#endif
 }
 
 module_init(prio_module_init)
 module_exit(prio_module_exit)
 
 MODULE_LICENSE("GPL");
+MODULE_ALIAS("sch_rr");


* Re: [PATCH] NET: Multiple queue hardware support
  2007-06-21 21:26 [PATCH] NET: Multiple queue hardware support PJ Waskiewicz
                   ` (2 preceding siblings ...)
  2007-06-21 21:26 ` [PATCH 3/3] NET: [SCHED] Qdisc changes and sch_rr added for multiqueue PJ Waskiewicz
@ 2007-06-21 21:31 ` Patrick McHardy
  2007-06-21 23:27   ` Waskiewicz Jr, Peter P
  3 siblings, 1 reply; 20+ messages in thread
From: Patrick McHardy @ 2007-06-21 21:31 UTC (permalink / raw)
  To: PJ Waskiewicz; +Cc: davem, netdev, jeff, auke-jan.h.kok, hadi

PJ Waskiewicz wrote:
> I did not modify other users of netif_queue_stopped() in net/core/netpoll.c,
> net/core/dev.c, or net/core/pktgen.c, since no classification occurs for
> the skb being sent to the device.  Therefore, packets should always be
> ending up in queue 0, so there's no need to check the subqueue status either.
>   

That's not correct. Subqueue 0 may be full and the queue still running.

I'll look over the patches later.



* RE: [PATCH] NET: Multiple queue hardware support
  2007-06-21 21:31 ` [PATCH] NET: Multiple queue hardware support Patrick McHardy
@ 2007-06-21 23:27   ` Waskiewicz Jr, Peter P
  0 siblings, 0 replies; 20+ messages in thread
From: Waskiewicz Jr, Peter P @ 2007-06-21 23:27 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: davem, netdev, jeff, Kok, Auke-jan H, hadi

> PJ Waskiewicz wrote:
> > I did not modify other users of netif_queue_stopped() in
> > net/core/netpoll.c, net/core/dev.c, or net/core/pktgen.c, since no
> > classification occurs for the skb being sent to the device.
> > Therefore, packets should always be ending up in queue 0, so
> > there's no need to check the subqueue status either.
> >
>
> That's not correct. Subqueue 0 may be full and the queue still running.
> 
> I'll look over the patches later.

I'm working something up to address this.  The last time I thought about
this, I had issues with software devices, such as loopback.  They
weren't allocating any subqueues at all, so they would call
netif_subqueue_stopped() and panic the kernel.  However, now with Dave's
request to index egress_subqueue, the first queue is allocated for
everyone, so loopback and other software devices should be happy.  Let
me put these checks back in, test it out, and resend if I don't see any
issues.

Sorry for the thrash,
-PJ Waskiewicz
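
For readers following along, the check under discussion can be sketched
in plain userspace C.  This is only an illustration of the point that the
device-wide stop flag and the per-subqueue stop flag are independent; the
struct and function names below are hypothetical stand-ins and are not
the kernel's struct net_device or the API from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; not kernel structures. */
struct subqueue_sketch {
	bool stopped;			/* per-subqueue stop flag */
};

struct netdev_sketch {
	bool queue_stopped;		/* device-wide stop flag */
	unsigned int egress_subqueue_count;
	struct subqueue_sketch egress_subqueue[4];
};

/*
 * Returns true only when the device as a whole is running AND the
 * selected subqueue is running.  Checking the device flag alone would
 * miss the case where subqueue 0 is full while the device still runs.
 */
static bool can_xmit_sketch(const struct netdev_sketch *dev, unsigned int queue)
{
	assert(dev != NULL);
	if (queue >= dev->egress_subqueue_count)
		return false;
	if (dev->queue_stopped)
		return false;
	return !dev->egress_subqueue[queue].stopped;
}
```

Because at least one subqueue is assumed to exist for every device in
this sketch, a software device such as loopback can make the same call
with queue 0.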


* Re: [PATCH 3/3] NET: [SCHED] Qdisc changes and sch_rr added for multiqueue
  2007-06-21 21:26 ` [PATCH 3/3] NET: [SCHED] Qdisc changes and sch_rr added for multiqueue PJ Waskiewicz
@ 2007-06-21 23:47   ` Patrick McHardy
  2007-06-22  0:01     ` Waskiewicz Jr, Peter P
  2007-06-22 18:00     ` Waskiewicz Jr, Peter P
  0 siblings, 2 replies; 20+ messages in thread
From: Patrick McHardy @ 2007-06-21 23:47 UTC (permalink / raw)
  To: PJ Waskiewicz; +Cc: davem, netdev, jeff, auke-jan.h.kok, hadi

PJ Waskiewicz wrote:
> diff --git a/net/sched/Kconfig b/net/sched/Kconfig
> index 475df84..ca0b352 100644
> --- a/net/sched/Kconfig
> +++ b/net/sched/Kconfig
> @@ -102,8 +102,16 @@ config NET_SCH_ATM
>  	  To compile this code as a module, choose M here: the
>  	  module will be called sch_atm.
>  
> +config NET_SCH_BANDS
> +        bool "Multi Band Queueing (PRIO and RR)"
> +        ---help---
> +          Say Y here if you want to use n-band multiqueue packet
> +          schedulers.  These include a priority-based scheduler and
> +	   a round-robin scheduler.
> +
>  config NET_SCH_PRIO
>  	tristate "Multi Band Priority Queueing (PRIO)"
> +	depends on NET_SCH_BANDS
>  	---help---
>  	  Say Y here if you want to use an n-band priority queue packet
>  	  scheduler.
> @@ -111,6 +119,30 @@ config NET_SCH_PRIO
>  	  To compile this code as a module, choose M here: the
>  	  module will be called sch_prio.
>  
> +config NET_SCH_PRIO_MQ
> +	bool "Multiple hardware queue support for PRIO"
> +	depends on NET_SCH_PRIO
> +	---help---
> +	  Say Y here if you want to allow the PRIO qdisc to assign
> +	  flows to multiple hardware queues on an ethernet device.  This
> +	  will still work on devices with 1 queue.
> +
> +	  Consider this scheduler for devices that do not use
> +	  hardware-based scheduling policies.  Otherwise, use NET_SCH_RR.
> +
> +	  Most people will say N here.
> +
> +config NET_SCH_RR
> +	bool "Multi Band Round Robin Queuing (RR)"
> +	depends on NET_SCH_BANDS && NET_SCH_PRIO
> +	---help---
> +	  Say Y here if you want to use an n-band round robin packet
> +	  scheduler.
> +
> +	  The module uses sch_prio for its framework and is aliased as
> +	  sch_rr, so it will load sch_prio, although it is referred
> +	  to using sch_rr.
>   

The dependencies seem to be very confused. SCH_PRIO does not depend
on anything new, SCH_RR also doesn't depend on anything. SCH_PRIO_MQ
and SCH_RR_MQ (which is missing) depend on SCH_PRIO/SCH_RR. A single
NET_SCH_MULTIQUEUE option seems better than adding one per scheduler
though.

> --- a/net/sched/sch_prio.c
> +++ b/net/sched/sch_prio.c
> @@ -9,6 +9,8 @@
>   * Authors:	Alexey Kuznetsov, <kuznet@ms2.inr.ac.ru>
>   * Fixes:       19990609: J Hadi Salim <hadi@nortelnetworks.com>:
>   *              Init --  EINVAL when opt undefined
> + * Additions:	Peter P. Waskiewicz Jr. <peter.p.waskiewicz.jr@intel.com>
> + *		Added round-robin scheduling for selection at load-time
>   

git keeps changelogs, please don't add it here.

>   */
>  
>  #include <linux/module.h>
> @@ -40,9 +42,13 @@
>  struct prio_sched_data
>  {
>  	int bands;
> +#ifdef CONFIG_NET_SCH_RR
> +	int curband; /* for round-robin */
> +#endif
>  	struct tcf_proto *filter_list;
>  	u8  prio2band[TC_PRIO_MAX+1];
>  	struct Qdisc *queues[TCQ_PRIO_BANDS];
> +	u16 band2queue[TC_PRIO_MAX + 1];
>   

Why is this still here? It's a 1:1 mapping.
> @@ -211,6 +265,22 @@ static int prio_tune(struct Qdisc *sch, struct rtattr *opt)
>  			return -EINVAL;
>  	}
>  
> +	/* If we're prio multiqueue or are using round-robin, make
> +	 * sure the number of incoming bands matches the number of
> +	 * queues on the device we're associating with.
> +	 */
> +#ifdef CONFIG_NET_SCH_RR
> +	if (strcmp("rr", sch->ops->id) == 0)
> +		if (qopt->bands != sch->dev->egress_subqueue_count)
> +			return -EINVAL;
> +#endif
> +
> +#ifdef CONFIG_NET_SCH_PRIO_MQ
> +	if (strcmp("prio", sch->ops->id) == 0)
> +		if (qopt->bands != sch->dev->egress_subqueue_count)
> +			return -EINVAL;
> +#endif
>   

For the tenth time now, the user should enable this at
runtime. You can't just break things dependent on config
options.




* RE: [PATCH 3/3] NET: [SCHED] Qdisc changes and sch_rr added for multiqueue
  2007-06-21 23:47   ` Patrick McHardy
@ 2007-06-22  0:01     ` Waskiewicz Jr, Peter P
  2007-06-22  0:26       ` Patrick McHardy
  2007-06-22 18:00     ` Waskiewicz Jr, Peter P
  1 sibling, 1 reply; 20+ messages in thread
From: Waskiewicz Jr, Peter P @ 2007-06-22  0:01 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: davem, netdev, jeff, Kok, Auke-jan H, hadi

> The dependencies seem to be very confused. SCH_PRIO does 
> not depend on anything new, SCH_RR also doesn't depend on 
> anything. SCH_PRIO_MQ and SCH_RR_MQ (which is missing) depend 
> on SCH_PRIO/SCH_RR. A single NET_SCH_MULTIQUEUE option seems 
> better than adding one per scheduler though.

I agree with a NET_SCH_MULTIQUEUE option.  However, SCH_RR does depend
on SCH_PRIO being built since it's the same code, doesn't it?  Maybe I'm
not understanding something about the build process.  I'll clean this
up.

> 
> > --- a/net/sched/sch_prio.c
> > +++ b/net/sched/sch_prio.c
> > @@ -9,6 +9,8 @@
> >   * Authors:	Alexey Kuznetsov, <kuznet@ms2.inr.ac.ru>
> >   * Fixes:       19990609: J Hadi Salim <hadi@nortelnetworks.com>:
> >   *              Init --  EINVAL when opt undefined
> > + * Additions:	Peter P. Waskiewicz Jr. <peter.p.waskiewicz.jr@intel.com>
> > + *		Added round-robin scheduling for selection at load-time
> >   
> 
> git keeps changelogs, please don't add it here.

Roger.

> >  	struct tcf_proto *filter_list;
> >  	u8  prio2band[TC_PRIO_MAX+1];
> >  	struct Qdisc *queues[TCQ_PRIO_BANDS];
> > +	u16 band2queue[TC_PRIO_MAX + 1];
> >   
> 
> Why is this still here? It's a 1:1 mapping.

I'll fix this.

> > @@ -211,6 +265,22 @@ static int prio_tune(struct Qdisc *sch, struct rtattr *opt)
> >  			return -EINVAL;
> >  	}
> >  
> > +	/* If we're prio multiqueue or are using round-robin, make
> > +	 * sure the number of incoming bands matches the number of
> > +	 * queues on the device we're associating with.
> > +	 */
> > +#ifdef CONFIG_NET_SCH_RR
> > +	if (strcmp("rr", sch->ops->id) == 0)
> > +		if (qopt->bands != sch->dev->egress_subqueue_count)
> > +			return -EINVAL;
> > +#endif
> > +
> > +#ifdef CONFIG_NET_SCH_PRIO_MQ
> > +	if (strcmp("prio", sch->ops->id) == 0)
> > +		if (qopt->bands != sch->dev->egress_subqueue_count)
> > +			return -EINVAL;
> > +#endif
> >   
> 
> For the tenth time now, the user should enable this at
> runtime. You can't just break things dependent on config options.

I had this in sch_prio and tc before, and was told to remove it because
of ABI issues.  I can put it back in, but I'm not sure what those
previous ABI issues were.  Was it backwards compatibility that you
referred to before that was broken?

As always, the feedback is very much appreciated.  I'll get these fixes
in as soon as possible.

-PJ


* Re: [PATCH 3/3] NET: [SCHED] Qdisc changes and sch_rr added for multiqueue
  2007-06-22  0:01     ` Waskiewicz Jr, Peter P
@ 2007-06-22  0:26       ` Patrick McHardy
  0 siblings, 0 replies; 20+ messages in thread
From: Patrick McHardy @ 2007-06-22  0:26 UTC (permalink / raw)
  To: Waskiewicz Jr, Peter P; +Cc: davem, netdev, jeff, Kok, Auke-jan H, hadi

Waskiewicz Jr, Peter P wrote:
>> The dependencies seem to be very confused. SCH_PRIO does 
>> not depend on anything new, SCH_RR also doesn't depend on 
>> anything. SCH_PRIO_MQ and SCH_RR_MQ (which is missing) depend 
>> on SCH_PRIO/SCH_RR. A single NET_SCH_MULTIQUEUE option seems 
>> better than adding one per scheduler though.
>>     
>
> I agree with a NET_SCH_MULTIQUEUE option.  However, SCH_RR does depend
> on SCH_PRIO being built since it's the same code, doesn't it?  Maybe I'm
> not understanding something about the build process.  I'll clean this
> up.

The easiest solution is to select SCH_PRIO from SCH_RR.
I had something else in mind initially but that is
needlessly complicated.

>>
>> For the tenth time now, the user should enable this at
>> runtime. You can't just break things dependent on config options.
>>     
>
> I had this in sch_prio and tc before, and was told to remove it because
> of ABI issues.  I can put it back in, but I'm not sure what those
> previous ABI issues were.  Was it backwards compatibility that you
> referred to before that was broken?

Your tc changes changed the structure in a way that old tc binaries
wouldn't work anymore. This version breaks configurations that use
a number of bands not matching the HW queues when the user enables
the multiqueue compile time option.

Unfortunately prio does not use nested attributes, so the easiest
way is extending struct tc_prio_qopt at the end and checking the
attribute size to decide whether it's an old or a new version.

A better fix would be to introduce a new qdisc configuration
attribute that takes precedence over TCA_OPTIONS and have
userspace send both the old non-nested structure and a new
nested configuration. That would make sure we never run into
this problem again.
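
The first option (extend the structure at the end, then use the payload
size to tell old from new senders) can be sketched roughly as follows.
This is a userspace illustration under stated assumptions: both structs,
the appended "multiqueue" field, and the parse function are hypothetical
names invented for the example, not the real struct tc_prio_qopt or any
kernel parsing code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PRIOMAP_LEN_SKETCH 16	/* assumed map size for the example */

/* Layout an old tc binary would send (hypothetical stand-in). */
struct prio_qopt_old_sketch {
	int bands;
	unsigned char priomap[PRIOMAP_LEN_SKETCH];
};

/* Same leading fields, with one new field appended at the end. */
struct prio_qopt_new_sketch {
	int bands;
	unsigned char priomap[PRIOMAP_LEN_SKETCH];
	unsigned char multiqueue;	/* hypothetical new option */
};

/*
 * Accept either layout.  A payload shorter than the old struct is
 * rejected; an old-sized payload leaves the new field at zero, so old
 * binaries keep their previous behaviour.  Returns 0 or -1.
 */
static int parse_prio_opt_sketch(const void *payload, size_t len,
				 struct prio_qopt_new_sketch *out)
{
	assert(payload != NULL && out != NULL);
	if (len < sizeof(struct prio_qopt_old_sketch))
		return -1;
	memset(out, 0, sizeof(*out));
	if (len >= sizeof(struct prio_qopt_new_sketch))
		memcpy(out, payload, sizeof(*out));
	else
		memcpy(out, payload, sizeof(struct prio_qopt_old_sketch));
	return 0;
}
```

The drawback is that every future extension needs another size check,
which is why a nested attribute is described above as the better fix.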


* RE: [PATCH 3/3] NET: [SCHED] Qdisc changes and sch_rr added for multiqueue
  2007-06-21 23:47   ` Patrick McHardy
  2007-06-22  0:01     ` Waskiewicz Jr, Peter P
@ 2007-06-22 18:00     ` Waskiewicz Jr, Peter P
  2007-06-22 18:42       ` Patrick McHardy
  1 sibling, 1 reply; 20+ messages in thread
From: Waskiewicz Jr, Peter P @ 2007-06-22 18:00 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: davem, netdev, jeff, Kok, Auke-jan H, hadi

> >  #include <linux/module.h>
> > @@ -40,9 +42,13 @@
> >  struct prio_sched_data
> >  {
> >  	int bands;
> > +#ifdef CONFIG_NET_SCH_RR
> > +	int curband; /* for round-robin */
> > +#endif
> >  	struct tcf_proto *filter_list;
> >  	u8  prio2band[TC_PRIO_MAX+1];
> >  	struct Qdisc *queues[TCQ_PRIO_BANDS];
> > +	u16 band2queue[TC_PRIO_MAX + 1];
> >   
> 
> Why is this still here? It's a 1:1 mapping.

Thought about this more last night and this morning.  As far as I can
tell, I still need this.  If the qdisc gets loaded with multiqueue
turned on, I can just use the value of band to assign
skb->queue_mapping.  But if the qdisc is loaded without multiqueue
support, then I need to assign a value of zero to queue_mapping, or not
assign it at all (it will be zero'd out before the call to ->enqueue()
in dev_queue_xmit()).  But I'd rather not have a conditional in the
hotpath checking if the qdisc is multiqueue; I'd rather have the array
to match the bands so I can just do an assignment.

What do you think?

Thanks,
-PJ
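
To make the lookup-table argument concrete, here is a minimal userspace
sketch of the idea: fill the band-to-queue table once at configuration
time, so the per-packet path is a plain array read with no branch.  The
struct, the table size, and both function names are assumptions made for
the illustration; they are not the code from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_BANDS_SKETCH 16	/* assumed upper bound for the example */

/* Hypothetical stand-in for the qdisc private data. */
struct prio_sketch {
	int bands;
	unsigned short band2queue[MAX_BANDS_SKETCH];
};

/*
 * Configuration-time setup: identity mapping when multiqueue is on,
 * all zeros when it is off.
 */
static void setup_band2queue_sketch(struct prio_sketch *q, bool multiqueue)
{
	assert(q != NULL && q->bands >= 0 && q->bands <= MAX_BANDS_SKETCH);
	for (int i = 0; i < q->bands; i++)
		q->band2queue[i] = multiqueue ? (unsigned short)i : 0;
}

/* Per-packet path: a table read, no multiqueue conditional. */
static unsigned short queue_for_band_sketch(const struct prio_sketch *q, int band)
{
	assert(q != NULL && band >= 0 && band < q->bands);
	return q->band2queue[band];
}
```

The cost of this design is the extra array in the private data, which
only ever holds an identity mapping or zeros.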


* Re: [PATCH 3/3] NET: [SCHED] Qdisc changes and sch_rr added for multiqueue
  2007-06-22 18:00     ` Waskiewicz Jr, Peter P
@ 2007-06-22 18:42       ` Patrick McHardy
  2007-06-22 18:44         ` Patrick McHardy
  2007-06-22 18:53         ` Patrick McHardy
  0 siblings, 2 replies; 20+ messages in thread
From: Patrick McHardy @ 2007-06-22 18:42 UTC (permalink / raw)
  To: Waskiewicz Jr, Peter P; +Cc: davem, netdev, jeff, Kok, Auke-jan H, hadi

Waskiewicz Jr, Peter P wrote:
>>> #include <linux/module.h>
>>>@@ -40,9 +42,13 @@
>>> struct prio_sched_data
>>> {
>>> 	int bands;
>>>+#ifdef CONFIG_NET_SCH_RR
>>>+	int curband; /* for round-robin */
>>>+#endif
>>> 	struct tcf_proto *filter_list;
>>> 	u8  prio2band[TC_PRIO_MAX+1];
>>> 	struct Qdisc *queues[TCQ_PRIO_BANDS];
>>>+	u16 band2queue[TC_PRIO_MAX + 1];
>>>  
>>
>>Why is this still here? It's a 1:1 mapping.
> 
> 
> Thought about this more last night and this morning.  As far as I can
> tell, I still need this.  If the qdisc gets loaded with multiqueue
> turned on, I can just use the value of band to assign
> skb->queue_mapping.  But if the qdisc is loaded without multiqueue
> support, then I need to assign a value of zero to queue_mapping, or not
> assign it at all (it will be zero'd out before the call to ->enqueue()
> in dev_queue_xmit()).  But I'd rather not have a conditional in the
> hotpath checking if the qdisc is multiqueue; I'd rather have the array
> to match the bands so I can just do an assignment.
> 
> What do you think?


I very much doubt that it has any measurable impact. You can
also add a small inline function

void skb_set_queue_mapping(struct sk_buff *skb, unsigned int queue)
{
#ifdef CONFIG_NET_SCH_MULTIQUEUE
	skb->queue_mapping = queue;
#else
	skb->queue_mapping = 0;
#endif
}


* Re: [PATCH 3/3] NET: [SCHED] Qdisc changes and sch_rr added for multiqueue
  2007-06-22 18:42       ` Patrick McHardy
@ 2007-06-22 18:44         ` Patrick McHardy
  2007-06-22 18:53         ` Patrick McHardy
  1 sibling, 0 replies; 20+ messages in thread
From: Patrick McHardy @ 2007-06-22 18:44 UTC (permalink / raw)
  To: Patrick McHardy
  Cc: Waskiewicz Jr, Peter P, davem, netdev, jeff, Kok, Auke-jan H,
	hadi

Patrick McHardy wrote:
> void skb_set_queue_mapping(struct sk_buff *skb, unsigned int queue)
> {
> #ifdef CONFIG_NET_SCH_MULTIQUEUE
> 	skb->queue_mapping = queue;
> #else
> 	skb->queue_mapping = 0;
> #endif


Maybe even use it everywhere and guard skb->queue_mapping by
an #ifdef; on 32-bit it does enlarge the skb.
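
A rough sketch of what guarding the field would look like: if every user
goes through a pair of accessors, the member itself can be compiled out
and callers need no #ifdef of their own.  Everything here is a userspace
stand-in; MULTIQUEUE_SKETCH, the struct, and the accessor names are
hypothetical and chosen only for the illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Build with -DMULTIQUEUE_SKETCH to include the field. */
struct skb_sketch {
	unsigned int len;
#ifdef MULTIQUEUE_SKETCH
	unsigned short queue_mapping;	/* only present when enabled */
#endif
};

/* Setter: stores the queue when the field exists, otherwise a no-op. */
static inline void set_queue_mapping_sketch(struct skb_sketch *skb,
					    unsigned short queue)
{
	assert(skb != NULL);
#ifdef MULTIQUEUE_SKETCH
	skb->queue_mapping = queue;
#else
	(void)queue;
#endif
}

/* Getter: reports queue 0 when the field is compiled out. */
static inline unsigned short get_queue_mapping_sketch(const struct skb_sketch *skb)
{
	assert(skb != NULL);
#ifdef MULTIQUEUE_SKETCH
	return skb->queue_mapping;
#else
	return 0;
#endif
}
```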



* Re: [PATCH 3/3] NET: [SCHED] Qdisc changes and sch_rr added for multiqueue
  2007-06-22 18:42       ` Patrick McHardy
  2007-06-22 18:44         ` Patrick McHardy
@ 2007-06-22 18:53         ` Patrick McHardy
  2007-06-22 21:03           ` Waskiewicz Jr, Peter P
  1 sibling, 1 reply; 20+ messages in thread
From: Patrick McHardy @ 2007-06-22 18:53 UTC (permalink / raw)
  To: Waskiewicz Jr, Peter P; +Cc: davem, netdev, jeff, Kok, Auke-jan H, hadi

Patrick McHardy wrote:
> Waskiewicz Jr, Peter P wrote:
> 
>>Thought about this more last night and this morning.  As far as I can
>>tell, I still need this.  If the qdisc gets loaded with multiqueue
>>turned on, I can just use the value of band to assign
>>skb->queue_mapping.  But if the qdisc is loaded without multiqueue
>>support, then I need to assign a value of zero to queue_mapping, or not
>>assign it at all (it will be zero'd out before the call to ->enqueue()
>>in dev_queue_xmit()).  But I'd rather not have a conditional in the
>>hotpath checking if the qdisc is multiqueue; I'd rather have the array
>>to match the bands so I can just do an assignment.
>>
>>What do you think?
> 
> 
> 
> I very much doubt that it has any measurable impact. You can
> also add a small inline function
> 
> void skb_set_queue_mapping(struct sk_buff *skb, unsigned int queue)


OK I didn't really listen obviously :) A compile time option
won't help. Just remove it and assign it conditionally.
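
Since multiqueue is chosen per qdisc at runtime, "assign it
conditionally" amounts to something like the sketch below: a flag in the
qdisc's private data decides whether the band is used as the queue index.
The structs, the "mq" flag, and the function name are hypothetical and
stand in for whatever the final patch uses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the qdisc private data and the skb. */
struct prio_rt_sketch {
	int bands;
	bool mq;			/* set at configuration time */
};

struct skb_rt_sketch {
	unsigned short queue_mapping;
};

/*
 * With multiqueue enabled the band doubles as the hardware queue
 * (the 1:1 mapping); otherwise everything goes to queue 0.
 */
static void prio_set_queue_sketch(const struct prio_rt_sketch *q,
				  struct skb_rt_sketch *skb, int band)
{
	assert(q != NULL && skb != NULL);
	assert(band >= 0 && band < q->bands);
	skb->queue_mapping = q->mq ? (unsigned short)band : 0;
}
```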


* RE: [PATCH 3/3] NET: [SCHED] Qdisc changes and sch_rr added for multiqueue
  2007-06-22 18:53         ` Patrick McHardy
@ 2007-06-22 21:03           ` Waskiewicz Jr, Peter P
  0 siblings, 0 replies; 20+ messages in thread
From: Waskiewicz Jr, Peter P @ 2007-06-22 21:03 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: davem, netdev, jeff, Kok, Auke-jan H, hadi

> Patrick McHardy wrote:
> > Waskiewicz Jr, Peter P wrote:
> > 
> >>Thought about this more last night and this morning.  As far as I can
> >>tell, I still need this.  If the qdisc gets loaded with multiqueue
> >>turned on, I can just use the value of band to assign
> >>skb->queue_mapping.  But if the qdisc is loaded without multiqueue
> >>support, then I need to assign a value of zero to queue_mapping, or
> >>not assign it at all (it will be zero'd out before the call to
> >>->enqueue() in dev_queue_xmit()).  But I'd rather not have a
> >>conditional in the hotpath checking if the qdisc is multiqueue; I'd
> >>rather have the array to match the bands so I can just do an
> >>assignment.
> >>
> >>What do you think?
> > 
> > 
> > 
> > I very much doubt that it has any measurable impact. You can also
> > add a small inline function
> > 
> > void skb_set_queue_mapping(struct sk_buff *skb, unsigned int queue)
> 
> 
> OK I didn't really listen obviously :) A compile time option 
> won't help. Just remove it and assign it conditionally.

Sounds good.  Thanks Patrick.

-PJ


* [PATCH] NET: Multiple queue hardware support
@ 2007-06-23 21:36 PJ Waskiewicz
  0 siblings, 0 replies; 20+ messages in thread
From: PJ Waskiewicz @ 2007-06-23 21:36 UTC (permalink / raw)
  To: davem; +Cc: netdev, jeff, auke-jan.h.kok, hadi, kaber

Please consider these patches for 2.6.23 inclusion.

These patches are built against Patrick McHardy's recently submitted
RTNETLINK nested compat attribute patches.  They're needed to preserve
ABI between sch_{rr|prio} and iproute2.

Updates since the last submission:

1. Added checks for netif_subqueue_stopped() to net/core/netpoll.c,
   net/core/pktgen.c, and to software device hard_start_xmit in
   dev_queue_xmit().

2. Removed TCA_PRIO_TEST and added TCA_PRIO_MQ for sch_prio and sch_rr.

3. Fixed dependency issues in net/sched/Kconfig with NET_SCH_RR.

4. Implemented the new nested compat attribute API for MQ in NET_SCH_PRIO
   and NET_SCH_RR.

5. Allow sch_rr and sch_prio to turn multiqueue hardware support on and off
   at loadtime.

This patchset is an updated version of previous multiqueue network device
support patches.  The general approach of introducing a new API for multiqueue
network devices to register with the stack has remained.  The changes include
adding a round-robin qdisc, heavily based on sch_prio, which will allow
queueing to hardware with no OS-enforced queuing policy.  sch_prio still has
the multiqueue code in it, but has a Kconfig option to compile it out of the
qdisc.  This allows people with hardware containing scheduling policies to
use sch_rr (round-robin), and others without scheduling policies in hardware
to continue using sch_prio if they wish to have some notion of scheduling
priority.

The patches being sent are split into Documentation, Qdisc changes, and
core stack changes.  The requested e1000 changes are still being resolved,
and will be sent at a later date.

The patches to iproute2 for tc will be sent separately, to support sch_rr.

-- 
PJ Waskiewicz <peter.p.waskiewicz.jr@intel.com>


* [PATCH] NET: Multiple queue hardware support
@ 2007-06-28 16:20 PJ Waskiewicz
  0 siblings, 0 replies; 20+ messages in thread
From: PJ Waskiewicz @ 2007-06-28 16:20 UTC (permalink / raw)
  To: davem; +Cc: netdev, jeff, auke-jan.h.kok, hadi, kaber

Please consider these patches for 2.6.23 inclusion.

Updates since the last submission:

1. Fixed alloc_netdev_mq() queue_count bug.

2. Fixed the TCA_PRIO_MQ options layout.

3. Protected sch_prio and sch_rr multiqueue code with NET_SCH_MULTIQUEUE.

4. Added RTA_{GET|PUT}_FLAG in place of RTA_DATA for passing multiqueue
   options to and from the qdisc.

5. Allow sch_prio and sch_rr to take 0 bands when in multiqueue mode.  This
   will set q->bands to dev->egress_subqueue_count; added this also to the
   kernel doc.
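
As an illustration of item 5, the band-count handling might look roughly
like the sketch below.  It is a userspace approximation under
assumptions: the function name is hypothetical, and the rules that a
non-zero band count must still match the device's subqueue count in
multiqueue mode and that a band count below 1 is rejected otherwise are
assumed for the example, not taken from the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Resolve the number of bands to use.  In multiqueue mode a request of
 * 0 bands means "use the device's subqueue count".  Returns 0 and
 * stores the result in *bands_out, or -EINVAL on a bad request.
 */
static int resolve_bands_sketch(int requested, int egress_subqueue_count,
				bool multiqueue, int *bands_out)
{
	assert(bands_out != NULL);
	if (multiqueue) {
		if (requested == 0)
			requested = egress_subqueue_count;
		else if (requested != egress_subqueue_count)
			return -EINVAL;	/* assumed: must match hardware */
	}
	if (requested < 1)
		return -EINVAL;		/* assumed lower bound */
	*bands_out = requested;
	return 0;
}
```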

This patchset is an updated version of previous multiqueue network device
support patches.  The general approach of introducing a new API for multiqueue
network devices to register with the stack has remained.  The changes include
adding a round-robin qdisc, heavily based on sch_prio, which will allow
queueing to hardware with no OS-enforced queuing policy.  sch_prio still has
the multiqueue code in it, but has a Kconfig option to compile it out of the
qdisc.  This allows people with hardware containing scheduling policies to
use sch_rr (round-robin), and others without scheduling policies in hardware
to continue using sch_prio if they wish to have some notion of scheduling
priority.

The patches being sent are split into Documentation, Qdisc changes, and
core stack changes.

The patches to iproute2 for tc will be sent separately, to support sch_rr.

-- 
PJ Waskiewicz <peter.p.waskiewicz.jr@intel.com>


end of thread, other threads:[~2007-06-28 16:21 UTC | newest]

Thread overview: 20+ messages
2007-06-21 21:26 [PATCH] NET: Multiple queue hardware support PJ Waskiewicz
2007-06-21 21:26 ` [PATCH 1/3] NET: [DOC] Multiqueue hardware support documentation PJ Waskiewicz
2007-06-21 21:26 ` [PATCH 2/3] NET: [CORE] Stack changes to add multiqueue hardware support API PJ Waskiewicz
2007-06-21 21:26 ` [PATCH 3/3] NET: [SCHED] Qdisc changes and sch_rr added for multiqueue PJ Waskiewicz
2007-06-21 23:47   ` Patrick McHardy
2007-06-22  0:01     ` Waskiewicz Jr, Peter P
2007-06-22  0:26       ` Patrick McHardy
2007-06-22 18:00     ` Waskiewicz Jr, Peter P
2007-06-22 18:42       ` Patrick McHardy
2007-06-22 18:44         ` Patrick McHardy
2007-06-22 18:53         ` Patrick McHardy
2007-06-22 21:03           ` Waskiewicz Jr, Peter P
2007-06-21 21:31 ` [PATCH] NET: Multiple queue hardware support Patrick McHardy
2007-06-21 23:27   ` Waskiewicz Jr, Peter P
  -- strict thread matches above, loose matches on Subject: below --
2007-06-28 16:20 PJ Waskiewicz
2007-06-23 21:36 PJ Waskiewicz
2007-06-20 14:58 Jan-Bernd Themann
2007-06-20 17:21 ` Waskiewicz Jr, Peter P
2007-06-20 21:51 ` David Miller
2007-06-18 18:42 PJ Waskiewicz
