* [PATCH V6 net-next 0/5] xen-net{back, front}: Multiple transmit and receive queues
@ 2014-03-03 11:47 Andrew J. Bennieston
  2014-03-03 11:47 ` [PATCH V6 net-next 1/5] xen-netback: Factor queue-specific data into queue struct Andrew J. Bennieston
                   ` (8 more replies)
  0 siblings, 9 replies; 21+ messages in thread
From: Andrew J. Bennieston @ 2014-03-03 11:47 UTC (permalink / raw)
  To: xen-devel; +Cc: netdev, paul.durrant, wei.liu2, ian.campbell, david.vrabel


This patch series implements multiple transmit and receive queues (i.e.
multiple shared rings) for the xen virtual network interfaces.

The series is split up as follows:
 - Patches 1 and 3 factor out the queue-specific data for netback and
    netfront respectively, and modify the rest of the code to use these
    as appropriate.
 - Patches 2 and 4 introduce new XenStore keys to negotiate and use
   multiple shared rings and event channels, and code to connect these
   as appropriate.
 - Patch 5 documents the XenStore keys required for the new feature
   in include/xen/interface/io/netif.h

All other transmit and receive processing remains unchanged; it is
simply replicated per queue, i.e. there is one kthread and one NAPI
context per queue.

The performance of these patches has been analysed in detail, with
results available at:

http://wiki.xenproject.org/wiki/Xen-netback_and_xen-netfront_multi-queue_performance_testing

To summarise:
  * Using multiple queues allows a VM to transmit at line rate on a 10
    Gbit/s NIC, compared with a maximum aggregate throughput of 6 Gbit/s
    with a single queue.
  * For intra-host VM--VM traffic, eight queues provide 171% of the
    throughput of a single queue; almost 12 Gbit/s instead of 6 Gbit/s.
  * There is a corresponding increase in total CPU usage, i.e. this is a
    scaling out over available resources, not an efficiency improvement.
  * Results depend on the availability of sufficient CPUs, as well as the
    distribution of interrupts and the distribution of TCP streams across
    the queues.

Queue selection is currently achieved via an L4 hash on the packet (i.e.
TCP src/dst port, IP src/dst address) and is not negotiated between the
frontend and backend, since only one option exists. Future patches to
support other frontends (particularly Windows) will need to add a
mechanism to negotiate not only which hash algorithm is used, but also
to let the frontend specify parameters for it.

Note that queue selection is a decision by the transmitting system about
which queue to use for a particular packet. In general, the algorithm
may differ between the frontend and the backend with no adverse effects.
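
For reference, patch 1 below maps the L4 hash to a queue index in
xenvif_select_queue() by scaling the 32-bit hash into [0, num_queues)
with a multiply and shift rather than a modulo. A minimal standalone
sketch of that mapping (the function name and types here are
illustrative, not part of the patch):

  #include <stdint.h>

  /* Scale a 32-bit hash into [0, num_queues) without a modulo:
   * treat the hash as a fraction of 2^32 and multiply by num_queues.
   */
  static uint16_t hash_to_queue(uint32_t hash, unsigned int num_queues)
  {
          if (num_queues <= 1)
                  return 0;
          return (uint16_t)(((uint64_t)hash * num_queues) >> 32);
  }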

Queue-specific XenStore entries for ring references and event channels
are stored hierarchically, i.e. under .../queue-N/..., where N ranges
from 0 to the requested number of queues minus one. If only one queue
is requested, the drivers fall back to the flat structure, in which the
ring references and event channels are written at the same level as the
other vif information.
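
As an illustration, with two queues and split event channels the
per-queue keys under the vif's XenStore directory would look like this
(paths abbreviated; the authoritative key names, including the key that
advertises the number of queues, are documented in patch 5's netif.h
changes):

  .../queue-0/tx-ring-ref
  .../queue-0/rx-ring-ref
  .../queue-0/event-channel-tx
  .../queue-0/event-channel-rx
  .../queue-1/tx-ring-ref
  .../queue-1/rx-ring-ref
  .../queue-1/event-channel-tx
  .../queue-1/event-channel-rx

With a single queue, the same keys are instead written directly in the
vif directory, exactly as in the existing protocol.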

V6:
- Use 'max_queues' as the module parameter name for both netback and
  netfront.

V5:
- Fix bug in xenvif_free() that could lead to an attempt to transmit an
  skb after the queue structures had been freed.
- Improve the XenStore protocol documentation in netif.h.
- Fix IRQ_NAME_SIZE double-accounting for null terminator.
- Move rx_gso_checksum_fixup stat into struct xenvif_stats (per-queue).
- Don't initialise a local variable that is set in both branches (xspath).

V4:
- Add MODULE_PARM_DESC() for the multi-queue parameters for netback
  and netfront modules.
- Move del_timer_sync() in netfront to after unregister_netdev, which
  restores the order in which these functions were called before applying
  these patches.

V3:
- Further indentation and style fixups.

V2:
- Rebase onto net-next.
- Change queue->number to queue->id.
- Add atomic operations around the small number of stats variables that
  are not queue-specific or per-cpu.
- Fixup formatting and style issues.
- XenStore protocol changes documented in netif.h.
- Default max. number of queues to num_online_cpus().
- Check requested number of queues does not exceed maximum.

--
Andrew J. Bennieston


* [PATCH V6 net-next 1/5] xen-netback: Factor queue-specific data into queue struct.
  2014-03-03 11:47 [PATCH V6 net-next 0/5] xen-net{back, front}: Multiple transmit and receive queues Andrew J. Bennieston
@ 2014-03-03 11:47 ` Andrew J. Bennieston
  2014-03-14 15:55   ` Ian Campbell
  2014-03-03 11:47 ` [PATCH V6 net-next 2/5] xen-netback: Add support for multiple queues Andrew J. Bennieston
                   ` (7 subsequent siblings)
  8 siblings, 1 reply; 21+ messages in thread
From: Andrew J. Bennieston @ 2014-03-03 11:47 UTC (permalink / raw)
  To: xen-devel
  Cc: wei.liu2, ian.campbell, netdev, paul.durrant, david.vrabel,
	Andrew J. Bennieston

From: "Andrew J. Bennieston" <andrew.bennieston@citrix.com>

In preparation for multi-queue support in xen-netback, move the
queue-specific data from struct xenvif into struct xenvif_queue, and
update the rest of the code to use this.

Also add loops over queues where appropriate, even though only one is
configured at this point, and use alloc_netdev_mq() and the
corresponding multi-queue netif wake/start/stop functions in preparation
for multiple active queues.

Finally, implement a trivial queue selection function suitable for
ndo_select_queue, which simply returns 0 for a single queue and uses
skb_get_hash() to compute the queue index otherwise.

Signed-off-by: Andrew J. Bennieston <andrew.bennieston@citrix.com>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
---
 drivers/net/xen-netback/common.h    |   85 ++++--
 drivers/net/xen-netback/interface.c |  329 ++++++++++++++--------
 drivers/net/xen-netback/netback.c   |  530 ++++++++++++++++++-----------------
 drivers/net/xen-netback/xenbus.c    |   87 ++++--
 4 files changed, 608 insertions(+), 423 deletions(-)

diff --git a/drivers/net/xen-netback/common.h b/drivers/net/xen-netback/common.h
index ae413a2..4176539 100644
--- a/drivers/net/xen-netback/common.h
+++ b/drivers/net/xen-netback/common.h
@@ -108,17 +108,39 @@ struct xenvif_rx_meta {
  */
 #define MAX_GRANT_COPY_OPS (MAX_SKB_FRAGS * XEN_NETIF_RX_RING_SIZE)
 
-struct xenvif {
-	/* Unique identifier for this interface. */
-	domid_t          domid;
-	unsigned int     handle;
+/* Queue name is interface name with "-qNNN" appended */
+#define QUEUE_NAME_SIZE (IFNAMSIZ + 6)
+
+/* IRQ name is queue name with "-tx" or "-rx" appended */
+#define IRQ_NAME_SIZE (QUEUE_NAME_SIZE + 3)
+
+struct xenvif;
+
+struct xenvif_stats {
+	/* Stats fields to be updated per-queue.
+	 * A subset of struct net_device_stats that contains only the
+	 * fields that are updated in netback.c for each queue.
+	 */
+	unsigned int rx_bytes;
+	unsigned int rx_packets;
+	unsigned int tx_bytes;
+	unsigned int tx_packets;
+
+	/* Additional stats used by xenvif */
+	unsigned long rx_gso_checksum_fixup;
+};
+
+struct xenvif_queue { /* Per-queue data for xenvif */
+	unsigned int id; /* Queue ID, 0-based */
+	char name[QUEUE_NAME_SIZE]; /* DEVNAME-qN */
+	struct xenvif *vif; /* Parent VIF */
 
 	/* Use NAPI for guest TX */
 	struct napi_struct napi;
 	/* When feature-split-event-channels = 0, tx_irq = rx_irq. */
 	unsigned int tx_irq;
 	/* Only used when feature-split-event-channels = 1 */
-	char tx_irq_name[IFNAMSIZ+4]; /* DEVNAME-tx */
+	char tx_irq_name[IRQ_NAME_SIZE]; /* DEVNAME-qN-tx */
 	struct xen_netif_tx_back_ring tx;
 	struct sk_buff_head tx_queue;
 	struct page *mmap_pages[MAX_PENDING_REQS];
@@ -140,19 +162,34 @@ struct xenvif {
 	/* When feature-split-event-channels = 0, tx_irq = rx_irq. */
 	unsigned int rx_irq;
 	/* Only used when feature-split-event-channels = 1 */
-	char rx_irq_name[IFNAMSIZ+4]; /* DEVNAME-rx */
+	char rx_irq_name[IRQ_NAME_SIZE]; /* DEVNAME-qN-rx */
 	struct xen_netif_rx_back_ring rx;
 	struct sk_buff_head rx_queue;
 	RING_IDX rx_last_skb_slots;
 
-	/* This array is allocated seperately as it is large */
-	struct gnttab_copy *grant_copy_op;
+	struct gnttab_copy grant_copy_op[MAX_GRANT_COPY_OPS];
 
 	/* We create one meta structure per ring request we consume, so
 	 * the maximum number is the same as the ring size.
 	 */
 	struct xenvif_rx_meta meta[XEN_NETIF_RX_RING_SIZE];
 
+	/* Transmit shaping: allow 'credit_bytes' every 'credit_usec'. */
+	unsigned long   credit_bytes;
+	unsigned long   credit_usec;
+	unsigned long   remaining_credit;
+	struct timer_list credit_timeout;
+	u64 credit_window_start;
+
+	/* Statistics */
+	struct xenvif_stats stats;
+};
+
+struct xenvif {
+	/* Unique identifier for this interface. */
+	domid_t          domid;
+	unsigned int     handle;
+
 	u8               fe_dev_addr[6];
 
 	/* Frontend feature information. */
@@ -166,15 +203,9 @@ struct xenvif {
 	/* Internal feature information. */
 	u8 can_queue:1;	    /* can queue packets for receiver? */
 
-	/* Transmit shaping: allow 'credit_bytes' every 'credit_usec'. */
-	unsigned long   credit_bytes;
-	unsigned long   credit_usec;
-	unsigned long   remaining_credit;
-	struct timer_list credit_timeout;
-	u64 credit_window_start;
-
-	/* Statistics */
-	unsigned long rx_gso_checksum_fixup;
+	/* Queues */
+	unsigned int num_queues;
+	struct xenvif_queue *queues;
 
 	/* Miscellaneous private stuff. */
 	struct net_device *dev;
@@ -189,7 +220,9 @@ struct xenvif *xenvif_alloc(struct device *parent,
 			    domid_t domid,
 			    unsigned int handle);
 
-int xenvif_connect(struct xenvif *vif, unsigned long tx_ring_ref,
+void xenvif_init_queue(struct xenvif_queue *queue);
+
+int xenvif_connect(struct xenvif_queue *queue, unsigned long tx_ring_ref,
 		   unsigned long rx_ring_ref, unsigned int tx_evtchn,
 		   unsigned int rx_evtchn);
 void xenvif_disconnect(struct xenvif *vif);
@@ -200,31 +233,31 @@ void xenvif_xenbus_fini(void);
 
 int xenvif_schedulable(struct xenvif *vif);
 
-int xenvif_must_stop_queue(struct xenvif *vif);
+int xenvif_must_stop_queue(struct xenvif_queue *queue);
 
 /* (Un)Map communication rings. */
-void xenvif_unmap_frontend_rings(struct xenvif *vif);
-int xenvif_map_frontend_rings(struct xenvif *vif,
+void xenvif_unmap_frontend_rings(struct xenvif_queue *queue);
+int xenvif_map_frontend_rings(struct xenvif_queue *queue,
 			      grant_ref_t tx_ring_ref,
 			      grant_ref_t rx_ring_ref);
 
 /* Check for SKBs from frontend and schedule backend processing */
-void xenvif_check_rx_xenvif(struct xenvif *vif);
+void xenvif_check_rx_xenvif(struct xenvif_queue *queue);
 
 /* Prevent the device from generating any further traffic. */
 void xenvif_carrier_off(struct xenvif *vif);
 
-int xenvif_tx_action(struct xenvif *vif, int budget);
+int xenvif_tx_action(struct xenvif_queue *queue, int budget);
 
 int xenvif_kthread(void *data);
-void xenvif_kick_thread(struct xenvif *vif);
+void xenvif_kick_thread(struct xenvif_queue *queue);
 
 /* Determine whether the needed number of slots (req) are available,
  * and set req_event if not.
  */
-bool xenvif_rx_ring_slots_available(struct xenvif *vif, int needed);
+bool xenvif_rx_ring_slots_available(struct xenvif_queue *queue, int needed);
 
-void xenvif_stop_queue(struct xenvif *vif);
+void xenvif_carrier_on(struct xenvif *vif);
 
 extern bool separate_tx_rx_irq;
 
diff --git a/drivers/net/xen-netback/interface.c b/drivers/net/xen-netback/interface.c
index 7669d49..0297980 100644
--- a/drivers/net/xen-netback/interface.c
+++ b/drivers/net/xen-netback/interface.c
@@ -34,7 +34,6 @@
 #include <linux/ethtool.h>
 #include <linux/rtnetlink.h>
 #include <linux/if_vlan.h>
-#include <linux/vmalloc.h>
 
 #include <xen/events.h>
 #include <asm/xen/hypercall.h>
@@ -42,6 +41,16 @@
 #define XENVIF_QUEUE_LENGTH 32
 #define XENVIF_NAPI_WEIGHT  64
 
+static inline void xenvif_stop_queue(struct xenvif_queue *queue)
+{
+	struct net_device *dev = queue->vif->dev;
+
+	if (!queue->vif->can_queue)
+		return;
+
+	netif_tx_stop_queue(netdev_get_tx_queue(dev, queue->id));
+}
+
 int xenvif_schedulable(struct xenvif *vif)
 {
 	return netif_running(vif->dev) && netif_carrier_ok(vif->dev);
@@ -49,20 +58,20 @@ int xenvif_schedulable(struct xenvif *vif)
 
 static irqreturn_t xenvif_tx_interrupt(int irq, void *dev_id)
 {
-	struct xenvif *vif = dev_id;
+	struct xenvif_queue *queue = dev_id;
 
-	if (RING_HAS_UNCONSUMED_REQUESTS(&vif->tx))
-		napi_schedule(&vif->napi);
+	if (RING_HAS_UNCONSUMED_REQUESTS(&queue->tx))
+		napi_schedule(&queue->napi);
 
 	return IRQ_HANDLED;
 }
 
-static int xenvif_poll(struct napi_struct *napi, int budget)
+int xenvif_poll(struct napi_struct *napi, int budget)
 {
-	struct xenvif *vif = container_of(napi, struct xenvif, napi);
+	struct xenvif_queue *queue = container_of(napi, struct xenvif_queue, napi);
 	int work_done;
 
-	work_done = xenvif_tx_action(vif, budget);
+	work_done = xenvif_tx_action(queue, budget);
 
 	if (work_done < budget) {
 		int more_to_do = 0;
@@ -86,7 +95,7 @@ static int xenvif_poll(struct napi_struct *napi, int budget)
 
 		local_irq_save(flags);
 
-		RING_FINAL_CHECK_FOR_REQUESTS(&vif->tx, more_to_do);
+		RING_FINAL_CHECK_FOR_REQUESTS(&queue->tx, more_to_do);
 		if (!more_to_do)
 			__napi_complete(napi);
 
@@ -98,9 +107,9 @@ static int xenvif_poll(struct napi_struct *napi, int budget)
 
 static irqreturn_t xenvif_rx_interrupt(int irq, void *dev_id)
 {
-	struct xenvif *vif = dev_id;
+	struct xenvif_queue *queue = dev_id;
 
-	xenvif_kick_thread(vif);
+	xenvif_kick_thread(queue);
 
 	return IRQ_HANDLED;
 }
@@ -113,15 +122,48 @@ static irqreturn_t xenvif_interrupt(int irq, void *dev_id)
 	return IRQ_HANDLED;
 }
 
+static u16 xenvif_select_queue(struct net_device *dev, struct sk_buff *skb,
+			       void *accel_priv, select_queue_fallback_t fallback)
+{
+	struct xenvif *vif = netdev_priv(dev);
+	u32 hash;
+	u16 queue_index;
+
+	/* First, check if there is only one queue to optimise the
+	 * single-queue or old frontend scenario.
+	 */
+	if (vif->num_queues == 1) {
+		queue_index = 0;
+	} else {
+		/* Use skb_get_hash to obtain an L4 hash if available */
+		hash = skb_get_hash(skb);
+		queue_index = (u16) (((u64)hash * vif->num_queues) >> 32);
+	}
+
+	return queue_index;
+}
+
 static int xenvif_start_xmit(struct sk_buff *skb, struct net_device *dev)
 {
 	struct xenvif *vif = netdev_priv(dev);
+	struct xenvif_queue *queue = NULL;
+	u16 index;
 	int min_slots_needed;
 
 	BUG_ON(skb->dev != dev);
 
+	/* Drop the packet if queues are not set up */
+	if (vif->num_queues < 1)
+		goto drop;
+
+	/* Obtain the queue to be used to transmit this packet */
+	index = skb_get_queue_mapping(skb);
+	if (index >= vif->num_queues)
+		index = 0; /* Fall back to queue 0 if out of range */
+	queue = &vif->queues[index];
+
 	/* Drop the packet if vif is not ready */
-	if (vif->task == NULL || !xenvif_schedulable(vif))
+	if (queue->task == NULL || !xenvif_schedulable(vif))
 		goto drop;
 
 	/* At best we'll need one slot for the header and one for each
@@ -140,11 +182,11 @@ static int xenvif_start_xmit(struct sk_buff *skb, struct net_device *dev)
 	 * then turn off the queue to give the ring a chance to
 	 * drain.
 	 */
-	if (!xenvif_rx_ring_slots_available(vif, min_slots_needed))
-		xenvif_stop_queue(vif);
+	if (!xenvif_rx_ring_slots_available(queue, min_slots_needed))
+		xenvif_stop_queue(queue);
 
-	skb_queue_tail(&vif->rx_queue, skb);
-	xenvif_kick_thread(vif);
+	skb_queue_tail(&queue->rx_queue, skb);
+	xenvif_kick_thread(queue);
 
 	return NETDEV_TX_OK;
 
@@ -157,25 +199,58 @@ static int xenvif_start_xmit(struct sk_buff *skb, struct net_device *dev)
 static struct net_device_stats *xenvif_get_stats(struct net_device *dev)
 {
 	struct xenvif *vif = netdev_priv(dev);
+	struct xenvif_queue *queue = NULL;
+	unsigned long rx_bytes = 0;
+	unsigned long rx_packets = 0;
+	unsigned long tx_bytes = 0;
+	unsigned long tx_packets = 0;
+	unsigned int index;
+
+	/* Aggregate tx and rx stats from each queue */
+	for (index = 0; index < vif->num_queues; ++index) {
+		queue = &vif->queues[index];
+		rx_bytes += queue->stats.rx_bytes;
+		rx_packets += queue->stats.rx_packets;
+		tx_bytes += queue->stats.tx_bytes;
+		tx_packets += queue->stats.tx_packets;
+	}
+
+	vif->dev->stats.rx_bytes = rx_bytes;
+	vif->dev->stats.rx_packets = rx_packets;
+	vif->dev->stats.tx_bytes = tx_bytes;
+	vif->dev->stats.tx_packets = tx_packets;
+
 	return &vif->dev->stats;
 }
 
 static void xenvif_up(struct xenvif *vif)
 {
-	napi_enable(&vif->napi);
-	enable_irq(vif->tx_irq);
-	if (vif->tx_irq != vif->rx_irq)
-		enable_irq(vif->rx_irq);
-	xenvif_check_rx_xenvif(vif);
+	struct xenvif_queue *queue = NULL;
+	unsigned int queue_index;
+
+	for (queue_index = 0; queue_index < vif->num_queues; ++queue_index) {
+		queue = &vif->queues[queue_index];
+		napi_enable(&queue->napi);
+		enable_irq(queue->tx_irq);
+		if (queue->tx_irq != queue->rx_irq)
+			enable_irq(queue->rx_irq);
+		xenvif_check_rx_xenvif(queue);
+	}
 }
 
 static void xenvif_down(struct xenvif *vif)
 {
-	napi_disable(&vif->napi);
-	disable_irq(vif->tx_irq);
-	if (vif->tx_irq != vif->rx_irq)
-		disable_irq(vif->rx_irq);
-	del_timer_sync(&vif->credit_timeout);
+	struct xenvif_queue *queue = NULL;
+	unsigned int queue_index;
+
+	for (queue_index = 0; queue_index < vif->num_queues; ++queue_index) {
+		queue = &vif->queues[queue_index];
+		napi_disable(&queue->napi);
+		disable_irq(queue->tx_irq);
+		if (queue->tx_irq != queue->rx_irq)
+			disable_irq(queue->rx_irq);
+		del_timer_sync(&queue->credit_timeout);
+	}
 }
 
 static int xenvif_open(struct net_device *dev)
@@ -183,7 +258,7 @@ static int xenvif_open(struct net_device *dev)
 	struct xenvif *vif = netdev_priv(dev);
 	if (netif_carrier_ok(dev))
 		xenvif_up(vif);
-	netif_start_queue(dev);
+	netif_tx_start_all_queues(dev);
 	return 0;
 }
 
@@ -192,7 +267,7 @@ static int xenvif_close(struct net_device *dev)
 	struct xenvif *vif = netdev_priv(dev);
 	if (netif_carrier_ok(dev))
 		xenvif_down(vif);
-	netif_stop_queue(dev);
+	netif_tx_stop_all_queues(dev);
 	return 0;
 }
 
@@ -232,7 +307,7 @@ static const struct xenvif_stat {
 } xenvif_stats[] = {
 	{
 		"rx_gso_checksum_fixup",
-		offsetof(struct xenvif, rx_gso_checksum_fixup)
+		offsetof(struct xenvif_stats, rx_gso_checksum_fixup)
 	},
 };
 
@@ -249,11 +324,19 @@ static int xenvif_get_sset_count(struct net_device *dev, int string_set)
 static void xenvif_get_ethtool_stats(struct net_device *dev,
 				     struct ethtool_stats *stats, u64 * data)
 {
-	void *vif = netdev_priv(dev);
+	struct xenvif *vif = netdev_priv(dev);
 	int i;
-
-	for (i = 0; i < ARRAY_SIZE(xenvif_stats); i++)
-		data[i] = *(unsigned long *)(vif + xenvif_stats[i].offset);
+	unsigned int queue_index;
+	struct xenvif_stats *vif_stats;
+
+	for (i = 0; i < ARRAY_SIZE(xenvif_stats); i++) {
+		unsigned long accum = 0;
+		for (queue_index = 0; queue_index < vif->num_queues; ++queue_index) {
+			vif_stats = &vif->queues[queue_index].stats;
+			accum += *(unsigned long *)(vif_stats + xenvif_stats[i].offset);
+		}
+		data[i] = accum;
+	}
 }
 
 static void xenvif_get_strings(struct net_device *dev, u32 stringset, u8 * data)
@@ -286,6 +369,7 @@ static const struct net_device_ops xenvif_netdev_ops = {
 	.ndo_fix_features = xenvif_fix_features,
 	.ndo_set_mac_address = eth_mac_addr,
 	.ndo_validate_addr   = eth_validate_addr,
+	.ndo_select_queue = xenvif_select_queue,
 };
 
 struct xenvif *xenvif_alloc(struct device *parent, domid_t domid,
@@ -295,10 +379,9 @@ struct xenvif *xenvif_alloc(struct device *parent, domid_t domid,
 	struct net_device *dev;
 	struct xenvif *vif;
 	char name[IFNAMSIZ] = {};
-	int i;
 
 	snprintf(name, IFNAMSIZ - 1, "vif%u.%u", domid, handle);
-	dev = alloc_netdev(sizeof(struct xenvif), name, ether_setup);
+	dev = alloc_netdev_mq(sizeof(struct xenvif), name, ether_setup, 1);
 	if (dev == NULL) {
 		pr_warn("Could not allocate netdev for %s\n", name);
 		return ERR_PTR(-ENOMEM);
@@ -308,24 +391,15 @@ struct xenvif *xenvif_alloc(struct device *parent, domid_t domid,
 
 	vif = netdev_priv(dev);
 
-	vif->grant_copy_op = vmalloc(sizeof(struct gnttab_copy) *
-				     MAX_GRANT_COPY_OPS);
-	if (vif->grant_copy_op == NULL) {
-		pr_warn("Could not allocate grant copy space for %s\n", name);
-		free_netdev(dev);
-		return ERR_PTR(-ENOMEM);
-	}
-
 	vif->domid  = domid;
 	vif->handle = handle;
 	vif->can_sg = 1;
 	vif->ip_csum = 1;
 	vif->dev = dev;
 
-	vif->credit_bytes = vif->remaining_credit = ~0UL;
-	vif->credit_usec  = 0UL;
-	init_timer(&vif->credit_timeout);
-	vif->credit_window_start = get_jiffies_64();
+	/* Start out with no queues */
+	vif->num_queues = 0;
+	vif->queues = NULL;
 
 	dev->netdev_ops	= &xenvif_netdev_ops;
 	dev->hw_features = NETIF_F_SG |
@@ -336,16 +410,6 @@ struct xenvif *xenvif_alloc(struct device *parent, domid_t domid,
 
 	dev->tx_queue_len = XENVIF_QUEUE_LENGTH;
 
-	skb_queue_head_init(&vif->rx_queue);
-	skb_queue_head_init(&vif->tx_queue);
-
-	vif->pending_cons = 0;
-	vif->pending_prod = MAX_PENDING_REQS;
-	for (i = 0; i < MAX_PENDING_REQS; i++)
-		vif->pending_ring[i] = i;
-	for (i = 0; i < MAX_PENDING_REQS; i++)
-		vif->mmap_pages[i] = NULL;
-
 	/*
 	 * Initialise a dummy MAC address. We choose the numerically
 	 * largest non-broadcast address to prevent the address getting
@@ -355,8 +419,6 @@ struct xenvif *xenvif_alloc(struct device *parent, domid_t domid,
 	memset(dev->dev_addr, 0xFF, ETH_ALEN);
 	dev->dev_addr[0] &= ~0x01;
 
-	netif_napi_add(dev, &vif->napi, xenvif_poll, XENVIF_NAPI_WEIGHT);
-
 	netif_carrier_off(dev);
 
 	err = register_netdev(dev);
@@ -373,85 +435,111 @@ struct xenvif *xenvif_alloc(struct device *parent, domid_t domid,
 	return vif;
 }
 
-int xenvif_connect(struct xenvif *vif, unsigned long tx_ring_ref,
+void xenvif_init_queue(struct xenvif_queue *queue)
+{
+	int i;
+
+	queue->credit_bytes = queue->remaining_credit = ~0UL;
+	queue->credit_usec  = 0UL;
+	init_timer(&queue->credit_timeout);
+	queue->credit_window_start = get_jiffies_64();
+
+	skb_queue_head_init(&queue->rx_queue);
+	skb_queue_head_init(&queue->tx_queue);
+
+	queue->pending_cons = 0;
+	queue->pending_prod = MAX_PENDING_REQS;
+	for (i = 0; i < MAX_PENDING_REQS; ++i) {
+		queue->pending_ring[i] = i;
+		queue->mmap_pages[i] = NULL;
+	}
+
+	netif_napi_add(queue->vif->dev, &queue->napi, xenvif_poll,
+			XENVIF_NAPI_WEIGHT);
+}
+
+void xenvif_carrier_on(struct xenvif *vif)
+{
+	rtnl_lock();
+	if (!vif->can_sg && vif->dev->mtu > ETH_DATA_LEN)
+		dev_set_mtu(vif->dev, ETH_DATA_LEN);
+	netdev_update_features(vif->dev);
+	netif_carrier_on(vif->dev);
+	if (netif_running(vif->dev))
+		xenvif_up(vif);
+	rtnl_unlock();
+}
+
+int xenvif_connect(struct xenvif_queue *queue, unsigned long tx_ring_ref,
 		   unsigned long rx_ring_ref, unsigned int tx_evtchn,
 		   unsigned int rx_evtchn)
 {
 	struct task_struct *task;
 	int err = -ENOMEM;
 
-	BUG_ON(vif->tx_irq);
-	BUG_ON(vif->task);
+	BUG_ON(queue->tx_irq);
+	BUG_ON(queue->task);
 
-	err = xenvif_map_frontend_rings(vif, tx_ring_ref, rx_ring_ref);
+	err = xenvif_map_frontend_rings(queue, tx_ring_ref, rx_ring_ref);
 	if (err < 0)
 		goto err;
 
-	init_waitqueue_head(&vif->wq);
+	init_waitqueue_head(&queue->wq);
 
 	if (tx_evtchn == rx_evtchn) {
 		/* feature-split-event-channels == 0 */
 		err = bind_interdomain_evtchn_to_irqhandler(
-			vif->domid, tx_evtchn, xenvif_interrupt, 0,
-			vif->dev->name, vif);
+			queue->vif->domid, tx_evtchn, xenvif_interrupt, 0,
+			queue->name, queue);
 		if (err < 0)
 			goto err_unmap;
-		vif->tx_irq = vif->rx_irq = err;
-		disable_irq(vif->tx_irq);
+		queue->tx_irq = queue->rx_irq = err;
+		disable_irq(queue->tx_irq);
 	} else {
 		/* feature-split-event-channels == 1 */
-		snprintf(vif->tx_irq_name, sizeof(vif->tx_irq_name),
-			 "%s-tx", vif->dev->name);
+		snprintf(queue->tx_irq_name, sizeof(queue->tx_irq_name),
+			 "%s-tx", queue->name);
 		err = bind_interdomain_evtchn_to_irqhandler(
-			vif->domid, tx_evtchn, xenvif_tx_interrupt, 0,
-			vif->tx_irq_name, vif);
+			queue->vif->domid, tx_evtchn, xenvif_tx_interrupt, 0,
+			queue->tx_irq_name, queue);
 		if (err < 0)
 			goto err_unmap;
-		vif->tx_irq = err;
-		disable_irq(vif->tx_irq);
+		queue->tx_irq = err;
+		disable_irq(queue->tx_irq);
 
-		snprintf(vif->rx_irq_name, sizeof(vif->rx_irq_name),
-			 "%s-rx", vif->dev->name);
+		snprintf(queue->rx_irq_name, sizeof(queue->rx_irq_name),
+			 "%s-rx", queue->name);
 		err = bind_interdomain_evtchn_to_irqhandler(
-			vif->domid, rx_evtchn, xenvif_rx_interrupt, 0,
-			vif->rx_irq_name, vif);
+			queue->vif->domid, rx_evtchn, xenvif_rx_interrupt, 0,
+			queue->rx_irq_name, queue);
 		if (err < 0)
 			goto err_tx_unbind;
-		vif->rx_irq = err;
-		disable_irq(vif->rx_irq);
+		queue->rx_irq = err;
+		disable_irq(queue->rx_irq);
 	}
 
 	task = kthread_create(xenvif_kthread,
-			      (void *)vif, "%s", vif->dev->name);
+			      (void *)queue, "%s", queue->name);
 	if (IS_ERR(task)) {
-		pr_warn("Could not allocate kthread for %s\n", vif->dev->name);
+		pr_warn("Could not allocate kthread for %s\n", queue->name);
 		err = PTR_ERR(task);
 		goto err_rx_unbind;
 	}
 
-	vif->task = task;
+	queue->task = task;
 
-	rtnl_lock();
-	if (!vif->can_sg && vif->dev->mtu > ETH_DATA_LEN)
-		dev_set_mtu(vif->dev, ETH_DATA_LEN);
-	netdev_update_features(vif->dev);
-	netif_carrier_on(vif->dev);
-	if (netif_running(vif->dev))
-		xenvif_up(vif);
-	rtnl_unlock();
-
-	wake_up_process(vif->task);
+	wake_up_process(queue->task);
 
 	return 0;
 
 err_rx_unbind:
-	unbind_from_irqhandler(vif->rx_irq, vif);
-	vif->rx_irq = 0;
+	unbind_from_irqhandler(queue->rx_irq, queue);
+	queue->rx_irq = 0;
 err_tx_unbind:
-	unbind_from_irqhandler(vif->tx_irq, vif);
-	vif->tx_irq = 0;
+	unbind_from_irqhandler(queue->tx_irq, queue);
+	queue->tx_irq = 0;
 err_unmap:
-	xenvif_unmap_frontend_rings(vif);
+	xenvif_unmap_frontend_rings(queue);
 err:
 	module_put(THIS_MODULE);
 	return err;
@@ -470,34 +558,53 @@ void xenvif_carrier_off(struct xenvif *vif)
 
 void xenvif_disconnect(struct xenvif *vif)
 {
+	struct xenvif_queue *queue = NULL;
+	unsigned int queue_index;
+
 	if (netif_carrier_ok(vif->dev))
 		xenvif_carrier_off(vif);
 
-	if (vif->task) {
-		kthread_stop(vif->task);
-		vif->task = NULL;
-	}
+	for (queue_index = 0; queue_index < vif->num_queues; ++queue_index) {
+		queue = &vif->queues[queue_index];
 
-	if (vif->tx_irq) {
-		if (vif->tx_irq == vif->rx_irq)
-			unbind_from_irqhandler(vif->tx_irq, vif);
-		else {
-			unbind_from_irqhandler(vif->tx_irq, vif);
-			unbind_from_irqhandler(vif->rx_irq, vif);
+		if (queue->task) {
+			kthread_stop(queue->task);
+			queue->task = NULL;
 		}
-		vif->tx_irq = 0;
+
+		if (queue->tx_irq) {
+			if (queue->tx_irq == queue->rx_irq)
+				unbind_from_irqhandler(queue->tx_irq, queue);
+			else {
+				unbind_from_irqhandler(queue->tx_irq, queue);
+				unbind_from_irqhandler(queue->rx_irq, queue);
+			}
+			queue->tx_irq = 0;
+		}
+
+		xenvif_unmap_frontend_rings(queue);
 	}
 
-	xenvif_unmap_frontend_rings(vif);
+
 }
 
 void xenvif_free(struct xenvif *vif)
 {
-	netif_napi_del(&vif->napi);
+	struct xenvif_queue *queue = NULL;
+	unsigned int queue_index;
 
 	unregister_netdev(vif->dev);
 
-	vfree(vif->grant_copy_op);
+	for (queue_index = 0; queue_index < vif->num_queues; ++queue_index) {
+		queue = &vif->queues[queue_index];
+		netif_napi_del(&queue->napi);
+	}
+
+	/* Free the array of queues */
+	vif->num_queues = 0;
+	vfree(vif->queues);
+	vif->queues = NULL;
+
 	free_netdev(vif->dev);
 
 	module_put(THIS_MODULE);
diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
index e5284bc..a32abd6 100644
--- a/drivers/net/xen-netback/netback.c
+++ b/drivers/net/xen-netback/netback.c
@@ -75,38 +75,38 @@ module_param(fatal_skb_slots, uint, 0444);
  * one or more merged tx requests, otherwise it is the continuation of
  * previous tx request.
  */
-static inline int pending_tx_is_head(struct xenvif *vif, RING_IDX idx)
+static inline int pending_tx_is_head(struct xenvif_queue *queue, RING_IDX idx)
 {
-	return vif->pending_tx_info[idx].head != INVALID_PENDING_RING_IDX;
+	return queue->pending_tx_info[idx].head != INVALID_PENDING_RING_IDX;
 }
 
-static void xenvif_idx_release(struct xenvif *vif, u16 pending_idx,
+static void xenvif_idx_release(struct xenvif_queue *queue, u16 pending_idx,
 			       u8 status);
 
-static void make_tx_response(struct xenvif *vif,
+static void make_tx_response(struct xenvif_queue *queue,
 			     struct xen_netif_tx_request *txp,
 			     s8       st);
 
-static inline int tx_work_todo(struct xenvif *vif);
-static inline int rx_work_todo(struct xenvif *vif);
+static inline int tx_work_todo(struct xenvif_queue *queue);
+static inline int rx_work_todo(struct xenvif_queue *queue);
 
-static struct xen_netif_rx_response *make_rx_response(struct xenvif *vif,
+static struct xen_netif_rx_response *make_rx_response(struct xenvif_queue *queue,
 					     u16      id,
 					     s8       st,
 					     u16      offset,
 					     u16      size,
 					     u16      flags);
 
-static inline unsigned long idx_to_pfn(struct xenvif *vif,
+static inline unsigned long idx_to_pfn(struct xenvif_queue *queue,
 				       u16 idx)
 {
-	return page_to_pfn(vif->mmap_pages[idx]);
+	return page_to_pfn(queue->mmap_pages[idx]);
 }
 
-static inline unsigned long idx_to_kaddr(struct xenvif *vif,
+static inline unsigned long idx_to_kaddr(struct xenvif_queue *queue,
 					 u16 idx)
 {
-	return (unsigned long)pfn_to_kaddr(idx_to_pfn(vif, idx));
+	return (unsigned long)pfn_to_kaddr(idx_to_pfn(queue, idx));
 }
 
 /* This is a miniumum size for the linear area to avoid lots of
@@ -131,30 +131,30 @@ static inline pending_ring_idx_t pending_index(unsigned i)
 	return i & (MAX_PENDING_REQS-1);
 }
 
-static inline pending_ring_idx_t nr_pending_reqs(struct xenvif *vif)
+static inline pending_ring_idx_t nr_pending_reqs(struct xenvif_queue *queue)
 {
 	return MAX_PENDING_REQS -
-		vif->pending_prod + vif->pending_cons;
+		queue->pending_prod + queue->pending_cons;
 }
 
-bool xenvif_rx_ring_slots_available(struct xenvif *vif, int needed)
+bool xenvif_rx_ring_slots_available(struct xenvif_queue *queue, int needed)
 {
 	RING_IDX prod, cons;
 
 	do {
-		prod = vif->rx.sring->req_prod;
-		cons = vif->rx.req_cons;
+		prod = queue->rx.sring->req_prod;
+		cons = queue->rx.req_cons;
 
 		if (prod - cons >= needed)
 			return true;
 
-		vif->rx.sring->req_event = prod + 1;
+		queue->rx.sring->req_event = prod + 1;
 
 		/* Make sure event is visible before we check prod
 		 * again.
 		 */
 		mb();
-	} while (vif->rx.sring->req_prod != prod);
+	} while (queue->rx.sring->req_prod != prod);
 
 	return false;
 }
@@ -208,13 +208,13 @@ struct netrx_pending_operations {
 	grant_ref_t copy_gref;
 };
 
-static struct xenvif_rx_meta *get_next_rx_buffer(struct xenvif *vif,
+static struct xenvif_rx_meta *get_next_rx_buffer(struct xenvif_queue *queue,
 						 struct netrx_pending_operations *npo)
 {
 	struct xenvif_rx_meta *meta;
 	struct xen_netif_rx_request *req;
 
-	req = RING_GET_REQUEST(&vif->rx, vif->rx.req_cons++);
+	req = RING_GET_REQUEST(&queue->rx, queue->rx.req_cons++);
 
 	meta = npo->meta + npo->meta_prod++;
 	meta->gso_type = XEN_NETIF_GSO_TYPE_NONE;
@@ -232,7 +232,7 @@ static struct xenvif_rx_meta *get_next_rx_buffer(struct xenvif *vif,
  * Set up the grant operations for this fragment. If it's a flipping
  * interface, we also set up the unmap request from here.
  */
-static void xenvif_gop_frag_copy(struct xenvif *vif, struct sk_buff *skb,
+static void xenvif_gop_frag_copy(struct xenvif_queue *queue, struct sk_buff *skb,
 				 struct netrx_pending_operations *npo,
 				 struct page *page, unsigned long size,
 				 unsigned long offset, int *head)
@@ -267,7 +267,7 @@ static void xenvif_gop_frag_copy(struct xenvif *vif, struct sk_buff *skb,
 			 */
 			BUG_ON(*head);
 
-			meta = get_next_rx_buffer(vif, npo);
+			meta = get_next_rx_buffer(queue, npo);
 		}
 
 		if (npo->copy_off + bytes > MAX_BUFFER_OFFSET)
@@ -281,7 +281,7 @@ static void xenvif_gop_frag_copy(struct xenvif *vif, struct sk_buff *skb,
 		copy_gop->source.u.gmfn = virt_to_mfn(page_address(page));
 		copy_gop->source.offset = offset;
 
-		copy_gop->dest.domid = vif->domid;
+		copy_gop->dest.domid = queue->vif->domid;
 		copy_gop->dest.offset = npo->copy_off;
 		copy_gop->dest.u.ref = npo->copy_gref;
 
@@ -306,8 +306,8 @@ static void xenvif_gop_frag_copy(struct xenvif *vif, struct sk_buff *skb,
 		else
 			gso_type = XEN_NETIF_GSO_TYPE_NONE;
 
-		if (*head && ((1 << gso_type) & vif->gso_mask))
-			vif->rx.req_cons++;
+		if (*head && ((1 << gso_type) & queue->vif->gso_mask))
+			queue->rx.req_cons++;
 
 		*head = 0; /* There must be something in this buffer now. */
 
@@ -327,7 +327,8 @@ static void xenvif_gop_frag_copy(struct xenvif *vif, struct sk_buff *skb,
  * frontend-side LRO).
  */
 static int xenvif_gop_skb(struct sk_buff *skb,
-			  struct netrx_pending_operations *npo)
+			  struct netrx_pending_operations *npo,
+			  struct xenvif_queue *queue)
 {
 	struct xenvif *vif = netdev_priv(skb->dev);
 	int nr_frags = skb_shinfo(skb)->nr_frags;
@@ -355,7 +356,7 @@ static int xenvif_gop_skb(struct sk_buff *skb,
 
 	/* Set up a GSO prefix descriptor, if necessary */
 	if ((1 << gso_type) & vif->gso_prefix_mask) {
-		req = RING_GET_REQUEST(&vif->rx, vif->rx.req_cons++);
+		req = RING_GET_REQUEST(&queue->rx, queue->rx.req_cons++);
 		meta = npo->meta + npo->meta_prod++;
 		meta->gso_type = gso_type;
 		meta->gso_size = gso_size;
@@ -363,7 +364,7 @@ static int xenvif_gop_skb(struct sk_buff *skb,
 		meta->id = req->id;
 	}
 
-	req = RING_GET_REQUEST(&vif->rx, vif->rx.req_cons++);
+	req = RING_GET_REQUEST(&queue->rx, queue->rx.req_cons++);
 	meta = npo->meta + npo->meta_prod++;
 
 	if ((1 << gso_type) & vif->gso_mask) {
@@ -387,13 +388,13 @@ static int xenvif_gop_skb(struct sk_buff *skb,
 		if (data + len > skb_tail_pointer(skb))
 			len = skb_tail_pointer(skb) - data;
 
-		xenvif_gop_frag_copy(vif, skb, npo,
+		xenvif_gop_frag_copy(queue, skb, npo,
 				     virt_to_page(data), len, offset, &head);
 		data += len;
 	}
 
 	for (i = 0; i < nr_frags; i++) {
-		xenvif_gop_frag_copy(vif, skb, npo,
+		xenvif_gop_frag_copy(queue, skb, npo,
 				     skb_frag_page(&skb_shinfo(skb)->frags[i]),
 				     skb_frag_size(&skb_shinfo(skb)->frags[i]),
 				     skb_shinfo(skb)->frags[i].page_offset,
@@ -429,7 +430,7 @@ static int xenvif_check_gop(struct xenvif *vif, int nr_meta_slots,
 	return status;
 }
 
-static void xenvif_add_frag_responses(struct xenvif *vif, int status,
+static void xenvif_add_frag_responses(struct xenvif_queue *queue, int status,
 				      struct xenvif_rx_meta *meta,
 				      int nr_meta_slots)
 {
@@ -450,7 +451,7 @@ static void xenvif_add_frag_responses(struct xenvif *vif, int status,
 			flags = XEN_NETRXF_more_data;
 
 		offset = 0;
-		make_rx_response(vif, meta[i].id, status, offset,
+		make_rx_response(queue, meta[i].id, status, offset,
 				 meta[i].size, flags);
 	}
 }
@@ -459,12 +460,12 @@ struct skb_cb_overlay {
 	int meta_slots_used;
 };
 
-void xenvif_kick_thread(struct xenvif *vif)
+void xenvif_kick_thread(struct xenvif_queue *queue)
 {
-	wake_up(&vif->wq);
+	wake_up(&queue->wq);
 }
 
-static void xenvif_rx_action(struct xenvif *vif)
+static void xenvif_rx_action(struct xenvif_queue *queue)
 {
 	s8 status;
 	u16 flags;
@@ -478,13 +479,13 @@ static void xenvif_rx_action(struct xenvif *vif)
 	bool need_to_notify = false;
 
 	struct netrx_pending_operations npo = {
-		.copy  = vif->grant_copy_op,
-		.meta  = vif->meta,
+		.copy  = queue->grant_copy_op,
+		.meta  = queue->meta,
 	};
 
 	skb_queue_head_init(&rxq);
 
-	while ((skb = skb_dequeue(&vif->rx_queue)) != NULL) {
+	while ((skb = skb_dequeue(&queue->rx_queue)) != NULL) {
 		RING_IDX max_slots_needed;
 		int i;
 
@@ -505,41 +506,41 @@ static void xenvif_rx_action(struct xenvif *vif)
 			max_slots_needed++;
 
 		/* If the skb may not fit then bail out now */
-		if (!xenvif_rx_ring_slots_available(vif, max_slots_needed)) {
-			skb_queue_head(&vif->rx_queue, skb);
+		if (!xenvif_rx_ring_slots_available(queue, max_slots_needed)) {
+			skb_queue_head(&queue->rx_queue, skb);
 			need_to_notify = true;
-			vif->rx_last_skb_slots = max_slots_needed;
+			queue->rx_last_skb_slots = max_slots_needed;
 			break;
 		} else
-			vif->rx_last_skb_slots = 0;
+			queue->rx_last_skb_slots = 0;
 
 		sco = (struct skb_cb_overlay *)skb->cb;
-		sco->meta_slots_used = xenvif_gop_skb(skb, &npo);
+		sco->meta_slots_used = xenvif_gop_skb(skb, &npo, queue);
 		BUG_ON(sco->meta_slots_used > max_slots_needed);
 
 		__skb_queue_tail(&rxq, skb);
 	}
 
-	BUG_ON(npo.meta_prod > ARRAY_SIZE(vif->meta));
+	BUG_ON(npo.meta_prod > ARRAY_SIZE(queue->meta));
 
 	if (!npo.copy_prod)
 		goto done;
 
 	BUG_ON(npo.copy_prod > MAX_GRANT_COPY_OPS);
-	gnttab_batch_copy(vif->grant_copy_op, npo.copy_prod);
+	gnttab_batch_copy(queue->grant_copy_op, npo.copy_prod);
 
 	while ((skb = __skb_dequeue(&rxq)) != NULL) {
 		sco = (struct skb_cb_overlay *)skb->cb;
 
-		if ((1 << vif->meta[npo.meta_cons].gso_type) &
-		    vif->gso_prefix_mask) {
-			resp = RING_GET_RESPONSE(&vif->rx,
-						 vif->rx.rsp_prod_pvt++);
+		if ((1 << queue->meta[npo.meta_cons].gso_type) &
+		    queue->vif->gso_prefix_mask) {
+			resp = RING_GET_RESPONSE(&queue->rx,
+						 queue->rx.rsp_prod_pvt++);
 
 			resp->flags = XEN_NETRXF_gso_prefix | XEN_NETRXF_more_data;
 
-			resp->offset = vif->meta[npo.meta_cons].gso_size;
-			resp->id = vif->meta[npo.meta_cons].id;
+			resp->offset = queue->meta[npo.meta_cons].gso_size;
+			resp->id = queue->meta[npo.meta_cons].id;
 			resp->status = sco->meta_slots_used;
 
 			npo.meta_cons++;
@@ -547,10 +548,10 @@ static void xenvif_rx_action(struct xenvif *vif)
 		}
 
 
-		vif->dev->stats.tx_bytes += skb->len;
-		vif->dev->stats.tx_packets++;
+		queue->stats.tx_bytes += skb->len;
+		queue->stats.tx_packets++;
 
-		status = xenvif_check_gop(vif, sco->meta_slots_used, &npo);
+		status = xenvif_check_gop(queue->vif, sco->meta_slots_used, &npo);
 
 		if (sco->meta_slots_used == 1)
 			flags = 0;
@@ -564,22 +565,22 @@ static void xenvif_rx_action(struct xenvif *vif)
 			flags |= XEN_NETRXF_data_validated;
 
 		offset = 0;
-		resp = make_rx_response(vif, vif->meta[npo.meta_cons].id,
+		resp = make_rx_response(queue, queue->meta[npo.meta_cons].id,
 					status, offset,
-					vif->meta[npo.meta_cons].size,
+					queue->meta[npo.meta_cons].size,
 					flags);
 
-		if ((1 << vif->meta[npo.meta_cons].gso_type) &
-		    vif->gso_mask) {
+		if ((1 << queue->meta[npo.meta_cons].gso_type) &
+		    queue->vif->gso_mask) {
 			struct xen_netif_extra_info *gso =
 				(struct xen_netif_extra_info *)
-				RING_GET_RESPONSE(&vif->rx,
-						  vif->rx.rsp_prod_pvt++);
+				RING_GET_RESPONSE(&queue->rx,
+						  queue->rx.rsp_prod_pvt++);
 
 			resp->flags |= XEN_NETRXF_extra_info;
 
-			gso->u.gso.type = vif->meta[npo.meta_cons].gso_type;
-			gso->u.gso.size = vif->meta[npo.meta_cons].gso_size;
+			gso->u.gso.type = queue->meta[npo.meta_cons].gso_type;
+			gso->u.gso.size = queue->meta[npo.meta_cons].gso_size;
 			gso->u.gso.pad = 0;
 			gso->u.gso.features = 0;
 
@@ -587,11 +588,11 @@ static void xenvif_rx_action(struct xenvif *vif)
 			gso->flags = 0;
 		}
 
-		xenvif_add_frag_responses(vif, status,
-					  vif->meta + npo.meta_cons + 1,
+		xenvif_add_frag_responses(queue, status,
+					  queue->meta + npo.meta_cons + 1,
 					  sco->meta_slots_used);
 
-		RING_PUSH_RESPONSES_AND_CHECK_NOTIFY(&vif->rx, ret);
+		RING_PUSH_RESPONSES_AND_CHECK_NOTIFY(&queue->rx, ret);
 
 		need_to_notify |= !!ret;
 
@@ -601,20 +602,20 @@ static void xenvif_rx_action(struct xenvif *vif)
 
 done:
 	if (need_to_notify)
-		notify_remote_via_irq(vif->rx_irq);
+		notify_remote_via_irq(queue->rx_irq);
 }
 
-void xenvif_check_rx_xenvif(struct xenvif *vif)
+void xenvif_check_rx_xenvif(struct xenvif_queue *queue)
 {
 	int more_to_do;
 
-	RING_FINAL_CHECK_FOR_REQUESTS(&vif->tx, more_to_do);
+	RING_FINAL_CHECK_FOR_REQUESTS(&queue->tx, more_to_do);
 
 	if (more_to_do)
-		napi_schedule(&vif->napi);
+		napi_schedule(&queue->napi);
 }
 
-static void tx_add_credit(struct xenvif *vif)
+static void tx_add_credit(struct xenvif_queue *queue)
 {
 	unsigned long max_burst, max_credit;
 
@@ -622,37 +623,37 @@ static void tx_add_credit(struct xenvif *vif)
 	 * Allow a burst big enough to transmit a jumbo packet of up to 128kB.
 	 * Otherwise the interface can seize up due to insufficient credit.
 	 */
-	max_burst = RING_GET_REQUEST(&vif->tx, vif->tx.req_cons)->size;
+	max_burst = RING_GET_REQUEST(&queue->tx, queue->tx.req_cons)->size;
 	max_burst = min(max_burst, 131072UL);
-	max_burst = max(max_burst, vif->credit_bytes);
+	max_burst = max(max_burst, queue->credit_bytes);
 
 	/* Take care that adding a new chunk of credit doesn't wrap to zero. */
-	max_credit = vif->remaining_credit + vif->credit_bytes;
-	if (max_credit < vif->remaining_credit)
+	max_credit = queue->remaining_credit + queue->credit_bytes;
+	if (max_credit < queue->remaining_credit)
 		max_credit = ULONG_MAX; /* wrapped: clamp to ULONG_MAX */
 
-	vif->remaining_credit = min(max_credit, max_burst);
+	queue->remaining_credit = min(max_credit, max_burst);
 }
 
 static void tx_credit_callback(unsigned long data)
 {
-	struct xenvif *vif = (struct xenvif *)data;
-	tx_add_credit(vif);
-	xenvif_check_rx_xenvif(vif);
+	struct xenvif_queue *queue = (struct xenvif_queue *)data;
+	tx_add_credit(queue);
+	xenvif_check_rx_xenvif(queue);
 }
 
-static void xenvif_tx_err(struct xenvif *vif,
+static void xenvif_tx_err(struct xenvif_queue *queue,
 			  struct xen_netif_tx_request *txp, RING_IDX end)
 {
-	RING_IDX cons = vif->tx.req_cons;
+	RING_IDX cons = queue->tx.req_cons;
 
 	do {
-		make_tx_response(vif, txp, XEN_NETIF_RSP_ERROR);
+		make_tx_response(queue, txp, XEN_NETIF_RSP_ERROR);
 		if (cons == end)
 			break;
-		txp = RING_GET_REQUEST(&vif->tx, cons++);
+		txp = RING_GET_REQUEST(&queue->tx, cons++);
 	} while (1);
-	vif->tx.req_cons = cons;
+	queue->tx.req_cons = cons;
 }
 
 static void xenvif_fatal_tx_err(struct xenvif *vif)
@@ -661,12 +662,12 @@ static void xenvif_fatal_tx_err(struct xenvif *vif)
 	xenvif_carrier_off(vif);
 }
 
-static int xenvif_count_requests(struct xenvif *vif,
+static int xenvif_count_requests(struct xenvif_queue *queue,
 				 struct xen_netif_tx_request *first,
 				 struct xen_netif_tx_request *txp,
 				 int work_to_do)
 {
-	RING_IDX cons = vif->tx.req_cons;
+	RING_IDX cons = queue->tx.req_cons;
 	int slots = 0;
 	int drop_err = 0;
 	int more_data;
@@ -678,10 +679,10 @@ static int xenvif_count_requests(struct xenvif *vif,
 		struct xen_netif_tx_request dropped_tx = { 0 };
 
 		if (slots >= work_to_do) {
-			netdev_err(vif->dev,
+			netdev_err(queue->vif->dev,
 				   "Asked for %d slots but exceeds this limit\n",
 				   work_to_do);
-			xenvif_fatal_tx_err(vif);
+			xenvif_fatal_tx_err(queue->vif);
 			return -ENODATA;
 		}
 
@@ -689,10 +690,10 @@ static int xenvif_count_requests(struct xenvif *vif,
 		 * considered malicious.
 		 */
 		if (unlikely(slots >= fatal_skb_slots)) {
-			netdev_err(vif->dev,
+			netdev_err(queue->vif->dev,
 				   "Malicious frontend using %d slots, threshold %u\n",
 				   slots, fatal_skb_slots);
-			xenvif_fatal_tx_err(vif);
+			xenvif_fatal_tx_err(queue->vif);
 			return -E2BIG;
 		}
 
@@ -705,7 +706,7 @@ static int xenvif_count_requests(struct xenvif *vif,
 		 */
 		if (!drop_err && slots >= XEN_NETBK_LEGACY_SLOTS_MAX) {
 			if (net_ratelimit())
-				netdev_dbg(vif->dev,
+				netdev_dbg(queue->vif->dev,
 					   "Too many slots (%d) exceeding limit (%d), dropping packet\n",
 					   slots, XEN_NETBK_LEGACY_SLOTS_MAX);
 			drop_err = -E2BIG;
@@ -714,7 +715,7 @@ static int xenvif_count_requests(struct xenvif *vif,
 		if (drop_err)
 			txp = &dropped_tx;
 
-		memcpy(txp, RING_GET_REQUEST(&vif->tx, cons + slots),
+		memcpy(txp, RING_GET_REQUEST(&queue->tx, cons + slots),
 		       sizeof(*txp));
 
 		/* If the guest submitted a frame >= 64 KiB then
@@ -728,7 +729,7 @@ static int xenvif_count_requests(struct xenvif *vif,
 		 */
 		if (!drop_err && txp->size > first->size) {
 			if (net_ratelimit())
-				netdev_dbg(vif->dev,
+				netdev_dbg(queue->vif->dev,
 					   "Invalid tx request, slot size %u > remaining size %u\n",
 					   txp->size, first->size);
 			drop_err = -EIO;
@@ -738,9 +739,9 @@ static int xenvif_count_requests(struct xenvif *vif,
 		slots++;
 
 		if (unlikely((txp->offset + txp->size) > PAGE_SIZE)) {
-			netdev_err(vif->dev, "Cross page boundary, txp->offset: %x, size: %u\n",
+			netdev_err(queue->vif->dev, "Cross page boundary, txp->offset: %x, size: %u\n",
 				 txp->offset, txp->size);
-			xenvif_fatal_tx_err(vif);
+			xenvif_fatal_tx_err(queue->vif);
 			return -EINVAL;
 		}
 
@@ -752,14 +753,14 @@ static int xenvif_count_requests(struct xenvif *vif,
 	} while (more_data);
 
 	if (drop_err) {
-		xenvif_tx_err(vif, first, cons + slots);
+		xenvif_tx_err(queue, first, cons + slots);
 		return drop_err;
 	}
 
 	return slots;
 }
 
-static struct page *xenvif_alloc_page(struct xenvif *vif,
+static struct page *xenvif_alloc_page(struct xenvif_queue *queue,
 				      u16 pending_idx)
 {
 	struct page *page;
@@ -767,12 +768,12 @@ static struct page *xenvif_alloc_page(struct xenvif *vif,
 	page = alloc_page(GFP_ATOMIC|__GFP_COLD);
 	if (!page)
 		return NULL;
-	vif->mmap_pages[pending_idx] = page;
+	queue->mmap_pages[pending_idx] = page;
 
 	return page;
 }
 
-static struct gnttab_copy *xenvif_get_requests(struct xenvif *vif,
+static struct gnttab_copy *xenvif_get_requests(struct xenvif_queue *queue,
 					       struct sk_buff *skb,
 					       struct xen_netif_tx_request *txp,
 					       struct gnttab_copy *gop)
@@ -803,7 +804,7 @@ static struct gnttab_copy *xenvif_get_requests(struct xenvif *vif,
 	for (shinfo->nr_frags = slot = start; slot < nr_slots;
 	     shinfo->nr_frags++) {
 		struct pending_tx_info *pending_tx_info =
-			vif->pending_tx_info;
+			queue->pending_tx_info;
 
 		page = alloc_page(GFP_ATOMIC|__GFP_COLD);
 		if (!page)
@@ -815,7 +816,7 @@ static struct gnttab_copy *xenvif_get_requests(struct xenvif *vif,
 			gop->flags = GNTCOPY_source_gref;
 
 			gop->source.u.ref = txp->gref;
-			gop->source.domid = vif->domid;
+			gop->source.domid = queue->vif->domid;
 			gop->source.offset = txp->offset;
 
 			gop->dest.domid = DOMID_SELF;
@@ -840,9 +841,9 @@ static struct gnttab_copy *xenvif_get_requests(struct xenvif *vif,
 				gop->len = txp->size;
 				dst_offset += gop->len;
 
-				index = pending_index(vif->pending_cons++);
+				index = pending_index(queue->pending_cons++);
 
-				pending_idx = vif->pending_ring[index];
+				pending_idx = queue->pending_ring[index];
 
 				memcpy(&pending_tx_info[pending_idx].req, txp,
 				       sizeof(*txp));
@@ -851,7 +852,7 @@ static struct gnttab_copy *xenvif_get_requests(struct xenvif *vif,
 				 * fields for head tx req will be set
 				 * to correct values after the loop.
 				 */
-				vif->mmap_pages[pending_idx] = (void *)(~0UL);
+				queue->mmap_pages[pending_idx] = (void *)(~0UL);
 				pending_tx_info[pending_idx].head =
 					INVALID_PENDING_RING_IDX;
 
@@ -871,7 +872,7 @@ static struct gnttab_copy *xenvif_get_requests(struct xenvif *vif,
 		first->req.offset = 0;
 		first->req.size = dst_offset;
 		first->head = start_idx;
-		vif->mmap_pages[head_idx] = page;
+		queue->mmap_pages[head_idx] = page;
 		frag_set_pending_idx(&frags[shinfo->nr_frags], head_idx);
 	}
 
@@ -881,18 +882,18 @@ static struct gnttab_copy *xenvif_get_requests(struct xenvif *vif,
 err:
 	/* Unwind, freeing all pages and sending error responses. */
 	while (shinfo->nr_frags-- > start) {
-		xenvif_idx_release(vif,
+		xenvif_idx_release(queue,
 				frag_get_pending_idx(&frags[shinfo->nr_frags]),
 				XEN_NETIF_RSP_ERROR);
 	}
 	/* The head too, if necessary. */
 	if (start)
-		xenvif_idx_release(vif, pending_idx, XEN_NETIF_RSP_ERROR);
+		xenvif_idx_release(queue, pending_idx, XEN_NETIF_RSP_ERROR);
 
 	return NULL;
 }
 
-static int xenvif_tx_check_gop(struct xenvif *vif,
+static int xenvif_tx_check_gop(struct xenvif_queue *queue,
 			       struct sk_buff *skb,
 			       struct gnttab_copy **gopp)
 {
@@ -907,7 +908,7 @@ static int xenvif_tx_check_gop(struct xenvif *vif,
 	/* Check status of header. */
 	err = gop->status;
 	if (unlikely(err))
-		xenvif_idx_release(vif, pending_idx, XEN_NETIF_RSP_ERROR);
+		xenvif_idx_release(queue, pending_idx, XEN_NETIF_RSP_ERROR);
 
 	/* Skip first skb fragment if it is on same page as header fragment. */
 	start = (frag_get_pending_idx(&shinfo->frags[0]) == pending_idx);
@@ -917,7 +918,7 @@ static int xenvif_tx_check_gop(struct xenvif *vif,
 		pending_ring_idx_t head;
 
 		pending_idx = frag_get_pending_idx(&shinfo->frags[i]);
-		tx_info = &vif->pending_tx_info[pending_idx];
+		tx_info = &queue->pending_tx_info[pending_idx];
 		head = tx_info->head;
 
 		/* Check error status: if okay then remember grant handle. */
@@ -925,19 +926,19 @@ static int xenvif_tx_check_gop(struct xenvif *vif,
 			newerr = (++gop)->status;
 			if (newerr)
 				break;
-			peek = vif->pending_ring[pending_index(++head)];
-		} while (!pending_tx_is_head(vif, peek));
+			peek = queue->pending_ring[pending_index(++head)];
+		} while (!pending_tx_is_head(queue, peek));
 
 		if (likely(!newerr)) {
 			/* Had a previous error? Invalidate this fragment. */
 			if (unlikely(err))
-				xenvif_idx_release(vif, pending_idx,
+				xenvif_idx_release(queue, pending_idx,
 						   XEN_NETIF_RSP_OKAY);
 			continue;
 		}
 
 		/* Error on this fragment: respond to client with an error. */
-		xenvif_idx_release(vif, pending_idx, XEN_NETIF_RSP_ERROR);
+		xenvif_idx_release(queue, pending_idx, XEN_NETIF_RSP_ERROR);
 
 		/* Not the first error? Preceding frags already invalidated. */
 		if (err)
@@ -945,10 +946,10 @@ static int xenvif_tx_check_gop(struct xenvif *vif,
 
 		/* First error: invalidate header and preceding fragments. */
 		pending_idx = *((u16 *)skb->data);
-		xenvif_idx_release(vif, pending_idx, XEN_NETIF_RSP_OKAY);
+		xenvif_idx_release(queue, pending_idx, XEN_NETIF_RSP_OKAY);
 		for (j = start; j < i; j++) {
 			pending_idx = frag_get_pending_idx(&shinfo->frags[j]);
-			xenvif_idx_release(vif, pending_idx,
+			xenvif_idx_release(queue, pending_idx,
 					   XEN_NETIF_RSP_OKAY);
 		}
 
@@ -960,7 +961,7 @@ static int xenvif_tx_check_gop(struct xenvif *vif,
 	return err;
 }
 
-static void xenvif_fill_frags(struct xenvif *vif, struct sk_buff *skb)
+static void xenvif_fill_frags(struct xenvif_queue *queue, struct sk_buff *skb)
 {
 	struct skb_shared_info *shinfo = skb_shinfo(skb);
 	int nr_frags = shinfo->nr_frags;
@@ -974,46 +975,46 @@ static void xenvif_fill_frags(struct xenvif *vif, struct sk_buff *skb)
 
 		pending_idx = frag_get_pending_idx(frag);
 
-		txp = &vif->pending_tx_info[pending_idx].req;
-		page = virt_to_page(idx_to_kaddr(vif, pending_idx));
+		txp = &queue->pending_tx_info[pending_idx].req;
+		page = virt_to_page(idx_to_kaddr(queue, pending_idx));
 		__skb_fill_page_desc(skb, i, page, txp->offset, txp->size);
 		skb->len += txp->size;
 		skb->data_len += txp->size;
 		skb->truesize += txp->size;
 
 		/* Take an extra reference to offset xenvif_idx_release */
-		get_page(vif->mmap_pages[pending_idx]);
-		xenvif_idx_release(vif, pending_idx, XEN_NETIF_RSP_OKAY);
+		get_page(queue->mmap_pages[pending_idx]);
+		xenvif_idx_release(queue, pending_idx, XEN_NETIF_RSP_OKAY);
 	}
 }
 
-static int xenvif_get_extras(struct xenvif *vif,
+static int xenvif_get_extras(struct xenvif_queue *queue,
 				struct xen_netif_extra_info *extras,
 				int work_to_do)
 {
 	struct xen_netif_extra_info extra;
-	RING_IDX cons = vif->tx.req_cons;
+	RING_IDX cons = queue->tx.req_cons;
 
 	do {
 		if (unlikely(work_to_do-- <= 0)) {
-			netdev_err(vif->dev, "Missing extra info\n");
-			xenvif_fatal_tx_err(vif);
+			netdev_err(queue->vif->dev, "Missing extra info\n");
+			xenvif_fatal_tx_err(queue->vif);
 			return -EBADR;
 		}
 
-		memcpy(&extra, RING_GET_REQUEST(&vif->tx, cons),
+		memcpy(&extra, RING_GET_REQUEST(&queue->tx, cons),
 		       sizeof(extra));
 		if (unlikely(!extra.type ||
 			     extra.type >= XEN_NETIF_EXTRA_TYPE_MAX)) {
-			vif->tx.req_cons = ++cons;
-			netdev_err(vif->dev,
+			queue->tx.req_cons = ++cons;
+			netdev_err(queue->vif->dev,
 				   "Invalid extra type: %d\n", extra.type);
-			xenvif_fatal_tx_err(vif);
+			xenvif_fatal_tx_err(queue->vif);
 			return -EINVAL;
 		}
 
 		memcpy(&extras[extra.type - 1], &extra, sizeof(extra));
-		vif->tx.req_cons = ++cons;
+		queue->tx.req_cons = ++cons;
 	} while (extra.flags & XEN_NETIF_EXTRA_FLAG_MORE);
 
 	return work_to_do;
@@ -1048,7 +1049,7 @@ static int xenvif_set_skb_gso(struct xenvif *vif,
 	return 0;
 }
 
-static int checksum_setup(struct xenvif *vif, struct sk_buff *skb)
+static int checksum_setup(struct xenvif_queue *queue, struct sk_buff *skb)
 {
 	bool recalculate_partial_csum = false;
 
@@ -1058,7 +1059,7 @@ static int checksum_setup(struct xenvif *vif, struct sk_buff *skb)
 	 * recalculate the partial checksum.
 	 */
 	if (skb->ip_summed != CHECKSUM_PARTIAL && skb_is_gso(skb)) {
-		vif->rx_gso_checksum_fixup++;
+		queue->stats.rx_gso_checksum_fixup++;
 		skb->ip_summed = CHECKSUM_PARTIAL;
 		recalculate_partial_csum = true;
 	}
@@ -1070,31 +1071,31 @@ static int checksum_setup(struct xenvif *vif, struct sk_buff *skb)
 	return skb_checksum_setup(skb, recalculate_partial_csum);
 }
 
-static bool tx_credit_exceeded(struct xenvif *vif, unsigned size)
+static bool tx_credit_exceeded(struct xenvif_queue *queue, unsigned size)
 {
 	u64 now = get_jiffies_64();
-	u64 next_credit = vif->credit_window_start +
-		msecs_to_jiffies(vif->credit_usec / 1000);
+	u64 next_credit = queue->credit_window_start +
+		msecs_to_jiffies(queue->credit_usec / 1000);
 
 	/* Timer could already be pending in rare cases. */
-	if (timer_pending(&vif->credit_timeout))
+	if (timer_pending(&queue->credit_timeout))
 		return true;
 
 	/* Passed the point where we can replenish credit? */
 	if (time_after_eq64(now, next_credit)) {
-		vif->credit_window_start = now;
-		tx_add_credit(vif);
+		queue->credit_window_start = now;
+		tx_add_credit(queue);
 	}
 
 	/* Still too big to send right now? Set a callback. */
-	if (size > vif->remaining_credit) {
-		vif->credit_timeout.data     =
-			(unsigned long)vif;
-		vif->credit_timeout.function =
+	if (size > queue->remaining_credit) {
+		queue->credit_timeout.data     =
+			(unsigned long)queue;
+		queue->credit_timeout.function =
 			tx_credit_callback;
-		mod_timer(&vif->credit_timeout,
+		mod_timer(&queue->credit_timeout,
 			  next_credit);
-		vif->credit_window_start = next_credit;
+		queue->credit_window_start = next_credit;
 
 		return true;
 	}
@@ -1102,15 +1103,15 @@ static bool tx_credit_exceeded(struct xenvif *vif, unsigned size)
 	return false;
 }
 
-static unsigned xenvif_tx_build_gops(struct xenvif *vif, int budget)
+static unsigned xenvif_tx_build_gops(struct xenvif_queue *queue, int budget)
 {
-	struct gnttab_copy *gop = vif->tx_copy_ops, *request_gop;
+	struct gnttab_copy *gop = queue->tx_copy_ops, *request_gop;
 	struct sk_buff *skb;
 	int ret;
 
-	while ((nr_pending_reqs(vif) + XEN_NETBK_LEGACY_SLOTS_MAX
+	while ((nr_pending_reqs(queue) + XEN_NETBK_LEGACY_SLOTS_MAX
 		< MAX_PENDING_REQS) &&
-	       (skb_queue_len(&vif->tx_queue) < budget)) {
+	       (skb_queue_len(&queue->tx_queue) < budget)) {
 		struct xen_netif_tx_request txreq;
 		struct xen_netif_tx_request txfrags[XEN_NETBK_LEGACY_SLOTS_MAX];
 		struct page *page;
@@ -1121,69 +1122,69 @@ static unsigned xenvif_tx_build_gops(struct xenvif *vif, int budget)
 		unsigned int data_len;
 		pending_ring_idx_t index;
 
-		if (vif->tx.sring->req_prod - vif->tx.req_cons >
+		if (queue->tx.sring->req_prod - queue->tx.req_cons >
 		    XEN_NETIF_TX_RING_SIZE) {
-			netdev_err(vif->dev,
+			netdev_err(queue->vif->dev,
 				   "Impossible number of requests. "
 				   "req_prod %d, req_cons %d, size %ld\n",
-				   vif->tx.sring->req_prod, vif->tx.req_cons,
+				   queue->tx.sring->req_prod, queue->tx.req_cons,
 				   XEN_NETIF_TX_RING_SIZE);
-			xenvif_fatal_tx_err(vif);
+			xenvif_fatal_tx_err(queue->vif);
 			continue;
 		}
 
-		work_to_do = RING_HAS_UNCONSUMED_REQUESTS(&vif->tx);
+		work_to_do = RING_HAS_UNCONSUMED_REQUESTS(&queue->tx);
 		if (!work_to_do)
 			break;
 
-		idx = vif->tx.req_cons;
+		idx = queue->tx.req_cons;
 		rmb(); /* Ensure that we see the request before we copy it. */
-		memcpy(&txreq, RING_GET_REQUEST(&vif->tx, idx), sizeof(txreq));
+		memcpy(&txreq, RING_GET_REQUEST(&queue->tx, idx), sizeof(txreq));
 
 		/* Credit-based scheduling. */
-		if (txreq.size > vif->remaining_credit &&
-		    tx_credit_exceeded(vif, txreq.size))
+		if (txreq.size > queue->remaining_credit &&
+		    tx_credit_exceeded(queue, txreq.size))
 			break;
 
-		vif->remaining_credit -= txreq.size;
+		queue->remaining_credit -= txreq.size;
 
 		work_to_do--;
-		vif->tx.req_cons = ++idx;
+		queue->tx.req_cons = ++idx;
 
 		memset(extras, 0, sizeof(extras));
 		if (txreq.flags & XEN_NETTXF_extra_info) {
-			work_to_do = xenvif_get_extras(vif, extras,
+			work_to_do = xenvif_get_extras(queue, extras,
 						       work_to_do);
-			idx = vif->tx.req_cons;
+			idx = queue->tx.req_cons;
 			if (unlikely(work_to_do < 0))
 				break;
 		}
 
-		ret = xenvif_count_requests(vif, &txreq, txfrags, work_to_do);
+		ret = xenvif_count_requests(queue, &txreq, txfrags, work_to_do);
 		if (unlikely(ret < 0))
 			break;
 
 		idx += ret;
 
 		if (unlikely(txreq.size < ETH_HLEN)) {
-			netdev_dbg(vif->dev,
+			netdev_dbg(queue->vif->dev,
 				   "Bad packet size: %d\n", txreq.size);
-			xenvif_tx_err(vif, &txreq, idx);
+			xenvif_tx_err(queue, &txreq, idx);
 			break;
 		}
 
 		/* No crossing a page as the payload mustn't fragment. */
 		if (unlikely((txreq.offset + txreq.size) > PAGE_SIZE)) {
-			netdev_err(vif->dev,
+			netdev_err(queue->vif->dev,
 				   "txreq.offset: %x, size: %u, end: %lu\n",
 				   txreq.offset, txreq.size,
 				   (txreq.offset&~PAGE_MASK) + txreq.size);
-			xenvif_fatal_tx_err(vif);
+			xenvif_fatal_tx_err(queue->vif);
 			break;
 		}
 
-		index = pending_index(vif->pending_cons);
-		pending_idx = vif->pending_ring[index];
+		index = pending_index(queue->pending_cons);
+		pending_idx = queue->pending_ring[index];
 
 		data_len = (txreq.size > PKT_PROT_LEN &&
 			    ret < XEN_NETBK_LEGACY_SLOTS_MAX) ?
@@ -1192,9 +1193,9 @@ static unsigned xenvif_tx_build_gops(struct xenvif *vif, int budget)
 		skb = alloc_skb(data_len + NET_SKB_PAD + NET_IP_ALIGN,
 				GFP_ATOMIC | __GFP_NOWARN);
 		if (unlikely(skb == NULL)) {
-			netdev_dbg(vif->dev,
+			netdev_dbg(queue->vif->dev,
 				   "Can't allocate a skb in start_xmit.\n");
-			xenvif_tx_err(vif, &txreq, idx);
+			xenvif_tx_err(queue, &txreq, idx);
 			break;
 		}
 
@@ -1205,7 +1206,7 @@ static unsigned xenvif_tx_build_gops(struct xenvif *vif, int budget)
 			struct xen_netif_extra_info *gso;
 			gso = &extras[XEN_NETIF_EXTRA_TYPE_GSO - 1];
 
-			if (xenvif_set_skb_gso(vif, skb, gso)) {
+			if (xenvif_set_skb_gso(queue->vif, skb, gso)) {
 				/* Failure in xenvif_set_skb_gso is fatal. */
 				kfree_skb(skb);
 				break;
@@ -1213,15 +1214,15 @@ static unsigned xenvif_tx_build_gops(struct xenvif *vif, int budget)
 		}
 
 		/* XXX could copy straight to head */
-		page = xenvif_alloc_page(vif, pending_idx);
+		page = xenvif_alloc_page(queue, pending_idx);
 		if (!page) {
 			kfree_skb(skb);
-			xenvif_tx_err(vif, &txreq, idx);
+			xenvif_tx_err(queue, &txreq, idx);
 			break;
 		}
 
 		gop->source.u.ref = txreq.gref;
-		gop->source.domid = vif->domid;
+		gop->source.domid = queue->vif->domid;
 		gop->source.offset = txreq.offset;
 
 		gop->dest.u.gmfn = virt_to_mfn(page_address(page));
@@ -1233,9 +1234,9 @@ static unsigned xenvif_tx_build_gops(struct xenvif *vif, int budget)
 
 		gop++;
 
-		memcpy(&vif->pending_tx_info[pending_idx].req,
+		memcpy(&queue->pending_tx_info[pending_idx].req,
 		       &txreq, sizeof(txreq));
-		vif->pending_tx_info[pending_idx].head = index;
+		queue->pending_tx_info[pending_idx].head = index;
 		*((u16 *)skb->data) = pending_idx;
 
 		__skb_put(skb, data_len);
@@ -1250,45 +1251,45 @@ static unsigned xenvif_tx_build_gops(struct xenvif *vif, int budget)
 					     INVALID_PENDING_IDX);
 		}
 
-		vif->pending_cons++;
+		queue->pending_cons++;
 
-		request_gop = xenvif_get_requests(vif, skb, txfrags, gop);
+		request_gop = xenvif_get_requests(queue, skb, txfrags, gop);
 		if (request_gop == NULL) {
 			kfree_skb(skb);
-			xenvif_tx_err(vif, &txreq, idx);
+			xenvif_tx_err(queue, &txreq, idx);
 			break;
 		}
 		gop = request_gop;
 
-		__skb_queue_tail(&vif->tx_queue, skb);
+		__skb_queue_tail(&queue->tx_queue, skb);
 
-		vif->tx.req_cons = idx;
+		queue->tx.req_cons = idx;
 
-		if ((gop-vif->tx_copy_ops) >= ARRAY_SIZE(vif->tx_copy_ops))
+		if ((gop - queue->tx_copy_ops) >= ARRAY_SIZE(queue->tx_copy_ops))
 			break;
 	}
 
-	return gop - vif->tx_copy_ops;
+	return gop - queue->tx_copy_ops;
 }
 
 
-static int xenvif_tx_submit(struct xenvif *vif)
+static int xenvif_tx_submit(struct xenvif_queue *queue)
 {
-	struct gnttab_copy *gop = vif->tx_copy_ops;
+	struct gnttab_copy *gop = queue->tx_copy_ops;
 	struct sk_buff *skb;
 	int work_done = 0;
 
-	while ((skb = __skb_dequeue(&vif->tx_queue)) != NULL) {
+	while ((skb = __skb_dequeue(&queue->tx_queue)) != NULL) {
 		struct xen_netif_tx_request *txp;
 		u16 pending_idx;
 		unsigned data_len;
 
 		pending_idx = *((u16 *)skb->data);
-		txp = &vif->pending_tx_info[pending_idx].req;
+		txp = &queue->pending_tx_info[pending_idx].req;
 
 		/* Check the remap error code. */
-		if (unlikely(xenvif_tx_check_gop(vif, skb, &gop))) {
-			netdev_dbg(vif->dev, "netback grant failed.\n");
+		if (unlikely(xenvif_tx_check_gop(queue, skb, &gop))) {
+			netdev_dbg(queue->vif->dev, "netback grant failed.\n");
 			skb_shinfo(skb)->nr_frags = 0;
 			kfree_skb(skb);
 			continue;
@@ -1296,7 +1297,7 @@ static int xenvif_tx_submit(struct xenvif *vif)
 
 		data_len = skb->len;
 		memcpy(skb->data,
-		       (void *)(idx_to_kaddr(vif, pending_idx)|txp->offset),
+		       (void *)(idx_to_kaddr(queue, pending_idx)|txp->offset),
 		       data_len);
 		if (data_len < txp->size) {
 			/* Append the packet payload as a fragment. */
@@ -1304,7 +1305,7 @@ static int xenvif_tx_submit(struct xenvif *vif)
 			txp->size -= data_len;
 		} else {
 			/* Schedule a response immediately. */
-			xenvif_idx_release(vif, pending_idx,
+			xenvif_idx_release(queue, pending_idx,
 					   XEN_NETIF_RSP_OKAY);
 		}
 
@@ -1313,19 +1314,19 @@ static int xenvif_tx_submit(struct xenvif *vif)
 		else if (txp->flags & XEN_NETTXF_data_validated)
 			skb->ip_summed = CHECKSUM_UNNECESSARY;
 
-		xenvif_fill_frags(vif, skb);
+		xenvif_fill_frags(queue, skb);
 
 		if (skb_is_nonlinear(skb) && skb_headlen(skb) < PKT_PROT_LEN) {
 			int target = min_t(int, skb->len, PKT_PROT_LEN);
 			__pskb_pull_tail(skb, target - skb_headlen(skb));
 		}
 
-		skb->dev      = vif->dev;
+		skb->dev      = queue->vif->dev;
 		skb->protocol = eth_type_trans(skb, skb->dev);
 		skb_reset_network_header(skb);
 
-		if (checksum_setup(vif, skb)) {
-			netdev_dbg(vif->dev,
+		if (checksum_setup(queue, skb)) {
+			netdev_dbg(queue->vif->dev,
 				   "Can't setup checksum in net_tx_action\n");
 			kfree_skb(skb);
 			continue;
@@ -1347,8 +1348,8 @@ static int xenvif_tx_submit(struct xenvif *vif)
 				DIV_ROUND_UP(skb->len - hdrlen, mss);
 		}
 
-		vif->dev->stats.rx_bytes += skb->len;
-		vif->dev->stats.rx_packets++;
+		queue->stats.rx_bytes += skb->len;
+		queue->stats.rx_packets++;
 
 		work_done++;
 
@@ -1359,53 +1360,53 @@ static int xenvif_tx_submit(struct xenvif *vif)
 }
 
 /* Called after netfront has transmitted */
-int xenvif_tx_action(struct xenvif *vif, int budget)
+int xenvif_tx_action(struct xenvif_queue *queue, int budget)
 {
 	unsigned nr_gops;
 	int work_done;
 
-	if (unlikely(!tx_work_todo(vif)))
+	if (unlikely(!tx_work_todo(queue)))
 		return 0;
 
-	nr_gops = xenvif_tx_build_gops(vif, budget);
+	nr_gops = xenvif_tx_build_gops(queue, budget);
 
 	if (nr_gops == 0)
 		return 0;
 
-	gnttab_batch_copy(vif->tx_copy_ops, nr_gops);
+	gnttab_batch_copy(queue->tx_copy_ops, nr_gops);
 
-	work_done = xenvif_tx_submit(vif);
+	work_done = xenvif_tx_submit(queue);
 
 	return work_done;
 }
 
-static void xenvif_idx_release(struct xenvif *vif, u16 pending_idx,
+static void xenvif_idx_release(struct xenvif_queue *queue, u16 pending_idx,
 			       u8 status)
 {
 	struct pending_tx_info *pending_tx_info;
 	pending_ring_idx_t head;
 	u16 peek; /* peek into next tx request */
 
-	BUG_ON(vif->mmap_pages[pending_idx] == (void *)(~0UL));
+	BUG_ON(queue->mmap_pages[pending_idx] == (void *)(~0UL));
 
 	/* Already complete? */
-	if (vif->mmap_pages[pending_idx] == NULL)
+	if (queue->mmap_pages[pending_idx] == NULL)
 		return;
 
-	pending_tx_info = &vif->pending_tx_info[pending_idx];
+	pending_tx_info = &queue->pending_tx_info[pending_idx];
 
 	head = pending_tx_info->head;
 
-	BUG_ON(!pending_tx_is_head(vif, head));
-	BUG_ON(vif->pending_ring[pending_index(head)] != pending_idx);
+	BUG_ON(!pending_tx_is_head(queue, head));
+	BUG_ON(queue->pending_ring[pending_index(head)] != pending_idx);
 
 	do {
 		pending_ring_idx_t index;
 		pending_ring_idx_t idx = pending_index(head);
-		u16 info_idx = vif->pending_ring[idx];
+		u16 info_idx = queue->pending_ring[idx];
 
-		pending_tx_info = &vif->pending_tx_info[info_idx];
-		make_tx_response(vif, &pending_tx_info->req, status);
+		pending_tx_info = &queue->pending_tx_info[info_idx];
+		make_tx_response(queue, &pending_tx_info->req, status);
 
 		/* Setting any number other than
 		 * INVALID_PENDING_RING_IDX indicates this slot is
@@ -1413,50 +1414,50 @@ static void xenvif_idx_release(struct xenvif *vif, u16 pending_idx,
 		 */
 		pending_tx_info->head = 0;
 
-		index = pending_index(vif->pending_prod++);
-		vif->pending_ring[index] = vif->pending_ring[info_idx];
+		index = pending_index(queue->pending_prod++);
+		queue->pending_ring[index] = queue->pending_ring[info_idx];
 
-		peek = vif->pending_ring[pending_index(++head)];
+		peek = queue->pending_ring[pending_index(++head)];
 
-	} while (!pending_tx_is_head(vif, peek));
+	} while (!pending_tx_is_head(queue, peek));
 
-	put_page(vif->mmap_pages[pending_idx]);
-	vif->mmap_pages[pending_idx] = NULL;
+	put_page(queue->mmap_pages[pending_idx]);
+	queue->mmap_pages[pending_idx] = NULL;
 }
 
 
-static void make_tx_response(struct xenvif *vif,
+static void make_tx_response(struct xenvif_queue *queue,
 			     struct xen_netif_tx_request *txp,
 			     s8       st)
 {
-	RING_IDX i = vif->tx.rsp_prod_pvt;
+	RING_IDX i = queue->tx.rsp_prod_pvt;
 	struct xen_netif_tx_response *resp;
 	int notify;
 
-	resp = RING_GET_RESPONSE(&vif->tx, i);
+	resp = RING_GET_RESPONSE(&queue->tx, i);
 	resp->id     = txp->id;
 	resp->status = st;
 
 	if (txp->flags & XEN_NETTXF_extra_info)
-		RING_GET_RESPONSE(&vif->tx, ++i)->status = XEN_NETIF_RSP_NULL;
+		RING_GET_RESPONSE(&queue->tx, ++i)->status = XEN_NETIF_RSP_NULL;
 
-	vif->tx.rsp_prod_pvt = ++i;
-	RING_PUSH_RESPONSES_AND_CHECK_NOTIFY(&vif->tx, notify);
+	queue->tx.rsp_prod_pvt = ++i;
+	RING_PUSH_RESPONSES_AND_CHECK_NOTIFY(&queue->tx, notify);
 	if (notify)
-		notify_remote_via_irq(vif->tx_irq);
+		notify_remote_via_irq(queue->tx_irq);
 }
 
-static struct xen_netif_rx_response *make_rx_response(struct xenvif *vif,
+static struct xen_netif_rx_response *make_rx_response(struct xenvif_queue *queue,
 					     u16      id,
 					     s8       st,
 					     u16      offset,
 					     u16      size,
 					     u16      flags)
 {
-	RING_IDX i = vif->rx.rsp_prod_pvt;
+	RING_IDX i = queue->rx.rsp_prod_pvt;
 	struct xen_netif_rx_response *resp;
 
-	resp = RING_GET_RESPONSE(&vif->rx, i);
+	resp = RING_GET_RESPONSE(&queue->rx, i);
 	resp->offset     = offset;
 	resp->flags      = flags;
 	resp->id         = id;
@@ -1464,39 +1465,39 @@ static struct xen_netif_rx_response *make_rx_response(struct xenvif *vif,
 	if (st < 0)
 		resp->status = (s16)st;
 
-	vif->rx.rsp_prod_pvt = ++i;
+	queue->rx.rsp_prod_pvt = ++i;
 
 	return resp;
 }
 
-static inline int rx_work_todo(struct xenvif *vif)
+static inline int rx_work_todo(struct xenvif_queue *queue)
 {
-	return !skb_queue_empty(&vif->rx_queue) &&
-	       xenvif_rx_ring_slots_available(vif, vif->rx_last_skb_slots);
+	return !skb_queue_empty(&queue->rx_queue) &&
+	       xenvif_rx_ring_slots_available(queue, queue->rx_last_skb_slots);
 }
 
-static inline int tx_work_todo(struct xenvif *vif)
+static inline int tx_work_todo(struct xenvif_queue *queue)
 {
 
-	if (likely(RING_HAS_UNCONSUMED_REQUESTS(&vif->tx)) &&
-	    (nr_pending_reqs(vif) + XEN_NETBK_LEGACY_SLOTS_MAX
+	if (likely(RING_HAS_UNCONSUMED_REQUESTS(&queue->tx)) &&
+	    (nr_pending_reqs(queue) + XEN_NETBK_LEGACY_SLOTS_MAX
 	     < MAX_PENDING_REQS))
 		return 1;
 
 	return 0;
 }
 
-void xenvif_unmap_frontend_rings(struct xenvif *vif)
+void xenvif_unmap_frontend_rings(struct xenvif_queue *queue)
 {
-	if (vif->tx.sring)
-		xenbus_unmap_ring_vfree(xenvif_to_xenbus_device(vif),
-					vif->tx.sring);
-	if (vif->rx.sring)
-		xenbus_unmap_ring_vfree(xenvif_to_xenbus_device(vif),
-					vif->rx.sring);
+	if (queue->tx.sring)
+		xenbus_unmap_ring_vfree(xenvif_to_xenbus_device(queue->vif),
+					queue->tx.sring);
+	if (queue->rx.sring)
+		xenbus_unmap_ring_vfree(xenvif_to_xenbus_device(queue->vif),
+					queue->rx.sring);
 }
 
-int xenvif_map_frontend_rings(struct xenvif *vif,
+int xenvif_map_frontend_rings(struct xenvif_queue *queue,
 			      grant_ref_t tx_ring_ref,
 			      grant_ref_t rx_ring_ref)
 {
@@ -1506,67 +1507,72 @@ int xenvif_map_frontend_rings(struct xenvif *vif,
 
 	int err = -ENOMEM;
 
-	err = xenbus_map_ring_valloc(xenvif_to_xenbus_device(vif),
+	err = xenbus_map_ring_valloc(xenvif_to_xenbus_device(queue->vif),
 				     tx_ring_ref, &addr);
 	if (err)
 		goto err;
 
 	txs = (struct xen_netif_tx_sring *)addr;
-	BACK_RING_INIT(&vif->tx, txs, PAGE_SIZE);
+	BACK_RING_INIT(&queue->tx, txs, PAGE_SIZE);
 
-	err = xenbus_map_ring_valloc(xenvif_to_xenbus_device(vif),
+	err = xenbus_map_ring_valloc(xenvif_to_xenbus_device(queue->vif),
 				     rx_ring_ref, &addr);
 	if (err)
 		goto err;
 
 	rxs = (struct xen_netif_rx_sring *)addr;
-	BACK_RING_INIT(&vif->rx, rxs, PAGE_SIZE);
+	BACK_RING_INIT(&queue->rx, rxs, PAGE_SIZE);
 
 	return 0;
 
 err:
-	xenvif_unmap_frontend_rings(vif);
+	xenvif_unmap_frontend_rings(queue);
 	return err;
 }
 
-void xenvif_stop_queue(struct xenvif *vif)
+static inline void xenvif_wake_queue(struct xenvif_queue *queue)
 {
-	if (!vif->can_queue)
-		return;
+	struct net_device *dev = queue->vif->dev;
+	netif_tx_wake_queue(netdev_get_tx_queue(dev, queue->id));
+}
 
-	netif_stop_queue(vif->dev);
+static void xenvif_start_queue(struct xenvif_queue *queue)
+{
+	if (xenvif_schedulable(queue->vif))
+		xenvif_wake_queue(queue);
 }
 
-static void xenvif_start_queue(struct xenvif *vif)
+static int xenvif_queue_stopped(struct xenvif_queue *queue)
 {
-	if (xenvif_schedulable(vif))
-		netif_wake_queue(vif->dev);
+	struct net_device *dev = queue->vif->dev;
+	unsigned int id = queue->id;
+	return netif_tx_queue_stopped(netdev_get_tx_queue(dev, id));
 }
 
 int xenvif_kthread(void *data)
 {
-	struct xenvif *vif = data;
+	struct xenvif_queue *queue = data;
 	struct sk_buff *skb;
 
 	while (!kthread_should_stop()) {
-		wait_event_interruptible(vif->wq,
-					 rx_work_todo(vif) ||
+		wait_event_interruptible(queue->wq,
+					 rx_work_todo(queue) ||
 					 kthread_should_stop());
 		if (kthread_should_stop())
 			break;
 
-		if (!skb_queue_empty(&vif->rx_queue))
-			xenvif_rx_action(vif);
+		if (!skb_queue_empty(&queue->rx_queue))
+			xenvif_rx_action(queue);
 
-		if (skb_queue_empty(&vif->rx_queue) &&
-		    netif_queue_stopped(vif->dev))
-			xenvif_start_queue(vif);
+		if (skb_queue_empty(&queue->rx_queue) &&
+		    xenvif_queue_stopped(queue))
+			xenvif_start_queue(queue);
 
 		cond_resched();
 	}
 
 	/* Bin any remaining skbs */
-	while ((skb = skb_dequeue(&vif->rx_queue)) != NULL)
+	while ((skb = skb_dequeue(&queue->rx_queue)) != NULL)
 		dev_kfree_skb(skb);
 
 	return 0;
diff --git a/drivers/net/xen-netback/xenbus.c b/drivers/net/xen-netback/xenbus.c
index 7a206cf..f23ea0a 100644
--- a/drivers/net/xen-netback/xenbus.c
+++ b/drivers/net/xen-netback/xenbus.c
@@ -19,6 +19,7 @@
 */
 
 #include "common.h"
+#include <linux/vmalloc.h>
 
 struct backend_info {
 	struct xenbus_device *dev;
@@ -34,8 +35,9 @@ struct backend_info {
 	u8 have_hotplug_status_watch:1;
 };
 
-static int connect_rings(struct backend_info *);
-static void connect(struct backend_info *);
+static int connect_rings(struct backend_info *be, struct xenvif_queue *queue);
+static void connect(struct backend_info *be);
+static int read_xenbus_vif_flags(struct backend_info *be);
 static void backend_create_xenvif(struct backend_info *be);
 static void unregister_hotplug_status_watch(struct backend_info *be);
 static void set_backend_state(struct backend_info *be,
@@ -485,10 +487,9 @@ static void connect(struct backend_info *be)
 {
 	int err;
 	struct xenbus_device *dev = be->dev;
-
-	err = connect_rings(be);
-	if (err)
-		return;
+	unsigned long credit_bytes, credit_usec;
+	unsigned int queue_index;
+	struct xenvif_queue *queue;
 
 	err = xen_net_read_mac(dev, be->vif->fe_dev_addr);
 	if (err) {
@@ -496,9 +497,30 @@ static void connect(struct backend_info *be)
 		return;
 	}
 
-	xen_net_read_rate(dev, &be->vif->credit_bytes,
-			  &be->vif->credit_usec);
-	be->vif->remaining_credit = be->vif->credit_bytes;
+	xen_net_read_rate(dev, &credit_bytes, &credit_usec);
+	read_xenbus_vif_flags(be);
+
+	be->vif->num_queues = 1;
+	be->vif->queues = vzalloc(be->vif->num_queues *
+			sizeof(struct xenvif_queue));
+
+	for (queue_index = 0; queue_index < be->vif->num_queues; ++queue_index) {
+		queue = &be->vif->queues[queue_index];
+		queue->vif = be->vif;
+		queue->id = queue_index;
+		snprintf(queue->name, sizeof(queue->name), "%s-q%u",
+				be->vif->dev->name, queue->id);
+
+		xenvif_init_queue(queue);
+
+		queue->remaining_credit = credit_bytes;
+
+		err = connect_rings(be, queue);
+		if (err)
+			goto err;
+	}
+
+	xenvif_carrier_on(be->vif);
 
 	unregister_hotplug_status_watch(be);
 	err = xenbus_watch_pathfmt(dev, &be->hotplug_status_watch,
@@ -507,18 +529,24 @@ static void connect(struct backend_info *be)
 	if (!err)
 		be->have_hotplug_status_watch = 1;
 
-	netif_wake_queue(be->vif->dev);
+	netif_tx_wake_all_queues(be->vif->dev);
+
+	return;
+
+err:
+	vfree(be->vif->queues);
+	be->vif->queues = NULL;
+	be->vif->num_queues = 0;
+	return;
 }
 
 
-static int connect_rings(struct backend_info *be)
+static int connect_rings(struct backend_info *be, struct xenvif_queue *queue)
 {
-	struct xenvif *vif = be->vif;
 	struct xenbus_device *dev = be->dev;
 	unsigned long tx_ring_ref, rx_ring_ref;
-	unsigned int tx_evtchn, rx_evtchn, rx_copy;
+	unsigned int tx_evtchn, rx_evtchn;
 	int err;
-	int val;
 
 	err = xenbus_gather(XBT_NIL, dev->otherend,
 			    "tx-ring-ref", "%lu", &tx_ring_ref,
@@ -546,6 +574,27 @@ static int connect_rings(struct backend_info *be)
 		rx_evtchn = tx_evtchn;
 	}
 
+	/* Map the shared frame, irq etc. */
+	err = xenvif_connect(queue, tx_ring_ref, rx_ring_ref,
+			     tx_evtchn, rx_evtchn);
+	if (err) {
+		xenbus_dev_fatal(dev, err,
+				 "mapping shared-frames %lu/%lu port tx %u rx %u",
+				 tx_ring_ref, rx_ring_ref,
+				 tx_evtchn, rx_evtchn);
+		return err;
+	}
+
+	return 0;
+}
+
+static int read_xenbus_vif_flags(struct backend_info *be)
+{
+	struct xenvif *vif = be->vif;
+	struct xenbus_device *dev = be->dev;
+	unsigned int rx_copy;
+	int err, val;
+
 	err = xenbus_scanf(XBT_NIL, dev->otherend, "request-rx-copy", "%u",
 			   &rx_copy);
 	if (err == -ENOENT) {
@@ -621,16 +670,6 @@ static int connect_rings(struct backend_info *be)
 		val = 0;
 	vif->ipv6_csum = !!val;
 
-	/* Map the shared frame, irq etc. */
-	err = xenvif_connect(vif, tx_ring_ref, rx_ring_ref,
-			     tx_evtchn, rx_evtchn);
-	if (err) {
-		xenbus_dev_fatal(dev, err,
-				 "mapping shared-frames %lu/%lu port tx %u rx %u",
-				 tx_ring_ref, rx_ring_ref,
-				 tx_evtchn, rx_evtchn);
-		return err;
-	}
 	return 0;
 }
 
-- 
1.7.10.4

* [PATCH V6 net-next 2/5] xen-netback: Add support for multiple queues
  2014-03-03 11:47 [PATCH V6 net-next 0/5] xen-net{back, front}: Multiple transmit and receive queues Andrew J. Bennieston
  2014-03-03 11:47 ` [PATCH V6 net-next 1/5] xen-netback: Factor queue-specific data into queue struct Andrew J. Bennieston
@ 2014-03-03 11:47 ` Andrew J. Bennieston
  2014-03-14 16:03   ` Ian Campbell
  2014-03-03 11:47 ` [PATCH V6 net-next 3/5] xen-netfront: Factor queue-specific data into queue struct Andrew J. Bennieston
                   ` (6 subsequent siblings)
  8 siblings, 1 reply; 21+ messages in thread
From: Andrew J. Bennieston @ 2014-03-03 11:47 UTC (permalink / raw)
  To: xen-devel
  Cc: wei.liu2, ian.campbell, netdev, paul.durrant, david.vrabel,
	Andrew J. Bennieston

From: "Andrew J. Bennieston" <andrew.bennieston@citrix.com>

Builds on the refactoring of the previous patch to implement multiple
queues between xen-netfront and xen-netback.

Writes the maximum supported number of queues into XenStore, and reads
the number of queues requested by the frontend to determine how many
queues to use.

Ring references and event channels are read from XenStore on a per-queue
basis and rings are connected accordingly.
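
To make the hierarchical layout easier to follow before reading the diff,
here is a minimal, self-contained sketch of the path construction used by
the connect_rings() change below. The helper name and the example backend
path are illustrative only; the queue-N subdirectory and the key names
(tx-ring-ref, rx-ring-ref, event-channel-tx/-rx) are the ones this series
uses.

	#include <stdio.h>

	/* Illustrative sketch only: mirrors the "%s/queue-%u" path
	 * construction in connect_rings() below. With a single queue the
	 * keys stay in the top-level frontend directory (flat layout).
	 */
	static void queue_xspath(char *buf, size_t len, const char *otherend,
				 unsigned int num_queues, unsigned int queue_id)
	{
		if (num_queues == 1)
			snprintf(buf, len, "%s", otherend);	/* flat layout */
		else
			snprintf(buf, len, "%s/queue-%u", otherend, queue_id);
	}

	int main(void)
	{
		char xspath[80];

		/* Example frontend path and queue count are made up. */
		queue_xspath(xspath, sizeof(xspath),
			     "/local/domain/3/device/vif/0", 4, 2);
		/* Ring refs and event channels are then read from e.g.
		 * <xspath>/tx-ring-ref and <xspath>/event-channel-tx.
		 */
		printf("%s\n", xspath);	/* /local/domain/3/device/vif/0/queue-2 */
		return 0;
	}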

Signed-off-by: Andrew J. Bennieston <andrew.bennieston@citrix.com>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
---
 drivers/net/xen-netback/common.h    |    2 +
 drivers/net/xen-netback/interface.c |    7 +++-
 drivers/net/xen-netback/netback.c   |    8 ++++
 drivers/net/xen-netback/xenbus.c    |   76 ++++++++++++++++++++++++++++++-----
 4 files changed, 82 insertions(+), 11 deletions(-)

diff --git a/drivers/net/xen-netback/common.h b/drivers/net/xen-netback/common.h
index 4176539..e72bf38 100644
--- a/drivers/net/xen-netback/common.h
+++ b/drivers/net/xen-netback/common.h
@@ -261,4 +261,6 @@ void xenvif_carrier_on(struct xenvif *vif);
 
 extern bool separate_tx_rx_irq;
 
+extern unsigned int xenvif_max_queues;
+
 #endif /* __XEN_NETBACK__COMMON_H__ */
diff --git a/drivers/net/xen-netback/interface.c b/drivers/net/xen-netback/interface.c
index 0297980..3f623b4 100644
--- a/drivers/net/xen-netback/interface.c
+++ b/drivers/net/xen-netback/interface.c
@@ -381,7 +381,12 @@ struct xenvif *xenvif_alloc(struct device *parent, domid_t domid,
 	char name[IFNAMSIZ] = {};
 
 	snprintf(name, IFNAMSIZ - 1, "vif%u.%u", domid, handle);
-	dev = alloc_netdev_mq(sizeof(struct xenvif), name, ether_setup, 1);
+	/* Allocate a netdev with the max. supported number of queues.
+	 * When the guest selects the desired number, it will be updated
+	 * via netif_set_real_num_tx_queues().
+	 */
+	dev = alloc_netdev_mq(sizeof(struct xenvif), name, ether_setup,
+			      xenvif_max_queues);
 	if (dev == NULL) {
 		pr_warn("Could not allocate netdev for %s\n", name);
 		return ERR_PTR(-ENOMEM);
diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
index a32abd6..7dd9049 100644
--- a/drivers/net/xen-netback/netback.c
+++ b/drivers/net/xen-netback/netback.c
@@ -54,6 +54,11 @@
 bool separate_tx_rx_irq = 1;
 module_param(separate_tx_rx_irq, bool, 0644);
 
+unsigned int xenvif_max_queues;
+module_param_named(max_queues, xenvif_max_queues, uint, 0644);
+MODULE_PARM_DESC(max_queues,
+		"Maximum number of queues per virtual interface");
+
 /*
  * This is the maximum slots a skb can have. If a guest sends a skb
  * which exceeds this limit it is considered malicious.
@@ -1585,6 +1590,9 @@ static int __init netback_init(void)
 	if (!xen_domain())
 		return -ENODEV;
 
+	/* Allow as many queues as there are CPUs, by default */
+	xenvif_max_queues = num_online_cpus();
+
 	if (fatal_skb_slots < XEN_NETBK_LEGACY_SLOTS_MAX) {
 		pr_info("fatal_skb_slots too small (%d), bump it to XEN_NETBK_LEGACY_SLOTS_MAX (%d)\n",
 			fatal_skb_slots, XEN_NETBK_LEGACY_SLOTS_MAX);
diff --git a/drivers/net/xen-netback/xenbus.c b/drivers/net/xen-netback/xenbus.c
index f23ea0a..c1ae148 100644
--- a/drivers/net/xen-netback/xenbus.c
+++ b/drivers/net/xen-netback/xenbus.c
@@ -20,6 +20,7 @@
 
 #include "common.h"
 #include <linux/vmalloc.h>
+#include <linux/rtnetlink.h>
 
 struct backend_info {
 	struct xenbus_device *dev;
@@ -159,6 +160,12 @@ static int netback_probe(struct xenbus_device *dev,
 	if (err)
 		pr_debug("Error writing feature-split-event-channels\n");
 
+	/* Multi-queue support: This is an optional feature. */
+	err = xenbus_printf(XBT_NIL, dev->nodename,
+			"multi-queue-max-queues", "%u", xenvif_max_queues);
+	if (err)
+		pr_debug("Error writing multi-queue-max-queues\n");
+
 	err = xenbus_switch_state(dev, XenbusStateInitWait);
 	if (err)
 		goto fail;
@@ -490,6 +497,23 @@ static void connect(struct backend_info *be)
 	unsigned long credit_bytes, credit_usec;
 	unsigned int queue_index;
 	struct xenvif_queue *queue;
+	unsigned int requested_num_queues;
+
+	/* Check whether the frontend requested multiple queues
+	 * and read the number requested.
+	 */
+	err = xenbus_scanf(XBT_NIL, dev->otherend,
+			"multi-queue-num-queues",
+			"%u", &requested_num_queues);
+	if (err < 0) {
+		requested_num_queues = 1; /* Fall back to single queue */
+	} else if (requested_num_queues > xenvif_max_queues) {
+		/* buggy or malicious guest */
+		xenbus_dev_fatal(dev, err,
+			"guest requested %u queues, exceeding the maximum of %u.",
+			requested_num_queues, xenvif_max_queues);
+		return;
+	}
 
 	err = xen_net_read_mac(dev, be->vif->fe_dev_addr);
 	if (err) {
@@ -500,9 +524,13 @@ static void connect(struct backend_info *be)
 	xen_net_read_rate(dev, &credit_bytes, &credit_usec);
 	read_xenbus_vif_flags(be);
 
-	be->vif->num_queues = 1;
+	/* Use the number of queues requested by the frontend */
+	be->vif->num_queues = requested_num_queues;
 	be->vif->queues = vzalloc(be->vif->num_queues *
 			sizeof(struct xenvif_queue));
+	rtnl_lock();
+	netif_set_real_num_tx_queues(be->vif->dev, be->vif->num_queues);
+	rtnl_unlock();
 
 	for (queue_index = 0; queue_index < be->vif->num_queues; ++queue_index) {
 		queue = &be->vif->queues[queue_index];
@@ -547,29 +575,52 @@ static int connect_rings(struct backend_info *be, struct xenvif_queue *queue)
 	unsigned long tx_ring_ref, rx_ring_ref;
 	unsigned int tx_evtchn, rx_evtchn;
 	int err;
+	char *xspath;
+	size_t xspathsize;
+	const size_t xenstore_path_ext_size = 11; /* sufficient for "/queue-NNN" */
+
+	/* If the frontend requested 1 queue, or we have fallen back
+	 * to single queue due to lack of frontend support for multi-
+	 * queue, expect the remaining XenStore keys in the toplevel
+	 * directory. Otherwise, expect them in a subdirectory called
+	 * queue-N.
+	 */
+	if (queue->vif->num_queues == 1) {
+		xspath = (char *)dev->otherend;
+	} else {
+		xspathsize = strlen(dev->otherend) + xenstore_path_ext_size;
+		xspath = kzalloc(xspathsize, GFP_KERNEL);
+		if (!xspath) {
+			xenbus_dev_fatal(dev, -ENOMEM,
+					"reading ring references");
+			return -ENOMEM;
+		}
+		snprintf(xspath, xspathsize, "%s/queue-%u", dev->otherend,
+				 queue->id);
+	}
 
-	err = xenbus_gather(XBT_NIL, dev->otherend,
+	err = xenbus_gather(XBT_NIL, xspath,
 			    "tx-ring-ref", "%lu", &tx_ring_ref,
 			    "rx-ring-ref", "%lu", &rx_ring_ref, NULL);
 	if (err) {
 		xenbus_dev_fatal(dev, err,
 				 "reading %s/ring-ref",
-				 dev->otherend);
-		return err;
+				 xspath);
+		goto err;
 	}
 
 	/* Try split event channels first, then single event channel. */
-	err = xenbus_gather(XBT_NIL, dev->otherend,
+	err = xenbus_gather(XBT_NIL, xspath,
 			    "event-channel-tx", "%u", &tx_evtchn,
 			    "event-channel-rx", "%u", &rx_evtchn, NULL);
 	if (err < 0) {
-		err = xenbus_scanf(XBT_NIL, dev->otherend,
+		err = xenbus_scanf(XBT_NIL, xspath,
 				   "event-channel", "%u", &tx_evtchn);
 		if (err < 0) {
 			xenbus_dev_fatal(dev, err,
 					 "reading %s/event-channel(-tx/rx)",
-					 dev->otherend);
-			return err;
+					 xspath);
+			goto err;
 		}
 		rx_evtchn = tx_evtchn;
 	}
@@ -582,10 +633,15 @@ static int connect_rings(struct backend_info *be, struct xenvif_queue *queue)
 				 "mapping shared-frames %lu/%lu port tx %u rx %u",
 				 tx_ring_ref, rx_ring_ref,
 				 tx_evtchn, rx_evtchn);
-		return err;
+		goto err;
 	}
 
-	return 0;
+	err = 0;
+err: /* Regular return falls through with err == 0 */
+	if (xspath != dev->otherend)
+		kfree(xspath);
+
+	return err;
 }
 
 static int read_xenbus_vif_flags(struct backend_info *be)
-- 
1.7.10.4

* [PATCH V6 net-next 3/5] xen-netfront: Factor queue-specific data into queue struct.
  2014-03-03 11:47 [PATCH V6 net-next 0/5] xen-net{back, front}: Multiple transmit and receive queues Andrew J. Bennieston
  2014-03-03 11:47 ` [PATCH V6 net-next 1/5] xen-netback: Factor queue-specific data into queue struct Andrew J. Bennieston
  2014-03-03 11:47 ` [PATCH V6 net-next 2/5] xen-netback: Add support for multiple queues Andrew J. Bennieston
@ 2014-03-03 11:47 ` Andrew J. Bennieston
  2014-03-03 11:47 ` [PATCH V6 net-next 4/5] xen-netfront: Add support for multiple queues Andrew J. Bennieston
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 21+ messages in thread
From: Andrew J. Bennieston @ 2014-03-03 11:47 UTC (permalink / raw)
  To: xen-devel
  Cc: wei.liu2, ian.campbell, netdev, paul.durrant, david.vrabel,
	Andrew J. Bennieston

From: "Andrew J. Bennieston" <andrew.bennieston@citrix.com>

In preparation for multi-queue support in xen-netfront, move the
queue-specific data from struct netfront_info to struct netfront_queue,
and update the rest of the code to use this.

Also adds loops over queues where appropriate, even though only one is
configured at this point, and uses alloc_etherdev_mq() and the
corresponding multi-queue netif wake/start/stop functions in preparation
for multiple active queues.

Finally, implements a trivial queue selection function suitable for
ndo_select_queue, which simply returns 0, selecting the first (and
only) queue.
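
As a quick reference for the per-queue naming convention introduced by this
refactoring (queue names of the form DEVNAME-qN, with -tx/-rx suffixes for
the split-event-channel IRQ names), here is a small standalone sketch. The
buffer sizes follow the QUEUE_NAME_SIZE/IRQ_NAME_SIZE macros added below;
the interface name and queue id are example values only.

	#include <stdio.h>

	#define IFNAMSIZ	16	/* 16 on Linux; normally from <net/if.h> */
	/* Queue name is interface name with "-qNNN" appended */
	#define QUEUE_NAME_SIZE	(IFNAMSIZ + 6)
	/* IRQ name is queue name with "-tx" or "-rx" appended */
	#define IRQ_NAME_SIZE	(QUEUE_NAME_SIZE + 3)

	int main(void)
	{
		char qname[QUEUE_NAME_SIZE];
		char tx_irq_name[IRQ_NAME_SIZE];

		/* "eth0" and queue id 1 are just example values */
		snprintf(qname, sizeof(qname), "%s-q%u", "eth0", 1u);
		snprintf(tx_irq_name, sizeof(tx_irq_name), "%s-tx", qname);

		printf("%s %s\n", qname, tx_irq_name);	/* eth0-q1 eth0-q1-tx */
		return 0;
	}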

Signed-off-by: Andrew J. Bennieston <andrew.bennieston@citrix.com>
Reviewed-by: David Vrabel <david.vrabel@citrix.com>
---
 drivers/net/xen-netfront.c |  945 ++++++++++++++++++++++++++------------------
 1 file changed, 552 insertions(+), 393 deletions(-)

diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
index 2b62d79..4f5a431 100644
--- a/drivers/net/xen-netfront.c
+++ b/drivers/net/xen-netfront.c
@@ -73,6 +73,12 @@ struct netfront_cb {
 #define NET_RX_RING_SIZE __CONST_RING_SIZE(xen_netif_rx, PAGE_SIZE)
 #define TX_MAX_TARGET min_t(int, NET_TX_RING_SIZE, 256)
 
+/* Queue name is interface name with "-qNNN" appended */
+#define QUEUE_NAME_SIZE (IFNAMSIZ + 6)
+
+/* IRQ name is queue name with "-tx" or "-rx" appended */
+#define IRQ_NAME_SIZE (QUEUE_NAME_SIZE + 3)
+
 struct netfront_stats {
 	u64			rx_packets;
 	u64			tx_packets;
@@ -81,9 +87,12 @@ struct netfront_stats {
 	struct u64_stats_sync	syncp;
 };
 
-struct netfront_info {
-	struct list_head list;
-	struct net_device *netdev;
+struct netfront_info;
+
+struct netfront_queue {
+	unsigned int id; /* Queue ID, 0-based */
+	char name[QUEUE_NAME_SIZE]; /* DEVNAME-qN */
+	struct netfront_info *info;
 
 	struct napi_struct napi;
 
@@ -93,10 +102,8 @@ struct netfront_info {
 	unsigned int tx_evtchn, rx_evtchn;
 	unsigned int tx_irq, rx_irq;
 	/* Only used when split event channels support is enabled */
-	char tx_irq_name[IFNAMSIZ+4]; /* DEVNAME-tx */
-	char rx_irq_name[IFNAMSIZ+4]; /* DEVNAME-rx */
-
-	struct xenbus_device *xbdev;
+	char tx_irq_name[IRQ_NAME_SIZE]; /* DEVNAME-qN-tx */
+	char rx_irq_name[IRQ_NAME_SIZE]; /* DEVNAME-qN-rx */
 
 	spinlock_t   tx_lock;
 	struct xen_netif_tx_front_ring tx;
@@ -140,11 +147,22 @@ struct netfront_info {
 	unsigned long rx_pfn_array[NET_RX_RING_SIZE];
 	struct multicall_entry rx_mcl[NET_RX_RING_SIZE+1];
 	struct mmu_update rx_mmu[NET_RX_RING_SIZE];
+};
+
+struct netfront_info {
+	struct list_head list;
+	struct net_device *netdev;
+
+	struct xenbus_device *xbdev;
+
+	/* Multi-queue support */
+	unsigned int num_queues;
+	struct netfront_queue *queues;
 
 	/* Statistics */
 	struct netfront_stats __percpu *stats;
 
-	unsigned long rx_gso_checksum_fixup;
+	atomic_t rx_gso_checksum_fixup;
 };
 
 struct netfront_rx_info {
@@ -187,21 +205,21 @@ static int xennet_rxidx(RING_IDX idx)
 	return idx & (NET_RX_RING_SIZE - 1);
 }
 
-static struct sk_buff *xennet_get_rx_skb(struct netfront_info *np,
+static struct sk_buff *xennet_get_rx_skb(struct netfront_queue *queue,
 					 RING_IDX ri)
 {
 	int i = xennet_rxidx(ri);
-	struct sk_buff *skb = np->rx_skbs[i];
-	np->rx_skbs[i] = NULL;
+	struct sk_buff *skb = queue->rx_skbs[i];
+	queue->rx_skbs[i] = NULL;
 	return skb;
 }
 
-static grant_ref_t xennet_get_rx_ref(struct netfront_info *np,
+static grant_ref_t xennet_get_rx_ref(struct netfront_queue *queue,
 					    RING_IDX ri)
 {
 	int i = xennet_rxidx(ri);
-	grant_ref_t ref = np->grant_rx_ref[i];
-	np->grant_rx_ref[i] = GRANT_INVALID_REF;
+	grant_ref_t ref = queue->grant_rx_ref[i];
+	queue->grant_rx_ref[i] = GRANT_INVALID_REF;
 	return ref;
 }
 
@@ -221,41 +239,40 @@ static bool xennet_can_sg(struct net_device *dev)
 
 static void rx_refill_timeout(unsigned long data)
 {
-	struct net_device *dev = (struct net_device *)data;
-	struct netfront_info *np = netdev_priv(dev);
-	napi_schedule(&np->napi);
+	struct netfront_queue *queue = (struct netfront_queue *)data;
+	napi_schedule(&queue->napi);
 }
 
-static int netfront_tx_slot_available(struct netfront_info *np)
+static int netfront_tx_slot_available(struct netfront_queue *queue)
 {
-	return (np->tx.req_prod_pvt - np->tx.rsp_cons) <
+	return (queue->tx.req_prod_pvt - queue->tx.rsp_cons) <
 		(TX_MAX_TARGET - MAX_SKB_FRAGS - 2);
 }
 
-static void xennet_maybe_wake_tx(struct net_device *dev)
+static void xennet_maybe_wake_tx(struct netfront_queue *queue)
 {
-	struct netfront_info *np = netdev_priv(dev);
+	struct net_device *dev = queue->info->netdev;
+	struct netdev_queue *dev_queue = netdev_get_tx_queue(dev, queue->id);
 
-	if (unlikely(netif_queue_stopped(dev)) &&
-	    netfront_tx_slot_available(np) &&
+	if (unlikely(netif_tx_queue_stopped(dev_queue)) &&
+	    netfront_tx_slot_available(queue) &&
 	    likely(netif_running(dev)))
-		netif_wake_queue(dev);
+		netif_tx_wake_queue(netdev_get_tx_queue(dev, queue->id));
 }
 
-static void xennet_alloc_rx_buffers(struct net_device *dev)
+static void xennet_alloc_rx_buffers(struct netfront_queue *queue)
 {
 	unsigned short id;
-	struct netfront_info *np = netdev_priv(dev);
 	struct sk_buff *skb;
 	struct page *page;
 	int i, batch_target, notify;
-	RING_IDX req_prod = np->rx.req_prod_pvt;
+	RING_IDX req_prod = queue->rx.req_prod_pvt;
 	grant_ref_t ref;
 	unsigned long pfn;
 	void *vaddr;
 	struct xen_netif_rx_request *req;
 
-	if (unlikely(!netif_carrier_ok(dev)))
+	if (unlikely(!netif_carrier_ok(queue->info->netdev)))
 		return;
 
 	/*
@@ -264,9 +281,10 @@ static void xennet_alloc_rx_buffers(struct net_device *dev)
 	 * allocator, so should reduce the chance of failed allocation requests
 	 * both for ourself and for other kernel subsystems.
 	 */
-	batch_target = np->rx_target - (req_prod - np->rx.rsp_cons);
-	for (i = skb_queue_len(&np->rx_batch); i < batch_target; i++) {
-		skb = __netdev_alloc_skb(dev, RX_COPY_THRESHOLD + NET_IP_ALIGN,
+	batch_target = queue->rx_target - (req_prod - queue->rx.rsp_cons);
+	for (i = skb_queue_len(&queue->rx_batch); i < batch_target; i++) {
+		skb = __netdev_alloc_skb(queue->info->netdev,
+					 RX_COPY_THRESHOLD + NET_IP_ALIGN,
 					 GFP_ATOMIC | __GFP_NOWARN);
 		if (unlikely(!skb))
 			goto no_skb;
@@ -279,7 +297,7 @@ static void xennet_alloc_rx_buffers(struct net_device *dev)
 			kfree_skb(skb);
 no_skb:
 			/* Could not allocate any skbuffs. Try again later. */
-			mod_timer(&np->rx_refill_timer,
+			mod_timer(&queue->rx_refill_timer,
 				  jiffies + (HZ/10));
 
 			/* Any skbuffs queued for refill? Force them out. */
@@ -289,44 +307,44 @@ no_skb:
 		}
 
 		skb_add_rx_frag(skb, 0, page, 0, 0, PAGE_SIZE);
-		__skb_queue_tail(&np->rx_batch, skb);
+		__skb_queue_tail(&queue->rx_batch, skb);
 	}
 
 	/* Is the batch large enough to be worthwhile? */
-	if (i < (np->rx_target/2)) {
-		if (req_prod > np->rx.sring->req_prod)
+	if (i < (queue->rx_target/2)) {
+		if (req_prod > queue->rx.sring->req_prod)
 			goto push;
 		return;
 	}
 
 	/* Adjust our fill target if we risked running out of buffers. */
-	if (((req_prod - np->rx.sring->rsp_prod) < (np->rx_target / 4)) &&
-	    ((np->rx_target *= 2) > np->rx_max_target))
-		np->rx_target = np->rx_max_target;
+	if (((req_prod - queue->rx.sring->rsp_prod) < (queue->rx_target / 4)) &&
+	    ((queue->rx_target *= 2) > queue->rx_max_target))
+		queue->rx_target = queue->rx_max_target;
 
  refill:
 	for (i = 0; ; i++) {
-		skb = __skb_dequeue(&np->rx_batch);
+		skb = __skb_dequeue(&queue->rx_batch);
 		if (skb == NULL)
 			break;
 
-		skb->dev = dev;
+		skb->dev = queue->info->netdev;
 
 		id = xennet_rxidx(req_prod + i);
 
-		BUG_ON(np->rx_skbs[id]);
-		np->rx_skbs[id] = skb;
+		BUG_ON(queue->rx_skbs[id]);
+		queue->rx_skbs[id] = skb;
 
-		ref = gnttab_claim_grant_reference(&np->gref_rx_head);
+		ref = gnttab_claim_grant_reference(&queue->gref_rx_head);
 		BUG_ON((signed short)ref < 0);
-		np->grant_rx_ref[id] = ref;
+		queue->grant_rx_ref[id] = ref;
 
 		pfn = page_to_pfn(skb_frag_page(&skb_shinfo(skb)->frags[0]));
 		vaddr = page_address(skb_frag_page(&skb_shinfo(skb)->frags[0]));
 
-		req = RING_GET_REQUEST(&np->rx, req_prod + i);
+		req = RING_GET_REQUEST(&queue->rx, req_prod + i);
 		gnttab_grant_foreign_access_ref(ref,
-						np->xbdev->otherend_id,
+						queue->info->xbdev->otherend_id,
 						pfn_to_mfn(pfn),
 						0);
 
@@ -337,72 +355,76 @@ no_skb:
 	wmb();		/* barrier so backend seens requests */
 
 	/* Above is a suitable barrier to ensure backend will see requests. */
-	np->rx.req_prod_pvt = req_prod + i;
+	queue->rx.req_prod_pvt = req_prod + i;
  push:
-	RING_PUSH_REQUESTS_AND_CHECK_NOTIFY(&np->rx, notify);
+	RING_PUSH_REQUESTS_AND_CHECK_NOTIFY(&queue->rx, notify);
 	if (notify)
-		notify_remote_via_irq(np->rx_irq);
+		notify_remote_via_irq(queue->rx_irq);
 }
 
 static int xennet_open(struct net_device *dev)
 {
 	struct netfront_info *np = netdev_priv(dev);
-
-	napi_enable(&np->napi);
-
-	spin_lock_bh(&np->rx_lock);
-	if (netif_carrier_ok(dev)) {
-		xennet_alloc_rx_buffers(dev);
-		np->rx.sring->rsp_event = np->rx.rsp_cons + 1;
-		if (RING_HAS_UNCONSUMED_RESPONSES(&np->rx))
-			napi_schedule(&np->napi);
+	unsigned int i = 0;
+	struct netfront_queue *queue = NULL;
+
+	for (i = 0; i < np->num_queues; ++i) {
+		queue = &np->queues[i];
+		napi_enable(&queue->napi);
+
+		spin_lock_bh(&queue->rx_lock);
+		if (netif_carrier_ok(dev)) {
+			xennet_alloc_rx_buffers(queue);
+			queue->rx.sring->rsp_event = queue->rx.rsp_cons + 1;
+			if (RING_HAS_UNCONSUMED_RESPONSES(&queue->rx))
+				napi_schedule(&queue->napi);
+		}
+		spin_unlock_bh(&queue->rx_lock);
 	}
-	spin_unlock_bh(&np->rx_lock);
 
-	netif_start_queue(dev);
+	netif_tx_start_all_queues(dev);
 
 	return 0;
 }
 
-static void xennet_tx_buf_gc(struct net_device *dev)
+static void xennet_tx_buf_gc(struct netfront_queue *queue)
 {
 	RING_IDX cons, prod;
 	unsigned short id;
-	struct netfront_info *np = netdev_priv(dev);
 	struct sk_buff *skb;
 
-	BUG_ON(!netif_carrier_ok(dev));
+	BUG_ON(!netif_carrier_ok(queue->info->netdev));
 
 	do {
-		prod = np->tx.sring->rsp_prod;
+		prod = queue->tx.sring->rsp_prod;
 		rmb(); /* Ensure we see responses up to 'rp'. */
 
-		for (cons = np->tx.rsp_cons; cons != prod; cons++) {
+		for (cons = queue->tx.rsp_cons; cons != prod; cons++) {
 			struct xen_netif_tx_response *txrsp;
 
-			txrsp = RING_GET_RESPONSE(&np->tx, cons);
+			txrsp = RING_GET_RESPONSE(&queue->tx, cons);
 			if (txrsp->status == XEN_NETIF_RSP_NULL)
 				continue;
 
 			id  = txrsp->id;
-			skb = np->tx_skbs[id].skb;
+			skb = queue->tx_skbs[id].skb;
 			if (unlikely(gnttab_query_foreign_access(
-				np->grant_tx_ref[id]) != 0)) {
+				queue->grant_tx_ref[id]) != 0)) {
 				pr_alert("%s: warning -- grant still in use by backend domain\n",
 					 __func__);
 				BUG();
 			}
 			gnttab_end_foreign_access_ref(
-				np->grant_tx_ref[id], GNTMAP_readonly);
+				queue->grant_tx_ref[id], GNTMAP_readonly);
 			gnttab_release_grant_reference(
-				&np->gref_tx_head, np->grant_tx_ref[id]);
-			np->grant_tx_ref[id] = GRANT_INVALID_REF;
-			np->grant_tx_page[id] = NULL;
-			add_id_to_freelist(&np->tx_skb_freelist, np->tx_skbs, id);
+				&queue->gref_tx_head, queue->grant_tx_ref[id]);
+			queue->grant_tx_ref[id] = GRANT_INVALID_REF;
+			queue->grant_tx_page[id] = NULL;
+			add_id_to_freelist(&queue->tx_skb_freelist, queue->tx_skbs, id);
 			dev_kfree_skb_irq(skb);
 		}
 
-		np->tx.rsp_cons = prod;
+		queue->tx.rsp_cons = prod;
 
 		/*
 		 * Set a new event, then check for race with update of tx_cons.
@@ -412,21 +434,20 @@ static void xennet_tx_buf_gc(struct net_device *dev)
 		 * data is outstanding: in such cases notification from Xen is
 		 * likely to be the only kick that we'll get.
 		 */
-		np->tx.sring->rsp_event =
-			prod + ((np->tx.sring->req_prod - prod) >> 1) + 1;
+		queue->tx.sring->rsp_event =
+			prod + ((queue->tx.sring->req_prod - prod) >> 1) + 1;
 		mb();		/* update shared area */
-	} while ((cons == prod) && (prod != np->tx.sring->rsp_prod));
+	} while ((cons == prod) && (prod != queue->tx.sring->rsp_prod));
 
-	xennet_maybe_wake_tx(dev);
+	xennet_maybe_wake_tx(queue);
 }
 
-static void xennet_make_frags(struct sk_buff *skb, struct net_device *dev,
+static void xennet_make_frags(struct sk_buff *skb, struct netfront_queue *queue,
 			      struct xen_netif_tx_request *tx)
 {
-	struct netfront_info *np = netdev_priv(dev);
 	char *data = skb->data;
 	unsigned long mfn;
-	RING_IDX prod = np->tx.req_prod_pvt;
+	RING_IDX prod = queue->tx.req_prod_pvt;
 	int frags = skb_shinfo(skb)->nr_frags;
 	unsigned int offset = offset_in_page(data);
 	unsigned int len = skb_headlen(skb);
@@ -443,19 +464,19 @@ static void xennet_make_frags(struct sk_buff *skb, struct net_device *dev,
 		data += tx->size;
 		offset = 0;
 
-		id = get_id_from_freelist(&np->tx_skb_freelist, np->tx_skbs);
-		np->tx_skbs[id].skb = skb_get(skb);
-		tx = RING_GET_REQUEST(&np->tx, prod++);
+		id = get_id_from_freelist(&queue->tx_skb_freelist, queue->tx_skbs);
+		queue->tx_skbs[id].skb = skb_get(skb);
+		tx = RING_GET_REQUEST(&queue->tx, prod++);
 		tx->id = id;
-		ref = gnttab_claim_grant_reference(&np->gref_tx_head);
+		ref = gnttab_claim_grant_reference(&queue->gref_tx_head);
 		BUG_ON((signed short)ref < 0);
 
 		mfn = virt_to_mfn(data);
-		gnttab_grant_foreign_access_ref(ref, np->xbdev->otherend_id,
+		gnttab_grant_foreign_access_ref(ref, queue->info->xbdev->otherend_id,
 						mfn, GNTMAP_readonly);
 
-		np->grant_tx_page[id] = virt_to_page(data);
-		tx->gref = np->grant_tx_ref[id] = ref;
+		queue->grant_tx_page[id] = virt_to_page(data);
+		tx->gref = queue->grant_tx_ref[id] = ref;
 		tx->offset = offset;
 		tx->size = len;
 		tx->flags = 0;
@@ -487,21 +508,21 @@ static void xennet_make_frags(struct sk_buff *skb, struct net_device *dev,
 
 			tx->flags |= XEN_NETTXF_more_data;
 
-			id = get_id_from_freelist(&np->tx_skb_freelist,
-						  np->tx_skbs);
-			np->tx_skbs[id].skb = skb_get(skb);
-			tx = RING_GET_REQUEST(&np->tx, prod++);
+			id = get_id_from_freelist(&queue->tx_skb_freelist,
+						  queue->tx_skbs);
+			queue->tx_skbs[id].skb = skb_get(skb);
+			tx = RING_GET_REQUEST(&queue->tx, prod++);
 			tx->id = id;
-			ref = gnttab_claim_grant_reference(&np->gref_tx_head);
+			ref = gnttab_claim_grant_reference(&queue->gref_tx_head);
 			BUG_ON((signed short)ref < 0);
 
 			mfn = pfn_to_mfn(page_to_pfn(page));
 			gnttab_grant_foreign_access_ref(ref,
-							np->xbdev->otherend_id,
+							queue->info->xbdev->otherend_id,
 							mfn, GNTMAP_readonly);
 
-			np->grant_tx_page[id] = page;
-			tx->gref = np->grant_tx_ref[id] = ref;
+			queue->grant_tx_page[id] = page;
+			tx->gref = queue->grant_tx_ref[id] = ref;
 			tx->offset = offset;
 			tx->size = bytes;
 			tx->flags = 0;
@@ -518,7 +539,7 @@ static void xennet_make_frags(struct sk_buff *skb, struct net_device *dev,
 		}
 	}
 
-	np->tx.req_prod_pvt = prod;
+	queue->tx.req_prod_pvt = prod;
 }
 
 /*
@@ -544,6 +565,12 @@ static int xennet_count_skb_frag_slots(struct sk_buff *skb)
 	return pages;
 }
 
+static u16 xennet_select_queue(struct net_device *dev, struct sk_buff *skb)
+{
+	/* Stub for later implementation of queue selection */
+	return 0;
+}
+
 static int xennet_start_xmit(struct sk_buff *skb, struct net_device *dev)
 {
 	unsigned short id;
@@ -559,6 +586,15 @@ static int xennet_start_xmit(struct sk_buff *skb, struct net_device *dev)
 	unsigned int offset = offset_in_page(data);
 	unsigned int len = skb_headlen(skb);
 	unsigned long flags;
+	struct netfront_queue *queue = NULL;
+	u16 queue_index;
+
+	/* Drop the packet if no queues are set up */
+	if (np->num_queues < 1)
+		goto drop;
+	/* Determine which queue to transmit this SKB on */
+	queue_index = skb_get_queue_mapping(skb);
+	queue = &np->queues[queue_index];
 
 	/* If skb->len is too big for wire format, drop skb and alert
 	 * user about misconfiguration.
@@ -578,30 +614,30 @@ static int xennet_start_xmit(struct sk_buff *skb, struct net_device *dev)
 		goto drop;
 	}
 
-	spin_lock_irqsave(&np->tx_lock, flags);
+	spin_lock_irqsave(&queue->tx_lock, flags);
 
 	if (unlikely(!netif_carrier_ok(dev) ||
 		     (slots > 1 && !xennet_can_sg(dev)) ||
 		     netif_needs_gso(skb, netif_skb_features(skb)))) {
-		spin_unlock_irqrestore(&np->tx_lock, flags);
+		spin_unlock_irqrestore(&queue->tx_lock, flags);
 		goto drop;
 	}
 
-	i = np->tx.req_prod_pvt;
+	i = queue->tx.req_prod_pvt;
 
-	id = get_id_from_freelist(&np->tx_skb_freelist, np->tx_skbs);
-	np->tx_skbs[id].skb = skb;
+	id = get_id_from_freelist(&queue->tx_skb_freelist, queue->tx_skbs);
+	queue->tx_skbs[id].skb = skb;
 
-	tx = RING_GET_REQUEST(&np->tx, i);
+	tx = RING_GET_REQUEST(&queue->tx, i);
 
 	tx->id   = id;
-	ref = gnttab_claim_grant_reference(&np->gref_tx_head);
+	ref = gnttab_claim_grant_reference(&queue->gref_tx_head);
 	BUG_ON((signed short)ref < 0);
 	mfn = virt_to_mfn(data);
 	gnttab_grant_foreign_access_ref(
-		ref, np->xbdev->otherend_id, mfn, GNTMAP_readonly);
-	np->grant_tx_page[id] = virt_to_page(data);
-	tx->gref = np->grant_tx_ref[id] = ref;
+		ref, queue->info->xbdev->otherend_id, mfn, GNTMAP_readonly);
+	queue->grant_tx_page[id] = virt_to_page(data);
+	tx->gref = queue->grant_tx_ref[id] = ref;
 	tx->offset = offset;
 	tx->size = len;
 
@@ -617,7 +653,7 @@ static int xennet_start_xmit(struct sk_buff *skb, struct net_device *dev)
 		struct xen_netif_extra_info *gso;
 
 		gso = (struct xen_netif_extra_info *)
-			RING_GET_REQUEST(&np->tx, ++i);
+			RING_GET_REQUEST(&queue->tx, ++i);
 
 		tx->flags |= XEN_NETTXF_extra_info;
 
@@ -632,14 +668,14 @@ static int xennet_start_xmit(struct sk_buff *skb, struct net_device *dev)
 		gso->flags = 0;
 	}
 
-	np->tx.req_prod_pvt = i + 1;
+	queue->tx.req_prod_pvt = i + 1;
 
-	xennet_make_frags(skb, dev, tx);
+	xennet_make_frags(skb, queue, tx);
 	tx->size = skb->len;
 
-	RING_PUSH_REQUESTS_AND_CHECK_NOTIFY(&np->tx, notify);
+	RING_PUSH_REQUESTS_AND_CHECK_NOTIFY(&queue->tx, notify);
 	if (notify)
-		notify_remote_via_irq(np->tx_irq);
+		notify_remote_via_irq(queue->tx_irq);
 
 	u64_stats_update_begin(&stats->syncp);
 	stats->tx_bytes += skb->len;
@@ -647,12 +683,12 @@ static int xennet_start_xmit(struct sk_buff *skb, struct net_device *dev)
 	u64_stats_update_end(&stats->syncp);
 
 	/* Note: It is not safe to access skb after xennet_tx_buf_gc()! */
-	xennet_tx_buf_gc(dev);
+	xennet_tx_buf_gc(queue);
 
-	if (!netfront_tx_slot_available(np))
-		netif_stop_queue(dev);
+	if (!netfront_tx_slot_available(queue))
+		netif_tx_stop_queue(netdev_get_tx_queue(dev, queue->id));
 
-	spin_unlock_irqrestore(&np->tx_lock, flags);
+	spin_unlock_irqrestore(&queue->tx_lock, flags);
 
 	return NETDEV_TX_OK;
 
@@ -665,32 +701,37 @@ static int xennet_start_xmit(struct sk_buff *skb, struct net_device *dev)
 static int xennet_close(struct net_device *dev)
 {
 	struct netfront_info *np = netdev_priv(dev);
-	netif_stop_queue(np->netdev);
-	napi_disable(&np->napi);
+	unsigned int i;
+	struct netfront_queue *queue;
+	netif_tx_stop_all_queues(np->netdev);
+	for (i = 0; i < np->num_queues; ++i) {
+		queue = &np->queues[i];
+		napi_disable(&queue->napi);
+	}
 	return 0;
 }
 
-static void xennet_move_rx_slot(struct netfront_info *np, struct sk_buff *skb,
+static void xennet_move_rx_slot(struct netfront_queue *queue, struct sk_buff *skb,
 				grant_ref_t ref)
 {
-	int new = xennet_rxidx(np->rx.req_prod_pvt);
-
-	BUG_ON(np->rx_skbs[new]);
-	np->rx_skbs[new] = skb;
-	np->grant_rx_ref[new] = ref;
-	RING_GET_REQUEST(&np->rx, np->rx.req_prod_pvt)->id = new;
-	RING_GET_REQUEST(&np->rx, np->rx.req_prod_pvt)->gref = ref;
-	np->rx.req_prod_pvt++;
+	int new = xennet_rxidx(queue->rx.req_prod_pvt);
+
+	BUG_ON(queue->rx_skbs[new]);
+	queue->rx_skbs[new] = skb;
+	queue->grant_rx_ref[new] = ref;
+	RING_GET_REQUEST(&queue->rx, queue->rx.req_prod_pvt)->id = new;
+	RING_GET_REQUEST(&queue->rx, queue->rx.req_prod_pvt)->gref = ref;
+	queue->rx.req_prod_pvt++;
 }
 
-static int xennet_get_extras(struct netfront_info *np,
+static int xennet_get_extras(struct netfront_queue *queue,
 			     struct xen_netif_extra_info *extras,
 			     RING_IDX rp)
 
 {
 	struct xen_netif_extra_info *extra;
-	struct device *dev = &np->netdev->dev;
-	RING_IDX cons = np->rx.rsp_cons;
+	struct device *dev = &queue->info->netdev->dev;
+	RING_IDX cons = queue->rx.rsp_cons;
 	int err = 0;
 
 	do {
@@ -705,7 +746,7 @@ static int xennet_get_extras(struct netfront_info *np,
 		}
 
 		extra = (struct xen_netif_extra_info *)
-			RING_GET_RESPONSE(&np->rx, ++cons);
+			RING_GET_RESPONSE(&queue->rx, ++cons);
 
 		if (unlikely(!extra->type ||
 			     extra->type >= XEN_NETIF_EXTRA_TYPE_MAX)) {
@@ -718,33 +759,33 @@ static int xennet_get_extras(struct netfront_info *np,
 			       sizeof(*extra));
 		}
 
-		skb = xennet_get_rx_skb(np, cons);
-		ref = xennet_get_rx_ref(np, cons);
-		xennet_move_rx_slot(np, skb, ref);
+		skb = xennet_get_rx_skb(queue, cons);
+		ref = xennet_get_rx_ref(queue, cons);
+		xennet_move_rx_slot(queue, skb, ref);
 	} while (extra->flags & XEN_NETIF_EXTRA_FLAG_MORE);
 
-	np->rx.rsp_cons = cons;
+	queue->rx.rsp_cons = cons;
 	return err;
 }
 
-static int xennet_get_responses(struct netfront_info *np,
+static int xennet_get_responses(struct netfront_queue *queue,
 				struct netfront_rx_info *rinfo, RING_IDX rp,
 				struct sk_buff_head *list)
 {
 	struct xen_netif_rx_response *rx = &rinfo->rx;
 	struct xen_netif_extra_info *extras = rinfo->extras;
-	struct device *dev = &np->netdev->dev;
-	RING_IDX cons = np->rx.rsp_cons;
-	struct sk_buff *skb = xennet_get_rx_skb(np, cons);
-	grant_ref_t ref = xennet_get_rx_ref(np, cons);
+	struct device *dev = &queue->info->netdev->dev;
+	RING_IDX cons = queue->rx.rsp_cons;
+	struct sk_buff *skb = xennet_get_rx_skb(queue, cons);
+	grant_ref_t ref = xennet_get_rx_ref(queue, cons);
 	int max = MAX_SKB_FRAGS + (rx->status <= RX_COPY_THRESHOLD);
 	int slots = 1;
 	int err = 0;
 	unsigned long ret;
 
 	if (rx->flags & XEN_NETRXF_extra_info) {
-		err = xennet_get_extras(np, extras, rp);
-		cons = np->rx.rsp_cons;
+		err = xennet_get_extras(queue, extras, rp);
+		cons = queue->rx.rsp_cons;
 	}
 
 	for (;;) {
@@ -753,7 +794,7 @@ static int xennet_get_responses(struct netfront_info *np,
 			if (net_ratelimit())
 				dev_warn(dev, "rx->offset: %x, size: %u\n",
 					 rx->offset, rx->status);
-			xennet_move_rx_slot(np, skb, ref);
+			xennet_move_rx_slot(queue, skb, ref);
 			err = -EINVAL;
 			goto next;
 		}
@@ -774,7 +815,7 @@ static int xennet_get_responses(struct netfront_info *np,
 		ret = gnttab_end_foreign_access_ref(ref, 0);
 		BUG_ON(!ret);
 
-		gnttab_release_grant_reference(&np->gref_rx_head, ref);
+		gnttab_release_grant_reference(&queue->gref_rx_head, ref);
 
 		__skb_queue_tail(list, skb);
 
@@ -789,9 +830,9 @@ next:
 			break;
 		}
 
-		rx = RING_GET_RESPONSE(&np->rx, cons + slots);
-		skb = xennet_get_rx_skb(np, cons + slots);
-		ref = xennet_get_rx_ref(np, cons + slots);
+		rx = RING_GET_RESPONSE(&queue->rx, cons + slots);
+		skb = xennet_get_rx_skb(queue, cons + slots);
+		ref = xennet_get_rx_ref(queue, cons + slots);
 		slots++;
 	}
 
@@ -802,7 +843,7 @@ next:
 	}
 
 	if (unlikely(err))
-		np->rx.rsp_cons = cons + slots;
+		queue->rx.rsp_cons = cons + slots;
 
 	return err;
 }
@@ -836,17 +877,17 @@ static int xennet_set_skb_gso(struct sk_buff *skb,
 	return 0;
 }
 
-static RING_IDX xennet_fill_frags(struct netfront_info *np,
+static RING_IDX xennet_fill_frags(struct netfront_queue *queue,
 				  struct sk_buff *skb,
 				  struct sk_buff_head *list)
 {
 	struct skb_shared_info *shinfo = skb_shinfo(skb);
-	RING_IDX cons = np->rx.rsp_cons;
+	RING_IDX cons = queue->rx.rsp_cons;
 	struct sk_buff *nskb;
 
 	while ((nskb = __skb_dequeue(list))) {
 		struct xen_netif_rx_response *rx =
-			RING_GET_RESPONSE(&np->rx, ++cons);
+			RING_GET_RESPONSE(&queue->rx, ++cons);
 		skb_frag_t *nfrag = &skb_shinfo(nskb)->frags[0];
 
 		if (shinfo->nr_frags == MAX_SKB_FRAGS) {
@@ -879,7 +920,7 @@ static int checksum_setup(struct net_device *dev, struct sk_buff *skb)
 	 */
 	if (skb->ip_summed != CHECKSUM_PARTIAL && skb_is_gso(skb)) {
 		struct netfront_info *np = netdev_priv(dev);
-		np->rx_gso_checksum_fixup++;
+		atomic_inc(&np->rx_gso_checksum_fixup);
 		skb->ip_summed = CHECKSUM_PARTIAL;
 		recalculate_partial_csum = true;
 	}
@@ -891,11 +932,10 @@ static int checksum_setup(struct net_device *dev, struct sk_buff *skb)
 	return skb_checksum_setup(skb, recalculate_partial_csum);
 }
 
-static int handle_incoming_queue(struct net_device *dev,
+static int handle_incoming_queue(struct netfront_queue *queue,
 				 struct sk_buff_head *rxq)
 {
-	struct netfront_info *np = netdev_priv(dev);
-	struct netfront_stats *stats = this_cpu_ptr(np->stats);
+	struct netfront_stats *stats = this_cpu_ptr(queue->info->stats);
 	int packets_dropped = 0;
 	struct sk_buff *skb;
 
@@ -906,12 +946,12 @@ static int handle_incoming_queue(struct net_device *dev,
 			__pskb_pull_tail(skb, pull_to - skb_headlen(skb));
 
 		/* Ethernet work: Delayed to here as it peeks the header. */
-		skb->protocol = eth_type_trans(skb, dev);
+		skb->protocol = eth_type_trans(skb, queue->info->netdev);
 
-		if (checksum_setup(dev, skb)) {
+		if (checksum_setup(queue->info->netdev, skb)) {
 			kfree_skb(skb);
 			packets_dropped++;
-			dev->stats.rx_errors++;
+			queue->info->netdev->stats.rx_errors++;
 			continue;
 		}
 
@@ -921,7 +961,7 @@ static int handle_incoming_queue(struct net_device *dev,
 		u64_stats_update_end(&stats->syncp);
 
 		/* Pass it up. */
-		napi_gro_receive(&np->napi, skb);
+		napi_gro_receive(&queue->napi, skb);
 	}
 
 	return packets_dropped;
@@ -929,8 +969,8 @@ static int handle_incoming_queue(struct net_device *dev,
 
 static int xennet_poll(struct napi_struct *napi, int budget)
 {
-	struct netfront_info *np = container_of(napi, struct netfront_info, napi);
-	struct net_device *dev = np->netdev;
+	struct netfront_queue *queue = container_of(napi, struct netfront_queue, napi);
+	struct net_device *dev = queue->info->netdev;
 	struct sk_buff *skb;
 	struct netfront_rx_info rinfo;
 	struct xen_netif_rx_response *rx = &rinfo.rx;
@@ -943,29 +983,29 @@ static int xennet_poll(struct napi_struct *napi, int budget)
 	unsigned long flags;
 	int err;
 
-	spin_lock(&np->rx_lock);
+	spin_lock(&queue->rx_lock);
 
 	skb_queue_head_init(&rxq);
 	skb_queue_head_init(&errq);
 	skb_queue_head_init(&tmpq);
 
-	rp = np->rx.sring->rsp_prod;
+	rp = queue->rx.sring->rsp_prod;
 	rmb(); /* Ensure we see queued responses up to 'rp'. */
 
-	i = np->rx.rsp_cons;
+	i = queue->rx.rsp_cons;
 	work_done = 0;
 	while ((i != rp) && (work_done < budget)) {
-		memcpy(rx, RING_GET_RESPONSE(&np->rx, i), sizeof(*rx));
+		memcpy(rx, RING_GET_RESPONSE(&queue->rx, i), sizeof(*rx));
 		memset(extras, 0, sizeof(rinfo.extras));
 
-		err = xennet_get_responses(np, &rinfo, rp, &tmpq);
+		err = xennet_get_responses(queue, &rinfo, rp, &tmpq);
 
 		if (unlikely(err)) {
 err:
 			while ((skb = __skb_dequeue(&tmpq)))
 				__skb_queue_tail(&errq, skb);
 			dev->stats.rx_errors++;
-			i = np->rx.rsp_cons;
+			i = queue->rx.rsp_cons;
 			continue;
 		}
 
@@ -977,7 +1017,7 @@ err:
 
 			if (unlikely(xennet_set_skb_gso(skb, gso))) {
 				__skb_queue_head(&tmpq, skb);
-				np->rx.rsp_cons += skb_queue_len(&tmpq);
+				queue->rx.rsp_cons += skb_queue_len(&tmpq);
 				goto err;
 			}
 		}
@@ -991,7 +1031,7 @@ err:
 		skb->data_len = rx->status;
 		skb->len += rx->status;
 
-		i = xennet_fill_frags(np, skb, &tmpq);
+		i = xennet_fill_frags(queue, skb, &tmpq);
 
 		if (rx->flags & XEN_NETRXF_csum_blank)
 			skb->ip_summed = CHECKSUM_PARTIAL;
@@ -1000,22 +1040,22 @@ err:
 
 		__skb_queue_tail(&rxq, skb);
 
-		np->rx.rsp_cons = ++i;
+		queue->rx.rsp_cons = ++i;
 		work_done++;
 	}
 
 	__skb_queue_purge(&errq);
 
-	work_done -= handle_incoming_queue(dev, &rxq);
+	work_done -= handle_incoming_queue(queue, &rxq);
 
 	/* If we get a callback with very few responses, reduce fill target. */
 	/* NB. Note exponential increase, linear decrease. */
-	if (((np->rx.req_prod_pvt - np->rx.sring->rsp_prod) >
-	     ((3*np->rx_target) / 4)) &&
-	    (--np->rx_target < np->rx_min_target))
-		np->rx_target = np->rx_min_target;
+	if (((queue->rx.req_prod_pvt - queue->rx.sring->rsp_prod) >
+	     ((3*queue->rx_target) / 4)) &&
+	    (--queue->rx_target < queue->rx_min_target))
+		queue->rx_target = queue->rx_min_target;
 
-	xennet_alloc_rx_buffers(dev);
+	xennet_alloc_rx_buffers(queue);
 
 	if (work_done < budget) {
 		int more_to_do = 0;
@@ -1024,14 +1064,14 @@ err:
 
 		local_irq_save(flags);
 
-		RING_FINAL_CHECK_FOR_RESPONSES(&np->rx, more_to_do);
+		RING_FINAL_CHECK_FOR_RESPONSES(&queue->rx, more_to_do);
 		if (!more_to_do)
 			__napi_complete(napi);
 
 		local_irq_restore(flags);
 	}
 
-	spin_unlock(&np->rx_lock);
+	spin_unlock(&queue->rx_lock);
 
 	return work_done;
 }
@@ -1079,43 +1119,43 @@ static struct rtnl_link_stats64 *xennet_get_stats64(struct net_device *dev,
 	return tot;
 }
 
-static void xennet_release_tx_bufs(struct netfront_info *np)
+static void xennet_release_tx_bufs(struct netfront_queue *queue)
 {
 	struct sk_buff *skb;
 	int i;
 
 	for (i = 0; i < NET_TX_RING_SIZE; i++) {
 		/* Skip over entries which are actually freelist references */
-		if (skb_entry_is_link(&np->tx_skbs[i]))
+		if (skb_entry_is_link(&queue->tx_skbs[i]))
 			continue;
 
-		skb = np->tx_skbs[i].skb;
-		get_page(np->grant_tx_page[i]);
-		gnttab_end_foreign_access(np->grant_tx_ref[i],
+		skb = queue->tx_skbs[i].skb;
+		get_page(queue->grant_tx_page[i]);
+		gnttab_end_foreign_access(queue->grant_tx_ref[i],
 					  GNTMAP_readonly,
-					  (unsigned long)page_address(np->grant_tx_page[i]));
-		np->grant_tx_page[i] = NULL;
-		np->grant_tx_ref[i] = GRANT_INVALID_REF;
-		add_id_to_freelist(&np->tx_skb_freelist, np->tx_skbs, i);
+					  (unsigned long)page_address(queue->grant_tx_page[i]));
+		queue->grant_tx_page[i] = NULL;
+		queue->grant_tx_ref[i] = GRANT_INVALID_REF;
+		add_id_to_freelist(&queue->tx_skb_freelist, queue->tx_skbs, i);
 		dev_kfree_skb_irq(skb);
 	}
 }
 
-static void xennet_release_rx_bufs(struct netfront_info *np)
+static void xennet_release_rx_bufs(struct netfront_queue *queue)
 {
 	int id, ref;
 
-	spin_lock_bh(&np->rx_lock);
+	spin_lock_bh(&queue->rx_lock);
 
 	for (id = 0; id < NET_RX_RING_SIZE; id++) {
 		struct sk_buff *skb;
 		struct page *page;
 
-		skb = np->rx_skbs[id];
+		skb = queue->rx_skbs[id];
 		if (!skb)
 			continue;
 
-		ref = np->grant_rx_ref[id];
+		ref = queue->grant_rx_ref[id];
 		if (ref == GRANT_INVALID_REF)
 			continue;
 
@@ -1127,21 +1167,27 @@ static void xennet_release_rx_bufs(struct netfront_info *np)
 		get_page(page);
 		gnttab_end_foreign_access(ref, 0,
 					  (unsigned long)page_address(page));
-		np->grant_rx_ref[id] = GRANT_INVALID_REF;
+		queue->grant_rx_ref[id] = GRANT_INVALID_REF;
 
 		kfree_skb(skb);
 	}
 
-	spin_unlock_bh(&np->rx_lock);
+	spin_unlock_bh(&queue->rx_lock);
 }
 
 static void xennet_uninit(struct net_device *dev)
 {
 	struct netfront_info *np = netdev_priv(dev);
-	xennet_release_tx_bufs(np);
-	xennet_release_rx_bufs(np);
-	gnttab_free_grant_references(np->gref_tx_head);
-	gnttab_free_grant_references(np->gref_rx_head);
+	struct netfront_queue *queue;
+	unsigned int i;
+
+	for (i = 0; i < np->num_queues; ++i) {
+		queue = &np->queues[i];
+		xennet_release_tx_bufs(queue);
+		xennet_release_rx_bufs(queue);
+		gnttab_free_grant_references(queue->gref_tx_head);
+		gnttab_free_grant_references(queue->gref_rx_head);
+	}
 }
 
 static netdev_features_t xennet_fix_features(struct net_device *dev,
@@ -1202,25 +1248,24 @@ static int xennet_set_features(struct net_device *dev,
 
 static irqreturn_t xennet_tx_interrupt(int irq, void *dev_id)
 {
-	struct netfront_info *np = dev_id;
-	struct net_device *dev = np->netdev;
+	struct netfront_queue *queue = dev_id;
 	unsigned long flags;
 
-	spin_lock_irqsave(&np->tx_lock, flags);
-	xennet_tx_buf_gc(dev);
-	spin_unlock_irqrestore(&np->tx_lock, flags);
+	spin_lock_irqsave(&queue->tx_lock, flags);
+	xennet_tx_buf_gc(queue);
+	spin_unlock_irqrestore(&queue->tx_lock, flags);
 
 	return IRQ_HANDLED;
 }
 
 static irqreturn_t xennet_rx_interrupt(int irq, void *dev_id)
 {
-	struct netfront_info *np = dev_id;
-	struct net_device *dev = np->netdev;
+	struct netfront_queue *queue = dev_id;
+	struct net_device *dev = queue->info->netdev;
 
 	if (likely(netif_carrier_ok(dev) &&
-		   RING_HAS_UNCONSUMED_RESPONSES(&np->rx)))
-			napi_schedule(&np->napi);
+		   RING_HAS_UNCONSUMED_RESPONSES(&queue->rx)))
+			napi_schedule(&queue->napi);
 
 	return IRQ_HANDLED;
 }
@@ -1235,7 +1280,11 @@ static irqreturn_t xennet_interrupt(int irq, void *dev_id)
 #ifdef CONFIG_NET_POLL_CONTROLLER
 static void xennet_poll_controller(struct net_device *dev)
 {
-	xennet_interrupt(0, dev);
+	/* Poll each queue */
+	struct netfront_info *info = netdev_priv(dev);
+	unsigned int i;
+	for (i = 0; i < info->num_queues; ++i)
+		xennet_interrupt(0, &info->queues[i]);
 }
 #endif
 
@@ -1250,6 +1299,7 @@ static const struct net_device_ops xennet_netdev_ops = {
 	.ndo_validate_addr   = eth_validate_addr,
 	.ndo_fix_features    = xennet_fix_features,
 	.ndo_set_features    = xennet_set_features,
+	.ndo_select_queue    = xennet_select_queue,
 #ifdef CONFIG_NET_POLL_CONTROLLER
 	.ndo_poll_controller = xennet_poll_controller,
 #endif
@@ -1257,66 +1307,27 @@ static const struct net_device_ops xennet_netdev_ops = {
 
 static struct net_device *xennet_create_dev(struct xenbus_device *dev)
 {
-	int i, err;
+	int err;
 	struct net_device *netdev;
 	struct netfront_info *np;
 
-	netdev = alloc_etherdev(sizeof(struct netfront_info));
+	netdev = alloc_etherdev_mq(sizeof(struct netfront_info), 1);
 	if (!netdev)
 		return ERR_PTR(-ENOMEM);
 
 	np                   = netdev_priv(netdev);
 	np->xbdev            = dev;
 
-	spin_lock_init(&np->tx_lock);
-	spin_lock_init(&np->rx_lock);
-
-	skb_queue_head_init(&np->rx_batch);
-	np->rx_target     = RX_DFL_MIN_TARGET;
-	np->rx_min_target = RX_DFL_MIN_TARGET;
-	np->rx_max_target = RX_MAX_TARGET;
-
-	init_timer(&np->rx_refill_timer);
-	np->rx_refill_timer.data = (unsigned long)netdev;
-	np->rx_refill_timer.function = rx_refill_timeout;
+	np->num_queues = 0;
+	np->queues = NULL;
 
 	err = -ENOMEM;
 	np->stats = netdev_alloc_pcpu_stats(struct netfront_stats);
 	if (np->stats == NULL)
 		goto exit;
 
-	/* Initialise tx_skbs as a free chain containing every entry. */
-	np->tx_skb_freelist = 0;
-	for (i = 0; i < NET_TX_RING_SIZE; i++) {
-		skb_entry_set_link(&np->tx_skbs[i], i+1);
-		np->grant_tx_ref[i] = GRANT_INVALID_REF;
-	}
-
-	/* Clear out rx_skbs */
-	for (i = 0; i < NET_RX_RING_SIZE; i++) {
-		np->rx_skbs[i] = NULL;
-		np->grant_rx_ref[i] = GRANT_INVALID_REF;
-		np->grant_tx_page[i] = NULL;
-	}
-
-	/* A grant for every tx ring slot */
-	if (gnttab_alloc_grant_references(TX_MAX_TARGET,
-					  &np->gref_tx_head) < 0) {
-		pr_alert("can't alloc tx grant refs\n");
-		err = -ENOMEM;
-		goto exit_free_stats;
-	}
-	/* A grant for every rx ring slot */
-	if (gnttab_alloc_grant_references(RX_MAX_TARGET,
-					  &np->gref_rx_head) < 0) {
-		pr_alert("can't alloc rx grant refs\n");
-		err = -ENOMEM;
-		goto exit_free_tx;
-	}
-
 	netdev->netdev_ops	= &xennet_netdev_ops;
 
-	netif_napi_add(netdev, &np->napi, xennet_poll, 64);
 	netdev->features        = NETIF_F_IP_CSUM | NETIF_F_RXCSUM |
 				  NETIF_F_GSO_ROBUST;
 	netdev->hw_features	= NETIF_F_SG |
@@ -1342,10 +1353,6 @@ static struct net_device *xennet_create_dev(struct xenbus_device *dev)
 
 	return netdev;
 
- exit_free_tx:
-	gnttab_free_grant_references(np->gref_tx_head);
- exit_free_stats:
-	free_percpu(np->stats);
  exit:
 	free_netdev(netdev);
 	return ERR_PTR(err);
@@ -1403,30 +1410,35 @@ static void xennet_end_access(int ref, void *page)
 
 static void xennet_disconnect_backend(struct netfront_info *info)
 {
-	/* Stop old i/f to prevent errors whilst we rebuild the state. */
-	spin_lock_bh(&info->rx_lock);
-	spin_lock_irq(&info->tx_lock);
-	netif_carrier_off(info->netdev);
-	spin_unlock_irq(&info->tx_lock);
-	spin_unlock_bh(&info->rx_lock);
-
-	if (info->tx_irq && (info->tx_irq == info->rx_irq))
-		unbind_from_irqhandler(info->tx_irq, info);
-	if (info->tx_irq && (info->tx_irq != info->rx_irq)) {
-		unbind_from_irqhandler(info->tx_irq, info);
-		unbind_from_irqhandler(info->rx_irq, info);
-	}
-	info->tx_evtchn = info->rx_evtchn = 0;
-	info->tx_irq = info->rx_irq = 0;
+	unsigned int i = 0;
+	struct netfront_queue *queue = NULL;
+
+	for (i = 0; i < info->num_queues; ++i) {
+		/* Stop old i/f to prevent errors whilst we rebuild the state. */
+		spin_lock_bh(&queue->rx_lock);
+		spin_lock_irq(&queue->tx_lock);
+		netif_carrier_off(queue->info->netdev);
+		spin_unlock_irq(&queue->tx_lock);
+		spin_unlock_bh(&queue->rx_lock);
+
+		if (queue->tx_irq && (queue->tx_irq == queue->rx_irq))
+			unbind_from_irqhandler(queue->tx_irq, queue);
+		if (queue->tx_irq && (queue->tx_irq != queue->rx_irq)) {
+			unbind_from_irqhandler(queue->tx_irq, queue);
+			unbind_from_irqhandler(queue->rx_irq, queue);
+		}
+		queue->tx_evtchn = queue->rx_evtchn = 0;
+		queue->tx_irq = queue->rx_irq = 0;
 
-	/* End access and free the pages */
-	xennet_end_access(info->tx_ring_ref, info->tx.sring);
-	xennet_end_access(info->rx_ring_ref, info->rx.sring);
+		/* End access and free the pages */
+		xennet_end_access(queue->tx_ring_ref, queue->tx.sring);
+		xennet_end_access(queue->rx_ring_ref, queue->rx.sring);
 
-	info->tx_ring_ref = GRANT_INVALID_REF;
-	info->rx_ring_ref = GRANT_INVALID_REF;
-	info->tx.sring = NULL;
-	info->rx.sring = NULL;
+		queue->tx_ring_ref = GRANT_INVALID_REF;
+		queue->rx_ring_ref = GRANT_INVALID_REF;
+		queue->tx.sring = NULL;
+		queue->rx.sring = NULL;
+	}
 }
 
 /**
@@ -1467,100 +1479,86 @@ static int xen_net_read_mac(struct xenbus_device *dev, u8 mac[])
 	return 0;
 }
 
-static int setup_netfront_single(struct netfront_info *info)
+static int setup_netfront_single(struct netfront_queue *queue)
 {
 	int err;
 
-	err = xenbus_alloc_evtchn(info->xbdev, &info->tx_evtchn);
+	err = xenbus_alloc_evtchn(queue->info->xbdev, &queue->tx_evtchn);
 	if (err < 0)
 		goto fail;
 
-	err = bind_evtchn_to_irqhandler(info->tx_evtchn,
+	err = bind_evtchn_to_irqhandler(queue->tx_evtchn,
 					xennet_interrupt,
-					0, info->netdev->name, info);
+					0, queue->info->netdev->name, queue);
 	if (err < 0)
 		goto bind_fail;
-	info->rx_evtchn = info->tx_evtchn;
-	info->rx_irq = info->tx_irq = err;
+	queue->rx_evtchn = queue->tx_evtchn;
+	queue->rx_irq = queue->tx_irq = err;
 
 	return 0;
 
 bind_fail:
-	xenbus_free_evtchn(info->xbdev, info->tx_evtchn);
-	info->tx_evtchn = 0;
+	xenbus_free_evtchn(queue->info->xbdev, queue->tx_evtchn);
+	queue->tx_evtchn = 0;
 fail:
 	return err;
 }
 
-static int setup_netfront_split(struct netfront_info *info)
+static int setup_netfront_split(struct netfront_queue *queue)
 {
 	int err;
 
-	err = xenbus_alloc_evtchn(info->xbdev, &info->tx_evtchn);
+	err = xenbus_alloc_evtchn(queue->info->xbdev, &queue->tx_evtchn);
 	if (err < 0)
 		goto fail;
-	err = xenbus_alloc_evtchn(info->xbdev, &info->rx_evtchn);
+	err = xenbus_alloc_evtchn(queue->info->xbdev, &queue->rx_evtchn);
 	if (err < 0)
 		goto alloc_rx_evtchn_fail;
 
-	snprintf(info->tx_irq_name, sizeof(info->tx_irq_name),
-		 "%s-tx", info->netdev->name);
-	err = bind_evtchn_to_irqhandler(info->tx_evtchn,
+	snprintf(queue->tx_irq_name, sizeof(queue->tx_irq_name),
+		 "%s-tx", queue->name);
+	err = bind_evtchn_to_irqhandler(queue->tx_evtchn,
 					xennet_tx_interrupt,
-					0, info->tx_irq_name, info);
+					0, queue->tx_irq_name, queue);
 	if (err < 0)
 		goto bind_tx_fail;
-	info->tx_irq = err;
+	queue->tx_irq = err;
 
-	snprintf(info->rx_irq_name, sizeof(info->rx_irq_name),
-		 "%s-rx", info->netdev->name);
-	err = bind_evtchn_to_irqhandler(info->rx_evtchn,
+	snprintf(queue->rx_irq_name, sizeof(queue->rx_irq_name),
+		 "%s-rx", queue->name);
+	err = bind_evtchn_to_irqhandler(queue->rx_evtchn,
 					xennet_rx_interrupt,
-					0, info->rx_irq_name, info);
+					0, queue->rx_irq_name, queue);
 	if (err < 0)
 		goto bind_rx_fail;
-	info->rx_irq = err;
+	queue->rx_irq = err;
 
 	return 0;
 
 bind_rx_fail:
-	unbind_from_irqhandler(info->tx_irq, info);
-	info->tx_irq = 0;
+	unbind_from_irqhandler(queue->tx_irq, queue);
+	queue->tx_irq = 0;
 bind_tx_fail:
-	xenbus_free_evtchn(info->xbdev, info->rx_evtchn);
-	info->rx_evtchn = 0;
+	xenbus_free_evtchn(queue->info->xbdev, queue->rx_evtchn);
+	queue->rx_evtchn = 0;
 alloc_rx_evtchn_fail:
-	xenbus_free_evtchn(info->xbdev, info->tx_evtchn);
-	info->tx_evtchn = 0;
+	xenbus_free_evtchn(queue->info->xbdev, queue->tx_evtchn);
+	queue->tx_evtchn = 0;
 fail:
 	return err;
 }
 
-static int setup_netfront(struct xenbus_device *dev, struct netfront_info *info)
+static int setup_netfront(struct xenbus_device *dev,
+			struct netfront_queue *queue, unsigned int feature_split_evtchn)
 {
 	struct xen_netif_tx_sring *txs;
 	struct xen_netif_rx_sring *rxs;
 	int err;
-	struct net_device *netdev = info->netdev;
-	unsigned int feature_split_evtchn;
 
-	info->tx_ring_ref = GRANT_INVALID_REF;
-	info->rx_ring_ref = GRANT_INVALID_REF;
-	info->rx.sring = NULL;
-	info->tx.sring = NULL;
-	netdev->irq = 0;
-
-	err = xenbus_scanf(XBT_NIL, info->xbdev->otherend,
-			   "feature-split-event-channels", "%u",
-			   &feature_split_evtchn);
-	if (err < 0)
-		feature_split_evtchn = 0;
-
-	err = xen_net_read_mac(dev, netdev->dev_addr);
-	if (err) {
-		xenbus_dev_fatal(dev, err, "parsing %s/mac", dev->nodename);
-		goto fail;
-	}
+	queue->tx_ring_ref = GRANT_INVALID_REF;
+	queue->rx_ring_ref = GRANT_INVALID_REF;
+	queue->rx.sring = NULL;
+	queue->tx.sring = NULL;
 
 	txs = (struct xen_netif_tx_sring *)get_zeroed_page(GFP_NOIO | __GFP_HIGH);
 	if (!txs) {
@@ -1569,13 +1567,13 @@ static int setup_netfront(struct xenbus_device *dev, struct netfront_info *info)
 		goto fail;
 	}
 	SHARED_RING_INIT(txs);
-	FRONT_RING_INIT(&info->tx, txs, PAGE_SIZE);
+	FRONT_RING_INIT(&queue->tx, txs, PAGE_SIZE);
 
 	err = xenbus_grant_ring(dev, virt_to_mfn(txs));
 	if (err < 0)
 		goto grant_tx_ring_fail;
+	queue->tx_ring_ref = err;
 
-	info->tx_ring_ref = err;
 	rxs = (struct xen_netif_rx_sring *)get_zeroed_page(GFP_NOIO | __GFP_HIGH);
 	if (!rxs) {
 		err = -ENOMEM;
@@ -1583,21 +1581,21 @@ static int setup_netfront(struct xenbus_device *dev, struct netfront_info *info)
 		goto alloc_rx_ring_fail;
 	}
 	SHARED_RING_INIT(rxs);
-	FRONT_RING_INIT(&info->rx, rxs, PAGE_SIZE);
+	FRONT_RING_INIT(&queue->rx, rxs, PAGE_SIZE);
 
 	err = xenbus_grant_ring(dev, virt_to_mfn(rxs));
 	if (err < 0)
 		goto grant_rx_ring_fail;
-	info->rx_ring_ref = err;
+	queue->rx_ring_ref = err;
 
 	if (feature_split_evtchn)
-		err = setup_netfront_split(info);
+		err = setup_netfront_split(queue);
 	/* setup single event channel if
 	 *  a) feature-split-event-channels == 0
 	 *  b) feature-split-event-channels == 1 but failed to setup
 	 */
 	if (!feature_split_evtchn || (feature_split_evtchn && err))
-		err = setup_netfront_single(info);
+		err = setup_netfront_single(queue);
 
 	if (err)
 		goto alloc_evtchn_fail;
@@ -1608,17 +1606,78 @@ static int setup_netfront(struct xenbus_device *dev, struct netfront_info *info)
 	 * granted pages because backend is not accessing it at this point.
 	 */
 alloc_evtchn_fail:
-	gnttab_end_foreign_access_ref(info->rx_ring_ref, 0);
+	gnttab_end_foreign_access_ref(queue->rx_ring_ref, 0);
 grant_rx_ring_fail:
 	free_page((unsigned long)rxs);
 alloc_rx_ring_fail:
-	gnttab_end_foreign_access_ref(info->tx_ring_ref, 0);
+	gnttab_end_foreign_access_ref(queue->tx_ring_ref, 0);
 grant_tx_ring_fail:
 	free_page((unsigned long)txs);
 fail:
 	return err;
 }
 
+/* Queue-specific initialisation
+ * This used to be done in xennet_create_dev() but must now
+ * be run per-queue.
+ */
+static int xennet_init_queue(struct netfront_queue *queue)
+{
+	unsigned short i;
+	int err = 0;
+
+	spin_lock_init(&queue->tx_lock);
+	spin_lock_init(&queue->rx_lock);
+
+	skb_queue_head_init(&queue->rx_batch);
+	queue->rx_target     = RX_DFL_MIN_TARGET;
+	queue->rx_min_target = RX_DFL_MIN_TARGET;
+	queue->rx_max_target = RX_MAX_TARGET;
+
+	init_timer(&queue->rx_refill_timer);
+	queue->rx_refill_timer.data = (unsigned long)queue;
+	queue->rx_refill_timer.function = rx_refill_timeout;
+
+	/* Initialise tx_skbs as a free chain containing every entry. */
+	queue->tx_skb_freelist = 0;
+	for (i = 0; i < NET_TX_RING_SIZE; i++) {
+		skb_entry_set_link(&queue->tx_skbs[i], i+1);
+		queue->grant_tx_ref[i] = GRANT_INVALID_REF;
+	}
+
+	/* Clear out rx_skbs */
+	for (i = 0; i < NET_RX_RING_SIZE; i++) {
+		queue->rx_skbs[i] = NULL;
+		queue->grant_rx_ref[i] = GRANT_INVALID_REF;
+		queue->grant_tx_page[i] = NULL;
+	}
+
+	/* A grant for every tx ring slot */
+	if (gnttab_alloc_grant_references(TX_MAX_TARGET,
+					  &queue->gref_tx_head) < 0) {
+		pr_alert("can't alloc tx grant refs\n");
+		err = -ENOMEM;
+		goto exit;
+	}
+
+	/* A grant for every rx ring slot */
+	if (gnttab_alloc_grant_references(RX_MAX_TARGET,
+					  &queue->gref_rx_head) < 0) {
+		pr_alert("can't alloc rx grant refs\n");
+		err = -ENOMEM;
+		goto exit_free_tx;
+	}
+
+	netif_napi_add(queue->info->netdev, &queue->napi, xennet_poll, 64);
+
+	return 0;
+
+ exit_free_tx:
+	gnttab_free_grant_references(queue->gref_tx_head);
+ exit:
+	return err;
+}
+
 /* Common code used when first setting up, and when resuming. */
 static int talk_to_netback(struct xenbus_device *dev,
 			   struct netfront_info *info)
@@ -1626,13 +1685,72 @@ static int talk_to_netback(struct xenbus_device *dev,
 	const char *message;
 	struct xenbus_transaction xbt;
 	int err;
+	unsigned int feature_split_evtchn;
+	unsigned int i = 0;
+	struct netfront_queue *queue = NULL;
 
-	/* Create shared ring, alloc event channel. */
-	err = setup_netfront(dev, info);
-	if (err)
+	info->netdev->irq = 0;
+
+	/* Check feature-split-event-channels */
+	err = xenbus_scanf(XBT_NIL, info->xbdev->otherend,
+			   "feature-split-event-channels", "%u",
+			   &feature_split_evtchn);
+	if (err < 0)
+		feature_split_evtchn = 0;
+
+	/* Read mac addr. */
+	err = xen_net_read_mac(dev, info->netdev->dev_addr);
+	if (err) {
+		xenbus_dev_fatal(dev, err, "parsing %s/mac", dev->nodename);
 		goto out;
+	}
+
+	/* Allocate array of queues */
+	info->queues = kcalloc(1, sizeof(struct netfront_queue), GFP_KERNEL);
+	if (!info->queues) {
+		err = -ENOMEM;
+		goto out;
+	}
+	info->num_queues = 1;
+
+	/* Create shared ring, alloc event channel -- for each queue */
+	for (i = 0; i < info->num_queues; ++i) {
+		queue = &info->queues[i];
+		queue->id = i;
+		queue->info = info;
+		err = xennet_init_queue(queue);
+		if (err) {
+			/* xennet_init_queue() cleans up after itself on failure,
+			 * but we still have to clean up any previously initialised
+			 * queues. If i > 0, set info->num_queues to i, then goto
+			 * destroy_ring, which calls xennet_disconnect_backend()
+			 * to tidy up.
+			 */
+			if (i > 0) {
+				info->num_queues = i;
+				goto destroy_ring;
+			} else {
+				goto out;
+			}
+		}
+		err = setup_netfront(dev, queue, feature_split_evtchn);
+		if (err) {
+			/* As for xennet_init_queue(), setup_netfront() will tidy
+			 * up the current queue on error, but we need to clean up
+			 * those already allocated.
+			 */
+			if (i > 0) {
+				info->num_queues = i;
+				goto destroy_ring;
+			} else {
+				goto out;
+			}
+		}
+	}
 
 again:
+	queue = &info->queues[0]; /* Use first queue only */
+
 	err = xenbus_transaction_start(&xbt);
 	if (err) {
 		xenbus_dev_fatal(dev, err, "starting transaction");
@@ -1640,34 +1758,34 @@ again:
 	}
 
 	err = xenbus_printf(xbt, dev->nodename, "tx-ring-ref", "%u",
-			    info->tx_ring_ref);
+			    queue->tx_ring_ref);
 	if (err) {
 		message = "writing tx ring-ref";
 		goto abort_transaction;
 	}
 	err = xenbus_printf(xbt, dev->nodename, "rx-ring-ref", "%u",
-			    info->rx_ring_ref);
+			    queue->rx_ring_ref);
 	if (err) {
 		message = "writing rx ring-ref";
 		goto abort_transaction;
 	}
 
-	if (info->tx_evtchn == info->rx_evtchn) {
+	if (queue->tx_evtchn == queue->rx_evtchn) {
 		err = xenbus_printf(xbt, dev->nodename,
-				    "event-channel", "%u", info->tx_evtchn);
+				    "event-channel", "%u", queue->tx_evtchn);
 		if (err) {
 			message = "writing event-channel";
 			goto abort_transaction;
 		}
 	} else {
 		err = xenbus_printf(xbt, dev->nodename,
-				    "event-channel-tx", "%u", info->tx_evtchn);
+				    "event-channel-tx", "%u", queue->tx_evtchn);
 		if (err) {
 			message = "writing event-channel-tx";
 			goto abort_transaction;
 		}
 		err = xenbus_printf(xbt, dev->nodename,
-				    "event-channel-rx", "%u", info->rx_evtchn);
+				    "event-channel-rx", "%u", queue->rx_evtchn);
 		if (err) {
 			message = "writing event-channel-rx";
 			goto abort_transaction;
@@ -1727,6 +1845,9 @@ again:
 	xenbus_dev_fatal(dev, err, "%s", message);
  destroy_ring:
 	xennet_disconnect_backend(info);
+	kfree(info->queues);
+	info->queues = NULL;
+	info->num_queues = 0;
  out:
 	return err;
 }
@@ -1739,6 +1860,8 @@ static int xennet_connect(struct net_device *dev)
 	grant_ref_t ref;
 	struct xen_netif_rx_request *req;
 	unsigned int feature_rx_copy;
+	unsigned int j = 0;
+	struct netfront_queue *queue = NULL;
 
 	err = xenbus_scanf(XBT_NIL, np->xbdev->otherend,
 			   "feature-rx-copy", "%u", &feature_rx_copy);
@@ -1759,36 +1882,40 @@ static int xennet_connect(struct net_device *dev)
 	netdev_update_features(dev);
 	rtnl_unlock();
 
-	spin_lock_bh(&np->rx_lock);
-	spin_lock_irq(&np->tx_lock);
+	/* By now, the queue structures have been set up */
+	for (j = 0; j < np->num_queues; ++j) {
+		queue = &np->queues[j];
+		spin_lock_bh(&queue->rx_lock);
+		spin_lock_irq(&queue->tx_lock);
 
-	/* Step 1: Discard all pending TX packet fragments. */
-	xennet_release_tx_bufs(np);
+		/* Step 1: Discard all pending TX packet fragments. */
+		xennet_release_tx_bufs(queue);
 
-	/* Step 2: Rebuild the RX buffer freelist and the RX ring itself. */
-	for (requeue_idx = 0, i = 0; i < NET_RX_RING_SIZE; i++) {
-		skb_frag_t *frag;
-		const struct page *page;
-		if (!np->rx_skbs[i])
-			continue;
+		/* Step 2: Rebuild the RX buffer freelist and the RX ring itself. */
+		for (requeue_idx = 0, i = 0; i < NET_RX_RING_SIZE; i++) {
+			skb_frag_t *frag;
+			const struct page *page;
+			if (!queue->rx_skbs[i])
+				continue;
 
-		skb = np->rx_skbs[requeue_idx] = xennet_get_rx_skb(np, i);
-		ref = np->grant_rx_ref[requeue_idx] = xennet_get_rx_ref(np, i);
-		req = RING_GET_REQUEST(&np->rx, requeue_idx);
+			skb = queue->rx_skbs[requeue_idx] = xennet_get_rx_skb(queue, i);
+			ref = queue->grant_rx_ref[requeue_idx] = xennet_get_rx_ref(queue, i);
+			req = RING_GET_REQUEST(&queue->rx, requeue_idx);
 
-		frag = &skb_shinfo(skb)->frags[0];
-		page = skb_frag_page(frag);
-		gnttab_grant_foreign_access_ref(
-			ref, np->xbdev->otherend_id,
-			pfn_to_mfn(page_to_pfn(page)),
-			0);
-		req->gref = ref;
-		req->id   = requeue_idx;
+			frag = &skb_shinfo(skb)->frags[0];
+			page = skb_frag_page(frag);
+			gnttab_grant_foreign_access_ref(
+				ref, queue->info->xbdev->otherend_id,
+				pfn_to_mfn(page_to_pfn(page)),
+				0);
+			req->gref = ref;
+			req->id   = requeue_idx;
 
-		requeue_idx++;
-	}
+			requeue_idx++;
+		}
 
-	np->rx.req_prod_pvt = requeue_idx;
+		queue->rx.req_prod_pvt = requeue_idx;
+	}
 
 	/*
 	 * Step 3: All public and private state should now be sane.  Get
@@ -1797,14 +1924,17 @@ static int xennet_connect(struct net_device *dev)
 	 * packets.
 	 */
 	netif_carrier_on(np->netdev);
-	notify_remote_via_irq(np->tx_irq);
-	if (np->tx_irq != np->rx_irq)
-		notify_remote_via_irq(np->rx_irq);
-	xennet_tx_buf_gc(dev);
-	xennet_alloc_rx_buffers(dev);
-
-	spin_unlock_irq(&np->tx_lock);
-	spin_unlock_bh(&np->rx_lock);
+	for (j = 0; j < np->num_queues; ++j) {
+		queue = &np->queues[j];
+		notify_remote_via_irq(queue->tx_irq);
+		if (queue->tx_irq != queue->rx_irq)
+			notify_remote_via_irq(queue->rx_irq);
+		xennet_tx_buf_gc(queue);
+		xennet_alloc_rx_buffers(queue);
+
+		spin_unlock_irq(&queue->tx_lock);
+		spin_unlock_bh(&queue->rx_lock);
+	}
 
 	return 0;
 }
@@ -1877,7 +2007,7 @@ static void xennet_get_ethtool_stats(struct net_device *dev,
 	int i;
 
 	for (i = 0; i < ARRAY_SIZE(xennet_stats); i++)
-		data[i] = *(unsigned long *)(np + xennet_stats[i].offset);
+		data[i] = atomic_read((atomic_t *)(np + xennet_stats[i].offset));
 }
 
 static void xennet_get_strings(struct net_device *dev, u32 stringset, u8 * data)
@@ -1909,7 +2039,10 @@ static ssize_t show_rxbuf_min(struct device *dev,
 	struct net_device *netdev = to_net_dev(dev);
 	struct netfront_info *info = netdev_priv(netdev);
 
-	return sprintf(buf, "%u\n", info->rx_min_target);
+	if (info->num_queues)
+		return sprintf(buf, "%u\n", info->queues[0].rx_min_target);
+	else
+		return sprintf(buf, "%u\n", RX_MIN_TARGET);
 }
 
 static ssize_t store_rxbuf_min(struct device *dev,
@@ -1920,6 +2053,8 @@ static ssize_t store_rxbuf_min(struct device *dev,
 	struct netfront_info *np = netdev_priv(netdev);
 	char *endp;
 	unsigned long target;
+	unsigned int i;
+	struct netfront_queue *queue;
 
 	if (!capable(CAP_NET_ADMIN))
 		return -EPERM;
@@ -1933,16 +2068,19 @@ static ssize_t store_rxbuf_min(struct device *dev,
 	if (target > RX_MAX_TARGET)
 		target = RX_MAX_TARGET;
 
-	spin_lock_bh(&np->rx_lock);
-	if (target > np->rx_max_target)
-		np->rx_max_target = target;
-	np->rx_min_target = target;
-	if (target > np->rx_target)
-		np->rx_target = target;
+	for (i = 0; i < np->num_queues; ++i) {
+		queue = &np->queues[i];
+		spin_lock_bh(&queue->rx_lock);
+		if (target > queue->rx_max_target)
+			queue->rx_max_target = target;
+		queue->rx_min_target = target;
+		if (target > queue->rx_target)
+			queue->rx_target = target;
 
-	xennet_alloc_rx_buffers(netdev);
+		xennet_alloc_rx_buffers(queue);
 
-	spin_unlock_bh(&np->rx_lock);
+		spin_unlock_bh(&queue->rx_lock);
+	}
 	return len;
 }
 
@@ -1952,7 +2090,10 @@ static ssize_t show_rxbuf_max(struct device *dev,
 	struct net_device *netdev = to_net_dev(dev);
 	struct netfront_info *info = netdev_priv(netdev);
 
-	return sprintf(buf, "%u\n", info->rx_max_target);
+	if (info->num_queues)
+		return sprintf(buf, "%u\n", info->queues[0].rx_max_target);
+	else
+		return sprintf(buf, "%u\n", RX_MAX_TARGET);
 }
 
 static ssize_t store_rxbuf_max(struct device *dev,
@@ -1963,6 +2104,8 @@ static ssize_t store_rxbuf_max(struct device *dev,
 	struct netfront_info *np = netdev_priv(netdev);
 	char *endp;
 	unsigned long target;
+	unsigned int i = 0;
+	struct netfront_queue *queue = NULL;
 
 	if (!capable(CAP_NET_ADMIN))
 		return -EPERM;
@@ -1976,16 +2119,19 @@ static ssize_t store_rxbuf_max(struct device *dev,
 	if (target > RX_MAX_TARGET)
 		target = RX_MAX_TARGET;
 
-	spin_lock_bh(&np->rx_lock);
-	if (target < np->rx_min_target)
-		np->rx_min_target = target;
-	np->rx_max_target = target;
-	if (target < np->rx_target)
-		np->rx_target = target;
+	for (i = 0; i < np->num_queues; ++i) {
+		queue = &np->queues[i];
+		spin_lock_bh(&queue->rx_lock);
+		if (target < queue->rx_min_target)
+			queue->rx_min_target = target;
+		queue->rx_max_target = target;
+		if (target < queue->rx_target)
+			queue->rx_target = target;
 
-	xennet_alloc_rx_buffers(netdev);
+		xennet_alloc_rx_buffers(queue);
 
-	spin_unlock_bh(&np->rx_lock);
+		spin_unlock_bh(&queue->rx_lock);
+	}
 	return len;
 }
 
@@ -1995,7 +2141,10 @@ static ssize_t show_rxbuf_cur(struct device *dev,
 	struct net_device *netdev = to_net_dev(dev);
 	struct netfront_info *info = netdev_priv(netdev);
 
-	return sprintf(buf, "%u\n", info->rx_target);
+	if (info->num_queues)
+		return sprintf(buf, "%u\n", info->queues[0].rx_target);
+	else
+		return sprintf(buf, "0\n");
 }
 
 static struct device_attribute xennet_attrs[] = {
@@ -2042,6 +2191,8 @@ static const struct xenbus_device_id netfront_ids[] = {
 static int xennet_remove(struct xenbus_device *dev)
 {
 	struct netfront_info *info = dev_get_drvdata(&dev->dev);
+	struct netfront_queue *queue = NULL;
+	unsigned int i = 0;
 
 	dev_dbg(&dev->dev, "%s\n", dev->nodename);
 
@@ -2051,7 +2202,15 @@ static int xennet_remove(struct xenbus_device *dev)
 
 	unregister_netdev(info->netdev);
 
-	del_timer_sync(&info->rx_refill_timer);
+	for (i = 0; i < info->num_queues; ++i) {
+		queue = &info->queues[i];
+		del_timer_sync(&queue->rx_refill_timer);
+	}
+
+	if (info->num_queues) {
+		kfree(info->queues);
+		info->queues = NULL;
+	}
 
 	free_percpu(info->stats);
 
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 21+ messages in thread
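
The cleanup-on-partial-failure pattern that the comments in the patch
above describe (if queue i fails to initialise, record num_queues = i and
tear down only the queues that were already set up; the failing queue is
expected to have tidied up after itself) is easiest to see in isolation.
A minimal standalone sketch of that pattern, using made-up names rather
than the driver's own structures, might look like this:

#include <stdio.h>
#include <stdlib.h>

struct fake_queue {
	unsigned int id;
	int initialised;	/* stands in for rings, grants, timers, ... */
};

/* Pretend per-queue setup; deliberately fails for queue 'fail_at'. */
static int fake_init_queue(struct fake_queue *q, unsigned int fail_at)
{
	if (q->id == fail_at)
		return -1;	/* cleans up after itself before returning */
	q->initialised = 1;
	return 0;
}

static void fake_teardown_queue(struct fake_queue *q)
{
	q->initialised = 0;
}

int main(void)
{
	unsigned int num_queues = 4, fail_at = 2, i;
	struct fake_queue *queues = calloc(num_queues, sizeof(*queues));

	if (!queues)
		return 1;

	for (i = 0; i < num_queues; ++i) {
		queues[i].id = i;
		if (fake_init_queue(&queues[i], fail_at)) {
			/* Queues 0..i-1 still hold resources; shrink the
			 * count so teardown only touches those. */
			num_queues = i;
			goto teardown;
		}
	}
	printf("all %u queues initialised\n", num_queues);
	free(queues);
	return 0;

teardown:
	for (i = 0; i < num_queues; ++i)
		fake_teardown_queue(&queues[i]);
	free(queues);
	fprintf(stderr, "init failed; tore down %u earlier queues\n", num_queues);
	return 1;
}

In the driver itself the teardown work is done by xennet_disconnect_backend()
via the destroy_ring label, as the comment in talk_to_netback() explains.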

* [PATCH V6 net-next 4/5] xen-netfront: Add support for multiple queues
  2014-03-03 11:47 [PATCH V6 net-next 0/5] xen-net{back, front}: Multiple transmit and receive queues Andrew J. Bennieston
                   ` (2 preceding siblings ...)
  2014-03-03 11:47 ` [PATCH V6 net-next 3/5] xen-netfront: Factor queue-specific data into queue struct Andrew J. Bennieston
@ 2014-03-03 11:47 ` Andrew J. Bennieston
  2014-03-03 11:47 ` [PATCH V6 net-next 5/5] xen-net{back, front}: Document multi-queue feature in netif.h Andrew J. Bennieston
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 21+ messages in thread
From: Andrew J. Bennieston @ 2014-03-03 11:47 UTC (permalink / raw)
  To: xen-devel
  Cc: wei.liu2, ian.campbell, netdev, paul.durrant, david.vrabel,
	Andrew J. Bennieston

From: "Andrew J. Bennieston" <andrew.bennieston@citrix.com>

Build on the refactoring of the previous patch to implement multiple
queues between xen-netfront and xen-netback.

Check XenStore for multi-queue support, and set up the rings and event
channels accordingly.

Write ring references and event channels to XenStore in a per-queue
hierarchy when multiple queues are in use, or in the existing flat
layout when there is only one queue.

Update the xennet_select_queue() function to choose the queue on which
to transmit a packet based on the skb hash result.

Signed-off-by: Andrew J. Bennieston <andrew.bennieston@citrix.com>
Reviewed-by: David Vrabel <david.vrabel@citrix.com>
---
 drivers/net/xen-netfront.c |  175 ++++++++++++++++++++++++++++++++++----------
 1 file changed, 137 insertions(+), 38 deletions(-)

diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
index 4f5a431..a0dff31 100644
--- a/drivers/net/xen-netfront.c
+++ b/drivers/net/xen-netfront.c
@@ -57,6 +57,12 @@
 #include <xen/interface/memory.h>
 #include <xen/interface/grant_table.h>
 
+/* Module parameters */
+static unsigned int xennet_max_queues;
+module_param_named(max_queues, xennet_max_queues, uint, 0644);
+MODULE_PARM_DESC(max_queues,
+		"Maximum number of queues per virtual interface");
+
 static const struct ethtool_ops xennet_ethtool_ops;
 
 struct netfront_cb {
@@ -565,10 +571,22 @@ static int xennet_count_skb_frag_slots(struct sk_buff *skb)
 	return pages;
 }
 
-static u16 xennet_select_queue(struct net_device *dev, struct sk_buff *skb)
+static u16 xennet_select_queue(struct net_device *dev, struct sk_buff *skb,
+			       void *accel_priv, select_queue_fallback_t fallback)
 {
-	/* Stub for later implementation of queue selection */
-	return 0;
+	struct netfront_info *info = netdev_priv(dev);
+	u32 hash;
+	u16 queue_idx;
+
+	/* First, check if there is only one queue */
+	if (info->num_queues == 1) {
+		queue_idx = 0;
+	} else {
+		hash = skb_get_hash(skb);
+		queue_idx = (u16) (((u64)hash * info->num_queues) >> 32);
+	}
+
+	return queue_idx;
 }
 
 static int xennet_start_xmit(struct sk_buff *skb, struct net_device *dev)
@@ -1311,7 +1329,7 @@ static struct net_device *xennet_create_dev(struct xenbus_device *dev)
 	struct net_device *netdev;
 	struct netfront_info *np;
 
-	netdev = alloc_etherdev_mq(sizeof(struct netfront_info), 1);
+	netdev = alloc_etherdev_mq(sizeof(struct netfront_info), xennet_max_queues);
 	if (!netdev)
 		return ERR_PTR(-ENOMEM);
 
@@ -1678,6 +1696,88 @@ static int xennet_init_queue(struct netfront_queue *queue)
 	return err;
 }
 
+static int write_queue_xenstore_keys(struct netfront_queue *queue,
+			   struct xenbus_transaction *xbt, int write_hierarchical)
+{
+	/* Write the queue-specific keys into XenStore in the traditional
+	 * way for a single queue, or under per-queue sub-keys for multiple
+	 * queues.
+	 */
+	struct xenbus_device *dev = queue->info->xbdev;
+	int err;
+	const char *message;
+	char *path;
+	size_t pathsize;
+
+	/* Choose the correct place to write the keys */
+	if (write_hierarchical) {
+		pathsize = strlen(dev->nodename) + 10;
+		path = kzalloc(pathsize, GFP_KERNEL);
+		if (!path) {
+			err = -ENOMEM;
+			message = "out of memory while writing ring references";
+			goto error;
+		}
+		snprintf(path, pathsize, "%s/queue-%u",
+				dev->nodename, queue->id);
+	} else {
+		path = (char *)dev->nodename;
+	}
+
+	/* Write ring references */
+	err = xenbus_printf(*xbt, path, "tx-ring-ref", "%u",
+			queue->tx_ring_ref);
+	if (err) {
+		message = "writing tx-ring-ref";
+		goto error;
+	}
+
+	err = xenbus_printf(*xbt, path, "rx-ring-ref", "%u",
+			queue->rx_ring_ref);
+	if (err) {
+		message = "writing rx-ring-ref";
+		goto error;
+	}
+
+	/* Write event channels; taking into account both shared
+	 * and split event channel scenarios.
+	 */
+	if (queue->tx_evtchn == queue->rx_evtchn) {
+		/* Shared event channel */
+		err = xenbus_printf(*xbt, path,
+				"event-channel", "%u", queue->tx_evtchn);
+		if (err) {
+			message = "writing event-channel";
+			goto error;
+		}
+	} else {
+		/* Split event channels */
+		err = xenbus_printf(*xbt, path,
+				"event-channel-tx", "%u", queue->tx_evtchn);
+		if (err) {
+			message = "writing event-channel-tx";
+			goto error;
+		}
+
+		err = xenbus_printf(*xbt, path,
+				"event-channel-rx", "%u", queue->rx_evtchn);
+		if (err) {
+			message = "writing event-channel-rx";
+			goto error;
+		}
+	}
+
+	if (write_hierarchical)
+		kfree(path);
+	return 0;
+
+error:
+	if (write_hierarchical)
+		kfree(path);
+	xenbus_dev_fatal(dev, err, "%s", message);
+	return err;
+}
+
 /* Common code used when first setting up, and when resuming. */
 static int talk_to_netback(struct xenbus_device *dev,
 			   struct netfront_info *info)
@@ -1687,10 +1787,18 @@ static int talk_to_netback(struct xenbus_device *dev,
 	int err;
 	unsigned int feature_split_evtchn;
 	unsigned int i = 0;
+	unsigned int max_queues = 0;
 	struct netfront_queue *queue = NULL;
 
 	info->netdev->irq = 0;
 
+	/* Check if backend supports multiple queues */
+	err = xenbus_scanf(XBT_NIL, info->xbdev->otherend,
+			"multi-queue-max-queues", "%u", &max_queues);
+	if (err < 0)
+		max_queues = 1;
+	max_queues = min(max_queues, xennet_max_queues);
+
 	/* Check feature-split-event-channels */
 	err = xenbus_scanf(XBT_NIL, info->xbdev->otherend,
 			   "feature-split-event-channels", "%u",
@@ -1706,12 +1814,13 @@ static int talk_to_netback(struct xenbus_device *dev,
 	}
 
 	/* Allocate array of queues */
-	info->queues = kcalloc(1, sizeof(struct netfront_queue), GFP_KERNEL);
+	info->queues = kcalloc(max_queues, sizeof(struct netfront_queue), GFP_KERNEL);
 	if (!info->queues) {
 		err = -ENOMEM;
 		goto out;
 	}
-	info->num_queues = 1;
+	info->num_queues = max_queues;
+	netif_set_real_num_tx_queues(info->netdev, info->num_queues);
 
 	/* Create shared ring, alloc event channel -- for each queue */
 	for (i = 0; i < info->num_queues; ++i) {
@@ -1749,49 +1858,35 @@ static int talk_to_netback(struct xenbus_device *dev,
 	}
 
 again:
-	queue = &info->queues[0]; /* Use first queue only */
-
 	err = xenbus_transaction_start(&xbt);
 	if (err) {
 		xenbus_dev_fatal(dev, err, "starting transaction");
 		goto destroy_ring;
 	}
 
-	err = xenbus_printf(xbt, dev->nodename, "tx-ring-ref", "%u",
-			    queue->tx_ring_ref);
-	if (err) {
-		message = "writing tx ring-ref";
-		goto abort_transaction;
-	}
-	err = xenbus_printf(xbt, dev->nodename, "rx-ring-ref", "%u",
-			    queue->rx_ring_ref);
-	if (err) {
-		message = "writing rx ring-ref";
-		goto abort_transaction;
-	}
-
-	if (queue->tx_evtchn == queue->rx_evtchn) {
-		err = xenbus_printf(xbt, dev->nodename,
-				    "event-channel", "%u", queue->tx_evtchn);
-		if (err) {
-			message = "writing event-channel";
-			goto abort_transaction;
-		}
+	if (info->num_queues == 1) {
+		err = write_queue_xenstore_keys(&info->queues[0], &xbt, 0); /* flat */
+		if (err)
+			goto abort_transaction_no_dev_fatal;
 	} else {
-		err = xenbus_printf(xbt, dev->nodename,
-				    "event-channel-tx", "%u", queue->tx_evtchn);
+		/* Write the number of queues */
+		err = xenbus_printf(xbt, dev->nodename, "multi-queue-num-queues",
+				"%u", info->num_queues);
 		if (err) {
-			message = "writing event-channel-tx";
-			goto abort_transaction;
+			message = "writing multi-queue-num-queues";
+			goto abort_transaction_no_dev_fatal;
 		}
-		err = xenbus_printf(xbt, dev->nodename,
-				    "event-channel-rx", "%u", queue->rx_evtchn);
-		if (err) {
-			message = "writing event-channel-rx";
-			goto abort_transaction;
+
+		/* Write the keys for each queue */
+		for (i = 0; i < info->num_queues; ++i) {
+			queue = &info->queues[i];
+			err = write_queue_xenstore_keys(queue, &xbt, 1); /* hierarchical */
+			if (err)
+				goto abort_transaction_no_dev_fatal;
 		}
 	}
 
+	/* The remaining keys are not queue-specific */
 	err = xenbus_printf(xbt, dev->nodename, "request-rx-copy", "%u",
 			    1);
 	if (err) {
@@ -1841,8 +1936,9 @@ again:
 	return 0;
 
  abort_transaction:
-	xenbus_transaction_end(xbt, 1);
 	xenbus_dev_fatal(dev, err, "%s", message);
+abort_transaction_no_dev_fatal:
+	xenbus_transaction_end(xbt, 1);
  destroy_ring:
 	xennet_disconnect_backend(info);
 	kfree(info->queues);
@@ -2236,6 +2332,9 @@ static int __init netif_init(void)
 
 	pr_info("Initialising Xen virtual ethernet driver\n");
 
+	/* Allow as many queues as there are CPUs, by default */
+	xennet_max_queues = num_online_cpus();
+
 	return xenbus_register_frontend(&netfront_driver);
 }
 module_init(netif_init);
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 21+ messages in thread
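
The xennet_select_queue() change in the patch above maps the 32-bit skb
hash onto [0, num_queues) with a multiply-and-shift rather than a modulo:
(hash * n) >> 32 scales the full 32-bit hash range down to n buckets. A
small standalone sketch shows the effect; pick_queue() is a stand-in name
and the hash values are arbitrary substitutes for skb_get_hash() results:

#include <stdint.h>
#include <stdio.h>

/* Scale a 32-bit hash down to one of 'num_queues' queues, as the patch
 * does, instead of taking hash % num_queues. */
static uint16_t pick_queue(uint32_t hash, unsigned int num_queues)
{
	if (num_queues == 1)
		return 0;
	return (uint16_t)(((uint64_t)hash * num_queues) >> 32);
}

int main(void)
{
	/* Arbitrary stand-ins for skb_get_hash() results. */
	uint32_t hashes[] = { 0x00000000u, 0x40000000u, 0x80000000u,
			      0xc0000000u, 0xffffffffu };
	unsigned int i, num_queues = 4;

	for (i = 0; i < sizeof(hashes) / sizeof(hashes[0]); i++)
		printf("hash 0x%08x -> queue %u\n",
		       (unsigned int)hashes[i],
		       (unsigned int)pick_queue(hashes[i], num_queues));
	return 0;
}

The result is always strictly less than num_queues, so no separate bounds
check is needed, and the spread across queues is as uniform as the hash
itself. How many queues are available in the first place is capped by the
max_queues module parameter, which the patch defaults to the number of
online CPUs at module init.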

* [PATCH V6 net-next 5/5] xen-net{back, front}: Document multi-queue feature in netif.h
  2014-03-03 11:47 [PATCH V6 net-next 0/5] xen-net{back, front}: Multiple transmit and receive queues Andrew J. Bennieston
                   ` (3 preceding siblings ...)
  2014-03-03 11:47 ` [PATCH V6 net-next 4/5] xen-netfront: Add support for multiple queues Andrew J. Bennieston
@ 2014-03-03 11:47 ` Andrew J. Bennieston
  2014-03-03 12:53   ` [PATCH V6 net-next 5/5] xen-net{back,front}: " Paul Durrant
  2014-03-14 16:04   ` Ian Campbell
  2014-03-05 12:38 ` [PATCH V6 net-next 0/5] xen-net{back,front}: Multiple transmit and receive queues Wei Liu
                   ` (3 subsequent siblings)
  8 siblings, 2 replies; 21+ messages in thread
From: Andrew J. Bennieston @ 2014-03-03 11:47 UTC (permalink / raw)
  To: xen-devel
  Cc: wei.liu2, ian.campbell, netdev, paul.durrant, david.vrabel,
	Andrew J. Bennieston

From: "Andrew J. Bennieston" <andrew.bennieston@citrix.com>

Document the multi-queue feature in terms of XenStore keys to be written
by the backend and by the frontend.

Signed-off-by: Andrew J. Bennieston <andrew.bennieston@citrix.com>
---
 include/xen/interface/io/netif.h |   29 +++++++++++++++++++++++++++++
 1 file changed, 29 insertions(+)

diff --git a/include/xen/interface/io/netif.h b/include/xen/interface/io/netif.h
index c50061d..a375a75 100644
--- a/include/xen/interface/io/netif.h
+++ b/include/xen/interface/io/netif.h
@@ -51,6 +51,35 @@
  */
 
 /*
+ * Multiple transmit and receive queues:
+ * If supported, the backend will write "multi-queue-max-queues" and set its
+ * value to the maximum supported number of queues.
+ * Frontends that are aware of this feature and wish to use it can write the
+ * key "multi-queue-num-queues", set to the number they wish to use.
+ *
+ * Queues replicate the shared rings and event channels, and
+ * "feature-split-event-channels" may be used when using multiple queues.
+ * Each queue consists of one shared ring pair, i.e. there must be the same
+ * number of tx and rx rings.
+ *
+ * For frontends requesting just one queue, the usual event-channel and
+ * ring-ref keys are written as before, simplifying the backend processing
+ * to avoid distinguishing between a frontend that doesn't understand the
+ * multi-queue feature, and one that does, but requested only one queue.
+ *
+ * Frontends requesting two or more queues must not write the toplevel
+ * event-channel (or event-channel-{tx,rx}) and {tx,rx}-ring-ref keys,
+ * instead writing them under sub-keys having the name "queue-N" where
+ * N is the integer ID of the queue to which those keys belong. Queues
+ * are indexed from zero.
+ *
+ * Mapping of packets to queues is considered to be a function of the
+ * transmitting system (backend or frontend) and is not negotiated
+ * between the two. Guests are free to transmit packets on any queue
+ * they choose, provided it has been set up correctly.
+ */
+
+/*
  * "feature-no-csum-offload" should be used to turn IPv4 TCP/UDP checksum
  * offload off or on. If it is missing then the feature is assumed to be on.
  * "feature-ipv6-csum-offload" should be used to turn IPv6 TCP/UDP checksum
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 21+ messages in thread
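
To make the layout documented above concrete: with the hierarchical
scheme, a frontend that negotiated two queues ends up writing keys along
the following lines in its own vif directory (the values shown are
placeholders, not real grant references or event channel ports):

  multi-queue-num-queues = "2"
  queue-0/tx-ring-ref = "<grant-ref>"
  queue-0/rx-ring-ref = "<grant-ref>"
  queue-0/event-channel-tx = "<port>"
  queue-0/event-channel-rx = "<port>"
  queue-1/tx-ring-ref = "<grant-ref>"
  queue-1/rx-ring-ref = "<grant-ref>"
  queue-1/event-channel-tx = "<port>"
  queue-1/event-channel-rx = "<port>"

(A queue may instead write a single queue-N/event-channel key if it is
not using split event channels.) A frontend using only one queue keeps
writing tx-ring-ref, rx-ring-ref and event-channel (or
event-channel-{tx,rx}) at the top level exactly as before, and the
backend advertises its own limit as multi-queue-max-queues on the
backend side of the connection.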

* RE: [PATCH V6 net-next 5/5] xen-net{back,front}: Document multi-queue feature in netif.h
  2014-03-03 11:47 ` [PATCH V6 net-next 5/5] xen-net{back, front}: Document multi-queue feature in netif.h Andrew J. Bennieston
@ 2014-03-03 12:53   ` Paul Durrant
  2014-03-14 16:04   ` Ian Campbell
  1 sibling, 0 replies; 21+ messages in thread
From: Paul Durrant @ 2014-03-03 12:53 UTC (permalink / raw)
  To: Andrew Bennieston, xen-devel@lists.xenproject.org
  Cc: Ian Campbell, Wei Liu, netdev@vger.kernel.org, David Vrabel,
	Andrew Bennieston

> -----Original Message-----
> From: Andrew J. Bennieston [mailto:andrew.bennieston@citrix.com]
> Sent: 03 March 2014 11:48
> To: xen-devel@lists.xenproject.org
> Cc: Ian Campbell; Wei Liu; Paul Durrant; netdev@vger.kernel.org; David
> Vrabel; Andrew Bennieston
> Subject: [PATCH V6 net-next 5/5] xen-net{back,front}: Document multi-
> queue feature in netif.h
> 
> From: "Andrew J. Bennieston" <andrew.bennieston@citrix.com>
> 
> Document the multi-queue feature in terms of XenStore keys to be written
> by the backend and by the frontend.
> 
> Signed-off-by: Andrew J. Bennieston <andrew.bennieston@citrix.com>

Reviewed-by: Paul Durrant <paul.durrant@citrix.com>

> ---
>  include/xen/interface/io/netif.h |   29 +++++++++++++++++++++++++++++
>  1 file changed, 29 insertions(+)
> 
> diff --git a/include/xen/interface/io/netif.h
> b/include/xen/interface/io/netif.h
> index c50061d..a375a75 100644
> --- a/include/xen/interface/io/netif.h
> +++ b/include/xen/interface/io/netif.h
> @@ -51,6 +51,35 @@
>   */
> 
>  /*
> + * Multiple transmit and receive queues:
> + * If supported, the backend will write "multi-queue-max-queues" and set
> its
> + * value to the maximum supported number of queues.
> + * Frontends that are aware of this feature and wish to use it can write the
> + * key "multi-queue-num-queues", set to the number they wish to use.
> + *
> + * Queues replicate the shared rings and event channels, and
> + * "feature-split-event-channels" may be used when using multiple queues.
> + * Each queue consists of one shared ring pair, i.e. there must be the same
> + * number of tx and rx rings.
> + *
> + * For frontends requesting just one queue, the usual event-channel and
> + * ring-ref keys are written as before, simplifying the backend processing
> + * to avoid distinguishing between a frontend that doesn't understand the
> + * multi-queue feature, and one that does, but requested only one queue.
> + *
> + * Frontends requesting two or more queues must not write the toplevel
> + * event-channel (or event-channel-{tx,rx}) and {tx,rx}-ring-ref keys,
> + * instead writing them under sub-keys having the name "queue-N" where
> + * N is the integer ID of the queue to which those keys belong. Queues
> + * are indexed from zero.
> + *
> + * Mapping of packets to queues is considered to be a function of the
> + * transmitting system (backend or frontend) and is not negotiated
> + * between the two. Guests are free to transmit packets on any queue
> + * they choose, provided it has been set up correctly.
> + */
> +
> +/*
>   * "feature-no-csum-offload" should be used to turn IPv4 TCP/UDP
> checksum
>   * offload off or on. If it is missing then the feature is assumed to be on.
>   * "feature-ipv6-csum-offload" should be used to turn IPv6 TCP/UDP
> checksum
> --
> 1.7.10.4

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH V6 net-next 0/5] xen-net{back,front}: Multiple transmit and receive queues
  2014-03-03 11:47 [PATCH V6 net-next 0/5] xen-net{back, front}: Multiple transmit and receive queues Andrew J. Bennieston
                   ` (4 preceding siblings ...)
  2014-03-03 11:47 ` [PATCH V6 net-next 5/5] xen-net{back, front}: Document multi-queue feature in netif.h Andrew J. Bennieston
@ 2014-03-05 12:38 ` Wei Liu
  2014-03-05 17:46 ` [Xen-devel] [PATCH V6 net-next 0/5] xen-net{back, front}: " Konrad Rzeszutek Wilk
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 21+ messages in thread
From: Wei Liu @ 2014-03-05 12:38 UTC (permalink / raw)
  To: Andrew J. Bennieston
  Cc: xen-devel, ian.campbell, wei.liu2, paul.durrant, netdev,
	david.vrabel

This series looks good enough to me.

IIRC Ian said it's still in his queue so I will wait for his final
review.

Thanks
Wei.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Xen-devel] [PATCH V6 net-next 0/5] xen-net{back, front}: Multiple transmit and receive queues
  2014-03-03 11:47 [PATCH V6 net-next 0/5] xen-net{back, front}: Multiple transmit and receive queues Andrew J. Bennieston
                   ` (5 preceding siblings ...)
  2014-03-05 12:38 ` [PATCH V6 net-next 0/5] xen-net{back,front}: Multiple transmit and receive queues Wei Liu
@ 2014-03-05 17:46 ` Konrad Rzeszutek Wilk
  2014-03-06 16:52 ` Sander Eikelenboom
  2014-03-14 16:10 ` [PATCH V6 net-next 0/5] xen-net{back,front}: " Ian Campbell
  8 siblings, 0 replies; 21+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-03-05 17:46 UTC (permalink / raw)
  To: Andrew J. Bennieston
  Cc: xen-devel, netdev, paul.durrant, wei.liu2, ian.campbell,
	david.vrabel, Adnan Misherfi

On Mon, Mar 03, 2014 at 11:47:44AM +0000, Andrew J. Bennieston wrote:
> 
> This patch series implements multiple transmit and receive queues (i.e.
> multiple shared rings) for the xen virtual network interfaces.
> 
> The series is split up as follows:
>  - Patches 1 and 3 factor out the queue-specific data for netback and
>     netfront respectively, and modify the rest of the code to use these
>     as appropriate.
>  - Patches 2 and 4 introduce new XenStore keys to negotiate and use
>    multiple shared rings and event channels, and code to connect these
>    as appropriate.
>  - Patch 5 documents the XenStore keys required for the new feature
>    in include/xen/interface/io/netif.h

It looks like you got all the Acks from the Xen tree maintainers (David or
Boris or me) - so that is all set.

It should probably go through David Miller's tree - the mechanics of which
tree it should go through escape me (and I think you need to have him on
the 'To:' part of the email).
> 
> All other transmit and receive processing remains unchanged, i.e. there
> is a kthread per queue and a NAPI context per queue.
> 
> The performance of these patches has been analysed in detail, with
> results available at:
> 
> http://wiki.xenproject.org/wiki/Xen-netback_and_xen-netfront_multi-queue_performance_testing
> 
> To summarise:
>   * Using multiple queues allows a VM to transmit at line rate on a 10
>     Gbit/s NIC, compared with a maximum aggregate throughput of 6 Gbit/s
>     with a single queue.
>   * For intra-host VM--VM traffic, eight queues provide 171% of the
>     throughput of a single queue; almost 12 Gbit/s instead of 6 Gbit/s.
>   * There is a corresponding increase in total CPU usage, i.e. this is a
>     scaling out over available resources, not an efficiency improvement.
>   * Results depend on the availability of sufficient CPUs, as well as the
>     distribution of interrupts and the distribution of TCP streams across
>     the queues.
> 
> Queue selection is currently achieved via an L4 hash on the packet (i.e.
> TCP src/dst port, IP src/dst address) and is not negotiated between the
> frontend and backend, since only one option exists. Future patches to
> support other frontends (particularly Windows) will need to add some
> capability to negotiate not only the hash algorithm selection, but also
> allow the frontend to specify some parameters to this.
> 
> Note that queue selection is a decision by the transmitting system about
> which queue to use for a particular packet. In general, the algorithm
> may differ between the frontend and the backend with no adverse effects.
> 
> Queue-specific XenStore entries for ring references and event channels
> are stored hierarchically, i.e. under .../queue-N/... where N varies
> from 0 to one less than the requested number of queues (inclusive). If
> only one queue is requested, it falls back to the flat structure where
> the ring references and event channels are written at the same level as
> other vif information.
> 
> V6:
> - Use 'max_queues' as the module param. name for both netback and netfront.
> 
> V5:
> - Fix bug in xenvif_free() that could lead to an attempt to transmit an
>   skb after the queue structures had been freed.
> - Improve the XenStore protocol documentation in netif.h.
> - Fix IRQ_NAME_SIZE double-accounting for null terminator.
> - Move rx_gso_checksum_fixup stat into struct xenvif_stats (per-queue).
> - Don't initialise a local variable that is set in both branches (xspath).
> 
> V4:
> - Add MODULE_PARM_DESC() for the multi-queue parameters for netback
>   and netfront modules.
> - Move del_timer_sync() in netfront to after unregister_netdev, which
>   restores the order in which these functions were called before applying
>   these patches.
> 
> V3:
> - Further indentation and style fixups.
> 
> V2:
> - Rebase onto net-next.
> - Change queue->number to queue->id.
> - Add atomic operations around the small number of stats variables that
>   are not queue-specific or per-cpu.
> - Fixup formatting and style issues.
> - XenStore protocol changes documented in netif.h.
> - Default max. number of queues to num_online_cpus().
> - Check requested number of queues does not exceed maximum.
> 
> --
> Andrew J. Bennieston
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Xen-devel] [PATCH V6 net-next 0/5] xen-net{back, front}: Multiple transmit and receive queues
  2014-03-03 11:47 [PATCH V6 net-next 0/5] xen-net{back, front}: Multiple transmit and receive queues Andrew J. Bennieston
                   ` (6 preceding siblings ...)
  2014-03-05 17:46 ` [Xen-devel] [PATCH V6 net-next 0/5] xen-net{back, front}: " Konrad Rzeszutek Wilk
@ 2014-03-06 16:52 ` Sander Eikelenboom
  2014-03-14 16:06   ` Ian Campbell
  2014-03-14 16:10 ` [PATCH V6 net-next 0/5] xen-net{back,front}: " Ian Campbell
  8 siblings, 1 reply; 21+ messages in thread
From: Sander Eikelenboom @ 2014-03-06 16:52 UTC (permalink / raw)
  To: Andrew J. Bennieston
  Cc: xen-devel, netdev, paul.durrant, wei.liu2, ian.campbell,
	david.vrabel


Monday, March 3, 2014, 12:47:44 PM, you wrote:


> This patch series implements multiple transmit and receive queues (i.e.
> multiple shared rings) for the xen virtual network interfaces.

> The series is split up as follows:
>  - Patches 1 and 3 factor out the queue-specific data for netback and
>     netfront respectively, and modify the rest of the code to use these
>     as appropriate.
>  - Patches 2 and 4 introduce new XenStore keys to negotiate and use
>    multiple shared rings and event channels, and code to connect these
>    as appropriate.
>  - Patch 5 documents the XenStore keys required for the new feature
>    in include/xen/interface/io/netif.h

> All other transmit and receive processing remains unchanged, i.e. there
> is a kthread per queue and a NAPI context per queue.

> The performance of these patches has been analysed in detail, with
> results available at:

> http://wiki.xenproject.org/wiki/Xen-netback_and_xen-netfront_multi-queue_performance_testing

> To summarise:
>   * Using multiple queues allows a VM to transmit at line rate on a 10
>     Gbit/s NIC, compared with a maximum aggregate throughput of 6 Gbit/s
>     with a single queue.
>   * For intra-host VM--VM traffic, eight queues provide 171% of the
>     throughput of a single queue; almost 12 Gbit/s instead of 6 Gbit/s.
>   * There is a corresponding increase in total CPU usage, i.e. this is a
>     scaling out over available resources, not an efficiency improvement.
>   * Results depend on the availability of sufficient CPUs, as well as the
>     distribution of interrupts and the distribution of TCP streams across
>     the queues.

> Queue selection is currently achieved via an L4 hash on the packet (i.e.
> TCP src/dst port, IP src/dst address) and is not negotiated between the
> frontend and backend, since only one option exists. Future patches to
> support other frontends (particularly Windows) will need to add some
> capability to negotiate not only the hash algorithm selection, but also
> allow the frontend to specify some parameters to this.

> Note that queue selection is a decision by the transmitting system about
> which queue to use for a particular packet. In general, the algorithm
> may differ between the frontend and the backend with no adverse effects.

> Queue-specific XenStore entries for ring references and event channels
> are stored hierarchically, i.e. under .../queue-N/... where N varies
> from 0 to one less than the requested number of queues (inclusive). If
> only one queue is requested, it falls back to the flat structure where
> the ring references and event channels are written at the same level as
> other vif information.

> V6:
> - Use 'max_queues' as the module param. name for both netback and netfront.

> V5:
> - Fix bug in xenvif_free() that could lead to an attempt to transmit an
>   skb after the queue structures had been freed.
> - Improve the XenStore protocol documentation in netif.h.
> - Fix IRQ_NAME_SIZE double-accounting for null terminator.
> - Move rx_gso_checksum_fixup stat into struct xenvif_stats (per-queue).
> - Don't initialise a local variable that is set in both branches (xspath).

> V4:
> - Add MODULE_PARM_DESC() for the multi-queue parameters for netback
>   and netfront modules.
> - Move del_timer_sync() in netfront to after unregister_netdev, which
>   restores the order in which these functions were called before applying
>   these patches.

> V3:
> - Further indentation and style fixups.

> V2:
> - Rebase onto net-next.
> - Change queue->number to queue->id.
> - Add atomic operations around the small number of stats variables that
>   are not queue-specific or per-cpu.
> - Fixup formatting and style issues.
> - XenStore protocol changes documented in netif.h.
> - Default max. number of queues to num_online_cpus().
> - Check requested number of queues does not exceed maximum.

> --
> Andrew J. Bennieston

Hi Andrew,

Just tried your series but I ran into this lockdep warning:

[    0.932289]
[    0.932293] =============================================
[    0.932297] [ INFO: possible recursive locking detected ]
[    0.932302] 3.14.0-rc5-20140306-xennext-netnext-bennie+ #1 Not tainted
[    0.932306] ---------------------------------------------
[    0.932311] xenwatch/26 is trying to acquire lock:
[    0.932315]  (&(&queue->rx_lock)->rlock){+.....}, at: [<ffffffff817b30f4>] netback_changed+0xc84/0xea0
[    0.932328]
[    0.932328] but task is already holding lock:
[    0.932333]  (&(&queue->rx_lock)->rlock){+.....}, at: [<ffffffff817b30f4>] netback_changed+0xc84/0xea0
[    0.932343]
[    0.932343] other info that might help us debug this:
[    0.932348]  Possible unsafe locking scenario:
[    0.932348]
[    0.932353]        CPU0
[    0.932355]        ----
[    0.932358]   lock(&(&queue->rx_lock)->rlock);
[    0.932363]   lock(&(&queue->rx_lock)->rlock);
[    0.932367]
[    0.932367]  *** DEADLOCK ***
[    0.932367]
[    0.932372]  May be due to missing lock nesting notation
[    0.932372]
[    0.932378] 3 locks held by xenwatch/26:
[    0.935540]  #0:  (xenwatch_mutex){+.+.+.}, at: [<ffffffff81581d96>] xenwatch_thread+0x86/0x130
[    0.935540]  #1:  (&(&queue->rx_lock)->rlock){+.....}, at: [<ffffffff817b30f4>] netback_changed+0xc84/0xea0
[    0.935540]  #2:  (&(&queue->tx_lock)->rlock){......}, at: [<ffffffff817b3101>] netback_changed+0xc91/0xea0
[    0.935540]
[    0.935540] stack backtrace:
[    0.935540] CPU: 1 PID: 26 Comm: xenwatch Not tainted 3.14.0-rc5-20140306-xennext-netnext-bennie+ #1
[    0.935540]  ffffffff82766230 ffff88001eac3b98 ffffffff81b83684 ffff88001e97d870
[    0.935540]  ffffffff82766230 ffff88001eac3c68 ffffffff81115b7e 00000000000233a0
[    0.935540]  ffffffff00000003 ffffffff82766230 ffffffff82ca7ec0 5001f47aeae10000
[    0.935540] Call Trace:
[    0.935540]  [<ffffffff81b83684>] dump_stack+0x46/0x58
[    0.935540]  [<ffffffff81115b7e>] __lock_acquire+0x86e/0x2220
[    0.935540]  [<ffffffff811e40be>] ? kfree+0x1ee/0x200
[    0.935540]  [<ffffffff81117b9d>] lock_acquire+0xbd/0x150
[    0.935540]  [<ffffffff817b30f4>] ? netback_changed+0xc84/0xea0
[    0.935540]  [<ffffffff81b8c4fe>] ? mutex_unlock+0xe/0x10
[    0.935540]  [<ffffffff817b00f4>] ? xennet_release_tx_bufs+0x104/0x110
[    0.935540]  [<ffffffff81b8d7cf>] _raw_spin_lock_bh+0x3f/0x50
[    0.935540]  [<ffffffff817b30f4>] ? netback_changed+0xc84/0xea0
[    0.935540]  [<ffffffff817b30f4>] netback_changed+0xc84/0xea0
[    0.935540]  [<ffffffff815835f0>] xenbus_otherend_changed+0xb0/0xc0
[    0.935540]  [<ffffffff81581d10>] ? xs_watch+0x60/0x60
[    0.935540]  [<ffffffff815851d3>] backend_changed+0x13/0x20
[    0.935540]  [<ffffffff81581d55>] xenwatch_thread+0x45/0x130
[    0.935540]  [<ffffffff8110d590>] ? __init_waitqueue_head+0x60/0x60
[    0.935540]  [<ffffffff810ee394>] kthread+0xe4/0x100
[    0.935540]  [<ffffffff81b8ddb0>] ? _raw_spin_unlock_irq+0x30/0x50
[    0.935540]  [<ffffffff810ee2b0>] ? __init_kthread_worker+0x70/0x70
[    0.935540]  [<ffffffff81b8efbc>] ret_from_fork+0x7c/0xb0
[    0.935540]  [<ffffffff810ee2b0>] ? __init_kthread_worker+0x70/0x70



--
Sander

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH V6 net-next 1/5] xen-netback: Factor queue-specific data into queue struct.
  2014-03-03 11:47 ` [PATCH V6 net-next 1/5] xen-netback: Factor queue-specific data into queue struct Andrew J. Bennieston
@ 2014-03-14 15:55   ` Ian Campbell
  2014-03-17 11:53     ` Andrew Bennieston
  0 siblings, 1 reply; 21+ messages in thread
From: Ian Campbell @ 2014-03-14 15:55 UTC (permalink / raw)
  To: Andrew J. Bennieston
  Cc: xen-devel, wei.liu2, paul.durrant, netdev, david.vrabel

On Mon, 2014-03-03 at 11:47 +0000, Andrew J. Bennieston wrote:
> From: "Andrew J. Bennieston" <andrew.bennieston@citrix.com>
> 
> In preparation for multi-queue support in xen-netback, move the
> queue-specific data from struct xenvif into struct xenvif_queue, and
> update the rest of the code to use this.
> 
> Also[...]
> 
> Finally,[...]

This is already quite a big patch, and I don't think the commit log
covers everything it changes/refactors, does it?

It's always a good idea to break these things apart but in particular
separating the mechanical stuff (s/vif/queue/g) from the non-mechanical
stuff, since the mechanical stuff is essentially trivial to review and
getting it out the way makes the non-mechanical stuff much easier to
check (or even spot).


> 
> Signed-off-by: Andrew J. Bennieston <andrew.bennieston@citrix.com>
> Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
> ---
>  drivers/net/xen-netback/common.h    |   85 ++++--
>  drivers/net/xen-netback/interface.c |  329 ++++++++++++++--------
>  drivers/net/xen-netback/netback.c   |  530 ++++++++++++++++++-----------------
>  drivers/net/xen-netback/xenbus.c    |   87 ++++--
>  4 files changed, 608 insertions(+), 423 deletions(-)
> 
> diff --git a/drivers/net/xen-netback/common.h b/drivers/net/xen-netback/common.h
> index ae413a2..4176539 100644
> --- a/drivers/net/xen-netback/common.h
> +++ b/drivers/net/xen-netback/common.h
> @@ -108,17 +108,39 @@ struct xenvif_rx_meta {
>   */
>  #define MAX_GRANT_COPY_OPS (MAX_SKB_FRAGS * XEN_NETIF_RX_RING_SIZE)
>  
> -struct xenvif {
> -	/* Unique identifier for this interface. */
> -	domid_t          domid;
> -	unsigned int     handle;
> +/* Queue name is interface name with "-qNNN" appended */
> +#define QUEUE_NAME_SIZE (IFNAMSIZ + 6)

One more than necessary? Or does IFNAMSIZ not include the NULL? (I can't
figure out if it does or not!)

> [...] 
> -	/* This array is allocated seperately as it is large */
> -	struct gnttab_copy *grant_copy_op;
> +	struct gnttab_copy grant_copy_op[MAX_GRANT_COPY_OPS];

Is this deliberate? It seems like a retrograde step reverting parts of
ac3d5ac27735 "xen-netback: fix guest-receive-side array sizes" from Paul
(at least you are nuking a speeling erorr)

How does this series interact with Zoltan's foreign mapping one? Badly I
should imagine, are you going to rebase?

> +	/* First, check if there is only one queue to optimise the
> +	 * single-queue or old frontend scenario.
> +	 */
> +	if (vif->num_queues == 1) {
> +		queue_index = 0;
> +	} else {
> +		/* Use skb_get_hash to obtain an L4 hash if available */
> +		hash = skb_get_hash(skb);
> +		queue_index = (u16) (((u64)hash * vif->num_queues) >> 32);

No modulo num_queues here?

Is the multiply and shift from some best practice somewhere? Or else
what is it doing?


> +	/* Obtain the queue to be used to transmit this packet */
> +	index = skb_get_queue_mapping(skb);
> +	if (index >= vif->num_queues)
> +		index = 0; /* Fall back to queue 0 if out of range */

Is this actually allowed to happen?

Even if yes, not modulo num_queue so spread it around a bit?

>  static void xenvif_up(struct xenvif *vif)
>  {
> -	napi_enable(&vif->napi);
> -	enable_irq(vif->tx_irq);
> -	if (vif->tx_irq != vif->rx_irq)
> -		enable_irq(vif->rx_irq);
> -	xenvif_check_rx_xenvif(vif);
> +	struct xenvif_queue *queue = NULL;
> +	unsigned int queue_index;
> +
> +	for (queue_index = 0; queue_index < vif->num_queues; ++queue_index) {

> This vif->num_queues -- is it the same as dev->num_tx_queues? Or are
there differing concepts of queue around?

> +		queue = &vif->queues[queue_index];
> +		napi_enable(&queue->napi);
> +		enable_irq(queue->tx_irq);
> +		if (queue->tx_irq != queue->rx_irq)
> +			enable_irq(queue->rx_irq);
> +		xenvif_check_rx_xenvif(queue);
> +	}
>  }
>  
>  static void xenvif_down(struct xenvif *vif)
>  {
> -	napi_disable(&vif->napi);
> -	disable_irq(vif->tx_irq);
> -	if (vif->tx_irq != vif->rx_irq)
> -		disable_irq(vif->rx_irq);
> -	del_timer_sync(&vif->credit_timeout);
> +	struct xenvif_queue *queue = NULL;
> +	unsigned int queue_index;

Why unsigned?

> @@ -496,9 +497,30 @@ static void connect(struct backend_info *be)
>  		return;
>  	}
>  
> -	xen_net_read_rate(dev, &be->vif->credit_bytes,
> -			  &be->vif->credit_usec);
> -	be->vif->remaining_credit = be->vif->credit_bytes;
> +	xen_net_read_rate(dev, &credit_bytes, &credit_usec);
> +	read_xenbus_vif_flags(be);
> +
> +	be->vif->num_queues = 1;
> +	be->vif->queues = vzalloc(be->vif->num_queues *
> +			sizeof(struct xenvif_queue));
> +
> +	for (queue_index = 0; queue_index < be->vif->num_queues; ++queue_index) {
> +		queue = &be->vif->queues[queue_index];
> +		queue->vif = be->vif;
> +		queue->id = queue_index;
> +		snprintf(queue->name, sizeof(queue->name), "%s-q%u",
> +				be->vif->dev->name, queue->id);
> +
> +		xenvif_init_queue(queue);
> +
> +		queue->remaining_credit = credit_bytes;
> +
> +		err = connect_rings(be, queue);
> +		if (err)
> +			goto err;
> +	}
> +
> +	xenvif_carrier_on(be->vif);
>  
>  	unregister_hotplug_status_watch(be);
>  	err = xenbus_watch_pathfmt(dev, &be->hotplug_status_watch,
> @@ -507,18 +529,24 @@ static void connect(struct backend_info *be)
>  	if (!err)
>  		be->have_hotplug_status_watch = 1;
>  
> -	netif_wake_queue(be->vif->dev);
> +	netif_tx_wake_all_queues(be->vif->dev);
> +
> +	return;
> +
> +err:
> +	vfree(be->vif->queues);
> +	be->vif->queues = NULL;
> +	be->vif->num_queues = 0;
> +	return;

Do you not need to unwind the setup already done on the previous queues
before the failure?

>  }
>  
> 
> -static int connect_rings(struct backend_info *be)
> +static int connect_rings(struct backend_info *be, struct xenvif_queue *queue)
>  {
> -	struct xenvif *vif = be->vif;
>  	struct xenbus_device *dev = be->dev;
>  	unsigned long tx_ring_ref, rx_ring_ref;
> -	unsigned int tx_evtchn, rx_evtchn, rx_copy;
> +	unsigned int tx_evtchn, rx_evtchn;
>  	int err;
> -	int val;
>  
>  	err = xenbus_gather(XBT_NIL, dev->otherend,
>  			    "tx-ring-ref", "%lu", &tx_ring_ref,
> @@ -546,6 +574,27 @@ static int connect_rings(struct backend_info *be)
>  		rx_evtchn = tx_evtchn;
>  	}
>  
> +	/* Map the shared frame, irq etc. */
> +	err = xenvif_connect(queue, tx_ring_ref, rx_ring_ref,
> +			     tx_evtchn, rx_evtchn);
> +	if (err) {
> +		xenbus_dev_fatal(dev, err,
> +				 "mapping shared-frames %lu/%lu port tx %u rx %u",
> +				 tx_ring_ref, rx_ring_ref,
> +				 tx_evtchn, rx_evtchn);
> +		return err;
> +	}
> +
> +	return 0;
> +}
> +

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH V6 net-next 2/5] xen-netback: Add support for multiple queues
  2014-03-03 11:47 ` [PATCH V6 net-next 2/5] xen-netback: Add support for multiple queues Andrew J. Bennieston
@ 2014-03-14 16:03   ` Ian Campbell
  2014-03-18 10:48     ` Andrew Bennieston
  0 siblings, 1 reply; 21+ messages in thread
From: Ian Campbell @ 2014-03-14 16:03 UTC (permalink / raw)
  To: Andrew J. Bennieston
  Cc: xen-devel, wei.liu2, paul.durrant, netdev, david.vrabel

On Mon, 2014-03-03 at 11:47 +0000, Andrew J. Bennieston wrote:
> From: "Andrew J. Bennieston" <andrew.bennieston@citrix.com>
> 
> Builds on the refactoring of the previous patch to implement multiple
> queues between xen-netfront and xen-netback.
> 
> Writes the maximum supported number of queues into XenStore, and reads
> the values written by the frontend to determine how many queues to use.

> 
> Ring references and event channels are read from XenStore on a per-queue
> basis and rings are connected accordingly.
> 
> Signed-off-by: Andrew J. Bennieston <andrew.bennieston@citrix.com>
> Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
> ---
>  drivers/net/xen-netback/common.h    |    2 +
>  drivers/net/xen-netback/interface.c |    7 +++-
>  drivers/net/xen-netback/netback.c   |    8 ++++
>  drivers/net/xen-netback/xenbus.c    |   76 ++++++++++++++++++++++++++++++-----
>  4 files changed, 82 insertions(+), 11 deletions(-)
> 
> diff --git a/drivers/net/xen-netback/common.h b/drivers/net/xen-netback/common.h
> index 4176539..e72bf38 100644
> --- a/drivers/net/xen-netback/common.h
> +++ b/drivers/net/xen-netback/common.h
> @@ -261,4 +261,6 @@ void xenvif_carrier_on(struct xenvif *vif);
>  
>  extern bool separate_tx_rx_irq;
>  
> +extern unsigned int xenvif_max_queues;
> +
>  #endif /* __XEN_NETBACK__COMMON_H__ */
> diff --git a/drivers/net/xen-netback/interface.c b/drivers/net/xen-netback/interface.c
> index 0297980..3f623b4 100644
> --- a/drivers/net/xen-netback/interface.c
> +++ b/drivers/net/xen-netback/interface.c
> @@ -381,7 +381,12 @@ struct xenvif *xenvif_alloc(struct device *parent, domid_t domid,
>  	char name[IFNAMSIZ] = {};
>  
>  	snprintf(name, IFNAMSIZ - 1, "vif%u.%u", domid, handle);
> -	dev = alloc_netdev_mq(sizeof(struct xenvif), name, ether_setup, 1);
> +	/* Allocate a netdev with the max. supported number of queues.
> +	 * When the guest selects the desired number, it will be updated
> +	 * via netif_set_real_num_tx_queues().

Does this allocate and then waste a load of resources? Or does it free
them when you shrink things?

I suppose it is not possible to allocate small and grow or you'd have
done so?

Can the host/guest admin change the number of queues on the fly?

>  	xen_net_read_rate(dev, &credit_bytes, &credit_usec);
>  	read_xenbus_vif_flags(be);
>  
> -	be->vif->num_queues = 1;
> +	/* Use the number of queues requested by the frontend */
> +	be->vif->num_queues = requested_num_queues;
>  	be->vif->queues = vzalloc(be->vif->num_queues *
>  			sizeof(struct xenvif_queue));
> +	rtnl_lock();
> +	netif_set_real_num_tx_queues(be->vif->dev, be->vif->num_queues);
> +	rtnl_unlock();

I'm always a bit suspicious of this construct -- it makes me think the
call is happening from the wrong context and that the right context
would naturally hold the lock already.

>  
>  	for (queue_index = 0; queue_index < be->vif->num_queues; ++queue_index) {
>  		queue = &be->vif->queues[queue_index];
> @@ -547,29 +575,52 @@ static int connect_rings(struct backend_info *be, struct xenvif_queue *queue)
>  	unsigned long tx_ring_ref, rx_ring_ref;
>  	unsigned int tx_evtchn, rx_evtchn;
>  	int err;
> +	char *xspath;
> +	size_t xspathsize;
> +	const size_t xenstore_path_ext_size = 11; /* sufficient for "/queue-NNN" */
> +
> +	/* If the frontend requested 1 queue, or we have fallen back
> +	 * to single queue due to lack of frontend support for multi-
> +	 * queue, expect the remaining XenStore keys in the toplevel
> +	 * directory. Otherwise, expect them in a subdirectory called
> +	 * queue-N.
> +	 */
> +	if (queue->vif->num_queues == 1) {
> +		xspath = (char *)dev->otherend;

Casting away a const is naughty. Either make xspath const or if that
isn't possible make it dynamic in all cases with a strcpy in this
degenerate case.

> +	} else {
> +		xspathsize = strlen(dev->otherend) + xenstore_path_ext_size;
> +		xspath = kzalloc(xspathsize, GFP_KERNEL);
> +		if (!xspath) {
> +			xenbus_dev_fatal(dev, -ENOMEM,
> +					"reading ring references");
> +			return -ENOMEM;
> +		}
> +		snprintf(xspath, xspathsize, "%s/queue-%u", dev->otherend,
> +				 queue->id);
> +	}
>  
[...]
> @@ -582,10 +633,15 @@ static int connect_rings(struct backend_info *be, struct xenvif_queue *queue)
>  				 "mapping shared-frames %lu/%lu port tx %u rx %u",
>  				 tx_ring_ref, rx_ring_ref,
>  				 tx_evtchn, rx_evtchn);
> -		return err;
> +		goto err;
>  	}
>  
> -	return 0;
> +	err = 0;
> +err: /* Regular return falls through with err == 0 */
> +	if (xspath != dev->otherend)
> +		kfree(xspath);

Yet another reason to not cast away the const!

> +
> +	return err;
>  }
>  
>  static int read_xenbus_vif_flags(struct backend_info *be)

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH V6 net-next 5/5] xen-net{back,front}: Document multi-queue feature in netif.h
  2014-03-03 11:47 ` [PATCH V6 net-next 5/5] xen-net{back, front}: Document multi-queue feature in netif.h Andrew J. Bennieston
  2014-03-03 12:53   ` [PATCH V6 net-next 5/5] xen-net{back,front}: " Paul Durrant
@ 2014-03-14 16:04   ` Ian Campbell
  1 sibling, 0 replies; 21+ messages in thread
From: Ian Campbell @ 2014-03-14 16:04 UTC (permalink / raw)
  To: Andrew J. Bennieston
  Cc: xen-devel, wei.liu2, paul.durrant, netdev, david.vrabel

On Mon, 2014-03-03 at 11:47 +0000, Andrew J. Bennieston wrote:
> From: "Andrew J. Bennieston" <andrew.bennieston@citrix.com>
> 
> Document the multi-queue feature in terms of XenStore keys to be written
> by the backend and by the frontend.
> 
> Signed-off-by: Andrew J. Bennieston <andrew.bennieston@citrix.com>
> ---
>  include/xen/interface/io/netif.h |   29 +++++++++++++++++++++++++++++
>  1 file changed, 29 insertions(+)
> 
> diff --git a/include/xen/interface/io/netif.h b/include/xen/interface/io/netif.h
> index c50061d..a375a75 100644
> --- a/include/xen/interface/io/netif.h
> +++ b/include/xen/interface/io/netif.h

I assume this is (going to be) identical to the patch to the Xen copy of
these headers. So once I ack that feel free to sync the text and apply
the ack here too.

Ian.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Xen-devel] [PATCH V6 net-next 0/5] xen-net{back, front}: Multiple transmit and receive queues
  2014-03-06 16:52 ` Sander Eikelenboom
@ 2014-03-14 16:06   ` Ian Campbell
  2014-03-14 16:21     ` Sander Eikelenboom
  0 siblings, 1 reply; 21+ messages in thread
From: Ian Campbell @ 2014-03-14 16:06 UTC (permalink / raw)
  To: Sander Eikelenboom
  Cc: Andrew J. Bennieston, xen-devel, netdev, paul.durrant, wei.liu2,
	david.vrabel

On Thu, 2014-03-06 at 17:52 +0100, Sander Eikelenboom wrote:
> Hi Andrew,
> 
> Just tried your series but I ran into this lockdep warning:

In the guest (so netfront), correct?

> 
> [    0.932289]
> [    0.932293] =============================================
> [    0.932297] [ INFO: possible recursive locking detected ]
> [    0.932302] 3.14.0-rc5-20140306-xennext-netnext-bennie+ #1 Not tainted
> [    0.932306] ---------------------------------------------
> [    0.932311] xenwatch/26 is trying to acquire lock:
> [    0.932315]  (&(&queue->rx_lock)->rlock){+.....}, at: [<ffffffff817b30f4>] netback_changed+0xc84/0xea0
> [    0.932328]
> [    0.932328] but task is already holding lock:
> [    0.932333]  (&(&queue->rx_lock)->rlock){+.....}, at: [<ffffffff817b30f4>] netback_changed+0xc84/0xea0
> [    0.932343]
> [    0.932343] other info that might help us debug this:
> [    0.932348]  Possible unsafe locking scenario:
> [    0.932348]
> [    0.932353]        CPU0
> [    0.932355]        ----
> [    0.932358]   lock(&(&queue->rx_lock)->rlock);
> [    0.932363]   lock(&(&queue->rx_lock)->rlock);
> [    0.932367]
> [    0.932367]  *** DEADLOCK ***
> [    0.932367]
> [    0.932372]  May be due to missing lock nesting notation
> [    0.932372]
> [    0.932378] 3 locks held by xenwatch/26:
> [    0.935540]  #0:  (xenwatch_mutex){+.+.+.}, at: [<ffffffff81581d96>] xenwatch_thread+0x86/0x130
> [    0.935540]  #1:  (&(&queue->rx_lock)->rlock){+.....}, at: [<ffffffff817b30f4>] netback_changed+0xc84/0xea0
> [    0.935540]  #2:  (&(&queue->tx_lock)->rlock){......}, at: [<ffffffff817b3101>] netback_changed+0xc91/0xea0
> [    0.935540]
> [    0.935540] stack backtrace:
> [    0.935540] CPU: 1 PID: 26 Comm: xenwatch Not tainted 3.14.0-rc5-20140306-xennext-netnext-bennie+ #1
> [    0.935540]  ffffffff82766230 ffff88001eac3b98 ffffffff81b83684 ffff88001e97d870
> [    0.935540]  ffffffff82766230 ffff88001eac3c68 ffffffff81115b7e 00000000000233a0
> [    0.935540]  ffffffff00000003 ffffffff82766230 ffffffff82ca7ec0 5001f47aeae10000
> [    0.935540] Call Trace:
> [    0.935540]  [<ffffffff81b83684>] dump_stack+0x46/0x58
> [    0.935540]  [<ffffffff81115b7e>] __lock_acquire+0x86e/0x2220
> [    0.935540]  [<ffffffff811e40be>] ? kfree+0x1ee/0x200
> [    0.935540]  [<ffffffff81117b9d>] lock_acquire+0xbd/0x150
> [    0.935540]  [<ffffffff817b30f4>] ? netback_changed+0xc84/0xea0
> [    0.935540]  [<ffffffff81b8c4fe>] ? mutex_unlock+0xe/0x10
> [    0.935540]  [<ffffffff817b00f4>] ? xennet_release_tx_bufs+0x104/0x110
> [    0.935540]  [<ffffffff81b8d7cf>] _raw_spin_lock_bh+0x3f/0x50
> [    0.935540]  [<ffffffff817b30f4>] ? netback_changed+0xc84/0xea0
> [    0.935540]  [<ffffffff817b30f4>] netback_changed+0xc84/0xea0
> [    0.935540]  [<ffffffff815835f0>] xenbus_otherend_changed+0xb0/0xc0
> [    0.935540]  [<ffffffff81581d10>] ? xs_watch+0x60/0x60
> [    0.935540]  [<ffffffff815851d3>] backend_changed+0x13/0x20
> [    0.935540]  [<ffffffff81581d55>] xenwatch_thread+0x45/0x130
> [    0.935540]  [<ffffffff8110d590>] ? __init_waitqueue_head+0x60/0x60
> [    0.935540]  [<ffffffff810ee394>] kthread+0xe4/0x100
> [    0.935540]  [<ffffffff81b8ddb0>] ? _raw_spin_unlock_irq+0x30/0x50
> [    0.935540]  [<ffffffff810ee2b0>] ? __init_kthread_worker+0x70/0x70
> [    0.935540]  [<ffffffff81b8efbc>] ret_from_fork+0x7c/0xb0
> [    0.935540]  [<ffffffff810ee2b0>] ? __init_kthread_worker+0x70/0x70
> 
> 
> 
> --
> Sander
> 

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH V6 net-next 0/5] xen-net{back,front}: Multiple transmit and receive queues
  2014-03-03 11:47 [PATCH V6 net-next 0/5] xen-net{back, front}: Multiple transmit and receive queues Andrew J. Bennieston
                   ` (7 preceding siblings ...)
  2014-03-06 16:52 ` Sander Eikelenboom
@ 2014-03-14 16:10 ` Ian Campbell
  2014-03-14 16:16   ` [Xen-devel] [PATCH V6 net-next 0/5] xen-net{back, front}: " Ian Campbell
  8 siblings, 1 reply; 21+ messages in thread
From: Ian Campbell @ 2014-03-14 16:10 UTC (permalink / raw)
  To: Andrew J. Bennieston
  Cc: xen-devel, wei.liu2, paul.durrant, netdev, david.vrabel

On Mon, 2014-03-03 at 11:47 +0000, Andrew J. Bennieston wrote:
> This patch series implements multiple transmit and receive queues (i.e.
> multiple shared rings) for the xen virtual network interfaces.
> 
> The series is split up as follows:
>  - Patches 1 and 3 factor out the queue-specific data for netback and
>     netfront respectively, and modify the rest of the code to use these
>     as appropriate.
>  - Patches 2 and 4 introduce new XenStore keys to negotiate and use
>    multiple shared rings and event channels, and code to connect these
>    as appropriate.
>  - Patch 5 documents the XenStore keys required for the new feature
>    in include/xen/interface/io/netif.h
> 
> All other transmit and receive processing remains unchanged, i.e. there
> is a kthread per queue and a NAPI context per queue.
> 
> The performance of these patches has been analysed in detail, with
> results available at:
> 
> http://wiki.xenproject.org/wiki/Xen-netback_and_xen-netfront_multi-queue_performance_testing
> 
> To summarise:
>   * Using multiple queues allows a VM to transmit at line rate on a 10
>     Gbit/s NIC, compared with a maximum aggregate throughput of 6 Gbit/s
>     with a single queue.
>   * For intra-host VM--VM traffic, eight queues provide 171% of the
>     throughput of a single queue; almost 12 Gbit/s instead of 6 Gbit/s.

From the graphs it looks like 8 queues doesn't offer that much over 4
and the bulk of the improvement comes from going to just 2 queues.

Any idea what the bottleneck is? i.e. why does the graph flatten so
soon?

>   * There is a corresponding increase in total CPU usage, i.e. this is a
>     scaling out over available resources, not an efficiency improvement.

corresponding to the number of queues or the throughput improvement?
i.e. is it 8x or 1.71x with 8 queues?

Ian.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Xen-devel] [PATCH V6 net-next 0/5] xen-net{back, front}: Multiple transmit and receive queues
  2014-03-14 16:10 ` [PATCH V6 net-next 0/5] xen-net{back,front}: " Ian Campbell
@ 2014-03-14 16:16   ` Ian Campbell
  0 siblings, 0 replies; 21+ messages in thread
From: Ian Campbell @ 2014-03-14 16:16 UTC (permalink / raw)
  To: Andrew J. Bennieston
  Cc: xen-devel, paul.durrant, wei.liu2, david.vrabel, netdev

On Fri, 2014-03-14 at 16:10 +0000, Ian Campbell wrote:
> On Mon, 2014-03-03 at 11:47 +0000, Andrew J. Bennieston wrote:
> > This patch series implements multiple transmit and receive queues (i.e.
> > multiple shared rings) for the xen virtual network interfaces.
> > 
> > The series is split up as follows:
> >  - Patches 1 and 3 factor out the queue-specific data for netback and
> >     netfront respectively, and modify the rest of the code to use these
> >     as appropriate.
> >  - Patches 2 and 4 introduce new XenStore keys to negotiate and use
> >    multiple shared rings and event channels, and code to connect these
> >    as appropriate.
> >  - Patch 5 documents the XenStore keys required for the new feature
> >    in include/xen/interface/io/netif.h
> > 
> > All other transmit and receive processing remains unchanged, i.e. there
> > is a kthread per queue and a NAPI context per queue.
> > 
> > The performance of these patches has been analysed in detail, with
> > results available at:
> > 
> > http://wiki.xenproject.org/wiki/Xen-netback_and_xen-netfront_multi-queue_performance_testing
> > 
> > To summarise:
> >   * Using multiple queues allows a VM to transmit at line rate on a 10
> >     Gbit/s NIC, compared with a maximum aggregate throughput of 6 Gbit/s
> >     with a single queue.
> >   * For intra-host VM--VM traffic, eight queues provide 171% of the
> >     throughput of a single queue; almost 12 Gbit/s instead of 6 Gbit/s.
> 
> From the graphs it looks like 8 queues doesn't offer that much over 4
> and the bulk of the improvement comes from going to just 2 queues.
> 
> Any idea what the bottleneck is? i.e. why does the graph flatten so
> soon?

It's going offbox over a 10G link isn't it, so ignore me.

> 
> >   * There is a corresponding increase in total CPU usage, i.e. this is a
> >     scaling out over available resources, not an efficiency improvement.
> 
> corresponding to the number of queues or the throughput improvement?
> i.e. is it 8x or 1.71x with 8 queues?
> 
> Ian.
> 
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Xen-devel] [PATCH V6 net-next 0/5] xen-net{back, front}: Multiple transmit and receive queues
  2014-03-14 16:06   ` Ian Campbell
@ 2014-03-14 16:21     ` Sander Eikelenboom
  0 siblings, 0 replies; 21+ messages in thread
From: Sander Eikelenboom @ 2014-03-14 16:21 UTC (permalink / raw)
  To: Ian Campbell
  Cc: Andrew J. Bennieston, xen-devel, netdev, paul.durrant, wei.liu2,
	david.vrabel


Friday, March 14, 2014, 5:06:15 PM, you wrote:

> On Thu, 2014-03-06 at 17:52 +0100, Sander Eikelenboom wrote:
>> Hi Andrew,
>> 
>> Just tried your series but I ran into this lockdep warning:

> In the guest (so netfront), correct?

Erhmm yes correct, sorry for not mentioning that.

>> 
>> [    0.932289]
>> [    0.932293] =============================================
>> [    0.932297] [ INFO: possible recursive locking detected ]
>> [    0.932302] 3.14.0-rc5-20140306-xennext-netnext-bennie+ #1 Not tainted
>> [    0.932306] ---------------------------------------------
>> [    0.932311] xenwatch/26 is trying to acquire lock:
>> [    0.932315]  (&(&queue->rx_lock)->rlock){+.....}, at: [<ffffffff817b30f4>] netback_changed+0xc84/0xea0
>> [    0.932328]
>> [    0.932328] but task is already holding lock:
>> [    0.932333]  (&(&queue->rx_lock)->rlock){+.....}, at: [<ffffffff817b30f4>] netback_changed+0xc84/0xea0
>> [    0.932343]
>> [    0.932343] other info that might help us debug this:
>> [    0.932348]  Possible unsafe locking scenario:
>> [    0.932348]
>> [    0.932353]        CPU0
>> [    0.932355]        ----
>> [    0.932358]   lock(&(&queue->rx_lock)->rlock);
>> [    0.932363]   lock(&(&queue->rx_lock)->rlock);
>> [    0.932367]
>> [    0.932367]  *** DEADLOCK ***
>> [    0.932367]
>> [    0.932372]  May be due to missing lock nesting notation
>> [    0.932372]
>> [    0.932378] 3 locks held by xenwatch/26:
>> [    0.935540]  #0:  (xenwatch_mutex){+.+.+.}, at: [<ffffffff81581d96>] xenwatch_thread+0x86/0x130
>> [    0.935540]  #1:  (&(&queue->rx_lock)->rlock){+.....}, at: [<ffffffff817b30f4>] netback_changed+0xc84/0xea0
>> [    0.935540]  #2:  (&(&queue->tx_lock)->rlock){......}, at: [<ffffffff817b3101>] netback_changed+0xc91/0xea0
>> [    0.935540]
>> [    0.935540] stack backtrace:
>> [    0.935540] CPU: 1 PID: 26 Comm: xenwatch Not tainted 3.14.0-rc5-20140306-xennext-netnext-bennie+ #1
>> [    0.935540]  ffffffff82766230 ffff88001eac3b98 ffffffff81b83684 ffff88001e97d870
>> [    0.935540]  ffffffff82766230 ffff88001eac3c68 ffffffff81115b7e 00000000000233a0
>> [    0.935540]  ffffffff00000003 ffffffff82766230 ffffffff82ca7ec0 5001f47aeae10000
>> [    0.935540] Call Trace:
>> [    0.935540]  [<ffffffff81b83684>] dump_stack+0x46/0x58
>> [    0.935540]  [<ffffffff81115b7e>] __lock_acquire+0x86e/0x2220
>> [    0.935540]  [<ffffffff811e40be>] ? kfree+0x1ee/0x200
>> [    0.935540]  [<ffffffff81117b9d>] lock_acquire+0xbd/0x150
>> [    0.935540]  [<ffffffff817b30f4>] ? netback_changed+0xc84/0xea0
>> [    0.935540]  [<ffffffff81b8c4fe>] ? mutex_unlock+0xe/0x10
>> [    0.935540]  [<ffffffff817b00f4>] ? xennet_release_tx_bufs+0x104/0x110
>> [    0.935540]  [<ffffffff81b8d7cf>] _raw_spin_lock_bh+0x3f/0x50
>> [    0.935540]  [<ffffffff817b30f4>] ? netback_changed+0xc84/0xea0
>> [    0.935540]  [<ffffffff817b30f4>] netback_changed+0xc84/0xea0
>> [    0.935540]  [<ffffffff815835f0>] xenbus_otherend_changed+0xb0/0xc0
>> [    0.935540]  [<ffffffff81581d10>] ? xs_watch+0x60/0x60
>> [    0.935540]  [<ffffffff815851d3>] backend_changed+0x13/0x20
>> [    0.935540]  [<ffffffff81581d55>] xenwatch_thread+0x45/0x130
>> [    0.935540]  [<ffffffff8110d590>] ? __init_waitqueue_head+0x60/0x60
>> [    0.935540]  [<ffffffff810ee394>] kthread+0xe4/0x100
>> [    0.935540]  [<ffffffff81b8ddb0>] ? _raw_spin_unlock_irq+0x30/0x50
>> [    0.935540]  [<ffffffff810ee2b0>] ? __init_kthread_worker+0x70/0x70
>> [    0.935540]  [<ffffffff81b8efbc>] ret_from_fork+0x7c/0xb0
>> [    0.935540]  [<ffffffff810ee2b0>] ? __init_kthread_worker+0x70/0x70
>> 
>> 
>> 
>> --
>> Sander
>> 

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH V6 net-next 1/5] xen-netback: Factor queue-specific data into queue struct.
  2014-03-14 15:55   ` Ian Campbell
@ 2014-03-17 11:53     ` Andrew Bennieston
  2014-03-17 12:19       ` Ian Campbell
  0 siblings, 1 reply; 21+ messages in thread
From: Andrew Bennieston @ 2014-03-17 11:53 UTC (permalink / raw)
  To: Ian Campbell; +Cc: xen-devel, wei.liu2, paul.durrant, netdev, david.vrabel

On 14/03/14 15:55, Ian Campbell wrote:
> On Mon, 2014-03-03 at 11:47 +0000, Andrew J. Bennieston wrote:
>> From: "Andrew J. Bennieston" <andrew.bennieston@citrix.com>
>>
>> In preparation for multi-queue support in xen-netback, move the
>> queue-specific data from struct xenvif into struct xenvif_queue, and
>> update the rest of the code to use this.
>>
>> Also[...]
>>
>> Finally,[...]
>
> This is already quite a big patch, and I don't think the commit log
> covers everything it changes/refactors, does it?
>
> It's always a good idea to break these things apart but in particular
> separating the mechanical stuff (s/vif/queue/g) from the non-mechanical
> stuff, since the mechanical stuff is essentially trivial to review and
> getting it out the way makes the non-mechanical stuff much easier to
> check (or even spot).
>

The vast majority of changes in this patch are s/vif/queue/g. The rest
are related changes, such as inserting loops over queues, and moving
queue-specific initialisation away from the vif-wide initialisation, so
that it can be done once per queue.

I consider these things to be logically related and definitely within
the purview of this single patch. Without doing this, it is difficult to
get a patch that results in something that even compiles, without
putting in a bunch of placeholder code that will be removed in the very
next patch.

When I split this feature into multiple patches, I took care to group
as little as possible into this first patch (and the same for netfront).
It is still a large patch, but by my count most of this is a simple
replacement of vif with queue...

A first-order approximation, searching for line pairs where the first
has 'vif' and the second has 'queue', yields:

➜  xen-netback git:(saturn) git show HEAD~4 | grep -A 1 vif | grep queue | wc -l
380

i.e. 760 (=380*2) lines out of the 2240 (~ 34%) are trivial replacements
of vif with queue, and this is not counting multi-line replacements, of
which there are many. What remains is mostly adding loops over these
queues. This could, in principle, be done in a second patch, but the
impact of this is small.

>
>>
>> Signed-off-by: Andrew J. Bennieston <andrew.bennieston@citrix.com>
>> Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
>> ---
>>   drivers/net/xen-netback/common.h    |   85 ++++--
>>   drivers/net/xen-netback/interface.c |  329 ++++++++++++++--------
>>   drivers/net/xen-netback/netback.c   |  530 ++++++++++++++++++-----------------
>>   drivers/net/xen-netback/xenbus.c    |   87 ++++--
>>   4 files changed, 608 insertions(+), 423 deletions(-)
>>
>> diff --git a/drivers/net/xen-netback/common.h b/drivers/net/xen-netback/common.h
>> index ae413a2..4176539 100644
>> --- a/drivers/net/xen-netback/common.h
>> +++ b/drivers/net/xen-netback/common.h
>> @@ -108,17 +108,39 @@ struct xenvif_rx_meta {
>>    */
>>   #define MAX_GRANT_COPY_OPS (MAX_SKB_FRAGS * XEN_NETIF_RX_RING_SIZE)
>>
>> -struct xenvif {
>> -	/* Unique identifier for this interface. */
>> -	domid_t          domid;
>> -	unsigned int     handle;
>> +/* Queue name is interface name with "-qNNN" appended */
>> +#define QUEUE_NAME_SIZE (IFNAMSIZ + 6)
>
> One more than necessary? Or does IFNAMSIZ not include the NULL? (I can't
> figure out if it does or not!)

interface.c contains the line:
snprintf(name, IFNAMSIZ - 1, "vif%u.%u", domid, handle);

This suggests that IFNAMSIZ counts the trailing NULL, so I can reduce
this count by 1 on that basis.
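
A minimal sketch of that arithmetic, for illustration only (assuming IFNAMSIZ does count the trailing NUL, and reflecting the reduced count agreed here rather than the posted patch):

/* The interface name is at most IFNAMSIZ - 1 characters, "-qNNN" adds
 * five more, and the NUL terminator is already inside IFNAMSIZ.
 */
#define QUEUE_NAME_SIZE (IFNAMSIZ + 5)

char name[QUEUE_NAME_SIZE];

snprintf(name, sizeof(name), "%s-q%u", vif->dev->name, queue->id);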

>
>> [...]
>> -	/* This array is allocated seperately as it is large */
>> -	struct gnttab_copy *grant_copy_op;
>> +	struct gnttab_copy grant_copy_op[MAX_GRANT_COPY_OPS];
>
> Is this deliberate? It seems like a retrograde step reverting parts of
> ac3d5ac27735 "xen-netback: fix guest-receive-side array sizes" from Paul
> (at least you are nuking a speeling erorr)

Yes, this was deliberate. These arrays were moved out to avoid problems
with kmalloc for the struct net_device (which contains the struct xenvif
in its netdev_priv space). Since the queues are now allocated via
vzalloc, there is no need to do separate allocations (with the
requirement to also separately free on every error/teardown path) so I
moved these back into the main queue structure.

>
> How does this series interact with Zoltan's foreign mapping one? Badly I
> should imagine, are you going to rebase?

I'm working on the rebase right now.

>
>> +	/* First, check if there is only one queue to optimise the
>> +	 * single-queue or old frontend scenario.
>> +	 */
>> +	if (vif->num_queues == 1) {
>> +		queue_index = 0;
>> +	} else {
>> +		/* Use skb_get_hash to obtain an L4 hash if available */
>> +		hash = skb_get_hash(skb);
>> +		queue_index = (u16) (((u64)hash * vif->num_queues) >> 32);
>
> No modulo num_queues here?
>
> Is the multiply and shift from some best practice somewhere? Or else
> what is it doing?

It seems to be what a bunch of other net drivers do in this scenario. I
guess the reasoning is it'll be faster than a mod num_queues.
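
A small standalone illustration of what the multiply-and-shift computes (the same pattern the kernel's reciprocal_scale() helper encapsulates; plain C here, not code from the patch):

#include <stdint.h>

/* hash / 2^32 lies in [0, 1); scaling by num_queues and keeping the
 * integer part therefore gives a value in 0 .. num_queues - 1 without
 * a modulo or division.
 */
static uint16_t hash_to_queue(uint32_t hash, uint16_t num_queues)
{
	return (uint16_t)(((uint64_t)hash * num_queues) >> 32);
}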

>
>
>> +	/* Obtain the queue to be used to transmit this packet */
>> +	index = skb_get_queue_mapping(skb);
>> +	if (index >= vif->num_queues)
>> +		index = 0; /* Fall back to queue 0 if out of range */
>
> Is this actually allowed to happen?
>
> Even if yes, not modulo num_queue so spread it around a bit?

This probably isn't allowed to happen. I figured it didn't hurt to be a
little defensive with the code here, and falling back to queue 0 is a
fairly safe thing to do.

>>   static void xenvif_up(struct xenvif *vif)
>>   {
>> -	napi_enable(&vif->napi);
>> -	enable_irq(vif->tx_irq);
>> -	if (vif->tx_irq != vif->rx_irq)
>> -		enable_irq(vif->rx_irq);
>> -	xenvif_check_rx_xenvif(vif);
>> +	struct xenvif_queue *queue = NULL;
>> +	unsigned int queue_index;
>> +
>> +	for (queue_index = 0; queue_index < vif->num_queues; ++queue_index) {
>
> > This vif->num_queues -- is it the same as dev->num_tx_queues? Or are
> there differing concepts of queue around?

It should be the same as dev->real_num_tx_queues, which may be less than
dev->num_tx_queues.

>> +		queue = &vif->queues[queue_index];
>> +		napi_enable(&queue->napi);
>> +		enable_irq(queue->tx_irq);
>> +		if (queue->tx_irq != queue->rx_irq)
>> +			enable_irq(queue->rx_irq);
>> +		xenvif_check_rx_xenvif(queue);
>> +	}
>>   }
>>
>>   static void xenvif_down(struct xenvif *vif)
>>   {
>> -	napi_disable(&vif->napi);
>> -	disable_irq(vif->tx_irq);
>> -	if (vif->tx_irq != vif->rx_irq)
>> -		disable_irq(vif->rx_irq);
>> -	del_timer_sync(&vif->credit_timeout);
>> +	struct xenvif_queue *queue = NULL;
>> +	unsigned int queue_index;
>
> Why unsigned?
Why not? You can't have a negative number of queues. Zero indicates "I
don't have any set up yet". I'm not expecting people to have 4 billion
or so queues, but equally I can't see a valid use for negative values
here.

>
>> @@ -496,9 +497,30 @@ static void connect(struct backend_info *be)
>>   		return;
>>   	}
>>
>> -	xen_net_read_rate(dev, &be->vif->credit_bytes,
>> -			  &be->vif->credit_usec);
>> -	be->vif->remaining_credit = be->vif->credit_bytes;
>> +	xen_net_read_rate(dev, &credit_bytes, &credit_usec);
>> +	read_xenbus_vif_flags(be);
>> +
>> +	be->vif->num_queues = 1;
>> +	be->vif->queues = vzalloc(be->vif->num_queues *
>> +			sizeof(struct xenvif_queue));
>> +
>> +	for (queue_index = 0; queue_index < be->vif->num_queues; ++queue_index) {
>> +		queue = &be->vif->queues[queue_index];
>> +		queue->vif = be->vif;
>> +		queue->id = queue_index;
>> +		snprintf(queue->name, sizeof(queue->name), "%s-q%u",
>> +				be->vif->dev->name, queue->id);
>> +
>> +		xenvif_init_queue(queue);
>> +
>> +		queue->remaining_credit = credit_bytes;
>> +
>> +		err = connect_rings(be, queue);
>> +		if (err)
>> +			goto err;
>> +	}
>> +
>> +	xenvif_carrier_on(be->vif);
>>
>>   	unregister_hotplug_status_watch(be);
>>   	err = xenbus_watch_pathfmt(dev, &be->hotplug_status_watch,
>> @@ -507,18 +529,24 @@ static void connect(struct backend_info *be)
>>   	if (!err)
>>   		be->have_hotplug_status_watch = 1;
>>
>> -	netif_wake_queue(be->vif->dev);
>> +	netif_tx_wake_all_queues(be->vif->dev);
>> +
>> +	return;
>> +
>> +err:
>> +	vfree(be->vif->queues);
>> +	be->vif->queues = NULL;
>> +	be->vif->num_queues = 0;
>> +	return;
>
> Do you not need to unwind the setup already done on the previous queues
> before the failure?


Err... yes. I was sure that code existed at some point, but I can't find
it now. Oops!
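
A sketch of the unwind being asked for (xenvif_deinit_queue() is an assumed helper name here, not code from the series):

err:
	/* Tear down the queues that were initialised before the failure. */
	while (queue_index-- > 0)
		xenvif_deinit_queue(&be->vif->queues[queue_index]);
	vfree(be->vif->queues);
	be->vif->queues = NULL;
	be->vif->num_queues = 0;
	return;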


-Andrew
>
>>   }
>>
>>
>> -static int connect_rings(struct backend_info *be)
>> +static int connect_rings(struct backend_info *be, struct xenvif_queue *queue)
>>   {
>> -	struct xenvif *vif = be->vif;
>>   	struct xenbus_device *dev = be->dev;
>>   	unsigned long tx_ring_ref, rx_ring_ref;
>> -	unsigned int tx_evtchn, rx_evtchn, rx_copy;
>> +	unsigned int tx_evtchn, rx_evtchn;
>>   	int err;
>> -	int val;
>>
>>   	err = xenbus_gather(XBT_NIL, dev->otherend,
>>   			    "tx-ring-ref", "%lu", &tx_ring_ref,
>> @@ -546,6 +574,27 @@ static int connect_rings(struct backend_info *be)
>>   		rx_evtchn = tx_evtchn;
>>   	}
>>
>> +	/* Map the shared frame, irq etc. */
>> +	err = xenvif_connect(queue, tx_ring_ref, rx_ring_ref,
>> +			     tx_evtchn, rx_evtchn);
>> +	if (err) {
>> +		xenbus_dev_fatal(dev, err,
>> +				 "mapping shared-frames %lu/%lu port tx %u rx %u",
>> +				 tx_ring_ref, rx_ring_ref,
>> +				 tx_evtchn, rx_evtchn);
>> +		return err;
>> +	}
>> +
>> +	return 0;
>> +}
>> +
>
>

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH V6 net-next 1/5] xen-netback: Factor queue-specific data into queue struct.
  2014-03-17 11:53     ` Andrew Bennieston
@ 2014-03-17 12:19       ` Ian Campbell
  0 siblings, 0 replies; 21+ messages in thread
From: Ian Campbell @ 2014-03-17 12:19 UTC (permalink / raw)
  To: Andrew Bennieston; +Cc: xen-devel, wei.liu2, paul.durrant, netdev, david.vrabel

On Mon, 2014-03-17 at 11:53 +0000, Andrew Bennieston wrote:
> On 14/03/14 15:55, Ian Campbell wrote:
> > On Mon, 2014-03-03 at 11:47 +0000, Andrew J. Bennieston wrote:
> >> From: "Andrew J. Bennieston" <andrew.bennieston@citrix.com>
> >>
> >> In preparation for multi-queue support in xen-netback, move the
> >> queue-specific data from struct xenvif into struct xenvif_queue, and
> >> update the rest of the code to use this.
> >>
> >> Also[...]
> >>
> >> Finally,[...]
> >
> > This is already quite a big patch, and I don't think the commit log
> > covers everything it changes/refactors, does it?
> >
> > It's always a good idea to break these things apart but in particular
> > separating the mechanical stuff (s/vif/queue/g) from the non-mechanical
> > stuff, since the mechanical stuff is essentially trivial to review and
> > getting it out the way makes the non-mechanical stuff much easier to
> > check (or even spot).
> >
> 
> The vast majority of changes in this patch are s/vif/queue/g. The rest
> are related changes, such as inserting loops over queues, and moving
> queue-specific initialisation away from the vif-wide initialisation, so
> that it can be done once per queue.
> 
> I consider these things to be logically related and definitely within
> the purview of this single patch. Without doing this, it is difficult to
> get a patch that results in something that even compiles, without
> putting in a bunch of placeholder code that will be removed in the very
> next patch.

Well, I'd have introduced a single xenvif_queue instance without telling
the core network stack we were doing multiqueue yet, which would allow
all the function arguments etc. to be changed mechanically without doing
a lot of the other work at the same time, like refactoring the carrier
handling or even adding the loops. But what's done is done now.

> 
> When I split this feature into multiple patches, I took care to group
> as little as possible into this first patch (and the same for netfront).
> It is still a large patch, but by my count most of this is a simple
> replacement of vif with queue...
> 
> A first-order approximation, searching for line pairs where the first
> has 'vif' and the second has 'queue', yields:
> 
> ➜  xen-netback git:(saturn) git show HEAD~4 | grep -A 1 vif | grep queue | wc -l
> 380
> 
> i.e. 760 (=380*2) lines out of the 2240 (~ 34%) are trivial replacements
> of vif with queue, and this is not counting multi-line replacements, of
> which there are many. What remains is mostly adding loops over these
> queues. This could, in principle, be done in a second patch, but the
> impact of this is small.

Actually, the readability/reviewability impact is quite high IMHO.

> 
> >
> >>
> >> Signed-off-by: Andrew J. Bennieston <andrew.bennieston@citrix.com>
> >> Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
> >> ---
> >>   drivers/net/xen-netback/common.h    |   85 ++++--
> >>   drivers/net/xen-netback/interface.c |  329 ++++++++++++++--------
> >>   drivers/net/xen-netback/netback.c   |  530 ++++++++++++++++++-----------------
> >>   drivers/net/xen-netback/xenbus.c    |   87 ++++--
> >>   4 files changed, 608 insertions(+), 423 deletions(-)
> >>
> >> diff --git a/drivers/net/xen-netback/common.h b/drivers/net/xen-netback/common.h
> >> index ae413a2..4176539 100644
> >> --- a/drivers/net/xen-netback/common.h
> >> +++ b/drivers/net/xen-netback/common.h
> >> @@ -108,17 +108,39 @@ struct xenvif_rx_meta {
> >>    */
> >>   #define MAX_GRANT_COPY_OPS (MAX_SKB_FRAGS * XEN_NETIF_RX_RING_SIZE)
> >>
> >> -struct xenvif {
> >> -	/* Unique identifier for this interface. */
> >> -	domid_t          domid;
> >> -	unsigned int     handle;
> >> +/* Queue name is interface name with "-qNNN" appended */
> >> +#define QUEUE_NAME_SIZE (IFNAMSIZ + 6)
> >
> > One more than necessary? Or does IFNAMSIZ not include the NULL? (I can't
> > figure out if it does or not!)
> 
> interface.c contains the line:
> snprintf(name, IFNAMSIZ - 1, "vif%u.%u", domid, handle);
> 
> This suggests that IFNAMSIZ counts the trailing NULL, so I can reduce
> this count by 1 on that basis.

Thanks.

> >> [...]
> >> -	/* This array is allocated seperately as it is large */
> >> -	struct gnttab_copy *grant_copy_op;
> >> +	struct gnttab_copy grant_copy_op[MAX_GRANT_COPY_OPS];
> >
> > Is this deliberate? It seems like a retrograde step reverting parts of
> > ac3d5ac27735 "xen-netback: fix guest-receive-side array sizes" from Paul
> > (at least you are nuking a speeling erorr)
> 
> Yes, this was deliberate. These arrays were moved out to avoid problems
> with kmalloc for the struct net_device (which contains the struct xenvif
> in its netdev_priv space). Since the queues are now allocated via
> vzalloc, there is no need to do separate allocations (with the
> requirement to also separately free on every error/teardown path) so I
> moved these back into the main queue structure.

Please do this as a separate change then, either pre or post as
appropriate.

> > How does this series interact with Zoltan's foreign mapping one? Badly I
> > should imagine, are you going to rebase?
> 
> I'm working on the rebase right now.
> 
> >
> >> +	/* First, check if there is only one queue to optimise the
> >> +	 * single-queue or old frontend scenario.
> >> +	 */
> >> +	if (vif->num_queues == 1) {
> >> +		queue_index = 0;
> >> +	} else {
> >> +		/* Use skb_get_hash to obtain an L4 hash if available */
> >> +		hash = skb_get_hash(skb);
> >> +		queue_index = (u16) (((u64)hash * vif->num_queues) >> 32);
> >
> > No modulo num_queues here?
> >
> > Is the multiply and shift from some best practice somewhere? Or else
> > what is it doing?
> 
> It seems to be what a bunch of other net drivers do in this scenario. I
> guess the reasoning is it'll be faster than a mod num_queues.

Hard to believe that's the reason for the num_queues == 2^x case at
least (which must be most common I'd expect).

> >
> >> +	/* Obtain the queue to be used to transmit this packet */
> >> +	index = skb_get_queue_mapping(skb);
> >> +	if (index >= vif->num_queues)
> >> +		index = 0; /* Fall back to queue 0 if out of range */
> >
> > Is this actually allowed to happen?
> >
> > Even if yes, not modulo num_queue so spread it around a bit?
> 
> This probably isn't allowed to happen. I figured it didn't hurt to be a
> little defensive with the code here, and falling back to queue 0 is a
> fairly safe thing to do.

If it shouldn't happen then is a ratelimited warning appropriate?
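
One illustrative shape for that, combining the modulo spreading with a rate-limited warning (a sketch, not code from the series):

	index = skb_get_queue_mapping(skb);
	if (unlikely(index >= vif->num_queues)) {
		if (net_ratelimit())
			netdev_warn(vif->dev,
				    "queue mapping %u out of range (num_queues %u)\n",
				    index, vif->num_queues);
		index %= vif->num_queues;
	}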

> 
> >>   static void xenvif_up(struct xenvif *vif)
> >>   {
> >> -	napi_enable(&vif->napi);
> >> -	enable_irq(vif->tx_irq);
> >> -	if (vif->tx_irq != vif->rx_irq)
> >> -		enable_irq(vif->rx_irq);
> >> -	xenvif_check_rx_xenvif(vif);
> >> +	struct xenvif_queue *queue = NULL;
> >> +	unsigned int queue_index;
> >> +
> >> +	for (queue_index = 0; queue_index < vif->num_queues; ++queue_index) {
> >
> > This vif->num_queues -- is it the same as dev->num_tx_queues? Or are
> > there differing concepts of queue around?
> 
> It should be the same as dev->real_num_tx_queues, which may be less than
> dev->num_tx_queues.

It'd be better to use the single variable in dev then I think, rather
than duplicate and risk them getting out of sync etc. I'm a bit
surprised there are no helpers in the core for this sort of thing (at
least not that I can find).
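
A hypothetical form of that suggestion (not from the series): drop the duplicated counter and loop over the value the core already tracks.

	unsigned int queue_index;
	unsigned int num_queues = vif->dev->real_num_tx_queues;

	for (queue_index = 0; queue_index < num_queues; ++queue_index)
		xenvif_check_rx_xenvif(&vif->queues[queue_index]);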

> 
> >> +		queue = &vif->queues[queue_index];
> >> +		napi_enable(&queue->napi);
> >> +		enable_irq(queue->tx_irq);
> >> +		if (queue->tx_irq != queue->rx_irq)
> >> +			enable_irq(queue->rx_irq);
> >> +		xenvif_check_rx_xenvif(queue);
> >> +	}
> >>   }
> >>
> >>   static void xenvif_down(struct xenvif *vif)
> >>   {
> >> -	napi_disable(&vif->napi);
> >> -	disable_irq(vif->tx_irq);
> >> -	if (vif->tx_irq != vif->rx_irq)
> >> -		disable_irq(vif->rx_irq);
> >> -	del_timer_sync(&vif->credit_timeout);
> >> +	struct xenvif_queue *queue = NULL;
> >> +	unsigned int queue_index;
> >
> > Why unsigned?
> Why not? You can't have a negative number of queues. Zero indicates "I
> don't have any set up yet". I'm not expecting people to have 4 billion
> or so queues, but equally I can't see a valid use for negative values
> here.

It's just unusual, that's all.

> @@ -507,18 +529,24 @@ static void connect(struct backend_info *be)
> >>   	if (!err)
> >>   		be->have_hotplug_status_watch = 1;
> >>
> >> -	netif_wake_queue(be->vif->dev);
> >> +	netif_tx_wake_all_queues(be->vif->dev);
> >> +
> >> +	return;
> >> +
> >> +err:
> >> +	vfree(be->vif->queues);
> >> +	be->vif->queues = NULL;
> >> +	be->vif->num_queues = 0;
> >> +	return;
> >
> > Do you not need to unwind the setup already done on the previous queues
> > before the failure?
> 
> 
> Err... yes. I was sure that code existed at some point, but I can't find
> it now. Oops!

;-)

Ian.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH V6 net-next 2/5] xen-netback: Add support for multiple queues
  2014-03-14 16:03   ` Ian Campbell
@ 2014-03-18 10:48     ` Andrew Bennieston
  2014-03-18 10:56       ` Ian Campbell
  0 siblings, 1 reply; 21+ messages in thread
From: Andrew Bennieston @ 2014-03-18 10:48 UTC (permalink / raw)
  To: Ian Campbell; +Cc: xen-devel, wei.liu2, paul.durrant, netdev, david.vrabel

On 14/03/14 16:03, Ian Campbell wrote:
> On Mon, 2014-03-03 at 11:47 +0000, Andrew J. Bennieston wrote:
>> From: "Andrew J. Bennieston" <andrew.bennieston@citrix.com>
>>
>> Builds on the refactoring of the previous patch to implement multiple
>> queues between xen-netfront and xen-netback.
>>
>> Writes the maximum supported number of queues into XenStore, and reads
>> the values written by the frontend to determine how many queues to use.
>
>>
>> Ring references and event channels are read from XenStore on a per-queue
>> basis and rings are connected accordingly.
>>
>> Signed-off-by: Andrew J. Bennieston <andrew.bennieston@citrix.com>
>> Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
>> ---
>>   drivers/net/xen-netback/common.h    |    2 +
>>   drivers/net/xen-netback/interface.c |    7 +++-
>>   drivers/net/xen-netback/netback.c   |    8 ++++
>>   drivers/net/xen-netback/xenbus.c    |   76 ++++++++++++++++++++++++++++++-----
>>   4 files changed, 82 insertions(+), 11 deletions(-)
>>
>> diff --git a/drivers/net/xen-netback/common.h b/drivers/net/xen-netback/common.h
>> index 4176539..e72bf38 100644
>> --- a/drivers/net/xen-netback/common.h
>> +++ b/drivers/net/xen-netback/common.h
>> @@ -261,4 +261,6 @@ void xenvif_carrier_on(struct xenvif *vif);
>>
>>   extern bool separate_tx_rx_irq;
>>
>> +extern unsigned int xenvif_max_queues;
>> +
>>   #endif /* __XEN_NETBACK__COMMON_H__ */
>> diff --git a/drivers/net/xen-netback/interface.c b/drivers/net/xen-netback/interface.c
>> index 0297980..3f623b4 100644
>> --- a/drivers/net/xen-netback/interface.c
>> +++ b/drivers/net/xen-netback/interface.c
>> @@ -381,7 +381,12 @@ struct xenvif *xenvif_alloc(struct device *parent, domid_t domid,
>>   	char name[IFNAMSIZ] = {};
>>
>>   	snprintf(name, IFNAMSIZ - 1, "vif%u.%u", domid, handle);
>> -	dev = alloc_netdev_mq(sizeof(struct xenvif), name, ether_setup, 1);
>> +	/* Allocate a netdev with the max. supported number of queues.
>> +	 * When the guest selects the desired number, it will be updated
>> +	 * via netif_set_real_num_tx_queues().
>
> Does this allocate and then waste a load of resources? Or does it free
> them when you shrink things?
It allocates a small amount of resource; each struct netdev_queue is 256
bytes, and there are a few other things allocated at the same time. For
a xenvif_max_queues of 8, this is allocating 2K of netdev_queue
structs, plus a few other things; pretty small compared to the struct
xenvif_queue objects and the arrays contained within!

The resources aren't freed when netif_set_real_num_tx_queues() is
called; that just changes the value of dev->real_num_tx_queues.

> I suppose it is not possible to allocate small and grow or you'd have
> done so?
Indeed. This approach is taken by most drivers that support multiple
queues; they allocate as many as the device has, then use only as many
as there are online CPUs, or similar. In this case, xenvif_max_queues is
initialised to num_online_cpus(), but is also exported as a module
param. so the memory-conscious admin can reduce it further if desired.

>
> Can the host/guest admin change the number of queues on the fly?

This depends what you mean by 'on the fly'. The host admin can set the
module parameter in dom0, which will affect guests started after that
point, or the guest admin can set the module param in the guest. The
actual number used is always the minimum of the two.

It's important to keep in mind the distinction between a netdev_queue
and a xenvif_queue; a netdev_queue is small, but you have to allocate as
many as you think you might need, at a point in time too early to be
able to ask the guest how many it wants to use. A xenvif_queue is large,
but we only allocate as many as will actually be used.
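
Putting that distinction side by side (a sketch assembled from the hunks already quoted, not new code):

	/* Device creation: reserve the cheap netdev_queue slots up front. */
	dev = alloc_netdev_mq(sizeof(struct xenvif), name, ether_setup,
			      xenvif_max_queues);

	/* Connect time, once the frontend's request is known: allocate only
	 * the large xenvif_queue structures that will actually be used.
	 */
	be->vif->queues = vzalloc(requested_num_queues *
				  sizeof(struct xenvif_queue));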

>
>>   	xen_net_read_rate(dev, &credit_bytes, &credit_usec);
>>   	read_xenbus_vif_flags(be);
>>
>> -	be->vif->num_queues = 1;
>> +	/* Use the number of queues requested by the frontend */
>> +	be->vif->num_queues = requested_num_queues;
>>   	be->vif->queues = vzalloc(be->vif->num_queues *
>>   			sizeof(struct xenvif_queue));
>> +	rtnl_lock();
>> +	netif_set_real_num_tx_queues(be->vif->dev, be->vif->num_queues);
>> +	rtnl_unlock();
>
> I'm always a bit suspicious of this construct -- it makes me think the
> call is happening from the wrong context and that the right context
> would naturally hold the lock already.

netif_set_real_num_tx_queues() must be called either with this lock
held, or before the netdev is registered. The netdev is registered early
so that it can be plugged into a bridge or whatever other network
configuration has to happen. The point at which we know the correct
number of tx queues happens in response to the frontend changing
Xenstore entries, so the rtnl lock is not naturally held here.
xenvif_carrier_on() and xenvif_carrier_off() also take this lock, but
they are not the appropriate place to set the number of queues.

>
>>
>>   	for (queue_index = 0; queue_index < be->vif->num_queues; ++queue_index) {
>>   		queue = &be->vif->queues[queue_index];
>> @@ -547,29 +575,52 @@ static int connect_rings(struct backend_info *be, struct xenvif_queue *queue)
>>   	unsigned long tx_ring_ref, rx_ring_ref;
>>   	unsigned int tx_evtchn, rx_evtchn;
>>   	int err;
>> +	char *xspath;
>> +	size_t xspathsize;
>> +	const size_t xenstore_path_ext_size = 11; /* sufficient for "/queue-NNN" */
>> +
>> +	/* If the frontend requested 1 queue, or we have fallen back
>> +	 * to single queue due to lack of frontend support for multi-
>> +	 * queue, expect the remaining XenStore keys in the toplevel
>> +	 * directory. Otherwise, expect them in a subdirectory called
>> +	 * queue-N.
>> +	 */
>> +	if (queue->vif->num_queues == 1) {
>> +		xspath = (char *)dev->otherend;
>
> Casting away a const is naughty. Either make xspath const or if that
> isn't possible make it dynamic in all cases with a strcpy in this
> degenerate case.
>

Ok, I can change this. I was trying to avoid the strcpy.
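
One const-clean shape that also avoids the strcpy (a sketch of a possible follow-up, not the eventual patch): build xspath dynamically in both cases so the cleanup path needs no special casing.

	char *xspath;

	if (queue->vif->num_queues == 1)
		xspath = kstrdup(dev->otherend, GFP_KERNEL);
	else
		xspath = kasprintf(GFP_KERNEL, "%s/queue-%u",
				   dev->otherend, queue->id);
	if (!xspath) {
		xenbus_dev_fatal(dev, -ENOMEM, "reading ring references");
		return -ENOMEM;
	}

	/* ... read the per-queue keys relative to xspath ... */

	kfree(xspath);	/* always dynamically allocated */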

>> +	} else {
>> +		xspathsize = strlen(dev->otherend) + xenstore_path_ext_size;
>> +		xspath = kzalloc(xspathsize, GFP_KERNEL);
>> +		if (!xspath) {
>> +			xenbus_dev_fatal(dev, -ENOMEM,
>> +					"reading ring references");
>> +			return -ENOMEM;
>> +		}
>> +		snprintf(xspath, xspathsize, "%s/queue-%u", dev->otherend,
>> +				 queue->id);
>> +	}
>>
> [...]
>> @@ -582,10 +633,15 @@ static int connect_rings(struct backend_info *be, struct xenvif_queue *queue)
>>   				 "mapping shared-frames %lu/%lu port tx %u rx %u",
>>   				 tx_ring_ref, rx_ring_ref,
>>   				 tx_evtchn, rx_evtchn);
>> -		return err;
>> +		goto err;
>>   	}
>>
>> -	return 0;
>> +	err = 0;
>> +err: /* Regular return falls through with err == 0 */
>> +	if (xspath != dev->otherend)
>> +		kfree(xspath);
>
> Yet another reason to not cast away the const!

You're right; this is a little messy.

Andrew

>
>> +
>> +	return err;
>>   }
>>
>>   static int read_xenbus_vif_flags(struct backend_info *be)
>
>

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH V6 net-next 2/5] xen-netback: Add support for multiple queues
  2014-03-18 10:48     ` Andrew Bennieston
@ 2014-03-18 10:56       ` Ian Campbell
  0 siblings, 0 replies; 21+ messages in thread
From: Ian Campbell @ 2014-03-18 10:56 UTC (permalink / raw)
  To: Andrew Bennieston; +Cc: xen-devel, wei.liu2, paul.durrant, netdev, david.vrabel

On Tue, 2014-03-18 at 10:48 +0000, Andrew Bennieston wrote:
[snip... queues...]

All sounds fine, thanks for the explanations.

> >>   	xen_net_read_rate(dev, &credit_bytes, &credit_usec);
> >>   	read_xenbus_vif_flags(be);
> >>
> >> -	be->vif->num_queues = 1;
> >> +	/* Use the number of queues requested by the frontend */
> >> +	be->vif->num_queues = requested_num_queues;
> >>   	be->vif->queues = vzalloc(be->vif->num_queues *
> >>   			sizeof(struct xenvif_queue));
> >> +	rtnl_lock();
> >> +	netif_set_real_num_tx_queues(be->vif->dev, be->vif->num_queues);
> >> +	rtnl_unlock();
> >
> > I'm always a bit suspicious of this construct -- it makes me think the
> > call is happening from the wrong context and that the right context
> > would naturally hold the lock already.
> 
> netif_set_real_num_tx_queues() must be called either with this lock
> held, or before the netdev is registered. The netdev is registered early
> so that it can be plugged into a bridge or whatever other network
> configuration has to happen. The point at which we know the correct
> number of tx queues happens in response to the frontend changing
> Xenstore entries, so the rtnl lock is not naturally held here.
> xenvif_carrier_on() and xenvif_carrier_off() also take this lock, but
> they are not the appropriate place to set the number of queues.

Great, just wanted to be sure it had been thought about and not just
"slap a lock around it to make it stop complaining" :-)

> >>   	for (queue_index = 0; queue_index < be->vif->num_queues; ++queue_index) {
> >>   		queue = &be->vif->queues[queue_index];
> >> @@ -547,29 +575,52 @@ static int connect_rings(struct backend_info *be, struct xenvif_queue *queue)
> >>   	unsigned long tx_ring_ref, rx_ring_ref;
> >>   	unsigned int tx_evtchn, rx_evtchn;
> >>   	int err;
> >> +	char *xspath;
> >> +	size_t xspathsize;
> >> +	const size_t xenstore_path_ext_size = 11; /* sufficient for "/queue-NNN" */
> >> +
> >> +	/* If the frontend requested 1 queue, or we have fallen back
> >> +	 * to single queue due to lack of frontend support for multi-
> >> +	 * queue, expect the remaining XenStore keys in the toplevel
> >> +	 * directory. Otherwise, expect them in a subdirectory called
> >> +	 * queue-N.
> >> +	 */
> >> +	if (queue->vif->num_queues == 1) {
> >> +		xspath = (char *)dev->otherend;
> >
> > Casting away a const is naughty. Either make xspath const or if that
> > isn't possible make it dynamic in all cases with a strcpy in this
> > degenerate case.
> >
> 
> Ok, I can change this. I was trying to avoid the strcpy.

Thanks.

Ian.

^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2014-03-18 10:56 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-03-03 11:47 [PATCH V6 net-next 0/5] xen-net{back, front}: Multiple transmit and receive queues Andrew J. Bennieston
2014-03-03 11:47 ` [PATCH V6 net-next 1/5] xen-netback: Factor queue-specific data into queue struct Andrew J. Bennieston
2014-03-14 15:55   ` Ian Campbell
2014-03-17 11:53     ` Andrew Bennieston
2014-03-17 12:19       ` Ian Campbell
2014-03-03 11:47 ` [PATCH V6 net-next 2/5] xen-netback: Add support for multiple queues Andrew J. Bennieston
2014-03-14 16:03   ` Ian Campbell
2014-03-18 10:48     ` Andrew Bennieston
2014-03-18 10:56       ` Ian Campbell
2014-03-03 11:47 ` [PATCH V6 net-next 3/5] xen-netfront: Factor queue-specific data into queue struct Andrew J. Bennieston
2014-03-03 11:47 ` [PATCH V6 net-next 4/5] xen-netfront: Add support for multiple queues Andrew J. Bennieston
2014-03-03 11:47 ` [PATCH V6 net-next 5/5] xen-net{back, front}: Document multi-queue feature in netif.h Andrew J. Bennieston
2014-03-03 12:53   ` [PATCH V6 net-next 5/5] xen-net{back,front}: " Paul Durrant
2014-03-14 16:04   ` Ian Campbell
2014-03-05 12:38 ` [PATCH V6 net-next 0/5] xen-net{back,front}: Multiple transmit and receive queues Wei Liu
2014-03-05 17:46 ` [Xen-devel] [PATCH V6 net-next 0/5] xen-net{back, front}: " Konrad Rzeszutek Wilk
2014-03-06 16:52 ` Sander Eikelenboom
2014-03-14 16:06   ` Ian Campbell
2014-03-14 16:21     ` Sander Eikelenboom
2014-03-14 16:10 ` [PATCH V6 net-next 0/5] xen-net{back,front}: " Ian Campbell
2014-03-14 16:16   ` [Xen-devel] [PATCH V6 net-next 0/5] xen-net{back, front}: " Ian Campbell
