Linux virtualization list
* [RFC V2 PATCH 1/4] option: introduce qemu_get_opt_all()
From: Jason Wang @ 2012-06-25 10:04 UTC (permalink / raw)
  To: krkumar2, habanero, aliguori, rusty, mst, mashirle, qemu-devel,
	virtualization, tahm, jwhan, akong
  Cc: kvm
In-Reply-To: <20120625095059.8096.49429.stgit@amd-6168-8-1.englab.nay.redhat.com>

Sometimes we need to pass options like -netdev tap,fd=100,fd=101,fd=102, which
cannot be properly parsed by qemu_opt_find() because it only returns the first
matched option. So qemu_opt_get_all() is introduced to fill a caller-supplied
array with pointers to the values of all matched options, up to a given maximum.

Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 qemu-option.c |   19 +++++++++++++++++++
 qemu-option.h |    2 ++
 2 files changed, 21 insertions(+), 0 deletions(-)

diff --git a/qemu-option.c b/qemu-option.c
index bb3886c..9263125 100644
--- a/qemu-option.c
+++ b/qemu-option.c
@@ -545,6 +545,25 @@ static QemuOpt *qemu_opt_find(QemuOpts *opts, const char *name)
     return NULL;
 }
 
+int qemu_opt_get_all(QemuOpts *opts, const char *name, const char **optp,
+                     int max)
+{
+    QemuOpt *opt;
+    int index = 0;
+
+    QTAILQ_FOREACH_REVERSE(opt, &opts->head, QemuOptHead, next) {
+        if (strcmp(opt->name, name) == 0) {
+            if (index < max) {
+                optp[index++] = opt->str;
+            }
+            if (index == max) {
+                break;
+            }
+        }
+    }
+    return index;
+}
+
 const char *qemu_opt_get(QemuOpts *opts, const char *name)
 {
     QemuOpt *opt = qemu_opt_find(opts, name);
diff --git a/qemu-option.h b/qemu-option.h
index 951dec3..3c9a273 100644
--- a/qemu-option.h
+++ b/qemu-option.h
@@ -106,6 +106,8 @@ struct QemuOptsList {
     QemuOptDesc desc[];
 };
 
+int qemu_opt_get_all(QemuOpts *opts, const char *name, const char **optp,
+                     int max);
 const char *qemu_opt_get(QemuOpts *opts, const char *name);
 bool qemu_opt_get_bool(QemuOpts *opts, const char *name, bool defval);
 uint64_t qemu_opt_get_number(QemuOpts *opts, const char *name, uint64_t defval);


* [RFC V2 PATCH 0/4] Multiqueue support for tap and virtio-net/vhost
From: Jason Wang @ 2012-06-25 10:04 UTC (permalink / raw)
  To: krkumar2, habanero, aliguori, rusty, mst, mashirle, qemu-devel,
	virtualization, tahm, jwhan, akong
  Cc: kvm

Hello all:

This series is an update of the previous version of the multiqueue support; it
adds multiqueue capability to both tap and virtio-net.

Some tap backends already support multiqueue (macvtap in Linux) or will support
it (tap). In such a backend, each file descriptor of a tap is a queue, and
ioctls are provided to attach an existing tap file descriptor to the tun/tap
device. This series lets qemu use this kind of backend so that it can transmit
and receive packets through multiple file descriptors.

Patch 1 introduces a new helper to get all matched options. After this patch,
we can pass multiple file descriptors to a single netdev with:

      qemu -netdev tap,id=hn0,fd=10,fd=11,...
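
As a rough illustration of what collecting a repeated option means, here is a
minimal standalone sketch. It is not the QEMU code: struct demo_opt and
demo_opt_get_all() are hypothetical names, the options live in a plain array
instead of QEMU's option list, and the array is walked front to back, which is
an assumption of this sketch only. What it shares with the helper in patch 1 is
the contract: store at most max matching values and return the number stored.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one parsed "name=value" pair; not QEMU's QemuOpt. */
struct demo_opt {
    const char *name;
    const char *str;
};

/*
 * Copy up to max values whose name matches into optp and return how many
 * were stored. The caller owns optp and must size it for at least max
 * entries. This sketch walks a plain array front to back.
 */
static int demo_opt_get_all(const struct demo_opt *opts, size_t nopts,
                            const char *name, const char **optp, int max)
{
    int index = 0;
    size_t i;

    assert(max >= 0);
    for (i = 0; i < nopts && index < max; i++) {
        if (strcmp(opts[i].name, name) == 0) {
            optp[index++] = opts[i].str;
        }
    }
    return index;
}
```

With the options of the command line above (id=hn0, fd=10, fd=11), asking for
"fd" with a large enough max would store the two fd values and return 2; the
caller then knows how many queues were requested from the return value alone.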

Patch 2 introduces generic helpers in tap to attach or detach a file descriptor
from a tap device; emulated NICs can use these helpers to enable/disable queues.

Patch 3 modifies NICState to allow multiple VLANClientState to be stored in it.
With this patch, qemu has basic support for multiqueue-capable tap backends.

Patch 4 converts virtio-net/vhost to be multiqueue capable. A vhost device is
created per tx/rx queue pair, as usual.

Changes from V1:

- rebase to the latest
- fix memory leak in parse_netdev
- fix guest notifiers assignment/de-assignment
- change the command line to:
   qemu -netdev tap,queues=2 -device virtio-net-pci,queues=2

TODO:
- netdev_add
- bridge helper for multiple queue backend

References:
- V1 http://comments.gmane.org/gmane.comp.emulators.qemu/100481

Please review and comment.
---

Jason Wang (4):
      option: introduce qemu_get_opt_all()
      tap: multiqueue support
      net: multiqueue support
      virtio-net: add multiqueue support


 hw/dp8393x.c         |    2 
 hw/mcf_fec.c         |    2 
 hw/qdev-properties.c |   33 +++-
 hw/qdev.h            |    3 
 hw/vhost.c           |   58 ++++--
 hw/vhost.h           |    1 
 hw/vhost_net.c       |    7 +
 hw/vhost_net.h       |    2 
 hw/virtio-net.c      |  461 +++++++++++++++++++++++++++++++++-----------------
 hw/virtio-net.h      |    3 
 net.c                |   62 ++++++-
 net.h                |   16 +-
 net/tap-aix.c        |   13 +
 net/tap-bsd.c        |   13 +
 net/tap-haiku.c      |   13 +
 net/tap-linux.c      |   55 ++++++
 net/tap-linux.h      |    3 
 net/tap-solaris.c    |   13 +
 net/tap-win32.c      |   11 +
 net/tap.c            |  189 +++++++++++++-------
 net/tap.h            |    7 +
 qemu-option.c        |   19 ++
 qemu-option.h        |    2 
 23 files changed, 714 insertions(+), 274 deletions(-)



* [net-next RFC V4 PATCH 4/4] virtio_net: multiqueue support
From: Jason Wang @ 2012-06-25  9:41 UTC (permalink / raw)
  To: mst, akong, habanero, tahm, jwhan, mashirle, krkumar2, edumazet,
	davem, rusty, linux-kernel, virtualization, netdev
  Cc: qemu-devel, kvm
In-Reply-To: <20120625090829.7263.65026.stgit@amd-6168-8-1.englab.nay.redhat.com>

This adds multiqueue support to the virtio_net driver. The feature is negotiated
through VIRTIO_NET_F_MULTIQUEUE.

The driver expects the number of rx/tx queue pairs to be equal to the number of
vcpus. To maximize performance with these per-cpu rx/tx queue pairs, some
optimizations are introduced:

- Txq selection is based on the processor id, in order to avoid contending on a
 lock whose owner may exit to the host.
- Since the rxq/txq pairs are per-cpu, the affinity hint is set to the cpu that
 owns the queue pair.
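
The txq selection rule can be sketched in isolation as follows. This is a
standalone illustration rather than the driver code: demo_select_txq() is a
hypothetical name, and the hint argument stands for either the recorded rx
queue of the skb or the current processor id. The point is only that a hint
larger than the number of active tx queues is folded back into range by
repeated subtraction.

```c
#include <assert.h>

/*
 * Hypothetical helper: fold a queue hint (a CPU id or a recorded rx queue)
 * into the range [0, num_txq) by repeated subtraction.
 */
static unsigned int demo_select_txq(unsigned int hint, unsigned int num_txq)
{
    assert(num_txq > 0);
    while (hint >= num_txq)
        hint -= num_txq;
    return hint;
}
```

When the number of queue pairs equals the number of vcpus, the loop body never
runs and each vcpu transmits on its own queue; the subtraction only matters
when there are fewer active tx queues than possible hints.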

Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/net/virtio_net.c   |  695 ++++++++++++++++++++++++++++++++------------
 include/linux/virtio_net.h |    2 +
 2 files changed, 506 insertions(+), 191 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index f18149a..5c719de 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -26,6 +26,7 @@
 #include <linux/scatterlist.h>
 #include <linux/if_vlan.h>
 #include <linux/slab.h>
+#include <linux/interrupt.h>
 
 static int napi_weight = 128;
 module_param(napi_weight, int, 0444);
@@ -51,43 +52,73 @@ struct virtnet_stats {
 	u64 rx_packets;
 };
 
-struct virtnet_info {
-	struct virtio_device *vdev;
-	struct virtqueue *rvq, *svq, *cvq;
-	struct net_device *dev;
+/* Internal representation of a send virtqueue */
+struct send_queue {
+	/* Virtqueue associated with this send_queue */
+	struct virtqueue *vq;
+
+	/* TX: fragments + linear part + virtio header */
+	struct scatterlist sg[MAX_SKB_FRAGS + 2];
+
+	cpumask_var_t affinity_mask;
+};
+
+/* Internal representation of a receive virtqueue */
+struct receive_queue {
+	/* Virtqueue associated with this receive_queue */
+	struct virtqueue *vq;
+
+	/* Back pointer to the virtnet_info */
+	struct virtnet_info *vi;
+
 	struct napi_struct napi;
-	unsigned int status;
 
 	/* Number of input buffers, and max we've ever had. */
 	unsigned int num, max;
 
+	/* Work struct for refilling if we run low on memory. */
+	struct delayed_work refill;
+
+	/* Chain pages by the private ptr. */
+	struct page *pages;
+
+	/* RX: fragments + linear part + virtio header */
+	struct scatterlist sg[MAX_SKB_FRAGS + 2];
+
+	cpumask_var_t affinity_mask;
+};
+
+struct virtnet_info {
+	int num_queue_pairs;		/* # of RX/TX vq pairs */
+
+	struct send_queue **sq;
+	struct receive_queue **rq;
+	struct virtqueue *cvq;
+
+	struct virtio_device *vdev;
+	struct net_device *dev;
+	unsigned int status;
+
 	/* I like... big packets and I cannot lie! */
 	bool big_packets;
 
 	/* Host will merge rx buffers for big packets (shake it! shake it!) */
 	bool mergeable_rx_bufs;
 
+	/* Has control virtqueue */
+	bool has_cvq;
+
 	/* enable config space updates */
 	bool config_enable;
 
 	/* Active statistics */
 	struct virtnet_stats __percpu *stats;
 
-	/* Work struct for refilling if we run low on memory. */
-	struct delayed_work refill;
-
 	/* Work struct for config space updates */
 	struct work_struct config_work;
 
 	/* Lock for config space updates */
 	struct mutex config_lock;
-
-	/* Chain pages by the private ptr. */
-	struct page *pages;
-
-	/* fragments + linear part + virtio header */
-	struct scatterlist rx_sg[MAX_SKB_FRAGS + 2];
-	struct scatterlist tx_sg[MAX_SKB_FRAGS + 2];
 };
 
 struct skb_vnet_hdr {
@@ -108,6 +139,22 @@ struct padded_vnet_hdr {
 	char padding[6];
 };
 
+static inline int txq_get_qnum(struct virtnet_info *vi, struct virtqueue *vq)
+{
+	int ret = virtqueue_get_queue_index(vq);
+
+	/* skip ctrl vq */
+	if (vi->has_cvq)
+		return (ret - 1) / 2;
+	else
+		return ret / 2;
+}
+
+static inline int rxq_get_qnum(struct virtnet_info *vi, struct virtqueue *vq)
+{
+	return virtqueue_get_queue_index(vq) / 2;
+}
+
 static inline struct skb_vnet_hdr *skb_vnet_hdr(struct sk_buff *skb)
 {
 	return (struct skb_vnet_hdr *)skb->cb;
@@ -117,22 +164,22 @@ static inline struct skb_vnet_hdr *skb_vnet_hdr(struct sk_buff *skb)
  * private is used to chain pages for big packets, put the whole
  * most recent used list in the beginning for reuse
  */
-static void give_pages(struct virtnet_info *vi, struct page *page)
+static void give_pages(struct receive_queue *rq, struct page *page)
 {
 	struct page *end;
 
 	/* Find end of list, sew whole thing into vi->pages. */
 	for (end = page; end->private; end = (struct page *)end->private);
-	end->private = (unsigned long)vi->pages;
-	vi->pages = page;
+	end->private = (unsigned long)rq->pages;
+	rq->pages = page;
 }
 
-static struct page *get_a_page(struct virtnet_info *vi, gfp_t gfp_mask)
+static struct page *get_a_page(struct receive_queue *rq, gfp_t gfp_mask)
 {
-	struct page *p = vi->pages;
+	struct page *p = rq->pages;
 
 	if (p) {
-		vi->pages = (struct page *)p->private;
+		rq->pages = (struct page *)p->private;
 		/* clear private here, it is used to chain pages */
 		p->private = 0;
 	} else
@@ -140,15 +187,15 @@ static struct page *get_a_page(struct virtnet_info *vi, gfp_t gfp_mask)
 	return p;
 }
 
-static void skb_xmit_done(struct virtqueue *svq)
+static void skb_xmit_done(struct virtqueue *vq)
 {
-	struct virtnet_info *vi = svq->vdev->priv;
+	struct virtnet_info *vi = vq->vdev->priv;
 
 	/* Suppress further interrupts. */
-	virtqueue_disable_cb(svq);
+	virtqueue_disable_cb(vq);
 
 	/* We were probably waiting for more output buffers. */
-	netif_wake_queue(vi->dev);
+	netif_wake_subqueue(vi->dev, txq_get_qnum(vi, vq));
 }
 
 static void set_skb_frag(struct sk_buff *skb, struct page *page,
@@ -167,9 +214,10 @@ static void set_skb_frag(struct sk_buff *skb, struct page *page,
 }
 
 /* Called from bottom half context */
-static struct sk_buff *page_to_skb(struct virtnet_info *vi,
+static struct sk_buff *page_to_skb(struct receive_queue *rq,
 				   struct page *page, unsigned int len)
 {
+	struct virtnet_info *vi = rq->vi;
 	struct sk_buff *skb;
 	struct skb_vnet_hdr *hdr;
 	unsigned int copy, hdr_len, offset;
@@ -225,12 +273,12 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi,
 	}
 
 	if (page)
-		give_pages(vi, page);
+		give_pages(rq, page);
 
 	return skb;
 }
 
-static int receive_mergeable(struct virtnet_info *vi, struct sk_buff *skb)
+static int receive_mergeable(struct receive_queue *rq, struct sk_buff *skb)
 {
 	struct skb_vnet_hdr *hdr = skb_vnet_hdr(skb);
 	struct page *page;
@@ -244,7 +292,7 @@ static int receive_mergeable(struct virtnet_info *vi, struct sk_buff *skb)
 			skb->dev->stats.rx_length_errors++;
 			return -EINVAL;
 		}
-		page = virtqueue_get_buf(vi->rvq, &len);
+		page = virtqueue_get_buf(rq->vq, &len);
 		if (!page) {
 			pr_debug("%s: rx error: %d buffers missing\n",
 				 skb->dev->name, hdr->mhdr.num_buffers);
@@ -257,13 +305,14 @@ static int receive_mergeable(struct virtnet_info *vi, struct sk_buff *skb)
 
 		set_skb_frag(skb, page, 0, &len);
 
-		--vi->num;
+		--rq->num;
 	}
 	return 0;
 }
 
-static void receive_buf(struct net_device *dev, void *buf, unsigned int len)
+static void receive_buf(struct receive_queue *rq, void *buf, unsigned int len)
 {
+	struct net_device *dev = rq->vi->dev;
 	struct virtnet_info *vi = netdev_priv(dev);
 	struct virtnet_stats *stats = this_cpu_ptr(vi->stats);
 	struct sk_buff *skb;
@@ -274,7 +323,7 @@ static void receive_buf(struct net_device *dev, void *buf, unsigned int len)
 		pr_debug("%s: short packet %i\n", dev->name, len);
 		dev->stats.rx_length_errors++;
 		if (vi->mergeable_rx_bufs || vi->big_packets)
-			give_pages(vi, buf);
+			give_pages(rq, buf);
 		else
 			dev_kfree_skb(buf);
 		return;
@@ -286,14 +335,14 @@ static void receive_buf(struct net_device *dev, void *buf, unsigned int len)
 		skb_trim(skb, len);
 	} else {
 		page = buf;
-		skb = page_to_skb(vi, page, len);
+		skb = page_to_skb(rq, page, len);
 		if (unlikely(!skb)) {
 			dev->stats.rx_dropped++;
-			give_pages(vi, page);
+			give_pages(rq, page);
 			return;
 		}
 		if (vi->mergeable_rx_bufs)
-			if (receive_mergeable(vi, skb)) {
+			if (receive_mergeable(rq, skb)) {
 				dev_kfree_skb(skb);
 				return;
 			}
@@ -363,90 +412,91 @@ frame_err:
 	dev_kfree_skb(skb);
 }
 
-static int add_recvbuf_small(struct virtnet_info *vi, gfp_t gfp)
+static int add_recvbuf_small(struct receive_queue *rq, gfp_t gfp)
 {
 	struct sk_buff *skb;
 	struct skb_vnet_hdr *hdr;
 	int err;
 
-	skb = __netdev_alloc_skb_ip_align(vi->dev, MAX_PACKET_LEN, gfp);
+	skb = __netdev_alloc_skb_ip_align(rq->vi->dev, MAX_PACKET_LEN, gfp);
 	if (unlikely(!skb))
 		return -ENOMEM;
 
 	skb_put(skb, MAX_PACKET_LEN);
 
 	hdr = skb_vnet_hdr(skb);
-	sg_set_buf(vi->rx_sg, &hdr->hdr, sizeof hdr->hdr);
+	sg_set_buf(rq->sg, &hdr->hdr, sizeof hdr->hdr);
 
-	skb_to_sgvec(skb, vi->rx_sg + 1, 0, skb->len);
+	skb_to_sgvec(skb, rq->sg + 1, 0, skb->len);
+
+	err = virtqueue_add_buf(rq->vq, rq->sg, 0, 2, skb, gfp);
 
-	err = virtqueue_add_buf(vi->rvq, vi->rx_sg, 0, 2, skb, gfp);
 	if (err < 0)
 		dev_kfree_skb(skb);
 
 	return err;
 }
 
-static int add_recvbuf_big(struct virtnet_info *vi, gfp_t gfp)
+static int add_recvbuf_big(struct receive_queue *rq, gfp_t gfp)
 {
 	struct page *first, *list = NULL;
 	char *p;
 	int i, err, offset;
 
-	/* page in vi->rx_sg[MAX_SKB_FRAGS + 1] is list tail */
+	/* page in rq->sg[MAX_SKB_FRAGS + 1] is list tail */
 	for (i = MAX_SKB_FRAGS + 1; i > 1; --i) {
-		first = get_a_page(vi, gfp);
+		first = get_a_page(rq, gfp);
 		if (!first) {
 			if (list)
-				give_pages(vi, list);
+				give_pages(rq, list);
 			return -ENOMEM;
 		}
-		sg_set_buf(&vi->rx_sg[i], page_address(first), PAGE_SIZE);
+		sg_set_buf(&rq->sg[i], page_address(first), PAGE_SIZE);
 
 		/* chain new page in list head to match sg */
 		first->private = (unsigned long)list;
 		list = first;
 	}
 
-	first = get_a_page(vi, gfp);
+	first = get_a_page(rq, gfp);
 	if (!first) {
-		give_pages(vi, list);
+		give_pages(rq, list);
 		return -ENOMEM;
 	}
 	p = page_address(first);
 
-	/* vi->rx_sg[0], vi->rx_sg[1] share the same page */
-	/* a separated vi->rx_sg[0] for virtio_net_hdr only due to QEMU bug */
-	sg_set_buf(&vi->rx_sg[0], p, sizeof(struct virtio_net_hdr));
+	/* rq->sg[0], rq->sg[1] share the same page */
+	/* a separated rq->sg[0] for virtio_net_hdr only due to QEMU bug */
+	sg_set_buf(&rq->sg[0], p, sizeof(struct virtio_net_hdr));
 
-	/* vi->rx_sg[1] for data packet, from offset */
+	/* rq->sg[1] for data packet, from offset */
 	offset = sizeof(struct padded_vnet_hdr);
-	sg_set_buf(&vi->rx_sg[1], p + offset, PAGE_SIZE - offset);
+	sg_set_buf(&rq->sg[1], p + offset, PAGE_SIZE - offset);
 
 	/* chain first in list head */
 	first->private = (unsigned long)list;
-	err = virtqueue_add_buf(vi->rvq, vi->rx_sg, 0, MAX_SKB_FRAGS + 2,
+	err = virtqueue_add_buf(rq->vq, rq->sg, 0, MAX_SKB_FRAGS + 2,
 				first, gfp);
 	if (err < 0)
-		give_pages(vi, first);
+		give_pages(rq, first);
 
 	return err;
 }
 
-static int add_recvbuf_mergeable(struct virtnet_info *vi, gfp_t gfp)
+static int add_recvbuf_mergeable(struct receive_queue *rq, gfp_t gfp)
 {
 	struct page *page;
 	int err;
 
-	page = get_a_page(vi, gfp);
+	page = get_a_page(rq, gfp);
 	if (!page)
 		return -ENOMEM;
 
-	sg_init_one(vi->rx_sg, page_address(page), PAGE_SIZE);
+	sg_init_one(rq->sg, page_address(page), PAGE_SIZE);
 
-	err = virtqueue_add_buf(vi->rvq, vi->rx_sg, 0, 1, page, gfp);
+	err = virtqueue_add_buf(rq->vq, rq->sg, 0, 1, page, gfp);
 	if (err < 0)
-		give_pages(vi, page);
+		give_pages(rq, page);
 
 	return err;
 }
@@ -458,97 +508,104 @@ static int add_recvbuf_mergeable(struct virtnet_info *vi, gfp_t gfp)
  * before we're receiving packets, or from refill_work which is
  * careful to disable receiving (using napi_disable).
  */
-static bool try_fill_recv(struct virtnet_info *vi, gfp_t gfp)
+static bool try_fill_recv(struct receive_queue *rq, gfp_t gfp)
 {
+	struct virtnet_info *vi = rq->vi;
 	int err;
 	bool oom;
 
 	do {
 		if (vi->mergeable_rx_bufs)
-			err = add_recvbuf_mergeable(vi, gfp);
+			err = add_recvbuf_mergeable(rq, gfp);
 		else if (vi->big_packets)
-			err = add_recvbuf_big(vi, gfp);
+			err = add_recvbuf_big(rq, gfp);
 		else
-			err = add_recvbuf_small(vi, gfp);
+			err = add_recvbuf_small(rq, gfp);
 
 		oom = err == -ENOMEM;
 		if (err < 0)
 			break;
-		++vi->num;
+		++rq->num;
 	} while (err > 0);
-	if (unlikely(vi->num > vi->max))
-		vi->max = vi->num;
-	virtqueue_kick(vi->rvq);
+	if (unlikely(rq->num > rq->max))
+		rq->max = rq->num;
+	virtqueue_kick(rq->vq);
 	return !oom;
 }
 
-static void skb_recv_done(struct virtqueue *rvq)
+static void skb_recv_done(struct virtqueue *vq)
 {
-	struct virtnet_info *vi = rvq->vdev->priv;
+	struct virtnet_info *vi = vq->vdev->priv;
+	struct napi_struct *napi = &vi->rq[rxq_get_qnum(vi, vq)]->napi;
+
 	/* Schedule NAPI, Suppress further interrupts if successful. */
-	if (napi_schedule_prep(&vi->napi)) {
-		virtqueue_disable_cb(rvq);
-		__napi_schedule(&vi->napi);
+	if (napi_schedule_prep(napi)) {
+		virtqueue_disable_cb(vq);
+		__napi_schedule(napi);
 	}
 }
 
-static void virtnet_napi_enable(struct virtnet_info *vi)
+static void virtnet_napi_enable(struct receive_queue *rq)
 {
-	napi_enable(&vi->napi);
+	napi_enable(&rq->napi);
 
 	/* If all buffers were filled by other side before we napi_enabled, we
 	 * won't get another interrupt, so process any outstanding packets
 	 * now.  virtnet_poll wants re-enable the queue, so we disable here.
 	 * We synchronize against interrupts via NAPI_STATE_SCHED */
-	if (napi_schedule_prep(&vi->napi)) {
-		virtqueue_disable_cb(vi->rvq);
+	if (napi_schedule_prep(&rq->napi)) {
+		virtqueue_disable_cb(rq->vq);
 		local_bh_disable();
-		__napi_schedule(&vi->napi);
+		__napi_schedule(&rq->napi);
 		local_bh_enable();
 	}
 }
 
 static void refill_work(struct work_struct *work)
 {
-	struct virtnet_info *vi;
+	struct napi_struct *napi;
+	struct receive_queue *rq;
 	bool still_empty;
 
-	vi = container_of(work, struct virtnet_info, refill.work);
-	napi_disable(&vi->napi);
-	still_empty = !try_fill_recv(vi, GFP_KERNEL);
-	virtnet_napi_enable(vi);
+	rq = container_of(work, struct receive_queue, refill.work);
+	napi = &rq->napi;
+
+	napi_disable(napi);
+	still_empty = !try_fill_recv(rq, GFP_KERNEL);
+	virtnet_napi_enable(rq);
 
 	/* In theory, this can happen: if we don't get any buffers in
 	 * we will *never* try to fill again. */
 	if (still_empty)
-		queue_delayed_work(system_nrt_wq, &vi->refill, HZ/2);
+		queue_delayed_work(system_nrt_wq, &rq->refill, HZ/2);
 }
 
 static int virtnet_poll(struct napi_struct *napi, int budget)
 {
-	struct virtnet_info *vi = container_of(napi, struct virtnet_info, napi);
+	struct receive_queue *rq = container_of(napi, struct receive_queue,
+						napi);
 	void *buf;
 	unsigned int len, received = 0;
 
 again:
 	while (received < budget &&
-	       (buf = virtqueue_get_buf(vi->rvq, &len)) != NULL) {
-		receive_buf(vi->dev, buf, len);
-		--vi->num;
+	       (buf = virtqueue_get_buf(rq->vq, &len)) != NULL) {
+		receive_buf(rq, buf, len);
+		--rq->num;
 		received++;
 	}
 
-	if (vi->num < vi->max / 2) {
-		if (!try_fill_recv(vi, GFP_ATOMIC))
-			queue_delayed_work(system_nrt_wq, &vi->refill, 0);
+	if (rq->num < rq->max / 2) {
+		if (!try_fill_recv(rq, GFP_ATOMIC))
+			queue_delayed_work(system_nrt_wq, &rq->refill, 0);
 	}
 
 	/* Out of packets? */
 	if (received < budget) {
 		napi_complete(napi);
-		if (unlikely(!virtqueue_enable_cb(vi->rvq)) &&
+		if (unlikely(!virtqueue_enable_cb(rq->vq)) &&
 		    napi_schedule_prep(napi)) {
-			virtqueue_disable_cb(vi->rvq);
+			virtqueue_disable_cb(rq->vq);
 			__napi_schedule(napi);
 			goto again;
 		}
@@ -557,13 +614,14 @@ again:
 	return received;
 }
 
-static unsigned int free_old_xmit_skbs(struct virtnet_info *vi)
+static unsigned int free_old_xmit_skbs(struct virtnet_info *vi,
+				       struct virtqueue *vq)
 {
 	struct sk_buff *skb;
 	unsigned int len, tot_sgs = 0;
 	struct virtnet_stats *stats = this_cpu_ptr(vi->stats);
 
-	while ((skb = virtqueue_get_buf(vi->svq, &len)) != NULL) {
+	while ((skb = virtqueue_get_buf(vq, &len)) != NULL) {
 		pr_debug("Sent skb %p\n", skb);
 
 		u64_stats_update_begin(&stats->tx_syncp);
@@ -577,7 +635,8 @@ static unsigned int free_old_xmit_skbs(struct virtnet_info *vi)
 	return tot_sgs;
 }
 
-static int xmit_skb(struct virtnet_info *vi, struct sk_buff *skb)
+static int xmit_skb(struct virtnet_info *vi, struct sk_buff *skb,
+		    struct virtqueue *vq, struct scatterlist *sg)
 {
 	struct skb_vnet_hdr *hdr = skb_vnet_hdr(skb);
 	const unsigned char *dest = ((struct ethhdr *)skb->data)->h_dest;
@@ -615,44 +674,47 @@ static int xmit_skb(struct virtnet_info *vi, struct sk_buff *skb)
 
 	/* Encode metadata header at front. */
 	if (vi->mergeable_rx_bufs)
-		sg_set_buf(vi->tx_sg, &hdr->mhdr, sizeof hdr->mhdr);
+		sg_set_buf(sg, &hdr->mhdr, sizeof hdr->mhdr);
 	else
-		sg_set_buf(vi->tx_sg, &hdr->hdr, sizeof hdr->hdr);
+		sg_set_buf(sg, &hdr->hdr, sizeof hdr->hdr);
 
-	hdr->num_sg = skb_to_sgvec(skb, vi->tx_sg + 1, 0, skb->len) + 1;
-	return virtqueue_add_buf(vi->svq, vi->tx_sg, hdr->num_sg,
+	hdr->num_sg = skb_to_sgvec(skb, sg + 1, 0, skb->len) + 1;
+	return virtqueue_add_buf(vq, sg, hdr->num_sg,
 				 0, skb, GFP_ATOMIC);
 }
 
 static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
 {
 	struct virtnet_info *vi = netdev_priv(dev);
+	int qnum = skb_get_queue_mapping(skb);
+	struct virtqueue *vq = vi->sq[qnum]->vq;
 	int capacity;
 
 	/* Free up any pending old buffers before queueing new ones. */
-	free_old_xmit_skbs(vi);
+	free_old_xmit_skbs(vi, vq);
 
 	/* Try to transmit */
-	capacity = xmit_skb(vi, skb);
+	capacity = xmit_skb(vi, skb, vq, vi->sq[qnum]->sg);
 
 	/* This can happen with OOM and indirect buffers. */
 	if (unlikely(capacity < 0)) {
 		if (likely(capacity == -ENOMEM)) {
 			if (net_ratelimit())
 				dev_warn(&dev->dev,
-					 "TX queue failure: out of memory\n");
+					"TXQ (%d) failure: out of memory\n",
+					qnum);
 		} else {
 			dev->stats.tx_fifo_errors++;
 			if (net_ratelimit())
 				dev_warn(&dev->dev,
-					 "Unexpected TX queue failure: %d\n",
-					 capacity);
+					"Unexpected TXQ (%d) failure: %d\n",
+					qnum, capacity);
 		}
 		dev->stats.tx_dropped++;
 		kfree_skb(skb);
 		return NETDEV_TX_OK;
 	}
-	virtqueue_kick(vi->svq);
+	virtqueue_kick(vq);
 
 	/* Don't wait up for transmitted skbs to be freed. */
 	skb_orphan(skb);
@@ -661,13 +723,13 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
 	/* Apparently nice girls don't return TX_BUSY; stop the queue
 	 * before it gets out of hand.  Naturally, this wastes entries. */
 	if (capacity < 2+MAX_SKB_FRAGS) {
-		netif_stop_queue(dev);
-		if (unlikely(!virtqueue_enable_cb_delayed(vi->svq))) {
+		netif_stop_subqueue(dev, qnum);
+		if (unlikely(!virtqueue_enable_cb_delayed(vq))) {
 			/* More just got used, free them then recheck. */
-			capacity += free_old_xmit_skbs(vi);
+			capacity += free_old_xmit_skbs(vi, vq);
 			if (capacity >= 2+MAX_SKB_FRAGS) {
-				netif_start_queue(dev);
-				virtqueue_disable_cb(vi->svq);
+				netif_start_subqueue(dev, qnum);
+				virtqueue_disable_cb(vq);
 			}
 		}
 	}
@@ -700,7 +762,8 @@ static struct rtnl_link_stats64 *virtnet_stats(struct net_device *dev,
 	unsigned int start;
 
 	for_each_possible_cpu(cpu) {
-		struct virtnet_stats *stats = per_cpu_ptr(vi->stats, cpu);
+		struct virtnet_stats __percpu *stats
+			= per_cpu_ptr(vi->stats, cpu);
 		u64 tpackets, tbytes, rpackets, rbytes;
 
 		do {
@@ -734,20 +797,31 @@ static struct rtnl_link_stats64 *virtnet_stats(struct net_device *dev,
 static void virtnet_netpoll(struct net_device *dev)
 {
 	struct virtnet_info *vi = netdev_priv(dev);
+	int i;
 
-	napi_schedule(&vi->napi);
+	for (i = 0; i < vi->num_queue_pairs; i++)
+		napi_schedule(&vi->rq[i]->napi);
 }
 #endif
 
+static void free_stats(struct virtnet_info *vi)
+{
+	free_percpu(vi->stats);
+}
+
 static int virtnet_open(struct net_device *dev)
 {
 	struct virtnet_info *vi = netdev_priv(dev);
+	int i;
 
-	/* Make sure we have some buffers: if oom use wq. */
-	if (!try_fill_recv(vi, GFP_KERNEL))
-		queue_delayed_work(system_nrt_wq, &vi->refill, 0);
+	for (i = 0; i < vi->num_queue_pairs; i++) {
+		/* Make sure we have some buffers: if oom use wq. */
+		if (!try_fill_recv(vi->rq[i], GFP_KERNEL))
+			queue_delayed_work(system_nrt_wq,
+					   &vi->rq[i]->refill, 0);
+		virtnet_napi_enable(vi->rq[i]);
+	}
 
-	virtnet_napi_enable(vi);
 	return 0;
 }
 
@@ -809,10 +883,13 @@ static void virtnet_ack_link_announce(struct virtnet_info *vi)
 static int virtnet_close(struct net_device *dev)
 {
 	struct virtnet_info *vi = netdev_priv(dev);
+	int i;
 
 	/* Make sure refill_work doesn't re-enable napi! */
-	cancel_delayed_work_sync(&vi->refill);
-	napi_disable(&vi->napi);
+	for (i = 0; i < vi->num_queue_pairs; i++) {
+		cancel_delayed_work_sync(&vi->rq[i]->refill);
+		napi_disable(&vi->rq[i]->napi);
+	}
 
 	return 0;
 }
@@ -924,11 +1001,10 @@ static void virtnet_get_ringparam(struct net_device *dev,
 {
 	struct virtnet_info *vi = netdev_priv(dev);
 
-	ring->rx_max_pending = virtqueue_get_vring_size(vi->rvq);
-	ring->tx_max_pending = virtqueue_get_vring_size(vi->svq);
+	ring->rx_max_pending = virtqueue_get_vring_size(vi->rq[0]->vq);
+	ring->tx_max_pending = virtqueue_get_vring_size(vi->sq[0]->vq);
 	ring->rx_pending = ring->rx_max_pending;
 	ring->tx_pending = ring->tx_max_pending;
-
 }
 
 
@@ -961,6 +1037,19 @@ static int virtnet_change_mtu(struct net_device *dev, int new_mtu)
 	return 0;
 }
 
+/* To avoid contending on a lock held by a vcpu that may exit to the host,
+ * select the txq based on the processor id.
+ */
+static u16 virtnet_select_queue(struct net_device *dev, struct sk_buff *skb)
+{
+	int txq = skb_rx_queue_recorded(skb) ? skb_get_rx_queue(skb) :
+		  smp_processor_id();
+
+	while (unlikely(txq >= dev->real_num_tx_queues))
+		txq -= dev->real_num_tx_queues;
+	return txq;
+}
+
 static const struct net_device_ops virtnet_netdev = {
 	.ndo_open            = virtnet_open,
 	.ndo_stop   	     = virtnet_close,
@@ -972,6 +1061,7 @@ static const struct net_device_ops virtnet_netdev = {
 	.ndo_get_stats64     = virtnet_stats,
 	.ndo_vlan_rx_add_vid = virtnet_vlan_rx_add_vid,
 	.ndo_vlan_rx_kill_vid = virtnet_vlan_rx_kill_vid,
+	.ndo_select_queue     = virtnet_select_queue,
 #ifdef CONFIG_NET_POLL_CONTROLLER
 	.ndo_poll_controller = virtnet_netpoll,
 #endif
@@ -1007,10 +1097,10 @@ static void virtnet_config_changed_work(struct work_struct *work)
 
 	if (vi->status & VIRTIO_NET_S_LINK_UP) {
 		netif_carrier_on(vi->dev);
-		netif_wake_queue(vi->dev);
+		netif_tx_wake_all_queues(vi->dev);
 	} else {
 		netif_carrier_off(vi->dev);
-		netif_stop_queue(vi->dev);
+		netif_tx_stop_all_queues(vi->dev);
 	}
 done:
 	mutex_unlock(&vi->config_lock);
@@ -1023,41 +1113,252 @@ static void virtnet_config_changed(struct virtio_device *vdev)
 	queue_work(system_nrt_wq, &vi->config_work);
 }
 
-static int init_vqs(struct virtnet_info *vi)
+static void free_receive_bufs(struct virtnet_info *vi)
+{
+	int i;
+
+	for (i = 0; i < vi->num_queue_pairs; i++) {
+		while (vi->rq[i]->pages)
+			__free_pages(get_a_page(vi->rq[i], GFP_KERNEL), 0);
+	}
+}
+
+/* Free memory allocated for send and receive queues */
+static void free_rq_sq(struct virtnet_info *vi)
 {
-	struct virtqueue *vqs[3];
-	vq_callback_t *callbacks[] = { skb_recv_done, skb_xmit_done, NULL};
-	const char *names[] = { "input", "output", "control" };
-	int nvqs, err;
+	int i;
 
-	/* We expect two virtqueues, receive then send,
-	 * and optionally control. */
-	nvqs = virtio_has_feature(vi->vdev, VIRTIO_NET_F_CTRL_VQ) ? 3 : 2;
+	free_stats(vi);
 
-	err = vi->vdev->config->find_vqs(vi->vdev, nvqs, vqs, callbacks, names);
-	if (err)
-		return err;
+	if (vi->rq) {
+		for (i = 0; i < vi->num_queue_pairs; i++)
+			kfree(vi->rq[i]);
+		kfree(vi->rq);
+	}
+
+	if (vi->sq) {
+		for (i = 0; i < vi->num_queue_pairs; i++)
+			kfree(vi->sq[i]);
+		kfree(vi->sq);
+	}
+}
+
+static void free_unused_bufs(struct virtnet_info *vi)
+{
+	void *buf;
+	int i;
+
+	for (i = 0; i < vi->num_queue_pairs; i++) {
+		struct virtqueue *vq = vi->sq[i]->vq;
+
+		while ((buf = virtqueue_detach_unused_buf(vq)) != NULL)
+			dev_kfree_skb(buf);
+	}
+
+	for (i = 0; i < vi->num_queue_pairs; i++) {
+		struct virtqueue *vq = vi->rq[i]->vq;
 
-	vi->rvq = vqs[0];
-	vi->svq = vqs[1];
+		while ((buf = virtqueue_detach_unused_buf(vq)) != NULL) {
+			if (vi->mergeable_rx_bufs || vi->big_packets)
+				give_pages(vi->rq[i], buf);
+			else
+				dev_kfree_skb(buf);
+			--vi->rq[i]->num;
+		}
+		BUG_ON(vi->rq[i]->num != 0);
+	}
+}
+
+static int virtnet_find_vqs(struct virtnet_info *vi)
+{
+	vq_callback_t **callbacks;
+	struct virtqueue **vqs;
+	int ret = -ENOMEM;
+	int i, total_vqs;
+	char **names;
+
+	/*
+	 * We expect 1 RX virtqueue followed by 1 TX virtqueue, followed by a
+	 * possible control virtqueue, and then the same RX/TX pairing repeated
+	 * 'vi->num_queue_pairs-1' more times
+	 */
+	total_vqs = vi->num_queue_pairs * 2 +
+		    virtio_has_feature(vi->vdev, VIRTIO_NET_F_CTRL_VQ);
+
+	/* Allocate space for find_vqs parameters */
+	vqs = kmalloc(total_vqs * sizeof(*vqs), GFP_KERNEL);
+	callbacks = kmalloc(total_vqs * sizeof(*callbacks), GFP_KERNEL);
+	names = kmalloc(total_vqs * sizeof(*names), GFP_KERNEL);
+	if (!vqs || !callbacks || !names)
+		goto err;
+
+	/* Parameters for control virtqueue, if any */
+	if (vi->has_cvq) {
+		callbacks[2] = NULL;
+		names[2] = "control";
+	}
+
+	/* Allocate/initialize parameters for send/receive virtqueues */
+	for (i = 0; i < vi->num_queue_pairs * 2; i += 2) {
+		int j = (i == 0 ? i : i + vi->has_cvq);
+		callbacks[j] = skb_recv_done;
+		callbacks[j + 1] = skb_xmit_done;
+		names[j] = kasprintf(GFP_KERNEL, "input.%d", i / 2);
+		names[j + 1] = kasprintf(GFP_KERNEL, "output.%d", i / 2);
+	}
 
-	if (virtio_has_feature(vi->vdev, VIRTIO_NET_F_CTRL_VQ)) {
+	ret = vi->vdev->config->find_vqs(vi->vdev, total_vqs, vqs, callbacks,
+					 (const char **)names);
+	if (ret)
+		goto err;
+
+	if (vi->has_cvq)
 		vi->cvq = vqs[2];
 
-		if (virtio_has_feature(vi->vdev, VIRTIO_NET_F_CTRL_VLAN))
-			vi->dev->features |= NETIF_F_HW_VLAN_FILTER;
+	for (i = 0; i < vi->num_queue_pairs * 2; i += 2) {
+		int j = i == 0 ? i : i + vi->has_cvq;
+		vi->rq[i / 2]->vq = vqs[j];
+		vi->sq[i / 2]->vq = vqs[j + 1];
 	}
+
+err:
+	if (ret && names)
+		for (i = 0; i < vi->num_queue_pairs * 2; i++)
+			kfree(names[i]);
+
+	kfree(names);
+	kfree(callbacks);
+	kfree(vqs);
+
+	return ret;
+}
+
+static int allocate_queues(struct virtnet_info *vi)
+{
+	int ret = -ENOMEM;
+	int i;
+
+	vi->rq = kcalloc(vi->num_queue_pairs, sizeof(*vi->rq), GFP_KERNEL);
+	vi->sq = kcalloc(vi->num_queue_pairs, sizeof(*vi->sq), GFP_KERNEL);
+	if (!vi->sq || !vi->rq)
+		goto err;
+
+	for (i = 0; i < vi->num_queue_pairs; i++) {
+		vi->rq[i] = kzalloc(sizeof(*vi->rq[i]), GFP_KERNEL);
+		vi->sq[i] = kzalloc(sizeof(*vi->sq[i]), GFP_KERNEL);
+		if (!vi->rq[i] || !vi->sq[i])
+			goto err;
+	}
+
+	ret = 0;
+
+	/* setup initial receive and send queue parameters */
+	for (i = 0; i < vi->num_queue_pairs; i++) {
+		vi->rq[i]->vi = vi;
+		vi->rq[i]->pages = NULL;
+		INIT_DELAYED_WORK(&vi->rq[i]->refill, refill_work);
+		netif_napi_add(vi->dev, &vi->rq[i]->napi, virtnet_poll,
+			       napi_weight);
+
+		sg_init_table(vi->rq[i]->sg, ARRAY_SIZE(vi->rq[i]->sg));
+		sg_init_table(vi->sq[i]->sg, ARRAY_SIZE(vi->sq[i]->sg));
+	}
+
+err:
+	if (ret)
+		free_rq_sq(vi);
+
+	return ret;
+}
+
+static int virtnet_setup_vqs(struct virtnet_info *vi)
+{
+	int ret;
+
+	/* Allocate send & receive queues */
+	ret = allocate_queues(vi);
+	if (!ret) {
+		ret = virtnet_find_vqs(vi);
+		if (ret)
+			free_rq_sq(vi);
+	}
+
+	return ret;
+}
+
+static void _virtnet_free_affinity(struct virtnet_info *vi, int i)
+{
+	struct virtio_device *vdev = vi->vdev;
+	int irq;
+
+	while (i) {
+		i--;
+		irq = vdev->config->get_vq_irq(vdev, vi->rq[i]->vq);
+		irq_set_affinity_hint(irq, NULL);
+		free_cpumask_var(vi->rq[i]->affinity_mask);
+		irq = vdev->config->get_vq_irq(vdev, vi->sq[i]->vq);
+		irq_set_affinity_hint(irq, NULL);
+		free_cpumask_var(vi->sq[i]->affinity_mask);
+	}
+}
+
+static int virtnet_init_affinity(struct virtnet_info *vi)
+{
+	struct virtio_device *vdev = vi->vdev;
+	int i, irq;
+
+	for (i = 0; i < vi->num_queue_pairs; i++) {
+		if (!alloc_cpumask_var(&vi->rq[i]->affinity_mask, GFP_KERNEL))
+			goto err_out;
+		if (!alloc_cpumask_var(&vi->sq[i]->affinity_mask, GFP_KERNEL)) {
+			free_cpumask_var(vi->rq[i]->affinity_mask);
+			goto err_out;
+		}
+		cpumask_set_cpu(i, vi->rq[i]->affinity_mask);
+		cpumask_set_cpu(i, vi->sq[i]->affinity_mask);
+
+		irq = vdev->config->get_vq_irq(vdev, vi->rq[i]->vq);
+		if (irq_set_affinity_hint(irq, vi->rq[i]->affinity_mask))
+			goto err_out;
+		irq = vdev->config->get_vq_irq(vdev, vi->sq[i]->vq);
+		if (irq_set_affinity_hint(irq, vi->sq[i]->affinity_mask)) {
+			irq = vdev->config->get_vq_irq(vdev, vi->rq[i]->vq);
+			irq_set_affinity_hint(irq, NULL);
+			goto err_out;
+		}
+	}
+
 	return 0;
+err_out:
+	_virtnet_free_affinity(vi, i);
+	return -ENOMEM;
+}
+
+static void virtnet_free_affinity(struct virtnet_info *vi)
+{
+	_virtnet_free_affinity(vi, vi->num_queue_pairs);
 }
 
 static int virtnet_probe(struct virtio_device *vdev)
 {
-	int err;
+	int i, err;
 	struct net_device *dev;
 	struct virtnet_info *vi;
+	u16 num_queues, num_queue_pairs;
+
+	/* Find if host supports multiqueue virtio_net device */
+	err = virtio_config_val(vdev, VIRTIO_NET_F_MULTIQUEUE,
+				offsetof(struct virtio_net_config,
+				num_queues), &num_queues);
+
+	/* We need at least 2 queues */
+	if (err || num_queues < 2)
+		num_queues = 2;
+
+	num_queue_pairs = num_queues / 2;
 
 	/* Allocate ourselves a network device with room for our info */
-	dev = alloc_etherdev(sizeof(struct virtnet_info));
+	dev = alloc_etherdev_mq(sizeof(struct virtnet_info), num_queue_pairs);
 	if (!dev)
 		return -ENOMEM;
 
@@ -1103,22 +1404,18 @@ static int virtnet_probe(struct virtio_device *vdev)
 
 	/* Set up our device-specific information */
 	vi = netdev_priv(dev);
-	netif_napi_add(dev, &vi->napi, virtnet_poll, napi_weight);
 	vi->dev = dev;
 	vi->vdev = vdev;
 	vdev->priv = vi;
-	vi->pages = NULL;
 	vi->stats = alloc_percpu(struct virtnet_stats);
 	err = -ENOMEM;
 	if (vi->stats == NULL)
-		goto free;
+		goto free_netdev;
 
-	INIT_DELAYED_WORK(&vi->refill, refill_work);
 	mutex_init(&vi->config_lock);
 	vi->config_enable = true;
 	INIT_WORK(&vi->config_work, virtnet_config_changed_work);
-	sg_init_table(vi->rx_sg, ARRAY_SIZE(vi->rx_sg));
-	sg_init_table(vi->tx_sg, ARRAY_SIZE(vi->tx_sg));
+	vi->num_queue_pairs = num_queue_pairs;
 
 	/* If we can receive ANY GSO packets, we must allocate large ones. */
 	if (virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_TSO4) ||
@@ -1129,9 +1426,17 @@ static int virtnet_probe(struct virtio_device *vdev)
 	if (virtio_has_feature(vdev, VIRTIO_NET_F_MRG_RXBUF))
 		vi->mergeable_rx_bufs = true;
 
-	err = init_vqs(vi);
+	if (virtio_has_feature(vdev, VIRTIO_NET_F_CTRL_VQ))
+		vi->has_cvq = true;
+
+	/* Allocate/initialize the rx/tx queues, and invoke find_vqs */
+	err = virtnet_setup_vqs(vi);
 	if (err)
-		goto free_stats;
+		goto free_netdev;
+
+	if (virtio_has_feature(vi->vdev, VIRTIO_NET_F_CTRL_VQ) &&
+	    virtio_has_feature(vi->vdev, VIRTIO_NET_F_CTRL_VLAN))
+		dev->features |= NETIF_F_HW_VLAN_FILTER;
 
 	err = register_netdev(dev);
 	if (err) {
@@ -1140,12 +1445,21 @@ static int virtnet_probe(struct virtio_device *vdev)
 	}
 
 	/* Last of all, set up some receive buffers. */
-	try_fill_recv(vi, GFP_KERNEL);
+	for (i = 0; i < num_queue_pairs; i++) {
+		try_fill_recv(vi->rq[i], GFP_KERNEL);
+
+		/* If we didn't even get one input buffer, we're useless. */
+		if (vi->rq[i]->num == 0) {
+			free_unused_bufs(vi);
+			err = -ENOMEM;
+			goto free_recv_bufs;
+		}
+	}
 
-	/* If we didn't even get one input buffer, we're useless. */
-	if (vi->num == 0) {
-		err = -ENOMEM;
-		goto unregister;
+	err = virtnet_init_affinity(vi);
+	if (err) {
+		pr_debug("virtio_net: failed to initialize irq affinity\n");
+		goto free_recv_bufs;
 	}
 
 	/* Assume link up if device can't report link status,
@@ -1158,42 +1472,26 @@ static int virtnet_probe(struct virtio_device *vdev)
 		netif_carrier_on(dev);
 	}
 
-	pr_debug("virtnet: registered device %s\n", dev->name);
+	pr_debug("virtnet: registered device %s with %d RX and TX vqs\n",
+		 dev->name, num_queue_pairs);
+
 	return 0;
 
-unregister:
+free_recv_bufs:
+	free_receive_bufs(vi);
 	unregister_netdev(dev);
 free_vqs:
+	for (i = 0; i < num_queue_pairs; i++)
+		cancel_delayed_work_sync(&vi->rq[i]->refill);
 	vdev->config->del_vqs(vdev);
-free_stats:
-	free_percpu(vi->stats);
-free:
+
+free_netdev:
+	free_rq_sq(vi);
+
 	free_netdev(dev);
 	return err;
 }
 
-static void free_unused_bufs(struct virtnet_info *vi)
-{
-	void *buf;
-	while (1) {
-		buf = virtqueue_detach_unused_buf(vi->svq);
-		if (!buf)
-			break;
-		dev_kfree_skb(buf);
-	}
-	while (1) {
-		buf = virtqueue_detach_unused_buf(vi->rvq);
-		if (!buf)
-			break;
-		if (vi->mergeable_rx_bufs || vi->big_packets)
-			give_pages(vi, buf);
-		else
-			dev_kfree_skb(buf);
-		--vi->num;
-	}
-	BUG_ON(vi->num != 0);
-}
-
 static void remove_vq_common(struct virtnet_info *vi)
 {
 	vi->vdev->config->reset(vi->vdev);
@@ -1203,14 +1501,17 @@ static void remove_vq_common(struct virtnet_info *vi)
 
 	vi->vdev->config->del_vqs(vi->vdev);
 
-	while (vi->pages)
-		__free_pages(get_a_page(vi, GFP_KERNEL), 0);
+	free_receive_bufs(vi);
 }
 
 static void __devexit virtnet_remove(struct virtio_device *vdev)
 {
 	struct virtnet_info *vi = vdev->priv;
 
+	/* Stop all the virtqueues. */
+	vdev->config->reset(vdev);
+
+
 	/* Prevent config work handler from accessing the device. */
 	mutex_lock(&vi->config_lock);
 	vi->config_enable = false;
@@ -1220,6 +1521,11 @@ static void __devexit virtnet_remove(struct virtio_device *vdev)
 
 	remove_vq_common(vi);
 
+	virtnet_free_affinity(vi);
+
+	/* Free memory for send and receive queues */
+	free_rq_sq(vi);
+
 	flush_work(&vi->config_work);
 
 	free_percpu(vi->stats);
@@ -1230,6 +1536,7 @@ static void __devexit virtnet_remove(struct virtio_device *vdev)
 static int virtnet_freeze(struct virtio_device *vdev)
 {
 	struct virtnet_info *vi = vdev->priv;
+	int i;
 
 	/* Prevent config work handler from accessing the device */
 	mutex_lock(&vi->config_lock);
@@ -1237,10 +1544,13 @@ static int virtnet_freeze(struct virtio_device *vdev)
 	mutex_unlock(&vi->config_lock);
 
 	netif_device_detach(vi->dev);
-	cancel_delayed_work_sync(&vi->refill);
+	for (i = 0; i < vi->num_queue_pairs; i++)
+		cancel_delayed_work_sync(&vi->rq[i]->refill);
 
 	if (netif_running(vi->dev))
-		napi_disable(&vi->napi);
+		for (i = 0; i < vi->num_queue_pairs; i++)
+			napi_disable(&vi->rq[i]->napi);
+
 
 	remove_vq_common(vi);
 
@@ -1252,19 +1562,22 @@ static int virtnet_freeze(struct virtio_device *vdev)
 static int virtnet_restore(struct virtio_device *vdev)
 {
 	struct virtnet_info *vi = vdev->priv;
-	int err;
+	int err, i;
 
-	err = init_vqs(vi);
+	err = virtnet_setup_vqs(vi);
 	if (err)
 		return err;
 
 	if (netif_running(vi->dev))
-		virtnet_napi_enable(vi);
+		for (i = 0; i < vi->num_queue_pairs; i++)
+			virtnet_napi_enable(vi->rq[i]);
 
 	netif_device_attach(vi->dev);
 
-	if (!try_fill_recv(vi, GFP_KERNEL))
-		queue_delayed_work(system_nrt_wq, &vi->refill, 0);
+	for (i = 0; i < vi->num_queue_pairs; i++)
+		if (!try_fill_recv(vi->rq[i], GFP_KERNEL))
+			queue_delayed_work(system_nrt_wq,
+					   &vi->rq[i]->refill, 0);
 
 	mutex_lock(&vi->config_lock);
 	vi->config_enable = true;
@@ -1287,7 +1600,7 @@ static unsigned int features[] = {
 	VIRTIO_NET_F_GUEST_ECN, VIRTIO_NET_F_GUEST_UFO,
 	VIRTIO_NET_F_MRG_RXBUF, VIRTIO_NET_F_STATUS, VIRTIO_NET_F_CTRL_VQ,
 	VIRTIO_NET_F_CTRL_RX, VIRTIO_NET_F_CTRL_VLAN,
-	VIRTIO_NET_F_GUEST_ANNOUNCE,
+	VIRTIO_NET_F_GUEST_ANNOUNCE, VIRTIO_NET_F_MULTIQUEUE,
 };
 
 static struct virtio_driver virtio_net_driver = {
diff --git a/include/linux/virtio_net.h b/include/linux/virtio_net.h
index 1bc7e30..60f09ff 100644
--- a/include/linux/virtio_net.h
+++ b/include/linux/virtio_net.h
@@ -61,6 +61,8 @@ struct virtio_net_config {
 	__u8 mac[6];
 	/* See VIRTIO_NET_F_STATUS and VIRTIO_NET_S_* above */
 	__u16 status;
+	/* Total number of RX/TX queues */
+	__u16 num_queues;
 } __attribute__((packed));
 
 /* This is the first element of the scatter-gather list.  If you don't
-- 
1.7.1


* [net-next RFC V4 PATCH 3/4] virtio: introduce a method to get the irq of a specific virtqueue
From: Jason Wang @ 2012-06-25  9:41 UTC (permalink / raw)
  To: mst, akong, habanero, tahm, jwhan, mashirle, krkumar2, edumazet,
	davem, rusty, linux-kernel, virtualization, netdev
  Cc: qemu-devel, kvm
In-Reply-To: <20120625090829.7263.65026.stgit@amd-6168-8-1.englab.nay.redhat.com>

Device-specific irq optimizations such as irq affinity may be used by virtio
drivers, so this patch introduces a new method to get the irq of a specific
virtqueue.

After this patch, virtio device drivers can query the irq and do
device-specific optimizations. The first user will be virtio-net.
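
As a rough illustration of how a driver might consume the new callback, here
is a userspace sketch. The struct definitions are simplified stand-ins for the
kernel types, and demo_get_vq_irq(), demo_config_ops and demo_query_irq() are
hypothetical names that do not appear in the patch. The NULL check is also an
assumption; the virtio-net caller in this series invokes the callback directly.

```c
#include <stddef.h>

/* Simplified stand-ins for the kernel types; not the real definitions. */
struct virtqueue {
	void *priv;
};

struct virtio_device;

struct virtio_config_ops {
	/* Same shape as the callback added by this patch. */
	int (*get_vq_irq)(struct virtio_device *vdev, struct virtqueue *vq);
};

struct virtio_device {
	const struct virtio_config_ops *config;
};

/* Hypothetical transport: each queue keeps its irq in vq->priv. */
struct demo_vq_info {
	int irq;
};

int demo_get_vq_irq(struct virtio_device *vdev, struct virtqueue *vq)
{
	struct demo_vq_info *info = vq->priv;

	(void)vdev;
	return info->irq;
}

const struct virtio_config_ops demo_config_ops = {
	.get_vq_irq = demo_get_vq_irq,
};

/* Driver side: ask the transport for the irq, or -1 if it has no method. */
int demo_query_irq(struct virtio_device *vdev, struct virtqueue *vq)
{
	if (!vdev->config || !vdev->config->get_vq_irq)
		return -1;
	return vdev->config->get_vq_irq(vdev, vq);
}
```

A driver would then pass the returned irq to something like
irq_set_affinity_hint(), which is what the virtio-net patch in this series does.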

Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/lguest/lguest_device.c |    8 ++++++++
 drivers/s390/kvm/kvm_virtio.c  |    6 ++++++
 drivers/virtio/virtio_mmio.c   |    8 ++++++++
 drivers/virtio/virtio_pci.c    |   12 ++++++++++++
 include/linux/virtio_config.h  |    4 ++++
 5 files changed, 38 insertions(+), 0 deletions(-)

diff --git a/drivers/lguest/lguest_device.c b/drivers/lguest/lguest_device.c
index 9e8388e..bcd080f 100644
--- a/drivers/lguest/lguest_device.c
+++ b/drivers/lguest/lguest_device.c
@@ -392,6 +392,13 @@ static const char *lg_bus_name(struct virtio_device *vdev)
 	return "";
 }
 
+static int lg_get_vq_irq(struct virtio_device *vdev, struct virtqueue *vq)
+{
+	struct lguest_vq_info *lvq = vq->priv;
+
+	return lvq->config.irq;
+}
+
 /* The ops structure which hooks everything together. */
 static struct virtio_config_ops lguest_config_ops = {
 	.get_features = lg_get_features,
@@ -404,6 +411,7 @@ static struct virtio_config_ops lguest_config_ops = {
 	.find_vqs = lg_find_vqs,
 	.del_vqs = lg_del_vqs,
 	.bus_name = lg_bus_name,
+	.get_vq_irq = lg_get_vq_irq,
 };
 
 /*
diff --git a/drivers/s390/kvm/kvm_virtio.c b/drivers/s390/kvm/kvm_virtio.c
index d74e9ae..a897de2 100644
--- a/drivers/s390/kvm/kvm_virtio.c
+++ b/drivers/s390/kvm/kvm_virtio.c
@@ -268,6 +268,11 @@ static const char *kvm_bus_name(struct virtio_device *vdev)
 	return "";
 }
 
+static int kvm_get_vq_irq(struct virtio_device *vdev, struct virtqueue *vq)
+{
+	return 0x2603;
+}
+
 /*
  * The config ops structure as defined by virtio config
  */
@@ -282,6 +287,7 @@ static struct virtio_config_ops kvm_vq_configspace_ops = {
 	.find_vqs = kvm_find_vqs,
 	.del_vqs = kvm_del_vqs,
 	.bus_name = kvm_bus_name,
+	.get_vq_irq = kvm_get_vq_irq,
 };
 
 /*
diff --git a/drivers/virtio/virtio_mmio.c b/drivers/virtio/virtio_mmio.c
index f5432b6..2ba37ed 100644
--- a/drivers/virtio/virtio_mmio.c
+++ b/drivers/virtio/virtio_mmio.c
@@ -411,6 +411,13 @@ static const char *vm_bus_name(struct virtio_device *vdev)
 	return vm_dev->pdev->name;
 }
 
+static int vm_get_vq_irq(struct virtio_device *vdev, struct virtqueue *vq)
+{
+	struct virtio_mmio_device *vm_dev = to_virtio_mmio_device(vdev);
+
+	return platform_get_irq(vm_dev->pdev, 0);
+}
+
 static struct virtio_config_ops virtio_mmio_config_ops = {
 	.get		= vm_get,
 	.set		= vm_set,
@@ -422,6 +429,7 @@ static struct virtio_config_ops virtio_mmio_config_ops = {
 	.get_features	= vm_get_features,
 	.finalize_features = vm_finalize_features,
 	.bus_name	= vm_bus_name,
+	.get_vq_irq     = vm_get_vq_irq,
 };
 
 
diff --git a/drivers/virtio/virtio_pci.c b/drivers/virtio/virtio_pci.c
index adb24f2..c062227 100644
--- a/drivers/virtio/virtio_pci.c
+++ b/drivers/virtio/virtio_pci.c
@@ -607,6 +607,17 @@ static const char *vp_bus_name(struct virtio_device *vdev)
 	return pci_name(vp_dev->pci_dev);
 }
 
+static int vp_get_vq_irq(struct virtio_device *vdev, struct virtqueue *vq)
+{
+	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
+	struct virtio_pci_vq_info *info = vq->priv;
+
+	if (vp_dev->intx_enabled)
+		return vp_dev->pci_dev->irq;
+	else
+		return vp_dev->msix_entries[info->msix_vector].vector;
+}
+
 static struct virtio_config_ops virtio_pci_config_ops = {
 	.get		= vp_get,
 	.set		= vp_set,
@@ -618,6 +629,7 @@ static struct virtio_config_ops virtio_pci_config_ops = {
 	.get_features	= vp_get_features,
 	.finalize_features = vp_finalize_features,
 	.bus_name	= vp_bus_name,
+	.get_vq_irq     = vp_get_vq_irq,
 };
 
 static void virtio_pci_release_dev(struct device *_d)
diff --git a/include/linux/virtio_config.h b/include/linux/virtio_config.h
index fc457f4..acd6930 100644
--- a/include/linux/virtio_config.h
+++ b/include/linux/virtio_config.h
@@ -98,6 +98,9 @@
  *	vdev: the virtio_device
  *      This returns a pointer to the bus name a la pci_name from which
  *      the caller can then copy.
+ * @get_vq_irq: get the irq number of the specific virtqueue.
+ *      vdev: the virtio_device
+ *      vq: the virtqueue
  */
 typedef void vq_callback_t(struct virtqueue *);
 struct virtio_config_ops {
@@ -116,6 +119,7 @@ struct virtio_config_ops {
 	u32 (*get_features)(struct virtio_device *vdev);
 	void (*finalize_features)(struct virtio_device *vdev);
 	const char *(*bus_name)(struct virtio_device *vdev);
+	int (*get_vq_irq)(struct virtio_device *vdev, struct virtqueue *vq);
 };
 
 /* If driver didn't advertise the feature, it will never appear. */
-- 
1.7.1


* [net-next RFC V4 PATCH 2/4] virtio_ring: move queue_index to vring_virtqueue
From: Jason Wang @ 2012-06-25  9:17 UTC (permalink / raw)
  To: krkumar2, habanero, rusty, mst, netdev, mashirle, linux-kernel,
	virtualization, edumazet, tahm, jwhan, akong, davem
  Cc: kvm
In-Reply-To: <20120625090829.7263.65026.stgit@amd-6168-8-1.englab.nay.redhat.com>

Instead of storing the queue index in the per-transport virtio info
structures, this patch moves it to vring_virtqueue and introduces helpers to
set and get the value. This simplifies management and tracing.
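
The helper pattern can be tried outside the kernel with a minimal sketch like
the one below. The two structs are cut-down stand-ins, and to_vvq() is written
with offsetof() here in place of the kernel's container_of(); only the two
helper names are taken from the patch.

```c
#include <stddef.h>

/* Generic handle that drivers see (stand-in for struct virtqueue). */
struct virtqueue {
	void *priv;
};

/* Ring-private state that embeds the generic handle. */
struct vring_virtqueue {
	struct virtqueue vq;
	int queue_index;
};

/* Recover the enclosing structure from a pointer to its 'vq' member. */
#define to_vvq(_vq) \
	((struct vring_virtqueue *)((char *)(_vq) - \
				    offsetof(struct vring_virtqueue, vq)))

void virtqueue_set_queue_index(struct virtqueue *_vq, int queue_index)
{
	struct vring_virtqueue *vq = to_vvq(_vq);

	vq->queue_index = queue_index;
}

int virtqueue_get_queue_index(struct virtqueue *_vq)
{
	struct vring_virtqueue *vq = to_vvq(_vq);

	return vq->queue_index;
}
```

Because the index lives next to the ring state, a transport only needs the
struct virtqueue pointer to find it, without its own per-queue bookkeeping.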

Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/virtio/virtio_mmio.c |    5 +----
 drivers/virtio/virtio_pci.c  |   12 +++++-------
 drivers/virtio/virtio_ring.c |   17 +++++++++++++++++
 include/linux/virtio.h       |    4 ++++
 4 files changed, 27 insertions(+), 11 deletions(-)

diff --git a/drivers/virtio/virtio_mmio.c b/drivers/virtio/virtio_mmio.c
index 453db0c..f5432b6 100644
--- a/drivers/virtio/virtio_mmio.c
+++ b/drivers/virtio/virtio_mmio.c
@@ -131,9 +131,6 @@ struct virtio_mmio_vq_info {
 	/* the number of entries in the queue */
 	unsigned int num;
 
-	/* the index of the queue */
-	int queue_index;
-
 	/* the virtual address of the ring queue */
 	void *queue;
 
@@ -324,7 +321,6 @@ static struct virtqueue *vm_setup_vq(struct virtio_device *vdev, unsigned index,
 		err = -ENOMEM;
 		goto error_kmalloc;
 	}
-	info->queue_index = index;
 
 	/* Allocate pages for the queue - start with a queue as big as
 	 * possible (limited by maximum size allowed by device), drop down
@@ -363,6 +359,7 @@ static struct virtqueue *vm_setup_vq(struct virtio_device *vdev, unsigned index,
 		goto error_new_virtqueue;
 	}
 
+	virtqueue_set_queue_index(vq, index);
 	vq->priv = info;
 	info->vq = vq;
 
diff --git a/drivers/virtio/virtio_pci.c b/drivers/virtio/virtio_pci.c
index 2e03d41..adb24f2 100644
--- a/drivers/virtio/virtio_pci.c
+++ b/drivers/virtio/virtio_pci.c
@@ -79,9 +79,6 @@ struct virtio_pci_vq_info
 	/* the number of entries in the queue */
 	int num;
 
-	/* the index of the queue */
-	int queue_index;
-
 	/* the virtual address of the ring queue */
 	void *queue;
 
@@ -202,11 +199,11 @@ static void vp_reset(struct virtio_device *vdev)
 static void vp_notify(struct virtqueue *vq)
 {
 	struct virtio_pci_device *vp_dev = to_vp_device(vq->vdev);
-	struct virtio_pci_vq_info *info = vq->priv;
 
 	/* we write the queue's selector into the notification register to
 	 * signal the other end */
-	iowrite16(info->queue_index, vp_dev->ioaddr + VIRTIO_PCI_QUEUE_NOTIFY);
+	iowrite16(virtqueue_get_queue_index(vq),
+		  vp_dev->ioaddr + VIRTIO_PCI_QUEUE_NOTIFY);
 }
 
 /* Handle a configuration change: Tell driver if it wants to know. */
@@ -402,7 +399,6 @@ static struct virtqueue *setup_vq(struct virtio_device *vdev, unsigned index,
 	if (!info)
 		return ERR_PTR(-ENOMEM);
 
-	info->queue_index = index;
 	info->num = num;
 	info->msix_vector = msix_vec;
 
@@ -425,6 +421,7 @@ static struct virtqueue *setup_vq(struct virtio_device *vdev, unsigned index,
 		goto out_activate_queue;
 	}
 
+	virtqueue_set_queue_index(vq, index);
 	vq->priv = info;
 	info->vq = vq;
 
@@ -467,7 +464,8 @@ static void vp_del_vq(struct virtqueue *vq)
 	list_del(&info->node);
 	spin_unlock_irqrestore(&vp_dev->lock, flags);
 
-	iowrite16(info->queue_index, vp_dev->ioaddr + VIRTIO_PCI_QUEUE_SEL);
+	iowrite16(virtqueue_get_queue_index(vq),
+		vp_dev->ioaddr + VIRTIO_PCI_QUEUE_SEL);
 
 	if (vp_dev->msix_enabled) {
 		iowrite16(VIRTIO_MSI_NO_VECTOR,
diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 5aa43c3..9c5aeea 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -106,6 +106,9 @@ struct vring_virtqueue
 	/* How to notify other side. FIXME: commonalize hcalls! */
 	void (*notify)(struct virtqueue *vq);
 
+	/* Index of the queue */
+	int queue_index;
+
 #ifdef DEBUG
 	/* They're supposed to lock for us. */
 	unsigned int in_use;
@@ -171,6 +174,20 @@ static int vring_add_indirect(struct vring_virtqueue *vq,
 	return head;
 }
 
+void virtqueue_set_queue_index(struct virtqueue *_vq, int queue_index)
+{
+	struct vring_virtqueue *vq = to_vvq(_vq);
+	vq->queue_index = queue_index;
+}
+EXPORT_SYMBOL_GPL(virtqueue_set_queue_index);
+
+int virtqueue_get_queue_index(struct virtqueue *_vq)
+{
+	struct vring_virtqueue *vq = to_vvq(_vq);
+	return vq->queue_index;
+}
+EXPORT_SYMBOL_GPL(virtqueue_get_queue_index);
+
 /**
  * virtqueue_add_buf - expose buffer to other end
  * @vq: the struct virtqueue we're talking about.
diff --git a/include/linux/virtio.h b/include/linux/virtio.h
index 8efd28a..0d8ed46 100644
--- a/include/linux/virtio.h
+++ b/include/linux/virtio.h
@@ -50,6 +50,10 @@ void *virtqueue_detach_unused_buf(struct virtqueue *vq);
 
 unsigned int virtqueue_get_vring_size(struct virtqueue *vq);
 
+void virtqueue_set_queue_index(struct virtqueue *vq, int queue_index);
+
+int virtqueue_get_queue_index(struct virtqueue *vq);
+
 /**
  * virtio_device - representation of a device using virtio
  * @index: unique position on the virtio bus


* [net-next RFC V4 PATCH 1/4] virtio_net: Introduce VIRTIO_NET_F_MULTIQUEUE
From: Jason Wang @ 2012-06-25  9:17 UTC (permalink / raw)
  To: krkumar2, habanero, rusty, mst, netdev, mashirle, linux-kernel,
	virtualization, edumazet, tahm, jwhan, akong, davem
  Cc: kvm
In-Reply-To: <20120625090829.7263.65026.stgit@amd-6168-8-1.englab.nay.redhat.com>

From: Krishna Kumar <krkumar2@in.ibm.com>

Introduce VIRTIO_NET_F_MULTIQUEUE.

Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 include/linux/virtio_net.h |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/include/linux/virtio_net.h b/include/linux/virtio_net.h
index 2470f54..1bc7e30 100644
--- a/include/linux/virtio_net.h
+++ b/include/linux/virtio_net.h
@@ -51,6 +51,7 @@
 #define VIRTIO_NET_F_CTRL_RX_EXTRA 20	/* Extra RX mode control support */
 #define VIRTIO_NET_F_GUEST_ANNOUNCE 21	/* Guest can announce device on the
 					 * network */
+#define VIRTIO_NET_F_MULTIQUEUE	22	/* Device supports multiple TXQ/RXQ */
 
 #define VIRTIO_NET_S_LINK_UP	1	/* Link is up */
 #define VIRTIO_NET_S_ANNOUNCE	2	/* Announcement is needed */


* [net-next RFC V4 PATCH 0/4] Multiqueue virtio-net
From: Jason Wang @ 2012-06-25  9:16 UTC (permalink / raw)
  To: krkumar2, habanero, rusty, mst, netdev, mashirle, linux-kernel,
	virtualization, edumazet, tahm, jwhan, akong, davem
  Cc: kvm

Hello All:

This series is an updated version of the multiqueue virtio-net driver, based
on Krishna Kumar's work, that lets virtio-net use multiple rx/tx queues for
packet reception and transmission. Please review and comment.

Test Environment:
- Intel(R) Xeon(R) CPU E5620 @ 2.40GHz, 8 cores 2 numa nodes
- Two directly connected 82599 NICs

Test Summary:

- Highlights: huge improvements in the TCP_RR test
- Lowlights: regression on small packet transmission, higher cpu utilization
             than single queue, needs further optimization

Analysis of the performance result:

- I counted the number of packets sent and received during the test;
  multiqueue handles many more packets per second.

- For the tx regression, multiqueue sends about 1-2 times more packets than
  single queue, and the packets are much smaller. I suspect TCP does less
  batching with multiqueue, so I hacked tcp_write_xmit() to force more
  batching; with that hack, multiqueue works as well as single queue for
  both small-packet transmission and throughput.

- I didn't include accelerated RFS for virtio-net in this series as it still
  needs further shaping. For those interested, please see:
  http://www.mail-archive.com/kvm@vger.kernel.org/msg64111.html

Detailed results:

Test results: smp = 2, vhosts and vcpus pinned to the same node - 1 = sq, 2 = mq (q=2)
- TCP_MAERTS (Guest to external Host):

sessions size throughput1 throughput2 %   norm1 norm2 %
1 64 424.91 401.49 94% 13.20 12.35 93%
2 64 1211.06 878.31 72% 24.35 15.80 64%
4 64 1292.46 1081.78 83% 26.46 20.14 76%
8 64 1355.57 826.06 60% 27.88 15.32 54%
1 256 1489.37 1406.51 94% 46.93 43.72 93%
2 256 4936.19 2688.46 54% 100.24 46.39 46%
4 256 5251.10 2900.08 55% 107.98 50.47 46%
8 256 5270.11 3562.10 67% 108.26 66.19 61%
1 512 1877.57 2697.64 143% 57.29 83.02 144%
2 512 9183.49 5056.33 55% 205.26 86.43 42%
4 512 9349.59 8918.77 95% 212.78 176.99 83%
8 512 9318.29 8947.69 96% 216.55 176.34 81%
1 1024 4746.47 4669.10 98% 143.87 140.63 97%
2 1024 7406.49 4738.58 63% 245.16 131.04 53%
4 1024 8955.70 9337.82 104% 254.49 208.66 81%
8 1024 9384.48 9378.41 99% 260.24 197.77 75%
1 2048 8947.29 8955.98 100% 338.39 338.60 100%
2 2048 9259.04 9383.77 101% 326.02 224.92 68%
4 2048 9043.04 9325.38 103% 305.09 214.77 70%
8 2048 9358.04 8413.21 89% 291.07 189.74 65%
1 4096 8882.55 8885.40 100% 329.83 326.42 98%
2 4096 9247.07 9387.49 101% 345.55 240.51 69%
4 4096 9365.30 9341.82 99% 333.40 211.97 63%
8 4096 9397.64 8472.35 90% 312.21 182.51 58%
1 16384 8843.82 8725.50 98% 315.06 312.29 99%
2 16384 7978.70 9176.16 115% 280.34 317.62 113%
4 16384 8906.91 9251.15 103% 312.85 226.13 72%
8 16384 9385.36 9401.39 100% 341.53 207.53 60%
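
For readers checking the tables, the two percentage columns appear to be the
multiqueue figure divided by the single-queue figure, with the fraction
dropped rather than rounded. That reading is inferred from the numbers and is
not stated in the original mail; mq_vs_sq_percent() is a hypothetical helper
name.

```c
/*
 * Ratio of the multiqueue result to the single-queue result as a whole
 * percentage. The cast truncates toward zero, which is an assumption
 * inferred from the rows above (e.g. 878.31 / 1211.06 is listed as 72%).
 */
int mq_vs_sq_percent(double sq, double mq)
{
	return (int)(mq * 100.0 / sq);
}
```

Values below 100 in a throughput column are regressions against single
queue; values below 100 in a norm column mean less throughput per unit of
cpu.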

- TCP_RR

sessions size trans1 trans2 %   norm1 norm2 %
50 1 63021.83 93003.57 147% 2360.36 1972.08 83%
100 1 54599.87 93724.43 171% 2042.64 2119.50 103%
250 1 86546.75 119738.71 138% 2414.80 2508.14 103%
50 64 59785.19 97761.24 163% 2264.59 2136.85 94%
100 64 56028.20 103233.18 184% 2063.65 2260.91 109%
250 64 99116.82 113948.86 114% 2452.77 2393.88 97%
50 128 58025.96 84971.26 146% 2284.48 1907.75 83%
100 128 66302.76 83110.60 125% 2579.87 1881.60 72%
250 128 109262.23 137571.13 125% 2603.96 2911.55 111%
50 256 59766.57 77370.93 129% 2393.53 2018.02 84%
100 256 60862.02 80737.50 132% 2426.71 2024.51 83%
250 256 90076.66 141780.44 157% 2453.06 3056.27 124%

- TCP_STREAM (External Host to Guest)

sessions size throughput1 throughput2 %   norm1 norm2 %
1 64 373.53 575.24 154% 15.37 33.44 217%
2 64 1314.17 1087.49 82% 54.21 32.97 60%
4 64 3352.93 2404.70 71% 136.35 53.72 39%
8 64 6866.17 6587.17 95% 252.06 147.03 58%
1 256 677.37 767.65 113% 26.59 29.80 112%
2 256 4386.83 4550.40 103% 164.17 174.81 106%
4 256 8446.27 8814.95 104% 307.02 186.95 60%
8 256 6529.46 9370.44 143% 237.52 207.77 87%
1 512 2542.48 1371.31 53% 102.14 51.68 50%
2 512 9183.10 9354.81 101% 319.85 333.62 104%
4 512 8488.12 9230.58 108% 303.14 258.55 85%
8 512 8115.66 9393.33 115% 293.83 219.82 74%
1 1024 2653.16 2507.78 94% 100.88 96.56 95%
2 1024 6490.12 5033.35 77% 316.74 102.74 32%
4 1024 7582.51 9183.60 121% 298.87 210.48 70%
8 1024 8304.13 9341.48 112% 298.06 242.32 81%
1 2048 2495.06 2508.12 100% 99.60 102.96 103%
2 2048 8968.12 7554.57 84% 324.57 175.56 54%
4 2048 8627.85 9192.90 106% 310.91 205.98 66%
8 2048 7781.72 9422.10 121% 284.31 208.77 73%
1 4096 6067.46 4489.04 73% 233.63 171.01 73%
2 4096 9029.43 8684.09 96% 323.17 191.19 59%
4 4096 8284.30 9253.33 111% 306.93 268.67 87%
8 4096 7789.39 9388.32 120% 283.97 238.28 83%
1 16384 8660.90 9313.87 107% 318.88 315.61 98%
2 16384 8646.37 8871.30 102% 318.81 318.88 100%
4 16384 8386.02 9397.13 112% 306.28 205.44 67%
8 16384 8006.27 9404.98 117% 288.41 224.83 77%

Test results: smp = 4, no pinning - 1 = sq, 2 = mq (q=4)

- TCP_MAERTS - Guest to External Host

sessions size throughput1 throughput2 %   norm1 norm2 %
1 64 513.13 331.27 64% 15.86 10.39 65%
2 64 1325.72 543.38 40% 26.54 12.59 47%
4 64 2735.18 1168.01 42% 37.61 12.63 33%
8 64 3084.53 2278.38 73% 41.20 27.01 65%
1 256 1313.25 996.64 75% 41.23 28.59 69%
2 256 3425.64 3337.68 97% 71.57 61.69 86%
4 256 8174.33 4252.54 52% 126.63 49.68 39%
8 256 9365.96 8411.91 89% 145.00 101.64 70%
1 512 2363.07 2370.35 100% 73.00 73.52 100%
2 512 6047.23 3636.29 60% 135.43 61.02 45%
4 512 9330.76 7165.99 76% 184.18 95.50 51%
8 512 8178.20 9221.16 112% 162.52 122.44 75%
1 1024 3196.63 4016.30 125% 98.41 120.75 122%
2 1024 4940.41 9296.51 188% 140.03 191.95 137%
4 1024 5696.43 9018.76 158% 147.30 150.01 101%
8 1024 9355.07 9342.43 99% 216.90 140.48 64%
1 2048 4248.99 5189.24 122% 131.95 157.01 118%
2 2048 9021.22 9262.81 102% 242.37 198.21 81%
4 2048 8357.81 9241.94 110% 225.88 180.61 79%
8 2048 8024.56 9327.27 116% 205.75 145.76 70%
1 4096 9270.51 8199.32 88% 326.88 269.53 82%
2 4096 9151.10 9348.33 102% 257.92 214.46 83%
4 4096 9243.34 9294.30 100% 281.20 164.35 58%
8 4096 9020.35 9339.32 103% 249.87 143.54 57%
1 16384 9357.69 9355.69 99% 319.15 322.94 101%
2 16384 9319.39 9076.63 97% 319.26 215.64 67%
4 16384 9352.99 9183.41 98% 308.98 145.26 47%
8 16384 9384.67 9353.54 99% 322.49 155.78 48%

- TCP_RR
sessions size trans1 trans2 %   norm1 norm2 %
50 1 59677.80 71761.59 120% 2020.92 1585.19 78%
100 1 55324.42 81302.10 146% 1889.49 1439.22 76%
250 1 73973.48 155031.82 209% 2495.73 2195.91 87%
50 64 53093.57 67893.59 127% 1879.41 1266.19 67%
100 64 59186.33 60128.69 101% 2044.43 1084.57 53%
250 64 64159.90 137389.38 214% 2186.02 1994.61 91%
50 128 54323.45 51615.38 95% 1924.99 908.08 47%
100 128 57543.49 69999.21 121% 2049.99 1266.95 61%
250 128 71541.81 123915.35 173% 2448.38 1845.62 75%
50 256 57184.12 68314.09 119% 2204.47 1393.02 63%
100 256 51983.40 61931.00 119% 1897.89 1216.24 64%
250 256 66542.32 119435.53 179% 2267.97 1887.71 83%

- TCP_STREAM - External Host to Guest
sessions size throughput1 throughput2 %   norm1 norm2 %
1 64 589.10 634.81 107% 27.01 34.51 127%
2 64 1805.57 1442.49 79% 70.44 39.59 56%
4 64 3066.91 3869.26 126% 121.89 84.96 69%
8 64 6602.37 7626.61 115% 211.07 116.27 55%
1 256 775.25 2092.37 269% 26.38 115.47 437%
2 256 6213.63 3527.24 56% 191.89 84.18 43%
4 256 7333.13 9190.87 125% 230.74 218.05 94%
8 256 6776.23 9396.71 138% 216.14 153.14 70%
1 512 4308.25 3536.38 82% 160.93 140.38 87%
2 512 7159.09 9212.97 128% 240.80 201.42 83%
4 512 7095.49 9241.70 130% 229.77 193.26 84%
8 512 6935.59 9398.09 135% 221.37 152.36 68%
1 1024 5566.68 6155.74 110% 250.52 245.05 97%
2 1024 7303.83 9212.72 126% 299.33 229.91 76%
4 1024 7179.31 9334.93 130% 233.62 192.51 82%
8 1024 7080.91 9396.46 132% 226.37 173.81 76%
1 2048 7477.66 2671.34 35% 250.25 96.85 38%
2 2048 6743.70 8986.82 133% 295.51 230.07 77%
4 2048 7186.19 9239.56 128% 228.13 219.78 96%
8 2048 6883.19 9375.34 136% 221.11 164.59 74%
1 4096 7582.83 5040.55 66% 291.98 178.36 61%
2 4096 7617.98 9345.37 122% 245.74 197.78 80%
4 4096 7157.87 9383.81 131% 226.80 188.65 83%
8 4096 5916.83 9401.45 158% 189.64 147.61 77%
1 16384 8417.28 5351.11 63% 309.57 301.47 97%
2 16384 8426.42 9303.45 110% 302.13 197.14 65%
4 16384 5695.26 8678.73 152% 180.68 192.30 106%
8 16384 6761.33 9374.67 138% 214.50 160.25 74%

Changes from V3:

- Rebased onto net-next
- Made queue 2 the control virtqueue, as the spec requires
- Provide irq affinity
- Choose txq based on processor id
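
To make the resulting virtqueue numbering concrete, here is a small sketch of
the index arithmetic used by the find_vqs path in patch 4/4. The function
names are hypothetical; the arithmetic mirrors the loops in the patch, where
pair 0 keeps indices 0 and 1, the control virtqueue (if present) takes index
2, and later pairs are shifted up by one.

```c
#include <stdbool.h>

/* Fewer than two advertised queues still gives one rx/tx pair. */
int queue_pairs_from_num_queues(int num_queues)
{
	if (num_queues < 2)
		num_queues = 2;
	return num_queues / 2;
}

/* One rx and one tx queue per pair, plus the optional control queue. */
int total_vqs(int num_queue_pairs, bool has_cvq)
{
	return num_queue_pairs * 2 + (has_cvq ? 1 : 0);
}

/* Receive queue index for a pair; only pairs after the first are shifted. */
int rx_vq_index(int pair, bool has_cvq)
{
	int i = pair * 2;

	return i == 0 ? i : i + (has_cvq ? 1 : 0);
}

/* The transmit queue always directly follows its receive queue. */
int tx_vq_index(int pair, bool has_cvq)
{
	return rx_vq_index(pair, has_cvq) + 1;
}
```

With a control virtqueue and two pairs this gives rx/tx indices 0/1 and 3/4,
leaving index 2 for the control queue, for five virtqueues in total.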

References:

- V3: http://lwn.net/Articles/467283/

---

Jason Wang (3):
      virtio_ring: move queue_index to vring_virtqueue
      virtio: introduce a method to get the irq of a specific virtqueue
      virtio_net: multiqueue support

Krishna Kumar (1):
      virtio_net: Introduce VIRTIO_NET_F_MULTIQUEUE


 drivers/lguest/lguest_device.c |    8 
 drivers/net/virtio_net.c       |  695 +++++++++++++++++++++++++++++-----------
 drivers/s390/kvm/kvm_virtio.c  |    6 
 drivers/virtio/virtio_mmio.c   |   13 +
 drivers/virtio/virtio_pci.c    |   24 +
 drivers/virtio/virtio_ring.c   |   17 +
 include/linux/virtio.h         |    4 
 include/linux/virtio_config.h  |    4 
 include/linux/virtio_net.h     |    3 
 9 files changed, 572 insertions(+), 202 deletions(-)

-- 
Jason Wang


* Re: [Qemu-devel] [PATCH 2/2] virtio-scsi: Implement hotplug support for virtio-scsi
From: Stefan Hajnoczi @ 2012-06-25  9:10 UTC (permalink / raw)
  To: mc
  Cc: zwanp, linuxram, qemu-devel, senwang, Anthony Liguori,
	Paolo Bonzini, virtualization, Andreas Färber
In-Reply-To: <20120625035113.Horde.eaU3vZir309P6Bhxk01CidA@imap.linux.ibm.com>

On Mon, Jun 25, 2012 at 03:51:13AM -0400, mc@linux.vnet.ibm.com wrote:
> 
> Quoting Stefan Hajnoczi <stefanha@gmail.com>:
> 
> >On Wed, Jun 20, 2012 at 7:47 AM, Cong Meng <mc@linux.vnet.ibm.com> wrote:
> >>Implement the hotplug() and hot_unplug() interfaces in virtio-scsi,
> >>by signal
> >>the virtio_scsi.ko in guest kernel via event virtual queue.
> >>
> >>The counterpart patch of virtio_scsi.ko will be sent soon in another thread.
> >
> >>Signed-off-by: Cong Meng <mc@linux.vnet.ibm.com>
> >>Signed-off-by: Sen Wang <senwang@linux.vnet.ibm.com>
> >>---
> >> hw/virtio-scsi.c |   72
> >>+++++++++++++++++++++++++++++++++++++++++++++++++++--
> >> 1 files changed, 69 insertions(+), 3 deletions(-)
> >
> >I compared against the virtio-scsi specification and this looks good:
> >http://ozlabs.org/~rusty/virtio-spec/virtio-0.9.5.pdf
> >
> >Dropped events and event throttling are not implemented by this patch.
> > This means that the guest can miss events if it runs out of event
> >queue elements.  A scenario that might be able to trigger this is if
> >multiple LUNs are hotplugged in a single QEMU monitor callback.
> >
> >Implementing dropped events is easy in hw/virtio-scsi.c.  Keep a bool
> >or counter of dropped events and report them when the guest kicks us
> >with a free event element (virtio_scsi_handle_event).
> 
> Yes. It's easy to do this in qemu. But I'm not sure what should be done
> in virtio-scsi.ko to respond to the "VIRTIO_SCSI_T_EVENTS_MISSED" event.
> The spec says "poll the logical units for unit attention conditions"; or
> should it just rescan the whole bus?

I'm not sure what the answer is either, maybe you can find an existing
SCSI LLD that does what you need.

Stefan
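
A minimal sketch of the dropped-event bookkeeping suggested above: remember that an event was lost when no event buffer is free, and report it once the guest posts a new buffer. The struct and function names here are hypothetical and are not taken from hw/virtio-scsi.c; the counter increment only stands in for reporting VIRTIO_SCSI_T_EVENTS_MISSED to the guest.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the device-side event queue state. */
struct evq_state {
    size_t free_elems;       /* event buffers currently posted by the guest */
    bool events_missed;      /* an event was dropped since the last report */
    unsigned delivered;      /* events delivered normally */
    unsigned missed_reports; /* "events missed" notifications sent */
};

/* Try to deliver one event; remember the drop if no buffer is free. */
static bool evq_push_event(struct evq_state *s)
{
    assert(s != NULL);
    if (s->free_elems == 0) {
        s->events_missed = true;
        return false;
    }
    s->free_elems--;
    s->delivered++;
    return true;
}

/* The guest kicked the event queue after posting n new buffers. */
static void evq_handle_kick(struct evq_state *s, size_t n)
{
    assert(s != NULL);
    s->free_elems += n;
    if (s->events_missed && s->free_elems > 0) {
        /* Use one buffer to tell the guest that events were lost. */
        s->free_elems--;
        s->missed_reports++;
        s->events_missed = false;
    }
}
```

A bool is enough if the guest reacts to a missed-events report with a full rescan; a counter would only matter if the report carried the number of lost events.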

^ permalink raw reply

* Re: [Qemu-devel] [PATCH 2/2] virtio-scsi: Implement hotplug support for virtio-scsi
From: mc @ 2012-06-25  7:51 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: stefanha, zwanp, linuxram, qemu-devel, senwang, Anthony Liguori,
	Paolo Bonzini, virtualization, Andreas Färber
In-Reply-To: <CAJSP0QWoRt-GgWytPpSBEjNDtSHHRXOs7hDnyUpspW-knYPY5g@mail.gmail.com>


Quoting Stefan Hajnoczi <stefanha@gmail.com>:

> On Wed, Jun 20, 2012 at 7:47 AM, Cong Meng <mc@linux.vnet.ibm.com> wrote:
>> Implement the hotplug() and hot_unplug() interfaces in virtio-scsi by
>> signaling virtio_scsi.ko in the guest kernel via the event virtqueue.
>>
>> The counterpart patch of virtio_scsi.ko will be sent soon in another thread.
>
>> Signed-off-by: Cong Meng <mc@linux.vnet.ibm.com>
>> Signed-off-by: Sen Wang <senwang@linux.vnet.ibm.com>
>> ---
>>  hw/virtio-scsi.c |   72 +++++++++++++++++++++++++++++++++++++++++++++++++++--
>>  1 files changed, 69 insertions(+), 3 deletions(-)
>
> I compared against the virtio-scsi specification and this looks good:
> http://ozlabs.org/~rusty/virtio-spec/virtio-0.9.5.pdf
>
> Dropped events and event throttling are not implemented by this patch.
>  This means that the guest can miss events if it runs out of event
> queue elements.  A scenario that might be able to trigger this is if
> multiple LUNs are hotplugged in a single QEMU monitor callback.
>
> Implementing dropped events is easy in hw/virtio-scsi.c.  Keep a bool
> or counter of dropped events and report them when the guest kicks us
> with a free event element (virtio_scsi_handle_event).

Yes. It's easy to do this in qemu. But I'm not sure what should be done
in virtio-scsi.ko to respond to the "VIRTIO_SCSI_T_EVENTS_MISSED" event.
The spec says "poll the logical units for unit attention conditions"; or
should it just rescan the whole bus?

>
> Stefan

^ permalink raw reply

* RE: [PATCH 00/13] drivers: hv: kvp
From: KY Srinivasan @ 2012-06-22 18:54 UTC (permalink / raw)
  To: Jesper Juhl
  Cc: gregkh@linuxfoundation.org, ohering@suse.com,
	linux-kernel@vger.kernel.org, virtualization@lists.osdl.org,
	apw@canonical.com, devel@linuxdriverproject.org
In-Reply-To: <alpine.LNX.2.00.1206220125410.7671@swampdragon.chaosbits.net>



> -----Original Message-----
> From: Jesper Juhl [mailto:jj@chaosbits.net]
> Sent: Thursday, June 21, 2012 7:31 PM
> To: KY Srinivasan
> Cc: gregkh@linuxfoundation.org; linux-kernel@vger.kernel.org;
> devel@linuxdriverproject.org; virtualization@lists.osdl.org; ohering@suse.com;
> apw@canonical.com
> Subject: Re: [PATCH 00/13] drivers: hv: kvp
> 
> On Thu, 21 Jun 2012, K. Y. Srinivasan wrote:
> 
> > This patchset expands the KVP (Key Value Pair) functionality to
> > implement the mechanism to get/set IP addresses in the guest. This
> > functionality is used in Windows Server 2012 to implement VM
> > replication functionality. The way IP configuration information
> > is managed is distro specific. The current implementation supports
> > RedHat way of doing things. We will expand support to other distros
> > incrementally.
> >
> So there is going to be a continuous flow of patches to add support for
> new distros (Arch Linux, Slackware, Linux Mint, SuSE, Debian, etc etc) and
> if different versions of a distro handle things differently then you are
> also going to deal with that? Might be fine, but it just sounds a bit
> scary to me to try to support m different distros, each in n different
> versions from the kernel... Couldn't this somehow be done once and for all
> in a distro neutral way?
> Just asking :-)

Most of the code is not going to be distro specific. Obviously the kernel component
is not distro specific. To the extent that the IP configuration state is managed
differently across distros, the user level component of the KVP code needs to
accommodate that. Even here most of the code is distro independent.

Regards,

K. Y

^ permalink raw reply

* Re: [PATCH 00/13] drivers: hv: kvp
From: Greg KH @ 2012-06-22 13:25 UTC (permalink / raw)
  To: KY Srinivasan
  Cc: apw@canonical.com, devel@linuxdriverproject.org, ohering@suse.com,
	linux-kernel@vger.kernel.org, virtualization@lists.osdl.org
In-Reply-To: <426367E2313C2449837CD2DE46E7EAF9155EC47A@SN2PRD0310MB382.namprd03.prod.outlook.com>

On Fri, Jun 22, 2012 at 01:06:53PM +0000, KY Srinivasan wrote:
> Are you still missing it; do you want me to resend the whole set?

Nope, it showed up a few hours later, thanks.  You really should get
that fixed...

greg k-h

^ permalink raw reply

* RE: [PATCH 00/13] drivers: hv: kvp
From: KY Srinivasan @ 2012-06-22 13:06 UTC (permalink / raw)
  To: Greg KH
  Cc: linux-kernel@vger.kernel.org, devel@linuxdriverproject.org,
	virtualization@lists.osdl.org, ohering@suse.com,
	apw@canonical.com
In-Reply-To: <20120621224737.GA5933@kroah.com>

Are you still missing it; do you want me to resend the whole set?

K. Y

> -----Original Message-----
> From: Greg KH [mailto:gregkh@linuxfoundation.org]
> Sent: Thursday, June 21, 2012 6:48 PM
> To: KY Srinivasan
> Cc: linux-kernel@vger.kernel.org; devel@linuxdriverproject.org;
> virtualization@lists.osdl.org; ohering@suse.com; apw@canonical.com
> Subject: Re: [PATCH 00/13] drivers: hv: kvp
> 
> On Thu, Jun 21, 2012 at 02:30:00PM -0700, K. Y. Srinivasan wrote:
> > This patchset expands the KVP (Key Value Pair) functionality to
> > implement the mechanism to get/set IP addresses in the guest. This
> > functionality is used in Windows Server 2012 to implement VM
> > replication functionality. The way IP configuration information
> > is managed is distro specific. The current implementation supports
> > RedHat way of doing things. We will expand support to other distros
> > incrementally.
> 
> I'm missing 09/13, did it get stuck somewhere?
> 
> 

^ permalink raw reply

* Re: [PATCH 00/13] drivers: hv: kvp
From: Jesper Juhl @ 2012-06-21 23:30 UTC (permalink / raw)
  To: K. Y. Srinivasan
  Cc: gregkh, linux-kernel, devel, virtualization, ohering, apw
In-Reply-To: <1340314200-27078-1-git-send-email-kys@microsoft.com>

On Thu, 21 Jun 2012, K. Y. Srinivasan wrote:

> This patchset expands the KVP (Key Value Pair) functionality to
> implement the mechanism to get/set IP addresses in the guest. This
> functionality is used in Windows Server 2012 to implement VM
> replication functionality. The way IP configuration information
> is managed is distro specific. The current implementation supports
> RedHat way of doing things. We will expand support to other distros
> incrementally. 
> 
So there is going to be a continuous flow of patches to add support for 
new distros (Arch Linux, Slackware, Linux Mint, SuSE, Debian, etc etc) and 
if different versions of a distro handle things differently then you are 
also going to deal with that? Might be fine, but it just sounds a bit 
scary to me to try to support m different distros, each in n different 
versions from the kernel... Couldn't this somehow be done once and for all 
in a distro neutral way?
Just asking :-)

-- 
Jesper Juhl <jj@chaosbits.net>       http://www.chaosbits.net/
Don't top-post http://www.catb.org/jargon/html/T/top-post.html
Plain text mails only, please.

^ permalink raw reply

* Re: [PATCH 00/13] drivers: hv: kvp
From: Greg KH @ 2012-06-21 22:47 UTC (permalink / raw)
  To: K. Y. Srinivasan; +Cc: linux-kernel, devel, virtualization, ohering, apw
In-Reply-To: <1340314200-27078-1-git-send-email-kys@microsoft.com>

On Thu, Jun 21, 2012 at 02:30:00PM -0700, K. Y. Srinivasan wrote:
> This patchset expands the KVP (Key Value Pair) functionality to
> implement the mechanism to get/set IP addresses in the guest. This
> functionality is used in Windows Server 2012 to implement VM
> replication functionality. The way IP configuration information
> is managed is distro specific. The current implementation supports
> RedHat way of doing things. We will expand support to other distros
> incrementally. 

I'm missing 09/13, did it get stuck somewhere?

^ permalink raw reply

* KVM Forum and oVirt Workshop Europe 2012 Save the Date
From: KVM Forum 2012 Program Committee @ 2012-06-21 22:45 UTC (permalink / raw)
  To: announce, kvm, libvir-list, qemu-devel, virtualization

KVM is an industry leading open source hypervisor that provides an ideal
platform for datacenter virtualization, virtual desktop infrastructure,
and cloud computing.  Once again, it's time to bring together the
community of developers and users that define the KVM ecosystem for
our annual technical conference.  We will discuss the current state of
affairs and plan for the future of KVM, its surrounding infrastructure,
and management tools.  We are also excited to announce the oVirt Workshop
will run in parallel with the KVM Forum, bringing in a community focused
on enterprise datacenter virtualization management built on KVM.  So mark
your calendar and join us in advancing KVM.

Once again we are colocated with The Linux Foundation's LinuxCon.
Based on feedback from last year, this time it's LinuxCon Europe!

Conference: November 7 - 9, 2012
Location: Hotel Fira Palace - Barcelona, Spain

Details regarding registration and proposal submission are forthcoming.

thanks,
your KVM Forum 2012 Program Committee

^ permalink raw reply

* [PATCH 13/13] Tools: hv: Implement the KVP verb - KVP_OP_SET_IP_INFO
From: K. Y. Srinivasan @ 2012-06-21 21:31 UTC (permalink / raw)
  To: gregkh, linux-kernel, devel, virtualization, ohering, apw
In-Reply-To: <1340314305-27126-1-git-send-email-kys@microsoft.com>

Implement the KVP_OP_SET_IP_INFO verb. This operation sets the IP information
on the specified interface.

Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
---
 tools/hv/hv_kvp_daemon.c |  198 ++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 198 insertions(+), 0 deletions(-)

diff --git a/tools/hv/hv_kvp_daemon.c b/tools/hv/hv_kvp_daemon.c
index 569f4d7..8eefdb5 100644
--- a/tools/hv/hv_kvp_daemon.c
+++ b/tools/hv/hv_kvp_daemon.c
@@ -847,6 +847,183 @@ getaddr_done:
 	return error;
 }
 
+int parse_ip_val_buffer(char *in_buf, int *offset, char *out_buf, int out_len)
+{
+	int end;
+	char *x;
+	char *start;
+
+	/*
+	 * in_buf has sequence of characters that are separated by
+	 * the character ';'. The last sequence is terminated by ';'.
+	 */
+	start = in_buf + *offset;
+
+	x = strchr(start, ';');
+	if (x) {
+		*x = 0;
+		if ((x - start) + 2 <= out_len) {
+			strcpy(out_buf, start);
+			strcat(out_buf, "\n");
+			*offset += (x - start) + 1;
+			return 1;
+		}
+	}
+	return 0;
+}
+
+static int kvp_set_ip_info(char *if_name, struct hv_kvp_ipaddr_value *new_val)
+{
+	int error = 0;
+	char if_path[256];
+	char if_tmp_path[256];
+	FILE *file;
+	char addr[INET6_ADDRSTRLEN];
+	int i;
+	char cmd[256];
+	char str[256];
+	int offset;
+
+	/*
+	 * Set the specified info in the appropriate config file
+	 * for the specified interface and flush the settings.
+	 * Currently we support distributions that maintain
+	 * network configuration information in file:
+	 * /etc/sysconfig/network-scripts/ifcfg-ifname
+	 */
+	memset(if_path, 0, sizeof(if_path));
+	strcat(if_path, "/etc/sysconfig/network-scripts/ifcfg-");
+	strcat(if_path, if_name);
+
+	memset(if_tmp_path, 0, sizeof(if_tmp_path));
+	strcat(if_tmp_path, "/var/opt/hyperv/.kvp_ifcfg-");
+	strcat(if_tmp_path, if_name);
+
+	file = fopen(if_path, "r");
+	if (file == NULL) {
+		/*
+		 * This system maintains the interface configuration
+		 * information in a different location. Need to add
+		 * code to deal with that. For now fail the operation.
+		 */
+		return 1;
+	}
+
+	fclose(file);
+	/*
+	 * The host gives us complete configuration information for the
+	 * interface; just overwrite the current info.
+	 * To deal with intermediate failures, first write a temp file.
+	 */
+
+	file = fopen(if_tmp_path, "w");
+
+	if (file == NULL)
+		return 1;
+
+	if (new_val->dhcp_enabled) {
+		error = fputs("BOOTPROTO=dhcp\n", file);
+		if (error == EOF)
+			goto setval_error;
+
+		/*
+		 * We are done!.
+		 */
+		goto setval_done;
+	}
+
+	/*
+	 * Write the configuration for ipaddress, netmask, gateway and
+	 * name servers.
+	 */
+	i = 0;
+	offset = 0;
+	memset(addr, 0, sizeof(addr));
+	memset(str, 0, sizeof(str));
+
+	while (parse_ip_val_buffer((char *)new_val->ip_addr, &offset, addr,
+					(MAX_IP_ADDR_SIZE * 2))) {
+		sprintf(str, "IPADDR_%d=", i++);
+		strcat(str, addr);
+		error = fputs(str, file);
+		if (error == EOF)
+			goto setval_error;
+	}
+
+	i = 0;
+	offset = 0;
+	memset(addr, 0, sizeof(addr));
+	memset(str, 0, sizeof(str));
+
+	while (parse_ip_val_buffer((char *)new_val->sub_net, &offset, addr,
+					(MAX_IP_ADDR_SIZE * 2))) {
+		sprintf(str, "NETMASK_%d=", i++);
+		strcat(str, addr);
+		error = fputs(str, file);
+		if (error == EOF)
+			goto setval_error;
+	}
+
+	i = 0;
+	offset = 0;
+	memset(addr, 0, sizeof(addr));
+	memset(str, 0, sizeof(str));
+
+	while (parse_ip_val_buffer((char *)new_val->gate_way, &offset, addr,
+					(MAX_IP_ADDR_SIZE * 2))) {
+		sprintf(str, "GATEWAY_%d=", i++);
+		strcat(str, addr);
+		error = fputs(str, file);
+		if (error == EOF)
+			goto setval_error;
+	}
+
+	i = 0;
+	offset = 0;
+	memset(addr, 0, sizeof(addr));
+	memset(str, 0, sizeof(str));
+
+	while (parse_ip_val_buffer((char *)new_val->dns_addr, &offset, addr,
+					(MAX_IP_ADDR_SIZE * 2))) {
+		sprintf(str, "DNS%d=", i++);
+		strcat(str, addr);
+		error = fputs(str, file);
+		if (error == EOF)
+			goto setval_error;
+	}
+
+	error = fputs("ONBOOT=yes\n", file);
+	if (error == EOF)
+		goto setval_error;
+
+	error = fputs("PEERDNS=yes\n", file);
+	if (error == EOF)
+		goto setval_error;
+
+setval_done:
+	/*
+	 * Now move the config file to the final location.
+	 */
+	fclose(file);
+	memset(cmd, 0, sizeof(cmd));
+	strcat(cmd, "mv ");
+	strcat(cmd, if_tmp_path);
+	strcat(cmd, "  ");
+	strcat(cmd, if_path);
+	system(cmd);
+
+	/*
+	 * Now restart the network to flush the parameters.
+	 */
+	memset(cmd, 0, sizeof(cmd));
+	strcat(cmd, "service network restart");
+	system(cmd);
+	return 0;
+
+setval_error:
+	fclose(file);
+	return error;
+}
 
 static int
 kvp_get_domain_name(char *buffer, int length)
@@ -1048,6 +1225,27 @@ int main(void)
 			free(if_name);
 			break;
 
+		case KVP_OP_SET_IP_INFO:
+			kvp_ip_val = &hv_msg->body.kvp_ip_val;
+			if_name = kvp_get_if_name(
+						(char *)kvp_ip_val->adapter_id,
+						MAX_ADAPTER_ID_SIZE);
+			if (if_name == NULL) {
+				/*
+				 * We could not map the guid to an
+				 * interface name; return error.
+				 */
+				*((int *)(&hv_msg->kvp_hdr.operation)) =
+					HV_ERROR_DEVICE_NOT_CONNECTED;
+				break;
+			}
+			error = kvp_set_ip_info(if_name, kvp_ip_val);
+			if (error)
+				*((int *)(&hv_msg->kvp_hdr.operation)) =
+					HV_ERROR_DEVICE_NOT_CONNECTED;
+
+			free(if_name);
+			break;
 		case KVP_OP_SET:
 			if (kvp_key_add_or_modify(hv_msg->kvp_hdr.pool,
 					hv_msg->body.kvp_set.data.key,
-- 
1.7.4.1
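
A minimal sketch of the ';'-separated parsing that parse_ip_val_buffer() performs, assuming the input is a NUL-terminated string in which every element ends with ';'. The name next_token() is hypothetical. Unlike the patch it does not modify the input or append a newline, and it advances the offset cumulatively so that successive calls walk the whole buffer.

```c
#include <assert.h>
#include <string.h>

/*
 * Copy the next ';'-terminated token from in_buf, starting at *offset,
 * into out_buf and advance *offset past the ';'.
 * Returns 1 on success, 0 when no complete token remains or the token
 * (plus its terminating NUL) does not fit in out_len bytes.
 */
static int next_token(const char *in_buf, size_t *offset,
                      char *out_buf, size_t out_len)
{
    const char *start = in_buf + *offset;
    const char *end = strchr(start, ';');
    size_t n;

    assert(out_len > 0);
    if (end == NULL)
        return 0;

    n = (size_t)(end - start);
    if (n + 1 > out_len)
        return 0;

    memcpy(out_buf, start, n);
    out_buf[n] = '\0';
    *offset += n + 1;   /* cumulative: skip the token and its ';' */
    return 1;
}
```

Passing the real size of the destination (for example sizeof(addr)) as out_len keeps the copy inside the buffer regardless of what the host sends.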

^ permalink raw reply related

* [PATCH 12/13] Tools: hv: Implement the KVP verb - KVP_OP_GET_IP_INFO
From: K. Y. Srinivasan @ 2012-06-21 21:31 UTC (permalink / raw)
  To: gregkh, linux-kernel, devel, virtualization, ohering, apw
  Cc: K. Y. Srinivasan
In-Reply-To: <1340314305-27126-1-git-send-email-kys@microsoft.com>

Now implement the KVP verb - KVP_OP_GET_IP_INFO. This operation retrieves IP
information for the specified interface.

Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
---
 tools/hv/hv_kvp_daemon.c |   95 ++++++++++++++++++++++++++++++++++++++++++++-
 1 files changed, 92 insertions(+), 3 deletions(-)

diff --git a/tools/hv/hv_kvp_daemon.c b/tools/hv/hv_kvp_daemon.c
index dcf67fa..569f4d7 100644
--- a/tools/hv/hv_kvp_daemon.c
+++ b/tools/hv/hv_kvp_daemon.c
@@ -41,6 +41,7 @@
 #include <syslog.h>
 #include <sys/stat.h>
 #include <fcntl.h>
+#include <dirent.h>
 
 /*
  * KVP protocol: The user mode component first registers with the
@@ -490,6 +491,64 @@ done:
 	return;
 }
 
+/*
+ * Retrieve an interface name corresponding to the specified guid.
+ * If there is a match, the function returns a pointer
+ * to the interface name and if not, a NULL is returned.
+ * If a match is found, the caller is responsible for
+ * freeing the memory.
+ */
+
+static char *kvp_get_if_name(char *guid, int guid_len)
+{
+	DIR *dir;
+	struct dirent *entry;
+	FILE    *file;
+	char    *p, *q;
+	char    *if_name = NULL;
+	char    buf[256];
+	char *kvp_net_dir = "/sys/class/net/";
+	char dev_id[100];
+
+	dir = opendir(kvp_net_dir);
+	if (dir == NULL)
+		return NULL;
+
+	memset(dev_id, 0, sizeof(dev_id));
+	strcat(dev_id, kvp_net_dir);
+	q = dev_id + strlen(kvp_net_dir);
+
+	while ((entry = readdir(dir)) != NULL) {
+		/*
+		 * Set the state for the next pass.
+		 */
+		*q = '\0';
+		strcat(dev_id, entry->d_name);
+		strcat(dev_id, "/device/device_id");
+
+		file = fopen(dev_id, "r");
+		if (file == NULL)
+			continue;
+
+		p = fgets(buf, sizeof(buf), file);
+		fclose(file);
+		if (p) {
+			if (!strncmp(p, guid, guid_len)) {
+				/*
+				 * Found the guid match; return the interface
+				 * name. The caller will free the memory.
+				 */
+				if_name = strdup(entry->d_name);
+				break;
+			}
+		}
+	}
+
+	closedir(dir);
+	return if_name;
+}
+
+
 static void kvp_process_ipconfig_file(char *config_file,
 					char *config_buf, int len,
 					int element_size, int offset)
@@ -651,7 +710,7 @@ static int kvp_process_ip_address(void *addrp,
 }
 
 static int
-kvp_get_ip_address(int family, char *if_name, int op,
+kvp_get_ip_info(int family, char *if_name, int op,
 		 void  *out_buffer, int length)
 {
 	struct ifaddrs *ifap;
@@ -858,6 +917,10 @@ int main(void)
 	char	*key_name;
 	int	op;
 	int	pool;
+	char    *if_name;
+	struct hv_kvp_ipaddr_value *kvp_ip_val;
+
+
 
 	daemon(1, 0);
 	openlog("KVP", 0, LOG_USER);
@@ -959,6 +1022,32 @@ int main(void)
 			}
 			continue;
 
+		case KVP_OP_GET_IP_INFO:
+			kvp_ip_val = &hv_msg->body.kvp_ip_val;
+			if_name = kvp_get_if_name(
+						(char *)kvp_ip_val->adapter_id,
+						MAX_ADAPTER_ID_SIZE);
+			if (if_name == NULL) {
+				/*
+				 * We could not map the guid to an
+				 * interface name; return error.
+				 */
+				*((int *)(&hv_msg->kvp_hdr.operation)) =
+					HV_ERROR_DEVICE_NOT_CONNECTED;
+				break;
+			}
+			error = kvp_get_ip_info(
+						0, if_name, KVP_OP_GET_IP_INFO,
+						kvp_ip_val,
+						(MAX_IP_ADDR_SIZE * 2));
+
+			if (error)
+				*((int *)(&hv_msg->kvp_hdr.operation)) =
+					HV_ERROR_DEVICE_NOT_CONNECTED;
+
+			free(if_name);
+			break;
+
 		case KVP_OP_SET:
 			if (kvp_key_add_or_modify(hv_msg->kvp_hdr.pool,
 					hv_msg->body.kvp_set.data.key,
@@ -1026,12 +1115,12 @@ int main(void)
 			strcpy(key_value, lic_version);
 			break;
 		case NetworkAddressIPv4:
-			kvp_get_ip_address(AF_INET, NULL, KVP_OP_ENUMERATE,
+			kvp_get_ip_info(AF_INET, NULL, KVP_OP_ENUMERATE,
 				key_value, HV_KVP_EXCHANGE_MAX_VALUE_SIZE);
 			strcpy(key_name, "NetworkAddressIPv4");
 			break;
 		case NetworkAddressIPv6:
-			kvp_get_ip_address(AF_INET6, NULL, KVP_OP_ENUMERATE,
+			kvp_get_ip_info(AF_INET6, NULL, KVP_OP_ENUMERATE,
 				key_value, HV_KVP_EXCHANGE_MAX_VALUE_SIZE);
 			strcpy(key_name, "NetworkAddressIPv6");
 			break;
-- 
1.7.4.1
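
kvp_get_if_name() builds the sysfs path with strcat() into a 100-byte buffer, while a directory entry name can be longer than that. A minimal sketch of a bounded alternative, assuming the same /sys/class/net/<ifname>/device/device_id layout the patch reads; build_device_id_path() is a hypothetical helper name.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Write "/sys/class/net/<if_name>/device/device_id" into out.
 * Returns 0 on success, -1 if the path would not fit in out_len bytes.
 */
static int build_device_id_path(char *out, size_t out_len, const char *if_name)
{
    int n;

    assert(out != NULL && if_name != NULL);
    n = snprintf(out, out_len, "/sys/class/net/%s/device/device_id", if_name);
    return (n >= 0 && (size_t)n < out_len) ? 0 : -1;
}
```

A caller would skip the directory entry when the helper returns -1 instead of opening a truncated path.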

^ permalink raw reply related

* [PATCH 11/13] Tools: hv: Gather dhcp information
From: K. Y. Srinivasan @ 2012-06-21 21:31 UTC (permalink / raw)
  To: gregkh, linux-kernel, devel, virtualization, ohering, apw
  Cc: K. Y. Srinivasan
In-Reply-To: <1340314305-27126-1-git-send-email-kys@microsoft.com>

Collect information on dhcp setting for the specified interface.

Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
---
 tools/hv/hv_kvp_daemon.c |   33 +++++++++++++++++++++++++++++++++
 1 files changed, 33 insertions(+), 0 deletions(-)

diff --git a/tools/hv/hv_kvp_daemon.c b/tools/hv/hv_kvp_daemon.c
index 68a4cc1..dcf67fa 100644
--- a/tools/hv/hv_kvp_daemon.c
+++ b/tools/hv/hv_kvp_daemon.c
@@ -524,6 +524,10 @@ static void kvp_get_ipconfig_info(char *if_name,
 	char cmd[512];
 	char *dns_config = "/var/opt/hyperv/.kvp_dns_cfg";
 	char *gw_config = "/var/opt/hyperv/.kvp_gw_cfg";
+	char if_cfg_file[256];
+	FILE *file;
+	char *p;
+	char buf[256];
 
 	memset(cmd, 0, 512);
 	strcat(cmd, "cat /etc/resolv.conf 2>/tmp/null | ");
@@ -569,6 +573,35 @@ static void kvp_get_ipconfig_info(char *if_name,
 	kvp_process_ipconfig_file(gw_config, (char *)buffer->gate_way,
 				(MAX_GATEWAY_SIZE * 2), INET6_ADDRSTRLEN,
 				strlen(gw_config));
+
+	/*
+	 * Get the boot protocol for the interface.
+	 * We will get the info from:
+	 * /etc/sysconfig/network-scripts/ifcfg-ethx
+	 * If this info has to come from other
+	 * files, we need to add code to parse
+	 * those files.
+	 */
+	memset(if_cfg_file, 0, sizeof(if_cfg_file));
+	strcat(if_cfg_file, "/etc/sysconfig/network-scripts/ifcfg-");
+	strcat(if_cfg_file, if_name);
+
+	file = fopen(if_cfg_file, "r");
+	if (file == NULL)
+		/*
+		 * Add code to parse other files.
+		 */
+		return;
+
+	while ((p = fgets(buf, sizeof(buf), file)) != NULL) {
+		if (strncmp(p, "BOOTPROTO", 9))
+			continue;
+		if (!strncmp(p, "BOOTPROTO=none", 14))
+			buffer->dhcp_enabled = 0;
+		else
+			buffer->dhcp_enabled = 1;
+	}
+	fclose(file);
 }
 
 
-- 
1.7.4.1
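
A minimal sketch of the BOOTPROTO check, assuming ifcfg lines of the form BOOTPROTO=value with an unquoted, lowercase value. bootproto_dhcp_enabled() is a hypothetical helper; unlike the patch, which treats every value other than "none" as DHCP, this sketch reports DHCP only for the literal value "dhcp".

```c
#include <assert.h>
#include <string.h>

/*
 * Classify one line of an ifcfg file.
 * Returns  1 if the line is BOOTPROTO=dhcp,
 *          0 if it is BOOTPROTO with any other value,
 *         -1 if it is not a BOOTPROTO line at all.
 */
static int bootproto_dhcp_enabled(const char *line)
{
    static const char key[] = "BOOTPROTO=";
    const char *val;
    size_t len;

    assert(line != NULL);
    if (strncmp(line, key, sizeof(key) - 1) != 0)
        return -1;

    val = line + sizeof(key) - 1;
    len = strcspn(val, "\r\n");   /* value length without the line ending */
    return (len == 4 && strncmp(val, "dhcp", 4) == 0) ? 1 : 0;
}
```

The three-way return value lets the caller leave dhcp_enabled untouched for lines that are unrelated to the boot protocol.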

^ permalink raw reply related

* [PATCH 10/13] Tools: hv: Gather ipv[4,6] gateway information
From: K. Y. Srinivasan @ 2012-06-21 21:31 UTC (permalink / raw)
  To: gregkh, linux-kernel, devel, virtualization, ohering, apw
  Cc: K. Y. Srinivasan
In-Reply-To: <1340314305-27126-1-git-send-email-kys@microsoft.com>

Collect ipv4 and ipv6 gateway information.

Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
---
 tools/hv/hv_kvp_daemon.c |   80 ++++++++++++++++++++++++++++++++++------------
 1 files changed, 59 insertions(+), 21 deletions(-)

diff --git a/tools/hv/hv_kvp_daemon.c b/tools/hv/hv_kvp_daemon.c
index 101ab7f..68a4cc1 100644
--- a/tools/hv/hv_kvp_daemon.c
+++ b/tools/hv/hv_kvp_daemon.c
@@ -490,16 +490,40 @@ done:
 	return;
 }
 
+static void kvp_process_ipconfig_file(char *config_file,
+					char *config_buf, int len,
+					int element_size, int offset)
+{
+	char buf[256];
+	char *p;
+	char *x;
+	FILE *file;
+
+	file = fopen(config_file, "r");
+	if (file == NULL)
+		return;
+
+	if (offset == 0)
+		memset(config_buf, 0, len);
+	while ((p = fgets(buf, sizeof(buf), file)) != NULL) {
+		if ((len - strlen(config_buf)) < (element_size + 1))
+			break;
+
+		x = p + strcspn(p, "\n");
+		*x = '\0';
+		strcat(config_buf, p);
+		strcat(config_buf, ";");
+	}
+	fclose(file);
+
+}
+
 static void kvp_get_ipconfig_info(char *if_name,
 				 struct hv_kvp_ipaddr_value *buffer)
 {
 	char cmd[512];
 	char *dns_config = "/var/opt/hyperv/.kvp_dns_cfg";
-	FILE *file;
-	char buf[256];
-	char *dns_buf;
-	char *p;
-	char *x;
+	char *gw_config = "/var/opt/hyperv/.kvp_gw_cfg";
 
 	memset(cmd, 0, 512);
 	strcat(cmd, "cat /etc/resolv.conf 2>/tmp/null | ");
@@ -508,29 +532,43 @@ static void kvp_get_ipconfig_info(char *if_name,
 
 	/*
 	 * Execute the command to gather dns info.
+	 * Note that all buffers are arrays of __u16.
 	 */
 	system(cmd);
-	file = fopen(dns_config, "r");
-	if (file == NULL)
-		return;
+	kvp_process_ipconfig_file(dns_config, (char *)buffer->dns_addr,
+				(MAX_IP_ADDR_SIZE * 2), INET6_ADDRSTRLEN, 0);
 
-	dns_buf = (char *)(buffer->dns_addr);
 	/*
-	 * The buffer is an array of __u16 elements.
+	 * Get the address of default gateway (ipv4).
 	 */
-	memset(dns_buf, 0, (MAX_IP_ADDR_SIZE * 2));
-	while ((p = fgets(buf, sizeof(buf), file)) != NULL) {
-		if (((MAX_IP_ADDR_SIZE * 2) - strlen(dns_buf)) <
-				(INET6_ADDRSTRLEN + 1))
-			break;
+	memset(cmd, 0, 512);
+	strcat(cmd, "/sbin/ip -f inet  route | grep -w ");
+	strcat(cmd, if_name);
+	strcat(cmd, " | awk '/default/ {print $3 }' > ");
+	strcat(cmd, "/var/opt/hyperv/.kvp_gw_cfg");
 
-		x = strchr(p, '\n');
-		*x = '\0';
-		strcat(dns_buf, p);
-		strcat(dns_buf, ";");
-	}
-	fclose(file);
+	/*
+	 * Execute the command to gather gateway info.
+	 */
+	system(cmd);
+	kvp_process_ipconfig_file(gw_config, (char *)buffer->gate_way,
+				(MAX_GATEWAY_SIZE * 2), INET_ADDRSTRLEN, 0);
+	/*
+	 * Get the address of default gateway (ipv6).
+	 */
+	memset(cmd, 0, 512);
+	strcat(cmd, "/sbin/ip -f inet6  route | grep -w ");
+	strcat(cmd, if_name);
+	strcat(cmd, " | awk '/default/ {print $3 }' > ");
+	strcat(cmd, "/var/opt/hyperv/.kvp_gw_cfg");
 
+	/*
+	 * Execute the command to gather gateway info (ipv6).
+	 */
+	system(cmd);
+	kvp_process_ipconfig_file(gw_config, (char *)buffer->gate_way,
+				(MAX_GATEWAY_SIZE * 2), INET6_ADDRSTRLEN,
+				strlen(gw_config));
 }
 
 
-- 
1.7.4.1
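
A minimal sketch of the accumulation step in kvp_process_ipconfig_file(): strip the line ending from each line and append it to a ';'-separated buffer, refusing elements that do not fit. append_element() is a hypothetical name, and the sketch checks the actual element length instead of a fixed element_size.

```c
#include <assert.h>
#include <string.h>

/*
 * Append one line (without its trailing newline) plus ';' to the
 * NUL-terminated string in buf. Returns 0 on success, -1 if the element
 * would not fit in buf_len bytes; buf is left unchanged in that case.
 */
static int append_element(char *buf, size_t buf_len, const char *line)
{
    size_t used, n;

    assert(buf != NULL && line != NULL);
    used = strlen(buf);
    n = strcspn(line, "\r\n");       /* length without the line ending */

    if (used + n + 2 > buf_len)      /* element + ';' + NUL */
        return -1;

    memcpy(buf + used, line, n);
    buf[used + n] = ';';
    buf[used + n + 1] = '\0';
    return 0;
}
```

Because the helper works on the line length returned by strcspn(), a final line without a newline is handled the same way as any other line.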

^ permalink raw reply related

* [PATCH 09/13] Tools: hv: Gather DNS information
From: K. Y. Srinivasan @ 2012-06-21 21:31 UTC (permalink / raw)
  To: gregkh, linux-kernel, devel, virtualization, ohering, apw
  Cc: K. Y. Srinivasan
In-Reply-To: <1340314305-27126-1-git-send-email-kys@microsoft.com>

Now gather DNS information.

Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
---
 tools/hv/hv_kvp_daemon.c |   50 ++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 50 insertions(+), 0 deletions(-)

diff --git a/tools/hv/hv_kvp_daemon.c b/tools/hv/hv_kvp_daemon.c
index 8a153c5..101ab7f 100644
--- a/tools/hv/hv_kvp_daemon.c
+++ b/tools/hv/hv_kvp_daemon.c
@@ -490,6 +490,50 @@ done:
 	return;
 }
 
+static void kvp_get_ipconfig_info(char *if_name,
+				 struct hv_kvp_ipaddr_value *buffer)
+{
+	char cmd[512];
+	char *dns_config = "/var/opt/hyperv/.kvp_dns_cfg";
+	FILE *file;
+	char buf[256];
+	char *dns_buf;
+	char *p;
+	char *x;
+
+	memset(cmd, 0, 512);
+	strcat(cmd, "cat /etc/resolv.conf 2>/tmp/null | ");
+	strcat(cmd, " awk '/nameserver/ {print $2 }' > ");
+	strcat(cmd, "/var/opt/hyperv/.kvp_dns_cfg");
+
+	/*
+	 * Execute the command to gather dns info.
+	 */
+	system(cmd);
+	file = fopen(dns_config, "r");
+	if (file == NULL)
+		return;
+
+	dns_buf = (char *)(buffer->dns_addr);
+	/*
+	 * The buffer is an array of __u16 elements.
+	 */
+	memset(dns_buf, 0, (MAX_IP_ADDR_SIZE * 2));
+	while ((p = fgets(buf, sizeof(buf), file)) != NULL) {
+		if (((MAX_IP_ADDR_SIZE * 2) - strlen(dns_buf)) <
+				(INET6_ADDRSTRLEN + 1))
+			break;
+
+		x = strchr(p, '\n');
+		*x = '\0';
+		strcat(dns_buf, p);
+		strcat(dns_buf, ";");
+	}
+	fclose(file);
+
+}
+
+
 static unsigned int hweight32(unsigned int *w)
 {
 	unsigned int res = *w - ((*w >> 1) & 0x55555555);
@@ -649,6 +693,12 @@ kvp_get_ip_address(int family, char *if_name, int op,
 				strcat((char *)ip_buffer->sub_net, ";");
 				sn_offset += strlen(sn_str) + 1;
 			}
+
+			/*
+			 * Collect other ip related configuration info.
+			 */
+
+			kvp_get_ipconfig_info(if_name, ip_buffer);
 		}
 
 gather_ipaddr:
-- 
1.7.4.1

^ permalink raw reply related

* [PATCH 08/13] Tools: hv: Represent the ipv6 mask using CIDR notation
From: K. Y. Srinivasan @ 2012-06-21 21:31 UTC (permalink / raw)
  To: gregkh, linux-kernel, devel, virtualization, ohering, apw
  Cc: K. Y. Srinivasan
In-Reply-To: <1340314305-27126-1-git-send-email-kys@microsoft.com>

Transform ipv6 subnet information to CIDR notation.

Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
---
 tools/hv/hv_kvp_daemon.c |   44 +++++++++++++++++++++++++++++++++++---------
 1 files changed, 35 insertions(+), 9 deletions(-)

diff --git a/tools/hv/hv_kvp_daemon.c b/tools/hv/hv_kvp_daemon.c
index 2160462..8a153c5 100644
--- a/tools/hv/hv_kvp_daemon.c
+++ b/tools/hv/hv_kvp_daemon.c
@@ -490,6 +490,15 @@ done:
 	return;
 }
 
+static unsigned int hweight32(unsigned int *w)
+{
+	unsigned int res = *w - ((*w >> 1) & 0x55555555);
+	res = (res & 0x33333333) + ((res >> 2) & 0x33333333);
+	res = (res + (res >> 4)) & 0x0F0F0F0F;
+	res = res + (res >> 8);
+	return (res + (res >> 16)) & 0x000000FF;
+}
+
 static int kvp_process_ip_address(void *addrp,
 				int family, char *buffer,
 				int length,  int *offset)
@@ -538,6 +547,12 @@ kvp_get_ip_address(int family, char *if_name, int op,
 	int error = 0;
 	char *buffer;
 	struct hv_kvp_ipaddr_value *ip_buffer;
+	char cidr_mask[5]; /* /xyz */
+	int weight;
+	int i;
+	int *w;
+	char *sn_str;
+	struct sockaddr_in6 *addr6;
 
 	if (op == KVP_OP_ENUMERATE) {
 		buffer = out_buffer;
@@ -611,17 +626,28 @@ kvp_get_ip_address(int family, char *if_name, int op,
 			} else {
 				ip_buffer->addr_family |= ADDR_FAMILY_IPV6;
 				/*
-				 * Get subnet info.
+				 * Get subnet info in CIDR format.
 				 */
-				error = kvp_process_ip_address(
-							curp->ifa_netmask,
-							AF_INET6,
-							(char *)
-							ip_buffer->sub_net,
-							length,
-							&sn_offset);
-				if (error)
+				weight = 0;
+				sn_str = (char *)ip_buffer->sub_net;
+				addr6 = (struct sockaddr_in6 *)
+					curp->ifa_netmask;
+				w = addr6->sin6_addr.s6_addr32;
+
+				for (i = 0; i < 4; i++)
+					weight += hweight32(&w[i]);
+
+				sprintf(cidr_mask, "/%d", weight);
+				if ((length - sn_offset) <
+					(strlen(cidr_mask) + 1))
 					goto gather_ipaddr;
+
+				if (sn_offset == 0)
+					strcpy(sn_str, cidr_mask);
+				else
+					strcat(sn_str, cidr_mask);
+				strcat((char *)ip_buffer->sub_net, ";");
+				sn_offset += strlen(cidr_mask) + 1;
 			}
 		}
 
-- 
1.7.4.1
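
A minimal sketch of the netmask-to-CIDR conversion, assuming a contiguous IPv6 netmask held in a struct in6_addr. It counts set bits byte by byte through the portable s6_addr member instead of the 32-bit words used in the patch; ipv6_prefix_len() and format_cidr() are hypothetical names.

```c
#include <assert.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>

/* Count the set bits in one byte. */
static int popcount8(unsigned char b)
{
    int n = 0;

    while (b) {
        b &= (unsigned char)(b - 1);   /* clear the lowest set bit */
        n++;
    }
    return n;
}

/* Prefix length of a netmask: the number of set bits in all 16 bytes. */
static int ipv6_prefix_len(const struct in6_addr *mask)
{
    int bits = 0;
    size_t i;

    assert(mask != NULL);
    for (i = 0; i < sizeof(mask->s6_addr); i++)
        bits += popcount8(mask->s6_addr[i]);
    return bits;
}

/* Format the mask as "/<prefix>"; returns 0 on success, -1 if truncated. */
static int format_cidr(const struct in6_addr *mask, char *out, size_t out_len)
{
    int n = snprintf(out, out_len, "/%d", ipv6_prefix_len(mask));

    return (n > 0 && (size_t)n < out_len) ? 0 : -1;
}
```

The longest possible result is "/128", so a 5-byte buffer such as cidr_mask in the patch is just large enough.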

^ permalink raw reply related

* [PATCH 07/13] Tools: hv: Gather subnet information
From: K. Y. Srinivasan @ 2012-06-21 21:31 UTC (permalink / raw)
  To: gregkh, linux-kernel, devel, virtualization, ohering, apw
  Cc: K. Y. Srinivasan
In-Reply-To: <1340314305-27126-1-git-send-email-kys@microsoft.com>

Now gather sub-net information for the specified interface.

Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
---
 tools/hv/hv_kvp_daemon.c |   31 +++++++++++++++++++++++++++++--
 1 files changed, 29 insertions(+), 2 deletions(-)

diff --git a/tools/hv/hv_kvp_daemon.c b/tools/hv/hv_kvp_daemon.c
index b5db1b5..2160462 100644
--- a/tools/hv/hv_kvp_daemon.c
+++ b/tools/hv/hv_kvp_daemon.c
@@ -533,6 +533,7 @@ kvp_get_ip_address(int family, char *if_name, int op,
 	struct ifaddrs *ifap;
 	struct ifaddrs *curp;
 	int offset = 0;
+	int sn_offset = 0;
 	const char *str;
 	int error = 0;
 	char *buffer;
@@ -593,12 +594,38 @@ kvp_get_ip_address(int family, char *if_name, int op,
 			 * Gather info other than the IP address.
 			 * IP address info will be gathered later.
 			 */
-			if (curp->ifa_addr->sa_family == AF_INET)
+			if (curp->ifa_addr->sa_family == AF_INET) {
 				ip_buffer->addr_family |= ADDR_FAMILY_IPV4;
-			else
+				/*
+				 * Get subnet info.
+				 */
+				error = kvp_process_ip_address(
+							curp->ifa_netmask,
+							AF_INET,
+							(char *)
+							ip_buffer->sub_net,
+							length,
+							&sn_offset);
+				if (error)
+					goto gather_ipaddr;
+			} else {
 				ip_buffer->addr_family |= ADDR_FAMILY_IPV6;
+				/*
+				 * Get subnet info.
+				 */
+				error = kvp_process_ip_address(
+							curp->ifa_netmask,
+							AF_INET6,
+							(char *)
+							ip_buffer->sub_net,
+							length,
+							&sn_offset);
+				if (error)
+					goto gather_ipaddr;
+			}
 		}
 
+gather_ipaddr:
 		error = kvp_process_ip_address(curp->ifa_addr,
 						curp->ifa_addr->sa_family,
 						buffer,
-- 
1.7.4.1

^ permalink raw reply related

* [PATCH 06/13] Tools: hv: Gather address family information
From: K. Y. Srinivasan @ 2012-06-21 21:31 UTC (permalink / raw)
  To: gregkh, linux-kernel, devel, virtualization, ohering, apw
In-Reply-To: <1340314305-27126-1-git-send-email-kys@microsoft.com>

Now, gather address family information for the specified interface.

Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
---
 tools/hv/hv_kvp_daemon.c |   11 +++++++++++
 1 files changed, 11 insertions(+), 0 deletions(-)

diff --git a/tools/hv/hv_kvp_daemon.c b/tools/hv/hv_kvp_daemon.c
index 2235855..b5db1b5 100644
--- a/tools/hv/hv_kvp_daemon.c
+++ b/tools/hv/hv_kvp_daemon.c
@@ -588,6 +588,17 @@ kvp_get_ip_address(int family, char *if_name, int op,
 			continue;
 		}
 
+		if (op == KVP_OP_GET_IP_INFO) {
+			/*
+			 * Gather info other than the IP address.
+			 * IP address info will be gathered later.
+			 */
+			if (curp->ifa_addr->sa_family == AF_INET)
+				ip_buffer->addr_family |= ADDR_FAMILY_IPV4;
+			else
+				ip_buffer->addr_family |= ADDR_FAMILY_IPV6;
+		}
+
 		error = kvp_process_ip_address(curp->ifa_addr,
 						curp->ifa_addr->sa_family,
 						buffer,
-- 
1.7.4.1


* [PATCH 05/13] Tools: hv: Further refactor kvp_get_ip_address()
From: K. Y. Srinivasan @ 2012-06-21 21:31 UTC (permalink / raw)
  To: gregkh, linux-kernel, devel, virtualization, ohering, apw
In-Reply-To: <1340314305-27126-1-git-send-email-kys@microsoft.com>

In preparation for making kvp_get_ip_address() more generic, factor out
the code for handling IP addresses.

Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
---
 tools/hv/hv_kvp_daemon.c |   94 ++++++++++++++++++++-------------------------
 1 files changed, 42 insertions(+), 52 deletions(-)

diff --git a/tools/hv/hv_kvp_daemon.c b/tools/hv/hv_kvp_daemon.c
index 4fa6f95..2235855 100644
--- a/tools/hv/hv_kvp_daemon.c
+++ b/tools/hv/hv_kvp_daemon.c
@@ -490,17 +490,50 @@ done:
 	return;
 }
 
+static int kvp_process_ip_address(void *addrp,
+				int family, char *buffer,
+				int length,  int *offset)
+{
+	struct sockaddr_in *addr;
+	struct sockaddr_in6 *addr6;
+	int addr_length;
+	char tmp[50];
+	const char *str;
+
+	if (family == AF_INET) {
+		addr = (struct sockaddr_in *)addrp;
+		str = inet_ntop(family, &addr->sin_addr, tmp, 50);
+		addr_length = INET_ADDRSTRLEN;
+	} else {
+		addr6 = (struct sockaddr_in6 *)addrp;
+		str = inet_ntop(family, &addr6->sin6_addr.s6_addr, tmp, 50);
+		addr_length = INET6_ADDRSTRLEN;
+	}
+
+	if ((length - *offset) < addr_length + 1)
+		return 1;
+	if (str == NULL) {
+		strcpy(buffer, "inet_ntop failed\n");
+		return 1;
+	}
+	if (*offset == 0)
+		strcpy(buffer, tmp);
+	else
+		strcat(buffer, tmp);
+	strcat(buffer, ";");
+
+	*offset += strlen(str) + 1;
+	return 0;
+}
+
 static int
 kvp_get_ip_address(int family, char *if_name, int op,
 		 void  *out_buffer, int length)
 {
 	struct ifaddrs *ifap;
 	struct ifaddrs *curp;
-	int ipv4_len = strlen("255.255.255.255") + 1;
-	int ipv6_len = strlen("ffff:ffff:ffff:ffff:ffff:ffff:ffff:ffff")+1;
 	int offset = 0;
 	const char *str;
-	char tmp[50];
 	int error = 0;
 	char *buffer;
 	struct hv_kvp_ipaddr_value *ip_buffer;
@@ -555,55 +588,12 @@ kvp_get_ip_address(int family, char *if_name, int op,
 			continue;
 		}
 
-		if ((curp->ifa_addr->sa_family == AF_INET) &&
-			((family == AF_INET) || (family == 0))) {
-			struct sockaddr_in *addr =
-			(struct sockaddr_in *) curp->ifa_addr;
-
-			str = inet_ntop(AF_INET, &addr->sin_addr, tmp, 50);
-			if (str == NULL) {
-				strcpy(buffer, "inet_ntop failed\n");
-				error = 1;
-				goto getaddr_done;
-			}
-			if (offset == 0)
-				strcpy(buffer, tmp);
-			else
-				strcat(buffer, tmp);
-			strcat(buffer, ";");
-
-			offset += strlen(str) + 1;
-			if ((length - offset) < (ipv4_len + 1))
-				goto getaddr_done;
-
-		} else if ((family == AF_INET6) || (family == 0)) {
-
-			/*
-			 * We only support AF_INET and AF_INET6
-			 * and the list of addresses is separated by a ";".
-			 */
-			struct sockaddr_in6 *addr =
-				(struct sockaddr_in6 *) curp->ifa_addr;
-
-			str = inet_ntop(AF_INET6,
-					&addr->sin6_addr.s6_addr,
-					tmp, 50);
-			if (str == NULL) {
-				strcpy(buffer, "inet_ntop failed\n");
-				error = 1;
-				goto getaddr_done;
-			}
-			if (offset == 0)
-				strcpy(buffer, tmp);
-			else
-				strcat(buffer, tmp);
-			strcat(buffer, ";");
-			offset += strlen(str) + 1;
-			if ((length - offset) < (ipv6_len + 1))
-				goto getaddr_done;
-
-		}
-
+		error = kvp_process_ip_address(curp->ifa_addr,
+						curp->ifa_addr->sa_family,
+						buffer,
+						length, &offset);
+		if (error)
+			goto getaddr_done;
 
 		curp = curp->ifa_next;
 	}
-- 
1.7.4.1


* [PATCH 04/13] Tools: hv: Prepare to expand kvp_get_ip_address() functionality
From: K. Y. Srinivasan @ 2012-06-21 21:31 UTC (permalink / raw)
  To: gregkh, linux-kernel, devel, virtualization, ohering, apw
  Cc: K. Y. Srinivasan
In-Reply-To: <1340314305-27126-1-git-send-email-kys@microsoft.com>

kvp_get_ip_address() currently retrieves only IP address info.
Make this function more generic so that it can retrieve additional
per-interface information.

Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
---
 tools/hv/hv_kvp_daemon.c |  129 ++++++++++++++++++++++++++++++----------------
 1 files changed, 84 insertions(+), 45 deletions(-)

diff --git a/tools/hv/hv_kvp_daemon.c b/tools/hv/hv_kvp_daemon.c
index 4831997..4fa6f95 100644
--- a/tools/hv/hv_kvp_daemon.c
+++ b/tools/hv/hv_kvp_daemon.c
@@ -491,7 +491,8 @@ done:
 }
 
 static int
-kvp_get_ip_address(int family, char *buffer, int length)
+kvp_get_ip_address(int family, char *if_name, int op,
+		 void  *out_buffer, int length)
 {
 	struct ifaddrs *ifap;
 	struct ifaddrs *curp;
@@ -501,10 +502,19 @@ kvp_get_ip_address(int family, char *buffer, int length)
 	const char *str;
 	char tmp[50];
 	int error = 0;
-
+	char *buffer;
+	struct hv_kvp_ipaddr_value *ip_buffer;
+
+	if (op == KVP_OP_ENUMERATE) {
+		buffer = out_buffer;
+	} else {
+		ip_buffer = out_buffer;
+		buffer = (char *)ip_buffer->ip_addr;
+		ip_buffer->addr_family = 0;
+	}
 	/*
 	 * On entry into this function, the buffer is capable of holding the
-	 * maximum key value (2048 bytes).
+	 * maximum key value.
 	 */
 
 	if (getifaddrs(&ifap)) {
@@ -514,58 +524,87 @@ kvp_get_ip_address(int family, char *buffer, int length)
 
 	curp = ifap;
 	while (curp != NULL) {
-		if ((curp->ifa_addr != NULL) &&
-		   (curp->ifa_addr->sa_family == family)) {
-			if (family == AF_INET) {
-				struct sockaddr_in *addr =
-				(struct sockaddr_in *) curp->ifa_addr;
-
-				str = inet_ntop(family, &addr->sin_addr,
-						tmp, 50);
-				if (str == NULL) {
-					strcpy(buffer, "inet_ntop failed\n");
-					error = 1;
-					goto getaddr_done;
-				}
-				if (offset == 0)
-					strcpy(buffer, tmp);
-				else
-					strcat(buffer, tmp);
-				strcat(buffer, ";");
+		if (curp->ifa_addr == NULL) {
+			curp = curp->ifa_next;
+			continue;
+		}
 
-				offset += strlen(str) + 1;
-				if ((length - offset) < (ipv4_len + 1))
-					goto getaddr_done;
+		if ((if_name != NULL) &&
+			(strncmp(curp->ifa_name, if_name, strlen(if_name)))) {
+			/*
+			 * We want info about a specific interface;
+			 * just continue.
+			 */
+			curp = curp->ifa_next;
+			continue;
+		}
 
-			} else {
+		/*
+		 * We only support two address families: AF_INET and AF_INET6.
+		 * If a family value of 0 is specified, we collect both
+		 * supported address families; if not we gather info on
+		 * the specified address family.
+		 */
+		if ((family != 0) && (curp->ifa_addr->sa_family != family)) {
+			curp = curp->ifa_next;
+			continue;
+		}
+		if ((curp->ifa_addr->sa_family != AF_INET) &&
+			(curp->ifa_addr->sa_family != AF_INET6)) {
+			curp = curp->ifa_next;
+			continue;
+		}
+
+		if ((curp->ifa_addr->sa_family == AF_INET) &&
+			((family == AF_INET) || (family == 0))) {
+			struct sockaddr_in *addr =
+			(struct sockaddr_in *) curp->ifa_addr;
+
+			str = inet_ntop(AF_INET, &addr->sin_addr, tmp, 50);
+			if (str == NULL) {
+				strcpy(buffer, "inet_ntop failed\n");
+				error = 1;
+				goto getaddr_done;
+			}
+			if (offset == 0)
+				strcpy(buffer, tmp);
+			else
+				strcat(buffer, tmp);
+			strcat(buffer, ";");
+
+			offset += strlen(str) + 1;
+			if ((length - offset) < (ipv4_len + 1))
+				goto getaddr_done;
+
+		} else if ((family == AF_INET6) || (family == 0)) {
 
 			/*
 			 * We only support AF_INET and AF_INET6
 			 * and the list of addresses is separated by a ";".
 			 */
-				struct sockaddr_in6 *addr =
+			struct sockaddr_in6 *addr =
 				(struct sockaddr_in6 *) curp->ifa_addr;
 
-				str = inet_ntop(family,
+			str = inet_ntop(AF_INET6,
 					&addr->sin6_addr.s6_addr,
 					tmp, 50);
-				if (str == NULL) {
-					strcpy(buffer, "inet_ntop failed\n");
-					error = 1;
-					goto getaddr_done;
-				}
-				if (offset == 0)
-					strcpy(buffer, tmp);
-				else
-					strcat(buffer, tmp);
-				strcat(buffer, ";");
-				offset += strlen(str) + 1;
-				if ((length - offset) < (ipv6_len + 1))
-					goto getaddr_done;
-
+			if (str == NULL) {
+				strcpy(buffer, "inet_ntop failed\n");
+				error = 1;
+				goto getaddr_done;
 			}
+			if (offset == 0)
+				strcpy(buffer, tmp);
+			else
+				strcat(buffer, tmp);
+			strcat(buffer, ";");
+			offset += strlen(str) + 1;
+			if ((length - offset) < (ipv6_len + 1))
+				goto getaddr_done;
 
 		}
+
+
 		curp = curp->ifa_next;
 	}
 
@@ -812,13 +851,13 @@ int main(void)
 			strcpy(key_value, lic_version);
 			break;
 		case NetworkAddressIPv4:
-			kvp_get_ip_address(AF_INET, key_value,
-					HV_KVP_EXCHANGE_MAX_VALUE_SIZE);
+			kvp_get_ip_address(AF_INET, NULL, KVP_OP_ENUMERATE,
+				key_value, HV_KVP_EXCHANGE_MAX_VALUE_SIZE);
 			strcpy(key_name, "NetworkAddressIPv4");
 			break;
 		case NetworkAddressIPv6:
-			kvp_get_ip_address(AF_INET6, key_value,
-					HV_KVP_EXCHANGE_MAX_VALUE_SIZE);
+			kvp_get_ip_address(AF_INET6, NULL, KVP_OP_ENUMERATE,
+				key_value, HV_KVP_EXCHANGE_MAX_VALUE_SIZE);
 			strcpy(key_name, "NetworkAddressIPv6");
 			break;
 		case OSBuildNumber:
-- 
1.7.4.1


