All of lore.kernel.org
 help / color / mirror / Atom feed
* [net-next PATCH 0/8] Multiqueue API for macvtap
@ 2013-06-06  9:54 Jason Wang
  2013-06-06  9:54 ` [net-next PATCH 1/8] macvtap: fix a possible race between queue selection and changing queues Jason Wang
                   ` (8 more replies)
  0 siblings, 9 replies; 15+ messages in thread
From: Jason Wang @ 2013-06-06  9:54 UTC (permalink / raw)
  To: davem, mst, netdev, linux-kernel; +Cc: sergei.shtylyov, Jason Wang

Hi all:

This series implements a fully tuntap compatiable API which could be used
by userspace to manage multiple macvtap queues. The main parts is to add
TUNSETQUEUE ioctl support for macvtap.

Patch 1 - 5 was some tuntap compatibility and misc cleanups.
Patch 6 removes the linear search in macvtap by reshuffling the macvtaps array
each time a queue is removed.
Patch 7 implement TUNSETQUEUE ioctl by introducing a list to track all queues
while use the array to just track all enabled queues.
Patch 8 reports IFF_MULTI_QUEUE to userspace to notify the userspace that the
multiqueue API is completed.

Flow caches implememtation were missed in this version, since I am doing
rework on the tuntap flow caches. Have some some stress test with both netperf
and pktgen.

Please review, thanks.

Changes from RFC V3:
- drop the TUNSETIFF checks
- disable taps first during macvtap_put_queue() to simplify codes
- typo fixes and better comments from Michael on numvtaps in macvtap_get_queue()
- BUG_ON() and unnecessary checks of numdisabled in macvtap_del_queues()
- drop numdisabled and introduce numqueues to track the number of queues
- better style in macvtap_ioctl_set_queue()
- rename tap_link to queue_list
- blank line fixes again :(

Changes from RFC V2:
- Don't place disabled taps in the array and use a linked like to track all taps
- Remove unnecessary best-effort barriers
- Add ACCESS_ONCE() for vlan->numvtaps to prevent compiler generate unexpected
codes
- Add a comment to explain that numvtaps is just a hint in macvtap_get_queue()
- blank line fixes

Changes from RFC V1:
- Drop the patch of "return -EBADFD when TUNGETIFF fails" to preserve backward
compatibility
- Add a small patch of optimizing macvtap_do_read() (see patch 1/8)
- Use different functions to do tap enabling and creating
- Use different functions to do tap disabling and closing
- Remove the emtpy line in patch 6/8
- Add a line in patch 3/8
- Add a comment to explain the checking when doing TUNSETIFF with
IFF_MULTI_QUEUE
- Add comments to explian the second swapping when moving taps between disabled
areas and enabled areas.
- Fix the commit log of patch 7/8 ( v1 says the linked list were used which is
wrong )
- Other minor bug fixes

Jason Wang (8):
  macvtap: fix a possible race between queue selection and changing
    queues
  macvtap: do not add self to waitqueue if doing a nonblock read
  macvlan: switch to use IS_ENABLED()
  macvtap: introduce macvtap_get_vlan()
  macvlan: change the max number of queues to 16
  macvtap: eliminate linear search
  macvtap: add TUNSETQUEUE ioctl
  macvtap: enable multiqueue flag

 drivers/net/macvtap.c      |  206 +++++++++++++++++++++++++++++++-------------
 include/linux/if_macvlan.h |    8 ++-
 2 files changed, 152 insertions(+), 62 deletions(-)


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [net-next PATCH 1/8] macvtap: fix a possible race between queue selection and changing queues
  2013-06-06  9:54 [net-next PATCH 0/8] Multiqueue API for macvtap Jason Wang
@ 2013-06-06  9:54 ` Jason Wang
  2013-06-06  9:54 ` [net-next PATCH 2/8] macvtap: do not add self to waitqueue if doing a nonblock read Jason Wang
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 15+ messages in thread
From: Jason Wang @ 2013-06-06  9:54 UTC (permalink / raw)
  To: davem, mst, netdev, linux-kernel; +Cc: sergei.shtylyov, Jason Wang

Complier may generate codes that re-read the vlan->numvtaps during
macvtap_get_queue(). This may lead a race if vlan->numvtaps were changed in the
same time and which can lead unexpected result (e.g. very huge value).

We need prevent the compiler from generating such codes by adding an
ACCESS_ONCE() to make sure vlan->numvtaps were only read once.

Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/net/macvtap.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c
index 68efb91..5e485e3 100644
--- a/drivers/net/macvtap.c
+++ b/drivers/net/macvtap.c
@@ -172,7 +172,7 @@ static struct macvtap_queue *macvtap_get_queue(struct net_device *dev,
 {
 	struct macvlan_dev *vlan = netdev_priv(dev);
 	struct macvtap_queue *tap = NULL;
-	int numvtaps = vlan->numvtaps;
+	int numvtaps = ACCESS_ONCE(vlan->numvtaps);
 	__u32 rxq;
 
 	if (!numvtaps)
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [net-next PATCH 2/8] macvtap: do not add self to waitqueue if doing a nonblock read
  2013-06-06  9:54 [net-next PATCH 0/8] Multiqueue API for macvtap Jason Wang
  2013-06-06  9:54 ` [net-next PATCH 1/8] macvtap: fix a possible race between queue selection and changing queues Jason Wang
@ 2013-06-06  9:54 ` Jason Wang
  2013-06-06  9:54 ` [net-next PATCH 3/8] macvlan: switch to use IS_ENABLED() Jason Wang
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 15+ messages in thread
From: Jason Wang @ 2013-06-06  9:54 UTC (permalink / raw)
  To: davem, mst, netdev, linux-kernel; +Cc: sergei.shtylyov, Jason Wang

There's no need to add self to waitqueue if doing a nonblock read. This could
help to avoid the spinlock contention.

Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/net/macvtap.c |    7 +++++--
 1 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c
index 5e485e3..8949631 100644
--- a/drivers/net/macvtap.c
+++ b/drivers/net/macvtap.c
@@ -845,7 +845,9 @@ static ssize_t macvtap_do_read(struct macvtap_queue *q, struct kiocb *iocb,
 	ssize_t ret = 0;
 
 	while (len) {
-		prepare_to_wait(sk_sleep(&q->sk), &wait, TASK_INTERRUPTIBLE);
+		if (!noblock)
+			prepare_to_wait(sk_sleep(&q->sk), &wait,
+					TASK_INTERRUPTIBLE);
 
 		/* Read frames from the queue */
 		skb = skb_dequeue(&q->sk.sk_receive_queue);
@@ -867,7 +869,8 @@ static ssize_t macvtap_do_read(struct macvtap_queue *q, struct kiocb *iocb,
 		break;
 	}
 
-	finish_wait(sk_sleep(&q->sk), &wait);
+	if (!noblock)
+		finish_wait(sk_sleep(&q->sk), &wait);
 	return ret;
 }
 
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [net-next PATCH 3/8] macvlan: switch to use IS_ENABLED()
  2013-06-06  9:54 [net-next PATCH 0/8] Multiqueue API for macvtap Jason Wang
  2013-06-06  9:54 ` [net-next PATCH 1/8] macvtap: fix a possible race between queue selection and changing queues Jason Wang
  2013-06-06  9:54 ` [net-next PATCH 2/8] macvtap: do not add self to waitqueue if doing a nonblock read Jason Wang
@ 2013-06-06  9:54 ` Jason Wang
  2013-06-06  9:54 ` [net-next PATCH 4/8] macvtap: introduce macvtap_get_vlan() Jason Wang
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 15+ messages in thread
From: Jason Wang @ 2013-06-06  9:54 UTC (permalink / raw)
  To: davem, mst, netdev, linux-kernel; +Cc: sergei.shtylyov, Jason Wang

Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 include/linux/if_macvlan.h |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/include/linux/if_macvlan.h b/include/linux/if_macvlan.h
index 84dde1d..e47ad46 100644
--- a/include/linux/if_macvlan.h
+++ b/include/linux/if_macvlan.h
@@ -8,7 +8,7 @@
 #include <net/netlink.h>
 #include <linux/u64_stats_sync.h>
 
-#if defined(CONFIG_MACVTAP) || defined(CONFIG_MACVTAP_MODULE)
+#if IS_ENABLED(CONFIG_MACVTAP)
 struct socket *macvtap_get_socket(struct file *);
 #else
 #include <linux/err.h>
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [net-next PATCH 4/8] macvtap: introduce macvtap_get_vlan()
  2013-06-06  9:54 [net-next PATCH 0/8] Multiqueue API for macvtap Jason Wang
                   ` (2 preceding siblings ...)
  2013-06-06  9:54 ` [net-next PATCH 3/8] macvlan: switch to use IS_ENABLED() Jason Wang
@ 2013-06-06  9:54 ` Jason Wang
  2013-06-06 11:05   ` Michael S. Tsirkin
  2013-06-06  9:54 ` [net-next PATCH 5/8] macvlan: change the max number of queues to 16 Jason Wang
                   ` (4 subsequent siblings)
  8 siblings, 1 reply; 15+ messages in thread
From: Jason Wang @ 2013-06-06  9:54 UTC (permalink / raw)
  To: davem, mst, netdev, linux-kernel; +Cc: sergei.shtylyov, Jason Wang

Factor out the device holding logic to a macvtap_get_vlan(), this will be also
used by multiqueue API.

Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/net/macvtap.c |   27 ++++++++++++++++++++-------
 1 files changed, 20 insertions(+), 7 deletions(-)

diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c
index 8949631..d18130b 100644
--- a/drivers/net/macvtap.c
+++ b/drivers/net/macvtap.c
@@ -893,6 +893,24 @@ out:
 	return ret;
 }
 
+static struct macvlan_dev *macvtap_get_vlan(struct macvtap_queue *q)
+{
+	struct macvlan_dev *vlan;
+
+	rcu_read_lock_bh();
+	vlan = rcu_dereference_bh(q->vlan);
+	if (vlan)
+		dev_hold(vlan->dev);
+	rcu_read_unlock_bh();
+
+	return vlan;
+}
+
+static void macvtap_put_vlan(struct macvlan_dev *vlan)
+{
+	dev_put(vlan->dev);
+}
+
 /*
  * provide compatibility with generic tun/tap interface
  */
@@ -924,12 +942,7 @@ static long macvtap_ioctl(struct file *file, unsigned int cmd,
 		return ret;
 
 	case TUNGETIFF:
-		rcu_read_lock_bh();
-		vlan = rcu_dereference_bh(q->vlan);
-		if (vlan)
-			dev_hold(vlan->dev);
-		rcu_read_unlock_bh();
-
+		vlan = macvtap_get_vlan(q);
 		if (!vlan)
 			return -ENOLINK;
 
@@ -937,7 +950,7 @@ static long macvtap_ioctl(struct file *file, unsigned int cmd,
 		if (copy_to_user(&ifr->ifr_name, vlan->dev->name, IFNAMSIZ) ||
 		    put_user(q->flags, &ifr->ifr_flags))
 			ret = -EFAULT;
-		dev_put(vlan->dev);
+		macvtap_put_vlan(vlan);
 		return ret;
 
 	case TUNGETFEATURES:
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [net-next PATCH 5/8] macvlan: change the max number of queues to 16
  2013-06-06  9:54 [net-next PATCH 0/8] Multiqueue API for macvtap Jason Wang
                   ` (3 preceding siblings ...)
  2013-06-06  9:54 ` [net-next PATCH 4/8] macvtap: introduce macvtap_get_vlan() Jason Wang
@ 2013-06-06  9:54 ` Jason Wang
  2013-06-06 11:05   ` Michael S. Tsirkin
  2013-06-06  9:54 ` [net-next PATCH 6/8] macvtap: eliminate linear search Jason Wang
                   ` (3 subsequent siblings)
  8 siblings, 1 reply; 15+ messages in thread
From: Jason Wang @ 2013-06-06  9:54 UTC (permalink / raw)
  To: davem, mst, netdev, linux-kernel; +Cc: sergei.shtylyov, Jason Wang

Macvtap should be at least compatible with tap, so change the max number to 16.

Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 include/linux/if_macvlan.h |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/include/linux/if_macvlan.h b/include/linux/if_macvlan.h
index e47ad46..62d8bda 100644
--- a/include/linux/if_macvlan.h
+++ b/include/linux/if_macvlan.h
@@ -50,7 +50,7 @@ struct macvlan_pcpu_stats {
  * Maximum times a macvtap device can be opened. This can be used to
  * configure the number of receive queue, e.g. for multiqueue virtio.
  */
-#define MAX_MACVTAP_QUEUES	(NR_CPUS < 16 ? NR_CPUS : 16)
+#define MAX_MACVTAP_QUEUES	16
 
 #define MACVLAN_MC_FILTER_BITS	8
 #define MACVLAN_MC_FILTER_SZ	(1 << MACVLAN_MC_FILTER_BITS)
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [net-next PATCH 6/8] macvtap: eliminate linear search
  2013-06-06  9:54 [net-next PATCH 0/8] Multiqueue API for macvtap Jason Wang
                   ` (4 preceding siblings ...)
  2013-06-06  9:54 ` [net-next PATCH 5/8] macvlan: change the max number of queues to 16 Jason Wang
@ 2013-06-06  9:54 ` Jason Wang
  2013-06-06 11:06   ` Michael S. Tsirkin
  2013-06-06  9:54 ` [net-next PATCH 7/8] macvtap: add TUNSETQUEUE ioctl Jason Wang
                   ` (2 subsequent siblings)
  8 siblings, 1 reply; 15+ messages in thread
From: Jason Wang @ 2013-06-06  9:54 UTC (permalink / raw)
  To: davem, mst, netdev, linux-kernel; +Cc: sergei.shtylyov, Jason Wang

Linear search were used in both get_slot() and macvtap_get_queue(), this is
because:

- macvtap didn't reshuffle the array of taps when create or destroy a queue, so
  when adding a new queue, macvtap must do linear search to find a location for
  the new queue. This will also complicate the TUNSETQUEUE implementation for
  multiqueue API.
- the queue itself didn't track the queue index, so the we must do a linear
  search in the array to find the location of a existed queue.

The solution is straightforward: reshuffle the array and introduce a queue_index
to macvtap_queue.

Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/net/macvtap.c |   64 +++++++++++++++---------------------------------
 1 files changed, 20 insertions(+), 44 deletions(-)

diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c
index d18130b..5ccba99 100644
--- a/drivers/net/macvtap.c
+++ b/drivers/net/macvtap.c
@@ -44,6 +44,7 @@ struct macvtap_queue {
 	struct macvlan_dev __rcu *vlan;
 	struct file *file;
 	unsigned int flags;
+	u16 queue_index;
 };
 
 static struct proto macvtap_proto = {
@@ -84,30 +85,10 @@ static const struct proto_ops macvtap_socket_ops;
  */
 static DEFINE_SPINLOCK(macvtap_lock);
 
-/*
- * get_slot: return a [unused/occupied] slot in vlan->taps[]:
- *	- if 'q' is NULL, return the first empty slot;
- *	- otherwise, return the slot this pointer occupies.
- */
-static int get_slot(struct macvlan_dev *vlan, struct macvtap_queue *q)
-{
-	int i;
-
-	for (i = 0; i < MAX_MACVTAP_QUEUES; i++) {
-		if (rcu_dereference_protected(vlan->taps[i],
-					      lockdep_is_held(&macvtap_lock)) == q)
-			return i;
-	}
-
-	/* Should never happen */
-	BUG_ON(1);
-}
-
 static int macvtap_set_queue(struct net_device *dev, struct file *file,
 				struct macvtap_queue *q)
 {
 	struct macvlan_dev *vlan = netdev_priv(dev);
-	int index;
 	int err = -EBUSY;
 
 	spin_lock(&macvtap_lock);
@@ -115,12 +96,12 @@ static int macvtap_set_queue(struct net_device *dev, struct file *file,
 		goto out;
 
 	err = 0;
-	index = get_slot(vlan, NULL);
 	rcu_assign_pointer(q->vlan, vlan);
-	rcu_assign_pointer(vlan->taps[index], q);
+	rcu_assign_pointer(vlan->taps[vlan->numvtaps], q);
 	sock_hold(&q->sk);
 
 	q->file = file;
+	q->queue_index = vlan->numvtaps;
 	file->private_data = q;
 
 	vlan->numvtaps++;
@@ -140,15 +121,22 @@ out:
  */
 static void macvtap_put_queue(struct macvtap_queue *q)
 {
+	struct macvtap_queue *nq;
 	struct macvlan_dev *vlan;
 
 	spin_lock(&macvtap_lock);
 	vlan = rcu_dereference_protected(q->vlan,
 					 lockdep_is_held(&macvtap_lock));
 	if (vlan) {
-		int index = get_slot(vlan, q);
+		int index = q->queue_index;
+		BUG_ON(index >= vlan->numvtaps);
+
+		nq = rcu_dereference_protected(vlan->taps[vlan->numvtaps - 1],
+					       lockdep_is_held(&macvtap_lock));
+		rcu_assign_pointer(vlan->taps[index], nq);
+		nq->queue_index = index;
 
-		RCU_INIT_POINTER(vlan->taps[index], NULL);
+		RCU_INIT_POINTER(vlan->taps[vlan->numvtaps - 1], NULL);
 		RCU_INIT_POINTER(q->vlan, NULL);
 		sock_put(&q->sk);
 		--vlan->numvtaps;
@@ -182,8 +170,7 @@ static struct macvtap_queue *macvtap_get_queue(struct net_device *dev,
 	rxq = skb_get_rxhash(skb);
 	if (rxq) {
 		tap = rcu_dereference(vlan->taps[rxq % numvtaps]);
-		if (tap)
-			goto out;
+		goto out;
 	}
 
 	if (likely(skb_rx_queue_recorded(skb))) {
@@ -193,17 +180,10 @@ static struct macvtap_queue *macvtap_get_queue(struct net_device *dev,
 			rxq -= numvtaps;
 
 		tap = rcu_dereference(vlan->taps[rxq]);
-		if (tap)
-			goto out;
-	}
-
-	/* Everything failed - find first available queue */
-	for (rxq = 0; rxq < MAX_MACVTAP_QUEUES; rxq++) {
-		tap = rcu_dereference(vlan->taps[rxq]);
-		if (tap)
-			break;
+		goto out;
 	}
 
+	tap = rcu_dereference(vlan->taps[0]);
 out:
 	return tap;
 }
@@ -219,19 +199,15 @@ static void macvtap_del_queues(struct net_device *dev)
 	struct macvtap_queue *q, *qlist[MAX_MACVTAP_QUEUES];
 	int i, j = 0;
 
-	/* macvtap_put_queue can free some slots, so go through all slots */
 	spin_lock(&macvtap_lock);
-	for (i = 0; i < MAX_MACVTAP_QUEUES && vlan->numvtaps; i++) {
+	for (i = 0; i < vlan->numvtaps; i++) {
 		q = rcu_dereference_protected(vlan->taps[i],
 					      lockdep_is_held(&macvtap_lock));
-		if (q) {
-			qlist[j++] = q;
-			RCU_INIT_POINTER(vlan->taps[i], NULL);
-			RCU_INIT_POINTER(q->vlan, NULL);
-			vlan->numvtaps--;
-		}
+		BUG_ON(q == NULL);
+		qlist[j++] = q;
+		RCU_INIT_POINTER(vlan->taps[i], NULL);
+		RCU_INIT_POINTER(q->vlan, NULL);
 	}
-	BUG_ON(vlan->numvtaps != 0);
 	/* guarantee that any future macvtap_set_queue will fail */
 	vlan->numvtaps = MAX_MACVTAP_QUEUES;
 	spin_unlock(&macvtap_lock);
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [net-next PATCH 7/8] macvtap: add TUNSETQUEUE ioctl
  2013-06-06  9:54 [net-next PATCH 0/8] Multiqueue API for macvtap Jason Wang
                   ` (5 preceding siblings ...)
  2013-06-06  9:54 ` [net-next PATCH 6/8] macvtap: eliminate linear search Jason Wang
@ 2013-06-06  9:54 ` Jason Wang
  2013-06-06 11:06   ` Michael S. Tsirkin
  2013-06-06  9:54 ` [net-next PATCH 8/8] macvtap: enable multiqueue flag Jason Wang
  2013-06-08  6:49 ` [net-next PATCH 0/8] Multiqueue API for macvtap David Miller
  8 siblings, 1 reply; 15+ messages in thread
From: Jason Wang @ 2013-06-06  9:54 UTC (permalink / raw)
  To: davem, mst, netdev, linux-kernel; +Cc: sergei.shtylyov, Jason Wang

This patch adds TUNSETQUEUE ioctl to let userspace can temporarily disable or
enable a queue of macvtap. This is used to be compatible at API layer of tuntap
to simplify the userspace to manage the queues. This is done through introducing
a linked list to track all taps while using vlan->taps array to only track
active taps.

Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/net/macvtap.c      |  135 +++++++++++++++++++++++++++++++++++++------
 include/linux/if_macvlan.h |    4 +
 2 files changed, 120 insertions(+), 19 deletions(-)

diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c
index 5ccba99..d2d1d55 100644
--- a/drivers/net/macvtap.c
+++ b/drivers/net/macvtap.c
@@ -45,6 +45,8 @@ struct macvtap_queue {
 	struct file *file;
 	unsigned int flags;
 	u16 queue_index;
+	bool enabled;
+	struct list_head next;
 };
 
 static struct proto macvtap_proto = {
@@ -85,14 +87,36 @@ static const struct proto_ops macvtap_socket_ops;
  */
 static DEFINE_SPINLOCK(macvtap_lock);
 
-static int macvtap_set_queue(struct net_device *dev, struct file *file,
+static int macvtap_enable_queue(struct net_device *dev, struct file *file,
 				struct macvtap_queue *q)
 {
 	struct macvlan_dev *vlan = netdev_priv(dev);
+	int err = -EINVAL;
+
+	spin_lock(&macvtap_lock);
+
+	if (q->enabled)
+		goto out;
+
+	err = 0;
+	rcu_assign_pointer(vlan->taps[vlan->numvtaps], q);
+	q->queue_index = vlan->numvtaps;
+	q->enabled = true;
+
+	vlan->numvtaps++;
+out:
+	spin_unlock(&macvtap_lock);
+	return err;
+}
+
+static int macvtap_set_queue(struct net_device *dev, struct file *file,
+			     struct macvtap_queue *q)
+{
+	struct macvlan_dev *vlan = netdev_priv(dev);
 	int err = -EBUSY;
 
 	spin_lock(&macvtap_lock);
-	if (vlan->numvtaps == MAX_MACVTAP_QUEUES)
+	if (vlan->numqueues == MAX_MACVTAP_QUEUES)
 		goto out;
 
 	err = 0;
@@ -102,15 +126,57 @@ static int macvtap_set_queue(struct net_device *dev, struct file *file,
 
 	q->file = file;
 	q->queue_index = vlan->numvtaps;
+	q->enabled = true;
 	file->private_data = q;
+	list_add_tail(&q->next, &vlan->queue_list);
 
 	vlan->numvtaps++;
+	vlan->numqueues++;
 
 out:
 	spin_unlock(&macvtap_lock);
 	return err;
 }
 
+static int __macvtap_disable_queue(struct macvtap_queue *q)
+{
+	struct macvlan_dev *vlan;
+	struct macvtap_queue *nq;
+
+	vlan = rcu_dereference_protected(q->vlan,
+					 lockdep_is_held(&macvtap_lock));
+
+	if (!q->enabled)
+		return -EINVAL;
+
+	if (vlan) {
+		int index = q->queue_index;
+		BUG_ON(index >= vlan->numvtaps);
+		nq = rcu_dereference_protected(vlan->taps[vlan->numvtaps - 1],
+					       lockdep_is_held(&macvtap_lock));
+		nq->queue_index = index;
+
+		rcu_assign_pointer(vlan->taps[index], nq);
+		RCU_INIT_POINTER(vlan->taps[vlan->numvtaps - 1], NULL);
+		q->enabled = false;
+
+		vlan->numvtaps--;
+	}
+
+	return 0;
+}
+
+static int macvtap_disable_queue(struct macvtap_queue *q)
+{
+	int err;
+
+	spin_lock(&macvtap_lock);
+	err = __macvtap_disable_queue(q);
+	spin_unlock(&macvtap_lock);
+
+	return err;
+}
+
 /*
  * The file owning the queue got closed, give up both
  * the reference that the files holds as well as the
@@ -121,25 +187,19 @@ out:
  */
 static void macvtap_put_queue(struct macvtap_queue *q)
 {
-	struct macvtap_queue *nq;
 	struct macvlan_dev *vlan;
 
 	spin_lock(&macvtap_lock);
 	vlan = rcu_dereference_protected(q->vlan,
 					 lockdep_is_held(&macvtap_lock));
 	if (vlan) {
-		int index = q->queue_index;
-		BUG_ON(index >= vlan->numvtaps);
-
-		nq = rcu_dereference_protected(vlan->taps[vlan->numvtaps - 1],
-					       lockdep_is_held(&macvtap_lock));
-		rcu_assign_pointer(vlan->taps[index], nq);
-		nq->queue_index = index;
+		if (q->enabled)
+			BUG_ON(__macvtap_disable_queue(q));
 
-		RCU_INIT_POINTER(vlan->taps[vlan->numvtaps - 1], NULL);
+		vlan->numqueues--;
 		RCU_INIT_POINTER(q->vlan, NULL);
 		sock_put(&q->sk);
-		--vlan->numvtaps;
+		list_del_init(&q->next);
 	}
 
 	spin_unlock(&macvtap_lock);
@@ -160,6 +220,11 @@ static struct macvtap_queue *macvtap_get_queue(struct net_device *dev,
 {
 	struct macvlan_dev *vlan = netdev_priv(dev);
 	struct macvtap_queue *tap = NULL;
+	/* Access to taps array is protected by rcu, but access to numvtaps
+	 * isn't. Below we use it to lookup a queue, but treat it as a hint
+	 * and validate that the result isn't NULL - in case we are
+	 * racing against queue removal.
+	 */
 	int numvtaps = ACCESS_ONCE(vlan->numvtaps);
 	__u32 rxq;
 
@@ -196,18 +261,22 @@ out:
 static void macvtap_del_queues(struct net_device *dev)
 {
 	struct macvlan_dev *vlan = netdev_priv(dev);
-	struct macvtap_queue *q, *qlist[MAX_MACVTAP_QUEUES];
+	struct macvtap_queue *q, *tmp, *qlist[MAX_MACVTAP_QUEUES];
 	int i, j = 0;
 
 	spin_lock(&macvtap_lock);
-	for (i = 0; i < vlan->numvtaps; i++) {
-		q = rcu_dereference_protected(vlan->taps[i],
-					      lockdep_is_held(&macvtap_lock));
-		BUG_ON(q == NULL);
+	list_for_each_entry_safe(q, tmp, &vlan->queue_list, next) {
+		list_del_init(&q->next);
 		qlist[j++] = q;
-		RCU_INIT_POINTER(vlan->taps[i], NULL);
 		RCU_INIT_POINTER(q->vlan, NULL);
+		if (q->enabled)
+			vlan->numvtaps--;
+		vlan->numqueues--;
 	}
+	for (i = 0; i < vlan->numvtaps; i++)
+		RCU_INIT_POINTER(vlan->taps[i], NULL);
+	BUG_ON(vlan->numvtaps);
+	BUG_ON(vlan->numqueues);
 	/* guarantee that any future macvtap_set_queue will fail */
 	vlan->numvtaps = MAX_MACVTAP_QUEUES;
 	spin_unlock(&macvtap_lock);
@@ -298,6 +367,9 @@ static int macvtap_newlink(struct net *src_net,
 			   struct nlattr *tb[],
 			   struct nlattr *data[])
 {
+	struct macvlan_dev *vlan = netdev_priv(dev);
+	INIT_LIST_HEAD(&vlan->queue_list);
+
 	/* Don't put anything that may fail after macvlan_common_newlink
 	 * because we can't undo what it does.
 	 */
@@ -887,6 +959,25 @@ static void macvtap_put_vlan(struct macvlan_dev *vlan)
 	dev_put(vlan->dev);
 }
 
+static int macvtap_ioctl_set_queue(struct file *file, unsigned int flags)
+{
+	struct macvtap_queue *q = file->private_data;
+	struct macvlan_dev *vlan;
+	int ret;
+
+	vlan = macvtap_get_vlan(q);
+	if (!vlan)
+		return -EINVAL;
+
+	if (flags & IFF_ATTACH_QUEUE)
+		ret = macvtap_enable_queue(vlan->dev, file, q);
+	else if (flags & IFF_DETACH_QUEUE)
+		ret = macvtap_disable_queue(q);
+
+	macvtap_put_vlan(vlan);
+	return ret;
+}
+
 /*
  * provide compatibility with generic tun/tap interface
  */
@@ -910,7 +1001,8 @@ static long macvtap_ioctl(struct file *file, unsigned int cmd,
 			return -EFAULT;
 
 		ret = 0;
-		if ((u & ~IFF_VNET_HDR) != (IFF_NO_PI | IFF_TAP))
+		if ((u & ~(IFF_VNET_HDR | IFF_MULTI_QUEUE)) !=
+		    (IFF_NO_PI | IFF_TAP))
 			ret = -EINVAL;
 		else
 			q->flags = u;
@@ -929,6 +1021,11 @@ static long macvtap_ioctl(struct file *file, unsigned int cmd,
 		macvtap_put_vlan(vlan);
 		return ret;
 
+	case TUNSETQUEUE:
+		if (get_user(u, &ifr->ifr_flags))
+			return -EFAULT;
+		return macvtap_ioctl_set_queue(file, u);
+
 	case TUNGETFEATURES:
 		if (put_user(IFF_TAP | IFF_NO_PI | IFF_VNET_HDR, up))
 			return -EFAULT;
diff --git a/include/linux/if_macvlan.h b/include/linux/if_macvlan.h
index 62d8bda..1912133 100644
--- a/include/linux/if_macvlan.h
+++ b/include/linux/if_macvlan.h
@@ -69,8 +69,12 @@ struct macvlan_dev {
 	u16			flags;
 	int (*receive)(struct sk_buff *skb);
 	int (*forward)(struct net_device *dev, struct sk_buff *skb);
+	/* This array tracks active taps. */
 	struct macvtap_queue	*taps[MAX_MACVTAP_QUEUES];
+	/* This list tracks all taps (both enabled and disabled) */
+	struct list_head	queue_list;
 	int			numvtaps;
+	int			numqueues;
 	int			minor;
 };
 
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [net-next PATCH 8/8] macvtap: enable multiqueue flag
  2013-06-06  9:54 [net-next PATCH 0/8] Multiqueue API for macvtap Jason Wang
                   ` (6 preceding siblings ...)
  2013-06-06  9:54 ` [net-next PATCH 7/8] macvtap: add TUNSETQUEUE ioctl Jason Wang
@ 2013-06-06  9:54 ` Jason Wang
  2013-06-06 11:05   ` Michael S. Tsirkin
  2013-06-08  6:49 ` [net-next PATCH 0/8] Multiqueue API for macvtap David Miller
  8 siblings, 1 reply; 15+ messages in thread
From: Jason Wang @ 2013-06-06  9:54 UTC (permalink / raw)
  To: davem, mst, netdev, linux-kernel; +Cc: sergei.shtylyov, Jason Wang

To notify the userspace about our capability of multiqueue.

Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/net/macvtap.c |    7 ++-----
 1 files changed, 2 insertions(+), 5 deletions(-)

diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c
index d2d1d55..992151c 100644
--- a/drivers/net/macvtap.c
+++ b/drivers/net/macvtap.c
@@ -31,10 +31,6 @@
  * macvtap_proto is used to allocate queues through the sock allocation
  * mechanism.
  *
- * TODO: multiqueue support is currently not implemented, even though
- * macvtap is basically prepared for that. We will need to add this
- * here as well as in virtio-net and qemu to get line rate on 10gbit
- * adapters from a guest.
  */
 struct macvtap_queue {
 	struct sock sk;
@@ -1027,7 +1023,8 @@ static long macvtap_ioctl(struct file *file, unsigned int cmd,
 		return macvtap_ioctl_set_queue(file, u);
 
 	case TUNGETFEATURES:
-		if (put_user(IFF_TAP | IFF_NO_PI | IFF_VNET_HDR, up))
+		if (put_user(IFF_TAP | IFF_NO_PI | IFF_VNET_HDR |
+			     IFF_MULTI_QUEUE, up))
 			return -EFAULT;
 		return 0;
 
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [net-next PATCH 4/8] macvtap: introduce macvtap_get_vlan()
  2013-06-06  9:54 ` [net-next PATCH 4/8] macvtap: introduce macvtap_get_vlan() Jason Wang
@ 2013-06-06 11:05   ` Michael S. Tsirkin
  0 siblings, 0 replies; 15+ messages in thread
From: Michael S. Tsirkin @ 2013-06-06 11:05 UTC (permalink / raw)
  To: Jason Wang; +Cc: davem, netdev, linux-kernel, sergei.shtylyov

On Thu, Jun 06, 2013 at 05:54:36PM +0800, Jason Wang wrote:
> Factor out the device holding logic to a macvtap_get_vlan(), this will be also
> used by multiqueue API.
> 
> Signed-off-by: Jason Wang <jasowang@redhat.com>

Acked-by: Michael S. Tsirkin <mst@redhat.com>

> ---
>  drivers/net/macvtap.c |   27 ++++++++++++++++++++-------
>  1 files changed, 20 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c
> index 8949631..d18130b 100644
> --- a/drivers/net/macvtap.c
> +++ b/drivers/net/macvtap.c
> @@ -893,6 +893,24 @@ out:
>  	return ret;
>  }
>  
> +static struct macvlan_dev *macvtap_get_vlan(struct macvtap_queue *q)
> +{
> +	struct macvlan_dev *vlan;
> +
> +	rcu_read_lock_bh();
> +	vlan = rcu_dereference_bh(q->vlan);
> +	if (vlan)
> +		dev_hold(vlan->dev);
> +	rcu_read_unlock_bh();
> +
> +	return vlan;
> +}
> +
> +static void macvtap_put_vlan(struct macvlan_dev *vlan)
> +{
> +	dev_put(vlan->dev);
> +}
> +
>  /*
>   * provide compatibility with generic tun/tap interface
>   */
> @@ -924,12 +942,7 @@ static long macvtap_ioctl(struct file *file, unsigned int cmd,
>  		return ret;
>  
>  	case TUNGETIFF:
> -		rcu_read_lock_bh();
> -		vlan = rcu_dereference_bh(q->vlan);
> -		if (vlan)
> -			dev_hold(vlan->dev);
> -		rcu_read_unlock_bh();
> -
> +		vlan = macvtap_get_vlan(q);
>  		if (!vlan)
>  			return -ENOLINK;
>  
> @@ -937,7 +950,7 @@ static long macvtap_ioctl(struct file *file, unsigned int cmd,
>  		if (copy_to_user(&ifr->ifr_name, vlan->dev->name, IFNAMSIZ) ||
>  		    put_user(q->flags, &ifr->ifr_flags))
>  			ret = -EFAULT;
> -		dev_put(vlan->dev);
> +		macvtap_put_vlan(vlan);
>  		return ret;
>  
>  	case TUNGETFEATURES:
> -- 
> 1.7.1

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [net-next PATCH 5/8] macvlan: change the max number of queues to 16
  2013-06-06  9:54 ` [net-next PATCH 5/8] macvlan: change the max number of queues to 16 Jason Wang
@ 2013-06-06 11:05   ` Michael S. Tsirkin
  0 siblings, 0 replies; 15+ messages in thread
From: Michael S. Tsirkin @ 2013-06-06 11:05 UTC (permalink / raw)
  To: Jason Wang; +Cc: davem, netdev, linux-kernel, sergei.shtylyov

On Thu, Jun 06, 2013 at 05:54:37PM +0800, Jason Wang wrote:
> Macvtap should be at least compatible with tap, so change the max number to 16.
> 
> Signed-off-by: Jason Wang <jasowang@redhat.com>

Acked-by: Michael S. Tsirkin <mst@redhat.com>

> ---
>  include/linux/if_macvlan.h |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
> 
> diff --git a/include/linux/if_macvlan.h b/include/linux/if_macvlan.h
> index e47ad46..62d8bda 100644
> --- a/include/linux/if_macvlan.h
> +++ b/include/linux/if_macvlan.h
> @@ -50,7 +50,7 @@ struct macvlan_pcpu_stats {
>   * Maximum times a macvtap device can be opened. This can be used to
>   * configure the number of receive queue, e.g. for multiqueue virtio.
>   */
> -#define MAX_MACVTAP_QUEUES	(NR_CPUS < 16 ? NR_CPUS : 16)
> +#define MAX_MACVTAP_QUEUES	16
>  
>  #define MACVLAN_MC_FILTER_BITS	8
>  #define MACVLAN_MC_FILTER_SZ	(1 << MACVLAN_MC_FILTER_BITS)
> -- 
> 1.7.1

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [net-next PATCH 8/8] macvtap: enable multiqueue flag
  2013-06-06  9:54 ` [net-next PATCH 8/8] macvtap: enable multiqueue flag Jason Wang
@ 2013-06-06 11:05   ` Michael S. Tsirkin
  0 siblings, 0 replies; 15+ messages in thread
From: Michael S. Tsirkin @ 2013-06-06 11:05 UTC (permalink / raw)
  To: Jason Wang; +Cc: davem, netdev, linux-kernel, sergei.shtylyov

On Thu, Jun 06, 2013 at 05:54:40PM +0800, Jason Wang wrote:
> To notify the userspace about our capability of multiqueue.
> 
> Signed-off-by: Jason Wang <jasowang@redhat.com>

Acked-by: Michael S. Tsirkin <mst@redhat.com>

> ---
>  drivers/net/macvtap.c |    7 ++-----
>  1 files changed, 2 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c
> index d2d1d55..992151c 100644
> --- a/drivers/net/macvtap.c
> +++ b/drivers/net/macvtap.c
> @@ -31,10 +31,6 @@
>   * macvtap_proto is used to allocate queues through the sock allocation
>   * mechanism.
>   *
> - * TODO: multiqueue support is currently not implemented, even though
> - * macvtap is basically prepared for that. We will need to add this
> - * here as well as in virtio-net and qemu to get line rate on 10gbit
> - * adapters from a guest.
>   */
>  struct macvtap_queue {
>  	struct sock sk;
> @@ -1027,7 +1023,8 @@ static long macvtap_ioctl(struct file *file, unsigned int cmd,
>  		return macvtap_ioctl_set_queue(file, u);
>  
>  	case TUNGETFEATURES:
> -		if (put_user(IFF_TAP | IFF_NO_PI | IFF_VNET_HDR, up))
> +		if (put_user(IFF_TAP | IFF_NO_PI | IFF_VNET_HDR |
> +			     IFF_MULTI_QUEUE, up))
>  			return -EFAULT;
>  		return 0;
>  
> -- 
> 1.7.1

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [net-next PATCH 6/8] macvtap: eliminate linear search
  2013-06-06  9:54 ` [net-next PATCH 6/8] macvtap: eliminate linear search Jason Wang
@ 2013-06-06 11:06   ` Michael S. Tsirkin
  0 siblings, 0 replies; 15+ messages in thread
From: Michael S. Tsirkin @ 2013-06-06 11:06 UTC (permalink / raw)
  To: Jason Wang; +Cc: davem, netdev, linux-kernel, sergei.shtylyov

On Thu, Jun 06, 2013 at 05:54:38PM +0800, Jason Wang wrote:
> Linear search were used in both get_slot() and macvtap_get_queue(), this is
> because:
> 
> - macvtap didn't reshuffle the array of taps when create or destroy a queue, so
>   when adding a new queue, macvtap must do linear search to find a location for
>   the new queue. This will also complicate the TUNSETQUEUE implementation for
>   multiqueue API.
> - the queue itself didn't track the queue index, so the we must do a linear
>   search in the array to find the location of a existed queue.
> 
> The solution is straightforward: reshuffle the array and introduce a queue_index
> to macvtap_queue.
> 
> Signed-off-by: Jason Wang <jasowang@redhat.com>

Acked-by: Michael S. Tsirkin <mst@redhat.com>

> ---
>  drivers/net/macvtap.c |   64 +++++++++++++++---------------------------------
>  1 files changed, 20 insertions(+), 44 deletions(-)
> 
> diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c
> index d18130b..5ccba99 100644
> --- a/drivers/net/macvtap.c
> +++ b/drivers/net/macvtap.c
> @@ -44,6 +44,7 @@ struct macvtap_queue {
>  	struct macvlan_dev __rcu *vlan;
>  	struct file *file;
>  	unsigned int flags;
> +	u16 queue_index;
>  };
>  
>  static struct proto macvtap_proto = {
> @@ -84,30 +85,10 @@ static const struct proto_ops macvtap_socket_ops;
>   */
>  static DEFINE_SPINLOCK(macvtap_lock);
>  
> -/*
> - * get_slot: return a [unused/occupied] slot in vlan->taps[]:
> - *	- if 'q' is NULL, return the first empty slot;
> - *	- otherwise, return the slot this pointer occupies.
> - */
> -static int get_slot(struct macvlan_dev *vlan, struct macvtap_queue *q)
> -{
> -	int i;
> -
> -	for (i = 0; i < MAX_MACVTAP_QUEUES; i++) {
> -		if (rcu_dereference_protected(vlan->taps[i],
> -					      lockdep_is_held(&macvtap_lock)) == q)
> -			return i;
> -	}
> -
> -	/* Should never happen */
> -	BUG_ON(1);
> -}
> -
>  static int macvtap_set_queue(struct net_device *dev, struct file *file,
>  				struct macvtap_queue *q)
>  {
>  	struct macvlan_dev *vlan = netdev_priv(dev);
> -	int index;
>  	int err = -EBUSY;
>  
>  	spin_lock(&macvtap_lock);
> @@ -115,12 +96,12 @@ static int macvtap_set_queue(struct net_device *dev, struct file *file,
>  		goto out;
>  
>  	err = 0;
> -	index = get_slot(vlan, NULL);
>  	rcu_assign_pointer(q->vlan, vlan);
> -	rcu_assign_pointer(vlan->taps[index], q);
> +	rcu_assign_pointer(vlan->taps[vlan->numvtaps], q);
>  	sock_hold(&q->sk);
>  
>  	q->file = file;
> +	q->queue_index = vlan->numvtaps;
>  	file->private_data = q;
>  
>  	vlan->numvtaps++;
> @@ -140,15 +121,22 @@ out:
>   */
>  static void macvtap_put_queue(struct macvtap_queue *q)
>  {
> +	struct macvtap_queue *nq;
>  	struct macvlan_dev *vlan;
>  
>  	spin_lock(&macvtap_lock);
>  	vlan = rcu_dereference_protected(q->vlan,
>  					 lockdep_is_held(&macvtap_lock));
>  	if (vlan) {
> -		int index = get_slot(vlan, q);
> +		int index = q->queue_index;
> +		BUG_ON(index >= vlan->numvtaps);
> +
> +		nq = rcu_dereference_protected(vlan->taps[vlan->numvtaps - 1],
> +					       lockdep_is_held(&macvtap_lock));
> +		rcu_assign_pointer(vlan->taps[index], nq);
> +		nq->queue_index = index;
>  
> -		RCU_INIT_POINTER(vlan->taps[index], NULL);
> +		RCU_INIT_POINTER(vlan->taps[vlan->numvtaps - 1], NULL);
>  		RCU_INIT_POINTER(q->vlan, NULL);
>  		sock_put(&q->sk);
>  		--vlan->numvtaps;
> @@ -182,8 +170,7 @@ static struct macvtap_queue *macvtap_get_queue(struct net_device *dev,
>  	rxq = skb_get_rxhash(skb);
>  	if (rxq) {
>  		tap = rcu_dereference(vlan->taps[rxq % numvtaps]);
> -		if (tap)
> -			goto out;
> +		goto out;
>  	}
>  
>  	if (likely(skb_rx_queue_recorded(skb))) {
> @@ -193,17 +180,10 @@ static struct macvtap_queue *macvtap_get_queue(struct net_device *dev,
>  			rxq -= numvtaps;
>  
>  		tap = rcu_dereference(vlan->taps[rxq]);
> -		if (tap)
> -			goto out;
> -	}
> -
> -	/* Everything failed - find first available queue */
> -	for (rxq = 0; rxq < MAX_MACVTAP_QUEUES; rxq++) {
> -		tap = rcu_dereference(vlan->taps[rxq]);
> -		if (tap)
> -			break;
> +		goto out;
>  	}
>  
> +	tap = rcu_dereference(vlan->taps[0]);
>  out:
>  	return tap;
>  }
> @@ -219,19 +199,15 @@ static void macvtap_del_queues(struct net_device *dev)
>  	struct macvtap_queue *q, *qlist[MAX_MACVTAP_QUEUES];
>  	int i, j = 0;
>  
> -	/* macvtap_put_queue can free some slots, so go through all slots */
>  	spin_lock(&macvtap_lock);
> -	for (i = 0; i < MAX_MACVTAP_QUEUES && vlan->numvtaps; i++) {
> +	for (i = 0; i < vlan->numvtaps; i++) {
>  		q = rcu_dereference_protected(vlan->taps[i],
>  					      lockdep_is_held(&macvtap_lock));
> -		if (q) {
> -			qlist[j++] = q;
> -			RCU_INIT_POINTER(vlan->taps[i], NULL);
> -			RCU_INIT_POINTER(q->vlan, NULL);
> -			vlan->numvtaps--;
> -		}
> +		BUG_ON(q == NULL);
> +		qlist[j++] = q;
> +		RCU_INIT_POINTER(vlan->taps[i], NULL);
> +		RCU_INIT_POINTER(q->vlan, NULL);
>  	}
> -	BUG_ON(vlan->numvtaps != 0);
>  	/* guarantee that any future macvtap_set_queue will fail */
>  	vlan->numvtaps = MAX_MACVTAP_QUEUES;
>  	spin_unlock(&macvtap_lock);
> -- 
> 1.7.1

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [net-next PATCH 7/8] macvtap: add TUNSETQUEUE ioctl
  2013-06-06  9:54 ` [net-next PATCH 7/8] macvtap: add TUNSETQUEUE ioctl Jason Wang
@ 2013-06-06 11:06   ` Michael S. Tsirkin
  0 siblings, 0 replies; 15+ messages in thread
From: Michael S. Tsirkin @ 2013-06-06 11:06 UTC (permalink / raw)
  To: Jason Wang; +Cc: davem, netdev, linux-kernel, sergei.shtylyov

On Thu, Jun 06, 2013 at 05:54:39PM +0800, Jason Wang wrote:
> This patch adds TUNSETQUEUE ioctl to let userspace can temporarily disable or
> enable a queue of macvtap. This is used to be compatible at API layer of tuntap
> to simplify the userspace to manage the queues. This is done through introducing
> a linked list to track all taps while using vlan->taps array to only track
> active taps.
> 
> Signed-off-by: Jason Wang <jasowang@redhat.com>

Acked-by: Michael S. Tsirkin <mst@redhat.com>

> ---
>  drivers/net/macvtap.c      |  135 +++++++++++++++++++++++++++++++++++++------
>  include/linux/if_macvlan.h |    4 +
>  2 files changed, 120 insertions(+), 19 deletions(-)
> 
> diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c
> index 5ccba99..d2d1d55 100644
> --- a/drivers/net/macvtap.c
> +++ b/drivers/net/macvtap.c
> @@ -45,6 +45,8 @@ struct macvtap_queue {
>  	struct file *file;
>  	unsigned int flags;
>  	u16 queue_index;
> +	bool enabled;
> +	struct list_head next;
>  };
>  
>  static struct proto macvtap_proto = {
> @@ -85,14 +87,36 @@ static const struct proto_ops macvtap_socket_ops;
>   */
>  static DEFINE_SPINLOCK(macvtap_lock);
>  
> -static int macvtap_set_queue(struct net_device *dev, struct file *file,
> +static int macvtap_enable_queue(struct net_device *dev, struct file *file,
>  				struct macvtap_queue *q)
>  {
>  	struct macvlan_dev *vlan = netdev_priv(dev);
> +	int err = -EINVAL;
> +
> +	spin_lock(&macvtap_lock);
> +
> +	if (q->enabled)
> +		goto out;
> +
> +	err = 0;
> +	rcu_assign_pointer(vlan->taps[vlan->numvtaps], q);
> +	q->queue_index = vlan->numvtaps;
> +	q->enabled = true;
> +
> +	vlan->numvtaps++;
> +out:
> +	spin_unlock(&macvtap_lock);
> +	return err;
> +}
> +
> +static int macvtap_set_queue(struct net_device *dev, struct file *file,
> +			     struct macvtap_queue *q)
> +{
> +	struct macvlan_dev *vlan = netdev_priv(dev);
>  	int err = -EBUSY;
>  
>  	spin_lock(&macvtap_lock);
> -	if (vlan->numvtaps == MAX_MACVTAP_QUEUES)
> +	if (vlan->numqueues == MAX_MACVTAP_QUEUES)
>  		goto out;
>  
>  	err = 0;
> @@ -102,15 +126,57 @@ static int macvtap_set_queue(struct net_device *dev, struct file *file,
>  
>  	q->file = file;
>  	q->queue_index = vlan->numvtaps;
> +	q->enabled = true;
>  	file->private_data = q;
> +	list_add_tail(&q->next, &vlan->queue_list);
>  
>  	vlan->numvtaps++;
> +	vlan->numqueues++;
>  
>  out:
>  	spin_unlock(&macvtap_lock);
>  	return err;
>  }
>  
> +static int __macvtap_disable_queue(struct macvtap_queue *q)
> +{
> +	struct macvlan_dev *vlan;
> +	struct macvtap_queue *nq;
> +
> +	vlan = rcu_dereference_protected(q->vlan,
> +					 lockdep_is_held(&macvtap_lock));
> +
> +	if (!q->enabled)
> +		return -EINVAL;
> +
> +	if (vlan) {
> +		int index = q->queue_index;
> +		BUG_ON(index >= vlan->numvtaps);
> +		nq = rcu_dereference_protected(vlan->taps[vlan->numvtaps - 1],
> +					       lockdep_is_held(&macvtap_lock));
> +		nq->queue_index = index;
> +
> +		rcu_assign_pointer(vlan->taps[index], nq);
> +		RCU_INIT_POINTER(vlan->taps[vlan->numvtaps - 1], NULL);
> +		q->enabled = false;
> +
> +		vlan->numvtaps--;
> +	}
> +
> +	return 0;
> +}
> +
> +static int macvtap_disable_queue(struct macvtap_queue *q)
> +{
> +	int err;
> +
> +	spin_lock(&macvtap_lock);
> +	err = __macvtap_disable_queue(q);
> +	spin_unlock(&macvtap_lock);
> +
> +	return err;
> +}
> +
>  /*
>   * The file owning the queue got closed, give up both
>   * the reference that the files holds as well as the
> @@ -121,25 +187,19 @@ out:
>   */
>  static void macvtap_put_queue(struct macvtap_queue *q)
>  {
> -	struct macvtap_queue *nq;
>  	struct macvlan_dev *vlan;
>  
>  	spin_lock(&macvtap_lock);
>  	vlan = rcu_dereference_protected(q->vlan,
>  					 lockdep_is_held(&macvtap_lock));
>  	if (vlan) {
> -		int index = q->queue_index;
> -		BUG_ON(index >= vlan->numvtaps);
> -
> -		nq = rcu_dereference_protected(vlan->taps[vlan->numvtaps - 1],
> -					       lockdep_is_held(&macvtap_lock));
> -		rcu_assign_pointer(vlan->taps[index], nq);
> -		nq->queue_index = index;
> +		if (q->enabled)
> +			BUG_ON(__macvtap_disable_queue(q));
>  
> -		RCU_INIT_POINTER(vlan->taps[vlan->numvtaps - 1], NULL);
> +		vlan->numqueues--;
>  		RCU_INIT_POINTER(q->vlan, NULL);
>  		sock_put(&q->sk);
> -		--vlan->numvtaps;
> +		list_del_init(&q->next);
>  	}
>  
>  	spin_unlock(&macvtap_lock);
> @@ -160,6 +220,11 @@ static struct macvtap_queue *macvtap_get_queue(struct net_device *dev,
>  {
>  	struct macvlan_dev *vlan = netdev_priv(dev);
>  	struct macvtap_queue *tap = NULL;
> +	/* Access to taps array is protected by rcu, but access to numvtaps
> +	 * isn't. Below we use it to lookup a queue, but treat it as a hint
> +	 * and validate that the result isn't NULL - in case we are
> +	 * racing against queue removal.
> +	 */
>  	int numvtaps = ACCESS_ONCE(vlan->numvtaps);
>  	__u32 rxq;
>  
> @@ -196,18 +261,22 @@ out:
>  static void macvtap_del_queues(struct net_device *dev)
>  {
>  	struct macvlan_dev *vlan = netdev_priv(dev);
> -	struct macvtap_queue *q, *qlist[MAX_MACVTAP_QUEUES];
> +	struct macvtap_queue *q, *tmp, *qlist[MAX_MACVTAP_QUEUES];
>  	int i, j = 0;
>  
>  	spin_lock(&macvtap_lock);
> -	for (i = 0; i < vlan->numvtaps; i++) {
> -		q = rcu_dereference_protected(vlan->taps[i],
> -					      lockdep_is_held(&macvtap_lock));
> -		BUG_ON(q == NULL);
> +	list_for_each_entry_safe(q, tmp, &vlan->queue_list, next) {
> +		list_del_init(&q->next);
>  		qlist[j++] = q;
> -		RCU_INIT_POINTER(vlan->taps[i], NULL);
>  		RCU_INIT_POINTER(q->vlan, NULL);
> +		if (q->enabled)
> +			vlan->numvtaps--;
> +		vlan->numqueues--;
>  	}
> +	for (i = 0; i < vlan->numvtaps; i++)
> +		RCU_INIT_POINTER(vlan->taps[i], NULL);
> +	BUG_ON(vlan->numvtaps);
> +	BUG_ON(vlan->numqueues);
>  	/* guarantee that any future macvtap_set_queue will fail */
>  	vlan->numvtaps = MAX_MACVTAP_QUEUES;
>  	spin_unlock(&macvtap_lock);
> @@ -298,6 +367,9 @@ static int macvtap_newlink(struct net *src_net,
>  			   struct nlattr *tb[],
>  			   struct nlattr *data[])
>  {
> +	struct macvlan_dev *vlan = netdev_priv(dev);
> +	INIT_LIST_HEAD(&vlan->queue_list);
> +
>  	/* Don't put anything that may fail after macvlan_common_newlink
>  	 * because we can't undo what it does.
>  	 */
> @@ -887,6 +959,25 @@ static void macvtap_put_vlan(struct macvlan_dev *vlan)
>  	dev_put(vlan->dev);
>  }
>  
> +static int macvtap_ioctl_set_queue(struct file *file, unsigned int flags)
> +{
> +	struct macvtap_queue *q = file->private_data;
> +	struct macvlan_dev *vlan;
> +	int ret;
> +
> +	vlan = macvtap_get_vlan(q);
> +	if (!vlan)
> +		return -EINVAL;
> +
> +	if (flags & IFF_ATTACH_QUEUE)
> +		ret = macvtap_enable_queue(vlan->dev, file, q);
> +	else if (flags & IFF_DETACH_QUEUE)
> +		ret = macvtap_disable_queue(q);
> +
> +	macvtap_put_vlan(vlan);
> +	return ret;
> +}
> +
>  /*
>   * provide compatibility with generic tun/tap interface
>   */
> @@ -910,7 +1001,8 @@ static long macvtap_ioctl(struct file *file, unsigned int cmd,
>  			return -EFAULT;
>  
>  		ret = 0;
> -		if ((u & ~IFF_VNET_HDR) != (IFF_NO_PI | IFF_TAP))
> +		if ((u & ~(IFF_VNET_HDR | IFF_MULTI_QUEUE)) !=
> +		    (IFF_NO_PI | IFF_TAP))
>  			ret = -EINVAL;
>  		else
>  			q->flags = u;
> @@ -929,6 +1021,11 @@ static long macvtap_ioctl(struct file *file, unsigned int cmd,
>  		macvtap_put_vlan(vlan);
>  		return ret;
>  
> +	case TUNSETQUEUE:
> +		if (get_user(u, &ifr->ifr_flags))
> +			return -EFAULT;
> +		return macvtap_ioctl_set_queue(file, u);
> +
>  	case TUNGETFEATURES:
>  		if (put_user(IFF_TAP | IFF_NO_PI | IFF_VNET_HDR, up))
>  			return -EFAULT;
> diff --git a/include/linux/if_macvlan.h b/include/linux/if_macvlan.h
> index 62d8bda..1912133 100644
> --- a/include/linux/if_macvlan.h
> +++ b/include/linux/if_macvlan.h
> @@ -69,8 +69,12 @@ struct macvlan_dev {
>  	u16			flags;
>  	int (*receive)(struct sk_buff *skb);
>  	int (*forward)(struct net_device *dev, struct sk_buff *skb);
> +	/* This array tracks active taps. */
>  	struct macvtap_queue	*taps[MAX_MACVTAP_QUEUES];
> +	/* This list tracks all taps (both enabled and disabled) */
> +	struct list_head	queue_list;
>  	int			numvtaps;
> +	int			numqueues;
>  	int			minor;
>  };
>  
> -- 
> 1.7.1

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [net-next PATCH 0/8] Multiqueue API for macvtap
  2013-06-06  9:54 [net-next PATCH 0/8] Multiqueue API for macvtap Jason Wang
                   ` (7 preceding siblings ...)
  2013-06-06  9:54 ` [net-next PATCH 8/8] macvtap: enable multiqueue flag Jason Wang
@ 2013-06-08  6:49 ` David Miller
  8 siblings, 0 replies; 15+ messages in thread
From: David Miller @ 2013-06-08  6:49 UTC (permalink / raw)
  To: jasowang; +Cc: mst, netdev, linux-kernel, sergei.shtylyov

From: Jason Wang <jasowang@redhat.com>
Date: Thu,  6 Jun 2013 17:54:32 +0800

> This series implements a fully tuntap compatiable API which could be used
> by userspace to manage multiple macvtap queues. The main parts is to add
> TUNSETQUEUE ioctl support for macvtap.

Series applied, thanks everyone.

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2013-06-08  6:49 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-06-06  9:54 [net-next PATCH 0/8] Multiqueue API for macvtap Jason Wang
2013-06-06  9:54 ` [net-next PATCH 1/8] macvtap: fix a possible race between queue selection and changing queues Jason Wang
2013-06-06  9:54 ` [net-next PATCH 2/8] macvtap: do not add self to waitqueue if doing a nonblock read Jason Wang
2013-06-06  9:54 ` [net-next PATCH 3/8] macvlan: switch to use IS_ENABLED() Jason Wang
2013-06-06  9:54 ` [net-next PATCH 4/8] macvtap: introduce macvtap_get_vlan() Jason Wang
2013-06-06 11:05   ` Michael S. Tsirkin
2013-06-06  9:54 ` [net-next PATCH 5/8] macvlan: change the max number of queues to 16 Jason Wang
2013-06-06 11:05   ` Michael S. Tsirkin
2013-06-06  9:54 ` [net-next PATCH 6/8] macvtap: eliminate linear search Jason Wang
2013-06-06 11:06   ` Michael S. Tsirkin
2013-06-06  9:54 ` [net-next PATCH 7/8] macvtap: add TUNSETQUEUE ioctl Jason Wang
2013-06-06 11:06   ` Michael S. Tsirkin
2013-06-06  9:54 ` [net-next PATCH 8/8] macvtap: enable multiqueue flag Jason Wang
2013-06-06 11:05   ` Michael S. Tsirkin
2013-06-08  6:49 ` [net-next PATCH 0/8] Multiqueue API for macvtap David Miller

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.