* [PATCH V6 0/3] handle polling errors in vhost/vhost_net
@ 2013-01-16 15:44 Jason Wang
2013-01-16 15:44 ` [PATCH V6 1/3] vhost_net: correct error handling in vhost_net_set_backend() Jason Wang
` (3 more replies)
0 siblings, 4 replies; 12+ messages in thread
From: Jason Wang @ 2013-01-16 15:44 UTC (permalink / raw)
To: davem, mst, netdev, linux-kernel; +Cc: Jason Wang
This is an updated version of the series that fixes the handling of polling
errors in vhost/vhost_net.
Currently, vhost and vhost_net ignore polling errors, which can crash the
kernel when vhost tries to remove itself from the waitqueue after a polling
failure. Fix this by:
- examining POLLERR when setting the backend and reporting errors to userspace
- letting tun always add itself to the waitqueue in .poll() once the queue is
created, even if it was detached.
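The kernel-side problem has a direct userspace analogue: poll(2) can report POLLERR, and a caller that ignores it may proceed against a file that never joined the waitqueue. The following is a minimal, hypothetical userspace sketch of the check this series adds on the kernel side; the pipe setup is just a convenient way to provoke POLLERR and is not related to tun or vhost:

```c
#include <poll.h>
#include <unistd.h>

/* Poll the write end of a pipe whose read end is already closed.
 * Linux reports POLLERR in revents in this case.  Returns 1 if
 * POLLERR was seen, 0 if not, -1 on setup failure. */
int broken_pipe_reports_pollerr(void)
{
	int fds[2];
	struct pollfd pfd;
	int ret;

	if (pipe(fds) < 0)
		return -1;
	close(fds[0]);			/* no reader left */

	pfd.fd = fds[1];
	pfd.events = POLLOUT;
	pfd.revents = 0;
	ret = poll(&pfd, 1, 0);
	close(fds[1]);
	if (ret < 0)
		return -1;

	/* The point of this series: treat POLLERR as a hard error
	 * instead of silently continuing. */
	return (pfd.revents & POLLERR) ? 1 : 0;
}
```

The kernel's f_op->poll() returns the same POLLERR bit that the series now checks in vhost_poll_start().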
Changes from V5:
- use rcu_dereference() instead of the wrong rtnl_dereference() in data path
- test with CONFIG_PROVE_RCU
Changes from V4:
- check the detached state by tfile->detached and protect it by RCU
Changes from V3:
- make a smaller patch that doesn't touch the whole polling state and only
checks the polling errors in backend setting
- add a patch that allows tuntap to do polling/reading/writing when detached,
which simplifies the work of its users
Changes from v2:
- check poll->wqh instead of the wrong assumption about POLLERR and waitqueue
- drop the whole tx polling state check since it was replaced by the wqh
checking
- drop the buggy tuntap patch
Changes from v1:
- restore the state before the ioctl when vhost_init_used() fails
- log the error when polling errors are met in the data path
- don't put into the waitqueue when tun_chr_poll() returns POLLERR
Jason Wang (3):
vhost_net: correct error handling in vhost_net_set_backend()
vhost_net: handle polling errors when setting backend
tuntap: allow polling/writing/reading when detached
drivers/net/tun.c | 45 ++++++++++++++++++++++++++-------------------
drivers/vhost/net.c | 41 ++++++++++++++++++++++++++++-------------
drivers/vhost/vhost.c | 18 +++++++++++++++---
drivers/vhost/vhost.h | 2 +-
4 files changed, 70 insertions(+), 36 deletions(-)
^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH V6 1/3] vhost_net: correct error handling in vhost_net_set_backend()
2013-01-16 15:44 [PATCH V6 0/3] handle polling errors in vhost/vhost_net Jason Wang
@ 2013-01-16 15:44 ` Jason Wang
2013-01-24 10:38 ` Michael S. Tsirkin
2013-01-16 15:44 ` [PATCH V6 2/3] vhost_net: handle polling errors when setting backend Jason Wang
` (2 subsequent siblings)
3 siblings, 1 reply; 12+ messages in thread
From: Jason Wang @ 2013-01-16 15:44 UTC (permalink / raw)
To: davem, mst, netdev, linux-kernel; +Cc: Jason Wang
Currently, when vhost_init_used() fails, the sock refcnt and ubufs are
leaked. Correct this by calling vhost_init_used() before assigning ubufs, and
restore oldsock when it fails.
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
drivers/vhost/net.c | 16 +++++++++++-----
1 files changed, 11 insertions(+), 5 deletions(-)
diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index ebd08b2..d10ad6f 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -827,15 +827,16 @@ static long vhost_net_set_backend(struct vhost_net *n, unsigned index, int fd)
r = PTR_ERR(ubufs);
goto err_ubufs;
}
- oldubufs = vq->ubufs;
- vq->ubufs = ubufs;
+
vhost_net_disable_vq(n, vq);
rcu_assign_pointer(vq->private_data, sock);
- vhost_net_enable_vq(n, vq);
-
r = vhost_init_used(vq);
if (r)
- goto err_vq;
+ goto err_used;
+ vhost_net_enable_vq(n, vq);
+
+ oldubufs = vq->ubufs;
+ vq->ubufs = ubufs;
n->tx_packets = 0;
n->tx_zcopy_err = 0;
@@ -859,6 +860,11 @@ static long vhost_net_set_backend(struct vhost_net *n, unsigned index, int fd)
mutex_unlock(&n->dev.mutex);
return 0;
+err_used:
+ rcu_assign_pointer(vq->private_data, oldsock);
+ vhost_net_enable_vq(n, vq);
+ if (ubufs)
+ vhost_ubuf_put_and_wait(ubufs);
err_ubufs:
fput(sock->file);
err_vq:
--
1.7.1
* [PATCH V6 2/3] vhost_net: handle polling errors when setting backend
2013-01-16 15:44 [PATCH V6 0/3] handle polling errors in vhost/vhost_net Jason Wang
2013-01-16 15:44 ` [PATCH V6 1/3] vhost_net: correct error handling in vhost_net_set_backend() Jason Wang
@ 2013-01-16 15:44 ` Jason Wang
2013-01-24 10:38 ` Michael S. Tsirkin
2013-01-16 15:44 ` [PATCH V6 3/3] tuntap: allow polling/writing/reading when detached Jason Wang
2013-01-24 10:38 ` [PATCH V6 0/3] handle polling errors in vhost/vhost_net Michael S. Tsirkin
3 siblings, 1 reply; 12+ messages in thread
From: Jason Wang @ 2013-01-16 15:44 UTC (permalink / raw)
To: davem, mst, netdev, linux-kernel; +Cc: Jason Wang
Currently, polling errors are ignored, which can lead to the following issues:
- vhost removes itself unconditionally from the waitqueue when stopping the
poll; this may crash the kernel, since a previous attempt to start polling
may have failed to add it to the waitqueue
- userspace may think the backend was successfully set even when polling
failed.
Solve this by:
- checking poll->wqh before trying to remove from the waitqueue
- reporting polling errors from vhost_poll_start() and tx_poll_start(); the
return value is checked and propagated when userspace sets the backend
After this fix, a polling failure can still happen after the backend is set;
that will be addressed by the next patch.
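The control flow of the new error path can be sketched with a hypothetical userspace model; none of these names are the real vhost symbols. set_backend publishes the new socket, tries to start polling, and rolls back to the old socket if polling fails, so userspace sees the error rather than a half-configured queue:

```c
#include <stddef.h>

/* Hypothetical stand-ins for the vhost objects; not the real API. */
struct fake_vq {
	void *private_data;	/* backing socket */
	int   polling;
};

/* Models vhost_net_enable_vq()/vhost_poll_start() returning an error. */
static int fake_enable_vq(struct fake_vq *vq, int poll_fails)
{
	if (poll_fails)
		return -1;	/* stands in for -EINVAL on POLLERR */
	vq->polling = 1;
	return 0;
}

/* Models the unwinding added at the err_used: label in this patch. */
int fake_set_backend(struct fake_vq *vq, void *newsock, int poll_fails)
{
	void *oldsock = vq->private_data;

	vq->private_data = newsock;
	if (fake_enable_vq(vq, poll_fails)) {
		/* restore the old backend, as the real err_used: path does */
		vq->private_data = oldsock;
		return -1;
	}
	return 0;
}
```

On failure the old backend stays in place; on success the new one is visible and polling is enabled.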
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
drivers/vhost/net.c | 27 ++++++++++++++++++---------
drivers/vhost/vhost.c | 18 +++++++++++++++---
drivers/vhost/vhost.h | 2 +-
3 files changed, 34 insertions(+), 13 deletions(-)
diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index d10ad6f..959b1cd 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -165,12 +165,16 @@ static void tx_poll_stop(struct vhost_net *net)
}
/* Caller must have TX VQ lock */
-static void tx_poll_start(struct vhost_net *net, struct socket *sock)
+static int tx_poll_start(struct vhost_net *net, struct socket *sock)
{
+ int ret;
+
if (unlikely(net->tx_poll_state != VHOST_NET_POLL_STOPPED))
- return;
- vhost_poll_start(net->poll + VHOST_NET_VQ_TX, sock->file);
- net->tx_poll_state = VHOST_NET_POLL_STARTED;
+ return 0;
+ ret = vhost_poll_start(net->poll + VHOST_NET_VQ_TX, sock->file);
+ if (!ret)
+ net->tx_poll_state = VHOST_NET_POLL_STARTED;
+ return ret;
}
/* In case of DMA done not in order in lower device driver for some reason.
@@ -642,20 +646,23 @@ static void vhost_net_disable_vq(struct vhost_net *n,
vhost_poll_stop(n->poll + VHOST_NET_VQ_RX);
}
-static void vhost_net_enable_vq(struct vhost_net *n,
+static int vhost_net_enable_vq(struct vhost_net *n,
struct vhost_virtqueue *vq)
{
struct socket *sock;
+ int ret;
sock = rcu_dereference_protected(vq->private_data,
lockdep_is_held(&vq->mutex));
if (!sock)
- return;
+ return 0;
if (vq == n->vqs + VHOST_NET_VQ_TX) {
n->tx_poll_state = VHOST_NET_POLL_STOPPED;
- tx_poll_start(n, sock);
+ ret = tx_poll_start(n, sock);
} else
- vhost_poll_start(n->poll + VHOST_NET_VQ_RX, sock->file);
+ ret = vhost_poll_start(n->poll + VHOST_NET_VQ_RX, sock->file);
+
+ return ret;
}
static struct socket *vhost_net_stop_vq(struct vhost_net *n,
@@ -833,7 +840,9 @@ static long vhost_net_set_backend(struct vhost_net *n, unsigned index, int fd)
r = vhost_init_used(vq);
if (r)
goto err_used;
- vhost_net_enable_vq(n, vq);
+ r = vhost_net_enable_vq(n, vq);
+ if (r)
+ goto err_used;
oldubufs = vq->ubufs;
vq->ubufs = ubufs;
diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 34389f7..9759249 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -77,26 +77,38 @@ void vhost_poll_init(struct vhost_poll *poll, vhost_work_fn_t fn,
init_poll_funcptr(&poll->table, vhost_poll_func);
poll->mask = mask;
poll->dev = dev;
+ poll->wqh = NULL;
vhost_work_init(&poll->work, fn);
}
/* Start polling a file. We add ourselves to file's wait queue. The caller must
* keep a reference to a file until after vhost_poll_stop is called. */
-void vhost_poll_start(struct vhost_poll *poll, struct file *file)
+int vhost_poll_start(struct vhost_poll *poll, struct file *file)
{
unsigned long mask;
+ int ret = 0;
mask = file->f_op->poll(file, &poll->table);
if (mask)
vhost_poll_wakeup(&poll->wait, 0, 0, (void *)mask);
+ if (mask & POLLERR) {
+ if (poll->wqh)
+ remove_wait_queue(poll->wqh, &poll->wait);
+ ret = -EINVAL;
+ }
+
+ return ret;
}
/* Stop polling a file. After this function returns, it becomes safe to drop the
* file reference. You must also flush afterwards. */
void vhost_poll_stop(struct vhost_poll *poll)
{
- remove_wait_queue(poll->wqh, &poll->wait);
+ if (poll->wqh) {
+ remove_wait_queue(poll->wqh, &poll->wait);
+ poll->wqh = NULL;
+ }
}
static bool vhost_work_seq_done(struct vhost_dev *dev, struct vhost_work *work,
@@ -792,7 +804,7 @@ long vhost_vring_ioctl(struct vhost_dev *d, int ioctl, void __user *argp)
fput(filep);
if (pollstart && vq->handle_kick)
- vhost_poll_start(&vq->poll, vq->kick);
+ r = vhost_poll_start(&vq->poll, vq->kick);
mutex_unlock(&vq->mutex);
diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
index 2639c58..17261e2 100644
--- a/drivers/vhost/vhost.h
+++ b/drivers/vhost/vhost.h
@@ -42,7 +42,7 @@ void vhost_work_queue(struct vhost_dev *dev, struct vhost_work *work);
void vhost_poll_init(struct vhost_poll *poll, vhost_work_fn_t fn,
unsigned long mask, struct vhost_dev *dev);
-void vhost_poll_start(struct vhost_poll *poll, struct file *file);
+int vhost_poll_start(struct vhost_poll *poll, struct file *file);
void vhost_poll_stop(struct vhost_poll *poll);
void vhost_poll_flush(struct vhost_poll *poll);
void vhost_poll_queue(struct vhost_poll *poll);
--
1.7.1
* [PATCH V6 3/3] tuntap: allow polling/writing/reading when detached
2013-01-16 15:44 [PATCH V6 0/3] handle polling errors in vhost/vhost_net Jason Wang
2013-01-16 15:44 ` [PATCH V6 1/3] vhost_net: correct error handling in vhost_net_set_backend() Jason Wang
2013-01-16 15:44 ` [PATCH V6 2/3] vhost_net: handle polling errors when setting backend Jason Wang
@ 2013-01-16 15:44 ` Jason Wang
2013-01-16 17:03 ` Michael S. Tsirkin
2013-01-24 10:38 ` Michael S. Tsirkin
2013-01-24 10:38 ` [PATCH V6 0/3] handle polling errors in vhost/vhost_net Michael S. Tsirkin
3 siblings, 2 replies; 12+ messages in thread
From: Jason Wang @ 2013-01-16 15:44 UTC (permalink / raw)
To: davem, mst, netdev, linux-kernel; +Cc: Jason Wang
We forbid polling, writing and reading when the file is detached; this may
complicate things for the user in several cases:
- when a guest passes some buffers to vhost/qemu and then disables some queues,
vhost/qemu needs to do its own cleanup on those buffers, which is sometimes
complex. We can avoid this simply by still allowing a user to write to a
disabled queue: a write to a disabled queue will pass the packet to the
kernel, and a read will get nothing.
- align the polling behavior with macvtap, which never fails once the queue is
created. This can simplify the polling error handling of its users (e.g. vhost)
In order to achieve this, tfile->tun is no longer set to NULL when detached, and
tfile->detached is converted to be RCU protected so that the data path can
check whether the file is detached in a lockless manner. This is used to
prevent the flow caches from being updated for a detached queue.
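The detached check in the data path follows the usual RCU publish/read shape. Below is a hypothetical userspace analogue using C11 atomics in place of rcu_assign_pointer()/rcu_dereference(); the kernel primitives additionally interact with grace periods, which plain atomics do not model, and none of these names are the real tun symbols:

```c
#include <stdatomic.h>
#include <stddef.h>

struct fake_tun { int numqueues; };

struct fake_tfile {
	_Atomic(struct fake_tun *) detached;	/* models tfile->detached */
	unsigned short queue_index;
};

/* Writer side: models tun_disable_queue() publishing the detached state.
 * Release ordering plays the role of rcu_assign_pointer(). */
void fake_disable_queue(struct fake_tun *tun, struct fake_tfile *tfile)
{
	atomic_store_explicit(&tfile->detached, tun, memory_order_release);
}

/* Reader side: models the lockless check added to tun_flow_update().
 * Returns 1 if the flow cache update should be skipped. */
int fake_flow_update_skips(struct fake_tun *tun, struct fake_tfile *tfile)
{
	if (tun->numqueues == 1 ||
	    atomic_load_explicit(&tfile->detached, memory_order_acquire))
		return 1;
	return 0;
}
```

A reader that observes the published detached pointer bails out before touching the flow cache, which is exactly the property the patch relies on.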
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
drivers/net/tun.c | 45 ++++++++++++++++++++++++++-------------------
1 files changed, 26 insertions(+), 19 deletions(-)
diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index c81680d..ec539a9 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -139,7 +139,7 @@ struct tun_file {
unsigned int flags;
u16 queue_index;
struct list_head next;
- struct tun_struct *detached;
+ struct tun_struct __rcu *detached;
};
struct tun_flow_entry {
@@ -295,11 +295,12 @@ static void tun_flow_cleanup(unsigned long data)
}
static void tun_flow_update(struct tun_struct *tun, u32 rxhash,
- u16 queue_index)
+ struct tun_file *tfile)
{
struct hlist_head *head;
struct tun_flow_entry *e;
unsigned long delay = tun->ageing_time;
+ u16 queue_index = tfile->queue_index;
if (!rxhash)
return;
@@ -308,7 +309,7 @@ static void tun_flow_update(struct tun_struct *tun, u32 rxhash,
rcu_read_lock();
- if (tun->numqueues == 1)
+ if (tun->numqueues == 1 || rcu_dereference(tfile->detached))
goto unlock;
e = tun_flow_find(head, rxhash);
@@ -384,16 +385,16 @@ static void tun_set_real_num_queues(struct tun_struct *tun)
static void tun_disable_queue(struct tun_struct *tun, struct tun_file *tfile)
{
- tfile->detached = tun;
+ rcu_assign_pointer(tfile->detached, tun);
list_add_tail(&tfile->next, &tun->disabled);
++tun->numdisabled;
}
static struct tun_struct *tun_enable_queue(struct tun_file *tfile)
{
- struct tun_struct *tun = tfile->detached;
+ struct tun_struct *tun = rtnl_dereference(tfile->detached);
- tfile->detached = NULL;
+ rcu_assign_pointer(tfile->detached, NULL);
list_del_init(&tfile->next);
--tun->numdisabled;
return tun;
@@ -402,26 +403,27 @@ static struct tun_struct *tun_enable_queue(struct tun_file *tfile)
static void __tun_detach(struct tun_file *tfile, bool clean)
{
struct tun_file *ntfile;
- struct tun_struct *tun;
+ struct tun_struct *tun, *detached;
struct net_device *dev;
tun = rtnl_dereference(tfile->tun);
+ detached = rtnl_dereference(tfile->detached);
- if (tun) {
+ if (tun && !detached) {
u16 index = tfile->queue_index;
BUG_ON(index >= tun->numqueues);
dev = tun->dev;
rcu_assign_pointer(tun->tfiles[index],
tun->tfiles[tun->numqueues - 1]);
- rcu_assign_pointer(tfile->tun, NULL);
ntfile = rtnl_dereference(tun->tfiles[index]);
ntfile->queue_index = index;
--tun->numqueues;
- if (clean)
+ if (clean) {
+ rcu_assign_pointer(tfile->tun, NULL);
sock_put(&tfile->sk);
- else
+ } else
tun_disable_queue(tun, tfile);
synchronize_net();
@@ -429,7 +431,7 @@ static void __tun_detach(struct tun_file *tfile, bool clean)
/* Drop read queue */
skb_queue_purge(&tfile->sk.sk_receive_queue);
tun_set_real_num_queues(tun);
- } else if (tfile->detached && clean) {
+ } else if (detached && clean) {
tun = tun_enable_queue(tfile);
sock_put(&tfile->sk);
}
@@ -466,6 +468,10 @@ static void tun_detach_all(struct net_device *dev)
rcu_assign_pointer(tfile->tun, NULL);
--tun->numqueues;
}
+ list_for_each_entry(tfile, &tun->disabled, next) {
+ wake_up_all(&tfile->wq.wait);
+ rcu_assign_pointer(tfile->tun, NULL);
+ }
BUG_ON(tun->numqueues != 0);
synchronize_net();
@@ -496,7 +502,7 @@ static int tun_attach(struct tun_struct *tun, struct file *file)
goto out;
err = -EINVAL;
- if (rtnl_dereference(tfile->tun))
+ if (rtnl_dereference(tfile->tun) && !rtnl_dereference(tfile->detached))
goto out;
err = -EBUSY;
@@ -504,7 +510,7 @@ static int tun_attach(struct tun_struct *tun, struct file *file)
goto out;
err = -E2BIG;
- if (!tfile->detached &&
+ if (!rtnl_dereference(tfile->detached) &&
tun->numqueues + tun->numdisabled == MAX_TAP_QUEUES)
goto out;
@@ -521,7 +527,7 @@ static int tun_attach(struct tun_struct *tun, struct file *file)
rcu_assign_pointer(tun->tfiles[tun->numqueues], tfile);
tun->numqueues++;
- if (tfile->detached)
+ if (rtnl_dereference(tfile->detached))
tun_enable_queue(tfile);
else
sock_hold(&tfile->sk);
@@ -1195,7 +1201,7 @@ static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile,
tun->dev->stats.rx_packets++;
tun->dev->stats.rx_bytes += len;
- tun_flow_update(tun, rxhash, tfile->queue_index);
+ tun_flow_update(tun, rxhash, tfile);
return total_len;
}
@@ -1552,7 +1558,7 @@ static int tun_set_iff(struct net *net, struct file *file, struct ifreq *ifr)
struct net_device *dev;
int err;
- if (tfile->detached)
+ if (rtnl_dereference(tfile->detached))
return -EINVAL;
dev = __dev_get_by_name(net, ifr->ifr_name);
@@ -1796,7 +1802,7 @@ static int tun_set_queue(struct file *file, struct ifreq *ifr)
rtnl_lock();
if (ifr->ifr_flags & IFF_ATTACH_QUEUE) {
- tun = tfile->detached;
+ tun = rtnl_dereference(tfile->detached);
if (!tun) {
ret = -EINVAL;
goto unlock;
@@ -1807,7 +1813,8 @@ static int tun_set_queue(struct file *file, struct ifreq *ifr)
ret = tun_attach(tun, file);
} else if (ifr->ifr_flags & IFF_DETACH_QUEUE) {
tun = rtnl_dereference(tfile->tun);
- if (!tun || !(tun->flags & TUN_TAP_MQ))
+ if (!tun || !(tun->flags & TUN_TAP_MQ) ||
+ rtnl_dereference(tfile->detached))
ret = -EINVAL;
else
__tun_detach(tfile, false);
--
1.7.1
* Re: [PATCH V6 3/3] tuntap: allow polling/writing/reading when detached
2013-01-16 15:44 ` [PATCH V6 3/3] tuntap: allow polling/writing/reading when detached Jason Wang
@ 2013-01-16 17:03 ` Michael S. Tsirkin
2013-01-17 1:16 ` Jason Wang
2013-01-24 10:38 ` Michael S. Tsirkin
1 sibling, 1 reply; 12+ messages in thread
From: Michael S. Tsirkin @ 2013-01-16 17:03 UTC (permalink / raw)
To: Jason Wang; +Cc: davem, netdev, linux-kernel
On Wed, Jan 16, 2013 at 11:44:38PM +0800, Jason Wang wrote:
> We forbid polling, writing and reading when the file is detached; this may
> complicate things for the user in several cases:
>
> - when a guest passes some buffers to vhost/qemu and then disables some queues,
> vhost/qemu needs to do its own cleanup on those buffers, which is sometimes
> complex. We can avoid this simply by still allowing a user to write to a
> disabled queue: a write to a disabled queue will pass the packet to the
> kernel, and a read will get nothing.
> - align the polling behavior with macvtap, which never fails once the queue is
> created. This can simplify the polling error handling of its users (e.g. vhost)
>
> In order to achieve this, tfile->tun is no longer set to NULL when detached, and
> tfile->detached is converted to be RCU protected so that the data path can
> check whether the file is detached in a lockless manner. This is used to
> prevent the flow caches from being updated for a detached queue.
>
> Signed-off-by: Jason Wang <jasowang@redhat.com>
> ---
> drivers/net/tun.c | 45 ++++++++++++++++++++++++++-------------------
> 1 files changed, 26 insertions(+), 19 deletions(-)
>
> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> index c81680d..ec539a9 100644
> --- a/drivers/net/tun.c
> +++ b/drivers/net/tun.c
> @@ -139,7 +139,7 @@ struct tun_file {
> unsigned int flags;
> u16 queue_index;
> struct list_head next;
> - struct tun_struct *detached;
> + struct tun_struct __rcu *detached;
> };
>
> struct tun_flow_entry {
> @@ -295,11 +295,12 @@ static void tun_flow_cleanup(unsigned long data)
> }
>
> static void tun_flow_update(struct tun_struct *tun, u32 rxhash,
> - u16 queue_index)
> + struct tun_file *tfile)
> {
> struct hlist_head *head;
> struct tun_flow_entry *e;
> unsigned long delay = tun->ageing_time;
> + u16 queue_index = tfile->queue_index;
>
> if (!rxhash)
> return;
> @@ -308,7 +309,7 @@ static void tun_flow_update(struct tun_struct *tun, u32 rxhash,
>
> rcu_read_lock();
>
> - if (tun->numqueues == 1)
> + if (tun->numqueues == 1 || rcu_dereference(tfile->detached))
> goto unlock;
>
> e = tun_flow_find(head, rxhash);
Sorry, still an issue with this one.
u16 index = tfile->queue_index;
BUG_ON(index >= tun->numqueues);
dev = tun->dev;
rcu_assign_pointer(tun->tfiles[index],
tun->tfiles[tun->numqueues - 1]);
rcu_assign_pointer(tfile->tun, NULL);
ntfile = rtnl_dereference(tun->tfiles[index]);
ntfile->queue_index = index;
--tun->numqueues;
if (clean)
sock_put(&tfile->sk);
else
tun_disable_queue(tun, tfile);
You should first disable the queue, then synchronize the network, and
only then play with the tfiles array.
As it is, you might have removed the file from the array but
not yet set the detached flag, so the queue_index
above is stable.
On enable, clear detached as the last thing.
> @@ -384,16 +385,16 @@ static void tun_set_real_num_queues(struct tun_struct *tun)
>
> static void tun_disable_queue(struct tun_struct *tun, struct tun_file *tfile)
> {
> - tfile->detached = tun;
> + rcu_assign_pointer(tfile->detached, tun);
> list_add_tail(&tfile->next, &tun->disabled);
> ++tun->numdisabled;
> }
>
> static struct tun_struct *tun_enable_queue(struct tun_file *tfile)
> {
> - struct tun_struct *tun = tfile->detached;
> + struct tun_struct *tun = rtnl_dereference(tfile->detached);
>
> - tfile->detached = NULL;
> + rcu_assign_pointer(tfile->detached, NULL);
> list_del_init(&tfile->next);
> --tun->numdisabled;
> return tun;
> @@ -402,26 +403,27 @@ static struct tun_struct *tun_enable_queue(struct tun_file *tfile)
> static void __tun_detach(struct tun_file *tfile, bool clean)
> {
> struct tun_file *ntfile;
> - struct tun_struct *tun;
> + struct tun_struct *tun, *detached;
> struct net_device *dev;
>
> tun = rtnl_dereference(tfile->tun);
> + detached = rtnl_dereference(tfile->detached);
>
> - if (tun) {
> + if (tun && !detached) {
> u16 index = tfile->queue_index;
> BUG_ON(index >= tun->numqueues);
> dev = tun->dev;
>
> rcu_assign_pointer(tun->tfiles[index],
> tun->tfiles[tun->numqueues - 1]);
> - rcu_assign_pointer(tfile->tun, NULL);
> ntfile = rtnl_dereference(tun->tfiles[index]);
> ntfile->queue_index = index;
>
> --tun->numqueues;
> - if (clean)
> + if (clean) {
> + rcu_assign_pointer(tfile->tun, NULL);
> sock_put(&tfile->sk);
> - else
> + } else
> tun_disable_queue(tun, tfile);
>
> synchronize_net();
> @@ -429,7 +431,7 @@ static void __tun_detach(struct tun_file *tfile, bool clean)
> /* Drop read queue */
> skb_queue_purge(&tfile->sk.sk_receive_queue);
> tun_set_real_num_queues(tun);
> - } else if (tfile->detached && clean) {
> + } else if (detached && clean) {
> tun = tun_enable_queue(tfile);
> sock_put(&tfile->sk);
> }
> @@ -466,6 +468,10 @@ static void tun_detach_all(struct net_device *dev)
> rcu_assign_pointer(tfile->tun, NULL);
> --tun->numqueues;
> }
> + list_for_each_entry(tfile, &tun->disabled, next) {
> + wake_up_all(&tfile->wq.wait);
> + rcu_assign_pointer(tfile->tun, NULL);
> + }
> BUG_ON(tun->numqueues != 0);
>
> synchronize_net();
> @@ -496,7 +502,7 @@ static int tun_attach(struct tun_struct *tun, struct file *file)
> goto out;
>
> err = -EINVAL;
> - if (rtnl_dereference(tfile->tun))
> + if (rtnl_dereference(tfile->tun) && !rtnl_dereference(tfile->detached))
> goto out;
>
> err = -EBUSY;
> @@ -504,7 +510,7 @@ static int tun_attach(struct tun_struct *tun, struct file *file)
> goto out;
>
> err = -E2BIG;
> - if (!tfile->detached &&
> + if (!rtnl_dereference(tfile->detached) &&
> tun->numqueues + tun->numdisabled == MAX_TAP_QUEUES)
> goto out;
>
> @@ -521,7 +527,7 @@ static int tun_attach(struct tun_struct *tun, struct file *file)
> rcu_assign_pointer(tun->tfiles[tun->numqueues], tfile);
> tun->numqueues++;
>
> - if (tfile->detached)
> + if (rtnl_dereference(tfile->detached))
> tun_enable_queue(tfile);
> else
> sock_hold(&tfile->sk);
> @@ -1195,7 +1201,7 @@ static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile,
> tun->dev->stats.rx_packets++;
> tun->dev->stats.rx_bytes += len;
>
> - tun_flow_update(tun, rxhash, tfile->queue_index);
> + tun_flow_update(tun, rxhash, tfile);
> return total_len;
> }
>
> @@ -1552,7 +1558,7 @@ static int tun_set_iff(struct net *net, struct file *file, struct ifreq *ifr)
> struct net_device *dev;
> int err;
>
> - if (tfile->detached)
> + if (rtnl_dereference(tfile->detached))
> return -EINVAL;
>
> dev = __dev_get_by_name(net, ifr->ifr_name);
> @@ -1796,7 +1802,7 @@ static int tun_set_queue(struct file *file, struct ifreq *ifr)
> rtnl_lock();
>
> if (ifr->ifr_flags & IFF_ATTACH_QUEUE) {
> - tun = tfile->detached;
> + tun = rtnl_dereference(tfile->detached);
> if (!tun) {
> ret = -EINVAL;
> goto unlock;
> @@ -1807,7 +1813,8 @@ static int tun_set_queue(struct file *file, struct ifreq *ifr)
> ret = tun_attach(tun, file);
> } else if (ifr->ifr_flags & IFF_DETACH_QUEUE) {
> tun = rtnl_dereference(tfile->tun);
> - if (!tun || !(tun->flags & TUN_TAP_MQ))
> + if (!tun || !(tun->flags & TUN_TAP_MQ) ||
> + rtnl_dereference(tfile->detached))
> ret = -EINVAL;
> else
> __tun_detach(tfile, false);
> --
> 1.7.1
* Re: [PATCH V6 3/3] tuntap: allow polling/writing/reading when detached
2013-01-16 17:03 ` Michael S. Tsirkin
@ 2013-01-17 1:16 ` Jason Wang
2013-01-24 10:12 ` Jason Wang
0 siblings, 1 reply; 12+ messages in thread
From: Jason Wang @ 2013-01-17 1:16 UTC (permalink / raw)
To: Michael S. Tsirkin; +Cc: davem, netdev, linux-kernel
On 01/17/2013 01:03 AM, Michael S. Tsirkin wrote:
> On Wed, Jan 16, 2013 at 11:44:38PM +0800, Jason Wang wrote:
>> We forbid polling, writing and reading when the file is detached; this may
>> complicate things for the user in several cases:
>>
>> - when a guest passes some buffers to vhost/qemu and then disables some queues,
>> vhost/qemu needs to do its own cleanup on those buffers, which is sometimes
>> complex. We can avoid this simply by still allowing a user to write to a
>> disabled queue: a write to a disabled queue will pass the packet to the
>> kernel, and a read will get nothing.
>> - align the polling behavior with macvtap, which never fails once the queue is
>> created. This can simplify the polling error handling of its users (e.g. vhost)
>>
>> In order to achieve this, tfile->tun is no longer set to NULL when detached, and
>> tfile->detached is converted to be RCU protected so that the data path can
>> check whether the file is detached in a lockless manner. This is used to
>> prevent the flow caches from being updated for a detached queue.
>>
>> Signed-off-by: Jason Wang <jasowang@redhat.com>
>> ---
>> drivers/net/tun.c | 45 ++++++++++++++++++++++++++-------------------
>> 1 files changed, 26 insertions(+), 19 deletions(-)
>>
>> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
>> index c81680d..ec539a9 100644
>> --- a/drivers/net/tun.c
>> +++ b/drivers/net/tun.c
>> @@ -139,7 +139,7 @@ struct tun_file {
>> unsigned int flags;
>> u16 queue_index;
>> struct list_head next;
>> - struct tun_struct *detached;
>> + struct tun_struct __rcu *detached;
>> };
>>
>> struct tun_flow_entry {
>> @@ -295,11 +295,12 @@ static void tun_flow_cleanup(unsigned long data)
>> }
>>
>> static void tun_flow_update(struct tun_struct *tun, u32 rxhash,
>> - u16 queue_index)
>> + struct tun_file *tfile)
>> {
>> struct hlist_head *head;
>> struct tun_flow_entry *e;
>> unsigned long delay = tun->ageing_time;
>> + u16 queue_index = tfile->queue_index;
>>
>> if (!rxhash)
>> return;
>> @@ -308,7 +309,7 @@ static void tun_flow_update(struct tun_struct *tun, u32 rxhash,
>>
>> rcu_read_lock();
>>
>> - if (tun->numqueues == 1)
>> + if (tun->numqueues == 1 || rcu_dereference(tfile->detached))
>> goto unlock;
>>
>> e = tun_flow_find(head, rxhash);
> Sorry, still an issue with this one.
No problem, thanks for checking.
>
> u16 index = tfile->queue_index;
> BUG_ON(index >= tun->numqueues);
> dev = tun->dev;
>
> rcu_assign_pointer(tun->tfiles[index],
> tun->tfiles[tun->numqueues - 1]);
> rcu_assign_pointer(tfile->tun, NULL);
> ntfile = rtnl_dereference(tun->tfiles[index]);
> ntfile->queue_index = index;
>
> --tun->numqueues;
> if (clean)
> sock_put(&tfile->sk);
> else
> tun_disable_queue(tun, tfile);
>
> You should first disable queue then synchronize network
> only then play with tfiles array.
> As it is you might have removed file from array but
> did not set detached flag yet, so queue_index
> above is stable.
I think the code is ok here. With this patch, before synchronize_net(),
the only thing we do for the tfile that will be detached is to set
tfile->detached (tun_disable_queue()), and the queue_index is kept
unchanged. So if the data path doesn't see the new value of detached, it
can still treat the tfile as undetached and do the sending and receiving
as usual. We only do the cleanup after the synchronization, when all
readers are guaranteed to see the new detached value.
For the tfile that will be moved to the new place, some (very little)
out-of-order delivery will occur, which I think is acceptable and can be
optimized in the future.
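The argument above can be made concrete with a hypothetical single-snapshot model: a reader that samples the state before the detached flag is published sees a fully valid queue, while one that samples after sees detached and bails early, which is why queue_index can safely stay untouched until after the grace period. This is only an illustration of the two possible reader outcomes, not the real tun data path:

```c
/* Hypothetical snapshot of tfile state as one RCU reader would see it. */
struct model_tfile {
	int detached;		/* models rcu_dereference(tfile->detached) */
	unsigned short queue_index;
};

enum reader_outcome { USE_QUEUE, SKIP_QUEUE };

/* Models what the data path does with a single RCU-protected snapshot:
 * either the old state (undetached, valid queue_index) or the new one
 * (detached, so the index is never consulted). */
enum reader_outcome model_reader(const struct model_tfile *snap)
{
	if (snap->detached)
		return SKIP_QUEUE;	/* new value visible: bail early */
	return USE_QUEUE;		/* old value: queue_index still valid */
}
```

Either outcome is safe, which matches the claim that the cleanup only needs to happen after synchronize_net().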
>
> On enable, clear detached last thing.
>
>> @@ -384,16 +385,16 @@ static void tun_set_real_num_queues(struct tun_struct *tun)
>>
>> static void tun_disable_queue(struct tun_struct *tun, struct tun_file *tfile)
>> {
>> - tfile->detached = tun;
>> + rcu_assign_pointer(tfile->detached, tun);
>> list_add_tail(&tfile->next, &tun->disabled);
>> ++tun->numdisabled;
>> }
>>
>> static struct tun_struct *tun_enable_queue(struct tun_file *tfile)
>> {
>> - struct tun_struct *tun = tfile->detached;
>> + struct tun_struct *tun = rtnl_dereference(tfile->detached);
>>
>> - tfile->detached = NULL;
>> + rcu_assign_pointer(tfile->detached, NULL);
>> list_del_init(&tfile->next);
>> --tun->numdisabled;
>> return tun;
>> @@ -402,26 +403,27 @@ static struct tun_struct *tun_enable_queue(struct tun_file *tfile)
>> static void __tun_detach(struct tun_file *tfile, bool clean)
>> {
>> struct tun_file *ntfile;
>> - struct tun_struct *tun;
>> + struct tun_struct *tun, *detached;
>> struct net_device *dev;
>>
>> tun = rtnl_dereference(tfile->tun);
>> + detached = rtnl_dereference(tfile->detached);
>>
>> - if (tun) {
>> + if (tun && !detached) {
>> u16 index = tfile->queue_index;
>> BUG_ON(index >= tun->numqueues);
>> dev = tun->dev;
>>
>> rcu_assign_pointer(tun->tfiles[index],
>> tun->tfiles[tun->numqueues - 1]);
>> - rcu_assign_pointer(tfile->tun, NULL);
>> ntfile = rtnl_dereference(tun->tfiles[index]);
>> ntfile->queue_index = index;
>>
>> --tun->numqueues;
>> - if (clean)
>> + if (clean) {
>> + rcu_assign_pointer(tfile->tun, NULL);
>> sock_put(&tfile->sk);
>> - else
>> + } else
>> tun_disable_queue(tun, tfile);
>>
>> synchronize_net();
>> @@ -429,7 +431,7 @@ static void __tun_detach(struct tun_file *tfile, bool clean)
>> /* Drop read queue */
>> skb_queue_purge(&tfile->sk.sk_receive_queue);
>> tun_set_real_num_queues(tun);
>> - } else if (tfile->detached && clean) {
>> + } else if (detached && clean) {
>> tun = tun_enable_queue(tfile);
>> sock_put(&tfile->sk);
>> }
>> @@ -466,6 +468,10 @@ static void tun_detach_all(struct net_device *dev)
>> rcu_assign_pointer(tfile->tun, NULL);
>> --tun->numqueues;
>> }
>> + list_for_each_entry(tfile, &tun->disabled, next) {
>> + wake_up_all(&tfile->wq.wait);
>> + rcu_assign_pointer(tfile->tun, NULL);
>> + }
>> BUG_ON(tun->numqueues != 0);
>>
>> synchronize_net();
>> @@ -496,7 +502,7 @@ static int tun_attach(struct tun_struct *tun, struct file *file)
>> goto out;
>>
>> err = -EINVAL;
>> - if (rtnl_dereference(tfile->tun))
>> + if (rtnl_dereference(tfile->tun) && !rtnl_dereference(tfile->detached))
>> goto out;
>>
>> err = -EBUSY;
>> @@ -504,7 +510,7 @@ static int tun_attach(struct tun_struct *tun, struct file *file)
>> goto out;
>>
>> err = -E2BIG;
>> - if (!tfile->detached &&
>> + if (!rtnl_dereference(tfile->detached) &&
>> tun->numqueues + tun->numdisabled == MAX_TAP_QUEUES)
>> goto out;
>>
>> @@ -521,7 +527,7 @@ static int tun_attach(struct tun_struct *tun, struct file *file)
>> rcu_assign_pointer(tun->tfiles[tun->numqueues], tfile);
>> tun->numqueues++;
>>
>> - if (tfile->detached)
>> + if (rtnl_dereference(tfile->detached))
>> tun_enable_queue(tfile);
>> else
>> sock_hold(&tfile->sk);
>> @@ -1195,7 +1201,7 @@ static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile,
>> tun->dev->stats.rx_packets++;
>> tun->dev->stats.rx_bytes += len;
>>
>> - tun_flow_update(tun, rxhash, tfile->queue_index);
>> + tun_flow_update(tun, rxhash, tfile);
>> return total_len;
>> }
>>
>> @@ -1552,7 +1558,7 @@ static int tun_set_iff(struct net *net, struct file *file, struct ifreq *ifr)
>> struct net_device *dev;
>> int err;
>>
>> - if (tfile->detached)
>> + if (rtnl_dereference(tfile->detached))
>> return -EINVAL;
>>
>> dev = __dev_get_by_name(net, ifr->ifr_name);
>> @@ -1796,7 +1802,7 @@ static int tun_set_queue(struct file *file, struct ifreq *ifr)
>> rtnl_lock();
>>
>> if (ifr->ifr_flags & IFF_ATTACH_QUEUE) {
>> - tun = tfile->detached;
>> + tun = rtnl_dereference(tfile->detached);
>> if (!tun) {
>> ret = -EINVAL;
>> goto unlock;
>> @@ -1807,7 +1813,8 @@ static int tun_set_queue(struct file *file, struct ifreq *ifr)
>> ret = tun_attach(tun, file);
>> } else if (ifr->ifr_flags & IFF_DETACH_QUEUE) {
>> tun = rtnl_dereference(tfile->tun);
>> - if (!tun || !(tun->flags & TUN_TAP_MQ))
>> + if (!tun || !(tun->flags & TUN_TAP_MQ) ||
>> + rtnl_dereference(tfile->detached))
>> ret = -EINVAL;
>> else
>> __tun_detach(tfile, false);
>> --
>> 1.7.1
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH V6 3/3] tuntap: allow polling/writing/reading when detached
2013-01-17 1:16 ` Jason Wang
@ 2013-01-24 10:12 ` Jason Wang
2013-01-24 10:37 ` Michael S. Tsirkin
0 siblings, 1 reply; 12+ messages in thread
From: Jason Wang @ 2013-01-24 10:12 UTC (permalink / raw)
To: Michael S. Tsirkin; +Cc: davem, netdev, linux-kernel
On 01/17/2013 09:16 AM, Jason Wang wrote:
> On 01/17/2013 01:03 AM, Michael S. Tsirkin wrote:
>> On Wed, Jan 16, 2013 at 11:44:38PM +0800, Jason Wang wrote:
>>> We forbid polling, writing and reading when the file is detached; this may
>>> complicate things for the user in several cases:
>>>
>>> - when the guest passes some buffers to vhost/qemu and then disables some
>>> queues, host/qemu needs to do its own cleanup on those buffers, which is
>>> sometimes complex. We can avoid this simply by allowing a user to still
>>> write to a disabled queue. A write to a disabled queue will cause the
>>> packet to pass to the kernel, and a read will get nothing.
>>> - align the polling behavior with macvtap, which never fails once the
>>> queue is created. This can simplify the polling error handling of its
>>> users (e.g. vhost)
>>>
>>> In order to achieve this, tfile->tun is not assigned NULL when detached,
>>> and tfile->detached is converted to be RCU protected so that the data path
>>> can check whether the file is detached in a lockless manner. This will be
>>> used to prevent the flow caches from being updated for a detached queue.
>>>
>>> Signed-off-by: Jason Wang <jasowang@redhat.com>
>>> ---
>>> drivers/net/tun.c | 45 ++++++++++++++++++++++++++-------------------
>>> 1 files changed, 26 insertions(+), 19 deletions(-)
>>>
>>> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
>>> index c81680d..ec539a9 100644
>>> --- a/drivers/net/tun.c
>>> +++ b/drivers/net/tun.c
>>> @@ -139,7 +139,7 @@ struct tun_file {
>>> unsigned int flags;
>>> u16 queue_index;
>>> struct list_head next;
>>> - struct tun_struct *detached;
>>> + struct tun_struct __rcu *detached;
>>> };
>>>
>>> struct tun_flow_entry {
>>> @@ -295,11 +295,12 @@ static void tun_flow_cleanup(unsigned long data)
>>> }
>>>
>>> static void tun_flow_update(struct tun_struct *tun, u32 rxhash,
>>> - u16 queue_index)
>>> + struct tun_file *tfile)
>>> {
>>> struct hlist_head *head;
>>> struct tun_flow_entry *e;
>>> unsigned long delay = tun->ageing_time;
>>> + u16 queue_index = tfile->queue_index;
>>>
>>> if (!rxhash)
>>> return;
>>> @@ -308,7 +309,7 @@ static void tun_flow_update(struct tun_struct *tun, u32 rxhash,
>>>
>>> rcu_read_lock();
>>>
>>> - if (tun->numqueues == 1)
>>> + if (tun->numqueues == 1 || rcu_dereference(tfile->detached))
>>> goto unlock;
>>>
>>> e = tun_flow_find(head, rxhash);
>> Sorry, still an issue with this one.
> No problem, thanks for checking.
>> u16 index = tfile->queue_index;
>> BUG_ON(index >= tun->numqueues);
>> dev = tun->dev;
>>
>> rcu_assign_pointer(tun->tfiles[index],
>> tun->tfiles[tun->numqueues - 1]);
>> rcu_assign_pointer(tfile->tun, NULL);
>> ntfile = rtnl_dereference(tun->tfiles[index]);
>> ntfile->queue_index = index;
>>
>> --tun->numqueues;
>> if (clean)
>> sock_put(&tfile->sk);
>> else
>> tun_disable_queue(tun, tfile);
>>
>> You should first disable the queue, then synchronize_net(), and
>> only then play with the tfiles array.
>> As it is, you might have removed the file from the array but
>> not set the detached flag yet, so the queue_index
>> above is stale.
> I think the code is OK here. With this patch, before synchronize_net(),
> the only thing we do for the tfile that will be detached is to set
> tfile->detached (tun_disable_queue), and the queue_index is kept
> unchanged. So if the data path doesn't see the new value of detached, it
> can still treat the tfile as undetached and do the sending and receiving
> as usual. We only do the cleanup after the synchronization, by which
> point all readers are guaranteed to see the new detached value.
>
> For the tfile that will be moved to the new place, some (very little)
> out-of-order delivery will occur, which I think is acceptable and can be
> optimized in the future.
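The ordering described above — publish tfile->detached first, synchronize, and only then clean up, while the data path merely skips flow updates for a detached queue — can be modeled in plain userspace C. This is a hedged sketch using C11 atomics in place of RCU publish/subscribe; the struct and function names are invented for illustration, not taken from tun.c:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Toy model of the tun detach ordering: 'detached' stands in for the
 * RCU-protected tfile->detached pointer. It is published with release
 * semantics and read with acquire semantics, mirroring the
 * rcu_assign_pointer()/rcu_dereference() pair in the patch. */
struct tfile_model {
    _Atomic bool detached;
    int queue_index;
};

/* tun_disable_queue() analogue: mark the queue detached but keep
 * queue_index untouched, so a reader that misses the new value still
 * sees a consistent, usable queue. */
static void disable_queue(struct tfile_model *t)
{
    atomic_store_explicit(&t->detached, true, memory_order_release);
    /* synchronize_net() would run here; cleanup happens only after it,
     * once every reader is guaranteed to observe detached == true. */
}

/* Data-path analogue of the tun_flow_update() check: a detached queue
 * must never update the flow cache. */
static bool may_update_flow(struct tfile_model *t)
{
    return !atomic_load_explicit(&t->detached, memory_order_acquire);
}
```

The release/acquire pair here only stands in for rcu_assign_pointer()/rcu_dereference(); the real code additionally relies on synchronize_net() before touching anything a reader might still be using.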
Having thought more about this patch, it looks suboptimal since:
- If we can make sure no packets are sent to the disabled queue and
stop the vhost thread during switching (so it can flush), there's no
need for this patch.
- Allowing writing/polling to a detached fd is strange and can hide
bugs in userspace / the guest driver.
So it looks like we'd better drop this patch?
>> On enable, clear detached last thing.
>>
>>> @@ -384,16 +385,16 @@ static void tun_set_real_num_queues(struct tun_struct *tun)
>>>
>>> static void tun_disable_queue(struct tun_struct *tun, struct tun_file *tfile)
>>> {
>>> - tfile->detached = tun;
>>> + rcu_assign_pointer(tfile->detached, tun);
>>> list_add_tail(&tfile->next, &tun->disabled);
>>> ++tun->numdisabled;
>>> }
>>>
>>> static struct tun_struct *tun_enable_queue(struct tun_file *tfile)
>>> {
>>> - struct tun_struct *tun = tfile->detached;
>>> + struct tun_struct *tun = rtnl_dereference(tfile->detached);
>>>
>>> - tfile->detached = NULL;
>>> + rcu_assign_pointer(tfile->detached, NULL);
>>> list_del_init(&tfile->next);
>>> --tun->numdisabled;
>>> return tun;
>>> @@ -402,26 +403,27 @@ static struct tun_struct *tun_enable_queue(struct tun_file *tfile)
>>> static void __tun_detach(struct tun_file *tfile, bool clean)
>>> {
>>> struct tun_file *ntfile;
>>> - struct tun_struct *tun;
>>> + struct tun_struct *tun, *detached;
>>> struct net_device *dev;
>>>
>>> tun = rtnl_dereference(tfile->tun);
>>> + detached = rtnl_dereference(tfile->detached);
>>>
>>> - if (tun) {
>>> + if (tun && !detached) {
>>> u16 index = tfile->queue_index;
>>> BUG_ON(index >= tun->numqueues);
>>> dev = tun->dev;
>>>
>>> rcu_assign_pointer(tun->tfiles[index],
>>> tun->tfiles[tun->numqueues - 1]);
>>> - rcu_assign_pointer(tfile->tun, NULL);
>>> ntfile = rtnl_dereference(tun->tfiles[index]);
>>> ntfile->queue_index = index;
>>>
>>> --tun->numqueues;
>>> - if (clean)
>>> + if (clean) {
>>> + rcu_assign_pointer(tfile->tun, NULL);
>>> sock_put(&tfile->sk);
>>> - else
>>> + } else
>>> tun_disable_queue(tun, tfile);
>>>
>>> synchronize_net();
>>> @@ -429,7 +431,7 @@ static void __tun_detach(struct tun_file *tfile, bool clean)
>>> /* Drop read queue */
>>> skb_queue_purge(&tfile->sk.sk_receive_queue);
>>> tun_set_real_num_queues(tun);
>>> - } else if (tfile->detached && clean) {
>>> + } else if (detached && clean) {
>>> tun = tun_enable_queue(tfile);
>>> sock_put(&tfile->sk);
>>> }
>>> @@ -466,6 +468,10 @@ static void tun_detach_all(struct net_device *dev)
>>> rcu_assign_pointer(tfile->tun, NULL);
>>> --tun->numqueues;
>>> }
>>> + list_for_each_entry(tfile, &tun->disabled, next) {
>>> + wake_up_all(&tfile->wq.wait);
>>> + rcu_assign_pointer(tfile->tun, NULL);
>>> + }
>>> BUG_ON(tun->numqueues != 0);
>>>
>>> synchronize_net();
>>> @@ -496,7 +502,7 @@ static int tun_attach(struct tun_struct *tun, struct file *file)
>>> goto out;
>>>
>>> err = -EINVAL;
>>> - if (rtnl_dereference(tfile->tun))
>>> + if (rtnl_dereference(tfile->tun) && !rtnl_dereference(tfile->detached))
>>> goto out;
>>>
>>> err = -EBUSY;
>>> @@ -504,7 +510,7 @@ static int tun_attach(struct tun_struct *tun, struct file *file)
>>> goto out;
>>>
>>> err = -E2BIG;
>>> - if (!tfile->detached &&
>>> + if (!rtnl_dereference(tfile->detached) &&
>>> tun->numqueues + tun->numdisabled == MAX_TAP_QUEUES)
>>> goto out;
>>>
>>> @@ -521,7 +527,7 @@ static int tun_attach(struct tun_struct *tun, struct file *file)
>>> rcu_assign_pointer(tun->tfiles[tun->numqueues], tfile);
>>> tun->numqueues++;
>>>
>>> - if (tfile->detached)
>>> + if (rtnl_dereference(tfile->detached))
>>> tun_enable_queue(tfile);
>>> else
>>> sock_hold(&tfile->sk);
>>> @@ -1195,7 +1201,7 @@ static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile,
>>> tun->dev->stats.rx_packets++;
>>> tun->dev->stats.rx_bytes += len;
>>>
>>> - tun_flow_update(tun, rxhash, tfile->queue_index);
>>> + tun_flow_update(tun, rxhash, tfile);
>>> return total_len;
>>> }
>>>
>>> @@ -1552,7 +1558,7 @@ static int tun_set_iff(struct net *net, struct file *file, struct ifreq *ifr)
>>> struct net_device *dev;
>>> int err;
>>>
>>> - if (tfile->detached)
>>> + if (rtnl_dereference(tfile->detached))
>>> return -EINVAL;
>>>
>>> dev = __dev_get_by_name(net, ifr->ifr_name);
>>> @@ -1796,7 +1802,7 @@ static int tun_set_queue(struct file *file, struct ifreq *ifr)
>>> rtnl_lock();
>>>
>>> if (ifr->ifr_flags & IFF_ATTACH_QUEUE) {
>>> - tun = tfile->detached;
>>> + tun = rtnl_dereference(tfile->detached);
>>> if (!tun) {
>>> ret = -EINVAL;
>>> goto unlock;
>>> @@ -1807,7 +1813,8 @@ static int tun_set_queue(struct file *file, struct ifreq *ifr)
>>> ret = tun_attach(tun, file);
>>> } else if (ifr->ifr_flags & IFF_DETACH_QUEUE) {
>>> tun = rtnl_dereference(tfile->tun);
>>> - if (!tun || !(tun->flags & TUN_TAP_MQ))
>>> + if (!tun || !(tun->flags & TUN_TAP_MQ) ||
>>> + rtnl_dereference(tfile->detached))
>>> ret = -EINVAL;
>>> else
>>> __tun_detach(tfile, false);
>>> --
>>> 1.7.1
* Re: [PATCH V6 3/3] tuntap: allow polling/writing/reading when detached
2013-01-24 10:12 ` Jason Wang
@ 2013-01-24 10:37 ` Michael S. Tsirkin
0 siblings, 0 replies; 12+ messages in thread
From: Michael S. Tsirkin @ 2013-01-24 10:37 UTC (permalink / raw)
To: Jason Wang; +Cc: davem, netdev, linux-kernel
On Thu, Jan 24, 2013 at 06:12:44PM +0800, Jason Wang wrote:
> On 01/17/2013 09:16 AM, Jason Wang wrote:
> > On 01/17/2013 01:03 AM, Michael S. Tsirkin wrote:
> >> On Wed, Jan 16, 2013 at 11:44:38PM +0800, Jason Wang wrote:
> >>> We forbid polling, writing and reading when the file is detached; this
> >>> may complicate things for the user in several cases:
> >>>
> >>> - when the guest passes some buffers to vhost/qemu and then disables some
> >>> queues, host/qemu needs to do its own cleanup on those buffers, which is
> >>> sometimes complex. We can avoid this simply by allowing a user to still
> >>> write to a disabled queue. A write to a disabled queue will cause the
> >>> packet to pass to the kernel, and a read will get nothing.
> >>> - align the polling behavior with macvtap, which never fails once the
> >>> queue is created. This can simplify the polling error handling of its
> >>> users (e.g. vhost)
> >>>
> >>> In order to achieve this, tfile->tun is not assigned NULL when detached,
> >>> and tfile->detached is converted to be RCU protected so that the data
> >>> path can check whether the file is detached in a lockless manner. This
> >>> will be used to prevent the flow caches from being updated for a
> >>> detached queue.
> >>>
> >>> Signed-off-by: Jason Wang <jasowang@redhat.com>
> >>> ---
> >>> drivers/net/tun.c | 45 ++++++++++++++++++++++++++-------------------
> >>> 1 files changed, 26 insertions(+), 19 deletions(-)
> >>>
> >>> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> >>> index c81680d..ec539a9 100644
> >>> --- a/drivers/net/tun.c
> >>> +++ b/drivers/net/tun.c
> >>> @@ -139,7 +139,7 @@ struct tun_file {
> >>> unsigned int flags;
> >>> u16 queue_index;
> >>> struct list_head next;
> >>> - struct tun_struct *detached;
> >>> + struct tun_struct __rcu *detached;
> >>> };
> >>>
> >>> struct tun_flow_entry {
> >>> @@ -295,11 +295,12 @@ static void tun_flow_cleanup(unsigned long data)
> >>> }
> >>>
> >>> static void tun_flow_update(struct tun_struct *tun, u32 rxhash,
> >>> - u16 queue_index)
> >>> + struct tun_file *tfile)
> >>> {
> >>> struct hlist_head *head;
> >>> struct tun_flow_entry *e;
> >>> unsigned long delay = tun->ageing_time;
> >>> + u16 queue_index = tfile->queue_index;
> >>>
> >>> if (!rxhash)
> >>> return;
> >>> @@ -308,7 +309,7 @@ static void tun_flow_update(struct tun_struct *tun, u32 rxhash,
> >>>
> >>> rcu_read_lock();
> >>>
> >>> - if (tun->numqueues == 1)
> >>> + if (tun->numqueues == 1 || rcu_dereference(tfile->detached))
> >>> goto unlock;
> >>>
> >>> e = tun_flow_find(head, rxhash);
> >> Sorry, still an issue with this one.
> > No problem, thanks for checking.
> >> u16 index = tfile->queue_index;
> >> BUG_ON(index >= tun->numqueues);
> >> dev = tun->dev;
> >>
> >> rcu_assign_pointer(tun->tfiles[index],
> >> tun->tfiles[tun->numqueues - 1]);
> >> rcu_assign_pointer(tfile->tun, NULL);
> >> ntfile = rtnl_dereference(tun->tfiles[index]);
> >> ntfile->queue_index = index;
> >>
> >> --tun->numqueues;
> >> if (clean)
> >> sock_put(&tfile->sk);
> >> else
> >> tun_disable_queue(tun, tfile);
> >>
> >> You should first disable the queue, then synchronize_net(), and
> >> only then play with the tfiles array.
> >> As it is, you might have removed the file from the array but
> >> not set the detached flag yet, so the queue_index
> >> above is stale.
> > I think the code is OK here. With this patch, before synchronize_net(),
> > the only thing we do for the tfile that will be detached is to set
> > tfile->detached (tun_disable_queue), and the queue_index is kept
> > unchanged. So if the data path doesn't see the new value of detached, it
> > can still treat the tfile as undetached and do the sending and receiving
> > as usual. We only do the cleanup after the synchronization, by which
> > point all readers are guaranteed to see the new detached value.
> >
> > For the tfile that will be moved to the new place, some (very little)
> > out-of-order delivery will occur, which I think is acceptable and can be
> > optimized in the future.
>
> Having thought more about this patch, it looks suboptimal since:
>
> - If we can make sure no packets are sent to the disabled queue and
> stop the vhost thread during switching (so it can flush), there's no
> need for this patch.
This assumes synchronization in userspace/vhost; it would make the
datapath slower without real need.
> - Allowing writing/polling to a detached fd
It's not a detached fd - it's attached to tun.
We just disabled receiving packets on it.
> is strange and can hide
> bugs in userspace / the guest driver.
That's a good thing, you don't want a fragile interface.
>
> So it looks like we'd better drop this patch?
I actually think it's the right approach.
And since you clarified, I think the patch is all right.
> >> On enable, clear detached last thing.
> >>
> >>> @@ -384,16 +385,16 @@ static void tun_set_real_num_queues(struct tun_struct *tun)
> >>>
> >>> static void tun_disable_queue(struct tun_struct *tun, struct tun_file *tfile)
> >>> {
> >>> - tfile->detached = tun;
> >>> + rcu_assign_pointer(tfile->detached, tun);
> >>> list_add_tail(&tfile->next, &tun->disabled);
> >>> ++tun->numdisabled;
> >>> }
> >>>
> >>> static struct tun_struct *tun_enable_queue(struct tun_file *tfile)
> >>> {
> >>> - struct tun_struct *tun = tfile->detached;
> >>> + struct tun_struct *tun = rtnl_dereference(tfile->detached);
> >>>
> >>> - tfile->detached = NULL;
> >>> + rcu_assign_pointer(tfile->detached, NULL);
> >>> list_del_init(&tfile->next);
> >>> --tun->numdisabled;
> >>> return tun;
> >>> @@ -402,26 +403,27 @@ static struct tun_struct *tun_enable_queue(struct tun_file *tfile)
> >>> static void __tun_detach(struct tun_file *tfile, bool clean)
> >>> {
> >>> struct tun_file *ntfile;
> >>> - struct tun_struct *tun;
> >>> + struct tun_struct *tun, *detached;
> >>> struct net_device *dev;
> >>>
> >>> tun = rtnl_dereference(tfile->tun);
> >>> + detached = rtnl_dereference(tfile->detached);
> >>>
> >>> - if (tun) {
> >>> + if (tun && !detached) {
> >>> u16 index = tfile->queue_index;
> >>> BUG_ON(index >= tun->numqueues);
> >>> dev = tun->dev;
> >>>
> >>> rcu_assign_pointer(tun->tfiles[index],
> >>> tun->tfiles[tun->numqueues - 1]);
> >>> - rcu_assign_pointer(tfile->tun, NULL);
> >>> ntfile = rtnl_dereference(tun->tfiles[index]);
> >>> ntfile->queue_index = index;
> >>>
> >>> --tun->numqueues;
> >>> - if (clean)
> >>> + if (clean) {
> >>> + rcu_assign_pointer(tfile->tun, NULL);
> >>> sock_put(&tfile->sk);
> >>> - else
> >>> + } else
> >>> tun_disable_queue(tun, tfile);
> >>>
> >>> synchronize_net();
> >>> @@ -429,7 +431,7 @@ static void __tun_detach(struct tun_file *tfile, bool clean)
> >>> /* Drop read queue */
> >>> skb_queue_purge(&tfile->sk.sk_receive_queue);
> >>> tun_set_real_num_queues(tun);
> >>> - } else if (tfile->detached && clean) {
> >>> + } else if (detached && clean) {
> >>> tun = tun_enable_queue(tfile);
> >>> sock_put(&tfile->sk);
> >>> }
> >>> @@ -466,6 +468,10 @@ static void tun_detach_all(struct net_device *dev)
> >>> rcu_assign_pointer(tfile->tun, NULL);
> >>> --tun->numqueues;
> >>> }
> >>> + list_for_each_entry(tfile, &tun->disabled, next) {
> >>> + wake_up_all(&tfile->wq.wait);
> >>> + rcu_assign_pointer(tfile->tun, NULL);
> >>> + }
> >>> BUG_ON(tun->numqueues != 0);
> >>>
> >>> synchronize_net();
> >>> @@ -496,7 +502,7 @@ static int tun_attach(struct tun_struct *tun, struct file *file)
> >>> goto out;
> >>>
> >>> err = -EINVAL;
> >>> - if (rtnl_dereference(tfile->tun))
> >>> + if (rtnl_dereference(tfile->tun) && !rtnl_dereference(tfile->detached))
> >>> goto out;
> >>>
> >>> err = -EBUSY;
> >>> @@ -504,7 +510,7 @@ static int tun_attach(struct tun_struct *tun, struct file *file)
> >>> goto out;
> >>>
> >>> err = -E2BIG;
> >>> - if (!tfile->detached &&
> >>> + if (!rtnl_dereference(tfile->detached) &&
> >>> tun->numqueues + tun->numdisabled == MAX_TAP_QUEUES)
> >>> goto out;
> >>>
> >>> @@ -521,7 +527,7 @@ static int tun_attach(struct tun_struct *tun, struct file *file)
> >>> rcu_assign_pointer(tun->tfiles[tun->numqueues], tfile);
> >>> tun->numqueues++;
> >>>
> >>> - if (tfile->detached)
> >>> + if (rtnl_dereference(tfile->detached))
> >>> tun_enable_queue(tfile);
> >>> else
> >>> sock_hold(&tfile->sk);
> >>> @@ -1195,7 +1201,7 @@ static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile,
> >>> tun->dev->stats.rx_packets++;
> >>> tun->dev->stats.rx_bytes += len;
> >>>
> >>> - tun_flow_update(tun, rxhash, tfile->queue_index);
> >>> + tun_flow_update(tun, rxhash, tfile);
> >>> return total_len;
> >>> }
> >>>
> >>> @@ -1552,7 +1558,7 @@ static int tun_set_iff(struct net *net, struct file *file, struct ifreq *ifr)
> >>> struct net_device *dev;
> >>> int err;
> >>>
> >>> - if (tfile->detached)
> >>> + if (rtnl_dereference(tfile->detached))
> >>> return -EINVAL;
> >>>
> >>> dev = __dev_get_by_name(net, ifr->ifr_name);
> >>> @@ -1796,7 +1802,7 @@ static int tun_set_queue(struct file *file, struct ifreq *ifr)
> >>> rtnl_lock();
> >>>
> >>> if (ifr->ifr_flags & IFF_ATTACH_QUEUE) {
> >>> - tun = tfile->detached;
> >>> + tun = rtnl_dereference(tfile->detached);
> >>> if (!tun) {
> >>> ret = -EINVAL;
> >>> goto unlock;
> >>> @@ -1807,7 +1813,8 @@ static int tun_set_queue(struct file *file, struct ifreq *ifr)
> >>> ret = tun_attach(tun, file);
> >>> } else if (ifr->ifr_flags & IFF_DETACH_QUEUE) {
> >>> tun = rtnl_dereference(tfile->tun);
> >>> - if (!tun || !(tun->flags & TUN_TAP_MQ))
> >>> + if (!tun || !(tun->flags & TUN_TAP_MQ) ||
> >>> + rtnl_dereference(tfile->detached))
> >>> ret = -EINVAL;
> >>> else
> >>> __tun_detach(tfile, false);
> >>> --
> >>> 1.7.1
* Re: [PATCH V6 0/3] handle polling errors in vhost/vhost_net
2013-01-16 15:44 [PATCH V6 0/3] handle polling errors in vhost/vhost_net Jason Wang
` (2 preceding siblings ...)
2013-01-16 15:44 ` [PATCH V6 3/3] tuntap: allow polling/writing/reading when detached Jason Wang
@ 2013-01-24 10:38 ` Michael S. Tsirkin
3 siblings, 0 replies; 12+ messages in thread
From: Michael S. Tsirkin @ 2013-01-24 10:38 UTC (permalink / raw)
To: Jason Wang; +Cc: davem, netdev, linux-kernel
On Wed, Jan 16, 2013 at 11:44:35PM +0800, Jason Wang wrote:
> This is an updated version of the last series, fixing the handling of
> polling errors in vhost/vhost_net.
>
> Currently, vhost and vhost_net ignore polling errors, which can crash the
> kernel when vhost tries to remove itself from the waitqueue after a polling
> failure. Fix this by:
>
> - examining POLLERR when setting the backend and reporting errors to userspace
> - letting tun always add to the waitqueue in .poll() after the queue is
> created, even if it was detached.
>
> Changes from V5:
> - use rcu_dereference() instead of the wrong rtnl_dereference() in data path
> - test with CONFIG_PROVE_RCU
>
> Changes from V4:
> - check the detached state by tfile->detached and protect it by RCU
>
> Changes from V3:
> - make a smaller patch that doesn't touch the whole polling state and only
> checks the polling errors when setting the backend.
> - add a patch that allows tuntap to do polling/reading/writing when detached,
> which could simplify the work of its users.
>
> Changes from v2:
> - check poll->wqh instead of the wrong assumption about POLLERR and waitqueue
> - drop the whole tx polling state check since it was replaced by the wqh
> checking
> - drop the buggy tuntap patch
>
> Changes from v1:
> - restore the state before the ioctl when vhost_init_used() fails
> - log the error when meeting polling errors in the data path
> - don't put into the waitqueue when tun_chr_poll() returns POLLERR
>
> Jason Wang (3):
> vhost_net: correct error handling in vhost_net_set_backend()
> vhost_net: handle polling errors when setting backend
> tuntap: allow polling/writing/reading when detached
>
> drivers/net/tun.c | 45 ++++++++++++++++++++++++++-------------------
> drivers/vhost/net.c | 41 ++++++++++++++++++++++++++++-------------
> drivers/vhost/vhost.c | 18 +++++++++++++++---
> drivers/vhost/vhost.h | 2 +-
> 4 files changed, 70 insertions(+), 36 deletions(-)
Acked-by: Michael S. Tsirkin <mst@redhat.com>
* Re: [PATCH V6 1/3] vhost_net: correct error handling in vhost_net_set_backend()
2013-01-16 15:44 ` [PATCH V6 1/3] vhost_net: correct error handling in vhost_net_set_backend() Jason Wang
@ 2013-01-24 10:38 ` Michael S. Tsirkin
0 siblings, 0 replies; 12+ messages in thread
From: Michael S. Tsirkin @ 2013-01-24 10:38 UTC (permalink / raw)
To: Jason Wang; +Cc: davem, netdev, linux-kernel
On Wed, Jan 16, 2013 at 11:44:36PM +0800, Jason Wang wrote:
> Currently, when vhost_init_used() fails, the sock refcnt and ubufs are
> leaked. Correct this by calling vhost_init_used() before assigning ubufs,
> and restore oldsock when it fails.
>
> Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
> ---
> drivers/vhost/net.c | 16 +++++++++++-----
> 1 files changed, 11 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
> index ebd08b2..d10ad6f 100644
> --- a/drivers/vhost/net.c
> +++ b/drivers/vhost/net.c
> @@ -827,15 +827,16 @@ static long vhost_net_set_backend(struct vhost_net *n, unsigned index, int fd)
> r = PTR_ERR(ubufs);
> goto err_ubufs;
> }
> - oldubufs = vq->ubufs;
> - vq->ubufs = ubufs;
> +
> vhost_net_disable_vq(n, vq);
> rcu_assign_pointer(vq->private_data, sock);
> - vhost_net_enable_vq(n, vq);
> -
> r = vhost_init_used(vq);
> if (r)
> - goto err_vq;
> + goto err_used;
> + vhost_net_enable_vq(n, vq);
> +
> + oldubufs = vq->ubufs;
> + vq->ubufs = ubufs;
>
> n->tx_packets = 0;
> n->tx_zcopy_err = 0;
> @@ -859,6 +860,11 @@ static long vhost_net_set_backend(struct vhost_net *n, unsigned index, int fd)
> mutex_unlock(&n->dev.mutex);
> return 0;
>
> +err_used:
> + rcu_assign_pointer(vq->private_data, oldsock);
> + vhost_net_enable_vq(n, vq);
> + if (ubufs)
> + vhost_ubuf_put_and_wait(ubufs);
> err_ubufs:
> fput(sock->file);
> err_vq:
> --
> 1.7.1
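The reordering in this patch moves the commit point after the fallible vhost_init_used() call, so the error path can unwind cleanly. A minimal userspace model of that goto-style unwind (the struct fields and step names are hypothetical stand-ins for the sock/ubufs handling, not the vhost code itself):

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical resources standing in for sock/ubufs; each step of
 * vhost_net_set_backend() can be seen as acquiring one of them. */
struct backend_model {
    bool sock_held;   /* refcount taken on the new sock   */
    bool ubufs_alloc; /* ubufs allocated for the new sock */
    bool committed;   /* vq state switched to the new sock */
};

/* Mirrors the patch's ordering: take the resources, attempt the
 * fallible init, and only commit (publish ubufs, enable the vq) after
 * it succeeds. On failure, unwind in reverse order so nothing leaks. */
static int set_backend_model(struct backend_model *m, bool init_ok)
{
    m->sock_held = true;            /* get_socket()            */
    m->ubufs_alloc = true;          /* vhost_ubuf_alloc()      */

    if (!init_ok)                   /* vhost_init_used() fails */
        goto err_used;

    m->committed = true;            /* oldubufs/vq->ubufs swap */
    return 0;

err_used:
    m->ubufs_alloc = false;         /* vhost_ubuf_put_and_wait() */
    m->sock_held = false;           /* fput(sock->file)          */
    return -1;                      /* oldsock restored          */
}
```

The key property, matching the commit message, is that a failure after resource acquisition releases everything and leaves the previous backend in place.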
* Re: [PATCH V6 2/3] vhost_net: handle polling errors when setting backend
2013-01-16 15:44 ` [PATCH V6 2/3] vhost_net: handle polling errors when setting backend Jason Wang
@ 2013-01-24 10:38 ` Michael S. Tsirkin
0 siblings, 0 replies; 12+ messages in thread
From: Michael S. Tsirkin @ 2013-01-24 10:38 UTC (permalink / raw)
To: Jason Wang; +Cc: davem, netdev, linux-kernel
On Wed, Jan 16, 2013 at 11:44:37PM +0800, Jason Wang wrote:
> Currently, polling errors are ignored, which can lead to the following
> issues:
>
> - vhost removes itself unconditionally from the waitqueue when stopping the
> poll; this may crash the kernel, since a previous attempt at starting may
> have failed to add itself to the waitqueue
> - userspace may think the backend was successfully set even when the
> polling failed.
>
> Solve this by:
>
> - checking poll->wqh before trying to remove from the waitqueue
> - reporting polling errors in vhost_poll_start() and tx_poll_start(); the
> return value will be checked and returned when userspace wants to set the
> backend
>
> After this fix, there could still be a polling failure after the backend is
> set; it will be addressed by the next patch.
>
> Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
> ---
> drivers/vhost/net.c | 27 ++++++++++++++++++---------
> drivers/vhost/vhost.c | 18 +++++++++++++++---
> drivers/vhost/vhost.h | 2 +-
> 3 files changed, 34 insertions(+), 13 deletions(-)
>
> diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
> index d10ad6f..959b1cd 100644
> --- a/drivers/vhost/net.c
> +++ b/drivers/vhost/net.c
> @@ -165,12 +165,16 @@ static void tx_poll_stop(struct vhost_net *net)
> }
>
> /* Caller must have TX VQ lock */
> -static void tx_poll_start(struct vhost_net *net, struct socket *sock)
> +static int tx_poll_start(struct vhost_net *net, struct socket *sock)
> {
> + int ret;
> +
> if (unlikely(net->tx_poll_state != VHOST_NET_POLL_STOPPED))
> - return;
> - vhost_poll_start(net->poll + VHOST_NET_VQ_TX, sock->file);
> - net->tx_poll_state = VHOST_NET_POLL_STARTED;
> + return 0;
> + ret = vhost_poll_start(net->poll + VHOST_NET_VQ_TX, sock->file);
> + if (!ret)
> + net->tx_poll_state = VHOST_NET_POLL_STARTED;
> + return ret;
> }
>
> /* In case of DMA done not in order in lower device driver for some reason.
> @@ -642,20 +646,23 @@ static void vhost_net_disable_vq(struct vhost_net *n,
> vhost_poll_stop(n->poll + VHOST_NET_VQ_RX);
> }
>
> -static void vhost_net_enable_vq(struct vhost_net *n,
> +static int vhost_net_enable_vq(struct vhost_net *n,
> struct vhost_virtqueue *vq)
> {
> struct socket *sock;
> + int ret;
>
> sock = rcu_dereference_protected(vq->private_data,
> lockdep_is_held(&vq->mutex));
> if (!sock)
> - return;
> + return 0;
> if (vq == n->vqs + VHOST_NET_VQ_TX) {
> n->tx_poll_state = VHOST_NET_POLL_STOPPED;
> - tx_poll_start(n, sock);
> + ret = tx_poll_start(n, sock);
> } else
> - vhost_poll_start(n->poll + VHOST_NET_VQ_RX, sock->file);
> + ret = vhost_poll_start(n->poll + VHOST_NET_VQ_RX, sock->file);
> +
> + return ret;
> }
>
> static struct socket *vhost_net_stop_vq(struct vhost_net *n,
> @@ -833,7 +840,9 @@ static long vhost_net_set_backend(struct vhost_net *n, unsigned index, int fd)
> r = vhost_init_used(vq);
> if (r)
> goto err_used;
> - vhost_net_enable_vq(n, vq);
> + r = vhost_net_enable_vq(n, vq);
> + if (r)
> + goto err_used;
>
> oldubufs = vq->ubufs;
> vq->ubufs = ubufs;
> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> index 34389f7..9759249 100644
> --- a/drivers/vhost/vhost.c
> +++ b/drivers/vhost/vhost.c
> @@ -77,26 +77,38 @@ void vhost_poll_init(struct vhost_poll *poll, vhost_work_fn_t fn,
> init_poll_funcptr(&poll->table, vhost_poll_func);
> poll->mask = mask;
> poll->dev = dev;
> + poll->wqh = NULL;
>
> vhost_work_init(&poll->work, fn);
> }
>
> /* Start polling a file. We add ourselves to file's wait queue. The caller must
> * keep a reference to a file until after vhost_poll_stop is called. */
> -void vhost_poll_start(struct vhost_poll *poll, struct file *file)
> +int vhost_poll_start(struct vhost_poll *poll, struct file *file)
> {
> unsigned long mask;
> + int ret = 0;
>
> mask = file->f_op->poll(file, &poll->table);
> if (mask)
> vhost_poll_wakeup(&poll->wait, 0, 0, (void *)mask);
> + if (mask & POLLERR) {
> + if (poll->wqh)
> + remove_wait_queue(poll->wqh, &poll->wait);
> + ret = -EINVAL;
> + }
> +
> + return ret;
> }
>
> /* Stop polling a file. After this function returns, it becomes safe to drop the
> * file reference. You must also flush afterwards. */
> void vhost_poll_stop(struct vhost_poll *poll)
> {
> - remove_wait_queue(poll->wqh, &poll->wait);
> + if (poll->wqh) {
> + remove_wait_queue(poll->wqh, &poll->wait);
> + poll->wqh = NULL;
> + }
> }
>
> static bool vhost_work_seq_done(struct vhost_dev *dev, struct vhost_work *work,
> @@ -792,7 +804,7 @@ long vhost_vring_ioctl(struct vhost_dev *d, int ioctl, void __user *argp)
> fput(filep);
>
> if (pollstart && vq->handle_kick)
> - vhost_poll_start(&vq->poll, vq->kick);
> + r = vhost_poll_start(&vq->poll, vq->kick);
>
> mutex_unlock(&vq->mutex);
>
> diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
> index 2639c58..17261e2 100644
> --- a/drivers/vhost/vhost.h
> +++ b/drivers/vhost/vhost.h
> @@ -42,7 +42,7 @@ void vhost_work_queue(struct vhost_dev *dev, struct vhost_work *work);
>
> void vhost_poll_init(struct vhost_poll *poll, vhost_work_fn_t fn,
> unsigned long mask, struct vhost_dev *dev);
> -void vhost_poll_start(struct vhost_poll *poll, struct file *file);
> +int vhost_poll_start(struct vhost_poll *poll, struct file *file);
> void vhost_poll_stop(struct vhost_poll *poll);
> void vhost_poll_flush(struct vhost_poll *poll);
> void vhost_poll_queue(struct vhost_poll *poll);
> --
> 1.7.1
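The invariant behind this patch: a start that observes POLLERR never leaves the waiter registered (wqh stays NULL), and stop only unregisters when wqh is set, so a failed or never-started poll can always be stopped safely. A simplified userspace model of that invariant (hypothetical names; the real code goes through file->f_op->poll and actual wait queues, and also delivers a wakeup for any ready mask):

```c
#include <assert.h>
#include <stddef.h>

#define POLLERR_FLAG 0x8  /* stand-in for POLLERR */

/* Toy poller: 'wqh' is non-NULL exactly while registered on a wait
 * queue, like vhost_poll->wqh after the patch initializes it to NULL. */
struct poll_model {
    void *wqh;
};

/* Model of vhost_poll_start(): register, then if the backend reports
 * POLLERR, unregister again and fail, leaving wqh NULL so a later stop
 * has nothing to remove. */
static int poll_start_model(struct poll_model *p, void *wq, int mask)
{
    p->wqh = wq;                 /* add_wait_queue() side effect */
    if (mask & POLLERR_FLAG) {
        p->wqh = NULL;           /* remove_wait_queue()          */
        return -1;               /* -EINVAL back to the caller   */
    }
    return 0;
}

/* Model of vhost_poll_stop(): only unregister if actually registered,
 * so stopping after a failed start cannot crash. */
static void poll_stop_model(struct poll_model *p)
{
    if (p->wqh)
        p->wqh = NULL;
}
```

This is what lets vhost_net_set_backend() propagate the error to userspace instead of later removing itself from a waitqueue it never joined.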
* Re: [PATCH V6 3/3] tuntap: allow polling/writing/reading when detached
2013-01-16 15:44 ` [PATCH V6 3/3] tuntap: allow polling/writing/reading when detached Jason Wang
2013-01-16 17:03 ` Michael S. Tsirkin
@ 2013-01-24 10:38 ` Michael S. Tsirkin
1 sibling, 0 replies; 12+ messages in thread
From: Michael S. Tsirkin @ 2013-01-24 10:38 UTC (permalink / raw)
To: Jason Wang; +Cc: davem, netdev, linux-kernel
On Wed, Jan 16, 2013 at 11:44:38PM +0800, Jason Wang wrote:
> We forbid polling, writing and reading when the file is detached; this may
> complicate things for the user in several cases:
>
> - when the guest passes some buffers to vhost/qemu and then disables some
> queues, host/qemu needs to do its own cleanup on those buffers, which is
> sometimes complex. We can avoid this simply by allowing a user to still
> write to a disabled queue. A write to a disabled queue will cause the
> packet to pass to the kernel, and a read will get nothing.
> - align the polling behavior with macvtap, which never fails once the
> queue is created. This can simplify the polling error handling of its
> users (e.g. vhost)
>
> In order to achieve this, tfile->tun is not assigned NULL when detached,
> and tfile->detached is converted to be RCU protected so that the data path
> can check whether the file is detached in a lockless manner. This will be
> used to prevent the flow caches from being updated for a detached queue.
>
> Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
> ---
> drivers/net/tun.c | 45 ++++++++++++++++++++++++++-------------------
> 1 files changed, 26 insertions(+), 19 deletions(-)
>
> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> index c81680d..ec539a9 100644
> --- a/drivers/net/tun.c
> +++ b/drivers/net/tun.c
> @@ -139,7 +139,7 @@ struct tun_file {
> unsigned int flags;
> u16 queue_index;
> struct list_head next;
> - struct tun_struct *detached;
> + struct tun_struct __rcu *detached;
> };
>
> struct tun_flow_entry {
> @@ -295,11 +295,12 @@ static void tun_flow_cleanup(unsigned long data)
> }
>
> static void tun_flow_update(struct tun_struct *tun, u32 rxhash,
> - u16 queue_index)
> + struct tun_file *tfile)
> {
> struct hlist_head *head;
> struct tun_flow_entry *e;
> unsigned long delay = tun->ageing_time;
> + u16 queue_index = tfile->queue_index;
>
> if (!rxhash)
> return;
> @@ -308,7 +309,7 @@ static void tun_flow_update(struct tun_struct *tun, u32 rxhash,
>
> rcu_read_lock();
>
> - if (tun->numqueues == 1)
> + if (tun->numqueues == 1 || rcu_dereference(tfile->detached))
> goto unlock;
>
> e = tun_flow_find(head, rxhash);
> @@ -384,16 +385,16 @@ static void tun_set_real_num_queues(struct tun_struct *tun)
>
> static void tun_disable_queue(struct tun_struct *tun, struct tun_file *tfile)
> {
> - tfile->detached = tun;
> + rcu_assign_pointer(tfile->detached, tun);
> list_add_tail(&tfile->next, &tun->disabled);
> ++tun->numdisabled;
> }
>
> static struct tun_struct *tun_enable_queue(struct tun_file *tfile)
> {
> - struct tun_struct *tun = tfile->detached;
> + struct tun_struct *tun = rtnl_dereference(tfile->detached);
>
> - tfile->detached = NULL;
> + rcu_assign_pointer(tfile->detached, NULL);
> list_del_init(&tfile->next);
> --tun->numdisabled;
> return tun;
> @@ -402,26 +403,27 @@ static struct tun_struct *tun_enable_queue(struct tun_file *tfile)
> static void __tun_detach(struct tun_file *tfile, bool clean)
> {
> struct tun_file *ntfile;
> - struct tun_struct *tun;
> + struct tun_struct *tun, *detached;
> struct net_device *dev;
>
> tun = rtnl_dereference(tfile->tun);
> + detached = rtnl_dereference(tfile->detached);
>
> - if (tun) {
> + if (tun && !detached) {
> u16 index = tfile->queue_index;
> BUG_ON(index >= tun->numqueues);
> dev = tun->dev;
>
> rcu_assign_pointer(tun->tfiles[index],
> tun->tfiles[tun->numqueues - 1]);
> - rcu_assign_pointer(tfile->tun, NULL);
> ntfile = rtnl_dereference(tun->tfiles[index]);
> ntfile->queue_index = index;
>
> --tun->numqueues;
> - if (clean)
> + if (clean) {
> + rcu_assign_pointer(tfile->tun, NULL);
> sock_put(&tfile->sk);
> - else
> + } else
> tun_disable_queue(tun, tfile);
>
> synchronize_net();
> @@ -429,7 +431,7 @@ static void __tun_detach(struct tun_file *tfile, bool clean)
> /* Drop read queue */
> skb_queue_purge(&tfile->sk.sk_receive_queue);
> tun_set_real_num_queues(tun);
> - } else if (tfile->detached && clean) {
> + } else if (detached && clean) {
> tun = tun_enable_queue(tfile);
> sock_put(&tfile->sk);
> }
> @@ -466,6 +468,10 @@ static void tun_detach_all(struct net_device *dev)
> rcu_assign_pointer(tfile->tun, NULL);
> --tun->numqueues;
> }
> + list_for_each_entry(tfile, &tun->disabled, next) {
> + wake_up_all(&tfile->wq.wait);
> + rcu_assign_pointer(tfile->tun, NULL);
> + }
> BUG_ON(tun->numqueues != 0);
>
> synchronize_net();
> @@ -496,7 +502,7 @@ static int tun_attach(struct tun_struct *tun, struct file *file)
> goto out;
>
> err = -EINVAL;
> - if (rtnl_dereference(tfile->tun))
> + if (rtnl_dereference(tfile->tun) && !rtnl_dereference(tfile->detached))
> goto out;
>
> err = -EBUSY;
> @@ -504,7 +510,7 @@ static int tun_attach(struct tun_struct *tun, struct file *file)
> goto out;
>
> err = -E2BIG;
> - if (!tfile->detached &&
> + if (!rtnl_dereference(tfile->detached) &&
> tun->numqueues + tun->numdisabled == MAX_TAP_QUEUES)
> goto out;
>
> @@ -521,7 +527,7 @@ static int tun_attach(struct tun_struct *tun, struct file *file)
> rcu_assign_pointer(tun->tfiles[tun->numqueues], tfile);
> tun->numqueues++;
>
> - if (tfile->detached)
> + if (rtnl_dereference(tfile->detached))
> tun_enable_queue(tfile);
> else
> sock_hold(&tfile->sk);
> @@ -1195,7 +1201,7 @@ static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile,
> tun->dev->stats.rx_packets++;
> tun->dev->stats.rx_bytes += len;
>
> - tun_flow_update(tun, rxhash, tfile->queue_index);
> + tun_flow_update(tun, rxhash, tfile);
> return total_len;
> }
>
> @@ -1552,7 +1558,7 @@ static int tun_set_iff(struct net *net, struct file *file, struct ifreq *ifr)
> struct net_device *dev;
> int err;
>
> - if (tfile->detached)
> + if (rtnl_dereference(tfile->detached))
> return -EINVAL;
>
> dev = __dev_get_by_name(net, ifr->ifr_name);
> @@ -1796,7 +1802,7 @@ static int tun_set_queue(struct file *file, struct ifreq *ifr)
> rtnl_lock();
>
> if (ifr->ifr_flags & IFF_ATTACH_QUEUE) {
> - tun = tfile->detached;
> + tun = rtnl_dereference(tfile->detached);
> if (!tun) {
> ret = -EINVAL;
> goto unlock;
> @@ -1807,7 +1813,8 @@ static int tun_set_queue(struct file *file, struct ifreq *ifr)
> ret = tun_attach(tun, file);
> } else if (ifr->ifr_flags & IFF_DETACH_QUEUE) {
> tun = rtnl_dereference(tfile->tun);
> - if (!tun || !(tun->flags & TUN_TAP_MQ))
> + if (!tun || !(tun->flags & TUN_TAP_MQ) ||
> + rtnl_dereference(tfile->detached))
> ret = -EINVAL;
> else
> __tun_detach(tfile, false);
> --
> 1.7.1
Thread overview:
2013-01-16 15:44 [PATCH V6 0/3] handle polling errors in vhost/vhost_net Jason Wang
2013-01-16 15:44 ` [PATCH V6 1/3] vhost_net: correct error handling in vhost_net_set_backend() Jason Wang
2013-01-24 10:38 ` Michael S. Tsirkin
2013-01-16 15:44 ` [PATCH V6 2/3] vhost_net: handle polling errors when setting backend Jason Wang
2013-01-24 10:38 ` Michael S. Tsirkin
2013-01-16 15:44 ` [PATCH V6 3/3] tuntap: allow polling/writing/reading when detached Jason Wang
2013-01-16 17:03 ` Michael S. Tsirkin
2013-01-17 1:16 ` Jason Wang
2013-01-24 10:12 ` Jason Wang
2013-01-24 10:37 ` Michael S. Tsirkin
2013-01-24 10:38 ` Michael S. Tsirkin
2013-01-24 10:38 ` [PATCH V6 0/3] handle polling errors in vhost/vhost_net Michael S. Tsirkin