* [PATCH V7 0/3] handle polling errors in vhost/vhost_net
@ 2013-01-28 11:05 Jason Wang
2013-01-28 11:05 ` [PATCH V7 1/3] vhost_net: correct error handling in vhost_net_set_backend() Jason Wang
` (4 more replies)
0 siblings, 5 replies; 8+ messages in thread
From: Jason Wang @ 2013-01-28 11:05 UTC (permalink / raw)
To: davem, mst, netdev, linux-kernel; +Cc: Jason Wang
This is an updated version of the previous series to fix the handling of polling
errors in vhost/vhost_net.
Currently, vhost and vhost_net ignore polling errors, which can crash the kernel
when vhost tries to remove itself from the waitqueue after a polling failure.
Fix this by:
- examining POLLERR when setting the backend and reporting errors to userspace
- letting tun always add itself to the waitqueue in .poll() after the queue is
created, even if it was detached.
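The first bullet can be sketched in plain userspace C. This is a toy model of the idea, not the vhost API: `probe_backend()` and `set_backend()` are illustrative stand-ins, and the point is only that a POLLERR seen while wiring up a backend is turned into an error code instead of being ignored.

```c
#include <poll.h>
#include <errno.h>
#include <unistd.h>   /* pipe/close, used by the example below */
#include <assert.h>

/* Hypothetical stand-in for polling a candidate backend fd once.
 * Returns the ready mask as poll(2) reports it. */
static short probe_backend(int fd)
{
	struct pollfd p = { .fd = fd, .events = POLLIN | POLLOUT };

	if (poll(&p, 1, 0) < 0)
		return POLLERR;
	return p.revents;
}

/* Mirror of the fix: refuse a backend whose very first poll already
 * reports an error, and report that to the caller instead of
 * silently continuing. */
static int set_backend(int fd)
{
	short mask = probe_backend(fd);

	if (mask & (POLLERR | POLLNVAL))
		return -EINVAL;	/* surfaced to userspace, not ignored */
	return 0;
}
```

For example, the write end of a pipe whose read end has been closed polls as POLLERR, so `set_backend()` rejects it, while a healthy pipe end is accepted.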
Changes from V6:
- don't use RCU to protect the tfile->detached.
Changes from V5:
- use rcu_dereference() instead of the wrong rtnl_dereference() in data path
- test with CONFIG_PROVE_RCU
Changes from V4:
- check the detached state by tfile->detached and protect it by RCU
Changes from V3:
- make a smaller patch that doesn't touch the whole polling state and only checks
the polling errors when setting the backend.
- add a patch that allows tuntap to do polling/reading/writing when detached,
which could simplify the work of its users.
Changes from v2:
- check poll->wqh instead of the wrong assumption about POLLERR and waitqueue
- drop the whole tx polling state check since it was replaced by the wqh
checking
- drop the buggy tuntap patch
Changes from v1:
- restore the state before the ioctl when vhost_init_used() fails
- log the error when polling errors are met in the data path
- don't put into waitqueue when tun_chr_poll() return POLLERR
Jason Wang (3):
vhost_net: correct error handling in vhost_net_set_backend()
vhost_net: handle polling errors when setting backend
tuntap: allow polling/writing/reading when detached
drivers/net/tun.c | 25 ++++++++++++++++---------
drivers/vhost/net.c | 41 ++++++++++++++++++++++++++++-------------
drivers/vhost/vhost.c | 18 +++++++++++++++---
drivers/vhost/vhost.h | 2 +-
4 files changed, 60 insertions(+), 26 deletions(-)
^ permalink raw reply [flat|nested] 8+ messages in thread
* [PATCH V7 1/3] vhost_net: correct error handling in vhost_net_set_backend()
2013-01-28 11:05 [PATCH V7 0/3] handle polling errors in vhost/vhost_net Jason Wang
@ 2013-01-28 11:05 ` Jason Wang
2013-01-28 11:05 ` [PATCH V7 2/3] vhost_net: handle polling errors when setting backend Jason Wang
` (3 subsequent siblings)
4 siblings, 0 replies; 8+ messages in thread
From: Jason Wang @ 2013-01-28 11:05 UTC (permalink / raw)
To: davem, mst, netdev, linux-kernel; +Cc: Jason Wang
Currently, when vhost_init_used() fails, the sock refcount and ubufs are
leaked. Correct this by calling vhost_init_used() before assigning ubufs, and
restore oldsock when it fails.
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
drivers/vhost/net.c | 16 +++++++++++-----
1 files changed, 11 insertions(+), 5 deletions(-)
diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index ebd08b2..d10ad6f 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -827,15 +827,16 @@ static long vhost_net_set_backend(struct vhost_net *n, unsigned index, int fd)
r = PTR_ERR(ubufs);
goto err_ubufs;
}
- oldubufs = vq->ubufs;
- vq->ubufs = ubufs;
+
vhost_net_disable_vq(n, vq);
rcu_assign_pointer(vq->private_data, sock);
- vhost_net_enable_vq(n, vq);
-
r = vhost_init_used(vq);
if (r)
- goto err_vq;
+ goto err_used;
+ vhost_net_enable_vq(n, vq);
+
+ oldubufs = vq->ubufs;
+ vq->ubufs = ubufs;
n->tx_packets = 0;
n->tx_zcopy_err = 0;
@@ -859,6 +860,11 @@ static long vhost_net_set_backend(struct vhost_net *n, unsigned index, int fd)
mutex_unlock(&n->dev.mutex);
return 0;
+err_used:
+ rcu_assign_pointer(vq->private_data, oldsock);
+ vhost_net_enable_vq(n, vq);
+ if (ubufs)
+ vhost_ubuf_put_and_wait(ubufs);
err_ubufs:
fput(sock->file);
err_vq:
--
1.7.1
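The reordering in this patch is an instance of a general kernel idiom: do the step that can fail before publishing new state, and unwind to the old state on error via goto labels. A minimal userspace sketch of that pattern (the `struct dev` and function names are illustrative, not the vhost code):

```c
#include <assert.h>

struct dev {
	void *backend;
};

/* Stand-in for a step that may fail, like vhost_init_used(). */
static int fallible_init(int should_fail)
{
	return should_fail ? -1 : 0;
}

/* Swap in a new backend; on failure, restore the old one so the
 * caller observes no change and nothing is leaked. */
static int swap_backend(struct dev *d, void *newb, int should_fail)
{
	void *oldb = d->backend;

	d->backend = newb;		/* tentative */
	if (fallible_init(should_fail))
		goto err;		/* fail before committing anything else */
	return 0;

err:
	d->backend = oldb;		/* restore the previous state */
	return -1;
}
```

On the failure path the caller sees the old backend untouched, which is exactly what the `err_used:` label in the patch arranges for `vq->private_data` and `ubufs`.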
^ permalink raw reply related [flat|nested] 8+ messages in thread
* [PATCH V7 2/3] vhost_net: handle polling errors when setting backend
2013-01-28 11:05 [PATCH V7 0/3] handle polling errors in vhost/vhost_net Jason Wang
2013-01-28 11:05 ` [PATCH V7 1/3] vhost_net: correct error handling in vhost_net_set_backend() Jason Wang
@ 2013-01-28 11:05 ` Jason Wang
2013-01-28 17:30 ` Michael S. Tsirkin
2013-01-28 11:05 ` [PATCH V7 3/3] tuntap: allow polling/writing/reading when detached Jason Wang
` (2 subsequent siblings)
4 siblings, 1 reply; 8+ messages in thread
From: Jason Wang @ 2013-01-28 11:05 UTC (permalink / raw)
To: davem, mst, netdev, linux-kernel; +Cc: Jason Wang
Currently, polling errors are ignored, which can lead to the following issues:
- vhost removes itself unconditionally from the waitqueue when stopping the
poll; this may crash the kernel, since the previous attempt to start polling
may have failed to add it to the waitqueue
- userspace may think the backend was set successfully even when polling
failed.
Solve this by:
- checking poll->wqh before trying to remove from the waitqueue
- reporting polling errors from vhost_poll_start() and tx_poll_start(); the
return value is checked and propagated when userspace wants to set the backend
After this fix, a polling failure can still happen after the backend is set;
that will be addressed by the next patch.
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
drivers/vhost/net.c | 27 ++++++++++++++++++---------
drivers/vhost/vhost.c | 18 +++++++++++++++---
drivers/vhost/vhost.h | 2 +-
3 files changed, 34 insertions(+), 13 deletions(-)
diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index d10ad6f..959b1cd 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -165,12 +165,16 @@ static void tx_poll_stop(struct vhost_net *net)
}
/* Caller must have TX VQ lock */
-static void tx_poll_start(struct vhost_net *net, struct socket *sock)
+static int tx_poll_start(struct vhost_net *net, struct socket *sock)
{
+ int ret;
+
if (unlikely(net->tx_poll_state != VHOST_NET_POLL_STOPPED))
- return;
- vhost_poll_start(net->poll + VHOST_NET_VQ_TX, sock->file);
- net->tx_poll_state = VHOST_NET_POLL_STARTED;
+ return 0;
+ ret = vhost_poll_start(net->poll + VHOST_NET_VQ_TX, sock->file);
+ if (!ret)
+ net->tx_poll_state = VHOST_NET_POLL_STARTED;
+ return ret;
}
/* In case of DMA done not in order in lower device driver for some reason.
@@ -642,20 +646,23 @@ static void vhost_net_disable_vq(struct vhost_net *n,
vhost_poll_stop(n->poll + VHOST_NET_VQ_RX);
}
-static void vhost_net_enable_vq(struct vhost_net *n,
+static int vhost_net_enable_vq(struct vhost_net *n,
struct vhost_virtqueue *vq)
{
struct socket *sock;
+ int ret;
sock = rcu_dereference_protected(vq->private_data,
lockdep_is_held(&vq->mutex));
if (!sock)
- return;
+ return 0;
if (vq == n->vqs + VHOST_NET_VQ_TX) {
n->tx_poll_state = VHOST_NET_POLL_STOPPED;
- tx_poll_start(n, sock);
+ ret = tx_poll_start(n, sock);
} else
- vhost_poll_start(n->poll + VHOST_NET_VQ_RX, sock->file);
+ ret = vhost_poll_start(n->poll + VHOST_NET_VQ_RX, sock->file);
+
+ return ret;
}
static struct socket *vhost_net_stop_vq(struct vhost_net *n,
@@ -833,7 +840,9 @@ static long vhost_net_set_backend(struct vhost_net *n, unsigned index, int fd)
r = vhost_init_used(vq);
if (r)
goto err_used;
- vhost_net_enable_vq(n, vq);
+ r = vhost_net_enable_vq(n, vq);
+ if (r)
+ goto err_used;
oldubufs = vq->ubufs;
vq->ubufs = ubufs;
diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 34389f7..9759249 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -77,26 +77,38 @@ void vhost_poll_init(struct vhost_poll *poll, vhost_work_fn_t fn,
init_poll_funcptr(&poll->table, vhost_poll_func);
poll->mask = mask;
poll->dev = dev;
+ poll->wqh = NULL;
vhost_work_init(&poll->work, fn);
}
/* Start polling a file. We add ourselves to file's wait queue. The caller must
* keep a reference to a file until after vhost_poll_stop is called. */
-void vhost_poll_start(struct vhost_poll *poll, struct file *file)
+int vhost_poll_start(struct vhost_poll *poll, struct file *file)
{
unsigned long mask;
+ int ret = 0;
mask = file->f_op->poll(file, &poll->table);
if (mask)
vhost_poll_wakeup(&poll->wait, 0, 0, (void *)mask);
+ if (mask & POLLERR) {
+ if (poll->wqh)
+ remove_wait_queue(poll->wqh, &poll->wait);
+ ret = -EINVAL;
+ }
+
+ return ret;
}
/* Stop polling a file. After this function returns, it becomes safe to drop the
* file reference. You must also flush afterwards. */
void vhost_poll_stop(struct vhost_poll *poll)
{
- remove_wait_queue(poll->wqh, &poll->wait);
+ if (poll->wqh) {
+ remove_wait_queue(poll->wqh, &poll->wait);
+ poll->wqh = NULL;
+ }
}
static bool vhost_work_seq_done(struct vhost_dev *dev, struct vhost_work *work,
@@ -792,7 +804,7 @@ long vhost_vring_ioctl(struct vhost_dev *d, int ioctl, void __user *argp)
fput(filep);
if (pollstart && vq->handle_kick)
- vhost_poll_start(&vq->poll, vq->kick);
+ r = vhost_poll_start(&vq->poll, vq->kick);
mutex_unlock(&vq->mutex);
diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
index 2639c58..17261e2 100644
--- a/drivers/vhost/vhost.h
+++ b/drivers/vhost/vhost.h
@@ -42,7 +42,7 @@ void vhost_work_queue(struct vhost_dev *dev, struct vhost_work *work);
void vhost_poll_init(struct vhost_poll *poll, vhost_work_fn_t fn,
unsigned long mask, struct vhost_dev *dev);
-void vhost_poll_start(struct vhost_poll *poll, struct file *file);
+int vhost_poll_start(struct vhost_poll *poll, struct file *file);
void vhost_poll_stop(struct vhost_poll *poll);
void vhost_poll_flush(struct vhost_poll *poll);
void vhost_poll_queue(struct vhost_poll *poll);
--
1.7.1
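The vhost.c hunks above establish an invariant worth spelling out: `wqh` is non-NULL only while we are actually on a wait queue, so stop is safe after a failed start and safe to call twice. A stubbed userspace sketch of that invariant (the `waitqueue`/`poller` types are stand-ins, not kernel structures):

```c
#include <stddef.h>
#include <assert.h>

struct waitqueue {
	int nwaiters;
};

struct poller {
	struct waitqueue *wqh;	/* non-NULL iff we are enqueued */
};

static int poll_start(struct poller *p, struct waitqueue *wq, int fails)
{
	if (fails) {
		p->wqh = NULL;	/* never got onto the queue */
		return -1;
	}
	p->wqh = wq;
	wq->nwaiters++;
	return 0;
}

/* Mirrors the fixed vhost_poll_stop(): only dequeue if we are
 * actually enqueued, and clear wqh so a second stop is a no-op. */
static void poll_stop(struct poller *p)
{
	if (p->wqh) {
		p->wqh->nwaiters--;
		p->wqh = NULL;
	}
}
```

Without the `if (p->wqh)` guard, calling stop after a failed start would dereference garbage, which is the crash the patch description refers to.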
^ permalink raw reply related [flat|nested] 8+ messages in thread
* [PATCH V7 3/3] tuntap: allow polling/writing/reading when detached
2013-01-28 11:05 [PATCH V7 0/3] handle polling errors in vhost/vhost_net Jason Wang
2013-01-28 11:05 ` [PATCH V7 1/3] vhost_net: correct error handling in vhost_net_set_backend() Jason Wang
2013-01-28 11:05 ` [PATCH V7 2/3] vhost_net: handle polling errors when setting backend Jason Wang
@ 2013-01-28 11:05 ` Jason Wang
2013-01-28 17:31 ` Michael S. Tsirkin
2013-01-29 7:03 ` [PATCH V7 0/3] handle polling errors in vhost/vhost_net Wanlong Gao
2013-01-29 20:45 ` David Miller
4 siblings, 1 reply; 8+ messages in thread
From: Jason Wang @ 2013-01-28 11:05 UTC (permalink / raw)
To: davem, mst, netdev, linux-kernel; +Cc: Jason Wang
We forbid polling, writing and reading when the file is detached; this may
complicate things for users in several cases:
- when the guest passes some buffers to vhost/qemu and then disables some
queues, vhost/qemu needs to do its own cleanup on those buffers, which is
sometimes complex. We can avoid this simply by still allowing a user to write
to a disabled queue: a write to a disabled queue passes the packet to the
kernel, and a read gets nothing.
- it aligns the polling behavior with macvtap, which never fails once the queue
is created. This simplifies the polling error handling of its users (e.g. vhost).
We achieve this simply by not assigning NULL to tfile->tun when detached.
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
drivers/net/tun.c | 25 ++++++++++++++++---------
1 files changed, 16 insertions(+), 9 deletions(-)
diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 30a7d0e..2fb19da 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -298,11 +298,12 @@ static void tun_flow_cleanup(unsigned long data)
}
static void tun_flow_update(struct tun_struct *tun, u32 rxhash,
- u16 queue_index)
+ struct tun_file *tfile)
{
struct hlist_head *head;
struct tun_flow_entry *e;
unsigned long delay = tun->ageing_time;
+ u16 queue_index = tfile->queue_index;
if (!rxhash)
return;
@@ -311,7 +312,9 @@ static void tun_flow_update(struct tun_struct *tun, u32 rxhash,
rcu_read_lock();
- if (tun->numqueues == 1)
+ /* We may get a very small possibility of OOO during switching, not
+ * worth to optimize.*/
+ if (tun->numqueues == 1 || tfile->detached)
goto unlock;
e = tun_flow_find(head, rxhash);
@@ -411,21 +414,21 @@ static void __tun_detach(struct tun_file *tfile, bool clean)
tun = rtnl_dereference(tfile->tun);
- if (tun) {
+ if (tun && !tfile->detached) {
u16 index = tfile->queue_index;
BUG_ON(index >= tun->numqueues);
dev = tun->dev;
rcu_assign_pointer(tun->tfiles[index],
tun->tfiles[tun->numqueues - 1]);
- rcu_assign_pointer(tfile->tun, NULL);
ntfile = rtnl_dereference(tun->tfiles[index]);
ntfile->queue_index = index;
--tun->numqueues;
- if (clean)
+ if (clean) {
+ rcu_assign_pointer(tfile->tun, NULL);
sock_put(&tfile->sk);
- else
+ } else
tun_disable_queue(tun, tfile);
synchronize_net();
@@ -473,6 +476,10 @@ static void tun_detach_all(struct net_device *dev)
rcu_assign_pointer(tfile->tun, NULL);
--tun->numqueues;
}
+ list_for_each_entry(tfile, &tun->disabled, next) {
+ wake_up_all(&tfile->wq.wait);
+ rcu_assign_pointer(tfile->tun, NULL);
+ }
BUG_ON(tun->numqueues != 0);
synchronize_net();
@@ -503,7 +510,7 @@ static int tun_attach(struct tun_struct *tun, struct file *file)
goto out;
err = -EINVAL;
- if (rtnl_dereference(tfile->tun))
+ if (rtnl_dereference(tfile->tun) && !tfile->detached)
goto out;
err = -EBUSY;
@@ -1202,7 +1209,7 @@ static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile,
tun->dev->stats.rx_packets++;
tun->dev->stats.rx_bytes += len;
- tun_flow_update(tun, rxhash, tfile->queue_index);
+ tun_flow_update(tun, rxhash, tfile);
return total_len;
}
@@ -1816,7 +1823,7 @@ static int tun_set_queue(struct file *file, struct ifreq *ifr)
ret = tun_attach(tun, file);
} else if (ifr->ifr_flags & IFF_DETACH_QUEUE) {
tun = rtnl_dereference(tfile->tun);
- if (!tun || !(tun->flags & TUN_TAP_MQ))
+ if (!tun || !(tun->flags & TUN_TAP_MQ) || tfile->detached)
ret = -EINVAL;
else
__tun_detach(tfile, false);
--
1.7.1
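The contract this patch establishes can be summarized as: once a queue file exists, writes always succeed (a detached queue's packets go to the kernel path) and reads never error (a detached queue just has nothing to read). A toy model of that contract, with a made-up `struct queue` rather than tun internals:

```c
#include <assert.h>

struct queue {
	int attached;
	int injected;	/* packets handed to the "kernel" path */
	int pending;	/* packets queued for this file to read */
};

/* Writes succeed whether or not the queue is attached; a detached
 * queue's packet is injected into the kernel path instead. */
static int q_write(struct queue *q)
{
	if (q->attached)
		q->pending++;	/* toy: loop it back to the reader */
	else
		q->injected++;
	return 0;
}

/* Reads never fail on a detached queue; they simply find nothing,
 * matching "read will get nothing" in the commit message. */
static int q_read(struct queue *q)
{
	if (q->pending) {
		q->pending--;
		return 1;	/* got a packet */
	}
	return 0;		/* detached or empty: nothing to read */
}
```

This is why the patch stops NULLing `tfile->tun` on detach: the file keeps enough state to service reads, writes and polls even while disabled.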
^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH V7 2/3] vhost_net: handle polling errors when setting backend
2013-01-28 11:05 ` [PATCH V7 2/3] vhost_net: handle polling errors when setting backend Jason Wang
@ 2013-01-28 17:30 ` Michael S. Tsirkin
0 siblings, 0 replies; 8+ messages in thread
From: Michael S. Tsirkin @ 2013-01-28 17:30 UTC (permalink / raw)
To: Jason Wang; +Cc: davem, netdev, linux-kernel
On Mon, Jan 28, 2013 at 07:05:18PM +0800, Jason Wang wrote:
> Currently, polling errors are ignored, which can lead to the following issues:
>
> - vhost removes itself unconditionally from the waitqueue when stopping the
>   poll; this may crash the kernel, since the previous attempt to start polling
>   may have failed to add it to the waitqueue
> - userspace may think the backend was set successfully even when polling
>   failed.
>
> Solve this by:
>
> - checking poll->wqh before trying to remove from the waitqueue
> - reporting polling errors from vhost_poll_start() and tx_poll_start(); the
>   return value is checked and propagated when userspace wants to set the backend
>
> After this fix, a polling failure can still happen after the backend is set;
> that will be addressed by the next patch.
>
> Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH V7 3/3] tuntap: allow polling/writing/reading when detached
2013-01-28 11:05 ` [PATCH V7 3/3] tuntap: allow polling/writing/reading when detached Jason Wang
@ 2013-01-28 17:31 ` Michael S. Tsirkin
0 siblings, 0 replies; 8+ messages in thread
From: Michael S. Tsirkin @ 2013-01-28 17:31 UTC (permalink / raw)
To: Jason Wang; +Cc: davem, netdev, linux-kernel
On Mon, Jan 28, 2013 at 07:05:19PM +0800, Jason Wang wrote:
> We forbid polling, writing and reading when the file is detached; this may
> complicate things for users in several cases:
>
> - when the guest passes some buffers to vhost/qemu and then disables some
>   queues, vhost/qemu needs to do its own cleanup on those buffers, which is
>   sometimes complex. We can avoid this simply by still allowing a user to
>   write to a disabled queue: a write to a disabled queue passes the packet to
>   the kernel, and a read gets nothing.
> - it aligns the polling behavior with macvtap, which never fails once the
>   queue is created. This simplifies the polling error handling of its users
>   (e.g. vhost)
>
> We achieve this simply by not assigning NULL to tfile->tun when detached.
>
> Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH V7 0/3] handle polling errors in vhost/vhost_net
2013-01-28 11:05 [PATCH V7 0/3] handle polling errors in vhost/vhost_net Jason Wang
` (2 preceding siblings ...)
2013-01-28 11:05 ` [PATCH V7 3/3] tuntap: allow polling/writing/reading when detached Jason Wang
@ 2013-01-29 7:03 ` Wanlong Gao
2013-01-29 20:45 ` David Miller
4 siblings, 0 replies; 8+ messages in thread
From: Wanlong Gao @ 2013-01-29 7:03 UTC (permalink / raw)
To: Jason Wang; +Cc: davem, mst, netdev, linux-kernel
On 01/28/2013 07:05 PM, Jason Wang wrote:
> This is an updated version of the previous series to fix the handling of
> polling errors in vhost/vhost_net.
>
> Currently, vhost and vhost_net ignore polling errors, which can crash the
> kernel when vhost tries to remove itself from the waitqueue after a polling
> failure. Fix this by:
>
> - examining POLLERR when setting the backend and reporting errors to userspace
> - letting tun always add itself to the waitqueue in .poll() after the queue is
>   created, even if it was detached.
Fixed my kernel oops here, thank you.
Tested-by: Wanlong Gao <gaowanlong@cn.fujitsu.com>
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH V7 0/3] handle polling errors in vhost/vhost_net
2013-01-28 11:05 [PATCH V7 0/3] handle polling errors in vhost/vhost_net Jason Wang
` (3 preceding siblings ...)
2013-01-29 7:03 ` [PATCH V7 0/3] handle polling errors in vhost/vhost_net Wanlong Gao
@ 2013-01-29 20:45 ` David Miller
4 siblings, 0 replies; 8+ messages in thread
From: David Miller @ 2013-01-29 20:45 UTC (permalink / raw)
To: jasowang; +Cc: mst, netdev, linux-kernel
From: Jason Wang <jasowang@redhat.com>
Date: Mon, 28 Jan 2013 19:05:16 +0800
> This is an updated version of the previous series to fix the handling of
> polling errors in vhost/vhost_net.
>
> Currently, vhost and vhost_net ignore polling errors, which can crash the
> kernel when vhost tries to remove itself from the waitqueue after a polling
> failure. Fix this by:
>
> - examining POLLERR when setting the backend and reporting errors to userspace
> - letting tun always add itself to the waitqueue in .poll() after the queue is
>   created, even if it was detached.
Series applied, thanks.
^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2013-01-29 20:45 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-01-28 11:05 [PATCH V7 0/3] handle polling errors in vhost/vhost_net Jason Wang
2013-01-28 11:05 ` [PATCH V7 1/3] vhost_net: correct error handling in vhost_net_set_backend() Jason Wang
2013-01-28 11:05 ` [PATCH V7 2/3] vhost_net: handle polling errors when setting backend Jason Wang
2013-01-28 17:30 ` Michael S. Tsirkin
2013-01-28 11:05 ` [PATCH V7 3/3] tuntap: allow polling/writing/reading when detached Jason Wang
2013-01-28 17:31 ` Michael S. Tsirkin
2013-01-29 7:03 ` [PATCH V7 0/3] handle polling errors in vhost/vhost_net Wanlong Gao
2013-01-29 20:45 ` David Miller