* Re: [PATCH 6/6] vhost_net: use RCU callbacks instead of synchronize_rcu()
[not found] ` <20211115153003.9140-6-arbn@yandex-team.com>
@ 2021-11-16 5:00 ` Jason Wang
[not found] ` <b163233f-090f-baaf-4460-37978cab4d55@yandex-team.com>
0 siblings, 1 reply; 8+ messages in thread
From: Jason Wang @ 2021-11-16 5:00 UTC (permalink / raw)
To: Andrey Ryabinin
Cc: Jesper Dangaard Brouer, Daniel Borkmann, Michael S. Tsirkin,
netdev, John Fastabend, Alexei Starovoitov, virtualization,
Stefan Hajnoczi, kvm, Jakub Kicinski, bpf, David S. Miller,
linux-kernel
On Mon, Nov 15, 2021 at 11:32 PM Andrey Ryabinin <arbn@yandex-team.com> wrote:
>
> Currently vhost_net_release() uses synchronize_rcu() to synchronize
> freeing with vhost_zerocopy_callback(). However synchronize_rcu()
> is quite costly operation. It take more than 10 seconds
> to shutdown qemu launched with couple net devices like this:
> -netdev tap,id=tap0,..,vhost=on,queues=80
> because we end up calling synchronize_rcu() netdev_count*queues times.
>
> Free vhost net structures in rcu callback instead of using
> synchronize_rcu() to fix the problem.
I admit the release code is somehow hard to understand. But I wonder
if the following case can still happen with this:
CPU 0 (vhost_dev_cleanup) CPU1
(vhost_net_zerocopy_callback()->vhost_work_queue())
if (!dev->worker)
dev->worker = NULL
wake_up_process(dev->worker)
If this is true. It seems the fix is to move RCU synchronization stuff
in vhost_net_ubuf_put_and_wait()?
Thanks
>
> Signed-off-by: Andrey Ryabinin <arbn@yandex-team.com>
> ---
> drivers/vhost/net.c | 22 ++++++++++++++--------
> 1 file changed, 14 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
> index 97a209d6a527..0699d30e83d5 100644
> --- a/drivers/vhost/net.c
> +++ b/drivers/vhost/net.c
> @@ -132,6 +132,7 @@ struct vhost_net {
> struct vhost_dev dev;
> struct vhost_net_virtqueue vqs[VHOST_NET_VQ_MAX];
> struct vhost_poll poll[VHOST_NET_VQ_MAX];
> + struct rcu_head rcu;
> /* Number of TX recently submitted.
> * Protected by tx vq lock. */
> unsigned tx_packets;
> @@ -1389,6 +1390,18 @@ static void vhost_net_flush(struct vhost_net *n)
> }
> }
>
> +static void vhost_net_free(struct rcu_head *rcu_head)
> +{
> + struct vhost_net *n = container_of(rcu_head, struct vhost_net, rcu);
> +
> + kfree(n->vqs[VHOST_NET_VQ_RX].rxq.queue);
> + kfree(n->vqs[VHOST_NET_VQ_TX].xdp);
> + kfree(n->dev.vqs);
> + if (n->page_frag.page)
> + __page_frag_cache_drain(n->page_frag.page, n->refcnt_bias);
> + kvfree(n);
> +}
> +
> static int vhost_net_release(struct inode *inode, struct file *f)
> {
> struct vhost_net *n = f->private_data;
> @@ -1404,15 +1417,8 @@ static int vhost_net_release(struct inode *inode, struct file *f)
> sockfd_put(tx_sock);
> if (rx_sock)
> sockfd_put(rx_sock);
> - /* Make sure no callbacks are outstanding */
> - synchronize_rcu();
>
> - kfree(n->vqs[VHOST_NET_VQ_RX].rxq.queue);
> - kfree(n->vqs[VHOST_NET_VQ_TX].xdp);
> - kfree(n->dev.vqs);
> - if (n->page_frag.page)
> - __page_frag_cache_drain(n->page_frag.page, n->refcnt_bias);
> - kvfree(n);
> + call_rcu(&n->rcu, vhost_net_free);
> return 0;
> }
>
> --
> 2.32.0
>
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 1/6] vhost: get rid of vhost_poll_flush() wrapper
[not found] <20211115153003.9140-1-arbn@yandex-team.com>
[not found] ` <20211115153003.9140-6-arbn@yandex-team.com>
@ 2021-11-16 14:33 ` Stefano Garzarella
2021-12-03 17:45 ` Mike Christie
[not found] ` <20211115153003.9140-2-arbn@yandex-team.com>
` (2 subsequent siblings)
4 siblings, 1 reply; 8+ messages in thread
From: Stefano Garzarella @ 2021-11-16 14:33 UTC (permalink / raw)
To: Andrey Ryabinin, Mike Christie
Cc: kvm, Michael S. Tsirkin, netdev, linux-kernel, virtualization,
Stefan Hajnoczi
On Mon, Nov 15, 2021 at 06:29:58PM +0300, Andrey Ryabinin wrote:
>vhost_poll_flush() is a simple wrapper around vhost_work_dev_flush().
>It gives wrong impression that we are doing some work over vhost_poll,
>while in fact it flushes vhost_poll->dev.
>It only complicate understanding of the code and leads to mistakes
>like flushing the same vhost_dev several times in a row.
>
>Just remove vhost_poll_flush() and call vhost_work_dev_flush() directly.
>
>Signed-off-by: Andrey Ryabinin <arbn@yandex-team.com>
>---
> drivers/vhost/net.c | 4 ++--
> drivers/vhost/test.c | 2 +-
> drivers/vhost/vhost.c | 12 ++----------
> drivers/vhost/vsock.c | 2 +-
> 4 files changed, 6 insertions(+), 14 deletions(-)
Adding Mike since these changes could be relevant for "[PATCH V4 00/12]
vhost: multiple worker support" [1] series.
>
>diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
>index 28ef323882fb..11221f6d11b8 100644
>--- a/drivers/vhost/net.c
>+++ b/drivers/vhost/net.c
>@@ -1375,8 +1375,8 @@ static void vhost_net_stop(struct vhost_net *n, struct socket **tx_sock,
>
> static void vhost_net_flush_vq(struct vhost_net *n, int index)
> {
>- vhost_poll_flush(n->poll + index);
>- vhost_poll_flush(&n->vqs[index].vq.poll);
>+ vhost_work_dev_flush(n->poll[index].dev);
>+ vhost_work_dev_flush(n->vqs[index].vq.poll.dev);
> }
>
> static void vhost_net_flush(struct vhost_net *n)
>diff --git a/drivers/vhost/test.c b/drivers/vhost/test.c
>index a09dedc79f68..1a8ab1d8cb1c 100644
>--- a/drivers/vhost/test.c
>+++ b/drivers/vhost/test.c
>@@ -146,7 +146,7 @@ static void vhost_test_stop(struct vhost_test *n, void **privatep)
>
> static void vhost_test_flush_vq(struct vhost_test *n, int index)
> {
>- vhost_poll_flush(&n->vqs[index].poll);
>+ vhost_work_dev_flush(n->vqs[index].poll.dev);
> }
>
> static void vhost_test_flush(struct vhost_test *n)
>diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
>index 59edb5a1ffe2..ca088481da0e 100644
>--- a/drivers/vhost/vhost.c
>+++ b/drivers/vhost/vhost.c
>@@ -245,14 +245,6 @@ void vhost_work_dev_flush(struct vhost_dev *dev)
> }
> EXPORT_SYMBOL_GPL(vhost_work_dev_flush);
>
>-/* Flush any work that has been scheduled. When calling this, don't hold any
>- * locks that are also used by the callback. */
>-void vhost_poll_flush(struct vhost_poll *poll)
>-{
>- vhost_work_dev_flush(poll->dev);
>-}
>-EXPORT_SYMBOL_GPL(vhost_poll_flush);
>-
We should remove also the declaration in vhost.h:
--- a/drivers/vhost/vhost.h
+++ b/drivers/vhost/vhost.h
@@ -45,7 +44,6 @@ void vhost_poll_init(struct vhost_poll *poll, vhost_work_fn_t fn,
__poll_t mask, struct vhost_dev *dev);
int vhost_poll_start(struct vhost_poll *poll, struct file *file);
void vhost_poll_stop(struct vhost_poll *poll);
-void vhost_poll_flush(struct vhost_poll *poll);
void vhost_poll_queue(struct vhost_poll *poll);
void vhost_work_dev_flush(struct vhost_dev *dev);
> void vhost_work_queue(struct vhost_dev *dev, struct vhost_work *work)
> {
> if (!dev->worker)
>@@ -663,7 +655,7 @@ void vhost_dev_stop(struct vhost_dev *dev)
> for (i = 0; i < dev->nvqs; ++i) {
> if (dev->vqs[i]->kick && dev->vqs[i]->handle_kick) {
> vhost_poll_stop(&dev->vqs[i]->poll);
>- vhost_poll_flush(&dev->vqs[i]->poll);
>+ vhost_work_dev_flush(dev->vqs[i]->poll.dev);
> }
> }
> }
>@@ -1712,7 +1704,7 @@ long vhost_vring_ioctl(struct vhost_dev *d, unsigned int ioctl, void __user *arg
> mutex_unlock(&vq->mutex);
>
> if (pollstop && vq->handle_kick)
>- vhost_poll_flush(&vq->poll);
>+ vhost_work_dev_flush(vq->poll.dev);
> return r;
> }
> EXPORT_SYMBOL_GPL(vhost_vring_ioctl);
>diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
>index 938aefbc75ec..b0361ebbd695 100644
>--- a/drivers/vhost/vsock.c
>+++ b/drivers/vhost/vsock.c
>@@ -711,7 +711,7 @@ static void vhost_vsock_flush(struct vhost_vsock *vsock)
>
> for (i = 0; i < ARRAY_SIZE(vsock->vqs); i++)
> if (vsock->vqs[i].handle_kick)
>- vhost_poll_flush(&vsock->vqs[i].poll);
>+ vhost_work_dev_flush(vsock->vqs[i].poll.dev);
> vhost_work_dev_flush(&vsock->dev);
> }
>
>--
>2.32.0
>
The rest LGTM.
Thanks,
Stefano
[1]
https://lore.kernel.org/virtualization/20211104190502.7053-1-michael.christie@oracle.com/
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 2/6] vhost_net: get rid of vhost_net_flush_vq() and extra flush calls
[not found] ` <20211115153003.9140-2-arbn@yandex-team.com>
@ 2021-11-16 14:33 ` Stefano Garzarella
0 siblings, 0 replies; 8+ messages in thread
From: Stefano Garzarella @ 2021-11-16 14:33 UTC (permalink / raw)
To: Andrey Ryabinin
Cc: kvm, Michael S. Tsirkin, netdev, linux-kernel, virtualization,
Stefan Hajnoczi
On Mon, Nov 15, 2021 at 06:29:59PM +0300, Andrey Ryabinin wrote:
>vhost_net_flush_vq() calls vhost_work_dev_flush() twice passing
>vhost_dev pointer obtained via 'n->poll[index].dev' and
>'n->vqs[index].vq.poll.dev'. This is actually the same pointer,
>initialized in vhost_net_open()/vhost_dev_init()/vhost_poll_init()
>
>Remove vhost_net_flush_vq() and call vhost_work_dev_flush() directly.
>Do the flushes only once instead of several flush calls in a row
>which seems rather useless.
>
>Signed-off-by: Andrey Ryabinin <arbn@yandex-team.com>
>---
> drivers/vhost/net.c | 11 ++---------
> drivers/vhost/vhost.h | 1 +
> 2 files changed, 3 insertions(+), 9 deletions(-)
>
>diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
>index 11221f6d11b8..b1feb5e0571e 100644
>--- a/drivers/vhost/net.c
>+++ b/drivers/vhost/net.c
>@@ -1373,16 +1373,9 @@ static void vhost_net_stop(struct vhost_net *n, struct socket **tx_sock,
> *rx_sock = vhost_net_stop_vq(n, &n->vqs[VHOST_NET_VQ_RX].vq);
> }
>
>-static void vhost_net_flush_vq(struct vhost_net *n, int index)
>-{
>- vhost_work_dev_flush(n->poll[index].dev);
>- vhost_work_dev_flush(n->vqs[index].vq.poll.dev);
>-}
>-
> static void vhost_net_flush(struct vhost_net *n)
> {
>- vhost_net_flush_vq(n, VHOST_NET_VQ_TX);
>- vhost_net_flush_vq(n, VHOST_NET_VQ_RX);
>+ vhost_work_dev_flush(&n->dev);
> if (n->vqs[VHOST_NET_VQ_TX].ubufs) {
> mutex_lock(&n->vqs[VHOST_NET_VQ_TX].vq.mutex);
> n->tx_flush = true;
>@@ -1572,7 +1565,7 @@ static long vhost_net_set_backend(struct vhost_net *n, unsigned index, int fd)
> }
>
> if (oldsock) {
>- vhost_net_flush_vq(n, index);
>+ vhost_work_dev_flush(&n->dev);
> sockfd_put(oldsock);
> }
>
>diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
>index 638bb640d6b4..ecbaa5c6005f 100644
>--- a/drivers/vhost/vhost.h
>+++ b/drivers/vhost/vhost.h
>@@ -15,6 +15,7 @@
> #include <linux/vhost_iotlb.h>
> #include <linux/irqbypass.h>
>
>+struct vhost_dev;
Is this change needed?
Stefano
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 4/6] vhost_vsock: simplify vhost_vsock_flush()
[not found] ` <20211115153003.9140-4-arbn@yandex-team.com>
@ 2021-11-16 14:35 ` Stefano Garzarella
0 siblings, 0 replies; 8+ messages in thread
From: Stefano Garzarella @ 2021-11-16 14:35 UTC (permalink / raw)
To: Andrey Ryabinin
Cc: kvm, Michael S. Tsirkin, netdev, linux-kernel, virtualization,
Stefan Hajnoczi
On Mon, Nov 15, 2021 at 06:30:01PM +0300, Andrey Ryabinin wrote:
>vhost_vsock_flush() calls vhost_work_dev_flush(vsock->vqs[i].poll.dev)
>before vhost_work_dev_flush(&vsock->dev). This seems pointless
>as vsock->vqs[i].poll.dev is the same as &vsock->dev and several flushes
>in a row doesn't do anything useful, one is just enough.
>
>Signed-off-by: Andrey Ryabinin <arbn@yandex-team.com>
>---
> drivers/vhost/vsock.c | 5 -----
> 1 file changed, 5 deletions(-)
>
>diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
>index b0361ebbd695..b4dcefbb7e60 100644
>--- a/drivers/vhost/vsock.c
>+++ b/drivers/vhost/vsock.c
>@@ -707,11 +707,6 @@ static int vhost_vsock_dev_open(struct inode *inode, struct file *file)
>
> static void vhost_vsock_flush(struct vhost_vsock *vsock)
> {
>- int i;
>-
>- for (i = 0; i < ARRAY_SIZE(vsock->vqs); i++)
>- if (vsock->vqs[i].handle_kick)
>- vhost_work_dev_flush(vsock->vqs[i].poll.dev);
> vhost_work_dev_flush(&vsock->dev);
> }
>
>--
>2.32.0
>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 1/6] vhost: get rid of vhost_poll_flush() wrapper
[not found] <20211115153003.9140-1-arbn@yandex-team.com>
` (3 preceding siblings ...)
[not found] ` <20211115153003.9140-4-arbn@yandex-team.com>
@ 2021-11-16 14:41 ` Stefano Garzarella
4 siblings, 0 replies; 8+ messages in thread
From: Stefano Garzarella @ 2021-11-16 14:41 UTC (permalink / raw)
To: Andrey Ryabinin
Cc: kvm, Michael S. Tsirkin, netdev, linux-kernel, virtualization,
Stefan Hajnoczi
On Mon, Nov 15, 2021 at 06:29:58PM +0300, Andrey Ryabinin wrote:
>vhost_poll_flush() is a simple wrapper around vhost_work_dev_flush().
>It gives wrong impression that we are doing some work over vhost_poll,
>while in fact it flushes vhost_poll->dev.
>It only complicate understanding of the code and leads to mistakes
>like flushing the same vhost_dev several times in a row.
>
>Just remove vhost_poll_flush() and call vhost_work_dev_flush() directly.
>
>Signed-off-by: Andrey Ryabinin <arbn@yandex-team.com>
>---
> drivers/vhost/net.c | 4 ++--
> drivers/vhost/test.c | 2 +-
> drivers/vhost/vhost.c | 12 ++----------
> drivers/vhost/vsock.c | 2 +-
> 4 files changed, 6 insertions(+), 14 deletions(-)
>
>diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
>index 28ef323882fb..11221f6d11b8 100644
>--- a/drivers/vhost/net.c
>+++ b/drivers/vhost/net.c
>@@ -1375,8 +1375,8 @@ static void vhost_net_stop(struct vhost_net *n, struct socket **tx_sock,
>
> static void vhost_net_flush_vq(struct vhost_net *n, int index)
> {
>- vhost_poll_flush(n->poll + index);
>- vhost_poll_flush(&n->vqs[index].vq.poll);
>+ vhost_work_dev_flush(n->poll[index].dev);
>+ vhost_work_dev_flush(n->vqs[index].vq.poll.dev);
> }
>
> static void vhost_net_flush(struct vhost_net *n)
>diff --git a/drivers/vhost/test.c b/drivers/vhost/test.c
>index a09dedc79f68..1a8ab1d8cb1c 100644
>--- a/drivers/vhost/test.c
>+++ b/drivers/vhost/test.c
>@@ -146,7 +146,7 @@ static void vhost_test_stop(struct vhost_test *n, void **privatep)
>
> static void vhost_test_flush_vq(struct vhost_test *n, int index)
> {
>- vhost_poll_flush(&n->vqs[index].poll);
>+ vhost_work_dev_flush(n->vqs[index].poll.dev);
> }
>
> static void vhost_test_flush(struct vhost_test *n)
>diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
>index 59edb5a1ffe2..ca088481da0e 100644
>--- a/drivers/vhost/vhost.c
>+++ b/drivers/vhost/vhost.c
>@@ -245,14 +245,6 @@ void vhost_work_dev_flush(struct vhost_dev *dev)
> }
> EXPORT_SYMBOL_GPL(vhost_work_dev_flush);
>
>-/* Flush any work that has been scheduled. When calling this, don't hold any
>- * locks that are also used by the callback. */
>-void vhost_poll_flush(struct vhost_poll *poll)
>-{
>- vhost_work_dev_flush(poll->dev);
>-}
>-EXPORT_SYMBOL_GPL(vhost_poll_flush);
>-
> void vhost_work_queue(struct vhost_dev *dev, struct vhost_work *work)
> {
> if (!dev->worker)
>@@ -663,7 +655,7 @@ void vhost_dev_stop(struct vhost_dev *dev)
> for (i = 0; i < dev->nvqs; ++i) {
> if (dev->vqs[i]->kick && dev->vqs[i]->handle_kick) {
> vhost_poll_stop(&dev->vqs[i]->poll);
>- vhost_poll_flush(&dev->vqs[i]->poll);
>+ vhost_work_dev_flush(dev->vqs[i]->poll.dev);
Not related to this patch, but while looking at vhost-vsock I'm
wondering if we can do the same here in vhost_dev_stop(), I mean move
vhost_work_dev_flush() outside the loop and and call it once. (In
another patch eventually)
Stefano
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 6/6] vhost_net: use RCU callbacks instead of synchronize_rcu()
[not found] ` <b163233f-090f-baaf-4460-37978cab4d55@yandex-team.com>
@ 2021-11-22 2:48 ` Jason Wang
2021-11-22 9:37 ` Michael S. Tsirkin
1 sibling, 0 replies; 8+ messages in thread
From: Jason Wang @ 2021-11-22 2:48 UTC (permalink / raw)
To: Andrey Ryabinin
Cc: Jesper Dangaard Brouer, Daniel Borkmann, Michael S. Tsirkin,
netdev, John Fastabend, Alexei Starovoitov, virtualization,
Stefan Hajnoczi, kvm, Jakub Kicinski, bpf, David S. Miller,
linux-kernel
On Fri, Nov 19, 2021 at 7:31 PM Andrey Ryabinin <arbn@yandex-team.com> wrote:
>
>
>
> On 11/16/21 8:00 AM, Jason Wang wrote:
> > On Mon, Nov 15, 2021 at 11:32 PM Andrey Ryabinin <arbn@yandex-team.com> wrote:
> >>
> >> Currently vhost_net_release() uses synchronize_rcu() to synchronize
> >> freeing with vhost_zerocopy_callback(). However synchronize_rcu()
> >> is quite costly operation. It take more than 10 seconds
> >> to shutdown qemu launched with couple net devices like this:
> >> -netdev tap,id=tap0,..,vhost=on,queues=80
> >> because we end up calling synchronize_rcu() netdev_count*queues times.
> >>
> >> Free vhost net structures in rcu callback instead of using
> >> synchronize_rcu() to fix the problem.
> >
> > I admit the release code is somehow hard to understand. But I wonder
> > if the following case can still happen with this:
> >
> > CPU 0 (vhost_dev_cleanup) CPU1
> > (vhost_net_zerocopy_callback()->vhost_work_queue())
> > if (!dev->worker)
> > dev->worker = NULL
> >
> > wake_up_process(dev->worker)
> >
> > If this is true. It seems the fix is to move RCU synchronization stuff
> > in vhost_net_ubuf_put_and_wait()?
> >
>
> It all depends whether vhost_zerocopy_callback() can be called outside of vhost
> thread context or not.
I think the answer is yes, the callback will be mainly used in the
zerocopy path when the underlayer NIC finishes the DMA of a packet.
> If it can run after vhost thread stopped, than the race you
> describe seems possible and the fix in commit b0c057ca7e83 ("vhost: fix a theoretical race in device cleanup")
> wasn't complete. I would fix it by calling synchronize_rcu() after vhost_net_flush()
> and before vhost_dev_cleanup().
>
> As for the performance problem, it can be solved by replacing synchronize_rcu() with synchronize_rcu_expedited().
Yes, that's another way, but see below.
>
> But now I'm not sure that this race is actually exists and that synchronize_rcu() needed at all.
> I did a bit of testing and I only see callback being called from vhost thread:
>
> vhost-3724 3733 [002] 2701.768731: probe:vhost_zerocopy_callback: (ffffffff81af8c10)
> ffffffff81af8c11 vhost_zerocopy_callback+0x1 ([kernel.kallsyms])
> ffffffff81bb34f6 skb_copy_ubufs+0x256 ([kernel.kallsyms])
> ffffffff81bce621 __netif_receive_skb_core.constprop.0+0xac1 ([kernel.kallsyms])
> ffffffff81bd062d __netif_receive_skb_one_core+0x3d ([kernel.kallsyms])
> ffffffff81bd0748 netif_receive_skb+0x38 ([kernel.kallsyms])
> ffffffff819a2a1e tun_get_user+0xdce ([kernel.kallsyms])
> ffffffff819a2cf4 tun_sendmsg+0xa4 ([kernel.kallsyms])
> ffffffff81af9229 handle_tx_zerocopy+0x149 ([kernel.kallsyms])
> ffffffff81afaf05 handle_tx+0xc5 ([kernel.kallsyms])
> ffffffff81afce86 vhost_worker+0x76 ([kernel.kallsyms])
> ffffffff811581e9 kthread+0x169 ([kernel.kallsyms])
> ffffffff810018cf ret_from_fork+0x1f ([kernel.kallsyms])
> 0 [unknown] ([unknown])
>
From the call trace you can send packets between two TAP. Since the TX
of TAP is synchronous so we can't see callback to be called out of
vhost thread.
In order to test it, we need 1) enable zerocopy
(experimental_zcopytx=1) and 2) sending the packet to the real NIC
with bridge or macvlan
Zerocopy was disalbed due to a lot of isuses (098eadce3c62 "vhost_net:
disable zerocopy by default"). So if we fix by moving it to
vhost_net_ubuf_put_and_wait(), there won't be a synchronize_rcu() in
the non-zerocopy path which seems to be sufficient. And we can use
synchronize_rcu_expedited() on top if it is really needed.
Thanks
> This means that the callback can't run after kthread_stop() in vhost_dev_cleanup() and no synchronize_rcu() needed.
>
> I'm not confident that my quite limited testing cover all possible vhost_zerocopy_callback() callstacks.
>
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 6/6] vhost_net: use RCU callbacks instead of synchronize_rcu()
[not found] ` <b163233f-090f-baaf-4460-37978cab4d55@yandex-team.com>
2021-11-22 2:48 ` Jason Wang
@ 2021-11-22 9:37 ` Michael S. Tsirkin
1 sibling, 0 replies; 8+ messages in thread
From: Michael S. Tsirkin @ 2021-11-22 9:37 UTC (permalink / raw)
To: Andrey Ryabinin
Cc: Jesper Dangaard Brouer, Daniel Borkmann, netdev, John Fastabend,
Alexei Starovoitov, virtualization, Stefan Hajnoczi, kvm,
Jakub Kicinski, bpf, David S. Miller, linux-kernel
On Fri, Nov 19, 2021 at 02:32:05PM +0300, Andrey Ryabinin wrote:
>
>
> On 11/16/21 8:00 AM, Jason Wang wrote:
> > On Mon, Nov 15, 2021 at 11:32 PM Andrey Ryabinin <arbn@yandex-team.com> wrote:
> >>
> >> Currently vhost_net_release() uses synchronize_rcu() to synchronize
> >> freeing with vhost_zerocopy_callback(). However synchronize_rcu()
> >> is quite costly operation. It take more than 10 seconds
> >> to shutdown qemu launched with couple net devices like this:
> >> -netdev tap,id=tap0,..,vhost=on,queues=80
> >> because we end up calling synchronize_rcu() netdev_count*queues times.
> >>
> >> Free vhost net structures in rcu callback instead of using
> >> synchronize_rcu() to fix the problem.
> >
> > I admit the release code is somehow hard to understand. But I wonder
> > if the following case can still happen with this:
> >
> > CPU 0 (vhost_dev_cleanup) CPU1
> > (vhost_net_zerocopy_callback()->vhost_work_queue())
> > if (!dev->worker)
> > dev->worker = NULL
> >
> > wake_up_process(dev->worker)
> >
> > If this is true. It seems the fix is to move RCU synchronization stuff
> > in vhost_net_ubuf_put_and_wait()?
> >
>
> It all depends whether vhost_zerocopy_callback() can be called outside of vhost
> thread context or not. If it can run after vhost thread stopped, than the race you
> describe seems possible and the fix in commit b0c057ca7e83 ("vhost: fix a theoretical race in device cleanup")
> wasn't complete. I would fix it by calling synchronize_rcu() after vhost_net_flush()
> and before vhost_dev_cleanup().
>
> As for the performance problem, it can be solved by replacing synchronize_rcu() with synchronize_rcu_expedited().
expedited causes a stop of IPIs though, so it's problematic to
do it upon a userspace syscall.
> But now I'm not sure that this race is actually exists and that synchronize_rcu() needed at all.
> I did a bit of testing and I only see callback being called from vhost thread:
>
> vhost-3724 3733 [002] 2701.768731: probe:vhost_zerocopy_callback: (ffffffff81af8c10)
> ffffffff81af8c11 vhost_zerocopy_callback+0x1 ([kernel.kallsyms])
> ffffffff81bb34f6 skb_copy_ubufs+0x256 ([kernel.kallsyms])
> ffffffff81bce621 __netif_receive_skb_core.constprop.0+0xac1 ([kernel.kallsyms])
> ffffffff81bd062d __netif_receive_skb_one_core+0x3d ([kernel.kallsyms])
> ffffffff81bd0748 netif_receive_skb+0x38 ([kernel.kallsyms])
> ffffffff819a2a1e tun_get_user+0xdce ([kernel.kallsyms])
> ffffffff819a2cf4 tun_sendmsg+0xa4 ([kernel.kallsyms])
> ffffffff81af9229 handle_tx_zerocopy+0x149 ([kernel.kallsyms])
> ffffffff81afaf05 handle_tx+0xc5 ([kernel.kallsyms])
> ffffffff81afce86 vhost_worker+0x76 ([kernel.kallsyms])
> ffffffff811581e9 kthread+0x169 ([kernel.kallsyms])
> ffffffff810018cf ret_from_fork+0x1f ([kernel.kallsyms])
> 0 [unknown] ([unknown])
>
> This means that the callback can't run after kthread_stop() in vhost_dev_cleanup() and no synchronize_rcu() needed.
>
> I'm not confident that my quite limited testing cover all possible vhost_zerocopy_callback() callstacks.
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 1/6] vhost: get rid of vhost_poll_flush() wrapper
2021-11-16 14:33 ` [PATCH 1/6] vhost: get rid of vhost_poll_flush() wrapper Stefano Garzarella
@ 2021-12-03 17:45 ` Mike Christie
0 siblings, 0 replies; 8+ messages in thread
From: Mike Christie @ 2021-12-03 17:45 UTC (permalink / raw)
To: Stefano Garzarella, Andrey Ryabinin
Cc: kvm, Michael S. Tsirkin, netdev, linux-kernel, virtualization,
Stefan Hajnoczi
On 11/16/21 8:33 AM, Stefano Garzarella wrote:
> On Mon, Nov 15, 2021 at 06:29:58PM +0300, Andrey Ryabinin wrote:
>> vhost_poll_flush() is a simple wrapper around vhost_work_dev_flush().
>> It gives wrong impression that we are doing some work over vhost_poll,
>> while in fact it flushes vhost_poll->dev.
>> It only complicate understanding of the code and leads to mistakes
>> like flushing the same vhost_dev several times in a row.
>>
>> Just remove vhost_poll_flush() and call vhost_work_dev_flush() directly.
>>
>> Signed-off-by: Andrey Ryabinin <arbn@yandex-team.com>
>> ---
>> drivers/vhost/net.c | 4 ++--
>> drivers/vhost/test.c | 2 +-
>> drivers/vhost/vhost.c | 12 ++----------
>> drivers/vhost/vsock.c | 2 +-
>> 4 files changed, 6 insertions(+), 14 deletions(-)
>
> Adding Mike since these changes could be relevant for "[PATCH V4 00/12] vhost: multiple worker support" [1] series.
>
I reworked my patches to work with this set and it might make them
a little nicer, because I have less functions to port.
Andrey, please cc me when you repost and I'll send my patches over
your set, or if it's going to take you a while I can help you. I
handled the review comments for the flush related patches and I can
just post them.
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2021-12-03 17:45 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <20211115153003.9140-1-arbn@yandex-team.com>
[not found] ` <20211115153003.9140-6-arbn@yandex-team.com>
2021-11-16 5:00 ` [PATCH 6/6] vhost_net: use RCU callbacks instead of synchronize_rcu() Jason Wang
[not found] ` <b163233f-090f-baaf-4460-37978cab4d55@yandex-team.com>
2021-11-22 2:48 ` Jason Wang
2021-11-22 9:37 ` Michael S. Tsirkin
2021-11-16 14:33 ` [PATCH 1/6] vhost: get rid of vhost_poll_flush() wrapper Stefano Garzarella
2021-12-03 17:45 ` Mike Christie
[not found] ` <20211115153003.9140-2-arbn@yandex-team.com>
2021-11-16 14:33 ` [PATCH 2/6] vhost_net: get rid of vhost_net_flush_vq() and extra flush calls Stefano Garzarella
[not found] ` <20211115153003.9140-4-arbn@yandex-team.com>
2021-11-16 14:35 ` [PATCH 4/6] vhost_vsock: simplify vhost_vsock_flush() Stefano Garzarella
2021-11-16 14:41 ` [PATCH 1/6] vhost: get rid of vhost_poll_flush() wrapper Stefano Garzarella
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).