* [RFC 0/2] Lift restriction about VDUSE net devices with CVQ
@ 2025-10-07 13:06 Eugenio Pérez
2025-10-07 13:06 ` [RFC 1/2] virtio_net: timeout control virtqueue commands Eugenio Pérez
2025-10-07 13:06 ` [RFC 2/2] vduse: lift restriction about net devices with CVQ Eugenio Pérez
0 siblings, 2 replies; 45+ messages in thread
From: Eugenio Pérez @ 2025-10-07 13:06 UTC (permalink / raw)
To: mst
Cc: Yongji Xie, virtualization, linux-kernel, Eugenio Pérez,
Maxime Coquelin, Xuan Zhuo, Dragos Tatulea DE, jasowang
A userland device implemented through VDUSE could hold rtnl forever if the
virtio-net driver is running on top of virtio_vdpa. Let's break the device
if it does not return the buffer within a longer-than-reasonable timeout.
A less aggressive path can be taken to recover the device, like only resetting
the control virtqueue. However, the state of the device after this action is
racy, as the vq could be reset after the device writes the OK. Leaving it as a
TODO anyway.
Eugenio Pérez (2):
virtio_net: timeout control virtqueue commands
vduse: lift restriction about net devices with CVQ
drivers/net/virtio_net.c | 10 ++++++++++
drivers/vdpa/vdpa_user/vduse_dev.c | 3 ---
2 files changed, 10 insertions(+), 3 deletions(-)
--
2.51.0
^ permalink raw reply [flat|nested] 45+ messages in thread
* [RFC 1/2] virtio_net: timeout control virtqueue commands
2025-10-07 13:06 [RFC 0/2] Lift restriction about VDUSE net devices with CVQ Eugenio Pérez
@ 2025-10-07 13:06 ` Eugenio Pérez
2025-10-11 7:44 ` Jason Wang
2025-10-14 8:29 ` Michael S. Tsirkin
2025-10-07 13:06 ` [RFC 2/2] vduse: lift restriction about net devices with CVQ Eugenio Pérez
1 sibling, 2 replies; 45+ messages in thread
From: Eugenio Pérez @ 2025-10-07 13:06 UTC (permalink / raw)
To: mst
Cc: Yongji Xie, virtualization, linux-kernel, Eugenio Pérez,
Maxime Coquelin, Xuan Zhuo, Dragos Tatulea DE, jasowang
A userland device implemented through VDUSE could hold rtnl forever if
the virtio-net driver is running on top of virtio_vdpa. Let's break the
device if it does not return the buffer within a longer-than-reasonable
timeout.
A less aggressive path can be taken to recover the device, like only
resetting the control virtqueue. However, the state of the device after
this action is racy, as the vq could be reset after the device
writes the OK. Leaving it as a TODO anyway.
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
drivers/net/virtio_net.c | 10 ++++++++++
1 file changed, 10 insertions(+)
diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 31bd32bdecaf..ed68ad69a019 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -3576,6 +3576,7 @@ static bool virtnet_send_command_reply(struct virtnet_info *vi, u8 class, u8 cmd
{
struct scatterlist *sgs[5], hdr, stat;
u32 out_num = 0, tmp, in_num = 0;
+ unsigned long end_time;
bool ok;
int ret;
@@ -3614,11 +3615,20 @@ static bool virtnet_send_command_reply(struct virtnet_info *vi, u8 class, u8 cmd
/* Spin for a response, the kick causes an ioport write, trapping
* into the hypervisor, so the request should be handled immediately.
+ *
+ * Long timeout so a malicious device is not able to lock rtnl forever.
*/
+ end_time = jiffies + 30 * HZ;
while (!virtqueue_get_buf(vi->cvq, &tmp) &&
!virtqueue_is_broken(vi->cvq)) {
cond_resched();
cpu_relax();
+
+ if (time_after(jiffies, end_time)) {
+ /* TODO Reset vq if possible? */
+ virtio_break_device(vi->vdev);
+ break;
+ }
}
unlock:
--
2.51.0
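A note on the deadline idiom used by this kind of loop: the kernel's time_after(a, b) is true when a is later than b, so an "has the deadline passed" test is normally written time_after(jiffies, end_time). The following is a userspace sketch only, not kernel code. The names my_time_after, deadline_expired and deadline_example are invented here for illustration; my_time_after models the kernel macro using the same signed-difference comparison.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Userspace model of the kernel's time_after(a, b): true if a is later
 * than b. The signed difference keeps the comparison correct when the
 * tick counter wraps around. */
static bool my_time_after(unsigned long a, unsigned long b)
{
	return (long)(b - a) < 0;
}

/* Hypothetical helper: has the deadline `end` passed at time `now`? */
bool deadline_expired(unsigned long now, unsigned long end)
{
	return my_time_after(now, end);
}

void deadline_example(void)
{
	unsigned long start = ULONG_MAX - 5; /* counter about to wrap */
	unsigned long end = start + 30;      /* deadline lands after the wrap */

	assert(!deadline_expired(start, end));      /* just armed */
	assert(!deadline_expired(start + 30, end)); /* exactly at the deadline */
	assert(deadline_expired(start + 31, end));  /* one tick past it */

	/* With the arguments swapped the test is true right away. */
	assert(my_time_after(end, start));
}
```

The last assertion shows why the argument order matters: comparing the deadline against the current time, rather than the other way round, reports expiry on the very first check.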
* [RFC 2/2] vduse: lift restriction about net devices with CVQ
2025-10-07 13:06 [RFC 0/2] Lift restriction about VDUSE net devices with CVQ Eugenio Pérez
2025-10-07 13:06 ` [RFC 1/2] virtio_net: timeout control virtqueue commands Eugenio Pérez
@ 2025-10-07 13:06 ` Eugenio Pérez
2025-10-09 13:14 ` Maxime Coquelin
2025-10-14 8:31 ` Michael S. Tsirkin
1 sibling, 2 replies; 45+ messages in thread
From: Eugenio Pérez @ 2025-10-07 13:06 UTC (permalink / raw)
To: mst
Cc: Yongji Xie, virtualization, linux-kernel, Eugenio Pérez,
Maxime Coquelin, Xuan Zhuo, Dragos Tatulea DE, jasowang
Now that the virtio_net driver is able to recover from a stalled
virtqueue, let's lift the restriction.
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
drivers/vdpa/vdpa_user/vduse_dev.c | 3 ---
1 file changed, 3 deletions(-)
diff --git a/drivers/vdpa/vdpa_user/vduse_dev.c b/drivers/vdpa/vdpa_user/vduse_dev.c
index e7bced0b5542..95d2b898171d 100644
--- a/drivers/vdpa/vdpa_user/vduse_dev.c
+++ b/drivers/vdpa/vdpa_user/vduse_dev.c
@@ -1726,9 +1726,6 @@ static bool features_is_valid(struct vduse_dev_config *config)
if ((config->device_id == VIRTIO_ID_BLOCK) &&
(config->features & BIT_ULL(VIRTIO_BLK_F_CONFIG_WCE)))
return false;
- else if ((config->device_id == VIRTIO_ID_NET) &&
- (config->features & BIT_ULL(VIRTIO_NET_F_CTRL_VQ)))
- return false;
if ((config->device_id == VIRTIO_ID_NET) &&
!(config->features & BIT_ULL(VIRTIO_F_VERSION_1)))
--
2.51.0
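To make the effect of the removed branch concrete, here is a simplified userspace model of the checks visible in the hunk above. It is a sketch under assumptions, not the kernel function: struct cfg, features_ok and the reject_net_cvq flag are invented for illustration, and any checks in features_is_valid() that fall outside the hunk are not modelled. The numeric constants are the standard virtio ID and feature bit values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Standard virtio device IDs and feature bit numbers. */
#define VIRTIO_ID_NET            1
#define VIRTIO_ID_BLOCK          2
#define VIRTIO_BLK_F_CONFIG_WCE  11
#define VIRTIO_NET_F_CTRL_VQ     17
#define VIRTIO_F_VERSION_1       32
#define BIT_ULL(n)               (1ULL << (n))

/* Hypothetical stand-in for the fields of struct vduse_dev_config used here. */
struct cfg {
	uint32_t device_id;
	uint64_t features;
};

/* reject_net_cvq = true models the code before the patch, false after it. */
bool features_ok(const struct cfg *c, bool reject_net_cvq)
{
	if (c->device_id == VIRTIO_ID_BLOCK &&
	    (c->features & BIT_ULL(VIRTIO_BLK_F_CONFIG_WCE)))
		return false;
	if (reject_net_cvq && c->device_id == VIRTIO_ID_NET &&
	    (c->features & BIT_ULL(VIRTIO_NET_F_CTRL_VQ)))
		return false;
	if (c->device_id == VIRTIO_ID_NET &&
	    !(c->features & BIT_ULL(VIRTIO_F_VERSION_1)))
		return false;
	return true;
}

void features_example(void)
{
	struct cfg net_cvq = {
		.device_id = VIRTIO_ID_NET,
		.features = BIT_ULL(VIRTIO_F_VERSION_1) |
			    BIT_ULL(VIRTIO_NET_F_CTRL_VQ),
	};

	assert(!features_ok(&net_cvq, true));  /* rejected before the patch */
	assert(features_ok(&net_cvq, false));  /* accepted after the patch */
}
```

In this model the only configuration whose outcome changes is a net device that offers VIRTIO_NET_F_CTRL_VQ; the block and VERSION_1 checks behave the same either way.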
* Re: [RFC 2/2] vduse: lift restriction about net devices with CVQ
2025-10-07 13:06 ` [RFC 2/2] vduse: lift restriction about net devices with CVQ Eugenio Pérez
@ 2025-10-09 13:14 ` Maxime Coquelin
2025-10-15 6:11 ` Eugenio Perez Martin
2025-10-14 8:31 ` Michael S. Tsirkin
1 sibling, 1 reply; 45+ messages in thread
From: Maxime Coquelin @ 2025-10-09 13:14 UTC (permalink / raw)
To: Eugenio Pérez, mst
Cc: Yongji Xie, virtualization, linux-kernel, Maxime Coquelin,
Xuan Zhuo, Dragos Tatulea DE, jasowang
On 10/7/25 3:06 PM, Eugenio Pérez wrote:
> Now that the virtio_net driver is able to recover from a stall
> virtqueue, let's lift the restriction.
>
> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> ---
> drivers/vdpa/vdpa_user/vduse_dev.c | 3 ---
> 1 file changed, 3 deletions(-)
>
> diff --git a/drivers/vdpa/vdpa_user/vduse_dev.c b/drivers/vdpa/vdpa_user/vduse_dev.c
> index e7bced0b5542..95d2b898171d 100644
> --- a/drivers/vdpa/vdpa_user/vduse_dev.c
> +++ b/drivers/vdpa/vdpa_user/vduse_dev.c
> @@ -1726,9 +1726,6 @@ static bool features_is_valid(struct vduse_dev_config *config)
> if ((config->device_id == VIRTIO_ID_BLOCK) &&
> (config->features & BIT_ULL(VIRTIO_BLK_F_CONFIG_WCE)))
> return false;
> - else if ((config->device_id == VIRTIO_ID_NET) &&
> - (config->features & BIT_ULL(VIRTIO_NET_F_CTRL_VQ)))
> - return false;
>
> if ((config->device_id == VIRTIO_ID_NET) &&
> !(config->features & BIT_ULL(VIRTIO_F_VERSION_1)))
I wonder whether the API version should be increased; otherwise I don't
see how the app creating the VDUSE device can know whether it can safely
advertise CVQ support (other than by trial and error).
* Re: [RFC 1/2] virtio_net: timeout control virtqueue commands
2025-10-07 13:06 ` [RFC 1/2] virtio_net: timeout control virtqueue commands Eugenio Pérez
@ 2025-10-11 7:44 ` Jason Wang
2025-10-14 7:30 ` Eugenio Perez Martin
2025-10-14 8:29 ` Michael S. Tsirkin
1 sibling, 1 reply; 45+ messages in thread
From: Jason Wang @ 2025-10-11 7:44 UTC (permalink / raw)
To: Eugenio Pérez
Cc: mst, Yongji Xie, virtualization, linux-kernel, Maxime Coquelin,
Xuan Zhuo, Dragos Tatulea DE
On Tue, Oct 7, 2025 at 9:06 PM Eugenio Pérez <eperezma@redhat.com> wrote:
>
> An userland device implemented through VDUSE could take rtnl forever if
> the virtio-net driver is running on top of virtio_vdpa. Let's break the
> device if it does not return the buffer in a longer-than-assumible
> timeout.
>
> A less agressive path can be taken to recover the device, like only
> resetting the control virtqueue. However, the state of the device after
> this action is taken races, as the vq could be reset after the device
> writes the OK. Leaving TODO anyway.
>
> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> ---
> drivers/net/virtio_net.c | 10 ++++++++++
> 1 file changed, 10 insertions(+)
>
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 31bd32bdecaf..ed68ad69a019 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -3576,6 +3576,7 @@ static bool virtnet_send_command_reply(struct virtnet_info *vi, u8 class, u8 cmd
> {
> struct scatterlist *sgs[5], hdr, stat;
> u32 out_num = 0, tmp, in_num = 0;
> + unsigned long end_time;
> bool ok;
> int ret;
>
> @@ -3614,11 +3615,20 @@ static bool virtnet_send_command_reply(struct virtnet_info *vi, u8 class, u8 cmd
>
> /* Spin for a response, the kick causes an ioport write, trapping
> * into the hypervisor, so the request should be handled immediately.
> + *
> + * Long timeout so a malicious device is not able to lock rtnl forever.
> */
> + end_time = jiffies + 30 * HZ;
The problem is that 30 * HZ is probably long enough to trigger
warnings like hung task?
> while (!virtqueue_get_buf(vi->cvq, &tmp) &&
> !virtqueue_is_broken(vi->cvq)) {
> cond_resched();
> cpu_relax();
> +
> + if (time_after(end_time, jiffies)) {
> + /* TODO Reset vq if possible? */
> + virtio_break_device(vi->vdev);
> + break;
> + }
> }
>
> unlock:
> --
> 2.51.0
>
Thanks
* Re: [RFC 1/2] virtio_net: timeout control virtqueue commands
2025-10-11 7:44 ` Jason Wang
@ 2025-10-14 7:30 ` Eugenio Perez Martin
0 siblings, 0 replies; 45+ messages in thread
From: Eugenio Perez Martin @ 2025-10-14 7:30 UTC (permalink / raw)
To: Jason Wang
Cc: mst, Yongji Xie, virtualization, linux-kernel, Maxime Coquelin,
Xuan Zhuo, Dragos Tatulea DE
On Sat, Oct 11, 2025 at 9:45 AM Jason Wang <jasowang@redhat.com> wrote:
>
> On Tue, Oct 7, 2025 at 9:06 PM Eugenio Pérez <eperezma@redhat.com> wrote:
> >
> > An userland device implemented through VDUSE could take rtnl forever if
> > the virtio-net driver is running on top of virtio_vdpa. Let's break the
> > device if it does not return the buffer in a longer-than-assumible
> > timeout.
> >
> > A less agressive path can be taken to recover the device, like only
> > resetting the control virtqueue. However, the state of the device after
> > this action is taken races, as the vq could be reset after the device
> > writes the OK. Leaving TODO anyway.
> >
> > Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> > ---
> > drivers/net/virtio_net.c | 10 ++++++++++
> > 1 file changed, 10 insertions(+)
> >
> > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > index 31bd32bdecaf..ed68ad69a019 100644
> > --- a/drivers/net/virtio_net.c
> > +++ b/drivers/net/virtio_net.c
> > @@ -3576,6 +3576,7 @@ static bool virtnet_send_command_reply(struct virtnet_info *vi, u8 class, u8 cmd
> > {
> > struct scatterlist *sgs[5], hdr, stat;
> > u32 out_num = 0, tmp, in_num = 0;
> > + unsigned long end_time;
> > bool ok;
> > int ret;
> >
> > @@ -3614,11 +3615,20 @@ static bool virtnet_send_command_reply(struct virtnet_info *vi, u8 class, u8 cmd
> >
> > /* Spin for a response, the kick causes an ioport write, trapping
> > * into the hypervisor, so the request should be handled immediately.
> > + *
> > + * Long timeout so a malicious device is not able to lock rtnl forever.
> > */
> > + end_time = jiffies + 30 * HZ;
>
> The problem that 30 * HZ is probably long enough to trigger the
> warnings like hungtask?
>
That's right. OTOH, the same behavior from the device already triggers
the hung task warning.
Maybe it is better to set it to 15*HZ?
* Re: [RFC 1/2] virtio_net: timeout control virtqueue commands
2025-10-07 13:06 ` [RFC 1/2] virtio_net: timeout control virtqueue commands Eugenio Pérez
2025-10-11 7:44 ` Jason Wang
@ 2025-10-14 8:29 ` Michael S. Tsirkin
2025-10-14 9:14 ` Maxime Coquelin
1 sibling, 1 reply; 45+ messages in thread
From: Michael S. Tsirkin @ 2025-10-14 8:29 UTC (permalink / raw)
To: Eugenio Pérez
Cc: Yongji Xie, virtualization, linux-kernel, Maxime Coquelin,
Xuan Zhuo, Dragos Tatulea DE, jasowang
On Tue, Oct 07, 2025 at 03:06:21PM +0200, Eugenio Pérez wrote:
> An userland device implemented through VDUSE could take rtnl forever if
> the virtio-net driver is running on top of virtio_vdpa. Let's break the
> device if it does not return the buffer in a longer-than-assumible
> timeout.
So now I can't debug qemu with gdb because the guest dies :(
Let's not break valid use-cases please.
Instead, solve it in vduse, probably by handling cvq within
the kernel.
> A less agressive path can be taken to recover the device, like only
> resetting the control virtqueue. However, the state of the device after
> this action is taken races, as the vq could be reset after the device
> writes the OK. Leaving TODO anyway.
>
> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> ---
> drivers/net/virtio_net.c | 10 ++++++++++
> 1 file changed, 10 insertions(+)
>
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 31bd32bdecaf..ed68ad69a019 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -3576,6 +3576,7 @@ static bool virtnet_send_command_reply(struct virtnet_info *vi, u8 class, u8 cmd
> {
> struct scatterlist *sgs[5], hdr, stat;
> u32 out_num = 0, tmp, in_num = 0;
> + unsigned long end_time;
> bool ok;
> int ret;
>
> @@ -3614,11 +3615,20 @@ static bool virtnet_send_command_reply(struct virtnet_info *vi, u8 class, u8 cmd
>
> /* Spin for a response, the kick causes an ioport write, trapping
> * into the hypervisor, so the request should be handled immediately.
> + *
> + * Long timeout so a malicious device is not able to lock rtnl forever.
> */
> + end_time = jiffies + 30 * HZ;
> while (!virtqueue_get_buf(vi->cvq, &tmp) &&
> !virtqueue_is_broken(vi->cvq)) {
> cond_resched();
> cpu_relax();
> +
> + if (time_after(end_time, jiffies)) {
> + /* TODO Reset vq if possible? */
> + virtio_break_device(vi->vdev);
> + break;
> + }
> }
>
> unlock:
> --
> 2.51.0
* Re: [RFC 2/2] vduse: lift restriction about net devices with CVQ
2025-10-07 13:06 ` [RFC 2/2] vduse: lift restriction about net devices with CVQ Eugenio Pérez
2025-10-09 13:14 ` Maxime Coquelin
@ 2025-10-14 8:31 ` Michael S. Tsirkin
2025-10-15 6:25 ` Eugenio Perez Martin
1 sibling, 1 reply; 45+ messages in thread
From: Michael S. Tsirkin @ 2025-10-14 8:31 UTC (permalink / raw)
To: Eugenio Pérez
Cc: Yongji Xie, virtualization, linux-kernel, Maxime Coquelin,
Xuan Zhuo, Dragos Tatulea DE, jasowang
On Tue, Oct 07, 2025 at 03:06:22PM +0200, Eugenio Pérez wrote:
> Now that the virtio_net driver is able to recover from a stall
> virtqueue,
it's not able to recover, is it?
> let's lift the restriction.
>
> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> ---
> drivers/vdpa/vdpa_user/vduse_dev.c | 3 ---
> 1 file changed, 3 deletions(-)
>
> diff --git a/drivers/vdpa/vdpa_user/vduse_dev.c b/drivers/vdpa/vdpa_user/vduse_dev.c
> index e7bced0b5542..95d2b898171d 100644
> --- a/drivers/vdpa/vdpa_user/vduse_dev.c
> +++ b/drivers/vdpa/vdpa_user/vduse_dev.c
> @@ -1726,9 +1726,6 @@ static bool features_is_valid(struct vduse_dev_config *config)
> if ((config->device_id == VIRTIO_ID_BLOCK) &&
> (config->features & BIT_ULL(VIRTIO_BLK_F_CONFIG_WCE)))
> return false;
> - else if ((config->device_id == VIRTIO_ID_NET) &&
> - (config->features & BIT_ULL(VIRTIO_NET_F_CTRL_VQ)))
> - return false;
>
> if ((config->device_id == VIRTIO_ID_NET) &&
> !(config->features & BIT_ULL(VIRTIO_F_VERSION_1)))
> --
> 2.51.0
* Re: [RFC 1/2] virtio_net: timeout control virtqueue commands
2025-10-14 8:29 ` Michael S. Tsirkin
@ 2025-10-14 9:14 ` Maxime Coquelin
2025-10-14 9:25 ` Michael S. Tsirkin
0 siblings, 1 reply; 45+ messages in thread
From: Maxime Coquelin @ 2025-10-14 9:14 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: Eugenio Pérez, Yongji Xie, virtualization, linux-kernel,
Xuan Zhuo, Dragos Tatulea DE, jasowang
On Tue, Oct 14, 2025 at 10:29 AM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Tue, Oct 07, 2025 at 03:06:21PM +0200, Eugenio Pérez wrote:
> > An userland device implemented through VDUSE could take rtnl forever if
> > the virtio-net driver is running on top of virtio_vdpa. Let's break the
> > device if it does not return the buffer in a longer-than-assumible
> > timeout.
>
> So now I can't debug qemu with gdb because guest dies :(
> Let's not break valid use-cases please.
>
>
> Instead, solve it in vduse, probably by handling cvq within
> kernel.
Would a shadow control virtqueue implementation in the VDUSE driver work?
It would systematically ack messages sent by the virtio-net driver,
on the assumption that the userspace application will ack them too.
If the userspace application later fails to handle a message,
could it somehow mark the device as broken?
Thanks,
Maxime
>
> > A less agressive path can be taken to recover the device, like only
> > resetting the control virtqueue. However, the state of the device after
> > this action is taken races, as the vq could be reset after the device
> > writes the OK. Leaving TODO anyway.
> >
> > Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> > ---
> > drivers/net/virtio_net.c | 10 ++++++++++
> > 1 file changed, 10 insertions(+)
> >
> > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > index 31bd32bdecaf..ed68ad69a019 100644
> > --- a/drivers/net/virtio_net.c
> > +++ b/drivers/net/virtio_net.c
> > @@ -3576,6 +3576,7 @@ static bool virtnet_send_command_reply(struct virtnet_info *vi, u8 class, u8 cmd
> > {
> > struct scatterlist *sgs[5], hdr, stat;
> > u32 out_num = 0, tmp, in_num = 0;
> > + unsigned long end_time;
> > bool ok;
> > int ret;
> >
> > @@ -3614,11 +3615,20 @@ static bool virtnet_send_command_reply(struct virtnet_info *vi, u8 class, u8 cmd
> >
> > /* Spin for a response, the kick causes an ioport write, trapping
> > * into the hypervisor, so the request should be handled immediately.
> > + *
> > + * Long timeout so a malicious device is not able to lock rtnl forever.
> > */
> > + end_time = jiffies + 30 * HZ;
> > while (!virtqueue_get_buf(vi->cvq, &tmp) &&
> > !virtqueue_is_broken(vi->cvq)) {
> > cond_resched();
> > cpu_relax();
> > +
> > + if (time_after(end_time, jiffies)) {
> > + /* TODO Reset vq if possible? */
> > + virtio_break_device(vi->vdev);
> > + break;
> > + }
> > }
> >
> > unlock:
> > --
> > 2.51.0
>
* Re: [RFC 1/2] virtio_net: timeout control virtqueue commands
2025-10-14 9:14 ` Maxime Coquelin
@ 2025-10-14 9:25 ` Michael S. Tsirkin
2025-10-14 10:21 ` Maxime Coquelin
2025-10-15 6:08 ` Eugenio Perez Martin
0 siblings, 2 replies; 45+ messages in thread
From: Michael S. Tsirkin @ 2025-10-14 9:25 UTC (permalink / raw)
To: Maxime Coquelin
Cc: Eugenio Pérez, Yongji Xie, virtualization, linux-kernel,
Xuan Zhuo, Dragos Tatulea DE, jasowang
On Tue, Oct 14, 2025 at 11:14:40AM +0200, Maxime Coquelin wrote:
> On Tue, Oct 14, 2025 at 10:29 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > On Tue, Oct 07, 2025 at 03:06:21PM +0200, Eugenio Pérez wrote:
> > > An userland device implemented through VDUSE could take rtnl forever if
> > > the virtio-net driver is running on top of virtio_vdpa. Let's break the
> > > device if it does not return the buffer in a longer-than-assumible
> > > timeout.
> >
> > So now I can't debug qemu with gdb because guest dies :(
> > Let's not break valid use-cases please.
> >
> >
> > Instead, solve it in vduse, probably by handling cvq within
> > kernel.
>
> Would a shadow control virtqueue implementation in the VDUSE driver work?
> It would ack systematically messages sent by the Virtio-net driver,
> and so assume the userspace application will Ack them.
>
> When the userspace application handles the message, if the handling fails,
> it somehow marks the device as broken?
>
> Thanks,
> Maxime
Yes but it's a bit more convoluted than just acking them.
Once you use the buffer you can get another one and so on
with no limit.
One fix is to actually maintain device state in the
kernel, update it, and then notify userspace.
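The idea of keeping device state in the kernel can be pictured with a small userspace model. Everything below is a hypothetical sketch, not VDUSE code: struct shadow_net_state, shadow_cvq_handle and the notify_pending flag are names invented for illustration, and only the MAC address command is modelled. The control command and status values are the standard virtio-net ones.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Standard virtio-net control virtqueue values. */
#define VIRTIO_NET_OK                 0
#define VIRTIO_NET_ERR                1
#define VIRTIO_NET_CTRL_MAC           1
#define VIRTIO_NET_CTRL_MAC_ADDR_SET  1

/* Hypothetical kernel-side copy of the device state. */
struct shadow_net_state {
	uint8_t mac[6];
	bool notify_pending; /* userspace still has to be told */
};

/* Apply one control command to the shadow state and return the ack.
 * The ack is computed without waiting for userspace, so a stalled
 * userspace device cannot keep the driver spinning. */
uint8_t shadow_cvq_handle(struct shadow_net_state *st, uint8_t cls,
			  uint8_t cmd, const uint8_t *data, size_t len)
{
	if (cls == VIRTIO_NET_CTRL_MAC &&
	    cmd == VIRTIO_NET_CTRL_MAC_ADDR_SET &&
	    len == sizeof(st->mac)) {
		memcpy(st->mac, data, sizeof(st->mac));
		st->notify_pending = true; /* notification happens later */
		return VIRTIO_NET_OK;
	}
	return VIRTIO_NET_ERR; /* unknown or malformed command */
}

void shadow_cvq_example(void)
{
	struct shadow_net_state st = { 0 };
	const uint8_t mac[6] = { 0x52, 0x54, 0x00, 0x12, 0x34, 0x56 };

	assert(shadow_cvq_handle(&st, VIRTIO_NET_CTRL_MAC,
				 VIRTIO_NET_CTRL_MAC_ADDR_SET,
				 mac, sizeof(mac)) == VIRTIO_NET_OK);
	assert(memcmp(st.mac, mac, sizeof(mac)) == 0);
	assert(st.notify_pending);
}
```

The design point the sketch tries to capture is that the status byte depends only on state the kernel owns, while delivery to userspace is decoupled and can lag without blocking the driver.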
> >
> > > A less agressive path can be taken to recover the device, like only
> > > resetting the control virtqueue. However, the state of the device after
> > > this action is taken races, as the vq could be reset after the device
> > > writes the OK. Leaving TODO anyway.
> > >
> > > Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> > > ---
> > > drivers/net/virtio_net.c | 10 ++++++++++
> > > 1 file changed, 10 insertions(+)
> > >
> > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > index 31bd32bdecaf..ed68ad69a019 100644
> > > --- a/drivers/net/virtio_net.c
> > > +++ b/drivers/net/virtio_net.c
> > > @@ -3576,6 +3576,7 @@ static bool virtnet_send_command_reply(struct virtnet_info *vi, u8 class, u8 cmd
> > > {
> > > struct scatterlist *sgs[5], hdr, stat;
> > > u32 out_num = 0, tmp, in_num = 0;
> > > + unsigned long end_time;
> > > bool ok;
> > > int ret;
> > >
> > > @@ -3614,11 +3615,20 @@ static bool virtnet_send_command_reply(struct virtnet_info *vi, u8 class, u8 cmd
> > >
> > > /* Spin for a response, the kick causes an ioport write, trapping
> > > * into the hypervisor, so the request should be handled immediately.
> > > + *
> > > + * Long timeout so a malicious device is not able to lock rtnl forever.
> > > */
> > > + end_time = jiffies + 30 * HZ;
> > > while (!virtqueue_get_buf(vi->cvq, &tmp) &&
> > > !virtqueue_is_broken(vi->cvq)) {
> > > cond_resched();
> > > cpu_relax();
> > > +
> > > + if (time_after(end_time, jiffies)) {
> > > + /* TODO Reset vq if possible? */
> > > + virtio_break_device(vi->vdev);
> > > + break;
> > > + }
> > > }
> > >
> > > unlock:
> > > --
> > > 2.51.0
> >
* Re: [RFC 1/2] virtio_net: timeout control virtqueue commands
2025-10-14 9:25 ` Michael S. Tsirkin
@ 2025-10-14 10:21 ` Maxime Coquelin
2025-10-15 4:44 ` Jason Wang
2025-10-15 6:08 ` Eugenio Perez Martin
1 sibling, 1 reply; 45+ messages in thread
From: Maxime Coquelin @ 2025-10-14 10:21 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: Eugenio Pérez, Yongji Xie, virtualization, linux-kernel,
Xuan Zhuo, Dragos Tatulea DE, jasowang
On Tue, Oct 14, 2025 at 11:25 AM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Tue, Oct 14, 2025 at 11:14:40AM +0200, Maxime Coquelin wrote:
> > On Tue, Oct 14, 2025 at 10:29 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> > >
> > > On Tue, Oct 07, 2025 at 03:06:21PM +0200, Eugenio Pérez wrote:
> > > > An userland device implemented through VDUSE could take rtnl forever if
> > > > the virtio-net driver is running on top of virtio_vdpa. Let's break the
> > > > device if it does not return the buffer in a longer-than-assumible
> > > > timeout.
> > >
> > > So now I can't debug qemu with gdb because guest dies :(
> > > Let's not break valid use-cases please.
> > >
> > >
> > > Instead, solve it in vduse, probably by handling cvq within
> > > kernel.
> >
> > Would a shadow control virtqueue implementation in the VDUSE driver work?
> > It would ack systematically messages sent by the Virtio-net driver,
> > and so assume the userspace application will Ack them.
> >
> > When the userspace application handles the message, if the handling fails,
> > it somehow marks the device as broken?
> >
> > Thanks,
> > Maxime
>
> Yes but it's a bit more convoluted than just acking them.
> Once you use the buffer you can get another one and so on
> with no limit.
> One fix is to actually maintain device state in the
> kernel, update it, and then notify userspace.
I agree, this is the way to go.
Thanks for your insights,
Maxime
>
>
> > >
> > > > A less agressive path can be taken to recover the device, like only
> > > > resetting the control virtqueue. However, the state of the device after
> > > > this action is taken races, as the vq could be reset after the device
> > > > writes the OK. Leaving TODO anyway.
> > > >
> > > > Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> > > > ---
> > > > drivers/net/virtio_net.c | 10 ++++++++++
> > > > 1 file changed, 10 insertions(+)
> > > >
> > > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > > index 31bd32bdecaf..ed68ad69a019 100644
> > > > --- a/drivers/net/virtio_net.c
> > > > +++ b/drivers/net/virtio_net.c
> > > > @@ -3576,6 +3576,7 @@ static bool virtnet_send_command_reply(struct virtnet_info *vi, u8 class, u8 cmd
> > > > {
> > > > struct scatterlist *sgs[5], hdr, stat;
> > > > u32 out_num = 0, tmp, in_num = 0;
> > > > + unsigned long end_time;
> > > > bool ok;
> > > > int ret;
> > > >
> > > > @@ -3614,11 +3615,20 @@ static bool virtnet_send_command_reply(struct virtnet_info *vi, u8 class, u8 cmd
> > > >
> > > > /* Spin for a response, the kick causes an ioport write, trapping
> > > > * into the hypervisor, so the request should be handled immediately.
> > > > + *
> > > > + * Long timeout so a malicious device is not able to lock rtnl forever.
> > > > */
> > > > + end_time = jiffies + 30 * HZ;
> > > > while (!virtqueue_get_buf(vi->cvq, &tmp) &&
> > > > !virtqueue_is_broken(vi->cvq)) {
> > > > cond_resched();
> > > > cpu_relax();
> > > > +
> > > > + if (time_after(end_time, jiffies)) {
> > > > + /* TODO Reset vq if possible? */
> > > > + virtio_break_device(vi->vdev);
> > > > + break;
> > > > + }
> > > > }
> > > >
> > > > unlock:
> > > > --
> > > > 2.51.0
> > >
>
* Re: [RFC 1/2] virtio_net: timeout control virtqueue commands
2025-10-14 10:21 ` Maxime Coquelin
@ 2025-10-15 4:44 ` Jason Wang
2025-10-15 6:07 ` Michael S. Tsirkin
0 siblings, 1 reply; 45+ messages in thread
From: Jason Wang @ 2025-10-15 4:44 UTC (permalink / raw)
To: Maxime Coquelin
Cc: Michael S. Tsirkin, Eugenio Pérez, Yongji Xie,
virtualization, linux-kernel, Xuan Zhuo, Dragos Tatulea DE
On Tue, Oct 14, 2025 at 6:21 PM Maxime Coquelin <mcoqueli@redhat.com> wrote:
>
> On Tue, Oct 14, 2025 at 11:25 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > On Tue, Oct 14, 2025 at 11:14:40AM +0200, Maxime Coquelin wrote:
> > > On Tue, Oct 14, 2025 at 10:29 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > >
> > > > On Tue, Oct 07, 2025 at 03:06:21PM +0200, Eugenio Pérez wrote:
> > > > > An userland device implemented through VDUSE could take rtnl forever if
> > > > > the virtio-net driver is running on top of virtio_vdpa. Let's break the
> > > > > device if it does not return the buffer in a longer-than-assumible
> > > > > timeout.
> > > >
> > > > So now I can't debug qemu with gdb because guest dies :(
> > > > Let's not break valid use-cases please.
> > > >
> > > >
> > > > Instead, solve it in vduse, probably by handling cvq within
> > > > kernel.
> > >
> > > Would a shadow control virtqueue implementation in the VDUSE driver work?
> > > It would ack systematically messages sent by the Virtio-net driver,
> > > and so assume the userspace application will Ack them.
> > >
> > > When the userspace application handles the message, if the handling fails,
> > > it somehow marks the device as broken?
> > >
> > > Thanks,
> > > Maxime
> >
> > Yes but it's a bit more convoluted than just acking them.
> > Once you use the buffer you can get another one and so on
> > with no limit.
> > One fix is to actually maintain device state in the
> > kernel, update it, and then notify userspace.
>
> I agree, this is the way to go.
>
> Thanks for your insights,
> Maxime
A timeout still needs to be considered in this case. Or am I missing something?
Thanks
>
> >
> >
> > > >
> > > > > A less agressive path can be taken to recover the device, like only
> > > > > resetting the control virtqueue. However, the state of the device after
> > > > > this action is taken races, as the vq could be reset after the device
> > > > > writes the OK. Leaving TODO anyway.
> > > > >
> > > > > Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> > > > > ---
> > > > > drivers/net/virtio_net.c | 10 ++++++++++
> > > > > 1 file changed, 10 insertions(+)
> > > > >
> > > > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > > > index 31bd32bdecaf..ed68ad69a019 100644
> > > > > --- a/drivers/net/virtio_net.c
> > > > > +++ b/drivers/net/virtio_net.c
> > > > > @@ -3576,6 +3576,7 @@ static bool virtnet_send_command_reply(struct virtnet_info *vi, u8 class, u8 cmd
> > > > > {
> > > > > struct scatterlist *sgs[5], hdr, stat;
> > > > > u32 out_num = 0, tmp, in_num = 0;
> > > > > + unsigned long end_time;
> > > > > bool ok;
> > > > > int ret;
> > > > >
> > > > > @@ -3614,11 +3615,20 @@ static bool virtnet_send_command_reply(struct virtnet_info *vi, u8 class, u8 cmd
> > > > >
> > > > > /* Spin for a response, the kick causes an ioport write, trapping
> > > > > * into the hypervisor, so the request should be handled immediately.
> > > > > + *
> > > > > + * Long timeout so a malicious device is not able to lock rtnl forever.
> > > > > */
> > > > > + end_time = jiffies + 30 * HZ;
> > > > > while (!virtqueue_get_buf(vi->cvq, &tmp) &&
> > > > > !virtqueue_is_broken(vi->cvq)) {
> > > > > cond_resched();
> > > > > cpu_relax();
> > > > > +
> > > > > + if (time_after(end_time, jiffies)) {
> > > > > + /* TODO Reset vq if possible? */
> > > > > + virtio_break_device(vi->vdev);
> > > > > + break;
> > > > > + }
> > > > > }
> > > > >
> > > > > unlock:
> > > > > --
> > > > > 2.51.0
> > > >
> >
>
* Re: [RFC 1/2] virtio_net: timeout control virtqueue commands
2025-10-15 4:44 ` Jason Wang
@ 2025-10-15 6:07 ` Michael S. Tsirkin
0 siblings, 0 replies; 45+ messages in thread
From: Michael S. Tsirkin @ 2025-10-15 6:07 UTC (permalink / raw)
To: Jason Wang
Cc: Maxime Coquelin, Eugenio Pérez, Yongji Xie, virtualization,
linux-kernel, Xuan Zhuo, Dragos Tatulea DE
On Wed, Oct 15, 2025 at 12:44:47PM +0800, Jason Wang wrote:
> On Tue, Oct 14, 2025 at 6:21 PM Maxime Coquelin <mcoqueli@redhat.com> wrote:
> >
> > On Tue, Oct 14, 2025 at 11:25 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> > >
> > > On Tue, Oct 14, 2025 at 11:14:40AM +0200, Maxime Coquelin wrote:
> > > > On Tue, Oct 14, 2025 at 10:29 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > >
> > > > > On Tue, Oct 07, 2025 at 03:06:21PM +0200, Eugenio Pérez wrote:
> > > > > > An userland device implemented through VDUSE could take rtnl forever if
> > > > > > the virtio-net driver is running on top of virtio_vdpa. Let's break the
> > > > > > device if it does not return the buffer in a longer-than-assumible
> > > > > > timeout.
> > > > >
> > > > > So now I can't debug qemu with gdb because guest dies :(
> > > > > Let's not break valid use-cases please.
> > > > >
> > > > >
> > > > > Instead, solve it in vduse, probably by handling cvq within
> > > > > kernel.
> > > >
> > > > Would a shadow control virtqueue implementation in the VDUSE driver work?
> > > > It would ack systematically messages sent by the Virtio-net driver,
> > > > and so assume the userspace application will Ack them.
> > > >
> > > > When the userspace application handles the message, if the handling fails,
> > > > it somehow marks the device as broken?
> > > >
> > > > Thanks,
> > > > Maxime
> > >
> > > Yes but it's a bit more convoluted than just acking them.
> > > Once you use the buffer you can get another one and so on
> > > with no limit.
> > > One fix is to actually maintain device state in the
> > > kernel, update it, and then notify userspace.
> >
> > I agree, this is the way to go.
> >
> > Thanks for your insights,
> > Maxime
>
> A timeout still needs to be considered in this case. Or I may miss something?
>
> Thanks
Not as such; the kernel can use buffers (semi) predictably.
> >
> > >
> > >
> > > > >
> > > > > > A less agressive path can be taken to recover the device, like only
> > > > > > resetting the control virtqueue. However, the state of the device after
> > > > > > this action is taken races, as the vq could be reset after the device
> > > > > > writes the OK. Leaving TODO anyway.
> > > > > >
> > > > > > Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> > > > > > ---
> > > > > > drivers/net/virtio_net.c | 10 ++++++++++
> > > > > > 1 file changed, 10 insertions(+)
> > > > > >
> > > > > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > > > > index 31bd32bdecaf..ed68ad69a019 100644
> > > > > > --- a/drivers/net/virtio_net.c
> > > > > > +++ b/drivers/net/virtio_net.c
> > > > > > @@ -3576,6 +3576,7 @@ static bool virtnet_send_command_reply(struct virtnet_info *vi, u8 class, u8 cmd
> > > > > > {
> > > > > > struct scatterlist *sgs[5], hdr, stat;
> > > > > > u32 out_num = 0, tmp, in_num = 0;
> > > > > > + unsigned long end_time;
> > > > > > bool ok;
> > > > > > int ret;
> > > > > >
> > > > > > @@ -3614,11 +3615,20 @@ static bool virtnet_send_command_reply(struct virtnet_info *vi, u8 class, u8 cmd
> > > > > >
> > > > > > /* Spin for a response, the kick causes an ioport write, trapping
> > > > > > * into the hypervisor, so the request should be handled immediately.
> > > > > > + *
> > > > > > + * Long timeout so a malicious device is not able to lock rtnl forever.
> > > > > > */
> > > > > > + end_time = jiffies + 30 * HZ;
> > > > > > while (!virtqueue_get_buf(vi->cvq, &tmp) &&
> > > > > > !virtqueue_is_broken(vi->cvq)) {
> > > > > > cond_resched();
> > > > > > cpu_relax();
> > > > > > +
> > > > > > + if (time_after(end_time, jiffies)) {
> > > > > > + /* TODO Reset vq if possible? */
> > > > > > + virtio_break_device(vi->vdev);
> > > > > > + break;
> > > > > > + }
> > > > > > }
> > > > > >
> > > > > > unlock:
> > > > > > --
> > > > > > 2.51.0
> > > > >
> > >
> >
* Re: [RFC 1/2] virtio_net: timeout control virtqueue commands
2025-10-14 9:25 ` Michael S. Tsirkin
2025-10-14 10:21 ` Maxime Coquelin
@ 2025-10-15 6:08 ` Eugenio Perez Martin
2025-10-15 6:33 ` Michael S. Tsirkin
1 sibling, 1 reply; 45+ messages in thread
From: Eugenio Perez Martin @ 2025-10-15 6:08 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: Maxime Coquelin, Yongji Xie, virtualization, linux-kernel,
Xuan Zhuo, Dragos Tatulea DE, jasowang
On Tue, Oct 14, 2025 at 11:25 AM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Tue, Oct 14, 2025 at 11:14:40AM +0200, Maxime Coquelin wrote:
> > On Tue, Oct 14, 2025 at 10:29 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> > >
> > > On Tue, Oct 07, 2025 at 03:06:21PM +0200, Eugenio Pérez wrote:
> > > > An userland device implemented through VDUSE could take rtnl forever if
> > > > the virtio-net driver is running on top of virtio_vdpa. Let's break the
> > > > device if it does not return the buffer in a longer-than-assumible
> > > > timeout.
> > >
> > > So now I can't debug qemu with gdb because guest dies :(
> > > Let's not break valid use-cases please.
> > >
> > >
> > > Instead, solve it in vduse, probably by handling cvq within
> > > kernel.
> >
> > Would a shadow control virtqueue implementation in the VDUSE driver work?
> > It would ack systematically messages sent by the Virtio-net driver,
> > and so assume the userspace application will Ack them.
> >
> > When the userspace application handles the message, if the handling fails,
> > it somehow marks the device as broken?
> >
> > Thanks,
> > Maxime
>
> Yes but it's a bit more convoluted than just acking them.
> Once you use the buffer you can get another one and so on
> with no limit.
> One fix is to actually maintain device state in the
> kernel, update it, and then notify userspace.
>
I thought of implementing this approach at first, but it has two drawbacks.

The first one: it's racy. Let's say the driver updates the MAC filter,
a VDUSE timeout occurs, the guest receives the failure, and then the
device replies with an OK. There is no way for the device or VDUSE to
update the driver.

The second one: what should we do when the VDUSE CVQ runs out of
descriptors? While the driver has its descriptor returned with
VIRTIO_NET_ERR, the VDUSE CVQ still has the descriptor available. If
this process repeats until all of the VDUSE CVQ descriptors have been
made available, how can we proceed?

I think both of them can be solved with the DEVICE_NEEDS_RESET status
bit, but it is not implemented in the drivers at this moment.
* Re: [RFC 2/2] vduse: lift restriction about net devices with CVQ
2025-10-09 13:14 ` Maxime Coquelin
@ 2025-10-15 6:11 ` Eugenio Perez Martin
0 siblings, 0 replies; 45+ messages in thread
From: Eugenio Perez Martin @ 2025-10-15 6:11 UTC (permalink / raw)
To: Maxime Coquelin
Cc: mst, Yongji Xie, virtualization, linux-kernel, Maxime Coquelin,
Xuan Zhuo, Dragos Tatulea DE, jasowang
On Thu, Oct 9, 2025 at 3:15 PM Maxime Coquelin
<maxime.coquelin@redhat.com> wrote:
>
>
>
> On 10/7/25 3:06 PM, Eugenio Pérez wrote:
> > Now that the virtio_net driver is able to recover from a stall
> > virtqueue, let's lift the restriction.
> >
> > Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> > ---
> > drivers/vdpa/vdpa_user/vduse_dev.c | 3 ---
> > 1 file changed, 3 deletions(-)
> >
> > diff --git a/drivers/vdpa/vdpa_user/vduse_dev.c b/drivers/vdpa/vdpa_user/vduse_dev.c
> > index e7bced0b5542..95d2b898171d 100644
> > --- a/drivers/vdpa/vdpa_user/vduse_dev.c
> > +++ b/drivers/vdpa/vdpa_user/vduse_dev.c
> > @@ -1726,9 +1726,6 @@ static bool features_is_valid(struct vduse_dev_config *config)
> > if ((config->device_id == VIRTIO_ID_BLOCK) &&
> > (config->features & BIT_ULL(VIRTIO_BLK_F_CONFIG_WCE)))
> > return false;
> > - else if ((config->device_id == VIRTIO_ID_NET) &&
> > - (config->features & BIT_ULL(VIRTIO_NET_F_CTRL_VQ)))
> > - return false;
> >
> > if ((config->device_id == VIRTIO_ID_NET) &&
> > !(config->features & BIT_ULL(VIRTIO_F_VERSION_1)))
>
> I wonder whether the API version should be increased, otherwise I don't
> see how the app creating the VDUSE device knows whether it can safely
> advertises the CVQ support (except without doing trial and error).
>
Ok good point! I'll do it in the next version.
* Re: [RFC 2/2] vduse: lift restriction about net devices with CVQ
2025-10-14 8:31 ` Michael S. Tsirkin
@ 2025-10-15 6:25 ` Eugenio Perez Martin
0 siblings, 0 replies; 45+ messages in thread
From: Eugenio Perez Martin @ 2025-10-15 6:25 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: Yongji Xie, virtualization, linux-kernel, Maxime Coquelin,
Xuan Zhuo, Dragos Tatulea DE, jasowang
On Tue, Oct 14, 2025 at 10:31 AM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Tue, Oct 07, 2025 at 03:06:22PM +0200, Eugenio Pérez wrote:
> > Now that the virtio_net driver is able to recover from a stall
> > virtqueue,
>
> it's not able to recover, is it?
>
Maybe recover is not the best word here :). s/recover from a stall
virtqueue/unlock the RTNL from a stalled control virtqueue/.
* Re: [RFC 1/2] virtio_net: timeout control virtqueue commands
2025-10-15 6:08 ` Eugenio Perez Martin
@ 2025-10-15 6:33 ` Michael S. Tsirkin
2025-10-15 6:52 ` Eugenio Perez Martin
0 siblings, 1 reply; 45+ messages in thread
From: Michael S. Tsirkin @ 2025-10-15 6:33 UTC (permalink / raw)
To: Eugenio Perez Martin
Cc: Maxime Coquelin, Yongji Xie, virtualization, linux-kernel,
Xuan Zhuo, Dragos Tatulea DE, jasowang
On Wed, Oct 15, 2025 at 08:08:31AM +0200, Eugenio Perez Martin wrote:
> On Tue, Oct 14, 2025 at 11:25 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > On Tue, Oct 14, 2025 at 11:14:40AM +0200, Maxime Coquelin wrote:
> > > On Tue, Oct 14, 2025 at 10:29 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > >
> > > > On Tue, Oct 07, 2025 at 03:06:21PM +0200, Eugenio Pérez wrote:
> > > > > An userland device implemented through VDUSE could take rtnl forever if
> > > > > the virtio-net driver is running on top of virtio_vdpa. Let's break the
> > > > > device if it does not return the buffer in a longer-than-assumible
> > > > > timeout.
> > > >
> > > > So now I can't debug qemu with gdb because guest dies :(
> > > > Let's not break valid use-cases please.
> > > >
> > > >
> > > > Instead, solve it in vduse, probably by handling cvq within
> > > > kernel.
> > >
> > > Would a shadow control virtqueue implementation in the VDUSE driver work?
> > > It would ack systematically messages sent by the Virtio-net driver,
> > > and so assume the userspace application will Ack them.
> > >
> > > When the userspace application handles the message, if the handling fails,
> > > it somehow marks the device as broken?
> > >
> > > Thanks,
> > > Maxime
> >
> > Yes but it's a bit more convoluted than just acking them.
> > Once you use the buffer you can get another one and so on
> > with no limit.
> > One fix is to actually maintain device state in the
> > kernel, update it, and then notify userspace.
> >
>
> I thought of implementing this approach at first, but it has two drawbacks.
>
> The first one: it's racy. Let's say the driver updates the MAC filter,
> VDUSE timeout occurs, the guest receives the fail, and then the device
> replies with an OK. There is no way for the device or VDUSE to update
> the driver.
There's no timeout. Kernel can guarantee executing all requests.
>
> The second one, what to do when the VDUSE cvq runs out of descriptors?
> While the driver has its descriptor returned with VIRTIO_NET_ERR, the
> VDUSE CVQ has the descriptor available. If this process repeats to
> make available all of the VDUSE CVQ descriptors, how can we proceed?
There's no reason to return VIRTIO_NET_ERR ever and cvq will not run
out of descriptors. Kernel uses cvq buffers.
> I think both of them can be solved with the DEVICE_NEEDS_RESET status
> bit, but it is not implemented in the drivers at this moment.
No need for a reset, either.
* Re: [RFC 1/2] virtio_net: timeout control virtqueue commands
2025-10-15 6:33 ` Michael S. Tsirkin
@ 2025-10-15 6:52 ` Eugenio Perez Martin
2025-10-15 7:04 ` Michael S. Tsirkin
0 siblings, 1 reply; 45+ messages in thread
From: Eugenio Perez Martin @ 2025-10-15 6:52 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: Maxime Coquelin, Yongji Xie, virtualization, linux-kernel,
Xuan Zhuo, Dragos Tatulea DE, jasowang
On Wed, Oct 15, 2025 at 8:33 AM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Wed, Oct 15, 2025 at 08:08:31AM +0200, Eugenio Perez Martin wrote:
> > On Tue, Oct 14, 2025 at 11:25 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> > >
> > > On Tue, Oct 14, 2025 at 11:14:40AM +0200, Maxime Coquelin wrote:
> > > > On Tue, Oct 14, 2025 at 10:29 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > >
> > > > > On Tue, Oct 07, 2025 at 03:06:21PM +0200, Eugenio Pérez wrote:
> > > > > > An userland device implemented through VDUSE could take rtnl forever if
> > > > > > the virtio-net driver is running on top of virtio_vdpa. Let's break the
> > > > > > device if it does not return the buffer in a longer-than-assumible
> > > > > > timeout.
> > > > >
> > > > > So now I can't debug qemu with gdb because guest dies :(
> > > > > Let's not break valid use-cases please.
> > > > >
> > > > >
> > > > > Instead, solve it in vduse, probably by handling cvq within
> > > > > kernel.
> > > >
> > > > Would a shadow control virtqueue implementation in the VDUSE driver work?
> > > > It would ack systematically messages sent by the Virtio-net driver,
> > > > and so assume the userspace application will Ack them.
> > > >
> > > > When the userspace application handles the message, if the handling fails,
> > > > it somehow marks the device as broken?
> > > >
> > > > Thanks,
> > > > Maxime
> > >
> > > Yes but it's a bit more convoluted than just acking them.
> > > Once you use the buffer you can get another one and so on
> > > with no limit.
> > > One fix is to actually maintain device state in the
> > > kernel, update it, and then notify userspace.
> > >
> >
> > I thought of implementing this approach at first, but it has two drawbacks.
> >
> > The first one: it's racy. Let's say the driver updates the MAC filter,
> > VDUSE timeout occurs, the guest receives the fail, and then the device
> > replies with an OK. There is no way for the device or VDUSE to update
> > the driver.
>
> There's no timeout. Kernel can guarantee executing all requests.
>
I don't follow this. How should the VDUSE kernel module act if the
VDUSE userland device does not use the CVQ buffer then?
>
>
> >
> > The second one, what to do when the VDUSE cvq runs out of descriptors?
> > While the driver has its descriptor returned with VIRTIO_NET_ERR, the
> > VDUSE CVQ has the descriptor available. If this process repeats to
> > make available all of the VDUSE CVQ descriptors, how can we proceed?
>
> There's no reason to return VIRTIO_NET_ERR ever and cvq will not run
> out of descriptors. Kernel uses cvq buffers.
>
>
> > I think both of them can be solved with the DEVICE_NEEDS_RESET status
> > bit, but it is not implemented in the drivers at this moment.
>
> No need for a reset, either.
>
* Re: [RFC 1/2] virtio_net: timeout control virtqueue commands
2025-10-15 6:52 ` Eugenio Perez Martin
@ 2025-10-15 7:04 ` Michael S. Tsirkin
2025-10-15 7:45 ` Eugenio Perez Martin
0 siblings, 1 reply; 45+ messages in thread
From: Michael S. Tsirkin @ 2025-10-15 7:04 UTC (permalink / raw)
To: Eugenio Perez Martin
Cc: Maxime Coquelin, Yongji Xie, virtualization, linux-kernel,
Xuan Zhuo, Dragos Tatulea DE, jasowang
On Wed, Oct 15, 2025 at 08:52:50AM +0200, Eugenio Perez Martin wrote:
> On Wed, Oct 15, 2025 at 8:33 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > On Wed, Oct 15, 2025 at 08:08:31AM +0200, Eugenio Perez Martin wrote:
> > > On Tue, Oct 14, 2025 at 11:25 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > >
> > > > On Tue, Oct 14, 2025 at 11:14:40AM +0200, Maxime Coquelin wrote:
> > > > > On Tue, Oct 14, 2025 at 10:29 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > >
> > > > > > On Tue, Oct 07, 2025 at 03:06:21PM +0200, Eugenio Pérez wrote:
> > > > > > > An userland device implemented through VDUSE could take rtnl forever if
> > > > > > > the virtio-net driver is running on top of virtio_vdpa. Let's break the
> > > > > > > device if it does not return the buffer in a longer-than-assumible
> > > > > > > timeout.
> > > > > >
> > > > > > So now I can't debug qemu with gdb because guest dies :(
> > > > > > Let's not break valid use-cases please.
> > > > > >
> > > > > >
> > > > > > Instead, solve it in vduse, probably by handling cvq within
> > > > > > kernel.
> > > > >
> > > > > Would a shadow control virtqueue implementation in the VDUSE driver work?
> > > > > It would ack systematically messages sent by the Virtio-net driver,
> > > > > and so assume the userspace application will Ack them.
> > > > >
> > > > > When the userspace application handles the message, if the handling fails,
> > > > > it somehow marks the device as broken?
> > > > >
> > > > > Thanks,
> > > > > Maxime
> > > >
> > > > Yes but it's a bit more convoluted than just acking them.
> > > > Once you use the buffer you can get another one and so on
> > > > with no limit.
> > > > One fix is to actually maintain device state in the
> > > > kernel, update it, and then notify userspace.
> > > >
> > >
> > > I thought of implementing this approach at first, but it has two drawbacks.
> > >
> > > The first one: it's racy. Let's say the driver updates the MAC filter,
> > > VDUSE timeout occurs, the guest receives the fail, and then the device
> > > replies with an OK. There is no way for the device or VDUSE to update
> > > the driver.
> >
> > There's no timeout. Kernel can guarantee executing all requests.
> >
>
> I don't follow this. How should the VDUSE kernel module act if the
> VDUSE userland device does not use the CVQ buffer then?
First, I am not sure a VQ is the best interface for talking to userspace.
But assuming it is, just avoid sending more data; send it later, after
userspace has used the buffer.
> >
> >
> > >
> > > The second one, what to do when the VDUSE cvq runs out of descriptors?
> > > While the driver has its descriptor returned with VIRTIO_NET_ERR, the
> > > VDUSE CVQ has the descriptor available. If this process repeats to
> > > make available all of the VDUSE CVQ descriptors, how can we proceed?
> >
> > There's no reason to return VIRTIO_NET_ERR ever and cvq will not run
> > out of descriptors. Kernel uses cvq buffers.
> >
> >
> > > I think both of them can be solved with the DEVICE_NEEDS_RESET status
> > > bit, but it is not implemented in the drivers at this moment.
> >
> > No need for a reset, either.
> >
* Re: [RFC 1/2] virtio_net: timeout control virtqueue commands
2025-10-15 7:04 ` Michael S. Tsirkin
@ 2025-10-15 7:45 ` Eugenio Perez Martin
2025-10-15 8:03 ` Maxime Coquelin
0 siblings, 1 reply; 45+ messages in thread
From: Eugenio Perez Martin @ 2025-10-15 7:45 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: Maxime Coquelin, Yongji Xie, virtualization, linux-kernel,
Xuan Zhuo, Dragos Tatulea DE, jasowang
On Wed, Oct 15, 2025 at 9:05 AM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Wed, Oct 15, 2025 at 08:52:50AM +0200, Eugenio Perez Martin wrote:
> > On Wed, Oct 15, 2025 at 8:33 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> > >
> > > On Wed, Oct 15, 2025 at 08:08:31AM +0200, Eugenio Perez Martin wrote:
> > > > On Tue, Oct 14, 2025 at 11:25 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > >
> > > > > On Tue, Oct 14, 2025 at 11:14:40AM +0200, Maxime Coquelin wrote:
> > > > > > On Tue, Oct 14, 2025 at 10:29 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > > >
> > > > > > > On Tue, Oct 07, 2025 at 03:06:21PM +0200, Eugenio Pérez wrote:
> > > > > > > > An userland device implemented through VDUSE could take rtnl forever if
> > > > > > > > the virtio-net driver is running on top of virtio_vdpa. Let's break the
> > > > > > > > device if it does not return the buffer in a longer-than-assumible
> > > > > > > > timeout.
> > > > > > >
> > > > > > > So now I can't debug qemu with gdb because guest dies :(
> > > > > > > Let's not break valid use-cases please.
> > > > > > >
> > > > > > >
> > > > > > > Instead, solve it in vduse, probably by handling cvq within
> > > > > > > kernel.
> > > > > >
> > > > > > Would a shadow control virtqueue implementation in the VDUSE driver work?
> > > > > > It would ack systematically messages sent by the Virtio-net driver,
> > > > > > and so assume the userspace application will Ack them.
> > > > > >
> > > > > > When the userspace application handles the message, if the handling fails,
> > > > > > it somehow marks the device as broken?
> > > > > >
> > > > > > Thanks,
> > > > > > Maxime
> > > > >
> > > > > Yes but it's a bit more convoluted than just acking them.
> > > > > Once you use the buffer you can get another one and so on
> > > > > with no limit.
> > > > > One fix is to actually maintain device state in the
> > > > > kernel, update it, and then notify userspace.
> > > > >
> > > >
> > > > I thought of implementing this approach at first, but it has two drawbacks.
> > > >
> > > > The first one: it's racy. Let's say the driver updates the MAC filter,
> > > > VDUSE timeout occurs, the guest receives the fail, and then the device
> > > > replies with an OK. There is no way for the device or VDUSE to update
> > > > the driver.
> > >
> > > There's no timeout. Kernel can guarantee executing all requests.
> > >
> >
> > I don't follow this. How should the VDUSE kernel module act if the
> > VDUSE userland device does not use the CVQ buffer then?
>
> First I am not sure a VQ is the best interface for talking to userspace.
> But assuming yes - just avoid sending more data, send it later after
> userspace used the buffer.
>
Let me take a step back, I think I didn't describe the scenario well enough.

We have a VDUSE device, and then the same host is interacting with the
device through the virtio_net driver over virtio_vdpa.

Then, the virtio_net driver sends a control command through its CVQ, so
it *takes the RTNL*. That command reaches the VDUSE CVQ.

It does not matter if the VDUSE device in the userland processes the
commands through a CVQ, reading the vduse character device, or another
system. The question is: what to do if the VDUSE device does not
process that command in a timely manner? Should we just let the RTNL
be taken forever?
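
To make the question concrete, here is a minimal userspace sketch of a
bounded wait, assuming a wrap-safe tick comparison in the spirit of the
kernel's time_after(). It is not the driver code: tick_after(),
struct fake_dev and wait_for_reply() are hypothetical names used only
for illustration, and the "device" is just a tick at which the reply
shows up.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-in for a wrap-safe "a is later than b" check on a
 * free-running tick counter (hypothetical helper, not a kernel API). */
bool tick_after(unsigned long a, unsigned long b)
{
	return (long)(b - a) < 0;
}

/* Hypothetical device model: the reply becomes visible at tick done_at. */
struct fake_dev {
	unsigned long done_at;
};

enum wait_result { WAIT_DONE, WAIT_TIMED_OUT };

/* Poll one tick at a time, starting at tick `now`, until the device
 * replies or more than `timeout` ticks have elapsed. */
enum wait_result wait_for_reply(const struct fake_dev *dev,
				unsigned long now, unsigned long timeout)
{
	unsigned long deadline = now + timeout;

	assert(dev != NULL);
	for (;; now++) {
		/* done_at is not in the future any more: reply arrived */
		if (!tick_after(dev->done_at, now))
			return WAIT_DONE;
		/* current tick is past the deadline: give up */
		if (tick_after(now, deadline))
			return WAIT_TIMED_OUT;
	}
}
```

Note the argument order in the timeout check: tick_after(now, deadline)
only becomes true once the deadline has passed, whereas
tick_after(deadline, now) is already true on the first iteration.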
* Re: [RFC 1/2] virtio_net: timeout control virtqueue commands
2025-10-15 7:45 ` Eugenio Perez Martin
@ 2025-10-15 8:03 ` Maxime Coquelin
2025-10-15 8:09 ` Michael S. Tsirkin
0 siblings, 1 reply; 45+ messages in thread
From: Maxime Coquelin @ 2025-10-15 8:03 UTC (permalink / raw)
To: Eugenio Perez Martin
Cc: Michael S. Tsirkin, Yongji Xie, virtualization, linux-kernel,
Xuan Zhuo, Dragos Tatulea DE, jasowang
On Wed, Oct 15, 2025 at 9:45 AM Eugenio Perez Martin
<eperezma@redhat.com> wrote:
>
> On Wed, Oct 15, 2025 at 9:05 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > On Wed, Oct 15, 2025 at 08:52:50AM +0200, Eugenio Perez Martin wrote:
> > > On Wed, Oct 15, 2025 at 8:33 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > >
> > > > On Wed, Oct 15, 2025 at 08:08:31AM +0200, Eugenio Perez Martin wrote:
> > > > > On Tue, Oct 14, 2025 at 11:25 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > >
> > > > > > On Tue, Oct 14, 2025 at 11:14:40AM +0200, Maxime Coquelin wrote:
> > > > > > > On Tue, Oct 14, 2025 at 10:29 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > > > >
> > > > > > > > On Tue, Oct 07, 2025 at 03:06:21PM +0200, Eugenio Pérez wrote:
> > > > > > > > > An userland device implemented through VDUSE could take rtnl forever if
> > > > > > > > > the virtio-net driver is running on top of virtio_vdpa. Let's break the
> > > > > > > > > device if it does not return the buffer in a longer-than-assumible
> > > > > > > > > timeout.
> > > > > > > >
> > > > > > > > So now I can't debug qemu with gdb because guest dies :(
> > > > > > > > Let's not break valid use-cases please.
> > > > > > > >
> > > > > > > >
> > > > > > > > Instead, solve it in vduse, probably by handling cvq within
> > > > > > > > kernel.
> > > > > > >
> > > > > > > Would a shadow control virtqueue implementation in the VDUSE driver work?
> > > > > > > It would ack systematically messages sent by the Virtio-net driver,
> > > > > > > and so assume the userspace application will Ack them.
> > > > > > >
> > > > > > > When the userspace application handles the message, if the handling fails,
> > > > > > > it somehow marks the device as broken?
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Maxime
> > > > > >
> > > > > > Yes but it's a bit more convoluted than just acking them.
> > > > > > Once you use the buffer you can get another one and so on
> > > > > > with no limit.
> > > > > > One fix is to actually maintain device state in the
> > > > > > kernel, update it, and then notify userspace.
> > > > > >
> > > > >
> > > > > I thought of implementing this approach at first, but it has two drawbacks.
> > > > >
> > > > > The first one: it's racy. Let's say the driver updates the MAC filter,
> > > > > VDUSE timeout occurs, the guest receives the fail, and then the device
> > > > > replies with an OK. There is no way for the device or VDUSE to update
> > > > > the driver.
> > > >
> > > > There's no timeout. Kernel can guarantee executing all requests.
> > > >
> > >
> > > I don't follow this. How should the VDUSE kernel module act if the
> > > VDUSE userland device does not use the CVQ buffer then?
> >
> > First I am not sure a VQ is the best interface for talking to userspace.
> > But assuming yes - just avoid sending more data, send it later after
> > userspace used the buffer.
> >
>
> Let me take a step back, I think I didn't describe the scenario well enough.
>
> We have a VDUSE device, and then the same host is interacting with the
> device through the virtio_net driver over virtio_vdpa.
>
> Then, the virtio_net driver sends a control command though its CVQ, so
> it *takes the RTNL*. That command reaches the VDUSE CVQ.
>
> It does not matter if the VDUSE device in the userland processes the
> commands through a CVQ, reading the vduse character device, or another
> system. The question is: what to do if the VDUSE device does not
> process that command in a timely manner? Should we just let the RTNL
> be taken forever?
>
My understanding is that:
1. Virtio-net sends a control message and waits for the reply
2. The VDUSE driver dequeues it, adds it to the SCVQ, and replies OK on the CVQ
3. The userspace application dequeues the message from the SCVQ
   a. If handling is successful, it replies OK
   b. If handling fails, it replies ERROR
4. The VDUSE driver reads the reply
   a. If OK, do nothing
   b. If ERROR, mark the device as broken?

This is simplified, as it does not take into account SCVQ overflow if
the application is stuck.
IIUC, Michael suggests to only enqueue a single message at a time in
the SVQ, and to bufferize the pending messages in the VDUSE driver.
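
The flow above can be sketched in plain userspace C, as a rough model
only. Everything here is an assumption made for illustration: struct
scvq, SCVQ_SIZE, scvq_forward() and scvq_complete() are hypothetical
names, commands are plain ints, and the real VDUSE driver has no such
interface today.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SCVQ_SIZE 4	/* hypothetical shadow queue depth */

enum ack { ACK_OK, ACK_ERR };

struct scvq {
	int cmds[SCVQ_SIZE];
	size_t head, tail;	/* tail - head == commands in flight */
	bool broken;
};

/* Step 2: queue the command for userspace and ack the driver right away.
 * Returns false when the device is broken or the shadow queue is full,
 * which is the overflow case the simplified flow leaves open. */
bool scvq_forward(struct scvq *q, int cmd)
{
	assert(q != NULL);
	if (q->broken || q->tail - q->head == SCVQ_SIZE)
		return false;
	q->cmds[q->tail++ % SCVQ_SIZE] = cmd;
	return true;		/* the driver sees OK at this point */
}

/* Steps 3 and 4: userspace answers the oldest command; an ERROR reply
 * marks the device as broken. Returns false if nothing was in flight. */
bool scvq_complete(struct scvq *q, enum ack reply)
{
	assert(q != NULL);
	if (q->head == q->tail)
		return false;
	q->head++;
	if (reply == ACK_ERR)
		q->broken = true;
	return true;
}
```

In this model a stuck application makes scvq_forward() fail after
SCVQ_SIZE commands, so the driver-facing side still needs a policy for
that case.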
* Re: [RFC 1/2] virtio_net: timeout control virtqueue commands
2025-10-15 8:03 ` Maxime Coquelin
@ 2025-10-15 8:09 ` Michael S. Tsirkin
2025-10-15 9:16 ` Maxime Coquelin
2025-10-15 10:36 ` Eugenio Perez Martin
0 siblings, 2 replies; 45+ messages in thread
From: Michael S. Tsirkin @ 2025-10-15 8:09 UTC (permalink / raw)
To: Maxime Coquelin
Cc: Eugenio Perez Martin, Yongji Xie, virtualization, linux-kernel,
Xuan Zhuo, Dragos Tatulea DE, jasowang
On Wed, Oct 15, 2025 at 10:03:49AM +0200, Maxime Coquelin wrote:
> On Wed, Oct 15, 2025 at 9:45 AM Eugenio Perez Martin
> <eperezma@redhat.com> wrote:
> >
> > On Wed, Oct 15, 2025 at 9:05 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> > >
> > > On Wed, Oct 15, 2025 at 08:52:50AM +0200, Eugenio Perez Martin wrote:
> > > > On Wed, Oct 15, 2025 at 8:33 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > >
> > > > > On Wed, Oct 15, 2025 at 08:08:31AM +0200, Eugenio Perez Martin wrote:
> > > > > > On Tue, Oct 14, 2025 at 11:25 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > > >
> > > > > > > On Tue, Oct 14, 2025 at 11:14:40AM +0200, Maxime Coquelin wrote:
> > > > > > > > On Tue, Oct 14, 2025 at 10:29 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > > > > >
> > > > > > > > > On Tue, Oct 07, 2025 at 03:06:21PM +0200, Eugenio Pérez wrote:
> > > > > > > > > > An userland device implemented through VDUSE could take rtnl forever if
> > > > > > > > > > the virtio-net driver is running on top of virtio_vdpa. Let's break the
> > > > > > > > > > device if it does not return the buffer in a longer-than-assumible
> > > > > > > > > > timeout.
> > > > > > > > >
> > > > > > > > > So now I can't debug qemu with gdb because guest dies :(
> > > > > > > > > Let's not break valid use-cases please.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > Instead, solve it in vduse, probably by handling cvq within
> > > > > > > > > kernel.
> > > > > > > >
> > > > > > > > Would a shadow control virtqueue implementation in the VDUSE driver work?
> > > > > > > > It would ack systematically messages sent by the Virtio-net driver,
> > > > > > > > and so assume the userspace application will Ack them.
> > > > > > > >
> > > > > > > > When the userspace application handles the message, if the handling fails,
> > > > > > > > it somehow marks the device as broken?
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > > Maxime
> > > > > > >
> > > > > > > Yes but it's a bit more convoluted than just acking them.
> > > > > > > Once you use the buffer you can get another one and so on
> > > > > > > with no limit.
> > > > > > > One fix is to actually maintain device state in the
> > > > > > > kernel, update it, and then notify userspace.
> > > > > > >
> > > > > >
> > > > > > I thought of implementing this approach at first, but it has two drawbacks.
> > > > > >
> > > > > > The first one: it's racy. Let's say the driver updates the MAC filter,
> > > > > > VDUSE timeout occurs, the guest receives the fail, and then the device
> > > > > > replies with an OK. There is no way for the device or VDUSE to update
> > > > > > the driver.
> > > > >
> > > > > There's no timeout. Kernel can guarantee executing all requests.
> > > > >
> > > >
> > > > I don't follow this. How should the VDUSE kernel module act if the
> > > > VDUSE userland device does not use the CVQ buffer then?
> > >
> > > First I am not sure a VQ is the best interface for talking to userspace.
> > > But assuming yes - just avoid sending more data, send it later after
> > > userspace used the buffer.
> > >
> >
> > Let me take a step back, I think I didn't describe the scenario well enough.
> >
> > We have a VDUSE device, and then the same host is interacting with the
> > device through the virtio_net driver over virtio_vdpa.
> >
> > Then, the virtio_net driver sends a control command though its CVQ, so
> > it *takes the RTNL*. That command reaches the VDUSE CVQ.
> >
> > It does not matter if the VDUSE device in the userland processes the
> > commands through a CVQ, reading the vduse character device, or another
> > system. The question is: what to do if the VDUSE device does not
> > process that command in a timely manner? Should we just let the RTNL
> > be taken forever?
> >
>
> My understanding is that:
> 1. Virtio-net sends a control messages, waits for reply
> 2. VDUSE driver dequeues it, adds it to the SCVQ, replies OK to the CVQ
> 3. Userspace application dequeues the message from the SCVQ
> a. If handling is successful it replies OK
> b. If handling fails, replies ERROR
> 4. VDUSE driver reads the reply
> a. if OK, do nothing
> b. if ERROR, mark the device as broken?
>
> This is simplified as it does not take into account SCVQ overflow if
> the application is stuck.
> If IIUC, Michael suggests to only enqueue a single message at the time
> in the SVQ,
> and bufferize the pending messages in the VDUSE driver.
Not exactly bufferize, record. E.g. we do not need to send
100 messages to enable/disable promisc mode - together they
have no effect.
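
A minimal sketch of "record, not bufferize", assuming the kernel keeps
only the latest promisc value plus what userspace was last told, with
at most one notification outstanding. The names (struct cvq_state,
driver_set_promisc(), userspace_done()) are hypothetical and the
notification is reduced to a counter; this is an illustration, not a
proposed implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct cvq_state {
	bool promisc;		/* latest value requested by the driver */
	bool sent_promisc;	/* last value passed on to userspace */
	bool in_flight;		/* one notification is outstanding */
	unsigned int notified;	/* notifications sent, for illustration */
};

/* Send a notification only if userspace is idle and its view is stale. */
void maybe_notify(struct cvq_state *s)
{
	if (s->in_flight || s->promisc == s->sent_promisc)
		return;
	s->sent_promisc = s->promisc;
	s->in_flight = true;
	s->notified++;
}

/* Driver command: record the new state; the driver is acked at once. */
void driver_set_promisc(struct cvq_state *s, bool on)
{
	assert(s != NULL);
	s->promisc = on;
	maybe_notify(s);
}

/* Userspace used the buffer: send the recorded state if it changed. */
void userspace_done(struct cvq_state *s)
{
	assert(s != NULL);
	s->in_flight = false;
	maybe_notify(s);
}
```

With this shape, any number of toggles issued while userspace is busy
collapses into at most one follow-up notification, and none at all if
the final value matches what userspace already has.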
--
MST
* Re: [RFC 1/2] virtio_net: timeout control virtqueue commands
2025-10-15 8:09 ` Michael S. Tsirkin
@ 2025-10-15 9:16 ` Maxime Coquelin
2025-10-15 10:36 ` Eugenio Perez Martin
1 sibling, 0 replies; 45+ messages in thread
From: Maxime Coquelin @ 2025-10-15 9:16 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: Eugenio Perez Martin, Yongji Xie, virtualization, linux-kernel,
Xuan Zhuo, Dragos Tatulea DE, jasowang
On Wed, Oct 15, 2025 at 10:09 AM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Wed, Oct 15, 2025 at 10:03:49AM +0200, Maxime Coquelin wrote:
> > On Wed, Oct 15, 2025 at 9:45 AM Eugenio Perez Martin
> > <eperezma@redhat.com> wrote:
> > >
> > > On Wed, Oct 15, 2025 at 9:05 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > >
> > > > On Wed, Oct 15, 2025 at 08:52:50AM +0200, Eugenio Perez Martin wrote:
> > > > > On Wed, Oct 15, 2025 at 8:33 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > >
> > > > > > On Wed, Oct 15, 2025 at 08:08:31AM +0200, Eugenio Perez Martin wrote:
> > > > > > > On Tue, Oct 14, 2025 at 11:25 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > > > >
> > > > > > > > On Tue, Oct 14, 2025 at 11:14:40AM +0200, Maxime Coquelin wrote:
> > > > > > > > > On Tue, Oct 14, 2025 at 10:29 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > > > > > >
> > > > > > > > > > On Tue, Oct 07, 2025 at 03:06:21PM +0200, Eugenio Pérez wrote:
> > > > > > > > > > > An userland device implemented through VDUSE could take rtnl forever if
> > > > > > > > > > > the virtio-net driver is running on top of virtio_vdpa. Let's break the
> > > > > > > > > > > device if it does not return the buffer in a longer-than-assumible
> > > > > > > > > > > timeout.
> > > > > > > > > >
> > > > > > > > > > So now I can't debug qemu with gdb because guest dies :(
> > > > > > > > > > Let's not break valid use-cases please.
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Instead, solve it in vduse, probably by handling cvq within
> > > > > > > > > > kernel.
> > > > > > > > >
> > > > > > > > > Would a shadow control virtqueue implementation in the VDUSE driver work?
> > > > > > > > > It would ack systematically messages sent by the Virtio-net driver,
> > > > > > > > > and so assume the userspace application will Ack them.
> > > > > > > > >
> > > > > > > > > When the userspace application handles the message, if the handling fails,
> > > > > > > > > it somehow marks the device as broken?
> > > > > > > > >
> > > > > > > > > Thanks,
> > > > > > > > > Maxime
> > > > > > > >
> > > > > > > > Yes but it's a bit more convoluted than just acking them.
> > > > > > > > Once you use the buffer you can get another one and so on
> > > > > > > > with no limit.
> > > > > > > > One fix is to actually maintain device state in the
> > > > > > > > kernel, update it, and then notify userspace.
> > > > > > > >
> > > > > > >
> > > > > > > I thought of implementing this approach at first, but it has two drawbacks.
> > > > > > >
> > > > > > > The first one: it's racy. Let's say the driver updates the MAC filter,
> > > > > > > VDUSE timeout occurs, the guest receives the fail, and then the device
> > > > > > > replies with an OK. There is no way for the device or VDUSE to update
> > > > > > > the driver.
> > > > > >
> > > > > > There's no timeout. Kernel can guarantee executing all requests.
> > > > > >
> > > > >
> > > > > I don't follow this. How should the VDUSE kernel module act if the
> > > > > VDUSE userland device does not use the CVQ buffer then?
> > > >
> > > > First I am not sure a VQ is the best interface for talking to userspace.
> > > > But assuming yes - just avoid sending more data, send it later after
> > > > userspace used the buffer.
> > > >
> > >
> > > Let me take a step back, I think I didn't describe the scenario well enough.
> > >
> > > We have a VDUSE device, and then the same host is interacting with the
> > > device through the virtio_net driver over virtio_vdpa.
> > >
> > > Then, the virtio_net driver sends a control command through its CVQ, so
> > > it *takes the RTNL*. That command reaches the VDUSE CVQ.
> > >
> > > It does not matter if the VDUSE device in the userland processes the
> > > commands through a CVQ, reading the vduse character device, or another
> > > system. The question is: what to do if the VDUSE device does not
> > > process that command in a timely manner? Should we just let the RTNL
> > > be taken forever?
> > >
> >
> > My understanding is that:
> > 1. Virtio-net sends a control messages, waits for reply
> > 2. VDUSE driver dequeues it, adds it to the SCVQ, replies OK to the CVQ
> > 3. Userspace application dequeues the message from the SCVQ
> > a. If handling is successful it replies OK
> > b. If handling fails, replies ERROR
> > 4. VDUSE driver reads the reply
> > a. if OK, do nothing
> > b. if ERROR, mark the device as broken?
> >
> > This is simplified as it does not take into account SCVQ overflow if
> > the application is stuck.
> > If IIUC, Michael suggests to only enqueue a single message at the time
> > in the SVQ,
> > and bufferize the pending messages in the VDUSE driver.
>
> Not exactly bufferize, record. E.g. we do not need to send
> 100 messages to enable/disable promisc mode - together they
> have no effect.
The downside of such an optimization is that it requires the VDUSE kernel
driver to be able to handle all the message types.
So every time we add support for a new control message type, we'll also have
to patch the VDUSE kernel driver.
I am not sure the gain is worth the effort, as the traffic on the control
queue is usually rather low?
Maxime
>
> --
> MST
>
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [RFC 1/2] virtio_net: timeout control virtqueue commands
2025-10-15 8:09 ` Michael S. Tsirkin
2025-10-15 9:16 ` Maxime Coquelin
@ 2025-10-15 10:36 ` Eugenio Perez Martin
2025-10-16 5:39 ` Jason Wang
2025-10-22 10:09 ` Michael S. Tsirkin
1 sibling, 2 replies; 45+ messages in thread
From: Eugenio Perez Martin @ 2025-10-15 10:36 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: Maxime Coquelin, Yongji Xie, virtualization, linux-kernel,
Xuan Zhuo, Dragos Tatulea DE, jasowang
On Wed, Oct 15, 2025 at 10:09 AM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Wed, Oct 15, 2025 at 10:03:49AM +0200, Maxime Coquelin wrote:
> > On Wed, Oct 15, 2025 at 9:45 AM Eugenio Perez Martin
> > <eperezma@redhat.com> wrote:
> > >
> > > On Wed, Oct 15, 2025 at 9:05 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > >
> > > > On Wed, Oct 15, 2025 at 08:52:50AM +0200, Eugenio Perez Martin wrote:
> > > > > On Wed, Oct 15, 2025 at 8:33 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > >
> > > > > > On Wed, Oct 15, 2025 at 08:08:31AM +0200, Eugenio Perez Martin wrote:
> > > > > > > On Tue, Oct 14, 2025 at 11:25 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > > > >
> > > > > > > > On Tue, Oct 14, 2025 at 11:14:40AM +0200, Maxime Coquelin wrote:
> > > > > > > > > On Tue, Oct 14, 2025 at 10:29 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > > > > > >
> > > > > > > > > > On Tue, Oct 07, 2025 at 03:06:21PM +0200, Eugenio Pérez wrote:
> > > > > > > > > > > An userland device implemented through VDUSE could take rtnl forever if
> > > > > > > > > > > the virtio-net driver is running on top of virtio_vdpa. Let's break the
> > > > > > > > > > > device if it does not return the buffer in a longer-than-assumible
> > > > > > > > > > > timeout.
> > > > > > > > > >
> > > > > > > > > > So now I can't debug qemu with gdb because guest dies :(
> > > > > > > > > > Let's not break valid use-cases please.
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Instead, solve it in vduse, probably by handling cvq within
> > > > > > > > > > kernel.
> > > > > > > > >
> > > > > > > > > Would a shadow control virtqueue implementation in the VDUSE driver work?
> > > > > > > > > It would ack systematically messages sent by the Virtio-net driver,
> > > > > > > > > and so assume the userspace application will Ack them.
> > > > > > > > >
> > > > > > > > > When the userspace application handles the message, if the handling fails,
> > > > > > > > > it somehow marks the device as broken?
> > > > > > > > >
> > > > > > > > > Thanks,
> > > > > > > > > Maxime
> > > > > > > >
> > > > > > > > Yes but it's a bit more convoluted than just acking them.
> > > > > > > > Once you use the buffer you can get another one and so on
> > > > > > > > with no limit.
> > > > > > > > One fix is to actually maintain device state in the
> > > > > > > > kernel, update it, and then notify userspace.
> > > > > > > >
> > > > > > >
> > > > > > > I thought of implementing this approach at first, but it has two drawbacks.
> > > > > > >
> > > > > > > The first one: it's racy. Let's say the driver updates the MAC filter,
> > > > > > > VDUSE timeout occurs, the guest receives the fail, and then the device
> > > > > > > replies with an OK. There is no way for the device or VDUSE to update
> > > > > > > the driver.
> > > > > >
> > > > > > There's no timeout. Kernel can guarantee executing all requests.
> > > > > >
> > > > >
> > > > > I don't follow this. How should the VDUSE kernel module act if the
> > > > > VDUSE userland device does not use the CVQ buffer then?
> > > >
> > > > First I am not sure a VQ is the best interface for talking to userspace.
> > > > But assuming yes - just avoid sending more data, send it later after
> > > > userspace used the buffer.
> > > >
> > >
> > > Let me take a step back, I think I didn't describe the scenario well enough.
> > >
> > > We have a VDUSE device, and then the same host is interacting with the
> > > device through the virtio_net driver over virtio_vdpa.
> > >
> > > Then, the virtio_net driver sends a control command through its CVQ, so
> > > it *takes the RTNL*. That command reaches the VDUSE CVQ.
> > >
> > > It does not matter if the VDUSE device in the userland processes the
> > > commands through a CVQ, reading the vduse character device, or another
> > > system. The question is: what to do if the VDUSE device does not
> > > process that command in a timely manner? Should we just let the RTNL
> > > be taken forever?
> > >
> >
> > My understanding is that:
> > 1. Virtio-net sends a control messages, waits for reply
> > 2. VDUSE driver dequeues it, adds it to the SCVQ, replies OK to the CVQ
> > 3. Userspace application dequeues the message from the SCVQ
> > a. If handling is successful it replies OK
> > b. If handling fails, replies ERROR
If that's the case, everything would be OK already. In both cases, the
RTNL is held only for that long. The problem is when the VDUSE userland
device does not reply.
> > 4. VDUSE driver reads the reply
> > a. if OK, do nothing
> > b. if ERROR, mark the device as broken?
> >
> > This is simplified as it does not take into account SCVQ overflow if
> > the application is stuck.
> > If IIUC, Michael suggests to only enqueue a single message at the time
> > in the SVQ,
> > and bufferize the pending messages in the VDUSE driver.
But the RTNL is still held during that whole process, isn't it?
>
> Not exactly bufferize, record. E.g. we do not need to send
> 100 messages to enable/disable promisc mode - together they
> have no effect.
>
I still don't follow how that unlocks the RTNL. Let me lay out some workflows:
1) MAC_TABLE_SET, what can we do if:
The driver sets a table of MAC addresses, (A, B, C). VDUSE does send
this table to the VDUSE userland device, as we don't have more
information. Now the driver sends a new table with addresses (A, B,
D), but the device still hasn't replied to the VDUSE driver.
Should VDUSE track that the new state is (A, B, D), and then wait for
the device to reply to the previous request? What should we report to
the driver? If we wait for the device to reply, we're in the same
situation regarding the RTNL.
Now we receive a new state (A, B, E). We haven't sent (A, B, D) yet,
so it is fine to just replace (A, B, D) with it, and send it when
(A, B, C) completes with either success or failure.
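To make the bookkeeping in workflow 1 concrete, here is a minimal
userspace C sketch of "one table in flight, one table pending". All the
names (mac_tracker, mac_submit, mac_complete, MAC_MAX, mac_table_of) are
made up for illustration and are not existing VDUSE code. It only shows
the coalescing part; it does not answer what to report to the driver
while (A, B, C) is still unanswered.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAC_LEN 6
#define MAC_MAX 8	/* arbitrary table size for this sketch */

struct mac_table {
	size_t n;
	uint8_t addr[MAC_MAX][MAC_LEN];
};

struct mac_tracker {
	struct mac_table inflight;	/* table currently with userspace */
	struct mac_table pending;	/* latest table not yet sent */
	bool has_inflight;
	bool has_pending;
};

/* Test helper: build a table whose entries end in the given label bytes. */
static struct mac_table mac_table_of(const char *labels)
{
	struct mac_table t;

	memset(&t, 0, sizeof(t));
	for (; *labels && t.n < MAC_MAX; labels++)
		t.addr[t.n++][MAC_LEN - 1] = (uint8_t)*labels;
	return t;
}

/*
 * The driver posts a new table. Returns true if the table must be
 * forwarded to userspace right now, false if it was only recorded.
 */
static bool mac_submit(struct mac_tracker *t, const struct mac_table *tbl)
{
	if (!t->has_inflight) {
		t->inflight = *tbl;
		t->has_inflight = true;
		return true;
	}
	/* Something is in flight: overwrite any older pending table. */
	t->pending = *tbl;
	t->has_pending = true;
	return false;
}

/*
 * Userspace replied (OK or ERROR) to the in-flight table. Returns true
 * if a pending table was promoted and must be forwarded now.
 */
static bool mac_complete(struct mac_tracker *t)
{
	if (!t->has_pending) {
		t->has_inflight = false;
		return false;
	}
	t->inflight = t->pending;
	t->has_pending = false;
	return true;
}
```

With (A, B, C) in flight, submitting (A, B, D) and then (A, B, E) leaves
only (A, B, E) pending, and that is what gets forwarded once (A, B, C)
completes.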
2) VQ_PAIRS_SET
The driver starts with 1 vq pair. Now the driver sets 3 vq pairs, and
the VDUSE CVQ forwards the command. The driver still thinks that it is
using 1 vq pair. I can store that the driver requested 3 and that the
request is still in flight. Now the timeout occurs, so the VDUSE device
returns failure to the driver, and the driver frees the vq regions etc.
After that, the device replies OK. The memory that was sent as the new
vqs' avail ring and descriptor ring now contains garbage, and the
device could start overwriting unrelated memory.
Not even VQ_RESET protects against it, as there is still a window
between the CMD set and the VQ reset.
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [RFC 1/2] virtio_net: timeout control virtqueue commands
2025-10-15 10:36 ` Eugenio Perez Martin
@ 2025-10-16 5:39 ` Jason Wang
2025-10-16 5:45 ` Michael S. Tsirkin
2025-10-22 10:09 ` Michael S. Tsirkin
1 sibling, 1 reply; 45+ messages in thread
From: Jason Wang @ 2025-10-16 5:39 UTC (permalink / raw)
To: Eugenio Perez Martin
Cc: Michael S. Tsirkin, Maxime Coquelin, Yongji Xie, virtualization,
linux-kernel, Xuan Zhuo, Dragos Tatulea DE
On Wed, Oct 15, 2025 at 6:37 PM Eugenio Perez Martin
<eperezma@redhat.com> wrote:
>
> On Wed, Oct 15, 2025 at 10:09 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > On Wed, Oct 15, 2025 at 10:03:49AM +0200, Maxime Coquelin wrote:
> > > On Wed, Oct 15, 2025 at 9:45 AM Eugenio Perez Martin
> > > <eperezma@redhat.com> wrote:
> > > >
> > > > On Wed, Oct 15, 2025 at 9:05 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > >
> > > > > On Wed, Oct 15, 2025 at 08:52:50AM +0200, Eugenio Perez Martin wrote:
> > > > > > On Wed, Oct 15, 2025 at 8:33 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > > >
> > > > > > > On Wed, Oct 15, 2025 at 08:08:31AM +0200, Eugenio Perez Martin wrote:
> > > > > > > > On Tue, Oct 14, 2025 at 11:25 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > > > > >
> > > > > > > > > On Tue, Oct 14, 2025 at 11:14:40AM +0200, Maxime Coquelin wrote:
> > > > > > > > > > On Tue, Oct 14, 2025 at 10:29 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > > > > > > >
> > > > > > > > > > > On Tue, Oct 07, 2025 at 03:06:21PM +0200, Eugenio Pérez wrote:
> > > > > > > > > > > > An userland device implemented through VDUSE could take rtnl forever if
> > > > > > > > > > > > the virtio-net driver is running on top of virtio_vdpa. Let's break the
> > > > > > > > > > > > device if it does not return the buffer in a longer-than-assumible
> > > > > > > > > > > > timeout.
> > > > > > > > > > >
> > > > > > > > > > > So now I can't debug qemu with gdb because guest dies :(
> > > > > > > > > > > Let's not break valid use-cases please.
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > Instead, solve it in vduse, probably by handling cvq within
> > > > > > > > > > > kernel.
> > > > > > > > > >
> > > > > > > > > > Would a shadow control virtqueue implementation in the VDUSE driver work?
> > > > > > > > > > It would ack systematically messages sent by the Virtio-net driver,
> > > > > > > > > > and so assume the userspace application will Ack them.
> > > > > > > > > >
> > > > > > > > > > When the userspace application handles the message, if the handling fails,
> > > > > > > > > > it somehow marks the device as broken?
> > > > > > > > > >
> > > > > > > > > > Thanks,
> > > > > > > > > > Maxime
> > > > > > > > >
> > > > > > > > > Yes but it's a bit more convoluted than just acking them.
> > > > > > > > > Once you use the buffer you can get another one and so on
> > > > > > > > > with no limit.
> > > > > > > > > One fix is to actually maintain device state in the
> > > > > > > > > kernel, update it, and then notify userspace.
> > > > > > > > >
> > > > > > > >
> > > > > > > > I thought of implementing this approach at first, but it has two drawbacks.
> > > > > > > >
> > > > > > > > The first one: it's racy. Let's say the driver updates the MAC filter,
> > > > > > > > VDUSE timeout occurs, the guest receives the fail, and then the device
> > > > > > > > replies with an OK. There is no way for the device or VDUSE to update
> > > > > > > > the driver.
> > > > > > >
> > > > > > > There's no timeout. Kernel can guarantee executing all requests.
> > > > > > >
> > > > > >
> > > > > > I don't follow this. How should the VDUSE kernel module act if the
> > > > > > VDUSE userland device does not use the CVQ buffer then?
> > > > >
> > > > > First I am not sure a VQ is the best interface for talking to userspace.
> > > > > But assuming yes - just avoid sending more data, send it later after
> > > > > userspace used the buffer.
> > > > >
> > > >
> > > > Let me take a step back, I think I didn't describe the scenario well enough.
> > > >
> > > > We have a VDUSE device, and then the same host is interacting with the
> > > > device through the virtio_net driver over virtio_vdpa.
> > > >
> > > > Then, the virtio_net driver sends a control command through its CVQ, so
> > > > it *takes the RTNL*. That command reaches the VDUSE CVQ.
> > > >
> > > > It does not matter if the VDUSE device in the userland processes the
> > > > commands through a CVQ, reading the vduse character device, or another
> > > > system. The question is: what to do if the VDUSE device does not
> > > > process that command in a timely manner? Should we just let the RTNL
> > > > be taken forever?
> > > >
> > >
> > > My understanding is that:
> > > 1. Virtio-net sends a control messages, waits for reply
> > > 2. VDUSE driver dequeues it, adds it to the SCVQ, replies OK to the CVQ
> > > 3. Userspace application dequeues the message from the SCVQ
> > > a. If handling is successful it replies OK
> > > b. If handling fails, replies ERROR
>
> If that's the case, everything would be ok now. In both cases, the
> RTNL is held only by that time. The problem is when the VDUSE device
> userland does not reply.
>
> > > 4. VDUSE driver reads the reply
> > > a. if OK, do nothing
> > > b. if ERROR, mark the device as broken?
> > >
> > > This is simplified as it does not take into account SCVQ overflow if
> > > the application is stuck.
> > > If IIUC, Michael suggests to only enqueue a single message at the time
> > > in the SVQ,
> > > and bufferize the pending messages in the VDUSE driver.
>
> But the RTNL keeps being held in all that process, isn't it?
>
> >
> > Not exactly bufferize, record. E.g. we do not need to send
> > 100 messages to enable/disable promisc mode - together they
> > have no effect.
Note that there are cases where multiple commands need to be sent, e.g.
set rx mode. And assuming not all the commands are best effort,
kernel VDUSE still needs to wait for userspace, at least for a
while.
> >
>
> I still don't follow how that unlocks the RTNL. Let me put some workflows:
>
> 1) MAC_TABLE_SET, what can we do if:
> The driver sets a set of MAC addresses, (A, B, C). VDUSE device does
> send this set to the VDUSE userland device, as we don't have more
> information. Now, the driver sends a new table with addresses (A, B,
> D), but the device still didn't reply to the VDUSE driver.
>
> VDUSE should track that the new state is (A, B, D), and then wait for
> the previous request to be replied by the device? What should we
> report to the driver? If we wait for the device to reply, we're in the
> same situation regarding the RTNL.
>
> Now we receive a new state (A, B, E). We haven't sent the (A, B, D),
> so it is good to just replace the (A, B, D) with that. and send it
> when (A, B, C) is completed with either success or failure.
>
> 2) VQ_PAIRS_SET
>
> The driver starts with 1 vq pair. Now the driver sets 3 vq pairs, and
> the VDUSE CVQ forwards the command. The driver still thinks that it is
> using 1 vq pair. I can store that the driver request was 3, and it is
> still in-flight. Now the timeout occurs, so the VDUSE device returns
> fail to the driver, and the driver frees the vq regions etc. After
> that, the device now replies OK. The memory that was sent as the new
> vqs avail ring and descriptor ring now contains garbage, and it could
> happen that the device start overriding unrelated memory.
>
> Not even VQ_RESET protects against it as there is still a window
> between the CMD set and the VQ reset.
Yes, I think it would be fine if the command is best effort, which
means the state in kernel VDUSE could be out of sync with userspace.
But it would be problematic if the command is not best effort. And
implementing cvq means a device model is implemented in kernel
VDUSE, which might be a burden. Hardening the cvq might still be a good
way to go, e.g.:
1) the device might be backed by software that is running in a DPU
2) in-kernel emulation of cvq might be buggy (e.g. mlx5_vdpa and the simulator)
Thanks
>
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [RFC 1/2] virtio_net: timeout control virtqueue commands
2025-10-16 5:39 ` Jason Wang
@ 2025-10-16 5:45 ` Michael S. Tsirkin
2025-10-16 6:03 ` Jason Wang
0 siblings, 1 reply; 45+ messages in thread
From: Michael S. Tsirkin @ 2025-10-16 5:45 UTC (permalink / raw)
To: Jason Wang
Cc: Eugenio Perez Martin, Maxime Coquelin, Yongji Xie, virtualization,
linux-kernel, Xuan Zhuo, Dragos Tatulea DE
On Thu, Oct 16, 2025 at 01:39:58PM +0800, Jason Wang wrote:
> > >
> > > Not exactly bufferize, record. E.g. we do not need to send
> > > 100 messages to enable/disable promisc mode - together they
> > > have no effect.
>
> Note that there's a case that multiple commands need to be sent, e.g
> set rx mode. And assuming not all the commands are the best effort,
> kernel VDUSE still needs to wait for the userspace at least for a
> while.
Not wait, record. Generate the 1st command; after userspace has consumed
it, generate and send the second command, and so on.
But for each bit of data, at most one command has to be sent:
we do not care if the guest tweaked rx mode 3 times, we only care about
the latest state.
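As an illustration only, the "record" model can be sketched in plain C
as below. This is not existing VDUSE code: struct cvq_recorder and the
rec_* helpers are hypothetical names, only two rx-mode fields are
modelled, and the assumption is that the state starts out as all zeroes
on both sides.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-field state; a real device has many more fields. */
enum rec_field { REC_PROMISC, REC_ALLMULTI, REC_NFIELDS };

struct cvq_recorder {
	uint8_t latest[REC_NFIELDS];	/* last value the driver asked for */
	uint8_t remote[REC_NFIELDS];	/* last value handed to userspace */
	bool inflight;			/* one command is with userspace */
	unsigned int sent;		/* commands generated so far */
};

/* Driver side: record the new value and return at once, never wait. */
static void rec_driver_set(struct cvq_recorder *r, enum rec_field f,
			   uint8_t v)
{
	r->latest[f] = v;
}

/*
 * Generate the next command if nothing is in flight and some field
 * differs from what userspace was last told. Returns true and fills
 * *f and *v when a command was generated.
 */
static bool rec_kick(struct cvq_recorder *r, enum rec_field *f, uint8_t *v)
{
	if (r->inflight)
		return false;
	for (int i = 0; i < REC_NFIELDS; i++) {
		if (r->latest[i] == r->remote[i])
			continue;
		r->remote[i] = r->latest[i];
		r->inflight = true;
		r->sent++;
		*f = (enum rec_field)i;
		*v = r->latest[i];
		return true;
	}
	return false;
}

/* Userspace consumed the in-flight command; the next one may go out. */
static void rec_complete(struct cvq_recorder *r)
{
	r->inflight = false;
}
```

In this sketch, a hundred promisc toggles by the driver result in at most
one command per field reaching userspace, and none at all if the final
value matches what userspace already has.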
--
MST
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [RFC 1/2] virtio_net: timeout control virtqueue commands
2025-10-16 5:45 ` Michael S. Tsirkin
@ 2025-10-16 6:03 ` Jason Wang
2025-10-16 6:22 ` Michael S. Tsirkin
0 siblings, 1 reply; 45+ messages in thread
From: Jason Wang @ 2025-10-16 6:03 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: Eugenio Perez Martin, Maxime Coquelin, Yongji Xie, virtualization,
linux-kernel, Xuan Zhuo, Dragos Tatulea DE
On Thu, Oct 16, 2025 at 1:45 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Thu, Oct 16, 2025 at 01:39:58PM +0800, Jason Wang wrote:
> > > >
> > > > Not exactly bufferize, record. E.g. we do not need to send
> > > > 100 messages to enable/disable promisc mode - together they
> > > > have no effect.
> >
> > Note that there's a case that multiple commands need to be sent, e.g
> > set rx mode. And assuming not all the commands are the best effort,
> > > kernel VDUSE still needs to wait for the userspace at least for a
> > while.
>
> Not wait, record. Generate 1st command, after userspace consumed it -
> generate and send second command and so on.
Right, that's what I asked in another thread: we still need a timeout
here. Then I think it would not make much difference whether it is
VDUSE or the CVQ that fails or breaks the device. Conceptually, VDUSE
can only advertise NEEDS_RESET, since it's a device implementation.
VDUSE cannot simply break the device, as that requires synchronization,
which is not easy.
> But for each bit of data, at most one command has to be sent,
> we do not care if guest tweaked rx mode 3 times, we only care about
> the latest state.
Yes, but I want to know what's best to do when VDUSE hits a userspace timeout.
Thanks
>
> --
> MST
>
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [RFC 1/2] virtio_net: timeout control virtqueue commands
2025-10-16 6:03 ` Jason Wang
@ 2025-10-16 6:22 ` Michael S. Tsirkin
2025-10-16 6:25 ` Eugenio Perez Martin
0 siblings, 1 reply; 45+ messages in thread
From: Michael S. Tsirkin @ 2025-10-16 6:22 UTC (permalink / raw)
To: Jason Wang
Cc: Eugenio Perez Martin, Maxime Coquelin, Yongji Xie, virtualization,
linux-kernel, Xuan Zhuo, Dragos Tatulea DE
On Thu, Oct 16, 2025 at 02:03:57PM +0800, Jason Wang wrote:
> On Thu, Oct 16, 2025 at 1:45 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > On Thu, Oct 16, 2025 at 01:39:58PM +0800, Jason Wang wrote:
> > > > >
> > > > > Not exactly bufferize, record. E.g. we do not need to send
> > > > > 100 messages to enable/disable promisc mode - together they
> > > > > have no effect.
> > >
> > > Note that there's a case that multiple commands need to be sent, e.g
> > > set rx mode. And assuming not all the commands are the best effort,
> > > > kernel VDUSE still needs to wait for the userspace at least for a
> > > while.
> >
> > Not wait, record. Generate 1st command, after userspace consumed it -
> > generate and send second command and so on.
>
> Right, that's what I asked in another thread, we still need a timeout
> here.
we do not need a timeout.
> Then I think it would not be too much difference whether it is
> VDUSE or CVQ that will fail or break the device. Conceptually, VDUSE
> can only advertise NEEDS_RESET since it's a device implementation.
> VDUSE can not simply break the device as it requires synchronization
> which is not easy.
>
> > But for each bit of data, at most one command has to be sent,
> > we do not care if guest tweaked rx mode 3 times, we only care about
> > the latest state.
>
> Yes, but I want to know what's best when VDUSE meets userspace timeout.
>
> Thanks
userspace should manage its own timeouts.
> >
> > --
> > MST
> >
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [RFC 1/2] virtio_net: timeout control virtqueue commands
2025-10-16 6:22 ` Michael S. Tsirkin
@ 2025-10-16 6:25 ` Eugenio Perez Martin
2025-10-17 6:36 ` Eugenio Perez Martin
0 siblings, 1 reply; 45+ messages in thread
From: Eugenio Perez Martin @ 2025-10-16 6:25 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: Jason Wang, Maxime Coquelin, Yongji Xie, virtualization,
linux-kernel, Xuan Zhuo, Dragos Tatulea DE
On Thu, Oct 16, 2025 at 8:22 AM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Thu, Oct 16, 2025 at 02:03:57PM +0800, Jason Wang wrote:
> > On Thu, Oct 16, 2025 at 1:45 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > >
> > > On Thu, Oct 16, 2025 at 01:39:58PM +0800, Jason Wang wrote:
> > > > > >
> > > > > > Not exactly bufferize, record. E.g. we do not need to send
> > > > > > 100 messages to enable/disable promisc mode - together they
> > > > > > have no effect.
> > > >
> > > > Note that there's a case that multiple commands need to be sent, e.g
> > > > set rx mode. And assuming not all the commands are the best effort,
> > > > > kernel VDUSE still needs to wait for the userspace at least for a
> > > > while.
> > >
> > > Not wait, record. Generate 1st command, after userspace consumed it -
> > > generate and send second command and so on.
> >
> > Right, that's what I asked in another thread, we still need a timeout
> > here.
>
> we do not need a timeout.
>
> > Then I think it would not be too much difference whether it is
> > VDUSE or CVQ that will fail or break the device. Conceptually, VDUSE
> > can only advertise NEEDS_RESET since it's a device implementation.
> > VDUSE can not simply break the device as it requires synchronization
> > which is not easy.
> >
> > > But for each bit of data, at most one command has to be sent,
> > > we do not care if guest tweaked rx mode 3 times, we only care about
> > > the latest state.
> >
> > Yes, but I want to know what's best when VDUSE meets userspace timeout.
> >
> > Thanks
>
>
> userspace should manage its own timeouts.
>
Can we just apply patch 2/2 of this RFC directly and then apply the
VDUSE CVQ on top? What are we missing to do that?
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [RFC 1/2] virtio_net: timeout control virtqueue commands
2025-10-16 6:25 ` Eugenio Perez Martin
@ 2025-10-17 6:36 ` Eugenio Perez Martin
2025-10-17 6:39 ` Michael S. Tsirkin
0 siblings, 1 reply; 45+ messages in thread
From: Eugenio Perez Martin @ 2025-10-17 6:36 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: Jason Wang, Maxime Coquelin, Yongji Xie, virtualization,
linux-kernel, Xuan Zhuo, Dragos Tatulea DE
On Thu, Oct 16, 2025 at 8:25 AM Eugenio Perez Martin
<eperezma@redhat.com> wrote:
>
> On Thu, Oct 16, 2025 at 8:22 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > On Thu, Oct 16, 2025 at 02:03:57PM +0800, Jason Wang wrote:
> > > On Thu, Oct 16, 2025 at 1:45 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > >
> > > > On Thu, Oct 16, 2025 at 01:39:58PM +0800, Jason Wang wrote:
> > > > > > >
> > > > > > > Not exactly bufferize, record. E.g. we do not need to send
> > > > > > > 100 messages to enable/disable promisc mode - together they
> > > > > > > have no effect.
> > > > >
> > > > > Note that there's a case that multiple commands need to be sent, e.g
> > > > > set rx mode. And assuming not all the commands are the best effort,
> > > > > > kernel VDUSE still needs to wait for the userspace at least for a
> > > > > while.
> > > >
> > > > Not wait, record. Generate 1st command, after userspace consumed it -
> > > > generate and send second command and so on.
> > >
> > > Right, that's what I asked in another thread, we still need a timeout
> > > here.
> >
> > we do not need a timeout.
> >
> > > Then I think it would not be too much difference whether it is
> > > VDUSE or CVQ that will fail or break the device. Conceptually, VDUSE
> > > can only advertise NEEDS_RESET since it's a device implementation.
> > > VDUSE can not simply break the device as it requires synchronization
> > > which is not easy.
> > >
> > > > But for each bit of data, at most one command has to be sent,
> > > > we do not care if guest tweaked rx mode 3 times, we only care about
> > > > the latest state.
> > >
> > > Yes, but I want to know what's best when VDUSE meets userspace timeout.
> > >
> > > Thanks
> >
> >
> > userspace should manage its own timeouts.
> >
>
> Can we just apply the patch 2/2 of this RFC directly and apply the
> VDUSE CVQ on top then? What are we missing to do it?
>
Even better, can we just revert commit 56e71885b0349 ("vduse:
Temporarily fail if control queue feature requested")?
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [RFC 1/2] virtio_net: timeout control virtqueue commands
2025-10-17 6:36 ` Eugenio Perez Martin
@ 2025-10-17 6:39 ` Michael S. Tsirkin
2025-10-17 7:21 ` Eugenio Perez Martin
0 siblings, 1 reply; 45+ messages in thread
From: Michael S. Tsirkin @ 2025-10-17 6:39 UTC (permalink / raw)
To: Eugenio Perez Martin
Cc: Jason Wang, Maxime Coquelin, Yongji Xie, virtualization,
linux-kernel, Xuan Zhuo, Dragos Tatulea DE
On Fri, Oct 17, 2025 at 08:36:41AM +0200, Eugenio Perez Martin wrote:
> On Thu, Oct 16, 2025 at 8:25 AM Eugenio Perez Martin
> <eperezma@redhat.com> wrote:
> >
> > On Thu, Oct 16, 2025 at 8:22 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> > >
> > > On Thu, Oct 16, 2025 at 02:03:57PM +0800, Jason Wang wrote:
> > > > On Thu, Oct 16, 2025 at 1:45 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > >
> > > > > On Thu, Oct 16, 2025 at 01:39:58PM +0800, Jason Wang wrote:
> > > > > > > >
> > > > > > > > Not exactly bufferize, record. E.g. we do not need to send
> > > > > > > > 100 messages to enable/disable promisc mode - together they
> > > > > > > > have no effect.
> > > > > >
> > > > > > Note that there's a case that multiple commands need to be sent, e.g
> > > > > > set rx mode. And assuming not all the commands are the best effort,
> > > > > > > kernel VDUSE still needs to wait for the userspace at least for a
> > > > > > while.
> > > > >
> > > > > Not wait, record. Generate 1st command, after userspace consumed it -
> > > > > generate and send second command and so on.
> > > >
> > > > Right, that's what I asked in another thread, we still need a timeout
> > > > here.
> > >
> > > we do not need a timeout.
> > >
> > > > Then I think it would not be too much difference whether it is
> > > > VDUSE or CVQ that will fail or break the device. Conceptually, VDUSE
> > > > can only advertise NEEDS_RESET since it's a device implementation.
> > > > VDUSE can not simply break the device as it requires synchronization
> > > > which is not easy.
> > > >
> > > > > But for each bit of data, at most one command has to be sent,
> > > > > we do not care if guest tweaked rx mode 3 times, we only care about
> > > > > the latest state.
> > > >
> > > > Yes, but I want to know what's best when VDUSE meets userspace timeout.
> > > >
> > > > Thanks
> > >
> > >
> > > userspace should manage its own timeouts.
> > >
> >
> > Can we just apply the patch 2/2 of this RFC directly and apply the
> > VDUSE CVQ on top then? What are we missing to do it?
> >
>
> Even better, can we just revert commit 56e71885b0349 ("vduse:
> Temporarily fail if control queue feature requested") ?
No, because both would let userspace hang the kernel merely by
not consuming buffers.
--
MST
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [RFC 1/2] virtio_net: timeout control virtqueue commands
2025-10-17 6:39 ` Michael S. Tsirkin
@ 2025-10-17 7:21 ` Eugenio Perez Martin
2025-10-22 9:46 ` Eugenio Perez Martin
2025-10-22 10:06 ` Michael S. Tsirkin
0 siblings, 2 replies; 45+ messages in thread
From: Eugenio Perez Martin @ 2025-10-17 7:21 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: Jason Wang, Maxime Coquelin, Yongji Xie, virtualization,
linux-kernel, Xuan Zhuo, Dragos Tatulea DE
On Fri, Oct 17, 2025 at 8:39 AM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Fri, Oct 17, 2025 at 08:36:41AM +0200, Eugenio Perez Martin wrote:
> > On Thu, Oct 16, 2025 at 8:25 AM Eugenio Perez Martin
> > <eperezma@redhat.com> wrote:
> > >
> > > On Thu, Oct 16, 2025 at 8:22 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > >
> > > > On Thu, Oct 16, 2025 at 02:03:57PM +0800, Jason Wang wrote:
> > > > > On Thu, Oct 16, 2025 at 1:45 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > >
> > > > > > On Thu, Oct 16, 2025 at 01:39:58PM +0800, Jason Wang wrote:
> > > > > > > > >
> > > > > > > > > Not exactly bufferize, record. E.g. we do not need to send
> > > > > > > > > 100 messages to enable/disable promisc mode - together they
> > > > > > > > > have no effect.
> > > > > > >
> > > > > > > Note that there's a case that multiple commands need to be sent, e.g
> > > > > > > set rx mode. And assuming not all the commands are the best effort,
> > > > > > > > kernel VDUSE still needs to wait for the userspace at least for a
> > > > > > > while.
> > > > > >
> > > > > > Not wait, record. Generate 1st command, after userspace consumed it -
> > > > > > generate and send second command and so on.
> > > > >
> > > > > Right, that's what I asked in another thread, we still need a timeout
> > > > > here.
> > > >
> > > > we do not need a timeout.
> > > >
> > > > > Then I think it would not be too much difference whether it is
> > > > > VDUSE or CVQ that will fail or break the device. Conceptually, VDUSE
> > > > > can only advertise NEEDS_RESET since it's a device implementation.
> > > > > VDUSE can not simply break the device as it requires synchronization
> > > > > which is not easy.
> > > > >
> > > > > > But for each bit of data, at most one command has to be sent,
> > > > > > we do not care if guest tweaked rx mode 3 times, we only care about
> > > > > > the latest state.
> > > > >
> > > > > Yes, but I want to know what's best when VDUSE meets userspace timeout.
> > > > >
> > > > > Thanks
> > > >
> > > >
> > > > userspace should manage its own timeouts.
> > > >
> > >
> > > Can we just apply the patch 2/2 of this RFC directly and apply the
> > > VDUSE CVQ on top then? What are we missing to do it?
> > >
> >
> > Even better, can we just revert commit 56e71885b0349 ("vduse:
> > Temporarily fail if control queue feature requested") ?
>
> No because both would let userspace hang kernels merely by
> not consuming buffers.
>
My understanding was that you want to be able to debug qemu with gdb
from that hang [1].
Could you give an example of the whole chain of events you expect?
Starting from the moment the driver sends a
VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET command: the VDUSE CVQ forwards the
command to the VDUSE device in userspace, and then the VDUSE userland
device does not reply.
How does the VDUSE CVQ detect that the VDUSE device implemented in
userland does not reply? What are the next steps for the kernel VDUSE
module from that point?
Thanks!
[1] https://lore.kernel.org/lkml/20251014042459-mutt-send-email-mst@kernel.org/
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [RFC 1/2] virtio_net: timeout control virtqueue commands
2025-10-17 7:21 ` Eugenio Perez Martin
@ 2025-10-22 9:46 ` Eugenio Perez Martin
2025-10-22 10:06 ` Michael S. Tsirkin
1 sibling, 0 replies; 45+ messages in thread
From: Eugenio Perez Martin @ 2025-10-22 9:46 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: Jason Wang, Maxime Coquelin, Yongji Xie, virtualization,
linux-kernel, Xuan Zhuo, Dragos Tatulea DE
On Fri, Oct 17, 2025 at 9:21 AM Eugenio Perez Martin
<eperezma@redhat.com> wrote:
>
> On Fri, Oct 17, 2025 at 8:39 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > On Fri, Oct 17, 2025 at 08:36:41AM +0200, Eugenio Perez Martin wrote:
> > > On Thu, Oct 16, 2025 at 8:25 AM Eugenio Perez Martin
> > > <eperezma@redhat.com> wrote:
> > > >
> > > > On Thu, Oct 16, 2025 at 8:22 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > >
> > > > > On Thu, Oct 16, 2025 at 02:03:57PM +0800, Jason Wang wrote:
> > > > > > On Thu, Oct 16, 2025 at 1:45 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > > >
> > > > > > > On Thu, Oct 16, 2025 at 01:39:58PM +0800, Jason Wang wrote:
> > > > > > > > > >
> > > > > > > > > > Not exactly bufferize, record. E.g. we do not need to send
> > > > > > > > > > 100 messages to enable/disable promisc mode - together they
> > > > > > > > > > have no effect.
> > > > > > > >
> > > > > > > > Note that there's a case that multiple commands need to be sent, e.g
> > > > > > > > set rx mode. And assuming not all the commands are the best effort,
> > > > > > > > > kernel VDUSE still needs to wait for the userspace at least for a
> > > > > > > > while.
> > > > > > >
> > > > > > > Not wait, record. Generate 1st command, after userspace consumed it -
> > > > > > > generate and send second command and so on.
> > > > > >
> > > > > > Right, that's what I asked in another thread, we still need a timeout
> > > > > > here.
> > > > >
> > > > > we do not need a timeout.
> > > > >
> > > > > > Then I think it would not be too much difference whether it is
> > > > > > VDUSE or CVQ that will fail or break the device. Conceptually, VDUSE
> > > > > > can only advertise NEEDS_RESET since it's a device implementation.
> > > > > > VDUSE can not simply break the device as it requires synchronization
> > > > > > which is not easy.
> > > > > >
> > > > > > > But for each bit of data, at most one command has to be sent,
> > > > > > > we do not care if guest tweaked rx mode 3 times, we only care about
> > > > > > > the latest state.
> > > > > >
> > > > > > Yes, but I want to know what's best when VDUSE meets userspace timeout.
> > > > > >
> > > > > > Thanks
> > > > >
> > > > >
> > > > > userspace should manage its own timeouts.
> > > > >
> > > >
> > > > Can we just apply the patch 2/2 of this RFC directly and apply the
> > > > VDUSE CVQ on top then? What are we missing to do it?
> > > >
> > >
> > > Even better, can we just revert commit 56e71885b0349 ("vduse:
> > > Temporarily fail if control queue feature requested") ?
> >
> > No because both would let userspace hang kernels merely by
> > not consuming buffers.
> >
>
> My understanding was that you want to be able to debug qemu with gdb
> from that hang [1].
>
> Could you put an example of the whole chain of events you expect? From
> the moment the driver sends a VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET command,
> the VDUSE CVQ forwards the command to the VDUSE device in the
> userspace, and then the vduse userland device does not reply.
>
> How does the VDUSE CVQ detect that the VDUSE device implemented in
> userland does not reply? What are the next steps from that point of
> the kernel VDUSE module?
>
> Thanks!
>
Friendly ping!
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [RFC 1/2] virtio_net: timeout control virtqueue commands
2025-10-17 7:21 ` Eugenio Perez Martin
2025-10-22 9:46 ` Eugenio Perez Martin
@ 2025-10-22 10:06 ` Michael S. Tsirkin
1 sibling, 0 replies; 45+ messages in thread
From: Michael S. Tsirkin @ 2025-10-22 10:06 UTC (permalink / raw)
To: Eugenio Perez Martin
Cc: Jason Wang, Maxime Coquelin, Yongji Xie, virtualization,
linux-kernel, Xuan Zhuo, Dragos Tatulea DE
On Fri, Oct 17, 2025 at 09:21:17AM +0200, Eugenio Perez Martin wrote:
> On Fri, Oct 17, 2025 at 8:39 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > On Fri, Oct 17, 2025 at 08:36:41AM +0200, Eugenio Perez Martin wrote:
> > > On Thu, Oct 16, 2025 at 8:25 AM Eugenio Perez Martin
> > > <eperezma@redhat.com> wrote:
> > > >
> > > > On Thu, Oct 16, 2025 at 8:22 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > >
> > > > > On Thu, Oct 16, 2025 at 02:03:57PM +0800, Jason Wang wrote:
> > > > > > On Thu, Oct 16, 2025 at 1:45 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > > >
> > > > > > > On Thu, Oct 16, 2025 at 01:39:58PM +0800, Jason Wang wrote:
> > > > > > > > > >
> > > > > > > > > > Not exactly bufferize, record. E.g. we do not need to send
> > > > > > > > > > 100 messages to enable/disable promisc mode - together they
> > > > > > > > > > have no effect.
> > > > > > > >
> > > > > > > > Note that there's a case that multiple commands need to be sent, e.g
> > > > > > > > set rx mode. And assuming not all the commands are the best effort,
> > > > > > > > > kernel VDUSE still needs to wait for the userspace at least for a
> > > > > > > > while.
> > > > > > >
> > > > > > > Not wait, record. Generate 1st command, after userspace consumed it -
> > > > > > > generate and send second command and so on.
> > > > > >
> > > > > > Right, that's what I asked in another thread, we still need a timeout
> > > > > > here.
> > > > >
> > > > > we do not need a timeout.
> > > > >
> > > > > > Then I think it would not be too much difference whether it is
> > > > > > VDUSE or CVQ that will fail or break the device. Conceptually, VDUSE
> > > > > > can only advertise NEEDS_RESET since it's a device implementation.
> > > > > > VDUSE can not simply break the device as it requires synchronization
> > > > > > which is not easy.
> > > > > >
> > > > > > > But for each bit of data, at most one command has to be sent,
> > > > > > > we do not care if guest tweaked rx mode 3 times, we only care about
> > > > > > > the latest state.
> > > > > >
> > > > > > Yes, but I want to know what's best when VDUSE meets userspace timeout.
> > > > > >
> > > > > > Thanks
> > > > >
> > > > >
> > > > > userspace should manage its own timeouts.
> > > > >
> > > >
> > > > Can we just apply the patch 2/2 of this RFC directly and apply the
> > > > VDUSE CVQ on top then? What are we missing to do it?
> > > >
> > >
> > > Even better, can we just revert commit 56e71885b0349 ("vduse:
> > > Temporarily fail if control queue feature requested") ?
> >
> > No because both would let userspace hang kernels merely by
> > not consuming buffers.
> >
>
> My understanding was that you want to be able to debug qemu with gdb
> from that hang [1].
>
> Could you put an example of the whole chain of events you expect? From
> the moment the driver sends a VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET command,
> the VDUSE CVQ forwards the command to the VDUSE device in the
> userspace, and then the vduse userland device does not reply.
This is not the idea.
The idea is that the kernel handles VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET itself.
Separately, it notifies userspace that some configuration
changed, and userspace fetches the new value.
Or not, if it is stuck.
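A minimal sketch of that split, written as a plain userspace C model and
not as VDUSE code. All names here (struct shadow_cfg, shadow_set_vq_pairs,
shadow_fetch) are hypothetical, and the generation counter is only one
assumed way to signal "something changed". The point it illustrates is
that the driver path returns immediately, while userspace picks up the
latest value whenever it runs:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical kernel-side shadow of one piece of device state. */
struct shadow_cfg {
	uint16_t vq_pairs;   /* latest value requested by the driver */
	uint32_t generation; /* bumped on every change */
	uint32_t seen;       /* last generation fetched by userspace */
};

/* Driver path: record the value and acknowledge without waiting. */
int shadow_set_vq_pairs(struct shadow_cfg *c, uint16_t pairs)
{
	assert(pairs > 0); /* at least one queue pair is always active */
	c->vq_pairs = pairs;
	c->generation++;
	return 0; /* stands in for VIRTIO_NET_OK */
}

/*
 * Userspace path: returns true and stores the latest value if anything
 * changed since the previous fetch. Intermediate values are never seen.
 */
bool shadow_fetch(struct shadow_cfg *c, uint16_t *pairs)
{
	if (c->seen == c->generation)
		return false;
	*pairs = c->vq_pairs;
	c->seen = c->generation;
	return true;
}
```

If userspace is stuck, shadow_fetch() is simply never called and the
driver side is unaffected.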
> How does the VDUSE CVQ detect that the VDUSE device implemented in
> userland does not reply? What are the next steps from that point of
> the kernel VDUSE module?
>
> Thanks!
>
> [1] https://lore.kernel.org/lkml/20251014042459-mutt-send-email-mst@kernel.org/
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [RFC 1/2] virtio_net: timeout control virtqueue commands
2025-10-15 10:36 ` Eugenio Perez Martin
2025-10-16 5:39 ` Jason Wang
@ 2025-10-22 10:09 ` Michael S. Tsirkin
2025-10-22 10:50 ` Eugenio Perez Martin
1 sibling, 1 reply; 45+ messages in thread
From: Michael S. Tsirkin @ 2025-10-22 10:09 UTC (permalink / raw)
To: Eugenio Perez Martin
Cc: Maxime Coquelin, Yongji Xie, virtualization, linux-kernel,
Xuan Zhuo, Dragos Tatulea DE, jasowang
On Wed, Oct 15, 2025 at 12:36:47PM +0200, Eugenio Perez Martin wrote:
> On Wed, Oct 15, 2025 at 10:09 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > On Wed, Oct 15, 2025 at 10:03:49AM +0200, Maxime Coquelin wrote:
> > > On Wed, Oct 15, 2025 at 9:45 AM Eugenio Perez Martin
> > > <eperezma@redhat.com> wrote:
> > > >
> > > > On Wed, Oct 15, 2025 at 9:05 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > >
> > > > > On Wed, Oct 15, 2025 at 08:52:50AM +0200, Eugenio Perez Martin wrote:
> > > > > > On Wed, Oct 15, 2025 at 8:33 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > > >
> > > > > > > On Wed, Oct 15, 2025 at 08:08:31AM +0200, Eugenio Perez Martin wrote:
> > > > > > > > On Tue, Oct 14, 2025 at 11:25 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > > > > >
> > > > > > > > > On Tue, Oct 14, 2025 at 11:14:40AM +0200, Maxime Coquelin wrote:
> > > > > > > > > > On Tue, Oct 14, 2025 at 10:29 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > > > > > > >
> > > > > > > > > > > On Tue, Oct 07, 2025 at 03:06:21PM +0200, Eugenio Pérez wrote:
> > > > > > > > > > > > An userland device implemented through VDUSE could take rtnl forever if
> > > > > > > > > > > > the virtio-net driver is running on top of virtio_vdpa. Let's break the
> > > > > > > > > > > > device if it does not return the buffer in a longer-than-assumible
> > > > > > > > > > > > timeout.
> > > > > > > > > > >
> > > > > > > > > > > So now I can't debug qemu with gdb because guest dies :(
> > > > > > > > > > > Let's not break valid use-cases please.
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > Instead, solve it in vduse, probably by handling cvq within
> > > > > > > > > > > kernel.
> > > > > > > > > >
> > > > > > > > > > Would a shadow control virtqueue implementation in the VDUSE driver work?
> > > > > > > > > > It would ack systematically messages sent by the Virtio-net driver,
> > > > > > > > > > and so assume the userspace application will Ack them.
> > > > > > > > > >
> > > > > > > > > > When the userspace application handles the message, if the handling fails,
> > > > > > > > > > it somehow marks the device as broken?
> > > > > > > > > >
> > > > > > > > > > Thanks,
> > > > > > > > > > Maxime
> > > > > > > > >
> > > > > > > > > Yes but it's a bit more convoluted than just acking them.
> > > > > > > > > Once you use the buffer you can get another one and so on
> > > > > > > > > with no limit.
> > > > > > > > > One fix is to actually maintain device state in the
> > > > > > > > > kernel, update it, and then notify userspace.
> > > > > > > > >
> > > > > > > >
> > > > > > > > I thought of implementing this approach at first, but it has two drawbacks.
> > > > > > > >
> > > > > > > > The first one: it's racy. Let's say the driver updates the MAC filter,
> > > > > > > > VDUSE timeout occurs, the guest receives the fail, and then the device
> > > > > > > > replies with an OK. There is no way for the device or VDUSE to update
> > > > > > > > the driver.
> > > > > > >
> > > > > > > There's no timeout. Kernel can guarantee executing all requests.
> > > > > > >
> > > > > >
> > > > > > I don't follow this. How should the VDUSE kernel module act if the
> > > > > > VDUSE userland device does not use the CVQ buffer then?
> > > > >
> > > > > First I am not sure a VQ is the best interface for talking to userspace.
> > > > > But assuming yes - just avoid sending more data, send it later after
> > > > > userspace used the buffer.
> > > > >
> > > >
> > > > Let me take a step back, I think I didn't describe the scenario well enough.
> > > >
> > > > We have a VDUSE device, and then the same host is interacting with the
> > > > device through the virtio_net driver over virtio_vdpa.
> > > >
> > > > Then, the virtio_net driver sends a control command through its CVQ, so
> > > > it *takes the RTNL*. That command reaches the VDUSE CVQ.
> > > >
> > > > It does not matter if the VDUSE device in the userland processes the
> > > > commands through a CVQ, reading the vduse character device, or another
> > > > system. The question is: what to do if the VDUSE device does not
> > > > process that command in a timely manner? Should we just let the RTNL
> > > > be taken forever?
> > > >
> > >
> > > My understanding is that:
> > > > 1. Virtio-net sends a control message, waits for reply
> > > 2. VDUSE driver dequeues it, adds it to the SCVQ, replies OK to the CVQ
> > > 3. Userspace application dequeues the message from the SCVQ
> > > a. If handling is successful it replies OK
> > > b. If handling fails, replies ERROR
>
> If that's the case, everything would be ok now. In both cases, the
> RTNL is held only by that time. The problem is when the VDUSE device
> userland does not reply.
>
> > > 4. VDUSE driver reads the reply
> > > a. if OK, do nothing
> > > b. if ERROR, mark the device as broken?
> > >
> > > This is simplified as it does not take into account SCVQ overflow if
> > > the application is stuck.
> > > If IIUC, Michael suggests to only enqueue a single message at the time
> > > in the SVQ,
> > > and bufferize the pending messages in the VDUSE driver.
>
> But the RTNL keeps being held in all that process, isn't it?
>
> >
> > Not exactly bufferize, record. E.g. we do not need to send
> > 100 messages to enable/disable promisc mode - together they
> > have no effect.
> >
>
> I still don't follow how that unlocks the RTNL. Let me put some workflows:
>
> 1) MAC_TABLE_SET, what can we do if:
> The driver sets a set of MAC addresses, (A, B, C). VDUSE device does
> send this set to the VDUSE userland device, as we don't have more
> information. Now, the driver sends a new table with addresses (A, B,
> D), but the device still didn't reply to the VDUSE driver.
>
> VDUSE should track that the new state is (A, B, D), and then wait for
> the previous request to be replied by the device? What should we
> report to the driver?
You reply OK to the driver immediately.
> If we wait for the device to reply, we're in the
> same situation regarding the RTNL.
>
> Now we receive a new state (A, B, E). We haven't sent the (A, B, D),
> so it is good to just replace the (A, B, D) with that. and send it
> when (A, B, C) is completed with either success or failure.
>
> 2) VQ_PAIRS_SET
>
> The driver starts with 1 vq pair. Now the driver sets 3 vq pairs, and
> the VDUSE CVQ forwards the command. The driver still thinks that it is
> using 1 vq pair. I can store that the driver request was 3, and it is
> still in-flight. Now the timeout occurs, so the VDUSE device returns
> fail to the driver, and the driver frees the vq regions etc. After
> that, the device now replies OK. The memory that was sent as the new
> vqs avail ring and descriptor ring now contains garbage, and it could
> happen that the device start overriding unrelated memory.
>
> Not even VQ_RESET protects against it as there is still a window
> between the CMD set and the VQ reset.
Timeouts should be up to userspace. If userspace times out
and then gets confused, the kernel is not to blame.
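The "record, do not wait" handling of MAC_TABLE_SET from the quoted
workflow can be sketched as below. This is only an illustrative userspace
C model under stated assumptions: struct cvq_coalesce, coalesce_submit and
coalesce_complete are hypothetical names, and the table size is arbitrary.
One table is in flight to userspace, and any newer table overwrites a
single pending slot, so (A, B, D) is replaced by (A, B, E) before it is
ever sent:

```c
#include <assert.h>
#include <stdbool.h>

#define MAC_TABLE_MAX 4 /* arbitrary size for the sketch */

struct mac_table {
	unsigned char addr[MAC_TABLE_MAX][6];
	unsigned int n;
};

/* Hypothetical coalescing state: one request in flight, one pending. */
struct cvq_coalesce {
	struct mac_table inflight;
	struct mac_table pending;
	bool has_inflight;
	bool has_pending;
};

/* Driver side: never blocks. Returns true if the table must be sent now. */
bool coalesce_submit(struct cvq_coalesce *s, const struct mac_table *t)
{
	assert(t->n <= MAC_TABLE_MAX);
	if (!s->has_inflight) {
		s->inflight = *t;
		s->has_inflight = true;
		return true;
	}
	s->pending = *t; /* overwrite any older pending table */
	s->has_pending = true;
	return false;
}

/*
 * Userspace used the buffer: promote the pending table, if any.
 * Returns true if a new table must be sent to userspace.
 */
bool coalesce_complete(struct cvq_coalesce *s)
{
	s->has_inflight = false;
	if (!s->has_pending)
		return false;
	s->inflight = s->pending;
	s->has_inflight = true;
	s->has_pending = false;
	return true;
}
```

The memory needed is bounded by two tables no matter how many commands
the driver issues while userspace is stalled.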
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [RFC 1/2] virtio_net: timeout control virtqueue commands
2025-10-22 10:09 ` Michael S. Tsirkin
@ 2025-10-22 10:50 ` Eugenio Perez Martin
2025-10-22 11:43 ` Michael S. Tsirkin
0 siblings, 1 reply; 45+ messages in thread
From: Eugenio Perez Martin @ 2025-10-22 10:50 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: Maxime Coquelin, Yongji Xie, virtualization, linux-kernel,
Xuan Zhuo, Dragos Tatulea DE, jasowang
On Wed, Oct 22, 2025 at 12:09 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Wed, Oct 15, 2025 at 12:36:47PM +0200, Eugenio Perez Martin wrote:
> > On Wed, Oct 15, 2025 at 10:09 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> > >
> > > On Wed, Oct 15, 2025 at 10:03:49AM +0200, Maxime Coquelin wrote:
> > > > On Wed, Oct 15, 2025 at 9:45 AM Eugenio Perez Martin
> > > > <eperezma@redhat.com> wrote:
> > > > >
> > > > > On Wed, Oct 15, 2025 at 9:05 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > >
> > > > > > On Wed, Oct 15, 2025 at 08:52:50AM +0200, Eugenio Perez Martin wrote:
> > > > > > > On Wed, Oct 15, 2025 at 8:33 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > > > >
> > > > > > > > On Wed, Oct 15, 2025 at 08:08:31AM +0200, Eugenio Perez Martin wrote:
> > > > > > > > > On Tue, Oct 14, 2025 at 11:25 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > > > > > >
> > > > > > > > > > On Tue, Oct 14, 2025 at 11:14:40AM +0200, Maxime Coquelin wrote:
> > > > > > > > > > > On Tue, Oct 14, 2025 at 10:29 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > On Tue, Oct 07, 2025 at 03:06:21PM +0200, Eugenio Pérez wrote:
> > > > > > > > > > > > > An userland device implemented through VDUSE could take rtnl forever if
> > > > > > > > > > > > > the virtio-net driver is running on top of virtio_vdpa. Let's break the
> > > > > > > > > > > > > device if it does not return the buffer in a longer-than-assumible
> > > > > > > > > > > > > timeout.
> > > > > > > > > > > >
> > > > > > > > > > > > So now I can't debug qemu with gdb because guest dies :(
> > > > > > > > > > > > Let's not break valid use-cases please.
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > Instead, solve it in vduse, probably by handling cvq within
> > > > > > > > > > > > kernel.
> > > > > > > > > > >
> > > > > > > > > > > Would a shadow control virtqueue implementation in the VDUSE driver work?
> > > > > > > > > > > It would ack systematically messages sent by the Virtio-net driver,
> > > > > > > > > > > and so assume the userspace application will Ack them.
> > > > > > > > > > >
> > > > > > > > > > > When the userspace application handles the message, if the handling fails,
> > > > > > > > > > > it somehow marks the device as broken?
> > > > > > > > > > >
> > > > > > > > > > > Thanks,
> > > > > > > > > > > Maxime
> > > > > > > > > >
> > > > > > > > > > Yes but it's a bit more convoluted than just acking them.
> > > > > > > > > > Once you use the buffer you can get another one and so on
> > > > > > > > > > with no limit.
> > > > > > > > > > One fix is to actually maintain device state in the
> > > > > > > > > > kernel, update it, and then notify userspace.
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > I thought of implementing this approach at first, but it has two drawbacks.
> > > > > > > > >
> > > > > > > > > The first one: it's racy. Let's say the driver updates the MAC filter,
> > > > > > > > > VDUSE timeout occurs, the guest receives the fail, and then the device
> > > > > > > > > replies with an OK. There is no way for the device or VDUSE to update
> > > > > > > > > the driver.
> > > > > > > >
> > > > > > > > There's no timeout. Kernel can guarantee executing all requests.
> > > > > > > >
> > > > > > >
> > > > > > > I don't follow this. How should the VDUSE kernel module act if the
> > > > > > > VDUSE userland device does not use the CVQ buffer then?
> > > > > >
> > > > > > First I am not sure a VQ is the best interface for talking to userspace.
> > > > > > But assuming yes - just avoid sending more data, send it later after
> > > > > > userspace used the buffer.
> > > > > >
> > > > >
> > > > > Let me take a step back, I think I didn't describe the scenario well enough.
> > > > >
> > > > > We have a VDUSE device, and then the same host is interacting with the
> > > > > device through the virtio_net driver over virtio_vdpa.
> > > > >
> > > > > Then, the virtio_net driver sends a control command through its CVQ, so
> > > > > it *takes the RTNL*. That command reaches the VDUSE CVQ.
> > > > >
> > > > > It does not matter if the VDUSE device in the userland processes the
> > > > > commands through a CVQ, reading the vduse character device, or another
> > > > > system. The question is: what to do if the VDUSE device does not
> > > > > process that command in a timely manner? Should we just let the RTNL
> > > > > be taken forever?
> > > > >
> > > >
> > > > My understanding is that:
> > > > > 1. Virtio-net sends a control message, waits for reply
> > > > 2. VDUSE driver dequeues it, adds it to the SCVQ, replies OK to the CVQ
> > > > 3. Userspace application dequeues the message from the SCVQ
> > > > a. If handling is successful it replies OK
> > > > b. If handling fails, replies ERROR
> >
> > If that's the case, everything would be ok now. In both cases, the
> > RTNL is held only by that time. The problem is when the VDUSE device
> > userland does not reply.
> >
> > > > 4. VDUSE driver reads the reply
> > > > a. if OK, do nothing
> > > > b. if ERROR, mark the device as broken?
> > > >
> > > > This is simplified as it does not take into account SCVQ overflow if
> > > > the application is stuck.
> > > > If IIUC, Michael suggests to only enqueue a single message at the time
> > > > in the SVQ,
> > > > and bufferize the pending messages in the VDUSE driver.
> >
> > But the RTNL keeps being held in all that process, isn't it?
> >
> > >
> > > Not exactly bufferize, record. E.g. we do not need to send
> > > 100 messages to enable/disable promisc mode - together they
> > > have no effect.
> > >
> >
> > I still don't follow how that unlocks the RTNL. Let me put some workflows:
> >
> > 1) MAC_TABLE_SET, what can we do if:
> > The driver sets a set of MAC addresses, (A, B, C). VDUSE device does
> > send this set to the VDUSE userland device, as we don't have more
> > information. Now, the driver sends a new table with addresses (A, B,
> > D), but the device still didn't reply to the VDUSE driver.
> >
> > VDUSE should track that the new state is (A, B, D), and then wait for
> > the previous request to be replied by the device? What should we
> > report to the driver?
>
> you reply OK to the driver immediately.
>
Let me switch to MQ as I think it illustrates the point better.
IIUC the workflow:
a) virtio-net sends MQ_VQ_PAIRS_SET 2 to the device
b) VDUSE CVQ sends ok to the virtio-net driver
c) VDUSE CVQ sends the command to the VDUSE device
d) Now the virtio-net driver sends MQ_VQ_PAIRS_SET 1
e) VDUSE CVQ sends ok to the virtio-net driver
The device has not processed the MQ_VQ_PAIRS_SET 1 command at this point,
so it may still use the second rx queue. But, by the standard:
The device MUST NOT queue packets on receive queues greater than
virtqueue_pairs once it has placed the VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET
command in a used buffer.
So the driver does not expect rx buffers on that queue at all. From
the driver's POV, the device is invalid, and it could mark it as
broken.
And, what's worse, how should we handle it if the device now replies with
VIRTIO_NET_ERR to the VDUSE CVQ?
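The window described in steps a) to e) can be modelled with a small
sketch. It is a userspace C model with hypothetical names (struct
mq_model, mq_driver_set, mq_device_process, mq_violation_window), and it
assumes the early-OK behaviour discussed in this thread rather than any
existing VDUSE code:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: what each side believes about active vq pairs. */
struct mq_model {
	unsigned int driver_pairs; /* updated when the driver sees OK */
	unsigned int device_pairs; /* updated when userspace processes a command */
	unsigned int queued_pairs; /* latest unprocessed command, 0 if none */
};

/* Steps a/b and d/e: the kernel acks at once and queues the command. */
void mq_driver_set(struct mq_model *m, unsigned int pairs)
{
	assert(pairs > 0);
	m->queued_pairs = pairs;
	m->driver_pairs = pairs; /* early OK */
}

/* Step c, whenever the userspace device gets around to it. */
void mq_device_process(struct mq_model *m)
{
	if (m->queued_pairs) {
		m->device_pairs = m->queued_pairs;
		m->queued_pairs = 0;
	}
}

/* True while the device may use rx queues the driver considers disabled. */
bool mq_violation_window(const struct mq_model *m)
{
	return m->device_pairs > m->driver_pairs;
}
```

Starting from one pair on both sides, calling mq_driver_set(2),
mq_device_process(), then mq_driver_set(1) leaves the model inside the
window until the next mq_device_process() call.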
> > If we wait for the device to reply, we're in the
> > same situation regarding the RTNL.
> >
> > Now we receive a new state (A, B, E). We haven't sent the (A, B, D),
> > so it is good to just replace the (A, B, D) with that. and send it
> > when (A, B, C) is completed with either success or failure.
> >
> > 2) VQ_PAIRS_SET
> >
> > The driver starts with 1 vq pair. Now the driver sets 3 vq pairs, and
> > the VDUSE CVQ forwards the command. The driver still thinks that it is
> > using 1 vq pair. I can store that the driver request was 3, and it is
> > still in-flight. Now the timeout occurs, so the VDUSE device returns
> > fail to the driver, and the driver frees the vq regions etc. After
> > that, the device now replies OK. The memory that was sent as the new
> > vqs avail ring and descriptor ring now contains garbage, and it could
> > happen that the device start overriding unrelated memory.
> >
> > Not even VQ_RESET protects against it as there is still a window
> > between the CMD set and the VQ reset.
>
> Timeouts should be up to userspace. If userspace times out
> and then gets confused, kernel is not to blame.
>
>
I meant the virtio-net driver will be confused.
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [RFC 1/2] virtio_net: timeout control virtqueue commands
2025-10-22 10:50 ` Eugenio Perez Martin
@ 2025-10-22 11:43 ` Michael S. Tsirkin
2025-10-22 12:55 ` Eugenio Perez Martin
0 siblings, 1 reply; 45+ messages in thread
From: Michael S. Tsirkin @ 2025-10-22 11:43 UTC (permalink / raw)
To: Eugenio Perez Martin
Cc: Maxime Coquelin, Yongji Xie, virtualization, linux-kernel,
Xuan Zhuo, Dragos Tatulea DE, jasowang
On Wed, Oct 22, 2025 at 12:50:53PM +0200, Eugenio Perez Martin wrote:
> Let me switch to MQ as I think it illustrates the point better.
>
> IIUC the workflow:
> a) virtio-net sends MQ_VQ_PAIRS_SET 2 to the device
> b) VDUSE CVQ sends ok to the virtio-net driver
> c) VDUSE CVQ sends the command to the VDUSE device
> d) Now the virtio-net driver sends MQ_VQ_PAIRS_SET 1
> e) VDUSE CVQ sends ok to the virtio-net driver
>
> The device didn't process the MQ_VQ_PAIRS_SET 1 command at this point,
> so it potentially uses the second rx queue. But, by the standard:
>
> The device MUST NOT queue packets on receive queues greater than
> virtqueue_pairs once it has placed the VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET
> command in a used buffer.
>
> So the driver does not expect rx buffers on that queue at all. From
> the driver's POV, the device is invalid, and it could mark it as
> broken.
OK, interesting. Note that if userspace processes vqs it should process
cvq too. I don't know what to do in this case yet, I'm going on
vacation, let me ponder this a bit.
> And, what's worse, how to handle it if the device now replies with
> VIRTIO_NET_ERR to the VDUSE CVQ?
This part does not bother me much. Break the device, probably.
> > > If we wait for the device to reply, we're in the
> > > same situation regarding the RTNL.
> > >
> > > Now we receive a new state (A, B, E). We haven't sent the (A, B, D),
> > > so it is good to just replace the (A, B, D) with that. and send it
> > > when (A, B, C) is completed with either success or failure.
> > >
> > > 2) VQ_PAIRS_SET
> > >
> > > The driver starts with 1 vq pair. Now the driver sets 3 vq pairs, and
> > > the VDUSE CVQ forwards the command. The driver still thinks that it is
> > > using 1 vq pair. I can store that the driver request was 3, and it is
> > > still in-flight. Now the timeout occurs, so the VDUSE device returns
> > > fail to the driver, and the driver frees the vq regions etc. After
> > > that, the device now replies OK. The memory that was sent as the new
> > > vqs avail ring and descriptor ring now contains garbage, and it could
> > > happen that the device start overriding unrelated memory.
> > >
> > > Not even VQ_RESET protects against it as there is still a window
> > > between the CMD set and the VQ reset.
> >
> > Timeouts should be up to userspace. If userspace times out
> > and then gets confused, kernel is not to blame.
> >
> >
>
> I meant the virtio-net driver will be confused.
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [RFC 1/2] virtio_net: timeout control virtqueue commands
2025-10-22 11:43 ` Michael S. Tsirkin
@ 2025-10-22 12:55 ` Eugenio Perez Martin
2025-10-28 14:09 ` Michael S. Tsirkin
0 siblings, 1 reply; 45+ messages in thread
From: Eugenio Perez Martin @ 2025-10-22 12:55 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: Maxime Coquelin, Yongji Xie, virtualization, linux-kernel,
Xuan Zhuo, Dragos Tatulea DE, jasowang
On Wed, Oct 22, 2025 at 1:43 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Wed, Oct 22, 2025 at 12:50:53PM +0200, Eugenio Perez Martin wrote:
> > Let me switch to MQ as I think it illustrates the point better.
> >
> > IIUC the workflow:
> > a) virtio-net sends MQ_VQ_PAIRS_SET 2 to the device
> > b) VDUSE CVQ sends ok to the virtio-net driver
> > c) VDUSE CVQ sends the command to the VDUSE device
> > > d) Now the virtio-net driver sends MQ_VQ_PAIRS_SET 1
> > e) VDUSE CVQ sends ok to the virtio-net driver
> >
> > The device didn't process the MQ_VQ_PAIRS_SET 1 command at this point,
> > so it potentially uses the second rx queue. But, by the standard:
> >
> > The device MUST NOT queue packets on receive queues greater than
> > virtqueue_pairs once it has placed the VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET
> > command in a used buffer.
> >
> > So the driver does not expect rx buffers on that queue at all. From
> > the driver's POV, the device is invalid, and it could mark it as
> > broken.
>
> ok interesting. Note that if userspace processes vqs it should process
> cvq too. I don't know what to do in this case yet, I'm going on
> vacation, let me ponder this a bit.
>
Sure.
>
> > And, what's worse, how to handle it if the device now replies with
> > VIRTIO_NET_ERR to the VDUSE CVQ?
>
> this part does not bother me much. break it, probably.
>
To "successfully break it" we should implement NEEDS_RESET, or would it
work to just stop forwarding messages?
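To make the race in steps a) to e) concrete, here is a minimal userspace
sketch, not kernel code. It assumes a CVQ shim that acks the driver
before the device has applied the command. All the sketch_* names are
hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the early-ack CVQ shim; not VDUSE code. */
struct sketch_shim {
	unsigned int driver_pairs; /* what the driver believes is active */
	unsigned int device_pairs; /* what the device has really applied */
	unsigned int in_flight;    /* last value forwarded, not yet applied */
	bool has_in_flight;
};

/* Steps a/b and d/e: the shim acks at once and only forwards the value. */
static void sketch_driver_set_pairs(struct sketch_shim *s, unsigned int pairs)
{
	assert(pairs >= 1);
	s->driver_pairs = pairs;
	s->in_flight = pairs;
	s->has_in_flight = true;
}

/* The userspace device gets around to processing the forwarded command. */
static void sketch_device_apply(struct sketch_shim *s)
{
	if (s->has_in_flight) {
		s->device_pairs = s->in_flight;
		s->has_in_flight = false;
	}
}

/*
 * True if the device may still fill rx queue `pair` (0-based) although
 * the driver was already told that only driver_pairs queues are in use.
 */
static bool sketch_rx_violates(const struct sketch_shim *s, unsigned int pair)
{
	return pair < s->device_pairs && pair >= s->driver_pairs;
}
```

Between step e) and the moment the device applies the second command,
sketch_rx_violates() is true for the second rx queue, which is the window
the spec text quoted above forbids.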
> > > > If we wait for the device to reply, we're in the
> > > > same situation regarding the RTNL.
> > > >
> > > > Now we receive a new state (A, B, E). We haven't sent the (A, B, D),
> > > > so it is good to just replace the (A, B, D) with that, and send it
> > > > when (A, B, C) is completed with either success or failure.
> > > >
> > > > 2) VQ_PAIRS_SET
> > > >
> > > > The driver starts with 1 vq pair. Now the driver sets 3 vq pairs, and
> > > > the VDUSE CVQ forwards the command. The driver still thinks that it is
> > > > using 1 vq pair. I can store that the driver request was 3, and it is
> > > > still in-flight. Now the timeout occurs, so the VDUSE device returns
> > > > fail to the driver, and the driver frees the vq regions etc. After
> > > > that, the device now replies OK. The memory that was sent as the new
> > > > vqs avail ring and descriptor ring now contains garbage, and it could
> > > > happen that the device starts overwriting unrelated memory.
> > > >
> > > > Not even VQ_RESET protects against it as there is still a window
> > > > between the CMD set and the VQ reset.
> > >
> > > Timeouts should be up to userspace. If userspace times out
> > > and then gets confused, kernel is not to blame.
> > >
> > >
> >
> > I meant the virtio-net driver will be confused.
>
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [RFC 1/2] virtio_net: timeout control virtqueue commands
2025-10-22 12:55 ` Eugenio Perez Martin
@ 2025-10-28 14:09 ` Michael S. Tsirkin
2025-10-28 14:37 ` Eugenio Perez Martin
0 siblings, 1 reply; 45+ messages in thread
From: Michael S. Tsirkin @ 2025-10-28 14:09 UTC (permalink / raw)
To: Eugenio Perez Martin
Cc: Maxime Coquelin, Yongji Xie, virtualization, linux-kernel,
Xuan Zhuo, Dragos Tatulea DE, jasowang
On Wed, Oct 22, 2025 at 02:55:18PM +0200, Eugenio Perez Martin wrote:
> On Wed, Oct 22, 2025 at 1:43 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > On Wed, Oct 22, 2025 at 12:50:53PM +0200, Eugenio Perez Martin wrote:
> > > Let me switch to MQ as I think it illustrates the point better.
> > >
> > > IIUC the workflow:
> > > a) virtio-net sends MQ_VQ_PAIRS_SET 2 to the device
> > > b) VDUSE CVQ sends ok to the virtio-net driver
> > > c) VDUSE CVQ sends the command to the VDUSE device
> > > > d) Now the virtio-net driver sends MQ_VQ_PAIRS_SET 1
> > > e) VDUSE CVQ sends ok to the virtio-net driver
> > >
> > > The device didn't process the MQ_VQ_PAIRS_SET 1 command at this point,
> > > so it potentially uses the second rx queue. But, by the standard:
> > >
> > > The device MUST NOT queue packets on receive queues greater than
> > > virtqueue_pairs once it has placed the VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET
> > > command in a used buffer.
> > >
> > > So the driver does not expect rx buffers on that queue at all. From
> > > the driver's POV, the device is invalid, and it could mark it as
> > > broken.
> >
> > ok interesting. Note that if userspace processes vqs it should process
> > cvq too. I don't know what to do in this case yet, I'm going on
> > vacation, let me ponder this a bit.
> >
>
> Sure.
So let me ask you this, how are you going to handle device reset?
Same issue, it seems to me.
--
MST
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [RFC 1/2] virtio_net: timeout control virtqueue commands
2025-10-28 14:09 ` Michael S. Tsirkin
@ 2025-10-28 14:37 ` Eugenio Perez Martin
2025-10-28 14:42 ` Michael S. Tsirkin
0 siblings, 1 reply; 45+ messages in thread
From: Eugenio Perez Martin @ 2025-10-28 14:37 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: Maxime Coquelin, Yongji Xie, virtualization, linux-kernel,
Xuan Zhuo, Dragos Tatulea DE, jasowang
On Tue, Oct 28, 2025 at 3:10 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Wed, Oct 22, 2025 at 02:55:18PM +0200, Eugenio Perez Martin wrote:
> > On Wed, Oct 22, 2025 at 1:43 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > >
> > > On Wed, Oct 22, 2025 at 12:50:53PM +0200, Eugenio Perez Martin wrote:
> > > > Let me switch to MQ as I think it illustrates the point better.
> > > >
> > > > IIUC the workflow:
> > > > a) virtio-net sends MQ_VQ_PAIRS_SET 2 to the device
> > > > b) VDUSE CVQ sends ok to the virtio-net driver
> > > > c) VDUSE CVQ sends the command to the VDUSE device
> > > > > d) Now the virtio-net driver sends MQ_VQ_PAIRS_SET 1
> > > > e) VDUSE CVQ sends ok to the virtio-net driver
> > > >
> > > > The device didn't process the MQ_VQ_PAIRS_SET 1 command at this point,
> > > > so it potentially uses the second rx queue. But, by the standard:
> > > >
> > > > The device MUST NOT queue packets on receive queues greater than
> > > > virtqueue_pairs once it has placed the VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET
> > > > command in a used buffer.
> > > >
> > > > So the driver does not expect rx buffers on that queue at all. From
> > > > the driver's POV, the device is invalid, and it could mark it as
> > > > broken.
> > >
> > > ok interesting. Note that if userspace processes vqs it should process
> > > cvq too. I don't know what to do in this case yet, I'm going on
> > > vacation, let me ponder this a bit.
> > >
> >
> > Sure.
>
> So let me ask you this, how are you going to handle device reset?
> Same issue, it seems to me.
>
Well, my proposal is to mark it as broken so it needs to be reset
manually. For example, unbinding and binding the driver in Linux. The
point is that the driver cannot trust the device anymore, as it is in
an invalid state. Maybe suspending and resetting all the vqs is also a
valid way to un-break it, if the device supports it, but I think a race
is unavoidable there, and I'm not sure how to communicate it to
userspace for all kinds of devices. Incrementing rx errors could be a
first proposal.
If we want to track it in VDUSE we should implement NEEDS_RESET and
leave all the old drivers without a solution. That's why I think it is
better to solve all the problems at once in the driver.
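As a rough illustration of the driver-side approach, the sketch below
waits a bounded number of polls for a control command and marks the
device broken on timeout, bumping an rx error counter so userspace can
notice. It is a standalone model under my own assumptions: struct
sketch_dev, sketch_cvq_poll() and sketch_send_command() are made-up
names and not the virtio_net code from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for driver state; not the virtio_net structs. */
struct sketch_dev {
	bool broken;              /* set once the driver gives up on the device */
	uint64_t rx_errors;       /* bumped so userspace can see the failure */
	unsigned int reply_after; /* poll count at which the device answers */
};

/* Pretend poll of the control virtqueue: true once the buffer is used. */
static bool sketch_cvq_poll(const struct sketch_dev *dev, unsigned int polls)
{
	return polls >= dev->reply_after;
}

/*
 * Send one control command and wait at most max_polls polls for it.
 * Returns true on completion. On timeout the device is marked broken,
 * and every later command fails until a manual reset.
 */
static bool sketch_send_command(struct sketch_dev *dev, unsigned int max_polls)
{
	unsigned int polls;

	assert(max_polls > 0);
	if (dev->broken)
		return false;

	for (polls = 1; polls <= max_polls; polls++) {
		if (sketch_cvq_poll(dev, polls))
			return true;
	}

	dev->broken = true;
	dev->rx_errors++;
	return false;
}
```

Once broken is set, the sketch refuses further commands without touching
the counter again, which matches the idea that the driver no longer
trusts the device until it is rebound.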
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [RFC 1/2] virtio_net: timeout control virtqueue commands
2025-10-28 14:37 ` Eugenio Perez Martin
@ 2025-10-28 14:42 ` Michael S. Tsirkin
2025-10-28 14:57 ` Eugenio Perez Martin
0 siblings, 1 reply; 45+ messages in thread
From: Michael S. Tsirkin @ 2025-10-28 14:42 UTC (permalink / raw)
To: Eugenio Perez Martin
Cc: Maxime Coquelin, Yongji Xie, virtualization, linux-kernel,
Xuan Zhuo, Dragos Tatulea DE, jasowang
On Tue, Oct 28, 2025 at 03:37:09PM +0100, Eugenio Perez Martin wrote:
> On Tue, Oct 28, 2025 at 3:10 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > On Wed, Oct 22, 2025 at 02:55:18PM +0200, Eugenio Perez Martin wrote:
> > > On Wed, Oct 22, 2025 at 1:43 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > >
> > > > On Wed, Oct 22, 2025 at 12:50:53PM +0200, Eugenio Perez Martin wrote:
> > > > > Let me switch to MQ as I think it illustrates the point better.
> > > > >
> > > > > IIUC the workflow:
> > > > > a) virtio-net sends MQ_VQ_PAIRS_SET 2 to the device
> > > > > b) VDUSE CVQ sends ok to the virtio-net driver
> > > > > c) VDUSE CVQ sends the command to the VDUSE device
> > > > > > d) Now the virtio-net driver sends MQ_VQ_PAIRS_SET 1
> > > > > e) VDUSE CVQ sends ok to the virtio-net driver
> > > > >
> > > > > The device didn't process the MQ_VQ_PAIRS_SET 1 command at this point,
> > > > > so it potentially uses the second rx queue. But, by the standard:
> > > > >
> > > > > The device MUST NOT queue packets on receive queues greater than
> > > > > virtqueue_pairs once it has placed the VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET
> > > > > command in a used buffer.
> > > > >
> > > > > So the driver does not expect rx buffers on that queue at all. From
> > > > > the driver's POV, the device is invalid, and it could mark it as
> > > > > broken.
> > > >
> > > > ok interesting. Note that if userspace processes vqs it should process
> > > > cvq too. I don't know what to do in this case yet, I'm going on
> > > > vacation, let me ponder this a bit.
> > > >
> > >
> > > Sure.
> >
> > So let me ask you this, how are you going to handle device reset?
> > Same issue, it seems to me.
> >
>
> Well my proposal is to mark it as broken so it needs to be reset
> manually.
Heh but guest assumes after reset device does not poke at guest
memory, and will free up and reuse that memory.
If userspace still pokes at it -> plus plus ungood.
> For example, unbinding and binding the driver in Linux. The
> point is that the driver cannot trust the device anymore as it is in
> an invalid state. Maybe suspending and resetting all the vqs is also a
> valid way to un-break it, if the device supports it, but I think a race
> is unavoidable there, and I'm not sure how to communicate it to
> userspace for all kinds of devices. Incrementing rx errors could be a
> first proposal.
>
> If we want to track it in VDUSE we should implement NEEDS_RESET and
> leave all the old drivers without a solution. That's why I think it is
> better to solve all the problems at once in the driver.
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [RFC 1/2] virtio_net: timeout control virtqueue commands
2025-10-28 14:42 ` Michael S. Tsirkin
@ 2025-10-28 14:57 ` Eugenio Perez Martin
2025-10-29 0:36 ` Jason Wang
2025-11-05 9:02 ` Eugenio Perez Martin
0 siblings, 2 replies; 45+ messages in thread
From: Eugenio Perez Martin @ 2025-10-28 14:57 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: Maxime Coquelin, Yongji Xie, virtualization, linux-kernel,
Xuan Zhuo, Dragos Tatulea DE, jasowang
On Tue, Oct 28, 2025 at 3:42 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Tue, Oct 28, 2025 at 03:37:09PM +0100, Eugenio Perez Martin wrote:
> > On Tue, Oct 28, 2025 at 3:10 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > >
> > > On Wed, Oct 22, 2025 at 02:55:18PM +0200, Eugenio Perez Martin wrote:
> > > > On Wed, Oct 22, 2025 at 1:43 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > >
> > > > > On Wed, Oct 22, 2025 at 12:50:53PM +0200, Eugenio Perez Martin wrote:
> > > > > > Let me switch to MQ as I think it illustrates the point better.
> > > > > >
> > > > > > IIUC the workflow:
> > > > > > a) virtio-net sends MQ_VQ_PAIRS_SET 2 to the device
> > > > > > b) VDUSE CVQ sends ok to the virtio-net driver
> > > > > > c) VDUSE CVQ sends the command to the VDUSE device
> > > > > > > d) Now the virtio-net driver sends MQ_VQ_PAIRS_SET 1
> > > > > > e) VDUSE CVQ sends ok to the virtio-net driver
> > > > > >
> > > > > > The device didn't process the MQ_VQ_PAIRS_SET 1 command at this point,
> > > > > > so it potentially uses the second rx queue. But, by the standard:
> > > > > >
> > > > > > The device MUST NOT queue packets on receive queues greater than
> > > > > > virtqueue_pairs once it has placed the VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET
> > > > > > command in a used buffer.
> > > > > >
> > > > > > So the driver does not expect rx buffers on that queue at all. From
> > > > > > the driver's POV, the device is invalid, and it could mark it as
> > > > > > broken.
> > > > >
> > > > > ok interesting. Note that if userspace processes vqs it should process
> > > > > cvq too. I don't know what to do in this case yet, I'm going on
> > > > > vacation, let me ponder this a bit.
> > > > >
> > > >
> > > > Sure.
> > >
> > > So let me ask you this, how are you going to handle device reset?
> > > Same issue, it seems to me.
> > >
> >
> > Well my proposal is to mark it as broken so it needs to be reset
> > manually.
>
>
> Heh but guest assumes after reset device does not poke at guest
> memory, and will free up and reuse that memory.
> If userspace still pokes at it -> plus plus ungood.
>
I don't get this part. Once the device is reset, the device should not
poke at guest memory (unless it is malicious or similar). Why would it
do it?
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [RFC 1/2] virtio_net: timeout control virtqueue commands
2025-10-28 14:57 ` Eugenio Perez Martin
@ 2025-10-29 0:36 ` Jason Wang
2025-11-05 9:02 ` Eugenio Perez Martin
1 sibling, 0 replies; 45+ messages in thread
From: Jason Wang @ 2025-10-29 0:36 UTC (permalink / raw)
To: Eugenio Perez Martin
Cc: Michael S. Tsirkin, Maxime Coquelin, Yongji Xie, virtualization,
linux-kernel, Xuan Zhuo, Dragos Tatulea DE
On Tue, Oct 28, 2025 at 10:58 PM Eugenio Perez Martin
<eperezma@redhat.com> wrote:
>
> On Tue, Oct 28, 2025 at 3:42 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > On Tue, Oct 28, 2025 at 03:37:09PM +0100, Eugenio Perez Martin wrote:
> > > On Tue, Oct 28, 2025 at 3:10 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > >
> > > > On Wed, Oct 22, 2025 at 02:55:18PM +0200, Eugenio Perez Martin wrote:
> > > > > On Wed, Oct 22, 2025 at 1:43 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > >
> > > > > > On Wed, Oct 22, 2025 at 12:50:53PM +0200, Eugenio Perez Martin wrote:
> > > > > > > Let me switch to MQ as I think it illustrates the point better.
> > > > > > >
> > > > > > > IIUC the workflow:
> > > > > > > a) virtio-net sends MQ_VQ_PAIRS_SET 2 to the device
> > > > > > > b) VDUSE CVQ sends ok to the virtio-net driver
> > > > > > > c) VDUSE CVQ sends the command to the VDUSE device
> > > > > > > > d) Now the virtio-net driver sends MQ_VQ_PAIRS_SET 1
> > > > > > > e) VDUSE CVQ sends ok to the virtio-net driver
> > > > > > >
> > > > > > > The device didn't process the MQ_VQ_PAIRS_SET 1 command at this point,
> > > > > > > so it potentially uses the second rx queue. But, by the standard:
> > > > > > >
> > > > > > > The device MUST NOT queue packets on receive queues greater than
> > > > > > > virtqueue_pairs once it has placed the VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET
> > > > > > > command in a used buffer.
> > > > > > >
> > > > > > > So the driver does not expect rx buffers on that queue at all. From
> > > > > > > the driver's POV, the device is invalid, and it could mark it as
> > > > > > > broken.
> > > > > >
> > > > > > ok interesting. Note that if userspace processes vqs it should process
> > > > > > cvq too. I don't know what to do in this case yet, I'm going on
> > > > > > vacation, let me ponder this a bit.
> > > > > >
> > > > >
> > > > > Sure.
> > > >
> > > > So let me ask you this, how are you going to handle device reset?
> > > > Same issue, it seems to me.
> > > >
> > >
> > > Well my proposal is to mark it as broken so it needs to be reset
> > > manually.
> >
> >
> > Heh but guest assumes after reset device does not poke at guest
> > memory, and will free up and reuse that memory.
> > If userspace still pokes at it -> plus plus ungood.
> >
>
> I don't get this part. Once the device is reset, the device should not
> poke at guest memory (unless it is malicious or similar). Why would it
> do it?
>
At least for this case (virtio-vDPA + VDUSE), there's no way for
userspace to poke after reset, since everything is done via the IOTLB.
For other devices, if we want this extra safety, we need to enable swiotlb.
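A small model of why an IOTLB helps here, assuming that reset drops
every mapping so that any later device access fails translation. This is
a userspace sketch with hypothetical names (struct sketch_iotlb and the
sketch_iotlb_* helpers), not the vhost or VDUSE IOTLB implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_MAPS 4

/* Hypothetical IOTLB entry: one contiguous IOVA range the device may use. */
struct sketch_map {
	uint64_t iova;
	uint64_t size;
	bool valid;
};

struct sketch_iotlb {
	struct sketch_map maps[SKETCH_MAX_MAPS];
};

/* Install a mapping in the first free slot; false if the table is full. */
static bool sketch_iotlb_map(struct sketch_iotlb *t, uint64_t iova,
			     uint64_t size)
{
	size_t i;

	assert(size > 0);
	for (i = 0; i < SKETCH_MAX_MAPS; i++) {
		if (!t->maps[i].valid) {
			t->maps[i] = (struct sketch_map){ iova, size, true };
			return true;
		}
	}
	return false;
}

/* Every device access must pass this check before touching memory. */
static bool sketch_iotlb_allows(const struct sketch_iotlb *t, uint64_t iova,
				uint64_t len)
{
	size_t i;

	for (i = 0; i < SKETCH_MAX_MAPS; i++) {
		const struct sketch_map *m = &t->maps[i];

		/* Written to avoid overflow in iova + len. */
		if (m->valid && iova >= m->iova && len <= m->size &&
		    iova - m->iova <= m->size - len)
			return true;
	}
	return false;
}

/* Assumed reset behaviour: drop all mappings, so late accesses fail. */
static void sketch_iotlb_reset(struct sketch_iotlb *t)
{
	size_t i;

	for (i = 0; i < SKETCH_MAX_MAPS; i++)
		t->maps[i].valid = false;
}
```

With this model a late write from userspace after sketch_iotlb_reset()
has nowhere to land, regardless of what the driver did with the memory
afterwards.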
Thanks
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [RFC 1/2] virtio_net: timeout control virtqueue commands
2025-10-28 14:57 ` Eugenio Perez Martin
2025-10-29 0:36 ` Jason Wang
@ 2025-11-05 9:02 ` Eugenio Perez Martin
2025-11-09 21:46 ` Michael S. Tsirkin
1 sibling, 1 reply; 45+ messages in thread
From: Eugenio Perez Martin @ 2025-11-05 9:02 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: Maxime Coquelin, Yongji Xie, virtualization, linux-kernel,
Xuan Zhuo, Dragos Tatulea DE, jasowang
On Tue, Oct 28, 2025 at 3:57 PM Eugenio Perez Martin
<eperezma@redhat.com> wrote:
>
> On Tue, Oct 28, 2025 at 3:42 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > On Tue, Oct 28, 2025 at 03:37:09PM +0100, Eugenio Perez Martin wrote:
> > > On Tue, Oct 28, 2025 at 3:10 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > >
> > > > On Wed, Oct 22, 2025 at 02:55:18PM +0200, Eugenio Perez Martin wrote:
> > > > > On Wed, Oct 22, 2025 at 1:43 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > >
> > > > > > On Wed, Oct 22, 2025 at 12:50:53PM +0200, Eugenio Perez Martin wrote:
> > > > > > > Let me switch to MQ as I think it illustrates the point better.
> > > > > > >
> > > > > > > IIUC the workflow:
> > > > > > > a) virtio-net sends MQ_VQ_PAIRS_SET 2 to the device
> > > > > > > b) VDUSE CVQ sends ok to the virtio-net driver
> > > > > > > c) VDUSE CVQ sends the command to the VDUSE device
> > > > > > > > d) Now the virtio-net driver sends MQ_VQ_PAIRS_SET 1
> > > > > > > e) VDUSE CVQ sends ok to the virtio-net driver
> > > > > > >
> > > > > > > The device didn't process the MQ_VQ_PAIRS_SET 1 command at this point,
> > > > > > > so it potentially uses the second rx queue. But, by the standard:
> > > > > > >
> > > > > > > The device MUST NOT queue packets on receive queues greater than
> > > > > > > virtqueue_pairs once it has placed the VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET
> > > > > > > command in a used buffer.
> > > > > > >
> > > > > > > So the driver does not expect rx buffers on that queue at all. From
> > > > > > > the driver's POV, the device is invalid, and it could mark it as
> > > > > > > broken.
> > > > > >
> > > > > > ok interesting. Note that if userspace processes vqs it should process
> > > > > > cvq too. I don't know what to do in this case yet, I'm going on
> > > > > > vacation, let me ponder this a bit.
> > > > > >
> > > > >
> > > > > Sure.
> > > >
> > > > So let me ask you this, how are you going to handle device reset?
> > > > Same issue, it seems to me.
> > > >
> > >
> > > Well my proposal is to mark it as broken so it needs to be reset
> > > manually.
> >
> >
> > Heh but guest assumes after reset device does not poke at guest
> > memory, and will free up and reuse that memory.
> > If userspace still pokes at it -> plus plus ungood.
> >
>
> I don't get this part. Once the device is reset, the device should not
> poke at guest memory (unless it is malicious or similar). Why would it
> do it?
Friendly ping.
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [RFC 1/2] virtio_net: timeout control virtqueue commands
2025-11-05 9:02 ` Eugenio Perez Martin
@ 2025-11-09 21:46 ` Michael S. Tsirkin
0 siblings, 0 replies; 45+ messages in thread
From: Michael S. Tsirkin @ 2025-11-09 21:46 UTC (permalink / raw)
To: Eugenio Perez Martin
Cc: Maxime Coquelin, Yongji Xie, virtualization, linux-kernel,
Xuan Zhuo, Dragos Tatulea DE, jasowang
On Wed, Nov 05, 2025 at 10:02:48AM +0100, Eugenio Perez Martin wrote:
> On Tue, Oct 28, 2025 at 3:57 PM Eugenio Perez Martin
> <eperezma@redhat.com> wrote:
> >
> > On Tue, Oct 28, 2025 at 3:42 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > >
> > > On Tue, Oct 28, 2025 at 03:37:09PM +0100, Eugenio Perez Martin wrote:
> > > > On Tue, Oct 28, 2025 at 3:10 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > >
> > > > > On Wed, Oct 22, 2025 at 02:55:18PM +0200, Eugenio Perez Martin wrote:
> > > > > > On Wed, Oct 22, 2025 at 1:43 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > > >
> > > > > > > On Wed, Oct 22, 2025 at 12:50:53PM +0200, Eugenio Perez Martin wrote:
> > > > > > > > Let me switch to MQ as I think it illustrates the point better.
> > > > > > > >
> > > > > > > > IIUC the workflow:
> > > > > > > > a) virtio-net sends MQ_VQ_PAIRS_SET 2 to the device
> > > > > > > > b) VDUSE CVQ sends ok to the virtio-net driver
> > > > > > > > c) VDUSE CVQ sends the command to the VDUSE device
> > > > > > > > > d) Now the virtio-net driver sends MQ_VQ_PAIRS_SET 1
> > > > > > > > e) VDUSE CVQ sends ok to the virtio-net driver
> > > > > > > >
> > > > > > > > The device didn't process the MQ_VQ_PAIRS_SET 1 command at this point,
> > > > > > > > so it potentially uses the second rx queue. But, by the standard:
> > > > > > > >
> > > > > > > > The device MUST NOT queue packets on receive queues greater than
> > > > > > > > virtqueue_pairs once it has placed the VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET
> > > > > > > > command in a used buffer.
> > > > > > > >
> > > > > > > > So the driver does not expect rx buffers on that queue at all. From
> > > > > > > > the driver's POV, the device is invalid, and it could mark it as
> > > > > > > > broken.
> > > > > > >
> > > > > > > ok interesting. Note that if userspace processes vqs it should process
> > > > > > > cvq too. I don't know what to do in this case yet, I'm going on
> > > > > > > vacation, let me ponder this a bit.
> > > > > > >
> > > > > >
> > > > > > Sure.
> > > > >
> > > > > So let me ask you this, how are you going to handle device reset?
> > > > > Same issue, it seems to me.
> > > > >
> > > >
> > > > Well my proposal is to mark it as broken so it needs to be reset
> > > > manually.
> > >
> > >
> > > Heh but guest assumes after reset device does not poke at guest
> > > memory, and will free up and reuse that memory.
> > > If userspace still pokes at it -> plus plus ungood.
> > >
> >
> > I don't get this part. Once the device is reset, the device should not
> > poke at guest memory (unless it is malicious or similar). Why would it
> > do it?
>
> Friendly ping.
OK, I thought about it a bunch. A lot of net drivers actually
just queue ethtool commands and finish them asynchronously.
Conceivably virtio could expose an API on whether it is safe to
wait for buffers to be used. virtio-net would then either
send commands directly or do the asynchronous thing.
Hmm?
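Roughly what I mean, as a userspace sketch only. The safe_to_wait flag
stands in for the hypothetical API; no such virtio call exists today,
and every sketch_* name below is invented for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_QUEUE_LEN 8

enum sketch_result { SKETCH_DONE, SKETCH_QUEUED, SKETCH_BUSY };

struct sketch_net {
	bool safe_to_wait;             /* hypothetical answer from the transport */
	int pending[SKETCH_QUEUE_LEN]; /* commands deferred to a worker */
	size_t npending;
	int last_sent;                 /* last command handed to the device */
};

/* Synchronous path: hand the command over and wait for the used buffer. */
static void sketch_send_sync(struct sketch_net *n, int cmd)
{
	n->last_sent = cmd;
}

/* Called with RTNL held in the real driver; must not block if unsafe. */
static enum sketch_result sketch_ctrl_cmd(struct sketch_net *n, int cmd)
{
	if (n->safe_to_wait) {
		sketch_send_sync(n, cmd);
		return SKETCH_DONE;
	}
	if (n->npending == SKETCH_QUEUE_LEN)
		return SKETCH_BUSY;
	n->pending[n->npending++] = cmd;
	return SKETCH_QUEUED;
}

/* Worker, assumed to run without RTNL: sends deferred commands in order. */
static size_t sketch_drain(struct sketch_net *n)
{
	size_t i, done = n->npending;

	assert(n->npending <= SKETCH_QUEUE_LEN);
	for (i = 0; i < n->npending; i++)
		sketch_send_sync(n, n->pending[i]);
	n->npending = 0;
	return done;
}
```

The caller would return to ethtool right away on SKETCH_QUEUED and let
the worker finish the command later, so a stuck userspace device only
stalls the worker and not whoever holds RTNL.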
--
MST
^ permalink raw reply [flat|nested] 45+ messages in thread
end of thread, other threads:[~2025-11-09 21:47 UTC | newest]
Thread overview: 45+ messages
2025-10-07 13:06 [RFC 0/2] Lift restriction about VDUSE net devices with CVQ Eugenio Pérez
2025-10-07 13:06 ` [RFC 1/2] virtio_net: timeout control virtqueue commands Eugenio Pérez
2025-10-11 7:44 ` Jason Wang
2025-10-14 7:30 ` Eugenio Perez Martin
2025-10-14 8:29 ` Michael S. Tsirkin
2025-10-14 9:14 ` Maxime Coquelin
2025-10-14 9:25 ` Michael S. Tsirkin
2025-10-14 10:21 ` Maxime Coquelin
2025-10-15 4:44 ` Jason Wang
2025-10-15 6:07 ` Michael S. Tsirkin
2025-10-15 6:08 ` Eugenio Perez Martin
2025-10-15 6:33 ` Michael S. Tsirkin
2025-10-15 6:52 ` Eugenio Perez Martin
2025-10-15 7:04 ` Michael S. Tsirkin
2025-10-15 7:45 ` Eugenio Perez Martin
2025-10-15 8:03 ` Maxime Coquelin
2025-10-15 8:09 ` Michael S. Tsirkin
2025-10-15 9:16 ` Maxime Coquelin
2025-10-15 10:36 ` Eugenio Perez Martin
2025-10-16 5:39 ` Jason Wang
2025-10-16 5:45 ` Michael S. Tsirkin
2025-10-16 6:03 ` Jason Wang
2025-10-16 6:22 ` Michael S. Tsirkin
2025-10-16 6:25 ` Eugenio Perez Martin
2025-10-17 6:36 ` Eugenio Perez Martin
2025-10-17 6:39 ` Michael S. Tsirkin
2025-10-17 7:21 ` Eugenio Perez Martin
2025-10-22 9:46 ` Eugenio Perez Martin
2025-10-22 10:06 ` Michael S. Tsirkin
2025-10-22 10:09 ` Michael S. Tsirkin
2025-10-22 10:50 ` Eugenio Perez Martin
2025-10-22 11:43 ` Michael S. Tsirkin
2025-10-22 12:55 ` Eugenio Perez Martin
2025-10-28 14:09 ` Michael S. Tsirkin
2025-10-28 14:37 ` Eugenio Perez Martin
2025-10-28 14:42 ` Michael S. Tsirkin
2025-10-28 14:57 ` Eugenio Perez Martin
2025-10-29 0:36 ` Jason Wang
2025-11-05 9:02 ` Eugenio Perez Martin
2025-11-09 21:46 ` Michael S. Tsirkin
2025-10-07 13:06 ` [RFC 2/2] vduse: lift restriction about net devices with CVQ Eugenio Pérez
2025-10-09 13:14 ` Maxime Coquelin
2025-10-15 6:11 ` Eugenio Perez Martin
2025-10-14 8:31 ` Michael S. Tsirkin
2025-10-15 6:25 ` Eugenio Perez Martin