* [PATCH v5] [for-10.1] virtio: add VIRTQUEUE_ERROR QAPI event
@ 2025-04-01 17:07 Vladimir Sementsov-Ogievskiy
2025-04-02 12:32 ` Vladimir Sementsov-Ogievskiy
2025-04-04 6:46 ` Markus Armbruster
0 siblings, 2 replies; 6+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2025-04-01 17:07 UTC (permalink / raw)
To: mst
Cc: sgarzare, armbru, dave, eblake, pbonzini, berrange, eduardo,
qemu-devel, Vladimir Sementsov-Ogievskiy, Denis Plotnikov
For now we only log the vhost device error, when virtqueue is actually
stopped. Let's add a QAPI event, which makes possible:
- collect statistics of such errors
- make immediate actions: take core dumps or do some other debugging
- inform the user through a management API or UI, so that (s)he can
react somehow, e.g. reset the device driver in the guest or even
build up some automation to do so
Note that basically every inconsistency discovered during virtqueue
processing results in a silent virtqueue stop. The guest then just
sees the requests getting stuck somewhere in the device for no visible
reason. This event provides a means to inform the management layer of
this situation in a timely fashion.
The event could be reused for some other virtqueue problems (not only
for vhost devices) in future. For this it gets a generic name and
structure.
We keep original VHOST_OPS_DEBUG(), to keep original debug output as is
here, it's not the only call to VHOST_OPS_DEBUG in the file.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Reviewed-by: Denis Plotnikov <den-plotnikov@yandex-team.ru>
---
v5: resend, update version in QAPI to 10.1
drop a-b by Markus (too much time passed, the context could
changed. Markus, is the patch still OK?)
hw/virtio/vhost.c | 12 +++++++++---
monitor/monitor.c | 10 ++++++++++
qapi/qdev.json | 32 ++++++++++++++++++++++++++++++++
3 files changed, 51 insertions(+), 3 deletions(-)
diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
index 6aa72fd434..0b205cef73 100644
--- a/hw/virtio/vhost.c
+++ b/hw/virtio/vhost.c
@@ -15,6 +15,7 @@
#include "qemu/osdep.h"
#include "qapi/error.h"
+#include "qapi/qapi-events-qdev.h"
#include "hw/virtio/vhost.h"
#include "qemu/atomic.h"
#include "qemu/range.h"
@@ -1442,11 +1443,16 @@ static void vhost_virtqueue_error_notifier(EventNotifier *n)
struct vhost_virtqueue *vq = container_of(n, struct vhost_virtqueue,
error_notifier);
struct vhost_dev *dev = vq->dev;
- int index = vq - dev->vqs;
if (event_notifier_test_and_clear(n) && dev->vdev) {
- VHOST_OPS_DEBUG(-EINVAL, "vhost vring error in virtqueue %d",
- dev->vq_index + index);
+ int ind = vq - dev->vqs + dev->vq_index;
+ DeviceState *ds = &dev->vdev->parent_obj;
+
+ VHOST_OPS_DEBUG(-EINVAL, "vhost vring error in virtqueue %d", ind);
+ qapi_event_send_virtqueue_error(ds->id, ds->canonical_path, ind,
+ VIRTQUEUE_ERROR_VHOST_VRING_ERROR,
+ "vhost reported failure through vring "
+ "error fd");
}
}
diff --git a/monitor/monitor.c b/monitor/monitor.c
index c5a5d30877..1296a9207e 100644
--- a/monitor/monitor.c
+++ b/monitor/monitor.c
@@ -313,6 +313,7 @@ static MonitorQAPIEventConf monitor_qapi_event_conf[QAPI_EVENT__MAX] = {
[QAPI_EVENT_BALLOON_CHANGE] = { 1000 * SCALE_MS },
[QAPI_EVENT_QUORUM_REPORT_BAD] = { 1000 * SCALE_MS },
[QAPI_EVENT_QUORUM_FAILURE] = { 1000 * SCALE_MS },
+ [QAPI_EVENT_VIRTQUEUE_ERROR] = { 1000 * SCALE_MS },
[QAPI_EVENT_VSERPORT_CHANGE] = { 1000 * SCALE_MS },
[QAPI_EVENT_MEMORY_DEVICE_SIZE_CHANGE] = { 1000 * SCALE_MS },
[QAPI_EVENT_HV_BALLOON_STATUS_REPORT] = { 1000 * SCALE_MS },
@@ -499,6 +500,10 @@ static unsigned int qapi_event_throttle_hash(const void *key)
hash += g_str_hash(qdict_get_str(evstate->data, "qom-path"));
}
+ if (evstate->event == QAPI_EVENT_VIRTQUEUE_ERROR) {
+ hash += g_str_hash(qdict_get_str(evstate->data, "device"));
+ }
+
return hash;
}
@@ -527,6 +532,11 @@ static gboolean qapi_event_throttle_equal(const void *a, const void *b)
qdict_get_str(evb->data, "qom-path"));
}
+ if (eva->event == QAPI_EVENT_VIRTQUEUE_ERROR) {
+ return !strcmp(qdict_get_str(eva->data, "device"),
+ qdict_get_str(evb->data, "device"));
+ }
+
return TRUE;
}
diff --git a/qapi/qdev.json b/qapi/qdev.json
index 25cbcf977b..2d20f4777e 100644
--- a/qapi/qdev.json
+++ b/qapi/qdev.json
@@ -187,3 +187,35 @@
{ 'command': 'device-sync-config',
'features': [ 'unstable' ],
'data': {'id': 'str'} }
+
+##
+# @VirtqueueError:
+#
+# @vhost-vring-error: Vhost device reported failure through
+# through vring error fd.
+#
+# Since: 10.1
+##
+{ 'enum': 'VirtqueueError',
+ 'data': [ 'vhost-vring-error' ] }
+
+##
+# @VIRTQUEUE_ERROR:
+#
+# Emitted when a device virtqueue fails in runtime.
+#
+# @device: the device's ID if it has one
+#
+# @path: the device's QOM path
+#
+# @virtqueue: virtqueue index
+#
+# @error: error identifier
+#
+# @description: human readable description
+#
+# Since: 10.1
+##
+{ 'event': 'VIRTQUEUE_ERROR',
+ 'data': { '*device': 'str', 'path': 'str', 'virtqueue': 'int',
+ 'error': 'VirtqueueError', 'description': 'str'} }
--
2.48.1
^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [PATCH v5] [for-10.1] virtio: add VIRTQUEUE_ERROR QAPI event
2025-04-01 17:07 [PATCH v5] [for-10.1] virtio: add VIRTQUEUE_ERROR QAPI event Vladimir Sementsov-Ogievskiy
@ 2025-04-02 12:32 ` Vladimir Sementsov-Ogievskiy
2025-04-03 7:37 ` Vladimir Sementsov-Ogievskiy
2025-04-04 6:46 ` Markus Armbruster
1 sibling, 1 reply; 6+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2025-04-02 12:32 UTC (permalink / raw)
To: mst
Cc: sgarzare, armbru, dave, eblake, pbonzini, berrange, eduardo,
qemu-devel, Denis Plotnikov
On 01.04.25 20:07, Vladimir Sementsov-Ogievskiy wrote:
> For now we only log the vhost device error, when virtqueue is actually
> stopped. Let's add a QAPI event, which makes possible:
>
> - collect statistics of such errors
> - make immediate actions: take core dumps or do some other debugging
> - inform the user through a management API or UI, so that (s)he can
> react somehow, e.g. reset the device driver in the guest or even
> build up some automation to do so
>
> Note that basically every inconsistency discovered during virtqueue
> processing results in a silent virtqueue stop. The guest then just
> sees the requests getting stuck somewhere in the device for no visible
> reason. This event provides a means to inform the management layer of
> this situation in a timely fashion.
>
> The event could be reused for some other virtqueue problems (not only
> for vhost devices) in future. For this it gets a generic name and
> structure.
>
> We keep original VHOST_OPS_DEBUG(), to keep original debug output as is
> here, it's not the only call to VHOST_OPS_DEBUG in the file.
>
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
> Reviewed-by: Denis Plotnikov <den-plotnikov@yandex-team.ru>
> ---
>
> v5: resend, update version in QAPI to 10.1
> drop a-b by Markus (too much time passed, the context could
> changed. Markus, is the patch still OK?)
>
> hw/virtio/vhost.c | 12 +++++++++---
> monitor/monitor.c | 10 ++++++++++
> qapi/qdev.json | 32 ++++++++++++++++++++++++++++++++
> 3 files changed, 51 insertions(+), 3 deletions(-)
>
> diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
> index 6aa72fd434..0b205cef73 100644
> --- a/hw/virtio/vhost.c
> +++ b/hw/virtio/vhost.c
> @@ -15,6 +15,7 @@
>
> #include "qemu/osdep.h"
> #include "qapi/error.h"
> +#include "qapi/qapi-events-qdev.h"
> #include "hw/virtio/vhost.h"
> #include "qemu/atomic.h"
> #include "qemu/range.h"
> @@ -1442,11 +1443,16 @@ static void vhost_virtqueue_error_notifier(EventNotifier *n)
> struct vhost_virtqueue *vq = container_of(n, struct vhost_virtqueue,
> error_notifier);
> struct vhost_dev *dev = vq->dev;
> - int index = vq - dev->vqs;
>
> if (event_notifier_test_and_clear(n) && dev->vdev) {
> - VHOST_OPS_DEBUG(-EINVAL, "vhost vring error in virtqueue %d",
> - dev->vq_index + index);
> + int ind = vq - dev->vqs + dev->vq_index;
> + DeviceState *ds = &dev->vdev->parent_obj;
> +
> + VHOST_OPS_DEBUG(-EINVAL, "vhost vring error in virtqueue %d", ind);
> + qapi_event_send_virtqueue_error(ds->id, ds->canonical_path, ind,
> + VIRTQUEUE_ERROR_VHOST_VRING_ERROR,
> + "vhost reported failure through vring "
> + "error fd");
> }
> }
>
> diff --git a/monitor/monitor.c b/monitor/monitor.c
> index c5a5d30877..1296a9207e 100644
> --- a/monitor/monitor.c
> +++ b/monitor/monitor.c
> @@ -313,6 +313,7 @@ static MonitorQAPIEventConf monitor_qapi_event_conf[QAPI_EVENT__MAX] = {
> [QAPI_EVENT_BALLOON_CHANGE] = { 1000 * SCALE_MS },
> [QAPI_EVENT_QUORUM_REPORT_BAD] = { 1000 * SCALE_MS },
> [QAPI_EVENT_QUORUM_FAILURE] = { 1000 * SCALE_MS },
> + [QAPI_EVENT_VIRTQUEUE_ERROR] = { 1000 * SCALE_MS },
> [QAPI_EVENT_VSERPORT_CHANGE] = { 1000 * SCALE_MS },
> [QAPI_EVENT_MEMORY_DEVICE_SIZE_CHANGE] = { 1000 * SCALE_MS },
> [QAPI_EVENT_HV_BALLOON_STATUS_REPORT] = { 1000 * SCALE_MS },
> @@ -499,6 +500,10 @@ static unsigned int qapi_event_throttle_hash(const void *key)
> hash += g_str_hash(qdict_get_str(evstate->data, "qom-path"));
> }
>
> + if (evstate->event == QAPI_EVENT_VIRTQUEUE_ERROR) {
> + hash += g_str_hash(qdict_get_str(evstate->data, "device"));
Hmm we should use "qom-path" here, right?
> + }
> +
> return hash;
> }
>
> @@ -527,6 +532,11 @@ static gboolean qapi_event_throttle_equal(const void *a, const void *b)
> qdict_get_str(evb->data, "qom-path"));
> }
>
> + if (eva->event == QAPI_EVENT_VIRTQUEUE_ERROR) {
> + return !strcmp(qdict_get_str(eva->data, "device"),
> + qdict_get_str(evb->data, "device"));
and here
> + }
> +
> return TRUE;
> }
>
> diff --git a/qapi/qdev.json b/qapi/qdev.json
> index 25cbcf977b..2d20f4777e 100644
> --- a/qapi/qdev.json
> +++ b/qapi/qdev.json
> @@ -187,3 +187,35 @@
> { 'command': 'device-sync-config',
> 'features': [ 'unstable' ],
> 'data': {'id': 'str'} }
> +
> +##
> +# @VirtqueueError:
> +#
> +# @vhost-vring-error: Vhost device reported failure through
> +# through vring error fd.
> +#
> +# Since: 10.1
> +##
> +{ 'enum': 'VirtqueueError',
> + 'data': [ 'vhost-vring-error' ] }
> +
> +##
> +# @VIRTQUEUE_ERROR:
> +#
> +# Emitted when a device virtqueue fails in runtime.
> +#
> +# @device: the device's ID if it has one
> +#
> +# @path: the device's QOM path
> +#
> +# @virtqueue: virtqueue index
> +#
> +# @error: error identifier
> +#
> +# @description: human readable description
> +#
> +# Since: 10.1
> +##
> +{ 'event': 'VIRTQUEUE_ERROR',
> + 'data': { '*device': 'str', 'path': 'str', 'virtqueue': 'int',
> + 'error': 'VirtqueueError', 'description': 'str'} }
--
Best regards,
Vladimir
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH v5] [for-10.1] virtio: add VIRTQUEUE_ERROR QAPI event
2025-04-02 12:32 ` Vladimir Sementsov-Ogievskiy
@ 2025-04-03 7:37 ` Vladimir Sementsov-Ogievskiy
0 siblings, 0 replies; 6+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2025-04-03 7:37 UTC (permalink / raw)
To: mst
Cc: sgarzare, armbru, dave, eblake, pbonzini, berrange, eduardo,
qemu-devel, Denis Plotnikov
On 02.04.25 15:32, Vladimir Sementsov-Ogievskiy wrote:
> On 01.04.25 20:07, Vladimir Sementsov-Ogievskiy wrote:
>> For now we only log the vhost device error, when virtqueue is actually
>> stopped. Let's add a QAPI event, which makes possible:
>>
>> - collect statistics of such errors
>> - make immediate actions: take core dumps or do some other debugging
>> - inform the user through a management API or UI, so that (s)he can
>> react somehow, e.g. reset the device driver in the guest or even
>> build up some automation to do so
>>
>> Note that basically every inconsistency discovered during virtqueue
>> processing results in a silent virtqueue stop. The guest then just
>> sees the requests getting stuck somewhere in the device for no visible
>> reason. This event provides a means to inform the management layer of
>> this situation in a timely fashion.
>>
>> The event could be reused for some other virtqueue problems (not only
>> for vhost devices) in future. For this it gets a generic name and
>> structure.
>>
>> We keep original VHOST_OPS_DEBUG(), to keep original debug output as is
>> here, it's not the only call to VHOST_OPS_DEBUG in the file.
>>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
>> Reviewed-by: Denis Plotnikov <den-plotnikov@yandex-team.ru>
>> ---
>>
>> v5: resend, update version in QAPI to 10.1
>> drop a-b by Markus (too much time passed, the context could
>> changed. Markus, is the patch still OK?)
>>
>> hw/virtio/vhost.c | 12 +++++++++---
>> monitor/monitor.c | 10 ++++++++++
>> qapi/qdev.json | 32 ++++++++++++++++++++++++++++++++
>> 3 files changed, 51 insertions(+), 3 deletions(-)
>>
>> diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
>> index 6aa72fd434..0b205cef73 100644
>> --- a/hw/virtio/vhost.c
>> +++ b/hw/virtio/vhost.c
>> @@ -15,6 +15,7 @@
>> #include "qemu/osdep.h"
>> #include "qapi/error.h"
>> +#include "qapi/qapi-events-qdev.h"
>> #include "hw/virtio/vhost.h"
>> #include "qemu/atomic.h"
>> #include "qemu/range.h"
>> @@ -1442,11 +1443,16 @@ static void vhost_virtqueue_error_notifier(EventNotifier *n)
>> struct vhost_virtqueue *vq = container_of(n, struct vhost_virtqueue,
>> error_notifier);
>> struct vhost_dev *dev = vq->dev;
>> - int index = vq - dev->vqs;
>> if (event_notifier_test_and_clear(n) && dev->vdev) {
>> - VHOST_OPS_DEBUG(-EINVAL, "vhost vring error in virtqueue %d",
>> - dev->vq_index + index);
>> + int ind = vq - dev->vqs + dev->vq_index;
>> + DeviceState *ds = &dev->vdev->parent_obj;
>> +
>> + VHOST_OPS_DEBUG(-EINVAL, "vhost vring error in virtqueue %d", ind);
>> + qapi_event_send_virtqueue_error(ds->id, ds->canonical_path, ind,
>> + VIRTQUEUE_ERROR_VHOST_VRING_ERROR,
>> + "vhost reported failure through vring "
>> + "error fd");
>> }
>> }
>> diff --git a/monitor/monitor.c b/monitor/monitor.c
>> index c5a5d30877..1296a9207e 100644
>> --- a/monitor/monitor.c
>> +++ b/monitor/monitor.c
>> @@ -313,6 +313,7 @@ static MonitorQAPIEventConf monitor_qapi_event_conf[QAPI_EVENT__MAX] = {
>> [QAPI_EVENT_BALLOON_CHANGE] = { 1000 * SCALE_MS },
>> [QAPI_EVENT_QUORUM_REPORT_BAD] = { 1000 * SCALE_MS },
>> [QAPI_EVENT_QUORUM_FAILURE] = { 1000 * SCALE_MS },
>> + [QAPI_EVENT_VIRTQUEUE_ERROR] = { 1000 * SCALE_MS },
>> [QAPI_EVENT_VSERPORT_CHANGE] = { 1000 * SCALE_MS },
>> [QAPI_EVENT_MEMORY_DEVICE_SIZE_CHANGE] = { 1000 * SCALE_MS },
>> [QAPI_EVENT_HV_BALLOON_STATUS_REPORT] = { 1000 * SCALE_MS },
>> @@ -499,6 +500,10 @@ static unsigned int qapi_event_throttle_hash(const void *key)
>> hash += g_str_hash(qdict_get_str(evstate->data, "qom-path"));
>> }
>> + if (evstate->event == QAPI_EVENT_VIRTQUEUE_ERROR) {
>> + hash += g_str_hash(qdict_get_str(evstate->data, "device"));
>
> Hmm we should use "qom-path" here, right?
I.e. "path", as it's called "path" below. Still, other events already have "qom-path", so I should rename in this as well. Will resend a v6.
>
>> + }
>> +
>> return hash;
>> }
>> @@ -527,6 +532,11 @@ static gboolean qapi_event_throttle_equal(const void *a, const void *b)
>> qdict_get_str(evb->data, "qom-path"));
>> }
>> + if (eva->event == QAPI_EVENT_VIRTQUEUE_ERROR) {
>> + return !strcmp(qdict_get_str(eva->data, "device"),
>> + qdict_get_str(evb->data, "device"));
>
> and here
>
>> + }
>> +
>> return TRUE;
>> }
>> diff --git a/qapi/qdev.json b/qapi/qdev.json
>> index 25cbcf977b..2d20f4777e 100644
>> --- a/qapi/qdev.json
>> +++ b/qapi/qdev.json
>> @@ -187,3 +187,35 @@
>> { 'command': 'device-sync-config',
>> 'features': [ 'unstable' ],
>> 'data': {'id': 'str'} }
>> +
>> +##
>> +# @VirtqueueError:
>> +#
>> +# @vhost-vring-error: Vhost device reported failure through
>> +# through vring error fd.
>> +#
>> +# Since: 10.1
>> +##
>> +{ 'enum': 'VirtqueueError',
>> + 'data': [ 'vhost-vring-error' ] }
>> +
>> +##
>> +# @VIRTQUEUE_ERROR:
>> +#
>> +# Emitted when a device virtqueue fails in runtime.
>> +#
>> +# @device: the device's ID if it has one
>> +#
>> +# @path: the device's QOM path
>> +#
>> +# @virtqueue: virtqueue index
>> +#
>> +# @error: error identifier
>> +#
>> +# @description: human readable description
>> +#
>> +# Since: 10.1
>> +##
>> +{ 'event': 'VIRTQUEUE_ERROR',
>> + 'data': { '*device': 'str', 'path': 'str', 'virtqueue': 'int',
>> + 'error': 'VirtqueueError', 'description': 'str'} }
>
--
Best regards,
Vladimir
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH v5] [for-10.1] virtio: add VIRTQUEUE_ERROR QAPI event
2025-04-01 17:07 [PATCH v5] [for-10.1] virtio: add VIRTQUEUE_ERROR QAPI event Vladimir Sementsov-Ogievskiy
2025-04-02 12:32 ` Vladimir Sementsov-Ogievskiy
@ 2025-04-04 6:46 ` Markus Armbruster
2025-04-04 10:26 ` Vladimir Sementsov-Ogievskiy
1 sibling, 1 reply; 6+ messages in thread
From: Markus Armbruster @ 2025-04-04 6:46 UTC (permalink / raw)
To: Vladimir Sementsov-Ogievskiy
Cc: mst, sgarzare, dave, eblake, pbonzini, berrange, eduardo,
qemu-devel, Denis Plotnikov
Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> writes:
> For now we only log the vhost device error, when virtqueue is actually
> stopped. Let's add a QAPI event, which makes possible:
>
> - collect statistics of such errors
> - make immediate actions: take core dumps or do some other debugging
> - inform the user through a management API or UI, so that (s)he can
> react somehow, e.g. reset the device driver in the guest or even
> build up some automation to do so
>
> Note that basically every inconsistency discovered during virtqueue
> processing results in a silent virtqueue stop. The guest then just
> sees the requests getting stuck somewhere in the device for no visible
> reason. This event provides a means to inform the management layer of
> this situation in a timely fashion.
>
> The event could be reused for some other virtqueue problems (not only
> for vhost devices) in future. For this it gets a generic name and
> structure.
>
> We keep original VHOST_OPS_DEBUG(), to keep original debug output as is
> here, it's not the only call to VHOST_OPS_DEBUG in the file.
>
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
> Reviewed-by: Denis Plotnikov <den-plotnikov@yandex-team.ru>
> ---
>
> v5: resend, update version in QAPI to 10.1
> drop a-b by Markus (too much time passed, the context could
> changed. Markus, is the patch still OK?)
Happy to take another look :)
> hw/virtio/vhost.c | 12 +++++++++---
> monitor/monitor.c | 10 ++++++++++
> qapi/qdev.json | 32 ++++++++++++++++++++++++++++++++
> 3 files changed, 51 insertions(+), 3 deletions(-)
>
> diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
> index 6aa72fd434..0b205cef73 100644
> --- a/hw/virtio/vhost.c
> +++ b/hw/virtio/vhost.c
> @@ -15,6 +15,7 @@
>
> #include "qemu/osdep.h"
> #include "qapi/error.h"
> +#include "qapi/qapi-events-qdev.h"
> #include "hw/virtio/vhost.h"
> #include "qemu/atomic.h"
> #include "qemu/range.h"
> @@ -1442,11 +1443,16 @@ static void vhost_virtqueue_error_notifier(EventNotifier *n)
> struct vhost_virtqueue *vq = container_of(n, struct vhost_virtqueue,
> error_notifier);
> struct vhost_dev *dev = vq->dev;
> - int index = vq - dev->vqs;
>
> if (event_notifier_test_and_clear(n) && dev->vdev) {
> - VHOST_OPS_DEBUG(-EINVAL, "vhost vring error in virtqueue %d",
> - dev->vq_index + index);
> + int ind = vq - dev->vqs + dev->vq_index;
> + DeviceState *ds = &dev->vdev->parent_obj;
> +
> + VHOST_OPS_DEBUG(-EINVAL, "vhost vring error in virtqueue %d", ind);
> + qapi_event_send_virtqueue_error(ds->id, ds->canonical_path, ind,
> + VIRTQUEUE_ERROR_VHOST_VRING_ERROR,
> + "vhost reported failure through vring "
> + "error fd");
> }
> }
>
> diff --git a/monitor/monitor.c b/monitor/monitor.c
> index c5a5d30877..1296a9207e 100644
> --- a/monitor/monitor.c
> +++ b/monitor/monitor.c
> @@ -313,6 +313,7 @@ static MonitorQAPIEventConf monitor_qapi_event_conf[QAPI_EVENT__MAX] = {
> [QAPI_EVENT_BALLOON_CHANGE] = { 1000 * SCALE_MS },
> [QAPI_EVENT_QUORUM_REPORT_BAD] = { 1000 * SCALE_MS },
> [QAPI_EVENT_QUORUM_FAILURE] = { 1000 * SCALE_MS },
> + [QAPI_EVENT_VIRTQUEUE_ERROR] = { 1000 * SCALE_MS },
> [QAPI_EVENT_VSERPORT_CHANGE] = { 1000 * SCALE_MS },
> [QAPI_EVENT_MEMORY_DEVICE_SIZE_CHANGE] = { 1000 * SCALE_MS },
> [QAPI_EVENT_HV_BALLOON_STATUS_REPORT] = { 1000 * SCALE_MS },
> @@ -499,6 +500,10 @@ static unsigned int qapi_event_throttle_hash(const void *key)
> hash += g_str_hash(qdict_get_str(evstate->data, "qom-path"));
> }
>
> + if (evstate->event == QAPI_EVENT_VIRTQUEUE_ERROR) {
> + hash += g_str_hash(qdict_get_str(evstate->data, "device"));
Isn't @device optional? Should you use @qom-path instead?
> + }
> +
> return hash;
> }
>
> @@ -527,6 +532,11 @@ static gboolean qapi_event_throttle_equal(const void *a, const void *b)
> qdict_get_str(evb->data, "qom-path"));
> }
>
> + if (eva->event == QAPI_EVENT_VIRTQUEUE_ERROR) {
> + return !strcmp(qdict_get_str(eva->data, "device"),
> + qdict_get_str(evb->data, "device"));
Likewise.
> + }
> +
> return TRUE;
> }
>
> diff --git a/qapi/qdev.json b/qapi/qdev.json
> index 25cbcf977b..2d20f4777e 100644
> --- a/qapi/qdev.json
> +++ b/qapi/qdev.json
> @@ -187,3 +187,35 @@
> { 'command': 'device-sync-config',
> 'features': [ 'unstable' ],
> 'data': {'id': 'str'} }
> +
> +##
> +# @VirtqueueError:
> +#
> +# @vhost-vring-error: Vhost device reported failure through
> +# through vring error fd.
One "through" too many.
I'm not quite sure I understand what you're trying to express. Is it
"the vhost device has communicated failure via the vring error file
descriptor"?
I know next to nothing about vhost devices...
> +#
> +# Since: 10.1
> +##
> +{ 'enum': 'VirtqueueError',
> + 'data': [ 'vhost-vring-error' ] }
> +
> +##
> +# @VIRTQUEUE_ERROR:
> +#
> +# Emitted when a device virtqueue fails in runtime.
I think it's "at runtime".
Can a device have more than one virtqueue?
> +#
> +# @device: the device's ID if it has one
> +#
> +# @path: the device's QOM path
I agree with your follow-up: name it @qom-path.
> +#
> +# @virtqueue: virtqueue index
Bear with me... What's a virtqueue index?
> +#
> +# @error: error identifier
> +#
> +# @description: human readable description
> +#
> +# Since: 10.1
> +##
> +{ 'event': 'VIRTQUEUE_ERROR',
> + 'data': { '*device': 'str', 'path': 'str', 'virtqueue': 'int',
> + 'error': 'VirtqueueError', 'description': 'str'} }
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH v5] [for-10.1] virtio: add VIRTQUEUE_ERROR QAPI event
2025-04-04 6:46 ` Markus Armbruster
@ 2025-04-04 10:26 ` Vladimir Sementsov-Ogievskiy
2025-04-04 14:28 ` Markus Armbruster
0 siblings, 1 reply; 6+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2025-04-04 10:26 UTC (permalink / raw)
To: Markus Armbruster
Cc: mst, sgarzare, dave, eblake, pbonzini, berrange, eduardo,
qemu-devel, Denis Plotnikov
On 04.04.25 09:46, Markus Armbruster wrote:
> Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> writes:
>
>> For now we only log the vhost device error, when virtqueue is actually
>> stopped. Let's add a QAPI event, which makes possible:
>>
>> - collect statistics of such errors
>> - make immediate actions: take core dumps or do some other debugging
>> - inform the user through a management API or UI, so that (s)he can
>> react somehow, e.g. reset the device driver in the guest or even
>> build up some automation to do so
>>
>> Note that basically every inconsistency discovered during virtqueue
>> processing results in a silent virtqueue stop. The guest then just
>> sees the requests getting stuck somewhere in the device for no visible
>> reason. This event provides a means to inform the management layer of
>> this situation in a timely fashion.
>>
>> The event could be reused for some other virtqueue problems (not only
>> for vhost devices) in future. For this it gets a generic name and
>> structure.
>>
>> We keep original VHOST_OPS_DEBUG(), to keep original debug output as is
>> here, it's not the only call to VHOST_OPS_DEBUG in the file.
>>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
>> Reviewed-by: Denis Plotnikov <den-plotnikov@yandex-team.ru>
>> ---
>>
>> v5: resend, update version in QAPI to 10.1
>> drop a-b by Markus (too much time passed, the context could
>> changed. Markus, is the patch still OK?)
>
> Happy to take another look :)
>
>> hw/virtio/vhost.c | 12 +++++++++---
>> monitor/monitor.c | 10 ++++++++++
>> qapi/qdev.json | 32 ++++++++++++++++++++++++++++++++
>> 3 files changed, 51 insertions(+), 3 deletions(-)
>>
>> diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
>> index 6aa72fd434..0b205cef73 100644
>> --- a/hw/virtio/vhost.c
>> +++ b/hw/virtio/vhost.c
>> @@ -15,6 +15,7 @@
>>
>> #include "qemu/osdep.h"
>> #include "qapi/error.h"
>> +#include "qapi/qapi-events-qdev.h"
>> #include "hw/virtio/vhost.h"
>> #include "qemu/atomic.h"
>> #include "qemu/range.h"
>> @@ -1442,11 +1443,16 @@ static void vhost_virtqueue_error_notifier(EventNotifier *n)
>> struct vhost_virtqueue *vq = container_of(n, struct vhost_virtqueue,
>> error_notifier);
>> struct vhost_dev *dev = vq->dev;
>> - int index = vq - dev->vqs;
>>
>> if (event_notifier_test_and_clear(n) && dev->vdev) {
>> - VHOST_OPS_DEBUG(-EINVAL, "vhost vring error in virtqueue %d",
>> - dev->vq_index + index);
>> + int ind = vq - dev->vqs + dev->vq_index;
>> + DeviceState *ds = &dev->vdev->parent_obj;
>> +
>> + VHOST_OPS_DEBUG(-EINVAL, "vhost vring error in virtqueue %d", ind);
>> + qapi_event_send_virtqueue_error(ds->id, ds->canonical_path, ind,
>> + VIRTQUEUE_ERROR_VHOST_VRING_ERROR,
>> + "vhost reported failure through vring "
>> + "error fd");
>> }
>> }
>>
>> diff --git a/monitor/monitor.c b/monitor/monitor.c
>> index c5a5d30877..1296a9207e 100644
>> --- a/monitor/monitor.c
>> +++ b/monitor/monitor.c
>> @@ -313,6 +313,7 @@ static MonitorQAPIEventConf monitor_qapi_event_conf[QAPI_EVENT__MAX] = {
>> [QAPI_EVENT_BALLOON_CHANGE] = { 1000 * SCALE_MS },
>> [QAPI_EVENT_QUORUM_REPORT_BAD] = { 1000 * SCALE_MS },
>> [QAPI_EVENT_QUORUM_FAILURE] = { 1000 * SCALE_MS },
>> + [QAPI_EVENT_VIRTQUEUE_ERROR] = { 1000 * SCALE_MS },
>> [QAPI_EVENT_VSERPORT_CHANGE] = { 1000 * SCALE_MS },
>> [QAPI_EVENT_MEMORY_DEVICE_SIZE_CHANGE] = { 1000 * SCALE_MS },
>> [QAPI_EVENT_HV_BALLOON_STATUS_REPORT] = { 1000 * SCALE_MS },
>> @@ -499,6 +500,10 @@ static unsigned int qapi_event_throttle_hash(const void *key)
>> hash += g_str_hash(qdict_get_str(evstate->data, "qom-path"));
>> }
>>
>> + if (evstate->event == QAPI_EVENT_VIRTQUEUE_ERROR) {
>> + hash += g_str_hash(qdict_get_str(evstate->data, "device"));
>
> Isn't @device optional? Should you use @qom-path instead?
Yes, I'm thinking about it too. Maybe, drop the "device" field from the event at all, and keep only qom-path? Years passed since movement to use qom paths..
>
>> + }
>> +
>> return hash;
>> }
>>
>> @@ -527,6 +532,11 @@ static gboolean qapi_event_throttle_equal(const void *a, const void *b)
>> qdict_get_str(evb->data, "qom-path"));
>> }
>>
>> + if (eva->event == QAPI_EVENT_VIRTQUEUE_ERROR) {
>> + return !strcmp(qdict_get_str(eva->data, "device"),
>> + qdict_get_str(evb->data, "device"));
>
> Likewise.
>
>> + }
>> +
>> return TRUE;
>> }
>>
>> diff --git a/qapi/qdev.json b/qapi/qdev.json
>> index 25cbcf977b..2d20f4777e 100644
>> --- a/qapi/qdev.json
>> +++ b/qapi/qdev.json
>> @@ -187,3 +187,35 @@
>> { 'command': 'device-sync-config',
>> 'features': [ 'unstable' ],
>> 'data': {'id': 'str'} }
>> +
>> +##
>> +# @VirtqueueError:
>> +#
>> +# @vhost-vring-error: Vhost device reported failure through
>> +# through vring error fd.
>
> One "through" too many.
>
> I'm not quite sure I understand what you're trying to express. Is it
> "the vhost device has communicated failure via the vring error file
> descriptor"?
Yes. Will use your wording.
>
> I know next to nothing about vhost devices...
>
>> +#
>> +# Since: 10.1
>> +##
>> +{ 'enum': 'VirtqueueError',
>> + 'data': [ 'vhost-vring-error' ] }
>> +
>> +##
>> +# @VIRTQUEUE_ERROR:
>> +#
>> +# Emitted when a device virtqueue fails in runtime.
>
> I think it's "at runtime".
>
> Can a device have more than one virtqueue?
Yes. And event has "virtqueue": "int" for it.
>
>> +#
>> +# @device: the device's ID if it has one
>> +#
>> +# @path: the device's QOM path
>
> I agree with your follow-up: name it @qom-path.
>
>> +#
>> +# @virtqueue: virtqueue index
>
> Bear with me... What's a virtqueue index?
sequence number of virtqueue of the vhost device
>
>> +#
>> +# @error: error identifier
>> +#
>> +# @description: human readable description
>> +#
>> +# Since: 10.1
>> +##
>> +{ 'event': 'VIRTQUEUE_ERROR',
>> + 'data': { '*device': 'str', 'path': 'str', 'virtqueue': 'int',
>> + 'error': 'VirtqueueError', 'description': 'str'} }
>
--
Best regards,
Vladimir
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH v5] [for-10.1] virtio: add VIRTQUEUE_ERROR QAPI event
2025-04-04 10:26 ` Vladimir Sementsov-Ogievskiy
@ 2025-04-04 14:28 ` Markus Armbruster
0 siblings, 0 replies; 6+ messages in thread
From: Markus Armbruster @ 2025-04-04 14:28 UTC (permalink / raw)
To: Vladimir Sementsov-Ogievskiy
Cc: mst, sgarzare, dave, eblake, pbonzini, berrange, eduardo,
qemu-devel, Denis Plotnikov
Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> writes:
> On 04.04.25 09:46, Markus Armbruster wrote:
>> Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> writes:
>>
>>> For now we only log the vhost device error, when virtqueue is actually
>>> stopped. Let's add a QAPI event, which makes possible:
>>>
>>> - collect statistics of such errors
>>> - make immediate actions: take core dumps or do some other debugging
>>> - inform the user through a management API or UI, so that (s)he can
>>> react somehow, e.g. reset the device driver in the guest or even
>>> build up some automation to do so
>>>
>>> Note that basically every inconsistency discovered during virtqueue
>>> processing results in a silent virtqueue stop. The guest then just
>>> sees the requests getting stuck somewhere in the device for no visible
>>> reason. This event provides a means to inform the management layer of
>>> this situation in a timely fashion.
>>>
>>> The event could be reused for some other virtqueue problems (not only
>>> for vhost devices) in future. For this it gets a generic name and
>>> structure.
>>>
>>> We keep original VHOST_OPS_DEBUG(), to keep original debug output as is
>>> here, it's not the only call to VHOST_OPS_DEBUG in the file.
>>>
>>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
>>> Reviewed-by: Denis Plotnikov <den-plotnikov@yandex-team.ru>
>>> ---
>>>
>>> v5: resend, update version in QAPI to 10.1
>>> drop a-b by Markus (too much time passed, the context could
>>> changed. Markus, is the patch still OK?)
>>
>> Happy to take another look :)
>>
>>> hw/virtio/vhost.c | 12 +++++++++---
>>> monitor/monitor.c | 10 ++++++++++
>>> qapi/qdev.json | 32 ++++++++++++++++++++++++++++++++
>>> 3 files changed, 51 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
>>> index 6aa72fd434..0b205cef73 100644
>>> --- a/hw/virtio/vhost.c
>>> +++ b/hw/virtio/vhost.c
>>> @@ -15,6 +15,7 @@
>>>
>>> #include "qemu/osdep.h"
>>> #include "qapi/error.h"
>>> +#include "qapi/qapi-events-qdev.h"
>>> #include "hw/virtio/vhost.h"
>>> #include "qemu/atomic.h"
>>> #include "qemu/range.h"
>>> @@ -1442,11 +1443,16 @@ static void vhost_virtqueue_error_notifier(EventNotifier *n)
>>> struct vhost_virtqueue *vq = container_of(n, struct vhost_virtqueue,
>>> error_notifier);
>>> struct vhost_dev *dev = vq->dev;
>>> - int index = vq - dev->vqs;
>>>
>>> if (event_notifier_test_and_clear(n) && dev->vdev) {
>>> - VHOST_OPS_DEBUG(-EINVAL, "vhost vring error in virtqueue %d",
>>> - dev->vq_index + index);
>>> + int ind = vq - dev->vqs + dev->vq_index;
>>> + DeviceState *ds = &dev->vdev->parent_obj;
>>> +
>>> + VHOST_OPS_DEBUG(-EINVAL, "vhost vring error in virtqueue %d", ind);
>>> + qapi_event_send_virtqueue_error(ds->id, ds->canonical_path, ind,
>>> + VIRTQUEUE_ERROR_VHOST_VRING_ERROR,
>>> + "vhost reported failure through vring "
>>> + "error fd");
>>> }
>>> }
>>>
>>> diff --git a/monitor/monitor.c b/monitor/monitor.c
>>> index c5a5d30877..1296a9207e 100644
>>> --- a/monitor/monitor.c
>>> +++ b/monitor/monitor.c
>>> @@ -313,6 +313,7 @@ static MonitorQAPIEventConf monitor_qapi_event_conf[QAPI_EVENT__MAX] = {
>>> [QAPI_EVENT_BALLOON_CHANGE] = { 1000 * SCALE_MS },
>>> [QAPI_EVENT_QUORUM_REPORT_BAD] = { 1000 * SCALE_MS },
>>> [QAPI_EVENT_QUORUM_FAILURE] = { 1000 * SCALE_MS },
>>> + [QAPI_EVENT_VIRTQUEUE_ERROR] = { 1000 * SCALE_MS },
>>> [QAPI_EVENT_VSERPORT_CHANGE] = { 1000 * SCALE_MS },
>>> [QAPI_EVENT_MEMORY_DEVICE_SIZE_CHANGE] = { 1000 * SCALE_MS },
>>> [QAPI_EVENT_HV_BALLOON_STATUS_REPORT] = { 1000 * SCALE_MS },
>>> @@ -499,6 +500,10 @@ static unsigned int qapi_event_throttle_hash(const void *key)
>>> hash += g_str_hash(qdict_get_str(evstate->data, "qom-path"));
>>> }
>>>
>>> + if (evstate->event == QAPI_EVENT_VIRTQUEUE_ERROR) {
>>> + hash += g_str_hash(qdict_get_str(evstate->data, "device"));
>>
>> Isn't @device optional? Should you use @qom-path instead?
>
> Yes, I'm thinking about it too.
This function and the next one together enable fine-grained event
throttling. I figure the intent of your patch is to rate-limit this
event per device. For that, you must use qom-path. If you wanted to
rate-limit per virtqueue, you'd have to throw in @virtqueue.
As is, you have a crash bugL g_str_hash() expects a non-null argument.
> Maybe, drop the "device" field from the event at all, and keep only qom-path? Years passed since movement to use qom paths..
Possible justifications for having @device: consistency with other
events, convenience for users. I can't judge the latter.
>
>>
>>> + }
>>> +
>>> return hash;
>>> }
>>>
>>> @@ -527,6 +532,11 @@ static gboolean qapi_event_throttle_equal(const void *a, const void *b)
>>> qdict_get_str(evb->data, "qom-path"));
>>> }
>>>
>>> + if (eva->event == QAPI_EVENT_VIRTQUEUE_ERROR) {
>>> + return !strcmp(qdict_get_str(eva->data, "device"),
>>> + qdict_get_str(evb->data, "device"));
>>
>> Likewise.
>>
>>> + }
>>> +
>>> return TRUE;
>>> }
>>>
>>> diff --git a/qapi/qdev.json b/qapi/qdev.json
>>> index 25cbcf977b..2d20f4777e 100644
>>> --- a/qapi/qdev.json
>>> +++ b/qapi/qdev.json
>>> @@ -187,3 +187,35 @@
>>> { 'command': 'device-sync-config',
>>> 'features': [ 'unstable' ],
>>> 'data': {'id': 'str'} }
>>> +
>>> +##
>>> +# @VirtqueueError:
>>> +#
>>> +# @vhost-vring-error: Vhost device reported failure through
>>> +# through vring error fd.
>>
>> One "through" too many.
>>
>> I'm not quite sure I understand what you're trying to express. Is it
>> "the vhost device has communicated failure via the vring error file
>> descriptor"?
>
> Yes. Will use your wording.
>
>>
>> I know next to nothing about vhost devices...
>>
>>> +#
>>> +# Since: 10.1
>>> +##
>>> +{ 'enum': 'VirtqueueError',
>>> + 'data': [ 'vhost-vring-error' ] }
>>> +
>>> +##
>>> +# @VIRTQUEUE_ERROR:
>>> +#
>>> +# Emitted when a device virtqueue fails in runtime.
>>
>> I think it's "at runtime".
>>
>> Can a device have more than one virtqueue?
>
> Yes. And event has "virtqueue": "int" for it.
Got it, thanks!
>>> +#
>>> +# @device: the device's ID if it has one
>>> +#
>>> +# @path: the device's QOM path
>>
>> I agree with your follow-up: name it @qom-path.
>>
>>> +#
>>> +# @virtqueue: virtqueue index
>>
>> Bear with me... What's a virtqueue index?
>
> sequence number of virtqueue of the vhost device
Maybe something like "the index of the virtqueue that failed".
>>> +#
>>> +# @error: error identifier
>>> +#
>>> +# @description: human readable description
>>> +#
>>> +# Since: 10.1
>>> +##
>>> +{ 'event': 'VIRTQUEUE_ERROR',
>>> + 'data': { '*device': 'str', 'path': 'str', 'virtqueue': 'int',
>>> + 'error': 'VirtqueueError', 'description': 'str'} }
>>
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2025-04-04 14:29 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-04-01 17:07 [PATCH v5] [for-10.1] virtio: add VIRTQUEUE_ERROR QAPI event Vladimir Sementsov-Ogievskiy
2025-04-02 12:32 ` Vladimir Sementsov-Ogievskiy
2025-04-03 7:37 ` Vladimir Sementsov-Ogievskiy
2025-04-04 6:46 ` Markus Armbruster
2025-04-04 10:26 ` Vladimir Sementsov-Ogievskiy
2025-04-04 14:28 ` Markus Armbruster
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).