From: Markus Armbruster <armbru@redhat.com>
To: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Cc: qemu-devel@nongnu.org, mst@redhat.com, eblake@redhat.com,
eduardo@habkost.net, berrange@redhat.com, pbonzini@redhat.com,
dave@treblig.org, armbru@redhat.com, sgarzare@redhat.com,
den-plotnikov@yandex-team.ru
Subject: Re: [PATCH v6] [for-10.1] virtio: add VIRTQUEUE_ERROR QAPI event
Date: Wed, 09 Apr 2025 12:48:46 +0200 [thread overview]
Message-ID: <87plhlbofl.fsf@pond.sub.org> (raw)
In-Reply-To: <20250409094758.58232-1-vsementsov@yandex-team.ru> (Vladimir Sementsov-Ogievskiy's message of "Wed, 9 Apr 2025 12:47:58 +0300")
Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> writes:
> For now we only log the vhost device error, when virtqueue is actually
> stopped. Let's add a QAPI event, which makes possible:
>
> - collect statistics of such errors
> - make immediate actions: take core dumps or do some other debugging
> - inform the user through a management API or UI, so that (s)he can
> react somehow, e.g. reset the device driver in the guest or even
> build up some automation to do so
>
> Note that basically every inconsistency discovered during virtqueue
> processing results in a silent virtqueue stop. The guest then just
> sees the requests getting stuck somewhere in the device for no visible
> reason. This event provides a means to inform the management layer of
> this situation in a timely fashion.
>
> The event could be reused for some other virtqueue problems (not only
> for vhost devices) in future. For this it gets a generic name and
> structure.
>
> We keep original VHOST_OPS_DEBUG(), to keep original debug output as is
> here, it's not the only call to VHOST_OPS_DEBUG in the file.
Likely should be tracepoints. Not this patch's problem, though.
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
> ---
>
> v6: rename path to qom-path, and improve throttling of the event
> improve wording
>
> hw/virtio/vhost.c | 12 +++++++++---
> monitor/monitor.c | 14 ++++++++++++++
> qapi/qdev.json | 32 ++++++++++++++++++++++++++++++++
> 3 files changed, 55 insertions(+), 3 deletions(-)
>
> diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
> index 6aa72fd434..0b205cef73 100644
> --- a/hw/virtio/vhost.c
> +++ b/hw/virtio/vhost.c
> @@ -15,6 +15,7 @@
>
> #include "qemu/osdep.h"
> #include "qapi/error.h"
> +#include "qapi/qapi-events-qdev.h"
> #include "hw/virtio/vhost.h"
> #include "qemu/atomic.h"
> #include "qemu/range.h"
> @@ -1442,11 +1443,16 @@ static void vhost_virtqueue_error_notifier(EventNotifier *n)
> struct vhost_virtqueue *vq = container_of(n, struct vhost_virtqueue,
> error_notifier);
> struct vhost_dev *dev = vq->dev;
> - int index = vq - dev->vqs;
>
> if (event_notifier_test_and_clear(n) && dev->vdev) {
> - VHOST_OPS_DEBUG(-EINVAL, "vhost vring error in virtqueue %d",
> - dev->vq_index + index);
> + int ind = vq - dev->vqs + dev->vq_index;
> + DeviceState *ds = &dev->vdev->parent_obj;
> +
> + VHOST_OPS_DEBUG(-EINVAL, "vhost vring error in virtqueue %d", ind);
> + qapi_event_send_virtqueue_error(ds->id, ds->canonical_path, ind,
> + VIRTQUEUE_ERROR_VHOST_VRING_ERROR,
> + "vhost reported failure through vring "
> + "error fd");
> }
> }
>
> diff --git a/monitor/monitor.c b/monitor/monitor.c
> index c5a5d30877..11c8859703 100644
> --- a/monitor/monitor.c
> +++ b/monitor/monitor.c
> @@ -313,6 +313,7 @@ static MonitorQAPIEventConf monitor_qapi_event_conf[QAPI_EVENT__MAX] = {
> [QAPI_EVENT_BALLOON_CHANGE] = { 1000 * SCALE_MS },
> [QAPI_EVENT_QUORUM_REPORT_BAD] = { 1000 * SCALE_MS },
> [QAPI_EVENT_QUORUM_FAILURE] = { 1000 * SCALE_MS },
> + [QAPI_EVENT_VIRTQUEUE_ERROR] = { 1000 * SCALE_MS },
> [QAPI_EVENT_VSERPORT_CHANGE] = { 1000 * SCALE_MS },
> [QAPI_EVENT_MEMORY_DEVICE_SIZE_CHANGE] = { 1000 * SCALE_MS },
> [QAPI_EVENT_HV_BALLOON_STATUS_REPORT] = { 1000 * SCALE_MS },
> @@ -499,6 +500,12 @@ static unsigned int qapi_event_throttle_hash(const void *key)
> hash += g_str_hash(qdict_get_str(evstate->data, "qom-path"));
> }
>
> + if (evstate->event == QAPI_EVENT_VIRTQUEUE_ERROR) {
> + uint64_t virtqueue = qdict_get_int(evstate->data, "virtqueue");
> + hash += g_str_hash(qdict_get_str(evstate->data, "qom-path")) ^
> + g_int64_hash(&virtqueue);
> + }
> +
> return hash;
> }
>
> @@ -527,6 +534,13 @@ static gboolean qapi_event_throttle_equal(const void *a, const void *b)
> qdict_get_str(evb->data, "qom-path"));
> }
>
> + if (eva->event == QAPI_EVENT_VIRTQUEUE_ERROR) {
> + return !strcmp(qdict_get_str(eva->data, "qom-path"),
> + qdict_get_str(evb->data, "qom-path")) &&
> + (qdict_get_int(eva->data, "virtqueue") ==
> + qdict_get_int(evb->data, "virtqueue"));
> + }
> +
> return TRUE;
> }
>
Rate-limiting is now per virt queue. It was per device in previous
revisions. Worth it?
> diff --git a/qapi/qdev.json b/qapi/qdev.json
> index 25cbcf977b..ddfae18761 100644
> --- a/qapi/qdev.json
> +++ b/qapi/qdev.json
> @@ -187,3 +187,35 @@
> { 'command': 'device-sync-config',
> 'features': [ 'unstable' ],
> 'data': {'id': 'str'} }
> +
> +##
> +# @VirtqueueError:
> +#
> +# @vhost-vring-error: the vhost device has communicated failure via
> +# the vring error file descriptor
> +#
> +# Since: 10.1
> +##
> +{ 'enum': 'VirtqueueError',
> + 'data': [ 'vhost-vring-error' ] }
> +
> +##
> +# @VIRTQUEUE_ERROR:
> +#
> +# Emitted when a device virtqueue fails at runtime.
> +#
> +# @device: the device's ID if it has one
> +#
> +# @qom-path: the device's QOM path
> +#
> +# @virtqueue: the index of the virtqueue that failed
> +#
> +# @error: error identifier
> +#
> +# @description: human readable description
> +#
> +# Since: 10.1
> +##
> +{ 'event': 'VIRTQUEUE_ERROR',
> + 'data': { '*device': 'str', 'qom-path': 'str', 'virtqueue': 'int',
> + 'error': 'VirtqueueError', 'description': 'str'} }
Standard question for events: can a management application poll for the
information as well?
I might have asked this before, I don't remember. If you already
answered it, feel free to point me to your answer.
Why is this a standard question for events? Say, a management
application wants to track the state of X. Two ways: poll the state
with a query command that returns it, listen for events that report a
change of X.
Listening for an event is more efficient.
However, if the management application connects to a QEMU instance, X
could be anything, so it needs to poll once.
Special case: the management application restarts for some reason.
next prev parent reply other threads:[~2025-04-09 10:50 UTC|newest]
Thread overview: 4+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-04-09 9:47 [PATCH v6] [for-10.1] virtio: add VIRTQUEUE_ERROR QAPI event Vladimir Sementsov-Ogievskiy
2025-04-09 10:48 ` Markus Armbruster [this message]
2025-04-09 14:04 ` Vladimir Sementsov-Ogievskiy
2025-04-09 14:48 ` Markus Armbruster
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87plhlbofl.fsf@pond.sub.org \
--to=armbru@redhat.com \
--cc=berrange@redhat.com \
--cc=dave@treblig.org \
--cc=den-plotnikov@yandex-team.ru \
--cc=eblake@redhat.com \
--cc=eduardo@habkost.net \
--cc=mst@redhat.com \
--cc=pbonzini@redhat.com \
--cc=qemu-devel@nongnu.org \
--cc=sgarzare@redhat.com \
--cc=vsementsov@yandex-team.ru \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.