* [PATCH] virtio: add VIRTQUEUE_ERROR QAPI event
From: Vladimir Sementsov-Ogievskiy @ 2022-09-19 19:48 UTC
To: qemu-devel
Cc: armbru, eblake, eduardo, berrange, pbonzini, mst, rvkagan, yc-core, vsementsov

For now we only log the vhost device error, when virtqueue is actually
stopped. Let's add a QAPI event, which makes possible:

 - collect statistics of such errors
 - make immediate actions: take coredums or do some other debugging

The event could be reused for some other virtqueue problems (not only
for vhost devices) in future. For this it gets a generic name and
structure.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
---
 hw/virtio/vhost.c | 12 +++++++++---
 qapi/qdev.json    | 25 +++++++++++++++++++++++++
 2 files changed, 34 insertions(+), 3 deletions(-)

diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
index f758f177bb..caa81f2ace 100644
--- a/hw/virtio/vhost.c
+++ b/hw/virtio/vhost.c
@@ -15,6 +15,7 @@

 #include "qemu/osdep.h"
 #include "qapi/error.h"
+#include "qapi/qapi-events-qdev.h"
 #include "hw/virtio/vhost.h"
 #include "qemu/atomic.h"
 #include "qemu/range.h"
@@ -1287,11 +1288,16 @@ static void vhost_virtqueue_error_notifier(EventNotifier *n)
     struct vhost_virtqueue *vq = container_of(n, struct vhost_virtqueue,
                                               error_notifier);
     struct vhost_dev *dev = vq->dev;
-    int index = vq - dev->vqs;

     if (event_notifier_test_and_clear(n) && dev->vdev) {
-        VHOST_OPS_DEBUG(-EINVAL, "vhost vring error in virtqueue %d",
-                        dev->vq_index + index);
+        int ind = vq - dev->vqs + dev->vq_index;
+        DeviceState *ds = &dev->vdev->parent_obj;
+
+        VHOST_OPS_DEBUG(-EINVAL, "vhost vring error in virtqueue %d", ind);
+        qapi_event_send_virtqueue_error(!!ds->id, ds->id, ds->canonical_path,
+                                        ind, VIRTQUEUE_ERROR_VHOST_VRING_ERR,
+                                        "vhost reported failure through vring "
+                                        "error fd");
     }
 }

diff --git a/qapi/qdev.json b/qapi/qdev.json
index 2708fb4e99..b7c2669c2c 100644
--- a/qapi/qdev.json
+++ b/qapi/qdev.json
@@ -158,3 +158,28 @@
 ##
 { 'event': 'DEVICE_UNPLUG_GUEST_ERROR',
   'data': { '*device': 'str', 'path': 'str' } }
+
+##
+# @VirtqueueError:
+#
+# Since: 7.2
+##
+{ 'enum': 'VirtqueueError',
+  'data': [ 'vhost-vring-err' ] }
+
+##
+# @VIRTQUEUE_ERROR:
+#
+# Emitted when a device virtqueue fails in runtime.
+#
+# @device: the device's ID if it has one
+# @path: the device's QOM path
+# @virtqueue: virtqueue index
+# @error: error identifier
+# @description: human readable description
+#
+# Since: 7.2
+##
+{ 'event': 'VIRTQUEUE_ERROR',
+  'data': { '*device': 'str', 'path': 'str', 'virtqueue': 'int',
+            'error': 'VirtqueueError', 'description': 'str'} }
-- 
2.25.1
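
The patch reports the virtqueue as "vq - dev->vqs + dev->vq_index", that is, the element offset of the failing queue inside the vhost device's own array plus the offset of that array within the whole virtio device. A minimal sketch of that arithmetic follows. The demo_* types and the nvqs bounds check are hypothetical simplifications for illustration, not QEMU's real struct vhost_dev and struct vhost_virtqueue.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for QEMU's vhost structures. */
struct demo_dev;

struct demo_vq {
    struct demo_dev *dev;   /* back-pointer to the owning device */
};

struct demo_dev {
    struct demo_vq *vqs;    /* array of virtqueues owned by this device */
    int nvqs;               /* number of entries in vqs (assumed field) */
    int vq_index;           /* index of vqs[0] within the virtio device */
};

/*
 * Return the device-wide index of vq, mirroring the expression
 * "vq - dev->vqs + dev->vq_index" used in the patch.
 */
static int demo_vq_global_index(const struct demo_vq *vq)
{
    const struct demo_dev *dev = vq->dev;
    ptrdiff_t local = vq - dev->vqs;   /* element offset, not a byte offset */

    /* Sanity check added for the sketch only. */
    assert(local >= 0 && local < dev->nvqs);
    return (int)local + dev->vq_index;
}
```

With three queues and vq_index set to 4, the second queue in the array would be reported as virtqueue 5.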
* Re: [PATCH] virtio: add VIRTQUEUE_ERROR QAPI event
From: Markus Armbruster @ 2022-09-20 14:47 UTC
To: Vladimir Sementsov-Ogievskiy
Cc: qemu-devel, eblake, eduardo, berrange, pbonzini, mst, rvkagan, yc-core

Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> writes:

> For now we only log the vhost device error, when virtqueue is actually
> stopped. Let's add a QAPI event, which makes possible:
>
>  - collect statistics of such errors
>  - make immediate actions: take coredums or do some other debugging

Core dumps, I presume.

Is QMP the right tool for the job?  Or could a trace point do?

> The event could be reused for some other virtqueue problems (not only
> for vhost devices) in future. For this it gets a generic name and
> structure.
>
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
> ---
>  hw/virtio/vhost.c | 12 +++++++++---
>  qapi/qdev.json    | 25 +++++++++++++++++++++++++
>  2 files changed, 34 insertions(+), 3 deletions(-)
>
> diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
> index f758f177bb..caa81f2ace 100644
> --- a/hw/virtio/vhost.c
> +++ b/hw/virtio/vhost.c
> @@ -15,6 +15,7 @@
>
>  #include "qemu/osdep.h"
>  #include "qapi/error.h"
> +#include "qapi/qapi-events-qdev.h"
>  #include "hw/virtio/vhost.h"
>  #include "qemu/atomic.h"
>  #include "qemu/range.h"

Only tangentially related to this patch, but here goes anyway:

    /* enabled until disconnected backend stabilizes */
    #define _VHOST_DEBUG 1

This is from 2016.  Has it stabilized?

    #ifdef _VHOST_DEBUG
    #define VHOST_OPS_DEBUG(retval, fmt, ...) \
        do { \
            error_report(fmt ": %s (%d)", ## __VA_ARGS__, \
                         strerror(-retval), -retval); \

error_report() is for errors the user can do something about, not for
debug messages.

        } while (0)
    #else
    #define VHOST_OPS_DEBUG(retval, fmt, ...) \
        do { } while (0)
    #endif

> @@ -1287,11 +1288,16 @@ static void vhost_virtqueue_error_notifier(EventNotifier *n)
>      struct vhost_virtqueue *vq = container_of(n, struct vhost_virtqueue,
>                                                error_notifier);
>      struct vhost_dev *dev = vq->dev;
> -    int index = vq - dev->vqs;
>
>      if (event_notifier_test_and_clear(n) && dev->vdev) {
> -        VHOST_OPS_DEBUG(-EINVAL, "vhost vring error in virtqueue %d",
> -                        dev->vq_index + index);
> +        int ind = vq - dev->vqs + dev->vq_index;
> +        DeviceState *ds = &dev->vdev->parent_obj;
> +
> +        VHOST_OPS_DEBUG(-EINVAL, "vhost vring error in virtqueue %d", ind);
> +        qapi_event_send_virtqueue_error(!!ds->id, ds->id, ds->canonical_path,
> +                                        ind, VIRTQUEUE_ERROR_VHOST_VRING_ERR,
> +                                        "vhost reported failure through vring "
> +                                        "error fd");

Do we still need VHOST_OPS_DEBUG() here?

>      }
>  }
>
> diff --git a/qapi/qdev.json b/qapi/qdev.json
> index 2708fb4e99..b7c2669c2c 100644
> --- a/qapi/qdev.json
> +++ b/qapi/qdev.json
> @@ -158,3 +158,28 @@
>  ##
>  { 'event': 'DEVICE_UNPLUG_GUEST_ERROR',
>    'data': { '*device': 'str', 'path': 'str' } }
> +
> +##
> +# @VirtqueueError:
> +#
> +# Since: 7.2
> +##
> +{ 'enum': 'VirtqueueError',
> +  'data': [ 'vhost-vring-err' ] }
> +
> +##
> +# @VIRTQUEUE_ERROR:
> +#
> +# Emitted when a device virtqueue fails in runtime.
> +#
> +# @device: the device's ID if it has one
> +# @path: the device's QOM path
> +# @virtqueue: virtqueue index
> +# @error: error identifier
> +# @description: human readable description
> +#
> +# Since: 7.2
> +##
> +{ 'event': 'VIRTQUEUE_ERROR',
> +  'data': { '*device': 'str', 'path': 'str', 'virtqueue': 'int',
> +            'error': 'VirtqueueError', 'description': 'str'} }

Can the guest trigger the event?

If yes, it needs to be rate-limited.
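
The concern behind rate limiting is that a guest able to trigger an event at will could flood the QMP channel. QEMU's monitor code has its own throttling for selected events; the block below is not that code. It is only a minimal sketch of the general idea, assuming a caller-supplied clock in nanoseconds and a policy of dropping events that arrive too soon after the previous one. All demo_* names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-event throttle state, for illustration only. */
struct demo_throttle {
    int64_t min_interval_ns;   /* minimum spacing between emitted events */
    int64_t last_sent_ns;      /* time of the last emitted event */
    bool sent_once;            /* false until the first event is emitted */
};

/*
 * Decide whether an event arriving at now_ns may be emitted.
 * Returns true and records the time if it may, false if it is
 * suppressed because it came too soon after the previous one.
 */
static bool demo_throttle_allow(struct demo_throttle *t, int64_t now_ns)
{
    assert(t->min_interval_ns >= 0);

    if (t->sent_once && now_ns - t->last_sent_ns < t->min_interval_ns) {
        return false;          /* too soon: suppress this event */
    }
    t->last_sent_ns = now_ns;
    t->sent_once = true;
    return true;
}
```

A drop-on-excess policy loses the suppressed events; a scheme that instead delays the most recent event until the interval expires would keep the latest state visible to the management layer, at the cost of a timer.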
* Re: [PATCH] virtio: add VIRTQUEUE_ERROR QAPI event
From: Vladimir Sementsov-Ogievskiy @ 2022-09-20 15:10 UTC
To: Markus Armbruster
Cc: qemu-devel, eblake, eduardo, berrange, pbonzini, mst, rvkagan, yc-core

On 9/20/22 17:47, Markus Armbruster wrote:
> Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> writes:
>
>> For now we only log the vhost device error, when virtqueue is actually
>> stopped. Let's add a QAPI event, which makes possible:
>>
>>  - collect statistics of such errors
>>  - make immediate actions: take coredums or do some other debugging
>
> Core dumps, I presume.
>
> Is QMP the right tool for the job?  Or could a trace point do?

Management tool already can collect QMP events. So, if we want to
forward some QMP events to other subsystems (to immediately inform
support team, or to update some statistics) it's simple to realize for
QMP events. But I'm not sure how to do it for trace-events.. Scanning
trace logs is not convenient.

Another benefit of QMP events is that they are objects, with fields,
which is better for machine processing than textual trace-events.

>
>> The event could be reused for some other virtqueue problems (not only
>> for vhost devices) in future. For this it gets a generic name and
>> structure.
>>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
>> ---
>>  hw/virtio/vhost.c | 12 +++++++++---
>>  qapi/qdev.json    | 25 +++++++++++++++++++++++++
>>  2 files changed, 34 insertions(+), 3 deletions(-)
>>
>> diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
>> index f758f177bb..caa81f2ace 100644
>> --- a/hw/virtio/vhost.c
>> +++ b/hw/virtio/vhost.c
>> @@ -15,6 +15,7 @@
>>
>>  #include "qemu/osdep.h"
>>  #include "qapi/error.h"
>> +#include "qapi/qapi-events-qdev.h"
>>  #include "hw/virtio/vhost.h"
>>  #include "qemu/atomic.h"
>>  #include "qemu/range.h"
>
> Only tangentially related to this patch, but here goes anyway:
>
>     /* enabled until disconnected backend stabilizes */
>     #define _VHOST_DEBUG 1
>
> This is from 2016.  Has it stabilized?

Hmm, I don't know:) It works for us)

But anyway, I agree that error/debug reporting here needs an update. I
don't think that dropping the messages is good. Some should be converted
to errp-reporting, some to warnings or assertions..

>
>     #ifdef _VHOST_DEBUG
>     #define VHOST_OPS_DEBUG(retval, fmt, ...) \
>         do { \
>             error_report(fmt ": %s (%d)", ## __VA_ARGS__, \
>                          strerror(-retval), -retval); \
>
> error_report() is for errors the user can do something about, not for
> debug messages.
>
>         } while (0)
>     #else
>     #define VHOST_OPS_DEBUG(retval, fmt, ...) \
>         do { } while (0)
>     #endif
>
>> @@ -1287,11 +1288,16 @@ static void vhost_virtqueue_error_notifier(EventNotifier *n)
>>      struct vhost_virtqueue *vq = container_of(n, struct vhost_virtqueue,
>>                                                error_notifier);
>>      struct vhost_dev *dev = vq->dev;
>> -    int index = vq - dev->vqs;
>>
>>      if (event_notifier_test_and_clear(n) && dev->vdev) {
>> -        VHOST_OPS_DEBUG(-EINVAL, "vhost vring error in virtqueue %d",
>> -                        dev->vq_index + index);
>> +        int ind = vq - dev->vqs + dev->vq_index;
>> +        DeviceState *ds = &dev->vdev->parent_obj;
>> +
>> +        VHOST_OPS_DEBUG(-EINVAL, "vhost vring error in virtqueue %d", ind);
>> +        qapi_event_send_virtqueue_error(!!ds->id, ds->id, ds->canonical_path,
>> +                                        ind, VIRTQUEUE_ERROR_VHOST_VRING_ERR,
>> +                                        "vhost reported failure through vring "
>> +                                        "error fd");
>
> Do we still need VHOST_OPS_DEBUG() here?

I think, this should be decided separately from this patch. Here I keep
current behavior and add new event.

>
>>      }
>>  }
>>
>> diff --git a/qapi/qdev.json b/qapi/qdev.json
>> index 2708fb4e99..b7c2669c2c 100644
>> --- a/qapi/qdev.json
>> +++ b/qapi/qdev.json
>> @@ -158,3 +158,28 @@
>>  ##
>>  { 'event': 'DEVICE_UNPLUG_GUEST_ERROR',
>>    'data': { '*device': 'str', 'path': 'str' } }
>> +
>> +##
>> +# @VirtqueueError:
>> +#
>> +# Since: 7.2
>> +##
>> +{ 'enum': 'VirtqueueError',
>> +  'data': [ 'vhost-vring-err' ] }
>> +
>> +##
>> +# @VIRTQUEUE_ERROR:
>> +#
>> +# Emitted when a device virtqueue fails in runtime.
>> +#
>> +# @device: the device's ID if it has one
>> +# @path: the device's QOM path
>> +# @virtqueue: virtqueue index
>> +# @error: error identifier
>> +# @description: human readable description
>> +#
>> +# Since: 7.2
>> +##
>> +{ 'event': 'VIRTQUEUE_ERROR',
>> +  'data': { '*device': 'str', 'path': 'str', 'virtqueue': 'int',
>> +            'error': 'VirtqueueError', 'description': 'str'} }
>
> Can the guest trigger the event?

Yes, but as I understand, only once per virtqueue.

>
> If yes, it needs to be rate-limited.
>

This may be needed if VIRTQUEUE_ERROR will be shared with other errors.
Still adding it now will not hurt I think.

-- 
Best regards,
Vladimir
* Re: [PATCH] virtio: add VIRTQUEUE_ERROR QAPI event
From: Roman Kagan @ 2022-09-20 15:46 UTC
To: Vladimir Sementsov-Ogievskiy
Cc: Markus Armbruster, qemu-devel, eblake, eduardo, berrange, pbonzini, mst, yc-core

On Tue, Sep 20, 2022 at 06:10:08PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> On 9/20/22 17:47, Markus Armbruster wrote:
> > Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> writes:
> >
> > > For now we only log the vhost device error, when virtqueue is actually
> > > stopped. Let's add a QAPI event, which makes possible:
> > >
> > >  - collect statistics of such errors
> > >  - make immediate actions: take coredums or do some other debugging

+ inform the user through a management API or UI, so that (s)he can
  react somehow, e.g. reset the device driver in the guest or even
  build up some automation to do so

Note that basically every inconsistency discovered during virtqueue
processing results in a silent virtqueue stop.  The guest then just sees
the requests getting stuck somewhere in the device for no visible
reason.  This event provides a means to inform the management layer of
this situation in a timely fashion.

> >
> > Core dumps, I presume.
> >
> > Is QMP the right tool for the job?  Or could a trace point do?
>
> Management tool already can collect QMP events. So, if we want to
> forward some QMP events to other subsystems (to immediately inform
> support team, or to update some statistics) it's simple to realize for
> QMP events. But I'm not sure how to do it for trace-events.. Scanning
> trace logs is not convenient.

Right.  Trace points are a debugging tool: when you expect the problem
to reproduce, you activate them and watch the logs.  On the contrary,
QMP events can trigger some logic in the management layer and provide
for some recovery action.

> > > +##
> > > +# @VIRTQUEUE_ERROR:
> > > +#
> > > +# Emitted when a device virtqueue fails in runtime.
> > > +#
> > > +# @device: the device's ID if it has one
> > > +# @path: the device's QOM path
> > > +# @virtqueue: virtqueue index
> > > +# @error: error identifier
> > > +# @description: human readable description
> > > +#
> > > +# Since: 7.2
> > > +##
> > > +{ 'event': 'VIRTQUEUE_ERROR',
> > > +  'data': { '*device': 'str', 'path': 'str', 'virtqueue': 'int',
> > > +            'error': 'VirtqueueError', 'description': 'str'} }
> >
> > Can the guest trigger the event?
>
> Yes, but as I understand, only once per virtqueue.

Right, in the sense that every relevant dataplane implementation would
stop the virtqueue on such an error, so in order to trigger a new one
the driver would need to reset the device first.  I guess rate-limiting
is unnecessary here.

Thanks,
Roman.
* Re: [PATCH] virtio: add VIRTQUEUE_ERROR QAPI event
From: Vladimir Sementsov-Ogievskiy @ 2022-10-12 13:24 UTC
To: qemu-devel
Cc: armbru, eblake, eduardo, berrange, pbonzini, mst, rvkagan, yc-core

ping

On 9/19/22 22:48, Vladimir Sementsov-Ogievskiy wrote:
> For now we only log the vhost device error, when virtqueue is actually
> stopped. Let's add a QAPI event, which makes possible:
>
>  - collect statistics of such errors
>  - make immediate actions: take coredums or do some other debugging
>
> The event could be reused for some other virtqueue problems (not only
> for vhost devices) in future. For this it gets a generic name and
> structure.
>
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>

-- 
Best regards,
Vladimir
* Re: [PATCH] virtio: add VIRTQUEUE_ERROR QAPI event
From: Vladimir Sementsov-Ogievskiy @ 2022-11-01 16:54 UTC
To: qemu-devel
Cc: armbru, eblake, eduardo, berrange, pbonzini, mst, rvkagan, yc-core

ping

On 9/19/22 22:48, Vladimir Sementsov-Ogievskiy wrote:
> For now we only log the vhost device error, when virtqueue is actually
> stopped. Let's add a QAPI event, which makes possible:
>
>  - collect statistics of such errors
>  - make immediate actions: take coredums or do some other debugging
>
> The event could be reused for some other virtqueue problems (not only
> for vhost devices) in future. For this it gets a generic name and
> structure.
>
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>

-- 
Best regards,
Vladimir