* [PATCH v2] block: Never drop BLOCK_IO_ERROR with action=stop for rate limiting
@ 2026-03-04 12:28 Kevin Wolf
2026-03-04 12:40 ` Daniel P. Berrangé
2026-03-10 13:32 ` Markus Armbruster
0 siblings, 2 replies; 6+ messages in thread
From: Kevin Wolf @ 2026-03-04 12:28 UTC (permalink / raw)
To: qemu-block
Cc: kwolf, hreitz, xeor, vsementsov, pkrempa, qemu-devel, qemu-stable
Commit 2155d2dd introduced rate limiting for BLOCK_IO_ERROR to emit an
event only once a second. This makes sense for cases in which the guest
keeps running and can submit more requests that would possibly also fail
because there is a problem with the backend.
However, if the error policy is configured so that the VM is stopped on
errors, this is both unnecessary because stopping the VM means that the
guest can't issue more requests and in fact harmful because stopping the
VM is an important state change that management tools need to keep track
of even if it happens more than once in a given second. If an event is
dropped, the management tool would see a VM randomly going to paused
state without an associated error, so it has a hard time deciding how to
handle the situation.
This patch disables rate limiting for action=stop by not relying on the
event type alone any more in monitor_qapi_event_queue_no_reenter(), but
checking action for BLOCK_IO_ERROR, too. If the error is reported to the
guest or ignored, the rate limiting stays in place.
Fixes: 2155d2dd7f73 ('block-backend: per-device throttling of BLOCK_IO_ERROR reports')
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
qapi/block-core.json | 2 +-
monitor/monitor.c | 21 ++++++++++++++++++++-
2 files changed, 21 insertions(+), 2 deletions(-)
diff --git a/qapi/block-core.json b/qapi/block-core.json
index b66bf316e2f..da0b36a3751 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -5794,7 +5794,7 @@
# .. note:: If action is "stop", a `STOP` event will eventually follow
# the `BLOCK_IO_ERROR` event.
#
-# .. note:: This event is rate-limited.
+# .. note:: This event is rate-limited, except if action is "stop".
#
# Since: 0.13
#
diff --git a/monitor/monitor.c b/monitor/monitor.c
index 1273eb72605..37fa674cfe6 100644
--- a/monitor/monitor.c
+++ b/monitor/monitor.c
@@ -367,14 +367,33 @@ monitor_qapi_event_queue_no_reenter(QAPIEvent event, QDict *qdict)
 {
     MonitorQAPIEventConf *evconf;
     MonitorQAPIEventState *evstate;
+    bool throttled;

     assert(event < QAPI_EVENT__MAX);
     evconf = &monitor_qapi_event_conf[event];
     trace_monitor_protocol_event_queue(event, qdict, evconf->rate);
+    throttled = evconf->rate;
+
+    /*
+     * Rate limit BLOCK_IO_ERROR only for action != "stop".
+     *
+     * If the VM is stopped after an I/O error, this is important information
+     * for the management tool to keep track of the state of QEMU and we can't
+     * merge any events. At the same time, stopping the VM means that the guest
+     * can't send additional requests and the number of events is already
+     * limited, so we can do without rate limiting.
+     */
+    if (event == QAPI_EVENT_BLOCK_IO_ERROR) {
+        QDict *data = qobject_to(QDict, qdict_get(qdict, "data"));
+        const char *action = qdict_get_str(data, "action");
+        if (!strcmp(action, "stop")) {
+            throttled = false;
+        }
+    }

     QEMU_LOCK_GUARD(&monitor_lock);

-    if (!evconf->rate) {
+    if (!throttled) {
         /* Unthrottled event */
         monitor_qapi_event_emit(event, qdict);
     } else {
--
2.53.0
* Re: [PATCH v2] block: Never drop BLOCK_IO_ERROR with action=stop for rate limiting
2026-03-04 12:28 [PATCH v2] block: Never drop BLOCK_IO_ERROR with action=stop for rate limiting Kevin Wolf
@ 2026-03-04 12:40 ` Daniel P. Berrangé
2026-03-04 15:39 ` Kevin Wolf
2026-03-10 13:32 ` Markus Armbruster
1 sibling, 1 reply; 6+ messages in thread
From: Daniel P. Berrangé @ 2026-03-04 12:40 UTC (permalink / raw)
To: Kevin Wolf
Cc: qemu-block, hreitz, xeor, vsementsov, pkrempa, qemu-devel,
qemu-stable
On Wed, Mar 04, 2026 at 01:28:00PM +0100, Kevin Wolf wrote:
> Commit 2155d2dd introduced rate limiting for BLOCK_IO_ERROR to emit an
> event only once a second. This makes sense for cases in which the guest
> keeps running and can submit more requests that would possibly also fail
> because there is a problem with the backend.
>
> However, if the error policy is configured so that the VM is stopped on
> errors, this is both unnecessary because stopping the VM means that the
> guest can't issue more requests and in fact harmful because stopping the
> VM is an important state change that management tools need to keep track
> of even if it happens more than once in a given second. If an event is
> dropped, the management tool would see a VM randomly going to paused
> state without an associated error, so it has a hard time deciding how to
> handle the situation.
>
> This patch disables rate limiting for action=stop by not relying on the
> event type alone any more in monitor_qapi_event_queue_no_reenter(), but
> checking action for BLOCK_IO_ERROR, too. If the error is reported to the
> guest or ignored, the rate limiting stays in place.
>
> Fixes: 2155d2dd7f73 ('block-backend: per-device throttling of BLOCK_IO_ERROR reports')
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> ---
> qapi/block-core.json | 2 +-
> monitor/monitor.c | 21 ++++++++++++++++++++-
> 2 files changed, 21 insertions(+), 2 deletions(-)
>
> diff --git a/qapi/block-core.json b/qapi/block-core.json
> index b66bf316e2f..da0b36a3751 100644
> --- a/qapi/block-core.json
> +++ b/qapi/block-core.json
> @@ -5794,7 +5794,7 @@
> # .. note:: If action is "stop", a `STOP` event will eventually follow
> # the `BLOCK_IO_ERROR` event.
> #
> -# .. note:: This event is rate-limited.
> +# .. note:: This event is rate-limited, except if action is "stop".
> #
> # Since: 0.13
> #
> diff --git a/monitor/monitor.c b/monitor/monitor.c
> index 1273eb72605..37fa674cfe6 100644
> --- a/monitor/monitor.c
> +++ b/monitor/monitor.c
> @@ -367,14 +367,33 @@ monitor_qapi_event_queue_no_reenter(QAPIEvent event, QDict *qdict)
> {
> MonitorQAPIEventConf *evconf;
> MonitorQAPIEventState *evstate;
> + bool throttled;
>
> assert(event < QAPI_EVENT__MAX);
> evconf = &monitor_qapi_event_conf[event];
> trace_monitor_protocol_event_queue(event, qdict, evconf->rate);
> + throttled = evconf->rate;
> +
> + /*
> + * Rate limit BLOCK_IO_ERROR only for action != "stop".
> + *
> + * If the VM is stopped after an I/O error, this is important information
> + * for the management tool to keep track of the state of QEMU and we can't
> + * merge any events. At the same time, stopping the VM means that the guest
> + * can't send additional requests and the number of events is already
> + * limited, so we can do without rate limiting.
> + */
> + if (event == QAPI_EVENT_BLOCK_IO_ERROR) {
> + QDict *data = qobject_to(QDict, qdict_get(qdict, "data"));
> + const char *action = qdict_get_str(data, "action");
> + if (!strcmp(action, "stop")) {
> + throttled = false;
> + }
> + }
Can this be handled in the same way as other events, via the
qapi_event_throttle_hash & qapi_event_throttle_equal methods?
E.g. if action is "stop", then ensure "equal" is always false?
Possibly add a random token to the hash, but that might not be needed
if 'equal' is always false.
>
> QEMU_LOCK_GUARD(&monitor_lock);
>
> - if (!evconf->rate) {
> + if (!throttled) {
> /* Unthrottled event */
> monitor_qapi_event_emit(event, qdict);
> } else {
> --
> 2.53.0
>
>
With regards,
Daniel
--
|: https://berrange.com ~~ https://hachyderm.io/@berrange :|
|: https://libvirt.org ~~ https://entangle-photo.org :|
|: https://pixelfed.art/berrange ~~ https://fstop138.berrange.com :|
* Re: [PATCH v2] block: Never drop BLOCK_IO_ERROR with action=stop for rate limiting
2026-03-04 12:40 ` Daniel P. Berrangé
@ 2026-03-04 15:39 ` Kevin Wolf
0 siblings, 0 replies; 6+ messages in thread
From: Kevin Wolf @ 2026-03-04 15:39 UTC (permalink / raw)
To: Daniel P. Berrangé
Cc: qemu-block, hreitz, xeor, vsementsov, pkrempa, qemu-devel,
qemu-stable
Am 04.03.2026 um 13:40 hat Daniel P. Berrangé geschrieben:
> On Wed, Mar 04, 2026 at 01:28:00PM +0100, Kevin Wolf wrote:
> > Commit 2155d2dd introduced rate limiting for BLOCK_IO_ERROR to emit an
> > event only once a second. This makes sense for cases in which the guest
> > keeps running and can submit more requests that would possibly also fail
> > because there is a problem with the backend.
> >
> > However, if the error policy is configured so that the VM is stopped on
> > errors, this is both unnecessary because stopping the VM means that the
> > guest can't issue more requests and in fact harmful because stopping the
> > VM is an important state change that management tools need to keep track
> > of even if it happens more than once in a given second. If an event is
> > dropped, the management tool would see a VM randomly going to paused
> > state without an associated error, so it has a hard time deciding how to
> > handle the situation.
> >
> > This patch disables rate limiting for action=stop by not relying on the
> > event type alone any more in monitor_qapi_event_queue_no_reenter(), but
> > checking action for BLOCK_IO_ERROR, too. If the error is reported to the
> > guest or ignored, the rate limiting stays in place.
> >
> > Fixes: 2155d2dd7f73 ('block-backend: per-device throttling of BLOCK_IO_ERROR reports')
> > Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> > ---
> > qapi/block-core.json | 2 +-
> > monitor/monitor.c | 21 ++++++++++++++++++++-
> > 2 files changed, 21 insertions(+), 2 deletions(-)
> >
> > diff --git a/qapi/block-core.json b/qapi/block-core.json
> > index b66bf316e2f..da0b36a3751 100644
> > --- a/qapi/block-core.json
> > +++ b/qapi/block-core.json
> > @@ -5794,7 +5794,7 @@
> > # .. note:: If action is "stop", a `STOP` event will eventually follow
> > # the `BLOCK_IO_ERROR` event.
> > #
> > -# .. note:: This event is rate-limited.
> > +# .. note:: This event is rate-limited, except if action is "stop".
> > #
> > # Since: 0.13
> > #
> > diff --git a/monitor/monitor.c b/monitor/monitor.c
> > index 1273eb72605..37fa674cfe6 100644
> > --- a/monitor/monitor.c
> > +++ b/monitor/monitor.c
> > @@ -367,14 +367,33 @@ monitor_qapi_event_queue_no_reenter(QAPIEvent event, QDict *qdict)
> > {
> > MonitorQAPIEventConf *evconf;
> > MonitorQAPIEventState *evstate;
> > + bool throttled;
> >
> > assert(event < QAPI_EVENT__MAX);
> > evconf = &monitor_qapi_event_conf[event];
> > trace_monitor_protocol_event_queue(event, qdict, evconf->rate);
> > + throttled = evconf->rate;
> > +
> > + /*
> > + * Rate limit BLOCK_IO_ERROR only for action != "stop".
> > + *
> > + * If the VM is stopped after an I/O error, this is important information
> > + * for the management tool to keep track of the state of QEMU and we can't
> > + * merge any events. At the same time, stopping the VM means that the guest
> > + * can't send additional requests and the number of events is already
> > + * limited, so we can do without rate limiting.
> > + */
> > + if (event == QAPI_EVENT_BLOCK_IO_ERROR) {
> > + QDict *data = qobject_to(QDict, qdict_get(qdict, "data"));
> > + const char *action = qdict_get_str(data, "action");
> > + if (!strcmp(action, "stop")) {
> > + throttled = false;
> > + }
> > + }
>
> Can this be handled in the same way as other events, via the
> qapi_event_throttle_hash & qapi_event_throttle_equal methods?
>
> E.g. if action is "stop", then ensure "equal" is always false?
> Possibly add a random token to the hash, but that might not be needed
> if 'equal' is always false.
That was v1 and cost me a day debugging the crashes resulting from
events not comparing equal to themselves (which in turn means that
removing them from the hash table fails silently and you get use after
free).
Kevin
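The failure mode described above can be reproduced in isolation. The following is a minimal sketch, not QEMU code: it uses a tiny hand-written chained hash table instead of the hash table QEMU actually uses, and every name in it (Event, Table, equal_sane, equal_stop_never, table_add, table_remove, table_size) is hypothetical. It only shows that an equality callback which never reports a "stop" event as equal, not even to itself, makes key-based removal fail silently, so the table keeps a pointer to an entry the caller believes is gone.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a queued event; not a QEMU type. */
typedef struct Event {
    const char *action;
    struct Event *next;            /* bucket chain */
} Event;

typedef bool (*EqualFn)(const Event *a, const Event *b);

#define N_BUCKETS 8

typedef struct {
    Event *buckets[N_BUCKETS];
    EqualFn equal;
} Table;

static unsigned hash_event(const Event *e)
{
    unsigned h = 5381;
    for (const char *p = e->action; *p; p++) {
        h = h * 33 + (unsigned char)*p;
    }
    return h % N_BUCKETS;
}

/* Reflexive equality: an event always equals itself. */
static bool equal_sane(const Event *a, const Event *b)
{
    return strcmp(a->action, b->action) == 0;
}

/* Non-reflexive equality: "stop" events never compare equal. */
static bool equal_stop_never(const Event *a, const Event *b)
{
    if (!strcmp(a->action, "stop") || !strcmp(b->action, "stop")) {
        return false;
    }
    return strcmp(a->action, b->action) == 0;
}

static void table_init(Table *t, EqualFn equal)
{
    *t = (Table){ .equal = equal };    /* all buckets start out NULL */
}

static void table_add(Table *t, Event *e)
{
    unsigned i = hash_event(e);
    e->next = t->buckets[i];
    t->buckets[i] = e;
}

/* Remove by key comparison; returns false if no entry compared equal. */
static bool table_remove(Table *t, const Event *key)
{
    for (Event **pp = &t->buckets[hash_event(key)]; *pp; pp = &(*pp)->next) {
        if (t->equal(*pp, key)) {
            *pp = (*pp)->next;
            return true;
        }
    }
    return false;
}

static size_t table_size(const Table *t)
{
    size_t n = 0;
    for (size_t i = 0; i < N_BUCKETS; i++) {
        for (const Event *e = t->buckets[i]; e; e = e->next) {
            n++;
        }
    }
    return n;
}
```

With equal_stop_never, removing a "stop" event that was just added reports failure and leaves the entry in the table. If the caller frees the event anyway, the table holds a dangling pointer, which matches the use-after-free pattern described in the message.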
* Re: [PATCH v2] block: Never drop BLOCK_IO_ERROR with action=stop for rate limiting
2026-03-04 12:28 [PATCH v2] block: Never drop BLOCK_IO_ERROR with action=stop for rate limiting Kevin Wolf
2026-03-04 12:40 ` Daniel P. Berrangé
@ 2026-03-10 13:32 ` Markus Armbruster
2026-03-10 14:21 ` Kevin Wolf
1 sibling, 1 reply; 6+ messages in thread
From: Markus Armbruster @ 2026-03-10 13:32 UTC (permalink / raw)
To: Kevin Wolf; +Cc: qemu-block, hreitz, xeor, vsementsov, pkrempa, qemu-devel
This is now commit 544ddbb6373d61292a0e2dc269809cd6bd5edec6. I'm not
objecting. Just a few remarks. I dropped qemu-stable@ from cc.
Kevin Wolf <kwolf@redhat.com> writes:
> Commit 2155d2dd introduced rate limiting for BLOCK_IO_ERROR to emit an
> event only once a second. This makes sense for cases in which the guest
> keeps running and can submit more requests that would possibly also fail
> because there is a problem with the backend.
>
> However, if the error policy is configured so that the VM is stopped on
> errors, this is both unnecessary because stopping the VM means that the
> guest can't issue more requests and in fact harmful because stopping the
> VM is an important state change that management tools need to keep track
> of even if it happens more than once in a given second. If an event is
> dropped, the management tool would see a VM randomly going to paused
> state without an associated error, so it has a hard time deciding how to
> handle the situation.
>
> This patch disables rate limiting for action=stop by not relying on the
> event type alone any more in monitor_qapi_event_queue_no_reenter(), but
> checking action for BLOCK_IO_ERROR, too. If the error is reported to the
> guest or ignored, the rate limiting stays in place.
>
> Fixes: 2155d2dd7f73 ('block-backend: per-device throttling of BLOCK_IO_ERROR reports')
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> ---
> qapi/block-core.json | 2 +-
> monitor/monitor.c | 21 ++++++++++++++++++++-
> 2 files changed, 21 insertions(+), 2 deletions(-)
>
> diff --git a/qapi/block-core.json b/qapi/block-core.json
> index b66bf316e2f..da0b36a3751 100644
> --- a/qapi/block-core.json
> +++ b/qapi/block-core.json
> @@ -5794,7 +5794,7 @@
> # .. note:: If action is "stop", a `STOP` event will eventually follow
> # the `BLOCK_IO_ERROR` event.
> #
> -# .. note:: This event is rate-limited.
> +# .. note:: This event is rate-limited, except if action is "stop".
> #
> # Since: 0.13
> #
> diff --git a/monitor/monitor.c b/monitor/monitor.c
> index 1273eb72605..37fa674cfe6 100644
> --- a/monitor/monitor.c
> +++ b/monitor/monitor.c
> @@ -367,14 +367,33 @@ monitor_qapi_event_queue_no_reenter(QAPIEvent event, QDict *qdict)
> {
> MonitorQAPIEventConf *evconf;
> MonitorQAPIEventState *evstate;
> + bool throttled;
>
> assert(event < QAPI_EVENT__MAX);
> evconf = &monitor_qapi_event_conf[event];
> trace_monitor_protocol_event_queue(event, qdict, evconf->rate);
> + throttled = evconf->rate;
> +
> + /*
> + * Rate limit BLOCK_IO_ERROR only for action != "stop".
> + *
> + * If the VM is stopped after an I/O error, this is important information
> + * for the management tool to keep track of the state of QEMU and we can't
> + * merge any events. At the same time, stopping the VM means that the guest
> + * can't send additional requests and the number of events is already
> + * limited, so we can do without rate limiting.
> + */
> + if (event == QAPI_EVENT_BLOCK_IO_ERROR) {
> + QDict *data = qobject_to(QDict, qdict_get(qdict, "data"));
> + const char *action = qdict_get_str(data, "action");
> + if (!strcmp(action, "stop")) {
> + throttled = false;
> + }
> + }
Having event-specific logic in the general event emission function is
ugly.
Before the patch, the "throttle this event?" logic is coded in one
place: table monitor_qapi_event_conf[].
The table maps from event kind (enum QAPIEvent) to the minimum time
between two events. Non-zero specifies a rate limit, zero makes it
unlimited.
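As a rough illustration of the table described above, here is a self-contained sketch. It is not the QEMU source: the enum values, EventConf, event_conf and event_kind_is_rate_limited are hypothetical stand-ins, and SKETCH_SCALE_MS assumes the rate is stored in nanoseconds.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: rates are expressed in nanoseconds. */
#define SKETCH_SCALE_MS 1000000LL

/* Hypothetical event kinds standing in for enum QAPIEvent. */
typedef enum {
    EV_SHUTDOWN,
    EV_RTC_CHANGE,
    EV_BLOCK_IO_ERROR,
    EV__MAX
} EventKind;

/* Minimum time between two events of a kind; 0 means unlimited. */
typedef struct {
    int64_t rate;
} EventConf;

/* Entries not listed are zero-initialized, i.e. unlimited. */
static const EventConf event_conf[EV__MAX] = {
    [EV_RTC_CHANGE]     = { .rate = 1000 * SKETCH_SCALE_MS },
    [EV_BLOCK_IO_ERROR] = { .rate = 1000 * SKETCH_SCALE_MS },
};

/* The decision depends on the event kind alone, not on event data. */
static bool event_kind_is_rate_limited(EventKind kind)
{
    assert(kind < EV__MAX);
    return event_conf[kind].rate != 0;
}
```

Because the lookup key is only the event kind, a table of this shape cannot express "rate-limited unless action is stop".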
Aside: as far as I can tell, we've only ever used one
MonitorQAPIEventConf: { .rate = 1000 * SCALE_MS }. Could be dumbed down
to bool.
This is insufficient for QAPIEvent QAPI_EVENT_BLOCK_IO_ERROR, where the
desired rate depends on event data, not just the QAPIEvent.
The patch gives us the desired rate, but it splits the logic between the
map and the map's user.
I think the cleaner solution is to make the map more capable: have it
map from the entire event, not just its kind.
An obvious way to do that would be a table of function pointers
bool (*)(QAPIEvent, QDict *)
Null means unlimited.
The table entry for BLOCK_IO_ERROR returns false for unlimited when
"action" is "stop", else true.
The table entries for the other rate-limited events simply return true.
Thoughts?
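The proposal above could look roughly like the following sketch. It is only an illustration under stated assumptions: EventKind and EventData are hypothetical stand-ins for QAPIEvent and the event's QDict, and throttle_always, throttle_block_io_error, throttle_table and event_is_throttled are invented names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical event kinds standing in for enum QAPIEvent. */
typedef enum {
    EV_SHUTDOWN,
    EV_RTC_CHANGE,
    EV_BLOCK_IO_ERROR,
    EV__MAX
} EventKind;

/* Hypothetical stand-in for the event's data dictionary. */
typedef struct {
    const char *action;
} EventData;

typedef bool (*ThrottleFn)(EventKind kind, const EventData *data);

/* Entry for events that are always rate-limited. */
static bool throttle_always(EventKind kind, const EventData *data)
{
    (void)kind;
    (void)data;
    return true;
}

/* Entry for BLOCK_IO_ERROR: unlimited when action is "stop". */
static bool throttle_block_io_error(EventKind kind, const EventData *data)
{
    (void)kind;
    return strcmp(data->action, "stop") != 0;
}

/* NULL entries (everything not listed) mean unlimited. */
static const ThrottleFn throttle_table[EV__MAX] = {
    [EV_RTC_CHANGE]     = throttle_always,
    [EV_BLOCK_IO_ERROR] = throttle_block_io_error,
};

static bool event_is_throttled(EventKind kind, const EventData *data)
{
    assert(kind < EV__MAX);
    ThrottleFn fn = throttle_table[kind];
    return fn ? fn(kind, data) : false;
}
```

The emission function would then call event_is_throttled() once and stay free of event-specific conditions.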
>
> QEMU_LOCK_GUARD(&monitor_lock);
>
> - if (!evconf->rate) {
> + if (!throttled) {
> /* Unthrottled event */
> monitor_qapi_event_emit(event, qdict);
> } else {
* Re: [PATCH v2] block: Never drop BLOCK_IO_ERROR with action=stop for rate limiting
2026-03-10 13:32 ` Markus Armbruster
@ 2026-03-10 14:21 ` Kevin Wolf
2026-03-10 14:52 ` Markus Armbruster
0 siblings, 1 reply; 6+ messages in thread
From: Kevin Wolf @ 2026-03-10 14:21 UTC (permalink / raw)
To: Markus Armbruster
Cc: qemu-block, hreitz, xeor, vsementsov, pkrempa, qemu-devel
Am 10.03.2026 um 14:32 hat Markus Armbruster geschrieben:
> This is now commit 544ddbb6373d61292a0e2dc269809cd6bd5edec6. I'm not
> objecting. Just a few remarks. I dropped qemu-stable@ from cc.
>
> Kevin Wolf <kwolf@redhat.com> writes:
>
> > Commit 2155d2dd introduced rate limiting for BLOCK_IO_ERROR to emit an
> > event only once a second. This makes sense for cases in which the guest
> > keeps running and can submit more requests that would possibly also fail
> > because there is a problem with the backend.
> >
> > However, if the error policy is configured so that the VM is stopped on
> > errors, this is both unnecessary because stopping the VM means that the
> > guest can't issue more requests and in fact harmful because stopping the
> > VM is an important state change that management tools need to keep track
> > of even if it happens more than once in a given second. If an event is
> > dropped, the management tool would see a VM randomly going to paused
> > state without an associated error, so it has a hard time deciding how to
> > handle the situation.
> >
> > This patch disables rate limiting for action=stop by not relying on the
> > event type alone any more in monitor_qapi_event_queue_no_reenter(), but
> > checking action for BLOCK_IO_ERROR, too. If the error is reported to the
> > guest or ignored, the rate limiting stays in place.
> >
> > Fixes: 2155d2dd7f73 ('block-backend: per-device throttling of BLOCK_IO_ERROR reports')
> > Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> > ---
> > qapi/block-core.json | 2 +-
> > monitor/monitor.c | 21 ++++++++++++++++++++-
> > 2 files changed, 21 insertions(+), 2 deletions(-)
> >
> > diff --git a/qapi/block-core.json b/qapi/block-core.json
> > index b66bf316e2f..da0b36a3751 100644
> > --- a/qapi/block-core.json
> > +++ b/qapi/block-core.json
> > @@ -5794,7 +5794,7 @@
> > # .. note:: If action is "stop", a `STOP` event will eventually follow
> > # the `BLOCK_IO_ERROR` event.
> > #
> > -# .. note:: This event is rate-limited.
> > +# .. note:: This event is rate-limited, except if action is "stop".
> > #
> > # Since: 0.13
> > #
> > diff --git a/monitor/monitor.c b/monitor/monitor.c
> > index 1273eb72605..37fa674cfe6 100644
> > --- a/monitor/monitor.c
> > +++ b/monitor/monitor.c
> > @@ -367,14 +367,33 @@ monitor_qapi_event_queue_no_reenter(QAPIEvent event, QDict *qdict)
> > {
> > MonitorQAPIEventConf *evconf;
> > MonitorQAPIEventState *evstate;
> > + bool throttled;
> >
> > assert(event < QAPI_EVENT__MAX);
> > evconf = &monitor_qapi_event_conf[event];
> > trace_monitor_protocol_event_queue(event, qdict, evconf->rate);
> > + throttled = evconf->rate;
> > +
> > + /*
> > + * Rate limit BLOCK_IO_ERROR only for action != "stop".
> > + *
> > + * If the VM is stopped after an I/O error, this is important information
> > + * for the management tool to keep track of the state of QEMU and we can't
> > + * merge any events. At the same time, stopping the VM means that the guest
> > + * can't send additional requests and the number of events is already
> > + * limited, so we can do without rate limiting.
> > + */
> > + if (event == QAPI_EVENT_BLOCK_IO_ERROR) {
> > + QDict *data = qobject_to(QDict, qdict_get(qdict, "data"));
> > + const char *action = qdict_get_str(data, "action");
> > + if (!strcmp(action, "stop")) {
> > + throttled = false;
> > + }
> > + }
>
> Having event-specific logic in the general event emission function is
> ugly.
>
> Before the patch, the "throttle this event?" logic is coded in one
> place: table monitor_qapi_event_conf[].
>
> The table maps from event kind (enum QAPIEvent) to the minimum time
> between two events. Non-zero specifies a rate limit, zero makes it
> unlimited.
>
> Aside: as far as I can tell, we've only ever used one
> MonitorQAPIEventConf: { .rate = 1000 * SCALE_MS }. Could be dumbed down
> to bool.
>
> This is insufficient for QAPIEvent QAPI_EVENT_BLOCK_IO_ERROR, where the
> desired rate depends on event data, not just the QAPIEvent.
>
> The patch gives us the desired rate, but it splits the logic between the
> map and the map's user.
>
> I think the cleaner solution is to make the map more capable: have it
> map from the entire event, not just its kind.
>
> An obvious way to do that would be a table of function pointers
>
> bool (*)(QAPIEvent, QDict *)
>
> Null means unlimited.
>
> The table entry for BLOCK_IO_ERROR returns false for unlimited when
> "action" is "stop", else true.
>
> The table entries for the other rate-limited events simply return true.
>
> Thoughts?
Like you, I often prefer data to code. However, if the data becomes just
a table of function pointers to trivial functions, wouldn't it be better
to just have it as code in the first place?
It could be a helper function with the same signature as you proposed
above and it would simply be a switch statement returning true for a few
event types, looking at the QDict for BLOCK_IO_ERROR and returning false
for default. Seems a lot simpler than an explicit table of function
pointers.
Kevin
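A helper of the kind described above might be sketched as follows. The types and names are hypothetical (EventKind and EventData stand in for QAPIEvent and the event's QDict, EV_WATCHDOG and EV_RTC_CHANGE are merely example kinds, and event_needs_rate_limit is an invented name); this is not the code that was merged.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical event kinds standing in for enum QAPIEvent. */
typedef enum {
    EV_SHUTDOWN,
    EV_RTC_CHANGE,
    EV_WATCHDOG,
    EV_BLOCK_IO_ERROR,
    EV__MAX
} EventKind;

/* Hypothetical stand-in for the event's data dictionary. */
typedef struct {
    const char *action;
} EventData;

/* Same shape as the proposed callback, but a single switch. */
static bool event_needs_rate_limit(EventKind kind, const EventData *data)
{
    switch (kind) {
    case EV_RTC_CHANGE:
    case EV_WATCHDOG:
        /* Rate-limited regardless of event data. */
        return true;
    case EV_BLOCK_IO_ERROR:
        /* A stopped VM must always be reported, so never throttle "stop". */
        return strcmp(data->action, "stop") != 0;
    default:
        /* Everything else is unlimited. */
        return false;
    }
}
```

Compared with a table of function pointers, the whole policy is readable in one place and no trivial one-line callbacks are needed.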
* Re: [PATCH v2] block: Never drop BLOCK_IO_ERROR with action=stop for rate limiting
2026-03-10 14:21 ` Kevin Wolf
@ 2026-03-10 14:52 ` Markus Armbruster
0 siblings, 0 replies; 6+ messages in thread
From: Markus Armbruster @ 2026-03-10 14:52 UTC (permalink / raw)
To: Kevin Wolf; +Cc: qemu-block, hreitz, xeor, vsementsov, pkrempa, qemu-devel
Kevin Wolf <kwolf@redhat.com> writes:
> Am 10.03.2026 um 14:32 hat Markus Armbruster geschrieben:
>> This is now commit 544ddbb6373d61292a0e2dc269809cd6bd5edec6. I'm not
>> objecting. Just a few remarks. I dropped qemu-stable@ from cc.
>>
>> Kevin Wolf <kwolf@redhat.com> writes:
>>
>> > Commit 2155d2dd introduced rate limiting for BLOCK_IO_ERROR to emit an
>> > event only once a second. This makes sense for cases in which the guest
>> > keeps running and can submit more requests that would possibly also fail
>> > because there is a problem with the backend.
>> >
>> > However, if the error policy is configured so that the VM is stopped on
>> > errors, this is both unnecessary because stopping the VM means that the
>> > guest can't issue more requests and in fact harmful because stopping the
>> > VM is an important state change that management tools need to keep track
>> > of even if it happens more than once in a given second. If an event is
>> > dropped, the management tool would see a VM randomly going to paused
>> > state without an associated error, so it has a hard time deciding how to
>> > handle the situation.
>> >
>> > This patch disables rate limiting for action=stop by not relying on the
>> > event type alone any more in monitor_qapi_event_queue_no_reenter(), but
>> > checking action for BLOCK_IO_ERROR, too. If the error is reported to the
>> > guest or ignored, the rate limiting stays in place.
>> >
>> > Fixes: 2155d2dd7f73 ('block-backend: per-device throttling of BLOCK_IO_ERROR reports')
>> > Signed-off-by: Kevin Wolf <kwolf@redhat.com>
>> > ---
>> > qapi/block-core.json | 2 +-
>> > monitor/monitor.c | 21 ++++++++++++++++++++-
>> > 2 files changed, 21 insertions(+), 2 deletions(-)
>> >
>> > diff --git a/qapi/block-core.json b/qapi/block-core.json
>> > index b66bf316e2f..da0b36a3751 100644
>> > --- a/qapi/block-core.json
>> > +++ b/qapi/block-core.json
>> > @@ -5794,7 +5794,7 @@
>> > # .. note:: If action is "stop", a `STOP` event will eventually follow
>> > # the `BLOCK_IO_ERROR` event.
>> > #
>> > -# .. note:: This event is rate-limited.
>> > +# .. note:: This event is rate-limited, except if action is "stop".
>> > #
>> > # Since: 0.13
>> > #
>> > diff --git a/monitor/monitor.c b/monitor/monitor.c
>> > index 1273eb72605..37fa674cfe6 100644
>> > --- a/monitor/monitor.c
>> > +++ b/monitor/monitor.c
>> > @@ -367,14 +367,33 @@ monitor_qapi_event_queue_no_reenter(QAPIEvent event, QDict *qdict)
>> > {
>> > MonitorQAPIEventConf *evconf;
>> > MonitorQAPIEventState *evstate;
>> > + bool throttled;
>> >
>> > assert(event < QAPI_EVENT__MAX);
>> > evconf = &monitor_qapi_event_conf[event];
>> > trace_monitor_protocol_event_queue(event, qdict, evconf->rate);
>> > + throttled = evconf->rate;
>> > +
>> > + /*
>> > + * Rate limit BLOCK_IO_ERROR only for action != "stop".
>> > + *
>> > + * If the VM is stopped after an I/O error, this is important information
>> > + * for the management tool to keep track of the state of QEMU and we can't
>> > + * merge any events. At the same time, stopping the VM means that the guest
>> > + * can't send additional requests and the number of events is already
>> > + * limited, so we can do without rate limiting.
>> > + */
>> > + if (event == QAPI_EVENT_BLOCK_IO_ERROR) {
>> > + QDict *data = qobject_to(QDict, qdict_get(qdict, "data"));
>> > + const char *action = qdict_get_str(data, "action");
>> > + if (!strcmp(action, "stop")) {
>> > + throttled = false;
>> > + }
>> > + }
>>
>> Having event-specific logic in the general event emission function is
>> ugly.
>>
>> Before the patch, the "throttle this event?" logic is coded in one
>> place: table monitor_qapi_event_conf[].
>>
>> The table maps from event kind (enum QAPIEvent) to the minimum time
>> between two events. Non-zero specifies a rate limit, zero makes it
>> unlimited.
>>
>> Aside: as far as I can tell, we've only ever used one
>> MonitorQAPIEventConf: { .rate = 1000 * SCALE_MS }. Could be dumbed down
>> to bool.
>>
>> This is insufficient for QAPIEvent QAPI_EVENT_BLOCK_IO_ERROR, where the
>> desired rate depends on event data, not just the QAPIEvent.
>>
>> The patch gives us the desired rate, but it splits the logic between the
>> map and the map's user.
>>
>> I think the cleaner solution is to make the map more capable: have it
>> map from the entire event, not just its kind.
>>
>> An obvious way to do that would be a table of function pointers
>>
>> bool (*)(QAPIEvent, QDict *)
>>
>> Null means unlimited.
>>
>> The table entry for BLOCK_IO_ERROR returns false for unlimited when
>> "action" is "stop", else true.
>>
>> The table entries for the other rate-limited events simply return true.
>>
>> Thoughts?
>
> Like you, I often prefer data to code. However, if the data becomes just
> a table of function pointers to trivial functions, wouldn't it be better
> to just have it as code in the first place?
>
> It could be a helper function with the same signature as you proposed
> above and it would simply be a switch statement returning true for a few
> event types, looking at the QDict for BLOCK_IO_ERROR and returning false
> for default. Seems a lot simpler than an explicit table of function
> pointers.
You're right.