* [PATCH] block: Never drop BLOCK_IO_ERROR with action=stop for rate limiting
@ 2026-02-12 21:27 Kevin Wolf
2026-02-13 6:13 ` Peter Krempa
` (3 more replies)
0 siblings, 4 replies; 7+ messages in thread
From: Kevin Wolf @ 2026-02-12 21:27 UTC (permalink / raw)
To: qemu-block; +Cc: kwolf, hreitz, xeor, vsementsov, qemu-devel, qemu-stable
Commit 2155d2dd introduced rate limiting for BLOCK_IO_ERROR to emit an
event only once a second. This makes sense for cases in which the guest
keeps running and can submit more requests that would possibly also fail
because there is a problem with the backend.
However, if the error policy is configured so that the VM is stopped on
errors, rate limiting is both unnecessary and harmful. It is unnecessary
because a stopped guest can't issue more requests. It is harmful because
stopping the VM is an important state change that management tools need
to keep track of, even if it happens more than once in a given second.
If an event is dropped, the management tool sees a VM going to paused
state without an associated error, so it has a hard time deciding how to
handle the situation.
This patch disables rate limiting for action=stop by treating every
BLOCK_IO_ERROR with action=stop as a distinct error. If the error is
reported to the guest or ignored, the rate limiting stays in place.
Fixes: 2155d2dd7f73 ('block-backend: per-device throttling of BLOCK_IO_ERROR reports')
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
qapi/block-core.json | 2 +-
monitor/monitor.c | 12 ++++++++++++
2 files changed, 13 insertions(+), 1 deletion(-)
diff --git a/qapi/block-core.json b/qapi/block-core.json
index b82af742561..4118d884f46 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -5789,7 +5789,7 @@
# .. note:: If action is "stop", a `STOP` event will eventually follow
# the `BLOCK_IO_ERROR` event.
#
-# .. note:: This event is rate-limited.
+# .. note:: This event is rate-limited, except if action is "stop".
#
# Since: 0.13
#
diff --git a/monitor/monitor.c b/monitor/monitor.c
index 1273eb72605..93bd2b93e65 100644
--- a/monitor/monitor.c
+++ b/monitor/monitor.c
@@ -525,6 +525,18 @@ static gboolean qapi_event_throttle_equal(const void *a, const void *b)
                        qdict_get_str(evb->data, "node-name"));
     }
 
+    /*
+     * If the VM is stopped after an I/O error, this is important information
+     * for the management tool to keep track of the state of QEMU and we can't
+     * merge any events. At the same time, stopping the VM means that the guest
+     * can't send additional requests and the number of events is already
+     * limited, so we can do without rate limiting.
+     */
+    if (eva->event == QAPI_EVENT_BLOCK_IO_ERROR &&
+        !strcmp(qdict_get_str(eva->data, "action"), "stop")) {
+        return FALSE;
+    }
+
     if (eva->event == QAPI_EVENT_MEMORY_DEVICE_SIZE_CHANGE ||
         eva->event == QAPI_EVENT_BLOCK_IO_ERROR) {
         return !strcmp(qdict_get_str(eva->data, "qom-path"),
--
2.53.0
* Re: [PATCH] block: Never drop BLOCK_IO_ERROR with action=stop for rate limiting
From: Peter Krempa @ 2026-02-13 6:13 UTC (permalink / raw)
To: Kevin Wolf; +Cc: qemu-block, hreitz, xeor, vsementsov, qemu-devel, qemu-stable
On Thu, Feb 12, 2026 at 22:27:38 +0100, Kevin Wolf wrote:
> Commit 2155d2dd introduced rate limiting for BLOCK_IO_ERROR to emit an
> event only once a second. This makes sense for cases in which the guest
> keeps running and can submit more requests that would possibly also fail
> because there is a problem with the backend.
>
> However, if the error policy is configured so that the VM is stopped on
> errors, this is both unnecessary because stopping the VM means that the
> guest can't issue more requests and in fact harmful because stopping the
> VM is an important state change that management tools need to keep track
> of even if it happens more than once in a given second. If an event is
> dropped, the management tool would see a VM randomly going to paused
> state without an associated error, so it has a hard time deciding how to
> handle the situation.
>
> This patch disables rate limiting for action=stop by essentially
> considering all BLOCK_IO_ERRORs with action=stop different errors. If
> the error is reported to the guest or ignored, the rate limiting stays
> in place.
>
> Fixes: 2155d2dd7f73 ('block-backend: per-device throttling of BLOCK_IO_ERROR reports')
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> ---
> qapi/block-core.json | 2 +-
> monitor/monitor.c | 12 ++++++++++++
> 2 files changed, 13 insertions(+), 1 deletion(-)
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
* Re: [PATCH] block: Never drop BLOCK_IO_ERROR with action=stop for rate limiting
From: Vladimir Sementsov-Ogievskiy @ 2026-02-13 8:13 UTC (permalink / raw)
To: Kevin Wolf, qemu-block; +Cc: hreitz, xeor, qemu-devel, qemu-stable
On 13.02.26 00:27, Kevin Wolf wrote:
> Commit 2155d2dd introduced rate limiting for BLOCK_IO_ERROR to emit an
> event only once a second. This makes sense for cases in which the guest
> keeps running and can submit more requests that would possibly also fail
> because there is a problem with the backend.
>
> However, if the error policy is configured so that the VM is stopped on
> errors, this is both unnecessary because stopping the VM means that the
> guest can't issue more requests and in fact harmful because stopping the
> VM is an important state change that management tools need to keep track
> of even if it happens more than once in a given second. If an event is
> dropped, the management tool would see a VM randomly going to paused
> state without an associated error, so it has a hard time deciding how to
> handle the situation.
>
> This patch disables rate limiting for action=stop by essentially
> considering all BLOCK_IO_ERRORs with action=stop different errors. If
> the error is reported to the guest or ignored, the rate limiting stays
> in place.
>
> Fixes: 2155d2dd7f73 ('block-backend: per-device throttling of BLOCK_IO_ERROR reports')
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
--
Best regards,
Vladimir
* Re: [PATCH] block: Never drop BLOCK_IO_ERROR with action=stop for rate limiting
From: Kevin Wolf @ 2026-03-04 10:59 UTC (permalink / raw)
To: qemu-block; +Cc: hreitz, xeor, vsementsov, qemu-devel, qemu-stable
On 12.02.2026 at 22:27, Kevin Wolf wrote:
> Commit 2155d2dd introduced rate limiting for BLOCK_IO_ERROR to emit an
> event only once a second. This makes sense for cases in which the guest
> keeps running and can submit more requests that would possibly also fail
> because there is a problem with the backend.
>
> However, if the error policy is configured so that the VM is stopped on
> errors, this is both unnecessary because stopping the VM means that the
> guest can't issue more requests and in fact harmful because stopping the
> VM is an important state change that management tools need to keep track
> of even if it happens more than once in a given second. If an event is
> dropped, the management tool would see a VM randomly going to paused
> state without an associated error, so it has a hard time deciding how to
> handle the situation.
>
> This patch disables rate limiting for action=stop by essentially
> considering all BLOCK_IO_ERRORs with action=stop different errors. If
> the error is reported to the guest or ignored, the rate limiting stays
> in place.
>
> Fixes: 2155d2dd7f73 ('block-backend: per-device throttling of BLOCK_IO_ERROR reports')
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> diff --git a/monitor/monitor.c b/monitor/monitor.c
> index 1273eb72605..93bd2b93e65 100644
> --- a/monitor/monitor.c
> +++ b/monitor/monitor.c
> @@ -525,6 +525,18 @@ static gboolean qapi_event_throttle_equal(const void *a, const void *b)
>                         qdict_get_str(evb->data, "node-name"));
>      }
>
> +    /*
> +     * If the VM is stopped after an I/O error, this is important information
> +     * for the management tool to keep track of the state of QEMU and we can't
> +     * merge any events. At the same time, stopping the VM means that the guest
> +     * can't send additional requests and the number of events is already
> +     * limited, so we can do without rate limiting.
> +     */
> +    if (eva->event == QAPI_EVENT_BLOCK_IO_ERROR &&
> +        !strcmp(qdict_get_str(eva->data, "action"), "stop")) {
> +        return FALSE;
> +    }
> +
It turns out that this approach is completely wrong. The harmless part
is that the hash table fills up with many events that don't actually
need throttling. The worse part is that events with action=stop aren't
even considered equal to themselves, which means that the hash table
can't find them to remove them, which in turn causes use-after-free
crashes.
I'll post a v2 that avoids the whole rate limiting code path for I/O
errors that don't need the throttling.
Kevin
* Re: [PATCH] block: Never drop BLOCK_IO_ERROR with action=stop for rate limiting
From: Michael Tokarev @ 2026-03-08 8:31 UTC (permalink / raw)
To: Kevin Wolf, qemu-block; +Cc: hreitz, xeor, vsementsov, qemu-devel, qemu-stable
On 13.02.2026 00:27, Kevin Wolf wrote:
> Commit 2155d2dd introduced rate limiting for BLOCK_IO_ERROR to emit an
> event only once a second. This makes sense for cases in which the guest
> keeps running and can submit more requests that would possibly also fail
> because there is a problem with the backend.
>
> However, if the error policy is configured so that the VM is stopped on
> errors, this is both unnecessary because stopping the VM means that the
> guest can't issue more requests and in fact harmful because stopping the
> VM is an important state change that management tools need to keep track
> of even if it happens more than once in a given second. If an event is
> dropped, the management tool would see a VM randomly going to paused
> state without an associated error, so it has a hard time deciding how to
> handle the situation.
>
> This patch disables rate limiting for action=stop by essentially
> considering all BLOCK_IO_ERRORs with action=stop different errors. If
> the error is reported to the guest or ignored, the rate limiting stays
> in place.
>
> Fixes: 2155d2dd7f73 ('block-backend: per-device throttling of BLOCK_IO_ERROR reports')
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This looks like qemu-stable material (10.0.x, 10.1.x, 10.2.x).
Please let me know if it isn't.
Thanks,
/mjt
* Re: [PATCH] block: Never drop BLOCK_IO_ERROR with action=stop for rate limiting
From: Michael Tokarev @ 2026-03-08 8:34 UTC (permalink / raw)
To: Kevin Wolf, qemu-block; +Cc: hreitz, xeor, vsementsov, qemu-devel, qemu-stable
On 08.03.2026 11:31, Michael Tokarev wrote:
> This looks like qemu-stable material (10.0.x, 10.1.x, 10.2.x).
It is already Cc'd to qemu-stable, never mind.
Thanks,
/mjt
* Re: [PATCH] block: Never drop BLOCK_IO_ERROR with action=stop for rate limiting
From: Kevin Wolf @ 2026-03-09 8:57 UTC (permalink / raw)
To: Michael Tokarev
Cc: qemu-block, hreitz, xeor, vsementsov, qemu-devel, qemu-stable
On 08.03.2026 at 09:34, Michael Tokarev wrote:
> On 08.03.2026 11:31, Michael Tokarev wrote:
>
> > This looks like qemu-stable material (10.0.x, 10.1.x, 10.2.x).
> It is already Cc'd to qemu-stable, never mind.
Oops, it seems I forgot the Cc: tag in the commit message. Sorry! I'll try
to remember to add it next time.
Kevin