qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
* [PATCH resend] block: add single-check guard in throttle_group_restart_queue to address race with schedule_next_request
@ 2025-11-17  1:10 luzhipeng
  2025-12-02 13:16 ` Kevin Wolf
  0 siblings, 1 reply; 2+ messages in thread
From: luzhipeng @ 2025-11-17  1:10 UTC (permalink / raw)
  To: qemu-block; +Cc: Alberto Garcia, Kevin Wolf, Hanna Reitz, qemu-devel, luzhipeng

A race condition exists between throttle_group_restart_queue() and
schedule_next_request(): when multiple ThrottleGroupMembers in the same
throttle group are assigned to different IOThreads, concurrent execution
can cause schedule_next_request() to re-arm a throttle timer while
throttle_group_restart_queue() is being called (e.g., from a timer
callback or external restart). This violates the assumption that no
timer is pending upon entry to throttle_group_restart_queue(), triggering
an assertion failure and causing QEMU to abort.

This patch replaces the assert with a single early-return check:
if the timer for the given direction is already pending, the function
returns immediately. This prevents duplicate coroutine scheduling and
avoids crashes under race conditions, without altering the core
(non-thread-safe) throttle group logic.

For details, see: https://gitlab.com/qemu-project/qemu/-/issues/3194

Signed-off-by: luzhipeng <luzhipeng@cestc.cn>
---
 block/throttle-groups.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/block/throttle-groups.c b/block/throttle-groups.c
index 66fdce9a90..9dcc6b4923 100644
--- a/block/throttle-groups.c
+++ b/block/throttle-groups.c
@@ -430,15 +430,14 @@ static void throttle_group_restart_queue(ThrottleGroupMember *tgm,
                                         ThrottleDirection direction)
 {
     Coroutine *co;
+    if (timer_pending(tgm->throttle_timers.timers[direction])) {
+        return;
+    }
     RestartData *rd = g_new0(RestartData, 1);
 
     rd->tgm = tgm;
     rd->direction = direction;
 
-    /* This function is called when a timer is fired or when
-     * throttle_group_restart_tgm() is called. Either way, there can
-     * be no timer pending on this tgm at this point */
-    assert(!timer_pending(tgm->throttle_timers.timers[direction]));
 
     qatomic_inc(&tgm->restart_pending);
 
-- 
2.31.1





^ permalink raw reply related	[flat|nested] 2+ messages in thread

* Re: [PATCH resend] block: add single-check guard in throttle_group_restart_queue to address race with schedule_next_request
  2025-11-17  1:10 [PATCH resend] block: add single-check guard in throttle_group_restart_queue to address race with schedule_next_request luzhipeng
@ 2025-12-02 13:16 ` Kevin Wolf
  0 siblings, 0 replies; 2+ messages in thread
From: Kevin Wolf @ 2025-12-02 13:16 UTC (permalink / raw)
  To: luzhipeng; +Cc: qemu-block, Alberto Garcia, Hanna Reitz, qemu-devel

Am 17.11.2025 um 02:10 hat luzhipeng geschrieben:
> A race condition exists between throttle_group_restart_queue() and
> schedule_next_request(): when multiple ThrottleGroupMembers in the same
> throttle group are assigned to different IOThreads, concurrent execution
> can cause schedule_next_request() to re-arm a throttle timer while
> throttle_group_restart_queue() is being called (e.g., from a timer
> callback or external restart). This violates the assumption that no
> timer is pending upon entry to throttle_group_restart_queue(), triggering
> an assertion failure and causing QEMU to abort.
> 
> This patch replaces the assert with a single early-return check:
> if the timer for the given direction is already pending, the function
> returns immediately. This prevents duplicate coroutine scheduling and
> avoids crashes under race conditions, without altering the core
> (non-thread-safe) throttle group logic.
> 
> For details, see: https://gitlab.com/qemu-project/qemu/-/issues/3194
> 
> Signed-off-by: luzhipeng <luzhipeng@cestc.cn>
> ---
>  block/throttle-groups.c | 7 +++----
>  1 file changed, 3 insertions(+), 4 deletions(-)
> 
> diff --git a/block/throttle-groups.c b/block/throttle-groups.c
> index 66fdce9a90..9dcc6b4923 100644
> --- a/block/throttle-groups.c
> +++ b/block/throttle-groups.c
> @@ -430,15 +430,14 @@ static void throttle_group_restart_queue(ThrottleGroupMember *tgm,
>                                          ThrottleDirection direction)
>  {
>      Coroutine *co;
> +    if (timer_pending(tgm->throttle_timers.timers[direction])) {
> +        return;
> +    }
>      RestartData *rd = g_new0(RestartData, 1);
>  
>      rd->tgm = tgm;
>      rd->direction = direction;
>  
> -    /* This function is called when a timer is fired or when
> -     * throttle_group_restart_tgm() is called. Either way, there can
> -     * be no timer pending on this tgm at this point */
> -    assert(!timer_pending(tgm->throttle_timers.timers[direction]));
>  
>      qatomic_inc(&tgm->restart_pending);

I'm not really familiar with the finer details of this throttling code,
but this doesn't look like a proper fix for a race to me. If it was
possible for a timer to be scheduled between the caller and here, then
it can certainly also be scheduled between your new check and whatever
relies on the condition after it.

If I understand the commit message of 25b8e4db7f3 correctly, the idea is
that you never have a timer pending that wouldn't actually restart a
request. If the timer is rescheduled from a different thread right after
this check, we'll already resume all of the requests now and the timer
will still be pending without actually having any work to do.

So it looks to me like we'll need some extended locking (tg->lock is
already held by callers of throttle_group_schedule_timer(), but only for
a short duration in throttle_group_restart_queue_entry() instead of
during the whole process), or just give up on the invariant, but then we
don't have to return early either.

Berto, any thoughts?

Kevin



^ permalink raw reply	[flat|nested] 2+ messages in thread

end of thread, other threads:[~2025-12-02 13:17 UTC | newest]

Thread overview: 2+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-11-17  1:10 [PATCH resend] block: add single-check guard in throttle_group_restart_queue to address race with schedule_next_request luzhipeng
2025-12-02 13:16 ` Kevin Wolf

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).