qemu-devel.nongnu.org archive mirror
* [PATCH 0/2] qapi: Fix qmp_block_set_io_throttle blocked for too long
@ 2025-03-08 10:16 zoudongjie via
  2025-03-08 10:16 ` [PATCH 1/2] io/block: Refactoring the bdrv_drained_begin() function and implement a timeout mechanism zoudongjie via
                   ` (3 more replies)
  0 siblings, 4 replies; 11+ messages in thread
From: zoudongjie via @ 2025-03-08 10:16 UTC
  To: zhuyangyang14, qemu-devel, stefanha, fam, hreitz, qemu-block
  Cc: qemu-stable, luolongmin, suxiaodong1, wangyan122, yebiaoxiang,
	wangjian161, mujinsheng, alex.chen, eric.fangyi, zoudongjie,
	chenjianfei3, renxuming

From: Zhu Yangyang <zhuyangyang14@huawei.com>

In the disable branch of qmp_block_set_io_throttle(), we call bdrv_drained_begin().
bdrv_drained_begin() is a blocking interface that waits for all submitted I/O
operations to complete, i.e., until bs->in_flight becomes zero.

In theory, once we stop submitting I/O, bs->in_flight should quickly reach zero
(regardless of whether the requests succeed or fail). However, if we are using
network storage and the network link fails at that moment, things are different:
the underlying storage driver retries the operation and may take 1 to 2 minutes
before reporting an I/O error to QEMU. As a result, bdrv_drained_begin() blocks
for 1 to 2 minutes, leading to two issues:

1. qmp_block_set_io_throttle() is an external interface, and blocking it for
   that long makes for a poor user experience.
2. More seriously, qmp_block_set_io_throttle() holds the BQL while it runs,
   which can block vCPU threads and in turn cause a guest OS soft lockup and
   potentially a panic.
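
This series bounds the drain with a timeout. A minimal sketch of the intended
call pattern in the disable branch (the 5000 ms value and the error message
mirror patch 2 and are illustrative, not final):

    if (blk_io_limits_disable_timeout(blk, 5000 /* ms */) < 0) {
        error_setg(errp, "Blk io limits disable timeout");
        return;   /* fail the QMP command instead of blocking under the BQL */
    }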

The stack when qmp_block_set_io_throttle() is blocked looks as follows.
At this point there are no I/O submission or completion events, so
aio_poll() remains blocked until the network storage reports EIO to QEMU:

#0  0x00007f54877fc39e in ppoll () from target:/usr/lib64/libc.so.6
#1  0x0000556dced07b7d in ppoll (__ss=0x0, __timeout=0x0, __nfds=<optimized out>, __fds=<optimized out>) at /usr/include/bits/poll2.h:81
#2  0x0000556dcecee599 in fdmon_poll_wait (ctx=0x556de95f8e40, ready_list=0x7ffcc6378b18, timeout=-1) at ../util/fdmon-poll.c:79
#3  0x0000556dceceda9e in aio_poll (ctx=0x556de95f8e40, blocking=blocking@entry=true) at ../util/aio-posix.c:671
#4  0x0000556dcebe654a in bdrv_do_drained_begin (poll=<optimized out>, parent=0x0, bs=0x556de9896420) at ../block/io.c:378
#5  bdrv_do_drained_begin (bs=0x556de9896420, parent=0x0, poll=<optimized out>) at ../block/io.c:347
#6  0x0000556dcebdc5a1 in blk_io_limits_disable (blk=0x556dea739470) at ../block/block-backend.c:2701
#7  0x0000556dce917584 in qmp_block_set_io_throttle (arg=arg@entry=0x7ffcc6378d30, errp=errp@entry=0x7ffcc6378d18) at ../block/qapi-system.c:505
#8  0x0000556dcec5a8d2 in qmp_marshal_block_set_io_throttle (args=<optimized out>, ret=<optimized out>, errp=0x7f5486395ea0) at qapi/qapi-commands-block.c:368
#9  0x0000556dcece3ed9 in do_qmp_dispatch_bh (opaque=0x7f5486395eb0) at ../qapi/qmp-dispatch.c:128
#10 0x0000556dced03c55 in aio_bh_poll (ctx=ctx@entry=0x556de95e07c0) at ../util/async.c:219
#11 0x0000556dceced93e in aio_dispatch (ctx=0x556de95e07c0) at ../util/aio-posix.c:424
#12 0x0000556dced038fe in aio_ctx_dispatch (source=<optimized out>, callback=<optimized out>, user_data=<optimized out>) at ../util/async.c:361
#13 0x00007f5487ec606e in g_main_context_dispatch () from target:/usr/lib64/libglib-2.0.so.0
#14 0x0000556dced04fd8 in glib_pollfds_poll () at ../util/main-loop.c:287
#15 os_host_main_loop_wait (timeout=0) at ../util/main-loop.c:310
#16 main_loop_wait (nonblocking=nonblocking@entry=0) at ../util/main-loop.c:589
#17 0x0000556dce925431 in qemu_main_loop () at ../system/runstate.c:835
#18 0x0000556dcec565c7 in qemu_default_main (opaque=opaque@entry=0x0) at ../system/main.c:48
#19 0x0000556dce6f7848 in main (argc=<optimized out>, argv=<optimized out>) at ../system/main.c:76

Zhu Yangyang (2):
  io/block: Refactoring the bdrv_drained_begin() function and implement
    a timeout mechanism.
  qapi: Fix qmp_block_set_io_throttle blocked for too long

 block/block-backend.c                       | 14 ++++-
 block/io.c                                  | 55 +++++++++++++++----
 block/qapi-system.c                         |  7 ++-
 include/block/aio-wait.h                    | 58 +++++++++++++++++++++
 include/block/block-io.h                    |  7 +++
 include/system/block-backend-global-state.h |  1 +
 util/aio-wait.c                             |  7 +++
 7 files changed, 137 insertions(+), 12 deletions(-)

-- 
2.33.0




* [PATCH 1/2] io/block: Refactoring the bdrv_drained_begin() function and implement a timeout mechanism.
  2025-03-08 10:16 [PATCH 0/2] qapi: Fix qmp_block_set_io_throttle blocked for too long zoudongjie via
@ 2025-03-08 10:16 ` zoudongjie via
  2025-03-13  4:09   ` Stefan Hajnoczi
  2025-03-13  4:22   ` Stefan Hajnoczi
  2025-03-08 10:16 ` [PATCH 2/2] qapi: Fix qmp_block_set_io_throttle blocked for too long zoudongjie via
                   ` (2 subsequent siblings)
  3 siblings, 2 replies; 11+ messages in thread
From: zoudongjie via @ 2025-03-08 10:16 UTC
  To: zhuyangyang14, qemu-devel, stefanha, fam, hreitz, qemu-block
  Cc: qemu-stable, luolongmin, suxiaodong1, wangyan122, yebiaoxiang,
	wangjian161, mujinsheng, alex.chen, eric.fangyi, zoudongjie,
	chenjianfei3, renxuming

From: Zhu Yangyang <zhuyangyang14@huawei.com>

bdrv_drained_begin() is a blocking function. When network storage is used and
the network link fails, it may block for a long time.
Therefore, add a timeout parameter to bound how long it can block.

Since bdrv_drained_begin() is widely used, both bdrv_drained_begin() and the
new bdrv_drained_begin_timeout() are retained.
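
For example, a caller that prefers to fail rather than wait indefinitely could
do something like this (illustrative sketch, not taken from the patch):

    int ret = bdrv_drained_begin_timeout(bs, 5000 /* ms */);
    if (ret == -ETIMEDOUT) {
        /* In-flight I/O did not finish in time. The drained section was
         * still entered, so it must still be paired with bdrv_drained_end().
         */
    }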

Signed-off-by: Zhu Yangyang <zhuyangyang14@huawei.com>
---
 block/io.c               | 55 ++++++++++++++++++++++++++++++-------
 include/block/aio-wait.h | 58 ++++++++++++++++++++++++++++++++++++++++
 include/block/block-io.h |  7 +++++
 util/aio-wait.c          |  7 +++++
 4 files changed, 117 insertions(+), 10 deletions(-)

diff --git a/block/io.c b/block/io.c
index d369b994df..03b8b2dca7 100644
--- a/block/io.c
+++ b/block/io.c
@@ -255,6 +255,8 @@ typedef struct {
     bool begin;
     bool poll;
     BdrvChild *parent;
+    int ret;
+    int64_t timeout;
 } BdrvCoDrainData;
 
 /* Returns true if BDRV_POLL_WHILE() should go into a blocking aio_poll() */
@@ -283,6 +285,8 @@ static bool bdrv_drain_poll_top_level(BlockDriverState *bs,
     return bdrv_drain_poll(bs, ignore_parent, false);
 }
 
+static int bdrv_do_drained_begin_timeout(BlockDriverState *bs,
+    BdrvChild *parent, bool poll, int64_t timeout);
 static void bdrv_do_drained_begin(BlockDriverState *bs, BdrvChild *parent,
                                   bool poll);
 static void bdrv_do_drained_end(BlockDriverState *bs, BdrvChild *parent);
@@ -296,7 +300,8 @@ static void bdrv_co_drain_bh_cb(void *opaque)
     if (bs) {
         bdrv_dec_in_flight(bs);
         if (data->begin) {
-            bdrv_do_drained_begin(bs, data->parent, data->poll);
+            data->ret = bdrv_do_drained_begin_timeout(
+                bs, data->parent, data->poll, data->timeout);
         } else {
             assert(!data->poll);
             bdrv_do_drained_end(bs, data->parent);
@@ -310,10 +315,11 @@ static void bdrv_co_drain_bh_cb(void *opaque)
     aio_co_wake(co);
 }
 
-static void coroutine_fn bdrv_co_yield_to_drain(BlockDriverState *bs,
-                                                bool begin,
-                                                BdrvChild *parent,
-                                                bool poll)
+static int coroutine_fn bdrv_co_yield_to_drain_timeout(BlockDriverState *bs,
+                                                         bool begin,
+                                                         BdrvChild *parent,
+                                                         bool poll,
+                                                         int64_t timeout)
 {
     BdrvCoDrainData data;
     Coroutine *self = qemu_coroutine_self();
@@ -329,6 +335,8 @@ static void coroutine_fn bdrv_co_yield_to_drain(BlockDriverState *bs,
         .begin = begin,
         .parent = parent,
         .poll = poll,
+        .timeout = timeout,
+        .ret = 0
     };
 
     if (bs) {
@@ -342,16 +350,25 @@ static void coroutine_fn bdrv_co_yield_to_drain(BlockDriverState *bs,
     /* If we are resumed from some other event (such as an aio completion or a
      * timer callback), it is a bug in the caller that should be fixed. */
     assert(data.done);
+    return data.ret;
 }
 
-static void bdrv_do_drained_begin(BlockDriverState *bs, BdrvChild *parent,
-                                  bool poll)
+static void coroutine_fn bdrv_co_yield_to_drain(BlockDriverState *bs,
+                                                bool begin,
+                                                BdrvChild *parent,
+                                                bool poll)
+{
+    bdrv_co_yield_to_drain_timeout(bs, begin, parent, poll, -1);
+}
+
+static int bdrv_do_drained_begin_timeout(BlockDriverState *bs,
+    BdrvChild *parent, bool poll, int64_t timeout_ms)
 {
     IO_OR_GS_CODE();
 
     if (qemu_in_coroutine()) {
-        bdrv_co_yield_to_drain(bs, true, parent, poll);
-        return;
+        return bdrv_co_yield_to_drain_timeout(bs, true, parent, poll,
+                                              timeout_ms);
     }
 
     GLOBAL_STATE_CODE();
@@ -375,8 +392,20 @@ static void bdrv_do_drained_begin(BlockDriverState *bs, BdrvChild *parent,
      * nodes.
      */
     if (poll) {
-        BDRV_POLL_WHILE(bs, bdrv_drain_poll_top_level(bs, parent));
+        if (timeout_ms < 0) {
+            BDRV_POLL_WHILE(bs, bdrv_drain_poll_top_level(bs, parent));
+        } else {
+            return BDRV_POLL_WHILE_TIMEOUT(
+                bs, bdrv_drain_poll_top_level(bs, parent), timeout_ms);
+        }
     }
+    return 0;
+}
+
+static void bdrv_do_drained_begin(BlockDriverState *bs, BdrvChild *parent,
+                                  bool poll)
+{
+    bdrv_do_drained_begin_timeout(bs, parent, poll, -1);
 }
 
 void bdrv_do_drained_begin_quiesce(BlockDriverState *bs, BdrvChild *parent)
@@ -390,6 +419,12 @@ bdrv_drained_begin(BlockDriverState *bs)
     IO_OR_GS_CODE();
     bdrv_do_drained_begin(bs, NULL, true);
 }
+int coroutine_mixed_fn
+bdrv_drained_begin_timeout(BlockDriverState *bs, int64_t timeout_ms)
+{
+    IO_OR_GS_CODE();
+    return bdrv_do_drained_begin_timeout(bs, NULL, true, timeout_ms);
+}
 
 /**
  * This function does not poll, nor must any of its recursively called
diff --git a/include/block/aio-wait.h b/include/block/aio-wait.h
index cf5e8bde1c..efbcb9777a 100644
--- a/include/block/aio-wait.h
+++ b/include/block/aio-wait.h
@@ -28,6 +28,8 @@
 #include "block/aio.h"
 #include "qemu/main-loop.h"
 
+#define AIO_WAIT_INTERVAL 10  /* ms */
+
 /**
  * AioWait:
  *
@@ -56,6 +58,11 @@ typedef struct {
     unsigned num_waiters;
 } AioWait;
 
+typedef struct {
+    struct QEMUTimer *timer;
+    int64_t interval;
+} AioWaitTimer;
+
 extern AioWait global_aio_wait;
 
 /**
@@ -99,6 +106,55 @@ extern AioWait global_aio_wait;
     qatomic_dec(&wait_->num_waiters);                              \
     waited_; })
 
+/**
+ * AIO_WAIT_WHILE_TIMEOUT:
+ *
+ * Refer to the implementation of AIO_WAIT_WHILE_INTERNAL,
+ * the timeout parameter is added.
+ */
+#define AIO_WAIT_WHILE_TIMEOUT(ctx, cond, timeout) ({                    \
+    int ret_ = 0;                                                        \
+    AioWait *wait_ = &global_aio_wait;                                   \
+    AioContext *ctx_ = (ctx);                                            \
+    int64_t start_ = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);             \
+    int64_t deadline_ = start_ + (timeout);                              \
+    /* Ensure that the aio_poll exits periodically to check timeout. */  \
+    AioWaitTimer *s_ = g_malloc0(sizeof(AioWaitTimer));                  \
+    s_->interval = AIO_WAIT_INTERVAL;                                    \
+    /* Increment wait_->num_waiters before evaluating cond. */           \
+    qatomic_inc(&wait_->num_waiters);                                    \
+    /* Paired with smp_mb in aio_wait_kick(). */                         \
+    smp_mb__after_rmw();                                                 \
+    if (ctx_ && in_aio_context_home_thread(ctx_)) {                      \
+        s_->timer = aio_timer_new(ctx_, QEMU_CLOCK_REALTIME,             \
+                        SCALE_MS, aio_wait_timer_retry, s_);             \
+        aio_wait_timer_retry(s_);                                        \
+        while ((cond)) {                                                 \
+            aio_poll(ctx_, true);                                        \
+            if (qemu_clock_get_ms(QEMU_CLOCK_REALTIME) > deadline_) {    \
+                ret_ = -ETIMEDOUT;                                       \
+                break;                                                   \
+            }                                                            \
+        }                                                                \
+    } else {                                                             \
+        s_->timer = aio_timer_new(qemu_get_aio_context(),                \
+            QEMU_CLOCK_REALTIME, SCALE_MS, aio_wait_timer_retry, s_);    \
+        aio_wait_timer_retry(s_);                                        \
+        while ((cond)) {                                                 \
+            assert(qemu_get_current_aio_context() ==                     \
+                   qemu_get_aio_context());                              \
+            aio_poll(qemu_get_aio_context(), true);                      \
+            if (qemu_clock_get_ms(QEMU_CLOCK_REALTIME) > deadline_) {    \
+                ret_ = -ETIMEDOUT;                                       \
+                break;                                                   \
+            }                                                            \
+        }                                                                \
+    }                                                                    \
+    qatomic_dec(&wait_->num_waiters);                                    \
+    timer_free(s_->timer);                                               \
+    g_free(s_);                                                          \
+    ret_; })
+
 #define AIO_WAIT_WHILE(ctx, cond)                                  \
     AIO_WAIT_WHILE_INTERNAL(ctx, cond)
 
@@ -149,4 +205,6 @@ static inline bool in_aio_context_home_thread(AioContext *ctx)
     }
 }
 
+void aio_wait_timer_retry(void *opaque);
+
 #endif /* QEMU_AIO_WAIT_H */
diff --git a/include/block/block-io.h b/include/block/block-io.h
index b49e0537dd..84f92d2b09 100644
--- a/include/block/block-io.h
+++ b/include/block/block-io.h
@@ -354,6 +354,11 @@ bdrv_co_copy_range(BdrvChild *src, int64_t src_offset,
     AIO_WAIT_WHILE(bdrv_get_aio_context(bs_),              \
                    cond); })
 
+#define BDRV_POLL_WHILE_TIMEOUT(bs, cond, timeout) ({      \
+    BlockDriverState *bs_ = (bs);                          \
+    AIO_WAIT_WHILE_TIMEOUT(bdrv_get_aio_context(bs_),      \
+                           cond, timeout); })
+
 void bdrv_drain(BlockDriverState *bs);
 
 int co_wrapper_mixed_bdrv_rdlock
@@ -431,6 +436,8 @@ bdrv_drain_poll(BlockDriverState *bs, BdrvChild *ignore_parent,
  */
 void bdrv_drained_begin(BlockDriverState *bs);
 
+int bdrv_drained_begin_timeout(BlockDriverState *bs, int64_t timeout_ms);
+
 /**
  * bdrv_do_drained_begin_quiesce:
  *
diff --git a/util/aio-wait.c b/util/aio-wait.c
index b5336cf5fd..9aed165529 100644
--- a/util/aio-wait.c
+++ b/util/aio-wait.c
@@ -84,3 +84,10 @@ void aio_wait_bh_oneshot(AioContext *ctx, QEMUBHFunc *cb, void *opaque)
     aio_bh_schedule_oneshot(ctx, aio_wait_bh, &data);
     AIO_WAIT_WHILE_UNLOCKED(NULL, !data.done);
 }
+
+void aio_wait_timer_retry(void *opaque)
+{
+    AioWaitTimer *s = opaque;
+
+    timer_mod(s->timer, qemu_clock_get_ms(QEMU_CLOCK_REALTIME) + s->interval);
+}
-- 
2.33.0




* [PATCH 2/2] qapi: Fix qmp_block_set_io_throttle blocked for too long
  2025-03-08 10:16 [PATCH 0/2] qapi: Fix qmp_block_set_io_throttle blocked for too long zoudongjie via
  2025-03-08 10:16 ` [PATCH 1/2] io/block: Refactoring the bdrv_drained_begin() function and implement a timeout mechanism zoudongjie via
@ 2025-03-08 10:16 ` zoudongjie via
  2025-03-13  4:25   ` Stefan Hajnoczi
  2025-03-11  3:24 ` [PATCH 0/2] " zoudongjie via
  2025-03-13  4:27 ` Stefan Hajnoczi
  3 siblings, 1 reply; 11+ messages in thread
From: zoudongjie via @ 2025-03-08 10:16 UTC
  To: zhuyangyang14, qemu-devel, stefanha, fam, hreitz, qemu-block
  Cc: qemu-stable, luolongmin, suxiaodong1, wangyan122, yebiaoxiang,
	wangjian161, mujinsheng, alex.chen, eric.fangyi, zoudongjie,
	chenjianfei3, renxuming

From: Zhu Yangyang <zhuyangyang14@huawei.com>

bdrv_drained_begin() can block for a long time when network storage is used
and the network link has just failed.
Therefore, set a timeout here so the QMP command fails instead of blocking
indefinitely.

Signed-off-by: Zhu Yangyang <zhuyangyang14@huawei.com>
---
 block/block-backend.c                       | 14 +++++++++++++-
 block/qapi-system.c                         |  7 ++++++-
 include/system/block-backend-global-state.h |  1 +
 3 files changed, 20 insertions(+), 2 deletions(-)

diff --git a/block/block-backend.c b/block/block-backend.c
index 9288f7e1c6..409d4559c3 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -2691,20 +2691,32 @@ void blk_set_io_limits(BlockBackend *blk, ThrottleConfig *cfg)
 }
 
 void blk_io_limits_disable(BlockBackend *blk)
+{
+    blk_io_limits_disable_timeout(blk, -1);
+}
+
+int blk_io_limits_disable_timeout(BlockBackend *blk, int64_t timeout_ms)
 {
     BlockDriverState *bs = blk_bs(blk);
     ThrottleGroupMember *tgm = &blk->public.throttle_group_member;
+    int ret = 0;
+
     assert(tgm->throttle_state);
     GLOBAL_STATE_CODE();
     if (bs) {
         bdrv_ref(bs);
-        bdrv_drained_begin(bs);
+        ret = bdrv_drained_begin_timeout(bs, timeout_ms);
+        if (ret < 0) {
+            goto out;
+        }
     }
     throttle_group_unregister_tgm(tgm);
+out:
     if (bs) {
         bdrv_drained_end(bs);
         bdrv_unref(bs);
     }
+    return ret;
 }
 
 /* should be called before blk_set_io_limits if a limit is set */
diff --git a/block/qapi-system.c b/block/qapi-system.c
index 54b7409b2b..96fa8e5e51 100644
--- a/block/qapi-system.c
+++ b/block/qapi-system.c
@@ -39,6 +39,8 @@
 #include "system/block-backend.h"
 #include "system/blockdev.h"
 
+#define QMP_BLOCKING_TIMEOUT 5000 /* ms */
+
 static BlockBackend *qmp_get_blk(const char *blk_name, const char *qdev_id,
                                  Error **errp)
 {
@@ -502,7 +504,10 @@ void qmp_block_set_io_throttle(BlockIOThrottle *arg, Error **errp)
         blk_set_io_limits(blk, &cfg);
     } else if (blk_get_public(blk)->throttle_group_member.throttle_state) {
         /* If all throttling settings are set to 0, disable I/O limits */
-        blk_io_limits_disable(blk);
+        if (blk_io_limits_disable_timeout(blk, QMP_BLOCKING_TIMEOUT) < 0) {
+            error_setg(errp, "Blk io limits disable timeout");
+            return;
+        }
     }
 }
 
diff --git a/include/system/block-backend-global-state.h b/include/system/block-backend-global-state.h
index 9cc9b008ec..e55ea9dc64 100644
--- a/include/system/block-backend-global-state.h
+++ b/include/system/block-backend-global-state.h
@@ -111,6 +111,7 @@ int blk_probe_geometry(BlockBackend *blk, HDGeometry *geo);
 
 void blk_set_io_limits(BlockBackend *blk, ThrottleConfig *cfg);
 void blk_io_limits_disable(BlockBackend *blk);
+int blk_io_limits_disable_timeout(BlockBackend *blk, int64_t timeout_ms);
 void blk_io_limits_enable(BlockBackend *blk, const char *group);
 void blk_io_limits_update_group(BlockBackend *blk, const char *group);
 void blk_set_force_allow_inactivate(BlockBackend *blk);
-- 
2.33.0




* Re: [PATCH 0/2] qapi: Fix qmp_block_set_io_throttle blocked for too long
  2025-03-08 10:16 [PATCH 0/2] qapi: Fix qmp_block_set_io_throttle blocked for too long zoudongjie via
  2025-03-08 10:16 ` [PATCH 1/2] io/block: Refactoring the bdrv_drained_begin() function and implement a timeout mechanism zoudongjie via
  2025-03-08 10:16 ` [PATCH 2/2] qapi: Fix qmp_block_set_io_throttle blocked for too long zoudongjie via
@ 2025-03-11  3:24 ` zoudongjie via
  2025-03-13  4:27 ` Stefan Hajnoczi
  3 siblings, 0 replies; 11+ messages in thread
From: zoudongjie via @ 2025-03-11  3:24 UTC
  To: qemu-devel
  Cc: stefanha, fam, hreitz, kwolf, qemu-block, zhuyangyang14,
	qemu-stable, luolongmin, suxiaodong1, wangyan122, yebiaoxiang,
	wangjian161, mujinsheng, alex.chen, eric.fangyi, zoudongjie,
	chenjianfei3, renxuming

Ping.




* Re: [PATCH 1/2] io/block: Refactoring the bdrv_drained_begin() function and implement a timeout mechanism.
  2025-03-08 10:16 ` [PATCH 1/2] io/block: Refactoring the bdrv_drained_begin() function and implement a timeout mechanism zoudongjie via
@ 2025-03-13  4:09   ` Stefan Hajnoczi
  2025-03-17 12:18     ` zoudongjie via
  2025-03-13  4:22   ` Stefan Hajnoczi
  1 sibling, 1 reply; 11+ messages in thread
From: Stefan Hajnoczi @ 2025-03-13  4:09 UTC
  To: zoudongjie
  Cc: zhuyangyang14, qemu-devel, fam, hreitz, qemu-block, qemu-stable,
	luolongmin, suxiaodong1, wangyan122, yebiaoxiang, wangjian161,
	mujinsheng, alex.chen, eric.fangyi, chenjianfei3, renxuming


On Sat, Mar 08, 2025 at 06:16:17PM +0800, zoudongjie wrote:
> From: Zhu Yangyang <zhuyangyang14@huawei.com>
> 
> bdrv_drained_begin() is a blocking function. When network storage is used and
> the network link fails, it may block for a long time.
> Therefore, add a timeout parameter to bound how long it can block.
> 
> Since bdrv_drained_begin() is widely used, both bdrv_drained_begin() and the
> new bdrv_drained_begin_timeout() are retained.
> 
> Signed-off-by: Zhu Yangyang <zhuyangyang14@huawei.com>
> ---
>  block/io.c               | 55 ++++++++++++++++++++++++++++++-------
>  include/block/aio-wait.h | 58 ++++++++++++++++++++++++++++++++++++++++
>  include/block/block-io.h |  7 +++++
>  util/aio-wait.c          |  7 +++++
>  4 files changed, 117 insertions(+), 10 deletions(-)
> 
> diff --git a/block/io.c b/block/io.c
> index d369b994df..03b8b2dca7 100644
> --- a/block/io.c
> +++ b/block/io.c
> @@ -255,6 +255,8 @@ typedef struct {
>      bool begin;
>      bool poll;
>      BdrvChild *parent;
> +    int ret;
> +    int64_t timeout;

Please put the units (milliseconds) into the variable name here and
everywhere else in the patch to avoid confusion about units:

  int64_t timeout_ms;

>  } BdrvCoDrainData;
>  
>  /* Returns true if BDRV_POLL_WHILE() should go into a blocking aio_poll() */
> @@ -283,6 +285,8 @@ static bool bdrv_drain_poll_top_level(BlockDriverState *bs,
>      return bdrv_drain_poll(bs, ignore_parent, false);
>  }
>  
> +static int bdrv_do_drained_begin_timeout(BlockDriverState *bs,
> +    BdrvChild *parent, bool poll, int64_t timeout);
>  static void bdrv_do_drained_begin(BlockDriverState *bs, BdrvChild *parent,
>                                    bool poll);
>  static void bdrv_do_drained_end(BlockDriverState *bs, BdrvChild *parent);
> @@ -296,7 +300,8 @@ static void bdrv_co_drain_bh_cb(void *opaque)
>      if (bs) {
>          bdrv_dec_in_flight(bs);
>          if (data->begin) {
> -            bdrv_do_drained_begin(bs, data->parent, data->poll);
> +            data->ret = bdrv_do_drained_begin_timeout(
> +                bs, data->parent, data->poll, data->timeout);
>          } else {
>              assert(!data->poll);
>              bdrv_do_drained_end(bs, data->parent);
> @@ -310,10 +315,11 @@ static void bdrv_co_drain_bh_cb(void *opaque)
>      aio_co_wake(co);
>  }
>  
> -static void coroutine_fn bdrv_co_yield_to_drain(BlockDriverState *bs,
> -                                                bool begin,
> -                                                BdrvChild *parent,
> -                                                bool poll)
> +static int coroutine_fn bdrv_co_yield_to_drain_timeout(BlockDriverState *bs,
> +                                                         bool begin,
> +                                                         BdrvChild *parent,
> +                                                         bool poll,
> +                                                         int64_t timeout)
>  {
>      BdrvCoDrainData data;
>      Coroutine *self = qemu_coroutine_self();
> @@ -329,6 +335,8 @@ static void coroutine_fn bdrv_co_yield_to_drain(BlockDriverState *bs,
>          .begin = begin,
>          .parent = parent,
>          .poll = poll,
> +        .timeout = timeout,
> +        .ret = 0
>      };
>  
>      if (bs) {
> @@ -342,16 +350,25 @@ static void coroutine_fn bdrv_co_yield_to_drain(BlockDriverState *bs,
>      /* If we are resumed from some other event (such as an aio completion or a
>       * timer callback), it is a bug in the caller that should be fixed. */
>      assert(data.done);
> +    return data.ret;
>  }
>  
> -static void bdrv_do_drained_begin(BlockDriverState *bs, BdrvChild *parent,
> -                                  bool poll)
> +static void coroutine_fn bdrv_co_yield_to_drain(BlockDriverState *bs,
> +                                                bool begin,
> +                                                BdrvChild *parent,
> +                                                bool poll)
> +{
> +    bdrv_co_yield_to_drain_timeout(bs, begin, parent, poll, -1);

Is this safe on 32-bit platforms?

> +}
> +
> +static int bdrv_do_drained_begin_timeout(BlockDriverState *bs,
> +    BdrvChild *parent, bool poll, int64_t timeout_ms)
>  {
>      IO_OR_GS_CODE();
>  
>      if (qemu_in_coroutine()) {
> -        bdrv_co_yield_to_drain(bs, true, parent, poll);
> -        return;
> +        return bdrv_co_yield_to_drain_timeout(bs, true, parent, poll,
> +                                              timeout_ms);
>      }
>  
>      GLOBAL_STATE_CODE();
> @@ -375,8 +392,20 @@ static void bdrv_do_drained_begin(BlockDriverState *bs, BdrvChild *parent,
>       * nodes.
>       */
>      if (poll) {
> -        BDRV_POLL_WHILE(bs, bdrv_drain_poll_top_level(bs, parent));
> +        if (timeout_ms < 0) {
> +            BDRV_POLL_WHILE(bs, bdrv_drain_poll_top_level(bs, parent));
> +        } else {
> +            return BDRV_POLL_WHILE_TIMEOUT(
> +                bs, bdrv_drain_poll_top_level(bs, parent), timeout_ms);
> +        }

Any reason to handle timeout_ms < 0 here instead of in
BDRV_POLL_WHILE_TIMEOUT()? It would be more consistent to support -1 in
BDRV_POLL_WHILE_TIMEOUT() so that you don't need to remember which
functions/macros support timeout_ms=-1 and which don't.

>      }
> +    return 0;
> +}
> +
> +static void bdrv_do_drained_begin(BlockDriverState *bs, BdrvChild *parent,
> +                                  bool poll)
> +{
> +    bdrv_do_drained_begin_timeout(bs, parent, poll, -1);
>  }
>  
>  void bdrv_do_drained_begin_quiesce(BlockDriverState *bs, BdrvChild *parent)
> @@ -390,6 +419,12 @@ bdrv_drained_begin(BlockDriverState *bs)
>      IO_OR_GS_CODE();
>      bdrv_do_drained_begin(bs, NULL, true);
>  }
> +int coroutine_mixed_fn
> +bdrv_drained_begin_timeout(BlockDriverState *bs, int64_t timeout_ms)
> +{
> +    IO_OR_GS_CODE();
> +    return bdrv_do_drained_begin_timeout(bs, NULL, true, timeout_ms);
> +}
>  
>  /**
>   * This function does not poll, nor must any of its recursively called
> diff --git a/include/block/aio-wait.h b/include/block/aio-wait.h
> index cf5e8bde1c..efbcb9777a 100644
> --- a/include/block/aio-wait.h
> +++ b/include/block/aio-wait.h
> @@ -28,6 +28,8 @@
>  #include "block/aio.h"
>  #include "qemu/main-loop.h"
>  
> +#define AIO_WAIT_INTERVAL 10  /* ms */
> +
>  /**
>   * AioWait:
>   *
> @@ -56,6 +58,11 @@ typedef struct {
>      unsigned num_waiters;
>  } AioWait;
>  
> +typedef struct {
> +    struct QEMUTimer *timer;

struct is not necessary since QEMUTimer is a typedef:

  QEMUTimer *timer;

Also, can this be a struct field instead of a pointer by using
aio_timer_init_ms() instead of aio_timer_new()?
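
Something like this, I mean (untested sketch):

    typedef struct {
        QEMUTimer timer;     /* embedded instead of heap-allocated */
        int64_t interval;
    } AioWaitTimer;

    AioWaitTimer s_ = { .interval = AIO_WAIT_INTERVAL };
    aio_timer_init_ms(ctx_, &s_.timer, QEMU_CLOCK_REALTIME,
                      aio_wait_timer_retry, &s_);
    /* ...and timer_del() instead of timer_free() on the way out */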

> +    int64_t interval;
> +} AioWaitTimer;
> +
>  extern AioWait global_aio_wait;
>  
>  /**
> @@ -99,6 +106,55 @@ extern AioWait global_aio_wait;
>      qatomic_dec(&wait_->num_waiters);                              \
>      waited_; })
>  
> +/**
> + * AIO_WAIT_WHILE_TIMEOUT:
> + *
> + * Refer to the implementation of AIO_WAIT_WHILE_INTERNAL,
> + * the timeout parameter is added.
> + */
> +#define AIO_WAIT_WHILE_TIMEOUT(ctx, cond, timeout) ({                    \
> +    int ret_ = 0;                                                        \
> +    AioWait *wait_ = &global_aio_wait;                                   \
> +    AioContext *ctx_ = (ctx);                                            \
> +    int64_t start_ = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);             \
> +    int64_t deadline_ = start_ + (timeout);                              \
> +    /* Ensure that the aio_poll exits periodically to check timeout. */  \
> +    AioWaitTimer *s_ = g_malloc0(sizeof(AioWaitTimer));                  \

Can this be allocated on the stack instead of the heap?

> +    s_->interval = AIO_WAIT_INTERVAL;                                    \
> +    /* Increment wait_->num_waiters before evaluating cond. */           \
> +    qatomic_inc(&wait_->num_waiters);                                    \
> +    /* Paired with smp_mb in aio_wait_kick(). */                         \
> +    smp_mb__after_rmw();                                                 \
> +    if (ctx_ && in_aio_context_home_thread(ctx_)) {                      \
> +        s_->timer = aio_timer_new(ctx_, QEMU_CLOCK_REALTIME,             \
> +                        SCALE_MS, aio_wait_timer_retry, s_);             \
> +        aio_wait_timer_retry(s_);                                        \
> +        while ((cond)) {                                                 \
> +            aio_poll(ctx_, true);                                        \
> +            if (qemu_clock_get_ms(QEMU_CLOCK_REALTIME) > deadline_) {    \
> +                ret_ = -ETIMEDOUT;                                       \
> +                break;                                                   \
> +            }                                                            \
> +        }                                                                \

What is the purpose of interval?

I expected the timer's callback function to be an empty function that is
called when the deadline expires. The while loop here would use
timer_pending() to check for expiry instead of explicitly checking
against the deadline.
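
I.e. roughly (untested; aio_wait_deadline_cb is a hypothetical no-op callback):

    static void aio_wait_deadline_cb(void *opaque)
    {
        /* nothing to do: expiry just forces aio_poll() to return */
    }

    timer_mod(s_->timer,
              qemu_clock_get_ms(QEMU_CLOCK_REALTIME) + (timeout));
    while ((cond)) {
        aio_poll(ctx_, true);
        if (!timer_pending(s_->timer)) {
            ret_ = -ETIMEDOUT;
            break;
        }
    }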

> +    } else {                                                             \
> +        s_->timer = aio_timer_new(qemu_get_aio_context(),                \
> +            QEMU_CLOCK_REALTIME, SCALE_MS, aio_wait_timer_retry, s_);    \
> +        aio_wait_timer_retry(s_);                                        \
> +        while ((cond)) {                                                 \
> +            assert(qemu_get_current_aio_context() ==                     \
> +                   qemu_get_aio_context());                              \
> +            aio_poll(qemu_get_aio_context(), true);                      \
> +            if (qemu_clock_get_ms(QEMU_CLOCK_REALTIME) > deadline_) {    \
> +                ret_ = -ETIMEDOUT;                                       \
> +                break;                                                   \
> +            }                                                            \
> +        }                                                                \
> +    }                                                                    \
> +    qatomic_dec(&wait_->num_waiters);                                    \
> +    timer_free(s_->timer);                                               \

This will need to be timer_del() when the QEMUTimer is moved onto the
stack.

> +    g_free(s_);                                                          \
> +    ret_; })
> +
>  #define AIO_WAIT_WHILE(ctx, cond)                                  \
>      AIO_WAIT_WHILE_INTERNAL(ctx, cond)
>  
> @@ -149,4 +205,6 @@ static inline bool in_aio_context_home_thread(AioContext *ctx)
>      }
>  }
>  
> +void aio_wait_timer_retry(void *opaque);
> +
>  #endif /* QEMU_AIO_WAIT_H */
> diff --git a/include/block/block-io.h b/include/block/block-io.h
> index b49e0537dd..84f92d2b09 100644
> --- a/include/block/block-io.h
> +++ b/include/block/block-io.h
> @@ -354,6 +354,11 @@ bdrv_co_copy_range(BdrvChild *src, int64_t src_offset,
>      AIO_WAIT_WHILE(bdrv_get_aio_context(bs_),              \
>                     cond); })
>  
> +#define BDRV_POLL_WHILE_TIMEOUT(bs, cond, timeout) ({      \
> +    BlockDriverState *bs_ = (bs);                          \
> +    AIO_WAIT_WHILE_TIMEOUT(bdrv_get_aio_context(bs_),      \
> +                           cond, timeout); })
> +
>  void bdrv_drain(BlockDriverState *bs);
>  
>  int co_wrapper_mixed_bdrv_rdlock
> @@ -431,6 +436,8 @@ bdrv_drain_poll(BlockDriverState *bs, BdrvChild *ignore_parent,
>   */
>  void bdrv_drained_begin(BlockDriverState *bs);
>  
> +int bdrv_drained_begin_timeout(BlockDriverState *bs, int64_t timeout_ms);
> +
>  /**
>   * bdrv_do_drained_begin_quiesce:
>   *
> diff --git a/util/aio-wait.c b/util/aio-wait.c
> index b5336cf5fd..9aed165529 100644
> --- a/util/aio-wait.c
> +++ b/util/aio-wait.c
> @@ -84,3 +84,10 @@ void aio_wait_bh_oneshot(AioContext *ctx, QEMUBHFunc *cb, void *opaque)
>      aio_bh_schedule_oneshot(ctx, aio_wait_bh, &data);
>      AIO_WAIT_WHILE_UNLOCKED(NULL, !data.done);
>  }
> +
> +void aio_wait_timer_retry(void *opaque)
> +{
> +    AioWaitTimer *s = opaque;
> +
> +    timer_mod(s->timer, qemu_clock_get_ms(QEMU_CLOCK_REALTIME) + s->interval);
> +}
> -- 
> 2.33.0
> 



* Re: [PATCH 1/2] io/block: Refactoring the bdrv_drained_begin() function and implement a timeout mechanism.
  2025-03-08 10:16 ` [PATCH 1/2] io/block: Refactoring the bdrv_drained_begin() function and implement a timeout mechanism zoudongjie via
  2025-03-13  4:09   ` Stefan Hajnoczi
@ 2025-03-13  4:22   ` Stefan Hajnoczi
  1 sibling, 0 replies; 11+ messages in thread
From: Stefan Hajnoczi @ 2025-03-13  4:22 UTC
  To: zoudongjie
  Cc: zhuyangyang14, qemu-devel, fam, hreitz, qemu-block, qemu-stable,
	luolongmin, suxiaodong1, wangyan122, yebiaoxiang, wangjian161,
	mujinsheng, alex.chen, eric.fangyi, chenjianfei3, renxuming


On Sat, Mar 08, 2025 at 06:16:17PM +0800, zoudongjie wrote:
> From: Zhu Yangyang <zhuyangyang14@huawei.com>
> 
> bdrv_drained_begin() is a blocking function. When network storage is used and
> the network link fails, it may block for a long time.
> Therefore, add a timeout parameter to bound how long it can block.
> 
> Since bdrv_drained_begin() is widely used, both bdrv_drained_begin() and the
> new bdrv_drained_begin_timeout() are retained.
> 
> Signed-off-by: Zhu Yangyang <zhuyangyang14@huawei.com>
> ---
>  block/io.c               | 55 ++++++++++++++++++++++++++++++-------
>  include/block/aio-wait.h | 58 ++++++++++++++++++++++++++++++++++++++++
>  include/block/block-io.h |  7 +++++
>  util/aio-wait.c          |  7 +++++
>  4 files changed, 117 insertions(+), 10 deletions(-)
> 
> diff --git a/block/io.c b/block/io.c
> index d369b994df..03b8b2dca7 100644
> --- a/block/io.c
> +++ b/block/io.c
> @@ -255,6 +255,8 @@ typedef struct {
>      bool begin;
>      bool poll;
>      BdrvChild *parent;
> +    int ret;
> +    int64_t timeout;
>  } BdrvCoDrainData;
>  
>  /* Returns true if BDRV_POLL_WHILE() should go into a blocking aio_poll() */
> @@ -283,6 +285,8 @@ static bool bdrv_drain_poll_top_level(BlockDriverState *bs,
>      return bdrv_drain_poll(bs, ignore_parent, false);
>  }
>  
> +static int bdrv_do_drained_begin_timeout(BlockDriverState *bs,
> +    BdrvChild *parent, bool poll, int64_t timeout);
>  static void bdrv_do_drained_begin(BlockDriverState *bs, BdrvChild *parent,
>                                    bool poll);
>  static void bdrv_do_drained_end(BlockDriverState *bs, BdrvChild *parent);
> @@ -296,7 +300,8 @@ static void bdrv_co_drain_bh_cb(void *opaque)
>      if (bs) {
>          bdrv_dec_in_flight(bs);
>          if (data->begin) {
> -            bdrv_do_drained_begin(bs, data->parent, data->poll);
> +            data->ret = bdrv_do_drained_begin_timeout(
> +                bs, data->parent, data->poll, data->timeout);
>          } else {
>              assert(!data->poll);
>              bdrv_do_drained_end(bs, data->parent);
> @@ -310,10 +315,11 @@ static void bdrv_co_drain_bh_cb(void *opaque)
>      aio_co_wake(co);
>  }
>  
> -static void coroutine_fn bdrv_co_yield_to_drain(BlockDriverState *bs,
> -                                                bool begin,
> -                                                BdrvChild *parent,
> -                                                bool poll)
> +static int coroutine_fn bdrv_co_yield_to_drain_timeout(BlockDriverState *bs,
> +                                                         bool begin,
> +                                                         BdrvChild *parent,
> +                                                         bool poll,
> +                                                         int64_t timeout)
>  {
>      BdrvCoDrainData data;
>      Coroutine *self = qemu_coroutine_self();
> @@ -329,6 +335,8 @@ static void coroutine_fn bdrv_co_yield_to_drain(BlockDriverState *bs,
>          .begin = begin,
>          .parent = parent,
>          .poll = poll,
> +        .timeout = timeout,
> +        .ret = 0
>      };
>  
>      if (bs) {
> @@ -342,16 +350,25 @@ static void coroutine_fn bdrv_co_yield_to_drain(BlockDriverState *bs,
>      /* If we are resumed from some other event (such as an aio completion or a
>       * timer callback), it is a bug in the caller that should be fixed. */
>      assert(data.done);
> +    return data.ret;
>  }
>  
> -static void bdrv_do_drained_begin(BlockDriverState *bs, BdrvChild *parent,
> -                                  bool poll)
> +static void coroutine_fn bdrv_co_yield_to_drain(BlockDriverState *bs,
> +                                                bool begin,
> +                                                BdrvChild *parent,
> +                                                bool poll)
> +{
> +    bdrv_co_yield_to_drain_timeout(bs, begin, parent, poll, -1);
> +}
> +
> +static int bdrv_do_drained_begin_timeout(BlockDriverState *bs,
> +    BdrvChild *parent, bool poll, int64_t timeout_ms)
>  {
>      IO_OR_GS_CODE();
>  
>      if (qemu_in_coroutine()) {
> -        bdrv_co_yield_to_drain(bs, true, parent, poll);
> -        return;
> +        return bdrv_co_yield_to_drain_timeout(bs, true, parent, poll,
> +                                              timeout_ms);
>      }
>  
>      GLOBAL_STATE_CODE();
> @@ -375,8 +392,20 @@ static void bdrv_do_drained_begin(BlockDriverState *bs, BdrvChild *parent,
>       * nodes.
>       */
>      if (poll) {
> -        BDRV_POLL_WHILE(bs, bdrv_drain_poll_top_level(bs, parent));
> +        if (timeout_ms < 0) {
> +            BDRV_POLL_WHILE(bs, bdrv_drain_poll_top_level(bs, parent));
> +        } else {
> +            return BDRV_POLL_WHILE_TIMEOUT(
> +                bs, bdrv_drain_poll_top_level(bs, parent), timeout_ms);
> +        }
>      }
> +    return 0;
> +}
> +
> +static void bdrv_do_drained_begin(BlockDriverState *bs, BdrvChild *parent,
> +                                  bool poll)
> +{
> +    bdrv_do_drained_begin_timeout(bs, parent, poll, -1);
>  }
>  
>  void bdrv_do_drained_begin_quiesce(BlockDriverState *bs, BdrvChild *parent)
> @@ -390,6 +419,12 @@ bdrv_drained_begin(BlockDriverState *bs)
>      IO_OR_GS_CODE();
>      bdrv_do_drained_begin(bs, NULL, true);
>  }
> +int coroutine_mixed_fn
> +bdrv_drained_begin_timeout(BlockDriverState *bs, int64_t timeout_ms)
> +{
> +    IO_OR_GS_CODE();
> +    return bdrv_do_drained_begin_timeout(bs, NULL, true, timeout_ms);
> +}
>  
>  /**
>   * This function does not poll, nor must any of its recursively called
> diff --git a/include/block/aio-wait.h b/include/block/aio-wait.h
> index cf5e8bde1c..efbcb9777a 100644
> --- a/include/block/aio-wait.h
> +++ b/include/block/aio-wait.h
> @@ -28,6 +28,8 @@
>  #include "block/aio.h"
>  #include "qemu/main-loop.h"
>  
> +#define AIO_WAIT_INTERVAL 10  /* ms */
> +
>  /**
>   * AioWait:
>   *
> @@ -56,6 +58,11 @@ typedef struct {
>      unsigned num_waiters;
>  } AioWait;
>  
> +typedef struct {
> +    struct QEMUTimer *timer;
> +    int64_t interval;
> +} AioWaitTimer;
> +
>  extern AioWait global_aio_wait;
>  
>  /**
> @@ -99,6 +106,55 @@ extern AioWait global_aio_wait;
>      qatomic_dec(&wait_->num_waiters);                              \
>      waited_; })
>  
> +/**
> + * AIO_WAIT_WHILE_TIMEOUT:
> + *
> + * Refer to the implementation of AIO_WAIT_WHILE_INTERNAL,
> + * the timeout parameter is added.
> + */
> +#define AIO_WAIT_WHILE_TIMEOUT(ctx, cond, timeout) ({                    \
> +    int ret_ = 0;                                                        \
> +    AioWait *wait_ = &global_aio_wait;                                   \
> +    AioContext *ctx_ = (ctx);                                            \
> +    int64_t start_ = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);             \
> +    int64_t deadline_ = start_ + (timeout);                              \
> +    /* Ensure that the aio_poll exits periodically to check timeout. */  \
> +    AioWaitTimer *s_ = g_malloc0(sizeof(AioWaitTimer));                  \
> +    s_->interval = AIO_WAIT_INTERVAL;                                    \
> +    /* Increment wait_->num_waiters before evaluating cond. */           \
> +    qatomic_inc(&wait_->num_waiters);                                    \
> +    /* Paired with smp_mb in aio_wait_kick(). */                         \
> +    smp_mb__after_rmw();                                                 \
> +    if (ctx_ && in_aio_context_home_thread(ctx_)) {                      \
> +        s_->timer = aio_timer_new(ctx_, QEMU_CLOCK_REALTIME,             \
> +                        SCALE_MS, aio_wait_timer_retry, s_);             \
> +        aio_wait_timer_retry(s_);                                        \
> +        while ((cond)) {                                                 \
> +            aio_poll(ctx_, true);                                        \
> +            if (qemu_clock_get_ms(QEMU_CLOCK_REALTIME) > deadline_) {    \
> +                ret_ = -ETIMEDOUT;                                       \
> +                break;                                                   \
> +            }                                                            \
> +        }                                                                \
> +    } else {                                                             \
> +        s_->timer = aio_timer_new(qemu_get_aio_context(),                \
> +            QEMU_CLOCK_REALTIME, SCALE_MS, aio_wait_timer_retry, s_);    \
> +        aio_wait_timer_retry(s_);                                        \
> +        while ((cond)) {                                                 \
> +            assert(qemu_get_current_aio_context() ==                     \
> +                   qemu_get_aio_context());                              \
> +            aio_poll(qemu_get_aio_context(), true);                      \
> +            if (qemu_clock_get_ms(QEMU_CLOCK_REALTIME) > deadline_) {    \
> +                ret_ = -ETIMEDOUT;                                       \
> +                break;                                                   \
> +            }                                                            \
> +        }                                                                \
> +    }                                                                    \
> +    qatomic_dec(&wait_->num_waiters);                                    \
> +    timer_free(s_->timer);                                               \
> +    g_free(s_);                                                          \
> +    ret_; })
> +
>  #define AIO_WAIT_WHILE(ctx, cond)                                  \
>      AIO_WAIT_WHILE_INTERNAL(ctx, cond)
>  
> @@ -149,4 +205,6 @@ static inline bool in_aio_context_home_thread(AioContext *ctx)
>      }
>  }
>  
> +void aio_wait_timer_retry(void *opaque);
> +
>  #endif /* QEMU_AIO_WAIT_H */
> diff --git a/include/block/block-io.h b/include/block/block-io.h
> index b49e0537dd..84f92d2b09 100644
> --- a/include/block/block-io.h
> +++ b/include/block/block-io.h
> @@ -354,6 +354,11 @@ bdrv_co_copy_range(BdrvChild *src, int64_t src_offset,
>      AIO_WAIT_WHILE(bdrv_get_aio_context(bs_),              \
>                     cond); })
>  
> +#define BDRV_POLL_WHILE_TIMEOUT(bs, cond, timeout) ({      \
> +    BlockDriverState *bs_ = (bs);                          \
> +    AIO_WAIT_WHILE_TIMEOUT(bdrv_get_aio_context(bs_),      \
> +                           cond, timeout); })
> +
>  void bdrv_drain(BlockDriverState *bs);
>  
>  int co_wrapper_mixed_bdrv_rdlock
> @@ -431,6 +436,8 @@ bdrv_drain_poll(BlockDriverState *bs, BdrvChild *ignore_parent,
>   */
>  void bdrv_drained_begin(BlockDriverState *bs);
>  
> +int bdrv_drained_begin_timeout(BlockDriverState *bs, int64_t timeout_ms);

Missing documentation, especially that bdrv_drained_end() must be called
when -ETIMEDOUT is returned.
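
Something along these lines (wording is only a suggestion):

    /**
     * bdrv_drained_begin_timeout:
     *
     * Like bdrv_drained_begin(), but gives up after @timeout_ms milliseconds.
     * Returns 0 on success, -ETIMEDOUT on timeout. Note that even when
     * -ETIMEDOUT is returned the drained section has been entered and the
     * caller must still call bdrv_drained_end().
     */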

> +
>  /**
>   * bdrv_do_drained_begin_quiesce:
>   *
> diff --git a/util/aio-wait.c b/util/aio-wait.c
> index b5336cf5fd..9aed165529 100644
> --- a/util/aio-wait.c
> +++ b/util/aio-wait.c
> @@ -84,3 +84,10 @@ void aio_wait_bh_oneshot(AioContext *ctx, QEMUBHFunc *cb, void *opaque)
>      aio_bh_schedule_oneshot(ctx, aio_wait_bh, &data);
>      AIO_WAIT_WHILE_UNLOCKED(NULL, !data.done);
>  }
> +
> +void aio_wait_timer_retry(void *opaque)
> +{
> +    AioWaitTimer *s = opaque;
> +
> +    timer_mod(s->timer, qemu_clock_get_ms(QEMU_CLOCK_REALTIME) + s->interval);
> +}
> -- 
> 2.33.0
> 



* Re: [PATCH 2/2] qapi: Fix qmp_block_set_io_throttle blocked for too long
  2025-03-08 10:16 ` [PATCH 2/2] qapi: Fix qmp_block_set_io_throttle blocked for too long zoudongjie via
@ 2025-03-13  4:25   ` Stefan Hajnoczi
  2025-03-17 12:59     ` zoudongjie via
  0 siblings, 1 reply; 11+ messages in thread
From: Stefan Hajnoczi @ 2025-03-13  4:25 UTC
  To: zoudongjie
  Cc: zhuyangyang14, qemu-devel, fam, hreitz, qemu-block, qemu-stable,
	luolongmin, suxiaodong1, wangyan122, yebiaoxiang, wangjian161,
	mujinsheng, alex.chen, eric.fangyi, chenjianfei3, renxuming


On Sat, Mar 08, 2025 at 06:16:18PM +0800, zoudongjie wrote:
> From: Zhu Yangyang <zhuyangyang14@huawei.com>
> 
> bdrv_drained_begin() can block for a long time when network storage is used
> and the network link has just failed.
> Therefore, set a timeout here so the QMP command fails instead of blocking
> indefinitely.
> 
> Signed-off-by: Zhu Yangyang <zhuyangyang14@huawei.com>
> ---
>  block/block-backend.c                       | 14 +++++++++++++-
>  block/qapi-system.c                         |  7 ++++++-
>  include/system/block-backend-global-state.h |  1 +
>  3 files changed, 20 insertions(+), 2 deletions(-)
> 
> diff --git a/block/block-backend.c b/block/block-backend.c
> index 9288f7e1c6..409d4559c3 100644
> --- a/block/block-backend.c
> +++ b/block/block-backend.c
> @@ -2691,20 +2691,32 @@ void blk_set_io_limits(BlockBackend *blk, ThrottleConfig *cfg)
>  }
>  
>  void blk_io_limits_disable(BlockBackend *blk)
> +{
> +    blk_io_limits_disable_timeout(blk, -1);
> +}
> +
> +int blk_io_limits_disable_timeout(BlockBackend *blk, int64_t timeout_ms)
>  {
>      BlockDriverState *bs = blk_bs(blk);
>      ThrottleGroupMember *tgm = &blk->public.throttle_group_member;
> +    int ret = 0;
> +
>      assert(tgm->throttle_state);
>      GLOBAL_STATE_CODE();
>      if (bs) {
>          bdrv_ref(bs);
> -        bdrv_drained_begin(bs);
> +        ret = bdrv_drained_begin_timeout(bs, timeout_ms);
> +        if (ret < 0) {
> +            goto out;
> +        }
>      }
>      throttle_group_unregister_tgm(tgm);
> +out:
>      if (bs) {
>          bdrv_drained_end(bs);
>          bdrv_unref(bs);
>      }
> +    return ret;
>  }
>  
>  /* should be called before blk_set_io_limits if a limit is set */
> diff --git a/block/qapi-system.c b/block/qapi-system.c
> index 54b7409b2b..96fa8e5e51 100644
> --- a/block/qapi-system.c
> +++ b/block/qapi-system.c
> @@ -39,6 +39,8 @@
>  #include "system/block-backend.h"
>  #include "system/blockdev.h"
>  
> +#define QMP_BLOCKING_TIMEOUT 5000 /* ms */
> +
>  static BlockBackend *qmp_get_blk(const char *blk_name, const char *qdev_id,
>                                   Error **errp)
>  {
> @@ -502,7 +504,10 @@ void qmp_block_set_io_throttle(BlockIOThrottle *arg, Error **errp)
>          blk_set_io_limits(blk, &cfg);
>      } else if (blk_get_public(blk)->throttle_group_member.throttle_state) {
>          /* If all throttling settings are set to 0, disable I/O limits */
> -        blk_io_limits_disable(blk);
> +        if (blk_io_limits_disable_timeout(blk, QMP_BLOCKING_TIMEOUT) < 0) {

Please add a timeout parameter to the QMP command instead of hardcoding
the timeout. It should also support an infinite timeout and that should
be the default (so that existing users aren't surprised with a new error
they don't handle).
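
For instance (sketch only; the QAPI schema change is not shown and
'timeout-ms' is a hypothetical argument name):

    int64_t timeout_ms = arg->has_timeout_ms ? arg->timeout_ms : -1;
    /* default -1 = wait forever, so existing users see no new error */
    if (blk_io_limits_disable_timeout(blk, timeout_ms) < 0) {
        error_setg(errp, "Blk io limits disable timeout");
        return;
    }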

> +            error_setg(errp, "Blk io limits disable timeout");
> +            return;
> +        }
>      }
>  }
>  
> diff --git a/include/system/block-backend-global-state.h b/include/system/block-backend-global-state.h
> index 9cc9b008ec..e55ea9dc64 100644
> --- a/include/system/block-backend-global-state.h
> +++ b/include/system/block-backend-global-state.h
> @@ -111,6 +111,7 @@ int blk_probe_geometry(BlockBackend *blk, HDGeometry *geo);
>  
>  void blk_set_io_limits(BlockBackend *blk, ThrottleConfig *cfg);
>  void blk_io_limits_disable(BlockBackend *blk);
> +int blk_io_limits_disable_timeout(BlockBackend *blk, int64_t timeout_ms);
>  void blk_io_limits_enable(BlockBackend *blk, const char *group);
>  void blk_io_limits_update_group(BlockBackend *blk, const char *group);
>  void blk_set_force_allow_inactivate(BlockBackend *blk);
> -- 
> 2.33.0
> 



* Re: [PATCH 0/2] qapi: Fix qmp_block_set_io_throttle blocked for too long
  2025-03-08 10:16 [PATCH 0/2] qapi: Fix qmp_block_set_io_throttle blocked for too long zoudongjie via
                   ` (2 preceding siblings ...)
  2025-03-11  3:24 ` [PATCH 0/2] " zoudongjie via
@ 2025-03-13  4:27 ` Stefan Hajnoczi
  3 siblings, 0 replies; 11+ messages in thread
From: Stefan Hajnoczi @ 2025-03-13  4:27 UTC
  To: zoudongjie
  Cc: zhuyangyang14, qemu-devel, fam, hreitz, qemu-block, qemu-stable,
	luolongmin, suxiaodong1, wangyan122, yebiaoxiang, wangjian161,
	mujinsheng, alex.chen, eric.fangyi, chenjianfei3, renxuming


The idea of adding a timeout parameter sounds good to me.

I think it's safe to call bdrv_drained_end() after the timeout fails,
but I'm not 100% sure. Maybe Kevin has thoughts on this.

I left comments on the patches.

Stefan



* Re: [PATCH 1/2] io/block: Refactoring the bdrv_drained_begin() function and implement a timeout mechanism.
  2025-03-13  4:09   ` Stefan Hajnoczi
@ 2025-03-17 12:18     ` zoudongjie via
  2025-03-17 14:57       ` Stefan Hajnoczi
  0 siblings, 1 reply; 11+ messages in thread
From: zoudongjie via @ 2025-03-17 12:18 UTC
  To: stefanha
  Cc: fam, hreitz, kwolf, qemu-block, qemu-devel, qemu-stable,
	zhuyangyang14, luolongmin, suxiaodong1, wangyan122, yebiaoxiang,
	wangjian161, mujinsheng, alex.chen, eric.fangyi, zoudongjie,
	chenjianfei3, renxuming

On Thu, 13 Mar, 2025 at 12:09:45 +0800, Stefan Hajnoczi wrote:
> On Sat, Mar 08, 2025 at 06:16:17PM +0800, zoudongjie wrote:
> > From: Zhu Yangyang <zhuyangyang14@huawei.com>
> > 
> > bdrv_drained_begin() is a blocking function. When network storage is used and
> > the network link fails, it may block for a long time.
> > Therefore, add a timeout parameter to bound how long it can block.
> > 
> > Since bdrv_drained_begin() is widely used, both bdrv_drained_begin() and the
> > new bdrv_drained_begin_timeout() are retained.
> > 
> > Signed-off-by: Zhu Yangyang <zhuyangyang14@huawei.com>
> > ---
> >  block/io.c               | 55 ++++++++++++++++++++++++++++++-------
> >  include/block/aio-wait.h | 58 ++++++++++++++++++++++++++++++++++++++++
> >  include/block/block-io.h |  7 +++++
> >  util/aio-wait.c          |  7 +++++
> >  4 files changed, 117 insertions(+), 10 deletions(-)
> > 
> > diff --git a/block/io.c b/block/io.c
> > index d369b994df..03b8b2dca7 100644
> > --- a/block/io.c
> > +++ b/block/io.c
> > @@ -255,6 +255,8 @@ typedef struct {
> >      bool begin;
> >      bool poll;
> >      BdrvChild *parent;
> > +    int ret;
> > +    int64_t timeout;
> 
> Please put the units (milliseconds) into the variable name here and
> everywhere else in the patch to avoid confusion about units:
> 
>   int64_t timeout_ms;

OK, I'm going to change this in v2.

> 
> >  } BdrvCoDrainData;
> >  
> >  /* Returns true if BDRV_POLL_WHILE() should go into a blocking aio_poll() */
> > @@ -283,6 +285,8 @@ static bool bdrv_drain_poll_top_level(BlockDriverState *bs,
> >      return bdrv_drain_poll(bs, ignore_parent, false);
> >  }
> >  
> > +static int bdrv_do_drained_begin_timeout(BlockDriverState *bs,
> > +    BdrvChild *parent, bool poll, int64_t timeout);
> >  static void bdrv_do_drained_begin(BlockDriverState *bs, BdrvChild *parent,
> >                                    bool poll);
> >  static void bdrv_do_drained_end(BlockDriverState *bs, BdrvChild *parent);
> > @@ -296,7 +300,8 @@ static void bdrv_co_drain_bh_cb(void *opaque)
> >      if (bs) {
> >          bdrv_dec_in_flight(bs);
> >          if (data->begin) {
> > -            bdrv_do_drained_begin(bs, data->parent, data->poll);
> > +            data->ret = bdrv_do_drained_begin_timeout(
> > +                bs, data->parent, data->poll, data->timeout);
> >          } else {
> >              assert(!data->poll);
> >              bdrv_do_drained_end(bs, data->parent);
> > @@ -310,10 +315,11 @@ static void bdrv_co_drain_bh_cb(void *opaque)
> >      aio_co_wake(co);
> >  }
> >  
> > -static void coroutine_fn bdrv_co_yield_to_drain(BlockDriverState *bs,
> > -                                                bool begin,
> > -                                                BdrvChild *parent,
> > -                                                bool poll)
> > +static int coroutine_fn bdrv_co_yield_to_drain_timeout(BlockDriverState *bs,
> > +                                                         bool begin,
> > +                                                         BdrvChild *parent,
> > +                                                         bool poll,
> > +                                                         int64_t timeout)
> >  {
> >      BdrvCoDrainData data;
> >      Coroutine *self = qemu_coroutine_self();
> > @@ -329,6 +335,8 @@ static void coroutine_fn bdrv_co_yield_to_drain(BlockDriverState *bs,
> >          .begin = begin,
> >          .parent = parent,
> >          .poll = poll,
> > +        .timeout = timeout,
> > +        .ret = 0
> >      };
> >  
> >      if (bs) {
> > @@ -342,16 +350,25 @@ static void coroutine_fn bdrv_co_yield_to_drain(BlockDriverState *bs,
> >      /* If we are resumed from some other event (such as an aio completion or a
> >       * timer callback), it is a bug in the caller that should be fixed. */
> >      assert(data.done);
> > +    return data.ret;
> >  }
> >  
> > -static void bdrv_do_drained_begin(BlockDriverState *bs, BdrvChild *parent,
> > -                                  bool poll)
> > +static void coroutine_fn bdrv_co_yield_to_drain(BlockDriverState *bs,
> > +                                                bool begin,
> > +                                                BdrvChild *parent,
> > +                                                bool poll)
> > +{
> > +    bdrv_co_yield_to_drain_timeout(bs, begin, parent, poll, -1);
> 
> Is this safe on 32-bit platforms?

Sorry, could you be more specific here? I didn't quite get it.

> 
> > +}
> > +
> > +static int bdrv_do_drained_begin_timeout(BlockDriverState *bs,
> > +    BdrvChild *parent, bool poll, int64_t timeout_ms)
> >  {
> >      IO_OR_GS_CODE();
> >  
> >      if (qemu_in_coroutine()) {
> > -        bdrv_co_yield_to_drain(bs, true, parent, poll);
> > -        return;
> > +        return bdrv_co_yield_to_drain_timeout(bs, true, parent, poll,
> > +                                              timeout_ms);
> >      }
> >  
> >      GLOBAL_STATE_CODE();
> > @@ -375,8 +392,20 @@ static void bdrv_do_drained_begin(BlockDriverState *bs, BdrvChild *parent,
> >       * nodes.
> >       */
> >      if (poll) {
> > -        BDRV_POLL_WHILE(bs, bdrv_drain_poll_top_level(bs, parent));
> > +        if (timeout_ms < 0) {
> > +            BDRV_POLL_WHILE(bs, bdrv_drain_poll_top_level(bs, parent));
> > +        } else {
> > +            return BDRV_POLL_WHILE_TIMEOUT(
> > +                bs, bdrv_drain_poll_top_level(bs, parent), timeout_ms);
> > +        }
> 
> Any reason to handle timeout_ms < 0 here instead of in
> BDRV_POLL_WHILE_TIMEOUT()? It would be more consistent to support -1 in
> BDRV_POLL_WHILE_TIMEOUT() so that you don't need to remember which
> functions/macros support timeout_ms=-1 and which don't.

Previously, BDRV_POLL_WHILE_TIMEOUT() was not implemented very well: aio_poll()
exits frequently because an interval timer is used to re-check the condition.
But yes, I will support -1 in BDRV_POLL_WHILE_TIMEOUT() in v2.
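
For illustration, here is roughly the shape I have in mind for v2 (an untested
sketch, not the final code; a negative timeout means "wait forever"):

  #define BDRV_POLL_WHILE_TIMEOUT(bs, cond, timeout_ms) ({       \
      BlockDriverState *bs_ = (bs);                              \
      int64_t timeout_ms_ = (timeout_ms);                        \
      int ret_ = 0;                                              \
      if (timeout_ms_ < 0) {                                     \
          /* Fall back to the existing untimed wait. */          \
          AIO_WAIT_WHILE(bdrv_get_aio_context(bs_), cond);       \
      } else {                                                   \
          ret_ = AIO_WAIT_WHILE_TIMEOUT(                         \
              bdrv_get_aio_context(bs_), cond, timeout_ms_);     \
      }                                                          \
      ret_; })

That way callers never need to remember which macro accepts -1.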

> 
> >      }
> > +    return 0;
> > +}
> > +
> > +static void bdrv_do_drained_begin(BlockDriverState *bs, BdrvChild *parent,
> > +                                  bool poll)
> > +{
> > +    bdrv_do_drained_begin_timeout(bs, parent, poll, -1);
> >  }
> >  
> >  void bdrv_do_drained_begin_quiesce(BlockDriverState *bs, BdrvChild *parent)
> > @@ -390,6 +419,12 @@ bdrv_drained_begin(BlockDriverState *bs)
> >      IO_OR_GS_CODE();
> >      bdrv_do_drained_begin(bs, NULL, true);
> >  }
> > +int coroutine_mixed_fn
> > +bdrv_drained_begin_timeout(BlockDriverState *bs, int64_t timeout_ms)
> > +{
> > +    IO_OR_GS_CODE();
> > +    return bdrv_do_drained_begin_timeout(bs, NULL, true, timeout_ms);
> > +}
> >  
> >  /**
> >   * This function does not poll, nor must any of its recursively called
> > diff --git a/include/block/aio-wait.h b/include/block/aio-wait.h
> > index cf5e8bde1c..efbcb9777a 100644
> > --- a/include/block/aio-wait.h
> > +++ b/include/block/aio-wait.h
> > @@ -28,6 +28,8 @@
> >  #include "block/aio.h"
> >  #include "qemu/main-loop.h"
> >  
> > +#define AIO_WAIT_INTERVAL 10  /* ms */
> > +
> >  /**
> >   * AioWait:
> >   *
> > @@ -56,6 +58,11 @@ typedef struct {
> >      unsigned num_waiters;
> >  } AioWait;
> >  
> > +typedef struct {
> > +    struct QEMUTimer *timer;
> 
> struct is not necessary since QEMUTimer is a typedef:
> 
>   QEMUTimer *timer;
> 
> Also, can this be a struct field instead of a pointer by using
> aio_timer_init_ms() instead of aio_timer_new()?

OK, I will make this change in patch v2.
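
For reference, a rough sketch of the embedded-timer variant (untested; shown
with aio_timer_init(), which takes an explicit SCALE_MS):

  typedef struct {
      QEMUTimer timer;      /* embedded instead of heap-allocated */
      int64_t interval;
  } AioWaitTimer;

  aio_timer_init(ctx_, &s_->timer, QEMU_CLOCK_REALTIME, SCALE_MS,
                 aio_wait_timer_retry, s_);
  /* ... */
  timer_del(&s_->timer);    /* timer_free() is only for aio_timer_new() */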

> 
> > +    int64_t interval;
> > +} AioWaitTimer;
> > +
> >  extern AioWait global_aio_wait;
> >  
> >  /**
> > @@ -99,6 +106,55 @@ extern AioWait global_aio_wait;
> >      qatomic_dec(&wait_->num_waiters);                              \
> >      waited_; })
> >  
> > +/**
> > + * AIO_WAIT_WHILE_TIMEOUT:
> > + *
> > + * Refer to the implementation of AIO_WAIT_WHILE_INTERNAL,
> > + * the timeout parameter is added.
> > + */
> > +#define AIO_WAIT_WHILE_TIMEOUT(ctx, cond, timeout) ({                    \
> > +    int ret_ = 0;                                                        \
> > +    AioWait *wait_ = &global_aio_wait;                                   \
> > +    AioContext *ctx_ = (ctx);                                            \
> > +    int64_t start_ = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);             \
> > +    int64_t deadline_ = start_ + (timeout);                              \
> > +    /* Ensure that the aio_poll exits periodically to check timeout. */  \
> > +    AioWaitTimer *s_ = g_malloc0(sizeof(AioWaitTimer));                  \
> 
> Can this be allocated on the stack instead of the heap?

Yes, it's certainly better.
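
Something like this, for example (sketch only):

  AioWaitTimer s_ = { .interval = AIO_WAIT_INTERVAL };   /* on the stack */
  /* ... pass &s_ as the opaque pointer and use &s_.timer ... */
  timer_del(&s_.timer);     /* and no g_free() at the end */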

> 
> > +    s_->interval = AIO_WAIT_INTERVAL;                                    \
> > +    /* Increment wait_->num_waiters before evaluating cond. */           \
> > +    qatomic_inc(&wait_->num_waiters);                                    \
> > +    /* Paired with smp_mb in aio_wait_kick(). */                         \
> > +    smp_mb__after_rmw();                                                 \
> > +    if (ctx_ && in_aio_context_home_thread(ctx_)) {                      \
> > +        s_->timer = aio_timer_new(ctx_, QEMU_CLOCK_REALTIME,             \
> > +                        SCALE_MS, aio_wait_timer_retry, s_);             \
> > +        aio_wait_timer_retry(s_);                                        \
> > +        while ((cond)) {                                                 \
> > +            aio_poll(ctx_, true);                                        \
> > +            if (qemu_clock_get_ms(QEMU_CLOCK_REALTIME) > deadline_) {    \
> > +                ret_ = -ETIMEDOUT;                                       \
> > +                break;                                                   \
> > +            }                                                            \
> > +        }                                                                \
> 
> What is the purpose of interval?
> 
> I expected the timer's callback function to be an empty function that is
> called when the deadline expires. The while loop here would use
> timer_pending() to check for expiry instead of explicitly checking
> against the deadline.

This is a really good idea; it resolves some of my doubts and makes everything
look better.
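
For illustration, a rough sketch of the new AIO_WAIT_WHILE_TIMEOUT() body
(untested; the callback name is made up):

  static void aio_wait_deadline_cb(void *opaque)
  {
      /* Intentionally empty: the timer firing is enough to make
       * aio_poll() return so the loop can re-check expiry. */
  }

  ...
      aio_timer_init(ctx_, &s_.timer, QEMU_CLOCK_REALTIME, SCALE_MS,
                     aio_wait_deadline_cb, &s_);
      timer_mod(&s_.timer,
                qemu_clock_get_ms(QEMU_CLOCK_REALTIME) + timeout_ms);
      while ((cond)) {
          if (!timer_pending(&s_.timer)) {
              ret_ = -ETIMEDOUT;
              break;
          }
          aio_poll(ctx_, true);
      }
      timer_del(&s_.timer);

This also removes the need for AIO_WAIT_INTERVAL and the periodic retries.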

> 
> > +    } else {                                                             \
> > +        s_->timer = aio_timer_new(qemu_get_aio_context(),                \
> > +            QEMU_CLOCK_REALTIME, SCALE_MS, aio_wait_timer_retry, s_);    \
> > +        aio_wait_timer_retry(s_);                                        \
> > +        while ((cond)) {                                                 \
> > +            assert(qemu_get_current_aio_context() ==                     \
> > +                   qemu_get_aio_context());                              \
> > +            aio_poll(qemu_get_aio_context(), true);                      \
> > +            if (qemu_clock_get_ms(QEMU_CLOCK_REALTIME) > deadline_) {    \
> > +                ret_ = -ETIMEDOUT;                                       \
> > +                break;                                                   \
> > +            }                                                            \
> > +        }                                                                \
> > +    }                                                                    \
> > +    qatomic_dec(&wait_->num_waiters);                                    \
> > +    timer_free(s_->timer);                                               \
> 
> This will need to be timer_del() when the QEMUTimer is moved onto the
> stack.
> 
> > +    g_free(s_);                                                          \
> > +    ret_; })
> > +
> >  #define AIO_WAIT_WHILE(ctx, cond)                                  \
> >      AIO_WAIT_WHILE_INTERNAL(ctx, cond)
> >  
> > @@ -149,4 +205,6 @@ static inline bool in_aio_context_home_thread(AioContext *ctx)
> >      }
> >  }
> >  
> > +void aio_wait_timer_retry(void *opaque);
> > +
> >  #endif /* QEMU_AIO_WAIT_H */
> > diff --git a/include/block/block-io.h b/include/block/block-io.h
> > index b49e0537dd..84f92d2b09 100644
> > --- a/include/block/block-io.h
> > +++ b/include/block/block-io.h
> > @@ -354,6 +354,11 @@ bdrv_co_copy_range(BdrvChild *src, int64_t src_offset,
> >      AIO_WAIT_WHILE(bdrv_get_aio_context(bs_),              \
> >                     cond); })
> >  
> > +#define BDRV_POLL_WHILE_TIMEOUT(bs, cond, timeout) ({      \
> > +    BlockDriverState *bs_ = (bs);                          \
> > +    AIO_WAIT_WHILE_TIMEOUT(bdrv_get_aio_context(bs_),      \
> > +                           cond, timeout); })
> > +
> >  void bdrv_drain(BlockDriverState *bs);
> >  
> >  int co_wrapper_mixed_bdrv_rdlock
> > @@ -431,6 +436,8 @@ bdrv_drain_poll(BlockDriverState *bs, BdrvChild *ignore_parent,
> >   */
> >  void bdrv_drained_begin(BlockDriverState *bs);
> >  
> > +int bdrv_drained_begin_timeout(BlockDriverState *bs, int64_t timeout_ms);
> > +
> >  /**
> >   * bdrv_do_drained_begin_quiesce:
> >   *
> > diff --git a/util/aio-wait.c b/util/aio-wait.c
> > index b5336cf5fd..9aed165529 100644
> > --- a/util/aio-wait.c
> > +++ b/util/aio-wait.c
> > @@ -84,3 +84,10 @@ void aio_wait_bh_oneshot(AioContext *ctx, QEMUBHFunc *cb, void *opaque)
> >      aio_bh_schedule_oneshot(ctx, aio_wait_bh, &data);
> >      AIO_WAIT_WHILE_UNLOCKED(NULL, !data.done);
> >  }
> > +
> > +void aio_wait_timer_retry(void *opaque)
> > +{
> > +    AioWaitTimer *s = opaque;
> > +
> > +    timer_mod(s->timer, qemu_clock_get_ms(QEMU_CLOCK_REALTIME) + s->interval);
> > +}
> > -- 
> > 2.33.0
> > 



* Re: [PATCH 2/2] qapi: Fix qmp_block_set_io_throttle blocked for too long
  2025-03-13  4:25   ` Stefan Hajnoczi
@ 2025-03-17 12:59     ` zoudongjie via
  0 siblings, 0 replies; 11+ messages in thread
From: zoudongjie via @ 2025-03-17 12:59 UTC (permalink / raw)
  To: stefanha
  Cc: fam, hreitz, kwolf, qemu-block, qemu-devel, qemu-stable,
	zhuyangyang14, luolongmin, suxiaodong1, wangyan122, yebiaoxiang,
	wangjian161, mujinsheng, alex.chen, eric.fangyi, zoudongjie,
	chenjianfei3, renxuming

On Thu, 13 Mar, 2025 at 12:25:08 +0800, Stefan Hajnoczi wrote:
> On Sat, Mar 08, 2025 at 06:16:18PM +0800, zoudongjie wrote:
> > From: Zhu Yangyang <zhuyangyang14@huawei.com>
> > 
> > bdrv_drained_begin() is blocked for a long time when network storage is used
> > and the network link has just failed.
> > Therefore, the timeout period is set here.
> > 
> > Signed-off-by: Zhu Yangyang <zhuyangyang14@huawei.com>
> > ---
> >  block/block-backend.c                       | 14 +++++++++++++-
> >  block/qapi-system.c                         |  7 ++++++-
> >  include/system/block-backend-global-state.h |  1 +
> >  3 files changed, 20 insertions(+), 2 deletions(-)
> > 
> > diff --git a/block/block-backend.c b/block/block-backend.c
> > index 9288f7e1c6..409d4559c3 100644
> > --- a/block/block-backend.c
> > +++ b/block/block-backend.c
> > @@ -2691,20 +2691,32 @@ void blk_set_io_limits(BlockBackend *blk, ThrottleConfig *cfg)
> >  }
> >  
> >  void blk_io_limits_disable(BlockBackend *blk)
> > +{
> > +    blk_io_limits_disable_timeout(blk, -1);
> > +}
> > +
> > +int blk_io_limits_disable_timeout(BlockBackend *blk, int64_t timeout_ms)
> >  {
> >      BlockDriverState *bs = blk_bs(blk);
> >      ThrottleGroupMember *tgm = &blk->public.throttle_group_member;
> > +    int ret = 0;
> > +
> >      assert(tgm->throttle_state);
> >      GLOBAL_STATE_CODE();
> >      if (bs) {
> >          bdrv_ref(bs);
> > -        bdrv_drained_begin(bs);
> > +        ret = bdrv_drained_begin_timeout(bs, timeout_ms);
> > +        if (ret < 0) {
> > +            goto out;
> > +        }
> >      }
> >      throttle_group_unregister_tgm(tgm);
> > +out:
> >      if (bs) {
> >          bdrv_drained_end(bs);
> >          bdrv_unref(bs);
> >      }
> > +    return ret;
> >  }
> >  
> >  /* should be called before blk_set_io_limits if a limit is set */
> > diff --git a/block/qapi-system.c b/block/qapi-system.c
> > index 54b7409b2b..96fa8e5e51 100644
> > --- a/block/qapi-system.c
> > +++ b/block/qapi-system.c
> > @@ -39,6 +39,8 @@
> >  #include "system/block-backend.h"
> >  #include "system/blockdev.h"
> >  
> > +#define QMP_BLOCKING_TIMEOUT 5000 /* ms */
> > +
> >  static BlockBackend *qmp_get_blk(const char *blk_name, const char *qdev_id,
> >                                   Error **errp)
> >  {
> > @@ -502,7 +504,10 @@ void qmp_block_set_io_throttle(BlockIOThrottle *arg, Error **errp)
> >          blk_set_io_limits(blk, &cfg);
> >      } else if (blk_get_public(blk)->throttle_group_member.throttle_state) {
> >          /* If all throttling settings are set to 0, disable I/O limits */
> > -        blk_io_limits_disable(blk);
> > +        if (blk_io_limits_disable_timeout(blk, QMP_BLOCKING_TIMEOUT) < 0) {
> 
> Please add a timeout parameter to the QMP command instead of hardcoding
> the timeout. It should also support an infinite timeout and that should
> be the default (so that existing users aren't surprised with a new error
> they don't handle).

Okay, I will make this change in patch v2.
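
For illustration, the handler side might then look roughly like this (a sketch
only; it assumes an optional 'timeout-ms' argument in the QAPI schema, which
would generate has_timeout_ms/timeout_ms in BlockIOThrottle):

  /* An absent timeout means wait forever, preserving today's behaviour. */
  int64_t timeout_ms = arg->has_timeout_ms ? arg->timeout_ms : -1;

  if (blk_io_limits_disable_timeout(blk, timeout_ms) < 0) {
      error_setg(errp, "timed out while draining in-flight I/O");
      return;
  }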

> 
> > +            error_setg(errp, "Blk io limits disable timeout");
> > +            return;
> > +        }
> >      }
> >  }
> >  
> > diff --git a/include/system/block-backend-global-state.h b/include/system/block-backend-global-state.h
> > index 9cc9b008ec..e55ea9dc64 100644
> > --- a/include/system/block-backend-global-state.h
> > +++ b/include/system/block-backend-global-state.h
> > @@ -111,6 +111,7 @@ int blk_probe_geometry(BlockBackend *blk, HDGeometry *geo);
> >  
> >  void blk_set_io_limits(BlockBackend *blk, ThrottleConfig *cfg);
> >  void blk_io_limits_disable(BlockBackend *blk);
> > +int blk_io_limits_disable_timeout(BlockBackend *blk, int64_t timeout_ms);
> >  void blk_io_limits_enable(BlockBackend *blk, const char *group);
> >  void blk_io_limits_update_group(BlockBackend *blk, const char *group);
> >  void blk_set_force_allow_inactivate(BlockBackend *blk);
> > -- 
> > 2.33.0
> > 



* Re: [PATCH 1/2] io/block: Refactoring the bdrv_drained_begin() function and implement a timeout mechanism.
  2025-03-17 12:18     ` zoudongjie via
@ 2025-03-17 14:57       ` Stefan Hajnoczi
  0 siblings, 0 replies; 11+ messages in thread
From: Stefan Hajnoczi @ 2025-03-17 14:57 UTC (permalink / raw)
  To: zoudongjie
  Cc: fam, hreitz, kwolf, qemu-block, qemu-devel, qemu-stable,
	zhuyangyang14, luolongmin, suxiaodong1, wangyan122, yebiaoxiang,
	wangjian161, mujinsheng, alex.chen, eric.fangyi, chenjianfei3,
	renxuming


On Mon, Mar 17, 2025 at 08:18:28PM +0800, zoudongjie wrote:
> On Thu, 13 Mar, 2025 at 12:09:45 +0800, Stefan Hajnoczi wrote:
> > On Sat, Mar 08, 2025 at 06:16:17PM +0800, zoudongjie wrote:
> > > @@ -342,16 +350,25 @@ static void coroutine_fn bdrv_co_yield_to_drain(BlockDriverState *bs,
> > >      /* If we are resumed from some other event (such as an aio completion or a
> > >       * timer callback), it is a bug in the caller that should be fixed. */
> > >      assert(data.done);
> > > +    return data.ret;
> > >  }
> > >  
> > > -static void bdrv_do_drained_begin(BlockDriverState *bs, BdrvChild *parent,
> > > -                                  bool poll)
> > > +static void coroutine_fn bdrv_co_yield_to_drain(BlockDriverState *bs,
> > > +                                                bool begin,
> > > +                                                BdrvChild *parent,
> > > +                                                bool poll)
> > > +{
> > > +    bdrv_co_yield_to_drain_timeout(bs, begin, parent, poll, -1);
> > 
> > Is this safe on 32-bit platforms?
> 
> Sorry, could you be more specific here? I didn't quite get it.

I was thinking about -1 vs -1ull integer literals, but it's not a
problem for int64_t so everything is fine here.
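
For example:

  int64_t timeout_ms = -1;   /* int -1 is sign-extended to INT64_C(-1) */
  assert(timeout_ms < 0);    /* holds on 32-bit and 64-bit hosts alike */

A -1ull literal would instead be UINT64_MAX, which is where the confusion
could have crept in.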

Stefan



end of thread (newest message: 2025-03-17 14:58 UTC)

Thread overview: 11+ messages:
2025-03-08 10:16 [PATCH 0/2] qapi: Fix qmp_block_set_io_throttle blocked for too long zoudongjie via
2025-03-08 10:16 ` [PATCH 1/2] io/block: Refactoring the bdrv_drained_begin() function and implement a timeout mechanism zoudongjie via
2025-03-13  4:09   ` Stefan Hajnoczi
2025-03-17 12:18     ` zoudongjie via
2025-03-17 14:57       ` Stefan Hajnoczi
2025-03-13  4:22   ` Stefan Hajnoczi
2025-03-08 10:16 ` [PATCH 2/2] qapi: Fix qmp_block_set_io_throttle blocked for too long zoudongjie via
2025-03-13  4:25   ` Stefan Hajnoczi
2025-03-17 12:59     ` zoudongjie via
2025-03-11  3:24 ` [PATCH 0/2] " zoudongjie via
2025-03-13  4:27 ` Stefan Hajnoczi
