* [PATCH] block: add single-check guard in throttle_group_restart_queue to address race with schedule_next_request
@ 2025-11-14 2:11 luzhipeng
2025-11-14 2:11 ` [PATCH] mirror: Optimize mirroring for zero blocks in mirror_read_complete luzhipeng
0 siblings, 1 reply; 6+ messages in thread
From: luzhipeng @ 2025-11-14 2:11 UTC (permalink / raw)
To: qemu-block; +Cc: Alberto Garcia, Kevin Wolf, Hanna Reitz, qemu-devel, luzhipeng
A race condition exists between throttle_group_restart_queue() and
schedule_next_request(): when multiple ThrottleGroupMembers in the same
throttle group are assigned to different IOThreads, concurrent execution
can cause schedule_next_request() to re-arm a throttle timer while
throttle_group_restart_queue() is being called (e.g., from a timer
callback or external restart). This violates the assumption that no
timer is pending upon entry to throttle_group_restart_queue(), triggering
an assertion failure and causing QEMU to abort.
This patch replaces the assert with a single early-return check:
if the timer for the given direction is already pending, the function
returns immediately. This prevents duplicate coroutine scheduling and
avoids crashes under race conditions, without altering the core
(non-thread-safe) throttle group logic.
For details, see: https://gitlab.com/qemu-project/qemu/-/issues/3194
Signed-off-by: luzhipeng <luzhipeng@cestc.cn>
---
block/throttle-groups.c | 7 +++----
1 file changed, 3 insertions(+), 4 deletions(-)
diff --git a/block/throttle-groups.c b/block/throttle-groups.c
index 66fdce9a90..9dcc6b4923 100644
--- a/block/throttle-groups.c
+++ b/block/throttle-groups.c
@@ -430,15 +430,14 @@ static void throttle_group_restart_queue(ThrottleGroupMember *tgm,
ThrottleDirection direction)
{
Coroutine *co;
+ if (timer_pending(tgm->throttle_timers.timers[direction])) {
+ return;
+ }
RestartData *rd = g_new0(RestartData, 1);
rd->tgm = tgm;
rd->direction = direction;
- /* This function is called when a timer is fired or when
- * throttle_group_restart_tgm() is called. Either way, there can
- * be no timer pending on this tgm at this point */
- assert(!timer_pending(tgm->throttle_timers.timers[direction]));
qatomic_inc(&tgm->restart_pending);
--
2.31.1
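With the guard applied, the entry of throttle_group_restart_queue() reads
roughly as follows. This is a sketch assembled from the hunk above; the
remainder of the function (coroutine creation and scheduling) is not touched
by the patch and is elided here.

static void throttle_group_restart_queue(ThrottleGroupMember *tgm,
                                         ThrottleDirection direction)
{
    Coroutine *co;

    /*
     * If a timer for this direction is already armed, a concurrent
     * schedule_next_request() has queued a restart; returning here avoids
     * scheduling a duplicate coroutine instead of asserting and aborting.
     */
    if (timer_pending(tgm->throttle_timers.timers[direction])) {
        return;
    }

    RestartData *rd = g_new0(RestartData, 1);

    rd->tgm = tgm;
    rd->direction = direction;

    qatomic_inc(&tgm->restart_pending);

    /* ... coroutine creation and scheduling, unchanged by this patch ... */
}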
* [PATCH] mirror: Optimize mirroring for zero blocks in mirror_read_complete
2025-11-14 2:11 [PATCH] block: add single-check guard in throttle_group_restart_queue to address race with schedule_next_request luzhipeng
@ 2025-11-14 2:11 ` luzhipeng
0 siblings, 0 replies; 6+ messages in thread
From: luzhipeng @ 2025-11-14 2:11 UTC (permalink / raw)
To: qemu-block; +Cc: Alberto Garcia, Kevin Wolf, Hanna Reitz, qemu-devel, luzhipeng
When mirroring data blocks, detect if the read data consists entirely of
zeros. If so, use blk_co_pwrite_zeroes() instead of regular write to
improve performance.
Signed-off-by: luzhipeng <luzhipeng@cestc.cn>
---
block/mirror.c | 28 +++++++++++++++++++++++++++-
1 file changed, 27 insertions(+), 1 deletion(-)
diff --git a/block/mirror.c b/block/mirror.c
index b344182c74..535112f65d 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -269,6 +269,32 @@ static void coroutine_fn mirror_read_complete(MirrorOp *op, int ret)
return;
}
+ /* Check if the read data is all zeros */
+ bool is_zero = true;
+ for (int i = 0; i < op->qiov.niov; i++) {
+ if (!buffer_is_zero(op->qiov.iov[i].iov_base,
+ op->qiov.iov[i].iov_len)) {
+ is_zero = false;
+ break;
+ }
+ }
+
+ /* Write to target - optimized path for zero blocks */
+ if (is_zero) {
+ /*
+ * Use zero-writing interface which may:
+ * 1. Avoid actual data transfer
+ * 2. Enable storage-level optimizations
+ * 3. Potentially unmap blocks (if supported)
+ */
+ ret = blk_co_pwrite_zeroes(s->target, op->offset,
+ op->qiov.size,
+ BDRV_REQ_MAY_UNMAP);
+ } else {
+ /* Normal data write path */
+ ret = blk_co_pwritev(s->target, op->offset,
+ op->qiov.size, &op->qiov, 0);
+ }
+
- ret = blk_co_pwritev(s->target, op->offset, op->qiov.size, &op->qiov, 0);
mirror_write_complete(op, ret);
}
--
2.45.1.windows.1
* [PATCH] mirror: Optimize mirroring for zero blocks in mirror_read_complete
2025-09-22 11:36 luzhipeng
@ 2025-09-22 11:36 ` luzhipeng
2025-09-22 12:53 ` Eric Blake
0 siblings, 1 reply; 6+ messages in thread
From: luzhipeng @ 2025-09-22 11:36 UTC (permalink / raw)
To: qemu-block; +Cc: John Snow, Kevin Wolf, Hanna Reitz, qemu-devel, luzhipeng
When mirroring data blocks, detect if the read data consists entirely of
zeros. If so, use blk_co_pwrite_zeroes() instead of regular write to
improve performance.
Signed-off-by: luzhipeng <luzhipeng@cestc.cn>
---
block/mirror.c | 28 +++++++++++++++++++++++++++-
1 file changed, 27 insertions(+), 1 deletion(-)
diff --git a/block/mirror.c b/block/mirror.c
index b344182c74..535112f65d 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -269,6 +269,32 @@ static void coroutine_fn mirror_read_complete(MirrorOp *op, int ret)
return;
}
+ /* Check if the read data is all zeros */
+ bool is_zero = true;
+ for (int i = 0; i < op->qiov.niov; i++) {
+ if (!buffer_is_zero(op->qiov.iov[i].iov_base,
+ op->qiov.iov[i].iov_len)) {
+ is_zero = false;
+ break;
+ }
+ }
+
+ /* Write to target - optimized path for zero blocks */
+ if (is_zero) {
+ /*
+ * Use zero-writing interface which may:
+ * 1. Avoid actual data transfer
+ * 2. Enable storage-level optimizations
+ * 3. Potentially unmap blocks (if supported)
+ */
+ ret = blk_co_pwrite_zeroes(s->target, op->offset,
+ op->qiov.size,
+ BDRV_REQ_MAY_UNMAP);
+ } else {
+ /* Normal data write path */
+ ret = blk_co_pwritev(s->target, op->offset,
+ op->qiov.size, &op->qiov, 0);
+ }
+
- ret = blk_co_pwritev(s->target, op->offset, op->qiov.size, &op->qiov, 0);
mirror_write_complete(op, ret);
}
--
2.45.1.windows.1
* Re: [PATCH] mirror: Optimize mirroring for zero blocks in mirror_read_complete
2025-09-22 11:36 ` luzhipeng
@ 2025-09-22 12:53 ` Eric Blake
2025-11-17 17:35 ` Kevin Wolf
0 siblings, 1 reply; 6+ messages in thread
From: Eric Blake @ 2025-09-22 12:53 UTC (permalink / raw)
To: luzhipeng; +Cc: qemu-block, John Snow, Kevin Wolf, Hanna Reitz, qemu-devel
On Mon, Sep 22, 2025 at 07:36:55PM +0800, luzhipeng wrote:
> When mirroring data blocks, detect if the read data consists entirely of
> zeros. If so, use blk_co_pwrite_zeroes() instead of regular write to
> improve performance.
>
> Signed-off-by: luzhipeng <luzhipeng@cestc.cn>
> ---
> block/mirror.c | 28 +++++++++++++++++++++++++++-
> 1 file changed, 27 insertions(+), 1 deletion(-)
First, some minor observations:
>
> diff --git a/block/mirror.c b/block/mirror.c
> index b344182c74..535112f65d 100644
> --- a/block/mirror.c
> +++ b/block/mirror.c
> @@ -269,6 +269,32 @@ static void coroutine_fn mirror_read_complete(MirrorOp *op, int ret)
> return;
> }
>
> + /* Check if the read data is all zeros */
> + bool is_zero = true;
> + for (int i = 0; i < op->qiov.niov; i++) {
> + if (!buffer_is_zero(op->qiov.iov[i].iov_base,
> + op->qiov.iov[i].iov_len)) {
Indentation looks off here.
> + is_zero = false;
> + break;
> + }
> + }
> +
> + /* Write to target - optimized path for zero blocks */
> + if (is_zero) {
> + /*
> + * Use zero-writing interface which may:
> + * 1. Avoid actual data transfer
> + * 2. Enable storage-level optimizations
> + * 3. Potentially unmap blocks (if supported)
> + */
> + ret = blk_co_pwrite_zeroes(s->target, op->offset,
> + op->qiov.size,
> + BDRV_REQ_MAY_UNMAP);
...and here.
> + } else {
> + /* Normal data write path */
> + ret = blk_co_pwritev(s->target, op->offset,
> + op->qiov.size, &op->qiov, 0);
...here too.
> + }
> +
> - ret = blk_co_pwritev(s->target, op->offset, op->qiov.size, &op->qiov, 0);
> mirror_write_complete(op, ret);
> }
Then a higher-level question. Should we teach blk_co_pwritev() to
have a flag that ANY caller can set to request write-zero
optimizations, rather than having to open-code a check and call to
blk_co_pwrite_zeroes() at every call-site that might benefit?
Optimizing to a write zero is often nice, but using BDRV_REQ_MAY_UNMAP
may break use cases that have explicitly requested full allocation.
The more we can consolidate all of the decisions about whether or not
to use write zeroes, and whether or not to allow unmap in that
attempt, into a single common function rather than duplicated out at
every call site that might benefit, the easier it will be to maintain
the code down the road.
Thus, I'm wondering if it might be better to introduce a new BDRV_REQ_
flag for passing to blk_co_pwritev for deciding to trigger potential
write-zero optimizations.
--
Eric Blake, Principal Software Engineer
Red Hat, Inc.
Virtualization: qemu.org | libguestfs.org
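A sketch of the call-site shape Eric is suggesting. The flag name
BDRV_REQ_DETECT_ZEROES below is hypothetical and does not exist in QEMU; it
only illustrates moving the all-zero check into the common write path:

    /*
     * Hypothetical flag (illustration only): let the block layer detect an
     * all-zero payload and convert the request into a write-zeroes
     * operation, instead of open-coding buffer_is_zero() at the call site.
     */
    ret = blk_co_pwritev(s->target, op->offset, op->qiov.size, &op->qiov,
                         BDRV_REQ_DETECT_ZEROES);
    mirror_write_complete(op, ret);

Whether such a flag should also imply BDRV_REQ_MAY_UNMAP is exactly the
policy question raised above.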
* Re: [PATCH] mirror: Optimize mirroring for zero blocks in mirror_read_complete
2025-09-22 12:53 ` Eric Blake
@ 2025-11-17 17:35 ` Kevin Wolf
0 siblings, 0 replies; 6+ messages in thread
From: Kevin Wolf @ 2025-11-17 17:35 UTC (permalink / raw)
To: Eric Blake; +Cc: luzhipeng, qemu-block, John Snow, Hanna Reitz, qemu-devel
Am 22.09.2025 um 14:53 hat Eric Blake geschrieben:
> Then a higher-level question. Should we teach blk_co_pwritev() to
> have a flag that ANY caller can set to request write-zero
> optimizations, rather than having to open-code a check and call to
> blk_co_pwrite_zeroes() at every call-site that might benefit?
>
> Optimizing to a write zero is often nice, but using BDRV_REQ_MAY_UNMAP
> may break use cases that have explicitly requested full allocation.
> The more we can consolidate all of the decisions about whether or not
> to use write zeroes, and whether or not to allow unmap in that
> attempt, into a single common function rather than duplicated out at
> every call site that might benefit, the easier it will be to maintain
> the code down the road.
>
> Thus, I'm wondering if it might be better to introduce a new BDRV_REQ_
> flag for passing to blk_co_pwritev for deciding to trigger potential
> write-zero optimizations.
We already have the detect-zeroes=on|off|unmap option, which is used in
bdrv_aligned_pwritev() to enable this kind of optimisation. I think we
should reuse this code in some way. We also should configure the mirror
job in a similar way, adding a new detect-zeroes=on|off|unmap option to
blockdev-mirror (and off will stay the default).
One option is, as you say, that we introduce BDRV_REQ_* flags that could
take effect even when the BlockDriverState isn't configured to detect
zeroes. For a three-way option, we'd need a combination of two flags,
though, or a two-bit field. I'm not sure how much I like this.
The other option is that you just set detect-zeroes on the target node
that mirror points to. What makes this a bit harder is that mirror only
owns the BlockBackend, but no node on the target side (on the source
side, we have the mirror_top filter), and suddenly inserting new nodes
isn't great either. So this is probably something that mirror can't do
for you, but you as the user would have to configure the target node
explicitly this way.
Kevin
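The detect-zeroes handling Kevin refers to lives in bdrv_aligned_pwritev()
in block/io.c. In rough outline it looks like the following; this is a
simplified sketch, and the exact conditions and helper signatures may differ
between QEMU versions:

    /*
     * When the node is configured with detect-zeroes=on|unmap, an all-zero
     * write is converted into a write-zeroes request, optionally allowing
     * the blocks to be unmapped.
     */
    if (bs->detect_zeroes != BLOCKDEV_DETECT_ZEROES_OPTIONS_OFF &&
        !(flags & BDRV_REQ_ZERO_WRITE) &&
        qemu_iovec_is_zero(qiov, qiov_offset, bytes)) {
        flags |= BDRV_REQ_ZERO_WRITE;
        if (bs->detect_zeroes == BLOCKDEV_DETECT_ZEROES_OPTIONS_UNMAP) {
            flags |= BDRV_REQ_MAY_UNMAP;
        }
    }

Configuring detect-zeroes=on or detect-zeroes=unmap on the mirror target
node, as Kevin suggests, routes the mirror job's writes through this same
check without any mirror-specific code.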