* [PATCH] mirror: Fix missed dirty bitmap writes during startup
@ 2026-02-19 20:24 Kevin Wolf
2026-02-20 14:00 ` Fiona Ebner
` (2 more replies)
0 siblings, 3 replies; 15+ messages in thread
From: Kevin Wolf @ 2026-02-19 20:24 UTC (permalink / raw)
To: qemu-block
Cc: kwolf, hreitz, f.ebner, vsementsov, jsnow, jean-louis,
dionbosschieter, qemu-devel, qemu-stable
Currently, mirror disables the block layer's dirty bitmap before its own
replacement is working. This means that during startup, there is a
window in which the allocation status of blocks in the source has
already been checked, but new writes coming in aren't tracked yet,
resulting in a corrupted copy:
1. Dirty bitmap is disabled in mirror_start_job()
2. Some request are started in mirror_top_bs while s->job == NULL
3. mirror_dirty_init() -> bdrv_co_is_allocated_above() runs and because
the request hasn't completed yet, the block isn't allocated
4. The request completes, still sees s->job == NULL and skips the
bitmap, and nothing else will mark it dirty either
One ingredient is that mirror_top_opaque->job is only set after the
job is fully initialized. For the rationale, see commit 32125b1460
("mirror: Fix access of uninitialised fields during start").
Fix this by giving mirror_top_bs access to dirty_bitmap and enabling it
to track writes from the beginning. Disabling the block layer's tracking
and enabling the mirror_top_bs one happens in a drained section, so
there is no danger of races with in-flight requests any more. All of
this happens well before the block allocation status is checked, so we
can be sure that no writes will be missed.
Cc: qemu-stable@nongnu.org
Closes: https://gitlab.com/qemu-project/qemu/-/issues/3273
Fixes: 32125b14606a ('mirror: Fix access of uninitialised fields during start')
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
Supersedes: <20260212120411.369498-1-f.ebner@proxmox.com>
---
block/mirror.c | 48 +++++++++++++++++++++++++++++-------------------
1 file changed, 29 insertions(+), 19 deletions(-)
diff --git a/block/mirror.c b/block/mirror.c
index b344182c747..f38636e7457 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -99,6 +99,7 @@ typedef struct MirrorBlockJob {
typedef struct MirrorBDSOpaque {
MirrorBlockJob *job;
+ BdrvDirtyBitmap *dirty_bitmap;
bool stop;
bool is_commit;
} MirrorBDSOpaque;
@@ -1672,9 +1673,9 @@ bdrv_mirror_top_do_write(BlockDriverState *bs, MirrorMethod method,
abort();
}
- if (!copy_to_target && s->job && s->job->dirty_bitmap) {
+ if (!copy_to_target) {
qatomic_set(&s->job->actively_synced, false);
- bdrv_set_dirty_bitmap(s->job->dirty_bitmap, offset, bytes);
+ bdrv_set_dirty_bitmap(s->dirty_bitmap, offset, bytes);
}
if (ret < 0) {
@@ -1901,13 +1902,35 @@ static BlockJob *mirror_start_job(
bdrv_drained_begin(bs);
ret = bdrv_append(mirror_top_bs, bs, errp);
- bdrv_drained_end(bs);
-
if (ret < 0) {
+ bdrv_drained_end(bs);
+ bdrv_unref(mirror_top_bs);
+ return NULL;
+ }
+
+ bs_opaque->dirty_bitmap = bdrv_create_dirty_bitmap(mirror_top_bs,
+ granularity,
+ NULL, errp);
+ if (!bs_opaque->dirty_bitmap) {
+ bdrv_drained_end(bs);
bdrv_unref(mirror_top_bs);
return NULL;
}
+ /*
+ * The mirror job doesn't use the block layer's dirty tracking because it
+ * needs to be able to switch seemlessly between background copy mode (which
+ * does need dirty tracking) and write blocking mode (which doesn't) and
+ * doing that would require draining the node. Instead, mirror_top_bs takes
+ * care of updating the dirty bitmap as appropriate.
+ *
+ * Note that write blocking mode only becomes effective after mirror_run()
+ * sets mirror_top_opaque->job (see should_copy_to_target()). Until then,
+ * we're still in background copy mode irrespective of @copy_mode.
+ */
+ bdrv_disable_dirty_bitmap(bs_opaque->dirty_bitmap);
+ bdrv_drained_end(bs);
+
/* Make sure that the source is not resized while the job is running */
s = block_job_create(job_id, driver, NULL, mirror_top_bs,
BLK_PERM_CONSISTENT_READ,
@@ -2002,24 +2025,13 @@ static BlockJob *mirror_start_job(
s->base_overlay = bdrv_find_overlay(bs, base);
s->granularity = granularity;
s->buf_size = ROUND_UP(buf_size, granularity);
+ s->dirty_bitmap = bs_opaque->dirty_bitmap;
s->unmap = unmap;
if (auto_complete) {
s->should_complete = true;
}
bdrv_graph_rdunlock_main_loop();
- s->dirty_bitmap = bdrv_create_dirty_bitmap(s->mirror_top_bs, granularity,
- NULL, errp);
- if (!s->dirty_bitmap) {
- goto fail;
- }
-
- /*
- * The dirty bitmap is set by bdrv_mirror_top_do_write() when not in active
- * mode.
- */
- bdrv_disable_dirty_bitmap(s->dirty_bitmap);
-
bdrv_graph_wrlock_drained();
ret = block_job_add_bdrv(&s->common, "source", bs, 0,
BLK_PERM_WRITE_UNCHANGED | BLK_PERM_WRITE |
@@ -2099,9 +2111,6 @@ fail:
g_free(s->replaces);
blk_unref(s->target);
bs_opaque->job = NULL;
- if (s->dirty_bitmap) {
- bdrv_release_dirty_bitmap(s->dirty_bitmap);
- }
job_early_fail(&s->common.job);
}
@@ -2115,6 +2124,7 @@ fail:
bdrv_graph_wrunlock();
bdrv_drained_end(bs);
+ bdrv_release_dirty_bitmap(bs_opaque->dirty_bitmap);
bdrv_unref(mirror_top_bs);
return NULL;
--
2.53.0
^ permalink raw reply related [flat|nested] 15+ messages in thread* Re: [PATCH] mirror: Fix missed dirty bitmap writes during startup 2026-02-19 20:24 [PATCH] mirror: Fix missed dirty bitmap writes during startup Kevin Wolf @ 2026-02-20 14:00 ` Fiona Ebner 2026-02-24 13:58 ` Kevin Wolf 2026-03-05 18:34 ` Vladimir Sementsov-Ogievskiy 2026-03-08 8:25 ` Michael Tokarev 2 siblings, 1 reply; 15+ messages in thread From: Fiona Ebner @ 2026-02-20 14:00 UTC (permalink / raw) To: Kevin Wolf, qemu-block Cc: hreitz, vsementsov, jsnow, jean-louis, dionbosschieter, qemu-devel, qemu-stable Am 19.02.26 um 9:25 PM schrieb Kevin Wolf: > @@ -1672,9 +1673,9 @@ bdrv_mirror_top_do_write(BlockDriverState *bs, MirrorMethod method, > abort(); > } > > - if (!copy_to_target && s->job && s->job->dirty_bitmap) { > + if (!copy_to_target) { > qatomic_set(&s->job->actively_synced, false); Here, we must check that s->job is set for the qatomic_set(). Other than that, the patch looks good to me, thanks! > - bdrv_set_dirty_bitmap(s->job->dirty_bitmap, offset, bytes); > + bdrv_set_dirty_bitmap(s->dirty_bitmap, offset, bytes); > } > > if (ret < 0) { ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] mirror: Fix missed dirty bitmap writes during startup 2026-02-20 14:00 ` Fiona Ebner @ 2026-02-24 13:58 ` Kevin Wolf 2026-02-24 14:06 ` Fiona Ebner 2026-02-25 12:32 ` Jean-Louis Dupond 0 siblings, 2 replies; 15+ messages in thread From: Kevin Wolf @ 2026-02-24 13:58 UTC (permalink / raw) To: Fiona Ebner Cc: qemu-block, hreitz, vsementsov, jsnow, jean-louis, dionbosschieter, qemu-devel, qemu-stable Am 20.02.2026 um 15:00 hat Fiona Ebner geschrieben: > Am 19.02.26 um 9:25 PM schrieb Kevin Wolf: > > @@ -1672,9 +1673,9 @@ bdrv_mirror_top_do_write(BlockDriverState *bs, MirrorMethod method, > > abort(); > > } > > > > - if (!copy_to_target && s->job && s->job->dirty_bitmap) { > > + if (!copy_to_target) { > > qatomic_set(&s->job->actively_synced, false); > > Here, we must check that s->job is set for the qatomic_set(). > > Other than that, the patch looks good to me, thanks! Thanks, good catch. I'm squashing in this change: diff --git a/block/mirror.c b/block/mirror.c index abdffc6de86..fa1d975eb9f 100644 --- a/block/mirror.c +++ b/block/mirror.c @@ -1677,7 +1677,9 @@ bdrv_mirror_top_do_write(BlockDriverState *bs, MirrorMethod method, } if (!copy_to_target) { - qatomic_set(&s->job->actively_synced, false); + if (s->job) { + qatomic_set(&s->job->actively_synced, false); + } bdrv_set_dirty_bitmap(s->dirty_bitmap, offset, bytes); } Can I add your R-b with this? Jean-Louis, do you want to give this patch a test if it still fixes your problem? The approach is a bit different from what we tried initially. Kevin ^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: [PATCH] mirror: Fix missed dirty bitmap writes during startup 2026-02-24 13:58 ` Kevin Wolf @ 2026-02-24 14:06 ` Fiona Ebner 2026-02-25 12:32 ` Jean-Louis Dupond 1 sibling, 0 replies; 15+ messages in thread From: Fiona Ebner @ 2026-02-24 14:06 UTC (permalink / raw) To: Kevin Wolf Cc: qemu-block, hreitz, vsementsov, jsnow, jean-louis, dionbosschieter, qemu-devel, qemu-stable Am 24.02.26 um 2:58 PM schrieb Kevin Wolf: > Am 20.02.2026 um 15:00 hat Fiona Ebner geschrieben: >> Am 19.02.26 um 9:25 PM schrieb Kevin Wolf: >>> @@ -1672,9 +1673,9 @@ bdrv_mirror_top_do_write(BlockDriverState *bs, MirrorMethod method, >>> abort(); >>> } >>> >>> - if (!copy_to_target && s->job && s->job->dirty_bitmap) { >>> + if (!copy_to_target) { >>> qatomic_set(&s->job->actively_synced, false); >> >> Here, we must check that s->job is set for the qatomic_set(). >> >> Other than that, the patch looks good to me, thanks! > > Thanks, good catch. I'm squashing in this change: > > diff --git a/block/mirror.c b/block/mirror.c > index abdffc6de86..fa1d975eb9f 100644 > --- a/block/mirror.c > +++ b/block/mirror.c > @@ -1677,7 +1677,9 @@ bdrv_mirror_top_do_write(BlockDriverState *bs, MirrorMethod method, > } > > if (!copy_to_target) { > - qatomic_set(&s->job->actively_synced, false); > + if (s->job) { > + qatomic_set(&s->job->actively_synced, false); > + } > bdrv_set_dirty_bitmap(s->dirty_bitmap, offset, bytes); > } > > Can I add your R-b with this? Yes! > Jean-Louis, do you want to give this patch a test if it still fixes your > problem? The approach is a bit different from what we tried initially. > > Kevin > > ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] mirror: Fix missed dirty bitmap writes during startup 2026-02-24 13:58 ` Kevin Wolf 2026-02-24 14:06 ` Fiona Ebner @ 2026-02-25 12:32 ` Jean-Louis Dupond 2026-03-02 9:49 ` Jean-Louis Dupond 1 sibling, 1 reply; 15+ messages in thread From: Jean-Louis Dupond @ 2026-02-25 12:32 UTC (permalink / raw) To: Kevin Wolf, Fiona Ebner Cc: qemu-block, hreitz, vsementsov, jsnow, dionbosschieter, qemu-devel, qemu-stable On 24/02/2026 14:58, Kevin Wolf wrote: > Am 20.02.2026 um 15:00 hat Fiona Ebner geschrieben: >> Am 19.02.26 um 9:25 PM schrieb Kevin Wolf: >>> @@ -1672,9 +1673,9 @@ bdrv_mirror_top_do_write(BlockDriverState *bs, MirrorMethod method, >>> abort(); >>> } >>> >>> - if (!copy_to_target && s->job && s->job->dirty_bitmap) { >>> + if (!copy_to_target) { >>> qatomic_set(&s->job->actively_synced, false); >> Here, we must check that s->job is set for the qatomic_set(). >> >> Other than that, the patch looks good to me, thanks! > Thanks, good catch. I'm squashing in this change: > > diff --git a/block/mirror.c b/block/mirror.c > index abdffc6de86..fa1d975eb9f 100644 > --- a/block/mirror.c > +++ b/block/mirror.c > @@ -1677,7 +1677,9 @@ bdrv_mirror_top_do_write(BlockDriverState *bs, MirrorMethod method, > } > > if (!copy_to_target) { > - qatomic_set(&s->job->actively_synced, false); > + if (s->job) { > + qatomic_set(&s->job->actively_synced, false); > + } > bdrv_set_dirty_bitmap(s->dirty_bitmap, offset, bytes); > } > > Can I add your R-b with this? > > Jean-Louis, do you want to give this patch a test if it still fixes your > problem? The approach is a bit different from what we tried initially. We are running our test case for +24hours now without any issue with this patch included. I am currently rebuilding el9 qemu with this patch included to run on our production machines that had this corruption. Let me come back in some days to fully confirm it's also fixed there :) > > Kevin > Thanks for the effort on this! Good to see this nasty issue getting resolved! :) Jean-Louis ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] mirror: Fix missed dirty bitmap writes during startup 2026-02-25 12:32 ` Jean-Louis Dupond @ 2026-03-02 9:49 ` Jean-Louis Dupond 2026-03-02 13:13 ` Kevin Wolf 0 siblings, 1 reply; 15+ messages in thread From: Jean-Louis Dupond @ 2026-03-02 9:49 UTC (permalink / raw) To: Kevin Wolf, Fiona Ebner Cc: qemu-block, hreitz, vsementsov, jsnow, dionbosschieter, qemu-devel, qemu-stable On 25/02/2026 13:32, Jean-Louis Dupond wrote: > > On 24/02/2026 14:58, Kevin Wolf wrote: >> Am 20.02.2026 um 15:00 hat Fiona Ebner geschrieben: >>> Am 19.02.26 um 9:25 PM schrieb Kevin Wolf: >>>> @@ -1672,9 +1673,9 @@ bdrv_mirror_top_do_write(BlockDriverState >>>> *bs, MirrorMethod method, >>>> abort(); >>>> } >>>> - if (!copy_to_target && s->job && s->job->dirty_bitmap) { >>>> + if (!copy_to_target) { >>>> qatomic_set(&s->job->actively_synced, false); >>> Here, we must check that s->job is set for the qatomic_set(). >>> >>> Other than that, the patch looks good to me, thanks! >> Thanks, good catch. I'm squashing in this change: >> >> diff --git a/block/mirror.c b/block/mirror.c >> index abdffc6de86..fa1d975eb9f 100644 >> --- a/block/mirror.c >> +++ b/block/mirror.c >> @@ -1677,7 +1677,9 @@ bdrv_mirror_top_do_write(BlockDriverState *bs, >> MirrorMethod method, >> } >> >> if (!copy_to_target) { >> - qatomic_set(&s->job->actively_synced, false); >> + if (s->job) { >> + qatomic_set(&s->job->actively_synced, false); >> + } >> bdrv_set_dirty_bitmap(s->dirty_bitmap, offset, bytes); >> } >> >> Can I add your R-b with this? >> >> Jean-Louis, do you want to give this patch a test if it still fixes your >> problem? The approach is a bit different from what we tried initially. > We are running our test case for +24hours now without any issue with > this patch included. > I am currently rebuilding el9 qemu with this patch included to run on > our production machines that had this corruption. > Let me come back in some days to fully confirm it's also fixed there :) Everything stays stable (without corruption), after a lot of servers in backup already with the patch. So therefor: Tested-by: Jean-Louis Dupond <jean-louis@dupond.be> >> >> Kevin >> > Thanks for the effort on this! Good to see this nasty issue getting > resolved! :) > > Jean-Louis ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] mirror: Fix missed dirty bitmap writes during startup 2026-03-02 9:49 ` Jean-Louis Dupond @ 2026-03-02 13:13 ` Kevin Wolf 0 siblings, 0 replies; 15+ messages in thread From: Kevin Wolf @ 2026-03-02 13:13 UTC (permalink / raw) To: Jean-Louis Dupond Cc: Fiona Ebner, qemu-block, hreitz, vsementsov, jsnow, dionbosschieter, qemu-devel, qemu-stable Am 02.03.2026 um 10:49 hat Jean-Louis Dupond geschrieben: > > On 25/02/2026 13:32, Jean-Louis Dupond wrote: > > > > On 24/02/2026 14:58, Kevin Wolf wrote: > > > Am 20.02.2026 um 15:00 hat Fiona Ebner geschrieben: > > > > Am 19.02.26 um 9:25 PM schrieb Kevin Wolf: > > > > > @@ -1672,9 +1673,9 @@ > > > > > bdrv_mirror_top_do_write(BlockDriverState *bs, MirrorMethod > > > > > method, > > > > > abort(); > > > > > } > > > > > - if (!copy_to_target && s->job && s->job->dirty_bitmap) { > > > > > + if (!copy_to_target) { > > > > > qatomic_set(&s->job->actively_synced, false); > > > > Here, we must check that s->job is set for the qatomic_set(). > > > > > > > > Other than that, the patch looks good to me, thanks! > > > Thanks, good catch. I'm squashing in this change: > > > > > > diff --git a/block/mirror.c b/block/mirror.c > > > index abdffc6de86..fa1d975eb9f 100644 > > > --- a/block/mirror.c > > > +++ b/block/mirror.c > > > @@ -1677,7 +1677,9 @@ bdrv_mirror_top_do_write(BlockDriverState *bs, > > > MirrorMethod method, > > > } > > > > > > if (!copy_to_target) { > > > - qatomic_set(&s->job->actively_synced, false); > > > + if (s->job) { > > > + qatomic_set(&s->job->actively_synced, false); > > > + } > > > bdrv_set_dirty_bitmap(s->dirty_bitmap, offset, bytes); > > > } > > > > > > Can I add your R-b with this? > > > > > > Jean-Louis, do you want to give this patch a test if it still fixes your > > > problem? The approach is a bit different from what we tried initially. > > We are running our test case for +24hours now without any issue with > > this patch included. > > I am currently rebuilding el9 qemu with this patch included to run on > > our production machines that had this corruption. > > Let me come back in some days to fully confirm it's also fixed there :) > Everything stays stable (without corruption), after a lot of servers in > backup already with the patch. > So therefor: > > Tested-by: Jean-Louis Dupond <jean-louis@dupond.be> Great, thanks for your testing! Kevin ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] mirror: Fix missed dirty bitmap writes during startup 2026-02-19 20:24 [PATCH] mirror: Fix missed dirty bitmap writes during startup Kevin Wolf 2026-02-20 14:00 ` Fiona Ebner @ 2026-03-05 18:34 ` Vladimir Sementsov-Ogievskiy 2026-03-06 9:34 ` Kevin Wolf 2026-03-08 8:25 ` Michael Tokarev 2 siblings, 1 reply; 15+ messages in thread From: Vladimir Sementsov-Ogievskiy @ 2026-03-05 18:34 UTC (permalink / raw) To: Kevin Wolf, qemu-block Cc: hreitz, f.ebner, jsnow, jean-louis, dionbosschieter, qemu-devel, qemu-stable On 19.02.26 23:24, Kevin Wolf wrote: > Currently, mirror disables the block layer's dirty bitmap before its own > replacement is working. This means that during startup, there is a > window in which the allocation status of blocks in the source has > already been checked, but new writes coming in aren't tracked yet, > resulting in a corrupted copy: > > 1. Dirty bitmap is disabled in mirror_start_job() > 2. Some request are started in mirror_top_bs while s->job == NULL > 3. mirror_dirty_init() -> bdrv_co_is_allocated_above() runs and because > the request hasn't completed yet, the block isn't allocated > 4. The request completes, still sees s->job == NULL and skips the > bitmap, and nothing else will mark it dirty either > > One ingredient is that mirror_top_opaque->job is only set after the > job is fully initialized. For the rationale, see commit 32125b1460 > ("mirror: Fix access of uninitialised fields during start"). > > Fix this by giving mirror_top_bs access to dirty_bitmap and enabling it > to track writes from the beginning. Disabling the block layer's tracking > and enabling the mirror_top_bs one happens in a drained section, so > there is no danger of races with in-flight requests any more. All of > this happens well before the block allocation status is checked, so we > can be sure that no writes will be missed. > > Cc: qemu-stable@nongnu.org > Closes: https://gitlab.com/qemu-project/qemu/-/issues/3273 > Fixes: 32125b14606a ('mirror: Fix access of uninitialised fields during start') > Signed-off-by: Kevin Wolf <kwolf@redhat.com> > --- > Supersedes: <20260212120411.369498-1-f.ebner@proxmox.com> > --- > block/mirror.c | 48 +++++++++++++++++++++++++++++------------------- > 1 file changed, 29 insertions(+), 19 deletions(-) > > diff --git a/block/mirror.c b/block/mirror.c > index b344182c747..f38636e7457 100644 > --- a/block/mirror.c > +++ b/block/mirror.c > @@ -99,6 +99,7 @@ typedef struct MirrorBlockJob { > > typedef struct MirrorBDSOpaque { > MirrorBlockJob *job; > + BdrvDirtyBitmap *dirty_bitmap; > bool stop; > bool is_commit; > } MirrorBDSOpaque; > @@ -1672,9 +1673,9 @@ bdrv_mirror_top_do_write(BlockDriverState *bs, MirrorMethod method, > abort(); > } > > - if (!copy_to_target && s->job && s->job->dirty_bitmap) { > + if (!copy_to_target) { > qatomic_set(&s->job->actively_synced, false); > - bdrv_set_dirty_bitmap(s->job->dirty_bitmap, offset, bytes); > + bdrv_set_dirty_bitmap(s->dirty_bitmap, offset, bytes); > } > > if (ret < 0) { > @@ -1901,13 +1902,35 @@ static BlockJob *mirror_start_job( > > bdrv_drained_begin(bs); > ret = bdrv_append(mirror_top_bs, bs, errp); > - bdrv_drained_end(bs); > - > if (ret < 0) { > + bdrv_drained_end(bs); > + bdrv_unref(mirror_top_bs); > + return NULL; > + } > + > + bs_opaque->dirty_bitmap = bdrv_create_dirty_bitmap(mirror_top_bs, > + granularity, > + NULL, errp); > + if (!bs_opaque->dirty_bitmap) { > + bdrv_drained_end(bs); > bdrv_unref(mirror_top_bs); > return NULL; > } > > + /* > + * The mirror job doesn't use the block layer's dirty tracking because it > + * needs to be able to switch seemlessly between background copy mode (which > + * does need dirty tracking) and write blocking mode (which doesn't) and > + * doing that would require draining the node. Instead, mirror_top_bs takes > + * care of updating the dirty bitmap as appropriate. > + * > + * Note that write blocking mode only becomes effective after mirror_run() > + * sets mirror_top_opaque->job (see should_copy_to_target()). Until then, > + * we're still in background copy mode irrespective of @copy_mode. > + */ > + bdrv_disable_dirty_bitmap(bs_opaque->dirty_bitmap); > + bdrv_drained_end(bs); > + > /* Make sure that the source is not resized while the job is running */ > s = block_job_create(job_id, driver, NULL, mirror_top_bs, > BLK_PERM_CONSISTENT_READ, > @@ -2002,24 +2025,13 @@ static BlockJob *mirror_start_job( > s->base_overlay = bdrv_find_overlay(bs, base); > s->granularity = granularity; > s->buf_size = ROUND_UP(buf_size, granularity); > + s->dirty_bitmap = bs_opaque->dirty_bitmap; > s->unmap = unmap; > if (auto_complete) { > s->should_complete = true; > } > bdrv_graph_rdunlock_main_loop(); > > - s->dirty_bitmap = bdrv_create_dirty_bitmap(s->mirror_top_bs, granularity, > - NULL, errp); > - if (!s->dirty_bitmap) { > - goto fail; > - } > - > - /* > - * The dirty bitmap is set by bdrv_mirror_top_do_write() when not in active > - * mode. > - */ > - bdrv_disable_dirty_bitmap(s->dirty_bitmap); > - > bdrv_graph_wrlock_drained(); > ret = block_job_add_bdrv(&s->common, "source", bs, 0, > BLK_PERM_WRITE_UNCHANGED | BLK_PERM_WRITE | > @@ -2099,9 +2111,6 @@ fail: > g_free(s->replaces); > blk_unref(s->target); > bs_opaque->job = NULL; > - if (s->dirty_bitmap) { > - bdrv_release_dirty_bitmap(s->dirty_bitmap); > - } > job_early_fail(&s->common.job); > } > > @@ -2115,6 +2124,7 @@ fail: > bdrv_graph_wrunlock(); > bdrv_drained_end(bs); > > + bdrv_release_dirty_bitmap(bs_opaque->dirty_bitmap); Hmm. Shouldn't we change position of _release_ in mirror_exit_common() too? Now the sequence is: bdrv_release_dirty_bitmap(s->dirty_bitmap); < could mirror_top_bs access dirty_bitmap here, before drained begin? > ... drained begin .. a lot of logic, including actual removing of the mirror_top_bs from the chain .. drained end bdrv_unref(mirror_top_bs) > bdrv_unref(mirror_top_bs); > > return NULL; -- Best regards, Vladimir ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] mirror: Fix missed dirty bitmap writes during startup 2026-03-05 18:34 ` Vladimir Sementsov-Ogievskiy @ 2026-03-06 9:34 ` Kevin Wolf 2026-03-24 14:44 ` Vladimir Sementsov-Ogievskiy 0 siblings, 1 reply; 15+ messages in thread From: Kevin Wolf @ 2026-03-06 9:34 UTC (permalink / raw) To: Vladimir Sementsov-Ogievskiy Cc: qemu-block, hreitz, f.ebner, jsnow, jean-louis, dionbosschieter, qemu-devel, qemu-stable, pbonzini Am 05.03.2026 um 19:34 hat Vladimir Sementsov-Ogievskiy geschrieben: > On 19.02.26 23:24, Kevin Wolf wrote: > > Currently, mirror disables the block layer's dirty bitmap before its own > > replacement is working. This means that during startup, there is a > > window in which the allocation status of blocks in the source has > > already been checked, but new writes coming in aren't tracked yet, > > resulting in a corrupted copy: > > > > 1. Dirty bitmap is disabled in mirror_start_job() > > 2. Some request are started in mirror_top_bs while s->job == NULL > > 3. mirror_dirty_init() -> bdrv_co_is_allocated_above() runs and because > > the request hasn't completed yet, the block isn't allocated > > 4. The request completes, still sees s->job == NULL and skips the > > bitmap, and nothing else will mark it dirty either > > > > One ingredient is that mirror_top_opaque->job is only set after the > > job is fully initialized. For the rationale, see commit 32125b1460 > > ("mirror: Fix access of uninitialised fields during start"). > > > > Fix this by giving mirror_top_bs access to dirty_bitmap and enabling it > > to track writes from the beginning. Disabling the block layer's tracking > > and enabling the mirror_top_bs one happens in a drained section, so > > there is no danger of races with in-flight requests any more. All of > > this happens well before the block allocation status is checked, so we > > can be sure that no writes will be missed. > > > > Cc: qemu-stable@nongnu.org > > Closes: https://gitlab.com/qemu-project/qemu/-/issues/3273 > > Fixes: 32125b14606a ('mirror: Fix access of uninitialised fields during start') > > Signed-off-by: Kevin Wolf <kwolf@redhat.com> > > --- > > Supersedes: <20260212120411.369498-1-f.ebner@proxmox.com> > > --- > > block/mirror.c | 48 +++++++++++++++++++++++++++++------------------- > > 1 file changed, 29 insertions(+), 19 deletions(-) > > > > diff --git a/block/mirror.c b/block/mirror.c > > index b344182c747..f38636e7457 100644 > > --- a/block/mirror.c > > +++ b/block/mirror.c > > @@ -99,6 +99,7 @@ typedef struct MirrorBlockJob { > > typedef struct MirrorBDSOpaque { > > MirrorBlockJob *job; > > + BdrvDirtyBitmap *dirty_bitmap; > > bool stop; > > bool is_commit; > > } MirrorBDSOpaque; > > @@ -1672,9 +1673,9 @@ bdrv_mirror_top_do_write(BlockDriverState *bs, MirrorMethod method, > > abort(); > > } > > - if (!copy_to_target && s->job && s->job->dirty_bitmap) { > > + if (!copy_to_target) { > > qatomic_set(&s->job->actively_synced, false); > > - bdrv_set_dirty_bitmap(s->job->dirty_bitmap, offset, bytes); > > + bdrv_set_dirty_bitmap(s->dirty_bitmap, offset, bytes); > > } > > if (ret < 0) { > > @@ -1901,13 +1902,35 @@ static BlockJob *mirror_start_job( > > bdrv_drained_begin(bs); > > ret = bdrv_append(mirror_top_bs, bs, errp); > > - bdrv_drained_end(bs); > > - > > if (ret < 0) { > > + bdrv_drained_end(bs); > > + bdrv_unref(mirror_top_bs); > > + return NULL; > > + } > > + > > + bs_opaque->dirty_bitmap = bdrv_create_dirty_bitmap(mirror_top_bs, > > + granularity, > > + NULL, errp); > > + if (!bs_opaque->dirty_bitmap) { > > + bdrv_drained_end(bs); > > bdrv_unref(mirror_top_bs); > > return NULL; > > } > > + /* > > + * The mirror job doesn't use the block layer's dirty tracking because it > > + * needs to be able to switch seemlessly between background copy mode (which > > + * does need dirty tracking) and write blocking mode (which doesn't) and > > + * doing that would require draining the node. Instead, mirror_top_bs takes > > + * care of updating the dirty bitmap as appropriate. > > + * > > + * Note that write blocking mode only becomes effective after mirror_run() > > + * sets mirror_top_opaque->job (see should_copy_to_target()). Until then, > > + * we're still in background copy mode irrespective of @copy_mode. > > + */ > > + bdrv_disable_dirty_bitmap(bs_opaque->dirty_bitmap); > > + bdrv_drained_end(bs); > > + > > /* Make sure that the source is not resized while the job is running */ > > s = block_job_create(job_id, driver, NULL, mirror_top_bs, > > BLK_PERM_CONSISTENT_READ, > > @@ -2002,24 +2025,13 @@ static BlockJob *mirror_start_job( > > s->base_overlay = bdrv_find_overlay(bs, base); > > s->granularity = granularity; > > s->buf_size = ROUND_UP(buf_size, granularity); > > + s->dirty_bitmap = bs_opaque->dirty_bitmap; > > s->unmap = unmap; > > if (auto_complete) { > > s->should_complete = true; > > } > > bdrv_graph_rdunlock_main_loop(); > > - s->dirty_bitmap = bdrv_create_dirty_bitmap(s->mirror_top_bs, granularity, > > - NULL, errp); > > - if (!s->dirty_bitmap) { > > - goto fail; > > - } > > - > > - /* > > - * The dirty bitmap is set by bdrv_mirror_top_do_write() when not in active > > - * mode. > > - */ > > - bdrv_disable_dirty_bitmap(s->dirty_bitmap); > > - > > bdrv_graph_wrlock_drained(); > > ret = block_job_add_bdrv(&s->common, "source", bs, 0, > > BLK_PERM_WRITE_UNCHANGED | BLK_PERM_WRITE | > > @@ -2099,9 +2111,6 @@ fail: > > g_free(s->replaces); > > blk_unref(s->target); > > bs_opaque->job = NULL; > > - if (s->dirty_bitmap) { > > - bdrv_release_dirty_bitmap(s->dirty_bitmap); > > - } > > job_early_fail(&s->common.job); > > } > > @@ -2115,6 +2124,7 @@ fail: > > bdrv_graph_wrunlock(); > > bdrv_drained_end(bs); > > + bdrv_release_dirty_bitmap(bs_opaque->dirty_bitmap); > > > Hmm. Shouldn't we change position of _release_ in mirror_exit_common() too? > > Now the sequence is: > > bdrv_release_dirty_bitmap(s->dirty_bitmap); > > > < could mirror_top_bs access dirty_bitmap here, before drained begin? > > > ... > > drained begin > > .. a lot of logic, including actual removing of the mirror_top_bs from the chain .. > > drained end > > bdrv_unref(mirror_top_bs) I think you're right, but isn't this already a preexisting bug in master? After releasing, we don't set s->dirty_bitmap = NULL, which could have prevented the access in the code before this patch. So this should probably be a separate patch. mirror_exit_common() runs in the main loop, so I assume you can hit this when using an iothread. It seems that initially the release was later, but commit 2119882 moved it earlier, without saying why it did that. Paolo, do you remember? Kevin ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] mirror: Fix missed dirty bitmap writes during startup 2026-03-06 9:34 ` Kevin Wolf @ 2026-03-24 14:44 ` Vladimir Sementsov-Ogievskiy 2026-03-25 10:13 ` Fiona Ebner 0 siblings, 1 reply; 15+ messages in thread From: Vladimir Sementsov-Ogievskiy @ 2026-03-24 14:44 UTC (permalink / raw) To: Kevin Wolf Cc: qemu-block, hreitz, f.ebner, jsnow, jean-louis, dionbosschieter, qemu-devel, qemu-stable, pbonzini On 06.03.26 12:34, Kevin Wolf wrote: > Am 05.03.2026 um 19:34 hat Vladimir Sementsov-Ogievskiy geschrieben: >> On 19.02.26 23:24, Kevin Wolf wrote: >>> Currently, mirror disables the block layer's dirty bitmap before its own >>> replacement is working. This means that during startup, there is a >>> window in which the allocation status of blocks in the source has >>> already been checked, but new writes coming in aren't tracked yet, >>> resulting in a corrupted copy: >>> >>> 1. Dirty bitmap is disabled in mirror_start_job() >>> 2. Some request are started in mirror_top_bs while s->job == NULL >>> 3. mirror_dirty_init() -> bdrv_co_is_allocated_above() runs and because >>> the request hasn't completed yet, the block isn't allocated >>> 4. The request completes, still sees s->job == NULL and skips the >>> bitmap, and nothing else will mark it dirty either >>> >>> One ingredient is that mirror_top_opaque->job is only set after the >>> job is fully initialized. For the rationale, see commit 32125b1460 >>> ("mirror: Fix access of uninitialised fields during start"). >>> >>> Fix this by giving mirror_top_bs access to dirty_bitmap and enabling it >>> to track writes from the beginning. Disabling the block layer's tracking >>> and enabling the mirror_top_bs one happens in a drained section, so >>> there is no danger of races with in-flight requests any more. All of >>> this happens well before the block allocation status is checked, so we >>> can be sure that no writes will be missed. >>> >>> Cc: qemu-stable@nongnu.org >>> Closes: https://gitlab.com/qemu-project/qemu/-/issues/3273 >>> Fixes: 32125b14606a ('mirror: Fix access of uninitialised fields during start') >>> Signed-off-by: Kevin Wolf <kwolf@redhat.com> >>> --- >>> Supersedes: <20260212120411.369498-1-f.ebner@proxmox.com> >>> --- >>> block/mirror.c | 48 +++++++++++++++++++++++++++++------------------- >>> 1 file changed, 29 insertions(+), 19 deletions(-) >>> >>> diff --git a/block/mirror.c b/block/mirror.c >>> index b344182c747..f38636e7457 100644 >>> --- a/block/mirror.c >>> +++ b/block/mirror.c >>> @@ -99,6 +99,7 @@ typedef struct MirrorBlockJob { >>> typedef struct MirrorBDSOpaque { >>> MirrorBlockJob *job; >>> + BdrvDirtyBitmap *dirty_bitmap; >>> bool stop; >>> bool is_commit; >>> } MirrorBDSOpaque; >>> @@ -1672,9 +1673,9 @@ bdrv_mirror_top_do_write(BlockDriverState *bs, MirrorMethod method, >>> abort(); >>> } >>> - if (!copy_to_target && s->job && s->job->dirty_bitmap) { >>> + if (!copy_to_target) { >>> qatomic_set(&s->job->actively_synced, false); >>> - bdrv_set_dirty_bitmap(s->job->dirty_bitmap, offset, bytes); >>> + bdrv_set_dirty_bitmap(s->dirty_bitmap, offset, bytes); >>> } >>> if (ret < 0) { >>> @@ -1901,13 +1902,35 @@ static BlockJob *mirror_start_job( >>> bdrv_drained_begin(bs); >>> ret = bdrv_append(mirror_top_bs, bs, errp); >>> - bdrv_drained_end(bs); >>> - >>> if (ret < 0) { >>> + bdrv_drained_end(bs); >>> + bdrv_unref(mirror_top_bs); >>> + return NULL; >>> + } >>> + >>> + bs_opaque->dirty_bitmap = bdrv_create_dirty_bitmap(mirror_top_bs, >>> + granularity, >>> + NULL, errp); >>> + if (!bs_opaque->dirty_bitmap) { >>> + bdrv_drained_end(bs); >>> bdrv_unref(mirror_top_bs); >>> return NULL; >>> } >>> + /* >>> + * The mirror job doesn't use the block layer's dirty tracking because it >>> + * needs to be able to switch seemlessly between background copy mode (which >>> + * does need dirty tracking) and write blocking mode (which doesn't) and >>> + * doing that would require draining the node. Instead, mirror_top_bs takes >>> + * care of updating the dirty bitmap as appropriate. >>> + * >>> + * Note that write blocking mode only becomes effective after mirror_run() >>> + * sets mirror_top_opaque->job (see should_copy_to_target()). Until then, >>> + * we're still in background copy mode irrespective of @copy_mode. >>> + */ >>> + bdrv_disable_dirty_bitmap(bs_opaque->dirty_bitmap); >>> + bdrv_drained_end(bs); >>> + >>> /* Make sure that the source is not resized while the job is running */ >>> s = block_job_create(job_id, driver, NULL, mirror_top_bs, >>> BLK_PERM_CONSISTENT_READ, >>> @@ -2002,24 +2025,13 @@ static BlockJob *mirror_start_job( >>> s->base_overlay = bdrv_find_overlay(bs, base); >>> s->granularity = granularity; >>> s->buf_size = ROUND_UP(buf_size, granularity); >>> + s->dirty_bitmap = bs_opaque->dirty_bitmap; >>> s->unmap = unmap; >>> if (auto_complete) { >>> s->should_complete = true; >>> } >>> bdrv_graph_rdunlock_main_loop(); >>> - s->dirty_bitmap = bdrv_create_dirty_bitmap(s->mirror_top_bs, granularity, >>> - NULL, errp); >>> - if (!s->dirty_bitmap) { >>> - goto fail; >>> - } >>> - >>> - /* >>> - * The dirty bitmap is set by bdrv_mirror_top_do_write() when not in active >>> - * mode. >>> - */ >>> - bdrv_disable_dirty_bitmap(s->dirty_bitmap); >>> - >>> bdrv_graph_wrlock_drained(); >>> ret = block_job_add_bdrv(&s->common, "source", bs, 0, >>> BLK_PERM_WRITE_UNCHANGED | BLK_PERM_WRITE | >>> @@ -2099,9 +2111,6 @@ fail: >>> g_free(s->replaces); >>> blk_unref(s->target); >>> bs_opaque->job = NULL; >>> - if (s->dirty_bitmap) { >>> - bdrv_release_dirty_bitmap(s->dirty_bitmap); >>> - } >>> job_early_fail(&s->common.job); >>> } >>> @@ -2115,6 +2124,7 @@ fail: >>> bdrv_graph_wrunlock(); >>> bdrv_drained_end(bs); >>> + bdrv_release_dirty_bitmap(bs_opaque->dirty_bitmap); >> >> >> Hmm. Shouldn't we change position of _release_ in mirror_exit_common() too? >> >> Now the sequence is: >> >> bdrv_release_dirty_bitmap(s->dirty_bitmap); >> >> >> < could mirror_top_bs access dirty_bitmap here, before drained begin? > >> >> ... >> >> drained begin >> >> .. a lot of logic, including actual removing of the mirror_top_bs from the chain .. >> >> drained end >> >> bdrv_unref(mirror_top_bs) > > I think you're right, but isn't this already a preexisting bug in > master? After releasing, we don't set s->dirty_bitmap = NULL, which > could have prevented the access in the code before this patch. So this > should probably be a separate patch. > > mirror_exit_common() runs in the main loop, so I assume you can > hit this when using an iothread. > > It seems that initially the release was later, but commit 2119882 moved > it earlier, without saying why it did that. Paolo, do you remember? > > Kevin > On the other hand, in mirror_exit_common(), we are already in bdrv_drained_begin(bs); section, started in mirror_run()... Does bdrv_drained_begin(mirror_top_bs) add something to previous bdrv_drained_begin(bs) ? -- Best regards, Vladimir ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] mirror: Fix missed dirty bitmap writes during startup 2026-03-24 14:44 ` Vladimir Sementsov-Ogievskiy @ 2026-03-25 10:13 ` Fiona Ebner 0 siblings, 0 replies; 15+ messages in thread From: Fiona Ebner @ 2026-03-25 10:13 UTC (permalink / raw) To: Vladimir Sementsov-Ogievskiy, Kevin Wolf Cc: qemu-block, hreitz, jsnow, jean-louis, dionbosschieter, qemu-devel, qemu-stable, pbonzini Am 24.03.26 um 3:43 PM schrieb Vladimir Sementsov-Ogievskiy: > On 06.03.26 12:34, Kevin Wolf wrote: >> Am 05.03.2026 um 19:34 hat Vladimir Sementsov-Ogievskiy geschrieben: >>> On 19.02.26 23:24, Kevin Wolf wrote: >>>> @@ -2099,9 +2111,6 @@ fail: >>>> g_free(s->replaces); >>>> blk_unref(s->target); >>>> bs_opaque->job = NULL; >>>> - if (s->dirty_bitmap) { >>>> - bdrv_release_dirty_bitmap(s->dirty_bitmap); >>>> - } >>>> job_early_fail(&s->common.job); >>>> } >>>> @@ -2115,6 +2124,7 @@ fail: >>>> bdrv_graph_wrunlock(); >>>> bdrv_drained_end(bs); >>>> + bdrv_release_dirty_bitmap(bs_opaque->dirty_bitmap); >>> >>> >>> Hmm. Shouldn't we change position of _release_ in >>> mirror_exit_common() too? >>> >>> Now the sequence is: >>> >>> bdrv_release_dirty_bitmap(s->dirty_bitmap); >>> >>> >>> < could mirror_top_bs access dirty_bitmap here, before drained begin? > >>> >>> ... >>> >>> drained begin >>> >>> .. a lot of logic, including actual removing of the mirror_top_bs >>> from the chain .. >>> >>> drained end >>> >>> bdrv_unref(mirror_top_bs) >> >> I think you're right, but isn't this already a preexisting bug in >> master? After releasing, we don't set s->dirty_bitmap = NULL, which >> could have prevented the access in the code before this patch. So this >> should probably be a separate patch. >> >> mirror_exit_common() runs in the main loop, so I assume you can >> hit this when using an iothread. >> >> It seems that initially the release was later, but commit 2119882 moved >> it earlier, without saying why it did that. Paolo, do you remember? >> >> Kevin >> > > On the other hand, in mirror_exit_common(), we are already in > > bdrv_drained_begin(bs); section, started in mirror_run()... > > Does bdrv_drained_begin(mirror_top_bs) add something to previous > bdrv_drained_begin(bs) ? > Looking at the history, it was added with: > commit d2da5e288a2e71e82866c8fdefd41b5727300124 > Author: Kevin Wolf <kwolf@redhat.com> > Date: Mon Jul 22 17:44:27 2019 +0200 > > mirror: Keep mirror_top_bs drained after dropping permissions > > mirror_top_bs is currently implicitly drained through its connection to > the source or the target node. However, the drain section for target_bs > ends early after moving mirror_top_bs from src to target_bs, so that > requests can already be restarted while mirror_top_bs is still present > in the chain, but has dropped all permissions and therefore runs into an > assertion failure like this: > > qemu-system-x86_64: block/io.c:1634: bdrv_co_write_req_prepare: > Assertion `child->perm & BLK_PERM_WRITE' failed. > > Keep mirror_top_bs drained until all graph changes have completed. At this time, bdrv_drained_end(target_bs) happened before (1) bdrv_replace_node(mirror_top_bs, mirror_top_bs->backing->bs, ...) The order of those two operations was later switched, namely in: > commit ccd6a37947574707613e826e2bf04d55f1d5f238 > Author: Kevin Wolf <kwolf@redhat.com> > Date: Fri Oct 27 17:53:25 2023 +0200 > > block: Mark bdrv_replace_node() GRAPH_WRLOCK > > Instead of taking the writer lock internally, require callers to already > hold it when calling bdrv_replace_node(). Its callers may already want > to hold the graph lock and so wouldn't be able to call functions that > take it internally. There still is (2) bdrv_replace_node(to_replace, target_bs, ...) with the drained section for to_replace ending right afterwards. Looking at where mirror_top_bs is after (2): If to_replace == src, then mirror_top_bs will be on top of target_bs. If to_replace != src, then mirror_top_bs will still be on top of src. So it shouldn't matter that the drain for to_replace ends before (1). The drained sections for target_bs and src both end after (1). If this analysis is correct, it's possible to drop the drain for mirror_top_bs. Question is if it's not better to stay explicit about it rather than implicitly relying on src/target being drained. Best Regards, Fiona ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] mirror: Fix missed dirty bitmap writes during startup 2026-02-19 20:24 [PATCH] mirror: Fix missed dirty bitmap writes during startup Kevin Wolf 2026-02-20 14:00 ` Fiona Ebner 2026-03-05 18:34 ` Vladimir Sementsov-Ogievskiy @ 2026-03-08 8:25 ` Michael Tokarev 2026-03-10 16:22 ` Fiona Ebner 2 siblings, 1 reply; 15+ messages in thread From: Michael Tokarev @ 2026-03-08 8:25 UTC (permalink / raw) To: Kevin Wolf, qemu-block Cc: hreitz, f.ebner, vsementsov, jsnow, jean-louis, dionbosschieter, qemu-devel, qemu-stable On 19.02.2026 23:24, Kevin Wolf wrote: > Currently, mirror disables the block layer's dirty bitmap before its own > replacement is working. This means that during startup, there is a > window in which the allocation status of blocks in the source has > already been checked, but new writes coming in aren't tracked yet, > resulting in a corrupted copy: > > 1. Dirty bitmap is disabled in mirror_start_job() > 2. Some request are started in mirror_top_bs while s->job == NULL > 3. mirror_dirty_init() -> bdrv_co_is_allocated_above() runs and because > the request hasn't completed yet, the block isn't allocated > 4. The request completes, still sees s->job == NULL and skips the > bitmap, and nothing else will mark it dirty either > > One ingredient is that mirror_top_opaque->job is only set after the > job is fully initialized. For the rationale, see commit 32125b1460 > ("mirror: Fix access of uninitialised fields during start"). > > Fix this by giving mirror_top_bs access to dirty_bitmap and enabling it > to track writes from the beginning. Disabling the block layer's tracking > and enabling the mirror_top_bs one happens in a drained section, so > there is no danger of races with in-flight requests any more. All of > this happens well before the block allocation status is checked, so we > can be sure that no writes will be missed. > > Cc: qemu-stable@nongnu.org > Closes: https://gitlab.com/qemu-project/qemu/-/issues/3273 > Fixes: 32125b14606a ('mirror: Fix access of uninitialised fields during start') > Signed-off-by: Kevin Wolf <kwolf@redhat.com> Hi! While trying to apply this one to 10.0.x series of qemu, I noticed there are a few other commits missing in this area. Namely, these commits are missing in there: 91ba0e1c382bd4 block: move drain outside of bdrv_change_aio_context() and mark GRAPH_RDLOCK ffdcd081f52544 block: move drain outside of bdrv_root_attach_child() 6b89e851fabf78 block: add bdrv_graph_wrlock_drained() convenience wrapper This is just a context, but I wonder if the resulting thing look sane. https://gitlab.com/mjt0k/qemu/-/commit/04c53bdbef634f93ac4149a19bf2de2e8d156b2b -- can you take a look please? Thanks, /mjt ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] mirror: Fix missed dirty bitmap writes during startup 2026-03-08 8:25 ` Michael Tokarev @ 2026-03-10 16:22 ` Fiona Ebner 2026-03-10 18:35 ` Michael Tokarev 0 siblings, 1 reply; 15+ messages in thread From: Fiona Ebner @ 2026-03-10 16:22 UTC (permalink / raw) To: Michael Tokarev, Kevin Wolf, qemu-block Cc: hreitz, vsementsov, jsnow, jean-louis, dionbosschieter, qemu-devel, qemu-stable Am 08.03.26 um 9:24 AM schrieb Michael Tokarev: > On 19.02.2026 23:24, Kevin Wolf wrote: >> Currently, mirror disables the block layer's dirty bitmap before its own >> replacement is working. This means that during startup, there is a >> window in which the allocation status of blocks in the source has >> already been checked, but new writes coming in aren't tracked yet, >> resulting in a corrupted copy: >> >> 1. Dirty bitmap is disabled in mirror_start_job() >> 2. Some request are started in mirror_top_bs while s->job == NULL >> 3. mirror_dirty_init() -> bdrv_co_is_allocated_above() runs and because >> the request hasn't completed yet, the block isn't allocated >> 4. The request completes, still sees s->job == NULL and skips the >> bitmap, and nothing else will mark it dirty either >> >> One ingredient is that mirror_top_opaque->job is only set after the >> job is fully initialized. For the rationale, see commit 32125b1460 >> ("mirror: Fix access of uninitialised fields during start"). >> >> Fix this by giving mirror_top_bs access to dirty_bitmap and enabling it >> to track writes from the beginning. Disabling the block layer's tracking >> and enabling the mirror_top_bs one happens in a drained section, so >> there is no danger of races with in-flight requests any more. All of >> this happens well before the block allocation status is checked, so we >> can be sure that no writes will be missed. >> >> Cc: qemu-stable@nongnu.org >> Closes: https://gitlab.com/qemu-project/qemu/-/issues/3273 >> Fixes: 32125b14606a ('mirror: Fix access of uninitialised fields >> during start') >> Signed-off-by: Kevin Wolf <kwolf@redhat.com> > > Hi! > > While trying to apply this one to 10.0.x series of qemu, I noticed there > are a few other commits missing in this area. Namely, these commits are > missing in there: > > 91ba0e1c382bd4 block: move drain outside of bdrv_change_aio_context() > and mark GRAPH_RDLOCK > ffdcd081f52544 block: move drain outside of bdrv_root_attach_child() > 6b89e851fabf78 block: add bdrv_graph_wrlock_drained() convenience wrapper > > This is just a context, but I wonder if the resulting thing look sane. > https://gitlab.com/mjt0k/qemu/-/ > commit/04c53bdbef634f93ac4149a19bf2de2e8d156b2b -- > can you take a look please? FWIW, it looks fine to me :) The fix here should be orthogonal to the draining changes from those commits. Best Regards, Fiona ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] mirror: Fix missed dirty bitmap writes during startup 2026-03-10 16:22 ` Fiona Ebner @ 2026-03-10 18:35 ` Michael Tokarev 2026-03-11 11:10 ` Fiona Ebner 0 siblings, 1 reply; 15+ messages in thread From: Michael Tokarev @ 2026-03-10 18:35 UTC (permalink / raw) To: Fiona Ebner, Kevin Wolf, qemu-block; +Cc: qemu-devel, qemu-stable On 10.03.2026 19:22, Fiona Ebner wrote: > Am 08.03.26 um 9:24 AM schrieb Michael Tokarev: >> On 19.02.2026 23:24, Kevin Wolf wrote:...>> While trying to apply this one to 10.0.x series of qemu, I noticed there >> are a few other commits missing in this area. Namely, these commits are >> missing in there: >> >> 91ba0e1c382bd4 block: move drain outside of bdrv_change_aio_context() >> and mark GRAPH_RDLOCK >> ffdcd081f52544 block: move drain outside of bdrv_root_attach_child() >> 6b89e851fabf78 block: add bdrv_graph_wrlock_drained() convenience wrapper >> >> This is just a context, but I wonder if the resulting thing look sane. >> https://gitlab.com/mjt0k/qemu/-/ >> commit/04c53bdbef634f93ac4149a19bf2de2e8d156b2b -- >> can you take a look please? > > FWIW, it looks fine to me :) The fix here should be orthogonal to the > draining changes from those commits. Thank you very much for looking at this one. Since you're the one who wrote the above mentioned draining commits, how do you think, should these be picked up for stable-10.0? (I tried, it gets quite a bit messy there, but it seems like they're fixing real issues). Thank you! /mjt ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] mirror: Fix missed dirty bitmap writes during startup 2026-03-10 18:35 ` Michael Tokarev @ 2026-03-11 11:10 ` Fiona Ebner 0 siblings, 0 replies; 15+ messages in thread From: Fiona Ebner @ 2026-03-11 11:10 UTC (permalink / raw) To: Michael Tokarev, Kevin Wolf, qemu-block; +Cc: qemu-devel, qemu-stable Am 10.03.26 um 7:35 PM schrieb Michael Tokarev: > On 10.03.2026 19:22, Fiona Ebner wrote: >> Am 08.03.26 um 9:24 AM schrieb Michael Tokarev: >>> On 19.02.2026 23:24, Kevin Wolf wrote:...>> While trying to apply >>> this one to 10.0.x series of qemu, I noticed there >>> are a few other commits missing in this area. Namely, these commits are >>> missing in there: >>> >>> 91ba0e1c382bd4 block: move drain outside of bdrv_change_aio_context() >>> and mark GRAPH_RDLOCK >>> ffdcd081f52544 block: move drain outside of bdrv_root_attach_child() >>> 6b89e851fabf78 block: add bdrv_graph_wrlock_drained() convenience >>> wrapper >>> >>> This is just a context, but I wonder if the resulting thing look sane. >>> https://gitlab.com/mjt0k/qemu/-/ >>> commit/04c53bdbef634f93ac4149a19bf2de2e8d156b2b -- >>> can you take a look please? >> >> FWIW, it looks fine to me :) The fix here should be orthogonal to the >> draining changes from those commits. > > Thank you very much for looking at this one. > > Since you're the one who wrote the above mentioned draining commits, > how do you think, should these be picked up for stable-10.0? > (I tried, it gets quite a bit messy there, but it seems like they're > fixing real issues). Yes, they do fix real deadlocks, but only in some rare, racy edge cases. I can imagine that it might get messy, since that patch series essentially touched all of the block layer and is doing lots of incremental changes. I honestly do not know if it would be worth the effort. Best Regards, Fiona ^ permalink raw reply [flat|nested] 15+ messages in thread
end of thread, other threads:[~2026-03-25 10:14 UTC | newest] Thread overview: 15+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2026-02-19 20:24 [PATCH] mirror: Fix missed dirty bitmap writes during startup Kevin Wolf 2026-02-20 14:00 ` Fiona Ebner 2026-02-24 13:58 ` Kevin Wolf 2026-02-24 14:06 ` Fiona Ebner 2026-02-25 12:32 ` Jean-Louis Dupond 2026-03-02 9:49 ` Jean-Louis Dupond 2026-03-02 13:13 ` Kevin Wolf 2026-03-05 18:34 ` Vladimir Sementsov-Ogievskiy 2026-03-06 9:34 ` Kevin Wolf 2026-03-24 14:44 ` Vladimir Sementsov-Ogievskiy 2026-03-25 10:13 ` Fiona Ebner 2026-03-08 8:25 ` Michael Tokarev 2026-03-10 16:22 ` Fiona Ebner 2026-03-10 18:35 ` Michael Tokarev 2026-03-11 11:10 ` Fiona Ebner
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox