From: Kevin Wolf <kwolf@redhat.com>
To: Thomas Lamprecht <t.lamprecht@proxmox.com>
Cc: qemu-block@nongnu.org, f.ebner@proxmox.com, hreitz@redhat.com,
qemu-devel@nongnu.org, qemu-stable@nongnu.org
Subject: Re: [PATCH v2] qcow2: Fix data loss on zero write with detect-zeroes=unmap
Date: Mon, 1 Jun 2026 18:37:55 +0200 [thread overview]
Message-ID: <ah21Y_fIiy23V5S7@redhat.com> (raw)
In-Reply-To: <20260522151318.238064-1-t.lamprecht@proxmox.com>
Am 22.05.2026 um 17:13 hat Thomas Lamprecht geschrieben:
> Commit b8bfb1478d ("qcow2: Fix corruption on discard during write with
> COW") added a wait_for_dependencies() at the start of
> qcow2_subcluster_zeroize(). That fixes the inconsistency it set out to
> fix, but turns the lock-protected pre-check in the caller,
> qcow2_co_pwrite_zeroes(), into a stale one: the wait yields s->lock,
> so an in-flight allocating write whose QCowL2Meta is already on
> s->cluster_allocs (but whose L2 entry is not yet linked) gets to link
> its entry during the yield. When the zeroize wakes, the cluster is now
> NORMAL, and with BDRV_REQ_MAY_UNMAP the free path in zero_in_l2_slice()
> unmaps the just-written cluster, silently dropping the data write's
> payload.
>
> This is reachable with detect-zeroes=unmap (the default for VirtIO
> disks with discard on in Proxmox VE), under which the block layer
> auto-promotes all-zero buffers to BDRV_REQ_ZERO_WRITE |
> BDRV_REQ_MAY_UNMAP. A memory-constrained Debian guest running 'apt
> full-upgrade' on such a disk reproduces it as random SIGSEGVs:
> swapped-out code pages come back as zero.
>
> Wait for in-flight dependencies before the lock-protected check in
> qcow2_co_pwrite_zeroes(). If a write linked its L2 entry during the
> wait, the type check now fails and the block layer falls back to a
> bounce-buffered zero write that only touches the requested subrange,
> preserving the racing write's data. Promote wait_for_dependencies() to
> qcow2_wait_for_dependencies() so qcow2.c can call it.
>
> Fixes: b8bfb1478d ("qcow2: Fix corruption on discard during write with COW")
> Cc: qemu-stable@nongnu.org
> Tested-by: Fiona Ebner <f.ebner@proxmox.com>
> Reviewed-by: Fiona Ebner <f.ebner@proxmox.com>
> Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
> ---
>
> Changes v1 -> v2:
> * improve comments (thx @Fiona)
> * add Fiona's R-b/T-b
>
> block/qcow2-cluster.c | 10 +++++-----
> block/qcow2.c | 10 ++++++++--
> block/qcow2.h | 4 ++++
> tests/qemu-iotests/046 | 23 +++++++++++++++++++++++
> tests/qemu-iotests/046.out | 10 ++++++++++
> 5 files changed, 50 insertions(+), 7 deletions(-)
>
> diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
> index 8b1e80bd0b..e02fae6a0c 100644
> --- a/block/qcow2-cluster.c
> +++ b/block/qcow2-cluster.c
> @@ -1474,9 +1474,9 @@ static int coroutine_fn handle_dependencies(BlockDriverState *bs,
> return 0;
> }
>
> -static void coroutine_mixed_fn wait_for_dependencies(BlockDriverState *bs,
> - uint64_t guest_offset,
> - uint64_t bytes)
> +void coroutine_mixed_fn qcow2_wait_for_dependencies(BlockDriverState *bs,
> + uint64_t guest_offset,
> + uint64_t bytes)
> {
> BDRVQcow2State *s = bs->opaque;
> QCowL2Meta *m = NULL;
> @@ -2035,7 +2035,7 @@ int qcow2_cluster_discard(BlockDriverState *bs, uint64_t offset,
> * We don't need to allocate a QCowL2Meta for the discard operation because
> * s->lock is held for the duration of the whole operation.
> */
> - wait_for_dependencies(bs, offset, bytes);
> + qcow2_wait_for_dependencies(bs, offset, bytes);
>
> /* Caller must pass aligned values, except at image end */
> assert(QEMU_IS_ALIGNED(offset, s->cluster_size));
> @@ -2204,7 +2204,7 @@ int coroutine_fn qcow2_subcluster_zeroize(BlockDriverState *bs, uint64_t offset,
> * We don't need to allocate a QCowL2Meta for the zeroize operation because
> * s->lock is held for the duration of the whole operation.
> */
> - wait_for_dependencies(bs, offset, bytes);
> + qcow2_wait_for_dependencies(bs, offset, bytes);
>
> /* If we have to stay in sync with an external data file, zero out
> * s->data_file first. */
> diff --git a/block/qcow2.c b/block/qcow2.c
> index 81fd299b4c..96efdd4503 100644
> --- a/block/qcow2.c
> +++ b/block/qcow2.c
> @@ -4234,10 +4234,16 @@ qcow2_co_pwrite_zeroes(BlockDriverState *bs, int64_t offset, int64_t bytes,
> }
>
> qemu_co_mutex_lock(&s->lock);
> - /* We can have new write after previous check */
> offset -= head;
> bytes = s->subcluster_size;
> - nr = s->subcluster_size;
I'm not sure why you changed this line. Isn't the new version of it
doing exactly the same thing as before? This makes the diff a little
more confusing than it should be.
> + /*
> + * Wait for in-flight allocating writes first: otherwise the type
> + * check below could pass on UNALLOCATED while a yet-to-link_l2 write
> + * completes during qcow2_subcluster_zeroize()'s own wait, letting the
> + * resumed MAY_UNMAP discard the just-written data.
> + */
> + qcow2_wait_for_dependencies(bs, offset, bytes);
> + nr = bytes;
> ret = qcow2_get_host_offset(bs, offset, &nr, &off, &type);
> if (ret < 0 ||
> (type != QCOW2_SUBCLUSTER_UNALLOCATED_PLAIN &&
The rest looks good to me. Maybe not perfect because we're calling
qcow2_wait_for_dependencies() twice now and rely on the second instance
not doing anything any more, but I guess that's okay.
So if you agree, I'd just change the assignment of nr above back and
apply this.
Kevin
prev parent reply other threads:[~2026-06-01 16:38 UTC|newest]
Thread overview: 2+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-05-22 15:13 [PATCH v2] qcow2: Fix data loss on zero write with detect-zeroes=unmap Thomas Lamprecht
2026-06-01 16:37 ` Kevin Wolf [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ah21Y_fIiy23V5S7@redhat.com \
--to=kwolf@redhat.com \
--cc=f.ebner@proxmox.com \
--cc=hreitz@redhat.com \
--cc=qemu-block@nongnu.org \
--cc=qemu-devel@nongnu.org \
--cc=qemu-stable@nongnu.org \
--cc=t.lamprecht@proxmox.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.