All of lore.kernel.org
 help / color / mirror / Atom feed
From: Stefan Hajnoczi <stefanha@redhat.com>
To: Eric Blake <eblake@redhat.com>
Cc: qemu-devel@nongnu.org, qemu-block@nongnu.org, sunnyzhyy@qq.com,
	vsementsov@yandex-team.ru, John Snow <jsnow@redhat.com>,
	Kevin Wolf <kwolf@redhat.com>, Hanna Reitz <hreitz@redhat.com>
Subject: Re: [PATCH v4 11/13] mirror: Skip writing zeroes when target is already zero
Date: Mon, 12 May 2025 14:43:56 -0400	[thread overview]
Message-ID: <20250512184356.GG141177@fedora> (raw)
In-Reply-To: <20250509204341.3553601-26-eblake@redhat.com>

[-- Attachment #1: Type: text/plain, Size: 4186 bytes --]

On Fri, May 09, 2025 at 03:40:28PM -0500, Eric Blake wrote:
> When mirroring, the goal is to ensure that the destination reads the
> same as the source; this goal is met whether the destination is sparse
> or fully-allocated (except when explicitly punching holes, then merely
> reading zero is not enough to know if it is sparse, so we still want
> to punch the hole).  Avoiding a redundant write to zero (whether in
> the background because the zero cluster was marked in the dirty
> bitmap, or in the foreground because the guest is writing zeroes) when
> the destination already reads as zero makes mirroring faster, and
> avoids allocating the destination merely because the source reports as
> allocated.
> 
> The effect is especially pronounced when the source is a raw file.
> That's because when the source is a qcow2 file, the dirty bitmap only
> visits the portions of the source that are allocated, which tend to be
> non-zero.  But when the source is a raw file,
> bdrv_co_is_allocated_above() reports the entire file as allocated so
> mirror_dirty_init sets the entire dirty bitmap, and it is only later
> during mirror_iteration that we change to consulting the more precise
> bdrv_co_block_status_above() to learn where the source reads as zero.
> 
> Remember that since a mirror operation can write a cluster more than
> once (every time the guest changes the source, the destination is also
> changed to keep up), and the guest can change whether a given cluster
> reads as zero, is discarded, or has non-zero data over the course of
> the mirror operation, we can't take the shortcut of relying on
> s->target_is_zero (which is static for the life of the job) in
> mirror_co_zero() to see if the destination is already zero, because
> that information may be stale.  Any solution we use must be dynamic in
> the face of the guest writing or discarding a cluster while the mirror
> has been ongoing.
> 
> We could just teach mirror_co_zero() to do a block_status() probe of
> the destination, and skip the zeroes if the destination already reads
> as zero, but we know from past experience that extra block_status()
> calls are not always cheap (tmpfs, anyone?), especially when they are
> random access rather than linear.  Use of block_status() of the source
> by the background task in a linear fashion is not our bottleneck (it's
> a background task, after all); but since mirroring can be done while
> the source is actively being changed, we don't want a slow
> block_status() of the destination to occur on the hot path of the
> guest trying to do random-access writes to the source.
> 
> So this patch takes a slightly different approach: any time we have to
> track dirty clusters, we can also track which clusters are known to
> read as zero.  For sync=TOP or when we are punching holes from
> "detect-zeroes":"unmap", the zero bitmap starts out empty, but
> prevents a second write zero to a cluster that was already zero by an
> earlier pass; for sync=FULL when we are not punching holes, the zero
> bitmap starts out full if the destination reads as zero during
> initialization.  Either way, I/O to the destination can now avoid
> redundant write zero to a cluster that already reads as zero, all
> without having to do a block_status() per write on the destination.
> 
> With this patch, if I create a raw sparse destination file, connect it
> with QMP 'blockdev-add' while leaving it at the default "discard":
> "ignore", then run QMP 'blockdev-mirror' with "sync": "full", the
> destination remains sparse rather than fully allocated.  Meanwhile, a
> destination image that is already fully allocated remains so unless it
> was opened with "detect-zeroes": "unmap".  And any time writing zeroes
> is skipped, the job counters are not incremented.
> 
> Signed-off-by: Eric Blake <eblake@redhat.com>
> 
> ---
> 
> v3: also skip counters when skipping I/O [Sunny]
> v4: rebase to earlier changes
> ---
>  block/mirror.c | 107 ++++++++++++++++++++++++++++++++++++++++++-------
>  1 file changed, 93 insertions(+), 14 deletions(-)

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

  reply	other threads:[~2025-05-12 18:44 UTC|newest]

Thread overview: 34+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-05-09 20:40 [PATCH v4 00/13] Make blockdev-mirror dest sparse in more cases Eric Blake
2025-05-09 20:40 ` [PATCH v4 01/13] block: Expand block status mode from bool to flags Eric Blake
2025-05-09 20:40 ` [PATCH v4 02/13] file-posix, gluster: Handle zero block status hint better Eric Blake
2025-05-09 20:40 ` [PATCH v4 03/13] block: Let bdrv_co_is_zero_fast consolidate adjacent extents Eric Blake
2025-05-09 20:40 ` [PATCH v4 04/13] block: Add new bdrv_co_is_all_zeroes() function Eric Blake
2025-05-09 20:40 ` [PATCH v4 05/13] iotests: Improve iotest 194 to mirror data Eric Blake
2025-05-09 20:40 ` [PATCH v4 06/13] mirror: Minor refactoring Eric Blake
2025-05-09 20:40 ` [PATCH v4 07/13] mirror: Pass full sync mode rather than bool to internals Eric Blake
2025-05-02 23:17   ` Sunny Zhu
2025-05-12 15:19   ` Stefan Hajnoczi
2025-05-09 20:40 ` [PATCH v4 08/13] mirror: Allow QMP override to declare target already zero Eric Blake
2025-05-02 23:57   ` Sunny Zhu
2025-05-12 17:30     ` Eric Blake
2025-05-03  0:40   ` Sunny Zhu
2025-05-12 15:23   ` Stefan Hajnoczi
2025-05-09 20:40 ` [PATCH v4 09/13] mirror: Drop redundant zero_target parameter Eric Blake
2025-05-03  0:47   ` Sunny Zhu
2025-05-12 15:24   ` Stefan Hajnoczi
2025-05-14 22:09   ` Eric Blake
2025-05-15  1:11     ` Eric Blake
2025-05-09 20:40 ` [PATCH v4 10/13] mirror: Skip pre-zeroing destination if it is already zero Eric Blake
2025-05-03  0:51   ` Sunny Zhu
2025-05-12 18:40   ` Stefan Hajnoczi
2025-05-09 20:40 ` [PATCH v4 11/13] mirror: Skip writing zeroes when target " Eric Blake
2025-05-12 18:43   ` Stefan Hajnoczi [this message]
2025-05-13 21:37   ` Eric Blake
2025-05-14 15:37   ` Sunny Zhu
2025-05-14 16:30     ` Eric Blake
2025-05-09 20:40 ` [PATCH v4 12/13] iotests/common.rc: add disk_usage function Eric Blake
2025-05-09 20:40 ` [PATCH v4 13/13] tests: Add iotest mirror-sparse for recent patches Eric Blake
2025-05-12 18:44   ` Stefan Hajnoczi
2025-05-12 18:44 ` [PATCH v4 00/13] Make blockdev-mirror dest sparse in more cases Stefan Hajnoczi
2025-05-13 22:00 ` [PATCH v4 14/13] mirror: Reduce I/O when destination is detect-zeroes:unmap Eric Blake
2025-05-14 13:20   ` Stefan Hajnoczi

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20250512184356.GG141177@fedora \
    --to=stefanha@redhat.com \
    --cc=eblake@redhat.com \
    --cc=hreitz@redhat.com \
    --cc=jsnow@redhat.com \
    --cc=kwolf@redhat.com \
    --cc=qemu-block@nongnu.org \
    --cc=qemu-devel@nongnu.org \
    --cc=sunnyzhyy@qq.com \
    --cc=vsementsov@yandex-team.ru \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.