From: Stefan Hajnoczi <stefanha@redhat.com>
To: Eric Blake <eblake@redhat.com>
Cc: qemu-devel@nongnu.org, qemu-block@nongnu.org,
	vsementsov@yandex-team.ru, Fam Zheng <fam@euphon.net>,
	Kevin Wolf <kwolf@redhat.com>, Hanna Reitz <hreitz@redhat.com>
Subject: Re: [PATCH v2 04/11] block: Add new bdrv_co_is_all_zeroes() function
Date: Thu, 17 Apr 2025 16:35:33 -0400
Message-ID: <20250417203533.GC85491@fedora>
In-Reply-To: <20250417184133.105746-17-eblake@redhat.com>

On Thu, Apr 17, 2025 at 01:39:09PM -0500, Eric Blake wrote:
> There are some optimizations that require knowing if an image starts
> out as reading all zeroes, such as making blockdev-mirror faster by
> skipping the copying of source zeroes to the destination.  The
> existing bdrv_co_is_zero_fast() is a good building block for answering
> this question, but it tends to give an answer of 0 for a file we just
> created via QMP 'blockdev-create' or similar (such as 'qemu-img create
> -f raw').  Why?  Because file-posix.c insists on allocating a tiny
> header to any file rather than leaving it 100% sparse, due to some
> filesystems that are unable to answer alignment probes on a hole.  But
> teaching file-posix.c to read the tiny header doesn't scale - the
> problem of a small header is also visible when libvirt sets up an NBD
> client to a just-created file on a migration destination host.
> 
> So, we need a wrapper function that handles a bit more complexity in a
> common manner for all block devices - when the BDS is mostly a hole,
> but has a small non-hole header, it is still worth the time to read
> that header and check if it reads as all zeroes before giving up and
> returning a pessimistic answer.
> 
> Signed-off-by: Eric Blake <eblake@redhat.com>
> ---
>  include/block/block-io.h |  2 ++
>  block/io.c               | 58 ++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 60 insertions(+)
> 
> diff --git a/include/block/block-io.h b/include/block/block-io.h
> index b49e0537dd4..b99cc98d265 100644
> --- a/include/block/block-io.h
> +++ b/include/block/block-io.h
> @@ -161,6 +161,8 @@ bdrv_is_allocated_above(BlockDriverState *bs, BlockDriverState *base,
> 
>  int coroutine_fn GRAPH_RDLOCK
>  bdrv_co_is_zero_fast(BlockDriverState *bs, int64_t offset, int64_t bytes);
> +int coroutine_fn GRAPH_RDLOCK
> +bdrv_co_is_all_zeroes(BlockDriverState *bs);
> 
>  int GRAPH_RDLOCK
>  bdrv_apply_auto_read_only(BlockDriverState *bs, const char *errmsg,
> diff --git a/block/io.c b/block/io.c
> index 6ef78070915..dc1341e4029 100644
> --- a/block/io.c
> +++ b/block/io.c
> @@ -2778,6 +2778,64 @@ int coroutine_fn bdrv_co_is_zero_fast(BlockDriverState *bs, int64_t offset,
>      return 1;
>  }
> 
> +/*
> + * Check @bs (and its backing chain) to see if the entire image is known
> + * to read as zeroes.
> + * Return 1 if that is the case, 0 otherwise and -errno on error.
> + * This test is meant to be fast rather than accurate so returning 0
> + * does not guarantee non-zero data; however, it can report 1 in more

False negatives are possible. Let's also document that false positives
are not possible:

  This test is meant to be fast rather than accurate so returning 0 does
  not guarantee non-zero data, but returning 1 does guarantee all zero
  data; ...
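
To make the suggested contract concrete, here is a minimal sketch of how a
caller might treat the three possible return values. It is a toy model and
not QEMU code: ToyImage, toy_is_all_zeroes() and toy_can_skip_zeroing() are
hypothetical names invented for illustration only.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a block device; not a QEMU type. */
typedef struct {
    bool known_zero;  /* metadata proves the whole image reads as zero */
    bool io_error;    /* simulate a failing status query */
} ToyImage;

/*
 * Same contract as the documented one:
 *   1       the image is guaranteed to read as all zeroes
 *   0       unknown; the image may or may not contain non-zero data
 *   -errno  the query itself failed
 */
static int toy_is_all_zeroes(const ToyImage *img)
{
    assert(img != NULL);
    if (img->io_error) {
        return -EIO;
    }
    return img->known_zero ? 1 : 0;
}

/*
 * A caller may only skip work on a definite 1.  Both 0 (unknown) and
 * errors fall back to the slow, always-correct path.
 */
static bool toy_can_skip_zeroing(const ToyImage *img)
{
    return toy_is_all_zeroes(img) == 1;
}
```

Because 1 is never a false positive, treating every other value as "must
zero" costs time in the worst case but never correctness.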

> + * cases than bdrv_co_is_zero_fast.
> + */
> +int coroutine_fn bdrv_co_is_all_zeroes(BlockDriverState *bs)
> +{
> +    int ret;
> +    int64_t pnum, bytes;
> +    char *buf;
> +    QEMUIOVector local_qiov;
> +    IO_CODE();
> +
> +    bytes = bdrv_co_getlength(bs);
> +    if (bytes < 0) {
> +        return bytes;
> +    }
> +
> +    /* First probe - see if the entire image reads as zero */
> +    ret = bdrv_co_common_block_status_above(bs, NULL, false, BDRV_BSTAT_ZERO,
> +                                            0, bytes, &pnum, NULL, NULL,
> +                                            NULL);
> +    if (ret < 0) {
> +        return ret;
> +    }
> +    if (ret & BDRV_BLOCK_ZERO) {
> +        return bdrv_co_is_zero_fast(bs, pnum, bytes - pnum);
> +    }
> +
> +    /*
> +     * Because of the way 'blockdev-create' works, raw files tend to
> +     * be created with a non-sparse region at the front to make
> +     * alignment probing easier.  If the block starts with only a
> +     * small allocated region, it is still worth the effort to see if
> +     * the rest of the image is still sparse, coupled with manually
> +     * reading the first region to see if it reads zero after all.
> +     */
> +    if (pnum > qemu_real_host_page_size()) {

Probably not worth it for the corner case, but replacing
qemu_real_host_page_size() with 128 KiB would allow this to work on
images created on different CPU architectures (4 KiB vs 64 KiB page
sizes).
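
A rough sketch of the difference, assuming a fixed 128 KiB limit. The macro
and function names below are hypothetical and only model the comparison; the
check is written the same way as in the patch, where the head is read only
if pnum does not exceed the limit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical constant: upper bound on the head we are willing to read. */
#define TOY_MAX_HEAD_BYTES (128 * 1024)

/* Models the patch: the limit depends on the page size of the current host. */
static bool toy_head_ok_page_size(int64_t pnum, int64_t host_page_size)
{
    assert(pnum >= 0 && host_page_size > 0);
    return pnum <= host_page_size;
}

/* Models the suggestion: the limit is the same on every host. */
static bool toy_head_ok_fixed(int64_t pnum)
{
    assert(pnum >= 0);
    return pnum <= TOY_MAX_HEAD_BYTES;
}
```

With an image whose allocated head is 64 KiB (for example, created on a host
with 64 KiB pages) and a reader with 4 KiB pages, the page-size check gives
up, while the fixed limit still reads the head and checks it for zeroes.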

> +        return 0;
> +    }
> +    ret = bdrv_co_is_zero_fast(bs, pnum, bytes - pnum);
> +    if (ret <= 0) {
> +        return ret;
> +    }
> +    /* Only the head of the image is unknown, and it's small.  Read it.  */
> +    buf = qemu_blockalign(bs, pnum);
> +    qemu_iovec_init_buf(&local_qiov, buf, pnum);
> +    ret = bdrv_driver_preadv(bs, 0, pnum, &local_qiov, 0, 0);
> +    if (ret >= 0) {
> +        ret = buffer_is_zero(buf, pnum);
> +    }
> +    qemu_vfree(buf);
> +    return ret;
> +}
> +
>  int coroutine_fn bdrv_co_is_allocated(BlockDriverState *bs, int64_t offset,
>                                        int64_t bytes, int64_t *pnum)
>  {
> -- 
> 2.49.0
> 
> 
