From: Eric Blake <eblake@redhat.com>
To: Kevin Wolf <kwolf@redhat.com>, Nir Soffer <nsoffer@redhat.com>
Cc: qemu-block <qemu-block@nongnu.org>,
pl@kamp.de, QEMU Developers <qemu-devel@nongnu.org>,
nirsof <nirsof@gmail.com>, Max Reitz <mreitz@redhat.com>
Subject: Re: [PATCH] block: file-posix: Fail unmap with NO_FALLBACK on block device
Date: Tue, 16 Jun 2020 15:09:31 -0500
Message-ID: <3fbf14f4-1bb9-7e30-3e7c-6207fa3f15c1@redhat.com>
In-Reply-To: <20200616153241.GF4305@linux.fritz.box>

On 6/16/20 10:32 AM, Kevin Wolf wrote:
> On 15.06.2020 at 21:32, Nir Soffer wrote:
>> We can zero 2.3 GiB/s:
>>
>> # time blkdiscard -z test-lv
>>
>> real 0m43.902s
>> user 0m0.002s
>> sys 0m0.130s
>
>> We can write 445 MB/s:
>>
>> # dd if=/dev/zero bs=2M count=51200 of=test-lv oflag=direct conv=fsync
>> 107374182400 bytes (107 GB, 100 GiB) copied, 241.257 s, 445 MB/s
>
> So using FALLOC_FL_PUNCH_HOLE _is_ faster after all. What might not be
> faster is zeroing out the whole device and then overwriting a
> considerable part of it again.

Yeah, there can indeed be a difference between a pre-zeroing pass that
is super-fast (on a POSIX file, truncating to 0 and back to the desired
size, for example) and one that is faster than writing the data but
still makes the overall copy slower than a single pass.

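To make the truncate trick concrete, here is a minimal sketch for a regular file. The helper name prezero_by_truncate is made up for illustration and is not anything in qemu; it only applies to regular files, not to block devices like the LV in Nir's test.

```c
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not a qemu function): make a regular file of the
 * given size read back as all zeroes by dropping its contents and then
 * extending it again. POSIX specifies that the extended area reads as
 * zero bytes. Returns 0 on success, -1 with errno set on failure.
 */
static int prezero_by_truncate(int fd, off_t size)
{
    /* Shrink to nothing: the file system discards all existing data. */
    if (ftruncate(fd, 0) < 0) {
        return -1;
    }
    /* Grow back to the desired size; typically no data is written. */
    return ftruncate(fd, size);
}
```

The cost here is mostly metadata work rather than data I/O, which is why this kind of pre-zeroing can be so much cheaper than writing zeroes.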
>
> I think this means that we shouldn't fail write_zeroes at the file-posix
> level even if BDRV_REQ_NO_FALLBACK is given. Instead, qemu-img convert
> is where I see a fix.

Is the kernel able to tell us reliably when we can perform a fast
pre-zero pass? If it can't, it's that much harder to define when
BDRV_REQ_NO_FALLBACK makes a difference.

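For reference, a minimal sketch of what asking the kernel looks like on Linux with fallocate(2). The helper name try_punch_hole is hypothetical, and mapping "not supported" to -ENOTSUP is my assumption about how a caller might want to see it, not what file-posix does.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* for fallocate() and FALLOC_FL_* with glibc */
#endif
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>

/*
 * Hypothetical probe (not a qemu function): ask the kernel to deallocate
 * [offset, offset + len) without changing the file size. Returns 0 if the
 * request succeeded, -ENOTSUP if the kernel reports that the operation is
 * not available on this file, or another negative errno on failure.
 */
static int try_punch_hole(int fd, off_t offset, off_t len)
{
    if (fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
                  offset, len) == 0) {
        return 0;
    }
    if (errno == EOPNOTSUPP || errno == ENOSYS) {
        return -ENOTSUP;
    }
    return -errno;
}
```

Note that a return of 0 only says the request worked, not that it was fast. That is exactly the gap in question: the error code distinguishes "unsupported" from "supported", but not "supported and cheap" from "supported and slow".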
>
> Certainly qemu-img could be cleverer and zero out more selectively. The
> idea of doing a blk_make_zero() first seems to have caused some
> problems, though of course its introduction was also justified with
> performance, so improving one case might hurt another if we're not
> careful.
>
> However, when Peter Lieven introduced this (commit 5a37b60a61c), we
> didn't use write_zeroes yet during the regular copy loop (we do since
> commit 690c7301600). So chances are that blk_make_zero() doesn't
> actually help any more now.
>
> Can you run another test with the patch below? I think it should perform
> the same as yours. Eric, Peter, do you think this would have a negative
> effect for NBD and/or iscsi?

I'm still hoping to revive my work on making bdrv_make_zero a per-driver
callback with smarts for the fastest possible pre-zeroing that driver is
capable of, or fast failure when BDRV_REQ_NO_FALLBACK is set and it is
no faster to pre-zero than it is to just write zeroes when needed. I
can certainly construct NBD scenarios in either direction (where a
pre-zeroing pass is faster because of less network traffic, or where a
pre-zeroing pass is slower because of increased I/O - in fact, that was
part of my KVM Forum 2019 demo on why the NBD protocol added a FAST_ZERO
flag mirroring the idea of qemu's BDRV_REQ_NO_FALLBACK).

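As a rough illustration of the shape I have in mind, here is a self-contained toy model. Everything in it (ToyDriver, toy_make_zero, TOY_REQ_NO_FALLBACK and the sample callbacks) is hypothetical and only mimics the idea; it is not the qemu block layer API.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for qemu's BDRV_REQ_NO_FALLBACK; the value is arbitrary. */
#define TOY_REQ_NO_FALLBACK 0x1u

typedef struct ToyDriver {
    const char *name;
    /* Optional fast path: NULL or -ENOTSUP means "no fast pre-zeroing". */
    int (*make_zero_fast)(void *opaque, uint64_t bytes);
    /* Slow path: explicitly write zeroes over the whole range. */
    int (*write_zeroes_slow)(void *opaque, uint64_t bytes);
} ToyDriver;

/*
 * Zero the whole image. Use the driver's fast path if it has one. If it
 * does not, either fail fast (caller passed TOY_REQ_NO_FALLBACK and would
 * rather write zeroes only where needed) or fall back to the slow path.
 */
static int toy_make_zero(const ToyDriver *drv, void *opaque,
                         uint64_t bytes, unsigned flags)
{
    if (drv->make_zero_fast) {
        int ret = drv->make_zero_fast(opaque, bytes);
        if (ret != -ENOTSUP) {
            return ret;
        }
    }
    if (flags & TOY_REQ_NO_FALLBACK) {
        return -ENOTSUP;
    }
    return drv->write_zeroes_slow(opaque, bytes);
}

/* Sample callbacks for experimenting; opaque counts slow-path calls. */
static int toy_fast_ok(void *opaque, uint64_t bytes)
{
    (void)opaque;
    (void)bytes;
    return 0;
}

static int toy_slow_write(void *opaque, uint64_t bytes)
{
    (void)bytes;
    ++*(uint64_t *)opaque;
    return 0;
}
```

With a driver that has no fast path, toy_make_zero() with TOY_REQ_NO_FALLBACK returns -ENOTSUP without touching the slow path, so a caller such as a convert loop could skip the pre-zero pass and zero selectively instead.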
>
> The other option would be providing an option and making it Someone
> Else's Problem.

Matching what we recently did with --target-is-zero.

>
> Kevin
>
>
> diff --git a/qemu-img.c b/qemu-img.c
> index d7e846e607..bdb9f6aa46 100644
> --- a/qemu-img.c
> +++ b/qemu-img.c
> @@ -2084,15 +2084,6 @@ static int convert_do_copy(ImgConvertState *s)
>          s->has_zero_init = bdrv_has_zero_init(blk_bs(s->target));
>      }
>  
> -    if (!s->has_zero_init && !s->target_has_backing &&
> -        bdrv_can_write_zeroes_with_unmap(blk_bs(s->target)))
> -    {
> -        ret = blk_make_zero(s->target, BDRV_REQ_MAY_UNMAP | BDRV_REQ_NO_FALLBACK);
> -        if (ret == 0) {
> -            s->has_zero_init = true;
> -        }
> -    }
> -
>      /* Allocate buffer for copied data. For compressed images, only one cluster
>       * can be copied at a time. */
>      if (s->compressed) {
>
--
Eric Blake, Principal Software Engineer
Red Hat, Inc. +1-919-301-3226
Virtualization: qemu.org | libvirt.org