Re: [PATCH] block: file-posix: Fail unmap with NO_FALLBACK on block device

qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed

From: Eric Blake <eblake@redhat.com>
To: Kevin Wolf <kwolf@redhat.com>, Nir Soffer <nsoffer@redhat.com>
Cc: qemu-block <qemu-block@nongnu.org>,
	pl@kamp.de, QEMU Developers <qemu-devel@nongnu.org>,
	nirsof <nirsof@gmail.com>, Max Reitz <mreitz@redhat.com>
Subject: Re: [PATCH] block: file-posix: Fail unmap with NO_FALLBACK on block device
Date: Tue, 16 Jun 2020 15:09:31 -0500	[thread overview]
Message-ID: <3fbf14f4-1bb9-7e30-3e7c-6207fa3f15c1@redhat.com> (raw)
In-Reply-To: <20200616153241.GF4305@linux.fritz.box>

On 6/16/20 10:32 AM, Kevin Wolf wrote:
> Am 15.06.2020 um 21:32 hat Nir Soffer geschrieben:
>> We can zero 2.3 g/s:
>>
>> # time blkdiscard -z test-lv
>>
>> real 0m43.902s
>> user 0m0.002s
>> sys 0m0.130s
> 
>> We can write 445m/s:
>>
>> # dd if=/dev/zero bs=2M count=51200 of=test-lv oflag=direct conv=fsync
>> 107374182400 bytes (107 GB, 100 GiB) copied, 241.257 s, 445 MB/s
> 
> So using FALLOC_FL_PUNCH_HOLE _is_ faster after all. What might not be
> faster is zeroing out the whole device and then overwriting a
> considerable part of it again.

Yeah, there can indeed be a difference between a pre-zeroing which can 
be super-fast (on a posix file, truncate to 0 and back to the desired 
size, for example), and where it is faster than writes but still slower 
than a single pass.

> 
> I think this means that we shouldn't fail write_zeroes at the file-posix
> level even if BDRV_REQ_NO_FALLBACK is given. Instead, qemu-img convert
> is where I see a fix.

Is the kernel able to tell us reliably when we can perform a fast 
pre-zero pass?  If it can't, it's that much harder to define when 
BDRV_REQ_NO_FALLBACK makes a difference.

> 
> Certainly qemu-img could be cleverer and zero out more selectively. The
> idea of doing a blk_make_zero() first seems to have caused some
> problems, though of course its introduction was also justified with
> performance, so improving one case might hurt another if we're not
> careful.
> 
> However, when Peter Lieven introduced this (commit 5a37b60a61c), we
> didn't use write_zeroes yet during the regular copy loop (we do since
> commit 690c7301600). So chances are that blk_make_zero() doesn't
> actually help any more now.
> 
> Can you run another test with the patch below? I think it should perform
> the same as yours. Eric, Peter, do you think this would have a negative
> effect for NBD and/or iscsi?

I'm still hoping to revive my work on making bdrv_make_zero a per-driver 
callback with smarts for the fastest possible pre-zeroing that driver is 
capable of, or fast failure when BDRV_REQ_NO_FALLBACK is set and it is 
no faster to pre-zero than it is to just write zeroes when needed.  I 
can certainly construct NBD scenarios in either direction (where a 
pre-zeroing pass is faster because of less network traffic, or where a 
pre-zeroing pass is slower because of increased I/O - in fact, that was 
part of my KVM Forum 2019 demo on why the NBD protocol added a FAST_ZERO 
flag mirroring the idea of qemu's BDRV_REQ_NO_FALLBACK).

> 
> The other option would be providing an option and making it Someone
> Else's Problem.

Matching what we recently did with --target-is-zero.

> 
> Kevin
> 
> 
> diff --git a/qemu-img.c b/qemu-img.c
> index d7e846e607..bdb9f6aa46 100644
> --- a/qemu-img.c
> +++ b/qemu-img.c
> @@ -2084,15 +2084,6 @@ static int convert_do_copy(ImgConvertState *s)
>           s->has_zero_init = bdrv_has_zero_init(blk_bs(s->target));
>       }
> 
> -    if (!s->has_zero_init && !s->target_has_backing &&
> -        bdrv_can_write_zeroes_with_unmap(blk_bs(s->target)))
> -    {
> -        ret = blk_make_zero(s->target, BDRV_REQ_MAY_UNMAP | BDRV_REQ_NO_FALLBACK);
> -        if (ret == 0) {
> -            s->has_zero_init = true;
> -        }
> -    }
> -
>       /* Allocate buffer for copied data. For compressed images, only one cluster
>        * can be copied at a time. */
>       if (s->compressed) {
> 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org

     prev parent reply	other threads:[~2020-06-16 20:10 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-06-13 17:08 [PATCH] block: file-posix: Fail unmap with NO_FALLBACK on block device Nir Soffer
2020-06-15 19:32 ` Nir Soffer
2020-06-16 15:32   ` Kevin Wolf
2020-06-16 17:39     ` Nir Soffer
2020-06-16 20:01       ` Nir Soffer
2020-06-17 10:47         ` Kevin Wolf
2020-06-16 20:09     ` Eric Blake [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=3fbf14f4-1bb9-7e30-3e7c-6207fa3f15c1@redhat.com \
    --to=eblake@redhat.com \
    --cc=kwolf@redhat.com \
    --cc=mreitz@redhat.com \
    --cc=nirsof@gmail.com \
    --cc=nsoffer@redhat.com \
    --cc=pl@kamp.de \
    --cc=qemu-block@nongnu.org \
    --cc=qemu-devel@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).