Re: [PATCH] block: file-posix: Fail unmap with NO_FALLBACK on block device

qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed

From: Kevin Wolf <kwolf@redhat.com>
To: Nir Soffer <nsoffer@redhat.com>
Cc: nirsof <nirsof@gmail.com>, qemu-block <qemu-block@nongnu.org>,
	pl@kamp.de, QEMU Developers <qemu-devel@nongnu.org>,
	Max Reitz <mreitz@redhat.com>
Subject: Re: [PATCH] block: file-posix: Fail unmap with NO_FALLBACK on block device
Date: Wed, 17 Jun 2020 12:47:09 +0200	[thread overview]
Message-ID: <20200617104709.GA5166@linux.fritz.box> (raw)
In-Reply-To: <CAMRbyyvD6kRdaitm6Oc6LAnF6_e+Y9y+jTPKNVq8ENLEKYyKng@mail.gmail.com>

Am 16.06.2020 um 22:01 hat Nir Soffer geschrieben:
> On Tue, Jun 16, 2020 at 8:39 PM Nir Soffer <nsoffer@redhat.com> wrote:
> >
> > On Tue, Jun 16, 2020 at 6:32 PM Kevin Wolf <kwolf@redhat.com> wrote:
> > >
> > > Am 15.06.2020 um 21:32 hat Nir Soffer geschrieben:
> > > > We can zero 2.3 g/s:
> > > >
> > > > # time blkdiscard -z test-lv
> > > >
> > > > real 0m43.902s
> > > > user 0m0.002s
> > > > sys 0m0.130s
> > >
> > > > We can write 445m/s:
> > > >
> > > > # dd if=/dev/zero bs=2M count=51200 of=test-lv oflag=direct conv=fsync
> > > > 107374182400 bytes (107 GB, 100 GiB) copied, 241.257 s, 445 MB/s
> > >
> > > So using FALLOC_FL_PUNCH_HOLE _is_ faster after all. What might not be
> > > faster is zeroing out the whole device and then overwriting a
> > > considerable part of it again.
> > >
> > > I think this means that we shouldn't fail write_zeroes at the file-posix
> > > level even if BDRV_REQ_NO_FALLBACK is given. Instead, qemu-img convert
> > > is where I see a fix.
> > >
> > > Certainly qemu-img could be cleverer and zero out more selectively. The
> > > idea of doing a blk_make_zero() first seems to have caused some
> > > problems, though of course its introduction was also justified with
> > > performance, so improving one case might hurt another if we're not
> > > careful.
> > >
> > > However, when Peter Lieven introduced this (commit 5a37b60a61c), we
> > > didn't use write_zeroes yet during the regular copy loop (we do since
> > > commit 690c7301600). So chances are that blk_make_zero() doesn't
> > > actually help any more now.
> > >
> > > Can you run another test with the patch below?
> >
> > Sure, I can try this.
> 
> Tried it, and it performs the same as expected.

Thanks.

> > > I think it should perform
> > > the same as yours. Eric, Peter, do you think this would have a negative
> > > effect for NBD and/or iscsi?
> > >
> > > The other option would be providing an option and making it Someone
> > > Else's Problem.
> > >
> > > Kevin
> > >
> > >
> > > diff --git a/qemu-img.c b/qemu-img.c
> > > index d7e846e607..bdb9f6aa46 100644
> > > --- a/qemu-img.c
> > > +++ b/qemu-img.c
> > > @@ -2084,15 +2084,6 @@ static int convert_do_copy(ImgConvertState *s)
> > >          s->has_zero_init = bdrv_has_zero_init(blk_bs(s->target));
> > >      }
> > >
> > > -    if (!s->has_zero_init && !s->target_has_backing &&
> > > -        bdrv_can_write_zeroes_with_unmap(blk_bs(s->target)))
> > > -    {
> > > -        ret = blk_make_zero(s->target, BDRV_REQ_MAY_UNMAP | BDRV_REQ_NO_FALLBACK);
> > > -        if (ret == 0) {
> > > -            s->has_zero_init = true;
> > > -        }
> > > -    }
> >
> > This will work of course, but now we will not do bulk zero for any target
> 
> I would like to have a minimal change to increase the chance that we
> can consume this very soon in oVirt.

I think this one would be pretty minimal.

Maybe we can later bring this code back, but with an implementation of
blk_make_zero() that doesn't use the generic write_zeroes operation,
but with a specific callback like Eric suggested.

> > I think we never do this for regular files anyway since we truncate
> > files, and there is nothing to zero, but maybe there is some case
> > when this is useful?

Yes, regular files have s->has_zero_init == true anyway.

> > BTW, do we use BDRV_REQ_NO_FALLBACK elsewhere? maybe we can remove
> > it.

qcow2 uses it when zeroing out parts of a newly allocated cluster on
partial writes. Though that code is questionable, too, and XFS people
suggest that we should stop using it.

Kevin

next prev parent reply	other threads:[~2020-06-17 10:48 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-06-13 17:08 [PATCH] block: file-posix: Fail unmap with NO_FALLBACK on block device Nir Soffer
2020-06-15 19:32 ` Nir Soffer
2020-06-16 15:32   ` Kevin Wolf
2020-06-16 17:39     ` Nir Soffer
2020-06-16 20:01       ` Nir Soffer
2020-06-17 10:47         ` Kevin Wolf [this message]
2020-06-16 20:09     ` Eric Blake

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200617104709.GA5166@linux.fritz.box \
    --to=kwolf@redhat.com \
    --cc=mreitz@redhat.com \
    --cc=nirsof@gmail.com \
    --cc=nsoffer@redhat.com \
    --cc=pl@kamp.de \
    --cc=qemu-block@nongnu.org \
    --cc=qemu-devel@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).