All of lore.kernel.org
 help / color / mirror / Atom feed
From: Kevin Wolf <kwolf@redhat.com>
To: Eric Blake <eblake@redhat.com>
Cc: vsementsov@virtuozzo.com, berto@igalia.com,
	qemu-devel@nongnu.org, qemu-block@nongnu.org, mreitz@redhat.com
Subject: Re: [PATCH v6 10/10] qcow2: Forward ZERO_WRITE flag for full preallocation
Date: Thu, 23 Apr 2020 18:04:10 +0200	[thread overview]
Message-ID: <20200423160410.GF23654@linux.fritz.box> (raw)
In-Reply-To: <fca340c2-5bee-b287-e43e-28dc679ea372@redhat.com>

Am 23.04.2020 um 17:36 hat Eric Blake geschrieben:
> On 4/23/20 10:01 AM, Kevin Wolf wrote:
> > The BDRV_REQ_ZERO_WRITE is currently implemented in a way that first the
> > image is possibly preallocated and then the zero flag is added to all
> > clusters. This means that a copy-on-write operation may be needed when
> > writing to these clusters, despite having used preallocation, negating
> > one of the major benefits of preallocation.
> > 
> > Instead, try to forward the BDRV_REQ_ZERO_WRITE to the protocol driver,
> > and if the protocol driver can ensure that the new area reads as zeros,
> > we can skip setting the zero flag in the qcow2 layer.
> 
> Hmm.  When we get block status, it is very easy to tell that something reads
> as zero when the qcow2 file has the zero bit set, but when the qcow2 file
> does not have the zero bit set, we have to then query the format layer
> whether it reads as zeros (which is expensive enough for some format layers
> that we no longer report things as reading as zero). I'm worried that
> optimizing this case could penalize later block status.

That's just how preallocation works. If you don't want that, you need
preallocation=off.

> We already can tell the difference between a cluster that has the zero bit
> set but is not preallocated, vs. has the zero bit set and is preallocated.
> Are we really forcing a copy-on-write to a cluster that is marked zero but
> preallocated?  Is the problem that we don't have a way to distinguish
> between 'reads as zeroes, allocated, but we don't know state of format
> layer' and 'reads as zeroes, allocated, and we know format layer reads as
> zeroes'?

Basically, yes. If we had this, we could have a type of cluster where
writing to it still involves a metadata update (to clear the zero flag),
but never copy-on-write, even for partial writes.

I'm not sure if this would cover a very relevant case, though.

> > 
> > Unfortunately, the same approach doesn't work for metadata
> > preallocation, so we'll still set the zero flag there.
> > 
> > Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> > ---
> >   block/qcow2.c              | 22 +++++++++++++++++++---
> >   tests/qemu-iotests/274.out |  4 ++--
> >   2 files changed, 21 insertions(+), 5 deletions(-)
> > 
> > diff --git a/block/qcow2.c b/block/qcow2.c
> > index ad621fe404..b28e588942 100644
> > --- a/block/qcow2.c
> > +++ b/block/qcow2.c
> > @@ -4170,9 +4170,25 @@ static int coroutine_fn qcow2_co_truncate(BlockDriverState *bs, int64_t offset,
> >           /* Allocate the data area */
> >           new_file_size = allocation_start +
> >                           nb_new_data_clusters * s->cluster_size;
> > -        /* Image file grows, so @exact does not matter */
> > -        ret = bdrv_co_truncate(bs->file, new_file_size, false, prealloc, 0,
> > -                               errp);
> > +        /*
> > +         * Image file grows, so @exact does not matter.
> > +         *
> > +         * If we need to zero out the new area, try first whether the protocol
> > +         * driver can already take care of this.
> > +         */
> > +        if (flags & BDRV_REQ_ZERO_WRITE) {
> > +            ret = bdrv_co_truncate(bs->file, new_file_size, false, prealloc,
> > +                                   BDRV_REQ_ZERO_WRITE, errp);
> > +            if (ret >= 0) {
> > +                flags &= ~BDRV_REQ_ZERO_WRITE;
> > +            }
> 
> Hmm - just noticing things: how does this series interplay with the existing
> bdrv_has_zero_init_truncate?  Should this series automatically handle
> BDRV_REQ_ZERO_WRITE on a bdrv_co_truncate(PREALLOC_NONE) request for all
> drivers that report true, even if that driver does not advertise support for
> the BDRV_REQ_ZERO_WRITE flag?

Hmm... It feels risky to me.

> > +        } else {
> > +            ret = -1;
> > +        }
> 
> Here, ret == -1 does not imply whether errp is set - but annoyingly, errp
> CAN be set if bdrv_co_truncate() failed.
> 
> > +        if (ret < 0) {
> > +            ret = bdrv_co_truncate(bs->file, new_file_size, false, prealloc, 0,
> > +                                   errp);
> 
> And here, you are passing a possibly-set errp back to bdrv_co_truncate().
> That is a bug that can abort.  You need to pass NULL to the first
> bdrv_co_truncate() call or else clear errp prior to trying a fallback to
> this second bdrv_co_truncate() call.

Yes, you're right. If nothing else comes up, I'll fix this while
applying.

Kevin



  reply	other threads:[~2020-04-23 16:12 UTC|newest]

Thread overview: 27+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-04-23 15:01 [PATCH v6 00/10] block: Fix resize (extending) of short overlays Kevin Wolf
2020-04-23 15:01 ` [PATCH v6 01/10] block: Add flags to BlockDriver.bdrv_co_truncate() Kevin Wolf
2020-04-23 15:01 ` [PATCH v6 02/10] block: Add flags to bdrv(_co)_truncate() Kevin Wolf
2020-04-23 15:01 ` [PATCH v6 03/10] block-backend: Add flags to blk_truncate() Kevin Wolf
2020-04-23 15:01 ` [PATCH v6 04/10] qcow2: Support BDRV_REQ_ZERO_WRITE for truncate Kevin Wolf
2020-04-23 15:18   ` Eric Blake
2020-04-23 15:48     ` Kevin Wolf
2020-04-24  6:16   ` Vladimir Sementsov-Ogievskiy
2020-04-24 12:17     ` Kevin Wolf
2020-04-24 14:16       ` Eric Blake
2020-04-24 14:27       ` Vladimir Sementsov-Ogievskiy
2020-04-23 15:01 ` [PATCH v6 05/10] raw-format: " Kevin Wolf
2020-04-24  6:21   ` Vladimir Sementsov-Ogievskiy
2020-04-23 15:01 ` [PATCH v6 06/10] file-posix: " Kevin Wolf
2020-04-23 15:01 ` [PATCH v6 07/10] block: truncate: Don't make backing file data visible Kevin Wolf
2020-04-23 15:21   ` Eric Blake
2020-04-23 17:59   ` Max Reitz
2020-04-24  6:45   ` Vladimir Sementsov-Ogievskiy
2020-04-23 15:01 ` [PATCH v6 08/10] iotests: Filter testfiles out in filter_img_info() Kevin Wolf
2020-04-24  6:51   ` Vladimir Sementsov-Ogievskiy
2020-04-23 15:01 ` [PATCH v6 09/10] iotests: Test committing to short backing file Kevin Wolf
2020-04-24  8:53   ` Vladimir Sementsov-Ogievskiy
2020-04-23 15:01 ` [PATCH v6 10/10] qcow2: Forward ZERO_WRITE flag for full preallocation Kevin Wolf
2020-04-23 15:36   ` Eric Blake
2020-04-23 16:04     ` Kevin Wolf [this message]
2020-04-23 16:15       ` Eric Blake
2020-04-23 18:05   ` Max Reitz

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200423160410.GF23654@linux.fritz.box \
    --to=kwolf@redhat.com \
    --cc=berto@igalia.com \
    --cc=eblake@redhat.com \
    --cc=mreitz@redhat.com \
    --cc=qemu-block@nongnu.org \
    --cc=qemu-devel@nongnu.org \
    --cc=vsementsov@virtuozzo.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.