From: Kevin Wolf <kwolf@redhat.com>
To: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Cc: "fam@euphon.net" <fam@euphon.net>,
	Denis Lunev <den@virtuozzo.com>,
	"qemu-block@nongnu.org" <qemu-block@nongnu.org>,
	"qemu-devel@nongnu.org" <qemu-devel@nongnu.org>,
	"mreitz@redhat.com" <mreitz@redhat.com>,
	"stefanha@redhat.com" <stefanha@redhat.com>
Subject: Re: [PATCH 0/4] fix & merge block_status_above and is_allocated_above
Date: Wed, 20 Nov 2019 12:44:08 +0100
Message-ID: <20191120114408.GA5779@linux.fritz.box>
In-Reply-To: <7f8574a2-8fd2-9724-a197-d67d3c69d538@virtuozzo.com>

On 20.11.2019 at 11:20, Vladimir Sementsov-Ogievskiy wrote:
> 16.11.2019 19:34, Vladimir Sementsov-Ogievskiy wrote:
> > Hi all!
> > 
> > I wanted to understand what the real difference is between bdrv_block_status_above
> > and bdrv_is_allocated_above. IMHO bdrv_is_allocated_above should work through
> > bdrv_block_status_above.
> > 
> > And I found the problem: bdrv_is_allocated_above considers space after EOF as
> > UNALLOCATED for intermediate nodes.
> >
> > UNALLOCATED is not about allocation at the fs level, but about whether we should go to
> > the backing file or not. So this seems incorrect to me: in case of a short backing file,
> > we read zeroes after EOF instead of going further down the backing chain.
> > 
> > This leads to the following effect:
> > 
> > ./qemu-img create -f qcow2 base.qcow2 2M
> > ./qemu-io -c "write -P 0x1 0 2M" base.qcow2
> > 
> > ./qemu-img create -f qcow2 -b base.qcow2 mid.qcow2 1M
> > ./qemu-img create -f qcow2 -b mid.qcow2 top.qcow2 2M
> > 
> > Region 1M..2M is shadowed by short middle image, so guest sees zeroes:
> > ./qemu-io -c "read -P 0 1M 1M" top.qcow2
> > read 1048576/1048576 bytes at offset 1048576
> > 1 MiB, 1 ops; 00.00 sec (22.795 GiB/sec and 23341.5807 ops/sec)
> > 
> > But after commit the guest-visible state has changed, which seems wrong to me:
> > ./qemu-img commit top.qcow2 -b mid.qcow2
> > 
> > ./qemu-io -c "read -P 0 1M 1M" mid.qcow2
> > Pattern verification failed at offset 1048576, 1048576 bytes
> > read 1048576/1048576 bytes at offset 1048576
> > 1 MiB, 1 ops; 00.00 sec (4.981 GiB/sec and 5100.4794 ops/sec)
> > 
> > ./qemu-io -c "read -P 1 1M 1M" mid.qcow2
> > read 1048576/1048576 bytes at offset 1048576
> > 1 MiB, 1 ops; 00.00 sec (3.365 GiB/sec and 3446.1606 ops/sec)
> > 
> > 
> > I don't know whether this is a real bug, as I don't know whether we support a backing
> > file larger than its parent. Still, I'm not sure that this behavior of
> > bdrv_is_allocated_above doesn't lead to other problems.
> > 
> > =====
> > 
> > Hmm, bdrv_block_allocated_above behaves strangely too:
> >
> > with want_zero=true, it may report unallocated zeroes because of short backing files, which
> > are actually "allocated" from the POV of the backing chain. But I see this may influence only
> > qemu-img compare, and I don't see how it could trigger a bug.
> >
> > with want_zero=false, it may make no progress because of a short backing file. Moreover, it may
> > report EOF in the middle!! But want_zero=false is used only in bdrv_is_allocated, which considers
> > only the top layer, so it seems OK.
> > 
> > =====
> > 
> > So, I propose this series, though I'm still not sure whether there is a real bug.
> > 
> > Vladimir Sementsov-Ogievskiy (4):
> >    block/io: fix bdrv_co_block_status_above
> >    block/io: bdrv_common_block_status_above: support include_base
> >    block/io: bdrv_common_block_status_above: support bs == base
> >    block/io: fix bdrv_is_allocated_above
> > 
> >   block/io.c                 | 104 ++++++++++++++++++-------------------
> >   tests/qemu-iotests/154.out |   4 +-
> >   2 files changed, 53 insertions(+), 55 deletions(-)
> > 
> 
> 
> Interesting that the problem illustrated here is not fixed by the series; it actually
> relates to the fact that mirror does truncation with PREALLOC_MODE_OFF, which leads
> to unallocated qcow2 clusters, which I think should be fixed too.

Yes, this is what I posted yesterday. (With a suggested quick fix, but
it turns out it was not quite correct, see below.)

> To illustrate the problem fixed by the series, we should commit to base:
> 
> # ./qemu-img commit top.qcow2 -b base.qcow2
> Image committed.
> # ./qemu-io -c "read -P 0 1M 1M" base.qcow2
> Pattern verification failed at offset 1048576, 1048576 bytes
> read 1048576/1048576 bytes at offset 1048576
> 1 MiB, 1 ops; 00.00 sec (5.366 GiB/sec and 5494.4149 ops/sec)

Ok, I'll try that later.

> Hmm, but how do we fix the problem with truncate? I think truncate must
> not make the underlying backing file available for read. The discard
> operation doesn't do that.
>
> So, actually, with PREALLOC_MODE_OFF we must allocate L2 tables and mark
> the new clusters as ZERO?

Yes, we need to write zeroes to the new area if the backing file covers
it. We need to do this not only in mirror/commit/bdrv_commit(), but in
fact for all truncate operations: Berto mentioned on IRC yesterday that
you can get into the same situation with 'block_resize' monitor
commands.
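
To make the effect concrete, here is a minimal toy model in C. It is only
a sketch and does not use QEMU's data structures: ToyImage, toy_read(),
toy_grow() and toy_demo() are hypothetical names, and each "cluster"
carries a single byte. It mirrors the base/mid example from the cover
letter: growing a short image without zeroing exposes the backing data,
while zero-filling the overlap keeps the new area reading as zeroes.

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

#define MAX_CLUSTERS 8

/* Toy image: one byte of payload per cluster. Not QEMU's data structures. */
typedef struct ToyImage {
    int64_t len;                      /* virtual size in clusters */
    uint8_t allocated[MAX_CLUSTERS];  /* 1 = this layer provides the cluster */
    uint8_t data[MAX_CLUSTERS];
    struct ToyImage *backing;         /* NULL for the base of the chain */
} ToyImage;

/* Read one cluster as seen through the backing chain. */
static uint8_t toy_read(const ToyImage *img, int64_t cluster)
{
    if (img == NULL || cluster >= img->len) {
        return 0;                     /* past EOF of this layer reads as zero */
    }
    if (img->allocated[cluster]) {
        return img->data[cluster];
    }
    return toy_read(img->backing, cluster);
}

/*
 * Grow an image. With zero_overlap set, the clusters of the new area that
 * the backing file covers are marked allocated and zeroed first.
 */
static void toy_grow(ToyImage *img, int64_t new_len, int zero_overlap)
{
    int64_t backing_len = img->backing ? img->backing->len : 0;
    int64_t end = new_len < backing_len ? new_len : backing_len;

    assert(new_len >= img->len && new_len <= MAX_CLUSTERS);

    for (int64_t c = img->len; zero_overlap && c < end; c++) {
        img->allocated[c] = 1;
        img->data[c] = 0;
    }
    img->len = new_len;
}

/* Grow mid from 1 to 2 clusters over a 2-cluster base filled with 0x01. */
static uint8_t toy_demo(int zero_overlap)
{
    ToyImage base = { .len = 2, .allocated = {1, 1}, .data = {1, 1} };
    ToyImage mid = { .len = 1, .backing = &base };

    toy_grow(&mid, 2, zero_overlap);
    return toy_read(&mid, 1);
}
```

In this model toy_demo(0) returns 0x01 (the base content shows through),
while toy_demo(1) returns 0x00.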

So I tried to fix this yesterday, and I thought that I had a fix, when I
noticed that bdrv_co_do_zero_pwritev() takes a 32-bit bytes parameter.
So I'll still need to fix this. Other than that, I suppose the following
fix should work (but is probably a bit too invasive for -rc3).

Kevin

diff --git a/block/io.c b/block/io.c
index f75777f5ea..4118bf0118 100644
--- a/block/io.c
+++ b/block/io.c
@@ -3382,6 +3382,32 @@ int coroutine_fn bdrv_co_truncate(BdrvChild *child, int64_t offset, bool exact,
         goto out;
     }

+    /*
+     * If the image has a backing file that is large enough that it would
+     * provide data for the new area, we cannot leave it unallocated because
+     * then the backing file content would become visible. Instead, zero-fill
+     * the area where backing file and new area overlap.
+     */
+    if (new_bytes && bs->backing && prealloc == PREALLOC_MODE_OFF) {
+        int64_t backing_len;
+
+        backing_len = bdrv_getlength(backing_bs(bs));
+        if (backing_len < 0) {
+            ret = backing_len;
+            goto out;
+        }
+
+        if (backing_len > old_size) {
+            /* FIXME bytes parameter is 32 bits */
+            ret = bdrv_co_do_zero_pwritev(child, old_size,
+                                          MIN(new_bytes, backing_len - old_size),
+                                          BDRV_REQ_ZERO_WRITE | BDRV_REQ_MAY_UNMAP, &req);
+            if (ret < 0) {
+                goto out;
+            }
+        }
+    }
+
     ret = refresh_total_sectors(bs, offset >> BDRV_SECTOR_BITS);
     if (ret < 0) {
         error_setg_errno(errp, -ret, "Could not refresh total sector count");
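
One possible way around the 32-bit bytes parameter noted in the FIXME
would be to split the zeroed range into several requests. The following
standalone C sketch only illustrates the arithmetic and is not part of
the patch: overlap_bytes(), next_chunk() and count_zero_requests() are
hypothetical helpers, CHUNK_ALIGN is an assumed alignment rather than a
QEMU constant, and int is assumed to be 32 bits wide.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Assumed alignment for chunk boundaries; not a QEMU constant. */
#define CHUNK_ALIGN 512

/* Bytes after old_size that the backing file would otherwise expose. */
static int64_t overlap_bytes(int64_t old_size, int64_t new_size,
                             int64_t backing_len)
{
    int64_t new_bytes = new_size - old_size;
    int64_t covered = backing_len - old_size;

    if (new_bytes <= 0 || covered <= 0) {
        return 0;
    }
    return new_bytes < covered ? new_bytes : covered;
}

/* Size of the next request: at most INT_MAX rounded down to CHUNK_ALIGN. */
static int next_chunk(int64_t remaining)
{
    const int max_chunk = INT_MAX - INT_MAX % CHUNK_ALIGN;

    return remaining < max_chunk ? (int)remaining : max_chunk;
}

/*
 * Count the requests needed to zero the overlap. A real caller would
 * issue one zero write per chunk instead of just counting.
 */
static int64_t count_zero_requests(int64_t old_size, int64_t new_size,
                                   int64_t backing_len)
{
    int64_t remaining = overlap_bytes(old_size, new_size, backing_len);
    int64_t requests = 0;

    while (remaining > 0) {
        remaining -= next_chunk(remaining);
        requests++;
    }
    return requests;
}
```

For the 1M to 2M resize from the example, a single request is enough;
only an overlap larger than the aligned INT_MAX limit gets split.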


