From: Kevin Wolf <kwolf@redhat.com>
To: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Cc: "fam@euphon.net" <fam@euphon.net>,
Denis Lunev <den@virtuozzo.com>,
"qemu-block@nongnu.org" <qemu-block@nongnu.org>,
"qemu-devel@nongnu.org" <qemu-devel@nongnu.org>,
"mreitz@redhat.com" <mreitz@redhat.com>,
"stefanha@redhat.com" <stefanha@redhat.com>
Subject: Re: [PATCH 0/4] fix & merge block_status_above and is_allocated_above
Date: Wed, 20 Nov 2019 12:44:08 +0100
Message-ID: <20191120114408.GA5779@linux.fritz.box>
In-Reply-To: <7f8574a2-8fd2-9724-a197-d67d3c69d538@virtuozzo.com>

On 20.11.2019 at 11:20, Vladimir Sementsov-Ogievskiy wrote:
> 16.11.2019 19:34, Vladimir Sementsov-Ogievskiy wrote:
> > Hi all!
> >
> > I wanted to understand, what is the real difference between bdrv_block_status_above
> > and bdrv_is_allocated_above, IMHO bdrv_is_allocated_above should work through
> > bdrv_block_status_above..
> >
> > And I found the problem: bdrv_is_allocated_above considers space after EOF as
> > UNALLOCATED for intermediate nodes..
> >
> > UNALLOCATED is not about allocation at fs level, but about should we go to backing or
> > not.. And it seems incorrect for me, as in case of short backing file, we'll read
> > zeroes after EOF, instead of going further by backing chain.
> >
> > This leads to the following effect:
> >
> > ./qemu-img create -f qcow2 base.qcow2 2M
> > ./qemu-io -c "write -P 0x1 0 2M" base.qcow2
> >
> > ./qemu-img create -f qcow2 -b base.qcow2 mid.qcow2 1M
> > ./qemu-img create -f qcow2 -b mid.qcow2 top.qcow2 2M
> >
> > Region 1M..2M is shadowed by short middle image, so guest sees zeroes:
> > ./qemu-io -c "read -P 0 1M 1M" top.qcow2
> > read 1048576/1048576 bytes at offset 1048576
> > 1 MiB, 1 ops; 00.00 sec (22.795 GiB/sec and 23341.5807 ops/sec)
> >
> > But after commit guest visible state is changed, which seems wrong for me:
> > ./qemu-img commit top.qcow2 -b mid.qcow2
> >
> > ./qemu-io -c "read -P 0 1M 1M" mid.qcow2
> > Pattern verification failed at offset 1048576, 1048576 bytes
> > read 1048576/1048576 bytes at offset 1048576
> > 1 MiB, 1 ops; 00.00 sec (4.981 GiB/sec and 5100.4794 ops/sec)
> >
> > ./qemu-io -c "read -P 1 1M 1M" mid.qcow2
> > read 1048576/1048576 bytes at offset 1048576
> > 1 MiB, 1 ops; 00.00 sec (3.365 GiB/sec and 3446.1606 ops/sec)
> >
> >
> > I don't know, is it a real bug, as I don't know, do we support backing file larger than
> > its parent. Still, I'm not sure that this behavior of bdrv_is_allocated_above doesn't lead
> > to other problems.
> >
> > =====
> >
> > Hmm, bdrv_block_allocated_above behaves strange too:
> >
> > with want_zero=true, it may report unallocated zeroes because of short backing files, which
> > are actually "allocated" in POV of backing chains. But I see this may influence only
> > qemu-img compare, and I don't see can it trigger some bug..
> >
> > with want_zero=false, it may do no progress because of short backing file. Moreover it may
> > report EOF in the middle!! But want_zero=false used only in bdrv_is_allocated, which considers
> > only the top layer, so it seems OK.
> >
> > =====
> >
> > So, I propose these series, still I'm not sure is there a real bug.
> >
> > Vladimir Sementsov-Ogievskiy (4):
> > block/io: fix bdrv_co_block_status_above
> > block/io: bdrv_common_block_status_above: support include_base
> > block/io: bdrv_common_block_status_above: support bs == base
> > block/io: fix bdrv_is_allocated_above
> >
> > block/io.c | 104 ++++++++++++++++++-------------------
> > tests/qemu-iotests/154.out | 4 +-
> > 2 files changed, 53 insertions(+), 55 deletions(-)
> >
>
>
> Interesting that the problem illustrated here is not fixed by the series; it actually
> relates to the fact that mirror does truncation with PREALLOC_MODE_OFF, which leads
> to unallocated qcow2 clusters, which I think should be fixed too.

Yes, this is what I posted yesterday. (With a suggested quick fix, but
it turns out it was not quite correct, see below.)

> To illustrate the problem fixed by the series, we should commit to base:
>
> # ./qemu-img commit top.qcow2 -b base.qcow2
> Image committed.
> # ./qemu-io -c "read -P 0 1M 1M" base.qcow2
> Pattern verification failed at offset 1048576, 1048576 bytes
> read 1048576/1048576 bytes at offset 1048576
> 1 MiB, 1 ops; 00.00 sec (5.366 GiB/sec and 5494.4149 ops/sec)

Ok, I'll try that later.

> Hmm, but how to fix the problem about truncate? I think truncate must
> not make underlying backing available for read.. Discard operation
> doesn't do it.
>
> So, actually on PREALLOC_MODE_OFF we must allocate L2 tables and mark
> new clusters as ZERO?

Yes, we need to write zeroes to the new area if the backing file covers
it. We need to do this not only in mirror/commit/bdrv_commit(), but in
fact for all truncate operations: Berto mentioned on IRC yesterday that
you can get into the same situation with 'block_resize' monitor
commands.
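
To make the "area that the backing file covers" concrete, here is a minimal sketch of the range arithmetic. `zero_fill_bytes` is a hypothetical helper used only for illustration, not a QEMU function; it assumes all sizes are byte counts and that zeroing starts at the old end of the image.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of QEMU): number of bytes, starting at
 * old_size, that must read as zeroes after growing an image from old_size
 * to new_size on top of a backing file of backing_len bytes.
 */
static int64_t zero_fill_bytes(int64_t old_size, int64_t new_size,
                               int64_t backing_len)
{
    assert(old_size >= 0 && new_size >= 0 && backing_len >= 0);

    /* Shrinking, or a backing file that ends before the new area begins:
     * nothing from the backing file can become visible. */
    if (new_size <= old_size || backing_len <= old_size) {
        return 0;
    }

    int64_t new_bytes = new_size - old_size;   /* size of the new area */
    int64_t overlap = backing_len - old_size;  /* backing data past old EOF */

    /* Only the part of the new area that the backing file covers matters. */
    return new_bytes < overlap ? new_bytes : overlap;
}
```

With the sizes from the example in this thread (a 1M image grown to 2M over a 2M backing file), the helper returns 1M, i.e. the whole new area has to be zeroed.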

I tried to fix this yesterday and thought I had a fix, until I noticed
that bdrv_co_do_zero_pwritev() takes a 32-bit bytes parameter, so I
still need to fix that. Other than that, I suppose the following fix
should work (but is probably a bit too invasive for -rc3).

Kevin

diff --git a/block/io.c b/block/io.c
index f75777f5ea..4118bf0118 100644
--- a/block/io.c
+++ b/block/io.c
@@ -3382,6 +3382,32 @@ int coroutine_fn bdrv_co_truncate(BdrvChild *child, int64_t offset, bool exact,
         goto out;
     }

+    /*
+     * If the image has a backing file that is large enough that it would
+     * provide data for the new area, we cannot leave it unallocated because
+     * then the backing file content would become visible. Instead, zero-fill
+     * the area where backing file and new area overlap.
+     */
+    if (new_bytes && bs->backing && prealloc == PREALLOC_MODE_OFF) {
+        int64_t backing_len;
+
+        backing_len = bdrv_getlength(backing_bs(bs));
+        if (backing_len < 0) {
+            ret = backing_len;
+            goto out;
+        }
+
+        if (backing_len > old_size) {
+            /* FIXME bytes parameter is 32 bits */
+            ret = bdrv_co_do_zero_pwritev(child, old_size,
+                    MIN(new_bytes, backing_len - old_size),
+                    BDRV_REQ_ZERO_WRITE | BDRV_REQ_MAY_UNMAP, &req);
+            if (ret < 0) {
+                goto out;
+            }
+        }
+    }
+
     ret = refresh_total_sectors(bs, offset >> BDRV_SECTOR_BITS);
     if (ret < 0) {
         error_setg_errno(errp, -ret, "Could not refresh total sector count");
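
The FIXME in the hunk above concerns the length argument: the overlap between the new area and the backing file can be larger than a 32-bit byte count can hold. One possible workaround, shown here only as a sketch under assumptions and not as the fix proposed in this thread, is to split the range into bounded chunks. All names below (`zero_write_fn`, `zero_range_chunked`, `zero_stats`, `count_zeroes`) are hypothetical; the callback merely stands in for a zero-write primitive whose length parameter is an `int`.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a zero-write primitive with an int length. */
typedef int (*zero_write_fn)(int64_t offset, int bytes, void *opaque);

/*
 * Zero [offset, offset + bytes) by calling fn on pieces of at most
 * max_chunk bytes. Returns 0 on success or the first negative error
 * returned by fn.
 */
static int zero_range_chunked(int64_t offset, int64_t bytes, int max_chunk,
                              zero_write_fn fn, void *opaque)
{
    assert(offset >= 0 && bytes >= 0 && max_chunk > 0);

    while (bytes > 0) {
        /* The cast is safe: on this branch bytes <= max_chunk. */
        int chunk = bytes > max_chunk ? max_chunk : (int)bytes;
        int ret = fn(offset, chunk, opaque);
        if (ret < 0) {
            return ret;
        }
        offset += chunk;
        bytes -= chunk;
    }
    return 0;
}

/* Example callback: records the number of calls and total bytes seen. */
struct zero_stats {
    int calls;
    int64_t total;
};

static int count_zeroes(int64_t offset, int bytes, void *opaque)
{
    struct zero_stats *st = opaque;

    (void)offset;
    st->calls++;
    st->total += bytes;
    return 0;
}
```

In a real block driver the chunk size would presumably have to be a multiple of the request alignment (for qcow2, the cluster size) so that no chunk boundary splits a cluster; the sketch leaves that choice to the caller.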