* [Qemu-devel] [PATCH] block: Fix copy-on-read crash with partial final cluster
@ 2018-07-06 16:45 Kevin Wolf
2018-07-06 21:20 ` Eric Blake
0 siblings, 1 reply; 4+ messages in thread
From: Kevin Wolf @ 2018-07-06 16:45 UTC (permalink / raw)
To: qemu-block; +Cc: kwolf, qemu-devel
If the virtual disk size isn't aligned to full clusters,
bdrv_co_do_copy_on_readv() may get pnum == 0 before having the full
cluster completed, which will let it run into an assertion failure:
qemu-io: block/io.c:1203: bdrv_co_do_copy_on_readv: Assertion `skip_bytes < pnum' failed.
Check for EOF, assert that we read at least as much as the read request
originally wanted to have (which is true at EOF because otherwise
bdrv_check_byte_request() would already have returned an error) and
return success early even though we couldn't copy the full cluster.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
block/io.c | 6 ++++++
tests/qemu-iotests/197 | 9 +++++++++
tests/qemu-iotests/197.out | 8 ++++++++
3 files changed, 23 insertions(+)
diff --git a/block/io.c b/block/io.c
index 038449f81f..4c0831149c 100644
--- a/block/io.c
+++ b/block/io.c
@@ -1200,6 +1200,12 @@ static int coroutine_fn bdrv_co_do_copy_on_readv(BdrvChild *child,
pnum = MIN(cluster_bytes, max_transfer);
}
+ /* Stop at EOF if the image ends in the middle of the cluster */
+ if (ret == 0 && pnum == 0) {
+ assert(progress >= bytes);
+ break;
+ }
+
assert(skip_bytes < pnum);
if (ret <= 0) {
diff --git a/tests/qemu-iotests/197 b/tests/qemu-iotests/197
index 3ae4975eec..0369aa5cff 100755
--- a/tests/qemu-iotests/197
+++ b/tests/qemu-iotests/197
@@ -109,6 +109,15 @@ $QEMU_IO -f qcow2 -c map "$TEST_WRAP"
_check_test_img
$QEMU_IMG compare -f $IMGFMT -F qcow2 "$TEST_IMG" "$TEST_WRAP"
+echo
+echo '=== Partial final cluster ==='
+echo
+
+_make_test_img 1024
+$QEMU_IO -f $IMGFMT -C -c 'read 0 1024' "$TEST_IMG" | _filter_qemu_io
+$QEMU_IO -f $IMGFMT -c map "$TEST_IMG"
+_check_test_img
+
# success, all done
echo '*** done'
status=0
diff --git a/tests/qemu-iotests/197.out b/tests/qemu-iotests/197.out
index 52b4137d7b..8febda5dea 100644
--- a/tests/qemu-iotests/197.out
+++ b/tests/qemu-iotests/197.out
@@ -23,4 +23,12 @@ can't open device TEST_DIR/t.wrap.qcow2: Can't use copy-on-read on read-only dev
1023.938 MiB (0x3fff0000) bytes not allocated at offset 3 GiB (0xc0010000)
No errors were found on the image.
Images are identical.
+
+=== Partial final cluster ===
+
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1024
+read 1024/1024 bytes at offset 0
+1 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+1 KiB (0x400) bytes allocated at offset 0 bytes (0x0)
+No errors were found on the image.
*** done
--
2.13.6
^ permalink raw reply related [flat|nested] 4+ messages in thread
* Re: [Qemu-devel] [PATCH] block: Fix copy-on-read crash with partial final cluster
2018-07-06 16:45 [Qemu-devel] [PATCH] block: Fix copy-on-read crash with partial final cluster Kevin Wolf
@ 2018-07-06 21:20 ` Eric Blake
2018-07-09 9:32 ` Kevin Wolf
0 siblings, 1 reply; 4+ messages in thread
From: Eric Blake @ 2018-07-06 21:20 UTC (permalink / raw)
To: Kevin Wolf, qemu-block; +Cc: qemu-devel
On 07/06/2018 11:45 AM, Kevin Wolf wrote:
> If the virtual disk size isn't aligned to full clusters,
> bdrv_co_do_copy_on_readv() may get pnum == 0 before having the full
> cluster completed, which will let it run into an assertion failure:
>
> qemu-io: block/io.c:1203: bdrv_co_do_copy_on_readv: Assertion `skip_bytes < pnum' failed.
>
> Check for EOF, assert that we read at least as much as the read request
> originally wanted to have (which is true at EOF because otherwise
> bdrv_check_byte_request() would already have returned an error) and
> return success early even though we couldn't copy the full cluster.
>
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> ---
> block/io.c | 6 ++++++
> tests/qemu-iotests/197 | 9 +++++++++
> tests/qemu-iotests/197.out | 8 ++++++++
> 3 files changed, 23 insertions(+)
>
> diff --git a/block/io.c b/block/io.c
> index 038449f81f..4c0831149c 100644
> --- a/block/io.c
> +++ b/block/io.c
> @@ -1200,6 +1200,12 @@ static int coroutine_fn bdrv_co_do_copy_on_readv(BdrvChild *child,
> pnum = MIN(cluster_bytes, max_transfer);
> }
>
> + /* Stop at EOF if the image ends in the middle of the cluster */
> + if (ret == 0 && pnum == 0) {
Is it any smarter to check for 'ret & BDRV_BLOCK_EOF' instead of pnum == 0?
But as far as I can tell, right now those two conditions are synonymous.
>
> +echo
> +echo '=== Partial final cluster ==='
> +echo
> +
> +_make_test_img 1024
> +$QEMU_IO -f $IMGFMT -C -c 'read 0 1024' "$TEST_IMG" | _filter_qemu_io
Here, $TEST_IMG has no backing file, and does not have the final cluster
allocated; all we have to do is properly read all zeroes. Is it worth
also explicitly testing reading of allocated data under COR, and/or the
case of one image with another backing image (with same or differing
partial cluster sizes), where COR actually has to write the partial
cluster? However, the logic in the code appears to cover all of those,
whether or not the testsuite does as well.
Reviewed-by: Eric Blake <eblake@redhat.com>
--
Eric Blake, Principal Software Engineer
Red Hat, Inc. +1-919-301-3266
Virtualization: qemu.org | libvirt.org
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [Qemu-devel] [PATCH] block: Fix copy-on-read crash with partial final cluster
2018-07-06 21:20 ` Eric Blake
@ 2018-07-09 9:32 ` Kevin Wolf
2018-07-20 20:19 ` Eric Blake
0 siblings, 1 reply; 4+ messages in thread
From: Kevin Wolf @ 2018-07-09 9:32 UTC (permalink / raw)
To: Eric Blake; +Cc: qemu-block, qemu-devel
Am 06.07.2018 um 23:20 hat Eric Blake geschrieben:
> On 07/06/2018 11:45 AM, Kevin Wolf wrote:
> > If the virtual disk size isn't aligned to full clusters,
> > bdrv_co_do_copy_on_readv() may get pnum == 0 before having the full
> > cluster completed, which will let it run into an assertion failure:
> >
> > qemu-io: block/io.c:1203: bdrv_co_do_copy_on_readv: Assertion `skip_bytes < pnum' failed.
> >
> > Check for EOF, assert that we read at least as much as the read request
> > originally wanted to have (which is true at EOF because otherwise
> > bdrv_check_byte_request() would already have returned an error) and
> > return success early even though we couldn't copy the full cluster.
> >
> > Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> > ---
> > block/io.c | 6 ++++++
> > tests/qemu-iotests/197 | 9 +++++++++
> > tests/qemu-iotests/197.out | 8 ++++++++
> > 3 files changed, 23 insertions(+)
> >
> > diff --git a/block/io.c b/block/io.c
> > index 038449f81f..4c0831149c 100644
> > --- a/block/io.c
> > +++ b/block/io.c
> > @@ -1200,6 +1200,12 @@ static int coroutine_fn bdrv_co_do_copy_on_readv(BdrvChild *child,
> > pnum = MIN(cluster_bytes, max_transfer);
> > }
> > + /* Stop at EOF if the image ends in the middle of the cluster */
> > + if (ret == 0 && pnum == 0) {
>
> Is it any smarter to check for 'ret & BDRV_BLOCK_EOF' instead of pnum == 0?
>
> But as far as I can tell, right now those two conditions are synonymous.
You removed the context from your quote, which contains the assertion
that the commit message mentions:
assert(skip_bytes < pnum);
So I think it makes more sense to check pnum, because it's the short
pnum and not the BDRV_BLOCK_EOF flag that would make the following code
fail.
> > +echo
> > +echo '=== Partial final cluster ==='
> > +echo
> > +
> > +_make_test_img 1024
> > +$QEMU_IO -f $IMGFMT -C -c 'read 0 1024' "$TEST_IMG" | _filter_qemu_io
>
> Here, $TEST_IMG has no backing file, and does not have the final
> cluster allocated; all we have to do is properly read all zeroes.
And write them back, we test that it's allocated afterwards.
> Is it worth also explicitly testing reading of allocated data under
> COR
I could add a second read of the same area, which would not only test
reading of allocated data, but also test whether the allocating COR
wrote the correct data (all zeros).
Do you think that would be a worthwhile addition?
> and/or the case of one image with another backing image (with
> same or differing partial cluster sizes), where COR actually has to
> write the partial cluster? However, the logic in the code appears to
> cover all of those, whether or not the testsuite does as well.
Having a backing file is a different code path for non-zero data, but I
think we already test normal write/write_zeroes extensively. For zero
data, it doesn't make a difference whether it comes from the backing
file or from not having a backing file (as far as the COR logic is
concerned, anyway).
I didn't want to use a backing file because the test is 'generic', but
it already uses qcow2 in other cases, so if we really think this is
important to have, I guess I can have another case with qcow2 over
$IMGFMT and non-zero data.
Kevin
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [Qemu-devel] [PATCH] block: Fix copy-on-read crash with partial final cluster
2018-07-09 9:32 ` Kevin Wolf
@ 2018-07-20 20:19 ` Eric Blake
0 siblings, 0 replies; 4+ messages in thread
From: Eric Blake @ 2018-07-20 20:19 UTC (permalink / raw)
To: Kevin Wolf; +Cc: qemu-block, qemu-devel
On 07/09/2018 04:32 AM, Kevin Wolf wrote:
> Am 06.07.2018 um 23:20 hat Eric Blake geschrieben:
>> On 07/06/2018 11:45 AM, Kevin Wolf wrote:
>>> If the virtual disk size isn't aligned to full clusters,
>>> bdrv_co_do_copy_on_readv() may get pnum == 0 before having the full
>>> cluster completed, which will let it run into an assertion failure:
>>>
>>> qemu-io: block/io.c:1203: bdrv_co_do_copy_on_readv: Assertion `skip_bytes < pnum' failed.
>>>
>>> Check for EOF, assert that we read at least as much as the read request
>>> originally wanted to have (which is true at EOF because otherwise
>>> bdrv_check_byte_request() would already have returned an error) and
>>> return success early even though we couldn't copy the full cluster.
>>>
>>> +echo
>>> +echo '=== Partial final cluster ==='
>>> +echo
>>> +
>>> +_make_test_img 1024
>>> +$QEMU_IO -f $IMGFMT -C -c 'read 0 1024' "$TEST_IMG" | _filter_qemu_io
>>
>> Here, $TEST_IMG has no backing file, and does not have the final
>> cluster allocated; all we have to do is properly read all zeroes.
>
> And write them back, we test that it's allocated afterwards.
>
>> Is it worth also explicitly testing reading of allocated data under
>> COR
>
> I could add a second read of the same area, which would not only test
> reading of allocated data, but also test whether the allocating COR
> wrote the correct data (all zeros).
>
> Do you think that would be a worthwhile addition?
It can't hurt (we've already proven that our iotests coverage is
incomplete, and anything we can do to enhance it improves chances of it
catching unintentional regressions).
>
>> and/or the case of one image with another backing image (with
>> same or differing partial cluster sizes), where COR actually has to
>> write the partial cluster? However, the logic in the code appears to
>> cover all of those, whether or not the testsuite does as well.
>
> Having a backing file is a different code path for non-zero data, but I
> think we already test normal write/write_zeroes extensively. For zero
> data, it doesn't make a difference whether it comes from the backing
> file or from not having a backing file (as far as the COR logic is
> concerned, anyway).
>
> I didn't want to use a backing file because the test is 'generic', but
> it already uses qcow2 in other cases, so if we really think this is
> important to have, I guess I can have another case with qcow2 over
> $IMGFMT and non-zero data.
At this point, I'm not volunteering to write such test enhancements, and
I don't think it's a show-stopper if 3.0 uses the test as currently in git.
--
Eric Blake, Principal Software Engineer
Red Hat, Inc. +1-919-301-3266
Virtualization: qemu.org | libvirt.org
^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2018-07-20 20:19 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2018-07-06 16:45 [Qemu-devel] [PATCH] block: Fix copy-on-read crash with partial final cluster Kevin Wolf
2018-07-06 21:20 ` Eric Blake
2018-07-09 9:32 ` Kevin Wolf
2018-07-20 20:19 ` Eric Blake
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).