* [Qemu-devel] [PATCH v3 0/6] block: Avoid copy-on-read assertions
@ 2017-10-05 19:02 Eric Blake
2017-10-05 19:02 ` [Qemu-devel] [PATCH v3 1/6] qemu-io: Add -C for opening with copy-on-read Eric Blake
` (6 more replies)
0 siblings, 7 replies; 8+ messages in thread
From: Eric Blake @ 2017-10-05 19:02 UTC (permalink / raw)
To: qemu-devel; +Cc: kwolf, jcody, jsnow, stefanha, qemu-block
During my quest to switch block status to be byte-based, John
forced me to evaluate whether we have a situation during
copy-on-read where we could exceed BDRV_REQUEST_MAX_BYTES [1].
Sure enough, we have a number of pre-existing bugs in the
copy-on-read code. Fix those, along with adding a test.
Available as a tag at:
git fetch git://repo.or.cz/qemu/ericb.git nbd-copy-on-read-v3
Since v2 (available at [2]):
- add a new patch to fix an iotests wart
- tweak patch 5 (now 6) to skip rather than fail on limited memory [patchew]
- tweak patch 2 condition for legibility [Stefan]
- add R-b
[1] https://lists.gnu.org/archive/html/qemu-devel/2017-09/msg07286.html
[2] https://lists.gnu.org/archive/html/qemu-devel/2017-10/msg00524.html
001/6:[----] [--] 'qemu-io: Add -C for opening with copy-on-read'
002/6:[0008] [FC] 'block: Uniform handling of 0-length bdrv_get_block_status()'
003/6:[down] 'iotests: Restore stty settings on completion'
004/6:[----] [--] 'block: Add blkdebug hook for copy-on-read'
005/6:[----] [--] 'block: Perform copy-on-read in loop'
006/6:[0013] [FC] 'iotests: Add test 197 for covering copy-on-read'
Eric Blake (6):
qemu-io: Add -C for opening with copy-on-read
block: Uniform handling of 0-length bdrv_get_block_status()
iotests: Restore stty settings on completion
block: Add blkdebug hook for copy-on-read
block: Perform copy-on-read in loop
iotests: Add test 197 for covering copy-on-read
qapi/block-core.json | 5 +-
block/io.c | 123 +++++++++++++++++++++++++++------------
qemu-io.c | 15 ++++-
tests/qemu-iotests/common.filter | 1 +
tests/qemu-iotests/197 | 109 ++++++++++++++++++++++++++++++++++
tests/qemu-iotests/197.out | 26 +++++++++
tests/qemu-iotests/check | 10 ++++
tests/qemu-iotests/group | 1 +
8 files changed, 249 insertions(+), 41 deletions(-)
create mode 100755 tests/qemu-iotests/197
create mode 100644 tests/qemu-iotests/197.out
--
2.13.6
^ permalink raw reply [flat|nested] 8+ messages in thread
* [Qemu-devel] [PATCH v3 1/6] qemu-io: Add -C for opening with copy-on-read
2017-10-05 19:02 [Qemu-devel] [PATCH v3 0/6] block: Avoid copy-on-read assertions Eric Blake
@ 2017-10-05 19:02 ` Eric Blake
2017-10-05 19:02 ` [Qemu-devel] [PATCH v3 2/6] block: Uniform handling of 0-length bdrv_get_block_status() Eric Blake
` (5 subsequent siblings)
6 siblings, 0 replies; 8+ messages in thread
From: Eric Blake @ 2017-10-05 19:02 UTC (permalink / raw)
To: qemu-devel; +Cc: kwolf, jcody, jsnow, stefanha, qemu-block, Max Reitz
Make it easier to enable copy-on-read during iotests, by
exposing a new bool option to main and open.
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
---
qemu-io.c | 15 ++++++++++++---
1 file changed, 12 insertions(+), 3 deletions(-)
diff --git a/qemu-io.c b/qemu-io.c
index 265445ad89..c70bde3eb1 100644
--- a/qemu-io.c
+++ b/qemu-io.c
@@ -102,6 +102,7 @@ static void open_help(void)
" Opens a file for subsequent use by all of the other qemu-io commands.\n"
" -r, -- open file read-only\n"
" -s, -- use snapshot file\n"
+" -C, -- use copy-on-read\n"
" -n, -- disable host cache, short for -t none\n"
" -U, -- force shared permissions\n"
" -k, -- use kernel AIO implementation (on Linux only)\n"
@@ -120,7 +121,7 @@ static const cmdinfo_t open_cmd = {
.argmin = 1,
.argmax = -1,
.flags = CMD_NOFILE_OK,
- .args = "[-rsnkU] [-t cache] [-d discard] [-o options] [path]",
+ .args = "[-rsCnkU] [-t cache] [-d discard] [-o options] [path]",
.oneline = "open the file specified by path",
.help = open_help,
};
@@ -145,7 +146,7 @@ static int open_f(BlockBackend *blk, int argc, char **argv)
QDict *opts;
bool force_share = false;
- while ((c = getopt(argc, argv, "snro:kt:d:U")) != -1) {
+ while ((c = getopt(argc, argv, "snCro:kt:d:U")) != -1) {
switch (c) {
case 's':
flags |= BDRV_O_SNAPSHOT;
@@ -154,6 +155,9 @@ static int open_f(BlockBackend *blk, int argc, char **argv)
flags |= BDRV_O_NOCACHE;
writethrough = false;
break;
+ case 'C':
+ flags |= BDRV_O_COPY_ON_READ;
+ break;
case 'r':
readonly = 1;
break;
@@ -251,6 +255,7 @@ static void usage(const char *name)
" -r, --read-only export read-only\n"
" -s, --snapshot use snapshot file\n"
" -n, --nocache disable host cache, short for -t none\n"
+" -C, --copy-on-read enable copy-on-read\n"
" -m, --misalign misalign allocations for O_DIRECT\n"
" -k, --native-aio use kernel AIO implementation (on Linux only)\n"
" -t, --cache=MODE use the given cache mode for the image\n"
@@ -439,7 +444,7 @@ static QemuOptsList file_opts = {
int main(int argc, char **argv)
{
int readonly = 0;
- const char *sopt = "hVc:d:f:rsnmkt:T:U";
+ const char *sopt = "hVc:d:f:rsnCmkt:T:U";
const struct option lopt[] = {
{ "help", no_argument, NULL, 'h' },
{ "version", no_argument, NULL, 'V' },
@@ -448,6 +453,7 @@ int main(int argc, char **argv)
{ "read-only", no_argument, NULL, 'r' },
{ "snapshot", no_argument, NULL, 's' },
{ "nocache", no_argument, NULL, 'n' },
+ { "copy-on-read", no_argument, NULL, 'C' },
{ "misalign", no_argument, NULL, 'm' },
{ "native-aio", no_argument, NULL, 'k' },
{ "discard", required_argument, NULL, 'd' },
@@ -492,6 +498,9 @@ int main(int argc, char **argv)
flags |= BDRV_O_NOCACHE;
writethrough = false;
break;
+ case 'C':
+ flags |= BDRV_O_COPY_ON_READ;
+ break;
case 'd':
if (bdrv_parse_discard_flags(optarg, &flags) < 0) {
error_report("Invalid discard option: %s", optarg);
--
2.13.6
^ permalink raw reply related [flat|nested] 8+ messages in thread
* [Qemu-devel] [PATCH v3 2/6] block: Uniform handling of 0-length bdrv_get_block_status()
2017-10-05 19:02 [Qemu-devel] [PATCH v3 0/6] block: Avoid copy-on-read assertions Eric Blake
2017-10-05 19:02 ` [Qemu-devel] [PATCH v3 1/6] qemu-io: Add -C for opening with copy-on-read Eric Blake
@ 2017-10-05 19:02 ` Eric Blake
2017-10-05 19:02 ` [Qemu-devel] [PATCH v3 3/6] iotests: Restore stty settings on completion Eric Blake
` (4 subsequent siblings)
6 siblings, 0 replies; 8+ messages in thread
From: Eric Blake @ 2017-10-05 19:02 UTC (permalink / raw)
To: qemu-devel
Cc: kwolf, jcody, jsnow, stefanha, qemu-block, Fam Zheng, Max Reitz
Handle a 0-length block status request up front, with a uniform
return value claiming the area is not allocated.
Most callers don't pass a length of 0 to bdrv_get_block_status()
and friends; but it definitely happens with a 0-length read when
copy-on-read is enabled. While we could audit all callers to
ensure that they never make a 0-length request, and then assert
that fact, it was just as easy to fix things to always report
success (as long as the callers are careful to not go into an
infinite loop). However, we had inconsistent behavior on whether
the status is reported as allocated or defers to the backing
layer, depending on what callbacks the driver implements, and
possibly wasting quite a few CPU cycles to get to that answer.
Consistently reporting unallocated up front doesn't really hurt
anything, and makes it easier both for callers (0-length requests
now have well-defined behavior) and for drivers (drivers don't
have to deal with 0-length requests).
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
---
v3: split into two conditionals [Stefan], simple enough to keep R-b
v2: new patch
---
block/io.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/block/io.c b/block/io.c
index e0f904583f..94f74703b7 100644
--- a/block/io.c
+++ b/block/io.c
@@ -1777,6 +1777,10 @@ static int64_t coroutine_fn bdrv_co_get_block_status(BlockDriverState *bs,
*pnum = 0;
return BDRV_BLOCK_EOF;
}
+ if (!nb_sectors) {
+ *pnum = 0;
+ return 0;
+ }
n = total_sectors - sector_num;
if (n < nb_sectors) {
--
2.13.6
^ permalink raw reply related [flat|nested] 8+ messages in thread
* [Qemu-devel] [PATCH v3 3/6] iotests: Restore stty settings on completion
2017-10-05 19:02 [Qemu-devel] [PATCH v3 0/6] block: Avoid copy-on-read assertions Eric Blake
2017-10-05 19:02 ` [Qemu-devel] [PATCH v3 1/6] qemu-io: Add -C for opening with copy-on-read Eric Blake
2017-10-05 19:02 ` [Qemu-devel] [PATCH v3 2/6] block: Uniform handling of 0-length bdrv_get_block_status() Eric Blake
@ 2017-10-05 19:02 ` Eric Blake
2017-10-05 19:02 ` [Qemu-devel] [PATCH v3 4/6] block: Add blkdebug hook for copy-on-read Eric Blake
` (3 subsequent siblings)
6 siblings, 0 replies; 8+ messages in thread
From: Eric Blake @ 2017-10-05 19:02 UTC (permalink / raw)
To: qemu-devel; +Cc: kwolf, jcody, jsnow, stefanha, qemu-block, Max Reitz
Executing qemu with a terminal as stdin will temporarily alter stty
settings on that terminal (for example, disabling echo), because of
how we run both the monitor and any multiplexing with guest input.
Normally, qemu restores the original settings on exit; but if an
iotest triggers qemu to abort in the middle, we can be left with
the altered terminal setup. This can make life very annoying when
debugging an iotest failure (not everyone remembers the trick of
blind-typing 'stty sane' without echo, and some people prefer
terminal settings that are slightly different than the defaults
picked by 'stty sane').
It is possible to avoid qemu corrupting the terminal by not passing
a terminal to qemu's stdin in the first place (as in, use
'./check ... </dev/null'), but that's extra typing to have to
remember. But running 'exec </dev/null' in the harness seems like
it might be too heavy of a hammer. So I instead went the the
solution of saving and restoring the stty settings, only when the
harness detects that it is run interactively.
I tested this patch by forcing an allocation failure (I can't
guarantee that this particular limit will work on all setups, but
it shows the idea):
$ (ulimit -S -v 500000; ./check -qcow2 1)
Signed-off-by: Eric Blake <eblake@redhat.com>
---
v3: new patch
---
tests/qemu-iotests/check | 10 ++++++++++
1 file changed, 10 insertions(+)
diff --git a/tests/qemu-iotests/check b/tests/qemu-iotests/check
index 176cb8e937..e6b6ff7a04 100755
--- a/tests/qemu-iotests/check
+++ b/tests/qemu-iotests/check
@@ -134,6 +134,13 @@ export VALGRIND_QEMU=
export IMGKEYSECRET=
export IMGOPTSSYNTAX=false
+# Save current tty settings, since an aborting qemu call may leave things
+# screwed up
+STTY_RESTORE=
+if test -t 0; then
+ STTY_RESTORE=$(stty -g)
+fi
+
for r
do
@@ -664,6 +671,9 @@ END { if (NR > 0) {
needwrap=false
fi
+ if test -n "$STTY_RESTORE"; then
+ stty $STTY_RESTORE
+ fi
rm -f "${TEST_DIR}"/*.out "${TEST_DIR}"/*.err "${TEST_DIR}"/*.time
rm -f "${TEST_DIR}"/check.pid "${TEST_DIR}"/check.sts
rm -f $tmp.*
--
2.13.6
^ permalink raw reply related [flat|nested] 8+ messages in thread
* [Qemu-devel] [PATCH v3 4/6] block: Add blkdebug hook for copy-on-read
2017-10-05 19:02 [Qemu-devel] [PATCH v3 0/6] block: Avoid copy-on-read assertions Eric Blake
` (2 preceding siblings ...)
2017-10-05 19:02 ` [Qemu-devel] [PATCH v3 3/6] iotests: Restore stty settings on completion Eric Blake
@ 2017-10-05 19:02 ` Eric Blake
2017-10-05 19:02 ` [Qemu-devel] [PATCH v3 5/6] block: Perform copy-on-read in loop Eric Blake
` (2 subsequent siblings)
6 siblings, 0 replies; 8+ messages in thread
From: Eric Blake @ 2017-10-05 19:02 UTC (permalink / raw)
To: qemu-devel
Cc: kwolf, jcody, jsnow, stefanha, qemu-block, Fam Zheng, Max Reitz,
Markus Armbruster
Make it possible to inject errors on writes performed during a
read operation due to copy-on-read semantics.
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
---
qapi/block-core.json | 5 ++++-
block/io.c | 1 +
2 files changed, 5 insertions(+), 1 deletion(-)
diff --git a/qapi/block-core.json b/qapi/block-core.json
index 750bb0c77c..ab96e348e6 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -2538,6 +2538,8 @@
#
# @l1_shrink_free_l2_clusters: discard the l2 tables. (since 2.11)
#
+# @cor_write: a write due to copy-on-read (since 2.11)
+#
# Since: 2.9
##
{ 'enum': 'BlkdebugEvent', 'prefix': 'BLKDBG',
@@ -2555,7 +2557,8 @@
'flush_to_disk', 'pwritev_rmw_head', 'pwritev_rmw_after_head',
'pwritev_rmw_tail', 'pwritev_rmw_after_tail', 'pwritev',
'pwritev_zero', 'pwritev_done', 'empty_image_prepare',
- 'l1_shrink_write_table', 'l1_shrink_free_l2_clusters' ] }
+ 'l1_shrink_write_table', 'l1_shrink_free_l2_clusters',
+ 'cor_write'] }
##
# @BlkdebugInjectErrorOptions:
diff --git a/block/io.c b/block/io.c
index 94f74703b7..a5598ed869 100644
--- a/block/io.c
+++ b/block/io.c
@@ -983,6 +983,7 @@ static int coroutine_fn bdrv_co_do_copy_on_readv(BdrvChild *child,
goto err;
}
+ bdrv_debug_event(bs, BLKDBG_COR_WRITE);
if (drv->bdrv_co_pwrite_zeroes &&
buffer_is_zero(bounce_buffer, iov.iov_len)) {
/* FIXME: Should we (perhaps conditionally) be setting
--
2.13.6
^ permalink raw reply related [flat|nested] 8+ messages in thread
* [Qemu-devel] [PATCH v3 5/6] block: Perform copy-on-read in loop
2017-10-05 19:02 [Qemu-devel] [PATCH v3 0/6] block: Avoid copy-on-read assertions Eric Blake
` (3 preceding siblings ...)
2017-10-05 19:02 ` [Qemu-devel] [PATCH v3 4/6] block: Add blkdebug hook for copy-on-read Eric Blake
@ 2017-10-05 19:02 ` Eric Blake
2017-10-05 19:02 ` [Qemu-devel] [PATCH v3 6/6] iotests: Add test 197 for covering copy-on-read Eric Blake
2017-10-06 12:17 ` [Qemu-devel] [PATCH v3 0/6] block: Avoid copy-on-read assertions Kevin Wolf
6 siblings, 0 replies; 8+ messages in thread
From: Eric Blake @ 2017-10-05 19:02 UTC (permalink / raw)
To: qemu-devel
Cc: kwolf, jcody, jsnow, stefanha, qemu-block, qemu-stable, Fam Zheng,
Max Reitz
Improve our braindead copy-on-read implementation. Pre-patch,
we have multiple issues:
- we create a bounce buffer and perform a write for the entire
request, even if the active image already has 99% of the
clusters occupied, and really only needs to copy-on-read the
remaining 1% of the clusters
- our bounce buffer was as large as the read request, and can
needlessly exhaust our memory by using double the memory of
the request size (the original request plus our bounce buffer),
rather than a capped maximum overhead beyond the original
- if a driver has a max_transfer limit, we are bypassing the
normal code in bdrv_aligned_preadv() that fragments to that
limit, and instead attempt to read the entire buffer from the
driver in one go, which some drivers may assert on
- a client can request a large request of nearly 2G such that
rounding the request out to cluster boundaries results in a
byte count larger than 2G. While this cannot exceed 32 bits,
it DOES have some follow-on problems:
-- the call to bdrv_driver_pread() can assert for exceeding
BDRV_REQUEST_MAX_BYTES, if the driver is old and lacks
.bdrv_co_preadv
-- if the buffer is all zeroes, the subsequent call to
bdrv_co_do_pwrite_zeroes is a no-op due to a negative size,
which means we did not actually copy on read
Fix all of these issues by breaking up the action into a loop,
where each iteration is capped to sane limits. Also, querying
the allocation status allows us to optimize: when data is
already present in the active layer, we don't need to bounce.
Note that the code has a telling comment that copy-on-read
should probably be a filter driver rather than a bolt-on hack
in io.c; but that remains a task for another day.
CC: qemu-stable@nongnu.org
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
---
v2: avoid uninit ret on 0-length op [patchew, Kevin]
---
block/io.c | 120 +++++++++++++++++++++++++++++++++++++++++--------------------
1 file changed, 82 insertions(+), 38 deletions(-)
diff --git a/block/io.c b/block/io.c
index a5598ed869..8e419070b5 100644
--- a/block/io.c
+++ b/block/io.c
@@ -34,6 +34,9 @@
#define NOT_DONE 0x7fffffff /* used while emulated sync operation in progress */
+/* Maximum bounce buffer for copy-on-read and write zeroes, in bytes */
+#define MAX_BOUNCE_BUFFER (32768 << BDRV_SECTOR_BITS)
+
static int coroutine_fn bdrv_co_do_pwrite_zeroes(BlockDriverState *bs,
int64_t offset, int bytes, BdrvRequestFlags flags);
@@ -945,11 +948,14 @@ static int coroutine_fn bdrv_co_do_copy_on_readv(BdrvChild *child,
BlockDriver *drv = bs->drv;
struct iovec iov;
- QEMUIOVector bounce_qiov;
+ QEMUIOVector local_qiov;
int64_t cluster_offset;
unsigned int cluster_bytes;
size_t skip_bytes;
int ret;
+ int max_transfer = MIN_NON_ZERO(bs->bl.max_transfer,
+ BDRV_REQUEST_MAX_BYTES);
+ unsigned int progress = 0;
/* FIXME We cannot require callers to have write permissions when all they
* are doing is a read request. If we did things right, write permissions
@@ -961,53 +967,95 @@ static int coroutine_fn bdrv_co_do_copy_on_readv(BdrvChild *child,
// assert(child->perm & (BLK_PERM_WRITE_UNCHANGED | BLK_PERM_WRITE));
/* Cover entire cluster so no additional backing file I/O is required when
- * allocating cluster in the image file.
+ * allocating cluster in the image file. Note that this value may exceed
+ * BDRV_REQUEST_MAX_BYTES (even when the original read did not), which
+ * is one reason we loop rather than doing it all at once.
*/
bdrv_round_to_clusters(bs, offset, bytes, &cluster_offset, &cluster_bytes);
+ skip_bytes = offset - cluster_offset;
trace_bdrv_co_do_copy_on_readv(bs, offset, bytes,
cluster_offset, cluster_bytes);
- iov.iov_len = cluster_bytes;
- iov.iov_base = bounce_buffer = qemu_try_blockalign(bs, iov.iov_len);
+ bounce_buffer = qemu_try_blockalign(bs,
+ MIN(MIN(max_transfer, cluster_bytes),
+ MAX_BOUNCE_BUFFER));
if (bounce_buffer == NULL) {
ret = -ENOMEM;
goto err;
}
- qemu_iovec_init_external(&bounce_qiov, &iov, 1);
+ while (cluster_bytes) {
+ int64_t pnum;
- ret = bdrv_driver_preadv(bs, cluster_offset, cluster_bytes,
- &bounce_qiov, 0);
- if (ret < 0) {
- goto err;
- }
+ ret = bdrv_is_allocated(bs, cluster_offset,
+ MIN(cluster_bytes, max_transfer), &pnum);
+ if (ret < 0) {
+ /* Safe to treat errors in querying allocation as if
+ * unallocated; we'll probably fail again soon on the
+ * read, but at least that will set a decent errno.
+ */
+ pnum = MIN(cluster_bytes, max_transfer);
+ }
- bdrv_debug_event(bs, BLKDBG_COR_WRITE);
- if (drv->bdrv_co_pwrite_zeroes &&
- buffer_is_zero(bounce_buffer, iov.iov_len)) {
- /* FIXME: Should we (perhaps conditionally) be setting
- * BDRV_REQ_MAY_UNMAP, if it will allow for a sparser copy
- * that still correctly reads as zero? */
- ret = bdrv_co_do_pwrite_zeroes(bs, cluster_offset, cluster_bytes, 0);
- } else {
- /* This does not change the data on the disk, it is not necessary
- * to flush even in cache=writethrough mode.
- */
- ret = bdrv_driver_pwritev(bs, cluster_offset, cluster_bytes,
- &bounce_qiov, 0);
- }
+ assert(skip_bytes < pnum);
- if (ret < 0) {
- /* It might be okay to ignore write errors for guest requests. If this
- * is a deliberate copy-on-read then we don't want to ignore the error.
- * Simply report it in all cases.
- */
- goto err;
- }
+ if (ret <= 0) {
+ /* Must copy-on-read; use the bounce buffer */
+ iov.iov_base = bounce_buffer;
+ iov.iov_len = pnum = MIN(pnum, MAX_BOUNCE_BUFFER);
+ qemu_iovec_init_external(&local_qiov, &iov, 1);
+
+ ret = bdrv_driver_preadv(bs, cluster_offset, pnum,
+ &local_qiov, 0);
+ if (ret < 0) {
+ goto err;
+ }
- skip_bytes = offset - cluster_offset;
- qemu_iovec_from_buf(qiov, 0, bounce_buffer + skip_bytes, bytes);
+ bdrv_debug_event(bs, BLKDBG_COR_WRITE);
+ if (drv->bdrv_co_pwrite_zeroes &&
+ buffer_is_zero(bounce_buffer, pnum)) {
+ /* FIXME: Should we (perhaps conditionally) be setting
+ * BDRV_REQ_MAY_UNMAP, if it will allow for a sparser copy
+ * that still correctly reads as zero? */
+ ret = bdrv_co_do_pwrite_zeroes(bs, cluster_offset, pnum, 0);
+ } else {
+ /* This does not change the data on the disk, it is not
+ * necessary to flush even in cache=writethrough mode.
+ */
+ ret = bdrv_driver_pwritev(bs, cluster_offset, pnum,
+ &local_qiov, 0);
+ }
+
+ if (ret < 0) {
+ /* It might be okay to ignore write errors for guest
+ * requests. If this is a deliberate copy-on-read
+ * then we don't want to ignore the error. Simply
+ * report it in all cases.
+ */
+ goto err;
+ }
+
+ qemu_iovec_from_buf(qiov, progress, bounce_buffer + skip_bytes,
+ pnum - skip_bytes);
+ } else {
+ /* Read directly into the destination */
+ qemu_iovec_init(&local_qiov, qiov->niov);
+ qemu_iovec_concat(&local_qiov, qiov, progress, pnum - skip_bytes);
+ ret = bdrv_driver_preadv(bs, offset + progress, local_qiov.size,
+ &local_qiov, 0);
+ qemu_iovec_destroy(&local_qiov);
+ if (ret < 0) {
+ goto err;
+ }
+ }
+
+ cluster_offset += pnum;
+ cluster_bytes -= pnum;
+ progress += pnum - skip_bytes;
+ skip_bytes = 0;
+ }
+ ret = 0;
err:
qemu_vfree(bounce_buffer);
@@ -1213,9 +1261,6 @@ int coroutine_fn bdrv_co_readv(BdrvChild *child, int64_t sector_num,
return bdrv_co_do_readv(child, sector_num, nb_sectors, qiov, 0);
}
-/* Maximum buffer for write zeroes fallback, in bytes */
-#define MAX_WRITE_ZEROES_BOUNCE_BUFFER (32768 << BDRV_SECTOR_BITS)
-
static int coroutine_fn bdrv_co_do_pwrite_zeroes(BlockDriverState *bs,
int64_t offset, int bytes, BdrvRequestFlags flags)
{
@@ -1230,8 +1275,7 @@ static int coroutine_fn bdrv_co_do_pwrite_zeroes(BlockDriverState *bs,
int max_write_zeroes = MIN_NON_ZERO(bs->bl.max_pwrite_zeroes, INT_MAX);
int alignment = MAX(bs->bl.pwrite_zeroes_alignment,
bs->bl.request_alignment);
- int max_transfer = MIN_NON_ZERO(bs->bl.max_transfer,
- MAX_WRITE_ZEROES_BOUNCE_BUFFER);
+ int max_transfer = MIN_NON_ZERO(bs->bl.max_transfer, MAX_BOUNCE_BUFFER);
assert(alignment % bs->bl.request_alignment == 0);
head = offset % alignment;
--
2.13.6
^ permalink raw reply related [flat|nested] 8+ messages in thread
* [Qemu-devel] [PATCH v3 6/6] iotests: Add test 197 for covering copy-on-read
2017-10-05 19:02 [Qemu-devel] [PATCH v3 0/6] block: Avoid copy-on-read assertions Eric Blake
` (4 preceding siblings ...)
2017-10-05 19:02 ` [Qemu-devel] [PATCH v3 5/6] block: Perform copy-on-read in loop Eric Blake
@ 2017-10-05 19:02 ` Eric Blake
2017-10-06 12:17 ` [Qemu-devel] [PATCH v3 0/6] block: Avoid copy-on-read assertions Kevin Wolf
6 siblings, 0 replies; 8+ messages in thread
From: Eric Blake @ 2017-10-05 19:02 UTC (permalink / raw)
To: qemu-devel; +Cc: kwolf, jcody, jsnow, stefanha, qemu-block, Max Reitz
Add a test for qcow2 copy-on-read behavior, including exposure
for the just-fixed bugs.
The copy-on-read behavior is always to a qcow2 image, but the
test is careful to allow running with most image protocol/format
combos as the backing file being copied from (luks being the
exception, as it is harder to pass the right secret to all the
right places). In fact, for './check nbd', this appears to be
the first time we've had a qcow2 image wrapping NBD, requiring
an additional line in _filter_img_create to match the similar
line in _filter_img_info.
Invoking blkdebug to prove we don't write too much took some
effort to get working; and it requires that $TEST_WRAP (based
on $TEST_DIR) not be subject to word splitting. We may decide
later to have the entire iotests suite use relative rather than
absolute names, to avoid problems inherited by the absolute
name of $PWD or $TEST_DIR, at which point the sanity check in
this commit could be simplified.
This test requires at least 2G of consecutive memory to succeed;
as such, it is prone to spurious failures, particularly on
32-bit machines under load. This situation is detected and
triggers an early exit to skip the test, rather than a failure.
To manually provoke this setup on a beefier machine, I used:
$ (ulimit -S -v 1000000; ./check -qcow2 197)
Signed-off-by: Eric Blake <eblake@redhat.com>
---
v3: add out-of-memory detection [patchew]
v2: test 0-length query [Kevin], sanity check TEST_DIR [Jeff]
I only tested with -raw, -qcow2, -qed, and -nbd. I won't be
surprised if the test fails in some other setup...
---
tests/qemu-iotests/common.filter | 1 +
tests/qemu-iotests/197 | 109 +++++++++++++++++++++++++++++++++++++++
tests/qemu-iotests/197.out | 26 ++++++++++
tests/qemu-iotests/group | 1 +
4 files changed, 137 insertions(+)
create mode 100755 tests/qemu-iotests/197
create mode 100644 tests/qemu-iotests/197.out
diff --git a/tests/qemu-iotests/common.filter b/tests/qemu-iotests/common.filter
index 9d5442ecd9..227b37e941 100644
--- a/tests/qemu-iotests/common.filter
+++ b/tests/qemu-iotests/common.filter
@@ -111,6 +111,7 @@ _filter_img_create()
sed -e "s#$IMGPROTO:$TEST_DIR#TEST_DIR#g" \
-e "s#$TEST_DIR#TEST_DIR#g" \
-e "s#$IMGFMT#IMGFMT#g" \
+ -e 's#nbd:127.0.0.1:10810#TEST_DIR/t.IMGFMT#g' \
-e "s# encryption=off##g" \
-e "s# cluster_size=[0-9]\\+##g" \
-e "s# table_size=[0-9]\\+##g" \
diff --git a/tests/qemu-iotests/197 b/tests/qemu-iotests/197
new file mode 100755
index 0000000000..887eb4f496
--- /dev/null
+++ b/tests/qemu-iotests/197
@@ -0,0 +1,109 @@
+#!/bin/bash
+#
+# Test case for copy-on-read into qcow2
+#
+# Copyright (C) 2017 Red Hat, Inc.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program. If not, see <http://www.gnu.org/licenses/>.
+#
+
+# creator
+owner=eblake@redhat.com
+
+seq="$(basename $0)"
+echo "QA output created by $seq"
+
+here="$PWD"
+status=1 # failure is the default!
+
+# get standard environment, filters and checks
+. ./common.rc
+. ./common.filter
+
+TEST_WRAP="$TEST_DIR/t.wrap.qcow2"
+BLKDBG_CONF="$TEST_DIR/blkdebug.conf"
+
+# Sanity check: our use of blkdebug fails if $TEST_DIR contains spaces
+# or other problems
+case "$TEST_DIR" in
+ *[^-_a-zA-Z0-9/]*)
+ _notrun "Suspicious TEST_DIR='$TEST_DIR', cowardly refusing to run" ;;
+esac
+
+_cleanup()
+{
+ _cleanup_test_img
+ rm -f "$BLKDBG_CONF"
+}
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+# Test is supported for any backing file; but we force qcow2 for our wrapper.
+_supported_fmt generic
+_supported_proto generic
+_supported_os Linux
+# LUKS support may be possible, but it complicates things.
+_unsupported_fmt luks
+
+echo
+echo '=== Copy-on-read ==='
+echo
+
+# Prep the images
+_make_test_img 4G
+$QEMU_IO -c "write -P 55 3G 1k" "$TEST_IMG" | _filter_qemu_io
+IMGPROTO=file IMGFMT=qcow2 IMGOPTS= TEST_IMG_FILE="$TEST_WRAP" \
+ _make_test_img -F "$IMGFMT" -b "$TEST_IMG" | _filter_img_create
+$QEMU_IO -f qcow2 -c "write -z -u 1M 64k" "$TEST_WRAP" | _filter_qemu_io
+
+# Ensure that a read of two clusters, but where one is already allocated,
+# does not re-write the allocated cluster
+cat > "$BLKDBG_CONF" <<EOF
+[inject-error]
+event = "cor_write"
+sector = "2048"
+EOF
+$QEMU_IO -c "open -C \
+ -o driver=blkdebug,config=$BLKDBG_CONF,image.driver=qcow2 $TEST_WRAP" \
+ -c "read -P 0 1M 128k" | _filter_qemu_io
+
+# Read the areas we want copied. A zero-length read should still be a
+# no-op. The next read is under 2G, but aligned so that rounding to
+# clusters copies more than 2G of zeroes. The final read will pick up
+# the non-zero data in the same cluster. Since a 2G read may exhaust
+# memory on some machines (particularly 32-bit), we skip the test if
+# that fails due to memory pressure.
+$QEMU_IO -f qcow2 -C -c "read 0 0" "$TEST_WRAP" | _filter_qemu_io
+output=$($QEMU_IO -f qcow2 -C -c "read -P 0 1k $((2*1024*1024*1024 - 512))" \
+ "$TEST_WRAP" 2>&1 | _filter_qemu_io)
+case $output in
+ *allocate*)
+ _notrun "Insufficent memory to run test" ;;
+ *) printf '%s\n' "$output" ;;
+esac
+$QEMU_IO -f qcow2 -C -c "read -P 0 $((3*1024*1024*1024 + 1024)) 1k" \
+ "$TEST_WRAP" | _filter_qemu_io
+
+# Copy-on-read is incompatible with read-only
+$QEMU_IO -f qcow2 -C -r "$TEST_WRAP" 2>&1 | _filter_testdir
+
+# Break the backing chain, and show that images are identical, and that
+# we properly copied over explicit zeros.
+$QEMU_IMG rebase -u -b "" -f qcow2 "$TEST_WRAP"
+$QEMU_IO -f qcow2 -c map "$TEST_WRAP"
+_check_test_img
+$QEMU_IMG compare -f $IMGFMT -F qcow2 "$TEST_IMG" "$TEST_WRAP"
+
+# success, all done
+echo '*** done'
+status=0
diff --git a/tests/qemu-iotests/197.out b/tests/qemu-iotests/197.out
new file mode 100644
index 0000000000..52b4137d7b
--- /dev/null
+++ b/tests/qemu-iotests/197.out
@@ -0,0 +1,26 @@
+QA output created by 197
+
+=== Copy-on-read ===
+
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=4294967296
+wrote 1024/1024 bytes at offset 3221225472
+1 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+Formatting 'TEST_DIR/t.wrap.IMGFMT', fmt=IMGFMT size=4294967296 backing_file=TEST_DIR/t.IMGFMT backing_fmt=IMGFMT
+wrote 65536/65536 bytes at offset 1048576
+64 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 131072/131072 bytes at offset 1048576
+128 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 0/0 bytes at offset 0
+0 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 2147483136/2147483136 bytes at offset 1024
+2 GiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 1024/1024 bytes at offset 3221226496
+1 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+can't open device TEST_DIR/t.wrap.qcow2: Can't use copy-on-read on read-only device
+2 GiB (0x80010000) bytes allocated at offset 0 bytes (0x0)
+1023.938 MiB (0x3fff0000) bytes not allocated at offset 2 GiB (0x80010000)
+64 KiB (0x10000) bytes allocated at offset 3 GiB (0xc0000000)
+1023.938 MiB (0x3fff0000) bytes not allocated at offset 3 GiB (0xc0010000)
+No errors were found on the image.
+Images are identical.
+*** done
diff --git a/tests/qemu-iotests/group b/tests/qemu-iotests/group
index 595f4fd416..83da427c0a 100644
--- a/tests/qemu-iotests/group
+++ b/tests/qemu-iotests/group
@@ -192,3 +192,4 @@
192 rw auto quick
194 rw auto migration quick
195 rw auto quick
+197 rw auto quick
--
2.13.6
^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [Qemu-devel] [PATCH v3 0/6] block: Avoid copy-on-read assertions
2017-10-05 19:02 [Qemu-devel] [PATCH v3 0/6] block: Avoid copy-on-read assertions Eric Blake
` (5 preceding siblings ...)
2017-10-05 19:02 ` [Qemu-devel] [PATCH v3 6/6] iotests: Add test 197 for covering copy-on-read Eric Blake
@ 2017-10-06 12:17 ` Kevin Wolf
6 siblings, 0 replies; 8+ messages in thread
From: Kevin Wolf @ 2017-10-06 12:17 UTC (permalink / raw)
To: Eric Blake; +Cc: qemu-devel, jcody, jsnow, stefanha, qemu-block
Am 05.10.2017 um 21:02 hat Eric Blake geschrieben:
> During my quest to switch block status to be byte-based, John
> forced me to evaluate whether we have a situation during
> copy-on-read where we could exceed BDRV_REQUEST_MAX_BYTES [1].
> Sure enough, we have a number of pre-existing bugs in the
> copy-on-read code. Fix those, along with adding a test.
>
> Available as a tag at:
> git fetch git://repo.or.cz/qemu/ericb.git nbd-copy-on-read-v3
>
> Since v2 (available at [2]):
> - add a new patch to fix an iotests wart
> - tweak patch 5 (now 6) to skip rather than fail on limited memory [patchew]
> - tweak patch 2 condition for legibility [Stefan]
> - add R-b
Thanks, applied to the block branch.
Kevin
^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2017-10-06 12:18 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2017-10-05 19:02 [Qemu-devel] [PATCH v3 0/6] block: Avoid copy-on-read assertions Eric Blake
2017-10-05 19:02 ` [Qemu-devel] [PATCH v3 1/6] qemu-io: Add -C for opening with copy-on-read Eric Blake
2017-10-05 19:02 ` [Qemu-devel] [PATCH v3 2/6] block: Uniform handling of 0-length bdrv_get_block_status() Eric Blake
2017-10-05 19:02 ` [Qemu-devel] [PATCH v3 3/6] iotests: Restore stty settings on completion Eric Blake
2017-10-05 19:02 ` [Qemu-devel] [PATCH v3 4/6] block: Add blkdebug hook for copy-on-read Eric Blake
2017-10-05 19:02 ` [Qemu-devel] [PATCH v3 5/6] block: Perform copy-on-read in loop Eric Blake
2017-10-05 19:02 ` [Qemu-devel] [PATCH v3 6/6] iotests: Add test 197 for covering copy-on-read Eric Blake
2017-10-06 12:17 ` [Qemu-devel] [PATCH v3 0/6] block: Avoid copy-on-read assertions Kevin Wolf
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).