* [PULL 1/5] qemu-img: add support for rate limit in qemu-img commit
2020-10-27 15:15 [PULL 0/5] Block layer patches Kevin Wolf
@ 2020-10-27 15:15 ` Kevin Wolf
2020-10-27 15:15 ` [PULL 2/5] qemu-img: add support for rate limit in qemu-img convert Kevin Wolf
` (4 subsequent siblings)
5 siblings, 0 replies; 7+ messages in thread
From: Kevin Wolf @ 2020-10-27 15:15 UTC (permalink / raw)
To: qemu-block; +Cc: kwolf, peter.maydell, qemu-devel
From: Zhengui <lizhengui@huawei.com>
add support for rate limit in qemu-img commit.
Signed-off-by: Zhengui <lizhengui@huawei.com>
Message-Id: <1603205264-17424-2-git-send-email-lizhengui@huawei.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
docs/tools/qemu-img.rst | 4 +++-
qemu-img.c | 11 +++++++++--
qemu-img-cmds.hx | 4 ++--
3 files changed, 14 insertions(+), 5 deletions(-)
diff --git a/docs/tools/qemu-img.rst b/docs/tools/qemu-img.rst
index c35bd64822..bcb11b0899 100644
--- a/docs/tools/qemu-img.rst
+++ b/docs/tools/qemu-img.rst
@@ -349,7 +349,7 @@ Command description:
state after (the attempt at) repairing it. That is, a successful ``-r all``
will yield the exit code 0, independently of the image state before.
-.. option:: commit [--object OBJECTDEF] [--image-opts] [-q] [-f FMT] [-t CACHE] [-b BASE] [-d] [-p] FILENAME
+.. option:: commit [--object OBJECTDEF] [--image-opts] [-q] [-f FMT] [-t CACHE] [-b BASE] [-r RATE_LIMIT] [-d] [-p] FILENAME
Commit the changes recorded in *FILENAME* in its base image or backing file.
If the backing file is smaller than the snapshot, then the backing file will be
@@ -371,6 +371,8 @@ Command description:
garbage data when read. For this reason, ``-b`` implies ``-d`` (so that
the top image stays valid).
+ The rate limit for the commit process is specified by ``-r``.
+
.. option:: compare [--object OBJECTDEF] [--image-opts] [-f FMT] [-F FMT] [-T SRC_CACHE] [-p] [-q] [-s] [-U] FILENAME1 FILENAME2
Check if two images have the same content. You can compare images with
diff --git a/qemu-img.c b/qemu-img.c
index 2103507936..3023abea8b 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -980,6 +980,7 @@ static int img_commit(int argc, char **argv)
CommonBlockJobCBInfo cbi;
bool image_opts = false;
AioContext *aio_context;
+ int64_t rate_limit = 0;
fmt = NULL;
cache = BDRV_DEFAULT_CACHE;
@@ -991,7 +992,7 @@ static int img_commit(int argc, char **argv)
{"image-opts", no_argument, 0, OPTION_IMAGE_OPTS},
{0, 0, 0, 0}
};
- c = getopt_long(argc, argv, ":f:ht:b:dpq",
+ c = getopt_long(argc, argv, ":f:ht:b:dpqr:",
long_options, NULL);
if (c == -1) {
break;
@@ -1026,6 +1027,12 @@ static int img_commit(int argc, char **argv)
case 'q':
quiet = true;
break;
+ case 'r':
+ rate_limit = cvtnum("rate limit", optarg);
+ if (rate_limit < 0) {
+ return 1;
+ }
+ break;
case OPTION_OBJECT: {
QemuOpts *opts;
opts = qemu_opts_parse_noisily(&qemu_object_opts,
@@ -1099,7 +1106,7 @@ static int img_commit(int argc, char **argv)
aio_context = bdrv_get_aio_context(bs);
aio_context_acquire(aio_context);
- commit_active_start("commit", bs, base_bs, JOB_DEFAULT, 0,
+ commit_active_start("commit", bs, base_bs, JOB_DEFAULT, rate_limit,
BLOCKDEV_ON_ERROR_REPORT, NULL, common_block_job_cb,
&cbi, false, &local_err);
aio_context_release(aio_context);
diff --git a/qemu-img-cmds.hx b/qemu-img-cmds.hx
index cab8234235..965c1e3e59 100644
--- a/qemu-img-cmds.hx
+++ b/qemu-img-cmds.hx
@@ -34,9 +34,9 @@ SRST
ERST
DEF("commit", img_commit,
- "commit [--object objectdef] [--image-opts] [-q] [-f fmt] [-t cache] [-b base] [-d] [-p] filename")
+ "commit [--object objectdef] [--image-opts] [-q] [-f fmt] [-t cache] [-b base] [-r rate_limit] [-d] [-p] filename")
SRST
-.. option:: commit [--object OBJECTDEF] [--image-opts] [-q] [-f FMT] [-t CACHE] [-b BASE] [-d] [-p] FILENAME
+.. option:: commit [--object OBJECTDEF] [--image-opts] [-q] [-f FMT] [-t CACHE] [-b BASE] [-r RATE_LIMIT] [-d] [-p] FILENAME
ERST
DEF("compare", img_compare,
--
2.28.0
^ permalink raw reply related [flat|nested] 7+ messages in thread
* [PULL 2/5] qemu-img: add support for rate limit in qemu-img convert
2020-10-27 15:15 [PULL 0/5] Block layer patches Kevin Wolf
2020-10-27 15:15 ` [PULL 1/5] qemu-img: add support for rate limit in qemu-img commit Kevin Wolf
@ 2020-10-27 15:15 ` Kevin Wolf
2020-10-27 15:15 ` [PULL 3/5] qcow2: Report BDRV_BLOCK_ZERO more accurately in bdrv_co_block_status() Kevin Wolf
` (3 subsequent siblings)
5 siblings, 0 replies; 7+ messages in thread
From: Kevin Wolf @ 2020-10-27 15:15 UTC (permalink / raw)
To: qemu-block; +Cc: kwolf, peter.maydell, qemu-devel
From: Zhengui <lizhengui@huawei.com>
add support for rate limit in qemu-img convert.
Signed-off-by: Zhengui <lizhengui@huawei.com>
Message-Id: <1603205264-17424-3-git-send-email-lizhengui@huawei.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
docs/tools/qemu-img.rst | 6 +++++-
qemu-img.c | 27 ++++++++++++++++++++++++++-
qemu-img-cmds.hx | 4 ++--
3 files changed, 33 insertions(+), 4 deletions(-)
diff --git a/docs/tools/qemu-img.rst b/docs/tools/qemu-img.rst
index bcb11b0899..b615aa8419 100644
--- a/docs/tools/qemu-img.rst
+++ b/docs/tools/qemu-img.rst
@@ -188,6 +188,10 @@ Parameters to convert subcommand:
allocated target image depending on the host support for getting allocation
information.
+.. option:: -r
+
+ Rate limit for the convert process
+
.. option:: --salvage
Try to ignore I/O errors when reading. Unless in quiet mode (``-q``), errors
@@ -410,7 +414,7 @@ Command description:
4
Error on reading data
-.. option:: convert [--object OBJECTDEF] [--image-opts] [--target-image-opts] [--target-is-zero] [--bitmaps] [-U] [-C] [-c] [-p] [-q] [-n] [-f FMT] [-t CACHE] [-T SRC_CACHE] [-O OUTPUT_FMT] [-B BACKING_FILE] [-o OPTIONS] [-l SNAPSHOT_PARAM] [-S SPARSE_SIZE] [-m NUM_COROUTINES] [-W] FILENAME [FILENAME2 [...]] OUTPUT_FILENAME
+.. option:: convert [--object OBJECTDEF] [--image-opts] [--target-image-opts] [--target-is-zero] [--bitmaps] [-U] [-C] [-c] [-p] [-q] [-n] [-f FMT] [-t CACHE] [-T SRC_CACHE] [-O OUTPUT_FMT] [-B BACKING_FILE] [-o OPTIONS] [-l SNAPSHOT_PARAM] [-S SPARSE_SIZE] [-r RATE_LIMIT] [-m NUM_COROUTINES] [-W] FILENAME [FILENAME2 [...]] OUTPUT_FILENAME
Convert the disk image *FILENAME* or a snapshot *SNAPSHOT_PARAM*
to disk image *OUTPUT_FILENAME* using format *OUTPUT_FMT*. It can
diff --git a/qemu-img.c b/qemu-img.c
index 3023abea8b..a968c74cba 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -50,6 +50,8 @@
#include "block/qapi.h"
#include "crypto/init.h"
#include "trace/control.h"
+#include "qemu/throttle.h"
+#include "block/throttle-groups.h"
#define QEMU_IMG_VERSION "qemu-img version " QEMU_FULL_VERSION \
"\n" QEMU_COPYRIGHT "\n"
@@ -1669,6 +1671,7 @@ enum ImgConvertBlockStatus {
};
#define MAX_COROUTINES 16
+#define CONVERT_THROTTLE_GROUP "img_convert"
typedef struct ImgConvertState {
BlockBackend **src;
@@ -2184,6 +2187,17 @@ static int convert_copy_bitmaps(BlockDriverState *src, BlockDriverState *dst)
#define MAX_BUF_SECTORS 32768
+static void set_rate_limit(BlockBackend *blk, int64_t rate_limit)
+{
+ ThrottleConfig cfg;
+
+ throttle_config_init(&cfg);
+ cfg.buckets[THROTTLE_BPS_WRITE].avg = rate_limit;
+
+ blk_io_limits_enable(blk, CONVERT_THROTTLE_GROUP);
+ blk_set_io_limits(blk, &cfg);
+}
+
static int img_convert(int argc, char **argv)
{
int c, bs_i, flags, src_flags = 0;
@@ -2204,6 +2218,7 @@ static int img_convert(int argc, char **argv)
bool force_share = false;
bool explict_min_sparse = false;
bool bitmaps = false;
+ int64_t rate_limit = 0;
ImgConvertState s = (ImgConvertState) {
/* Need at least 4k of zeros for sparse detection */
@@ -2226,7 +2241,7 @@ static int img_convert(int argc, char **argv)
{"bitmaps", no_argument, 0, OPTION_BITMAPS},
{0, 0, 0, 0}
};
- c = getopt_long(argc, argv, ":hf:O:B:Cco:l:S:pt:T:qnm:WU",
+ c = getopt_long(argc, argv, ":hf:O:B:Cco:l:S:pt:T:qnm:WUr:",
long_options, NULL);
if (c == -1) {
break;
@@ -2323,6 +2338,12 @@ static int img_convert(int argc, char **argv)
case 'U':
force_share = true;
break;
+ case 'r':
+ rate_limit = cvtnum("rate limit", optarg);
+ if (rate_limit < 0) {
+ goto fail_getopt;
+ }
+ break;
case OPTION_OBJECT: {
QemuOpts *object_opts;
object_opts = qemu_opts_parse_noisily(&qemu_object_opts,
@@ -2712,6 +2733,10 @@ static int img_convert(int argc, char **argv)
s.cluster_sectors = bdi.cluster_size / BDRV_SECTOR_SIZE;
}
+ if (rate_limit) {
+ set_rate_limit(s.target, rate_limit);
+ }
+
ret = convert_do_copy(&s);
/* Now copy the bitmaps */
diff --git a/qemu-img-cmds.hx b/qemu-img-cmds.hx
index 965c1e3e59..b3620f29e5 100644
--- a/qemu-img-cmds.hx
+++ b/qemu-img-cmds.hx
@@ -46,9 +46,9 @@ SRST
ERST
DEF("convert", img_convert,
- "convert [--object objectdef] [--image-opts] [--target-image-opts] [--target-is-zero] [--bitmaps] [-U] [-C] [-c] [-p] [-q] [-n] [-f fmt] [-t cache] [-T src_cache] [-O output_fmt] [-B backing_file] [-o options] [-l snapshot_param] [-S sparse_size] [-m num_coroutines] [-W] [--salvage] filename [filename2 [...]] output_filename")
+ "convert [--object objectdef] [--image-opts] [--target-image-opts] [--target-is-zero] [--bitmaps] [-U] [-C] [-c] [-p] [-q] [-n] [-f fmt] [-t cache] [-T src_cache] [-O output_fmt] [-B backing_file] [-o options] [-l snapshot_param] [-S sparse_size] [-r rate_limit] [-m num_coroutines] [-W] [--salvage] filename [filename2 [...]] output_filename")
SRST
-.. option:: convert [--object OBJECTDEF] [--image-opts] [--target-image-opts] [--target-is-zero] [--bitmaps] [-U] [-C] [-c] [-p] [-q] [-n] [-f FMT] [-t CACHE] [-T SRC_CACHE] [-O OUTPUT_FMT] [-B BACKING_FILE] [-o OPTIONS] [-l SNAPSHOT_PARAM] [-S SPARSE_SIZE] [-m NUM_COROUTINES] [-W] [--salvage] FILENAME [FILENAME2 [...]] OUTPUT_FILENAME
+.. option:: convert [--object OBJECTDEF] [--image-opts] [--target-image-opts] [--target-is-zero] [--bitmaps] [-U] [-C] [-c] [-p] [-q] [-n] [-f FMT] [-t CACHE] [-T SRC_CACHE] [-O OUTPUT_FMT] [-B BACKING_FILE] [-o OPTIONS] [-l SNAPSHOT_PARAM] [-S SPARSE_SIZE] [-r RATE_LIMIT] [-m NUM_COROUTINES] [-W] [--salvage] FILENAME [FILENAME2 [...]] OUTPUT_FILENAME
ERST
DEF("create", img_create,
--
2.28.0
^ permalink raw reply related [flat|nested] 7+ messages in thread
* [PULL 3/5] qcow2: Report BDRV_BLOCK_ZERO more accurately in bdrv_co_block_status()
2020-10-27 15:15 [PULL 0/5] Block layer patches Kevin Wolf
2020-10-27 15:15 ` [PULL 1/5] qemu-img: add support for rate limit in qemu-img commit Kevin Wolf
2020-10-27 15:15 ` [PULL 2/5] qemu-img: add support for rate limit in qemu-img convert Kevin Wolf
@ 2020-10-27 15:15 ` Kevin Wolf
2020-10-27 15:15 ` [PULL 4/5] qcow2: Skip copy-on-write when allocating a zero cluster Kevin Wolf
` (2 subsequent siblings)
5 siblings, 0 replies; 7+ messages in thread
From: Kevin Wolf @ 2020-10-27 15:15 UTC (permalink / raw)
To: qemu-block; +Cc: kwolf, peter.maydell, qemu-devel
From: Alberto Garcia <berto@igalia.com>
If a BlockDriverState supports backing files but has none then any
unallocated area reads back as zeroes.
bdrv_co_block_status() is only reporting this is if want_zero is true,
but this is an inexpensive test and there is no reason not to do it in
all cases.
Suggested-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <66fa0914a0e2b727ab6d1b63ca773d7cd29a9a9e.1603731354.git.berto@igalia.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
block/io.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/block/io.c b/block/io.c
index 02528b3823..6fe1b275b6 100644
--- a/block/io.c
+++ b/block/io.c
@@ -2282,17 +2282,17 @@ static int coroutine_fn bdrv_co_block_status(BlockDriverState *bs,
if (ret & (BDRV_BLOCK_DATA | BDRV_BLOCK_ZERO)) {
ret |= BDRV_BLOCK_ALLOCATED;
- } else if (want_zero && bs->drv->supports_backing) {
+ } else if (bs->drv->supports_backing) {
BlockDriverState *cow_bs = bdrv_cow_bs(bs);
- if (cow_bs) {
+ if (!cow_bs) {
+ ret |= BDRV_BLOCK_ZERO;
+ } else if (want_zero) {
int64_t size2 = bdrv_getlength(cow_bs);
if (size2 >= 0 && offset >= size2) {
ret |= BDRV_BLOCK_ZERO;
}
- } else {
- ret |= BDRV_BLOCK_ZERO;
}
}
--
2.28.0
^ permalink raw reply related [flat|nested] 7+ messages in thread
* [PULL 4/5] qcow2: Skip copy-on-write when allocating a zero cluster
2020-10-27 15:15 [PULL 0/5] Block layer patches Kevin Wolf
` (2 preceding siblings ...)
2020-10-27 15:15 ` [PULL 3/5] qcow2: Report BDRV_BLOCK_ZERO more accurately in bdrv_co_block_status() Kevin Wolf
@ 2020-10-27 15:15 ` Kevin Wolf
2020-10-27 15:15 ` [PULL 5/5] block: End quiescent sections when a BDS is deleted Kevin Wolf
2020-10-30 15:49 ` [PULL 0/5] Block layer patches Peter Maydell
5 siblings, 0 replies; 7+ messages in thread
From: Kevin Wolf @ 2020-10-27 15:15 UTC (permalink / raw)
To: qemu-block; +Cc: kwolf, peter.maydell, qemu-devel
From: Alberto Garcia <berto@igalia.com>
Since commit c8bb23cbdbe32f5c326365e0a82e1b0e68cdcd8a when a write
request results in a new allocation QEMU first tries to see if the
rest of the cluster outside the written area contains only zeroes.
In that case, instead of doing a normal copy-on-write operation and
writing explicit zero buffers to disk, the code zeroes the whole
cluster efficiently using pwrite_zeroes() with BDRV_REQ_NO_FALLBACK.
This improves performance very significantly but it only happens when
we are writing to an area that was completely unallocated before. Zero
clusters (QCOW2_CLUSTER_ZERO_*) are treated like normal clusters and
are therefore slower to allocate.
This happens because the code uses bdrv_is_allocated_above() rather
bdrv_block_status_above(). The former is not as accurate for this
purpose but it is faster. However in the case of qcow2 the underlying
call does already report zero clusters just fine so there is no reason
why we cannot use that information.
After testing 4KB writes on an image that only contains zero clusters
this patch results in almost five times more IOPS.
Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <6d77cab968c501c44d6e1089b9bc91b04170b49e.1603731354.git.berto@igalia.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
include/block/block.h | 2 ++
block/io.c | 27 +++++++++++++++++++++++++++
block/qcow2.c | 35 +++++++++++++++++++----------------
3 files changed, 48 insertions(+), 16 deletions(-)
diff --git a/include/block/block.h b/include/block/block.h
index d16c401cb4..c9d7c58765 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -508,6 +508,8 @@ int bdrv_is_allocated(BlockDriverState *bs, int64_t offset, int64_t bytes,
int bdrv_is_allocated_above(BlockDriverState *top, BlockDriverState *base,
bool include_base, int64_t offset, int64_t bytes,
int64_t *pnum);
+int coroutine_fn bdrv_co_is_zero_fast(BlockDriverState *bs, int64_t offset,
+ int64_t bytes);
bool bdrv_is_read_only(BlockDriverState *bs);
int bdrv_can_set_read_only(BlockDriverState *bs, bool read_only,
diff --git a/block/io.c b/block/io.c
index 6fe1b275b6..c33cecd58d 100644
--- a/block/io.c
+++ b/block/io.c
@@ -2447,6 +2447,33 @@ int bdrv_block_status(BlockDriverState *bs, int64_t offset, int64_t bytes,
offset, bytes, pnum, map, file);
}
+/*
+ * Check @bs (and its backing chain) to see if the range defined
+ * by @offset and @bytes is known to read as zeroes.
+ * Return 1 if that is the case, 0 otherwise and -errno on error.
+ * This test is meant to be fast rather than accurate so returning 0
+ * does not guarantee non-zero data.
+ */
+int coroutine_fn bdrv_co_is_zero_fast(BlockDriverState *bs, int64_t offset,
+ int64_t bytes)
+{
+ int ret;
+ int64_t pnum = bytes;
+
+ if (!bytes) {
+ return 1;
+ }
+
+ ret = bdrv_common_block_status_above(bs, NULL, false, false, offset,
+ bytes, &pnum, NULL, NULL);
+
+ if (ret < 0) {
+ return ret;
+ }
+
+ return (pnum == bytes) && (ret & BDRV_BLOCK_ZERO);
+}
+
int coroutine_fn bdrv_is_allocated(BlockDriverState *bs, int64_t offset,
int64_t bytes, int64_t *pnum)
{
diff --git a/block/qcow2.c b/block/qcow2.c
index b6cb4db8bb..4274806a2a 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -2387,26 +2387,26 @@ static bool merge_cow(uint64_t offset, unsigned bytes,
return false;
}
-static bool is_unallocated(BlockDriverState *bs, int64_t offset, int64_t bytes)
-{
- int64_t nr;
- return !bytes ||
- (!bdrv_is_allocated_above(bs, NULL, false, offset, bytes, &nr) &&
- nr == bytes);
-}
-
-static bool is_zero_cow(BlockDriverState *bs, QCowL2Meta *m)
+/*
+ * Return 1 if the COW regions read as zeroes, 0 if not, < 0 on error.
+ * Note that returning 0 does not guarantee non-zero data.
+ */
+static int is_zero_cow(BlockDriverState *bs, QCowL2Meta *m)
{
/*
* This check is designed for optimization shortcut so it must be
* efficient.
- * Instead of is_zero(), use is_unallocated() as it is faster (but not
- * as accurate and can result in false negatives).
+ * Instead of is_zero(), use bdrv_co_is_zero_fast() as it is
+ * faster (but not as accurate and can result in false negatives).
*/
- return is_unallocated(bs, m->offset + m->cow_start.offset,
- m->cow_start.nb_bytes) &&
- is_unallocated(bs, m->offset + m->cow_end.offset,
- m->cow_end.nb_bytes);
+ int ret = bdrv_co_is_zero_fast(bs, m->offset + m->cow_start.offset,
+ m->cow_start.nb_bytes);
+ if (ret <= 0) {
+ return ret;
+ }
+
+ return bdrv_co_is_zero_fast(bs, m->offset + m->cow_end.offset,
+ m->cow_end.nb_bytes);
}
static int handle_alloc_space(BlockDriverState *bs, QCowL2Meta *l2meta)
@@ -2432,7 +2432,10 @@ static int handle_alloc_space(BlockDriverState *bs, QCowL2Meta *l2meta)
continue;
}
- if (!is_zero_cow(bs, m)) {
+ ret = is_zero_cow(bs, m);
+ if (ret < 0) {
+ return ret;
+ } else if (ret == 0) {
continue;
}
--
2.28.0
^ permalink raw reply related [flat|nested] 7+ messages in thread
* [PULL 5/5] block: End quiescent sections when a BDS is deleted
2020-10-27 15:15 [PULL 0/5] Block layer patches Kevin Wolf
` (3 preceding siblings ...)
2020-10-27 15:15 ` [PULL 4/5] qcow2: Skip copy-on-write when allocating a zero cluster Kevin Wolf
@ 2020-10-27 15:15 ` Kevin Wolf
2020-10-30 15:49 ` [PULL 0/5] Block layer patches Peter Maydell
5 siblings, 0 replies; 7+ messages in thread
From: Kevin Wolf @ 2020-10-27 15:15 UTC (permalink / raw)
To: qemu-block; +Cc: kwolf, peter.maydell, qemu-devel
From: Greg Kurz <groug@kaod.org>
If a BDS gets deleted during blk_drain_all(), it might miss a
call to bdrv_do_drained_end(). This means missing a call to
aio_enable_external() and the AIO context remains disabled for
ever. This can cause a device to become irresponsive and to
disrupt the guest execution, ie. hang, loop forever or worse.
This scenario is quite easy to encounter with virtio-scsi
on POWER when punching multiple blockdev-create QMP commands
while the guest is booting and it is still running the SLOF
firmware. This happens because SLOF disables/re-enables PCI
devices multiple times via IO/MEM/MASTER bits of PCI_COMMAND
register after the initial probe/feature negotiation, as it
tends to work with a single device at a time at various stages
like probing and running block/network bootloaders without
doing a full reset in-between. This naturally generates many
dataplane stops and starts, and thus many drain sections that
can race with blockdev_create_run(). In the end, SLOF bails
out.
It is somehow reproducible on x86 but it requires to generate
articial dataplane start/stop activity with stop/cont QMP
commands. In this case, seabios ends up looping for ever,
waiting for the virtio-scsi device to send a response to
a command it never received.
Add a helper that pairs all previously called bdrv_do_drained_begin()
with a bdrv_do_drained_end() and call it from bdrv_close().
While at it, update the "/bdrv-drain/graph-change/drain_all"
test in test-bdrv-drain so that it can catch the issue.
BugId: https://bugzilla.redhat.com/show_bug.cgi?id=1874441
Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <160346526998.272601.9045392804399803158.stgit@bahia.lan>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
include/block/block.h | 6 ++++++
block.c | 9 +++++++++
block/io.c | 13 +++++++++++++
tests/test-bdrv-drain.c | 1 +
4 files changed, 29 insertions(+)
diff --git a/include/block/block.h b/include/block/block.h
index c9d7c58765..4bfe3b546b 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -781,6 +781,12 @@ void bdrv_drained_end(BlockDriverState *bs);
*/
void bdrv_drained_end_no_poll(BlockDriverState *bs, int *drained_end_counter);
+/**
+ * End all quiescent sections started by bdrv_drain_all_begin(). This is
+ * only needed when deleting a BDS before bdrv_drain_all_end() is called.
+ */
+void bdrv_drain_all_end_quiesce(BlockDriverState *bs);
+
/**
* End a quiescent section started by bdrv_subtree_drained_begin().
*/
diff --git a/block.c b/block.c
index 430edf79bb..ee5b28a979 100644
--- a/block.c
+++ b/block.c
@@ -4458,6 +4458,15 @@ static void bdrv_close(BlockDriverState *bs)
}
QLIST_INIT(&bs->aio_notifiers);
bdrv_drained_end(bs);
+
+ /*
+ * If we're still inside some bdrv_drain_all_begin()/end() sections, end
+ * them now since this BDS won't exist anymore when bdrv_drain_all_end()
+ * gets called.
+ */
+ if (bs->quiesce_counter) {
+ bdrv_drain_all_end_quiesce(bs);
+ }
}
void bdrv_close_all(void)
diff --git a/block/io.c b/block/io.c
index c33cecd58d..9918f2499c 100644
--- a/block/io.c
+++ b/block/io.c
@@ -633,6 +633,19 @@ void bdrv_drain_all_begin(void)
}
}
+void bdrv_drain_all_end_quiesce(BlockDriverState *bs)
+{
+ int drained_end_counter = 0;
+
+ g_assert(bs->quiesce_counter > 0);
+ g_assert(!bs->refcnt);
+
+ while (bs->quiesce_counter) {
+ bdrv_do_drained_end(bs, false, NULL, true, &drained_end_counter);
+ }
+ BDRV_POLL_WHILE(bs, qatomic_read(&drained_end_counter) > 0);
+}
+
void bdrv_drain_all_end(void)
{
BlockDriverState *bs = NULL;
diff --git a/tests/test-bdrv-drain.c b/tests/test-bdrv-drain.c
index 1595bbc92e..8a29e33e00 100644
--- a/tests/test-bdrv-drain.c
+++ b/tests/test-bdrv-drain.c
@@ -594,6 +594,7 @@ static void test_graph_change_drain_all(void)
g_assert_cmpint(bs_b->quiesce_counter, ==, 0);
g_assert_cmpint(b_s->drain_count, ==, 0);
+ g_assert_cmpint(qemu_get_aio_context()->external_disable_cnt, ==, 0);
bdrv_unref(bs_b);
blk_unref(blk_b);
--
2.28.0
^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PULL 0/5] Block layer patches
2020-10-27 15:15 [PULL 0/5] Block layer patches Kevin Wolf
` (4 preceding siblings ...)
2020-10-27 15:15 ` [PULL 5/5] block: End quiescent sections when a BDS is deleted Kevin Wolf
@ 2020-10-30 15:49 ` Peter Maydell
5 siblings, 0 replies; 7+ messages in thread
From: Peter Maydell @ 2020-10-30 15:49 UTC (permalink / raw)
To: Kevin Wolf; +Cc: QEMU Developers, Qemu-block
On Tue, 27 Oct 2020 at 15:15, Kevin Wolf <kwolf@redhat.com> wrote:
>
> The following changes since commit d55450df995d6223486db11c66491cbf6c131523:
>
> Merge remote-tracking branch 'remotes/dgilbert/tags/pull-migration-20201026a' into staging (2020-10-27 10:25:42 +0000)
>
> are available in the Git repository at:
>
> git://repo.or.cz/qemu/kevin.git tags/for-upstream
>
> for you to fetch changes up to 1a6d3bd229d429879a85a9105fb84cae049d083c:
>
> block: End quiescent sections when a BDS is deleted (2020-10-27 15:26:20 +0100)
>
> ----------------------------------------------------------------
> Block layer patches:
>
> - qcow2: Skip copy-on-write when allocating a zero cluster
> - qemu-img: add support for rate limit in qemu-img convert/commit
> - Fix deadlock when deleting a block node during drain_all
Applied, thanks.
Please update the changelog at https://wiki.qemu.org/ChangeLog/5.2
for any user-visible changes.
-- PMM
^ permalink raw reply [flat|nested] 7+ messages in thread