* [Qemu-devel] [PATCH for-2.6 0/3] block: Fix assertion failure at mirror exit
@ 2016-04-22 6:35 Fam Zheng
2016-04-22 6:35 ` [Qemu-devel] [PATCH for-2.6 1/3] mirror: Extract mirror_replace Fam Zheng
` (3 more replies)
0 siblings, 4 replies; 6+ messages in thread
From: Fam Zheng @ 2016-04-22 6:35 UTC (permalink / raw)
To: qemu-devel
Cc: Jeff Cody, Kevin Wolf, Max Reitz, qemu-block, pbonzini, stefanha
This supersedes the "virtio: Register host notifier handler as external" patch
from yesterday.
The bug was initially reported by Matthew Schumacher as LaunchPad Bug 1570134,
and nicely bisected by Max Reitz. See patch 2 for the analysis.
We are late for 2.6 so the fix is done locally in mirror.c. In 2.7 we should
revisit this and let bdrv_drained_begin do the work.
Fam Zheng (3):
mirror: Extract mirror_replace
mirror: Skip BH for mirror_exit if in main loop
block: Update comment of bdrv_drained_begin
block/mirror.c | 43 ++++++++++++++++++++++++++++++-------------
include/block/block.h | 5 +++++
2 files changed, 35 insertions(+), 13 deletions(-)
--
2.8.0
^ permalink raw reply [flat|nested] 6+ messages in thread
* [Qemu-devel] [PATCH for-2.6 1/3] mirror: Extract mirror_replace
2016-04-22 6:35 [Qemu-devel] [PATCH for-2.6 0/3] block: Fix assertion failure at mirror exit Fam Zheng
@ 2016-04-22 6:35 ` Fam Zheng
2016-04-22 6:35 ` [Qemu-devel] [PATCH for-2.6 2/3] mirror: Skip BH for mirror_exit if in main loop Fam Zheng
` (2 subsequent siblings)
3 siblings, 0 replies; 6+ messages in thread
From: Fam Zheng @ 2016-04-22 6:35 UTC (permalink / raw)
To: qemu-devel
Cc: Jeff Cody, Kevin Wolf, Max Reitz, qemu-block, pbonzini, stefanha
Move the BDS replacing code to a separate function so that it can be
used in the next patch. A new field "should_replace" is added and set to
true so the function is always called for now.
Signed-off-by: Fam Zheng <famz@redhat.com>
---
block/mirror.c | 36 +++++++++++++++++++++++-------------
1 file changed, 23 insertions(+), 13 deletions(-)
diff --git a/block/mirror.c b/block/mirror.c
index d56e30e..6c3fe43 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -441,26 +441,20 @@ static void mirror_drain(MirrorBlockJob *s)
}
typedef struct {
+ bool should_replace;
int ret;
} MirrorExitData;
-static void mirror_exit(BlockJob *job, void *opaque)
+static void mirror_replace(MirrorBlockJob *s, int *ret)
{
- MirrorBlockJob *s = container_of(job, MirrorBlockJob, common);
- MirrorExitData *data = opaque;
AioContext *replace_aio_context = NULL;
- BlockDriverState *src = s->common.bs;
-
- /* Make sure that the source BDS doesn't go away before we called
- * block_job_completed(). */
- bdrv_ref(src);
if (s->to_replace) {
replace_aio_context = bdrv_get_aio_context(s->to_replace);
aio_context_acquire(replace_aio_context);
}
- if (s->should_complete && data->ret == 0) {
+ if (s->should_complete && *ret == 0) {
BlockDriverState *to_replace = s->common.bs;
if (s->to_replace) {
to_replace = s->to_replace;
@@ -470,7 +464,7 @@ static void mirror_exit(BlockJob *job, void *opaque)
* nodes could have been newly attached to a BlockBackend. */
if (to_replace->blk && s->target->blk) {
error_report("block job: Can't create node with two BlockBackends");
- data->ret = -EINVAL;
+ *ret = -EINVAL;
goto out;
}
@@ -481,14 +475,29 @@ static void mirror_exit(BlockJob *job, void *opaque)
}
out:
+ if (replace_aio_context) {
+ aio_context_release(replace_aio_context);
+ }
+}
+
+static void mirror_exit(BlockJob *job, void *opaque)
+{
+ MirrorBlockJob *s = container_of(job, MirrorBlockJob, common);
+ MirrorExitData *data = opaque;
+ BlockDriverState *src = s->common.bs;
+
+ /* Make sure that the source BDS doesn't go away before we called
+ * block_job_completed(). */
+ bdrv_ref(src);
+
+ if (data->should_replace) {
+ mirror_replace(s, &data->ret);
+ }
if (s->to_replace) {
bdrv_op_unblock_all(s->to_replace, s->replace_blocker);
error_free(s->replace_blocker);
bdrv_unref(s->to_replace);
}
- if (replace_aio_context) {
- aio_context_release(replace_aio_context);
- }
g_free(s->replaces);
bdrv_op_unblock_all(s->target, s->common.blocker);
bdrv_unref(s->target);
@@ -713,6 +722,7 @@ immediate_exit:
data = g_malloc(sizeof(*data));
data->ret = ret;
+ data->should_replace = true;
/* Before we switch to target in mirror_exit, make sure data doesn't
* change. */
bdrv_drained_begin(s->common.bs);
--
2.8.0
^ permalink raw reply related [flat|nested] 6+ messages in thread
* [Qemu-devel] [PATCH for-2.6 2/3] mirror: Skip BH for mirror_exit if in main loop
2016-04-22 6:35 [Qemu-devel] [PATCH for-2.6 0/3] block: Fix assertion failure at mirror exit Fam Zheng
2016-04-22 6:35 ` [Qemu-devel] [PATCH for-2.6 1/3] mirror: Extract mirror_replace Fam Zheng
@ 2016-04-22 6:35 ` Fam Zheng
2016-04-22 9:37 ` Fam Zheng
2016-04-22 6:35 ` [Qemu-devel] [PATCH for-2.6 3/3] block: Update comment of bdrv_drained_begin Fam Zheng
2016-04-22 6:46 ` [Qemu-devel] [PATCH for-2.6 0/3] block: Fix assertion failure at mirror exit Fam Zheng
3 siblings, 1 reply; 6+ messages in thread
From: Fam Zheng @ 2016-04-22 6:35 UTC (permalink / raw)
To: qemu-devel
Cc: Jeff Cody, Kevin Wolf, Max Reitz, qemu-block, pbonzini, stefanha
Commit 5a7e7a0ba moved mirror_exit to a BH handler but didn't add any
protection against guest requests that could sneak in before the BH is
dispatched. For example, this could happen (assuming a code base at that
commit):
main_loop_wait # 1
os_host_main_loop_wait
g_main_context_dispatch
aio_ctx_dispatch
aio_dispatch
...
mirror_run
bdrv_drain
(a) block_job_defer_to_main_loop
qemu_iohandler_poll
virtio_queue_host_notifier_read
...
virtio_submit_multiwrite
(b) blk_aio_multiwrite
main_loop_wait # 2
<snip>
aio_dispatch
aio_bh_poll
(c) mirror_exit
At (a) we know the BDS has no pending request. However, the same
main_loop_wait call is going to dispatch iohandlers (EventNotifier
events), which may lead to a new I/O from guest. So the invariant is
already broken at (c). Data loss.
Commit f3926945c8 made iohandler to use aio API. The order of
virtio_queue_host_notifier_read and block_job_defer_to_main_loop within
a main_loop_wait becomes unpredictable, and even worse, if the host
notifier event arrives at the next main_loop_wait call, the
unpredictable order between mirror_exit and
virtio_queue_host_notifier_read is also a trouble. As shown below, this
commit made the bug easier to trigger:
- Bug case 1:
main_loop_wait # 1
os_host_main_loop_wait
g_main_context_dispatch
aio_ctx_dispatch (qemu_aio_context)
...
mirror_run
bdrv_drain
(a) block_job_defer_to_main_loop
aio_ctx_dispatch (iohandler_ctx)
virtio_queue_host_notifier_read
...
virtio_submit_multiwrite
(b) blk_aio_multiwrite
main_loop_wait # 2
...
aio_dispatch
aio_bh_poll
(c) mirror_exit
- Bug case 2:
main_loop_wait # 1
os_host_main_loop_wait
g_main_context_dispatch
aio_ctx_dispatch (qemu_aio_context)
...
mirror_run
bdrv_drain
(a) block_job_defer_to_main_loop
main_loop_wait # 2
...
aio_ctx_dispatch (iohandler_ctx)
virtio_queue_host_notifier_read
...
virtio_submit_multiwrite
(b) blk_aio_multiwrite
aio_dispatch
aio_bh_poll
(c) mirror_exit
In both cases, (b) breaks the invariant wanted by (a) and (c).
Unfortunately, the request loss has been silent, until 3f09bfbc7be added
an assertion at (c) to check the invariant in
bdrv_replace_in_backing_chain. Max reproduced an assertion failure after
that commit, by doing active committing while the guest is running
bonnie++.
2.5 added bdrv_drained_begin at (a) to protect the dataplane case from
similar problems, but we never realize the main loop bug until now.
As a bandage, this patch undoes the change of 5a7e7a0bad17 in
non-dataplane case, and calls mirror_replace directly in mirror_run.
The next step after 2.6 is to complete bdrv_drained_begin complete
(including both fixing the dispatching code in the main loop and marking
host event notifiers etc. as external) and then this special casing is
not necessary.
Launchpad Bug: 1570134
Cc: qemu-stable@nongnu.org
Signed-off-by: Fam Zheng <famz@redhat.com>
---
block/mirror.c | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/block/mirror.c b/block/mirror.c
index 6c3fe43..bc77773 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -726,6 +726,13 @@ immediate_exit:
/* Before we switch to target in mirror_exit, make sure data doesn't
* change. */
bdrv_drained_begin(s->common.bs);
+ if (qemu_get_aio_context() == bdrv_get_aio_context(bs)) {
+ /* FIXME: if we are in the main loop, the iohandler doesn't honor
+ * bdrv_drained_begin yet, and guest requests can sneak in before BH
+ * callback runs. Do the replace now to avoid it. */
+ mirror_replace(s, &data->ret);
+ data->should_replace = false;
+ }
block_job_defer_to_main_loop(&s->common, mirror_exit, data);
}
--
2.8.0
^ permalink raw reply related [flat|nested] 6+ messages in thread
* [Qemu-devel] [PATCH for-2.6 3/3] block: Update comment of bdrv_drained_begin
2016-04-22 6:35 [Qemu-devel] [PATCH for-2.6 0/3] block: Fix assertion failure at mirror exit Fam Zheng
2016-04-22 6:35 ` [Qemu-devel] [PATCH for-2.6 1/3] mirror: Extract mirror_replace Fam Zheng
2016-04-22 6:35 ` [Qemu-devel] [PATCH for-2.6 2/3] mirror: Skip BH for mirror_exit if in main loop Fam Zheng
@ 2016-04-22 6:35 ` Fam Zheng
2016-04-22 6:46 ` [Qemu-devel] [PATCH for-2.6 0/3] block: Fix assertion failure at mirror exit Fam Zheng
3 siblings, 0 replies; 6+ messages in thread
From: Fam Zheng @ 2016-04-22 6:35 UTC (permalink / raw)
To: qemu-devel
Cc: Jeff Cody, Kevin Wolf, Max Reitz, qemu-block, pbonzini, stefanha
The limitation should be documented. Removing it is a post 2.6
material.
Signed-off-by: Fam Zheng <famz@redhat.com>
---
include/block/block.h | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/include/block/block.h b/include/block/block.h
index 3a73137..bb67652 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -530,6 +530,11 @@ void bdrv_flush_io_queue(BlockDriverState *bs);
* this doesn't block timers or coroutines from submitting more requests, which
* means block_job_pause is still necessary.
*
+ * for now this only affacts the aio event loops (namely the data-plane
+ * iothreads and nested aio_poll() calls in block layer). Specifically, the
+ * main loop can still dispatch external events, even after bdrv_drained_begin
+ * is called.
+ *
* This function can be recursive.
*/
void bdrv_drained_begin(BlockDriverState *bs);
--
2.8.0
^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [Qemu-devel] [PATCH for-2.6 0/3] block: Fix assertion failure at mirror exit
2016-04-22 6:35 [Qemu-devel] [PATCH for-2.6 0/3] block: Fix assertion failure at mirror exit Fam Zheng
` (2 preceding siblings ...)
2016-04-22 6:35 ` [Qemu-devel] [PATCH for-2.6 3/3] block: Update comment of bdrv_drained_begin Fam Zheng
@ 2016-04-22 6:46 ` Fam Zheng
3 siblings, 0 replies; 6+ messages in thread
From: Fam Zheng @ 2016-04-22 6:46 UTC (permalink / raw)
To: qemu-devel
Cc: Kevin Wolf, qemu-block, Jeff Cody, Max Reitz, stefanha, pbonzini,
qemu-stable
On Fri, 04/22 14:35, Fam Zheng wrote:
> This supersedes the "virtio: Register host notifier handler as external" patch
> from yesterday.
>
> The bug was initially reported by Matthew Schumacher as LaunchPad Bug 1570134,
> and nicely bisected by Max Reitz. See patch 2 for the analysis.
>
> We are late for 2.6 so the fix is done locally in mirror.c. In 2.7 we should
> revisit this and let bdrv_drained_begin do the work.
>
> Fam Zheng (3):
> mirror: Extract mirror_replace
> mirror: Skip BH for mirror_exit if in main loop
> block: Update comment of bdrv_drained_begin
>
> block/mirror.c | 43 ++++++++++++++++++++++++++++++-------------
> include/block/block.h | 5 +++++
> 2 files changed, 35 insertions(+), 13 deletions(-)
>
> --
> 2.8.0
>
>
Cc: qemu-stable@nongnu.org
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [Qemu-devel] [PATCH for-2.6 2/3] mirror: Skip BH for mirror_exit if in main loop
2016-04-22 6:35 ` [Qemu-devel] [PATCH for-2.6 2/3] mirror: Skip BH for mirror_exit if in main loop Fam Zheng
@ 2016-04-22 9:37 ` Fam Zheng
0 siblings, 0 replies; 6+ messages in thread
From: Fam Zheng @ 2016-04-22 9:37 UTC (permalink / raw)
To: qemu-devel
Cc: Kevin Wolf, qemu-block, Jeff Cody, Max Reitz, stefanha, pbonzini
On Fri, 04/22 14:35, Fam Zheng wrote:
> As a bandage, this patch undoes the change of 5a7e7a0bad17 in
> non-dataplane case, and calls mirror_replace directly in mirror_run.
mirror_exit minus mirror_replace was kept in the BH because it's not safe to
call in coroutine. After discussing with kwolf on IRC, now it turns out that
mirror_replace isn't either, due to the bdrv_drain_all in bdrv_reopen. So,
NACK
The only feasible fix for 2.6 is with the previous direction, but we need
is_external check added to the main loop. Patches are being worked on.
Fam
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2016-04-22 9:37 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2016-04-22 6:35 [Qemu-devel] [PATCH for-2.6 0/3] block: Fix assertion failure at mirror exit Fam Zheng
2016-04-22 6:35 ` [Qemu-devel] [PATCH for-2.6 1/3] mirror: Extract mirror_replace Fam Zheng
2016-04-22 6:35 ` [Qemu-devel] [PATCH for-2.6 2/3] mirror: Skip BH for mirror_exit if in main loop Fam Zheng
2016-04-22 9:37 ` Fam Zheng
2016-04-22 6:35 ` [Qemu-devel] [PATCH for-2.6 3/3] block: Update comment of bdrv_drained_begin Fam Zheng
2016-04-22 6:46 ` [Qemu-devel] [PATCH for-2.6 0/3] block: Fix assertion failure at mirror exit Fam Zheng
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).