* [Qemu-devel] [PATCH for 2.7 0/2] block: fixes for deadlock in flush code
@ 2016-08-17 18:06 Denis V. Lunev
2016-08-17 18:06 ` [Qemu-devel] [PATCH 1/2] block: fix deadlock in bdrv_co_flush Denis V. Lunev
` (2 more replies)
0 siblings, 3 replies; 4+ messages in thread
From: Denis V. Lunev @ 2016-08-17 18:06 UTC (permalink / raw)
To: qemu-block, qemu-devel
Cc: den, Evgeny Yakovlev, Stefan Hajnoczi, Fam Zheng, Kevin Wolf,
Max Reitz
We have suffered from the following deadlock
Thread 2 (Thread 0x7f1b7edf9700 (LWP 240293)):
#0 0x00007f1bd1f0675f in ppoll () from /lib64/libc.so.6
#1 0x00007f1bd8c1d78b in ppoll (__ss=0x0, __timeout=0x0, __nfds=<optimized out>, __fds=<optimized out>) at /usr/include/bits/poll2.h:77
#2 qemu_poll_ns (fds=<optimized out>, nfds=<optimized out>, timeout=timeout@entry=-1) at qemu-timer.c:310
#3 0x00007f1bd8c1e8bf in aio_poll (ctx=0x7f1bda091780, blocking=blocking@entry=true) at aio-posix.c:451
#4 0x00007f1bd8c119cf in bdrv_drain_one (bs=bs@entry=0x7f1bda0f2000) at block.c:2055
#5 0x00007f1bd8c13244 in bdrv_drain_all () at block.c:2115
#6 0x00007f1bd8a2c5e3 in vm_stop (state=<optimized out>) at /usr/src/debug/qemu-2.3.0/cpus.c:685
#7 0x00007f1bd8a2c636 in vm_stop_force_state (state=<optimized out>) at /usr/src/debug/qemu-2.3.0/cpus.c:1383
#8 0x00007f1bd8bc798f in migration_completion (start_time=<synthetic pointer>, old_vm_running=<synthetic pointer>, current_active_state=<optimized out>, s=0x7f1bd90e3c20 <current_migration.37255>)
at migration/migration.c:1213
#9 migration_thread (opaque=0x7f1bd90e3c20 <current_migration.37255>) at migration/migration.c:1314
#10 0x00007f1bd21e3dc5 in start_thread () from /lib64/libpthread.so.0
#11 0x00007f1bd1f10ced in clone () from /lib64/libc.so.6
The problem was narrowed down to the commit
commit 3ff2f67a7c24183fcbcfe1332e5223ac6f96438c
Author: Evgeny Yakovlev <eyakovlev@virtuozzo.com>
Date: Mon Jul 18 22:39:52 2016 +0300
block: ignore flush requests when storage is clean
This series contains fixes for the situation. The probability of hitting
the problem is not that high: our regression testing hits it roughly once
a week or less.
Signed-off-by: Evgeny Yakovlev <eyakovlev@virtuozzo.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Stefan Hajnoczi <stefanha@redhat.com>
CC: Fam Zheng <famz@redhat.com>
CC: Kevin Wolf <kwolf@redhat.com>
CC: Max Reitz <mreitz@redhat.com>
* [Qemu-devel] [PATCH 1/2] block: fix deadlock in bdrv_co_flush
2016-08-17 18:06 [Qemu-devel] [PATCH for 2.7 0/2] block: fixes for deadlock in flush code Denis V. Lunev
@ 2016-08-17 18:06 ` Denis V. Lunev
2016-08-17 18:06 ` [Qemu-devel] [PATCH 2/2] block: fix possible reorder of flush operations Denis V. Lunev
2016-08-18 12:40 ` [Qemu-devel] [PATCH for 2.7 0/2] block: fixes for deadlock in flush code Stefan Hajnoczi
2 siblings, 0 replies; 4+ messages in thread
From: Denis V. Lunev @ 2016-08-17 18:06 UTC (permalink / raw)
To: qemu-block, qemu-devel
Cc: den, Evgeny Yakovlev, Stefan Hajnoczi, Fam Zheng, Kevin Wolf,
Max Reitz
From: Evgeny Yakovlev <eyakovlev@virtuozzo.com>
The following commit
commit 3ff2f67a7c24183fcbcfe1332e5223ac6f96438c
Author: Evgeny Yakovlev <eyakovlev@virtuozzo.com>
Date: Mon Jul 18 22:39:52 2016 +0300
block: ignore flush requests when storage is clean
has introduced a regression.
The problem is that it is still possible for two flush requests to execute
in a non-sequential fashion, and sometimes this results in a deadlock
when bdrv_drain_one/all is called for a BDS with such stalled requests.
1. Initially flushed_gen and flush_started_gen are both 1.
2. Request 1 enters bdrv_co_flush with write_gen 1 (i.e. the same
as flushed_gen). It gets past the flushed_gen != flush_started_gen check
and sets flush_started_gen to 1 (again, the same value as before).
3. Request 1 yields somewhere before exiting bdrv_co_flush.
4. Request 2 enters bdrv_co_flush with write_gen 2. It gets past the
flushed_gen != flush_started_gen check and sets flush_started_gen to 2.
5. Request 2 runs to completion and sets flushed_gen to 2.
6. Request 1 is resumed, runs to completion and sets flushed_gen to 1.
However, flush_started_gen is now 2.
From here on out flushed_gen always differs from flush_started_gen, and
all further flush requests wait forever on flush_queue. This change replaces
flush_started_gen with an explicitly tracked active flush request.
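The interleaving above can be replayed in a small standalone simulation. This is a hypothetical, simplified model: the field names mirror the QEMU ones, but none of this is QEMU code, and the coroutine yield/queue machinery is reduced to plain function calls.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified model of the pre-patch generation tracking in bdrv_co_flush(). */
typedef struct {
    unsigned int flush_started_gen; /* generation a flush was started for */
    unsigned int flushed_gen;       /* generation that finished flushing */
} FlushState;

/* A flush request enters: in the buggy scheme it proceeds whenever
 * flush_started_gen == flushed_gen. Returns false if it would wait
 * on flush_queue. */
static bool flush_enter(FlushState *s, unsigned int current_gen)
{
    if (s->flush_started_gen != s->flushed_gen) {
        return false; /* would wait on flush_queue */
    }
    s->flush_started_gen = current_gen;
    return true;
}

static void flush_complete(FlushState *s, unsigned int snapshotted_gen)
{
    s->flushed_gen = snapshotted_gen;
}

/* Replay the six-step interleaving from the commit message; returns true
 * if the state ends up permanently "flush in flight" (the deadlock). */
static bool replay_race(void)
{
    FlushState s = { .flush_started_gen = 1, .flushed_gen = 1 };

    flush_enter(&s, 1);    /* step 2: request 1 enters with write_gen 1 */
                           /* step 3: request 1 yields; a write bumps write_gen */
    flush_enter(&s, 2);    /* step 4: request 2 slips past the check */
    flush_complete(&s, 2); /* step 5: request 2 finishes, flushed_gen = 2 */
    flush_complete(&s, 1); /* step 6: request 1 finishes, flushed_gen = 1 */

    /* Every later flush now sees started != flushed and waits forever. */
    return !flush_enter(&s, 2);
}

/* The patch's scheme: track the in-flight flush with an explicit pointer,
 * so overlapping entry is detected regardless of generation values. */
static bool flush_enter_fixed(void **active_flush_req, void *req)
{
    if (*active_flush_req != NULL) {
        return false; /* would wait on flush_queue */
    }
    *active_flush_req = req;
    return true;
}

static bool fixed_scheme_blocks_overlap(void)
{
    void *active = NULL;
    int req1, req2;
    return flush_enter_fixed(&active, &req1) &&  /* request 1 in flight */
           !flush_enter_fixed(&active, &req2);   /* request 2 must wait */
}
```

The pointer comparison cannot be fooled by generation numbers wrapping back to an earlier value, which is exactly what happens at step 6 above.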
Signed-off-by: Evgeny Yakovlev <eyakovlev@virtuozzo.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Stefan Hajnoczi <stefanha@redhat.com>
CC: Fam Zheng <famz@redhat.com>
CC: Kevin Wolf <kwolf@redhat.com>
CC: Max Reitz <mreitz@redhat.com>
---
block/io.c | 5 +++--
include/block/block_int.h | 2 +-
2 files changed, 4 insertions(+), 3 deletions(-)
diff --git a/block/io.c b/block/io.c
index d5493ba..9c04086 100644
--- a/block/io.c
+++ b/block/io.c
@@ -2283,11 +2283,11 @@ int coroutine_fn bdrv_co_flush(BlockDriverState *bs)
int current_gen = bs->write_gen;
/* Wait until any previous flushes are completed */
- while (bs->flush_started_gen != bs->flushed_gen) {
+ while (bs->active_flush_req != NULL) {
qemu_co_queue_wait(&bs->flush_queue);
}
- bs->flush_started_gen = current_gen;
+ bs->active_flush_req = &req;
/* Write back all layers by calling one driver function */
if (bs->drv->bdrv_co_flush) {
@@ -2357,6 +2357,7 @@ flush_parent:
out:
/* Notify any pending flushes that we have completed */
bs->flushed_gen = current_gen;
+ bs->active_flush_req = NULL;
qemu_co_queue_restart_all(&bs->flush_queue);
tracked_request_end(&req);
diff --git a/include/block/block_int.h b/include/block/block_int.h
index 47665be..1e939de 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -443,8 +443,8 @@ struct BlockDriverState {
note this is a reference count */
CoQueue flush_queue; /* Serializing flush queue */
+ BdrvTrackedRequest *active_flush_req; /* Flush request in flight */
unsigned int write_gen; /* Current data generation */
- unsigned int flush_started_gen; /* Generation for which flush has started */
unsigned int flushed_gen; /* Flushed write generation */
BlockDriver *drv; /* NULL means no media */
--
2.7.4
* [Qemu-devel] [PATCH 2/2] block: fix possible reorder of flush operations
2016-08-17 18:06 [Qemu-devel] [PATCH for 2.7 0/2] block: fixes for deadlock in flush code Denis V. Lunev
2016-08-17 18:06 ` [Qemu-devel] [PATCH 1/2] block: fix deadlock in bdrv_co_flush Denis V. Lunev
@ 2016-08-17 18:06 ` Denis V. Lunev
2016-08-18 12:40 ` [Qemu-devel] [PATCH for 2.7 0/2] block: fixes for deadlock in flush code Stefan Hajnoczi
2 siblings, 0 replies; 4+ messages in thread
From: Denis V. Lunev @ 2016-08-17 18:06 UTC (permalink / raw)
To: qemu-block, qemu-devel
Cc: den, Stefan Hajnoczi, Fam Zheng, Kevin Wolf, Max Reitz
This patch reduces the CPU usage of flush operations a bit. When one
flush completes we should kick only the next operation; we should not
wake all pending operations in the hope that they will go back to
waiting on the queue.
There is also a technical possibility that requests get reordered with
the previous approach: on wakeup, all requests are removed from the
wait queue, become active, and are processed one by one, re-adding
themselves to the wait queue in the same order. However, a new flush
can arrive before all of them have been re-queued.
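The reordering hazard can be sketched with a toy FIFO standing in for the coroutine wait queue. This is a simplified model under assumed semantics, not the actual CoQueue implementation; the function names here are my own.

```c
#include <assert.h>
#include <string.h>

#define QMAX 16

/* Toy FIFO of single-character "requests". */
typedef struct {
    char elems[QMAX];
    int len;
} Queue;

static void q_push(Queue *q, char c)
{
    q->elems[q->len++] = c;
}

static char q_pop(Queue *q)
{
    char head = q->elems[0];
    memmove(q->elems, q->elems + 1, --q->len);
    return head;
}

/* Model of the restart-all policy: every waiter is woken at once and,
 * finding another flush still active, re-queues itself in wake order.
 * A newcomer arriving after 'arrive_after' re-queues slots in between,
 * overtaking waiters that were queued before it. */
static void restart_all_with_arrival(Queue *q, int arrive_after, char newcomer)
{
    Queue woken = *q;
    q->len = 0;
    for (int i = 0; i < woken.len; i++) {
        if (i == arrive_after) {
            q_push(q, newcomer);   /* new flush arrives mid-requeue */
        }
        q_push(q, woken.elems[i]); /* woken waiter re-queues */
    }
}

/* Model of the kick-one policy: only the head proceeds; everyone else
 * keeps their slot, so a newcomer always lands at the back. */
static char kick_next_with_arrival(Queue *q, char newcomer)
{
    char head = q_pop(q);
    q_push(q, newcomer);
    return head;
}
```

With waiters A, B, C and a newcomer D arriving mid-requeue, the restart-all model ends up with D ahead of B and C, while the kick-one model keeps strict FIFO order.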
Signed-off-by: Denis V. Lunev <den@openvz.org>
Tested-by: Evgeny Yakovlev <eyakovlev@virtuozzo.com>
CC: Stefan Hajnoczi <stefanha@redhat.com>
CC: Fam Zheng <famz@redhat.com>
CC: Kevin Wolf <kwolf@redhat.com>
CC: Max Reitz <mreitz@redhat.com>
---
block/io.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/block/io.c b/block/io.c
index 9c04086..420944d 100644
--- a/block/io.c
+++ b/block/io.c
@@ -2358,7 +2358,8 @@ out:
/* Notify any pending flushes that we have completed */
bs->flushed_gen = current_gen;
bs->active_flush_req = NULL;
- qemu_co_queue_restart_all(&bs->flush_queue);
+ /* Return value is ignored - it's ok if wait queue is empty */
+ qemu_co_queue_next(&bs->flush_queue);
tracked_request_end(&req);
return ret;
--
2.7.4
* Re: [Qemu-devel] [PATCH for 2.7 0/2] block: fixes for deadlock in flush code
2016-08-17 18:06 [Qemu-devel] [PATCH for 2.7 0/2] block: fixes for deadlock in flush code Denis V. Lunev
2016-08-17 18:06 ` [Qemu-devel] [PATCH 1/2] block: fix deadlock in bdrv_co_flush Denis V. Lunev
2016-08-17 18:06 ` [Qemu-devel] [PATCH 2/2] block: fix possible reorder of flush operations Denis V. Lunev
@ 2016-08-18 12:40 ` Stefan Hajnoczi
2 siblings, 0 replies; 4+ messages in thread
From: Stefan Hajnoczi @ 2016-08-18 12:40 UTC (permalink / raw)
To: Denis V. Lunev
Cc: qemu-block, qemu-devel, Kevin Wolf, Fam Zheng, Evgeny Yakovlev,
Max Reitz, Stefan Hajnoczi
On Wed, Aug 17, 2016 at 09:06:52PM +0300, Denis V. Lunev wrote:
> We have suffered from the following deadlock
>
> Thread 2 (Thread 0x7f1b7edf9700 (LWP 240293)):
> #0 0x00007f1bd1f0675f in ppoll () from /lib64/libc.so.6
> #1 0x00007f1bd8c1d78b in ppoll (__ss=0x0, __timeout=0x0, __nfds=<optimized out>, __fds=<optimized out>) at /usr/include/bits/poll2.h:77
> #2 qemu_poll_ns (fds=<optimized out>, nfds=<optimized out>, timeout=timeout@entry=-1) at qemu-timer.c:310
> #3 0x00007f1bd8c1e8bf in aio_poll (ctx=0x7f1bda091780, blocking=blocking@entry=true) at aio-posix.c:451
> #4 0x00007f1bd8c119cf in bdrv_drain_one (bs=bs@entry=0x7f1bda0f2000) at block.c:2055
> #5 0x00007f1bd8c13244 in bdrv_drain_all () at block.c:2115
> #6 0x00007f1bd8a2c5e3 in vm_stop (state=<optimized out>) at /usr/src/debug/qemu-2.3.0/cpus.c:685
> #7 0x00007f1bd8a2c636 in vm_stop_force_state (state=<optimized out>) at /usr/src/debug/qemu-2.3.0/cpus.c:1383
> #8 0x00007f1bd8bc798f in migration_completion (start_time=<synthetic pointer>, old_vm_running=<synthetic pointer>, current_active_state=<optimized out>, s=0x7f1bd90e3c20 <current_migration.37255>)
> at migration/migration.c:1213
> #9 migration_thread (opaque=0x7f1bd90e3c20 <current_migration.37255>) at migration/migration.c:1314
> #10 0x00007f1bd21e3dc5 in start_thread () from /lib64/libpthread.so.0
> #11 0x00007f1bd1f10ced in clone () from /lib64/libc.so.6
>
> The problem was narrowed down to the commit
> commit 3ff2f67a7c24183fcbcfe1332e5223ac6f96438c
> Author: Evgeny Yakovlev <eyakovlev@virtuozzo.com>
> Date: Mon Jul 18 22:39:52 2016 +0300
> block: ignore flush requests when storage is clean
>
> This series contains fixes for the situation. The probability of hitting
> the problem is not that high: our regression testing hits it roughly once
> a week or less.
>
> Signed-off-by: Evgeny Yakovlev <eyakovlev@virtuozzo.com>
> Signed-off-by: Denis V. Lunev <den@openvz.org>
> CC: Stefan Hajnoczi <stefanha@redhat.com>
> CC: Fam Zheng <famz@redhat.com>
> CC: Kevin Wolf <kwolf@redhat.com>
> CC: Max Reitz <mreitz@redhat.com>
>
>
Thanks, applied to my block tree:
https://github.com/stefanha/qemu/commits/block
Stefan