From: "Denis V. Lunev"
Date: Wed, 17 Aug 2016 21:06:52 +0300
Message-Id: <1471457214-3994-1-git-send-email-den@openvz.org>
Subject: [Qemu-devel] [PATCH for 2.7 0/2] block: fixes for deadlock in flush code
To: qemu-block@nongnu.org, qemu-devel@nongnu.org
Cc: den@openvz.org, Evgeny Yakovlev, Stefan Hajnoczi, Fam Zheng, Kevin Wolf, Max Reitz

We have suffered from the following deadlock:

Thread 2 (Thread 0x7f1b7edf9700 (LWP 240293)):
#0  0x00007f1bd1f0675f in ppoll () from /lib64/libc.so.6
#1  0x00007f1bd8c1d78b in ppoll (__ss=0x0, __timeout=0x0, __nfds=, __fds=) at /usr/include/bits/poll2.h:77
#2  qemu_poll_ns (fds=, nfds=, timeout=timeout@entry=-1) at qemu-timer.c:310
#3  0x00007f1bd8c1e8bf in aio_poll (ctx=0x7f1bda091780, blocking=blocking@entry=true) at aio-posix.c:451
#4  0x00007f1bd8c119cf in bdrv_drain_one (bs=bs@entry=0x7f1bda0f2000) at block.c:2055
#5  0x00007f1bd8c13244 in bdrv_drain_all () at block.c:2115
#6  0x00007f1bd8a2c5e3 in vm_stop (state=) at /usr/src/debug/qemu-2.3.0/cpus.c:685
#7  0x00007f1bd8a2c636 in vm_stop_force_state (state=) at /usr/src/debug/qemu-2.3.0/cpus.c:1383
#8  0x00007f1bd8bc798f in migration_completion (start_time=, old_vm_running=, current_active_state=, s=0x7f1bd90e3c20 ) at migration/migration.c:1213
#9  migration_thread (opaque=0x7f1bd90e3c20 ) at migration/migration.c:1314
#10 0x00007f1bd21e3dc5 in start_thread () from /lib64/libpthread.so.0
#11 0x00007f1bd1f10ced in clone () from /lib64/libc.so.6

That is, while stopping the VM at the end of migration, the migration thread
blocks indefinitely (timeout=-1) in aio_poll() inside bdrv_drain_all().

The problem was narrowed down to the following commit:

    commit 3ff2f67a7c24183fcbcfe1332e5223ac6f96438c
    Author: Evgeny Yakovlev
    Date:   Mon Jul 18 22:39:52 2016 +0300

        block: ignore flush requests when storage is clean

This series contains fixes for the situation. The problem is not easy to
hit: our regression testing runs into it about once a week or less.

Signed-off-by: Evgeny Yakovlev
Signed-off-by: Denis V. Lunev
CC: Stefan Hajnoczi
CC: Fam Zheng
CC: Kevin Wolf
CC: Max Reitz