Date: Tue, 29 Sep 2015 09:47:16 +0100
From: "Dr. David Alan Gilbert"
To: Igor Redko
Cc: Juan Quintela, Anna Melekhova, qemu-devel@nongnu.org, "Denis V. Lunev", Amit Shah, Paolo Bonzini
Subject: Re: [Qemu-devel] [PATCH v2 0/2] migration: fix deadlock
Message-ID: <20150929084715.GA2684@work-vm>
In-Reply-To: <1443437748.13911.2.camel@virtuozzo.com>
References: <56050489.9010306@cn.fujitsu.com> <1443172180-1005-1-git-send-email-den@openvz.org> <560517FB.9080909@cn.fujitsu.com> <1443437748.13911.2.camel@virtuozzo.com>

* Igor Redko (redkoi@virtuozzo.com) wrote:
> On Fri, 2015-09-25 at 17:46 +0800, Wen Congyang wrote:
> > On 09/25/2015 05:09 PM, Denis V. Lunev wrote:
> > > Release qemu global mutex before calling synchronize_rcu().
> > > synchronize_rcu() waits for all readers to finish their critical
> > > sections. There is at least one critical section in which we try
> > > to get QGM (critical section is in address_space_rw() and
> > > prepare_mmio_access() is trying to acquire QGM).
> > >
> > > Both functions (migration_end() and migration_bitmap_extend())
> > > are called from the main thread, which is holding QGM.
> > >
> > > Thus there is a race condition that ends up with deadlock:
> > > main thread                     working thread
> > > Lock QGM                              |
> > >      |                     Call KVM_EXIT_IO handler
> > >      |                                |
> > >      |                     Open rcu reader's critical section
> > > Migration cleanup bh                  |
> > >      |                                |
> > > synchronize_rcu() is                  |
> > > waiting for readers                   |
> > >      |                     prepare_mmio_access() is waiting for QGM
> > >       \                              /
> > >                    deadlock
> > >
> > > Patches here are quick and dirty, compile-tested only to validate the
> > > architectural approach.
> > >
> > > Igor, Anna, can you pls start your tests with these patches instead of your
> > > original one. Thank you.
> >
> > Can you give me the backtrace of the working thread?
> >
> > I think it is very bad to wait for some lock in rcu reader's critical section.
>
> #0 __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
> #1 0x00007f1ef113ccfd in __GI___pthread_mutex_lock (mutex=0x7f1ef4145ce0 ) at ../nptl/pthread_mutex_lock.c:80
> #2 0x00007f1ef3c36546 in qemu_mutex_lock (mutex=0x7f1ef4145ce0 ) at util/qemu-thread-posix.c:73
> #3 0x00007f1ef387ff46 in qemu_mutex_lock_iothread () at /home/user/my_qemu/qemu/cpus.c:1170
> #4 0x00007f1ef38514a2 in prepare_mmio_access (mr=0x7f1ef612f200) at /home/user/my_qemu/qemu/exec.c:2390
> #5 0x00007f1ef385157e in address_space_rw (as=0x7f1ef40ec940 , addr=49402, attrs=..., buf=0x7f1ef3f97000 "\001", len=1, is_write=true)
>     at /home/user/my_qemu/qemu/exec.c:2425
> #6 0x00007f1ef3897c53 in kvm_handle_io (port=49402, attrs=..., data=0x7f1ef3f97000, direction=1, size=1, count=1) at /home/user/my_qemu/qemu/kvm-all.c:1680
> #7 0x00007f1ef3898144 in kvm_cpu_exec (cpu=0x7f1ef5010fc0) at /home/user/my_qemu/qemu/kvm-all.c:1849
> #8 0x00007f1ef387fa91 in qemu_kvm_cpu_thread_fn (arg=0x7f1ef5010fc0) at /home/user/my_qemu/qemu/cpus.c:979
> #9 0x00007f1ef113a6aa in start_thread (arg=0x7f1eef0b9700) at pthread_create.c:333
> #10 0x00007f1ef0e6feed in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Do you have a test to run in the guest that easily triggers this?

Dave

> >
> > >
> > > Signed-off-by: Denis V. Lunev
> > > CC: Igor Redko
> > > CC: Anna Melekhova
> > > CC: Juan Quintela
> > > CC: Amit Shah
> > >
> > > Denis V. Lunev (2):
> > >   migration: bitmap_set is unnecessary as bitmap_new uses g_try_malloc0
> > >   migration: fix deadlock
> > >
> > >  migration/ram.c | 45 ++++++++++++++++++++++++++++-----------------
> > >  1 file changed, 28 insertions(+), 17 deletions(-)
> > >
> >
>
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK