From: Igor Redko
Message-ID: <560BF175.4030503@virtuozzo.com>
Date: Wed, 30 Sep 2015 17:28:05 +0300
Subject: Re: [Qemu-devel] [PATCH v2 0/2] migration: fix deadlock
To: "Dr. David Alan Gilbert"
Cc: Juan Quintela, Anna Melekhova, qemu-devel@nongnu.org, "Denis V. Lunev", Amit Shah, Paolo Bonzini

On 29.09.2015 11:47, Dr. David Alan Gilbert wrote:
> * Igor Redko (redkoi@virtuozzo.com) wrote:
>> On Fri, 2015-09-25 at 17:46 +0800, Wen Congyang wrote:
>>> On 09/25/2015 05:09 PM, Denis V. Lunev wrote:
>>>> Release the QEMU global mutex (QGM) before calling synchronize_rcu().
>>>> synchronize_rcu() waits for all readers to finish their critical
>>>> sections. There is at least one critical section in which we try
>>>> to take the QGM (the critical section is in address_space_rw(), and
>>>> prepare_mmio_access() tries to acquire the QGM).
>>>>
>>>> Both functions (migration_end() and migration_bitmap_extend())
>>>> are called from the main thread, which holds the QGM.
>>>>
>>>> Thus there is a race condition that ends in a deadlock:
>>>>
>>>> main thread                 working thread
>>>> Lock QGM                    |
>>>> |                           Call KVM_EXIT_IO handler
>>>> |                           |
>>>> |                           Open rcu reader's critical section
>>>> Migration cleanup bh        |
>>>> |                           |
>>>> synchronize_rcu() is        |
>>>> waiting for readers         |
>>>> |                           prepare_mmio_access() is waiting for QGM
>>>>  \                         /
>>>>             deadlock
>>>>
>>>> The patches here are quick and dirty, compile-tested only to validate
>>>> the architectural approach.
>>>>
>>>> Igor, Anna, can you please run your tests with these patches instead of
>>>> your original one? Thank you.
>>>
>>> Can you give me the backtrace of the working thread?
>>>
>>> I think it is very bad to wait for a lock inside an rcu reader's critical section.
>>
>> #0 __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
>> #1 0x00007f1ef113ccfd in __GI___pthread_mutex_lock (mutex=0x7f1ef4145ce0 ) at ../nptl/pthread_mutex_lock.c:80
>> #2 0x00007f1ef3c36546 in qemu_mutex_lock (mutex=0x7f1ef4145ce0 ) at util/qemu-thread-posix.c:73
>> #3 0x00007f1ef387ff46 in qemu_mutex_lock_iothread () at /home/user/my_qemu/qemu/cpus.c:1170
>> #4 0x00007f1ef38514a2 in prepare_mmio_access (mr=0x7f1ef612f200) at /home/user/my_qemu/qemu/exec.c:2390
>> #5 0x00007f1ef385157e in address_space_rw (as=0x7f1ef40ec940 , addr=49402, attrs=..., buf=0x7f1ef3f97000 "\001", len=1, is_write=true)
>>     at /home/user/my_qemu/qemu/exec.c:2425
>> #6 0x00007f1ef3897c53 in kvm_handle_io (port=49402, attrs=..., data=0x7f1ef3f97000, direction=1, size=1, count=1) at /home/user/my_qemu/qemu/kvm-all.c:1680
>> #7 0x00007f1ef3898144 in kvm_cpu_exec (cpu=0x7f1ef5010fc0) at /home/user/my_qemu/qemu/kvm-all.c:1849
>> #8 0x00007f1ef387fa91 in qemu_kvm_cpu_thread_fn (arg=0x7f1ef5010fc0) at /home/user/my_qemu/qemu/cpus.c:979
>> #9 0x00007f1ef113a6aa in start_thread (arg=0x7f1eef0b9700) at pthread_create.c:333
>> #10 0x00007f1ef0e6feed in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
>
> Do you have a test to run in the guest
> that easily triggers this?
>
> Dave

There are two ways to trigger this. Both need two hosts with qemu+libvirt
(host0 and host1) configured for migration.

First way:
0. Create a VM on host0 and install CentOS 7.
1. Shut down the VM.
2. Start the VM (virsh start ) and right after that start a migration to
   host1 (something like 'virsh migrate --live --verbose "qemu+ssh://host1/system"').
3. Stop the migration about 1 second in: after the migration process has
   started but before it completes, for example when you see "Migration: [ 5 %]".

Result: deadlock. The VM stops responding, and so does the QEMU monitor
(for example, 'virsh qemu-monitor-command --hmp "info migrate"' hangs
indefinitely). Reproduction rate: 9/10.

Second way:
0. Create a VM with an e1000 network card on host0 and install CentOS 7.
1. Run iperf in the VM (or any other network load).
2. Start the migration.
3. Stop the migration before it completes.

For this approach the e1000 network card is essential, because it generates
KVM_EXIT_MMIO.

Igor

>>>
>>>>
>>>> Signed-off-by: Denis V. Lunev
>>>> CC: Igor Redko
>>>> CC: Anna Melekhova
>>>> CC: Juan Quintela
>>>> CC: Amit Shah
>>>>
>>>> Denis V. Lunev (2):
>>>>   migration: bitmap_set is unnecessary as bitmap_new uses g_try_malloc0
>>>>   migration: fix deadlock
>>>>
>>>>  migration/ram.c | 45 ++++++++++++++++++++++++++++-----------------
>>>>  1 file changed, 28 insertions(+), 17 deletions(-)
>>>>
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
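The lock ordering behind the deadlock described in the quoted cover letter (the main thread holds the QGM while synchronize_rcu() waits for a vCPU reader that is itself blocked on the QGM) can be illustrated with a minimal pthreads sketch. Everything in it is hypothetical and is not QEMU code: big_lock stands in for the QEMU global mutex, and toy_read_lock()/toy_read_unlock()/toy_synchronize() are a toy reader-count scheme standing in for QEMU's RCU. A real synchronize_rcu() only waits for readers that were already inside a critical section and is implemented differently. The sketch forces the interleaving from the diagram, and migration_cleanup() drops big_lock before waiting for readers, which is the ordering the patch proposes.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-ins, not QEMU code:
 *   big_lock ~ the QEMU global mutex (QGM)
 *   toy_*()  ~ a toy RCU built from a reader count; toy_synchronize()
 *              waits until no thread is inside a read-side section. */
static pthread_mutex_t big_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t rcu_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t rcu_cond = PTHREAD_COND_INITIALIZER;
static int readers; /* threads inside a read-side section (under rcu_lock) */
static int io_done; /* emulated I/O accesses completed (under big_lock) */

static void toy_read_lock(void)
{
    pthread_mutex_lock(&rcu_lock);
    readers++;
    pthread_cond_broadcast(&rcu_cond); /* wake wait_for_reader() */
    pthread_mutex_unlock(&rcu_lock);
}

static void toy_read_unlock(void)
{
    pthread_mutex_lock(&rcu_lock);
    if (--readers == 0)
        pthread_cond_broadcast(&rcu_cond); /* wake toy_synchronize() */
    pthread_mutex_unlock(&rcu_lock);
}

/* Blocks until no thread is inside a read-side section. */
static void toy_synchronize(void)
{
    pthread_mutex_lock(&rcu_lock);
    while (readers > 0)
        pthread_cond_wait(&rcu_cond, &rcu_lock);
    pthread_mutex_unlock(&rcu_lock);
}

/* Blocks until some thread has entered a read-side section. */
static void wait_for_reader(void)
{
    pthread_mutex_lock(&rcu_lock);
    while (readers == 0)
        pthread_cond_wait(&rcu_cond, &rcu_lock);
    pthread_mutex_unlock(&rcu_lock);
}

/* "Working thread": like address_space_rw() -> prepare_mmio_access(),
 * it takes big_lock while inside its read-side section. */
static void *vcpu_thread(void *arg)
{
    (void)arg;
    toy_read_lock();
    pthread_mutex_lock(&big_lock);
    io_done++;
    pthread_mutex_unlock(&big_lock);
    toy_read_unlock();
    return NULL;
}

/* Caller holds big_lock. Dropping it before waiting lets a reader that
 * is blocked on big_lock finish; waiting with it held would deadlock. */
static void migration_cleanup(void)
{
    pthread_mutex_unlock(&big_lock);
    toy_synchronize();
    pthread_mutex_lock(&big_lock);
}

/* Forces the interleaving from the diagram `rounds` times.
 * Returns the number of completed vCPU accesses, or -1 on error. */
int run_demo(int rounds)
{
    io_done = 0;
    for (int i = 0; i < rounds; i++) {
        pthread_t vcpu;

        pthread_mutex_lock(&big_lock);        /* main thread holds the QGM */
        if (pthread_create(&vcpu, NULL, vcpu_thread, NULL) != 0) {
            pthread_mutex_unlock(&big_lock);
            return -1;
        }
        wait_for_reader();                    /* vCPU is now an RCU reader */
        migration_cleanup();                  /* grace period completes */
        assert(io_done == i + 1);             /* the reader finished its access */
        pthread_mutex_unlock(&big_lock);
        pthread_join(vcpu, NULL);
    }
    return io_done;
}
```

Built with `-pthread` and called from a main(), run_demo(3) returns 3. Moving the toy_synchronize() call in migration_cleanup() above the pthread_mutex_unlock() reproduces the hang: the main thread then waits for a reader while holding the lock that reader needs.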