From: Peter Lieven
Subject: qemu-kvm hangs if multipath device is queing (was: Re: [Qemu-devel] Qemu-KVM 0.12.3 and Multipath -> Assertion)
Date: Wed, 12 May 2010 16:01:20 +0200
Message-ID: <4BEAB4B0.70803@dlh.net>
References: <4BDF3F94.1080608@dlh.net> <4BDFDC44.9030808@redhat.com> <4BE00750.6040804@dlh.net> <4BE01120.30608@redhat.com> <4BE02440.6010802@dlh.net> <4BE028BF.1000603@redhat.com>
In-Reply-To: <4BE028BF.1000603@redhat.com>
To: Kevin Wolf
Cc: qemu-devel@nongnu.org, kvm@vger.kernel.org, Christoph Hellwig

Hi Kevin,

here we go. I created a blocking multipath device (interrupted all
paths). qemu-kvm hangs at 100% CPU, and the monitor does not respond
either.

If I restore at least one path, the VM continues.

BR,
Peter

^C
Program received signal SIGINT, Interrupt.
0x00007fd8a6aaea94 in __lll_lock_wait () from /lib/libpthread.so.0
(gdb) bt
#0  0x00007fd8a6aaea94 in __lll_lock_wait () from /lib/libpthread.so.0
#1  0x00007fd8a6aaa190 in _L_lock_102 () from /lib/libpthread.so.0
#2  0x00007fd8a6aa9a7e in pthread_mutex_lock () from /lib/libpthread.so.0
#3  0x000000000042e739 in kvm_mutex_lock () at /usr/src/qemu-kvm-0.12.4/qemu-kvm.c:2524
#4  0x000000000042e76e in qemu_mutex_lock_iothread () at /usr/src/qemu-kvm-0.12.4/qemu-kvm.c:2537
#5  0x000000000040c262 in main_loop_wait (timeout=1000) at /usr/src/qemu-kvm-0.12.4/vl.c:3995
#6  0x000000000042dcf1 in kvm_main_loop () at /usr/src/qemu-kvm-0.12.4/qemu-kvm.c:2126
#7  0x000000000040c98c in main_loop () at /usr/src/qemu-kvm-0.12.4/vl.c:4212
#8  0x000000000041054b in main (argc=30, argv=0x7fff266a77e8, envp=0x7fff266a78e0) at /usr/src/qemu-kvm-0.12.4/vl.c:6252
(gdb) bt full
#0  0x00007fd8a6aaea94 in __lll_lock_wait () from /lib/libpthread.so.0
No symbol table info available.
#1  0x00007fd8a6aaa190 in _L_lock_102 () from /lib/libpthread.so.0
No symbol table info available.
#2  0x00007fd8a6aa9a7e in pthread_mutex_lock () from /lib/libpthread.so.0
No symbol table info available.
#3  0x000000000042e739 in kvm_mutex_lock () at /usr/src/qemu-kvm-0.12.4/qemu-kvm.c:2524
No locals.
#4  0x000000000042e76e in qemu_mutex_lock_iothread () at /usr/src/qemu-kvm-0.12.4/qemu-kvm.c:2537
No locals.
#5  0x000000000040c262 in main_loop_wait (timeout=1000) at /usr/src/qemu-kvm-0.12.4/vl.c:3995
        ioh = (IOHandlerRecord *) 0x0
        rfds = {fds_bits = {1048576, 0 }}
        wfds = {fds_bits = {0 }}
        xfds = {fds_bits = {0 }}
        ret = 1
        nfds = 21
        tv = {tv_sec = 0, tv_usec = 999761}
#6  0x000000000042dcf1 in kvm_main_loop () at /usr/src/qemu-kvm-0.12.4/qemu-kvm.c:2126
        fds = {18, 19}
        mask = {__val = {268443712, 0 }}
        sigfd = 20
#7  0x000000000040c98c in main_loop () at /usr/src/qemu-kvm-0.12.4/vl.c:4212
        r = 0
#8  0x000000000041054b in main (argc=30, argv=0x7fff266a77e8, envp=0x7fff266a78e0) at /usr/src/qemu-kvm-0.12.4/vl.c:6252
        gdbstub_dev = 0x0
        boot_devices_bitmap = 12
        i = 0
        snapshot = 0
        linux_boot = 0
        initrd_filename = 0x0
        kernel_filename = 0x0
        kernel_cmdline = 0x588fac ""
        boot_devices = "dc", '\0'
        ds = (DisplayState *) 0x198bf00
        dcl = (DisplayChangeListener *) 0x0
        cyls = 0
        heads = 0
        secs = 0
        translation = 0
        hda_opts = (QemuOpts *) 0x0
        opts = (QemuOpts *) 0x1957390
        optind = 30
        r = 0x7fff266a8a23 "-usbdevice"
        optarg = 0x7fff266a8a2e "tablet"
        loadvm = 0x0
        machine = (QEMUMachine *) 0x861720
        cpu_model = 0x7fff266a8917 "qemu64,model_id=Intel(R) Xeon(R) CPU", ' ' , "E5520 @ 2.27GHz"
        fds = {644511720, 32767}
        tb_size = 0
        pid_file = 0x7fff266a89bb "/var/run/qemu/vm-150.pid"
        incoming = 0x0
        fd = 0
        pwd = (struct passwd *) 0x0
        chroot_dir = 0x0
        run_as = 0x0
        env = (struct CPUX86State *) 0x0
        show_vnc_port = 0
        params = {0x58cc76 "order", 0x58cc7c "once", 0x58cc81 "menu", 0x0}

Kevin Wolf wrote:
> On 04.05.2010 15:42, Peter Lieven wrote:
>
>> hi kevin,
>>
>> you did it *g*
>>
>> looks promising. applied this patch and was not able to reproduce yet :-)
>>
>> a reliable way to reproduce was to shut down all multipath paths, then
>> initiate i/o in the vm (e.g. start an application). of course,
>> everything hangs at this point.
>>
>> after re-enabling one path, the vm crashed. now it seems to behave
>> correctly: it just reports a DMA timeout and continues normally
>> afterwards.
>>
>
> Great, I'm going to submit it as a proper patch then.
>
> Christoph, by now I'm pretty sure it's right, but can you have another
> look if this is correct, anyway?
>
>
>> can you think of any way to prevent the vm from consuming 100% cpu in
>> that waiting state?
>> my current approach is to run all vms with nice 1, which helped to keep
>> the machine responsive if all vms (in my test case 64 on a box) have
>> hanging i/o at the same time.
>>
>
> I don't have anything particular in mind, but you could just attach gdb
> and get another backtrace while it consumes 100% CPU (you'll need to use
> "thread apply all bt" to catch everything). Then we should see where
> it's hanging.
>
> Kevin
>

--
Kind regards

Peter Lieven

KAMP Netzwerkdienste GmbH
Vestische Str. 89-91 | 46117 Oberhausen
Tel: +49 (0) 208.89 402-50 | Fax: +49 (0) 208.89 402-40
mailto:pl@kamp.de | http://www.kamp.de

Managing Directors: Heiner Lante | Michael Lante
Amtsgericht Duisburg | HRB No. 12154
VAT ID: DE 120607556
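
The backtrace in this mail covers only the thread gdb happened to stop in, while the quoted advice asks for "thread apply all bt". A minimal sketch of collecting that non-interactively follows. It assumes gdb is installed and that the caller may attach to the process; the helper names (build_gdb_command, read_pid, dump_all_threads) are hypothetical and not part of qemu-kvm. The gdb options used (-p, -batch, -ex) are standard gdb command-line options.

```python
import subprocess
from pathlib import Path
from typing import List


def build_gdb_command(pid: int) -> List[str]:
    """Build a gdb argv that attaches to pid, prints a backtrace
    for every thread, and then exits (which detaches again)."""
    return ["gdb", "-p", str(pid), "-batch", "-ex", "thread apply all bt"]


def read_pid(pid_file: str) -> int:
    """Read a process id from a pid file such as the one qemu writes."""
    return int(Path(pid_file).read_text().strip())


def dump_all_threads(pid: int) -> str:
    """Run gdb against pid and return whatever it printed on stdout.

    Assumption: gdb is on PATH. A non-zero exit status is not treated
    as an error here, because partial output is still useful.
    """
    result = subprocess.run(
        build_gdb_command(pid),
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout
```

For the VM above, calling dump_all_threads(read_pid("/var/run/qemu/vm-150.pid")) while the process sits at 100% CPU should show what the other threads are doing, including whichever one holds the mutex that the main loop is waiting for in qemu_mutex_lock_iothread().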