From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from [140.186.70.92] (port=57156 helo=eggs.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.43) id 1OCCVF-00012P-97
	for qemu-devel@nongnu.org; Wed, 12 May 2010 10:01:38 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.69)
	(envelope-from ) id 1OCCVB-0003pG-L6
	for qemu-devel@nongnu.org; Wed, 12 May 2010 10:01:37 -0400
Received: from zion.dlh.net ([91.198.192.1]:32960 helo=mail.dlh.net)
	by eggs.gnu.org with esmtp (Exim 4.69)
	(envelope-from ) id 1OCCVB-0003oy-AG
	for qemu-devel@nongnu.org; Wed, 12 May 2010 10:01:33 -0400
Message-ID: <4BEAB4B0.70803@dlh.net>
Date: Wed, 12 May 2010 16:01:20 +0200
From: Peter Lieven
MIME-Version: 1.0
Subject: qemu-kvm hangs if multipath device is queing (was: Re: [Qemu-devel] Qemu-KVM 0.12.3 and Multipath -> Assertion)
References: <4BDF3F94.1080608@dlh.net> <4BDFDC44.9030808@redhat.com> <4BE00750.6040804@dlh.net> <4BE01120.30608@redhat.com> <4BE02440.6010802@dlh.net> <4BE028BF.1000603@redhat.com>
In-Reply-To: <4BE028BF.1000603@redhat.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 8bit
List-Id: qemu-devel.nongnu.org
To: Kevin Wolf
Cc: qemu-devel@nongnu.org, kvm@vger.kernel.org, Christoph Hellwig

Hi Kevin,

here we go. I created a blocking multipath device (interrupted all paths).
qemu-kvm hangs at 100% CPU, and the monitor does not respond either.

If I restore at least one path, the VM continues.

BR,
Peter

^C
Program received signal SIGINT, Interrupt.
0x00007fd8a6aaea94 in __lll_lock_wait () from /lib/libpthread.so.0
(gdb) bt
#0  0x00007fd8a6aaea94 in __lll_lock_wait () from /lib/libpthread.so.0
#1  0x00007fd8a6aaa190 in _L_lock_102 () from /lib/libpthread.so.0
#2  0x00007fd8a6aa9a7e in pthread_mutex_lock () from /lib/libpthread.so.0
#3  0x000000000042e739 in kvm_mutex_lock () at /usr/src/qemu-kvm-0.12.4/qemu-kvm.c:2524
#4  0x000000000042e76e in qemu_mutex_lock_iothread () at /usr/src/qemu-kvm-0.12.4/qemu-kvm.c:2537
#5  0x000000000040c262 in main_loop_wait (timeout=1000) at /usr/src/qemu-kvm-0.12.4/vl.c:3995
#6  0x000000000042dcf1 in kvm_main_loop () at /usr/src/qemu-kvm-0.12.4/qemu-kvm.c:2126
#7  0x000000000040c98c in main_loop () at /usr/src/qemu-kvm-0.12.4/vl.c:4212
#8  0x000000000041054b in main (argc=30, argv=0x7fff266a77e8, envp=0x7fff266a78e0) at /usr/src/qemu-kvm-0.12.4/vl.c:6252
(gdb) bt full
#0  0x00007fd8a6aaea94 in __lll_lock_wait () from /lib/libpthread.so.0
No symbol table info available.
#1  0x00007fd8a6aaa190 in _L_lock_102 () from /lib/libpthread.so.0
No symbol table info available.
#2  0x00007fd8a6aa9a7e in pthread_mutex_lock () from /lib/libpthread.so.0
No symbol table info available.
#3  0x000000000042e739 in kvm_mutex_lock () at /usr/src/qemu-kvm-0.12.4/qemu-kvm.c:2524
No locals.
#4  0x000000000042e76e in qemu_mutex_lock_iothread () at /usr/src/qemu-kvm-0.12.4/qemu-kvm.c:2537
No locals.
#5  0x000000000040c262 in main_loop_wait (timeout=1000) at /usr/src/qemu-kvm-0.12.4/vl.c:3995
        ioh = (IOHandlerRecord *) 0x0
        rfds = {fds_bits = {1048576, 0 }}
        wfds = {fds_bits = {0 }}
        xfds = {fds_bits = {0 }}
        ret = 1
        nfds = 21
        tv = {tv_sec = 0, tv_usec = 999761}
#6  0x000000000042dcf1 in kvm_main_loop () at /usr/src/qemu-kvm-0.12.4/qemu-kvm.c:2126
        fds = {18, 19}
        mask = {__val = {268443712, 0 }}
        sigfd = 20
#7  0x000000000040c98c in main_loop () at /usr/src/qemu-kvm-0.12.4/vl.c:4212
        r = 0
#8  0x000000000041054b in main (argc=30, argv=0x7fff266a77e8, envp=0x7fff266a78e0) at /usr/src/qemu-kvm-0.12.4/vl.c:6252
        gdbstub_dev = 0x0
        boot_devices_bitmap = 12
        i = 0
        snapshot = 0
        linux_boot = 0
        initrd_filename = 0x0
        kernel_filename = 0x0
        kernel_cmdline = 0x588fac ""
        boot_devices = "dc", '\0'
        ds = (DisplayState *) 0x198bf00
        dcl = (DisplayChangeListener *) 0x0
        cyls = 0
        heads = 0
        secs = 0
        translation = 0
        hda_opts = (QemuOpts *) 0x0
        opts = (QemuOpts *) 0x1957390
        optind = 30
        r = 0x7fff266a8a23 "-usbdevice"
        optarg = 0x7fff266a8a2e "tablet"
        loadvm = 0x0
        machine = (QEMUMachine *) 0x861720
        cpu_model = 0x7fff266a8917 "qemu64,model_id=Intel(R) Xeon(R) CPU", ' ' , "E5520 @ 2.27GHz"
        fds = {644511720, 32767}
        tb_size = 0
        pid_file = 0x7fff266a89bb "/var/run/qemu/vm-150.pid"
        incoming = 0x0
        fd = 0
        pwd = (struct passwd *) 0x0
        chroot_dir = 0x0
        run_as = 0x0
        env = (struct CPUX86State *) 0x0
        show_vnc_port = 0
        params = {0x58cc76 "order", 0x58cc7c "once", 0x58cc81 "menu", 0x0}

Kevin Wolf wrote:
> On 04.05.2010 15:42, Peter Lieven wrote:
>
>> hi kevin,
>>
>> you did it *g*
>>
>> looks promising. applied this patch and was not able to reproduce yet :-)
>>
>> a reliable way to reproduce was to shut down all multipath paths, then
>> initiate i/o in the vm (e.g. start an application). of course,
>> everything hangs at this point.
>>
>> after reenabling one path, the vm crashed. now it seems to behave
>> correctly: it just reports a DMA timeout and continues normally afterwards.
>>
>
> Great, I'm going to submit it as a proper patch then.
>
> Christoph, by now I'm pretty sure it's right, but can you have another
> look if this is correct, anyway?
>
>
>> can you think of any way to prevent the vm from consuming 100% cpu in
>> that waiting state?
>> my current approach is to run all vms with nice 1, which helped to keep the
>> machine responsive if all vms (in my test case 64 on a box) have hanging
>> i/o at the same time.
>>
>
> I don't have anything particular in mind, but you could just attach gdb
> and get another backtrace while it consumes 100% CPU (you'll need to use
> "thread apply all bt" to catch everything). Then we should see where
> it's hanging.
>
> Kevin
>
>
>

-- 
Kind regards

Peter Lieven

..........................................................................................................

KAMP Netzwerkdienste GmbH
Vestische Str. 89-91 | 46117 Oberhausen
Tel: +49 (0) 208.89 402-50 | Fax: +49 (0) 208.89 402-40
mailto:pl@kamp.de | http://www.kamp.de

Managing Directors: Heiner Lante | Michael Lante
Amtsgericht Duisburg | HRB No. 12154
VAT ID No.: DE 120607556

.........................................................................................................
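
Note on collecting the trace: the gdb session in this message uses `bt` and `bt full`, which print only the currently selected thread, while the quoted advice asks for `thread apply all bt`. A minimal sketch of capturing every thread non-interactively might look like the following. The helper names (`build_gdb_command`, `read_pid`, `dump_all_threads`) are hypothetical, and the PID file path in the usage note is simply the `pid_file` value visible in the backtrace; it assumes gdb is installed and the caller is allowed to attach to the process.

```python
import subprocess
from pathlib import Path


def build_gdb_command(pid: int) -> list[str]:
    """Build a gdb argv that attaches to `pid`, prints all thread stacks, then exits."""
    return [
        "gdb",
        "-p", str(pid),                 # attach to the running process
        "-batch",                       # run the commands below, then exit (detaching)
        "-ex", "thread apply all bt",   # backtrace of every thread, not just the current one
    ]


def read_pid(pid_file: str) -> int:
    """Read a PID from a pid file such as the one passed to qemu via -pidfile."""
    return int(Path(pid_file).read_text().strip())


def dump_all_threads(pid_file: str) -> str:
    """Return gdb's output for all threads of the process named in `pid_file`."""
    pid = read_pid(pid_file)
    result = subprocess.run(
        build_gdb_command(pid),
        capture_output=True,
        text=True,
        check=False,  # gdb may exit non-zero if attaching is refused; return what it printed
    )
    return result.stdout
```

Calling `dump_all_threads("/var/run/qemu/vm-150.pid")` while the VM is spinning would return the stacks of all threads in one string, which makes it easier to spot the thread that is actually consuming CPU rather than the one waiting on the mutex.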