From: Kevin Wolf
Subject: Re: qemu-kvm hangs if multipath device is queing
Date: Tue, 18 May 2010 15:22:36 +0200
Message-ID: <4BF2949C.8010108@redhat.com>
In-Reply-To: <4BF275B1.8030106@dlh.net>
References: <4BDF3F94.1080608@dlh.net> <4BDFDC44.9030808@redhat.com> <4BE00750.6040804@dlh.net> <4BE01120.30608@redhat.com> <4BE02440.6010802@dlh.net> <4BE028BF.1000603@redhat.com> <4BEAB4B0.70803@dlh.net> <4BED1740.1080604@redhat.com> <4BF275B1.8030106@dlh.net>
To: Peter Lieven
Cc: qemu-devel@nongnu.org, kvm@vger.kernel.org, Christoph Hellwig
Sender: kvm-owner@vger.kernel.org

On 18.05.2010 13:10, Peter Lieven wrote:
> hi kevin,
>
> here is the backtrace of (hopefully) all threads:
>
> ^C
> Program received signal SIGINT, Interrupt.
> [Switching to Thread 0x7f39b72656f0 (LWP 10695)]
> 0x00007f39b6c3ea94 in __lll_lock_wait () from /lib/libpthread.so.0
>
> (gdb) thread apply all bt
>
> Thread 2 (Thread 0x7f39b57b8950 (LWP 10698)):
> #0 0x00007f39b6c3eedb in read () from /lib/libpthread.so.0
> #1 0x000000000049e723 in qemu_laio_completion_cb (opaque=0x22b4010) at linux-aio.c:125
> #2 0x000000000049e8ad in laio_cancel (blockacb=0x22ba310) at linux-aio.c:184

I think it's stuck here in an endless loop:

    while (laiocb->ret == -EINPROGRESS)
        qemu_laio_completion_cb(laiocb->ctx);

Can you verify this by single-stepping one or two loop iterations? The values of ret and errno after the read call could be interesting, too.

We'll be stuck in an endless loop if the request doesn't complete, which might well happen in your scenario. Not sure what the right thing to do is.
We probably need to fail the bdrv_aio_cancel to avoid blocking the whole program, but I have no idea what device emulations should do on that condition. As long as we can't handle that condition correctly, leaving the hang in place is probably the best option. Maybe add some sleep to avoid 100% CPU consumption.

Kevin
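
To make the two ideas above concrete (sleeping between polls, and optionally giving up instead of spinning forever), here is a rough sketch. It is not QEMU code: struct fake_aiocb, fake_poll_completions() and cancel_wait() are hypothetical stand-ins for the laiocb, qemu_laio_completion_cb() and the loop in laio_cancel(), and the poll limit is an assumption added only to show what a failing cancel might look like. What a caller should do when it returns false is exactly the open question.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical stand-in for the request state; not QEMU's real laiocb. */
struct fake_aiocb {
    int ret;              /* -EINPROGRESS until the request completes */
    int polls_until_done; /* simulation only: <= 0 means it never completes */
};

/* Hypothetical stand-in for qemu_laio_completion_cb(): reaps completions. */
static void fake_poll_completions(struct fake_aiocb *acb)
{
    if (acb->polls_until_done > 0 && --acb->polls_until_done == 0) {
        acb->ret = 0; /* request finished successfully */
    }
}

/*
 * Poll for completion, sleeping sleep_ns between polls so a stuck request
 * does not burn 100% CPU. Gives up after max_polls polls.
 * Returns true if the request completed, false if it is still in flight.
 */
static bool cancel_wait(struct fake_aiocb *acb, int max_polls, long sleep_ns)
{
    const struct timespec delay = { .tv_sec = 0, .tv_nsec = sleep_ns };

    assert(sleep_ns >= 0 && sleep_ns < 1000000000L);

    for (int i = 0; i < max_polls && acb->ret == -EINPROGRESS; i++) {
        fake_poll_completions(acb);
        if (acb->ret == -EINPROGRESS) {
            nanosleep(&delay, NULL); /* back off before polling again */
        }
    }
    return acb->ret != -EINPROGRESS;
}
```

Passing a very large max_polls approximates "leave the hang in place, but sleep"; a small one approximates failing the cancel.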