From: Kevin Wolf
Date: Tue, 04 May 2010 16:01:35 +0200
To: Peter Lieven
Cc: qemu-devel@nongnu.org, kvm@vger.kernel.org, Christoph Hellwig
Subject: Re: [Qemu-devel] Qemu-KVM 0.12.3 and Multipath -> Assertion
Message-ID: <4BE028BF.1000603@redhat.com>
In-Reply-To: <4BE02440.6010802@dlh.net>

On 04.05.2010 15:42, Peter Lieven wrote:
> Hi Kevin,
>
> you did it *g*
>
> Looks promising. I applied the patch and have not been able to reproduce
> the crash yet :-)
>
> A reliable way to reproduce it was to shut down all multipath paths and
> then initiate I/O in the VM (e.g. start an application). Of course,
> everything hangs at that point.
>
> After re-enabling one path, the VM used to crash. Now it seems to behave
> correctly: it just reports a DMA timeout and continues normally afterwards.

Great, I'm going to submit it as a proper patch then.

Christoph, by now I'm pretty sure it's right, but can you have another look
to confirm it's correct, anyway?

> Can you think of any way to prevent the VM from consuming 100% CPU in
> that waiting state?
> My current approach is to run all VMs with nice 1, which helped to keep
> the machine responsive when all VMs (in my test case 64 on one box) have
> hanging I/O at the same time.

I don't have anything particular in mind, but you could just attach gdb and
get another backtrace while it consumes 100% CPU (you'll need to use
"thread apply all bt" to catch everything). Then we should see where it's
hanging.

Kevin
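
P.S.: In case it helps, the gdb session would look roughly like this
(assuming gdb is available on the host and you substitute the PID of the
hung qemu-kvm process; adjust to taste):

    $ gdb -p <pid of the hung qemu-kvm process>
    (gdb) thread apply all bt
    (gdb) detach
    (gdb) quit

"thread apply all bt" prints a backtrace for every thread, so we should be
able to see which one is spinning at 100% CPU. "detach" lets the VM continue
running afterwards.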