From: Kevin Wolf
Date: Tue, 04 May 2010 16:01:35 +0200
To: Peter Lieven
Cc: qemu-devel@nongnu.org, kvm@vger.kernel.org, Christoph Hellwig
Subject: Re: [Qemu-devel] Qemu-KVM 0.12.3 and Multipath -> Assertion
Message-ID: <4BE028BF.1000603@redhat.com>
In-Reply-To: <4BE02440.6010802@dlh.net>

On 04.05.2010 15:42, Peter Lieven wrote:
> Hi Kevin,
>
> you did it *g*
>
> Looks promising. I applied the patch and have not been able to reproduce
> the crash yet :-)
>
> A reliable way to reproduce it was to shut down all multipath paths and
> then initiate I/O in the VM (e.g. start an application). Of course,
> everything hangs at that point.
>
> After re-enabling one path, the VM used to crash. Now it seems to behave
> correctly: it just reports a DMA timeout and continues normally afterwards.

Great, I'm going to submit it as a proper patch then.

Christoph, by now I'm pretty sure it's right, but can you have another look
to confirm it's correct, anyway?

> Can you think of any way to prevent the VM from consuming 100% CPU in
> that waiting state?
> My current approach is to run all VMs with nice 1, which helped to keep
> the machine responsive when all VMs (in my test case 64 on one box) have
> hanging I/O at the same time.

I don't have anything particular in mind, but you could just attach gdb and
get another backtrace while it consumes 100% CPU (you'll need to use
"thread apply all bt" to catch everything). Then we should see where it's
hanging.

Kevin
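
P.S.: In case it helps, the gdb session would look roughly like this
(assuming gdb is available on the host and you substitute the PID of the
hung qemu-kvm process; adjust to taste):

    $ gdb -p <pid of the hung qemu-kvm process>
    (gdb) thread apply all bt
    (gdb) detach
    (gdb) quit

"thread apply all bt" prints a backtrace for every thread, so we should be
able to see which one is spinning at 100% CPU. "detach" lets the VM continue
running afterwards.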