From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from mailman by lists.gnu.org with tmda-scanned (Exim 4.43)
	id 1O9IP1-0006mg-ML for qemu-devel@nongnu.org; Tue, 04 May 2010 09:43:11 -0400
Received: from [140.186.70.92] (port=44882 helo=eggs.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.43) id 1O9IP0-0006m7-EJ
	for qemu-devel@nongnu.org; Tue, 04 May 2010 09:43:11 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.69)
	(envelope-from ) id 1O9IOy-0002Nw-2t
	for qemu-devel@nongnu.org; Tue, 04 May 2010 09:43:10 -0400
Received: from zion.dlh.net ([91.198.192.1]:57831 helo=mail.dlh.net)
	by eggs.gnu.org with esmtp (Exim 4.69) (envelope-from )
	id 1O9IOx-0002M4-Or for qemu-devel@nongnu.org; Tue, 04 May 2010 09:43:08 -0400
Message-ID: <4BE02440.6010802@dlh.net>
Date: Tue, 04 May 2010 15:42:24 +0200
From: Peter Lieven
MIME-Version: 1.0
Subject: Re: [Qemu-devel] Qemu-KVM 0.12.3 and Multipath -> Assertion
References: <4BDF3F94.1080608@dlh.net> <4BDFDC44.9030808@redhat.com> <4BE00750.6040804@dlh.net> <4BE01120.30608@redhat.com>
In-Reply-To: <4BE01120.30608@redhat.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
List-Id: qemu-devel.nongnu.org
To: Kevin Wolf
Cc: qemu-devel@nongnu.org, kvm@vger.kernel.org, Christoph Hellwig

hi kevin,

you did it *g* looks promising. i applied this patch and have not been
able to reproduce the crash yet :-)

a reliable way to reproduce was to shut down all multipath paths, then
initiate i/o in the vm (e.g. start an application). of course, everything
hangs at that point. after re-enabling one path, the vm used to crash.
now it seems to behave correctly: it just reports a DMA timeout and
continues normally afterwards.

can you think of any way to prevent the vm from consuming 100% cpu in
that waiting state?
my current approach is to run all vms with nice 1, which helped to keep
the machine responsive when all vms (in my test case 64 on one box) have
hanging i/o at the same time.

br,
peter

Kevin Wolf wrote:
> On 04.05.2010 13:38, Peter Lieven wrote:
>
>> hi kevin,
>>
>> i set a breakpoint at bmdma_active_if. the first 2 breaks were hit
>> when the last path in the multipath setup failed, but the assertion
>> did not trigger.
>> when i kicked one path back in, the breakpoint was reached again,
>> this time leading to an assert.
>> the stacktrace is from the point shortly before.
>>
>> hope this helps.
>>
>
> Hm, looks like there's something wrong with cancelling requests -
> bdrv_aio_cancel might decide that it completes a request (and
> consequently calls the callback for it) whereas the IDE emulation
> decides that it's done with the request before calling bdrv_aio_cancel.
>
> I haven't looked in much detail at what this could break, but does
> something like this help?
>
> diff --git a/hw/ide/core.c b/hw/ide/core.c
> index 0757528..3cd55e3 100644
> --- a/hw/ide/core.c
> +++ b/hw/ide/core.c
> @@ -2838,10 +2838,6 @@ static void ide_dma_restart(IDEState *s, int is_read)
>  void ide_dma_cancel(BMDMAState *bm)
>  {
>      if (bm->status & BM_STATUS_DMAING) {
> -        bm->status &= ~BM_STATUS_DMAING;
> -        /* cancel DMA request */
> -        bm->unit = -1;
> -        bm->dma_cb = NULL;
>          if (bm->aiocb) {
>  #ifdef DEBUG_AIO
>              printf("aio_cancel\n");
> @@ -2849,6 +2845,10 @@ void ide_dma_cancel(BMDMAState *bm)
>  #endif
>              bdrv_aio_cancel(bm->aiocb);
>              bm->aiocb = NULL;
>          }
> +        bm->status &= ~BM_STATUS_DMAING;
> +        /* cancel DMA request */
> +        bm->unit = -1;
> +        bm->dma_cb = NULL;
>      }
>  }
>
> Kevin
>