From mboxrd@z Thu Jan 1 00:00:00 1970
In-Reply-To: <20120214143038.GB27413@arachsys.com>
References: <20120214143038.GB27413@arachsys.com>
Date: Wed, 15 Feb 2012 11:28:49 +0000
From: Stefan Hajnoczi
Subject: Re: [Qemu-devel] Block IO throttling: contractor wanted
To: Chris Webb
Cc: Kevin Wolf, Zhi Yong Wu, qemu-devel@nongnu.org

On Tue, Feb 14, 2012 at 2:30 PM, Chris Webb wrote:
> However, there are (known) run-time
> assertion failures with throttled IDE devices[1], which show up in qemu-kvm
> 1.0 and apparently also in qemu HEAD. We have also sometimes seen throttled
> VMs spinning unresponsively with 100% CPU on start-up, which may be related.
...
> [1] I can't immediately find the original reports in the archives, but I
> discussed this privately with Zhi Yong Wu and he had already had reports of
> the same issue. As a quick example, I can trigger an assertion failure in
> the IDE driver by turning on limits on a running guest doing heavy IO.
> I configure a guest with an IDE drive ide.0.0 and then do
>
>  block_set_io_throttle ide.0.0 100000000 0 0 1000 0 0
>
> Shortly afterwards, the qemu-kvm process exits with an assert():
>
> qemu-kvm: /home/root/packages/qemu-kvm-1.0/src-76ig7q/hw/ide/pci.c:313:
>  bmdma_cmd_writeb: Assertion `bm->bus->dma->aiocb == ((void *)0)' failed.
>
> i.e. bm->bus->dma->aiocb is not NULL after qemu_aio_flush() in
> bmdma_cmd_writeb in the IDE driver:
>
>  void bmdma_cmd_writeb(BMDMAState *bm, uint32_t val)
>  {
>  #ifdef DEBUG_IDE
>      printf("%s: 0x%08x\n", __func__, val);
>  #endif
>
>      /* Ignore writes to SSBM if it keeps the old value */
>      if ((val & BM_CMD_START) != (bm->cmd & BM_CMD_START)) {
>          if (!(val & BM_CMD_START)) {
>              /*
>               * We can't cancel Scatter Gather DMA in the middle of the
>               * operation or a partial (not full) DMA transfer would reach
>               * the storage so we wait for completion instead (we behave
>               * like if the DMA was completed by the time the guest trying
>               * to cancel dma with bmdma_cmd_writeb with BM_CMD_START not
>               * set).
>               *
>               * In the future we'll be able to safely cancel the I/O if the
>               * whole DMA operation will be submitted to disk with a single
>               * aio operation with preadv/pwritev.
>               */
>              if (bm->bus->dma->aiocb) {
>                  qemu_aio_flush();
>                  assert(bm->bus->dma->aiocb == NULL);
>                  assert((bm->status & BM_STATUS_DMAING) == 0);
>              }
>          } else {
>              bm->cur_addr = bm->addr;
>              if (!(bm->status & BM_STATUS_DMAING)) {
>                  bm->status |= BM_STATUS_DMAING;
>                  /* start dma transfer if possible */
>                  if (bm->dma_cb)
>                      bm->dma_cb(bmdma_active_if(bm), 0);
>              }
>          }
>      }
>
>      bm->cmd = val & 0x09;
>  }
>
> (My uninformed guess is that this might be something to do with
> qemu_aio_flush() not being able to write out all the data because of the IO
> throttling?)

Thanks for the bug report. This is an actively maintained part of the
codebase, so chances are good this can be fixed in a reasonable time
by the community.

Just wanted to share my thoughts in case no one else replies - Zhi
Yong and I are aware of this bug and there should be time to look into
it.

Stefan
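
Illustration of the hypothesis quoted above: a minimal, self-contained C
model of how a flush that only completes already-dispatched requests
could leave a throttled request pending, which is the condition the
first assert in bmdma_cmd_writeb checks. This is a sketch under
assumptions, not QEMU code: ModelRequest, model_flush_inflight_only,
model_flush_with_drain and model_stop_dma are hypothetical names, and
whether qemu_aio_flush() really behaves this way under throttling is
exactly the unconfirmed guess in the report.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model (not QEMU code): a single DMA request that is either
 * dispatched to the backend or parked on a throttle queue. */
typedef struct {
    bool pending;    /* stands in for bm->bus->dma->aiocb != NULL */
    bool throttled;  /* request is parked, waiting for its throttle timer */
} ModelRequest;

/* A flush that completes only requests that were already dispatched.
 * A parked (throttled) request is left untouched. */
static void model_flush_inflight_only(ModelRequest *req)
{
    if (req->pending && !req->throttled) {
        req->pending = false;  /* completion callback clears the handle */
    }
}

/* A flush that first releases parked requests, then completes them. */
static void model_flush_with_drain(ModelRequest *req)
{
    req->throttled = false;
    model_flush_inflight_only(req);
}

/* Models the guest clearing BM_CMD_START while a request is outstanding.
 * Returns true if the "aiocb == NULL" invariant would hold after the flush. */
static bool model_stop_dma(bool throttled, bool drain_throttled)
{
    ModelRequest req = { .pending = true, .throttled = throttled };

    if (drain_throttled) {
        model_flush_with_drain(&req);
    } else {
        model_flush_inflight_only(&req);
    }
    return !req.pending;
}
```

In this model, model_stop_dma(true, false) returns false (the request is
still pending, so the assert would fire), while model_stop_dma(true, true)
and model_stop_dma(false, false) return true.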