From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from mailman by lists.gnu.org with tmda-scanned (Exim 4.43) id 1KZ4YU-0008Dq-PT for qemu-devel@nongnu.org; Fri, 29 Aug 2008 10:02:26 -0400 Received: from exim by lists.gnu.org with spam-scanned (Exim 4.43) id 1KZ4YS-0008DP-9l for qemu-devel@nongnu.org; Fri, 29 Aug 2008 10:02:26 -0400 Received: from [199.232.76.173] (port=46481 helo=monty-python.gnu.org) by lists.gnu.org with esmtp (Exim 4.43) id 1KZ4YS-0008DM-6y for qemu-devel@nongnu.org; Fri, 29 Aug 2008 10:02:24 -0400 Received: from host36-195-149-62.serverdedicati.aruba.it ([62.149.195.36]:50858 helo=mx.cpushare.com) by monty-python.gnu.org with esmtps (TLS-1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.60) (envelope-from ) id 1KZ4YR-0006ua-UR for qemu-devel@nongnu.org; Fri, 29 Aug 2008 10:02:24 -0400 Date: Fri, 29 Aug 2008 15:52:49 +0200 From: Andrea Arcangeli Message-ID: <20080829135249.GI24884@duo.random> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Subject: [Qemu-devel] [PATCH] ide_dma_cancel will result in partial DMA transfer Reply-To: qemu-devel@nongnu.org List-Id: qemu-devel.nongnu.org List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: qemu-devel@nongnu.org Hello, while trying to track down weird fs corruption, I noticed the way the cmd_writeb ioport write cancels the I/O is not atomic from a DMA point of view if a SG table with more than one entry is used. The DMA command with qemu/kvm is aborted partially. Unfortunately I don't know the IDE specs well enough to know what happens in the hardware, but I doubt hardware will abort the DMA command in the middle. I wonder if this could explain fs corruption or not. If yes, this should possibly fix it. There's also the possibility that the reboot handler is canceling a dma in the middle of a SG table processing, but if guest has still DMA writes in flight to disk when it issues reboots, that sounds a guest bug. Furthermore during poweroff (kind of reboot) a large dma may not reach the disk. So I'm more worried about the "software" behavior of bmdma_cmd_writeb than the reboot handler. I wonder if perhaps killing task in proprietary guest OS (when guest os tries to reboot) could lead the task to call aio_cancel during cleanup, that would ask the ide guest kernel driver to cancel the I/O as a whole (not partially). In general I think there's a small chance that this really is the source of the fs corruption, and if it does then the corruption would also happen after killing kvm/qemu with sigkill (but we don't kill kvm as often as we reboot the guest in it). So I'm not really sure if this is needed or not, but it certainly rings a bell the fact this ide_dma_cancel is definitely issued with aio commands in flight in the reboot loop that reproduces corruption (triggers always a few seconds before reboot). Fix is mostly untested as hitting this path takes time, more a RFC so far. Signed-off-by: Andrea Arcangeli Index: hw/ide.c =================================================================== --- hw/ide.c (revision 5089) +++ hw/ide.c (working copy) @@ -2894,8 +2904,21 @@ printf("%s: 0x%08x\n", __func__, val); #endif if (!(val & BM_CMD_START)) { - /* XXX: do it better */ - ide_dma_cancel(bm); + /* + * We can't cancel Scatter Gather DMA in the middle of the + * operation or a partial (not full) DMA transfer would reach + * the storage so we wait for completion instead (we beahve + * like if the DMA was complated by the time the guest trying + * to cancel dma with bmdma_cmd_writeb with BM_CMD_START not + * set). + * + * In the future we'll be able to safely cancel the I/O if the + * whole DMA operation will be submitted to disk with a single + * aio operation in the form of aio_readv/aio_writev + * (supported by linux kernel AIO but not by glibc pthread aio + * lib). + */ + qemu_aio_flush(); bm->cmd = val & 0x09; } else { if (!(bm->status & BM_STATUS_DMAING)) {