qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH] ide_dma_cancel will result in partial DMA transfer
@ 2008-08-29 13:52 Andrea Arcangeli
  2008-09-01 10:43 ` [Qemu-devel] [PATCH 1/2] " Andrea Arcangeli
  2008-09-01 11:21 ` Ian Jackson
  0 siblings, 2 replies; 12+ messages in thread
From: Andrea Arcangeli @ 2008-08-29 13:52 UTC (permalink / raw)
  To: qemu-devel

Hello,

while trying to track down weird fs corruption, I noticed the way the
cmd_writeb ioport write cancels the I/O is not atomic from a DMA point
of view if a SG table with more than one entry is used. The DMA
command with qemu/kvm is aborted partially. Unfortunately I don't know
the IDE specs well enough to know what happens in the hardware, but I
doubt hardware will abort the DMA command in the middle. I wonder if
this could explain fs corruption or not. If yes, this should possibly
fix it.

There's also the possibility that the reboot handler is canceling a
dma in the middle of a SG table processing, but if guest has still DMA
writes in flight to disk when it issues reboots, that sounds a guest
bug. Furthermore during poweroff (kind of reboot) a large dma may not
reach the disk. So I'm more worried about the "software" behavior of
bmdma_cmd_writeb than the reboot handler. I wonder if perhaps killing
task in proprietary guest OS (when guest os tries to reboot) could
lead the task to call aio_cancel during cleanup, that would ask the
ide guest kernel driver to cancel the I/O as a whole (not partially).

In general I think there's a small chance that this really is the
source of the fs corruption, and if it does then the corruption would
also happen after killing kvm/qemu with sigkill (but we don't kill kvm
as often as we reboot the guest in it). So I'm not really sure if this
is needed or not, but it certainly rings a bell the fact this
ide_dma_cancel is definitely issued with aio commands in flight in the
reboot loop that reproduces corruption (triggers always a few seconds
before reboot).

Fix is mostly untested as hitting this path takes time, more a RFC so
far.

Signed-off-by: Andrea Arcangeli <andrea@qumranet.com>

Index: hw/ide.c
===================================================================
--- hw/ide.c	(revision 5089)
+++ hw/ide.c	(working copy)
@@ -2894,8 +2904,21 @@
     printf("%s: 0x%08x\n", __func__, val);
 #endif
     if (!(val & BM_CMD_START)) {
-        /* XXX: do it better */
-        ide_dma_cancel(bm);
+        /*
+	 * We can't cancel Scatter Gather DMA in the middle of the
+	 * operation or a partial (not full) DMA transfer would reach
+	 * the storage so we wait for completion instead (we beahve
+	 * like if the DMA was complated by the time the guest trying
+	 * to cancel dma with bmdma_cmd_writeb with BM_CMD_START not
+	 * set).
+	 *
+	 * In the future we'll be able to safely cancel the I/O if the
+	 * whole DMA operation will be submitted to disk with a single
+	 * aio operation in the form of aio_readv/aio_writev
+	 * (supported by linux kernel AIO but not by glibc pthread aio
+	 * lib).
+	 */
+	qemu_aio_flush();
         bm->cmd = val & 0x09;
     } else {
         if (!(bm->status & BM_STATUS_DMAING)) {

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2009-02-26 19:10 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-08-29 13:52 [Qemu-devel] [PATCH] ide_dma_cancel will result in partial DMA transfer Andrea Arcangeli
2008-09-01 10:43 ` [Qemu-devel] [PATCH 1/2] " Andrea Arcangeli
2008-09-01 10:53   ` [Qemu-devel] [PATCH 2/2] fix bdrv_aio_read API breakage in qcow2 Andrea Arcangeli
2008-10-22 14:14     ` [Qemu-devel] [PATCH] " Andrea Arcangeli
2008-10-27 13:49       ` Anthony Liguori
2008-10-31 17:32       ` Anthony Liguori
2009-01-14 18:06   ` [Qemu-devel] [PATCH] ide_dma_cancel will result in partial DMA transfer Andrea Arcangeli
2009-01-16 16:41     ` Ian Jackson
2009-01-22 19:02     ` Anthony Liguori
2009-02-26 16:43       ` Andrea Arcangeli
2008-09-01 11:21 ` Ian Jackson
2008-09-01 12:13   ` Andrea Arcangeli

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).