Date: Thu, 28 May 2009 18:33:10 +0200
From: Andrea Arcangeli
Message-ID: <20090528163309.GJ20464@random.random>
Subject: [Qemu-devel] fix bdrv_read/write_em and qemu_aio_flush
To: qemu-devel@nongnu.org

Hello,

the debug code in my ide_dma_cancel patch (not yet included upstream) made
us notice that when qemu_aio_flush returns, there can still be pending aio
commands waiting to complete. Auditing the code I found some strange
things, like the fact that qemu_aio_wait does nothing if an unrelated
(non-aio) bh gets executed. I also think I found why there was still
pending aio when qemu_aio_flush returned: qemu_aio_wait does a lot more
than wait, it can start aio, and if the previous ->io_flush returned zero
the loop ends and ->io_flush isn't checked again.

The fact that an unrelated bh can turn qemu_aio_wait into a noop seems
troublesome for all callers that aren't calling qemu_aio_wait in a loop
like qemu_aio_flush does, so I preferred to change those callers to the
safer qemu_aio_flush in case the executed bh generates more pending I/O.

What do you think about this patch against qemu git?
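To make the failure mode concrete, here is a minimal standalone sketch
(toy code, not QEMU's aio.c; the toy_* names and the simplified
bookkeeping are made up for illustration). It models the pre-patch flush
loop, which samples ->io_flush before the bottom half gets a chance to
run, next to the shape the patch switches to, which polls bottom halves
before sampling ->io_flush:

/* Toy model only -- not QEMU code. It mimics the shape of the problem:
 * a bottom half that still has to *start* an emulated aio request, an
 * ->io_flush-style count of in-flight requests, and two flush loops. */
#include <stdio.h>
#include <stdbool.h>

static int in_flight;              /* what ->io_flush would report */
static bool bh_scheduled = true;   /* an emulated request parked in a bh */

/* stands in for qemu_bh_poll(): running the bh submits the real aio */
static bool toy_bh_poll(void)
{
    if (!bh_scheduled)
        return false;
    bh_scheduled = false;
    in_flight++;                   /* the bh kicked off an aio request */
    return true;
}

/* stands in for the pre-patch qemu_aio_wait(): a pending bh
 * short-circuits the wait, and running it may start new aio */
static void toy_wait_prepatch(void)
{
    if (toy_bh_poll())
        return;
    if (in_flight)
        in_flight--;               /* pretend select() completed a request */
}

/* pre-patch flush shape: ->io_flush is sampled before the bh runs,
 * so the loop can exit with the freshly started request still pending */
static void toy_flush_prepatch(void)
{
    int ret;
    do {
        ret = in_flight;           /* the ->io_flush sample: still 0 */
        toy_wait_prepatch();       /* bh starts aio, but ret is already 0 */
    } while (ret > 0);
}

/* patched flush shape: poll bhs first, then sample ->io_flush, so the
 * request started by the bh keeps the loop going until it completes */
static void toy_flush_patched(void)
{
    int ret;
    do {
        toy_bh_poll();
        ret = in_flight;
        if (ret)
            in_flight--;           /* pretend the wait completed a request */
    } while (ret > 0);
}

int main(void)
{
    toy_flush_prepatch();
    printf("pre-patch flush returns with %d request(s) still in flight\n",
           in_flight);             /* prints 1 */

    bh_scheduled = true;           /* reset the scenario */
    in_flight = 0;
    toy_flush_patched();
    printf("patched flush returns with %d request(s) in flight\n",
           in_flight);             /* prints 0 */
    return 0;
}

The first loop returns with the request the bh just started still in
flight, which mirrors the invariant violation the ide_dma_cancel debug
code tripped on.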
------------

Subject: fix bdrv_read/write_em and qemu_aio_flush
From: Andrea Arcangeli

qemu_aio_wait() was used to start aio through qemu_bh_poll(), as in the
case of qcow2 reads from holes. The bh is global, and I can't see how it
can possibly be safe to turn qemu_aio_wait into a noop whenever any
unrelated bh is pending and has to be executed. In addition, qemu_aio_wait,
by invoking the bh, could run new aio callbacks that issue more aio
commands from their completion handlers, breaking the invariant that
qemu_aio_flush returns only when all ->io_flush methods return 0 (which
led to a failure in my fix to ide_dma_cancel, whose debug code catches
exactly that; that code isn't yet in qemu upstream).

    ->io_flush returns 0
    qemu_aio_wait()
      qemu_aio_bh()
        -> qcow_aio_read_bh
          -> ide_read_dma_cb
            -> ide_dma_submit_check
    break the loop even though ->io_flush would no longer return 0

I also changed most qemu_aio_wait callers to call qemu_aio_flush instead;
this ensures that even a read from a hole submitted with bdrv_aio_readv
is complete by the time qemu_aio_flush returns.

Signed-off-by: Andrea Arcangeli
---

diff --git a/aio.c b/aio.c
index 11fbb6c..b304852 100644
--- a/aio.c
+++ b/aio.c
@@ -103,6 +103,12 @@ void qemu_aio_flush(void)
     do {
         ret = 0;
 
+        /*
+         * If there are pending emulated aio start them so
+         * flush will be able to return 1.
+         */
+        qemu_bh_poll();
+
         LIST_FOREACH(node, &aio_handlers, node) {
             ret |= node->io_flush(node->opaque);
         }
@@ -115,9 +121,6 @@ void qemu_aio_wait(void)
 {
     int ret;
 
-    if (qemu_bh_poll())
-        return;
-
     do {
         AioHandler *node;
         fd_set rdfds, wrfds;
diff --git a/block.c b/block.c
index c80d4b9..78088a9 100644
--- a/block.c
+++ b/block.c
@@ -1487,7 +1487,7 @@ static int bdrv_read_em(BlockDriverState *bs, int64_t sector_num,
         return -1;
 
     while (async_ret == NOT_DONE) {
-        qemu_aio_wait();
+        qemu_aio_flush();
     }
 
     return async_ret;
@@ -1510,7 +1510,7 @@ static int bdrv_write_em(BlockDriverState *bs, int64_t sector_num,
     if (acb == NULL)
         return -1;
     while (async_ret == NOT_DONE) {
-        qemu_aio_wait();
+        qemu_aio_flush();
     }
     return async_ret;
 }
diff --git a/qemu-io.c b/qemu-io.c
index f0a17b9..0d52074 100644
--- a/qemu-io.c
+++ b/qemu-io.c
@@ -153,7 +153,7 @@ static int do_aio_readv(QEMUIOVector *qiov, int64_t offset, int *total)
         return -EIO;
 
     while (async_ret == NOT_DONE)
-        qemu_aio_wait();
+        qemu_aio_flush();
 
     *total = qiov->size;
     return async_ret < 0 ? async_ret : 1;
@@ -170,7 +170,7 @@ static int do_aio_writev(QEMUIOVector *qiov, int64_t offset, int *total)
         return -EIO;
 
     while (async_ret == NOT_DONE)
-        qemu_aio_wait();
+        qemu_aio_flush();
 
     *total = qiov->size;
     return async_ret < 0 ? async_ret : 1;