From: Stefan Hajnoczi
Date: Wed, 27 Jul 2011 14:44:46 +0100
Message-Id: <1311774295-8696-7-git-send-email-stefanha@linux.vnet.ibm.com>
In-Reply-To: <1311774295-8696-1-git-send-email-stefanha@linux.vnet.ibm.com>
References: <1311774295-8696-1-git-send-email-stefanha@linux.vnet.ibm.com>
Subject: [Qemu-devel] [PATCH 06/15] qed: avoid deadlock on emulated synchronous I/O
To: qemu-devel@nongnu.org
Cc: Kevin Wolf, Anthony Liguori, Stefan Hajnoczi, Adam Litke

The block layer emulates synchronous bdrv_read()/bdrv_write() for drivers
that only provide the asynchronous interfaces. The emulation issues an
asynchronous request inside a new "async context" and waits for that
request to complete.
If currently outstanding requests complete during this time, their
completion functions are not invoked until the async context is popped
again.

This can lead to deadlock if an allocating write is being processed when
synchronous I/O emulation starts. The emulated synchronous write will be
queued because an existing request is being processed. But the existing
request cannot complete until the async context is popped. The result is
that qemu_aio_wait() sits in a deadlock.

Address this problem in two ways:

1. Add an assertion so that we instantly know if this corner case is
   hit. This saves us time by giving a clear failure indication.

2. Ignore the copy-on-read hint for emulated synchronous reads. This
   allows us to do emulated synchronous reads without hitting the
   deadlock.

Keep this as a separate commit instead of merging with previous QED
patches so it is easy to drop when coroutines are introduced and
eliminate async contexts.

Signed-off-by: Stefan Hajnoczi
---
 block/qed.c |   12 +++++++++++-
 1 files changed, 11 insertions(+), 1 deletions(-)

diff --git a/block/qed.c b/block/qed.c
index 6ca57f2..ffdbc2d 100644
--- a/block/qed.c
+++ b/block/qed.c
@@ -1120,6 +1120,14 @@ static bool qed_start_allocating_write(QEDAIOCB *acb)
     }
     if (acb != QSIMPLEQ_FIRST(&s->allocating_write_reqs) ||
         s->allocating_write_reqs_plugged) {
+        /* Queuing an emulated synchronous write causes deadlock since
+         * currently outstanding requests are not in the current async context
+         * and their completion will never be invoked.  Once the block layer
+         * moves to truly asynchronous semantics this failure case will be
+         * eliminated.
+         */
+        assert(get_async_context_id() == 0);
+
         return false;
     }
     return true;
@@ -1246,7 +1254,9 @@ static void qed_aio_read_data(void *opaque, int ret,
     } else if (ret != QED_CLUSTER_FOUND) {
         BlockDriverCompletionFunc *cb = qed_aio_next_io;
 
-        if (bs->backing_hd && (acb->flags & QED_AIOCB_COPY_ON_READ)) {
+        /* See qed_start_allocating_write() for get_async_context_id() hack */
+        if (bs->backing_hd && (acb->flags & QED_AIOCB_COPY_ON_READ) &&
+            get_async_context_id() == 0) {
             if (!qed_start_allocating_write(acb)) {
                 qemu_iovec_reset(&acb->cur_qiov);
                 return; /* wait for current allocating write to complete */
-- 
1.7.5.4
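
Illustration only, not part of the patch: the deadlock described in the
commit message can be reproduced with a small standalone model. This is a
rough sketch under stated assumptions, not QEMU code. Every name in it
(ToyReq, toy_ctx, toy_aio_wait, toy_emulate_sync) is hypothetical, and the
bounded poll count merely stands in for qemu_aio_wait() spinning forever.
The only behaviour taken from the description above is that completion
functions run solely for requests submitted in the current async context,
and that a second allocating write is queued behind the first.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model, NOT QEMU code: all names here are hypothetical. */
typedef struct ToyReq {
    int ctx;         /* async context the request was submitted in */
    bool io_done;    /* the underlying I/O has finished */
    bool completed;  /* the completion function has been invoked */
    bool queued;     /* waiting behind an earlier allocating write */
} ToyReq;

static int toy_ctx;  /* current async context id, 0 = top level */

/* Invoke completions, but only for requests of the current context. */
static void toy_aio_wait(ToyReq *reqs[], size_t n)
{
    for (size_t i = 0; i < n; i++) {
        ToyReq *r = reqs[i];
        if (r->io_done && !r->completed && r->ctx == toy_ctx) {
            r->completed = true;
        }
    }
}

/*
 * Emulate one synchronous request.  If outstanding_write is true, an
 * allocating write from context 0 is still awaiting its completion
 * function when the emulation starts.  Returns true if the synchronous
 * request completes within max_polls polls, false if it is stuck.
 */
static bool toy_emulate_sync(bool outstanding_write, int max_polls)
{
    assert(toy_ctx == 0);  /* this toy does not model nested emulation */

    /* The outer write's I/O is done, but its completion has not run. */
    ToyReq outer = { .ctx = 0, .io_done = outstanding_write,
                     .completed = !outstanding_write };

    toy_ctx = 1;  /* push a new async context */
    ToyReq sync = { .ctx = toy_ctx, .queued = !outer.completed };
    ToyReq *reqs[] = { &outer, &sync };
    bool ok = false;

    for (int i = 0; i < max_polls; i++) {
        if (sync.queued && outer.completed) {
            sync.queued = false;  /* would be dequeued by outer's completion */
        }
        if (!sync.queued) {
            sync.io_done = true;  /* request was issued and its I/O finished */
        }
        toy_aio_wait(reqs, 2);
        if (sync.completed) {
            ok = true;
            break;
        }
    }

    toy_ctx = 0;  /* pop the async context */
    return ok;
}
```

In this model toy_emulate_sync(false, 10) returns true, while
toy_emulate_sync(true, 10) returns false no matter how large the poll
limit is: the outer request's completion is filtered out by the context
check, so the queued request is never started. Skipping the queuing path
whenever the context id is non-zero, as the second hunk does for
copy-on-read, avoids entering that state in the first place.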