From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from [140.186.70.92] (port=55775 helo=eggs.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.43) id 1Pi7U8-0001lE-Ck
	for qemu-devel@nongnu.org; Wed, 26 Jan 2011 10:40:41 -0500
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <avi@redhat.com>) id 1Pi7U6-0003B7-P6
	for qemu-devel@nongnu.org; Wed, 26 Jan 2011 10:40:40 -0500
Received: from mx1.redhat.com ([209.132.183.28]:13039)
	by eggs.gnu.org with esmtp (Exim 4.71)
	(envelope-from <avi@redhat.com>) id 1Pi7U6-0003As-I5
	for qemu-devel@nongnu.org; Wed, 26 Jan 2011 10:40:38 -0500
Message-ID: <4D40406B.2070302@redhat.com>
Date: Wed, 26 Jan 2011 17:40:27 +0200
From: Avi Kivity <avi@redhat.com>
MIME-Version: 1.0
Subject: Re: [Qemu-devel] [RFC][PATCH 11/12] qcow2: Convert qcow2 to
	use	coroutines for async I/O
References: <1295688567-25496-1-git-send-email-stefanha@linux.vnet.ibm.com>
	<1295688567-25496-12-git-send-email-stefanha@linux.vnet.ibm.com>
In-Reply-To: <1295688567-25496-12-git-send-email-stefanha@linux.vnet.ibm.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
List-Id: qemu-devel.nongnu.org
List-Unsubscribe: <http://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.nongnu.org/archive/html/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <http://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Cc: Kevin Wolf <kwolf@redhat.com>, Anthony Liguori <aliguori@us.ibm.com>, qemu-devel@nongnu.org

On 01/22/2011 11:29 AM, Stefan Hajnoczi wrote:
> Converting qcow2 to use coroutines is fairly simple since most of qcow2
> is synchronous.  The synchronous I/O functions likes bdrv_pread() now
> transparently work when called from a coroutine, so all the synchronous
> code just works.
>
> The explicitly asynchronous code is adjusted to repeatedly call
> qcow2_aio_read_cb() or qcow2_aio_write_cb() until the request completes.
> At that point the coroutine will return from its entry function and its
> resources are freed.
>
> The bdrv_aio_readv() and bdrv_aio_writev() user callback is now invoked
> from a BH.  This is necessary since the user callback code does not
> expect to be executed from a coroutine.
>
> This conversion is not completely correct because the safety the
> synchronous code does not carry over to the coroutine version.
> Previously, a synchronous code path could assume that it will never be
> interleaved with another request executing.  This is no longer true
> because bdrv_pread() and bdrv_pwrite() cause the coroutine to yield and
> other requests can be processed during that time.
>
> The solution is to carefully introduce checks so that pending requests
> do not step on each other's toes.  That is left for a future patch...
>


The way I thought of doing this is:

qcow_aio_write(...)
{
     execute_in_coroutine {
         co_mutex_lock(&bs->mutex);
         do_qcow_aio_write(...); // original qcow code
         co_mutex_release(&bs->mutex);
     }
}

(similar changes for the the other callbacks)

if the code happens to be asynchronous (no metadata changes), we'll take 
the mutex and release it immediately after submitting the I/O, so no 
extra serialization happens.  If the code does issue a synchronous 
metadata write, we'll lock out all other operations on the same block 
device, but still allow the vcpu to execute, since all the locking 
happens in a coroutine.

Essentially, a mutex becomes the dependency tracking mechnism.  A global 
mutex means all synchronous operations are dependent.  Later, we can 
convert the metadata cache entry dependency lists to local mutexes 
inside the cache entry structures.

-- 
error compiling committee.c: too many arguments to function