From: Anthony Liguori <anthony@codemonkey.ws>
To: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Cc: Kevin Wolf <kwolf@redhat.com>, qemu-devel@nongnu.org
Subject: Re: [Qemu-devel] [RFC][PATCH 11/12] qcow2: Convert qcow2 to use coroutines for async I/O
Date: Sun, 23 Jan 2011 17:40:22 -0600 [thread overview]
Message-ID: <4D3CBC66.1010203@codemonkey.ws> (raw)
In-Reply-To: <1295688567-25496-12-git-send-email-stefanha@linux.vnet.ibm.com>
On 01/22/2011 03:29 AM, Stefan Hajnoczi wrote:
> Converting qcow2 to use coroutines is fairly simple since most of qcow2
> is synchronous. The synchronous I/O functions like bdrv_pread() now
> transparently work when called from a coroutine, so all the synchronous
> code just works.
>
> The explicitly asynchronous code is adjusted to repeatedly call
> qcow2_aio_read_cb() or qcow2_aio_write_cb() until the request completes.
> At that point the coroutine will return from its entry function and its
> resources are freed.
>
> The bdrv_aio_readv() and bdrv_aio_writev() user callback is now invoked
> from a BH. This is necessary since the user callback code does not
> expect to be executed from a coroutine.
>
> This conversion is not completely correct because the safety of the
> synchronous code does not carry over to the coroutine version.
> Previously, a synchronous code path could assume that it will never be
> interleaved with another request executing. This is no longer true
> because bdrv_pread() and bdrv_pwrite() cause the coroutine to yield and
> other requests can be processed during that time.
>
> The solution is to carefully introduce checks so that pending requests
> do not step on each other's toes. That is left for a future patch...
>
> Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
>
As an alternative approach, could we trap async calls from the block
device, implement them in a synchronous fashion, then issue the callback
immediately?
This would mean that qcow_aio_write() would become fully synchronous,
which also means that you can track when the operation is completed
entirely within the block layer. IOW, it should be possible to do this
with almost no change to qcow2.
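Roughly this shape -- a sketch only, with made-up names for illustration
rather than the real block-layer API:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical completion-callback type, standing in for
 * BlockDriverCompletionFunc. */
typedef void CompletionFunc(void *opaque, int ret);

/* Stand-in for the fully synchronous qcow2 path (bdrv_pread() etc.). */
static int qcow2_sync_read(void *buf, size_t len)
{
    memset(buf, 0xab, len);     /* pretend the data came off disk */
    return 0;
}

/* Trapped async entry point: run the request synchronously, then issue
 * the completion callback immediately.  Completion can therefore be
 * tracked entirely by the block layer, with no async state in qcow2. */
static void qcow2_aio_read_trapped(void *buf, size_t len,
                                   CompletionFunc *cb, void *opaque)
{
    int ret = qcow2_sync_read(buf, len);
    cb(opaque, ret);            /* callback fires before we return */
}

static void record_ret(void *opaque, int ret)
{
    *(int *)opaque = ret;       /* remember the completion status */
}

/* Tiny self-check: the callback has already run by the time the
 * "async" call returns, and the buffer holds the read data. */
static int demo(void)
{
    char buf[16];
    int status = -1;
    qcow2_aio_read_trapped(buf, sizeof buf, record_ret, &status);
    return (status == 0 && buf[0] == (char)0xab) ? 0 : -1;
}
```

The point being that the image format never needs to know whether the
caller wanted sync or async semantics.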
I think this is the right approach too. If we're using coroutines, we
shouldn't do anything asynchronous in the image formats. The good bit
about this is that we can probably dramatically simplify the block layer
API by eliminating the sync/async versions of everything.
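i.e. one coroutine-based core with trivial shims on top -- again a
made-up sketch, collapsing the coroutine machinery into a direct call:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical names throughout; not the real QEMU API.  The idea is
 * that one implementation does the work and the sync and async entry
 * points are thin wrappers, so nothing is duplicated. */
typedef void CompletionFunc(void *opaque, int ret);

/* The single "real" implementation (a coroutine_fn in real code). */
static int blk_co_readv(void *buf, size_t len)
{
    memset(buf, 0, len);        /* pretend to read zeroes */
    return 0;
}

/* Synchronous wrapper: just call the core function. */
static int blk_read(void *buf, size_t len)
{
    return blk_co_readv(buf, len);
}

/* Async wrapper: call the core function, then complete via callback.
 * In real code this would enter a coroutine and complete from a BH. */
static void blk_aio_readv(void *buf, size_t len,
                          CompletionFunc *cb, void *opaque)
{
    cb(opaque, blk_co_readv(buf, len));
}

static void set_flag(void *opaque, int ret)
{
    *(int *)opaque = (ret == 0);
}

/* Self-check: both entry points funnel into the same core path. */
static int demo_unified(void)
{
    char buf[8];
    int ok = 0;
    if (blk_read(buf, sizeof buf) != 0) {
        return -1;
    }
    blk_aio_readv(buf, sizeof buf, set_flag, &ok);
    return ok ? 0 : -1;
}
```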
Regards,
Anthony Liguori
> ---
> block/qcow2.c | 160 ++++++++++++++++++++++++++++++---------------------------
> 1 files changed, 85 insertions(+), 75 deletions(-)
>
> diff --git a/block/qcow2.c b/block/qcow2.c
> index b6b094c..4b33ef3 100644
> --- a/block/qcow2.c
> +++ b/block/qcow2.c
> @@ -361,19 +361,20 @@ typedef struct QCowAIOCB {
> uint64_t bytes_done;
> uint64_t cluster_offset;
> uint8_t *cluster_data;
> - BlockDriverAIOCB *hd_aiocb;
> QEMUIOVector hd_qiov;
> QEMUBH *bh;
> QCowL2Meta l2meta;
> QLIST_ENTRY(QCowAIOCB) next_depend;
> + Coroutine *coroutine;
> + int ret; /* return code for user callback */
> } QCowAIOCB;
>
> static void qcow2_aio_cancel(BlockDriverAIOCB *blockacb)
> {
> QCowAIOCB *acb = container_of(blockacb, QCowAIOCB, common);
> - if (acb->hd_aiocb)
> - bdrv_aio_cancel(acb->hd_aiocb);
> qemu_aio_release(acb);
> + /* XXX This function looks broken, we could be in the middle of a request
> + * and releasing the acb is not a good idea */
> }
>
> static AIOPool qcow2_aio_pool = {
> @@ -381,13 +382,14 @@ static AIOPool qcow2_aio_pool = {
> .cancel = qcow2_aio_cancel,
> };
>
> -static void qcow2_aio_read_cb(void *opaque, int ret);
> -static void qcow2_aio_read_bh(void *opaque)
> +static void qcow2_aio_bh(void *opaque)
> {
> QCowAIOCB *acb = opaque;
> qemu_bh_delete(acb->bh);
> acb->bh = NULL;
> - qcow2_aio_read_cb(opaque, 0);
> + acb->common.cb(acb->common.opaque, acb->ret);
> + qemu_iovec_destroy(&acb->hd_qiov);
> + qemu_aio_release(acb);
> }
>
> static int qcow2_schedule_bh(QEMUBHFunc *cb, QCowAIOCB *acb)
> @@ -404,14 +406,13 @@ static int qcow2_schedule_bh(QEMUBHFunc *cb, QCowAIOCB *acb)
> return 0;
> }
>
> -static void qcow2_aio_read_cb(void *opaque, int ret)
> +static int coroutine_fn qcow2_aio_read_cb(void *opaque, int ret)
> {
> QCowAIOCB *acb = opaque;
> BlockDriverState *bs = acb->common.bs;
> BDRVQcowState *s = bs->opaque;
> int index_in_cluster, n1;
>
> - acb->hd_aiocb = NULL;
> if (ret < 0)
> goto done;
>
> @@ -469,22 +470,13 @@ static void qcow2_aio_read_cb(void *opaque, int ret)
> acb->sector_num, acb->cur_nr_sectors);
> if (n1 > 0) {
> BLKDBG_EVENT(bs->file, BLKDBG_READ_BACKING_AIO);
> - acb->hd_aiocb = bdrv_aio_readv(bs->backing_hd, acb->sector_num,
> - &acb->hd_qiov, acb->cur_nr_sectors,
> - qcow2_aio_read_cb, acb);
> - if (acb->hd_aiocb == NULL)
> - goto done;
> - } else {
> - ret = qcow2_schedule_bh(qcow2_aio_read_bh, acb);
> - if (ret < 0)
> + ret = bdrv_co_readv(bs->backing_hd, acb->sector_num, &acb->hd_qiov, acb->cur_nr_sectors);
> + if (ret < 0) {
> goto done;
> + }
> }
> } else {
> - /* Note: in this case, no need to wait */
> qemu_iovec_memset(&acb->hd_qiov, 0, 512 * acb->cur_nr_sectors);
> - ret = qcow2_schedule_bh(qcow2_aio_read_bh, acb);
> - if (ret < 0)
> - goto done;
> }
> } else if (acb->cluster_offset & QCOW_OFLAG_COMPRESSED) {
> /* add AIO support for compressed blocks ? */
> @@ -494,10 +486,6 @@ static void qcow2_aio_read_cb(void *opaque, int ret)
> qemu_iovec_from_buffer(&acb->hd_qiov,
> s->cluster_cache + index_in_cluster * 512,
> 512 * acb->cur_nr_sectors);
> -
> - ret = qcow2_schedule_bh(qcow2_aio_read_bh, acb);
> - if (ret < 0)
> - goto done;
> } else {
> if ((acb->cluster_offset & 511) != 0) {
> ret = -EIO;
> @@ -522,34 +510,50 @@ static void qcow2_aio_read_cb(void *opaque, int ret)
> }
>
> BLKDBG_EVENT(bs->file, BLKDBG_READ_AIO);
> - acb->hd_aiocb = bdrv_aio_readv(bs->file,
> + ret = bdrv_co_readv(bs->file,
> (acb->cluster_offset >> 9) + index_in_cluster,
> - &acb->hd_qiov, acb->cur_nr_sectors,
> - qcow2_aio_read_cb, acb);
> - if (acb->hd_aiocb == NULL) {
> - ret = -EIO;
> + &acb->hd_qiov, acb->cur_nr_sectors);
> + if (ret < 0) {
> goto done;
> }
> }
>
> - return;
> + return 1;
> done:
> - acb->common.cb(acb->common.opaque, ret);
> - qemu_iovec_destroy(&acb->hd_qiov);
> - qemu_aio_release(acb);
> + acb->ret = ret;
> + qcow2_schedule_bh(qcow2_aio_bh, acb);
> + return 0;
> +}
> +
> +static void * coroutine_fn qcow2_co_read(void *opaque)
> +{
> + QCowAIOCB *acb = opaque;
> +
> + while (qcow2_aio_read_cb(acb, 0)) {
> + }
> + return NULL;
> +}
> +
> +static int coroutine_fn qcow2_aio_write_cb(void *opaque, int ret);
> +static void * coroutine_fn qcow2_co_write(void *opaque)
> +{
> + QCowAIOCB *acb = opaque;
> +
> + while (qcow2_aio_write_cb(acb, 0)) {
> + }
> + return NULL;
> }
>
> -static QCowAIOCB *qcow2_aio_setup(BlockDriverState *bs, int64_t sector_num,
> - QEMUIOVector *qiov, int nb_sectors,
> - BlockDriverCompletionFunc *cb,
> - void *opaque, int is_write)
> +static BlockDriverAIOCB *qcow2_aio_setup(BlockDriverState *bs,
> + int64_t sector_num, QEMUIOVector *qiov, int nb_sectors,
> + BlockDriverCompletionFunc *cb, void *opaque, int is_write)
> {
> QCowAIOCB *acb;
> + Coroutine *coroutine;
>
> acb = qemu_aio_get(&qcow2_aio_pool, bs, cb, opaque);
> if (!acb)
> return NULL;
> - acb->hd_aiocb = NULL;
> acb->sector_num = sector_num;
> acb->qiov = qiov;
>
> @@ -561,7 +565,12 @@ static QCowAIOCB *qcow2_aio_setup(BlockDriverState *bs, int64_t sector_num,
> acb->cluster_offset = 0;
> acb->l2meta.nb_clusters = 0;
> QLIST_INIT(&acb->l2meta.dependent_requests);
> - return acb;
> +
> + coroutine = qemu_coroutine_create(is_write ? qcow2_co_write
> + : qcow2_co_read);
> + acb->coroutine = coroutine;
> + qemu_coroutine_enter(coroutine, acb);
> + return &acb->common;
> }
>
> static BlockDriverAIOCB *qcow2_aio_readv(BlockDriverState *bs,
> @@ -570,38 +579,48 @@ static BlockDriverAIOCB *qcow2_aio_readv(BlockDriverState *bs,
> BlockDriverCompletionFunc *cb,
> void *opaque)
> {
> - QCowAIOCB *acb;
> -
> - acb = qcow2_aio_setup(bs, sector_num, qiov, nb_sectors, cb, opaque, 0);
> - if (!acb)
> - return NULL;
> -
> - qcow2_aio_read_cb(acb, 0);
> - return &acb->common;
> + return qcow2_aio_setup(bs, sector_num, qiov, nb_sectors, cb, opaque, 0);
> }
>
> -static void qcow2_aio_write_cb(void *opaque, int ret);
> -
> -static void run_dependent_requests(QCowL2Meta *m)
> +static void qcow2_co_run_dependent_requests(void *opaque)
> {
> + QCowAIOCB *acb = opaque;
> QCowAIOCB *req;
> QCowAIOCB *next;
>
> + qemu_bh_delete(acb->bh);
> + acb->bh = NULL;
> +
> + /* Restart all dependent requests */
> + QLIST_FOREACH_SAFE(req, &acb->l2meta.dependent_requests, next_depend, next) {
> + qemu_coroutine_enter(req->coroutine, NULL);
> + }
> +
> + /* Reenter the original request */
> + qemu_coroutine_enter(acb->coroutine, NULL);
> +}
> +
> +static void run_dependent_requests(QCowL2Meta *m)
> +{
> /* Take the request off the list of running requests */
> if (m->nb_clusters != 0) {
> QLIST_REMOVE(m, next_in_flight);
> }
>
> - /* Restart all dependent requests */
> - QLIST_FOREACH_SAFE(req, &m->dependent_requests, next_depend, next) {
> - qcow2_aio_write_cb(req, 0);
> + if (!QLIST_EMPTY(&m->dependent_requests)) {
> + /* TODO This is a hack to get at the acb, may not be correct if called
> + * with a QCowL2Meta that is not part of a QCowAIOCB.
> + */
> + QCowAIOCB *acb = container_of(m, QCowAIOCB, l2meta);
> + qcow2_schedule_bh(qcow2_co_run_dependent_requests, acb);
> + qemu_coroutine_yield(NULL);
> }
>
> /* Empty the list for the next part of the request */
> QLIST_INIT(&m->dependent_requests);
> }
>
> -static void qcow2_aio_write_cb(void *opaque, int ret)
> +static int coroutine_fn qcow2_aio_write_cb(void *opaque, int ret)
> {
> QCowAIOCB *acb = opaque;
> BlockDriverState *bs = acb->common.bs;
> @@ -609,8 +628,6 @@ static void qcow2_aio_write_cb(void *opaque, int ret)
> int index_in_cluster;
> int n_end;
>
> - acb->hd_aiocb = NULL;
> -
> if (ret >= 0) {
> ret = qcow2_alloc_cluster_link_l2(bs, &acb->l2meta);
> }
> @@ -648,7 +665,8 @@ static void qcow2_aio_write_cb(void *opaque, int ret)
> if (acb->l2meta.nb_clusters == 0 && acb->l2meta.depends_on != NULL) {
> QLIST_INSERT_HEAD(&acb->l2meta.depends_on->dependent_requests,
> acb, next_depend);
> - return;
> + qemu_coroutine_yield(NULL);
> + return 1;
> }
>
> assert((acb->cluster_offset & 511) == 0);
> @@ -675,25 +693,22 @@ static void qcow2_aio_write_cb(void *opaque, int ret)
> }
>
> BLKDBG_EVENT(bs->file, BLKDBG_WRITE_AIO);
> - acb->hd_aiocb = bdrv_aio_writev(bs->file,
> - (acb->cluster_offset >> 9) + index_in_cluster,
> - &acb->hd_qiov, acb->cur_nr_sectors,
> - qcow2_aio_write_cb, acb);
> - if (acb->hd_aiocb == NULL) {
> - ret = -EIO;
> + ret = bdrv_co_writev(bs->file,
> + (acb->cluster_offset >> 9) + index_in_cluster,
> + &acb->hd_qiov, acb->cur_nr_sectors);
> + if (ret < 0) {
> goto fail;
> }
> -
> - return;
> + return 1;
>
> fail:
> if (acb->l2meta.nb_clusters != 0) {
> QLIST_REMOVE(&acb->l2meta, next_in_flight);
> }
> done:
> - acb->common.cb(acb->common.opaque, ret);
> - qemu_iovec_destroy(&acb->hd_qiov);
> - qemu_aio_release(acb);
> + acb->ret = ret;
> + qcow2_schedule_bh(qcow2_aio_bh, acb);
> + return 0;
> }
>
> static BlockDriverAIOCB *qcow2_aio_writev(BlockDriverState *bs,
> @@ -703,16 +718,10 @@ static BlockDriverAIOCB *qcow2_aio_writev(BlockDriverState *bs,
> void *opaque)
> {
> BDRVQcowState *s = bs->opaque;
> - QCowAIOCB *acb;
>
> s->cluster_cache_offset = -1; /* disable compressed cache */
>
> - acb = qcow2_aio_setup(bs, sector_num, qiov, nb_sectors, cb, opaque, 1);
> - if (!acb)
> - return NULL;
> -
> - qcow2_aio_write_cb(acb, 0);
> - return &acb->common;
> + return qcow2_aio_setup(bs, sector_num, qiov, nb_sectors, cb, opaque, 1);
> }
>
> static void qcow2_close(BlockDriverState *bs)
> @@ -824,6 +833,7 @@ static int qcow2_change_backing_file(BlockDriverState *bs,
> return qcow2_update_ext_header(bs, backing_file, backing_fmt);
> }
>
> +/* TODO did we break this for coroutines? */
> static int preallocate(BlockDriverState *bs)
> {
> uint64_t nb_sectors;
>
Thread overview: 46+ messages
2011-01-22 9:29 [Qemu-devel] [RFC][PATCH 00/12] qcow2: Convert qcow2 to use coroutines for async I/O Stefan Hajnoczi
2011-01-22 9:29 ` [Qemu-devel] [RFC][PATCH 01/12] coroutine: Add gtk-vnc coroutines library Stefan Hajnoczi
2011-01-26 15:25 ` Avi Kivity
2011-01-26 16:00 ` Anthony Liguori
2011-01-26 16:13 ` Avi Kivity
2011-01-26 16:19 ` Anthony Liguori
2011-01-26 16:22 ` Avi Kivity
2011-01-26 16:29 ` Anthony Liguori
2011-01-26 16:21 ` Anthony Liguori
2011-01-22 9:29 ` [Qemu-devel] [RFC][PATCH 02/12] continuation: Fix container_of() redefinition Stefan Hajnoczi
2011-01-22 9:29 ` [Qemu-devel] [RFC][PATCH 03/12] Make sure to release allocated stack when coroutine is released Stefan Hajnoczi
2011-01-22 9:29 ` [Qemu-devel] [RFC][PATCH 04/12] coroutine: Use thread-local leader and current variables Stefan Hajnoczi
2011-01-22 9:29 ` [Qemu-devel] [RFC][PATCH 05/12] coroutine: Add coroutines Stefan Hajnoczi
2011-01-26 15:29 ` Avi Kivity
2011-01-26 16:00 ` Anthony Liguori
2011-01-27 9:40 ` Stefan Hajnoczi
2011-01-22 9:29 ` [Qemu-devel] [RFC][PATCH 06/12] coroutine: Add qemu_coroutine_self() Stefan Hajnoczi
2011-01-22 9:29 ` [Qemu-devel] [RFC][PATCH 07/12] coroutine: Add coroutine_is_leader() Stefan Hajnoczi
2011-01-22 9:29 ` [Qemu-devel] [RFC][PATCH 08/12] coroutine: Add qemu_in_coroutine() Stefan Hajnoczi
2011-01-22 9:29 ` [Qemu-devel] [RFC][PATCH 09/12] block: Add bdrv_co_readv() and bdrv_co_writev() Stefan Hajnoczi
2011-01-22 9:29 ` [Qemu-devel] [RFC][PATCH 10/12] block: Add coroutine support to synchronous I/O functions Stefan Hajnoczi
2011-01-22 9:29 ` [Qemu-devel] [RFC][PATCH 11/12] qcow2: Convert qcow2 to use coroutines for async I/O Stefan Hajnoczi
2011-01-23 23:40 ` Anthony Liguori [this message]
2011-01-24 11:09 ` Stefan Hajnoczi
2011-01-26 15:40 ` Avi Kivity
2011-01-26 15:50 ` Kevin Wolf
2011-01-26 16:08 ` Anthony Liguori
2011-01-26 16:13 ` Avi Kivity
2011-01-26 16:28 ` Anthony Liguori
2011-01-26 16:38 ` Avi Kivity
2011-01-26 17:12 ` Anthony Liguori
2011-01-27 9:25 ` Avi Kivity
2011-01-27 9:27 ` Kevin Wolf
2011-01-27 9:49 ` Avi Kivity
2011-01-27 10:34 ` Kevin Wolf
2011-01-27 10:41 ` Avi Kivity
2011-01-27 11:27 ` Kevin Wolf
2011-01-27 12:21 ` Avi Kivity
2011-01-26 16:08 ` Avi Kivity
2011-01-27 10:09 ` Stefan Hajnoczi
2011-01-27 10:46 ` Avi Kivity
2011-01-22 9:29 ` [Qemu-devel] [RFC][PATCH 12/12] qcow2: Serialize all requests Stefan Hajnoczi
2011-01-23 23:31 ` [Qemu-devel] [RFC][PATCH 00/12] qcow2: Convert qcow2 to use coroutines for async I/O Anthony Liguori
2011-02-01 13:23 ` Kevin Wolf
2011-01-24 11:58 ` [Qemu-devel] " Kevin Wolf
2011-01-24 13:10 ` Stefan Hajnoczi