All of lore.kernel.org
 help / color / mirror / Atom feed
From: Kevin Wolf <kwolf@redhat.com>
To: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Cc: Anthony Liguori <aliguori@us.ibm.com>,
	Eric Blake <eblake@redhat.com>,
	qemu-devel@nongnu.org, Avi Kivity <avi@redhat.com>
Subject: [Qemu-devel] Re: [PATCH v5 4/5] qed: Read/write support
Date: Fri, 03 Dec 2010 17:06:25 +0100	[thread overview]
Message-ID: <4CF91581.40705@redhat.com> (raw)
In-Reply-To: <1290597076-18952-5-git-send-email-stefanha@linux.vnet.ibm.com>

Am 24.11.2010 12:11, schrieb Stefan Hajnoczi:
> This patch implements the read/write state machine.  Operations are
> fully asynchronous and multiple operations may be active at any time.
> 
> Allocating writes lock tables to ensure metadata updates do not
> interfere with each other.  If two allocating writes need to update the
> same L2 table they will run sequentially.  If two allocating writes need
> to update different L2 tables they will run in parallel.
> 
> Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
> ---
>  Makefile.objs    |    1 +
>  block/qed-lock.c |  124 ++++++++++
>  block/qed.c      |  667 +++++++++++++++++++++++++++++++++++++++++++++++++++++-
>  block/qed.h      |   43 ++++
>  trace-events     |   10 +
>  5 files changed, 843 insertions(+), 2 deletions(-)
>  create mode 100644 block/qed-lock.c

> diff --git a/block/qed.c b/block/qed.c
> index cd1bead..1513763 100644
> --- a/block/qed.c
> +++ b/block/qed.c
> @@ -12,8 +12,26 @@
>   *
>   */
>  
> +#include "trace.h"
>  #include "qed.h"
>  
> +static void qed_aio_cancel(BlockDriverAIOCB *blockacb)
> +{
> +    QEDAIOCB *acb = (QEDAIOCB *)blockacb;
> +    bool finished = false;
> +
> +    /* Wait for the request to finish */
> +    acb->finished = &finished;
> +    while (!finished) {
> +        qemu_aio_wait();
> +    }
> +}

Hm, you don't even try to cancel? I wonder how useful the individual
bdrv_aio_cancel implementations actually are when nobody implements
cancellation. It seems to be pretty hard to do it right.

Maybe we should consider implementing a default bdrv_aio_cancel
implementation in block.c that waits for completion?

> +/**
> + * Construct an iovec array for a given length
> + *
> + * @acb:        I/O request
> + * @len:        Maximum number of bytes
> + *
> + * This function can be called several times to build subset iovec arrays of
> + * acb->qiov.  For example:
> + *
> + *   acb->qiov->iov[] = {{0x100000, 1024},
> + *                       {0x200000, 1024}}
> + *
> + *   qed_acb_build_qiov(acb, 512) =>
> + *                      {{0x100000, 512}}
> + *
> + *   qed_acb_build_qiov(acb, 1024) =>
> + *                      {{0x100200, 512},
> + *                       {0x200000, 512}}
> + *
> + *   qed_acb_build_qiov(acb, 512) =>
> + *                      {{0x200200, 512}}
> + */
> +static void qed_acb_build_qiov(QEDAIOCB *acb, size_t len)
> +{
> +    struct iovec *iov_end = &acb->qiov->iov[acb->qiov->niov];
> +    size_t iov_offset = acb->cur_iov_offset;
> +    struct iovec *iov = acb->cur_iov;
> +
> +    while (iov != iov_end && len > 0) {
> +        size_t nbytes = MIN(iov->iov_len - iov_offset, len);
> +
> +        qemu_iovec_add(&acb->cur_qiov, iov->iov_base + iov_offset, nbytes);
> +        iov_offset += nbytes;
> +        len -= nbytes;
> +
> +        if (iov_offset >= iov->iov_len) {
> +            iov_offset = 0;
> +            iov++;
> +        }
> +    }
> +
> +    /* Stash state for next time */
> +    acb->cur_iov = iov;
> +    acb->cur_iov_offset = iov_offset;
> +}

Wouldn't it be much simpler to just store the offset into acb->qiov and
to use qemu_iovec_copy to get the subset qiov?

> +/**
> + * Write data to the image file
> + */
> +static void qed_aio_write_main(void *opaque, int ret)
> +{
> +    QEDAIOCB *acb = opaque;
> +    BDRVQEDState *s = acb_to_s(acb);
> +    uint64_t offset = acb->cur_cluster;
> +    BlockDriverCompletionFunc *next_fn;
> +    BlockDriverAIOCB *file_acb;
> +
> +    trace_qed_aio_write_main(s, acb, ret, offset, acb->cur_qiov.size);

Why does the trace use a different offset...

> +
> +    if (ret) {
> +        qed_aio_complete(acb, ret);
> +        return;
> +    }
> +
> +    if (acb->find_cluster_ret == QED_CLUSTER_FOUND) {
> +        next_fn = qed_aio_next_io;
> +    } else {
> +        if (s->bs->backing_hd) {
> +            next_fn = qed_aio_write_flush_before_l2_update;
> +        } else {
> +            next_fn = qed_aio_write_l2_update;
> +        }
> +    }
> +
> +    offset += qed_offset_into_cluster(s, acb->cur_pos);

...than the write uses?

I missed this line at first because offset is initialized above, so I
didn't expect that this was only half of its initialization.

> +    BLKDBG_EVENT(s->bs->file, BLKDBG_WRITE_AIO);
> +    file_acb = bdrv_aio_writev(s->bs->file, offset / BDRV_SECTOR_SIZE,
> +                               &acb->cur_qiov,
> +                               acb->cur_qiov.size / BDRV_SECTOR_SIZE,
> +                               next_fn, acb);
> +    if (!file_acb) {
> +        qed_aio_complete(acb, -EIO);
> +    }
> +}
> +
> +/**
> + * Populate back untouched region of new data cluster
> + */
> +static void qed_aio_write_postfill(void *opaque, int ret)
> +{
> +    QEDAIOCB *acb = opaque;
> +    BDRVQEDState *s = acb_to_s(acb);
> +    uint64_t start = acb->cur_pos + acb->cur_qiov.size;
> +    uint64_t len = qed_start_of_cluster(s, start + s->header.cluster_size - 1) - start;
> +    uint64_t offset = acb->cur_cluster + qed_offset_into_cluster(s, acb->cur_pos) + acb->cur_qiov.size;

This look like more than 80 characters.

Kevin

  reply	other threads:[~2010-12-03 16:05 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-11-24 11:11 [Qemu-devel] [PATCH v5 0/5] qed: Add QEMU Enhanced Disk format Stefan Hajnoczi
2010-11-24 11:11 ` [Qemu-devel] [PATCH v5 1/5] docs: Add QED image format specification Stefan Hajnoczi
2010-11-24 11:11 ` [Qemu-devel] [PATCH v5 2/5] qed: Add QEMU Enhanced Disk image format Stefan Hajnoczi
2010-11-24 11:11 ` [Qemu-devel] [PATCH v5 3/5] qed: Table, L2 cache, and cluster functions Stefan Hajnoczi
2010-11-24 11:11 ` [Qemu-devel] [PATCH v5 4/5] qed: Read/write support Stefan Hajnoczi
2010-12-03 16:06   ` Kevin Wolf [this message]
2010-12-03 20:29     ` [Qemu-devel] " Stefan Hajnoczi
2010-11-24 11:11 ` [Qemu-devel] [PATCH v5 5/5] qed: Consistency check support Stefan Hajnoczi
2010-11-24 14:43 ` [Qemu-devel] [PATCH v5 0/5] qed: Add QEMU Enhanced Disk format Stefan Hajnoczi

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4CF91581.40705@redhat.com \
    --to=kwolf@redhat.com \
    --cc=aliguori@us.ibm.com \
    --cc=avi@redhat.com \
    --cc=eblake@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=stefanha@linux.vnet.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.