From: Kevin Wolf <kwolf@redhat.com>
To: Stefan Hajnoczi <stefanha@gmail.com>
Cc: qemu-devel@nongnu.org
Subject: [Qemu-devel] Re: [RFC PATCH v3 1/4] block: Implement bdrv_aio_pwrite
Date: Thu, 02 Dec 2010 13:30:38 +0100
Message-ID: <4CF7916E.6050006@redhat.com>
In-Reply-To: <AANLkTimepq9aAc1WKRvQzU9h7dZnEcfmSv2C7gpSjvgw@mail.gmail.com>

On 02.12.2010 13:07, Stefan Hajnoczi wrote:
> On Tue, Nov 30, 2010 at 12:48 PM, Kevin Wolf <kwolf@redhat.com> wrote:
>> This implements an asynchronous version of bdrv_pwrite.
>>
>> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
>> ---
>>  block.c |  167 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>  block.h |    2 +
>>  2 files changed, 169 insertions(+), 0 deletions(-)
> 
> Is this function necessary?
> 
> Current synchronous code uses pwrite() so this function makes it easy
> to convert existing code.  But if that code took the block-based
> nature of storage into account then this read-modify-write helper
> isn't needed.

For qcow2, most writes (refcount tables, L2 tables, etc.) are aligned to
512-byte sectors, but there are still some left that use pwrite with an
unaligned count. I'm not completely sure which data is affected, but
qemu-iotests crashed with tmp_buf == NULL, so there are some ;-) Probably
things like header and snapshot table writes.
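
To make the unaligned case concrete, here is a minimal sketch of how a
byte-granular write maps onto sectors. It is not the code from the patch:
WriteSplit and split_write() are hypothetical names, and a fixed 512-byte
sector size is assumed. Any non-zero head_bytes or tail_bytes means that
sector has to be read, patched, and written back.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SECTOR_SIZE 512 /* assumption: 512-byte sectors, as discussed above */

/* Hypothetical description of how one byte-granular write maps to sectors. */
typedef struct {
    int64_t first_sector; /* sector that contains the first written byte */
    size_t head_skip;     /* bytes to preserve before the data in that sector */
    size_t head_bytes;    /* bytes written into a partial first sector, or 0 */
    size_t full_sectors;  /* whole sectors that can be written directly */
    size_t tail_bytes;    /* bytes written into a partial last sector, or 0 */
} WriteSplit;

static WriteSplit split_write(int64_t offset, size_t bytes)
{
    WriteSplit s;

    assert(offset >= 0);

    s.first_sector = offset / SECTOR_SIZE;
    s.head_skip = (size_t)(offset % SECTOR_SIZE);
    s.head_bytes = 0;

    if (s.head_skip != 0) {
        /* The write starts mid-sector: it can fill at most the rest of it. */
        size_t room = SECTOR_SIZE - s.head_skip;
        s.head_bytes = bytes < room ? bytes : room;
        bytes -= s.head_bytes;
    }

    /* Whatever is left starts on a sector boundary. */
    s.full_sectors = bytes / SECTOR_SIZE;
    s.tail_bytes = bytes % SECTOR_SIZE;
    return s;
}
```

For example, split_write(100, 1000) needs RMW on both ends (412 bytes in
the first sector, one full sector, 76 bytes in the last), while
split_write(0, 1024) is two full sectors and needs no read at all.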

I'm not sure what other image formats do (we might want to use
block-queue for them, too, eventually), but usually that means that they
do strange things.

> I guess what I'm saying is that this function should only be used when
> you really need rmw (in many cases with image metadata it can be
> avoided because you have enough metadata cached in memory to do full
> sector writes).  If it turns out we don't need rmw then we can
> eliminate this function.

Maybe what we really should do is completely change the block layer
functions to use bytes as their unit and do any RMW in posix-aio-compat
and linux-aio. Other backends don't need it and without O_DIRECT we
don't even need to do it with files.

Also, using units of 512 bytes is completely arbitrary and may still
involve RMW if the host uses a different sector size.
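
A small sketch of that last point, assuming a hypothetical helper
needs_rmw() that takes the sector size as a parameter instead of
hard-coding 512. The same request can be fully aligned for one sector
size and require RMW for another.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: returns true if a write of 'bytes' bytes at byte
 * position 'offset' does not cover whole sectors of size 'sector_size',
 * i.e. if a backend that can only write whole sectors would have to
 * read-modify-write.
 */
static bool needs_rmw(int64_t offset, int64_t bytes, int64_t sector_size)
{
    assert(sector_size > 0);
    assert(offset >= 0 && bytes >= 0);

    return (offset % sector_size) != 0 || (bytes % sector_size) != 0;
}
```

With this, needs_rmw(512, 512, 512) is false, but the same request on a
host with 4096-byte sectors, needs_rmw(512, 512, 4096), is true.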

>> +    switch (acb->state) {
>> +    case 0: {
>> +        /* Read first sector if needed */
> 
> Please use an enum instead of int literals with comments.  Or you
> could try separate functions and see if the switch statement really
> saves that many lines of code.

Okay, will use an enum.

I think the switch may not save that many lines of code, but it improves
readability because with chained functions (and no forward declarations)
you have to read backwards.
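
Roughly what I have in mind for the enum version, as a sketch only. The
state names are made up, the meaning of states 1, 2 and 4 is my
shorthand rather than a quote from the patch, and the loop here is
synchronous and just counts requests, whereas the real callback returns
after submitting each AIO request and is re-entered on completion.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names for the states that the patch numbers 0..4. */
typedef enum {
    PWRITE_READ_HEAD,  /* read first sector if needed */
    PWRITE_WRITE_HEAD, /* write back the patched first sector */
    PWRITE_WRITE_BODY, /* write all full sectors directly */
    PWRITE_READ_TAIL,  /* read last sector if needed */
    PWRITE_WRITE_TAIL, /* write back the patched last sector */
    PWRITE_DONE
} PwriteState;

typedef struct {
    PwriteState state;
    size_t head_bytes;   /* bytes in a partial first sector, or 0 */
    size_t full_sectors; /* number of whole sectors */
    size_t tail_bytes;   /* bytes in a partial last sector, or 0 */
    int reads;           /* requests that would be submitted */
    int writes;
} PwriteSketch;

/* Walks the states top to bottom, so the code reads in execution order. */
static void pwrite_run(PwriteSketch *p)
{
    assert(p != NULL);

    p->state = PWRITE_READ_HEAD;
    while (p->state != PWRITE_DONE) {
        switch (p->state) {
        case PWRITE_READ_HEAD:
            if (p->head_bytes) {
                p->reads++;
                p->state = PWRITE_WRITE_HEAD;
            } else {
                p->state = PWRITE_WRITE_BODY;
            }
            break;
        case PWRITE_WRITE_HEAD:
            p->writes++;
            p->state = PWRITE_WRITE_BODY;
            break;
        case PWRITE_WRITE_BODY:
            if (p->full_sectors) {
                p->writes++;
            }
            p->state = PWRITE_READ_TAIL;
            break;
        case PWRITE_READ_TAIL:
            if (p->tail_bytes) {
                p->reads++;
                p->state = PWRITE_WRITE_TAIL;
            } else {
                p->state = PWRITE_DONE;
            }
            break;
        case PWRITE_WRITE_TAIL:
            p->writes++;
            p->state = PWRITE_DONE;
            break;
        case PWRITE_DONE:
            break;
        }
    }
}
```

A write with a partial head, one full sector and a partial tail ends up
with two reads and three writes; a fully aligned write ends up with a
single write and no reads.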

>> +    case 3: {
>> +        /* Read last sector if needed */
>> +        if (acb->bytes == 0) {
>> +            goto done;
>> +        }
>> +
>> +        acb->state = 4;
>> +        acb->iov.iov_base = acb->tmp_buf;
> 
> acb->tmp_buf may be NULL here if we took the state transition to 2
> instead of doing 1.

Yup, this is already fixed.

>> +done:
>> +    qemu_free(acb->tmp_buf);
>> +    acb->common.cb(acb->common.opaque, ret);
> 
> Callback not invoked from a BH.  In an error case we might have made
> no blocking calls, i.e. never returned to the caller, so this callback
> can cause reentrancy.

Good point.
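
The idea, as a toy sketch that does not use the real QEMU API: instead of
calling the completion callback directly, record it and let the main loop
run it after the submitting function has returned. In QEMU proper this
would presumably be a bottom half (qemu_bh_new/qemu_bh_schedule); the
Deferred type, complete_deferred(), run_deferred() and record_result()
below are all hypothetical stand-ins.

```c
#include <assert.h>
#include <stddef.h>

typedef void CompletionFunc(void *opaque, int ret);

/* Hypothetical stand-in for a scheduled bottom half. */
typedef struct {
    CompletionFunc *cb;
    void *opaque;
    int ret;
    int pending;
} Deferred;

/* Called where the request finishes: only records the completion. */
static void complete_deferred(Deferred *d, CompletionFunc *cb,
                              void *opaque, int ret)
{
    assert(d != NULL && cb != NULL);

    d->cb = cb;
    d->opaque = opaque;
    d->ret = ret;
    d->pending = 1;
}

/* Called from the main loop, after the submitter has returned. */
static void run_deferred(Deferred *d)
{
    assert(d != NULL);

    if (d->pending) {
        d->pending = 0;
        d->cb(d->opaque, d->ret);
    }
}

/* Example callback: stores the result in the int that opaque points to. */
static void record_result(void *opaque, int ret)
{
    *(int *)opaque = ret;
}
```

Even if the request fails immediately, the caller's callback does not run
until run_deferred() is called, so the caller never sees its callback
invoked from inside its own submit call.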

>> +BlockDriverAIOCB *bdrv_aio_pwrite(BlockDriverState *bs, int64_t offset,
>> +    void* buf, size_t bytes, BlockDriverCompletionFunc *cb, void *opaque)
>> +{
>> +    PwriteAIOCB *acb;
>> +
>> +    acb = qemu_aio_get(&blkqueue_aio_pool, bs, cb, opaque);
>> +    acb->state      = 0;
>> +    acb->offset     = offset;
>> +    acb->buf        = buf;
>> +    acb->bytes      = bytes;
>> +    acb->tmp_buf    = NULL;
>> +
>> +    bdrv_aio_pwrite_cb(acb, 0);
> 
> We're missing the usual !bs->drv, bs->read_only, bdrv_check_request()
> checks here.  Are we okay to wait until calling
> bdrv_aio_readv/bdrv_aio_writev for these checks?

I think we are, but if you prefer, I can copy them here.
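
If we do copy them, the up-front checks would look something like the
sketch below. This is not the block layer code: DeviceSketch,
check_write_request() and the particular errno values are assumptions
for illustration only.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the few fields such a check would look at. */
typedef struct {
    bool has_driver;     /* corresponds to bs->drv != NULL */
    bool read_only;      /* corresponds to bs->read_only */
    int64_t total_bytes; /* device size in bytes */
} DeviceSketch;

/* Returns 0 if the write may proceed, or a negative errno value. */
static int check_write_request(const DeviceSketch *dev,
                               int64_t offset, int64_t bytes)
{
    assert(dev != NULL);

    if (!dev->has_driver) {
        return -ENODEV; /* assumption: error code chosen for the sketch */
    }
    if (dev->read_only) {
        return -EACCES;
    }
    /* Range check written so that offset + bytes cannot overflow. */
    if (offset < 0 || bytes < 0 || offset > dev->total_bytes ||
        bytes > dev->total_bytes - offset) {
        return -EIO;
    }
    return 0;
}
```

Doing this before the first state runs would let an invalid request fail
without allocating tmp_buf or submitting any I/O.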

Kevin
