qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: Paolo Bonzini <pbonzini@redhat.com>
To: Charlie Shepherd <charlie@ctshepherd.com>
Cc: kwolf@redhat.com, stefanha@gmail.com, gabriel@kerneis.info,
	qemu-devel@nongnu.org
Subject: Re: [Qemu-devel] [PATCH 1/2] Make cow_co_is_allocated and cow_update_bitmap more efficient
Date: Wed, 21 Aug 2013 10:14:37 +0200	[thread overview]
Message-ID: <521476ED.5030308@redhat.com> (raw)
In-Reply-To: <5213F37C.8090300@ctshepherd.com>

Il 21/08/2013 00:53, Charlie Shepherd ha scritto:
> On 20/08/2013 21:48, Paolo Bonzini wrote:
>> Il 20/08/2013 20:34, Charlie Shepherd ha scritto:
>>>   /* Return true if first block has been changed (ie. current version is
>>> @@ -146,40 +114,82 @@ static inline int is_bit_set(BlockDriverState
>>> *bs, int64_t bitnum)
>>>   static int coroutine_fn cow_co_is_allocated(BlockDriverState *bs,
>>>           int64_t sector_num, int nb_sectors, int *num_same)
>>>   {
>>> -    int changed;
>>> +    int ret, changed;
>>> +    uint64_t offset = sizeof(struct cow_header_v2) + sector_num / 8;
>>> +
>>> +    int init_bits = (sector_num % 8) ? (8 - (sector_num % 8)) : 0;
>>> +    int remaining = sector_num - init_bits;
>>> +    int full_bytes = remaining / 8;
>>> +    int trail = remaining % 8;
>>> +
>>> +    int len = !!init_bits + full_bytes + !!trail;
>>> +    uint8_t bitmap[len];
>> This is a basically unbounded allocation on the stack.  You should split
>> this in smaller ranges using the "num_same" argument, which is what I
>> did in my patch.
> 
> So if I understand your patch correctly, you read the next 512 bytes
> (ie, one BDRV_SECTOR_SIZE) after offset into bitmap? Is this guaranteed
> to be safe (like if the file isn't that long)?

Yes, because the bitmap is always before the data, and always rounded to
full sector size so that the data stays aligned:

    bitmap_size = ((bs->total_sectors + 7) >> 3) + sizeof(cow_header);
    s->cow_sectors_offset = (bitmap_size + 511) & ~511;

> What if nb_sectors > 512 * 8?

For cow_co_is_allocated, you have the luxury of returning information
only for the fewer than nb_sectors.  That is, you can set *num_same to a
smaller value than nb_sectors, even if sector_num + *num_same has the
same state as the [sector_num, sector_num + *num_same) range.  It will
cause extra calls to is_allocated in the callers, but that's it.

> I think it's best to use your version of cow_co_is_allocated(), but
> those are the questions that come to mind when trying to convert the
> stack allocation in cow_update_bitmap()

Good point.

>>> +    ret = bdrv_pread(bs->file, offset, buf, len);
>>> +    if (ret < 0) {
>>> +        return ret;
>>> +    }
>>> +
>>> +    /* Do sector_num -> nearest byte boundary */
>>> +    if (init_bits) {
>>> +        /* This sets the highest init_bits bits in the byte */
>>> +        uint8_t bits = ((1 << init_bits) - 1) << (8 - init_bits);
>>> +        buf[0] |= bits;
>>> +    }
>>> +
>>> +    if (full_bytes) {
>>> +        memset(&buf[!!init_bits], ~0, full_bytes);
>>> +    }
>>> +
>>> +    /* Set the trailing bits in the final byte */
>>> +    if (trail) {
>>> +        /* This sets the lowest trail bits in the byte */
>>> +        uint8_t bits = (1 << trail) - 1;
>>> +        buf[len - 1] |= bits;
>>> +    }
>> ... and you should also check if there is a change in the bits, and skip
>> the flush if there is no change.  Flushing a multi-megabyte write is
>> very expensive.  It basically makes format=cow as slow as
>> format=raw,cache=writethrough.
> 
> So if ORing the allocation makes no difference, don't flush?

Yep!  This means if an image is fully COW-ed, there will be no extra
flush (only extra reads).

I already did this, but in a very inefficient way because each bit would
be read separately.

Paolo

> 
> Charlie
>>> +    ret = bdrv_pwrite(bs->file, offset, buf, len);
>>> +    if (ret < 0) {
>>> +        return ret;
>>>       }
>>>   -    return error;
>>> +    return 0;
>>>   }
>>>     static int coroutine_fn cow_read(BlockDriverState *bs, int64_t
>>> sector_num,
>>> @@ -237,6 +247,13 @@ static int cow_write(BlockDriverState *bs,
>>> int64_t sector_num,
>>>           return ret;
>>>       }
>>>   +    /* We need to flush the data before writing the metadata so
>>> that there is
>>> +     * no chance of metadata referring to data that doesn't exist. */
>>> +    ret = bdrv_flush(bs->file);
>>> +    if (ret < 0) {
>>> +        return ret;
>>> +    }
>> See above about this flush.
>>
>> Paolo
>>
>>>       return cow_update_bitmap(bs, sector_num, nb_sectors);
>>>   }
>>>  
> 
> 
> 

  reply	other threads:[~2013-08-21  8:15 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-08-20 18:34 [Qemu-devel] [PATCH 1/2] Make cow_co_is_allocated and cow_update_bitmap more efficient Charlie Shepherd
2013-08-20 18:37 ` Charlie Shepherd
2013-08-20 20:48 ` Paolo Bonzini
2013-08-20 22:53   ` Charlie Shepherd
2013-08-21  8:14     ` Paolo Bonzini [this message]
2013-08-21  9:11       ` Charlie Shepherd
2013-08-21  9:19         ` Paolo Bonzini
2013-08-21 10:31           ` Charlie Shepherd
2013-08-21 10:38             ` Paolo Bonzini

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=521476ED.5030308@redhat.com \
    --to=pbonzini@redhat.com \
    --cc=charlie@ctshepherd.com \
    --cc=gabriel@kerneis.info \
    --cc=kwolf@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=stefanha@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).