From: Paolo Bonzini <pbonzini@redhat.com>
To: Charlie Shepherd <charlie@ctshepherd.com>
Cc: kwolf@redhat.com, stefanha@gmail.com, gabriel@kerneis.info,
qemu-devel@nongnu.org
Subject: Re: [Qemu-devel] [PATCH 1/2] Make cow_co_is_allocated and cow_update_bitmap more efficient
Date: Wed, 21 Aug 2013 10:14:37 +0200 [thread overview]
Message-ID: <521476ED.5030308@redhat.com> (raw)
In-Reply-To: <5213F37C.8090300@ctshepherd.com>
Il 21/08/2013 00:53, Charlie Shepherd ha scritto:
> On 20/08/2013 21:48, Paolo Bonzini wrote:
>> Il 20/08/2013 20:34, Charlie Shepherd ha scritto:
>>> /* Return true if first block has been changed (ie. current version is
>>> @@ -146,40 +114,82 @@ static inline int is_bit_set(BlockDriverState
>>> *bs, int64_t bitnum)
>>> static int coroutine_fn cow_co_is_allocated(BlockDriverState *bs,
>>> int64_t sector_num, int nb_sectors, int *num_same)
>>> {
>>> - int changed;
>>> + int ret, changed;
>>> + uint64_t offset = sizeof(struct cow_header_v2) + sector_num / 8;
>>> +
>>> + int init_bits = (sector_num % 8) ? (8 - (sector_num % 8)) : 0;
>>> + int remaining = sector_num - init_bits;
>>> + int full_bytes = remaining / 8;
>>> + int trail = remaining % 8;
>>> +
>>> + int len = !!init_bits + full_bytes + !!trail;
>>> + uint8_t bitmap[len];
>> This is a basically unbounded allocation on the stack. You should split
>> this in smaller ranges using the "num_same" argument, which is what I
>> did in my patch.
>
> So if I understand your patch correctly, you read the next 512 bytes
> (ie, one BDRV_SECTOR_SIZE) after offset into bitmap? Is this guaranteed
> to be safe (like if the file isn't that long)?
Yes, because the bitmap is always before the data, and always rounded to
full sector size so that the data stays aligned:
bitmap_size = ((bs->total_sectors + 7) >> 3) + sizeof(cow_header);
s->cow_sectors_offset = (bitmap_size + 511) & ~511;
> What if nb_sectors > 512 * 8?
For cow_co_is_allocated, you have the luxury of returning information
only for the fewer than nb_sectors. That is, you can set *num_same to a
smaller value than nb_sectors, even if sector_num + *num_same has the
same state as the [sector_num, sector_num + *num_same) range. It will
cause extra calls to is_allocated in the callers, but that's it.
> I think it's best to use your version of cow_co_is_allocated(), but
> those are the questions that come to mind when trying to convert the
> stack allocation in cow_update_bitmap()
Good point.
>>> + ret = bdrv_pread(bs->file, offset, buf, len);
>>> + if (ret < 0) {
>>> + return ret;
>>> + }
>>> +
>>> + /* Do sector_num -> nearest byte boundary */
>>> + if (init_bits) {
>>> + /* This sets the highest init_bits bits in the byte */
>>> + uint8_t bits = ((1 << init_bits) - 1) << (8 - init_bits);
>>> + buf[0] |= bits;
>>> + }
>>> +
>>> + if (full_bytes) {
>>> + memset(&buf[!!init_bits], ~0, full_bytes);
>>> + }
>>> +
>>> + /* Set the trailing bits in the final byte */
>>> + if (trail) {
>>> + /* This sets the lowest trail bits in the byte */
>>> + uint8_t bits = (1 << trail) - 1;
>>> + buf[len - 1] |= bits;
>>> + }
>> ... and you should also check if there is a change in the bits, and skip
>> the flush if there is no change. Flushing a multi-megabyte write is
>> very expensive. It basically makes format=cow as slow as
>> format=raw,cache=writethrough.
>
> So if ORing the allocation makes no difference, don't flush?
Yep! This means if an image is fully COW-ed, there will be no extra
flush (only extra reads).
I already did this, but in a very inefficient way because each bit would
be read separately.
Paolo
>
> Charlie
>>> + ret = bdrv_pwrite(bs->file, offset, buf, len);
>>> + if (ret < 0) {
>>> + return ret;
>>> }
>>> - return error;
>>> + return 0;
>>> }
>>> static int coroutine_fn cow_read(BlockDriverState *bs, int64_t
>>> sector_num,
>>> @@ -237,6 +247,13 @@ static int cow_write(BlockDriverState *bs,
>>> int64_t sector_num,
>>> return ret;
>>> }
>>> + /* We need to flush the data before writing the metadata so
>>> that there is
>>> + * no chance of metadata referring to data that doesn't exist. */
>>> + ret = bdrv_flush(bs->file);
>>> + if (ret < 0) {
>>> + return ret;
>>> + }
>> See above about this flush.
>>
>> Paolo
>>
>>> return cow_update_bitmap(bs, sector_num, nb_sectors);
>>> }
>>>
>
>
>
next prev parent reply other threads:[~2013-08-21 8:15 UTC|newest]
Thread overview: 9+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-08-20 18:34 [Qemu-devel] [PATCH 1/2] Make cow_co_is_allocated and cow_update_bitmap more efficient Charlie Shepherd
2013-08-20 18:37 ` Charlie Shepherd
2013-08-20 20:48 ` Paolo Bonzini
2013-08-20 22:53 ` Charlie Shepherd
2013-08-21 8:14 ` Paolo Bonzini [this message]
2013-08-21 9:11 ` Charlie Shepherd
2013-08-21 9:19 ` Paolo Bonzini
2013-08-21 10:31 ` Charlie Shepherd
2013-08-21 10:38 ` Paolo Bonzini
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=521476ED.5030308@redhat.com \
--to=pbonzini@redhat.com \
--cc=charlie@ctshepherd.com \
--cc=gabriel@kerneis.info \
--cc=kwolf@redhat.com \
--cc=qemu-devel@nongnu.org \
--cc=stefanha@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.