From: Vladimir Sementsov-Ogievskiy <vsementsov@parallels.com>
To: John Snow <jsnow@redhat.com>, qemu-devel@nongnu.org
Cc: kwolf@redhat.com, den@openvz.org, stefanha@redhat.com,
	"pbonzini >> Paolo Bonzini" <pbonzini@redhat.com>
Subject: Re: [Qemu-devel] [PATCH 4/9] hbitmap: store / restore
Date: Tue, 13 Jan 2015 15:59:50 +0300	[thread overview]
Message-ID: <54B516C6.7010403@parallels.com>
In-Reply-To: <54AEF4E8.5070106@redhat.com>


Best regards,
Vladimir

On 09.01.2015 00:21, John Snow wrote:
>
>
> On 12/11/2014 09:17 AM, Vladimir Sementsov-Ogievskiy wrote:
>> Functions to store / restore HBitmap. HBitmap should be saved to a linear
>> bitmap format independently of endianness.
>>
>> Because restoring is done in several steps, each step writes only the last
>> level of the bitmap. All other levels are rebuilt by
>> hbitmap_restore_finish as the last step of restoring, so the HBitmap is
>> inconsistent while it is being restored.
.....
>
> I guess by nature of how bitmap migration has been implemented, we 
> cannot help but restore parts of the bitmap piece by piece, which 
> requires us to force the bitmap into an inconsistent state.
>
> Yuck.
>
> It might be "nice" to turn on a disable bit inside the hbitmap that 
> disallows its use until it is made consistent again, but I don't know 
> what the performance hit of the extra conditionals everywhere would be 
> like.
>

Another approach is to rebuild the whole bitmap after each piece is set. It
is a bit slower, of course, but the bitmap will be consistent after every
hbitmap_restore_data() call.
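
I.e., roughly, on the caller side (just a sketch using the function names
from this patch):

    hbitmap_restore_data(hb, buf, start, count);  /* fill the last level  */
    hbitmap_restore_finish(hb);                   /* rebuild upper levels */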

>> +/**
>> + * hbitmap_restore_finish
>> + * @hb: HBitmap to operate on.
>> + *
>> + * Repair HBitmap after calling hbitmap_restore_data. Actually, all HBitmap
>> + * layers are restored here.
>> + */
>> +void hbitmap_restore_finish(HBitmap *hb);
>> +
>> +/**
>>    * hbitmap_free:
>>    * @hb: HBitmap to operate on.
>>    *
>
> These are just biased opinions:
>
> - It might be nice to name the store/restore functions "serialize" and 
> "deserialize," to describe exactly what they are doing.
>
> - I might refer to "restore_finish" as "post_load" or "post_restore" 
> or something similar to mimic how device migration functions are 
> named. I think hbitmap_restore_data_finalize would also be fine; the 
> key part here is clearly naming it relative to "restore_data."
>

Hmm. Ok, what about the following set:
     hbitmap_serialize()
     hbitmap_deserialize_part()
     hbitmap_deserialize_finish()
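
For clarity, the prototypes would then look roughly like this (hypothetical
sketch only; the parameters simply mirror the current store/restore
functions):

    void hbitmap_serialize(const HBitmap *hb, uint8_t *buf,
                           uint64_t start, uint64_t count);
    void hbitmap_deserialize_part(HBitmap *hb, uint8_t *buf,
                                  uint64_t start, uint64_t count);
    void hbitmap_deserialize_finish(HBitmap *hb);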

>> diff --git a/util/hbitmap.c b/util/hbitmap.c
>> index 8aa7406..ac0323f 100644
>> --- a/util/hbitmap.c
>> +++ b/util/hbitmap.c
>> @@ -362,6 +362,90 @@ bool hbitmap_get(const HBitmap *hb, uint64_t item)
>>       return (hb->levels[HBITMAP_LEVELS - 1][pos >> BITS_PER_LEVEL] & 
>> bit) != 0;
>>   }
>>
>> +uint64_t hbitmap_data_size(const HBitmap *hb, uint64_t count)
>> +{
>> +    uint64_t size;
>> +
>> +    if (count == 0) {
>> +        return 0;
>> +    }
>> +
>> +    size = (((count - 1) >> hb->granularity) >> BITS_PER_LEVEL) + 1;
>> +
>> +    return size * sizeof(unsigned long);
>> +}
>> +
>
> This seems flawed to me: a number of bits without an offset can't be
> mapped to a number of real bytes, because "two bits" may or may not 
> cross a granularity boundary, depending on *WHERE* you start counting.
>
> e.g.
>
> granularity = 1 (i.e. 2^1 = 2 virtual bits per 1 real bit)
> virtual: 001100
> real:     0 1 0
>
> The amount of space required to hold "two bits" here could be as 
> little as one bit, if the offset is k={0,2,4} but it could be as much 
> as two bits if the offset is k={1,3}.
>
> You may never use the function in this way, but mapping virtual bits 
> to an implementation byte-size seems like it is inviting an 
> inconsistency.
I dislike this function too, but unfortunately we need the size in bytes
used for serialization.

Hmm. Ok, without loss of generality, let offset be less than the
granularity chunk, i.e. offset < (1 << granularity). The precise formula
should look like:

size = (((offset + count - 1) >> hb->granularity) >> BITS_PER_LEVEL) + 1;

So,

size = ((((1 << hb->granularity) + count - 2) >> hb->granularity) >> BITS_PER_LEVEL) + 1;

which covers the worst-case offset, would be enough in any case. Ok?
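
That is, the helper would become something like this (a sketch only, using
the worst-case formula above):

    uint64_t hbitmap_data_size(const HBitmap *hb, uint64_t count)
    {
        uint64_t size;

        if (count == 0) {
            return 0;
        }

        /* worst case over all offsets, i.e. offset == (1 << granularity) - 1 */
        size = ((((uint64_t)1 << hb->granularity) + count - 2)
                >> hb->granularity) >> BITS_PER_LEVEL;

        return (size + 1) * sizeof(unsigned long);
    }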



>
>> +void hbitmap_store_data(const HBitmap *hb, uint8_t *buf,
>> +                        uint64_t start, uint64_t count)
>> +{
>> +    uint64_t last = start + count - 1;
>> +    unsigned long *out = (unsigned long *)buf;
>> +
>> +    if (count == 0) {
>> +        return;
>> +    }
>> +
>> +    start = (start >> hb->granularity) >> BITS_PER_LEVEL;
>> +    last = (last >> hb->granularity) >> BITS_PER_LEVEL;
>> +    count = last - start + 1;
>> +
>> +#ifdef __BIG_ENDIAN_BITFIELD
>> +    for (i = start; i <= last; ++i) {
>> +        unsigned long el = hb->levels[HBITMAP_LEVELS - 1][i];
>> +        out[i] = (BITS_PER_LONG == 32 ? cpu_to_le32(el) : 
>> cpu_to_le64(el));
>> +    }
>> +#else
>> +    memcpy(out, &hb->levels[HBITMAP_LEVELS - 1][start],
>> +           count * sizeof(unsigned long));
>> +#endif
>> +}
>> +
>
> I suppose the ifdefs are trying to take an optimization by using 
> memcpy if at all possible, without a conversion.
>
> Why are the conversions to little endian, though? Shouldn't we be 
> serializing to a Big Endian format?
As Paolo already explained, it is for portability: the LE representation of
the bitmap is independent of the size of the bitmap array elements, which
may be 32-bit, 64-bit, or anything else, as long as the LE format is kept.
>
>> +void hbitmap_restore_data(HBitmap *hb, uint8_t *buf,
>> +                          uint64_t start, uint64_t count)
>> +{
>> +    uint64_t last = start + count - 1;
>> +    unsigned long *in = (unsigned long *)buf;
>> +
>> +    if (count == 0) {
>> +        return;
>> +    }
>> +
>> +    start = (start >> hb->granularity) >> BITS_PER_LEVEL;
>> +    last = (last >> hb->granularity) >> BITS_PER_LEVEL;
>> +    count = last - start + 1;
>> +
>> +#ifdef __BIG_ENDIAN_BITFIELD
>> +    for (i = start; i <= last; ++i) {
>> +        hb->levels[HBITMAP_LEVELS - 1][i] =
>> +            (BITS_PER_LONG == 32 ? be32_to_cpu(in[i]) : 
>> be64_to_cpu(in[i]));
>> +    }
>> +#else
>> +    memcpy(&hb->levels[HBITMAP_LEVELS - 1][start], in,
>> +           count * sizeof(unsigned long));
>> +#endif
>> +}
>> +
>
> ...? We're storing as LE but restoring from BE? I'm confused.
Oh, it's a mistake. Thanks. I missed it when testing because the
#ifdef __BIG_ENDIAN_BITFIELD branch was never compiled.
>
> I'm also not clear on the __BIG_ENDIAN_BITFIELD macro. Why do we want 
> to pack differently based on how C-bitfields are packed by the 
> compiler? I don't think that has any impact on how longs are stored 
> (always in the host native format.)
>
> I would rather see the serialize/deserialize always convert to and 
> from BE explicitly with cpu_to_be[32|64] and be[32|64]_to_cpu. I think 
> trying to optimize the pack/unpack in the no-op condition with a #define
> and a memcpy is inviting portability problems.
Ok. I'll leave only the for(...) branch in v2.
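
I.e., the store side would end up with just something like this (a sketch
only; it keeps the LE convention discussed above, and the restore side
would mirror it with le32_to_cpu / le64_to_cpu):

    uint64_t i;

    for (i = start; i <= last; ++i) {
        unsigned long el = hb->levels[HBITMAP_LEVELS - 1][i];
        /* the output buffer is indexed from zero, independently of start */
        out[i - start] = (BITS_PER_LONG == 32 ? cpu_to_le32(el)
                                              : cpu_to_le64(el));
    }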
>
>> +void hbitmap_restore_finish(HBitmap *bitmap)
>> +{
>> +    int64_t i, size;
>> +    int lev, j;
>> +
>> +    /* restore levels starting from penultimate to zero level, assuming
>> +     * that the last one is ok */
>> +    size = MAX((bitmap->size + BITS_PER_LONG - 1) >> BITS_PER_LEVEL, 
>> 1);
>> +    for (lev = HBITMAP_LEVELS - 1; lev-- > 0; ) {
>> +        size = MAX((size + BITS_PER_LONG - 1) >> BITS_PER_LEVEL, 1);
>> +        for (i = 0; i < size; ++i) {
>> +            bitmap->levels[lev][i] = 0;
>> +            for (j = 0; j < BITS_PER_LONG; ++j) {
>> +                if (bitmap->levels[lev+1][(i << BITS_PER_LEVEL) + j]) {
>> +                    bitmap->levels[lev][i] |=  (1 << j);
>> +                }
>> +            }
>
> Just a musing: Should we cache the size of each level? I know we can 
> keep recalculating it, but... why bother? We recalculate it in so many 
> places now.
Interesting point.

However, there is another real bug: there are may be no longs on the 
next level for last bits of the current one. The new version of this 
function will be in v2.
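
Roughly, the idea is to clamp the inner loop to the real length of the next
level (a sketch only; the actual v2 code may differ):

    void hbitmap_restore_finish(HBitmap *bitmap)
    {
        int64_t i, size, next_size;
        int lev, j;

        /* length of the last level, in longs */
        size = MAX((bitmap->size + BITS_PER_LONG - 1) >> BITS_PER_LEVEL, 1);

        /* rebuild the levels from the penultimate one up to level zero,
         * assuming the last level is already correct */
        for (lev = HBITMAP_LEVELS - 1; lev-- > 0; ) {
            next_size = size;  /* length of level lev + 1 */
            size = MAX((size + BITS_PER_LONG - 1) >> BITS_PER_LEVEL, 1);
            for (i = 0; i < size; ++i) {
                bitmap->levels[lev][i] = 0;
                for (j = 0; j < BITS_PER_LONG &&
                            (i << BITS_PER_LEVEL) + j < next_size; ++j) {
                    if (bitmap->levels[lev + 1][(i << BITS_PER_LEVEL) + j]) {
                        bitmap->levels[lev][i] |= 1UL << j;  /* 1UL: safe for 64-bit longs */
                    }
                }
            }
        }
    }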
>
>> +        }
>> +    }
>> +}
>> +
>>   void hbitmap_free(HBitmap *hb)
>>   {
>>       unsigned i;
>>

