qemu-devel.nongnu.org archive mirror
From: John Snow <jsnow@redhat.com>
To: Vladimir Sementsov-Ogievskiy <vsementsov@parallels.com>,
	qemu-devel@nongnu.org
Cc: kwolf@redhat.com, den@openvz.org, stefanha@redhat.com,
	"pbonzini >> Paolo Bonzini" <pbonzini@redhat.com>
Subject: Re: [Qemu-devel] [PATCH 4/9] hbitmap: store / restore
Date: Tue, 13 Jan 2015 12:08:40 -0500	[thread overview]
Message-ID: <54B55118.1020607@redhat.com> (raw)
In-Reply-To: <54B516C6.7010403@parallels.com>



On 01/13/2015 07:59 AM, Vladimir Sementsov-Ogievskiy wrote:
>
> Best regards,
> Vladimir
>
> On 09.01.2015 00:21, John Snow wrote:
>>
>>
>> On 12/11/2014 09:17 AM, Vladimir Sementsov-Ogievskiy wrote:
>>> Functions to store / restore an HBitmap. The HBitmap should be saved
>>> to a linear bitmap format independent of endianness.
>>>
>>> Because restoring happens in several steps, each step writes only the
>>> last level of the bitmap. All other levels are rebuilt by
>>> hbitmap_restore_finish as the final step of restoring, so the HBitmap
>>> is inconsistent while being restored.
> .....
>>
>> I guess by nature of how bitmap migration has been implemented, we
>> cannot help but restore parts of the bitmap piece by piece, which
>> requires us to force the bitmap into an inconsistent state.
>>
>> Yuck.
>>
>> It might be "nice" to turn on a disable bit inside the hbitmap that
>> disallows its use until it is made consistent again, but I don't know
>> what the performance hit of the extra conditionals everywhere would be
>> like.
>>
>
> Another approach is to rebuild the bitmap after each piece is set. It
> is a bit slower, of course, but the bitmap would be consistent after
> every hbitmap_restore_data() call.
>

Yeah, that's not great either. The approach you have already taken is 
probably the best.
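For what it's worth, the disable bit I mentioned could be as cheap as one flag checked on query. A minimal standalone sketch, with hypothetical names (this is not the patch's code, just the shape of the idea):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical sketch: a flag marking the bitmap unusable mid-restore.
 * Field and function names are illustrative, not from the patch. */
typedef struct HBitmapSketch {
    bool inconsistent;   /* set during restore, cleared by restore_finish */
    uint64_t bits;       /* stand-in for the real multi-level storage */
} HBitmapSketch;

static bool sketch_get(const HBitmapSketch *hb, unsigned item)
{
    assert(!hb->inconsistent);      /* refuse queries mid-restore */
    return (hb->bits >> item) & 1;
}

static void sketch_restore_begin(HBitmapSketch *hb)
{
    hb->inconsistent = true;
}

static void sketch_restore_finish(HBitmapSketch *hb)
{
    /* ... the real code would rebuild the upper levels here ... */
    hb->inconsistent = false;
}
```

The cost is one branch per query, which is why I'm unsure it's worth it on hot paths.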

>>> +/**
>>> + * hbitmap_restore_finish
>>> + * @hb: HBitmap to operate on.
>>> + *
>>> + * Repair HBitmap after calling hbitmap_restore_data. Actually, all
>>> + * HBitmap layers are restored here.
>>> + */
>>> +void hbitmap_restore_finish(HBitmap *hb);
>>> +
>>> +/**
>>>    * hbitmap_free:
>>>    * @hb: HBitmap to operate on.
>>>    *
>>
>> These are just biased opinions:
>>
>> - It might be nice to name the store/restore functions "serialize" and
>> "deserialize," to describe exactly what they are doing.
>>
>> - I might refer to "restore_finish" as "post_load" or "post_restore"
>> or something similar to mimic how device migration functions are
>> named. I think hbitmap_restore_data_finalize would also be fine; the
>> key part here is clearly naming it relative to "restore_data."
>>
>
> Hmm. Ok, what about the following set:
>      hbitmap_serialize()
>      hbitmap_deserialize_part()
>      hbitmap_deserialize_finish()
>

Looks good to me!

>>> diff --git a/util/hbitmap.c b/util/hbitmap.c
>>> index 8aa7406..ac0323f 100644
>>> --- a/util/hbitmap.c
>>> +++ b/util/hbitmap.c
>>> @@ -362,6 +362,90 @@ bool hbitmap_get(const HBitmap *hb, uint64_t item)
>>>       return (hb->levels[HBITMAP_LEVELS - 1][pos >> BITS_PER_LEVEL] & bit) != 0;
>>>   }
>>>
>>> +uint64_t hbitmap_data_size(const HBitmap *hb, uint64_t count)
>>> +{
>>> +    uint64_t size;
>>> +
>>> +    if (count == 0) {
>>> +        return 0;
>>> +    }
>>> +
>>> +    size = (((count - 1) >> hb->granularity) >> BITS_PER_LEVEL) + 1;
>>> +
>>> +    return size * sizeof(unsigned long);
>>> +}
>>> +
>>
>> This seems flawed to me: a number of bits without an offset can't be
>> mapped to a number of real bytes, because "two bits" may or may not
>> cross a granularity boundary, depending on *WHERE* you start counting.
>>
>> e.g.
>>
>> granularity = 1 (i.e. 2^1 = 2 virtual bits per 1 real bit)
>> virtual: 001100
>> real:     0 1 0
>>
>> The amount of space required to hold "two bits" here could be as
>> little as one bit, if the offset is k={0,2,4} but it could be as much
>> as two bits if the offset is k={1,3}.
>>
>> You may never use the function in this way, but mapping virtual bits
>> to an implementation byte-size seems like it is inviting an
>> inconsistency.
> I dislike this function too. But unfortunately we need the size in
> bytes used for serialization.
>
> Hmm. Ok, without loss of generality, let the offset be less than the
> granularity. The precise formula should look like:
>
> size = (((offset + count - 1) >> hb->granularity) >> BITS_PER_LEVEL) + 1;
>
> So,
> size = ((((1 << hb->granularity) + count - 2) >> hb->granularity) >>
> BITS_PER_LEVEL) + 1;
> would be enough in any case. Ok?
>

I think so, as long as the object deserializes correctly regardless of 
whether or not you start on an even multiple of the granularity.
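To convince myself, here is your worst-case formula as standalone code (constants mirror a 64-bit host; SK_* names and uint64_t-for-long are my stand-ins, not the patch's definitions):

```c
#include <assert.h>
#include <stdint.h>

/* Sketch of the worst-case size calculation from the discussion above.
 * SK_BITS_PER_LONG/SK_BITS_PER_LEVEL mirror QEMU's values for a 64-bit
 * host; "granularity" is the hbitmap granularity exponent. */
#define SK_BITS_PER_LONG  64
#define SK_BITS_PER_LEVEL 6   /* log2(SK_BITS_PER_LONG) */

static uint64_t sketch_data_size(unsigned granularity, uint64_t count)
{
    uint64_t size;

    if (count == 0) {
        return 0;
    }
    /* Worst case over all start offsets within one granularity chunk:
     * the offset can add up to (1 << granularity) - 1 virtual bits,
     * so budget for them before rounding up to whole longs. */
    size = ((((UINT64_C(1) << granularity) + count - 2)
             >> granularity) >> SK_BITS_PER_LEVEL) + 1;
    return size * sizeof(uint64_t);
}
```

E.g. at granularity 0, 64 bits fit in one long (8 bytes), and the 65th bit pushes it to two.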

>
>
>>
>>> +void hbitmap_store_data(const HBitmap *hb, uint8_t *buf,
>>> +                        uint64_t start, uint64_t count)
>>> +{
>>> +    uint64_t last = start + count - 1;
>>> +    unsigned long *out = (unsigned long *)buf;
>>> +
>>> +    if (count == 0) {
>>> +        return;
>>> +    }
>>> +
>>> +    start = (start >> hb->granularity) >> BITS_PER_LEVEL;
>>> +    last = (last >> hb->granularity) >> BITS_PER_LEVEL;
>>> +    count = last - start + 1;
>>> +
>>> +#ifdef __BIG_ENDIAN_BITFIELD
>>> +    for (i = start; i <= last; ++i) {
>>> +        unsigned long el = hb->levels[HBITMAP_LEVELS - 1][i];
>>> +        out[i] = (BITS_PER_LONG == 32 ? cpu_to_le32(el) : cpu_to_le64(el));
>>> +    }
>>> +#else
>>> +    memcpy(out, &hb->levels[HBITMAP_LEVELS - 1][start],
>>> +           count * sizeof(unsigned long));
>>> +#endif
>>> +}
>>> +
>>
>> I suppose the ifdefs are trying to take an optimization by using
>> memcpy if at all possible, without a conversion.
>>
>> Why are the conversions to little endian, though? Shouldn't we be
>> serializing to a Big Endian format?
> As Paolo already explained, it is for portability: the LE bitmap
> representation is independent of the size of the bitmap's array
> elements, which may be 32-bit, 64-bit, or anything else stored LE.

Yes, that makes sense. :)
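For anyone following along, the point is that a little-endian byte stream looks the same whether you emit it as 32-bit or 64-bit elements. A quick demonstration with plain byte stores (my own helpers, not QEMU's cpu_to_le* macros):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Demonstration that an LE byte stream is independent of element size:
 * writing one 64-bit value LE yields the same bytes as writing its two
 * 32-bit halves LE, low half first. */
static void put_le32(uint8_t *out, uint32_t v)
{
    for (int i = 0; i < 4; i++) {
        out[i] = (uint8_t)(v >> (8 * i));   /* least significant byte first */
    }
}

static void put_le64(uint8_t *out, uint64_t v)
{
    for (int i = 0; i < 8; i++) {
        out[i] = (uint8_t)(v >> (8 * i));
    }
}
```

So a 32-bit host and a 64-bit host produce byte-identical serializations, which is exactly what migration needs.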

>>
>>> +void hbitmap_restore_data(HBitmap *hb, uint8_t *buf,
>>> +                          uint64_t start, uint64_t count)
>>> +{
>>> +    uint64_t last = start + count - 1;
>>> +    unsigned long *in = (unsigned long *)buf;
>>> +
>>> +    if (count == 0) {
>>> +        return;
>>> +    }
>>> +
>>> +    start = (start >> hb->granularity) >> BITS_PER_LEVEL;
>>> +    last = (last >> hb->granularity) >> BITS_PER_LEVEL;
>>> +    count = last - start + 1;
>>> +
>>> +#ifdef __BIG_ENDIAN_BITFIELD
>>> +    for (i = start; i <= last; ++i) {
>>> +        hb->levels[HBITMAP_LEVELS - 1][i] =
>>> +            (BITS_PER_LONG == 32 ? be32_to_cpu(in[i]) : be64_to_cpu(in[i]));
>>> +    }
>>> +#else
>>> +    memcpy(&hb->levels[HBITMAP_LEVELS - 1][start], in,
>>> +           count * sizeof(unsigned long));
>>> +#endif
>>> +}
>>> +
>>
>> ...? We're storing as LE but restoring from BE? I'm confused.
> Oh, it's a mistake. Thanks. I missed it when testing because the
> #ifdef __BIG_ENDIAN_BITFIELD branch was never compiled.

Glad I am not crazy :)

>>
>> I'm also not clear on the __BIG_ENDIAN_BITFIELD macro. Why do we want
>> to pack differently based on how C-bitfields are packed by the
>> compiler? I don't think that has any impact on how longs are stored
>> (always in the host native format.)
>>
>> I would rather see the serialize/deserialize always convert to and
>> from LE explicitly with cpu_to_le[32|64] and le[32|64]_to_cpu. I think
>> trying to optimize the pack/unpack in the no-op condition with a
>> #define and a memcpy is inviting portability problems.
> Ok. I'll keep only the for(...) branch in v2.

You can correct me if I am mistaken, but I seem to recall that 
determining endianness in a platform-, compiler- and libc-independent 
way is very difficult at the preprocessor stage.

If there _IS_ some way to do this, the optimization is fine, but I do 
not think it will work as written.

The un-optimized version is the safest.
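Concretely, the safe version just converts unconditionally in both directions, with no #ifdef and no memcpy fast path. A standalone sketch modeled on the patch's loop (uint64_t stands in for unsigned long, and I open-code the LE conversion since QEMU's helpers aren't available here):

```c
#include <assert.h>
#include <stdint.h>

/* Sketch of the "unoptimized" serializer: always emit each long as
 * little-endian bytes, on every host.  Not the actual QEMU code. */
static void sketch_store_longs(const uint64_t *level, uint64_t start,
                               uint64_t count, uint8_t *out)
{
    for (uint64_t i = 0; i < count; i++) {
        uint64_t el = level[start + i];
        for (int b = 0; b < 8; b++) {
            out[i * 8 + b] = (uint8_t)(el >> (8 * b));
        }
    }
}

/* Matching deserializer: reassemble each long from LE bytes. */
static void sketch_restore_longs(uint64_t *level, uint64_t start,
                                 uint64_t count, const uint8_t *in)
{
    for (uint64_t i = 0; i < count; i++) {
        uint64_t el = 0;
        for (int b = 0; b < 8; b++) {
            el |= (uint64_t)in[i * 8 + b] << (8 * b);
        }
        level[start + i] = el;
    }
}
```

Store and restore use the same byte order by construction, so the LE/BE mismatch caught above cannot happen.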

>>
>>> +void hbitmap_restore_finish(HBitmap *bitmap)
>>> +{
>>> +    int64_t i, size;
>>> +    int lev, j;
>>> +
>>> +    /* restore levels starting from penultimate to zero level, assuming
>>> +     * that the last one is ok */
>>> +    size = MAX((bitmap->size + BITS_PER_LONG - 1) >> BITS_PER_LEVEL, 1);
>>> +    for (lev = HBITMAP_LEVELS - 1; lev-- > 0; ) {
>>> +        size = MAX((size + BITS_PER_LONG - 1) >> BITS_PER_LEVEL, 1);
>>> +        for (i = 0; i < size; ++i) {
>>> +            bitmap->levels[lev][i] = 0;
>>> +            for (j = 0; j < BITS_PER_LONG; ++j) {
>>> +                if (bitmap->levels[lev+1][(i << BITS_PER_LEVEL) + j]) {
>>> +                    bitmap->levels[lev][i] |=  (1 << j);
>>> +                }
>>> +            }
>>
>> Just a musing: Should we cache the size of each level? I know we can
>> keep recalculating it, but... why bother? We recalculate it in so many
>> places now.
> Interesting point.
>
> However, there is another real bug: there may be no longs on the next
> level for the last bits of the current one. The new version of this
> function will be in v2.

Great :)
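To make that boundary bug concrete: the inner loop has to clamp to the real length of the lower level rather than always reading BITS_PER_LONG entries. A standalone sketch of rebuilding one upper level (my own names and uint64_t-for-long, not the v2 code):

```c
#include <assert.h>
#include <stdint.h>

#define SK_BITS_PER_LONG  64
#define SK_BITS_PER_LEVEL 6   /* log2(SK_BITS_PER_LONG) */

/* Sketch: rebuild one upper level from the level below, clamping the
 * inner loop to the lower level's real length so we never read past
 * its end (the boundary bug discussed above). */
static void sketch_rebuild_level(uint64_t *upper, uint64_t upper_len,
                                 const uint64_t *lower, uint64_t lower_len)
{
    for (uint64_t i = 0; i < upper_len; i++) {
        upper[i] = 0;
        for (int j = 0; j < SK_BITS_PER_LONG; j++) {
            uint64_t k = (i << SK_BITS_PER_LEVEL) + j;
            if (k >= lower_len) {
                break;          /* lower level ends here: stop early */
            }
            if (lower[k]) {     /* any bit set below -> summary bit set */
                upper[i] |= UINT64_C(1) << j;
            }
        }
    }
}
```

Without the `k >= lower_len` check, the last long of each upper level would read uninitialized (or unallocated) longs from the level below.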

>>
>>> +        }
>>> +    }
>>> +}
>>> +
>>>   void hbitmap_free(HBitmap *hb)
>>>   {
>>>       unsigned i;
>>>
>

Thank you!
--js
