From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:45210)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <jsnow@redhat.com>) id 1YB4xR-0000NH-QG
	for qemu-devel@nongnu.org; Tue, 13 Jan 2015 12:08:47 -0500
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <jsnow@redhat.com>) id 1YB4xP-0007gU-1h
	for qemu-devel@nongnu.org; Tue, 13 Jan 2015 12:08:45 -0500
Received: from mx1.redhat.com ([209.132.183.28]:54515)
	by eggs.gnu.org with esmtp (Exim 4.71)
	(envelope-from <jsnow@redhat.com>) id 1YB4xO-0007gH-HC
	for qemu-devel@nongnu.org; Tue, 13 Jan 2015 12:08:42 -0500
Message-ID: <54B55118.1020607@redhat.com>
Date: Tue, 13 Jan 2015 12:08:40 -0500
From: John Snow <jsnow@redhat.com>
MIME-Version: 1.0
References: <1418307457-25996-1-git-send-email-vsementsov@parallels.com>
	<1418307457-25996-5-git-send-email-vsementsov@parallels.com>
	<54AEF4E8.5070106@redhat.com> <54B516C6.7010403@parallels.com>
In-Reply-To: <54B516C6.7010403@parallels.com>
Content-Type: text/plain; charset=windows-1252; format=flowed
Content-Transfer-Encoding: 7bit
Subject: Re: [Qemu-devel] [PATCH 4/9] hbitmap: store / restore
List-Id: <qemu-devel.nongnu.org>
List-Unsubscribe: <https://lists.nongnu.org/mailman/options/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.nongnu.org/archive/html/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <https://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: Vladimir Sementsov-Ogievskiy <vsementsov@parallels.com>, qemu-devel@nongnu.org
Cc: kwolf@redhat.com, den@openvz.org, stefanha@redhat.com, "pbonzini >> Paolo Bonzini" <pbonzini@redhat.com>


On 01/13/2015 07:59 AM, Vladimir Sementsov-Ogievskiy wrote:
>
> Best regards,
> Vladimir
>
> On 09.01.2015 00:21, John Snow wrote:
>>
>>
>> On 12/11/2014 09:17 AM, Vladimir Sementsov-Ogievskiy wrote:
>>> Functions to store / restore HBitmap. HBitmap should be saved to linear
>>> bitmap format independently of endianess.
>>>
>>> Because of restoring in several steps, every step writes only the last
>>> level of the bitmap. All other levels are restored by
>>> hbitmap_restore_finish as a last step of restoring. So, HBitmap is
>>> inconsistent while restoring.
> .....
>>
>> I guess by nature of how bitmap migration has been implemented, we
>> cannot help but restore parts of the bitmap piece by piece, which
>> requires us to force the bitmap into an inconsistent state.
>>
>> Yuck.
>>
>> It might be "nice" to turn on a disable bit inside the hbitmap that
>> disallows its use until it is made consistent again, but I don't know
>> what the performance hit of the extra conditionals everywhere would be
>> like.
>>
>
> Another approach is to restore the bitmap after each piece set. It is a
> bit slower of course but the bitmap will be consistent after
> hbitmap_restore_data()
>

Yeah, that's not great either. The approach you have already taken is 
probably the best.

>>> +/**
>>> + * hbitmap_restore_finish
>>> + * @hb: HBitmap to operate on.
>>> + *
>>> + * Repair HBitmap after calling hbitmap_restore_data. Actuall all
>>> HBitmap
>>> + * layers are restore here.
>>> + */
>>> +void hbitmap_restore_finish(HBitmap *hb);
>>> +
>>> +/**
>>>    * hbitmap_free:
>>>    * @hb: HBitmap to operate on.
>>>    *
>>
>> These are just biased opinions:
>>
>> - It might be nice to name the store/restore functions "serialize" and
>> "deserialize," to describe exactly what they are doing.
>>
>> - I might refer to "restore_finish" as "post_load" or "post_restore"
>> or something similar to mimic how device migration functions are
>> named. I think hbitmap_restore_data_finalize would also be fine; the
>> key part here is clearly naming it relative to "restore_data."
>>
>
> Hmm. Ok, what about the following set:
>      hbitmap_serialize()
>      hbitmap_deserialize_part()
>      hbitmap_deserialize_finish()
>

Looks good to me!

>>> diff --git a/util/hbitmap.c b/util/hbitmap.c
>>> index 8aa7406..ac0323f 100644
>>> --- a/util/hbitmap.c
>>> +++ b/util/hbitmap.c
>>> @@ -362,6 +362,90 @@ bool hbitmap_get(const HBitmap *hb, uint64_t item)
>>>       return (hb->levels[HBITMAP_LEVELS - 1][pos >> BITS_PER_LEVEL] &
>>> bit) != 0;
>>>   }
>>>
>>> +uint64_t hbitmap_data_size(const HBitmap *hb, uint64_t count)
>>> +{
>>> +    uint64_t size;
>>> +
>>> +    if (count == 0) {
>>> +        return 0;
>>> +    }
>>> +
>>> +    size = (((count - 1) >> hb->granularity) >> BITS_PER_LEVEL) + 1;
>>> +
>>> +    return size * sizeof(unsigned long);
>>> +}
>>> +
>>
>> This seems flawed to me: number of bits without an offset can't be
>> mapped to a number of real bytes, because "two bits" may or may not
>> cross a granularity boundary, depending on *WHERE* you start counting.
>>
>> e.g.
>>
>> granularity = 1 (i.e. 2^1 = 2 virtual bits per 1 real bit)
>> virtual: 001100
>> real:     0 1 0
>>
>> The amount of space required to hold "two bits" here could be as
>> little as one bit, if the offset is k={0,2,4} but it could be as much
>> as two bits if the offset is k={1,3}.
>>
>> You may never use the function in this way, but mapping virtual bits
>> to an implementation byte-size seems like it is inviting an
>> inconsistency.
> I dislike this function too.. But unfortunately we need the size in
> bytes used for serialization.
>
> Hmm. Ok, without loss of generality, let offset be less than
> granularity. The precise formula should look like:
>
> size = (((offset+count-1) >> hb->granularity) >> BITS_PER_LEVEL);
>
> So,
> size = ((((1 << hb->granularity) + count - 2) >> hb->granularity) >>
> BITS_PER_LEVEL) + 1;
> would be enough in any case. Ok?
>

I think so, as long as when you deserialize the object it does so 
correctly, regardless of whether or not you start on an even multiple of 
the granularity.

>
>
>>
>>> +void hbitmap_store_data(const HBitmap *hb, uint8_t *buf,
>>> +                        uint64_t start, uint64_t count)
>>> +{
>>> +    uint64_t last = start + count - 1;
>>> +    unsigned long *out = (unsigned long *)buf;
>>> +
>>> +    if (count == 0) {
>>> +        return;
>>> +    }
>>> +
>>> +    start = (start >> hb->granularity) >> BITS_PER_LEVEL;
>>> +    last = (last >> hb->granularity) >> BITS_PER_LEVEL;
>>> +    count = last - start + 1;
>>> +
>>> +#ifdef __BIG_ENDIAN_BITFIELD
>>> +    for (i = start; i <= last; ++i) {
>>> +        unsigned long el = hb->levels[HBITMAP_LEVELS - 1][i];
>>> +        out[i] = (BITS_PER_LONG == 32 ? cpu_to_le32(el) :
>>> cpu_to_le64(el));
>>> +    }
>>> +#else
>>> +    memcpy(out, &hb->levels[HBITMAP_LEVELS - 1][start],
>>> +           count * sizeof(unsigned long));
>>> +#endif
>>> +}
>>> +
>>
>> I suppose the ifdefs are trying to take an optimization by using
>> memcpy if at all possible, without a conversion.
>>
>> Why are the conversions to little endian, though? Shouldn't we be
>> serializing to a Big Endian format?
> As Paolo already explained it is for portability. LE bitmap
> representation is independent of size of bitmap array elements, which
> may be 32bit/64bit, or whatever else with LE format.

Yes, that makes sense. :)

>>
>>> +void hbitmap_restore_data(HBitmap *hb, uint8_t *buf,
>>> +                          uint64_t start, uint64_t count)
>>> +{
>>> +    uint64_t last = start + count - 1;
>>> +    unsigned long *in = (unsigned long *)buf;
>>> +
>>> +    if (count == 0) {
>>> +        return;
>>> +    }
>>> +
>>> +    start = (start >> hb->granularity) >> BITS_PER_LEVEL;
>>> +    last = (last >> hb->granularity) >> BITS_PER_LEVEL;
>>> +    count = last - start + 1;
>>> +
>>> +#ifdef __BIG_ENDIAN_BITFIELD
>>> +    for (i = start; i <= last; ++i) {
>>> +        hb->levels[HBITMAP_LEVELS - 1][i] =
>>> +            (BITS_PER_LONG == 32 ? be32_to_cpu(in[i]) :
>>> be64_to_cpu(in[i]));
>>> +    }
>>> +#else
>>> +    memcpy(&hb->levels[HBITMAP_LEVELS - 1][start], in,
>>> +           count * sizeof(unsigned long));
>>> +#endif
>>> +}
>>> +
>>
>> ...? We're storing as LE but restoring from BE? I'm confused.
> Oh, it's a mistake. Thanks. I've missed it when testing because of
> {#ifdef __BIG_ENDIAN_BITFIELD } branch was never compiled.

Glad I am not crazy :)

>>
>> I'm also not clear on the __BIG_ENDIAN_BITFIELD macro. Why do we want
>> to pack differently based on how C-bitfields are packed by the
>> compiler? I don't think that has any impact on how longs are stored
>> (always in the host native format.)
>>
>> I would rather see the serialize/deserialize always convert to and
>> from BE explicitly with cpu_to_be[32|64] and be[32|64]_to_cpu. I think
>> trying to optimize the pack/unpack in the no-op condition with a #
>> define and a memcpy is inviting portability problems.
> Ok. I'll leave only branch with for(...) in v2.

You can correct me if I am mistaken, but I seem to recall determining 
endianness in a platform, compiler and libc independent way is very 
difficult during the pre-processor stage.

If there _IS_ some way to do this, the optimization is fine, but I do 
not think it will work as written.

The un-optimized version is the safest.

>>
>>> +void hbitmap_restore_finish(HBitmap *bitmap)
>>> +{
>>> +    int64_t i, size;
>>> +    int lev, j;
>>> +
>>> +    /* restore levels starting from penultimate to zero level, assuming
>>> +     * that the last one is ok */
>>> +    size = MAX((bitmap->size + BITS_PER_LONG - 1) >> BITS_PER_LEVEL,
>>> 1);
>>> +    for (lev = HBITMAP_LEVELS - 1; lev-- > 0; ) {
>>> +        size = MAX((size + BITS_PER_LONG - 1) >> BITS_PER_LEVEL, 1);
>>> +        for (i = 0; i < size; ++i) {
>>> +            bitmap->levels[lev][i] = 0;
>>> +            for (j = 0; j < BITS_PER_LONG; ++j) {
>>> +                if (bitmap->levels[lev+1][(i << BITS_PER_LEVEL) + j]) {
>>> +                    bitmap->levels[lev][i] |=  (1 << j);
>>> +                }
>>> +            }
>>
>> Just a musing: Should we cache the size of each level? I know we can
>> keep recalculating it, but... why bother? We recalculate it in so many
>> places now.
> Interesting point.
>
> However, there is another real bug: there are may be no longs on the
> next level for last bits of the current one. The new version of this
> function will be in v2.

Great :)

>>
>>> +        }
>>> +    }
>>> +}
>>> +
>>>   void hbitmap_free(HBitmap *hb)
>>>   {
>>>       unsigned i;
>>>
>

Thank you!
--js