From: John Snow <jsnow@redhat.com>
To: Vladimir Sementsov-Ogievskiy <vsementsov@parallels.com>,
qemu-devel@nongnu.org
Cc: kwolf@redhat.com, den@openvz.org, stefanha@redhat.com,
"pbonzini >> Paolo Bonzini" <pbonzini@redhat.com>
Subject: Re: [Qemu-devel] [PATCH 4/9] hbitmap: store / restore
Date: Tue, 13 Jan 2015 12:08:40 -0500
Message-ID: <54B55118.1020607@redhat.com>
In-Reply-To: <54B516C6.7010403@parallels.com>
On 01/13/2015 07:59 AM, Vladimir Sementsov-Ogievskiy wrote:
>
> Best regards,
> Vladimir
>
> On 09.01.2015 00:21, John Snow wrote:
>>
>>
>> On 12/11/2014 09:17 AM, Vladimir Sementsov-Ogievskiy wrote:
>>> Functions to store / restore HBitmap. HBitmap should be saved to a
>>> linear bitmap format independently of endianness.
>>>
>>> Because restoring is done in several steps, every step writes only the
>>> last level of the bitmap. All other levels are restored by
>>> hbitmap_restore_finish as the last step of restoring. So the HBitmap
>>> is inconsistent while restoring.
> .....
>>
>> I guess by nature of how bitmap migration has been implemented, we
>> cannot help but restore parts of the bitmap piece by piece, which
>> requires us to force the bitmap into an inconsistent state.
>>
>> Yuck.
>>
>> It might be "nice" to turn on a disable bit inside the hbitmap that
>> disallows its use until it is made consistent again, but I don't know
>> what the performance hit of the extra conditionals everywhere would be
>> like.
>>
>
> Another approach is to restore the bitmap after each piece is set. It
> is a bit slower, of course, but the bitmap will be consistent after
> hbitmap_restore_data().
>
Yeah, that's not great either. The approach you have already taken is
probably the best.
>>> +/**
>>> + * hbitmap_restore_finish
>>> + * @hb: HBitmap to operate on.
>>> + *
>>> + * Repair HBitmap after calling hbitmap_restore_data. Actually all
>>> + * HBitmap layers are restored here.
>>> + */
>>> +void hbitmap_restore_finish(HBitmap *hb);
>>> +
>>> +/**
>>>   * hbitmap_free:
>>>   * @hb: HBitmap to operate on.
>>>   *
>>
>> These are just biased opinions:
>>
>> - It might be nice to name the store/restore functions "serialize" and
>> "deserialize," to describe exactly what they are doing.
>>
>> - I might refer to "restore_finish" as "post_load" or "post_restore"
>> or something similar to mimic how device migration functions are
>> named. I think hbitmap_restore_data_finalize would also be fine; the
>> key part here is clearly naming it relative to "restore_data."
>>
>
> Hmm. Ok, what about the following set:
> hbitmap_serialize()
> hbitmap_deserialize_part()
> hbitmap_deserialize_finish()
>
Looks good to me!
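
For clarity, the renamed prototypes would then presumably look like this
(just a sketch restating your proposal, keeping the signatures from this
patch):

    void hbitmap_serialize(const HBitmap *hb, uint8_t *buf,
                           uint64_t start, uint64_t count);
    void hbitmap_deserialize_part(HBitmap *hb, uint8_t *buf,
                                  uint64_t start, uint64_t count);
    void hbitmap_deserialize_finish(HBitmap *hb);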
>>> diff --git a/util/hbitmap.c b/util/hbitmap.c
>>> index 8aa7406..ac0323f 100644
>>> --- a/util/hbitmap.c
>>> +++ b/util/hbitmap.c
>>> @@ -362,6 +362,90 @@ bool hbitmap_get(const HBitmap *hb, uint64_t item)
>>>      return (hb->levels[HBITMAP_LEVELS - 1][pos >> BITS_PER_LEVEL] & bit) != 0;
>>>  }
>>>
>>> +uint64_t hbitmap_data_size(const HBitmap *hb, uint64_t count)
>>> +{
>>> +    uint64_t size;
>>> +
>>> +    if (count == 0) {
>>> +        return 0;
>>> +    }
>>> +
>>> +    size = (((count - 1) >> hb->granularity) >> BITS_PER_LEVEL) + 1;
>>> +
>>> +    return size * sizeof(unsigned long);
>>> +}
>>> +
>>
>> This seems flawed to me: a number of bits without an offset can't be
>> mapped to a number of real bytes, because "two bits" may or may not
>> cross a granularity boundary, depending on *WHERE* you start counting.
>>
>> e.g.
>>
>> granularity = 1 (i.e. 2^1 = 2 virtual bits per 1 real bit)
>> virtual: 001100
>> real: 0 1 0
>>
>> The amount of space required to hold "two bits" here could be as
>> little as one bit, if the offset is k={0,2,4} but it could be as much
>> as two bits if the offset is k={1,3}.
>>
>> You may never use the function in this way, but mapping virtual bits
>> to an implementation byte-size seems like it is inviting an
>> inconsistency.
> I dislike this function too. But unfortunately we need the size in
> bytes used for serialization.
>
> Hmm. Ok, without loss of generality, let offset be less than
> (1 << granularity). The precise formula should look like:
>
> size = (((offset + count - 1) >> hb->granularity) >> BITS_PER_LEVEL) + 1;
>
> So,
>
> size = ((((1 << hb->granularity) + count - 2) >> hb->granularity)
>         >> BITS_PER_LEVEL) + 1;
>
> would be enough in any case. Ok?
>
I think so, as long as deserialization handles it correctly regardless
of whether or not you start on an even multiple of the granularity.
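
For the record, here is how I read that worst-case computation as a
function (a sketch only, with a hypothetical name; it assumes
offset < (1 << granularity), as above):

    /* Worst-case buffer size, in bytes, needed to serialize `count`
     * virtual bits starting at an arbitrary offset within a
     * granularity chunk. */
    static uint64_t hbitmap_data_size_worst(const HBitmap *hb, uint64_t count)
    {
        uint64_t size;

        if (count == 0) {
            return 0;
        }

        /* offset + count - 1 is at most (1 << granularity) + count - 2,
         * so this is the index of the last long touched, plus one. */
        size = ((((uint64_t)1 << hb->granularity) + count - 2)
                >> hb->granularity >> BITS_PER_LEVEL) + 1;

        return size * sizeof(unsigned long);
    }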
>
>
>>
>>> +void hbitmap_store_data(const HBitmap *hb, uint8_t *buf,
>>> +                        uint64_t start, uint64_t count)
>>> +{
>>> +    uint64_t last = start + count - 1;
>>> +    unsigned long *out = (unsigned long *)buf;
>>> +
>>> +    if (count == 0) {
>>> +        return;
>>> +    }
>>> +
>>> +    start = (start >> hb->granularity) >> BITS_PER_LEVEL;
>>> +    last = (last >> hb->granularity) >> BITS_PER_LEVEL;
>>> +    count = last - start + 1;
>>> +
>>> +#ifdef __BIG_ENDIAN_BITFIELD
>>> +    for (i = start; i <= last; ++i) {
>>> +        unsigned long el = hb->levels[HBITMAP_LEVELS - 1][i];
>>> +        out[i] = (BITS_PER_LONG == 32 ? cpu_to_le32(el) : cpu_to_le64(el));
>>> +    }
>>> +#else
>>> +    memcpy(out, &hb->levels[HBITMAP_LEVELS - 1][start],
>>> +           count * sizeof(unsigned long));
>>> +#endif
>>> +}
>>> +
>>
>> I suppose the ifdefs are trying to enable an optimization: using
>> memcpy where possible, without a conversion.
>>
>> Why are the conversions to little endian, though? Shouldn't we be
>> serializing to a Big Endian format?
> As Paolo already explained, it is for portability. The LE bitmap
> representation is independent of the size of the bitmap array elements,
> which may be 32-bit, 64-bit, or whatever else.
Yes, that makes sense. :)
>>
>>> +void hbitmap_restore_data(HBitmap *hb, uint8_t *buf,
>>> +                          uint64_t start, uint64_t count)
>>> +{
>>> +    uint64_t last = start + count - 1;
>>> +    unsigned long *in = (unsigned long *)buf;
>>> +
>>> +    if (count == 0) {
>>> +        return;
>>> +    }
>>> +
>>> +    start = (start >> hb->granularity) >> BITS_PER_LEVEL;
>>> +    last = (last >> hb->granularity) >> BITS_PER_LEVEL;
>>> +    count = last - start + 1;
>>> +
>>> +#ifdef __BIG_ENDIAN_BITFIELD
>>> +    for (i = start; i <= last; ++i) {
>>> +        hb->levels[HBITMAP_LEVELS - 1][i] =
>>> +            (BITS_PER_LONG == 32 ? be32_to_cpu(in[i]) : be64_to_cpu(in[i]));
>>> +    }
>>> +#else
>>> +    memcpy(&hb->levels[HBITMAP_LEVELS - 1][start], in,
>>> +           count * sizeof(unsigned long));
>>> +#endif
>>> +}
>>> +
>>
>> ...? We're storing as LE but restoring from BE? I'm confused.
> Oh, it's a mistake. Thanks. I missed it when testing because the
> #ifdef __BIG_ENDIAN_BITFIELD branch was never compiled.
Glad I am not crazy :)
>>
>> I'm also not clear on the __BIG_ENDIAN_BITFIELD macro. Why would we
>> want to pack differently based on how C bitfields are packed by the
>> compiler? I don't think that has any impact on how longs are stored
>> (always in the host-native format).
>>
>> I would rather see the serialize/deserialize always convert to and
>> from BE explicitly with cpu_to_be[32|64] and be[32|64]_to_cpu. I think
>> trying to optimize the pack/unpack in the no-op condition with a
>> #define and a memcpy is inviting portability problems.
> Ok. I'll leave only the for(...) branch in v2.
You can correct me if I am mistaken, but I seem to recall that
determining endianness in a platform-, compiler-, and libc-independent
way is very difficult at the preprocessor stage.

If there _IS_ some way to do this, the optimization is fine, but I do
not think it will work as written. The un-optimized version is the
safest.
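
Concretely, I would expect the store loop to always convert, something
along these lines (a sketch, not the patch code; note that I index the
output buffer from zero here, i.e. out[i - start]):

    uint64_t i;

    for (i = start; i <= last; ++i) {
        unsigned long el = hb->levels[HBITMAP_LEVELS - 1][i];
        /* always emit little-endian words, whatever the host order */
        out[i - start] = (BITS_PER_LONG == 32 ? cpu_to_le32(el)
                                              : cpu_to_le64(el));
    }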
>>
>>> +void hbitmap_restore_finish(HBitmap *bitmap)
>>> +{
>>> +    int64_t i, size;
>>> +    int lev, j;
>>> +
>>> +    /* restore levels starting from penultimate to zero level, assuming
>>> +     * that the last one is ok */
>>> +    size = MAX((bitmap->size + BITS_PER_LONG - 1) >> BITS_PER_LEVEL, 1);
>>> +    for (lev = HBITMAP_LEVELS - 1; lev-- > 0; ) {
>>> +        size = MAX((size + BITS_PER_LONG - 1) >> BITS_PER_LEVEL, 1);
>>> +        for (i = 0; i < size; ++i) {
>>> +            bitmap->levels[lev][i] = 0;
>>> +            for (j = 0; j < BITS_PER_LONG; ++j) {
>>> +                if (bitmap->levels[lev + 1][(i << BITS_PER_LEVEL) + j]) {
>>> +                    bitmap->levels[lev][i] |= (1 << j);
>>> +                }
>>> +            }
>>
>> Just a musing: Should we cache the size of each level? I know we can
>> keep recalculating it, but... why bother? We recalculate it in so many
>> places now.
> Interesting point.
>
> However, there may be another real bug: there may be no longs on the
> next level for the last bits of the current one. The new version of
> this function will be in v2.
Great :)
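
As for the caching musing above, I was picturing something like this (a
sketch; the `sizes` field is hypothetical and would be filled once at
allocation time, mirroring the existing per-level size computation):

    struct HBitmap {
        /* ... existing fields ... */
        uint64_t sizes[HBITMAP_LEVELS]; /* length of levels[i], in longs */
    };

    /* in hbitmap_alloc(), inside the existing per-level loop: */
    for (i = HBITMAP_LEVELS; i-- > 0; ) {
        size = MAX((size + BITS_PER_LONG - 1) >> BITS_PER_LEVEL, 1);
        hb->sizes[i] = size;
        hb->levels[i] = g_new0(unsigned long, size);
    }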
>>
>>> +        }
>>> +    }
>>> +}
>>> +
>>>  void hbitmap_free(HBitmap *hb)
>>>  {
>>>      unsigned i;
>>>
>
Thank you!
--js