qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: Vladimir Sementsov-Ogievskiy <vsementsov@parallels.com>
To: John Snow <jsnow@redhat.com>, qemu-devel@nongnu.org
Cc: kwolf@redhat.com, peter.maydell@linaro.org, quintela@redhat.com,
	dgilbert@redhat.com, stefanha@redhat.com, pbonzini@redhat.com,
	amit.shah@redhat.com, den@openvz.org
Subject: Re: [Qemu-devel] [PATCH RFC v3 05/14] block: add meta bitmaps
Date: Thu, 19 Feb 2015 14:43:04 +0300	[thread overview]
Message-ID: <54E5CC48.1080704@parallels.com> (raw)
In-Reply-To: <54E52422.9050507@redhat.com>

On 19.02.2015 02:45, John Snow wrote:
>
>
> On 02/18/2015 09:00 AM, Vladimir Sementsov-Ogievskiy wrote:
>> Meta bitmap is a 'dirty bitmap' for the BdrvDirtyBitmap. It tracks
>> changes (set/unset) of this BdrvDirtyBitmap. It is needed for live
>> migration of block dirty bitmaps.
>>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@parallels.com>
>> ---
>>   block.c               | 40 ++++++++++++++++++++++++++++++++++++++++
>>   include/block/block.h |  5 +++++
>>   2 files changed, 45 insertions(+)
>>
>> diff --git a/block.c b/block.c
>> index a127fd2..aaa08b8 100644
>> --- a/block.c
>> +++ b/block.c
>> @@ -58,9 +58,15 @@
>>    * (3) successor is set: frozen mode.
>>    *     A frozen bitmap cannot be renamed, deleted, anonymized, 
>> cleared, set,
>>    *     or enabled. A frozen bitmap can only abdicate() or reclaim().
>> + *
>> + * Meta bitmap:
>> + * Meta bitmap is a 'dirty bitmap' for the BdrvDirtyBitmap. It 
>> tracks changes
>> + * (set/unset) of this BdrvDirtyBitmap. It is needed for live 
>> migration of
>> + * block dirty bitmaps.
>>    */
>>   struct BdrvDirtyBitmap {
>>       HBitmap *bitmap;            /* Dirty sector bitmap 
>> implementation */
>> +    HBitmap *meta_bitmap;       /* Meta bitmap */
>>       BdrvDirtyBitmap *successor; /* Anonymous child; implies frozen 
>> status */
>>       char *name;                 /* Optional non-empty unique ID */
>>       int64_t size;               /* Size of the bitmap (Number of 
>> sectors) */
>> @@ -5398,6 +5404,31 @@ void 
>> bdrv_dirty_bitmap_make_anon(BdrvDirtyBitmap *bitmap)
>>       bitmap->name = NULL;
>>   }
>>
>> +HBitmap *bdrv_create_meta_bitmap(BdrvDirtyBitmap *bitmap,
>> +                                        uint64_t granularity)
>> +{
>> +    uint64_t sector_granularity;
>> +
>> +    assert((granularity & (granularity - 1)) == 0);
>> +
>> +    granularity *= 8 * bdrv_dirty_bitmap_granularity(bitmap);
>> +    sector_granularity = granularity >> BDRV_SECTOR_BITS;
>> +    assert(sector_granularity);
>> +
>
> The maths here could use a comment, I think.
>
> the "granularity" field here actually describes the desired 
> serialization buffer size; e.g. CHUNK_SIZE (1 << 20) or 1 MiB. This 
> parameter should be renamed to explain what it's actually for. 
> Something like "chunk_size" and a comment explaining that it is in bytes.
>
> ...
>
> That said, let's talk about the default chunk size you're using in 
> correlation with this function.
>
> a CHUNK_SIZE of 1MiB here is going to lead us to, if we have a bitmap 
> with the default granularity of 128 sectors\64KiB bytes, a granularity 
> for the meta_bitmap of one billion sectors (1 << 30) or 512GiB.
>
> That's going to be bigger than most drives entirely, which will 
> generally lead us to only using a single "chunk" per drive. Which 
> means we won't really get a lot of mileage out of the bulk/dirty 
> phases most of the time.
>
> It's wild to think about that the first 1,000,000,000 sectors or 
> 512,000,000,000 bytes will all be represented by the first single bit 
> in this bitmap. If a single hair on the drive changes, we resend the 
> _entire_ bitmap, possibly over and over again. Will we ever make 
> progress? Should we investigate a smaller chunk size?
>
> Here's some quick mappings of chunk size (bytes) to effective 
> meta_bitmap byte granularities, assuming the meta_bitmap is tracking a 
> bitmap with the default granularity of 64KiB:
>
> (1 << 20) 1MiB   -- 512GiB  // This is too high of a granularity
> (1 << 17) 128KiB --  64GiB
> (1 << 15) 32KiB  --  16GiB
> (1 << 11) 2KiB   --   1GiB
> (1 << 10) 1KiB   -- 512MiB
> (1 << 9)  512B   -- 256MiB
> (1 << 8)  256B   -- 128MiB
> (1 << 5)  32 B   --  16MiB  // This is too small of a chunk size.
> (1 << 1)   1 B   --   1MiB
>
> We want to make the chunk sends efficient, but we also want to make 
> sure that the dirty phase doesn't resend more data than it needs to, 
> so we need to strike a balance here, no?
>
> I think arguments could be made for most granularities between 128MiB 
> through 1GiB. Anything outside of that is too lopsided, IMO.
>
> What are your thoughts on this?
Ok, interesting thing to discuss.

My thoughts:
* the chunk size for block-migration is 1mb, than the bitmap (64kb 
granularity) for this chunk is 16bit=2bytes long. It's an intuitive 
reason for choosing the chunk size about 2 bytes. But in this case the 
data/metadata ratio is very bad (about 20bytes for the header of the 
chunk). So, taking the nearest value with adequate ratio gives (IMHO) 
'1kb -- 512mb': 20b/1k ~ 2%. Or 512b => 4%.

* for ndb+mirror migration scheme the default chunk is 64kb instead of 
1mb. So the bitmap is more smaller. But the same reason of data/metadata 
ratio leads to 1kb chunk for dirty bitmap migration.

So, what about default to 1kb and additional parameter for migration 
(migration capabilities) to give the user a possibility of chose?

* Yes, in most of user cases the bitmap (64kb granularity) will be small 
(< 1mb). In these cases, I think, it would be better to send the data 
only in complete step, only once. (for exmaple, if pending <= 1mb, 
dosn't send anything in incremental phase).
Live migration is actually needed only for migration of bitmaps for 
disks of several TBs size.

>
>> +    bitmap->meta_bitmap =
>> +        hbitmap_alloc(bitmap->size, ffsll(sector_granularity) - 1);
>> +
>> +    return bitmap->meta_bitmap;
>> +}
>> +
>> +void bdrv_release_meta_bitmap(BdrvDirtyBitmap *bitmap)
>> +{
>> +    if (bitmap->meta_bitmap) {
>> +        hbitmap_free(bitmap->meta_bitmap);
>> +        bitmap->meta_bitmap = NULL;
>> +    }
>> +}
>> +
>>   BdrvDirtyBitmap *bdrv_create_dirty_bitmap(BlockDriverState *bs,
>>                                             uint32_t granularity,
>>                                             const char *name,
>> @@ -5532,6 +5563,9 @@ void bdrv_release_dirty_bitmap(BlockDriverState 
>> *bs, BdrvDirtyBitmap *bitmap)
>>               assert(!bdrv_dirty_bitmap_frozen(bm));
>>               QLIST_REMOVE(bitmap, list);
>>               hbitmap_free(bitmap->bitmap);
>> +            if (bitmap->meta_bitmap) {
>> +                hbitmap_free(bitmap->meta_bitmap);
>> +            }
>>               g_free(bitmap->name);
>>               g_free(bitmap);
>>               return;
>> @@ -5659,6 +5693,9 @@ void bdrv_set_dirty_bitmap(BdrvDirtyBitmap 
>> *bitmap,
>>   {
>>       assert(bdrv_dirty_bitmap_enabled(bitmap));
>>       hbitmap_set(bitmap->bitmap, cur_sector, nr_sectors);
>> +    if (bitmap->meta_bitmap) {
>> +        hbitmap_set(bitmap->meta_bitmap, cur_sector, nr_sectors);
>> +    }
>>   }
>>
>>   void bdrv_reset_dirty_bitmap(BdrvDirtyBitmap *bitmap,
>> @@ -5666,6 +5703,9 @@ void bdrv_reset_dirty_bitmap(BdrvDirtyBitmap 
>> *bitmap,
>>   {
>>       assert(bdrv_dirty_bitmap_enabled(bitmap));
>>       hbitmap_reset(bitmap->bitmap, cur_sector, nr_sectors);
>> +    if (bitmap->meta_bitmap) {
>> +        hbitmap_set(bitmap->meta_bitmap, cur_sector, nr_sectors);
>> +    }
>>   }
>>
>>   void bdrv_clear_dirty_bitmap(BdrvDirtyBitmap *bitmap)
>> diff --git a/include/block/block.h b/include/block/block.h
>> index c6a928d..f2c62f6 100644
>> --- a/include/block/block.h
>> +++ b/include/block/block.h
>> @@ -4,6 +4,7 @@
>>   #include "block/aio.h"
>>   #include "qemu-common.h"
>>   #include "qemu/option.h"
>> +#include "qemu/hbitmap.h"
>>   #include "block/coroutine.h"
>>   #include "block/accounting.h"
>>   #include "qapi/qmp/qobject.h"
>> @@ -487,6 +488,10 @@ void 
>> bdrv_dirty_bitmap_deserialize_zeroes(BdrvDirtyBitmap *bitmap,
>>                                             uint64_t start, uint64_t 
>> count);
>>   void bdrv_dirty_bitmap_deserialize_finish(BdrvDirtyBitmap *bitmap);
>>
>> +HBitmap *bdrv_create_meta_bitmap(BdrvDirtyBitmap *bitmap,
>> +                                        uint64_t granularity);
>> +void bdrv_release_meta_bitmap(BdrvDirtyBitmap *bitmap);
>> +
>>   void bdrv_enable_copy_on_read(BlockDriverState *bs);
>>   void bdrv_disable_copy_on_read(BlockDriverState *bs);
>>
>>


-- 
Best regards,
Vladimir

  reply	other threads:[~2015-02-19 11:43 UTC|newest]

Thread overview: 32+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-02-18 14:00 [Qemu-devel] [PATCH RFC v3 00/14] Dirty bitmaps migration Vladimir Sementsov-Ogievskiy
2015-02-18 14:00 ` [Qemu-devel] [PATCH RFC v3 01/14] qmp: add query-block-dirty-bitmap Vladimir Sementsov-Ogievskiy
2015-02-18 14:00 ` [Qemu-devel] [PATCH RFC v3 02/14] hbitmap: serialization Vladimir Sementsov-Ogievskiy
2015-02-18 23:42   ` John Snow
2015-02-18 14:00 ` [Qemu-devel] [PATCH RFC v3 03/14] block: BdrvDirtyBitmap serialization interface Vladimir Sementsov-Ogievskiy
2015-02-18 23:43   ` John Snow
2015-02-18 14:00 ` [Qemu-devel] [PATCH RFC v3 04/14] block: tiny refactoring: minimize hbitmap_(set/reset) usage Vladimir Sementsov-Ogievskiy
2015-02-18 23:44   ` John Snow
2015-02-18 14:00 ` [Qemu-devel] [PATCH RFC v3 05/14] block: add meta bitmaps Vladimir Sementsov-Ogievskiy
2015-02-18 23:45   ` John Snow
2015-02-19 11:43     ` Vladimir Sementsov-Ogievskiy [this message]
2015-02-21  0:53       ` John Snow
2015-02-18 14:00 ` [Qemu-devel] [PATCH RFC v3 06/14] block: add bdrv_next_dirty_bitmap() Vladimir Sementsov-Ogievskiy
2015-02-18 23:45   ` John Snow
2015-02-18 14:00 ` [Qemu-devel] [PATCH RFC v3 07/14] qapi: add dirty-bitmaps migration capability Vladimir Sementsov-Ogievskiy
2015-02-18 23:45   ` John Snow
2015-02-18 14:00 ` [Qemu-devel] [PATCH RFC v3 08/14] migration: add migration/block-dirty-bitmap.c Vladimir Sementsov-Ogievskiy
2015-02-18 23:47   ` John Snow
2015-02-19 13:48     ` Vladimir Sementsov-Ogievskiy
2015-02-18 14:00 ` [Qemu-devel] [PATCH RFC v3 09/14] iotests: maintain several vms in test Vladimir Sementsov-Ogievskiy
2015-02-18 23:48   ` John Snow
2015-02-18 14:00 ` [Qemu-devel] [PATCH RFC v3 10/14] iotests: add add_incoming_migration to VM class Vladimir Sementsov-Ogievskiy
2015-02-18 23:48   ` John Snow
2015-02-18 14:00 ` [Qemu-devel] [PATCH RFC v3 11/14] iotests: add dirty bitmap migration test Vladimir Sementsov-Ogievskiy
2015-02-19 18:47   ` John Snow
2015-02-18 14:00 ` [Qemu-devel] [PATCH RFC v3 12/14] qapi: add md5 checksum of last dirty bitmap level to query-block Vladimir Sementsov-Ogievskiy
2015-02-19 18:53   ` John Snow
2015-02-18 14:00 ` [Qemu-devel] [PATCH RFC v3 13/14] iotests: add dirty bitmap migration test Vladimir Sementsov-Ogievskiy
2015-02-19 19:30   ` John Snow
2015-02-18 14:00 ` [Qemu-devel] [PATCH RFC v3 14/14] migration/qemu-file: make functions qemu_(get/put)_string public Vladimir Sementsov-Ogievskiy
2015-02-19  0:00   ` John Snow
2015-02-19  0:11 ` [Qemu-devel] [PATCH RFC v3 00/14] Dirty bitmaps migration John Snow

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=54E5CC48.1080704@parallels.com \
    --to=vsementsov@parallels.com \
    --cc=amit.shah@redhat.com \
    --cc=den@openvz.org \
    --cc=dgilbert@redhat.com \
    --cc=jsnow@redhat.com \
    --cc=kwolf@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=peter.maydell@linaro.org \
    --cc=qemu-devel@nongnu.org \
    --cc=quintela@redhat.com \
    --cc=stefanha@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).