From: Vladimir Sementsov-Ogievskiy <vsementsov@parallels.com>
To: John Snow <jsnow@redhat.com>, qemu-devel@nongnu.org
Cc: kwolf@redhat.com, peter.maydell@linaro.org, quintela@redhat.com,
dgilbert@redhat.com, stefanha@redhat.com, pbonzini@redhat.com,
amit.shah@redhat.com, den@openvz.org
Subject: Re: [Qemu-devel] [PATCH RFC v3 05/14] block: add meta bitmaps
Date: Thu, 19 Feb 2015 14:43:04 +0300 [thread overview]
Message-ID: <54E5CC48.1080704@parallels.com> (raw)
In-Reply-To: <54E52422.9050507@redhat.com>
On 19.02.2015 02:45, John Snow wrote:
>
>
> On 02/18/2015 09:00 AM, Vladimir Sementsov-Ogievskiy wrote:
>> Meta bitmap is a 'dirty bitmap' for the BdrvDirtyBitmap. It tracks
>> changes (set/unset) of this BdrvDirtyBitmap. It is needed for live
>> migration of block dirty bitmaps.
>>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@parallels.com>
>> ---
>> block.c | 40 ++++++++++++++++++++++++++++++++++++++++
>> include/block/block.h | 5 +++++
>> 2 files changed, 45 insertions(+)
>>
>> diff --git a/block.c b/block.c
>> index a127fd2..aaa08b8 100644
>> --- a/block.c
>> +++ b/block.c
>> @@ -58,9 +58,15 @@
>> * (3) successor is set: frozen mode.
>> * A frozen bitmap cannot be renamed, deleted, anonymized,
>> cleared, set,
>> * or enabled. A frozen bitmap can only abdicate() or reclaim().
>> + *
>> + * Meta bitmap:
>> + * Meta bitmap is a 'dirty bitmap' for the BdrvDirtyBitmap. It
>> tracks changes
>> + * (set/unset) of this BdrvDirtyBitmap. It is needed for live
>> migration of
>> + * block dirty bitmaps.
>> */
>> struct BdrvDirtyBitmap {
>> HBitmap *bitmap; /* Dirty sector bitmap
>> implementation */
>> + HBitmap *meta_bitmap; /* Meta bitmap */
>> BdrvDirtyBitmap *successor; /* Anonymous child; implies frozen
>> status */
>> char *name; /* Optional non-empty unique ID */
>> int64_t size; /* Size of the bitmap (Number of
>> sectors) */
>> @@ -5398,6 +5404,31 @@ void
>> bdrv_dirty_bitmap_make_anon(BdrvDirtyBitmap *bitmap)
>> bitmap->name = NULL;
>> }
>>
>> +HBitmap *bdrv_create_meta_bitmap(BdrvDirtyBitmap *bitmap,
>> + uint64_t granularity)
>> +{
>> + uint64_t sector_granularity;
>> +
>> + assert((granularity & (granularity - 1)) == 0);
>> +
>> + granularity *= 8 * bdrv_dirty_bitmap_granularity(bitmap);
>> + sector_granularity = granularity >> BDRV_SECTOR_BITS;
>> + assert(sector_granularity);
>> +
>
> The maths here could use a comment, I think.
>
> the "granularity" field here actually describes the desired
> serialization buffer size; e.g. CHUNK_SIZE (1 << 20) or 1 MiB. This
> parameter should be renamed to explain what it's actually for.
> Something like "chunk_size" and a comment explaining that it is in bytes.
>
> ...
>
> That said, let's talk about the default chunk size you're using in
> correlation with this function.
>
> a CHUNK_SIZE of 1MiB here is going to lead us to, if we have a bitmap
> with the default granularity of 128 sectors\64KiB bytes, a granularity
> for the meta_bitmap of one billion sectors (1 << 30) or 512GiB.
>
> That's going to be bigger than most drives entirely, which will
> generally lead us to only using a single "chunk" per drive. Which
> means we won't really get a lot of mileage out of the bulk/dirty
> phases most of the time.
>
> It's wild to think about that the first 1,000,000,000 sectors or
> 512,000,000,000 bytes will all be represented by the first single bit
> in this bitmap. If a single hair on the drive changes, we resend the
> _entire_ bitmap, possibly over and over again. Will we ever make
> progress? Should we investigate a smaller chunk size?
>
> Here's some quick mappings of chunk size (bytes) to effective
> meta_bitmap byte granularities, assuming the meta_bitmap is tracking a
> bitmap with the default granularity of 64KiB:
>
> (1 << 20) 1MiB -- 512GiB // This is too high of a granularity
> (1 << 17) 128KiB -- 64GiB
> (1 << 15) 32KiB -- 16GiB
> (1 << 11) 2KiB -- 1GiB
> (1 << 10) 1KiB -- 512MiB
> (1 << 9) 512B -- 256MiB
> (1 << 8) 256B -- 128MiB
> (1 << 5) 32 B -- 16MiB // This is too small of a chunk size.
> (1 << 1) 1 B -- 1MiB
>
> We want to make the chunk sends efficient, but we also want to make
> sure that the dirty phase doesn't resend more data than it needs to,
> so we need to strike a balance here, no?
>
> I think arguments could be made for most granularities between 128MiB
> through 1GiB. Anything outside of that is too lopsided, IMO.
>
> What are your thoughts on this?
Ok, interesting thing to discuss.
My thoughts:
* the chunk size for block-migration is 1mb, than the bitmap (64kb
granularity) for this chunk is 16bit=2bytes long. It's an intuitive
reason for choosing the chunk size about 2 bytes. But in this case the
data/metadata ratio is very bad (about 20bytes for the header of the
chunk). So, taking the nearest value with adequate ratio gives (IMHO)
'1kb -- 512mb': 20b/1k ~ 2%. Or 512b => 4%.
* for ndb+mirror migration scheme the default chunk is 64kb instead of
1mb. So the bitmap is more smaller. But the same reason of data/metadata
ratio leads to 1kb chunk for dirty bitmap migration.
So, what about default to 1kb and additional parameter for migration
(migration capabilities) to give the user a possibility of chose?
* Yes, in most of user cases the bitmap (64kb granularity) will be small
(< 1mb). In these cases, I think, it would be better to send the data
only in complete step, only once. (for exmaple, if pending <= 1mb,
dosn't send anything in incremental phase).
Live migration is actually needed only for migration of bitmaps for
disks of several TBs size.
>
>> + bitmap->meta_bitmap =
>> + hbitmap_alloc(bitmap->size, ffsll(sector_granularity) - 1);
>> +
>> + return bitmap->meta_bitmap;
>> +}
>> +
>> +void bdrv_release_meta_bitmap(BdrvDirtyBitmap *bitmap)
>> +{
>> + if (bitmap->meta_bitmap) {
>> + hbitmap_free(bitmap->meta_bitmap);
>> + bitmap->meta_bitmap = NULL;
>> + }
>> +}
>> +
>> BdrvDirtyBitmap *bdrv_create_dirty_bitmap(BlockDriverState *bs,
>> uint32_t granularity,
>> const char *name,
>> @@ -5532,6 +5563,9 @@ void bdrv_release_dirty_bitmap(BlockDriverState
>> *bs, BdrvDirtyBitmap *bitmap)
>> assert(!bdrv_dirty_bitmap_frozen(bm));
>> QLIST_REMOVE(bitmap, list);
>> hbitmap_free(bitmap->bitmap);
>> + if (bitmap->meta_bitmap) {
>> + hbitmap_free(bitmap->meta_bitmap);
>> + }
>> g_free(bitmap->name);
>> g_free(bitmap);
>> return;
>> @@ -5659,6 +5693,9 @@ void bdrv_set_dirty_bitmap(BdrvDirtyBitmap
>> *bitmap,
>> {
>> assert(bdrv_dirty_bitmap_enabled(bitmap));
>> hbitmap_set(bitmap->bitmap, cur_sector, nr_sectors);
>> + if (bitmap->meta_bitmap) {
>> + hbitmap_set(bitmap->meta_bitmap, cur_sector, nr_sectors);
>> + }
>> }
>>
>> void bdrv_reset_dirty_bitmap(BdrvDirtyBitmap *bitmap,
>> @@ -5666,6 +5703,9 @@ void bdrv_reset_dirty_bitmap(BdrvDirtyBitmap
>> *bitmap,
>> {
>> assert(bdrv_dirty_bitmap_enabled(bitmap));
>> hbitmap_reset(bitmap->bitmap, cur_sector, nr_sectors);
>> + if (bitmap->meta_bitmap) {
>> + hbitmap_set(bitmap->meta_bitmap, cur_sector, nr_sectors);
>> + }
>> }
>>
>> void bdrv_clear_dirty_bitmap(BdrvDirtyBitmap *bitmap)
>> diff --git a/include/block/block.h b/include/block/block.h
>> index c6a928d..f2c62f6 100644
>> --- a/include/block/block.h
>> +++ b/include/block/block.h
>> @@ -4,6 +4,7 @@
>> #include "block/aio.h"
>> #include "qemu-common.h"
>> #include "qemu/option.h"
>> +#include "qemu/hbitmap.h"
>> #include "block/coroutine.h"
>> #include "block/accounting.h"
>> #include "qapi/qmp/qobject.h"
>> @@ -487,6 +488,10 @@ void
>> bdrv_dirty_bitmap_deserialize_zeroes(BdrvDirtyBitmap *bitmap,
>> uint64_t start, uint64_t
>> count);
>> void bdrv_dirty_bitmap_deserialize_finish(BdrvDirtyBitmap *bitmap);
>>
>> +HBitmap *bdrv_create_meta_bitmap(BdrvDirtyBitmap *bitmap,
>> + uint64_t granularity);
>> +void bdrv_release_meta_bitmap(BdrvDirtyBitmap *bitmap);
>> +
>> void bdrv_enable_copy_on_read(BlockDriverState *bs);
>> void bdrv_disable_copy_on_read(BlockDriverState *bs);
>>
>>
--
Best regards,
Vladimir
next prev parent reply other threads:[~2015-02-19 11:43 UTC|newest]
Thread overview: 32+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-02-18 14:00 [Qemu-devel] [PATCH RFC v3 00/14] Dirty bitmaps migration Vladimir Sementsov-Ogievskiy
2015-02-18 14:00 ` [Qemu-devel] [PATCH RFC v3 01/14] qmp: add query-block-dirty-bitmap Vladimir Sementsov-Ogievskiy
2015-02-18 14:00 ` [Qemu-devel] [PATCH RFC v3 02/14] hbitmap: serialization Vladimir Sementsov-Ogievskiy
2015-02-18 23:42 ` John Snow
2015-02-18 14:00 ` [Qemu-devel] [PATCH RFC v3 03/14] block: BdrvDirtyBitmap serialization interface Vladimir Sementsov-Ogievskiy
2015-02-18 23:43 ` John Snow
2015-02-18 14:00 ` [Qemu-devel] [PATCH RFC v3 04/14] block: tiny refactoring: minimize hbitmap_(set/reset) usage Vladimir Sementsov-Ogievskiy
2015-02-18 23:44 ` John Snow
2015-02-18 14:00 ` [Qemu-devel] [PATCH RFC v3 05/14] block: add meta bitmaps Vladimir Sementsov-Ogievskiy
2015-02-18 23:45 ` John Snow
2015-02-19 11:43 ` Vladimir Sementsov-Ogievskiy [this message]
2015-02-21 0:53 ` John Snow
2015-02-18 14:00 ` [Qemu-devel] [PATCH RFC v3 06/14] block: add bdrv_next_dirty_bitmap() Vladimir Sementsov-Ogievskiy
2015-02-18 23:45 ` John Snow
2015-02-18 14:00 ` [Qemu-devel] [PATCH RFC v3 07/14] qapi: add dirty-bitmaps migration capability Vladimir Sementsov-Ogievskiy
2015-02-18 23:45 ` John Snow
2015-02-18 14:00 ` [Qemu-devel] [PATCH RFC v3 08/14] migration: add migration/block-dirty-bitmap.c Vladimir Sementsov-Ogievskiy
2015-02-18 23:47 ` John Snow
2015-02-19 13:48 ` Vladimir Sementsov-Ogievskiy
2015-02-18 14:00 ` [Qemu-devel] [PATCH RFC v3 09/14] iotests: maintain several vms in test Vladimir Sementsov-Ogievskiy
2015-02-18 23:48 ` John Snow
2015-02-18 14:00 ` [Qemu-devel] [PATCH RFC v3 10/14] iotests: add add_incoming_migration to VM class Vladimir Sementsov-Ogievskiy
2015-02-18 23:48 ` John Snow
2015-02-18 14:00 ` [Qemu-devel] [PATCH RFC v3 11/14] iotests: add dirty bitmap migration test Vladimir Sementsov-Ogievskiy
2015-02-19 18:47 ` John Snow
2015-02-18 14:00 ` [Qemu-devel] [PATCH RFC v3 12/14] qapi: add md5 checksum of last dirty bitmap level to query-block Vladimir Sementsov-Ogievskiy
2015-02-19 18:53 ` John Snow
2015-02-18 14:00 ` [Qemu-devel] [PATCH RFC v3 13/14] iotests: add dirty bitmap migration test Vladimir Sementsov-Ogievskiy
2015-02-19 19:30 ` John Snow
2015-02-18 14:00 ` [Qemu-devel] [PATCH RFC v3 14/14] migration/qemu-file: make functions qemu_(get/put)_string public Vladimir Sementsov-Ogievskiy
2015-02-19 0:00 ` John Snow
2015-02-19 0:11 ` [Qemu-devel] [PATCH RFC v3 00/14] Dirty bitmaps migration John Snow
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=54E5CC48.1080704@parallels.com \
--to=vsementsov@parallels.com \
--cc=amit.shah@redhat.com \
--cc=den@openvz.org \
--cc=dgilbert@redhat.com \
--cc=jsnow@redhat.com \
--cc=kwolf@redhat.com \
--cc=pbonzini@redhat.com \
--cc=peter.maydell@linaro.org \
--cc=qemu-devel@nongnu.org \
--cc=quintela@redhat.com \
--cc=stefanha@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).