From: John Snow <jsnow@redhat.com>
To: Vladimir Sementsov-Ogievskiy <vsementsov@parallels.com>,
qemu-devel@nongnu.org
Cc: kwolf@redhat.com,
"Juan Q >> Juan Jose Quintela Carreira" <quintela@redhat.com>,
"gilbert >> Dr. David Alan Gilbert" <dgilbert@redhat.com>,
stefanha@redhat.com, Amit Shah <amit.shah@redhat.com>,
den@openvz.org
Subject: Re: [Qemu-devel] [PATCH 9/9] block-migration: add named dirty bitmaps migration
Date: Tue, 20 Jan 2015 12:25:11 -0500 [thread overview]
Message-ID: <54BE8F77.2030706@redhat.com> (raw)
In-Reply-To: <54BA9926.7010509@parallels.com>
On 01/17/2015 12:17 PM, Vladimir Sementsov-Ogievskiy wrote:
>
> Best regards,
> Vladimir
>
> On 09.01.2015 01:05, John Snow wrote:
>> CCing migration maintainers, feedback otherwise in-line.
>>
>> On 12/11/2014 09:17 AM, Vladimir Sementsov-Ogievskiy wrote:
>>> Just migrate parts of dirty bitmaps, corresponding to the block being
>>> migrated. Also, skip block migration if it is disabled (blk parameter
>>> of migrate command is false).
>>>
>>> Skipping shared sectors: bitmaps are migrated independently of this,
>>> just send blk without block data.
>>>
>>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@parallels.com>
>>> ---
>>
>> In terms of general approach, migrating the dirty bitmap alongside the
>> blocks it describes makes sense when migrating both.
>>
>> Is this a lot of overhead when that's not the case, though? If we
>> utilize the "bitmap only" pathways added here, don't we iterate
>> through the savevm handlers a lot to only transfer very little data
>> per section?
>>
>> If we really need migration of bitmaps apart from the data they
>> describe, is it not worth developing a more condensed transfer
>> mechanism to get more bitmap data per iteration, instead of just one
>> block's worth?
> The first stage of migration can be easily optimized in this case - just
> take a larger bitmap pieces. For the second stage, it's not as simple.
> If we will take larger pieces on each step - we will just increase dirty
> data transfer. One approach is to maintain "dirty-region" - two numbers,
> representing the region of set bits in migration dirty bitmap, and send
> data only when this region is large enough.
>
>>
>>> block-migration.c | 173
>>> +++++++++++++++++++++++++++++++++++++++++++++++-------
>>> savevm.c | 1 +
>>> 2 files changed, 154 insertions(+), 20 deletions(-)
>>>
>>> diff --git a/block-migration.c b/block-migration.c
>>> index 908a66d..95d54a1 100644
>>> --- a/block-migration.c
>>> +++ b/block-migration.c
>>> @@ -32,6 +32,8 @@
>>> #define BLK_MIG_FLAG_EOS 0x02
>>> #define BLK_MIG_FLAG_PROGRESS 0x04
>>> #define BLK_MIG_FLAG_ZERO_BLOCK 0x08
>>> +#define BLK_MIG_FLAG_HAS_BLOCK 0x10
>>> +#define BLK_MIG_FLAG_HAS_BITMAPS 0x20
>>>
>>
>> OK: As a result of allowing bitmaps to be migrated without the block
>> data itself, we now must acknowledge the concept that we can send
>> either block data, bitmaps, both, or neither -- so new defines are
>> added here to indicate what data can be found in the section following.
>>
>>> #define MAX_IS_ALLOCATED_SEARCH 65536
>>>
>>> @@ -51,6 +53,8 @@ typedef struct BlkMigDevState {
>>> int shared_base;
>>> int64_t total_sectors;
>>> QSIMPLEQ_ENTRY(BlkMigDevState) entry;
>>> + int nr_bitmaps;
>>> + BdrvDirtyBitmap **dirty_bitmaps;
>>>
>>> /* Only used by migration thread. Does not need a lock. */
>>> int bulk_completed;
>>> @@ -64,6 +68,11 @@ typedef struct BlkMigDevState {
>>> Error *blocker;
>>> } BlkMigDevState;
>>>
>>> +typedef struct BlkMigDirtyBitmap {
>>> + BdrvDirtyBitmap *bitmap;
>>> + uint8_t *buf;
>>> +} BlkMigDirtyBitmap;
>>> +
>>> typedef struct BlkMigBlock {
>>> /* Only used by migration thread. */
>>> uint8_t *buf;
>>> @@ -74,6 +83,9 @@ typedef struct BlkMigBlock {
>>> QEMUIOVector qiov;
>>> BlockAIOCB *aiocb;
>>>
>>> + int nr_bitmaps;
>>> + BlkMigDirtyBitmap *dirty_bitmaps;
>>> +
>>> /* Protected by block migration lock. */
>>> int ret;
>>> QSIMPLEQ_ENTRY(BlkMigBlock) entry;
>>> @@ -83,6 +95,7 @@ typedef struct BlkMigState {
>>> /* Written during setup phase. Can be read without a lock. */
>>> int blk_enable;
>>> int shared_base;
>>> + int dbm_enable;
>>
>> Similar to the feedback in a previous patch, we may not want to use
>> 'dbm' to mean "Dirty Bitmap" because we have not been applying the
>> abbreviation consistently.
>>
>> For now, the recommendation from stefan is to use the full
>> "bdrv_dirty_bitmap" or "dirty_bitmap" in function names.
>>
>> If we do want an acronym to refer to this particular type of dirty
>> bitmap, we should apply it consistently to all functions that work
>> with the BdrvDirtyBitmap type.
>>
>> For now, "bdrv_dirty_bitmap_enable" should suffice, even though it's a
>> bit wordy.
> It's a flag, that enables the migration of dirty bitmaps, like
> blk_enable enables the migrtion of blocks.. Then, should it be
> "dirty_bitmap_migration_enable"? Isn't it too long for a simple flag
> variable?
Yeah, I see your point. I think the only difference is that "blk" is
used very widely throughout the code, but this is the first occurrence
of "dbm."
even just "bitmap_enable" would be sufficiently more self-evident than
"dbm_enable," unless we rework all of the BdrvDirtyBitmap functions to
use the "dbm" abbreviation.
I'd also be happy with "migrate_bitmaps" which is fairly
self-explanatory, too.
>>
>>> QSIMPLEQ_HEAD(bmds_list, BlkMigDevState) bmds_list;
>>> int64_t total_sector_sum;
>>> bool zero_blocks;
>>> @@ -116,27 +129,63 @@ static void blk_mig_unlock(void)
>>> /* Only allocating and initializing structure fields, not copying
>>> any data. */
>>>
>>> static BlkMigBlock *blk_create(BlkMigDevState *bmds, int64_t sector,
>>> - int nr_sectors)
>>> + int nr_sectors, bool only_bitmaps)
>>> {
>>> + int i;
>>> BlkMigBlock *blk = g_new(BlkMigBlock, 1);
>>> - blk->buf = g_malloc(BLOCK_SIZE);
>>> + blk->buf = only_bitmaps ? NULL : g_malloc(BLOCK_SIZE);
>>> blk->bmds = bmds;
>>> blk->sector = sector;
>>> blk->nr_sectors = nr_sectors;
>>> + blk->dirty_bitmaps = NULL;
>>> + blk->nr_bitmaps = 0;
>>>
>>> blk->iov.iov_base = blk->buf;
>>> blk->iov.iov_len = nr_sectors * BDRV_SECTOR_SIZE;
>>> qemu_iovec_init_external(&blk->qiov, &blk->iov, 1);
>>>
>>
>> We can probably skip this block if only_bitmaps is true.
> I agree.
>>
>>> + if (block_mig_state.dbm_enable) {
>>> + BlkMigDirtyBitmap *mig_bm;
>>> +
>>> + blk->nr_bitmaps = bmds->nr_bitmaps;
>>> + mig_bm = blk->dirty_bitmaps = g_new(BlkMigDirtyBitmap,
>>> + bmds->nr_bitmaps);
>>> + for (i = 0; i < bmds->nr_bitmaps; ++i) {
>>> + BdrvDirtyBitmap *bm = bmds->dirty_bitmaps[i];
>>> + mig_bm->buf = g_malloc(bdrv_dbm_data_size(bm, nr_sectors));
>>> + mig_bm->bitmap = bm;
>>> + mig_bm++;
>>> + }
>>> + }
>>> +
>>
>> It's strange to me that we'd give it an "only bitmaps" boolean and
>> then rely on a different condition to populate them. Is it not worthy
>> of an error if we specify only_bitmaps when block_mig_state.dbm_enable
>> is false?
> only_bitmaps means: no blocks but only bitmaps. But there is another
> case: blocks with bitmaps, and in this case this "if ()" should execute.
>
> {only_bitmaps = true} when {block_mig_state.dbm_enable = false} - it's
> of course an error. I'll add assert() for this.
>>
>>> return blk;
>>> }
>>>
>>> static void blk_free(BlkMigBlock *blk)
>>> {
>>> + int i;
>>> g_free(blk->buf);
>>> +
>>> + if (blk->dirty_bitmaps) {
>>> + for (i = 0; i < blk->nr_bitmaps; ++i) {
>>> + g_free(blk->dirty_bitmaps[i].buf);
>>> + }
>>> + g_free(blk->dirty_bitmaps);
>>> + }
>>> +
>>> g_free(blk);
>>> }
>>>
>>> +static void blk_store_bitmaps(BlkMigBlock *blk)
>>> +{
>>> + int i;
>>> + for (i = 0; i < blk->nr_bitmaps; ++i) {
>>> + bdrv_dbm_store_data(blk->dirty_bitmaps[i].bitmap,
>>> + blk->dirty_bitmaps[i].buf,
>>> + blk->sector, blk->nr_sectors);
>>> + }
>>> +}
>>> +
>>
>> I am worried about the case in which blk->nr_sectors is not an
>> integral multiple of the bitmap sector granularity, for reasons
>> discussed in the previous patches.
>>
>> I'm not positive it IS a problem either, but maybe you have already
>> thought about this.
> Only allocated memory should be extended in the previous patch, as I've
> already proposed. As a general approach - it's ok, because we always
> calculate the real number of bit in bitmap from the virtual in the same
> way, therefore stored region will be restored in the same place.
OK, just checking.
>>
>>> /* Must run outside of the iothread lock during the bulk phase,
>>> * or the VM will stall.
>>> */
>>> @@ -146,9 +195,15 @@ static void blk_send(QEMUFile *f, BlkMigBlock *
>>> blk)
>>> int len;
>>> uint64_t flags = BLK_MIG_FLAG_DEVICE_BLOCK;
>>>
>>> - if (block_mig_state.zero_blocks &&
>>> + if (block_mig_state.zero_blocks && blk->buf &&
>>> buffer_is_zero(blk->buf, BLOCK_SIZE)) {
>>> flags |= BLK_MIG_FLAG_ZERO_BLOCK;
>>> + } else if (blk->buf) {
>>> + flags |= BLK_MIG_FLAG_HAS_BLOCK;
>>> + }
>>> +
>>> + if (block_mig_state.dbm_enable) {
>>> + flags |= BLK_MIG_FLAG_HAS_BITMAPS;
>>> }
>>>
>>> /* sector number and flags */
>>> @@ -160,10 +215,27 @@ static void blk_send(QEMUFile *f, BlkMigBlock *
>>> blk)
>>> qemu_put_byte(f, len);
>>> qemu_put_buffer(f, (uint8_t
>>> *)bdrv_get_device_name(blk->bmds->bs), len);
>>>
>>> + /* dirty bitmaps */
>>> + if (flags & BLK_MIG_FLAG_HAS_BITMAPS) {
>>> + int i;
>>> + qemu_put_be32(f, blk->nr_bitmaps);
>>> + for (i = 0; i < blk->nr_bitmaps; ++i) {
>>> + BdrvDirtyBitmap *bm = blk->dirty_bitmaps[i].bitmap;
>>> + int buf_size = bdrv_dbm_data_size(bm, blk->nr_sectors);
>>> + const char *name = bdrv_dirty_bitmap_name(bm);
>>> + int len = strlen(name);
>>> +
>>> + qemu_put_byte(f, len);
>>> + qemu_put_buffer(f, (const uint8_t *)name, len);
>>
>> No length in the stream for the size of the dirty bitmap buffer?
>>
>>> + qemu_put_buffer(f, blk->dirty_bitmaps[i].buf, buf_size);
>>> + }
>>> + }
>>> +
>>
>> Future patch idea: We can probably optimize this block by testing a
>> range of sectors on the bitmap and skipping bitmaps that don't have
>> data for this block.
>>
>> Since we populated the bitmaps previously, we should already have the
>> chance to have counted how many of them are nonempty, so we can use
>> that number instead of blk->nr_bitmaps.
>>
>> Not important for the series right now.
>>
>>> /* if a block is zero we need to flush here since the network
>>> * bandwidth is now a lot higher than the storage device
>>> bandwidth.
>>> - * thus if we queue zero blocks we slow down the migration */
>>> - if (flags & BLK_MIG_FLAG_ZERO_BLOCK) {
>>> + * thus if we queue zero blocks we slow down the migration.
>>> + * also, skip writing block when migrate only dirty bitmaps. */
>>> + if (!(flags & BLK_MIG_FLAG_HAS_BLOCK)) {
>>
>> I'm not completely clear on the reasoning behind the original
>> codeblock here, but changing it from "Only when zeroes" to "Not when
>> we have data: Zeroes and Bitmaps" makes sense, since bitmap-only
>> sections are going to be very sparse too.
>>
>>> qemu_fflush(f);
>>> return;
>>> }
>>
>> A little after this, there's a call to qemu_put_buffer(f, blk->buf,
>> BLOCK_SIZE), but we have the case where blk->buf may be NULL in bitmap
>> only cases. I think we need to guard against that.
> in this case (flags & BLK_MIG_FLAG_HAS_BLOCK) should be zero. However an
> assert here won't hurt.
>>
>>> @@ -282,13 +354,20 @@ static int mig_save_device_bulk(QEMUFile *f,
>>> BlkMigDevState *bmds)
>>> BlockDriverState *bs = bmds->bs;
>>> BlkMigBlock *blk;
>>> int nr_sectors;
>>> + bool skip_chunk = false;
>>>
>>> if (bmds->shared_base) {
>>> qemu_mutex_lock_iothread();
>>> - while (cur_sector < total_sectors &&
>>> - !bdrv_is_allocated(bs, cur_sector,
>>> MAX_IS_ALLOCATED_SEARCH,
>>> - &nr_sectors)) {
>>> - cur_sector += nr_sectors;
>>> + if (block_mig_state.dbm_enable) {
>>> + bdrv_is_allocated(bs, cur_sector,
>>> BDRV_SECTORS_PER_DIRTY_CHUNK,
>>> + &nr_sectors);
>>> + skip_chunk = nr_sectors >= BDRV_SECTORS_PER_DIRTY_CHUNK;
>>> + } else {
>>> + while (cur_sector < total_sectors &&
>>> + !bdrv_is_allocated(bs, cur_sector,
>>> MAX_IS_ALLOCATED_SEARCH,
>>> + &nr_sectors)) {
>>> + cur_sector += nr_sectors;
>>> + }
>>
>> OK, so the approach taken here is that if bitmaps are enabled, we
>> check only to see if this current chunk is allocated or not. If it
>> isn't, we declare this a "bitmap only" chunk and set skip_chunk to true.
>>
>> The else clause taken implies no bitmap data, because either
>> dbm_enable or blk_enable (or both) MUST be set for us to be here.
>>
>> (1) Why are you not checking the return value of bdrv_is_allocated? It
>> could return either true or false AND nr_sectors ==
>> BDRV_SECTORS_PER_DIRTY_CHUNK, provided it was entirely allocated or
>> entirely unallocated, so I think this condition is not working as you
>> intend it to.
> Yes, it's my mistake.
>>
>> (2) This seems slightly hackey in a sparse or no-data situation. We
>> will transmit many small data sections, each containing a chunk of
>> bitmap data, one at a time, with no data interspersed -- which leaves
>> a high metadata-to-data ratio in these cases.
>>
>> Would it be an involved process to alter the nr_sectors count to be
>> something that isn't a single sector in the (dbm_enable &&
>> !blk_enable) case? That way we can spin until we find data worth
>> transferring and transfer all of the bitmap metadata in a larger chunk
>> all at once.
>>
>> Whatever approach you choose, it might be nice to have a comment
>> in-line explaining the general approach.
>>
>>> }
>>> qemu_mutex_unlock_iothread();
>>> }
>>> @@ -309,7 +388,8 @@ static int mig_save_device_bulk(QEMUFile *f,
>>> BlkMigDevState *bmds)
>>> nr_sectors = total_sectors - cur_sector;
>>> }
>>>
>>> - blk = blk_create(bmds, cur_sector, nr_sectors);
>>> + blk = blk_create(bmds, cur_sector, nr_sectors,
>>> + skip_chunk || !block_mig_state.blk_enable);
>>>
>>> blk_mig_lock();
>>> block_mig_state.submitted++;
>>> @@ -317,8 +397,16 @@ static int mig_save_device_bulk(QEMUFile *f,
>>> BlkMigDevState *bmds)
>>>
>>> bdrv_reset_dirty_bitmap(bs, bmds->dirty_bitmap, cur_sector,
>>> nr_sectors);
>>>
>>> - blk->aiocb = bdrv_aio_readv(bs, cur_sector, &blk->qiov,
>>> - nr_sectors, blk_mig_read_cb, blk);
>>> + if (block_mig_state.dbm_enable) {
>>> + blk_store_bitmaps(blk);
>>> + }
>>> +
>>> + if (block_mig_state.blk_enable && !skip_chunk) {
>>> + blk->aiocb = bdrv_aio_readv(bs, cur_sector, &blk->qiov,
>>> + nr_sectors, blk_mig_read_cb, blk);
>>> + } else if (block_mig_state.dbm_enable) {
>>> + blk_mig_read_cb(blk, 0);
>>> + }
>>>
>>> bmds->cur_sector = cur_sector + nr_sectors;
>>> return (bmds->cur_sector >= total_sectors);
>>> @@ -403,6 +491,8 @@ static void init_blk_migration(QEMUFile *f)
>>> DPRINTF("Start full migration for %s\n",
>>> bdrv_get_device_name(bs));
>>> }
>>>
>>> + bmds->dirty_bitmaps = bdrv_dbm_find_all_named(bs,
>>> &bmds->nr_bitmaps);
>>> +
>>
>> At this point, we should have block_mig_state.dbm_enable set (or
>> disabled), so we can skip this initialization if it is not set.
> Agree.
>>
>>> QSIMPLEQ_INSERT_TAIL(&block_mig_state.bmds_list, bmds, entry);
>>> }
>>> }
>>> @@ -481,20 +571,32 @@ static int mig_save_device_dirty(QEMUFile *f,
>>> BlkMigDevState *bmds,
>>> } else {
>>> nr_sectors = BDRV_SECTORS_PER_DIRTY_CHUNK;
>>> }
>>> - blk = blk_create(bmds, sector, nr_sectors);
>>> + blk = blk_create(bmds, sector, nr_sectors,
>>> + !block_mig_state.blk_enable);
>>> +
>>> + if (block_mig_state.dbm_enable) {
>>> + blk_store_bitmaps(blk);
>>> + }
>>>
>>> if (is_async) {
>>> - blk->aiocb = bdrv_aio_readv(bmds->bs, sector,
>>> &blk->qiov,
>>> - nr_sectors,
>>> blk_mig_read_cb, blk);
>>> + if (block_mig_state.blk_enable) {
>>> + blk->aiocb = bdrv_aio_readv(bmds->bs, sector,
>>> &blk->qiov,
>>> + nr_sectors,
>>> blk_mig_read_cb,
>>> + blk);
>>> + } else {
>>> + blk_mig_read_cb(blk, 0);
>>> + }
>>>
>>> blk_mig_lock();
>>> block_mig_state.submitted++;
>>> bmds_set_aio_inflight(bmds, sector, nr_sectors, 1);
>>> blk_mig_unlock();
>>> } else {
>>> - ret = bdrv_read(bmds->bs, sector, blk->buf,
>>> nr_sectors);
>>> - if (ret < 0) {
>>> - goto error;
>>> + if (block_mig_state.blk_enable) {
>>> + ret = bdrv_read(bmds->bs, sector, blk->buf,
>>> nr_sectors);
>>> + if (ret < 0) {
>>> + goto error;
>>> + }
>>> }
>>> blk_send(f, blk);
>>>
>>> @@ -817,10 +919,39 @@ static int block_load(QEMUFile *f, void
>>> *opaque, int version_id)
>>> nr_sectors = BDRV_SECTORS_PER_DIRTY_CHUNK;
>>> }
>>>
>>> + /* load dirty bitmaps */
>>> + if (flags & BLK_MIG_FLAG_HAS_BITMAPS) {
>>> + int i, nr_bitmaps = qemu_get_be32(f);
>>> +
>>> + for (i = 0; i < nr_bitmaps; ++i) {
>>> + char bitmap_name[256];
>>> + BdrvDirtyBitmap *bitmap;
>>> + int buf_size, len;
>>> +
>>> + len = qemu_get_byte(f);
>>> + qemu_get_buffer(f, (uint8_t *)bitmap_name, len);
>>> + bitmap_name[len] = '\0';
>>> +
>>> + bitmap = bdrv_find_dirty_bitmap(bs, bitmap_name);
>>> + if (!bitmap) {
>>> + fprintf(stderr, "Error unknown dirty bitmap\
>>> + %s for block device %s\n",
>>> + bitmap_name, device_name);
>>> + return -EINVAL;
>>> + }
>>> +
>>> + buf_size = bdrv_dbm_data_size(bitmap, nr_sectors);
>>
>> Oh, this is why you didn't store the length... Still, since it seems
>> as if you expect the bitmap to already exist on the destination, what
>> if the granularity or size is different? It might be nice to
>> explicitly state the size of the buffer -- one user misconfiguration
>> and you might blow the rest of the migration.
>>
>> This way, if the calculated size doesn't match the received size, you
>> have the chance to throw a meaningful error.
>>
>>> + buf = g_malloc(buf_size);
>>> + qemu_get_buffer(f, buf, buf_size);
>>> + bdrv_dbm_restore_data(bitmap, buf, addr,
>>> nr_sectors);
>>> + g_free(buf);
>>> + }
>>> + }
>>> +
>>
>> So with this approach, the user needs to have created the bitmaps
>> already? Would it be possible to create a mode where they don't, or am
>> I misreading something?
>>
>> We could re-create them on the fly whenever the offset is 0 on the
>> receiving side, the only other piece of information we'd really need
>> at that point is the granularity.
> Agree.
The only problem is if we re-send the offset @ 0 a second time, or if we
skip sending zero sectors and we never receive that block.
Still, there's probably an elegant way to achieve what we want. Maybe a
pre-amble migration block, or arbitrarily the first sector we happen to
transmit carries an extra flag with it that describes the granularity of
the bitmap, or ... something.
>>
>>> if (flags & BLK_MIG_FLAG_ZERO_BLOCK) {
>>> ret = bdrv_write_zeroes(bs, addr, nr_sectors,
>>> BDRV_REQ_MAY_UNMAP);
>>> - } else {
>>> + } else if (flags & BLK_MIG_FLAG_HAS_BLOCK) {
>>> buf = g_malloc(BLOCK_SIZE);
>>> qemu_get_buffer(f, buf, BLOCK_SIZE);
>>> ret = bdrv_write(bs, addr, buf, nr_sectors);
>>> @@ -855,6 +986,7 @@ static void block_set_params(const
>>> MigrationParams *params, void *opaque)
>>> {
>>> block_mig_state.blk_enable = params->blk;
>>> block_mig_state.shared_base = params->shared;
>>> + block_mig_state.dbm_enable = params->dirty;
>>>
>>> /* shared base means that blk_enable = 1 */
>>> block_mig_state.blk_enable |= params->shared;
>>> @@ -862,7 +994,8 @@ static void block_set_params(const
>>> MigrationParams *params, void *opaque)
>>>
>>> static bool block_is_active(void *opaque)
>>> {
>>> - return block_mig_state.blk_enable == 1;
>>> + return block_mig_state.blk_enable == 1 ||
>>> + block_mig_state.dbm_enable == 1;
>>> }
>>>
>>> static SaveVMHandlers savevm_block_handlers = {
>>> diff --git a/savevm.c b/savevm.c
>>> index a598d1d..1299faa 100644
>>> --- a/savevm.c
>>> +++ b/savevm.c
>>> @@ -983,6 +983,7 @@ int qemu_loadvm_state(QEMUFile *f)
>>> }
>>> }
>>>
>>> + bdrv_dbm_restore_finish();
>>
>> Do we not have a cleaner way to call this? I guess we have no explicit
>> callback mechanisms for when a particular BDS is completely done,
>> because of the multiple passes over the data that we perform --
>>
>> Still, it might be nicer to have a callback for the whole of block
>> migration, and call bdrv_dbm_restore_finish() from there instead of
>> putting this implementation detail all the way up inside of
>> qemu_loadvm_state.
>>
>> Can we migrate an empty-data sentinel section from block migration
>> upon completion that qemu_loadvm_state launches a post_load method for
>> that can invoke bdrv_dbm_restore_finish for us?
>>
>> Alternatively, can this not be safely done inside of block_load?
>> somewhere around here:
>>
>> printf("Completed %d %%%c", (int)addr,
>> (addr == 100) ? '\n' : '\r');
>> fflush(stdout);
>>
>> The "100%" completion flag is only written in block_save_complete(),
>> so once we read this flag we should have necessary and sufficient
>> information to conclude that it's safe to do the BdrvDirtyBitmap
>> migration post-load stuff.
> Good point, I think, I'll chose this approach.
>>
>>
>> Of course, maybe generic section post-load callbacks are useful to
>> other devices, and I'd be fine with either approach.
>>
>> CCing migration people for opinions.
>>
>>> cpu_synchronize_all_post_init();
>>>
>>> ret = 0;
>>>
>>
>>
>> Thank you, and sorry it took me so long to digest the series!
>> --John Snow
>>
>> P.S.: Happy New Year! I hope 2015 treats you well :~)
>
>
> With all this story about not efficient migration in case of no block in
> blk, I think, that, may be, it would be better to refuse using block
> migration bitmap to migrate bitmaps, make additional dirty bitmaps for
> dirty bitmaps and migrate bitmaps in separate migration/dirty-bitmaps.c
> file (sorry for the pun).. This will additionally provide more precise
> migration in case of resetting the bitmap while migration in progress.
>
Yeah, I tried to review this patch assuming we'd use this technique,
from a correctness standpoint. I am relying on other reviewers with more
knowledge of the migration layer to critique the general approach.
I think Paolo had some feedback on that front, but I haven't looked at
the code much to see how much I agree with his suggestion yet :)
Thanks!
--John
next prev parent reply other threads:[~2015-01-20 17:25 UTC|newest]
Thread overview: 47+ messages / expand[flat|nested] mbox.gz Atom feed top
2014-12-11 14:17 [Qemu-devel] [PATCH 0/8] Dirty bitmaps migration Vladimir Sementsov-Ogievskiy
2014-12-11 14:17 ` [Qemu-devel] [PATCH 1/9] block: rename bdrv_reset_dirty_bitmap Vladimir Sementsov-Ogievskiy
2015-01-08 21:19 ` John Snow
2014-12-11 14:17 ` [Qemu-devel] [PATCH 2/9] block-migration: fix pending() return value Vladimir Sementsov-Ogievskiy
2015-01-08 21:20 ` John Snow
2015-01-09 19:01 ` Dr. David Alan Gilbert
2014-12-11 14:17 ` [Qemu-devel] [PATCH 3/9] block: fix spoiling all dirty bitmaps by mirror and migration Vladimir Sementsov-Ogievskiy
2015-01-08 21:20 ` John Snow
2014-12-11 14:17 ` [Qemu-devel] [PATCH 4/9] hbitmap: store / restore Vladimir Sementsov-Ogievskiy
2015-01-08 21:21 ` John Snow
2015-01-08 21:37 ` Paolo Bonzini
2015-01-13 12:59 ` Vladimir Sementsov-Ogievskiy
2015-01-13 17:08 ` John Snow
2015-01-14 10:29 ` Vladimir Sementsov-Ogievskiy
2014-12-11 14:17 ` [Qemu-devel] [PATCH 5/9] block: BdrvDirtyBitmap store/restore interface Vladimir Sementsov-Ogievskiy
2015-01-08 21:22 ` John Snow
2015-01-14 11:27 ` Vladimir Sementsov-Ogievskiy
2014-12-11 14:17 ` [Qemu-devel] [PATCH 6/9] block-migration: tiny refactoring Vladimir Sementsov-Ogievskiy
2015-01-08 21:23 ` John Snow
2015-01-14 12:26 ` Vladimir Sementsov-Ogievskiy
2015-01-14 16:53 ` John Snow
2014-12-11 14:17 ` [Qemu-devel] [PATCH 7/9] block-migration: remove not needed iothread lock Vladimir Sementsov-Ogievskiy
2015-01-08 21:24 ` John Snow
2015-01-16 12:54 ` Vladimir Sementsov-Ogievskiy
2015-01-08 22:28 ` Paolo Bonzini
2015-01-16 13:03 ` Vladimir Sementsov-Ogievskiy
2014-12-11 14:17 ` [Qemu-devel] [PATCH 8/9] migration: add dirty parameter Vladimir Sementsov-Ogievskiy
2014-12-11 15:18 ` Eric Blake
2014-12-15 8:33 ` Vladimir Sementsov-Ogievskiy
2015-01-08 21:51 ` John Snow
2015-01-08 22:29 ` Eric Blake
2015-01-08 22:31 ` John Snow
2015-01-08 22:37 ` Paolo Bonzini
2014-12-11 14:17 ` [Qemu-devel] [PATCH 9/9] block-migration: add named dirty bitmaps migration Vladimir Sementsov-Ogievskiy
2015-01-08 22:05 ` John Snow
2015-01-17 17:17 ` Vladimir Sementsov-Ogievskiy
2015-01-20 17:25 ` John Snow [this message]
2015-01-08 22:36 ` Paolo Bonzini
2015-01-08 22:45 ` Eric Blake
2015-01-08 22:49 ` Paolo Bonzini
2015-01-12 14:20 ` Vladimir Sementsov-Ogievskiy
2015-01-12 14:42 ` Paolo Bonzini
2015-01-12 16:48 ` Paolo Bonzini
2015-01-12 17:31 ` John Snow
2015-01-12 19:09 ` Paolo Bonzini
2015-01-13 9:16 ` Vladimir Sementsov-Ogievskiy
2015-01-13 16:35 ` John Snow
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=54BE8F77.2030706@redhat.com \
--to=jsnow@redhat.com \
--cc=amit.shah@redhat.com \
--cc=den@openvz.org \
--cc=dgilbert@redhat.com \
--cc=kwolf@redhat.com \
--cc=qemu-devel@nongnu.org \
--cc=quintela@redhat.com \
--cc=stefanha@redhat.com \
--cc=vsementsov@parallels.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).