qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: Vladimir Sementsov-Ogievskiy <vsementsov@parallels.com>
To: John Snow <jsnow@redhat.com>, qemu-devel@nongnu.org
Cc: kwolf@redhat.com, Peter Maydell <peter.maydell@linaro.org>,
	"Juan quin >> Juan Jose Quintela Carreira" <quintela@redhat.com>,
	"Dr. David Alan Gilbert" <dgilbert@redhat.com>,
	stefanha@redhat.com, den@openvz.org,
	amit Shah <amit.shah@redhat.com>,
	pbonzini@redhat.com
Subject: Re: [Qemu-devel] [PATCH RFC v2 8/8] migration: add migration/dirty-bitmap.c
Date: Fri, 13 Feb 2015 11:19:36 +0300	[thread overview]
Message-ID: <54DDB398.3010907@parallels.com> (raw)
In-Reply-To: <54DA793C.9020707@redhat.com>

On 11.02.2015 00:33, John Snow wrote:
> Peter Maydell: What's the right way to license a file as copied from a 
> previous version? See below, please;
>
> Max, Markus: ctrl+f "bdrv_get_device_name" and let me know what you 
> think, if you would.
>
> Juan, Amit, David: Copying migration maintainers.
>
> On 01/27/2015 05:56 AM, Vladimir Sementsov-Ogievskiy wrote:
>> Live migration of dirty bitmaps. Only named dirty bitmaps are migrated.
>> If destination qemu is already containing a dirty bitmap with the same
>> name as a migrated bitmap, then their granularities should be the same,
>> otherwise the error will be generated. If destination qemu doesn't
>> contain such bitmap it will be created.
>>
>> format:
>>
>> 1 byte: flags
>>
>> [ 1 byte: node name size ] \  flags & DEVICE_NAME
>> [ n bytes: node name     ] /
>>
>> [ 1 byte: bitmap name size ]       \
>> [ n bytes: bitmap name     ]       | flags & BITMAP_NAME
>> [ [ be64: granularity    ] ]  flags & GRANULARITY
>>
>> [ 1 byte: bitmap enabled bit ] flags & ENABLED
>>
>> [ be64: start sector      ] \ flags & (NORMAL_CHUNK | ZERO_CHUNK)
>> [ be32: number of sectors ] /
>>
>> [ be64: buffer size ] \ flags & NORMAL_CHUNK
>> [ n bytes: buffer   ] /
>>
>> The last chunk should contain flags & EOS. The chunk may skip device
>> and/or bitmap names, assuming them to be the same with the previous
>> chunk. GRANULARITY is sent with the first chunk for the bitmap. ENABLED
>> bit is sent in the end of "complete" stage of migration. So when
>> destination gets ENABLED flag it should deserialize_finish the bitmap
>> and set its enabled bit to corresponding value.
>>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@parallels.com>
>> ---
>>   include/migration/block.h |   1 +
>>   migration/Makefile.objs   |   2 +-
>>   migration/dirty-bitmap.c  | 606 
>> ++++++++++++++++++++++++++++++++++++++++++++++
>>   vl.c                      |   1 +
>>   4 files changed, 609 insertions(+), 1 deletion(-)
>>   create mode 100644 migration/dirty-bitmap.c
>>
>> diff --git a/include/migration/block.h b/include/migration/block.h
>> index ffa8ac0..566bb9f 100644
>> --- a/include/migration/block.h
>> +++ b/include/migration/block.h
>> @@ -14,6 +14,7 @@
>>   #ifndef BLOCK_MIGRATION_H
>>   #define BLOCK_MIGRATION_H
>>
>> +void dirty_bitmap_mig_init(void);
>>   void blk_mig_init(void);
>>   int blk_mig_active(void);
>>   uint64_t blk_mig_bytes_transferred(void);
>
> OK.
>
>> diff --git a/migration/Makefile.objs b/migration/Makefile.objs
>> index d929e96..9adfda9 100644
>> --- a/migration/Makefile.objs
>> +++ b/migration/Makefile.objs
>> @@ -6,5 +6,5 @@ common-obj-y += xbzrle.o
>>   common-obj-$(CONFIG_RDMA) += rdma.o
>>   common-obj-$(CONFIG_POSIX) += exec.o unix.o fd.o
>>
>> -common-obj-y += block.o
>> +common-obj-y += block.o dirty-bitmap.o
>>
>
> OK.
>
>> diff --git a/migration/dirty-bitmap.c b/migration/dirty-bitmap.c
>> new file mode 100644
>> index 0000000..8621218
>> --- /dev/null
>> +++ b/migration/dirty-bitmap.c
>> @@ -0,0 +1,606 @@
>> +/*
>> + * QEMU dirty bitmap migration
>> + *
>> + * derived from migration/block.c
>> + *
>> + * Author:
>> + * Sementsov-Ogievskiy Vladimir <vsementsov@parallels.com>
>> + *
>> + * original copyright message:
>> + * 
>> =====================================================================
>> + * Copyright IBM, Corp. 2009
>> + *
>> + * Authors:
>> + *  Liran Schour <lirans@il.ibm.com>
>> + *
>> + * This work is licensed under the terms of the GNU GPL, version 2.  
>> See
>> + * the COPYING file in the top-level directory.
>> + *
>> + * Contributions after 2012-01-13 are licensed under the terms of the
>> + * GNU GPL, version 2 or (at your option) any later version.
>> + * 
>> =====================================================================
>> + */
>> +
>
> Not super familiar with the right way to do licensing here; it's 
> possible you may not need to copy the original here, but I'm not sure. 
> You will want to make it clear what license applies to /your/ work, I 
> think. Maybe Peter Maydell can clue us in.
>
>> +#include "block/block.h"
>> +#include "qemu/main-loop.h"
>> +#include "qemu/error-report.h"
>> +#include "migration/block.h"
>> +#include "migration/migration.h"
>> +#include "qemu/hbitmap.h"
>> +#include <assert.h>
>> +
>> +#define CHUNK_SIZE                       (1 << 20)
>> +
>> +#define DIRTY_BITMAP_MIG_FLAG_EOS           0x01
>> +#define DIRTY_BITMAP_MIG_FLAG_NORMAL_CHUNK  0x02
>> +#define DIRTY_BITMAP_MIG_FLAG_ZERO_CHUNK    0x04
>> +#define DIRTY_BITMAP_MIG_FLAG_BITMAP_NAME   0x08
>> +#define DIRTY_BITMAP_MIG_FLAG_DEVICE_NAME   0x10
>> +#define DIRTY_BITMAP_MIG_FLAG_GRANULARITY   0x20
>> +#define DIRTY_BITMAP_MIG_FLAG_ENABLED       0x40
>> +/* flags should be <= 0xff */
>> +
>
> We should give ourselves a little breathing room with the flags, since 
> we've only got room for one more.
Ok. Will one more byte be enough?
>
>> +/* #define DEBUG_DIRTY_BITMAP_MIGRATION */
>> +
>> +#ifdef DEBUG_DIRTY_BITMAP_MIGRATION
>> +#define DPRINTF(fmt, ...) \
>> +    do { printf("dirty_migration: " fmt, ## __VA_ARGS__); } while (0)
>> +#else
>> +#define DPRINTF(fmt, ...) \
>> +    do { } while (0)
>> +#endif
>> +
>> +typedef struct DirtyBitmapMigBitmapState {
>> +    /* Written during setup phase. */
>> +    BlockDriverState *bs;
>> +    BdrvDirtyBitmap *bitmap;
>> +    HBitmap *dirty_bitmap;
>
> For my own sanity, I'd really prefer "bitmap" and "meta_bitmap" here; 
> "dirty_bitmap" is often used as a synonym (outside of this file) to 
> refer to the BdrvDirtyBitmap in general, so it's usage here can be 
> somewhat confusing.
>
> I'd appreciate "dirty_dirty_bitmap" as in your previous patch for 
> consistency, or "meta_bitmap" as I recommend.
>
Ok
>> +    int64_t total_sectors;
>> +    uint64_t sectors_per_chunk;
>> +    QSIMPLEQ_ENTRY(DirtyBitmapMigBitmapState) entry;
>> +
>> +    /* For bulk phase. */
>> +    bool bulk_completed;
>> +    int64_t cur_sector;
>> +    bool granularity_sent;
>> +
>> +    /* For dirty phase. */
>> +    int64_t cur_dirty;
>> +} DirtyBitmapMigBitmapState;
>> +
>> +typedef struct DirtyBitmapMigState {
>> +    int migration_enable;
>> +    QSIMPLEQ_HEAD(dbms_list, DirtyBitmapMigBitmapState) dbms_list;
>> +
>> +    bool bulk_completed;
>> +
>> +    /* for send_bitmap() */
>> +    BlockDriverState *prev_bs;
>> +    BdrvDirtyBitmap *prev_bitmap;
>> +} DirtyBitmapMigState;
>> +
>> +static DirtyBitmapMigState dirty_bitmap_mig_state;
>> +
>> +/* read name from qemu file:
>> + * format:
>> + * 1 byte : len = name length (<256)
>> + * len bytes : name without last zero byte
>> + *
>> + * name should point to the buffer >= 256 bytes length
>> + */
>> +static char *qemu_get_name(QEMUFile *f, char *name)
>> +{
>> +    int len = qemu_get_byte(f);
>> +    qemu_get_buffer(f, (uint8_t *)name, len);
>> +    name[len] = '\0';
>> +
>> +    DPRINTF("get name: %d %s\n", len, name);
>> +
>> +    return name;
>> +}
>> +
>
> OK. Maybe these could be "qemu_put_string" or "qemu_get_string" and 
> added to qemu-file.c so others can use them.
If no objections for sharing this format, I'll do it.
>
>> +/* write name to qemu file:
>> + * format:
>> + * same as for qemu_get_name
>> + *
>> + * maximum name length is 255
>> + */
>> +static void qemu_put_name(QEMUFile *f, const char *name)
>> +{
>> +    int len = strlen(name);
>> +
>> +    DPRINTF("put name: %d %s\n", len, name);
>> +
>> +    assert(len < 256);
>> +    qemu_put_byte(f, len);
>> +    qemu_put_buffer(f, (const uint8_t *)name, len);
>> +}
>> +
>
> OK.
>
>> +static void send_bitmap(QEMUFile *f, DirtyBitmapMigBitmapState *dbms,
>> +                        uint64_t start_sector, uint32_t nr_sectors)
>> +{
>> +    BlockDriverState *bs = dbms->bs;
>> +    BdrvDirtyBitmap *bitmap = dbms->bitmap;
>> +    uint8_t flags = 0;
>> +    /* align for buffer_is_zero() */
>> +    uint64_t align = 4 * sizeof(long);
>> +    uint64_t buf_size =
>> +        (bdrv_dirty_bitmap_data_size(bitmap, nr_sectors) + align - 1) &
>> +        ~(align - 1);
>> +    uint8_t *buf = g_malloc0(buf_size);
>> +
>> +    bdrv_dirty_bitmap_serialize_part(bitmap, buf, start_sector, 
>> nr_sectors);
>> +
>> +    if (buffer_is_zero(buf, buf_size)) {
>> +        g_free(buf);
>> +        buf = NULL;
>> +        flags |= DIRTY_BITMAP_MIG_FLAG_ZERO_CHUNK;
>> +    } else {
>> +        flags |= DIRTY_BITMAP_MIG_FLAG_NORMAL_CHUNK;
>> +    }
>> +
>> +    if (bs != dirty_bitmap_mig_state.prev_bs) {
>> +        dirty_bitmap_mig_state.prev_bs = bs;
>> +        flags |= DIRTY_BITMAP_MIG_FLAG_DEVICE_NAME;
>> +    }
>> +
>> +    if (bitmap != dirty_bitmap_mig_state.prev_bitmap) {
>> +        dirty_bitmap_mig_state.prev_bitmap = bitmap;
>> +        flags |= DIRTY_BITMAP_MIG_FLAG_BITMAP_NAME;
>> +    }
>
> OK, so we use the current bs/bitmap under consideration to detect if 
> we have switched context, and put the names in the stream when it 
> happens. OK.
>
>> +
>> +    if (dbms->granularity_sent == 0) {
>> +        dbms->granularity_sent = 1;
>> +        flags |= DIRTY_BITMAP_MIG_FLAG_GRANULARITY;
>> +    }
>> +
>> +    DPRINTF("Enter send_bitmap"
>> +            "\n   flags:        %x"
>> +            "\n   start_sector: %" PRIu64
>> +            "\n   nr_sectors:   %" PRIu32
>> +            "\n   data_size:    %" PRIu64 "\n",
>> +            flags, start_sector, nr_sectors, buf_size);
>> +
>> +    qemu_put_byte(f, flags);
>> +
>> +    if (flags & DIRTY_BITMAP_MIG_FLAG_DEVICE_NAME) {
>> +        qemu_put_name(f, bdrv_get_device_name(bs));
>> +    }
>
> I am still not fully clear on this myself, but I think we are phasing 
> out bdrv_get_device_name. In the context of bitmaps, we are mostly 
> likely using them one-per-tree, but they /can/ be attached 
> one-per-node, so we shouldn't be trying to get the device name here, 
> but rather, the node-name.
>
> I /think/ we may want to be using bdrv_get_node_name, but I think it 
> is not currently true that all nodes WILL be named ... I am CC'ing 
> Markus Armbruster and Max Reitz for (A) A refresher course and (B) 
> Opinions on what function call might make sense here, given that we 
> wish to migrate bitmaps attached to arbitrary nodes.
Hmm.. I'm not familiar with hierarchy of nodes and devices. As I 
understand, both command_line- and qmp- created drives are created 
through blockdev_init, which creates both blk(device) and bs(node) 
through blk_new_with_bs.. Am I right? Also, bdrv_get_device_name is used 
in migration/block.c.
>
>> +
>> +    if (flags & DIRTY_BITMAP_MIG_FLAG_BITMAP_NAME) {
>> +        qemu_put_name(f, bdrv_dirty_bitmap_name(bitmap));
>> +
>> +        if (flags & DIRTY_BITMAP_MIG_FLAG_GRANULARITY) {
>> +            qemu_put_be64(f, bdrv_dirty_bitmap_granularity(bs, 
>> bitmap));
>> +        }
>> +    } else {
>> +        assert(!(flags & DIRTY_BITMAP_MIG_FLAG_GRANULARITY));
>> +    }
>> +
>
> I thought we were only migrating bitmaps with names?
> I suppose the conditional can't hurt, but I am not clear on when we 
> won't have a bitmap name here.
You are right, 'else' case is not possible.. Hmm. I've added it to be 
sure that format is not corrupted, when I decided to put granularity 
only with name. Wi won't have a bitmap name only when we send the same 
bitmap as on the previous send_bitmap() call. May be it will be better 
to use two separate if's without else and assert.
>
>> +    qemu_put_be64(f, start_sector);
>> +    qemu_put_be32(f, nr_sectors);
>> +
>> +    /* if a block is zero we need to flush here since the network
>> +     * bandwidth is now a lot higher than the storage device bandwidth.
>> +     * thus if we queue zero blocks we slow down the migration.
>> +     * also, skip writing block when migrate only dirty bitmaps. */
>> +    if (flags & DIRTY_BITMAP_MIG_FLAG_ZERO_CHUNK) {
>> +        qemu_fflush(f);
>> +        return;
>> +    }
>> +
>> +    qemu_put_be64(f, buf_size);
>> +    qemu_put_buffer(f, buf, buf_size);
>> +    g_free(buf);
>> +}
>> +
>> +
>> +/* Called with iothread lock taken.  */
>> +
>> +static void set_dirty_tracking(void)
>> +{
>> +    DirtyBitmapMigBitmapState *dbms;
>> +
>> +    QSIMPLEQ_FOREACH(dbms, &dirty_bitmap_mig_state.dbms_list, entry) {
>> +        dbms->dirty_bitmap =
>> +            bdrv_create_dirty_dirty_bitmap(dbms->bitmap, CHUNK_SIZE);
>> +    }
>> +}
>> +
>
> OK: so we only have these dirty-dirty bitmaps when migration is 
> starting, which makes sense.
>
>> +static void unset_dirty_tracking(void)
>> +{
>> +    DirtyBitmapMigBitmapState *dbms;
>> +
>> +    QSIMPLEQ_FOREACH(dbms, &dirty_bitmap_mig_state.dbms_list, entry) {
>> +        bdrv_release_dirty_dirty_bitmap(dbms->bitmap);
>> +    }
>> +}
>> +
>
> OK.
>
>> +static void init_dirty_bitmap_migration(QEMUFile *f)
>> +{
>> +    BlockDriverState *bs;
>> +    BdrvDirtyBitmap *bitmap;
>> +    DirtyBitmapMigBitmapState *dbms;
>> +
>> +    dirty_bitmap_mig_state.bulk_completed = false;
>> +    dirty_bitmap_mig_state.prev_bs = NULL;
>> +    dirty_bitmap_mig_state.prev_bitmap = NULL;
>> +
>> +    for (bs = bdrv_next(NULL); bs; bs = bdrv_next(bs)) {
>> +        for (bitmap = bdrv_next_dirty_bitmap(bs, NULL); bitmap;
>> +             bitmap = bdrv_next_dirty_bitmap(bs, bitmap)) {
>> +            if (!bdrv_dirty_bitmap_name(bitmap)) {
>> +                continue;
>> +            }
>> +
>> +            dbms = g_new0(DirtyBitmapMigBitmapState, 1);
>> +            dbms->bs = bs;
>> +            dbms->bitmap = bitmap;
>> +            dbms->total_sectors = bdrv_nb_sectors(bs);
>> +            dbms->sectors_per_chunk = CHUNK_SIZE * 8 *
>> +                bdrv_dirty_bitmap_granularity(dbms->bs, dbms->bitmap)
>> +                >> BDRV_SECTOR_BITS;
>> +
>> + QSIMPLEQ_INSERT_TAIL(&dirty_bitmap_mig_state.dbms_list,
>> +                                 dbms, entry);
>> +        }
>> +    }
>> +}
>> +
>
> OK, but see the note below for dirty_bitmap_mig_init.
actually it is not 'init' but 'reinit' - called on every migration 
start.. Hmm. dbms_list should be cleared here before fill it again.
>
>> +/* Called with no lock taken.  */
>> +static void bulk_phase_send_chunk(QEMUFile *f, 
>> DirtyBitmapMigBitmapState *dbms)
>> +{
>> +    uint32_t nr_sectors = MIN(dbms->total_sectors - dbms->cur_sector,
>> +                             dbms->sectors_per_chunk);
>
> What about cases where nr_sectors will put us past the end of the 
> bitmap? The bitmap serialization implementation might need a touchup 
> with this in mind.
I don't understand.. nr_sectors <=  dbms->total_sectors - 
dbms->cur_sector and it can't put us past the end...
>
>> +
>> +    send_bitmap(f, dbms, dbms->cur_sector, nr_sectors);
>> +
>> +    dbms->cur_sector += nr_sectors;
>> +    if (dbms->cur_sector >= dbms->total_sectors) {
>> +        dbms->bulk_completed = true;
>> +    }
>> +}
>> +
>> +/* Called with no lock taken.  */
>> +static void bulk_phase(QEMUFile *f, bool limit)
>> +{
>> +    DirtyBitmapMigBitmapState *dbms;
>> +
>> +    QSIMPLEQ_FOREACH(dbms, &dirty_bitmap_mig_state.dbms_list, entry) {
>> +        while (!dbms->bulk_completed) {
>> +            bulk_phase_send_chunk(f, dbms);
>> +            if (limit && qemu_file_rate_limit(f)) {
>> +                return;
>> +            }
>> +        }
>> +    }
>> +
>> +    dirty_bitmap_mig_state.bulk_completed = true;
>> +}
>
> OK.
>
>> +
>> +static void blk_mig_reset_dirty_cursor(void)
>> +{
>> +    DirtyBitmapMigBitmapState *dbms;
>> +
>> +    QSIMPLEQ_FOREACH(dbms, &dirty_bitmap_mig_state.dbms_list, entry) {
>> +        dbms->cur_dirty = 0;
>> +    }
>> +}
>> +
>
> OK.
>
>> +/* Called with iothread lock taken.  */
>> +static void dirty_phase_send_chunk(QEMUFile *f, 
>> DirtyBitmapMigBitmapState *dbms)
>> +{
>> +    uint32_t nr_sectors;
>> +
>> +    while (dbms->cur_dirty < dbms->total_sectors &&
>> +           !hbitmap_get(dbms->dirty_bitmap, dbms->cur_dirty)) {
>> +        dbms->cur_dirty += dbms->sectors_per_chunk;
>> +    }
>
> OK, so we fast forward the dirty cursor while the meta-bitmap is 
> empty. Is it not worth using the HBitmapIterator here? You can reset 
> them everywhere you reset the dirty cursor, and then just fast-seek to 
> the first dirty sector.
Yes, I've thought about it, just used simpler way (copied from 
migration/block.c) for an early version of the patch set. I will do it.
>
>> +
>> +    if (dbms->cur_dirty >= dbms->total_sectors) {
>> +        return;
>> +    }
>> +
>> +    nr_sectors = MIN(dbms->total_sectors - dbms->cur_dirty,
>> +                     dbms->sectors_per_chunk);
>
> What happens if nr_sectors goes past the end?
>
>> +    send_bitmap(f, dbms, dbms->cur_dirty, nr_sectors);
>> +    hbitmap_reset(dbms->dirty_bitmap, dbms->cur_dirty, 
>> dbms->sectors_per_chunk);
>> +    dbms->cur_dirty += nr_sectors;
>> +}
>> +
>> +/* Called with iothread lock taken.
>> + *
>> + * return value:
>> + * 0: too much data for max_downtime
>> + * 1: few enough data for max_downtime
>> +*/
>
> dirty_phase below doesn't have a return value.
rudimentary comment.. thanks.
>
>> +static void dirty_phase(QEMUFile *f, bool limit)
>> +{
>> +    DirtyBitmapMigBitmapState *dbms;
>> +
>> +    QSIMPLEQ_FOREACH(dbms, &dirty_bitmap_mig_state.dbms_list, entry) {
>> +        while (dbms->cur_dirty < dbms->total_sectors) {
>> +            dirty_phase_send_chunk(f, dbms);
>> +            if (limit && qemu_file_rate_limit(f)) {
>> +                return;
>> +            }
>> +        }
>> +    }
>> +}
>> +
>
> OK.
>
>> +
>> +/* Called with iothread lock taken.  */
>> +static void dirty_bitmap_mig_cleanup(void)
>> +{
>> +    DirtyBitmapMigBitmapState *dbms;
>> +
>> +    unset_dirty_tracking();
>> +
>> +    while ((dbms = 
>> QSIMPLEQ_FIRST(&dirty_bitmap_mig_state.dbms_list)) != NULL) {
>> + QSIMPLEQ_REMOVE_HEAD(&dirty_bitmap_mig_state.dbms_list, entry);
>> +        g_free(dbms);
>> +    }
>> +}
>> +
>
> OK.
>
>> +static void dirty_bitmap_migration_cancel(void *opaque)
>> +{
>> +    dirty_bitmap_mig_cleanup();
>> +}
>> +
>
> OK.
>
>> +static int dirty_bitmap_save_iterate(QEMUFile *f, void *opaque)
>> +{
>> +    DPRINTF("Enter save live iterate\n");
>> +
>> +    blk_mig_reset_dirty_cursor();
>
> I suppose this is because it's easier to check if we are finished by 
> starting from sector 0 every time.
>
> A harder, but faster method may be: Use HBitmapIterators, but don't 
> reset them every iteration: just iterate until the end, and check that 
> the bitmap is empty. If the meta bitmap is empty, the dirty phase is 
> complete. If the meta bitmap is NOT empty, reset the HBI and continue 
> allowing iterations over the dirty phase.
Ok, will do.
>
>> +
>> +    if (dirty_bitmap_mig_state.bulk_completed) {
>> +        qemu_mutex_lock_iothread();
>> +        dirty_phase(f, true);
>> +        qemu_mutex_unlock_iothread();
>> +    } else {
>> +        bulk_phase(f, true);
>> +    }
>> +
>> +    qemu_put_byte(f, DIRTY_BITMAP_MIG_FLAG_EOS);
>> +
>> +    return dirty_bitmap_mig_state.bulk_completed;
>> +}
>> +
>> +/* Called with iothread lock taken.  */
>> +
>> +static int dirty_bitmap_save_complete(QEMUFile *f, void *opaque)
>> +{
>> +    DirtyBitmapMigBitmapState *dbms;
>> +    DPRINTF("Enter save live complete\n");
>> +
>> +    if (!dirty_bitmap_mig_state.bulk_completed) {
>> +        bulk_phase(f, false);
>> +    }
>
> [Not expertly familiar with savevm:] Under what conditions can this 
> happen?
This can happen. save_complete will happen when savevm decide that 
pending data size to send is small enough. It was the case for my bugfix 
for migration/block.c about pending. To prevent save_complete when bulk 
phase isn't completed, save_pending returns (in my bugfix for 
migration/block.c) big value. Here I decided to make more honest 
save_pending, so I need to complete (if it doesn't) bulk phase in 
save_complete.
>
>> +
>> +    blk_mig_reset_dirty_cursor();
>> +    dirty_phase(f, false);
>> +
>> +    QSIMPLEQ_FOREACH(dbms, &dirty_bitmap_mig_state.dbms_list, entry) {
>> +        uint8_t flags = DIRTY_BITMAP_MIG_FLAG_DEVICE_NAME |
>> +                        DIRTY_BITMAP_MIG_FLAG_BITMAP_NAME |
>> +                        DIRTY_BITMAP_MIG_FLAG_ENABLED;
>> +
>> +        qemu_put_byte(f, flags);
>> +        qemu_put_name(f, bdrv_get_device_name(dbms->bs));
>> +        qemu_put_name(f, bdrv_dirty_bitmap_name(dbms->bitmap));
>> +        qemu_put_byte(f, bdrv_dirty_bitmap_enabled(dbms->bitmap));
>> +    }
>> +
>> +    qemu_put_byte(f, DIRTY_BITMAP_MIG_FLAG_EOS);
>> +
>> +    DPRINTF("Dirty bitmaps migration completed\n");
>> +
>> +    dirty_bitmap_mig_cleanup();
>> +    return 0;
>> +}
>> +
>
> I suppose we don't need a flag that distinctly SAYS this is the end 
> section, since we can tell by omission of 
> DIRTY_BITMAP_MIG_FLAG_NORMAL_CHUNK or ZERO_CHUNK.
Hmm. I think it simplifies the logic (to use EOS after each section). 
And the same approach is in migration/block.c.. It's a question about 
which format is better:  "Each section for dirty_bitmap_load ends with 
EOS" or "Each section for dirty_bitmap_load ends with EOS except the 
last one. The last one may be recognized by absent NORMAL_CHUNK and 
ZERO_CHUNK"
>
>> +static uint64_t dirty_bitmap_save_pending(QEMUFile *f, void *opaque,
>> +                                          uint64_t max_size)
>> +{
>> +    DirtyBitmapMigBitmapState *dbms;
>> +    uint64_t pending = 0;
>> +
>> +    qemu_mutex_lock_iothread();
>> +
>> +    QSIMPLEQ_FOREACH(dbms, &dirty_bitmap_mig_state.dbms_list, entry) {
>> +        uint64_t sectors = hbitmap_count(dbms->dirty_bitmap);
>> +        if (!dbms->bulk_completed) {
>> +            sectors += dbms->total_sectors - dbms->cur_sector;
>> +        }
>> +        pending += bdrv_dirty_bitmap_data_size(dbms->bitmap, sectors);
>> +    }
>> +
>> +    qemu_mutex_unlock_iothread();
>> +
>> +    DPRINTF("Enter save live pending %" PRIu64 ", max: %" PRIu64 "\n",
>> +            pending, max_size);
>> +    return pending;
>> +}
>> +
>
> OK.
>
>> +static int dirty_bitmap_load(QEMUFile *f, void *opaque, int version_id)
>> +{
>> +    int flags;
>> +
>> +    static char device_name[256], bitmap_name[256];
>> +    static BlockDriverState *bs;
>> +    static BdrvDirtyBitmap *bitmap;
>> +
>> +    uint8_t *buf;
>> +    uint64_t first_sector;
>> +    uint32_t  nr_sectors;
>> +    int ret;
>> +
>> +    DPRINTF("load start\n");
>> +
>> +    do {
>> +        flags = qemu_get_byte(f);
>> +        DPRINTF("flags: %x\n", flags);
>> +
>> +        if (flags & DIRTY_BITMAP_MIG_FLAG_DEVICE_NAME) {
>> +            qemu_get_name(f, device_name);
>> +            bs = bdrv_find(device_name);
>
> Similar to the above confusion, you may want bdrv_lookup_bs or 
> similar, since we're going to be looking for BDS nodes instead of 
> "devices."
In this case, should it be changed in migration/block.c too?
>
>> +            if (!bs) {
>> +                fprintf(stderr, "Error: unknown block device '%s'\n",
>> +                        device_name);
>> +                return -EINVAL;
>> +            }
>> +        }
>> +
>> +        if (flags & DIRTY_BITMAP_MIG_FLAG_BITMAP_NAME) {
>> +            if (!bs) {
>> +                fprintf(stderr, "Error: block device name is not 
>> set\n");
>> +                return -EINVAL;
>> +            }
>> +
>> +            qemu_get_name(f, bitmap_name);
>> +            bitmap = bdrv_find_dirty_bitmap(bs, bitmap_name);
>> +            if (flags & DIRTY_BITMAP_MIG_FLAG_GRANULARITY) {
>> +                /* First chunk from this bitmap */
>> +                uint64_t granularity = qemu_get_be64(f);
>> +                if (!bitmap) {
>> +                    Error *local_err = NULL;
>> +                    bitmap = bdrv_create_dirty_bitmap(bs, granularity,
>> + bitmap_name,
>> + &local_err);
>> +                    if (!bitmap) {
>> +                        error_report("%s", 
>> error_get_pretty(local_err));
>> +                        error_free(local_err);
>> +                        return -EINVAL;
>> +                    }
>> +                } else {
>> +                    uint64_t dest_granularity =
>> +                        bdrv_dirty_bitmap_granularity(bs, bitmap);
>> +                    if (dest_granularity != granularity) {
>> +                        fprintf(stderr,
>> +                                "Error: "
>> +                                "Migrated bitmap granularity (%" 
>> PRIu64 ") "
>> +                                "is not match with destination 
>> bitmap '%s' "
>> +                                "granularity (%" PRIu64 ")\n",
>> +                                granularity,
>> +                                bitmap_name,
>> +                                dest_granularity);
>> +                        return -EINVAL;
>> +                    }
>> +                }
>> +                bdrv_disable_dirty_bitmap(bitmap);
>> +            }
>> +            if (!bitmap) {
>> +                fprintf(stderr, "Error: unknown dirty bitmap "
>> +                        "'%s' for block device '%s'\n",
>> +                        bitmap_name, device_name);
>> +                return -EINVAL;
>> +            }
>> +        }
>> +
>> +        if (flags & DIRTY_BITMAP_MIG_FLAG_ENABLED) {
>> +            bool enabled;
>> +            if (!bitmap) {
>> +                fprintf(stderr, "Error: dirty bitmap name is not 
>> set\n");
>> +                return -EINVAL;
>> +            }
>> +            bdrv_dirty_bitmap_deserialize_finish(bitmap);
>> +            /* complete migration */
>> +            enabled = qemu_get_byte(f);
>> +            if (enabled) {
>> +                bdrv_enable_dirty_bitmap(bitmap);
>> +            }
>> +        }
>
> Oh, so you use the ENABLED flag to show that migration is over.
Yes, it was bad idea..
> If we are going to commit to a stream format for bitmaps, though, 
> maybe it's best to actually create a "COMPLETION BLOCK" flag and then 
> split this function into two pieces:
>
> (1) The part that receives regular / zero blocks, and
> (2) The part that receives completion data.
>
> That way, if we change the properties that bitmaps have down the line, 
> we aren't reliant on literally the "enabled" flag to decide what to do.
>
> Also, it might help make this fairly long function a little smaller 
> and more readable.
Ok.
>
>> +
>> +        if (flags & (DIRTY_BITMAP_MIG_FLAG_NORMAL_CHUNK |
>> +                     DIRTY_BITMAP_MIG_FLAG_ZERO_CHUNK)) {
>> +            if (!bs) {
>> +                fprintf(stderr, "Error: block device name is not 
>> set\n");
>> +                return -EINVAL;
>> +            }
>> +            if (!bitmap) {
>> +                fprintf(stderr, "Error: dirty bitmap name is not 
>> set\n");
>> +                return -EINVAL;
>> +            }
>> +
>> +            first_sector = qemu_get_be64(f);
>> +            nr_sectors = qemu_get_be32(f);
>> +            DPRINTF("chunk: %lu %u\n", first_sector, nr_sectors);
>> +
>> +
>> +            if (flags & DIRTY_BITMAP_MIG_FLAG_ZERO_CHUNK) {
>> +                bdrv_dirty_bitmap_deserialize_part0(bitmap, 
>> first_sector,
>> + nr_sectors);
>> +            } else {
>> +                uint64_t buf_size = qemu_get_be64(f);
>> +                uint64_t needed_size =
>> +                    bdrv_dirty_bitmap_data_size(bitmap, nr_sectors);
>> +
>> +                if (needed_size > buf_size) {
>> +                    fprintf(stderr,
>> +                            "Error: Migrated bitmap granularity is 
>> not "
>> +                            "match with destination bitmap 
>> granularity\n");
>> +                    return -EINVAL;
>> +                }
>> +
>
> "Migrated bitmap granularity doesn't match the destination bitmap 
> granularity" perhaps.
>
>> +                buf = g_malloc(buf_size);
>> +                qemu_get_buffer(f, buf, buf_size);
>> +                bdrv_dirty_bitmap_deserialize_part(bitmap, buf,
>> + first_sector,
>> +                                                   nr_sectors);
>> +                g_free(buf);
>> +            }
>> +        }
>> +
>> +        ret = qemu_file_get_error(f);
>> +        if (ret != 0) {
>> +            return ret;
>> +        }
>> +    } while (!(flags & DIRTY_BITMAP_MIG_FLAG_EOS));
>> +
>> +    DPRINTF("load finish\n");
>> +    return 0;
>> +}
>> +
>> +static void dirty_bitmap_set_params(const MigrationParams *params, 
>> void *opaque)
>> +{
>> +    dirty_bitmap_mig_state.migration_enable = params->dirty;
>> +}
>> +
>
> OK; though I am not immediately aware of what changes need to happen 
> to accommodate Eric's suggestions.
This function will be dropped in v3.
>
>> +static bool dirty_bitmap_is_active(void *opaque)
>> +{
>> +    return dirty_bitmap_mig_state.migration_enable == 1;
>> +}
>> +
>
> OK.
>
>> +static int dirty_bitmap_save_setup(QEMUFile *f, void *opaque)
>> +{
>> +    init_dirty_bitmap_migration(f);
>> +
>> +    qemu_mutex_lock_iothread();
>> +    /* start track dirtyness of dirty bitmaps */
>> +    set_dirty_tracking();
>> +    qemu_mutex_unlock_iothread();
>> +
>> +    blk_mig_reset_dirty_cursor();
>> +    qemu_put_byte(f, DIRTY_BITMAP_MIG_FLAG_EOS);
>> +
>> +    return 0;
>> +}
>> +
>
> OK; see dirty_bitmap_mig_init below, though.
>
>> +static SaveVMHandlers savevm_block_handlers = {
>> +    .set_params = dirty_bitmap_set_params,
>> +    .save_live_setup = dirty_bitmap_save_setup,
>> +    .save_live_iterate = dirty_bitmap_save_iterate,
>> +    .save_live_complete = dirty_bitmap_save_complete,
>> +    .save_live_pending = dirty_bitmap_save_pending,
>> +    .load_state = dirty_bitmap_load,
>> +    .cancel = dirty_bitmap_migration_cancel,
>> +    .is_active = dirty_bitmap_is_active,
>> +};
>> +
>> +void dirty_bitmap_mig_init(void)
>> +{
>> +    QSIMPLEQ_INIT(&dirty_bitmap_mig_state.dbms_list);
>
> Maybe I haven't looked thoroughly enough yet, but it's weird that part 
> of the dirty_bitmap_mig_state is initialized here, and the rest of it 
> in init_dirty_bitmap_migration. I'd prefer to keep it all together, if 
> possible.
dirty_bitmap_mig_init is called one time when qemu starts. QSIMPLEQ_INIT 
should be called once. dirty_bitmap_save_setup is called on every 
migration start, it's like 'reinitialize'.
>
>> +
>> +    register_savevm_live(NULL, "dirty-bitmap", 0, 1, 
>> &savevm_block_handlers,
>> +                         &dirty_bitmap_mig_state);
>> +}
>
> OK.
>
>> diff --git a/vl.c b/vl.c
>> index a824a7d..dee7220 100644
>> --- a/vl.c
>> +++ b/vl.c
>> @@ -4184,6 +4184,7 @@ int main(int argc, char **argv, char **envp)
>>
>>       blk_mig_init();
>>       ram_mig_init();
>> +    dirty_bitmap_mig_init();
>>
>>       /* If the currently selected machine wishes to override the 
>> units-per-bus
>>        * property of its default HBA interface type, do so now. */
>>
>
> Hm, since dirty bitmaps are a sub-component of the block layer, would 
> it not make sense to put this hook under blk_mig_init, perhaps?
IMHO the reason to put it here is to keep all 
register_savevm_live-entities in one place.
>
>
> Overall this looks very clean compared to the intermingled format in 
> V1, and the code is organized pretty well. Just a few minor comments, 
> and I'd like to get the opinion of the migration maintainers, but I am 
> happy. Sorry it took me so long to review, please feel free to let me 
> know if you disagree with any of my opinions :)
>
> Thank you,
> --John

Thank you for reviewing my series)

-- 
Best regards,
Vladimir

  reply	other threads:[~2015-02-13  8:19 UTC|newest]

Thread overview: 35+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-01-27 10:56 [Qemu-devel] [PATCH RFC v2 0/8] Dirty bitmaps migration Vladimir Sementsov-Ogievskiy
2015-01-27 10:56 ` [Qemu-devel] [PATCH RFC v2 1/8] qmp: print dirty bitmap Vladimir Sementsov-Ogievskiy
2015-01-27 16:17   ` Eric Blake
2015-01-27 16:23     ` Vladimir Sementsov-Ogievskiy
2015-02-10 21:28   ` John Snow
2015-01-27 10:56 ` [Qemu-devel] [PATCH RFC v2 2/8] hbitmap: serialization Vladimir Sementsov-Ogievskiy
2015-02-10 21:29   ` John Snow
2015-01-27 10:56 ` [Qemu-devel] [PATCH RFC v2 3/8] block: BdrvDirtyBitmap serialization interface Vladimir Sementsov-Ogievskiy
2015-02-10 21:29   ` John Snow
2015-01-27 10:56 ` [Qemu-devel] [PATCH RFC v2 4/8] block: add dirty-dirty bitmaps Vladimir Sementsov-Ogievskiy
2015-02-10 21:30   ` John Snow
2015-02-12 10:51     ` Vladimir Sementsov-Ogievskiy
2015-01-27 10:56 ` [Qemu-devel] [PATCH RFC v2 5/8] block: add bdrv_dirty_bitmap_enabled() Vladimir Sementsov-Ogievskiy
2015-02-10 21:30   ` John Snow
2015-02-12 10:54     ` Vladimir Sementsov-Ogievskiy
2015-02-12 16:22       ` John Snow
2015-01-27 10:56 ` [Qemu-devel] [PATCH RFC v2 6/8] block: add bdrv_next_dirty_bitmap() Vladimir Sementsov-Ogievskiy
2015-02-10 21:31   ` John Snow
2015-01-27 10:56 ` [Qemu-devel] [PATCH RFC v2 7/8] migration: add dirty parameter Vladimir Sementsov-Ogievskiy
2015-01-27 16:20   ` Eric Blake
2015-02-04 14:42     ` Vladimir Sementsov-Ogievskiy
2015-02-10 21:32   ` John Snow
2015-01-27 10:56 ` [Qemu-devel] [PATCH RFC v2 8/8] migration: add migration/dirty-bitmap.c Vladimir Sementsov-Ogievskiy
2015-02-10 21:33   ` John Snow
2015-02-13  8:19     ` Vladimir Sementsov-Ogievskiy [this message]
2015-02-13  9:06       ` Vladimir Sementsov-Ogievskiy
2015-02-13 17:32         ` John Snow
2015-02-13 17:41           ` Vladimir Sementsov-Ogievskiy
2015-02-13 20:22       ` John Snow
2015-02-16 12:06         ` Vladimir Sementsov-Ogievskiy
2015-02-16 18:18           ` John Snow
2015-02-16 18:22             ` Dr. David Alan Gilbert
2015-02-17  8:54             ` Vladimir Sementsov-Ogievskiy
2015-02-17 18:45               ` John Snow
2015-02-17 19:12                 ` Eric Blake

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=54DDB398.3010907@parallels.com \
    --to=vsementsov@parallels.com \
    --cc=amit.shah@redhat.com \
    --cc=den@openvz.org \
    --cc=dgilbert@redhat.com \
    --cc=jsnow@redhat.com \
    --cc=kwolf@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=peter.maydell@linaro.org \
    --cc=qemu-devel@nongnu.org \
    --cc=quintela@redhat.com \
    --cc=stefanha@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).