From: Vladimir Sementsov-Ogievskiy <vsementsov@parallels.com>
To: John Snow <jsnow@redhat.com>, qemu-devel@nongnu.org
Cc: kwolf@redhat.com, Peter Maydell <peter.maydell@linaro.org>,
"Juan quin >> Juan Jose Quintela Carreira" <quintela@redhat.com>,
"Dr. David Alan Gilbert" <dgilbert@redhat.com>,
stefanha@redhat.com, den@openvz.org,
amit Shah <amit.shah@redhat.com>,
pbonzini@redhat.com
Subject: Re: [Qemu-devel] [PATCH RFC v2 8/8] migration: add migration/dirty-bitmap.c
Date: Fri, 13 Feb 2015 11:19:36 +0300 [thread overview]
Message-ID: <54DDB398.3010907@parallels.com> (raw)
In-Reply-To: <54DA793C.9020707@redhat.com>
On 11.02.2015 00:33, John Snow wrote:
> Peter Maydell: What's the right way to license a file as copied from a
> previous version? See below, please;
>
> Max, Markus: ctrl+f "bdrv_get_device_name" and let me know what you
> think, if you would.
>
> Juan, Amit, David: Copying migration maintainers.
>
> On 01/27/2015 05:56 AM, Vladimir Sementsov-Ogievskiy wrote:
>> Live migration of dirty bitmaps. Only named dirty bitmaps are migrated.
>> If destination qemu is already containing a dirty bitmap with the same
>> name as a migrated bitmap, then their granularities should be the same,
>> otherwise the error will be generated. If destination qemu doesn't
>> contain such bitmap it will be created.
>>
>> format:
>>
>> 1 byte: flags
>>
>> [ 1 byte: node name size ] \ flags & DEVICE_NAME
>> [ n bytes: node name ] /
>>
>> [ 1 byte: bitmap name size ] \
>> [ n bytes: bitmap name ] | flags & BITMAP_NAME
>> [ [ be64: granularity ] ] flags & GRANULARITY
>>
>> [ 1 byte: bitmap enabled bit ] flags & ENABLED
>>
>> [ be64: start sector ] \ flags & (NORMAL_CHUNK | ZERO_CHUNK)
>> [ be32: number of sectors ] /
>>
>> [ be64: buffer size ] \ flags & NORMAL_CHUNK
>> [ n bytes: buffer ] /
>>
>> The last chunk should contain flags & EOS. The chunk may skip device
>> and/or bitmap names, assuming them to be the same with the previous
>> chunk. GRANULARITY is sent with the first chunk for the bitmap. ENABLED
>> bit is sent in the end of "complete" stage of migration. So when
>> destination gets ENABLED flag it should deserialize_finish the bitmap
>> and set its enabled bit to corresponding value.
>>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@parallels.com>
>> ---
>> include/migration/block.h | 1 +
>> migration/Makefile.objs | 2 +-
>> migration/dirty-bitmap.c | 606
>> ++++++++++++++++++++++++++++++++++++++++++++++
>> vl.c | 1 +
>> 4 files changed, 609 insertions(+), 1 deletion(-)
>> create mode 100644 migration/dirty-bitmap.c
>>
>> diff --git a/include/migration/block.h b/include/migration/block.h
>> index ffa8ac0..566bb9f 100644
>> --- a/include/migration/block.h
>> +++ b/include/migration/block.h
>> @@ -14,6 +14,7 @@
>> #ifndef BLOCK_MIGRATION_H
>> #define BLOCK_MIGRATION_H
>>
>> +void dirty_bitmap_mig_init(void);
>> void blk_mig_init(void);
>> int blk_mig_active(void);
>> uint64_t blk_mig_bytes_transferred(void);
>
> OK.
>
>> diff --git a/migration/Makefile.objs b/migration/Makefile.objs
>> index d929e96..9adfda9 100644
>> --- a/migration/Makefile.objs
>> +++ b/migration/Makefile.objs
>> @@ -6,5 +6,5 @@ common-obj-y += xbzrle.o
>> common-obj-$(CONFIG_RDMA) += rdma.o
>> common-obj-$(CONFIG_POSIX) += exec.o unix.o fd.o
>>
>> -common-obj-y += block.o
>> +common-obj-y += block.o dirty-bitmap.o
>>
>
> OK.
>
>> diff --git a/migration/dirty-bitmap.c b/migration/dirty-bitmap.c
>> new file mode 100644
>> index 0000000..8621218
>> --- /dev/null
>> +++ b/migration/dirty-bitmap.c
>> @@ -0,0 +1,606 @@
>> +/*
>> + * QEMU dirty bitmap migration
>> + *
>> + * derived from migration/block.c
>> + *
>> + * Author:
>> + * Sementsov-Ogievskiy Vladimir <vsementsov@parallels.com>
>> + *
>> + * original copyright message:
>> + *
>> =====================================================================
>> + * Copyright IBM, Corp. 2009
>> + *
>> + * Authors:
>> + * Liran Schour <lirans@il.ibm.com>
>> + *
>> + * This work is licensed under the terms of the GNU GPL, version 2.
>> See
>> + * the COPYING file in the top-level directory.
>> + *
>> + * Contributions after 2012-01-13 are licensed under the terms of the
>> + * GNU GPL, version 2 or (at your option) any later version.
>> + *
>> =====================================================================
>> + */
>> +
>
> Not super familiar with the right way to do licensing here; it's
> possible you may not need to copy the original here, but I'm not sure.
> You will want to make it clear what license applies to /your/ work, I
> think. Maybe Peter Maydell can clue us in.
>
>> +#include "block/block.h"
>> +#include "qemu/main-loop.h"
>> +#include "qemu/error-report.h"
>> +#include "migration/block.h"
>> +#include "migration/migration.h"
>> +#include "qemu/hbitmap.h"
>> +#include <assert.h>
>> +
>> +#define CHUNK_SIZE (1 << 20)
>> +
>> +#define DIRTY_BITMAP_MIG_FLAG_EOS 0x01
>> +#define DIRTY_BITMAP_MIG_FLAG_NORMAL_CHUNK 0x02
>> +#define DIRTY_BITMAP_MIG_FLAG_ZERO_CHUNK 0x04
>> +#define DIRTY_BITMAP_MIG_FLAG_BITMAP_NAME 0x08
>> +#define DIRTY_BITMAP_MIG_FLAG_DEVICE_NAME 0x10
>> +#define DIRTY_BITMAP_MIG_FLAG_GRANULARITY 0x20
>> +#define DIRTY_BITMAP_MIG_FLAG_ENABLED 0x40
>> +/* flags should be <= 0xff */
>> +
>
> We should give ourselves a little breathing room with the flags, since
> we've only got room for one more.
Ok. Will one more byte be enough?
>
>> +/* #define DEBUG_DIRTY_BITMAP_MIGRATION */
>> +
>> +#ifdef DEBUG_DIRTY_BITMAP_MIGRATION
>> +#define DPRINTF(fmt, ...) \
>> + do { printf("dirty_migration: " fmt, ## __VA_ARGS__); } while (0)
>> +#else
>> +#define DPRINTF(fmt, ...) \
>> + do { } while (0)
>> +#endif
>> +
>> +typedef struct DirtyBitmapMigBitmapState {
>> + /* Written during setup phase. */
>> + BlockDriverState *bs;
>> + BdrvDirtyBitmap *bitmap;
>> + HBitmap *dirty_bitmap;
>
> For my own sanity, I'd really prefer "bitmap" and "meta_bitmap" here;
> "dirty_bitmap" is often used as a synonym (outside of this file) to
> refer to the BdrvDirtyBitmap in general, so it's usage here can be
> somewhat confusing.
>
> I'd appreciate "dirty_dirty_bitmap" as in your previous patch for
> consistency, or "meta_bitmap" as I recommend.
>
Ok
>> + int64_t total_sectors;
>> + uint64_t sectors_per_chunk;
>> + QSIMPLEQ_ENTRY(DirtyBitmapMigBitmapState) entry;
>> +
>> + /* For bulk phase. */
>> + bool bulk_completed;
>> + int64_t cur_sector;
>> + bool granularity_sent;
>> +
>> + /* For dirty phase. */
>> + int64_t cur_dirty;
>> +} DirtyBitmapMigBitmapState;
>> +
>> +typedef struct DirtyBitmapMigState {
>> + int migration_enable;
>> + QSIMPLEQ_HEAD(dbms_list, DirtyBitmapMigBitmapState) dbms_list;
>> +
>> + bool bulk_completed;
>> +
>> + /* for send_bitmap() */
>> + BlockDriverState *prev_bs;
>> + BdrvDirtyBitmap *prev_bitmap;
>> +} DirtyBitmapMigState;
>> +
>> +static DirtyBitmapMigState dirty_bitmap_mig_state;
>> +
>> +/* read name from qemu file:
>> + * format:
>> + * 1 byte : len = name length (<256)
>> + * len bytes : name without last zero byte
>> + *
>> + * name should point to the buffer >= 256 bytes length
>> + */
>> +static char *qemu_get_name(QEMUFile *f, char *name)
>> +{
>> + int len = qemu_get_byte(f);
>> + qemu_get_buffer(f, (uint8_t *)name, len);
>> + name[len] = '\0';
>> +
>> + DPRINTF("get name: %d %s\n", len, name);
>> +
>> + return name;
>> +}
>> +
>
> OK. Maybe these could be "qemu_put_string" or "qemu_get_string" and
> added to qemu-file.c so others can use them.
If no objections for sharing this format, I'll do it.
>
>> +/* write name to qemu file:
>> + * format:
>> + * same as for qemu_get_name
>> + *
>> + * maximum name length is 255
>> + */
>> +static void qemu_put_name(QEMUFile *f, const char *name)
>> +{
>> + int len = strlen(name);
>> +
>> + DPRINTF("put name: %d %s\n", len, name);
>> +
>> + assert(len < 256);
>> + qemu_put_byte(f, len);
>> + qemu_put_buffer(f, (const uint8_t *)name, len);
>> +}
>> +
>
> OK.
>
>> +static void send_bitmap(QEMUFile *f, DirtyBitmapMigBitmapState *dbms,
>> + uint64_t start_sector, uint32_t nr_sectors)
>> +{
>> + BlockDriverState *bs = dbms->bs;
>> + BdrvDirtyBitmap *bitmap = dbms->bitmap;
>> + uint8_t flags = 0;
>> + /* align for buffer_is_zero() */
>> + uint64_t align = 4 * sizeof(long);
>> + uint64_t buf_size =
>> + (bdrv_dirty_bitmap_data_size(bitmap, nr_sectors) + align - 1) &
>> + ~(align - 1);
>> + uint8_t *buf = g_malloc0(buf_size);
>> +
>> + bdrv_dirty_bitmap_serialize_part(bitmap, buf, start_sector,
>> nr_sectors);
>> +
>> + if (buffer_is_zero(buf, buf_size)) {
>> + g_free(buf);
>> + buf = NULL;
>> + flags |= DIRTY_BITMAP_MIG_FLAG_ZERO_CHUNK;
>> + } else {
>> + flags |= DIRTY_BITMAP_MIG_FLAG_NORMAL_CHUNK;
>> + }
>> +
>> + if (bs != dirty_bitmap_mig_state.prev_bs) {
>> + dirty_bitmap_mig_state.prev_bs = bs;
>> + flags |= DIRTY_BITMAP_MIG_FLAG_DEVICE_NAME;
>> + }
>> +
>> + if (bitmap != dirty_bitmap_mig_state.prev_bitmap) {
>> + dirty_bitmap_mig_state.prev_bitmap = bitmap;
>> + flags |= DIRTY_BITMAP_MIG_FLAG_BITMAP_NAME;
>> + }
>
> OK, so we use the current bs/bitmap under consideration to detect if
> we have switched context, and put the names in the stream when it
> happens. OK.
>
>> +
>> + if (dbms->granularity_sent == 0) {
>> + dbms->granularity_sent = 1;
>> + flags |= DIRTY_BITMAP_MIG_FLAG_GRANULARITY;
>> + }
>> +
>> + DPRINTF("Enter send_bitmap"
>> + "\n flags: %x"
>> + "\n start_sector: %" PRIu64
>> + "\n nr_sectors: %" PRIu32
>> + "\n data_size: %" PRIu64 "\n",
>> + flags, start_sector, nr_sectors, buf_size);
>> +
>> + qemu_put_byte(f, flags);
>> +
>> + if (flags & DIRTY_BITMAP_MIG_FLAG_DEVICE_NAME) {
>> + qemu_put_name(f, bdrv_get_device_name(bs));
>> + }
>
> I am still not fully clear on this myself, but I think we are phasing
> out bdrv_get_device_name. In the context of bitmaps, we are mostly
> likely using them one-per-tree, but they /can/ be attached
> one-per-node, so we shouldn't be trying to get the device name here,
> but rather, the node-name.
>
> I /think/ we may want to be using bdrv_get_node_name, but I think it
> is not currently true that all nodes WILL be named ... I am CC'ing
> Markus Armbruster and Max Reitz for (A) A refresher course and (B)
> Opinions on what function call might make sense here, given that we
> wish to migrate bitmaps attached to arbitrary nodes.
Hmm.. I'm not familiar with hierarchy of nodes and devices. As I
understand, both command_line- and qmp- created drives are created
through blockdev_init, which creates both blk(device) and bs(node)
through blk_new_with_bs.. Am I right? Also, bdrv_get_device_name is used
in migration/block.c.
>
>> +
>> + if (flags & DIRTY_BITMAP_MIG_FLAG_BITMAP_NAME) {
>> + qemu_put_name(f, bdrv_dirty_bitmap_name(bitmap));
>> +
>> + if (flags & DIRTY_BITMAP_MIG_FLAG_GRANULARITY) {
>> + qemu_put_be64(f, bdrv_dirty_bitmap_granularity(bs,
>> bitmap));
>> + }
>> + } else {
>> + assert(!(flags & DIRTY_BITMAP_MIG_FLAG_GRANULARITY));
>> + }
>> +
>
> I thought we were only migrating bitmaps with names?
> I suppose the conditional can't hurt, but I am not clear on when we
> won't have a bitmap name here.
You are right, 'else' case is not possible.. Hmm. I've added it to be
sure that format is not corrupted, when I decided to put granularity
only with name. Wi won't have a bitmap name only when we send the same
bitmap as on the previous send_bitmap() call. May be it will be better
to use two separate if's without else and assert.
>
>> + qemu_put_be64(f, start_sector);
>> + qemu_put_be32(f, nr_sectors);
>> +
>> + /* if a block is zero we need to flush here since the network
>> + * bandwidth is now a lot higher than the storage device bandwidth.
>> + * thus if we queue zero blocks we slow down the migration.
>> + * also, skip writing block when migrate only dirty bitmaps. */
>> + if (flags & DIRTY_BITMAP_MIG_FLAG_ZERO_CHUNK) {
>> + qemu_fflush(f);
>> + return;
>> + }
>> +
>> + qemu_put_be64(f, buf_size);
>> + qemu_put_buffer(f, buf, buf_size);
>> + g_free(buf);
>> +}
>> +
>> +
>> +/* Called with iothread lock taken. */
>> +
>> +static void set_dirty_tracking(void)
>> +{
>> + DirtyBitmapMigBitmapState *dbms;
>> +
>> + QSIMPLEQ_FOREACH(dbms, &dirty_bitmap_mig_state.dbms_list, entry) {
>> + dbms->dirty_bitmap =
>> + bdrv_create_dirty_dirty_bitmap(dbms->bitmap, CHUNK_SIZE);
>> + }
>> +}
>> +
>
> OK: so we only have these dirty-dirty bitmaps when migration is
> starting, which makes sense.
>
>> +static void unset_dirty_tracking(void)
>> +{
>> + DirtyBitmapMigBitmapState *dbms;
>> +
>> + QSIMPLEQ_FOREACH(dbms, &dirty_bitmap_mig_state.dbms_list, entry) {
>> + bdrv_release_dirty_dirty_bitmap(dbms->bitmap);
>> + }
>> +}
>> +
>
> OK.
>
>> +static void init_dirty_bitmap_migration(QEMUFile *f)
>> +{
>> + BlockDriverState *bs;
>> + BdrvDirtyBitmap *bitmap;
>> + DirtyBitmapMigBitmapState *dbms;
>> +
>> + dirty_bitmap_mig_state.bulk_completed = false;
>> + dirty_bitmap_mig_state.prev_bs = NULL;
>> + dirty_bitmap_mig_state.prev_bitmap = NULL;
>> +
>> + for (bs = bdrv_next(NULL); bs; bs = bdrv_next(bs)) {
>> + for (bitmap = bdrv_next_dirty_bitmap(bs, NULL); bitmap;
>> + bitmap = bdrv_next_dirty_bitmap(bs, bitmap)) {
>> + if (!bdrv_dirty_bitmap_name(bitmap)) {
>> + continue;
>> + }
>> +
>> + dbms = g_new0(DirtyBitmapMigBitmapState, 1);
>> + dbms->bs = bs;
>> + dbms->bitmap = bitmap;
>> + dbms->total_sectors = bdrv_nb_sectors(bs);
>> + dbms->sectors_per_chunk = CHUNK_SIZE * 8 *
>> + bdrv_dirty_bitmap_granularity(dbms->bs, dbms->bitmap)
>> + >> BDRV_SECTOR_BITS;
>> +
>> + QSIMPLEQ_INSERT_TAIL(&dirty_bitmap_mig_state.dbms_list,
>> + dbms, entry);
>> + }
>> + }
>> +}
>> +
>
> OK, but see the note below for dirty_bitmap_mig_init.
actually it is not 'init' but 'reinit' - called on every migration
start.. Hmm. dbms_list should be cleared here before fill it again.
>
>> +/* Called with no lock taken. */
>> +static void bulk_phase_send_chunk(QEMUFile *f,
>> DirtyBitmapMigBitmapState *dbms)
>> +{
>> + uint32_t nr_sectors = MIN(dbms->total_sectors - dbms->cur_sector,
>> + dbms->sectors_per_chunk);
>
> What about cases where nr_sectors will put us past the end of the
> bitmap? The bitmap serialization implementation might need a touchup
> with this in mind.
I don't understand.. nr_sectors <= dbms->total_sectors -
dbms->cur_sector and it can't put us past the end...
>
>> +
>> + send_bitmap(f, dbms, dbms->cur_sector, nr_sectors);
>> +
>> + dbms->cur_sector += nr_sectors;
>> + if (dbms->cur_sector >= dbms->total_sectors) {
>> + dbms->bulk_completed = true;
>> + }
>> +}
>> +
>> +/* Called with no lock taken. */
>> +static void bulk_phase(QEMUFile *f, bool limit)
>> +{
>> + DirtyBitmapMigBitmapState *dbms;
>> +
>> + QSIMPLEQ_FOREACH(dbms, &dirty_bitmap_mig_state.dbms_list, entry) {
>> + while (!dbms->bulk_completed) {
>> + bulk_phase_send_chunk(f, dbms);
>> + if (limit && qemu_file_rate_limit(f)) {
>> + return;
>> + }
>> + }
>> + }
>> +
>> + dirty_bitmap_mig_state.bulk_completed = true;
>> +}
>
> OK.
>
>> +
>> +static void blk_mig_reset_dirty_cursor(void)
>> +{
>> + DirtyBitmapMigBitmapState *dbms;
>> +
>> + QSIMPLEQ_FOREACH(dbms, &dirty_bitmap_mig_state.dbms_list, entry) {
>> + dbms->cur_dirty = 0;
>> + }
>> +}
>> +
>
> OK.
>
>> +/* Called with iothread lock taken. */
>> +static void dirty_phase_send_chunk(QEMUFile *f,
>> DirtyBitmapMigBitmapState *dbms)
>> +{
>> + uint32_t nr_sectors;
>> +
>> + while (dbms->cur_dirty < dbms->total_sectors &&
>> + !hbitmap_get(dbms->dirty_bitmap, dbms->cur_dirty)) {
>> + dbms->cur_dirty += dbms->sectors_per_chunk;
>> + }
>
> OK, so we fast forward the dirty cursor while the meta-bitmap is
> empty. Is it not worth using the HBitmapIterator here? You can reset
> them everywhere you reset the dirty cursor, and then just fast-seek to
> the first dirty sector.
Yes, I've thought about it, just used simpler way (copied from
migration/block.c) for an early version of the patch set. I will do it.
>
>> +
>> + if (dbms->cur_dirty >= dbms->total_sectors) {
>> + return;
>> + }
>> +
>> + nr_sectors = MIN(dbms->total_sectors - dbms->cur_dirty,
>> + dbms->sectors_per_chunk);
>
> What happens if nr_sectors goes past the end?
>
>> + send_bitmap(f, dbms, dbms->cur_dirty, nr_sectors);
>> + hbitmap_reset(dbms->dirty_bitmap, dbms->cur_dirty,
>> dbms->sectors_per_chunk);
>> + dbms->cur_dirty += nr_sectors;
>> +}
>> +
>> +/* Called with iothread lock taken.
>> + *
>> + * return value:
>> + * 0: too much data for max_downtime
>> + * 1: few enough data for max_downtime
>> +*/
>
> dirty_phase below doesn't have a return value.
rudimentary comment.. thanks.
>
>> +static void dirty_phase(QEMUFile *f, bool limit)
>> +{
>> + DirtyBitmapMigBitmapState *dbms;
>> +
>> + QSIMPLEQ_FOREACH(dbms, &dirty_bitmap_mig_state.dbms_list, entry) {
>> + while (dbms->cur_dirty < dbms->total_sectors) {
>> + dirty_phase_send_chunk(f, dbms);
>> + if (limit && qemu_file_rate_limit(f)) {
>> + return;
>> + }
>> + }
>> + }
>> +}
>> +
>
> OK.
>
>> +
>> +/* Called with iothread lock taken. */
>> +static void dirty_bitmap_mig_cleanup(void)
>> +{
>> + DirtyBitmapMigBitmapState *dbms;
>> +
>> + unset_dirty_tracking();
>> +
>> + while ((dbms =
>> QSIMPLEQ_FIRST(&dirty_bitmap_mig_state.dbms_list)) != NULL) {
>> + QSIMPLEQ_REMOVE_HEAD(&dirty_bitmap_mig_state.dbms_list, entry);
>> + g_free(dbms);
>> + }
>> +}
>> +
>
> OK.
>
>> +static void dirty_bitmap_migration_cancel(void *opaque)
>> +{
>> + dirty_bitmap_mig_cleanup();
>> +}
>> +
>
> OK.
>
>> +static int dirty_bitmap_save_iterate(QEMUFile *f, void *opaque)
>> +{
>> + DPRINTF("Enter save live iterate\n");
>> +
>> + blk_mig_reset_dirty_cursor();
>
> I suppose this is because it's easier to check if we are finished by
> starting from sector 0 every time.
>
> A harder, but faster method may be: Use HBitmapIterators, but don't
> reset them every iteration: just iterate until the end, and check that
> the bitmap is empty. If the meta bitmap is empty, the dirty phase is
> complete. If the meta bitmap is NOT empty, reset the HBI and continue
> allowing iterations over the dirty phase.
Ok, will do.
>
>> +
>> + if (dirty_bitmap_mig_state.bulk_completed) {
>> + qemu_mutex_lock_iothread();
>> + dirty_phase(f, true);
>> + qemu_mutex_unlock_iothread();
>> + } else {
>> + bulk_phase(f, true);
>> + }
>> +
>> + qemu_put_byte(f, DIRTY_BITMAP_MIG_FLAG_EOS);
>> +
>> + return dirty_bitmap_mig_state.bulk_completed;
>> +}
>> +
>> +/* Called with iothread lock taken. */
>> +
>> +static int dirty_bitmap_save_complete(QEMUFile *f, void *opaque)
>> +{
>> + DirtyBitmapMigBitmapState *dbms;
>> + DPRINTF("Enter save live complete\n");
>> +
>> + if (!dirty_bitmap_mig_state.bulk_completed) {
>> + bulk_phase(f, false);
>> + }
>
> [Not expertly familiar with savevm:] Under what conditions can this
> happen?
This can happen. save_complete will happen when savevm decide that
pending data size to send is small enough. It was the case for my bugfix
for migration/block.c about pending. To prevent save_complete when bulk
phase isn't completed, save_pending returns (in my bugfix for
migration/block.c) big value. Here I decided to make more honest
save_pending, so I need to complete (if it doesn't) bulk phase in
save_complete.
>
>> +
>> + blk_mig_reset_dirty_cursor();
>> + dirty_phase(f, false);
>> +
>> + QSIMPLEQ_FOREACH(dbms, &dirty_bitmap_mig_state.dbms_list, entry) {
>> + uint8_t flags = DIRTY_BITMAP_MIG_FLAG_DEVICE_NAME |
>> + DIRTY_BITMAP_MIG_FLAG_BITMAP_NAME |
>> + DIRTY_BITMAP_MIG_FLAG_ENABLED;
>> +
>> + qemu_put_byte(f, flags);
>> + qemu_put_name(f, bdrv_get_device_name(dbms->bs));
>> + qemu_put_name(f, bdrv_dirty_bitmap_name(dbms->bitmap));
>> + qemu_put_byte(f, bdrv_dirty_bitmap_enabled(dbms->bitmap));
>> + }
>> +
>> + qemu_put_byte(f, DIRTY_BITMAP_MIG_FLAG_EOS);
>> +
>> + DPRINTF("Dirty bitmaps migration completed\n");
>> +
>> + dirty_bitmap_mig_cleanup();
>> + return 0;
>> +}
>> +
>
> I suppose we don't need a flag that distinctly SAYS this is the end
> section, since we can tell by omission of
> DIRTY_BITMAP_MIG_FLAG_NORMAL_CHUNK or ZERO_CHUNK.
Hmm. I think it simplifies the logic (to use EOS after each section).
And the same approach is in migration/block.c.. It's a question about
which format is better: "Each section for dirty_bitmap_load ends with
EOS" or "Each section for dirty_bitmap_load ends with EOS except the
last one. The last one may be recognized by absent NORMAL_CHUNK and
ZERO_CHUNK"
>
>> +static uint64_t dirty_bitmap_save_pending(QEMUFile *f, void *opaque,
>> + uint64_t max_size)
>> +{
>> + DirtyBitmapMigBitmapState *dbms;
>> + uint64_t pending = 0;
>> +
>> + qemu_mutex_lock_iothread();
>> +
>> + QSIMPLEQ_FOREACH(dbms, &dirty_bitmap_mig_state.dbms_list, entry) {
>> + uint64_t sectors = hbitmap_count(dbms->dirty_bitmap);
>> + if (!dbms->bulk_completed) {
>> + sectors += dbms->total_sectors - dbms->cur_sector;
>> + }
>> + pending += bdrv_dirty_bitmap_data_size(dbms->bitmap, sectors);
>> + }
>> +
>> + qemu_mutex_unlock_iothread();
>> +
>> + DPRINTF("Enter save live pending %" PRIu64 ", max: %" PRIu64 "\n",
>> + pending, max_size);
>> + return pending;
>> +}
>> +
>
> OK.
>
>> +static int dirty_bitmap_load(QEMUFile *f, void *opaque, int version_id)
>> +{
>> + int flags;
>> +
>> + static char device_name[256], bitmap_name[256];
>> + static BlockDriverState *bs;
>> + static BdrvDirtyBitmap *bitmap;
>> +
>> + uint8_t *buf;
>> + uint64_t first_sector;
>> + uint32_t nr_sectors;
>> + int ret;
>> +
>> + DPRINTF("load start\n");
>> +
>> + do {
>> + flags = qemu_get_byte(f);
>> + DPRINTF("flags: %x\n", flags);
>> +
>> + if (flags & DIRTY_BITMAP_MIG_FLAG_DEVICE_NAME) {
>> + qemu_get_name(f, device_name);
>> + bs = bdrv_find(device_name);
>
> Similar to the above confusion, you may want bdrv_lookup_bs or
> similar, since we're going to be looking for BDS nodes instead of
> "devices."
In this case, should it be changed in migration/block.c too?
>
>> + if (!bs) {
>> + fprintf(stderr, "Error: unknown block device '%s'\n",
>> + device_name);
>> + return -EINVAL;
>> + }
>> + }
>> +
>> + if (flags & DIRTY_BITMAP_MIG_FLAG_BITMAP_NAME) {
>> + if (!bs) {
>> + fprintf(stderr, "Error: block device name is not
>> set\n");
>> + return -EINVAL;
>> + }
>> +
>> + qemu_get_name(f, bitmap_name);
>> + bitmap = bdrv_find_dirty_bitmap(bs, bitmap_name);
>> + if (flags & DIRTY_BITMAP_MIG_FLAG_GRANULARITY) {
>> + /* First chunk from this bitmap */
>> + uint64_t granularity = qemu_get_be64(f);
>> + if (!bitmap) {
>> + Error *local_err = NULL;
>> + bitmap = bdrv_create_dirty_bitmap(bs, granularity,
>> + bitmap_name,
>> + &local_err);
>> + if (!bitmap) {
>> + error_report("%s",
>> error_get_pretty(local_err));
>> + error_free(local_err);
>> + return -EINVAL;
>> + }
>> + } else {
>> + uint64_t dest_granularity =
>> + bdrv_dirty_bitmap_granularity(bs, bitmap);
>> + if (dest_granularity != granularity) {
>> + fprintf(stderr,
>> + "Error: "
>> + "Migrated bitmap granularity (%"
>> PRIu64 ") "
>> + "is not match with destination
>> bitmap '%s' "
>> + "granularity (%" PRIu64 ")\n",
>> + granularity,
>> + bitmap_name,
>> + dest_granularity);
>> + return -EINVAL;
>> + }
>> + }
>> + bdrv_disable_dirty_bitmap(bitmap);
>> + }
>> + if (!bitmap) {
>> + fprintf(stderr, "Error: unknown dirty bitmap "
>> + "'%s' for block device '%s'\n",
>> + bitmap_name, device_name);
>> + return -EINVAL;
>> + }
>> + }
>> +
>> + if (flags & DIRTY_BITMAP_MIG_FLAG_ENABLED) {
>> + bool enabled;
>> + if (!bitmap) {
>> + fprintf(stderr, "Error: dirty bitmap name is not
>> set\n");
>> + return -EINVAL;
>> + }
>> + bdrv_dirty_bitmap_deserialize_finish(bitmap);
>> + /* complete migration */
>> + enabled = qemu_get_byte(f);
>> + if (enabled) {
>> + bdrv_enable_dirty_bitmap(bitmap);
>> + }
>> + }
>
> Oh, so you use the ENABLED flag to show that migration is over.
Yes, it was bad idea..
> If we are going to commit to a stream format for bitmaps, though,
> maybe it's best to actually create a "COMPLETION BLOCK" flag and then
> split this function into two pieces:
>
> (1) The part that receives regular / zero blocks, and
> (2) The part that receives completion data.
>
> That way, if we change the properties that bitmaps have down the line,
> we aren't reliant on literally the "enabled" flag to decide what to do.
>
> Also, it might help make this fairly long function a little smaller
> and more readable.
Ok.
>
>> +
>> + if (flags & (DIRTY_BITMAP_MIG_FLAG_NORMAL_CHUNK |
>> + DIRTY_BITMAP_MIG_FLAG_ZERO_CHUNK)) {
>> + if (!bs) {
>> + fprintf(stderr, "Error: block device name is not
>> set\n");
>> + return -EINVAL;
>> + }
>> + if (!bitmap) {
>> + fprintf(stderr, "Error: dirty bitmap name is not
>> set\n");
>> + return -EINVAL;
>> + }
>> +
>> + first_sector = qemu_get_be64(f);
>> + nr_sectors = qemu_get_be32(f);
>> + DPRINTF("chunk: %lu %u\n", first_sector, nr_sectors);
>> +
>> +
>> + if (flags & DIRTY_BITMAP_MIG_FLAG_ZERO_CHUNK) {
>> + bdrv_dirty_bitmap_deserialize_part0(bitmap,
>> first_sector,
>> + nr_sectors);
>> + } else {
>> + uint64_t buf_size = qemu_get_be64(f);
>> + uint64_t needed_size =
>> + bdrv_dirty_bitmap_data_size(bitmap, nr_sectors);
>> +
>> + if (needed_size > buf_size) {
>> + fprintf(stderr,
>> + "Error: Migrated bitmap granularity is
>> not "
>> + "match with destination bitmap
>> granularity\n");
>> + return -EINVAL;
>> + }
>> +
>
> "Migrated bitmap granularity doesn't match the destination bitmap
> granularity" perhaps.
>
>> + buf = g_malloc(buf_size);
>> + qemu_get_buffer(f, buf, buf_size);
>> + bdrv_dirty_bitmap_deserialize_part(bitmap, buf,
>> + first_sector,
>> + nr_sectors);
>> + g_free(buf);
>> + }
>> + }
>> +
>> + ret = qemu_file_get_error(f);
>> + if (ret != 0) {
>> + return ret;
>> + }
>> + } while (!(flags & DIRTY_BITMAP_MIG_FLAG_EOS));
>> +
>> + DPRINTF("load finish\n");
>> + return 0;
>> +}
>> +
>> +static void dirty_bitmap_set_params(const MigrationParams *params,
>> void *opaque)
>> +{
>> + dirty_bitmap_mig_state.migration_enable = params->dirty;
>> +}
>> +
>
> OK; though I am not immediately aware of what changes need to happen
> to accommodate Eric's suggestions.
This function will be dropped in v3.
>
>> +static bool dirty_bitmap_is_active(void *opaque)
>> +{
>> + return dirty_bitmap_mig_state.migration_enable == 1;
>> +}
>> +
>
> OK.
>
>> +static int dirty_bitmap_save_setup(QEMUFile *f, void *opaque)
>> +{
>> + init_dirty_bitmap_migration(f);
>> +
>> + qemu_mutex_lock_iothread();
>> + /* start track dirtyness of dirty bitmaps */
>> + set_dirty_tracking();
>> + qemu_mutex_unlock_iothread();
>> +
>> + blk_mig_reset_dirty_cursor();
>> + qemu_put_byte(f, DIRTY_BITMAP_MIG_FLAG_EOS);
>> +
>> + return 0;
>> +}
>> +
>
> OK; see dirty_bitmap_mig_init below, though.
>
>> +static SaveVMHandlers savevm_block_handlers = {
>> + .set_params = dirty_bitmap_set_params,
>> + .save_live_setup = dirty_bitmap_save_setup,
>> + .save_live_iterate = dirty_bitmap_save_iterate,
>> + .save_live_complete = dirty_bitmap_save_complete,
>> + .save_live_pending = dirty_bitmap_save_pending,
>> + .load_state = dirty_bitmap_load,
>> + .cancel = dirty_bitmap_migration_cancel,
>> + .is_active = dirty_bitmap_is_active,
>> +};
>> +
>> +void dirty_bitmap_mig_init(void)
>> +{
>> + QSIMPLEQ_INIT(&dirty_bitmap_mig_state.dbms_list);
>
> Maybe I haven't looked thoroughly enough yet, but it's weird that part
> of the dirty_bitmap_mig_state is initialized here, and the rest of it
> in init_dirty_bitmap_migration. I'd prefer to keep it all together, if
> possible.
dirty_bitmap_mig_init is called one time when qemu starts. QSIMPLEQ_INIT
should be called once. dirty_bitmap_save_setup is called on every
migration start, it's like 'reinitialize'.
>
>> +
>> + register_savevm_live(NULL, "dirty-bitmap", 0, 1,
>> &savevm_block_handlers,
>> + &dirty_bitmap_mig_state);
>> +}
>
> OK.
>
>> diff --git a/vl.c b/vl.c
>> index a824a7d..dee7220 100644
>> --- a/vl.c
>> +++ b/vl.c
>> @@ -4184,6 +4184,7 @@ int main(int argc, char **argv, char **envp)
>>
>> blk_mig_init();
>> ram_mig_init();
>> + dirty_bitmap_mig_init();
>>
>> /* If the currently selected machine wishes to override the
>> units-per-bus
>> * property of its default HBA interface type, do so now. */
>>
>
> Hm, since dirty bitmaps are a sub-component of the block layer, would
> it not make sense to put this hook under blk_mig_init, perhaps?
IMHO the reason to put it here is to keep all
register_savevm_live-entities in one place.
>
>
> Overall this looks very clean compared to the intermingled format in
> V1, and the code is organized pretty well. Just a few minor comments,
> and I'd like to get the opinion of the migration maintainers, but I am
> happy. Sorry it took me so long to review, please feel free to let me
> know if you disagree with any of my opinions :)
>
> Thank you,
> --John
Thank you for reviewing my series)
--
Best regards,
Vladimir
next prev parent reply other threads:[~2015-02-13 8:19 UTC|newest]
Thread overview: 35+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-01-27 10:56 [Qemu-devel] [PATCH RFC v2 0/8] Dirty bitmaps migration Vladimir Sementsov-Ogievskiy
2015-01-27 10:56 ` [Qemu-devel] [PATCH RFC v2 1/8] qmp: print dirty bitmap Vladimir Sementsov-Ogievskiy
2015-01-27 16:17 ` Eric Blake
2015-01-27 16:23 ` Vladimir Sementsov-Ogievskiy
2015-02-10 21:28 ` John Snow
2015-01-27 10:56 ` [Qemu-devel] [PATCH RFC v2 2/8] hbitmap: serialization Vladimir Sementsov-Ogievskiy
2015-02-10 21:29 ` John Snow
2015-01-27 10:56 ` [Qemu-devel] [PATCH RFC v2 3/8] block: BdrvDirtyBitmap serialization interface Vladimir Sementsov-Ogievskiy
2015-02-10 21:29 ` John Snow
2015-01-27 10:56 ` [Qemu-devel] [PATCH RFC v2 4/8] block: add dirty-dirty bitmaps Vladimir Sementsov-Ogievskiy
2015-02-10 21:30 ` John Snow
2015-02-12 10:51 ` Vladimir Sementsov-Ogievskiy
2015-01-27 10:56 ` [Qemu-devel] [PATCH RFC v2 5/8] block: add bdrv_dirty_bitmap_enabled() Vladimir Sementsov-Ogievskiy
2015-02-10 21:30 ` John Snow
2015-02-12 10:54 ` Vladimir Sementsov-Ogievskiy
2015-02-12 16:22 ` John Snow
2015-01-27 10:56 ` [Qemu-devel] [PATCH RFC v2 6/8] block: add bdrv_next_dirty_bitmap() Vladimir Sementsov-Ogievskiy
2015-02-10 21:31 ` John Snow
2015-01-27 10:56 ` [Qemu-devel] [PATCH RFC v2 7/8] migration: add dirty parameter Vladimir Sementsov-Ogievskiy
2015-01-27 16:20 ` Eric Blake
2015-02-04 14:42 ` Vladimir Sementsov-Ogievskiy
2015-02-10 21:32 ` John Snow
2015-01-27 10:56 ` [Qemu-devel] [PATCH RFC v2 8/8] migration: add migration/dirty-bitmap.c Vladimir Sementsov-Ogievskiy
2015-02-10 21:33 ` John Snow
2015-02-13 8:19 ` Vladimir Sementsov-Ogievskiy [this message]
2015-02-13 9:06 ` Vladimir Sementsov-Ogievskiy
2015-02-13 17:32 ` John Snow
2015-02-13 17:41 ` Vladimir Sementsov-Ogievskiy
2015-02-13 20:22 ` John Snow
2015-02-16 12:06 ` Vladimir Sementsov-Ogievskiy
2015-02-16 18:18 ` John Snow
2015-02-16 18:22 ` Dr. David Alan Gilbert
2015-02-17 8:54 ` Vladimir Sementsov-Ogievskiy
2015-02-17 18:45 ` John Snow
2015-02-17 19:12 ` Eric Blake
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=54DDB398.3010907@parallels.com \
--to=vsementsov@parallels.com \
--cc=amit.shah@redhat.com \
--cc=den@openvz.org \
--cc=dgilbert@redhat.com \
--cc=jsnow@redhat.com \
--cc=kwolf@redhat.com \
--cc=pbonzini@redhat.com \
--cc=peter.maydell@linaro.org \
--cc=qemu-devel@nongnu.org \
--cc=quintela@redhat.com \
--cc=stefanha@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).