From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:39723) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1YMBTe-0004G7-Dp for qemu-devel@nongnu.org; Fri, 13 Feb 2015 03:19:57 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1YMBTZ-0002gs-9q for qemu-devel@nongnu.org; Fri, 13 Feb 2015 03:19:54 -0500 Received: from mx2.parallels.com ([199.115.105.18]:34392) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1YMBTY-0002gj-UL for qemu-devel@nongnu.org; Fri, 13 Feb 2015 03:19:49 -0500 Message-ID: <54DDB398.3010907@parallels.com> Date: Fri, 13 Feb 2015 11:19:36 +0300 From: Vladimir Sementsov-Ogievskiy MIME-Version: 1.0 References: <1422356197-5285-1-git-send-email-vsementsov@parallels.com> <1422356197-5285-9-git-send-email-vsementsov@parallels.com> <54DA793C.9020707@redhat.com> In-Reply-To: <54DA793C.9020707@redhat.com> Content-Type: text/plain; charset="windows-1252"; format=flowed Content-Transfer-Encoding: 7bit Subject: Re: [Qemu-devel] [PATCH RFC v2 8/8] migration: add migration/dirty-bitmap.c List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: John Snow , qemu-devel@nongnu.org Cc: kwolf@redhat.com, Peter Maydell , "Juan quin >> Juan Jose Quintela Carreira" , "Dr. David Alan Gilbert" , stefanha@redhat.com, den@openvz.org, amit Shah , pbonzini@redhat.com On 11.02.2015 00:33, John Snow wrote: > Peter Maydell: What's the right way to license a file as copied from a > previous version? See below, please; > > Max, Markus: ctrl+f "bdrv_get_device_name" and let me know what you > think, if you would. > > Juan, Amit, David: Copying migration maintainers. > > On 01/27/2015 05:56 AM, Vladimir Sementsov-Ogievskiy wrote: >> Live migration of dirty bitmaps. Only named dirty bitmaps are migrated. >> If destination qemu is already containing a dirty bitmap with the same >> name as a migrated bitmap, then their granularities should be the same, >> otherwise the error will be generated. If destination qemu doesn't >> contain such bitmap it will be created. >> >> format: >> >> 1 byte: flags >> >> [ 1 byte: node name size ] \ flags & DEVICE_NAME >> [ n bytes: node name ] / >> >> [ 1 byte: bitmap name size ] \ >> [ n bytes: bitmap name ] | flags & BITMAP_NAME >> [ [ be64: granularity ] ] flags & GRANULARITY >> >> [ 1 byte: bitmap enabled bit ] flags & ENABLED >> >> [ be64: start sector ] \ flags & (NORMAL_CHUNK | ZERO_CHUNK) >> [ be32: number of sectors ] / >> >> [ be64: buffer size ] \ flags & NORMAL_CHUNK >> [ n bytes: buffer ] / >> >> The last chunk should contain flags & EOS. The chunk may skip device >> and/or bitmap names, assuming them to be the same with the previous >> chunk. GRANULARITY is sent with the first chunk for the bitmap. ENABLED >> bit is sent in the end of "complete" stage of migration. So when >> destination gets ENABLED flag it should deserialize_finish the bitmap >> and set its enabled bit to corresponding value. >> >> Signed-off-by: Vladimir Sementsov-Ogievskiy >> --- >> include/migration/block.h | 1 + >> migration/Makefile.objs | 2 +- >> migration/dirty-bitmap.c | 606 >> ++++++++++++++++++++++++++++++++++++++++++++++ >> vl.c | 1 + >> 4 files changed, 609 insertions(+), 1 deletion(-) >> create mode 100644 migration/dirty-bitmap.c >> >> diff --git a/include/migration/block.h b/include/migration/block.h >> index ffa8ac0..566bb9f 100644 >> --- a/include/migration/block.h >> +++ b/include/migration/block.h >> @@ -14,6 +14,7 @@ >> #ifndef BLOCK_MIGRATION_H >> #define BLOCK_MIGRATION_H >> >> +void dirty_bitmap_mig_init(void); >> void blk_mig_init(void); >> int blk_mig_active(void); >> uint64_t blk_mig_bytes_transferred(void); > > OK. > >> diff --git a/migration/Makefile.objs b/migration/Makefile.objs >> index d929e96..9adfda9 100644 >> --- a/migration/Makefile.objs >> +++ b/migration/Makefile.objs >> @@ -6,5 +6,5 @@ common-obj-y += xbzrle.o >> common-obj-$(CONFIG_RDMA) += rdma.o >> common-obj-$(CONFIG_POSIX) += exec.o unix.o fd.o >> >> -common-obj-y += block.o >> +common-obj-y += block.o dirty-bitmap.o >> > > OK. > >> diff --git a/migration/dirty-bitmap.c b/migration/dirty-bitmap.c >> new file mode 100644 >> index 0000000..8621218 >> --- /dev/null >> +++ b/migration/dirty-bitmap.c >> @@ -0,0 +1,606 @@ >> +/* >> + * QEMU dirty bitmap migration >> + * >> + * derived from migration/block.c >> + * >> + * Author: >> + * Sementsov-Ogievskiy Vladimir >> + * >> + * original copyright message: >> + * >> ===================================================================== >> + * Copyright IBM, Corp. 2009 >> + * >> + * Authors: >> + * Liran Schour >> + * >> + * This work is licensed under the terms of the GNU GPL, version 2. >> See >> + * the COPYING file in the top-level directory. >> + * >> + * Contributions after 2012-01-13 are licensed under the terms of the >> + * GNU GPL, version 2 or (at your option) any later version. >> + * >> ===================================================================== >> + */ >> + > > Not super familiar with the right way to do licensing here; it's > possible you may not need to copy the original here, but I'm not sure. > You will want to make it clear what license applies to /your/ work, I > think. Maybe Peter Maydell can clue us in. > >> +#include "block/block.h" >> +#include "qemu/main-loop.h" >> +#include "qemu/error-report.h" >> +#include "migration/block.h" >> +#include "migration/migration.h" >> +#include "qemu/hbitmap.h" >> +#include >> + >> +#define CHUNK_SIZE (1 << 20) >> + >> +#define DIRTY_BITMAP_MIG_FLAG_EOS 0x01 >> +#define DIRTY_BITMAP_MIG_FLAG_NORMAL_CHUNK 0x02 >> +#define DIRTY_BITMAP_MIG_FLAG_ZERO_CHUNK 0x04 >> +#define DIRTY_BITMAP_MIG_FLAG_BITMAP_NAME 0x08 >> +#define DIRTY_BITMAP_MIG_FLAG_DEVICE_NAME 0x10 >> +#define DIRTY_BITMAP_MIG_FLAG_GRANULARITY 0x20 >> +#define DIRTY_BITMAP_MIG_FLAG_ENABLED 0x40 >> +/* flags should be <= 0xff */ >> + > > We should give ourselves a little breathing room with the flags, since > we've only got room for one more. Ok. Will one more byte be enough? > >> +/* #define DEBUG_DIRTY_BITMAP_MIGRATION */ >> + >> +#ifdef DEBUG_DIRTY_BITMAP_MIGRATION >> +#define DPRINTF(fmt, ...) \ >> + do { printf("dirty_migration: " fmt, ## __VA_ARGS__); } while (0) >> +#else >> +#define DPRINTF(fmt, ...) \ >> + do { } while (0) >> +#endif >> + >> +typedef struct DirtyBitmapMigBitmapState { >> + /* Written during setup phase. */ >> + BlockDriverState *bs; >> + BdrvDirtyBitmap *bitmap; >> + HBitmap *dirty_bitmap; > > For my own sanity, I'd really prefer "bitmap" and "meta_bitmap" here; > "dirty_bitmap" is often used as a synonym (outside of this file) to > refer to the BdrvDirtyBitmap in general, so it's usage here can be > somewhat confusing. > > I'd appreciate "dirty_dirty_bitmap" as in your previous patch for > consistency, or "meta_bitmap" as I recommend. > Ok >> + int64_t total_sectors; >> + uint64_t sectors_per_chunk; >> + QSIMPLEQ_ENTRY(DirtyBitmapMigBitmapState) entry; >> + >> + /* For bulk phase. */ >> + bool bulk_completed; >> + int64_t cur_sector; >> + bool granularity_sent; >> + >> + /* For dirty phase. */ >> + int64_t cur_dirty; >> +} DirtyBitmapMigBitmapState; >> + >> +typedef struct DirtyBitmapMigState { >> + int migration_enable; >> + QSIMPLEQ_HEAD(dbms_list, DirtyBitmapMigBitmapState) dbms_list; >> + >> + bool bulk_completed; >> + >> + /* for send_bitmap() */ >> + BlockDriverState *prev_bs; >> + BdrvDirtyBitmap *prev_bitmap; >> +} DirtyBitmapMigState; >> + >> +static DirtyBitmapMigState dirty_bitmap_mig_state; >> + >> +/* read name from qemu file: >> + * format: >> + * 1 byte : len = name length (<256) >> + * len bytes : name without last zero byte >> + * >> + * name should point to the buffer >= 256 bytes length >> + */ >> +static char *qemu_get_name(QEMUFile *f, char *name) >> +{ >> + int len = qemu_get_byte(f); >> + qemu_get_buffer(f, (uint8_t *)name, len); >> + name[len] = '\0'; >> + >> + DPRINTF("get name: %d %s\n", len, name); >> + >> + return name; >> +} >> + > > OK. Maybe these could be "qemu_put_string" or "qemu_get_string" and > added to qemu-file.c so others can use them. If no objections for sharing this format, I'll do it. > >> +/* write name to qemu file: >> + * format: >> + * same as for qemu_get_name >> + * >> + * maximum name length is 255 >> + */ >> +static void qemu_put_name(QEMUFile *f, const char *name) >> +{ >> + int len = strlen(name); >> + >> + DPRINTF("put name: %d %s\n", len, name); >> + >> + assert(len < 256); >> + qemu_put_byte(f, len); >> + qemu_put_buffer(f, (const uint8_t *)name, len); >> +} >> + > > OK. > >> +static void send_bitmap(QEMUFile *f, DirtyBitmapMigBitmapState *dbms, >> + uint64_t start_sector, uint32_t nr_sectors) >> +{ >> + BlockDriverState *bs = dbms->bs; >> + BdrvDirtyBitmap *bitmap = dbms->bitmap; >> + uint8_t flags = 0; >> + /* align for buffer_is_zero() */ >> + uint64_t align = 4 * sizeof(long); >> + uint64_t buf_size = >> + (bdrv_dirty_bitmap_data_size(bitmap, nr_sectors) + align - 1) & >> + ~(align - 1); >> + uint8_t *buf = g_malloc0(buf_size); >> + >> + bdrv_dirty_bitmap_serialize_part(bitmap, buf, start_sector, >> nr_sectors); >> + >> + if (buffer_is_zero(buf, buf_size)) { >> + g_free(buf); >> + buf = NULL; >> + flags |= DIRTY_BITMAP_MIG_FLAG_ZERO_CHUNK; >> + } else { >> + flags |= DIRTY_BITMAP_MIG_FLAG_NORMAL_CHUNK; >> + } >> + >> + if (bs != dirty_bitmap_mig_state.prev_bs) { >> + dirty_bitmap_mig_state.prev_bs = bs; >> + flags |= DIRTY_BITMAP_MIG_FLAG_DEVICE_NAME; >> + } >> + >> + if (bitmap != dirty_bitmap_mig_state.prev_bitmap) { >> + dirty_bitmap_mig_state.prev_bitmap = bitmap; >> + flags |= DIRTY_BITMAP_MIG_FLAG_BITMAP_NAME; >> + } > > OK, so we use the current bs/bitmap under consideration to detect if > we have switched context, and put the names in the stream when it > happens. OK. > >> + >> + if (dbms->granularity_sent == 0) { >> + dbms->granularity_sent = 1; >> + flags |= DIRTY_BITMAP_MIG_FLAG_GRANULARITY; >> + } >> + >> + DPRINTF("Enter send_bitmap" >> + "\n flags: %x" >> + "\n start_sector: %" PRIu64 >> + "\n nr_sectors: %" PRIu32 >> + "\n data_size: %" PRIu64 "\n", >> + flags, start_sector, nr_sectors, buf_size); >> + >> + qemu_put_byte(f, flags); >> + >> + if (flags & DIRTY_BITMAP_MIG_FLAG_DEVICE_NAME) { >> + qemu_put_name(f, bdrv_get_device_name(bs)); >> + } > > I am still not fully clear on this myself, but I think we are phasing > out bdrv_get_device_name. In the context of bitmaps, we are mostly > likely using them one-per-tree, but they /can/ be attached > one-per-node, so we shouldn't be trying to get the device name here, > but rather, the node-name. > > I /think/ we may want to be using bdrv_get_node_name, but I think it > is not currently true that all nodes WILL be named ... I am CC'ing > Markus Armbruster and Max Reitz for (A) A refresher course and (B) > Opinions on what function call might make sense here, given that we > wish to migrate bitmaps attached to arbitrary nodes. Hmm.. I'm not familiar with hierarchy of nodes and devices. As I understand, both command_line- and qmp- created drives are created through blockdev_init, which creates both blk(device) and bs(node) through blk_new_with_bs.. Am I right? Also, bdrv_get_device_name is used in migration/block.c. > >> + >> + if (flags & DIRTY_BITMAP_MIG_FLAG_BITMAP_NAME) { >> + qemu_put_name(f, bdrv_dirty_bitmap_name(bitmap)); >> + >> + if (flags & DIRTY_BITMAP_MIG_FLAG_GRANULARITY) { >> + qemu_put_be64(f, bdrv_dirty_bitmap_granularity(bs, >> bitmap)); >> + } >> + } else { >> + assert(!(flags & DIRTY_BITMAP_MIG_FLAG_GRANULARITY)); >> + } >> + > > I thought we were only migrating bitmaps with names? > I suppose the conditional can't hurt, but I am not clear on when we > won't have a bitmap name here. You are right, 'else' case is not possible.. Hmm. I've added it to be sure that format is not corrupted, when I decided to put granularity only with name. Wi won't have a bitmap name only when we send the same bitmap as on the previous send_bitmap() call. May be it will be better to use two separate if's without else and assert. > >> + qemu_put_be64(f, start_sector); >> + qemu_put_be32(f, nr_sectors); >> + >> + /* if a block is zero we need to flush here since the network >> + * bandwidth is now a lot higher than the storage device bandwidth. >> + * thus if we queue zero blocks we slow down the migration. >> + * also, skip writing block when migrate only dirty bitmaps. */ >> + if (flags & DIRTY_BITMAP_MIG_FLAG_ZERO_CHUNK) { >> + qemu_fflush(f); >> + return; >> + } >> + >> + qemu_put_be64(f, buf_size); >> + qemu_put_buffer(f, buf, buf_size); >> + g_free(buf); >> +} >> + >> + >> +/* Called with iothread lock taken. */ >> + >> +static void set_dirty_tracking(void) >> +{ >> + DirtyBitmapMigBitmapState *dbms; >> + >> + QSIMPLEQ_FOREACH(dbms, &dirty_bitmap_mig_state.dbms_list, entry) { >> + dbms->dirty_bitmap = >> + bdrv_create_dirty_dirty_bitmap(dbms->bitmap, CHUNK_SIZE); >> + } >> +} >> + > > OK: so we only have these dirty-dirty bitmaps when migration is > starting, which makes sense. > >> +static void unset_dirty_tracking(void) >> +{ >> + DirtyBitmapMigBitmapState *dbms; >> + >> + QSIMPLEQ_FOREACH(dbms, &dirty_bitmap_mig_state.dbms_list, entry) { >> + bdrv_release_dirty_dirty_bitmap(dbms->bitmap); >> + } >> +} >> + > > OK. > >> +static void init_dirty_bitmap_migration(QEMUFile *f) >> +{ >> + BlockDriverState *bs; >> + BdrvDirtyBitmap *bitmap; >> + DirtyBitmapMigBitmapState *dbms; >> + >> + dirty_bitmap_mig_state.bulk_completed = false; >> + dirty_bitmap_mig_state.prev_bs = NULL; >> + dirty_bitmap_mig_state.prev_bitmap = NULL; >> + >> + for (bs = bdrv_next(NULL); bs; bs = bdrv_next(bs)) { >> + for (bitmap = bdrv_next_dirty_bitmap(bs, NULL); bitmap; >> + bitmap = bdrv_next_dirty_bitmap(bs, bitmap)) { >> + if (!bdrv_dirty_bitmap_name(bitmap)) { >> + continue; >> + } >> + >> + dbms = g_new0(DirtyBitmapMigBitmapState, 1); >> + dbms->bs = bs; >> + dbms->bitmap = bitmap; >> + dbms->total_sectors = bdrv_nb_sectors(bs); >> + dbms->sectors_per_chunk = CHUNK_SIZE * 8 * >> + bdrv_dirty_bitmap_granularity(dbms->bs, dbms->bitmap) >> + >> BDRV_SECTOR_BITS; >> + >> + QSIMPLEQ_INSERT_TAIL(&dirty_bitmap_mig_state.dbms_list, >> + dbms, entry); >> + } >> + } >> +} >> + > > OK, but see the note below for dirty_bitmap_mig_init. actually it is not 'init' but 'reinit' - called on every migration start.. Hmm. dbms_list should be cleared here before fill it again. > >> +/* Called with no lock taken. */ >> +static void bulk_phase_send_chunk(QEMUFile *f, >> DirtyBitmapMigBitmapState *dbms) >> +{ >> + uint32_t nr_sectors = MIN(dbms->total_sectors - dbms->cur_sector, >> + dbms->sectors_per_chunk); > > What about cases where nr_sectors will put us past the end of the > bitmap? The bitmap serialization implementation might need a touchup > with this in mind. I don't understand.. nr_sectors <= dbms->total_sectors - dbms->cur_sector and it can't put us past the end... > >> + >> + send_bitmap(f, dbms, dbms->cur_sector, nr_sectors); >> + >> + dbms->cur_sector += nr_sectors; >> + if (dbms->cur_sector >= dbms->total_sectors) { >> + dbms->bulk_completed = true; >> + } >> +} >> + >> +/* Called with no lock taken. */ >> +static void bulk_phase(QEMUFile *f, bool limit) >> +{ >> + DirtyBitmapMigBitmapState *dbms; >> + >> + QSIMPLEQ_FOREACH(dbms, &dirty_bitmap_mig_state.dbms_list, entry) { >> + while (!dbms->bulk_completed) { >> + bulk_phase_send_chunk(f, dbms); >> + if (limit && qemu_file_rate_limit(f)) { >> + return; >> + } >> + } >> + } >> + >> + dirty_bitmap_mig_state.bulk_completed = true; >> +} > > OK. > >> + >> +static void blk_mig_reset_dirty_cursor(void) >> +{ >> + DirtyBitmapMigBitmapState *dbms; >> + >> + QSIMPLEQ_FOREACH(dbms, &dirty_bitmap_mig_state.dbms_list, entry) { >> + dbms->cur_dirty = 0; >> + } >> +} >> + > > OK. > >> +/* Called with iothread lock taken. */ >> +static void dirty_phase_send_chunk(QEMUFile *f, >> DirtyBitmapMigBitmapState *dbms) >> +{ >> + uint32_t nr_sectors; >> + >> + while (dbms->cur_dirty < dbms->total_sectors && >> + !hbitmap_get(dbms->dirty_bitmap, dbms->cur_dirty)) { >> + dbms->cur_dirty += dbms->sectors_per_chunk; >> + } > > OK, so we fast forward the dirty cursor while the meta-bitmap is > empty. Is it not worth using the HBitmapIterator here? You can reset > them everywhere you reset the dirty cursor, and then just fast-seek to > the first dirty sector. Yes, I've thought about it, just used simpler way (copied from migration/block.c) for an early version of the patch set. I will do it. > >> + >> + if (dbms->cur_dirty >= dbms->total_sectors) { >> + return; >> + } >> + >> + nr_sectors = MIN(dbms->total_sectors - dbms->cur_dirty, >> + dbms->sectors_per_chunk); > > What happens if nr_sectors goes past the end? > >> + send_bitmap(f, dbms, dbms->cur_dirty, nr_sectors); >> + hbitmap_reset(dbms->dirty_bitmap, dbms->cur_dirty, >> dbms->sectors_per_chunk); >> + dbms->cur_dirty += nr_sectors; >> +} >> + >> +/* Called with iothread lock taken. >> + * >> + * return value: >> + * 0: too much data for max_downtime >> + * 1: few enough data for max_downtime >> +*/ > > dirty_phase below doesn't have a return value. rudimentary comment.. thanks. > >> +static void dirty_phase(QEMUFile *f, bool limit) >> +{ >> + DirtyBitmapMigBitmapState *dbms; >> + >> + QSIMPLEQ_FOREACH(dbms, &dirty_bitmap_mig_state.dbms_list, entry) { >> + while (dbms->cur_dirty < dbms->total_sectors) { >> + dirty_phase_send_chunk(f, dbms); >> + if (limit && qemu_file_rate_limit(f)) { >> + return; >> + } >> + } >> + } >> +} >> + > > OK. > >> + >> +/* Called with iothread lock taken. */ >> +static void dirty_bitmap_mig_cleanup(void) >> +{ >> + DirtyBitmapMigBitmapState *dbms; >> + >> + unset_dirty_tracking(); >> + >> + while ((dbms = >> QSIMPLEQ_FIRST(&dirty_bitmap_mig_state.dbms_list)) != NULL) { >> + QSIMPLEQ_REMOVE_HEAD(&dirty_bitmap_mig_state.dbms_list, entry); >> + g_free(dbms); >> + } >> +} >> + > > OK. > >> +static void dirty_bitmap_migration_cancel(void *opaque) >> +{ >> + dirty_bitmap_mig_cleanup(); >> +} >> + > > OK. > >> +static int dirty_bitmap_save_iterate(QEMUFile *f, void *opaque) >> +{ >> + DPRINTF("Enter save live iterate\n"); >> + >> + blk_mig_reset_dirty_cursor(); > > I suppose this is because it's easier to check if we are finished by > starting from sector 0 every time. > > A harder, but faster method may be: Use HBitmapIterators, but don't > reset them every iteration: just iterate until the end, and check that > the bitmap is empty. If the meta bitmap is empty, the dirty phase is > complete. If the meta bitmap is NOT empty, reset the HBI and continue > allowing iterations over the dirty phase. Ok, will do. > >> + >> + if (dirty_bitmap_mig_state.bulk_completed) { >> + qemu_mutex_lock_iothread(); >> + dirty_phase(f, true); >> + qemu_mutex_unlock_iothread(); >> + } else { >> + bulk_phase(f, true); >> + } >> + >> + qemu_put_byte(f, DIRTY_BITMAP_MIG_FLAG_EOS); >> + >> + return dirty_bitmap_mig_state.bulk_completed; >> +} >> + >> +/* Called with iothread lock taken. */ >> + >> +static int dirty_bitmap_save_complete(QEMUFile *f, void *opaque) >> +{ >> + DirtyBitmapMigBitmapState *dbms; >> + DPRINTF("Enter save live complete\n"); >> + >> + if (!dirty_bitmap_mig_state.bulk_completed) { >> + bulk_phase(f, false); >> + } > > [Not expertly familiar with savevm:] Under what conditions can this > happen? This can happen. save_complete will happen when savevm decide that pending data size to send is small enough. It was the case for my bugfix for migration/block.c about pending. To prevent save_complete when bulk phase isn't completed, save_pending returns (in my bugfix for migration/block.c) big value. Here I decided to make more honest save_pending, so I need to complete (if it doesn't) bulk phase in save_complete. > >> + >> + blk_mig_reset_dirty_cursor(); >> + dirty_phase(f, false); >> + >> + QSIMPLEQ_FOREACH(dbms, &dirty_bitmap_mig_state.dbms_list, entry) { >> + uint8_t flags = DIRTY_BITMAP_MIG_FLAG_DEVICE_NAME | >> + DIRTY_BITMAP_MIG_FLAG_BITMAP_NAME | >> + DIRTY_BITMAP_MIG_FLAG_ENABLED; >> + >> + qemu_put_byte(f, flags); >> + qemu_put_name(f, bdrv_get_device_name(dbms->bs)); >> + qemu_put_name(f, bdrv_dirty_bitmap_name(dbms->bitmap)); >> + qemu_put_byte(f, bdrv_dirty_bitmap_enabled(dbms->bitmap)); >> + } >> + >> + qemu_put_byte(f, DIRTY_BITMAP_MIG_FLAG_EOS); >> + >> + DPRINTF("Dirty bitmaps migration completed\n"); >> + >> + dirty_bitmap_mig_cleanup(); >> + return 0; >> +} >> + > > I suppose we don't need a flag that distinctly SAYS this is the end > section, since we can tell by omission of > DIRTY_BITMAP_MIG_FLAG_NORMAL_CHUNK or ZERO_CHUNK. Hmm. I think it simplifies the logic (to use EOS after each section). And the same approach is in migration/block.c.. It's a question about which format is better: "Each section for dirty_bitmap_load ends with EOS" or "Each section for dirty_bitmap_load ends with EOS except the last one. The last one may be recognized by absent NORMAL_CHUNK and ZERO_CHUNK" > >> +static uint64_t dirty_bitmap_save_pending(QEMUFile *f, void *opaque, >> + uint64_t max_size) >> +{ >> + DirtyBitmapMigBitmapState *dbms; >> + uint64_t pending = 0; >> + >> + qemu_mutex_lock_iothread(); >> + >> + QSIMPLEQ_FOREACH(dbms, &dirty_bitmap_mig_state.dbms_list, entry) { >> + uint64_t sectors = hbitmap_count(dbms->dirty_bitmap); >> + if (!dbms->bulk_completed) { >> + sectors += dbms->total_sectors - dbms->cur_sector; >> + } >> + pending += bdrv_dirty_bitmap_data_size(dbms->bitmap, sectors); >> + } >> + >> + qemu_mutex_unlock_iothread(); >> + >> + DPRINTF("Enter save live pending %" PRIu64 ", max: %" PRIu64 "\n", >> + pending, max_size); >> + return pending; >> +} >> + > > OK. > >> +static int dirty_bitmap_load(QEMUFile *f, void *opaque, int version_id) >> +{ >> + int flags; >> + >> + static char device_name[256], bitmap_name[256]; >> + static BlockDriverState *bs; >> + static BdrvDirtyBitmap *bitmap; >> + >> + uint8_t *buf; >> + uint64_t first_sector; >> + uint32_t nr_sectors; >> + int ret; >> + >> + DPRINTF("load start\n"); >> + >> + do { >> + flags = qemu_get_byte(f); >> + DPRINTF("flags: %x\n", flags); >> + >> + if (flags & DIRTY_BITMAP_MIG_FLAG_DEVICE_NAME) { >> + qemu_get_name(f, device_name); >> + bs = bdrv_find(device_name); > > Similar to the above confusion, you may want bdrv_lookup_bs or > similar, since we're going to be looking for BDS nodes instead of > "devices." In this case, should it be changed in migration/block.c too? > >> + if (!bs) { >> + fprintf(stderr, "Error: unknown block device '%s'\n", >> + device_name); >> + return -EINVAL; >> + } >> + } >> + >> + if (flags & DIRTY_BITMAP_MIG_FLAG_BITMAP_NAME) { >> + if (!bs) { >> + fprintf(stderr, "Error: block device name is not >> set\n"); >> + return -EINVAL; >> + } >> + >> + qemu_get_name(f, bitmap_name); >> + bitmap = bdrv_find_dirty_bitmap(bs, bitmap_name); >> + if (flags & DIRTY_BITMAP_MIG_FLAG_GRANULARITY) { >> + /* First chunk from this bitmap */ >> + uint64_t granularity = qemu_get_be64(f); >> + if (!bitmap) { >> + Error *local_err = NULL; >> + bitmap = bdrv_create_dirty_bitmap(bs, granularity, >> + bitmap_name, >> + &local_err); >> + if (!bitmap) { >> + error_report("%s", >> error_get_pretty(local_err)); >> + error_free(local_err); >> + return -EINVAL; >> + } >> + } else { >> + uint64_t dest_granularity = >> + bdrv_dirty_bitmap_granularity(bs, bitmap); >> + if (dest_granularity != granularity) { >> + fprintf(stderr, >> + "Error: " >> + "Migrated bitmap granularity (%" >> PRIu64 ") " >> + "is not match with destination >> bitmap '%s' " >> + "granularity (%" PRIu64 ")\n", >> + granularity, >> + bitmap_name, >> + dest_granularity); >> + return -EINVAL; >> + } >> + } >> + bdrv_disable_dirty_bitmap(bitmap); >> + } >> + if (!bitmap) { >> + fprintf(stderr, "Error: unknown dirty bitmap " >> + "'%s' for block device '%s'\n", >> + bitmap_name, device_name); >> + return -EINVAL; >> + } >> + } >> + >> + if (flags & DIRTY_BITMAP_MIG_FLAG_ENABLED) { >> + bool enabled; >> + if (!bitmap) { >> + fprintf(stderr, "Error: dirty bitmap name is not >> set\n"); >> + return -EINVAL; >> + } >> + bdrv_dirty_bitmap_deserialize_finish(bitmap); >> + /* complete migration */ >> + enabled = qemu_get_byte(f); >> + if (enabled) { >> + bdrv_enable_dirty_bitmap(bitmap); >> + } >> + } > > Oh, so you use the ENABLED flag to show that migration is over. Yes, it was bad idea.. > If we are going to commit to a stream format for bitmaps, though, > maybe it's best to actually create a "COMPLETION BLOCK" flag and then > split this function into two pieces: > > (1) The part that receives regular / zero blocks, and > (2) The part that receives completion data. > > That way, if we change the properties that bitmaps have down the line, > we aren't reliant on literally the "enabled" flag to decide what to do. > > Also, it might help make this fairly long function a little smaller > and more readable. Ok. > >> + >> + if (flags & (DIRTY_BITMAP_MIG_FLAG_NORMAL_CHUNK | >> + DIRTY_BITMAP_MIG_FLAG_ZERO_CHUNK)) { >> + if (!bs) { >> + fprintf(stderr, "Error: block device name is not >> set\n"); >> + return -EINVAL; >> + } >> + if (!bitmap) { >> + fprintf(stderr, "Error: dirty bitmap name is not >> set\n"); >> + return -EINVAL; >> + } >> + >> + first_sector = qemu_get_be64(f); >> + nr_sectors = qemu_get_be32(f); >> + DPRINTF("chunk: %lu %u\n", first_sector, nr_sectors); >> + >> + >> + if (flags & DIRTY_BITMAP_MIG_FLAG_ZERO_CHUNK) { >> + bdrv_dirty_bitmap_deserialize_part0(bitmap, >> first_sector, >> + nr_sectors); >> + } else { >> + uint64_t buf_size = qemu_get_be64(f); >> + uint64_t needed_size = >> + bdrv_dirty_bitmap_data_size(bitmap, nr_sectors); >> + >> + if (needed_size > buf_size) { >> + fprintf(stderr, >> + "Error: Migrated bitmap granularity is >> not " >> + "match with destination bitmap >> granularity\n"); >> + return -EINVAL; >> + } >> + > > "Migrated bitmap granularity doesn't match the destination bitmap > granularity" perhaps. > >> + buf = g_malloc(buf_size); >> + qemu_get_buffer(f, buf, buf_size); >> + bdrv_dirty_bitmap_deserialize_part(bitmap, buf, >> + first_sector, >> + nr_sectors); >> + g_free(buf); >> + } >> + } >> + >> + ret = qemu_file_get_error(f); >> + if (ret != 0) { >> + return ret; >> + } >> + } while (!(flags & DIRTY_BITMAP_MIG_FLAG_EOS)); >> + >> + DPRINTF("load finish\n"); >> + return 0; >> +} >> + >> +static void dirty_bitmap_set_params(const MigrationParams *params, >> void *opaque) >> +{ >> + dirty_bitmap_mig_state.migration_enable = params->dirty; >> +} >> + > > OK; though I am not immediately aware of what changes need to happen > to accommodate Eric's suggestions. This function will be dropped in v3. > >> +static bool dirty_bitmap_is_active(void *opaque) >> +{ >> + return dirty_bitmap_mig_state.migration_enable == 1; >> +} >> + > > OK. > >> +static int dirty_bitmap_save_setup(QEMUFile *f, void *opaque) >> +{ >> + init_dirty_bitmap_migration(f); >> + >> + qemu_mutex_lock_iothread(); >> + /* start track dirtyness of dirty bitmaps */ >> + set_dirty_tracking(); >> + qemu_mutex_unlock_iothread(); >> + >> + blk_mig_reset_dirty_cursor(); >> + qemu_put_byte(f, DIRTY_BITMAP_MIG_FLAG_EOS); >> + >> + return 0; >> +} >> + > > OK; see dirty_bitmap_mig_init below, though. > >> +static SaveVMHandlers savevm_block_handlers = { >> + .set_params = dirty_bitmap_set_params, >> + .save_live_setup = dirty_bitmap_save_setup, >> + .save_live_iterate = dirty_bitmap_save_iterate, >> + .save_live_complete = dirty_bitmap_save_complete, >> + .save_live_pending = dirty_bitmap_save_pending, >> + .load_state = dirty_bitmap_load, >> + .cancel = dirty_bitmap_migration_cancel, >> + .is_active = dirty_bitmap_is_active, >> +}; >> + >> +void dirty_bitmap_mig_init(void) >> +{ >> + QSIMPLEQ_INIT(&dirty_bitmap_mig_state.dbms_list); > > Maybe I haven't looked thoroughly enough yet, but it's weird that part > of the dirty_bitmap_mig_state is initialized here, and the rest of it > in init_dirty_bitmap_migration. I'd prefer to keep it all together, if > possible. dirty_bitmap_mig_init is called one time when qemu starts. QSIMPLEQ_INIT should be called once. dirty_bitmap_save_setup is called on every migration start, it's like 'reinitialize'. > >> + >> + register_savevm_live(NULL, "dirty-bitmap", 0, 1, >> &savevm_block_handlers, >> + &dirty_bitmap_mig_state); >> +} > > OK. > >> diff --git a/vl.c b/vl.c >> index a824a7d..dee7220 100644 >> --- a/vl.c >> +++ b/vl.c >> @@ -4184,6 +4184,7 @@ int main(int argc, char **argv, char **envp) >> >> blk_mig_init(); >> ram_mig_init(); >> + dirty_bitmap_mig_init(); >> >> /* If the currently selected machine wishes to override the >> units-per-bus >> * property of its default HBA interface type, do so now. */ >> > > Hm, since dirty bitmaps are a sub-component of the block layer, would > it not make sense to put this hook under blk_mig_init, perhaps? IMHO the reason to put it here is to keep all register_savevm_live-entities in one place. > > > Overall this looks very clean compared to the intermingled format in > V1, and the code is organized pretty well. Just a few minor comments, > and I'd like to get the opinion of the migration maintainers, but I am > happy. Sorry it took me so long to review, please feel free to let me > know if you disagree with any of my opinions :) > > Thank you, > --John Thank you for reviewing my series) -- Best regards, Vladimir