From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:60144) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1YNQFl-0001YM-0s for qemu-devel@nongnu.org; Mon, 16 Feb 2015 13:18:45 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1YNQFg-0001Qq-Cy for qemu-devel@nongnu.org; Mon, 16 Feb 2015 13:18:40 -0500 Received: from mx1.redhat.com ([209.132.183.28]:43568) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1YNQFf-0001Qa-Vs for qemu-devel@nongnu.org; Mon, 16 Feb 2015 13:18:36 -0500 Message-ID: <54E23477.9040801@redhat.com> Date: Mon, 16 Feb 2015 13:18:31 -0500 From: John Snow MIME-Version: 1.0 References: <1422356197-5285-1-git-send-email-vsementsov@parallels.com> <1422356197-5285-9-git-send-email-vsementsov@parallels.com> <54DA793C.9020707@redhat.com> <54DDB398.3010907@parallels.com> <54DE5D0F.5080304@redhat.com> <54E1DD49.3050102@parallels.com> In-Reply-To: <54E1DD49.3050102@parallels.com> Content-Type: text/plain; charset=windows-1252; format=flowed Content-Transfer-Encoding: quoted-printable Subject: Re: [Qemu-devel] [PATCH RFC v2 8/8] migration: add migration/dirty-bitmap.c List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Vladimir Sementsov-Ogievskiy , qemu-devel@nongnu.org Cc: kwolf@redhat.com, Peter Maydell , "Juan quin >> Juan Jose Quintela Carreira" , "Dr. David Alan Gilbert" , stefanha@redhat.com, den@openvz.org, amit Shah , pbonzini@redhat.com On 02/16/2015 07:06 AM, Vladimir Sementsov-Ogievskiy wrote: > On 13.02.2015 23:22, John Snow wrote: >> >> >> On 02/13/2015 03:19 AM, Vladimir Sementsov-Ogievskiy wrote: >>> On 11.02.2015 00:33, John Snow wrote: >>>> Peter Maydell: What's the right way to license a file as copied from= a >>>> previous version? See below, please; >>>> >>>> Max, Markus: ctrl+f "bdrv_get_device_name" and let me know what you >>>> think, if you would. >>>> >>>> Juan, Amit, David: Copying migration maintainers. >>>> >>>> On 01/27/2015 05:56 AM, Vladimir Sementsov-Ogievskiy wrote: >>>>> Live migration of dirty bitmaps. Only named dirty bitmaps are >>>>> migrated. >>>>> If destination qemu is already containing a dirty bitmap with the s= ame >>>>> name as a migrated bitmap, then their granularities should be the >>>>> same, >>>>> otherwise the error will be generated. If destination qemu doesn't >>>>> contain such bitmap it will be created. >>>>> >>>>> format: >>>>> >>>>> 1 byte: flags >>>>> >>>>> [ 1 byte: node name size ] \ flags & DEVICE_NAME >>>>> [ n bytes: node name ] / >>>>> >>>>> [ 1 byte: bitmap name size ] \ >>>>> [ n bytes: bitmap name ] | flags & BITMAP_NAME >>>>> [ [ be64: granularity ] ] flags & GRANULARITY >>>>> >>>>> [ 1 byte: bitmap enabled bit ] flags & ENABLED >>>>> >>>>> [ be64: start sector ] \ flags & (NORMAL_CHUNK | ZERO_CHUNK) >>>>> [ be32: number of sectors ] / >>>>> >>>>> [ be64: buffer size ] \ flags & NORMAL_CHUNK >>>>> [ n bytes: buffer ] / >>>>> >>>>> The last chunk should contain flags & EOS. The chunk may skip devic= e >>>>> and/or bitmap names, assuming them to be the same with the previous >>>>> chunk. GRANULARITY is sent with the first chunk for the bitmap. >>>>> ENABLED >>>>> bit is sent in the end of "complete" stage of migration. So when >>>>> destination gets ENABLED flag it should deserialize_finish the bitm= ap >>>>> and set its enabled bit to corresponding value. >>>>> >>>>> Signed-off-by: Vladimir Sementsov-Ogievskiy >>>>> --- >>>>> include/migration/block.h | 1 + >>>>> migration/Makefile.objs | 2 +- >>>>> migration/dirty-bitmap.c | 606 >>>>> ++++++++++++++++++++++++++++++++++++++++++++++ >>>>> vl.c | 1 + >>>>> 4 files changed, 609 insertions(+), 1 deletion(-) >>>>> create mode 100644 migration/dirty-bitmap.c >>>>> >>>>> diff --git a/include/migration/block.h b/include/migration/block.h >>>>> index ffa8ac0..566bb9f 100644 >>>>> --- a/include/migration/block.h >>>>> +++ b/include/migration/block.h >>>>> @@ -14,6 +14,7 @@ >>>>> #ifndef BLOCK_MIGRATION_H >>>>> #define BLOCK_MIGRATION_H >>>>> >>>>> +void dirty_bitmap_mig_init(void); >>>>> void blk_mig_init(void); >>>>> int blk_mig_active(void); >>>>> uint64_t blk_mig_bytes_transferred(void); >>>> >>>> OK. >>>> >>>>> diff --git a/migration/Makefile.objs b/migration/Makefile.objs >>>>> index d929e96..9adfda9 100644 >>>>> --- a/migration/Makefile.objs >>>>> +++ b/migration/Makefile.objs >>>>> @@ -6,5 +6,5 @@ common-obj-y +=3D xbzrle.o >>>>> common-obj-$(CONFIG_RDMA) +=3D rdma.o >>>>> common-obj-$(CONFIG_POSIX) +=3D exec.o unix.o fd.o >>>>> >>>>> -common-obj-y +=3D block.o >>>>> +common-obj-y +=3D block.o dirty-bitmap.o >>>>> >>>> >>>> OK. >>>> >>>>> diff --git a/migration/dirty-bitmap.c b/migration/dirty-bitmap.c >>>>> new file mode 100644 >>>>> index 0000000..8621218 >>>>> --- /dev/null >>>>> +++ b/migration/dirty-bitmap.c >>>>> @@ -0,0 +1,606 @@ >>>>> +/* >>>>> + * QEMU dirty bitmap migration >>>>> + * >>>>> + * derived from migration/block.c >>>>> + * >>>>> + * Author: >>>>> + * Sementsov-Ogievskiy Vladimir >>>>> + * >>>>> + * original copyright message: >>>>> + * >>>>> =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D >>>>> + * Copyright IBM, Corp. 2009 >>>>> + * >>>>> + * Authors: >>>>> + * Liran Schour >>>>> + * >>>>> + * This work is licensed under the terms of the GNU GPL, version >>>>> 2. See >>>>> + * the COPYING file in the top-level directory. >>>>> + * >>>>> + * Contributions after 2012-01-13 are licensed under the terms of = the >>>>> + * GNU GPL, version 2 or (at your option) any later version. >>>>> + * >>>>> =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D >>>>> + */ >>>>> + >>>> >>>> Not super familiar with the right way to do licensing here; it's >>>> possible you may not need to copy the original here, but I'm not sur= e. >>>> You will want to make it clear what license applies to /your/ work, = I >>>> think. Maybe Peter Maydell can clue us in. >>>> >>>>> +#include "block/block.h" >>>>> +#include "qemu/main-loop.h" >>>>> +#include "qemu/error-report.h" >>>>> +#include "migration/block.h" >>>>> +#include "migration/migration.h" >>>>> +#include "qemu/hbitmap.h" >>>>> +#include >>>>> + >>>>> +#define CHUNK_SIZE (1 << 20) >>>>> + >>>>> +#define DIRTY_BITMAP_MIG_FLAG_EOS 0x01 >>>>> +#define DIRTY_BITMAP_MIG_FLAG_NORMAL_CHUNK 0x02 >>>>> +#define DIRTY_BITMAP_MIG_FLAG_ZERO_CHUNK 0x04 >>>>> +#define DIRTY_BITMAP_MIG_FLAG_BITMAP_NAME 0x08 >>>>> +#define DIRTY_BITMAP_MIG_FLAG_DEVICE_NAME 0x10 >>>>> +#define DIRTY_BITMAP_MIG_FLAG_GRANULARITY 0x20 >>>>> +#define DIRTY_BITMAP_MIG_FLAG_ENABLED 0x40 >>>>> +/* flags should be <=3D 0xff */ >>>>> + >>>> >>>> We should give ourselves a little breathing room with the flags, sin= ce >>>> we've only got room for one more. >>> Ok. Will one more byte be enough? >> >> I should hope so. If you do add a completion chunk and flag, that >> fills up the first byte completely, so having a second byte is a good >> idea. >> >> I might recommend reserving the last bit of the second byte to be a >> flag such as DIRTY_BITMAP_EXTRA_FLAGS that indicates the presence of >> additional byte(s) of flags, to be determined later, if we ever need >> them, but two bytes for now should be sufficient. > Ok. >> >>>> >>>>> +/* #define DEBUG_DIRTY_BITMAP_MIGRATION */ >>>>> + >>>>> +#ifdef DEBUG_DIRTY_BITMAP_MIGRATION >>>>> +#define DPRINTF(fmt, ...) \ >>>>> + do { printf("dirty_migration: " fmt, ## __VA_ARGS__); } while = (0) >>>>> +#else >>>>> +#define DPRINTF(fmt, ...) \ >>>>> + do { } while (0) >>>>> +#endif >>>>> + >>>>> +typedef struct DirtyBitmapMigBitmapState { >>>>> + /* Written during setup phase. */ >>>>> + BlockDriverState *bs; >>>>> + BdrvDirtyBitmap *bitmap; >>>>> + HBitmap *dirty_bitmap; >>>> >>>> For my own sanity, I'd really prefer "bitmap" and "meta_bitmap" here= ; >>>> "dirty_bitmap" is often used as a synonym (outside of this file) to >>>> refer to the BdrvDirtyBitmap in general, so it's usage here can be >>>> somewhat confusing. >>>> >>>> I'd appreciate "dirty_dirty_bitmap" as in your previous patch for >>>> consistency, or "meta_bitmap" as I recommend. >>>> >>> Ok >>>>> + int64_t total_sectors; >>>>> + uint64_t sectors_per_chunk; >>>>> + QSIMPLEQ_ENTRY(DirtyBitmapMigBitmapState) entry; >>>>> + >>>>> + /* For bulk phase. */ >>>>> + bool bulk_completed; >>>>> + int64_t cur_sector; >>>>> + bool granularity_sent; >>>>> + >>>>> + /* For dirty phase. */ >>>>> + int64_t cur_dirty; >>>>> +} DirtyBitmapMigBitmapState; >>>>> + >>>>> +typedef struct DirtyBitmapMigState { >>>>> + int migration_enable; >>>>> + QSIMPLEQ_HEAD(dbms_list, DirtyBitmapMigBitmapState) dbms_list; >>>>> + >>>>> + bool bulk_completed; >>>>> + >>>>> + /* for send_bitmap() */ >>>>> + BlockDriverState *prev_bs; >>>>> + BdrvDirtyBitmap *prev_bitmap; >>>>> +} DirtyBitmapMigState; >>>>> + >>>>> +static DirtyBitmapMigState dirty_bitmap_mig_state; >>>>> + >>>>> +/* read name from qemu file: >>>>> + * format: >>>>> + * 1 byte : len =3D name length (<256) >>>>> + * len bytes : name without last zero byte >>>>> + * >>>>> + * name should point to the buffer >=3D 256 bytes length >>>>> + */ >>>>> +static char *qemu_get_name(QEMUFile *f, char *name) >>>>> +{ >>>>> + int len =3D qemu_get_byte(f); >>>>> + qemu_get_buffer(f, (uint8_t *)name, len); >>>>> + name[len] =3D '\0'; >>>>> + >>>>> + DPRINTF("get name: %d %s\n", len, name); >>>>> + >>>>> + return name; >>>>> +} >>>>> + >>>> >>>> OK. Maybe these could be "qemu_put_string" or "qemu_get_string" and >>>> added to qemu-file.c so others can use them. >>> If no objections for sharing this format, I'll do it. >>>> >>>>> +/* write name to qemu file: >>>>> + * format: >>>>> + * same as for qemu_get_name >>>>> + * >>>>> + * maximum name length is 255 >>>>> + */ >>>>> +static void qemu_put_name(QEMUFile *f, const char *name) >>>>> +{ >>>>> + int len =3D strlen(name); >>>>> + >>>>> + DPRINTF("put name: %d %s\n", len, name); >>>>> + >>>>> + assert(len < 256); >>>>> + qemu_put_byte(f, len); >>>>> + qemu_put_buffer(f, (const uint8_t *)name, len); >>>>> +} >>>>> + >>>> >>>> OK. >>>> >>>>> +static void send_bitmap(QEMUFile *f, DirtyBitmapMigBitmapState *db= ms, >>>>> + uint64_t start_sector, uint32_t nr_sectors= ) >>>>> +{ >>>>> + BlockDriverState *bs =3D dbms->bs; >>>>> + BdrvDirtyBitmap *bitmap =3D dbms->bitmap; >>>>> + uint8_t flags =3D 0; >>>>> + /* align for buffer_is_zero() */ >>>>> + uint64_t align =3D 4 * sizeof(long); >>>>> + uint64_t buf_size =3D >>>>> + (bdrv_dirty_bitmap_data_size(bitmap, nr_sectors) + align - >>>>> 1) & >>>>> + ~(align - 1); >>>>> + uint8_t *buf =3D g_malloc0(buf_size); >>>>> + >>>>> + bdrv_dirty_bitmap_serialize_part(bitmap, buf, start_sector, >>>>> nr_sectors); >>>>> + >>>>> + if (buffer_is_zero(buf, buf_size)) { >>>>> + g_free(buf); >>>>> + buf =3D NULL; >>>>> + flags |=3D DIRTY_BITMAP_MIG_FLAG_ZERO_CHUNK; >>>>> + } else { >>>>> + flags |=3D DIRTY_BITMAP_MIG_FLAG_NORMAL_CHUNK; >>>>> + } >>>>> + >>>>> + if (bs !=3D dirty_bitmap_mig_state.prev_bs) { >>>>> + dirty_bitmap_mig_state.prev_bs =3D bs; >>>>> + flags |=3D DIRTY_BITMAP_MIG_FLAG_DEVICE_NAME; >>>>> + } >>>>> + >>>>> + if (bitmap !=3D dirty_bitmap_mig_state.prev_bitmap) { >>>>> + dirty_bitmap_mig_state.prev_bitmap =3D bitmap; >>>>> + flags |=3D DIRTY_BITMAP_MIG_FLAG_BITMAP_NAME; >>>>> + } >>>> >>>> OK, so we use the current bs/bitmap under consideration to detect if >>>> we have switched context, and put the names in the stream when it >>>> happens. OK. >>>> >>>>> + >>>>> + if (dbms->granularity_sent =3D=3D 0) { >>>>> + dbms->granularity_sent =3D 1; >>>>> + flags |=3D DIRTY_BITMAP_MIG_FLAG_GRANULARITY; >>>>> + } >>>>> + >>>>> + DPRINTF("Enter send_bitmap" >>>>> + "\n flags: %x" >>>>> + "\n start_sector: %" PRIu64 >>>>> + "\n nr_sectors: %" PRIu32 >>>>> + "\n data_size: %" PRIu64 "\n", >>>>> + flags, start_sector, nr_sectors, buf_size); >>>>> + >>>>> + qemu_put_byte(f, flags); >>>>> + >>>>> + if (flags & DIRTY_BITMAP_MIG_FLAG_DEVICE_NAME) { >>>>> + qemu_put_name(f, bdrv_get_device_name(bs)); >>>>> + } >>>> >>>> I am still not fully clear on this myself, but I think we are phasin= g >>>> out bdrv_get_device_name. In the context of bitmaps, we are mostly >>>> likely using them one-per-tree, but they /can/ be attached >>>> one-per-node, so we shouldn't be trying to get the device name here, >>>> but rather, the node-name. >>>> >>>> I /think/ we may want to be using bdrv_get_node_name, but I think it >>>> is not currently true that all nodes WILL be named ... I am CC'ing >>>> Markus Armbruster and Max Reitz for (A) A refresher course and (B) >>>> Opinions on what function call might make sense here, given that we >>>> wish to migrate bitmaps attached to arbitrary nodes. >>> Hmm.. I'm not familiar with hierarchy of nodes and devices. As I >>> understand, both command_line- and qmp- created drives are created >>> through blockdev_init, which creates both blk(device) and bs(node) >>> through blk_new_with_bs.. Am I right? Also, bdrv_get_device_name is u= sed >>> in migration/block.c. >> >> Now that I'm more awake, here's a better rundown of what's going on: >> >> It's something that is a little bit in flux right now, unfortunately. >> We're trying to transition to a format where we have arbitrarily >> complex Block trees, where the root of the tree is always a >> BlockBackend (See the big series by Max Reitz) and the configuration >> of the tree may become arbitrarily complex. >> >> Simple trees may consist of just one BlockBackend and one >> BlockDriverState node, where I think we can refer to this BDS as the >> "root node," not to be confused with the BlockBackend "root." The >> BlockBackend is a relatively new invention, so it isn't actually used >> consistently everywhere yet. >> >> In the future, we may have commands that make distinctions based on if >> you want to work on the BlockBackend, the root node under the >> blockbackend associated with a BDS, only the explicit node/BDS you >> identify, or some combination of the above semantics. >> >> As of right now, bitmaps can be *attached* to any arbitrary node, >> though they are currently only *useful* when attached to the first >> child of the BlockBackend, the root node. It's only useful currently >> in cases where it is attached to the root because I've only proposed >> patches for adding bitmap support to produce incrementals for Drive >> Backup, which operates only on drives/devices (the root node of a tree= .) >> >> However, in the context of migrating, it could be that we want to >> migrate any bitmaps attached to /any/ nodes, so we should be careful >> about what names we are pulling - we don't necessarily want the name >> of the root node or BlockBackend, we may want the BDS and accompanying >> name of strictly the node the bitmap is attached to. >> >> I know other areas of the code don't provide a good example for this >> distinction, yet, but the block layer people are actively working on >> fixing that. (See also the back-and-forth reviews for what to name my >> QMP parameters in the incremental backup patches for some overview of >> this semantic transition.) >> >> That said, We should think carefully about *which* name we want to put >> in the stream and what implications it has for migration. >> >> >> (1) bdrv_get_node_name and bdrv_find_node >> >> This would migrate bitmaps as attached to their specific BDS. This >> would mean that the node layout on the destination is either >> identical, or similar enough such that no named bitmaps are attached >> to a node not present on the destination. >> >> This gives us precision: bitmaps may be attached lower in the tree and >> can provide more fine-grained detail for which layers have been >> changed or modified during runtime. >> >> This also gives us fragility: In cases where we transfer, say, a >> complex tree of nodes and collapse it to a single destination drive, >> we'd be unable to migrate bitmaps not attached to the root along with >> it, because they'd have nowhere meaningful to attach. >> >> It is perhaps somewhat unneccessary at this exact moment in time, as >> well, because bitmaps are currently only useful on root nodes. >> >> (2) bdrv_get_device_name and bdrv_lookup_bs(device_name, NULL, errp) >> >> This would migrate any bitmaps in a tree and attach them to the entire >> drive on the destination. >> >> This is simpler: You just need to make sure that the root nodes have >> the same names, which is a lot easier to manage. >> >> This matches how drive migration currently appears to work: The entire >> tree appears to be generally squashed into a single node and >> transferred cluster-by-cluster, without general consideration as to >> the layout of the local block tree. As we both know by now, none of >> the metadata is transferred, just the data. >> >> It prevents migration of just bitmaps where you WANT the extra >> complexity: If a bitmap is attached lower in the tree, re-affixing it >> to the root of a destination tree might invalidate the semantics of >> what that bitmap was meant to track, and it may become useless. >> >> >> So in summary: >> using device names is probably fine for now, as it matches the current >> use case of bitmaps as well as drive migration; but using node names >> may give us more power and precision later. >> >> I talked to Max about it, and he is leaning towards using device names >> for now and switching to node names if we decide we want that power. >> >> (...I wonder if we could use a flag, for now, that says we're >> including DEVICE names. Later, we could add a flag that says we're >> using NODE names and add an option to toggle as the usage case sees fi= t.) >> >> >> Are you confused yet? :D > O, thanks for the explanation). Are we really need this flag? As Markus > wrote, nodes and devices are sharing namespaces.. We can use > bdrv_lookup_bs(name, name, errp).. what 'name' are you using here, though? It looked to me like in your=20 backup routine we got a list of BDS entries and get the name *from* the=20 BDS, so we still have to think about how we want to /get/ the name. > > Also, we can, for example, send bitmaps as follows: > > if node has name - send bitmap with this name > if node is root, but hasn't name - send it with blk name > otherwise - don't send the bitmap The node a bitmap is attached to should always have a name -- it would=20 not be possible via the existing interface to attach it to a node=20 without a name. I *think* the root node should always have a name, but I am actually=20 less sure of that. >> >> >>>> >>>>> + >>>>> + if (flags & DIRTY_BITMAP_MIG_FLAG_BITMAP_NAME) { >>>>> + qemu_put_name(f, bdrv_dirty_bitmap_name(bitmap)); >>>>> + >>>>> + if (flags & DIRTY_BITMAP_MIG_FLAG_GRANULARITY) { >>>>> + qemu_put_be64(f, bdrv_dirty_bitmap_granularity(bs, >>>>> bitmap)); >>>>> + } >>>>> + } else { >>>>> + assert(!(flags & DIRTY_BITMAP_MIG_FLAG_GRANULARITY)); >>>>> + } >>>>> + >>>> >>>> I thought we were only migrating bitmaps with names? >>>> I suppose the conditional can't hurt, but I am not clear on when we >>>> won't have a bitmap name here. >>> You are right, 'else' case is not possible.. Hmm. I've added it to be >>> sure that format is not corrupted, when I decided to put granularity >>> only with name. Wi won't have a bitmap name only when we send the sam= e >>> bitmap as on the previous send_bitmap() call. May be it will be bette= r >>> to use two separate if's without else and assert. >> >> It's okay if it is just "paranoia," but I was just checking. It would >> make a decent assert(). >> >>>> >>>>> + qemu_put_be64(f, start_sector); >>>>> + qemu_put_be32(f, nr_sectors); >>>>> + >>>>> + /* if a block is zero we need to flush here since the network >>>>> + * bandwidth is now a lot higher than the storage device >>>>> bandwidth. >>>>> + * thus if we queue zero blocks we slow down the migration. >>>>> + * also, skip writing block when migrate only dirty bitmaps. *= / >>>>> + if (flags & DIRTY_BITMAP_MIG_FLAG_ZERO_CHUNK) { >>>>> + qemu_fflush(f); >>>>> + return; >>>>> + } >>>>> + >>>>> + qemu_put_be64(f, buf_size); >>>>> + qemu_put_buffer(f, buf, buf_size); >>>>> + g_free(buf); >>>>> +} >>>>> + >>>>> + >>>>> +/* Called with iothread lock taken. */ >>>>> + >>>>> +static void set_dirty_tracking(void) >>>>> +{ >>>>> + DirtyBitmapMigBitmapState *dbms; >>>>> + >>>>> + QSIMPLEQ_FOREACH(dbms, &dirty_bitmap_mig_state.dbms_list, >>>>> entry) { >>>>> + dbms->dirty_bitmap =3D >>>>> + bdrv_create_dirty_dirty_bitmap(dbms->bitmap, CHUNK_SIZ= E); >>>>> + } >>>>> +} >>>>> + >>>> >>>> OK: so we only have these dirty-dirty bitmaps when migration is >>>> starting, which makes sense. >>>> >>>>> +static void unset_dirty_tracking(void) >>>>> +{ >>>>> + DirtyBitmapMigBitmapState *dbms; >>>>> + >>>>> + QSIMPLEQ_FOREACH(dbms, &dirty_bitmap_mig_state.dbms_list, >>>>> entry) { >>>>> + bdrv_release_dirty_dirty_bitmap(dbms->bitmap); >>>>> + } >>>>> +} >>>>> + >>>> >>>> OK. >>>> >>>>> +static void init_dirty_bitmap_migration(QEMUFile *f) >>>>> +{ >>>>> + BlockDriverState *bs; >>>>> + BdrvDirtyBitmap *bitmap; >>>>> + DirtyBitmapMigBitmapState *dbms; >>>>> + >>>>> + dirty_bitmap_mig_state.bulk_completed =3D false; >>>>> + dirty_bitmap_mig_state.prev_bs =3D NULL; >>>>> + dirty_bitmap_mig_state.prev_bitmap =3D NULL; >>>>> + >>>>> + for (bs =3D bdrv_next(NULL); bs; bs =3D bdrv_next(bs)) { >>>>> + for (bitmap =3D bdrv_next_dirty_bitmap(bs, NULL); bitmap; >>>>> + bitmap =3D bdrv_next_dirty_bitmap(bs, bitmap)) { >>>>> + if (!bdrv_dirty_bitmap_name(bitmap)) { >>>>> + continue; >>>>> + } >>>>> + >>>>> + dbms =3D g_new0(DirtyBitmapMigBitmapState, 1); >>>>> + dbms->bs =3D bs; >>>>> + dbms->bitmap =3D bitmap; >>>>> + dbms->total_sectors =3D bdrv_nb_sectors(bs); >>>>> + dbms->sectors_per_chunk =3D CHUNK_SIZE * 8 * >>>>> + bdrv_dirty_bitmap_granularity(dbms->bs, dbms->bitm= ap) >>>>> + >> BDRV_SECTOR_BITS; >>>>> + >>>>> + QSIMPLEQ_INSERT_TAIL(&dirty_bitmap_mig_state.dbms_list, >>>>> + dbms, entry); >>>>> + } >>>>> + } >>>>> +} >>>>> + >>>> >>>> OK, but see the note below for dirty_bitmap_mig_init. >>> actually it is not 'init' but 'reinit' - called on every migration >>> start.. Hmm. dbms_list should be cleared here before fill it again. >>>> >>>>> +/* Called with no lock taken. */ >>>>> +static void bulk_phase_send_chunk(QEMUFile *f, >>>>> DirtyBitmapMigBitmapState *dbms) >>>>> +{ >>>>> + uint32_t nr_sectors =3D MIN(dbms->total_sectors - dbms->cur_se= ctor, >>>>> + dbms->sectors_per_chunk); >>>> >>>> What about cases where nr_sectors will put us past the end of the >>>> bitmap? The bitmap serialization implementation might need a touchup >>>> with this in mind. >>> I don't understand.. nr_sectors <=3D dbms->total_sectors - >>> dbms->cur_sector and it can't put us past the end... >> >> Oh, because you take the minimum, so we don't have to worry about >> sectors_per_chunk eclipsing what we have. >> >> Nevermind, I can't read... :( >> >>>> >>>>> + >>>>> + send_bitmap(f, dbms, dbms->cur_sector, nr_sectors); >>>>> + >>>>> + dbms->cur_sector +=3D nr_sectors; >>>>> + if (dbms->cur_sector >=3D dbms->total_sectors) { >>>>> + dbms->bulk_completed =3D true; >>>>> + } >>>>> +} >>>>> + >>>>> +/* Called with no lock taken. */ >>>>> +static void bulk_phase(QEMUFile *f, bool limit) >>>>> +{ >>>>> + DirtyBitmapMigBitmapState *dbms; >>>>> + >>>>> + QSIMPLEQ_FOREACH(dbms, &dirty_bitmap_mig_state.dbms_list, >>>>> entry) { >>>>> + while (!dbms->bulk_completed) { >>>>> + bulk_phase_send_chunk(f, dbms); >>>>> + if (limit && qemu_file_rate_limit(f)) { >>>>> + return; >>>>> + } >>>>> + } >>>>> + } >>>>> + >>>>> + dirty_bitmap_mig_state.bulk_completed =3D true; >>>>> +} >>>> >>>> OK. >>>> >>>>> + >>>>> +static void blk_mig_reset_dirty_cursor(void) >>>>> +{ >>>>> + DirtyBitmapMigBitmapState *dbms; >>>>> + >>>>> + QSIMPLEQ_FOREACH(dbms, &dirty_bitmap_mig_state.dbms_list, >>>>> entry) { >>>>> + dbms->cur_dirty =3D 0; >>>>> + } >>>>> +} >>>>> + >>>> >>>> OK. >>>> >>>>> +/* Called with iothread lock taken. */ >>>>> +static void dirty_phase_send_chunk(QEMUFile *f, >>>>> DirtyBitmapMigBitmapState *dbms) >>>>> +{ >>>>> + uint32_t nr_sectors; >>>>> + >>>>> + while (dbms->cur_dirty < dbms->total_sectors && >>>>> + !hbitmap_get(dbms->dirty_bitmap, dbms->cur_dirty)) { >>>>> + dbms->cur_dirty +=3D dbms->sectors_per_chunk; >>>>> + } >>>> >>>> OK, so we fast forward the dirty cursor while the meta-bitmap is >>>> empty. Is it not worth using the HBitmapIterator here? You can reset >>>> them everywhere you reset the dirty cursor, and then just fast-seek = to >>>> the first dirty sector. >>> Yes, I've thought about it, just used simpler way (copied from >>> migration/block.c) for an early version of the patch set. I will do i= t. >> >> Only if it doesn't make things more complicated to look at. >> >>>> >>>>> + >>>>> + if (dbms->cur_dirty >=3D dbms->total_sectors) { >>>>> + return; >>>>> + } >>>>> + >>>>> + nr_sectors =3D MIN(dbms->total_sectors - dbms->cur_dirty, >>>>> + dbms->sectors_per_chunk); >>>> >>>> What happens if nr_sectors goes past the end? >> >> Again, I misread. >> >>>> >>>>> + send_bitmap(f, dbms, dbms->cur_dirty, nr_sectors); >>>>> + hbitmap_reset(dbms->dirty_bitmap, dbms->cur_dirty, >>>>> dbms->sectors_per_chunk); >>>>> + dbms->cur_dirty +=3D nr_sectors; >>>>> +} >>>>> + >>>>> +/* Called with iothread lock taken. >>>>> + * >>>>> + * return value: >>>>> + * 0: too much data for max_downtime >>>>> + * 1: few enough data for max_downtime >>>>> +*/ >>>> >>>> dirty_phase below doesn't have a return value. >>> rudimentary comment.. thanks. >>>> >>>>> +static void dirty_phase(QEMUFile *f, bool limit) >>>>> +{ >>>>> + DirtyBitmapMigBitmapState *dbms; >>>>> + >>>>> + QSIMPLEQ_FOREACH(dbms, &dirty_bitmap_mig_state.dbms_list, >>>>> entry) { >>>>> + while (dbms->cur_dirty < dbms->total_sectors) { >>>>> + dirty_phase_send_chunk(f, dbms); >>>>> + if (limit && qemu_file_rate_limit(f)) { >>>>> + return; >>>>> + } >>>>> + } >>>>> + } >>>>> +} >>>>> + >>>> >>>> OK. >>>> >>>>> + >>>>> +/* Called with iothread lock taken. */ >>>>> +static void dirty_bitmap_mig_cleanup(void) >>>>> +{ >>>>> + DirtyBitmapMigBitmapState *dbms; >>>>> + >>>>> + unset_dirty_tracking(); >>>>> + >>>>> + while ((dbms =3D >>>>> QSIMPLEQ_FIRST(&dirty_bitmap_mig_state.dbms_list)) !=3D NULL) { >>>>> + QSIMPLEQ_REMOVE_HEAD(&dirty_bitmap_mig_state.dbms_list, entry); >>>>> + g_free(dbms); >>>>> + } >>>>> +} >>>>> + >>>> >>>> OK. >>>> >>>>> +static void dirty_bitmap_migration_cancel(void *opaque) >>>>> +{ >>>>> + dirty_bitmap_mig_cleanup(); >>>>> +} >>>>> + >>>> >>>> OK. >>>> >>>>> +static int dirty_bitmap_save_iterate(QEMUFile *f, void *opaque) >>>>> +{ >>>>> + DPRINTF("Enter save live iterate\n"); >>>>> + >>>>> + blk_mig_reset_dirty_cursor(); >>>> >>>> I suppose this is because it's easier to check if we are finished by >>>> starting from sector 0 every time. >>>> >>>> A harder, but faster method may be: Use HBitmapIterators, but don't >>>> reset them every iteration: just iterate until the end, and check th= at >>>> the bitmap is empty. If the meta bitmap is empty, the dirty phase is >>>> complete. If the meta bitmap is NOT empty, reset the HBI and continu= e >>>> allowing iterations over the dirty phase. >>> Ok, will do. >>>> >>>>> + >>>>> + if (dirty_bitmap_mig_state.bulk_completed) { >>>>> + qemu_mutex_lock_iothread(); >>>>> + dirty_phase(f, true); >>>>> + qemu_mutex_unlock_iothread(); >>>>> + } else { >>>>> + bulk_phase(f, true); >>>>> + } >>>>> + >>>>> + qemu_put_byte(f, DIRTY_BITMAP_MIG_FLAG_EOS); >>>>> + >>>>> + return dirty_bitmap_mig_state.bulk_completed; >>>>> +} >>>>> + >>>>> +/* Called with iothread lock taken. */ >>>>> + >>>>> +static int dirty_bitmap_save_complete(QEMUFile *f, void *opaque) >>>>> +{ >>>>> + DirtyBitmapMigBitmapState *dbms; >>>>> + DPRINTF("Enter save live complete\n"); >>>>> + >>>>> + if (!dirty_bitmap_mig_state.bulk_completed) { >>>>> + bulk_phase(f, false); >>>>> + } >>>> >>>> [Not expertly familiar with savevm:] Under what conditions can this >>>> happen? >>> This can happen. save_complete will happen when savevm decide that >>> pending data size to send is small enough. It was the case for my bug= fix >>> for migration/block.c about pending. To prevent save_complete when bu= lk >>> phase isn't completed, save_pending returns (in my bugfix for >>> migration/block.c) big value. Here I decided to make more honest >>> save_pending, so I need to complete (if it doesn't) bulk phase in >>> save_complete. >> >> OK, Gotcha. >> >>>> >>>>> + >>>>> + blk_mig_reset_dirty_cursor(); >>>>> + dirty_phase(f, false); >>>>> + >>>>> + QSIMPLEQ_FOREACH(dbms, &dirty_bitmap_mig_state.dbms_list, >>>>> entry) { >>>>> + uint8_t flags =3D DIRTY_BITMAP_MIG_FLAG_DEVICE_NAME | >>>>> + DIRTY_BITMAP_MIG_FLAG_BITMAP_NAME | >>>>> + DIRTY_BITMAP_MIG_FLAG_ENABLED; >>>>> + >>>>> + qemu_put_byte(f, flags); >>>>> + qemu_put_name(f, bdrv_get_device_name(dbms->bs)); >>>>> + qemu_put_name(f, bdrv_dirty_bitmap_name(dbms->bitmap)); >>>>> + qemu_put_byte(f, bdrv_dirty_bitmap_enabled(dbms->bitmap)); >>>>> + } >>>>> + >>>>> + qemu_put_byte(f, DIRTY_BITMAP_MIG_FLAG_EOS); >>>>> + >>>>> + DPRINTF("Dirty bitmaps migration completed\n"); >>>>> + >>>>> + dirty_bitmap_mig_cleanup(); >>>>> + return 0; >>>>> +} >>>>> + >>>> >>>> I suppose we don't need a flag that distinctly SAYS this is the end >>>> section, since we can tell by omission of >>>> DIRTY_BITMAP_MIG_FLAG_NORMAL_CHUNK or ZERO_CHUNK. >>> Hmm. I think it simplifies the logic (to use EOS after each section). >>> And the same approach is in migration/block.c.. It's a question about >>> which format is better: "Each section for dirty_bitmap_load ends wit= h >>> EOS" or "Each section for dirty_bitmap_load ends with EOS except the >>> last one. The last one may be recognized by absent NORMAL_CHUNK and >>> ZERO_CHUNK" >>>> >>>>> +static uint64_t dirty_bitmap_save_pending(QEMUFile *f, void *opaqu= e, >>>>> + uint64_t max_size) >>>>> +{ >>>>> + DirtyBitmapMigBitmapState *dbms; >>>>> + uint64_t pending =3D 0; >>>>> + >>>>> + qemu_mutex_lock_iothread(); >>>>> + >>>>> + QSIMPLEQ_FOREACH(dbms, &dirty_bitmap_mig_state.dbms_list, >>>>> entry) { >>>>> + uint64_t sectors =3D hbitmap_count(dbms->dirty_bitmap); >>>>> + if (!dbms->bulk_completed) { >>>>> + sectors +=3D dbms->total_sectors - dbms->cur_sector; >>>>> + } >>>>> + pending +=3D bdrv_dirty_bitmap_data_size(dbms->bitmap, >>>>> sectors); >>>>> + } >>>>> + >>>>> + qemu_mutex_unlock_iothread(); >>>>> + >>>>> + DPRINTF("Enter save live pending %" PRIu64 ", max: %" PRIu64 >>>>> "\n", >>>>> + pending, max_size); >>>>> + return pending; >>>>> +} >>>>> + >>>> >>>> OK. >>>> >>>>> +static int dirty_bitmap_load(QEMUFile *f, void *opaque, int >>>>> version_id) >>>>> +{ >>>>> + int flags; >>>>> + >>>>> + static char device_name[256], bitmap_name[256]; >>>>> + static BlockDriverState *bs; >>>>> + static BdrvDirtyBitmap *bitmap; >>>>> + >>>>> + uint8_t *buf; >>>>> + uint64_t first_sector; >>>>> + uint32_t nr_sectors; >>>>> + int ret; >>>>> + >>>>> + DPRINTF("load start\n"); >>>>> + >>>>> + do { >>>>> + flags =3D qemu_get_byte(f); >>>>> + DPRINTF("flags: %x\n", flags); >>>>> + >>>>> + if (flags & DIRTY_BITMAP_MIG_FLAG_DEVICE_NAME) { >>>>> + qemu_get_name(f, device_name); >>>>> + bs =3D bdrv_find(device_name); >>>> >>>> Similar to the above confusion, you may want bdrv_lookup_bs or >>>> similar, since we're going to be looking for BDS nodes instead of >>>> "devices." >>> In this case, should it be changed in migration/block.c too? >> >> [See discussion above!] >> >>>> >>>>> + if (!bs) { >>>>> + fprintf(stderr, "Error: unknown block device '%s'\= n", >>>>> + device_name); >>>>> + return -EINVAL; >>>>> + } >>>>> + } >>>>> + >>>>> + if (flags & DIRTY_BITMAP_MIG_FLAG_BITMAP_NAME) { >>>>> + if (!bs) { >>>>> + fprintf(stderr, "Error: block device name is not >>>>> set\n"); >>>>> + return -EINVAL; >>>>> + } >>>>> + >>>>> + qemu_get_name(f, bitmap_name); >>>>> + bitmap =3D bdrv_find_dirty_bitmap(bs, bitmap_name); >>>>> + if (flags & DIRTY_BITMAP_MIG_FLAG_GRANULARITY) { >>>>> + /* First chunk from this bitmap */ >>>>> + uint64_t granularity =3D qemu_get_be64(f); >>>>> + if (!bitmap) { >>>>> + Error *local_err =3D NULL; >>>>> + bitmap =3D bdrv_create_dirty_bitmap(bs, >>>>> granularity, >>>>> + bitmap_name, >>>>> + &local_err); >>>>> + if (!bitmap) { >>>>> + error_report("%s", >>>>> error_get_pretty(local_err)); >>>>> + error_free(local_err); >>>>> + return -EINVAL; >>>>> + } >>>>> + } else { >>>>> + uint64_t dest_granularity =3D >>>>> + bdrv_dirty_bitmap_granularity(bs, bitmap); >>>>> + if (dest_granularity !=3D granularity) { >>>>> + fprintf(stderr, >>>>> + "Error: " >>>>> + "Migrated bitmap granularity (%" >>>>> PRIu64 ") " >>>>> + "is not match with destination >>>>> bitmap '%s' " >>>>> + "granularity (%" PRIu64 ")\n", >>>>> + granularity, >>>>> + bitmap_name, >>>>> + dest_granularity); >>>>> + return -EINVAL; >>>>> + } >>>>> + } >>>>> + bdrv_disable_dirty_bitmap(bitmap); >>>>> + } >>>>> + if (!bitmap) { >>>>> + fprintf(stderr, "Error: unknown dirty bitmap " >>>>> + "'%s' for block device '%s'\n", >>>>> + bitmap_name, device_name); >>>>> + return -EINVAL; >>>>> + } >>>>> + } >>>>> + >>>>> + if (flags & DIRTY_BITMAP_MIG_FLAG_ENABLED) { >>>>> + bool enabled; >>>>> + if (!bitmap) { >>>>> + fprintf(stderr, "Error: dirty bitmap name is not >>>>> set\n"); >>>>> + return -EINVAL; >>>>> + } >>>>> + bdrv_dirty_bitmap_deserialize_finish(bitmap); >>>>> + /* complete migration */ >>>>> + enabled =3D qemu_get_byte(f); >>>>> + if (enabled) { >>>>> + bdrv_enable_dirty_bitmap(bitmap); >>>>> + } >>>>> + } >>>> >>>> Oh, so you use the ENABLED flag to show that migration is over. >>> Yes, it was bad idea.. >>>> If we are going to commit to a stream format for bitmaps, though, >>>> maybe it's best to actually create a "COMPLETION BLOCK" flag and the= n >>>> split this function into two pieces: >>>> >>>> (1) The part that receives regular / zero blocks, and >>>> (2) The part that receives completion data. >>>> >>>> That way, if we change the properties that bitmaps have down the lin= e, >>>> we aren't reliant on literally the "enabled" flag to decide what to = do. >>>> >>>> Also, it might help make this fairly long function a little smaller >>>> and more readable. >>> Ok. >>>> >>>>> + >>>>> + if (flags & (DIRTY_BITMAP_MIG_FLAG_NORMAL_CHUNK | >>>>> + DIRTY_BITMAP_MIG_FLAG_ZERO_CHUNK)) { >>>>> + if (!bs) { >>>>> + fprintf(stderr, "Error: block device name is not >>>>> set\n"); >>>>> + return -EINVAL; >>>>> + } >>>>> + if (!bitmap) { >>>>> + fprintf(stderr, "Error: dirty bitmap name is not >>>>> set\n"); >>>>> + return -EINVAL; >>>>> + } >>>>> + >>>>> + first_sector =3D qemu_get_be64(f); >>>>> + nr_sectors =3D qemu_get_be32(f); >>>>> + DPRINTF("chunk: %lu %u\n", first_sector, nr_sectors); >>>>> + >>>>> + >>>>> + if (flags & DIRTY_BITMAP_MIG_FLAG_ZERO_CHUNK) { >>>>> + bdrv_dirty_bitmap_deserialize_part0(bitmap, >>>>> first_sector, >>>>> + nr_sectors); >>>>> + } else { >>>>> + uint64_t buf_size =3D qemu_get_be64(f); >>>>> + uint64_t needed_size =3D >>>>> + bdrv_dirty_bitmap_data_size(bitmap, nr_sectors= ); >>>>> + >>>>> + if (needed_size > buf_size) { >>>>> + fprintf(stderr, >>>>> + "Error: Migrated bitmap granularity is >>>>> not " >>>>> + "match with destination bitmap >>>>> granularity\n"); >>>>> + return -EINVAL; >>>>> + } >>>>> + >>>> >>>> "Migrated bitmap granularity doesn't match the destination bitmap >>>> granularity" perhaps. >>>> >>>>> + buf =3D g_malloc(buf_size); >>>>> + qemu_get_buffer(f, buf, buf_size); >>>>> + bdrv_dirty_bitmap_deserialize_part(bitmap, buf, >>>>> + first_sector, >>>>> + nr_sectors); >>>>> + g_free(buf); >>>>> + } >>>>> + } >>>>> + >>>>> + ret =3D qemu_file_get_error(f); >>>>> + if (ret !=3D 0) { >>>>> + return ret; >>>>> + } >>>>> + } while (!(flags & DIRTY_BITMAP_MIG_FLAG_EOS)); >>>>> + >>>>> + DPRINTF("load finish\n"); >>>>> + return 0; >>>>> +} >>>>> + >>>>> +static void dirty_bitmap_set_params(const MigrationParams *params, >>>>> void *opaque) >>>>> +{ >>>>> + dirty_bitmap_mig_state.migration_enable =3D params->dirty; >>>>> +} >>>>> + >>>> >>>> OK; though I am not immediately aware of what changes need to happen >>>> to accommodate Eric's suggestions. >>> This function will be dropped in v3. >>>> >>>>> +static bool dirty_bitmap_is_active(void *opaque) >>>>> +{ >>>>> + return dirty_bitmap_mig_state.migration_enable =3D=3D 1; >>>>> +} >>>>> + >>>> >>>> OK. >>>> >>>>> +static int dirty_bitmap_save_setup(QEMUFile *f, void *opaque) >>>>> +{ >>>>> + init_dirty_bitmap_migration(f); >>>>> + >>>>> + qemu_mutex_lock_iothread(); >>>>> + /* start track dirtyness of dirty bitmaps */ >>>>> + set_dirty_tracking(); >>>>> + qemu_mutex_unlock_iothread(); >>>>> + >>>>> + blk_mig_reset_dirty_cursor(); >>>>> + qemu_put_byte(f, DIRTY_BITMAP_MIG_FLAG_EOS); >>>>> + >>>>> + return 0; >>>>> +} >>>>> + >>>> >>>> OK; see dirty_bitmap_mig_init below, though. >>>> >>>>> +static SaveVMHandlers savevm_block_handlers =3D { >>>>> + .set_params =3D dirty_bitmap_set_params, >>>>> + .save_live_setup =3D dirty_bitmap_save_setup, >>>>> + .save_live_iterate =3D dirty_bitmap_save_iterate, >>>>> + .save_live_complete =3D dirty_bitmap_save_complete, >>>>> + .save_live_pending =3D dirty_bitmap_save_pending, >>>>> + .load_state =3D dirty_bitmap_load, >>>>> + .cancel =3D dirty_bitmap_migration_cancel, >>>>> + .is_active =3D dirty_bitmap_is_active, >>>>> +}; >>>>> + >>>>> +void dirty_bitmap_mig_init(void) >>>>> +{ >>>>> + QSIMPLEQ_INIT(&dirty_bitmap_mig_state.dbms_list); >>>> >>>> Maybe I haven't looked thoroughly enough yet, but it's weird that pa= rt >>>> of the dirty_bitmap_mig_state is initialized here, and the rest of i= t >>>> in init_dirty_bitmap_migration. I'd prefer to keep it all together, = if >>>> possible. >>> dirty_bitmap_mig_init is called one time when qemu starts. QSIMPLEQ_I= NIT >>> should be called once. dirty_bitmap_save_setup is called on every >>> migration start, it's like 'reinitialize'. >>>> >>>>> + >>>>> + register_savevm_live(NULL, "dirty-bitmap", 0, 1, >>>>> &savevm_block_handlers, >>>>> + &dirty_bitmap_mig_state); >>>>> +} >>>> >>>> OK. >>>> >>>>> diff --git a/vl.c b/vl.c >>>>> index a824a7d..dee7220 100644 >>>>> --- a/vl.c >>>>> +++ b/vl.c >>>>> @@ -4184,6 +4184,7 @@ int main(int argc, char **argv, char **envp) >>>>> >>>>> blk_mig_init(); >>>>> ram_mig_init(); >>>>> + dirty_bitmap_mig_init(); >>>>> >>>>> /* If the currently selected machine wishes to override the >>>>> units-per-bus >>>>> * property of its default HBA interface type, do so now. */ >>>>> >>>> >>>> Hm, since dirty bitmaps are a sub-component of the block layer, woul= d >>>> it not make sense to put this hook under blk_mig_init, perhaps? >>> IMHO the reason to put it here is to keep all >>> register_savevm_live-entities in one place. >> >> If you still feel that way I won't withhold my R-b, but there are >> already other cases such as ppc_spapr_init which are not in this >> general area of vl.c. >> >> Plus the dozens of devices that use register_savevm as a wrapper to >> register_savevm_live, so maybe consolidating calls to this function >> isn't that important. > Hm, I've missed it, ppc_spapr_init is not here, yes.. Another thing > here: dirty bitmaps migration are separate from blk migration. And it > may be used without blk migration (nbd+mirrow migration may be used).. > Is it ok to connect dirty bitmaps migration to blk_mig_init, which is > located in migration/block.c, which may not be used at all when we > bitmaps are migrated using migration/dirty-bitmap.c? > In other words, yes, dirty bitmaps are a sub-component of the block > layer, but dirty bitmap migration is not a sub-component of blk migrati= on. >> >>>> >>>> >>>> Overall this looks very clean compared to the intermingled format in >>>> V1, and the code is organized pretty well. Just a few minor comments= , >>>> and I'd like to get the opinion of the migration maintainers, but I = am >>>> happy. Sorry it took me so long to review, please feel free to let m= e >>>> know if you disagree with any of my opinions :) >>>> >>>> Thank you, >>>> --John >>> >>> Thank you for reviewing my series) >>> >> >> Yup. Hopefully I didn't miss too much that will irritate the Migration >> overlords. >> >> Once you respin on top of v12, I can run some thorough migration tests >> on it (perhaps over a weekend) and verify that it survives a couple >> hundred migrations without any kind of integrity loss. > I hope I'll do it with all other things in about two days. >> >> This is what makes sense to me right now, anyway. >> >> Do you think you'll be including the bitmap checksum in the >> BlockDirtyInfo command? That'd be convenient for iotests. > Ok, will do. Good idea. Only two points: > 1) Is it ok to include debug info into BlockDirtyInfo? Will users be > happy with it? > 2) When I was debugging my code, the information about dirty-regions wa= s > very useful. Now, all is working and checksums are enough for regressio= n > control. I think leaving some tactical debug prints in the code, disabled, is=20 perfectly fine from my personal standpoint. We're not really done=20 implementing all of these features yet and they may yet be useful. I'd vote for leaving in any non-QMP/QAPI debug information you wish, for=20 now. We just have to be careful about interfaces -- no QMP/HMP commands. As long as it looks reasonably tidy, I don't think it will be a problem. --=20 =97js