Message-ID: <5370A128.5060109@redhat.com>
Date: Mon, 12 May 2014 12:23:36 +0200
From: Paolo Bonzini
To: Peter Lieven, qemu-devel@nongnu.org
Cc: quintela@redhat.com
In-Reply-To: <1399592721-1082-1-git-send-email-pl@kamp.de>
Subject: Re: [Qemu-devel] [RFC PATCH] migration: reintroduce skipped zero pages

On 09/05/2014 01:45, Peter Lieven wrote:
> commit f1c72795a introduced skipping of all zero pages during the
> bulk phase of ram migration. In theory this should have worked;
> however, the underlying assumption that the memory of the target VM
> is completely empty (zeroed) was wrong. Although QEMU accepts an
> incoming migration, BIOS, ROMs or tables are set up. If e.g. a ROM
> differs between source and target, we get memory corruption if a
> page is zero at the source and not at the target. Therefore the
> original patch was later reverted.
>
> This patch now reintroduces the feature to skip zero pages. However,
> this time it has to be explicitly turned on through a migration
> capability, which should only be enabled if both source and
> destination support it.
>
> The feature especially makes sense if you expect a significant
> portion of zero pages while bandwidth or disk space is limited,
> because even if a zero page is compressed we still transfer 9 bytes
> for each page.

What is the actual effect of this?  Is the is_zero_range() check in
ram_handle_compressed() actually a bottleneck?

A positive effect on throughput is unlikely, because multiple compressed
pages are sent in a single socket send.  If 1% of memory is zero (which
already seems like a lot to me), you would save 0.002% of bandwidth or
disk space, and you would waste a lot of time in the new loop in
ram_load().  Even on a freshly booted guest, where 99% of the pages are
zero, you would only get 0.2% savings.
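
For reference, a back-of-the-envelope check of those percentages,
assuming 4 KiB target pages, the 9-byte zero-page encoding mentioned in
the commit message, and savings measured against the total RAM size
(all three are assumptions for illustration, not something stated in
the patch):

/* Rough sanity check of the savings figures quoted above. */
#include <stdio.h>

int main(void)
{
    const double page_size = 4096.0;    /* assumed TARGET_PAGE_SIZE */
    const double zero_page_cost = 9.0;  /* block header + one fill byte */
    const double zero_fraction[] = { 0.01, 0.99 };

    for (int i = 0; i < 2; i++) {
        double saved = zero_fraction[i] * zero_page_cost / page_size;
        printf("%2.0f%% zero pages -> ~%.3f%% of total RAM size saved\n",
               zero_fraction[i] * 100.0, saved * 100.0);
    }
    return 0;
}

Under those assumptions the two cases come out at roughly 0.002% and
0.22%, in line with the figures above.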

Paolo

> Signed-off-by: Peter Lieven
> ---
>  arch_init.c                   | 44 +++++++++++++++++++++++++++++++++--------
>  include/migration/migration.h |  2 +-
>  migration.c                   |  9 +++++++++
>  qapi-schema.json              | 11 ++++++++---
>  4 files changed, 54 insertions(+), 12 deletions(-)
>
> diff --git a/arch_init.c b/arch_init.c
> index 995f56d..2579302 100644
> --- a/arch_init.c
> +++ b/arch_init.c
> @@ -123,7 +123,8 @@ static uint64_t bitmap_sync_count;
>  #define RAM_SAVE_FLAG_EOS      0x10
>  #define RAM_SAVE_FLAG_CONTINUE 0x20
>  #define RAM_SAVE_FLAG_XBZRLE   0x40
> -/* 0x80 is reserved in migration.h start with 0x100 next */
> +/* 0x80 is reserved in migration.h */
> +#define RAM_SAVE_FLAG_ZERO_TARGET 0x100
>
>  static struct defconfig_file {
>      const char *filename;
> @@ -575,8 +576,9 @@ static int ram_save_block(QEMUFile *f, bool last_stage)
>      MemoryRegion *mr;
>      ram_addr_t current_addr;
>
> -    if (!block)
> +    if (!block) {
>          block = QTAILQ_FIRST(&ram_list.blocks);
> +    }
>
>      while (true) {
>          mr = block->mr;
> @@ -619,11 +621,16 @@ static int ram_save_block(QEMUFile *f, bool last_stage)
>                  }
>              }
>          } else if (is_zero_range(p, TARGET_PAGE_SIZE)) {
> -            acct_info.dup_pages++;
> -            bytes_sent = save_block_hdr(f, block, offset, cont,
> -                                        RAM_SAVE_FLAG_COMPRESS);
> -            qemu_put_byte(f, 0);
> -            bytes_sent++;
> +            if (!ram_bulk_stage || !migrate_skip_zero_pages()) {
> +                acct_info.dup_pages++;
> +                bytes_sent = save_block_hdr(f, block, offset, cont,
> +                                            RAM_SAVE_FLAG_COMPRESS);
> +                qemu_put_byte(f, 0);
> +                bytes_sent++;
> +            } else {
> +                acct_info.skipped_pages++;
> +                bytes_sent = 0;
> +            }
>              /* Must let xbzrle know, otherwise a previous (now 0'd) cached
>               * page would be stale
>               */
> @@ -752,6 +759,7 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
>  {
>      RAMBlock *block;
>      int64_t ram_bitmap_pages; /* Size of bitmap in pages, including gaps */
> +    uint64_t flags = 0;
>
>      mig_throttle_on = false;
>      dirty_rate_high_cnt = 0;
> @@ -812,7 +820,11 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
>      migration_bitmap_sync();
>      qemu_mutex_unlock_iothread();
>
> -    qemu_put_be64(f, ram_bytes_total() | RAM_SAVE_FLAG_MEM_SIZE);
> +    if (migrate_skip_zero_pages()) {
> +        flags |= RAM_SAVE_FLAG_ZERO_TARGET;
> +    }
> +
> +    qemu_put_be64(f, ram_bytes_total() | RAM_SAVE_FLAG_MEM_SIZE | flags);
>
>      QTAILQ_FOREACH(block, &ram_list.blocks, next) {
>          qemu_put_byte(f, strlen(block->idstr));
> @@ -1082,6 +1094,22 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
>                      goto done;
>                  }
>
> +                /* ensure that the target memory is really zero initialized
> +                 * so we can skip zero pages in the bulk phase.
> +                 * TODO: Ideally this should not be needed since we mmap the
> +                 * target memory, however machine individual code may still
> +                 * load BIOS, ROMs etc. although we await an incoming migration.
> +                 * (see commit 9ef051e5) */
> +                if (flags & RAM_SAVE_FLAG_ZERO_TARGET) {
> +                    ram_addr_t offset = 0;
> +                    void *base = memory_region_get_ram_ptr(block->mr);
> +                    for (; offset < block->length; offset += TARGET_PAGE_SIZE) {
> +                        if (!is_zero_range(base + offset, TARGET_PAGE_SIZE)) {
> +                            memset(base + offset, 0x00, TARGET_PAGE_SIZE);
> +                        }
> +                    }
> +                }
> +
>                  total_ram_bytes -= length;
>              }
>          }
> diff --git a/include/migration/migration.h b/include/migration/migration.h
> index 3cb5ba8..4320f6a 100644
> --- a/include/migration/migration.h
> +++ b/include/migration/migration.h
> @@ -144,8 +144,8 @@ void migrate_del_blocker(Error *reason);
>
>  bool migrate_rdma_pin_all(void);
>  bool migrate_zero_blocks(void);
> -
>  bool migrate_auto_converge(void);
> +bool migrate_skip_zero_pages(void);
>
>  int xbzrle_encode_buffer(uint8_t *old_buf, uint8_t *new_buf, int slen,
>                           uint8_t *dst, int dlen);
> diff --git a/migration.c b/migration.c
> index 52cda27..033b958 100644
> --- a/migration.c
> +++ b/migration.c
> @@ -565,6 +565,15 @@ int migrate_use_xbzrle(void)
>      return s->enabled_capabilities[MIGRATION_CAPABILITY_XBZRLE];
>  }
>
> +bool migrate_skip_zero_pages(void)
> +{
> +    MigrationState *s;
> +
> +    s = migrate_get_current();
> +
> +    return s->enabled_capabilities[MIGRATION_CAPABILITY_SKIP_ZERO_PAGES];
> +}
> +
>  int64_t migrate_xbzrle_cache_size(void)
>  {
>      MigrationState *s;
> diff --git a/qapi-schema.json b/qapi-schema.json
> index 36cb964..8add600 100644
> --- a/qapi-schema.json
> +++ b/qapi-schema.json
> @@ -760,19 +760,24 @@
>  #     mlock()'d on demand or all at once. Refer to docs/rdma.txt for usage.
>  #     Disabled by default. (since 2.0)
>  #
> +# @auto-converge: If enabled, QEMU will automatically throttle down the guest
> +#     to speed up convergence of RAM migration. (since 1.6)
> +#
>  # @zero-blocks: During storage migration encode blocks of zeroes efficiently. This
>  #     essentially saves 1MB of zeroes per block on the wire. Enabling requires
>  #     source and target VM to support this feature. To enable it is sufficient
>  #     to enable the capability on the source VM. The feature is disabled by
>  #     default. (since 1.6)
>  #
> -# @auto-converge: If enabled, QEMU will automatically throttle down the guest
> -#     to speed up convergence of RAM migration. (since 1.6)
> +# @skip-zero-pages: Skip zero pages during bulk phase of ram migration. Enabling requires
> +#     source and target VM to support this feature. To enable it is sufficient
> +#     to enable the capability on the source VM. The feature is disabled by
> +#     default. (since 2.1)
>  #
>  # Since: 1.2
>  ##
>  { 'enum': 'MigrationCapability',
> -  'data': ['xbzrle', 'rdma-pin-all', 'auto-converge', 'zero-blocks'] }
> +  'data': ['xbzrle', 'rdma-pin-all', 'auto-converge', 'zero-blocks', 'skip-zero-pages'] }
>
>  ##
>  # @MigrationCapabilityStatus
>
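
As background to the is_zero_range()/ram_handle_compressed() question
above, a minimal, self-contained sketch of the check-before-write
pattern the destination already applies to zero pages.  The helper
names and signatures below are simplified stand-ins for illustration,
not the actual QEMU functions:

#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-in for is_zero_range(): a plain byte-wise scan. */
static bool range_is_zero(const uint8_t *p, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if (p[i] != 0) {
            return false;
        }
    }
    return true;
}

/* Destination-side handling of a page received with
 * RAM_SAVE_FLAG_COMPRESS: 'ch' is the single fill byte standing in for
 * the whole page.  The page is only written if the fill byte would
 * actually change its contents, so an already-zero page is left
 * untouched. */
static void handle_compressed_page(uint8_t *host, uint8_t ch, size_t size)
{
    if (ch != 0 || !range_is_zero(host, size)) {
        memset(host, ch, size);
    }
}

int main(void)
{
    static uint8_t page[4096];                         /* starts out zeroed */
    handle_compressed_page(page, 0x00, sizeof(page));  /* no write happens */
    handle_compressed_page(page, 0xa5, sizeof(page));  /* page is filled */
    return page[0] == 0xa5 ? 0 : 1;
}

The point is that an incoming zero page already costs the destination
only a read-only scan; the proposed RAM_SAVE_FLAG_ZERO_TARGET loop in
ram_load() would pay that scan (plus a memset for any non-zero page)
for every page of the target RAM up front.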