From: Jitendra Kolhe <jitendra.kolhe@hpe.com>
To: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: liang.z.li@intel.com, JBottomley@Odin.com, ehabkost@redhat.com,
 crosthwaite.peter@gmail.com, simhan@hpe.com, armbru@redhat.com,
 quintela@redhat.com, qemu-devel@nongnu.org, lcapitulino@redhat.com,
 borntraeger@de.ibm.com, mst@redhat.com, mohan_parthasarathy@hpe.com,
 stefanha@redhat.com, den@openvz.org, amit.shah@redhat.com,
 pbonzini@redhat.com, rth@twiddle.net
Subject: Re: [Qemu-devel] [PATCH v2] migration: skip sending ram pages
 released by virtio-balloon driver.
Date: Tue, 5 Apr 2016 15:34:18 +0530
Message-ID: <57038DA2.8090608@hpe.com>
In-Reply-To: <20160331163941.GL2265@work-vm>
References: <1459138565-6244-1-git-send-email-jitendra.kolhe@hpe.com>
 <20160331163941.GL2265@work-vm>

On 3/31/2016 10:09 PM, Dr. David Alan Gilbert wrote:
> * Jitendra Kolhe (jitendra.kolhe@hpe.com) wrote:
>> While measuring live migration performance for a qemu/kvm guest, it
>> was observed that qemu doesn't maintain any intelligence about guest
>> ram pages which have been released by the guest balloon driver, and
>> treats such pages like any other normal guest ram page. This has a
>> direct impact on the overall migration time for a guest which has
>> released (ballooned out) memory to the host.

Sorry for the delayed response.

> Hi Jitendra,
>   I've read over the patch and I've got a mix of comments; I've not read
> it in full detail:
>
> a) It does need splitting up; it's a bit big to review in one go;
>    I suggest you split it into the main code, and separately the bitmap
>    save/load. It might be worth splitting it up even more.

Sure, will post a patch series in the next version.

> b) in balloon_bitmap_load check the next and len fields; since it's read
>    over the wire we've got to treat them as hostile; so check they don't
>    run over the length of the bitmap.

Will check the next and len fields for validity. Would an assert make
sense in case the values read from the wire are wrong? (A sketch of the
check I have in mind follows point (c) below.)

> c) The bitmap_load/save needs to be tied to the machine type or something
>    that means that if you were migrating in a stream from an older qemu
>    it wouldn't get upset when it tried to read the extra data you read.
>    I prefer if it's tied to either the config setting or the new machine
>    type (that way backwards migration works as well).

Thanks, migration from an older qemu to qemu+patch and back is a problem.
We already have a configuration option "skip-balloon" to disable the
optimization (the test against the bitmap during migration), but
bitmap_load/save are independent of this option. Will extend the
configuration option to also cover bitmap_load/save.
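Going back to (b), this is roughly the check I have in mind for the
balloon_bitmap_load() loop. It is only a sketch against the v2 code: the
error path would also need to drop the rcu read lock and the bitmap
mutex before returning, and the error_report() text is a placeholder.

    while (tmpoffset < tmplimit) {
        next = qemu_get_be64(f);
        len = qemu_get_be64(f);
        if (len == 0) {
            break;
        }
        /* next/len arrive over the wire, so treat them as hostile:
         * reject any range that runs past the end of the bitmap */
        if (next >= balloon_bitmap_pages ||
            len > balloon_bitmap_pages - next) {
            error_report("balloon bitmap: bad range %lu+%lu", next, len);
            return -EINVAL;
        }
        bitmap_set(bitmap, next, len);
        tmpoffset = next + len;
    }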
>
> d) I agree with the other comments that the stuff looking up the ram blocks
>    addressing looks wrong; you use last_ram_offset() to size the bitmap,
>    so it makes me think it's the whole of the ram_addr_t; but I think you're
>    saying you're not interested in all of it. However remember that
>    the order of ram_addr_t is not stable between two qemu's - even something
>    like hotplugging an ethercard in one qemu vs having it on the command line
>    on the other can change that order; so anything going over the wire has
>    to be block+offset-into-block. Also remember that last_ram_offset()
>    includes annoying things like firmware RAM, video memory and all those
>    things;
>

Yes, we are using last_ram_offset() to create the bitmap, but the
offsetting into the bitmap was done wrongly in the patch (especially for
hotplugged ramblocks). Will change the code to calculate the ram_addr_t
using qemu_ram_block_from_host() rather than just using section.offset.
(A rough sketch is in the P.S. at the end of this mail.)

> e) It should be possible to get it to work for postcopy if you just use
>    it as a way to speed up the zero detection but still send the zero page
>    messages.
>

In case of postcopy, bitmap_save would happen after vm_stop(), which
should have a direct impact on downtime. Would that be fine? In case of
pre-copy the major chunk of the bitmap is copied as part of
ram_save_iterate itself.

Zero page detection can stay enabled for postcopy. But can't
ballooned-out zero page messages be skipped in postcopy the same way
they are for precopy? Am I missing something?

> Dave
>
>>
>> In case of large systems, where we can configure large guests with 1TB
>> and with a considerable amount of memory released by the balloon driver
>> to the host, the migration time gets worse.
>>
>> The solution proposed below is local only to qemu (and does not require
>> any modification to the Linux kernel or any guest driver). We have
>> verified the fix for large guests of 1TB on HPE Superdome X (which can
>> support up to 240 cores and 12TB of memory), and in the case where 90%
>> of memory has been released by the balloon driver, the migration time
>> for an idle guest reduces to ~600 secs from ~1200 secs.
>>
>> Detail: During live migration, as part of the 1st iteration,
>> ram_save_iterate() -> ram_find_and_save_block() will try to migrate ram
>> pages even if they were released by the virtio-balloon driver as part
>> of dynamic memory delete. Even though the pages which are returned to
>> the host by the virtio-balloon driver are zero pages, the migration
>> algorithm will still end up scanning each such page:
>> ram_find_and_save_block() -> ram_save_page/ram_save_compressed_page ->
>> save_zero_page() -> is_zero_range(). We also end up sending some
>> control information over the network for these pages during migration,
>> which adds to the total migration time.
>>
>> The proposed fix uses the existing bitmap infrastructure to create
>> a virtio-balloon bitmap. Each bit in the bitmap represents a guest ram
>> page of size 1UL << VIRTIO_BALLOON_PFN_SHIFT. The bitmap covers the
>> entire guest ram, up to the maximum configured memory. Guest ram pages
>> claimed by the virtio-balloon driver are set to 1 in the bitmap. During
>> live migration, each guest ram page (host VA offset) is checked against
>> the virtio-balloon bitmap; if the bit is set, the corresponding ram
>> page is excluded from scanning and from sending control information
>> during migration.
>> The bitmap is also migrated to the target as part of every
>> ram_save_iterate loop, and after the guest is stopped the remaining
>> balloon bitmap is migrated as part of the balloon driver save/load
>> interface.
>>
>> With the proposed fix, the average migration time for an idle guest
>> with 1TB maximum memory and 64 vCpus
>> - reduces from ~1200 secs to ~600 secs, with guest memory ballooned
>>   down to 128GB (~10% of 1TB).
>> - reduces from ~1300 secs to ~1200 secs (7%), with guest memory
>>   ballooned down to 896GB (~90% of 1TB).
>> - with no ballooning configured, we don't expect to see any impact
>>   on total migration time.
>>
>> The optimization gets temporarily disabled if a balloon operation is
>> in progress. Since the optimization skips scanning and migrating
>> control information for ballooned-out pages, we might otherwise skip
>> guest ram pages in cases where the guest balloon driver has freed a
>> ram page to the guest but not yet informed the host/qemu about it
>> (VIRTIO_BALLOON_F_MUST_TELL_HOST), and so skip migrating ram pages
>> which the guest is still using. Since this problem is specific to a
>> balloon leak, we can restrict the balloon-operation-in-progress check
>> to only a balloon leak operation in progress.
>>
>> The optimization also gets permanently disabled (for all subsequent
>> migrations) in case any migration uses the postcopy capability. In
>> case of postcopy the balloon bitmap would need to be sent after
>> vm_stop, which has a significant impact on the downtime. Moreover,
>> applications in the guest won't actually fault on ram pages which are
>> already ballooned out, so the proposed optimization will not show any
>> improvement in migration time during postcopy.
>>
>> Signed-off-by: Jitendra Kolhe <jitendra.kolhe@hpe.com>
>> ---
>> Changed in v2:
>>  - Resolved compilation issue for qemu-user binaries in exec.c
>>  - Localized the balloon bitmap test to save_zero_page().
>>  - Updated version string for newly added migration capability to 2.7.
>>  - Made minor modifications to patch commit text.
>>
>>  balloon.c                          | 253 ++++++++++++++++++++++++++++++++++-
>>  exec.c                             |   3 +
>>  hw/virtio/virtio-balloon.c         |  35 ++++-
>>  include/hw/virtio/virtio-balloon.h |   1 +
>>  include/migration/migration.h      |   1 +
>>  include/sysemu/balloon.h           |  15 ++-
>>  migration/migration.c              |   9 ++
>>  migration/ram.c                    |  31 ++++-
>>  qapi-schema.json                   |   5 +-
>>  9 files changed, 341 insertions(+), 12 deletions(-)
>>
>> diff --git a/balloon.c b/balloon.c
>> index f2ef50c..1c2d228 100644
>> --- a/balloon.c
>> +++ b/balloon.c
>> @@ -33,11 +33,34 @@
>>  #include "qmp-commands.h"
>>  #include "qapi/qmp/qerror.h"
>>  #include "qapi/qmp/qjson.h"
>> +#include "exec/ram_addr.h"
>> +#include "migration/migration.h"
>> +
>> +#define BALLOON_BITMAP_DISABLE_FLAG -1UL
>> +
>> +typedef enum {
>> +    BALLOON_BITMAP_DISABLE_NONE = 1, /* Enabled */
>> +    BALLOON_BITMAP_DISABLE_CURRENT,
>> +    BALLOON_BITMAP_DISABLE_PERMANENT,
>> +} BalloonBitmapDisableState;
>>
>>  static QEMUBalloonEvent *balloon_event_fn;
>>  static QEMUBalloonStatus *balloon_stat_fn;
>> +static QEMUBalloonInProgress *balloon_in_progress_fn;
>>  static void *balloon_opaque;
>>  static bool balloon_inhibited;
>> +static unsigned long balloon_bitmap_pages;
>> +static unsigned int balloon_bitmap_pfn_shift;
>> +static QemuMutex balloon_bitmap_mutex;
>> +static bool balloon_bitmap_xfered;
>> +static unsigned long balloon_min_bitmap_offset;
>> +static unsigned long balloon_max_bitmap_offset;
>> +static BalloonBitmapDisableState balloon_bitmap_disable_state;
>> +
>> +static struct BitmapRcu {
>> +    struct rcu_head rcu;
>> +    unsigned long *bmap;
>> +} *balloon_bitmap_rcu;
>>
>>  bool qemu_balloon_is_inhibited(void)
>>  {
>> @@ -49,6 +72,21 @@ void qemu_balloon_inhibit(bool state)
>>      balloon_inhibited = state;
>>  }
>>
>> +void qemu_mutex_lock_balloon_bitmap(void)
>> +{
>> +    qemu_mutex_lock(&balloon_bitmap_mutex);
>> +}
>> +
>> +void qemu_mutex_unlock_balloon_bitmap(void)
>> +{
>> +    qemu_mutex_unlock(&balloon_bitmap_mutex);
>> +}
>> +
>> +void qemu_balloon_reset_bitmap_data(void)
>> +{
>> +    balloon_bitmap_xfered = false;
>> +}
>> +
>>  static bool have_balloon(Error **errp)
>>  {
>>      if (kvm_enabled() && !kvm_has_sync_mmu()) {
>> @@ -65,9 +103,12 @@ static bool have_balloon(Error **errp)
>>  }
>>
>>  int qemu_add_balloon_handler(QEMUBalloonEvent *event_func,
>> -                             QEMUBalloonStatus *stat_func, void *opaque)
>> +                             QEMUBalloonStatus *stat_func,
>> +                             QEMUBalloonInProgress *in_progress_func,
>> +                             void *opaque, int pfn_shift)
>>  {
>> -    if (balloon_event_fn || balloon_stat_fn || balloon_opaque) {
>> +    if (balloon_event_fn || balloon_stat_fn ||
>> +        balloon_in_progress_fn || balloon_opaque) {
>>          /* We're already registered one balloon handler.  How many can
>>           * a guest really have?
>>           */
>> @@ -75,17 +116,39 @@ int qemu_add_balloon_handler(QEMUBalloonEvent *event_func,
>>      }
>>      balloon_event_fn = event_func;
>>      balloon_stat_fn = stat_func;
>> +    balloon_in_progress_fn = in_progress_func;
>>      balloon_opaque = opaque;
>> +
>> +    qemu_mutex_init(&balloon_bitmap_mutex);
>> +    balloon_bitmap_disable_state = BALLOON_BITMAP_DISABLE_NONE;
>> +    balloon_bitmap_pfn_shift = pfn_shift;
>> +    balloon_bitmap_pages = (last_ram_offset() >> balloon_bitmap_pfn_shift);
>> +    balloon_bitmap_rcu = g_new0(struct BitmapRcu, 1);
>> +    balloon_bitmap_rcu->bmap = bitmap_new(balloon_bitmap_pages);
>> +    bitmap_clear(balloon_bitmap_rcu->bmap, 0, balloon_bitmap_pages);
>> +
>>      return 0;
>>  }
>>
>> +static void balloon_bitmap_free(struct BitmapRcu *bmap)
>> +{
>> +    g_free(bmap->bmap);
>> +    g_free(bmap);
>> +}
>> +
>>  void qemu_remove_balloon_handler(void *opaque)
>>  {
>> +    struct BitmapRcu *bitmap = balloon_bitmap_rcu;
>>      if (balloon_opaque != opaque) {
>>          return;
>>      }
>> +    atomic_rcu_set(&balloon_bitmap_rcu, NULL);
>> +    if (bitmap) {
>> +        call_rcu(bitmap, balloon_bitmap_free, rcu);
>> +    }
>>      balloon_event_fn = NULL;
>>      balloon_stat_fn = NULL;
>> +    balloon_in_progress_fn = NULL;
>>      balloon_opaque = NULL;
>>  }
>>
>> @@ -116,3 +179,189 @@ void qmp_balloon(int64_t target, Error **errp)
>>      trace_balloon_event(balloon_opaque, target);
>>      balloon_event_fn(balloon_opaque, target);
>>  }
>> +
>> +/* Handle Ram hotplug case, only called in case old < new */
>> +int qemu_balloon_bitmap_extend(ram_addr_t old, ram_addr_t new)
>> +{
>> +    struct BitmapRcu *old_bitmap = balloon_bitmap_rcu, *bitmap;
>> +    unsigned long old_offset, new_offset;
>> +
>> +    if (!balloon_bitmap_rcu) {
>> +        return -1;
>> +    }
>> +
>> +    old_offset = (old >> balloon_bitmap_pfn_shift);
>> +    new_offset = (new >> balloon_bitmap_pfn_shift);
>> +
>> +    bitmap = g_new(struct BitmapRcu, 1);
>> +    bitmap->bmap = bitmap_new(new_offset);
>> +
>> +    qemu_mutex_lock_balloon_bitmap();
>> +    bitmap_clear(bitmap->bmap, 0,
>> +                 balloon_bitmap_pages + new_offset - old_offset);
>> +    bitmap_copy(bitmap->bmap, old_bitmap->bmap, old_offset);
>> +
>> +    atomic_rcu_set(&balloon_bitmap_rcu, bitmap);
>> +    balloon_bitmap_pages += new_offset - old_offset;
>> +    qemu_mutex_unlock_balloon_bitmap();
>> +    call_rcu(old_bitmap, balloon_bitmap_free, rcu);
>> +
>> +    return 0;
>> +}
>> +
>> +/* Should be called with balloon bitmap mutex lock held */
>> +int qemu_balloon_bitmap_update(ram_addr_t addr, int deflate)
>> +{
>> +    unsigned long *bitmap;
>> +    unsigned long offset = 0;
>> +
>> +    if (!balloon_bitmap_rcu) {
>> +        return -1;
>> +    }
>> +    offset = (addr >> balloon_bitmap_pfn_shift);
>> +    if (balloon_bitmap_xfered) {
>> +        if (offset < balloon_min_bitmap_offset) {
>> +            balloon_min_bitmap_offset = offset;
>> +        }
>> +        if (offset > balloon_max_bitmap_offset) {
>> +            balloon_max_bitmap_offset = offset;
>> +        }
>> +    }
>> +
>> +    rcu_read_lock();
>> +    bitmap = atomic_rcu_read(&balloon_bitmap_rcu)->bmap;
>> +    if (deflate == 0) {
>> +        set_bit(offset, bitmap);
>> +    } else {
>> +        clear_bit(offset, bitmap);
>> +    }
>> +    rcu_read_unlock();
>> +    return 0;
>> +}
>> +
>> +void qemu_balloon_bitmap_setup(void)
>> +{
>> +    if (migrate_postcopy_ram()) {
>> +        balloon_bitmap_disable_state = BALLOON_BITMAP_DISABLE_PERMANENT;
>> +    } else if ((!balloon_bitmap_rcu || !migrate_skip_balloon()) &&
>> +               (balloon_bitmap_disable_state !=
>> +                BALLOON_BITMAP_DISABLE_PERMANENT)) {
>> +        balloon_bitmap_disable_state = BALLOON_BITMAP_DISABLE_CURRENT;
>> +    }
>> +}
>> +
>> +int qemu_balloon_bitmap_test(RAMBlock *rb, ram_addr_t addr)
>> +{
>> +    unsigned long *bitmap;
>> +    ram_addr_t base;
>> +    unsigned long nr = 0;
>> +    int ret = 0;
>> +
>> +    if (balloon_bitmap_disable_state == BALLOON_BITMAP_DISABLE_CURRENT ||
>> +        balloon_bitmap_disable_state == BALLOON_BITMAP_DISABLE_PERMANENT) {
>> +        return 0;
>> +    }
>> +    balloon_in_progress_fn(balloon_opaque, &ret);
>> +    if (ret == 1) {
>> +        return 0;
>> +    }
>> +
>> +    rcu_read_lock();
>> +    bitmap = atomic_rcu_read(&balloon_bitmap_rcu)->bmap;
>> +    base = rb->offset >> balloon_bitmap_pfn_shift;
>> +    nr = base + (addr >> balloon_bitmap_pfn_shift);
>> +    if (test_bit(nr, bitmap)) {
>> +        ret = 1;
>> +    }
>> +    rcu_read_unlock();
>> +    return ret;
>> +}
>> +
>> +int qemu_balloon_bitmap_save(QEMUFile *f)
>> +{
>> +    unsigned long *bitmap;
>> +    unsigned long offset = 0, next = 0, len = 0;
>> +    unsigned long tmpoffset = 0, tmplimit = 0;
>> +
>> +    if (balloon_bitmap_disable_state == BALLOON_BITMAP_DISABLE_PERMANENT) {
>> +        qemu_put_be64(f, BALLOON_BITMAP_DISABLE_FLAG);
>> +        return 0;
>> +    }
>> +
>> +    qemu_mutex_lock_balloon_bitmap();
>> +    if (balloon_bitmap_xfered) {
>> +        tmpoffset = balloon_min_bitmap_offset;
>> +        tmplimit = balloon_max_bitmap_offset;
>> +    } else {
>> +        balloon_bitmap_xfered = true;
>> +        tmpoffset = offset;
>> +        tmplimit = balloon_bitmap_pages;
>> +    }
>> +
>> +    balloon_min_bitmap_offset = balloon_bitmap_pages;
>> +    balloon_max_bitmap_offset = 0;
>> +
>> +    qemu_put_be64(f, balloon_bitmap_pages);
>> +    qemu_put_be64(f, tmpoffset);
>> +    qemu_put_be64(f, tmplimit);
>> +    rcu_read_lock();
>> +    bitmap = atomic_rcu_read(&balloon_bitmap_rcu)->bmap;
>> +    while (tmpoffset < tmplimit) {
>> +        unsigned long next_set_bit, start_set_bit;
>> +        next_set_bit = find_next_bit(bitmap, balloon_bitmap_pages, tmpoffset);
>> +        start_set_bit = next_set_bit;
>> +        if (next_set_bit == balloon_bitmap_pages) {
>> +            len = 0;
>> +            next = start_set_bit;
>> +            qemu_put_be64(f, next);
>> +            qemu_put_be64(f, len);
>> +            break;
>> +        }
>> +        next_set_bit = find_next_zero_bit(bitmap,
>> +                                          balloon_bitmap_pages,
>> +                                          ++next_set_bit);
>> +        len = (next_set_bit - start_set_bit);
>> +        next = start_set_bit;
>> +        qemu_put_be64(f, next);
>> +        qemu_put_be64(f, len);
>> +        tmpoffset = next + len;
>> +    }
>> +    rcu_read_unlock();
>> +    qemu_mutex_unlock_balloon_bitmap();
>> +    return 0;
>> +}
>> +
>> +int qemu_balloon_bitmap_load(QEMUFile *f)
>> +{
>> +    unsigned long *bitmap;
>> +    unsigned long next = 0, len = 0;
>> +    unsigned long tmpoffset = 0, tmplimit = 0;
>> +
>> +    if (!balloon_bitmap_rcu) {
>> +        return -1;
>> +    }
>> +
>> +    qemu_mutex_lock_balloon_bitmap();
>> +    balloon_bitmap_pages = qemu_get_be64(f);
>> +    if (balloon_bitmap_pages == BALLOON_BITMAP_DISABLE_FLAG) {
>> +        balloon_bitmap_disable_state = BALLOON_BITMAP_DISABLE_PERMANENT;
>> +        qemu_mutex_unlock_balloon_bitmap();
>> +        return 0;
>> +    }
>> +    tmpoffset = qemu_get_be64(f);
>> +    tmplimit = qemu_get_be64(f);
>> +    rcu_read_lock();
>> +    bitmap = atomic_rcu_read(&balloon_bitmap_rcu)->bmap;
>> +    while (tmpoffset < tmplimit) {
>> +        next = qemu_get_be64(f);
>> +        len = qemu_get_be64(f);
>> +        if (len == 0) {
>> +            break;
>> +        }
>> +        bitmap_set(bitmap, next, len);
>> +        tmpoffset = next + len;
>> +    }
>> +    rcu_read_unlock();
>> +    qemu_mutex_unlock_balloon_bitmap();
>> +    return 0;
>> +}
>> diff --git a/exec.c b/exec.c
>> index f398d21..7a448e5 100644
>> --- a/exec.c
>> +++ b/exec.c
>> @@ -43,6 +43,7 @@
>>  #else /* !CONFIG_USER_ONLY */
>>  #include "sysemu/xen-mapcache.h"
>>  #include "trace.h"
>> +#include "sysemu/balloon.h"
>>  #endif
>>  #include "exec/cpu-all.h"
>>  #include "qemu/rcu_queue.h"
>> @@ -1610,6 +1611,8 @@ static void ram_block_add(RAMBlock *new_block, Error **errp)
>>      if (new_ram_size > old_ram_size) {
>>          migration_bitmap_extend(old_ram_size, new_ram_size);
>>          dirty_memory_extend(old_ram_size, new_ram_size);
>> +        qemu_balloon_bitmap_extend(old_ram_size << TARGET_PAGE_BITS,
>> +                                   new_ram_size << TARGET_PAGE_BITS);
>>      }
>>      /* Keep the list sorted from biggest to smallest block.  Unlike QTAILQ,
>>       * QLIST (which has an RCU-friendly variant) does not have insertion at
>> diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
>> index 22ad25c..9f3a4c8 100644
>> --- a/hw/virtio/virtio-balloon.c
>> +++ b/hw/virtio/virtio-balloon.c
>> @@ -27,6 +27,7 @@
>>  #include "qapi/visitor.h"
>>  #include "qapi-event.h"
>>  #include "trace.h"
>> +#include "migration/migration.h"
>>
>>  #if defined(__linux__)
>>  #include <sys/mman.h>
>> @@ -214,11 +215,13 @@ static void virtio_balloon_handle_output(VirtIODevice *vdev, VirtQueue *vq)
>>      VirtQueueElement *elem;
>>      MemoryRegionSection section;
>>
>> +    qemu_mutex_lock_balloon_bitmap();
>>      for (;;) {
>>          size_t offset = 0;
>>          uint32_t pfn;
>>          elem = virtqueue_pop(vq, sizeof(VirtQueueElement));
>>          if (!elem) {
>> +            qemu_mutex_unlock_balloon_bitmap();
>>              return;
>>          }
>>
>> @@ -242,6 +245,7 @@ static void virtio_balloon_handle_output(VirtIODevice *vdev, VirtQueue *vq)
>>              addr = section.offset_within_region;
>>              balloon_page(memory_region_get_ram_ptr(section.mr) + addr,
>>                           !!(vq == s->dvq));
>> +            qemu_balloon_bitmap_update(addr, !!(vq == s->dvq));
>>              memory_region_unref(section.mr);
>>          }
>>
>> @@ -249,6 +253,7 @@ static void virtio_balloon_handle_output(VirtIODevice *vdev, VirtQueue *vq)
>>          virtio_notify(vdev, vq);
>>          g_free(elem);
>>      }
>> +    qemu_mutex_unlock_balloon_bitmap();
>>  }
>>
>>  static void virtio_balloon_receive_stats(VirtIODevice *vdev, VirtQueue *vq)
>> @@ -303,6 +308,16 @@ out:
>>      }
>>  }
>>
>> +static void virtio_balloon_migration_state_changed(Notifier *notifier,
>> +                                                   void *data)
>> +{
>> +    MigrationState *mig = data;
>> +
>> +    if (migration_has_failed(mig)) {
>> +        qemu_balloon_reset_bitmap_data();
>> +    }
>> +}
>> +
>>  static void virtio_balloon_get_config(VirtIODevice *vdev, uint8_t *config_data)
>>  {
>>      VirtIOBalloon *dev = VIRTIO_BALLOON(vdev);
>> @@ -382,6 +397,16 @@ static void virtio_balloon_stat(void *opaque, BalloonInfo *info)
>>                                               VIRTIO_BALLOON_PFN_SHIFT);
>>  }
>>
>> +static void virtio_balloon_in_progress(void *opaque, int *status)
>> +{
>> +    VirtIOBalloon *dev = VIRTIO_BALLOON(opaque);
>> +    if (cpu_to_le32(dev->actual) != cpu_to_le32(dev->num_pages)) {
>> +        *status = 1;
>> +        return;
>> +    }
>> +    *status = 0;
>> +}
>> +
>>  static void virtio_balloon_to_target(void *opaque, ram_addr_t target)
>>  {
>>      VirtIOBalloon *dev = VIRTIO_BALLOON(opaque);
>> @@ -409,6 +434,7 @@ static void virtio_balloon_save_device(VirtIODevice *vdev, QEMUFile *f)
>>
>>      qemu_put_be32(f, s->num_pages);
>>      qemu_put_be32(f, s->actual);
>> +    qemu_balloon_bitmap_save(f);
>>  }
>>
>>  static int virtio_balloon_load(QEMUFile *f, void *opaque, int version_id)
>> @@ -426,6 +452,7 @@ static int virtio_balloon_load_device(VirtIODevice *vdev, QEMUFile *f,
>>
>>      s->num_pages = qemu_get_be32(f);
>>      s->actual = qemu_get_be32(f);
>> +    qemu_balloon_bitmap_load(f);
>>      return 0;
>>  }
>>
>> @@ -439,7 +466,9 @@ static void virtio_balloon_device_realize(DeviceState *dev, Error **errp)
>>                  sizeof(struct virtio_balloon_config));
>>
>>      ret = qemu_add_balloon_handler(virtio_balloon_to_target,
>> -                                   virtio_balloon_stat, s);
>> +                                   virtio_balloon_stat,
>> +                                   virtio_balloon_in_progress, s,
>> +                                   VIRTIO_BALLOON_PFN_SHIFT);
>>
>>      if (ret < 0) {
>>          error_setg(errp, "Only one balloon device is supported");
>> @@ -453,6 +482,9 @@ static void virtio_balloon_device_realize(DeviceState *dev, Error **errp)
>>
>>      reset_stats(s);
>>
>> +    s->migration_state_notifier.notify = virtio_balloon_migration_state_changed;
>> +    add_migration_state_change_notifier(&s->migration_state_notifier);
>> +
>>      register_savevm(dev, "virtio-balloon", -1, 1,
>>                      virtio_balloon_save, virtio_balloon_load, s);
>>  }
>> @@ -462,6 +494,7 @@ static void virtio_balloon_device_unrealize(DeviceState *dev, Error **errp)
>>      VirtIODevice *vdev = VIRTIO_DEVICE(dev);
>>      VirtIOBalloon *s = VIRTIO_BALLOON(dev);
>>
>> +    remove_migration_state_change_notifier(&s->migration_state_notifier);
>>      balloon_stats_destroy_timer(s);
>>      qemu_remove_balloon_handler(s);
>>      unregister_savevm(dev, "virtio-balloon", s);
>> diff --git a/include/hw/virtio/virtio-balloon.h b/include/hw/virtio/virtio-balloon.h
>> index 35f62ac..1ded5a9 100644
>> --- a/include/hw/virtio/virtio-balloon.h
>> +++ b/include/hw/virtio/virtio-balloon.h
>> @@ -43,6 +43,7 @@ typedef struct VirtIOBalloon {
>>      int64_t stats_last_update;
>>      int64_t stats_poll_interval;
>>      uint32_t host_features;
>> +    Notifier migration_state_notifier;
>>  } VirtIOBalloon;
>>
>>  #endif
>> diff --git a/include/migration/migration.h b/include/migration/migration.h
>> index ac2c12c..6c1d1af 100644
>> --- a/include/migration/migration.h
>> +++ b/include/migration/migration.h
>> @@ -267,6 +267,7 @@ void migrate_del_blocker(Error *reason);
>>
>>  bool migrate_postcopy_ram(void);
>>  bool migrate_zero_blocks(void);
>> +bool migrate_skip_balloon(void);
>>
>>  bool migrate_auto_converge(void);
>>
>> diff --git a/include/sysemu/balloon.h b/include/sysemu/balloon.h
>> index 3f976b4..5325c38 100644
>> --- a/include/sysemu/balloon.h
>> +++ b/include/sysemu/balloon.h
>> @@ -15,14 +15,27 @@
>>  #define _QEMU_BALLOON_H
>>
>>  #include "qapi-types.h"
>> +#include "migration/qemu-file.h"
>>
>>  typedef void (QEMUBalloonEvent)(void *opaque, ram_addr_t target);
>>  typedef void (QEMUBalloonStatus)(void *opaque, BalloonInfo *info);
>> +typedef void (QEMUBalloonInProgress) (void *opaque, int *status);
>>
>>  int qemu_add_balloon_handler(QEMUBalloonEvent *event_func,
>> -                             QEMUBalloonStatus *stat_func, void *opaque);
>> +                             QEMUBalloonStatus *stat_func,
>> +                             QEMUBalloonInProgress *progress_func,
>> +                             void *opaque, int pfn_shift);
>>  void qemu_remove_balloon_handler(void *opaque);
>>  bool qemu_balloon_is_inhibited(void);
>>  void qemu_balloon_inhibit(bool state);
>> +void qemu_mutex_lock_balloon_bitmap(void);
>> +void qemu_mutex_unlock_balloon_bitmap(void);
>> +void qemu_balloon_reset_bitmap_data(void);
>> +void qemu_balloon_bitmap_setup(void);
>> +int qemu_balloon_bitmap_extend(ram_addr_t old, ram_addr_t new);
>> +int qemu_balloon_bitmap_update(ram_addr_t addr, int deflate);
>> +int qemu_balloon_bitmap_test(RAMBlock *rb, ram_addr_t addr);
>> +int qemu_balloon_bitmap_save(QEMUFile *f);
>> +int qemu_balloon_bitmap_load(QEMUFile *f);
>>
>>  #endif
>> diff --git a/migration/migration.c b/migration/migration.c
>> index 034a918..cb86307 100644
>> --- a/migration/migration.c
>> +++ b/migration/migration.c
>> @@ -1200,6 +1200,15 @@ int migrate_use_xbzrle(void)
>>      return s->enabled_capabilities[MIGRATION_CAPABILITY_XBZRLE];
>>  }
>>
>> +bool migrate_skip_balloon(void)
>> +{
>> +    MigrationState *s;
>> +
>> +    s = migrate_get_current();
>> +
>> +    return s->enabled_capabilities[MIGRATION_CAPABILITY_SKIP_BALLOON];
>> +}
>> +
>>  int64_t migrate_xbzrle_cache_size(void)
>>  {
>>      MigrationState *s;
>> diff --git a/migration/ram.c b/migration/ram.c
>> index 704f6a9..161ab73 100644
>> --- a/migration/ram.c
>> +++ b/migration/ram.c
>> @@ -40,6 +40,7 @@
>>  #include "trace.h"
>>  #include "exec/ram_addr.h"
>>  #include "qemu/rcu_queue.h"
>> +#include "sysemu/balloon.h"
>>
>>  #ifdef DEBUG_MIGRATION_RAM
>>  #define DPRINTF(fmt, ...) \
>> @@ -65,6 +66,7 @@ static uint64_t bitmap_sync_count;
>>  #define RAM_SAVE_FLAG_XBZRLE   0x40
>>  /* 0x80 is reserved in migration.h start with 0x100 next */
>>  #define RAM_SAVE_FLAG_COMPRESS_PAGE    0x100
>> +#define RAM_SAVE_FLAG_BALLOON  0x200
>>
>>  static const uint8_t ZERO_TARGET_PAGE[TARGET_PAGE_SIZE];
>>
>> @@ -702,13 +704,17 @@ static int save_zero_page(QEMUFile *f, RAMBlock *block, ram_addr_t offset,
>>  {
>>      int pages = -1;
>>
>> -    if (is_zero_range(p, TARGET_PAGE_SIZE)) {
>> -        acct_info.dup_pages++;
>> -        *bytes_transferred += save_page_header(f, block,
>> +    if (qemu_balloon_bitmap_test(block, offset) != 1) {
>> +        if (is_zero_range(p, TARGET_PAGE_SIZE)) {
>> +            acct_info.dup_pages++;
>> +            *bytes_transferred += save_page_header(f, block,
>>                                     offset | RAM_SAVE_FLAG_COMPRESS);
>> -        qemu_put_byte(f, 0);
>> -        *bytes_transferred += 1;
>> -        pages = 1;
>> +            qemu_put_byte(f, 0);
>> +            *bytes_transferred += 1;
>> +            pages = 1;
>> +        }
>> +    } else {
>> +        pages = 0;
>>      }
>>
>>      return pages;
>> @@ -773,7 +779,7 @@ static int ram_save_page(QEMUFile *f, PageSearchStatus *pss,
>>           * page would be stale
>>           */
>>          xbzrle_cache_zero_page(current_addr);
>> -    } else if (!ram_bulk_stage && migrate_use_xbzrle()) {
>> +    } else if (pages != 0 && !ram_bulk_stage && migrate_use_xbzrle()) {
>
> Is this test the right way around - don't you want to try xbzrle if you've
> NOT sent a page?
>

The page is ballooned out on the source itself, so we need not xbzrle and
send the page. But I will change the code to cache_insert()
ZERO_TARGET_PAGE, so that even when the page gets ballooned back in on the
source (during migration), get_cached_data() would return a zero page.
Something along the lines of the sketch below.
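This is only a sketch of that idea against the v2 save_zero_page(); it
reuses the existing xbzrle_cache_zero_page() helper from ram.c, and the
details may shift once the patch is split up:

        } else {
            /* Page is ballooned out: skip scanning and sending it, but
             * seed the xbzrle cache with the zero page, so that a page
             * which is ballooned back in and re-dirtied during migration
             * is diffed against zeroes rather than stale cached data.
             * (xbzrle_cache_zero_page() already checks ram_bulk_stage
             * and migrate_use_xbzrle() internally.)
             */
            xbzrle_cache_zero_page(block->offset + offset);
            pages = 0;
        }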
Thanks,
- Jitendra

>>              pages = save_xbzrle_page(f, &p, current_addr, block,
>>                                       offset, last_stage, bytes_transferred);
>>              if (!last_stage) {
>> @@ -1355,6 +1361,9 @@ static int ram_find_and_save_block(QEMUFile *f, bool last_stage,
>>      }
>>
>>      if (found) {
>> +        /* skip saving ram host page if the corresponding guest page
>> +         * is ballooned out
>> +         */
>>          pages = ram_save_host_page(ms, f, &pss,
>>                                     last_stage, bytes_transferred,
>>                                     dirty_ram_abs);
>> @@ -1959,6 +1968,7 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
>>
>>      rcu_read_unlock();
>>
>> +    qemu_balloon_bitmap_setup();
>>      ram_control_before_iterate(f, RAM_CONTROL_SETUP);
>>      ram_control_after_iterate(f, RAM_CONTROL_SETUP);
>>
>> @@ -1984,6 +1994,9 @@ static int ram_save_iterate(QEMUFile *f, void *opaque)
>>
>>      ram_control_before_iterate(f, RAM_CONTROL_ROUND);
>>
>> +    qemu_put_be64(f, RAM_SAVE_FLAG_BALLOON);
>> +    qemu_balloon_bitmap_save(f);
>> +
>>      t0 = qemu_clock_get_ns(QEMU_CLOCK_REALTIME);
>>      i = 0;
>>      while ((ret = qemu_file_rate_limit(f)) == 0) {
>> @@ -2493,6 +2506,10 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
>>              }
>>              break;
>>
>> +        case RAM_SAVE_FLAG_BALLOON:
>> +            qemu_balloon_bitmap_load(f);
>> +            break;
>> +
>>          case RAM_SAVE_FLAG_COMPRESS:
>>              ch = qemu_get_byte(f);
>>              ram_handle_compressed(host, ch, TARGET_PAGE_SIZE);
>> diff --git a/qapi-schema.json b/qapi-schema.json
>> index 7f8d799..38163ca 100644
>> --- a/qapi-schema.json
>> +++ b/qapi-schema.json
>> @@ -544,11 +544,14 @@
>>  #          been migrated, pulling the remaining pages along as needed. NOTE: If
>>  #          the migration fails during postcopy the VM will fail.  (since 2.6)
>>  #
>> +# @skip-balloon: Skip scanning ram pages released by virtio-balloon driver.
>> +#          (since 2.7)
>> +#
>>  # Since: 1.2
>>  ##
>>  { 'enum': 'MigrationCapability',
>>    'data': ['xbzrle', 'rdma-pin-all', 'auto-converge', 'zero-blocks',
>> -           'compress', 'events', 'postcopy-ram'] }
>> +           'compress', 'events', 'postcopy-ram', 'skip-balloon'] }
>>
>>  ##
>>  # @MigrationCapabilityStatus
>> --
>> 1.8.3.1
>>
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
>
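P.S. For point (d), this is roughly what I meant by using
qemu_ram_block_from_host() in the bitmap update path. Only a sketch: I am
assuming a helper that hands back the RAMBlock plus the offset within the
block for a given host pointer, so the exact signature may differ from the
tree this is based on:

    void *host = memory_region_get_ram_ptr(section.mr) +
                 section.offset_within_region;
    ram_addr_t offset;
    RAMBlock *rb = qemu_ram_block_from_host(host, false, &offset);

    if (rb) {
        /* Index the bitmap using the block's base ram_addr_t plus the
         * offset within the block, instead of trusting
         * section.offset_within_region alone, which was wrong for
         * hotplugged ramblocks. */
        qemu_balloon_bitmap_update(rb->offset + offset, !!(vq == s->dvq));
    }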