From mboxrd@z Thu Jan  1 00:00:00 1970
Date: Thu, 31 Mar 2016 17:39:43 +0100
From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Subject: Re: [Qemu-devel] [PATCH v2] migration: skip sending ram pages released by virtio-balloon driver.
Message-ID: <20160331163941.GL2265@work-vm>
In-Reply-To: <1459138565-6244-1-git-send-email-jitendra.kolhe@hpe.com>
To: Jitendra Kolhe <jitendra.kolhe@hpe.com>
Cc: liang.z.li@intel.com, JBottomley@Odin.com, ehabkost@redhat.com,
    crosthwaite.peter@gmail.com, simhan@hpe.com, armbru@redhat.com,
    quintela@redhat.com, qemu-devel@nongnu.org, lcapitulino@redhat.com,
    borntraeger@de.ibm.com, mst@redhat.com, mohan_parthasarathy@hpe.com,
    stefanha@redhat.com, den@openvz.org, amit.shah@redhat.com,
    pbonzini@redhat.com, rth@twiddle.net

* Jitendra Kolhe (jitendra.kolhe@hpe.com) wrote:
> While measuring live migration performance for a qemu/kvm guest, it
> was observed that qemu doesn't maintain any intelligence for guest
> ram pages which are released by the guest balloon driver, and treats
> such pages as any other normal guest ram pages. This has a direct
> impact on overall migration time for a guest which has released
> (ballooned out) memory to the host.
Hi Jitendra,
  I've read over the patch and I've got a mix of comments; I've not
read it in full detail:

  a) It does need splitting up; it's a bit big to review in one go; I
     suggest you split it into the main code, and separately the bitmap
     save/load. It might be worth splitting it up even more.
  b) In balloon_bitmap_load, check the next and len fields; since they're
     read over the wire we've got to treat them as hostile, so check they
     don't run over the length of the bitmap.
  c) The bitmap load/save needs to be tied to the machine type or something
     that means that if you were migrating in a stream from an older qemu
     it wouldn't get upset when it tried to read the extra data. I'd
     prefer it to be tied to either the config setting or the new machine
     type (that way backwards migration works as well).
  d) I agree with the other comments that the stuff looking up the ram
     blocks' addressing looks wrong; you use last_ram_offset() to size the
     bitmap, so it makes me think it's the whole of the ram_addr_t space,
     but I think you're saying you're not interested in all of it.
     However, remember that the order of ram_addr_t is not stable between
     two qemus - even something like hotplugging an ethernet card in one
     qemu vs having it on the command line in the other can change that
     order, so anything going over the wire has to be
     block+offset-into-block. Also remember that last_ram_offset()
     includes annoying things like firmware RAM, video memory and all
     those things.
  e) It should be possible to get it to work for postcopy if you just use
     it as a way to speed up the zero detection but still send the
     zero-page messages.

Dave
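[Point (b) above amounts to a bounds check on each (next, len) record before it is applied to the bitmap. A minimal sketch of such a check - a hypothetical helper, not code from the patch; the name and signature are illustrative only:]

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not from the patch): validate a (next, len) run
 * read from the migration stream before touching the bitmap.  Both
 * values arrive over the wire, so treat them as hostile: reject runs
 * that start past the bitmap, or whose length would overrun it.  The
 * subtraction is safe (no wraparound) because next < bitmap_pages has
 * already been established. */
static int balloon_run_is_valid(uint64_t next, uint64_t len,
                                uint64_t bitmap_pages)
{
    if (next >= bitmap_pages) {
        return 0;               /* run starts beyond the bitmap */
    }
    if (len > bitmap_pages - next) {
        return 0;               /* run would overrun the bitmap */
    }
    return 1;
}
```

A load loop would call this on every record and abort the migration on failure, rather than passing unchecked values to bitmap_set().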
We have verified
> the fix for large guests (1TB) on HPE Superdome X (which can support up
> to 240 cores and 12TB of memory), and in the case where 90% of memory is
> released by the balloon driver the migration time for an idle guest
> reduces to ~600 seconds from ~1200 seconds.
>
> Detail: During live migration, as part of the 1st iteration,
> ram_save_iterate() -> ram_find_and_save_block() will try to migrate ram
> pages which were released by the virtio-balloon driver as part of dynamic
> memory delete. Even though the pages which are returned to the host by
> the virtio-balloon driver are zero pages, the migration algorithm will
> still end up scanning the entire page in ram_find_and_save_block() ->
> ram_save_page/ram_save_compressed_page -> save_zero_page() ->
> is_zero_range(). We also end up sending some control information over
> the network for these pages during migration. This adds to the total
> migration time.
>
> The proposed fix uses the existing bitmap infrastructure to create
> a virtio-balloon bitmap. Each bit in the bitmap represents a guest ram
> page of size 1UL << VIRTIO_BALLOON_PFN_SHIFT. The bitmap covers the
> entire guest ram up to the maximum configured memory. Guest ram pages
> claimed by the virtio-balloon driver are represented by 1 in the
> bitmap. During live migration, each guest ram page (host VA offset)
> is checked against the virtio-balloon bitmap; if the bit is set, the
> corresponding ram page is excluded from scanning and from sending
> control information during migration. The bitmap is also migrated to
> the target as part of every ram_save_iterate loop, and after the
> guest is stopped the remaining balloon bitmap is migrated as part of
> the balloon driver save/load interface.
>
> With the proposed fix, the average migration time for an idle guest
> with 1TB maximum memory and 64 vCPUs:
> - reduces from ~1200 secs to ~600 secs, with guest memory ballooned
>   down to 128GB (~10% of 1TB),
> - reduces from ~1300 to ~1200 secs (7%), with guest memory ballooned
>   down to 896GB (~90% of 1TB),
> - with no ballooning configured, we don't expect to see any impact
>   on total migration time.
>
> The optimization gets temporarily disabled if a balloon operation is
> in progress. Since the optimization skips scanning and migrating control
> information for ballooned-out pages, we might skip guest ram pages in
> cases where the guest balloon driver has freed the ram page to the guest
> but not yet informed the host/qemu about the ram page
> (VIRTIO_BALLOON_F_MUST_TELL_HOST). In such a case, with the optimization,
> we might skip migrating ram pages which the guest is still using. Since
> this problem is specific to a balloon leak, we can restrict the
> balloon-operation-in-progress check to only the balloon leak operation.
>
> The optimization also gets permanently disabled (for all subsequent
> migrations) in case any migration uses the postcopy capability. In the
> case of postcopy, the balloon bitmap would have to be sent after vm_stop,
> which has a significant impact on the downtime. Moreover, since the
> applications in the guest won't actually be faulting on ram pages which
> are already ballooned out, the proposed optimization will not show any
> improvement in migration time during postcopy.
>
> Signed-off-by: Jitendra Kolhe
> ---
> Changed in v2:
> - Resolved compilation issue for qemu-user binaries in exec.c
> - Localize balloon bitmap test to save_zero_page().
> - Updated version string for newly added migration capability to 2.7.
> - Made minor modifications to patch commit text.
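[The (next, len) run-length stream that the cover letter describes - and that qemu_balloon_bitmap_save() below emits with find_next_bit()/find_next_zero_bit() - can be modelled in miniature as follows. This is a toy sketch, not the patch's code: it uses one byte per bit instead of QEMU's bitmap ops, and the names are hypothetical:]

```c
#include <stddef.h>

/* Toy model of the run-length format: walk a byte-per-bit "bitmap" and
 * emit one (next, len) record per run of consecutive set bits, followed
 * by a terminator record with len == 0, mirroring what
 * qemu_balloon_bitmap_save() writes with qemu_put_be64().
 * 'out' must have room for one record per run plus the terminator. */
struct balloon_run { size_t next, len; };

static size_t encode_balloon_runs(const unsigned char *bits, size_t nbits,
                                  struct balloon_run *out)
{
    size_t i = 0, nruns = 0;

    while (i < nbits) {
        while (i < nbits && !bits[i]) {   /* like find_next_bit() */
            i++;
        }
        if (i == nbits) {
            break;                        /* no more set bits */
        }
        out[nruns].next = i;              /* run starts here */
        while (i < nbits && bits[i]) {    /* like find_next_zero_bit() */
            i++;
        }
        out[nruns].len = i - out[nruns].next;
        nruns++;
    }
    out[nruns].next = nbits;              /* terminator: len == 0 */
    out[nruns].len = 0;
    return nruns;
}
```

For the bit pattern 01101001, this produces the records (1,2), (4,1), (7,1) and then the terminator - the receiver simply does bitmap_set(next, len) for each record until it sees len == 0.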
>
>  balloon.c                          | 253 +++++++++++++++++++++++++++++++++-
>  exec.c                             |   3 +
>  hw/virtio/virtio-balloon.c         |  35 ++++-
>  include/hw/virtio/virtio-balloon.h |   1 +
>  include/migration/migration.h      |   1 +
>  include/sysemu/balloon.h           |  15 ++-
>  migration/migration.c              |   9 ++
>  migration/ram.c                    |  31 ++++-
>  qapi-schema.json                   |   5 +-
>  9 files changed, 341 insertions(+), 12 deletions(-)
>
> diff --git a/balloon.c b/balloon.c
> index f2ef50c..1c2d228 100644
> --- a/balloon.c
> +++ b/balloon.c
> @@ -33,11 +33,34 @@
>  #include "qmp-commands.h"
>  #include "qapi/qmp/qerror.h"
>  #include "qapi/qmp/qjson.h"
> +#include "exec/ram_addr.h"
> +#include "migration/migration.h"
> +
> +#define BALLOON_BITMAP_DISABLE_FLAG -1UL
> +
> +typedef enum {
> +    BALLOON_BITMAP_DISABLE_NONE = 1, /* Enabled */
> +    BALLOON_BITMAP_DISABLE_CURRENT,
> +    BALLOON_BITMAP_DISABLE_PERMANENT,
> +} BalloonBitmapDisableState;
>
>  static QEMUBalloonEvent *balloon_event_fn;
>  static QEMUBalloonStatus *balloon_stat_fn;
> +static QEMUBalloonInProgress *balloon_in_progress_fn;
>  static void *balloon_opaque;
>  static bool balloon_inhibited;
> +static unsigned long balloon_bitmap_pages;
> +static unsigned int balloon_bitmap_pfn_shift;
> +static QemuMutex balloon_bitmap_mutex;
> +static bool balloon_bitmap_xfered;
> +static unsigned long balloon_min_bitmap_offset;
> +static unsigned long balloon_max_bitmap_offset;
> +static BalloonBitmapDisableState balloon_bitmap_disable_state;
> +
> +static struct BitmapRcu {
> +    struct rcu_head rcu;
> +    unsigned long *bmap;
> +} *balloon_bitmap_rcu;
>
>  bool qemu_balloon_is_inhibited(void)
>  {
> @@ -49,6 +72,21 @@ void qemu_balloon_inhibit(bool state)
>      balloon_inhibited = state;
>  }
>
> +void qemu_mutex_lock_balloon_bitmap(void)
> +{
> +    qemu_mutex_lock(&balloon_bitmap_mutex);
> +}
> +
> +void qemu_mutex_unlock_balloon_bitmap(void)
> +{
> +    qemu_mutex_unlock(&balloon_bitmap_mutex);
> +}
> +
> +void qemu_balloon_reset_bitmap_data(void)
> +{
> +    balloon_bitmap_xfered = false;
> +}
> +
>  static bool have_balloon(Error **errp)
>  {
>      if (kvm_enabled() && !kvm_has_sync_mmu()) {
> @@ -65,9 +103,12 @@ static bool have_balloon(Error **errp)
>  }
>
>  int qemu_add_balloon_handler(QEMUBalloonEvent *event_func,
> -                             QEMUBalloonStatus *stat_func, void *opaque)
> +                             QEMUBalloonStatus *stat_func,
> +                             QEMUBalloonInProgress *in_progress_func,
> +                             void *opaque, int pfn_shift)
>  {
> -    if (balloon_event_fn || balloon_stat_fn || balloon_opaque) {
> +    if (balloon_event_fn || balloon_stat_fn ||
> +        balloon_in_progress_fn || balloon_opaque) {
>          /* We're already registered one balloon handler.  How many can
>           * a guest really have?
>           */
> @@ -75,17 +116,39 @@ int qemu_add_balloon_handler(QEMUBalloonEvent *event_func,
>      }
>      balloon_event_fn = event_func;
>      balloon_stat_fn = stat_func;
> +    balloon_in_progress_fn = in_progress_func;
>      balloon_opaque = opaque;
> +
> +    qemu_mutex_init(&balloon_bitmap_mutex);
> +    balloon_bitmap_disable_state = BALLOON_BITMAP_DISABLE_NONE;
> +    balloon_bitmap_pfn_shift = pfn_shift;
> +    balloon_bitmap_pages = (last_ram_offset() >> balloon_bitmap_pfn_shift);
> +    balloon_bitmap_rcu = g_new0(struct BitmapRcu, 1);
> +    balloon_bitmap_rcu->bmap = bitmap_new(balloon_bitmap_pages);
> +    bitmap_clear(balloon_bitmap_rcu->bmap, 0, balloon_bitmap_pages);
> +
>      return 0;
>  }
>
> +static void balloon_bitmap_free(struct BitmapRcu *bmap)
> +{
> +    g_free(bmap->bmap);
> +    g_free(bmap);
> +}
> +
>  void qemu_remove_balloon_handler(void *opaque)
>  {
> +    struct BitmapRcu *bitmap = balloon_bitmap_rcu;
>      if (balloon_opaque != opaque) {
>          return;
>      }
> +    atomic_rcu_set(&balloon_bitmap_rcu, NULL);
> +    if (bitmap) {
> +        call_rcu(bitmap, balloon_bitmap_free, rcu);
> +    }
>      balloon_event_fn = NULL;
>      balloon_stat_fn = NULL;
> +    balloon_in_progress_fn = NULL;
>      balloon_opaque = NULL;
>  }
>
> @@ -116,3 +179,189 @@ void qmp_balloon(int64_t target, Error **errp)
>      trace_balloon_event(balloon_opaque, target);
>      balloon_event_fn(balloon_opaque, target);
>  }
> +
> +/* Handle Ram hotplug case, only called in case old < new */
> +int qemu_balloon_bitmap_extend(ram_addr_t old, ram_addr_t new)
> +{
> +    struct BitmapRcu *old_bitmap = balloon_bitmap_rcu, *bitmap;
> +    unsigned long old_offset, new_offset;
> +
> +    if (!balloon_bitmap_rcu) {
> +        return -1;
> +    }
> +
> +    old_offset = (old >> balloon_bitmap_pfn_shift);
> +    new_offset = (new >> balloon_bitmap_pfn_shift);
> +
> +    bitmap = g_new(struct BitmapRcu, 1);
> +    bitmap->bmap = bitmap_new(new_offset);
> +
> +    qemu_mutex_lock_balloon_bitmap();
> +    bitmap_clear(bitmap->bmap, 0,
> +                 balloon_bitmap_pages + new_offset - old_offset);
> +    bitmap_copy(bitmap->bmap, old_bitmap->bmap, old_offset);
> +
> +    atomic_rcu_set(&balloon_bitmap_rcu, bitmap);
> +    balloon_bitmap_pages += new_offset - old_offset;
> +    qemu_mutex_unlock_balloon_bitmap();
> +    call_rcu(old_bitmap, balloon_bitmap_free, rcu);
> +
> +    return 0;
> +}
> +
> +/* Should be called with balloon bitmap mutex lock held */
> +int qemu_balloon_bitmap_update(ram_addr_t addr, int deflate)
> +{
> +    unsigned long *bitmap;
> +    unsigned long offset = 0;
> +
> +    if (!balloon_bitmap_rcu) {
> +        return -1;
> +    }
> +    offset = (addr >> balloon_bitmap_pfn_shift);
> +    if (balloon_bitmap_xfered) {
> +        if (offset < balloon_min_bitmap_offset) {
> +            balloon_min_bitmap_offset = offset;
> +        }
> +        if (offset > balloon_max_bitmap_offset) {
> +            balloon_max_bitmap_offset = offset;
> +        }
> +    }
> +
> +    rcu_read_lock();
> +    bitmap = atomic_rcu_read(&balloon_bitmap_rcu)->bmap;
> +    if (deflate == 0) {
> +        set_bit(offset, bitmap);
> +    } else {
> +        clear_bit(offset, bitmap);
> +    }
> +    rcu_read_unlock();
> +    return 0;
> +}
> +
> +void qemu_balloon_bitmap_setup(void)
> +{
> +    if (migrate_postcopy_ram()) {
> +        balloon_bitmap_disable_state = BALLOON_BITMAP_DISABLE_PERMANENT;
> +    } else if ((!balloon_bitmap_rcu || !migrate_skip_balloon()) &&
> +               (balloon_bitmap_disable_state !=
> +                BALLOON_BITMAP_DISABLE_PERMANENT)) {
> +        balloon_bitmap_disable_state = BALLOON_BITMAP_DISABLE_CURRENT;
> +    }
> +}
> +
> +int qemu_balloon_bitmap_test(RAMBlock *rb, ram_addr_t addr)
> +{
> +    unsigned long *bitmap;
> +    ram_addr_t base;
> +    unsigned long nr = 0;
> +    int ret = 0;
> +
> +    if (balloon_bitmap_disable_state == BALLOON_BITMAP_DISABLE_CURRENT ||
> +        balloon_bitmap_disable_state == BALLOON_BITMAP_DISABLE_PERMANENT) {
> +        return 0;
> +    }
> +    balloon_in_progress_fn(balloon_opaque, &ret);
> +    if (ret == 1) {
> +        return 0;
> +    }
> +
> +    rcu_read_lock();
> +    bitmap = atomic_rcu_read(&balloon_bitmap_rcu)->bmap;
> +    base = rb->offset >> balloon_bitmap_pfn_shift;
> +    nr = base + (addr >> balloon_bitmap_pfn_shift);
> +    if (test_bit(nr, bitmap)) {
> +        ret = 1;
> +    }
> +    rcu_read_unlock();
> +    return ret;
> +}
> +
> +int qemu_balloon_bitmap_save(QEMUFile *f)
> +{
> +    unsigned long *bitmap;
> +    unsigned long offset = 0, next = 0, len = 0;
> +    unsigned long tmpoffset = 0, tmplimit = 0;
> +
> +    if (balloon_bitmap_disable_state == BALLOON_BITMAP_DISABLE_PERMANENT) {
> +        qemu_put_be64(f, BALLOON_BITMAP_DISABLE_FLAG);
> +        return 0;
> +    }
> +
> +    qemu_mutex_lock_balloon_bitmap();
> +    if (balloon_bitmap_xfered) {
> +        tmpoffset = balloon_min_bitmap_offset;
> +        tmplimit = balloon_max_bitmap_offset;
> +    } else {
> +        balloon_bitmap_xfered = true;
> +        tmpoffset = offset;
> +        tmplimit = balloon_bitmap_pages;
> +    }
> +
> +    balloon_min_bitmap_offset = balloon_bitmap_pages;
> +    balloon_max_bitmap_offset = 0;
> +
> +    qemu_put_be64(f, balloon_bitmap_pages);
> +    qemu_put_be64(f, tmpoffset);
> +    qemu_put_be64(f, tmplimit);
> +    rcu_read_lock();
> +    bitmap = atomic_rcu_read(&balloon_bitmap_rcu)->bmap;
> +    while (tmpoffset < tmplimit) {
> +        unsigned long next_set_bit, start_set_bit;
> +        next_set_bit = find_next_bit(bitmap, balloon_bitmap_pages, tmpoffset);
> +        start_set_bit = next_set_bit;
> +        if (next_set_bit == balloon_bitmap_pages) {
> +            len = 0;
> +            next = start_set_bit;
> +            qemu_put_be64(f, next);
> +            qemu_put_be64(f, len);
> +            break;
> +        }
> +        next_set_bit = find_next_zero_bit(bitmap,
> +                                          balloon_bitmap_pages,
> +                                          ++next_set_bit);
> +        len = (next_set_bit - start_set_bit);
> +        next = start_set_bit;
> +        qemu_put_be64(f, next);
> +        qemu_put_be64(f, len);
> +        tmpoffset = next + len;
> +    }
> +    rcu_read_unlock();
> +    qemu_mutex_unlock_balloon_bitmap();
> +    return 0;
> +}
> +
> +int qemu_balloon_bitmap_load(QEMUFile *f)
> +{
> +    unsigned long *bitmap;
> +    unsigned long next = 0, len = 0;
> +    unsigned long tmpoffset = 0, tmplimit = 0;
> +
> +    if (!balloon_bitmap_rcu) {
> +        return -1;
> +    }
> +
> +    qemu_mutex_lock_balloon_bitmap();
> +    balloon_bitmap_pages = qemu_get_be64(f);
> +    if (balloon_bitmap_pages == BALLOON_BITMAP_DISABLE_FLAG) {
> +        balloon_bitmap_disable_state = BALLOON_BITMAP_DISABLE_PERMANENT;
> +        qemu_mutex_unlock_balloon_bitmap();
> +        return 0;
> +    }
> +    tmpoffset = qemu_get_be64(f);
> +    tmplimit = qemu_get_be64(f);
> +    rcu_read_lock();
> +    bitmap = atomic_rcu_read(&balloon_bitmap_rcu)->bmap;
> +    while (tmpoffset < tmplimit) {
> +        next = qemu_get_be64(f);
> +        len = qemu_get_be64(f);
> +        if (len == 0) {
> +            break;
> +        }
> +        bitmap_set(bitmap, next, len);
> +        tmpoffset = next + len;
> +    }
> +    rcu_read_unlock();
> +    qemu_mutex_unlock_balloon_bitmap();
> +    return 0;
> +}
> diff --git a/exec.c b/exec.c
> index f398d21..7a448e5 100644
> --- a/exec.c
> +++ b/exec.c
> @@ -43,6 +43,7 @@
>  #else /* !CONFIG_USER_ONLY */
>  #include "sysemu/xen-mapcache.h"
>  #include "trace.h"
> +#include "sysemu/balloon.h"
>  #endif
>  #include "exec/cpu-all.h"
>  #include "qemu/rcu_queue.h"
> @@ -1610,6 +1611,8 @@ static void ram_block_add(RAMBlock *new_block, Error **errp)
>      if (new_ram_size > old_ram_size) {
>          migration_bitmap_extend(old_ram_size, new_ram_size);
>          dirty_memory_extend(old_ram_size, new_ram_size);
> +        qemu_balloon_bitmap_extend(old_ram_size << TARGET_PAGE_BITS,
> +                                   new_ram_size << TARGET_PAGE_BITS);
>      }
>      /* Keep the list sorted from biggest to smallest block.  Unlike QTAILQ,
>       * QLIST (which has an RCU-friendly variant) does not have insertion at
> diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
> index 22ad25c..9f3a4c8 100644
> --- a/hw/virtio/virtio-balloon.c
> +++ b/hw/virtio/virtio-balloon.c
> @@ -27,6 +27,7 @@
>  #include "qapi/visitor.h"
>  #include "qapi-event.h"
>  #include "trace.h"
> +#include "migration/migration.h"
>
>  #if defined(__linux__)
>  #include <sys/mman.h>
> @@ -214,11 +215,13 @@ static void virtio_balloon_handle_output(VirtIODevice *vdev, VirtQueue *vq)
>      VirtQueueElement *elem;
>      MemoryRegionSection section;
>
> +    qemu_mutex_lock_balloon_bitmap();
>      for (;;) {
>          size_t offset = 0;
>          uint32_t pfn;
>          elem = virtqueue_pop(vq, sizeof(VirtQueueElement));
>          if (!elem) {
> +            qemu_mutex_unlock_balloon_bitmap();
>              return;
>          }
>
> @@ -242,6 +245,7 @@ static void virtio_balloon_handle_output(VirtIODevice *vdev, VirtQueue *vq)
>          addr = section.offset_within_region;
>          balloon_page(memory_region_get_ram_ptr(section.mr) + addr,
>                       !!(vq == s->dvq));
> +        qemu_balloon_bitmap_update(addr, !!(vq == s->dvq));
>          memory_region_unref(section.mr);
>      }
>
> @@ -249,6 +253,7 @@ static void virtio_balloon_handle_output(VirtIODevice *vdev, VirtQueue *vq)
>          virtio_notify(vdev, vq);
>          g_free(elem);
>      }
> +    qemu_mutex_unlock_balloon_bitmap();
>  }
>
>  static void virtio_balloon_receive_stats(VirtIODevice *vdev, VirtQueue *vq)
> @@ -303,6 +308,16 @@ out:
>      }
>  }
>
> +static void virtio_balloon_migration_state_changed(Notifier *notifier,
> +                                                   void *data)
> +{
> +    MigrationState *mig = data;
> +
> +    if (migration_has_failed(mig)) {
> +        qemu_balloon_reset_bitmap_data();
> +    }
> +}
> +
>  static void virtio_balloon_get_config(VirtIODevice *vdev, uint8_t *config_data)
>  {
>      VirtIOBalloon *dev = VIRTIO_BALLOON(vdev);
> @@ -382,6 +397,16 @@ static void virtio_balloon_stat(void *opaque, BalloonInfo *info)
>                                               VIRTIO_BALLOON_PFN_SHIFT);
>  }
>
> +static void virtio_balloon_in_progress(void *opaque, int *status)
> +{
> +    VirtIOBalloon *dev = VIRTIO_BALLOON(opaque);
> +    if (cpu_to_le32(dev->actual) != cpu_to_le32(dev->num_pages)) {
> +        *status = 1;
> +        return;
> +    }
> +    *status = 0;
> +}
> +
>  static void virtio_balloon_to_target(void *opaque, ram_addr_t target)
>  {
>      VirtIOBalloon *dev = VIRTIO_BALLOON(opaque);
> @@ -409,6 +434,7 @@ static void virtio_balloon_save_device(VirtIODevice *vdev, QEMUFile *f)
>
>      qemu_put_be32(f, s->num_pages);
>      qemu_put_be32(f, s->actual);
> +    qemu_balloon_bitmap_save(f);
>  }
>
>  static int virtio_balloon_load(QEMUFile *f, void *opaque, int version_id)
> @@ -426,6 +452,7 @@ static int virtio_balloon_load_device(VirtIODevice *vdev, QEMUFile *f,
>
>      s->num_pages = qemu_get_be32(f);
>      s->actual = qemu_get_be32(f);
> +    qemu_balloon_bitmap_load(f);
>      return 0;
>  }
>
> @@ -439,7 +466,9 @@ static void virtio_balloon_device_realize(DeviceState *dev, Error **errp)
>                  sizeof(struct virtio_balloon_config));
>
>      ret = qemu_add_balloon_handler(virtio_balloon_to_target,
> -                                   virtio_balloon_stat, s);
> +                                   virtio_balloon_stat,
> +                                   virtio_balloon_in_progress, s,
> +                                   VIRTIO_BALLOON_PFN_SHIFT);
>
>      if (ret < 0) {
>          error_setg(errp, "Only one balloon device is supported");
> @@ -453,6 +482,9 @@ static void virtio_balloon_device_realize(DeviceState *dev, Error **errp)
>
>      reset_stats(s);
>
> +    s->migration_state_notifier.notify = virtio_balloon_migration_state_changed;
> +    add_migration_state_change_notifier(&s->migration_state_notifier);
> +
>      register_savevm(dev, "virtio-balloon", -1, 1,
>                      virtio_balloon_save, virtio_balloon_load, s);
>  }
> @@ -462,6 +494,7 @@ static void virtio_balloon_device_unrealize(DeviceState *dev, Error **errp)
>      VirtIODevice *vdev = VIRTIO_DEVICE(dev);
>      VirtIOBalloon *s = VIRTIO_BALLOON(dev);
>
> +    remove_migration_state_change_notifier(&s->migration_state_notifier);
>      balloon_stats_destroy_timer(s);
>      qemu_remove_balloon_handler(s);
>      unregister_savevm(dev, "virtio-balloon", s);
> diff --git a/include/hw/virtio/virtio-balloon.h b/include/hw/virtio/virtio-balloon.h
> index 35f62ac..1ded5a9 100644
> --- a/include/hw/virtio/virtio-balloon.h
> +++ b/include/hw/virtio/virtio-balloon.h
> @@ -43,6 +43,7 @@ typedef struct VirtIOBalloon {
>      int64_t stats_last_update;
>      int64_t stats_poll_interval;
>      uint32_t host_features;
> +    Notifier migration_state_notifier;
>  } VirtIOBalloon;
>
>  #endif
> diff --git a/include/migration/migration.h b/include/migration/migration.h
> index ac2c12c..6c1d1af 100644
> --- a/include/migration/migration.h
> +++ b/include/migration/migration.h
> @@ -267,6 +267,7 @@ void migrate_del_blocker(Error *reason);
>
>  bool migrate_postcopy_ram(void);
>  bool migrate_zero_blocks(void);
> +bool migrate_skip_balloon(void);
>
>  bool migrate_auto_converge(void);
>
> diff --git a/include/sysemu/balloon.h b/include/sysemu/balloon.h
> index 3f976b4..5325c38 100644
> --- a/include/sysemu/balloon.h
> +++ b/include/sysemu/balloon.h
> @@ -15,14 +15,27 @@
>  #define _QEMU_BALLOON_H
>
>  #include "qapi-types.h"
> +#include "migration/qemu-file.h"
>
>  typedef void (QEMUBalloonEvent)(void *opaque, ram_addr_t target);
>  typedef void (QEMUBalloonStatus)(void *opaque, BalloonInfo *info);
> +typedef void (QEMUBalloonInProgress) (void *opaque, int *status);
>
>  int qemu_add_balloon_handler(QEMUBalloonEvent *event_func,
> -                             QEMUBalloonStatus *stat_func, void *opaque);
> +                             QEMUBalloonStatus *stat_func,
> +                             QEMUBalloonInProgress *progress_func,
> +                             void *opaque, int pfn_shift);
>  void qemu_remove_balloon_handler(void *opaque);
>  bool qemu_balloon_is_inhibited(void);
>  void qemu_balloon_inhibit(bool state);
> +void qemu_mutex_lock_balloon_bitmap(void);
> +void qemu_mutex_unlock_balloon_bitmap(void);
> +void qemu_balloon_reset_bitmap_data(void);
> +void qemu_balloon_bitmap_setup(void);
> +int qemu_balloon_bitmap_extend(ram_addr_t old, ram_addr_t new);
> +int qemu_balloon_bitmap_update(ram_addr_t addr, int deflate);
> +int qemu_balloon_bitmap_test(RAMBlock *rb, ram_addr_t addr);
> +int qemu_balloon_bitmap_save(QEMUFile *f);
> +int qemu_balloon_bitmap_load(QEMUFile *f);
>
>  #endif
> diff --git a/migration/migration.c b/migration/migration.c
> index 034a918..cb86307 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -1200,6 +1200,15 @@ int migrate_use_xbzrle(void)
>      return s->enabled_capabilities[MIGRATION_CAPABILITY_XBZRLE];
>  }
>
> +bool migrate_skip_balloon(void)
> +{
> +    MigrationState *s;
> +
> +    s = migrate_get_current();
> +
> +    return s->enabled_capabilities[MIGRATION_CAPABILITY_SKIP_BALLOON];
> +}
> +
>  int64_t migrate_xbzrle_cache_size(void)
>  {
>      MigrationState *s;
> diff --git a/migration/ram.c b/migration/ram.c
> index 704f6a9..161ab73 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -40,6 +40,7 @@
>  #include "trace.h"
>  #include "exec/ram_addr.h"
>  #include "qemu/rcu_queue.h"
> +#include "sysemu/balloon.h"
>
>  #ifdef DEBUG_MIGRATION_RAM
>  #define DPRINTF(fmt, ...)                                               \
> @@ -65,6 +66,7 @@ static uint64_t bitmap_sync_count;
>  #define RAM_SAVE_FLAG_XBZRLE   0x40
>  /* 0x80 is reserved in migration.h start with 0x100 next */
>  #define RAM_SAVE_FLAG_COMPRESS_PAGE    0x100
> +#define RAM_SAVE_FLAG_BALLOON  0x200
>
>  static const uint8_t ZERO_TARGET_PAGE[TARGET_PAGE_SIZE];
>
> @@ -702,13 +704,17 @@ static int save_zero_page(QEMUFile *f, RAMBlock *block, ram_addr_t offset,
>  {
>      int pages = -1;
>
> -    if (is_zero_range(p, TARGET_PAGE_SIZE)) {
> -        acct_info.dup_pages++;
> -        *bytes_transferred += save_page_header(f, block,
> +    if (qemu_balloon_bitmap_test(block, offset) != 1) {
> +        if (is_zero_range(p, TARGET_PAGE_SIZE)) {
> +            acct_info.dup_pages++;
> +            *bytes_transferred += save_page_header(f, block,
>                                 offset | RAM_SAVE_FLAG_COMPRESS);
> -        qemu_put_byte(f, 0);
> -        *bytes_transferred += 1;
> -        pages = 1;
> +            qemu_put_byte(f, 0);
> +            *bytes_transferred += 1;
> +            pages = 1;
> +        }
> +    } else {
> +        pages = 0;
>      }
>
>      return pages;
> @@ -773,7 +779,7 @@ static int ram_save_page(QEMUFile *f, PageSearchStatus *pss,
>               * page would be stale
>               */
>              xbzrle_cache_zero_page(current_addr);
> -        } else if (!ram_bulk_stage && migrate_use_xbzrle()) {
> +        } else if (pages != 0 && !ram_bulk_stage && migrate_use_xbzrle()) {

Is this test the right way around - don't you want to try xbzrle if
you've NOT sent a page?
>              pages = save_xbzrle_page(f, &p, current_addr, block,
>                                       offset, last_stage, bytes_transferred);
>              if (!last_stage) {
> @@ -1355,6 +1361,9 @@ static int ram_find_and_save_block(QEMUFile *f, bool last_stage,
>      }
>
>      if (found) {
> +        /* skip saving ram host page if the corresponding guest page
> +         * is ballooned out
> +         */
>          pages = ram_save_host_page(ms, f, &pss,
>                                     last_stage, bytes_transferred,
>                                     dirty_ram_abs);
> @@ -1959,6 +1968,7 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
>
>      rcu_read_unlock();
>
> +    qemu_balloon_bitmap_setup();
>      ram_control_before_iterate(f, RAM_CONTROL_SETUP);
>      ram_control_after_iterate(f, RAM_CONTROL_SETUP);
>
> @@ -1984,6 +1994,9 @@ static int ram_save_iterate(QEMUFile *f, void *opaque)
>
>      ram_control_before_iterate(f, RAM_CONTROL_ROUND);
>
> +    qemu_put_be64(f, RAM_SAVE_FLAG_BALLOON);
> +    qemu_balloon_bitmap_save(f);
> +
>      t0 = qemu_clock_get_ns(QEMU_CLOCK_REALTIME);
>      i = 0;
>      while ((ret = qemu_file_rate_limit(f)) == 0) {
> @@ -2493,6 +2506,10 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
>              }
>              break;
>
> +        case RAM_SAVE_FLAG_BALLOON:
> +            qemu_balloon_bitmap_load(f);
> +            break;
> +
>          case RAM_SAVE_FLAG_COMPRESS:
>              ch = qemu_get_byte(f);
>              ram_handle_compressed(host, ch, TARGET_PAGE_SIZE);
> diff --git a/qapi-schema.json b/qapi-schema.json
> index 7f8d799..38163ca 100644
> --- a/qapi-schema.json
> +++ b/qapi-schema.json
> @@ -544,11 +544,14 @@
>  #                been migrated, pulling the remaining pages along as needed. NOTE: If
>  #                the migration fails during postcopy the VM will fail.  (since 2.6)
>  #
> +# @skip-balloon: Skip scanning ram pages released by virtio-balloon driver.
> +#                (since 2.7)
> +#
>  # Since: 1.2
>  ##
>  { 'enum': 'MigrationCapability',
>    'data': ['xbzrle', 'rdma-pin-all', 'auto-converge', 'zero-blocks',
> -           'compress', 'events', 'postcopy-ram'] }
> +           'compress', 'events', 'postcopy-ram', 'skip-balloon'] }
>
>  ##
>  # @MigrationCapabilityStatus
> --
> 1.8.3.1
>

-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK