From mboxrd@z Thu Jan 1 00:00:00 1970
References: <1459138565-6244-1-git-send-email-jitendra.kolhe@hpe.com> <56F93B80.8090803@redhat.com>
From: Jitendra Kolhe
Message-ID: <56FA7155.6050700@hpe.com>
Date: Tue, 29 Mar 2016 17:43:09 +0530
In-Reply-To: <56F93B80.8090803@redhat.com>
Subject: Re: [Qemu-devel] [PATCH v2] migration: skip sending ram pages released by virtio-balloon driver.
To: Eric Blake, qemu-devel@nongnu.org
Cc: JBottomley@Odin.com, ehabkost@redhat.com, crosthwaite.peter@gmail.com,
 simhan@hpe.com, quintela@redhat.com, armbru@redhat.com,
 lcapitulino@redhat.com, borntraeger@de.ibm.com, mst@redhat.com,
 mohan_parthasarathy@hpe.com, stefanha@redhat.com, den@openvz.org,
 amit.shah@redhat.com, pbonzini@redhat.com, dgilbert@redhat.com,
 rth@twiddle.net

On 3/28/2016 7:41 PM, Eric Blake wrote:
> On 03/27/2016 10:16 PM, Jitendra Kolhe wrote:
>> While measuring live migration performance for qemu/kvm guest, it
>> was observed that the qemu doesn’t maintain any intelligence for the
>> guest ram pages which are released by the guest balloon driver and
>> treat such pages as any other normal guest ram pages. This has direct
>> impact on overall migration time for the guest which has released
>> (ballooned out) memory to the host.
>>
>> In case of large systems, where we can configure large guests with 1TB
>> and with considerable amount of memory release by balloon driver to the,
>> host the migration time gets worse.
>
> s/the, host/the host,/
>
>>
>> The optimization gets temporarily disabled, if the balloon operation is
>
> s/disabled,/disabled/
>
>> in progress. Since the optimization skips scanning and migrating control
>> information for ballooned out pages, we might skip guest ram pages in
>> cases where the guest balloon driver has freed the ram page to the guest
>> but not yet informed the host/qemu about the ram page
>> (VIRTIO_BALLOON_F_MUST_TELL_HOST). In such case with optimization, we
>> might skip migrating ram pages which the guest is using. Since this
>> problem is specific to balloon leak, we can restrict balloon operation in
>> progress check to only balloon leak operation in progress check.
>>
>> The optimization also get permanently disabled (for all subsequent
>
> s/get/gets/
>
>> migrations) in case any of the migration uses postcopy capability. In case
>> of postcopy the balloon bitmap would be required to send after vm_stop,
>> which has significant impact on the downtime. Moreover, the applications
>> in the guest space won’t be actually faulting on the ram pages which are
>> already ballooned out, the proposed optimization will not show any
>> improvement in migration time during postcopy.
>>
>> Signed-off-by: Jitendra Kolhe
>> ---
>> Changed in v2:
>> - Resolved compilation issue for qemu-user binaries in exec.c
>> - Localize balloon bitmap test to save_zero_page().
>> - Updated version string for newly added migration capability to 2.7.
>> - Made minor modifications to patch commit text.
>
> I'll leave the technical review to others.
>
>> +++ b/qapi-schema.json
>> @@ -544,11 +544,14 @@
>> # been migrated, pulling the remaining pages along as needed. NOTE: If
>> # the migration fails during postcopy the VM will fail. (since 2.6)
>> #
>> +# @skip-balloon: Skip scanning ram pages released by virtio-balloon driver.
>> +# (since 2.7)
>> +#
>> # Since: 1.2
>> ##
>> { 'enum': 'MigrationCapability',
>> 'data': ['xbzrle', 'rdma-pin-all', 'auto-converge', 'zero-blocks',
>> - 'compress', 'events', 'postcopy-ram'] }
>> + 'compress', 'events', 'postcopy-ram', 'skip-balloon'] }
>
> Does this flag make sense to always have enabled (in which case we don't
> need it as a flag), or are there cases where we'd explicitly want to
> disable it?
>

Yes, the flag can stay enabled most of the time. The exceptions are
migrations that use postcopy-ram (the two are mutually exclusive) and
cases where the user is confident that the optimization brings no
benefit, e.g. when little or no balloon activity has happened on the VM,
so the penalty of the optimization outweighs its gain.

Thanks,
- Jitendra
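
P.S. To make the enable/disable conditions above concrete, here is a
rough sketch of the per-page decision. It is not the code from the
patch: BalloonSkipState, skip_allowed(), page_is_ballooned() and
should_skip_page() are hypothetical names, and the one-bit-per-page
bitmap layout is an assumption made only for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical state; none of these names come from the patch. */
typedef struct {
    bool capability_enabled;  /* user turned on 'skip-balloon' */
    bool postcopy_seen;       /* a migration used postcopy: off for good */
    bool deflate_in_progress; /* balloon leak in progress: off temporarily */
    const uint8_t *bitmap;    /* assumed layout: 1 bit per page, 1 = ballooned out */
    size_t nr_pages;          /* number of pages covered by the bitmap */
} BalloonSkipState;

/* The optimization applies only when nothing has disabled it. */
static bool skip_allowed(const BalloonSkipState *s)
{
    return s->capability_enabled && !s->postcopy_seen &&
           !s->deflate_in_progress;
}

/* Test the bit for one page; out-of-range pages count as not ballooned. */
static bool page_is_ballooned(const BalloonSkipState *s, size_t page)
{
    if (page >= s->nr_pages) {
        return false;
    }
    return ((s->bitmap[page / 8] >> (page % 8)) & 1u) != 0;
}

/* A page is skipped only if skipping is allowed and the page is ballooned. */
static bool should_skip_page(const BalloonSkipState *s, size_t page)
{
    return skip_allowed(s) && page_is_ballooned(s, page);
}
```

With a bitmap byte of 0x05 and all three conditions satisfied, pages 0
and 2 would be skipped and page 1 would be sent. Setting
deflate_in_progress or postcopy_seen makes every page go through the
normal path again, which is the conservative behaviour described in the
commit text.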