References: <1459138565-6244-1-git-send-email-jitendra.kolhe@hpe.com> <56F90931.8070006@openvz.org>
From: Jitendra Kolhe
Message-ID: <56FA6845.1070202@hpe.com>
Date: Tue, 29 Mar 2016 17:04:29 +0530
In-Reply-To: <56F90931.8070006@openvz.org>
Subject: Re: [Qemu-devel] [PATCH v2] migration: skip sending ram pages released by virtio-balloon driver.
To: "Denis V. Lunev", qemu-devel@nongnu.org
Cc: JBottomley@Odin.com, ehabkost@redhat.com, crosthwaite.peter@gmail.com, simhan@hpe.com, quintela@redhat.com, armbru@redhat.com, lcapitulino@redhat.com, borntraeger@de.ibm.com, mst@redhat.com, mohan_parthasarathy@hpe.com, stefanha@redhat.com, amit.shah@redhat.com, pbonzini@redhat.com, dgilbert@redhat.com, rth@twiddle.net

On 3/28/2016 4:06 PM, Denis V. Lunev wrote:
> On 03/28/2016 07:16 AM, Jitendra Kolhe wrote:
>> While measuring live migration performance for a qemu/kvm guest, it was
>> observed that qemu doesn't maintain any intelligence for guest ram pages
>> which are released by the guest balloon driver, and treats such pages like
>> any other normal guest ram pages. This has a direct impact on the overall
>> migration time for a guest which has released (ballooned out) memory to
>> the host.
>>
>> In the case of large systems, where we can configure large guests with 1TB
>> and with a considerable amount of memory released by the balloon driver to
>> the host, the migration time gets worse.
>>
>> The solution proposed below is local to qemu only (and does not require
>> any modification to the Linux kernel or any guest driver). We have verified
>> the fix for large guests =1TB on HPE Superdome X (which can support up to
>> 240 cores and 12TB of memory); in the case where 90% of memory is released
>> by the balloon driver, the migration time for an idle guest reduces to
>> ~600 secs from ~1200 secs.
>>
>> Detail: During live migration, as part of the 1st iteration,
>> ram_save_iterate() -> ram_find_and_save_block() will try to migrate ram
>> pages which were released by the virtio-balloon driver as part of dynamic
>> memory delete. Even though the pages which are returned to the host by the
>> virtio-balloon driver are zero pages, the migration algorithm will still
>> end up scanning each entire page: ram_find_and_save_block() ->
>> ram_save_page/ram_save_compressed_page -> save_zero_page() ->
>> is_zero_range(). We also end up sending some control information over the
>> network for these pages during migration. This adds to the total migration
>> time.
>>
>> The proposed fix uses the existing bitmap infrastructure to create a
>> virtio-balloon bitmap. Each bit in the bitmap represents a guest ram page
>> of size 1UL << VIRTIO_BALLOON_PFN_SHIFT. The bitmap covers the entire
>> guest ram memory up to the maximum configured memory. Guest ram pages
>> claimed by the virtio-balloon driver are represented by 1 in the bitmap.
>> During live migration, each guest ram page (host VA offset) is checked
>> against the virtio-balloon bitmap; if the bit is set, the corresponding
>> ram page is excluded from scanning and from sending control information
>> during migration. The bitmap is also migrated to the target as part of
>> every ram_save_iterate loop, and after the guest is stopped the remaining
>> balloon bitmap is migrated as part of the balloon driver save / load
>> interface.
>>
>> With the proposed fix, the average migration time for an idle guest with
>> 1TB maximum memory and 64 vCPUs:
>> - reduces from ~1200 secs to ~600 secs, with guest memory ballooned down
>>   to 128GB (~10% of 1TB).
>> - reduces from ~1300 secs to ~1200 secs (7%), with guest memory ballooned
>>   down to 896GB (~90% of 1TB).
>> - with no ballooning configured, we don't expect to see any impact on
>>   total migration time.
>>
>> The optimization gets temporarily disabled if a balloon operation is in
>> progress. Since the optimization skips scanning and migrating control
>> information for ballooned-out pages, we might skip guest ram pages in
>> cases where the guest balloon driver has freed a ram page to the guest
>> but not yet informed the host/qemu about it
>> (VIRTIO_BALLOON_F_MUST_TELL_HOST). In such a case, with the optimization,
>> we might skip migrating ram pages which the guest is using. Since this
>> problem is specific to balloon leak, we can restrict the
>> balloon-operation-in-progress check to a balloon-leak-in-progress check
>> only.
>>
>> The optimization also gets permanently disabled (for all subsequent
>> migrations) in case any of the migrations uses the postcopy capability.
>> In the case of postcopy, the balloon bitmap would have to be sent after
>> vm_stop, which has a significant impact on the downtime. Moreover, since
>> applications in the guest won't actually be faulting on ram pages which
>> are already ballooned out, the proposed optimization will not show any
>> improvement in migration time during postcopy.
> I think that you could start the guest without the knowledge of the
> ballooned pages and that would be completely fine, as these pages
> will not be touched at all by the guest. They will come into play
> when the host deflates the balloon a bit. In this case QEMU can safely
> give a local zeroed page to the guest.
>
> Thus you could send the bitmap of ballooned pages once the guest
> on the dest side has been started, and in this case it will not influence
> downtime.
>
> Den

We too were more inclined towards the same approach, to have absolutely zero
impact on downtime, but had a concern (below) about how to approach it.

Would it be safe to let the guest start (without the balloon bitmap) on the
dest even when a balloon operation is in progress (especially deflate) during
migration? In that case we would either need to merge the source and dest
balloon bitmaps, which would in turn require keeping track of which offsets
were updated on the dest side (rough sketch below), or block balloon
operations on the dest till the source balloon bitmap is completely merged
with the dest's.
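Something along these lines is what we have in mind for the dest-side merge.
This is only an illustrative, standalone C sketch with made-up names, not
patch code; the actual implementation would use qemu's existing bitmap
helpers:

/* Standalone sketch, not patch code: merge a late-arriving source balloon
 * bitmap into the dest one without losing balloon operations that already
 * happened on the dest after the guest was started there. */
#include <stdint.h>
#include <stddef.h>

#define BITS_PER_LONG (sizeof(unsigned long) * 8)

typedef struct {
    unsigned long *map;      /* current dest view: bit set = page ballooned out */
    unsigned long *touched;  /* pages the dest balloon code has changed so far  */
    size_t nwords;           /* words in each bitmap                            */
} BalloonBitmap;

/* Record an inflate/deflate that happened on the dest before the merge. */
void dest_balloon_update(BalloonBitmap *bb, uint64_t page, int ballooned)
{
    uint64_t w = page / BITS_PER_LONG;
    unsigned long bit = 1UL << (page % BITS_PER_LONG);

    if (ballooned) {
        bb->map[w] |= bit;
    } else {
        bb->map[w] &= ~bit;
    }
    bb->touched[w] |= bit;   /* the dest now owns this page's state */
}

/* Merge the bitmap received from the source: take the source's view for
 * pages the dest has not touched yet, keep the dest's (newer) view for
 * pages it has. */
void merge_source_bitmap(BalloonBitmap *bb, const unsigned long *src)
{
    for (size_t i = 0; i < bb->nwords; i++) {
        bb->map[i] = (src[i] & ~bb->touched[i]) | (bb->map[i] & bb->touched[i]);
    }
}

With something like this, balloon operations on the dest would not need to be
blocked at all; they would only mark their pages as touched so that the later
merge cannot undo them.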
Thanks,
- Jitendra
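P.S. For reference, the source-side hot path described in the patch text above
reduces to a per-page bitmap test ahead of the zero-page scan. A simplified,
self-contained sketch only; the helper and constant names here are made up,
and the actual patch hooks into ram_find_and_save_block():

/* Simplified sketch, not the actual patch code. */
#include <stdbool.h>
#include <stdint.h>

#define BALLOON_PFN_SHIFT 12                       /* stand-in for VIRTIO_BALLOON_PFN_SHIFT */
#define BITS_PER_LONG (sizeof(unsigned long) * 8)

/* True if the ram page at this offset was ballooned out (bit set in bitmap). */
bool balloon_bitmap_test(const unsigned long *bitmap, uint64_t ram_offset)
{
    uint64_t page = ram_offset >> BALLOON_PFN_SHIFT;

    return bitmap[page / BITS_PER_LONG] & (1UL << (page % BITS_PER_LONG));
}

/* In the migration loop the idea is then simply:
 *
 *     if (balloon_bitmap_test(balloon_bitmap, offset)) {
 *         continue;   // skip the is_zero_range() scan and the control traffic
 *     }
 *     ... existing ram_save_page() / save_zero_page() path ...
 */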