From: "Denis V. Lunev"
Date: Mon, 28 Mar 2016 13:36:33 +0300
Subject: Re: [Qemu-devel] [PATCH v2] migration: skip sending ram pages released by virtio-balloon driver.
To: Jitendra Kolhe , qemu-devel@nongnu.org
Cc: JBottomley@Odin.com, ehabkost@redhat.com, crosthwaite.peter@gmail.com, simhan@hpe.com, quintela@redhat.com, armbru@redhat.com, lcapitulino@redhat.com, borntraeger@de.ibm.com, mst@redhat.com, mohan_parthasarathy@hpe.com, stefanha@redhat.com, amit.shah@redhat.com, pbonzini@redhat.com, dgilbert@redhat.com, rth@twiddle.net
Message-ID: <56F90931.8070006@openvz.org>
In-Reply-To: <1459138565-6244-1-git-send-email-jitendra.kolhe@hpe.com>

On 03/28/2016 07:16 AM, Jitendra Kolhe wrote:
> While measuring live migration performance for a qemu/kvm guest, it
> was observed that qemu does not maintain any intelligence about
> guest ram pages released by the guest balloon driver and treats
> such pages like any other normal guest ram pages. This has a direct
> impact on the overall migration time for a guest which has released
> (ballooned out) memory to the host.
>
> On large systems, where we can configure guests with 1TB of memory
> and a considerable amount of memory is released by the balloon driver
> to the host, the migration time gets even worse.
>
> The solution proposed below is local to qemu (and does not require
> any modification to the Linux kernel or any guest driver). We have
> verified the fix for large guests (1TB) on HPE Superdome X (which can
> support up to 240 cores and 12TB of memory); in the case where 90% of
> memory is released by the balloon driver, the migration time for an
> idle guest reduces from ~1200 secs to ~600 secs.
>
> Detail: During live migration, as part of the 1st iteration,
> ram_save_iterate() -> ram_find_and_save_block() will try to migrate
> ram pages which have been released by the virtio-balloon driver as
> part of dynamic memory delete. Even though the pages returned to the
> host by the virtio-balloon driver are zero pages, the migration
> algorithm will still end up scanning the entire page in
> ram_find_and_save_block() -> ram_save_page/ram_save_compressed_page
> -> save_zero_page() -> is_zero_range(). We also end up sending some
> control information over the network for these pages during
> migration. This adds to the total migration time.
>
> The proposed fix uses the existing bitmap infrastructure to create
> a virtio-balloon bitmap. Each bit in the bitmap represents a guest
> ram page of size 1UL << VIRTIO_BALLOON_PFN_SHIFT. The bitmap covers
> the entire guest ram memory up to the maximum configured memory.
> Guest ram pages claimed by the virtio-balloon driver are represented
> by 1 in the bitmap. During live migration, each guest ram page (host
> VA offset) is checked against the virtio-balloon bitmap; if the bit
> is set, the corresponding ram page is excluded from scanning and from
> sending control information during migration. The bitmap is also
> migrated to the target as part of every ram_save_iterate loop, and
> after the guest is stopped the remaining balloon bitmap is migrated
> as part of the balloon driver save/load interface.
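For reference, here is a minimal self-contained sketch of the bitmap
bookkeeping described above. It is illustrative only, not the actual
patch: BALLOON_PFN_SHIFT stands in for VIRTIO_BALLOON_PFN_SHIFT, and
balloon_bitmap_init(), balloon_bitmap_update() and
balloon_page_is_ballooned() are made-up names, not existing QEMU APIs.
One bit covers each balloon-sized page; it is set on inflate, cleared
on deflate, and tested by the migration loop before scanning a page.

/* Sketch only: names are made up for illustration, not QEMU APIs. */
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#define BALLOON_PFN_SHIFT  12   /* stands in for VIRTIO_BALLOON_PFN_SHIFT */
#define BITS_PER_ULONG     (8 * sizeof(unsigned long))

static unsigned long *balloon_bitmap;   /* covers max configured guest ram */
static uint64_t balloon_bitmap_bits;

/* Allocate one bit per balloon-sized page of the maximum configured ram. */
static void balloon_bitmap_init(uint64_t max_ram_bytes)
{
    balloon_bitmap_bits = max_ram_bytes >> BALLOON_PFN_SHIFT;
    balloon_bitmap = calloc((balloon_bitmap_bits + BITS_PER_ULONG - 1) /
                            BITS_PER_ULONG, sizeof(unsigned long));
}

/* Called from the balloon device: mark a page as ballooned out (inflate)
 * or as returned to the guest (deflate). */
static void balloon_bitmap_update(uint64_t ram_offset, bool inflate)
{
    uint64_t bit = ram_offset >> BALLOON_PFN_SHIFT;

    if (inflate) {
        balloon_bitmap[bit / BITS_PER_ULONG] |= 1UL << (bit % BITS_PER_ULONG);
    } else {
        balloon_bitmap[bit / BITS_PER_ULONG] &= ~(1UL << (bit % BITS_PER_ULONG));
    }
}

/* Called from the ram migration loop: a set bit means the page can be
 * skipped instead of being scanned for zeroes and sent. */
static bool balloon_page_is_ballooned(uint64_t ram_offset)
{
    uint64_t bit = ram_offset >> BALLOON_PFN_SHIFT;

    return balloon_bitmap[bit / BITS_PER_ULONG] &
           (1UL << (bit % BITS_PER_ULONG));
}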
> With the proposed fix, the average migration time for an idle guest
> with 1TB maximum memory and 64 vCpus:
> - reduces from ~1200 secs to ~600 secs, with guest memory ballooned
>   down to 128GB (~10% of 1TB);
> - reduces from ~1300 secs to ~1200 secs (7%), with guest memory
>   ballooned down to 896GB (~90% of 1TB);
> - with no ballooning configured, we don't expect to see any impact
>   on the total migration time.
>
> The optimization gets temporarily disabled if a balloon operation is
> in progress. Since the optimization skips scanning and migrating
> control information for ballooned-out pages, we might skip guest ram
> pages in cases where the guest balloon driver has freed a ram page
> back to the guest but has not yet informed the host/qemu about it
> (VIRTIO_BALLOON_F_MUST_TELL_HOST). In such a case the optimization
> might skip migrating ram pages which the guest is actually using.
> Since this problem is specific to the balloon leak (deflate) path,
> the "balloon operation in progress" check can be restricted to a
> "balloon leak operation in progress" check.
>
> The optimization also gets permanently disabled (for all subsequent
> migrations) in case any migration uses the postcopy capability. In
> the postcopy case, the balloon bitmap would need to be sent after
> vm_stop, which has a significant impact on the downtime. Moreover,
> since applications in the guest won't actually be faulting on ram
> pages which are already ballooned out, the proposed optimization
> would not show any improvement in migration time during postcopy.

I think that you could start the guest without any knowledge of the
ballooned pages, and that would be completely fine, as these pages will
not be touched at all by the guest. They only come into play when the
host deflates the balloon a bit, and in that case QEMU can safely give
the guest a locally zeroed page.

Thus you could send the bitmap of ballooned pages once the guest on the
destination side has been started, and in that case it would not
influence the downtime.

Den
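As a rough illustration of that idea (the helper below is hypothetical,
not an existing QEMU interface): the destination starts running right
after switchover, the ballooned pages were never sent and are still
untouched anonymous memory, and the bitmap is only applied afterwards,
off the downtime path, for example to give the corresponding host memory
back. A later deflate then simply faults in fresh zeroed pages.

#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/* Hypothetical destination-side helper, run only after the guest has
 * already been started, so it stays off the downtime path. */
static void balloon_bitmap_apply_on_dest(void *guest_ram_base,
                                         const unsigned long *bitmap,
                                         uint64_t nr_balloon_pages,
                                         size_t balloon_page_size)
{
    const unsigned bits = 8 * sizeof(unsigned long);

    for (uint64_t i = 0; i < nr_balloon_pages; i++) {
        if (bitmap[i / bits] & (1UL << (i % bits))) {
            /* The page was ballooned out on the source and never sent;
             * drop its backing on the destination as well.  If the
             * guest later gets it back via deflate, it will fault in a
             * fresh zeroed page. */
            madvise((char *)guest_ram_base + i * balloon_page_size,
                    balloon_page_size, MADV_DONTNEED);
        }
    }
}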