References: <1459138565-6244-1-git-send-email-jitendra.kolhe@hpe.com> <56F90931.8070006@openvz.org>
From: Jitendra Kolhe
Message-ID: <56FA6845.1070202@hpe.com>
Date: Tue, 29 Mar 2016 17:04:29 +0530
In-Reply-To: <56F90931.8070006@openvz.org>
Subject: Re: [Qemu-devel] [PATCH v2] migration: skip sending ram pages released by virtio-balloon driver.
To: "Denis V. Lunev", qemu-devel@nongnu.org
Cc: JBottomley@Odin.com, ehabkost@redhat.com, crosthwaite.peter@gmail.com, simhan@hpe.com, quintela@redhat.com, armbru@redhat.com, lcapitulino@redhat.com, borntraeger@de.ibm.com, mst@redhat.com, mohan_parthasarathy@hpe.com, stefanha@redhat.com, amit.shah@redhat.com, pbonzini@redhat.com, dgilbert@redhat.com, rth@twiddle.net

On 3/28/2016 4:06 PM, Denis V. Lunev wrote:
> On 03/28/2016 07:16 AM, Jitendra Kolhe wrote:
>> While measuring live migration performance for a qemu/kvm guest, it was
>> observed that qemu doesn't maintain any intelligence for guest ram pages
>> which are released by the guest balloon driver, and treats such pages like
>> any other normal guest ram pages. This has a direct impact on the overall
>> migration time for a guest which has released (ballooned out) memory to
>> the host.
>>
>> In the case of large systems, where we can configure large guests with 1TB
>> and with a considerable amount of memory released by the balloon driver to
>> the host, the migration time gets worse.
>>
>> The solution proposed below is local to qemu only (and does not require
>> any modification to the Linux kernel or any guest driver). We have verified
>> the fix for large guests =1TB on HPE Superdome X (which can support up to
>> 240 cores and 12TB of memory); in the case where 90% of memory is released
>> by the balloon driver, the migration time for an idle guest reduces to
>> ~600 secs from ~1200 secs.
>>
>> Detail: During live migration, as part of the 1st iteration,
>> ram_save_iterate() -> ram_find_and_save_block() will try to migrate ram
>> pages which were released by the virtio-balloon driver as part of dynamic
>> memory delete. Even though the pages which are returned to the host by the
>> virtio-balloon driver are zero pages, the migration algorithm will still
>> end up scanning each entire page: ram_find_and_save_block() ->
>> ram_save_page/ram_save_compressed_page -> save_zero_page() ->
>> is_zero_range(). We also end up sending some control information over the
>> network for these pages during migration. This adds to the total migration
>> time.
>>
>> The proposed fix uses the existing bitmap infrastructure to create a
>> virtio-balloon bitmap. Each bit in the bitmap represents a guest ram page
>> of size 1UL << VIRTIO_BALLOON_PFN_SHIFT. The bitmap covers the entire
>> guest ram memory up to the maximum configured memory. Guest ram pages
>> claimed by the virtio-balloon driver are represented by 1 in the bitmap.
>> During live migration, each guest ram page (host VA offset) is checked
>> against the virtio-balloon bitmap; if the bit is set, the corresponding
>> ram page is excluded from scanning and from sending control information
>> during migration. The bitmap is also migrated to the target as part of
>> every ram_save_iterate loop, and after the guest is stopped the remaining
>> balloon bitmap is migrated as part of the balloon driver save / load
>> interface.
>>
>> With the proposed fix, the average migration time for an idle guest with
>> 1TB maximum memory and 64 vCPUs:
>> - reduces from ~1200 secs to ~600 secs, with guest memory ballooned down
>>   to 128GB (~10% of 1TB).
>> - reduces from ~1300 secs to ~1200 secs (7%), with guest memory ballooned
>>   down to 896GB (~90% of 1TB).
>> - with no ballooning configured, we don't expect to see any impact on
>>   total migration time.
>>
>> The optimization gets temporarily disabled if a balloon operation is in
>> progress. Since the optimization skips scanning and migrating control
>> information for ballooned-out pages, we might skip guest ram pages in
>> cases where the guest balloon driver has freed a ram page to the guest
>> but not yet informed the host/qemu about it
>> (VIRTIO_BALLOON_F_MUST_TELL_HOST). In such a case, with the optimization,
>> we might skip migrating ram pages which the guest is using. Since this
>> problem is specific to balloon leak, we can restrict the
>> balloon-operation-in-progress check to a balloon-leak-in-progress check
>> only.
>>
>> The optimization also gets permanently disabled (for all subsequent
>> migrations) in case any of the migrations uses the postcopy capability.
>> In the case of postcopy, the balloon bitmap would have to be sent after
>> vm_stop, which has a significant impact on the downtime. Moreover, since
>> applications in the guest won't actually be faulting on ram pages which
>> are already ballooned out, the proposed optimization will not show any
>> improvement in migration time during postcopy.
> I think that you could start the guest without the knowledge of the
> ballooned pages and that would be completely fine, as these pages
> will not be touched at all by the guest. They will come into play
> when the host deflates the balloon a bit. In this case QEMU can safely
> give a local zeroed page to the guest.
>
> Thus you could send the bitmap of ballooned pages once the guest
> on the dest side has been started, and in this case it will not influence
> downtime.
>
> Den

We too were more inclined towards the same approach, to have absolutely zero
impact on downtime, but had a concern (below) about how to approach it.

Would it be safe to let the guest start (without the balloon bitmap) on the
dest even when a balloon operation is in progress (especially deflate) during
migration? In that case we would either need to merge the source and dest
balloon bitmaps, which would in turn require keeping track of which offsets
were updated on the dest side (rough sketch below), or block balloon
operations on the dest till the source balloon bitmap is completely merged
with the dest's.
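Something along these lines is what we have in mind for the dest-side merge.
This is only an illustrative, standalone C sketch with made-up names, not
patch code; the actual implementation would use qemu's existing bitmap
helpers:

/* Standalone sketch, not patch code: merge a late-arriving source balloon
 * bitmap into the dest one without losing balloon operations that already
 * happened on the dest after the guest was started there. */
#include <stdint.h>
#include <stddef.h>

#define BITS_PER_LONG (sizeof(unsigned long) * 8)

typedef struct {
    unsigned long *map;      /* current dest view: bit set = page ballooned out */
    unsigned long *touched;  /* pages the dest balloon code has changed so far  */
    size_t nwords;           /* words in each bitmap                            */
} BalloonBitmap;

/* Record an inflate/deflate that happened on the dest before the merge. */
void dest_balloon_update(BalloonBitmap *bb, uint64_t page, int ballooned)
{
    uint64_t w = page / BITS_PER_LONG;
    unsigned long bit = 1UL << (page % BITS_PER_LONG);

    if (ballooned) {
        bb->map[w] |= bit;
    } else {
        bb->map[w] &= ~bit;
    }
    bb->touched[w] |= bit;   /* the dest now owns this page's state */
}

/* Merge the bitmap received from the source: take the source's view for
 * pages the dest has not touched yet, keep the dest's (newer) view for
 * pages it has. */
void merge_source_bitmap(BalloonBitmap *bb, const unsigned long *src)
{
    for (size_t i = 0; i < bb->nwords; i++) {
        bb->map[i] = (src[i] & ~bb->touched[i]) | (bb->map[i] & bb->touched[i]);
    }
}

With something like this, balloon operations on the dest would not need to be
blocked at all; they would only mark their pages as touched so that the later
merge cannot undo them.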
Thanks,
- Jitendra
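P.S. For reference, the source-side hot path described in the patch text above
reduces to a per-page bitmap test ahead of the zero-page scan. A simplified,
self-contained sketch only; the helper and constant names here are made up,
and the actual patch hooks into ram_find_and_save_block():

/* Simplified sketch, not the actual patch code. */
#include <stdbool.h>
#include <stdint.h>

#define BALLOON_PFN_SHIFT 12                       /* stand-in for VIRTIO_BALLOON_PFN_SHIFT */
#define BITS_PER_LONG (sizeof(unsigned long) * 8)

/* True if the ram page at this offset was ballooned out (bit set in bitmap). */
bool balloon_bitmap_test(const unsigned long *bitmap, uint64_t ram_offset)
{
    uint64_t page = ram_offset >> BALLOON_PFN_SHIFT;

    return bitmap[page / BITS_PER_LONG] & (1UL << (page % BITS_PER_LONG));
}

/* In the migration loop the idea is then simply:
 *
 *     if (balloon_bitmap_test(balloon_bitmap, offset)) {
 *         continue;   // skip the is_zero_range() scan and the control traffic
 *     }
 *     ... existing ram_save_page() / save_zero_page() path ...
 */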