From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
To: "Li, Liang Z" <liang.z.li@intel.com>
Cc: "ehabkost@redhat.com" <ehabkost@redhat.com>,
"kvm@vger.kernel.org" <kvm@vger.kernel.org>,
"mst@redhat.com" <mst@redhat.com>,
"simhan@hpe.com" <simhan@hpe.com>,
"quintela@redhat.com" <quintela@redhat.com>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"qemu-devel@nongnu.org" <qemu-devel@nongnu.org>,
"jitendra.kolhe@hpe.com" <jitendra.kolhe@hpe.com>,
"linux-mm@kvack.org" <linux-mm@kvack.org>,
"mohan_parthasarathy@hpe.com" <mohan_parthasarathy@hpe.com>,
Amit Shah <amit.shah@redhat.com>,
"pbonzini@redhat.com" <pbonzini@redhat.com>,
"akpm@linux-foundation.org" <akpm@linux-foundation.org>,
"virtualization@lists.linux-foundation.org"
<virtualization@lists.linux-foundation.org>,
"rth@twiddle.net" <rth@twiddle.net>
Subject: Re: [Qemu-devel] [RFC qemu 0/4] A PV solution for live migration optimization
Date: Mon, 14 Mar 2016 17:03:34 +0000
Message-ID: <20160314170334.GK2234@work-vm>
In-Reply-To: <F2CBF3009FA73547804AE4C663CAB28E0414B118@shsmsx102.ccr.corp.intel.com>
* Li, Liang Z (liang.z.li@intel.com) wrote:
> >
> > Hi,
> > I'm just catching back up on this thread; so without reference to any
> > particular previous mail in the thread.
> >
> > 1) How many of the free pages do we tell the host about?
> > Your main change is telling the host about all the
> > free pages.
>
> Yes, all the guest's free pages.
>
> > If we tell the host about all the free pages, then we might
> > end up needing to allocate more pages and update the host
> > with pages we now want to use; that would have to wait for the
> > host to acknowledge that use of these pages, since if we don't
> > wait for it then it might have skipped migrating a page we
> > just started using (I don't understand how your series solves that).
> > So the guest probably needs to keep some free pages - how many?
>
> Actually, there is no need to care about whether the free pages will be used by the host.
> We only care about whether some of the free pages we get are reused by the guest, right?
>
> Dirty page logging can be used to solve this, by starting the dirty page logging before getting
> the free page information from the guest. Even if some of the free pages are modified by the guest
> while the free page information is being collected, these modified pages will be tracked
> by the dirty page logging mechanism. So in the following migration_bitmap_sync() call,
> the pages that are in the free page bitmap but were later modified will be reset to dirty. We won't
> omit any dirtied pages.
>
> So, the guest doesn't need to keep any free pages.
OK, yes, that works; so we do:
* enable dirty logging
* ask guest for free pages
* initialise the migration bitmap as everything minus the free pages
* then later we do the normal sync-dirty bitmap stuff and it all just works.
That's nice and simple.
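As a toy model of the ordering above (a sketch only, not QEMU code: the function names, the list-based bitmap and the set-based dirty log are all made up for illustration), a page the guest reuses after reporting it free is put back by the dirty sync:

```python
def initial_migration_bitmap(num_pages, free_pages):
    """One bool per page: True means 'must be sent'. Free pages start cleared."""
    to_send = [True] * num_pages
    for pfn in free_pages:
        to_send[pfn] = False
    return to_send


def sync_dirty(to_send, dirty_log):
    """OR the dirty log into the migration bitmap, then clear the log."""
    for pfn in dirty_log:
        to_send[pfn] = True
    dirty_log.clear()
    return to_send


# 1. Dirty logging is enabled first (hypothetical stand-in: an empty set).
dirty_log = set()
# 2. The guest reports pages 2, 3 and 5 as free...
free_pages = {2, 3, 5}
#    ...but starts using page 3 again before the report is consumed.
dirty_log.add(3)
# 3. Initialise the bitmap as everything minus the free pages.
bitmap = initial_migration_bitmap(8, free_pages)
# 4. The normal dirty sync restores page 3.
bitmap = sync_dirty(bitmap, dirty_log)

skipped = [pfn for pfn, send in enumerate(bitmap) if not send]
print(skipped)  # [2, 5]
```

Only the pages that stayed free are skipped; if dirty logging were enabled after step 2 instead, page 3 would be lost.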
> > 2) Clearing out caches
> > Does it make sense to clean caches? They're apparently useful data
> > so if we clean them it's likely to slow the guest down; I guess
> > they're also likely to be fairly static data - so at least fairly
> > easy to migrate.
> > The answer here partially depends on what you want from your migration;
> > if you're after the fastest possible migration time it might make
> > sense to clean the caches and avoid migrating them; but that might
> > be at the cost of more disruption to the guest - there's a trade off
> > somewhere and it's not clear to me how you set that depending on your
> > guest/network/requirements.
> >
>
> Yes, cleaning the caches is an option. Let the users decide whether to use it or not.
>
> > 3) Why is ballooning slow?
> > You've got a figure of 5s to balloon on an 8GB VM - but an
> > 8GB VM isn't huge; so I worry about how long it would take
> > on a big VM. We need to understand why it's slow
> > * is it due to the guest shuffling pages around?
> > * is it due to the virtio-balloon protocol sending one page
> > at a time?
> > + Do balloon pages normally clump in physical memory
> > - i.e. would a 'large balloon' message help
> > - or do we need a bitmap because it tends not to clump?
> >
>
> I didn't do a comprehensive test. But I found most of the time is spent
> on allocating the pages and sending the PFNs to guest; I don't know which is
> the most time-consuming operation, allocating the pages or sending the PFNs.
It might be a good idea to analyse it a bit more to convince people where
the problem is.
> > * is it due to the madvise on the host?
> > If we were using the normal balloon messages, then we
> > could, during migration, just route those to the migration
> > code rather than bothering with the madvise.
> > If they're clumping together we could just turn that into
> > one big madvise; if they're not then would we benefit from
> > a call that lets us madvise lots of areas?
> >
>
> My test showed madvise() is not the main reason for the long time; it only takes
> 10% of the total balloon inflation time.
> A big madvise can more or less improve the performance.
OK; 10% of the total is still pretty big even for your 8GB VM.
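One way to check whether the balloon PFNs clump is to coalesce them into contiguous runs and look at how many runs come out; each run would correspond to a single madvise() range. A minimal sketch, assuming the PFNs are available as a plain list of integers (coalesce_pfns is a hypothetical helper, not an existing QEMU or kernel function):

```python
def coalesce_pfns(pfns):
    """Merge page frame numbers into sorted (first_pfn, page_count) runs."""
    runs = []
    for pfn in sorted(set(pfns)):
        if runs and runs[-1][0] + runs[-1][1] == pfn:
            # pfn directly follows the previous run: extend that run by one page
            first, count = runs[-1]
            runs[-1] = (first, count + 1)
        else:
            # gap before this pfn: start a new run
            runs.append((pfn, 1))
    return runs


print(coalesce_pfns([7, 1, 2, 3, 8, 10]))  # [(1, 3), (7, 2), (10, 1)]
```

If the number of runs is far smaller than the number of pages, a range-based message (or one big madvise per run) looks attractive; if it is close to the number of pages, a bitmap is probably the better fit.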
> > 4) Speeding up the migration of those free pages
> > You're using the bitmap to avoid migrating those free pages; HPe's
> > patchset is reconstructing a bitmap from the balloon data; OK, so
> > this all makes sense to avoid migrating them - I'd also been thinking
> > of using pagemap to spot zero pages that would help find other zero'd
> > pages, but perhaps ballooned is enough?
> >
> Could you describe your idea in more detail?
At the moment the migration code spends a fair amount of time checking if a page
is zero; I was thinking perhaps QEMU could just open /proc/self/pagemap
and check if the page was mapped; that would seem cheap if we're checking big
ranges; and that would find all the balloon pages.
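Roughly what I have in mind, as a sketch rather than a patch: each page has one 64-bit entry in pagemap, with bit 63 meaning "present in RAM" and bit 62 meaning "swapped". The helper names below are made up, the address is assumed to be page aligned, and treating "neither present nor swapped" as "never touched, so reads as zero" is my assumption for anonymous private memory:

```python
import struct

PAGEMAP_ENTRY_BYTES = 8
PRESENT_BIT = 1 << 63  # page is present in RAM
SWAPPED_BIT = 1 << 62  # page is in swap


def entry_is_backed(entry):
    """True if a 64-bit pagemap entry says the page is in RAM or in swap."""
    return bool(entry & (PRESENT_BIT | SWAPPED_BIT))


def backed_pages(vaddr, length, page_size, pagemap_path="/proc/self/pagemap"):
    """Return one bool per page in [vaddr, vaddr + length): is it backed?

    vaddr is assumed to be page aligned. The whole range is fetched with a
    single seek and read, which is what makes checking big ranges cheap.
    """
    first_page = vaddr // page_size
    page_count = (length + page_size - 1) // page_size
    with open(pagemap_path, "rb", buffering=0) as f:
        f.seek(first_page * PAGEMAP_ENTRY_BYTES)
        raw = f.read(page_count * PAGEMAP_ENTRY_BYTES)
    # Entries are native-endian unsigned 64-bit integers.
    entries = struct.unpack("=%dQ" % (len(raw) // PAGEMAP_ENTRY_BYTES), raw)
    return [entry_is_backed(e) for e in entries]
```

On Linux, calling backed_pages() with the start address and size of a RAM block would give a per-page list, and pages that come back False could be skipped without reading them. The pagemap_path parameter is only there so the parsing can be exercised against an ordinary file.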
> > 5) Second-migrate
> > Given a VM on which you've done all those tricks, what happens when
> > you migrate it a second time? I guess you're aiming for the guest
> > to update its bitmap; HPe's solution is to migrate its balloon
> > bitmap along with the migration data.
>
> Nothing is special in the second migration: QEMU will ask the guest for free page
> information, and the guest will traverse its current free page list to construct a
> new free page bitmap and send it to QEMU, just like in the first migration.
Right.
Dave
> Liang
> >
> > Dave
> >
> > --
> > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK