From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
To: "Daniel P. Berrange" <berrange@redhat.com>
Cc: qemu-devel@nongnu.org, Juan Quintela, Amit Shah
Date: Mon, 9 May 2016 16:36:58 +0100
Message-ID: <20160509153658.GI2320@work-vm>
In-Reply-To: <20160509150156.GC14467@redhat.com>
References: <1462458480-20555-1-git-send-email-berrange@redhat.com>
 <20160505153945.GC11787@work-vm>
 <20160509150156.GC14467@redhat.com>
Subject: Re: [Qemu-devel] [PATCH v1 0/6] A migration performance testing framework

* Daniel P. Berrange (berrange@redhat.com) wrote:
> On Thu, May 05, 2016 at 04:39:45PM +0100, Dr. David Alan Gilbert wrote:
> > * Daniel P. Berrange (berrange@redhat.com) wrote:
> > > Some interesting things that I have observed with this:
> > >
> > > - Post-copy, by its very nature, obviously ensured that the migration
> > >   would complete. While post-copy was running in pre-copy mode there was
> > >   a somewhat chaotic small impact on guest CPU performance, causing
> > >   performance to oscillate periodically between 400ms/GB and 800ms/GB.
> > >   This is less than the impact at the start of each migration iteration,
> > >   which was 1000ms/GB in this test. There was also a massive penalty at
> > >   the time of switchover from pre- to post-copy, as is to be expected.
> > >   The migration completed quite quickly in the post-copy phase, though.
> > >   For this workload, the number of pre-copy iterations before switching
> > >   to post-copy did not have much impact. I expect a less extreme workload
> > >   would have shown more interesting results wrt the number of pre-copy
> > >   iterations:
> > >
> > >   https://berrange.fedorapeople.org/qemu-mig-test-2016-05-05/tcp-remote-8gb-4cpu/post-copy-iters.html
> >
> > Hmm; I hadn't actually expected that much performance difference during
> > the pre-copy phase (earlier post-copy versions did have that cost, but
> > the later versions should have become simpler). The number of iterations
> > wouldn't make much difference for your workload: because you're changing
> > all of memory, we're going to have to resend it all anyway. If you had a
> > workload where some of the memory was mostly static and some was rapidly
> > changing, then one or two passes to transfer the mostly static data would
> > show a benefit.
>
> Ok, so I have repeated the tests with a standard kernel. I also measured
> the exact same settings without post-copy active, and I see the exact
> same magnitude of jitter without post-copy. IOW, this is not the fault
> of post-copy; it's a factor whenever migration is running.

OK, good.

> What is most interesting is that I see greater jitter in guest
> performance the higher the overall network transfer bandwidth is, i.e.
> with migration throttled to 100Mb/s, the jitter is massively smaller
> than when it is allowed to use 10Gb/s.

That doesn't surprise me, for a few reasons; I think there are three main
sources of overhead:

  a) the syncing of the dirty bitmap;
  b) write faults after a sync, when the guest redirties a page;
  c) the CPU overhead of shuffling pages down a socket and checking
     whether they're zero, etc.

With a lower-bandwidth connection, (a) happens more rarely and (c) is
lower. Also, since (a) happens more rarely, and you only fault on a page
once between syncs, (b) has a lower overhead.
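
To make that concrete, the shape of the pre-copy loop is roughly the
following (a sketch only - this is not the actual migration_thread()
code, and the helper names sync_dirty_bitmap(), test_and_clear_dirty(),
page_addr() and send_page() are made up for illustration; buffer_is_zero()
is the one real QEMU function used):

    /* One pre-copy cycle; (a), (b), (c) mark the overhead sources above */
    for (;;) {
        /*
         * (a) Pull the dirty log from the kernel.  Every page synced is
         *     write-protected again, so the guest takes a fault the first
         *     time it redirties each page afterwards - that's (b).
         */
        pending = sync_dirty_bitmap();

        if (pending * page_size <= bandwidth * max_downtime) {
            /* Small enough to send within the allowed downtime: pause
             * the guest and send the remainder, or switch to post-copy. */
            break;
        }

        for (long p = 0; p < npages; p++) {
            if (!test_and_clear_dirty(p)) {
                continue;
            }
            /* (c) Per-page cost: zero detection plus socket I/O. */
            if (!buffer_is_zero(page_addr(p), page_size)) {
                send_page(sock, p);
            }
        }
    }

For anyone wanting to reproduce the throttling comparison, something like
this HMP sequence should do it (2016-era commands; note migrate_set_speed
takes bytes/sec, so 100m is ~100MB/s rather than 100Mb/s, and
desthost:4444 is a placeholder):

    (qemu) migrate_set_capability postcopy-ram on
    (qemu) migrate_set_speed 100m
    (qemu) migrate -d tcp:desthost:4444
    (qemu) migrate_start_postcopy
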
> Also, I only see the jitter on my 4 vCPU guest, not the 1 vCPU guest.
>
> The QEMU process is confined to only run on 4 pCPUs, so I believe the
> cause of this jitter is simply the migration thread in QEMU stealing a
> little time from the vCPU threads.

Oh yes, that's cruel - you need an extra pCPU for migration if you've
got a fast network connection, because (a) and (c) are quite expensive.

> IOW, completely expected, and there is no penalty to having post-copy
> enabled even if you never get beyond the pre-copy stage :-)

Great.

Dave

> Regards,
> Daniel
> --
> |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
> |: http://libvirt.org -o- http://virt-manager.org :|
> |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
> |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK