From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
To: "Daniel P. Berrange" <berrange@redhat.com>
Cc: qemu-devel@nongnu.org, Juan Quintela, Amit Shah
Date: Mon, 9 May 2016 16:36:58 +0100
Message-ID: <20160509153658.GI2320@work-vm>
In-Reply-To: <20160509150156.GC14467@redhat.com>
References: <1462458480-20555-1-git-send-email-berrange@redhat.com>
 <20160505153945.GC11787@work-vm>
 <20160509150156.GC14467@redhat.com>
Subject: Re: [Qemu-devel] [PATCH v1 0/6] A migration performance testing framework

* Daniel P. Berrange (berrange@redhat.com) wrote:
> On Thu, May 05, 2016 at 04:39:45PM +0100, Dr. David Alan Gilbert wrote:
> > * Daniel P. Berrange (berrange@redhat.com) wrote:
> > > Some interesting things that I have observed with this:
> > >
> > > - Post-copy, by its very nature, obviously ensured that the migration
> > >   would complete. While post-copy was running in pre-copy mode there was
> > >   a somewhat chaotic small impact on guest CPU performance, causing
> > >   performance to oscillate periodically between 400ms/GB and 800ms/GB.
> > >   This is less than the impact at the start of each migration iteration,
> > >   which was 1000ms/GB in this test. There was also a massive penalty at
> > >   the time of switchover from pre- to post-copy, as is to be expected.
> > >   The migration completed quite quickly in the post-copy phase, though.
> > >   For this workload, the number of pre-copy iterations before switching
> > >   to post-copy did not have much impact. I expect a less extreme workload
> > >   would have shown more interesting results wrt the number of pre-copy
> > >   iterations:
> > >
> > >   https://berrange.fedorapeople.org/qemu-mig-test-2016-05-05/tcp-remote-8gb-4cpu/post-copy-iters.html
> >
> > Hmm; I hadn't actually expected that much performance difference during
> > the pre-copy phase (earlier post-copy versions did have that cost, but
> > the later versions should have become simpler). The number of iterations
> > wouldn't make much difference for your workload: because you're changing
> > all of memory, we're going to have to resend it all anyway. If you had a
> > workload where some of the memory was mostly static and some was rapidly
> > changing, then one or two passes to transfer the mostly static data would
> > show a benefit.
>
> Ok, so I have repeated the tests with a standard kernel. I also measured
> the exact same settings without post-copy active, and I see the exact
> same magnitude of jitter without post-copy. IOW, this is not the fault
> of post-copy; it's a factor whenever migration is running.

OK, good.

> What is most interesting is that I see greater jitter in guest
> performance the higher the overall network transfer bandwidth is, i.e.
> with migration throttled to 100Mb/s, the jitter is massively smaller
> than when it is allowed to use 10Gb/s.

That doesn't surprise me, for a few reasons; I think there are three main
sources of overhead:

  a) the syncing of the dirty bitmap;
  b) write faults after a sync, when the guest redirties a page;
  c) the CPU overhead of shuffling pages down a socket and checking
     whether they're zero, etc.

With a lower-bandwidth connection, (a) happens more rarely and (c) is
lower. Also, since (a) happens more rarely, and you only fault on a page
once between syncs, (b) has a lower overhead.
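
To make that concrete, the shape of the pre-copy loop is roughly the
following (a sketch only - this is not the actual migration_thread()
code, and the helper names sync_dirty_bitmap(), test_and_clear_dirty(),
page_addr() and send_page() are made up for illustration; buffer_is_zero()
is the one real QEMU function used):

    /* One pre-copy cycle; (a), (b), (c) mark the overhead sources above */
    for (;;) {
        /*
         * (a) Pull the dirty log from the kernel.  Every page synced is
         *     write-protected again, so the guest takes a fault the first
         *     time it redirties each page afterwards - that's (b).
         */
        pending = sync_dirty_bitmap();

        if (pending * page_size <= bandwidth * max_downtime) {
            /* Small enough to send within the allowed downtime: pause
             * the guest and send the remainder, or switch to post-copy. */
            break;
        }

        for (long p = 0; p < npages; p++) {
            if (!test_and_clear_dirty(p)) {
                continue;
            }
            /* (c) Per-page cost: zero detection plus socket I/O. */
            if (!buffer_is_zero(page_addr(p), page_size)) {
                send_page(sock, p);
            }
        }
    }

For anyone wanting to reproduce the throttling comparison, something like
this HMP sequence should do it (2016-era commands; note migrate_set_speed
takes bytes/sec, so 100m is ~100MB/s rather than 100Mb/s, and
desthost:4444 is a placeholder):

    (qemu) migrate_set_capability postcopy-ram on
    (qemu) migrate_set_speed 100m
    (qemu) migrate -d tcp:desthost:4444
    (qemu) migrate_start_postcopy
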
> Also, I only see the jitter on my 4 vCPU guest, not the 1 vCPU guest.
>
> The QEMU process is confined to only run on 4 pCPUs, so I believe the
> cause of this jitter is simply the migration thread in QEMU stealing a
> little time from the vCPU threads.

Oh yes, that's cruel - you need an extra pCPU for migration if you've
got a fast network connection, because (a) and (c) are quite expensive.

> IOW, completely expected, and there is no penalty to having post-copy
> enabled even if you never get beyond the pre-copy stage :-)

Great.

Dave

> Regards,
> Daniel
> --
> |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
> |: http://libvirt.org -o- http://virt-manager.org :|
> |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
> |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK