Date: Mon, 13 May 2013 13:33:27 +0100
From: "Daniel P. Berrange"
Message-ID: <20130513123327.GA32268@redhat.com>
References: <1368128600-30721-1-git-send-email-chegu_vinod@hp.com> <1368128600-30721-4-git-send-email-chegu_vinod@hp.com> <87y5bnc7a0.fsf@codemonkey.ws> <20130510141759.GO13475@redhat.com> <87li7mzxd6.fsf@codemonkey.ws>
In-Reply-To: <87li7mzxd6.fsf@codemonkey.ws>
Subject: Re: [Qemu-devel] [RFC PATCH v5 3/3] Force auto-convergence of live migration
To: Anthony Liguori
Cc: quintela@redhat.com, Chegu Vinod, qemu-devel@nongnu.org, owasserm@redhat.com, pbonzini@redhat.com

On Fri, May 10, 2013 at 10:08:05AM -0500, Anthony Liguori wrote:
> "Daniel P. Berrange" writes:
>
> > On Fri, May 10, 2013 at 08:07:51AM -0500, Anthony Liguori wrote:
> >> Chegu Vinod writes:
> >>
> >> > If a user chooses to turn on the auto-converge migration capability,
> >> > these changes detect the lack of convergence and throttle down the
> >> > guest, i.e. force the VCPUs out of the guest for some duration
> >> > and let the migration thread catch up and help convergence.
> >> >
> >> > Verified the convergence using the following:
> >> > - SpecJbb2005 workload running on a 20VCPU/256G guest (~80% busy)
> >> > - OLTP-like workload running on an 80VCPU/512G guest (~80% busy)
> >> >
> >> > Sample results with SpecJbb2005 workload: (migrate speed set to 20Gb and
> >> > migrate downtime set to 4 seconds).
> >>
> >> Would it make sense to separate out the "slow the VCPU down" part of
> >> this?
> >>
> >> That would give a management tool more flexibility to create policies
> >> around slowing the VCPU down to encourage migration.
> >>
> >> In fact, I wonder if we need anything in the migration path if we just
> >> expose the "slow the VCPU down" bit as a feature.
> >>
> >> Slowing the VCPU down is not quite the same as setting the priority of
> >> the VCPU thread, largely because of the QBL, so I recognize the need to
> >> have something for this in QEMU.
> >
> > Rather than the priority, could you perhaps do the VCPU slow-down
> > using the cfs_quota_us + cfs_period_us settings though? These let you
> > place hard caps on the scheduler time afforded to vCPUs, and we can
> > already control those via libvirt + cgroups.
>
> The problem with the bandwidth controller is the same as with priorities.
> You can end up causing lock holder pre-emption, which would negatively
> impact migration performance.
>
> It's far better for QEMU to voluntarily give up some time, knowing that
> it's not holding the QBL, since then migration can continue without
> impact.
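(For concreteness, the "hard cap" I'm referring to is just the CFS
bandwidth controller knobs. A rough sketch of driving them directly is
below; the cgroup mount point and the per-vCPU directory name are
illustrative assumptions, not the actual layout libvirt creates:)

    #!/usr/bin/env python3
    # Sketch only: cap a vCPU thread's runtime with the CFS bandwidth controller.
    # The cgroup path below is an assumption; the real hierarchy is whatever
    # libvirt / the management layer sets up for the guest's vCPU threads.

    VCPU_CGROUP = "/sys/fs/cgroup/cpu/machine/qemu-guest/vcpu0"

    def set_hard_cap(quota_us, period_us=100000):
        """Allow at most quota_us of CPU time per period_us,
        e.g. 20000/100000 caps the vCPU at ~20% of one host CPU."""
        with open(VCPU_CGROUP + "/cpu.cfs_period_us", "w") as f:
            f.write(str(period_us))
        with open(VCPU_CGROUP + "/cpu.cfs_quota_us", "w") as f:
            f.write(str(quota_us))

    if __name__ == "__main__":
        set_hard_cap(20000)   # throttle to roughly 20% of one CPU

(Writing -1 to cpu.cfs_quota_us removes the cap again, so a management
tool can tighten and relax the limit over the course of a migration
without any new interface in QEMU.)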
IMHO it'd be nice to get some clear benchmark numbers of just how big the
lock holder pre-emption problem is when using cgroup hard caps, before we
invent another mechanism for throttling the CPUs that has to be plumbed
into the whole stack.

Daniel
-- 
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|