From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Thu, 1 Jun 2017 16:08:54 +0100
From: "Dr. David Alan Gilbert"
Message-ID: <20170601150854.GD2845@work-vm>
References: <1495229390-18909-1-git-send-email-felipe@nutanix.com> <20170601143616.GA2845@work-vm> <9C902EF4-3E10-4326-8D28-412478005F56@nutanix.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <9C902EF4-3E10-4326-8D28-412478005F56@nutanix.com>
Subject: Re: [Qemu-devel] [PATCH] cpus: reset throttle_thread_scheduled after sleep
To: Felipe Franciosi
Cc: "Jason J. Herne" , Paolo Bonzini , Malcolm Crossley , Juan Quintela , "qemu-devel@nongnu.org"

* Felipe Franciosi (felipe@nutanix.com) wrote:
> 
> > On 1 Jun 2017, at 15:36, Dr. David Alan Gilbert wrote:
> > 
> > * Jason J. Herne (jjherne@linux.vnet.ibm.com) wrote:
> >> On 05/19/2017 05:29 PM, Felipe Franciosi wrote:
> >>> Currently, the throttle_thread_scheduled flag is reset back to 0 before
> >>> sleeping (as part of the throttling logic). Given that throttle_timer
> >>> (well, any timer) may tick with a slight delay, it so happens that under
> >>> heavy throttling (i.e. close to or at CPU_THROTTLE_PCT_MAX) the tick may
> >>> schedule a further cpu_throttle_thread() work item after the flag reset,
> >>> but before the previous sleep completed. This results in the vCPU thread
> >>> sleeping continuously for potentially several seconds in a row.
> >>>
> >>> The chances of that happening can be drastically minimised by resetting
> >>> the flag after the sleep.
> >>>
> >>> Signed-off-by: Felipe Franciosi
> >>> Signed-off-by: Malcolm Crossley
> >>> ---
> >>>  cpus.c | 2 +-
> >>>  1 file changed, 1 insertion(+), 1 deletion(-)
> >>>
> >>> diff --git a/cpus.c b/cpus.c
> >>> index 516e5cb..f42eebd 100644
> >>> --- a/cpus.c
> >>> +++ b/cpus.c
> >>> @@ -677,9 +677,9 @@ static void cpu_throttle_thread(CPUState *cpu, run_on_cpu_data opaque)
> >>>      sleeptime_ns = (long)(throttle_ratio * CPU_THROTTLE_TIMESLICE_NS);
> >>>
> >>>      qemu_mutex_unlock_iothread();
> >>> -    atomic_set(&cpu->throttle_thread_scheduled, 0);
> >>>      g_usleep(sleeptime_ns / 1000); /* Convert ns to us for usleep call */
> >>>      qemu_mutex_lock_iothread();
> >>> +    atomic_set(&cpu->throttle_thread_scheduled, 0);
> >>>  }
> >>>
> >>>  static void cpu_throttle_timer_tick(void *opaque)
> >>>
> >> 
> >> This seems to make sense to me.
> >> 
> >> Acked-by: Jason J. Herne
> >> 
> >> I'm CC'ing Juan, Amit and David as they are all active in the migration
> >> area and may have opinions on this. Juan and David were also reviewers
> >> for the original series.
> > 
> > The description is interesting and sounds reasonable; it'll be
> > interesting to see what difference it makes to the autoconverge
> > behaviour for those workloads that need this level of throttle.
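To make the ordering described in the commit message concrete, here is a small
standalone model of the interaction (an illustrative sketch only, not QEMU
code: the "scheduled" flag, the pending-work counter and the timings are
simplified stand-ins for throttle_thread_scheduled, the async_run_on_cpu()
work queue and the throttle timer):

/* throttle_race.c - model of the ordering problem; gcc -O2 -pthread */
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

#define TIMESLICE_US 1000   /* scaled-down throttle timeslice              */
#define PCT          0.99   /* 99% throttle, as in the report              */
#define DISPATCH_US  5000   /* latency between a tick and the sleep start  */
#define TICKS        30     /* timer ticks to simulate per run             */

static atomic_int scheduled;   /* models cpu->throttle_thread_scheduled    */
static atomic_int pending;     /* queued "cpu_throttle_thread" work items  */
static atomic_bool stop;

/* Models the timer tick: only queue work if none is already scheduled. */
static void *timer_fn(void *arg)
{
    long period_us = (long)(TIMESLICE_US / (1.0 - PCT));

    (void)arg;
    for (int i = 0; i < TICKS; i++) {
        if (!atomic_exchange(&scheduled, 1)) {
            atomic_fetch_add(&pending, 1);
        }
        usleep(period_us);
    }
    atomic_store(&stop, true);
    return NULL;
}

/* Models the vCPU thread draining queued throttle work. */
static void *vcpu_fn(void *arg)
{
    bool reset_before_sleep = *(bool *)arg;  /* true = unpatched ordering */
    long sleep_us = (long)(PCT / (1.0 - PCT) * TIMESLICE_US);
    int streak = 0, worst = 0;

    while (!atomic_load(&stop) || atomic_load(&pending) > 0) {
        if (atomic_load(&pending) > 0) {
            atomic_fetch_sub(&pending, 1);
            usleep(DISPATCH_US);             /* work-queue/lock latency   */
            if (reset_before_sleep) {
                atomic_store(&scheduled, 0); /* old order: race window    */
            }
            usleep(sleep_us);                /* the g_usleep() throttle   */
            if (!reset_before_sleep) {
                atomic_store(&scheduled, 0); /* patched order             */
            }
            if (++streak > worst) {
                worst = streak;
            }
        } else {
            streak = 0;                      /* the guest got to run      */
            usleep(100);
        }
    }
    printf("flag reset %s sleep: worst back-to-back sleep streak = %d\n",
           reset_before_sleep ? "before" : "after", worst);
    return NULL;
}

int main(void)
{
    for (int i = 0; i < 2; i++) {
        bool reset_before_sleep = (i == 0);
        pthread_t timer, vcpu;

        atomic_store(&scheduled, 0);
        atomic_store(&pending, 0);
        atomic_store(&stop, false);
        pthread_create(&vcpu, NULL, vcpu_fn, &reset_before_sleep);
        pthread_create(&timer, NULL, timer_fn, NULL);
        pthread_join(timer, NULL);
        pthread_join(vcpu, NULL);
    }
    return 0;
}

With the flag cleared before the sleep, a delayed tick lands inside the sleep
window and queues another item, so the sleeps run back-to-back; with the flag
cleared after the sleep, that tick sees the flag still set and skips queuing.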
> 
> To get some hard data, we wrote a little application that:
> 1) spawns multiple threads (one per vCPU)
> 2) each thread mmap()s+mlock()s a certain workset (e.g. 30GB/#threads for a 32GB VM)
> 3) each thread writes a word to the beginning of every page in a tight loop
> 4) the parent thread periodically reports the number of dirtied pages
> 
> Even on a dedicated 10G link, that is pretty much guaranteed to require 99%
> throttle to converge.
> 
> Before the patch, QEMU migrates the VM described above fairly quickly (~40s)
> after reaching 99% throttle. The application reported lockups of a few seconds
> at a time, which we initially thought were just that thread not running between
> QEMU-induced vCPU sleeps (we later attributed them to the reported bug).
> 
> Then we used a 1G link. This time, the migration had to run for a lot longer
> even at 99%. That made the bug more likely to happen, and we observed soft
> lockups of 70+ seconds (reported by the guest's kernel on the console).
> 
> Using the patch, and back on a 10G link, the migration completes after a few
> more iterations than before (it took just under 2 minutes after reaching 99%).
> If you want further validation of the bug, you could instrument
> cpus-common.c:process_queued_cpu_work() to show that cpu_throttle_thread()
> runs back-to-back in these cases.

OK, that's reasonable.

> In summary, we believe this patch is immediately required to prevent the lockups.

Yes, agreed.

> A more elaborate throttling solution should be considered as future work.
> Perhaps a per-vCPU timer which throttles more precisely, or a new convergence
> design altogether.

Dave

> 
> Thanks,
> Felipe
> 
> > 
> > Dave
> > 
> >> --
> >> -- Jason J. Herne (jjherne@linux.vnet.ibm.com)
> >> 
> > --
> > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
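For reference, here is a minimal sketch of the kind of dirtying workload
described in steps 1-4 above (illustrative only, not the actual test program;
the thread count, working-set size and one-second reporting interval are
arbitrary defaults):

/* dirty.c - per-thread page-dirtying loop; gcc -O2 -pthread dirty.c
 * usage: ./dirty <threads> <workset-GiB>                              */
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

static atomic_long pages_written;

struct worker {
    size_t bytes;                     /* this thread's share of the workset */
};

static void *dirty_fn(void *arg)
{
    struct worker *w = arg;
    long page = sysconf(_SC_PAGESIZE);
    char *buf = mmap(NULL, w->bytes, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    if (buf == MAP_FAILED) {
        perror("mmap");
        return NULL;
    }
    if (mlock(buf, w->bytes) != 0) {
        perror("mlock");              /* may need ulimit -l or CAP_IPC_LOCK */
    }
    for (;;) {                        /* write one word per page, forever   */
        for (size_t off = 0; off < w->bytes; off += (size_t)page) {
            *(volatile long *)(buf + off) = (long)off;
            atomic_fetch_add(&pages_written, 1);
        }
    }
    return NULL;
}

int main(int argc, char **argv)
{
    int nthreads = argc > 1 ? atoi(argv[1]) : 4;          /* one per vCPU   */
    size_t gib = argc > 2 ? (size_t)atoll(argv[2]) : 30;  /* e.g. 30GB      */
    struct worker w;

    if (nthreads < 1) {
        nthreads = 1;
    }
    w.bytes = (gib << 30) / (size_t)nthreads;
    for (int i = 0; i < nthreads; i++) {
        pthread_t t;
        pthread_create(&t, NULL, dirty_fn, &w);
    }
    for (;;) {                        /* parent periodically reports progress */
        long before = atomic_load(&pages_written);
        sleep(1);
        printf("%ld page writes in the last second\n",
               atomic_load(&pages_written) - before);
    }
    return 0;
}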