From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
To: Felipe Franciosi <felipe@nutanix.com>
Cc: "Jason J. Herne" <jjherne@linux.vnet.ibm.com>,
Paolo Bonzini <pbonzini@redhat.com>,
Malcolm Crossley <malcolm@nutanix.com>,
Juan Quintela <quintela@redhat.com>,
"qemu-devel@nongnu.org" <qemu-devel@nongnu.org>
Subject: Re: [Qemu-devel] [PATCH] cpus: reset throttle_thread_scheduled after sleep
Date: Thu, 1 Jun 2017 16:08:54 +0100
Message-ID: <20170601150854.GD2845@work-vm>
In-Reply-To: <9C902EF4-3E10-4326-8D28-412478005F56@nutanix.com>
* Felipe Franciosi (felipe@nutanix.com) wrote:
>
> > On 1 Jun 2017, at 15:36, Dr. David Alan Gilbert <dgilbert@redhat.com> wrote:
> >
> > * Jason J. Herne (jjherne@linux.vnet.ibm.com) wrote:
> >> On 05/19/2017 05:29 PM, Felipe Franciosi wrote:
> >>> Currently, the throttle_thread_scheduled flag is reset back to 0 before
> >>> sleeping (as part of the throttling logic). Given that throttle_timer
> >>> (well, any timer) may tick with a slight delay, it so happens that under
> >>> heavy throttling (i.e. close to or at CPU_THROTTLE_PCT_MAX) the tick may
> >>> schedule a further cpu_throttle_thread() work item after the flag reset,
> >>> but before the previous sleep completed. This results in the vCPU thread
> >>> sleeping continuously for potentially several seconds in a row.
> >>>
> >>> The chances of that happening can be drastically minimised by resetting
> >>> the flag after the sleep.
> >>>
> >>> Signed-off-by: Felipe Franciosi <felipe@nutanix.com>
> >>> Signed-off-by: Malcolm Crossley <malcolm@nutanix.com>
> >>> ---
> >>> cpus.c | 2 +-
> >>> 1 file changed, 1 insertion(+), 1 deletion(-)
> >>>
> >>> diff --git a/cpus.c b/cpus.c
> >>> index 516e5cb..f42eebd 100644
> >>> --- a/cpus.c
> >>> +++ b/cpus.c
> >>> @@ -677,9 +677,9 @@ static void cpu_throttle_thread(CPUState *cpu, run_on_cpu_data opaque)
> >>>      sleeptime_ns = (long)(throttle_ratio * CPU_THROTTLE_TIMESLICE_NS);
> >>>
> >>>      qemu_mutex_unlock_iothread();
> >>> -    atomic_set(&cpu->throttle_thread_scheduled, 0);
> >>>      g_usleep(sleeptime_ns / 1000); /* Convert ns to us for usleep call */
> >>>      qemu_mutex_lock_iothread();
> >>> +    atomic_set(&cpu->throttle_thread_scheduled, 0);
> >>>  }
> >>>
> >>> static void cpu_throttle_timer_tick(void *opaque)
> >>>
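The race described in the commit message can be illustrated with a small deterministic model in C. This is a sketch under stated assumptions, not QEMU code. The 10 ms timeslice, the pct/(1-pct) sleep ratio and the timeslice/(1-pct) timer period are recalled from cpus.c of that era and should be treated as assumptions. START_LATENCY_MS and SLEEP_OVERSHOOT_MS are hypothetical delays, chosen so that latency plus overshoot exceeds the 10 ms gap between the sleep time and the timer period at 99%.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Assumed values, not taken from this thread. */
#define TIMESLICE_MS 10L        /* models CPU_THROTTLE_TIMESLICE_NS (10 ms) */
#define THROTTLE_PCT 99L        /* models CPU_THROTTLE_PCT_MAX */

/* Hypothetical delays, for illustration only. */
#define START_LATENCY_MS 5L     /* tick -> vCPU starts the work item */
#define SLEEP_OVERSHOOT_MS 10L  /* sleep overrun plus lock reacquisition */

struct sim_result {
    long guest_run_ms;      /* time the vCPU spent outside throttle sleeps */
    long longest_stall_ms;  /* longest uninterrupted run of throttle sleeps */
};

/* Model `ticks` timer ticks; reset_before_sleep selects the pre-patch order. */
static struct sim_result simulate(bool reset_before_sleep, int ticks)
{
    const long sleep_ms = THROTTLE_PCT * TIMESLICE_MS / (100 - THROTTLE_PCT);
    const long period_ms = 100 * TIMESLICE_MS / (100 - THROTTLE_PCT);
    long vcpu_free_at = 0;   /* when the current throttle sleep ends */
    long flag_clear_at = 0;  /* from when throttle_thread_scheduled reads 0 */
    long stall_begin = 0;    /* start of the current run of sleeps */
    struct sim_result r = { 0, 0 };

    assert(ticks >= 0);
    for (int k = 0; k < ticks; k++) {
        long now = k * period_ms;
        if (now < flag_clear_at) {
            continue;  /* flag still set: this tick schedules nothing */
        }
        long start = now + START_LATENCY_MS;
        if (start < vcpu_free_at) {
            start = vcpu_free_at;  /* queued behind the previous sleep */
        }
        if (start > vcpu_free_at) {
            /* The guest ran between two work items: close the stall. */
            long stall = vcpu_free_at - stall_begin;
            if (stall > r.longest_stall_ms) {
                r.longest_stall_ms = stall;
            }
            r.guest_run_ms += start - vcpu_free_at;
            stall_begin = start;
        }
        long end = start + sleep_ms + SLEEP_OVERSHOOT_MS;
        flag_clear_at = reset_before_sleep ? start : end;
        vcpu_free_at = end;
    }
    if (vcpu_free_at - stall_begin > r.longest_stall_ms) {
        r.longest_stall_ms = vcpu_free_at - stall_begin;
    }
    return r;
}
```

Calling simulate(true, 10) and simulate(false, 10) compares the two orderings over ten ticks. With the assumed delays, the pre-patch order never lets the guest run after the first tick (one 10000 ms stall), while the patched order skips every other tick and caps each stall at 1000 ms. That is consistent with the longer convergence time reported further down, but the figures come from the model only.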
> >>
> >> This seems to make sense to me.
> >>
> >> Acked-by: Jason J. Herne <jjherne@linux.vnet.ibm.com>
> >>
> >> I'm CC'ing Juan, Amit and David as they are all active in the migration area
> >> and may have opinions on this. Juan and David were also reviewers for the
> >> original series.
> >
> > The description is interesting and sounds reasonable; it'll be
> > interesting to see what difference it makes to the autoconverge
> > behaviour for those workloads that need this level of throttle.
>
> To get some hard data, we wrote a little application that:
> 1) spawns multiple threads (one per vCPU)
> 2) each thread mmap()s+mlock()s a certain working set (e.g. 30GB/#threads for a 32GB VM)
> 3) each thread writes a word to the beginning of every page in a tight loop
> 4) the parent thread periodically reports the number of dirtied pages
>
> Even on a dedicated 10G link, that is pretty much guaranteed to require 99% throttle to converge.
>
> Before the patch, Qemu migrated the VM (described above) fairly quickly (~40s) after reaching 99% throttle. The application reported lockups of a few seconds at a time, which we initially thought was just that thread not running between Qemu-induced vCPU sleeps (and later attributed to the reported bug).
>
> Then we used a 1G link. This time, the migration had to run for a lot longer even at 99%. That made the bug more likely to happen and we observed soft lockups (reported by the guest's kernel on the console) of 70+ seconds.
>
> With the patch, and back on a 10G link, the migration completed after a few more iterations than before (it took just under 2 minutes after reaching 99%). If you want further validation of the bug, cpus-common.c:process_queued_cpu_work() could be instrumented to show that cpu_throttle_thread() runs back-to-back in these cases.

OK, that's reasonable.

> In summary we believe this patch is immediately required to prevent the lockups.

Yes, agreed.

> A more elaborate throttling solution should be considered as future work. Perhaps a per-vCPU timer which throttles more precisely or a new convergence design altogether.

Dave
>
> Thanks,
> Felipe
>
> >
> > Dave
> >
> >> --
> >> -- Jason J. Herne (jjherne@linux.vnet.ibm.com)
> >>
> > --
> > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
>
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK