Re: [PATCH] kvm: add halt_poll_ns module parameter

From: Christian Borntraeger <borntraeger@de.ibm.com>
To: Paolo Bonzini <pbonzini@redhat.com>,
	linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Cc: riel@redhat.com, mtosatti@redhat.com, rkrcmar@redhat.com,
	jan.kiszka@siemens.com, dmatlack@google.com
Subject: Re: [PATCH] kvm: add halt_poll_ns module parameter
Date: Mon, 09 Feb 2015 22:00:53 +0100
Message-ID: <54D92005.2060308@de.ibm.com>
In-Reply-To: <1423226937-11169-1-git-send-email-pbonzini@redhat.com>

On 06.02.2015 at 13:48, Paolo Bonzini wrote:
> This patch introduces a new module parameter for the KVM module; when it
> is present, KVM attempts a bit of polling on every HLT before scheduling
> itself out via kvm_vcpu_block.
> 
> This parameter helps a lot for latency-bound workloads---in particular
> I tested it with O_DSYNC writes with a battery-backed disk in the host.
> In this case, writes are fast (because the data doesn't have to go all
> the way to the platters) but they cannot be merged by either the host or
> the guest.  KVM's performance here is usually around 30% of bare metal,
> or 50% if you use cache=directsync or cache=writethrough (these
> parameters keep the guest from sending pointless flush requests, and
> at the same time they are not slow because of the battery-backed cache).
> The bad performance happens because on every halt the host CPU decides
> to halt itself too.  When the interrupt comes, the vCPU thread is then
> migrated to a new physical CPU, and in general the latency is horrible
> because the vCPU thread has to be scheduled back in.
> 
> With this patch performance reaches 60-65% of bare metal and, more
> important, 99% of what you get if you use idle=poll in the guest.  This
> means that the tunable gets rid of this particular bottleneck, and more
> work can be done to improve performance in the kernel or QEMU.
> 
> Of course there is some price to pay; every time an otherwise idle vCPU
> is woken by an interrupt, it will poll unnecessarily and thus
> impose a little load on the host.  The above results were obtained with
> a mostly random value of the parameter (500000), and the load was around
> 1.5-2.5% CPU usage on one of the host's cores for each idle guest vCPU.
> 
> The patch also adds a new stat, /sys/kernel/debug/kvm/halt_successful_poll,
> that can be used to tune the parameter.  It counts how many HLT
> instructions received an interrupt during the polling period; each
> successful poll keeps Linux from scheduling the VCPU thread out and back
> in, and may also avoid a likely trip to C1 and back for the physical CPU.
> 
> While the VM is idle, a Linux 4 VCPU VM halts around 10 times per second.
> Of these halts, almost all are failed polls.  During the benchmark,
> instead, basically all halts end within the polling period, except a more
> or less constant stream of 50 per second coming from vCPUs that are not
> running the benchmark.  The wasted time is thus very low.  Things may
> be slightly different for Windows VMs, which have a ~10 ms timer tick.
> 
> The effect is also visible on Marcelo's recently-introduced latency
> test for the TSC deadline timer.  Though of course a non-RT kernel has
> awful latency bounds, the latency of the timer is around 8000-10000 clock
> cycles compared to 20000-120000 without setting halt_poll_ns.  For the TSC
> deadline timer, thus, the effect is both a smaller average latency and
> a smaller variance.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

I can confirm that this also helps uperf with a 1/1 byte round-trip workload
between guests on s390. I can also confirm the higher CPU load, which is normally
a no-go for typical s390 users, who utilize their systems as much as
possible. Your check for single_task_running could actually solve that
problem, because under overcommitment this will never switch to polling once
the runqueues get full.
Since this is also runtime-configurable and defaults to 0, it should be pretty
painless.
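
To make the interaction between the poll budget and the single_task_running
check concrete, here is a rough userspace model of one HLT. It is only a
sketch of the behaviour described in the patch text, not the kernel code:
struct halt_model, model_halt() and all of their fields are made up for
illustration, and time is simulated in fixed steps rather than read from a
clock.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Outcome of one simulated HLT. */
enum halt_result { HALT_POLL_SUCCESS, HALT_BLOCKED };

/* Hypothetical model inputs; these are not kernel structures. */
struct halt_model {
    uint64_t halt_poll_ns;       /* the tunable; 0 disables polling */
    uint64_t wakeup_after_ns;    /* when the interrupt arrives after HLT */
    uint64_t contended_after_ns; /* when another task becomes runnable on
                                    this CPU; UINT64_MAX means never */
    uint64_t step_ns;            /* granularity of the simulated loop */
};

/*
 * Poll for a wakeup while the budget lasts and while this vCPU thread is
 * the only runnable task (standing in for single_task_running).  If no
 * wakeup is seen, fall back to blocking (kvm_vcpu_block in the patch).
 * *polled_ns reports how much simulated time was spent polling.
 */
enum halt_result model_halt(const struct halt_model *m, uint64_t *polled_ns)
{
    assert(m != NULL && polled_ns != NULL);

    uint64_t step = m->step_ns ? m->step_ns : 1; /* avoid a stuck loop */
    uint64_t now = 0;

    while (now < m->halt_poll_ns && now < m->contended_after_ns) {
        if (now >= m->wakeup_after_ns) {
            *polled_ns = now;
            return HALT_POLL_SUCCESS; /* would bump halt_successful_poll */
        }
        now += step;
    }

    *polled_ns = now;
    return HALT_BLOCKED; /* budget spent or CPU contended: schedule out */
}
```

With halt_poll_ns at 0 the loop body never runs, which is the default
behaviour, and on an overcommitted host (contended_after_ns of 0 in this
model) the vCPU blocks immediately without burning any host cycles.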

The only question is: is there a sane way of doing autotuning?
Christian
