From: Christian Borntraeger <borntraeger@de.ibm.com>
To: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>, KVM <kvm@vger.kernel.org>,
	Cornelia Huck <cornelia.huck@de.ibm.com>,
	linux-s390 <linux-s390@vger.kernel.org>,
	Jens Freimann <jfrei@linux.vnet.ibm.com>,
	David Hildenbrand <dahi@linux.vnet.ibm.com>,
	Wanpeng Li <kernellwp@gmail.com>,
	David Matlack <dmatlack@google.com>
Subject: Re: [PATCH 1/1] KVM: halt_polling: provide a way to qualify wakeups during poll
Date: Wed, 4 May 2016 09:50:57 +0200	[thread overview]
Message-ID: <5729A9E1.3050706@de.ibm.com> (raw)
In-Reply-To: <20160503150902.GF30059@potion>

On 05/03/2016 05:09 PM, Radim Krčmář wrote:
> 2016-05-03 14:37+0200, Christian Borntraeger:
>> Some wakeups should not be considered a successful poll. For example, on
>> s390 I/O interrupts are usually floating, which means that _ALL_ CPUs
>> would be considered runnable - letting all vCPUs poll all the time for
>> transactional-like workloads, even if one vCPU would be enough.
>> This can result in huge CPU usage for large guests.
>> This patch lets architectures provide a way to qualify wakeups as
>> good or bad with regard to polls.
>>
>> For s390, the implementation will fence off halt polling for anything but
>> known good, single-vCPU events. The s390 implementation for floating
>> interrupts does a wakeup for one vCPU, but the interrupt will be delivered
>> by whatever CPU checks first for a pending interrupt. We prefer the
>> woken-up CPU by marking the poll of this CPU as a "good" poll.
>> This code will also mark several other wakeup reasons, like IPIs or
>> expired timers, as "good". This will of course also mark some events as
>> not successful. As KVM on z always runs as a 2nd-level hypervisor,
>> we prefer not to poll unless we are really sure, though.
>>
>> This patch successfully limits the CPU usage for cases like uperf 1byte
>> transactional ping pong workload or wakeup heavy workload like OLTP
>> while still providing a proper speedup.
>>
>> This patch also introduces a new vcpu stat, "halt_poll_no_tuning", that
>> marks wakeups that are considered not good for polling.
>>
>> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
>> Cc: David Matlack <dmatlack@google.com>
>> Cc: Wanpeng Li <kernellwp@gmail.com>
>> ---
> 
> Thanks for all explanations,
> 
> Acked-by: Radim Krčmář <rkrcmar@redhat.com>
> 


The feedback about the logic triggered some more experiments on my side.
I tried several different workloads/heuristics, and it seems that even more
aggressive shrinking (basically resetting to 0 as soon as an invalid poll
comes along) improves the CPU usage further.

patch on top
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index ffe0545..c168662 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -2036,12 +2036,13 @@ void kvm_vcpu_block(struct kvm_vcpu *vcpu)
 out:
        block_ns = ktime_to_ns(cur) - ktime_to_ns(start);
 
-       if (halt_poll_ns) {
+       if (!vcpu_valid_wakeup(vcpu))
+                shrink_halt_poll_ns(vcpu);
+       else if (halt_poll_ns) {
                if (block_ns <= vcpu->halt_poll_ns)
                        ;
                /* we had a long block, shrink polling */
-               else if (!vcpu_valid_wakeup(vcpu) ||
-                       (vcpu->halt_poll_ns && block_ns > halt_poll_ns))
+               else if (vcpu->halt_poll_ns && block_ns > halt_poll_ns)
                        shrink_halt_poll_ns(vcpu);
                /* we had a short halt and our poll time is too small */
                else if (vcpu->halt_poll_ns < halt_poll_ns &&


The uperf 1byte:1byte workload still seems to retain all the benefits.
I have asked the performance folks to test several other workloads to see
whether we lose some of the benefits.
So I will defer this patch until I have a full picture of which heuristic
is best. Hopefully I will have some answers next week.

(So the new diff looks like this:)
@@ -2034,7 +2036,9 @@ void kvm_vcpu_block(struct kvm_vcpu *vcpu)
 out:
        block_ns = ktime_to_ns(cur) - ktime_to_ns(start);
 
-       if (halt_poll_ns) {
+       if (!vcpu_valid_wakeup(vcpu))
+                shrink_halt_poll_ns(vcpu);
+       else if (halt_poll_ns) {
                if (block_ns <= vcpu->halt_poll_ns)
                        ;


