From mboxrd@z Thu Jan 1 00:00:00 1970
From: Christian Borntraeger
Subject: Re: [PATCH 1/1] KVM: halt_polling: provide a way to qualify wakeups during poll
Date: Wed, 4 May 2016 09:50:57 +0200
Message-ID: <5729A9E1.3050706@de.ibm.com>
References: <1462279041-17028-1-git-send-email-borntraeger@de.ibm.com>
 <1462279041-17028-2-git-send-email-borntraeger@de.ibm.com>
 <20160503150902.GF30059@potion>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8
Cc: Paolo Bonzini, KVM, Cornelia Huck, linux-s390, Jens Freimann,
 David Hildenbrand, Wanpeng Li, David Matlack
To: Radim Krčmář
In-Reply-To: <20160503150902.GF30059@potion>
Sender: kvm-owner@vger.kernel.org

On 05/03/2016 05:09 PM, Radim Krčmář wrote:
> 2016-05-03 14:37+0200, Christian Borntraeger:
>> Some wakeups should not be considered a successful poll. For example on
>> s390 I/O interrupts are usually floating, which means that _ALL_ CPUs
>> would be considered runnable - letting all vCPUs poll all the time for
>> transaction-like workloads, even if one vCPU would be enough.
>> This can result in huge CPU usage for large guests.
>> This patch lets architectures provide a way to qualify wakeups as
>> good or bad with regard to polls.
>>
>> For s390 the implementation will fence off halt polling for anything but
>> known good, single vCPU events.
>> The s390 implementation for floating
>> interrupts does a wakeup for one vCPU, but the interrupt will be delivered
>> by whatever CPU checks first for a pending interrupt. We prefer the
>> woken up CPU by marking the poll of this CPU as a "good" poll.
>> This code will also mark several other wakeup reasons like IPI or
>> expired timers as "good". This will of course also mark some events as
>> not successful. As KVM on z always runs as a 2nd level hypervisor,
>> we prefer not to poll unless we are really sure, though.
>>
>> This patch successfully limits the CPU usage for cases like the uperf 1byte
>> transactional ping-pong workload or wakeup-heavy workloads like OLTP,
>> while still providing a proper speedup.
>>
>> This also introduces a new vcpu stat "halt_poll_no_tuning" that marks
>> wakeups that are considered not good for polling.
>>
>> Signed-off-by: Christian Borntraeger
>> Cc: David Matlack
>> Cc: Wanpeng Li
>> ---
>
> Thanks for all the explanations,
>
> Acked-by: Radim Krčmář
>

The feedback about the logic triggered some more experiments on my side.
I was experimenting with some different workloads/heuristics, and it
seems that even more aggressive shrinking (basically resetting to 0 as
soon as an invalid poll comes along) improves the CPU usage even more.
Patch on top:

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index ffe0545..c168662 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -2036,12 +2036,13 @@ void kvm_vcpu_block(struct kvm_vcpu *vcpu)
 out:
 	block_ns = ktime_to_ns(cur) - ktime_to_ns(start);
 
-	if (halt_poll_ns) {
+	if (!vcpu_valid_wakeup(vcpu))
+		shrink_halt_poll_ns(vcpu);
+	else if (halt_poll_ns) {
 		if (block_ns <= vcpu->halt_poll_ns)
 			;
 		/* we had a long block, shrink polling */
-		else if (!vcpu_valid_wakeup(vcpu) ||
-			 (vcpu->halt_poll_ns && block_ns > halt_poll_ns))
+		else if (vcpu->halt_poll_ns && block_ns > halt_poll_ns)
 			shrink_halt_poll_ns(vcpu);
 		/* we had a short halt and our poll time is too small */
 		else if (vcpu->halt_poll_ns < halt_poll_ns &&

The uperf 1byte:1byte workload seems to retain all the benefits. I have
asked the performance folks to test several other workloads to see
whether we lose some of the benefits. So I will defer this patch until I
have a full picture of which heuristic is best. Hopefully I will have
some answers next week.

(So the new diff looks like)

@@ -2034,7 +2036,9 @@ void kvm_vcpu_block(struct kvm_vcpu *vcpu)
 out:
 	block_ns = ktime_to_ns(cur) - ktime_to_ns(start);
 
-	if (halt_poll_ns) {
+	if (!vcpu_valid_wakeup(vcpu))
+		shrink_halt_poll_ns(vcpu);
+	else if (halt_poll_ns) {
 		if (block_ns <= vcpu->halt_poll_ns)
 			;