From mboxrd@z Thu Jan 1 00:00:00 1970
From: Marc Zyngier
Subject: Re: [PATCH 1/2] ARM: KVM: Yield CPU when vcpu executes a WFE
Date: Mon, 07 Oct 2013 17:16:30 +0100
Message-ID: <5252DE5E.6060700@arm.com>
References: <1381160430-11790-1-git-send-email-marc.zyngier@arm.com>
 <1381160430-11790-2-git-send-email-marc.zyngier@arm.com>
 <52F72A87-9CC3-4B38-ACDA-F5EA66FA7375@suse.de>
Mime-Version: 1.0
Content-Type: text/plain; charset=WINDOWS-1252
Content-Transfer-Encoding: 8BIT
Cc: linux-arm-kernel,
 "kvmarm@lists.cs.columbia.edu",
 "kvm@vger.kernel.org mailing list"
To: Alexander Graf
Return-path:
Received: from service87.mimecast.com ([91.220.42.44]:38259 "EHLO
 service87.mimecast.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
 with ESMTP id S1752588Ab3JGQQd convert rfc822-to-8bit (ORCPT );
 Mon, 7 Oct 2013 12:16:33 -0400
In-Reply-To: <52F72A87-9CC3-4B38-ACDA-F5EA66FA7375@suse.de>
Sender: kvm-owner@vger.kernel.org
List-ID:

On 07/10/13 17:04, Alexander Graf wrote:
>
> On 07.10.2013, at 17:40, Marc Zyngier wrote:
>
>> On an (even slightly) oversubscribed system, spinlocks are quickly
>> becoming a bottleneck, as some vcpus are spinning, waiting for a
>> lock to be released, while the vcpu holding the lock may not be
>> running at all.
>>
>> This creates contention, and the observed slowdown is 40x for
>> hackbench. No, this isn't a typo.
>>
>> The solution is to trap blocking WFEs and tell KVM that we're now
>> spinning. This ensures that other vcpus will get a scheduling boost,
>> allowing the lock to be released more quickly.
>>
>> From a performance point of view: hackbench 1 process 1000
>>
>> 2xA15 host (baseline): 1.843s
>>
>> 2xA15 guest w/o patch: 2.083s
>> 4xA15 guest w/o patch: 80.212s
>>
>> 2xA15 guest w/ patch: 2.072s
>> 4xA15 guest w/ patch: 3.202s
>
> I'm confused. You got from 2.083s when not exiting on spin locks to
> 2.072 when exiting on _every_ spin lock that didn't immediately
> succeed. I would've expected the second number to be worse rather than
> better. I assume it's within jitter, I'm still puzzled why you don't
> see any significant drop in performance.

The key is in the ARM ARM:

B1.14.9: "When HCR.TWE is set to 1, and the processor is in a Non-secure
mode other than Hyp mode, execution of a WFE instruction generates a Hyp
Trap exception if, ignoring the value of the HCR.TWE bit, conditions
permit the processor to suspend execution."

So, on a non-overcommitted system, you rarely hit a blocking spinlock,
and hence rarely trap. Otherwise, performance would go down the drain
very quickly.

And yes, the difference is pretty much noise.

	M.
--
Jazz is not dead. It just smells funny...
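
For illustration, the B1.14.9 rule quoted above can be written down as a small predicate. This is only a toy model in C, not kernel or architecture code: `struct cpu_state`, its fields and `wfe_traps_to_hyp()` are hypothetical names, and folding "conditions permit the processor to suspend execution" into a single flag is a simplification.

```c
#include <stdbool.h>

/* Hypothetical model of the state the B1.14.9 rule depends on. */
struct cpu_state {
	bool hcr_twe;       /* HCR.TWE is set to 1 */
	bool non_secure;    /* processor is in a Non-secure mode */
	bool hyp_mode;      /* processor is in Hyp mode */
	bool would_suspend; /* ignoring HCR.TWE, this WFE would suspend execution */
};

/* Returns true if, in this model, executing WFE generates a Hyp Trap. */
static bool wfe_traps_to_hyp(const struct cpu_state *s)
{
	/* A WFE that would not suspend (non-blocking) never traps. */
	return s->hcr_twe && s->non_secure && !s->hyp_mode && s->would_suspend;
}
```

With `would_suspend` false, which should be the common case for a guest whose locks are rarely contended, the predicate is false even with HCR.TWE set, so the WFE costs no exit.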