From mboxrd@z Thu Jan  1 00:00:00 1970
From: Avi Kivity <avi@redhat.com>
Subject: Re: [RFC -v4 PATCH 0/3] directed yield for Pause Loop Exiting
Date: Thu, 13 Jan 2011 15:12:22 +0200
Message-ID: <4D2EFA36.2040100@redhat.com>
References: <20110113002108.3abdf953@annuminas.surriel.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
	Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>,
	Peter Zijlstra <a.p.zijlstra@chello.nl>,
	Mike Galbraith <efault@gmx.de>,
	Chris Wright <chrisw@sous-sol.org>
To: Rik van Riel <riel@redhat.com>
Return-path: <kvm-owner@vger.kernel.org>
Received: from mx1.redhat.com ([209.132.183.28]:8436 "EHLO mx1.redhat.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1751404Ab1AMNM4 (ORCPT <rfc822;kvm@vger.kernel.org>);
	Thu, 13 Jan 2011 08:12:56 -0500
In-Reply-To: <20110113002108.3abdf953@annuminas.surriel.com>
Sender: kvm-owner@vger.kernel.org
List-ID: <kvm.vger.kernel.org>

On 01/13/2011 07:21 AM, Rik van Riel wrote:
> When running SMP virtual machines, it is possible for one VCPU to be
> spinning on a spinlock, while the VCPU that holds the spinlock is not
> currently running, because the host scheduler preempted it to run
> something else.
>
> Both Intel and AMD CPUs have a feature that detects when a virtual
> CPU is spinning on a lock and will trap to the host.
>
> The current KVM code sleeps for a bit whenever that happens, which
> results in eg. a 64 VCPU Windows guest taking forever and a bit to
> boot up.  This is because the VCPU holding the lock is actually
> running and not sleeping, so the pause is counter-productive.
>
> In other workloads a pause can also be counter-productive, with
> spinlock detection resulting in one guest giving up its CPU time
> to the others.  Instead of spinning, it ends up simply not running
> much at all.
>
> This patch series aims to fix that, by having a VCPU that spins
> give the remainder of its timeslice to another VCPU in the same
> guest before yielding the CPU - one that is runnable but got
> preempted, hopefully the lock holder.

Can you share some benchmark results?

I'm mostly interested in moderately sized guests (4-8 vcpus) under 
conditions of no overcommit, and high overcommit (2x).

For no overcommit, I'd like to see comparisons against mainline with PLE 
disabled, to be sure there aren't significant regressions. For 
overcommit, I'd like to see comparisons against the no-overcommit case. 
In the overcommit case, comparisons against mainline, with PLE enabled 
or disabled, are uninteresting since we know it sucks both ways.

-- 
error compiling committee.c: too many arguments to function