From: Mukesh Rathor
Subject: Re: Linux spin lock enhancement on xen
Date: Wed, 18 Aug 2010 19:52:36 -0700
Message-ID: <20100818195236.1b898e75@mantra.us.oracle.com>
In-Reply-To: <4C6C0C3D.2070508@goop.org>
References: <20100816183357.08623c4c@mantra.us.oracle.com>
 <4C6ACA28.7030104@goop.org>
 <20100817185807.10628599@mantra.us.oracle.com>
 <4C6C0C3D.2070508@goop.org>
To: Jeremy Fitzhardinge
Cc: Keir Fraser, Xen-devel@lists.xensource.com
List-Id: xen-devel@lists.xenproject.org

On Wed, 18 Aug 2010 09:37:17 -0700
Jeremy Fitzhardinge wrote:

> (They don't leave for no reason; they leave when they're told they can
> take the lock next.)
>
> I don't see why the guest should micromanage Xen's scheduler
> decisions.  If a VCPU is waiting for another VCPU and can put itself
> to sleep in the meantime, then it's up to Xen to take advantage of
> that newly freed PCPU to schedule something.  It may decide to run
> something in your domain that's runnable, or it may decide to run
> something else.  There's no reason why the spinlock holder is the
> best VCPU to run overall, or even the best VCPU in your domain.
>
> My view is you should just put any VCPU which has nothing to do to
> sleep, and let Xen sort out the scheduling of the remainder.

Agreed for the most part. But if a simple solution can spare the cost
of a vcpu getting scheduled onto a pcpu, realizing it has nothing to
do, and putting itself back to sleep, we've saved cycles. Often we are
chasing tiny gains in benchmarks against the competition.

Yes, we don't want to micromanage xen's scheduler. But if a guest
knows something that the scheduler does not, and has no way of
knowing, it would be nice to be able to exploit that. I didn't think
a vcpu telling xen that it's not making forward progress was
intrusive.

Another approach, perhaps better, would be a hypercall that lets a
guest temporarily boost a vcpu's priority; a rough sketch of what I
have in mind is at the end of this mail. What do you guys think of
that? It would be akin to a system call that lets a process raise its
priority, or to kernels where a thread holding a lock gets a temporary
priority bump because a waiter tells the kernel to (priority
inheritance).

> I'm not sure I understand this point.  If you're pinning vcpus to
> pcpus, then presumably you're not going to share a pcpu among many,
> or any, vcpus, so the lock holder will be able to run any time it
> wants.  And a directed yield will only help if the lock waiter is
> sharing the same pcpu as the lock holder, so it can hand over its
> timeslice (since making the directed yield preempt something already
> running in order to run your target vcpu seems rude and ripe for
> abuse).

No. If a customer licenses 4 cpus and runs a guest with 12 vcpus, you
now have 12 vcpus confined to 4 physical cpus.

> Presumably the number of pcpus is also going up, so the amount of
> per-pcpu overcommit is about the same.

Unless the vcpus are going up faster than the pcpus :)...

Thanks,
Mukesh
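
P.S. To make the boost idea concrete, here is a rough sketch of what
the guest side might look like. SCHEDOP_boost, its number, and struct
sched_boost are made up for illustration -- no such sub-op exists in
the Xen ABI today; only HYPERVISOR_sched_op itself is real.

#include <asm/xen/hypercall.h>      /* HYPERVISOR_sched_op() */
#include <xen/interface/sched.h>    /* existing SCHEDOP_* definitions */

#define SCHEDOP_boost 8             /* hypothetical new sub-op */

struct sched_boost {
    uint32_t vcpu_id;               /* vcpu holding the contended lock */
    uint32_t msecs;                 /* cap on the boost, to limit abuse */
};

/*
 * Called from the spinlock slow path by a waiter that knows which
 * vcpu holds the lock.  Xen would bump that vcpu's priority for at
 * most b.msecs and then fall back to normal scheduling.
 */
static void boost_lock_holder(uint32_t holder_vcpu)
{
    struct sched_boost b = { .vcpu_id = holder_vcpu, .msecs = 1 };

    HYPERVISOR_sched_op(SCHEDOP_boost, &b);
}

The msecs cap is the point of the design: the boost expires on its
own, so a guest can't use the call to permanently favor one of its
vcpus, which should address the "ripe for abuse" concern raised
against directed yield.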
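For contrast, the "put the waiter to sleep" behavior you describe is
already expressible with the existing SCHEDOP_poll sub-op; this is
simplified from what the Linux pv spinlock slow path does (evtchn here
stands for the waiter's per-vcpu lock-kick event channel):

#include <asm/xen/hypercall.h>      /* HYPERVISOR_sched_op() */
#include <xen/interface/sched.h>    /* struct sched_poll, SCHEDOP_poll */

static void wait_for_lock_kick(evtchn_port_t evtchn)
{
    struct sched_poll poll = {
        .nr_ports = 1,
        .timeout  = 0,              /* 0 = block until the event arrives */
    };

    set_xen_guest_handle(poll.ports, &evtchn);

    /* Blocks this vcpu in Xen until the lock holder kicks the port,
     * freeing the pcpu for whatever Xen wants to run next. */
    HYPERVISOR_sched_op(SCHEDOP_poll, &poll);
}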