Message-ID: <4CF6460C.5070604@redhat.com>
Date: Wed, 01 Dec 2010 14:56:44 +0200
From: Avi Kivity
References: <1290530963-3448-1-git-send-email-aliguori@us.ibm.com>
 <4CECCA39.4060702@redhat.com> <4CED1A23.9030607@linux.vnet.ibm.com>
 <4CED1FD3.1000801@redhat.com> <20101201123742.GA3780@linux.vnet.ibm.com>
In-Reply-To: <20101201123742.GA3780@linux.vnet.ibm.com>
Subject: [Qemu-devel] Re: [PATCH] qemu-kvm: response to SIGUSR1 to start/stop a VCPU (v2)
To: vatsa@linux.vnet.ibm.com
Cc: Chris Wright, Anthony Liguori, qemu-devel@nongnu.org, kvm@vger.kernel.org

On 12/01/2010 02:37 PM, Srivatsa Vaddagiri wrote:
> On Wed, Nov 24, 2010 at 04:23:15PM +0200, Avi Kivity wrote:
> > >> I'm more concerned about lock holder preemption, and interaction
> > >> of this mechanism with any kernel solution for LHP.
> > >
> > > Can you suggest some scenarios and I'll create some test cases?
> > > I'm trying to figure out the best way to evaluate this.
> >
> > Booting 64-vcpu Windows on a 64-cpu host with PLE but without
> > directed yield takes longer than forever because PLE detects
> > contention within the guest, which under our current PLE
> > implementation (usleep(100)) converts guest contention into delays.
>
> Is there any way of optimizing PLE at runtime in such a special case?
> For example, either turn off the PLE feature, or gradually increase the
> (spin-)timeout before PLE kicks in.

It's not a special case at all. Both host contention and guest
contention are perfectly normal, and can occur simultaneously.

> > (a directed yield implementation would find that all vcpus are
> > runnable, yielding optimal results under this test case).
>
> I would think a plain yield() (rather than usleep/directed yield) would
> suffice here (yield would realize that there is nobody else to yield to
> and continue running the same vcpu thread).

Currently yield() is a no-op on Linux.

> As regards any concern about leaking cpu bandwidth because of a plain
> yield, I think it can be fixed by a simpler modification to yield that
> allows a thread to reclaim whatever timeslice it gave up previously [1].

If some other thread used that timeslice, don't we have an accounting
problem?

> Regarding directed yield, do we have any reliable mechanism to find the
> target of a directed yield in this (unmodified/non-paravirtualized
> guest) case? IOW, how do we determine the vcpu thread to which cycles
> need to be yielded upon contention?

My idea was to yield to a random starved vcpu of the same guest. There
are several cases to consider:

- we hit the right vcpu; lock is released, party.
- we hit some vcpu that is doing unrelated work; the yielding thread
  doesn't make progress, but we're not wasting cpu time.
- we hit another waiter for the same lock; it will also PLE exit and
  trigger a directed yield. This increases the cost of directed yield
  by a factor of count_of_runnable_but_not_running_vcpus, which could
  be large, but not disastrously so (i.e. don't run a 64-vcpu guest on
  a uniprocessor host with this).
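To make that concrete, here is a rough sketch of what the PLE exit path
could do (illustrative only, not actual code; vcpu_is_runnable(),
vcpu_currently_running() and directed_yield_to() are made-up helpers
standing in for whatever the scheduler would need to expose):

#include <linux/kvm_host.h>
#include <linux/random.h>

/*
 * Sketch: on a PLE exit, pick a random vcpu of the same guest that
 * wants to run but is not currently on a cpu, and donate the rest of
 * our timeslice to it.
 */
static void ple_directed_yield(struct kvm_vcpu *me)
{
	struct kvm *kvm = me->kvm;
	int nr = atomic_read(&kvm->online_vcpus);
	int start = random32() % nr;
	int i;

	for (i = 0; i < nr; i++) {
		struct kvm_vcpu *target = kvm->vcpus[(start + i) % nr];

		if (!target || target == me)
			continue;
		/* "starved" == runnable but preempted */
		if (!vcpu_is_runnable(target) || vcpu_currently_running(target))
			continue;

		/* made-up helper: hand our remaining timeslice to target's thread */
		directed_yield_to(target);
		return;
	}
	/* nobody is starved: everyone is running or idle, just reenter the guest */
}

The only policy decision is the choice of target; picking at random is
what gives the three cases above, since without paravirt information we
can't tell which starved vcpu actually holds the lock.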
> > So if you were to test something similar running with a 20% vcpu
> > cap, I'm sure you'd run into similar issues. It may show with fewer
> > vcpus (I've only tested 64).
> >
> > > Are you assuming the existence of a directed yield and the
> > > specific concern is what happens when a directed yield happens
> > > after a PLE and the target of the yield has been capped?
> >
> > Yes. My concern is that we will see the same kind of problems
> > directed yield was designed to fix, but without allowing directed
> > yield to fix them. Directed yield was designed to fix lock holder
> > preemption under contention,
>
> For modified guests, something like [2] seems to be the best approach to
> fix the lock-holder preemption (LHP) problem, which does not require any
> sort of directed yield support. Essentially, upon contention, a vcpu
> registers its lock of interest and goes to sleep (via a hypercall),
> waiting for the lock owner to wake it up (again via another hypercall).

Right.

> For unmodified guests, IMHO a plain yield (or a slightly enhanced yield
> [1]) should fix the LHP problem.

A plain yield (ignoring its no-opiness on Linux) will penalize the
running guest wrt other guests. We need to maintain fairness.

> Fyi, Xen folks also seem to be avoiding a directed yield for some of the
> same reasons [3].

I think that fails for unmodified guests, where you don't know when the
lock is released and so you don't have a wake_up notification. You lose
a large timeslice and can't gain it back, whereas with pv the wakeup
means you only lose as much time as the lock was held.

> Given this line of thinking, hard-limiting guests (either in user-space
> or kernel-space, the latter being what I prefer) should not have adverse
> interactions with LHP-related solutions.

If you hard-limit a vcpu that holds a lock, any waiting vcpus are also
halted. With directed yield you can let the lock holder make some
progress at the expense of another vcpu. A regular yield() will simply
stall the waiter.

-- 
error compiling committee.c: too many arguments to function