Message-ID: <4CF6460C.5070604@redhat.com>
Date: Wed, 01 Dec 2010 14:56:44 +0200
From: Avi Kivity
References: <1290530963-3448-1-git-send-email-aliguori@us.ibm.com>
 <4CECCA39.4060702@redhat.com> <4CED1A23.9030607@linux.vnet.ibm.com>
 <4CED1FD3.1000801@redhat.com> <20101201123742.GA3780@linux.vnet.ibm.com>
In-Reply-To: <20101201123742.GA3780@linux.vnet.ibm.com>
Subject: [Qemu-devel] Re: [PATCH] qemu-kvm: response to SIGUSR1 to start/stop a VCPU (v2)
To: vatsa@linux.vnet.ibm.com
Cc: Chris Wright, Anthony Liguori, qemu-devel@nongnu.org, kvm@vger.kernel.org

On 12/01/2010 02:37 PM, Srivatsa Vaddagiri wrote:
> On Wed, Nov 24, 2010 at 04:23:15PM +0200, Avi Kivity wrote:
> > >> I'm more concerned about lock holder preemption, and interaction
> > >> of this mechanism with any kernel solution for LHP.
> > >
> > > Can you suggest some scenarios and I'll create some test cases?
> > > I'm trying to figure out the best way to evaluate this.
> >
> > Booting 64-vcpu Windows on a 64-cpu host with PLE but without
> > directed yield takes longer than forever because PLE detects
> > contention within the guest, which under our current PLE
> > implementation (usleep(100)) converts guest contention into delays.
>
> Is there any way of optimizing PLE at runtime in such a special case?
> For example, either turn off the PLE feature, or gradually increase the
> (spin-)timeout before PLE kicks in.

It's not a special case at all. Both host contention and guest
contention are perfectly normal, and can occur simultaneously.

> > (a directed yield implementation would find that all vcpus are
> > runnable, yielding optimal results under this test case).
>
> I would think a plain yield() (rather than usleep/directed yield) would
> suffice here (yield would realize that there is nobody else to yield to
> and continue running the same vcpu thread).

Currently yield() is a no-op on Linux.

> As regards any concern about leaking cpu bandwidth because of a plain
> yield, I think it can be fixed by a simpler modification to yield that
> allows a thread to reclaim whatever timeslice it gave up previously [1].

If some other thread used that timeslice, don't we have an accounting
problem?

> Regarding directed yield, do we have any reliable mechanism to find the
> target of a directed yield in this (unmodified/non-paravirtualized
> guest) case? IOW, how do we determine the vcpu thread to which cycles
> need to be yielded upon contention?

My idea was to yield to a random starved vcpu of the same guest. There
are several cases to consider:

- we hit the right vcpu; lock is released, party.
- we hit some vcpu that is doing unrelated work; the yielding thread
  doesn't make progress, but we're not wasting cpu time.
- we hit another waiter for the same lock; it will also PLE exit and
  trigger a directed yield. This increases the cost of directed yield
  by a factor of count_of_runnable_but_not_running_vcpus, which could
  be large, but not disastrously so (i.e. don't run a 64-vcpu guest on
  a uniprocessor host with this).
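To make that concrete, here is a rough sketch of what the PLE exit path
could do (illustrative only, not actual code; vcpu_is_runnable(),
vcpu_currently_running() and directed_yield_to() are made-up helpers
standing in for whatever the scheduler would need to expose):

#include <linux/kvm_host.h>
#include <linux/random.h>

/*
 * Sketch: on a PLE exit, pick a random vcpu of the same guest that
 * wants to run but is not currently on a cpu, and donate the rest of
 * our timeslice to it.
 */
static void ple_directed_yield(struct kvm_vcpu *me)
{
	struct kvm *kvm = me->kvm;
	int nr = atomic_read(&kvm->online_vcpus);
	int start = random32() % nr;
	int i;

	for (i = 0; i < nr; i++) {
		struct kvm_vcpu *target = kvm->vcpus[(start + i) % nr];

		if (!target || target == me)
			continue;
		/* "starved" == runnable but preempted */
		if (!vcpu_is_runnable(target) || vcpu_currently_running(target))
			continue;

		/* made-up helper: hand our remaining timeslice to target's thread */
		directed_yield_to(target);
		return;
	}
	/* nobody is starved: everyone is running or idle, just reenter the guest */
}

The only policy decision is the choice of target; picking at random is
what gives the three cases above, since without paravirt information we
can't tell which starved vcpu actually holds the lock.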
> > So if you were to test something similar running with a 20% vcpu
> > cap, I'm sure you'd run into similar issues. It may show with fewer
> > vcpus (I've only tested 64).
> >
> > > Are you assuming the existence of a directed yield and the
> > > specific concern is what happens when a directed yield happens
> > > after a PLE and the target of the yield has been capped?
> >
> > Yes. My concern is that we will see the same kind of problems
> > directed yield was designed to fix, but without allowing directed
> > yield to fix them. Directed yield was designed to fix lock holder
> > preemption under contention,
>
> For modified guests, something like [2] seems to be the best approach to
> fix the lock-holder preemption (LHP) problem, which does not require any
> sort of directed yield support. Essentially, upon contention, a vcpu
> registers its lock of interest and goes to sleep (via a hypercall),
> waiting for the lock owner to wake it up (again via another hypercall).

Right.

> For unmodified guests, IMHO a plain yield (or a slightly enhanced yield
> [1]) should fix the LHP problem.

A plain yield (ignoring its no-opiness on Linux) will penalize the
running guest wrt other guests. We need to maintain fairness.

> Fyi, Xen folks also seem to be avoiding a directed yield for some of the
> same reasons [3].

I think that fails for unmodified guests, where you don't know when the
lock is released and so you don't have a wake_up notification. You lose
a large timeslice and can't gain it back, whereas with pv the wakeup
means you only lose as much time as the lock was held.

> Given this line of thinking, hard-limiting guests (either in user-space
> or kernel-space, the latter being what I prefer) should not have adverse
> interactions with LHP-related solutions.

If you hard-limit a vcpu that holds a lock, any waiting vcpus are also
halted. With directed yield you can let the lock holder make some
progress at the expense of another vcpu. A regular yield() will simply
stall the waiter.

-- 
error compiling committee.c: too many arguments to function