From: Avi Kivity
Subject: Re: [Qemu-devel] [PATCH] qemu-kvm: introduce cpu_start/cpu_stop commands
Date: Tue, 23 Nov 2010 16:35:52 +0200
Message-ID: <4CEBD148.3080508@redhat.com>
References: <1290466818-5230-1-git-send-email-aliguori@us.ibm.com> <4CEB6222.5050203@redhat.com> <4CEBC6E4.1000307@codemonkey.ws> <4CEBC915.2010006@redhat.com> <4CEBCE95.2050400@codemonkey.ws>
In-Reply-To: <4CEBCE95.2050400@codemonkey.ws>
To: Anthony Liguori
Cc: Anthony Liguori, qemu-devel@nongnu.org, Chris Wright, kvm@vger.kernel.org

On 11/23/2010 04:24 PM, Anthony Liguori wrote:
>
>>>
>>>> Using monitor commands is fairly heavyweight for something as high
>>>> frequency as this.  What control period do you see people using?
>>>> Maybe we should define USR1 for vcpu start/stop.
>>>>
>>>> What happens if one vcpu is stopped while another is running?  Spin
>>>> loops, synchronous IPIs will take forever.  Maybe we need to stop
>>>> the entire process.
>>>
>>> It's the same problem if a VCPU is descheduled while another is
>>> running.
>>
>> We can fix that with directed yield or lock holder preemption
>> prevention.  But if a vcpu is stopped by qemu, we suddenly can't.
>
> That only works for spin locks.
>
> Here's the scenario:
>
> 1) VCPU 0 drops to userspace and acquires qemu_mutex
> 2) VCPU 0 gets descheduled
> 3) VCPU 1 needs to drop to userspace and acquire qemu_mutex, gets
>    blocked and yields
> 4) If we're lucky, VCPU 0 gets scheduled, but it depends on how busy
>    the system is
>
> With CFS hard limits, once (2) happens, we're boned for (3) because
> (4) cannot happen.
> By having QEMU know about (2), it can choose to run just a little bit
> longer in order to drop qemu_mutex such that (3) never happens.

There's some support for futex priority inheritance; perhaps we can
leverage that.  It's supposed to be for realtime threads, but perhaps we
can hook the priority booster to directed yield.  It's really the same
problem -- preempted lock holder -- only in userspace.  We should be
able to use the same solution.

>>> The problem with stopping the entire process is that a big
>>> motivation for this is to ensure that benchmarks have consistent
>>> results regardless of CPU capacity.  If you just monitor the full
>>> process, then one VCPU may dominate the entitlement, resulting in
>>> very erratic benchmarking.
>>
>> What's the desired behaviour?  Give each vcpu 300M cycles per second,
>> or give a 2vcpu guest 600M cycles per second?
>
> Each vcpu gets 300M cycles per second.
>
>> You could monitor threads separately but stop the entire process.
>> Stopping individual threads will break apart as soon as they start
>> taking locks.
>
> I don't think so.  PLE should work as expected.  It's no different
> than a normally contended system.

PLE without directed yield is useless.  With directed yield it may work,
but if the vcpu is stopped, it becomes ineffective.  Directed yield
allows the scheduler to follow a bouncing lock around by increasing the
priority (or decreasing the vruntime) of the immediate lock holder at
the expense of the waiters.  SIGSTOP may drop the priority of the lock
holder to zero without giving PLE a way to adjust.

-- 
error compiling committee.c: too many arguments to function