From mboxrd@z Thu Jan  1 00:00:00 1970
From: Anthony Liguori <aliguori@linux.vnet.ibm.com>
Subject: Re: [PATCH] qemu-kvm: response to SIGUSR1 to start/stop a VCPU (v2)
Date: Wed, 24 Nov 2010 07:58:59 -0600
Message-ID: <4CED1A23.9030607@linux.vnet.ibm.com>
References: <1290530963-3448-1-git-send-email-aliguori@us.ibm.com> <4CECCA39.4060702@redhat.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Cc: qemu-devel@nongnu.org, kvm@vger.kernel.org,
	Chris Wright <chrisw@sous-sol.org>
To: Avi Kivity <avi@redhat.com>
Return-path: <kvm-owner@vger.kernel.org>
Received: from e2.ny.us.ibm.com ([32.97.182.142]:42102 "EHLO e2.ny.us.ibm.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1754254Ab0KXN7g (ORCPT <rfc822;kvm@vger.kernel.org>);
	Wed, 24 Nov 2010 08:59:36 -0500
Received: from d01dlp02.pok.ibm.com (d01dlp02.pok.ibm.com [9.56.224.85])
	by e2.ny.us.ibm.com (8.14.4/8.13.1) with ESMTP id oAODhDVQ027060
	for <kvm@vger.kernel.org>; Wed, 24 Nov 2010 08:43:16 -0500
Received: from d01relay03.pok.ibm.com (d01relay03.pok.ibm.com [9.56.227.235])
	by d01dlp02.pok.ibm.com (Postfix) with ESMTP id B1F134DE803C
	for <kvm@vger.kernel.org>; Wed, 24 Nov 2010 08:58:10 -0500 (EST)
Received: from d01av03.pok.ibm.com (d01av03.pok.ibm.com [9.56.224.217])
	by d01relay03.pok.ibm.com (8.13.8/8.13.8/NCO v10.0) with ESMTP id oAODxYPA327598
	for <kvm@vger.kernel.org>; Wed, 24 Nov 2010 08:59:34 -0500
Received: from d01av03.pok.ibm.com (loopback [127.0.0.1])
	by d01av03.pok.ibm.com (8.14.4/8.13.1/NCO v10.0 AVout) with ESMTP id oAODxXn9008186
	for <kvm@vger.kernel.org>; Wed, 24 Nov 2010 11:59:34 -0200
In-Reply-To: <4CECCA39.4060702@redhat.com>
Sender: kvm-owner@vger.kernel.org
List-ID: <kvm.vger.kernel.org>

On 11/24/2010 02:18 AM, Avi Kivity wrote:
> On 11/23/2010 06:49 PM, Anthony Liguori wrote:
>> qemu-kvm vcpu threads don't respond to SIGSTOP/SIGCONT.  Instead of
>> teaching them to respond to these signals (which cannot be trapped),
>> use SIGUSR1 to approximate the behavior of SIGSTOP/SIGCONT.
>>
>> The purpose of this is to implement CPU hard limits using an external
>> tool that watches the CPU consumption and stops the VCPU as appropriate.
>>
>> This provides a more elegant solution in that it allows the VCPU thread
>> to release qemu_mutex before going to sleep.
>>
>> This current implementation uses a single signal.  I think this is too
>> racy in the long term so I think we should introduce a second signal.
>> If two signals get coalesced into one, it could confuse the monitoring
>> tool into giving the VCPU the inverse of its entitlement.
>
> You can use sigqueue() to send an accompanying value.

I switched to using SIGRTMIN+5 and SIGRTMIN+6.  I think that's a nicer 
solution since it maps to SIGCONT/SIGSTOP.
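
To make the mapping concrete, here is a minimal standalone sketch, not
the actual patch.  It assumes SIGRTMIN+5 means "continue" and SIGRTMIN+6
means "stop" (following the order above); the SIG_VCPU_* names, the flag
and the helper are made up for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Assumed mapping, for illustration only. */
#define SIG_VCPU_CONT (SIGRTMIN + 5)
#define SIG_VCPU_STOP (SIGRTMIN + 6)

/* Hypothetical flag; 1 while a stop has been requested. */
static volatile sig_atomic_t vcpu_stop_requested;

static void on_vcpu_signal(int signo)
{
    /* SIGRTMIN is not a compile-time constant on glibc, so compare
     * with if/else rather than switch/case. */
    if (signo == SIG_VCPU_STOP) {
        vcpu_stop_requested = 1;
    } else if (signo == SIG_VCPU_CONT) {
        vcpu_stop_requested = 0;
    }
}

/* Install the same handler for both signals.  Returns 0 on success. */
static int install_vcpu_signals(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_vcpu_signal;
    sigemptyset(&sa.sa_mask);

    if (sigaction(SIG_VCPU_STOP, &sa, NULL) < 0) {
        return -1;
    }
    if (sigaction(SIG_VCPU_CONT, &sa, NULL) < 0) {
        return -1;
    }
    return 0;
}
```

Each signal sets an absolute state, so a duplicate delivery of the same
signal is idempotent, unlike a single toggling signal.  Ordering between
the two different signals is a separate question this sketch does not
address.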

>> It might be better to simply move this logic entirely into QEMU to
>> make this more robust--the question is whether we think this is a
>> good long term feature to carry in QEMU?
>>
>
> I'm more concerned about lock holder preemption, and interaction of 
> this mechanism with any kernel solution for LHP.

Can you suggest some scenarios and I'll create some test cases?  I'm 
trying to figure out the best way to evaluate this.

Are you assuming the existence of a directed yield and the specific 
concern is what happens when a directed yield happens after a PLE and 
the target of the yield has been capped?

>> +static __thread int sigusr1_wfd;
>> +
>> +static void on_sigusr1(int signo)
>> +{
>> +    char ch = 0;
>> +    if (write(sigusr1_wfd, &ch, 1) < 0) {
>> +        /* who cares */
>> +    }
>> +}
>
> We do have signalfd().

This is actually called from signalfd.  I thought about refactoring that 
loop to handle signals directly but since we do this elsewhere I figured 
I'd keep things consistent.
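
For reference, the pattern the hunk follows is the usual self-pipe
arrangement: the handler writes a byte, the main loop drains the read
end.  A standalone sketch, not the qemu-kvm code; the global pipe and
all function names here are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical global pair; the quoted patch keeps a per-thread fd. */
static int sig_pipe[2] = { -1, -1 };

static void on_signal(int signo)
{
    int saved_errno = errno;
    char ch = 0;
    ssize_t ret;

    (void)signo;
    /* write() is async-signal-safe.  If the pipe is full, a wakeup is
     * already pending, so the error can be ignored. */
    ret = write(sig_pipe[1], &ch, 1);
    (void)ret;
    errno = saved_errno;
}

static int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL);

    if (flags < 0) {
        return -1;
    }
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Create the pipe and route signo into it.  Returns 0 on success. */
static int sig_pipe_init(int signo)
{
    struct sigaction sa;

    if (pipe(sig_pipe) < 0) {
        return -1;
    }
    if (set_nonblocking(sig_pipe[0]) < 0 ||
        set_nonblocking(sig_pipe[1]) < 0) {
        return -1;
    }
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_signal;
    sigemptyset(&sa.sa_mask);
    return sigaction(signo, &sa, NULL);
}

/* Drain the read end.  Returns 1 if at least one byte was read. */
static int sig_pipe_drain(void)
{
    char buffer[256];
    ssize_t len;
    int caught = 0;

    for (;;) {
        len = read(sig_pipe[0], buffer, sizeof(buffer));
        if (len > 0) {
            caught = 1;
            continue;
        }
        if (len < 0 && errno == EINTR) {
            continue;
        }
        break;  /* EAGAIN (pipe empty), EOF, or a real error */
    }
    return caught;
}
```

Several signals arriving before the drain collapse into one wakeup,
which is fine as long as the reader only needs to know that something
happened.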

>> +
>> +static void sigusr1_read(void *opaque)
>> +{
>> +    CPUState *env = opaque;
>> +    ssize_t len;
>> +    int caught_signal = 0;
>> +
>> +    do {
>> +        char buffer[256];
>> +        len = read(env->sigusr1_fd, buffer, sizeof(buffer));
>> +        caught_signal = 1;
>> +    } while (len > 0);
>> +
>> +    if (caught_signal) {
>> +        if (env->stopped) {
>
> env->stopped is multiplexed among multiple users, so this interferes 
> with vm_stop().
>
> We need to make ->stopped a reference count instead.

Indeed.
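
A minimal sketch of what a counted ->stopped could look like, assuming
every caller holds qemu_mutex so no atomics are needed.  The struct and
helper names are hypothetical, not existing QEMU API.

```c
#include <assert.h>

/* Hypothetical stand-in for the relevant part of CPUState. */
struct vcpu_state {
    int stop_count;   /* number of outstanding stop requests */
};

/* One call per user that wants the VCPU stopped (for example
 * vm_stop() or the signal path). */
static void vcpu_stop_get(struct vcpu_state *s)
{
    s->stop_count++;
}

/* Drop one stop request; must pair with an earlier vcpu_stop_get(). */
static void vcpu_stop_put(struct vcpu_state *s)
{
    assert(s->stop_count > 0);
    s->stop_count--;
}

/* The VCPU may run only when nobody holds a stop request. */
static int vcpu_is_stopped(const struct vcpu_state *s)
{
    return s->stop_count > 0;
}
```

With this, a resume from the monitoring tool would not restart a VCPU
that vm_stop() still wants stopped, and vice versa.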

Regards,

Anthony Liguori