From mboxrd@z Thu Jan  1 00:00:00 1970
From: Avi Kivity <avi@qumranet.com>
Subject: Re: KVM: only abort guest entry if timer count goes from 0->1
Date: Thu, 12 Jun 2008 15:09:19 +0300
Message-ID: <485111EF.1080607@qumranet.com>
References: <20080611225238.GB23771@dmt.cnet>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Cc: kvm-devel <kvm@vger.kernel.org>
To: Marcelo Tosatti <mtosatti@redhat.com>
Return-path: <kvm-owner@vger.kernel.org>
Received: from il.qumranet.com ([212.179.150.194]:37986 "EHLO il.qumranet.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1755988AbYFLML1 (ORCPT <rfc822;kvm@vger.kernel.org>);
	Thu, 12 Jun 2008 08:11:27 -0400
In-Reply-To: <20080611225238.GB23771@dmt.cnet>
Sender: kvm-owner@vger.kernel.org
List-ID: <kvm.vger.kernel.org>

Marcelo Tosatti wrote:
> Only abort guest entry if the timer count went from 0->1, since for 1->2
> or larger the bit will either be set already or a timer irq will have
> been injected.
>
> Using atomic_inc_and_test() for it also introduces an SMP barrier
> to the LAPIC version (thought it was unecessary because of timer
> migration, but guest can be scheduled to a different pCPU between exit
> and kvm_vcpu_block(), so there is the possibility for a race).
>
> Noticed by Avi.
>
>   

Applied, thanks.

> diff --git a/arch/x86/kvm/i8254.c b/arch/x86/kvm/i8254.c
> index 9e3391e..c0f7872 100644
> --- a/arch/x86/kvm/i8254.c
> +++ b/arch/x86/kvm/i8254.c
> @@ -198,14 +198,11 @@ static int __pit_timer_fn(struct kvm_kpit_state *ps)
>  	struct kvm_vcpu *vcpu0 = ps->pit->kvm->vcpus[0];
>  	struct kvm_kpit_timer *pt = &ps->pit_timer;
>  
> -	atomic_inc(&pt->pending);
> -	smp_mb__after_atomic_inc();
> -	if (vcpu0) {
> +	if (!atomic_inc_and_test(&pt->pending))
>  		set_bit(KVM_REQ_PENDING_TIMER, &vcpu0->requests);
> -		if (waitqueue_active(&vcpu0->wq)) {
> -			vcpu0->arch.mp_state = KVM_MP_STATE_RUNNABLE;
> -			wake_up_interruptible(&vcpu0->wq);
> -		}
> +	if (vcpu0 && waitqueue_active(&vcpu0->wq)) {
> +		vcpu0->arch.mp_state = KVM_MP_STATE_RUNNABLE;
> +		wake_up_interruptible(&vcpu0->wq);
>  	}
>   

Semi-related: what business does the timer have waking up vcpu0?  The 
timer pin may be masked, or routed via the ioapic to vcpu13.  The code 
here needs to touch irq pin 0, not wake up random vcpus.

It can't do that immediately since we're in interrupt context and we 
can't take the appropriate locks.  So we need to go through a 
workqueue.  There's some code for this in the pci device assignment 
code, so we can just use that when it is merged.

Longer term we need to give the pic and ioapic their own locking, likely 
with spin_lock_irq() so we can touch them from interrupt context.

> index 180ba73..73f43de 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -945,8 +945,8 @@ static int __apic_timer_fn(struct kvm_lapic *apic)
>  	int result = 0;
>  	wait_queue_head_t *q = &apic->vcpu->wq;
>  
> -	atomic_inc(&apic->timer.pending);
> -	set_bit(KVM_REQ_PENDING_TIMER, &apic->vcpu->requests);
> +	if(!atomic_inc_and_test(&apic->timer.pending))
> +		set_bit(KVM_REQ_PENDING_TIMER, &apic->vcpu->requests);
>  	if (waitqueue_active(q)) {
>  		apic->vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE;
>  		wake_up_interruptible(q);
>   

The wakeup is okay since lapic is handled by its vcpu.  But do we need 
to change the mpstate?  The lapic code should do that, if it determines 
that an interrupt is actually generated.

-- 
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.