From mboxrd@z Thu Jan  1 00:00:00 1970
From: Avi Kivity <avi-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
Subject: Re: [PATCH 3/4] KVM: Adds ability to preempt
 an	executing VCPU
Date: Tue, 08 May 2007 11:13:11 +0300
Message-ID: <46403117.2070000@qumranet.com>
References: <20070502212713.16738.8133.stgit@novell1.haskins.net>
	<20070502214325.16738.42702.stgit@novell1.haskins.net>
	<463EF7F4.9020106@qumranet.com> <463F04D8.BA47.005A.0@novell.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Cc: kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f@public.gmane.org
To: Gregory Haskins <ghaskins-Et1tbQHTxzrQT0dZR+AlfA@public.gmane.org>
Return-path: <kvm-devel-bounces-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f@public.gmane.org>
In-Reply-To: <463F04D8.BA47.005A.0-Et1tbQHTxzrQT0dZR+AlfA@public.gmane.org>
List-Unsubscribe: <https://lists.sourceforge.net/lists/listinfo/kvm-devel>,
	<mailto:kvm-devel-request-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f@public.gmane.org?subject=unsubscribe>
List-Archive: <http://sourceforge.net/mailarchive/forum.php?forum_name=kvm-devel>
List-Post: <mailto:kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f@public.gmane.org>
List-Help: <mailto:kvm-devel-request-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f@public.gmane.org?subject=help>
List-Subscribe: <https://lists.sourceforge.net/lists/listinfo/kvm-devel>,
	<mailto:kvm-devel-request-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f@public.gmane.org?subject=subscribe>
Sender: kvm-devel-bounces-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f@public.gmane.org
Errors-To: kvm-devel-bounces-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f@public.gmane.org
List-Id: kvm.vger.kernel.org

Gregory Haskins wrote:
>>     
>>>  
>>>  		vcpu- >cpu = - 1;
>>>  		vcpu- >kvm = kvm;
>>> @@ - 366,13 +370,20 @@ static void free_pio_guest_pages(struct kvm_vcpu *vcpu)
>>>  
>>>  static void kvm_free_vcpu(struct kvm_vcpu *vcpu)
>>>  {
>>> +	unsigned long irqsave;
>>> +
>>>  	if (!vcpu- >vmcs)
>>>  		return;
>>>  
>>>  	vcpu_load(vcpu);
>>>  	kvm_mmu_destroy(vcpu);
>>>  	vcpu_put(vcpu);
>>> +
>>> +	spin_lock_irqsave(&vcpu- >irq.lock, irqsave);
>>> +	vcpu- >irq.task = NULL;
>>> +	spin_unlock_irqrestore(&vcpu- >irq.lock, irqsave);
>>>   
>>>       
>> Can irq.task be non- NULL here at all?  Also, we only free vcpus when we 
>> destroy the vm, and paravirt drivers would hopefully hold a ref to the 
>> vm, so there's nobody to race against here.
>>     
>
> I am perhaps being a bit overzealous here.  What I found in practice is that the LVTT can screw things up on shutdown, so I was being pretty conservative on the synchronization here.  
>
>   

That may point out to a different sync problem.  All pending timers 
ought to have been canceled before we reach here.  Please check to make 
sure this isn't papering over another problem.

>>>  	kvm_irqdevice_destructor(&vcpu- >irq.dev);
>>>  
>>> @@ - 1868,6 +1880,10 @@ static int kvm_vcpu_ioctl_run(struct kvm_vcpu *vcpu, 
>>>       
>> struct kvm_run *kvm_run)
>>     
>>>  		kvm_arch_ops- >decache_regs(vcpu);
>>>  	}
>>>  
>>> +	spin_lock_irqsave(&vcpu- >irq.lock, irqsaved);
>>> +	vcpu- >irq.task = current;
>>> +	spin_unlock_irqrestore(&vcpu- >irq.lock, irqsaved);
>>> +
>>>   
>>>       
>> Just assignment + __smp_wmb().
>>     
>
> (This comment applies to all of the subsequent reviews where memory barriers are recommended instead of locks:)
>
> I cant quite wrap my head around whether all these critical sections are correct with just a barrier instead of a full-blown lock.  I would prefer to be conservative and leave them as locks for now.  Someone with better insight could make a second pass and optimize the locks into barriers where appropriate.  I am just uncomfortable doing it feeling confident that I am not causing races.  If you insist on making the changes before the code is accepted, ok.  Just note that I am not comfortable ;)
>
>   

I approach it from the other direction: to me, a locked assignment says 
that something is fundamentally wrong.  Usually anything under a lock is 
a read-modify-write operation, otherwise the writes just stomp on each 
other.

This is the source of the all the race-after-vmexit-irq-check comments 
you've been getting to me.  No matter how many times you explain it, 
every time I see it the automatic race alarm pops up.

>>> +/*
>>>   * This function will be invoked whenever the vcpu- >irq.dev raises its INTR
>>>   * line
>>>   */
>>> @@ - 2318,10 +2348,52 @@ static void kvm_vcpu_intr(struct kvm_irqsink *this,
>>>  {
>>>  	struct kvm_vcpu *vcpu = (struct kvm_vcpu*)this- >private;
>>>  	unsigned long flags;
>>> +	int direct_ipi = - 1;
>>>  
>>>  	spin_lock_irqsave(&vcpu- >irq.lock, flags);
>>> -	__set_bit(pin, &vcpu- >irq.pending);
>>> +
>>> +	if (!test_bit(pin, &vcpu- >irq.pending)) {
>>> +		/*
>>> +		 * Record the change..
>>> +		 */
>>> +		__set_bit(pin, &vcpu- >irq.pending);
>>> +
>>> +		/*
>>> +		 * then wake up the vcpu (if necessary)
>>> +		 */
>>> +		if (vcpu- >irq.task && (vcpu- >irq.task != current)) {
>>> +			if (vcpu- >irq.guest_mode) {
>>> +				/*
>>> +				 * If we are in guest mode, we can optimize
>>> +				 * the IPI by executing a function directly
>>> +				 * on the owning processor.
>>> +				 */
>>> +				direct_ipi = task_cpu(vcpu- >irq.task);
>>> +				BUG_ON(direct_ipi == smp_processor_id());
>>> +			} else
>>> +				/*
>>> +				 * otherwise, we must assume that we could be
>>> +				 * blocked anywhere, including userspace. Send
>>> +				 * a signal to give everyone a chance to get
>>> +				 * notification
>>> +				 */
>>> +				send_sig(vcpu- >irq.signo, vcpu- >irq.task, 0);
>>> +		}
>>> +	}
>>> +
>>>  	spin_unlock_irqrestore(&vcpu- >irq.lock, flags);
>>> +
>>> +	if (direct_ipi != - 1) {
>>> +		/*
>>> +		 * Not sure if disabling preemption is needed.
>>> +		 * The kick_process() code does this so I copied it
>>> +		 */
>>> +		preempt_disable();
>>>       

[preemption is disabled here anyway]

>>> +		smp_call_function_single(direct_ipi,
>>> +					 kvm_vcpu_guest_intr,
>>> +					 vcpu, 0, 0);
>>> +		preempt_enable();
>>> +	}
>>>   
>>>       
>> I see why you must issue the IPI outside the spin_lock_irqsave(), but 
>> aren't you now opening a race?  vcpu enters guest mode, irq on other 
>> cpu, irq sets direct_ipi to wakeup guest, releases lock, vcpu exits to 
>> userspace (or migrates to another cpu), ipi is issued but nobody cares.
>>     
>
> Its subtle, but I think its ok.  The race is actually against the setting of the irq.pending.  This *must* happen inside the lock or the guest could exit and miss the interrupt.  Once the pending bit is set, however, the guest can be woken up in any old fashion and the behavior should be correct.  If the guest has already exited before the IPI is issued, its effectively a no-op (well, really its just a wasted IPI/reschedule event,  but no harm is done).  Does this make sense?  Did I miss something else?
>   

No, you are correct wrt the vcpu migrating to another cpu.

What about vs. exit to userspace where we may sleep?

-- 
error compiling committee.c: too many arguments to function


-------------------------------------------------------------------------
This SF.net email is sponsored by DB2 Express
Download DB2 Express C - the FREE version of DB2 express and take
control of your XML. No limits. Just data. Click to get it now.
http://sourceforge.net/powerbar/db2/