From mboxrd@z Thu Jan 1 00:00:00 1970
From: Avi Kivity
Subject: Re: [patch 1/2] KVM: x86: do not entry guest mode if vcpu is not runnable
Date: Sat, 26 Jul 2008 11:07:48 +0300
Message-ID: <488ADB54.6040005@qumranet.com>
References: <20080721143855.032449406@localhost.localdomain> <20080721144037.226624791@localhost.localdomain>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Cc: kvm@vger.kernel.org
To: Marcelo Tosatti
Return-path:
Received: from il.qumranet.com ([212.179.150.194]:48565 "EHLO il.qumranet.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1751449AbYGZIHv (ORCPT ); Sat, 26 Jul 2008 04:07:51 -0400
In-Reply-To: <20080721144037.226624791@localhost.localdomain>
Sender: kvm-owner@vger.kernel.org
List-ID:

Marcelo Tosatti wrote:
> If a vcpu has been offlined, or not initialized at all, signals
> requesting userspace work to be performed will result in KVM attempting
> to re-enter guest mode.
>
> The problem is that the in-kernel irqchip emulation happily executes
> vcpus in HALTED state. This breaks "savevm" on a Windows SMP
> installation (which boots up only a single vcpu), for example.
>
> Fix it by blocking halted vcpus at kvm_arch_vcpu_ioctl_run().
>
> Change the promotion from halted to running to happen in the vcpu
> context. Use the information available in kvm_vcpu_block(), and the
> current mp state, to make the decision:
>
> - If there's an in-kernel timer or irq event, the halted->running
> promotion evaluation can be performed; no need for userspace assistance.
>
> - If there's a signal, there's either userspace work to be performed
> in the vcpu's context, or irqchip emulation is in userspace.
>
> This has the nice side effect of avoiding a userspace exit in case
> of irq injection to a halted vcpu from the iothread.
>
> Signed-off-by: Marcelo Tosatti
>
> Index: kvm/arch/x86/kvm/x86.c
> ===================================================================
> --- kvm.orig/arch/x86/kvm/x86.c
> +++ kvm/arch/x86/kvm/x86.c
> @@ -2505,17 +2505,25 @@ void kvm_arch_exit(void)
>  	kvm_mmu_module_exit();
>  }
>  
> +static void kvm_vcpu_promote_runnable(struct kvm_vcpu *vcpu)
> +{
> +	if (vcpu->arch.mp_state == KVM_MP_STATE_HALTED)
> +		vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE;
> +}
> +
>  int kvm_emulate_halt(struct kvm_vcpu *vcpu)
>  {
>  	++vcpu->stat.halt_exits;
>  	KVMTRACE_0D(HLT, vcpu, handler);
>  	if (irqchip_in_kernel(vcpu->kvm)) {
> +		int ret;

Missing blank line.

> @@ -2978,10 +2986,12 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_v
>  	if (vcpu->sigset_active)
>  		sigprocmask(SIG_SETMASK, &vcpu->sigset, &sigsaved);
>  
> -	if (unlikely(vcpu->arch.mp_state == KVM_MP_STATE_UNINITIALIZED)) {
> -		kvm_vcpu_block(vcpu);
> -		r = -EAGAIN;
> -		goto out;
> +	if (unlikely(!kvm_arch_vcpu_runnable(vcpu))) {
> +		if (kvm_vcpu_block(vcpu)) {
> +			r = -EAGAIN;
> +			goto out;
> +		}
> +		kvm_vcpu_promote_runnable(vcpu);
>  	}

Any reason this is not in __vcpu_run()? Our main loop could look like:

    while (no reason to stop)
        if (runnable)
            enter guest
        else
            block
        deal with aftermath

kvm_emulate_halt() would then simply modify the mp state.
>
>  	/* re-sync apic's tpr */
> Index: kvm/include/linux/kvm_host.h
> ===================================================================
> --- kvm.orig/include/linux/kvm_host.h
> +++ kvm/include/linux/kvm_host.h
> @@ -199,7 +199,7 @@ struct kvm_memory_slot *gfn_to_memslot(s
>  int kvm_is_visible_gfn(struct kvm *kvm, gfn_t gfn);
>  void mark_page_dirty(struct kvm *kvm, gfn_t gfn);
>  
> -void kvm_vcpu_block(struct kvm_vcpu *vcpu);
> +int kvm_vcpu_block(struct kvm_vcpu *vcpu);
>  void kvm_resched(struct kvm_vcpu *vcpu);
>  void kvm_load_guest_fpu(struct kvm_vcpu *vcpu);
>  void kvm_put_guest_fpu(struct kvm_vcpu *vcpu);
> Index: kvm/virt/kvm/kvm_main.c
> ===================================================================
> --- kvm.orig/virt/kvm/kvm_main.c
> +++ kvm/virt/kvm/kvm_main.c
> @@ -818,9 +818,10 @@ void mark_page_dirty(struct kvm *kvm, gf
>  /*
>   * The vCPU has executed a HLT instruction with in-kernel mode enabled.
>   */
> -void kvm_vcpu_block(struct kvm_vcpu *vcpu)
> +int kvm_vcpu_block(struct kvm_vcpu *vcpu)
>  {
>  	DEFINE_WAIT(wait);
> +	int ret = 0;
>  
>  	for (;;) {
>  		prepare_to_wait(&vcpu->wq, &wait, TASK_INTERRUPTIBLE);
> @@ -831,8 +832,10 @@ void kvm_vcpu_block(struct kvm_vcp
>  			break;
>  		if (kvm_arch_vcpu_runnable(vcpu))
>  			break;
> -		if (signal_pending(current))
> +		if (signal_pending(current)) {
> +			ret = 1;
>  			break;
> +		}

This is ambiguous. Multiple exit conditions could be true at the same time (the vcpu becomes runnable _and_ a signal is pending), so you can't trust the return code. It doesn't affect the usage in the rest of the patch (I think), but it is best to avoid such subtlety.

Can this be done by setting a KVM_REQ_UNHALT bit in vcpu->requests?

-- 
Do not meddle in the internals of kernels, for they are subtle and
quick to panic.