public inbox for kvm@vger.kernel.org
From: "Longpeng (Mike)" <longpeng2@huawei.com>
To: "Michael S. Tsirkin" <mst@redhat.com>
Cc: "Paolo Bonzini" <pbonzini@redhat.com>,
	"\"Jan H. Schönherr\"" <jschoenh@amazon.de>,
	"Radim Krčmář" <rkrcmar@redhat.com>,
	"Joerg Roedel" <joro@8bytes.org>,
	"KarimAllah Ahmed" <karahmed@amazon.de>,
	kvm@vger.kernel.org
Subject: Re: [PATCH 2/3] KVM: Add capability to not exit on HLT
Date: Tue, 28 Nov 2017 09:34:56 +0800
Message-ID: <5A1CBD40.4050709@huawei.com>
In-Reply-To: <20171127225156-mutt-send-email-mst@kernel.org>



On 2017/11/28 4:55, Michael S. Tsirkin wrote:

> On Mon, Nov 27, 2017 at 09:51:27PM +0100, Paolo Bonzini wrote:
>> On 27/11/2017 21:45, Michael S. Tsirkin wrote:
>>> On Sat, Nov 25, 2017 at 02:09:32PM +0100, Jan H. Schönherr wrote:
>>>> If host CPUs are dedicated to a VM, we can avoid VM exits on HLT,
>>>> reducing the wake-up latency on posted interrupts.
>>>>
>>>> This reintroduces a feature that has been there at some point --
>>>> see Linux 3.4 commit 10166744b80a ("KVM: VMX: remove yield_on_hlt")
>>>> for the removal -- but with the additional ability to enable it only
>>>> for selected VMs (and supporting SVM as well).
>>>>
>>>> Signed-off-by: Jan H. Schönherr <jschoenh@amazon.de>
>>>
>>>
>>> If you are going to do this, why not expose mwait
>>> in the cpuid thus making guests use mwait to halt?
>>>
>>> What are the advantages of doing this using halt specifically?
>>
>> Not all guests use MWAIT, I suppose.
>>
>> Paolo
> 
> In that case, it would be nice to document which guests of interest
> don't.  E.g.  I don't think there are still supported versions of RHEL
> that don't use MWAIT.
> 


Some older kernels don't use MWAIT when no cpuidle driver is available.
For example, my guest kernel is 3.10.0-514 (RHEL 7.3), and
"/sys/devices/system/cpu/cpuidle/current_driver" in the guest reads "none".
In that case the idle loop falls back to the kernel's default idle routine.
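
For anyone who wants to check this from inside a guest, here is a minimal
userspace sketch. The helper names read_cpuidle_driver() and
uses_default_idle() are made up for illustration; the only thing taken from
the above is the sysfs path and the "none" value it reports when no cpuidle
driver is registered.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: read the active cpuidle driver name from a
 * sysfs-style file into buf and strip the trailing newline.
 * Returns 0 on success, -1 if the file cannot be opened or read.
 */
int read_cpuidle_driver(const char *path, char *buf, size_t len)
{
	FILE *f;

	assert(path != NULL && buf != NULL && len > 0);

	f = fopen(path, "r");
	if (!f)
		return -1;
	if (!fgets(buf, (int)len, f)) {
		fclose(f);
		return -1;
	}
	fclose(f);
	buf[strcspn(buf, "\n")] = '\0';
	return 0;
}

/*
 * Hypothetical helper: returns 1 if the reported driver is "none",
 * meaning the idle loop uses the x86 default idle routine.
 */
int uses_default_idle(const char *driver)
{
	assert(driver != NULL);
	return strcmp(driver, "none") == 0;
}
```

In a guest this would be called with
"/sys/devices/system/cpu/cpuidle/current_driver" as the path.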

The default idle routine is selected at boot:

old kernel:
'''
select_idle_routine()
	if (cpu_has_bug(c, X86_BUG_AMD_APIC_C1E)) {
		x86_idle = amd_e400_idle;
	} else
		x86_idle = default_idle;
'''

newer kernel:
'''
select_idle_routine()
	if (boot_cpu_has_bug(X86_BUG_AMD_E400)) {
		pr_info("using AMD E400 aware idle routine\n");
		x86_idle = amd_e400_idle;
	} else if (prefer_mwait_c1_over_halt(c)) {
		pr_info("using mwait in idle threads\n");
		x86_idle = mwait_idle;
	} else
		x86_idle = default_idle;
'''

So some older guests never select MWAIT as their default idle routine; they
fall back to default_idle, which halts with HLT.
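
To make the difference concrete, here is a small standalone model of the two
selection routines. It is only a sketch: struct cpu_traits, enum
idle_routine and the select_idle_old()/select_idle_new() names are invented
for this example, and the two booleans simply stand in for the bug check and
for prefer_mwait_c1_over_halt(c) in the snippets above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Which instruction the idle routine ends up relying on (illustrative). */
enum idle_routine { IDLE_HLT, IDLE_MWAIT, IDLE_AMD_E400 };

/* Hypothetical stand-ins for the checks done in select_idle_routine(). */
struct cpu_traits {
	bool amd_e400_bug;     /* the AMD E400 / APIC C1E bug check */
	bool prefers_mwait_c1; /* result of prefer_mwait_c1_over_halt(c) */
};

/* Old kernel: there is no MWAIT branch, so non-E400 CPUs get default_idle (HLT). */
enum idle_routine select_idle_old(const struct cpu_traits *c)
{
	assert(c != NULL);
	if (c->amd_e400_bug)
		return IDLE_AMD_E400;
	return IDLE_HLT;
}

/* Newer kernel: MWAIT is picked when the CPU prefers it over HLT. */
enum idle_routine select_idle_new(const struct cpu_traits *c)
{
	assert(c != NULL);
	if (c->amd_e400_bug)
		return IDLE_AMD_E400;
	if (c->prefers_mwait_c1)
		return IDLE_MWAIT;
	return IDLE_HLT;
}
```

With the same CPU traits, the old routine returns IDLE_HLT where the newer
one returns IDLE_MWAIT, which is the case where exposing MWAIT to the guest
does not help.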

> 
> 
>>>
>>>> ---
>>>> Note: AMD code paths are only compile tested
>>>> ---
>>>>  Documentation/virtual/kvm/api.txt | 12 +++++++++++-
>>>>  arch/x86/include/asm/kvm_host.h   |  1 +
>>>>  arch/x86/kvm/svm.c                |  3 ++-
>>>>  arch/x86/kvm/vmx.c                | 33 +++++++++++++++++++++++++++------
>>>>  arch/x86/kvm/x86.c                |  5 +++++
>>>>  arch/x86/kvm/x86.h                |  5 +++++
>>>>  include/uapi/linux/kvm.h          |  1 +
>>>>  7 files changed, 52 insertions(+), 8 deletions(-)
>>>>
>>>> diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
>>>> index 0ee812c..c06bb41 100644
>>>> --- a/Documentation/virtual/kvm/api.txt
>>>> +++ b/Documentation/virtual/kvm/api.txt
>>>> @@ -4172,7 +4172,17 @@ Returns: 0 on success
>>>>  This capability indicates that a guest using memory monitoring instructions
>>>>  (MWAIT/MWAITX) to stop a virtual CPU will not cause a VM exit. As such, time
>>>>  spent while a virtual CPU is halted in this way will then be accounted for as
>>>> -guest running time on the host (as opposed to e.g. HLT).
>>>> +guest running time on the host.
>>>> +
>>>> +7.14 KVM_CAP_X86_GUEST_HLT
>>>> +
>>>> +Architectures: x86
>>>> +Parameters: none
>>>> +Returns: 0 on success
>>>> +
>>>> +This capability indicates that a guest using HLT to stop a virtual CPU will not
>>>> +cause a VM exit. As such, time spent while a virtual CPU is halted in this way
>>>> +will then be accounted for as guest running time on the host.
>>>>  
>>>>  8. Other capabilities.
>>>>  ----------------------
>>>> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
>>>> index f7bcfaa..3197c2d 100644
>>>> --- a/arch/x86/include/asm/kvm_host.h
>>>> +++ b/arch/x86/include/asm/kvm_host.h
>>>> @@ -781,6 +781,7 @@ struct kvm_arch {
>>>>  
>>>>  	gpa_t wall_clock;
>>>>  
>>>> +	bool hlt_in_guest;
>>>>  	bool mwait_in_guest;
>>>>  
>>>>  	bool ept_identity_pagetable_done;
>>>> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
>>>> index ef1b320..c135b98 100644
>>>> --- a/arch/x86/kvm/svm.c
>>>> +++ b/arch/x86/kvm/svm.c
>>>> @@ -1236,7 +1236,6 @@ static void init_vmcb(struct vcpu_svm *svm)
>>>>  	set_intercept(svm, INTERCEPT_RDPMC);
>>>>  	set_intercept(svm, INTERCEPT_CPUID);
>>>>  	set_intercept(svm, INTERCEPT_INVD);
>>>> -	set_intercept(svm, INTERCEPT_HLT);
>>>>  	set_intercept(svm, INTERCEPT_INVLPG);
>>>>  	set_intercept(svm, INTERCEPT_INVLPGA);
>>>>  	set_intercept(svm, INTERCEPT_IOIO_PROT);
>>>> @@ -1257,6 +1256,8 @@ static void init_vmcb(struct vcpu_svm *svm)
>>>>  		set_intercept(svm, INTERCEPT_MONITOR);
>>>>  		set_intercept(svm, INTERCEPT_MWAIT);
>>>>  	}
>>>> +	if (!kvm_hlt_in_guest(svm->vcpu.kvm))
>>>> +		set_intercept(svm, INTERCEPT_HLT);
>>>>  
>>>>  	control->iopm_base_pa = __sme_set(iopm_base);
>>>>  	control->msrpm_base_pa = __sme_set(__pa(svm->msrpm));
>>>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>>>> index a067735..1b67433 100644
>>>> --- a/arch/x86/kvm/vmx.c
>>>> +++ b/arch/x86/kvm/vmx.c
>>>> @@ -2446,6 +2446,25 @@ static void skip_emulated_instruction(struct kvm_vcpu *vcpu)
>>>>  	vmx_set_interrupt_shadow(vcpu, 0);
>>>>  }
>>>>  
>>>> +static void vmx_set_intr_info(struct kvm_vcpu *vcpu, u32 intr)
>>>> +{
>>>> +	vmcs_write32(VM_ENTRY_INTR_INFO_FIELD, intr);
>>>> +
>>>> +	/*
>>>> +	 * Ensure that we clear the HLT state in the VMCS.  We don't need to
>>>> +	 * explicitly skip the instruction because if the HLT state is set, then
>>>> +	 * the instruction is already executing and RIP has already been
>>>> +	 * advanced.
>>>> +	 */
>>>> +	if (!kvm_hlt_in_guest(vcpu->kvm) || !(intr & INTR_INFO_VALID_MASK))
>>>> +		return;
>>>> +	if (is_external_interrupt(intr) || is_nmi(intr))
>>>> +		return;
>>>> +	if (vmcs_read32(GUEST_ACTIVITY_STATE) != GUEST_ACTIVITY_HLT)
>>>> +		return;
>>>> +	vmcs_write32(GUEST_ACTIVITY_STATE, GUEST_ACTIVITY_ACTIVE);
>>>> +}
>>>> +
>>>>  static void nested_vmx_inject_exception_vmexit(struct kvm_vcpu *vcpu,
>>>>  					       unsigned long exit_qual)
>>>>  {
>>>> @@ -2540,7 +2559,7 @@ static void vmx_queue_exception(struct kvm_vcpu *vcpu)
>>>>  	} else
>>>>  		intr_info |= INTR_TYPE_HARD_EXCEPTION;
>>>>  
>>>> -	vmcs_write32(VM_ENTRY_INTR_INFO_FIELD, intr_info);
>>>> +	vmx_set_intr_info(vcpu, intr_info);
>>>>  }
>>>>  
>>>>  static bool vmx_rdtscp_supported(void)
>>>> @@ -5298,6 +5317,8 @@ static u32 vmx_exec_control(struct vcpu_vmx *vmx)
>>>>  	if (kvm_mwait_in_guest(vmx->vcpu.kvm))
>>>>  		exec_control &= ~(CPU_BASED_MWAIT_EXITING |
>>>>  				  CPU_BASED_MONITOR_EXITING);
>>>> +	if (kvm_hlt_in_guest(vmx->vcpu.kvm))
>>>> +		exec_control &= ~CPU_BASED_HLT_EXITING;
>>>>  	return exec_control;
>>>>  }
>>>>  
>>>> @@ -5635,7 +5656,7 @@ static void vmx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
>>>>  
>>>>  	setup_msrs(vmx);
>>>>  
>>>> -	vmcs_write32(VM_ENTRY_INTR_INFO_FIELD, 0);  /* 22.2.1 */
>>>> +	vmx_set_intr_info(vcpu, 0);  /* 22.2.1 */
>>>>  
>>>>  	if (cpu_has_vmx_tpr_shadow() && !init_event) {
>>>>  		vmcs_write64(VIRTUAL_APIC_PAGE_ADDR, 0);
>>>> @@ -5729,7 +5750,7 @@ static void vmx_inject_irq(struct kvm_vcpu *vcpu)
>>>>  			     vmx->vcpu.arch.event_exit_inst_len);
>>>>  	} else
>>>>  		intr |= INTR_TYPE_EXT_INTR;
>>>> -	vmcs_write32(VM_ENTRY_INTR_INFO_FIELD, intr);
>>>> +	vmx_set_intr_info(vcpu, intr);
>>>>  }
>>>>  
>>>>  static void vmx_inject_nmi(struct kvm_vcpu *vcpu)
>>>> @@ -5758,8 +5779,8 @@ static void vmx_inject_nmi(struct kvm_vcpu *vcpu)
>>>>  		return;
>>>>  	}
>>>>  
>>>> -	vmcs_write32(VM_ENTRY_INTR_INFO_FIELD,
>>>> -			INTR_TYPE_NMI_INTR | INTR_INFO_VALID_MASK | NMI_VECTOR);
>>>> +	vmx_set_intr_info(vcpu, INTR_TYPE_NMI_INTR | INTR_INFO_VALID_MASK |
>>>> +				NMI_VECTOR);
>>>>  }
>>>>  
>>>>  static bool vmx_get_nmi_mask(struct kvm_vcpu *vcpu)
>>>> @@ -9301,7 +9322,7 @@ static void vmx_cancel_injection(struct kvm_vcpu *vcpu)
>>>>  				  VM_ENTRY_INSTRUCTION_LEN,
>>>>  				  VM_ENTRY_EXCEPTION_ERROR_CODE);
>>>>  
>>>> -	vmcs_write32(VM_ENTRY_INTR_INFO_FIELD, 0);
>>>> +	vmx_set_intr_info(vcpu, 0);
>>>>  }
>>>>  
>>>>  static void atomic_switch_perf_msrs(struct vcpu_vmx *vmx)
>>>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>>>> index fe6627a..f17c520 100644
>>>> --- a/arch/x86/kvm/x86.c
>>>> +++ b/arch/x86/kvm/x86.c
>>>> @@ -2755,6 +2755,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
>>>>  	case KVM_CAP_SET_BOOT_CPU_ID:
>>>>   	case KVM_CAP_SPLIT_IRQCHIP:
>>>>  	case KVM_CAP_IMMEDIATE_EXIT:
>>>> +	case KVM_CAP_X86_GUEST_HLT:
>>>>  		r = 1;
>>>>  		break;
>>>>  	case KVM_CAP_ADJUST_CLOCK:
>>>> @@ -4068,6 +4069,10 @@ static int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
>>>>  			r = 0;
>>>>  		}
>>>>  		break;
>>>> +	case KVM_CAP_X86_GUEST_HLT:
>>>> +		kvm->arch.hlt_in_guest = true;
>>>> +		r = 0;
>>>> +		break;
>>>>  	default:
>>>>  		r = -EINVAL;
>>>>  		break;
>>>> diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
>>>> index ed8e150..b2066aa 100644
>>>> --- a/arch/x86/kvm/x86.h
>>>> +++ b/arch/x86/kvm/x86.h
>>>> @@ -266,4 +266,9 @@ static inline bool kvm_mwait_in_guest(struct kvm *kvm)
>>>>  	return kvm->arch.mwait_in_guest;
>>>>  }
>>>>  
>>>> +static inline bool kvm_hlt_in_guest(struct kvm *kvm)
>>>> +{
>>>> +	return kvm->arch.hlt_in_guest;
>>>> +}
>>>> +
>>>>  #endif
>>>> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
>>>> index 282d7613..ff8f266 100644
>>>> --- a/include/uapi/linux/kvm.h
>>>> +++ b/include/uapi/linux/kvm.h
>>>> @@ -932,6 +932,7 @@ struct kvm_ppc_resize_hpt {
>>>>  #define KVM_CAP_HYPERV_SYNIC2 148
>>>>  #define KVM_CAP_HYPERV_VP_INDEX 149
>>>>  #define KVM_CAP_S390_AIS_MIGRATION 150
>>>> +#define KVM_CAP_X86_GUEST_HLT 151
>>>>  
>>>>  #ifdef KVM_CAP_IRQ_ROUTING
>>>>  
>>>> -- 
>>>> 2.3.1.dirty
> 
> .
> 


-- 
Regards,
Longpeng(Mike)
