From: "Zhai, Edwin" <edwin.zhai@intel.com>
To: Avi Kivity <avi@redhat.com>
Cc: "kvm@vger.kernel.org" <kvm@vger.kernel.org>,
"Zhai, Edwin" <edwin.zhai@intel.com>
Subject: Re: [PATCH] [RESEND] KVM:VMX: Add support for Pause-Loop Exiting
Date: Mon, 28 Sep 2009 17:33:45 +0800
Message-ID: <4AC082F9.1060502@intel.com>
In-Reply-To: <4ABF2221.4000505@redhat.com>
[-- Attachment #1: Type: text/plain, Size: 2039 bytes --]
Avi Kivity wrote:
> +#define KVM_VMX_DEFAULT_PLE_GAP 41
> +#define KVM_VMX_DEFAULT_PLE_WINDOW 4096
> +static int __read_mostly ple_gap = KVM_VMX_DEFAULT_PLE_GAP;
> +module_param(ple_gap, int, S_IRUGO);
> +
> +static int __read_mostly ple_window = KVM_VMX_DEFAULT_PLE_WINDOW;
> +module_param(ple_window, int, S_IRUGO);
>
>
>
> Shouldn't be __read_mostly since they're read very rarely (__read_mostly
> should be for variables that are very often read, and rarely written).
>
In general they are read-only, except that an experienced user may try
different values for performance tuning.
> I'm not even sure they should be parameters.
>
Different spinlock implementations in different OSes, and different
workloads, need different values for tuning. It's similar to enable_ept.
>
>> /*
>> + * Indicate a busy-waiting vcpu in spinlock. We do not enable the PAUSE
>> + * exiting, so only get here on cpu with PAUSE-Loop-Exiting.
>> + */
>> +static int handle_pause(struct kvm_vcpu *vcpu,
>> + struct kvm_run *kvm_run)
>> +{
>> + ktime_t expires;
>> + skip_emulated_instruction(vcpu);
>> +
>> + /* Sleep for 1 msec, and hope lock-holder got scheduled */
>> + expires = ktime_add_ns(ktime_get(), 1000000UL);
>>
>>
>
> I think this should be much lower, 50-100us. Maybe this should be a
> parameter. With 1ms we losing significant cpu time if the congestion
> clears.
>
I have made it a parameter with default value of 100 us.
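For comparison, a rough user-space analogue of this bounded sleep (compute an absolute deadline, then sleep interruptibly until it) could look like the sketch below. It is only an illustration of the idea, not kernel code: deadline_after_us() and sleep_for_us() are hypothetical names that do not appear in the patch, and POSIX clock_nanosleep() stands in for schedule_hrtimeout().

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <errno.h>
#include <time.h>

#define NSEC_PER_USEC 1000L
#define NSEC_PER_SEC  1000000000L

/* Hypothetical helper: return now + us microseconds, with tv_nsec normalized. */
static struct timespec deadline_after_us(struct timespec now, unsigned int us)
{
	struct timespec d = now;

	/* The caller must pass an already normalized timestamp. */
	assert(now.tv_nsec >= 0 && now.tv_nsec < NSEC_PER_SEC);

	d.tv_sec  += us / 1000000U;
	d.tv_nsec += (long)(us % 1000000U) * NSEC_PER_USEC;
	if (d.tv_nsec >= NSEC_PER_SEC) {
		d.tv_sec  += 1;
		d.tv_nsec -= NSEC_PER_SEC;
	}
	return d;
}

/*
 * Hypothetical helper: sleep until an absolute deadline sleep_us from now.
 * Returns 0 on success or an errno value. As with the TASK_INTERRUPTIBLE
 * sleep in the patch, a signal may end the sleep early (EINTR here).
 */
static int sleep_for_us(unsigned int sleep_us)
{
	struct timespec now, deadline;

	if (clock_gettime(CLOCK_MONOTONIC, &now) != 0)
		return errno;

	deadline = deadline_after_us(now, sleep_us);
	return clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &deadline, NULL);
}
```

Calling sleep_for_us(100) corresponds to the 100 us default; a shorter value gives the CPU back sooner if the contention clears, at the cost of more wakeups.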
>
>> + set_current_state(TASK_INTERRUPTIBLE);
>> + schedule_hrtimeout(&expires, HRTIMER_MODE_ABS);
>> +
>>
>>
>
> Please add a tracepoint for this (since it can cause significant change
> in behaviour),
Isn't trace_kvm_exit(exit_reason, ...) enough? We can tell the PLE
vmexit from other vmexits.
> and move the logic to kvm_main.c. It will be reused by
> the AMD implementation, possibly my software spinlock detector,
> paravirtualized spinlocks, and hopefully other architectures.
>
Done.
>
>> + return 1;
>> +}
>> +
>> +/*
>>
>>
>
>
[-- Attachment #2: kvm_ple_hrtimer_v2.patch --]
[-- Type: application/octet-stream, Size: 7678 bytes --]
KVM:VMX: Add support for Pause-Loop Exiting
New NHM processors will support Pause-Loop Exiting by adding 2 VM-execution
control fields:
PLE_Gap    - upper bound on the amount of time between two successive
             executions of PAUSE in a loop.
PLE_Window - upper bound on the amount of time a guest is allowed to execute
             in a PAUSE loop.
If the time between this execution of PAUSE and the previous one exceeds
PLE_Gap, the processor considers this PAUSE to belong to a new loop.
Otherwise, the processor determines the total execution time of this loop
(since the first PAUSE in the loop), and triggers a VM exit if the total time
exceeds PLE_Window.
* Refer to SDM volume 3b, sections 21.6.13 & 22.1.3.
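The rule above can be sketched as a small software model. This is only an illustration of the gap/window logic as described here, not how the hardware is implemented: struct ple_model and its functions are hypothetical names, time is an abstract tick counter standing in for the TSC-rate counter, and resetting the state after an exit is a simplifying assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the state needed to apply the PLE rule. */
struct ple_model {
	uint32_t gap;     /* PLE_Gap, in ticks */
	uint32_t window;  /* PLE_Window, in ticks */
	uint64_t first;   /* time of the first PAUSE in the current loop */
	uint64_t last;    /* time of the most recent PAUSE */
	bool     seen;    /* false until a PAUSE has been observed */
};

static void ple_model_init(struct ple_model *m, uint32_t gap, uint32_t window)
{
	m->gap = gap;
	m->window = window;
	m->first = 0;
	m->last = 0;
	m->seen = false;
}

/* Feed one PAUSE executed at time 'now'; returns true if it would cause a VM exit. */
static bool ple_model_pause(struct ple_model *m, uint64_t now)
{
	bool exit_now = false;

	/* The model assumes time never goes backwards. */
	assert(!m->seen || now >= m->last);

	if (!m->seen || now - m->last > m->gap) {
		/* Gap exceeded (or first PAUSE): this PAUSE starts a new loop. */
		m->first = now;
		m->seen = true;
	} else if (now - m->first > m->window) {
		/* Same loop and it has run longer than the window. */
		exit_now = true;
		m->seen = false; /* assumption: next PAUSE starts a new loop */
	}
	m->last = now;
	return exit_now;
}

/*
 * Issue PAUSEs every 'step' ticks starting at time 0. Returns the 1-based
 * index of the PAUSE that triggers an exit, or -1 if none of the first
 * max_pauses does.
 */
static int ple_model_pauses_until_exit(uint32_t gap, uint32_t window,
				       uint64_t step, int max_pauses)
{
	struct ple_model m;
	int i;

	ple_model_init(&m, gap, window);
	for (i = 0; i < max_pauses; i++) {
		if (ple_model_pause(&m, (uint64_t)i * step))
			return i + 1;
	}
	return -1;
}
```

With gap 41 and window 4096, a PAUSE every 40 ticks makes this model report an exit on the 104th PAUSE, while a PAUSE every 42 ticks starts a new loop each time and never reports one.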
Pause-Loop Exiting can be used to detect Lock-Holder Preemption, where one VP
is scheduled out while holding a spinlock, and other VPs waiting for the same
lock are then scheduled in and waste CPU time spinning.
Our tests indicate that most spinlocks are held for less than 2^12 cycles.
Performance tests show that with 2X LP over-commitment we can get a +2% perf
improvement for kernel build (even more perf gain with more LPs).
Signed-off-by: Zhai Edwin <edwin.zhai@intel.com>
diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h
index 272514c..2b49454 100644
--- a/arch/x86/include/asm/vmx.h
+++ b/arch/x86/include/asm/vmx.h
@@ -56,6 +56,7 @@
#define SECONDARY_EXEC_ENABLE_VPID 0x00000020
#define SECONDARY_EXEC_WBINVD_EXITING 0x00000040
#define SECONDARY_EXEC_UNRESTRICTED_GUEST 0x00000080
+#define SECONDARY_EXEC_PAUSE_LOOP_EXITING 0x00000400
#define PIN_BASED_EXT_INTR_MASK 0x00000001
@@ -144,6 +145,8 @@ enum vmcs_field {
VM_ENTRY_INSTRUCTION_LEN = 0x0000401a,
TPR_THRESHOLD = 0x0000401c,
SECONDARY_VM_EXEC_CONTROL = 0x0000401e,
+ PLE_GAP = 0x00004020,
+ PLE_WINDOW = 0x00004022,
VM_INSTRUCTION_ERROR = 0x00004400,
VM_EXIT_REASON = 0x00004402,
VM_EXIT_INTR_INFO = 0x00004404,
@@ -248,6 +251,7 @@ enum vmcs_field {
#define EXIT_REASON_MSR_READ 31
#define EXIT_REASON_MSR_WRITE 32
#define EXIT_REASON_MWAIT_INSTRUCTION 36
+#define EXIT_REASON_PAUSE_INSTRUCTION 40
#define EXIT_REASON_MCE_DURING_VMENTRY 41
#define EXIT_REASON_TPR_BELOW_THRESHOLD 43
#define EXIT_REASON_APIC_ACCESS 44
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 3fe0d42..ed40386 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -61,6 +61,31 @@ module_param_named(unrestricted_guest,
static int __read_mostly emulate_invalid_guest_state = 0;
module_param(emulate_invalid_guest_state, bool, S_IRUGO);
+/*
+ * These 2 parameters are used to config the controls for Pause-Loop Exiting:
+ * ple_gap: upper bound on the amount of time between two successive
+ * executions of PAUSE in a loop. Also indicates whether PLE is enabled.
+ * According to tests, this time is usually smaller than 41 cycles.
+ * ple_window: upper bound on the amount of time a guest is allowed to execute
+ * in a PAUSE loop. Tests indicate that most spinlocks are held for
+ * less than 2^12 cycles.
+ * Time is measured based on a counter that runs at the same rate as the TSC,
+ * refer to SDM volume 3b, sections 21.6.13 & 22.1.3.
+ */
+#define KVM_VMX_DEFAULT_PLE_GAP 41
+#define KVM_VMX_DEFAULT_PLE_WINDOW 4096
+static int __read_mostly ple_gap = KVM_VMX_DEFAULT_PLE_GAP;
+module_param(ple_gap, int, S_IRUGO);
+
+static int __read_mostly ple_window = KVM_VMX_DEFAULT_PLE_WINDOW;
+module_param(ple_window, int, S_IRUGO);
+
+/*
+ * ple_sleep controls how long (in us) the VCPU sleeps upon a PLE vmexit
+ */
+static int __read_mostly ple_sleep = 100;
+module_param(ple_sleep, int, S_IRUGO);
+
struct vmcs {
u32 revision_id;
u32 abort;
@@ -319,6 +344,12 @@ static inline int cpu_has_vmx_unrestricted_guest(void)
SECONDARY_EXEC_UNRESTRICTED_GUEST;
}
+static inline int cpu_has_vmx_ple(void)
+{
+ return vmcs_config.cpu_based_2nd_exec_ctrl &
+ SECONDARY_EXEC_PAUSE_LOOP_EXITING;
+}
+
static inline int vm_need_virtualize_apic_accesses(struct kvm *kvm)
{
return flexpriority_enabled &&
@@ -1256,7 +1287,8 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf)
SECONDARY_EXEC_WBINVD_EXITING |
SECONDARY_EXEC_ENABLE_VPID |
SECONDARY_EXEC_ENABLE_EPT |
- SECONDARY_EXEC_UNRESTRICTED_GUEST;
+ SECONDARY_EXEC_UNRESTRICTED_GUEST |
+ SECONDARY_EXEC_PAUSE_LOOP_EXITING;
if (adjust_vmx_controls(min2, opt2,
MSR_IA32_VMX_PROCBASED_CTLS2,
&_cpu_based_2nd_exec_control) < 0)
@@ -1400,6 +1432,9 @@ static __init int hardware_setup(void)
if (enable_ept && !cpu_has_vmx_ept_2m_page())
kvm_disable_largepages();
+ if (!cpu_has_vmx_ple())
+ ple_gap = 0;
+
return alloc_kvm_area();
}
@@ -2312,9 +2347,16 @@ static int vmx_vcpu_setup(struct vcpu_vmx *vmx)
exec_control &= ~SECONDARY_EXEC_ENABLE_EPT;
if (!enable_unrestricted_guest)
exec_control &= ~SECONDARY_EXEC_UNRESTRICTED_GUEST;
+ if (!ple_gap)
+ exec_control &= ~SECONDARY_EXEC_PAUSE_LOOP_EXITING;
vmcs_write32(SECONDARY_VM_EXEC_CONTROL, exec_control);
}
+ if (ple_gap) {
+ vmcs_write32(PLE_GAP, ple_gap);
+ vmcs_write32(PLE_WINDOW, ple_window);
+ }
+
vmcs_write32(PAGE_FAULT_ERROR_CODE_MASK, !!bypass_guest_pf);
vmcs_write32(PAGE_FAULT_ERROR_CODE_MATCH, !!bypass_guest_pf);
vmcs_write32(CR3_TARGET_COUNT, 0); /* 22.2.1 */
@@ -3362,6 +3404,18 @@ out:
}
/*
+ * Indicate a busy-waiting vcpu in spinlock. We do not enable the PAUSE
+ * exiting, so only get here on cpu with PAUSE-Loop-Exiting.
+ */
+static int handle_pause(struct kvm_vcpu *vcpu)
+{
+ skip_emulated_instruction(vcpu);
+ kvm_vcpu_sleep(vcpu, ple_sleep);
+
+ return 1;
+}
+
+/*
* The exit handlers return 1 if the exit was handled fully and guest execution
* may resume. Otherwise they set the kvm_run parameter to indicate what needs
* to be done to userspace and return 0.
@@ -3397,6 +3451,7 @@ static int (*kvm_vmx_exit_handlers[])(struct kvm_vcpu *vcpu) = {
[EXIT_REASON_MCE_DURING_VMENTRY] = handle_machine_check,
[EXIT_REASON_EPT_VIOLATION] = handle_ept_violation,
[EXIT_REASON_EPT_MISCONFIG] = handle_ept_misconfig,
+ [EXIT_REASON_PAUSE_INSTRUCTION] = handle_pause,
};
static const int kvm_vmx_max_exit_handlers =
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 0bf9ee9..3723d62 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -287,6 +287,7 @@ int kvm_is_visible_gfn(struct kvm *kvm, gfn_t gfn);
void mark_page_dirty(struct kvm *kvm, gfn_t gfn);
void kvm_vcpu_block(struct kvm_vcpu *vcpu);
+void kvm_vcpu_sleep(struct kvm_vcpu *vcpu, unsigned int sleep_time);
void kvm_resched(struct kvm_vcpu *vcpu);
void kvm_load_guest_fpu(struct kvm_vcpu *vcpu);
void kvm_put_guest_fpu(struct kvm_vcpu *vcpu);
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index e27b7a9..ff006ce 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1095,6 +1095,17 @@ void kvm_resched(struct kvm_vcpu *vcpu)
}
EXPORT_SYMBOL_GPL(kvm_resched);
+void kvm_vcpu_sleep(struct kvm_vcpu *vcpu, unsigned int sleep_time)
+{
+ /* Sleep for the requested time (us) and hope the lock holder gets scheduled */
+ ktime_t expires;
+
+ expires = ktime_add_ns(ktime_get(), 1000UL * sleep_time);
+ set_current_state(TASK_INTERRUPTIBLE);
+ schedule_hrtimeout(&expires, HRTIMER_MODE_ABS);
+}
+EXPORT_SYMBOL_GPL(kvm_vcpu_sleep);
+
static int kvm_vcpu_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
{
struct kvm_vcpu *vcpu = vma->vm_file->private_data;