From mboxrd@z Thu Jan 1 00:00:00 1970 From: Avi Kivity Subject: Re: [PATCH] [RESEND] KVM:VMX: Add support for Pause-Loop Exiting Date: Sun, 27 Sep 2009 10:28:17 +0200 Message-ID: <4ABF2221.4000505@redhat.com> References: <4ABA2AD7.6080008@intel.com> <4ABA2C22.7020000@redhat.com> <4ABC18C6.9070202@intel.com> Mime-Version: 1.0 Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit Cc: Ingo Molnar , "kvm@vger.kernel.org" To: "Zhai, Edwin" Return-path: Received: from mx1.redhat.com ([209.132.183.28]:31503 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753439AbZI0I21 (ORCPT ); Sun, 27 Sep 2009 04:28:27 -0400 In-Reply-To: <4ABC18C6.9070202@intel.com> Sender: kvm-owner@vger.kernel.org List-ID: On 09/25/2009 04:11 AM, Zhai, Edwin wrote: > Avi, > > hrtimer is used for sleep in attached patch, which have similar perf > gain with previous one. Maybe we can check in this patch first, and > turn to direct yield in future, as you suggested. > > +/* > + * These 2 parameters are used to config the controls for Pause-Loop Exiting: > + * ple_gap: upper bound on the amount of time between two successive > + * executions of PAUSE in a loop. Also indicate if ple enabled. > + * According to test, this time is usually small than 41 cycles. > + * ple_window: upper bound on the amount of time a guest is allowed to execute > + * in a PAUSE loop. Tests indicate that most spinlocks are held for > + * less than 2^12 cycles > + * Time is measured based on a counter that runs at the same rate as the TSC, > + * refer SDM volume 3b section 21.6.13& 22.1.3. > + */ > +#define KVM_VMX_DEFAULT_PLE_GAP 41 > +#define KVM_VMX_DEFAULT_PLE_WINDOW 4096 > +static int __read_mostly ple_gap = KVM_VMX_DEFAULT_PLE_GAP; > +module_param(ple_gap, int, S_IRUGO); > + > +static int __read_mostly ple_window = KVM_VMX_DEFAULT_PLE_WINDOW; > +module_param(ple_window, int, S_IRUGO); > Shouldn't be __read_mostly since they're read very rarely (__read_mostly should be for variables that are very often read, and rarely written). I'm not even sure they should be parameters. > /* > + * Indicate a busy-waiting vcpu in spinlock. We do not enable the PAUSE > + * exiting, so only get here on cpu with PAUSE-Loop-Exiting. > + */ > +static int handle_pause(struct kvm_vcpu *vcpu, > + struct kvm_run *kvm_run) > +{ > + ktime_t expires; > + skip_emulated_instruction(vcpu); > + > + /* Sleep for 1 msec, and hope lock-holder got scheduled */ > + expires = ktime_add_ns(ktime_get(), 1000000UL); > I think this should be much lower, 50-100us. Maybe this should be a parameter. With 1ms we losing significant cpu time if the congestion clears. > + set_current_state(TASK_INTERRUPTIBLE); > + schedule_hrtimeout(&expires, HRTIMER_MODE_ABS); > + > Please add a tracepoint for this (since it can cause significant change in behaviour), and move the logic to kvm_main.c. It will be reused by the AMD implementation, possibly my software spinlock detector, paravirtualized spinlocks, and hopefully other architectures. > + return 1; > +} > + > +/* > -- Do not meddle in the internals of kernels, for they are subtle and quick to panic.