Tosatti,
See attached patch.

Avi,
Could you please check it in if there are no other comments?

Thanks,


Marcelo Tosatti wrote:
> On Wed, Sep 30, 2009 at 01:22:49PM -0300, Marcelo Tosatti wrote:
>
>> On Wed, Sep 30, 2009 at 09:01:51AM +0800, Zhai, Edwin wrote:
>>
>>> Avi,
>>> I modify it according your comments. The only thing I want to keep is
>>> the module param ple_gap/window. Although they are not per-guest, they
>>> can be used to find the right value, and disable PLE for debug purpose.
>>>
>>> Thanks,
>>>
>>>
>>> Avi Kivity wrote:
>>>
>>>> On 09/28/2009 11:33 AM, Zhai, Edwin wrote:
>>>>
>>>>> Avi Kivity wrote:
>>>>>
>>>>>> +#define KVM_VMX_DEFAULT_PLE_GAP    41
>>>>>> +#define KVM_VMX_DEFAULT_PLE_WINDOW 4096
>>>>>> +static int __read_mostly ple_gap = KVM_VMX_DEFAULT_PLE_GAP;
>>>>>> +module_param(ple_gap, int, S_IRUGO);
>>>>>> +
>>>>>> +static int __read_mostly ple_window = KVM_VMX_DEFAULT_PLE_WINDOW;
>>>>>> +module_param(ple_window, int, S_IRUGO);
>>>>>>
>>>>>> Shouldn't be __read_mostly since they're read very rarely
>>>>>> (__read_mostly should be for variables that are very often read,
>>>>>> and rarely written).
>>>>>>
>>>>> In general, they are read only except that experienced user may try
>>>>> different parameter for perf tuning.
>>>>>
>>>> __read_mostly doesn't just mean it's read mostly. It also means it's
>>>> read often. Otherwise it's just wasting space in hot cachelines.
>>>>
>>>>>> I'm not even sure they should be parameters.
>>>>>>
>>>>> For different spinlock in different OS, and for different workloads,
>>>>> we need different parameter for tuning. It's similar as the
>>>>> enable_ept.
>>>>>
>>>> No, global parameters don't work for tuning workloads and guests since
>>>> they cannot be modified on a per-guest basis. enable_ept is only
>>>> useful for debugging and testing.
>>>>
>>>>>>> +    set_current_state(TASK_INTERRUPTIBLE);
>>>>>>> +    schedule_hrtimeout(&expires, HRTIMER_MODE_ABS);
>>>>>>> +
>>>>>>>
>>>>>> Please add a tracepoint for this (since it can cause significant
>>>>>> change in behaviour),
>>>>>>
>>>>> Isn't trace_kvm_exit(exit_reason, ...) enough? We can tell the PLE
>>>>> vmexit from other vmexits.
>>>>>
>>>> Right. I thought of the software spinlock detector, but that's another
>>>> problem.
>>>>
>>>> I think you can drop the sleep_time parameter, it can be part of the
>>>> function. Also kvm_vcpu_sleep() is confusing, we also sleep on halt.
>>>> Please call it kvm_vcpu_on_spin() or something (since that's what the
>>>> guest is doing).
>>>>
>> kvm_vcpu_on_spin() should add the vcpu to vcpu->wq (so a new pending
>> interrupt wakes it up immediately).
>>
>
> Updated version (also please send it separately from the vmx.c patch):
>
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index 894a56e..43125dc 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -231,6 +231,7 @@ int kvm_is_visible_gfn(struct kvm *kvm, gfn_t gfn);
>  void mark_page_dirty(struct kvm *kvm, gfn_t gfn);
>
>  void kvm_vcpu_block(struct kvm_vcpu *vcpu);
> +void kvm_vcpu_on_spin(struct kvm_vcpu *vcpu);
>  void kvm_resched(struct kvm_vcpu *vcpu);
>  void kvm_load_guest_fpu(struct kvm_vcpu *vcpu);
>  void kvm_put_guest_fpu(struct kvm_vcpu *vcpu);
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 4d0dd39..e788d70 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -1479,6 +1479,21 @@ void kvm_resched(struct kvm_vcpu *vcpu)
>  }
>  EXPORT_SYMBOL_GPL(kvm_resched);
>
> +void kvm_vcpu_on_spin(struct kvm_vcpu *vcpu)
> +{
> +	ktime_t expires;
> +	DEFINE_WAIT(wait);
> +
> +	prepare_to_wait(&vcpu->wq, &wait, TASK_INTERRUPTIBLE);
> +
> +	/* Sleep for 100 us, and hope lock-holder got scheduled */
> +	expires = ktime_add_ns(ktime_get(), 100000UL);
> +	schedule_hrtimeout(&expires, HRTIMER_MODE_ABS);
> +
> +	finish_wait(&vcpu->wq, &wait);
> +}
> +EXPORT_SYMBOL_GPL(kvm_vcpu_on_spin);
> +
>  static int kvm_vcpu_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
>  {
>  	struct kvm_vcpu *vcpu = vma->vm_file->private_data;
>

-- 
best rgds,
edwin
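
The wait pattern in kvm_vcpu_on_spin() above (register on a wait queue, sleep for at most 100 us, return early if something wakes the queue) can be sketched in userspace. This is only an analogy under stated assumptions, not the kernel code: the spin_waiter type and its three functions are hypothetical names, a pipe stands in for vcpu->wq, and select() takes a relative timeout where the patch passes an absolute hrtimer expiry.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/select.h>
#include <unistd.h>

/* Hypothetical userspace stand-in for vcpu->wq: a pipe whose read end
 * becomes readable when somebody "kicks" the waiter. */
struct spin_waiter {
	int rfd;	/* read end, polled by the sleeper */
	int wfd;	/* write end, used by the waker */
};

/* Returns 0 on success, -1 if the pipe could not be created. */
int spin_waiter_init(struct spin_waiter *w)
{
	int fds[2];

	if (pipe(fds) != 0)
		return -1;
	w->rfd = fds[0];
	w->wfd = fds[1];
	return 0;
}

/* Analogue of a new pending interrupt waking the queue: queue one token. */
void spin_waiter_kick(struct spin_waiter *w)
{
	char token = 1;
	ssize_t n = write(w->wfd, &token, 1);

	assert(n == 1);
}

/* Sleep for at most timeout_us microseconds. Returns true if a kick
 * ended the sleep (one token is consumed), false on timeout or error. */
bool spin_waiter_sleep(struct spin_waiter *w, long timeout_us)
{
	fd_set readable;
	struct timeval tv = {
		.tv_sec = timeout_us / 1000000L,
		.tv_usec = timeout_us % 1000000L,
	};
	char token;

	FD_ZERO(&readable);
	FD_SET(w->rfd, &readable);

	/* select() returns 0 on timeout and -1 on error (e.g. EINTR). */
	if (select(w->rfd + 1, &readable, NULL, NULL, &tv) <= 0)
		return false;

	return read(w->rfd, &token, 1) == 1;
}
```

A caller would run spin_waiter_init() once, call spin_waiter_sleep(&w, 100) at the point where the patch sleeps, and call spin_waiter_kick(&w) from whatever delivers the event. A kick that arrives before the sleep is not lost, because the token stays in the pipe until the next sleep consumes it.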