From: Zachary Amsden
Subject: Re: [PATCH] KVM: x86: Convert tsc_write_lock to raw_spinlock
Date: Mon, 07 Feb 2011 10:15:43 -0500
Message-ID: <4D500C9F.2080501@redhat.com>
In-Reply-To: <4D5008F0.5060200@siemens.com>
To: Jan Kiszka
Cc: Avi Kivity, Marcelo Tosatti, kvm, Linux Kernel Mailing List

On 02/07/2011 10:00 AM, Jan Kiszka wrote:
> On 2011-02-07 15:11, Zachary Amsden wrote:
>> On 02/07/2011 06:35 AM, Jan Kiszka wrote:
>>> On 2011-02-04 22:03, Zachary Amsden wrote:
>>>> On 02/04/2011 04:49 AM, Jan Kiszka wrote:
>>>>> Code under this lock requires non-preemptibility. Ensure this also over
>>>>> -rt by converting it to raw spinlock.
>>>>
>>>> Oh dear, I had forgotten about that. I believe kvm_lock might have the
>>>> same assumption in a few places regarding clock.
>>>
>>> I only found a problematic section in kvmclock_cpufreq_notifier. I didn't
>>> see this during my tests as I have CPUFREQ disabled in my .config.
>>>
>>> We may need something like this, as converting kvm_lock would likely be
>>> overkill:
>>>
>>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>>> index 36f54fb..971ee0d 100644
>>> --- a/arch/x86/kvm/x86.c
>>> +++ b/arch/x86/kvm/x86.c
>>> @@ -4530,7 +4530,7 @@ static int kvmclock_cpufreq_notifier(struct notifier_block *nb, unsigned long va
>>>  	struct cpufreq_freqs *freq = data;
>>>  	struct kvm *kvm;
>>>  	struct kvm_vcpu *vcpu;
>>> -	int i, send_ipi = 0;
>>> +	int i, me, send_ipi = 0;
>>>
>>>  	/*
>>>  	 * We allow guests to temporarily run on slowing clocks,
>>> @@ -4583,9 +4583,11 @@ static int kvmclock_cpufreq_notifier(struct notifier_block *nb, unsigned long va
>>>  		kvm_for_each_vcpu(i, vcpu, kvm) {
>>>  			if (vcpu->cpu != freq->cpu)
>>>  				continue;
>>> +			me = get_cpu();
>>>  			kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu);
>>> -			if (vcpu->cpu != smp_processor_id())
>>> +			if (vcpu->cpu != me)
>>>  				send_ipi = 1;
>>> +			put_cpu();
>>>  		}
>>>  	}
>>>  	spin_unlock(&kvm_lock);
>>>
>>> Jan
>>
>> That looks like a good solution, and I do believe that is the only place
>> the lock is used in that fashion. Please add a comment, though, in the
>> giant comment block above noting that preemption protection is needed for
>> RT. Also, gcc should catch this, but moving the me variable into the
>> kvm_for_each_vcpu loop should allow for better register allocation.
>>
>> The only other thing I can think of is that RT lock preemption may break
>> some of the CPU initialization semantics enforced by kvm_lock if you
>> happen to get a hotplug event just as the module is loading. That should
>> be rare, but if it is indeed a bug, it would be nice to fix; failing to
>> initialize VMX would certainly mean a panic.
>
> Hmm, is a CPU hotplug notifier allowed to run code that sleeps? I can't
> imagine it is. So we already have a strong reason to convert kvm_lock to
> a raw_spinlock, which obsoletes the above workaround.

I don't know whether it is allowed to sleep; it doesn't call any sleeping
functions to my knowledge. What worries me in the RT case is that the
task holding the spinlock taken around hardware_enable might be preempted
and migrated to another CPU, which obviously isn't what you want.

Zach
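
(For concreteness, a sketch of how the loop would read with the me
variable scoped inside it, per the suggestion above; this is illustrative
against the quoted patch, not a tested change:)

	kvm_for_each_vcpu(i, vcpu, kvm) {
		int me;	/* per-iteration scope, easier for gcc to keep in a register */

		if (vcpu->cpu != freq->cpu)
			continue;
		me = get_cpu();	/* disables preemption and pins us to this CPU */
		kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu);
		if (vcpu->cpu != me)
			send_ipi = 1;
		put_cpu();	/* re-enables preemption */
	}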
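
(And the kvm_lock conversion Jan refers to would follow the usual
spinlock_t to raw_spinlock_t pattern; again a sketch, not the actual
patch:)

	-static DEFINE_SPINLOCK(kvm_lock);
	+static DEFINE_RAW_SPINLOCK(kvm_lock);

with each spin_lock(&kvm_lock)/spin_unlock(&kvm_lock) pair becoming
raw_spin_lock()/raw_spin_unlock(). Under PREEMPT_RT a raw_spinlock_t
remains a true spinning lock rather than being turned into a sleeping
lock, so its critical sections stay non-preemptible.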