From mboxrd@z Thu Jan 1 00:00:00 1970 From: Alexander Graf Subject: Re: [PATCH 2/2 v2] KVM: PPC: booke: Add watchdog emulation Date: Tue, 17 Jul 2012 14:51:56 +0200 Message-ID: <50055FEC.4020602@suse.de> References: <1341830087-12728-1-git-send-email-Bharat.Bhushan@freescale.com> <50044CF1.8070806@suse.de> <5004B995.8060303@freescale.com> <25F0D173-5EFA-4516-B478-F5E220E502CE@suse.de> <6A3DF150A5B70D4F9B66A25E3F7C888D03DCB08D@039-SN2MPN1-023.039d.mgd.msft.net> Mime-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: Wood Scott-B07421 , "" , "" , "" , Benjamin Herrenschmidt , Kumar Gala , Avi Kivity To: Bhushan Bharat-R65777 Return-path: In-Reply-To: <6A3DF150A5B70D4F9B66A25E3F7C888D03DCB08D@039-SN2MPN1-023.039d.mgd.msft.net> Sender: kvm-ppc-owner@vger.kernel.org List-Id: kvm.vger.kernel.org On 07/17/2012 11:57 AM, Bhushan Bharat-R65777 wrote: > >> -----Original Message----- >> From: kvm-ppc-owner@vger.kernel.org [mailto:kvm-ppc-owner@vger.kernel.org] On >> Behalf Of Alexander Graf >> Sent: Tuesday, July 17, 2012 12:50 PM >> To: Wood Scott-B07421 >> Cc: Bhushan Bharat-R65777; ; ; >> ; Bhushan Bharat-R65777; Benjamin Herrenschmidt; Kumar >> Gala >> Subject: Re: [PATCH 2/2 v2] KVM: PPC: booke: Add watchdog emulation >> >> >> >> On 17.07.2012, at 03:02, Scott Wood wrote: >> >>> On 07/16/2012 12:18 PM, Alexander Graf wrote: >>>>> +/* >>>>> + * Return the number of jiffies until the next timeout. If the >>>> timeout is >>>>> + * longer than the NEXT_TIMER_MAX_DELTA, that >>>> then? >>>> >>>>> return NEXT_TIMER_MAX_DELTA >>>>> + * instead. >>>> I can read code. >>> Come on, it's not exactly x++; /* add one to x */ >>> >>> It's faster to read code (as well as know the constraints within which >>> you can modify it without having to spend a lot of time digesting all >>> the callers' use cases) when you have a high level description of its >>> interface contract, and can be selective about when to zoom in to the >>> details. Linux kernel code tends to be bad about this. >> Yeah, not opposed to leave that part in :). >> >>>> The important piece of information in the comment is >>>> missing: The reason. >>> The reason for what? Why you want to know the next timeout? That's >>> the caller's business. Or why we use NEXT_TIMER_MAX_DELTA as the limit? >> Why we use the limit. IIRC it was explained in the last thread, just didn't make >> its way into the comment. > Earlier we have a comment on the #define MAX_TIMEOUT (new define added for a purpose, so the comment described the puspose). > Now we uses the generic #define NEX_TIMER_MAX_DELTA (include/linux/timer.h), so removed the comment. Ah, ok. Just saying, if you comment on some mechanism, like you did here, please also include the reasoning behind it. For example Do foo if x is true. isn't particularly helpful. However Do foo if x is true because the bar API will break with high values is very helpful. It includes the action and reason of the code :). Alternatively, to me the same as above would be /* bar API will break with high values */ if (x) do(foo) because in this case the code is the action description. Either variant works fine for me. > >>>>> +void kvmppc_watchdog_func(unsigned long data) { >>>>> + struct kvm_vcpu *vcpu = (struct kvm_vcpu *)data; >>>>> + u32 tsr, new_tsr; >>>>> + int final; >>>>> + >>>>> + do { >>>>> + new_tsr = tsr = vcpu->arch.tsr; >>>>> + final = 0; >>>>> + >>>>> + /* Time out event */ >>>>> + if (tsr & TSR_ENW) { >>>>> + if (tsr & TSR_WIS) { >>>>> + new_tsr = (tsr & ~TCR_WRC_MASK) | >>>>> + (vcpu->arch.tcr & TCR_WRC_MASK); >>>>> + vcpu->arch.tcr &= ~TCR_WRC_MASK; >>>> Can't we just poke the vcpu to exit the VM and do the above on its own? >>> We've discussed this before. TSR updates are done via atomics, and we >>> send a request for the vcpu to act on the result. This is how the >>> decrementer works. >>> >>> http://www.spinics.net/lists/kvm-ppc/msg03169.html >> Yeah, the major difference to the dec is the atomicity of the whole thing. Dec >> changes one bit to enable the interrupt line. The final expiration is more >> complex. > Is not setting the TSR.WRS atomic here (cmpxchg() will handle this)? Final expiration sets TCR. TSR should be ok. > >>>> This is the watdog expired case, right? >>> Final expiration, yes. >>> >>>> I'd also prefer to have an >>>> explicit event for the expiry than a special TSR check in the main loop. >>> So check TSR[WRS] in update_timer_ints(), and have it queue a >>> pseudoexception? >> Or here. > Do we mean define a sudo IROPRIO for final expiry. We can also define an event that is sent through kvm_make_request. But yeah, IRQPRIO is probably easier. Not 100% sure which way is better though. Avi, any preferences? > >>> That would eliminate the need to change the runnable function. >>> >>>> Also call me sceptic on the reset of tcr. If our user space watchdog >>>> event is "write a message", then we essentially want to hide the fact >>>> that the watchdog expired from the guest, right? In that case, the >>>> second time-out wouldn't do anything guest visible. >>> This was probably copied straight out of the hardware documentation, >>> which explicitly says TCR[WRC] gets set to zero on final expiration >>> (as part of reset). We should leave that part up to userspace. It >>> definitely shouldn't be done inside the cmpxchg loop (or from >>> interrupt context -- only TSR gets the atomic treatment). I don't >>> think the read of TCR outside vcpu context is a problem, though. >> Yeah, but it'd just make me less wary if only the vcpu thread itself accesses >> vcpu internal registers that aren't irq state and thus designed for it (TSR). >> >> But yes, the most flexible way would probably be to do it from user space. Since >> it'd happen from within the vcpu context of user space, we can also guarantee >> that the TCR access is atomic. > Yes, will move the tcr.wrc clearing to userspace. > >>>>> int kvm_arch_vcpu_runnable(struct kvm_vcpu *v) { >>>>> - return !(v->arch.shared->msr & MSR_WE) || >>>>> - !!(v->arch.pending_exceptions) || >>>>> - v->requests; >>>>> + bool ret = !(v->arch.shared->msr & MSR_WE) || >>>>> + !!(v->arch.pending_exceptions) || >>>>> + v->requests; >>>>> + >>>>> + ret = ret || kvmppc_get_tsr_wrc(v); >>>> Why do you need to declare the cpu as non-runnable when a watchdog >>>> event occured? >>> It's the other way around -- it's always runnable when a watchdog exit >>> is pending. It's like a pending exception. >> Ah, so yes, we should just shove it into pending_exceptions then. > Pending_exception? You mean sudo again here as said earlier. pseudo :). Yeah, I'm referring to above. No need to check 500 different conditions when we already have a bitmap that says "event is pending". Alex