From: Alexander Graf
Subject: Re: [PATCH 2/3] Improve DEC handling
Date: Mon, 21 Dec 2009 19:19:10 +0100
Message-ID: <4B2FBC1E.6020206@suse.de>
To: Hollis Blanchard

Hollis Blanchard wrote:
> On Mon, Dec 21, 2009 at 6:22 AM, Alexander Graf wrote:
>
>> We treated the DEC interrupt like an edge based one. This is not true for
>> Book3s. The DEC keeps firing until mtdec is issued again and thus clears
>> the interrupt line.
>>
>
> That's not quite right. The decrementer keeps firing until the top bit
> is cleared, i.e. with mtdec. However, not *every* mtdec clears it.
>

Right, that's why we fire off a DEC interrupt whenever we mtdec with the top bit set.

> (Also, I'm pretty sure this varies between Book 3S implementations,
> e.g. 970 behaves differently than POWERn. I don't remember specific
> values of though, and I could be misremembering...)
>

IIRC only the embedded cores were different. But I could be wrong. How do I find out?

> So is this the failure mode?
> - a decrementer interrupt is delivered
> - guest does *not* issue mtdec to clear it (ppc64's lazy interrupt disabling?)
> - guest expects a second decrementer interrupt, but KVM doesn't deliver one
>
> In that case, it seems like the real fix would be something like this:
>
> void kvmppc_emulate_dec(struct kvm_vcpu *vcpu)
> {
>         unsigned long dec_nsec;
>
>         pr_debug("mtDEC: %x\n", vcpu->arch.dec);
> #ifdef CONFIG_PPC64
>         /* POWER4+ triggers a dec interrupt if the value is < 0 */
>         if (vcpu->arch.dec & 0x80000000) {
>                 hrtimer_try_to_cancel(&vcpu->arch.dec_timer);
>                 kvmppc_core_queue_dec(vcpu);
> +               /* keep queuing interrupts until guest clears high MSR bit */
> +               hrtimer_start(&vcpu->arch.dec_timer, ktime_set(0, 100),
> +                             HRTIMER_MODE_REL);
>

This code path is only triggered when the guest mtdecs with a negative value.

But I understand what you're trying to suggest, and I think it's a bad idea. We don't want to poll the guest for interrupt enablement. On a real CPU the DEC interrupt stays asserted for as long as the DEC register is negative, and that's exactly what this patch implements, no? That way we are automatically event-based and everyone's happy.

Alex