From mboxrd@z Thu Jan 1 00:00:00 1970 From: Marcelo Tosatti Subject: Re: KVM: x86: reset lapic_timer.expired_tscdeadline at SET_LAPIC time Date: Mon, 20 Jun 2016 10:11:58 -0300 Message-ID: <20160620131157.GA7470@amt.cnet> References: <20160617234126.GA24514@amt.cnet> <6bc78368-d559-fa09-7e77-389b0e87d695@gmail.com> Mime-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: QUOTED-PRINTABLE Cc: kvm-devel , Paolo Bonzini To: Alan Jenkins Return-path: Received: from mx1.redhat.com ([209.132.183.28]:48428 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752277AbcFTNMs (ORCPT ); Mon, 20 Jun 2016 09:12:48 -0400 Content-Disposition: inline In-Reply-To: <6bc78368-d559-fa09-7e77-389b0e87d695@gmail.com> Sender: kvm-owner@vger.kernel.org List-ID: On Sat, Jun 18, 2016 at 02:43:51PM +0100, Alan Jenkins wrote: > On 18/06/2016, Marcelo Tosatti wrote: > > > >Alan Jenkins reports hang at > >https://bugzilla.redhat.com/show_bug.cgi?id=3D1337667. > > > >* APIC write: expiration =3D 1000. > >* LAPIC tsc deadline code sets timer to 1000-30. > >* Timer fires at 970. > >* Guest writes TSC=3Dw. > > > >Guest fails to VM-entry to process signal to perform > >"vmload" in userspace. > > > >Case 1: w > 970: > >Guest entry can be performed. > > > >Case 2: w < 970: > >Guest entry should not be performed because "An interrupt is generat= ed > >when the logical processor=E2=80=99s time-stamp counter equals or ex= ceeds the > >target value in the IA32_TSC_DEADLINE MSR." > > > >Setting of APIC test resets all APIC state, including > >any pending interrupts. > > > >Signed-off-by: Marcelo Tosatti > > > >diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c > >index ea306ad..89be6e9 100644 > >--- a/arch/x86/kvm/x86.c > >+++ b/arch/x86/kvm/x86.c > >@@ -2991,6 +2991,7 @@ static int kvm_vcpu_ioctl_set_lapic(struct kvm= _vcpu > >*vcpu, > > { > > kvm_apic_post_state_restore(vcpu, s); > > update_cr8_intercept(vcpu); > >+ vcpu->arch.apic->lapic_timer.expired_tscdeadline =3D 0; > > > > return 0; > > } >=20 > Hi >=20 > I'm happy to check my own test if this is considered a solution. > (And apologies for how my own efforts go... flailing around and not > always communicating effectively...) Your report had many details. > 1) I think the commit message could be improved. Specifically it > says the problem is due to TSC write, then that we need to fix LAPIC > reset. It doesn't say what the connection is between the two. Right. > (I.e. you would have to read the bug report, & then realize that it > _happens_ to involve a LAPIC state restore as well, because it's > triggered by reloading the entire VM state from a snapshot) >=20 > Doing so might make the intention of this commit clearer. I've been > trying to work out a high quality fix, which prevents the lockups > revealed by this case. I think this commit would fix my bug nicely, > but it doesn't help e.g. if the LAPIC restore was omitted from the > sequence. >=20 > (Both TSC and the value of TSC deadline are set by MSRs, not by > writing to the LAPIC. So I don't think the LAPIC state restore is > necessary to trigger the issue). >=20 >=20 > 2) If this approach is considered expedient, I have two comments - >=20 > i) isn't it better to set lapic_timer.expired_tscdeadline in > lapic.c, i.e. in apic_post_state_restore() ? >=20 > ii) then, wouldn't it be more correct to set it between > hrtimer_cancel() and start_apic_timer()? It doesn't sound right to > clear expired_tscdeadline, in case it was set immediately by > start_apic_timer() >=20 > Alan Advancing the timer expiration should only be necessary on guest initiated writes. Lets improve the commit message. Can you confirm the patch fixes your problem?