From mboxrd@z Thu Jan 1 00:00:00 1970 From: Alan Jenkins Subject: Re: KVM: x86: reset lapic_timer.expired_tscdeadline at SET_LAPIC time Date: Mon, 20 Jun 2016 16:22:33 +0100 Message-ID: <9407acfa-2923-6059-b11a-e0944263a248@gmail.com> References: <20160617234126.GA24514@amt.cnet> <6bc78368-d559-fa09-7e77-389b0e87d695@gmail.com> <20160620131157.GA7470@amt.cnet> Mime-Version: 1.0 Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 7bit Cc: kvm-devel , Paolo Bonzini To: Marcelo Tosatti Return-path: Received: from mail-wm0-f67.google.com ([74.125.82.67]:33282 "EHLO mail-wm0-f67.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753291AbcFTPgR (ORCPT ); Mon, 20 Jun 2016 11:36:17 -0400 Received: by mail-wm0-f67.google.com with SMTP id r201so15017230wme.0 for ; Mon, 20 Jun 2016 08:36:12 -0700 (PDT) In-Reply-To: <20160620131157.GA7470@amt.cnet> Sender: kvm-owner@vger.kernel.org List-ID: On 20/06/16 14:11, Marcelo Tosatti wrote: > On Sat, Jun 18, 2016 at 02:43:51PM +0100, Alan Jenkins wrote: >> On 18/06/2016, Marcelo Tosatti wrote: >>> I'm happy to check my own test if this is considered a solution. >>> (And apologies for how my own efforts go... flailing around and not >>> always communicating effectively...) > Your report had many details. Good, let's hope I will not report too many irrelevant details ("bury the lede") :). > >> 1) I think the commit message could be improved. Specifically it >> says the problem is due to TSC write, then that we need to fix LAPIC >> reset. It doesn't say what the connection is between the two. > Right. > > >> (I.e. you would have to read the bug report, & then realize that it >> _happens_ to involve a LAPIC state restore as well, because it's >> triggered by reloading the entire VM state from a snapshot) >> >> Doing so might make the intention of this commit clearer. I've been >> trying to work out a high quality fix, which prevents the lockups >> revealed by this case. I think this commit would fix my bug nicely, >> but it doesn't help e.g. if the LAPIC restore was omitted from the >> sequence. >> >> (Both TSC and the value of TSC deadline are set by MSRs, not by >> writing to the LAPIC. So I don't think the LAPIC state restore is >> necessary to trigger the issue). >> >> >> 2) If this approach is considered expedient, I have two comments - >> >> i) isn't it better to set lapic_timer.expired_tscdeadline in >> lapic.c, i.e. in apic_post_state_restore() ? >> >> ii) then, wouldn't it be more correct to set it between >> hrtimer_cancel() and start_apic_timer()? It doesn't sound right to >> clear expired_tscdeadline, in case it was set immediately by >> start_apic_timer() >> >> Alan > Advancing the timer expiration should only be necessary on > guest initiated writes. Lets improve the commit message. The timer expiration is still advanced, this patch just avoids the busywait. I think you mean it's ok if the guest would be interrupted early after a host-initiated write. I think it'd be fine in practice; even interrupting before observed TSC reaches the TSC deadline isn't a problem in itself. When restoring CPU state, qemu will already have burnt the ~10ns advance in terms of *real* time. And if there actually were external observers, they would already be confused by the guest being rewound. I just can't code something like that without leaving a comment. > Can you confirm the patch fixes your problem? That did it, yes.