From mboxrd@z Thu Jan 1 00:00:00 1970
From: Zachary Amsden
Subject: Re: [PATCH] fix kvmclock bug
Date: Mon, 27 Sep 2010 09:00:48 -1000
Message-ID: <4CA0E9E0.7020209@redhat.com>
References: <4C95560D.3050108@redhat.com> <4C9C5309.5080403@web.de> <4C9F1849.1050801@web.de>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Cc: Marcelo Tosatti, Avi Kivity, kvm, Glauber Costa
To: Jan Kiszka
Return-path:
Received: from mx1.redhat.com ([209.132.183.28]:49443 "EHLO mx1.redhat.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1757226Ab0I0TAv (ORCPT ); Mon, 27 Sep 2010 15:00:51 -0400
In-Reply-To: <4C9F1849.1050801@web.de>
Sender: kvm-owner@vger.kernel.org
List-ID:

On 09/25/2010 11:54 PM, Jan Kiszka wrote:
> Am 24.09.2010 09:28, Jan Kiszka wrote:
>
>> Am 19.09.2010 02:15, Zachary Amsden wrote:
>>
>>> For CPUs with unstable TSC, we null time offset between not just VCPU
>>> switches, but all preemptions of the kvm thread. This makes a bug much
>>> more likely where the kvmclock values are updated before a successful
>>> exit from virt, causing an underflow.
>>>
>>> The null offsetting was added at : bf0fb4a42ba7eb362f4013bd2e93209666793e66
>>> The underflow happens with this additional patch :
>>> cf839f5da2b0779b9ec8b990f851fb4e7d681da0
>>>
>>> There is a secondary bug, which is that TSC fails to advance with real
>>> time on unstable TSC, but the fix is much more involved (it requires the
>>> TSC catchup code).
>>>
>>> For now, this patch is sufficient to get things working again for me.
>>>
>> ...but not for me. I still face stuck (or infinitely slow) guests that
>> want to use kvmclock once tsc_unstable gets set. Or is this patch
>> addressing a different issue?
>>
> Commit bfb3f332 ("TSC catchup mode") in kvm.git finally resolves the
> issue here.
>

Good to hear that!

> That only leaves us with the likely wrong unstable declaration of the
> TSC after resume.
> And that raises the question for me if KVM is actually
> that much smarter than the Linux kernel in detecting TSC jumps. If
> something is missing, can't we improve the kernel's detection mechanism
> which already has suspend/resume support?
>

Linux must make the conservative choice about declaring the TSC
unstable; if it is possible that it has become unstable, it is unstable.
Unfortunately, this does not bode well for us, as most of the finer
points of accuracy depend on having a stable TSC.

There are a number of places that declare the TSC unstable, and where in
the suspend / resume cycle that happens depends on your actual hardware.

Zach