From mboxrd@z Thu Jan 1 00:00:00 1970 From: Marcelo Tosatti Subject: Re: kvmclock doesn't work, help? Date: Thu, 10 Dec 2015 19:32:13 -0200 Message-ID: <20151210213212.GA4836@amt.cnet> References: Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Cc: kvm list , Radim Krcmar , Paolo Bonzini , X86 ML To: Andy Lutomirski Return-path: Received: from mx1.redhat.com ([209.132.183.28]:49646 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753297AbbLKVTn (ORCPT ); Fri, 11 Dec 2015 16:19:43 -0500 Content-Disposition: inline In-Reply-To: Sender: kvm-owner@vger.kernel.org List-ID: On Wed, Dec 09, 2015 at 01:10:59PM -0800, Andy Lutomirski wrote: > I'm trying to clean up kvmclock and I can't get it to work at all. My > host is 4.4.0-rc3-ish on a Skylake laptop that has a working TSC. > > If I boot an SMP (2 vcpus) guest, tracing says: > > qemu-system-x86-2517 [001] 102242.610654: kvm_update_master_clock: > masterclock 0 hostclock tsc offsetmatched 0 > qemu-system-x86-2521 [000] 102242.613742: kvm_track_tsc: > vcpu_id 0 masterclock 0 offsetmatched 0 nr_online 1 hostclock tsc > qemu-system-x86-2522 [000] 102242.622959: kvm_track_tsc: > vcpu_id 1 masterclock 0 offsetmatched 1 nr_online 2 hostclock tsc > qemu-system-x86-2521 [000] 102242.645123: kvm_track_tsc: > vcpu_id 0 masterclock 0 offsetmatched 1 nr_online 2 hostclock tsc > qemu-system-x86-2522 [000] 102242.647291: kvm_track_tsc: > vcpu_id 1 masterclock 0 offsetmatched 1 nr_online 2 hostclock tsc > qemu-system-x86-2521 [000] 102242.653369: kvm_track_tsc: > vcpu_id 0 masterclock 0 offsetmatched 1 nr_online 2 hostclock tsc > qemu-system-x86-2522 [000] 102242.653429: kvm_track_tsc: > vcpu_id 1 masterclock 0 offsetmatched 1 nr_online 2 hostclock tsc > qemu-system-x86-2517 [001] 102242.653447: kvm_update_master_clock: > masterclock 0 hostclock tsc offsetmatched 1 > qemu-system-x86-2521 [000] 102242.653657: kvm_update_master_clock: > masterclock 0 hostclock tsc offsetmatched 1 > qemu-system-x86-2522 [002] 102242.664448: kvm_update_master_clock: > masterclock 0 hostclock tsc offsetmatched 1 > > > If I boot a UP guest, tracing says: > > qemu-system-x86-2567 [001] 102370.447484: kvm_update_master_clock: > masterclock 0 hostclock tsc offsetmatched 1 > qemu-system-x86-2571 [002] 102370.447688: kvm_update_master_clock: > masterclock 0 hostclock tsc offsetmatched 1 > > I suspect, but I haven't verified, that this is fallout from: > > commit 16a9602158861687c78b6de6dc6a79e6e8a9136f > Author: Marcelo Tosatti > Date: Wed May 14 12:43:24 2014 -0300 > > KVM: x86: disable master clock if TSC is reset during suspend > > Updating system_time from the kernel clock once master clock > has been enabled can result in time backwards event, in case > kernel clock frequency is lower than TSC frequency. > > Disable master clock in case it is necessary to update it > from the resume path. > > Signed-off-by: Marcelo Tosatti > Signed-off-by: Paolo Bonzini > > > Can we please stop making kvmclock more complex? It's a beast right > now, and not in a good way. It's far too tangled with the vclock > machinery on both the host and guest sides, the pvclock stuff is not > well thought out (even in principle in an ABI sense), and it's never > been clear to my what problem exactly the kvmclock stuff is supposed > to solve. > > I'm somewhat tempted to suggest that we delete kvmclock entirely and > start over. A correctly functioning KVM guest using TSC (i.e. > ignoring kvmclock entirely) > seems to work rather more reliably and > considerably faster than a kvmclock guest. > > --Andy > > -- > Andy Lutomirski > AMA Capital Management, LLC Andy, I am all for solving practical problems rather than pleasing aesthetic pleasure. > Updating system_time from the kernel clock once master clock > has been enabled can result in time backwards event, in case > kernel clock frequency is lower than TSC frequency. > > Disable master clock in case it is necessary to update it > from the resume path. > once master clock > has been enabled can result in time backwards event, in case > kernel clock frequency is lower than TSC frequency. guest visible clock = tsc_timestamp (updated at time 0) + scaled tsc reads. If the effective frequency of the kernel clock is lower (for example due to NTP correcting the TSC frequency of the system), and you resume and update the system, the following happens: guest visible clock = tsc_timestamp (updated at time 0) + scaled tsc reads=LARGE VALUE. suspend/resume event. guest visible clock = tsc_timestamp (updated at time N) + scaled tsc reads=0.