References: <20161104094322.GA16930@amt.cnet> <20161104165933.GA3027@amt.cnet> <20161107154610.GG2054@work-vm> <20161107194058.GB28327@amt.cnet> <20161107200349.GC1155@work-vm> <20161108000609.GA3689@amt.cnet> <20161108102255.GC2042@work-vm> <4c34da7d-7027-5595-012a-61ab6937f8e3@redhat.com> <20161109162847.GF7738@work-vm>
From: Paolo Bonzini
Message-ID: <98ffcbc5-fdb9-0937-9ea5-0f5f9ef4dbf4@redhat.com>
Date: Wed, 9 Nov 2016 17:33:09 +0100
In-Reply-To: <20161109162847.GF7738@work-vm>
Subject: Re: [Qemu-devel] [QEMU PATCH v2] kvmclock: advance clock by time window between vm_stop and pre_save
To: "Dr. David Alan Gilbert"
Cc: Marcelo Tosatti, kvm@vger.kernel.org, qemu-devel, Radim Krčmář, Juan Quintela, Eduardo Habkost

On 09/11/2016 17:28, Dr. David Alan Gilbert wrote:
> * Paolo Bonzini (pbonzini@redhat.com) wrote:
>>
>>
>> On 08/11/2016 11:22, Dr. David Alan Gilbert wrote:
>>> * Marcelo Tosatti (mtosatti@redhat.com) wrote:
>>>> On Mon, Nov 07, 2016 at 08:03:50PM +0000, Dr. David Alan Gilbert wrote:
>>>>> * Marcelo Tosatti (mtosatti@redhat.com) wrote:
>>>>>> On Mon, Nov 07, 2016 at 03:46:11PM +0000, Dr. David Alan Gilbert wrote:
>>>>>>> * Marcelo Tosatti (mtosatti@redhat.com) wrote:
>>>>>>>> This patch, relative to pre-copy migration codepath,
>>>>>>>> measures the time between vm_stop() and pre_save(),
>>>>>>>> which includes copying the remaining RAM to destination,
>>>>>>>> and advances the clock by that amount.
>>>>>>>>
>>>>>>>> In a VM with 5 seconds downtime, this reduces the guest
>>>>>>>> clock difference on destination from 5s to 0.2s.
>>>>>>>>
>>>>>>>> Tested with Linux and Windows 2012 R2 guests with -cpu XXX,+hv-time.
>>>>>>>
>>>>>>> One thing that bothers me is that it's only this clock that's
>>>>>>> getting corrected; doesn't it cause things to get upset when
>>>>>>> one clock moves and the others dont?
>>>>>>
>>>>>> If you are correlating the clocks, then yes.
>>>>>>
>>>>>> Older Linux guests get upset (marking the TSC clocksource unstable
>>>>>> because the watchdog checks TSC vs kvmclock), but there is a workaround for it
>>>>>> in newer guests
>>>>>> (kvmclock interface to notify watchdog to not complain).
>>>>>>
>>>>>> Note marking TSC clocksource unstable on older guests is harmless
>>>>>> because kvmclock is the standard clocksource.
>>>>>>
>>>>>> For Windows guests, i don't know that Windows correlates between different
>>>>>> clocks.
>>>>>>
>>>>>> That is, there is relative control as to which software reads kvmclock
>>>>>> or Windows TIMER MSR, so i don't see the need to advance every clock
>>>>>> exposed.
>>>>>>
>>>>>>> Shouldn't the pause delay be recorded somewhere architecturally
>>>>>>> independent and then be a thing that kvm-clock happens to use and
>>>>>>> other clocks might as well?
>>>>>>
>>>>>> In theory, yes. In practice, i don't see the need for this...
>>>>>
>>>>> It seems unlikely to me that x86 is the only one that will want
>>>>> to do something similar.
>>>>
>>>> Can't they copy what kvmclock is doing today?
>>>
>>> We shouldn't have copies of code all over should we?
>>
>> Let's cross the bridge when we get there.
>
> That will mean it has the migration data in the wrong place
> and any other clocks that need to be incremented by the same offset
> will need a hook or be inconsistent with this calculation.

No, there is no additional migration data that is needed. This is just
a bug in how the pausing of CLOCK_MONOTONIC was implemented for the
kvmclock clocksource.

Right now, x86 is the only case where we have the problem, and x86 is
using a single "backend" for both kvmclock and the Hyper-V TSC reference
page.

For everyone else, there is no clocksource paravirtualization going on
(luckily, considering what a mess kvmclock is). They can just use
QEMU_CLOCK_VIRTUAL if they want something that pauses while the VM is
stopped.

Now, QEMU_CLOCK_VIRTUAL actually has the same bug that Marcelo is
fixing, so we may indeed want a common solution if possible. But again,
let's first see what the code looks like for _one_ clocksource before
writing a generalized (and thus more complex) solution.

Paolo
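
As an illustration of the idea in Marcelo's patch description quoted
above (not code from the patch itself), here is a minimal Python sketch
of a toy clock that freezes at vm_stop and, optionally, is advanced at
pre_save by the host time that elapsed in between. The names
FakeHostClock, ToyGuestClock and guest_lag_at_resume_ns are hypothetical,
and nothing here reflects the actual QEMU or KVM implementation.

```python
NS_PER_SEC = 1_000_000_000


class FakeHostClock:
    """Hypothetical stand-in for a host monotonic clock (nanoseconds)."""

    def __init__(self):
        self.now_ns = 0

    def advance(self, ns):
        self.now_ns += ns


class ToyGuestClock:
    """Toy model of a guest clock that stops ticking at vm_stop."""

    def __init__(self, host):
        self.host = host
        self.stop_host_ns = None   # host time when the VM was stopped
        self.stop_guest_ns = None  # guest clock value frozen at vm_stop

    def read_ns(self):
        # While stopped, the guest clock stays at its frozen value.
        if self.stop_guest_ns is not None:
            return self.stop_guest_ns
        return self.host.now_ns

    def vm_stop(self):
        self.stop_guest_ns = self.host.now_ns
        self.stop_host_ns = self.host.now_ns

    def pre_save(self, advance_by_window):
        # Value that would be sent to the destination.
        saved = self.stop_guest_ns
        if advance_by_window:
            # Add the host time spent between vm_stop and pre_save.
            saved += self.host.now_ns - self.stop_host_ns
        return saved


def guest_lag_at_resume_ns(advance_by_window, window_ns, remainder_ns):
    """Return how far the saved guest clock lags host time at resume."""
    host = FakeHostClock()
    guest = ToyGuestClock(host)
    host.advance(10 * NS_PER_SEC)  # guest runs normally for a while
    guest.vm_stop()
    host.advance(window_ns)        # e.g. remaining RAM is copied
    saved = guest.pre_save(advance_by_window)
    host.advance(remainder_ns)     # rest of the downtime after pre_save
    return host.now_ns - saved
```

With a made-up 4.8 s window between vm_stop and pre_save and a further
0.2 s before the destination resumes, guest_lag_at_resume_ns(False,
4_800_000_000, 200_000_000) returns 5_000_000_000 and the same call with
True returns 200_000_000. The figures are chosen only to echo the 5s and
0.2s mentioned in the thread.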