Re: [QEMU PATCH v2] kvmclock: advance clock by time window between vm_stop and pre_save

kvm.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: Marcelo Tosatti <mtosatti@redhat.com>
To: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: kvm@vger.kernel.org, qemu-devel <qemu-devel@nongnu.org>,
	"Paolo Bonzini" <pbonzini@redhat.com>,
	"Radim Krčmář" <rkrcmar@redhat.com>,
	"Juan Quintela" <quintela@redhat.com>,
	"Eduardo Habkost" <ehabkost@redhat.com>
Subject: Re: [QEMU PATCH v2] kvmclock: advance clock by time window between vm_stop and pre_save
Date: Wed, 9 Nov 2016 17:32:50 -0200	[thread overview]
Message-ID: <20161109193248.GA20437@amt.cnet> (raw)
In-Reply-To: <20161108133230.GA3660@amt.cnet>

On Tue, Nov 08, 2016 at 11:32:30AM -0200, Marcelo Tosatti wrote:
> On Tue, Nov 08, 2016 at 10:22:56AM +0000, Dr. David Alan Gilbert wrote:
> > * Marcelo Tosatti (mtosatti@redhat.com) wrote:
> > > On Mon, Nov 07, 2016 at 08:03:50PM +0000, Dr. David Alan Gilbert wrote:
> > > > * Marcelo Tosatti (mtosatti@redhat.com) wrote:
> > > > > On Mon, Nov 07, 2016 at 03:46:11PM +0000, Dr. David Alan Gilbert wrote:
> > > > > > * Marcelo Tosatti (mtosatti@redhat.com) wrote:
> > > > > > > This patch, relative to pre-copy migration codepath,
> > > > > > > measures the time between vm_stop() and pre_save(),
> > > > > > > which includes copying the remaining RAM to destination,
> > > > > > > and advances the clock by that amount.
> > > > > > > 
> > > > > > > In a VM with 5 seconds downtime, this reduces the guest
> > > > > > > clock difference on destination from 5s to 0.2s.
> > > > > > > 
> > > > > > > Tested with Linux and Windows 2012 R2 guests with -cpu XXX,+hv-time.
> > > > > > 
> > > > > > One thing that bothers me is that it's only this clock that's
> > > > > > getting corrected; doesn't it cause things to get upset when
> > > > > > one clock moves and the others dont?
> > > > > 
> > > > > If you are correlating the clocks, then yes.
> > > > > 
> > > > > Older Linux guests get upset (marking the TSC clocksource unstable
> > > > > because the watchdog checks TSC vs kvmclock), but there is a workaround for it 
> > > > > in newer guests
> > > > > (kvmclock interface to notify watchdog to not complain).
> > > > > 
> > > > > Note marking TSC clocksource unstable on older guests is harmless
> > > > > because kvmclock is the standard clocksource.
> > > > > 
> > > > > For Windows guests, i don't know that Windows correlates between different
> > > > > clocks.
> > > > > 
> > > > > That is, there is relative control as to which software reads kvmclock 
> > > > > or Windows TIMER MSR, so i don't see the need to advance every clock 
> > > > > exposed.
> > > > > 
> > > > > > Shouldn't the pause delay be recorded somewhere architecturally
> > > > > > independent and then be a thing that kvm-clock happens to use and
> > > > > > other clocks might as well?
> > > > > 
> > > > > In theory, yes. In practice, i don't see the need for this... 
> > > > 
> > > > It seems unlikely to me that x86 is the only one that will want
> > > > to do something similar.
> > > 
> > > Can't they copy what kvmclock is doing today? 
> > 
> > We shouldn't have copies of code all over should we?
> > 
> > Dave
> 
> Fine i'll add a notifier.
> 

KVM Linux guests all have to: 

Make CLOCK_MONOTONIC not count during vmpause
(because it mimicks behaviour under suspend/resume 
of baremetal). And because time overflows.

Assuming the HW clock counts while the VM is paused,
it means they have to hook into vmstate change
notifiers to:

        event           action
        vmstop          KVM_GET_CLOCK
        vmstart         KVM_SET_CLOCK (that earlier value)

For x86 we want to start counting the time there 
because while the VM is running, the host TSC 
is keeping track of time. 
So you measure the amount of time between:

        -> Guest VM clock stops ticking.
        -> clock_gettime(CLOCK_MONOTONIC, &pointA);
        ...
        -> presave: clock_gettime(CLOCK_MONOTONIC, &pointB);

I measured the additional time between presave and EOF: its about
5ms.

On destination, there is an additional 30ms between EOF receival 
and restoration of TSC.

Now, the clock difference is 130ms, and i am not sure where it 
comes from, trying to figure out. But the patch should give 35ms 
difference which is pretty good. Chasing that down...

So in summary: i don't see the point of making the code
"generic" without knowing what the other arches 
want to do.

next prev parent reply	other threads:[~2016-11-09 19:33 UTC|newest]

Thread overview: 40+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-11-04  9:43 [QEMU PATCH] kvmclock: advance clock by time window between vm_stop and pre_save Marcelo Tosatti
2016-11-04 12:28 ` Juan Quintela
2016-11-04 12:35   ` Marcelo Tosatti
2016-11-04 14:00     ` Marcelo Tosatti
2016-11-04 15:25 ` Radim Krčmář
2016-11-04 15:33   ` Paolo Bonzini
2016-11-04 15:48     ` Radim Krčmář
2016-11-04 15:57       ` Paolo Bonzini
2016-11-04 17:16         ` Radim Krčmář
2016-11-04 21:29           ` Paolo Bonzini
2016-11-04 21:47             ` Marcelo Tosatti
2016-11-04 22:35               ` Paolo Bonzini
2016-11-07 14:31           ` Roman Kagan
2016-11-07 19:31             ` Marcelo Tosatti
2016-11-04 16:24       ` Marcelo Tosatti
2016-11-04 17:34         ` Radim Krčmář
2016-11-04 18:29           ` Marcelo Tosatti
2016-11-04 20:07             ` Radim Krčmář
2016-11-04 16:04   ` Marcelo Tosatti
2016-11-04 17:07   ` Marcelo Tosatti
2016-11-04 17:39     ` Radim Krčmář
2016-11-04 18:31       ` Marcelo Tosatti
2016-11-07 13:08       ` Dr. David Alan Gilbert
2016-11-04 16:59 ` [QEMU PATCH v2] " Marcelo Tosatti
2016-11-04 18:57   ` Juan Quintela
2016-11-07 15:46   ` Dr. David Alan Gilbert
2016-11-07 19:41     ` Marcelo Tosatti
2016-11-07 20:03       ` Dr. David Alan Gilbert
2016-11-08  0:06         ` Marcelo Tosatti
2016-11-08 10:22           ` Dr. David Alan Gilbert
2016-11-08 13:32             ` Marcelo Tosatti
2016-11-09 19:32               ` Marcelo Tosatti [this message]
2016-11-09 16:23             ` Paolo Bonzini
2016-11-09 16:28               ` Dr. David Alan Gilbert
2016-11-09 16:33                 ` Paolo Bonzini
2016-11-10 11:48               ` Marcelo Tosatti
2016-11-10 17:57                 ` Paolo Bonzini
2016-11-11 14:23                   ` Marcelo Tosatti
2017-02-07 10:02       ` Wanpeng Li
2017-02-07 12:18         ` Marcelo Tosatti

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20161109193248.GA20437@amt.cnet \
    --to=mtosatti@redhat.com \
    --cc=dgilbert@redhat.com \
    --cc=ehabkost@redhat.com \
    --cc=kvm@vger.kernel.org \
    --cc=pbonzini@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=quintela@redhat.com \
    --cc=rkrcmar@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).