Re: [QEMU PATCH] kvmclock: advance clock by time window between vm_stop and pre_save

From: Marcelo Tosatti <mtosatti@redhat.com>
To: Juan Quintela <quintela@redhat.com>
Cc: kvm@vger.kernel.org, qemu-devel <qemu-devel@nongnu.org>,
	"Dr. David Alan Gilbert" <dgilbert@redhat.com>,
	"Paolo Bonzini" <pbonzini@redhat.com>,
	"Radim Krčmář" <rkrcmar@redhat.com>,
	"Eduardo Habkost" <ehabkost@redhat.com>
Subject: Re: [QEMU PATCH] kvmclock: advance clock by time window between vm_stop and pre_save
Date: Fri, 4 Nov 2016 12:00:38 -0200
Message-ID: <20161104140035.GA14339@amt.cnet>
In-Reply-To: <20161104123539.GA3132@amt.cnet>

On Fri, Nov 04, 2016 at 10:35:39AM -0200, Marcelo Tosatti wrote:
> On Fri, Nov 04, 2016 at 01:28:48PM +0100, Juan Quintela wrote:
> > Marcelo Tosatti <mtosatti@redhat.com> wrote:
> > > This patch, relative to pre-copy migration codepath,
> > > measures the time between vm_stop() and pre_save(), 
> > > which includes copying the remaining RAM to destination,
> > > and advances the clock by that amount.
> > >
> > > In a VM with 5 seconds downtime, this reduces the guest 
> > > clock difference on destination from 5s to 0.2s.
> > >
> > > Please do not apply this yet as some codepaths still need
> > > checking, submitting early for comments.
> > >
> > > Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
> > 
> > You can use an optional section, and then you don't need to increase the
> > version number.
> 
> Optional section is more appropriate, thanks.
> 
> > I believe you that the clock manipulation is right, only talking about
> > the migration bits.
> > 
> > > +static uint64_t clock_delta(struct timespec *before, struct timespec *after)
> > > +{
> > > +    if (before->tv_sec > after->tv_sec ||
> > > +        (before->tv_sec == after->tv_sec &&
> > > +         before->tv_nsec > after->tv_nsec)) {
> > > +        fprintf(stderr, "clock_delta failed: before=(%ld sec, %ld nsec),"
> > > +                        "after=(%ld sec, %ld nsec)\n", before->tv_sec,
> > > +                        before->tv_nsec, after->tv_sec, after->tv_nsec);
> > > +        abort();
> > > +    }
> > > +
> > > +    return (after->tv_sec - before->tv_sec) * 1000000000ULL +
> > > +            after->tv_nsec - before->tv_nsec;
> > > +}
> > 
> > I can't believe that we don't have a helper function already to
> > calculate this....
> 
> Couldn't find any...
> 
> > > +
> > > +static void kvmclock_pre_save(void *opaque)
> > > +{
> > > +    KVMClockState *s = opaque;
> > > +    struct timespec now;
> > > +    uint64_t ns;
> > > +
> > > +    if (s->t_aftervmstop.tv_sec == 0) {
> > > +        return;
> > > +    }
> > 
> > You have your test here.
> > 
> > > +
> > > +    clock_gettime(CLOCK_MONOTONIC, &now);
> > > +
> > > +    ns = clock_delta(&s->t_aftervmstop, &now);
> > > +
> > > +    /*
> > > +     * Linux guests can overflow if time jumps
> > > +     * forward in large increments.
> > > +     * Cap maximum adjustment to 10 minutes.
> > > +     */
> > > +    ns = MIN(ns, 600000000000ULL);
> > > +
> > > +    if (s->clock + ns > s->clock) {
> > > +        s->ns = ns;
> > 
> > Would it be a good idea to print an error message here?  If it has been more
> > than 10mins since we did the vmstop, something went wrong here.
> 
> Not sure... is it not possible for the user to stop migration in some 
> way? 
> 
> What if network is very slow and maxdowntime very high?
> 
> > > +    }
> > > +}
> > > +
> > > +static int kvmclock_post_load(void *opaque, int version_id)
> > > +{
> > > +    KVMClockState *s = opaque;
> > > +
> > > +    /* save the value from incoming migration */
> > > +    s->advance_clock = s->ns;
> > > +
> > > +    return 0;
> > > +}
> > > +
> > >  static const VMStateDescription kvmclock_vmsd = {
> > >      .name = "kvmclock",
> > > -    .version_id = 1,
> > > +    .version_id = 2,
> > >      .minimum_version_id = 1,
> > > +    .pre_save = kvmclock_pre_save,
> > > +    .post_load = kvmclock_post_load,
> > >      .fields = (VMStateField[]) {
> > >          VMSTATE_UINT64(clock, KVMClockState),
> > > +        VMSTATE_UINT64_V(ns, KVMClockState, 2),
> > >          VMSTATE_END_OF_LIST()
> > >      }
> > >  };
> > 
> > 
> > If you need help with the subsection stuff, just ask.
> > 
> > Later, Juan.
> 
> OK, I'll try to cook up an optional section and let's see what happens.
> 
> Thanks Juan.

OK, so by "optional section" I meant a section that, when sent to the
destination, could be ignored, and migration would still succeed.

The alternative (what this patch does now) is to increase the migration
version, so that:

    1. older machine types remain compatible.
    2. newer machine types fail to migrate.

This is because the data being sent, ns, is not really optional: if kvmclock
or Hyper-V time is enabled (which should be 100% of the cases), we always
want to send that data.

That is, there is no difference between:

* Writing a subsection with needed=1 always (except when
  using an older machine type).
* Using old/new machine types with particular versions.
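
To make the version-bump case concrete, here is a toy model in plain C.
It is not QEMU's loader, only a sketch of the rules as I understand them:
a stream is rejected if its version is newer than the destination supports
or older than minimum_version_id, and a field tagged with version 2 is only
read when the stream is at least version 2. The names toy_stream, toy_dest
and toy_load are made up for this illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified view of what the source put on the wire. */
struct toy_stream {
    int version_id;   /* version the source wrote */
    uint64_t clock;   /* present in every version */
    uint64_t ns;      /* only meaningful when version_id >= 2 */
};

/* Hypothetical destination-side state. */
struct toy_dest {
    int version_id;          /* newest version the destination understands */
    int minimum_version_id;  /* oldest version it still accepts */
    uint64_t clock;
    uint64_t advance_clock;
};

/* Returns true on success, false if the destination must reject the stream. */
static bool toy_load(struct toy_dest *d, const struct toy_stream *s)
{
    if (s->version_id > d->version_id ||
        s->version_id < d->minimum_version_id) {
        return false;
    }

    d->clock = s->clock;
    /* A version-2 field is skipped (left at 0 here) for a version-1 stream. */
    d->advance_clock = (s->version_id >= 2) ? s->ns : 0;
    return true;
}
```

With this model, a destination that only knows version 1 refuses a
version 2 stream, while a version 2 destination accepts a version 1
stream and simply does not advance the clock.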

I think I missed the patch to switch current machine
types to kvmclock v1, BTW.
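
For reference, the arithmetic done in pre_save in the quoted patch can be
sketched as standalone C along the following lines. This is an illustration
under assumptions, not the patch itself: the names timespec_delta_ns,
clock_advance_ns and MAX_ADVANCE_NS are made up here, and unlike
clock_delta() in the patch the helper returns 0 instead of calling abort()
when the timestamps are out of order.

```c
#include <stdint.h>
#include <time.h>

#define NSEC_PER_SEC   1000000000ULL
#define MAX_ADVANCE_NS (600ULL * NSEC_PER_SEC)   /* 10 minute cap */

/* Nanoseconds from 'before' to 'after'; 0 if 'after' is not later. */
static uint64_t timespec_delta_ns(const struct timespec *before,
                                  const struct timespec *after)
{
    if (before->tv_sec > after->tv_sec ||
        (before->tv_sec == after->tv_sec &&
         before->tv_nsec > after->tv_nsec)) {
        return 0;
    }

    /* Signed intermediates: the nanosecond difference may be negative. */
    int64_t sec = (int64_t)after->tv_sec - (int64_t)before->tv_sec;
    int64_t nsec = (int64_t)after->tv_nsec - (int64_t)before->tv_nsec;

    return (uint64_t)(sec * (int64_t)NSEC_PER_SEC + nsec);
}

/*
 * Amount to advance 'clock' by: capped at MAX_ADVANCE_NS, and 0 when
 * clock + ns would wrap around a 64-bit counter.
 */
static uint64_t clock_advance_ns(uint64_t clock, uint64_t elapsed_ns)
{
    uint64_t ns = elapsed_ns < MAX_ADVANCE_NS ? elapsed_ns : MAX_ADVANCE_NS;

    if (ns > UINT64_MAX - clock) {
        return 0;
    }
    return ns;
}
```

The source side would feed this the CLOCK_MONOTONIC timestamp taken after
vm_stop() and the one taken in pre_save, and send the result as ns.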
