All of lore.kernel.org
 help / color / mirror / Atom feed
From: Marcelo Tosatti <mtosatti@redhat.com>
To: Eduardo Habkost <ehabkost@redhat.com>
Cc: kvm@vger.kernel.org, qemu-devel@nongnu.org,
	"Dr. David Alan Gilbert" <dgilbert@redhat.com>,
	Paolo Bonzini <pbonzini@redhat.com>,
	Juan Quintela <quintela@redhat.com>,
	Radim Krcmar <rkrcmar@redhat.com>
Subject: Re: [qemu patch V2 2/2] kvmclock: reduce kvmclock difference on migration
Date: Thu, 17 Nov 2016 17:15:13 -0200	[thread overview]
Message-ID: <20161117191510.GA5460@amt.cnet> (raw)
In-Reply-To: <20161117181609.GZ5057@thinpad.lan.raisama.net>

On Thu, Nov 17, 2016 at 04:16:09PM -0200, Eduardo Habkost wrote:
> On Thu, Nov 17, 2016 at 03:15:35PM -0200, Marcelo Tosatti wrote:
> > On Thu, Nov 17, 2016 at 12:11:26PM -0200, Eduardo Habkost wrote:
> > > On Thu, Nov 17, 2016 at 11:24:13AM -0200, Marcelo Tosatti wrote:
> > > > Check for KVM_CAP_ADJUST_CLOCK capability KVM_CLOCK_TSC_STABLE, which
> > > > indicates that KVM_GET_CLOCK returns a value as seen by the guest at
> > > > that moment.
> > > > 
> > > > For new machine types, use this value rather than reading 
> > > > from guest memory.
> > > > 
> > > > This reduces kvmclock difference on migration from 5s to 0.1s
> > > > (when max_downtime == 5s).
> > > > 
> > > > Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
> > > > 
> > > > ---
> > > >  hw/i386/kvm/clock.c    |  108 ++++++++++++++++++++++++++++++++++++++++++-------
> > > >  include/hw/i386/pc.h   |    5 ++
> > > >  target-i386/kvm.c      |    7 +++
> > > >  target-i386/kvm_i386.h |    1 
> > > >  4 files changed, 107 insertions(+), 14 deletions(-)
> > > > 
> > > > v2: 
> > > > - improve variable names (Juan)
> > > > - consolidate code on kvm_get_clock function (Paolo)
> > > > - return mach_use_reliable_get_clock from needed function (Paolo)
> > > > 
> > > > Index: qemu-mig-advance-clock/hw/i386/kvm/clock.c
> > > > ===================================================================
> > > > --- qemu-mig-advance-clock.orig/hw/i386/kvm/clock.c	2016-11-14 10:40:39.748116312 -0200
> > > > +++ qemu-mig-advance-clock/hw/i386/kvm/clock.c	2016-11-14 13:38:29.299955042 -0200
> > > > @@ -36,6 +36,12 @@
> > > >  
> > > >      uint64_t clock;
> > > >      bool clock_valid;
> > > > +
> > > > +    /* whether machine supports reliable KVM_GET_CLOCK */
> > > > +    bool mach_use_reliable_get_clock;
> > > > +
> > > > +    /* whether source host supported reliable KVM_GET_CLOCK */
> > > > +    bool src_use_reliable_get_clock;
> > > >  } KVMClockState;
> > > >  
> > > >  struct pvclock_vcpu_time_info {
> > > > @@ -81,6 +87,19 @@
> > > >      return nsec + time.system_time;
> > > >  }
> > > >  
> > > > +static uint64_t kvm_get_clock(void)
> > > > +{
> > > > +    struct kvm_clock_data data;
> > > > +    int ret;
> > > > +
> > > > +    ret = kvm_vm_ioctl(kvm_state, KVM_GET_CLOCK, &data);
> > > > +    if (ret < 0) {
> > > > +        fprintf(stderr, "KVM_GET_CLOCK failed: %s\n", strerror(ret));
> > > > +                abort();
> > > > +    }
> > > > +    return data.clock;
> > > > +}
> > > > +
> > > >  static void kvmclock_vm_state_change(void *opaque, int running,
> > > >                                       RunState state)
> > > >  {
> > > > @@ -91,15 +110,37 @@
> > > >  
> > > >      if (running) {
> > > >          struct kvm_clock_data data = {};
> > > > -        uint64_t time_at_migration = kvmclock_current_nsec(s);
> > > > +        uint64_t pvclock_via_mem = 0;
> > > >  
> > > > -        s->clock_valid = false;
> > > > +        /* local (running VM) restore */
> > > > +        if (s->clock_valid) {
> > > > +            /*
> > > > +             * if host does not support reliable KVM_GET_CLOCK,
> > > > +             * read kvmclock value from memory
> > > > +             */
> > > > +            if (!kvm_has_adjust_clock_stable()) {
> > > > +                pvclock_via_mem = kvmclock_current_nsec(s);
> > > > +            }
> > > > +        /* migration/savevm/init restore */
> > > > +        } else {
> > > > +            /*
> > > > +             * use s->clock in case machine uses reliable
> > > > +             * get clock and source host supported
> > > > +             * reliable get clock
> > > > +             */
> > > > +            if (!(s->mach_use_reliable_get_clock &&
> > > > +                  s->src_use_reliable_get_clock)) {
> > > > +                pvclock_via_mem = kvmclock_current_nsec(s);
> > > > +            }
> > > 
> > > The s->mach_use_reliable_get_clock check seems redundant.
> > > src_use_reliable_get_clock is set only if
> > > mach_use_reliable_get_clock is true.
> > 
> > Done.
> > 
> > > > +        }
> > > >  
> > > > -        /* We can't rely on the migrated clock value, just discard it */
> > > > -        if (time_at_migration) {
> > > > -            s->clock = time_at_migration;
> > > > +        /* We can't rely on the saved clock value, just discard it */
> > > > +        if (pvclock_via_mem) {
> > > > +            s->clock = pvclock_via_mem;
> > > >          }
> > > >  
> > > > +        s->clock_valid = false;
> > > > +
> > > >          data.clock = s->clock;
> > > >          ret = kvm_vm_ioctl(kvm_state, KVM_SET_CLOCK, &data);
> > > >          if (ret < 0) {
> > > > @@ -120,8 +161,6 @@
> > > >              }
> > > >          }
> > > >      } else {
> > > > -        struct kvm_clock_data data;
> > > > -        int ret;
> > > >  
> > > >          if (s->clock_valid) {
> > > >              return;
> > > > @@ -129,13 +168,7 @@
> > > >  
> > > >          kvm_synchronize_all_tsc();
> > > >  
> > > > -        ret = kvm_vm_ioctl(kvm_state, KVM_GET_CLOCK, &data);
> > > > -        if (ret < 0) {
> > > > -            fprintf(stderr, "KVM_GET_CLOCK failed: %s\n", strerror(ret));
> > > > -            abort();
> > > > -        }
> > > > -        s->clock = data.clock;
> > > > -
> > > > +        s->clock = kvm_get_clock();
> > > >          /*
> > > >           * If the VM is stopped, declare the clock state valid to
> > > >           * avoid re-reading it on next vmsave (which would return
> > > > @@ -152,22 +185,69 @@
> > > >      qemu_add_vm_change_state_handler(kvmclock_vm_state_change, s);
> > > >  }
> > > >  
> > > > +static bool kvmclock_src_use_reliable_get_clock(void *opaque)
> > > > +{
> > > > +    KVMClockState *s = opaque;
> > > > +
> > > > +    /*
> > > > +     * On machine types that support reliable KVM_GET_CLOCK,
> > > > +     * if host kernel does provide reliable KVM_GET_CLOCK,
> > > > +     * set src_use_reliable_get_clock=true so that destination
> > > > +     * avoids reading kvmclock from memory.
> > > > +     */
> > > > +    if (s->mach_use_reliable_get_clock && kvm_has_adjust_clock_stable()) {
> > > > +        s->src_use_reliable_get_clock = true;
> > > > +    }
> > > 
> > > It feels fragile to change device state inside the .needed
> > > function. Better to initialize src_use_reliable_get_clock on
> > > kvmclock_realize()?
> > > 
> > > What exactly ensures src_use_reliable_get_clock is correctly
> > > initialized on the migration destination as well?
> > 
> > Its initialized to false (because its part of device state).
> 
> Do you mean it is always going to be false on the destination
> host? What if .needed is called by other code? .needed shouldn't
> have any side-effects.

I don't think needed is called by any other code.

> Also, the difference between the variables on the destination and
> the source is very confusing, see other comments below:

Yes it is: its because you don't know when is what. But if you consider
separate events:

    initialization, incoming migration, vmstop, vmcontinue.

And the values of the variables for each case, then it becomes 
clear.

If anyone has a suggestion to improve clarify, its welcome.

> > > 
> > > > +
> > > > +    return s->mach_use_reliable_get_clock;
> > > 
> > > If if kvm_has_adjust_clock_stable() is false, isn't it simpler to
> > > simply skip the section?
> > > 
> > > It looks like mach_use_reliable_get_clock and
> > > src_use_reliable_get_clock could become a single field:
> > > 
> > > * use_reliable_get_clock ("x-use-reliable-get-clock" property)
> > >   set to true by default (set on DEFINE_PROP_BOOL parameter)
> > > * "x-use-reliable-get-clock" set to false by default on older
> > >   machine-types
> > > * use_reliable_get_clock forced to false on kvmclock_realize() if
> > >   !kvm_has_adjust_clock_stable()
> > > * kvmclock_reliable_get_clock.needed return s->use_reliable_get_clock
> > 
> > No because use_reliable_get_clock can be false because the destination host
> > does not support kvm_has_adjust_clock_stable(), but the source host
> > supports it (so we want to use the value in the first incoming
> > migration).
> > 
> 
> Are you talking about the value of the fields on the source host,
> or on the destination host? In this case, you mean that
> use_reliable_get_clock should be false on the destination, right?

Yes, destination.

> > So the variable indicates whether the source host supported
> > migration, not whether the destination supports it.
> 
> Good, that's the whole point of the field, isn't it? I will try
> to summarize all cases below. Please correct me if something is
> incorrect.

Yes. So in this case, the cleanup/simplification to use a single
variable is not possible (or maybe it is, but it just complicates
things).

> --------+------------+---------------+-----------+------------+
>         | kvm_has_adj_clock_stable() | use_reliable_get_clock |
> machine | src        | dst           | src       | dst        |
> --------+------------+---------------+-----------+------------+
> pc-2.8  | false      | false         | false     | false      |
> pc-2.8  | false      | true          | false     | false (*)  |
> pc-2.8  | true       | false         | true      | true (**)  |
> pc-2.8  | true       | true          | true      | true       |
> --------+------------+---------------+-----------+------------+
> pc-2.7  | (any)      | (any)         | false     | false      |
> --------+------------+---------------+-----------+------------+
> 
> (*) I guess this is where things would break if we skip the
>     subsection (and use a single field). If the subsection is
>     skipped, use_reliable_get_clock would be set to true on the
>     destination. (See my comment below on the new patch you sent)

Well, perhaps you can massage things so that it works 
without sending subsection (that is):

    * Send subsection only in case value is true.
    * Do not send subsection in case value is false (then treat
      non presence of subsection and mach_reliable_clock=true 
      to mean source did not support kvm_has_reliable_clock).

Once you agree on "send a subsection for both cases", then you 
handle it.

The code is there, its simple, readable, so...

(what you can't do is to unify src_use_get_reliable_clock (remote 
information), and local information). Because you need 
to know both things to be able to handle the 4 cases properly on 
incoming migration.


> (**) Destination can still do the right thing here, right?
> 
> 
> 
> > 
> [...]
> > 
> > How about the following:
> > 
> > Index: qemu-mig-advance-clock/hw/i386/kvm/clock.c
> > ===================================================================
> > --- qemu-mig-advance-clock.orig/hw/i386/kvm/clock.c	2016-11-17 15:07:11.220632761 -0200
> > +++ qemu-mig-advance-clock/hw/i386/kvm/clock.c	2016-11-17 15:11:51.372048640 -0200
> > @@ -36,6 +36,12 @@
> >  
> >      uint64_t clock;
> >      bool clock_valid;
> > +
> > +    /* whether machine supports reliable KVM_GET_CLOCK */
> > +    bool mach_use_reliable_get_clock;
> > +
> > +    /* whether source host supported reliable KVM_GET_CLOCK */
> > +    bool src_use_reliable_get_clock;
> >  } KVMClockState;
> >  
> >  struct pvclock_vcpu_time_info {
> > @@ -81,6 +87,19 @@
> >      return nsec + time.system_time;
> >  }
> >  
> > +static uint64_t kvm_get_clock(void)
> > +{
> > +    struct kvm_clock_data data;
> > +    int ret;
> > +
> > +    ret = kvm_vm_ioctl(kvm_state, KVM_GET_CLOCK, &data);
> > +    if (ret < 0) {
> > +        fprintf(stderr, "KVM_GET_CLOCK failed: %s\n", strerror(ret));
> > +                abort();
> > +    }
> > +    return data.clock;
> > +}
> > +
> >  static void kvmclock_vm_state_change(void *opaque, int running,
> >                                       RunState state)
> >  {
> > @@ -91,15 +110,36 @@
> >  
> >      if (running) {
> >          struct kvm_clock_data data = {};
> > -        uint64_t time_at_migration = kvmclock_current_nsec(s);
> > +        uint64_t pvclock_via_mem = 0;
> >  
> > -        s->clock_valid = false;
> > +        /* local (running VM) restore */
> > +        if (s->clock_valid) {
> > +            /*
> > +             * if host does not support reliable KVM_GET_CLOCK,
> > +             * read kvmclock value from memory
> > +             */
> > +            if (!kvm_has_adjust_clock_stable()) {
> > +                pvclock_via_mem = kvmclock_current_nsec(s);
> > +            }
> > +        /* migration/savevm/init restore */
> > +        } else {
> > +            /*
> > +             * use s->clock in case machine uses reliable
> > +             * get clock and source host supported
> > +             * reliable get clock
> > +             */
> > +            if (!s->src_use_reliable_get_clock) {
> > +                pvclock_via_mem = kvmclock_current_nsec(s);
> > +            }
> > +        }
> >  
> > -        /* We can't rely on the migrated clock value, just discard it */
> > -        if (time_at_migration) {
> > -            s->clock = time_at_migration;
> > +        /* We can't rely on the saved clock value, just discard it */
> > +        if (pvclock_via_mem) {
> > +            s->clock = pvclock_via_mem;
> >          }
> >  
> > +        s->clock_valid = false;
> > +
> >          data.clock = s->clock;
> >          ret = kvm_vm_ioctl(kvm_state, KVM_SET_CLOCK, &data);
> >          if (ret < 0) {
> > @@ -120,8 +160,6 @@
> >              }
> >          }
> >      } else {
> > -        struct kvm_clock_data data;
> > -        int ret;
> >  
> >          if (s->clock_valid) {
> >              return;
> > @@ -129,13 +167,7 @@
> >  
> >          kvm_synchronize_all_tsc();
> >  
> > -        ret = kvm_vm_ioctl(kvm_state, KVM_GET_CLOCK, &data);
> > -        if (ret < 0) {
> > -            fprintf(stderr, "KVM_GET_CLOCK failed: %s\n", strerror(ret));
> > -            abort();
> > -        }
> > -        s->clock = data.clock;
> > -
> > +        s->clock = kvm_get_clock();
> >          /*
> >           * If the VM is stopped, declare the clock state valid to
> >           * avoid re-reading it on next vmsave (which would return
> > @@ -149,25 +181,72 @@
> >  {
> >      KVMClockState *s = KVM_CLOCK(dev);
> >  
> > +    /*
> > +     * On machine types that support reliable KVM_GET_CLOCK,
> > +     * if host kernel does provide reliable KVM_GET_CLOCK,
> > +     * set src_use_reliable_get_clock=true so that destination
> > +     * avoids reading kvmclock from memory.
> > +     */
> > +    if (s->mach_use_reliable_get_clock && kvm_has_adjust_clock_stable()) {
> > +        s->src_use_reliable_get_clock = true;
> > +    }
> > +
> >      qemu_add_vm_change_state_handler(kvmclock_vm_state_change, s);
> >  }
> >  
> > +static bool kvmclock_src_use_reliable_get_clock(void *opaque)
> > +{
> > +    KVMClockState *s = opaque;
> > +
> > +    return s->mach_use_reliable_get_clock;
> > +}
> 
> I didn't review kvmclock_vm_state_change() and
> kvm_has_adjust_clock_stable(), but the rules to migrate
> src_use_reliable_get_clock look good now, and seem to match the
> table above.

Ok... Paolo, Radim?

Thanks Eduardo.

> 
> > +
> > +static const VMStateDescription kvmclock_reliable_get_clock = {
> > +    .name = "kvmclock/src_use_reliable_get_clock",
> > +    .version_id = 1,
> > +    .minimum_version_id = 1,
> > +    .needed = kvmclock_src_use_reliable_get_clock,
> > +    .fields = (VMStateField[]) {
> > +        VMSTATE_BOOL(src_use_reliable_get_clock, KVMClockState),
> > +        VMSTATE_END_OF_LIST()
> > +    }
> > +};
> > +
> > +static void kvmclock_pre_save(void *opaque)
> > +{
> > +    KVMClockState *s = opaque;
> > +
> > +    s->clock = kvm_get_clock();
> > +}
> > +
> >  static const VMStateDescription kvmclock_vmsd = {
> >      .name = "kvmclock",
> >      .version_id = 1,
> >      .minimum_version_id = 1,
> > +    .pre_save = kvmclock_pre_save,
> >      .fields = (VMStateField[]) {
> >          VMSTATE_UINT64(clock, KVMClockState),
> >          VMSTATE_END_OF_LIST()
> > +    },
> > +    .subsections = (const VMStateDescription * []) {
> > +        &kvmclock_reliable_get_clock,
> > +        NULL
> >      }
> >  };
> >  
> > +static Property kvmclock_properties[] = {
> > +    DEFINE_PROP_BOOL("x-mach-use-reliable-get-clock", KVMClockState,
> > +                      mach_use_reliable_get_clock, true),
> > +    DEFINE_PROP_END_OF_LIST(),
> > +};
> > +
> >  static void kvmclock_class_init(ObjectClass *klass, void *data)
> >  {
> >      DeviceClass *dc = DEVICE_CLASS(klass);
> >  
> >      dc->realize = kvmclock_realize;
> >      dc->vmsd = &kvmclock_vmsd;
> > +    dc->props = kvmclock_properties;
> >  }
> >  
> >  static const TypeInfo kvmclock_info = {
> > Index: qemu-mig-advance-clock/include/hw/i386/pc.h
> > ===================================================================
> > --- qemu-mig-advance-clock.orig/include/hw/i386/pc.h	2016-11-17 15:07:11.220632761 -0200
> > +++ qemu-mig-advance-clock/include/hw/i386/pc.h	2016-11-17 15:08:58.096791416 -0200
> > @@ -370,6 +370,11 @@
> >  #define PC_COMPAT_2_7 \
> >      HW_COMPAT_2_7 \
> >      {\
> > +        .driver   = "kvmclock",\
> > +        .property = "x-mach-use-reliable-get-clock",\
> > +        .value    = "off",\
> > +    },\
> > +    {\
> >          .driver   = TYPE_X86_CPU,\
> >          .property = "l3-cache",\
> >          .value    = "off",\
> > Index: qemu-mig-advance-clock/target-i386/kvm.c
> > ===================================================================
> > --- qemu-mig-advance-clock.orig/target-i386/kvm.c	2016-11-17 15:07:11.221632762 -0200
> > +++ qemu-mig-advance-clock/target-i386/kvm.c	2016-11-17 15:07:15.867639659 -0200
> > @@ -117,6 +117,13 @@
> >      return kvm_check_extension(kvm_state, KVM_CAP_X86_SMM);
> >  }
> >  
> > +bool kvm_has_adjust_clock_stable(void)
> > +{
> > +    int ret = kvm_check_extension(kvm_state, KVM_CAP_ADJUST_CLOCK);
> > +
> > +    return (ret == KVM_CLOCK_TSC_STABLE);
> > +}
> > +
> >  bool kvm_allows_irq0_override(void)
> >  {
> >      return !kvm_irqchip_in_kernel() || kvm_has_gsi_routing();
> > Index: qemu-mig-advance-clock/target-i386/kvm_i386.h
> > ===================================================================
> > --- qemu-mig-advance-clock.orig/target-i386/kvm_i386.h	2016-11-17 15:07:11.222632764 -0200
> > +++ qemu-mig-advance-clock/target-i386/kvm_i386.h	2016-11-17 15:07:15.867639659 -0200
> > @@ -17,6 +17,7 @@
> >  
> >  bool kvm_allows_irq0_override(void);
> >  bool kvm_has_smm(void);
> > +bool kvm_has_adjust_clock_stable(void);
> >  void kvm_synchronize_all_tsc(void);
> >  void kvm_arch_reset_vcpu(X86CPU *cs);
> >  void kvm_arch_do_init_vcpu(X86CPU *cs);
> 
> -- 
> Eduardo

WARNING: multiple messages have this Message-ID (diff)
From: Marcelo Tosatti <mtosatti@redhat.com>
To: Eduardo Habkost <ehabkost@redhat.com>
Cc: kvm@vger.kernel.org, qemu-devel@nongnu.org,
	"Dr. David Alan Gilbert" <dgilbert@redhat.com>,
	Paolo Bonzini <pbonzini@redhat.com>,
	Juan Quintela <quintela@redhat.com>,
	Radim Krcmar <rkrcmar@redhat.com>
Subject: Re: [Qemu-devel] [qemu patch V2 2/2] kvmclock: reduce kvmclock difference on migration
Date: Thu, 17 Nov 2016 17:15:13 -0200	[thread overview]
Message-ID: <20161117191510.GA5460@amt.cnet> (raw)
In-Reply-To: <20161117181609.GZ5057@thinpad.lan.raisama.net>

On Thu, Nov 17, 2016 at 04:16:09PM -0200, Eduardo Habkost wrote:
> On Thu, Nov 17, 2016 at 03:15:35PM -0200, Marcelo Tosatti wrote:
> > On Thu, Nov 17, 2016 at 12:11:26PM -0200, Eduardo Habkost wrote:
> > > On Thu, Nov 17, 2016 at 11:24:13AM -0200, Marcelo Tosatti wrote:
> > > > Check for KVM_CAP_ADJUST_CLOCK capability KVM_CLOCK_TSC_STABLE, which
> > > > indicates that KVM_GET_CLOCK returns a value as seen by the guest at
> > > > that moment.
> > > > 
> > > > For new machine types, use this value rather than reading 
> > > > from guest memory.
> > > > 
> > > > This reduces kvmclock difference on migration from 5s to 0.1s
> > > > (when max_downtime == 5s).
> > > > 
> > > > Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
> > > > 
> > > > ---
> > > >  hw/i386/kvm/clock.c    |  108 ++++++++++++++++++++++++++++++++++++++++++-------
> > > >  include/hw/i386/pc.h   |    5 ++
> > > >  target-i386/kvm.c      |    7 +++
> > > >  target-i386/kvm_i386.h |    1 
> > > >  4 files changed, 107 insertions(+), 14 deletions(-)
> > > > 
> > > > v2: 
> > > > - improve variable names (Juan)
> > > > - consolidate code on kvm_get_clock function (Paolo)
> > > > - return mach_use_reliable_get_clock from needed function (Paolo)
> > > > 
> > > > Index: qemu-mig-advance-clock/hw/i386/kvm/clock.c
> > > > ===================================================================
> > > > --- qemu-mig-advance-clock.orig/hw/i386/kvm/clock.c	2016-11-14 10:40:39.748116312 -0200
> > > > +++ qemu-mig-advance-clock/hw/i386/kvm/clock.c	2016-11-14 13:38:29.299955042 -0200
> > > > @@ -36,6 +36,12 @@
> > > >  
> > > >      uint64_t clock;
> > > >      bool clock_valid;
> > > > +
> > > > +    /* whether machine supports reliable KVM_GET_CLOCK */
> > > > +    bool mach_use_reliable_get_clock;
> > > > +
> > > > +    /* whether source host supported reliable KVM_GET_CLOCK */
> > > > +    bool src_use_reliable_get_clock;
> > > >  } KVMClockState;
> > > >  
> > > >  struct pvclock_vcpu_time_info {
> > > > @@ -81,6 +87,19 @@
> > > >      return nsec + time.system_time;
> > > >  }
> > > >  
> > > > +static uint64_t kvm_get_clock(void)
> > > > +{
> > > > +    struct kvm_clock_data data;
> > > > +    int ret;
> > > > +
> > > > +    ret = kvm_vm_ioctl(kvm_state, KVM_GET_CLOCK, &data);
> > > > +    if (ret < 0) {
> > > > +        fprintf(stderr, "KVM_GET_CLOCK failed: %s\n", strerror(ret));
> > > > +                abort();
> > > > +    }
> > > > +    return data.clock;
> > > > +}
> > > > +
> > > >  static void kvmclock_vm_state_change(void *opaque, int running,
> > > >                                       RunState state)
> > > >  {
> > > > @@ -91,15 +110,37 @@
> > > >  
> > > >      if (running) {
> > > >          struct kvm_clock_data data = {};
> > > > -        uint64_t time_at_migration = kvmclock_current_nsec(s);
> > > > +        uint64_t pvclock_via_mem = 0;
> > > >  
> > > > -        s->clock_valid = false;
> > > > +        /* local (running VM) restore */
> > > > +        if (s->clock_valid) {
> > > > +            /*
> > > > +             * if host does not support reliable KVM_GET_CLOCK,
> > > > +             * read kvmclock value from memory
> > > > +             */
> > > > +            if (!kvm_has_adjust_clock_stable()) {
> > > > +                pvclock_via_mem = kvmclock_current_nsec(s);
> > > > +            }
> > > > +        /* migration/savevm/init restore */
> > > > +        } else {
> > > > +            /*
> > > > +             * use s->clock in case machine uses reliable
> > > > +             * get clock and source host supported
> > > > +             * reliable get clock
> > > > +             */
> > > > +            if (!(s->mach_use_reliable_get_clock &&
> > > > +                  s->src_use_reliable_get_clock)) {
> > > > +                pvclock_via_mem = kvmclock_current_nsec(s);
> > > > +            }
> > > 
> > > The s->mach_use_reliable_get_clock check seems redundant.
> > > src_use_reliable_get_clock is set only if
> > > mach_use_reliable_get_clock is true.
> > 
> > Done.
> > 
> > > > +        }
> > > >  
> > > > -        /* We can't rely on the migrated clock value, just discard it */
> > > > -        if (time_at_migration) {
> > > > -            s->clock = time_at_migration;
> > > > +        /* We can't rely on the saved clock value, just discard it */
> > > > +        if (pvclock_via_mem) {
> > > > +            s->clock = pvclock_via_mem;
> > > >          }
> > > >  
> > > > +        s->clock_valid = false;
> > > > +
> > > >          data.clock = s->clock;
> > > >          ret = kvm_vm_ioctl(kvm_state, KVM_SET_CLOCK, &data);
> > > >          if (ret < 0) {
> > > > @@ -120,8 +161,6 @@
> > > >              }
> > > >          }
> > > >      } else {
> > > > -        struct kvm_clock_data data;
> > > > -        int ret;
> > > >  
> > > >          if (s->clock_valid) {
> > > >              return;
> > > > @@ -129,13 +168,7 @@
> > > >  
> > > >          kvm_synchronize_all_tsc();
> > > >  
> > > > -        ret = kvm_vm_ioctl(kvm_state, KVM_GET_CLOCK, &data);
> > > > -        if (ret < 0) {
> > > > -            fprintf(stderr, "KVM_GET_CLOCK failed: %s\n", strerror(ret));
> > > > -            abort();
> > > > -        }
> > > > -        s->clock = data.clock;
> > > > -
> > > > +        s->clock = kvm_get_clock();
> > > >          /*
> > > >           * If the VM is stopped, declare the clock state valid to
> > > >           * avoid re-reading it on next vmsave (which would return
> > > > @@ -152,22 +185,69 @@
> > > >      qemu_add_vm_change_state_handler(kvmclock_vm_state_change, s);
> > > >  }
> > > >  
> > > > +static bool kvmclock_src_use_reliable_get_clock(void *opaque)
> > > > +{
> > > > +    KVMClockState *s = opaque;
> > > > +
> > > > +    /*
> > > > +     * On machine types that support reliable KVM_GET_CLOCK,
> > > > +     * if host kernel does provide reliable KVM_GET_CLOCK,
> > > > +     * set src_use_reliable_get_clock=true so that destination
> > > > +     * avoids reading kvmclock from memory.
> > > > +     */
> > > > +    if (s->mach_use_reliable_get_clock && kvm_has_adjust_clock_stable()) {
> > > > +        s->src_use_reliable_get_clock = true;
> > > > +    }
> > > 
> > > It feels fragile to change device state inside the .needed
> > > function. Better to initialize src_use_reliable_get_clock on
> > > kvmclock_realize()?
> > > 
> > > What exactly ensures src_use_reliable_get_clock is correctly
> > > initialized on the migration destination as well?
> > 
> > Its initialized to false (because its part of device state).
> 
> Do you mean it is always going to be false on the destination
> host? What if .needed is called by other code? .needed shouldn't
> have any side-effects.

I don't think needed is called by any other code.

> Also, the difference between the variables on the destination and
> the source is very confusing, see other comments below:

Yes it is: its because you don't know when is what. But if you consider
separate events:

    initialization, incoming migration, vmstop, vmcontinue.

And the values of the variables for each case, then it becomes 
clear.

If anyone has a suggestion to improve clarify, its welcome.

> > > 
> > > > +
> > > > +    return s->mach_use_reliable_get_clock;
> > > 
> > > If if kvm_has_adjust_clock_stable() is false, isn't it simpler to
> > > simply skip the section?
> > > 
> > > It looks like mach_use_reliable_get_clock and
> > > src_use_reliable_get_clock could become a single field:
> > > 
> > > * use_reliable_get_clock ("x-use-reliable-get-clock" property)
> > >   set to true by default (set on DEFINE_PROP_BOOL parameter)
> > > * "x-use-reliable-get-clock" set to false by default on older
> > >   machine-types
> > > * use_reliable_get_clock forced to false on kvmclock_realize() if
> > >   !kvm_has_adjust_clock_stable()
> > > * kvmclock_reliable_get_clock.needed return s->use_reliable_get_clock
> > 
> > No because use_reliable_get_clock can be false because the destination host
> > does not support kvm_has_adjust_clock_stable(), but the source host
> > supports it (so we want to use the value in the first incoming
> > migration).
> > 
> 
> Are you talking about the value of the fields on the source host,
> or on the destination host? In this case, you mean that
> use_reliable_get_clock should be false on the destination, right?

Yes, destination.

> > So the variable indicates whether the source host supported
> > migration, not whether the destination supports it.
> 
> Good, that's the whole point of the field, isn't it? I will try
> to summarize all cases below. Please correct me if something is
> incorrect.

Yes. So in this case, the cleanup/simplification to use a single
variable is not possible (or maybe it is, but it just complicates
things).

> --------+------------+---------------+-----------+------------+
>         | kvm_has_adj_clock_stable() | use_reliable_get_clock |
> machine | src        | dst           | src       | dst        |
> --------+------------+---------------+-----------+------------+
> pc-2.8  | false      | false         | false     | false      |
> pc-2.8  | false      | true          | false     | false (*)  |
> pc-2.8  | true       | false         | true      | true (**)  |
> pc-2.8  | true       | true          | true      | true       |
> --------+------------+---------------+-----------+------------+
> pc-2.7  | (any)      | (any)         | false     | false      |
> --------+------------+---------------+-----------+------------+
> 
> (*) I guess this is where things would break if we skip the
>     subsection (and use a single field). If the subsection is
>     skipped, use_reliable_get_clock would be set to true on the
>     destination. (See my comment below on the new patch you sent)

Well, perhaps you can massage things so that it works 
without sending subsection (that is):

    * Send subsection only in case value is true.
    * Do not send subsection in case value is false (then treat
      non presence of subsection and mach_reliable_clock=true 
      to mean source did not support kvm_has_reliable_clock).

Once you agree on "send a subsection for both cases", then you 
handle it.

The code is there, its simple, readable, so...

(what you can't do is to unify src_use_get_reliable_clock (remote 
information), and local information). Because you need 
to know both things to be able to handle the 4 cases properly on 
incoming migration.


> (**) Destination can still do the right thing here, right?
> 
> 
> 
> > 
> [...]
> > 
> > How about the following:
> > 
> > Index: qemu-mig-advance-clock/hw/i386/kvm/clock.c
> > ===================================================================
> > --- qemu-mig-advance-clock.orig/hw/i386/kvm/clock.c	2016-11-17 15:07:11.220632761 -0200
> > +++ qemu-mig-advance-clock/hw/i386/kvm/clock.c	2016-11-17 15:11:51.372048640 -0200
> > @@ -36,6 +36,12 @@
> >  
> >      uint64_t clock;
> >      bool clock_valid;
> > +
> > +    /* whether machine supports reliable KVM_GET_CLOCK */
> > +    bool mach_use_reliable_get_clock;
> > +
> > +    /* whether source host supported reliable KVM_GET_CLOCK */
> > +    bool src_use_reliable_get_clock;
> >  } KVMClockState;
> >  
> >  struct pvclock_vcpu_time_info {
> > @@ -81,6 +87,19 @@
> >      return nsec + time.system_time;
> >  }
> >  
> > +static uint64_t kvm_get_clock(void)
> > +{
> > +    struct kvm_clock_data data;
> > +    int ret;
> > +
> > +    ret = kvm_vm_ioctl(kvm_state, KVM_GET_CLOCK, &data);
> > +    if (ret < 0) {
> > +        fprintf(stderr, "KVM_GET_CLOCK failed: %s\n", strerror(ret));
> > +                abort();
> > +    }
> > +    return data.clock;
> > +}
> > +
> >  static void kvmclock_vm_state_change(void *opaque, int running,
> >                                       RunState state)
> >  {
> > @@ -91,15 +110,36 @@
> >  
> >      if (running) {
> >          struct kvm_clock_data data = {};
> > -        uint64_t time_at_migration = kvmclock_current_nsec(s);
> > +        uint64_t pvclock_via_mem = 0;
> >  
> > -        s->clock_valid = false;
> > +        /* local (running VM) restore */
> > +        if (s->clock_valid) {
> > +            /*
> > +             * if host does not support reliable KVM_GET_CLOCK,
> > +             * read kvmclock value from memory
> > +             */
> > +            if (!kvm_has_adjust_clock_stable()) {
> > +                pvclock_via_mem = kvmclock_current_nsec(s);
> > +            }
> > +        /* migration/savevm/init restore */
> > +        } else {
> > +            /*
> > +             * use s->clock in case machine uses reliable
> > +             * get clock and source host supported
> > +             * reliable get clock
> > +             */
> > +            if (!s->src_use_reliable_get_clock) {
> > +                pvclock_via_mem = kvmclock_current_nsec(s);
> > +            }
> > +        }
> >  
> > -        /* We can't rely on the migrated clock value, just discard it */
> > -        if (time_at_migration) {
> > -            s->clock = time_at_migration;
> > +        /* We can't rely on the saved clock value, just discard it */
> > +        if (pvclock_via_mem) {
> > +            s->clock = pvclock_via_mem;
> >          }
> >  
> > +        s->clock_valid = false;
> > +
> >          data.clock = s->clock;
> >          ret = kvm_vm_ioctl(kvm_state, KVM_SET_CLOCK, &data);
> >          if (ret < 0) {
> > @@ -120,8 +160,6 @@
> >              }
> >          }
> >      } else {
> > -        struct kvm_clock_data data;
> > -        int ret;
> >  
> >          if (s->clock_valid) {
> >              return;
> > @@ -129,13 +167,7 @@
> >  
> >          kvm_synchronize_all_tsc();
> >  
> > -        ret = kvm_vm_ioctl(kvm_state, KVM_GET_CLOCK, &data);
> > -        if (ret < 0) {
> > -            fprintf(stderr, "KVM_GET_CLOCK failed: %s\n", strerror(ret));
> > -            abort();
> > -        }
> > -        s->clock = data.clock;
> > -
> > +        s->clock = kvm_get_clock();
> >          /*
> >           * If the VM is stopped, declare the clock state valid to
> >           * avoid re-reading it on next vmsave (which would return
> > @@ -149,25 +181,72 @@
> >  {
> >      KVMClockState *s = KVM_CLOCK(dev);
> >  
> > +    /*
> > +     * On machine types that support reliable KVM_GET_CLOCK,
> > +     * if host kernel does provide reliable KVM_GET_CLOCK,
> > +     * set src_use_reliable_get_clock=true so that destination
> > +     * avoids reading kvmclock from memory.
> > +     */
> > +    if (s->mach_use_reliable_get_clock && kvm_has_adjust_clock_stable()) {
> > +        s->src_use_reliable_get_clock = true;
> > +    }
> > +
> >      qemu_add_vm_change_state_handler(kvmclock_vm_state_change, s);
> >  }
> >  
> > +static bool kvmclock_src_use_reliable_get_clock(void *opaque)
> > +{
> > +    KVMClockState *s = opaque;
> > +
> > +    return s->mach_use_reliable_get_clock;
> > +}
> 
> I didn't review kvmclock_vm_state_change() and
> kvm_has_adjust_clock_stable(), but the rules to migrate
> src_use_reliable_get_clock look good now, and seem to match the
> table above.

Ok... Paolo, Radim?

Thanks Eduardo.

> 
> > +
> > +static const VMStateDescription kvmclock_reliable_get_clock = {
> > +    .name = "kvmclock/src_use_reliable_get_clock",
> > +    .version_id = 1,
> > +    .minimum_version_id = 1,
> > +    .needed = kvmclock_src_use_reliable_get_clock,
> > +    .fields = (VMStateField[]) {
> > +        VMSTATE_BOOL(src_use_reliable_get_clock, KVMClockState),
> > +        VMSTATE_END_OF_LIST()
> > +    }
> > +};
> > +
> > +static void kvmclock_pre_save(void *opaque)
> > +{
> > +    KVMClockState *s = opaque;
> > +
> > +    s->clock = kvm_get_clock();
> > +}
> > +
> >  static const VMStateDescription kvmclock_vmsd = {
> >      .name = "kvmclock",
> >      .version_id = 1,
> >      .minimum_version_id = 1,
> > +    .pre_save = kvmclock_pre_save,
> >      .fields = (VMStateField[]) {
> >          VMSTATE_UINT64(clock, KVMClockState),
> >          VMSTATE_END_OF_LIST()
> > +    },
> > +    .subsections = (const VMStateDescription * []) {
> > +        &kvmclock_reliable_get_clock,
> > +        NULL
> >      }
> >  };
> >  
> > +static Property kvmclock_properties[] = {
> > +    DEFINE_PROP_BOOL("x-mach-use-reliable-get-clock", KVMClockState,
> > +                      mach_use_reliable_get_clock, true),
> > +    DEFINE_PROP_END_OF_LIST(),
> > +};
> > +
> >  static void kvmclock_class_init(ObjectClass *klass, void *data)
> >  {
> >      DeviceClass *dc = DEVICE_CLASS(klass);
> >  
> >      dc->realize = kvmclock_realize;
> >      dc->vmsd = &kvmclock_vmsd;
> > +    dc->props = kvmclock_properties;
> >  }
> >  
> >  static const TypeInfo kvmclock_info = {
> > Index: qemu-mig-advance-clock/include/hw/i386/pc.h
> > ===================================================================
> > --- qemu-mig-advance-clock.orig/include/hw/i386/pc.h	2016-11-17 15:07:11.220632761 -0200
> > +++ qemu-mig-advance-clock/include/hw/i386/pc.h	2016-11-17 15:08:58.096791416 -0200
> > @@ -370,6 +370,11 @@
> >  #define PC_COMPAT_2_7 \
> >      HW_COMPAT_2_7 \
> >      {\
> > +        .driver   = "kvmclock",\
> > +        .property = "x-mach-use-reliable-get-clock",\
> > +        .value    = "off",\
> > +    },\
> > +    {\
> >          .driver   = TYPE_X86_CPU,\
> >          .property = "l3-cache",\
> >          .value    = "off",\
> > Index: qemu-mig-advance-clock/target-i386/kvm.c
> > ===================================================================
> > --- qemu-mig-advance-clock.orig/target-i386/kvm.c	2016-11-17 15:07:11.221632762 -0200
> > +++ qemu-mig-advance-clock/target-i386/kvm.c	2016-11-17 15:07:15.867639659 -0200
> > @@ -117,6 +117,13 @@
> >      return kvm_check_extension(kvm_state, KVM_CAP_X86_SMM);
> >  }
> >  
> > +bool kvm_has_adjust_clock_stable(void)
> > +{
> > +    int ret = kvm_check_extension(kvm_state, KVM_CAP_ADJUST_CLOCK);
> > +
> > +    return (ret == KVM_CLOCK_TSC_STABLE);
> > +}
> > +
> >  bool kvm_allows_irq0_override(void)
> >  {
> >      return !kvm_irqchip_in_kernel() || kvm_has_gsi_routing();
> > Index: qemu-mig-advance-clock/target-i386/kvm_i386.h
> > ===================================================================
> > --- qemu-mig-advance-clock.orig/target-i386/kvm_i386.h	2016-11-17 15:07:11.222632764 -0200
> > +++ qemu-mig-advance-clock/target-i386/kvm_i386.h	2016-11-17 15:07:15.867639659 -0200
> > @@ -17,6 +17,7 @@
> >  
> >  bool kvm_allows_irq0_override(void);
> >  bool kvm_has_smm(void);
> > +bool kvm_has_adjust_clock_stable(void);
> >  void kvm_synchronize_all_tsc(void);
> >  void kvm_arch_reset_vcpu(X86CPU *cs);
> >  void kvm_arch_do_init_vcpu(X86CPU *cs);
> 
> -- 
> Eduardo

  reply	other threads:[~2016-11-17 19:15 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-11-17 13:24 [qemu patch V2 0/2] improve kvmclock difference on migration Marcelo Tosatti
2016-11-17 13:24 ` [Qemu-devel] " Marcelo Tosatti
2016-11-17 13:24 ` [qemu patch V2 1/2] kvm: sync linux headers Marcelo Tosatti
2016-11-17 13:24   ` [Qemu-devel] " Marcelo Tosatti
2016-11-17 13:24 ` [qemu patch V2 2/2] kvmclock: reduce kvmclock difference on migration Marcelo Tosatti
2016-11-17 13:24   ` [Qemu-devel] " Marcelo Tosatti
2016-11-17 14:11   ` Eduardo Habkost
2016-11-17 14:11     ` [Qemu-devel] " Eduardo Habkost
2016-11-17 14:15     ` Paolo Bonzini
2016-11-17 14:15       ` [Qemu-devel] " Paolo Bonzini
2016-11-17 16:30       ` Marcelo Tosatti
2016-11-17 16:30         ` [Qemu-devel] " Marcelo Tosatti
2016-11-17 17:41         ` Eduardo Habkost
2016-11-17 17:41           ` [Qemu-devel] " Eduardo Habkost
2016-11-17 17:15     ` Marcelo Tosatti
2016-11-17 17:15       ` [Qemu-devel] " Marcelo Tosatti
2016-11-17 18:16       ` Eduardo Habkost
2016-11-17 18:16         ` [Qemu-devel] " Eduardo Habkost
2016-11-17 19:15         ` Marcelo Tosatti [this message]
2016-11-17 19:15           ` Marcelo Tosatti
2016-11-17 19:59           ` Eduardo Habkost
2016-11-17 19:59             ` [Qemu-devel] " Eduardo Habkost

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20161117191510.GA5460@amt.cnet \
    --to=mtosatti@redhat.com \
    --cc=dgilbert@redhat.com \
    --cc=ehabkost@redhat.com \
    --cc=kvm@vger.kernel.org \
    --cc=pbonzini@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=quintela@redhat.com \
    --cc=rkrcmar@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.