Message-ID: <5368F6C3.8000307@ozlabs.ru>
Date: Wed, 07 May 2014 00:50:43 +1000
From: Alexey Kardashevskiy
References: <1398940629-26415-1-git-send-email-aik@ozlabs.ru>
In-Reply-To: <1398940629-26415-1-git-send-email-aik@ozlabs.ru>
Subject: Re: [Qemu-devel] [PATCH v7] spapr: Add support for time base offset migration
To: qemu-devel@nongnu.org
Cc: qemu-ppc@nongnu.org, Alexander Graf

On 05/01/2014 08:37 PM, Alexey Kardashevskiy wrote:
> This allows guests to have a different timebase origin from the host.
>
> This is needed for migration, where a guest can migrate from one host
> to another and the two hosts might have a different timebase origin.
> However, the timebase seen by the guest must not go backwards, and
> should go forwards only by a small amount corresponding to the time
> taken for the migration.
>
> This is only supported for recent POWER hardware which has the TBU40
> (timebase upper 40 bits) register. That includes POWER6, 7, 8 but not
> 970.
>
> This adds kvm_access_one_reg() to access a special register which is not
> in env->spr.
> This requires the kvm_set_one_reg/kvm_get_one_reg patch.
>
> The feature must be present in the host kernel.
>
> This bumps vmstate_spapr::version_id and enables new vmstate_ppc_timebase
> only for it. Since the vmstate_spapr::minimum_version_id remains
> unchanged, migration from older QEMU is supported but without
> vmstate_ppc_timebase.
>
> Signed-off-by: Alexey Kardashevskiy
> ---
> Changes:
> v7:
> * migration_duration_ns forced to be between [0...1s]
> * s/tb/tb_remote/
> * time_of_the_day_ns is int64_t now as this is what get_clock_realtime()
> returns

Still bad? :)

>
> v6:
> * time_of_the_day is now time_of_the_day_ns and measured in ns instead of us
> * VMSTATE_PPC_TIMEBASE_V supports versions now
>
> v5:
> * fixed multiple comments in cpu_ppc_get_adjusted_tb and merged it
> into timebase_post_load()
> * removed round_up(1<<24) as KVM is expected to do this anyway
> * removed @freq from migration stream
> * renamed PPCTimebaseOffset to PPCTimebase
> * CLOCKS_PER_SEC is used as a constant, which is 1000000 us/s (see man clock)
>
> v4:
> * made it a per-machine timebase offset rather than per-CPU
>
> v3:
> * kvm_access_one_reg moved out to a separate patch
> * tb_offset and host_timebase were replaced with guest_timebase as
> the destination does not really care about the offset on the source
>
> v2:
> * bumped the vmstate_ppc_cpu version
> * defined version for the env.tb_env field
> ---
>  hw/ppc/ppc.c           | 79 ++++++++++++++++++++++++++++++++++++++++++++++++++
>  hw/ppc/spapr.c         |  4 +--
>  include/hw/ppc/spapr.h |  1 +
>  target-ppc/cpu-qom.h   | 16 ++++++++++
>  target-ppc/kvm.c       |  5 ++++
>  trace-events           |  3 ++
>  6 files changed, 106 insertions(+), 2 deletions(-)
>
> diff --git a/hw/ppc/ppc.c b/hw/ppc/ppc.c
> index 71df471..bec82cd 100644
> --- a/hw/ppc/ppc.c
> +++ b/hw/ppc/ppc.c
> @@ -29,9 +29,11 @@
>  #include "sysemu/cpus.h"
>  #include "hw/timer/m48t59.h"
>  #include "qemu/log.h"
> +#include "qemu/error-report.h"
>  #include "hw/loader.h"
>  #include "sysemu/kvm.h"
>  #include "kvm_ppc.h"
> +#include "trace.h"
>
>  //#define PPC_DEBUG_IRQ
>  //#define PPC_DEBUG_TB
> @@ -49,6 +51,8 @@
>  # define LOG_TB(...) do { } while (0)
>  #endif
>
> +#define NSEC_PER_SEC 1000000000LL
> +
>  static void cpu_ppc_tb_stop (CPUPPCState *env);
>  static void cpu_ppc_tb_start (CPUPPCState *env);
>
> @@ -829,6 +833,81 @@ static void cpu_ppc_set_tb_clk (void *opaque, uint32_t freq)
>      cpu_ppc_store_purr(cpu, 0x0000000000000000ULL);
>  }
>
> +static void timebase_pre_save(void *opaque)
> +{
> +    PPCTimebase *tb = opaque;
> +    uint64_t ticks = cpu_get_real_ticks();
> +    PowerPCCPU *first_ppc_cpu = POWERPC_CPU(first_cpu);
> +
> +    if (!first_ppc_cpu->env.tb_env) {
> +        error_report("No timebase object");
> +        return;
> +    }
> +
> +    tb->time_of_the_day_ns = get_clock_realtime();
> +    /*
> +     * tb_offset is only expected to be changed by migration so
> +     * there is no need to update it from KVM here
> +     */
> +    tb->guest_timebase = ticks + first_ppc_cpu->env.tb_env->tb_offset;
> +}
> +
> +static int timebase_post_load(void *opaque, int version_id)
> +{
> +    PPCTimebase *tb_remote = opaque;
> +    CPUState *cpu;
> +    PowerPCCPU *first_ppc_cpu = POWERPC_CPU(first_cpu);
> +    int64_t tb_off_adj, tb_off, ns_diff;
> +    int64_t migration_duration_ns, migration_duration_tb, guest_tb, host_ns;
> +    unsigned long freq;
> +
> +    if (!first_ppc_cpu->env.tb_env) {
> +        error_report("No timebase object");
> +        return -1;
> +    }
> +
> +    freq = first_ppc_cpu->env.tb_env->tb_freq;
> +    /*
> +     * Calculate timebase on the destination side of migration.
> +     * The destination timebase must be not less than the source timebase.
> +     * We try to adjust timebase by downtime if host clocks are not
> +     * too much out of sync (1 second for now).
> +     */
> +    host_ns = get_clock_realtime();
> +    ns_diff = MAX(0, host_ns - tb_remote->time_of_the_day_ns);
> +    migration_duration_ns = MIN(NSEC_PER_SEC, ns_diff);
> +    migration_duration_tb = muldiv64(migration_duration_ns, freq, NSEC_PER_SEC);
> +    guest_tb = tb_remote->guest_timebase + migration_duration_tb;
> +
> +    tb_off_adj = guest_tb - cpu_get_real_ticks();
> +
> +    tb_off = first_ppc_cpu->env.tb_env->tb_offset;
> +    trace_ppc_tb_adjust(tb_off, tb_off_adj, tb_off_adj - tb_off,
> +                        (tb_off_adj - tb_off) / freq);
> +
> +    /* Set new offset to all CPUs */
> +    CPU_FOREACH(cpu) {
> +        PowerPCCPU *pcpu = POWERPC_CPU(cpu);
> +        pcpu->env.tb_env->tb_offset = tb_off_adj;
> +    }
> +
> +    return 0;
> +}
> +
> +const VMStateDescription vmstate_ppc_timebase = {
> +    .name = "timebase",
> +    .version_id = 1,
> +    .minimum_version_id = 1,
> +    .minimum_version_id_old = 1,
> +    .pre_save = timebase_pre_save,
> +    .post_load = timebase_post_load,
> +    .fields = (VMStateField []) {
> +        VMSTATE_UINT64(guest_timebase, PPCTimebase),
> +        VMSTATE_INT64(time_of_the_day_ns, PPCTimebase),
> +        VMSTATE_END_OF_LIST()
> +    },
> +};
> +
>  /* Set up (once) timebase frequency (in Hz) */
>  clk_setup_cb cpu_ppc_tb_init (CPUPPCState *env, uint32_t freq)
>  {
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index 451c473..297fc6f 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -818,7 +818,7 @@ static int spapr_vga_init(PCIBus *pci_bus)
>
>  static const VMStateDescription vmstate_spapr = {
>      .name = "spapr",
> -    .version_id = 1,
> +    .version_id = 2,
>      .minimum_version_id = 1,
>      .minimum_version_id_old = 1,
>      .fields = (VMStateField []) {
> @@ -826,7 +826,7 @@ static const VMStateDescription vmstate_spapr = {
>
>          /* RTC offset */
>          VMSTATE_UINT64(rtc_offset, sPAPREnvironment),
> -
> +        VMSTATE_PPC_TIMEBASE_V(tb, sPAPREnvironment, 2),
>          VMSTATE_END_OF_LIST()
>      },
>  };
> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
> index 5fdac1e..9f8bb89 100644
> --- a/include/hw/ppc/spapr.h
> +++ b/include/hw/ppc/spapr.h
> @@ -29,6 +29,7 @@ typedef struct sPAPREnvironment {
>      target_ulong entry_point;
>      uint32_t next_irq;
>      uint64_t rtc_offset;
> +    struct PPCTimebase tb;
>      bool has_graphics;
>
>      uint32_t epow_irq;
> diff --git a/target-ppc/cpu-qom.h b/target-ppc/cpu-qom.h
> index 47dc8e6..d926d93 100644
> --- a/target-ppc/cpu-qom.h
> +++ b/target-ppc/cpu-qom.h
> @@ -120,6 +120,22 @@ int ppc64_cpu_write_elf64_note(WriteCoreDumpFunction f, CPUState *cs,
>                                 int cpuid, void *opaque);
>  #ifndef CONFIG_USER_ONLY
>  extern const struct VMStateDescription vmstate_ppc_cpu;
> +
> +typedef struct PPCTimebase {
> +    uint64_t guest_timebase;
> +    int64_t time_of_the_day_ns;
> +} PPCTimebase;
> +
> +extern const struct VMStateDescription vmstate_ppc_timebase;
> +
> +#define VMSTATE_PPC_TIMEBASE_V(_field, _state, _version) { \
> +    .name = (stringify(_field)), \
> +    .version_id = (_version), \
> +    .size = sizeof(PPCTimebase), \
> +    .vmsd = &vmstate_ppc_timebase, \
> +    .flags = VMS_STRUCT, \
> +    .offset = vmstate_offset_value(_state, _field, PPCTimebase), \
> +}
>  #endif
>
>  #endif
> diff --git a/target-ppc/kvm.c b/target-ppc/kvm.c
> index 73dbb02..a8a1498 100644
> --- a/target-ppc/kvm.c
> +++ b/target-ppc/kvm.c
> @@ -35,6 +35,7 @@
>  #include "hw/sysbus.h"
>  #include "hw/ppc/spapr.h"
>  #include "hw/ppc/spapr_vio.h"
> +#include "hw/ppc/ppc.h"
>  #include "sysemu/watchdog.h"
>  #include "trace.h"
>
> @@ -890,6 +891,8 @@ int kvm_arch_put_registers(CPUState *cs, int level)
>                  DPRINTF("Warning: Unable to set VPA information to KVM\n");
>              }
>          }
> +
> +        kvm_set_one_reg(cs, KVM_REG_PPC_TB_OFFSET, &env->tb_env->tb_offset);
>  #endif /* TARGET_PPC64 */
>      }
>
> @@ -1133,6 +1136,8 @@ int kvm_arch_get_registers(CPUState *cs)
>                  DPRINTF("Warning: Unable to get VPA information from KVM\n");
>              }
>          }
> +
> +        kvm_get_one_reg(cs, KVM_REG_PPC_TB_OFFSET, &env->tb_env->tb_offset);
>  #endif
>      }
>
> diff --git a/trace-events b/trace-events
> index a5218ba..6627569 100644
> --- a/trace-events
> +++ b/trace-events
> @@ -1182,6 +1182,9 @@ spapr_iommu_get(uint64_t liobn, uint64_t ioba, uint64_t ret, uint64_t tce) "liob
>  spapr_iommu_xlate(uint64_t liobn, uint64_t ioba, uint64_t tce, unsigned perm, unsigned pgsize) "liobn=%"PRIx64" 0x%"PRIx64" -> 0x%"PRIx64" perm=%u mask=%x"
>  spapr_iommu_new_table(uint64_t liobn, void *tcet, void *table, int fd) "liobn=%"PRIx64" tcet=%p table=%p fd=%d"
>
> +# hw/ppc/ppc.c
> +ppc_tb_adjust(uint64_t offs1, uint64_t offs2, int64_t diff, int64_t seconds) "adjusted from 0x%"PRIx64" to 0x%"PRIx64", diff %"PRId64" (%"PRId64"s)"
> +
>  # util/hbitmap.c
>  hbitmap_iter_skip_words(const void *hb, void *hbi, uint64_t pos, unsigned long cur) "hb %p hbi %p pos %"PRId64" cur 0x%lx"
>  hbitmap_reset(void *hb, uint64_t start, uint64_t count, uint64_t sbit, uint64_t ebit) "hb %p items %"PRIu64",%"PRIu64" bits %"PRIu64"..%"PRIu64
>

--
Alexey