* [PATCH 0/2] Time-related fixes for migration
@ 2014-03-30 3:05 Boris Ostrovsky
2014-03-30 3:05 ` [PATCH 1/2] libxl: Set guest parameters from config file during a restore Boris Ostrovsky
` (2 more replies)
0 siblings, 3 replies; 14+ messages in thread
From: Boris Ostrovsky @ 2014-03-30 3:05 UTC (permalink / raw)
To: xen-devel
Cc: kevin.tian, suravee.suthikulpanit, stefano.stabellini, eddie.dong,
ian.jackson, jbeulich, jun.nakajima, boris.ostrovsky,
ian.campbell
Two patches to address time-related problem that we discovered during
migration testing.
* The first patch loads HVM parameters from configuration file during restore.
To fix the actual problem that we saw only timer_mode needed to be restored but
it seems to me that other parameters are needed as well since at least some of
them are used at runtime.
The bug can be demonstrated with a Solaris guest but I haven't been able to
trigger it with Linux. Possibly because Solaris' gethrtime() routine (which is
what the test was using) is a trap to kernel's hrtimer which reports global
time and performs some adjustments to per-CPU clock.
* The second patch keeps TSCs synchronized across VPCUs after save/restore.
Currently TSC values diverge after migration because during both save and restore
we calculate them separately for each VCPU and base each calculation on
newly-read host's TSC.
The problem can be easily demonstrated with this program for a 2-VCPU guest
(I am assuming here invariant TSC so, for example, tsc_mode="always_emulate" (*)):
int
main(int argc, char* argv[])
{
unsigned long long h = 0LL;
int proc = 0;
cpu_set_t set;
for(;;) {
unsigned long long n = __native_read_tsc();
if(h && n < h)
printf("prev 0x%llx cur 0x%llx\n", h, n);
CPU_ZERO(&set);
proc = (proc + 1) & 1;
CPU_SET(proc, &set);
if (sched_setaffinity(0, sizeof(cpu_set_t), &set)) {
perror("sched_setaffinity");
exit(1);
}
h = n;
}
}
(*) Which brings up another observation: when we are in default tsc_mode we
start off with vtsc=0 and thus clear TSC_Invariant bit in guest's CPUID.
After migration vtsc is 1 and TSC_Invariant bit is set. So the guest may observe
different values of CPUID. Which technically reflects the fact that TSC became
"safe" but I think potentially may be problematic to some guests.
Boris Ostrovsky (2):
libxl: Set guest parameters from config file during a restore
x86/HVM: Use fixed TSC value when saving or restoring domain
tools/libxl/libxl_dom.c | 51 +++++++++++++++++++++++++----------------
xen/arch/x86/hvm/hvm.c | 18 ++++++++++-----
xen/arch/x86/hvm/save.c | 36 +++++++++++++++++++++--------
xen/arch/x86/hvm/svm/svm.c | 4 ++--
xen/arch/x86/hvm/vmx/vmx.c | 4 ++--
xen/arch/x86/hvm/vpt.c | 16 ++++++++-----
xen/arch/x86/time.c | 7 ++++--
xen/common/hvm/save.c | 5 ++++
xen/include/asm-x86/domain.h | 2 ++
xen/include/asm-x86/hvm/hvm.h | 9 +++++---
xen/include/xen/hvm/save.h | 2 ++
xen/include/xen/time.h | 3 ++-
12 files changed, 105 insertions(+), 52 deletions(-)
--
1.7.10.4
^ permalink raw reply [flat|nested] 14+ messages in thread* [PATCH 1/2] libxl: Set guest parameters from config file during a restore 2014-03-30 3:05 [PATCH 0/2] Time-related fixes for migration Boris Ostrovsky @ 2014-03-30 3:05 ` Boris Ostrovsky 2014-04-01 10:37 ` Ian Campbell 2014-03-30 3:05 ` [PATCH 2/2] x86/HVM: Use fixed TSC value when saving or restoring domain Boris Ostrovsky 2014-03-31 14:41 ` [PATCH 0/2] Time-related fixes for migration Tian, Kevin 2 siblings, 1 reply; 14+ messages in thread From: Boris Ostrovsky @ 2014-03-30 3:05 UTC (permalink / raw) To: xen-devel Cc: kevin.tian, suravee.suthikulpanit, stefano.stabellini, eddie.dong, ian.jackson, jbeulich, jun.nakajima, boris.ostrovsky, ian.campbell Guest's configuration parameters (e.g. timer_mode) are used by the hypervisor during runtime. We should therefore set them during restore. Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> --- tools/libxl/libxl_dom.c | 51 ++++++++++++++++++++++++++++------------------- 1 file changed, 31 insertions(+), 20 deletions(-) diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c index 55f74b2..368de39 100644 --- a/tools/libxl/libxl_dom.c +++ b/tools/libxl/libxl_dom.c @@ -201,6 +201,32 @@ static int numa_place_domain(libxl__gc *gc, uint32_t domid, return rc; } +static unsigned long timer_mode(const libxl_domain_build_info *info) +{ + const libxl_timer_mode mode = info->u.hvm.timer_mode; + assert(mode >= LIBXL_TIMER_MODE_DELAY_FOR_MISSED_TICKS && + mode <= LIBXL_TIMER_MODE_ONE_MISSED_TICK_PENDING); + return ((unsigned long)mode); +} + +static void hvm_set_conf_params(xc_interface *handle, uint32_t domid, + libxl_domain_build_info *const info) +{ + xc_set_hvm_param(handle, domid, HVM_PARAM_PAE_ENABLED, + libxl_defbool_val(info->u.hvm.pae)); +#if defined(__i386__) || defined(__x86_64__) + xc_set_hvm_param(handle, domid, HVM_PARAM_VIRIDIAN, + libxl_defbool_val(info->u.hvm.viridian)); + xc_set_hvm_param(handle, domid, HVM_PARAM_HPET_ENABLED, + libxl_defbool_val(info->u.hvm.hpet)); +#endif + xc_set_hvm_param(handle, domid, HVM_PARAM_TIMER_MODE, timer_mode(info)); + xc_set_hvm_param(handle, domid, HVM_PARAM_VPT_ALIGN, + libxl_defbool_val(info->u.hvm.vpt_align)); + xc_set_hvm_param(handle, domid, HVM_PARAM_NESTEDHVM, + libxl_defbool_val(info->u.hvm.nested_hvm)); +} + int libxl__build_pre(libxl__gc *gc, uint32_t domid, libxl_domain_config *d_config, libxl__domain_build_state *state) { @@ -255,6 +281,9 @@ int libxl__build_pre(libxl__gc *gc, uint32_t domid, state->console_port = xc_evtchn_alloc_unbound(ctx->xch, domid, state->console_domid); state->vm_generationid_addr = 0; + if (info->type == LIBXL_DOMAIN_TYPE_HVM) + hvm_set_conf_params(ctx->xch, domid, info); + rc = libxl__arch_domain_create(gc, d_config, domid); return rc; @@ -449,13 +478,6 @@ out: return ret == 0 ? 0 : ERROR_FAIL; } -static unsigned long timer_mode(const libxl_domain_build_info *info) -{ - const libxl_timer_mode mode = info->u.hvm.timer_mode; - assert(mode >= LIBXL_TIMER_MODE_DELAY_FOR_MISSED_TICKS && - mode <= LIBXL_TIMER_MODE_ONE_MISSED_TICK_PENDING); - return ((unsigned long)mode); -} static int hvm_build_set_params(xc_interface *handle, uint32_t domid, libxl_domain_build_info *info, int store_evtchn, unsigned long *store_mfn, @@ -484,22 +506,11 @@ static int hvm_build_set_params(xc_interface *handle, uint32_t domid, xc_get_hvm_param(handle, domid, HVM_PARAM_STORE_PFN, store_mfn); xc_get_hvm_param(handle, domid, HVM_PARAM_CONSOLE_PFN, console_mfn); - xc_set_hvm_param(handle, domid, HVM_PARAM_PAE_ENABLED, - libxl_defbool_val(info->u.hvm.pae)); -#if defined(__i386__) || defined(__x86_64__) - xc_set_hvm_param(handle, domid, HVM_PARAM_VIRIDIAN, - libxl_defbool_val(info->u.hvm.viridian)); - xc_set_hvm_param(handle, domid, HVM_PARAM_HPET_ENABLED, - libxl_defbool_val(info->u.hvm.hpet)); -#endif - xc_set_hvm_param(handle, domid, HVM_PARAM_TIMER_MODE, timer_mode(info)); - xc_set_hvm_param(handle, domid, HVM_PARAM_VPT_ALIGN, - libxl_defbool_val(info->u.hvm.vpt_align)); - xc_set_hvm_param(handle, domid, HVM_PARAM_NESTEDHVM, - libxl_defbool_val(info->u.hvm.nested_hvm)); xc_set_hvm_param(handle, domid, HVM_PARAM_STORE_EVTCHN, store_evtchn); xc_set_hvm_param(handle, domid, HVM_PARAM_CONSOLE_EVTCHN, console_evtchn); + hvm_set_conf_params(handle, domid, info); + xc_dom_gnttab_hvm_seed(handle, domid, *console_mfn, *store_mfn, console_domid, store_domid); return 0; } -- 1.7.10.4 ^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [PATCH 1/2] libxl: Set guest parameters from config file during a restore 2014-03-30 3:05 ` [PATCH 1/2] libxl: Set guest parameters from config file during a restore Boris Ostrovsky @ 2014-04-01 10:37 ` Ian Campbell 2014-04-01 13:45 ` Boris Ostrovsky 0 siblings, 1 reply; 14+ messages in thread From: Ian Campbell @ 2014-04-01 10:37 UTC (permalink / raw) To: Boris Ostrovsky Cc: kevin.tian, suravee.suthikulpanit, stefano.stabellini, eddie.dong, ian.jackson, xen-devel, jbeulich, jun.nakajima On Sat, 2014-03-29 at 23:05 -0400, Boris Ostrovsky wrote: > @@ -484,22 +506,11 @@ static int hvm_build_set_params(xc_interface *handle, uint32_t domid, > > xc_get_hvm_param(handle, domid, HVM_PARAM_STORE_PFN, store_mfn); > xc_get_hvm_param(handle, domid, HVM_PARAM_CONSOLE_PFN, console_mfn); > - xc_set_hvm_param(handle, domid, HVM_PARAM_PAE_ENABLED, > - libxl_defbool_val(info->u.hvm.pae)); > -#if defined(__i386__) || defined(__x86_64__) > - xc_set_hvm_param(handle, domid, HVM_PARAM_VIRIDIAN, > - libxl_defbool_val(info->u.hvm.viridian)); > - xc_set_hvm_param(handle, domid, HVM_PARAM_HPET_ENABLED, > - libxl_defbool_val(info->u.hvm.hpet)); > -#endif > - xc_set_hvm_param(handle, domid, HVM_PARAM_TIMER_MODE, timer_mode(info)); > - xc_set_hvm_param(handle, domid, HVM_PARAM_VPT_ALIGN, > - libxl_defbool_val(info->u.hvm.vpt_align)); > - xc_set_hvm_param(handle, domid, HVM_PARAM_NESTEDHVM, > - libxl_defbool_val(info->u.hvm.nested_hvm)); > xc_set_hvm_param(handle, domid, HVM_PARAM_STORE_EVTCHN, store_evtchn); > xc_set_hvm_param(handle, domid, HVM_PARAM_CONSOLE_EVTCHN, console_evtchn); > > + hvm_set_conf_params(handle, domid, info); Can you confirm that this isn't now called twice on domain create, once from libxl__build_pre and then again here. Ian. ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 1/2] libxl: Set guest parameters from config file during a restore 2014-04-01 10:37 ` Ian Campbell @ 2014-04-01 13:45 ` Boris Ostrovsky 0 siblings, 0 replies; 14+ messages in thread From: Boris Ostrovsky @ 2014-04-01 13:45 UTC (permalink / raw) To: Ian Campbell Cc: kevin.tian, suravee.suthikulpanit, stefano.stabellini, eddie.dong, ian.jackson, xen-devel, jbeulich, jun.nakajima On 04/01/2014 06:37 AM, Ian Campbell wrote: > On Sat, 2014-03-29 at 23:05 -0400, Boris Ostrovsky wrote: >> @@ -484,22 +506,11 @@ static int hvm_build_set_params(xc_interface *handle, uint32_t domid, >> >> xc_get_hvm_param(handle, domid, HVM_PARAM_STORE_PFN, store_mfn); >> xc_get_hvm_param(handle, domid, HVM_PARAM_CONSOLE_PFN, console_mfn); >> - xc_set_hvm_param(handle, domid, HVM_PARAM_PAE_ENABLED, >> - libxl_defbool_val(info->u.hvm.pae)); >> -#if defined(__i386__) || defined(__x86_64__) >> - xc_set_hvm_param(handle, domid, HVM_PARAM_VIRIDIAN, >> - libxl_defbool_val(info->u.hvm.viridian)); >> - xc_set_hvm_param(handle, domid, HVM_PARAM_HPET_ENABLED, >> - libxl_defbool_val(info->u.hvm.hpet)); >> -#endif >> - xc_set_hvm_param(handle, domid, HVM_PARAM_TIMER_MODE, timer_mode(info)); >> - xc_set_hvm_param(handle, domid, HVM_PARAM_VPT_ALIGN, >> - libxl_defbool_val(info->u.hvm.vpt_align)); >> - xc_set_hvm_param(handle, domid, HVM_PARAM_NESTEDHVM, >> - libxl_defbool_val(info->u.hvm.nested_hvm)); >> xc_set_hvm_param(handle, domid, HVM_PARAM_STORE_EVTCHN, store_evtchn); >> xc_set_hvm_param(handle, domid, HVM_PARAM_CONSOLE_EVTCHN, console_evtchn); >> >> + hvm_set_conf_params(handle, domid, info); > Can you confirm that this isn't now called twice on domain create, once > from libxl__build_pre and then again here. No, I can't. I thought I checked that libxl__build_pre() is not called on create and of course it clearly is. So this was me being rather sloppy. I'll remove hvm_set_conf_params() call from hvm_build_set_params() and leave it only in libxl__build_pre(). -boris ^ permalink raw reply [flat|nested] 14+ messages in thread
* [PATCH 2/2] x86/HVM: Use fixed TSC value when saving or restoring domain 2014-03-30 3:05 [PATCH 0/2] Time-related fixes for migration Boris Ostrovsky 2014-03-30 3:05 ` [PATCH 1/2] libxl: Set guest parameters from config file during a restore Boris Ostrovsky @ 2014-03-30 3:05 ` Boris Ostrovsky 2014-03-31 9:51 ` Jan Beulich 2014-03-31 15:29 ` Tian, Kevin 2014-03-31 14:41 ` [PATCH 0/2] Time-related fixes for migration Tian, Kevin 2 siblings, 2 replies; 14+ messages in thread From: Boris Ostrovsky @ 2014-03-30 3:05 UTC (permalink / raw) To: xen-devel Cc: kevin.tian, suravee.suthikulpanit, stefano.stabellini, eddie.dong, ian.jackson, jbeulich, jun.nakajima, boris.ostrovsky, ian.campbell When a domain is saved each VCPU's TSC value needs to be preserved. To get it we use hvm_get_guest_tsc(). This routine (either itself or via get_s_time() which it may call) calculates VCPU's TSC based on current host's TSC value (by doing a rdtscll()). Since this is performed for each VCPU separately we end up with un-synchronized TSCs. Similarly, during a restore each VCPU is assigned its TSC based on host's current tick, causing virtual TSCs to diverge further. With this, we can easily get into situation where a guest may see time going backwards. Instead of reading new TSC value for each VCPU when saving/restoring it we should use the same value across all VCPUs. Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> --- xen/arch/x86/hvm/hvm.c | 18 ++++++++++++------ xen/arch/x86/hvm/save.c | 36 ++++++++++++++++++++++++++---------- xen/arch/x86/hvm/svm/svm.c | 4 ++-- xen/arch/x86/hvm/vmx/vmx.c | 4 ++-- xen/arch/x86/hvm/vpt.c | 16 ++++++++++------ xen/arch/x86/time.c | 7 +++++-- xen/common/hvm/save.c | 5 +++++ xen/include/asm-x86/domain.h | 2 ++ xen/include/asm-x86/hvm/hvm.h | 9 ++++++--- xen/include/xen/hvm/save.h | 2 ++ xen/include/xen/time.h | 3 ++- 11 files changed, 74 insertions(+), 32 deletions(-) diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c index ae24211..98de16a 100644 --- a/xen/arch/x86/hvm/hvm.c +++ b/xen/arch/x86/hvm/hvm.c @@ -248,19 +248,22 @@ int hvm_set_guest_pat(struct vcpu *v, u64 guest_pat) return 1; } -void hvm_set_guest_tsc(struct vcpu *v, u64 guest_tsc) +void hvm_set_guest_tsc_fixed(struct vcpu *v, u64 guest_tsc, u64 at_tsc) { uint64_t tsc; uint64_t delta_tsc; if ( v->domain->arch.vtsc ) { - tsc = hvm_get_guest_time(v); + tsc = hvm_get_guest_time_fixed(v, at_tsc); tsc = gtime_to_gtsc(v->domain, tsc); } else { - rdtscll(tsc); + if ( at_tsc ) + tsc = at_tsc; + else + rdtscll(tsc); } delta_tsc = guest_tsc - tsc; @@ -279,19 +282,22 @@ void hvm_set_guest_tsc_adjust(struct vcpu *v, u64 tsc_adjust) v->arch.hvm_vcpu.msr_tsc_adjust = tsc_adjust; } -u64 hvm_get_guest_tsc(struct vcpu *v) +u64 hvm_get_guest_tsc_fixed(struct vcpu *v, uint64_t at_tsc) { uint64_t tsc; if ( v->domain->arch.vtsc ) { - tsc = hvm_get_guest_time(v); + tsc = hvm_get_guest_time_fixed(v, at_tsc); tsc = gtime_to_gtsc(v->domain, tsc); v->domain->arch.vtsc_kerncount++; } else { - rdtscll(tsc); + if ( at_tsc ) + tsc = at_tsc; + else + rdtscll(tsc); } return tsc + v->arch.hvm_vcpu.cache_tsc_offset; diff --git a/xen/arch/x86/hvm/save.c b/xen/arch/x86/hvm/save.c index 066fdb2..6b0767e 100644 --- a/xen/arch/x86/hvm/save.c +++ b/xen/arch/x86/hvm/save.c @@ -24,7 +24,7 @@ #include <asm/hvm/support.h> #include <public/hvm/save.h> -void arch_hvm_save(struct domain *d, struct hvm_save_header *hdr) +void arch_hvm_save(struct domain *dom, struct hvm_save_header *hdr) { uint32_t eax, ebx, ecx, edx; @@ -33,24 +33,32 @@ void arch_hvm_save(struct domain *d, struct hvm_save_header *hdr) hdr->cpuid = eax; /* Save guest's preferred TSC. */ - hdr->gtsc_khz = d->arch.tsc_khz; + hdr->gtsc_khz = dom->arch.tsc_khz; + + /* Time when saving started */ + rdtscll(dom->arch.chkpt_tsc); +} + +void arch_hvm_save_done(struct domain *dom) +{ + dom->arch.chkpt_tsc = 0; } -int arch_hvm_load(struct domain *d, struct hvm_save_header *hdr) +int arch_hvm_load(struct domain *dom, struct hvm_save_header *hdr) { uint32_t eax, ebx, ecx, edx; if ( hdr->magic != HVM_FILE_MAGIC ) { printk(XENLOG_G_ERR "HVM%d restore: bad magic number %#"PRIx32"\n", - d->domain_id, hdr->magic); + dom->domain_id, hdr->magic); return -1; } if ( hdr->version != HVM_FILE_VERSION ) { printk(XENLOG_G_ERR "HVM%d restore: unsupported version %u\n", - d->domain_id, hdr->version); + dom->domain_id, hdr->version); return -1; } @@ -59,20 +67,28 @@ int arch_hvm_load(struct domain *d, struct hvm_save_header *hdr) if ( (hdr->cpuid & ~0x0fUL) != (eax & ~0x0fUL) ) printk(XENLOG_G_INFO "HVM%d restore: VM saved on one CPU " "(%#"PRIx32") and restored on another (%#"PRIx32").\n", - d->domain_id, hdr->cpuid, eax); + dom->domain_id, hdr->cpuid, eax); /* Restore guest's preferred TSC frequency. */ if ( hdr->gtsc_khz ) - d->arch.tsc_khz = hdr->gtsc_khz; - if ( d->arch.vtsc ) - hvm_set_rdtsc_exiting(d, 1); + dom->arch.tsc_khz = hdr->gtsc_khz; + if ( dom->arch.vtsc ) + hvm_set_rdtsc_exiting(dom, 1); + + /* Time when restore started */ + rdtscll(dom->arch.chkpt_tsc); /* VGA state is not saved/restored, so we nobble the cache. */ - d->arch.hvm_domain.stdvga.cache = 0; + dom->arch.hvm_domain.stdvga.cache = 0; return 0; } +void arch_hvm_load_done(struct domain *dom) +{ + dom->arch.chkpt_tsc = 0; +} + /* * Local variables: * mode: C diff --git a/xen/arch/x86/hvm/svm/svm.c b/xen/arch/x86/hvm/svm/svm.c index 4fd5376..7aa55c3 100644 --- a/xen/arch/x86/hvm/svm/svm.c +++ b/xen/arch/x86/hvm/svm/svm.c @@ -318,7 +318,7 @@ static void svm_save_cpu_state(struct vcpu *v, struct hvm_hw_cpu *data) data->msr_efer = v->arch.hvm_vcpu.guest_efer; data->msr_flags = -1ULL; - data->tsc = hvm_get_guest_tsc(v); + data->tsc = hvm_get_guest_tsc_fixed(v, v->domain->arch.chkpt_tsc); } @@ -334,7 +334,7 @@ static void svm_load_cpu_state(struct vcpu *v, struct hvm_hw_cpu *data) v->arch.hvm_vcpu.guest_efer = data->msr_efer; svm_update_guest_efer(v); - hvm_set_guest_tsc(v, data->tsc); + hvm_set_guest_tsc_fixed(v, data->tsc, v->domain->arch.chkpt_tsc); } static void svm_save_vmcb_ctxt(struct vcpu *v, struct hvm_hw_cpu *ctxt) diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c index 8395e86..f10d34c 100644 --- a/xen/arch/x86/hvm/vmx/vmx.c +++ b/xen/arch/x86/hvm/vmx/vmx.c @@ -540,7 +540,7 @@ static void vmx_save_cpu_state(struct vcpu *v, struct hvm_hw_cpu *data) data->msr_star = guest_state->msrs[VMX_INDEX_MSR_STAR]; data->msr_syscall_mask = guest_state->msrs[VMX_INDEX_MSR_SYSCALL_MASK]; - data->tsc = hvm_get_guest_tsc(v); + data->tsc = hvm_get_guest_tsc_fixed(v, v->domain->arch.chkpt_tsc); } static void vmx_load_cpu_state(struct vcpu *v, struct hvm_hw_cpu *data) @@ -556,7 +556,7 @@ static void vmx_load_cpu_state(struct vcpu *v, struct hvm_hw_cpu *data) v->arch.hvm_vmx.cstar = data->msr_cstar; v->arch.hvm_vmx.shadow_gs = data->shadow_gs; - hvm_set_guest_tsc(v, data->tsc); + hvm_set_guest_tsc_fixed(v, data->tsc, v->domain->arch.chkpt_tsc); } diff --git a/xen/arch/x86/hvm/vpt.c b/xen/arch/x86/hvm/vpt.c index f7af688..38541cf 100644 --- a/xen/arch/x86/hvm/vpt.c +++ b/xen/arch/x86/hvm/vpt.c @@ -36,7 +36,7 @@ void hvm_init_guest_time(struct domain *d) pl->last_guest_time = 0; } -u64 hvm_get_guest_time(struct vcpu *v) +u64 hvm_get_guest_time_fixed(struct vcpu *v, u64 at_tsc) { struct pl_time *pl = &v->domain->arch.hvm_domain.pl_time; u64 now; @@ -45,11 +45,15 @@ u64 hvm_get_guest_time(struct vcpu *v) ASSERT(is_hvm_vcpu(v)); spin_lock(&pl->pl_time_lock); - now = get_s_time() + pl->stime_offset; - if ( (int64_t)(now - pl->last_guest_time) > 0 ) - pl->last_guest_time = now; - else - now = ++pl->last_guest_time; + now = get_s_time_fixed(at_tsc) + pl->stime_offset; + + if ( !at_tsc ) + { + if ( (int64_t)(now - pl->last_guest_time) > 0 ) + pl->last_guest_time = now; + else + now = ++pl->last_guest_time; + } spin_unlock(&pl->pl_time_lock); return now + v->arch.hvm_vcpu.stime_offset; diff --git a/xen/arch/x86/time.c b/xen/arch/x86/time.c index 000191b..d424c70 100644 --- a/xen/arch/x86/time.c +++ b/xen/arch/x86/time.c @@ -712,13 +712,16 @@ static unsigned long get_cmos_time(void) * System Time ***************************************************************************/ -s_time_t get_s_time(void) +s_time_t get_s_time_fixed(u64 at_tsc) { struct cpu_time *t = &this_cpu(cpu_time); u64 tsc, delta; s_time_t now; - rdtscll(tsc); + if ( at_tsc ) + tsc = at_tsc; + else + rdtscll(tsc); delta = tsc - t->local_tsc_stamp; now = t->stime_local_stamp + scale_delta(delta, &t->tsc_scale); diff --git a/xen/common/hvm/save.c b/xen/common/hvm/save.c index 6c16399..7db68af 100644 --- a/xen/common/hvm/save.c +++ b/xen/common/hvm/save.c @@ -186,6 +186,8 @@ int hvm_save(struct domain *d, hvm_domain_context_t *h) } } + arch_hvm_save_done(d); + /* Save an end-of-file marker */ if ( hvm_save_entry(END, 0, h, &end) != 0 ) { @@ -236,7 +238,10 @@ int hvm_load(struct domain *d, hvm_domain_context_t *h) /* Read the typecode of the next entry and check for the end-marker */ desc = (struct hvm_save_descriptor *)(&h->data[h->cur]); if ( desc->typecode == 0 ) + { + arch_hvm_load_done(d); return 0; + } /* Find the handler for this entry */ if ( (desc->typecode > HVM_SAVE_CODE_MAX) || diff --git a/xen/include/asm-x86/domain.h b/xen/include/asm-x86/domain.h index 49f7c0c..201f856 100644 --- a/xen/include/asm-x86/domain.h +++ b/xen/include/asm-x86/domain.h @@ -308,6 +308,8 @@ struct arch_domain (possibly other cases in the future */ uint64_t vtsc_kerncount; /* for hvm, counts all vtsc */ uint64_t vtsc_usercount; /* not used for hvm */ + uint64_t chkpt_tsc; /* TSC value that VCPUs use to calculate their + tsc_offset value. Used during save/restore */ /* Pseudophysical e820 map (XENMEM_memory_map). */ spinlock_t e820_lock; diff --git a/xen/include/asm-x86/hvm/hvm.h b/xen/include/asm-x86/hvm/hvm.h index dcc3483..d80e763 100644 --- a/xen/include/asm-x86/hvm/hvm.h +++ b/xen/include/asm-x86/hvm/hvm.h @@ -232,12 +232,15 @@ bool_t hvm_send_assist_req(struct vcpu *v); void hvm_get_guest_pat(struct vcpu *v, u64 *guest_pat); int hvm_set_guest_pat(struct vcpu *v, u64 guest_pat); -void hvm_set_guest_tsc(struct vcpu *v, u64 guest_tsc); -u64 hvm_get_guest_tsc(struct vcpu *v); +void hvm_set_guest_tsc_fixed(struct vcpu *v, u64 guest_tsc, u64 at_tsc); +#define hvm_set_guest_tsc(v, t) hvm_set_guest_tsc_fixed((v), (t), 0) +u64 hvm_get_guest_tsc_fixed(struct vcpu *v, u64 at_tsc); +#define hvm_get_guest_tsc(v) hvm_get_guest_tsc_fixed((v), 0) void hvm_init_guest_time(struct domain *d); void hvm_set_guest_time(struct vcpu *v, u64 guest_time); -u64 hvm_get_guest_time(struct vcpu *v); +u64 hvm_get_guest_time_fixed(struct vcpu *v, u64 at_tsc); +#define hvm_get_guest_time(v) hvm_get_guest_time_fixed((v), 0) int vmsi_deliver( struct domain *d, int vector, diff --git a/xen/include/xen/hvm/save.h b/xen/include/xen/hvm/save.h index ae6f0bb..70522a9 100644 --- a/xen/include/xen/hvm/save.h +++ b/xen/include/xen/hvm/save.h @@ -133,6 +133,8 @@ int hvm_load(struct domain *d, hvm_domain_context_t *h); /* Arch-specific definitions. */ struct hvm_save_header; void arch_hvm_save(struct domain *d, struct hvm_save_header *hdr); +void arch_hvm_save_done(struct domain *d); int arch_hvm_load(struct domain *d, struct hvm_save_header *hdr); +void arch_hvm_load_done(struct domain *d); #endif /* __XEN_HVM_SAVE_H__ */ diff --git a/xen/include/xen/time.h b/xen/include/xen/time.h index 95b4b91..032eb23 100644 --- a/xen/include/xen/time.h +++ b/xen/include/xen/time.h @@ -32,7 +32,8 @@ struct vcpu; typedef s64 s_time_t; #define PRI_stime PRId64 -s_time_t get_s_time(void); +s_time_t get_s_time_fixed(u64 at_tick); +#define get_s_time() get_s_time_fixed(0) unsigned long get_localtime(struct domain *d); uint64_t get_localtime_us(struct domain *d); -- 1.7.10.4 ^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [PATCH 2/2] x86/HVM: Use fixed TSC value when saving or restoring domain 2014-03-30 3:05 ` [PATCH 2/2] x86/HVM: Use fixed TSC value when saving or restoring domain Boris Ostrovsky @ 2014-03-31 9:51 ` Jan Beulich 2014-03-31 14:01 ` Boris Ostrovsky 2014-03-31 15:29 ` Tian, Kevin 1 sibling, 1 reply; 14+ messages in thread From: Jan Beulich @ 2014-03-31 9:51 UTC (permalink / raw) To: Boris Ostrovsky Cc: kevin.tian, ian.campbell, stefano.stabellini, ian.jackson, eddie.dong, xen-devel, jun.nakajima, suravee.suthikulpanit > --- a/xen/arch/x86/hvm/hvm.c > +++ b/xen/arch/x86/hvm/hvm.c > @@ -248,19 +248,22 @@ int hvm_set_guest_pat(struct vcpu *v, u64 guest_pat) > return 1; > } > > -void hvm_set_guest_tsc(struct vcpu *v, u64 guest_tsc) > +void hvm_set_guest_tsc_fixed(struct vcpu *v, u64 guest_tsc, u64 at_tsc) > { > uint64_t tsc; > uint64_t delta_tsc; > > if ( v->domain->arch.vtsc ) > { > - tsc = hvm_get_guest_time(v); > + tsc = hvm_get_guest_time_fixed(v, at_tsc); > tsc = gtime_to_gtsc(v->domain, tsc); > } > else > { > - rdtscll(tsc); > + if ( at_tsc ) > + tsc = at_tsc; Please insert this as an "else if()" above the current "else". > @@ -279,19 +282,22 @@ void hvm_set_guest_tsc_adjust(struct vcpu *v, u64 tsc_adjust) > v->arch.hvm_vcpu.msr_tsc_adjust = tsc_adjust; > } > > -u64 hvm_get_guest_tsc(struct vcpu *v) > +u64 hvm_get_guest_tsc_fixed(struct vcpu *v, uint64_t at_tsc) > { > uint64_t tsc; > > if ( v->domain->arch.vtsc ) > { > - tsc = hvm_get_guest_time(v); > + tsc = hvm_get_guest_time_fixed(v, at_tsc); > tsc = gtime_to_gtsc(v->domain, tsc); > v->domain->arch.vtsc_kerncount++; > } > else > { > - rdtscll(tsc); > + if ( at_tsc ) > + tsc = at_tsc; Same here. > --- a/xen/arch/x86/hvm/save.c > +++ b/xen/arch/x86/hvm/save.c > @@ -24,7 +24,7 @@ > #include <asm/hvm/support.h> > #include <public/hvm/save.h> > > -void arch_hvm_save(struct domain *d, struct hvm_save_header *hdr) > +void arch_hvm_save(struct domain *dom, struct hvm_save_header *hdr) The change is unmotivated (afaict) and inconsistent with most other code - we conventionally use "d" for struct domain * variables. Please drop the change here and use "d" throughout the rest of the patch, at once resulting in less churn and hence making it easier to review. > --- a/xen/include/asm-x86/hvm/hvm.h > +++ b/xen/include/asm-x86/hvm/hvm.h > @@ -232,12 +232,15 @@ bool_t hvm_send_assist_req(struct vcpu *v); > void hvm_get_guest_pat(struct vcpu *v, u64 *guest_pat); > int hvm_set_guest_pat(struct vcpu *v, u64 guest_pat); > > -void hvm_set_guest_tsc(struct vcpu *v, u64 guest_tsc); > -u64 hvm_get_guest_tsc(struct vcpu *v); > +void hvm_set_guest_tsc_fixed(struct vcpu *v, u64 guest_tsc, u64 at_tsc); > +#define hvm_set_guest_tsc(v, t) hvm_set_guest_tsc_fixed((v), (t), 0) > +u64 hvm_get_guest_tsc_fixed(struct vcpu *v, u64 at_tsc); > +#define hvm_get_guest_tsc(v) hvm_get_guest_tsc_fixed((v), 0) > > void hvm_init_guest_time(struct domain *d); > void hvm_set_guest_time(struct vcpu *v, u64 guest_time); > -u64 hvm_get_guest_time(struct vcpu *v); > +u64 hvm_get_guest_time_fixed(struct vcpu *v, u64 at_tsc); > +#define hvm_get_guest_time(v) hvm_get_guest_time_fixed((v), 0) Pointless parentheses (several instances thereof). > index 95b4b91..032eb23 100644 > --- a/xen/include/xen/time.h > +++ b/xen/include/xen/time.h > @@ -32,7 +32,8 @@ struct vcpu; > typedef s64 s_time_t; > #define PRI_stime PRId64 > > -s_time_t get_s_time(void); > +s_time_t get_s_time_fixed(u64 at_tick); > +#define get_s_time() get_s_time_fixed(0) get_s_time(), through NOW(), has quite many users, so I'm not certain the code bloat resulting from this is desirable. I'd suggest the function to remain such; the compiler will be able to make it a mov+jmp. Jan ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 2/2] x86/HVM: Use fixed TSC value when saving or restoring domain 2014-03-31 9:51 ` Jan Beulich @ 2014-03-31 14:01 ` Boris Ostrovsky 2014-03-31 14:08 ` Tian, Kevin 2014-03-31 14:34 ` Jan Beulich 0 siblings, 2 replies; 14+ messages in thread From: Boris Ostrovsky @ 2014-03-31 14:01 UTC (permalink / raw) To: Jan Beulich Cc: kevin.tian, ian.campbell, stefano.stabellini, ian.jackson, eddie.dong, xen-devel, jun.nakajima, suravee.suthikulpanit On 03/31/2014 05:51 AM, Jan Beulich wrote: > >> --- a/xen/arch/x86/hvm/save.c >> +++ b/xen/arch/x86/hvm/save.c >> @@ -24,7 +24,7 @@ >> #include <asm/hvm/support.h> >> #include <public/hvm/save.h> >> >> -void arch_hvm_save(struct domain *d, struct hvm_save_header *hdr) >> +void arch_hvm_save(struct domain *dom, struct hvm_save_header *hdr) > The change is unmotivated (afaict) and inconsistent with most other > code - we conventionally use "d" for struct domain * variables. Please > drop the change here and use "d" throughout the rest of the patch, > at once resulting in less churn and hence making it easier to review. The reason for this change is subsequent rdtscll(dom->arch.chkpt_tsc); which will not work as rdtscll(d->arch.chkpt_tsc); The alternatives as I see are * Declare a temporary variable for use with rdtscll * Change rdtscll's definition to use something else instead of variable 'd'. Say, eax and edx (hopefully this won't clash with something else) > >> index 95b4b91..032eb23 100644 >> --- a/xen/include/xen/time.h >> +++ b/xen/include/xen/time.h >> @@ -32,7 +32,8 @@ struct vcpu; >> typedef s64 s_time_t; >> #define PRI_stime PRId64 >> >> -s_time_t get_s_time(void); >> +s_time_t get_s_time_fixed(u64 at_tick); >> +#define get_s_time() get_s_time_fixed(0) > get_s_time(), through NOW(), has quite many users, so I'm not certain > the code bloat resulting from this is desirable. I'd suggest the function > to remain such; the compiler will be able to make it a mov+jmp. Sorry, not sure I understand what you are asking for. There shouldn't be much of code size increase since get_s_time() currently (and get_s_time_fixed() after this patch is applied) are not inlines. The only increase is due to routine itself getting very slightly larger. But I suspect you meant something else. -boris ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 2/2] x86/HVM: Use fixed TSC value when saving or restoring domain 2014-03-31 14:01 ` Boris Ostrovsky @ 2014-03-31 14:08 ` Tian, Kevin 2014-03-31 14:34 ` Jan Beulich 1 sibling, 0 replies; 14+ messages in thread From: Tian, Kevin @ 2014-03-31 14:08 UTC (permalink / raw) To: Boris Ostrovsky, Jan Beulich Cc: ian.campbell@citrix.com, stefano.stabellini@eu.citrix.com, ian.jackson@eu.citrix.com, Dong, Eddie, xen-devel@lists.xen.org, Nakajima, Jun, suravee.suthikulpanit@amd.com > From: Boris Ostrovsky [mailto:boris.ostrovsky@oracle.com] > Sent: Monday, March 31, 2014 10:01 PM > > On 03/31/2014 05:51 AM, Jan Beulich wrote: > > > >> --- a/xen/arch/x86/hvm/save.c > >> +++ b/xen/arch/x86/hvm/save.c > >> @@ -24,7 +24,7 @@ > >> #include <asm/hvm/support.h> > >> #include <public/hvm/save.h> > >> > >> -void arch_hvm_save(struct domain *d, struct hvm_save_header *hdr) > >> +void arch_hvm_save(struct domain *dom, struct hvm_save_header *hdr) > > The change is unmotivated (afaict) and inconsistent with most other > > code - we conventionally use "d" for struct domain * variables. Please > > drop the change here and use "d" throughout the rest of the patch, > > at once resulting in less churn and hence making it easier to review. > > The reason for this change is subsequent > rdtscll(dom->arch.chkpt_tsc); > > which will not work as > rdtscll(d->arch.chkpt_tsc); > > The alternatives as I see are > * Declare a temporary variable for use with rdtscll > * Change rdtscll's definition to use something else instead of variable > 'd'. Say, eax and edx (hopefully this won't clash with something else) > use temp variable at least for now, which is cleaner regarding to the purpose of this patch. rdtscll() should be a separate patch, however imo it is not worthy of doing that. Thanks Kevin ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 2/2] x86/HVM: Use fixed TSC value when saving or restoring domain 2014-03-31 14:01 ` Boris Ostrovsky 2014-03-31 14:08 ` Tian, Kevin @ 2014-03-31 14:34 ` Jan Beulich 2014-03-31 15:06 ` Boris Ostrovsky 1 sibling, 1 reply; 14+ messages in thread From: Jan Beulich @ 2014-03-31 14:34 UTC (permalink / raw) To: Boris Ostrovsky Cc: kevin.tian, ian.campbell, stefano.stabellini, ian.jackson, eddie.dong, xen-devel, jun.nakajima, suravee.suthikulpanit >>> On 31.03.14 at 16:01, <boris.ostrovsky@oracle.com> wrote: > On 03/31/2014 05:51 AM, Jan Beulich wrote: >> >>> --- a/xen/arch/x86/hvm/save.c >>> +++ b/xen/arch/x86/hvm/save.c >>> @@ -24,7 +24,7 @@ >>> #include <asm/hvm/support.h> >>> #include <public/hvm/save.h> >>> >>> -void arch_hvm_save(struct domain *d, struct hvm_save_header *hdr) >>> +void arch_hvm_save(struct domain *dom, struct hvm_save_header *hdr) >> The change is unmotivated (afaict) and inconsistent with most other >> code - we conventionally use "d" for struct domain * variables. Please >> drop the change here and use "d" throughout the rest of the patch, >> at once resulting in less churn and hence making it easier to review. > > The reason for this change is subsequent > rdtscll(dom->arch.chkpt_tsc); > > which will not work as > rdtscll(d->arch.chkpt_tsc); > > The alternatives as I see are > * Declare a temporary variable for use with rdtscll > * Change rdtscll's definition to use something else instead of variable > 'd'. Say, eax and edx (hopefully this won't clash with something else) Yeah, the macro using "a" and "d" as variables is really bad, and it's that what should be changed. >>> index 95b4b91..032eb23 100644 >>> --- a/xen/include/xen/time.h >>> +++ b/xen/include/xen/time.h >>> @@ -32,7 +32,8 @@ struct vcpu; >>> typedef s64 s_time_t; >>> #define PRI_stime PRId64 >>> >>> -s_time_t get_s_time(void); >>> +s_time_t get_s_time_fixed(u64 at_tick); >>> +#define get_s_time() get_s_time_fixed(0) >> get_s_time(), through NOW(), has quite many users, so I'm not certain >> the code bloat resulting from this is desirable. I'd suggest the function >> to remain such; the compiler will be able to make it a mov+jmp. > > Sorry, not sure I understand what you are asking for. > > There shouldn't be much of code size increase since get_s_time() > currently (and get_s_time_fixed() after this patch is applied) are not > inlines. The only increase is due to routine itself getting very > slightly larger. > > But I suspect you meant something else. All call sites have to zero %edi with the change in place, and I was trying to tell you that the number of call sites of this isn't exactly small due to the function's use via NOW(). Hence I think you shouldn't penalize the callers and have an explicit out of line wrapper. Jan ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 2/2] x86/HVM: Use fixed TSC value when saving or restoring domain 2014-03-31 14:34 ` Jan Beulich @ 2014-03-31 15:06 ` Boris Ostrovsky 2014-03-31 15:09 ` Jan Beulich 0 siblings, 1 reply; 14+ messages in thread From: Boris Ostrovsky @ 2014-03-31 15:06 UTC (permalink / raw) To: Jan Beulich Cc: kevin.tian, ian.campbell, stefano.stabellini, ian.jackson, eddie.dong, xen-devel, jun.nakajima, suravee.suthikulpanit On 03/31/2014 10:34 AM, Jan Beulich wrote: > >>>> index 95b4b91..032eb23 100644 >>>> --- a/xen/include/xen/time.h >>>> +++ b/xen/include/xen/time.h >>>> @@ -32,7 +32,8 @@ struct vcpu; >>>> typedef s64 s_time_t; >>>> #define PRI_stime PRId64 >>>> >>>> -s_time_t get_s_time(void); >>>> +s_time_t get_s_time_fixed(u64 at_tick); >>>> +#define get_s_time() get_s_time_fixed(0) >>> get_s_time(), through NOW(), has quite many users, so I'm not certain >>> the code bloat resulting from this is desirable. I'd suggest the function >>> to remain such; the compiler will be able to make it a mov+jmp. >> Sorry, not sure I understand what you are asking for. >> >> There shouldn't be much of code size increase since get_s_time() >> currently (and get_s_time_fixed() after this patch is applied) are not >> inlines. The only increase is due to routine itself getting very >> slightly larger. >> >> But I suspect you meant something else. > All call sites have to zero %edi with the change in place, and I > was trying to tell you that the number of call sites of this isn't > exactly small due to the function's use via NOW(). Hence I think > you shouldn't penalize the callers and have an explicit out of line > wrapper. OK, I understand the concern now but still confused about what you want ;-( Two separate routines that only differ in how tsc is calculated (rdtscll vs. an 'in' parameter)? -boris ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 2/2] x86/HVM: Use fixed TSC value when saving or restoring domain 2014-03-31 15:06 ` Boris Ostrovsky @ 2014-03-31 15:09 ` Jan Beulich 0 siblings, 0 replies; 14+ messages in thread From: Jan Beulich @ 2014-03-31 15:09 UTC (permalink / raw) To: Boris Ostrovsky Cc: kevin.tian, ian.campbell, stefano.stabellini, ian.jackson, eddie.dong, xen-devel, jun.nakajima, suravee.suthikulpanit >>> On 31.03.14 at 17:06, <boris.ostrovsky@oracle.com> wrote: > On 03/31/2014 10:34 AM, Jan Beulich wrote: >> >>>>> index 95b4b91..032eb23 100644 >>>>> --- a/xen/include/xen/time.h >>>>> +++ b/xen/include/xen/time.h >>>>> @@ -32,7 +32,8 @@ struct vcpu; >>>>> typedef s64 s_time_t; >>>>> #define PRI_stime PRId64 >>>>> >>>>> -s_time_t get_s_time(void); >>>>> +s_time_t get_s_time_fixed(u64 at_tick); >>>>> +#define get_s_time() get_s_time_fixed(0) >>>> get_s_time(), through NOW(), has quite many users, so I'm not certain >>>> the code bloat resulting from this is desirable. I'd suggest the function >>>> to remain such; the compiler will be able to make it a mov+jmp. >>> Sorry, not sure I understand what you are asking for. >>> >>> There shouldn't be much of code size increase since get_s_time() >>> currently (and get_s_time_fixed() after this patch is applied) are not >>> inlines. The only increase is due to routine itself getting very >>> slightly larger. >>> >>> But I suspect you meant something else. >> All call sites have to zero %edi with the change in place, and I >> was trying to tell you that the number of call sites of this isn't >> exactly small due to the function's use via NOW(). Hence I think >> you shouldn't penalize the callers and have an explicit out of line >> wrapper. > > OK, I understand the concern now but still confused about what you want ;-( > > Two separate routines that only differ in how tsc is calculated (rdtscll > vs. an 'in' parameter)? No - I'm fine with get_s_time() being a wrapper for get_s_time_fixed(), just that I ask it to be a proper out-of-line function rather than an inline one or a macro. Jan ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 2/2] x86/HVM: Use fixed TSC value when saving or restoring domain 2014-03-30 3:05 ` [PATCH 2/2] x86/HVM: Use fixed TSC value when saving or restoring domain Boris Ostrovsky 2014-03-31 9:51 ` Jan Beulich @ 2014-03-31 15:29 ` Tian, Kevin 1 sibling, 0 replies; 14+ messages in thread From: Tian, Kevin @ 2014-03-31 15:29 UTC (permalink / raw) To: Boris Ostrovsky, xen-devel@lists.xen.org Cc: suravee.suthikulpanit@amd.com, stefano.stabellini@eu.citrix.com, Dong, Eddie, ian.jackson@eu.citrix.com, jbeulich@suse.com, Nakajima, Jun, ian.campbell@citrix.com > From: Boris Ostrovsky [mailto:boris.ostrovsky@oracle.com] > Sent: Sunday, March 30, 2014 11:06 AM > > When a domain is saved each VCPU's TSC value needs to be preserved. To get > it we > use hvm_get_guest_tsc(). This routine (either itself or via get_s_time() which > it may call) calculates VCPU's TSC based on current host's TSC value (by doing > a > rdtscll()). Since this is performed for each VCPU separately we end up with > un-synchronized TSCs. > > Similarly, during a restore each VCPU is assigned its TSC based on host's > current > tick, causing virtual TSCs to diverge further. > > With this, we can easily get into situation where a guest may see time going > backwards. > > Instead of reading new TSC value for each VCPU when saving/restoring it we > should > use the same value across all VCPUs. > > Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Somehow the new ***_fixed interfaces looked a bit overkilled, however I don't have a better idea now except introducing a new hypercall to save/restore all VCPU TSCs together, which however doesn't solve the existing versions. Reviewed-by: Kevin Tian <kevin.tian@intel.com> Thanks Kevin ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 0/2] Time-related fixes for migration 2014-03-30 3:05 [PATCH 0/2] Time-related fixes for migration Boris Ostrovsky 2014-03-30 3:05 ` [PATCH 1/2] libxl: Set guest parameters from config file during a restore Boris Ostrovsky 2014-03-30 3:05 ` [PATCH 2/2] x86/HVM: Use fixed TSC value when saving or restoring domain Boris Ostrovsky @ 2014-03-31 14:41 ` Tian, Kevin 2014-03-31 15:30 ` Boris Ostrovsky 2 siblings, 1 reply; 14+ messages in thread From: Tian, Kevin @ 2014-03-31 14:41 UTC (permalink / raw) To: Boris Ostrovsky, xen-devel@lists.xen.org Cc: suravee.suthikulpanit@amd.com, stefano.stabellini@eu.citrix.com, Dong, Eddie, ian.jackson@eu.citrix.com, jbeulich@suse.com, Nakajima, Jun, ian.campbell@citrix.com > From: Boris Ostrovsky [mailto:boris.ostrovsky@oracle.com] > Sent: Sunday, March 30, 2014 11:06 AM > > Two patches to address time-related problem that we discovered during > migration testing. > > * The first patch loads HVM parameters from configuration file during > restore. > To fix the actual problem that we saw only timer_mode needed to be restored > but > it seems to me that other parameters are needed as well since at least some > of > them are used at runtime. > > The bug can be demonstrated with a Solaris guest but I haven't been able to > trigger it with Linux. Possibly because Solaris' gethrtime() routine (which is > what the test was using) is a trap to kernel's hrtimer which reports global > time and performs some adjustments to per-CPU clock. this looks a right thing to fix. > > * The second patch keeps TSCs synchronized across VPCUs after save/restore. > Currently TSC values diverge after migration because during both save and > restore > we calculate them separately for each VCPU and base each calculation on > newly-read host's TSC. > > The problem can be easily demonstrated with this program for a 2-VCPU guest > (I am assuming here invariant TSC so, for example, > tsc_mode="always_emulate" (*)): > > int > main(int argc, char* argv[]) > { > > unsigned long long h = 0LL; > int proc = 0; > cpu_set_t set; > > for(;;) { > unsigned long long n = __native_read_tsc(); > if(h && n < h) > printf("prev 0x%llx cur 0x%llx\n", h, n); > CPU_ZERO(&set); > proc = (proc + 1) & 1; > CPU_SET(proc, &set); > if (sched_setaffinity(0, sizeof(cpu_set_t), &set)) { > perror("sched_setaffinity"); > exit(1); > } > > h = n; > } > } > what's the backward drift range from above program? dozens of cycles? hundreds of cycles? > > (*) Which brings up another observation: when we are in default tsc_mode we > start off with vtsc=0 and thus clear TSC_Invariant bit in guest's CPUID. > After migration vtsc is 1 and TSC_Invariant bit is set. So the guest may observe > different values of CPUID. Which technically reflects the fact that TSC became > "safe" but I think potentially may be problematic to some guests. > > > Boris Ostrovsky (2): > libxl: Set guest parameters from config file during a restore > x86/HVM: Use fixed TSC value when saving or restoring domain > > tools/libxl/libxl_dom.c | 51 > +++++++++++++++++++++++++---------------- > xen/arch/x86/hvm/hvm.c | 18 ++++++++++----- > xen/arch/x86/hvm/save.c | 36 +++++++++++++++++++++-------- > xen/arch/x86/hvm/svm/svm.c | 4 ++-- > xen/arch/x86/hvm/vmx/vmx.c | 4 ++-- > xen/arch/x86/hvm/vpt.c | 16 ++++++++----- > xen/arch/x86/time.c | 7 ++++-- > xen/common/hvm/save.c | 5 ++++ > xen/include/asm-x86/domain.h | 2 ++ > xen/include/asm-x86/hvm/hvm.h | 9 +++++--- > xen/include/xen/hvm/save.h | 2 ++ > xen/include/xen/time.h | 3 ++- > 12 files changed, 105 insertions(+), 52 deletions(-) > > -- > 1.7.10.4 ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 0/2] Time-related fixes for migration 2014-03-31 14:41 ` [PATCH 0/2] Time-related fixes for migration Tian, Kevin @ 2014-03-31 15:30 ` Boris Ostrovsky 0 siblings, 0 replies; 14+ messages in thread From: Boris Ostrovsky @ 2014-03-31 15:30 UTC (permalink / raw) To: Tian, Kevin Cc: ian.campbell@citrix.com, stefano.stabellini@eu.citrix.com, Nakajima, Jun, Dong, Eddie, ian.jackson@eu.citrix.com, xen-devel@lists.xen.org, jbeulich@suse.com, suravee.suthikulpanit@amd.com On 03/31/2014 10:41 AM, Tian, Kevin wrote: >> * The second patch keeps TSCs synchronized across VPCUs after save/restore. >> Currently TSC values diverge after migration because during both save and >> restore >> we calculate them separately for each VCPU and base each calculation on >> newly-read host's TSC. >> >> The problem can be easily demonstrated with this program for a 2-VCPU guest >> (I am assuming here invariant TSC so, for example, >> tsc_mode="always_emulate" (*)): >> >> int >> main(int argc, char* argv[]) >> { >> >> unsigned long long h = 0LL; >> int proc = 0; >> cpu_set_t set; >> >> for(;;) { >> unsigned long long n = __native_read_tsc(); >> if(h && n < h) >> printf("prev 0x%llx cur 0x%llx\n", h, n); >> CPU_ZERO(&set); >> proc = (proc + 1) & 1; >> CPU_SET(proc, &set); >> if (sched_setaffinity(0, sizeof(cpu_set_t), &set)) { >> perror("sched_setaffinity"); >> exit(1); >> } >> >> h = n; >> } >> } >> > what's the backward drift range from above program? dozens of cycles? > hundreds of cycles? For "raw" difference (i.e. TSC registers themselves) it's usually tens of thousands, sometimes more (and sometimes *much* more). For example, here are outputs of 'xl debug-key v' before and after a migrate for a 2p guest: root@haswell> xl dmesg |grep Offset (XEN) TSC Offset = ffffff54e63f1cab (XEN) TSC Offset = ffffff54e63f1cab (XEN) TSC Offset = ffffff6cb0a59ea9 (XEN) TSC Offset = ffffff6cb0a566ae root@haswell> For guest's view, taking into account, for example, the fact that sched_affinity() takes quite some time, it's a few hundreds of cycles. And obviously as you inclrease number of VCPUs (and I think guest memory as well) the largest difference grows further. -boris ^ permalink raw reply [flat|nested] 14+ messages in thread
end of thread, other threads:[~2014-04-01 13:45 UTC | newest] Thread overview: 14+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2014-03-30 3:05 [PATCH 0/2] Time-related fixes for migration Boris Ostrovsky 2014-03-30 3:05 ` [PATCH 1/2] libxl: Set guest parameters from config file during a restore Boris Ostrovsky 2014-04-01 10:37 ` Ian Campbell 2014-04-01 13:45 ` Boris Ostrovsky 2014-03-30 3:05 ` [PATCH 2/2] x86/HVM: Use fixed TSC value when saving or restoring domain Boris Ostrovsky 2014-03-31 9:51 ` Jan Beulich 2014-03-31 14:01 ` Boris Ostrovsky 2014-03-31 14:08 ` Tian, Kevin 2014-03-31 14:34 ` Jan Beulich 2014-03-31 15:06 ` Boris Ostrovsky 2014-03-31 15:09 ` Jan Beulich 2014-03-31 15:29 ` Tian, Kevin 2014-03-31 14:41 ` [PATCH 0/2] Time-related fixes for migration Tian, Kevin 2014-03-31 15:30 ` Boris Ostrovsky
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.