* [PATCH v4 0/3] Time-related fixes for migration
@ 2014-04-16  1:27 Boris Ostrovsky
  2014-04-16  1:27 ` [PATCH v4 1/3] x86: Use native RDTSC(P) execution when guest and host frequencies are the same Boris Ostrovsky
  ` (2 more replies)
  0 siblings, 3 replies; 16+ messages in thread
From: Boris Ostrovsky @ 2014-04-16  1:27 UTC (permalink / raw)
  To: xen-devel
  Cc: kevin.tian, suravee.suthikulpanit, eddie.dong, jbeulich, jun.nakajima, boris.ostrovsky

Version 4:

While testing on AMD processors I realized that TSC scaling has a number of problems, some of which need the same interfaces that I added for the migration fixes. I therefore decided to fold the TSC scaling fixes into this series.

Three patches:

1. Revert the change introduced by 4aab59a3, where we stop intercepting RDTSC(P) if TSC scaling is supported. This was done in the wrong place, resulting in the guest running without intercepts but with vtsc on. This patch also allows us to continue running in native tsc mode after migration if the frequency stays the same (which is also what docs/misc/tscmode.txt implies).

2. The TSC scaling enablement logic appears to be inverted: we should be using it when running in native tsc mode, which is not what happens now. We also need to do some work to synchronize TSCs during initial boot when TSC scaling is on.

3. The remainder of the original patch, covering TSC synchronization after migration. During v3 review Jan requested that I drop the at_tsc argument to hvm_set_guest_tsc_fixed() since it should always be possible to safely examine chkpt_tsc (now called sync_tsc).
This is no longer the case because we now use this variable during boot as well, so I left the interface as it was in v3 (and dropped arch_hvm_save_done(), where sync_tsc was cleared, since it doesn't add anything).

Version 3:
* Only the second patch is submitted (the first one has been applied)
* More thorough AMD support (work around rdtscll() in svm_set_tsc_offset())

Version 2:
* Avoid setting parameters from config file twice
* Use out-of-line definition of get_s_time()
* Update rdtscll macro definition to avoid namespace clashes
* Syntax cleanup

Two patches to address issues we discovered during migration testing.

* The first patch loads HVM parameters from the configuration file during restore. To fix the actual problem that we saw, only timer_mode needed to be restored, but it seems to me that other parameters are needed as well since at least some of them are used "at runtime". The bug can be demonstrated with a Solaris guest but I haven't been able to trigger it with Linux, possibly because Solaris's gethrtime() routine is a fast trap to the kernel's hrtimer, which does some tricks to account for missed ticks during interrupts.

* The second patch keeps TSCs synchronized across VCPUs after save/restore. Currently TSC values diverge after migration because during both save and restore we calculate them separately for each VCPU and base each calculation on a newly-read host TSC.
The problem can be easily demonstrated with this program for a 2-VCPU guest (I am assuming here an invariant TSC so, for example, tsc_mode="always_emulate" (*)):

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>

/* Raw RDTSC; stands in for the kernel-header __native_read_tsc() used
 * in the original posting. */
static inline unsigned long long __native_read_tsc(void)
{
    unsigned int lo, hi;
    asm volatile("rdtsc" : "=a" (lo), "=d" (hi));
    return ((unsigned long long)hi << 32) | lo;
}

int main(int argc, char* argv[])
{
    unsigned long long h = 0LL;
    int proc = 0;
    cpu_set_t set;

    for (;;) {
        unsigned long long n = __native_read_tsc();

        if (h && n < h)
            printf("prev 0x%llx cur 0x%llx\n", h, n);

        CPU_ZERO(&set);
        proc = (proc + 1) & 1;
        CPU_SET(proc, &set);
        if (sched_setaffinity(0, sizeof(cpu_set_t), &set)) {
            perror("sched_setaffinity");
            exit(1);
        }
        h = n;
    }
}

(*) Which brings up another observation: when we are in default tsc_mode we start off with vtsc=0 and thus clear the TSC_Invariant bit in the guest's CPUID. After migration vtsc is 1 and the TSC_Invariant bit is set, so the guest may observe different values of CPUID. This technically reflects the fact that the TSC became "safe", but I think it may potentially be problematic for some guests.

Boris Ostrovsky (3):
  x86: Use native RDTSC(P) execution when guest and host frequencies are the same
  x86/svm: Enable TSC scaling
  x86/HVM: Use fixed TSC value when saving or restoring domain

 xen/arch/x86/hvm/hvm.c           |   23 ++++++++++++++------
 xen/arch/x86/hvm/save.c          |    6 +++++
 xen/arch/x86/hvm/svm/svm.c       |   22 +++++++++++--------
 xen/arch/x86/hvm/vmx/vmx.c       |    7 +++--
 xen/arch/x86/hvm/vmx/vvmx.c      |    4 +-
 xen/arch/x86/hvm/vpt.c           |   16 +++++++++-----
 xen/arch/x86/time.c              |   42 ++++++++++++++++++++++++++++++++-----
 xen/include/asm-x86/hvm/domain.h |    6 +++++
 xen/include/asm-x86/hvm/hvm.h    |   11 ++++++---
 xen/include/asm-x86/msr.h        |    6 ++--
 xen/include/xen/time.h           |    1 +
 11 files changed, 104 insertions(+), 40 deletions(-)

^ permalink raw reply	[flat|nested] 16+ messages in thread
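The divergence the program above detects comes from each VCPU's TSC offset being computed against a fresh, independent host TSC read. The fix described in the cover letter records a single reference value (sync_tsc) on the boot VCPU and computes every VCPU's offset against it. A minimal sketch of that idea — all names here are illustrative stand-ins, not the actual hvm.c/time.c code:

```c
#include <stdint.h>

/* One reference TSC value, recorded once by the boot VCPU (BSP). */
static uint64_t sync_tsc;

/* Stand-in for reading the host TSC; a settable variable so the
 * passage of host time can be simulated. */
static uint64_t host_rdtsc_value;
static uint64_t rdtsc_now(void) { return host_rdtsc_value; }

/* Before the fix: each VCPU bases its offset on a fresh TSC read, so
 * VCPUs initialised at different times get different offsets. */
uint64_t offset_divergent(uint64_t guest_tsc)
{
    return guest_tsc - rdtsc_now();
}

/* After the fix: the BSP records sync_tsc once, and every VCPU
 * (including APs brought up later) computes its offset against that
 * same value, so all offsets come out identical. */
void bsp_record_sync_tsc(void)
{
    sync_tsc = rdtsc_now();
}

uint64_t offset_synced(uint64_t guest_tsc)
{
    return guest_tsc - sync_tsc;
}
```

With a shared reference, the offset no longer depends on when each VCPU happens to be initialised, which is exactly what keeps the guest-visible TSCs monotonic across CPU hops in the test program.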
* [PATCH v4 1/3] x86: Use native RDTSC(P) execution when guest and host frequencies are the same
  2014-04-16  1:27 [PATCH v4 0/3] Time-related fixes for migration Boris Ostrovsky
@ 2014-04-16  1:27 ` Boris Ostrovsky
  2014-04-16  9:54   ` Jan Beulich
  2014-04-16 11:38   ` Jan Beulich
  2014-04-16  1:27 ` [PATCH v4 2/3] x86/svm: Enable TSC scaling Boris Ostrovsky
  2014-04-16  1:27 ` [PATCH v4 3/3] x86/HVM: Use fixed TSC value when saving or restoring domain Boris Ostrovsky
  2 siblings, 2 replies; 16+ messages in thread
From: Boris Ostrovsky @ 2014-04-16  1:27 UTC (permalink / raw)
  To: xen-devel
  Cc: kevin.tian, suravee.suthikulpanit, eddie.dong, jbeulich, jun.nakajima, boris.ostrovsky

We should be able to continue using native RDTSC(P) execution after migration if host and guest frequencies are equal (this includes the case when the frequencies are made equal by the TSC scaling feature).

This also allows us to revert the main part of commit 4aab59a3 (svm: Do not intercept RDTSC(P) when TSC scaling is supported by hardware), which was wrong: while RDTSC intercepts were disabled, the domain's vtsc could still be set, leading to an inconsistent view of the guest's TSC.
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
---
 xen/arch/x86/hvm/svm/svm.c |    2 +-
 xen/arch/x86/time.c        |   13 +++++++++----
 2 files changed, 10 insertions(+), 5 deletions(-)

diff --git a/xen/arch/x86/hvm/svm/svm.c b/xen/arch/x86/hvm/svm/svm.c
index 4fd5376..813e775 100644
--- a/xen/arch/x86/hvm/svm/svm.c
+++ b/xen/arch/x86/hvm/svm/svm.c
@@ -728,7 +728,7 @@ static void svm_set_rdtsc_exiting(struct vcpu *v, bool_t enable)
     general1_intercepts &= ~GENERAL1_INTERCEPT_RDTSC;
     general2_intercepts &= ~GENERAL2_INTERCEPT_RDTSCP;

-    if ( enable && !cpu_has_tsc_ratio )
+    if ( enable )
     {
         general1_intercepts |= GENERAL1_INTERCEPT_RDTSC;
         general2_intercepts |= GENERAL2_INTERCEPT_RDTSCP;
diff --git a/xen/arch/x86/time.c b/xen/arch/x86/time.c
index f904af2..b84d127 100644
--- a/xen/arch/x86/time.c
+++ b/xen/arch/x86/time.c
@@ -37,6 +37,7 @@
 #include <asm/hpet.h>
 #include <io_ports.h>
 #include <asm/setup.h> /* for early_time_init */
+#include <asm/hvm/svm/svm.h> /* for cpu_has_tsc_ratio */
 #include <public/arch-x86/cpuid.h>

 /* opt_clocksource: Force clocksource to one of: pit, hpet, acpi. */
@@ -1889,10 +1890,14 @@ void tsc_set_info(struct domain *d,
         d->arch.vtsc_offset = get_s_time() - elapsed_nsec;
         d->arch.tsc_khz = gtsc_khz ? gtsc_khz : cpu_khz;
         set_time_scale(&d->arch.vtsc_to_ns, d->arch.tsc_khz * 1000 );
-        /* use native TSC if initial host has safe TSC, has not migrated
-         * yet and tsc_khz == cpu_khz */
-        if ( host_tsc_is_safe() && incarnation == 0 &&
-             d->arch.tsc_khz == cpu_khz )
+        /*
+         * Use native TSC if initial host has safe TSC and either has not
+         * migrated yet or tsc_khz == cpu_khz (either "naturally" or via
+         * TSC scaling)
+         */
+        if ( host_tsc_is_safe() &&
+             (incarnation == 0 || d->arch.tsc_khz == cpu_khz ||
+              cpu_has_tsc_ratio) )
             d->arch.vtsc = 0;
         else
             d->arch.ns_to_vtsc = scale_reciprocal(d->arch.vtsc_to_ns);
--
1.7.1

^ permalink raw reply related	[flat|nested] 16+ messages in thread
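The hunk above can be reduced to its inputs to see when emulation (vtsc) is avoided. The sketch below mirrors the patch's condition exactly as posted (the review replies later in the thread refine it further), and adds the tick-to-nanosecond conversion that vtsc_to_ns represents, simplified here from the multiply-shift form of Xen's set_time_scale()/scale_delta(); all names in this sketch are illustrative:

```c
#include <stdint.h>

/* Simplified stand-in for set_time_scale()/scale_delta(): convert a TSC
 * tick delta to nanoseconds for a given TSC frequency in kHz.
 * ticks_per_second = tsc_khz * 1000, so ns = delta * 1e6 / tsc_khz. */
uint64_t ticks_to_ns(uint64_t delta_ticks, uint64_t tsc_khz)
{
    return delta_ticks * 1000000ULL / tsc_khz;
}

/* The vtsc decision from the patch, with the domain fields lifted out
 * into plain parameters. Returns nonzero when the guest may execute
 * RDTSC(P) natively (d->arch.vtsc = 0). */
int use_native_tsc(int host_tsc_safe, int incarnation,
                   uint64_t guest_khz, uint64_t host_khz,
                   int has_tsc_ratio)
{
    return host_tsc_safe &&
           (incarnation == 0 || guest_khz == host_khz || has_tsc_ratio);
}
```

Note how, compared to the pre-patch `incarnation == 0 && tsc_khz == cpu_khz`, migrated domains (incarnation > 0) now stay native whenever the frequencies match or hardware TSC scaling can make them match.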
* Re: [PATCH v4 1/3] x86: Use native RDTSC(P) execution when guest and host frequencies are the same 2014-04-16 1:27 ` [PATCH v4 1/3] x86: Use native RDTSC(P) execution when guest and host frequencies are the same Boris Ostrovsky @ 2014-04-16 9:54 ` Jan Beulich 2014-04-16 13:02 ` Boris Ostrovsky 2014-04-16 11:38 ` Jan Beulich 1 sibling, 1 reply; 16+ messages in thread From: Jan Beulich @ 2014-04-16 9:54 UTC (permalink / raw) To: Boris Ostrovsky Cc: kevin.tian, eddie.dong, jun.nakajima, suravee.suthikulpanit, xen-devel >>> On 16.04.14 at 03:27, <boris.ostrovsky@oracle.com> wrote: > @@ -1889,10 +1890,14 @@ void tsc_set_info(struct domain *d, > d->arch.vtsc_offset = get_s_time() - elapsed_nsec; > d->arch.tsc_khz = gtsc_khz ? gtsc_khz : cpu_khz; > set_time_scale(&d->arch.vtsc_to_ns, d->arch.tsc_khz * 1000 ); > - /* use native TSC if initial host has safe TSC, has not migrated > - * yet and tsc_khz == cpu_khz */ > - if ( host_tsc_is_safe() && incarnation == 0 && > - d->arch.tsc_khz == cpu_khz ) > + /* > + * Use native TSC if initial host has safe TSC and either has not > + * migrated yet or tsc_khz == cpu_khz (either "naturally" or via > + * TSC scaling) > + */ > + if ( host_tsc_is_safe() && > + (incarnation == 0 || d->arch.tsc_khz == cpu_khz || > + cpu_has_tsc_ratio) ) Can't you drop checking incarnation to be zero then? In that case, afaict d->arch.tsc_khz == cpu_khz due to gtsc_khz being passed in as zero from arch_domain_create(). Jan ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v4 1/3] x86: Use native RDTSC(P) execution when guest and host frequencies are the same 2014-04-16 9:54 ` Jan Beulich @ 2014-04-16 13:02 ` Boris Ostrovsky 0 siblings, 0 replies; 16+ messages in thread From: Boris Ostrovsky @ 2014-04-16 13:02 UTC (permalink / raw) To: Jan Beulich Cc: kevin.tian, eddie.dong, jun.nakajima, suravee.suthikulpanit, xen-devel On 04/16/2014 05:54 AM, Jan Beulich wrote: >>>> On 16.04.14 at 03:27, <boris.ostrovsky@oracle.com> wrote: >> @@ -1889,10 +1890,14 @@ void tsc_set_info(struct domain *d, >> d->arch.vtsc_offset = get_s_time() - elapsed_nsec; >> d->arch.tsc_khz = gtsc_khz ? gtsc_khz : cpu_khz; >> set_time_scale(&d->arch.vtsc_to_ns, d->arch.tsc_khz * 1000 ); >> - /* use native TSC if initial host has safe TSC, has not migrated >> - * yet and tsc_khz == cpu_khz */ >> - if ( host_tsc_is_safe() && incarnation == 0 && >> - d->arch.tsc_khz == cpu_khz ) >> + /* >> + * Use native TSC if initial host has safe TSC and either has not >> + * migrated yet or tsc_khz == cpu_khz (either "naturally" or via >> + * TSC scaling) >> + */ >> + if ( host_tsc_is_safe() && >> + (incarnation == 0 || d->arch.tsc_khz == cpu_khz || >> + cpu_has_tsc_ratio) ) > Can't you drop checking incarnation to be zero then? In that case, > afaict d->arch.tsc_khz == cpu_khz due to gtsc_khz being passed > in as zero from arch_domain_create(). Yes, it's pointless. I was thinking that maybe toolstack may pass gtsc_khz != cpu_khz during initial boot (it doesn't do it now, but theoretically it could) but then we shouldn't be using vtsc=0 in that case anyway. -boris ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v4 1/3] x86: Use native RDTSC(P) execution when guest and host frequencies are the same 2014-04-16 1:27 ` [PATCH v4 1/3] x86: Use native RDTSC(P) execution when guest and host frequencies are the same Boris Ostrovsky 2014-04-16 9:54 ` Jan Beulich @ 2014-04-16 11:38 ` Jan Beulich 2014-04-16 14:28 ` Boris Ostrovsky 1 sibling, 1 reply; 16+ messages in thread From: Jan Beulich @ 2014-04-16 11:38 UTC (permalink / raw) To: Boris Ostrovsky Cc: kevin.tian, eddie.dong, jun.nakajima, suravee.suthikulpanit, xen-devel >>> On 16.04.14 at 03:27, <boris.ostrovsky@oracle.com> wrote: > @@ -1889,10 +1890,14 @@ void tsc_set_info(struct domain *d, > d->arch.vtsc_offset = get_s_time() - elapsed_nsec; > d->arch.tsc_khz = gtsc_khz ? gtsc_khz : cpu_khz; > set_time_scale(&d->arch.vtsc_to_ns, d->arch.tsc_khz * 1000 ); > - /* use native TSC if initial host has safe TSC, has not migrated > - * yet and tsc_khz == cpu_khz */ > - if ( host_tsc_is_safe() && incarnation == 0 && > - d->arch.tsc_khz == cpu_khz ) > + /* > + * Use native TSC if initial host has safe TSC and either has not > + * migrated yet or tsc_khz == cpu_khz (either "naturally" or via > + * TSC scaling) > + */ > + if ( host_tsc_is_safe() && > + (incarnation == 0 || d->arch.tsc_khz == cpu_khz || > + cpu_has_tsc_ratio) ) Doesn't this cpu_has_tsc_ratio check also need to be qualified with is_pv_domain()? And is the change from && in the old condition to || actually valid for PV guests? Jan ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v4 1/3] x86: Use native RDTSC(P) execution when guest and host frequencies are the same 2014-04-16 11:38 ` Jan Beulich @ 2014-04-16 14:28 ` Boris Ostrovsky 2014-04-16 14:37 ` Jan Beulich 0 siblings, 1 reply; 16+ messages in thread From: Boris Ostrovsky @ 2014-04-16 14:28 UTC (permalink / raw) To: Jan Beulich Cc: kevin.tian, eddie.dong, jun.nakajima, suravee.suthikulpanit, xen-devel On 04/16/2014 07:38 AM, Jan Beulich wrote: >>>> On 16.04.14 at 03:27, <boris.ostrovsky@oracle.com> wrote: >> @@ -1889,10 +1890,14 @@ void tsc_set_info(struct domain *d, >> d->arch.vtsc_offset = get_s_time() - elapsed_nsec; >> d->arch.tsc_khz = gtsc_khz ? gtsc_khz : cpu_khz; >> set_time_scale(&d->arch.vtsc_to_ns, d->arch.tsc_khz * 1000 ); >> - /* use native TSC if initial host has safe TSC, has not migrated >> - * yet and tsc_khz == cpu_khz */ >> - if ( host_tsc_is_safe() && incarnation == 0 && >> - d->arch.tsc_khz == cpu_khz ) >> + /* >> + * Use native TSC if initial host has safe TSC and either has not >> + * migrated yet or tsc_khz == cpu_khz (either "naturally" or via >> + * TSC scaling) >> + */ >> + if ( host_tsc_is_safe() && >> + (incarnation == 0 || d->arch.tsc_khz == cpu_khz || >> + cpu_has_tsc_ratio) ) > Doesn't this cpu_has_tsc_ratio check also need to be qualified with > is_pv_domain()? And is the change from && in the old condition to || > actually valid for PV guests? Hmm, I haven't thought about PV here. So then the condition should be if ( host_tsc_is_safe() ) { if ( (is_hvm_domain() && (arch.tsc_khz == cpu_khz || cpu_has_tsc_ratio)) || (incarnation == 0 && d->arch.tsc_khz == cpu_khz) ) d->arch.vtsc = 0; } -boris ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v4 1/3] x86: Use native RDTSC(P) execution when guest and host frequencies are the same 2014-04-16 14:28 ` Boris Ostrovsky @ 2014-04-16 14:37 ` Jan Beulich 2014-04-16 15:33 ` Boris Ostrovsky 0 siblings, 1 reply; 16+ messages in thread From: Jan Beulich @ 2014-04-16 14:37 UTC (permalink / raw) To: Boris Ostrovsky Cc: kevin.tian, eddie.dong, jun.nakajima, suravee.suthikulpanit, xen-devel >>> On 16.04.14 at 16:28, <boris.ostrovsky@oracle.com> wrote: > On 04/16/2014 07:38 AM, Jan Beulich wrote: >>>>> On 16.04.14 at 03:27, <boris.ostrovsky@oracle.com> wrote: >>> @@ -1889,10 +1890,14 @@ void tsc_set_info(struct domain *d, >>> d->arch.vtsc_offset = get_s_time() - elapsed_nsec; >>> d->arch.tsc_khz = gtsc_khz ? gtsc_khz : cpu_khz; >>> set_time_scale(&d->arch.vtsc_to_ns, d->arch.tsc_khz * 1000 ); >>> - /* use native TSC if initial host has safe TSC, has not migrated >>> - * yet and tsc_khz == cpu_khz */ >>> - if ( host_tsc_is_safe() && incarnation == 0 && >>> - d->arch.tsc_khz == cpu_khz ) >>> + /* >>> + * Use native TSC if initial host has safe TSC and either has not >>> + * migrated yet or tsc_khz == cpu_khz (either "naturally" or via >>> + * TSC scaling) >>> + */ >>> + if ( host_tsc_is_safe() && >>> + (incarnation == 0 || d->arch.tsc_khz == cpu_khz || >>> + cpu_has_tsc_ratio) ) >> Doesn't this cpu_has_tsc_ratio check also need to be qualified with >> is_pv_domain()? And is the change from && in the old condition to || >> actually valid for PV guests? > > Hmm, I haven't thought about PV here. > > So then the condition should be > > if ( host_tsc_is_safe() ) > { > if ( (is_hvm_domain() && (arch.tsc_khz == cpu_khz || cpu_has_tsc_ratio)) || > (incarnation == 0 && d->arch.tsc_khz == cpu_khz) ) > d->arch.vtsc = 0; > } Almost - to include PVH you need to either use !is_pv_domain() or has_hvm_container_domain(). Jan ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v4 1/3] x86: Use native RDTSC(P) execution when guest and host frequencies are the same 2014-04-16 14:37 ` Jan Beulich @ 2014-04-16 15:33 ` Boris Ostrovsky 2014-04-16 16:15 ` Jan Beulich 0 siblings, 1 reply; 16+ messages in thread From: Boris Ostrovsky @ 2014-04-16 15:33 UTC (permalink / raw) To: Jan Beulich Cc: kevin.tian, eddie.dong, jun.nakajima, suravee.suthikulpanit, xen-devel On 04/16/2014 10:37 AM, Jan Beulich wrote: >>>> On 16.04.14 at 16:28, <boris.ostrovsky@oracle.com> wrote: >> On 04/16/2014 07:38 AM, Jan Beulich wrote: >>>>>> On 16.04.14 at 03:27, <boris.ostrovsky@oracle.com> wrote: >>>> @@ -1889,10 +1890,14 @@ void tsc_set_info(struct domain *d, >>>> d->arch.vtsc_offset = get_s_time() - elapsed_nsec; >>>> d->arch.tsc_khz = gtsc_khz ? gtsc_khz : cpu_khz; >>>> set_time_scale(&d->arch.vtsc_to_ns, d->arch.tsc_khz * 1000 ); >>>> - /* use native TSC if initial host has safe TSC, has not migrated >>>> - * yet and tsc_khz == cpu_khz */ >>>> - if ( host_tsc_is_safe() && incarnation == 0 && >>>> - d->arch.tsc_khz == cpu_khz ) >>>> + /* >>>> + * Use native TSC if initial host has safe TSC and either has not >>>> + * migrated yet or tsc_khz == cpu_khz (either "naturally" or via >>>> + * TSC scaling) >>>> + */ >>>> + if ( host_tsc_is_safe() && >>>> + (incarnation == 0 || d->arch.tsc_khz == cpu_khz || >>>> + cpu_has_tsc_ratio) ) >>> Doesn't this cpu_has_tsc_ratio check also need to be qualified with >>> is_pv_domain()? And is the change from && in the old condition to || >>> actually valid for PV guests? >> Hmm, I haven't thought about PV here. >> >> So then the condition should be >> >> if ( host_tsc_is_safe() ) >> { >> if ( (is_hvm_domain() && (arch.tsc_khz == cpu_khz || cpu_has_tsc_ratio)) || >> (incarnation == 0 && d->arch.tsc_khz == cpu_khz) ) >> d->arch.vtsc = 0; >> } > Almost - to include PVH you need to either use !is_pv_domain() or > has_hvm_container_domain(). 
PVH never makes it here; it is forced to use TSC_MODE_NEVER_EMULATE above (see the pvhfixme comment there).

PVH needs to be looked at anyway. For example, there is an is_hvm_domain() check at the bottom which I suspect needs to be made PVH-safe. I think it should be addressed separately though.

-boris

^ permalink raw reply	[flat|nested] 16+ messages in thread
* Re: [PATCH v4 1/3] x86: Use native RDTSC(P) execution when guest and host frequencies are the same 2014-04-16 15:33 ` Boris Ostrovsky @ 2014-04-16 16:15 ` Jan Beulich 2014-04-16 16:44 ` Boris Ostrovsky 0 siblings, 1 reply; 16+ messages in thread From: Jan Beulich @ 2014-04-16 16:15 UTC (permalink / raw) To: Boris Ostrovsky Cc: kevin.tian, suravee.suthikulpanit, eddie.dong, xen-devel, jun.nakajima >>> On 16.04.14 at 17:33, <boris.ostrovsky@oracle.com> wrote: > On 04/16/2014 10:37 AM, Jan Beulich wrote: >>>>> On 16.04.14 at 16:28, <boris.ostrovsky@oracle.com> wrote: >>> On 04/16/2014 07:38 AM, Jan Beulich wrote: >>>>>>> On 16.04.14 at 03:27, <boris.ostrovsky@oracle.com> wrote: >>>>> @@ -1889,10 +1890,14 @@ void tsc_set_info(struct domain *d, >>>>> d->arch.vtsc_offset = get_s_time() - elapsed_nsec; >>>>> d->arch.tsc_khz = gtsc_khz ? gtsc_khz : cpu_khz; >>>>> set_time_scale(&d->arch.vtsc_to_ns, d->arch.tsc_khz * 1000 ); >>>>> - /* use native TSC if initial host has safe TSC, has not migrated >>>>> - * yet and tsc_khz == cpu_khz */ >>>>> - if ( host_tsc_is_safe() && incarnation == 0 && >>>>> - d->arch.tsc_khz == cpu_khz ) >>>>> + /* >>>>> + * Use native TSC if initial host has safe TSC and either has not >>>>> + * migrated yet or tsc_khz == cpu_khz (either "naturally" or via >>>>> + * TSC scaling) >>>>> + */ >>>>> + if ( host_tsc_is_safe() && >>>>> + (incarnation == 0 || d->arch.tsc_khz == cpu_khz || >>>>> + cpu_has_tsc_ratio) ) >>>> Doesn't this cpu_has_tsc_ratio check also need to be qualified with >>>> is_pv_domain()? And is the change from && in the old condition to || >>>> actually valid for PV guests? >>> Hmm, I haven't thought about PV here. 
>>> >>> So then the condition should be >>> >>> if ( host_tsc_is_safe() ) >>> { >>> if ( (is_hvm_domain() && (arch.tsc_khz == cpu_khz || cpu_has_tsc_ratio)) > || >>> (incarnation == 0 && d->arch.tsc_khz == cpu_khz) ) >>> d->arch.vtsc = 0; >>> } >> Almost - to include PVH you need to either use !is_pv_domain() or >> has_hvm_container_domain(). > > PVH never makes here, it is forced to use TSC_MODE_NEVER_EMULATE above > (see pvhfixme above). Please sort this out with Mukesh - I would generally have thought that time handling should be HVM-like for PVH, and was surprised (in the sense that I didn't recall) to find that fixme comment there when reviewing your patch. > PVH need to be looked at anyway. For example, there is a is_hvm_domain() > check at the bottom which I suspect needs to be PVH-safe. Indeed this all looks rather suspicious... Jan ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v4 1/3] x86: Use native RDTSC(P) execution when guest and host frequencies are the same
  2014-04-16 16:15           ` Jan Beulich
@ 2014-04-16 16:44             ` Boris Ostrovsky
  2014-04-17  6:29               ` Jan Beulich
  0 siblings, 1 reply; 16+ messages in thread
From: Boris Ostrovsky @ 2014-04-16 16:44 UTC (permalink / raw)
  To: Jan Beulich
  Cc: kevin.tian, suravee.suthikulpanit, eddie.dong, xen-devel, jun.nakajima

On 04/16/2014 12:15 PM, Jan Beulich wrote:
> Please sort this out with Mukesh - I would generally have thought that
> time handling should be HVM-like for PVH, and was surprised (in the
> sense that I didn't recall) to find that fixme comment there when
> reviewing your patch.

Do you want the PVH issues to be resolved in this series or separately?

-boris

>> PVH need to be looked at anyway. For example, there is a is_hvm_domain()
>> check at the bottom which I suspect needs to be PVH-safe.
> Indeed this all looks rather suspicious...
>
> Jan

^ permalink raw reply	[flat|nested] 16+ messages in thread
* Re: [PATCH v4 1/3] x86: Use native RDTSC(P) execution when guest and host frequencies are the same 2014-04-16 16:44 ` Boris Ostrovsky @ 2014-04-17 6:29 ` Jan Beulich 0 siblings, 0 replies; 16+ messages in thread From: Jan Beulich @ 2014-04-17 6:29 UTC (permalink / raw) To: Boris Ostrovsky Cc: kevin.tian, suravee.suthikulpanit, eddie.dong, xen-devel, jun.nakajima >>> On 16.04.14 at 18:44, <boris.ostrovsky@oracle.com> wrote: > On 04/16/2014 12:15 PM, Jan Beulich wrote: >> Please sort this out with Mukesh - I would generally have thought that >> time handling should be HVM-like for PVH, and was surprised (in the >> sense that I didn't recall) to find that fixme comment there when >> reviewing your patch. > > Do you want PVH issues be resolved in this series or separately? Either way is fine; I'd just like you to work out with Mukesh what the right predicate in this specific patch should be, as there's no reason to put in the wrong one now just to have it changed again as soon as PVH gets fixed. Jan ^ permalink raw reply [flat|nested] 16+ messages in thread
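The predicate this subthread converges on — PV guests cannot rely on hardware TSC scaling, while HVM (and, once the pvhfixme is resolved, PVH) domains can — might be sketched as follows. The names are illustrative (`is_pv` plays the role of `!has_hvm_container_domain()`), and the final predicate was left to be settled together with the PVH rework:

```c
#include <stdint.h>

/* Sketch of the reworked tsc_set_info() check discussed above.
 * Returns nonzero when native RDTSC(P) execution is safe. */
int use_native_tsc_v2(int host_tsc_safe, int is_pv, int incarnation,
                      uint64_t guest_khz, uint64_t host_khz,
                      int has_tsc_ratio)
{
    if ( !host_tsc_safe )
        return 0;
    /* HVM-container domains: frequencies match naturally, or hardware
     * TSC scaling can make them match. */
    if ( !is_pv && (guest_khz == host_khz || has_tsc_ratio) )
        return 1;
    /* Any domain on its first incarnation with a matching frequency
     * (the pre-existing PV-safe condition). */
    return incarnation == 0 && guest_khz == host_khz;
}
```

The key difference from the v4 patch as posted is that `has_tsc_ratio` can no longer rescue a migrated PV guest, since PV has no per-vCPU ratio applied on VM entry.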
* [PATCH v4 2/3] x86/svm: Enable TSC scaling
  2014-04-16  1:27 [PATCH v4 0/3] Time-related fixes for migration Boris Ostrovsky
  2014-04-16  1:27 ` [PATCH v4 1/3] x86: Use native RDTSC(P) execution when guest and host frequencies are the same Boris Ostrovsky
@ 2014-04-16  1:27 ` Boris Ostrovsky
  2014-04-16 10:21   ` Jan Beulich
  2014-04-16 10:53   ` Andrew Cooper
  2014-04-16  1:27 ` [PATCH v4 3/3] x86/HVM: Use fixed TSC value when saving or restoring domain Boris Ostrovsky
  2 siblings, 2 replies; 16+ messages in thread
From: Boris Ostrovsky @ 2014-04-16  1:27 UTC (permalink / raw)
  To: xen-devel
  Cc: kevin.tian, suravee.suthikulpanit, eddie.dong, jbeulich, jun.nakajima, boris.ostrovsky

The TSC ratio enabling logic is inverted: we want to use it when we are running in native tsc mode, i.e. when d->arch.vtsc is zero.

Also, since svm_set_tsc_offset()'s calculations now depend on vtsc's value, we need to call hvm_funcs.set_tsc_offset() after vtsc changes in tsc_set_info().

In addition, with TSC ratio enabled, svm_set_tsc_offset() will need to do a rdtsc. With that, we may end up with the TSCs on the guest's processors out of sync. d->arch.hvm_domain.sync_tsc, which is set by the boot processor, can now be used by APs as the reference TSC value instead of the host's current TSC.
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> --- xen/arch/x86/hvm/hvm.c | 15 ++++++++++----- xen/arch/x86/hvm/svm/svm.c | 15 +++++++++------ xen/arch/x86/hvm/vmx/vmx.c | 2 +- xen/arch/x86/hvm/vmx/vvmx.c | 4 ++-- xen/arch/x86/hvm/vpt.c | 16 ++++++++++------ xen/arch/x86/time.c | 29 +++++++++++++++++++++++++++-- xen/include/asm-x86/hvm/domain.h | 6 ++++++ xen/include/asm-x86/hvm/hvm.h | 8 +++++--- xen/include/asm-x86/msr.h | 6 +++--- xen/include/xen/time.h | 1 + 10 files changed, 74 insertions(+), 28 deletions(-) diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c index 38c491e..2a7824c 100644 --- a/xen/arch/x86/hvm/hvm.c +++ b/xen/arch/x86/hvm/hvm.c @@ -275,27 +275,31 @@ void hvm_set_guest_tsc(struct vcpu *v, u64 guest_tsc) - v->arch.hvm_vcpu.cache_tsc_offset; v->arch.hvm_vcpu.cache_tsc_offset = delta_tsc; - hvm_funcs.set_tsc_offset(v, v->arch.hvm_vcpu.cache_tsc_offset); + hvm_funcs.set_tsc_offset(v, v->arch.hvm_vcpu.cache_tsc_offset, 0); } void hvm_set_guest_tsc_adjust(struct vcpu *v, u64 tsc_adjust) { v->arch.hvm_vcpu.cache_tsc_offset += tsc_adjust - v->arch.hvm_vcpu.msr_tsc_adjust; - hvm_funcs.set_tsc_offset(v, v->arch.hvm_vcpu.cache_tsc_offset); + hvm_funcs.set_tsc_offset(v, v->arch.hvm_vcpu.cache_tsc_offset, 0); v->arch.hvm_vcpu.msr_tsc_adjust = tsc_adjust; } -u64 hvm_get_guest_tsc(struct vcpu *v) +u64 hvm_get_guest_tsc_fixed(struct vcpu *v, uint64_t at_tsc) { uint64_t tsc; if ( v->domain->arch.vtsc ) { - tsc = hvm_get_guest_time(v); + tsc = hvm_get_guest_time_fixed(v, at_tsc); tsc = gtime_to_gtsc(v->domain, tsc); v->domain->arch.vtsc_kerncount++; } + else if ( at_tsc ) + { + tsc = at_tsc; + } else { rdtscll(tsc); @@ -3848,7 +3852,8 @@ void hvm_vcpu_reset_state(struct vcpu *v, uint16_t cs, uint16_t ip) /* Sync AP's TSC with BSP's. 
*/ v->arch.hvm_vcpu.cache_tsc_offset = v->domain->vcpu[0]->arch.hvm_vcpu.cache_tsc_offset; - hvm_funcs.set_tsc_offset(v, v->arch.hvm_vcpu.cache_tsc_offset); + hvm_funcs.set_tsc_offset(v, v->arch.hvm_vcpu.cache_tsc_offset, + d->arch.hvm_domain.sync_tsc); v->arch.hvm_vcpu.msr_tsc_adjust = 0; diff --git a/xen/arch/x86/hvm/svm/svm.c b/xen/arch/x86/hvm/svm/svm.c index 813e775..e0f2e9e 100644 --- a/xen/arch/x86/hvm/svm/svm.c +++ b/xen/arch/x86/hvm/svm/svm.c @@ -680,7 +680,7 @@ static uint64_t svm_get_tsc_offset(uint64_t host_tsc, uint64_t guest_tsc, return guest_tsc - offset; } -static void svm_set_tsc_offset(struct vcpu *v, u64 offset) +static void svm_set_tsc_offset(struct vcpu *v, u64 offset, u64 at_tsc) { struct vmcb_struct *vmcb = v->arch.hvm_svm.vmcb; struct vmcb_struct *n1vmcb, *n2vmcb; @@ -688,11 +688,14 @@ static void svm_set_tsc_offset(struct vcpu *v, u64 offset) struct domain *d = v->domain; uint64_t host_tsc, guest_tsc; - guest_tsc = hvm_get_guest_tsc(v); + guest_tsc = hvm_get_guest_tsc_fixed(v, at_tsc); /* Re-adjust the offset value when TSC_RATIO is available */ - if ( cpu_has_tsc_ratio && d->arch.vtsc ) { - rdtscll(host_tsc); + if ( cpu_has_tsc_ratio && !d->arch.vtsc ) { + if ( at_tsc ) + host_tsc = at_tsc; + else + rdtscll(host_tsc); offset = svm_get_tsc_offset(host_tsc, guest_tsc, vcpu_tsc_ratio(v)); } @@ -847,13 +850,13 @@ static int svm_update_lwp_cfg(struct vcpu *v, uint64_t msr_content) static inline void svm_tsc_ratio_save(struct vcpu *v) { /* Other vcpus might not have vtsc enabled. So disable TSC_RATIO here. 
*/ - if ( cpu_has_tsc_ratio && v->domain->arch.vtsc ) + if ( cpu_has_tsc_ratio && !v->domain->arch.vtsc ) wrmsrl(MSR_AMD64_TSC_RATIO, DEFAULT_TSC_RATIO); } static inline void svm_tsc_ratio_load(struct vcpu *v) { - if ( cpu_has_tsc_ratio && v->domain->arch.vtsc ) + if ( cpu_has_tsc_ratio && !v->domain->arch.vtsc ) wrmsrl(MSR_AMD64_TSC_RATIO, vcpu_tsc_ratio(v)); } diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c index 77ce167..402ed95 100644 --- a/xen/arch/x86/hvm/vmx/vmx.c +++ b/xen/arch/x86/hvm/vmx/vmx.c @@ -1052,7 +1052,7 @@ static void vmx_handle_cd(struct vcpu *v, unsigned long value) } } -static void vmx_set_tsc_offset(struct vcpu *v, u64 offset) +static void vmx_set_tsc_offset(struct vcpu *v, u64 offset, u64 at_tsc) { vmx_vmcs_enter(v); diff --git a/xen/arch/x86/hvm/vmx/vvmx.c b/xen/arch/x86/hvm/vmx/vvmx.c index 40167d6..e263376 100644 --- a/xen/arch/x86/hvm/vmx/vvmx.c +++ b/xen/arch/x86/hvm/vmx/vvmx.c @@ -1058,7 +1058,7 @@ static void load_shadow_guest_state(struct vcpu *v) if ( control & VM_ENTRY_LOAD_PERF_GLOBAL_CTRL ) hvm_msr_write_intercept(MSR_CORE_PERF_GLOBAL_CTRL, __get_vvmcs(vvmcs, GUEST_PERF_GLOBAL_CTRL)); - hvm_funcs.set_tsc_offset(v, v->arch.hvm_vcpu.cache_tsc_offset); + hvm_funcs.set_tsc_offset(v, v->arch.hvm_vcpu.cache_tsc_offset, 0); vvmcs_to_shadow_bulk(v, ARRAY_SIZE(vmentry_fields), vmentry_fields); @@ -1259,7 +1259,7 @@ static void load_vvmcs_host_state(struct vcpu *v) if ( control & VM_EXIT_LOAD_PERF_GLOBAL_CTRL ) hvm_msr_write_intercept(MSR_CORE_PERF_GLOBAL_CTRL, __get_vvmcs(vvmcs, HOST_PERF_GLOBAL_CTRL)); - hvm_funcs.set_tsc_offset(v, v->arch.hvm_vcpu.cache_tsc_offset); + hvm_funcs.set_tsc_offset(v, v->arch.hvm_vcpu.cache_tsc_offset, 0); __set_vvmcs(vvmcs, VM_ENTRY_INTR_INFO, 0); } diff --git a/xen/arch/x86/hvm/vpt.c b/xen/arch/x86/hvm/vpt.c index f7af688..38541cf 100644 --- a/xen/arch/x86/hvm/vpt.c +++ b/xen/arch/x86/hvm/vpt.c @@ -36,7 +36,7 @@ void hvm_init_guest_time(struct domain *d) pl->last_guest_time = 0; } -u64 
hvm_get_guest_time(struct vcpu *v) +u64 hvm_get_guest_time_fixed(struct vcpu *v, u64 at_tsc) { struct pl_time *pl = &v->domain->arch.hvm_domain.pl_time; u64 now; @@ -45,11 +45,15 @@ u64 hvm_get_guest_time(struct vcpu *v) ASSERT(is_hvm_vcpu(v)); spin_lock(&pl->pl_time_lock); - now = get_s_time() + pl->stime_offset; - if ( (int64_t)(now - pl->last_guest_time) > 0 ) - pl->last_guest_time = now; - else - now = ++pl->last_guest_time; + now = get_s_time_fixed(at_tsc) + pl->stime_offset; + + if ( !at_tsc ) + { + if ( (int64_t)(now - pl->last_guest_time) > 0 ) + pl->last_guest_time = now; + else + now = ++pl->last_guest_time; + } spin_unlock(&pl->pl_time_lock); return now + v->arch.hvm_vcpu.stime_offset; diff --git a/xen/arch/x86/time.c b/xen/arch/x86/time.c index b84d127..31cf35f 100644 --- a/xen/arch/x86/time.c +++ b/xen/arch/x86/time.c @@ -716,19 +716,27 @@ static unsigned long get_cmos_time(void) * System Time ***************************************************************************/ -s_time_t get_s_time(void) +s_time_t get_s_time_fixed(u64 at_tsc) { struct cpu_time *t = &this_cpu(cpu_time); u64 tsc, delta; s_time_t now; - rdtscll(tsc); + if ( at_tsc ) + tsc = at_tsc; + else + rdtscll(tsc); delta = tsc - t->local_tsc_stamp; now = t->stime_local_stamp + scale_delta(delta, &t->tsc_scale); return now; } +s_time_t get_s_time() +{ + return get_s_time_fixed(0); +} + uint64_t tsc_ticks2ns(uint64_t ticks) { struct cpu_time *t = &this_cpu(cpu_time); @@ -1922,7 +1930,24 @@ void tsc_set_info(struct domain *d, } d->arch.incarnation = incarnation + 1; if ( is_hvm_domain(d) ) + { hvm_set_rdtsc_exiting(d, d->arch.vtsc); + if ( d->vcpu && d->vcpu[0] && incarnation == 0 ) + { + /* + * set_tsc_offset() is called from hvm_vcpu_initialise() before + * tsc_set_info(). New vtsc mode may require recomputing TSC + * offset. + * We only need to do this for BSP during initial boot. 
APs will
+     * call set_tsc_offset() later from hvm_vcpu_reset_state() and they
+     * will sync their TSC to BSP's sync_tsc.
+     */
+    rdtscll(d->arch.hvm_domain.sync_tsc);
+    hvm_funcs.set_tsc_offset(d->vcpu[0],
+                             d->vcpu[0]->arch.hvm_vcpu.cache_tsc_offset,
+                             d->arch.hvm_domain.sync_tsc);
+        }
+    }
 }
 
 /* vtsc may incur measurable performance degradation, diagnose with this */
diff --git a/xen/include/asm-x86/hvm/domain.h b/xen/include/asm-x86/hvm/domain.h
index b1e3187..9607911 100644
--- a/xen/include/asm-x86/hvm/domain.h
+++ b/xen/include/asm-x86/hvm/domain.h
@@ -90,6 +90,12 @@ struct hvm_domain {
     bool_t                 qemu_mapcache_invalidate;
     bool_t                 is_s3_suspended;
 
+    /*
+     * TSC value that VCPUs use to calculate their tsc_offset value.
+     * Used during initialization and save/restore
+     */
+    uint64_t               sync_tsc;
+
     union {
         struct vmx_domain vmx;
         struct svm_domain svm;
diff --git a/xen/include/asm-x86/hvm/hvm.h b/xen/include/asm-x86/hvm/hvm.h
index dcc3483..1fb3b99 100644
--- a/xen/include/asm-x86/hvm/hvm.h
+++ b/xen/include/asm-x86/hvm/hvm.h
@@ -137,7 +137,7 @@ struct hvm_function_table {
     int  (*get_guest_pat)(struct vcpu *v, u64 *);
     int  (*set_guest_pat)(struct vcpu *v, u64);
 
-    void (*set_tsc_offset)(struct vcpu *v, u64 offset);
+    void (*set_tsc_offset)(struct vcpu *v, u64 offset, u64 at_tsc);
 
     void (*inject_trap)(struct hvm_trap *trap);
 
@@ -233,11 +233,13 @@ void hvm_get_guest_pat(struct vcpu *v, u64 *guest_pat);
 int hvm_set_guest_pat(struct vcpu *v, u64 guest_pat);
 
 void hvm_set_guest_tsc(struct vcpu *v, u64 guest_tsc);
-u64 hvm_get_guest_tsc(struct vcpu *v);
+u64 hvm_get_guest_tsc_fixed(struct vcpu *v, u64 at_tsc);
+#define hvm_get_guest_tsc(v) hvm_get_guest_tsc_fixed(v, 0)
 
 void hvm_init_guest_time(struct domain *d);
 void hvm_set_guest_time(struct vcpu *v, u64 guest_time);
-u64 hvm_get_guest_time(struct vcpu *v);
+u64 hvm_get_guest_time_fixed(struct vcpu *v, u64 at_tsc);
+#define hvm_get_guest_time(v) hvm_get_guest_time_fixed(v, 0)
 
 int vmsi_deliver(
     struct domain *d, int vector,
diff --git a/xen/include/asm-x86/msr.h b/xen/include/asm-x86/msr.h
index 61f579a..52cae4b 100644
--- a/xen/include/asm-x86/msr.h
+++ b/xen/include/asm-x86/msr.h
@@ -78,9 +78,9 @@ static inline int wrmsr_safe(unsigned int msr, uint64_t val)
     __asm__ __volatile__("rdtsc" : "=a" (low) : : "edx")
 
 #define rdtscll(val) do { \
-     unsigned int a,d; \
-     asm volatile("rdtsc" : "=a" (a), "=d" (d)); \
-     (val) = ((unsigned long)a) | (((unsigned long)d)<<32); \
+     unsigned int _eax, _edx; \
+     asm volatile("rdtsc" : "=a" (_eax), "=d" (_edx)); \
+     (val) = ((unsigned long)_eax) | (((unsigned long)_edx)<<32); \
 } while(0)
 
 #define __write_tsc(val) wrmsrl(MSR_IA32_TSC, val)
diff --git a/xen/include/xen/time.h b/xen/include/xen/time.h
index 2703454..709501f 100644
--- a/xen/include/xen/time.h
+++ b/xen/include/xen/time.h
@@ -32,6 +32,7 @@ struct vcpu;
 typedef s64 s_time_t;
 #define PRI_stime PRId64
 
+s_time_t get_s_time_fixed(u64 at_tick);
 s_time_t get_s_time(void);
 unsigned long get_localtime(struct domain *d);
 uint64_t get_localtime_us(struct domain *d);
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 16+ messages in thread
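[Editor's note: a minimal sketch — not the Xen macro itself — of the namespace clash the renamed rdtscll() locals avoid. With locals named `a` and `d`, invoking `rdtscll(a)` would make `(val)` bind to the macro's own shadowing local, so the caller's variable would silently never be written. A stand-in counter replaces the RDTSC instruction here so the behaviour is testable.]

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for the hardware time-stamp counter; hypothetical values. */
static uint32_t fake_tsc_lo = 0x89abcdefu;
static uint32_t fake_tsc_hi = 0x01234567u;

/* Same shape as the patched macro: underscore-prefixed locals cannot
 * shadow a caller variable passed in as 'val'. */
#define rdtscll_sketch(val) do {                                \
        uint32_t _eax = fake_tsc_lo, _edx = fake_tsc_hi;        \
        (val) = ((uint64_t)_eax) | (((uint64_t)_edx) << 32);    \
    } while (0)
```

With the pre-patch locals `a` and `d`, a call site such as `rdtscll(a)` would have assigned the result to the macro's inner `a` instead of the caller's.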
* Re: [PATCH v4 2/3] x86/svn: Enable TSC scaling
  2014-04-16  1:27 ` [PATCH v4 2/3] x86/svn: Enable TSC scaling Boris Ostrovsky
@ 2014-04-16 10:21   ` Jan Beulich
  2014-04-16 10:53   ` Andrew Cooper
  1 sibling, 0 replies; 16+ messages in thread
From: Jan Beulich @ 2014-04-16 10:21 UTC (permalink / raw)
To: Boris Ostrovsky
Cc: kevin.tian, eddie.dong, jun.nakajima, suravee.suthikulpanit, xen-devel

>>> On 16.04.14 at 03:27, <boris.ostrovsky@oracle.com> wrote:
> TSC ratio enabling logic is inverted: we want to use it when we
> are running in native tsc mode, i.e. when d->arch.vtsc is zero.
>
> Also, since now svm_set_tsc_offset()'s calculations depend
> on vtsc's value, we need to call hvm_funcs.set_tsc_offset() after
> vtsc changes in tsc_set_info().
>
> In addition, with TSC ratio enabled, svm_set_tsc_offset() will
> need to do rdtsc. With that we may end up having TSCs on guest's
> processors out of sync. d->arch.hvm_domain.sync_tsc which is set
> by the boot processor can now be used by APs as reference TSC
> value instead of host's current TSC.
>
> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>

Reviewed-by: Jan Beulich <jbeulich@suse.com>
with two cosmetic remarks:

> @@ -688,11 +688,14 @@ static void svm_set_tsc_offset(struct vcpu *v, u64 offset)
>      struct domain *d = v->domain;
>      uint64_t host_tsc, guest_tsc;
>
> -    guest_tsc = hvm_get_guest_tsc(v);
> +    guest_tsc = hvm_get_guest_tsc_fixed(v, at_tsc);
>
>      /* Re-adjust the offset value when TSC_RATIO is available */
> -    if ( cpu_has_tsc_ratio && d->arch.vtsc ) {
> -        rdtscll(host_tsc);
> +    if ( cpu_has_tsc_ratio && !d->arch.vtsc ) {

Please fix the coding style violation here if you already touch the line.

> --- a/xen/include/asm-x86/hvm/domain.h
> +++ b/xen/include/asm-x86/hvm/domain.h
> @@ -90,6 +90,12 @@ struct hvm_domain {
>      bool_t                 qemu_mapcache_invalidate;
>      bool_t                 is_s3_suspended;
>
> +    /*
> +     * TSC value that VCPUs use to calculate their tsc_offset value.
> +     * Used during initialization and save/restore

Stop missing at end of sentence.

Jan

^ permalink raw reply	[flat|nested] 16+ messages in thread
* Re: [PATCH v4 2/3] x86/svn: Enable TSC scaling
  2014-04-16  1:27 ` [PATCH v4 2/3] x86/svn: Enable TSC scaling Boris Ostrovsky
  2014-04-16 10:21   ` Jan Beulich
@ 2014-04-16 10:53   ` Andrew Cooper
  1 sibling, 0 replies; 16+ messages in thread
From: Andrew Cooper @ 2014-04-16 10:53 UTC (permalink / raw)
To: Boris Ostrovsky
Cc: kevin.tian, jbeulich, eddie.dong, xen-devel, jun.nakajima,
	suravee.suthikulpanit

On 16/04/14 02:27, Boris Ostrovsky wrote:
> TSC ratio enabling logic is inverted: we want to use it when we
> are running in native tsc mode, i.e. when d->arch.vtsc is zero.
>
> Also, since now svm_set_tsc_offset()'s calculations depend
> on vtsc's value, we need to call hvm_funcs.set_tsc_offset() after
> vtsc changes in tsc_set_info().
>
> In addition, with TSC ratio enabled, svm_set_tsc_offset() will
> need to do rdtsc. With that we may end up having TSCs on guest's
> processors out of sync. d->arch.hvm_domain.sync_tsc which is set
> by the boot processor can now be used by APs as reference TSC
> value instead of host's current TSC.
>
> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>

"svm" in the subject?

~Andrew

> ---
>  xen/arch/x86/hvm/hvm.c           |   15 ++++++++++-----
>  xen/arch/x86/hvm/svm/svm.c       |   15 +++++++++------
>  xen/arch/x86/hvm/vmx/vmx.c       |    2 +-
>  xen/arch/x86/hvm/vmx/vvmx.c      |    4 ++--
>  xen/arch/x86/hvm/vpt.c           |   16 ++++++++++------
>  xen/arch/x86/time.c              |   29 +++++++++++++++++++++++++++--
>  xen/include/asm-x86/hvm/domain.h |    6 ++++++
>  xen/include/asm-x86/hvm/hvm.h    |    8 +++++---
>  xen/include/asm-x86/msr.h        |    6 +++---
>  xen/include/xen/time.h           |    1 +
>  10 files changed, 74 insertions(+), 28 deletions(-)

^ permalink raw reply	[flat|nested] 16+ messages in thread
* [PATCH v4 3/3] x86/HVM: Use fixed TSC value when saving or restoring domain
  2014-04-16  1:27 [PATCH v4 0/3] Time-related fixes for migration Boris Ostrovsky
  2014-04-16  1:27 ` [PATCH v4 1/3] x86: Use native RDTSC(P) execution when guest and host frequencies are the same Boris Ostrovsky
  2014-04-16  1:27 ` [PATCH v4 2/3] x86/svn: Enable TSC scaling Boris Ostrovsky
@ 2014-04-16  1:27 ` Boris Ostrovsky
  2014-04-16 10:22   ` Jan Beulich
  2 siblings, 1 reply; 16+ messages in thread
From: Boris Ostrovsky @ 2014-04-16  1:27 UTC (permalink / raw)
To: xen-devel
Cc: kevin.tian, suravee.suthikulpanit, eddie.dong, jbeulich,
	jun.nakajima, boris.ostrovsky

When a domain is saved each VCPU's TSC value needs to be preserved. To get it
we use hvm_get_guest_tsc(). This routine (either itself or via get_s_time()
which it may call) calculates VCPU's TSC based on current host's TSC value
(by doing a rdtscll()). Since this is performed for each VCPU separately we
end up with un-synchronized TSCs.

Similarly, during a restore each VCPU is assigned its TSC based on host's
current tick, causing virtual TSCs to diverge further.

With this, we can easily get into situation where a guest may see time going
backwards.

Instead of reading new TSC value for each VCPU when saving/restoring it we
should use the same value across all VCPUs.
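[Editor's note: the divergence described above can be modeled in a few lines of C — a toy sketch with made-up tick values, not Xen code. A guest TSC is the host TSC plus a per-VCPU offset; sampling the host TSC afresh for every VCPU bakes the inter-sample delay into the saved values, while one shared `sync_tsc` sample keeps them identical.]

```c
#include <assert.h>
#include <stdint.h>

static uint64_t host_tsc = 1000000;  /* hypothetical starting tick */

/* Each read models a later sample of the host TSC: time passes
 * between handling one VCPU and the next. */
static uint64_t read_host_tsc(void)
{
    return host_tsc += 5000;
}

static uint64_t guest_tsc_at(uint64_t offset, uint64_t at_tsc)
{
    return at_tsc + offset;  /* guest TSC = host TSC + per-VCPU offset */
}

/* Old scheme: a fresh host-TSC read per VCPU skews the saved values. */
static uint64_t save_vcpu_old(uint64_t offset)
{
    return guest_tsc_at(offset, read_host_tsc());
}

/* New scheme: one shared sync_tsc sample for every VCPU. */
static uint64_t save_vcpu_fixed(uint64_t offset, uint64_t sync_tsc)
{
    return guest_tsc_at(offset, sync_tsc);
}
```

Two VCPUs with identical offsets come out of `save_vcpu_old()` 5000 ticks apart, but identical out of `save_vcpu_fixed()`.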
Reported-by: Philippe Coquard <philippe.coquard@mpsa.com>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
---
 xen/arch/x86/hvm/hvm.c        |   10 +++++++---
 xen/arch/x86/hvm/save.c       |    6 ++++++
 xen/arch/x86/hvm/svm/svm.c    |    5 +++--
 xen/arch/x86/hvm/vmx/vmx.c    |    5 +++--
 xen/include/asm-x86/hvm/hvm.h |    3 ++-
 5 files changed, 21 insertions(+), 8 deletions(-)

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 2a7824c..9b5b337 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -255,16 +255,20 @@ int hvm_set_guest_pat(struct vcpu *v, u64 guest_pat)
     return 1;
 }
 
-void hvm_set_guest_tsc(struct vcpu *v, u64 guest_tsc)
+void hvm_set_guest_tsc_fixed(struct vcpu *v, u64 guest_tsc, u64 at_tsc)
 {
     uint64_t tsc;
     uint64_t delta_tsc;
 
     if ( v->domain->arch.vtsc )
     {
-        tsc = hvm_get_guest_time(v);
+        tsc = hvm_get_guest_time_fixed(v, at_tsc);
         tsc = gtime_to_gtsc(v->domain, tsc);
     }
+    else if ( at_tsc )
+    {
+        tsc = at_tsc;
+    }
     else
     {
         rdtscll(tsc);
@@ -275,7 +279,7 @@ void hvm_set_guest_tsc(struct vcpu *v, u64 guest_tsc)
                 - v->arch.hvm_vcpu.cache_tsc_offset;
     v->arch.hvm_vcpu.cache_tsc_offset = delta_tsc;
 
-    hvm_funcs.set_tsc_offset(v, v->arch.hvm_vcpu.cache_tsc_offset, 0);
+    hvm_funcs.set_tsc_offset(v, v->arch.hvm_vcpu.cache_tsc_offset, at_tsc);
 }
 
 void hvm_set_guest_tsc_adjust(struct vcpu *v, u64 tsc_adjust)
diff --git a/xen/arch/x86/hvm/save.c b/xen/arch/x86/hvm/save.c
index 066fdb2..6af19be 100644
--- a/xen/arch/x86/hvm/save.c
+++ b/xen/arch/x86/hvm/save.c
@@ -34,6 +34,9 @@ void arch_hvm_save(struct domain *d, struct hvm_save_header *hdr)
 
     /* Save guest's preferred TSC. */
     hdr->gtsc_khz = d->arch.tsc_khz;
+
+    /* Time when saving started */
+    rdtscll(d->arch.hvm_domain.sync_tsc);
 }
 
 int arch_hvm_load(struct domain *d, struct hvm_save_header *hdr)
@@ -67,6 +70,9 @@ int arch_hvm_load(struct domain *d, struct hvm_save_header *hdr)
     if ( d->arch.vtsc )
         hvm_set_rdtsc_exiting(d, 1);
 
+    /* Time when restore started */
+    rdtscll(d->arch.hvm_domain.sync_tsc);
+
     /* VGA state is not saved/restored, so we nobble the cache. */
     d->arch.hvm_domain.stdvga.cache = 0;
diff --git a/xen/arch/x86/hvm/svm/svm.c b/xen/arch/x86/hvm/svm/svm.c
index e0f2e9e..2e2e624 100644
--- a/xen/arch/x86/hvm/svm/svm.c
+++ b/xen/arch/x86/hvm/svm/svm.c
@@ -318,7 +318,8 @@ static void svm_save_cpu_state(struct vcpu *v, struct hvm_hw_cpu *data)
     data->msr_efer  = v->arch.hvm_vcpu.guest_efer;
     data->msr_flags = -1ULL;
 
-    data->tsc = hvm_get_guest_tsc(v);
+    data->tsc = hvm_get_guest_tsc_fixed(v,
+                                        v->domain->arch.hvm_domain.sync_tsc);
 }
 
@@ -334,7 +335,7 @@ static void svm_load_cpu_state(struct vcpu *v, struct hvm_hw_cpu *data)
     v->arch.hvm_vcpu.guest_efer = data->msr_efer;
     svm_update_guest_efer(v);
 
-    hvm_set_guest_tsc(v, data->tsc);
+    hvm_set_guest_tsc_fixed(v, data->tsc, v->domain->arch.hvm_domain.sync_tsc);
 }
 
 static void svm_save_vmcb_ctxt(struct vcpu *v, struct hvm_hw_cpu *ctxt)
diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index 402ed95..a013d13 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -540,7 +540,8 @@ static void vmx_save_cpu_state(struct vcpu *v, struct hvm_hw_cpu *data)
     data->msr_star = guest_state->msrs[VMX_INDEX_MSR_STAR];
     data->msr_syscall_mask = guest_state->msrs[VMX_INDEX_MSR_SYSCALL_MASK];
 
-    data->tsc = hvm_get_guest_tsc(v);
+    data->tsc = hvm_get_guest_tsc_fixed(v,
+                                        v->domain->arch.hvm_domain.sync_tsc);
 }
 
 static void vmx_load_cpu_state(struct vcpu *v, struct hvm_hw_cpu *data)
@@ -556,7 +557,7 @@ static void vmx_load_cpu_state(struct vcpu *v, struct hvm_hw_cpu *data)
     v->arch.hvm_vmx.cstar     = data->msr_cstar;
     v->arch.hvm_vmx.shadow_gs = data->shadow_gs;
 
-    hvm_set_guest_tsc(v, data->tsc);
+    hvm_set_guest_tsc_fixed(v, data->tsc, v->domain->arch.hvm_domain.sync_tsc);
 }
diff --git a/xen/include/asm-x86/hvm/hvm.h b/xen/include/asm-x86/hvm/hvm.h
index 1fb3b99..31043b2 100644
--- a/xen/include/asm-x86/hvm/hvm.h
+++ b/xen/include/asm-x86/hvm/hvm.h
@@ -232,7 +232,8 @@ bool_t hvm_send_assist_req(struct vcpu *v);
 void hvm_get_guest_pat(struct vcpu *v, u64 *guest_pat);
 int hvm_set_guest_pat(struct vcpu *v, u64 guest_pat);
 
-void hvm_set_guest_tsc(struct vcpu *v, u64 guest_tsc);
+void hvm_set_guest_tsc_fixed(struct vcpu *v, u64 guest_tsc, u64 at_tsc);
+#define hvm_set_guest_tsc(v, t) hvm_set_guest_tsc_fixed(v, t, 0)
 
 u64 hvm_get_guest_tsc_fixed(struct vcpu *v, u64 at_tsc);
 #define hvm_get_guest_tsc(v) hvm_get_guest_tsc_fixed(v, 0)
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 16+ messages in thread
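[Editor's note: as background for the offset-based interfaces in the patch above — hardware virtualization applies guest_tsc = host_tsc + tsc_offset, so the offset that makes a guest read a desired value at an agreed-upon host tick (`at_tsc`) is simply their difference. A sketch with hypothetical helper names, not Xen code; note that unsigned 64-bit subtraction wraps modulo 2^64, which is exactly what a "negative" offset needs.]

```c
#include <assert.h>
#include <stdint.h>

/* Offset that makes the guest observe 'desired_guest_tsc' at host
 * tick 'at_tsc'.  Wraps mod 2^64 when the guest runs behind the host. */
static uint64_t tsc_offset_for(uint64_t desired_guest_tsc, uint64_t at_tsc)
{
    return desired_guest_tsc - at_tsc;
}

/* What the guest's RDTSC returns once that offset is programmed. */
static uint64_t guest_tsc_seen(uint64_t host_tsc, uint64_t offset)
{
    return host_tsc + offset;
}
```

Because every VCPU's offset is computed against the same `at_tsc`, the guest TSCs then advance in lockstep with the host counter.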
* Re: [PATCH v4 3/3] x86/HVM: Use fixed TSC value when saving or restoring domain
  2014-04-16  1:27 ` [PATCH v4 3/3] x86/HVM: Use fixed TSC value when saving or restoring domain Boris Ostrovsky
@ 2014-04-16 10:22   ` Jan Beulich
  0 siblings, 0 replies; 16+ messages in thread
From: Jan Beulich @ 2014-04-16 10:22 UTC (permalink / raw)
To: Boris Ostrovsky
Cc: kevin.tian, eddie.dong, jun.nakajima, suravee.suthikulpanit, xen-devel

>>> On 16.04.14 at 03:27, <boris.ostrovsky@oracle.com> wrote:
> When a domain is saved each VCPU's TSC value needs to be preserved. To get it
> we use hvm_get_guest_tsc(). This routine (either itself or via get_s_time()
> which it may call) calculates VCPU's TSC based on current host's TSC value
> (by doing a rdtscll()). Since this is performed for each VCPU separately we
> end up with un-synchronized TSCs.
>
> Similarly, during a restore each VCPU is assigned its TSC based on host's
> current tick, causing virtual TSCs to diverge further.
>
> With this, we can easily get into situation where a guest may see time going
> backwards.
>
> Instead of reading new TSC value for each VCPU when saving/restoring it we
> should use the same value across all VCPUs.
>
> Reported-by: Philippe Coquard <philippe.coquard@mpsa.com>
> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>

Reviewed-by: Jan Beulich <jbeulich@suse.com>

^ permalink raw reply	[flat|nested] 16+ messages in thread
end of thread, other threads:[~2014-04-17  6:29 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-04-16  1:27 [PATCH v4 0/3] Time-related fixes for migration Boris Ostrovsky
2014-04-16  1:27 ` [PATCH v4 1/3] x86: Use native RDTSC(P) execution when guest and host frequencies are the same Boris Ostrovsky
2014-04-16  9:54   ` Jan Beulich
2014-04-16 13:02     ` Boris Ostrovsky
2014-04-16 11:38   ` Jan Beulich
2014-04-16 14:28     ` Boris Ostrovsky
2014-04-16 14:37       ` Jan Beulich
2014-04-16 15:33         ` Boris Ostrovsky
2014-04-16 16:15           ` Jan Beulich
2014-04-16 16:44             ` Boris Ostrovsky
2014-04-17  6:29               ` Jan Beulich
2014-04-16  1:27 ` [PATCH v4 2/3] x86/svn: Enable TSC scaling Boris Ostrovsky
2014-04-16 10:21   ` Jan Beulich
2014-04-16 10:53   ` Andrew Cooper
2014-04-16  1:27 ` [PATCH v4 3/3] x86/HVM: Use fixed TSC value when saving or restoring domain Boris Ostrovsky
2014-04-16 10:22   ` Jan Beulich