* [PATCH 0/2] Avoid using TSC clocksource on AMD APUs affected by erratum 778
@ 2014-07-31 9:47 Igor Mammedov
  2014-07-31 9:47 ` [PATCH 1/2] x86: AMD: mark TSC unstable on APU family 15h models 10h-1fh Igor Mammedov
  2014-07-31 9:47 ` [PATCH 2/2] x86: kvm: do not advertise stable clocksource if CPU has TSC drift BUG Igor Mammedov
  0 siblings, 2 replies; 7+ messages in thread
From: Igor Mammedov @ 2014-07-31 9:47 UTC
To: linux-kernel; +Cc: x86, hpa, mingo, tglx, pbonzini, kvm, mtosatti

Fixes pvclock backwards jumps caused by the TSC drifting despite the
host believing that the TSC is invariant/synchronized.

The TSC drift may be caused by erratum 778, described in
"Revision Guide for AMD Family 15h Models 10h-1Fh Processors,
Publication # 48931, Issue Date: May 2013, Revision: 3.10"

Igor Mammedov (2):
  x86: AMD: mark TSC unstable on APU family 15h models 10h-1fh
  x86: kvm: do not advertise stable clocksource if CPU has TSC drift BUG

 arch/x86/include/asm/cpufeature.h | 1 +
 arch/x86/kernel/cpu/amd.c         | 9 +++++++++
 arch/x86/kvm/cpuid.c              | 3 ++-
 3 files changed, 12 insertions(+), 1 deletion(-)

-- 
1.8.3.1

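The backwards jump described in the cover letter can be illustrated with a toy model. This is only a sketch under stated assumptions, not the kernel's pvclock code: `struct cpu_tsc`, `read_tsc()`, `read_monotonic()` and the drift values are hypothetical names and numbers chosen for illustration. It assumes the "can't go backwards" behaviour amounts to clamping each reading against a shared high-water mark.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical model: a CPU's TSC is the true cycle count plus its drift. */
struct cpu_tsc {
	int64_t drift_cycles;
};

/* Raw per-CPU read: trusts the local TSC, as a "stable" clock would. */
static uint64_t read_tsc(const struct cpu_tsc *cpu, uint64_t true_cycles)
{
	return true_cycles + (uint64_t)cpu->drift_cycles;
}

/* Highest value handed out so far, shared by all CPUs. */
static _Atomic uint64_t last_value;

/* Clamped read: never returns less than a value already returned. */
static uint64_t read_monotonic(const struct cpu_tsc *cpu, uint64_t true_cycles)
{
	uint64_t now = read_tsc(cpu, true_cycles);
	uint64_t last = atomic_load(&last_value);

	do {
		if (now <= last)
			return last;	/* another reader is already ahead */
		/* on failure, 'last' is reloaded with the current value */
	} while (!atomic_compare_exchange_weak(&last_value, &last, now));

	return now;
}
```

With a CPU drifted ahead by 500 cycles read at true time 1000 and an undrifted CPU read at true time 1100, `read_tsc()` yields 1500 followed by 1100 (a backwards jump), while `read_monotonic()` yields 1500 followed by 1500. The price of the clamp is a shared cache line that every reader touches.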
* [PATCH 1/2] x86: AMD: mark TSC unstable on APU family 15h models 10h-1fh
From: Igor Mammedov @ 2014-07-31 9:47 UTC
To: linux-kernel; +Cc: x86, hpa, mingo, tglx, pbonzini, kvm, mtosatti

Due to erratum #778 from
"Revision Guide for AMD Family 15h Models 10h-1Fh Processors,
Publication # 48931, Issue Date: May 2013, Revision: 3.10"

the TSC on a core of an affected processor may drift under certain
conditions, which causes initially synchronized TSCs to become
unsynchronized.

As a result, the TSC clocksource becomes unsuitable for use as a
wallclock, and it breaks pvclock when it is running with the
PVCLOCK_TSC_STABLE_BIT flag set.
That causes backwards clock jumps when pvclock is first read on a
CPU with a drifted TSC and then on a CPU where the TSC was stable or
had a lower drift rate.

To fix the issue, mark the TSC as unstable on affected CPUs so it won't
be used as a clocksource. This in turn disables the master_clock
mechanism in KVM and forces pvclock to use the global clock counter,
which can't go backwards.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
---
 arch/x86/include/asm/cpufeature.h | 1 +
 arch/x86/kernel/cpu/amd.c         | 9 +++++++++
 2 files changed, 10 insertions(+)

diff --git a/arch/x86/include/asm/cpufeature.h b/arch/x86/include/asm/cpufeature.h
index e265ff9..c47a2a77 100644
--- a/arch/x86/include/asm/cpufeature.h
+++ b/arch/x86/include/asm/cpufeature.h
@@ -236,6 +236,7 @@
 #define X86_BUG_COMA		X86_BUG(2) /* Cyrix 6x86 coma */
 #define X86_BUG_AMD_TLB_MMATCH	X86_BUG(3) /* AMD Erratum 383 */
 #define X86_BUG_AMD_APIC_C1E	X86_BUG(4) /* AMD Erratum 400 */
+#define X86_BUG_AMD_TSC_DRIFT	X86_BUG(5) /* AMD Erratum 778 */
 
 #if defined(__KERNEL__) && !defined(__ASSEMBLY__)
 
diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
index ce8b8ff..5623eb8 100644
--- a/arch/x86/kernel/cpu/amd.c
+++ b/arch/x86/kernel/cpu/amd.c
@@ -513,6 +513,7 @@ static void early_init_amd(struct cpuinfo_x86 *c)
 
 static const int amd_erratum_383[];
 static const int amd_erratum_400[];
+static const int amd_erratum_778[];
 static bool cpu_has_amd_erratum(struct cpuinfo_x86 *cpu, const int *erratum);
 
 static void init_amd(struct cpuinfo_x86 *c)
@@ -721,6 +722,11 @@ static void init_amd(struct cpuinfo_x86 *c)
 	if (cpu_has_amd_erratum(c, amd_erratum_400))
 		set_cpu_bug(c, X86_BUG_AMD_APIC_C1E);
 
+	if (cpu_has_amd_erratum(c, amd_erratum_778)) {
+		set_cpu_bug(c, X86_BUG_AMD_TSC_DRIFT);
+		mark_tsc_unstable("possible TSC drift as per erratum #778");
+	}
+
 	rdmsr_safe(MSR_AMD64_PATCH_LEVEL, &c->microcode, &dummy);
 }
 
@@ -857,6 +863,9 @@ static const int amd_erratum_383[] =
 	AMD_OSVW_ERRATUM(3, AMD_MODEL_RANGE(0x10, 0, 0, 0xff, 0xf));
 
+static const int amd_erratum_778[] =
+	AMD_LEGACY_ERRATUM(AMD_MODEL_RANGE(0x15, 0x10, 0, 0x1f, 0xf));
+
 
 static bool cpu_has_amd_erratum(struct cpuinfo_x86 *cpu, const int *erratum)
 {
 	int osvw_id = *erratum++;
-- 
1.8.3.1

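For readers unfamiliar with the erratum tables, the sketch below shows how a family/model/stepping range such as the one added for erratum 778 can be packed into one word and matched against a CPU. It is a simplified userspace model, not the kernel code: the `MODEL_RANGE*` macros, `struct cpu_id` and `cpu_in_ranges()` are illustrative names, the bit layout is an assumption modelled on the `AMD_MODEL_RANGE` arguments in the patch, and the OSVW lookup that `cpu_has_amd_erratum()` performs for non-legacy errata is left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout: family | start model | start stepping | end model | end stepping */
#define MODEL_RANGE(f, m_start, s_start, m_end, s_end)			\
	(((uint32_t)(f) << 24) | ((uint32_t)(m_start) << 16) |		\
	 ((uint32_t)(s_start) << 12) | ((uint32_t)(m_end) << 4) |	\
	 (uint32_t)(s_end))
#define MODEL_RANGE_FAMILY(r)	(((r) >> 24) & 0xff)
#define MODEL_RANGE_START(r)	(((r) >> 12) & 0xfff)	/* model:stepping */
#define MODEL_RANGE_END(r)	((r) & 0xfff)		/* model:stepping */

/* Hypothetical stand-in for the relevant fields of struct cpuinfo_x86. */
struct cpu_id {
	unsigned int family;
	unsigned int model;
	unsigned int stepping;
};

/* Family 15h, models 10h-1Fh, any stepping; a zero entry ends the list. */
static const uint32_t erratum_778_ranges[] = {
	MODEL_RANGE(0x15, 0x10, 0x0, 0x1f, 0xf),
	0,
};

/* True if the CPU falls inside any range of the zero-terminated list. */
static bool cpu_in_ranges(const struct cpu_id *cpu, const uint32_t *ranges)
{
	uint32_t ms = (cpu->model << 4) | cpu->stepping;

	for (; *ranges; ranges++) {
		if (cpu->family == MODEL_RANGE_FAMILY(*ranges) &&
		    ms >= MODEL_RANGE_START(*ranges) &&
		    ms <= MODEL_RANGE_END(*ranges))
			return true;
	}
	return false;
}
```

Under these assumptions a family 15h, model 13h part matches, while family 15h models 02h and 20h and any family 10h part do not, which is the intended scope of the patch.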
* Re: [PATCH 1/2] x86: AMD: mark TSC unstable on APU family 15h models 10h-1fh
From: Borislav Petkov @ 2014-07-31 15:47 UTC
To: Igor Mammedov
Cc: linux-kernel, x86, hpa, mingo, tglx, pbonzini, kvm, mtosatti

On Thu, Jul 31, 2014 at 09:47:12AM +0000, Igor Mammedov wrote:
> Due to erratum #778 from
> "Revision Guide for AMD Family 15h Models 10h-1Fh Processors,
> Publication # 48931, Issue Date: May 2013, Revision: 3.10"
>
> TSC on affected processor, a core may drift under certain conditions,
> which makes initially synchronized TSCs to become unsynchronized.

Is this something you're seeing on a real system? If so, how do you
trigger this?

Thanks.

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

* Re: [PATCH 1/2] x86: AMD: mark TSC unstable on APU family 15h models 10h-1fh
From: Paolo Bonzini @ 2014-07-31 16:33 UTC
To: Borislav Petkov, Igor Mammedov
Cc: linux-kernel, x86, hpa, mingo, tglx, kvm, mtosatti

On 31/07/2014 17:47, Borislav Petkov wrote:
> On Thu, Jul 31, 2014 at 09:47:12AM +0000, Igor Mammedov wrote:
>> Due to erratum #778 from
>> "Revision Guide for AMD Family 15h Models 10h-1Fh Processors,
>> Publication # 48931, Issue Date: May 2013, Revision: 3.10"
>>
>> TSC on affected processor, a core may drift under certain conditions,
>> which makes initially synchronized TSCs to become unsynchronized.
>
> Is this something you're seeing on a real system? If so, how do you
> trigger this?

http://thread.gmane.org/gmane.linux.kernel/1748516 says that Ingo's
time-warp-test fails miserably on this machine. (The test is at
http://people.redhat.com/mingo/time-warp-test/time-warp-test.c)

Paolo

* Re: [PATCH 1/2] x86: AMD: mark TSC unstable on APU family 15h models 10h-1fh
From: Borislav Petkov @ 2014-08-01 9:02 UTC
To: Igor Mammedov
Cc: linux-kernel, x86, hpa, mingo, tglx, pbonzini, kvm, mtosatti

On Thu, Jul 31, 2014 at 09:47:12AM +0000, Igor Mammedov wrote:
> Due to erratum #778 from
> "Revision Guide for AMD Family 15h Models 10h-1Fh Processors,
> Publication # 48931, Issue Date: May 2013, Revision: 3.10"
>
> TSC on affected processor, a core may drift under certain conditions,
> which makes initially synchronized TSCs to become unsynchronized.
>
> As result TSC clocksource becomes unsuitable for using as wallclock
> and it brakes pvclock when it's running with PVCLOCK_TSC_STABLE_BIT
> flag set.
> That causes backwards clock jumps when pvclock is first read on
> CPU with drifted TSC and then on CPU where TSC was stable or had
> a lower drift rate.
>
> To fix issue mark TSC as unstable on affected CPU, so it won't
> be used as clocksource. Which in turn disables master_clock
> mechanism in KVM and force pvclock using global clock counter
> that can't go backwards.
>
> Signed-off-by: Igor Mammedov <imammedo@redhat.com>

Acked-by: Borislav Petkov <bp@suse.de>

* [PATCH 2/2] x86: kvm: do not advertise stable clocksource if CPU has TSC drift BUG
From: Igor Mammedov @ 2014-07-31 9:47 UTC
To: linux-kernel; +Cc: x86, hpa, mingo, tglx, pbonzini, kvm, mtosatti

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
---
 arch/x86/kvm/cpuid.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 38a0afe..f519823 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -478,8 +478,9 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
 			     (1 << KVM_FEATURE_CLOCKSOURCE2) |
 			     (1 << KVM_FEATURE_ASYNC_PF) |
 			     (1 << KVM_FEATURE_PV_EOI) |
-			     (1 << KVM_FEATURE_CLOCKSOURCE_STABLE_BIT) |
 			     (1 << KVM_FEATURE_PV_UNHALT);
+		if (!static_cpu_has_bug(X86_BUG_AMD_TSC_DRIFT))
+			entry->eax |= (1 << KVM_FEATURE_CLOCKSOURCE_STABLE_BIT);
 
 		if (sched_info_on())
 			entry->eax |= (1 << KVM_FEATURE_STEAL_TIME);
-- 
1.8.3.1

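Since this patch carries no description, a small standalone sketch of its effect may help. It only mirrors the shape of the change: `build_kvm_features()` and `guest_sees_stable_clock()` are hypothetical helpers, the boolean parameter stands in for `static_cpu_has_bug(X86_BUG_AMD_TSC_DRIFT)`, and the bit numbers are written from memory of the KVM paravirt ABI header, so they should be checked against the real definitions before being relied on.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions assumed to match the KVM paravirt feature leaf. */
#define KVM_FEATURE_CLOCKSOURCE2		3
#define KVM_FEATURE_ASYNC_PF			4
#define KVM_FEATURE_PV_EOI			6
#define KVM_FEATURE_PV_UNHALT			7
#define KVM_FEATURE_CLOCKSOURCE_STABLE_BIT	24

/* Build a feature word; only a subset of the real feature list is shown. */
static uint32_t build_kvm_features(bool host_has_tsc_drift_bug)
{
	uint32_t eax = (1u << KVM_FEATURE_CLOCKSOURCE2) |
		       (1u << KVM_FEATURE_ASYNC_PF) |
		       (1u << KVM_FEATURE_PV_EOI) |
		       (1u << KVM_FEATURE_PV_UNHALT);

	/* Advertise a stable clocksource only if the host TSC can be trusted. */
	if (!host_has_tsc_drift_bug)
		eax |= 1u << KVM_FEATURE_CLOCKSOURCE_STABLE_BIT;

	return eax;
}

/* Guest-side view: is the stable-clocksource bit set in the feature word? */
static bool guest_sees_stable_clock(uint32_t eax)
{
	return (eax & (1u << KVM_FEATURE_CLOCKSOURCE_STABLE_BIT)) != 0;
}
```

The other feature bits are unaffected by the host bug flag; only the stable-clocksource bit becomes conditional.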
* Re: [PATCH 2/2] x86: kvm: do not advertise stable clocksource if CPU has TSC drift BUG
From: Paolo Bonzini @ 2014-07-31 14:38 UTC
To: Igor Mammedov, linux-kernel; +Cc: x86, hpa, mingo, tglx, kvm, mtosatti

On 31/07/2014 11:47, Igor Mammedov wrote:
> Signed-off-by: Igor Mammedov <imammedo@redhat.com>
> ---
>  arch/x86/kvm/cpuid.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> index 38a0afe..f519823 100644
> --- a/arch/x86/kvm/cpuid.c
> +++ b/arch/x86/kvm/cpuid.c
> @@ -478,8 +478,9 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
>  			     (1 << KVM_FEATURE_CLOCKSOURCE2) |
>  			     (1 << KVM_FEATURE_ASYNC_PF) |
>  			     (1 << KVM_FEATURE_PV_EOI) |
> -			     (1 << KVM_FEATURE_CLOCKSOURCE_STABLE_BIT) |
>  			     (1 << KVM_FEATURE_PV_UNHALT);
> +		if (!static_cpu_has_bug(X86_BUG_AMD_TSC_DRIFT))
> +			entry->eax |= (1 << KVM_FEATURE_CLOCKSOURCE_STABLE_BIT);
>
>  		if (sched_info_on())
>  			entry->eax |= (1 << KVM_FEATURE_STEAL_TIME);
>

Marcelo, is there anything we can do if the VM migrates from a sane host
to a buggy one?

Paolo
