* [PATCH 0/9] pvclock misc fixes - v4.
@ 2010-05-11 16:17 Glauber Costa
2010-05-11 16:17 ` [PATCH 1/9] Enable pvclock flags in vcpu_time_info structure Glauber Costa
` (3 more replies)
0 siblings, 4 replies; 17+ messages in thread
From: Glauber Costa @ 2010-05-11 16:17 UTC (permalink / raw)
To: kvm; +Cc: linux-kernel, avi, zamsden
This is the fourth version ov kvmclock fixes.
Just two minor changes in patch 5, per avi request, and
the addition of cpuid.txt file, documenting all cpuid flags
we use.
As a side effect, this patch removes the time-travel feature
in kvm guests.
Glauber Costa (9):
Enable pvclock flags in vcpu_time_info structure
Add a global synchronization point for pvclock
change msr numbers for kvmclock
add new KVMCLOCK cpuid feature
export paravirtual cpuid flags in KVM_GET_SUPPORTED_CPUID
Try using new kvm clock msrs
don't compute pvclock adjustments if we trust the tsc
Tell the guest we'll warn it about tsc stability
Add cpuid.txt file
Documentation/kvm/cpuid.txt | 44 ++++++++++++++++++++++++++++
arch/x86/include/asm/kvm_para.h | 13 ++++++++
arch/x86/include/asm/pvclock-abi.h | 4 ++-
arch/x86/include/asm/pvclock.h | 1 +
arch/x86/kernel/kvmclock.c | 56 ++++++++++++++++++++++-------------
arch/x86/kernel/pvclock.c | 37 +++++++++++++++++++++++
arch/x86/kvm/x86.c | 44 +++++++++++++++++++++++++++-
7 files changed, 176 insertions(+), 23 deletions(-)
create mode 100644 Documentation/kvm/cpuid.txt
^ permalink raw reply [flat|nested] 17+ messages in thread
* [PATCH 1/9] Enable pvclock flags in vcpu_time_info structure
2010-05-11 16:17 [PATCH 0/9] pvclock misc fixes - v4 Glauber Costa
@ 2010-05-11 16:17 ` Glauber Costa
2010-05-11 16:17 ` [PATCH 2/9] Add a global synchronization point for pvclock Glauber Costa
2010-05-11 18:25 ` [PATCH 0/9] pvclock misc fixes - v4 Avi Kivity
` (2 subsequent siblings)
3 siblings, 1 reply; 17+ messages in thread
From: Glauber Costa @ 2010-05-11 16:17 UTC (permalink / raw)
To: kvm; +Cc: linux-kernel, avi, zamsden, Jeremy Fitzhardinge
This patch removes one padding byte and transform it into a flags
field. New versions of guests using pvclock will query these flags
upon each read.
Flags, however, will only be interpreted when the guest decides to.
It uses the pvclock_valid_flags function to signal that a specific
set of flags should be taken into consideration. Which flags are valid
are usually devised via HV negotiation.
Signed-off-by: Glauber Costa <glommer@redhat.com>
CC: Jeremy Fitzhardinge <jeremy@goop.org>
---
arch/x86/include/asm/pvclock-abi.h | 3 ++-
arch/x86/include/asm/pvclock.h | 1 +
arch/x86/kernel/pvclock.c | 9 +++++++++
3 files changed, 12 insertions(+), 1 deletions(-)
diff --git a/arch/x86/include/asm/pvclock-abi.h b/arch/x86/include/asm/pvclock-abi.h
index 6d93508..ec5c41a 100644
--- a/arch/x86/include/asm/pvclock-abi.h
+++ b/arch/x86/include/asm/pvclock-abi.h
@@ -29,7 +29,8 @@ struct pvclock_vcpu_time_info {
u64 system_time;
u32 tsc_to_system_mul;
s8 tsc_shift;
- u8 pad[3];
+ u8 flags;
+ u8 pad[2];
} __attribute__((__packed__)); /* 32 bytes */
struct pvclock_wall_clock {
diff --git a/arch/x86/include/asm/pvclock.h b/arch/x86/include/asm/pvclock.h
index 53235fd..cd02f32 100644
--- a/arch/x86/include/asm/pvclock.h
+++ b/arch/x86/include/asm/pvclock.h
@@ -6,6 +6,7 @@
/* some helper functions for xen and kvm pv clock sources */
cycle_t pvclock_clocksource_read(struct pvclock_vcpu_time_info *src);
+void pvclock_set_flags(u8 flags);
unsigned long pvclock_tsc_khz(struct pvclock_vcpu_time_info *src);
void pvclock_read_wallclock(struct pvclock_wall_clock *wall,
struct pvclock_vcpu_time_info *vcpu,
diff --git a/arch/x86/kernel/pvclock.c b/arch/x86/kernel/pvclock.c
index 03801f2..f7fdd56 100644
--- a/arch/x86/kernel/pvclock.c
+++ b/arch/x86/kernel/pvclock.c
@@ -31,8 +31,16 @@ struct pvclock_shadow_time {
u32 tsc_to_nsec_mul;
int tsc_shift;
u32 version;
+ u8 flags;
};
+static u8 valid_flags __read_mostly = 0;
+
+void pvclock_set_flags(u8 flags)
+{
+ valid_flags = flags;
+}
+
/*
* Scale a 64-bit delta by scaling and multiplying by a 32-bit fraction,
* yielding a 64-bit result.
@@ -91,6 +99,7 @@ static unsigned pvclock_get_time_values(struct pvclock_shadow_time *dst,
dst->system_timestamp = src->system_time;
dst->tsc_to_nsec_mul = src->tsc_to_system_mul;
dst->tsc_shift = src->tsc_shift;
+ dst->flags = src->flags;
rmb(); /* test version after fetching data */
} while ((src->version & 1) || (dst->version != src->version));
--
1.6.2.2
^ permalink raw reply related [flat|nested] 17+ messages in thread
* [PATCH 2/9] Add a global synchronization point for pvclock
2010-05-11 16:17 ` [PATCH 1/9] Enable pvclock flags in vcpu_time_info structure Glauber Costa
@ 2010-05-11 16:17 ` Glauber Costa
2010-05-11 16:17 ` [PATCH 3/9] change msr numbers for kvmclock Glauber Costa
0 siblings, 1 reply; 17+ messages in thread
From: Glauber Costa @ 2010-05-11 16:17 UTC (permalink / raw)
To: kvm; +Cc: linux-kernel, avi, zamsden, Jeremy Fitzhardinge, Marcelo Tosatti
In recent stress tests, it was found that pvclock-based systems
could seriously warp in smp systems. Using ingo's time-warp-test.c,
I could trigger a scenario as bad as 1.5mi warps a minute in some systems.
(to be fair, it wasn't that bad in most of them). Investigating further, I
found out that such warps were caused by the very offset-based calculation
pvclock is based on.
This happens even on some machines that report constant_tsc in its tsc flags,
specially on multi-socket ones.
Two reads of the same kernel timestamp at approx the same time, will likely
have tsc timestamped in different occasions too. This means the delta we
calculate is unpredictable at best, and can probably be smaller in a cpu
that is legitimately reading clock in a forward ocasion.
Some adjustments on the host could make this window less likely to happen,
but still, it pretty much poses as an intrinsic problem of the mechanism.
A while ago, I though about using a shared variable anyway, to hold clock
last state, but gave up due to the high contention locking was likely
to introduce, possibly rendering the thing useless on big machines. I argue,
however, that locking is not necessary.
We do a read-and-return sequence in pvclock, and between read and return,
the global value can have changed. However, it can only have changed
by means of an addition of a positive value. So if we detected that our
clock timestamp is less than the current global, we know that we need to
return a higher one, even though it is not exactly the one we compared to.
OTOH, if we detect we're greater than the current time source, we atomically
replace the value with our new readings. This do causes contention on big
boxes (but big here means *BIG*), but it seems like a good trade off, since
it provide us with a time source guaranteed to be stable wrt time warps.
After this patch is applied, I don't see a single warp in time during 5 days
of execution, in any of the machines I saw them before.
Signed-off-by: Glauber Costa <glommer@redhat.com>
CC: Jeremy Fitzhardinge <jeremy@goop.org>
CC: Avi Kivity <avi@redhat.com>
CC: Marcelo Tosatti <mtosatti@redhat.com>
CC: Zachary Amsden <zamsden@redhat.com>
---
arch/x86/kernel/pvclock.c | 24 ++++++++++++++++++++++++
1 files changed, 24 insertions(+), 0 deletions(-)
diff --git a/arch/x86/kernel/pvclock.c b/arch/x86/kernel/pvclock.c
index f7fdd56..f5bc40e 100644
--- a/arch/x86/kernel/pvclock.c
+++ b/arch/x86/kernel/pvclock.c
@@ -118,11 +118,14 @@ unsigned long pvclock_tsc_khz(struct pvclock_vcpu_time_info *src)
return pv_tsc_khz;
}
+static atomic64_t last_value = ATOMIC64_INIT(0);
+
cycle_t pvclock_clocksource_read(struct pvclock_vcpu_time_info *src)
{
struct pvclock_shadow_time shadow;
unsigned version;
cycle_t ret, offset;
+ u64 last;
do {
version = pvclock_get_time_values(&shadow, src);
@@ -132,6 +135,27 @@ cycle_t pvclock_clocksource_read(struct pvclock_vcpu_time_info *src)
barrier();
} while (version != src->version);
+ /*
+ * Assumption here is that last_value, a global accumulator, always goes
+ * forward. If we are less than that, we should not be much smaller.
+ * We assume there is an error marging we're inside, and then the correction
+ * does not sacrifice accuracy.
+ *
+ * For reads: global may have changed between test and return,
+ * but this means someone else updated poked the clock at a later time.
+ * We just need to make sure we are not seeing a backwards event.
+ *
+ * For updates: last_value = ret is not enough, since two vcpus could be
+ * updating at the same time, and one of them could be slightly behind,
+ * making the assumption that last_value always go forward fail to hold.
+ */
+ last = atomic64_read(&last_value);
+ do {
+ if (ret < last)
+ return last;
+ last = atomic64_cmpxchg(&last_value, last, ret);
+ } while (unlikely(last != ret));
+
return ret;
}
--
1.6.2.2
^ permalink raw reply related [flat|nested] 17+ messages in thread
* [PATCH 3/9] change msr numbers for kvmclock
2010-05-11 16:17 ` [PATCH 2/9] Add a global synchronization point for pvclock Glauber Costa
@ 2010-05-11 16:17 ` Glauber Costa
2010-05-11 16:17 ` [PATCH 4/9] add new KVMCLOCK cpuid feature Glauber Costa
0 siblings, 1 reply; 17+ messages in thread
From: Glauber Costa @ 2010-05-11 16:17 UTC (permalink / raw)
To: kvm; +Cc: linux-kernel, avi, zamsden
Avi pointed out a while ago that those MSRs falls into the pentium
PMU range. So the idea here is to add new ones, and after a while,
deprecate the old ones.
Signed-off-by: Glauber Costa <glommer@redhat.com>
---
arch/x86/include/asm/kvm_para.h | 4 ++++
arch/x86/kvm/x86.c | 7 ++++++-
2 files changed, 10 insertions(+), 1 deletions(-)
diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h
index ffae142..9734808 100644
--- a/arch/x86/include/asm/kvm_para.h
+++ b/arch/x86/include/asm/kvm_para.h
@@ -20,6 +20,10 @@
#define MSR_KVM_WALL_CLOCK 0x11
#define MSR_KVM_SYSTEM_TIME 0x12
+/* Custom MSRs falls in the range 0x4b564d00-0x4b564dff */
+#define MSR_KVM_WALL_CLOCK_NEW 0x4b564d00
+#define MSR_KVM_SYSTEM_TIME_NEW 0x4b564d01
+
#define KVM_MAX_MMU_OP_BATCH 32
/* Operations for KVM_HC_MMU_OP */
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 6b2ce1d..eb84947 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -664,9 +664,10 @@ static inline u32 bit(int bitno)
* kvm-specific. Those are put in the beginning of the list.
*/
-#define KVM_SAVE_MSRS_BEGIN 5
+#define KVM_SAVE_MSRS_BEGIN 7
static u32 msrs_to_save[] = {
MSR_KVM_SYSTEM_TIME, MSR_KVM_WALL_CLOCK,
+ MSR_KVM_SYSTEM_TIME_NEW, MSR_KVM_WALL_CLOCK_NEW,
HV_X64_MSR_GUEST_OS_ID, HV_X64_MSR_HYPERCALL,
HV_X64_MSR_APIC_ASSIST_PAGE,
MSR_IA32_SYSENTER_CS, MSR_IA32_SYSENTER_ESP, MSR_IA32_SYSENTER_EIP,
@@ -1192,10 +1193,12 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 data)
case MSR_IA32_MISC_ENABLE:
vcpu->arch.ia32_misc_enable_msr = data;
break;
+ case MSR_KVM_WALL_CLOCK_NEW:
case MSR_KVM_WALL_CLOCK:
vcpu->kvm->arch.wall_clock = data;
kvm_write_wall_clock(vcpu->kvm, data);
break;
+ case MSR_KVM_SYSTEM_TIME_NEW:
case MSR_KVM_SYSTEM_TIME: {
if (vcpu->arch.time_page) {
kvm_release_page_dirty(vcpu->arch.time_page);
@@ -1467,9 +1470,11 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata)
data = vcpu->arch.efer;
break;
case MSR_KVM_WALL_CLOCK:
+ case MSR_KVM_WALL_CLOCK_NEW:
data = vcpu->kvm->arch.wall_clock;
break;
case MSR_KVM_SYSTEM_TIME:
+ case MSR_KVM_SYSTEM_TIME_NEW:
data = vcpu->arch.time;
break;
case MSR_IA32_P5_MC_ADDR:
--
1.6.2.2
^ permalink raw reply related [flat|nested] 17+ messages in thread
* [PATCH 4/9] add new KVMCLOCK cpuid feature
2010-05-11 16:17 ` [PATCH 3/9] change msr numbers for kvmclock Glauber Costa
@ 2010-05-11 16:17 ` Glauber Costa
2010-05-11 16:17 ` [PATCH 5/9] export paravirtual cpuid flags in KVM_GET_SUPPORTED_CPUID Glauber Costa
0 siblings, 1 reply; 17+ messages in thread
From: Glauber Costa @ 2010-05-11 16:17 UTC (permalink / raw)
To: kvm; +Cc: linux-kernel, avi, zamsden
This cpuid, KVM_CPUID_CLOCKSOURCE2, will indicate to the guest
that kvmclock is available through a new set of MSRs. The old ones
are deprecated.
Signed-off-by: Glauber Costa <glommer@redhat.com>
---
arch/x86/include/asm/kvm_para.h | 4 ++++
1 files changed, 4 insertions(+), 0 deletions(-)
diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h
index 9734808..f019f8c 100644
--- a/arch/x86/include/asm/kvm_para.h
+++ b/arch/x86/include/asm/kvm_para.h
@@ -16,6 +16,10 @@
#define KVM_FEATURE_CLOCKSOURCE 0
#define KVM_FEATURE_NOP_IO_DELAY 1
#define KVM_FEATURE_MMU_OP 2
+/* This indicates that the new set of kvmclock msrs
+ * are available. The use of 0x11 and 0x12 is deprecated
+ */
+#define KVM_FEATURE_CLOCKSOURCE2 3
#define MSR_KVM_WALL_CLOCK 0x11
#define MSR_KVM_SYSTEM_TIME 0x12
--
1.6.2.2
^ permalink raw reply related [flat|nested] 17+ messages in thread
* [PATCH 5/9] export paravirtual cpuid flags in KVM_GET_SUPPORTED_CPUID
2010-05-11 16:17 ` [PATCH 4/9] add new KVMCLOCK cpuid feature Glauber Costa
@ 2010-05-11 16:17 ` Glauber Costa
2010-05-11 16:17 ` [PATCH 6/9] Try using new kvm clock msrs Glauber Costa
0 siblings, 1 reply; 17+ messages in thread
From: Glauber Costa @ 2010-05-11 16:17 UTC (permalink / raw)
To: kvm; +Cc: linux-kernel, avi, zamsden
Right now, we were using individual KVM_CAP entities to communicate
userspace about which cpuids we support. This is suboptimal, since it
generates a delay between the feature arriving in the host, and
being available at the guest.
A much better mechanism is to list para features in KVM_GET_SUPPORTED_CPUID.
This makes userspace automatically aware of what we provide. And if we
ever add a new cpuid bit in the future, we have to do that again,
which create some complexity and delay in feature adoption.
Signed-off-by: Glauber Costa <glommer@redhat.com>
---
arch/x86/kvm/x86.c | 34 ++++++++++++++++++++++++++++++++++
1 files changed, 34 insertions(+), 0 deletions(-)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index eb84947..a51b560 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1971,6 +1971,23 @@ static void do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
}
break;
}
+ case KVM_CPUID_SIGNATURE: {
+ char signature[12] = "KVMKVMKVM\0\0";
+ u32 *sigptr = (u32 *)signature;
+ entry->eax = 0;
+ entry->ebx = sigptr[0];
+ entry->ecx = sigptr[1];
+ entry->edx = sigptr[2];
+ break;
+ }
+ case KVM_CPUID_FEATURES:
+ entry->eax = (1 << KVM_FEATURE_CLOCKSOURCE) |
+ (1 << KVM_FEATURE_NOP_IO_DELAY) |
+ (1 << KVM_FEATURE_CLOCKSOURCE2);
+ entry->ebx = 0;
+ entry->ecx = 0;
+ entry->edx = 0;
+ break;
case 0x80000000:
entry->eax = min(entry->eax, 0x8000001a);
break;
@@ -2017,6 +2034,23 @@ static int kvm_dev_ioctl_get_supported_cpuid(struct kvm_cpuid2 *cpuid,
for (func = 0x80000001; func <= limit && nent < cpuid->nent; ++func)
do_cpuid_ent(&cpuid_entries[nent], func, 0,
&nent, cpuid->nent);
+
+
+
+ r = -E2BIG;
+ if (nent >= cpuid->nent)
+ goto out_free;
+
+ do_cpuid_ent(&cpuid_entries[nent], KVM_CPUID_SIGNATURE, 0, &nent,
+ cpuid->nent);
+
+ r = -E2BIG;
+ if (nent >= cpuid->nent)
+ goto out_free;
+
+ do_cpuid_ent(&cpuid_entries[nent], KVM_CPUID_FEATURES, 0, &nent,
+ cpuid->nent);
+
r = -E2BIG;
if (nent >= cpuid->nent)
goto out_free;
--
1.6.2.2
^ permalink raw reply related [flat|nested] 17+ messages in thread
* [PATCH 6/9] Try using new kvm clock msrs
2010-05-11 16:17 ` [PATCH 5/9] export paravirtual cpuid flags in KVM_GET_SUPPORTED_CPUID Glauber Costa
@ 2010-05-11 16:17 ` Glauber Costa
2010-05-11 16:17 ` [PATCH 7/9] don't compute pvclock adjustments if we trust the tsc Glauber Costa
0 siblings, 1 reply; 17+ messages in thread
From: Glauber Costa @ 2010-05-11 16:17 UTC (permalink / raw)
To: kvm; +Cc: linux-kernel, avi, zamsden
We now added a new set of clock-related msrs in replacement of the old
ones. In theory, we could just try to use them and get a return value
indicating they do not exist, due to our use of kvm_write_msr_save.
However, kvm clock registration happens very early, and if we ever
try to write to a non-existant MSR, we raise a lethal #GP, since our
idt handlers are not in place yet.
So this patch tests for a cpuid feature exported by the host to
decide which set of msrs are supported.
Signed-off-by: Glauber Costa <glommer@redhat.com>
---
arch/x86/kernel/kvmclock.c | 53 ++++++++++++++++++++++++++-----------------
1 files changed, 32 insertions(+), 21 deletions(-)
diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c
index feaeb0d..59c740f 100644
--- a/arch/x86/kernel/kvmclock.c
+++ b/arch/x86/kernel/kvmclock.c
@@ -29,6 +29,8 @@
#define KVM_SCALE 22
static int kvmclock = 1;
+static int msr_kvm_system_time = MSR_KVM_SYSTEM_TIME;
+static int msr_kvm_wall_clock = MSR_KVM_WALL_CLOCK;
static int parse_no_kvmclock(char *arg)
{
@@ -54,7 +56,8 @@ static unsigned long kvm_get_wallclock(void)
low = (int)__pa_symbol(&wall_clock);
high = ((u64)__pa_symbol(&wall_clock) >> 32);
- native_write_msr(MSR_KVM_WALL_CLOCK, low, high);
+
+ native_write_msr(msr_kvm_wall_clock, low, high);
vcpu_time = &get_cpu_var(hv_clock);
pvclock_read_wallclock(&wall_clock, vcpu_time, &ts);
@@ -130,7 +133,8 @@ static int kvm_register_clock(char *txt)
high = ((u64)__pa(&per_cpu(hv_clock, cpu)) >> 32);
printk(KERN_INFO "kvm-clock: cpu %d, msr %x:%x, %s\n",
cpu, high, low, txt);
- return native_write_msr_safe(MSR_KVM_SYSTEM_TIME, low, high);
+
+ return native_write_msr_safe(msr_kvm_system_time, low, high);
}
#ifdef CONFIG_X86_LOCAL_APIC
@@ -165,14 +169,14 @@ static void __init kvm_smp_prepare_boot_cpu(void)
#ifdef CONFIG_KEXEC
static void kvm_crash_shutdown(struct pt_regs *regs)
{
- native_write_msr_safe(MSR_KVM_SYSTEM_TIME, 0, 0);
+ native_write_msr(msr_kvm_system_time, 0, 0);
native_machine_crash_shutdown(regs);
}
#endif
static void kvm_shutdown(void)
{
- native_write_msr_safe(MSR_KVM_SYSTEM_TIME, 0, 0);
+ native_write_msr(msr_kvm_system_time, 0, 0);
native_machine_shutdown();
}
@@ -181,27 +185,34 @@ void __init kvmclock_init(void)
if (!kvm_para_available())
return;
- if (kvmclock && kvm_para_has_feature(KVM_FEATURE_CLOCKSOURCE)) {
- if (kvm_register_clock("boot clock"))
- return;
- pv_time_ops.sched_clock = kvm_clock_read;
- x86_platform.calibrate_tsc = kvm_get_tsc_khz;
- x86_platform.get_wallclock = kvm_get_wallclock;
- x86_platform.set_wallclock = kvm_set_wallclock;
+ if (kvmclock && kvm_para_has_feature(KVM_FEATURE_CLOCKSOURCE2)) {
+ msr_kvm_system_time = MSR_KVM_SYSTEM_TIME_NEW;
+ msr_kvm_wall_clock = MSR_KVM_WALL_CLOCK_NEW;
+ } else if (!(kvmclock && kvm_para_has_feature(KVM_FEATURE_CLOCKSOURCE)))
+ return;
+
+ printk(KERN_INFO "kvm-clock: Using msrs %x and %x",
+ msr_kvm_system_time, msr_kvm_wall_clock);
+
+ if (kvm_register_clock("boot clock"))
+ return;
+ pv_time_ops.sched_clock = kvm_clock_read;
+ x86_platform.calibrate_tsc = kvm_get_tsc_khz;
+ x86_platform.get_wallclock = kvm_get_wallclock;
+ x86_platform.set_wallclock = kvm_set_wallclock;
#ifdef CONFIG_X86_LOCAL_APIC
- x86_cpuinit.setup_percpu_clockev =
- kvm_setup_secondary_clock;
+ x86_cpuinit.setup_percpu_clockev =
+ kvm_setup_secondary_clock;
#endif
#ifdef CONFIG_SMP
- smp_ops.smp_prepare_boot_cpu = kvm_smp_prepare_boot_cpu;
+ smp_ops.smp_prepare_boot_cpu = kvm_smp_prepare_boot_cpu;
#endif
- machine_ops.shutdown = kvm_shutdown;
+ machine_ops.shutdown = kvm_shutdown;
#ifdef CONFIG_KEXEC
- machine_ops.crash_shutdown = kvm_crash_shutdown;
+ machine_ops.crash_shutdown = kvm_crash_shutdown;
#endif
- kvm_get_preset_lpj();
- clocksource_register(&kvm_clock);
- pv_info.paravirt_enabled = 1;
- pv_info.name = "KVM";
- }
+ kvm_get_preset_lpj();
+ clocksource_register(&kvm_clock);
+ pv_info.paravirt_enabled = 1;
+ pv_info.name = "KVM";
}
--
1.6.2.2
^ permalink raw reply related [flat|nested] 17+ messages in thread
* [PATCH 7/9] don't compute pvclock adjustments if we trust the tsc
2010-05-11 16:17 ` [PATCH 6/9] Try using new kvm clock msrs Glauber Costa
@ 2010-05-11 16:17 ` Glauber Costa
2010-05-11 16:17 ` [PATCH 8/9] Tell the guest we'll warn it about tsc stability Glauber Costa
0 siblings, 1 reply; 17+ messages in thread
From: Glauber Costa @ 2010-05-11 16:17 UTC (permalink / raw)
To: kvm; +Cc: linux-kernel, avi, zamsden
If the HV told us we can fully trust the TSC, skip any
correction
Signed-off-by: Glauber Costa <glommer@redhat.com>
---
arch/x86/include/asm/kvm_para.h | 5 +++++
arch/x86/include/asm/pvclock-abi.h | 1 +
arch/x86/kernel/kvmclock.c | 3 +++
arch/x86/kernel/pvclock.c | 4 ++++
4 files changed, 13 insertions(+), 0 deletions(-)
diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h
index f019f8c..05eba5e 100644
--- a/arch/x86/include/asm/kvm_para.h
+++ b/arch/x86/include/asm/kvm_para.h
@@ -21,6 +21,11 @@
*/
#define KVM_FEATURE_CLOCKSOURCE2 3
+/* The last 8 bits are used to indicate how to interpret the flags field
+ * in pvclock structure. If no bits are set, all flags are ignored.
+ */
+#define KVM_FEATURE_CLOCKSOURCE_STABLE_BIT 24
+
#define MSR_KVM_WALL_CLOCK 0x11
#define MSR_KVM_SYSTEM_TIME 0x12
diff --git a/arch/x86/include/asm/pvclock-abi.h b/arch/x86/include/asm/pvclock-abi.h
index ec5c41a..35f2d19 100644
--- a/arch/x86/include/asm/pvclock-abi.h
+++ b/arch/x86/include/asm/pvclock-abi.h
@@ -39,5 +39,6 @@ struct pvclock_wall_clock {
u32 nsec;
} __attribute__((__packed__));
+#define PVCLOCK_TSC_STABLE_BIT (1 << 0)
#endif /* __ASSEMBLY__ */
#endif /* _ASM_X86_PVCLOCK_ABI_H */
diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c
index 59c740f..eb9b76c 100644
--- a/arch/x86/kernel/kvmclock.c
+++ b/arch/x86/kernel/kvmclock.c
@@ -215,4 +215,7 @@ void __init kvmclock_init(void)
clocksource_register(&kvm_clock);
pv_info.paravirt_enabled = 1;
pv_info.name = "KVM";
+
+ if (kvm_para_has_feature(KVM_FEATURE_CLOCKSOURCE_STABLE_BIT))
+ pvclock_set_flags(PVCLOCK_TSC_STABLE_BIT);
}
diff --git a/arch/x86/kernel/pvclock.c b/arch/x86/kernel/pvclock.c
index f5bc40e..239427c 100644
--- a/arch/x86/kernel/pvclock.c
+++ b/arch/x86/kernel/pvclock.c
@@ -135,6 +135,10 @@ cycle_t pvclock_clocksource_read(struct pvclock_vcpu_time_info *src)
barrier();
} while (version != src->version);
+ if ((valid_flags & PVCLOCK_TSC_STABLE_BIT) &&
+ (shadow.flags & PVCLOCK_TSC_STABLE_BIT))
+ return ret;
+
/*
* Assumption here is that last_value, a global accumulator, always goes
* forward. If we are less than that, we should not be much smaller.
--
1.6.2.2
^ permalink raw reply related [flat|nested] 17+ messages in thread
* [PATCH 8/9] Tell the guest we'll warn it about tsc stability
2010-05-11 16:17 ` [PATCH 7/9] don't compute pvclock adjustments if we trust the tsc Glauber Costa
@ 2010-05-11 16:17 ` Glauber Costa
2010-05-11 16:17 ` [PATCH 9/9] Add cpuid.txt file Glauber Costa
0 siblings, 1 reply; 17+ messages in thread
From: Glauber Costa @ 2010-05-11 16:17 UTC (permalink / raw)
To: kvm; +Cc: linux-kernel, avi, zamsden
This patch puts up the flag that tells the guest that we'll warn it
about the tsc being trustworthy or not. By now, we also say
it is not.
---
arch/x86/kvm/x86.c | 5 ++++-
1 files changed, 4 insertions(+), 1 deletions(-)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index a51b560..7922401 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -855,6 +855,8 @@ static void kvm_write_guest_time(struct kvm_vcpu *v)
vcpu->hv_clock.system_time = ts.tv_nsec +
(NSEC_PER_SEC * (u64)ts.tv_sec) + v->kvm->arch.kvmclock_offset;
+ vcpu->hv_clock.flags = 0;
+
/*
* The interface expects us to write an even number signaling that the
* update is finished. Since the guest won't see the intermediate
@@ -1983,7 +1985,8 @@ static void do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
case KVM_CPUID_FEATURES:
entry->eax = (1 << KVM_FEATURE_CLOCKSOURCE) |
(1 << KVM_FEATURE_NOP_IO_DELAY) |
- (1 << KVM_FEATURE_CLOCKSOURCE2);
+ (1 << KVM_FEATURE_CLOCKSOURCE2) |
+ (1 << KVM_FEATURE_CLOCKSOURCE_STABLE_BIT);
entry->ebx = 0;
entry->ecx = 0;
entry->edx = 0;
--
1.6.2.2
^ permalink raw reply related [flat|nested] 17+ messages in thread
* [PATCH 9/9] Add cpuid.txt file
2010-05-11 16:17 ` [PATCH 8/9] Tell the guest we'll warn it about tsc stability Glauber Costa
@ 2010-05-11 16:17 ` Glauber Costa
0 siblings, 0 replies; 17+ messages in thread
From: Glauber Costa @ 2010-05-11 16:17 UTC (permalink / raw)
To: kvm; +Cc: linux-kernel, avi, zamsden
This file documents cpuid bits used by KVM.
Signed-off-by: Glauber Costa <glommer@redhat.com>
---
Documentation/kvm/cpuid.txt | 44 +++++++++++++++++++++++++++++++++++++++++++
1 files changed, 44 insertions(+), 0 deletions(-)
create mode 100644 Documentation/kvm/cpuid.txt
diff --git a/Documentation/kvm/cpuid.txt b/Documentation/kvm/cpuid.txt
new file mode 100644
index 0000000..ca1e89f
--- /dev/null
+++ b/Documentation/kvm/cpuid.txt
@@ -0,0 +1,44 @@
+KVM CPUID bits
+Glauber Costa <glommer@redhat.com>, Red Hat Inc, 2010
+=====================================================
+
+A guest running on a kvm host, can check some of its features using
+cpuid. This is not always guaranteed to work, since userspace can
+mask-out some, or even all KVM-related cpuid features before launching
+a guest.
+
+KVM cpuid functions are:
+
+function: KVM_CPUID_SIGNATURE (0x40000000)
+returns : eax = 0,
+ ebx = 0x4b4d564b,
+ ecx = 0x564b4d56,
+ edx = 0x4d.
+Note that this value in ebx, ecx and edx corresponds to the string "KVMKVMKVM".
+This function queries the presence of KVM cpuid leafs.
+
+
+function: define KVM_CPUID_FEATURES (0x40000001)
+returns : ebx, ecx, edx = 0
+ eax = and OR'ed group of (1 << flag), where each flags is:
+
+
+flag || value || meaning
+=============================================================================
+KVM_FEATURE_CLOCKSOURCE || 0 || kvmclock available at msrs
+ || || 0x11 and 0x12.
+------------------------------------------------------------------------------
+KVM_FEATURE_NOP_IO_DELAY || 1 || not necessary to perform delays
+ || || on PIO operations.
+------------------------------------------------------------------------------
+KVM_FEATURE_MMU_OP || 2 || deprecated.
+------------------------------------------------------------------------------
+KVM_FEATURE_CLOCKSOURCE2 || 3 || kvmclock available at msrs
+ || || 0x4b564d00 and 0x4b564d01
+------------------------------------------------------------------------------
+KVM_FEATURE_CLOCKSOURCE_STABLE_BIT || 24 || host will wanr if no guest-side
+ || || per-cpu warps are expected in
+ || || kvmclock.
+------------------------------------------------------------------------------
+
+
--
1.6.2.2
^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: [PATCH 0/9] pvclock misc fixes - v4.
2010-05-11 16:17 [PATCH 0/9] pvclock misc fixes - v4 Glauber Costa
2010-05-11 16:17 ` [PATCH 1/9] Enable pvclock flags in vcpu_time_info structure Glauber Costa
@ 2010-05-11 18:25 ` Avi Kivity
2010-05-12 3:33 ` Zachary Amsden
2010-05-12 20:33 ` Marcelo Tosatti
3 siblings, 0 replies; 17+ messages in thread
From: Avi Kivity @ 2010-05-11 18:25 UTC (permalink / raw)
To: Glauber Costa; +Cc: kvm, linux-kernel, zamsden
On 05/11/2010 07:17 PM, Glauber Costa wrote:
> This is the fourth version ov kvmclock fixes.
>
> Just two minor changes in patch 5, per avi request, and
> the addition of cpuid.txt file, documenting all cpuid flags
> we use.
>
Looks good.
> As a side effect, this patch removes the time-travel feature
> in kvm guests.
>
Too many reports of users accidentally killing their grandparents anyway.
--
Do not meddle in the internals of kernels, for they are subtle and quick to panic.
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 0/9] pvclock misc fixes - v4.
2010-05-11 16:17 [PATCH 0/9] pvclock misc fixes - v4 Glauber Costa
2010-05-11 16:17 ` [PATCH 1/9] Enable pvclock flags in vcpu_time_info structure Glauber Costa
2010-05-11 18:25 ` [PATCH 0/9] pvclock misc fixes - v4 Avi Kivity
@ 2010-05-12 3:33 ` Zachary Amsden
2010-05-12 5:47 ` Alexander Graf
2010-05-12 20:33 ` Marcelo Tosatti
3 siblings, 1 reply; 17+ messages in thread
From: Zachary Amsden @ 2010-05-12 3:33 UTC (permalink / raw)
To: Glauber Costa; +Cc: kvm, linux-kernel, avi
On 05/11/2010 06:17 AM, Glauber Costa wrote:
> This is the fourth version ov kvmclock fixes.
>
> Just two minor changes in patch 5, per avi request, and
> the addition of cpuid.txt file, documenting all cpuid flags
> we use.
>
> As a side effect, this patch removes the time-travel feature
> in kvm guests.
>
> Glauber Costa (9):
> Enable pvclock flags in vcpu_time_info structure
> Add a global synchronization point for pvclock
> change msr numbers for kvmclock
> add new KVMCLOCK cpuid feature
> export paravirtual cpuid flags in KVM_GET_SUPPORTED_CPUID
> Try using new kvm clock msrs
> don't compute pvclock adjustments if we trust the tsc
> Tell the guest we'll warn it about tsc stability
> Add cpuid.txt file
>
> Documentation/kvm/cpuid.txt | 44 ++++++++++++++++++++++++++++
> arch/x86/include/asm/kvm_para.h | 13 ++++++++
> arch/x86/include/asm/pvclock-abi.h | 4 ++-
> arch/x86/include/asm/pvclock.h | 1 +
> arch/x86/kernel/kvmclock.c | 56 ++++++++++++++++++++++-------------
> arch/x86/kernel/pvclock.c | 37 +++++++++++++++++++++++
> arch/x86/kvm/x86.c | 44 +++++++++++++++++++++++++++-
> 7 files changed, 176 insertions(+), 23 deletions(-)
> create mode 100644 Documentation/kvm/cpuid.txt
>
>
Acked-by: Zachary Amsden <zamsden@redhat.com>
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 0/9] pvclock misc fixes - v4.
2010-05-12 3:33 ` Zachary Amsden
@ 2010-05-12 5:47 ` Alexander Graf
2010-05-12 6:33 ` Avi Kivity
2010-05-12 17:04 ` Glauber Costa
0 siblings, 2 replies; 17+ messages in thread
From: Alexander Graf @ 2010-05-12 5:47 UTC (permalink / raw)
To: Zachary Amsden; +Cc: Glauber Costa, kvm, linux-kernel, avi
On 12.05.2010, at 05:33, Zachary Amsden wrote:
> On 05/11/2010 06:17 AM, Glauber Costa wrote:
>> This is the fourth version ov kvmclock fixes.
>>
>> Just two minor changes in patch 5, per avi request, and
>> the addition of cpuid.txt file, documenting all cpuid flags
>> we use.
>>
>> As a side effect, this patch removes the time-travel feature
>> in kvm guests.
>>
>> Glauber Costa (9):
>> Enable pvclock flags in vcpu_time_info structure
>> Add a global synchronization point for pvclock
>> change msr numbers for kvmclock
>> add new KVMCLOCK cpuid feature
>> export paravirtual cpuid flags in KVM_GET_SUPPORTED_CPUID
>> Try using new kvm clock msrs
>> don't compute pvclock adjustments if we trust the tsc
>> Tell the guest we'll warn it about tsc stability
>> Add cpuid.txt file
>>
>> Documentation/kvm/cpuid.txt | 44 ++++++++++++++++++++++++++++
>> arch/x86/include/asm/kvm_para.h | 13 ++++++++
>> arch/x86/include/asm/pvclock-abi.h | 4 ++-
>> arch/x86/include/asm/pvclock.h | 1 +
>> arch/x86/kernel/kvmclock.c | 56 ++++++++++++++++++++++-------------
>> arch/x86/kernel/pvclock.c | 37 +++++++++++++++++++++++
>> arch/x86/kvm/x86.c | 44 +++++++++++++++++++++++++++-
>> 7 files changed, 176 insertions(+), 23 deletions(-)
>> create mode 100644 Documentation/kvm/cpuid.txt
>>
>>
> Acked-by: Zachary Amsden <zamsden@redhat.com>
How about stable queuing of this one?
Alex
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 0/9] pvclock misc fixes - v4.
2010-05-12 5:47 ` Alexander Graf
@ 2010-05-12 6:33 ` Avi Kivity
2010-05-12 6:35 ` Alexander Graf
2010-05-12 17:04 ` Glauber Costa
1 sibling, 1 reply; 17+ messages in thread
From: Avi Kivity @ 2010-05-12 6:33 UTC (permalink / raw)
To: Alexander Graf; +Cc: Zachary Amsden, Glauber Costa, kvm, linux-kernel
On 05/12/2010 08:47 AM, Alexander Graf wrote:
> How about stable queuing of this one?
>
Only after this gets some serious field testing. I'm not happy to trade
a known issue for an unknown regression. While I don't expect any
problems with the patchset, testing has often proven me wrong.
--
Do not meddle in the internals of kernels, for they are subtle and quick to panic.
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 0/9] pvclock misc fixes - v4.
2010-05-12 6:33 ` Avi Kivity
@ 2010-05-12 6:35 ` Alexander Graf
0 siblings, 0 replies; 17+ messages in thread
From: Alexander Graf @ 2010-05-12 6:35 UTC (permalink / raw)
To: Avi Kivity; +Cc: Zachary Amsden, Glauber Costa, kvm, linux-kernel
On 12.05.2010, at 08:33, Avi Kivity wrote:
> On 05/12/2010 08:47 AM, Alexander Graf wrote:
>> How about stable queuing of this one?
>>
>
> Only after this gets some serious field testing. I'm not happy to trade a known issue for an unknown regression. While I don't expect any problems with the patchset, testing has often proven me wrong.
Agreed. I merely wanted to raise awareness of me wanting to see this in -stable when it is stable :).
Alex
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 0/9] pvclock misc fixes - v4.
2010-05-12 5:47 ` Alexander Graf
2010-05-12 6:33 ` Avi Kivity
@ 2010-05-12 17:04 ` Glauber Costa
1 sibling, 0 replies; 17+ messages in thread
From: Glauber Costa @ 2010-05-12 17:04 UTC (permalink / raw)
To: Alexander Graf; +Cc: Zachary Amsden, kvm, linux-kernel, avi
On Wed, May 12, 2010 at 07:47:39AM +0200, Alexander Graf wrote:
>
> On 12.05.2010, at 05:33, Zachary Amsden wrote:
>
> > On 05/11/2010 06:17 AM, Glauber Costa wrote:
> >> This is the fourth version ov kvmclock fixes.
> >>
> >> Just two minor changes in patch 5, per avi request, and
> >> the addition of cpuid.txt file, documenting all cpuid flags
> >> we use.
> >>
> >> As a side effect, this patch removes the time-travel feature
> >> in kvm guests.
> >>
> >> Glauber Costa (9):
> >> Enable pvclock flags in vcpu_time_info structure
> >> Add a global synchronization point for pvclock
> >> change msr numbers for kvmclock
> >> add new KVMCLOCK cpuid feature
> >> export paravirtual cpuid flags in KVM_GET_SUPPORTED_CPUID
> >> Try using new kvm clock msrs
> >> don't compute pvclock adjustments if we trust the tsc
> >> Tell the guest we'll warn it about tsc stability
> >> Add cpuid.txt file
> >>
> >> Documentation/kvm/cpuid.txt | 44 ++++++++++++++++++++++++++++
> >> arch/x86/include/asm/kvm_para.h | 13 ++++++++
> >> arch/x86/include/asm/pvclock-abi.h | 4 ++-
> >> arch/x86/include/asm/pvclock.h | 1 +
> >> arch/x86/kernel/kvmclock.c | 56 ++++++++++++++++++++++-------------
> >> arch/x86/kernel/pvclock.c | 37 +++++++++++++++++++++++
> >> arch/x86/kvm/x86.c | 44 +++++++++++++++++++++++++++-
> >> 7 files changed, 176 insertions(+), 23 deletions(-)
> >> create mode 100644 Documentation/kvm/cpuid.txt
> >>
> >>
> > Acked-by: Zachary Amsden <zamsden@redhat.com>
>
>
> How about stable queuing of this one?
I don't think the MSR part classifies for stable.
The sync part, that might.
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 0/9] pvclock misc fixes - v4.
2010-05-11 16:17 [PATCH 0/9] pvclock misc fixes - v4 Glauber Costa
` (2 preceding siblings ...)
2010-05-12 3:33 ` Zachary Amsden
@ 2010-05-12 20:33 ` Marcelo Tosatti
3 siblings, 0 replies; 17+ messages in thread
From: Marcelo Tosatti @ 2010-05-12 20:33 UTC (permalink / raw)
To: Glauber Costa; +Cc: kvm, linux-kernel, avi, zamsden
On Tue, May 11, 2010 at 12:17:38PM -0400, Glauber Costa wrote:
> This is the fourth version ov kvmclock fixes.
>
> Just two minor changes in patch 5, per avi request, and
> the addition of cpuid.txt file, documenting all cpuid flags
> we use.
>
> As a side effect, this patch removes the time-travel feature
> in kvm guests.
>
> Glauber Costa (9):
> Enable pvclock flags in vcpu_time_info structure
> Add a global synchronization point for pvclock
> change msr numbers for kvmclock
> add new KVMCLOCK cpuid feature
> export paravirtual cpuid flags in KVM_GET_SUPPORTED_CPUID
> Try using new kvm clock msrs
> don't compute pvclock adjustments if we trust the tsc
> Tell the guest we'll warn it about tsc stability
> Add cpuid.txt file
>
> Documentation/kvm/cpuid.txt | 44 ++++++++++++++++++++++++++++
> arch/x86/include/asm/kvm_para.h | 13 ++++++++
> arch/x86/include/asm/pvclock-abi.h | 4 ++-
> arch/x86/include/asm/pvclock.h | 1 +
> arch/x86/kernel/kvmclock.c | 56 ++++++++++++++++++++++-------------
> arch/x86/kernel/pvclock.c | 37 +++++++++++++++++++++++
> arch/x86/kvm/x86.c | 44 +++++++++++++++++++++++++++-
> 7 files changed, 176 insertions(+), 23 deletions(-)
> create mode 100644 Documentation/kvm/cpuid.txt
Applied all, thanks.
^ permalink raw reply [flat|nested] 17+ messages in thread
end of thread, other threads:[~2010-05-12 20:51 UTC | newest]
Thread overview: 17+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2010-05-11 16:17 [PATCH 0/9] pvclock misc fixes - v4 Glauber Costa
2010-05-11 16:17 ` [PATCH 1/9] Enable pvclock flags in vcpu_time_info structure Glauber Costa
2010-05-11 16:17 ` [PATCH 2/9] Add a global synchronization point for pvclock Glauber Costa
2010-05-11 16:17 ` [PATCH 3/9] change msr numbers for kvmclock Glauber Costa
2010-05-11 16:17 ` [PATCH 4/9] add new KVMCLOCK cpuid feature Glauber Costa
2010-05-11 16:17 ` [PATCH 5/9] export paravirtual cpuid flags in KVM_GET_SUPPORTED_CPUID Glauber Costa
2010-05-11 16:17 ` [PATCH 6/9] Try using new kvm clock msrs Glauber Costa
2010-05-11 16:17 ` [PATCH 7/9] don't compute pvclock adjustments if we trust the tsc Glauber Costa
2010-05-11 16:17 ` [PATCH 8/9] Tell the guest we'll warn it about tsc stability Glauber Costa
2010-05-11 16:17 ` [PATCH 9/9] Add cpuid.txt file Glauber Costa
2010-05-11 18:25 ` [PATCH 0/9] pvclock misc fixes - v4 Avi Kivity
2010-05-12 3:33 ` Zachary Amsden
2010-05-12 5:47 ` Alexander Graf
2010-05-12 6:33 ` Avi Kivity
2010-05-12 6:35 ` Alexander Graf
2010-05-12 17:04 ` Glauber Costa
2010-05-12 20:33 ` Marcelo Tosatti
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).