* [PATCH 0/2] kvm clock - merge last comments @ 2008-02-11 18:07 Glauber de Oliveira Costa 2008-02-11 18:07 ` [PATCH 1/2] kvmclock - the host part Glauber de Oliveira Costa 0 siblings, 1 reply; 13+ messages in thread From: Glauber de Oliveira Costa @ 2008-02-11 18:07 UTC (permalink / raw) To: kvm-devel; +Cc: chrisw, avi Here's a new version that merges last comments from avi. Also, it makes it available to x86_64 as well. Userspace part will follow shortly ------------------------------------------------------------------------- This SF.net email is sponsored by: Microsoft Defy all challenges. Microsoft(R) Visual Studio 2008. http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/ ^ permalink raw reply [flat|nested] 13+ messages in thread
* [PATCH 1/2] kvmclock - the host part. 2008-02-11 18:07 [PATCH 0/2] kvm clock - merge last comments Glauber de Oliveira Costa @ 2008-02-11 18:07 ` Glauber de Oliveira Costa 2008-02-11 18:07 ` [PATCH 2/2] kvmclock implementation, the guest part Glauber de Oliveira Costa 2008-02-13 10:49 ` [PATCH 1/2] kvmclock - the host part Avi Kivity 0 siblings, 2 replies; 13+ messages in thread From: Glauber de Oliveira Costa @ 2008-02-11 18:07 UTC (permalink / raw) To: kvm-devel; +Cc: chrisw, avi, Glauber de Oliveira Costa This is the host part of kvm clocksource implementation. As it does not include clockevents, it is a fairly simple implementation. We only have to register a per-vcpu area, and start writting to it periodically. The area is binary compatible with xen, as we use the same shadow_info structure. Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com> --- arch/x86/kvm/x86.c | 95 ++++++++++++++++++++++++++++++++++++++++++++ include/asm-x86/kvm_host.h | 7 +++ include/asm-x86/kvm_para.h | 24 +++++++++++ include/linux/kvm.h | 1 4 files changed, 126 insertions(+), 1 deletions(-) diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 0987191..03adc9b 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -19,6 +19,7 @@ #include "segment_descriptor.h" #include "irq.h" #include "mmu.h" +#include <linux/clocksource.h> #include <linux/kvm.h> #include <linux/fs.h> #include <linux/vmalloc.h> @@ -420,7 +421,7 @@ static u32 msrs_to_save[] = { #ifdef CONFIG_X86_64 MSR_CSTAR, MSR_KERNEL_GS_BASE, MSR_SYSCALL_MASK, MSR_LSTAR, #endif - MSR_IA32_TIME_STAMP_COUNTER, + MSR_IA32_TIME_STAMP_COUNTER, MSR_KVM_SYSTEM_TIME, MSR_KVM_WALL_CLOCK, }; static unsigned num_msrs_to_save; @@ -478,6 +479,71 @@ static int do_set_msr(struct kvm_vcpu *v return kvm_set_msr(vcpu, index, *data); } +static void kvm_write_wall_clock(struct kvm_vcpu *v, gpa_t wall_clock) +{ + int version = 1; + struct kvm_wall_clock wc; + unsigned long flags; + struct timespec wc_ts; + + local_irq_save(flags); + kvm_get_msr(v, MSR_IA32_TIME_STAMP_COUNTER, + &v->arch.hv_clock.tsc_timestamp); + wc_ts = current_kernel_time(); + local_irq_restore(flags); + + down_write(¤t->mm->mmap_sem); + kvm_write_guest(v->kvm, wall_clock, &version, sizeof(version)); + up_write(¤t->mm->mmap_sem); + + /* With all the info we got, fill in the values */ + wc.wc_sec = wc_ts.tv_sec; + wc.wc_nsec = wc_ts.tv_nsec; + wc.wc_version = ++version; + + down_write(¤t->mm->mmap_sem); + kvm_write_guest(v->kvm, wall_clock, &wc, sizeof(wc)); + up_write(¤t->mm->mmap_sem); +} + +static void kvm_write_guest_time(struct kvm_vcpu *v) +{ + struct timespec ts; + unsigned long flags; + struct kvm_vcpu_arch *vcpu = &v->arch; + void *shared_kaddr; + + if ((!vcpu->time_page)) + return; + + /* Keep irq disabled to prevent changes to the clock */ + local_irq_save(flags); + kvm_get_msr(v, MSR_IA32_TIME_STAMP_COUNTER, + &vcpu->hv_clock.tsc_timestamp); + ktime_get_ts(&ts); + local_irq_restore(flags); + + /* With all the info we got, fill in the values */ + + vcpu->hv_clock.system_time = ts.tv_nsec + + (NSEC_PER_SEC * (u64)ts.tv_sec); + /* + * The interface expects us to write an even number signaling that the + * update is finished. Since the guest won't see the intermediate states, + * we just write "2" at the end + */ + vcpu->hv_clock.version = 2; + + shared_kaddr = kmap_atomic(vcpu->time_page, KM_USER0); + + memcpy(shared_kaddr + vcpu->time_offset, &vcpu->hv_clock, + sizeof(vcpu->hv_clock)); + + kunmap_atomic(shared_kaddr, KM_USER0); + + mark_page_dirty(v->kvm, vcpu->time >> PAGE_SHIFT); +} + int kvm_set_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 data) { @@ -503,6 +569,25 @@ int kvm_set_msr_common(struct kvm_vcpu * case MSR_IA32_MISC_ENABLE: vcpu->arch.ia32_misc_enable_msr = data; break; + case MSR_KVM_WALL_CLOCK: + vcpu->kvm->arch.wall_clock = data; + kvm_write_wall_clock(vcpu, data); + break; + case MSR_KVM_SYSTEM_TIME: { + vcpu->arch.time = data & PAGE_MASK; + vcpu->arch.time_offset = data & ~PAGE_MASK; + + vcpu->arch.hv_clock.tsc_to_system_mul = + clocksource_khz2mult(tsc_khz, 22); + vcpu->arch.hv_clock.tsc_shift = 22; + + down_write(¤t->mm->mmap_sem); + vcpu->arch.time_page = gfn_to_page(vcpu->kvm, data >> PAGE_SHIFT); + up_write(¤t->mm->mmap_sem); + if (is_error_page(vcpu->arch.time_page)) + vcpu->arch.time_page = NULL; + break; + } default: pr_unimpl(vcpu, "unhandled wrmsr: 0x%x data %llx\n", msr, data); return 1; @@ -560,6 +645,12 @@ int kvm_get_msr_common(struct kvm_vcpu * case MSR_EFER: data = vcpu->arch.shadow_efer; break; + case MSR_KVM_WALL_CLOCK: + data = vcpu->kvm->arch.wall_clock; + break; + case MSR_KVM_SYSTEM_TIME: + data = vcpu->arch.time; + break; default: pr_unimpl(vcpu, "unhandled rdmsr: 0x%x\n", msr); return 1; @@ -687,6 +778,7 @@ int kvm_dev_ioctl_check_extension(long e case KVM_CAP_USER_MEMORY: case KVM_CAP_SET_TSS_ADDR: case KVM_CAP_EXT_CPUID: + case KVM_CAP_CLOCKSOURCE: r = 1; break; case KVM_CAP_VAPIC: @@ -744,6 +836,7 @@ out: void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu) { kvm_x86_ops->vcpu_load(vcpu, cpu); + kvm_write_guest_time(vcpu); } void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu) diff --git a/include/asm-x86/kvm_host.h b/include/asm-x86/kvm_host.h index 92668fa..c57cfe9 100644 --- a/include/asm-x86/kvm_host.h +++ b/include/asm-x86/kvm_host.h @@ -261,6 +261,11 @@ #define VCPU_MP_STATE_HALTED /* emulate context */ struct x86_emulate_ctxt emulate_ctxt; + + gpa_t time; + struct kvm_vcpu_time_info hv_clock; + unsigned int time_offset; + struct page *time_page; }; struct kvm_mem_alias { @@ -287,6 +292,8 @@ struct kvm_arch{ int round_robin_prev_vcpu; unsigned int tss_addr; struct page *apic_access_page; + + gpa_t wall_clock; }; struct kvm_vm_stat { diff --git a/include/asm-x86/kvm_para.h b/include/asm-x86/kvm_para.h index c6f3fd8..232c02e 100644 --- a/include/asm-x86/kvm_para.h +++ b/include/asm-x86/kvm_para.h @@ -10,10 +10,34 @@ #define KVM_CPUID_SIGNATURE 0x40000000 * paravirtualization, the appropriate feature bit should be checked. */ #define KVM_CPUID_FEATURES 0x40000001 +#define KVM_FEATURE_CLOCKSOURCE 0 + +#define MSR_KVM_WALL_CLOCK 0x11 +#define MSR_KVM_SYSTEM_TIME 0x12 #ifdef __KERNEL__ #include <asm/processor.h> +/* xen binary-compatible interface. See xen headers for details */ +struct kvm_vcpu_time_info { + uint32_t version; + uint32_t pad0; + uint64_t tsc_timestamp; + uint64_t system_time; + uint32_t tsc_to_system_mul; + int8_t tsc_shift; +}; /* 32 bytes */ + +struct kvm_wall_clock { + uint32_t wc_version; + uint32_t wc_sec; + uint32_t wc_nsec; +}; + + +extern void kvmclock_init(void); + + /* This instruction is vmcall. On non-VT architectures, it will generate a * trap that we will then rewrite to the appropriate instruction. */ diff --git a/include/linux/kvm.h b/include/linux/kvm.h index 4de4fd2..78ce53f 100644 --- a/include/linux/kvm.h +++ b/include/linux/kvm.h @@ -232,6 +232,7 @@ #define KVM_CAP_USER_MEMORY 3 #define KVM_CAP_SET_TSS_ADDR 4 #define KVM_CAP_EXT_CPUID 5 #define KVM_CAP_VAPIC 6 +#define KVM_CAP_CLOCKSOURCE 7 /* * ioctls for VM fds -- 1.4.2 ------------------------------------------------------------------------- This SF.net email is sponsored by: Microsoft Defy all challenges. Microsoft(R) Visual Studio 2008. http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/ ^ permalink raw reply related [flat|nested] 13+ messages in thread
* [PATCH 2/2] kvmclock implementation, the guest part. 2008-02-11 18:07 ` [PATCH 1/2] kvmclock - the host part Glauber de Oliveira Costa @ 2008-02-11 18:07 ` Glauber de Oliveira Costa 2008-02-13 10:49 ` [PATCH 1/2] kvmclock - the host part Avi Kivity 1 sibling, 0 replies; 13+ messages in thread From: Glauber de Oliveira Costa @ 2008-02-11 18:07 UTC (permalink / raw) To: kvm-devel; +Cc: chrisw, avi, Glauber de Oliveira Costa This is the guest part of kvm clock implementation It does not do tsc-only timing, as tsc can have deltas between cpus, and it did not seem worthy to me to keep adjusting them. We do use it, however, for fine-grained adjustment. Other than that, time comes from the host. Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com> --- arch/x86/Kconfig | 10 +++ arch/x86/kernel/Makefile | 1 arch/x86/kernel/kvmclock.c | 161 ++++++++++++++++++++++++++++++++++++++++++++ arch/x86/kernel/setup_32.c | 5 + arch/x86/kernel/setup_64.c | 5 + 5 files changed, 182 insertions(+), 0 deletions(-) diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index 65b4491..a4b33b1 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -349,6 +349,16 @@ config VMI at the moment), by linking the kernel to a GPL-ed ROM module provided by the hypervisor. +config KVM_CLOCK + bool "KVM paravirtualized clock" + select PARAVIRT + help + Turning on this option will allow you to run a paravirtualized clock + when running over the KVM hypervisor. Instead of relying on a PIT + (or probably other) emulation by the underlying device model, the host + provides the guest with timing infrastructure such as time of day, and + system time + source "arch/x86/lguest/Kconfig" config PARAVIRT diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile index 6f81300..4b10872 100644 --- a/arch/x86/kernel/Makefile +++ b/arch/x86/kernel/Makefile @@ -68,6 +68,7 @@ obj-$(CONFIG_DEBUG_RODATA_TEST) += test_ obj-$(CONFIG_DEBUG_NX_TEST) += test_nx.o obj-$(CONFIG_VMI) += vmi_32.o vmiclock_32.o +obj-$(CONFIG_KVM_CLOCK) += kvmclock.o obj-$(CONFIG_PARAVIRT) += paravirt.o paravirt_patch_$(BITS).o ifdef CONFIG_INPUT_PCSPKR diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c new file mode 100644 index 0000000..809d8a5 --- /dev/null +++ b/arch/x86/kernel/kvmclock.c @@ -0,0 +1,161 @@ +/* KVM paravirtual clock driver. A clocksource implementation + Copyright (C) 2008 Glauber de Oliveira Costa, Red Hat Inc. + + This program is free software; you can redistribute it and/or modify + it under the terms of the GNU General Public License as published by + the Free Software Foundation; either version 2 of the License, or + (at your option) any later version. + + This program is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + You should have received a copy of the GNU General Public License + along with this program; if not, write to the Free Software + Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA +*/ + +#include <linux/clocksource.h> +#include <linux/kvm_para.h> +#include <asm/arch_hooks.h> +#include <asm/msr.h> +#include <xen/interface/xen.h> + +#define KVM_SCALE 22 + +static int kvmclock = 1; + +static int parse_no_kvmclock(char *arg) +{ + kvmclock = 0; + return 0; +} +early_param("no-kvmclock", parse_no_kvmclock); + +struct shared_info shared_info __attribute__((__aligned__(PAGE_SIZE))); + +/* The hypervisor will put information about time periodically here */ +static struct kvm_vcpu_time_info hv_clock[NR_CPUS]; +#define get_clock(cpu, field) hv_clock[cpu].field + +static inline u64 kvm_get_delta(u64 last_tsc) +{ + int cpu = smp_processor_id(); + u64 delta = native_read_tsc() - last_tsc; + return (delta * get_clock(cpu, tsc_to_system_mul)) >> KVM_SCALE; +} + +static struct kvm_wall_clock wall_clock; +/* + * The wallclock is the time of day when we booted. Since then, some time may + * have elapsed since the hypervisor wrote the data. So we try to account for + * that. Even if the tsc is not accurate, it gives us a more accurate timing + * than not adjusting at all + */ +unsigned long kvm_get_wallclock(void) +{ + u32 wc_sec, wc_nsec; + u64 delta, last_tsc; + struct timespec ts; + int version, nsec, cpu = smp_processor_id(); + int low,high; + + low = (int)__pa(&wall_clock); + high = ((u64)__pa(&wall_clock) >> 32); + + native_write_msr(MSR_KVM_WALL_CLOCK, low, high); + do { + version = wall_clock.wc_version; + rmb(); + wc_sec = wall_clock.wc_sec; + wc_nsec = wall_clock.wc_nsec; + last_tsc = get_clock(cpu, tsc_timestamp); + rmb(); + } while ((wall_clock.wc_version != version) || (version & 1)); + + delta = kvm_get_delta(last_tsc); + delta += wc_nsec; + nsec = do_div(delta, NSEC_PER_SEC); + set_normalized_timespec(&ts, wc_sec + delta, nsec); + /* + * Of all mechanisms of time adjustment I've tested, this one + * was the champion! + */ + return ts.tv_sec + 1; +} + +int kvm_set_wallclock(unsigned long now) +{ + return 0; +} + +/* + * This is our read_clock function. The host puts an tsc timestamp each time + * it updates a new time. Without the tsc adjustment, we can have a situation + * in which a vcpu starts to run earlier (smaller system_time), but probes + * time later (compared to another vcpu), leading to backwards time + */ +static cycle_t kvm_clock_read(void) +{ + u64 last_tsc, now; + int cpu; + + preempt_disable(); + cpu = smp_processor_id(); + + last_tsc = get_clock(cpu, tsc_timestamp); + now = get_clock(cpu, system_time); + + now += kvm_get_delta(last_tsc); + preempt_enable(); + + return now; +} + +static struct clocksource kvm_clock = { + .name = "kvm-clock", + .read = kvm_clock_read, + .rating = 400, + .mask = CLOCKSOURCE_MASK(64), + .mult = 1 << KVM_SCALE, + .shift = KVM_SCALE, + .flags = CLOCK_SOURCE_IS_CONTINUOUS, +}; + +static int kvm_register_clock(void) +{ + int cpu = smp_processor_id(); + int low,high; + low = (int)__pa(&hv_clock[cpu]); + high = ((u64)__pa(&hv_clock[cpu]) >> 32); + + return native_write_msr_safe(MSR_KVM_SYSTEM_TIME, low, high); +} + +static void kvm_setup_secondary_clock(void) +{ + /* + * Now that the first cpu already had this clocksource initialized, + * we shouldn't fail. + */ + WARN_ON(kvm_register_clock()); + /* ok, done with our trickery, call native */ + setup_secondary_APIC_clock(); +} + +void __init kvmclock_init(void) +{ + if (!kvm_para_available()) + return; + + if (kvmclock && kvm_para_has_feature(KVM_FEATURE_CLOCKSOURCE)) { + if (kvm_register_clock()) + return; + pv_time_ops.get_wallclock = kvm_get_wallclock; + pv_time_ops.set_wallclock = kvm_set_wallclock; + pv_time_ops.sched_clock = kvm_clock_read; + pv_apic_ops.setup_secondary_clock = kvm_setup_secondary_clock; + clocksource_register(&kvm_clock); + } +} diff --git a/arch/x86/kernel/setup_32.c b/arch/x86/kernel/setup_32.c index 62adc5f..9c630e8 100644 --- a/arch/x86/kernel/setup_32.c +++ b/arch/x86/kernel/setup_32.c @@ -46,6 +46,7 @@ #include <linux/dmi.h> #include <linux/pfn.h> #include <linux/pci.h> #include <linux/init_ohci1394_dma.h> +#include <linux/kvm_para.h> #include <video/edid.h> @@ -766,6 +767,10 @@ #endif if (mtrr_trim_uncached_memory(max_pfn)) max_low_pfn = setup_memory(); +#ifdef CONFIG_KVM_CLOCK + kvmclock_init(); +#endif + #ifdef CONFIG_VMI /* * Must be after max_low_pfn is determined, and before kernel diff --git a/arch/x86/kernel/setup_64.c b/arch/x86/kernel/setup_64.c index 77fb87b..e7dc84c 100644 --- a/arch/x86/kernel/setup_64.c +++ b/arch/x86/kernel/setup_64.c @@ -42,6 +42,7 @@ #include <linux/dma-mapping.h> #include <linux/ctype.h> #include <linux/uaccess.h> #include <linux/init_ohci1394_dma.h> +#include <linux/kvm_para.h> #include <asm/mtrr.h> #include <asm/uaccess.h> @@ -342,6 +343,10 @@ #endif io_delay_init(); +#ifdef CONFIG_KVM_CLOCK + kvmclock_init(); +#endif + #ifdef CONFIG_SMP /* setup to use the early static init tables during kernel startup */ x86_cpu_to_apicid_early_ptr = (void *)x86_cpu_to_apicid_init; -- 1.4.2 ------------------------------------------------------------------------- This SF.net email is sponsored by: Microsoft Defy all challenges. Microsoft(R) Visual Studio 2008. http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/ ^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [PATCH 1/2] kvmclock - the host part. 2008-02-11 18:07 ` [PATCH 1/2] kvmclock - the host part Glauber de Oliveira Costa 2008-02-11 18:07 ` [PATCH 2/2] kvmclock implementation, the guest part Glauber de Oliveira Costa @ 2008-02-13 10:49 ` Avi Kivity 2008-02-13 21:45 ` Glauber Costa 2008-02-18 14:06 ` Marcelo Tosatti 1 sibling, 2 replies; 13+ messages in thread From: Avi Kivity @ 2008-02-13 10:49 UTC (permalink / raw) To: Glauber de Oliveira Costa; +Cc: kvm-devel, chrisw Glauber de Oliveira Costa wrote: > This is the host part of kvm clocksource implementation. As it does > not include clockevents, it is a fairly simple implementation. We > only have to register a per-vcpu area, and start writting to it periodically. > > The area is binary compatible with xen, as we use the same shadow_info structure. > > > +static void kvm_write_wall_clock(struct kvm_vcpu *v, gpa_t wall_clock) > +{ > + int version = 1; > + struct kvm_wall_clock wc; > + unsigned long flags; > + struct timespec wc_ts; > + > + local_irq_save(flags); > + kvm_get_msr(v, MSR_IA32_TIME_STAMP_COUNTER, > + &v->arch.hv_clock.tsc_timestamp); > Why is this needed? IIRC the wall clock is not tied to any vcpu. If we can remove this, the argument to the function should be kvm, not kvm_vcpu. We can remove the irq games as well. > + wc_ts = current_kernel_time(); > + local_irq_restore(flags); > + > + down_write(¤t->mm->mmap_sem); > + kvm_write_guest(v->kvm, wall_clock, &version, sizeof(version)); > + up_write(¤t->mm->mmap_sem); > Why down_write? accidentally or on purpose? For mutual exclusion, I suggest taking kvm->lock instead (for the entire function). > + > + /* With all the info we got, fill in the values */ > + wc.wc_sec = wc_ts.tv_sec; > + wc.wc_nsec = wc_ts.tv_nsec; > + wc.wc_version = ++version; > + > + down_write(¤t->mm->mmap_sem); > + kvm_write_guest(v->kvm, wall_clock, &wc, sizeof(wc)); > + up_write(¤t->mm->mmap_sem); > Should be in three steps: write version, write data, write version. kvm_write_guest doesn't guarantee any order. It may fail as well, and we need to handle that. > > +/* xen binary-compatible interface. See xen headers for details */ > +struct kvm_vcpu_time_info { > + uint32_t version; > + uint32_t pad0; > + uint64_t tsc_timestamp; > + uint64_t system_time; > + uint32_t tsc_to_system_mul; > + int8_t tsc_shift; > +}; /* 32 bytes */ > + > +struct kvm_wall_clock { > + uint32_t wc_version; > + uint32_t wc_sec; > + uint32_t wc_nsec; > +}; > + > These structures are dangerously sized. __Suggest__ __attribute__((__packed__)). (or some padding at the end of kvm_vcpu_time_info. > diff --git a/include/linux/kvm.h b/include/linux/kvm.h > index 4de4fd2..78ce53f 100644 > --- a/include/linux/kvm.h > +++ b/include/linux/kvm.h > @@ -232,6 +232,7 @@ #define KVM_CAP_USER_MEMORY 3 > #define KVM_CAP_SET_TSS_ADDR 4 > #define KVM_CAP_EXT_CPUID 5 > #define KVM_CAP_VAPIC 6 > +#define KVM_CAP_CLOCKSOURCE 7 > > Please refresh against kvm.git, this has changed a bit. -- Any sufficiently difficult bug is indistinguishable from a feature. ------------------------------------------------------------------------- This SF.net email is sponsored by: Microsoft Defy all challenges. Microsoft(R) Visual Studio 2008. http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/ ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 1/2] kvmclock - the host part. 2008-02-13 10:49 ` [PATCH 1/2] kvmclock - the host part Avi Kivity @ 2008-02-13 21:45 ` Glauber Costa 2008-02-14 9:21 ` Avi Kivity 2008-02-18 14:06 ` Marcelo Tosatti 1 sibling, 1 reply; 13+ messages in thread From: Glauber Costa @ 2008-02-13 21:45 UTC (permalink / raw) To: Avi Kivity; +Cc: kvm-devel, chrisw Avi Kivity wrote: > Glauber de Oliveira Costa wrote: >> This is the host part of kvm clocksource implementation. As it does >> not include clockevents, it is a fairly simple implementation. We >> only have to register a per-vcpu area, and start writting to it >> periodically. >> >> The area is binary compatible with xen, as we use the same shadow_info >> structure. >> >> > >> +static void kvm_write_wall_clock(struct kvm_vcpu *v, gpa_t wall_clock) >> +{ >> + int version = 1; >> + struct kvm_wall_clock wc; >> + unsigned long flags; >> + struct timespec wc_ts; >> + >> + local_irq_save(flags); >> + kvm_get_msr(v, MSR_IA32_TIME_STAMP_COUNTER, >> + &v->arch.hv_clock.tsc_timestamp); >> > > Why is this needed? IIRC the wall clock is not tied to any vcpu. > > If we can remove this, the argument to the function should be kvm, not > kvm_vcpu. We can remove the irq games as well. After some new thoughts, I don't agree. The time calculation in the guest will be in the format wallclock + delta tsc. Everything that has a tsc _is_ tied to a cpu. Although we can store the area globally, I think the best semantics is to have a vcpu to always issue an msr write to the area before reading it, in order to have the tsc updated. >> + wc_ts = current_kernel_time(); >> + local_irq_restore(flags); >> + >> + down_write(¤t->mm->mmap_sem); >> + kvm_write_guest(v->kvm, wall_clock, &version, sizeof(version)); >> + up_write(¤t->mm->mmap_sem); >> > > Why down_write? accidentally or on purpose? accidentally. Marcelo has pointed it out to me, and I do have a version without it now. > For mutual exclusion, I suggest taking kvm->lock instead (for the entire > function). This function is called too often. since we only need to guarantee mutual exclusion in a tiny part, it seems preferable to me. Do you have any extra reason for kvm->lock'ing the entire function? >> + >> + /* With all the info we got, fill in the values */ >> + wc.wc_sec = wc_ts.tv_sec; >> + wc.wc_nsec = wc_ts.tv_nsec; >> + wc.wc_version = ++version; >> + >> + down_write(¤t->mm->mmap_sem); >> + kvm_write_guest(v->kvm, wall_clock, &wc, sizeof(wc)); >> + up_write(¤t->mm->mmap_sem); >> > > Should be in three steps: write version, write data, write version. > kvm_write_guest doesn't guarantee any order. It may fail as well, and we > need to handle that. Ok, I see. This is fundamentally different from the system time case, because multiple cpus can be running over the same area. >> >> +/* xen binary-compatible interface. See xen headers for details */ >> +struct kvm_vcpu_time_info { >> + uint32_t version; >> + uint32_t pad0; >> + uint64_t tsc_timestamp; >> + uint64_t system_time; >> + uint32_t tsc_to_system_mul; >> + int8_t tsc_shift; >> +}; /* 32 bytes */ >> + >> +struct kvm_wall_clock { >> + uint32_t wc_version; >> + uint32_t wc_sec; >> + uint32_t wc_nsec; >> +}; >> + >> > > These structures are dangerously sized. __Suggest__ > __attribute__((__packed__)). (or some padding at the end of > kvm_vcpu_time_info. They are sized so to have same size as xen's. If it concerns you, packed should be better. >> diff --git a/include/linux/kvm.h b/include/linux/kvm.h >> index 4de4fd2..78ce53f 100644 >> --- a/include/linux/kvm.h >> +++ b/include/linux/kvm.h >> @@ -232,6 +232,7 @@ #define KVM_CAP_USER_MEMORY 3 >> #define KVM_CAP_SET_TSS_ADDR 4 >> #define KVM_CAP_EXT_CPUID 5 >> #define KVM_CAP_VAPIC 6 >> +#define KVM_CAP_CLOCKSOURCE 7 >> >> > > Please refresh against kvm.git, this has changed a bit. ok. ------------------------------------------------------------------------- This SF.net email is sponsored by: Microsoft Defy all challenges. Microsoft(R) Visual Studio 2008. http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/ ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 1/2] kvmclock - the host part. 2008-02-13 21:45 ` Glauber Costa @ 2008-02-14 9:21 ` Avi Kivity 0 siblings, 0 replies; 13+ messages in thread From: Avi Kivity @ 2008-02-14 9:21 UTC (permalink / raw) To: Glauber Costa; +Cc: kvm-devel, chrisw Glauber Costa wrote: >> >>> +static void kvm_write_wall_clock(struct kvm_vcpu *v, gpa_t wall_clock) >>> +{ >>> + int version = 1; >>> + struct kvm_wall_clock wc; >>> + unsigned long flags; >>> + struct timespec wc_ts; >>> + >>> + local_irq_save(flags); >>> + kvm_get_msr(v, MSR_IA32_TIME_STAMP_COUNTER, >>> + &v->arch.hv_clock.tsc_timestamp); >>> >> >> Why is this needed? IIRC the wall clock is not tied to any vcpu. >> >> If we can remove this, the argument to the function should be kvm, >> not kvm_vcpu. We can remove the irq games as well. > > After some new thoughts, I don't agree. The time calculation in the > guest will be in the format wallclock + delta tsc. Everything that has > a tsc _is_ tied to a cpu. Although we can store the area globally, I > think the best semantics is to have a vcpu to always issue an msr > write to the area before reading it, in order to have the tsc updated. > No. Guest time is (wall_clock + system_time) == (wall_clock + system_time_base + tsc_adjust). The wall clock part is vcpu indepenent (or it would be unusable by more than one vcpus). wall_clock is the system boot time; it only changes when the host time is adjusted (by ntp or user command). > >> For mutual exclusion, I suggest taking kvm->lock instead (for the >> entire function). > > This function is called too often. since we only need to guarantee > mutual exclusion in a tiny part, it seems preferable to me. Do you > have any extra reason for kvm->lock'ing the entire function? Wall clock adjustment should only be called as a result of settimeofday() or similar. Only vcpu time needs frequent adjustment (due to vcpu migration or tsc frequency change). > > >>> >>> +/* xen binary-compatible interface. See xen headers for details */ >>> +struct kvm_vcpu_time_info { >>> + uint32_t version; >>> + uint32_t pad0; >>> + uint64_t tsc_timestamp; >>> + uint64_t system_time; >>> + uint32_t tsc_to_system_mul; >>> + int8_t tsc_shift; >>> +}; /* 32 bytes */ >>> + >>> +struct kvm_wall_clock { >>> + uint32_t wc_version; >>> + uint32_t wc_sec; >>> + uint32_t wc_nsec; >>> +}; >>> + >>> >> >> These structures are dangerously sized. __Suggest__ >> __attribute__((__packed__)). (or some padding at the end of >> kvm_vcpu_time_info. > > They are sized so to have same size as xen's. If it concerns you, > packed should be better. Okay, so let's pack kvm_wall_clock to avoid surprises. The Xen sources I have (v2.6.25-rc1:include/xen/interface/xen.h) do have three bytes padding in vcpu_time_info to make it 32 bytes. -- Any sufficiently difficult bug is indistinguishable from a feature. ------------------------------------------------------------------------- This SF.net email is sponsored by: Microsoft Defy all challenges. Microsoft(R) Visual Studio 2008. http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/ ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 1/2] kvmclock - the host part. 2008-02-13 10:49 ` [PATCH 1/2] kvmclock - the host part Avi Kivity 2008-02-13 21:45 ` Glauber Costa @ 2008-02-18 14:06 ` Marcelo Tosatti 1 sibling, 0 replies; 13+ messages in thread From: Marcelo Tosatti @ 2008-02-18 14:06 UTC (permalink / raw) To: Avi Kivity; +Cc: kvm-devel, chrisw, Glauber de Oliveira Costa On Wed, Feb 13, 2008 at 12:49:42PM +0200, Avi Kivity wrote: > Glauber de Oliveira Costa wrote: > > This is the host part of kvm clocksource implementation. As it does > > not include clockevents, it is a fairly simple implementation. We > > only have to register a per-vcpu area, and start writting to it periodically. > > > > The area is binary compatible with xen, as we use the same shadow_info structure. > > > > > > > +static void kvm_write_wall_clock(struct kvm_vcpu *v, gpa_t wall_clock) > > +{ > > + int version = 1; > > + struct kvm_wall_clock wc; > > + unsigned long flags; > > + struct timespec wc_ts; > > + > > + local_irq_save(flags); > > + kvm_get_msr(v, MSR_IA32_TIME_STAMP_COUNTER, > > + &v->arch.hv_clock.tsc_timestamp); > > > > Why is this needed? IIRC the wall clock is not tied to any vcpu. > > If we can remove this, the argument to the function should be kvm, not > kvm_vcpu. We can remove the irq games as well. > > > + wc_ts = current_kernel_time(); > > + local_irq_restore(flags); > > + > > + down_write(¤t->mm->mmap_sem); > > + kvm_write_guest(v->kvm, wall_clock, &version, sizeof(version)); > > + up_write(¤t->mm->mmap_sem); > > > > Why down_write? accidentally or on purpose? > > For mutual exclusion, I suggest taking kvm->lock instead (for the entire > function). You need slots_lock in read-mode for kvm_write_guest(). kvm->lock does not protect against memslot changes and the dirty bitmap. ------------------------------------------------------------------------- This SF.net email is sponsored by: Microsoft Defy all challenges. Microsoft(R) Visual Studio 2008. http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/ ^ permalink raw reply [flat|nested] 13+ messages in thread
* [PATCH 0/2] kvm clock - xen compatible by accident
@ 2008-01-16 16:31 Glauber de Oliveira Costa
[not found] ` <1200501067774-git-send-email-gcosta-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
0 siblings, 1 reply; 13+ messages in thread
From: Glauber de Oliveira Costa @ 2008-01-16 16:31 UTC (permalink / raw)
To: kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f
Cc: jeremy-TSDbQ3PG+2Y, avi-atKUWr5tajBWk0Htik3J/w
I think I've misunderstood what you guys wanted to achieve with "xen
compatible", but now I get it. It's something that's kvm specific,
but happens to be able to communicate with xen guests, provided they
do a kvm-aware initialization.
So, here's the two patches for it, using two msrs and non-xen data structures
Userspace is the same, so I'm only sending these ones
-------------------------------------------------------------------------
This SF.net email is sponsored by: Microsoft
Defy all challenges. Microsoft(R) Visual Studio 2008.
http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/
^ permalink raw reply [flat|nested] 13+ messages in thread[parent not found: <1200501067774-git-send-email-gcosta-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>]
* [PATCH 1/2] kvmclock - the host part. [not found] ` <1200501067774-git-send-email-gcosta-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> @ 2008-01-16 16:31 ` Glauber de Oliveira Costa [not found] ` <12005010761561-git-send-email-gcosta-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> 0 siblings, 1 reply; 13+ messages in thread From: Glauber de Oliveira Costa @ 2008-01-16 16:31 UTC (permalink / raw) To: kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f Cc: jeremy-TSDbQ3PG+2Y, avi-atKUWr5tajBWk0Htik3J/w, Glauber de Oliveira Costa This is the host part of kvm clocksource implementation. As it does not include clockevents, it is a fairly simple implementation. We only have to register a per-vcpu area, and start writting to it periodically. The area is binary compatible with xen, as we use the same shadow_info structure. Signed-off-by: Glauber de Oliveira Costa <gcosta-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> --- arch/x86/kvm/x86.c | 98 +++++++++++++++++++++++++++++++++++++++++++- include/asm-x86/kvm_host.h | 6 +++ include/asm-x86/kvm_para.h | 24 +++++++++++ include/linux/kvm.h | 1 + 4 files changed, 128 insertions(+), 1 deletions(-) diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 8a90403..fd69aa1 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -19,6 +19,7 @@ #include "irq.h" #include "mmu.h" +#include <linux/clocksource.h> #include <linux/kvm.h> #include <linux/fs.h> #include <linux/vmalloc.h> @@ -412,7 +413,7 @@ static u32 msrs_to_save[] = { #ifdef CONFIG_X86_64 MSR_CSTAR, MSR_KERNEL_GS_BASE, MSR_SYSCALL_MASK, MSR_LSTAR, #endif - MSR_IA32_TIME_STAMP_COUNTER, + MSR_IA32_TIME_STAMP_COUNTER, MSR_KVM_SYSTEM_TIME, }; static unsigned num_msrs_to_save; @@ -467,6 +468,73 @@ static int do_set_msr(struct kvm_vcpu *vcpu, unsigned index, u64 *data) return kvm_set_msr(vcpu, index, *data); } +static void kvm_write_wall_clock(struct kvm_vcpu *v, gpa_t wall_clock) +{ + int version = 1; + struct wall_clock wc; + unsigned long flags; + struct timespec wc_ts; + + local_irq_save(flags); + kvm_get_msr(v, MSR_IA32_TIME_STAMP_COUNTER, + &v->arch.hv_clock.tsc_timestamp); + wc_ts = current_kernel_time(); + local_irq_restore(flags); + + down_write(¤t->mm->mmap_sem); + kvm_write_guest(v->kvm, wall_clock, &version, sizeof(version)); + up_write(¤t->mm->mmap_sem); + + /* With all the info we got, fill in the values */ + wc.wc_sec = wc_ts.tv_sec; + wc.wc_nsec = wc_ts.tv_nsec; + wc.wc_version = ++version; + + down_write(¤t->mm->mmap_sem); + kvm_write_guest(v->kvm, wall_clock, &wc, sizeof(wc)); + up_write(¤t->mm->mmap_sem); +} +static void kvm_write_guest_time(struct kvm_vcpu *v) +{ + struct timespec ts; + unsigned long flags; + struct kvm_vcpu_arch *vcpu = &v->arch; + void *shared_kaddr; + + if ((!vcpu->time_page)) + return; + + /* Keep irq disabled to prevent changes to the clock */ + local_irq_save(flags); + kvm_get_msr(v, MSR_IA32_TIME_STAMP_COUNTER, + &vcpu->hv_clock.tsc_timestamp); + ktime_get_ts(&ts); + local_irq_restore(flags); + + /* With all the info we got, fill in the values */ + + vcpu->hv_clock.system_time = ts.tv_nsec + + (NSEC_PER_SEC * (u64)ts.tv_sec); + /* + * The interface expects us to write an even number signaling that the + * update is finished. Since the guest won't see the intermediate states, + * we just write "2" at the end + */ + vcpu->hv_clock.version = 2; + + preempt_disable(); + + shared_kaddr = kmap_atomic(vcpu->time_page, KM_USER0); + + memcpy(shared_kaddr + vcpu->time_offset, &vcpu->hv_clock, + sizeof(vcpu->hv_clock)); + + kunmap_atomic(shared_kaddr, KM_USER0); + preempt_enable(); + + mark_page_dirty(v->kvm, vcpu->time >> PAGE_SHIFT); +} + int kvm_set_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 data) { @@ -494,6 +562,25 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 data) case MSR_IA32_MISC_ENABLE: vcpu->arch.ia32_misc_enable_msr = data; break; + case MSR_KVM_WALL_CLOCK: + vcpu->arch.wall_clock = data; + kvm_write_wall_clock(vcpu, data); + break; + case MSR_KVM_SYSTEM_TIME: { + vcpu->arch.time = data & PAGE_MASK; + vcpu->arch.time_offset = data & ~PAGE_MASK; + + vcpu->arch.hv_clock.tsc_to_system_mul = + clocksource_khz2mult(tsc_khz, 22); + vcpu->arch.hv_clock.tsc_shift = 22; + + down_write(¤t->mm->mmap_sem); + vcpu->arch.time_page = gfn_to_page(vcpu->kvm, data >> PAGE_SHIFT); + up_write(¤t->mm->mmap_sem); + if (is_error_page(vcpu->arch.time_page)) + vcpu->arch.time_page = NULL; + break; + } default: pr_unimpl(vcpu, "unhandled wrmsr: 0x%x data %llx\n", msr, data); return 1; @@ -553,6 +640,13 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata) data = vcpu->arch.shadow_efer; break; #endif + case MSR_KVM_WALL_CLOCK: + data = vcpu->arch.wall_clock; + break; + case MSR_KVM_SYSTEM_TIME: + data = vcpu->arch.time; + break; + default: pr_unimpl(vcpu, "unhandled rdmsr: 0x%x\n", msr); return 1; @@ -680,6 +774,7 @@ int kvm_dev_ioctl_check_extension(long ext) case KVM_CAP_USER_MEMORY: case KVM_CAP_SET_TSS_ADDR: case KVM_CAP_EXT_CPUID: + case KVM_CAP_CLOCKSOURCE: r = 1; break; case KVM_CAP_VAPIC: @@ -737,6 +832,7 @@ out: void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu) { kvm_x86_ops->vcpu_load(vcpu, cpu); + kvm_write_guest_time(vcpu); } void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu) diff --git a/include/asm-x86/kvm_host.h b/include/asm-x86/kvm_host.h index d6db0de..2a621ee 100644 --- a/include/asm-x86/kvm_host.h +++ b/include/asm-x86/kvm_host.h @@ -261,6 +261,12 @@ struct kvm_vcpu_arch { /* emulate context */ struct x86_emulate_ctxt emulate_ctxt; + + gpa_t wall_clock; + gpa_t time; + struct kvm_vcpu_time_info hv_clock; + unsigned int time_offset; + struct page *time_page; }; struct kvm_mem_alias { diff --git a/include/asm-x86/kvm_para.h b/include/asm-x86/kvm_para.h index c6f3fd8..8c105bc 100644 --- a/include/asm-x86/kvm_para.h +++ b/include/asm-x86/kvm_para.h @@ -10,10 +10,34 @@ * paravirtualization, the appropriate feature bit should be checked. */ #define KVM_CPUID_FEATURES 0x40000001 +#define KVM_FEATURE_CLOCKSOURCE 0 + +#define MSR_KVM_WALL_CLOCK 0x11 +#define MSR_KVM_SYSTEM_TIME 0x12 #ifdef __KERNEL__ #include <asm/processor.h> +/* xen binary-compatible interface. See xen headers for details */ +struct kvm_vcpu_time_info { + uint32_t version; + uint32_t pad0; + uint64_t tsc_timestamp; + uint64_t system_time; + uint32_t tsc_to_system_mul; + int8_t tsc_shift; +}; /* 32 bytes */ + +struct wall_clock { + uint32_t wc_version; + uint32_t wc_sec; + uint32_t wc_nsec; +}; + + +extern void kvmclock_init(void); + + /* This instruction is vmcall. On non-VT architectures, it will generate a * trap that we will then rewrite to the appropriate instruction. */ diff --git a/include/linux/kvm.h b/include/linux/kvm.h index 4de4fd2..78ce53f 100644 --- a/include/linux/kvm.h +++ b/include/linux/kvm.h @@ -232,6 +232,7 @@ struct kvm_vapic_addr { #define KVM_CAP_SET_TSS_ADDR 4 #define KVM_CAP_EXT_CPUID 5 #define KVM_CAP_VAPIC 6 +#define KVM_CAP_CLOCKSOURCE 7 /* * ioctls for VM fds -- 1.5.0.6 ------------------------------------------------------------------------- This SF.net email is sponsored by: Microsoft Defy all challenges. Microsoft(R) Visual Studio 2008. http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/ ^ permalink raw reply related [flat|nested] 13+ messages in thread
[parent not found: <12005010761561-git-send-email-gcosta-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>]
* Re: [PATCH 1/2] kvmclock - the host part. [not found] ` <12005010761561-git-send-email-gcosta-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> @ 2008-01-16 19:00 ` Anthony Liguori [not found] ` <478E544C.4020603-rdkfGonbjUSkNkDKm+mE6A@public.gmane.org> 2008-01-17 7:59 ` Gerd Hoffmann ` (2 subsequent siblings) 3 siblings, 1 reply; 13+ messages in thread From: Anthony Liguori @ 2008-01-16 19:00 UTC (permalink / raw) To: Glauber de Oliveira Costa Cc: kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f, jeremy-TSDbQ3PG+2Y, avi-atKUWr5tajBWk0Htik3J/w Glauber de Oliveira Costa wrote: > This is the host part of kvm clocksource implementation. As it does > not include clockevents, it is a fairly simple implementation. We > only have to register a per-vcpu area, and start writting to it periodically. > > The area is binary compatible with xen, as we use the same shadow_info structure. > > Signed-off-by: Glauber de Oliveira Costa <gcosta-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> > --- > arch/x86/kvm/x86.c | 98 +++++++++++++++++++++++++++++++++++++++++++- > include/asm-x86/kvm_host.h | 6 +++ > include/asm-x86/kvm_para.h | 24 +++++++++++ > include/linux/kvm.h | 1 + > 4 files changed, 128 insertions(+), 1 deletions(-) > > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c > index 8a90403..fd69aa1 100644 > --- a/arch/x86/kvm/x86.c > +++ b/arch/x86/kvm/x86.c > @@ -19,6 +19,7 @@ > #include "irq.h" > #include "mmu.h" > > +#include <linux/clocksource.h> > #include <linux/kvm.h> > #include <linux/fs.h> > #include <linux/vmalloc.h> > @@ -412,7 +413,7 @@ static u32 msrs_to_save[] = { > #ifdef CONFIG_X86_64 > MSR_CSTAR, MSR_KERNEL_GS_BASE, MSR_SYSCALL_MASK, MSR_LSTAR, > #endif > - MSR_IA32_TIME_STAMP_COUNTER, > + MSR_IA32_TIME_STAMP_COUNTER, MSR_KVM_SYSTEM_TIME, > }; > > static unsigned num_msrs_to_save; > @@ -467,6 +468,73 @@ static int do_set_msr(struct kvm_vcpu *vcpu, unsigned index, u64 *data) > return kvm_set_msr(vcpu, index, *data); > } > > +static void kvm_write_wall_clock(struct kvm_vcpu *v, gpa_t wall_clock) > +{ > + int version = 1; > + struct wall_clock wc; > + unsigned long flags; > + struct timespec wc_ts; > + > + local_irq_save(flags); > + kvm_get_msr(v, MSR_IA32_TIME_STAMP_COUNTER, > + &v->arch.hv_clock.tsc_timestamp); > + wc_ts = current_kernel_time(); > + local_irq_restore(flags); > + > + down_write(¤t->mm->mmap_sem); > + kvm_write_guest(v->kvm, wall_clock, &version, sizeof(version)); > + up_write(¤t->mm->mmap_sem); > + > + /* With all the info we got, fill in the values */ > + wc.wc_sec = wc_ts.tv_sec; > + wc.wc_nsec = wc_ts.tv_nsec; > + wc.wc_version = ++version; > + > + down_write(¤t->mm->mmap_sem); > + kvm_write_guest(v->kvm, wall_clock, &wc, sizeof(wc)); > + up_write(¤t->mm->mmap_sem); > Can we get a comment explaining why we only write the version field and then immediately increment the version and write the whole struct? It's not at all obvious why the first write is needed to me. > +} > +static void kvm_write_guest_time(struct kvm_vcpu *v) > +{ > + struct timespec ts; > + unsigned long flags; > + struct kvm_vcpu_arch *vcpu = &v->arch; > + void *shared_kaddr; > + > + if ((!vcpu->time_page)) > + return; > + > + /* Keep irq disabled to prevent changes to the clock */ > + local_irq_save(flags); > + kvm_get_msr(v, MSR_IA32_TIME_STAMP_COUNTER, > + &vcpu->hv_clock.tsc_timestamp); > + ktime_get_ts(&ts); > + local_irq_restore(flags); > + > + /* With all the info we got, fill in the values */ > + > + vcpu->hv_clock.system_time = ts.tv_nsec + > + (NSEC_PER_SEC * (u64)ts.tv_sec); > + /* > + * The interface expects us to write an even number signaling that the > + * update is finished. Since the guest won't see the intermediate states, > + * we just write "2" at the end > + */ > + vcpu->hv_clock.version = 2; > + > + preempt_disable(); > + > + shared_kaddr = kmap_atomic(vcpu->time_page, KM_USER0); > + > + memcpy(shared_kaddr + vcpu->time_offset, &vcpu->hv_clock, > + sizeof(vcpu->hv_clock)); > + > + kunmap_atomic(shared_kaddr, KM_USER0); > Instead of doing a kmap/memcpy, I think it would be better to store the GPA of the time page and do a kvm_write_guest(). Otherwise, you're pinning this page in memory. Regards, Anthony Liguori > + preempt_enable(); > + > + mark_page_dirty(v->kvm, vcpu->time >> PAGE_SHIFT); > +} > + > > int kvm_set_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 data) > { > @@ -494,6 +562,25 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 data) > case MSR_IA32_MISC_ENABLE: > vcpu->arch.ia32_misc_enable_msr = data; > break; > + case MSR_KVM_WALL_CLOCK: > + vcpu->arch.wall_clock = data; > + kvm_write_wall_clock(vcpu, data); > + break; > + case MSR_KVM_SYSTEM_TIME: { > + vcpu->arch.time = data & PAGE_MASK; > + vcpu->arch.time_offset = data & ~PAGE_MASK; > + > + vcpu->arch.hv_clock.tsc_to_system_mul = > + clocksource_khz2mult(tsc_khz, 22); > + vcpu->arch.hv_clock.tsc_shift = 22; > + > + down_write(¤t->mm->mmap_sem); > + vcpu->arch.time_page = gfn_to_page(vcpu->kvm, data >> PAGE_SHIFT); > + up_write(¤t->mm->mmap_sem); > + if (is_error_page(vcpu->arch.time_page)) > + vcpu->arch.time_page = NULL; > + break; > + } > default: > pr_unimpl(vcpu, "unhandled wrmsr: 0x%x data %llx\n", msr, data); > return 1; > @@ -553,6 +640,13 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata) > data = vcpu->arch.shadow_efer; > break; > #endif > + case MSR_KVM_WALL_CLOCK: > + data = vcpu->arch.wall_clock; > + break; > + case MSR_KVM_SYSTEM_TIME: > + data = vcpu->arch.time; > + break; > + > default: > pr_unimpl(vcpu, "unhandled rdmsr: 0x%x\n", msr); > return 1; > @@ -680,6 +774,7 @@ int kvm_dev_ioctl_check_extension(long ext) > case KVM_CAP_USER_MEMORY: > case KVM_CAP_SET_TSS_ADDR: > case KVM_CAP_EXT_CPUID: > + case KVM_CAP_CLOCKSOURCE: > r = 1; > break; > case KVM_CAP_VAPIC: > @@ -737,6 +832,7 @@ out: > void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu) > { > kvm_x86_ops->vcpu_load(vcpu, cpu); > + kvm_write_guest_time(vcpu); > } > > void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu) > diff --git a/include/asm-x86/kvm_host.h b/include/asm-x86/kvm_host.h > index d6db0de..2a621ee 100644 > --- a/include/asm-x86/kvm_host.h > +++ b/include/asm-x86/kvm_host.h > @@ -261,6 +261,12 @@ struct kvm_vcpu_arch { > /* emulate context */ > > struct x86_emulate_ctxt emulate_ctxt; > + > + gpa_t wall_clock; > + gpa_t time; > + struct kvm_vcpu_time_info hv_clock; > + unsigned int time_offset; > + struct page *time_page; > }; > > struct kvm_mem_alias { > diff --git a/include/asm-x86/kvm_para.h b/include/asm-x86/kvm_para.h > index c6f3fd8..8c105bc 100644 > --- a/include/asm-x86/kvm_para.h > +++ b/include/asm-x86/kvm_para.h > @@ -10,10 +10,34 @@ > * paravirtualization, the appropriate feature bit should be checked. > */ > #define KVM_CPUID_FEATURES 0x40000001 > +#define KVM_FEATURE_CLOCKSOURCE 0 > + > +#define MSR_KVM_WALL_CLOCK 0x11 > +#define MSR_KVM_SYSTEM_TIME 0x12 > > #ifdef __KERNEL__ > #include <asm/processor.h> > > +/* xen binary-compatible interface. See xen headers for details */ > +struct kvm_vcpu_time_info { > + uint32_t version; > + uint32_t pad0; > + uint64_t tsc_timestamp; > + uint64_t system_time; > + uint32_t tsc_to_system_mul; > + int8_t tsc_shift; > +}; /* 32 bytes */ > + > +struct wall_clock { > + uint32_t wc_version; > + uint32_t wc_sec; > + uint32_t wc_nsec; > +}; > + > + > +extern void kvmclock_init(void); > + > + > /* This instruction is vmcall. On non-VT architectures, it will generate a > * trap that we will then rewrite to the appropriate instruction. > */ > diff --git a/include/linux/kvm.h b/include/linux/kvm.h > index 4de4fd2..78ce53f 100644 > --- a/include/linux/kvm.h > +++ b/include/linux/kvm.h > @@ -232,6 +232,7 @@ struct kvm_vapic_addr { > #define KVM_CAP_SET_TSS_ADDR 4 > #define KVM_CAP_EXT_CPUID 5 > #define KVM_CAP_VAPIC 6 > +#define KVM_CAP_CLOCKSOURCE 7 > > /* > * ioctls for VM fds > ------------------------------------------------------------------------- This SF.net email is sponsored by: Microsoft Defy all challenges. Microsoft(R) Visual Studio 2008. http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/ ^ permalink raw reply [flat|nested] 13+ messages in thread
[parent not found: <478E544C.4020603-rdkfGonbjUSkNkDKm+mE6A@public.gmane.org>]
* Re: [PATCH 1/2] kvmclock - the host part. [not found] ` <478E544C.4020603-rdkfGonbjUSkNkDKm+mE6A@public.gmane.org> @ 2008-01-17 0:18 ` Glauber de Oliveira Costa 0 siblings, 0 replies; 13+ messages in thread From: Glauber de Oliveira Costa @ 2008-01-17 0:18 UTC (permalink / raw) To: Anthony Liguori Cc: kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f, jeremy-TSDbQ3PG+2Y, avi-atKUWr5tajBWk0Htik3J/w Anthony Liguori wrote: > Glauber de Oliveira Costa wrote: >> This is the host part of kvm clocksource implementation. As it does >> not include clockevents, it is a fairly simple implementation. We >> only have to register a per-vcpu area, and start writting to it >> periodically. >> >> The area is binary compatible with xen, as we use the same shadow_info >> structure. >> >> Signed-off-by: Glauber de Oliveira Costa <gcosta-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> >> --- >> arch/x86/kvm/x86.c | 98 >> +++++++++++++++++++++++++++++++++++++++++++- >> include/asm-x86/kvm_host.h | 6 +++ >> include/asm-x86/kvm_para.h | 24 +++++++++++ >> include/linux/kvm.h | 1 + >> 4 files changed, 128 insertions(+), 1 deletions(-) >> >> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c >> index 8a90403..fd69aa1 100644 >> --- a/arch/x86/kvm/x86.c >> +++ b/arch/x86/kvm/x86.c >> @@ -19,6 +19,7 @@ >> #include "irq.h" >> #include "mmu.h" >> >> +#include <linux/clocksource.h> >> #include <linux/kvm.h> >> #include <linux/fs.h> >> #include <linux/vmalloc.h> >> @@ -412,7 +413,7 @@ static u32 msrs_to_save[] = { >> #ifdef CONFIG_X86_64 >> MSR_CSTAR, MSR_KERNEL_GS_BASE, MSR_SYSCALL_MASK, MSR_LSTAR, >> #endif >> - MSR_IA32_TIME_STAMP_COUNTER, >> + MSR_IA32_TIME_STAMP_COUNTER, MSR_KVM_SYSTEM_TIME, >> }; >> >> static unsigned num_msrs_to_save; >> @@ -467,6 +468,73 @@ static int do_set_msr(struct kvm_vcpu *vcpu, >> unsigned index, u64 *data) >> return kvm_set_msr(vcpu, index, *data); >> } >> >> +static void kvm_write_wall_clock(struct kvm_vcpu *v, gpa_t wall_clock) >> +{ >> + int version = 1; >> + struct wall_clock wc; >> + unsigned long flags; >> + struct timespec wc_ts; >> + >> + local_irq_save(flags); >> + kvm_get_msr(v, MSR_IA32_TIME_STAMP_COUNTER, >> + &v->arch.hv_clock.tsc_timestamp); >> + wc_ts = current_kernel_time(); >> + local_irq_restore(flags); >> + >> + down_write(¤t->mm->mmap_sem); >> + kvm_write_guest(v->kvm, wall_clock, &version, sizeof(version)); >> + up_write(¤t->mm->mmap_sem); >> + >> + /* With all the info we got, fill in the values */ >> + wc.wc_sec = wc_ts.tv_sec; >> + wc.wc_nsec = wc_ts.tv_nsec; >> + wc.wc_version = ++version; >> + >> + down_write(¤t->mm->mmap_sem); >> + kvm_write_guest(v->kvm, wall_clock, &wc, sizeof(wc)); >> + up_write(¤t->mm->mmap_sem); >> > > Can we get a comment explaining why we only write the version field and > then immediately increment the version and write the whole struct? It's > not at all obvious why the first write is needed to me. If the comment is the only pending thing, can we add the comment in a later commit? >> +} >> +static void kvm_write_guest_time(struct kvm_vcpu *v) >> +{ >> + struct timespec ts; >> + unsigned long flags; >> + struct kvm_vcpu_arch *vcpu = &v->arch; >> + void *shared_kaddr; >> + >> + if ((!vcpu->time_page)) >> + return; >> + >> + /* Keep irq disabled to prevent changes to the clock */ >> + local_irq_save(flags); >> + kvm_get_msr(v, MSR_IA32_TIME_STAMP_COUNTER, >> + &vcpu->hv_clock.tsc_timestamp); >> + ktime_get_ts(&ts); >> + local_irq_restore(flags); >> + >> + /* With all the info we got, fill in the values */ >> + >> + vcpu->hv_clock.system_time = ts.tv_nsec + >> + (NSEC_PER_SEC * (u64)ts.tv_sec); >> + /* >> + * The interface expects us to write an even number signaling >> that the >> + * update is finished. Since the guest won't see the intermediate >> states, >> + * we just write "2" at the end >> + */ >> + vcpu->hv_clock.version = 2; >> + >> + preempt_disable(); >> + >> + shared_kaddr = kmap_atomic(vcpu->time_page, KM_USER0); >> + >> + memcpy(shared_kaddr + vcpu->time_offset, &vcpu->hv_clock, >> + sizeof(vcpu->hv_clock)); >> + >> + kunmap_atomic(shared_kaddr, KM_USER0); >> > > Instead of doing a kmap/memcpy, I think it would be better to store the > GPA of the time page and do a kvm_write_guest(). Otherwise, you're > pinning this page in memory. this functions end up being called from various contexts. Some with the mmap_sem held, some uncontended. kvm_write_guest needs it held, so it would turn the code into a big spaguetti. Using the kmap was avi's suggestion to get around it, which I personally liked: we only grab the semaphore when the msr is registered. ------------------------------------------------------------------------- This SF.net email is sponsored by: Microsoft Defy all challenges. Microsoft(R) Visual Studio 2008. http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/ ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 1/2] kvmclock - the host part. [not found] ` <12005010761561-git-send-email-gcosta-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> 2008-01-16 19:00 ` Anthony Liguori @ 2008-01-17 7:59 ` Gerd Hoffmann 2008-01-20 15:37 ` Avi Kivity 2008-01-20 15:38 ` Avi Kivity 3 siblings, 0 replies; 13+ messages in thread From: Gerd Hoffmann @ 2008-01-17 7:59 UTC (permalink / raw) To: Glauber de Oliveira Costa Cc: kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f, jeremy-TSDbQ3PG+2Y, avi-atKUWr5tajBWk0Htik3J/w Glauber de Oliveira Costa wrote: > This is the host part of kvm clocksource implementation. As it does > not include clockevents, it is a fairly simple implementation. We > only have to register a per-vcpu area, and start writting to it periodically. > > The area is binary compatible with xen, as we use the same shadow_info structure. comment needs an update too ;) > --- a/arch/x86/kvm/x86.c > +++ b/arch/x86/kvm/x86.c > - MSR_IA32_TIME_STAMP_COUNTER, > + MSR_IA32_TIME_STAMP_COUNTER, MSR_KVM_SYSTEM_TIME, + MSR_KVM_WALL_CLOCK Looks good otherwise. cheers, Gerd ------------------------------------------------------------------------- This SF.net email is sponsored by: Microsoft Defy all challenges. Microsoft(R) Visual Studio 2008. http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/ ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 1/2] kvmclock - the host part. [not found] ` <12005010761561-git-send-email-gcosta-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> 2008-01-16 19:00 ` Anthony Liguori 2008-01-17 7:59 ` Gerd Hoffmann @ 2008-01-20 15:37 ` Avi Kivity 2008-01-20 15:38 ` Avi Kivity 3 siblings, 0 replies; 13+ messages in thread From: Avi Kivity @ 2008-01-20 15:37 UTC (permalink / raw) To: Glauber de Oliveira Costa Cc: kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f, jeremy-TSDbQ3PG+2Y Glauber de Oliveira Costa wrote: > This is the host part of kvm clocksource implementation. As it does > not include clockevents, it is a fairly simple implementation. We > only have to register a per-vcpu area, and start writting to it periodically. > > The area is binary compatible with xen, as we use the same shadow_info structure. > > Signed-off-by: Glauber de Oliveira Costa <gcosta-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> > --- > arch/x86/kvm/x86.c | 98 +++++++++++++++++++++++++++++++++++++++++++- > include/asm-x86/kvm_host.h | 6 +++ > include/asm-x86/kvm_para.h | 24 +++++++++++ > include/linux/kvm.h | 1 + > 4 files changed, 128 insertions(+), 1 deletions(-) > > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c > index 8a90403..fd69aa1 100644 > --- a/arch/x86/kvm/x86.c > +++ b/arch/x86/kvm/x86.c > @@ -19,6 +19,7 @@ > #include "irq.h" > #include "mmu.h" > > +#include <linux/clocksource.h> > #include <linux/kvm.h> > #include <linux/fs.h> > #include <linux/vmalloc.h> > @@ -412,7 +413,7 @@ static u32 msrs_to_save[] = { > #ifdef CONFIG_X86_64 > MSR_CSTAR, MSR_KERNEL_GS_BASE, MSR_SYSCALL_MASK, MSR_LSTAR, > #endif > - MSR_IA32_TIME_STAMP_COUNTER, > + MSR_IA32_TIME_STAMP_COUNTER, MSR_KVM_SYSTEM_TIME, > Missing second MSR here. > > +static void kvm_write_wall_clock(struct kvm_vcpu *v, gpa_t wall_clock) > +{ > + int version = 1; > + struct wall_clock wc; > + unsigned long flags; > + struct timespec wc_ts; > + > + local_irq_save(flags); > + kvm_get_msr(v, MSR_IA32_TIME_STAMP_COUNTER, > + &v->arch.hv_clock.tsc_timestamp); > + wc_ts = current_kernel_time(); > + local_irq_restore(flags); > + > + down_write(¤t->mm->mmap_sem); > + kvm_write_guest(v->kvm, wall_clock, &version, sizeof(version)); > + up_write(¤t->mm->mmap_sem); > + > + /* With all the info we got, fill in the values */ > + wc.wc_sec = wc_ts.tv_sec; > + wc.wc_nsec = wc_ts.tv_nsec; > + wc.wc_version = ++version; > + > + down_write(¤t->mm->mmap_sem); > + kvm_write_guest(v->kvm, wall_clock, &wc, sizeof(wc)); > + up_write(¤t->mm->mmap_sem); > +} > missing blank line > +static void kvm_write_guest_time(struct kvm_vcpu *v) > +{ > + struct timespec ts; > + unsigned long flags; > + struct kvm_vcpu_arch *vcpu = &v->arch; > + void *shared_kaddr; > + > + if ((!vcpu->time_page)) > + return; > + > + /* Keep irq disabled to prevent changes to the clock */ > + local_irq_save(flags); > + kvm_get_msr(v, MSR_IA32_TIME_STAMP_COUNTER, > + &vcpu->hv_clock.tsc_timestamp); > + ktime_get_ts(&ts); > + local_irq_restore(flags); > + > + /* With all the info we got, fill in the values */ > + > + vcpu->hv_clock.system_time = ts.tv_nsec + > + (NSEC_PER_SEC * (u64)ts.tv_sec); > + /* > + * The interface expects us to write an even number signaling that the > + * update is finished. Since the guest won't see the intermediate states, > + * we just write "2" at the end > + */ > + vcpu->hv_clock.version = 2; > + > + preempt_disable(); > vcpu_load() is not preemptible anyway (and cannot be). > + > + shared_kaddr = kmap_atomic(vcpu->time_page, KM_USER0); > + > + memcpy(shared_kaddr + vcpu->time_offset, &vcpu->hv_clock, > + sizeof(vcpu->hv_clock)); > + > + kunmap_atomic(shared_kaddr, KM_USER0); > + preempt_enable(); > + > + mark_page_dirty(v->kvm, vcpu->time >> PAGE_SHIFT); > +} > + > > int kvm_set_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 data) > { > @@ -494,6 +562,25 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 data) > case MSR_IA32_MISC_ENABLE: > vcpu->arch.ia32_misc_enable_msr = data; > break; > + case MSR_KVM_WALL_CLOCK: > + vcpu->arch.wall_clock = data; > This better be a global msr (in kvm->, not vcpu->). There is precedent; some msrs in hyperthreading-capable processors are global to a core. We need some notification to update wall clock as a result of time-of-day step changes, due to ntp or the user setting the date. Hopefully such a notifier exists. -- error compiling committee.c: too many arguments to function ------------------------------------------------------------------------- This SF.net email is sponsored by: Microsoft Defy all challenges. Microsoft(R) Visual Studio 2008. http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/ ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 1/2] kvmclock - the host part. [not found] ` <12005010761561-git-send-email-gcosta-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> ` (2 preceding siblings ...) 2008-01-20 15:37 ` Avi Kivity @ 2008-01-20 15:38 ` Avi Kivity 3 siblings, 0 replies; 13+ messages in thread From: Avi Kivity @ 2008-01-20 15:38 UTC (permalink / raw) To: Glauber de Oliveira Costa Cc: kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f, jeremy-TSDbQ3PG+2Y Glauber de Oliveira Costa wrote: > + > +struct wall_clock { > + uint32_t wc_version; > + uint32_t wc_sec; > + uint32_t wc_nsec; > +}; > + > kvm_wall_clock? -- error compiling committee.c: too many arguments to function ------------------------------------------------------------------------- This SF.net email is sponsored by: Microsoft Defy all challenges. Microsoft(R) Visual Studio 2008. http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/ ^ permalink raw reply [flat|nested] 13+ messages in thread
end of thread, other threads:[~2008-02-18 14:06 UTC | newest]
Thread overview: 13+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-02-11 18:07 [PATCH 0/2] kvm clock - merge last comments Glauber de Oliveira Costa
2008-02-11 18:07 ` [PATCH 1/2] kvmclock - the host part Glauber de Oliveira Costa
2008-02-11 18:07 ` [PATCH 2/2] kvmclock implementation, the guest part Glauber de Oliveira Costa
2008-02-13 10:49 ` [PATCH 1/2] kvmclock - the host part Avi Kivity
2008-02-13 21:45 ` Glauber Costa
2008-02-14 9:21 ` Avi Kivity
2008-02-18 14:06 ` Marcelo Tosatti
-- strict thread matches above, loose matches on Subject: below --
2008-01-16 16:31 [PATCH 0/2] kvm clock - xen compatible by accident Glauber de Oliveira Costa
[not found] ` <1200501067774-git-send-email-gcosta-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2008-01-16 16:31 ` [PATCH 1/2] kvmclock - the host part Glauber de Oliveira Costa
[not found] ` <12005010761561-git-send-email-gcosta-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2008-01-16 19:00 ` Anthony Liguori
[not found] ` <478E544C.4020603-rdkfGonbjUSkNkDKm+mE6A@public.gmane.org>
2008-01-17 0:18 ` Glauber de Oliveira Costa
2008-01-17 7:59 ` Gerd Hoffmann
2008-01-20 15:37 ` Avi Kivity
2008-01-20 15:38 ` Avi Kivity
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox