* [PATCH] KVM:x86: avoid VMCS access from non-vCPU context
@ 2016-10-31 17:12 Roman Kagan
  2016-10-31 23:31 ` Paolo Bonzini
  0 siblings, 1 reply; 4+ messages in thread
From: Roman Kagan @ 2016-10-31 17:12 UTC (permalink / raw)
  To: kvm; +Cc: Radim Krčmář, Denis V. Lunev, Roman Kagan, Paolo Bonzini

In commit 108b249c453dd7132599ab6dc7e435a7036c193f, "KVM: x86: introduce
get_kvmclock_ns", a function to obtain the time as it would be seen by the
guest via kvmclock was introduced and used throughout the code.

The problem is that it reads the guest TSC value via kvm_read_l1_tsc,
which can only be called in vCPU context because it reads VMCS(TSC_OFFSET)
directly (on Intel CPUs). Therefore, when called in kvm_arch_vm_ioctl for
KVM_[GS]ET_CLOCK, it returns bogus values, breaking save/restore.

Make this function take a vCPU pointer instead and use it only in vCPU
context (also adjust Hyper-V's get_time_ref_counter similarly); for
ioctl(KVM_[GS]ET_CLOCK), manipulate kvmclock_offset directly, under the
assumption that the user of those ioctls either is prepared to deal with
time warps or calls them only when the vCPUs aren't running (i.e. during
save/restore).
Fixes: 108b249c453dd7132599ab6dc7e435a7036c193f
Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Roman Kagan <rkagan@virtuozzo.com>
---
 arch/x86/kvm/hyperv.c | 16 +++++++---------
 arch/x86/kvm/x86.c    | 21 ++++++++-------------
 arch/x86/kvm/x86.h    |  2 +-
 3 files changed, 16 insertions(+), 23 deletions(-)

diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
index 42b1c83..7508a32 100644
--- a/arch/x86/kvm/hyperv.c
+++ b/arch/x86/kvm/hyperv.c
@@ -384,10 +384,9 @@ static void synic_init(struct kvm_vcpu_hv_synic *synic)
 	}
 }
 
-static u64 get_time_ref_counter(struct kvm *kvm)
+static u64 get_time_ref_counter(struct kvm_vcpu *vcpu)
 {
-	struct kvm_hv *hv = &kvm->arch.hyperv;
-	struct kvm_vcpu *vcpu;
+	struct kvm_hv *hv = &vcpu->kvm->arch.hyperv;
 	u64 tsc;
 
 	/*
@@ -395,9 +394,8 @@ static u64 get_time_ref_counter(struct kvm *kvm)
 	 * stable, fall back to get_kvmclock_ns.
 	 */
 	if (!hv->tsc_ref.tsc_sequence)
-		return div_u64(get_kvmclock_ns(kvm), 100);
+		return div_u64(get_kvmclock_ns(vcpu), 100);
 
-	vcpu = kvm_get_vcpu(kvm, 0);
 	tsc = kvm_read_l1_tsc(vcpu, rdtsc());
 	return mul_u64_u64_shr(tsc, hv->tsc_ref.tsc_scale, 64) +
 		hv->tsc_ref.tsc_offset;
@@ -451,7 +449,7 @@ static int stimer_start(struct kvm_vcpu_hv_stimer *stimer)
 	u64 time_now;
 	ktime_t ktime_now;
 
-	time_now = get_time_ref_counter(stimer_to_vcpu(stimer)->kvm);
+	time_now = get_time_ref_counter(stimer_to_vcpu(stimer));
 	ktime_now = ktime_get();
 
 	if (stimer->config & HV_STIMER_PERIODIC) {
@@ -591,7 +589,7 @@ static int stimer_send_msg(struct kvm_vcpu_hv_stimer *stimer)
 		(struct hv_timer_message_payload *)&msg->u.payload;
 
 	payload->expiration_time = stimer->exp_time;
-	payload->delivery_time = get_time_ref_counter(vcpu->kvm);
+	payload->delivery_time = get_time_ref_counter(vcpu);
 	return synic_deliver_msg(vcpu_to_synic(vcpu),
 				 HV_STIMER_SINT(stimer->config), msg);
 }
@@ -626,7 +624,7 @@ void kvm_hv_process_stimers(struct kvm_vcpu *vcpu)
 				if (exp_time) {
 					time_now =
-						get_time_ref_counter(vcpu->kvm);
+						get_time_ref_counter(vcpu);
 
 					if (time_now >= exp_time)
 						stimer_expiration(stimer);
 				}
@@ -1051,7 +1049,7 @@ static int kvm_hv_get_msr_pw(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata)
 		data = hv->hv_hypercall;
 		break;
 	case HV_X64_MSR_TIME_REF_COUNT:
-		data = get_time_ref_counter(kvm);
+		data = get_time_ref_counter(vcpu);
 		break;
 	case HV_X64_MSR_REFERENCE_TSC:
 		data = hv->hv_tsc_page;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index e375235..650ed5f 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1722,10 +1722,9 @@ static void kvm_gen_update_masterclock(struct kvm *kvm)
 #endif
 }
 
-static u64 __get_kvmclock_ns(struct kvm *kvm)
+static u64 __get_kvmclock_ns(struct kvm_vcpu *vcpu)
 {
-	struct kvm_vcpu *vcpu = kvm_get_vcpu(kvm, 0);
-	struct kvm_arch *ka = &kvm->arch;
+	struct kvm_arch *ka = &vcpu->kvm->arch;
 	s64 ns;
 
 	if (vcpu->arch.hv_clock.flags & PVCLOCK_TSC_STABLE_BIT) {
@@ -1738,13 +1737,13 @@ static u64 __get_kvmclock_ns(struct kvm *kvm)
 	return ns;
 }
 
-u64 get_kvmclock_ns(struct kvm *kvm)
+u64 get_kvmclock_ns(struct kvm_vcpu *vcpu)
 {
 	unsigned long flags;
 	s64 ns;
 
 	local_irq_save(flags);
-	ns = __get_kvmclock_ns(kvm);
+	ns = __get_kvmclock_ns(vcpu);
 	local_irq_restore(flags);
 
 	return ns;
@@ -4081,7 +4080,6 @@ long kvm_arch_vm_ioctl(struct file *filp,
 	}
 	case KVM_SET_CLOCK: {
 		struct kvm_clock_data user_ns;
-		u64 now_ns;
 
 		r = -EFAULT;
 		if (copy_from_user(&user_ns, argp, sizeof(user_ns)))
@@ -4092,19 +4090,16 @@ long kvm_arch_vm_ioctl(struct file *filp,
 			goto out;
 
 		r = 0;
-		local_irq_disable();
-		now_ns = __get_kvmclock_ns(kvm);
-		kvm->arch.kvmclock_offset += user_ns.clock - now_ns;
-		local_irq_enable();
+		kvm->arch.kvmclock_offset = user_ns.clock -
+			ktime_get_boot_ns();
 		kvm_gen_update_masterclock(kvm);
 		break;
 	}
 	case KVM_GET_CLOCK: {
 		struct kvm_clock_data user_ns;
-		u64 now_ns;
 
-		now_ns = get_kvmclock_ns(kvm);
-		user_ns.clock = now_ns;
+		user_ns.clock = ktime_get_boot_ns() +
+			kvm->arch.kvmclock_offset;
 		user_ns.flags = 0;
 		memset(&user_ns.pad, 0, sizeof(user_ns.pad));
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index e8ff3e4..002bb85 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -159,7 +159,7 @@ void kvm_set_pending_timer(struct kvm_vcpu *vcpu);
 int kvm_inject_realmode_interrupt(struct kvm_vcpu *vcpu, int irq, int inc_eip);
 
 void kvm_write_tsc(struct kvm_vcpu *vcpu, struct msr_data *msr);
-u64 get_kvmclock_ns(struct kvm *kvm);
+u64 get_kvmclock_ns(struct kvm_vcpu *vcpu);
 
 int kvm_read_guest_virt(struct x86_emulate_ctxt *ctxt,
 	gva_t addr, void *val, unsigned int bytes,
-- 
2.7.4
* Re: [PATCH] KVM:x86: avoid VMCS access from non-vCPU context
  2016-10-31 17:12 [PATCH] KVM:x86: avoid VMCS access from non-vCPU context Roman Kagan
@ 2016-10-31 23:31 ` Paolo Bonzini
  2016-11-03 14:17   ` Roman Kagan
  0 siblings, 1 reply; 4+ messages in thread
From: Paolo Bonzini @ 2016-10-31 23:31 UTC (permalink / raw)
  To: Roman Kagan
  Cc: kvm, Radim Krčmář, Denis V. Lunev, Luiz Capitulino

> In commit 108b249c453dd7132599ab6dc7e435a7036c193f, "KVM: x86: introduce
> get_kvmclock_ns", a function to obtain the time as would be done by the
> guest via kvmclock was introduced and used throughout the code.
>
> The problem is that it reads the guest TSC value via kvm_read_l1_tsc
> which can only be called in vCPU context as it reads VMCS(TSC_OFFSET)
> directly (on Intel CPUs). Therefore, when called in kvm_arch_vm_ioctl
> for KVM_[GS]ET_CLOCK, it returns bogus values, breaking save/restore.

The patch isn't good unfortunately, because only __get_kvmclock_ns can
return the pvclock value. However, after Luiz's commit a545ab6a0085
("kvm: x86: add tsc_offset field to struct kvm_vcpu_arch", 2016-09-07)
the TSC offset is cached outside the VMCS/VMCB, and all the complexity
in vmx.c and (to a lesser extent) svm.c can be dropped. See the
following untested patch...
Paolo

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 4b20f7304b9c..bdde80731f49 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -948,7 +948,6 @@ struct kvm_x86_ops {
 	int (*get_lpage_level)(void);
 	bool (*rdtscp_supported)(void);
 	bool (*invpcid_supported)(void);
-	void (*adjust_tsc_offset_guest)(struct kvm_vcpu *vcpu, s64 adjustment);
 
 	void (*set_tdp_cr3)(struct kvm_vcpu *vcpu, unsigned long cr3);
 
@@ -958,8 +957,6 @@ struct kvm_x86_ops {
 
 	void (*write_tsc_offset)(struct kvm_vcpu *vcpu, u64 offset);
 
-	u64 (*read_l1_tsc)(struct kvm_vcpu *vcpu, u64 host_tsc);
-
 	void (*get_exit_info)(struct kvm_vcpu *vcpu, u64 *info1, u64 *info2);
 
 	int (*check_intercept)(struct kvm_vcpu *vcpu,
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index f8157a36ab09..8ca1eca5038d 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -1138,21 +1138,6 @@ static void svm_write_tsc_offset(struct kvm_vcpu *vcpu, u64 offset)
 	mark_dirty(svm->vmcb, VMCB_INTERCEPTS);
 }
 
-static void svm_adjust_tsc_offset_guest(struct kvm_vcpu *vcpu, s64 adjustment)
-{
-	struct vcpu_svm *svm = to_svm(vcpu);
-
-	svm->vmcb->control.tsc_offset += adjustment;
-	if (is_guest_mode(vcpu))
-		svm->nested.hsave->control.tsc_offset += adjustment;
-	else
-		trace_kvm_write_tsc_offset(vcpu->vcpu_id,
-				svm->vmcb->control.tsc_offset - adjustment,
-				svm->vmcb->control.tsc_offset);
-
-	mark_dirty(svm->vmcb, VMCB_INTERCEPTS);
-}
-
 static void avic_init_vmcb(struct vcpu_svm *svm)
 {
 	struct vmcb *vmcb = svm->vmcb;
@@ -3449,12 +3434,6 @@ static int cr8_write_interception(struct vcpu_svm *svm)
 	return 0;
 }
 
-static u64 svm_read_l1_tsc(struct kvm_vcpu *vcpu, u64 host_tsc)
-{
-	struct vmcb *vmcb = get_host_vmcb(to_svm(vcpu));
-	return vmcb->control.tsc_offset + host_tsc;
-}
-
 static int svm_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
@@ -5422,8 +5401,6 @@ static inline void avic_post_state_restore(struct kvm_vcpu *vcpu)
 	.has_wbinvd_exit = svm_has_wbinvd_exit,
 
 	.write_tsc_offset = svm_write_tsc_offset,
-	.adjust_tsc_offset_guest = svm_adjust_tsc_offset_guest,
-	.read_l1_tsc = svm_read_l1_tsc,
 
 	.set_tdp_cr3 = set_tdp_cr3,
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 74a4df993a51..754c3a7f444a 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -421,7 +421,6 @@ struct nested_vmx {
 	/* vmcs02_list cache of VMCSs recently used to run L2 guests */
 	struct list_head vmcs02_pool;
 	int vmcs02_num;
-	u64 vmcs01_tsc_offset;
 	bool change_vmcs01_virtual_x2apic_mode;
 	/* L2 must run next, and mustn't decide to exit to L1. */
 	bool nested_run_pending;
@@ -2605,20 +2604,6 @@ static u64 guest_read_tsc(struct kvm_vcpu *vcpu)
 }
 
 /*
- * Like guest_read_tsc, but always returns L1's notion of the timestamp
- * counter, even if a nested guest (L2) is currently running.
- */
-static u64 vmx_read_l1_tsc(struct kvm_vcpu *vcpu, u64 host_tsc)
-{
-	u64 tsc_offset;
-
-	tsc_offset = is_guest_mode(vcpu) ?
-		to_vmx(vcpu)->nested.vmcs01_tsc_offset :
-		vmcs_read64(TSC_OFFSET);
-	return host_tsc + tsc_offset;
-}
-
-/*
  * writes 'offset' into guest's timestamp counter offset register
  */
static void vmx_write_tsc_offset(struct kvm_vcpu *vcpu, u64 offset)
@@ -2631,7 +2616,6 @@ static void vmx_write_tsc_offset(struct kvm_vcpu *vcpu, u64 offset)
 		 * to the newly set TSC to get L2's TSC.
 		 */
 		struct vmcs12 *vmcs12;
-		to_vmx(vcpu)->nested.vmcs01_tsc_offset = offset;
 		/* recalculate vmcs02.TSC_OFFSET: */
 		vmcs12 = get_vmcs12(vcpu);
 		vmcs_write64(TSC_OFFSET, offset +
@@ -2644,19 +2628,6 @@ static void vmx_write_tsc_offset(struct kvm_vcpu *vcpu, u64 offset)
 	}
 }
 
-static void vmx_adjust_tsc_offset_guest(struct kvm_vcpu *vcpu, s64 adjustment)
-{
-	u64 offset = vmcs_read64(TSC_OFFSET);
-
-	vmcs_write64(TSC_OFFSET, offset + adjustment);
-	if (is_guest_mode(vcpu)) {
-		/* Even when running L2, the adjustment needs to apply to L1 */
-		to_vmx(vcpu)->nested.vmcs01_tsc_offset += adjustment;
-	} else
-		trace_kvm_write_tsc_offset(vcpu->vcpu_id, offset,
-					   offset + adjustment);
-}
-
 static bool guest_cpuid_has_vmx(struct kvm_vcpu *vcpu)
 {
 	struct kvm_cpuid_entry2 *best = kvm_find_cpuid_entry(vcpu, 1, 0);
@@ -10061,9 +10032,9 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12)
 	if (vmcs12->cpu_based_vm_exec_control & CPU_BASED_USE_TSC_OFFSETING)
 		vmcs_write64(TSC_OFFSET,
-			vmx->nested.vmcs01_tsc_offset + vmcs12->tsc_offset);
+			vcpu->arch.tsc_offset + vmcs12->tsc_offset);
 	else
-		vmcs_write64(TSC_OFFSET, vmx->nested.vmcs01_tsc_offset);
+		vmcs_write64(TSC_OFFSET, vcpu->arch.tsc_offset);
 
 	if (kvm_has_tsc_control)
 		decache_tsc_multiplier(vmx);
@@ -10293,8 +10264,6 @@ static int nested_vmx_run(struct kvm_vcpu *vcpu, bool launch)
 
 	enter_guest_mode(vcpu);
 
-	vmx->nested.vmcs01_tsc_offset = vmcs_read64(TSC_OFFSET);
-
 	if (!(vmcs12->vm_entry_controls & VM_ENTRY_LOAD_DEBUG_CONTROLS))
 		vmx->nested.vmcs01_debugctl = vmcs_read64(GUEST_IA32_DEBUGCTL);
@@ -10818,7 +10787,7 @@ static void nested_vmx_vmexit(struct kvm_vcpu *vcpu, u32 exit_reason,
 	load_vmcs12_host_state(vcpu, vmcs12);
 
 	/* Update any VMCS fields that might have changed while L2 ran */
-	vmcs_write64(TSC_OFFSET, vmx->nested.vmcs01_tsc_offset);
+	vmcs_write64(TSC_OFFSET, vcpu->arch.tsc_offset);
 	if (vmx->hv_deadline_tsc == -1)
 		vmcs_clear_bits(PIN_BASED_VM_EXEC_CONTROL,
 				PIN_BASED_VMX_PREEMPTION_TIMER);
@@ -11339,8 +11308,6 @@ static void vmx_setup_mce(struct kvm_vcpu *vcpu)
 	.has_wbinvd_exit = cpu_has_vmx_wbinvd_exit,
 
 	.write_tsc_offset = vmx_write_tsc_offset,
-	.adjust_tsc_offset_guest = vmx_adjust_tsc_offset_guest,
-	.read_l1_tsc = vmx_read_l1_tsc,
 
 	.set_tdp_cr3 = vmx_set_cr3,
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index e954be8a3185..3017de0431bd 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1409,7 +1409,7 @@ static u64 kvm_compute_tsc_offset(struct kvm_vcpu *vcpu, u64 target_tsc)
 
 u64 kvm_read_l1_tsc(struct kvm_vcpu *vcpu, u64 host_tsc)
 {
-	return kvm_x86_ops->read_l1_tsc(vcpu, kvm_scale_tsc(vcpu, host_tsc));
+	return vcpu->arch.tsc_offset + kvm_scale_tsc(vcpu, host_tsc);
 }
 EXPORT_SYMBOL_GPL(kvm_read_l1_tsc);
 
@@ -1547,7 +1547,7 @@ void kvm_write_tsc(struct kvm_vcpu *vcpu, struct msr_data *msr)
 
 static inline void adjust_tsc_offset_guest(struct kvm_vcpu *vcpu, s64 adjustment)
 {
-	kvm_x86_ops->adjust_tsc_offset_guest(vcpu, adjustment);
+	kvm_vcpu_write_tsc_offset(vcpu, vcpu->arch.tsc_offset + adjustment);
 }
 
 static inline void adjust_tsc_offset_host(struct kvm_vcpu *vcpu, s64 adjustment)
@@ -1555,7 +1555,7 @@ static inline void adjust_tsc_offset_host(struct kvm_vcpu *vcpu, s64 adjustment)
 	if (vcpu->arch.tsc_scaling_ratio != kvm_default_tsc_scaling_ratio)
 		WARN_ON(adjustment < 0);
 	adjustment = kvm_scale_tsc(vcpu, (u64) adjustment);
-	kvm_x86_ops->adjust_tsc_offset_guest(vcpu, adjustment);
+	adjust_tsc_offset_guest(vcpu, adjustment);
 }
 
 #ifdef CONFIG_X86_64
* Re: [PATCH] KVM:x86: avoid VMCS access from non-vCPU context
  2016-10-31 23:31 ` Paolo Bonzini
@ 2016-11-03 14:17   ` Roman Kagan
  2016-11-03 17:10     ` Paolo Bonzini
  0 siblings, 1 reply; 4+ messages in thread
From: Roman Kagan @ 2016-11-03 14:17 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: kvm, Radim Krčmář, Denis V. Lunev, Luiz Capitulino

On Mon, Oct 31, 2016 at 07:31:53PM -0400, Paolo Bonzini wrote:
>
> > In commit 108b249c453dd7132599ab6dc7e435a7036c193f, "KVM: x86: introduce
> > get_kvmclock_ns", a function to obtain the time as would be done by the
> > guest via kvmclock was introduced and used throughout the code.
> >
> > The problem is that it reads the guest TSC value via kvm_read_l1_tsc
> > which can only be called in vCPU context as it reads VMCS(TSC_OFFSET)
> > directly (on Intel CPUs). Therefore, when called in kvm_arch_vm_ioctl
> > for KVM_[GS]ET_CLOCK, it returns bogus values, breaking save/restore.
>
> The patch isn't good unfortunately, because only __get_kvmclock_ns
> can return the pvclock value.

Agreed.

> However, after Luiz's commit a545ab6a0085
> ("kvm: x86: add tsc_offset field to struct kvm_vcpu_arch", 2016-09-07)
> the TSC offset is cached outside the VMCS/VMCB

I was tempted to use it, but I saw that it wasn't updated in
adjust_tsc_offset_* and thought it had some different meaning.
Apparently that was just a bug, which you also fix in your patch by
making adjust_tsc_offset_guest go through kvm_vcpu_write_tsc_offset.

> and all the complexity in
> vmx.c and (to a lesser extent) svm.c can be dropped. See the following
> untested patch...

Consider it

Reviewed-by: Roman Kagan <rkagan@virtuozzo.com>
Tested-by: Roman Kagan <rkagan@virtuozzo.com>

Thanks,
Roman.
* Re: [PATCH] KVM:x86: avoid VMCS access from non-vCPU context
  2016-11-03 14:17 ` Roman Kagan
@ 2016-11-03 17:10   ` Paolo Bonzini
  0 siblings, 0 replies; 4+ messages in thread
From: Paolo Bonzini @ 2016-11-03 17:10 UTC (permalink / raw)
  To: Roman Kagan, kvm, Radim Krčmář, Denis V. Lunev, Luiz Capitulino

On 03/11/2016 15:17, Roman Kagan wrote:
> On Mon, Oct 31, 2016 at 07:31:53PM -0400, Paolo Bonzini wrote:
>>
>>> In commit 108b249c453dd7132599ab6dc7e435a7036c193f, "KVM: x86: introduce
>>> get_kvmclock_ns", a function to obtain the time as would be done by the
>>> guest via kvmclock was introduced and used throughout the code.
>>>
>>> The problem is that it reads the guest TSC value via kvm_read_l1_tsc
>>> which can only be called in vCPU context as it reads VMCS(TSC_OFFSET)
>>> directly (on Intel CPUs). Therefore, when called in kvm_arch_vm_ioctl
>>> for KVM_[GS]ET_CLOCK, it returns bogus values, breaking save/restore.
>>
>> The patch isn't good unfortunately, because only __get_kvmclock_ns
>> can return the pvclock value.
>
> Agreed.
>
>> However, after Luiz's commit a545ab6a0085
>> ("kvm: x86: add tsc_offset field to struct kvm_vcpu_arch", 2016-09-07)
>> the TSC offset is cached outside the VMCS/VMCB
>
> I was tempted to use it but saw it not updated in
> adjust_tsc_offset_* and thought it had some different meaning.
> Apparently that was just a bug which you also fix in your patch by
> making adjust_tsc_offset_guest go through kvm_vcpu_write_tsc_offset.
>
>> and all the complexity in
>> vmx.c and (to a lesser extent) svm.c can be dropped. See the following
>> untested patch...
>
> Consider it
>
> Reviewed-by: Roman Kagan <rkagan@virtuozzo.com>
> Tested-by: Roman Kagan <rkagan@virtuozzo.com>

Too bad, I've already sent out the pull request, so I couldn't include
the tags. But thanks very much for the careful review and for testing!

Paolo