From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id AEE3EC433EF for ; Wed, 11 May 2022 20:34:31 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1347821AbiEKUea (ORCPT ); Wed, 11 May 2022 16:34:30 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36312 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230383AbiEKUe1 (ORCPT ); Wed, 11 May 2022 16:34:27 -0400 Received: from mail-pg1-x549.google.com (mail-pg1-x549.google.com [IPv6:2607:f8b0:4864:20::549]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 20DFA98092 for ; Wed, 11 May 2022 13:34:26 -0700 (PDT) Received: by mail-pg1-x549.google.com with SMTP id bj12-20020a056a02018c00b003a9eebaad34so1526523pgb.10 for ; Wed, 11 May 2022 13:34:26 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=date:message-id:mime-version:subject:from:to:cc; bh=zE2XEkp4B9K0dDNGPYDUEidf+D6l4wjVA+kP1Xd1v40=; b=WwIurN4tWoXFU2hoMQSI8OmYw8RWtjZBli2InX37gKIT/LH8aIEfsA3I8JGf4HSS09 aGHV9O5NJXzpPZEdWxVwUpg2PHYHepiaSbDWzP5Nen1zYJIDxUSIwGIo0uxDxcNRcM5/ rpJ5K19OcHip7lC3oRSRYztNVM2VCUpSZjM6C4AhhRN8TqtXB7mi2Gf16d/1GqbxcOyt sjJvZXj5WAcYdhu7BfJCsqF/AaQv7AUlWOAEmOGS+17gDNkdZl4eSQ0c5DTuHWFh0G0c ynO5OW74D3x3cvyQocCbSIBVf7HMRoe+X04LAqqvt9Xiq8194oy/Mi2V7RnWLCRHqlaG Zr2Q== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:date:message-id:mime-version:subject:from:to:cc; bh=zE2XEkp4B9K0dDNGPYDUEidf+D6l4wjVA+kP1Xd1v40=; b=xA2JlQqRHMEFn9knUtfDiusFBayyO/ZuY2qLxcnZ7xAZEWFugvGzfAxC+/6T1TrzAw Jq5sEtp3+dyjIRHrYabvYJQHMh8K3isYk8dEGzvGEI3bZIud3Fq1RcMcOr34n4K9N9bs YhBITWB5AVSDntu7lY2hhqCPzFbFeh/42SMoibybn/cpJ3Yubn/d4ii/FeOiDQlL9Nyw XtIsvfLzHz6wOXe1tI3R4ZJEHNUW4695FFxlghytqCEIy/jjynEW9VRvifxbtf+IrS6r 007BDPDGA9Gd/Wwf9v8SP5e154mOW6mppEnCs9rqh23nMjZAET7cQUMMGzvdiBZX2mkD L3Iw== X-Gm-Message-State: AOAM532aEHGfCXuc42UJnJlMg1ICr5HNJWA/Yb4XEo2sWGPcyGsDx7eU JWpO74NNf/7maR8QlVPMNz/LxyynKJvL4MAn0IfUk10vdqoLNtDHBK1YwN7WDZ9ophioB+qEyCq pN2aDpoWrM9LhafYYBofBMrC2QUSXv2DYoyWVMvRhx4jcSBZTTU14mEJ6ZpEM37E= X-Google-Smtp-Source: ABdhPJymTcbl3i7fse56ivA+sAMIdpD/Lwy4o3VfjQSqthpwCY6CRY2uNW8jstivWOduQzRUDt10IWfSWMMFBA== X-Received: from romanton.c.googlers.com ([fda3:e722:ac3:cc00:7f:e700:c0a8:43ef]) (user=romanton job=sendgmr) by 2002:a17:902:6a83:b0:15f:28b9:ea6 with SMTP id n3-20020a1709026a8300b0015f28b90ea6mr8237614plk.156.1652301265442; Wed, 11 May 2022 13:34:25 -0700 (PDT) Date: Wed, 11 May 2022 20:29:33 +0000 Message-Id: <20220511202932.3266607-1-romanton@google.com> Mime-Version: 1.0 X-Mailer: git-send-email 2.36.0.550.gb090851708-goog Subject: [PATCH v5] KVM: x86: Use current rather than snapshotted TSC frequency if it is constant From: Anton Romanov To: kvm@vger.kernel.org, pbonzini@redhat.com Cc: seanjc@google.com, vkuznets@redhat.com, mlevitsk@redhat.com, Anton Romanov Content-Type: text/plain; charset="UTF-8" Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org Don't snapshot tsc_khz into per-cpu cpu_tsc_khz if the host TSC is constant, in which case the actual TSC frequency will never change and thus capturing TSC during initialization is unnecessary, KVM can simply use tsc_khz. This value is snapshotted from kvm_timer_init->kvmclock_cpu_online->tsc_khz_changed(NULL) On CPUs with constant TSC, but not a hardware-specified TSC frequency, snapshotting cpu_tsc_khz and using that to set a VM's target TSC frequency can lead to VM to think its TSC frequency is not what it actually is if refining the TSC completes after KVM snapshots tsc_khz. The actual frequency never changes, only the kernel's calculation of what that frequency is changes. Ideally, KVM would not be able to race with TSC refinement, or would have a hook into tsc_refine_calibration_work() to get an alert when refinement is complete. Avoiding the race altogether isn't practical as refinement takes a relative eternity; it's deliberately put on a work queue outside of the normal boot sequence to avoid unnecessarily delaying boot. Adding a hook is doable, but somewhat gross due to KVM's ability to be built as a module. And if the TSC is constant, which is likely the case for every VMX/SVM-capable CPU produced in the last decade, the race can be hit if and only if userspace is able to create a VM before TSC refinement completes; refinement is slow, but not that slow. For now, punt on a proper fix, as not taking a snapshot can help some uses cases and not taking a snapshot is arguably correct irrespective of the race with refinement. Signed-off-by: Anton Romanov --- v5: * do not setup kvmclock_cpu_online at all with X86_FEATURE_CONSTANT_TSC arch/x86/kvm/x86.c | 44 +++++++++++++++++++++++++++++++++----------- 1 file changed, 33 insertions(+), 11 deletions(-) diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 4790f0d7d40b..9295e2af8e81 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -2907,6 +2907,22 @@ static void kvm_update_masterclock(struct kvm *kvm) kvm_end_pvclock_update(kvm); } +/* + * Use the kernel's tsc_khz directly if the TSC is constant, otherwise use KVM's + * per-CPU value (which may be zero if a CPU is going offline). Note, tsc_khz + * can change during boot even if the TSC is constant, as it's possible for KVM + * to be loaded before TSC calibration completes. Ideally, KVM would get a + * notification when calibration completes, but practically speaking calibration + * will complete before userspace is alive enough to create VMs. + */ +static unsigned long get_cpu_tsc_khz(void) +{ + if (static_cpu_has(X86_FEATURE_CONSTANT_TSC)) + return tsc_khz; + else + return __this_cpu_read(cpu_tsc_khz); +} + /* Called within read_seqcount_begin/retry for kvm->pvclock_sc. */ static void __get_kvmclock(struct kvm *kvm, struct kvm_clock_data *data) { @@ -2917,7 +2933,8 @@ static void __get_kvmclock(struct kvm *kvm, struct kvm_clock_data *data) get_cpu(); data->flags = 0; - if (ka->use_master_clock && __this_cpu_read(cpu_tsc_khz)) { + if (ka->use_master_clock && + (static_cpu_has(X86_FEATURE_CONSTANT_TSC) || __this_cpu_read(cpu_tsc_khz))) { #ifdef CONFIG_X86_64 struct timespec64 ts; @@ -2931,7 +2948,7 @@ static void __get_kvmclock(struct kvm *kvm, struct kvm_clock_data *data) data->flags |= KVM_CLOCK_TSC_STABLE; hv_clock.tsc_timestamp = ka->master_cycle_now; hv_clock.system_time = ka->master_kernel_ns + ka->kvmclock_offset; - kvm_get_time_scale(NSEC_PER_SEC, __this_cpu_read(cpu_tsc_khz) * 1000LL, + kvm_get_time_scale(NSEC_PER_SEC, get_cpu_tsc_khz() * 1000LL, &hv_clock.tsc_shift, &hv_clock.tsc_to_system_mul); data->clock = __pvclock_read_cycles(&hv_clock, data->host_tsc); @@ -3049,7 +3066,7 @@ static int kvm_guest_time_update(struct kvm_vcpu *v) /* Keep irq disabled to prevent changes to the clock */ local_irq_save(flags); - tgt_tsc_khz = __this_cpu_read(cpu_tsc_khz); + tgt_tsc_khz = get_cpu_tsc_khz(); if (unlikely(tgt_tsc_khz == 0)) { local_irq_restore(flags); kvm_make_request(KVM_REQ_CLOCK_UPDATE, v); @@ -8646,9 +8663,11 @@ static void tsc_khz_changed(void *data) struct cpufreq_freqs *freq = data; unsigned long khz = 0; + WARN_ON_ONCE(boot_cpu_has(X86_FEATURE_CONSTANT_TSC)); + if (data) khz = freq->new; - else if (!boot_cpu_has(X86_FEATURE_CONSTANT_TSC)) + else khz = cpufreq_quick_get(raw_smp_processor_id()); if (!khz) khz = tsc_khz; @@ -8669,8 +8688,10 @@ static void kvm_hyperv_tsc_notifier(void) hyperv_stop_tsc_emulation(); /* TSC frequency always matches when on Hyper-V */ - for_each_present_cpu(cpu) - per_cpu(cpu_tsc_khz, cpu) = tsc_khz; + if (!boot_cpu_has(X86_FEATURE_CONSTANT_TSC)) { + for_each_present_cpu(cpu) + per_cpu(cpu_tsc_khz, cpu) = tsc_khz; + } kvm_max_guest_tsc_khz = tsc_khz; list_for_each_entry(kvm, &vm_list, vm_list) { @@ -8807,10 +8828,10 @@ static void kvm_timer_init(void) #endif cpufreq_register_notifier(&kvmclock_cpufreq_notifier_block, CPUFREQ_TRANSITION_NOTIFIER); - } - cpuhp_setup_state(CPUHP_AP_X86_KVM_CLK_ONLINE, "x86/kvm/clk:online", - kvmclock_cpu_online, kvmclock_cpu_down_prep); + cpuhp_setup_state(CPUHP_AP_X86_KVM_CLK_ONLINE, "x86/kvm/clk:online", + kvmclock_cpu_online, kvmclock_cpu_down_prep); + } } #ifdef CONFIG_X86_64 @@ -8963,10 +8984,11 @@ void kvm_arch_exit(void) #endif kvm_lapic_exit(); - if (!boot_cpu_has(X86_FEATURE_CONSTANT_TSC)) + if (!boot_cpu_has(X86_FEATURE_CONSTANT_TSC)) { cpufreq_unregister_notifier(&kvmclock_cpufreq_notifier_block, CPUFREQ_TRANSITION_NOTIFIER); - cpuhp_remove_state_nocalls(CPUHP_AP_X86_KVM_CLK_ONLINE); + cpuhp_remove_state_nocalls(CPUHP_AP_X86_KVM_CLK_ONLINE); + } #ifdef CONFIG_X86_64 pvclock_gtod_unregister_notifier(&pvclock_gtod_notifier); irq_work_sync(&pvclock_irq_work); -- 2.36.0.550.gb090851708-goog