From mboxrd@z Thu Jan 1 00:00:00 1970
Reply-To: Sean Christopherson
Date: Wed, 26 Feb 2025 18:18:16 -0800
Precedence: bulk
X-Mailing-List: virtualization@lists.linux.dev
Mime-Version: 1.0
X-Mailer: git-send-email 2.48.1.711.g2feabab25a-goog
Message-ID: <20250227021855.3257188-1-seanjc@google.com>
Subject: [PATCH v2 00/38] x86: Try to wrangle PV clocks vs. TSC
From: Sean Christopherson
To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
	x86@kernel.org, "Kirill A. Shutemov", Paolo Bonzini,
	Sean Christopherson, Juergen Gross, "K. Y. Srinivasan",
	Haiyang Zhang, Wei Liu, Dexuan Cui, Ajay Kaher, Jan Kiszka,
	Andy Lutomirski, Peter Zijlstra, Daniel Lezcano, John Stultz
Cc: linux-kernel@vger.kernel.org, linux-coco@lists.linux.dev,
	kvm@vger.kernel.org, virtualization@lists.linux.dev,
	linux-hyperv@vger.kernel.org, xen-devel@lists.xenproject.org,
	Tom Lendacky, Nikunj A Dadhania
Content-Type: text/plain; charset="UTF-8"

This... snowballed a bit.

The bulk of the changes are in kvmclock and TSC, but pretty much every
hypervisor's guest-side code gets touched at some point. I am reasonably
confident in the correctness of the KVM changes. For all other hypervisors,
assume it's completely broken until proven otherwise.

Note, I deliberately omitted:

  Alexey Makhalov
  jailhouse-dev@googlegroups.com

from the To/Cc, as those emails bounced on the last version, and I have zero
desire to get 38*2 emails telling me an email couldn't be delivered.

The primary goal of this series is (or at least was, when I started) to fix
flaws with SNP and TDX guests where a PV clock provided by the untrusted
hypervisor is used instead of the secure/trusted TSC that is controlled by
trusted firmware.

The secondary goal is to draft off of the SNP and TDX changes to slightly
modernize running under KVM. Currently, KVM guests will use TSC for
clocksource, but not sched_clock. And they ignore Intel's CPUID-based TSC and
CPU frequency enumeration, even when using the TSC instead of kvmclock.
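
For reference, the CPUID-based TSC enumeration mentioned above boils down to
simple arithmetic on the CPUID.0x15 output: EAX and EBX hold the denominator
and numerator of the TSC/crystal ratio, and ECX holds the core crystal
frequency in Hz (zero if not enumerated). The snippet below is only a
standalone sketch of that arithmetic, not the code in this series; the struct
and function names are made up for illustration.

```c
#include <stdint.h>

/*
 * Hypothetical container for the raw CPUID.0x15 registers. This is an
 * illustration only, not the struct added by the series.
 */
struct cpuid15_regs {
	uint32_t eax;	/* denominator of the TSC/crystal ratio */
	uint32_t ebx;	/* numerator of the TSC/crystal ratio */
	uint32_t ecx;	/* core crystal frequency in Hz, 0 if unknown */
};

/*
 * Return the TSC frequency in Hz, or 0 if the leaf doesn't provide enough
 * information to compute it (any of the three fields being zero).
 */
static uint64_t tsc_hz_from_cpuid15(const struct cpuid15_regs *r)
{
	if (!r->eax || !r->ebx || !r->ecx)
		return 0;

	/* tsc_hz = crystal_hz * numerator / denominator */
	return (uint64_t)r->ecx * r->ebx / r->eax;
}
```

E.g. a 24 MHz crystal with a 250/2 ratio gives a 3 GHz TSC, and a caller that
gets 0 back would have to fall back to some other calibration method.
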
And if the host provides the core crystal frequency in CPUID.0x15, then KVM
guests can use that for the APIC timer period instead of manually calibrating
the frequency.

Lots more background on the SNP/TDX motivation:
https://lore.kernel.org/all/20250106124633.1418972-13-nikunj@amd.com

v2:
 - Add struct to hold the TSC CPUID output. [Boris]
 - Don't pointlessly inline the TSC CPUID helpers. [Boris]
 - Fix a variable goof in a helper, hopefully for real this time. [Dan]
 - Collect reviews. [Nikunj]
 - Override the sched_clock save/restore hooks if and only if a PV clock
   is successfully registered.
 - During resume, restore clocksources before reading persistent time.
 - Clean up more warts created by kvmclock.
 - Fix more bugs in kvmclock's suspend/resume handling.
 - Try to harden kvmclock against future bugs.

v1: https://lore.kernel.org/all/20250201021718.699411-1-seanjc@google.com

Sean Christopherson (38):
  x86/tsc: Add a standalone helpers for getting TSC info from CPUID.0x15
  x86/tsc: Add standalone helper for getting CPU frequency from CPUID
  x86/tsc: Add helper to register CPU and TSC freq calibration routines
  x86/sev: Mark TSC as reliable when configuring Secure TSC
  x86/sev: Move check for SNP Secure TSC support to tsc_early_init()
  x86/tdx: Override PV calibration routines with CPUID-based calibration
  x86/acrn: Mark TSC frequency as known when using ACRN for calibration
  clocksource: hyper-v: Register sched_clock save/restore iff it's necessary
  clocksource: hyper-v: Drop wrappers to sched_clock save/restore helpers
  clocksource: hyper-v: Don't save/restore TSC offset when using HV sched_clock
  x86/kvmclock: Setup kvmclock for secondary CPUs iff CONFIG_SMP=y
  x86/kvm: Don't disable kvmclock on BSP in syscore_suspend()
  x86/paravirt: Move handling of unstable PV clocks into paravirt_set_sched_clock()
  x86/kvmclock: Move sched_clock save/restore helpers up in kvmclock.c
  x86/xen/time: Nullify x86_platform's sched_clock save/restore hooks
  x86/vmware: Nullify save/restore hooks when using VMware's sched_clock
  x86/tsc: WARN if TSC sched_clock save/restore used with PV sched_clock
  x86/paravirt: Pass sched_clock save/restore helpers during registration
  x86/kvmclock: Move kvm_sched_clock_init() down in kvmclock.c
  x86/xen/time: Mark xen_setup_vsyscall_time_info() as __init
  x86/pvclock: Mark setup helpers and related various as __init/__ro_after_init
  x86/pvclock: WARN if pvclock's valid_flags are overwritten
  x86/kvmclock: Refactor handling of PVCLOCK_TSC_STABLE_BIT during kvmclock_init()
  timekeeping: Resume clocksources before reading persistent clock
  x86/kvmclock: Hook clocksource.suspend/resume when kvmclock isn't sched_clock
  x86/kvmclock: WARN if wall clock is read while kvmclock is suspended
  x86/kvmclock: Enable kvmclock on APs during onlining if kvmclock isn't sched_clock
  x86/paravirt: Mark __paravirt_set_sched_clock() as __init
  x86/paravirt: Plumb a return code into __paravirt_set_sched_clock()
  x86/paravirt: Don't use a PV sched_clock in CoCo guests with trusted TSC
  x86/tsc: Pass KNOWN_FREQ and RELIABLE as params to registration
  x86/tsc: Rejects attempts to override TSC calibration with lesser routine
  x86/kvmclock: Mark TSC as reliable when it's constant and nonstop
  x86/kvmclock: Get CPU base frequency from CPUID when it's available
  x86/kvmclock: Get TSC frequency from CPUID when its available
  x86/kvmclock: Stuff local APIC bus period when core crystal freq comes from CPUID
  x86/kvmclock: Use TSC for sched_clock if it's constant and non-stop
  x86/paravirt: kvmclock: Setup kvmclock early iff it's sched_clock

 arch/x86/coco/sev/core.c           |   9 +-
 arch/x86/coco/tdx/tdx.c            |  27 ++-
 arch/x86/include/asm/kvm_para.h    |  10 +-
 arch/x86/include/asm/paravirt.h    |  16 +-
 arch/x86/include/asm/tdx.h         |   2 +
 arch/x86/include/asm/tsc.h         |  20 +++
 arch/x86/include/asm/x86_init.h    |   2 -
 arch/x86/kernel/cpu/acrn.c         |   5 +-
 arch/x86/kernel/cpu/mshyperv.c     |  69 +-------
 arch/x86/kernel/cpu/vmware.c       |  11 +-
 arch/x86/kernel/jailhouse.c        |   6 +-
 arch/x86/kernel/kvm.c              |  39 +++--
 arch/x86/kernel/kvmclock.c         | 260 +++++++++++++++++++++--------
 arch/x86/kernel/paravirt.c         |  35 +++-
 arch/x86/kernel/pvclock.c          |   9 +-
 arch/x86/kernel/smpboot.c          |   2 +-
 arch/x86/kernel/tsc.c              | 141 ++++++++++++----
 arch/x86/kernel/x86_init.c         |   1 -
 arch/x86/mm/mem_encrypt_amd.c      |   3 -
 arch/x86/xen/time.c                |  13 +-
 drivers/clocksource/hyperv_timer.c |  38 +++--
 include/clocksource/hyperv_timer.h |   2 -
 kernel/time/timekeeping.c          |   9 +-
 23 files changed, 487 insertions(+), 242 deletions(-)


base-commit: a64dcfb451e254085a7daee5fe51bf22959d52d3
-- 
2.48.1.711.g2feabab25a-goog