From mboxrd@z Thu Jan 1 00:00:00 1970
Reply-To: Sean Christopherson
Date: Fri, 15 May 2026 12:19:12 -0700
In-Reply-To: <20260515191942.1892718-1-seanjc@google.com>
Precedence: bulk
X-Mailing-List: linux-hyperv@vger.kernel.org
List-Id: 
List-Subscribe: 
List-Unsubscribe: 
Mime-Version: 1.0
References: <20260515191942.1892718-1-seanjc@google.com>
X-Mailer: git-send-email 2.54.0.563.g4f69b47b94-goog
Message-ID: <20260515191942.1892718-12-seanjc@google.com>
Subject: [PATCH v3 11/41] x86/kvm: Don't disable kvmclock on BSP in syscore_suspend()
From: Sean Christopherson
To: Kiryl Shutsemau, Paolo Bonzini, Sean Christopherson,
 "K. Y. Srinivasan", Haiyang Zhang, Wei Liu, Dexuan Cui, Long Li,
 Ajay Kaher, Alexey Makhalov, Jan Kiszka, Dave Hansen, Andy Lutomirski,
 Peter Zijlstra, Juergen Gross, Daniel Lezcano, Thomas Gleixner,
 John Stultz
Cc: Rick Edgecombe, Vitaly Kuznetsov,
 Broadcom internal kernel review list, Boris Ostrovsky, Stephen Boyd,
 x86@kernel.org, linux-coco@lists.linux.dev, kvm@vger.kernel.org,
 linux-hyperv@vger.kernel.org, virtualization@lists.linux.dev,
 linux-kernel@vger.kernel.org, xen-devel@lists.xenproject.org,
 Michael Kelley, Tom Lendacky, Nikunj A Dadhania, Thomas Gleixner,
 David Woodhouse
Content-Type: text/plain; charset="UTF-8"

Don't disable kvmclock on the BSP during syscore_suspend(), as the BSP's
clock is NOT restored during syscore_resume(), but is instead restored
earlier via the sched_clock restore callback.  If suspend is aborted, e.g.
due to a late wakeup, the BSP will run without its clock enabled, which
"works" only because KVM-the-hypervisor is kind enough to not clobber the
shared memory when the clock is disabled.  But over time, the BSP's view
of time will drift from APs.

Plumb in an "action" to KVM-as-a-guest and kvmclock code in preparation
for additional cleanups to kvmclock's suspend/resume logic.

Fixes: c02027b5742b ("x86/kvm: Disable kvmclock on all CPUs on shutdown")
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson
---
 arch/x86/include/asm/kvm_para.h |  8 +++++++-
 arch/x86/kernel/kvm.c           | 15 ++++++++-------
 arch/x86/kernel/kvmclock.c      | 31 +++++++++++++++++++++++++------
 3 files changed, 40 insertions(+), 14 deletions(-)

diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h
index 4a47c16e2df8..2adba2aff539 100644
--- a/arch/x86/include/asm/kvm_para.h
+++ b/arch/x86/include/asm/kvm_para.h
@@ -118,8 +118,14 @@ static inline long kvm_sev_hypercall3(unsigned int nr, unsigned long p1,
 }
 
 #ifdef CONFIG_KVM_GUEST
+enum kvm_guest_cpu_action {
+	KVM_GUEST_BSP_SUSPEND,
+	KVM_GUEST_AP_OFFLINE,
+	KVM_GUEST_SHUTDOWN,
+};
+
 void kvmclock_init(void);
-void kvmclock_disable(void);
+void kvmclock_cpu_action(enum kvm_guest_cpu_action action);
 bool kvm_para_available(void);
 unsigned int kvm_arch_para_features(void);
 unsigned int kvm_arch_para_hints(void);
diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index 06534e16cfb5..0131bc1cb459 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -457,7 +457,7 @@ static void __init sev_map_percpu_data(void)
 	}
 }
 
-static void kvm_guest_cpu_offline(bool shutdown)
+static void kvm_guest_cpu_offline(enum kvm_guest_cpu_action action)
 {
 	kvm_disable_steal_time();
 	if (kvm_para_has_feature(KVM_FEATURE_PV_EOI))
@@ -465,9 +465,10 @@ static void kvm_guest_cpu_offline(bool shutdown)
 	if (kvm_para_has_feature(KVM_FEATURE_MIGRATION_CONTROL))
 		wrmsrq(MSR_KVM_MIGRATION_CONTROL, 0);
 	kvm_pv_disable_apf();
-	if (!shutdown)
+	if (action != KVM_GUEST_SHUTDOWN)
 		apf_task_wake_all();
-	kvmclock_disable();
+
+	kvmclock_cpu_action(action);
 }
 
 static int kvm_cpu_online(unsigned int cpu)
@@ -723,7 +724,7 @@ static int kvm_cpu_down_prepare(unsigned int cpu)
 	unsigned long flags;
 
 	local_irq_save(flags);
-	kvm_guest_cpu_offline(false);
+	kvm_guest_cpu_offline(KVM_GUEST_AP_OFFLINE);
 	local_irq_restore(flags);
 	return 0;
 }
@@ -734,7 +735,7 @@ static int kvm_suspend(void *data)
 {
 	u64 val = 0;
 
-	kvm_guest_cpu_offline(false);
+	kvm_guest_cpu_offline(KVM_GUEST_BSP_SUSPEND);
 
 #ifdef CONFIG_ARCH_CPUIDLE_HALTPOLL
 	if (kvm_para_has_feature(KVM_FEATURE_POLL_CONTROL))
@@ -765,7 +766,7 @@ static struct syscore kvm_syscore = {
 
 static void kvm_pv_guest_cpu_reboot(void *unused)
 {
-	kvm_guest_cpu_offline(true);
+	kvm_guest_cpu_offline(KVM_GUEST_SHUTDOWN);
 }
 
 static int kvm_pv_reboot_notify(struct notifier_block *nb,
@@ -789,7 +790,7 @@ static struct notifier_block kvm_pv_reboot_nb = {
 #ifdef CONFIG_CRASH_DUMP
 static void kvm_crash_shutdown(struct pt_regs *regs)
 {
-	kvm_guest_cpu_offline(true);
+	kvm_guest_cpu_offline(KVM_GUEST_SHUTDOWN);
 	native_machine_crash_shutdown(regs);
 }
 #endif
diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c
index df95516a9d89..006e3a13500b 100644
--- a/arch/x86/kernel/kvmclock.c
+++ b/arch/x86/kernel/kvmclock.c
@@ -178,8 +178,22 @@ static void kvm_register_clock(char *txt)
 	pr_debug("kvm-clock: cpu %d, msr %llx, %s", smp_processor_id(), pa, txt);
 }
 
+static void kvmclock_disable(void)
+{
+	if (msr_kvm_system_time)
+		native_write_msr(msr_kvm_system_time, 0);
+}
+
 static void kvm_save_sched_clock_state(void)
 {
+	/*
+	 * Stop host writes to kvmclock immediately prior to suspend/hibernate.
+	 * If the system is hibernating, then kvmclock will likely reside at a
+	 * different physical address when the system awakens, and host writes
+	 * to the old address prior to reconfiguring kvmclock would clobber
+	 * random memory.
+	 */
+	kvmclock_disable();
 }
 
 static void kvm_restore_sched_clock_state(void)
@@ -187,6 +201,17 @@ static void kvm_restore_sched_clock_state(void)
 	kvm_register_clock("primary cpu clock, resume");
 }
 
+void kvmclock_cpu_action(enum kvm_guest_cpu_action action)
+{
+	/*
+	 * Don't disable kvmclock on the BSP during suspend.  If kvmclock is
+	 * being used for sched_clock, then it needs to be kept alive until the
+	 * last minute, and restored as quickly as possible after resume.
+	 */
+	if (action != KVM_GUEST_BSP_SUSPEND)
+		kvmclock_disable();
+}
+
 #ifdef CONFIG_SMP
 static void kvm_setup_secondary_clock(void)
 {
@@ -194,12 +219,6 @@ static void kvm_setup_secondary_clock(void)
 }
 #endif
 
-void kvmclock_disable(void)
-{
-	if (msr_kvm_system_time)
-		native_write_msr(msr_kvm_system_time, 0);
-}
-
 static void __init kvmclock_init_mem(void)
 {
 	unsigned long ncpus;
-- 
2.54.0.563.g4f69b47b94-goog
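
For readers who want to poke at the dispatch outside the kernel, here is a
minimal userspace sketch of which actions end up disabling the clock.  It is
not part of the patch.  The enum is copied from the diff above, but every
sketch_* name, and the plain variable standing in for the system-time MSR,
is a hypothetical stand-in, and none of the real MSR or sched_clock
machinery is modeled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Copied from the patch (arch/x86/include/asm/kvm_para.h). */
enum kvm_guest_cpu_action {
	KVM_GUEST_BSP_SUSPEND,
	KVM_GUEST_AP_OFFLINE,
	KVM_GUEST_SHUTDOWN,
};

/*
 * Hypothetical stand-in for the value last written to the system-time MSR.
 * Nonzero means "kvmclock enabled" in this sketch; the value is arbitrary.
 */
#define SKETCH_CLOCK_ENABLED	((uint64_t)0x1001)

uint64_t sketch_system_time_msr = SKETCH_CLOCK_ENABLED;

/* Stand-in for kvmclock_disable(), which writes 0 to the MSR. */
void sketch_kvmclock_disable(void)
{
	sketch_system_time_msr = 0;
}

/* Mirrors the check in kvmclock_cpu_action() from the patch. */
void sketch_kvmclock_cpu_action(enum kvm_guest_cpu_action action)
{
	/* BSP suspend leaves the clock alone; everything else disables it. */
	if (action != KVM_GUEST_BSP_SUSPEND)
		sketch_kvmclock_disable();
}

/*
 * Hypothetical helper: start from an enabled clock, apply one action, and
 * report whether the clock is still enabled afterwards.
 */
bool sketch_clock_enabled_after(enum kvm_guest_cpu_action action)
{
	sketch_system_time_msr = SKETCH_CLOCK_ENABLED;
	sketch_kvmclock_cpu_action(action);
	return sketch_system_time_msr != 0;
}
```

Calling sketch_clock_enabled_after() with KVM_GUEST_BSP_SUSPEND returns true,
and with the other two actions it returns false.  In the patch itself, the
BSP's clock is instead disabled later, from kvm_save_sched_clock_state().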