All of lore.kernel.org
 help / color / mirror / Atom feed
From: Vitaly Kuznetsov <vkuznets@redhat.com>
To: Lenny Szubowicz <lszubowi@redhat.com>,
	pbonzini@redhat.com, seanjc@google.com, wanpengli@tencent.com,
	jmattson@google.com, joro@8bytes.org, tglx@linutronix.de,
	mingo@redhat.com, bp@alien8.de, x86@kernel.org, hpa@zytor.com,
	kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] x86/kvmclock: Stop kvmclocks for hibernate restore
Date: Wed, 17 Mar 2021 14:30:45 +0100	[thread overview]
Message-ID: <87sg4t7vqy.fsf@vitty.brq.redhat.com> (raw)
In-Reply-To: <20210311132847.224406-1-lszubowi@redhat.com>

Lenny Szubowicz <lszubowi@redhat.com> writes:

> Turn off host updates to the registered kvmclock memory
> locations when transitioning to a hibernated kernel in
> resume_target_kernel().
>
> This is accomplished for secondary vcpus by disabling host
> clock updates for that vcpu when it is put offline. For the
> primary vcpu, it's accomplished by using the existing call back
> from save_processor_state() to kvm_save_sched_clock_state().
>
> The registered kvmclock memory locations may differ between
> the currently running kernel and the hibernated kernel, which
> is being restored and resumed. Kernel memory corruption is thus
> possible if the host clock updates are allowed to run while the
> hibernated kernel is relocated to its original physical memory
> locations.
>
> This is similar to the problem solved for kexec by
> commit 1e977aa12dd4 ("x86: KVM guest: disable clock before rebooting.")
>
> Commit 95a3d4454bb1 ("x86/kvmclock: Switch kvmclock data to a
> PER_CPU variable") innocently increased the exposure for this
> problem by dynamically allocating the physical pages that are
> used for host clock updates when the vcpu count exceeds 64.
> This increases the likelihood that the registered kvmclock
> locations will differ for vcpus above 64.
>
> Reported-by: Xiaoyi Chen <cxiaoyi@amazon.com>
> Tested-by: Mohamed Aboubakr <mabouba@amazon.com>
> Signed-off-by: Lenny Szubowicz <lszubowi@redhat.com>
> ---
>  arch/x86/kernel/kvmclock.c | 34 ++++++++++++++++++++++++++++++++--
>  1 file changed, 32 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c
> index aa593743acf6..291ffca41afb 100644
> --- a/arch/x86/kernel/kvmclock.c
> +++ b/arch/x86/kernel/kvmclock.c
> @@ -187,8 +187,18 @@ static void kvm_register_clock(char *txt)
>  	pr_info("kvm-clock: cpu %d, msr %llx, %s", smp_processor_id(), pa, txt);
>  }
>  
> +/*
> + * Turn off host clock updates to the registered memory location when the
> + * cpu clock context is saved via save_processor_state(). Enables correct
> + * handling of the primary cpu clock when transitioning to a hibernated
> + * kernel in resume_target_kernel(), where the old and new registered
> + * memory locations may differ.
> + */
>  static void kvm_save_sched_clock_state(void)
>  {
> +	native_write_msr(msr_kvm_system_time, 0, 0);
> +	kvm_disable_steal_time();
> +	pr_info("kvm-clock: cpu %d, clock stopped", smp_processor_id());
>  }

Nitpick: should we rename kvm_save_sched_clock_state() to something more
generic, like kvm_disable_host_clock_updates() to indicate, that what it
does is not only sched clock related?

>  
>  static void kvm_restore_sched_clock_state(void)
> @@ -311,9 +321,23 @@ static int kvmclock_setup_percpu(unsigned int cpu)
>  	return p ? 0 : -ENOMEM;
>  }
>  
> +/*
> + * Turn off host clock updates to the registered memory location when a
> + * cpu is placed offline. Enables correct handling of secondary cpu clocks
> + * when transitioning to a hibernated kernel in resume_target_kernel(),
> + * where the old and new registered memory locations may differ.
> + */
> +static int kvmclock_cpu_offline(unsigned int cpu)
> +{
> +	native_write_msr(msr_kvm_system_time, 0, 0);
> +	pr_info("kvm-clock: cpu %d, clock stopped", cpu);

I'd say this pr_info() is superfluous: on a system with hundereds of
vCPUs users will get flooded with 'clock stopped' messages which don't
actually mean much: in case native_write_msr() fails the error gets
reported in dmesg anyway. I'd suggest we drop this and pr_info() in
kvm_save_sched_clock_state().

> +	return 0;

Why don't we disable steal time accounting here? MSR_KVM_STEAL_TIME is
also per-cpu. Can we merge kvm_save_sched_clock_state() with
kvmclock_cpu_offline() maybe?

> +}
> +
>  void __init kvmclock_init(void)
>  {
>  	u8 flags;
> +	int cpuhp_prepare;
>  
>  	if (!kvm_para_available() || !kvmclock)
>  		return;
> @@ -325,8 +349,14 @@ void __init kvmclock_init(void)
>  		return;
>  	}
>  
> -	if (cpuhp_setup_state(CPUHP_BP_PREPARE_DYN, "kvmclock:setup_percpu",
> -			      kvmclock_setup_percpu, NULL) < 0) {
> +	cpuhp_prepare = cpuhp_setup_state(CPUHP_BP_PREPARE_DYN,
> +					  "kvmclock:setup_percpu",
> +					  kvmclock_setup_percpu, NULL);
> +	if (cpuhp_prepare < 0)
> +		return;
> +	if (cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "kvmclock:cpu_offline",
> +			      NULL, kvmclock_cpu_offline) < 0) {
> +		cpuhp_remove_state(cpuhp_prepare);

In case we fail to set up kvmclock_cpu_offline() callback we get broken
hybernation but without kvmclock_setup_percpu() call we get a broken
guest (as kvmclock() may be the only reliable clocksource) so I'm not
exactly sure we're active in a best way with cpuhp_remove_state()
here. I don't feel strong though, I think it can stay but in that case
I'd add a pr_warn() at least.

>  		return;
>  	}

-- 
Vitaly


  reply	other threads:[~2021-03-17 13:31 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-03-11 13:28 [PATCH] x86/kvmclock: Stop kvmclocks for hibernate restore Lenny Szubowicz
2021-03-17 13:30 ` Vitaly Kuznetsov [this message]
2021-03-26  2:26   ` Lenny Szubowicz
2021-03-26 10:57     ` Vitaly Kuznetsov
2021-03-26 12:37       ` Vitaly Kuznetsov
2021-03-26 13:01         ` Vitaly Kuznetsov
2021-03-26 17:14           ` Lenny Szubowicz

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87sg4t7vqy.fsf@vitty.brq.redhat.com \
    --to=vkuznets@redhat.com \
    --cc=bp@alien8.de \
    --cc=hpa@zytor.com \
    --cc=jmattson@google.com \
    --cc=joro@8bytes.org \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=lszubowi@redhat.com \
    --cc=mingo@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=seanjc@google.com \
    --cc=tglx@linutronix.de \
    --cc=wanpengli@tencent.com \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.