public inbox for kvm@vger.kernel.org
 help / color / mirror / Atom feed
From: Vitaly Kuznetsov <vkuznets@redhat.com>
To: Lenny Szubowicz <lszubowi@redhat.com>,
	pbonzini@redhat.com, seanjc@google.com, wanpengli@tencent.com,
	jmattson@google.com, joro@8bytes.org, tglx@linutronix.de,
	mingo@redhat.com, bp@alien8.de, x86@kernel.org, hpa@zytor.com,
	kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] x86/kvmclock: Stop kvmclocks for hibernate restore
Date: Wed, 17 Mar 2021 14:30:45 +0100	[thread overview]
Message-ID: <87sg4t7vqy.fsf@vitty.brq.redhat.com> (raw)
In-Reply-To: <20210311132847.224406-1-lszubowi@redhat.com>

Lenny Szubowicz <lszubowi@redhat.com> writes:

> Turn off host updates to the registered kvmclock memory
> locations when transitioning to a hibernated kernel in
> resume_target_kernel().
>
> This is accomplished for secondary vcpus by disabling host
> clock updates for that vcpu when it is put offline. For the
> primary vcpu, it's accomplished by using the existing call back
> from save_processor_state() to kvm_save_sched_clock_state().
>
> The registered kvmclock memory locations may differ between
> the currently running kernel and the hibernated kernel, which
> is being restored and resumed. Kernel memory corruption is thus
> possible if the host clock updates are allowed to run while the
> hibernated kernel is relocated to its original physical memory
> locations.
>
> This is similar to the problem solved for kexec by
> commit 1e977aa12dd4 ("x86: KVM guest: disable clock before rebooting.")
>
> Commit 95a3d4454bb1 ("x86/kvmclock: Switch kvmclock data to a
> PER_CPU variable") innocently increased the exposure for this
> problem by dynamically allocating the physical pages that are
> used for host clock updates when the vcpu count exceeds 64.
> This increases the likelihood that the registered kvmclock
> locations will differ for vcpus above 64.
>
> Reported-by: Xiaoyi Chen <cxiaoyi@amazon.com>
> Tested-by: Mohamed Aboubakr <mabouba@amazon.com>
> Signed-off-by: Lenny Szubowicz <lszubowi@redhat.com>
> ---
>  arch/x86/kernel/kvmclock.c | 34 ++++++++++++++++++++++++++++++++--
>  1 file changed, 32 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c
> index aa593743acf6..291ffca41afb 100644
> --- a/arch/x86/kernel/kvmclock.c
> +++ b/arch/x86/kernel/kvmclock.c
> @@ -187,8 +187,18 @@ static void kvm_register_clock(char *txt)
>  	pr_info("kvm-clock: cpu %d, msr %llx, %s", smp_processor_id(), pa, txt);
>  }
>  
> +/*
> + * Turn off host clock updates to the registered memory location when the
> + * cpu clock context is saved via save_processor_state(). Enables correct
> + * handling of the primary cpu clock when transitioning to a hibernated
> + * kernel in resume_target_kernel(), where the old and new registered
> + * memory locations may differ.
> + */
>  static void kvm_save_sched_clock_state(void)
>  {
> +	native_write_msr(msr_kvm_system_time, 0, 0);
> +	kvm_disable_steal_time();
> +	pr_info("kvm-clock: cpu %d, clock stopped", smp_processor_id());
>  }

Nitpick: should we rename kvm_save_sched_clock_state() to something more
generic, like kvm_disable_host_clock_updates() to indicate, that what it
does is not only sched clock related?

>  
>  static void kvm_restore_sched_clock_state(void)
> @@ -311,9 +321,23 @@ static int kvmclock_setup_percpu(unsigned int cpu)
>  	return p ? 0 : -ENOMEM;
>  }
>  
> +/*
> + * Turn off host clock updates to the registered memory location when a
> + * cpu is placed offline. Enables correct handling of secondary cpu clocks
> + * when transitioning to a hibernated kernel in resume_target_kernel(),
> + * where the old and new registered memory locations may differ.
> + */
> +static int kvmclock_cpu_offline(unsigned int cpu)
> +{
> +	native_write_msr(msr_kvm_system_time, 0, 0);
> +	pr_info("kvm-clock: cpu %d, clock stopped", cpu);

I'd say this pr_info() is superfluous: on a system with hundereds of
vCPUs users will get flooded with 'clock stopped' messages which don't
actually mean much: in case native_write_msr() fails the error gets
reported in dmesg anyway. I'd suggest we drop this and pr_info() in
kvm_save_sched_clock_state().

> +	return 0;

Why don't we disable steal time accounting here? MSR_KVM_STEAL_TIME is
also per-cpu. Can we merge kvm_save_sched_clock_state() with
kvmclock_cpu_offline() maybe?

> +}
> +
>  void __init kvmclock_init(void)
>  {
>  	u8 flags;
> +	int cpuhp_prepare;
>  
>  	if (!kvm_para_available() || !kvmclock)
>  		return;
> @@ -325,8 +349,14 @@ void __init kvmclock_init(void)
>  		return;
>  	}
>  
> -	if (cpuhp_setup_state(CPUHP_BP_PREPARE_DYN, "kvmclock:setup_percpu",
> -			      kvmclock_setup_percpu, NULL) < 0) {
> +	cpuhp_prepare = cpuhp_setup_state(CPUHP_BP_PREPARE_DYN,
> +					  "kvmclock:setup_percpu",
> +					  kvmclock_setup_percpu, NULL);
> +	if (cpuhp_prepare < 0)
> +		return;
> +	if (cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "kvmclock:cpu_offline",
> +			      NULL, kvmclock_cpu_offline) < 0) {
> +		cpuhp_remove_state(cpuhp_prepare);

In case we fail to set up kvmclock_cpu_offline() callback we get broken
hybernation but without kvmclock_setup_percpu() call we get a broken
guest (as kvmclock() may be the only reliable clocksource) so I'm not
exactly sure we're active in a best way with cpuhp_remove_state()
here. I don't feel strong though, I think it can stay but in that case
I'd add a pr_warn() at least.

>  		return;
>  	}

-- 
Vitaly


  reply	other threads:[~2021-03-17 13:31 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-03-11 13:28 [PATCH] x86/kvmclock: Stop kvmclocks for hibernate restore Lenny Szubowicz
2021-03-17 13:30 ` Vitaly Kuznetsov [this message]
2021-03-26  2:26   ` Lenny Szubowicz
2021-03-26 10:57     ` Vitaly Kuznetsov
2021-03-26 12:37       ` Vitaly Kuznetsov
2021-03-26 13:01         ` Vitaly Kuznetsov
2021-03-26 17:14           ` Lenny Szubowicz

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87sg4t7vqy.fsf@vitty.brq.redhat.com \
    --to=vkuznets@redhat.com \
    --cc=bp@alien8.de \
    --cc=hpa@zytor.com \
    --cc=jmattson@google.com \
    --cc=joro@8bytes.org \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=lszubowi@redhat.com \
    --cc=mingo@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=seanjc@google.com \
    --cc=tglx@linutronix.de \
    --cc=wanpengli@tencent.com \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox