From: Paolo Bonzini <pbonzini@redhat.com>
To: "Fernando Luis Vázquez Cao" <fernando_b1@lab.ntt.co.jp>
Cc: Gleb Natapov <gleb@kernel.org>, Will Auld <will.auld@intel.com>,
	qemu-devel@nongnu.org, kvm@vger.kernel.org,
	Marcelo Tosatti <mtosatti@redhat.com>
Subject: Re: [PATCH] target-i386: clear guest TSC on reset
Date: Thu, 05 Dec 2013 10:28:18 +0100
Message-ID: <52A04732.4040105@redhat.com>
In-Reply-To: <1386224104.3091.3.camel@nexus>

On 05/12/2013 07:15, Fernando Luis Vázquez Cao wrote:
> VCPU TSC is not cleared by a warm reset (*), which leaves many Linux
> guests vulnerable to the overflow in cyc2ns_offset fixed by upstream
> commit 9993bc635d01a6ee7f6b833b4ee65ce7c06350b1 ("sched/x86: Fix overflow
> in cyc2ns_offset").
> 
> In a nutshell, if a Linux guest without the patch above applied has
> been up for more than 208 days and attempts a warm reset, chances are
> that the newly booted kernel will panic or hang.
> 
> (*) Intel Xeon E5 processors show the same broken behavior due to
>     the erratum "TSC is Not Affected by Warm Reset" (Intel® Xeon®
>     Processor E5 Family Specification Update - August 2013): "The
>     TSC (Time Stamp Counter MSR 10H) should be cleared on
>     reset. Due to this erratum the TSC is not affected by warm
>     reset."
> 
> Cc: stable@vger.kernel.org
> Cc: Will Auld <will.auld@intel.com>
> Cc: Marcelo Tosatti <mtosatti@redhat.com>
> Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
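
For readers wondering where the 208-day figure comes from, here is a
minimal arithmetic sketch.  It assumes the pre-fix kernel converts cycles
to nanoseconds as (cycles * scale) >> CYC2NS_SCALE_FACTOR with a scale
factor shift of 10, so the 64-bit intermediate product wraps once the
result reaches 2^54 ns.  The helper names are made up for illustration
and are not kernel functions.

```c
#include <assert.h>
#include <stdint.h>

/* Shift assumed for the pre-fix cyc2ns conversion (2^10 scaling). */
#define CYC2NS_SCALE_FACTOR 10

/*
 * Hypothetical helper: smallest nanosecond value whose pre-shift product
 * (roughly ns << CYC2NS_SCALE_FACTOR) no longer fits in 64 bits.
 */
uint64_t cyc2ns_overflow_ns(void)
{
    assert(CYC2NS_SCALE_FACTOR < 64);
    return UINT64_C(1) << (64 - CYC2NS_SCALE_FACTOR);
}

/* Hypothetical helper: convert nanoseconds to whole days (rounds down). */
uint64_t ns_to_whole_days(uint64_t ns)
{
    const uint64_t ns_per_day = UINT64_C(86400) * UINT64_C(1000000000);
    return ns / ns_per_day;
}
```

Under these assumptions cyc2ns_overflow_ns() is 2^54 ns, which
ns_to_whole_days() rounds down to 208 days.  A TSC that keeps counting
across a warm reset can therefore hand the freshly booted kernel a value
already past that threshold.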

I agree that the bug is in QEMU.  One small nit in your patch is that
you should reset env->tsc_adjust and env->tsc in x86_cpu_reset rather
than in kvm_put_msrs.  That change alone would already be pretty good.

However, a bigger problem is that env->tsc is a useless duplicate of
"cpu_get_ticks() + env->tsc_adjust".  It would be nice to drop env->tsc
completely except for migration backwards compatibility.  Thus you can:

- fill in env->tsc as mentioned above from target-i386/machine.c's
cpu_pre_save function.  This guarantees backwards compatibility.

- add a function cpu_set_ticks(int64_t ticks) to cpus.c.  The function
does nothing if use_icount is true, otherwise it needs to have (roughly)
the opposite logic compared to cpu_get_ticks.  You then call this
function from x86_cpu_reset instead of setting env->tsc.  You can
similarly call this function from kvm_get_msrs.

- add a function kvm_set_ticks(int64_t ticks) to kvm-all.c and
kvm-stub.c.  For kvm-all.c it calls kvm_arch_set_ticks(CPUState *cpu,
int64_t ticks) in target-*/kvm.c.  The kvm_arch_set_ticks() function has a
dummy implementation for all architectures except x86.  For x86 it calls
KVM_SET_MSRS passing "ticks + env->tsc_offset".

- call kvm_set_ticks() from cpu_set_ticks() and cpu_enable_ticks()
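
To make the steps above concrete, here is a rough, self-contained model
of the idea.  It is only a sketch and not the actual cpus.c code: the
host counter is a plain variable, the icount read path is left out,
kvm_set_ticks() is a stub that merely records its argument, and
kvm_last_ticks is a made-up name used for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-ins for QEMU state; this is a model, not cpus.c. */
bool use_icount;                /* instruction-counting mode flag */
int64_t host_ticks;             /* fake host counter */
int64_t kvm_last_ticks = -1;    /* hypothetical: last value given to KVM */

struct {
    int64_t cpu_ticks_offset;
    bool cpu_ticks_enabled;
} timers_state;

/* Stand-in for the host tick source. */
int64_t cpu_get_real_ticks(void)
{
    return host_ticks;
}

/* Stub for the proposed kvm_set_ticks(); on x86 the real one would end
 * up in KVM_SET_MSRS via the per-architecture hook. */
void kvm_set_ticks(int64_t ticks)
{
    kvm_last_ticks = ticks;
}

/* Guest-visible ticks: frozen offset when stopped, host + offset when
 * running (icount path omitted in this model). */
int64_t cpu_get_ticks(void)
{
    if (!timers_state.cpu_ticks_enabled) {
        return timers_state.cpu_ticks_offset;
    }
    return cpu_get_real_ticks() + timers_state.cpu_ticks_offset;
}

/* Roughly the opposite of cpu_get_ticks(): pick the offset that makes
 * the guest-visible value equal to 'ticks' right now. */
void cpu_set_ticks(int64_t ticks)
{
    if (use_icount) {
        return;     /* nothing to do in icount mode */
    }
    if (!timers_state.cpu_ticks_enabled) {
        timers_state.cpu_ticks_offset = ticks;
    } else {
        timers_state.cpu_ticks_offset = ticks - cpu_get_real_ticks();
    }
    kvm_set_ticks(ticks);
    assert(cpu_get_ticks() == ticks);   /* round trip holds in this model */
}
```

In this model a reset handler would simply call cpu_set_ticks(0), after
which cpu_get_ticks() counts up from zero again as the host counter
advances.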

Can you do this?

Thanks,

Paolo

> ---
> 
> --- qemu-orig/target-i386/kvm.c	2013-11-28 07:02:45.000000000 +0900
> +++ qemu/target-i386/kvm.c	2013-12-05 14:47:03.085738175 +0900
> @@ -1125,6 +1125,8 @@ static int kvm_put_msrs(X86CPU *cpu, int
>          kvm_msr_entry_set(&msrs[n++], MSR_VM_HSAVE_PA, env->vm_hsave);
>      }
>      if (has_msr_tsc_adjust) {
> +        if (level == KVM_PUT_RESET_STATE)
> +            env->tsc_adjust = 0;
>          kvm_msr_entry_set(&msrs[n++], MSR_TSC_ADJUST, env->tsc_adjust);
>      }
>      if (has_msr_misc_enable) {
> @@ -1139,22 +1141,22 @@ static int kvm_put_msrs(X86CPU *cpu, int
>          kvm_msr_entry_set(&msrs[n++], MSR_LSTAR, env->lstar);
>      }
>  #endif
> -    if (level == KVM_PUT_FULL_STATE) {
> +    /*
> +     * The following MSRs have side effects on the guest or are too heavy
> +     * for normal writeback. Limit them to reset or full state updates.
> +     */
> +    if (level >= KVM_PUT_RESET_STATE) {
> +        if (level == KVM_PUT_RESET_STATE)
> +            env->tsc = 0;
>          /*
>           * KVM is yet unable to synchronize TSC values of multiple VCPUs on
>           * writeback. Until this is fixed, we only write the offset to SMP
>           * guests after migration, desynchronizing the VCPUs, but avoiding
>           * huge jump-backs that would occur without any writeback at all.
>           */
> -        if (smp_cpus == 1 || env->tsc != 0) {
> +        if (smp_cpus == 1 || env->tsc != 0 || level == KVM_PUT_RESET_STATE) {
>              kvm_msr_entry_set(&msrs[n++], MSR_IA32_TSC, env->tsc);
>          }
> -    }
> -    /*
> -     * The following MSRs have side effects on the guest or are too heavy
> -     * for normal writeback. Limit them to reset or full state updates.
> -     */
> -    if (level >= KVM_PUT_RESET_STATE) {
>          kvm_msr_entry_set(&msrs[n++], MSR_KVM_SYSTEM_TIME,
>                            env->system_time_msr);
>          kvm_msr_entry_set(&msrs[n++], MSR_KVM_WALL_CLOCK, env->wall_clock_msr);
> 
> 

