From: Marcelo Tosatti <mtosatti@redhat.com>
To: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Fernando Luis Vázquez Cao" <fernando_b1@lab.ntt.co.jp>,
	"Gleb Natapov" <gleb@kernel.org>,
	"Will Auld" <will.auld@intel.com>,
	qemu-devel@nongnu.org, kvm@vger.kernel.org
Subject: Re: [PATCH] target-i386: clear guest TSC on reset
Date: Thu, 5 Dec 2013 14:12:34 -0200
Message-ID: <20131205161234.GA17277@amt.cnet>
In-Reply-To: <52A04732.4040105@redhat.com>

On Thu, Dec 05, 2013 at 10:28:18AM +0100, Paolo Bonzini wrote:
> On 05/12/2013 07:15, Fernando Luis Vázquez Cao wrote:
> > VCPU TSC is not cleared by a warm reset (*), which leaves many Linux
> > guests vulnerable to the overflow in cyc2ns_offset fixed by upstream
> > commit 9993bc635d01a6ee7f6b833b4ee65ce7c06350b1 ("sched/x86: Fix overflow
> > in cyc2ns_offset").
> > 
> > In a nutshell, if a Linux guest without the patch above applied has
> > been up for more than 208 days and attempts a warm reset, chances are
> > that the newly booted kernel will panic or hang.
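
For reference, the 208-day figure can be reproduced with a little
arithmetic. The sketch below assumes that the unpatched kernel converts
cycles to nanoseconds with a 10-bit scale shift (CYC2NS_SCALE_FACTOR), so
the 64-bit intermediate product overflows once roughly 2^54 ns have
elapsed. The function and macro names are illustrative only.

```c
#include <stdint.h>

/* Assumed value of the kernel's CYC2NS_SCALE_FACTOR (a shift count). */
#define SCALE_SHIFT 10

/*
 * Days of uptime after which (cycles * scale) no longer fits in 64 bits.
 * Since scale is roughly (ns per cycle) << SCALE_SHIFT, the product is
 * roughly (elapsed ns) << SCALE_SHIFT, so the limit is
 * 2^(64 - SCALE_SHIFT) ns.
 */
static double overflow_uptime_days(void)
{
    const uint64_t limit_ns = UINT64_C(1) << (64 - SCALE_SHIFT);
    const double ns_per_day = 86400.0 * 1e9;

    return (double)limit_ns / ns_per_day;
}
```

Under these assumptions the result falls between 208 and 209 days, which
matches the uptime quoted above.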
> > 
> > (*) Intel Xeon E5 processors show the same broken behavior due to
> >     the erratum "TSC is Not Affected by Warm Reset" (Intel® Xeon®
> >     Processor E5 Family Specification Update - August 2013): "The
> >     TSC (Time Stamp Counter MSR 10H) should be cleared on
> >     reset. Due to this erratum the TSC is not affected by warm
> >     reset."
> > 
> > Cc: stable@vger.kernel.org
> > Cc: Will Auld <will.auld@intel.com>
> > Cc: Marcelo Tosatti <mtosatti@redhat.com>
> > Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
> 
> I agree that the bug is in QEMU.  One small nit in your patch is that
> you should reset env->tsc_adjust and env->tsc in x86_cpu_reset.  This
> would already be pretty good.
> 
> However, a bigger problem is that env->tsc is a useless duplicate of
> "cpu_get_ticks() + env->tsc_adjust".  It would be nice to drop env->tsc
> completely except for migration backwards compatibility.  Thus you can:
> 
> - fill in env->tsc as mentioned above from target-i386/machine.c's
> cpu_pre_save function.  This guarantees backwards compatibility.
> 
> - add a function cpu_set_ticks(int64_t ticks) to cpus.c.  The function
> does nothing if use_icount is true, otherwise it needs to have (roughly)
> the opposite logic compared to cpu_get_ticks.  You then call this
> function from x86_cpu_reset instead of setting env->tsc.  You can
> similarly call this function from kvm_get_msrs.
> 
> - add a function kvm_set_ticks(int64_t ticks) to kvm-all.c and
> kvm-stub.c.  For kvm-all.c it calls kvm_arch_set_ticks(CPUState *cpu,
> int64_t ticks) in target-*/kvm.c.  The kvm_arch_set_ticks() function has
> a dummy implementation for all architectures except x86.  For x86 it
> calls KVM_SET_MSRS passing "ticks + env->tsc_offset".
> 
> - call kvm_set_ticks() from cpu_set_ticks() and cpu_enable_ticks()
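
To make the "opposite logic" in the proposal above concrete, here is a
minimal sketch of a get/set pair built around a single offset. It is a
simplified stand-alone model, not QEMU code: struct ticks_state,
model_get_ticks(), model_set_ticks() and the fake host counter are all
hypothetical names, and the use_icount case is left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the host cycle counter, settable for the demo. */
static int64_t fake_host_ticks;

static int64_t host_ticks(void)
{
    return fake_host_ticks;
}

/* Simplified model of the ticks state, not QEMU's actual structure. */
struct ticks_state {
    bool enabled;    /* is the counter currently running? */
    int64_t offset;  /* guest minus host ticks while running,
                        frozen guest value while stopped */
};

/* Read side: frozen value when stopped, host counter plus offset otherwise. */
static int64_t model_get_ticks(const struct ticks_state *s)
{
    return s->enabled ? host_ticks() + s->offset : s->offset;
}

/* Write side: choose the offset so that the next read returns 'ticks'. */
static void model_set_ticks(struct ticks_state *s, int64_t ticks)
{
    s->offset = s->enabled ? ticks - host_ticks() : ticks;
}
```

In this model a reset would call model_set_ticks(&s, 0), after which the
value read back counts up from zero as the host counter advances.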

env->tsc is just a placeholder for the vcpu TSC.

From QEMU's point of view, a vcpu's TSC is a register initialized to zero
that must be read from and written to KVM, and migrated.

I am not sure what the point of your idea is.

> 
> Can you do this?
> 
> Thanks,
> 
> Paolo
> 
> > ---
> > 
> > --- qemu-orig/target-i386/kvm.c	2013-11-28 07:02:45.000000000 +0900
> > +++ qemu/target-i386/kvm.c	2013-12-05 14:47:03.085738175 +0900
> > @@ -1125,6 +1125,8 @@ static int kvm_put_msrs(X86CPU *cpu, int
> >          kvm_msr_entry_set(&msrs[n++], MSR_VM_HSAVE_PA, env->vm_hsave);
> >      }
> >      if (has_msr_tsc_adjust) {
> > +        if (level == KVM_PUT_RESET_STATE)
> > +            env->tsc_adjust = 0;
> >          kvm_msr_entry_set(&msrs[n++], MSR_TSC_ADJUST, env->tsc_adjust);
> >      }
> >      if (has_msr_misc_enable) {
> > @@ -1139,22 +1141,22 @@ static int kvm_put_msrs(X86CPU *cpu, int
> >          kvm_msr_entry_set(&msrs[n++], MSR_LSTAR, env->lstar);
> >      }
> >  #endif
> > -    if (level == KVM_PUT_FULL_STATE) {
> > +    /*
> > +     * The following MSRs have side effects on the guest or are too heavy
> > +     * for normal writeback. Limit them to reset or full state updates.
> > +     */
> > +    if (level >= KVM_PUT_RESET_STATE) {
> > +        if (level == KVM_PUT_RESET_STATE)
> > +            env->tsc = 0;
> >          /*
> >           * KVM is yet unable to synchronize TSC values of multiple VCPUs on
> >           * writeback. Until this is fixed, we only write the offset to SMP
> >           * guests after migration, desynchronizing the VCPUs, but avoiding
> >           * huge jump-backs that would occur without any writeback at all.
> >           */
> > -        if (smp_cpus == 1 || env->tsc != 0) {
> > +        if (smp_cpus == 1 || env->tsc != 0 || level == KVM_PUT_RESET_STATE) {
> >              kvm_msr_entry_set(&msrs[n++], MSR_IA32_TSC, env->tsc);
> >          }
> > -    }
> > -    /*
> > -     * The following MSRs have side effects on the guest or are too heavy
> > -     * for normal writeback. Limit them to reset or full state updates.
> > -     */
> > -    if (level >= KVM_PUT_RESET_STATE) {
> >          kvm_msr_entry_set(&msrs[n++], MSR_KVM_SYSTEM_TIME,
> >                            env->system_time_msr);
> >          kvm_msr_entry_set(&msrs[n++], MSR_KVM_WALL_CLOCK, env->wall_clock_msr);
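
The effect of the second hunk on when MSR_IA32_TSC is written can be
summarized as a pair of predicates. This is only a sketch of the
conditions visible in the diff: the function names are made up, and the
numeric values of the KVM_PUT_* levels (runtime < reset < full) are
assumed rather than taken from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed ordering of QEMU's KVM_PUT_* levels: runtime < reset < full. */
enum { PUT_RUNTIME_STATE = 1, PUT_RESET_STATE = 2, PUT_FULL_STATE = 3 };

/* Before the patch: TSC is written only on full state updates. */
static bool tsc_written_before(int level, int smp_cpus, uint64_t tsc)
{
    return level == PUT_FULL_STATE && (smp_cpus == 1 || tsc != 0);
}

/* After the patch: a reset zeroes the TSC and always writes it back. */
static bool tsc_written_after(int level, int smp_cpus, uint64_t tsc)
{
    if (level < PUT_RESET_STATE) {
        return false;
    }
    if (level == PUT_RESET_STATE) {
        tsc = 0; /* mirrors "env->tsc = 0" in the patch */
    }
    return smp_cpus == 1 || tsc != 0 || level == PUT_RESET_STATE;
}
```

The two predicates agree for runtime and full state updates and differ
only at reset, where the patched version always writes a zero TSC.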
> > 
> > 
