Date: Thu, 5 Dec 2013 14:12:34 -0200
From: Marcelo Tosatti
Message-ID: <20131205161234.GA17277@amt.cnet>
References: <1386054500.25757.10.camel@nexus> <529D90A6.2080801@lab.ntt.co.jp> <52A0186A.2050207@lab.ntt.co.jp> <1386224104.3091.3.camel@nexus> <52A04732.4040105@redhat.com>
In-Reply-To: <52A04732.4040105@redhat.com>
Subject: Re: [Qemu-devel] [PATCH] target-i386: clear guest TSC on reset
To: Paolo Bonzini
Cc: Gleb Natapov, Will Auld, qemu-devel@nongnu.org, kvm@vger.kernel.org, Fernando Luis Vázquez Cao

On Thu, Dec 05, 2013 at 10:28:18AM +0100, Paolo Bonzini wrote:
> On 05/12/2013 07:15, Fernando Luis Vázquez Cao wrote:
> > VCPU TSC is not cleared by a warm reset (*), which leaves many Linux
> > guests vulnerable to the overflow in cyc2ns_offset fixed by upstream
> > commit 9993bc635d01a6ee7f6b833b4ee65ce7c06350b1 ("sched/x86: Fix overflow
> > in cyc2ns_offset").
> >
> > To put it in a nutshell, if a Linux guest without the patch above applied
> > has been up more than 208 days and attempts a warm reset, chances are that
> > the newly booted kernel will panic or hang.
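
As a rough check of the 208-day figure, here is a sketch that assumes the pre-fix x86 cyc2ns code computed scale = (10^6 << 10) / tsc_khz (CYC2NS_SCALE_FACTOR = 10). The intermediate product cycles * scale is then roughly ns << 10, so it wraps a 64-bit integer after about 2^54 ns of uptime, whatever the TSC frequency. CYC2NS_SHIFT, cyc2ns_overflow_days() and cyc2ns_product_overflows() are hypothetical names for illustration, not kernel functions.

```c
#include <stdint.h>

/* Assumption: pre-fix x86 kernels scaled cycles by 2^10
 * (CYC2NS_SCALE_FACTOR = 10). The name CYC2NS_SHIFT is hypothetical. */
#define CYC2NS_SHIFT 10

/* Hypothetical helper: since cycles * scale ~= ns << CYC2NS_SHIFT, the
 * product leaves 64 bits after about 2^(64 - CYC2NS_SHIFT) ns of uptime,
 * independent of the TSC frequency. Returns that uptime in days. */
static double cyc2ns_overflow_days(void)
{
    const double ns = (double)(UINT64_C(1) << (64 - CYC2NS_SHIFT));
    return ns / 1e9 / 86400.0;
}

/* Hypothetical helper: nonzero if cycles * scale overflows uint64_t, with
 * scale = (10^6 << CYC2NS_SHIFT) / tsc_khz (assumed pre-fix formula).
 * tsc_khz must be nonzero. */
static int cyc2ns_product_overflows(uint64_t cycles, uint64_t tsc_khz)
{
    const uint64_t scale = (UINT64_C(1000000) << CYC2NS_SHIFT) / tsc_khz;
    return scale != 0 && cycles > UINT64_MAX / scale;
}
```

For example, at tsc_khz = 2000000 (2 GHz) the scale is 512, and cyc2ns_product_overflows() reports an overflow for a TSC value equivalent to 210 days of uptime but not for 200 days; cyc2ns_overflow_days() comes out at about 208.5.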
> >
> > (*) Intel Xeon E5 processors show the same broken behavior due to
> > the erratum "TSC is Not Affected by Warm Reset" (Intel® Xeon®
> > Processor E5 Family Specification Update - August 2013): "The
> > TSC (Time Stamp Counter MSR 10H) should be cleared on
> > reset. Due to this erratum the TSC is not affected by warm
> > reset."
> >
> > Cc: stable@vger.kernel.org
> > Cc: Will Auld
> > Cc: Marcelo Tosatti
> > Signed-off-by: Fernando Luis Vazquez Cao
>
> I agree that the bug is in QEMU. One small nit in your patch is that
> you should reset env->tsc_adjust and env->tsc in x86_cpu_reset. This
> would already be pretty good.
>
> However, a bigger problem is that env->tsc is a useless duplicate of
> "cpu_get_ticks() + env->tsc_adjust". It would be nice to drop env->tsc
> completely except for migration backwards compatibility. Thus you can:
>
> - fill in env->tsc as mentioned above from target-i386/machine.c's
> cpu_pre_save function. This guarantees backwards compatibility.
>
> - add a function cpu_set_ticks(int64_t ticks) to cpus.c. The function
> does nothing if use_icount is true, otherwise it needs to have (roughly)
> the opposite logic compared to cpu_get_ticks. You then call this
> function from x86_cpu_reset instead of setting env->tsc. You can
> similarly call this function from kvm_get_msrs.
>
> - add a function kvm_set_ticks(int64_t ticks) to kvm-all.c and
> kvm-stub.c. For kvm-all.c it calls kvm_arch_set_ticks(CPUState *cpu,
> int64_t ticks) in target-*/kvm.c. The kvm_arch_set_tsc() function has a
> dummy implementation for all architectures except x86. For x86 it calls
> KVM_SET_MSRS passing "ticks + env->tsc_offset".
>
> - call kvm_set_ticks() from cpu_set_ticks() and cpu_enable_ticks()

env->tsc is just a placeholder for the vCPU TSC. From QEMU's point of
view, a vCPU's TSC is a register that is initialized to zero, has to be
read from and written to KVM, and has to be migrated.

I'm not sure what the point of your idea is.
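
For comparison, the scheme proposed above can be modeled as follows. This is a sketch that assumes, as described in the quoted mail, that guest TSC = cpu_get_ticks() + env->tsc_adjust, and that cpu_get_ticks() is a host counter plus an offset while ticks are enabled. TicksState, ticks_get(), ticks_set() and guest_tsc() are hypothetical stand-ins for the cpus.c logic and for the proposed cpu_set_ticks(), not actual QEMU code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a tick source: a host counter plus an offset. */
typedef struct {
    bool use_icount;      /* icount mode: setting ticks is a no-op */
    bool enabled;         /* ticks advance only while enabled */
    int64_t host_ticks;   /* stand-in for the host's real tick counter */
    int64_t offset;       /* enabled: ticks = host_ticks + offset;
                             disabled: offset holds the frozen tick value */
} TicksState;

/* Read side, in the spirit of cpu_get_ticks(). */
static int64_t ticks_get(const TicksState *s)
{
    return s->enabled ? s->host_ticks + s->offset : s->offset;
}

/* Write side, the "opposite logic": pick the offset so that an immediate
 * ticks_get() returns the requested value. Does nothing in icount mode. */
static void ticks_set(TicksState *s, int64_t ticks)
{
    if (s->use_icount) {
        return;
    }
    s->offset = s->enabled ? ticks - s->host_ticks : ticks;
}

/* Guest TSC derived on demand instead of being stored in env->tsc. */
static int64_t guest_tsc(const TicksState *s, int64_t tsc_adjust)
{
    return ticks_get(s) + tsc_adjust;
}
```

On reset, calling ticks_set(&s, 0) with tsc_adjust = 0 makes guest_tsc() read 0, after which it advances with the host counter; in icount mode the call leaves the offset untouched.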
>
> Can you do this?
>
> Thanks,
>
> Paolo
>
> > ---
> >
> > --- qemu-orig/target-i386/kvm.c 2013-11-28 07:02:45.000000000 +0900
> > +++ qemu/target-i386/kvm.c 2013-12-05 14:47:03.085738175 +0900
> > @@ -1125,6 +1125,8 @@ static int kvm_put_msrs(X86CPU *cpu, int
> >          kvm_msr_entry_set(&msrs[n++], MSR_VM_HSAVE_PA, env->vm_hsave);
> >      }
> >      if (has_msr_tsc_adjust) {
> > +        if (level == KVM_PUT_RESET_STATE)
> > +            env->tsc_adjust = 0;
> >          kvm_msr_entry_set(&msrs[n++], MSR_TSC_ADJUST, env->tsc_adjust);
> >      }
> >      if (has_msr_misc_enable) {
> > @@ -1139,22 +1141,22 @@ static int kvm_put_msrs(X86CPU *cpu, int
> >          kvm_msr_entry_set(&msrs[n++], MSR_LSTAR, env->lstar);
> >      }
> >  #endif
> > -    if (level == KVM_PUT_FULL_STATE) {
> > +    /*
> > +     * The following MSRs have side effects on the guest or are too heavy
> > +     * for normal writeback. Limit them to reset or full state updates.
> > +     */
> > +    if (level >= KVM_PUT_RESET_STATE) {
> > +        if (level == KVM_PUT_RESET_STATE)
> > +            env->tsc = 0;
> >          /*
> >           * KVM is yet unable to synchronize TSC values of multiple VCPUs on
> >           * writeback. Until this is fixed, we only write the offset to SMP
> >           * guests after migration, desynchronizing the VCPUs, but avoiding
> >           * huge jump-backs that would occur without any writeback at all.
> >           */
> > -        if (smp_cpus == 1 || env->tsc != 0) {
> > +        if (smp_cpus == 1 || env->tsc != 0 || level == KVM_PUT_RESET_STATE) {
> >              kvm_msr_entry_set(&msrs[n++], MSR_IA32_TSC, env->tsc);
> >          }
> > -    }
> > -    /*
> > -     * The following MSRs have side effects on the guest or are too heavy
> > -     * for normal writeback. Limit them to reset or full state updates.
> > -     */
> > -    if (level >= KVM_PUT_RESET_STATE) {
> >          kvm_msr_entry_set(&msrs[n++], MSR_KVM_SYSTEM_TIME,
> >                            env->system_time_msr);
> >          kvm_msr_entry_set(&msrs[n++], MSR_KVM_WALL_CLOCK, env->wall_clock_msr);
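
The effect of the patch on the MSR_IA32_TSC write in kvm_put_msrs() can be modeled as a pure function of the put level. This is a sketch: the PUT_* enum is a hypothetical stand-in that only mirrors the ordering KVM_PUT_RUNTIME_STATE < KVM_PUT_RESET_STATE < KVM_PUT_FULL_STATE, struct tsc_regs stands in for the two CPUX86State fields involved, and has_msr_tsc_adjust is assumed true.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the KVM_PUT_* levels; only the ordering
 * runtime < reset < full matters for the comparisons below. */
enum put_level { PUT_RUNTIME = 1, PUT_RESET = 2, PUT_FULL = 3 };

/* Stand-in for env->tsc and env->tsc_adjust. */
struct tsc_regs {
    uint64_t tsc;
    uint64_t tsc_adjust;
};

/* Pre-patch rule: the TSC is written only on a full-state put, and on SMP
 * guests only when env->tsc is nonzero (i.e. after migration). */
static bool original_writes_tsc(enum put_level level, int smp_cpus,
                                const struct tsc_regs *r)
{
    return level == PUT_FULL && (smp_cpus == 1 || r->tsc != 0);
}

/* Patched rule: a reset zeroes both fields and always writes the TSC;
 * a full-state put keeps the old SMP condition. Assumes the MSR
 * TSC_ADJUST is supported (has_msr_tsc_adjust true). */
static bool patched_writes_tsc(enum put_level level, int smp_cpus,
                               struct tsc_regs *r)
{
    if (level == PUT_RESET) {
        r->tsc_adjust = 0;
    }
    if (level >= PUT_RESET) {
        if (level == PUT_RESET) {
            r->tsc = 0;
        }
        return smp_cpus == 1 || r->tsc != 0 || level == PUT_RESET;
    }
    return false;  /* runtime writeback never touches the TSC */
}
```

On a 4-VCPU guest, a reset-level put with a stale TSC is skipped by the original rule; with the patch it zeroes both fields and writes MSR_IA32_TSC. A full-state put with env->tsc == 0 is still skipped, so the SMP jump-back workaround described in the comment is unchanged.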