From mboxrd@z Thu Jan  1 00:00:00 1970
From: Keir Fraser
Subject: Re: Xen 4 TSC problems
Date: Thu, 24 Feb 2011 07:16:05 +0000
References: <4D655A39.2040501@gmail.com>
In-Reply-To: <4D655A39.2040501@gmail.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="ISO-8859-1"
Sender: xen-devel-bounces@lists.xensource.com
Errors-To: xen-devel-bounces@lists.xensource.com
To: Olivier Hanesse
Cc: Dan Magenheimer, xen-devel@lists.xensource.com, Xen Users,
 Jeremy Fitzhardinge, Mark Adams
List-Id: xen-devel@lists.xenproject.org

Please send the Xen boot output (xm dmesg). Getting it from Xen 3.2 as
well would be interesting, if you still have it installed on any of
these machines.

 -- Keir

On 23/02/2011 19:04, "Olivier Hanesse" wrote:

> I am sorry for the lack of information.
> Every domU on the dom0 is affected by this bug at exactly the same time.
>
> And I have had this bug on a dozen servers (all running on the same hw)
> since October (when I switched from Xen 3.2 to 4.0).
>
> Regards
>
> Olivier
>
> On 23/02/2011 18:19, Keir Fraser wrote:
>> On 23/02/2011 16:16, "Dan Magenheimer" wrote:
>>
>>> It's very unlikely this is a problem with TSC. It is most likely a Xen (or
>>> possibly a PV Linux) problem where a guest (or dom0) either "goes out to
>>> lunch" for a long period, or some other timer gets stuck. The "clocksource
>>> tsc unstable" message is a side effect of this... it's very likely the TSC
>>> that IS stable and correct and the other clocksource (pvclock) has
>>> lost/gained 50 minutes!
>>>
>>> Mark Adams is cc'ed and his original xen-devel posting is referenced below.
>>> The fact that two different users (possibly on the same processor/system
>>> type?) have submitted the message with a delta so similar would lead me to
>>> believe there is some timer that is "wrapping".
>>> And since pvclock is usually the clocksource for dom0, and pvclock is
>>> driven by Xen's "system time", a reasonable guess is that the timer that
>>> is wrapping is in Xen itself.
>>>
>>> Mark's delta = -2999660303788 ns
>>> Your delta = -2999660334211 ns
>>>
>>> Googling, I see the HPET wraparound is ~306 seconds and this delta is about
>>> 3000 seconds, so that may be a bad guess.
>>>
>>> Keir, any thoughts on this? Do you recall any post-4.0 patches that may
>>> have fixed this?
>> I've never seen a 3000s wrap, and I don't know of anything that would have
>> fixed a bug like this. If this is a Xen time wrap of some kind then it would
>> affect all running guests; it's not clear here whether only one, or all,
>> guests see the wrap.
>>
>> K.
>>
>>> Thanks,
>>> Dan
>>>
>>> References:
>>> http://lists.xensource.com/archives/html/xen-devel/2010-10/msg00210.html
>>> https://lkml.org/lkml/2010/10/26/126
>>>
>>>
>>> From: Olivier Hanesse [mailto:olivier.hanesse@gmail.com]
>>> Sent: Wednesday, February 23, 2011 3:50 AM
>>> To: xen-devel@lists.xensource.com; Xen Users
>>> Subject: [Xen-devel] Xen 4 TSC problems
>>>
>>> Hello
>>>
>>> I've got an issue with timekeeping on Xen 4.0 (Debian squeeze release).
>>>
>>> My problem is here (hopefully I am not the only one, so there might be a bug
>>> somewhere): http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=599161#50
>>>
>>> After some time, I got this error: Clocksource tsc unstable (delta =
>>> -2999660334211 ns). It has happened on several servers.
>>>
>>> Looking at the output of "xm debug-key s;"
>>>
>>> (XEN) TSC has constant rate, deep Cstates possible, so not reliable,
>>> warp=2850 (count=3)
>>>
>>> I am using an "Intel(R) Xeon(R) CPU L5420 @ 2.50GHz", which has the
>>> "constant_tsc" flag, but not the "nonstop_tsc" one.
>>>
>>> On other systems with a newer CPU with "nonstop_tsc", I don't have this
>>> issue (the systems are running the same distros with the same config).
>>>
>>> I tried to boot with "max_cstate=0", but nothing changed: my TSC still
>>> isn't reliable, and after some time I get the "50min" issue again.
>>>
>>> I don't understand how a system can jump "50min" into the future. Why
>>> 50min? It is not 40min, not 1 hour, it is always 50min.
>>>
>>> I don't know how to make my TSC "reliable" (I have already disabled
>>> everything about Powerstate in the BIOS settings).
>>>
>>> Any ideas?
>>>
>>> Regards
>>>
>>> Olivier
>>>
>>
>
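
As a quick check of the figures quoted in the thread, here is a minimal
Python sketch that converts the two reported deltas to minutes and compares
them with the ~306 second HPET wraparound Dan mentions. The constant and
function names are illustrative only and do not come from Xen or Linux; the
306 s value is the approximate figure from the thread, not a measured one.

```python
# Deltas reported in the thread, in nanoseconds.
MARK_DELTA_NS = -2999660303788
OLIVIER_DELTA_NS = -2999660334211

# Approximate HPET wraparound quoted in the thread (assumption: taken as-is).
HPET_WRAP_S = 306

NS_PER_S = 1_000_000_000


def delta_seconds(delta_ns):
    """Return the magnitude of a clocksource delta in seconds."""
    return abs(delta_ns) / NS_PER_S


def summarize():
    """Compare the two reported deltas and relate them to the HPET figure."""
    mark_s = delta_seconds(MARK_DELTA_NS)
    olivier_s = delta_seconds(OLIVIER_DELTA_NS)
    return {
        "mark_minutes": mark_s / 60,
        "olivier_minutes": olivier_s / 60,
        # How far apart the two independent reports are, in microseconds.
        "difference_us": abs(MARK_DELTA_NS - OLIVIER_DELTA_NS) / 1000,
        # How many quoted HPET wrap periods fit into one delta.
        "ratio_to_hpet_wrap": olivier_s / HPET_WRAP_S,
    }


if __name__ == "__main__":
    for name, value in summarize().items():
        print(f"{name}: {value:.3f}")
```

Both deltas come out just under 50 minutes and differ from each other by
only about 30 microseconds, which is consistent with the "always 50min"
observation, while the ratio to the quoted HPET figure is not a whole
number, in line with Dan's remark that the HPET guess may be a bad one.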