From: Keir Fraser
Subject: Re: [Xen-devel] Xen 4 TSC problems
Date: Mon, 28 Feb 2011 15:39:48 +0000
To: Olivier Hanesse, Dan Magenheimer
Cc: Jeremy Fitzhardinge, xen-devel@lists.xensource.com, Keir Fraser, Jan Beulich, Xen Users, Mark Adams
List-Id: xen-devel@lists.xenproject.org

On 28/02/2011 15:23, "Olivier Hanesse" wrote:

> Keir:
>
> Yes, it is "in progress".
> To make this change, I had to reboot every server, so it is taking time
> (production servers :()
> So I was hoping to find a quick method to mitigate this issue on domUs while
> rebooting servers.
>
> As this bug has happened once or twice per server since October, I can't say
> right now that changing the platform timer to PIT fixed it. I have to wait (I hope
> forever!) for this bug to happen again on a 'patched' server ...
>
> But even with clocksource=pit, I am seeing some warp=3000+ in debug messages?
> I guess it is not a good sign, is it?

Better not to have it, but honestly you're very unlikely to see any problem
from it. It's totally unrelated to the 3000-second time jumps.

 -- Keir

> Jan: I was hoping to find a way to make the domU clocksource more
> "independent", like with Xen 3.2.
>
>
> 2011/2/28 Dan Magenheimer
>> Hi Olivier,
>>
>> It is the Xen clocksource that you want to try to change, not the dom0
>> clocksource. To do this, you need to specify "clocksource=pit" on the Xen
>> boot line (and reboot), not the dom0 boot line.
>>
>> I believe Mark Adams played with tsc_mode to see if it solved his (similar?
>> identical?) problem last year, and it didn't make any difference.
>>
>> Please try booting Xen with "clocksource=pit" and ensure that "Platform timer
>> is 1.193MHz PIT" appears in the Xen boot messages. If the 50-minute jump does not
>> appear again, it would point to a problem in the HPET, either hardware or
>> software.
>>
>> Thanks,
>> Dan
>>
>> From: Olivier Hanesse [mailto:olivier.hanesse@gmail.com]
>> Sent: Monday, February 28, 2011 7:37 AM
>> To: Jeremy Fitzhardinge
>> Cc: Dan Magenheimer; Keir Fraser; Jan Beulich; Mark Adams;
>> xen-devel@lists.xensource.com; Xen Users; Keir Fraser
>> Subject: Re: [Xen-devel] Xen 4 TSC problems
>>
>> Hello,
>>
>> It happened again twice this weekend.
>>
>> What about setting "tsc_mode=2" for my VMs? Should this mode prevent this
>> bug (coming from a badly emulated TSC due to a firmware issue? is that possible?)
>> from affecting time in domUs?
>>
>> Setting clocksource=pit makes 'tsc' available in
>> "/sys/devices/system/clocksource/clocksource0/available_clocksource"
>> (otherwise only xen is available; is that normal?).
>>
>> Should I bypass the xen clocksource and use tsc as the clocksource for dom0/domU,
>> or will it be worse?
>>
>> Regards,
>>
>> Olivier
>>
>> 2011/2/24 Jeremy Fitzhardinge
>>
>> On 02/24/2011 09:43 AM, Dan Magenheimer wrote:
>>> Just a wild guess, but this in Olivier's posted output:
>>>
>>> (XEN) Platform timer appears to have unexpectedly wrapped 10 or more times.
>>>
>>> and the fact that a 32-bit HPET wrap is ~300 seconds and, with the
>>> "10 or more times", 10 * 300 seconds is 3000 seconds, might be a clue
>>> (or a complete red herring, but I thought it worth mentioning).
>>>
>>> Mark and Olivier, it would be interesting to know if you are
>>> using the same processor/system.
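Dan's estimate in the quoted paragraph above can be checked with simple arithmetic: a free-running counter that is b bits wide and ticks at f Hz wraps every 2^b / f seconds. The minimal Python sketch below assumes a 14.31818 MHz HPET, a common rate on PC chipsets but an assumption here; the real rate depends on the hardware. The function name `wrap_period_seconds` is hypothetical.

```python
# Assumption: 14.31818 MHz HPET tick rate (common on PC chipsets, but
# hardware-dependent). The 32-bit width matches Dan's "32-bit HPET wrap".
HPET_HZ = 14_318_180
COUNTER_BITS = 32


def wrap_period_seconds(hz: float = HPET_HZ, bits: int = COUNTER_BITS) -> float:
    """Seconds for a free-running counter of `bits` width to wrap once."""
    return (2 ** bits) / hz


if __name__ == "__main__":
    period = wrap_period_seconds()
    print(f"one wrap:  {period:.1f} s")
    print(f"ten wraps: {10 * period:.1f} s (~{10 * period / 60:.0f} min)")
```

With these assumptions one wrap comes to roughly 300 s, and ten missed wraps come to roughly 3000 s (about 50 minutes). That lines up with both the "10 or more times" warning and the 50-minute jumps discussed in this thread, although it does not by itself prove the HPET is the cause.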
>> It definitely seems like some kind of problem on the host system rather
>> than anything in the guests themselves. If the platform timer is
>> misbehaving, then Xen could be completely screwing up the pvclock
>> calibration, which it then passes to guests.
>>
>> Could it be one of those "platform clock stops in certain power states"
>> problems?
>>
>>
>>     J
>>
>>>> -----Original Message-----
>>>> From: Keir Fraser [mailto:keir.xen@gmail.com]
>>>> Sent: Thursday, February 24, 2011 7:52 AM
>>>> To: Olivier Hanesse; Jan Beulich
>>>> Cc: Mark Adams; Jeremy Fitzhardinge; xen-devel@lists.xensource.com; Xen
>>>> Users; Dan Magenheimer; Keir Fraser
>>>> Subject: Re: [Xen-devel] Xen 4 TSC problems
>>>>
>>>> On 24/02/2011 14:20, "Olivier Hanesse" wrote:
>>>>
>>>>> Both dom0 and domUs are affected by this "jump".
>>>>>
>>>>> I expect to see something like "TSC marked as reliable, warp = 0".
>>>>> I got this on newer hardware with the same config/distros.
>>>> It depends on the CPU itself; older CPUs do not have the super-stable
>>>> TSC features. But that should never cause a massive 3000s time jump.
>>>>
>>>>> Is there a way to measure whether it is a TSC warp, to point out a CPU
>>>>> TSC issue?
>>>>
>>>> The TSC warps or out-of-sync issues that we could reasonably expect would
>>>> be on the order of microseconds. A 3000s warp is something else entirely.
>>>> Xen is very confused and/or some TSC or platform timer has jumped a long
>>>> way (indicating a hardware/firmware issue).
>>>>
>>>>  -- Keir
>>>>
>>>>> 2011/2/24 Jan Beulich
>>
>>>>>>>>> On 24.02.11 at 12:57, Olivier Hanesse wrote:
>>>>>>> I tried to turn off C-states with max_cstate=0 without success
>>>>>>> (still "not reliable").
>>>>>>>
>>>>>>> With cpuidle=0, I also got:
>>>>>>>
>>>>>>> (XEN) TSC has constant rate, deep Cstates possible, so not reliable, warp=3022 (count=1)
>>>>>> This message by itself isn't telling much, I believe.
>>>>>>
>>>>>>> xm info | grep command
>>>>>>> xen_commandline        : dom0_mem=512M cpuidle=0 loglvl=all guest_loglvl=all dom0_max_vcpus=1 dom0_vcpus_pin console=vga,com1 com1=19200,8n1
>>>>>>>
>>>>>>> Keir:
>>>>>>>
>>>>>>> Using clocksource=pit:
>>>>>>>
>>>>>>> (XEN) Platform timer is 1.193MHz PIT
>>>>>>>
>>>>>>> I also got:
>>>>>>>
>>>>>>> (XEN) TSC has constant rate, deep Cstates possible, so not reliable, warp=3262 (count=2)
>>>>>> The question is whether any of this eliminates the time jumps seen
>>>>>> by your DomU-s (from your past mails I wasn't actually sure whether
>>>>>> Dom0 also experienced this problem, albeit it would be odd if it didn't).
>>>>>>
>>>>>> Jan
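Dan's suggestion to confirm that "Platform timer is 1.193MHz PIT" appears in the Xen boot messages can be automated by scanning the hypervisor console output, for example a copy saved from `xm dmesg`. The same scan can also pick up the warp= lines and the "unexpectedly wrapped" warning quoted in this thread. The minimal Python sketch below assumes the message wording shown above, and other Xen releases may phrase these lines differently. The function name `scan_xen_log` is hypothetical. As Keir notes, a nonzero warp on its own is not evidence of the 3000-second jumps.

```python
import re

# Patterns modelled on the Xen console lines quoted in this thread
# (assumption: other Xen releases may word these messages differently).
PLATFORM_RE = re.compile(r"Platform timer is (\S+) (\S+)")
WARP_RE = re.compile(r"warp=(\d+) \(count=(\d+)\)")
WRAP_WARNING = "Platform timer appears to have unexpectedly wrapped"


def scan_xen_log(text):
    """Summarise platform-timer and TSC-warp messages in Xen console output."""
    platform_timer = None
    warps = []
    wrap_warnings = 0
    for line in text.splitlines():
        m = PLATFORM_RE.search(line)
        if m:
            # e.g. ("1.193MHz", "PIT"); if several lines match, the last wins
            platform_timer = (m.group(1), m.group(2))
        m = WARP_RE.search(line)
        if m:
            warps.append((int(m.group(1)), int(m.group(2))))
        if WRAP_WARNING in line:
            wrap_warnings += 1
    return {"platform_timer": platform_timer, "warps": warps,
            "wrap_warnings": wrap_warnings}


if __name__ == "__main__":
    sample = (
        "(XEN) Platform timer is 1.193MHz PIT\n"
        "(XEN) TSC has constant rate, deep Cstates possible, so not reliable, "
        "warp=3262 (count=2)\n"
    )
    print(scan_xen_log(sample))
```

Passing in the text of a saved `xm dmesg` log shows at a glance which platform timer Xen selected and whether any wrap warnings were logged after the reboot.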