From mboxrd@z Thu Jan 1 00:00:00 1970
From: Dan Magenheimer
Subject: RE: [Xen-devel] Xen 4 TSC problems
Date: Mon, 28 Feb 2011 07:30:30 -0800 (PST)
Message-ID:
References: <68c41dd8-9195-41b0-83d7-9242b8eff809@default> <49766a9d-0e37-46d2-9497-33c2e981a871@default AANLkTikptc2POrKQgJuoVZRdwJTo64DJ_hm12KuPky4D@mail.gmail.com>
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="===============1891310359=="
Return-path:
In-Reply-To:
List-Unsubscribe:
List-Post:
List-Help:
List-Subscribe:
Sender: xen-users-bounces@lists.xensource.com
Errors-To: xen-users-bounces@lists.xensource.com
To: Olivier Hanesse
Cc: Jeremy Fitzhardinge, xen-devel@lists.xensource.com, Keir Fraser, Jan Beulich, Keir Fraser, Xen Users, Mark Adams
List-Id: xen-devel@lists.xenproject.org

--===============1891310359==
Content-Type: multipart/alternative; boundary="__129890705141018905abhmt020"

--__129890705141018905abhmt020
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: quoted-printable

Hi Olivier -

By "warp=3000+ in debug message" do you mean the Xen boot message "TSC has
constant rate..., warp = NNNN"?

If so, this is a very different "warp", measured in cycles, not in seconds,
so 3000 is more like a microsecond, not an hour, and this is normal (not a
bad sign).

Dan

From: Olivier Hanesse [mailto:olivier.hanesse@gmail.com]
Sent: Monday, February 28, 2011 8:23 AM
To: Dan Magenheimer
Cc: Jeremy Fitzhardinge; Keir Fraser; Jan Beulich; Mark Adams; xen-devel@lists.xensource.com; Xen Users; Keir Fraser
Subject: Re: [Xen-devel] Xen 4 TSC problems

Keir:

Yes, it is "in progress".
To make this change, I have to reboot every server, so it is taking time
(production servers :()
So I was hoping to find a quick method to mitigate this issue on domUs
while rebooting servers.

As this bug has happened only once or twice per server since October, I
can't say right now that changing the platform timer to PIT fixed it. I
have to wait (I hope forever!) for this bug to happen again on a 'patched'
server ...

But even with clocksource=pit, I am seeing some warp=3000+ in debug
message? I guess it is not a good sign, is it?

Jan: I was hoping to find a way to make the domU clocksource more
"independent", like with xen3.2.


2011/2/28 Dan Magenheimer <dan.magenheimer@oracle.com>

Hi Olivier -

It is the Xen clocksource that you want to try to change, not the dom0
clocksource.  To do this, you need to specify "clocksource=pit" on the Xen
boot line (and reboot), not the dom0 boot line.

I believe Mark Adams played with tsc_mode to see if it solved his (similar?
identical?) problem last year, and it didn't make any difference.

Please try booting Xen with "clocksource=pit" and ensure that "Platform
timer is 1.19MHz PIT" appears in the Xen boot messages.  If the 50min jump
does not appear again, it would point to a problem in the hpet, either
hardware or software.

Thanks,
Dan

From: Olivier Hanesse [mailto:olivier.hanesse@gmail.com]
Sent: Monday, February 28, 2011 7:37 AM
To: Jeremy Fitzhardinge
Cc: Dan Magenheimer; Keir Fraser; Jan Beulich; Mark Adams; xen-devel@lists.xensource.com; Xen Users; Keir Fraser
Subject: Re: [Xen-devel] Xen 4 TSC problems

Hello,

It happened again twice this weekend.

What about setting "tsc_mode=2" for my VMs? Should this mode prevent this
bug (coming from a badly emulated tsc due to a firmware issue? is that
possible?) from affecting time in domUs?

Setting clocksource=pit makes 'tsc' available in
"/sys/devices/system/clocksource/clocksource0/available_clocksource"
(otherwise only xen is available, is that normal?).

Should I bypass the xen clocksource and use tsc as a clocksource for
dom0/domU? Or will it be worse?

Regards

Olivier


2011/2/24 Jeremy Fitzhardinge <jeremy@goop.org>

On 02/24/2011 09:43 AM, Dan Magenheimer wrote:
> Just a wild guess, but this in Olivier's posted output:
>
> (XEN) Platform timer appears to have unexpectedly wrapped 10 or more times.
>
> and the fact that a 32-bit HPET wrap is ~300 seconds and, with the
> "10 or more times", 10 * 300 seconds is 3000 seconds, might be a clue
> (or a complete red herring, but I thought it worth mentioning).
>
> Mark and Olivier, it would be interesting to know if you are
> using the same processor/system.

It definitely seems like some kind of problem on the host system rather
than anything in the guests themselves.  If the platform timer is
misbehaving, then Xen could be completely screwing up the pvclock
calibration which it then passes to guests.

Could it be one of those "platform clock stops in certain power states"
problems?

   J

>> -----Original Message-----
>> From: Keir Fraser [mailto:keir.xen@gmail.com]
>> Sent: Thursday, February 24, 2011 7:52 AM
>> To: Olivier Hanesse; Jan Beulich
>> Cc: Mark Adams; Jeremy Fitzhardinge; xen-devel@lists.xensource.com; Xen
>> Users; Dan Magenheimer; Keir Fraser
>> Subject: Re: [Xen-devel] Xen 4 TSC problems
>>
>> On 24/02/2011 14:20, "Olivier Hanesse" <olivier.hanesse@gmail.com>
>> wrote:
>>
>>> Both dom0 and domUs are affected by this "jump".
>>>
>>> I expect to see something like "TSC marked as reliable, warp = 0".
>>> I got this on newer hardware with same config/distros.
>> It depends on the CPU itself, older CPUs do not have the super-stable
>> TSC features. But that should never cause a massive 3000s time jump.
>>
>>> Is there a way to measure if it is a TSC warp ? to point out a cpu
>>> tsc issue ?
>>
>> The TSC warps or out-of-sync issues that we could reasonably expect
>> would be on the order of microseconds. A 3000s warp is something else
>> entirely. Xen is very confused and/or some TSC or platform timer has
>> jumped a long way (indicating a hardware/firmware issue).
>>
>>  -- Keir
>>
>>> 2011/2/24 Jan Beulich <JBeulich@novell.com>
>>>>>>> On 24.02.11 at 12:57, Olivier Hanesse <olivier.hanesse@gmail.com> wrote:
>>>>> I tried to turn off cstates with max_cstate=0 without success
>>>>> (still "not reliable").
>>>>>
>>>>> With cpuidle=0, I also got :
>>>>>
>>>>> (XEN) TSC has constant rate, deep Cstates possible, so not reliable,
>>>>> warp=3022 (count=1)
>>>> This message by itself isn't telling much I believe.
>>>>
>>>>> xm info | grep command
>>>>> xen_commandline : dom0_mem=512M cpuidle=0 loglvl=all guest_loglvl=all
>>>>> dom0_max_vcpus=1 dom0_vcpus_pin console=vga,com1 com1=19200,8n1
>>>>>
>>>>> Keir :
>>>>>
>>>>> Using clocksource=pit :
>>>>>
>>>>> (XEN) Platform timer is 1.193MHz PIT
>>>>>
>>>>> I also got :
>>>>>
>>>>> (XEN) TSC has constant rate, deep Cstates possible, so not reliable,
>>>>> warp=3262 (count=2)
>>>> The question is whether any of this eliminates the time jumps seen
>>>> by your DomU-s (from your past mails I wasn't actually sure whether
>>>> Dom0 also experienced this problem, albeit it would be odd if it
>>>> didn't).
>>>> Jan
>>>>
>>>> Jan
>>>>
>>>
>>

--__129890705141018905abhmt020
Content-Type: text/html; charset=us-ascii
Content-Transfer-Encoding: quoted-printable

--__129890705141018905abhmt020--

--===============1891310359==
Content-Type: text/plain; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Content-Disposition: inline

_______________________________________________
Xen-users mailing list
Xen-users@lists.xensource.com
http://lists.xensource.com/xen-users

--===============1891310359==--
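
Two figures in this thread can be sanity-checked with simple arithmetic: the "warp" of roughly 3000 that Xen reports in TSC cycles, and the "32-bit HPET wrap is ~300 seconds" estimate that gives about 3000 seconds for ten wraps. The sketch below is only an illustration and is not from any poster. The TSC rate (2.5 GHz) is a made-up value, since the thread never states the CPU frequency, and the HPET rate (14.31818 MHz) is a commonly seen value that is assumed here rather than read from the affected hosts.

```python
# Rough arithmetic check of two numbers from the thread.
# Both frequencies below are ASSUMPTIONS for illustration only;
# neither is stated anywhere in the thread.
ASSUMED_TSC_HZ = 2.5e9          # hypothetical CPU TSC rate
ASSUMED_HPET_HZ = 14_318_180    # commonly seen HPET rate, assumed here


def cycles_to_seconds(cycles, tsc_hz=ASSUMED_TSC_HZ):
    """Convert a TSC cycle count (e.g. the 'warp' value) to seconds."""
    return cycles / tsc_hz


def counter_wrap_seconds(bits, freq_hz):
    """Time for a free-running counter of the given width to wrap once."""
    return (2 ** bits) / freq_hz


# "warp=3000+" is a cycle count, so it corresponds to microseconds at most.
warp_seconds = cycles_to_seconds(3000)

# A 32-bit HPET counter wraps after 2**32 ticks.
wrap_seconds = counter_wrap_seconds(32, ASSUMED_HPET_HZ)

# "wrapped 10 or more times": ten wraps expressed in minutes.
ten_wraps_minutes = 10 * wrap_seconds / 60

print(f"3000 cycles at the assumed TSC rate: {warp_seconds * 1e6:.2f} us")
print(f"one 32-bit HPET wrap at the assumed rate: {wrap_seconds:.1f} s")
print(f"ten wraps: {ten_wraps_minutes:.1f} min")
```

Under these assumptions the boot-time warp is on the order of a microsecond, while ten platform-timer wraps come to roughly 50 minutes, which matches the size of the reported jump. A different TSC or HPET frequency changes the exact numbers but not the orders of magnitude.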