From mboxrd@z Thu Jan 1 00:00:00 1970
From: Olivier Hanesse
Subject: Re: [Xen-devel] Xen 4 TSC problems
Date: Mon, 28 Feb 2011 16:23:07 +0100
References: <68c41dd8-9195-41b0-83d7-9242b8eff809@default> <49766a9d-0e37-46d2-9497-33c2e981a871@default>
In-Reply-To: <49766a9d-0e37-46d2-9497-33c2e981a871@default>
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="===============1988755835=="
Sender: xen-users-bounces@lists.xensource.com
Errors-To: xen-users-bounces@lists.xensource.com
To: Dan Magenheimer
Cc: Jeremy Fitzhardinge, xen-devel@lists.xensource.com, Keir Fraser, Jan Beulich, Keir Fraser, Xen Users, Mark Adams
List-Id: xen-devel@lists.xenproject.org

--===============1988755835==
Content-Type: multipart/alternative; boundary=0015175cb0c6bbce5f049d59411e

--0015175cb0c6bbce5f049d59411e
Content-Type: text/plain; charset=windows-1252
Content-Transfer-Encoding: quoted-printable

Keir:

Yes, it is in progress. To make this change I have to reboot every
server, so it is taking time (production servers :(). So I was hoping to
find a quick way to mitigate this issue on the domUs while the servers are
being rebooted.

As this bug has happened once or twice per server since October, I can't
say right now that changing the platform timer to PIT fixed it. I have to
wait (I hope forever!) for this bug to happen again on a 'patched'
server...

But even with clocksource=pit, I am still seeing warp=3000+ in the debug
messages. I guess that is not a good sign, is it?

Jan: I was hoping to find a way to make the domU clocksource more
"independent", like with Xen 3.2.

2011/2/28 Dan Magenheimer <dan.magenheimer@oracle.com>

> Hi Olivier -
>
> It is the Xen clocksource that you want to try to change, not the dom0
> clocksource. To do this, you need to specify "clocksource=pit" on the Xen
> boot line (and reboot), not the dom0 boot line.
>
> I believe Mark Adams played with tsc_mode to see if it solved his
> (similar? identical?) problem last year, and it didn't make any
> difference.
>
> Please try booting Xen with "clocksource=pit" and ensure that "Platform
> timer is 1.19MHz PIT" appears in the Xen boot messages. If the 50min jump
> does not appear again, it would point to a problem in the hpet, either
> hardware or software.
>
> Thanks,
> Dan
>
> *From:* Olivier Hanesse [mailto:olivier.hanesse@gmail.com]
> *Sent:* Monday, February 28, 2011 7:37 AM
> *To:* Jeremy Fitzhardinge
> *Cc:* Dan Magenheimer; Keir Fraser; Jan Beulich; Mark Adams;
> xen-devel@lists.xensource.com; Xen Users; Keir Fraser
> *Subject:* Re: [Xen-devel] Xen 4 TSC problems
>
> Hello,
>
> It happened again twice this weekend.
>
> What about setting "tsc_mode=2" for my VMs? Should this mode prevent this
> bug (coming from a badly emulated TSC due to a firmware issue? is that
> possible?) from affecting time in the domUs?
>
> Setting clocksource=pit makes 'tsc' available in
> "/sys/devices/system/clocksource/clocksource0/available_clocksource"
> (otherwise only xen is available; is that normal?).
>
> Should I bypass the xen clocksource and use tsc as a clocksource for
> dom0/domU? Or will it be worse?
>
> Regards
>
> Olivier
>
> 2011/2/24 Jeremy Fitzhardinge <jeremy@goop.org>
>
> On 02/24/2011 09:43 AM, Dan Magenheimer wrote:
> > Just a wild guess, but this in Olivier's posted output:
> >
> > (XEN) Platform timer appears to have unexpectedly wrapped 10 or more
> > times.
> >
> > and the fact that a 32-bit HPET wrap is ~300 seconds and, with the
> > "10 or more times", 10 * 300 seconds is 3000 seconds, might be a clue
> > (or a complete red herring, but I thought it worth mentioning).
> >
> > Mark and Olivier, it would be interesting to know if you are
> > using the same processor/system.
>
> It definitely seems like some kind of problem on the host system rather
> than anything in the guests themselves. If the platform timer is
> misbehaving, then Xen could be completely screwing up the pvclock
> calibration which it then passes to guests.
>
> Could it be one of those "platform clock stops in certain power states"
> problems?
>
>     J
>
> >> -----Original Message-----
> >> From: Keir Fraser [mailto:keir.xen@gmail.com]
> >> Sent: Thursday, February 24, 2011 7:52 AM
> >> To: Olivier Hanesse; Jan Beulich
> >> Cc: Mark Adams; Jeremy Fitzhardinge; xen-devel@lists.xensource.com;
> >> Xen Users; Dan Magenheimer; Keir Fraser
> >> Subject: Re: [Xen-devel] Xen 4 TSC problems
> >>
> >> On 24/02/2011 14:20, "Olivier Hanesse" <olivier.hanesse@gmail.com>
> >> wrote:
> >>
> >>> Both dom0 and domUs are affected by this "jump".
> >>>
> >>> I expect to see something like "TSC marked as reliable, warp = 0".
> >>> I got this on newer hardware with same config/distros.
> >> It depends on the CPU itself, older CPUs do not have the super-stable
> >> TSC features. But that should never cause a massive 3000s time jump.
> >>
> >>> Is there a way to measure if it is a TSC warp? To point out a CPU
> >>> TSC issue?
> >>
> >> The TSC warps or out-of-sync issues that we could reasonably expect
> >> would be on the order of microseconds. A 3000s warp is something else
> >> entirely. Xen is very confused and/or some TSC or platform timer has
> >> jumped a long way (indicating a hardware/firmware issue).
> >>
> >>  -- Keir
> >>
> >>> 2011/2/24 Jan Beulich <JBeulich@novell.com>
> >>>>>>> On 24.02.11 at 12:57, Olivier Hanesse <olivier.hanesse@gmail.com> wrote:
> >>>>> I tried to turn off cstates with max_cstate=0 without success
> >>>>> (still "not reliable").
> >>>>>
> >>>>> With cpuidle=0, I also got:
> >>>>>
> >>>>> (XEN) TSC has constant rate, deep Cstates possible, so not reliable,
> >>>>> warp=3022 (count=1)
> >>>> This message by itself isn't telling much I believe.
> >>>>
> >>>>> xm info | grep command
> >>>>> xen_commandline        : dom0_mem=512M cpuidle=0 loglvl=all
> >>>>> guest_loglvl=all dom0_max_vcpus=1 dom0_vcpus_pin console=vga,com1
> >>>>> com1=19200,8n1
> >>>>>
> >>>>> Keir:
> >>>>>
> >>>>> Using clocksource=pit:
> >>>>>
> >>>>> (XEN) Platform timer is 1.193MHz PIT
> >>>>>
> >>>>> I also got:
> >>>>>
> >>>>> (XEN) TSC has constant rate, deep Cstates possible, so not reliable,
> >>>>> warp=3262 (count=2)
> >>>> The question is whether any of this eliminates the time jumps seen
> >>>> by your DomU-s (from your past mails I wasn't actually sure whether
> >>>> Dom0 also experienced this problem, albeit it would be odd if it
> >>>> didn't).
> >>>> Jan
> >>>>
> >>>> Jan
> >>>>
> >>>
> >>
>

--0015175cb0c6bbce5f049d59411e--

--===============1988755835==
Content-Type: text/plain; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Content-Disposition: inline

_______________________________________________
Xen-users mailing list
Xen-users@lists.xensource.com
http://lists.xensource.com/xen-users

--===============1988755835==--
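
A note on the arithmetic behind Dan's guess in the thread above that "a 32-bit HPET wrap is ~300 seconds": a free-running counter of a given width wraps every 2^bits / frequency seconds. The sketch below works that out in Python. It assumes the common HPET rate of 14.318180 MHz, which is an assumption on my part; the thread does not state the rate of the affected machines, and the real value is reported by the hardware. The names `ASSUMED_HPET_HZ`, `wrap_period_seconds` and `missed_wraps_jump_seconds` are made up for illustration and do not come from Xen.

```python
# Illustrative arithmetic only; none of this is taken from the Xen sources.

# Assumption: a typical HPET main counter rate. The thread does not give
# the actual rate of the affected servers.
ASSUMED_HPET_HZ = 14_318_180
COUNTER_BITS = 32


def wrap_period_seconds(freq_hz: int, bits: int = COUNTER_BITS) -> float:
    """Seconds until a free-running counter of the given width wraps."""
    return (2 ** bits) / freq_hz


def missed_wraps_jump_seconds(wraps: int, freq_hz: int = ASSUMED_HPET_HZ) -> float:
    """Size of the time error if `wraps` counter wraps are miscounted."""
    return wraps * wrap_period_seconds(freq_hz)


if __name__ == "__main__":
    period = wrap_period_seconds(ASSUMED_HPET_HZ)
    jump = missed_wraps_jump_seconds(10)
    print(f"one 32-bit wrap: {period:.1f} s")
    print(f"ten wraps:       {jump:.0f} s (about {jump / 60:.0f} min)")
```

Under that assumed rate, one wrap comes out at roughly 300 s and ten wraps at roughly 3000 s, which is in line with the "50min jump" Dan mentions. This only shows that the numbers are consistent with his guess; it does not establish that miscounted HPET wraps are the cause.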