From mboxrd@z Thu Jan 1 00:00:00 1970 From: Carsten Schiers Subject: RE: Huge Time went backwards Date: Tue, 7 Apr 2009 13:24:03 +0200 Message-ID: <7062954.461239103443883.JavaMail.root@uhura> Mime-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: quoted-printable Return-path: Content-Disposition: inline List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Sender: xen-devel-bounces@lists.xensource.com Errors-To: xen-devel-bounces@lists.xensource.com To: "dan.magenheimer" , "Tim.Deegan" , xen-devel List-Id: xen-devel@lists.xenproject.org Hi, I have found another event from "Time want backwards" with my 3.3.1, so I= analyzed a bit deeper and systematically. Maybe there is something, you recognize.= Sorry=20 this post is so long. Short summary of system: Gigabyte GA-M56S-S3, AMD 4050e Dualcore, Debian E= tch as base for a self-compiled Xen 3.3.1 with 2.6.18.8 besides the Firewall. Do= m0 is 64bit, there is one 64bit DomU, all others are 32bit. The 64bit DomUs and= one 32bit DomU have 2 VCPUs, the others have one only. I use munin to monitor only a few values, two of them are cpu usage of do= mains and residency on the four available P-States. I noticed a peak at 06:25. I fo= und out that=20 it's because all Domains are running cron.daily, in addition munin-cron f= ires up (like=20 it does all five minutes).=20 I generally reduced "Time went backwards" messages by setting sample_rate= of the ondemand=20 govenor to 4sec, using newer Kernels and spreading crons, but this is all= cure on symtoms, I guess.=20 Today, at 06:25, there was another Time went Backwards in Dom0: Apr 7 06:25:01 data /USR/SBIN/CRON[24074]: (munin) CMD (if [ -x /usr/bin= /munin-cron... Apr 7 06:25:01 data /USR/SBIN/CRON[24075]: (root) CMD (if [ -x /etc/muni= n/plugins/apt_all... Apr 7 06:25:03 data kernel: Timer ISR/0: Time went backwards: delta=3D-3= 2282595 delta_cpu=3D43717405=20 shadow=3D375953403877760 off=3D31846903 proc= essed=3D375953468006000=20 cpu_processed=3D375953392006000 Apr 7 06:25:03 data kernel: 0: 375953392006000 Apr 7 06:25:03 data kernel: 1: 375953464006000 It's 32ms, nothing unusual. It's just after munin-cron, I guess the 2 sec= onds are needed before the graphic calculation starts. So we have a system sleeping well a= t 1,0GHz, that jumps to 2,1GHz. Right then, we have the message.=20 I checked with all other Domains. In terms of "Time went backwards", ther= e was only one other Domain, spock, the 32bit 2 VCPU Domain: Apr 7 06:22:37 spock -- MARK -- Apr 7 06:25:04 spock kernel: Timer ISR/1: Time went backwards: delta=3D-= 25302545 delta_cpu=3D-21302545=20 shadow=3D375953435562320 off=3D7458 process= ed=3D375953460871440=20 cpu_processed=3D375953456871440 Apr 7 06:25:04 spock kernel: 0: 375953452871440 Apr 7 06:25:04 spock kernel: 1: 375953456871440 Again, it was sleeping. It comes up with a different delta. But what made= me curious is: the last 6 digits of the per_cpu(processed_system_time) are invariant in bot c= ases. I am not expert enough to judge, whether this is right. When browsing through the syslogs, I also saw another event that mentione= d TSC: all DomUs besides the 64 Bit DomUs mention TSC unstability: Apr 2 22:02:22 shields kernel: Clocksource tsc unstable (delta =3D -2630= 23689 ns) Apr 2 22:02:32 tuvok kernel: TSC appears to be running slowly. Marking i= t as unstable Apr 2 22:02:34 kes kernel: TSC appears to be running slowly. Marking it a= s unstable Apr 2 22:02:33 uhura kernel: TSC appears to be running slowly. Marking i= t as unstable Apr 2 22:02:34 worf kernel: TSC appears to be running slowly. Marking it= as unstable You have to know that this was shortly after a reboot. When rebooting, I w= ill start up all DomUs by xendomains with a 15sec delay. So I thought that it is quite= unusual that they find out about unstable TSC at the same time. So I looked up Dom0: Apr 2 22:02:19 data ntpd[6325]: synchronized to 212.112.228.242, stratum= 2 Apr 2 22:02:19 data ntpd[6325]: time reset -2.027414 s Apr 2 22:02:19 data ntpd[6325]: kernel time sync enabled 0001 I guess *this* could be conected with Tims patch. As promised, I will shortly try to set up other versions of Xen and later= kernels. But honestly, I don't expect the situation to change, as I normally follow the patches b= uild into mercurial. Best Regards, Carsten. ----- Originalnachricht ----- Von: Carsten Schiers Gesendet: Mon, 6.4.2009 23:19 An: dan.magenheimer ; xen-devel Cc: Tim.Deegan Betreff: AW: RE: AW: RE: [Xen-devel] Huge Time went backwards Thanks Dan. As I lost my setup (although I could have restored that - it was an lvm),= I set up everything new. Today I compiled Xen 3.3.1, 3.3-testing and 3.4-unstab= le and two kernel pairs for 64 and 32 bit (with and without MSI support; I have s= ome issues with that in one of my DomUs that gets four PCI devices passed thr= u). As I am unsure whether I can simply install xen-unstable tools over 3.3.1= , I will set up a copy of my Dom0 and install it there. I modified the extra versi= on so=20 that I can use the same boot dir and still can distinguish between the ke= rnels. I still have not found anything like tagging for the kernel; the tar-ball= for=20 Xen-3.3.1 will not compile with my Xen. So for kernels, it's now the one f= rom=20 today in any case. Next step is to carefully start testing, beginning with MSI and Xen 3.4, a= s I expect the best results there. Then I will shorten the ondemand sampling r= ate, which is now at roughly 4 sec. This is to prevent the core to jump their c= lock rate too often. In parallel, I will try to find out where exactly this message is produce= d and what it means. I will report ;-) Andy yes, it's all pv, no hvm. And all DomUs get their time from Dom0/ntp= d.=20 And there seems to be no drift. Best Regards, Carsten.=20 -----Urspr=FCngliche Nachricht----- Von: Dan Magenheimer [mailto:dan.magenheimer@oracle.com]=20 Gesendet: Montag, 6. April 2009 23:00 An: Carsten Schiers; xen-devel Cc: Tim.Deegan Betreff: RE: AW: RE: [Xen-devel] Huge Time went backwards Hi Carsten -- I think domain0 (and all PV domains) only use the one paravirtualized clock based on xen system time. Changing clock options such as hpet and notsc will not affect a PV domain, only an HVM domain. Dan > -----Original Message----- > From: Carsten Schiers [mailto:carsten@schiers.de] > Sent: Friday, April 03, 2009 11:11 AM > To: Dan Magenheimer; xen-devel > Cc: Tim.Deegan > Subject: AW: RE: [Xen-devel] Huge Time went backwards >=20 >=20 > > Interesting. This is reported booting dom0, correct? Are > > you running NTP in dom0? >=20 > It's in Dom0 log when powernow-k8 is loaded, which is after=20 > loading loop=20 > and prior on=20 > mounting disks. Roughly 4 seconds prior to starting ntpd, so=20 > I guess it=20 > has no interaction. >=20 > It feels a bit like that 3.3.1 and 3.4 set values differently=20 > in the CPU=20 > cores, because > when rebooting the same Xen version two or three times again,=20 > it's away.=20 > It will come back > with either Xen version when you switch version. At least it=20 > feels that=20 > way. >=20 > >> If I get on your nerves with my time keeping issues > > > >No, it is good to raise awareness of these issues until they > >are all fixed. ESPECIALLY if you see time problems in xen-unstable, > >it would be good to get them fixed before 3.4 is final. >=20 > Unfortunately, I broke my unstable. Also, it's a=20 > semi-productive family=20 > server, so slots to > Test are a bit rare. I try my best to set it up again and do more=20 > testing. >=20 > I have this Time went backwards issues from the beginning.=20 > Now I tried=20 > to set HPET to 32bit, > although I don't know whether a) BIOS is read to use 32bit instead of=20 > 64bit, or b) this makes > any difference, or c) HEPT and TSC have something in common when it=20 > comes to TSC drifts related > to power management. >=20 > I also wondered, whether I should try booting with notsc, but it's a=20 > dual core and I think then > I need TSC, or don't I? >=20 > BR, > Carsten. >=20 >=20 > -----Original Message----- > From: Carsten Schiers [mailto:carsten@schiers.de] > Sent: Friday, April 03, 2009 6:52 AM > To: xen-devel@lists.xensource.com > Subject: [Xen-devel] Huge Time went backwards >=20 >=20 > Just an observation, to whom it may concern: when booting=20 > between 3.3.1=20 > and > current xen-3.4-unstable, right after loading powernow-k8,=20 > there will be=20 > one huge > Time went backwards messages (500ms upt to 1,5s), which disappears=20 > unless > you change Xen version again. >=20 > BTW: If I get on your nerves with my time keeping issues, just drop a=20 > note and=20 > I keep calm ;-). >=20 > BR, > Carsten. >=20 >=20 >