* False "lost ticks" on dual-Opteron system (=> timer twice as fast)
@ 2005-05-08 12:45 Bernd Paysan
  2005-05-08 13:40 ` [suse-amd64] " Andi Kleen
  2005-05-21 19:42 ` Hendrik Visage
  0 siblings, 2 replies; 19+ messages in thread
From: Bernd Paysan @ 2005-05-08 12:45 UTC (permalink / raw)
  To: suse-amd64, linux-kernel

Hi,

I've recently set up a dual Opteron RAID server (AMD-8000-based Tyan
Thunder K8S Pro SCSI board, two Opteron 246 CPUs, stepping 10). The
kernel is a modified 2.6.11.4-20a from SuSE 9.3 (SMP version, of
course). The Opterons can change their CPU frequency (between 1GHz and
2GHz).

The system clock runs (on average) about twice as fast as it should.
Closer observation revealed that the clock jumps forward by about
10-30 seconds every 10-30 seconds (plus other oddities, including
backward clock jumps). The timer interrupts are distributed roughly
evenly between the two CPUs, but watching the timer interrupt counts
(grep timer /proc/interrupts) revealed that for about 10-30 seconds
one CPU gets the interrupts, and then the other CPU gets them; each
transition causes the system clock to jump forward.

A quick look at timer_interrupt shows what I suspect is the culprit: 
Each CPU keeps track of the last TSC at a timer interrupt, and adds the 
"lost" ticks to jiffies when perceived necessary. If there's only a 
single jiffies, but two vxtime.last_tsc, it can't work.
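To illustrate the suspected failure mode, here is a minimal user-space
sketch. It is not the kernel's timer_interrupt; it only models the
description above: one shared jiffies counter and one last_tsc per CPU.
The names (naive_clock, naive_timer_interrupt, naive_simulate), the
value of CYCLES_PER_TICK (2GHz TSC, HZ=1000) and the assumption that
both TSCs are synchronized and run at a constant rate are all
assumptions made for the illustration.

```c
#include <stdint.h>

#define NCPUS 2
/* Assumption for the model: 2 GHz TSC and HZ=1000. */
#define CYCLES_PER_TICK 2000000ULL

/* Hypothetical model state: one shared jiffies, one last_tsc per CPU. */
struct naive_clock {
    uint64_t jiffies;
    uint64_t last_tsc[NCPUS];
};

/*
 * Timer interrupt with naive lost-tick compensation: every whole tick
 * that passed on this CPU's TSC beyond the current one is treated as
 * lost, whether or not the other CPU already counted it.
 */
static void naive_timer_interrupt(struct naive_clock *clk, int cpu,
                                  uint64_t tsc_now)
{
    uint64_t elapsed_ticks =
        (tsc_now - clk->last_tsc[cpu]) / CYCLES_PER_TICK;

    if (elapsed_ticks > 1)
        clk->jiffies += elapsed_ticks - 1;   /* "lost" ticks */
    clk->jiffies++;                          /* the current tick */
    clk->last_tsc[cpu] = tsc_now;
}

/*
 * Deliver real_ticks timer interrupts, switching the receiving CPU
 * every burst ticks, and return the resulting jiffies count.
 */
static uint64_t naive_simulate(uint64_t real_ticks, uint64_t burst)
{
    struct naive_clock clk = {0};

    for (uint64_t t = 1; t <= real_ticks; t++) {
        int cpu = (int)(((t - 1) / burst) % NCPUS);
        naive_timer_interrupt(&clk, cpu, t * CYCLES_PER_TICK);
    }
    return clk.jiffies;
}
```

In this model, naive_simulate(40000, 10000) (40 seconds of real time,
with the interrupt moving to the other CPU every 10 seconds) returns
70000 jiffies: each switch re-adds the whole previous burst as "lost"
ticks, so the clock approaches twice the real rate.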

A quick workaround would be to ditch the handling of the "lost" jiffies.
I would still expect annoying time skews from do_gettimeoffset()
(that's what explains the other oddities: if I call gettimeofday() on
the CPU that isn't getting interrupts, it is going to add the "lost"
jiffies, too). A proposed fix would be to *also* store the last jiffies
value in the vxtime variable, and verify whether it's really *this* CPU
that missed the timer interrupts. This local "last-stored-jiffies" can
help do_gettimeoffset() calculate the local time well enough on both
CPUs.
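The proposed fix can be sketched in user space as follows. This is an
illustration of the idea, not a kernel patch: the struct and function
names (checked_clock, checked_timer_interrupt, checked_offset_cycles,
checked_simulate) and CYCLES_PER_TICK are hypothetical, and the model
again assumes synchronized, constant-rate TSCs.

```c
#include <stdint.h>

#define NCPUS 2
/* Assumption for the model: 2 GHz TSC and HZ=1000. */
#define CYCLES_PER_TICK 2000000ULL

/* Hypothetical model state: per-CPU last_tsc plus per-CPU last jiffies. */
struct checked_clock {
    uint64_t jiffies;
    uint64_t last_tsc[NCPUS];
    uint64_t last_jiffies[NCPUS];
};

/*
 * Timer interrupt: only ticks that nobody has accounted for since this
 * CPU last looked at jiffies are treated as lost.
 */
static void checked_timer_interrupt(struct checked_clock *clk, int cpu,
                                    uint64_t tsc_now)
{
    uint64_t elapsed_ticks =
        (tsc_now - clk->last_tsc[cpu]) / CYCLES_PER_TICK;
    uint64_t already_counted = clk->jiffies - clk->last_jiffies[cpu];

    if (elapsed_ticks > already_counted + 1)
        clk->jiffies += elapsed_ticks - already_counted - 1;
    clk->jiffies++;                          /* the current tick */
    clk->last_tsc[cpu] = tsc_now;
    clk->last_jiffies[cpu] = clk->jiffies;
}

/*
 * Offset (in cycles) since the last accounted tick, as seen from cpu.
 * Cycles already covered by jiffies counted elsewhere are subtracted.
 */
static uint64_t checked_offset_cycles(const struct checked_clock *clk,
                                      int cpu, uint64_t tsc_now)
{
    uint64_t elapsed = tsc_now - clk->last_tsc[cpu];
    uint64_t accounted =
        (clk->jiffies - clk->last_jiffies[cpu]) * CYCLES_PER_TICK;

    return elapsed > accounted ? elapsed - accounted : 0;
}

/* Same delivery pattern as before: switch CPU every burst ticks. */
static uint64_t checked_simulate(uint64_t real_ticks, uint64_t burst)
{
    struct checked_clock clk = {0};

    for (uint64_t t = 1; t <= real_ticks; t++) {
        int cpu = (int)(((t - 1) / burst) % NCPUS);
        checked_timer_interrupt(&clk, cpu, t * CYCLES_PER_TICK);
    }
    return clk.jiffies;
}
```

With the extra per-CPU jiffies value, moving the interrupt to the other
CPU no longer inflates jiffies, while ticks that really were dropped
are still made up on the next interrupt. A real implementation would
also have to cope with TSCs that are not synchronized or that change
frequency, which this sketch ignores.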

What I can't believe is that I'm the only one who has this problem.

<rant>I know the timer system on an Intel or AMD system is broken by
design, because there should be a single, constant-clocked,
system-wide, read-only timer that can be read atomically. But this is
no excuse for that ;-).</rant>

-- 
Bernd Paysan
"If you want it done right, you have to do it yourself"
http://www.jwdt.com/~paysan/

Thread overview: 19+ messages
2005-05-08 12:45 False "lost ticks" on dual-Opteron system (=> timer twice as fast) Bernd Paysan
2005-05-08 13:40 ` [suse-amd64] " Andi Kleen
2005-05-08 16:22   ` Bernd Paysan
2005-05-09 10:53   ` Bernd Paysan
2005-05-09 13:17     ` Bernd Paysan
2005-05-10 10:53       ` Ed Tomlinson
2005-05-10 13:32         ` Andi Kleen
2005-05-10 11:12       ` Andi Kleen
2005-05-10 11:36         ` Bernd Paysan
2005-05-10 11:54         ` Bernd Paysan
2005-05-10 13:07           ` Andi Kleen
2005-05-10 13:15             ` Bernd Paysan
2005-05-10 13:21               ` Andi Kleen
2005-05-10 13:39                 ` Arjan van de Ven
2005-05-21 19:42 ` Hendrik Visage
2005-05-21 20:54   ` Scott Robert Ladd
     [not found]   ` <428F9FA6.1000800@coyotegulch.com>
     [not found]     ` <d93f04c70505211500216d8614@mail.gmail.com>
2005-05-23 11:50       ` Scott Robert Ladd
2005-05-23 23:04         ` Hendrik Visage
2005-05-25 17:06           ` Andi Kleen
