* RE: [patch] prefer TSC over PM Timer
From: Pallipadi, Venkatesh @ 2004-11-16 18:27 UTC
To: john stultz, dean gaudet; +Cc: lkml
>-----Original Message-----
>From: linux-kernel-owner@vger.kernel.org
>[mailto:linux-kernel-owner@vger.kernel.org] On Behalf Of john stultz
>Sent: Monday, November 15, 2004 5:38 PM
>To: dean gaudet
>Cc: lkml
>Subject: Re: [patch] prefer TSC over PM Timer
>
>On Mon, 2004-11-15 at 16:23, dean gaudet wrote:
>> i've heard other folks have independently run into this problem -- in fact
>> i see the most recent fc2 kernels already do this. i'd like this to be
>> accepted into the main kernel though.
>>
>> the x86 PM Timer is an order of magnitude slower than the TSC for
>> gettimeofday calls. i'm seeing 8%+ of the time spent doing gettimeofday
>> in some workloads... and apparently kernel.org was seeing 80% of its time
>> go to gettimeofday during the fc3-release overload. PM timer is also less
>> accurate than TSC.
>>
I think trying to remove repeated inl()'s in read_pmtmr is a better
fix for this issue. As John mentioned in the other thread, we should do
repeated reads only when something looks broken. Not always.
The TSC stops counting when the CPU is in a deep sleep state. It
should be OK to use the TSC with Centrinos, which support Enhanced SpeedStep
Technology, but it will have issues with older systems that support the
older SpeedStep. So I would say using pm_timer as the default is better,
as it works correctly on most systems.
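A minimal user-space sketch of the fast-path/slow-path split suggested above. `port_read` stands in for `inl(pmtmr_ioport)`; `pmtmr_broken`, `read_pmtmr_once()`, `read_pmtmr_triple()` and the simulated counter are hypothetical names. The consistency check follows the shape of the existing triple-read workaround but is illustrative, not the kernel's code.

```c
#include <stdint.h>

#define ACPI_PM_MASK 0xFFFFFFu /* the ACPI PM timer counter is 24 bits wide */

/* Hypothetical stand-in for inl(pmtmr_ioport). */
typedef uint32_t (*pm_port_read_fn)(void);

/* Hypothetical quirk flag: set once at boot for chipsets known not to
 * latch the counter, clear everywhere else. */
static int pmtmr_broken;

/* Returns 1 when v1, v2, v3 appear in cyclic ascending order, i.e. they are
 * consistent with a forward-running counter that may have wrapped once;
 * out-of-order samples from an unlatched read return 0. */
static int pmtmr_samples_consistent(uint32_t v1, uint32_t v2, uint32_t v3)
{
    return !((v1 > v2 && v1 < v3) || (v2 > v3 && v2 < v1) ||
             (v3 > v1 && v3 < v2));
}

/* Fast path: a single I/O access per read. */
static uint32_t read_pmtmr_once(pm_port_read_fn port_read)
{
    return port_read() & ACPI_PM_MASK;
}

/* Slow path: three I/O accesses per attempt, retried until the samples
 * are consistent; the middle sample is returned. */
static uint32_t read_pmtmr_triple(pm_port_read_fn port_read)
{
    uint32_t v1, v2, v3;
    do {
        v1 = port_read();
        v2 = port_read();
        v3 = port_read();
    } while (!pmtmr_samples_consistent(v1, v2, v3));
    return v2 & ACPI_PM_MASK;
}

/* Pay for repeated reads only when the chipset is known to be broken. */
static uint32_t read_pmtmr(pm_port_read_fn port_read)
{
    return pmtmr_broken ? read_pmtmr_triple(port_read)
                        : read_pmtmr_once(port_read);
}

/* Simulated free-running counter so the sketch runs without hardware. */
static uint32_t sim_counter = 0xFFFFFEu;

static uint32_t sim_port_read(void)
{
    return sim_counter++ & ACPI_PM_MASK;
}
```

On a chipset that latches correctly, each read costs one port access instead of three.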
Thanks,
Venki
* RE: [patch] prefer TSC over PM Timer
From: dean gaudet @ 2004-11-17 1:50 UTC
To: Pallipadi, Venkatesh; +Cc: john stultz, lkml
On Tue, 16 Nov 2004, Pallipadi, Venkatesh wrote:
> I think trying to remove repeated inl()'s in read_pmtmr is a better
> fix for this issue. As John mentioned in the other thread, we should do
> repeated reads only when something looks broken. Not always.
that would be a nice improvement... then timer_pm will only be 3x as slow
as timer_tsc instead of 10x slower :) it's still a lot of unnecessary
overhead for many systems, and unfortunately this is a real performance
problem (albeit exaggerated by code which is overzealous in its use of
gettimeofday()).
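A rough way to reproduce this kind of per-call comparison on a given machine is to time a tight loop of gettimeofday() calls with CLOCK_MONOTONIC. The function name is hypothetical, and the result reflects whichever clock source the running kernel has selected.

```c
#define _XOPEN_SOURCE 700 /* gettimeofday() and clock_gettime() */
#include <stddef.h>
#include <sys/time.h>
#include <time.h>

/* Average cost of one gettimeofday() call in nanoseconds, measured by
 * timing `iterations` back-to-back calls with CLOCK_MONOTONIC. */
static double gettimeofday_cost_ns(size_t iterations)
{
    struct timespec start, end;
    struct timeval tv;

    clock_gettime(CLOCK_MONOTONIC, &start);
    for (size_t i = 0; i < iterations; i++)
        gettimeofday(&tv, NULL);
    clock_gettime(CLOCK_MONOTONIC, &end);

    double elapsed_ns = (double)(end.tv_sec - start.tv_sec) * 1e9 +
                        (double)(end.tv_nsec - start.tv_nsec);
    return elapsed_ns / (double)iterations;
}
```

On current Linux kernels the selected source can be read from /sys/devices/system/clocksource/clocksource0/current_clocksource, and gettimeofday() is usually served through the vDSO, so absolute numbers are not directly comparable with the 2004 figures quoted here.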
on a tangent... has the local apic timer ever been considered? it's fixed
rate, and my measurements show it in the same performance ballpark as TSC.
i know that all p3, p-m, p4, k8 and efficeon have local APIC, but i'm not
sure if k7 (other than k7 smp parts of course) have local apics... so i'm
not sure how widespread it is compared to pm-timer.
wouldn't local apic timer be a lot better for NUMA too?
hey wait, what exactly is the problem with TSC on NUMA? don't you just
need some per-cpu data (epoch and calibration) to make it work?
-dean
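One reading of the "per-cpu data (epoch and calibration)" idea, as a sketch with hypothetical names: each CPU records its own TSC value at a shared reference instant plus its own rate. The fixed-point mult/shift scaling is an assumption made here for illustration. Alan's reply below explains why this alone does not survive drift.

```c
#include <stdint.h>

/* Hypothetical per-CPU clock state. */
struct percpu_tsc_clock {
    uint64_t epoch_tsc; /* this CPU's TSC at the shared reference instant */
    uint64_t epoch_ns;  /* reference time at that instant, in ns */
    uint32_t mult;      /* ns per TSC cycle, scaled by 2^shift */
    uint32_t shift;
};

/* Choose mult so that (cycles * mult) >> shift ~= cycles * 1e9 / tsc_hz.
 * The caller picks shift (<= 32) small enough that mult fits in 32 bits. */
static void percpu_tsc_calibrate(struct percpu_tsc_clock *c, uint64_t tsc_hz,
                                 uint32_t shift)
{
    c->shift = shift;
    c->mult = (uint32_t)((1000000000ull << shift) / tsc_hz);
}

/* Convert this CPU's TSC reading to ns. The 64-bit product limits how far
 * tsc may run past epoch_tsc before the epoch has to be advanced. */
static uint64_t percpu_tsc_to_ns(const struct percpu_tsc_clock *c, uint64_t tsc)
{
    uint64_t delta = tsc - c->epoch_tsc;
    return c->epoch_ns + ((delta * c->mult) >> c->shift);
}
```

For an 800 MHz TSC with shift 22, mult is 5242880, and 800 million cycles past the epoch convert to exactly one second.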
* RE: [patch] prefer TSC over PM Timer
From: Mikael Pettersson @ 2004-11-17 10:43 UTC
To: dean gaudet; +Cc: Pallipadi, Venkatesh, john stultz, lkml
dean gaudet writes:
> On Tue, 16 Nov 2004, Pallipadi, Venkatesh wrote:
>
> > I think trying to remove repeated inl()'s in read_pmtmr is a better
> > fix for this issue. As John mentioned in the other thread, we should do
> > repeated reads only when something looks broken. Not always.
>
> that would be a nice improvement... then timer_pm will only be 3x as slow
> as timer_tsc instead of 10x slower :) it's still a lot of unnecessary
> overhead for many systems, and unfortunately this is a real performance
> problem (albeit exaggerated by code which is overzealous in its use of
> gettimeofday()).
>
> on a tangent... has the local apic timer ever been considered? it's fixed
> rate, and my measurements show it in the same performance ballpark as TSC.
>
> i know that all p3, p-m, p4, k8 and efficeon have local APIC, but i'm not
> sure if k7 (other than k7 smp parts of course) have local apics... so i'm
> not sure how widespread it is compared to pm-timer.
All K7/K8s except the very first K7 Model 1 have local APICs.
There is no difference between UP and MP parts in this respect.
/Mikael
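For checking a particular CPU, the local APIC is reported by CPUID leaf 1, EDX bit 9. A sketch using the `__get_cpuid()` helper from GCC/Clang's `<cpuid.h>` (x86 only); the wrapper names are hypothetical. Firmware can disable the APIC, in which case this bit may read as clear even on parts that have one.

```c
#include <stdint.h>

/* CPUID leaf 1, EDX bit 9: "APIC on-chip". */
#define CPUID1_EDX_APIC (1u << 9)

/* Pure check on an EDX value already obtained from CPUID leaf 1. */
static int edx_reports_local_apic(uint32_t cpuid1_edx)
{
    return (cpuid1_edx & CPUID1_EDX_APIC) != 0;
}

#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
#include <cpuid.h>

/* 1 if the running CPU reports a local APIC, 0 if not, -1 if CPUID
 * leaf 1 is unavailable. */
static int cpu_reports_local_apic(void)
{
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return -1;
    return edx_reports_local_apic(edx);
}
#endif
```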
* Re: [patch] prefer TSC over PM Timer
From: Dmitry Torokhov @ 2004-11-17 14:19 UTC
To: dean gaudet; +Cc: Pallipadi, Venkatesh, john stultz, lkml
On Tue, 16 Nov 2004 17:50:42 -0800 (PST), dean gaudet
<dean-list-linux-kernel@arctic.org> wrote:
> on a tangent... has the local apic timer ever been considered? it's fixed
> rate, and my measurements show it in the same performance ballpark as TSC.
>
At least Dell laptops will die a horrible death if you enable the lapic,
and probably others too.
--
Dmitry
* RE: [patch] prefer TSC over PM Timer
From: Alan Cox @ 2004-11-17 15:31 UTC
To: dean gaudet; +Cc: Pallipadi, Venkatesh, john stultz, lkml
On Wed, 2004-11-17 at 01:50, dean gaudet wrote:
> on a tangent... has the local apic timer ever been considered? it's fixed
> rate, and my measurements show it in the same performance ballpark as TSC.
>
It would certainly work for the SMP cases, which are most of the "hard"
cases where the TSC breaks. This seems to be a good path to me: although
C3 would fail, as has been pointed out, a C3 resume is sufficiently
expensive that fixing up the TSC offset from the PMTMR on resume isn't
going to kill anyone.
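The fix-up can be sketched as: note the PM timer on C3 entry, and on exit advance the TSC clock's base by the sleep measured with the PM timer rather than trusting the (possibly stopped) TSC. All names are hypothetical; 3.579545 MHz is the nominal ACPI PM timer rate, and the sketch assumes the sleep is shorter than one 24-bit PM timer wrap (about 4.7 s).

```c
#include <stdint.h>

#define PMTMR_HZ   3579545u  /* nominal ACPI PM timer rate */
#define PMTMR_MASK 0xFFFFFFu /* 24-bit counter */

/* Hypothetical TSC-based clock: now = base_ns + (tsc - base_tsc) in ns. */
struct tsc_clock {
    uint64_t base_tsc; /* TSC value at base_ns */
    uint64_t base_ns;
    uint64_t tsc_hz;   /* calibrated TSC rate */
};

static uint64_t tsc_clock_now_ns(const struct tsc_clock *c, uint64_t tsc)
{
    /* Split into whole seconds and remainder to avoid 64-bit overflow. */
    uint64_t cycles = tsc - c->base_tsc;
    return c->base_ns + (cycles / c->tsc_hz) * 1000000000ull +
           (cycles % c->tsc_hz) * 1000000000ull / c->tsc_hz;
}

/* PM timer ticks between two 24-bit readings, tolerating one wrap. */
static uint32_t pmtmr_delta(uint32_t before, uint32_t after)
{
    return (after - before) & PMTMR_MASK;
}

/* Called on C3 exit with the clock reading and PM timer value saved on
 * entry. Rebases the clock on the PM-timer-measured sleep; the small gap
 * between the entry readings and the actual halt is ignored here. */
static void tsc_clock_resync_after_c3(struct tsc_clock *c, uint64_t ns_at_entry,
                                      uint32_t pm_at_entry, uint32_t pm_at_exit,
                                      uint64_t tsc_at_exit)
{
    uint64_t slept_ns =
        (uint64_t)pmtmr_delta(pm_at_entry, pm_at_exit) * 1000000000ull / PMTMR_HZ;
    c->base_ns = ns_at_entry + slept_ns;
    c->base_tsc = tsc_at_exit;
}
```

The extra cost is one PM timer read on each side of C3, small next to the cost of a C3 resume.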
> hey wait, what exactly is the problem with TSC on NUMA? don't you just
> need some per-cpu data (epoch and calibration) to make it work?
You have unrelated clocks that drift over time. You can't just calibrate
them.
It's different from the BP6, for example, where you at least know the CPU
clocks are at a fixed ratio.
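To put a number on "clocks that drift": if \varepsilon_i(t) is the fractional rate error left in CPU i's clock after boot-time calibration, the offset between two CPUs grows as below. The 1 ppm residual is a hypothetical value chosen only for the arithmetic.

```latex
\Delta_{ij}(T) = \int_0^{T} \bigl(\varepsilon_i(t) - \varepsilon_j(t)\bigr)\,dt
\qquad\text{e.g.}\quad
\varepsilon_i - \varepsilon_j = 10^{-6},\; T = 86\,400\ \mathrm{s}
\;\Rightarrow\; \Delta_{ij} = 0.0864\ \mathrm{s} = 86.4\ \mathrm{ms}
```

A one-time calibration can remove only the constant part of each \varepsilon_i; any part that varies over time keeps accumulating into the offset.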
* summary (Re: [patch] prefer TSC over PM Timer)
From: dean gaudet @ 2004-11-17 17:48 UTC
To: john stultz, Dominik Brodowski, Pallipadi, Venkatesh, Alan Cox
Cc: linux, lkml
ok thanks everyone... i've been educated, and attempted to summarize the
situation.
if timer_pm is fixed to read the PM timer only once on non-broken systems
then it is generally the best choice. it is only at a ~3x disadvantage
compared to tsc/lapic in that case.
until/unless C3 and deeper resync tsc then it's best not to default to tsc
even on transmeta. it would require some co-ordination between timer_tsc
and ACPI code to know if C3/etc. are enabled, i don't see that
co-ordination there now. so it really does seem like adding "clock=tsc"
to boot is best left to installers/users/not-the-kernel for now.
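The conclusion above, written out as a selection function. The enum, struct and field names are hypothetical, and placing the TSC ahead of the PIT when no PM timer exists is an assumption of this sketch rather than something stated in the thread.

```c
#include <stdbool.h>

enum timer_source { TIMER_PIT, TIMER_PM, TIMER_TSC };

/* Hypothetical summary of what boot-time detection found. */
struct timer_caps {
    bool user_requested_tsc; /* e.g. "clock=tsc" given on the command line */
    bool has_tsc;
    bool has_pmtmr;          /* ACPI PM timer present and usable */
};

/* The kernel never prefers the TSC over the PM timer on its own; the TSC
 * wins only when the user asks for it or no PM timer exists. */
static enum timer_source choose_timer(const struct timer_caps *caps)
{
    if (caps->user_requested_tsc && caps->has_tsc)
        return TIMER_TSC;
    if (caps->has_pmtmr)
        return TIMER_PM;
    if (caps->has_tsc)
        return TIMER_TSC;
    return TIMER_PIT;
}
```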
here's my device summary:
PIT:
- many slow i/o accesses to read
- works everywhere
PM:
- minimum one slow i/o access to read
- measurements on a handful of systems show one PM timer read
costs ~3x a TSC read.
- kernel presently uses 3 reads as a bug workaround, but can be
reduced to one read.
- works on ~all hardware less than a few years old
TSC:
- fast read
- on most systems this varies with power mgmt -- and some power mgmt
occurs "behind-the-scenes" without kernel awareness
- cpufreq is better and better at tracking the changes (but not on SMP?)
- 2.6.10-rc2 disables even more behind-the-scenes power mgmt
- stops counting in C3 (solved? with PIT/PM/RTC read coming out of C3)
- drift possible across nodes in NUMA
local APIC:
- fast read (approx same as TSC)
- enabling lapic causes some dell laptops to crash
- stops counting in C3 (solvable with PIT/PM/RTC read coming out of C3)
- shared with scheduler -- easy to manage today
- can't be shared with scheduler if we add variable scheduler ticks
(can't read CCR and write ICR atomically -- potential to drift)
- local apic timer ticks are the best choice for scheduling on SMP
because it allows all the CPU schedulers to be skewed and avoid
lock conflicts.
- drift possible across nodes in NUMA?
HPET:
- at the moment i know nothing about it (none of my systems have it)
let me know if i've missed anything.
-dean
* Re: summary (Re: [patch] prefer TSC over PM Timer)
From: George Anzinger @ 2004-11-17 22:30 UTC
To: dean gaudet
Cc: john stultz, Dominik Brodowski, Pallipadi, Venkatesh, Alan Cox,
linux, lkml
dean gaudet wrote:
> ok thanks everyone... i've been educated, and attempted to summarize the
> situation.
>
> if timer_pm is fixed to read the PM timer only once on non-broken systems
> then it is generally the best choice. it is only at a ~3x disadvantage
> compared to tsc/lapic in that case.
>
> until/unless C3 and deeper resync tsc then it's best not to default to tsc
> even on transmeta. it would require some co-ordination between timer_tsc
> and ACPI code to know if C3/etc. are enabled, i don't see that
> co-ordination there now. so it really does seem like adding "clock=tsc"
> to boot is best left to installers/users/not-the-kernel for now.
>
> here's my device summary:
>
> PIT:
> - many slow i/o accesses to read
> - works everywhere
>
> PM:
> - minimum one slow i/o access to read
> - measurements on a handful of systems show one PM timer read
> costs ~3x a TSC read.
> - kernel presently uses 3 reads as a bug workaround, but can be
> reduced to one read.
> - works on ~all hardware less than a few years old
Both the PIT and the PM timer are driven by the same 14.3181818 MHz "rock",
which is chosen for time keeping. As such, the PIT and PM timer should be
considered the "GOLD" standard for time keeping.
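The shared-crystal point can be checked with integer arithmetic. The sketch assumes the usual PC reference of 315/22 MHz (14.31818... MHz, four times the NTSC colour subcarrier), with the PIT clocked at that rate divided by 12 and the ACPI PM timer at that rate divided by 4; `pit_latch()` mirrors the round-to-nearest of the kernel's LATCH macro. Function names are hypothetical.

```c
#include <stdint.h>

/* Reference crystal: 315/22 MHz, kept as a fraction to stay exact. */
#define REF_NUM 315000000ull /* Hz, numerator */
#define REF_DEN 22ull

/* Frequency in Hz of the reference divided by `divisor`, rounded to nearest. */
static uint32_t derived_hz(uint32_t divisor)
{
    uint64_t den = REF_DEN * divisor;
    return (uint32_t)((REF_NUM + den / 2) / den);
}

/* PIT reload value for a given HZ, rounded to nearest. */
static uint32_t pit_latch(uint32_t pit_hz, uint32_t hz)
{
    return (pit_hz + hz / 2) / hz;
}
```

derived_hz(12) gives the familiar 1193182 Hz PIT rate and derived_hz(4) gives 3579545 Hz for the PM timer. With HZ=1000 the latch is 1193, so each programmed tick lasts 1193/1193182 s, about 999.85 µs rather than exactly 1 ms.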
>
> TSC:
> - fast read
> - on most systems this varies with power mgmt -- and some power mgmt
> occurs "behind-the-scenes" without kernel awareness
> - cpufreq is better and better at tracking the changes (but not on SMP?)
> - 2.6.10-rc2 disables even more behind-the-scenes power mgmt
> - stops counting in C3 (solved? with PIT/PM/RTC read coming out of C3)
> - drift possible across nodes in NUMA
The TSC frequency is unknown. During boot an attempt is made to calibrate it by
comparing it with the PIT. This attempt is flawed by the I/O delays in
accessing the PIT, and so it will be off by 5 or more counts per tick (measured
on an 800 MHz box, after changing the calibration time to the max PIT count,
~50 ms, and attempting to pair the beginning and ending I/O instructions so
as to negate the I/O delays as much as possible). The TSC is also not driven
by a time keeping "rock" and may also be varied to lower EMI radiation
(isn't time keeping interesting?).
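The calibration described above reduces to a ratio: TSC cycles counted while the PIT counts down a known number of input clocks. A sketch with hypothetical names; 1193182 Hz is the standard PC PIT input rate.

```c
#include <assert.h>
#include <stdint.h>

#define PIT_HZ 1193182u /* PIT input clock */

/* TSC rate implied by `tsc_cycles` elapsing while the PIT counted down
 * `pit_counts` input clocks (a single 16-bit window allows at most
 * 65535, about 55 ms). */
static uint64_t tsc_hz_from_pit(uint64_t tsc_cycles, uint32_t pit_counts)
{
    assert(pit_counts > 0);
    return tsc_cycles * PIT_HZ / pit_counts;
}

/* Relative error, in ppm, from `edge_jitter_cycles` of uncertainty in
 * where the measurement window begins and ends. */
static double calibration_error_ppm(uint64_t tsc_cycles, double edge_jitter_cycles)
{
    return edge_jitter_cycles / (double)tsc_cycles * 1e6;
}
```

For example, 59659 PIT counts is just under 50 ms, so 40 million cycles in that window imply about 800.0013 MHz. At that size, a hypothetical 400 cycles of uncertainty at the window edges is already 10 ppm, roughly 0.86 s per day.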
>
> local APIC:
> - fast read (approx same as TSC)
> - enabling lapic causes some dell laptops to crash
> - stops counting in C3 (solvable with PIT/PM/RTC read coming out of C3)
> - shared with scheduler -- easy to manage today
> - can't be shared with scheduler if we add variable scheduler ticks
> (can't read CCR and write ICR atomically -- potential to drift)
> - local apic timer ticks are the best choice for scheduling on SMP
> because it allows all the CPU schedulers to be skewed and avoid
> lock conflicts.
Actually, doing this is problematic, as it skews the timer expiry times. With
the per-cpu timer lists in 2.6 there is very little lock contention. I think
we can safely dismiss the lock issue.
> - drift possible across nodes in NUMA?
The APIC timer is again on a different "rock" which is not designed for time
keeping and, again, is calibrated at boot up against the "GOLD" standard PIT.
IMHO, the best time keeping we can get in an x86 box is to:
a) set up the PIT to do the 1/HZ ticks (once set up we do not need to touch
it again, so the I/O access issues become moot),
b) select either the TSC (if we think it is stable) or the pm_timer to do the
short term between tick interpolation and also to detect and correct for PIT
interrupt overrun (like we missed a tick or two). We should prefer the TSC here
because of its speed and because it is read on every gettimeofday() access.
c) Use the PIT interrupt (followed by an IPI from the PIT interrupt handler for
SMP systems) to do the scheduler and timer list servicing. (We really do want
the timer list to be serviced as close to the jiffies++ as possible.)
d) Use the APIC timer for both finer (as in High Resolution Timers, HRT) and
coarser timing (as in variable scheduler ticks, VST).
The current HRT patch (see signature) does a, b, and c. I am currently working
on d.
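Points (a) and (b) as a sketch: the PIT interrupt advances time in whole ticks, counting any it missed, and reads interpolate from the last tick using a fast counter. Names and the round-to-nearest policy for lost ticks are assumptions; `counter_per_tick` would come from boot-time calibration.

```c
#include <stdint.h>

/* Hypothetical timekeeping state for scheme (a)+(b). */
struct tick_clock {
    uint64_t tick_ns;          /* length of one PIT tick in ns */
    uint64_t counter_per_tick; /* fast-counter cycles per tick (calibrated) */
    uint64_t last_tick_ns;     /* wall time at the last processed tick */
    uint64_t last_tick_count;  /* fast-counter value read at that tick */
};

/* PIT interrupt handler: account for this tick plus any that were lost
 * (interrupts held off for more than a tick). Returns ticks applied. */
static uint64_t tick_clock_on_pit_irq(struct tick_clock *c, uint64_t count_now)
{
    uint64_t elapsed = count_now - c->last_tick_count;
    /* Round to the nearest whole number of ticks, at least one. */
    uint64_t ticks = (elapsed + c->counter_per_tick / 2) / c->counter_per_tick;
    if (ticks == 0)
        ticks = 1;
    c->last_tick_ns += ticks * c->tick_ns;
    /* Rebase on the value read in this interrupt; interrupt latency then
     * shows up as a small phase error until the next tick. */
    c->last_tick_count = count_now;
    return ticks;
}

/* gettimeofday()-style read: last tick plus interpolated fraction. */
static uint64_t tick_clock_now_ns(const struct tick_clock *c, uint64_t count_now)
{
    uint64_t since = count_now - c->last_tick_count;
    return c->last_tick_ns + since * c->tick_ns / c->counter_per_tick;
}
```

With a 1 ms tick and 1000 counter cycles per tick, an interrupt that arrives 3020 cycles after the previous one is accounted as three ticks, so two missed ticks are recovered.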
>
> HPET:
> - at the moment i know nothing about it (none of my systems have it)
Well, we do know that it is in I/O space and all that that implies...
>
> let me know if i've missed anything.
>
--
George Anzinger george@mvista.com
High-res-timers: http://sourceforge.net/projects/high-res-timers/
* Re: summary (Re: [patch] prefer TSC over PM Timer)
From: john stultz @ 2004-11-17 23:09 UTC
To: george
Cc: dean gaudet, Dominik Brodowski, Pallipadi, Venkatesh, Alan Cox,
linux, lkml
On Wed, 2004-11-17 at 14:30 -0800, George Anzinger wrote:
> The APIC timer is again on a different "rock" which is not designed for time
> keeping and, again, is calibrated at boot up against the "GOLD" standard PIT.
>
> IMHO, the best time keeping we can get in an x86 box is to:
>
> a) set up the PIT to do the 1/HZ ticks (once set up we do not need to touch
> it again, so the I/O access issues become moot),
>
> b) select either the TSC (if we think it is stable) or the pm_timer to do the
> short term between tick interpolation and also to detect and correct for PIT
> interrupt overrun (like we missed a tick or two). We should prefer the TSC here
> because of its speed and because it is read on every gettimeofday() access.
My only qualm here is that using the TSC to interpolate between timer
ticks allows for time inconsistencies. If the TSC isn't cumulatively
accurate, then when used in between ticks it will cause minor
inaccuracies and possible inconsistencies. I'd instead prefer picking a
single time source, and using NTP to correct for drift or inaccurate
calibration.
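The inconsistency can be shown with numbers. Assume (hypothetically, and exaggerated for visibility) that calibration puts the counter at 990 cycles per tick when the true rate is 1000. A read just before a tick is interpolated past the tick boundary, while a read just after it restarts from the tick, so time steps backward. The function name is hypothetical.

```c
#include <stdint.h>

/* Time read between ticks: last tick plus interpolated fraction, using the
 * calibrated cycles per tick, which may differ from the true rate. */
static uint64_t interpolated_ns(uint64_t tick_base_ns, uint64_t tick_ns,
                                uint64_t cycles_since_tick,
                                uint64_t calibrated_cycles_per_tick)
{
    return tick_base_ns +
           cycles_since_tick * tick_ns / calibrated_cycles_per_tick;
}
```

With a 1 ms tick, `interpolated_ns(0, 1000000, 999, 990)` is 1009090 ns just before the tick and `interpolated_ns(1000000, 1000000, 1, 990)` is 1001010 ns just after it, about 8 µs backwards. A smaller calibration error shrinks the step but does not change its sign.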
Also, breaking the time subsystem away from requiring regular periodic ticks
allows for tickless systems and additional power management savings. But
this should be saved for another thread.
thanks
-john
* Re: summary (Re: [patch] prefer TSC over PM Timer)
From: George Anzinger @ 2004-11-17 23:24 UTC
To: john stultz
Cc: dean gaudet, Dominik Brodowski, Pallipadi, Venkatesh, Alan Cox,
linux, lkml
john stultz wrote:
> On Wed, 2004-11-17 at 14:30 -0800, George Anzinger wrote:
>
>>The APIC timer is again on a different "rock" which is not designed for time
>>keeping and, again, is calibrated at boot up against the "GOLD" standard PIT.
>>
>>IMHO, the best time keeping we can get in an x86 box is to:
>>
>>a) set up the PIT to do the 1/HZ ticks (once set up we do not need to touch
>>it again, so the I/O access issues become moot),
>>
>>b) select either the TSC (if we think it is stable) or the pm_timer to do the
>>short term between tick interpolation and also to detect and correct for PIT
>>interrupt overrun (like we missed a tick or two). We should prefer the TSC here
>>because of its speed and because it is read on every gettimeofday() access.
>
>
> My only qualm here is that using the TSC to interpolate between timer
> ticks allows for time inconsistencies. If the TSC isn't cumulatively
> accurate, then when used in between ticks it will cause minor
> inaccuracies and possible inconsistencies. I'd instead prefer picking a
> single time source, and using NTP to correct for drift or inaccurate
> calibration.
I think the inconsistencies are on the order of microseconds and so will not
really show. Not all systems are connected to an NTP server. One possibility
is to build in an NTP-like mechanism that averages out the PIT ticks and refines
the TSC-count-per-tick value over a much longer period. This would drive the
errors way down into the noise, and it still honors the notion of the PIT being
the STANDARD for time.
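A sketch of only the long-baseline part of this suggestion (not NTP itself): keep the TSC value at a reference PIT tick and divide the total cycles by the total ticks, so jitter in reading the TSC at the end tick is divided by the number of ticks. Names are hypothetical, and a real version would also have to handle missed ticks and rate changes.

```c
#include <stdint.h>

/* Hypothetical long-baseline estimator of TSC cycles per PIT tick. */
struct tick_rate_estimator {
    uint64_t ref_tsc;  /* TSC read at the reference tick */
    uint64_t last_tsc; /* TSC read at the most recent tick */
    uint64_t ticks;    /* PIT ticks since the reference tick */
};

static void estimator_start(struct tick_rate_estimator *e, uint64_t tsc)
{
    e->ref_tsc = tsc;
    e->last_tsc = tsc;
    e->ticks = 0;
}

/* Call from the PIT interrupt with the TSC value read in the handler. */
static void estimator_on_tick(struct tick_rate_estimator *e, uint64_t tsc)
{
    e->ticks++;
    e->last_tsc = tsc;
}

/* Cycles per tick, scaled by 1000 to keep three fractional digits. */
static uint64_t estimator_cycles_per_tick_milli(const struct tick_rate_estimator *e)
{
    if (e->ticks == 0)
        return 0;
    return (e->last_tsc - e->ref_tsc) * 1000u / e->ticks;
}
```

With a true rate of 800000 cycles per tick and 5 cycles of jitter on the final sample, one tick yields 800005 cycles per tick, while 999 ticks yield 800000.005.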
>
> Also, breaking the time subsystem away from requiring regular periodic ticks
> allows for tickless systems and additional power management savings. But
> this should be saved for another thread.
Amen!
>
--
George Anzinger george@mvista.com
High-res-timers: http://sourceforge.net/projects/high-res-timers/
* Re: [patch] prefer TSC over PM Timer
From: Krzysztof Halasa @ 2004-11-18 2:01 UTC
To: dean gaudet; +Cc: Pallipadi, Venkatesh, john stultz, lkml
dean gaudet <dean-list-linux-kernel@arctic.org> writes:
> i know that all p3, p-m, p4, k8 and efficeon have local APIC,
Some Celeron P3s (the one in my notebook for example) have no L-APIC:
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 8
model name : Celeron (Coppermine)
stepping : 6
cpu MHz : 597.367
cache size : 128 KB
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 2
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 sep mtrr pge mca cmov pat pse36 mmx fxsr sse
bogomips : 1179.64
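The missing L-APIC shows up as the absence of `apic` in the `flags` line above. A sketch that checks for a whole-word flag (so `apic` does not match `x2apic`) and, on Linux, reads the first `flags` line of /proc/cpuinfo; the function names are hypothetical.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* True if `flag` appears as a whole whitespace-separated word in a
 * /proc/cpuinfo "flags" value. */
static bool cpuinfo_has_flag(const char *flags, const char *flag)
{
    size_t len = strlen(flag);
    const char *p = flags;
    if (len == 0)
        return false;
    while ((p = strstr(p, flag)) != NULL) {
        bool starts = (p == flags) || p[-1] == ' ' || p[-1] == '\t';
        bool ends = p[len] == '\0' || p[len] == ' ' || p[len] == '\t' ||
                    p[len] == '\n';
        if (starts && ends)
            return true;
        p++;
    }
    return false;
}

/* 1 or 0 for the first CPU's "flags" line, or -1 if /proc/cpuinfo cannot
 * be read or has no such line (e.g. non-x86). */
static int proc_cpuinfo_has_flag(const char *flag)
{
    char line[8192];
    int result = -1;
    FILE *f = fopen("/proc/cpuinfo", "r");
    if (!f)
        return -1;
    while (fgets(line, sizeof line, f)) {
        if (strncmp(line, "flags", 5) == 0) {
            const char *colon = strchr(line, ':');
            result = colon ? cpuinfo_has_flag(colon + 1, flag) : -1;
            break;
        }
    }
    fclose(f);
    return result;
}
```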
--
Krzysztof Halasa