* Request for feedback on Generic Timeofday Subsystem (B20)
@ 2006-03-06 18:01 john stultz
0 siblings, 0 replies; 11+ messages in thread
From: john stultz @ 2006-03-06 18:01 UTC (permalink / raw)
To: linux-arch
I meant to CC this list when I sent the original mail to lkml last
Friday, but was eager to leave the office and forgot. Ralf Baechle
reminded me that it would be a good idea to get the right folks'
attention, so while it's a bit late, below is the announcement email I
sent to lkml, where the rest of the patch set can be found.
thanks
-john
On Fri, 2006-03-03 at 13:15 -0800, Andrew Morton wrote:
> So we can do this, but the question is do we _want_ to do it? If the arch
> maintainers can look at it from a high level and say "yup, I can use that
> and it'll improve/simplify/speedup/reduce my code" then yes, it's worth
> making the effort.
So, with 2.6.16 closing out, it's time to decide if the
following generic timekeeping code is ready to go into 2.6.17. I know
this is pretty dull code, however if any of you arch maintainers could
give it a quick once-over and let Andrew and me know if you have any
positive or negative reaction to this code, I would really appreciate
it.
(To avoid spamming important and busy arch maintainers, I've only CC'ed
them on this announcement. The full patchset can be found on lkml.)
Please note that there are still improvements to be made. Roman, for
instance, has made some good suggestions that I'm working to implement.
However, I feel the code as it stands _is_ ready for inclusion, as it
resolves a number of issues that users are seeing, and has been in the
-rt and -mm trees for some time with few problem reports.
Summary:
This patchset provides a generic timekeeping subsystem that is
independent of the timer interrupt. This allows for robust and correct
behavior in cases of late or lost ticks, avoids interpolation errors,
reduces duplication in arch specific code, and allows or assists future
changes such as high-res timers, dynamic ticks, or realtime preemption.
Additionally, it provides finer nanosecond resolution values to the
clock_gettime functions.
Why do we need this?
Correctness: With the current tick-based interpolated timekeeping used
in most architectures, there is the possibility for time inconsistencies
caused by NTP adjustment, less than perfect calibration of the highres
time hardware, or late or missed ticks. Over time, a number of fixes
have been added trying to minimize the effect of these errors, but as
hardware gets faster, the observable window for time inconsistencies
increases and the fixes are less effective. Further, correct
timekeeping becomes even more difficult with future features like
dynamic ticks or with realtime preemption when using the existing
timekeeping code.
Consolidation: We have quite a bit of duplication between the arches in
their timekeeping code. Many have chosen slightly different paths which
can make it difficult to resolve bugs or enable new features that could
be more generic. Additionally, since the interaction between arch
generic and arch specific code is not well specified, the code has
become quite fragile as timekeeping fixes may or may not affect other
arches and may not affect them in the same way.
It's a bit outdated, but my 2005 OLS presentation (basically more of the
same, but with a few pictures) on this code can be found here:
http://sr71.net/~jstultz/tod/ols-presentation-final.pdf
Overview of the code:
This patchset that I'm submitting breaks up into a few basic chunks:
1) Minimal rework of the NTP code to isolate it so that it does not
directly change the timekeeping variables as a side effect. It provides
accessors to the NTP state machine that allow the timekeeping core to
modify time accordingly.
2) Introduces the clocksource abstraction, which provides a generic
method to register, select and use hardware specific clocksource
drivers.
3) Introduces the generic timekeeping core that uses the clocksource
infrastructure instead of the timer interrupt to maintain time and
provides generic accessors to the system time and wall time. Also uses
the NTP changes to scale time smoothly between ticks.
4) Patches to convert the i386 arch to use the timekeeping core.
5) Finally, a patch to provide clocksource drivers for i386 hardware.
I also have patches that provide #4 and #5 for powerpc and x86-64;
they are not yet ready for submission, but you can find them at the
link below.
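As a rough illustration of chunk 2, the clocksource registration and
selection idea can be sketched in plain user-space C. Field and function
names here are illustrative guesses, not the exact B20 API:

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t cycle_t;

/* Illustrative sketch of a clocksource descriptor: each driver supplies
 * a read routine plus the parameters needed to turn cycles into ns. */
struct clocksource {
    const char *name;
    int rating;                /* higher rating = preferred source */
    cycle_t (*read)(void);     /* read the hardware counter */
    cycle_t mask;              /* counter width (e.g. a 32-bit wrap) */
    uint32_t mult, shift;      /* ns = (cycles * mult) >> shift */
    struct clocksource *next;
};

static struct clocksource *clocksource_list;

/* Drivers register themselves; the core picks the best one. */
void clocksource_register(struct clocksource *cs)
{
    cs->next = clocksource_list;
    clocksource_list = cs;
}

struct clocksource *clocksource_select(void)
{
    struct clocksource *cs, *best = NULL;

    for (cs = clocksource_list; cs; cs = cs->next)
        if (!best || cs->rating > best->rating)
            best = cs;
    return best;
}
```

The point of the rating field is that a slow-but-safe source (PIT,
jiffies) can always be registered, and a faster one (TSC, HPET) simply
outranks it when it is usable.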
Changes since the B19 release:
o Synced w/ changes from -mm
o jiffies/jiffies_64 and related locking fix (from Atsushi Nemoto)
o Dropped acpi_pm_buggy paranoia (Suggested by Adrian Bunk)
o Re-remove duplicate leapsecond code which accidentally got dropped.
Outstanding issues:
o No reported problems from testing in -mm
Still on my TODO list:
o Possible merge with Roman's NTP changes
o Continue working on improved ntp error accounting patch
o Squish any bugs that pop up from testing
o Clean and split up x86-64 and powerpc patches
o ppc, s390, arm, ia64, alpha, sparc, sparc64, etc work
The patchset applies against the current 2.6.16-rc5-git.
The complete patchset (including unsubmitted powerpc and x86-64 code)
can be found here:
http://sr71.net/~jstultz/tod/
I'd like to thank the following people who have contributed ideas,
criticism, testing and code that has helped shape this work:
George Anzinger, Nish Aravamudan, Max Asbock, Serge Belyshev,
Dominik Brodowski, Adrian Bunk, Thomas Gleixner, Darren Hart, Christoph
Lameter, Matt Mackal, Keith Mannthey, Ingo Molnar, Andrew Morton, Paul
Munt, Martin Schwidefsky, Frank Sorenson, Ulrich Windl, Jonathan
Woithe, Darrick Wong, Roman Zippel and any others whom I've
accidentally left off this list.
thanks
-john
^ permalink raw reply [flat|nested] 11+ messages in thread
* RE: Request for feedback on Generic Timeofday Subsystem (B20)
@ 2006-03-06 20:15 Luck, Tony
2006-03-07 1:35 ` john stultz
0 siblings, 1 reply; 11+ messages in thread
From: Luck, Tony @ 2006-03-06 20:15 UTC (permalink / raw)
To: john stultz, linux-arch
> Why do we need this?
>
> Correctness: ...
> Consolidation: ...
Any performance measurements? ia64 has gone to some lengths to
ensure that gettimeofday(2) has a very low overhead (it does
show up in the kernel profiles of some benchmarks ... managed
runtime systems in particular, but others like to get timestamps
at quite frighteningly short intervals). So I'm interested in
smp cases where a moderate number of cpus (4-16) are pounding
on gettimeofday(). I think that the huge SMP systems running
HPC workloads spend less of their time asking what time it is,
so I'm not as worried about the 512 cpu ... but if all 512 cpus
do happen to call gettimeofday() at the same time the system
shouldn't sink into the swamp as cache-lines bounce around.
-Tony
* RE: Request for feedback on Generic Timeofday Subsystem (B20)
2006-03-06 20:15 Request for feedback on Generic Timeofday Subsystem (B20) Luck, Tony
@ 2006-03-07 1:35 ` john stultz
2006-03-07 19:06 ` Luck, Tony
2006-03-10 1:15 ` john stultz
0 siblings, 2 replies; 11+ messages in thread
From: john stultz @ 2006-03-07 1:35 UTC (permalink / raw)
To: Luck, Tony; +Cc: linux-arch
On Mon, 2006-03-06 at 12:15 -0800, Luck, Tony wrote:
> > Why do we need this?
> >
> > Correctness: ...
> > Consolidation: ...
>
> Any performance measurements? ia64 has gone to some lengths to
> ensure that gettimeofday(2) has a very low overhead (it does
> show up in the kernel profiles of some benchmarks ... managed
> runtime systems in particular, but others like to get timestamps
> at quite frighteningly short intervals). So I'm interested in
> smp cases where a moderate number of cpus (4-16) are pounding
> on gettimeofday(). I think that the huge SMP systems running
> HPC workloads spend less of their time asking what time it is,
> so I'm not as worried about the 512 cpu ... but if all 512 cpus
> do happen to call gettimeofday() at the same time the system
> shouldn't sink into the swamp as cache-lines bounce around.
This is a good question, as I haven't run much in the way of performance
measurements recently. More below on that.
I'd be interested in hearing more about specifically what ia64 has done
(reading the fsyscall asm is not my idea of fun :), but I'd like to
think that generally it shouldn't be an issue. First of all, the
infrastructure is there for arch specific optimizations like
VDSO/vsyscall implementations (or fsyscall on ia64). In fact, it makes
it much easier to implement the vsyscall feature on arches that do not
yet support it (I have a patch for i386 that was pretty straightforward,
although I still need to sort out the unwind bits for gdb). And
secondly, larger SMP systems (although PPC and SPARC are exceptions)
tend to require use of slower mmapped clocksources, whose access cost
tends to dominate the gettimeofday() overhead.
But part of the issue is that there are a number of arches that have
done their own arch specific optimizations that are, for the most part,
generically applicable. This goes back to the consolidation point. My
hope is to join these divergent efforts to the benefit of all. This may
sound a bit naive and wide-eyed, and I'm fine allowing for arch specific
optimizations where they are necessary, but really, what are we doing in
almost all cases? Reading some hardware, converting it to nanoseconds
and adding it to a base value, all under a seq_read_lock. We don't need
a dozen implementations of this, and it makes other features harder to
implement because we don't know things like: Which arches will function
if we disable interrupts for a bit? Or what is the hardware level
resolution of clock_gettime()?
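That common pattern, read the hardware, scale the delta to nanoseconds,
and add it to a base value, all under a seqlock-style retry loop, can be
sketched as user-space C. The variable names and the example mult/shift
values are illustrative, not the actual implementation:

```c
#include <stdint.h>

/* Illustrative sketch of the shared read path described above. A writer
 * increments 'seq' (to odd) before updating base_ns/cycles_last and again
 * (to even) afterwards; readers retry if they raced with an update. */
static volatile unsigned seq;      /* even = stable, odd = update in progress */
static uint64_t base_ns;           /* time at the last accumulation point */
static uint64_t cycles_last;       /* counter value at that point */
static uint32_t mult = 1000;       /* example scaling: ns = (d * mult) >> shift */
static uint32_t shift = 10;

static uint64_t read_counter(void);  /* hardware-specific, e.g. TSC */

uint64_t monotonic_ns(void)
{
    unsigned s;
    uint64_t ns;

    do {
        s = seq;
        ns = base_ns + (((read_counter() - cycles_last) * mult) >> shift);
    } while ((s & 1) || s != seq);   /* retry if a writer was/is active */
    return ns;
}
```

Nearly every arch carries some variant of this loop; the patchset's
claim is that only read_counter() and the mult/shift pair actually need
to be per-hardware.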
But back to the actual performance numbers:
The last time I generated numbers for i386 the patch hit gettimeofday()
by ~2%, which was the worst case I could generate using the clocksource
with the lowest overhead (TSC). Most of this was due to some extra u64
usage and the lack of a generic mul_u64xu32 wrapper. However for this
cost, you get correct behavior (which I think is *much* more important,
at least for i386) and nanosecond resolution in clock_gettime().
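The mult/shift scaling behind that u64 math works like the sketch below
(hypothetical helper names; the rounding choice is mine, treat it as an
assumption rather than what the patches do):

```c
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/* Precompute 'mult' so that ns = (cycles * mult) >> shift for a counter
 * running at 'hz'. A bigger shift gives more precision but overflows
 * the 64-bit intermediate product sooner. */
uint32_t hz2mult(uint32_t hz, uint32_t shift)
{
    uint64_t tmp = NSEC_PER_SEC << shift;

    tmp += hz / 2;              /* round to nearest */
    return (uint32_t)(tmp / hz);
}

/* The hot-path conversion: one 64x32 multiply and a shift. */
uint64_t cyc2ns(uint64_t cycles, uint32_t mult, uint32_t shift)
{
    return (cycles * mult) >> shift;
}
```

For example, a 500 MHz counter with shift 22 yields mult = 8388608, so
each cycle scales to exactly 2 ns; the missing mul_u64xu32 wrapper
mentioned above is what would let arches do that multiply cheaply.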
Currently I suspect the impact is a bit worse with the patches in -mm,
since cycle_t was set back to a u64 to be extra robust in the case of 2
seconds of lost ticks. That's more of a -RT tree concern, so I'd be fine
setting that back to an unsigned long for mainline.
And while it's not gettimeofday(), I'm working with Roman on further
reducing the overhead in the periodic_hook() accumulation function, so
there is less periodic overhead as well.
I'll try to generate some more numbers and put some more focus on the
performance aspect of the current patches. If anyone does get a chance
to look the code over in detail, please do point out any specific
concerns or ideas for improvements.
thanks
-john
* Re: Request for feedback on Generic Timeofday Subsystem (B20)
2006-03-07 1:35 ` john stultz
@ 2006-03-07 19:06 ` Luck, Tony
2006-03-07 19:37 ` john stultz
2006-03-10 1:15 ` john stultz
1 sibling, 1 reply; 11+ messages in thread
From: Luck, Tony @ 2006-03-07 19:06 UTC (permalink / raw)
To: john stultz; +Cc: linux-arch
John,
Thanks for the detailed reply.
On Mon, Mar 06, 2006 at 05:35:46PM -0800, john stultz wrote:
> I'd be interested in hearing more about specifically what ia64 has done
> (reading the fsyscall asm is not my idea of fun :)
It is at least heavily commented asm ... but yes, perhaps not the
most fun way to spend your time. Essentially it just hooks into
the generic time interpolator code (but without actually calling
any of those functions due to restrictions on what fsyscall code
can do).
> hope is to join these divergent efforts to the benefit of all. This may
> sound a bit naive and wide-eyed, and I'm fine allowing for arch specific
> optimizations where they are necessary, but really, what are we doing in
> almost all cases? Reading some hardware, converting it to nanoseconds
> and adding it to a base value, all under a seq_read_lock.
Yes, that's pretty much what happens ... the only real complication
is that different h/w might require different access methods to read it.
> We don't need
> a dozen implementations of this, and it makes other features harder to
> implement because we don't know things like: Which arches will function
> if we disable interrupts for a bit? Or what is the hardware level
> resolution of clock_gettime()?
The fly in the ointment here is the restrictions on what can be done
inside the fsyscall handler (see Documentation/ia64/fsys.txt for the
gory details ... but the short form is that function calls are out,
so everything needs to be done, carefully, in assembler).
> The last time I generated numbers for i386 the patch hit gettimeofday()
> by ~2%, which was the worst case I could generate using the clocksource
> with the lowest overhead (TSC). Most of this was due to some extra u64
> usage and the lack of a generic mul_u64xu32 wrapper. However for this
> cost, you get correct behavior (which I think is *much* more important,
> at least for i386) and nanosecond resolution in clock_gettime().
Fast is indeed no use if the answer is wrong. Performance of this
on ia64 would be totally dependent on whether we can hook into your
framework while staying within the restrictions of fsyscall code.
> Currently I suspect the impact is a bit worse with the patches in -mm,
> since cycle_t was set back to a u64 to be extra robust in the case of 2
> seconds of lost ticks. That's more of a -RT tree concern, so I'd be fine
> setting that back to an unsigned long for mainline.
With Xen (and other) virtualized environments, you may need to keep that
capability to handle 2 seconds of lost ticks. I haven't seen any upper
bound for how long the hypervisor may starve a guest OS!
-Tony
* Re: Request for feedback on Generic Timeofday Subsystem (B20)
2006-03-07 19:06 ` Luck, Tony
@ 2006-03-07 19:37 ` john stultz
2006-03-07 22:40 ` Luck, Tony
0 siblings, 1 reply; 11+ messages in thread
From: john stultz @ 2006-03-07 19:37 UTC (permalink / raw)
To: Luck, Tony; +Cc: linux-arch
On Tue, 2006-03-07 at 11:06 -0800, Luck, Tony wrote:
> John,
>
> Thanks for the detailed reply.
>
> On Mon, Mar 06, 2006 at 05:35:46PM -0800, john stultz wrote:
> > I'd be interested in hearing more about specifically what ia64 has done
> > (reading the fsyscall asm is not my idea of fun :)
>
> It is at least heavily commented asm ... but yes, perhaps not the
> most fun way to spend your time. Essentially it just hooks into
> the generic time interpolator code (but without actually calling
> any of those functions due to restrictions on what fsyscall code
> can do).
>
> > hope is to join these divergent efforts to the benefit of all. This may
> > sound a bit naive and wide-eyed, and I'm fine allowing for arch specific
> > optimizations where they are necessary, but really, what are we doing in
> > almost all cases? Reading some hardware, converting it to nanoseconds
> > and adding it to a base value, all under a seq_read_lock.
>
> Yes, that's pretty much what happens ... the only real complication
> is that different h/w might require different access methods to read it.
Yes, and the clocksource abstraction provides the generic method to
access the hardware.
> > We don't need
> > a dozen implementations of this, and it makes other features harder to
> > implement because we don't know things like: Which arches will function
> > if we disable interrupts for a bit? Or what is the hardware level
> > resolution of clock_gettime()?
>
> The fly in the ointment here is the restrictions on what can be done
> inside the fsyscall handler (see Documentation/ia64/fsys.txt for the
> gory details ... but the short form is that function calls are out,
> so everything needs to be done, carefully, in assembler).
So this was discussed at length early in the design phase w/ Christoph
Lameter; as a result, the clocksource abstraction is intentionally similar
to the time_interpolator structure. However, Ingo did not like exposing
the access type and hardware pointers (which would allow the limited
fsyscall asm code to access the hardware) inside the clocksource
structure, so they were removed.
While I'm ok with adding that raw access info back into the structure
if it's a deal breaker, I'd first ask why ia64 must use this very
constrained fsyscall method instead of something more flexible where it
doesn't have to be written in asm like vsyscall/VDSO which x86-64 and
powerpc use?
I don't know exactly the details of the fsyscall feature, but since
vsyscalls are done completely in userspace, it might even be more
efficient. Though let me know if that would not be the case.
> > The last time I generated numbers for i386 the patch hit gettimeofday()
> > by ~2%, which was the worst case I could generate using the clocksource
> > with the lowest overhead (TSC). Most of this was due to some extra u64
> > usage and the lack of a generic mul_u64xu32 wrapper. However for this
> > cost, you get correct behavior (which I think is *much* more important,
> > at least for i386) and nanosecond resolution in clock_gettime().
>
> Fast is indeed no use if the answer is wrong. Performance of this
> on ia64 would be totally dependent on whether we can hook into your
> framework while staying within the restrictions of fsyscall code.
>
> > Currently I suspect the impact is a bit worse with the patches in -mm,
> > since cycle_t was set back to a u64 to be extra robust in the case of 2
> > seconds of lost ticks. That's more of a -RT tree concern, so I'd be fine
> > setting that back to an unsigned long for mainline.
>
> With Xen (and other) virtualized environments, you may need to keep that
> capability to handle 2 seconds of lost ticks. I haven't seen any upper
> bound for how long the hypervisor may starve a guest OS!
Well, I know some clock sources like the ACPI PM timer wrap every 5
seconds, so in that case Xen guests might need some lower frequency
virtualized clocksource driver which wouldn't be too hard to implement
and would keep any complications out of the common code.
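The wrap arithmetic here is easy to check: a free-running counter wraps
after (mask + 1) / frequency seconds, so the 24-bit ACPI PM timer at
3.579545 MHz wraps in just under 4.7 seconds (a quick sketch with a
hypothetical helper name):

```c
#include <stdint.h>

/* Nanoseconds until a free-running counter of the given width wraps. */
uint64_t counter_wrap_ns(uint64_t mask, uint32_t freq_hz)
{
    return (mask + 1) * 1000000000ULL / freq_hz;
}
```

counter_wrap_ns((1ULL << 24) - 1, 3579545) comes out around 4.69e9 ns,
the "every 5 seconds" figure above; any accumulation interval (including
hypervisor-induced gaps) must stay comfortably below that for the
delta-based math to stay correct.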
thanks
-john
* Re: Request for feedback on Generic Timeofday Subsystem (B20)
2006-03-07 19:37 ` john stultz
@ 2006-03-07 22:40 ` Luck, Tony
2006-03-07 23:01 ` john stultz
0 siblings, 1 reply; 11+ messages in thread
From: Luck, Tony @ 2006-03-07 22:40 UTC (permalink / raw)
To: john stultz; +Cc: linux-arch
> > The fly in the ointment here is the restrictions on what can be done
> > inside the fsyscall handler (see Documentation/ia64/fsys.txt for the
> > gory details ... but the short form is that function calls are out,
> > so everything needs to be done, carefully, in assembler).
>
> So this was discussed at length early in the design phase w/ Christoph
> Lameter as a result the clocksource abstraction is intentionally similar
> to the time_interpolator structure. However, Ingo did not like exposing
> the access type and hardware pointers (which would allow the limited
> fsyscall asm code to access the hardware) inside the clocksource
> structure, so they were removed.
>
> While I'm ok with adding that raw access info back into the structure
> if it's a deal breaker, I'd first ask why ia64 must use this very
> constrained fsyscall method instead of something more flexible where it
> doesn't have to be written in asm like vsyscall/VDSO which x86-64 and
> powerpc use?
>
> I don't know exactly the details of the fsyscall feature, but since
> vsyscalls are done completely in userspace, it might even be more
> efficient. Though let me know if that would not be the case.
fsyscall on ia64 makes use of the "epc" (enter privileged code) instruction
which (when placed in a page with a magic TLB attribute) will increase
privilege level without the overhead of a trap. All the restrictions
on what can be done are there to keep the overhead as low as possible
(a "fast" system call path should after all be fast). Some numbers
on how fast: cold cache call to gettimeofday() ~700 cycles, hot cache
case ~160 cycles (on a 1.7GHz system).
I can't quite see how gettimeofday() can be correctly implemented
purely in userspace on a system where there is jitter in the clock source,
but I'm clueless about how vsyscall/VDSO works.
-Tony
* Re: Request for feedback on Generic Timeofday Subsystem (B20)
2006-03-07 22:40 ` Luck, Tony
@ 2006-03-07 23:01 ` john stultz
2006-03-08 21:33 ` Luck, Tony
0 siblings, 1 reply; 11+ messages in thread
From: john stultz @ 2006-03-07 23:01 UTC (permalink / raw)
To: Luck, Tony; +Cc: linux-arch
On Tue, 2006-03-07 at 14:40 -0800, Luck, Tony wrote:
> > While I'm ok with adding that raw access info back into the structure
> > if it's a deal breaker, I'd first ask why ia64 must use this very
> > constrained fsyscall method instead of something more flexible where it
> > doesn't have to be written in asm like vsyscall/VDSO which x86-64 and
> > powerpc use?
>
> I can't quite see how gettimeofday() can be correctly implemented
> purely in userspace on a system where there is jitter in the clock source,
> but I'm clueless about how vsyscall/VDSO works.
You are right there. The jitter handling (if I recall, basically a
cmpxchg w/ the last read cycle value to be sure the clocksource doesn't
go backward) wouldn't be doable in userspace, but it seems that would
already be a pretty bad hit on performance. Is it not? And how many
systems actually use unsynced/jittery ITCs instead of alternative
mmio-based clocksources?
Regardless, if it's really a blocking issue, I'm not opposed to putting
the direct access methods back into the structure, or going with an
alternative solution to make these bits doable. Ingo might have a better
idea for this as well.
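For reference, the kind of cmpxchg guard under discussion, clamping each
read against the highest cycle value any CPU has returned so far, might
look like this user-space sketch with C11 atomics (not the actual ia64
code):

```c
#include <stdatomic.h>
#include <stdint.h>

/* Highest counter value handed out so far, shared by all CPUs. */
static _Atomic uint64_t last_cycle;

/* Return a value that never goes backwards, even if per-CPU counters
 * are slightly out of sync: either advance the shared high-water mark
 * with a compare-and-swap, or clamp to it. */
uint64_t read_monotonic_cycles(uint64_t hw_value)
{
    uint64_t last = atomic_load(&last_cycle);

    for (;;) {
        if (hw_value <= last)
            return last;        /* this CPU's counter lags: clamp */
        if (atomic_compare_exchange_weak(&last_cycle, &last, hw_value))
            return hw_value;    /* we advanced the shared value */
        /* CAS failed: 'last' now holds the newer value; recheck */
    }
}
```

The shared cacheline behind last_cycle is exactly why this gets painful
as the CPU count grows, which is the NUMA concern raised in this thread.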
Do you have any other issues or questions?
thanks
-john
* Re: Request for feedback on Generic Timeofday Subsystem (B20)
2006-03-07 23:01 ` john stultz
@ 2006-03-08 21:33 ` Luck, Tony
0 siblings, 0 replies; 11+ messages in thread
From: Luck, Tony @ 2006-03-08 21:33 UTC (permalink / raw)
To: john stultz; +Cc: linux-arch
On Tue, Mar 07, 2006 at 03:01:58PM -0800, john stultz wrote:
> You are right there. The jitter handling (if I recall, basically a
> cmpxchg w/ the last read cycle value to be sure the clocksource doesn't
> go backward) wouldn't be doable in userspace, but it seems that would
> already be a pretty bad hit on performance. Is it not? And how many
> systems actually use unsynced/jittery ITCs instead of alternative
> mmio-based clocksources?
Yes, the cmpxchg can be painful on a large NUMA ... so those systems tend
to use a non-jittery source. Smaller machines may not have access (as
BIOS may not describe where the HPET is), and there is also a tradeoff
since reading the "ar.itc" register is much faster[1] than reading some
bus-based clock ... so the cmpxchg may not hurt too badly if you only
have a few cpus.
> Regardless, if it's really a blocking issue, I'm not opposed to putting
> the direct access methods back into the structure, or going with an
> alternative solution to make these bits doable. Ingo might have a better
> idea for this as well.
I'm always open to a "better idea" ... but if one of those fails to show up
then I'd like to have the direct access methods.
> Do you have any other issues or questions?
Not at this time.
-Tony
[1] Ok, "is less slow than" (reading ar.itc isn't "fast" either).
* Re: Request for feedback on Generic Timeofday Subsystem (B20)
2006-03-10 1:15 ` john stultz
@ 2006-03-09 18:18 ` Andi Kleen
2006-03-18 0:35 ` john stultz
0 siblings, 1 reply; 11+ messages in thread
From: Andi Kleen @ 2006-03-09 18:18 UTC (permalink / raw)
To: john stultz; +Cc: Luck, Tony, linux-arch
On Friday 10 March 2006 02:15, john stultz wrote:
> On an x86-64 AMD server using nopmtimer/clocksource=tsc:
> mainline vs TOD
> gettimeofday(): 74.2%
> clock_gettime(CLOCK_MONOTONIC): 77.4%
> clock_gettime(CLOCK_REALTIME): 71.6%
>
> Hmmmm. I'm heading out of town for the weekend in a few moments, and I'd
> really like to re-verify those numbers, but yea, that's a 25%
> improvement. Might be too good to be true, but that should get Andi's
> attention :)
What is clocksource=tsc? It doesn't exist on 64bit kernels. And what are the absolute
numbers?
And clock_gettime uses a completely different path from gettimeofday so if they
have the same percentage your results look somewhat suspicious.
-Andi
* RE: Request for feedback on Generic Timeofday Subsystem (B20)
2006-03-07 1:35 ` john stultz
2006-03-07 19:06 ` Luck, Tony
@ 2006-03-10 1:15 ` john stultz
2006-03-09 18:18 ` Andi Kleen
1 sibling, 1 reply; 11+ messages in thread
From: john stultz @ 2006-03-10 1:15 UTC (permalink / raw)
To: Luck, Tony; +Cc: Andi Kleen, linux-arch
On Mon, 2006-03-06 at 17:35 -0800, john stultz wrote:
> The last time I generated numbers for i386 the patch hit gettimeofday()
> by ~2%, which was the worst case I could generate using the clocksource
> with the lowest overhead (TSC). Most of this was due to some extra u64
> usage and the lack of a generic mul_u64xu32 wrapper. However for this
> cost, you get correct behavior (which I think is *much* more important,
> at least for i386) and nanosecond resolution in clock_gettime().
Ok, so here are some rough numbers calling the various time interfaces
all using the fastest clocksource available (TSC) for timekeeping to
give the best code vs code comparison.
On my p4m laptop w/ idle=poll and clocksource/clock=tsc
mainline vs TOD
gettimeofday(): 106.5%
clock_gettime(CLOCK_MONOTONIC): 99.1%
clock_gettime(CLOCK_REALTIME): 100.1%
So, a 6% hit on gtod, the trade-off being more robust timekeeping, and
you get nanosecond resolution from clock_gettime for free.
Again, on my laptop w/ idle=poll and clocksource/clock=tsc + minor
performance tweaks of moving cycle_t back to an unsigned long.
mainline vs TOD
gettimeofday(): 103.3%
clock_gettime(CLOCK_MONOTONIC): 98.2%
clock_gettime(CLOCK_REALTIME): 99.0%
Getting closer to calling it a wash.
On an x86-64 AMD server using nopmtimer/clocksource=tsc:
mainline vs TOD
gettimeofday(): 74.2%
clock_gettime(CLOCK_MONOTONIC): 77.4%
clock_gettime(CLOCK_REALTIME): 71.6%
Hmmmm. I'm heading out of town for the weekend in a few moments, and I'd
really like to re-verify those numbers, but yea, that's a 25%
improvement. Might be too good to be true, but that should get Andi's
attention :)
thanks
-john
* Re: Request for feedback on Generic Timeofday Subsystem (B20)
2006-03-09 18:18 ` Andi Kleen
@ 2006-03-18 0:35 ` john stultz
0 siblings, 0 replies; 11+ messages in thread
From: john stultz @ 2006-03-18 0:35 UTC (permalink / raw)
To: Andi Kleen; +Cc: linux-arch
On Thu, 2006-03-09 at 19:18 +0100, Andi Kleen wrote:
> On Friday 10 March 2006 02:15, john stultz wrote:
>
> > On an x86-64 AMD server using nopmtimer/clocksource=tsc:
> > mainline vs TOD
> > gettimeofday(): 74.2%
> > clock_gettime(CLOCK_MONOTONIC): 77.4%
> > clock_gettime(CLOCK_REALTIME): 71.6%
> >
> > Hmmmm. I'm heading out of town for the weekend in a few moments, and I'd
> > really like to re-verify those numbers, but yea, that's a 25%
> > improvement. Might be too good to be true, but that should get Andi's
> > attention :)
>
> What is clocksource=tsc? It doesn't exist on 64bit kernels.
It's part of the new clocksource infrastructure, which is arch generic.
> And clock_gettime uses a completely different path from gettimeofday so if they
> have the same percentage your results look somewhat suspicious.
Yeah, it was too good to be true. I went back to confirm the numbers and
realized there were config differences. Using the same config didn't
give such interesting results, but I haven't had a chance to look into
it any closer. I've been busy reworking the patches (watch for them
later tonight) to try to achieve a more evolutionary approach as you
suggested. Once I get x86-64 back up and running I'll get some more hard
numbers for you to look at.
Sorry for the sensationalism. :)
-john