* Re: do_gettimeofday vs. rdtsc in the scheduler [not found] ` <20020917.133933.69057655.davem@redhat.com.suse.lists.linux.kernel> @ 2002-09-17 21:00 ` Andi Kleen 2002-09-17 20:54 ` David S. Miller 0 siblings, 1 reply; 29+ messages in thread From: Andi Kleen @ 2002-09-17 21:00 UTC (permalink / raw) To: David S. Miller; +Cc: linux-kernel, johnstul, anton.wilson "David S. Miller" <davem@redhat.com> writes: > From: john stultz <johnstul@us.ibm.com> > Date: 17 Sep 2002 13:29:18 -0700 > > Some NUMA boxes do not have synced TSC, so on those systems your > code won't work. > > It would have been really nice if x86 had specified a "system tick" > register that incremented based upon the system bus cycles and thus > were immune to the processor rates. It has - the local APIC timer. It has a tick register too that you can read. Unfortunately it's buggy/unreliable on many systems. Linux uses it for task scheduling and the local timer interrupt when it works, but it's not really good enough for gettimeofday. Microsoft/Intel have specified the HPET timer as replacement, but it is still missing in many chipsets and buggy in others. Also reading HPET is somewhat more costly than reading TSCs because it goes to the southbridge, so there are cases where using TSC is probably better (e.g. I think for networking packet time stamping the TSC is just fine with all its limitations) > I foresee lots of patches coming which basically are "how does this > x86 system provide a stable synchronized tick source". From those who didn't implement HPET but some spec of their own, like IBM. -Andi ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: do_gettimeofday vs. rdtsc in the scheduler 2002-09-17 21:00 ` do_gettimeofday vs. rdtsc in the scheduler Andi Kleen @ 2002-09-17 20:54 ` David S. Miller 2002-09-17 21:28 ` Alan Cox 0 siblings, 1 reply; 29+ messages in thread From: David S. Miller @ 2002-09-17 20:54 UTC (permalink / raw) To: ak; +Cc: linux-kernel, johnstul, anton.wilson From: Andi Kleen <ak@suse.de> Date: 17 Sep 2002 23:00:38 +0200 Also reading HPET is somewhat more costly than reading TSCs because it goes to the southbridge, so there are cases where using TSC is probably better (e.g. I think for networking packet time stamping the TSC is just fine with all its limitations) The cpu gets a bus clock input, so the system tick should be processor local as much as TSC is. It's boggling that this is being messed up so much. I can't believe Sun got something incredibly right (Ultra-III has a system tick) :-) ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: do_gettimeofday vs. rdtsc in the scheduler 2002-09-17 20:54 ` David S. Miller @ 2002-09-17 21:28 ` Alan Cox 2002-09-17 21:18 ` David S. Miller 0 siblings, 1 reply; 29+ messages in thread From: Alan Cox @ 2002-09-17 21:28 UTC (permalink / raw) To: David S. Miller; +Cc: ak, linux-kernel, johnstul, anton.wilson On Tue, 2002-09-17 at 21:54, David S. Miller wrote: > The cpu gets a bus clock input, so the system tick should be processor > local as much as TSC is. > > It's boggling that this is being messed up so much. I can't believe > Sun got something incredibly right (Ultra-III has a system tick) :-) A bus clock - but things like the x440 have more than one bus clock. It's NUMA. Also the bus clock and rdtsc clock are different - rdtsc is dependent on the multiplier. Shove a Celeron 300 and a Celeron 450 in a BP6 board with TSC on and enjoy. ^ permalink raw reply [flat|nested] 29+ messages in thread
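[Editor's sketch] Alan's multiplier point can be made concrete with a little arithmetic. The following is only an illustration in plain C, not kernel code: the helper names tsc_ticks() and tsc_to_us() are made up for this note, and the 300 MHz and 450 MHz figures are simply the nominal core clocks from his example. It shows what happens when a TSC delta taken on one CPU is converted to time using the calibration of the other.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: cycles a TSC accumulates over `elapsed_us`
 * microseconds on a core clocked at `core_mhz`.
 * (1 MHz is exactly one cycle per microsecond.)
 */
static uint64_t tsc_ticks(uint64_t core_mhz, uint64_t elapsed_us)
{
    return core_mhz * elapsed_us;
}

/*
 * Hypothetical helper: convert a TSC delta back to microseconds,
 * assuming the core runs at `assumed_mhz`. If the delta came from a
 * CPU with a different multiplier, the result is simply wrong.
 */
static uint64_t tsc_to_us(uint64_t ticks, uint64_t assumed_mhz)
{
    assert(assumed_mhz != 0);
    return ticks / assumed_mhz;
}
```

With these helpers, one real second measured on the 450 MHz part but converted with the 300 MHz calibration comes out as 1.5 seconds, which is the kind of surprise a scheduler would see if a task migrated between the two CPUs between readings.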
* Re: do_gettimeofday vs. rdtsc in the scheduler 2002-09-17 21:28 ` Alan Cox @ 2002-09-17 21:18 ` David S. Miller 2002-09-17 22:02 ` James Cleverdon 0 siblings, 1 reply; 29+ messages in thread From: David S. Miller @ 2002-09-17 21:18 UTC (permalink / raw) To: alan; +Cc: ak, linux-kernel, johnstul, anton.wilson From: Alan Cox <alan@lxorguk.ukuu.org.uk> Date: 17 Sep 2002 22:28:12 +0100 A bus clock - but things like the x440 have more than one bus clock. Its NUMA. Also the bus clock and rdtsc clock are different - rdtsc is dependant on the multiplier. Shove a celeron 300 and a celeron 450 in a BP6 board with tsc on and enjoy That's mostly my point. If the bus clocks differ, then great create some system wide crystal oscillator. That's a detail, the important bit is that you don't need to go out to the system bus to read the tick value, it must be cpu local to be effective and without serious performance impact. ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: do_gettimeofday vs. rdtsc in the scheduler 2002-09-17 21:18 ` David S. Miller @ 2002-09-17 22:02 ` James Cleverdon 2002-09-17 22:44 ` Andi Kleen 2002-09-18 6:40 ` Vojtech Pavlik 0 siblings, 2 replies; 29+ messages in thread From: James Cleverdon @ 2002-09-17 22:02 UTC (permalink / raw) To: David S. Miller, alan; +Cc: ak, linux-kernel, johnstul, anton.wilson On Tuesday 17 September 2002 02:18 pm, David S. Miller wrote: > From: Alan Cox <alan@lxorguk.ukuu.org.uk> > Date: 17 Sep 2002 22:28:12 +0100 > > A bus clock - but things like the x440 have more than one bus clock. Its > NUMA. Also the bus clock and rdtsc clock are different - rdtsc is > dependant on the multiplier. Shove a celeron 300 and a celeron 450 in a > BP6 board with tsc on and enjoy > > That's mostly my point. > > If the bus clocks differ, then great create some system wide crystal > oscillator. That's a detail, the important bit is that you don't need > to go out to the system bus to read the tick value, it must be cpu > local to be effective and without serious performance impact. > - It's more than just a detail. Sequent's last NUMA system (_not_ the NUMA-Q; never released) did exactly what you suggest. The midplane card generated the bus clock for all quad modules. We had requested this feature because it was such a pain dealing with clock drift between nodes in the OS. The HW guys were able to give us synchronized bus clocks on a 16-way box, but warned us that it would not be practical on the 256-way. Too much clock skew at those speeds, or something like that. I suppose you could trade off interconnect rate for clock sync, but then performance would suffer. I don't know how Sun and SGI manage with their larger systems. Either they don't do clock sync, or they may have to make expensive tradeoffs. Interestingly, Intel's IA64 manual does not guarantee that the CPU clock (and thus its TSC register) has anything to do with the bus clock rate. 
Maybe they want to dabble with asynchronous logic or multiple clock domains in future CPUs. Trivia: NUMA-Q systems running Dynix/PTX can contain quads running at very different CPU speeds. This made locating some race conditions quite easy. -- James Cleverdon IBM xSeries Linux Solutions {jamesclv(Unix, preferred), cleverdj(Notes)} at us dot ibm dot com ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: do_gettimeofday vs. rdtsc in the scheduler 2002-09-17 22:02 ` James Cleverdon @ 2002-09-17 22:44 ` Andi Kleen 2002-09-17 22:38 ` David S. Miller 2002-09-18 6:40 ` Vojtech Pavlik 1 sibling, 1 reply; 29+ messages in thread From: Andi Kleen @ 2002-09-17 22:44 UTC (permalink / raw) To: James Cleverdon Cc: David S. Miller, alan, ak, linux-kernel, johnstul, anton.wilson > I don't know how Sun and SGI manage with their larger systems. Either they > don't do clock sync, or they may have to make expensive tradeoffs. I guess you could always run NTP between the different CPUs ;) ;) -Andi ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: do_gettimeofday vs. rdtsc in the scheduler 2002-09-17 22:44 ` Andi Kleen @ 2002-09-17 22:38 ` David S. Miller 2002-09-17 22:55 ` James Cleverdon 0 siblings, 1 reply; 29+ messages in thread From: David S. Miller @ 2002-09-17 22:38 UTC (permalink / raw) To: ak; +Cc: jamesclv, alan, linux-kernel, johnstul, anton.wilson From: Andi Kleen <ak@suse.de> Date: Wed, 18 Sep 2002 00:44:42 +0200 > I don't know how Sun and SGI manage with their larger systems. Either they > don't do clock sync, or they may have to make expensive tradeoffs. I guess you could always run NTP between the different CPUs ;) ;) :-) More seriously, you don't need to have the cpu tick registers sync'd, it is the rate that matters. Once booted, you can sync these system tick registers with a pretty straightforward algorithm in the kernel. Bonus points if you can figure out how to cancel out the cost of moving the system tick sample cachelines between master and slave in your algorithm :-) ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: do_gettimeofday vs. rdtsc in the scheduler 2002-09-17 22:38 ` David S. Miller @ 2002-09-17 22:55 ` James Cleverdon 2002-09-17 23:12 ` David S. Miller 0 siblings, 1 reply; 29+ messages in thread From: James Cleverdon @ 2002-09-17 22:55 UTC (permalink / raw) To: David S. Miller, ak; +Cc: alan, linux-kernel, johnstul, anton.wilson On Tuesday 17 September 2002 03:38 pm, David S. Miller wrote: > From: Andi Kleen <ak@suse.de> > Date: Wed, 18 Sep 2002 00:44:42 +0200 > > > I don't know how Sun and SGI manage with their larger systems. Either > > they don't do clock sync, or they may have to make expensive > > tradeoffs. > > I guess you could always run NTP between the different CPUs ;) ;) > > :-) > > More seriously, you don't need to have the cpu tick registers sync'd, > it is the rate that matters. > > Once booted, you can sync these system tick registers with a pretty > straight forward algorithm in the kernel. Bonus points if you can > figure out how to cancel out the cost of moving the system tick sample > cachelines between master and slave in your algorithm :-) Been there. Done that. Had the product canceled. ;^) The initial sync was easy, even with variable latencies on cache lines. A much simplified NTP-ish algorithm works fine. The painful thing was bus clock drift and programs that foolishly relied on the TSC being the same between CPUs and between nodes. -- James Cleverdon IBM xSeries Linux Solutions {jamesclv(Unix, preferred), cleverdj(Notes)} at us dot ibm dot com ^ permalink raw reply [flat|nested] 29+ messages in thread
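[Editor's sketch] The "much simplified NTP-ish algorithm" is not spelled out in the thread, so what follows is only a guess at its general shape, written as plain C rather than kernel code. The struct and function names are hypothetical. The idea: the slave CPU reads its own counter, fetches the master's counter, then reads its own counter again. Assuming the transfer latency is roughly symmetric, the master's value corresponds to the midpoint of the two slave readings. Taking the sample with the smallest round trip is one common way to cope with variable cache-line latency.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One hypothetical master/slave exchange, all values in counter ticks. */
struct sync_sample {
    uint64_t slave_before; /* slave counter just before asking the master */
    uint64_t master;       /* master counter value that came back */
    uint64_t slave_after;  /* slave counter once the reply was observed */
};

/*
 * Estimate (master - slave) from the sample with the shortest round
 * trip, assuming the master was read halfway through the exchange.
 */
static int64_t estimate_offset(const struct sync_sample *s, size_t n)
{
    size_t best = 0;
    uint64_t midpoint;

    assert(n > 0);
    for (size_t i = 1; i < n; i++) {
        uint64_t rtt_i = s[i].slave_after - s[i].slave_before;
        uint64_t rtt_best = s[best].slave_after - s[best].slave_before;

        if (rtt_i < rtt_best)
            best = i;
    }
    midpoint = s[best].slave_before +
               (s[best].slave_after - s[best].slave_before) / 2;
    /* Unsigned subtraction wraps; the cast recovers a negative offset. */
    return (int64_t)(s[best].master - midpoint);
}
```

This only handles the initial offset. As James notes, the harder problem is that the bus clocks keep drifting afterwards, so the estimate goes stale unless it is refreshed.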
* Re: do_gettimeofday vs. rdtsc in the scheduler 2002-09-17 22:55 ` James Cleverdon @ 2002-09-17 23:12 ` David S. Miller 2002-09-17 23:32 ` john stultz 0 siblings, 1 reply; 29+ messages in thread From: David S. Miller @ 2002-09-17 23:12 UTC (permalink / raw) To: jamesclv; +Cc: ak, alan, linux-kernel, johnstul, anton.wilson From: James Cleverdon <jamesclv@us.ibm.com> Date: Tue, 17 Sep 2002 15:55:52 -0700 The initial sync was easy, even with variable latencies on cache lines. A much simplified NTP-ish algorithm works fine. The painful thing was bus clock drift and programs that foolishly relied on the TSC being the same between CPUs and between nodes. This is why the gettimeofday implementation should use the system tick thing and also any profiling support in the C library should avoid TSC as well. For small stretches of code TSC can be used for very precise profiling but otherwise it is pretty useless by and large. ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: do_gettimeofday vs. rdtsc in the scheduler 2002-09-17 23:12 ` David S. Miller @ 2002-09-17 23:32 ` john stultz 2002-09-17 23:32 ` David S. Miller 0 siblings, 1 reply; 29+ messages in thread From: john stultz @ 2002-09-17 23:32 UTC (permalink / raw) To: David S. Miller; +Cc: James, ak, Alan Cox, lkml, anton.wilson On Tue, 2002-09-17 at 16:12, David S. Miller wrote: > From: James Cleverdon <jamesclv@us.ibm.com> > Date: Tue, 17 Sep 2002 15:55:52 -0700 > > The initial sync was easy, even with variable latencies on cache lines. A > much simplified NTP-ish algorithm works fine. The painful thing was bus > clock drift and programs that foolishly relied on the TSC being the same > between CPUs and between nodes. > > This is why the gettimeofday implementation should use the system tick > thing and also any profiling support in the C library should avoid > TSC as well. I think the point James is making is that on very large systems, you will get system tick skew as well. On one system I know of, the bus frequency is intentionally skewed slightly between nodes. This is what causes the TSCs to skew, and I believe would also cause this "system tick" to skew as well. Additionally, where is this system tick thing? You make it sound like it's a register in the cpu, and while the Ultra-III may have one, I'm unaware of a system/bus tick register on intel chips. Is it in some semi-documented MSR? I apologize for being confused, I'm just not sure if you're criticizing the code or the hardware. thanks -john ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: do_gettimeofday vs. rdtsc in the scheduler 2002-09-17 23:32 ` john stultz @ 2002-09-17 23:32 ` David S. Miller 2002-09-17 23:52 ` Andi Kleen 0 siblings, 1 reply; 29+ messages in thread From: David S. Miller @ 2002-09-17 23:32 UTC (permalink / raw) To: johnstul; +Cc: jamesclv, ak, alan, linux-kernel, anton.wilson From: john stultz <johnstul@us.ibm.com> Date: 17 Sep 2002 16:32:15 -0700 Additionally, where is this system tick thing? You make it sound like its a register in the cpu, and while the Ultra-III may have one, I'm unaware of a system/bus tick register on intel chips. Is it in some semi-documented MSR? It's in a register on Ultra-III. The whole point of this conversation, if you read my initial postings, is that "this should have been specified in the x86 architecture" I know full well it isn't currently :-) ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: do_gettimeofday vs. rdtsc in the scheduler 2002-09-17 23:32 ` David S. Miller @ 2002-09-17 23:52 ` Andi Kleen 2002-09-17 23:46 ` David S. Miller 0 siblings, 1 reply; 29+ messages in thread From: Andi Kleen @ 2002-09-17 23:52 UTC (permalink / raw) To: David S. Miller; +Cc: johnstul, jamesclv, ak, alan, linux-kernel, anton.wilson On Tue, Sep 17, 2002 at 04:32:46PM -0700, David S. Miller wrote: > From: john stultz <johnstul@us.ibm.com> > Date: 17 Sep 2002 16:32:15 -0700 > > Additionally, where is this system tick thing? You make it sound like > its a register in the cpu, and while the Ultra-III may have one, I'm > unaware of a system/bus tick register on intel chips. Is it in some > semi-documented MSR? > > It's in a register on Ultra-III. The whole point of this > conversation, if you read my initial postings, is that > "this should have been specified in the x86 architecture" > > I know full well it isn't currently :-) Sorry, it's wrong. The x86 architecture has several such registers (apic timers, 8253 timer, HPET [Microsoft requires this for new hardware that will be w*s certified]) They just all suck on various systems or in general. HPET is ok, but still not widespread enough. -Andi ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: do_gettimeofday vs. rdtsc in the scheduler 2002-09-17 23:52 ` Andi Kleen @ 2002-09-17 23:46 ` David S. Miller 2002-09-17 23:58 ` Andi Kleen 0 siblings, 1 reply; 29+ messages in thread From: David S. Miller @ 2002-09-17 23:46 UTC (permalink / raw) To: ak; +Cc: johnstul, jamesclv, alan, linux-kernel, anton.wilson From: Andi Kleen <ak@suse.de> Date: Wed, 18 Sep 2002 01:52:09 +0200 On Tue, Sep 17, 2002 at 04:32:46PM -0700, David S. Miller wrote: > I know full well it isn't currently :-) Sorry, it's wrong. The x86 architecture has several such registers Not in the processor, and not architecturally specified. All of the things you list are in the scope of things outside the cpu. ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: do_gettimeofday vs. rdtsc in the scheduler 2002-09-17 23:46 ` David S. Miller @ 2002-09-17 23:58 ` Andi Kleen 2002-09-17 23:51 ` David S. Miller 2002-09-19 11:20 ` Mikael Pettersson 0 siblings, 2 replies; 29+ messages in thread From: Andi Kleen @ 2002-09-17 23:58 UTC (permalink / raw) To: David S. Miller; +Cc: ak, johnstul, jamesclv, alan, linux-kernel, anton.wilson On Tue, Sep 17, 2002 at 04:46:49PM -0700, David S. Miller wrote: > From: Andi Kleen <ak@suse.de> > Date: Wed, 18 Sep 2002 01:52:09 +0200 > > On Tue, Sep 17, 2002 at 04:32:46PM -0700, David S. Miller wrote: > > I know full well it isn't currently :-) > > Sorry, it's wrong. The x86 architecture has several such registers > > Not in the processor, and not architectually specified. > > All of the things you list are in the scope of things outside > the cpu. The local APIC timer is specified in the Intel Manual volume 3 for example. It's an optional feature (CPUID), but pretty much everyone has it. -Andi ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: do_gettimeofday vs. rdtsc in the scheduler 2002-09-17 23:58 ` Andi Kleen @ 2002-09-17 23:51 ` David S. Miller 2002-09-18 0:05 ` Andi Kleen 2002-09-19 11:20 ` Mikael Pettersson 1 sibling, 1 reply; 29+ messages in thread From: David S. Miller @ 2002-09-17 23:51 UTC (permalink / raw) To: ak; +Cc: johnstul, jamesclv, alan, linux-kernel, anton.wilson From: Andi Kleen <ak@suse.de> Date: Wed, 18 Sep 2002 01:58:38 +0200 The local APIC timer is specified in the Intel Manual volume 3 for example. It's an optional feature (CPUID), but pretty much everyone has it. Is it internal or external to the processor? I.e., can it be in the southbridge or something? If yes, then I still hold my point. You shouldn't have to PIO to get a reliable timer value. ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: do_gettimeofday vs. rdtsc in the scheduler 2002-09-17 23:51 ` David S. Miller @ 2002-09-18 0:05 ` Andi Kleen 2002-09-18 1:04 ` James Cleverdon 2002-09-20 11:04 ` Maciej W. Rozycki 0 siblings, 2 replies; 29+ messages in thread From: Andi Kleen @ 2002-09-18 0:05 UTC (permalink / raw) To: David S. Miller; +Cc: ak, johnstul, jamesclv, alan, linux-kernel, anton.wilson On Tue, Sep 17, 2002 at 04:51:31PM -0700, David S. Miller wrote: > From: Andi Kleen <ak@suse.de> > Date: Wed, 18 Sep 2002 01:58:38 +0200 > > The local APIC timer is specified in the Intel Manual volume 3 for example. > It's an optional feature (CPUID), but pretty much everyone has it. > > It is internal or external to the processor? Ie. can it be in the > southbridge or something? If yes, then I still hold my point. Local Apic is in the cpu. -Andi ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: do_gettimeofday vs. rdtsc in the scheduler 2002-09-18 0:05 ` Andi Kleen @ 2002-09-18 1:04 ` James Cleverdon 2002-09-19 18:02 ` Andrea Arcangeli 2002-09-20 11:04 ` Maciej W. Rozycki 1 sibling, 1 reply; 29+ messages in thread From: James Cleverdon @ 2002-09-18 1:04 UTC (permalink / raw) To: Andi Kleen, David S. Miller Cc: ak, johnstul, alan, linux-kernel, anton.wilson On Tuesday 17 September 2002 05:05 pm, Andi Kleen wrote: > On Tue, Sep 17, 2002 at 04:51:31PM -0700, David S. Miller wrote: > > From: Andi Kleen <ak@suse.de> > > Date: Wed, 18 Sep 2002 01:58:38 +0200 > > > > The local APIC timer is specified in the Intel Manual volume 3 for > > example. It's an optional feature (CPUID), but pretty much everyone has > > it. > > > > It is internal or external to the processor? Ie. can it be in the > > southbridge or something? If yes, then I still hold my point. > > Local Apic is in the cpu. > > -Andi I believe you gents are going off at a tangent. Intel's current P4 manual says the local APIC timer is driven by the "bus clock". For serial APICs that was doubtless the APIC serial bus clock, which almost always was derived from the system clock. For P4 systems with the xAPIC in parallel mode, the only one available is the system bus. If a multi-node system doesn't have synchronized bus clocks, it doesn't matter which one you use. The time bases will drift relative to each other. It's even worse when the "Frequency Spreading" BIOS option is turned on. Then, the bus clocks are deliberately offset by as much as half a megahertz (doubtless to pass FCC or equivalent emission certifications). I don't know what Sun does with the Ultra SPARC 3's time counter. Maybe they have a separate clock input for it that runs at 1 MHz so skew and distribution is no problem. That's fine for Sun; they build their own CPUs and can put in whatever they want. The rest of us have to work with what we get from the different manufacturers. 
And, just about all of them use a value derived from the bus clock -- which might have drift in a multi-node system. That's where a better abstraction of the timer hardware would come in handy. It would use the PIT or TSC for 99% of boxes, and switch to special code for the weird ones. -- James Cleverdon IBM xSeries Linux Solutions {jamesclv(Unix, preferred), cleverdj(Notes)} at us dot ibm dot com ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: do_gettimeofday vs. rdtsc in the scheduler 2002-09-18 1:04 ` James Cleverdon @ 2002-09-19 18:02 ` Andrea Arcangeli 0 siblings, 0 replies; 29+ messages in thread From: Andrea Arcangeli @ 2002-09-19 18:02 UTC (permalink / raw) To: James Cleverdon Cc: Andi Kleen, David S. Miller, johnstul, alan, linux-kernel, anton.wilson On Tue, Sep 17, 2002 at 06:04:33PM -0700, James Cleverdon wrote: > have a separate clock input for it that runs at 1 MHz so skew and The clock input should be the same, or the counters can always drift out of sync if left running long enough. The timer generation is an analog thing, the reception is digital, so having a single timer guarantees no counter skew. If the precision we needed from the timer driving gettimeofday were 1 Hz, so 1 tick per second, you could make it scale perfectly without oscillations on a 256G box. You simply can't do that with a < 1 nanosecond tick period on more than a few cpus, because of physics, or you get what's been mentioned a number of times on this thread (oscillations generated by the latency of the signal delivery or further slowdown in accessing the information with overhead in the interconnects). The best hardware solution to this problem is to have two cpu registers incremented by two timers: one is the regular cpu tick (TSC) that we have today, which could even go away with asynchronous cpus, and the other timer would be the new "real time timer", a 10/100 kHz clock delivered to all the cpus that increments such an in-cpu-core counter (so that it can be read from userspace too inside vgettimeofday and with extremely low latency, exactly like the current tsc, but driven by such a secondary low frequency timer that will tell us about the time changes). 10/100 usec should be much more than enough margin to deliver this timer to all the hundred cpus with a very small oscillation. And no software that I'm aware of needs a time-of-day precision finer than 10/100 usec. 
An interrupt itself is going to take some usec. A context switch as well is going to take more than 10 usec, and that's the important bit to guarantee gettimeofday is monotonic: different threads can have a minor difference in the perception of the time, dominated by the speed-of-light delivery of the timer signal, and that's not a problem as long as it's monotonic. The TSC and also the system clock mentioned by Dave are way too fast to be kept synchronized in a NUMA box without introducing significant drifts and oscillations. If somebody really needs 1 usec resolution, he will first need vsyscalls to avoid enter/exit kernel latencies, likely he will need to run iopl with irq disabled, and so it should be ok to use the TSC in such a case with a specialized hacked kernel config option (with all the disclaimers that it would break if the cpu clock changes under you etc...) All mere mortals will be perfectly fine with a 100 kHz clock for gettimeofday. If Sun did a 1 MHz clock to achieve the above suggested design solution, then they did the optimal thing IMHO. Another approach would be to use separate timer sources per-cpu and to resynchronize every once in a while, at regular intervals that guarantee the drift does not exceed half the time of the shortest context switch, but it would need tedious software support with knowledge of very low-level hardware information, so I'd definitely prefer the previously mentioned solution that will require all hardware vendors to get it right or it won't work. Like it's happening now with the TSC, with the difference that the 100 kHz timer would be doable, while the TSC at 2 GHz isn't doable. Of course the cyclone timer and the HPET are the very next best thing the hardware vendors could provide us on x86, and of course you cannot do better than the cyclone and HPET without upgrading the cpu too, because the cpu is simply missing a register to avoid hitting the southbridge at every vgettimeofday. 
At least the good thing is that HPET is mapped in a mmio region so we don't need to enter kernel but only to access the southbridge from userspace and that saves a number of usec at every gettimeofday. All of this assumes gettimeofday is an important operation and that an additional cpu sequence counter and an additional numa-shared timer would payoff to make gettimeofday most efficient and most accurate on all class of machines. It would be also an option to replace the TSC with such new "real time counter" if adding a new counter is too expensive, the TSC is almost unusable in its current too high frequency form, it is useful only for microbenchmarking, so it's more a debugging facility than a production feature, while the other would be a really useful feature not only for debugging/benchmarking purposes. Andrea ^ permalink raw reply [flat|nested] 29+ messages in thread
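[Editor's sketch] Andrea's alternative (per-cpu timer sources, resynchronized often enough that the drift stays below half the shortest context switch) implies a simple bound on the resync interval. The C helper below is a back-of-the-envelope illustration only. The function name is made up, and the inputs (a 10 usec context switch, a relative drift given in parts per million) are assumptions chosen for the arithmetic, not measurements from any machine discussed here.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: longest interval between resyncs such that two
 * counters drifting apart at `drift_ppm` parts per million never
 * differ by more than half of `shortest_switch_ns`.
 */
static uint64_t max_resync_interval_ns(uint64_t shortest_switch_ns,
                                       uint64_t drift_ppm)
{
    uint64_t skew_budget_ns;

    assert(drift_ppm != 0);
    skew_budget_ns = shortest_switch_ns / 2;
    /* skew accumulated = interval * drift_ppm / 1e6, solved for interval */
    return skew_budget_ns * 1000000ULL / drift_ppm;
}
```

For example, with a 10 usec context switch and an assumed 100 ppm relative drift, the skew budget is 5 usec and the counters would need resyncing at least every 50 msec. That gives a feel for the "tedious software support" cost that the message weighs against a shared low-frequency hardware timer.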
* Re: do_gettimeofday vs. rdtsc in the scheduler 2002-09-18 0:05 ` Andi Kleen 2002-09-18 1:04 ` James Cleverdon @ 2002-09-20 11:04 ` Maciej W. Rozycki 1 sibling, 0 replies; 29+ messages in thread From: Maciej W. Rozycki @ 2002-09-20 11:04 UTC (permalink / raw) To: Andi Kleen Cc: David S. Miller, johnstul, jamesclv, alan, linux-kernel, anton.wilson On Wed, 18 Sep 2002, Andi Kleen wrote: > > It is internal or external to the processor? Ie. can it be in the > > southbridge or something? If yes, then I still hold my point. > > Local Apic is in the cpu. Except when it's an i82489DX... Rare but still. -- + Maciej W. Rozycki, Technical University of Gdansk, Poland + +--------------------------------------------------------------+ + e-mail: macro@ds2.pg.gda.pl, PGP key available + ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: do_gettimeofday vs. rdtsc in the scheduler 2002-09-17 23:58 ` Andi Kleen 2002-09-17 23:51 ` David S. Miller @ 2002-09-19 11:20 ` Mikael Pettersson 2002-09-19 13:27 ` Alan Cox 1 sibling, 1 reply; 29+ messages in thread From: Mikael Pettersson @ 2002-09-19 11:20 UTC (permalink / raw) To: Andi Kleen Cc: David S. Miller, johnstul, jamesclv, alan, linux-kernel, anton.wilson Andi Kleen writes: > On Tue, Sep 17, 2002 at 04:46:49PM -0700, David S. Miller wrote: > > From: Andi Kleen <ak@suse.de> > > Date: Wed, 18 Sep 2002 01:52:09 +0200 > > > > On Tue, Sep 17, 2002 at 04:32:46PM -0700, David S. Miller wrote: > > > I know full well it isn't currently :-) > > > > Sorry, it's wrong. The x86 architecture has several such registers > > > > Not in the processor, and not architectually specified. > > > > All of the things you list are in the scope of things outside > > the cpu. > > The local APIC timer is specified in the Intel Manual volume 3 for example. > It's an optional feature (CPUID), but pretty much everyone has it. Except that like everything else related to the local APIC, you're at the mercy of the competence (or lack thereof) of the BIOS implementors. - There are plenty of laptops whose CPUs have local APICs but whose BIOSen go berserk if you enable it. There are also plenty of laptops that don't have one, since Intel removed it from many Mobile P6 CPUs. - There are even some desktop boards with BIOS problems, including Intel's AL440LX on which Linux must stay away from the local APIC timer. To assume the local APIC works on 686-class UP boxes is not realistic, alas. /Mikael ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: do_gettimeofday vs. rdtsc in the scheduler 2002-09-19 11:20 ` Mikael Pettersson @ 2002-09-19 13:27 ` Alan Cox 2002-09-19 13:39 ` Mikael Pettersson 2002-09-20 15:26 ` John Levon 0 siblings, 2 replies; 29+ messages in thread From: Alan Cox @ 2002-09-19 13:27 UTC (permalink / raw) To: Mikael Pettersson Cc: Andi Kleen, David S. Miller, johnstul, James Cleverdon, linux-kernel, anton.wilson On Thu, 2002-09-19 at 12:20, Mikael Pettersson wrote: > > The local APIC timer is specified in the Intel Manual volume 3 for example. > > It's an optional feature (CPUID), but pretty much everyone has it. > > Except that like everything else related to the local APIC, you're at > the mercy of the competence (or lack thereof) of the BIOS implementors. > - There are plenty of laptops whose CPUs have local APICs but whose > BIOSen go berserk if you enable it. There are also plenty of laptops Frequently because we don't disable it again before any APM calls I suspect. When a CPU goes into sleep mode you must disable PMC and local apic timer interrupts. > that don't have one, since Intel removed it from many Mobile P6 CPUs. > - There are even some desktop boards with BIOS problems, including Intel's > AL440LX on which Linux must stay away from the local APIC timer. > > To assume the local APIC works on 686-class UP boxes is not realistic, alas. Yep ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: do_gettimeofday vs. rdtsc in the scheduler 2002-09-19 13:27 ` Alan Cox @ 2002-09-19 13:39 ` Mikael Pettersson 2002-09-20 15:26 ` John Levon 1 sibling, 0 replies; 29+ messages in thread From: Mikael Pettersson @ 2002-09-19 13:39 UTC (permalink / raw) To: Alan Cox Cc: Mikael Pettersson, Andi Kleen, David S. Miller, johnstul, James Cleverdon, linux-kernel, anton.wilson Alan Cox writes: > On Thu, 2002-09-19 at 12:20, Mikael Pettersson wrote: > > > The local APIC timer is specified in the Intel Manual volume 3 for example. > > > It's an optional feature (CPUID), but pretty much everyone has it. > > > > Except that like everything else related to the local APIC, you're at > > the mercy of the competence (or lack thereof) of the BIOS implementors. > > - There are plenty of laptops whose CPUs have local APICs but whose > > BIOSen go berserk if you enable it. There are also plenty of laptops > > Frequently because we don't disable it again before any APM calls I > suspect. When a CPU goes into sleep mode you must disable PMC and local > apic timer interrupts. We do on sane boxes where the APM BIOS informs us before suspending. E.g., on my ASUS P3B-F & P4T-E suspend works with local APIC enabled because I hooked both the NMI watchdog and local APIC to the PM system, so we disable before suspending and restore afterwards. The problem is that some BIOSen don't post the suspend event to our APM driver, so we fail to disable before suspend, and some BIOSen (like the utter crap Dell put in the Inspiron) die on all entries to the BIOS: pull the power cord -> #SMM event -> box crashes. /Mikael ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: do_gettimeofday vs. rdtsc in the scheduler 2002-09-19 13:27 ` Alan Cox 2002-09-19 13:39 ` Mikael Pettersson @ 2002-09-20 15:26 ` John Levon 1 sibling, 0 replies; 29+ messages in thread From: John Levon @ 2002-09-20 15:26 UTC (permalink / raw) To: linux-kernel On Thu, Sep 19, 2002 at 02:27:19PM +0100, Alan Cox wrote: > > - There are plenty of laptops whose CPUs have local APICs but whose > > BIOSen go berserk if you enable it. There are also plenty of laptops > > Frequently because we don't disable it again before any APM calls I > suspect. When a CPU goes into sleep mode you must disable PMC and local > apic timer interrupts. Isn't this exactly what apic_pm_suspend() does ? Or is that in 2.5 only ? regards john -- Support the project - http://www.gtonline.net/private/mapp/project/ ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: do_gettimeofday vs. rdtsc in the scheduler 2002-09-17 22:02 ` James Cleverdon 2002-09-17 22:44 ` Andi Kleen @ 2002-09-18 6:40 ` Vojtech Pavlik 2002-09-19 18:04 ` Andrea Arcangeli 1 sibling, 1 reply; 29+ messages in thread From: Vojtech Pavlik @ 2002-09-18 6:40 UTC (permalink / raw) To: James Cleverdon Cc: David S. Miller, alan, ak, linux-kernel, johnstul, anton.wilson On Tue, Sep 17, 2002 at 03:02:04PM -0700, James Cleverdon wrote: > On Tuesday 17 September 2002 02:18 pm, David S. Miller wrote: > > From: Alan Cox <alan@lxorguk.ukuu.org.uk> > > Date: 17 Sep 2002 22:28:12 +0100 > > > > A bus clock - but things like the x440 have more than one bus clock. Its > > NUMA. Also the bus clock and rdtsc clock are different - rdtsc is > > dependant on the multiplier. Shove a celeron 300 and a celeron 450 in a > > BP6 board with tsc on and enjoy > > > > That's mostly my point. > > > > If the bus clocks differ, then great create some system wide crystal > > oscillator. That's a detail, the important bit is that you don't need > > to go out to the system bus to read the tick value, it must be cpu > > local to be effective and without serious performance impact. > > - > > It's more than just a detail. Sequent's last NUMA system (_not_ the NUMA-Q; > never released) did exactly what you suggest. The midplane card generated > the bus clock for all quad modules. We had requested this feature because it > was such a pain dealing with clock drift between nodes in the OS. > > The HW guys were able to give us synchronized bus clocks on a 16-way box, but > warned us that it would not be practical on the 256-way. Too much clock skew > at those speeds, or something like that. I suppose you could trade off > interconnect rate for clock sync, but then performance would suffer. > > I don't know how Sun and SGI manage with their larger systems. Either they > don't do clock sync, or they may have to make expensive tradeoffs. 
> > Interestingly, Intel's IA64 manual does not guarantee that the CPU clock (and > thus its TSC register) has anything to do with the bus clock rate. Maybe > they want to dabble with asynchronous logic or multiple clock domains in > future CPUs. The point here is: You don't need a synchronized bus clock. You don't need synchronized CPU clocks. You need a synchronized system-wide clock that doesn't drive any bus or CPU, just a simple counter in every CPU that you can read from inside the CPU. You can pull that pretty far and to many CPUs. That's what I understand Sun does. -- Vojtech Pavlik SuSE Labs ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: do_gettimeofday vs. rdtsc in the scheduler 2002-09-18 6:40 ` Vojtech Pavlik @ 2002-09-19 18:04 ` Andrea Arcangeli 0 siblings, 0 replies; 29+ messages in thread From: Andrea Arcangeli @ 2002-09-19 18:04 UTC (permalink / raw) To: Vojtech Pavlik Cc: James Cleverdon, David S. Miller, alan, ak, linux-kernel, johnstul, anton.wilson On Wed, Sep 18, 2002 at 08:40:22AM +0200, Vojtech Pavlik wrote: > The point here is: You don't need a synchronized bus clock. You don't > need synchronized CPU clocks. You need a synchronized system-wide clock > that doesn't drive any bus or CPU, just a simple counter in every CPU > that you can read from inside the CPU. You can pull that pretty far and Exactly. Andrea ^ permalink raw reply [flat|nested] 29+ messages in thread
[parent not found: <200209172020.g8HKKPF13227@eng2.beaverton.ibm.com>]
* Re: Fwd: do_gettimeofday vs. rdtsc in the scheduler [not found] <200209172020.g8HKKPF13227@eng2.beaverton.ibm.com> @ 2002-09-17 20:29 ` john stultz 2002-09-17 20:39 ` David S. Miller 0 siblings, 1 reply; 29+ messages in thread From: john stultz @ 2002-09-17 20:29 UTC (permalink / raw) To: anton wilson; +Cc: lkml > I'm writing a patch for the scheduler that allows normal processes to run > occasionally even though real-time processes completely dominate the CPU. > In > order to do this the way I want to for a specific real-time application, I > need to keep track of the times that the schedule(void) function gets > called. > This time is then used to calculate the time difference between when a > normal > process was run last and the current time. I was trying to avoid > do_gettimeofday because of the overhead, but now I'm wondering if rdtsc on > an > SMP machine may mess up my readings because the TSC from two different > processors may be read. Am I right in assuming this? Secondly, any good > suggestions on how to proceed with my patch? Tread with caution. Some NUMA boxes do not have synced TSC, so on those systems your code won't work. Additionally, your code would need to take other technologies like speedstep into account as well. Alternatively, you might want to try using get_cycles, or some other semi-abstracted interface, so alternative time sources could be used in the future without having to re-write your code. I'm working on somewhat abstracting out time sources with my timer-changes patch, so take a peek at it and let me know if you have any suggestions. thanks -john ^ permalink raw reply [flat|nested] 29+ messages in thread
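The "semi-abstracted interface" idea in the message above can be illustrated with a small user-space sketch. This is not the interface from the timer-changes patch, which is not shown in this thread; `struct time_source`, `time_source_elapsed()` and the fake counter are hypothetical names chosen only for illustration. The point is that the caller asks a selected source for raw ticks and never touches the TSC directly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical time source descriptor; not taken from any kernel patch. */
struct time_source {
    const char *name;        /* label for the source, e.g. "tsc" or "hpet" */
    uint64_t hz;             /* counter frequency in ticks per second */
    uint64_t (*read)(void);  /* returns the current raw counter value */
};

/* Ticks elapsed since 'since', read through whichever source was selected. */
static uint64_t time_source_elapsed(const struct time_source *ts, uint64_t since)
{
    assert(ts != NULL && ts->read != NULL);
    /* Unsigned subtraction stays correct across one counter wrap. */
    return ts->read() - since;
}

/* Stand-in counter so the sketch runs in user space; real code would
 * read a hardware register here instead. */
static uint64_t fake_counter_value;

static uint64_t fake_counter_read(void)
{
    return fake_counter_value;
}

static const struct time_source fake_source = {
    "fake", 100000000ULL, fake_counter_read
};
```

With this shape, swapping the TSC for another counter means supplying a different `read` function and `hz` value; the code that computes intervals stays unchanged.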
* Re: do_gettimeofday vs. rdtsc in the scheduler 2002-09-17 20:29 ` Fwd: " john stultz @ 2002-09-17 20:39 ` David S. Miller 2002-09-17 20:57 ` john stultz 0 siblings, 1 reply; 29+ messages in thread From: David S. Miller @ 2002-09-17 20:39 UTC (permalink / raw) To: johnstul; +Cc: anton.wilson, linux-kernel From: john stultz <johnstul@us.ibm.com> Date: 17 Sep 2002 13:29:18 -0700 Some NUMA boxes do not have synced TSC, so on those systems your code won't work. It would have been really nice if x86 had specified a "system tick" register that incremented based upon the system bus cycles and thus were immune the processor rates. I foresee lots of patches coming which basically are "how does this x86 system provide a stable synchronized tick source". ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: do_gettimeofday vs. rdtsc in the scheduler 2002-09-17 20:39 ` David S. Miller @ 2002-09-17 20:57 ` john stultz 2002-09-17 20:56 ` David S. Miller 0 siblings, 1 reply; 29+ messages in thread From: john stultz @ 2002-09-17 20:57 UTC (permalink / raw) To: David S. Miller; +Cc: anton.wilson, lkml, george anzinger On Tue, 2002-09-17 at 13:39, David S. Miller wrote: > From: john stultz <johnstul@us.ibm.com> > Date: 17 Sep 2002 13:29:18 -0700 > > Some NUMA boxes do not have synced TSC, so on those systems your > code won't work. > > It would have been really nice if x86 had specified a "system tick" > register that incremented based upon the system bus cycles and thus > were immune the processor rates. Some systems do, if I'm understanding you properly. Summit based boxes have an on-chipset performance counter that runs at 100Mhz. My cyclone-timer patch uses this as a gettimeofday/__delay time source in the 2.4 kernel. Additionally George Anzinger has patches that allow the ACPI PM timer to be used as well. Intel's HPET should also provide another time source. > I foresee lots of patches coming which basically are "how does this > x86 system provide a stable synchronized tick source". True, but hopefully my timer-changes patch will allow for better abstraction around these varied time sources, so one won't really need to know how all of these different sources work. thanks -john ^ permalink raw reply [flat|nested] 29+ messages in thread
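For a fixed-rate counter such as the 100 MHz one mentioned above, turning a tick delta into microseconds is plain arithmetic. The sketch below is a user-space illustration only; `ticks_to_usec()` is a hypothetical helper, not code from the cyclone-timer patch, and the frequency is passed in rather than assumed.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Convert a tick count from a counter running at 'hz' ticks per second
 * into microseconds. Whole seconds and the remainder are converted
 * separately so the intermediate product does not overflow for typical
 * counter rates (assumes hz is well below about 1.8e13).
 */
static uint64_t ticks_to_usec(uint64_t ticks, uint64_t hz)
{
    assert(hz > 0);
    uint64_t whole_sec = ticks / hz;
    uint64_t rem_ticks = ticks % hz;
    return whole_sec * 1000000ULL + (rem_ticks * 1000000ULL) / hz;
}
```

At 100 MHz each microsecond is 100 ticks, so a delta of 250 ticks rounds down to 2 microseconds.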
* Re: do_gettimeofday vs. rdtsc in the scheduler 2002-09-17 20:57 ` john stultz @ 2002-09-17 20:56 ` David S. Miller 0 siblings, 0 replies; 29+ messages in thread From: David S. Miller @ 2002-09-17 20:56 UTC (permalink / raw) To: johnstul; +Cc: anton.wilson, linux-kernel, george From: john stultz <johnstul@us.ibm.com> Date: 17 Sep 2002 13:57:13 -0700 On Tue, 2002-09-17 at 13:39, David S. Miller wrote: > It would have been really nice if x86 had specified a "system tick" > register that incremented based upon the system bus cycles and thus > were immune the processor rates. Some systems do, if I'm understanding you properly. Summit based boxes have an on-chipset performance counter that runs at 100Mhz. My cyclone-timer patch uses this as a gettimeofday/__delay time source in the 2.4 kernel. Additionally George Anzinger has patches that allow the ACPI PM timer to be used as well. Intel's HPET should also provide another time source. If any of these need to go beyond the cpu to get the tick value, they are misimplemented. The cpu gets the system bus tick input at its bus pins, therefore it can implement the system tick register locally, obviating the need to go to a south bridge or memory controller or whatever else external to the cpu to get at the value. ^ permalink raw reply [flat|nested] 29+ messages in thread
* do_gettimeofday vs. rdtsc in the scheduler @ 2002-09-09 22:21 anton wilson 0 siblings, 0 replies; 29+ messages in thread From: anton wilson @ 2002-09-09 22:21 UTC (permalink / raw) To: linux-kernel I'm writing a patch for the scheduler that allows normal processes to run occasionally even though real-time processes completely dominate the CPU. In order to do this the way I want to for a specific real-time application, I need to keep track of the times that the schedule(void) function gets called. This time is then used to calculate the time difference between when a normal process was run last and the current time. I was trying to avoid do_gettimeofday because of the overhead, but now I'm wondering if rdtsc on an SMP machine may mess up my readings because the TSC from two different processors may be read. Am I right in assuming this? Secondly, any good suggestions on how to proceed with my patch? Thanks, Anton ^ permalink raw reply [flat|nested] 29+ messages in thread
Thread overview: 29+ messages
[not found] <200209172020.g8HKKPF13227@eng2.beaverton.ibm.com.suse.lists.linux.kernel>
[not found] ` <1032294559.22815.180.camel@cog.suse.lists.linux.kernel>
[not found] ` <20020917.133933.69057655.davem@redhat.com.suse.lists.linux.kernel>
2002-09-17 21:00 ` do_gettimeofday vs. rdtsc in the scheduler Andi Kleen
2002-09-17 20:54 ` David S. Miller
2002-09-17 21:28 ` Alan Cox
2002-09-17 21:18 ` David S. Miller
2002-09-17 22:02 ` James Cleverdon
2002-09-17 22:44 ` Andi Kleen
2002-09-17 22:38 ` David S. Miller
2002-09-17 22:55 ` James Cleverdon
2002-09-17 23:12 ` David S. Miller
2002-09-17 23:32 ` john stultz
2002-09-17 23:32 ` David S. Miller
2002-09-17 23:52 ` Andi Kleen
2002-09-17 23:46 ` David S. Miller
2002-09-17 23:58 ` Andi Kleen
2002-09-17 23:51 ` David S. Miller
2002-09-18 0:05 ` Andi Kleen
2002-09-18 1:04 ` James Cleverdon
2002-09-19 18:02 ` Andrea Arcangeli
2002-09-20 11:04 ` Maciej W. Rozycki
2002-09-19 11:20 ` Mikael Pettersson
2002-09-19 13:27 ` Alan Cox
2002-09-19 13:39 ` Mikael Pettersson
2002-09-20 15:26 ` John Levon
2002-09-18 6:40 ` Vojtech Pavlik
2002-09-19 18:04 ` Andrea Arcangeli
[not found] <200209172020.g8HKKPF13227@eng2.beaverton.ibm.com>
2002-09-17 20:29 ` Fwd: " john stultz
2002-09-17 20:39 ` David S. Miller
2002-09-17 20:57 ` john stultz
2002-09-17 20:56 ` David S. Miller
2002-09-09 22:21 anton wilson