* [RFC] New Time of day proposal (updated 9/2/04)
@ 2004-09-02 21:07 john stultz
2004-09-02 21:09 ` [RFC][PATCH] new timeofday core subsystem (v.A0) john stultz
` (3 more replies)
0 siblings, 4 replies; 84+ messages in thread
From: john stultz @ 2004-09-02 21:07 UTC (permalink / raw)
To: lkml
Cc: tim, george, albert, Ulrich.Windl, clameter, len.brown, linux,
davidm, ak, paulus, schwidefsky, jimix, keith maanthey, greg kh,
Patricia Gaughen, Chris McDermott
All,
Here again is my updated time of day proposal. Also to follow is
working code for i386. I wanted to get this out so folks could actually
play around with the core architecture independent code. The i386
specific bits function, but are unfinished. Thus they have been broken
out and should mostly be ignored for this release.
I'd really appreciate any feedback or concerns about this proposal and
the following code.
thanks
-john
Proposal for an architecture independent time of day implementation.
-------------------------------------------------------------------
John Stultz (johnstul@us.ibm.com)
DRAFT
Thu Sep 2 12:35:11 PDT 2004
Credits:
Keith Mannthey: Aided initial design.
Aided greatly to implementation details.
George Anzinger: Initial review and corrections.
Ulrich Windl: Review and suggestions for clarity.
Many of the time of day related issues that cropped up in 2.5
development occurred where a fix or change was made to a number of
architectures, but missed a few others. Currently every architecture has
its own set of timekeeping functions that basically do the same thing,
only using different (or frequently, not so different) types of
hardware. As hardware has changed, many architectures have had to
re-engineer their time system to handle multiple time and interrupt
sources. With little common infrastructure, either each separate
implementation has its own quirks and bugs, or we end up with a
reasonable quantity of duplicated code. Additionally the lack of a clear
time of day interface has led developers to use jiffies, HZ, and the raw
xtime values to calculate the time of day themselves. This has lead to a
number of troublesome bugs.
With the goal to simplify, streamline and consolidate the time-of-day
infrastructure, I propose the following common implementation across all
arches. This will allow generic bugs to be fixed once, reduce code
duplication, and with many architectures sharing the same time source,
this allows drivers to be written once for multiple architectures.
Additionally it will better delineate the lines between the timer
subsystem and the time-of-day subsystem, opening the door for more
flexible and better timekeeping.
Features of this design:
========================
o Splits time of day management from timer interrupts:
This is necessary for virtualization & tickless systems. It allows us
to no longer care how often clock_interrupt() is called. Missing, early
or lost interrupts do not affect time keeping (within bounds - ie: the
time source cannot overflow). This isolates HZ and jiffies to the timer
subsystem (mostly), as they are frequently and incorrectly used to
calculate time.
Additionally, it allows for dynamic tick interrupts / high-res ticks.
Avoid the need to interpolate between multiple shoddy time sources, and
lets us be agnostic to where the periodic interrupts come from (cleans
up i386 HPET interrupt code).
o Consolidates a large amount of code:
Allows for shared times source implementations, such as: i386, x86-64
and ia64 all use HPET, i386 and x86-64 both have ACPI PM timers, and
i386 and ia64 both have cyclone counters. Time sources are just drivers!
Also work for user space gettimeofday implementations will be able to be
shared across all arches (assuming the hardware time source can be
safely accessed from user space).
o Generic algorithms which use time-source drivers chosen at runtime:
Drivers are just simple hw accessors functions with no internal state
needed. They can be loaded and changed while the system is running, like
normal modules.
o More consistent and readable code:
Drop wall_to_monotonic & xtime in favor of a more simple system_time
and wall_time_offset variables. Where system_time is the monotonically
increasing nanoseconds since boot time and wall_time_offset is the
offset added to system_time to calculate time of day.
o Uses nanoseconds as the kernel's base time unit.
Rather then doing ugly manipulations to timevals or timespecs, this
simplifies math, and gives us plenty of room to grow (64bits of
nanoseconds ~= 584 years).
o Clearly separates the NTP code from the time code:
Creates a clean and clear interface, keeping all the NTP related code
in a single place. Save brains, normal people shouldn't have to think
about the in kernel ntp machinery.
Brief Psudo-code to illustrate the design:
==========================================
Globals:
--------
offset_base: timesource cycle value at last call to timeofday_hook()
system_time: time in ns calculated at last call to timeofday_hook()
wall_offset: offset to monotonic_clock() to get current time of day
Functions:
----------
timeofday_hook()
now = read(); /* read the timesource */
ns = cyc2ns(now - offset_base); /* calc nsecs since last call */
ntp_ns = ntp_scale(ns); /* apply ntp scaling */
system_time += ntp_ns; /* add scaled value to system_time */
ntp_advance(ns); /* advance ntp state machine by ns */
offset_base = now; /* set new offset_base */
monotonic_clock()
now = read(); /* read the timesource */
ns = cyc2ns(now - offset_base); /* calculate nsecs since last hook */
ntp_ns = ntp_scale(ns); /* apply ntp scaling */
return system_time + ntp_ns; /* return system_time and scaled value
*/
settimeofday(desired)
wall_offset = desired - monotonic_clock(); /* set wall offset */
gettimeofday()
return wall_offset + monotonic_clock(); /* return current timeofday */
Points I'm glossing over for now:
====================================================
o Have to convert back to time_val for syscall interface
o ntp_scale(ns): scales ns by NTP scaling factor
- see ntp.c for details
- costly, but correct.
o ntp_advance(ns): advances NTP state machine by ns
- see ntp.c for details
o What is the cost of throwing around 64bit values for everything?
- Do we need an arch specific time structure that varies size
accordingly?
o Some arches (arm, for example) do not have high res timing hardware
- In this case we can have a "jiffies" timesource
- cyc2ns(x) = x*(NSEC_PER_SEC/HZ)
- doesn't work for tickless systems
o vsyscalls/userspace gettimeofday()
- Mark functions and data w/ __vsyscall attribute
- Use linker to put all __vsyscall data in the same set of pages
- Mark those pages user-executable
- Should work for all arches
- ia64 fastcall should be used for timesource implementation only
o suspend/resume
- not yet implemented, but shouldn't be hard
Anything else? What am I missing or just being ignorant of?
^ permalink raw reply [flat|nested] 84+ messages in thread* [RFC][PATCH] new timeofday core subsystem (v.A0) 2004-09-02 21:07 [RFC] New Time of day proposal (updated 9/2/04) john stultz @ 2004-09-02 21:09 ` john stultz 2004-09-02 21:11 ` [RFC][PATCH] new timeofday i386 hooks (v.A0) john stultz ` (2 more replies) 2004-09-02 22:09 ` [RFC] New Time of day proposal (updated 9/2/04) Christoph Lameter ` (2 subsequent siblings) 3 siblings, 3 replies; 84+ messages in thread From: john stultz @ 2004-09-02 21:09 UTC (permalink / raw) To: lkml Cc: tim, george anzinger, albert, Ulrich.Windl, clameter, Len Brown, linux, David Mosberger, Andi Kleen, paulus, schwidefsky, jimix, keith maanthey, greg kh, Patricia Gaughen, Chris McDermott All, This patch implements the architecture independent portion of the time of day subsystem. Included is timeofday.c (which includes all the time of day management and accessor functions), ntp.c (which includes the ntp scaling code, leap second processing, and ntp kernel state machine code), interface definition .h files, the example jiffies timesource (lowest common denominator time source, mainly for use as example code) and minimal hooks into arch independent code. The patch does not function without minimal architecture specific hooks (i386 example to follow), and it can be applied to a tree without affecting the code. I look forward to your comments and feedback. thanks -john linux-2.6.9-rc1_timeofday-core_A0.patch ======================================= diff -Nru a/drivers/Makefile b/drivers/Makefile --- a/drivers/Makefile 2004-09-02 13:27:00 -07:00 +++ b/drivers/Makefile 2004-09-02 13:27:00 -07:00 @@ -52,3 +52,4 @@ obj-$(CONFIG_CPU_FREQ) += cpufreq/ obj-$(CONFIG_MMC) += mmc/ obj-y += firmware/ +obj-$(CONFIG_NEWTOD) += timesource/ diff -Nru a/drivers/timesource/Makefile b/drivers/timesource/Makefile --- /dev/null Wed Dec 31 16:00:00 196900 +++ b/drivers/timesource/Makefile 2004-09-02 13:27:00 -07:00 @@ -0,0 +1 @@ +obj-y += jiffies.o diff -Nru a/drivers/timesource/jiffies.c b/drivers/timesource/jiffies.c --- /dev/null Wed Dec 31 16:00:00 196900 +++ b/drivers/timesource/jiffies.c 2004-09-02 13:27:00 -07:00 @@ -0,0 +1,61 @@ +/* + * linux/drivers/timesource/jiffies.c + * + * Copyright (C) 2004 IBM + * + * This file contains the jiffies based time source. + * + */ +#include <linux/timesource.h> +#include <linux/jiffies.h> +#include <linux/init.h> + +/* The Jiffies based timesource is the lowest common + * denominator time source which should function on + * all systems. It has the same course resolution as + * the timer interrupt frequency HZ and it suffers + * inaccuracies caused by missed or lost timer + * interrupts and the inability for the timer + * interrupt hardware to accuratly tick at the + * requested HZ value. It is also not reccomended + * for "tick-less" systems. + */ + +static cycle_t jiffies_read(void) +{ + cycle_t ret = get_jiffies_64(); + return ret; +} + +static cycle_t jiffies_delta(cycle_t now, cycle_t then) +{ + /* simple subtraction, no need to mask */ + return now - then; +} + +static nsec_t jiffies_cyc2ns(cycle_t cyc, cycle_t* remainder) +{ + + cyc *= NSEC_PER_SEC/HZ; + + /* no remainder, so set it to zero */ + if (remainder) + *remainder = 0; + + return (nsec_t)cyc; +} + +struct timesource_t timesource_jiffies = { + .name = "jiffies", + .priority = 0, /* lowest priority*/ + .read = jiffies_read, + .delta = jiffies_delta, + .cyc2ns = jiffies_cyc2ns, +}; + +static int init_jiffies_timesource(void) +{ + register_timesource(×ource_jiffies); + return 0; +} +module_init(init_jiffies_timesource); diff -Nru a/include/linux/ntp.h b/include/linux/ntp.h --- /dev/null Wed Dec 31 16:00:00 196900 +++ b/include/linux/ntp.h 2004-09-02 13:27:00 -07:00 @@ -0,0 +1,22 @@ +/* linux/include/linux/ntp.h + * + * Copyright (C) 2003, 2004 IBM, John Stultz (johnstul@us.ibm.com) + * + * This file contains time of day helper functions + */ + +#ifndef _LINUX_NTP_H +#define _LINUX_NTP_H +#include <linux/types.h> +#include <linux/time.h> +#include <linux/timex.h> + +/* timeofday interfaces */ +nsec_t ntp_scale(nsec_t value); +void ntp_advance(nsec_t value); +int ntp_adjtimex(struct timex*); +int ntp_leapsecond(struct timespec now); +void ntp_clear(void); +int get_ntp_status(void); + +#endif diff -Nru a/include/linux/time.h b/include/linux/time.h --- a/include/linux/time.h 2004-09-02 13:27:00 -07:00 +++ b/include/linux/time.h 2004-09-02 13:27:00 -07:00 @@ -24,6 +24,10 @@ #ifdef __KERNEL__ +/* timeofday base types */ +typedef u64 nsec_t; +typedef u64 cycle_t; + #include <linux/spinlock.h> #include <linux/seqlock.h> #include <linux/timex.h> diff -Nru a/include/linux/timeofday.h b/include/linux/timeofday.h --- /dev/null Wed Dec 31 16:00:00 196900 +++ b/include/linux/timeofday.h 2004-09-02 13:27:00 -07:00 @@ -0,0 +1,61 @@ +/* linux/include/linux/timeofday.h + * + * Copyright (C) 2003, 2004 IBM, John Stultz (johnstul@us.ibm.com) + * + * This file contains the interface to the time of day subsystem + */ +#ifndef _LINUX_TIMEOFDAY_H +#define _LINUX_TIMEOFDAY_H +#include <linux/types.h> +#include <linux/time.h> + +#ifdef CONFIG_NEWTOD +nsec_t get_lowres_timestamp(void); +nsec_t get_lowres_timeofday(void); +nsec_t do_monotonic_clock(void); + + +void do_gettimeofday(struct timeval *tv); +int do_settimeofday(struct timespec *tv); +int do_adjtimex(struct timex *tx); + +void timeofday_interrupt_hook(void); +void timeofday_init(void); + + +/* Helper functions */ +static inline struct timeval ns2timeval(nsec_t ns) +{ + struct timeval tv; + tv.tv_sec = div_long_long_rem(ns, NSEC_PER_SEC, &tv.tv_usec); + tv.tv_usec /= NSEC_PER_USEC; + return tv; +} + +static inline struct timespec ns2timespec(nsec_t ns) +{ + struct timespec ts; + ts.tv_sec = div_long_long_rem(ns, NSEC_PER_SEC, &ts.tv_nsec); + return ts; +} + +static inline u64 timespec2ns(struct timespec* ts) +{ + nsec_t ret; + ret = ((nsec_t)ts->tv_sec) * NSEC_PER_SEC; + ret += ts->tv_nsec; + return ret; +} + +static inline nsec_t timeval2ns(struct timeval* tv) +{ + nsec_t ret; + ret = ((nsec_t)tv->tv_sec) * NSEC_PER_SEC; + ret += tv->tv_usec*NSEC_PER_USEC; + return ret; +} +#else /* CONFIG_NEWTOD */ +#define timeofday_interrupt_hook() +#define timeofday_init() +#endif /* CONFIG_NEWTOD */ +#endif /* _LINUX_TIMEOFDAY_H */ diff -Nru a/include/linux/timesource.h b/include/linux/timesource.h --- /dev/null Wed Dec 31 16:00:00 196900 +++ b/include/linux/timesource.h 2004-09-02 13:27:00 -07:00 @@ -0,0 +1,36 @@ +/* linux/include/linux/timesource.h + * + * Copyright (C) 2003, 2004 IBM, John Stultz (johnstul@us.ibm.com) + * + * This file contains the structure definitions for timesources. + * + * If you are not a timesource, or the time of day code, you should + * not be including this file. + */ +#ifndef _LINUX_TIMESORUCE_H +#define _LINUX_TIMESORUCE_H + +#include <linux/types.h> +#include <linux/time.h> + +/* struct timesource_t: + * Provides mostly state-free accessors to the underlying + * hardware. + * name: ptr to timesource name + * priority:priority value (higher is better) + * @read: returns a cycle value + * @delta: calculates the difference between two cycle values + * @cyc2ns: converts a cycle value to ns (expected to be expensive) + */ +struct timesource_t { + char* name; + int priority; + cycle_t (*read)(void); + cycle_t (*delta)(cycle_t now, cycle_t then); + nsec_t (*cyc2ns)(cycle_t cycles, cycle_t* remainder); +}; + +/* used to install a new time source */ +void register_timesource(struct timesource_t*); + +#endif diff -Nru a/init/main.c b/init/main.c --- a/init/main.c 2004-09-02 13:27:00 -07:00 +++ b/init/main.c 2004-09-02 13:27:00 -07:00 @@ -45,6 +45,7 @@ #include <linux/unistd.h> #include <linux/rmap.h> #include <linux/mempolicy.h> +#include <linux/timeofday.h> #include <asm/io.h> #include <asm/bugs.h> @@ -514,6 +515,7 @@ pidhash_init(); init_timers(); softirq_init(); + timeofday_init(); time_init(); /* diff -Nru a/kernel/Makefile b/kernel/Makefile --- a/kernel/Makefile 2004-09-02 13:27:00 -07:00 +++ b/kernel/Makefile 2004-09-02 13:27:00 -07:00 @@ -9,6 +9,7 @@ rcupdate.o intermodule.o extable.o params.o posix-timers.o \ kthread.o +obj-$(CONFIG_NEWTOD) += timeofday.o ntp.o obj-$(CONFIG_FUTEX) += futex.o obj-$(CONFIG_GENERIC_ISA_DMA) += dma.o obj-$(CONFIG_SMP) += cpu.o diff -Nru a/kernel/ntp.c b/kernel/ntp.c --- /dev/null Wed Dec 31 16:00:00 196900 +++ b/kernel/ntp.c 2004-09-02 13:27:00 -07:00 @@ -0,0 +1,554 @@ +/******************************************************************** +* linux/kernel/ntp.c +* +* NTP state machine and time scaling code. +* +* Copyright (C) 2004 IBM, John Stultz (johnstul@us.ibm.com) +* +* Portions rewritten from kernel/time.c and kernel/timer.c +* Please see those files for original copyrights. +* +* Hopefully you should never have to understand or touch +* any of the code below. but don't let that keep you from trying! +* +* This code is loosely based on David Mills' RFC 1589 and its +* updates. Please see the following for more details: +* http://www.eecis.udel.edu/~mills/database/rfc/rfc1589.txt +* http://www.eecis.udel.edu/~mills/database/reports/kern/kernb.pdf +* +* NOTE: To simplify the code, we do not implement any of +* the PPS code, as the code that uses it never was merged. +* -johnstul@us.ibm.com +* +* Revision History: +* 2004-09-02: A0 +* o First pass sent to lkml for review. +* +* +* TODO List: +* o More documentation +* o More testing +* o More optimization +*********************************************************************/ + +#include <linux/ntp.h> +#include <linux/errno.h> +#include <linux/sched.h> /* Needed for capable() */ + +/* NTP scaling code + * Functions: + * ---------- + * nsec_t ntp_scale(nsec_t value): + * Scales the nsec_t vale using ntp kernel state + * void ntp_advance(nsec_t interval): + * Increments the NTP state machine by interval time + * static int ntp_hardupdate(long offset, struct timeval tv) + * ntp_adjtimex helper function + * int ntp_adjtimex(struct timex* tx): + * Interface to adjust NTP state machine + * int ntp_leapsecond(struct timespec now) + * Does NTP leapsecond processing. Returns number of + * seconds current time should be adjusted by. + * void ntp_clear(void): + * Clears the ntp kernel state + * int get_ntp_status(void): + * returns ntp_status value + * + * Variables: + * ---------- + * ntp kernel state variables: + * See below for full list. + * ntp_lock: + * Protects ntp kernel state variables + */ + + + +/* Chapter 5: Kernel Variables [RFC 1589 pg. 28] */ +/* 5.1 Interface Variables */ +static int ntp_status = STA_UNSYNC; /* status */ +static long ntp_offset; /* usec */ +static long ntp_constant = 2; /* ntp magic? */ +static long ntp_maxerror = NTP_PHASE_LIMIT; /* usec */ +static long ntp_esterror = NTP_PHASE_LIMIT; /* usec */ +static const long ntp_tolerance = MAXFREQ; /* shifted ppm */ +static const long ntp_precision = 1; /* constant */ + +/* 5.2 Phase-Lock Loop Variables */ +static long ntp_freq; /* shifted ppm */ +static long ntp_reftime; /* sec */ + +/* Extra values */ +static int ntp_state = TIME_OK; /* leapsecond state */ +static long ntp_tick = USEC_PER_SEC/USER_HZ; /* tick length */ + +static s64 ss_offset_len; /* SINGLESHOT offset adj interval (nsec)*/ +static long singleshot_adj; /* +/- MAX_SINGLESHOT_ADJ (ppm)*/ +static long tick_adj; /* tx->tick adjustment (ppm) */ +static long offset_adj; /* offset adjustment (ppm) */ + + +/* lock for the above variables */ +static seqlock_t ntp_lock = SEQLOCK_UNLOCKED; + +#define MILLION 1000000 +#define MAX_SINGLESHOT_ADJ 500 /* (ppm) */ +#define SEC_PER_DAY 86400 + +/* Required to safely shift negative values */ +#define shiftR(x,s) (x < 0) ? (-((-x) >> (s))) : ((x) >> (s)) + +/* nsec_t ntp_scale(nsec_t): + * Scale the raw time interval to an NTP scaled value + * + * This is done in three steps: + * 1. Tick adjustment. + * 2. Frequency adjustment. + * 3. Singleshot offset adjustment + * + * The first two are done over the entire raw interval. While + * the third is only done as long as the raw interval is less + * then the singleshot offset length. + * + * The basic equasion is: + * raw = raw + ppm scaling + * + * So while raw < ss_offset_len: + * raw = raw + (raw*(tick_adj+freq_adj+ss_offset_adj))/MILLION + * + * But when raw is > ss_offset_len: + * raw = raw + * + (ss_offset_len*(tick_adj+freq_adj+ss_offset_adj))/MILLION + * + ((raw - ss_offset_len)*(tick_adj+freq_adj))/MILLION + * + * XXX - TODO: + * o Get rid of nasty 64bit divides (try shifts) + * o more comments! + */ +nsec_t ntp_scale(nsec_t raw) +{ + u64 ret = (u64)raw; + unsigned long seq; + u64 offset_len; + long freq_ppm, tick_ppm, ss_ppm, offset_ppm; + + /* save off current kernel state */ + do { + seq = read_seqbegin(&ntp_lock); + + tick_ppm = tick_adj; + offset_ppm = offset_adj; + freq_ppm = shiftR(ntp_freq,SHIFT_USEC); + ss_ppm = singleshot_adj; + offset_len = (u64)ss_offset_len; + + } while (read_seqretry(&ntp_lock, seq)); + + + /* Due to sign issues, this is a bit ugly */ + if (raw <= offset_len) { + /* calcluate total ppm */ + long ppm = tick_ppm + freq_ppm + offset_ppm + ss_ppm; + + /* use abs(ppm) to avoid sign issues w/ do_div */ + u64 tmp = raw * abs(ppm); + + /* XXX - try to replace this w/ a 64bit shift */ + do_div(tmp, MILLION); + + /* reapply the sign lost by using abs(ppm) */ + if(ppm < 0) + ret -= tmp; + else + ret += tmp; + + } else { + /* Only apply MAX_SINGLESHOT_ADJ for the ss_offset_len */ + s64 v1 = offset_len*(tick_ppm + freq_ppm + offset_ppm + ss_ppm); + s64 v2 = (s64)(raw - offset_len)*(tick_ppm + freq_ppm+ offset_ppm); + + s64 tmp = v1+v2; + + /* use abs(tmp) to avoid sign issues w/ do_div */ + u64 tmp2 = abs(tmp); + + /* XXX - try to replace this w/ a 64bit shift */ + do_div(tmp2,MILLION); + + /* reapply the sign lost by using abs(tmp2) */ + if(tmp < 0) + ret -= tmp2; + else + ret += tmp2; + } + + return (nsec_t)ret; +} + +/* void ntp_advance(nsec_t interval): + * Periodic hook which increments NTP state machine by interval + * This is ntp_hardclock in the RFC. + */ +void ntp_advance(nsec_t interval) +{ + static u64 interval_sum=0; + + /* inc interval sum */ + interval_sum += interval; + + write_seqlock_irq(&ntp_lock); + + /* decrement singleshot offset interval */ + ss_offset_len =- interval; + if(ss_offset_len < 0) /* make sure it doesn't go negative */ + ss_offset_len=0; + + /* Do second overflow code */ + while (interval_sum > NSEC_PER_SEC) { + /* XXX - I'd prefer to smoothly apply this math + * at each call to ntp_advance() rather then each + * second. + */ + long tmp; + + /* Bump maxerror by ntp_tolerance */ + ntp_maxerror += shiftR(ntp_tolerance, SHIFT_USEC); + if (ntp_maxerror > NTP_PHASE_LIMIT) { + ntp_maxerror = NTP_PHASE_LIMIT; + ntp_status |= STA_UNSYNC; + } + + /* Calculate offset_adj for the next second */ + tmp = ntp_offset; + if (!(ntp_status & STA_FLL)) + tmp = shiftR(tmp, SHIFT_KG + ntp_constant); + + /* bound the adjustment to MAXPHASE/MINSEC */ + if (tmp > (MAXPHASE / MINSEC) << SHIFT_UPDATE) + tmp = (MAXPHASE / MINSEC) << SHIFT_UPDATE; + if (tmp < -(MAXPHASE / MINSEC) << SHIFT_UPDATE) + tmp = -(MAXPHASE / MINSEC) << SHIFT_UPDATE; + + offset_adj = shiftR(tmp, SHIFT_UPDATE); /* (usec/sec) = ppm */ + ntp_offset -= tmp; + + interval_sum -= NSEC_PER_SEC; + } + +/* XXX - if still needed do equiv code to switching time_next_adjust */ + + write_sequnlock_irq(&ntp_lock); + +} + + +/* called only by ntp_adjtimex while holding ntp_lock */ +static int ntp_hardupdate(long offset, struct timeval tv) +{ + int ret; + long tmp, interval; + + ret = 0; + if (!(ntp_status & STA_PLL)) + return ret; + + tmp = offset; + /* Make sure offset is bounded by MAXPHASE */ + if (tmp > MAXPHASE) + tmp = MAXPHASE; + if (tmp < -MAXPHASE) + tmp = -MAXPHASE; + + ntp_offset = tmp << SHIFT_UPDATE; + + if ((ntp_status & STA_FREQHOLD) || (ntp_reftime == 0)) + ntp_reftime = tv.tv_sec; + + /* calculate seconds since last call to hardupdate */ + interval = tv.tv_sec - ntp_reftime; + ntp_reftime = tv.tv_sec; + + if ((ntp_status & STA_FLL) && (interval >= MINSEC)) { + long damping; + tmp = (offset / interval); /* ppm (usec/sec)*/ + + /* convert to shifted ppm, then apply damping factor */ + + /* calculate damping factor - XXX bigger comment!*/ + damping = SHIFT_KH - SHIFT_USEC; + + /* apply damping factor */ + ntp_freq += shiftR(tmp,damping); + + printk("ntp->freq change: %ld\n",shiftR(tmp,damping)); + + } else if ((ntp_status & STA_PLL) && (interval < MAXSEC)) { + long damping; + tmp = offset * interval; /* ppm XXX - not quite*/ + + /* calculate damping factor - XXX bigger comment!*/ + damping = (2 * ntp_constant) + SHIFT_KF - SHIFT_USEC; + + /* apply damping factor */ + ntp_freq += shiftR(tmp,damping); + + printk("ntp->freq change: %ld\n", shiftR(tmp,damping)); + + } else { /* interval out of bounds */ + printk("interval out of bounds: %ld\n", interval); + ret = -1; /* TIME_ERROR */ + } + + /* bound ntp_freq */ + if (ntp_freq > ntp_tolerance) + ntp_freq = ntp_tolerance; + if (ntp_freq < -ntp_tolerance) + ntp_freq = -ntp_tolerance; + + return ret; +} + +/* int ntp_adjtimex(struct timex* tx) + * Interface to change NTP state machine + */ +int ntp_adjtimex(struct timex* tx) +{ + long save_offset; + int result; + +/*=[Sanity checking]===============================*/ + /* Check capabilities if we're trying to modify something */ + if (tx->modes && !capable(CAP_SYS_TIME)) + return -EPERM; + + /* frequency adjustment limited to +/- MAXFREQ */ + if ((tx->modes & ADJ_FREQUENCY) + && (abs(tx->freq) > MAXFREQ)) + return -EINVAL; + + /* maxerror adjustment limited to NTP_PHASE_LIMIT */ + if ((tx->modes & ADJ_MAXERROR) + && (tx->maxerror < 0 + || tx->maxerror >= NTP_PHASE_LIMIT)) + return -EINVAL; + + /* esterror adjustment limited to NTP_PHASE_LIMIT */ + if ((tx->modes & ADJ_ESTERROR) + && (tx->esterror < 0 + || tx->esterror >= NTP_PHASE_LIMIT)) + return -EINVAL; + + /* constant adjustment must be positive */ + if ((tx->modes & ADJ_TIMECONST) + && (tx->constant < 0)) + return -EINVAL; + + /* Single shot mode can only be used by itself */ + if (((tx->modes & ADJ_OFFSET_SINGLESHOT) == ADJ_OFFSET_SINGLESHOT) + && (tx->modes != ADJ_OFFSET_SINGLESHOT)) + return -EINVAL; + + /* offset adjustment limited to +/- MAXPHASE */ + if ((tx->modes != ADJ_OFFSET_SINGLESHOT) + && (tx->modes & ADJ_OFFSET) + && (abs(tx->offset)>= MAXPHASE)) + return -EINVAL; + + /* tick adjustment limited to 10% */ + if ((tx->modes & ADJ_TICK) + && ((tx->tick < 900000/USER_HZ) + ||(tx->tick > 11000000/USER_HZ))) + return -EINVAL; + + /* dbg output XXX - yank me! */ + if(tx->modes) { + printk("adjtimex: tx->offset: %ld tx->freq: %ld\n", + tx->offset, tx->freq); + } + +/*=[Kernel input bits]==========================*/ + write_seqlock_irq(&ntp_lock); + + result = ntp_state; + + /* For ADJ_OFFSET_SINGLESHOT we must return the old offset */ + save_offset = shiftR(ntp_offset, SHIFT_UPDATE); + + /* Process input parameters */ + if (tx->modes & ADJ_STATUS) { + ntp_status &= STA_RONLY; + ntp_status |= tx->status & ~STA_RONLY; + } + + if (tx->modes & ADJ_FREQUENCY) + ntp_freq = tx->freq; + + if (tx->modes & ADJ_MAXERROR) + ntp_maxerror = tx->maxerror; + + if (tx->modes & ADJ_ESTERROR) + ntp_esterror = tx->esterror; + + if (tx->modes & ADJ_TIMECONST) + ntp_constant = tx->constant; + + if (tx->modes & ADJ_OFFSET) { + if (tx->modes == ADJ_OFFSET_SINGLESHOT) { + if (tx->offset < 0) + singleshot_adj = -MAX_SINGLESHOT_ADJ; + else + singleshot_adj = MAX_SINGLESHOT_ADJ; + /* Calculate single shot offset interval: + * + * len = offset/(+/-)MAX_SINGESHOT_ADJ + * len = abs(offset)/MAX_SINGESHOT_ADJ + * len = (abs(offset)*NSEC_PER_USEC) + * /(MAX_SINGESHOT_ADJ /NSEC_PER_SEC) + * len = (abs(offset)*NSEC_PER_USEC*NSEC_PER_SEC)/MAX_SINGESHOT_ADJ + * len = abs(offset)*NSEC_PER_SEC*(NSEC_PER_USEC/MAX_SINGESHOT_ADJ) + */ + ss_offset_len = abs(tx->offset); + ss_offset_len *= NSEC_PER_SEC * (NSEC_PER_USEC/MAX_SINGLESHOT_ADJ); + } + /* call hardupdate() */ + if (ntp_hardupdate(tx->offset, tx->time)) + result = TIME_ERROR; + } + + if (tx->modes & ADJ_TICK) { + /* first calculate usec/user_tick offset */ + tick_adj = (USEC_PER_SEC/USER_HZ) - tx->tick; + /* multiply by user_hz to get usec/sec => ppm */ + tick_adj *= USER_HZ; + /* save tx->tick for future calls to adjtimex */ + ntp_tick = tx->tick; + } + + if ((ntp_status & (STA_UNSYNC|STA_CLOCKERR)) != 0 ) + result = TIME_ERROR; + +/*=[Kernel output bits]================================*/ + /* write kernel state to user timex values*/ + if ((tx->modes & ADJ_OFFSET_SINGLESHOT) == ADJ_OFFSET_SINGLESHOT) + tx->offset = save_offset; + else + tx->offset = shiftR(ntp_offset, SHIFT_UPDATE); + + tx->freq = ntp_freq; + tx->maxerror = ntp_maxerror; + tx->esterror = ntp_esterror; + tx->status = ntp_status; + tx->constant = ntp_constant; + tx->precision = ntp_precision; + tx->tolerance = ntp_tolerance; + + /* PPS is not implemented, so these are zero */ + tx->ppsfreq = /*XXX - Not Implemented!*/ 0; + tx->jitter = /*XXX - Not Implemented!*/ 0; + tx->shift = /*XXX - Not Implemented!*/ 0; + tx->stabil = /*XXX - Not Implemented!*/ 0; + tx->jitcnt = /*XXX - Not Implemented!*/ 0; + tx->calcnt = /*XXX - Not Implemented!*/ 0; + tx->errcnt = /*XXX - Not Implemented!*/ 0; + tx->stbcnt = /*XXX - Not Implemented!*/ 0; + + write_sequnlock_irq(&ntp_lock); + + return result; +} + + +/* void ntp_leapsecond(struct timespec now): + * NTP Leapsecnod processing code. Returns the number of + * seconds (-1, 0, or 1) that should be added to the current + * time to properly adjust for leapseconds. + */ +int ntp_leapsecond(struct timespec now) +{ + /* + * Leap second processing. If in leap-insert state at + * the end of the day, the system clock is set back one + * second; if in leap-delete state, the system clock is + * set ahead one second. The microtime() routine or + * external clock driver will insure that reported time + * is always monotonic. The ugly divides should be + * replaced. + */ + static time_t leaptime = 0; + + switch (ntp_state) { + case TIME_OK: + if (ntp_status & STA_INS) { + ntp_state = TIME_INS; + /* calculate end of today (23:59:59)*/ + leaptime = now.tv_sec + SEC_PER_DAY - (now.tv_sec % SEC_PER_DAY) - 1; + } + else if (ntp_status & STA_DEL) { + ntp_state = TIME_DEL; + /* calculate end of today (23:59:59)*/ + leaptime = now.tv_sec + SEC_PER_DAY - (now.tv_sec % SEC_PER_DAY) - 1; + } + break; + + case TIME_INS: + /* Once we are at (or past) leaptime, insert the second */ + if (now.tv_sec > leaptime) { + ntp_state = TIME_OOP; + printk(KERN_NOTICE "Clock: inserting leap second 23:59:60 UTC\n"); + + return -1; + } + break; + + case TIME_DEL: + /* Once we are at (or past) leaptime, delete the second */ + if (now.tv_sec >= leaptime) { + ntp_state = TIME_WAIT; + printk(KERN_NOTICE "Clock: deleting leap second 23:59:59 UTC\n"); + + return 1; + } + break; + + case TIME_OOP: + /* Wait for the end of the leap second*/ + if (now.tv_sec > (leaptime + 1)) + ntp_state = TIME_WAIT; + break; + + case TIME_WAIT: + if (!(ntp_status & (STA_INS | STA_DEL))) + ntp_state = TIME_OK; + } + + return 0; +} + +/* void ntp_clear(void): + * Clears the NTP state machine. + */ +void ntp_clear(void) +{ + write_seqlock_irq(&ntp_lock); + + /* clear everything */ + ntp_status |= STA_UNSYNC; + ntp_maxerror = NTP_PHASE_LIMIT; + ntp_esterror = NTP_PHASE_LIMIT; + ss_offset_len=0; + singleshot_adj=0; + tick_adj=0; + offset_adj =0; + + write_sequnlock_irq(&ntp_lock); + +} + +/* int get_ntp_status(void): + * Returns the NTP status. + */ +int get_ntp_status(void) +{ + return ntp_status; +} + diff -Nru a/kernel/time.c b/kernel/time.c --- a/kernel/time.c 2004-09-02 13:27:00 -07:00 +++ b/kernel/time.c 2004-09-02 13:27:00 -07:00 @@ -33,6 +33,7 @@ #include <linux/smp_lock.h> #include <asm/uaccess.h> #include <asm/unistd.h> +#include <linux/timeofday.h> /* * The timezone where the local system is located. Used as a default by some @@ -210,6 +211,7 @@ /* adjtimex mainly allows reading (and writing, if superuser) of * kernel time-keeping variables. used by xntpd. */ +#ifndef CONFIG_NEWTOD int do_adjtimex(struct timex *txc) { long ltemp, mtemp, save_adjust; @@ -392,6 +394,7 @@ do_gettimeofday(&txc->time); return(result); } +#endif asmlinkage long sys_adjtimex(struct timex __user *txc_p) { diff -Nru a/kernel/timeofday.c b/kernel/timeofday.c --- /dev/null Wed Dec 31 16:00:00 196900 +++ b/kernel/timeofday.c 2004-09-02 13:27:00 -07:00 @@ -0,0 +1,330 @@ +/********************************************************************* +* linux/kernel/timeofday.c +* +* Copyright (C) 2003, 2004 IBM, John Stultz (johnstul@us.ibm.com) +* +* This file contains the functions which access and manage +* the system's time of day functionality. +* +* Revision History: +* 2004-09-02: A0 +* o First pass sent to lkml for review. +* +* TODO List: +* o More testing +**********************************************************************/ + +#include <linux/timeofday.h> +#include <linux/timesource.h> +#include <linux/ntp.h> +#include <linux/timex.h> + +/*XXX - remove later */ +#define TIME_DBG 1 +#define TIME_DBG_FREQ 120000 + +/*[Nanosecond based variables]---------------- + * system_time: + * Monotonically increasing counter of the number of nanoseconds + * since boot. + * wall_time_offset: + * Offset added to system_time to provide accurate time-of-day + */ +static nsec_t system_time; +static nsec_t wall_time_offset; + + +/*[Cycle based variables]---------------- + * offset_base: + * Value of the timesource at the last clock_interrupt_hook() + * (adjusted only minorly to account for rounded off cycles) + */ +static cycle_t offset_base; + +/*[Time source data]------------------- + * timesource + * current timesource pointer (initialized to timesource_jiffies) + * next_timesource: + * pointer to the timesource that will be installed at the next hook + */ +extern struct timesource_t timesource_jiffies; +static struct timesource_t *timesource = ×ource_jiffies; +static struct timesource_t *next_timesource; + +/*[Locks]---------------------------- + * system_time_lock: + * generic lock for all locally scoped time values + */ +static seqlock_t system_time_lock = SEQLOCK_UNLOCKED; + + +/* [XXX - Hacks]-------------------- + * Makes stuff compile + */ +extern unsigned long get_cmos_time(void); +extern void sync_persistant_clock(struct timespec ts); + +/* get_lowres_timestamp(): + * Returns a low res timestamp. + * (ie: the value of system_time as calculated at + * the last invocation of clock_interrupt_hook() ) + */ +nsec_t get_lowres_timestamp(void) +{ + nsec_t ret; + unsigned long seq; + do { + seq = read_seqbegin(&system_time_lock); + + /* quickly grab system_time*/ + ret = system_time; + + } while (read_seqretry(&system_time_lock, seq)); + + return ret; +} + +/* get_lowres_timeofday(): + * Returns a low res time of day, as calculated at the + * last invocation of clock_interrupt_hook() + */ +nsec_t get_lowres_timeofday(void) +{ + nsec_t ret; + unsigned long seq; + do { + seq = read_seqbegin(&system_time_lock); + + /* quickly calculate low-res time of day */ + ret = system_time + wall_time_offset; + + } while (read_seqretry(&system_time_lock, seq)); + + return ret; +} + + +/* __monotonic_clock(): + * private function, must hold system_time_lock lock when being + * called. Returns the monotonically increasing number of + * nanoseconds since the system booted (adjusted by NTP scaling) + */ +static nsec_t __monotonic_clock(void) +{ + nsec_t ret, ns_offset; + cycle_t now, delta; + + /* read timesource */ + now = timesource->read(); + + /* calculate the delta since the last clock_interrupt */ + delta = timesource->delta(now, offset_base); + + /* convert to nanoseconds */ + ns_offset = timesource->cyc2ns(delta, 0); + + /* apply the NTP scaling */ + ns_offset = ntp_scale(ns_offset); + + /* add result to system time */ + ret = system_time + ns_offset; + + return ret; +} + + +/* do_monotonic_clock(): + * Returns the monotonically increasing number of nanoseconds + * since the system booted via __monotonic_clock() + */ +nsec_t do_monotonic_clock(void) +{ + nsec_t ret; + unsigned long seq; + + /* atomically read __monotonic_clock() */ + do { + seq = read_seqbegin(&system_time_lock); + + ret = __monotonic_clock(); + + } while (read_seqretry(&system_time_lock, seq)); + + return ret; +} + + +/* do_gettimeofday(): + * Returns the time of day + */ +void do_gettimeofday(struct timeval *tv) +{ + nsec_t wall, sys; + unsigned long seq; + + /* atomically read wall and sys time */ + do { + seq = read_seqbegin(&system_time_lock); + + wall = wall_time_offset; + sys = __monotonic_clock(); + + } while (read_seqretry(&system_time_lock, seq)); + + /* add them and convert to timeval */ + *tv = ns2timeval(wall+sys); +} + + +/* do_settimeofday(): + * Sets the time of day + */ +int do_settimeofday(struct timespec *tv) +{ + /* convert timespec to ns */ + nsec_t newtime = timespec2ns(tv); + + /* atomically adjust wall_time_offset to the desired value */ + write_seqlock_irq(&system_time_lock); + + wall_time_offset = newtime - __monotonic_clock(); + + /* clear NTP settings */ + ntp_clear(); + + write_sequnlock_irq(&system_time_lock); + + return 0; +} + +/* do_adjtimex: + * Userspace NTP daemon's interface to the kernel NTP variables + */ +int do_adjtimex(struct timex *tx) +{ + do_gettimeofday(&tx->time); /* set timex->time*/ + /* Note: We set tx->time first, */ + /* because ntp_adjtimex uses it */ + + return ntp_adjtimex(tx); /* call out to NTP code */ +} + + +/* timeofday_interrupt_hook: + * calculates the delta since the last interrupt, + * updates system time and clears the offset. + * likely called by timer_interrupt() + */ +void timeofday_interrupt_hook(void) +{ + cycle_t now, delta, remainder; + nsec_t ns, ntp_ns; + long leapsecond; + + write_seqlock(&system_time_lock); + + /* read time source */ + now = timesource->read(); + + /* calculate cycle delta */ + delta = timesource->delta(now, offset_base); + + /* convert cycles to ns and save remainder */ + ns = timesource->cyc2ns(delta, &remainder); + + /* apply NTP scaling factor for this tick */ + ntp_ns = ntp_scale(ns); + +#if TIME_DBG /* XXX - remove later*/ +{ + static int dbg=0; + if(!(dbg++%TIME_DBG_FREQ)){ + printk("now: %lluc - then: %lluc = delta: %lluc -> %llu ns + %llu cyc (ntp: %lluc)\n", + now, offset_base, delta, ns, remainder, ntp_ns); + } +} +#endif + /* update system_time */ + system_time += ntp_ns; + + /* reset the offset_base */ + offset_base = now; + + /* subtract remainder to account for rounded off cycles */ + offset_base = timesource->delta(offset_base,remainder); + + /* advance the ntp state machine by ns*/ + ntp_advance(ns); + + /* do ntp leap second processing*/ + leapsecond = ntp_leapsecond(ns2timespec(system_time+wall_time_offset)); + if (leapsecond == 1) /* XXX could this be cleaner? */ + wall_time_offset += 1; + if (leapsecond == -1) + wall_time_offset -= 1; + + /* sync the persistant clock */ + if (!(get_ntp_status() & STA_UNSYNC)) + sync_persistant_clock(ns2timespec(system_time + wall_time_offset)); + + /* if necessary, switch timesources */ + if (next_timesource) { + /* immediately set new offset_base */ + offset_base = next_timesource->read(); + /* swap timesources */ + timesource = next_timesource; + next_timesource = 0; + + printk(KERN_INFO "Time: %s timesource has been installed\n", + timesource->name); + } + + + /* update legacy time values */ + write_seqlock(&xtime_lock); + xtime = ns2timespec(system_time + wall_time_offset); + wall_to_monotonic = ns2timespec(wall_time_offset); + wall_to_monotonic.tv_sec = -wall_to_monotonic.tv_sec; + wall_to_monotonic.tv_nsec = -wall_to_monotonic.tv_nsec; + write_sequnlock(&xtime_lock); + + write_sequnlock(&system_time_lock); +} + +/* register_timesource(): + * Used to install a new timesource + */ +void register_timesource(struct timesource_t* t) +{ + write_seqlock(&system_time_lock); + + /* XXX - check override */ + + /* if next_timesource has been set, make sure we beat that one too */ + if (next_timesource && (t->priority > next_timesource->priority)) + next_timesource = t; + else if(t->priority > timesource->priority) + next_timesource = t; + + write_sequnlock(&system_time_lock); +} + + +/* timeofday_init(): + * Initializes time variables + */ +void timeofday_init(void) +{ + write_seqlock(&system_time_lock); + + /* clear and initialize offsets*/ + offset_base = timesource->read(); + wall_time_offset = ((u64)get_cmos_time()) * NSEC_PER_SEC; + + /* clear NTP scaling factor*/ + ntp_clear(); + + write_sequnlock(&system_time_lock); + + return; +} diff -Nru a/kernel/timer.c b/kernel/timer.c --- a/kernel/timer.c 2004-09-02 13:27:00 -07:00 +++ b/kernel/timer.c 2004-09-02 13:27:00 -07:00 @@ -559,6 +559,7 @@ int tickadj = 500/HZ ? : 1; /* microsecs */ +#ifndef CONFIG_NEWTOD /* * phase-lock loop variables */ @@ -786,6 +787,9 @@ second_overflow(); } } +#else /* CONFIG_NEWTOD */ +#define update_wall_time(x) +#endif /* CONFIG_NEWTOD */ static inline void do_process_times(struct task_struct *p, unsigned long user, unsigned long system) ^ permalink raw reply [flat|nested] 84+ messages in thread
* [RFC][PATCH] new timeofday i386 hooks (v.A0) 2004-09-02 21:09 ` [RFC][PATCH] new timeofday core subsystem (v.A0) john stultz @ 2004-09-02 21:11 ` john stultz 2004-09-02 21:12 ` [RFC][PATCH] new timeofday i386 timesources (v.A0) john stultz 2004-09-03 1:44 ` [RFC][PATCH] new timeofday i386 hooks (v.A0) George Anzinger 2004-09-02 22:19 ` [RFC][PATCH] new timeofday core subsystem (v.A0) Christoph Lameter 2004-09-03 1:39 ` George Anzinger 2 siblings, 2 replies; 84+ messages in thread From: john stultz @ 2004-09-02 21:11 UTC (permalink / raw) To: lkml Cc: tim, george anzinger, albert, Ulrich.Windl, clameter, Len Brown, linux, David Mosberger, Andi Kleen, paulus, schwidefsky, jimix, keith maanthey, greg kh, Patricia Gaughen, Chris McDermott All, This patch implements the minimal i386 architecture hooks to enable the new time of day subsystem code. It applies on top of my linux-2.6.9-rc1_timeofday-core_A0 patch and with this patch applied, you can test the new time of day subsystem on i386. Basically it adds the call to timeofday_interrupt_hook() and cuts alot of code out of the build. The only new code is the sync_persistant_clock() function which is mostly ripped out of do_timer_interrupt(). Pretty un-interesting. I look forward to your comments and feedback. thanks -john linux-2.6.9-rc1_timeofday-i386_A0.patch ======================================= diff -Nru a/arch/i386/Kconfig b/arch/i386/Kconfig --- a/arch/i386/Kconfig 2004-09-02 13:29:59 -07:00 +++ b/arch/i386/Kconfig 2004-09-02 13:29:59 -07:00 @@ -14,6 +14,10 @@ 486, 586, Pentiums, and various instruction-set-compatible chips by AMD, Cyrix, and others. +config NEWTOD + bool + default y + config MMU bool default y diff -Nru a/arch/i386/kernel/time.c b/arch/i386/kernel/time.c --- a/arch/i386/kernel/time.c 2004-09-02 13:29:59 -07:00 +++ b/arch/i386/kernel/time.c 2004-09-02 13:29:59 -07:00 @@ -67,6 +67,8 @@ #include "io_ports.h" +#include <linux/timeofday.h> + extern spinlock_t i8259A_lock; int pit_latch_buggy; /* extern */ @@ -87,6 +89,7 @@ struct timer_opts *cur_timer = &timer_none; +#ifndef CONFIG_NEWTOD /* * This version of gettimeofday has microsecond resolution * and better than microsecond precision on fast x86 machines with TSC. @@ -169,6 +172,7 @@ } EXPORT_SYMBOL(do_settimeofday); +#endif static int set_rtc_mmss(unsigned long nowtime) { @@ -194,12 +198,39 @@ * Note: This function is required to return accurate * time even in the absence of multiple timer ticks. */ +#ifndef CONFIG_NEWTOD unsigned long long monotonic_clock(void) { return cur_timer->monotonic_clock(); } EXPORT_SYMBOL(monotonic_clock); +#endif +void sync_persistant_clock(struct timespec ts) +{ + /* + * If we have an externally synchronized Linux clock, then update + * CMOS clock accordingly every ~11 minutes. Set_rtc_mmss() has to be + * called as close as possible to 500 ms before the new second starts. + */ + if (ts.tv_sec > last_rtc_update + 660 && + (ts.tv_nsec / 1000) + >= USEC_AFTER - ((unsigned) TICK_SIZE) / 2 && + (ts.tv_nsec / 1000) + <= USEC_BEFORE + ((unsigned) TICK_SIZE) / 2) { + /* horrible...FIXME */ + if (efi_enabled) { + if (efi_set_rtc_mmss(ts.tv_sec) == 0) + last_rtc_update = ts.tv_sec; + else + last_rtc_update = ts.tv_sec - 600; + } else if (set_rtc_mmss(ts.tv_sec) == 0) + last_rtc_update = ts.tv_sec; + else + last_rtc_update = ts.tv_sec - 600; /* do it again in 60 s */ + } + +} /* * timer_interrupt() needs to keep up the real-time clock, @@ -226,6 +257,7 @@ do_timer_interrupt_hook(regs); +#ifndef CONFIG_NEWTOD /* * If we have an externally synchronized Linux clock, then update * CMOS clock accordingly every ~11 minutes. Set_rtc_mmss() has to be @@ -248,6 +280,7 @@ else last_rtc_update = xtime.tv_sec - 600; /* do it again in 60 s */ } +#endif #ifdef CONFIG_MCA if( MCA_bus ) { @@ -282,11 +315,15 @@ */ write_seqlock(&xtime_lock); +#ifndef CONFIG_NEWTOD cur_timer->mark_offset(); +#endif do_timer_interrupt(irq, NULL, regs); write_sequnlock(&xtime_lock); + + timeofday_interrupt_hook(); return IRQ_HANDLED; } ^ permalink raw reply [flat|nested] 84+ messages in thread
* [RFC][PATCH] new timeofday i386 timesources (v.A0) 2004-09-02 21:11 ` [RFC][PATCH] new timeofday i386 hooks (v.A0) john stultz @ 2004-09-02 21:12 ` john stultz 2004-09-03 1:44 ` [RFC][PATCH] new timeofday i386 hooks (v.A0) George Anzinger 1 sibling, 0 replies; 84+ messages in thread From: john stultz @ 2004-09-02 21:12 UTC (permalink / raw) To: lkml Cc: tim, george anzinger, albert, Ulrich.Windl, clameter, Len Brown, linux, David Mosberger, Andi Kleen, paulus, schwidefsky, jimix, keith maanthey, greg kh, Patricia Gaughen, Chris McDermott All, This patch implements most of the i386 time sources (tsc, pit, cyclone, acpi-pm). It applies ontop of my linux-2.6.9-rc1_timeofday-i386_A0 patch. It provides real timesources (opposed to the example jiffies timesource) that can be used for more realistic testing. This patch is the shabbiest of the three. It needs to be broken up, and cleaned. The i386_pit.c is broken and the cpufreq code needs to be added to tsc.c. But it will get you going so you can test and play with the core code. thanks -john linux-2.6.9_rc1_timeofday-i386-timesources_A0.patch =================================================== diff -Nru a/arch/i386/kernel/Makefile b/arch/i386/kernel/Makefile --- a/arch/i386/kernel/Makefile 2004-09-02 13:45:15 -07:00 +++ b/arch/i386/kernel/Makefile 2004-09-02 13:45:15 -07:00 @@ -7,7 +7,7 @@ obj-y := process.o semaphore.o signal.o entry.o traps.o irq.o vm86.o \ ptrace.o i8259.o ioport.o ldt.o setup.o time.o sys_i386.o \ pci-dma.o i386_ksyms.o i387.o dmi_scan.o bootflag.o \ - doublefault.o + doublefault.o tsc.o obj-y += cpu/ obj-y += timers/ diff -Nru a/arch/i386/kernel/setup.c b/arch/i386/kernel/setup.c --- a/arch/i386/kernel/setup.c 2004-09-02 13:45:15 -07:00 +++ b/arch/i386/kernel/setup.c 2004-09-02 13:45:15 -07:00 @@ -48,6 +48,7 @@ #include <asm/io_apic.h> #include <asm/ist.h> #include <asm/io.h> +#include <asm/tsc.h> #include "setup_arch_pre.h" #include <bios_ebda.h> @@ -1413,6 +1414,7 @@ conswitchp = &dummy_con; #endif #endif + tsc_init(); } #include "setup_arch_post.h" diff -Nru a/arch/i386/kernel/time.c b/arch/i386/kernel/time.c --- a/arch/i386/kernel/time.c 2004-09-02 13:45:15 -07:00 +++ b/arch/i386/kernel/time.c 2004-09-02 13:45:15 -07:00 @@ -412,6 +412,7 @@ void __init time_init(void) { +#ifndef CONFIG_NEWTOD #ifdef CONFIG_HPET_TIMER if (is_hpet_capable()) { /* @@ -429,6 +430,7 @@ cur_timer = select_timer(); printk(KERN_INFO "Using %s for high-res timesource\n",cur_timer->name); +#endif time_init_hook(); } diff -Nru a/arch/i386/kernel/timers/Makefile b/arch/i386/kernel/timers/Makefile --- a/arch/i386/kernel/timers/Makefile 2004-09-02 13:45:15 -07:00 +++ b/arch/i386/kernel/timers/Makefile 2004-09-02 13:45:15 -07:00 @@ -4,6 +4,6 @@ obj-y := timer.o timer_none.o timer_tsc.o timer_pit.o common.o -obj-$(CONFIG_X86_CYCLONE_TIMER) += timer_cyclone.o +#obj-$(CONFIG_X86_CYCLONE_TIMER) += timer_cyclone.o obj-$(CONFIG_HPET_TIMER) += timer_hpet.o -obj-$(CONFIG_X86_PM_TIMER) += timer_pm.o +#obj-$(CONFIG_X86_PM_TIMER) += timer_pm.o diff -Nru a/arch/i386/kernel/timers/common.c b/arch/i386/kernel/timers/common.c --- a/arch/i386/kernel/timers/common.c 2004-09-02 13:45:15 -07:00 +++ b/arch/i386/kernel/timers/common.c 2004-09-02 13:45:15 -07:00 @@ -21,8 +21,6 @@ * device. */ -#define CALIBRATE_TIME (5 * 1000020/HZ) - unsigned long __init calibrate_tsc(void) { mach_prepare_counter(); diff -Nru a/arch/i386/kernel/timers/timer.c b/arch/i386/kernel/timers/timer.c --- a/arch/i386/kernel/timers/timer.c 2004-09-02 13:45:15 -07:00 +++ b/arch/i386/kernel/timers/timer.c 2004-09-02 13:45:15 -07:00 @@ -14,13 +14,13 @@ /* list of timers, ordered by preference, NULL terminated */ static struct timer_opts* timers[] = { #ifdef CONFIG_X86_CYCLONE_TIMER - &timer_cyclone, +// &timer_cyclone, #endif #ifdef CONFIG_HPET_TIMER &timer_hpet, #endif #ifdef CONFIG_X86_PM_TIMER - &timer_pmtmr, +// &timer_pmtmr, #endif &timer_tsc, &timer_pit, diff -Nru a/arch/i386/kernel/tsc.c b/arch/i386/kernel/tsc.c --- /dev/null Wed Dec 31 16:00:00 196900 +++ b/arch/i386/kernel/tsc.c 2004-09-02 13:45:15 -07:00 @@ -0,0 +1,39 @@ +#include <linux/init.h> +#include <linux/timex.h> +#include <asm/tsc.h> +#include "mach_timer.h" + +unsigned long cpu_freq_khz; + +void tsc_init(void) +{ + unsigned long long start, end; + unsigned long count; + u64 delta64; + int i; + + /* repeat 3 times to make sure the cache is warm */ + for(i=0; i < 3; i++) { + mach_prepare_counter(); + rdtscll(start); + mach_countup(&count); + rdtscll(end); + } + delta64 = end - start; + + /* cpu freq too fast */ + if(delta64 > (1ULL<<32)) + return; + /* cpu freq too slow */ + if (delta64 <= CALIBRATE_TIME) + return; + + delta64 *= 1000; + do_div(delta64,CALIBRATE_TIME); + cpu_freq_khz = (unsigned long)delta64; + + cpu_khz = cpu_freq_khz; + + printk("Detected %lu.%03lu MHz processor.\n", cpu_khz / 1000, cpu_khz % 1000); + +} diff -Nru a/drivers/timesource/Makefile b/drivers/timesource/Makefile --- a/drivers/timesource/Makefile 2004-09-02 13:45:15 -07:00 +++ b/drivers/timesource/Makefile 2004-09-02 13:45:15 -07:00 @@ -1 +1,6 @@ obj-y += jiffies.o +obj-$(CONFIG_X86) += i386_tsc.o +#obj-$(CONFIG_X86) += i386_pit.o +obj-$(CONFIG_X86_SUMMIT) += cyclone.o +obj-$(CONFIG_X86_PM_TIMER) += acpi_pm.o + diff -Nru a/drivers/timesource/acpi_pm.c b/drivers/timesource/acpi_pm.c --- /dev/null Wed Dec 31 16:00:00 196900 +++ b/drivers/timesource/acpi_pm.c 2004-09-02 13:45:15 -07:00 @@ -0,0 +1,130 @@ +#include <linux/timesource.h> +#include <linux/errno.h> +#include <linux/init.h> +#include <asm/io.h> +#include "mach_timer.h" + +/* Number of PMTMR ticks expected during calibration run */ +#define PMTMR_TICKS_PER_SEC 3579545 +#define PMTMR_EXPECTED_RATE \ + ((CALIBRATE_LATCH * (PMTMR_TICKS_PER_SEC >> 10)) / (CLOCK_TICK_RATE>>10)) + + +/* The I/O port the PMTMR resides at. + * The location is detected during setup_arch(), + * in arch/i386/acpi/boot.c */ +u32 pmtmr_ioport = 0; + +#define ACPI_PM_MASK 0xFFFFFF /* limit it to 24 bits */ + +static inline u32 read_pmtmr(void) +{ + u32 v1=0,v2=0,v3=0; + /* It has been reported that because of various broken + * chipsets (ICH4, PIIX4 and PIIX4E) where the ACPI PM time + * source is not latched, so you must read it multiple + * times to insure a safe value is read. + */ + do { + v1 = inl(pmtmr_ioport); + v2 = inl(pmtmr_ioport); + v3 = inl(pmtmr_ioport); + } while ((v1 > v2 && v1 < v3) || (v2 > v3 && v2 < v1) + || (v3 > v1 && v3 < v2)); + + /* mask the output to 24 bits */ + return v2 & ACPI_PM_MASK; +} + + +static cycle_t acpi_pm_read(void) +{ + return (cycle_t)read_pmtmr(); +} + +static cycle_t acpi_pm_delta(cycle_t now, cycle_t then) +{ + return (now - then) & ACPI_PM_MASK; +} + +static nsec_t acpi_pm_cyc2ns(cycle_t cyc, cycle_t* remainder) +{ + /* Power management timer runs at 3.579545 cycles per + * microsecond. Thus it runs at 0.003579545 cyc/nsec which + * inverted is 279.36511 ns/cyc. Which is ~286070/1024 + */ + u64 cycles = cyc; + cycles *= 286070; + cycles >>= 10; + if(remainder) + *remainder = 0; + return (nsec_t)cycles; +} + +struct timesource_t timesource_acpi_pm = { + .name = "acpi_pm", + .priority = 200, + .read = acpi_pm_read, + .delta = acpi_pm_delta, + .cyc2ns = acpi_pm_cyc2ns, +}; + +/* + * Some boards have the PMTMR running way too fast. We check + * the PMTMR rate against PIT channel 2 to catch these cases. + */ +static int verify_pmtmr_rate(void) +{ + u32 value1, value2; + unsigned long count, delta; + + mach_prepare_counter(); + value1 = read_pmtmr(); + mach_countup(&count); + value2 = read_pmtmr(); + delta = (value2 - value1) & ACPI_PM_MASK; + + /* Check that the PMTMR delta is within 5% of what we expect */ + if (delta < (PMTMR_EXPECTED_RATE * 19) / 20 || + delta > (PMTMR_EXPECTED_RATE * 21) / 20) { + printk(KERN_INFO "PM-Timer running at invalid rate: %lu%% of normal - aborting.\n", 100UL * delta / PMTMR_EXPECTED_RATE); + return -1; + } + + return 0; +} + + +static int init_acpi_pm_timesource(void) +{ + u32 value1, value2; + unsigned int i; + + if (!pmtmr_ioport) + return -ENODEV; + + /* "verify" this timing source */ + value1 = read_pmtmr(); + for (i = 0; i < 10000; i++) { + value2 = read_pmtmr(); + if (value2 == value1) + continue; + if (value2 > value1) + goto pm_good; + if ((value2 < value1) && ((value2) < 0xFFF)) + goto pm_good; + printk(KERN_INFO "PM-Timer had inconsistent results: 0x%#x, 0x%#x - aborting.\n", value1, value2); + return -EINVAL; + } + printk(KERN_INFO "PM-Timer had no reasonable result: 0x%#x - aborting.\n", value1); + return -ENODEV; + +pm_good: + if (verify_pmtmr_rate() != 0) + return -ENODEV; + + register_timesource(×ource_acpi_pm); + return 0; +} + +module_init(init_acpi_pm_timesource); diff -Nru a/drivers/timesource/cyclone.c b/drivers/timesource/cyclone.c --- /dev/null Wed Dec 31 16:00:00 196900 +++ b/drivers/timesource/cyclone.c 2004-09-02 13:45:15 -07:00 @@ -0,0 +1,167 @@ +#include <linux/timesource.h> +#include <linux/errno.h> +#include <linux/string.h> +#include <linux/timex.h> +#include <linux/init.h> + +#include <asm/io.h> +#include <asm/pgtable.h> +#include <asm/fixmap.h> +#include "mach_timer.h" + +#define CYCLONE_CBAR_ADDR 0xFEB00CD0 +#define CYCLONE_PMCC_OFFSET 0x51A0 +#define CYCLONE_MPMC_OFFSET 0x51D0 +#define CYCLONE_MPCS_OFFSET 0x51A8 +#define CYCLONE_TIMER_FREQ 100000000 +#define CYCLONE_TIMER_MASK (((u64)1<<40)-1) /* 40 bit mask */ + +unsigned long cyclone_freq_khz; + +int use_cyclone = 0; +static u32* volatile cyclone_timer; /* Cyclone MPMC0 register */ + +/* helper macro to atomically read both cyclone counter registers */ +#define read_cyclone_counter(low,high) \ + do{ \ + high = cyclone_timer[1]; low = cyclone_timer[0]; \ + } while (high != cyclone_timer[1]); + + +static cycle_t cyclone_read(void) +{ + u32 low, high; + u64 ret; + + read_cyclone_counter(low,high); + ret = ((u64)high << 32)|low; + + return (cycle_t)ret; +} + +static cycle_t cyclone_delta(cycle_t now, cycle_t then) +{ + return (now - then)&CYCLONE_TIMER_MASK; +} + +static nsec_t cyclone_cyc2ns(cycle_t cyc, cycle_t* remainder) +{ + u64 rem; + cyc *= 1000000; + rem = do_div(cyc, cyclone_freq_khz); + if (remainder) + *remainder = rem/1000000; /*XXX - do we still loose precision here? */ + return (nsec_t)cyc; +} + +struct timesource_t timesource_cyclone = { + .name = "cyclone", + .priority = 100, + .read = cyclone_read, + .delta = cyclone_delta, + .cyc2ns = cyclone_cyc2ns, +}; + + +static void calibrate_cyclone(void) +{ + u32 startlow, starthigh, endlow, endhigh, delta32; + u64 start, end, delta64; + unsigned long i, count; + /* repeat 3 times to make sure the cache is warm */ + for(i=0; i < 3; i++) { + mach_prepare_counter(); + read_cyclone_counter(startlow,starthigh); + mach_countup(&count); + read_cyclone_counter(endlow,endhigh); + } + start = (u64)starthigh<<32|startlow; + end = (u64)endhigh<<32|endlow; + + delta64 = end - start; + printk("cyclone delta: %llu\n", delta64); + delta64 *= (ACTHZ/1000)>>8; + printk("delta*hz = %llu\n", delta64); + delta32 = (u32)delta64; + cyclone_freq_khz = delta32/CALIBRATE_ITERATION; + printk("calculated cyclone_freq: %lu khz\n", cyclone_freq_khz); +} + +static int init_cyclone_timesource(void) +{ + u32* reg; + u32 base; /* saved cyclone base address */ + u32 pageaddr; /* page that contains cyclone_timer register */ + u32 offset; /* offset from pageaddr to cyclone_timer register */ + int i; + + /*make sure we're on a summit box*/ + if(!use_cyclone) return -ENODEV; + + printk(KERN_INFO "Summit chipset: Starting Cyclone Counter.\n"); + + /* find base address */ + pageaddr = (CYCLONE_CBAR_ADDR)&PAGE_MASK; + offset = (CYCLONE_CBAR_ADDR)&(~PAGE_MASK); + set_fixmap_nocache(FIX_CYCLONE_TIMER, pageaddr); + reg = (u32*)(fix_to_virt(FIX_CYCLONE_TIMER) + offset); + if(!reg){ + printk(KERN_ERR "Summit chipset: Could not find valid CBAR register.\n"); + return -ENODEV; + } + base = *reg; + if(!base){ + printk(KERN_ERR "Summit chipset: Could not find valid CBAR value.\n"); + return -ENODEV; + } + + /* setup PMCC */ + pageaddr = (base + CYCLONE_PMCC_OFFSET)&PAGE_MASK; + offset = (base + CYCLONE_PMCC_OFFSET)&(~PAGE_MASK); + set_fixmap_nocache(FIX_CYCLONE_TIMER, pageaddr); + reg = (u32*)(fix_to_virt(FIX_CYCLONE_TIMER) + offset); + if(!reg){ + printk(KERN_ERR "Summit chipset: Could not find valid PMCC register.\n"); + return -ENODEV; + } + reg[0] = 0x00000001; + + /* setup MPCS */ + pageaddr = (base + CYCLONE_MPCS_OFFSET)&PAGE_MASK; + offset = (base + CYCLONE_MPCS_OFFSET)&(~PAGE_MASK); + set_fixmap_nocache(FIX_CYCLONE_TIMER, pageaddr); + reg = (u32*)(fix_to_virt(FIX_CYCLONE_TIMER) + offset); + if(!reg){ + printk(KERN_ERR "Summit chipset: Could not find valid MPCS register.\n"); + return -ENODEV; + } + reg[0] = 0x00000001; + + /* map in cyclone_timer */ + pageaddr = (base + CYCLONE_MPMC_OFFSET)&PAGE_MASK; + offset = (base + CYCLONE_MPMC_OFFSET)&(~PAGE_MASK); + set_fixmap_nocache(FIX_CYCLONE_TIMER, pageaddr); + cyclone_timer = (u32*)(fix_to_virt(FIX_CYCLONE_TIMER) + offset); + if(!cyclone_timer){ + printk(KERN_ERR "Summit chipset: Could not find valid MPMC register.\n"); + return -ENODEV; + } + + /*quick test to make sure its ticking*/ + for(i=0; i<3; i++){ + u32 old = cyclone_timer[0]; + int stall = 100; + while(stall--) barrier(); + if(cyclone_timer[0] == old){ + printk(KERN_ERR "Summit chipset: Counter not counting! DISABLED\n"); + cyclone_timer = 0; + return -ENODEV; + } + } + calibrate_cyclone(); + register_timesource(×ource_cyclone); + + return 0; +} + +module_init(init_cyclone_timesource); diff -Nru a/drivers/timesource/i386_pit.c b/drivers/timesource/i386_pit.c --- /dev/null Wed Dec 31 16:00:00 196900 +++ b/drivers/timesource/i386_pit.c 2004-09-02 13:45:15 -07:00 @@ -0,0 +1,99 @@ +/* pit timesource */ + +#include <linux/timesource.h> +#include <linux/timex.h> +#include <linux/init.h> + +#include <asm/io.h> +#include <asm/timer.h> +#include "io_ports.h" +#include "do_timer.h" + +extern u64 jiffies_64; +extern long jiffies; +extern spinlock_t i8253_lock; + +/* Since the PIT overflows every tick, its not very useful + * to just read by itself. So throw jiffies into the mix to + * and just return nanoseconds in pit_read(). + */ + +static cycle_t pit_read(void) +{ + unsigned long flags; + int count; + unsigned long jiffies_t; + static int count_p; + static unsigned long jiffies_p = 0; + + spin_lock_irqsave(&i8253_lock, flags); + + outb_p(0x00, PIT_MODE); /* latch the count ASAP */ + + count = inb_p(PIT_CH0); /* read the latched count */ + jiffies_t = jiffies; + count |= inb_p(PIT_CH0) << 8; + + /* VIA686a test code... reset the latch if count > max + 1 */ + if (count > LATCH) { + outb_p(0x34, PIT_MODE); + outb_p(LATCH & 0xff, PIT_CH0); + outb(LATCH >> 8, PIT_CH0); + count = LATCH - 1; + } + + /* + * avoiding timer inconsistencies (they are rare, but they happen)... + * there are two kinds of problems that must be avoided here: + * 1. the timer counter underflows + * 2. hardware problem with the timer, not giving us continuous time, + * the counter does small "jumps" upwards on some Pentium systems, + * (see c't 95/10 page 335 for Neptun bug.) + */ + + if( jiffies_t == jiffies_p ) { + if( count > count_p ) { + /* the nutcase */ + count = do_timer_overflow(count); + } + } else + jiffies_p = jiffies_t; + + count_p = count; + + spin_unlock_irqrestore(&i8253_lock, flags); + + count = ((LATCH-1) - count) * TICK_SIZE; + count = (count + LATCH/2) / LATCH; + + count *= 1000; /* convert count from usec->nsec */ + + return (cycle_t)((jiffies_64 * TICK_NSEC) + count); +} + +static cycle_t pit_delta(cycle_t now, cycle_t then) +{ + return now - then; +} + +/* just return cyc, as its already in ns */ +static nsec_t pit_cyc2ns(cycle_t cyc, cycle_t* remainder) +{ + return (nsec_t)cyc; +} + +static struct timesource_t timesource_pit = { + .name = "pit", + .priority = 0, + .read = pit_read, + .delta = pit_delta, + .cyc2ns = pit_cyc2ns, +}; + +static int init_pit_timesource(void) +{ + register_timesource(×ource_pit); + return 0; +} + +module_init(init_pit_timesource); diff -Nru a/drivers/timesource/i386_tsc.c b/drivers/timesource/i386_tsc.c --- /dev/null Wed Dec 31 16:00:00 196900 +++ b/drivers/timesource/i386_tsc.c 2004-09-02 13:45:15 -07:00 @@ -0,0 +1,53 @@ +/* TODO: + * o cpufreq code + * o better calibration + */ + +#include <linux/timesource.h> +#include "mach_timer.h" +#include <linux/timex.h> +#include <linux/init.h> +#include <linux/cpufreq.h> +#include <asm/tsc.h> + +static cycle_t tsc_read(void) +{ + u64 ret; + rdtscll(ret); + return (cycle_t)ret; +} + +static cycle_t tsc_delta(cycle_t now, cycle_t then) +{ + return now - then; +} + +static nsec_t tsc_cyc2ns(cycle_t cyc, cycle_t* remainder) +{ + u32 rem; + cyc *= 1000000; + rem = do_div(cyc, cpu_freq_khz); + if(remainder) + *remainder = rem/1000000; /*XXX - do we still loose precision here? */ + return (nsec_t)cyc; +} + + +static struct timesource_t timesource_tsc = { + .name = "tsc", + .priority = 25, + .read = tsc_read, + .delta = tsc_delta, + .cyc2ns = tsc_cyc2ns, +}; + + +static int init_tsc_timesource(void) +{ + /* All the initialization is done in arch/i386/kernel/tsc.c */ + if (cpu_has_tsc && cpu_freq_khz) + register_timesource(×ource_tsc); + return 0; +} + +module_init(init_tsc_timesource); diff -Nru a/include/asm-i386/mach-default/mach_timer.h b/include/asm-i386/mach-default/mach_timer.h --- a/include/asm-i386/mach-default/mach_timer.h 2004-09-02 13:45:15 -07:00 +++ b/include/asm-i386/mach-default/mach_timer.h 2004-09-02 13:45:15 -07:00 @@ -14,8 +14,11 @@ */ #ifndef _MACH_TIMER_H #define _MACH_TIMER_H +#include <asm/io.h> -#define CALIBRATE_LATCH (5 * LATCH) +#define CALIBRATE_ITERATION 50 +#define CALIBRATE_LATCH (CALIBRATE_ITERATION * LATCH) +#define CALIBRATE_TIME (CALIBRATE_ITERATION * 1000020/HZ) static inline void mach_prepare_counter(void) { diff -Nru a/include/asm-i386/timer.h b/include/asm-i386/timer.h --- a/include/asm-i386/timer.h 2004-09-02 13:45:15 -07:00 +++ b/include/asm-i386/timer.h 2004-09-02 13:45:15 -07:00 @@ -42,7 +42,7 @@ extern struct timer_opts timer_pit; extern struct timer_opts timer_tsc; #ifdef CONFIG_X86_CYCLONE_TIMER -extern struct timer_opts timer_cyclone; +//extern struct timer_opts timer_cyclone; #endif extern unsigned long calibrate_tsc(void); @@ -53,6 +53,6 @@ #endif #ifdef CONFIG_X86_PM_TIMER -extern struct timer_opts timer_pmtmr; +//extern struct timer_opts timer_pmtmr; #endif #endif diff -Nru a/include/asm-i386/tsc.h b/include/asm-i386/tsc.h --- /dev/null Wed Dec 31 16:00:00 196900 +++ b/include/asm-i386/tsc.h 2004-09-02 13:45:15 -07:00 @@ -0,0 +1,6 @@ +#ifndef _ASM_I386_TSC_H +#define _ASM_I386_TSC_H +extern unsigned long cpu_freq_khz; +void tsc_init(void); + +#endif ^ permalink raw reply [flat|nested] 84+ messages in thread
* Re: [RFC][PATCH] new timeofday i386 hooks (v.A0) 2004-09-02 21:11 ` [RFC][PATCH] new timeofday i386 hooks (v.A0) john stultz 2004-09-02 21:12 ` [RFC][PATCH] new timeofday i386 timesources (v.A0) john stultz @ 2004-09-03 1:44 ` George Anzinger 2004-09-03 2:06 ` john stultz 2004-09-03 8:07 ` Ulrich Windl 1 sibling, 2 replies; 84+ messages in thread From: George Anzinger @ 2004-09-03 1:44 UTC (permalink / raw) To: john stultz Cc: lkml, tim, albert, Ulrich.Windl, clameter, Len Brown, linux, David Mosberger, Andi Kleen, paulus, schwidefsky, jimix, keith maanthey, greg kh, Patricia Gaughen, Chris McDermott john stultz wrote: > All, > This patch implements the minimal i386 architecture hooks to enable the > new time of day subsystem code. It applies on top of my > linux-2.6.9-rc1_timeofday-core_A0 patch and with this patch applied, you > can test the new time of day subsystem on i386. Basically it adds the > call to timeofday_interrupt_hook() and cuts alot of code out of the > build. The only new code is the sync_persistant_clock() function which > is mostly ripped out of do_timer_interrupt(). Pretty un-interesting. > > I look forward to your comments and feedback. > > thanks > -john > > linux-2.6.9-rc1_timeofday-i386_A0.patch > ======================================= > diff -Nru a/arch/i386/Kconfig b/arch/i386/Kconfig > --- a/arch/i386/Kconfig 2004-09-02 13:29:59 -07:00 > +++ b/arch/i386/Kconfig 2004-09-02 13:29:59 -07:00 > @@ -14,6 +14,10 @@ > 486, 586, Pentiums, and various instruction-set-compatible chips by > AMD, Cyrix, and others. > > +config NEWTOD > + bool > + default y > + > config MMU > bool > default y > diff -Nru a/arch/i386/kernel/time.c b/arch/i386/kernel/time.c > --- a/arch/i386/kernel/time.c 2004-09-02 13:29:59 -07:00 > +++ b/arch/i386/kernel/time.c 2004-09-02 13:29:59 -07:00 > @@ -67,6 +67,8 @@ > > #include "io_ports.h" > > +#include <linux/timeofday.h> > + > extern spinlock_t i8259A_lock; > int pit_latch_buggy; /* extern */ > > @@ -87,6 +89,7 @@ > > struct timer_opts *cur_timer = &timer_none; > > +#ifndef CONFIG_NEWTOD > /* > * This version of gettimeofday has microsecond resolution > * and better than microsecond precision on fast x86 machines with TSC. > @@ -169,6 +172,7 @@ > } > > EXPORT_SYMBOL(do_settimeofday); > +#endif > > static int set_rtc_mmss(unsigned long nowtime) > { > @@ -194,12 +198,39 @@ > * Note: This function is required to return accurate > * time even in the absence of multiple timer ticks. > */ > +#ifndef CONFIG_NEWTOD > unsigned long long monotonic_clock(void) > { > return cur_timer->monotonic_clock(); > } > EXPORT_SYMBOL(monotonic_clock); > +#endif > > +void sync_persistant_clock(struct timespec ts) > +{ > + /* > + * If we have an externally synchronized Linux clock, then update > + * CMOS clock accordingly every ~11 minutes. Set_rtc_mmss() has to be > + * called as close as possible to 500 ms before the new second starts. > + */ > + if (ts.tv_sec > last_rtc_update + 660 && > + (ts.tv_nsec / 1000) > + >= USEC_AFTER - ((unsigned) TICK_SIZE) / 2 && > + (ts.tv_nsec / 1000) > + <= USEC_BEFORE + ((unsigned) TICK_SIZE) / 2) { > + /* horrible...FIXME */ > + if (efi_enabled) { > + if (efi_set_rtc_mmss(ts.tv_sec) == 0) > + last_rtc_update = ts.tv_sec; > + else > + last_rtc_update = ts.tv_sec - 600; > + } else if (set_rtc_mmss(ts.tv_sec) == 0) > + last_rtc_update = ts.tv_sec; > + else > + last_rtc_update = ts.tv_sec - 600; /* do it again in 60 s */ > + } > + I have wondered, and continue to do so, why this is not a timer driven function. It just seems silly to check this every interrupt when we have low overhead timers for just this sort of thing. I wonder about the load calc in the same way... Now, the question is how do you hook up the timer list. We MUST be able to start a timer that will run for several min to hours and have it expire such that the wall time difference is "really" close to what was requested. This requires some "lock up" between the wall clock and the timer subsystem. What are your thoughts here? George -- George Anzinger george@mvista.com High-res-timers: http://sourceforge.net/projects/high-res-timers/ Preemption patch: http://www.kernel.org/pub/linux/kernel/people/rml ^ permalink raw reply [flat|nested] 84+ messages in thread
* Re: [RFC][PATCH] new timeofday i386 hooks (v.A0) 2004-09-03 1:44 ` [RFC][PATCH] new timeofday i386 hooks (v.A0) George Anzinger @ 2004-09-03 2:06 ` john stultz 2004-09-03 8:07 ` Ulrich Windl 1 sibling, 0 replies; 84+ messages in thread From: john stultz @ 2004-09-03 2:06 UTC (permalink / raw) To: george anzinger Cc: lkml, tim, albert, Ulrich.Windl, clameter, Len Brown, linux, David Mosberger, Andi Kleen, paulus, schwidefsky, jimix, keith maanthey, greg kh, Patricia Gaughen, Chris McDermott On Thu, 2004-09-02 at 18:44, George Anzinger wrote: > john stultz wrote: > > +void sync_persistant_clock(struct timespec ts) > > +{ > > + /* > > + * If we have an externally synchronized Linux clock, then update > > + * CMOS clock accordingly every ~11 minutes. Set_rtc_mmss() has to be > > + * called as close as possible to 500 ms before the new second starts. > > + */ > > + if (ts.tv_sec > last_rtc_update + 660 && > > + (ts.tv_nsec / 1000) > > + >= USEC_AFTER - ((unsigned) TICK_SIZE) / 2 && > > + (ts.tv_nsec / 1000) > > + <= USEC_BEFORE + ((unsigned) TICK_SIZE) / 2) { > > + /* horrible...FIXME */ > > + if (efi_enabled) { > > + if (efi_set_rtc_mmss(ts.tv_sec) == 0) > > + last_rtc_update = ts.tv_sec; > > + else > > + last_rtc_update = ts.tv_sec - 600; > > + } else if (set_rtc_mmss(ts.tv_sec) == 0) > > + last_rtc_update = ts.tv_sec; > > + else > > + last_rtc_update = ts.tv_sec - 600; /* do it again in 60 s */ > > + } > > + > I have wondered, and continue to do so, why this is not a timer driven function. > It just seems silly to check this every interrupt when we have low overhead > timers for just this sort of thing. > > I wonder about the load calc in the same way... The really cool thing is: with the new infrastructure separating the timeofday code and the timer code, we can make timeofday_interrupt_hook be called from a soft-timer! We don't have to do the accounting every interrupt, but just frequently enough that the time sources don't overflow! > Now, the question is how do you hook up the timer list. We MUST be able to > start a timer that will run for several min to hours and have it expire such > that the wall time difference is "really" close to what was requested. This > requires some "lock up" between the wall clock and the timer subsystem. > > What are your thoughts here? Well, looking way down the road, once timeofday is independent from the soft-timer code, we can invert the dependency so the soft-timer code uses the high-res timeofday code instead of jiffies for determining if a timer has expired. That would allow for more precise timer intervals to be set, without worry of how it might affect timekeeping. Easier said then done, I agree, but this change does allow for these sorts of thing to be attempted. My favorite benefit is that lost-ticks caused by bad drivers become a timer/scheduler latency issue rather then a timeofday is broken issue :) Yay! Thanks again, George! -john ^ permalink raw reply [flat|nested] 84+ messages in thread
* Re: [RFC][PATCH] new timeofday i386 hooks (v.A0) 2004-09-03 1:44 ` [RFC][PATCH] new timeofday i386 hooks (v.A0) George Anzinger 2004-09-03 2:06 ` john stultz @ 2004-09-03 8:07 ` Ulrich Windl 2004-09-03 18:09 ` George Anzinger 1 sibling, 1 reply; 84+ messages in thread From: Ulrich Windl @ 2004-09-03 8:07 UTC (permalink / raw) To: george Cc: lkml, tim, albert, Ulrich.Windl, clameter, Len Brown, linux, David Mosberger, Andi Kleen, paulus, schwidefsky, jimix, keith maanthey, greg kh, Patricia Gaughen, Chris McDermott On 2 Sep 2004 at 18:44, George Anzinger wrote: > john stultz wrote: > > All, > > This patch implements the minimal i386 architecture hooks to enable the > > new time of day subsystem code. It applies on top of my > > linux-2.6.9-rc1_timeofday-core_A0 patch and with this patch applied, you > > can test the new time of day subsystem on i386. Basically it adds the > > call to timeofday_interrupt_hook() and cuts alot of code out of the > > build. The only new code is the sync_persistant_clock() function which > > is mostly ripped out of do_timer_interrupt(). Pretty un-interesting. > > > > I look forward to your comments and feedback. > > > > thanks > > -john > > > > linux-2.6.9-rc1_timeofday-i386_A0.patch > > ======================================= > > diff -Nru a/arch/i386/Kconfig b/arch/i386/Kconfig > > --- a/arch/i386/Kconfig 2004-09-02 13:29:59 -07:00 > > +++ b/arch/i386/Kconfig 2004-09-02 13:29:59 -07:00 > > @@ -14,6 +14,10 @@ > > 486, 586, Pentiums, and various instruction-set-compatible chips by > > AMD, Cyrix, and others. > > > > +config NEWTOD > > + bool > > + default y > > + > > config MMU > > bool > > default y > > diff -Nru a/arch/i386/kernel/time.c b/arch/i386/kernel/time.c > > --- a/arch/i386/kernel/time.c 2004-09-02 13:29:59 -07:00 > > +++ b/arch/i386/kernel/time.c 2004-09-02 13:29:59 -07:00 > > @@ -67,6 +67,8 @@ > > > > #include "io_ports.h" > > > > +#include <linux/timeofday.h> > > + > > extern spinlock_t i8259A_lock; > > int pit_latch_buggy; /* extern */ > > > > @@ -87,6 +89,7 @@ > > > > struct timer_opts *cur_timer = &timer_none; > > > > +#ifndef CONFIG_NEWTOD > > /* > > * This version of gettimeofday has microsecond resolution > > * and better than microsecond precision on fast x86 machines with TSC. > > @@ -169,6 +172,7 @@ > > } > > > > EXPORT_SYMBOL(do_settimeofday); > > +#endif > > > > static int set_rtc_mmss(unsigned long nowtime) > > { > > @@ -194,12 +198,39 @@ > > * Note: This function is required to return accurate > > * time even in the absence of multiple timer ticks. > > */ > > +#ifndef CONFIG_NEWTOD > > unsigned long long monotonic_clock(void) > > { > > return cur_timer->monotonic_clock(); > > } > > EXPORT_SYMBOL(monotonic_clock); > > +#endif > > > > +void sync_persistant_clock(struct timespec ts) > > +{ > > + /* > > + * If we have an externally synchronized Linux clock, then update > > + * CMOS clock accordingly every ~11 minutes. Set_rtc_mmss() has to be > > + * called as close as possible to 500 ms before the new second starts. > > + */ > > + if (ts.tv_sec > last_rtc_update + 660 && > > + (ts.tv_nsec / 1000) > > + >= USEC_AFTER - ((unsigned) TICK_SIZE) / 2 && > > + (ts.tv_nsec / 1000) > > + <= USEC_BEFORE + ((unsigned) TICK_SIZE) / 2) { > > + /* horrible...FIXME */ > > + if (efi_enabled) { > > + if (efi_set_rtc_mmss(ts.tv_sec) == 0) > > + last_rtc_update = ts.tv_sec; > > + else > > + last_rtc_update = ts.tv_sec - 600; > > + } else if (set_rtc_mmss(ts.tv_sec) == 0) > > + last_rtc_update = ts.tv_sec; > > + else > > + last_rtc_update = ts.tv_sec - 600; /* do it again in 60 s */ > > + } > > + > I have wondered, and continue to do so, why this is not a timer driven function. I think it depends on how reliable timers are regarding in-time triggering. This code has to be executed on-time to make sense. Really. > It just seems silly to check this every interrupt when we have low overhead > timers for just this sort of thing. > > I wonder about the load calc in the same way... That's completely different. ... Regards, Ulrich ^ permalink raw reply [flat|nested] 84+ messages in thread
* Re: [RFC][PATCH] new timeofday i386 hooks (v.A0) 2004-09-03 8:07 ` Ulrich Windl @ 2004-09-03 18:09 ` George Anzinger 0 siblings, 0 replies; 84+ messages in thread From: George Anzinger @ 2004-09-03 18:09 UTC (permalink / raw) To: Ulrich Windl Cc: lkml, tim, albert, clameter, Len Brown, linux, David Mosberger, Andi Kleen, paulus, schwidefsky, jimix, keith maanthey, greg kh, Patricia Gaughen, Chris McDermott Ulrich Windl wrote: > On 2 Sep 2004 at 18:44, George Anzinger wrote: > > ~ >>>+#endif >>> >>>+void sync_persistant_clock(struct timespec ts) >>>+{ >>>+ /* >>>+ * If we have an externally synchronized Linux clock, then update >>>+ * CMOS clock accordingly every ~11 minutes. Set_rtc_mmss() has to be >>>+ * called as close as possible to 500 ms before the new second starts. >>>+ */ >>>+ if (ts.tv_sec > last_rtc_update + 660 && >>>+ (ts.tv_nsec / 1000) >>>+ >= USEC_AFTER - ((unsigned) TICK_SIZE) / 2 && >>>+ (ts.tv_nsec / 1000) >>>+ <= USEC_BEFORE + ((unsigned) TICK_SIZE) / 2) { >>>+ /* horrible...FIXME */ >>>+ if (efi_enabled) { >>>+ if (efi_set_rtc_mmss(ts.tv_sec) == 0) >>>+ last_rtc_update = ts.tv_sec; >>>+ else >>>+ last_rtc_update = ts.tv_sec - 600; >>>+ } else if (set_rtc_mmss(ts.tv_sec) == 0) >>>+ last_rtc_update = ts.tv_sec; >>>+ else >>>+ last_rtc_update = ts.tv_sec - 600; /* do it again in 60 s */ >>>+ } >>>+ >> >>I have wondered, and continue to do so, why this is not a timer driven function. > > > I think it depends on how reliable timers are regarding in-time triggering. This > code has to be executed on-time to make sense. Really. Granted, but if we are late we can easily skip to the next second. The 60s thing is rather arbitrary. If we are always late, well, there are lots of user aps that rely on timers being on time most of the time. So we need to get that right. -g > > >> It just seems silly to check this every interrupt when we have low overhead >>timers for just this sort of thing. >> >>I wonder about the load calc in the same way... > > > That's completely different. > > ... > > Regards, > Ulrich > > -- George Anzinger george@mvista.com High-res-timers: http://sourceforge.net/projects/high-res-timers/ Preemption patch: http://www.kernel.org/pub/linux/kernel/people/rml ^ permalink raw reply [flat|nested] 84+ messages in thread
* Re: [RFC][PATCH] new timeofday core subsystem (v.A0) 2004-09-02 21:09 ` [RFC][PATCH] new timeofday core subsystem (v.A0) john stultz 2004-09-02 21:11 ` [RFC][PATCH] new timeofday i386 hooks (v.A0) john stultz @ 2004-09-02 22:19 ` Christoph Lameter 2004-09-02 22:28 ` john stultz 2004-09-03 1:39 ` George Anzinger 2 siblings, 1 reply; 84+ messages in thread From: Christoph Lameter @ 2004-09-02 22:19 UTC (permalink / raw) To: john stultz Cc: lkml, tim, george anzinger, albert, Ulrich.Windl, Len Brown, linux, David Mosberger, Andi Kleen, paulus, schwidefsky, jimix, keith maanthey, greg kh, Patricia Gaughen, Chris McDermott On Thu, 2 Sep 2004, john stultz wrote: > +struct timesource_t { > + char* name; > + int priority; > + cycle_t (*read)(void); > + cycle_t (*delta)(cycle_t now, cycle_t then); > + nsec_t (*cyc2ns)(cycle_t cycles, cycle_t* remainder); > +}; This is defining functions to be called for a timesource. Which means that the functionality for clock_gettime cannot be implemented as a fastcall anymore on IA64. The delta and the cyc2ns function are always the same. The compensation for jitter etc is always the same and can also be implemented generically. Could we change this to avoid the function calls. For example. struct timesource_t { char *name; int priority; int type; void *address; long long int frequency; } The functions are always delta = (now-then) cycle2ns = cycles * (NSEC_PER_SEC / frequency) + remainder types of time sources #define TIMESOURCE_CPU 0 #define TIMESOURCE_MMIO32 1 #define TIMESOURCE_MMIO64 2 #define TIMESOURCE_FUNCTION 3 /* To catch odd cases */ ^ permalink raw reply [flat|nested] 84+ messages in thread
* Re: [RFC][PATCH] new timeofday core subsystem (v.A0) 2004-09-02 22:19 ` [RFC][PATCH] new timeofday core subsystem (v.A0) Christoph Lameter @ 2004-09-02 22:28 ` john stultz 2004-09-02 22:42 ` Christoph Lameter 0 siblings, 1 reply; 84+ messages in thread From: john stultz @ 2004-09-02 22:28 UTC (permalink / raw) To: Christoph Lameter Cc: lkml, tim, george anzinger, albert, Ulrich.Windl, Len Brown, linux, David Mosberger, Andi Kleen, paulus, schwidefsky, jimix, keith maanthey, greg kh, Patricia Gaughen, Chris McDermott On Thu, 2004-09-02 at 15:19, Christoph Lameter wrote: > On Thu, 2 Sep 2004, john stultz wrote: > > > +struct timesource_t { > > + char* name; > > + int priority; > > + cycle_t (*read)(void); > > + cycle_t (*delta)(cycle_t now, cycle_t then); > > + nsec_t (*cyc2ns)(cycle_t cycles, cycle_t* remainder); > > +}; > > This is defining functions to be called for a timesource. Which means that > the functionality for clock_gettime cannot be implemented as a fastcall > anymore on IA64. The delta and the cyc2ns function are always the same. > The compensation for jitter etc is always the same and can also be > implemented generically. > > Could we change this to avoid the function calls. > > For example. > > struct timesource_t { > char *name; > int priority; > int type; > void *address; > long long int frequency; > } > > The functions are always > > delta = (now-then) > cycle2ns = cycles * (NSEC_PER_SEC / frequency) + remainder > > types of time sources > > #define TIMESOURCE_CPU 0 > #define TIMESOURCE_MMIO32 1 > #define TIMESOURCE_MMIO64 2 > #define TIMESOURCE_FUNCTION 3 /* To catch odd cases */ What about my idea from yesterday of inverting the fastcall relationship? Instead of creating a structure that exports values and pointers the fastcall can use to create a time of day, why not use the fast call to read the raw time and return it back to the time of day code (which may be running in user context)? This avoids the duplication of having to re-implement the timeofday/clock_gettime functions in fastcall asm code. thanks -john ^ permalink raw reply [flat|nested] 84+ messages in thread
* Re: [RFC][PATCH] new timeofday core subsystem (v.A0) 2004-09-02 22:28 ` john stultz @ 2004-09-02 22:42 ` Christoph Lameter 2004-09-02 23:14 ` john stultz 0 siblings, 1 reply; 84+ messages in thread From: Christoph Lameter @ 2004-09-02 22:42 UTC (permalink / raw) To: john stultz Cc: lkml, tim, george anzinger, albert, Ulrich.Windl, Len Brown, linux, David Mosberger, Andi Kleen, paulus, schwidefsky, jimix, keith maanthey, greg kh, Patricia Gaughen, Chris McDermott On Thu, 2 Sep 2004, john stultz wrote: > What about my idea from yesterday of inverting the fastcall > relationship? Instead of creating a structure that exports values and > pointers the fastcall can use to create a time of day, why not use the > fast call to read the raw time and return it back to the time of day > code (which may be running in user context)? This avoids the duplication > of having to re-implement the timeofday/clock_gettime functions in > fastcall asm code. "Read raw time"? How can you read the raw time in a fast call if the fast call needs to do additional function calls (as defined in the proposed time structure) in the kernel context in order to retrieve time? A fast call cannot do any function calls in the kernel context or otherwise. The overhead of the function calls will reduce the performance of time access significantly. ^ permalink raw reply [flat|nested] 84+ messages in thread
* Re: [RFC][PATCH] new timeofday core subsystem (v.A0) 2004-09-02 22:42 ` Christoph Lameter @ 2004-09-02 23:14 ` john stultz 2004-09-02 23:39 ` Christoph Lameter 0 siblings, 1 reply; 84+ messages in thread From: john stultz @ 2004-09-02 23:14 UTC (permalink / raw) To: Christoph Lameter Cc: lkml, tim, george anzinger, albert, Ulrich.Windl, Len Brown, linux, David Mosberger, Andi Kleen, paulus, schwidefsky, jimix, keith maanthey, greg kh, Patricia Gaughen, Chris McDermott On Thu, 2004-09-02 at 15:42, Christoph Lameter wrote: > On Thu, 2 Sep 2004, john stultz wrote: > > > What about my idea from yesterday of inverting the fastcall > > relationship? Instead of creating a structure that exports values and > > pointers the fastcall can use to create a time of day, why not use the > > fast call to read the raw time and return it back to the time of day > > code (which may be running in user context)? This avoids the duplication > > of having to re-implement the timeofday/clock_gettime functions in > > fastcall asm code. > > "Read raw time"? How can you read the raw time in a fast call if the > fast call needs to do additional function calls (as defined in the > proposed time structure) in the kernel context in order to retrieve time? > > A fast call cannot do any function calls in the kernel context or > otherwise. > > The overhead of the function calls will reduce the performance of time > access significantly. Forgive me if I mis-understand the fastcall method, but this is the concept. Instead of having a fastcall function that implmements the gettimofday/clock_gettime + ntp scaling + etc all in asm we do the following: When creating a ia64 timesource (say the cyclone just for a specific example) we do something like the following. /* trivial delta (nothing new here) */ static cycle_t cyclone_fastcall_delta(cycle_t now, cycle_t then) { return (now - then); } /* trivial cyc2ns (nothing new here) */ static nsec_t cyclone_fastcall_cyc2ns(cycle_t cyc, cycle_t* remainder) { u64 rem; cyc *= freq_multiplier; if (remainder) *remainder = 0; return (nsec_t)cyc; } /* fastcall read, this is where it gets interesting */ static cycle_t cyclone_fasatcall_read(void) { u64 ret; ret = fastcall_readcyclonecounter(); return (cycle_t)ret; } struct timesource_t timesource_cyclone_fastcall = { .name = "cyclone_fastcall", .priority = 100, .read = cyclone_fastcall_read, .delta = cyclone_fastcall_delta, .cyc2ns = cyclone_fastcall_cyc2ns, }; Then you implement a fastcall for fastcall_readcyclonecounter(), which in crazy ia64 asm would do something like: ENTRY(fastcall_readcyclonecounter) blip // magic privledge escalation blop // load cyclone counter into memory bloop // copy cyclone counter back into return register ;; END(fastcall_readcyclonecounter) This avoids 150+ lines of asm needed to re-implement the gettimeofday math. However, I could be mistaken. Is something like this possible? thanks -john ^ permalink raw reply [flat|nested] 84+ messages in thread
* Re: [RFC][PATCH] new timeofday core subsystem (v.A0) 2004-09-02 23:14 ` john stultz @ 2004-09-02 23:39 ` Christoph Lameter 2004-09-03 0:07 ` john stultz 0 siblings, 1 reply; 84+ messages in thread From: Christoph Lameter @ 2004-09-02 23:39 UTC (permalink / raw) To: john stultz Cc: lkml, tim, george anzinger, albert, Ulrich.Windl, Len Brown, linux, David Mosberger, Andi Kleen, paulus, schwidefsky, jimix, keith maanthey, greg kh, Patricia Gaughen, Chris McDermott On Thu, 2 Sep 2004, john stultz wrote: > Then you implement a fastcall for fastcall_readcyclonecounter(), which > in crazy ia64 asm would do something like: > > ENTRY(fastcall_readcyclonecounter) > blip // magic privledge escalation > blop // load cyclone counter into memory > bloop // copy cyclone counter back into return register > ;; > END(fastcall_readcyclonecounter) > > > This avoids 150+ lines of asm needed to re-implement the gettimeofday > math. > > However, I could be mistaken. Is something like this possible? Of course but its not a generic way of timer acccess and requires a fastcall for each timer. You still have the problem of exporting the frequency and the time offsets to user space (those also need to be kept current in order for one to be able to calculate a timer value!). The syscall/fastcall API then needs to be extended to allow for a system call to access each of the individual timers supported. And how would this be supported from user space? We link in special libraries to support for cyclone and any other supported timers into each program? I thought a kernel would provide hardware abstraction? The current IA64 fastcall timer interface is generic and is bound into the clock_gettime interface of glibc. ^ permalink raw reply [flat|nested] 84+ messages in thread
* Re: [RFC][PATCH] new timeofday core subsystem (v.A0) 2004-09-02 23:39 ` Christoph Lameter @ 2004-09-03 0:07 ` john stultz 2004-09-03 0:47 ` Christoph Lameter 0 siblings, 1 reply; 84+ messages in thread From: john stultz @ 2004-09-03 0:07 UTC (permalink / raw) To: Christoph Lameter Cc: lkml, tim, george anzinger, albert, Ulrich.Windl, Len Brown, linux, David Mosberger, Andi Kleen, paulus, schwidefsky, jimix, keith maanthey, greg kh, Patricia Gaughen, Chris McDermott On Thu, 2004-09-02 at 16:39, Christoph Lameter wrote: > On Thu, 2 Sep 2004, john stultz wrote: > > > Then you implement a fastcall for fastcall_readcyclonecounter(), which > > in crazy ia64 asm would do something like: > > > > ENTRY(fastcall_readcyclonecounter) > > blip // magic privledge escalation > > blop // load cyclone counter into memory > > bloop // copy cyclone counter back into return register > > ;; > > END(fastcall_readcyclonecounter) > > > > > > This avoids 150+ lines of asm needed to re-implement the gettimeofday > > math. > > > > However, I could be mistaken. Is something like this possible? > > Of course but its not a generic way of timer acccess and > requires a fastcall for each timer. You still have the problem of > exporting the frequency and the time offsets to user space (those also > need to be kept current in order for one to be able to calculate a timer > value!). The syscall/fastcall API then needs to be extended to allow for a > system call to access each of the individual timers supported. Indeed, it would require a fastcall accessor for each timesource, but maybe fastcall is the wrong way to describe it. Could it just be a function that uses the epc to escalate its privileges? As for freq and offsets, those would already be user-visible (see below for more details) > And how would this be supported from user space? We link in special > libraries to support for cyclone and any other supported timers into > each program? I thought a kernel would provide hardware abstraction? The plan at the moment is that the timeofday core gettimeofday code path as well as any timesource that supports it adds a _vsyscall linker attribute. Then the linker will place all the code on a special page or set of pages. Those pages would then be mapped in as user-readable. Then just like the x86-64's vsyscall gettimeofday, glibc would re-direct gettimeofday calls to the user-mapped kernel pages, where it would execute in user mode (with maybe the epc privilege escalation for ia64 time sources that required it). I had to do most of this for the i386 vsyscall-gettimeofday port, but I was unhappy with the duplication of code (and bugs), as well as the fact that I was then being pushed to re-do it for each architecture. While its not currently implemented (I was hoping to keep the discussion focused on the correctness of what's been implemented), I feel the plan for user-mode access won't be too complex. I'm still open for further discussion if you'd like, obviously performance is quite important, and I want to calm any fears you have, but I'm sure the new ntp code plenty of performance issues to look at before we start digging into usermode access, so maybe we can come back to this discussion later? > The current IA64 fastcall timer interface is generic and is bound into the > clock_gettime interface of glibc. Yes, but x86-64 has one way, and ia64 does it another, and i know ppc folks have talked about their own user mode time access. Chasing down a time bug across arches gets to be fairly hairy, so I'm trying to simplify that. Again, I really appreciate your thoughts and feedback. Let me see if in the next release I can have more code to illustrate the user-mode access plan. thanks! -john ^ permalink raw reply [flat|nested] 84+ messages in thread
* Re: [RFC][PATCH] new timeofday core subsystem (v.A0) 2004-09-03 0:07 ` john stultz @ 2004-09-03 0:47 ` Christoph Lameter 2004-09-03 1:30 ` john stultz 0 siblings, 1 reply; 84+ messages in thread From: Christoph Lameter @ 2004-09-03 0:47 UTC (permalink / raw) To: john stultz Cc: lkml, tim, george anzinger, albert, Ulrich.Windl, Len Brown, linux, David Mosberger, Andi Kleen, paulus, schwidefsky, jimix, keith maanthey, greg kh, Patricia Gaughen, Chris McDermott On Thu, 2 Sep 2004, john stultz wrote: > > Of course but its not a generic way of timer acccess and > > requires a fastcall for each timer. You still have the problem of > > exporting the frequency and the time offsets to user space (those also > > need to be kept current in order for one to be able to calculate a timer > > value!). The syscall/fastcall API then needs to be extended to allow for a > > system call to access each of the individual timers supported. > > Indeed, it would require a fastcall accessor for each timesource, but > maybe fastcall is the wrong way to describe it. Could it just be a > function that uses the epc to escalate its privileges? As for freq and > offsets, those would already be user-visible (see below for more > details) The only way curent way to enter the kernel from glibc with a fastcall is the EPC. > The plan at the moment is that the timeofday core gettimeofday code path > as well as any timesource that supports it adds a _vsyscall linker > attribute. Then the linker will place all the code on a special page or > set of pages. Those pages would then be mapped in as user-readable. Then > just like the x86-64's vsyscall gettimeofday, glibc would re-direct > gettimeofday calls to the user-mapped kernel pages, where it would > execute in user mode (with maybe the epc privilege escalation for ia64 > time sources that required it). The EPC call already does do a *secure* transfer like this on IA64 and will execute kernel code without user space mapping. This idea raises all sorts of concerns.... > I had to do most of this for the i386 vsyscall-gettimeofday port, but I > was unhappy with the duplication of code (and bugs), as well as the fact > that I was then being pushed to re-do it for each architecture. While > its not currently implemented (I was hoping to keep the discussion > focused on the correctness of what's been implemented), I feel the plan > for user-mode access won't be too complex. I'm still open for further > discussion if you'd like, obviously performance is quite important, and > I want to calm any fears you have, but I'm sure the new ntp code plenty > of performance issues to look at before we start digging into usermode > access, so maybe we can come back to this discussion later? The functions in the timer source structure is a central problem for IA64. We cannot take that out right now. The full parameterization of timer access as I have suggested also allows user page access to timers with simply mapping a page to access the timer. No other gimmicks are needed since the timer source structure contains all information to calculate a timer offset. > Yes, but x86-64 has one way, and ia64 does it another, and i know ppc > folks have talked about their own user mode time access. Chasing down a > time bug across arches gets to be fairly hairy, so I'm trying to > simplify that. The simplest thins is to provide a data structure without any functions attached that can simply be copied into userspace if wanted. If an arch needs special time access then that is depending on the arch specific methods available and such a data structure as I have proposed will include all the info necessary to implement such user mode time access. A requirement to call functions in the kernel to determine time makes these solution impossible. And its getting extremely complex if one has to invoke different functions for each timer supported. ^ permalink raw reply [flat|nested] 84+ messages in thread
* Re: [RFC][PATCH] new timeofday core subsystem (v.A0) 2004-09-03 0:47 ` Christoph Lameter @ 2004-09-03 1:30 ` john stultz 2004-09-03 7:43 ` George Anzinger 2004-09-03 16:18 ` Christoph Lameter 0 siblings, 2 replies; 84+ messages in thread From: john stultz @ 2004-09-03 1:30 UTC (permalink / raw) To: Christoph Lameter Cc: lkml, tim, george anzinger, albert, Ulrich.Windl, Len Brown, linux, David Mosberger, Andi Kleen, paulus, schwidefsky, jimix, keith maanthey, greg kh, Patricia Gaughen, Chris McDermott On Thu, 2004-09-02 at 17:47, Christoph Lameter wrote: > On Thu, 2 Sep 2004, john stultz wrote: > > > > Of course but its not a generic way of timer acccess and > > > requires a fastcall for each timer. You still have the problem of > > > exporting the frequency and the time offsets to user space (those also > > > need to be kept current in order for one to be able to calculate a timer > > > value!). The syscall/fastcall API then needs to be extended to allow for a > > > system call to access each of the individual timers supported. > > > > Indeed, it would require a fastcall accessor for each timesource, but > > maybe fastcall is the wrong way to describe it. Could it just be a > > function that uses the epc to escalate its privileges? As for freq and > > offsets, those would already be user-visible (see below for more > > details) > > The only way curent way to enter the kernel from glibc with a fastcall is > the EPC. Hmm. I must be explaining myself poorly, or not understanding you. I apologize for not understanding this EPC/fastcall business well enough. I'd like to use EPC from a user-executable kernel page to escalate privileges to access the hardware counter. I don't care if I have to use the the current fastcall (fsys.S) interface or not. However you're making sounds like this isn't possible, so I'll need to do some research. > > The plan at the moment is that the timeofday core gettimeofday code path > > as well as any timesource that supports it adds a _vsyscall linker > > attribute. Then the linker will place all the code on a special page or > > set of pages. Those pages would then be mapped in as user-readable. Then > > just like the x86-64's vsyscall gettimeofday, glibc would re-direct > > gettimeofday calls to the user-mapped kernel pages, where it would > > execute in user mode (with maybe the epc privilege escalation for ia64 > > time sources that required it). > > The EPC call already does do a *secure* transfer like this on IA64 and > will execute kernel code without user space mapping. This idea raises all sorts > of concerns.... Yes, but its not portable. Reducing duplicate code so timeofday maintenance isn't a nightmare is the first goal here. It may not be completely achievable, and when I hit that point I'll have to rework the design, but at this point I'm not convinced that it cannot be done. As for security concerns, all your mapping out to userspace are the time variables and the functions needed to convert those to a accurate time of day. This is almost what you suggest below, but with the additional math from the kernel. The method I'm suggesting is very similar to x86-64's arch/x86-64/kernel/vsyscall.c. > > I had to do most of this for the i386 vsyscall-gettimeofday port, but I > > was unhappy with the duplication of code (and bugs), as well as the fact > > that I was then being pushed to re-do it for each architecture. While > > its not currently implemented (I was hoping to keep the discussion > > focused on the correctness of what's been implemented), I feel the plan > > for user-mode access won't be too complex. I'm still open for further > > discussion if you'd like, obviously performance is quite important, and > > I want to calm any fears you have, but I'm sure the new ntp code plenty > > of performance issues to look at before we start digging into usermode > > access, so maybe we can come back to this discussion later? > > The functions in the timer source structure is a central problem for IA64. We > cannot take that out right now. Don't worry I'm not submitting this code just yet. :) I'll need all (or at least most of the important) architecture maintainers to sign on before I try to push this code in. Right now I'm trying to shake out possible problems with the design and the first pass implementation of the code. You're helping me do that, and I thank you for it. Concessions will be made, but for now I'm going to try to preserve the current design and work around the problems as they arise. > The full parameterization of timer access as I have suggested also allows > user page access to timers with simply mapping a page to access the timer. > No other gimmicks are needed since the timer source structure contains all > information to calculate a timer offset. > > > Yes, but x86-64 has one way, and ia64 does it another, and i know ppc > > folks have talked about their own user mode time access. Chasing down a > > time bug across arches gets to be fairly hairy, so I'm trying to > > simplify that. > > The simplest thins is to provide a data structure without any functions > attached that can simply be copied into userspace if wanted. If an arch > needs special time access then that is depending on the arch specific > methods available and such a data structure as I have proposed will > include all the info necessary to implement such user mode time access. Ehhh.. I really don't like the idea of giving all the raw values to userspace and letting user-code do the timeofday calculation. Fixing bugs in each arches timeofday code is hard enough. Imagine if we have to go through and fix userspace too! It would also make a user/kernel data interface that we'd have to preserve. I'd like to avoid that and instead use the vsyscall method to give us greater flexibility. Plus I doubt anyone would want to implement the NTP adjustments in userspace? eek! > A requirement to call functions in the kernel to determine time makes > these solution impossible. And its getting extremely complex if one has to > invoke different functions for each timer supported. The struct timesource interface of read()/delta()/cyc2ns() was the best generalization I could come up with. I feel they're necessary for the following reasons: cyc2ns(): In this conversion we can optimize the math depending on the timesource. If the timesource freq is a power of 2, we can just use shift! However if its a weird value and we have to be very precise, we do a full 64bit divide. We're not stuck with one equation given a freq value. delta(): Some counters don't fill 32 or 64 bits. ACPI PM time source is 24 bits, and the cyclone is 40. Thus to do proper twos complement subtraction without overflow worries you need to mask the subtraction. This can be done by exporting a mask value w/ the freq value, but was cleaner when moved into the timesource. read(): Rather then just giving the address of the register, the read call allows for timesource specific logic. This lets us use jiffies as a timesource, or in cases like the ACPI PM timesource, where the register must be read 3 times in order to ensure a correct value is latched, we can avoid having to include that logic into the generic code, so it does not affect systems that do not use or have that timesource. But I doubt I'll convince you with words, so let me work on it a bit and see if I can code around your concerns and put you at ease. You've brought up some good issues, and I'll definitely work to resolve them! Thanks again! -john ^ permalink raw reply [flat|nested] 84+ messages in thread
* Re: [RFC][PATCH] new timeofday core subsystem (v.A0) 2004-09-03 1:30 ` john stultz @ 2004-09-03 7:43 ` George Anzinger 2004-09-03 19:32 ` john stultz 2004-09-03 16:18 ` Christoph Lameter 1 sibling, 1 reply; 84+ messages in thread From: George Anzinger @ 2004-09-03 7:43 UTC (permalink / raw) To: john stultz Cc: Christoph Lameter, lkml, tim, albert, Ulrich.Windl, Len Brown, linux, David Mosberger, Andi Kleen, paulus, schwidefsky, jimix, keith maanthey, greg kh, Patricia Gaughen, Chris McDermott john stultz wrote: > On Thu, 2004-09-02 at 17:47, Christoph Lameter wrote: > >>On Thu, 2 Sep 2004, john stultz wrote: >> >> >>>>Of course but its not a generic way of timer acccess and >>>>requires a fastcall for each timer. You still have the problem of >>>>exporting the frequency and the time offsets to user space (those also >>>>need to be kept current in order for one to be able to calculate a timer >>>>value!). The syscall/fastcall API then needs to be extended to allow for a >>>>system call to access each of the individual timers supported. >>> >>>Indeed, it would require a fastcall accessor for each timesource, but >>>maybe fastcall is the wrong way to describe it. Could it just be a >>>function that uses the epc to escalate its privileges? As for freq and >>>offsets, those would already be user-visible (see below for more >>>details) >> >>The only way curent way to enter the kernel from glibc with a fastcall is >>the EPC. > > > Hmm. I must be explaining myself poorly, or not understanding you. I > apologize for not understanding this EPC/fastcall business well enough. > I'd like to use EPC from a user-executable kernel page to escalate > privileges to access the hardware counter. I don't care if I have to use > the the current fastcall (fsys.S) interface or not. However you're > making sounds like this isn't possible, so I'll need to do some > research. > > >>>The plan at the moment is that the timeofday core gettimeofday code path >>>as well as any timesource that supports it adds a _vsyscall linker >>>attribute. Then the linker will place all the code on a special page or >>>set of pages. Those pages would then be mapped in as user-readable. Then >>>just like the x86-64's vsyscall gettimeofday, glibc would re-direct >>>gettimeofday calls to the user-mapped kernel pages, where it would >>>execute in user mode (with maybe the epc privilege escalation for ia64 >>>time sources that required it). >> >>The EPC call already does do a *secure* transfer like this on IA64 and >>will execute kernel code without user space mapping. This idea raises all sorts >>of concerns.... > > > Yes, but its not portable. Reducing duplicate code so timeofday > maintenance isn't a nightmare is the first goal here. It may not be > completely achievable, and when I hit that point I'll have to rework the > design, but at this point I'm not convinced that it cannot be done. > > As for security concerns, all your mapping out to userspace are the time > variables and the functions needed to convert those to a accurate time > of day. This is almost what you suggest below, but with the additional > math from the kernel. The method I'm suggesting is very similar to > x86-64's arch/x86-64/kernel/vsyscall.c. > > > >>>I had to do most of this for the i386 vsyscall-gettimeofday port, but I >>>was unhappy with the duplication of code (and bugs), as well as the fact >>>that I was then being pushed to re-do it for each architecture. While >>>its not currently implemented (I was hoping to keep the discussion >>>focused on the correctness of what's been implemented), I feel the plan >>>for user-mode access won't be too complex. I'm still open for further >>>discussion if you'd like, obviously performance is quite important, and >>>I want to calm any fears you have, but I'm sure the new ntp code plenty >>>of performance issues to look at before we start digging into usermode >>>access, so maybe we can come back to this discussion later? >> >>The functions in the timer source structure is a central problem for IA64. We >>cannot take that out right now. > > > Don't worry I'm not submitting this code just yet. :) I'll need all (or > at least most of the important) architecture maintainers to sign on > before I try to push this code in. > > Right now I'm trying to shake out possible problems with the design and > the first pass implementation of the code. You're helping me do that, > and I thank you for it. Concessions will be made, but for now I'm going > to try to preserve the current design and work around the problems as > they arise. > > > >>The full parameterization of timer access as I have suggested also allows >>user page access to timers with simply mapping a page to access the timer. >>No other gimmicks are needed since the timer source structure contains all >>information to calculate a timer offset. >> >> >>>Yes, but x86-64 has one way, and ia64 does it another, and i know ppc >>>folks have talked about their own user mode time access. Chasing down a >>>time bug across arches gets to be fairly hairy, so I'm trying to >>>simplify that. >> >>The simplest thins is to provide a data structure without any functions >>attached that can simply be copied into userspace if wanted. If an arch >>needs special time access then that is depending on the arch specific >>methods available and such a data structure as I have proposed will >>include all the info necessary to implement such user mode time access. > > > Ehhh.. I really don't like the idea of giving all the raw values to > userspace and letting user-code do the timeofday calculation. Fixing > bugs in each arches timeofday code is hard enough. Imagine if we have to > go through and fix userspace too! It would also make a user/kernel data > interface that we'd have to preserve. I'd like to avoid that and instead > use the vsyscall method to give us greater flexibility. Plus I doubt > anyone would want to implement the NTP adjustments in userspace? eek! > > >>A requirement to call functions in the kernel to determine time makes >>these solution impossible. And its getting extremely complex if one has to >>invoke different functions for each timer supported. > > > The struct timesource interface of read()/delta()/cyc2ns() was the best > generalization I could come up with. I feel they're necessary for the > following reasons: > > cyc2ns(): In this conversion we can optimize the math depending on the > timesource. If the timesource freq is a power of 2, we can just use > shift! However if its a weird value and we have to be very precise, we > do a full 64bit divide. We're not stuck ith one equation given a freq > value. > > delta(): Some counters don't fill 32 or 64 bits. ACPI PM time source is > 24 bits, and the cyclone is 40. Thus to do proper twos complement > subtraction without overflow worries you need to mask the subtraction. > This can be done by exporting a mask value w/ the freq value, but was > cleaner when moved into the timesource. > > read(): Rather then just giving the address of the register, the read > call allows for timesource specific logic. This lets us use jiffies as a > timesource, or in cases like the ACPI PM timesource, where the register > must be read 3 times in order to ensure a correct value is latched, we > can avoid having to include that logic into the generic code, so it does > not affect systems that do not use or have that timesource. I am not convince that 3 reads are in fact needed. In fact, I am amost certain that two is more than enough. In fact, it takes so long to read it that I just use one read and a sanity check in the HRT code. Here is the code I use: unsigned long quick_get_cpuctr(void) { static unsigned long last_read = 0; static int qgc_max = 0; int i; unsigned long rd_delta, rd_ans, rd = inl(acpi_pm_tmr_address); /* * This will be REALLY big if ever we move backward in time... */ rd_delta = (rd - last_read) & SIZE_MASK; last_read = rd; rd_ans = (rd - last_update) & SIZE_MASK; if (likely((rd_ans < (arch_cycles_per_jiffy << 1)) && (rd_delta < (arch_cycles_per_jiffy << 1)))) return rd_ans; for (i = 0; i < 10; i++) { rd = inl(acpi_pm_tmr_address); rd_delta = (rd - last_read) & SIZE_MASK; last_read = rd; if (unlikely(i > qgc_max)) qgc_max = i; /* * On my test machine (800MHZ dual PIII) this is always * seven. Seems long, but we will give it some slack... * We note that rd_delta (and all the vars) unsigned so * a backward movement will show as a really big number. */ if (likely(rd_delta < 20)) return (rd - last_update) & SIZE_MASK; } return (rd - last_update) & SIZE_MASK; } -- George Anzinger george@mvista.com High-res-timers: http://sourceforge.net/projects/high-res-timers/ Preemption patch: http://www.kernel.org/pub/linux/kernel/people/rml ^ permalink raw reply [flat|nested] 84+ messages in thread
* Re: [RFC][PATCH] new timeofday core subsystem (v.A0) 2004-09-03 7:43 ` George Anzinger @ 2004-09-03 19:32 ` john stultz 0 siblings, 0 replies; 84+ messages in thread From: john stultz @ 2004-09-03 19:32 UTC (permalink / raw) To: george anzinger Cc: Christoph Lameter, lkml, tim, albert, Ulrich.Windl, Len Brown, linux, David Mosberger, Andi Kleen, paulus, schwidefsky, jimix, keith maanthey, greg kh, Patricia Gaughen, Chris McDermott On Fri, 2004-09-03 at 00:43, George Anzinger wrote: > john stultz wrote: > > On Thu, 2004-09-02 at 17:47, Christoph Lameter wrote: > > > >>On Thu, 2 Sep 2004, john stultz wrote: > > read(): Rather then just giving the address of the register, the read > > call allows for timesource specific logic. This lets us use jiffies as a > > timesource, or in cases like the ACPI PM timesource, where the register > > must be read 3 times in order to ensure a correct value is latched, we > > can avoid having to include that logic into the generic code, so it does > > not affect systems that do not use or have that timesource. > > I am not convince that 3 reads are in fact needed. In fact, I am amost certain > that two is more than enough. In fact, it takes so long to read it that I just > use one read and a sanity check in the HRT code. Here is the code I use: [snip] I'd have to defer to Dominik (acpi-pm author) for that. But the case remains, to get a "good" value, you have to do more then just give access to the raw register. thanks -john ^ permalink raw reply [flat|nested] 84+ messages in thread
* Re: [RFC][PATCH] new timeofday core subsystem (v.A0) 2004-09-03 1:30 ` john stultz 2004-09-03 7:43 ` George Anzinger @ 2004-09-03 16:18 ` Christoph Lameter 2004-09-03 21:00 ` john stultz 1 sibling, 1 reply; 84+ messages in thread From: Christoph Lameter @ 2004-09-03 16:18 UTC (permalink / raw) To: john stultz Cc: lkml, tim, george anzinger, albert, Ulrich.Windl, Len Brown, linux, David Mosberger, Andi Kleen, paulus, schwidefsky, jimix, keith maanthey, greg kh, Patricia Gaughen, Chris McDermott On Thu, 2 Sep 2004, john stultz wrote: > > The only way curent way to enter the kernel from glibc with a fastcall is > > the EPC. > > Hmm. I must be explaining myself poorly, or not understanding you. I > apologize for not understanding this EPC/fastcall business well enough. > I'd like to use EPC from a user-executable kernel page to escalate > privileges to access the hardware counter. I don't care if I have to use > the the current fastcall (fsys.S) interface or not. However you're > making sounds like this isn't possible, so I'll need to do some > research. The mechanism you envision could probably be created on IA64. Its certainly not available on all platforms and potentially cannot be available. Also if you want to pursue the approach with functions you need to enter the kernel at least 3 times(!) in order to call the 3 different functions you defined in order to determine time. You need to retrieve time, then do the time diff call then the nanosecond conversion. > > The EPC call already does do a *secure* transfer like this on IA64 and > > will execute kernel code without user space mapping. This idea raises all sorts > > of concerns.... > > Yes, but its not portable. Reducing duplicate code so timeofday > maintenance isn't a nightmare is the first goal here. It may not be > completely achievable, and when I hit that point I'll have to rework the > design, but at this point I'm not convinced that it cannot be done. It does not need to be portable since it is an architecture specific optimization which could be accomplished in variety of ways on other platforms. The time source information structure could be platform independent. The architectures can optimize the way they interpret the timer source information structure without additional calls to kernel functions. I would expect the logic for time retrieval not to be subject to much change. The current ASM for IA64 f.e. would probalby work without change for your new approach if the time source information would not require function calls. However, the proposal makes these optimizations impossible. > > The simplest thins is to provide a data structure without any functions > > attached that can simply be copied into userspace if wanted. If an arch > > needs special time access then that is depending on the arch specific > > methods available and such a data structure as I have proposed will > > include all the info necessary to implement such user mode time access. > > Ehhh.. I really don't like the idea of giving all the raw values to > userspace and letting user-code do the timeofday calculation. Fixing > bugs in each arches timeofday code is hard enough. Imagine if we have to > go through and fix userspace too! It would also make a user/kernel data > interface that we'd have to preserve. I'd like to avoid that and instead > use the vsyscall method to give us greater flexibility. Plus I doubt > anyone would want to implement the NTP adjustments in userspace? eek! The raw values in user space can be used for specialized purposes (somewhat like a generalized form of HPET) by applicationbs. This is not intended for real system time. > cyc2ns(): In this conversion we can optimize the math depending on the > timesource. If the timesource freq is a power of 2, we can just use > shift! However if its a weird value and we have to be very precise, we > do a full 64bit divide. We're not stuck with one equation given a freq > value. I have never seeen a timesource freq with the power of 2. Division is not necessary since one can realize this by multiplying with a certain factor and then shifting the result right instead of dividing. > delta(): Some counters don't fill 32 or 64 bits. ACPI PM time source is > 24 bits, and the cyclone is 40. Thus to do proper twos complement > subtraction without overflow worries you need to mask the subtraction. > This can be done by exporting a mask value w/ the freq value, but was > cleaner when moved into the timesource. I would suggest it is better to add this mask and avoid another call to a tiny function that does simply mask and subtract. Note that compilers are more efficient if they get a sufficiently large chunk of code. This is in particular necessary on IA64 and other processors given the inherent parallelism in their internal CPU. A function call is typically much slower than a subtract and and operation. > read(): Rather then just giving the address of the register, the read > call allows for timesource specific logic. This lets us use jiffies as a > timesource, or in cases like the ACPI PM timesource, where the register > must be read 3 times in order to ensure a correct value is latched, we > can avoid having to include that logic into the generic code, so it does > not affect systems that do not use or have that timesource. I think it is essential to have the capability to use a function. But nevertheless it is quite inefficient to have yet another small function that simply reads a value from a memory location. What I proposed is to be able to specify a memory location in the time source structure. That way this function call can be avoided for most timer source. Any specialized time source will be able to use a function call but will then not be as fast as the time sources that can simply setup a memory address. > But I doubt I'll convince you with words, so let me work on it a bit and > see if I can code around your concerns and put you at ease. You've > brought up some good issues, and I'll definitely work to resolve them! Oh. I think this is good in terms of clarifying the issues. I did not realize that these capabilities that you proposed to use existed. My main concern here is efficient and scalable access to time. The ideal solution is one routine that can run straight through with minimal locking and simply do it all for most cases. The seqlock approach takes care of the locking. Also with the ability to specify parameter instead of functions, one could easily setup a timer with a single function call like f.e. setup_timer(TIME_SOURCE_CPU,NULL, 1500000) or setup_timer(TIME_SOURCE_MMIO64, &timer, 4000000) Of course I am skimming by some additional detail like the mask. But this would cut down significantly on the code piece to be maintained. From what I can see my approach saves a lot of duplicated code. The duplication that may exist comes about because of architecture specific time access optimizations. ^ permalink raw reply [flat|nested] 84+ messages in thread
* Re: [RFC][PATCH] new timeofday core subsystem (v.A0) 2004-09-03 16:18 ` Christoph Lameter @ 2004-09-03 21:00 ` john stultz 2004-09-03 22:04 ` Christoph Lameter 0 siblings, 1 reply; 84+ messages in thread From: john stultz @ 2004-09-03 21:00 UTC (permalink / raw) To: Christoph Lameter Cc: lkml, tim, george anzinger, albert, Ulrich.Windl, Len Brown, linux, David Mosberger, Andi Kleen, paulus, schwidefsky, jimix, keith maanthey, greg kh, Patricia Gaughen, Chris McDermott On Fri, 2004-09-03 at 09:18, Christoph Lameter wrote: > On Thu, 2 Sep 2004, john stultz wrote: > > > The simplest thins is to provide a data structure without any functions > > > attached that can simply be copied into userspace if wanted. If an arch > > > needs special time access then that is depending on the arch specific > > > methods available and such a data structure as I have proposed will > > > include all the info necessary to implement such user mode time access. > > > > Ehhh.. I really don't like the idea of giving all the raw values to > > userspace and letting user-code do the timeofday calculation. Fixing > > bugs in each arches timeofday code is hard enough. Imagine if we have to > > go through and fix userspace too! It would also make a user/kernel data > > interface that we'd have to preserve. I'd like to avoid that and instead > > use the vsyscall method to give us greater flexibility. Plus I doubt > > anyone would want to implement the NTP adjustments in userspace? eek! > > The raw values in user space can be used for specialized purposes > (somewhat like a generalized form of HPET) by applicationbs. This is not > intended for real system time. I worry these specialized interfaces would become troubling to maintain. The kernel (via regular syscall, fsyscall, or vsyscall) should provide accurate time and should be the final authority for that. If specialized applications want to read the cycle counter (TSC, ITC), /dev/HPET or /dev/mem or whatever, and try to infer time that's fine but its at their own risk. When those specialized assumptions break (TSC being fixed freq, for example) its the applications' problem. The kernel shouldn't codify or encourage those sorts of risky interfaces by exporting frequency information in a standardized interface. > > cyc2ns(): In this conversion we can optimize the math depending on the > > timesource. If the timesource freq is a power of 2, we can just use > > shift! However if its a weird value and we have to be very precise, we > > do a full 64bit divide. We're not stuck with one equation given a freq > > value. > > I have never seeen a timesource freq with the power of 2. Division is not > necessary since one can realize this by multiplying with a certain factor > and then shifting the result right instead of dividing. Well, it was a theoretical example, but I agree with your point. The math tricks you suggest are usable depending on the frequency of the timesource. Attaching that sort of logic to the timesource implementation gives us more flexibility. It could be that the flexibility is unnecessary, and if so I'm fine w/ changing the design, but I'm not confident of that just yet. > > delta(): Some counters don't fill 32 or 64 bits. ACPI PM time source is > > 24 bits, and the cyclone is 40. Thus to do proper twos complement > > subtraction without overflow worries you need to mask the subtraction. > > This can be done by exporting a mask value w/ the freq value, but was > > cleaner when moved into the timesource. > > I would suggest it is better to add this mask and avoid another call to > a tiny function that does simply mask and subtract. Note that compilers > are more efficient if they get a sufficiently large chunk of code. This is > in particular necessary on IA64 and other processors given the inherent > parallelism in their internal CPU. A function call is typically much > slower than a subtract and and operation. Yep. As I said before, you've got very good points. The delta() function is the one I'm least attached to, but for now I'm going to hang on to it for readability reasons. It will be simple to change to a mask value later after everyone has reviewed the code and is confident of its correctness. > > read(): Rather then just giving the address of the register, the read > > call allows for timesource specific logic. This lets us use jiffies as a > > timesource, or in cases like the ACPI PM timesource, where the register > > must be read 3 times in order to ensure a correct value is latched, we > > can avoid having to include that logic into the generic code, so it does > > not affect systems that do not use or have that timesource. > > I think it is essential to have the capability to use a function. But > nevertheless it is quite inefficient to have yet another small function > that simply reads a value from a memory location. What I proposed is to be > able to specify a memory location in the time source structure. That way > this function call can be avoided for most timer source. Any specialized > time source will be able to use a function call but will then not be as > fast as the time sources that can simply setup a memory address. I really do like the time_interpolator changes you've submitted in earlier threads. The core generalizing of timesources changes are very similar between our two implementations, so your comments are very important to me. I am open to adding these direct memory accessors to the time counters, but again its a performance optimization, and right now clarity is king. > > But I doubt I'll convince you with words, so let me work on it a bit and > > see if I can code around your concerns and put you at ease. You've > > brought up some good issues, and I'll definitely work to resolve them! > > Oh. I think this is good in terms of clarifying the issues. I did not > realize that these capabilities that you proposed to use existed. My main > concern here is efficient and scalable access to time. The ideal solution > is one routine that can run straight through with minimal locking and > simply do it all for most cases. The seqlock approach takes care of the > locking. Yep, performance and scalability is *very* important. However correctness is even more important, and right now we have enough trouble with just that. All of the arch specific performance hacks have complicated the current code such that minor changes in other subsystems (sometimes not so) subtly break things. Untangling these quirks have been a large part of what I and many others have worked on for the last few years. So for correctness sake we do have to balance clarity and performance. Minimizing trade-offs is my goal right now. I'm starting to feel like I'm just being repetitious, so I'll try to move on now. Sorry for the long-winded emails. > Also with the ability to specify parameter instead of functions, one could > easily setup a timer with a single function call like f.e. > > > setup_timer(TIME_SOURCE_CPU,NULL, 1500000) > > or > > setup_timer(TIME_SOURCE_MMIO64, &timer, 4000000) > > Of course I am skimming by some additional detail like the mask. But this > would cut down significantly on the code piece to be maintained. From what > I can see my approach saves a lot of duplicated code. The duplication that > may exist comes about because of architecture specific time access > optimizations. Well, I realize this is just a simple sketch, but how do you then handle cpu freq changes? Suddenly generic code has to deal with that as well? And then there's those buggy time sources that have to be read multiple times, or drifty ITCs that (may or maynot) need cmpxchgs to ensure they do not go backwards. Making the generic code have to deal with all of these cases will be ugly (and may even hurt performance). Clearly, you've said that function calls should be available in those cases, so I don't want to miss-characterize you comments. However I do want to point it out so that others keep in mind that you can't always just give a pointer and a freq value. Once again, I thank you and everyone else for the great comments! Hopefully I'll get more arch hooks in soon so you and others can actually start playing with (and hopefully improving) the code. thanks -john ^ permalink raw reply [flat|nested] 84+ messages in thread
* Re: [RFC][PATCH] new timeofday core subsystem (v.A0) 2004-09-03 21:00 ` john stultz @ 2004-09-03 22:04 ` Christoph Lameter 2004-09-03 23:00 ` john stultz 0 siblings, 1 reply; 84+ messages in thread From: Christoph Lameter @ 2004-09-03 22:04 UTC (permalink / raw) To: john stultz Cc: lkml, tim, george anzinger, albert, Ulrich.Windl, Len Brown, linux, David Mosberger, Andi Kleen, paulus, schwidefsky, jimix, keith maanthey, greg kh, Patricia Gaughen, Chris McDermott On Fri, 3 Sep 2004, john stultz wrote: > > The raw values in user space can be used for specialized purposes > > (somewhat like a generalized form of HPET) by applicationbs. This is not > > intended for real system time. > > I worry these specialized interfaces would become troubling to maintain. > The kernel (via regular syscall, fsyscall, or vsyscall) should provide > accurate time and should be the final authority for that. > > If specialized applications want to read the cycle counter (TSC, ITC), > /dev/HPET or /dev/mem or whatever, and try to infer time that's fine but > its at their own risk. When those specialized assumptions break (TSC > being fixed freq, for example) its the applications' problem. The kernel > shouldn't codify or encourage those sorts of risky interfaces by > exporting frequency information in a standardized interface. How can a timer have a variable frequency and being used for timing purposes. Maybe I just do not understand the point on TSC having a fixed frequency (I certainly hope so but is that true in the power save modes?). True it is the applications problem if it breaks but we have applications for SGI systems that demand access to a time source with the least operating system interference possible. > > > cyc2ns(): In this conversion we can optimize the math depending on the > > > timesource. If the timesource freq is a power of 2, we can just use > > > shift! However if its a weird value and we have to be very precise, we > > > do a full 64bit divide. We're not stuck with one equation given a freq > > > value. > > > > I have never seeen a timesource freq with the power of 2. Division is not > > necessary since one can realize this by multiplying with a certain factor > > and then shifting the result right instead of dividing. > > Well, it was a theoretical example, but I agree with your point. The > math tricks you suggest are usable depending on the frequency of the > timesource. Attaching that sort of logic to the timesource > implementation gives us more flexibility. It could be that the > flexibility is unnecessary, and if so I'm fine w/ changing the design, > but I'm not confident of that just yet. s/depending on/independent of/ ??? Are there any examples where the flexibility is needed? > > > delta(): Some counters don't fill 32 or 64 bits. ACPI PM time source is > > > 24 bits, and the cyclone is 40. Thus to do proper twos complement > > > subtraction without overflow worries you need to mask the subtraction. > > > This can be done by exporting a mask value w/ the freq value, but was > > > cleaner when moved into the timesource. > > > > I would suggest it is better to add this mask and avoid another call to > > a tiny function that does simply mask and subtract. Note that compilers > > are more efficient if they get a sufficiently large chunk of code. This is > > in particular necessary on IA64 and other processors given the inherent > > parallelism in their internal CPU. A function call is typically much > > slower than a subtract and and operation. > > Yep. As I said before, you've got very good points. The delta() function > is the one I'm least attached to, but for now I'm going to hang on to it > for readability reasons. It will be simple to change to a mask value > later after everyone has reviewed the code and is confident of its > correctness. Are you sure about that readability argument? A single statement to calculate the delta is much more readable and verifyable to be correct than a gazillion of small functions that (one hopes and prays) all do the same correctly. > > > read(): Rather then just giving the address of the register, the read > > > call allows for timesource specific logic. This lets us use jiffies as a > > > timesource, or in cases like the ACPI PM timesource, where the register > > > must be read 3 times in order to ensure a correct value is latched, we > > > can avoid having to include that logic into the generic code, so it does > > > not affect systems that do not use or have that timesource. > > > > I think it is essential to have the capability to use a function. But > > nevertheless it is quite inefficient to have yet another small function > > that simply reads a value from a memory location. What I proposed is to be > > able to specify a memory location in the time source structure. That way > > this function call can be avoided for most timer source. Any specialized > > time source will be able to use a function call but will then not be as > > fast as the time sources that can simply setup a memory address. > > I really do like the time_interpolator changes you've submitted in > earlier threads. The core generalizing of timesources changes are very > similar between our two implementations, so your comments are very > important to me. I am open to adding these direct memory accessors to > the time counters, but again its a performance optimization, and right > now clarity is king. Clarity is gained by centralizing these things. The design results in numerous functions that are defined and which all mostly do the same. For clarities sake please change the spec.... > Yep, performance and scalability is *very* important. However > correctness is even more important, and right now we have enough trouble > with just that. All of the arch specific performance hacks have > complicated the current code such that minor changes in other subsystems > (sometimes not so) subtly break things. Untangling these quirks have > been a large part of what I and many others have worked on for the last > few years. So for correctness sake we do have to balance clarity and > performance. Minimizing trade-offs is my goal right now. Performance as well as clarify as well as arch independence really demand that these small functions be given up. > > Of course I am skimming by some additional detail like the mask. But this > > would cut down significantly on the code piece to be maintained. From what > > I can see my approach saves a lot of duplicated code. The duplication that > > may exist comes about because of architecture specific time access > > optimizations. > > Well, I realize this is just a simple sketch, but how do you then handle > cpu freq changes? Suddenly generic code has to deal with that as well? > And then there's those buggy time sources that have to be read multiple > times, or drifty ITCs that (may or maynot) need cmpxchgs to ensure they > do not go backwards. Making the generic code have to deal with all of > these cases will be ugly (and may even hurt performance). The reading from memory only works for well behaved time sources. If something unusual comes along then a function needs to be called and thus a performance hit is taken. If its important enough (like the drifty logic on IA64) then one may consider putting that logic into the generic code. I did not not say not to have functions at all. I think a function call must be possible at least to retrieve the timer value for special cases. If the CPU freq cases is to be handled by the generic code or a special function is a design decision that you probably have to make. Over time other logic will develop that may first show up as new time retrieval function but that is then recognized to be of universal importance so that it will then be moved into the generic code. > Clearly, you've said that function calls should be available in those > cases, so I don't want to miss-characterize you comments. However I do > want to point it out so that others keep in mind that you can't always > just give a pointer and a freq value. Exactly. ^ permalink raw reply [flat|nested] 84+ messages in thread
* Re: [RFC][PATCH] new timeofday core subsystem (v.A0) 2004-09-03 22:04 ` Christoph Lameter @ 2004-09-03 23:00 ` john stultz 2004-09-04 0:11 ` Christoph Lameter 0 siblings, 1 reply; 84+ messages in thread From: john stultz @ 2004-09-03 23:00 UTC (permalink / raw) To: Christoph Lameter Cc: lkml, tim, george anzinger, albert, Ulrich.Windl, Len Brown, linux, David Mosberger, Andi Kleen, paulus, schwidefsky, jimix, keith maanthey, greg kh, Patricia Gaughen, Chris McDermott On Fri, 2004-09-03 at 15:04, Christoph Lameter wrote: > On Fri, 3 Sep 2004, john stultz wrote: > > > The raw values in user space can be used for specialized purposes > > > (somewhat like a generalized form of HPET) by applicationbs. This is not > > > intended for real system time. > > > > I worry these specialized interfaces would become troubling to maintain. > > The kernel (via regular syscall, fsyscall, or vsyscall) should provide > > accurate time and should be the final authority for that. > > > > If specialized applications want to read the cycle counter (TSC, ITC), > > /dev/HPET or /dev/mem or whatever, and try to infer time that's fine but > > its at their own risk. When those specialized assumptions break (TSC > > being fixed freq, for example) its the applications' problem. The kernel > > shouldn't codify or encourage those sorts of risky interfaces by > > exporting frequency information in a standardized interface. > > How can a timer have a variable frequency and being used for > timing purposes. Maybe I just do not understand the point on TSC having a > fixed frequency (I certainly hope so but is that true in the power save > modes?). TSC frequency changes are a reality i386 has been living with for awhile now. In some cases, there isn't anything else that is performant enough to use as a timesource, so people want to be able to generate time as accurately as possible from a variable frequency counter. Me, I think its somewhat crazy, but I've got to make it work. > True it is the applications problem if it breaks but we have applications > for SGI systems that demand access to a time source with the least > operating system interference possible. Yep, totally fine with that. If application developers will take the risk that's fine. Personally, I'd like the kernel's timeofday function to be fast enough that developers don't feel the need to attempt these special case hacks, but there will always be folks who need to live on the edge. > > > > cyc2ns(): In this conversion we can optimize the math depending on the > > > > timesource. If the timesource freq is a power of 2, we can just use > > > > shift! However if its a weird value and we have to be very precise, we > > > > do a full 64bit divide. We're not stuck with one equation given a freq > > > > value. > > > > > > I have never seeen a timesource freq with the power of 2. Division is not > > > necessary since one can realize this by multiplying with a certain factor > > > and then shifting the result right instead of dividing. > > > > Well, it was a theoretical example, but I agree with your point. The > > math tricks you suggest are usable depending on the frequency of the > > timesource. Attaching that sort of logic to the timesource > > implementation gives us more flexibility. It could be that the > > flexibility is unnecessary, and if so I'm fine w/ changing the design, > > but I'm not confident of that just yet. > > s/depending on/independent of/ ??? > > Are there any examples where the flexibility is needed? I'm a fan of Barbie's law: "Math is hard, let's go shopping!", so forgive my inability to immediately see through the details. In dealing with both high and low res timesources (some with microsecond frequencies, some run in picosecond frequencies), the math can be tuned for each case. Low res time sources are mainly a multiply, and possibly a divide only if it has a non-integer nanosecond per cycle value. For high-res timesources, we have to divide, so we would want to aproximate the frequency and use the multiply and shift as you suggested earlier. Maybe this is completely able to be generalized and I'm just not sharp enough to see it just yet, but it seems to me that having a timesource specific cyc2ns allows for further optimization then without. > > > > delta(): Some counters don't fill 32 or 64 bits. ACPI PM time source is > > > > 24 bits, and the cyclone is 40. Thus to do proper twos complement > > > > subtraction without overflow worries you need to mask the subtraction. > > > > This can be done by exporting a mask value w/ the freq value, but was > > > > cleaner when moved into the timesource. > > > > > > I would suggest it is better to add this mask and avoid another call to > > > a tiny function that does simply mask and subtract. Note that compilers > > > are more efficient if they get a sufficiently large chunk of code. This is > > > in particular necessary on IA64 and other processors given the inherent > > > parallelism in their internal CPU. A function call is typically much > > > slower than a subtract and and operation. > > > > Yep. As I said before, you've got very good points. The delta() function > > is the one I'm least attached to, but for now I'm going to hang on to it > > for readability reasons. It will be simple to change to a mask value > > later after everyone has reviewed the code and is confident of its > > correctness. > > Are you sure about that readability argument? A single > statement to calculate the delta is much more readable and verifyable > to be correct than a gazillion of small functions that (one hopes and > prays) all do the same correctly. Your point is valid, and in moving the arch specific code into arch independent code, I'm largely trying to reduce the tangle of functions. Having 3 hardware specific functions isn't all that crazy. > > > > read(): Rather then just giving the address of the register, the read > > > > call allows for timesource specific logic. This lets us use jiffies as a > > > > timesource, or in cases like the ACPI PM timesource, where the register > > > > must be read 3 times in order to ensure a correct value is latched, we > > > > can avoid having to include that logic into the generic code, so it does > > > > not affect systems that do not use or have that timesource. > > > > > > I think it is essential to have the capability to use a function. But > > > nevertheless it is quite inefficient to have yet another small function > > > that simply reads a value from a memory location. What I proposed is to be > > > able to specify a memory location in the time source structure. That way > > > this function call can be avoided for most timer source. Any specialized > > > time source will be able to use a function call but will then not be as > > > fast as the time sources that can simply setup a memory address. > > > > I really do like the time_interpolator changes you've submitted in > > earlier threads. The core generalizing of timesources changes are very > > similar between our two implementations, so your comments are very > > important to me. I am open to adding these direct memory accessors to > > the time counters, but again its a performance optimization, and right > > now clarity is king. > > Clarity is gained by centralizing these things. The design results in > numerous functions that are defined and which all mostly do the same. For > clarities sake please change the spec.... I disagree, but that doesn't mean I'm right. If more people complain as the patch develops, I'll rectify the code so it is less confusing. There's no reason for such a hard line at this point. The code is in no way close to finished, or entering the kernel, so give me a chance to roll with it for a bit. I'm just wary of premature optimization traps. Worse case if we cannot come to some agreement on this, there's nothing stopping architectures from keeping their own timeofday subsystem. I'm just tired of implementing the same feature or fixing the same bug over and over and over for each arch. thanks -john ^ permalink raw reply [flat|nested] 84+ messages in thread
* Re: [RFC][PATCH] new timeofday core subsystem (v.A0) 2004-09-03 23:00 ` john stultz @ 2004-09-04 0:11 ` Christoph Lameter 0 siblings, 0 replies; 84+ messages in thread From: Christoph Lameter @ 2004-09-04 0:11 UTC (permalink / raw) To: john stultz Cc: lkml, tim, george anzinger, albert, Ulrich.Windl, Len Brown, linux, David Mosberger, Andi Kleen, paulus, schwidefsky, jimix, keith maanthey, greg kh, Patricia Gaughen, Chris McDermott On Fri, 3 Sep 2004, john stultz wrote: > > True it is the applications problem if it breaks but we have applications > > for SGI systems that demand access to a time source with the least > > operating system interference possible. > > Yep, totally fine with that. If application developers will take the > risk that's fine. Personally, I'd like the kernel's timeofday function > to be fast enough that developers don't feel the need to attempt these > special case hacks, but there will always be folks who need to live on > the edge. Yeah. I tuned the gettimeofday function on IA64 to be faster than the glibc's use of the ITC but there are still some voices that demand a memory mapped timer. > > Are there any examples where the flexibility is needed? > > I'm a fan of Barbie's law: "Math is hard, let's go shopping!", so > forgive my inability to immediately see through the details. In dealing > with both high and low res timesources (some with microsecond > frequencies, some run in picosecond frequencies), the math can be tuned > for each case. Low res time sources are mainly a multiply, and possibly > a divide only if it has a non-integer nanosecond per cycle value. For > high-res timesources, we have to divide, so we would want to aproximate > the frequency and use the multiply and shift as you suggested earlier. A division can always be avoided. This is the equation nanoseconds = x / y * timer_value where x/y = scaling factor and one can always set Y = 2^py to substitute a shift instead of doing division. Just scale x also appropriately by 2^px. This works to an arbitrary accuracy if one has a long enough integer type. That is why I suggested at least the intermediate use of 128bit. 64bit is fine for the current cases but I would expect a need for 128bit to arise in the future. Math is sometimes simple and may simplify a lot of things. > Maybe this is completely able to be generalized and I'm just not sharp > enough to see it just yet, but it seems to me that having a timesource > specific cyc2ns allows for further optimization then without. I have not encountered a case that could not be handled by the above mentioned transformation. Mathematically I would think I could say it is impossible to find such a case. > > Are you sure about that readability argument? A single > > statement to calculate the delta is much more readable and verifyable > > to be correct than a gazillion of small functions that (one hopes and > > prays) all do the same correctly. > > Your point is valid, and in moving the arch specific code into arch > independent code, I'm largely trying to reduce the tangle of functions. > Having 3 hardware specific functions isn't all that crazy. I would rather have only one (the read timer function) and that function would be optional for non-well-behaved timer sources. > Worse case if we cannot come to some agreement on this, there's nothing > stopping architectures from keeping their own timeofday subsystem. I'm > just tired of implementing the same feature or fixing the same bug over > and over and over for each arch. These are all good reasons for centralizing the functionality instead of dispersing it in many functions. ^ permalink raw reply [flat|nested] 84+ messages in thread
* Re: [RFC][PATCH] new timeofday core subsystem (v.A0) 2004-09-02 21:09 ` [RFC][PATCH] new timeofday core subsystem (v.A0) john stultz 2004-09-02 21:11 ` [RFC][PATCH] new timeofday i386 hooks (v.A0) john stultz 2004-09-02 22:19 ` [RFC][PATCH] new timeofday core subsystem (v.A0) Christoph Lameter @ 2004-09-03 1:39 ` George Anzinger 2004-09-03 1:58 ` john stultz 2004-09-03 6:42 ` Albert Cahalan 2 siblings, 2 replies; 84+ messages in thread From: George Anzinger @ 2004-09-03 1:39 UTC (permalink / raw) To: john stultz Cc: lkml, tim, albert, Ulrich.Windl, clameter, Len Brown, linux, David Mosberger, Andi Kleen, paulus, schwidefsky, jimix, keith maanthey, greg kh, Patricia Gaughen, Chris McDermott john stultz wrote: > All, > This patch implements the architecture independent portion of the time > of day subsystem. Included is timeofday.c (which includes all the time > of day management and accessor functions), ntp.c (which includes the ntp > scaling code, leap second processing, and ntp kernel state machine > code), interface definition .h files, the example jiffies timesource > (lowest common denominator time source, mainly for use as example code) > and minimal hooks into arch independent code. > > The patch does not function without minimal architecture specific hooks > (i386 example to follow), and it can be applied to a tree without > affecting the code. > > I look forward to your comments and feedback. > > thanks > -john ~ > +static cycle_t jiffies_read(void) > +{ > + cycle_t ret = get_jiffies_64(); > + return ret; > +} > + > +static cycle_t jiffies_delta(cycle_t now, cycle_t then) > +{ > + /* simple subtraction, no need to mask */ > + return now - then; > +} This should be inline... > + > +static nsec_t jiffies_cyc2ns(cycle_t cyc, cycle_t* remainder) > +{ > + > + cyc *= NSEC_PER_SEC/HZ; Hm... This assumes that 1/HZ is what is needed here. Today this value is 999898. Not exactly reachable by NSEC_PER_SEC/HZ. Or did I miss something, like the relationship of jiffie to 1/HZ and to real time. > + ~ > +int ntp_leapsecond(struct timespec now) > +{ > + /* > + * Leap second processing. If in leap-insert state at > + * the end of the day, the system clock is set back one > + * second; if in leap-delete state, the system clock is > + * set ahead one second. The microtime() routine or > + * external clock driver will insure that reported time > + * is always monotonic. The ugly divides should be > + * replaced. ?? Is this really to be hidden? I rather though this was just the same as a user driven clock setting. In any case, it is a slew of the wall clock and should be presented to abs timer code so that the abs timers can be corrected. This means a call to "clock_was_set()" AND an update of the mono_to_wall value. > + */ > + static time_t leaptime = 0; > + > + switch (ntp_state) { > + case TIME_OK: > + if (ntp_status & STA_INS) { > + ntp_state = TIME_INS; > + /* calculate end of today (23:59:59)*/ > + leaptime = now.tv_sec + SEC_PER_DAY - (now.tv_sec % SEC_PER_DAY) - 1; > + } > + else if (ntp_status & STA_DEL) { > + ntp_state = TIME_DEL; > + /* calculate end of today (23:59:59)*/ > + leaptime = now.tv_sec + SEC_PER_DAY - (now.tv_sec % SEC_PER_DAY) - 1; > + } > + break; > + > + case TIME_INS: > + /* Once we are at (or past) leaptime, insert the second */ > + if (now.tv_sec > leaptime) { > + ntp_state = TIME_OOP; > + printk(KERN_NOTICE "Clock: inserting leap second 23:59:60 UTC\n"); > + > + return -1; > + } > + break; > + > + case TIME_DEL: > + /* Once we are at (or past) leaptime, delete the second */ > + if (now.tv_sec >= leaptime) { > + ntp_state = TIME_WAIT; > + printk(KERN_NOTICE "Clock: deleting leap second 23:59:59 UTC\n"); > + > + return 1; > + } ~ > +/* do_gettimeofday(): > + * Returns the time of day > + */ > +void do_gettimeofday(struct timeval *tv) > +{ > + nsec_t wall, sys; > + unsigned long seq; > + > + /* atomically read wall and sys time */ > + do { > + seq = read_seqbegin(&system_time_lock); > + > + wall = wall_time_offset; > + sys = __monotonic_clock(); > + > + } while (read_seqretry(&system_time_lock, seq)); > + > + /* add them and convert to timeval */ > + *tv = ns2timeval(wall+sys); > +} I am not sure you don't want to seperate the locking from the clock read. This so one lock can be put around larger bits of code. For example, in posix-times.c we need to get all three clocks under the same lock (that being monotonic, wall_time_offset, and jiffies (and possibly as sub jiffie value)). Something like: void get_tod_parts(nsec_t *wall, nsec_t *mon) { *wall = wall_time_offset; *mon = __monotonic_clock(); } This could then be used in a larger function with out the double locking. > + > + > +/* do_settimeofday(): > + * Sets the time of day > + */ > +int do_settimeofday(struct timespec *tv) > +{ > + /* convert timespec to ns */ > + nsec_t newtime = timespec2ns(tv); > + > + /* atomically adjust wall_time_offset to the desired value */ > + write_seqlock_irq(&system_time_lock); > + > + wall_time_offset = newtime - __monotonic_clock(); > + > + /* clear NTP settings */ > + ntp_clear(); > + > + write_sequnlock_irq(&system_time_lock); Also need a clock_was_set() call here. > + ~ -- George Anzinger george@mvista.com High-res-timers: http://sourceforge.net/projects/high-res-timers/ Preemption patch: http://www.kernel.org/pub/linux/kernel/people/rml ^ permalink raw reply [flat|nested] 84+ messages in thread
* Re: [RFC][PATCH] new timeofday core subsystem (v.A0) 2004-09-03 1:39 ` George Anzinger @ 2004-09-03 1:58 ` john stultz 2004-09-03 6:42 ` Albert Cahalan 1 sibling, 0 replies; 84+ messages in thread From: john stultz @ 2004-09-03 1:58 UTC (permalink / raw) To: george anzinger Cc: lkml, tim, albert, Ulrich.Windl, clameter, Len Brown, linux, David Mosberger, Andi Kleen, paulus, schwidefsky, jimix, keith maanthey, greg kh, Patricia Gaughen, Chris McDermott On Thu, 2004-09-02 at 18:39, George Anzinger wrote: > john stultz wrote: > > +static cycle_t jiffies_read(void) > > +{ > > + cycle_t ret = get_jiffies_64(); > > + return ret; > > +} > > + > > +static cycle_t jiffies_delta(cycle_t now, cycle_t then) > > +{ > > + /* simple subtraction, no need to mask */ > > + return now - then; > > +} > > This should be inline... Unfortunately they cannot be, as they are used as function pointers through the struct timesource interface. > > + > > +static nsec_t jiffies_cyc2ns(cycle_t cyc, cycle_t* remainder) > > +{ > > + > > + cyc *= NSEC_PER_SEC/HZ; > > Hm... This assumes that 1/HZ is what is needed here. Today this value is > 999898. Not exactly reachable by NSEC_PER_SEC/HZ. Or did I miss something, > like the relationship of jiffie to 1/HZ and to real time. You're right. You'd think with the recent discussions I would have already fixed that, but it slipped by. I'll fix it to use ACTHZ in the next release. Good catch. > > +int ntp_leapsecond(struct timespec now) > > +{ > > + /* > > + * Leap second processing. If in leap-insert state at > > + * the end of the day, the system clock is set back one > > + * second; if in leap-delete state, the system clock is > > + * set ahead one second. The microtime() routine or > > + * external clock driver will insure that reported time > > + * is always monotonic. The ugly divides should be > > + * replaced. > > ?? Is this really to be hidden? I rather though this was just the same as a > user driven clock setting. In any case, it is a slew of the wall clock and > should be presented to abs timer code so that the abs timers can be corrected. > This means a call to "clock_was_set()" AND an update of the mono_to_wall value. Hmm. Looking at that comment, it seems a bit outdated. I'll see if I can update it to be more clear. And yes, clock_was_set() is on my list. I realized I dropped it, but I wanted to make sure I was using it properly before I just threw it in. > > +void do_gettimeofday(struct timeval *tv) > > +{ > > + nsec_t wall, sys; > > + unsigned long seq; > > + > > + /* atomically read wall and sys time */ > > + do { > > + seq = read_seqbegin(&system_time_lock); > > + > > + wall = wall_time_offset; > > + sys = __monotonic_clock(); > > + > > + } while (read_seqretry(&system_time_lock, seq)); > > + > > + /* add them and convert to timeval */ > > + *tv = ns2timeval(wall+sys); > > +} > I am not sure you don't want to seperate the locking from the clock read. This > so one lock can be put around larger bits of code. For example, in > posix-times.c we need to get all three clocks under the same lock (that being > monotonic, wall_time_offset, and jiffies (and possibly as sub jiffie value)). Well, all the values in timeofday.c are protected by the system_time_lock, so I'm trying to minimize the locks. I do want to kill off xtime and thus the xtime lock, so jiffies will be by itself and will need a jiffies_lock of some sort. However part of this effort is to ween folks off of using jiffies as a time value, thus in the future it should only be used inside the timer subsystem as a interrupt counter. Thinking about it more, the NTP code could probably drop the ntp_lock and just use the system_time_lock, but I think I'll push that optimization off for a bit. > Something like: > void get_tod_parts(nsec_t *wall, nsec_t *mon) > { > *wall = wall_time_offset; > *mon = __monotonic_clock(); > } > > This could then be used in a larger function with out the double locking. I understood your point above, but I'm not sure I see double locking. __monotonic_clock() is lock free for exactly this reason. > > +int do_settimeofday(struct timespec *tv) > > +{ > > + /* convert timespec to ns */ > > + nsec_t newtime = timespec2ns(tv); > > + > > + /* atomically adjust wall_time_offset to the desired value */ > > + write_seqlock_irq(&system_time_lock); > > + > > + wall_time_offset = newtime - __monotonic_clock(); > > + > > + /* clear NTP settings */ > > + ntp_clear(); > > + > > + write_sequnlock_irq(&system_time_lock); > > Also need a clock_was_set() call here. Yep! Thanks for the feedback, George! -john ^ permalink raw reply [flat|nested] 84+ messages in thread
* Re: [RFC][PATCH] new timeofday core subsystem (v.A0) 2004-09-03 1:39 ` George Anzinger 2004-09-03 1:58 ` john stultz @ 2004-09-03 6:42 ` Albert Cahalan 2004-09-03 7:24 ` George Anzinger 2004-09-03 19:18 ` john stultz 1 sibling, 2 replies; 84+ messages in thread From: Albert Cahalan @ 2004-09-03 6:42 UTC (permalink / raw) To: george Cc: john stultz, lkml, tim, albert, Ulrich.Windl, clameter, Len Brown, linux, David Mosberger, Andi Kleen, paulus, schwidefsky, jimix, keith maanthey, greg kh, Patricia Gaughen, Chris McDermott On Thu, 2004-09-02 at 21:39, George Anzinger wrote: > john stultz wrote: > > + > > +static nsec_t jiffies_cyc2ns(cycle_t cyc, cycle_t* remainder) > > +{ > > + > > + cyc *= NSEC_PER_SEC/HZ; > > Hm... This assumes that 1/HZ is what is needed here. Today this value is > 999898. Not exactly reachable by NSEC_PER_SEC/HZ. Or did I miss something, > like the relationship of jiffie to 1/HZ and to real time. HZ not being HZ is the source of many foul problems. NTP should be able to correct for the error. For systems not running NTP, provide a fake NTP to make corrections based on the expected frequency error. Based on that, skip or double-up on the ticks to make them be exactly HZ over long periods of time. > > +int ntp_leapsecond(struct timespec now) > > +{ > > + /* > > + * Leap second processing. If in leap-insert state at > > + * the end of the day, the system clock is set back one > > + * second; if in leap-delete state, the system clock is > > + * set ahead one second. The microtime() routine or > > + * external clock driver will insure that reported time > > + * is always monotonic. The ugly divides should be > > + * replaced. Don't optimize until the patch is in and stable. The divides can be removed much later. Wait months, if not forever, before making the code less readable. The same goes for arch-specific non-syscall hacks. ^ permalink raw reply [flat|nested] 84+ messages in thread
* Re: [RFC][PATCH] new timeofday core subsystem (v.A0) 2004-09-03 6:42 ` Albert Cahalan @ 2004-09-03 7:24 ` George Anzinger 2004-09-03 19:27 ` john stultz 2004-09-03 19:18 ` john stultz 1 sibling, 1 reply; 84+ messages in thread From: George Anzinger @ 2004-09-03 7:24 UTC (permalink / raw) To: Albert Cahalan Cc: john stultz, lkml, tim, Ulrich.Windl, clameter, Len Brown, linux, David Mosberger, Andi Kleen, paulus, schwidefsky, jimix, keith maanthey, greg kh, Patricia Gaughen, Chris McDermott Albert Cahalan wrote: > On Thu, 2004-09-02 at 21:39, George Anzinger wrote: > >>john stultz wrote: > > >>>+ >>>+static nsec_t jiffies_cyc2ns(cycle_t cyc, cycle_t* remainder) >>>+{ >>>+ >>>+ cyc *= NSEC_PER_SEC/HZ; >> >>Hm... This assumes that 1/HZ is what is needed here. Today this value is >>999898. Not exactly reachable by NSEC_PER_SEC/HZ. Or did I miss something, >>like the relationship of jiffie to 1/HZ and to real time. > > > HZ not being HZ is the source of many foul problems. > > NTP should be able to correct for the error. For systems > not running NTP, provide a fake NTP to make corrections > based on the expected frequency error. > > Based on that, skip or double-up on the ticks to make > them be exactly HZ over long periods of time. There are several problems here. First, to make this possible, you will have to outlaw several values for HZ (1024 comes to mind). Second, like it or not, 1/HZ or something close to it, is the timer resolution. I think we need to try and keep this a power of 10, mostly because there are a lot of folks who just don't understand it if it is not. And third, you need to get real close to it with the hardware timer. If you introduce NTP or some such to fix things, well, things just break another place. For example, we started 2.6 with HZ=1000 and had problems like: > time sleep 10 sleeping for 9.9 seconds. This will just not do. Any corrections made to the wall clock really need to be made to the timer system as well. As it is we assume that, by correctly choosing the tick value, the wall clock will be correct on average, even under NTP. I.e. that the NTP correction will be, over time, very small. We really do want code that is much more accurate than "time sleep 10" to be right, i.e. if we sleep for x nanoseconds, the wall clock will have changed by x nanoseconds while we did so. > > ~ -- George Anzinger george@mvista.com High-res-timers: http://sourceforge.net/projects/high-res-timers/ Preemption patch: http://www.kernel.org/pub/linux/kernel/people/rml ^ permalink raw reply [flat|nested] 84+ messages in thread
* Re: [RFC][PATCH] new timeofday core subsystem (v.A0) 2004-09-03 7:24 ` George Anzinger @ 2004-09-03 19:27 ` john stultz 2004-09-03 22:10 ` George Anzinger 0 siblings, 1 reply; 84+ messages in thread From: john stultz @ 2004-09-03 19:27 UTC (permalink / raw) To: george anzinger Cc: Albert Cahalan, lkml, tim, Ulrich.Windl, clameter, Len Brown, linux, David Mosberger, Andi Kleen, paulus, schwidefsky, jimix, keith maanthey, greg kh, Patricia Gaughen, Chris McDermott On Fri, 2004-09-03 at 00:24, George Anzinger wrote: > Albert Cahalan wrote: > > On Thu, 2004-09-02 at 21:39, George Anzinger wrote: > > > >>john stultz wrote: > > > > > >>>+ > >>>+static nsec_t jiffies_cyc2ns(cycle_t cyc, cycle_t* remainder) > >>>+{ > >>>+ > >>>+ cyc *= NSEC_PER_SEC/HZ; > >> > >>Hm... This assumes that 1/HZ is what is needed here. Today this value is > >>999898. Not exactly reachable by NSEC_PER_SEC/HZ. Or did I miss something, > >>like the relationship of jiffie to 1/HZ and to real time. > > > > > > HZ not being HZ is the source of many foul problems. > > > > NTP should be able to correct for the error. For systems > > not running NTP, provide a fake NTP to make corrections > > based on the expected frequency error. > > > > Based on that, skip or double-up on the ticks to make > > them be exactly HZ over long periods of time. > > There are several problems here. First, to make this possible, you will have to > outlaw several values for HZ (1024 comes to mind). Second, like it or not, 1/HZ > or something close to it, is the timer resolution. I think we need to try and > keep this a power of 10, mostly because there are a lot of folks who just don't > understand it if it is not. And third, you need to get real close to it with > the hardware timer. If you introduce NTP or some such to fix things, well, > things just break another place. For example, we started 2.6 with HZ=1000 and > had problems like: > > time sleep 10 > sleeping for 9.9 seconds. This will just not do. Any corrections made to the > wall clock really need to be made to the timer system as well. As it is we > assume that, by correctly choosing the tick value, the wall clock will be > correct on average, even under NTP. I.e. that the NTP correction will be, over > time, very small. We really do want code that is much more accurate than "time > sleep 10" to be right, i.e. if we sleep for x nanoseconds, the wall clock will > have changed by x nanoseconds while we did so. I feel trying to keep two notions of time, one in the timeofday code and one in the timer code is the real issue. Trying to better keep them synced will just lead to madness. Instead the timer subsystem needs to move to using monotonic_clock(), or get_lowres_timestamp() instead of using jiffies for its notion of time. Jiffies is just an interrupt counter. This somewhat ignores the fact that we're discussing a "jiffies timesource", which I only implemented as a lowest common denominator / if you really don't have anything better / thanks the academy for its' worst time source ever award/ simple example of the timesource code. More and more I think I need a t-shirt that reads "jiffies ain't time". thanks -john ^ permalink raw reply [flat|nested] 84+ messages in thread
* Re: [RFC][PATCH] new timeofday core subsystem (v.A0) 2004-09-03 19:27 ` john stultz @ 2004-09-03 22:10 ` George Anzinger 2004-09-03 23:32 ` john stultz 0 siblings, 1 reply; 84+ messages in thread From: George Anzinger @ 2004-09-03 22:10 UTC (permalink / raw) To: john stultz Cc: Albert Cahalan, lkml, tim, Ulrich.Windl, clameter, Len Brown, linux, David Mosberger, Andi Kleen, paulus, schwidefsky, jimix, keith maanthey, greg kh, Patricia Gaughen, Chris McDermott john stultz wrote: > On Fri, 2004-09-03 at 00:24, George Anzinger wrote: > >>Albert Cahalan wrote: >> >>>On Thu, 2004-09-02 at 21:39, George Anzinger wrote: >>> >>> >>>>john stultz wrote: >>> >>> >>>>>+ >>>>>+static nsec_t jiffies_cyc2ns(cycle_t cyc, cycle_t* remainder) >>>>>+{ >>>>>+ >>>>>+ cyc *= NSEC_PER_SEC/HZ; >>>> >>>>Hm... This assumes that 1/HZ is what is needed here. Today this value is >>>>999898. Not exactly reachable by NSEC_PER_SEC/HZ. Or did I miss something, >>>>like the relationship of jiffie to 1/HZ and to real time. >>> >>> >>>HZ not being HZ is the source of many foul problems. >>> >>>NTP should be able to correct for the error. For systems >>>not running NTP, provide a fake NTP to make corrections >>>based on the expected frequency error. >>> >>>Based on that, skip or double-up on the ticks to make >>>them be exactly HZ over long periods of time. >> >>There are several problems here. First, to make this possible, you will have to >>outlaw several values for HZ (1024 comes to mind). Second, like it or not, 1/HZ >>or something close to it, is the timer resolution. I think we need to try and >>keep this a power of 10, mostly because there are a lot of folks who just don't >>understand it if it is not. And third, you need to get real close to it with >>the hardware timer. If you introduce NTP or some such to fix things, well, >>things just break another place. For example, we started 2.6 with HZ=1000 and >>had problems like: >> > time sleep 10 >>sleeping for 9.9 seconds. This will just not do. Any corrections made to the >>wall clock really need to be made to the timer system as well. As it is we >>assume that, by correctly choosing the tick value, the wall clock will be >>correct on average, even under NTP. I.e. that the NTP correction will be, over >>time, very small. We really do want code that is much more accurate than "time >>sleep 10" to be right, i.e. if we sleep for x nanoseconds, the wall clock will >>have changed by x nanoseconds while we did so. > > > I feel trying to keep two notions of time, one in the timeofday code and > one in the timer code is the real issue. Trying to better keep them > synced will just lead to madness. Instead the timer subsystem needs to > move to using monotonic_clock(), or get_lowres_timestamp() instead of > using jiffies for its notion of time. Jiffies is just an interrupt > counter. Well, there may be a better way. Suppose we change all the accounting code to work with nanoseconds. That way when we do an accounting pass we would just add what has elapsed since the last pass. Still need some way to do user timers that are tightly pegged to the clock. A thought I had along these lines was to program a timer for each tick. The PIT is much too slow for this, but the APIC timers seem to be rather easy to program. I am not sure how fast the access is but it can not be anything like the PIT. Under this scheme we would use the monotonic clock to figure out just how far out the next tick should be and program that. This could be modified to use repeating hardware if it is available. (What does the HPET provide? Does it interrupt?) Since the APIC clock is not the reference clock (the PIT & pm timer clock is) we would have to correct these from time to time but that is rather easy (I do it today in HRT code). This brings up another issue. We know what the PIT clock frequency is but not what the TSC clock frequency. Currently we do a calibration run at boot to figure this but I can easily show that this run consistently gives the wrong answer (I am sure this has to do with the I/O access delays). If we are going to use the TSC (or any other "clock" that is not derived from the PITs 14.3181818MHZ ital) we need both a way to get the correct value AND a way to adjust it for drift over time. (Possibly this is the same thing.) Of course this is all just x86 stuff. Other archs will have their own issues. For what its worth, the current 2.6 HRT patch uses the PIT as the 1/HZ time interrupt. It uses the APIC, if present, for the high-res interrupt source. -- George Anzinger george@mvista.com High-res-timers: http://sourceforge.net/projects/high-res-timers/ Preemption patch: http://www.kernel.org/pub/linux/kernel/people/rml ^ permalink raw reply [flat|nested] 84+ messages in thread
* Re: [RFC][PATCH] new timeofday core subsystem (v.A0) 2004-09-03 22:10 ` George Anzinger @ 2004-09-03 23:32 ` john stultz 2004-09-04 0:02 ` George Anzinger 0 siblings, 1 reply; 84+ messages in thread From: john stultz @ 2004-09-03 23:32 UTC (permalink / raw) To: george anzinger Cc: Albert Cahalan, lkml, tim, Ulrich.Windl, clameter, Len Brown, linux, David Mosberger, Andi Kleen, paulus, schwidefsky, jimix, keith maanthey, greg kh, Patricia Gaughen, Chris McDermott Really quick: I'm off on vacation until Weds. Hopefully I've addressed in some way everyone's comments and my email box won't be stuffed when I return. I might peek in every once in awhile, though. Just one last response... On Fri, 2004-09-03 at 15:10, George Anzinger wrote: > john stultz wrote: > > I feel trying to keep two notions of time, one in the timeofday code and > > one in the timer code is the real issue. Trying to better keep them > > synced will just lead to madness. Instead the timer subsystem needs to > > move to using monotonic_clock(), or get_lowres_timestamp() instead of > > using jiffies for its notion of time. Jiffies is just an interrupt > > counter. > > Well, there may be a better way. Suppose we change all the accounting code to > work with nanoseconds. That way when we do an accounting pass we would just add > what has elapsed since the last pass. Yep, that'd work too. > Still need some way to do user timers that are tightly pegged to the clock. I see that as just a problem of programming the timer interrupt generator. If you have a nanosecond (or as high as the best timesource on your system provides) resolution notion of time, then there is nothing to peg or tie the timer with. You simply just program it to go off every X nanoseconds. Should it be so inaccurate that it does not go off right at X nanoseconds, then you've hit a limit of the hardware. Pick a more accurate interval length, dynamically change your interval, or live with it. If the soft-timers use monotonic-clock() to determine when they expire, then you won't get accumulating drift (as monotonic-clock()is NTP frequency adjusted), only the minor jitter caused latency caused by the interrupt programming. > A thought I had along these lines was to program a timer for each tick. The PIT > is much too slow for this, but the APIC timers seem to be rather easy to > program. I am not sure how fast the access is but it can not be anything like > the PIT. Under this scheme we would use the monotonic clock to figure out just > how far out the next tick should be and program that. Yep, tickless systems also start being possible. We just have interrupts for scheduled events. > This could be modified to use repeating hardware if it is available. (What does > the HPET provide? Does it interrupt?) Since the APIC clock is not the > reference clock (the PIT & pm timer clock is) we would have to correct these > from time to time but that is rather easy (I do it today in HRT code). Don't know the details of interrupt generation, so I can't tell ya. I'm not sure I followed the correction bit? > This brings up another issue. We know what the PIT clock frequency is but not > what the TSC clock frequency. Currently we do a calibration run at boot to > figure this but I can easily show that this run consistently gives the wrong > answer (I am sure this has to do with the I/O access delays). If we are going > to use the TSC (or any other "clock" that is not derived from the PITs > 14.3181818MHZ ital) we need both a way to get the correct value AND a way to > adjust it for drift over time. (Possibly this is the same thing.) Of course > this is all just x86 stuff. Other archs will have their own issues. Again, monotonic_clock() and friends are NTP adjusted, so drift caused by inaccurate calibration shouldn't be a problem the interval timer code should need to worry about (outside of maybe adjusting its interval time if its always arriving late/early). If possible the timesource calibration code should be improved, but that's icing on the cake and isn't critical. Again, thanks to everyone for the great feedback and discussion! -john ^ permalink raw reply [flat|nested] 84+ messages in thread
* Re: [RFC][PATCH] new timeofday core subsystem (v.A0) 2004-09-03 23:32 ` john stultz @ 2004-09-04 0:02 ` George Anzinger 2004-09-08 18:07 ` john stultz 0 siblings, 1 reply; 84+ messages in thread From: George Anzinger @ 2004-09-04 0:02 UTC (permalink / raw) To: john stultz Cc: Albert Cahalan, lkml, tim, Ulrich.Windl, clameter, Len Brown, linux, David Mosberger, Andi Kleen, paulus, schwidefsky, jimix, keith maanthey, greg kh, Patricia Gaughen, Chris McDermott john stultz wrote: > Really quick: I'm off on vacation until Weds. Hopefully I've addressed > in some way everyone's comments and my email box won't be stuffed when I > return. I might peek in every once in awhile, though. > > Just one last response... > > On Fri, 2004-09-03 at 15:10, George Anzinger wrote: > >>john stultz wrote: >> >>>I feel trying to keep two notions of time, one in the timeofday code and >>>one in the timer code is the real issue. Trying to better keep them >>>synced will just lead to madness. Instead the timer subsystem needs to >>>move to using monotonic_clock(), or get_lowres_timestamp() instead of >>>using jiffies for its notion of time. Jiffies is just an interrupt >>>counter. >> >>Well, there may be a better way. Suppose we change all the accounting code to >>work with nanoseconds. That way when we do an accounting pass we would just add >>what has elapsed since the last pass. > > > Yep, that'd work too. > > >>Still need some way to do user timers that are tightly pegged to the clock. > > > I see that as just a problem of programming the timer interrupt > generator. If you have a nanosecond (or as high as the best timesource > on your system provides) resolution notion of time, then there is > nothing to peg or tie the timer with. You simply just program it to go > off every X nanoseconds. Should it be so inaccurate that it does not go > off right at X nanoseconds, then you've hit a limit of the hardware. > Pick a more accurate interval length, dynamically change your interval, > or live with it. If the soft-timers use monotonic-clock() to determine > when they expire, then you won't get accumulating drift (as > monotonic-clock()is NTP frequency adjusted), only the minor jitter > caused latency caused by the interrupt programming. > > >>A thought I had along these lines was to program a timer for each tick. The PIT >>is much too slow for this, but the APIC timers seem to be rather easy to >>program. I am not sure how fast the access is but it can not be anything like >>the PIT. Under this scheme we would use the monotonic clock to figure out just >>how far out the next tick should be and program that. > > > Yep, tickless systems also start being possible. We just have interrupts > for scheduled events. > > >>This could be modified to use repeating hardware if it is available. (What does >>the HPET provide? Does it interrupt?) Since the APIC clock is not the >>reference clock (the PIT & pm timer clock is) we would have to correct these >>from time to time but that is rather easy (I do it today in HRT code). > > > Don't know the details of interrupt generation, so I can't tell ya. I'm > not sure I followed the correction bit? > > >>This brings up another issue. We know what the PIT clock frequency is but not >>what the TSC clock frequency. Currently we do a calibration run at boot to >>figure this but I can easily show that this run consistently gives the wrong >>answer (I am sure this has to do with the I/O access delays). If we are going >>to use the TSC (or any other "clock" that is not derived from the PITs >>14.3181818MHZ ital) we need both a way to get the correct value AND a way to >>adjust it for drift over time. (Possibly this is the same thing.) Of course >>this is all just x86 stuff. Other archs will have their own issues. > > > Again, monotonic_clock() and friends are NTP adjusted, so drift caused > by inaccurate calibration shouldn't be a problem the interval timer code > should need to worry about (outside of maybe adjusting its interval time > if its always arriving late/early). If possible the timesource > calibration code should be improved, but that's icing on the cake and > isn't critical. > Are you providing a way to predict what clock count provide a given time offset INCLUDING ntp? If so, cool. If not we need to get this conversion right. We will go into this more on your return. Have fun. > Again, thanks to everyone for the great feedback and discussion! > -john > -- George Anzinger george@mvista.com High-res-timers: http://sourceforge.net/projects/high-res-timers/ Preemption patch: http://www.kernel.org/pub/linux/kernel/people/rml ^ permalink raw reply [flat|nested] 84+ messages in thread
* Re: [RFC][PATCH] new timeofday core subsystem (v.A0) 2004-09-04 0:02 ` George Anzinger @ 2004-09-08 18:07 ` john stultz 2004-09-09 0:08 ` George Anzinger 0 siblings, 1 reply; 84+ messages in thread From: john stultz @ 2004-09-08 18:07 UTC (permalink / raw) To: george anzinger Cc: Albert Cahalan, lkml, tim, Ulrich.Windl, clameter, Len Brown, linux, David Mosberger, Andi Kleen, paulus, schwidefsky, jimix, keith maanthey, greg kh, Patricia Gaughen, Chris McDermott On Fri, 2004-09-03 at 17:02, George Anzinger wrote: > > Again, monotonic_clock() and friends are NTP adjusted, so drift caused > > by inaccurate calibration shouldn't be a problem the interval timer code > > should need to worry about (outside of maybe adjusting its interval time > > if its always arriving late/early). If possible the timesource > > calibration code should be improved, but that's icing on the cake and > > isn't critical. > > > Are you providing a way to predict what clock count provide a given time offset > INCLUDING ntp? If so, cool. If not we need to get this conversion right. We > will go into this more on your return. Sorry, I'm not sure what you mean. Mind expanding on the idea while my brain warms back up? thanks -john ^ permalink raw reply [flat|nested] 84+ messages in thread
* Re: [RFC][PATCH] new timeofday core subsystem (v.A0) 2004-09-08 18:07 ` john stultz @ 2004-09-09 0:08 ` George Anzinger 2004-09-09 0:51 ` john stultz 0 siblings, 1 reply; 84+ messages in thread From: George Anzinger @ 2004-09-09 0:08 UTC (permalink / raw) To: john stultz Cc: Albert Cahalan, lkml, tim, Ulrich.Windl, clameter, Len Brown, linux, David Mosberger, Andi Kleen, paulus, schwidefsky, jimix, keith maanthey, greg kh, Patricia Gaughen, Chris McDermott john stultz wrote: > On Fri, 2004-09-03 at 17:02, George Anzinger wrote: > >>>Again, monotonic_clock() and friends are NTP adjusted, so drift caused >>>by inaccurate calibration shouldn't be a problem the interval timer code >>>should need to worry about (outside of maybe adjusting its interval time >>>if its always arriving late/early). If possible the timesource >>>calibration code should be improved, but that's icing on the cake and >>>isn't critical. >>> >> >>Are you providing a way to predict what clock count provide a given time offset >>INCLUDING ntp? If so, cool. If not we need to get this conversion right. We >>will go into this more on your return. > > > Sorry, I'm not sure what you mean. Mind expanding on the idea while my > brain warms back up? The issue is this: A user wants a timer to fire at exactly time B which is several hours later than now (time A). We need to know how to measure this time with the timer resources (not the clock as you are talking about it). Currently we do something like delta_jiffies = timespec_to_jiffies(B - A) and set up a jiffies timer to expire in delta_jiffies. At this time we "assume" in timespec_to_jiffies() that we _know_ how long a jiffie is in terms of wall clock nanoseconds. We also assume (possibly incorrectly) that this number is "good enough" even with ntp messing things up. I think this means that we assume that, on the average, we have the right conversion and that any drift will be a) small and b) on the average 0 (or real close to it). To my mind this is a few too many assumptions, but this is what we do today. If you are proposing that ntp corrections take care of systemic constant drift (which by the way, linux use to do WRT HZ=1024 division errors), we will need a way of working this back into the timesepec_to_jiffies() code or we will not be able to do timers with any real accuracy. An alternative, and one that I prefer, but, I think, runs counter to your proposal, is to derive the timer interrupts from the clock after the ntp correction. Some hardware will have a problem with this (x86) but other hardware is rather constrained to do things this way. The PPC counter interrupt generator, for example, counts the same pulses as the clock counter. > -- George Anzinger george@mvista.com High-res-timers: http://sourceforge.net/projects/high-res-timers/ Preemption patch: http://www.kernel.org/pub/linux/kernel/people/rml ^ permalink raw reply [flat|nested] 84+ messages in thread
* Re: [RFC][PATCH] new timeofday core subsystem (v.A0) 2004-09-09 0:08 ` George Anzinger @ 2004-09-09 0:51 ` john stultz 2004-09-09 3:14 ` Christoph Lameter 0 siblings, 1 reply; 84+ messages in thread From: john stultz @ 2004-09-09 0:51 UTC (permalink / raw) To: george anzinger Cc: Albert Cahalan, lkml, tim, Ulrich.Windl, clameter, Len Brown, linux, David Mosberger, Andi Kleen, paulus, schwidefsky, jimix, keith maanthey, greg kh, Patricia Gaughen, Chris McDermott On Wed, 2004-09-08 at 17:08, George Anzinger wrote: > john stultz wrote: > > On Fri, 2004-09-03 at 17:02, George Anzinger wrote: > > > >>>Again, monotonic_clock() and friends are NTP adjusted, so drift caused > >>>by inaccurate calibration shouldn't be a problem the interval timer code > >>>should need to worry about (outside of maybe adjusting its interval time > >>>if its always arriving late/early). If possible the timesource > >>>calibration code should be improved, but that's icing on the cake and > >>>isn't critical. > >>> > >> > >>Are you providing a way to predict what clock count provide a given time offset > >>INCLUDING ntp? If so, cool. If not we need to get this conversion right. We > >>will go into this more on your return. > > > > > > Sorry, I'm not sure what you mean. Mind expanding on the idea while my > > brain warms back up? > > The issue is this: A user wants a timer to fire at exactly time B which is > several hours later than now (time A). We need to know how to measure this time > with the timer resources (not the clock as you are talking about it). Currently > we do something like delta_jiffies = timespec_to_jiffies(B - A) and set up a > jiffies timer to expire in delta_jiffies. At this time we "assume" in > timespec_to_jiffies() that we _know_ how long a jiffie is in terms of wall clock > nanoseconds. We also assume (possibly incorrectly) that this number is "good > enough" even with ntp messing things up. I think this means that we assume > that, on the average, we have the right conversion and that any drift will be a) > small and b) on the average 0 (or real close to it). Why must we use jiffies to tell when a timer expires? Honestly I'd like to see xtime and jiffies both disappear, but I'm not very familiar w/ the soft-timer code, so forgive me if I'm misunderstanding. So instead of calculating delta_jiffies, just mark the timer to expire at B. Then each interrupt, you use get_fast_timestamp() to decide if now is greater then B. If so, expire it. Then we can look at being able to program timer interrupts to occur as close as possible to the next soft-timer's expiration time. -john ^ permalink raw reply [flat|nested] 84+ messages in thread
* Re: [RFC][PATCH] new timeofday core subsystem (v.A0) 2004-09-09 0:51 ` john stultz @ 2004-09-09 3:14 ` Christoph Lameter 2004-09-09 3:32 ` john stultz 0 siblings, 1 reply; 84+ messages in thread From: Christoph Lameter @ 2004-09-09 3:14 UTC (permalink / raw) To: john stultz Cc: george anzinger, Albert Cahalan, lkml, tim, Ulrich.Windl, Len Brown, linux, David Mosberger, Andi Kleen, paulus, schwidefsky, jimix, keith maanthey, greg kh, Patricia Gaughen, Chris McDermott On Wed, 8 Sep 2004, john stultz wrote: > Why must we use jiffies to tell when a timer expires? Honestly I'd like > to see xtime and jiffies both disappear, but I'm not very familiar w/ > the soft-timer code, so forgive me if I'm misunderstanding. > > So instead of calculating delta_jiffies, just mark the timer to expire > at B. Then each interrupt, you use get_fast_timestamp() to decide if now > is greater then B. If so, expire it. > > Then we can look at being able to program timer interrupts to occur as > close as possible to the next soft-timer's expiration time. Would it not be best to have some means to determine the time in nanoseconds since the epoch and then use that for long waits? That cuts out the timer dependencies. Could we have a generic interface for kernel time that simply provides nanoseconds for everything and hides all the details of time scaling? Maybe as simple as u64 now(void); ? One can then calculate the wait time in nanoseconds which may then be passed to another timer routine which may take the appropriate action depending on the time frame involved. I.e. for a few hundred nsecs do busy wait. If longer reschedule and if even longer queue the task on some event queue that is handled by the timer tick or something else. ^ permalink raw reply [flat|nested] 84+ messages in thread
* Re: [RFC][PATCH] new timeofday core subsystem (v.A0) 2004-09-09 3:14 ` Christoph Lameter @ 2004-09-09 3:32 ` john stultz 2004-09-09 4:31 ` George Anzinger 0 siblings, 1 reply; 84+ messages in thread From: john stultz @ 2004-09-09 3:32 UTC (permalink / raw) To: Christoph Lameter Cc: george anzinger, Albert Cahalan, lkml, tim, Ulrich.Windl, Len Brown, linux, David Mosberger, Andi Kleen, paulus, schwidefsky, jimix, keith maanthey, greg kh, Patricia Gaughen, Chris McDermott On Wed, 2004-09-08 at 20:14, Christoph Lameter wrote: > On Wed, 8 Sep 2004, john stultz wrote: > > > Why must we use jiffies to tell when a timer expires? Honestly I'd like > > to see xtime and jiffies both disappear, but I'm not very familiar w/ > > the soft-timer code, so forgive me if I'm misunderstanding. > > > > So instead of calculating delta_jiffies, just mark the timer to expire > > at B. Then each interrupt, you use get_fast_timestamp() to decide if now > > is greater then B. If so, expire it. > > > > Then we can look at being able to program timer interrupts to occur as > > close as possible to the next soft-timer's expiration time. > > Would it not be best to have some means to determine the time in > nanoseconds since the epoch and then use that for long waits? The proposal has get_lowres_timeofday() which does just that, although for timer stuff, I would guess monotonic_clock() or get_lowres_timestamp(), which returns the number of (ntp adjusted) nanoseconds the system has been running, would be better. > One can then calculate the wait time in nanoseconds which may then be > passed to another timer routine which may take the appropriate action > depending on the time frame involved. I.e. for a few hundred nsecs do busy > wait. If longer reschedule and if even longer queue the task on some > event queue that is handled by the timer tick or something else. I'm not sure about the busy wait bit, but yes, at some point I'd like to see the timer subsystem use the timeofday subsystem instead of jiffies for its timekeeping. thanks -john ^ permalink raw reply [flat|nested] 84+ messages in thread
* Re: [RFC][PATCH] new timeofday core subsystem (v.A0) 2004-09-09 3:32 ` john stultz @ 2004-09-09 4:31 ` George Anzinger 2004-09-09 6:37 ` Jesse Barnes 2004-09-09 19:07 ` john stultz 0 siblings, 2 replies; 84+ messages in thread From: George Anzinger @ 2004-09-09 4:31 UTC (permalink / raw) To: john stultz Cc: Christoph Lameter, Albert Cahalan, lkml, tim, Ulrich.Windl, Len Brown, linux, David Mosberger, Andi Kleen, paulus, schwidefsky, jimix, keith maanthey, greg kh, Patricia Gaughen, Chris McDermott john stultz wrote: > On Wed, 2004-09-08 at 20:14, Christoph Lameter wrote: > >>On Wed, 8 Sep 2004, john stultz wrote: >> >> >>>Why must we use jiffies to tell when a timer expires? Honestly I'd like >>>to see xtime and jiffies both disappear, but I'm not very familiar w/ >>>the soft-timer code, so forgive me if I'm misunderstanding. >>> >>>So instead of calculating delta_jiffies, just mark the timer to expire >>>at B. Then each interrupt, you use get_fast_timestamp() to decide if now >>>is greater then B. If so, expire it. >>> >>>Then we can look at being able to program timer interrupts to occur as >>>close as possible to the next soft-timer's expiration time. >> >>Would it not be best to have some means to determine the time in >>nanoseconds since the epoch and then use that for long waits? > > > The proposal has get_lowres_timeofday() which does just that, although > for timer stuff, I would guess monotonic_clock() or > get_lowres_timestamp(), which returns the number of (ntp adjusted) > nanoseconds the system has been running, would be better. > > > >>One can then calculate the wait time in nanoseconds which may then be >>passed to another timer routine which may take the appropriate action >>depending on the time frame involved. I.e. for a few hundred nsecs do busy >>wait. If longer reschedule and if even longer queue the task on some >>event queue that is handled by the timer tick or something else. > > > I'm not sure about the busy wait bit, but yes, at some point I'd like to > see the timer subsystem use the timeofday subsystem instead of jiffies > for its timekeeping. > Yes, I think this is the way we want to go. Here are the "rubs" I see: a.) resolution. If you don't put a limit on this you will invite timer storms. Currently, by useing 1/HZ resolution, all timer "line up" on ticks and reduce the interrupt overhead that would occure if we actually tried to give "exactly" what was asked for. This is a matter of math and can be handled (assuming we resist the urge to go shopping :)) b.) For those platforms with repeating timers that can not "hit" our desired resolution (i.e. 1/HZ) there is an additional overhead to program the timer each interrupt. In principle we do this in the HRT patch, but there we only do it for high resolution timers, which we assume are rather rare. It is good to have a low res timer that is also accurate. Even better if we can keep the overhead low by not having to reprogram a timer each tick. In the ideal world we would have a hardware repeating timer that is reasonably accurate (we might want to correct it every second or so) to generate the low res timing interrupts and a high res timer that we can program quickly for high resolution interrupts. -- George Anzinger george@mvista.com High-res-timers: http://sourceforge.net/projects/high-res-timers/ Preemption patch: http://www.kernel.org/pub/linux/kernel/people/rml ^ permalink raw reply [flat|nested] 84+ messages in thread
* Re: [RFC][PATCH] new timeofday core subsystem (v.A0) 2004-09-09 4:31 ` George Anzinger @ 2004-09-09 6:37 ` Jesse Barnes 2004-09-09 8:09 ` George Anzinger 2004-09-09 19:07 ` john stultz 1 sibling, 1 reply; 84+ messages in thread From: Jesse Barnes @ 2004-09-09 6:37 UTC (permalink / raw) To: george Cc: john stultz, Christoph Lameter, Albert Cahalan, lkml, tim, Ulrich.Windl, Len Brown, linux, David Mosberger, Andi Kleen, paulus, schwidefsky, jimix, keith maanthey, greg kh, Patricia Gaughen, Chris McDermott On Wednesday, September 8, 2004 9:31 pm, George Anzinger wrote: > a.) resolution. If you don't put a limit on this you will invite timer > storms. Currently, by useing 1/HZ resolution, all timer "line up" on ticks > and reduce the interrupt overhead that would occure if we actually tried to > give "exactly" what was asked for. This is a matter of math and can be > handled (assuming we resist the urge to go shopping :)) This can be bad though if lots of CPUs hit it at the same time or nearly so if they're all trying to write the same cacheline or two. Jesse ^ permalink raw reply [flat|nested] 84+ messages in thread
* Re: [RFC][PATCH] new timeofday core subsystem (v.A0) 2004-09-09 6:37 ` Jesse Barnes @ 2004-09-09 8:09 ` George Anzinger 0 siblings, 0 replies; 84+ messages in thread From: George Anzinger @ 2004-09-09 8:09 UTC (permalink / raw) To: Jesse Barnes Cc: john stultz, Christoph Lameter, Albert Cahalan, lkml, tim, Ulrich.Windl, Len Brown, linux, David Mosberger, Andi Kleen, paulus, schwidefsky, jimix, keith maanthey, greg kh, Patricia Gaughen, Chris McDermott Jesse Barnes wrote: > On Wednesday, September 8, 2004 9:31 pm, George Anzinger wrote: > >>a.) resolution. If you don't put a limit on this you will invite timer >>storms. Currently, by useing 1/HZ resolution, all timer "line up" on ticks >>and reduce the interrupt overhead that would occure if we actually tried to >>give "exactly" what was asked for. This is a matter of math and can be >>handled (assuming we resist the urge to go shopping :)) > > > This can be bad though if lots of CPUs hit it at the same time or nearly so if > they're all trying to write the same cacheline or two. I think that most of the SMP issues your talking about have gone away with 2.6 where we have seperate timer lists for each cpu. -- George Anzinger george@mvista.com High-res-timers: http://sourceforge.net/projects/high-res-timers/ Preemption patch: http://www.kernel.org/pub/linux/kernel/people/rml ^ permalink raw reply [flat|nested] 84+ messages in thread
* Re: [RFC][PATCH] new timeofday core subsystem (v.A0) 2004-09-09 4:31 ` George Anzinger 2004-09-09 6:37 ` Jesse Barnes @ 2004-09-09 19:07 ` john stultz 2004-09-09 20:49 ` George Anzinger 1 sibling, 1 reply; 84+ messages in thread From: john stultz @ 2004-09-09 19:07 UTC (permalink / raw) To: george anzinger Cc: Christoph Lameter, Albert Cahalan, lkml, tim, Ulrich.Windl, Len Brown, linux, David Mosberger, Andi Kleen, paulus, schwidefsky, jimix, keith maanthey, greg kh, Patricia Gaughen, Chris McDermott On Wed, 2004-09-08 at 21:31, George Anzinger wrote: > john stultz wrote: > > I'm not sure about the busy wait bit, but yes, at some point I'd like to > > see the timer subsystem use the timeofday subsystem instead of jiffies > > for its timekeeping. > > > Yes, I think this is the way we want to go. Here are the "rubs" I see: > > a.) resolution. If you don't put a limit on this you will invite timer storms. > Currently, by useing 1/HZ resolution, all timer "line up" on ticks and reduce > the interrupt overhead that would occure if we actually tried to give "exactly" > what was asked for. This is a matter of math and can be handled (assuming we > resist the urge to go shopping :)) Well, basically we have the same resolution as we do today. Right now you specify the time in jiffies which has different resolutions depending on configuration, so I'm not sure I see the problem. I mean, I guess someone could be off put when they ask for a 1 ns expiration time and it returns in 1 ms, but I thought timers followed sleep() semantics, so it shouldn't be that much of an issue. > b.) For those platforms with repeating timers that can not "hit" our desired > resolution (i.e. 1/HZ) there is an additional overhead to program the timer each > interrupt. In principle we do this in the HRT patch, but there we only do it > for high resolution timers, which we assume are rather rare. It is good to have > a low res timer that is also accurate. Even better if we can keep the overhead > low by not having to reprogram a timer each tick. > > In the ideal world we would have a hardware repeating timer that is reasonably > accurate (we might want to correct it every second or so) to generate the low > res timing interrupts and a high res timer that we can program quickly for high > resolution interrupts. Even in this case, inaccurate hardware timers can only cause a single tick jitter. The current problem we're seeing is that the timer inaccuracy accumulates, so if we're 1% off, a sleep(600) returns 6 seconds late. Using the time source instead of jiffies for expiration would avoid the accumulation, so we'd be at most a single tick late. Or is the problem more subtle? thanks -john ^ permalink raw reply [flat|nested] 84+ messages in thread
* Re: [RFC][PATCH] new timeofday core subsystem (v.A0) 2004-09-09 19:07 ` john stultz @ 2004-09-09 20:49 ` George Anzinger 2004-09-13 21:29 ` Christoph Lameter 0 siblings, 1 reply; 84+ messages in thread From: George Anzinger @ 2004-09-09 20:49 UTC (permalink / raw) To: john stultz Cc: Christoph Lameter, Albert Cahalan, lkml, tim, Ulrich.Windl, Len Brown, linux, David Mosberger, Andi Kleen, paulus, schwidefsky, jimix, keith maanthey, greg kh, Patricia Gaughen, Chris McDermott john stultz wrote: > On Wed, 2004-09-08 at 21:31, George Anzinger wrote: > >>john stultz wrote: >> >>>I'm not sure about the busy wait bit, but yes, at some point I'd like to >>>see the timer subsystem use the timeofday subsystem instead of jiffies >>>for its timekeeping. >>> >> >>Yes, I think this is the way we want to go. Here are the "rubs" I see: >> >>a.) resolution. If you don't put a limit on this you will invite timer storms. >> Currently, by using 1/HZ resolution, all timer "line up" on ticks and reduce >>the interrupt overhead that would occur if we actually tried to give "exactly" >>what was asked for. This is a matter of math and can be handled (assuming we >>resist the urge to go shopping :)) > > > Well, basically we have the same resolution as we do today. Right now > you specify the time in jiffies which has different resolutions > depending on configuration, so I'm not sure I see the problem. I mean, I > guess someone could be off put when they ask for a 1 ns expiration time > and it returns in 1 ms, but I thought timers followed sleep() semantics, > so it shouldn't be that much of an issue. What we get down to here is setting up a tick source that depends on the clock. This issue of resolution, as I said, is easy to solve. > > >>b.) For those platforms with repeating timers that can not "hit" our desired >>resolution (i.e. 1/HZ) there is an additional overhead to program the timer each >>interrupt. In principle we do this in the HRT patch, but there we only do it >>for high resolution timers, which we assume are rather rare. It is good to have >>a low res timer that is also accurate. Even better if we can keep the overhead >>low by not having to reprogram a timer each tick. >> >>In the ideal world we would have a hardware repeating timer that is reasonably >>accurate (we might want to correct it every second or so) to generate the low >>res timing interrupts and a high res timer that we can program quickly for high >>resolution interrupts. > > > Even in this case, inaccurate hardware timers can only cause a single > tick jitter. The current problem we're seeing is that the timer > inaccuracy accumulates, so if we're 1% off, a sleep(600) returns 6 > seconds late. Using the time source instead of jiffies for expiration > would avoid the accumulation, so we'd be at most a single tick late. > > Or is the problem more subtle? No, I think you have it. The x86 problem, in this regard, is that the "clock" xtal is only connected to the pm timer and the PIT, both of which are slow to access (not to mention all the hardware errors). The x86 TSC is driven by another xtal which, to the best of my knowledge, is not a "clock" grade xtal. We also don't know what the actual TSC frequency is, thus the calibration loop. What we get down to, and what others have said, is that the TSC is great for interpolation but bad as a long term clock. I suspect that a time solution that required us to reprogram the PIT each tick would not fly, just because of the access overhead. One that required reprogramming the PIT every second or so _might_ be ok. -- George Anzinger george@mvista.com High-res-timers: http://sourceforge.net/projects/high-res-timers/ Preemption patch: http://www.kernel.org/pub/linux/kernel/people/rml ^ permalink raw reply [flat|nested] 84+ messages in thread
* Re: [RFC][PATCH] new timeofday core subsystem (v.A0) 2004-09-09 20:49 ` George Anzinger @ 2004-09-13 21:29 ` Christoph Lameter 2004-09-13 22:25 ` john stultz 0 siblings, 1 reply; 84+ messages in thread From: Christoph Lameter @ 2004-09-13 21:29 UTC (permalink / raw) To: George Anzinger Cc: john stultz, Albert Cahalan, lkml, tim, Ulrich.Windl, Len Brown, linux, David Mosberger, Andi Kleen, paulus, schwidefsky, jimix, keith maanthey, greg kh, Patricia Gaughen, Chris McDermott Hmm... I am still thinking that making xtime an u64 could help simplify a lot of things. Together with some other parameters it would allow a tickless system where there is no need to update xtime regularly. It would also simplify gettimeofday and jiffies calculation. In order to calculate the current time, one then has to use a counter that started running at a certain point. The counter can be used to calculate an offset to xtime after scaling it. It is then only necessary to update xtime if the value calculated by xtime and the counter deviates significantly from the real value. Then time keeping may be corrected (after xtime has been updated, we might have handed out time before this event and we cannot change time keeping without a new starting point for the calculation of the offset!) by 1. Adding a number to xtime resulting in a time jump forward. This needs to be a small number. 2. Slowing down the clock by reducing the scaling (time cannot jump backward. This is the only way to correct the clock if its running too fast). 3. Speeding up the clock by increasing the scaling. This would presumably be done if the time jumps become too large. All of this is based a lot on the work done for the time interpolator. /**************************************************************** * A concept for a new timer module * ***************************************************************/ struct time_source_t { u8 type; u8 shift; u32 multiply; void *address; }; #define TIME_SOURCE_CPU 0 #define TIME_SOURCE_MMIO32 1 #define TIME_SOURCE_MMIO64 2 #define TIME_SOURCE_FUNCTION 15 struct seqlock_t xtime_lock; /* Protects access to xtime time_source and time_source_last, may be used by time retrieval routines if 64 bit atomic operations are not available */ u64 xtime; struct time_source_t *time_source; u64 time_source_last; u64 inline time_source_get(struct time_source_t *s) { switch (s->type) { case TIME_SOURCE_CPU : return cycles(); case TIME_SOURCE_MMIO32 : return readl(s->address); case TIME_SOURCE_MMIO64 : return readq(s->address); case TIME_SOURCE_FUNCTION: { u64 (*x)(void); x = s->address; return x(); } default : return 0; } } u64 time_source_to_ns(u64 x) { return ((x-time_source_last)*time_source->multiply) >> time_source->shift; } inline u64 now(void) { u64 t; unsigned long seq; do { seq = read_seqbegin(&xtime_lock); t = xtime + time_source_to_ns(time_source_get(time_source)); } while (unlikely(seq_retry(&xtime_lock,seq))); return t; } /* Time adjustment (only possible forward, to go backwards adjust time_source->multiply and wait ...) */ void time_adjust_skip(u64 ns) { u64 c; write_seqlock(&xtime_lock); c = time_source_get(time_source); xtime += ns + time_source_to_ns(c); time_source_last = c; write_sequnlock(&xtime_lock); } void time_adjust_slower(void) { u64 c; write_seqlock(&xtime_lock); c = time_source_get(time_source); xtime += time_source_to_ns(c); time_source_last = c; time_source->multiply--; write_sequnlock(&xtime_lock); } void time_adjust_faster(void) { u64 c; write_seqlock(&xtime_lock); c = time_source_get(time_source); xtime += time_source_to_ns(c); time_source_last = c; time_source->multiply++; write_sequnlock(&xtime_unlock); } /* Switch to another time source */ void new_time_source(struct time_source_t *s) { u64 c_old; u64 c_new; write_seqlock(&xtime_lock); c_old = time_source_get(time_source); c_new = time_source_get(s); xtime += time_source_to_ns(c_old); time_source_last = c_new; time_source = s; write_sequnlock(&xtime_lock); } struct time_source_t *make_time_source(u64 freq, int t,void *a, int bits) { struct time_source_t *s = kmalloc(sizeof(struct time_source_t)); s->shift = 64 - bits; s->multiply = (NSEC_PER_SEC << s->shift) / freq; s->address = a; s->type = t; } /* Values in use in the kernel and how they may be derived from xtime */ #define jiffies (now()/1000000) #define seconds(x) ((x)/1000000000) #define nanoseconds_within_second(x) ((x)%1000000000) #define microseconds_within_second(x) (nanoseconds_within_second(x) / 1000) u32 time(void) { return seconds(now()); } void gettimeofday(struct timeval *p) { u64 t=now(); p->tv_sec = seconds(t); p->tv_usec = microseconds_within_second(t); } void clock_gettime(int clock, struct timespec *p) { u64 t=now(); p->tv_sec = seconds(t); p->tv_nsec = nanoseconds_within_second(t); } ^ permalink raw reply [flat|nested] 84+ messages in thread
* Re: [RFC][PATCH] new timeofday core subsystem (v.A0) 2004-09-13 21:29 ` Christoph Lameter @ 2004-09-13 22:25 ` john stultz 2004-09-13 22:45 ` Christoph Lameter 2004-09-14 17:49 ` Christoph Lameter 0 siblings, 2 replies; 84+ messages in thread From: john stultz @ 2004-09-13 22:25 UTC (permalink / raw) To: Christoph Lameter Cc: george anzinger, Albert Cahalan, lkml, tim, Ulrich.Windl, Len Brown, linux, David Mosberger, Andi Kleen, paulus, schwidefsky, jimix, keith maanthey, greg kh, Patricia Gaughen, Chris McDermott On Mon, 2004-09-13 at 14:29, Christoph Lameter wrote: > Hmm... I am still thinking that making xtime an u64 could help simplify a > lot of things. Together with some other parameters it would allow a > tickless system where there is no need to update xtime regularly. It would > also simplify gettimeofday and jiffies calculation. Looks interesting! Regardless of our earlier butting heads, both this and my earlier proposal are indeed very similar. My first pass comments: o How would settimeofday work? Would it be forced to use time_adjust_XYZ? Do we still need the wall_to_monotonic uglyness to manage posix CLOCK_MONOTONIC timers? My suggestion: split xtime into two values (system_time, wall_time_offset) like in my proposal. This allows you to keep a true monotonically increasing time base (system_time) and wall_time_offset can then be used to adjust to wall time. settimeofday() would only change wall_time_offset. o time_source_to_ns() needs some method of masking the subtraction in order to handle timesources that overflow under 64 bits. o How would your time_adjust_XYZ interfaces merge w/ adjtimex and the NTP wall_time_update()/second_overflow() code? I spent quite a bit of time on my ntp.c implementation and I'm not 100% confident in it. Could you explain in further detail how yours would work? o My only other nit is that you use a different name then xtime. If you're changing the type, you might as well use a meaningful name. Unfortunately I'm off working on other things for the next two weeks, but once that is over I look forward to trying to integrate some of your design ideas into my own. Keeping my timeofday and ntp core code, but using your timesource interface looks to be quite promising. thanks -john ^ permalink raw reply [flat|nested] 84+ messages in thread
* Re: [RFC][PATCH] new timeofday core subsystem (v.A0) 2004-09-13 22:25 ` john stultz @ 2004-09-13 22:45 ` Christoph Lameter 2004-09-14 6:53 ` Ulrich Windl 2004-09-14 17:49 ` Christoph Lameter 1 sibling, 1 reply; 84+ messages in thread From: Christoph Lameter @ 2004-09-13 22:45 UTC (permalink / raw) To: john stultz Cc: george anzinger, Albert Cahalan, lkml, tim, Ulrich.Windl, Len Brown, linux, David Mosberger, Andi Kleen, paulus, schwidefsky, jimix, keith maanthey, greg kh, Patricia Gaughen, Chris McDermott On Mon, 13 Sep 2004, john stultz wrote: > My first pass comments: > o How would settimeofday work? Would it be forced to use > time_adjust_XYZ? Do we still need the wall_to_monotonic uglyness to > manage posix CLOCK_MONOTONIC timers? You could simply set xtime and time_source_last = time_source_get(time_source); I think the existing wall_to_monotonic stuff is okay. Monotonic time is rarely used and thus its best to keep xtime on CLOCK_REALTIME to simplify things for general time access. > My suggestion: split xtime into two values (system_time, > wall_time_offset) like in my proposal. This allows you to keep a true > monotonically increasing time base (system_time) and wall_time_offset > can then be used to adjust to wall time. settimeofday() would only > change wall_time_offset. xtime is also truly monotonic increasing in my proposal. Setting the time is an extraordinary event which I would not count against monotoneity. The clock corrections that I proposed all preserve the monotoneity of xtime. > o time_source_to_ns() needs some method of masking the subtraction in > order to handle timesources that overflow under 64 bits. Yes. My interpolator logic limits begin to show.... > o How would your time_adjust_XYZ interfaces merge w/ adjtimex and the > NTP wall_time_update()/second_overflow() code? I spent quite a bit of > time on my ntp.c implementation and I'm not 100% confident in it. Could > you explain in further detail how yours would work? I have no idea how adjtimex works. Would have to look at that in detail first. The timer interpolator stuff also does time adjustments and I am sure that something like that is in my mind. The time interpolator only ever corrects the clock forward. Otherwise it adjusts the scaling if the clock is running too fast. Thats also what I provided in the proposal. > o My only other nit is that you use a different name then xtime. If > you're changing the type, you might as well use a meaningful name. xtime is the traditional name. Maybe renaming it to intentionally break old code would be good but I thought it would be good to understand the approach. > Unfortunately I'm off working on other things for the next two weeks, > but once that is over I look forward to trying to integrate some of your > design ideas into my own. Keeping my timeofday and ntp core code, but > using your timesource interface looks to be quite promising. That sounds very promising. I may also have to make time for different projects but I will try to do as much as I can. ^ permalink raw reply [flat|nested] 84+ messages in thread
* Re: [RFC][PATCH] new timeofday core subsystem (v.A0) 2004-09-13 22:45 ` Christoph Lameter @ 2004-09-14 6:53 ` Ulrich Windl 0 siblings, 0 replies; 84+ messages in thread From: Ulrich Windl @ 2004-09-14 6:53 UTC (permalink / raw) To: Christoph Lameter Cc: george anzinger, Albert Cahalan, lkml, tim, Ulrich.Windl, Len Brown, linux, David Mosberger, Andi Kleen, paulus, schwidefsky, jimix, keith maanthey, greg kh, Patricia Gaughen, Chris McDermott On 13 Sep 2004 at 15:45, Christoph Lameter wrote: > > o My only other nit is that you use a different name then xtime. If > > you're changing the type, you might as well use a meaningful name. > > xtime is the traditional name. Maybe renaming it to intentionally break > old code would be good but I thought it would be good to understand > the approach. > I think direct access to xtime should vanish. Provide an inlinable function to get the current time coarsely and quickly (that's what reading xtime is). Ulrich ^ permalink raw reply [flat|nested] 84+ messages in thread
* Re: [RFC][PATCH] new timeofday core subsystem (v.A0) 2004-09-13 22:25 ` john stultz 2004-09-13 22:45 ` Christoph Lameter @ 2004-09-14 17:49 ` Christoph Lameter 2004-09-15 0:57 ` George Anzinger 1 sibling, 1 reply; 84+ messages in thread From: Christoph Lameter @ 2004-09-14 17:49 UTC (permalink / raw) To: john stultz Cc: george anzinger, Albert Cahalan, lkml, tim, Ulrich.Windl, Len Brown, linux, David Mosberger, Andi Kleen, paulus, schwidefsky, jimix, keith maanthey, greg kh, Patricia Gaughen, Chris McDermott Here is a rev on the concept of a time source subsystem taking into account some of the feedback: 1. add mask 2. rename time source so that its different from xtime 3. provide set_time functionality 4. add monotonic time support 5. Give some example of time sources including a function time source. note that the original concept already includes the ability to switch time sources. I would think that this would be necessary especially if the CPU time sources frequency switches or if the system goes into sleep mode. /**************************************************************** * A concept for a new timer module * V2 * * Christoph Lameter, September 14, 2003 ***************************************************************/ struct time_source_t { u8 type; u8 shift; u32 multiply; void *address; u64 mask; }; #define TIME_SOURCE_CPU 0 #define TIME_SOURCE_MMIO32 1 #define TIME_SOURCE_MMIO64 2 #define TIME_SOURCE_FUNCTION 15 struct seqlock_t time_lock; /* Protects access to time_base time_source time_mono_offset and time_source_at_base */ u64 time_base; u64 time_mono_offset; struct time_source_t *time_source; u64 time_source_at_base; /* time source value at time time_base */ u64 inline time_source_get(struct time_source_t *s) { switch (s->type) { case TIME_SOURCE_CPU : return cycles(); case TIME_SOURCE_MMIO32 : return readl(s->address); case TIME_SOURCE_MMIO64 : return readq(s->address); case TIME_SOURCE_FUNCTION: { u64 (*x)(void); x = s->address; return x(); } } u64 time_source_to_ns(u64 x) { return (((x-time_source_at_base) & time_source->mask)*time_source->multiply) >> time_source->shift; } inline u64 now(void) { u64 t; unsigned long seq; do { seq = read_seqbegin(&time_lock); t = time_base + time_source_to_ns(time_source_get(time_source)); } while (unlikely(seq_retry(&time_lock,seq))); return t; } #define now_mono() (now()+time_mono_offset) /* Time adjustment (only possible forward, to go backwards adjust time_source->multiply and wait ...) */ void time_adjust_skip(u64 ns) { u64 c; write_seqlock(&time_lock); c = time_source_get(time_source); time_base += ns + time_source_to_ns(c); time_source_at_base = c; write_sequnlock(&time_lock); } void time_adjust_slower(void) { u64 c; write_seqlock(&time_lock); c = time_source_get(time_source); time_base += time_source_to_ns(c); time_source_at_base = c; time_source->multiply--; write_sequnlock(&time_lock); } void time_adjust_faster(void) { u64 c; write_seqlock(&time_lock); c = time_source_get(time_source); time_base += time_source_to_ns(c); time_source_at_base = c; time_source->multiply++; write_sequnlock(&time_unlock); } /* Switch to another time source */ void new_time_source(struct time_source_t *s) { u64 c_old; u64 c_new; write_seqlock(&time_lock); c_old = time_source_get(time_source); c_new = time_source_get(s); time_base += time_source_to_ns(c_old); time_source_at_base = c_new; time_source = s; write_sequnlock(&time_lock); } struct time_source_t *make_time_source(u64 freq, int t,void *a, int bits) { struct time_source_t *s = kmalloc(sizeof(struct time_source_t)); s->shift = 64 - bits; s->multiply = (NSEC_PER_SEC << s->shift) / freq; s->address = a; s->type = t; s->mask = 1 << bits -1; } void time_set(u64 ns) { u64 c; write_seqlock(&time_lock); c = time_source_get(time_source); /* Adjust monotonic time base */ time_mono_offset += time_base + time_source_to_ns(c) - ns; /* Setup new time base */ time_base = ns; time_source_at_base = c; write_sequnlock(&time_lock); } void time_init(struct time_source_t *s) { { write_seqlock(&time_lock); time_base = 0; time_mono_offset = 0; time_source_at_base = time_source_get(s); time_source = s; write_sequnlock(&time_lock); } /* Values in use in the kernel and how they may be derived from xtime */ #define jiffies (now()/1000000) #define seconds(x) ((x)/1000000000) #define nanoseconds_within_second(x) ((x)%1000000000) #define microseconds_within_second(x) (nanoseconds_within_second(x) / 1000) u32 time(void) { return seconds(now()); } void gettimeofday(struct timeval *p) { u64 t=now(); p->tv_sec = seconds(t); p->tv_usec = microseconds_within_second(t); } void clock_gettime(int clock, struct timespec *p) { u64 t=now(); p->tv_sec = seconds(t); p->tv_nsec = nanoseconds_within_second(t); } /* Exampe of a CPU time time source */ make_time_source(1200000000, TIME_SOURCE_CPU, NULL, 44); /* A memory based time source at 20Mhz 55 bits wide */ make_time_source(20000000, TIME_SOURCE_MMIO64, &my_timer_address, 55); /* Example of handling a difficult time source. In SMP systems the CPU time sources are notoriously difficult to * sync. If we have such a problem then insure at least sure that the time source never goes backward. */ u64 time_source_last; u64 get_cpu_time_filtered() { u64 x; u64 l; do { l = time_source_last; x = cycles(); if (x<l) return l; /* the cmpxchg is going to hurt in terms of scalability ! */ } while (cmpxchg(&time_source_last, l, x) != l); return x; } Generate the time_source with make_time_source(1200000000, TIME_SOURCE_FUNCTION, cpu_time_filtered, 44); ^ permalink raw reply [flat|nested] 84+ messages in thread
* Re: [RFC][PATCH] new timeofday core subsystem (v.A0) 2004-09-14 17:49 ` Christoph Lameter @ 2004-09-15 0:57 ` George Anzinger 2004-09-15 3:32 ` Christoph Lameter 2004-09-15 6:46 ` Christoph Lameter 0 siblings, 2 replies; 84+ messages in thread From: George Anzinger @ 2004-09-15 0:57 UTC (permalink / raw) To: Christoph Lameter Cc: john stultz, Albert Cahalan, lkml, tim, Ulrich.Windl, Len Brown, linux, David Mosberger, Andi Kleen, paulus, schwidefsky, jimix, keith maanthey, greg kh, Patricia Gaughen, Chris McDermott Christoph Lameter wrote: > Here is a rev on the concept of a time source subsystem taking into > account some of the feedback: > > 1. add mask > 2. rename time source so that its different from xtime > 3. provide set_time functionality > 4. add monotonic time support > 5. Give some example of time sources including a function time source. > > note that the original concept already includes the ability to switch time > sources. I would think that this would be necessary especially if the CPU > time sources frequency switches or if the system goes into sleep mode. > > /**************************************************************** > * A concept for a new timer module > * V2 > * > * Christoph Lameter, September 14, 2003 > ***************************************************************/ > > struct time_source_t { > u8 type; > u8 shift; > u32 multiply; > void *address; > u64 mask; > }; > > #define TIME_SOURCE_CPU 0 > #define TIME_SOURCE_MMIO32 1 > #define TIME_SOURCE_MMIO64 2 > #define TIME_SOURCE_FUNCTION 15 > > struct seqlock_t time_lock; /* Protects access to time_base time_source time_mono_offset and time_source_at_base */ > u64 time_base; > u64 time_mono_offset; > struct time_source_t *time_source; > u64 time_source_at_base; /* time source value at time time_base */ > > u64 inline time_source_get(struct time_source_t *s) { > switch (s->type) { > case TIME_SOURCE_CPU : return cycles(); > case TIME_SOURCE_MMIO32 : return readl(s->address); > case TIME_SOURCE_MMIO64 : return readq(s->address); > case TIME_SOURCE_FUNCTION: { > u64 (*x)(void); > x = s->address; > return x(); > } > } > > u64 time_source_to_ns(u64 x) { > return (((x-time_source_at_base) & time_source->mask)*time_source->multiply) >> time_source->shift; > } This seems to assume that the time souce is incrementing. On some archs, I think, it decrements... > > inline u64 now(void) { > u64 t; > unsigned long seq; > > do { > seq = read_seqbegin(&time_lock); > t = time_base + time_source_to_ns(time_source_get(time_source)); > } while (unlikely(seq_retry(&time_lock,seq))); > return t; > } > > #define now_mono() (now()+time_mono_offset) > > /* Time adjustment (only possible forward, to go backwards adjust time_source->multiply and wait ...) */ > void time_adjust_skip(u64 ns) { > u64 c; > > write_seqlock(&time_lock); > c = time_source_get(time_source); > time_base += ns + time_source_to_ns(c); > time_source_at_base = c; > write_sequnlock(&time_lock); > } So we would do "time_adjust_skip(0);" to update time_source_at_base? > > void time_adjust_slower(void) { > u64 c; > > write_seqlock(&time_lock); > c = time_source_get(time_source); > time_base += time_source_to_ns(c); > time_source_at_base = c; > time_source->multiply--; > write_sequnlock(&time_lock); > } > If we do a "good" job of choosing <multiply> and <shift> this will be a "very" small change. Might be better to pass in a "delta" to change it by. Then you would only need one function. > void time_adjust_faster(void) { > u64 c; > > write_seqlock(&time_lock); > c = time_source_get(time_source); > time_base += time_source_to_ns(c); > time_source_at_base = c; > time_source->multiply++; > write_sequnlock(&time_unlock); > } > > /* Switch to another time source */ > void new_time_source(struct time_source_t *s) { > u64 c_old; > u64 c_new; > > write_seqlock(&time_lock); > c_old = time_source_get(time_source); > c_new = time_source_get(s); > > time_base += time_source_to_ns(c_old); > > time_source_at_base = c_new; > time_source = s; > > write_sequnlock(&time_lock); > } > > struct time_source_t *make_time_source(u64 freq, int t,void *a, int bits) { > struct time_source_t *s = kmalloc(sizeof(struct time_source_t)); > > s->shift = 64 - bits; > s->multiply = (NSEC_PER_SEC << s->shift) / freq; > s->address = a; > s->type = t; > s->mask = 1 << bits -1; > } The mask and the shift value are not really related. The mask is a function of the number of bits the hardware provides. The shift is related to the value of freq. Me thinks they should not be tied together here. > > void time_set(u64 ns) { > u64 c; > > write_seqlock(&time_lock); > c = time_source_get(time_source); > /* Adjust monotonic time base */ > time_mono_offset += time_base + time_source_to_ns(c) - ns; > /* Setup new time base */ > time_base = ns; > time_source_at_base = c; > write_sequnlock(&time_lock); > } > > void time_init(struct time_source_t *s) { > { > write_seqlock(&time_lock); > time_base = 0; > time_mono_offset = 0; > time_source_at_base = time_source_get(s); > time_source = s; > write_sequnlock(&time_lock); > } > > /* Values in use in the kernel and how they may be derived from xtime */ > #define jiffies (now()/1000000) This assumes HZ=1000. (Assuming there is an HZ any more, that is.) Not all archs will want this value. Possibly: #define jiffies ((now() * HZ) / 1000000000) > #define seconds(x) ((x)/1000000000) > #define nanoseconds_within_second(x) ((x)%1000000000) > #define microseconds_within_second(x) (nanoseconds_within_second(x) / 1000 > > u32 time(void) { > return seconds(now()); > } > > void gettimeofday(struct timeval *p) { > u64 t=now(); > > p->tv_sec = seconds(t); > p->tv_usec = microseconds_within_second(t); > } > > void clock_gettime(int clock, struct timespec *p) { > u64 t=now(); > > p->tv_sec = seconds(t); > p->tv_nsec = nanoseconds_within_second(t); > } > > /* Exampe of a CPU time time source */ > > make_time_source(1200000000, TIME_SOURCE_CPU, NULL, 44); > > /* A memory based time source at 20Mhz 55 bits wide */ > > make_time_source(20000000, TIME_SOURCE_MMIO64, &my_timer_address, 55); > > /* Example of handling a difficult time source. In SMP systems the CPU time sources are notoriously difficult to > * sync. If we have such a problem then insure at least sure that the time source never goes backward. > */ > > u64 time_source_last; > > u64 get_cpu_time_filtered() { > u64 x; > u64 l; This will need to be "static"; > > do { > l = time_source_last; > x = cycles(); > if (x<l) return l; > /* the cmpxchg is going to hurt in terms of scalability ! */ > } while (cmpxchg(&time_source_last, l, x) != l); > return x; > } > > Generate the time_source with > > make_time_source(1200000000, TIME_SOURCE_FUNCTION, cpu_time_filtered, 44); > - Ok, so now lets hook this up with interval timers: #define ns_per_jiffie (NSEC_PER_SEC / HZ) #define jiffies_to_ns(jiff) (jiff * ns_per_jiffie) This function is a request to interrupt at the next jiffie after the passed reference jiffie. If that time is passed return true, else false. int schedule_next_jiffie(unsigned long ref_jiffie) { u64 remains; u64 now_frozen = now(); u32 jiffies_frozen = ((now_frozen * HZ) / 1000000000); if (jiffies_frozen > ref_jiffies) return 1; remains = now_frozen - jiffies_to_ns(ref_jiffie + 1); /* * set_timer_to_interrupt(u64 ns) is provided by the arch layer */ set_timer_to_interrupt_in(remains); return 0; } On interrupt: time_adjust_skip(0); do_timer(); Or, possibly do_timer() does the time_adjust_skip(0); do_timer() will set up the softirq to run the timer list. The timer list could could call schedule_next_jiffie, but there are SMP issues here. -- George Anzinger george@mvista.com High-res-timers: http://sourceforge.net/projects/high-res-timers/ Preemption patch: http://www.kernel.org/pub/linux/kernel/people/rml ^ permalink raw reply [flat|nested] 84+ messages in thread
* Re: [RFC][PATCH] new timeofday core subsystem (v.A0) 2004-09-15 0:57 ` George Anzinger @ 2004-09-15 3:32 ` Christoph Lameter 2004-09-15 8:04 ` George Anzinger 2004-09-15 6:46 ` Christoph Lameter 1 sibling, 1 reply; 84+ messages in thread From: Christoph Lameter @ 2004-09-15 3:32 UTC (permalink / raw) To: George Anzinger Cc: john stultz, Albert Cahalan, lkml, tim, Ulrich.Windl, Len Brown, linux, David Mosberger, Andi Kleen, paulus, schwidefsky, jimix, keith maanthey, greg kh, Patricia Gaughen, Chris McDermott On Tue, 14 Sep 2004, George Anzinger wrote: > > u64 time_source_to_ns(u64 x) { > > return (((x-time_source_at_base) & time_source->mask)*time_source->multiply) >> time_source->shift; > > } > > This seems to assume that the time souce is incrementing. On some archs, I > think, it decrements... This could be handled by a function that transforms the value read from the counter into an incrementing value. I.e. u64 get_rev_timerval(void) { return 1<< 55 - readq(TIMER_PORT); } > So we would do "time_adjust_skip(0);" to update time_source_at_base? There is no reason to update time_source_at_base unless adjustments need to be done or a danger exists of the counter wrapping around (16 bit counter?) > If we do a "good" job of choosing <multiply> and <shift> this will be a "very" > small change. Might be better to pass in a "delta" to change it by. Then you > would only need one function. These are the raw routines. Higher level function could translate a delta into the appropriate adjustments. > The mask and the shift value are not really related. The mask is a function of > the number of bits the hardware provides. The shift is related to the value of > freq. Me thinks they should not be tied together here. They are related because the maximum shift for a 64 bit value without loosing bits is 64 - number of significant bits. This basically insures maximum precision when scaling the counter. > > /* Values in use in the kernel and how they may be derived from xtime */ > > #define jiffies (now()/1000000) > > This assumes HZ=1000. (Assuming there is an HZ any more, that is.) Not all > archs will want this value. Possibly: > #define jiffies ((now() * HZ) / 1000000000) Right. I just thought of the standard case HZ=1000. > > u64 get_cpu_time_filtered() { > > u64 x; > > u64 l; > This will need to be "static"; Nope. time_source_last is the global. l is just a copy of time_source_last. > Ok, so now lets hook this up with interval timers: > > #define ns_per_jiffie (NSEC_PER_SEC / HZ) > #define jiffies_to_ns(jiff) (jiff * ns_per_jiffie) > > This function is a request to interrupt at the next jiffie after the passed > reference jiffie. If that time is passed return true, else false. One could do this but we want to have a tickless system. The tick is only necessary if the time needs to be adjusted. But you are right there is the need for timer event scheduling that is not included yet. This should be a method of the time source. ^ permalink raw reply [flat|nested] 84+ messages in thread
* Re: [RFC][PATCH] new timeofday core subsystem (v.A0) 2004-09-15 3:32 ` Christoph Lameter @ 2004-09-15 8:04 ` George Anzinger 2004-09-15 8:54 ` Dominik Brodowski ` (2 more replies) 0 siblings, 3 replies; 84+ messages in thread From: George Anzinger @ 2004-09-15 8:04 UTC (permalink / raw) To: Christoph Lameter Cc: john stultz, Albert Cahalan, lkml, tim, Ulrich.Windl, Len Brown, linux, David Mosberger, Andi Kleen, paulus, schwidefsky, jimix, keith maanthey, greg kh, Patricia Gaughen, Chris McDermott Christoph Lameter wrote: > On Tue, 14 Sep 2004, George Anzinger wrote: > > >>>u64 time_source_to_ns(u64 x) { >>> return (((x-time_source_at_base) & time_source->mask)*time_source->multiply) >> time_source->shift; >>>} >> >>This seems to assume that the time souce is incrementing. On some archs, I >>think, it decrements... > > > This could be handled by a function that transforms the value read from > the counter into an incrementing value. I.e. > > u64 get_rev_timerval(void) { > return 1<< 55 - readq(TIMER_PORT); > } > > >>So we would do "time_adjust_skip(0);" to update time_source_at_base? > > > There is no reason to update time_source_at_base unless adjustments need > to be done or a danger exists of the counter wrapping around (16 bit > counter?) Yes, for example the pm counter is 24 bits. A lot of platforms have 32 bit counters... > > >>If we do a "good" job of choosing <multiply> and <shift> this will be a "very" >>small change. Might be better to pass in a "delta" to change it by. Then you >>would only need one function. > > > These are the raw routines. Higher level function could translate a delta > into the appropriate adjustments. > > >>The mask and the shift value are not really related. The mask is a function of >>the number of bits the hardware provides. The shift is related to the value of >>freq. Me thinks they should not be tied together here. > > > They are related because the maximum shift for a 64 bit value without > loosing bits is 64 - number of significant bits. This basically insures > maximum precision when scaling the counter. Lets assume the pm counter which has 24 bits. This means your shift is 40 bits. In "s->multiply = (NSEC_PER_SEC << s->shift) / freq;" you will have an overflow. Here you need to keep (NSEC_PER_SEC << s->shift) in 64 bits AND multiply must also be 32 bits or less. I really don't think you can choose the scale so easily. > > > >>>/* Values in use in the kernel and how they may be derived from xtime */ >>>#define jiffies (now()/1000000) >> >>This assumes HZ=1000. (Assuming there is an HZ any more, that is.) Not all >>archs will want this value. Possibly: >>#define jiffies ((now() * HZ) / 1000000000) > > > Right. I just thought of the standard case HZ=1000. > > >>>u64 get_cpu_time_filtered() { >>> u64 x; >>> u64 l; >> >>This will need to be "static"; > > > Nope. time_source_last is the global. l is just a copy of > time_source_last. Right, I miss read the function. cycles() should be now() if I am reading this right. > > >>Ok, so now lets hook this up with interval timers: >> >>#define ns_per_jiffie (NSEC_PER_SEC / HZ) >>#define jiffies_to_ns(jiff) (jiff * ns_per_jiffie) >> >>This function is a request to interrupt at the next jiffie after the passed >>reference jiffie. If that time is passed return true, else false. > > > One could do this but we want to have a tickless system. The tick is only > necessary if the time needs to be adjusted. I really think a tickless system, for other than UML systems, is a loosing thing. The accounting overhead on context switch (which increases as the number of switchs per second) will cause more overhead than a periodic accounting tick once a respectable load appears. The periodic accounting tick has a flat overhead that does not depend on load. > > But you are right there is the need for timer event scheduling that is > not included yet. This should be a method of the time source. > I am not sure that is the right thing to do here. For example, on SMP systems today we have a timer event interrupt per cpu. This is much more scaleable and not so easy to do if we tie it to the time source. All we need is a reasonably accurate short term counter. -- George Anzinger george@mvista.com High-res-timers: http://sourceforge.net/projects/high-res-timers/ Preemption patch: http://www.kernel.org/pub/linux/kernel/people/rml ^ permalink raw reply [flat|nested] 84+ messages in thread
* Re: [RFC][PATCH] new timeofday core subsystem (v.A0) 2004-09-15 8:04 ` George Anzinger @ 2004-09-15 8:54 ` Dominik Brodowski 2004-09-15 17:54 ` George Anzinger 2004-09-15 9:12 ` Andi Kleen 2004-09-15 15:46 ` Christoph Lameter 2 siblings, 1 reply; 84+ messages in thread From: Dominik Brodowski @ 2004-09-15 8:54 UTC (permalink / raw) To: George Anzinger Cc: Christoph Lameter, john stultz, Albert Cahalan, lkml, tim, Ulrich.Windl, Len Brown, David Mosberger, Andi Kleen, paulus, schwidefsky, jimix, keith maanthey, greg kh, Patricia Gaughen, Chris McDermott On Wed, Sep 15, 2004 at 01:04:04AM -0700, George Anzinger wrote: > >One could do this but we want to have a tickless system. The tick is only > >necessary if the time needs to be adjusted. > > I really think a tickless system, for other than UML systems, is a loosing > thing. The accounting overhead on context switch (which increases as the > number of switchs per second) will cause more overhead than a periodic > accounting tick once a respectable load appears. ^^^^^^^^^^^^^^^^ On a largely idle system (like notebooks on battery power in typical use) the accounting overhead will be less a problem. However, the CPU being woken up each millisecond will cause an increased battery usage. So if the load is less than a certain threshold, tickless systems do make much sense. Dominik ^ permalink raw reply [flat|nested] 84+ messages in thread
* Re: [RFC][PATCH] new timeofday core subsystem (v.A0) 2004-09-15 8:54 ` Dominik Brodowski @ 2004-09-15 17:54 ` George Anzinger 0 siblings, 0 replies; 84+ messages in thread From: George Anzinger @ 2004-09-15 17:54 UTC (permalink / raw) To: Dominik Brodowski Cc: Christoph Lameter, john stultz, Albert Cahalan, lkml, tim, Ulrich.Windl, Len Brown, David Mosberger, Andi Kleen, paulus, schwidefsky, jimix, keith maanthey, greg kh, Patricia Gaughen, Chris McDermott Dominik Brodowski wrote: > On Wed, Sep 15, 2004 at 01:04:04AM -0700, George Anzinger wrote: > >>>One could do this but we want to have a tickless system. The tick is only >>>necessary if the time needs to be adjusted. >> >>I really think a tickless system, for other than UML systems, is a loosing >>thing. The accounting overhead on context switch (which increases as the >>number of switchs per second) will cause more overhead than a periodic >>accounting tick once a respectable load appears. > > ^^^^^^^^^^^^^^^^ > > On a largely idle system (like notebooks on battery power in typical use) > the accounting overhead will be less a problem. However, the CPU being > woken up each millisecond will cause an increased battery usage. So if > the load is less than a certain threshold, tickless systems do make much > sense. At MontaVista I have been working on a thing we call VST which looks ahead in the timer list and, finding nothing for N ticks, turns off the ticker until that time. It is not tickless, unless the system is idle, but then it can go tickless for as long as the max value that can be programmed on the wakeup timer. Interrupts prior to that time will, of course, also wake the system. Seems like the best of both worlds to me. An early version of this is on the HRT sourceforge site. -- George Anzinger george@mvista.com High-res-timers: http://sourceforge.net/projects/high-res-timers/ Preemption patch: http://www.kernel.org/pub/linux/kernel/people/rml ^ permalink raw reply [flat|nested] 84+ messages in thread
* Re: [RFC][PATCH] new timeofday core subsystem (v.A0) 2004-09-15 8:04 ` George Anzinger 2004-09-15 8:54 ` Dominik Brodowski @ 2004-09-15 9:12 ` Andi Kleen 2004-09-15 15:46 ` Christoph Lameter 2 siblings, 0 replies; 84+ messages in thread From: Andi Kleen @ 2004-09-15 9:12 UTC (permalink / raw) To: George Anzinger Cc: Christoph Lameter, john stultz, Albert Cahalan, lkml, tim, Ulrich.Windl, Len Brown, linux, David Mosberger, Andi Kleen, paulus, schwidefsky, jimix, keith maanthey, greg kh, Patricia Gaughen, Chris McDermott > I really think a tickless system, for other than UML systems, is a loosing > thing. The accounting overhead on context switch (which increases as the You can run the accounting independently at much lower frequency (10ms is perfectly fine as 2.4 has proven - i suspect even lower would be ok too) IMHO it should be a sysctl so that users can do a trade off between power consumption and accounting accuracy. And there is one important special that doesn't need any accounting for user space at all: the idle loop. It still needs some way to account interrupts, but that could be done in do_IRQ or also do with rather low frequency. -Andi ^ permalink raw reply [flat|nested] 84+ messages in thread
* Re: [RFC][PATCH] new timeofday core subsystem (v.A0) 2004-09-15 8:04 ` George Anzinger 2004-09-15 8:54 ` Dominik Brodowski 2004-09-15 9:12 ` Andi Kleen @ 2004-09-15 15:46 ` Christoph Lameter 2004-09-15 18:00 ` George Anzinger 2 siblings, 1 reply; 84+ messages in thread From: Christoph Lameter @ 2004-09-15 15:46 UTC (permalink / raw) To: George Anzinger Cc: john stultz, Albert Cahalan, lkml, tim, Ulrich.Windl, Len Brown, linux, David Mosberger, Andi Kleen, paulus, schwidefsky, jimix, keith maanthey, greg kh, Patricia Gaughen, Chris McDermott On Wed, 15 Sep 2004, George Anzinger wrote: > Lets assume the pm counter which has 24 bits. This means your shift is 40 bits. > In "s->multiply = (NSEC_PER_SEC << s->shift) / freq;" you will have an overflow. > Here you need to keep (NSEC_PER_SEC << s->shift) in 64 bits AND multiply must > also be 32 bits or less. I really don't think you can choose the scale so easily. Thanks for pointing that out. Will fix that. I sure wish one could have a 128 bit intermediate result. > > Nope. time_source_last is the global. l is just a copy of > > time_source_last. > > Right, I miss read the function. cycles() should be now() if I am reading this > right. Sortof. now() is the time in nanoseconds whereas cycles() is a counter value of the counter in the cpu. > > One could do this but we want to have a tickless system. The tick is only > > necessary if the time needs to be adjusted. > > I really think a tickless system, for other than UML systems, is a loosing > thing. The accounting overhead on context switch (which increases as the number > of switchs per second) will cause more overhead than a periodic accounting tick > once a respectable load appears. The periodic accounting tick has a flat > overhead that does not depend on load. I am not following you here. Why does the context switch overhead increase? Because there are multiple interrupts for different tasks done in the tick? ^ permalink raw reply [flat|nested] 84+ messages in thread
* Re: [RFC][PATCH] new timeofday core subsystem (v.A0) 2004-09-15 15:46 ` Christoph Lameter @ 2004-09-15 18:00 ` George Anzinger 2004-09-15 18:28 ` Christoph Lameter 0 siblings, 1 reply; 84+ messages in thread From: George Anzinger @ 2004-09-15 18:00 UTC (permalink / raw) To: Christoph Lameter Cc: john stultz, Albert Cahalan, lkml, tim, Ulrich.Windl, Len Brown, linux, David Mosberger, Andi Kleen, paulus, schwidefsky, jimix, keith maanthey, greg kh, Patricia Gaughen, Chris McDermott Christoph Lameter wrote: ~ >>>One could do this but we want to have a tickless system. The tick is only >>>necessary if the time needs to be adjusted. >> >>I really think a tickless system, for other than UML systems, is a loosing >>thing. The accounting overhead on context switch (which increases as the number >>of switchs per second) will cause more overhead than a periodic accounting tick >>once a respectable load appears. The periodic accounting tick has a flat >>overhead that does not depend on load. > > > I am not following you here. Why does the context switch overhead > increase? Because there are multiple interrupts for different tasks done > in the tick? > Each task has several timers, i.e. time slice, time limit, and possibly itimer profile. Granted only one of these needs to be sent to the timer code, but that takes a bit of time, not much, but enough to increase the context switch overhead such that a system with a modest amount of context switching will incur more timer management overhead than the periodic tick generates. -- George Anzinger george@mvista.com High-res-timers: http://sourceforge.net/projects/high-res-timers/ Preemption patch: http://www.kernel.org/pub/linux/kernel/people/rml ^ permalink raw reply [flat|nested] 84+ messages in thread
* Re: [RFC][PATCH] new timeofday core subsystem (v.A0) 2004-09-15 18:00 ` George Anzinger @ 2004-09-15 18:28 ` Christoph Lameter 0 siblings, 0 replies; 84+ messages in thread From: Christoph Lameter @ 2004-09-15 18:28 UTC (permalink / raw) To: George Anzinger Cc: john stultz, Albert Cahalan, lkml, tim, Ulrich.Windl, Len Brown, linux, David Mosberger, Andi Kleen, paulus, schwidefsky, jimix, keith maanthey, greg kh, Patricia Gaughen, Chris McDermott On Wed, 15 Sep 2004, George Anzinger wrote: > > I am not following you here. Why does the context switch overhead > > increase? Because there are multiple interrupts for different tasks done > > in the tick? > > > Each task has several timers, i.e. time slice, time limit, and possibly itimer > profile. Granted only one of these needs to be sent to the timer code, but that > takes a bit of time, not much, but enough to increase the context switch > overhead such that a system with a modest amount of context switching will incur > more timer management overhead than the periodic tick generates. I thought that the timer handling could be separate from the time slice handling of the scheduler. With the ability schedule events as needed the scheduler could generate its own independent timing from the necessary clock adjustments. The tick would be disassembled into its various components. ^ permalink raw reply [flat|nested] 84+ messages in thread
* Re: [RFC][PATCH] new timeofday core subsystem (v.A0) 2004-09-15 0:57 ` George Anzinger 2004-09-15 3:32 ` Christoph Lameter @ 2004-09-15 6:46 ` Christoph Lameter 2004-09-15 16:32 ` john stultz 1 sibling, 1 reply; 84+ messages in thread From: Christoph Lameter @ 2004-09-15 6:46 UTC (permalink / raw) To: George Anzinger Cc: john stultz, Albert Cahalan, lkml, tim, Ulrich.Windl, Len Brown, linux, David Mosberger, Andi Kleen, paulus, schwidefsky, jimix, keith maanthey, greg kh, Patricia Gaughen, Chris McDermott V3 of the concept for a time subsystem. * add a function to the time_source_t to be able to trigger the timer interrupt at a certain counter value instead of at regular intervals * function to map back from nanoseconds to counter units * add primitive scheduler that wakes up for timer events and figures out when to schedule the next event. * Some info at the end on how maybe have the NTP code interact with this stuff. The deferral of the time_base updates actually is beneficial since it allows a more accurate scaling of the counter and thus does not loose remaining nanoseconds as often. /**************************************************************** * A concept for a new timer module * V3 * * Christoph Lameter, September 14, 2003 ***************************************************************/ struct time_source_t { u8 type; u8 shift; u32 multiply; void *address; u64 mask; int (*interrupt_at)(u64 counter_value); }; #define TIME_SOURCE_CPU 0 #define TIME_SOURCE_MMIO32 1 #define TIME_SOURCE_MMIO64 2 #define TIME_SOURCE_FUNCTION 15 struct seqlock_t time_lock; /* Protects access to time_base time_source time_mono_offset and time_source_at_base */ u64 time_base; u64 time_mono_offset; struct time_source_t *time_source; u64 time_source_at_base; /* time source value at time time_base */ u64 inline time_source_get(struct time_source_t *s) { switch (s->type) { case TIME_SOURCE_CPU : return cycles(); case TIME_SOURCE_MMIO32 : return readl(s->address); case TIME_SOURCE_MMIO64 : return readq(s->address); case TIME_SOURCE_FUNCTION: { u64 (*x)(void); x = s->address; return x(); } } /* Convert from time source units to ns */ u64 time_source_to_ns(u64 x) { return ((x & time_source->mask)*time_source->multiply) >> time_source->shift; } /* Convert from ns time to time source units */ u64 ns_to_time_source(u64 x) { return (x << time_source->shift ) / time_source->multiply; } inline u64 now(void) { u64 t; unsigned long seq; do { seq = read_seqbegin(&time_lock); t = time_base + time_source_to_ns(time_source_get(time_source) - time_source_at_base); } while (unlikely(seq_retry(&time_lock,seq))); return t; } #define now_mono() (now()+time_mono_offset) /* Time adjustment (only possible forward, to go backwards adjust time_source->multiply and wait ...) */ void time_adjust_skip(u64 ns) { u64 c; write_seqlock(&time_lock); c = time_source_get(time_source); time_base += ns + time_source_to_ns(c - time_source_at_base); time_source_at_base = c; write_sequnlock(&time_lock); } void time_adjust_slower(void) { u64 c; write_seqlock(&time_lock); c = time_source_get(time_source); time_base += time_source_to_ns(c - time_source_at_base); time_source_at_base = c; time_source->multiply--; write_sequnlock(&time_lock); } void time_adjust_faster(void) { u64 c; write_seqlock(&time_lock); c = time_source_get(time_source); time_base += time_source_to_ns(c - time_source_at_base); time_source_at_base = c; time_source->multiply++; write_sequnlock(&time_unlock); } /* Switch to another time source */ void new_time_source(struct time_source_t *s) { u64 c_old; u64 c_new; write_seqlock(&time_lock); c_old = time_source_get(time_source); c_new = time_source_get(s); time_base += time_source_to_ns(c_old - time_source_at_base); time_source_at_base = c_new; time_source = s; write_sequnlock(&time_lock); } struct time_source_t *make_time_source(u64 freq, int t,void *a, int bits, void *i) { struct time_source_t *s = kmalloc(sizeof(struct time_source_t)); s->shift = 64 - bits; s->multiply = (NSEC_PER_SEC << s->shift) / freq; s->address = a; s->type = t; s->mask = 1 << bits -1; s->interrupt_at = i; } void time_set(u64 ns) { u64 c; write_seqlock(&time_lock); c = time_source_get(time_source); /* Adjust monotonic time base */ time_mono_offset += time_base + time_source_to_ns(c - time_source_at_base) - ns; /* Setup new time base */ time_base = ns; time_source_at_base = c; write_sequnlock(&time_lock); } void time_init(struct time_source_t *s) { { write_seqlock(&time_lock); time_base = 0; time_mono_offset = 0; time_source_at_base = time_source_get(s); time_source = s; write_sequnlock(&time_lock); } /* Values in use in the kernel and how they may be derived from xtime */ #define jiffies (now() * HZ /1000000000) #define seconds(x) ((x)/1000000000) #define nanoseconds_within_second(x) ((x)%1000000000) #define microseconds_within_second(x) (nanoseconds_within_second(x) / 1000) u32 time(void) { return seconds(now()); } void gettimeofday(struct timeval *p) { u64 t=now(); p->tv_sec = seconds(t); p->tv_usec = microseconds_within_second(t); } void clock_gettime(int clock, struct timespec *p) { u64 t=now(); p->tv_sec = seconds(t); p->tv_nsec = nanoseconds_within_second(t); } /* Exampe of a CPU time time source */ int itm_setup(u64 counter) { set_itm(counter); return 1; /* Success */ } make_time_source(1200000000, TIME_SOURCE_CPU, NULL, 44,itm_setup); /* A memory based time source at 20Mhz 55 bits wide, no ability to fire an interrupt */ make_time_source(20000000, TIME_SOURCE_MMIO64, &my_timer_address, 55, NULL); /* Example of handling a difficult time source. In SMP systems the CPU time sources are notoriously difficult to * sync. If we have such a problem then insure at least sure that the time source never goes backward. */ u64 time_source_last; u64 get_cpu_time_filtered() { u64 x; u64 l; do { l = time_source_last; x = cycles(); if (x<l) return l; /* the cmpxchg is going to hurt in terms of scalability ! */ } while (cmpxchg(&time_source_last, l, x) != l); return x; } Generate the time_source with make_time_source(1200000000, TIME_SOURCE_FUNCTION, cpu_time_filtered, 44, itm_setup); /* event scheduler to be able to run events at specific times */ struct event { u64 when; u64 param; void (*run)(u64); struct event next; } struct event *event_queue; void event_run(void) { u64 t; spin_lock(event_handler); t = now(); redo: /* Run all events scheduled before now */ while (event_queue && event_queue->when <= t) { struct event *e=event_queue; e->run(e->param); event_queue = e->next; kfree(e); } if (event_queue = NULL) { spin_unlock(event_handler); return; } t = now(); d = t - event_queue->when; if (d<= 0) goto retry; time_source->interrupt_at(time_source_at_base + ns_to_time_source(event_queue->when - time_base)); spin_unlock(event_handler); } void event_new(u64 w,void (*r)(u64), u64 p) { struct event *e=kmalloc(sizeof(struct event)); e->when =w; e->param = p; e->run = r; spin_lock(event_handler); if (event_queue) { /* Insert time event at the appropriate time */ struct event *p = event_queue; while (p->next && p->next->w < w) p = p->next; e->next = p->next; p->next = e; } else event_queue=e; spin_unlock(event_handler); event_run(); } void scheduled_timer_interrupt(void) { event_run(); } /* simulation of the old tick behavior */ tick(u64 when) { /* do tick stuff */ /* time base update .... */ time_source_adjust(0); /* Schedule next timer tick */ event_new(when + NSEC_PER_SEC / HZ , tick, when + NSEC_PER_SEC / HZ); } /* New tick would be scheduled by the ntp logic when a correction is needed. * ntp logic needs to decide when to skip a few nanosecond or slow down the clock or * make the clock run faster. * One way to do this is to accumulate a time difference to real time. * if this time difference is small and positive then skip time forward a bit. * if the time difference is negative then slow down the clock. * if the time difference is way too high then accelerate the clock */ ^ permalink raw reply [flat|nested] 84+ messages in thread
* Re: [RFC][PATCH] new timeofday core subsystem (v.A0) 2004-09-15 6:46 ` Christoph Lameter @ 2004-09-15 16:32 ` john stultz 2004-09-15 16:46 ` Christoph Lameter 0 siblings, 1 reply; 84+ messages in thread From: john stultz @ 2004-09-15 16:32 UTC (permalink / raw) To: Christoph Lameter Cc: george anzinger, Albert Cahalan, lkml, tim, Ulrich.Windl, Len Brown, linux, David Mosberger, Andi Kleen, paulus, schwidefsky, jimix, keith maanthey, greg kh, Patricia Gaughen, Chris McDermott On Tue, 2004-09-14 at 23:46, Christoph Lameter wrote: > V3 of the concept for a time subsystem. > * add a function to the time_source_t to be able to trigger the timer > interrupt at a certain counter value instead of at regular intervals > * function to map back from nanoseconds to counter units > * add primitive scheduler that wakes up for timer events and > figures out when to schedule the next event. > * Some info at the end on how maybe have the NTP code interact with > this stuff. > > The deferral of the time_base updates actually is beneficial since it > allows a more accurate scaling of the counter and thus does not loose > remaining nanoseconds as often. > > /**************************************************************** > * A concept for a new timer module > * V3 > * > * Christoph Lameter, September 14, 2003 > ***************************************************************/ > > struct time_source_t { > u8 type; > u8 shift; > u32 multiply; > void *address; > u64 mask; > int (*interrupt_at)(u64 counter_value); > }; Could you explain the interrupt_at function further? > /* Exampe of a CPU time time source */ > int itm_setup(u64 counter) { > set_itm(counter); > return 1; /* Success */ > } > > make_time_source(1200000000, TIME_SOURCE_CPU, NULL, 44,itm_setup); > > /* A memory based time source at 20Mhz 55 bits wide, no ability to fire an interrupt */ > > make_time_source(20000000, TIME_SOURCE_MMIO64, &my_timer_address, 55, NULL); > > /* Example of handling a difficult time source. In SMP systems the CPU time sources are notoriously difficult to > * sync. If we have such a problem then insure at least sure that the time source never goes backward. > */ > > u64 time_source_last; > > u64 get_cpu_time_filtered() { > u64 x; > u64 l; > > do { > l = time_source_last; > x = cycles(); > if (x<l) return l; > /* the cmpxchg is going to hurt in terms of scalability ! */ > } while (cmpxchg(&time_source_last, l, x) != l); > return x; > } > > Generate the time_source with > > make_time_source(1200000000, TIME_SOURCE_FUNCTION, cpu_time_filtered, 44, itm_setup); Out of curiosity, what happens in your fsyscall code in these cases where TIME_SOURCE_FUNCTION is used? > /* event scheduler to be able to run events at specific times */ > > struct event { > u64 when; > u64 param; > void (*run)(u64); > struct event next; > } > > struct event *event_queue; > > void event_run(void) { > u64 t; > > spin_lock(event_handler); > t = now(); > redo: > /* Run all events scheduled before now */ > while (event_queue && event_queue->when <= t) { > struct event *e=event_queue; > > e->run(e->param); > event_queue = e->next; > kfree(e); > } > if (event_queue = NULL) { > spin_unlock(event_handler); > return; > } > t = now(); > d = t - event_queue->when; > if (d<= 0) goto retry; > > time_source->interrupt_at(time_source_at_base + ns_to_time_source(event_queue->when - time_base)); > spin_unlock(event_handler); > } > > void event_new(u64 w,void (*r)(u64), u64 p) { > struct event *e=kmalloc(sizeof(struct event)); > > e->when =w; > e->param = p; > e->run = r; > spin_lock(event_handler); > if (event_queue) { > /* Insert time event at the appropriate time */ > struct event *p = event_queue; > while (p->next && p->next->w < w) p = p->next; > e->next = p->next; > p->next = e; > } else > event_queue=e; > spin_unlock(event_handler); > event_run(); > } > > void scheduled_timer_interrupt(void) { > event_run(); > } > > /* simulation of the old tick behavior */ > tick(u64 when) { > /* do tick stuff */ > > /* time base update .... */ > time_source_adjust(0); > /* Schedule next timer tick */ > event_new(when + NSEC_PER_SEC / HZ , tick, when + NSEC_PER_SEC / HZ); > } Hmm. This is getting somewhat tangled in my mind. Who is calling tick() originally? How is event_run being called? > /* New tick would be scheduled by the ntp logic when a correction is needed. > * ntp logic needs to decide when to skip a few nanosecond or slow down the clock or > * make the clock run faster. > * One way to do this is to accumulate a time difference to real time. > * if this time difference is small and positive then skip time forward a bit. > * if the time difference is negative then slow down the clock. > * if the time difference is way too high then accelerate the clock > */ Well, this is still a bit vague. do_adjtimex gives us 4 values we need to use in adjusting time. The parts-per-million (ppm - all of which are signed) tick adjustment value, the ppm frequency adjustment, the ppm offset adjustment, and the singleshot offset adjustment length (# of nsecs in which we apply the maximum ppm singleshot offset adjustment). How do these 4 values get translated to adjusting the time source? You might want to just swipe my timeofday-core patch and re-work the timesource.h interface to your liking (make it like time_source_t). That way you get all the NTP details as well as the interrupt separation for free. ^ permalink raw reply [flat|nested] 84+ messages in thread
* Re: [RFC][PATCH] new timeofday core subsystem (v.A0) 2004-09-15 16:32 ` john stultz @ 2004-09-15 16:46 ` Christoph Lameter 2004-09-15 17:13 ` john stultz 0 siblings, 1 reply; 84+ messages in thread From: Christoph Lameter @ 2004-09-15 16:46 UTC (permalink / raw) To: john stultz Cc: george anzinger, Albert Cahalan, lkml, tim, Ulrich.Windl, Len Brown, linux, David Mosberger, Andi Kleen, paulus, schwidefsky, jimix, keith maanthey, greg kh, Patricia Gaughen, Chris McDermott On Wed, 15 Sep 2004, john stultz wrote: > > struct time_source_t { > > u8 type; > > u8 shift; > > u32 multiply; > > void *address; > > u64 mask; > > int (*interrupt_at)(u64 counter_value); > > }; > > Could you explain the interrupt_at function further? Generates a timer interrupt when a certain counter value has been reached. This is a common feature of most clock chips that I am aware of. > > make_time_source(1200000000, TIME_SOURCE_FUNCTION, cpu_time_filtered, 44, itm_setup); > > Out of curiosity, what happens in your fsyscall code in these cases > where TIME_SOURCE_FUNCTION is used? It backs out of the fast call track and does a regular call instead. > > /* simulation of the old tick behavior */ > > tick(u64 when) { > > /* do tick stuff */ > > > > /* time base update .... */ > > time_source_adjust(0); > > /* Schedule next timer tick */ > > event_new(when + NSEC_PER_SEC / HZ , tick, when + NSEC_PER_SEC / HZ); > > } > > Hmm. This is getting somewhat tangled in my mind. Who is calling tick() > originally? How is event_run being called? This is just some toying around with the concept. Maybe tick could be called by the regular timer interrupt? > > > /* New tick would be scheduled by the ntp logic when a correction is needed. > > * ntp logic needs to decide when to skip a few nanosecond or slow down the clock or > > * make the clock run faster. > > * One way to do this is to accumulate a time difference to real time. > > * if this time difference is small and positive then skip time forward a bit. > > * if the time difference is negative then slow down the clock. > > * if the time difference is way too high then accelerate the clock > > */ > > Well, this is still a bit vague. do_adjtimex gives us 4 values we need > to use in adjusting time. The parts-per-million (ppm - all of which are > signed) tick adjustment value, the ppm frequency adjustment, the ppm > offset adjustment, and the singleshot offset adjustment length (# of > nsecs in which we apply the maximum ppm singleshot offset adjustment). > > How do these 4 values get translated to adjusting the time source? I agree this is vague and I have no clue here. I hoped that you could come up with some may to use these functions for adjtimex etc? I would need some time to figure out how the adjusting works in order to come up with a solution. I thought we agreed you do the NTP and time adjustment things ;-) I have never touched that code.... > You might want to just swipe my timeofday-core patch and re-work the > timesource.h interface to your liking (make it like time_source_t). That > way you get all the NTP details as well as the interrupt separation for > free. Will have a look at it and put it together when I have some time. ^ permalink raw reply [flat|nested] 84+ messages in thread
* Re: [RFC][PATCH] new timeofday core subsystem (v.A0) 2004-09-15 16:46 ` Christoph Lameter @ 2004-09-15 17:13 ` john stultz 2004-09-15 17:30 ` Christoph Lameter 0 siblings, 1 reply; 84+ messages in thread From: john stultz @ 2004-09-15 17:13 UTC (permalink / raw) To: Christoph Lameter Cc: george anzinger, Albert Cahalan, lkml, tim, Ulrich.Windl, Len Brown, linux, David Mosberger, Andi Kleen, paulus, schwidefsky, jimix, keith maanthey, greg kh, Patricia Gaughen, Chris McDermott On Wed, 2004-09-15 at 09:46, Christoph Lameter wrote: > On Wed, 15 Sep 2004, john stultz wrote: > > > > struct time_source_t { > > > u8 type; > > > u8 shift; > > > u32 multiply; > > > void *address; > > > u64 mask; > > > int (*interrupt_at)(u64 counter_value); > > > }; > > > > Could you explain the interrupt_at function further? > > Generates a timer interrupt when a certain counter value has been reached. > This is a common feature of most clock chips that I am aware of. Well, not all time sources have that feature. Both TSC, and cyclone don't. You'd have to do something else for those. This is why my proposal has absolutely nothing to do with interrupt generation. It has a single hook that needs to be called only every once in a while, which can be done however any architecture wants. Interrupt generation has more to do with soft-timers and scheduling then time of day anyway. > > > make_time_source(1200000000, TIME_SOURCE_FUNCTION, cpu_time_filtered, 44, itm_setup); > > > > Out of curiosity, what happens in your fsyscall code in these cases > > where TIME_SOURCE_FUNCTION is used? > > It backs out of the fast call track and does a regular call instead. Ok, just curious. > > > /* simulation of the old tick behavior */ > > > tick(u64 when) { > > > /* do tick stuff */ > > > > > > /* time base update .... */ > > > time_source_adjust(0); > > > /* Schedule next timer tick */ > > > event_new(when + NSEC_PER_SEC / HZ , tick, when + NSEC_PER_SEC / HZ); > > > } > > > > Hmm. This is getting somewhat tangled in my mind. Who is calling tick() > > originally? How is event_run being called? > > This is just some toying around with the concept. Maybe tick could be > called by the regular timer interrupt? Ok, no worries. Its just one of the big reasons I started my earlier proposal is to lessen the dependencies and cleanup the interactions between the interrupt generation, soft-timer and the time of day code. > > > /* New tick would be scheduled by the ntp logic when a correction is needed. > > > * ntp logic needs to decide when to skip a few nanosecond or slow down the clock or > > > * make the clock run faster. > > > * One way to do this is to accumulate a time difference to real time. > > > * if this time difference is small and positive then skip time forward a bit. > > > * if the time difference is negative then slow down the clock. > > > * if the time difference is way too high then accelerate the clock > > > */ > > > > Well, this is still a bit vague. do_adjtimex gives us 4 values we need > > to use in adjusting time. The parts-per-million (ppm - all of which are > > signed) tick adjustment value, the ppm frequency adjustment, the ppm > > offset adjustment, and the singleshot offset adjustment length (# of > > nsecs in which we apply the maximum ppm singleshot offset adjustment). > > > > How do these 4 values get translated to adjusting the time source? > > I agree this is vague and I have no clue here. I hoped that you > could come up with some may to use these functions for adjtimex etc? I > would need some time to figure out how the adjusting works in order to > come up with a solution. I thought we agreed you do the NTP and time > adjustment things ;-) I have never touched that code.... Heh. Well, I don't think anyone wants to do the NTP work, but I did have to go through it to implement my proposal. Unfortunately it isn't something that can easily be tacked on to any design, so you're going to have to understand some aspects of it if you want to do your own design. That said, if you have any questions, hopefully I can explain my interpretation of the existing code as well as my own implementation. > > You might want to just swipe my timeofday-core patch and re-work the > > timesource.h interface to your liking (make it like time_source_t). That > > way you get all the NTP details as well as the interrupt separation for > > free. > > Will have a look at it and put it together when I have some time. Sure thing, and I'll try to take a stab at it myself after next week. thanks -john ^ permalink raw reply [flat|nested] 84+ messages in thread
* Re: [RFC][PATCH] new timeofday core subsystem (v.A0) 2004-09-15 17:13 ` john stultz @ 2004-09-15 17:30 ` Christoph Lameter 2004-09-15 18:48 ` john stultz 0 siblings, 1 reply; 84+ messages in thread From: Christoph Lameter @ 2004-09-15 17:30 UTC (permalink / raw) To: john stultz Cc: george anzinger, Albert Cahalan, lkml, tim, Ulrich.Windl, Len Brown, linux, David Mosberger, Andi Kleen, paulus, schwidefsky, jimix, keith maanthey, greg kh, Patricia Gaughen, Chris McDermott On Wed, 15 Sep 2004, john stultz wrote: > On Wed, 2004-09-15 at 09:46, Christoph Lameter wrote: > > On Wed, 15 Sep 2004, john stultz wrote: > > > > > > struct time_source_t { > > > > u8 type; > > > > u8 shift; > > > > u32 multiply; > > > > void *address; > > > > u64 mask; > > > > int (*interrupt_at)(u64 counter_value); > > > > }; > > > > > > Could you explain the interrupt_at function further? > > > > Generates a timer interrupt when a certain counter value has been reached. > > This is a common feature of most clock chips that I am aware of. > > Well, not all time sources have that feature. Both TSC, and cyclone > don't. You'd have to do something else for those. This is why my > proposal has absolutely nothing to do with interrupt generation. It has > a single hook that needs to be called only every once in a while, which > can be done however any architecture wants. > > Interrupt generation has more to do with soft-timers and scheduling then > time of day anyway. Hmm... I think this is a serious issue. The ability to exactly time an interrupt and therefore specific actions is important. Maybe we can have a fall back to soft interrupts if interrupt_at == NULL? If the scheduler could assign dynamic time slices to processes then new more effective designs of process scheduling become possible. F.e. on an SMP system if a single process is running and no other contention exists then the scheduler can simply let the process run without interruption. On the other hand the system may reduce the time slices assigned to a process that causes significant load on the system in order to insure the responsiveness of other applications despite the load. If one can schedule a tick dynamically then also NTP time correction can be "scheduled" when it is likely that the clock has sufficiently deviated from standard time. As soon as the time source has been sufficiently tamed by scaling it correctly then the periods of checkup on the time source could be gradually expanded. Real time features such as posix-timer's also depend on the ability to deliver a signal at an exact point in time. Soft timers can only give a very rough approximation in these cases. So I think this feature is essential. ^ permalink raw reply [flat|nested] 84+ messages in thread
* Re: [RFC][PATCH] new timeofday core subsystem (v.A0) 2004-09-15 17:30 ` Christoph Lameter @ 2004-09-15 18:48 ` john stultz 2004-09-15 19:58 ` George Anzinger ` (2 more replies) 0 siblings, 3 replies; 84+ messages in thread From: john stultz @ 2004-09-15 18:48 UTC (permalink / raw) To: Christoph Lameter Cc: george anzinger, Albert Cahalan, lkml, tim, Ulrich.Windl, Len Brown, linux, David Mosberger, Andi Kleen, paulus, schwidefsky, jimix, keith maanthey, greg kh, Patricia Gaughen, Chris McDermott On Wed, 2004-09-15 at 10:30, Christoph Lameter wrote: > On Wed, 15 Sep 2004, john stultz wrote: > > > Well, not all time sources have that feature. Both TSC, and cyclone > > don't. You'd have to do something else for those. This is why my > > proposal has absolutely nothing to do with interrupt generation. It has > > a single hook that needs to be called only every once in a while, which > > can be done however any architecture wants. > > > > Interrupt generation has more to do with soft-timers and scheduling then > > time of day anyway. > > Hmm... I think this is a serious issue. The ability to exactly time an > interrupt and therefore specific actions is important. Maybe we can have a > fall back to soft interrupts if interrupt_at == NULL? Oh, its an issue, but one the time of day subsystem shouldn't solve. The code that manages interval (be it timer or whatever) interrupts should use the time of day subsystem to ensure they are triggering accurately, and manipulate the intervals as needed. > If the scheduler could assign dynamic time slices to processes then new > more effective designs of process scheduling become possible. F.e. on an > SMP system if a single process is running and no other contention exists > then the scheduler can simply let the process run without interruption. > > On the other hand the system may reduce the time slices assigned to a > process that causes significant load on the system in order to insure the > responsiveness of other applications despite the load. Yep, all of this is outside what the time of day subsystem should do. Interval interrupt scheduling should be done by other code (I suggest the timer code do it). > If one can schedule a tick dynamically then also NTP time correction can > be "scheduled" when it is likely that the clock has sufficiently deviated > from standard time. As soon as the time source has been sufficiently tamed > by scaling it correctly then the periods of checkup on the time source > could be gradually expanded. NTP adjustments need to be applied smoothly and consistently as time progresses. Adjustment to the NTP parameters occur at do_adjtimex and at interval boundaries (ie: the interrupt_hook in my code, or look at update_wall_time and second_overflow in the existing code). So in my understanding its not really a schedulable function. > Real time features such as posix-timer's also depend on the ability to > deliver a signal at an exact point in time. Soft timers can only give a > very rough approximation in these cases. > > So I think this feature is essential. I think the functionality is essential, but that it doesn't belong in the time of day code. Basically we have two things we're trying to do: 1. Keep accurate time 2. Generate hardware interrupts accurately While frequently the same hardware can do both, not all hardware is usable for both functions. Thus I believe we should cleanly split these two subsystems. My proposal only provided the keep accurate time part, however one could using that functionality, to then manipulate hardware interrupts to ensure accuracy in the timer subsystem. thanks -john ^ permalink raw reply [flat|nested] 84+ messages in thread
* Re: [RFC][PATCH] new timeofday core subsystem (v.A0) 2004-09-15 18:48 ` john stultz @ 2004-09-15 19:58 ` George Anzinger 2004-09-15 20:20 ` Christoph Lameter 2004-09-16 7:02 ` Ulrich Windl 2 siblings, 0 replies; 84+ messages in thread From: George Anzinger @ 2004-09-15 19:58 UTC (permalink / raw) To: john stultz Cc: Christoph Lameter, Albert Cahalan, lkml, tim, Ulrich.Windl, Len Brown, linux, David Mosberger, Andi Kleen, paulus, schwidefsky, jimix, keith maanthey, greg kh, Patricia Gaughen, Chris McDermott john stultz wrote: ~ > > >>Real time features such as posix-timer's also depend on the ability to >>deliver a signal at an exact point in time. Soft timers can only give a >>very rough approximation in these cases. >> >>So I think this feature is essential. > > > I think the functionality is essential, but that it doesn't belong in the time of day code. > > Basically we have two things we're trying to do: > > 1. Keep accurate time > 2. Generate hardware interrupts accurately > > While frequently the same hardware can do both, not all hardware is > usable for both functions. Thus I believe we should cleanly split these > two subsystems. My proposal only provided the keep accurate time part, > however one could using that functionality, to then manipulate hardware > interrupts to ensure accuracy in the timer subsystem. > The thing I think is missing in all this is that, in some platforms, the hardware to provide the interrupts is more accurate. We have, IMHO, three cases here: a) The interrupts are accurate but the clock info (e.g. TSC) is not. b) The clock in accurate but the timer is not, and c) The clock and interrupt come from the same accurate hardware source. The X86 is in class a) in that the PIT is accurate and the TSC is not. The muddy part here is the pm-timer which is accurate but takes a _long_ time to access. PPCs are in class c) as are some MIPS, ARM, and PARISC. I am not sure about the rest and can not lay my hands on one of class b). For the class a) machines, I think the best approach is to use a clock that has reasonable short term accuracy and to use the timer to, from time to time, correct it. This should be on the order of an internal NTP sort of correction. For the class a) and b) machines, I think it would be wise to not seperated the timer and the clock so much as to make it "hard" to use one to correct the other. -- George Anzinger george@mvista.com High-res-timers: http://sourceforge.net/projects/high-res-timers/ Preemption patch: http://www.kernel.org/pub/linux/kernel/people/rml ^ permalink raw reply [flat|nested] 84+ messages in thread
* Re: [RFC][PATCH] new timeofday core subsystem (v.A0) 2004-09-15 18:48 ` john stultz 2004-09-15 19:58 ` George Anzinger @ 2004-09-15 20:20 ` Christoph Lameter 2004-09-16 7:02 ` Ulrich Windl 2 siblings, 0 replies; 84+ messages in thread From: Christoph Lameter @ 2004-09-15 20:20 UTC (permalink / raw) To: john stultz Cc: george anzinger, Albert Cahalan, lkml, tim, Ulrich.Windl, Len Brown, linux, David Mosberger, Andi Kleen, paulus, schwidefsky, jimix, keith maanthey, greg kh, Patricia Gaughen, Chris McDermott On Wed, 15 Sep 2004, john stultz wrote: > On Wed, 2004-09-15 at 10:30, Christoph Lameter wrote: > > On Wed, 15 Sep 2004, john stultz wrote: > > > > > Well, not all time sources have that feature. Both TSC, and cyclone > > > don't. You'd have to do something else for those. This is why my > > > proposal has absolutely nothing to do with interrupt generation. It has > > > a single hook that needs to be called only every once in a while, which > > > can be done however any architecture wants. > > > > > > Interrupt generation has more to do with soft-timers and scheduling then > > > time of day anyway. > > > > Hmm... I think this is a serious issue. The ability to exactly time an > > interrupt and therefore specific actions is important. Maybe we can have a > > fall back to soft interrupts if interrupt_at == NULL? > > Oh, its an issue, but one the time of day subsystem shouldn't solve. The > code that manages interval (be it timer or whatever) interrupts should > use the time of day subsystem to ensure they are triggering accurately, > and manipulate the intervals as needed. Huh? These are time related events. They cannot be managed effectively outside of the time subsystem. You need at least a regularly interrupting time source even for the current tick based approach. > I think the functionality is essential, but that it doesn't belong in the time of day code. Hmm. I was thinking all of this was about managing the various time sources not only the time of day code. > Basically we have two things we're trying to do: > > 1. Keep accurate time > 2. Generate hardware interrupts accurately > > While frequently the same hardware can do both, not all hardware is > usable for both functions. Thus I believe we should cleanly split these > two subsystems. My proposal only provided the keep accurate time part, > however one could using that functionality, to then manipulate hardware > interrupts to ensure accuracy in the timer subsystem. These two are intrisincally related and all modern hardware that I know of does both. In a sense the regularly occuring timer tick is also a yet another form of time source but not does provide a counter (unless simulated by jiffies). The timer subsystem needs to manage all the available time sources of the system effectively and provide time services to other subsystems. ^ permalink raw reply [flat|nested] 84+ messages in thread
* Re: [RFC][PATCH] new timeofday core subsystem (v.A0) 2004-09-15 18:48 ` john stultz 2004-09-15 19:58 ` George Anzinger 2004-09-15 20:20 ` Christoph Lameter @ 2004-09-16 7:02 ` Ulrich Windl 2 siblings, 0 replies; 84+ messages in thread From: Ulrich Windl @ 2004-09-16 7:02 UTC (permalink / raw) To: john stultz Cc: george anzinger, Albert Cahalan, lkml, tim, Ulrich.Windl, Len Brown, linux, David Mosberger, Andi Kleen, paulus, schwidefsky, jimix, keith maanthey, greg kh, Patricia Gaughen, Chris McDermott On 15 Sep 2004 at 11:48, john stultz wrote: > NTP adjustments need to be applied smoothly and consistently as time > progresses. Adjustment to the NTP parameters occur at do_adjtimex and at Actually I think (constant) NTP adjustments need to be applied any time. That's why I even tried to correct the TSC interpolation between ticks (If you visit the PPSkit patch, that's the "nanoscale" thing). The "constant adjustment values" are updated regularly also (per tick, per second, per NTP update). The basic thought was: If the time between two ticks is not 1/HZ the TSC interpolation (that is calibrated against the timer ticks) also will not interpolate between 1/HZ ticks. I assumed the TSC interpolations is wrong by the same percentage the timer interrupts are. Thus I tried to reversely apply the frequency error learnt through NTP even to the TSC interpolation (in addition to the time increments every tick). My code however has the problem that I ran out of significant bits during several muliplies and divides, while supporting that maximum imaginable frequency range for TSC. Even with the best oscillator, it won't reach single nanosecond precision as I don't have enough bits for them. Regards, Ulrich ^ permalink raw reply [flat|nested] 84+ messages in thread
* Re: [RFC][PATCH] new timeofday core subsystem (v.A0) 2004-09-03 6:42 ` Albert Cahalan 2004-09-03 7:24 ` George Anzinger @ 2004-09-03 19:18 ` john stultz 1 sibling, 0 replies; 84+ messages in thread From: john stultz @ 2004-09-03 19:18 UTC (permalink / raw) To: Albert Cahalan Cc: george anzinger, lkml, tim, Ulrich.Windl, clameter, Len Brown, linux, David Mosberger, Andi Kleen, paulus, schwidefsky, jimix, keith maanthey, greg kh, Patricia Gaughen, Chris McDermott On Thu, 2004-09-02 at 23:42, Albert Cahalan wrote: > > > +int ntp_leapsecond(struct timespec now) > > > +{ > > > + /* > > > + * Leap second processing. If in leap-insert state at > > > + * the end of the day, the system clock is set back one > > > + * second; if in leap-delete state, the system clock is > > > + * set ahead one second. The microtime() routine or > > > + * external clock driver will insure that reported time > > > + * is always monotonic. The ugly divides should be > > > + * replaced. > > Don't optimize until the patch is in and stable. > The divides can be removed much later. Wait months, > if not forever, before making the code less readable. > > The same goes for arch-specific non-syscall hacks. Yep. Code readability is crucial, although performance is also a concern that *has* to be addressed. As much as I'm probably digging myself a hole in doing all of this, I really don't want to work on time for the rest of my life, so making the code clear and readable is my only hope for passing this on. :) -john ^ permalink raw reply [flat|nested] 84+ messages in thread
* Re: [RFC] New Time of day proposal (updated 9/2/04) 2004-09-02 21:07 [RFC] New Time of day proposal (updated 9/2/04) john stultz 2004-09-02 21:09 ` [RFC][PATCH] new timeofday core subsystem (v.A0) john stultz @ 2004-09-02 22:09 ` Christoph Lameter 2004-09-02 22:22 ` john stultz 2004-09-03 9:54 ` Dominik Brodowski 2004-09-03 15:17 ` Andi Kleen 3 siblings, 1 reply; 84+ messages in thread From: Christoph Lameter @ 2004-09-02 22:09 UTC (permalink / raw) To: john stultz Cc: lkml, tim, george, albert, Ulrich.Windl, len.brown, linux, davidm, ak, paulus, schwidefsky, jimix, keith maanthey, greg kh, Patricia Gaughen, Chris McDermott > timeofday_hook() > now = read(); /* read the timesource */ > ns = cyc2ns(now - offset_base); /* calc nsecs since last call */ > ntp_ns = ntp_scale(ns); /* apply ntp scaling */ > system_time += ntp_ns; /* add scaled value to system_time */ > ntp_advance(ns); /* advance ntp state machine by ns */ > offset_base = now; /* set new offset_base */ This would only work if the precision of the timer used is <=1ns and if you are actually able to caculate the nanoseconds that have passed. What do you do if the interval is lets say 100ns and the time the timeofday hook is being called can be anytime within this 100ns interval since the time source is not always precise? I think its unavoidable to do some correction like provided by the time interpolator if the clock source does not provide ns. > monotonic_clock() > now = read(); /* read the timesource */ > ns = cyc2ns(now - offset_base); /* calculate nsecs since last hook */ > ntp_ns = ntp_scale(ns); /* apply ntp scaling */ > return system_time + ntp_ns; /* return system_time and scaled value This works analogous to the time interpolator today. The scaling of the timesource is omitted tough. Looks like we will be able to keep the fastcalls for clock_gettime and gettime ofday. > o What is the cost of throwing around 64bit values for everything? > - Do we need an arch specific time structure that varies size > accordingly? 64bit may be necessary at a minimum because with 4Ghz machine we may have counters with the frequency >2^32 cycles per second. I would think that 128bit may be necessary (at least as an intermediate result during the scaling of the timesource to nanoseconds) since we want to be accurate to the nanosecond. > o Some arches (arm, for example) do not have high res timing hardware > - In this case we can have a "jiffies" timesource > - cyc2ns(x) = x*(NSEC_PER_SEC/HZ) > - doesn't work for tickless systems David M.s time interpolator logic is needed in those cases to insure that the clock works properly and that nanoseconds offset can be calculated in a consistent way although the exact timing of the increas / reading of the counter may vary within a certain time period. ^ permalink raw reply [flat|nested] 84+ messages in thread
* Re: [RFC] New Time of day proposal (updated 9/2/04) 2004-09-02 22:09 ` [RFC] New Time of day proposal (updated 9/2/04) Christoph Lameter @ 2004-09-02 22:22 ` john stultz 2004-09-02 22:47 ` Christoph Lameter 0 siblings, 1 reply; 84+ messages in thread From: john stultz @ 2004-09-02 22:22 UTC (permalink / raw) To: Christoph Lameter Cc: lkml, tim, george anzinger, albert, Ulrich.Windl, Len Brown, linux, David Mosberger, Andi Kleen, paulus, schwidefsky, jimix, keith maanthey, greg kh, Patricia Gaughen, Chris McDermott On Thu, 2004-09-02 at 15:09, Christoph Lameter wrote: > > timeofday_hook() > > now = read(); /* read the timesource */ > > ns = cyc2ns(now - offset_base); /* calc nsecs since last call */ > > ntp_ns = ntp_scale(ns); /* apply ntp scaling */ > > system_time += ntp_ns; /* add scaled value to system_time */ > > ntp_advance(ns); /* advance ntp state machine by ns */ > > offset_base = now; /* set new offset_base */ > > This would only work if the precision of the timer used is > <=1ns and if you are actually able to caculate the nanoseconds that have > passed. What do you do if the interval is lets say 100ns and the time the > timeofday hook is being called can be anytime within this 100ns interval > since the time source is not always precise? Well, with the exception of the TSC, none of the current time sources have <=1ns resolution, but I'm not sure I understand the problem you're trying to point out. Could you clarify? > I think its unavoidable to do some correction like provided by the time > interpolator if the clock source does not provide ns. Could you point to the specific correction you describe? > > o What is the cost of throwing around 64bit values for everything? > > - Do we need an arch specific time structure that varies size > > accordingly? > > 64bit may be necessary at a minimum because with 4Ghz machine we may > have counters with the frequency >2^32 cycles per second. > > I would think that 128bit may be necessary (at least > as an intermediate result during the scaling of the timesource to > nanoseconds) since we want to be accurate to the nanosecond. I worry 128bit math might be a bit too rich for the majority of systems at the moment. I am open to it, although I suspect we can use other tricks to get the same accuracy within a constrained bitspace. > > o Some arches (arm, for example) do not have high res timing hardware > > - In this case we can have a "jiffies" timesource > > - cyc2ns(x) = x*(NSEC_PER_SEC/HZ) > > - doesn't work for tickless systems > > David M.s time interpolator logic is needed in those cases to insure that > the clock works properly and that nanoseconds offset can be calculated in > a consistent way although the exact timing of the increas / reading of the > counter may vary within a certain time period. Again, could you point me to the code (is this with your new patch, or the older code)? I've looked at the time interpolator code, but I'm not sure exactly what you mean. Thanks for the feedback! I really appreciate it! -john ^ permalink raw reply [flat|nested] 84+ messages in thread
* Re: [RFC] New Time of day proposal (updated 9/2/04) 2004-09-02 22:22 ` john stultz @ 2004-09-02 22:47 ` Christoph Lameter 0 siblings, 0 replies; 84+ messages in thread From: Christoph Lameter @ 2004-09-02 22:47 UTC (permalink / raw) To: john stultz Cc: lkml, tim, george anzinger, albert, Ulrich.Windl, Len Brown, linux, David Mosberger, Andi Kleen, paulus, schwidefsky, jimix, keith maanthey, greg kh, Patricia Gaughen, Chris McDermott On Thu, 2 Sep 2004, john stultz wrote: > On Thu, 2004-09-02 at 15:09, Christoph Lameter wrote: > > > timeofday_hook() > > > now = read(); /* read the timesource */ > > > ns = cyc2ns(now - offset_base); /* calc nsecs since last call */ > > > ntp_ns = ntp_scale(ns); /* apply ntp scaling */ > > > system_time += ntp_ns; /* add scaled value to system_time */ > > > ntp_advance(ns); /* advance ntp state machine by ns */ > > > offset_base = now; /* set new offset_base */ > > > > This would only work if the precision of the timer used is > > <=1ns and if you are actually able to caculate the nanoseconds that have > > passed. What do you do if the interval is lets say 100ns and the time the > > timeofday hook is being called can be anytime within this 100ns interval > > since the time source is not always precise? > > Well, with the exception of the TSC, none of the current time sources > have <=1ns resolution, but I'm not sure I understand the problem you're > trying to point out. Could you clarify? > > > I think its unavoidable to do some correction like provided by the time > > interpolator if the clock source does not provide ns. > > Could you point to the specific correction you describe? I drop my objections. I was thinking too much in terms of the old code.... This should work fine. ^ permalink raw reply [flat|nested] 84+ messages in thread
* Re: [RFC] New Time of day proposal (updated 9/2/04) 2004-09-02 21:07 [RFC] New Time of day proposal (updated 9/2/04) john stultz 2004-09-02 21:09 ` [RFC][PATCH] new timeofday core subsystem (v.A0) john stultz 2004-09-02 22:09 ` [RFC] New Time of day proposal (updated 9/2/04) Christoph Lameter @ 2004-09-03 9:54 ` Dominik Brodowski 2004-09-03 19:41 ` john stultz 2004-09-03 15:17 ` Andi Kleen 3 siblings, 1 reply; 84+ messages in thread From: Dominik Brodowski @ 2004-09-03 9:54 UTC (permalink / raw) To: john stultz Cc: lkml, tim, george, albert, Ulrich.Windl, clameter, len.brown, davidm, ak, paulus, schwidefsky, jimix, keith maanthey, greg kh, Patricia Gaughen, Chris McDermott Hi John, Thanks for this exciting code. A few questions: - Where do you intend to put the "delay" code to? Generalize it as well? - cpufreq hooks to tsc.c and i386_tsc.c[*] can easily be added. For them to work _better_ than current code: can timeofday_hook() be called (with IRQs disabled) _anywhen_ from kernel context? [*] actually, only one of them needs the notifier, AFAICS... - what about keeping lower-priority timesources still "active" in some sort to a) enable loading _and_ unloading timsources (even modularizing them becoms possible which should make testing easier...), b) call them every couple of seconds to verify the currently used timesource is still sane (and if not, call cpufreq_delayed_get() for example or disable the timesource). This would mean that e.g. pmtmr and pit can be used to "verify" and "backup" tsc, or otherwise... The "clock=tsc" override would only affect the priority of the timesource then, making it so large that no other timesource can "preempt" it, but doesn't avoid making other timesources available for backup and verification purposes. Thanks, Dominik ^ permalink raw reply [flat|nested] 84+ messages in thread
* Re: [RFC] New Time of day proposal (updated 9/2/04) 2004-09-03 9:54 ` Dominik Brodowski @ 2004-09-03 19:41 ` john stultz 2004-09-03 20:26 ` Dominik Brodowski 0 siblings, 1 reply; 84+ messages in thread From: john stultz @ 2004-09-03 19:41 UTC (permalink / raw) To: Dominik Brodowski Cc: lkml, tim, george anzinger, albert, Ulrich.Windl, clameter, Len Brown, David Mosberger, Andi Kleen, paulus, schwidefsky, jimix, keith maanthey, greg kh, Patricia Gaughen, Chris McDermott On Fri, 2004-09-03 at 02:54, Dominik Brodowski wrote: > Hi John, > > Thanks for this exciting code. A few questions: > > - Where do you intend to put the "delay" code to? Generalize it as well? I'm putting that off. Some of the timesources aren't granular enough to be used, so I don't see an easy way to cleanly use time source delays for some cases and loop delays for others. So I just plan to leave it as an arch specific implementation for now. > - cpufreq hooks to tsc.c and i386_tsc.c[*] can easily be added. For them to > work _better_ than current code: can timeofday_hook() be called (with IRQs > disabled) _anywhen_ from kernel context? > [*] actually, only one of them needs the notifier, AFAICS... Yep, that's on my list. I'm trying to keep to just cleaning up one thing at a time, but since I've got to re-work the timer_tsc.c code anyway, I figured I'd try to organize all the TSC related functions (synchronization, cpufreq, get_cycles(), tsc_delay maybe?) into tsc.c. This will simplify things if we ever get around to correctly fixing the SMP systems w/ different speed cpus issue. > - what about keeping lower-priority timesources still "active" in some sort to > a) enable loading _and_ unloading timsources (even modularizing them > becoms possible which should make testing easier...), > b) call them every couple of seconds to verify the currently used > timesource is still sane (and if not, call cpufreq_delayed_get() for > example or disable the timesource). This would mean that e.g. pmtmr > and pit can be used to "verify" and "backup" tsc, or otherwise... > The "clock=tsc" override would only affect the priority of the > timesource then, making it so large that no other timesource can > "preempt" it, but doesn't avoid making other timesources available > for backup and verification purposes. That's totally the plan, although I want to put the control into sysfs. Load a module, and echo "acpi-pm" or whatever into the sysfs file. I just left it out for the first pass so folks would focus on the core timeofday and NTP code (I can keep wishing, right? :) thanks -john ^ permalink raw reply [flat|nested] 84+ messages in thread
* Re: [RFC] New Time of day proposal (updated 9/2/04) 2004-09-03 19:41 ` john stultz @ 2004-09-03 20:26 ` Dominik Brodowski 2004-09-03 21:05 ` john stultz 2004-09-06 6:26 ` Ulrich Windl 0 siblings, 2 replies; 84+ messages in thread From: Dominik Brodowski @ 2004-09-03 20:26 UTC (permalink / raw) To: john stultz Cc: lkml, tim, george anzinger, albert, Ulrich.Windl, clameter, Len Brown, David Mosberger, Andi Kleen, paulus, schwidefsky, jimix, keith maanthey, greg kh, Patricia Gaughen, Chris McDermott On Fri, Sep 03, 2004 at 12:41:22PM -0700, john stultz wrote: > > - cpufreq hooks to tsc.c and i386_tsc.c[*] can easily be added. For them to > > work _better_ than current code: can timeofday_hook() be called (with IRQs > > disabled) _anywhen_ from kernel context? > > [*] actually, only one of them needs the notifier, AFAICS... > > Yep, that's on my list. I'm trying to keep to just cleaning up one thing > at a time, but since I've got to re-work the timer_tsc.c code anyway, I > figured I'd try to organize all the TSC related functions > (synchronization, cpufreq, get_cycles(), tsc_delay maybe?) into tsc.c. > This will simplify things if we ever get around to correctly fixing the > SMP systems w/ different speed cpus issue. What about removing cpu_freq_khz you have in your patch, adding a per-CPU value, and just use the value of the boot CPU for the other CPUs if !CONFIG_DIFFERENT_CPU_SPEEDS or something like that? > > - what about keeping lower-priority timesources still "active" in some sort to > > a) enable loading _and_ unloading timsources (even modularizing them > > becoms possible which should make testing easier...), > > b) call them every couple of seconds to verify the currently used > > timesource is still sane (and if not, call cpufreq_delayed_get() for > > example or disable the timesource). This would mean that e.g. pmtmr > > and pit can be used to "verify" and "backup" tsc, or otherwise... > > The "clock=tsc" override would only affect the priority of the > > timesource then, making it so large that no other timesource can > > "preempt" it, but doesn't avoid making other timesources available > > for backup and verification purposes. > > That's totally the plan, although I want to put the control into sysfs. Even better. Thought about that too, but was worried you'd dislike the sysfs overhead. Thanks, Dominik ^ permalink raw reply [flat|nested] 84+ messages in thread
* Re: [RFC] New Time of day proposal (updated 9/2/04) 2004-09-03 20:26 ` Dominik Brodowski @ 2004-09-03 21:05 ` john stultz 2004-09-06 6:26 ` Ulrich Windl 1 sibling, 0 replies; 84+ messages in thread From: john stultz @ 2004-09-03 21:05 UTC (permalink / raw) To: Dominik Brodowski Cc: lkml, tim, george anzinger, albert, Ulrich.Windl, clameter, Len Brown, David Mosberger, Andi Kleen, paulus, schwidefsky, jimix, keith maanthey, greg kh, Patricia Gaughen, Chris McDermott On Fri, 2004-09-03 at 13:26, Dominik Brodowski wrote: > On Fri, Sep 03, 2004 at 12:41:22PM -0700, john stultz wrote: > > > - cpufreq hooks to tsc.c and i386_tsc.c[*] can easily be added. For them to > > > work _better_ than current code: can timeofday_hook() be called (with IRQs > > > disabled) _anywhen_ from kernel context? > > > [*] actually, only one of them needs the notifier, AFAICS... > > > > Yep, that's on my list. I'm trying to keep to just cleaning up one thing > > at a time, but since I've got to re-work the timer_tsc.c code anyway, I > > figured I'd try to organize all the TSC related functions > > (synchronization, cpufreq, get_cycles(), tsc_delay maybe?) into tsc.c. > > This will simplify things if we ever get around to correctly fixing the > > SMP systems w/ different speed cpus issue. > > What about removing cpu_freq_khz you have in your patch, adding a per-CPU > value, and just use the value of the boot CPU for the other CPUs if > !CONFIG_DIFFERENT_CPU_SPEEDS or something like that? Well, for now I just want to re-implement what we already provide. If I can fix something simple while I'm doing I will, but I don't want to change behavior too much in order to simplify testing (if vanilla works fine, and my patch breaks then I want it to be a core bug rather then a "I changed the semantics of something slightly" issue). Once I'm having to manage less code, I'll be very much interested in fixing those outstanding issues. One thing at a time, and all. > > > - what about keeping lower-priority timesources still "active" in some sort to > > > a) enable loading _and_ unloading timsources (even modularizing them > > > becoms possible which should make testing easier...), > > > b) call them every couple of seconds to verify the currently used > > > timesource is still sane (and if not, call cpufreq_delayed_get() for > > > example or disable the timesource). This would mean that e.g. pmtmr > > > and pit can be used to "verify" and "backup" tsc, or otherwise... > > > The "clock=tsc" override would only affect the priority of the > > > timesource then, making it so large that no other timesource can > > > "preempt" it, but doesn't avoid making other timesources available > > > for backup and verification purposes. > > > > That's totally the plan, although I want to put the control into sysfs. > > Even better. Thought about that too, but was worried you'd dislike the sysfs > overhead. Overhead? Not sure I see where that'd be a problem. The available timesource management code needs to be split off into a timesource.c, but I'll get to that later. thanks -john ^ permalink raw reply [flat|nested] 84+ messages in thread
* Re: [RFC] New Time of day proposal (updated 9/2/04) 2004-09-03 20:26 ` Dominik Brodowski 2004-09-03 21:05 ` john stultz @ 2004-09-06 6:26 ` Ulrich Windl 2004-09-06 11:56 ` Alan Cox 2004-09-07 16:14 ` Christoph Lameter 1 sibling, 2 replies; 84+ messages in thread From: Ulrich Windl @ 2004-09-06 6:26 UTC (permalink / raw) To: Dominik Brodowski Cc: lkml, tim, george anzinger, albert, Ulrich.Windl, clameter, Len Brown, David Mosberger, Andi Kleen, paulus, schwidefsky, jimix, keith maanthey, greg kh, Patricia Gaughen, Chris McDermott On 3 Sep 2004 at 22:26, Dominik Brodowski wrote: ... > What about removing cpu_freq_khz you have in your patch, adding a per-CPU > value, and just use the value of the boot CPU for the other CPUs if > !CONFIG_DIFFERENT_CPU_SPEEDS or something like that? ... I would not mention SMP at that stage, but most of the existing code on IA32 suumes that all CPUs run with the same frequency. I only heared that at least on Alphas this is not true. Probably you'll need a per-CPU state regarding time. Most likely resulting in the "local CPU's time" and a global concept of time that should not be in contradiction with any CPU's time. That might mean that any process always has to query the global time, involving extra overhead. You mentioned it... Regards, Ulrich ^ permalink raw reply [flat|nested] 84+ messages in thread
* Re: [RFC] New Time of day proposal (updated 9/2/04) 2004-09-06 6:26 ` Ulrich Windl @ 2004-09-06 11:56 ` Alan Cox 2004-09-07 16:14 ` Christoph Lameter 1 sibling, 0 replies; 84+ messages in thread From: Alan Cox @ 2004-09-06 11:56 UTC (permalink / raw) To: Ulrich Windl Cc: Dominik Brodowski, lkml, tim, george anzinger, albert, clameter, Len Brown, David Mosberger, Andi Kleen, paulus, schwidefsky, jimix, keith maanthey, greg kh, Patricia Gaughen, Chris McDermott On Llu, 2004-09-06 at 07:26, Ulrich Windl wrote: > I would not mention SMP at that stage, but most of the existing code on IA32 > suumes that all CPUs run with the same frequency. I only heared that at least on Except for the tsc functionality Linux 2.4.x/2.6.x does perfectly well on split clock x86. Several people ran mixed 300/450Mhz systems with dual celerons ^ permalink raw reply [flat|nested] 84+ messages in thread
* Re: [RFC] New Time of day proposal (updated 9/2/04) 2004-09-06 6:26 ` Ulrich Windl 2004-09-06 11:56 ` Alan Cox @ 2004-09-07 16:14 ` Christoph Lameter 1 sibling, 0 replies; 84+ messages in thread From: Christoph Lameter @ 2004-09-07 16:14 UTC (permalink / raw) To: Ulrich Windl Cc: Dominik Brodowski, lkml, tim, george anzinger, albert, Len Brown, David Mosberger, Andi Kleen, paulus, schwidefsky, jimix, keith maanthey, greg kh, Patricia Gaughen, Chris McDermott On Mon, 6 Sep 2004, Ulrich Windl wrote: > On 3 Sep 2004 at 22:26, Dominik Brodowski wrote: > > ... > > What about removing cpu_freq_khz you have in your patch, adding a per-CPU > > value, and just use the value of the boot CPU for the other CPUs if > > !CONFIG_DIFFERENT_CPU_SPEEDS or something like that? > ... > > I would not mention SMP at that stage, but most of the existing code on IA32 > suumes that all CPUs run with the same frequency. I only heared that at least on > Alphas this is not true. Probably you'll need a per-CPU state regarding time. Most > likely resulting in the "local CPU's time" and a global concept of time that > should not be in contradiction with any CPU's time. That might mean that any > process always has to query the global time, involving extra overhead. CPUs in SGI Altix systems may also run at different frequencies. Typically 2 CPUs have a common clock source. For example in a system of 16 CPUs, there may exist 8 clock sources that slightly diverge from one another. ^ permalink raw reply [flat|nested] 84+ messages in thread
* Re: [RFC] New Time of day proposal (updated 9/2/04) 2004-09-02 21:07 [RFC] New Time of day proposal (updated 9/2/04) john stultz ` (2 preceding siblings ...) 2004-09-03 9:54 ` Dominik Brodowski @ 2004-09-03 15:17 ` Andi Kleen 2004-09-03 20:11 ` john stultz 3 siblings, 1 reply; 84+ messages in thread From: Andi Kleen @ 2004-09-03 15:17 UTC (permalink / raw) To: john stultz Cc: lkml, tim, george, albert, Ulrich.Windl, clameter, len.brown, linux, davidm, ak, paulus, schwidefsky, jimix, keith maanthey, greg kh, Patricia Gaughen, Chris McDermott On Thu, Sep 02, 2004 at 02:07:19PM -0700, john stultz wrote: > All, > Here again is my updated time of day proposal. Also to follow is > working code for i386. I wanted to get this out so folks could actually > play around with the core architecture independent code. The i386 > specific bits function, but are unfinished. Thus they have been broken > out and should mostly be ignored for this release. > > I'd really appreciate any feedback or concerns about this proposal and > the following code. Thanks for working on this. Some cleanup in this area was long overdue. > Functions: > ---------- > timeofday_hook() > now = read(); /* read the timesource */ > ns = cyc2ns(now - offset_base); /* calc nsecs since last call */ > ntp_ns = ntp_scale(ns); /* apply ntp scaling */ > system_time += ntp_ns; /* add scaled value to system_time */ > ntp_advance(ns); /* advance ntp state machine by ns */ > offset_base = now; /* set new offset_base */ > > monotonic_clock() > now = read(); /* read the timesource */ > ns = cyc2ns(now - offset_base); /* calculate nsecs since last hook */ > ntp_ns = ntp_scale(ns); /* apply ntp scaling */ > return system_time + ntp_ns; /* return system_time and scaled value I am not sure why you have different hooks for timeofday and monotic. It may be better to read the timeonly once and then convert to monotonic or TOD. > */ > > settimeofday(desired) > wall_offset = desired - monotonic_clock(); /* set wall offset */ > > gettimeofday() > return wall_offset + monotonic_clock(); /* return current timeofday */ Hmm, I am missing something here, but how do you handle the case where the timer interrupt uses HPET, but the offset from last timer interrupt is determined using TSC. But some machines don't have a reliable TSC, so it may need to use a different "offset" source. I guess you would need two different drivers here? > o vsyscalls/userspace gettimeofday() > - Mark functions and data w/ __vsyscall attribute > - Use linker to put all __vsyscall data in the same set of pages That won't work because the kernel needs to write to these variables (e.g. to update the wall time from the interrupt handler). So in general you need two different mappings, one read only for user space and another writable one for the kernel. Regarding your patch, putting the delta in a virtual call seems a bit of overkill. Is that really needed? That was just from a quick grep, i haven't read it in detail. -Andi ^ permalink raw reply [flat|nested] 84+ messages in thread
* Re: [RFC] New Time of day proposal (updated 9/2/04) 2004-09-03 15:17 ` Andi Kleen @ 2004-09-03 20:11 ` john stultz 2004-09-04 13:00 ` Andi Kleen 0 siblings, 1 reply; 84+ messages in thread From: john stultz @ 2004-09-03 20:11 UTC (permalink / raw) To: Andi Kleen Cc: lkml, tim, george anzinger, albert, Ulrich.Windl, clameter, Len Brown, linux, David Mosberger, paulus, schwidefsky, jimix, keith maanthey, greg kh, Patricia Gaughen, Chris McDermott On Fri, 2004-09-03 at 08:17, Andi Kleen wrote: > On Thu, Sep 02, 2004 at 02:07:19PM -0700, john stultz wrote: > > timeofday_hook() > > now = read(); /* read the timesource */ > > ns = cyc2ns(now - offset_base); /* calc nsecs since last call */ > > ntp_ns = ntp_scale(ns); /* apply ntp scaling */ > > system_time += ntp_ns; /* add scaled value to system_time */ > > ntp_advance(ns); /* advance ntp state machine by ns */ > > offset_base = now; /* set new offset_base */ > > > > monotonic_clock() > > now = read(); /* read the timesource */ > > ns = cyc2ns(now - offset_base); /* calculate nsecs since last hook */ > > ntp_ns = ntp_scale(ns); /* apply ntp scaling */ > > return system_time + ntp_ns; /* return system_time and scaled value > > > I am not sure why you have different hooks for timeofday and monotic. > It may be better to read the timeonly once and then convert to monotonic > or TOD. You might be a bit confused: The timeofday_hook (should be timeofday_interrupt_hook, my bad) is called by the semi-periodic-irregular-interval(also known as "timer") interrupt. Its what does the housekeeping for all the timeofday code so we don't run into a counter overflow. monotonic_clock() is an accessor that returns the amount of time that has been accumulated since boot in nanoseconds. do_timeofday() is also an accessor which uses monotonic_clock + wall_time_offset to calculate wall time. > > */ > > > > settimeofday(desired) > > wall_offset = desired - monotonic_clock(); /* set wall offset */ > > > > gettimeofday() > > return wall_offset + monotonic_clock(); /* return current timeofday */ > > > Hmm, I am missing something here, but how do you handle > the case where the timer interrupt uses HPET, but the offset > from last timer interrupt is determined using TSC. But > some machines don't have a reliable TSC, so it may need > to use a different "offset" source. > > I guess you would need two different drivers here? Not really, this is very doable. Time of day is now completely isolated from the timer interrupt code, so it doesn't care who calls the timeofday_interrupt_hook (HPET's interrupt or the i8253's interrupt could do it). Also it doesn't have to be regular or exact, just frequent enough that the timesource doesn't overflow. > > o vsyscalls/userspace gettimeofday() > > - Mark functions and data w/ __vsyscall attribute > > - Use linker to put all __vsyscall data in the same set of pages > > That won't work because the kernel needs to write to > these variables (e.g. to update the wall time from the interrupt > handler). So in general you need two different mappings, one > read only for user space and another writable one for the kernel. Well, I was thinking about this and I don't see why the kernel and userspace can't share the same data and functions for read only access? Why exactly does it need to be mapped twice? We only write to the data from kernel mode using different functions, so the same symbols should be usable. I could be missing something, and my userspace implementation plans weigh fairly heavily on this, so do please correct me. > Regarding your patch, putting the delta in a virtual call > seems a bit of overkill. Is that really needed? That was > just from a quick grep, i haven't read it in detail. Yep. Christoph point is starting to get to me. I'll probably swap out delta() for a mask variable do the masking in the generic code. However, I do feel its a micro-optimization which hurts readability, so I'll put off that change until later when the code has been thoroughly reviewed. I came up with the read/delta/cyc2ns interface after trying to think of the simplest way to use ANY time source, no matter how strange. cycle_t's are basically magic cookies given by the timesource that can be manipulated generically and converted into nanoseconds. It is likely an over-virtualization, but I'd prefer to keep things general and flexible until more arch maintainers tell me its unnecessary. thanks -john ^ permalink raw reply [flat|nested] 84+ messages in thread
* Re: [RFC] New Time of day proposal (updated 9/2/04) 2004-09-03 20:11 ` john stultz @ 2004-09-04 13:00 ` Andi Kleen 2004-09-07 16:10 ` Christoph Lameter 2004-09-08 18:25 ` john stultz 0 siblings, 2 replies; 84+ messages in thread From: Andi Kleen @ 2004-09-04 13:00 UTC (permalink / raw) To: john stultz Cc: Andi Kleen, lkml, tim, george anzinger, albert, Ulrich.Windl, clameter, Len Brown, linux, David Mosberger, paulus, schwidefsky, jimix, keith maanthey, greg kh, Patricia Gaughen, Chris McDermott On Fri, Sep 03, 2004 at 01:11:58PM -0700, john stultz wrote: > The timeofday_hook (should be timeofday_interrupt_hook, my bad) is > called by the semi-periodic-irregular-interval(also known as "timer") > interrupt. Its what does the housekeeping for all the timeofday code so > we don't run into a counter overflow. > > monotonic_clock() is an accessor that returns the amount of time that > has been accumulated since boot in nanoseconds. Ok, but you need different low level drivers for those. The TSC is not stable enough as a long term time source, but it is best&fastest for the offset calculation between timer interrupts. Or you would need to make the single driver handle both TSC and HPET/PIT, which would probably defeat the extensibility of the new architecture. Even in that case you would need different read() calls. > > > > Hmm, I am missing something here, but how do you handle > > the case where the timer interrupt uses HPET, but the offset > > from last timer interrupt is determined using TSC. But > > some machines don't have a reliable TSC, so it may need > > to use a different "offset" source. > > > > I guess you would need two different drivers here? > > Not really, this is very doable. Time of day is now completely isolated > from the timer interrupt code, so it doesn't care who calls the > timeofday_interrupt_hook (HPET's interrupt or the i8253's interrupt > could do it). Also it doesn't have to be regular or exact, just frequent > enough that the timesource doesn't overflow. I am not talking about the interrupt source here, just from which time source xtime is refreshed. > > > > o vsyscalls/userspace gettimeofday() > > > - Mark functions and data w/ __vsyscall attribute > > > - Use linker to put all __vsyscall data in the same set of pages > > > > That won't work because the kernel needs to write to > > these variables (e.g. to update the wall time from the interrupt > > handler). So in general you need two different mappings, one > > read only for user space and another writable one for the kernel. > > Well, I was thinking about this and I don't see why the kernel and > userspace can't share the same data and functions for read only access? For reading only kernel & user could probably share, although you may run into some interesting problems on architecturs that use truly different "segments" for user and kernel space (like sparc64 or m68k or possibly s390) > Why exactly does it need to be mapped twice? We only write to the data > from kernel mode using different functions, so the same symbols should > be usable. The kernel needs to write too. And at least on x86 there is no way to make a page read only for user space and writable for kernel space on the same mapping. > the simplest way to use ANY time source, no matter how strange. > cycle_t's are basically magic cookies given by the timesource that can > be manipulated generically and converted into nanoseconds. It is likely > an over-virtualization, but I'd prefer to keep things general and > flexible until more arch maintainers tell me its unnecessary. Sounds like a redundancy with the cputime type the s390 people did. I don't think we should have two ADTs who do such similar things. -Andi ^ permalink raw reply [flat|nested] 84+ messages in thread
* Re: [RFC] New Time of day proposal (updated 9/2/04) 2004-09-04 13:00 ` Andi Kleen @ 2004-09-07 16:10 ` Christoph Lameter 2004-09-07 18:24 ` George Anzinger 2004-09-08 18:25 ` john stultz 1 sibling, 1 reply; 84+ messages in thread From: Christoph Lameter @ 2004-09-07 16:10 UTC (permalink / raw) To: Andi Kleen Cc: john stultz, lkml, tim, george anzinger, albert, Ulrich.Windl, Len Brown, linux, David Mosberger, paulus, schwidefsky, jimix, keith maanthey, greg kh, Patricia Gaughen, Chris McDermott On Sat, 4 Sep 2004, Andi Kleen wrote: > On Fri, Sep 03, 2004 at 01:11:58PM -0700, john stultz wrote: > > The timeofday_hook (should be timeofday_interrupt_hook, my bad) is > > called by the semi-periodic-irregular-interval(also known as "timer") > > interrupt. Its what does the housekeeping for all the timeofday code so > > we don't run into a counter overflow. > > > > monotonic_clock() is an accessor that returns the amount of time that > > has been accumulated since boot in nanoseconds. > > Ok, but you need different low level drivers for those. The TSC is not > stable enough as a long term time source, but it is best&fastest for > the offset calculation between timer interrupts. I thought the NTP daemon etc would even that out? ITC (TSC on IA64) is used by default on IA64 for all time keeping purposes. The CPU has on chip support for timer interrupt generation. ^ permalink raw reply [flat|nested] 84+ messages in thread
* Re: [RFC] New Time of day proposal (updated 9/2/04) 2004-09-07 16:10 ` Christoph Lameter @ 2004-09-07 18:24 ` George Anzinger 2004-09-07 20:55 ` Christoph Lameter 2004-09-08 6:26 ` Ulrich Windl 0 siblings, 2 replies; 84+ messages in thread From: George Anzinger @ 2004-09-07 18:24 UTC (permalink / raw) To: Christoph Lameter Cc: Andi Kleen, john stultz, lkml, tim, albert, Ulrich.Windl, Len Brown, linux, David Mosberger, paulus, schwidefsky, jimix, keith maanthey, greg kh, Patricia Gaughen, Chris McDermott Christoph Lameter wrote: > On Sat, 4 Sep 2004, Andi Kleen wrote: > > >>On Fri, Sep 03, 2004 at 01:11:58PM -0700, john stultz wrote: >> >>>The timeofday_hook (should be timeofday_interrupt_hook, my bad) is >>>called by the semi-periodic-irregular-interval(also known as "timer") >>>interrupt. Its what does the housekeeping for all the timeofday code so >>>we don't run into a counter overflow. >>> >>>monotonic_clock() is an accessor that returns the amount of time that >>>has been accumulated since boot in nanoseconds. >> >>Ok, but you need different low level drivers for those. The TSC is not >>stable enough as a long term time source, but it is best&fastest for >>the offset calculation between timer interrupts. > > > I thought the NTP daemon etc would even that out? ITC (TSC on IA64) is > used by default on IA64 for all time keeping purposes. The CPU has on chip > support for timer interrupt generation. Yes, and it is designed (read the "rock" is carefully choosen) for time keeping. On the x86 the "rock" drives the pm timer and the PIT, but a somewhat less stable "rock" drives the TSC. Also, we don't "know" what rate the TSC is actully clocking so we must "discover" it at boot time. This process either is inaccurate or slow (I think we use ~ 50 ms these days which gives an error of ~10 TSC cycles on a 800MHZ box). FWIW the problem here is the sync up with the I/O backplane to find the start and ending of the measured time. I suspect that the IA64 "tells" you what its clock rate is. Right? > -- George Anzinger george@mvista.com High-res-timers: http://sourceforge.net/projects/high-res-timers/ Preemption patch: http://www.kernel.org/pub/linux/kernel/people/rml ^ permalink raw reply [flat|nested] 84+ messages in thread
* Re: [RFC] New Time of day proposal (updated 9/2/04) 2004-09-07 18:24 ` George Anzinger @ 2004-09-07 20:55 ` Christoph Lameter 2004-09-07 21:42 ` George Anzinger 2004-09-08 6:26 ` Ulrich Windl 1 sibling, 1 reply; 84+ messages in thread From: Christoph Lameter @ 2004-09-07 20:55 UTC (permalink / raw) To: George Anzinger Cc: Andi Kleen, john stultz, lkml, tim, albert, Ulrich.Windl, Len Brown, linux, David Mosberger, paulus, schwidefsky, jimix, keith maanthey, greg kh, Patricia Gaughen, Chris McDermott On Tue, 7 Sep 2004, George Anzinger wrote: > Also, we don't "know" what rate the TSC is actully clocking so we must > "discover" it at boot time. This process either is inaccurate or slow (I think > we use ~ 50 ms these days which gives an error of ~10 TSC cycles on a 800MHZ > box). FWIW the problem here is the sync up with the I/O backplane to find the > start and ending of the measured time. > > I suspect that the IA64 "tells" you what its clock rate is. Right? Not the CPU itself. There is a special hardware I/O interface called the PAL/SAL that allows one to retrive that information. Doesnt the BIOS on i386 allow you to get to that information? ^ permalink raw reply [flat|nested] 84+ messages in thread
* Re: [RFC] New Time of day proposal (updated 9/2/04) 2004-09-07 20:55 ` Christoph Lameter @ 2004-09-07 21:42 ` George Anzinger 0 siblings, 0 replies; 84+ messages in thread From: George Anzinger @ 2004-09-07 21:42 UTC (permalink / raw) To: Christoph Lameter Cc: Andi Kleen, john stultz, lkml, tim, albert, Ulrich.Windl, Len Brown, linux, David Mosberger, paulus, schwidefsky, jimix, keith maanthey, greg kh, Patricia Gaughen, Chris McDermott Christoph Lameter wrote: > On Tue, 7 Sep 2004, George Anzinger wrote: > > >>Also, we don't "know" what rate the TSC is actually clocking so we must >>"discover" it at boot time. This process either is inaccurate or slow (I think >>we use ~ 50 ms these days which gives an error of ~10 TSC cycles on a 800MHZ >>box). FWIW the problem here is the sync up with the I/O backplane to find the >>start and ending of the measured time. >> >>I suspect that the IA64 "tells" you what its clock rate is. Right? > > > Not the CPU itself. There is a special hardware I/O interface called the > PAL/SAL that allows one to retrieve that information. Doesn't the BIOS on > i386 allow you to get to that information? Not as far as I know. If it did we would kick out the calibrate loop and boot some 50ms faster (always a good thing). Could be somebody else has better info here... -- George Anzinger george@mvista.com High-res-timers: http://sourceforge.net/projects/high-res-timers/ Preemption patch: http://www.kernel.org/pub/linux/kernel/people/rml ^ permalink raw reply [flat|nested] 84+ messages in thread
* Re: [RFC] New Time of day proposal (updated 9/2/04) 2004-09-07 18:24 ` George Anzinger 2004-09-07 20:55 ` Christoph Lameter @ 2004-09-08 6:26 ` Ulrich Windl 1 sibling, 0 replies; 84+ messages in thread From: Ulrich Windl @ 2004-09-08 6:26 UTC (permalink / raw) To: george Cc: Andi Kleen, john stultz, lkml, tim, albert, Ulrich.Windl, Len Brown, linux, David Mosberger, paulus, schwidefsky, jimix, keith maanthey, greg kh, Patricia Gaughen, Chris McDermott On 7 Sep 2004 at 11:24, George Anzinger wrote: > Christoph Lameter wrote: ... > > I thought the NTP daemon etc would even that out? ITC (TSC on IA64) is > > used by default on IA64 for all time keeping purposes. The CPU has on chip > > support for timer interrupt generation. > > Yes, and it is designed (read the "rock" is carefully choosen) for time keeping. Believe me: NTP cannot repair a broken clock; it can only "tune" a clock that is systematically off. ... Regards, Ulrich ^ permalink raw reply [flat|nested] 84+ messages in thread
* Re: [RFC] New Time of day proposal (updated 9/2/04) 2004-09-04 13:00 ` Andi Kleen 2004-09-07 16:10 ` Christoph Lameter @ 2004-09-08 18:25 ` john stultz 1 sibling, 0 replies; 84+ messages in thread From: john stultz @ 2004-09-08 18:25 UTC (permalink / raw) To: Andi Kleen Cc: lkml, tim, george anzinger, albert, Ulrich.Windl, clameter, Len Brown, linux, David Mosberger, paulus, schwidefsky, jimix, keith maanthey, greg kh, Patricia Gaughen, Chris McDermott On Sat, 2004-09-04 at 06:00, Andi Kleen wrote: > On Fri, Sep 03, 2004 at 01:11:58PM -0700, john stultz wrote: > > The timeofday_hook (should be timeofday_interrupt_hook, my bad) is > > called by the semi-periodic-irregular-interval(also known as "timer") > > interrupt. Its what does the housekeeping for all the timeofday code so > > we don't run into a counter overflow. > > > > monotonic_clock() is an accessor that returns the amount of time that > > has been accumulated since boot in nanoseconds. > > Ok, but you need different low level drivers for those. The TSC is not > stable enough as a long term time source, but it is best&fastest for > the offset calculation between timer interrupts. > > Or you would need to make the single driver handle both TSC > and HPET/PIT, which would probably defeat the extensibility of > the new architecture. Even in that case you would need different > read() calls. Well, actually with this design the TSC, when used, would be the long term timesource. Interpolating between two timesources (timer ticks and TSC) is what causes a great deal of the time trouble on i386. You get into the issue of: "which one do you trust?" (some laptops change freq w/o notification, while other desktops have buggy interval timers) which takes us then into the problems lost-tick detection brought up. Instead, I'm suggesting we make our best guess, and then select a timesource to trust. Make it trivial for folks to work around bad selections (say a buggy HPET shows up, have the user echo "acpi-pm" > /sysfs/something/timesource, or use the "clock=" boot option), and make it easy to add blacklist/sanity checking code to the timesource init functions (so it does the right thing automatically in the next release). So basically, rather then confuse the core code, my plan is to leave it up to the timesource code to ensure they provide good time. > > > Hmm, I am missing something here, but how do you handle > > > the case where the timer interrupt uses HPET, but the offset > > > from last timer interrupt is determined using TSC. But > > > some machines don't have a reliable TSC, so it may need > > > to use a different "offset" source. > > > > > > I guess you would need two different drivers here? > > > > Not really, this is very doable. Time of day is now completely isolated > > from the timer interrupt code, so it doesn't care who calls the > > timeofday_interrupt_hook (HPET's interrupt or the i8253's interrupt > > could do it). Also it doesn't have to be regular or exact, just frequent > > enough that the timesource doesn't overflow. > > I am not talking about the interrupt source here, just from which > time source xtime is refreshed. Well, at the end of the day, the hope is xtime is removed completely and replaced w/ get_lowres_timestamp()/get_lowres_timeofday(). However, right now instead of making a patch with thousands of single line replacements, I'm just updating xtime in timer_interrupt_hook(). I'm not sure if that answers your question, so let me know if I need to clarify myself further. > > > > o vsyscalls/userspace gettimeofday() > > > > - Mark functions and data w/ __vsyscall attribute > > > > - Use linker to put all __vsyscall data in the same set of pages > > > > > > That won't work because the kernel needs to write to > > > these variables (e.g. to update the wall time from the interrupt > > > handler). So in general you need two different mappings, one > > > read only for user space and another writable one for the kernel. > > > > Well, I was thinking about this and I don't see why the kernel and > > userspace can't share the same data and functions for read only access? > > For reading only kernel & user could probably share, although > you may run into some interesting problems on architecturs > that use truly different "segments" for user and kernel space > (like sparc64 or m68k or possibly s390) > > > Why exactly does it need to be mapped twice? We only write to the data > > from kernel mode using different functions, so the same symbols should > > be usable. > > The kernel needs to write too. And at least on x86 there > is no way to make a page read only for user space and writable > for kernel space on the same mapping. Ah! You're right. I was confusing page permission semantics w/ file permissions (ie: permissions don't affect kernel mode, just user mode). That does throw a wrench in the simplicity of my design. I *really* would like to avoid the extended linker trickery that x86-64 has, but it may be necessary in the end. > > the simplest way to use ANY time source, no matter how strange. > > cycle_t's are basically magic cookies given by the timesource that can > > be manipulated generically and converted into nanoseconds. It is likely > > an over-virtualization, but I'd prefer to keep things general and > > flexible until more arch maintainers tell me its unnecessary. > > Sounds like a redundancy with the cputime type the s390 people did. > I don't think we should have two ADTs who do such similar things. Hmm. I should look into that. Thanks for the heads up. And once again, thanks for the feedback. -john ^ permalink raw reply [flat|nested] 84+ messages in thread
end of thread, other threads:[~2004-09-16 7:14 UTC | newest] Thread overview: 84+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2004-09-02 21:07 [RFC] New Time of day proposal (updated 9/2/04) john stultz 2004-09-02 21:09 ` [RFC][PATCH] new timeofday core subsystem (v.A0) john stultz 2004-09-02 21:11 ` [RFC][PATCH] new timeofday i386 hooks (v.A0) john stultz 2004-09-02 21:12 ` [RFC][PATCH] new timeofday i386 timesources (v.A0) john stultz 2004-09-03 1:44 ` [RFC][PATCH] new timeofday i386 hooks (v.A0) George Anzinger 2004-09-03 2:06 ` john stultz 2004-09-03 8:07 ` Ulrich Windl 2004-09-03 18:09 ` George Anzinger 2004-09-02 22:19 ` [RFC][PATCH] new timeofday core subsystem (v.A0) Christoph Lameter 2004-09-02 22:28 ` john stultz 2004-09-02 22:42 ` Christoph Lameter 2004-09-02 23:14 ` john stultz 2004-09-02 23:39 ` Christoph Lameter 2004-09-03 0:07 ` john stultz 2004-09-03 0:47 ` Christoph Lameter 2004-09-03 1:30 ` john stultz 2004-09-03 7:43 ` George Anzinger 2004-09-03 19:32 ` john stultz 2004-09-03 16:18 ` Christoph Lameter 2004-09-03 21:00 ` john stultz 2004-09-03 22:04 ` Christoph Lameter 2004-09-03 23:00 ` john stultz 2004-09-04 0:11 ` Christoph Lameter 2004-09-03 1:39 ` George Anzinger 2004-09-03 1:58 ` john stultz 2004-09-03 6:42 ` Albert Cahalan 2004-09-03 7:24 ` George Anzinger 2004-09-03 19:27 ` john stultz 2004-09-03 22:10 ` George Anzinger 2004-09-03 23:32 ` john stultz 2004-09-04 0:02 ` George Anzinger 2004-09-08 18:07 ` john stultz 2004-09-09 0:08 ` George Anzinger 2004-09-09 0:51 ` john stultz 2004-09-09 3:14 ` Christoph Lameter 2004-09-09 3:32 ` john stultz 2004-09-09 4:31 ` George Anzinger 2004-09-09 6:37 ` Jesse Barnes 2004-09-09 8:09 ` George Anzinger 2004-09-09 19:07 ` john stultz 2004-09-09 20:49 ` George Anzinger 2004-09-13 21:29 ` Christoph Lameter 2004-09-13 22:25 ` john stultz 2004-09-13 22:45 ` Christoph Lameter 2004-09-14 6:53 ` Ulrich Windl 2004-09-14 17:49 ` Christoph Lameter 2004-09-15 0:57 ` George Anzinger 2004-09-15 3:32 ` Christoph Lameter 2004-09-15 8:04 ` George Anzinger 2004-09-15 8:54 ` Dominik Brodowski 2004-09-15 17:54 ` George Anzinger 2004-09-15 9:12 ` Andi Kleen 2004-09-15 15:46 ` Christoph Lameter 2004-09-15 18:00 ` George Anzinger 2004-09-15 18:28 ` Christoph Lameter 2004-09-15 6:46 ` Christoph Lameter 2004-09-15 16:32 ` john stultz 2004-09-15 16:46 ` Christoph Lameter 2004-09-15 17:13 ` john stultz 2004-09-15 17:30 ` Christoph Lameter 2004-09-15 18:48 ` john stultz 2004-09-15 19:58 ` George Anzinger 2004-09-15 20:20 ` Christoph Lameter 2004-09-16 7:02 ` Ulrich Windl 2004-09-03 19:18 ` john stultz 2004-09-02 22:09 ` [RFC] New Time of day proposal (updated 9/2/04) Christoph Lameter 2004-09-02 22:22 ` john stultz 2004-09-02 22:47 ` Christoph Lameter 2004-09-03 9:54 ` Dominik Brodowski 2004-09-03 19:41 ` john stultz 2004-09-03 20:26 ` Dominik Brodowski 2004-09-03 21:05 ` john stultz 2004-09-06 6:26 ` Ulrich Windl 2004-09-06 11:56 ` Alan Cox 2004-09-07 16:14 ` Christoph Lameter 2004-09-03 15:17 ` Andi Kleen 2004-09-03 20:11 ` john stultz 2004-09-04 13:00 ` Andi Kleen 2004-09-07 16:10 ` Christoph Lameter 2004-09-07 18:24 ` George Anzinger 2004-09-07 20:55 ` Christoph Lameter 2004-09-07 21:42 ` George Anzinger 2004-09-08 6:26 ` Ulrich Windl 2004-09-08 18:25 ` john stultz
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox